AI Radar

Short signals on where production AI is heading: patterns I see in client work, open source, and research. Not predictions; notes for builders.

Eval-first RAG

Teams ship retrieval before they ship evals: then wonder why quality regresses.

Golden datasets, offline regression, and human review loops are becoming table stakes. The best RAG projects treat eval infrastructure as part of v1, not a post-launch patch.

RAG
evals
production

Agent graphs over chains

Linear chains are giving way to explicit state machines and graph orchestration.

LangGraph-style workflows, checkpointing, and human-in-the-loop steps map better to real business processes than one-shot prompt chains.

agents
orchestration

Right-sized models

Routing between small local models and frontier APIs is a cost and latency win.

Classification, extraction, and guardrails often run fine on smaller models; reserve frontier calls for reasoning-heavy steps.

cost
inference
routing

LLM observability as product

Tracing token cost, latency, and failure modes is product analytics for AI apps.

Dashboards for drift, hallucination rate, and tool-call success are what separate demos from systems operators can trust.

MLOps
monitoring

Multimodal in the loop

Vision + document parsing is entering standard agent toolkits.

Invoices, screenshots, and PDFs flow through the same agent frameworks as text: extraction and verification steps matter more than the base model choice.

multimodal
agents

Open weights in production

Fine-tuned open models are viable for domain-specific pipelines with data sensitivity.

When privacy, cost, or latency dominates, self-hosted inference with vLLM/Ollama plus a thin API layer is a real architecture: not just a research exercise.

open source
privacy