AI Radar

    Short signals on where production AI is heading: patterns I see in client work, open source, and research. Not predictions; notes for builders.

    Eval-first RAG

    Teams ship retrieval before they ship evals: then wonder why quality regresses.

    Golden datasets, offline regression, and human review loops are becoming table stakes. The best RAG projects treat eval infrastructure as part of v1, not a post-launch patch.

    • RAG
    • evals
    • production

    Agent graphs over chains

    Linear chains are giving way to explicit state machines and graph orchestration.

    LangGraph-style workflows, checkpointing, and human-in-the-loop steps map better to real business processes than one-shot prompt chains.

    • agents
    • orchestration

    Right-sized models

    Routing between small local models and frontier APIs is a cost and latency win.

    Classification, extraction, and guardrails often run fine on smaller models; reserve frontier calls for reasoning-heavy steps.

    • cost
    • inference
    • routing

    LLM observability as product

    Tracing token cost, latency, and failure modes is product analytics for AI apps.

    Dashboards for drift, hallucination rate, and tool-call success are what separate demos from systems operators can trust.

    • MLOps
    • monitoring

    Multimodal in the loop

    Vision + document parsing is entering standard agent toolkits.

    Invoices, screenshots, and PDFs flow through the same agent frameworks as text: extraction and verification steps matter more than the base model choice.

    • multimodal
    • agents

    Open weights in production

    Fine-tuned open models are viable for domain-specific pipelines with data sensitivity.

    When privacy, cost, or latency dominates, self-hosted inference with vLLM/Ollama plus a thin API layer is a real architecture: not just a research exercise.

    • open source
    • privacy