What I'm tracking

    A running watchlist of models, tools, papers, and communities I follow for production AI: RAG, agents, evals, and shipping real products.

    Models

    • Claude

      Long-context reasoning and tool use for agent workflows.

    • GPT-4o / o-series

      Multimodal APIs and structured outputs for client integrations.

    • Gemini

      Integrations with the Google stack and large context windows.

    Frameworks

    Eval & ops

    • LangSmith

      Tracing, eval datasets, and regression checks for LLM apps.

    • Arize Phoenix

      Observability and embedding-drift monitoring for RAG systems.

    Inference

    • vLLM

      High-throughput serving for open-weight models.

    • Ollama

      Local model runs for prototyping and offline demos.

    Research

    Community