Multimodal QA systems (text + image + video)

    QA pipelines that ingest text, images, and video for unified retrieval, reasoning, and answer generation.

    Muhammad Zeeshan

    Technologies Used

    Python
    LLMs
    Vision
    RAG
    FastAPI

    Key Features

    1. Cross-modal ingestion and embedding
    2. Unified retrieval across modalities
    3. Grounded answers with source references
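
    As an illustration of grounded answers with source references, here is a minimal sketch, not the project's actual code. It assumes each retrieved item carries a modality label and a source locator; the names Evidence, build_grounded_prompt, and resolve_citations, and the "path@timestamp" locator format, are hypothetical. The idea is to number the evidence in the prompt so that citation markers in the generated answer can be mapped back to their sources.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One retrieved item (hypothetical structure, for illustration only)."""
    modality: str  # "text", "image", or "video"
    source: str    # assumed locator: a path, or a path plus timestamp for video
    excerpt: str   # passage, image caption, or transcript segment


def build_grounded_prompt(question: str, evidence: list[Evidence]) -> str:
    """Number each piece of evidence so the model can cite it as [n]."""
    lines = ["Answer using only the numbered sources and cite them as [n].", ""]
    for n, item in enumerate(evidence, start=1):
        lines.append(f"[{n}] ({item.modality}) {item.source}: {item.excerpt}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def resolve_citations(answer: str, evidence: list[Evidence]) -> list[str]:
    """Map [n] markers in an answer to source locators, in order of first use.

    Markers that point outside the evidence list are ignored.
    """
    seen: list[int] = []
    for match in re.findall(r"\[(\d+)\]", answer):
        n = int(match)
        if 1 <= n <= len(evidence) and n not in seen:
            seen.append(n)
    return [evidence[n - 1].source for n in seen]
```

    A caller would pass the prompt to whichever LLM is in use, then run resolve_citations on the reply to show the cited documents, screenshots, or video timestamps alongside the answer.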

    Implementation

    Combined vision encoders, transcript extraction, and vector retrieval so that users can query mixed-media corpora through a single interface.
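
    The retrieval step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the system's implementation: images are assumed to be represented by a caption or OCR text and videos by transcript segments, and a bag-of-words vector stands in for the real vision and text encoders. The names Chunk, embed, cosine, and retrieve, and the sample corpus, are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One retrievable unit from any modality (hypothetical structure)."""
    modality: str  # "text", "image", or "video"
    source: str    # file name, plus a timestamp for video segments
    content: str   # text passage, image caption/OCR, or transcript segment


def embed(text: str) -> Counter:
    # Stand-in for a real encoder: a sparse bag-of-words vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[Chunk], k: int = 2) -> list[tuple[float, Chunk]]:
    # Score every chunk against the query in one shared vector space,
    # regardless of modality, and return the k best matches.
    q = embed(query)
    scored = [(cosine(q, embed(c.content)), c) for c in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up sample corpus mixing the three modalities.
index = [
    Chunk("text", "docs/setup.md", "Install the agent and restart the service"),
    Chunk("image", "screens/login_error.png", "Screenshot of login error dialog invalid token"),
    Chunk("video", "walkthrough.mp4@00:02:15", "Here we reset the token from the settings page"),
]

for score, chunk in retrieve("how do I fix an invalid token login error", index):
    print(f"{score:.2f} [{chunk.modality}] {chunk.source}")
```

    Because every modality is reduced to a vector in the same space, a single ranking covers documents, screenshots, and video segments. A production setup would likely swap embed for learned encoders and the linear scan for a vector database.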

    Results & Impact

    Enabled support and analytics teams to query documentation, screenshots, and walkthrough videos in a single flow.