📚 Explore the pre-release sample dataset now
💬 Join the conversation on Discord – connect with other participants, get support, and stay updated. Jump in and introduce yourself 👉 https://discord.gg/YWDQQa8byx
An MM-RAG QA system takes as input an image 𝐼 and a question 𝑄, and outputs an answer 𝐴; the answer is generated by MM-LLMs from information retrieved from external sources, combined with knowledge internalized in the model. A multi-turn MM-RAG QA system additionally takes the questions and answers from previous turns as context when answering a new question. The answer should provide information that is useful for answering the question, without introducing hallucinations.
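Conceptually, the system is a function of the image, the question, retrieved evidence, and, in the multi-turn setting, the dialogue history. The sketch below is only an illustration of that interface, not the official starter-kit API; the `Turn`, `MMRAGQuery`, `retrieve`, and `generate` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One previous question-answer pair in a multi-turn conversation."""
    question: str
    answer: str


@dataclass
class MMRAGQuery:
    """Input to the system: an image I, a question Q, and (optionally) prior turns."""
    image: bytes
    question: str
    history: list[Turn] = field(default_factory=list)


def answer(query: MMRAGQuery, retrieve, generate) -> str:
    """Produce the answer A: fetch external evidence for (I, Q), then let the
    MM-LLM combine it with its internal knowledge and the dialogue history.
    `retrieve` and `generate` are stand-ins for a search component and an MM-LLM call."""
    evidence = retrieve(query.image, query.question)  # information from external sources
    return generate(query.image, query.question, evidence, query.history)
```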
Task #1: Single-source Augmentation
In Task #1, we provide an image mock API to access information from an underlying image-based mock KG. The mock KG is indexed by image and stores structured data associated with each image; the answer to a given question may or may not exist in the mock KG.
The mock API takes an image as input and returns similar images from the mock KG along with structured data associated with each image to support answer generation.
This task aims to test the answer generation capability of MM-RAG systems.
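As a rough illustration of the Task #1 flow, here is a minimal sketch. It assumes a hypothetical `search_image` client for the mock API and a `generate` wrapper around the MM-LLM; neither name is taken from the official interface, and the exact response schema may differ.

```python
def answer_task1(image: bytes, question: str, search_image, generate) -> str:
    """Task #1 sketch: query the image mock API with the input image, gather the
    structured data attached to the returned similar images, and ground the
    MM-LLM's answer in that evidence."""
    results = search_image(image)  # similar images + associated structured data from the mock KG
    evidence = [r.get("structured_data") for r in results]
    # The mock KG may not contain the answer; a well-behaved generator should
    # abstain (e.g., answer "I don't know") rather than hallucinate.
    return generate(image, question, evidence)
```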
Task #3: Multi-turn QA
Unlike Task #1 and Task #2, Task #3 evaluates the system's ability to conduct multi-turn conversations. Each conversation consists of 2–6 turns, and later questions may or may not require the image to answer.
This task focuses on testing context understanding to ensure smooth multi-turn interactions.
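To illustrate how history is threaded through a conversation, here is a minimal multi-turn loop. `answer_fn` stands in for any single-turn answerer that accepts the conversation history; the names are hypothetical and not part of the evaluation interface.

```python
def run_conversation(image: bytes, questions: list[str], answer_fn) -> list[str]:
    """Task #3 sketch: answer a 2-6 turn conversation about one image.
    Each turn sees the accumulated history, because later questions may refer
    back to earlier answers rather than to the image itself."""
    history: list[tuple[str, str]] = []
    answers: list[str] = []
    for question in questions:
        a = answer_fn(image, question, history)  # may or may not need the image
        history.append((question, a))
        answers.append(a)
    return answers
```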
To know more about the Meta CRAG-MM challenge, please see: https://www.aicrowd.com/challenges/meta-crag-mm-challenge-2025