
🙋‍♀️ New to the challenge? 🤔 Want to make your first submission? 

⚙️ Access the Starter-Kit here.

 

✨ This challenge is a shared task of the Wordplay - EMNLP 2025 Workshop 📕 

You’re playing your favourite video game, navigating a bustling medieval city on your quest. When you meet a blacksmith, he greets you and mentions last night’s storm that damaged his roof. You ask about a new weapon, and he recalls your last visit, suggests an upgrade, and even offers a discount because you helped him in a previous quest.

Context-aware NPCs respond naturally and adapt to the world around them, enabling dynamic in-game interactions.

But most NPCs today deliver repetitive, disconnected, and robotic dialogue, and they struggle to balance small talk with task-driven exchanges: the very elements that make games exciting and immersive.

🎮 Enter the Commonsense Persona-grounded Dialogue Challenge (CPDC 2025)! 🎮

How can we make NPCs feel real? This challenge pushes the boundaries of AI-driven dialogue by creating characters that think, remember, and interact naturally for richer, more immersive game worlds.

🕵️ Introduction

Research on dialogue systems has a long history, but thanks to Transformers and large language models (LLMs), conversational AI has made significant progress and become more human-like. In virtual spaces such as game environments, human-shaped avatars are often used as Non-Player Characters (NPCs). By enabling these NPCs to engage in free conversation, the world can feel more immersive. To enhance this sense of realism, it is essential to support not only natural small talk that aligns with the game’s worldview and the NPCs’ personas, but also task-oriented dialogues that reflect in-game actions.

With CPDC 2025, our goal is to develop models capable of performing both functions effectively within a single framework.

In CPDC 2023 Task 1, we held a competition focusing on human-like response generation. For CPDC 2025, we are expanding this challenge by designing personas and actions within a game world, aiming to facilitate dialogues that incorporate context, knowledge, and, in some cases, task execution.

This year, the challenge consists of three tasks:

  • Task 1: Task-Oriented Dialogue Agents
  • Task 2: Context-Aware Dialogue Agents
  • Task 3: Integrating Contextual Dialogue and Task Execution 

Each of the three tasks has an independent leaderboard and a separate prize pool. Participants can submit to Task 1 or Task 2 individually; a submission to Task 3 is automatically evaluated on both Task 1 and Task 2 and ranked on all three leaderboards. We encourage participants to aim for a model that can engage in natural, human-like conversations while executing necessary tasks.

Additionally, each task has two tracks:

  • GPU Track: Allows participants to explore their own methods with any model
  • API Track: Requires participants to compete using the same API, with only prompt adjustments

Each track has its own prize pool, so we encourage you to participate in both.

Participants can use any training data of their choice. Additionally, we will provide a small amount of reference training data for Task 1 and Task 2. For any of Tasks 1 to 3, participants will submit a dialogue response generation system. While the format of the evaluation data will remain the same across tasks, the content of the evaluation data (i.e., the interlocutor’s intentions and the resulting conversation flow) will differ. Models must appropriately engage in conversation and perform actions based on their interlocutor’s needs.
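To make the submission format concrete, here is a minimal sketch of what a dialogue response generation system might look like. All names below (the DialogueAgent class, its generate_response signature, the input fields) are illustrative assumptions, not the actual starter-kit interface:

# A minimal sketch, assuming a single-class interface. The class name,
# method signature, and input fields are illustrative assumptions;
# consult the starter kit for the actual submission interface.
class DialogueAgent:
    def generate_response(
        self,
        persona: str,           # the NPC's persona description
        worldview: str,         # shared game-world information
        knowledge: list[str],   # role-specific knowledge snippets
        functions: list[dict],  # available function definitions (Tasks 1 and 3)
        history: list[dict],    # dialogue so far, e.g. [{"speaker": ..., "utterance": ...}]
    ) -> str:
        """Return the NPC's next utterance given the current context."""
        # A real system would condition an LLM on all of the above;
        # this placeholder simply echoes the last player utterance.
        last = history[-1]["utterance"] if history else ""
        return f"(responds in character to: {last!r})"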

📑 The Task

Task 1: Task-Oriented Dialogue Agents

Participants will submit a dialogue response generation system. A training dataset for Task 1 will be provided, but its use is optional. Participants can also use any other dataset for training. Additionally, to help participants understand the nature of the task, the starter kit includes a baseline model that can be tested with the provided training data.

The submitted systems will be evaluated using dialogue datasets based on personas and roles within the game. The evaluation data will include persona and worldview information as common information, along with available function definitions and role-specific knowledge. Participants will use this information to call functions when necessary and may use the results of these function calls to generate responses.
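As a concrete illustration of this flow (a function definition, a call issued by the system, and a response grounded in the result), consider the hypothetical sketch below. The schema style and the get_shop_inventory function are invented for illustration; the actual function definitions are supplied with the evaluation data:

import json

# Hypothetical function definition of the kind supplied with the
# evaluation data (the name and schema here are invented examples).
function_def = {
    "name": "get_shop_inventory",
    "description": "List items the shopkeeper currently has for sale.",
    "parameters": {
        "type": "object",
        "properties": {
            "category": {"type": "string", "description": "e.g. 'weapon'"},
        },
        "required": ["category"],
    },
}

# The system decides a call is needed and issues it...
call = {"name": "get_shop_inventory", "arguments": {"category": "weapon"}}

# ...receives a (simulated) result...
result = [{"item": "steel longsword", "price": 120}]

# ...and weaves the result into the NPC's reply.
reply = f"I've a {result[0]['item']} for {result[0]['price']} gold, friend."
print(json.dumps(call), "->", reply)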

When discussing objects selected within the game space, information will be provided to establish a common understanding of the referent. Each persona description contains more than five sentences. In addition to the five aspects handled by PeaCoK, the personas are described from multiple perspectives necessary to imagine the character within the game.

PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives (ACL 2023 Outstanding Paper Award)
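For illustration, a persona organized around PeaCoK's five aspect types (characteristics, routines and habits, goals and plans, experiences, and relationships) plus game-specific perspectives might look like the sketch below. The field names and the blacksmith content are invented; the challenge data may structure personas differently:

# Illustrative persona sketch. PeaCoK's five aspect types are real
# (characteristics, routines/habits, goals/plans, experiences,
# relationships); the field names and content below are invented.
persona = {
    "characteristics": "Gruff but fair, and proud of his craft.",
    "routines_habits": "Opens the forge at dawn; sharpens blades each night.",
    "goals_plans": "Hopes to forge a blade worthy of the royal guard.",
    "experiences": "Lost his old workshop to a fire ten winters ago.",
    "relationships": "Owes the player a favour for help on a past quest.",
    # Additional game-specific perspectives beyond PeaCoK's five aspects:
    "worldview": "A bustling medieval city recovering from last night's storm.",
}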

Task 2: Context-Aware Dialogue Agents

Participants will submit a dialogue response generation system. A training dataset for Task 2 will be provided, but its use is optional. Participants can also use any other dataset for training. Additionally, to help participants understand the nature of the task, the starter kit includes a baseline model that can be tested with the provided training data.

The submitted systems will be evaluated using dialogue datasets based on personas and roles within the game. The evaluation data will include persona and worldview information as common information, along with available function definitions and role-specific knowledge. Based on this information, participants will generate natural and character-appropriate responses.

The format of the common information provided as input is the same for both Task 1 and Task 2. When responding to player utterances, Task 1 may require executing necessary functions depending on the situation, whereas Task 2 involves dialogues without the need for function execution. Therefore, the evaluation data and evaluation methods differ between Task 1 and Task 2.

Task 3: Integrating Contextual Dialogue and Task Execution Agents

The goal is to create a single model that can engage in natural, human-like conversations while also performing necessary tasks. Submitting to Task 3 will automatically result in evaluation under both Task 1 and Task 2. Therefore, participants should prepare a model (or system) that meets the requirements of both tasks.

Note that there is no dedicated evaluation dataset for Task 3. Performance in both tasks will be comprehensively assessed based on the evaluation results of Task 1 and Task 2. Participants will compete on a dedicated Task 3 leaderboard.

By participating in Task 3, you will not only have a chance to win prizes in Task 3 but also gain the opportunity to win prizes across all leaderboards since your submission will be evaluated in Task 1 and Task 2 as well.

📅 Timeline

The challenge will take place across three rounds, each using a different evaluation dataset for ranking the systems.

  • Warm-up Round: 9th April 2025
  • Round 1: 20th April 2025
  • Round 2: 25th May 2025
  • Challenge End: 30th June 2025

🏆 Prizes

The prize pool is a total of 20,000 USD, divided among six tracks. Participating teams are eligible to win prizes across multiple leaderboards in both tracks.

Task 1: Task-Oriented Dialogue Agents (4,000 USD)

  • GPU Track
    • 🥇 First place: 1,000 USD
    • 🥈 Second place: 500 USD
    • 🥉 Third place: 500 USD
  • API Track
    • 🥇 First place: 1,000 USD
    • 🥈 Second place: 500 USD
    • 🥉 Third place: 500 USD

Task 2: Context-Aware Dialogue Agents (4,000 USD)

  • GPU Track
    • 🥇 First place: 1,000 USD
    • 🥈 Second place: 500 USD
    • 🥉 Third place: 500 USD
  • API Track
    • 🥇 First place: 1,000 USD
    • 🥈 Second place: 500 USD
    • 🥉 Third place: 500 USD

Task 3: Integrating Contextual Dialogue and Task Execution (12,000 USD)

  • GPU Track
    • 🥇 First place: 3,000 USD
    • 🥈 Second place: 2,000 USD
    • 🥉 Third place: 1,000 USD
  • API Track
    • 🥇 First place: 3,000 USD
    • 🥈 Second place: 2,000 USD
    • 🥉 Third place: 1,000 USD

Please refer to the Challenge Rules for more details about the open-sourcing criteria for each leaderboard to be eligible for the associated prizes.

This challenge is a shared task of the Wordplay Workshop at EMNLP 2025; participants will get a chance to submit a technical report in the form of a paper, with the exact submission format and venue to be confirmed.

📖 Citing the Dataset

If you are participating in this challenge or using the dataset, please consider citing the following papers:

Dataset: PeaCoK

@inproceedings{gao-etal-2023-peacok,
    title = "{P}ea{C}o{K}: Persona Commonsense Knowledge for Consistent and Engaging Narratives",
    author = "Gao, Silin and Borges, Beatriz and Oh, Soyoung and Bayazit, Deniz and Kanno, Saya and Wakaki, Hiromi and Mitsufuji, Yuki and Bosselut, Antoine",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2023",
    pages = "6569--6591",
}

📱 Challenge Organizing Committee

  • Hiromi Wakaki (Sony)
  • Antoine Bosselut (EPFL)
  • Silin Gao (EPFL)
  • Yuki Mitsufuji (Sony)
  • Yoshinori Maeda (Sony)
  • Yukiko Nishimura (Sony)
  • Keiichi Yamada (Sony)
  • Shiva Sundaram (Sony)
  • Sergey Bashkirov (Sony)
  • Prithviraj Ammanabrolu (UCSD)

If you have queries or feedback or are looking for teammates, drop a message on AIcrowd Community. Don’t forget to hop onto the Discord channel to collaborate with fellow participants & connect directly with the organisers. Share your thoughts, spark collaborations and get your queries addressed promptly.
