Activity
Ratings Progression
Challenge Categories
Challenges Entered
Improve RAG with Real-World Benchmarks
Latest submissions
failed | 267153
graded | 267152
graded | 266979
What data should you label to get the most value for your money?
Latest submissions
graded | 173244
3D Seismic Image Interpretation by Machine Learning
Latest submissions
Play in a realistic insurance market, compete for profit!
Latest submissions
Multi Agent Reinforcement Learning on Trains.
Latest submissions
Evaluating RAG Systems With Mock KGs and APIs
Latest submissions
graded | 263889
graded | 263880
failed | 263878
Enhance RAG systems With Multiple Web Sources & Mock API
Latest submissions
failed | 267153
graded | 267152
failed | 265531
Meta Comprehensive RAG Benchmark: KDD Cup 2024
Has the Winner Notification already been sent?
4 months ago
We heard from the organizers by email that some of the human annotations are still in progress.
Copied from email:
We are still in the middle of annotations for other challenge tasks and will announce winners by email once the annotations are ready. The official winner announcement for the CRAG challenge will be made in early August.
‼️ ⏰ Select Submission ID before 20th June, 2024 23:55 UTC
5 months ago
Can we confirm our final submissions in the Google form after June 20th (e.g., on June 21st, once the online evaluation has finished)? Some people want to select their final solution based on the Round 2 score. Besides, the evaluation system is currently stuck because so many people are submitting at once.
‼️ ⏰ Select Submission ID before 20th June, 2024 23:55 UTC
5 months ago
Can we submit the same submission for all 3 tasks? The aicrowd.json might be the same, but the code can handle all three task settings.
Can we submit a solution that has not been tested online?
5 months agoHi,
My online submission quota is low due to debugging. I am wondering whether, for the final submission, we can choose solutions that have not been tested online yet.
Best
Fanyou
Submission failed due to Private Test: Evaluation timed out
5 months ago
Could you help check the reason for the failure of these submissions?
These submissions made small changes to previously successful submissions, and all were tested successfully on the provided public dataset. Besides, all of them passed the validation step but got stuck at the start of evaluation, so there are no progress bars for the evaluations.
Now I can identify which part of my code causes the problem, but I am still not able to reproduce it offline. I wish I could get some error messages from the log to help me solve the problem.
Best
Fanyou
Phase 1 has released the dataset; how will a cut-off be applied to limit Phase 2?
7 months ago
Hi organizers,
There are apparently two teams in Track 1 now (April 30th) using the public test set [1] to obtain a nearly perfect score (~0.98). I am wondering, in this scenario, how a cut-off can be applied in Phase 2? Every participant just needs to upload the public test set to obtain a similar near-perfect score. Is there still a potential cut-off?
[1] What does `split` field mean? - #3 by graceyx.yale
Best
Fanyou
Regarding the maximum number of response tokens for Llama 3
7 months ago
@aicrowd_team Yes, I understand that the code already includes this tokenizer. But Llama 3 has a different vocabulary size (128K vs 32K): in some cases its output token count will be smaller than Llama 2's even when the output text is the same. In terms of model performance, Llama 3 is better (according to the report) and I foresee people using it. So I suggest replacing the current tokenizer used for truncating predictions with Llama 3's.
Regarding the maximum number of response tokens for Llama 3
7 months ago
I want to bring to the organizers' attention that Llama 3 has a larger vocabulary size (128K) compared to Llama 2 (32K). So the rules need to clearly define which tokenizer is used to truncate the response (previously the code used the Llama 2 tokenizer).
Best
Fanyou
Are we allowed to use Llama 3?
7 months ago
Hi Organizers,
Meta has introduced Llama 3 and it is available on Hugging Face. I am wondering if we can use it for the competition. The Llama 3 8B model might be a good choice.
Best
Fanyou
Can we use other LLMs at the training stage?
8 months ago
Hi Organizers,
I want to understand whether we can use other LLMs (not the Llama 2 family) during the training stage, specifically for RLHF and data generation.
Below is the raw request for model:
This KDD Cup requires participants to use Llama models to build their RAG solution. Specifically, participants can use or fine-tune the following 4 Llama 2 models from https://llama.meta.com/llama-downloads:
- llama-2-7b
- llama-2-7b-chat
- llama-2-70b
- llama-2-70b-chat
Best
Fanyou
Amazon KDD Cup '23: Multilingual Recommendation Challenge
Eligibility for attendance
Over 1 year ago
Hi, Organizer.
I am currently an Amazon employee, but not related to Amazon Search. Each year I attend the KDD Cup to learn and practice, and I have won several top places in previous KDD Cups. I am wondering if I am eligible to participate and eligible for the prize. If I am eligible to participate but not for the prize, and I am lucky enough to reach a top place, would it be possible to keep my ranking without receiving any cash prize?
The rule is written as:
People who, during the Challenge Period, are directors, officers, employees, interns, and contractors ("Personnel") of Sponsor, its parents, subsidiaries, affiliates, and their respective advertising, promotion and public relations agencies, representatives, and agents (collectively, "Challenge Entities"), immediate family members of such Personnel (parents, siblings, children, spouses, and life partners of each) and members of the households of such Personnel (whether related or not) are ineligible to win a prize in this Challenge. Sponsor reserves the right to verify eligibility and adjudicate any eligibility dispute at any time.
Best
Fanyou Wu
ESCI Challenge for Improving Product Search
Final Winners Announcement
Over 2 years ago
Hi Mohanty,
Thanks to the whole AIcrowd team and the Amazon Search team for organizing this year's KDD Cup. I have a question about the KDD workshop: do we need to, or could we, submit a paper for the workshop? The workshop paper deadline is also Aug 1st. Would it be possible to finalize the ranking in advance (e.g., 2-3 days before the paper deadline)? I believe the current ranking will not change anymore.
Best
Fanyou
[ETS-Lab] Our solution
Over 2 years ago
That Task 1 feature will probably work on the private dataset as well; that's why we used it. If you check the product lists in Task 2 and Task 1, you will find a special pattern in the product order: in general, the product list is sorted as training set, private set, and public set. Another reason this feature works is that the product-to-example ratio is close to 1, which means most products are used only once.
There is another way to construct this leak feature: check whether the query-product pair is in the Task 1 public dataset. That one will definitely fail on the private set, as we cannot access that information.
Note that the evaluation service uses V100 GPUs equipped with tensor cores. Converting models to ONNX FP16 helps a lot with speed. For example, one unoptimized DeBERTaV3-base model takes about 90 minutes for inference on a single 2080 Ti GPU locally, but online only 35-40 minutes for two DeBERTaV3 models (2 folds).
[ETS-Lab] Our solution
Over 2 years ago
Thanks to the AIcrowd team and the Amazon Search team for organizing this extensive competition. The game has finally ended. Our team learned a lot here, and we believe this memorable period will help us a lot in the future. Below we briefly introduce our solution.
General solution
- We trained 3 cross-encoder models (DeBERTaV3, CocoLM, and BigBird) for each language, which differ in the pretrained models, training method (e.g., knowledge distillation), and data splitting. In total, six models (2 folds x 3 models) per language produce the initial prediction (4-class probability) for each query-product pair. Using those models alone, the public set score for Task 2 is around 0.816.
- For Task 1, we used the output 4-class probabilities with some simple features to train a LightGBM model, calculated the expected gain (P_e*1 + P_s*0.1 + P_c*0.01), and sorted the query-product list by this gain. This method is slightly better than using LambdaRank directly in LightGBM.
- For Task 2 and Task 3, we used LightGBM to fuse those predictions with some important features. The most important features are based on the potential data leakage from Task 1 and the behavior of the query-product group:
  - The stats (min, median, and max) of the cross-encoder output probabilities grouped by query_id (0.007+ on the Task 2 public leaderboard)
  - The percentage of product_id in the Task 1 product list grouped by query_id (0.006+ on the Task 2 public leaderboard)
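The expected-gain ranking and the per-query group stats above can be sketched as follows. This is a minimal illustration: the gain weights (1, 0.1, 0.01 for exact/substitute/complement) come from the post, while the product IDs and probabilities are made up.

```python
from statistics import median

# Hypothetical 4-class probabilities (exact, substitute, complement; the
# irrelevant probability is implied) for products retrieved for one query.
predictions = {
    "B001": {"e": 0.70, "s": 0.20, "c": 0.05},
    "B002": {"e": 0.10, "s": 0.60, "c": 0.20},
    "B003": {"e": 0.40, "s": 0.30, "c": 0.10},
}

def expected_gain(p):
    # Expected gain as described in the post: P_e*1 + P_s*0.1 + P_c*0.01
    return p["e"] * 1.0 + p["s"] * 0.1 + p["c"] * 0.01

# Task 1: sort the query's product list by descending expected gain.
ranking = sorted(predictions, key=lambda pid: expected_gain(predictions[pid]),
                 reverse=True)
print(ranking)  # ['B001', 'B003', 'B002']

# Task 2/3-style group features: stats of the "exact" probability per query.
exact_probs = [p["e"] for p in predictions.values()]
group_stats = {"min": min(exact_probs),
               "median": median(exact_probs),
               "max": max(exact_probs)}
print(group_stats)
```

In practice these group stats would be computed per query_id over the full prediction table and fed to LightGBM as extra columns.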
Small modification to the cross-encoder architecture
- As the product context has multiple fields (title, brand, and so on), we use neither the CLS token nor mean (max) pooling to get the latent vector of the query-product pair. Instead, we concatenate the hidden states of predefined tokens (query, title, brand, color, etc.). The format is:
[CLS] [QUERY] <query_content> [SEP] [TITLE] <title_content> [SEP] [BRAND] <brand_content> [SEP] ...
where [TEXT] is the special token and <text_content> is the text content.
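A sketch of how such an input string might be assembled. The field names and token spellings below are assumptions for illustration; in a real pipeline the bracketed markers would also be registered as additional special tokens in the tokenizer so they map to single token IDs.

```python
# Build a cross-encoder input with one marker token per field, following
# the format described above. Field names are illustrative only.
FIELD_TOKENS = ["[QUERY]", "[TITLE]", "[BRAND]", "[COLOR]"]

def build_input(query, product):
    # product: dict whose keys match the lower-cased field token names.
    parts = ["[CLS]", "[QUERY]", query]
    for token in FIELD_TOKENS[1:]:
        field = token.strip("[]").lower()
        parts += ["[SEP]", token, product.get(field, "")]
    return " ".join(parts)

example = build_input(
    "wireless mouse",
    {"title": "Ergo Mouse 2000", "brand": "Acme", "color": "black"},
)
print(example)
# [CLS] [QUERY] wireless mouse [SEP] [TITLE] Ergo Mouse 2000 [SEP] [BRAND] Acme [SEP] [COLOR] black
```

During the forward pass, the hidden states at the positions of these marker tokens would be concatenated to form the pair representation, instead of taking the [CLS] vector or mean pooling over all tokens.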
Code submission speed-up
- Pre-process the product tokens and save them as an HDF5 file.
- Convert all models to ONNX with FP16 precision.
- Pre-sort the product IDs to reduce the side impact of zero-padding within batches.
- Use a relatively small mini-batch size at inference (batch size = 4).
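The pre-sorting trick in the list above can be illustrated with a small sketch. The token lengths and batch size are made up; the point is that length-sorted batches waste far fewer padded slots than the original order, since each batch is padded to its longest sequence.

```python
# Hypothetical token lengths for six products, in their original order.
lengths = {"B1": 120, "B2": 12, "B3": 96, "B4": 15, "B5": 110, "B6": 20}

def padding_waste(order, batch_size=2):
    # Each batch is padded to its longest sequence; count the wasted slots.
    waste = 0
    for i in range(0, len(order), batch_size):
        batch = [lengths[pid] for pid in order[i:i + batch_size]]
        waste += sum(max(batch) - n for n in batch)
    return waste

original = list(lengths)                      # submission order
presorted = sorted(lengths, key=lengths.get)  # sorted by token length
print(padding_waste(original), padding_waste(presorted))  # 279 89
```

After inference, the predictions just need to be scattered back to the original order, so the reordering has no effect on the submitted results.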
You can find our training code here and code submission here.
Advertisement
Currently, I am seeking a machine learning engineer or research scientist position in the US. In collaboration with my friend Yang @Yang_Liu_CTH, I have won several championships and runner-up places in many competitions, including the championship of the KDD Cup 2020 reinforcement learning track. You can email me directly or visit my personal website for more details.
Best
Dr. Wu, Fanyou
Postdoc @ Purdue University
Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins
Over 2 years ago
Endless deadline. Let me call it an alive-line.
Calling on the organizer team to ban using external data in online code submission
Over 2 years ago
There is another way to ensure fairness: require all teams to publish their external data.
Calling on the organizer team to ban using external data in online code submission
Over 2 years ago
If manual code review could be done, then I would support banning external data, as our team might benefit from that. But my point of view is still based on the rules themselves: it is really not a wise idea to change anything at this stage.
Calling on the organizer team to ban using external data in online code submission
Over 2 years ago
Although our team does not use any external data, we do not support changing the rules any more. Constantly changing the rules makes this competition look like a joke and makes all of us tired!
Please do not change any rule again; I believe the host promised as much in the deadline extension post. @mohanty
Note that, unlike Task 1, which @TransiEnt focuses on, Task 2 required a lot of effort to make the code more efficient. So we applied pre-processing to tokenize all products. Besides, the product ID itself is also a feature, and we feed it to the transformers. If the product IDs are disputed, all of my models would need to be retrained, and I do not have enough computing resources. So it is impossible to ban a product ID here.
The only way to ban external data is to inspect code afterwards, which would be extremely hard for the host to do.
Best
Fanyou
Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins
Over 2 years ago
I created an unofficial vote on the extension of the deadline and timeouts. Please share your opinion there. I hope the organizers can hear something from the poll.
Final Evaluation Process & Team Scores
4 months ago
Can we obtain the full rankings for the 3 main tasks? At least I want to understand how far I am from the top teams.