0 Follower
0 Following
wufanyou
FANYOU WU

Location

Seattle, US


Challenges Entered

Improve RAG with Real-World Benchmarks

Latest submissions

failed 267153
graded 267152
graded 266979

What data should you label to get the most value for your money?

Latest submissions

graded 173244

Latest submissions

graded 195825
graded 195824
graded 195801

3D Seismic Image Interpretation by Machine Learning

Latest submissions

No submissions made in this challenge.

Play in a realistic insurance market, compete for profit!

Latest submissions

No submissions made in this challenge.

Multi Agent Reinforcement Learning on Trains.

Latest submissions

No submissions made in this challenge.

Evaluating RAG Systems With Mock KGs and APIs

Latest submissions

graded 263889
graded 263880
failed 263878

Enhance RAG systems With Multiple Web Sources & Mock API

Latest submissions

failed 267153
graded 267152
failed 265531
Participant Rating

  • TLab Seismic Facies Identification Challenge
  • ETS-Lab ESCI Challenge for Improving Product Search
  • ETSLab Meta Comprehensive RAG Benchmark: KDD Cup 2024

Meta Comprehensive RAG Benchmark: KDD Cup 2024

Final Evaluation Process & Team Scores

4 months ago

Can we obtain the full rankings for the 3 main tasks? At the very least, I would like to understand how far I am from the top teams.

Has the Winner Notification already been sent?

4 months ago

We heard from the organizers by email that part of the human annotation is still ongoing.

Copied from email:

We are still in the middle of annotations for other challenge tasks and will announce winners by email once the annotations are ready. The official winner announcement for the CRAG challenge will be made in early August.

‼️ ⏰ Select Submission ID before 20th June, 2024 23:55 UTC

5 months ago

Can we confirm our final submissions in the Google form after June 20th, e.g., sometime on June 21st once the online evaluation has finished? Some people want to select their final solution based on the Round 2 score. Besides, the evaluation system is currently stuck because a lot of people are submitting to it.

‼️ ⏰ Select Submission ID before 20th June, 2024 23:55 UTC

5 months ago

Can we submit the same submission for all 3 tasks? The aicrowd.json might be the same, but the code is able to handle all three task settings.

Can we submit a solution that has not been tested in the online evaluation?

5 months ago

Hi,

My remaining number of online submissions is low due to debugging. I am wondering, for the final submission, can we choose a solution that has not been tested online yet?

Best
Fanyou

Submission Fail due to Private Test: Evaluation timed out 😒

5 months ago

@aicrowd_team

Could you help check why the following submissions failed:

These submissions made only small changes to previously successful submissions, and all of them were tested successfully on the provided public dataset. In addition, they all passed the validation step but got stuck at the start of the evaluation, so there are no progress bars for the evaluations.

I can now identify which part of my code creates the problem, but I am still not able to reproduce it offline. I wish I could get some error messages from the logs to help me solve the problem.

Best
Fanyou

Phase 1 has released the dataset, so how will a cut-off be applied to limit Phase 2?

7 months ago

Hi organizers,

Apparently there are now (April 30th) two teams in Track 1 that use the public test set [1] to obtain a nearly full score (~0.98). I am wondering, in this scenario, how a cut-off can be applied for Phase 2? Every participant just needs to upload the public test set to obtain a similarly full score. Is there still a potential cut-off?

[1] What does `split` field mean? - #3 by graceyx.yale

Best
Fanyou

Regarding the maximum number of response tokens for Llama 3

7 months ago

@aicrowd_team Yes. I understand that the code already includes this tokenizer. But Llama 3 has a different vocabulary size (128K vs 32K), so in some cases the number of output tokens will be smaller than with Llama 2 even if the output texts are the same. In terms of model performance, Llama 3 is better (according to the report) and I foresee people might use it. So I suggest replacing the current tokenizer used for truncating predictions with Llama 3's.
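
A minimal sketch of what such tokenizer-based truncation could look like (the checkpoints and the 75-token limit below are assumptions for illustration, not the official evaluation code):

```python
# Sketch of tokenizer-based truncation of a prediction. The checkpoints and
# MAX_TOKENS are illustrative assumptions, not the official evaluation code.
from transformers import AutoTokenizer

MAX_TOKENS = 75  # assumed limit; the real value is set by the organizers

def truncate_prediction(text: str, tokenizer) -> str:
    # Encode without special tokens, keep the first MAX_TOKENS tokens, decode back.
    ids = tokenizer.encode(text, add_special_tokens=False)
    return tokenizer.decode(ids[:MAX_TOKENS], skip_special_tokens=True)

# Llama 3's 128K-vocab tokenizer usually yields fewer tokens than Llama 2's
# 32K-vocab tokenizer for the same text, so the truncation point differs.
llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

answer = "some long generated answer ..."
print(truncate_prediction(answer, llama2_tok))
print(truncate_prediction(answer, llama3_tok))
```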

Regarding the maximum number of response tokens for Llama 3

7 months ago

I want to draw the organizers' attention to the fact that Llama 3 has a larger vocabulary size (128K) compared to Llama 2 (32K). So we need to clearly define in the rules which tokenizer is used to truncate the response (previously the code used the Llama 2 tokenizer).

Best
Fanyou

Are we allowed to use Llama 3?

7 months ago

Hi Organizers,

Meta has introduced Llama 3, and it is available on Hugging Face. I am wondering if we can use it for the competition. The Llama 3 8B model might be a good choice.

Best
Fanyou

Can we use other LLMs at the training stage?

8 months ago

Hi Organizers,

I want to understand whether we can use other LLMs (not from the Llama 2 family) during the training stage, specifically for RLHF and data generation.

Below is the original requirement for the models:

This KDD Cup requires participants to use Llama models to build their RAG solution. Specially, participants can use or fine-tune the following 4 Llama 2 models from https://llama.meta.com/llama-downloads:

  • llama-2-7b
  • llama-2-7b-chat
  • llama-2-70b
  • llama-2-70b-chat

Best
Fanyou

Amazon KDD Cup '23: Multilingual Recommendation Challenge

Eligibility for attendance

Over 1 year ago

Hi, Organizer.

I am currently an Amazon employee but does not related to Amazon Search. Each year, I used to attend the KDD Cup to learn and practice and won several top places for KDD CUP before. I am wondering if I am eligible to attend it and eligble for the prize. If I am eligiable to attend but not for the prize, and I am luck to get top places, if it is possible to keep my ranking without granting any cash prize?

The rule is written as:

People who, during the Challenge Period, are directors, officers, employees, interns, and contractors (“Personnel”) of Sponsor, its parents, subsidiaries, affiliates, and their respective advertising, promotion and public relations agencies, representatives, and agents (collectively, “Challenge Entities”), immediate family members of such Personnel (parents, siblings, children, spouses, and life partners of each) and members of the households of such Personnel (whether related or not) are ineligible to win a prize in this Challenge. Sponsor reserves the right to verify eligibility and adjudicate any eligibility dispute at any time.

Best
Fanyou Wu

ESCI Challenge for Improving Product Search

👑 Final Winners Announcement 👑

Over 2 years ago

Hi Mohanty,

Thanks to the whole AIcrowd team and the Amazon Search team for organizing this year's KDD Cup. I have a question about the KDD workshop: do we need to, or could we, submit a paper for it? Since the workshop paper deadline is also Aug 1st, would it be possible to finalize the ranking in advance (e.g., 2-3 days before the paper deadline)? I believe the current ranking will not change anymore.

Best
Fanyou

[ETS-Lab] Our solution

Over 2 years ago

That Task 1 feature will probably work on the private dataset as well, which is why we used it. If you check the product lists in Task 2 and Task 1, you will find a special pattern in the product order: in general, the product list is sorted as training set, private set, and public set. Another reason why this feature works is that the product-to-example ratio is close to 1, which means most products are used only once.

There is another way to construct this leak feature: check whether the query-product pair is in the Task 1 public dataset. That one will definitely fail on the private set, as we cannot access that information.

Note that the evaluation service used V100 GPUs equipped with tensor cores, so converting the models to ONNX FP16 helps a lot with speed. For example, one unoptimized DeBERTaV3-base model takes about 90 minutes to run inference on a single 2080 Ti GPU locally, but only 35-40 minutes online for 2 DeBERTaV3 models (2 folds).
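
As a rough illustration of the kind of ONNX FP16 conversion meant here (the checkpoint name, file paths, and opset below are assumptions, not our exact submission code):

```python
# Rough sketch: export a cross encoder to ONNX, then convert weights to FP16.
# Checkpoint, paths, and opset are illustrative assumptions, not our exact code.
import onnx
import torch
from onnxconverter_common import float16
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/deberta-v3-base"  # assumed base model
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=4).eval()
tokenizer = AutoTokenizer.from_pretrained(name)

# Dummy query-product pair, only used to trace the graph during export.
enc = tokenizer("dummy query", "dummy product title", return_tensors="pt")

torch.onnx.export(
    model,
    (enc["input_ids"], enc["attention_mask"]),
    "cross_encoder_fp32.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)

# Convert the weights to FP16 so the V100 tensor cores are actually used;
# keep_io_types leaves the integer inputs and float outputs unchanged.
fp16_model = float16.convert_float_to_float16(
    onnx.load("cross_encoder_fp32.onnx"), keep_io_types=True
)
onnx.save(fp16_model, "cross_encoder_fp16.onnx")
```

Inference would then run through onnxruntime-gpu with the CUDAExecutionProvider.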

[ETS-Lab] Our solution

Over 2 years ago

Thanks to the AIcrowd team and the Amazon Search team for organizing this extensive competition. The game has finally ended. Our team learned a lot here, and we believe this memorable period will help a lot in our future. Below we briefly introduce our solution.

General solution

  • We trained 3 cross encoder models (DeBERTaV3, COCO-LM, and BigBird) for each language, which differ in the pretrained model, training method (e.g., knowledge distillation), and data splitting. In total, six individual models (2 folds x 3 models) per language are used to produce the initial prediction (4-class probability) of each query-product pair. Using those models only, the public set score for Task 2 is around 0.816.

  • For Task 1, we used the output 4-class probabilities together with some simple features to train a LightGBM model, calculated the expected gain (P_e*1 + P_s*0.1 + P_c*0.01), and sorted the query-product list by this gain (a sketch follows this list). This method is slightly better than using LambdaRank directly in LightGBM.

  • For Task 2 and Task 3, we used LightGBM to fuse those predictions with some important features. The most important features are designed based on the potential data leakage from Task 1 and the behavior of the query-product group:

    • The stats (min, median, and max) of the cross encoder output probability grouped by query_id (0.007+ on the Task 2 public leaderboard)
    • The percentage of product_ids that appear in the Task 1 product list, grouped by query_id (0.006+ on the Task 2 public leaderboard)
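
A small sketch of the Task 1 expected-gain ranking and the query-level stats feature (the DataFrame and column names are made up for illustration; this is not our actual feature pipeline):

```python
# Sketch of the Task 1 expected-gain ranking and the query-level stats feature.
# The DataFrame and column names are made up for illustration only.
import pandas as pd

df = pd.DataFrame({
    "query_id":     [1, 1, 1, 2, 2],
    "product_id":   ["a", "b", "c", "d", "e"],
    "p_exact":      [0.70, 0.10, 0.40, 0.90, 0.20],  # P(exact)
    "p_substitute": [0.20, 0.50, 0.30, 0.05, 0.60],  # P(substitute)
    "p_complement": [0.05, 0.20, 0.10, 0.02, 0.10],  # P(complement)
})

# Expected gain: exact = 1, substitute = 0.1, complement = 0.01 (irrelevant = 0),
# then sort the product list within each query by this gain.
df["gain"] = df["p_exact"] * 1.0 + df["p_substitute"] * 0.1 + df["p_complement"] * 0.01
ranked = df.sort_values(["query_id", "gain"], ascending=[True, False])

# Query-level stats feature: min / median / max of a probability grouped by query_id.
stats = (df.groupby("query_id")["p_exact"]
           .agg(["min", "median", "max"])
           .add_prefix("q_exact_")
           .reset_index())
df = df.merge(stats, on="query_id", how="left")
print(ranked[["query_id", "product_id", "gain"]])
```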

Small modification towards Cross Encoder architecture

  • As the product context has multiple fields (title, brand, and so on), we use neither the CLS token nor mean (max) pooling to get the latent vector of the query-product pair. Instead, we concatenate the hidden states of predefined field tokens (query, title, brand, color, etc.); a sketch follows below. The format is:
    [CLS] [QUERY] <query_content> [SEP] [TITLE] <title_content> [SEP] [BRAND] <brand_content> [SEP] ...
    
    where [TEXT] is a special token and <text_content> is the corresponding text content.
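
A rough sketch of this input format and of pooling the hidden states at the field tokens (the base model and the field set are assumptions for illustration, not our exact training code):

```python
# Sketch of the special-token input format and of concatenating the hidden
# states at those token positions. Base model and field set are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

FIELD_TOKENS = ["[QUERY]", "[TITLE]", "[BRAND]", "[COLOR]"]  # example field set

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
tokenizer.add_special_tokens({"additional_special_tokens": FIELD_TOKENS})

encoder = AutoModel.from_pretrained("microsoft/deberta-v3-base")
encoder.resize_token_embeddings(len(tokenizer))  # make room for the new tokens

def build_input(query, title, brand, color):
    # [CLS] [QUERY] <query> [SEP] [TITLE] <title> [SEP] [BRAND] <brand> [SEP] ...
    text = (f"[QUERY] {query} [SEP] [TITLE] {title} [SEP] "
            f"[BRAND] {brand} [SEP] [COLOR] {color}")
    return tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

enc = build_input("red running shoes", "Road Runner Sneaker", "Acme", "red")
hidden = encoder(**enc).last_hidden_state  # (1, seq_len, hidden_size)

# Take the hidden state at each field-token position and concatenate them;
# this vector (instead of CLS or mean pooling) feeds the classification head.
field_ids = tokenizer.convert_tokens_to_ids(FIELD_TOKENS)
positions = [(enc["input_ids"][0] == i).nonzero(as_tuple=True)[0][0] for i in field_ids]
pair_vector = torch.cat([hidden[0, p] for p in positions], dim=-1)
print(pair_vector.shape)  # 4 * hidden_size
```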

Code submission speed up

  1. Pre-process the product tokens and save them as an HDF5 file.
  2. Convert all models to ONNX with FP16 precision.
  3. Pre-sort the product IDs by sequence length to reduce the side effect of zero padding within a batch (see the sketch below).
  4. Use a relatively small mini-batch size at inference time (batch size = 4).
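
For item 3, a small sketch of how length-based pre-sorting reduces padding (the toy data and batch size are just for illustration):

```python
# Sketch of the length-sorting trick from item 3: sorting examples by tokenized
# length keeps zero padding within each mini-batch small. Toy data only.
import numpy as np

def batches_by_length(token_id_lists, batch_size=4):
    # Sort example indices by sequence length, then cut into mini-batches.
    order = np.argsort([len(ids) for ids in token_id_lists])
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = [token_id_lists[i] for i in idx]
        max_len = max(len(ids) for ids in batch)
        padded = np.zeros((len(batch), max_len), dtype=np.int64)  # pad per batch
        for row, ids in enumerate(batch):
            padded[row, :len(ids)] = ids
        yield idx, padded

# Sequences of similar length land in the same batch, so almost no padding.
toks = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14]]
for idx, padded in batches_by_length(toks, batch_size=2):
    print(idx.tolist(), padded.shape)
```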

You can find our training code here and code submission here.

Advertisement

Currently, I am seeking a machine learning engineer or research scientist job in the US. Collaborating with my friend Yang @Yang_Liu_CTH, I have won several championships and runner-up places in many competitions, including the championship of the KDD Cup 2020 reinforcement learning track. You can email me directly or go to my personal website for more details.

Best
Dr. Wu, Fanyou
Postdoc @ Purdue University

📆 Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins

Over 2 years ago

Endless deadline. Let me call it aliveline. :face_vomiting:

Calling on the organizer team to ban using external data in online code submission

Over 2 years ago

There is another way to ensure fairness: require all teams to publish their external data.

Calling on the organizer team to ban using external data in online code submission

Over 2 years ago

If a manual code review could be done, then I would support banning external data, as our team might benefit from it. But my point of view is still based on the rules themselves: it is really not a wise idea to change anything at this stage.

Calling on the organizer team to ban using external data in online code submission

Over 2 years ago

Although our team does not use any external data, we do not support changing the rules any more. Continually changing the rules makes this competition seem like a joke and makes all of us tired!

Please do not change any rule again; I believe the host promised this before in the deadline extension post. @mohanty

Note that, unlike Task 1 which @TransiEnt focuses on, Task 2 required a lot of effort to make the code more efficient, so we applied pre-processing to tokenize all products. Besides, the product id itself is also a feature, and we feed it into the transformers. If this product id feature is disallowed, all of my models would need to be retrained, and I do not have enough computing resources. So it is impossible to ban the product id here.

The only way to ban external data is to inspect the code afterwards, which would be extremely hard for the host to do.

Best
Fanyou

📆 Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins

Over 2 years ago

I created an unofficial poll about the extension of the deadline and timeouts. Please share your opinion there. I hope the organizers can hear something from the poll.
