Activity
Ratings Progression
Challenge Categories
Challenges Entered
Improve RAG with Real-World Benchmarks
Latest submissions
failed | 267153
graded | 267152
graded | 266979
What data should you label to get the most value for your money?
Latest submissions
graded | 173244
3D Seismic Image Interpretation by Machine Learning
Latest submissions
Play in a realistic insurance market, compete for profit!
Latest submissions
Multi Agent Reinforcement Learning on Trains.
Latest submissions
Evaluating RAG Systems With Mock KGs and APIs
Latest submissions
graded | 263889
graded | 263880
failed | 263878
Enhance RAG systems With Multiple Web Sources & Mock API
Latest submissions
failed | 267153
graded | 267152
failed | 265531
Meta Comprehensive RAG Benchmark: KDD Cup 2024
Has the Winner Notification already been sent?
4 months ago
We heard from the organizers by email that some of the human annotations are still in progress.
Copied from email:
We are still in the middle of annotations for other challenge tasks and will announce winners by email once the annotations are ready. The official winner announcement for the CRAG challenge will be made in early August.
‼️ ⏰ Select Submission ID before 20th June, 2024 23:55 UTC
5 months ago
Can we confirm our final submissions in the Google form after June 20th (e.g., on June 21st, once the online evaluation has finished)? Some people want to select their final solution based on the Round 2 score. Besides, the evaluation system is currently stuck because so many people are submitting at once.
‼️ ⏰ Select Submission ID before 20th June, 2024 23:55 UTC
5 months ago
Can we submit the same submission for all 3 tasks? The aicrowd.json might be the same, but the code can handle all three task settings.
Can we submit a solution that has not been tested online?
5 months agoHi,
My online submission quota is low due to debugging. I am wondering whether, for the final submission, we can choose solutions that have not been tested online yet.
Best
Fanyou
Submission failed due to Private Test: Evaluation timed out
5 months ago
Could you help check the reason for the failure of these submissions?
These submissions made small changes to previously successful submissions, and all were tested successfully on the provided public dataset. Besides, all of them passed the validation step but got stuck at the start of evaluation, so there are no progress bars for the evaluations.
Now I can identify which part of my code causes the problem, but I am still not able to reproduce it offline. I wish I could get some error messages from the log to help me solve the problem.
Best
Fanyou
Phase 1 has released the dataset; how will a cut-off be applied to limit Phase 2?
7 months ago
Hi organizers,
There are apparently two teams in Track 1 now (April 30th) using the public test set [1] to obtain a nearly perfect score (~0.98). I am wondering, in this scenario, how a cut-off can be applied in Phase 2? Every participant just needs to upload the public test set to obtain a similar near-perfect score. Is there still a potential cut-off?
[1] What does `split` field mean? - #3 by graceyx.yale
Best
Fanyou
Regarding the maximum number of response tokens for Llama 3
7 months ago
@aicrowd_team Yes, I understand that the code already includes this tokenizer. But Llama 3 has a different vocabulary size (128K vs 32K): in some cases its output token count will be smaller than Llama 2's even when the output text is the same. In terms of model performance, Llama 3 is better (according to the report) and I foresee people using it. So I suggest replacing the current tokenizer used for truncating predictions with Llama 3's.
Regarding the maximum number of response tokens for Llama 3
7 months ago
I want to bring to the organizers' attention that Llama 3 has a larger vocabulary size (128K) compared to Llama 2 (32K). So the rules need to clearly define which tokenizer is used to truncate the response (previously the code used the Llama 2 tokenizer).
Best
Fanyou
Are we allowed to use Llama 3?
7 months ago
Hi Organizers,
Meta has introduced Llama 3 and it is available on Hugging Face. I am wondering if we can use it for the competition. The Llama 3 8B model might be a good choice.
Best
Fanyou
Can we use other LLMs at the training stage?
8 months ago
Hi Organizers,
I want to understand whether we can use other LLMs (not the Llama 2 family) during the training stage, specifically for RLHF and data generation.
Below is the raw request for model:
This KDD Cup requires participants to use Llama models to build their RAG solution. Specifically, participants can use or fine-tune the following 4 Llama 2 models from https://llama.meta.com/llama-downloads:
- llama-2-7b
- llama-2-7b-chat
- llama-2-70b
- llama-2-70b-chat
Best
Fanyou
Amazon KDD Cup '23: Multilingual Recommendation Challenge
Eligibility for attendance
Over 1 year ago
Hi, Organizer.
I am currently an Amazon employee, but not related to Amazon Search. Each year I attend the KDD Cup to learn and practice, and I have won several top places in previous KDD Cups. I am wondering if I am eligible to participate and eligible for the prize. If I am eligible to participate but not for the prize, and I am lucky enough to reach a top place, would it be possible to keep my ranking without receiving any cash prize?
The rule is written as:
People who, during the Challenge Period, are directors, officers, employees, interns, and contractors ("Personnel") of Sponsor, its parents, subsidiaries, affiliates, and their respective advertising, promotion and public relations agencies, representatives, and agents (collectively, "Challenge Entities"), immediate family members of such Personnel (parents, siblings, children, spouses, and life partners of each) and members of the households of such Personnel (whether related or not) are ineligible to win a prize in this Challenge. Sponsor reserves the right to verify eligibility and adjudicate any eligibility dispute at any time.
Best
Fanyou Wu
ESCI Challenge for Improving Product Search
Final Winners Announcement
Over 2 years ago
Hi Mohanty,
Thanks to the whole AIcrowd team and the Amazon Search team for organizing this year's KDD Cup. I have a question about the KDD workshop: do we need to, or could we, submit a paper for the workshop? The workshop paper deadline is also Aug 1st. Would it be possible to finalize the ranking in advance (e.g., 2-3 days before the paper deadline)? I believe the current ranking will not change anymore.
Best
Fanyou
[ETS-Lab] Our solution
Over 2 years ago
That Task 1 feature will probably work on the private dataset as well; that's why we used it. If you check the product lists in Task 2 and Task 1, you will find a special pattern in the product order: in general, the product list is sorted as training set, private set, and public set. Another reason this feature works is that the product-to-example ratio is close to 1, which means most products are used only once.
There is another way to construct this leak feature: check whether the query-product pair is in the Task 1 public dataset. That one will definitely fail on the private set, as we cannot access that information.
Note that the evaluation service uses V100 GPUs equipped with tensor cores. Converting models to ONNX FP16 helps a lot with speed. For example, one unoptimized DeBERTaV3-base model takes about 90 minutes for inference on a single 2080 Ti GPU locally, but online only 35-40 minutes for two DeBERTaV3 models (2 folds).
[ETS-Lab] Our solution
Over 2 years ago
Thanks to the AIcrowd team and the Amazon Search team for organizing this extensive competition. The game has finally ended. Our team learned a lot here, and we believe this memorable period will help us a lot in the future. Below we briefly introduce our solution.
General solution
- We trained 3 cross-encoder models (DeBERTaV3, CocoLM, and BigBird) for each language, which differ in the pretrained models, training method (e.g., knowledge distillation), and data splitting. In total, six models (2 folds x 3 models) per language produce the initial prediction (4-class probability) for each query-product pair. Using those models alone, the public set score for Task 2 is around 0.816.
- For Task 1, we used the output 4-class probabilities with some simple features to train a LightGBM model, calculated the expected gain (P_e*1 + P_s*0.1 + P_c*0.01), and sorted the query-product list by this gain. This method is slightly better than using LambdaRank directly in LightGBM.
- For Task 2 and Task 3, we used LightGBM to fuse those predictions with some important features. The most important features are based on the potential data leakage from Task 1 and the behavior of the query-product group:
  - The stats (min, median, and max) of the cross-encoder output probabilities grouped by query_id (0.007+ on the Task 2 public leaderboard)
  - The percentage of product_id in the Task 1 product list grouped by query_id (0.006+ on the Task 2 public leaderboard)
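The expected-gain ranking and the per-query group stats above can be sketched as follows. This is a minimal illustration: the gain weights (1, 0.1, 0.01 for exact/substitute/complement) come from the post, while the product IDs and probabilities are made up.

```python
from statistics import median

# Hypothetical 4-class probabilities (exact, substitute, complement; the
# irrelevant probability is implied) for products retrieved for one query.
predictions = {
    "B001": {"e": 0.70, "s": 0.20, "c": 0.05},
    "B002": {"e": 0.10, "s": 0.60, "c": 0.20},
    "B003": {"e": 0.40, "s": 0.30, "c": 0.10},
}

def expected_gain(p):
    # Expected gain as described in the post: P_e*1 + P_s*0.1 + P_c*0.01
    return p["e"] * 1.0 + p["s"] * 0.1 + p["c"] * 0.01

# Task 1: sort the query's product list by descending expected gain.
ranking = sorted(predictions, key=lambda pid: expected_gain(predictions[pid]),
                 reverse=True)
print(ranking)  # ['B001', 'B003', 'B002']

# Task 2/3-style group features: stats of the "exact" probability per query.
exact_probs = [p["e"] for p in predictions.values()]
group_stats = {"min": min(exact_probs),
               "median": median(exact_probs),
               "max": max(exact_probs)}
print(group_stats)
```

In practice these group stats would be computed per query_id over the full prediction table and fed to LightGBM as extra columns.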
Small modification to the cross-encoder architecture
- As the product context has multiple fields (title, brand, and so on), we use neither the CLS token nor mean (max) pooling to get the latent vector of the query-product pair. Instead, we concatenate the hidden states of predefined tokens (query, title, brand, color, etc.). The format is:
[CLS] [QUERY] <query_content> [SEP] [TITLE] <title_content> [SEP] [BRAND] <brand_content> [SEP] ...
where [TEXT] is the special token and <text_content> is the text content.
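A sketch of how such an input string might be assembled. The field names and token spellings below are assumptions for illustration; in a real pipeline the bracketed markers would also be registered as additional special tokens in the tokenizer so they map to single token IDs.

```python
# Build a cross-encoder input with one marker token per field, following
# the format described above. Field names are illustrative only.
FIELD_TOKENS = ["[QUERY]", "[TITLE]", "[BRAND]", "[COLOR]"]

def build_input(query, product):
    # product: dict whose keys match the lower-cased field token names.
    parts = ["[CLS]", "[QUERY]", query]
    for token in FIELD_TOKENS[1:]:
        field = token.strip("[]").lower()
        parts += ["[SEP]", token, product.get(field, "")]
    return " ".join(parts)

example = build_input(
    "wireless mouse",
    {"title": "Ergo Mouse 2000", "brand": "Acme", "color": "black"},
)
print(example)
# [CLS] [QUERY] wireless mouse [SEP] [TITLE] Ergo Mouse 2000 [SEP] [BRAND] Acme [SEP] [COLOR] black
```

During the forward pass, the hidden states at the positions of these marker tokens would be concatenated to form the pair representation, instead of taking the [CLS] vector or mean pooling over all tokens.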
Code submission speed-up
- Pre-process the product tokens and save them as an HDF5 file.
- Convert all models to ONNX with FP16 precision.
- Pre-sort the product IDs to reduce the side impact of zero-padding within batches.
- Use a relatively small mini-batch size at inference (batch size = 4).
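The pre-sorting trick in the list above can be illustrated with a small sketch. The token lengths and batch size are made up; the point is that length-sorted batches waste far fewer padded slots than the original order, since each batch is padded to its longest sequence.

```python
# Hypothetical token lengths for six products, in their original order.
lengths = {"B1": 120, "B2": 12, "B3": 96, "B4": 15, "B5": 110, "B6": 20}

def padding_waste(order, batch_size=2):
    # Each batch is padded to its longest sequence; count the wasted slots.
    waste = 0
    for i in range(0, len(order), batch_size):
        batch = [lengths[pid] for pid in order[i:i + batch_size]]
        waste += sum(max(batch) - n for n in batch)
    return waste

original = list(lengths)                      # submission order
presorted = sorted(lengths, key=lengths.get)  # sorted by token length
print(padding_waste(original), padding_waste(presorted))  # 279 89
```

After inference, the predictions just need to be scattered back to the original order, so the reordering has no effect on the submitted results.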
You can find our training code here and code submission here.
Advertisement
Currently, I am seeking a machine learning engineer or research scientist position in the US. In collaboration with my friend Yang @Yang_Liu_CTH, I have won several championships and runner-up places in many competitions, including the championship of the KDD Cup 2020 reinforcement learning track. You can email me directly or visit my personal website for more details.
Best
Dr. Wu, Fanyou
Postdoc @ Purdue University
Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins
Over 2 years ago
Endless deadline. Let me call it an alive-line.
Calling on the organizer team to ban using external data in online code submission
Over 2 years ago
There is another way to ensure fairness: require all teams to publish their external data.
Calling on the organizer team to ban using external data in online code submission
Over 2 years ago
If manual code review could be done, then I would support banning external data, as our team might benefit from that. But my point of view is still based on the rules themselves: it is really not a wise idea to change anything at this stage.
Calling on the organizer team to ban using external data in online code submission
Over 2 years ago
Although our team does not use any external data, we do not support changing the rules any more. Constantly changing the rules makes this competition look like a joke and makes all of us tired!
Please do not change any rule again; I believe the host promised as much in the deadline extension post. @mohanty
Note that, unlike Task 1, which @TransiEnt focuses on, Task 2 required a lot of effort to make the code more efficient. So we applied pre-processing to tokenize all products. Besides, the product ID itself is also a feature, and we feed it to the transformers. If the product IDs are disputed, all of my models would need to be retrained, and I do not have enough computing resources. So it is impossible to ban a product ID here.
The only way to ban external data is to inspect code afterwards, which would be extremely hard for the host to do.
Best
Fanyou
Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins
Over 2 years ago
I created an unofficial vote on the extension of the deadline and timeouts. Please share your opinion there. I hope the organizers can hear something from the poll.
Final Evaluation Process & Team Scores
4 months ago
Can we obtain the full rankings for the 3 main tasks? At least I want to understand how far I am from the top teams.