tfriedel

Location

Berlin, DE


Challenges Entered

Multi-Agent Dynamics & Mixed-Motive Cooperation

Latest submissions

No submissions made in this challenge.

Small Object Detection and Classification

Latest submissions

graded 241075
graded 241074
graded 241073

A benchmark for image-based food recognition

Latest submissions

No submissions made in this challenge.

What data should you label to get the most value for your money?

Latest submissions

graded 179179
failed 179174
graded 179153

Image-based plant identification at global scale

Latest submissions

No submissions made in this challenge.
Participant Rating
cadabullos 0
Sudhakar37 0
tfriedel has not joined any teams yet...

AIcrowd

Scoring Announcement: Public vs. Private

About 1 year ago

Two questions:

  1. The designation of the top three submissions is per team and not per participant, right? Otherwise a team with multiple members would have a big advantage: scores fluctuate widely, so a best of 9 would be much more likely to win than a best of 3.

  2. The private scores were already calculated at the time of submission, right? Then we won’t have to worry that a solution might fail on the private test set by going over the 2 sec per-image limit.
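To illustrate the concern in point 1, here is a minimal Python simulation. It assumes every submission’s score is the same true skill plus independent Gaussian noise (the noise level of 0.02 is an arbitrary assumption) and compares two equally skilled teams, one keeping the best of 9 submissions and the other the best of 3:

```python
import random


def p_best_of_n_beats_best_of_m(n: int, m: int, trials: int = 20_000,
                                noise: float = 0.02, seed: int = 0) -> float:
    """Estimate P(best of n noisy submissions > best of m) for two equally skilled teams."""
    rng = random.Random(seed)
    true_score = 0.5  # assumption: both teams have exactly the same true skill
    wins = 0
    for _ in range(trials):
        best_a = max(true_score + rng.gauss(0.0, noise) for _ in range(n))
        best_b = max(true_score + rng.gauss(0.0, noise) for _ in range(m))
        wins += best_a > best_b
    return wins / trials


print(p_best_of_n_beats_best_of_m(9, 3))  # close to 9 / (9 + 3) = 0.75
```

For independent, identically distributed scores the result does not depend on the noise level: the best of all 12 submissions is equally likely to be any one of them, so it lands in the larger team’s set with probability 9/12 = 0.75.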

Scoring Announcement: Public vs. Private

Over 1 year ago

Thanks for the clarification! I think this is quite fair and reasonable.

MosquitoAlert Challenge 2023

External datasets used by participants

Over 1 year ago

@MPWARE
iNaturalist is a separate app from Mosquito Alert, so any image taken directly with that app will be different from images taken with the Mosquito Alert app. That said, it’s also possible to upload locally stored files, probably in both apps. It would be difficult to rule out images that users have uploaded to both apps this way, especially if you don’t have the Mosquito Alert image dataset. I guess if this happens rarely, it will not be a big deal.

External datasets used by participants

Over 1 year ago

@MPWARE Really? That’s impressive if you got such a high score with only the provided data! You’ve got to tell us how you achieved that after the competition!

External datasets used by participants

Over 1 year ago

As required by the competition rules, I’m sharing here the external data I used for my competition entries. Other participants may want to share their data in this thread as well.

I have used the following external datasets:

iNaturalist 2021 dataset:

A custom subset of iNaturalist images, covering many mosquito species but also other species, downloaded from inaturalist-open-data:

A CSV file containing path, URL and species name:

For downloading, use a download manager such as aria2. Careful: a lot of disk space is required (440 GB), which is why I’m sharing the links rather than re-uploading the data.
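The CSV can be turned into an aria2 input file with a few lines of Python. This is a minimal sketch, assuming the CSV has a header row with columns named `path`, `url` and `species` (the real column names are not given here) and that `path` is a relative file path such as `species_dir/file.jpg`. In an aria2 input file, each URL line can be followed by indented per-download options such as `dir=` and `out=`:

```python
import csv
import io


def csv_to_aria2_input(csv_text: str, url_col: str = "url", path_col: str = "path") -> str:
    """Turn a CSV of (path, url, ...) rows into the text of an aria2 input file."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        directory, _, filename = row[path_col].rpartition("/")
        lines.append(row[url_col])              # URI line
        if directory:
            lines.append(f"  dir={directory}")  # indented lines are options for that URI
        lines.append(f"  out={filename}")
    return "\n".join(lines) + "\n"


# Hypothetical example row; the real CSV's columns and URLs may differ.
sample = (
    "path,url,species\n"
    "Aedes_albopictus/1.jpg,https://example.org/photos/1.jpg,Aedes albopictus\n"
)
print(csv_to_aria2_input(sample), end="")
```

Saved as e.g. `downloads.txt`, the output can be passed to `aria2c -i downloads.txt -j 16`, where `-j` sets the number of parallel downloads.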

The license is image-specific, but it is generally either public domain or some form of Creative Commons. Most images are licensed CC BY-NC or CC BY-SA. I’m not a lawyer, but I assume using them for non-commercial machine learning models is fair use.

Justification for selection:

To address the small number of samples in some minority classes and to learn more discriminative features for insect classification.

🚨 Important Updates for Round 2

Over 1 year ago

Can you explain how the private leaderboard is calculated, please?
Here’s how I understand it:
all submissions are run on the private test set and ranked by score.
This means one participant may have submission x ranked high on the public leaderboard and submission y ranked high on the private leaderboard.
Alternatively, it could be implemented so that whichever solution ranks high on the public leaderboard is then evaluated on the private test set, and all other submissions are ignored.
Which one is it?
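The two interpretations can produce different winners. A minimal Python sketch with made-up scores for two hypothetical participants, `alice` and `bob`; `k` is an assumed parameter for how many of a participant’s top public submissions are carried over to the private leaderboard:

```python
from dataclasses import dataclass


@dataclass
class Submission:
    participant: str
    public: float   # public leaderboard score
    private: float  # private test set score


def best_private_over_all(subs: list[Submission]) -> list[tuple[str, float]]:
    """Interpretation 1: every submission is scored privately; the best one counts."""
    best: dict[str, float] = {}
    for s in subs:
        best[s.participant] = max(best.get(s.participant, float("-inf")), s.private)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


def private_of_public_top_k(subs: list[Submission], k: int = 1) -> list[tuple[str, float]]:
    """Interpretation 2: only each participant's top-k public submissions are scored privately."""
    grouped: dict[str, list[Submission]] = {}
    for s in subs:
        grouped.setdefault(s.participant, []).append(s)
    result = {
        p: max(s.private for s in sorted(ss, key=lambda sub: sub.public, reverse=True)[:k])
        for p, ss in grouped.items()
    }
    return sorted(result.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores: alice's public-best submission x does worse privately than her submission y.
subs = [
    Submission("alice", public=0.80, private=0.70),  # submission x
    Submission("alice", public=0.75, private=0.78),  # submission y
    Submission("bob", public=0.78, private=0.74),
]
print(best_private_over_all(subs)[0])    # ('alice', 0.78)
print(private_of_public_top_k(subs)[0])  # ('bob', 0.74)
```

With these numbers, alice wins under the first interpretation but bob wins under the second, because alice’s public-best submission does worse on the private set.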

Submissions are quite unstable

Over 1 year ago

I think it’s pretty tricky to get this right. Besides the indeterminism and caching issues that naturally occur, on cloud instances you additionally have to face things like noisy neighbors or “steal time”.
See: Understanding CPU Steal Time - when should you be worried? | Scout APM Blog
While you say the container gets the full node, it’s not quite clear whether that means it gets the full bare-metal server. You are probably using EC2 instances with 2 cores, which are VMs on a bigger machine, so you have to deal with the problems mentioned above.
Increasing the limit to 2 sec doesn’t solve the problem, as people may just deploy bigger models and then run over the limit again. IMO only averaging can prevent the issue.
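Steal time can be checked from inside a Linux VM. A minimal Python sketch, assuming a Linux guest whose `/proc/stat` aggregate `cpu` line has the standard field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice): sample the line twice and divide the growth in `steal` by the growth in total CPU time. The two sample lines below are made up:

```python
def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line (the first line) of /proc/stat; Linux only."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line: str) -> list[int]:
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(v) for v in fields[1:]]


def steal_fraction(before: str, after: str) -> float:
    """Fraction of CPU time taken by the hypervisor (steal) between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # Fields 0-7: user, nice, system, idle, iowait, irq, softirq, steal.
    # guest/guest_nice (8, 9) are already counted in user/nice, so they are skipped.
    delta_total = sum(a[:8]) - sum(b[:8])
    delta_steal = a[7] - b[7]
    return delta_steal / delta_total if delta_total else 0.0


before = "cpu  100 0 50 800 10 0 0 40 0 0"
after = "cpu  200 0 100 1500 20 0 0 180 0 0"
print(f"{steal_fraction(before, after):.1%} of CPU time stolen")  # 14.0% of CPU time stolen
```

On a real VM, call `read_cpu_line()` before and after a benchmark and pass both lines to `steal_fraction`; a clearly non-zero result means the hypervisor was giving CPU time to other tenants during the run.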

Submissions are quite unstable

Over 1 year ago

Not sure how runtime is measured, but if the rule is like “NO image is allowed to take longer than 1 sec”, it could be relaxed to “ON AVERAGE, images are not allowed to take longer than 1 sec”.
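The difference between the two rules, as a minimal Python sketch with made-up per-image timings in which a single image is slowed down, e.g. by a noisy neighbor:

```python
from statistics import mean


def passes_per_image_limit(times_s: list[float], limit_s: float = 1.0) -> bool:
    """Strict rule: NO image may take longer than the limit."""
    return max(times_s) <= limit_s


def passes_average_limit(times_s: list[float], limit_s: float = 1.0) -> bool:
    """Relaxed rule: images must stay under the limit ON AVERAGE."""
    return mean(times_s) <= limit_s


timings = [0.6, 0.7, 1.3, 0.65]  # seconds per image; 1.3 s is a one-off spike
print(passes_per_image_limit(timings))  # False
print(passes_average_limit(timings))    # True (mean is about 0.81 s)
```

A single spike fails the strict rule, while the average absorbs it; the more images are timed, the less a transient slowdown moves the mean.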

About submissions

Over 1 year ago

@harshitsheoran For stats on the current submission, you should be able to click the “View” button next to the submission trend on the leaderboard. You’re right about past submissions, though: those are missing. In another competition it was possible to see them.

📒 Announcement: Important Updates to Challenge Rules!

Over 1 year ago

The updated rules just say no public MosquitoAlert data may be used. This implies other data may still be used.

About submissions

Over 1 year ago

Yes, there’s a bug here. I can’t see the tab and the page I land on after submitting something is empty:

Data Purchasing Challenge 2022

[Announcement] Leaderboard Winners

Over 2 years ago

Congrats to the winners! Quite a shake-up in the final leaderboard. I’m curious about your solutions; it would be cool if you’d explain them.

:aicrowd: [Update] Round 2 of Data Purchasing Challenge is now live!

Almost 3 years ago

I agree. There’s now an incentive not to buy the most useful images, but rather images that can be learned quickly and improve a model within the first few epochs. This would probably rule out “difficult” images, and it’s quite likely of little practical relevance. While that’s OK for the competition’s sake, it would still be good if the results here had some practical relevance.
While I’d appreciate a more realistic training pipeline, I hope this won’t be a change implemented a week before the deadline that forces us to make big changes.

Which submission is used for private LB scoring?

Almost 3 years ago

Good question!
If it were 1., there would be an incentive to run many variations covering many potential distributions in the hope that one fits best. So this seems bad.
2. seems plausible, but it carries the danger that this submission is overfitted to the public leaderboard. It would also discourage trying out many submissions.
3. seems best, but there is currently no feature that lets you specify this.

Why there is no GaussianBlur in test transform?

Almost 3 years ago

Does GaussianBlur even make sense with the small particles? Someone should look at what an image looks like with it applied. Let’s assume it totally washes out the small particles while the image is still recognizable, just different-looking: since the blur is applied in training but not in the test transform, this mismatch could explain the worse scores in eval. Or maybe it only affects the speed of convergence.
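A quick way to see what a blur does to tiny objects, as a minimal NumPy sketch: a separable Gaussian blur (implemented by hand here, not torchvision’s `GaussianBlur`) applied to a synthetic image containing one single-pixel “particle” and one large square. The sigma of 2 pixels is an arbitrary assumption:

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalised 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur: filter every row, then every column (zero padding)."""
    k = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


img = np.zeros((64, 64))
img[50, 50] = 1.0        # a single-pixel "particle"
img[10:30, 10:30] = 1.0  # a large 20x20 object

blurred = gaussian_blur(img, sigma=2.0)
print(f"particle peak:       {blurred[50, 50]:.3f}")  # 0.040
print(f"large object centre: {blurred[20, 20]:.3f}")  # 1.000
```

The particle’s peak drops to roughly 4% of its original intensity while the centre of the large square is unchanged, so small particles are affected far more than large structures.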

📹 Town Hall Recording & Resources from top participants

Almost 3 years ago

I tried this method in round 1 (locally) and it worked pretty well:

It’s presented as an active learning method, but it really selects labels in one go. However, it’s essential that it uses a model trained in an unsupervised (self-supervised) fashion, like Facebook’s DINO. I tried the vision transformer that comes with torchvision and an EfficientNet fine-tuned on the given data; neither worked. Since DINO is not among the supported pretrained weights, it’s not an option in this competition.
I also think that, while it may work, it’s likely not the best-performing method.
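For illustration only, one-shot selection from a fixed embedding can be sketched with greedy k-center (coreset-style) selection. This is a generic stand-in, not the linked method, and the synthetic 2-D “embeddings” below stand in for features from a self-supervised model such as DINO. Each step picks the point farthest from everything chosen so far, so the whole budget is spread over the feature space in one pass:

```python
import numpy as np


def k_center_greedy(embeddings: np.ndarray, budget: int, first: int = 0) -> list[int]:
    """Greedy k-center selection: repeatedly pick the point farthest from all picks so far."""
    chosen = [first]
    # distance from every point to its nearest chosen point
    min_dist = np.linalg.norm(embeddings - embeddings[first], axis=1)
    while len(chosen) < budget:
        idx = int(min_dist.argmax())
        chosen.append(idx)
        min_dist = np.minimum(min_dist, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    return chosen


# Synthetic stand-in for self-supervised features: three well-separated clusters.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
cluster = np.repeat(np.arange(3), 20)
embeddings = centres[cluster] + rng.normal(scale=0.1, size=(60, 2))

picked = k_center_greedy(embeddings, budget=3)
print(sorted(cluster[picked].tolist()))  # [0, 1, 2]: one pick per cluster
```

Such a selection is only as good as the embedding: if the features do not separate the data well, the picks are not much better than random, which would be consistent with the torchvision ViT and the fine-tuned EfficientNet not working.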

📹 Town Hall Recording & Resources from top participants

Almost 3 years ago

Thanks for putting this online! It totally didn’t occur to me that the labels could be noisy. When looking at some images, I did wonder where, for example, some dents were supposed to be, but because the data was generated synthetically, I just assumed the labels would be 100% correct. Definitely going to take this into account now.

:aicrowd: [Update] Round 2 of Data Purchasing Challenge is now live!

Almost 3 years ago

In the first round I hit a wall with efnet b1 but not with efnet b4, i.e. using active learning I got an improvement with b4 but not with b1. This is not a conclusive argument, but it is some evidence. However, with frozen layers and only 10 epochs at a fixed learning rate, it’s a different situation.

A big issue I see is that the variance of the final scores seems too high and too dependent on random seeds.
For example, with a modified starter kit (batch size = 64, aggregated_dataset used), a purchase budget of 500 that always buys the first 500 images, and different seeds, I measured these F1 scores:
[0.23507449716686799, 0.17841491812405716, 0.19040294167615202, 0.17191250777735645, 0.16459303242037562]
mean: 0.188
std: 0.025

In the first round, the improvements I observed with active learning were between 0.7% and 1.5%. If results now fluctuate by up to 7% just because of the random seed, that is pretty bad. I think the winner should not be decided by luck or by their skill at fighting random number generators.
You do perform multiple runs, but I guess even then it’s still not great. It would be better to bring the variance of individual runs down as much as possible.
I guess some experiments should be run to see what improves this. Training for longer, averaging more runs, using weight averaging, not freezing layers, using efnet b1 or b0, and trying different learning rate schedules or dropout would all be worth experimenting with.
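The statistics of the five F1 scores can be reproduced, and the benefit of averaging more runs estimated, with a short Python sketch. It uses the population standard deviation (which matches the 0.025 quoted) and assumes that runs with different seeds are independent, so the standard deviation of an n-run average shrinks as std / sqrt(n):

```python
import math
import statistics

f1_scores = [0.23507449716686799, 0.17841491812405716, 0.19040294167615202,
             0.17191250777735645, 0.16459303242037562]

mean = statistics.mean(f1_scores)
std = statistics.pstdev(f1_scores)  # population std, as quoted in the post
spread = max(f1_scores) - min(f1_scores)
print(f"mean={mean:.3f} std={std:.3f} max-min={spread:.3f}")  # mean=0.188 std=0.025 max-min=0.070

# Expected std of the average of n independent runs (assumption: runs are independent).
for n_runs in (1, 3, 5, 10):
    print(n_runs, round(std / math.sqrt(n_runs), 4))
```

Even an average over 5 runs keeps a standard deviation of about 0.011, which is comparable to the 0.7% to 1.5% active learning improvements mentioned above.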

Here’s a paper I just googled (haven’t read it yet) about this issue:
Accounting for Variance in Machine Learning Benchmarks

And another one:
Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision

:aicrowd: [Update] Round 2 of Data Purchasing Challenge is now live!

Almost 3 years ago

Bug in local_evaluation.py, in the post-purchase training phase:

trainer.train(
    training_dataset, num_epochs=10, validation_percentage=0.1, batch_size=5
)

It should be aggregated_dataset instead; otherwise none of the purchased labels have any effect! This bug may also be present in your server-side evaluation scripts.

Another thing:
run.purchase_phase returns a dict. Should it be a dict? Also, is it allowed to fill in labels for indices you didn’t purchase, for example with pseudo-labeling?

In instantiate_purchased_dataset the type hint says it’s supposed to be a set, which is inconsistent and also wouldn’t work. In theory it would even be possible to return some other type from purchase_phase that has the dict interface, i.e. supports .keys(), but allows repeated keys. This hack would let you enlarge the dataset to as many images as you want, which is surely an unwanted exploit. I suggest you convert whatever purchase_phase returns to a dict and, depending on whether pseudo-labeling is allowed, validate it further.
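A minimal Python sketch of that suggestion; the helper name `sanitize_purchased_labels`, the `allow_pseudo_labels` flag and the 4-class label vectors are hypothetical, not part of the starter kit. Copying the result into a plain `dict` collapses repeated keys, and the optional check rejects labels for indices that were never purchased:

```python
def sanitize_purchased_labels(returned, purchased_indices, allow_pseudo_labels=False):
    """Copy purchase_phase's result into a plain dict and validate it (hypothetical helper)."""
    # dict() reads any mapping-like object via .keys() and [key]; duplicate keys collapse.
    labels = dict(returned)
    if not allow_pseudo_labels:
        unpaid = set(labels) - set(purchased_indices)
        if unpaid:
            raise ValueError(f"labels returned for {len(unpaid)} unpurchased index/indices")
    return labels


class RepeatingKeys:
    """Hypothetical exploit: dict-like object whose keys() repeats every index."""

    def __init__(self, data, repeats):
        self._data, self._repeats = data, repeats

    def keys(self):
        return [k for k in self._data for _ in range(self._repeats)]

    def __getitem__(self, key):
        return self._data[key]


# Label vectors are assumed to be multi-label 0/1 lists for illustration.
exploit = RepeatingKeys({0: [1, 0, 0, 1], 1: [0, 1, 0, 0]}, repeats=1000)
print(len(exploit.keys()))                              # 2000 entries as returned
print(len(sanitize_purchased_labels(exploit, {0, 1})))  # 2 after converting to a dict
```

Converting first and validating second keeps the check independent of whatever type a participant returns.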

It would be good to test whether your training pipeline can actually achieve good scores under ideal conditions (say, when buying all labels).

:aicrowd: [Update] Round 2 of Data Purchasing Challenge is now live!

Almost 3 years ago

I also noticed that the feature layers are frozen while training the efnet4 model. Is that intentional? It seems like this will guarantee low scores.

Machine Learning engineer @ Plantix