Challenges Entered
Using AI For Building's Energy Management
What data should you label to get the most value for your money?
5 Puzzles, 21 Days. Can you solve them all?
The first, open autonomous racing challenge.
Data Purchasing Challenge 2022
Has anyone actually got the prize payment?
Over 1 year ago
Still no response from anyone. I don't want to assume bad faith, so I would like to wait a bit longer. Not sure if the organizer can help us out here. @dominik.rehse
Has anyone actually got the prize payment?
Almost 2 years ago
The last update I received from the team was that they were sending all the payments by the end of January. But apparently that hasn't happened yet. Maybe @mohanty will be able to shed some light here for us.
Has anyone actually got the prize payment?
Almost 2 years ago
Just want to see if any of you got the prize payment? It has been a bit too long now…
@sergey_zlobin @ArtemVoronov @santiactis @gaurav_singhal @leocd @sagar_rathod @moto @aorhan
Share your solutions!
Over 2 years ago
First of all, I would like to say a huge thank you to AIcrowd for this unique and super fun challenge! Also, congrats to the other winners! I consider myself very lucky to land first place and would love to share my solution and learnings here!
My solution is very simple and straightforward. It is basically "iteratively purchase the next batch of data with the best possible model until you run out of purchase budget". One of the biggest challenges of this competition, in my opinion, is that you cannot get very reliable performance scores either locally or on the public leaderboard. So if we can filter more noise out of the weak signal of the scores, the chance of overfitting may be much lower. During my experiments I focused on simple strategies, mainly because more complex strategies require more tuning, which means more decisions to make and a higher risk of overfitting (since every time we make a decision, we tend to refer back to the same local and public scores, over and over again).
OK, enough hypotheses and high-level talk! Here are the details (code):
Most importantly, the purchase strategy:
```python
import numpy as np

def decide_what_to_purchase(probpred_ary, purchased_labels, num_labels):
    """Purchase strategy given the predicted probabilities."""
    oneminusprob = 1 - probpred_ary
    # Mean negative entropy per sample; ascending argsort ranks the most
    # uncertain (highest-entropy) samples first.
    topk_prob_ind = np.argsort(
        (oneminusprob * np.log(oneminusprob)
         + probpred_ary * np.log(probpred_ary)).mean(axis=1))
    topk_prob_ind = [x for x in topk_prob_ind
                     if x not in set(purchased_labels)][:num_labels]
    return set(topk_prob_ind)
```
basically, select the most uncertain samples based on entropy.
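A quick sanity check of the ranking (the probability values here are made up for illustration, not from the challenge data): confident predictions near 0 or 1 have low entropy, while predictions near 0.5 have high entropy, so the ascending argsort of mean negative entropy surfaces the uncertain sample first.

```python
import numpy as np

# Hypothetical multi-label probability predictions for 3 samples:
probpred_ary = np.array([
    [0.99, 0.01, 0.98],  # confident -> low entropy
    [0.55, 0.45, 0.60],  # uncertain -> high entropy
    [0.90, 0.10, 0.85],  # fairly confident
])
oneminusprob = 1 - probpred_ary
# Same expression as in the purchase strategy above:
neg_entropy = (oneminusprob * np.log(oneminusprob)
               + probpred_ary * np.log(probpred_ary)).mean(axis=1)
ranking = np.argsort(neg_entropy)
print(ranking)  # -> [1 2 0]: the most uncertain sample comes first
```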
And for each iteration, the number of labels to purchase is decided on the fly, given the compute and purchase budgets:
```python
def get_iteration_numbers(self, purchase_budget, compute_budget):
    # Try increasingly large per-iteration batch ratios until the
    # estimated training time fits within the compute budget.
    for ratio in range(30, 100, 5):
        ratio /= 100
        num_labels_list = self.get_purchase_numbers(purchase_budget, ratio=ratio)
        for epochs in ZEWDPCBaseRun.generate_valid_epoch_comb(num_labels_list):
            epoch_time = self.calculate_total_time_given_epochs(epochs, num_labels_list)
            if epoch_time <= compute_budget:
                print(f"settle with ratio {ratio}")
                return num_labels_list, epochs
    return [purchase_budget], [10]
```
```python
def get_purchase_numbers(self, purchase_budget, ratio):
    # Split the purchase budget into chunks of ratio * 1000 images.
    start_number = int(1000 * ratio)
    if start_number >= purchase_budget:
        return [purchase_budget]
    num_labels_list = [start_number]
    remain_budget = purchase_budget - start_number
    while remain_budget > 0:
        label_to_purchase = min(remain_budget, int(ratio * 1000))
        remain_budget -= label_to_purchase
        num_labels_list.append(label_to_purchase)
    return num_labels_list
```
basically, we first check whether we can purchase 300 images per iteration and exhaust the purchase budget before we run out of time. If not, we increase it to 350 images per iteration (so fewer iterations) and check whether that works, then 400 images, and so on. We redo this check at every iteration and only take the first element of the purchase list generated by the strategy. That is, we may have decided to purchase 300 images per iteration last round, and may increase that to 400 images this iteration. This is mainly because we cannot accurately estimate how long the next iteration's model will take to train, so we re-estimate each time whether we can still finish on schedule. In fact, I used a moving average (with an extra 0.1 time buffer) to estimate how long training the next iteration may take:
```python
self.train_time_per_image = (self.train_time_per_image * 0.8
                             + (train_end_time - train_start_time)
                             / len(tr_dset) / num_epochs * 0.3)
```
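The update can be exercised in isolation. Note that the weights 0.8 and 0.3 deliberately sum to 1.1, which is where the extra ~10% time buffer comes from. The measured per-image times below are made up purely for illustration:

```python
# Hypothetical illustration of the moving-average update above.
train_time_per_image = 0.010  # initial estimate: seconds per image per epoch
for measured in [0.012, 0.011, 0.013]:
    # 0.8 + 0.3 = 1.1, so the estimate trends ~10% above the true average,
    # acting as a safety buffer against underestimating training time.
    train_time_per_image = train_time_per_image * 0.8 + measured * 0.3
print(round(train_time_per_image, 6))  # -> 0.013964
```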
Now, within each iteration, we need to train the model. The model I ended up with is:
```python
import torch.nn as nn
from torchvision import models

def load_model():
    """Load an ImageNet-pretrained RegNetY-8GF and replace its head."""
    model = models.regnet_y_8gf(pretrained=True)
    model.fc = nn.Linear(2016, 6)  # 6 output labels for this challenge
    return model.to(device)  # `device` is defined elsewhere
```
basically, the most complex model that is still reasonable to train.
Like in any other computer vision problem, data augmentation is also key:
```python
self.train_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.ColorJitter(brightness=0.5, contrast=0.8, saturation=0.5, hue=0.5),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomInvert(p=0.25),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.25),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
self.test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```
I also tried test-time augmentation, but I found that prediction takes too much time, so it might not be worth it.
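For readers unfamiliar with the idea, here is a minimal sketch of test-time augmentation (not the author's actual code; `predict_with_tta` and `model_fn` are hypothetical names): predictions are averaged over flipped copies of each image, which multiplies inference cost by the number of views.

```python
import numpy as np

def predict_with_tta(model_fn, images):
    """Average predictions over augmented views of a batch.

    model_fn: maps an image batch (N, H, W, C) to probabilities (N, num_labels).
    """
    views = [
        images,
        images[:, :, ::-1, :],  # horizontal flip
        images[:, ::-1, :, :],  # vertical flip
    ]
    preds = [model_fn(v) for v in views]
    return np.mean(preds, axis=0)
```

Each extra view is one more full forward pass, which is why TTA can blow the compute budget in this challenge.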
And the learning-rate scheduler:
```python
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=len(dataloader) * NEPOCH, T_mult=1)
```
NEPOCH was a tuning parameter; I tried 5, 7, 10, and 15. 5 or 7 didn't seem to be enough, 15 seemed to be a bit too much, and 10 worked well.
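To see why `T_0 = len(dataloader) * NEPOCH` gives exactly one cosine cycle when the scheduler is stepped once per batch, here is the cosine-annealing formula reimplemented standalone (a sketch; the function name and example numbers are mine, not PyTorch's API):

```python
import math

def cosine_lr(step, base_lr, t0, eta_min=0.0):
    # Cosine annealing: eta_min + (base_lr - eta_min) * (1 + cos(pi * t/T)) / 2
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * (step % t0) / t0)) / 2

steps_per_epoch, nepoch, base_lr = 100, 10, 1e-3
t0 = steps_per_epoch * nepoch   # one full cycle spans all NEPOCH epochs
lr_start = cosine_lr(0, base_lr, t0)        # full base LR at step 0
lr_mid = cosine_lr(t0 // 2, base_lr, t0)    # half the base LR at the midpoint
lr_end = cosine_lr(t0 - 1, base_lr, t0)     # near zero by the last step
```

With `T_mult=1`, each subsequent training round restarts an identically shaped cycle, which fits the iterative purchase-then-retrain loop.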
So basically, the flow works like this:
- train the model, with data augmentation, 10 epochs, and 1 cycle of the cosine annealing LR scheduler
- decide how many images to purchase based on compute and purchase budget
- do one round of prediction to get probabilities and then purchase the most uncertain images based on entropy
- collect the just purchased images, further train the model (load the model and optimizer checkpoints)
- repeat until no purchase budget left
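The steps above can be sketched as a purchase loop. This is my own skeleton, not the author's code: the training and prediction steps are stubbed out, and `run_purchase_loop`, `purchase_fn`, and `toy_purchase` are hypothetical names, so only the budget bookkeeping is shown.

```python
def run_purchase_loop(purchase_budget, num_labels_list, purchase_fn):
    """Iteratively spend the purchase budget in the planned batch sizes."""
    purchased = set()
    for num_labels in num_labels_list:
        if purchase_budget <= 0:
            break
        num_labels = min(num_labels, purchase_budget)
        # 1. train the model on everything purchased so far (stubbed out)
        # 2. predict probabilities on the unlabeled pool   (stubbed out)
        # 3. purchase the most uncertain images by entropy
        new_ids = purchase_fn(purchased, num_labels)
        purchased |= new_ids
        purchase_budget -= len(new_ids)
    return purchased

def toy_purchase(purchased, n):
    # Stand-in for the entropy-based strategy: take the next n unseen ids.
    start = max(purchased, default=-1) + 1
    return set(range(start, start + n))

print(len(run_purchase_loop(1000, [300, 300, 300, 100], toy_purchase)))  # -> 1000
```

In the real run, `purchase_fn` would be `decide_what_to_purchase` and the batch sizes would come from `get_iteration_numbers`.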
Hopefully this is helpful! And please ask any questions if you have any!
[Announcement] Leaderboard Winners
Over 2 years ago
Thanks Camaro! Would love to share my approach. I'm just not sure what the usual way of sharing solutions at AIcrowd is (e.g. do we just make a post, do we make our code public, or is there some other place they ask us to put everything)?
Which submission is used for private LB scoring?
Over 2 years ago
I couldn't find this stated anywhere. Which of the following statements is true?
- all submissions to round 2 will be evaluated on private LB and the best score is picked automatically for each participant.
- only the best submission on public LB of each participant will be selected for private LB scoring
- each participant needs to specify which submission to use for private LB scoring.
Thank you in advance for clarification!
Has anyone actually got the prize payment?
Over 1 year ago
UPDATE: I have just received the payment.