Learning to Smell
Machine learning approach with 3 different vector embeddings
xgboost, NearestNeighbors, and a neural network with 3 different kinds of embedding
Machine learning approach!
Hey everyone, I tried xgboost, NearestNeighbors, and a neural network with 3 different kinds of embedding. If you learn anything from it, give it a like; it boosts my motivation to contribute to open source.
You can also check out my fastai approach for a new perspective:
FAST AI - Colab Link
I'm open to any new ideas, criticism, or suggestions. Thanks!
!pip install -q --upgrade fastcore
!pip install -q --upgrade fastai # Make sure we have the latest fastai
# Please restart the runtime after this step so the new fastai version is available
import fastai
print(fastai.__version__)
# It should be '2.0.15'
import pandas as pd
from matplotlib import pyplot as plt
from PIL import Image as PImage  # To avoid confusion later
from fastai.vision.all import *
!gdown --id 1t5be8KLHOz3YuSmiiPQjopb4c_q2U4tG
!unzip olfactorydata.zip
#thanks mmi333
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
sample_submission = pd.read_csv("sample_submission.csv")
vocabulary = open("vocabulary.txt").read()
# Download the 256x256 images I created with the help of RDKit
!gdown --id 10zsdoTzY9tBqkXOyycSvfk1BGMXMIXnl
!gdown --id 1-1DzGTEezaTCTrUetClg-01P1VqSewbF
# Unzip the files; please give the correct paths to the image zips
!unzip '/content/train_imgs.zip'
!unzip '/content/test_imgs.zip'
train['imgs'] = [f"imgs/{i}.png" for i in range(len(train))]
test['imgs'] = [f"imgs_test/{i}.png" for i in range(len(test))]
train.head(2)
base_img_path = '/content/content/'
def get_x(r): return f'{base_img_path}{r["imgs"]}'
def get_y(r): return r['SENTENCE'].split(',')
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_x=get_x, get_y=get_y)
dsets = dblock.datasets(train)
dsets.train[0]
# Create DataLoaders
bs = 32
dls = dblock.dataloaders(train,bs = bs)
dls.show_batch()
# To check GPU usage
# !nvidia-smi
Training
#different pretrained model
models = {
"resnet18":resnet18,"resnet34":resnet34,"resnet50":resnet50,
"resnet101":resnet101,
"resnet152":resnet152,
"densenet121":densenet121,
"densenet161":densenet161,
"densenet169":densenet169,"densenet201":densenet201,
"vgg11_bn":vgg11_bn,"vgg13_bn":vgg13_bn,
"vgg16_bn":vgg16_bn,"vgg19_bn":vgg19_bn
}
# Choose any model and train it; a larger model might improve the scores,
# but it takes more time to train than the smaller models
learn = cnn_learner(dls, models["resnet34"], metrics=partial(accuracy_multi, thresh=0.9))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)
What is accuracy_multi?
It is a metric used for multi-label classification.
Why? Because the plain "accuracy" metric only compares one predicted class with one target class per sample,
but here the problem is multi-label, so we choose
the "accuracy_multi" metric to track the model's improvement.
What is the thresh parameter?
It is a hyperparameter you can tweak. It ranges
from 0 to 1 inclusive, and a label counts as predicted when its probability is above thresh.
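To make the metric concrete, here is a minimal pure-Python sketch of the same idea: apply a sigmoid to each raw score, mark a label as predicted when the probability exceeds the threshold, and report the fraction of (sample, label) cells that match the target. The function name `accuracy_multi_sketch` and the toy numbers are made up for illustration; this is not fastai's own code.

```python
import math

def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def accuracy_multi_sketch(logits, targets, thresh=0.5):
    """Fraction of (sample, label) cells where the thresholded
    prediction agrees with the 0/1 target."""
    correct = 0
    total = 0
    for row_logits, row_targets in zip(logits, targets):
        for logit, target in zip(row_logits, row_targets):
            predicted = sigmoid(logit) > thresh
            correct += int(predicted == bool(target))
            total += 1
    return correct / total

# Toy data (hypothetical): 2 molecules, 3 possible smell labels each
logits = [[3.0, -2.0, 0.5], [-1.0, 4.0, -3.0]]
targets = [[1, 0, 1], [0, 1, 0]]

acc_05 = accuracy_multi_sketch(logits, targets, thresh=0.5)
acc_09 = accuracy_multi_sketch(logits, targets, thresh=0.9)
print(acc_05, acc_09)
```

With the higher threshold the third label of the first molecule (probability about 0.62) is no longer predicted, so one cell stops matching and the score drops. Since most labels are absent for any given molecule, a high threshold can still give a high-looking accuracy, so it is worth checking the metric at more than one threshold.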
Prediction
# Predictions
from tqdm import tqdm
preds = []
for i in tqdm(test['imgs'], total=len(test['imgs'])):
    # Image.open(base_img_path+i)
    arr = learn.predict(base_img_path+i)[2]  # per-label probabilities
    # Indices of the 6 highest-probability labels (in no particular order)
    top_idx = np.argpartition(arr, -6)[-6:]
    top_vocab = dls.train.vocab[top_idx]
    p = ''
    for l in top_vocab:
        p = p + l + ","
    # print(type(p),p)
    preds.append(p)
preds[:10]
# Create the submission.csv file
test["PREDICTIONS"] = preds
test.to_csv("fastai_smells3.csv",index=False)
# With the above code I got:
# Top-5 TSS - 0.189
# Top-2 TSS - 0.189
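One small tweak to the prediction step: np.argpartition returns the top indices in no particular order, and building the string with repeated "+" leaves a trailing comma. Below is a minimal pure-Python sketch that ranks labels by probability, best first, and joins them cleanly. The helper name `top_k_labels`, the vocabulary, and the probabilities are all hypothetical, and whether ordering or the trailing comma affects the score is an assumption worth checking against the challenge's submission rules.

```python
def top_k_labels(probs, vocab, k=3):
    """Return the k labels with the highest probability, best first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]

# Hypothetical vocabulary and per-label probabilities for one molecule
vocab = ["fruity", "woody", "sweet", "floral", "herbal"]
probs = [0.10, 0.70, 0.40, 0.05, 0.55]

labels = top_k_labels(probs, vocab, k=3)
prediction = ",".join(labels)  # no trailing comma
print(prediction)  # woody,herbal,sweet
```

In the notebook, `probs` would correspond to the third element returned by learn.predict and `vocab` to dls.train.vocab.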
Definitely, with some tweaking there is a lot to improve here, and everyone must have their own secret sauce. If you like this and need more explanation on any topic here, let me know.
Improving your score
There are a few different ways to improve your score. For example:
1) Ensembles
I'm generally not a fan of large ensembles, but combining several different models can give more robust overall predictions. Just make sure you document everything so it's reproducible!
2) Test Time Augmentation
Have you replaced learn.get_preds() with learn.tta()? This is a free boost to your score: the model will make predictions for four different versions of each input image and take the average.
3) Image Augmentation
Fastai makes it easy to add image augmentation. Check out the docs for some different parameters you can tweak: https://docs.fast.ai/vision.augment (especially the aug_transforms section)
4) More Training
We did very quick training runs here. Even just increasing the number of fine-tuning epochs will improve your score. You can also get fancier with how you train, using the lr_find method to pick learning rates and so on (there are lots of good fastai tutorials out there with instructions).
5) Better models
We're using the resnet34 architecture: not bad, but not the latest and greatest model either. A larger model like densenet201 takes longer to train and is more complex, but you can sometimes get better performance by scaling up.
6) Get creative
I'm sure there are all sorts of other fun ways to squeeze some extra performance out :)
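As a rough illustration of the ensembling idea in point 1, the simplest combination is to average the per-label probabilities from several models before picking the top labels. The sketch below is pure Python with made-up numbers; `average_predictions` and the three "models" are hypothetical stand-ins for, say, probabilities from a resnet34, a resnet50, and a densenet121 learner.

```python
def average_predictions(per_model_probs):
    """Average per-label probabilities across models for one molecule.

    per_model_probs: one list of label probabilities per model,
    all of the same length and in the same label order.
    """
    n_models = len(per_model_probs)
    n_labels = len(per_model_probs[0])
    return [
        sum(model[j] for model in per_model_probs) / n_models
        for j in range(n_labels)
    ]

# Hypothetical probabilities from three models for four labels
model_a = [0.9, 0.2, 0.4, 0.1]
model_b = [0.7, 0.4, 0.2, 0.1]
model_c = [0.8, 0.3, 0.6, 0.1]

ensemble = average_predictions([model_a, model_b, model_c])
print(ensemble)
```

This only works if every model uses the same label vocabulary in the same order, so it is worth asserting that the vocabularies match before averaging.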
Enjoy, and good luck.
This notebook was made with love by Anish Jain
PS: I'm happy to answer questions; pop them in the discussion boards and tag me :)