DGBRD
[Getting Started Notebook] DGBRD Challenge
This is baseline code to get you started with the challenge.
You can use this code to start understanding the data and to create a baseline model for further improvements.
Getting Started Code for DGBRD Challenge on AIcrowd¶
Author : Gauransh Kumar, Sanjay Pokkali¶
Import necessary packages¶
import numpy as np
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import random
import zipfile
from sklearn.utils import shuffle
from tqdm import tqdm_notebook
import fastai
from fastai import *
from fastai.vision import *
fastai.__version__
Download the files¶
These include the train, test, and validation images, as well as the CSVs indexing them.
!pip install aicrowd-cli
%load_ext aicrowd.magic
%aicrowd login
!rm -rf data
!mkdir data
%aicrowd ds dl -c dgbrd -o data
!mkdir FinalData
# training data
with zipfile.ZipFile("data/train.zip", "r") as zip_ref:
    zip_ref.extractall("FinalData/")
# test data
with zipfile.ZipFile("data/test.zip", "r") as zip_ref:
    zip_ref.extractall("FinalData/")
# validation data
with zipfile.ZipFile("data/val.zip", "r") as zip_ref:
    zip_ref.extractall("FinalData/")
Check for corrupt images¶
root="FinalData/"
for path in os.listdir(root):
verify_images(root+path,delete=True)
import warnings
warnings.filterwarnings("ignore")
Loading Data¶
Loading CSV Data¶
Using pandas, we can load the CSVs directly. We have created a val set for you, which has an even split across all the classes.
train_df=pd.read_csv("data/train.csv")
val_df=pd.read_csv("data/val.csv")
test_df=pd.read_csv("data/test.csv")
train_df["filename"]=train_df["filename"]
val_df["filename"]=val_df["filename"]
test_df["filename"]=test_df["filename"]
Defining Transformations¶
When we do any computer vision task, it is important to define carefully thought-out transformations on the images. This allows our model to predict accurately on a wider range of inputs. Try experimenting with other types of transformations, like the sketch after the next cell; see the fastai documentation on transforms to learn more.
tfms = get_transforms(do_flip=True,flip_vert=False,max_zoom=1.05,max_warp=0.1)
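For example, here is a sketch of an alternative transform set that adds small rotations and lighting jitter (the parameter values are illustrative, not tuned); swap it in for tfms to experiment:
# Illustrative alternative: small rotations and lighting jitter on top of flips.
alt_tfms = get_transforms(do_flip=True, flip_vert=False,
                          max_rotate=10.0,    # rotate up to ±10 degrees
                          max_lighting=0.3,   # brightness/contrast jitter
                          max_zoom=1.1,
                          max_warp=0.1)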
Creating ImageLists¶
We now load our data into fastai ImageLists and apply the transformations we defined above. For now we create the lists for the train and val data.
root="FinalData/"
train_data = (ImageList.from_df(train_df, root + "train", cols=["filename"])
              .split_none()
              .label_from_df(cols=["label"])
              .transform(tfms, size=128))
val_data = (ImageList.from_df(val_df, root + "val", cols=["filename"])
            .split_none()
            .label_from_df(cols=["label"])
            .transform(tfms, size=128))
We now attach the val data as the validation set of the train LabelList, giving us a single object that holds both the train and val datasets.
train_data.valid=val_data.train
train_data
We now create a databunch for the data that is to be used in training. A databunch in fastai is the data that is going to be passed into the neural network. While creating the databunch, we also define the batch size, number of workers, and we normalize the images according to imagenet_stats.
You can also leave normalize blank, and fastai will compute the statistics from the dataset itself. However, since ImageNet spans 1000 classes, it very likely includes classes that overlap with our dataset, so imagenet_stats is a good match for our images.
train_databunch=train_data.databunch(bs=16,num_workers=0).normalize(imagenet_stats)
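If you would rather use statistics estimated from this dataset, a minimal sketch (stored under a separate name so it doesn't clobber the databunch above):
# Sketch: let fastai estimate normalization stats from a batch of this dataset.
train_databunch_alt = train_data.databunch(bs=16, num_workers=0).normalize()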
We now create the test_data ImageList. We then add the test data to the databunch.
test_data=ImageList.from_df(test_df,root+"test",cols=["filename"])
train_databunch.add_test(test_data)
Notice how the train_databunch has all the data we require to train the model, run validation, and to predict on!
train_databunch
train_databunch.show_batch(rows=3, figsize=(8,10))
print(train_databunch.classes)
from fastai.callbacks import *
learn = cnn_learner(train_databunch, models.resnet18, metrics=[error_rate, accuracy],model_dir="models/")
Now that we've loaded our model, fastai provides a convenient function to find an efficient learning rate for your model at that point in training. This is what we have implemented below.
learn.lr_find()
learn.recorder.plot(suggestion=True)
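In fastai v1, plotting with suggestion=True also stores the suggested rate (the point of steepest loss gradient) on the recorder, so you can pick it up programmatically rather than reading it off the plot:
# The suggested learning rate from the plot above.
suggested_lr = learn.recorder.min_grad_lr
print(suggested_lr)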
TRAINING PHASE 🏋️¶
Alright enough talk and time to train. We define the number of epochs and we define a callback which saves the best model trained during this training phase based on the error_rate.
We will use fastai's fit_one_cycle to train our model. To learn more about the function, see the fastai documentation.
learn.fit_one_cycle(1, 1.0e-03,
                    callbacks=[ShowGraph(learn),
                               SaveModelCallback(learn, monitor='error_rate',
                                                 mode='min', name="bestmodel")])
learn.save('resnet18-5epochs-stage-1') # save model
learn.unfreeze() # unfreeze all layers
learn.lr_find() # find learning rate
learn.recorder.plot(suggestion=True) # plot learning rate
learn.fit_one_cycle(1, 6.31e-07,
                    callbacks=[ShowGraph(learn),
                               SaveModelCallback(learn, monitor='error_rate',
                                                 mode='min', name="bestmodel")])
Let's try to figure out where our model messed up¶
Fastai provides a way to see which images our model is having trouble with.
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()
len(train_databunch.valid_ds)==len(losses)==len(idxs)
As you can see here, the model is already fairly accurate and only misclassifies a small number of images.
interp.plot_top_losses(9, figsize=(15,11))
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp.most_confused()
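If the full list is noisy, most_confused accepts a min_val filter so only frequently confused pairs are shown, for example:
# Only show class pairs that were confused at least twice.
interp.most_confused(min_val=2)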
Testing Phase 😅¶
Moment of truth. Let's run prediction on the test set and finally generate our submission file!
Here we load our best model, which is conveniently called bestmodel.
learn.load("bestmodel")
preds,y = learn.get_preds(ds_type=DatasetType.Test)
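As an optional experiment, fastai's test-time augmentation averages predictions over several augmented copies of each test image; it can sometimes improve the score at the cost of slower inference. A sketch:
# Sketch: test-time augmentation on the test set.
preds_tta, y_tta = learn.TTA(ds_type=DatasetType.Test)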
Now that we've run prediction, let us convert our predictions from a tensor to a list. The code below takes the argmax of each row of the tensor (the index of the highest-scoring class), moves it to memory accessible by the CPU, converts it to a NumPy array, and then converts it to a list.
preds_list=preds.argmax(dim=-1).cpu().numpy().tolist()
Since the predictions are class indices, let's create a dictionary so we can easily convert them into a list of string labels.
mapping = {}
for x in range(len(learn.data.classes)):
    mapping[x] = learn.data.classes[x]
print(mapping)
submission = []
for x in preds_list:
    submission.append(mapping[x])
filename=test_df.filename.tolist()
d={"filename":filename,"label":submission}
df=pd.DataFrame(d)
df.head()
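Before writing the file, it's worth verifying that the submission lines up with the test set:
# Sanity check: one prediction per test image, in the original order.
assert len(df) == len(test_df)
print(df["label"].value_counts().head())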
Submitting the predictions¶
Let's convert the dataframe into a CSV file! After this we are done; just submit it right from here using the AIcrowd CLI.
!rm -rf assets
!mkdir assets
df.to_csv(os.path.join("assets", "submission.csv"), index=False)
!aicrowd submission create -c dgbrd -f assets/submission.csv
Well Done! 👍 See your score on the leaderboard.¶