FOODC Editorial

Over 4 years ago

The Challenge

Maintaining a healthy diet is difficult. As the saying goes, the best way to escape a problem is to solve it. So why not leverage the power of deep learning and computer vision to build the foundation of a semi-automated food tracking application?

With over 9300 hand-annotated images across 61 classes, the challenge is to train accurate models that can look at an image of food and identify the food item present in it.

It's time to unleash the food (data)scientist in you! Given any image, identify the food item present in it.

Downloads and Installs

In [0]:
!wget -q https://s3.eu-central-1.wasabisys.com/aicrowd-practice-challenges/public/foodc/v0.1/train_images.zip
!wget -q https://s3.eu-central-1.wasabisys.com/aicrowd-practice-challenges/public/foodc/v0.1/test_images.zip
!wget -q https://s3.eu-central-1.wasabisys.com/aicrowd-practice-challenges/public/foodc/v0.1/train.csv
!wget -q https://s3.eu-central-1.wasabisys.com/aicrowd-practice-challenges/public/foodc/v0.1/test.csv
In [0]:
!mkdir data
!mkdir data/test
!mkdir data/train
!unzip train_images -d data/train
!unzip test_images -d data/test
In [0]:
!mkdir models


In [0]:
import sys
import os
import gc
import warnings
import torch

import torch.nn as nn
import numpy as np
import pandas as pd 
import torch.nn.functional as F

from fastai.script import *
from fastai.vision import *
from fastai.callbacks import *
from fastai.distributed import *
from fastprogress import fastprogress
from torchvision.models import *
In [0]:
print("[INFO] GPU:", torch.cuda.get_device_name())
[INFO] GPU: Tesla P100-PCIE-16GB

DataBunch and Model

Here we use a technique called progressive resizing: the model is trained on small images first, and at each later step it is loaded with the weights trained at the previous, smaller image size.
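As a rough sketch of the schedule this notebook follows, the snippet below lists each stage with its image size, its batch size (using the same `2048 // size` rule passed to `get_data`), and the checkpoint it would start from. The helper name `resize_schedule` and the dictionary keys are hypothetical and exist only for illustration; the checkpoint names follow the `densenet_<size>` pattern used in the training cells.

```python
# Hypothetical helper (not part of the notebook): lists the progressive-resizing stages.
def resize_schedule(sizes=(32, 64, 128, 256), budget=2048):
    """Return one dict per stage: image size, batch size, and checkpoint names."""
    stages = []
    previous = None  # the first stage has no smaller-size checkpoint to load
    for size in sizes:
        stages.append({
            "size": size,
            "batch_size": budget // size,      # same rule as get_data(size, 2048 // size)
            "load_from": previous,             # weights trained at the previous size
            "save_as": "densenet_" + str(size),
        })
        previous = "densenet_" + str(size)
    return stages

for stage in resize_schedule():
    print(stage)
```

Smaller images allow larger batches and faster epochs early on, while the later, larger stages refine weights that are already reasonable.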

In [0]:
def get_data(size, batch_size):
  """Return a DataBunch, as needed by the Learner, for the given image size and batch size."""
  train = pd.read_csv("train.csv")
  src = (ImageList.from_df(train, path="data/", folder="train/train_images/").split_by_rand_pct(0.1).label_from_df())
  tfms = get_transforms(do_flip=True, flip_vert=False, max_rotate=10.0, 
                      max_zoom=1.1, max_lighting=0.2, max_warp=0.2, p_affine=0.75, p_lighting=0.75)

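  # NOTE: this line is truncated in the source. The chain below is an assumed
  # completion: apply the transforms, build the DataBunch, normalize with ImageNet stats.
  data = (src.transform(tfms, size=size)
             .databunch(bs=batch_size)
             .normalize(imagenet_stats))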
  assert sorted(set(train.ClassName.unique())) == sorted(data.classes), "Class Mismatch"
  print("[INFO] Number of Classes: ", data.c)
  data.num_workers = 4
  return data
In [0]:
sample_data = get_data(32, (2048//32))
sample_data.show_batch(3, 3)
[INFO] Number of Classes:  61

As you can see, the transforms have been applied and the image is normalized as well!

We first initialize all the models.

In [0]:
learn = create_cnn(get_data(32, (2048//32)), models.densenet161, 
                   metrics=[accuracy, FBeta(beta=1,average='macro')])
learn.model_dir = "models/"

learn = create_cnn(get_data(64, (2048//64)), models.densenet161, 
                   metrics=[accuracy, FBeta(beta=1,average='macro')]).load("densenet_32")
learn.model_dir = "models/"

learn = create_cnn(get_data(128, (2048//128)), models.densenet161, 
                   metrics=[accuracy, FBeta(beta=1,average='macro')]).load("densenet_64")
learn.model_dir = "models/"

learn = create_cnn(get_data(256, (2048//256)), models.densenet161, 
                   metrics=[accuracy, FBeta(beta=1,average='macro')]).load("densenet_128")
learn.model_dir = "models/"
[INFO] Number of Classes:  61
Downloading: "https://download.pytorch.org/models/densenet161-8d451a50.pth" to /root/.cache/torch/checkpoints/densenet161-8d451a50.pth
[INFO] Number of Classes:  61
[INFO] Number of Classes:  61
[INFO] Number of Classes:  61
In [0]:
def train_model(size, iter1, iter2, mixup=False):
  """Quickly train a model at the given image size for iter1, then iter2, epochs."""
  size_match = {"256": "128", "128": "64", "64": "32"}
  learn = create_cnn(get_data(size, (2048//size)), models.densenet161, 
                     metrics=[accuracy, FBeta(beta=1,average='macro')])  # metrics assumed to match the cells above
  learn.model_dir = "models/"
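  if mixup:
    # Body missing in the source; learn.mixup() (fastai v1) is an assumed completion
    # that attaches the mixup callback to the learner.
    learn = learn.mixup()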
  if str(size) != str(32):
    learn.load("densenet_" + str(size_match[str(size)]))

  name = "densenet_" + str(size)
  print("[INFO] Training for : ", name)

  learn.fit_one_cycle(iter1, 1e-4, callbacks=[ShowGraph(learn),
                            SaveModelCallback(learn, monitor='f_beta', mode='max', name=name)])
  learn.fit_one_cycle(iter2, 5e-5, callbacks=[ShowGraph(learn),
                            SaveModelCallback(learn, monitor='f_beta', mode='max', name=name)])

Here you might notice the mixup option. mixup is a callback in fastai that is very effective at regularizing computer vision models.

Instead of feeding the model the raw images, we take two images (not necessarily from the same class) and make a linear combination of them: in terms of tensors, we have:

new_image = t * image1 + (1-t) * image2

where t is a float between 0 and 1. The target we assign to that new image is the same combination of the original targets:

new_target = t * target1 + (1-t) * target2

assuming the targets are one-hot encoded (which usually isn’t the case in PyTorch, where class targets are typically integer indices). And it's as simple as that.

For example:

Dog or cat? The right answer here is 70% dog and 30% cat!
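The two formulas can be tried out on toy data. The following is a minimal NumPy sketch, not fastai's implementation: the helper `mixup_pair`, the random 3x4x4 "images", and the fixed `t=0.7` are all assumptions made for illustration.

```python
import numpy as np

def mixup_pair(image1, target1, image2, target2, t):
    """Blend two images and their one-hot targets with weight t in [0, 1]."""
    new_image = t * image1 + (1 - t) * image2
    new_target = t * target1 + (1 - t) * target2
    return new_image, new_target

rng = np.random.default_rng(0)
dog_image = rng.random((3, 4, 4))   # toy stand-in for a dog picture
cat_image = rng.random((3, 4, 4))   # toy stand-in for a cat picture
dog_target = np.array([1.0, 0.0])   # one-hot: [dog, cat]
cat_target = np.array([0.0, 1.0])

mixed_image, mixed_target = mixup_pair(dog_image, dog_target,
                                       cat_image, cat_target, t=0.7)
print(mixed_target)  # roughly 70% dog, 30% cat
```

The blended target is a soft label, so the loss rewards the model for predicting both classes in proportion to how much of each image is present.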
In [0]:
train_model(32, 5, 3)
[INFO] Number of Classes:  61
[INFO] Training for :  densenet_32
epoch train_loss valid_loss accuracy f_beta time
0 5.436698 4.320179 0.106223 0.053227 01:54
1 4.155217 3.488357 0.257511 0.111412 01:54
2 3.625813 3.116575 0.283262 0.144687 01:55
3 3.403799 3.113646 0.290773 0.148819 01:56
4 3.333214 3.136955 0.293991 0.144410 01:56
Better model found at epoch 0 with f_beta value: 0.05322723090648651.
Better model found at epoch 1 with f_beta value: 0.1114121824502945.
Better model found at epoch 2 with f_beta value: 0.14468735456466675.
Better model found at epoch 3 with f_beta value: 0.14881914854049683.
epoch train_loss valid_loss accuracy f_beta time
0 3.269448 2.944852 0.311159 0.151784 02:01
1 3.095446 2.667753 0.329399 0.163058 02:01
2 2.985259 2.677143 0.334764 0.164230 02:02
Better model found at epoch 0 with f_beta value: 0.15178431570529938.
Better model found at epoch 1 with f_beta value: 0.1630583107471466.
Better model found at epoch 2 with f_beta value: 0.1642296463251114.
In [0]:
train_model(64, 5, 4)
[INFO] Number of Classes:  61
[INFO] Training for :  densenet_64
epoch train_loss valid_loss accuracy f_beta time
0 3.042036 2.391506 0.375536 0.202430 02:24
1 2.755056 2.175985 0.427039 0.274385 02:23
2 2.513455 2.062872 0.440987 0.286241 02:23
3 2.333173 2.029333 0.448498 0.294666 02:23
4 2.274806 2.010746 0.449571 0.299761 02:23
Better model found at epoch 0 with f_beta value: 0.20242981612682343.
Better model found at epoch 1 with f_beta value: 0.2743850350379944.
Better model found at epoch 2 with f_beta value: 0.286241352558136.
Better model found at epoch 3 with f_beta value: 0.2946656346321106.
Better model found at epoch 4 with f_beta value: 0.2997610867023468.
epoch train_loss valid_loss accuracy f_beta time
0 2.224584 2.064080 0.450644 0.308239 02:32
1 2.183188 1.941107 0.477468 0.358477 02:32
2 1.866471 1.893163 0.482833 0.357009 02:33
3 1.833622 1.912134 0.483906 0.363549 02:33
Better model found at epoch 0 with f_beta value: 0.3082387149333954.
Better model found at epoch 1 with f_beta value: 0.3584773540496826.
Better model found at epoch 3 with f_beta value: 0.36354920268058777.
In [0]:
train_model(128, 7, 4, mixup=True)
[INFO] Number of Classes:  61
[INFO] Training for :  densenet_128
epoch train_loss valid_loss accuracy f_beta time
0 3.102915 1.607829 0.563305 0.414498 03:27
1 2.943032 1.549630 0.581545 0.438603 03:26
2 2.808276 1.498592 0.587983 0.435788 03:26
3 2.682379 1.481404 0.592275 0.444419 03:27
4 2.538528 1.465215 0.580472 0.441078 03:28
5 2.511207 1.447936 0.597640 0.465081 03:26
6 2.440458 1.438690 0.604077 0.465968 03:25
Better model found at epoch 0 with f_beta value: 0.4144982099533081.
Better model found at epoch 1 with f_beta value: 0.43860334157943726.
Better model found at epoch 3 with f_beta value: 0.44441917538642883.
Better model found at epoch 5 with f_beta value: 0.4650808572769165.
Better model found at epoch 6 with f_beta value: 0.46596816182136536.
epoch train_loss valid_loss accuracy f_beta time
0 2.546155 1.477883 0.585837 0.457701 03:43
1 2.494597 1.511773 0.579399 0.443396 03:44
2 2.333117 1.432688 0.595494 0.473695 03:44
3 2.253165 1.432526 0.597640 0.471653 03:43
Better model found at epoch 0 with f_beta value: 0.4577012360095978.
Better model found at epoch 2 with f_beta value: 0.4736945331096649.
In [0]:
train_model(256, 7, 5, mixup=True)
[INFO] Number of Classes:  61
[INFO] Training for :  densenet_256
epoch train_loss valid_loss accuracy f_beta time
0 2.703704 1.285418 0.629828 0.506337 05:32
1 2.622411 1.273359 0.631974 0.494505 05:30
2 2.474278 1.328985 0.607296 0.483533 05:31
3 2.390934 1.312649 0.619099 0.496389 05:32
4 2.265631 1.301950 0.610515 0.480573 05:33
5 2.341162 1.284232 0.624463 0.505368 05:35
6 2.306352 1.292962 0.621245 0.501745 05:36
Better model found at epoch 0 with f_beta value: 0.50633704662323.
epoch train_loss valid_loss accuracy f_beta time
0 2.633306 1.271392 0.637339 0.507305 06:12
1 2.680736 1.447017 0.596566 0.460401 06:13
2 2.451501 1.412368 0.596566 0.469816 06:13
3 2.242612 1.392771 0.609442 0.487551 06:13
4 2.171517 1.368796 0.619099 0.496713 06:12
Better model found at epoch 0 with f_beta value: 0.5073045492172241.
In [0]:
learn = create_cnn(get_data(300, (2048//300)), models.densenet161, 
                   metrics=[accuracy, FBeta(beta=1,average='macro')]).load("densenet_256")
learn.model_dir = "models/"
size = 300
name = "densenet_" + str(size)
print("[INFO] Training for : ", name)

learn.fit_one_cycle(5, 1e-4, callbacks=[ShowGraph(learn),
                          SaveModelCallback(learn, monitor='f_beta', mode='max', name=name)])
[INFO] Number of Classes:  61
[INFO] Training for :  densenet_300
epoch train_loss valid_loss accuracy f_beta time
0 2.749508 1.281459 0.644850 0.566936 06:56
1 2.606565 1.301558 0.634120 0.522477 06:56
2 2.626434 1.291356 0.637339 0.534306 06:55
3 2.604175 1.296236 0.650215 0.560165 07:01
4 2.425535 1.281673 0.648069 0.548248 07:00
Better model found at epoch 0 with f_beta value: 0.5669360160827637.
In [0]:
interp = ClassificationInterpretation.from_learner(learn)
losses, idxs = interp.top_losses()

display(interp.plot_top_losses(9, figsize=(15,11)))
display(interp.plot_confusion_matrix(figsize=(12,12), dpi=100))
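In [0]:
# Cell source is missing from the export; this call is an assumed reconstruction
# that matches the (actual, predicted, count) output shown below.
interp.most_confused(min_val=5)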
[('coffee-with-caffeine', 'espresso-with-caffeine', 15),
 ('salad-leaf-salad-green', 'mixed-salad-chopped-without-sauce', 11),
 ('bread-white', 'butter', 7),
 ('bread-sourdough', 'bread-wholemeal', 6),
 ('bread-white', 'bread-wholemeal', 6),
 ('salad-leaf-salad-green', 'leaf-spinach', 6),
 ('butter', 'bread-wholemeal', 5),
 ('coffee-with-caffeine', 'white-coffee-with-caffeine', 5),
 ('espresso-with-caffeine', 'coffee-with-caffeine', 5)]

The model confuses several visually similar categories, most often coffee-with-caffeine and espresso-with-caffeine, and also the various bread and salad classes.

Making the model more robust to these near-identical classes may require more appropriate augmentations.
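To make the list of confused pairs less of a black box, here is a small sketch of how such pairs can be read off a confusion matrix. It assumes rows are actual classes and columns are predicted classes; the helper `confused_pairs`, the three toy class names, and the counts are made up for illustration and are not taken from the model above.

```python
import numpy as np

def confused_pairs(confusion, classes, min_val=1):
    """List (actual, predicted, count) for off-diagonal cells, largest count first."""
    pairs = []
    for i, actual in enumerate(classes):
        for j, predicted in enumerate(classes):
            # Skip the diagonal: those are correct predictions.
            if i != j and confusion[i, j] >= min_val:
                pairs.append((actual, predicted, int(confusion[i, j])))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

classes = ["coffee", "espresso", "water"]   # toy class names
confusion = np.array([[20, 6, 0],           # made-up counts, rows = actual
                      [3, 10, 1],
                      [0, 0, 30]])

print(confused_pairs(confusion, classes, min_val=2))
```

Pairs that appear in both directions, such as coffee and espresso here, are good candidates for targeted augmentation or extra training examples.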

In [0]:
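def make_submission(learn, name):
  """Predict a class for every test image and write the submission CSV."""
  images = []
  prediction = []
  probability = []
  test_path = "data/test/test_images/"
  test = pd.read_csv("test.csv")
  files = test.ImageId
  for i in files:
    img = open_image(os.path.join(test_path, i))
    pred_class, pred_idx, outputs = learn.predict(img)
    # The source loop never filled the lists; these appends are an assumed completion.
    images.append(i)
    prediction.append(str(pred_class))
    probability.append(outputs[pred_idx].item())  # probability of the predicted class
  answer = pd.DataFrame({'ImageId': images, 'ClassName': prediction, 'probability': probability})
  answer[["ImageId","ClassName"]].to_csv(name, index=False)
  display(answer.head())  # assumed: the source shows a preview table after the call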
In [0]:
make_submission(learn, name="submission_size300.csv")
ImageId ClassName probability
0 90e63a2fde.jpg water 0.994021
1 a554d1ca8d.jpg water-mineral 0.990370
2 48317e8ee8.jpg water 0.856607
3 79528df667.jpg hard-cheese 0.901751
4 6d2f2f63f5.jpg bread-wholemeal 0.979332

Improving Further

  • Appropriate augmentations
  • Different models like densenet201, resnet50
  • Mixed Precision training (i.e. to_fp16() in fastai)


