

[Getting Started Notebook] BKMKT Challenge

This is baseline code to get you started with the challenge.


You can use this code to start understanding the data and to build a baseline model for further improvements.

Starter Code for the BKMKT Practice Challenge

Note: Create a copy of the notebook and use the copy for submission. Go to File > Save a Copy in Drive to create a new copy.

Author: Gauransh Kumar

Downloading Dataset

Installing aicrowd-cli

In [1]:
!pip install aicrowd-cli
%load_ext aicrowd.magic
Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/zKG91CQV78Vt9LoFforV0oAHknwuSP14wUSeEM1Z7fo
Opening in existing browser session.
API Key valid
Saved API Key successfully!
In [3]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c bkmkt -o data

Importing Libraries

In this baseline, we will use the scikit-learn library to train the model and generate the predictions.

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import normalize, LabelEncoder
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

Reading the dataset

Here, we will read train.csv, which contains both the training samples and their labels, and test.csv, which contains the testing samples.

In [5]:
# Reading the CSV
train_data_df = pd.read_csv("data/train.csv")
test_data_df = pd.read_csv("data/test.csv")

# train_data.shape, test_data.shape
display(train_data_df.head())
display(test_data_df.head())
age job marital education default housing loan contact month day_of_week ... campaign pdays previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed y
0 57 services married high.school unknown no no telephone may mon ... 1 999 0 nonexistent 1.1 93.994 -36.4 4.857 5191.0 no
1 37 services married high.school no yes no telephone may mon ... 1 999 0 nonexistent 1.1 93.994 -36.4 4.857 5191.0 no
2 40 admin. married basic.6y no no no telephone may mon ... 1 999 0 nonexistent 1.1 93.994 -36.4 4.857 5191.0 no
3 45 services married basic.9y unknown no no telephone may mon ... 1 999 0 nonexistent 1.1 93.994 -36.4 4.857 5191.0 no
4 59 admin. married professional.course no no no telephone may mon ... 1 999 0 nonexistent 1.1 93.994 -36.4 4.857 5191.0 no

5 rows × 21 columns

age job marital education default housing loan contact month day_of_week duration campaign pdays previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed
0 34 admin. married university.degree no no yes cellular nov fri 408 9 999 0 nonexistent -0.1 93.200 -42.0 4.021 5195.8
1 47 admin. married basic.9y no yes no cellular aug fri 109 1 999 0 nonexistent 1.4 93.444 -36.1 4.966 5228.1
2 60 blue-collar married unknown unknown yes no telephone may mon 5 2 999 0 nonexistent 1.1 93.994 -36.4 4.857 5191.0
3 54 services married high.school no yes no cellular aug tue 136 5 999 0 nonexistent 1.4 93.444 -36.1 4.966 5228.1
4 47 self-employed married university.degree no no no cellular aug tue 320 3 999 0 nonexistent 1.4 93.444 -36.1 4.965 5228.1

Data Preprocessing

In [6]:
# one-hot encoding the categorical data

# columns with categorical data
cat_columns = ['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'day_of_week', 'poutcome']

# changing type from object to category
for col in cat_columns:
    train_data_df[col] = train_data_df[col].astype('category')
    test_data_df[col] = test_data_df[col].astype('category')

# one-hot encoding the data using pd.get_dummies()
train_data_df = pd.get_dummies(data=train_data_df, columns=cat_columns)
test_data_df = pd.get_dummies(data=test_data_df, columns=cat_columns)
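A note on this step: pd.get_dummies only creates columns for the category levels it actually sees in each DataFrame, so if the train and test files happened to contain different levels, the two feature sets would no longer line up. The following optional sanity check (not part of the original baseline) forces the test columns into the training layout:

# Optional check: align the test columns to the training feature columns,
# filling any level missing from test.csv with zeros.
train_feature_cols = train_data_df.columns.drop("y")
test_data_df = test_data_df.reindex(columns=train_feature_cols, fill_value=0)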
In [7]:
# encoding the labels using sklearn.preprocessing.LabelEncoder()
label_encoder = LabelEncoder()
label_encoder.fit(train_data_df["y"])
train_data_df["y"] = label_encoder.transform(train_data_df["y"])
train_data_df.head()
Out[7]:
age duration campaign pdays previous emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed ... month_oct month_sep day_of_week_fri day_of_week_mon day_of_week_thu day_of_week_tue day_of_week_wed poutcome_failure poutcome_nonexistent poutcome_success
0 57 149 1 999 0 1.1 93.994 -36.4 4.857 5191.0 ... 0 0 0 1 0 0 0 0 1 0
1 37 226 1 999 0 1.1 93.994 -36.4 4.857 5191.0 ... 0 0 0 1 0 0 0 0 1 0
2 40 151 1 999 0 1.1 93.994 -36.4 4.857 5191.0 ... 0 0 0 1 0 0 0 0 1 0
3 45 198 1 999 0 1.1 93.994 -36.4 4.857 5191.0 ... 0 0 0 1 0 0 0 0 1 0
4 59 139 1 999 0 1.1 93.994 -36.4 4.857 5191.0 ... 0 0 0 1 0 0 0 0 1 0

5 rows × 64 columns
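A small aside: LabelEncoder assigns integer codes in the sorted order of the class names, so here "no" maps to 0 and "yes" maps to 1. A quick way to confirm the mapping:

# Print the label mapping produced by LabelEncoder
# (codes follow the sorted order of the class names).
print(dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_))))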

In [8]:
# Separating features and labels from the dataframe for final training
X = normalize(train_data_df.drop(columns=["y"]).to_numpy())
y = train_data_df["y"].to_numpy()
print(X.shape, y.shape)
(32928, 63) (32928,)
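Keep in mind that sklearn.preprocessing.normalize rescales each row (sample) to unit L2 norm by default, which is why the individual feature values end up so small. A quick check, shown only for illustration:

# normalize() defaults to norm="l2", axis=1, i.e. each sample vector is
# scaled to unit length; both values printed below should be ~1.0.
row_norms = np.linalg.norm(X, axis=1)
print(row_norms.min(), row_norms.max())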
In [9]:
# Visualising the final label classes for training
sns.countplot(x=y)
Out[9]:
<AxesSubplot:ylabel='count'>
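The count plot indicates that the classes are imbalanced, with far more "no" than "yes" labels. The exact proportions can also be printed directly (an optional check, not part of the original baseline):

# Relative frequency of each encoded class (0 = "no", 1 = "yes").
print(pd.Series(y).value_counts(normalize=True))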

Splitting the data

In [10]:
# Splitting the data into training & validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)
(26342, 63)
(26342,)
In [11]:
X_train[0], y_train[0]
Out[11]:
(array([ 7.13018842e-03,  4.61585882e-02,  1.87636537e-04,  1.87448901e-01,
         0.00000000e+00,  2.62691152e-04,  1.75335086e-02, -6.77367900e-03,
         9.31990681e-04,  9.80982580e-01,  1.87636537e-04,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         1.87636537e-04,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  1.87636537e-04,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  1.87636537e-04,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  1.87636537e-04,
         1.87636537e-04,  0.00000000e+00,  0.00000000e+00,  1.87636537e-04,
         0.00000000e+00,  0.00000000e+00,  1.87636537e-04,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  1.87636537e-04,
         0.00000000e+00,  1.87636537e-04,  0.00000000e+00]),
 0)

Training the Model

In [12]:
model = KNeighborsClassifier()
model.fit(X_train, y_train)
Out[12]:
KNeighborsClassifier()
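The baseline simply uses KNeighborsClassifier with its default of 5 neighbours. As one obvious place to start improving, here is a small optional grid search over n_neighbors; the candidate values are arbitrary choices for illustration:

# Optional improvement sketch: tune n_neighbors with a quick 3-fold grid
# search (values in param_grid are arbitrary illustration choices).
from sklearn.model_selection import GridSearchCV

param_grid = {"n_neighbors": [3, 5, 7, 11, 15]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)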

Validation

In [13]:
model.score(X_val, y_val)
Out[13]:
0.9044943820224719
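Because the labels are skewed towards "no", accuracy alone can look flattering. A per-class breakdown on the validation split gives a better picture; this is an optional check, not part of the original baseline:

# Per-class precision/recall/F1 on the validation split.
from sklearn.metrics import classification_report

val_predictions = model.predict(X_val)
print(classification_report(y_val, val_predictions, target_names=list(label_encoder.classes_)))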

So, we are done with the baseline. Let's generate predictions on the real test data and see how to submit them to the challenge.

Predictions

In [14]:
# Separating data from the dataframe for final testing
X_test = normalize(test_data_df.to_numpy())
print(X_test.shape)
(8238, 63)
In [15]:
# Predicting the labels
predictions = model.predict(X_test)
predictions.shape
Out[15]:
(8238,)
In [16]:
# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"y":label_encoder.inverse_transform(predictions)})
submission
Out[16]:
y
0 no
1 no
2 no
3 no
4 no
... ...
8233 no
8234 no
8235 no
8236 no
8237 no

8238 rows × 1 columns

In [17]:
# Saving the submission DataFrame as a CSV in the assets directory
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
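A quick sanity check before submitting, just to confirm the saved file has the single expected "y" column and one row per test sample:

# Re-read the saved file and verify its shape before submission.
check_df = pd.read_csv(os.path.join("assets", "submission.csv"))
assert list(check_df.columns) == ["y"]
assert len(check_df) == len(test_data_df)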

Submitting our Predictions

Note: Please save the notebook before submitting it (Ctrl + S).

In [18]:
!!aicrowd submission create -c bkmkt -f assets/submission.csv
Out[18]:
['submission.csv ━━━━━━━━━━━━━━━━━━━━━ 100.0% • 27.1/25.5 KB • 30.2 MB/s • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/bkmkt/submissions/169729              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/bkmkt/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/bkmkt/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/bkmkt                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/bkmkt                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 169729, 'created_at': '2021-12-24T19:15:16.203Z'}"]
In [ ]:

