DEBAT
[Getting Started Notebook] DEBAT Challenge
This is a baseline notebook to get you started with the challenge.
You can use this code to understand the data and build a baseline model for further improvements.
Starter Code for the DEBAT Practice Challenge
Note: Create a copy of the notebook and use the copy for submission. Go to File > Save a Copy in Drive to create a new copy.
Downloading Dataset¶
Installing aicrowd-cli
!pip install aicrowd-cli
%load_ext aicrowd.magic
%aicrowd login
!rm -rf data
!mkdir data
%aicrowd ds dl -c debat -o data
Importing Libraries¶
In this baseline, we will be using the sklearn (scikit-learn) library to train the model and generate the predictions.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize, LabelEncoder
from scipy.sparse import hstack
import os
from IPython.display import display
Reading the dataset¶
Here, we will read the train.csv
which contains both training samples & labels, and test.csv
which contains testing samples.
# Reading the CSV files
train_data_df = pd.read_csv("data/train.csv")
test_data_df = pd.read_csv("data/test.csv")
display(train_data_df.head())
display(test_data_df.head())
print(train_data_df.shape, test_data_df.shape)
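Before any modeling, it can also help to glance at the label distribution (the sentiment column, which is used as the target later in this notebook):
# Quick look at how balanced the target classes are
train_data_df['sentiment'].value_counts()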
Data Preprocessing¶
The dataset is mostly textual, so in preprocessing we will first one-hot encode the categorical features and then use TF-IDF to turn the tweet text into numerical features the classifier can use.
# removing some unnecessary columns
train_data_df.drop(['tweet_id', 'tweet_created', 'name'], axis=1, inplace=True)
test_data_df.drop(['tweet_id', 'tweet_created', 'name'], axis=1, inplace=True)
# utility function to one-hot encode the categorical columns
def one_hot_df(df):
    df = pd.concat([df, pd.get_dummies(df["candidate"])], axis=1)
    df.drop("candidate", axis=1, inplace=True)
    df = pd.concat([df, pd.get_dummies(df["subject_matter"])], axis=1)
    df.drop("subject_matter", axis=1, inplace=True)
    df = pd.concat([df, pd.get_dummies(df["relevant_yn"])], axis=1)
    df.drop("relevant_yn", axis=1, inplace=True)
    return df
train_data_df = one_hot_df(train_data_df)
test_data_df = one_hot_df(test_data_df)
display(train_data_df)
display(test_data_df)
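One caveat with get_dummies: if a category appears in only one of the splits, train and test end up with different columns, and the test matrix built later will not line up with the training features. A minimal sketch to align them (assuming the dataframes produced above, with sentiment as the train-only label column):
# Columns the model will be trained on (everything except the label)
feature_cols = [c for c in train_data_df.columns if c != "sentiment"]
# reindex adds any one-hot columns missing from test (filled with 0)
# and drops columns the training frame never saw
test_data_df = test_data_df.reindex(columns=feature_cols, fill_value=0)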
Transforming the Train Data¶
# First, transform train_data_df['text'] to lowercase
train_data_df['text'] = train_data_df['text'].str.lower()
# Then replace everything except letters and numbers with spaces;
# this makes it easier to split the text into words later.
train_data_df['text'] = train_data_df['text'].replace('[^a-zA-Z0-9]', ' ', regex=True)
# Convert a collection of raw documents to a matrix of TF-IDF features with TfidfVectorizer
vectorizer = TfidfVectorizer(min_df=5)
X_tfidf = vectorizer.fit_transform(train_data_df['text'])
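As a quick sanity check, you can inspect how large the learned vocabulary is and what some of the tokens look like (get_feature_names_out requires scikit-learn >= 1.0; older versions use get_feature_names):
# Number of TF-IDF features kept after the min_df=5 cutoff
print(len(vectorizer.vocabulary_))
# A few sample tokens from the fitted vocabulary
print(vectorizer.get_feature_names_out()[:10])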
# merging the final features into the DataFrame and removing the redundant columns
train_data_df = pd.concat([train_data_df,pd.DataFrame(X_tfidf.toarray())], axis=1)
train_data_df.drop("text", axis=1, inplace=True)
display(train_data_df)
# Separating data from the dataframe for final training
X = normalize(train_data_df.drop(["sentiment"], axis=1).to_numpy())
label_encoder = LabelEncoder()
label_encoder.fit(train_data_df.sentiment)
train_data_df.sentiment = label_encoder.transform(train_data_df.sentiment)
y = train_data_df.sentiment.to_numpy()
print(X.shape, y.shape)
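As an aside, hstack from scipy.sparse is imported above but never used. A leaner variant (just a sketch, assuming you skip the dense toarray() merge above so the 'text' and 'sentiment' columns are still in the frame) keeps the TF-IDF matrix sparse, which scales better with vocabulary size:
from scipy.sparse import csr_matrix, hstack
# One-hot/categorical part as a sparse matrix
one_hot_part = csr_matrix(train_data_df.drop(["sentiment", "text"], axis=1).to_numpy(dtype=float))
# Sparse feature matrix: one-hot columns followed by TF-IDF columns
X_sparse = hstack([one_hot_part, X_tfidf])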
Transforming the Test Data for Submission¶
# Apply the same cleaning to test_data_df['text']: lowercase first
test_data_df['text'] = test_data_df['text'].str.lower()
# Then replace everything except letters and numbers with spaces,
# matching the preprocessing used on the training text.
test_data_df['text'] = test_data_df['text'].replace('[^a-zA-Z0-9]', ' ', regex=True)
# Transform the test documents with the already-fitted TfidfVectorizer
X_tfidf_test = vectorizer.transform(test_data_df['text'])
# merging the final features into the DataFrame and removing the redundant columns
test_data_df = pd.concat([test_data_df,pd.DataFrame(X_tfidf_test.toarray())], axis=1)
test_data_df.drop("text", axis=1, inplace=True)
display(test_data_df)
Splitting the data¶
# Splitting the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)
X_train[0], y_train[0]
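Note that without a fixed seed the split changes on every run. If you want reproducible results, with label proportions preserved across both splits, a common variant is:
# random_state fixes the shuffle; stratify keeps class ratios in both splits
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)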
Training the Model¶
model = MLPClassifier()
model.fit(X_train, y_train)
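With default settings, MLPClassifier stops after max_iter=200 iterations and may emit a ConvergenceWarning on TF-IDF features. Raising the iteration budget is a simple first tweak (the values below are illustrative, not tuned):
# Illustrative settings only; tune against the actual challenge metric
model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42)
model.fit(X_train, y_train)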
Validation¶
model.score(X_val, y_val)
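score reports plain accuracy; if the classes are imbalanced, a per-class breakdown is more informative. A quick sketch using sklearn's classification_report:
from sklearn.metrics import classification_report
# Per-class precision/recall/F1 on the validation split
print(classification_report(y_val, model.predict(X_val), target_names=label_encoder.classes_))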
So, we are done with the baseline. Let's run it on the real test data and see how to submit the predictions to the challenge.
Predictions¶
# Separating data from the dataframe for final testing
X_test = normalize(test_data_df.to_numpy())
print(X_test.shape)
# Predicting the labels
predictions = model.predict(X_test)
predictions = label_encoder.inverse_transform(predictions)
# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"sentiment":predictions})
submission
# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
Submitting our Predictions¶
Note: Please save the notebook before submitting it (Ctrl + S).
!!aicrowd submission create -c debat -f assets/submission.csv