
DEBAT

[Getting Started Notebook] DEBAT Challenge

This is baseline code to get you started with the challenge.

gauransh_k

You can use this code to start understanding the data and to create a baseline model that you can improve on.

Starter Code for the DEBAT Practice Challenge

Note: Create a copy of the notebook and use the copy for submission. Go to File > Save a Copy in Drive to create a new copy.

Downloading Dataset

Installing aicrowd-cli

In [1]:
!pip install aicrowd-cli
%load_ext aicrowd.magic
Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/ycr0lzTcSe7Duw_0RaBdj5TPgYcvBuTxKsAiAda5S6c
Opening in existing browser session.
API Key valid
Saved API Key successfully!
In [3]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c debat -o data

Importing Libraries

In this baseline, we will be using the sklearn library to train the model and generate the predictions.

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize, LabelEncoder
from scipy.sparse import hstack
import os
from IPython.display import display

Reading the dataset

Here, we will read the train.csv which contains both training samples & labels, and test.csv which contains testing samples.

In [5]:
# Reading the CSV
# name=["unit_id", "golden_or_not", "unit_state", "trusted_judgments", "last_judgment_at", "agree_or_not_variance", "sentence", "agree_or_not"]
train_data_df = pd.read_csv("data/train.csv")
test_data_df = pd.read_csv("data/test.csv")

# train_data.shape, test_data.shape
display(train_data_df.head())
display(test_data_df.head())
print(train_data_df.shape, test_data_df.shape)
candidate candidate:confidence relevant_yn relevant_yn:confidence sentiment:confidence subject_matter subject_matter:confidence name retweet_count text tweet_created tweet_id sentiment
0 Donald Trump 1.0 yes 1.0 0.6515 Abortion 1.0 jeannetteeee12 236 RT @feministabulous: Trump can say "I hate the... 8/7/15 9:35 6.296920e+17 Negative
1 No candidate mentioned 1.0 yes 1.0 1.0000 LGBT issues 1.0 censoredbee 1 RT @gearhead81: #GOPDebate so the debate for P... 8/7/15 9:48 6.296960e+17 Negative
2 No candidate mentioned 1.0 yes 1.0 0.6905 None of the above 1.0 chvsr 0 #Repeal was the word of the night. #GOPDebate 8/7/15 9:51 6.296960e+17 Neutral
3 Scott Walker 1.0 yes 1.0 0.6279 Jobs and Economy 1.0 RedRoadRail 0 Scott Walker's Wisconsin Is Seeing The Fastest... 8/7/15 9:47 6.296950e+17 Positive
4 Marco Rubio 1.0 yes 1.0 1.0000 None of the above 1.0 craigcarroll 0 #GOPDebate\nMegyn Kelly:A+\nRubio:A\nFiorina:A... 8/7/15 9:45 6.296950e+17 Positive
candidate candidate:confidence relevant_yn relevant_yn:confidence sentiment:confidence subject_matter subject_matter:confidence name retweet_count text tweet_created tweet_id
0 No candidate mentioned 1.0 yes 1.0 1.0000 Foreign Policy 0.3571 jazzicattt 276 RT @amaraconda: ISIS is not islam jfc republic... 8/7/15 9:41 6.296940e+17
1 Donald Trump 1.0 yes 1.0 1.0000 None of the above 1.0000 4closureNation2 1216 RT @pattonoswalt: If Trump is against "p.c. cu... 8/7/15 9:52 6.296970e+17
2 No candidate mentioned 1.0 yes 1.0 0.6484 None of the above 1.0000 JamesComtois 0 Dorp. #GOPDebate https://t.co/cy9C8nvS5y 8/7/15 9:33 6.296920e+17
3 Donald Trump 1.0 yes 1.0 1.0000 Women's Issues (not abortion though) 1.0000 corinne_fal 0 Donald Trump just gave a master class on how t... 8/7/15 9:38 6.296930e+17
4 No candidate mentioned 1.0 yes 1.0 0.6813 Religion 0.6703 DeityFree 12 RT @MrPolyatheist: Because you know, god is re... 8/7/15 9:45 6.296950e+17
(1764, 13) (441, 12)

Data Preprocessing

In the preprocessing we have a lot of textual and categorical data, so we will first one-hot encode the categorical features and then use TF-IDF tokens to turn each tweet's text into numerical features that the classifier can use.
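
As a quick, self-contained toy illustration of both ideas (not part of the baseline pipeline; the real columns are handled in the cells below):

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# a categorical column becomes one indicator column per category
toy = pd.DataFrame({"candidate": ["Donald Trump", "Jeb Bush", "Donald Trump"]})
print(pd.get_dummies(toy["candidate"]))

# a text column becomes a (documents x vocabulary) matrix of TF-IDF weights
docs = ["the debate was great", "the debate was boring"]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs)
print(sorted(vec.vocabulary_))     # learned vocabulary terms
print(tfidf.toarray().round(2))    # one row per document, one column per term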

In [6]:
# removing some unnecessary columns
train_data_df.drop(['tweet_id', 'tweet_created', 'name'], axis=1, inplace=True)
test_data_df.drop(['tweet_id', 'tweet_created', 'name'], axis=1, inplace=True)
In [7]:
# utility function to one hot encode the dataset
def one_hot_df(df):
    df = pd.concat([df, pd.get_dummies(df["candidate"])],axis=1)
    df.drop("candidate",axis=1, inplace=True)
    df = pd.concat([df, pd.get_dummies(df["subject_matter"])],axis=1)
    df.drop("subject_matter",axis=1, inplace=True)
    df = pd.concat([df, pd.get_dummies(df["relevant_yn"])],axis=1)
    df.drop("relevant_yn",axis=1, inplace=True)
    return df
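
One caveat with this approach: pd.get_dummies is applied to the train and test frames independently, so if a candidate or subject appeared in only one split, the two frames would end up with different columns. It happens to line up for this dataset, but a defensive variant (a hypothetical helper, not used in this baseline) could reindex the test frame to the training feature columns:

def align_columns(train_df, test_df, label_col="sentiment"):
    # keep exactly the training feature columns, filling any missing dummies with 0
    feature_cols = [c for c in train_df.columns if c != label_col]
    return test_df.reindex(columns=feature_cols, fill_value=0)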
In [8]:
train_data_df = one_hot_df(train_data_df)
test_data_df = one_hot_df(test_data_df)
display(train_data_df)
display(test_data_df)
candidate:confidence relevant_yn:confidence sentiment:confidence subject_matter:confidence retweet_count text sentiment Ben Carson Chris Christie Donald Trump ... Gun Control Healthcare (including Medicare) Immigration Jobs and Economy LGBT issues None of the above Racial issues Religion Women's Issues (not abortion though) yes
0 1.0000 1.0000 0.6515 1.0000 236 RT @feministabulous: Trump can say "I hate the... Negative 0 0 1 ... 0 0 0 0 0 0 0 0 0 1
1 1.0000 1.0000 1.0000 1.0000 1 RT @gearhead81: #GOPDebate so the debate for P... Negative 0 0 0 ... 0 0 0 0 1 0 0 0 0 1
2 1.0000 1.0000 0.6905 1.0000 0 #Repeal was the word of the night. #GOPDebate Neutral 0 0 0 ... 0 0 0 0 0 1 0 0 0 1
3 1.0000 1.0000 0.6279 1.0000 0 Scott Walker's Wisconsin Is Seeing The Fastest... Positive 0 0 0 ... 0 0 0 1 0 0 0 0 0 1
4 1.0000 1.0000 1.0000 1.0000 0 #GOPDebate\nMegyn Kelly:A+\nRubio:A\nFiorina:A... Positive 0 0 0 ... 0 0 0 0 0 1 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1759 1.0000 1.0000 0.6745 1.0000 124 RT @CharlotteAbotsi: Today is the 50th anniver... Neutral 0 0 0 ... 0 0 0 0 0 1 0 0 0 1
1760 1.0000 1.0000 1.0000 1.0000 9 RT @mrdaveyd: So donald Trump says everyone is... Negative 0 0 1 ... 0 0 0 0 0 1 0 0 0 1
1761 1.0000 1.0000 1.0000 1.0000 27 RT @TUSK81: Not one candidate spoke up against... Negative 0 0 1 ... 0 0 0 0 0 0 1 0 0 1
1762 1.0000 1.0000 0.6517 1.0000 0 established #GOPDebate afraid someone willing ... Negative 0 0 0 ... 0 0 0 0 0 1 0 0 0 1
1763 0.4589 0.6774 0.6774 0.4589 0 ReTw EmotientInc: Emotion-Reading Technology F... Neutral 0 0 0 ... 0 0 0 0 0 1 0 0 0 1

1764 rows × 31 columns

candidate:confidence relevant_yn:confidence sentiment:confidence subject_matter:confidence retweet_count text Ben Carson Chris Christie Donald Trump Jeb Bush ... Gun Control Healthcare (including Medicare) Immigration Jobs and Economy LGBT issues None of the above Racial issues Religion Women's Issues (not abortion though) yes
0 1.0000 1.0000 1.0000 0.3571 276 RT @amaraconda: ISIS is not islam jfc republic... 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
1 1.0000 1.0000 1.0000 1.0000 1216 RT @pattonoswalt: If Trump is against "p.c. cu... 0 0 1 0 ... 0 0 0 0 0 1 0 0 0 1
2 1.0000 1.0000 0.6484 1.0000 0 Dorp. #GOPDebate https://t.co/cy9C8nvS5y 0 0 0 0 ... 0 0 0 0 0 1 0 0 0 1
3 1.0000 1.0000 1.0000 1.0000 0 Donald Trump just gave a master class on how t... 0 0 1 0 ... 0 0 0 0 0 0 0 0 1 1
4 1.0000 1.0000 0.6813 0.6703 12 RT @MrPolyatheist: Because you know, god is re... 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
436 1.0000 1.0000 1.0000 1.0000 0 17 candidates + #Trump makes the #GOPDebate in... 0 0 1 0 ... 0 0 0 0 0 1 0 0 0 1
437 1.0000 1.0000 1.0000 1.0000 802 RT @DanScavino: .@realDonaldTrump wins 1st #GO... 0 0 1 0 ... 0 0 0 0 0 1 0 0 0 1
438 1.0000 1.0000 0.6435 1.0000 0 Trump's lead shows just how low the Bar is for... 0 0 1 0 ... 0 0 0 0 0 1 0 0 0 1
439 1.0000 1.0000 1.0000 0.6633 4 RT @pppatticake: . @jojokejohn @milesjreed @Mi... 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
440 0.4698 0.6854 0.3483 0.4698 0 Who's the real illegal alien AMERICA #GOPDeba... 0 0 0 0 ... 0 0 1 0 0 0 0 0 0 1

441 rows × 30 columns

Transforming the Train Data

In [9]:
# First, transform train_data_df['text'] to lowercase using str.lower()
train_data_df['text'] = train_data_df['text'].str.lower()

# Then replace everything except letters and numbers with spaces;
# this makes the later splitting of the text into words cleaner.
train_data_df['text'] = train_data_df['text'].replace('[^a-zA-Z0-9]', ' ', regex=True)

# Convert the collection of raw documents to a matrix of TF-IDF features with TfidfVectorizer
# (min_df=5 drops terms that appear in fewer than 5 tweets)
vectorizer = TfidfVectorizer(min_df=5)
X_tfidf = vectorizer.fit_transform(train_data_df['text'])

# merging the final features into the DataFrame and removing the redundant text column
train_data_df = pd.concat([train_data_df, pd.DataFrame(X_tfidf.toarray())], axis=1)
train_data_df.drop("text", axis=1, inplace=True)
display(train_data_df)
candidate:confidence relevant_yn:confidence sentiment:confidence subject_matter:confidence retweet_count sentiment Ben Carson Chris Christie Donald Trump Jeb Bush ... 764 765 766 767 768 769 770 771 772 773
0 1.0000 1.0000 0.6515 1.0000 236 Negative 0 0 1 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 1.0000 1.0000 1.0000 1.0000 1 Negative 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 1.0000 1.0000 0.6905 1.0000 0 Neutral 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 1.0000 1.0000 0.6279 1.0000 0 Positive 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 1.0000 1.0000 1.0000 1.0000 0 Positive 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1759 1.0000 1.0000 0.6745 1.0000 124 Neutral 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1760 1.0000 1.0000 1.0000 1.0000 9 Negative 0 0 1 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1761 1.0000 1.0000 1.0000 1.0000 27 Negative 0 0 1 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1762 1.0000 1.0000 0.6517 1.0000 0 Negative 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1763 0.4589 0.6774 0.6774 0.4589 0 Neutral 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

1764 rows × 804 columns

In [10]:
# Separating the features and the label from the dataframe for final training
X = normalize(train_data_df.drop(["sentiment"], axis=1).to_numpy())

# encode the sentiment strings as integer class labels
label_encoder = LabelEncoder()
label_encoder = label_encoder.fit(train_data_df.sentiment)
train_data_df.sentiment = label_encoder.transform(train_data_df.sentiment)
y = train_data_df.sentiment.to_numpy()
print(X.shape, y.shape)
(1764, 803) (1764,)
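
LabelEncoder assigns integer codes in sorted order of the class names, so with the three sentiments in this dataset the mapping should be Negative → 0, Neutral → 1, Positive → 2. You can verify it directly:

# mapping from sentiment string to the integer code used for training
print(dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_))))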

Transforming the Test Data for Submission

In [11]:
# First, transform test_data_df['text'] to lowercase using str.lower()
test_data_df['text'] = test_data_df['text'].str.lower()

# Then replace everything except letters and numbers with spaces;
# this makes the later splitting of the text into words cleaner.
test_data_df['text'] = test_data_df['text'].replace('[^a-zA-Z0-9]', ' ', regex=True)

# Convert the test documents to TF-IDF features, reusing the vectorizer fitted on the train data
X_tfidf_test = vectorizer.transform(test_data_df['text'])

# merging the final features into the DataFrame and removing the redundant text column
test_data_df = pd.concat([test_data_df, pd.DataFrame(X_tfidf_test.toarray())], axis=1)
test_data_df.drop("text", axis=1, inplace=True)
display(test_data_df)
candidate:confidence relevant_yn:confidence sentiment:confidence subject_matter:confidence retweet_count Ben Carson Chris Christie Donald Trump Jeb Bush John Kasich ... 764 765 766 767 768 769 770 771 772 773
0 1.0000 1.0000 1.0000 0.3571 276 0 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0
1 1.0000 1.0000 1.0000 1.0000 1216 0 0 1 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0
2 1.0000 1.0000 0.6484 1.0000 0 0 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0
3 1.0000 1.0000 1.0000 1.0000 0 0 0 1 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0
4 1.0000 1.0000 0.6813 0.6703 12 0 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.201117 0.0 0.000000 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
436 1.0000 1.0000 1.0000 1.0000 0 0 0 1 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0
437 1.0000 1.0000 1.0000 1.0000 802 0 0 1 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0
438 1.0000 1.0000 0.6435 1.0000 0 0 0 1 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0
439 1.0000 1.0000 1.0000 0.6633 4 0 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0
440 0.4698 0.6854 0.3483 0.4698 0 0 0 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.275911 0.0

441 rows × 803 columns

Splitting the data

In [12]:
# Splitting the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)
(1411, 803)
(1411,)
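
Note that train_test_split is called without a seed here, so the split (and the validation score later on) will differ between runs. If you want a reproducible, class-balanced split, a common variant is (a sketch; the baseline leaves the split unseeded):

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)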
In [13]:
X_train[0], y_train[0]
Out[13]:
(array([0.35355339, 0.35355339, 0.35355339, 0.35355339, 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.35355339, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.35355339,
        0.        , 0.        , 0.        , 0.35355339, 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.09014736, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.02682314, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.10007157, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.08005099,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.06134699, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.101034  , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.11254785,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.12372076, 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.16187724, 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.04281326, 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.09326223, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.14509054, 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        ]),
 0)

Training the Model

In [14]:
model = MLPClassifier()
model.fit(X_train, y_train)
/home/gauransh/anaconda3/lib/python3.8/site-packages/sklearn/neural_network/_multilayer_perceptron.py:614: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
  warnings.warn(
Out[14]:
MLPClassifier()
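
The ConvergenceWarning above means the optimizer hit the default cap of 200 iterations before the loss plateaued. A simple first thing to experiment with (a sketch, not tuned for this data) is to allow more iterations and fix the random seed:

# hypothetical tweak: raise the iteration cap and seed the MLP for repeatable results
model = MLPClassifier(max_iter=1000, random_state=42)
model.fit(X_train, y_train)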

Validation

In [15]:
model.score(X_val, y_val)
Out[15]:
0.5835694050991501
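
Accuracy alone hides how the model behaves on each class. For a more detailed validation view you could print scikit-learn's per-class report (a sketch, reusing the label_encoder fitted above):

from sklearn.metrics import classification_report

y_val_pred = model.predict(X_val)
print(classification_report(y_val, y_val_pred, target_names=label_encoder.classes_))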

So, we are done with the baseline. Let's run it on the real test data and see how to submit the results to the challenge.

Predictions

In [16]:
# Separating data from the dataframe for final testing
X_test = normalize(test_data_df.to_numpy())
print(X_test.shape)
(441, 803)
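
Before predicting, it is worth checking that the test matrix has exactly the same number of feature columns as the training matrix; a mismatch (for example, from the one-hot caveat mentioned earlier) would silently misalign features. A minimal sanity check:

# the model was trained on X, so the test features must have the same width
assert X_test.shape[1] == X.shape[1], "train/test feature columns are misaligned"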
In [17]:
# Predicting the labels
predictions = model.predict(X_test)
predictions = label_encoder.inverse_transform(predictions)
In [18]:
# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"sentiment":predictions})
submission
Out[18]:
sentiment
0 Negative
1 Negative
2 Negative
3 Negative
4 Negative
... ...
436 Positive
437 Negative
438 Negative
439 Neutral
440 Neutral

441 rows × 1 columns

In [19]:
# Saving the submission DataFrame to assets/submission.csv
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)

Submitting our Predictions

Note: Please save the notebook before submitting it (Ctrl + S).

In [20]:
!!aicrowd submission create -c debat -f assets/submission.csv
Out[20]:
['submission.csv ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 5.6/3.9 KB • ? • 0:00:00',
 '                                  ╭─────────────────────────╮                                  ',
 '                                  │ Successfully submitted! │                                  ',
 '                                  ╰─────────────────────────╯                                  ',
 '                                        Important links                                        ',
 '┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐',
 '│  This submission │ https://www.aicrowd.com/challenges/debat/submissions/172195              │',
 '│                  │                                                                          │',
 '│  All submissions │ https://www.aicrowd.com/challenges/debat/submissions?my_submissions=true │',
 '│                  │                                                                          │',
 '│      Leaderboard │ https://www.aicrowd.com/challenges/debat/leaderboards                    │',
 '│                  │                                                                          │',
 '│ Discussion forum │ https://discourse.aicrowd.com/c/debat                                    │',
 '│                  │                                                                          │',
 '│   Challenge page │ https://www.aicrowd.com/challenges/debat                                 │',
 '└──────────────────┴──────────────────────────────────────────────────────────────────────────┘',
 "{'submission_id': 172195, 'created_at': '2022-01-16T15:33:36.629Z'}"]
In [ ]:

