Introduction
In the previous puzzle, Emotional Detection, you performed a binary classification task. With this puzzle, we are leveling up to multi-class classification. Your input dataset consists of abstracts taken from research papers, and you need to build a model that correctly assigns each abstract a label from 0 to 3.
To solve this challenge, you will use LSTMs and word vectorization, implemented with TensorFlow.
Now, what is LSTM?!
Getting Started
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to learn from sequential data such as text. Unlike feedforward neural networks, an LSTM has feedback connections, and its gated cell state lets it retain information from earlier in a sequence (for example, the previous words of a sentence) while it processes later parts. Read more about the concept of LSTM here.
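To make the "feedback connection" and "retained information" concrete, here is a minimal NumPy sketch of a single standard LSTM cell stepped over a sequence. The function names, the order in which the four gate blocks are stacked inside `W`, `U` and `b`, and the sizes in the usage lines are illustrative choices, not part of the challenge's starter kit; in practice you would use TensorFlow's built-in `tf.keras.layers.LSTM` instead of writing the cell by hand.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x_t: (D,) input vector; h_prev, c_prev: (H,) previous hidden and cell state.
    W: (4H, D), U: (4H, H), b: (4H,), gate blocks stacked as
    [input, forget, output, candidate] (an illustrative ordering).
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b   # the U @ h_prev term is the feedback connection
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much memory to expose
    g = np.tanh(z[3 * H:])         # candidate values to write into memory
    c_t = f * c_prev + i * g       # cell state carries information across time steps
    h_t = o * np.tanh(c_t)         # hidden state passed on to the next step
    return h_t, c_t

def lstm_forward(xs, W, U, b):
    """Run the cell over a sequence xs of shape (T, D); return the final h and c."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h, c

# Usage: a random 5-step sequence of 8-dimensional word vectors, hidden size 4.
rng = np.random.default_rng(0)
D, H, T = 8, 4, 5
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h_final, c_final = lstm_forward(rng.normal(size=(T, D)), W, U, b)
print(h_final.shape)  # (4,)
```

The forget gate `f` is what lets the cell hold on to earlier information: when it is close to 1 and the input gate is close to 0, `c_t` is essentially a copy of `c_prev`.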
Word vectorization is the second technique used in this challenge. Simply put, it converts words into numbers. Why? Neural networks can only operate on numbers, not raw strings, and numeric word representations also help with word prediction, word similarity, and semantics. Learn more about the concept here.
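As a rough illustration of what vectorization does, the pure-Python sketch below builds a word-to-integer vocabulary from a list of texts and turns a text into a fixed-length list of ids. It is a simplified stand-in for a layer such as TensorFlow's `TextVectorization`; the tokenizer, the helper names, and the choice to reserve id 0 for padding and id 1 for unknown words (which mirrors that layer's default) are assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase, then split on runs of characters that are not letters or digits.
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

def build_vocab(texts, max_words=None):
    # Id 0 is reserved for padding and id 1 for out-of-vocabulary words;
    # the remaining ids go to the most frequent words first.
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {"": 0, "[UNK]": 1}
    for tok, _ in counts.most_common(max_words):
        vocab[tok] = len(vocab)
    return vocab

def vectorize(text, vocab, seq_len):
    # Map each token to its id (1 if unknown), then truncate or zero-pad to seq_len.
    ids = [vocab.get(tok, 1) for tok in tokenize(text)][:seq_len]
    return ids + [0] * (seq_len - len(ids))

# Usage
texts = ["Robots learn to grasp objects", "Neural networks learn representations"]
vocab = build_vocab(texts)
print(vectorize("Robots learn vision", vocab, 5))  # [3, 2, 1, 0, 0]
```

Here "learn" gets id 2 because it is the most frequent word, and "vision" maps to the unknown id 1 because it never appeared in the texts the vocabulary was built from.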
To solve this challenge, you need to split the text into tokens and encode them as numbers using vectorization. You then train a TensorFlow model with LSTM layers on the encoded text, and finally test it and submit the results to get your score.
AIcrowd's easy-to-use baseline walks through all the tools and code required to get started. Find the starter kit here.
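Putting the two steps together, the sketch below vectorizes raw text with `tf.keras.layers.TextVectorization` and feeds the integer ids into an Embedding, LSTM, Dense model with a 4-way softmax. It assumes TensorFlow 2.6 or later; the vocabulary cap, sequence length, layer widths, helper names, and the four toy sentences with placeholder labels are illustrative assumptions, not values taken from the baseline.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 4     # labels 0 to 3
MAX_TOKENS = 20000  # assumed cap on the vocabulary size
SEQ_LEN = 200       # assumed: every abstract is padded/truncated to 200 tokens

def make_vectorizer(train_texts):
    """Learn a vocabulary from the training texts and return the fitted layer."""
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=MAX_TOKENS, output_sequence_length=SEQ_LEN)
    vectorizer.adapt(np.array(train_texts))
    return vectorizer

def make_model(vocab_size):
    """Embedding -> LSTM -> softmax over the four categories."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(SEQ_LEN,), dtype="int64"),
        # mask_zero=True makes the LSTM skip the padding id 0
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=64, mask_zero=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # integer labels 0-3
                  metrics=["accuracy"])
    return model

# Usage on a tiny made-up corpus; real training would use train.csv.
texts = ["robots grasp objects", "convolutional networks segment images",
         "agents plan with search", "gradient descent trains models"]
labels = np.array([0, 1, 2, 3])        # placeholder labels for this demo only
vectorizer = make_vectorizer(texts)
x = vectorizer(tf.constant(texts))     # int64 tensor of shape (4, SEQ_LEN)
model = make_model(vectorizer.vocabulary_size())
model.fit(x, labels, epochs=1, verbose=0)
probs = model.predict(x, verbose=0)    # shape (4, NUM_CLASSES)
pred_labels = probs.argmax(axis=1)
```

Keeping the vectorizer outside the model keeps the model's input a plain integer tensor; the same fitted `vectorizer` then has to be applied to the validation and test texts before calling `model.predict`.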
Dataset
The dataset is, again, fairly easy to understand! Both the training and validation sets have two columns: text and label. The text column holds the abstract of a research paper, and the label column holds the category that the paper falls into.
| text | label |
|---|---|
| Estimating 3D hand meshes from single RGB ...... Each technical component above meaningfully improves the accuracy in the ablation study. | 2 |
| The emergence of collective ...... classes and overlapping structures of data. | 0 |
The four label categories are Artificial Intelligence, Machine Learning, Robotics, and Computer Vision.
Files
The following files are available in the resources section:
- train.csv (31499 samples): a CSV file with a text column containing the abstract and a label column containing the category of the research paper.
- val.csv (2699 samples): a CSV file with a text column containing the abstract and a label column containing the category of the research paper.
- test.csv (10799 samples): a CSV file with a text column containing the abstract and a label column for the category of the research paper, which you fill in with your predictions. This file also serves the purpose of sample_submission.csv.
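A small pandas helper along these lines can load each split and catch format problems early. The function name `load_split`, the checks it performs, and the in-memory demo CSV are illustrative assumptions; only the column names `text` and `label` and the 0 to 3 label range come from the description above.

```python
import io

import pandas as pd

def load_split(path_or_buffer, require_labels=True):
    """Read one split (e.g. train.csv or val.csv) and check its columns."""
    df = pd.read_csv(path_or_buffer)
    if "text" not in df.columns:
        raise ValueError("expected a 'text' column")
    if require_labels:
        if "label" not in df.columns:
            raise ValueError("expected a 'label' column")
        if not df["label"].between(0, 3).all():
            raise ValueError("labels should be in the range 0 to 3")
    return df

# Usage: with the real files this would be load_split("train.csv"), and
# load_split("test.csv", require_labels=False) for the unlabeled test split.
# Here an in-memory CSV stands in for a real file.
demo = io.StringIO('text,label\n"An abstract about robot arms.",2\n"Another abstract.",0\n')
df = load_split(demo)
texts, labels = df["text"].tolist(), df["label"].to_numpy()
```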
Submission
- Create a submission directory.
- Use test.csv and fill in the corresponding labels.
- Save the filled test.csv in the submission directory under the name submission.csv.
- Inside the submission directory, also put the .ipynb notebook you used to train the model and make inferences, saved as original_notebook.ipynb.

Overall, your submission directory should contain submission.csv and original_notebook.ipynb.
- Zip the submission directory!
Make your first submission here!!
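The submission steps can be scripted roughly as follows. This is a sketch under assumptions: `make_submission` is a hypothetical helper, `predicted_labels` stands for your model's predictions on test.csv, and the zip is assumed to contain the `submission` folder itself (the layout `zip -r submission.zip submission` would produce); check the starter kit if the grader expects a different layout.

```python
import shutil
from pathlib import Path

import pandas as pd

def make_submission(test_df, predicted_labels, notebook_path, out_dir="submission"):
    """Hypothetical helper: build the submission folder and zip it."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)               # create the submission directory
    filled = test_df.copy()
    filled["label"] = list(predicted_labels)             # fill in the predicted labels
    filled.to_csv(out / "submission.csv", index=False)   # save test.csv as submission.csv
    shutil.copy(notebook_path, out / "original_notebook.ipynb")  # add the notebook
    # Zip the folder itself, giving entries such as submission/submission.csv.
    return shutil.make_archive(str(out), "zip", root_dir=str(out.parent), base_dir=out.name)

# Usage inside your notebook, e.g.:
# test_df = pd.read_csv("test.csv")
# archive = make_submission(test_df, predicted_labels, "my_notebook.ipynb")
```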
Evaluation Criteria
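During evaluation, the weighted-average F1 score and the accuracy score are used to measure the model's performance, where, for each class k with precision P_k, recall R_k, and n_k true samples out of N samples in total,
\(\mathrm{F1}_k = \frac{2 P_k R_k}{P_k + R_k}, \qquad \mathrm{F1}_{\text{weighted}} = \sum_{k=0}^{3} \frac{n_k}{N} \, \mathrm{F1}_k, \qquad \text{Accuracy} = \frac{\text{number of correct predictions}}{N}\)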
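For a quick local check on val.csv, the two metrics can be computed from their standard definitions. The pure-Python sketch below is illustrative (the function names are ours, and a class with zero predicted or zero true samples is assigned an F1 of 0); scikit-learn's `f1_score(y_true, y_pred, average="weighted")` and `accuracy_score(y_true, y_pred)` compute the same quantities, although whether the official grader uses scikit-learn is not stated here.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    # Fraction of predictions that exactly match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    # Per-class F1, weighted by each class's support (its count in y_true).
    support = Counter(y_true)
    n = len(y_true)
    total = 0.0
    for k, n_k in support.items():
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = n_k - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += (n_k / n) * f1
    return total

# Usage
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(accuracy(y_true, y_pred), 4))     # 0.6667
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6
```

In this example the per-class F1 scores are 0.5, 0.8, 1.0 and 0.0 with supports 2, 2, 1 and 1, so the weighted average is (2·0.5 + 2·0.8 + 1·1.0 + 1·0.0) / 6 = 0.6.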
Links
- Challenge Page: https://www.aicrowd.com/challenges/research-paper-classification
- Discussion Forum: https://www.aicrowd.com/challenges/research-paper-classification/discussion
- Leaderboard: https://www.aicrowd.com/challenges/research-paper-classification/leaderboards