Introduction
See the detailed instructions on the course GitHub, including the PDF project description.
Dataset
File descriptions:
- train_pos.txt and train_neg.txt - a small set of training tweets for each of the two classes (available in the zip file; see the link below).
- train_pos_full.txt and train_neg_full.txt - the complete set of training tweets for each of the two classes, about 1M tweets per class (available in the zip file; see the link below).
- test_data.txt - the test set, i.e. the tweets whose sentiment label you have to predict.
- sampleSubmission.csv - a sample submission file in the correct format. Note that each test tweet is numbered. Predictions are -1 for negative and 1 for positive.
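A minimal sketch of reading the numbered test set and writing predictions in the submission format. It assumes that each line of test_data.txt has the form id,tweet and that the submission header is Id,Prediction; both are assumptions, so check them against the actual test_data.txt and sampleSubmission.csv. The function names read_test_data and write_submission are hypothetical.

```python
import csv


def read_test_data(path="test_data.txt"):
    """Return (ids, tweets) from the test file.

    ASSUMPTION: each line is "<id>,<tweet>"; only the first comma is
    treated as the separator, so commas inside the tweet are kept.
    """
    ids, tweets = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            tweet_id, tweet = line.rstrip("\n").split(",", 1)
            ids.append(int(tweet_id))
            tweets.append(tweet)
    return ids, tweets


def write_submission(ids, predictions, path="submission.csv"):
    """Write one row per numbered test tweet with a -1/1 prediction.

    ASSUMPTION: the header is "Id,Prediction" as in sampleSubmission.csv.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Prediction"])
        for tweet_id, pred in zip(ids, predictions):
            if pred not in (-1, 1):
                raise ValueError(f"prediction for tweet {tweet_id} must be -1 or 1, got {pred}")
            writer.writerow([tweet_id, pred])
```

Validating every label before writing catches a malformed file locally, rather than spending one of the daily submissions on it.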
Note that all tweets have already been tokenized, so words and punctuation are separated by whitespace.
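Because the tweets are pre-tokenized, splitting each line on whitespace is enough to recover the tokens. The sketch below loads the two training files into one labeled list, using the same -1/1 labels as the submission format. It assumes one tweet per line; the function names are hypothetical, and the same function works for the *_full.txt files.

```python
def load_tweets(path):
    """Read one tweet per line (ASSUMPTION), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def tokenize(tweet):
    # Tweets are already tokenized, so a plain whitespace split
    # separates words and punctuation.
    return tweet.split()


def load_training_set(pos_path="train_pos.txt", neg_path="train_neg.txt"):
    """Return (tweets, labels) with 1 for positive and -1 for negative."""
    pos = load_tweets(pos_path)
    neg = load_tweets(neg_path)
    tweets = pos + neg
    labels = [1] * len(pos) + [-1] * len(neg)
    return tweets, labels
```

Calling load_training_set("train_pos_full.txt", "train_neg_full.txt") loads the full dataset with the same code.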
Evaluation Criteria
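Your submission will be evaluated by classification accuracy, i.e. the fraction of test tweets whose label you predict correctly. The classification error is 1 minus the accuracy.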
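To estimate the score before submitting, you can compute accuracy on a held-out part of the training data. A minimal sketch with hypothetical helper names:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true -1/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def classification_error(y_true, y_pred):
    """Fraction of wrong predictions, i.e. 1 - accuracy."""
    return 1.0 - accuracy(y_true, y_pred)


# 3 of 4 predictions are correct
print(accuracy([1, -1, 1, -1], [1, -1, -1, -1]))              # 0.75
print(classification_error([1, -1, 1, -1], [1, -1, -1, -1]))  # 0.25
```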
Rules
Each participant may make 5 submissions per day. If you participate as a team, the whole team shares 5 submissions per day, not 15 as the rules page states. Failed submissions (e.g. a wrong submission file format) do not count toward this limit.