2019: Completed 2020: Completed 2021: Completed 2022: Completed 2023: Completed 2024: 16 hours left · Ending 19 Dec 15:00 UTC #classroom

Introduction

See the detailed instructions on the course GitHub, including the PDF project description.

Dataset

File descriptions:

  • train_pos.txt and train_neg.txt - a small set of training tweets for each of the two classes. (The dataset is available in the zip file; see link below.)
  • train_pos_full.txt and train_neg_full.txt - the complete set of training tweets for each of the two classes, about 1M tweets per class. (The dataset is available in the zip file; see link below.)
  • test_data.txt - the test set, that is, the tweets for which you have to predict the sentiment label.
  • sampleSubmission.csv - a sample submission file in the correct format. Note that each test tweet is numbered. Predictions are submitted as -1 (negative) or 1 (positive).
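
As a minimal sketch of producing a submission file, the helper below writes one row per test tweet with its number and a label of -1 or 1. The function name write_submission, the column names "Id" and "Prediction", and numbering that starts at 1 are assumptions made for illustration; check them against sampleSubmission.csv before submitting.

```python
import csv
from pathlib import Path


def write_submission(predictions, path, header=("Id", "Prediction")):
    """Write one CSV row per test tweet: its number and a label in {-1, 1}.

    Assumptions (not confirmed by the task description): the header names
    and that test tweets are numbered from 1 in file order.
    """
    labels = list(predictions)
    for label in labels:
        if label not in (-1, 1):
            raise ValueError(f"label must be -1 or 1, got {label!r}")
    with Path(path).open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # assumed column names
        for tweet_id, label in enumerate(labels, start=1):
            writer.writerow([tweet_id, label])
```

For example, write_submission([1, -1, 1], "submission.csv") writes a header row followed by three numbered predictions. Rejecting any label other than -1 or 1 up front may help avoid a failed submission caused by a wrong file format.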

Note that all tweets have been tokenized already, so that the words and punctuation are properly separated by a whitespace.
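
Because the tweets are already tokenized, splitting on whitespace is enough to recover the tokens. The sketch below shows this together with a simple token count; the helper names tokenize and build_vocab and the min_count parameter are hypothetical and are not part of the provided material.

```python
from collections import Counter


def tokenize(tweet):
    """Split a pre-tokenized tweet on whitespace."""
    return tweet.split()


def build_vocab(tweets, min_count=1):
    """Count tokens across tweets, keeping those seen at least min_count times."""
    counts = Counter(token for tweet in tweets for token in tokenize(tweet))
    return {token: n for token, n in counts.items() if n >= min_count}
```

Raising min_count is one simple way to keep the vocabulary manageable on the full training set of about 1M tweets per class.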

Evaluation Criteria

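Your submission will be evaluated by classification accuracy, that is, the fraction of test tweets whose sentiment label is predicted correctly (equivalently, one minus the classification error).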
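
Accuracy is the fraction of tweets whose predicted label matches the true label. A minimal sketch for checking it on a held-out part of the training data follows; the function name accuracy is chosen here for illustration, and the official scoring code is not shown in this description.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where the predicted label equals the true one."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

For instance, with true labels [1, -1, 1, 1] and predictions [1, -1, -1, 1], three of four labels match, so the accuracy is 0.75 and the classification error is 0.25.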

Rules

Each participant is allowed to make 5 submissions per day. If you participate as a team, the whole team shares 5 submissions per day, not 15 as the rules page states. Failed submissions (e.g. a wrong submission file format) do not count.