De-Shuffling Text
Solution for submission 147097
A detailed solution for submission 147097 submitted for challenge De-Shuffling Text
Downloading Dataset¶
AIcrowd recently added the ability to download the dataset of any challenge directly using the AIcrowd CLI.
So we will first need to install AIcrowd's Python library, which lets us download the dataset by just entering our API key.
!pip install aicrowd-cli
Collecting aicrowd-cli and its dependencies (gitpython, requests-toolbelt, rich, requests, tqdm, gitdb, colorama, commonmark, smmap) ... (download logs omitted)
ERROR: google-colab 1.0.0 has requirement requests~=2.23.0, but you'll have requests 2.25.1 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
Successfully installed aicrowd-cli-0.1.7 colorama-0.4.4 commonmark-0.9.1 gitdb-4.0.7 gitpython-3.1.14 requests-2.25.1 requests-toolbelt-0.9.1 rich-10.3.0 smmap-4.0.0 tqdm-4.61.1
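The login cell itself is missing from this export; in AIcrowd starter notebooks it typically looks like the sketch below (the API key placeholder is yours to fill in from your AIcrowd profile page):
API_KEY = ""  # Please enter your API key from https://www.aicrowd.com/participants/me
!aicrowd login --api-key $API_KEY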
API Key valid
Saved API Key successfully!
# Downloading the Dataset
!rm -rf data
!mkdir data
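The download cell itself is also missing from this export; based on similar AIcrowd notebooks it would look roughly like this (the challenge slug and flags are assumptions):
!aicrowd dataset download --challenge de-shuffling-text -j 3 -o data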
val.csv: 100% 714k/714k [00:00<00:00, 6.59MB/s]
test.csv: 100% 1.83M/1.83M [00:00<00:00, 11.4MB/s]
train.csv: 100% 7.00M/7.00M [00:00<00:00, 34.0MB/s]
Downloading & Importing Libraries¶
Here we are going to use Hugging Face 🤗! Hugging Face is a fast-growing startup focused on Natural Language Processing. It provides many open source NLP libraries, including transformers, for using state-of-the-art transformer models, and datasets, which bundles many NLP datasets along with evaluation metrics.
# Installing
!pip install datasets transformers rich
Collecting datasets and transformers, plus their dependencies (fsspec, xxhash, tqdm, huggingface-hub, sacremoses, tokenizers) ... (download logs omitted)
ERROR: aicrowd-cli 0.1.7 has requirement tqdm<5,>=4.56.0, but you'll have tqdm 4.49.0 which is incompatible.
ERROR: transformers 4.6.1 has requirement huggingface-hub==0.0.8, but you'll have huggingface-hub 0.0.10 which is incompatible.
Successfully installed datasets-1.8.0 fsspec-2021.6.0 huggingface-hub-0.0.10 sacremoses-0.0.45 tokenizers-0.10.3 tqdm-4.49.0 transformers-4.6.1 xxhash-2.0.2
# Importing Libraries
import pandas as pd
import numpy as np
import os
import torch
import datasets
from datasets import load_dataset
from transformers import EncoderDecoderModel, EncoderDecoderConfig, BertTokenizerFast, Seq2SeqTrainingArguments, Seq2SeqTrainer, BertConfig
# To make cell output more beautiful!
from rich.console import Console
from rich.table import Table
from rich import pretty
pretty.install()
# function to display YouTube videos
from IPython.display import YouTubeVideo
from transformers import RobertaTokenizerFast, RobertaConfig
Reading Dataset¶
Reading the files we need for training and validation, and the test file we will use to submit our results!
train_dataset = pd.read_csv("data/train.csv")
validation_dataset = pd.read_csv("data/val.csv")
test_dataset = pd.read_csv("data/test.csv")
train_dataset
| | text | label |
|---|---|---|
| 0 | presented here Furthermore, naive improved. im... | Furthermore, the naive implementation presente... |
| 1 | vector a in a form vector multidimensional spa... | Those coefficients form a vector in a multidim... |
| 2 | compatible of The model with recent is model s... | The model is compatible with a recent model of... |
| 3 | but relevance outlined. hemodynamics its based... | The model is based on electrophysiology, but i... |
| 4 | of transitions lever-like involve reorientatio... | Conformational transitions in macromolecular c... |
| ... | ... | ... |
| 39996 | for pose a clutter. estimation autonomous mani... | Object pose estimation is a crucial prerequisi... |
| 39997 | objects added warehouses bin-picking present s... | Real-world bin-picking settings such as wareho... |
| 39998 | validation The proposed real-world on is metho... | The proposed method is evaluated on a syntheti... |
| 39999 | a between modes. statistics survival qualitati... | This breakdown is associated with a crossover ... |
| 40000 | a image super of resolution great low-level vi... | Single image super resolution is of great impo... |
40001 rows × 2 columns
As we can see, the text column contains sentences with shuffled words, and the label column contains the corresponding sentences in the correct order.
In the cell below, we use Hugging Face's datasets library to load our dataset. As you will see, this helps a lot in preprocessing the texts and in creating batches to feed into the transformer model!
# Loading the training, validation and testing dataset
dataset = load_dataset('csv', data_files={"train" : ["data/train.csv"],
"validation": ["data/val.csv"],
"test" : ["data/test.csv"]})
dataset
Using custom data configuration default-e7ff9ec3ded62018
Downloading and preparing dataset csv/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/csv/default-e7ff9ec3ded62018/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0...
Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-e7ff9ec3ded62018/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0. Subsequent calls will reuse this data.
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 40001
    })
    validation: Dataset({
        features: ['label', 'text'],
        num_rows: 4001
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 10000
    })
})
Preprocessing the dataset 🏭¶
There is a lot going on in the cell below, so let's break it down piece by piece!
Understanding the Tokenizer¶
We have already talked about what a tokenizer is, but let's cover it again briefly; we will also pick up a few new things along the way, so let's start -
# We import our tokenizer using the transformers library. Here xlm-roberta-base is a pretrained multilingual RoBERTa model
# So, we will be using xlm-roberta-base as our pretrained model for tokenization
tokenizer = RobertaTokenizerFast.from_pretrained("xlm-roberta-base")
tokenizer
PreTrainedTokenizerFast(name_or_path='xlm-roberta-base', vocab_size=250002, model_max_len=1000000000000000019884624838656, is_fast=True, padding_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=False)})
sample_text = train_dataset['label'][0]
sample_text
'Furthermore, the naive implementation presented here can be improved.'
# Splitting the input sentence into subword tokens
tokens = tokenizer.tokenize(sample_text)
# Converting each token into a specific number, using the vocabulary mapping of the pretrained model
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print("Sample Text : ", sample_text)
print("Tokens : ", tokens)
print("Token Ids : ", token_ids)
Sample Text :  Furthermore, the naive implementation presented here can be improved.
Tokens :  ['▁Fur', 'ther', 'more', ',', '▁the', '▁na', 'ive', '▁implementation', '▁presente', 'd', '▁here', '▁can', '▁be', '▁improve', 'd', '.']
Token Ids :  [27766, 9319, 17678, 4, 70, 24, 5844, 208124, 8121, 71, 3688, 831, 186, 52295, 71, 5]
# To convert token ids back to the original string, we can simply use
output_text = tokenizer.decode(token_ids)
output_text
'Furthermore, the naive implementation presented here can be improved.'
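Instead of doing these two steps by hand, we can also call the tokenizer object directly, which additionally handles padding and truncation; this is exactly what the preprocessing in the next section does. A quick sketch (not in the original notebook), using the sample_text from above:
encoded = tokenizer(sample_text, padding="max_length", truncation=True, max_length=40)
print(encoded.input_ids[:10])       # token ids, padded/truncated to max_length
print(encoded.attention_mask[:10])  # 1 for real tokens, 0 for padding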
Creating the Dataset¶
# To have consistent dimensions of the output vectors across samples, we have to set the maximum number of tokens for each sample.
MAX_TEXT_LENGTH = 40
MAX_LABEL_LENGTH = 40
def preprocess_function(sample):
# Getting text and label
text = sample["text"]
label = sample["label"]
# Tokenizing the text and label
inputs = tokenizer(text, padding="max_length", truncation=True, max_length=MAX_TEXT_LENGTH)
outputs = tokenizer(label, padding="max_length", truncation=True, max_length=MAX_LABEL_LENGTH)
sample["input_ids"] = inputs.input_ids
sample["attention_mask"] = inputs.attention_mask
sample["decoder_input_ids"] = outputs.input_ids
sample["decoder_attention_mask"] = outputs.attention_mask
sample["labels"] = outputs.input_ids
# The labels are used to calculate the loss during training. Because we added padding to make all samples the same length,
# we also need to replace the padding token id with -100, which tells Hugging Face to ignore these positions when calculating the loss.
# Why specifically -100? It is the default ignore_index of PyTorch's CrossEntropyLoss, which Hugging Face relies on under the hood.
sample["labels"] = [[-100 if token == tokenizer.pad_token_id else token for token in labels] for labels in sample["labels"]]
return sample
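To see why -100 in particular works, here is a minimal sketch (not part of the original notebook, shown only for illustration): PyTorch's cross-entropy loss skips every target equal to its ignore_index, which defaults to -100, and that is the sentinel Hugging Face expects in the labels.
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)             # 3 token positions, vocabulary of size 5
labels = torch.tensor([2, 4, -100])    # the last position is padding
full = F.cross_entropy(logits, labels)            # the -100 position is skipped
manual = F.cross_entropy(logits[:2], labels[:2])  # loss over the real tokens only
print(torch.allclose(full, manual))    # True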
# Applying the preprocessing to every sample
BATCH_SIZE = 16
tokenized_datasets = dataset.map(preprocess_function, batch_size=BATCH_SIZE, batched=True)
# Convert the list into torch tensor
tokenized_datasets.set_format(
type="torch", columns=["input_ids", "attention_mask", "decoder_input_ids", "decoder_attention_mask", "labels"],
)
print(tokenized_datasets['train'][0].keys())
tokenized_datasets['train'][0], tokenized_datasets['train'][1]
dict_keys(['attention_mask', 'decoder_attention_mask', 'decoder_input_ids', 'input_ids', 'labels'])
( { 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 'decoder_attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 'decoder_input_ids': tensor([ 0, 27766, 9319, 17678, 4, 70, 24, 5844, 208124, 8121, 71, 3688, 831, 186, 52295, 71, 5, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'input_ids': tensor([ 0, 8121, 71, 3688, 27766, 9319, 17678, 4, 24, 5844, 52295, 71, 5, 208124, 186, 70, 831, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'labels': tensor([ 0, 27766, 9319, 17678, 4, 70, 24, 5844, 208124, 8121, 71, 3688, 831, 186, 52295, 71, 5, 2, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]) }, { 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 'decoder_attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 'decoder_input_ids': tensor([ 0, 139661, 552, 4240, 11044, 35066, 3173, 10, 173, 18770, 23, 10, 6024, 157955, 173, 18770, 32628, 5, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'input_ids': tensor([ 0, 173, 18770, 10, 23, 10, 3173, 173, 18770, 6024, 157955, 32628, 5, 139661, 552, 4240, 11044, 35066, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'labels': tensor([ 0, 139661, 552, 4240, 11044, 35066, 3173, 10, 173, 18770, 23, 10, 6024, 157955, 173, 18770, 32628, 5, 2, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]) } )
# Warm-starting a pretrained encoder-decoder model, with both the encoder and the decoder initialized from xlm-roberta-base
model = EncoderDecoderModel.from_encoder_decoder_pretrained("xlm-roberta-base", "xlm-roberta-base")
Some weights of the model checkpoint at xlm-roberta-base were not used when initializing XLMRobertaModel: ['lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing XLMRobertaModel from the checkpoint of a model trained on another task or with another architecture.
Some weights of XLMRobertaForCausalLM were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: the cross-attention weights of all 12 decoder layers ('roberta.encoder.layer.*.crossattention.*' query/key/value, output dense and LayerNorm weights and biases; full list omitted).
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
# The model architecture
model
EncoderDecoderModel(
  (encoder): XLMRobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(250002, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList of 12 identical RobertaLayer blocks, each with
        self-attention (query/key/value: Linear 768 -> 768) and a
        feed-forward block (Linear 768 -> 3072 -> 768, LayerNorm, Dropout p=0.1)
    )
    (pooler): RobertaPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (decoder): XLMRobertaForCausalLM(
    (roberta): RobertaModel(
      same embeddings as the encoder; 12 RobertaLayer blocks, each with an
      additional (crossattention) block attending over the encoder outputs
    )
    (lm_head): RobertaLMHead(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (decoder): Linear(in_features=768, out_features=250002, bias=True)
    )
  )
)
Training the model¶
Setting up Training¶
# Setting up the parameters
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.encoder.vocab_size
# Setting up the batch size and number of epochs
N_EPOCHS = 1
args = Seq2SeqTrainingArguments(
"Scambled Text",
evaluation_strategy = "epoch",
per_device_train_batch_size=BATCH_SIZE,
per_device_eval_batch_size=BATCH_SIZE,
num_train_epochs=N_EPOCHS,
fp16=True, # Mixed-precision training will help increase the speed of training
)
# Setting up training
trainer = Seq2SeqTrainer(
model=model,
args=args,
train_dataset=tokenized_datasets['train'],
eval_dataset=tokenized_datasets['validation'],
)
Training¶
🚀 Let's goooo!
# This will take around 20-25 minutes
trainer.train()
Epoch | Training Loss | Validation Loss
---|---|---

KeyboardInterrupt: the run was interrupted manually during trainer.train(), while the Trainer was saving the optimizer checkpoint (torch.save(self.optimizer.state_dict(), ...) in transformers/trainer.py _save_checkpoint). Full traceback omitted.
Submitting Results 📄¶
Phew! That was a lot to grasp. Anyway, let's make a submission real quick.
# Getting the predictions
def generate_predictions(batch):
# Tokenizing the text
inputs = tokenizer(batch["text"], padding="max_length", truncation=True, max_length=MAX_TEXT_LENGTH, return_tensors="pt")
# Sending the tensors to GPU
input_ids = inputs.input_ids.to("cuda")
attention_mask = inputs.attention_mask.to("cuda")
# Generating the predicted tokens ids
outputs = model.generate(input_ids, attention_mask=attention_mask)
# Converting the token ids to sentence
output_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)
batch["predictions"] = output_str
return batch
# Getting all results
results = dataset['test'].map(generate_predictions, batched=True, batch_size=16)
test_dataset
| | id | text | label |
|---|---|---|---|
| 0 | 0 | safely objects move. that system images detect... | system Using approach, move. safely skip of th... |
| 1 | 1 | We detectors popular influence of confidences ... | confidences popular different in detectors We ... |
| 2 | 2 | compact coding present supervised We a approac... | We a supervised approach, compact present codi... |
| 3 | 3 | study high-throughput vital of quantitative be... | is for and individuals study of of collective ... |
| 4 | 4 | on data sets. We evaluate method many challeng... | sets. challenging the data We method on many e... |
| ... | ... | ... | ... |
| 9995 | 9995 | particular i.e. problem, of to due However, na... | the i.e. of problem, However, particular to th... |
| 9996 | 9996 | Simulation methods. proposed outperforming met... | state-of-the-art the that of Simulation demons... |
| 9997 | 9997 | in This view introduces a scenarios. paper tec... | used label introduces in placement scenarios. ... |
| 9998 | 9998 | valve. water from water are and pipeline sourc... | source noise interference of The are valve. pi... |
| 9999 | 9999 | localization leak analyzed. The leak the compu... | The leak analyzed. leak distance against actua... |
10000 rows × 3 columns
test_dataset['label'] = results['predictions']
test_dataset
| | id | text | label |
|---|---|---|---|
| 0 | 0 | safely objects move. that system images detect... | U this this, detection det det detection detec... |
| 1 | 1 | We detectors popular influence of confidences ... | We have three three in of three three three in... |
| 2 | 2 | compact coding present supervised We a approac... | We present a compact compact compact compact c... |
| 3 | 3 | study high-throughput vital of quantitative be... | Tranjectject of of of of oftivetive ofive ofti... |
| 4 | 4 | on data sets. We evaluate method many challeng... | We evaluat evaluat proposed propose challengin... |
| ... | ... | ... | ... |
| 9995 | 9995 | particular i.e. problem, of to due However, na... | However, particular the particular the particu... |
| 9996 | 9996 | Simulation methods. proposed outperforming met... | Simulatione demonstrat demonstrat demonstratd ... |
| 9997 | 9997 | in This view introduces a scenarios. paper tec... | This paper paper as amentmentmentmentmentmentm... |
| 9998 | 9998 | valve. water from water are and pipeline sourc... | The the no the input, input theise input,, inp... |
| 9999 | 9999 | localization leak analyzed. The leak the compu... | The the the the the the the the the the accu t... |
10000 rows × 3 columns
Note : Please make sure there is a file named submission.csv in the assets folder before submitting.
!mkdir assets
test_dataset.to_csv(os.path.join("assets", "submission.csv"), index=False)
Uploading the Results¶
Note : Please save the notebook before submitting it (Ctrl + S)
Mounting Google Drive 💾
Your Google Drive will be mounted to access the colab notebook
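The mounting cell is missing from this export; in Colab the standard API is the sketch below (it prints an authorization URL and asks you to paste the code back):
from google.colab import drive
drive.mount('/content/drive')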
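The submission cell is missing as well; AIcrowd Blitz notebooks typically submit via the CLI's notebook command, roughly as sketched below (the challenge slug and flags are assumptions and may differ across CLI versions):
!aicrowd notebook submit -c de-shuffling-text -a assets --no-verify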
Congratulations 🎉 you did it! But there is still a lot of room for improvement: you can try changing many hyperparameters, including the number of epochs and the learning rate, or try a different model architecture!
And btw -
Don't be shy to ask questions about any errors you get, or about any part of this notebook you have doubts on, in the discussion forum or on the AIcrowd Discord server. The AIcrew will be happy to help you :)
Also, want to give us your valuable feedback for the next Blitz, or want to work with us on creating Blitz challenges? Let us know!