Multilingual-News-Article-Similarity

In this paper, we describe our system entry for SemEval 2022 Task 8, Multilingual News Article Similarity, in which we leverage the knowledge of pre-trained language models to evaluate the Overall similarity between a given pair of articles. Our system uses a Sentence Transformer based approach to compute contextualized embeddings of the two articles. We then apply cosine similarity to these embeddings, followed by renormalisation, to obtain the final score. We further fine-tune the model on the provided dataset using the Cosine Similarity Loss (details are provided in Section 3). We also try to leverage the metadata provided with the articles by concatenating the 'Title' with the textual content to improve performance. We evaluate the model using the Pearson correlation score in both the Multilingual and the Translated-to-English settings. Our proposed approach in the Multilingual setting ranked 19th on the official SemEval 2022 Task 8 leaderboard with a Pearson correlation score of 0.721. In addition to our final approach, in Section 4 we discuss other approaches we experimented with before arriving at our final model.
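The scoring pipeline described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code. The toy vectors stand in for embeddings that would come from a Sentence Transformer model (for example via `SentenceTransformer.encode` in the sentence-transformers library). The `build_input` separator is an assumption. `renormalise` is a hypothetical linear map onto the task's 1 to 4 Overall scale (1 = very similar, 4 = very dissimilar), because the exact renormalisation used by the system is not given here.

```python
import math
from typing import Sequence


def build_input(title: str, text: str) -> str:
    """Prepend the article title to its body.

    The ". " separator is an assumption; the original system may join them differently.
    """
    return f"{title}. {text}" if title else text


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def renormalise(cos: float, low: float = 1.0, high: float = 4.0) -> float:
    """Hypothetical linear map from a cosine in [-1, 1] to the [low, high] score range.

    A higher cosine (more similar) maps to a lower score, matching a scale
    where 1 means very similar and 4 means very dissimilar.
    """
    cos = max(-1.0, min(1.0, cos))  # guard against floating-point overshoot
    return high - (cos + 1.0) / 2.0 * (high - low)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0.0 or std_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for Sentence Transformer embeddings of article pairs.
    pairs = [
        ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),   # identical direction
        ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),   # orthogonal
        ([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]),  # opposite direction
    ]
    predicted = [renormalise(cosine_similarity(u, v)) for u, v in pairs]
    gold = [1.0, 2.0, 4.0]  # made-up labels, for illustration only
    print(predicted)  # [1.0, 2.5, 4.0]
    print(round(pearson(predicted, gold), 3))
```

Because `renormalise` is a decreasing linear map, it does not change the magnitude of the Pearson correlation with the gold labels. It only flips the sign relative to the raw cosine and places predictions on the task's scale. Fine-tuning with the Cosine Similarity Loss (not shown) instead updates the encoder so that the cosine of each pair's embeddings regresses towards a target derived from the gold score.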

Final Model.png
Model Architecture.png
Multilingual News Similarity.pdf
NLP__Final_Report.pdf
Performace_Pressure_PPT.pdf
Performace_Pressure_Report.pdf
Performance Pressure_MidEval.pdf
README.md
Results.pdf
XLMfinetuned.ipynb
ZeroShotBaseline_+_Train_Val_split_for_future.ipynb
finetunedXLM.py
finetunedmBERT.py
mBERTfinetuned.ipynb
requirements.txt
savedModelLink.txt
st-cosine-similarity-loss-finetune-translated.py
st-cosine-similarity-loss-finetune.ipynb
st-cosine-similarity-loss-finetune.py
st-cosine-similarity-loss-translated-finetune.ipynb

abhinav-bohra/Multilingual-News-Article-Similarity
