
About the repository

This competition was about developing multilingual models to classify whether forum comments were toxic. The test set contained comments in six different languages (English, Portuguese, Russian, French, Italian, and Spanish). For my experiments I mainly used XLM-RoBERTa, a state-of-the-art multilingual model from the Hugging Face repository.
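A minimal sketch of how such a model can be set up with the `transformers` library; the checkpoint name, sequence length, and classification head below are illustrative assumptions, and the published kernels may differ:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

MODEL_NAME = "jplu/tf-xlm-roberta-large"  # assumed checkpoint with TensorFlow weights
MAX_LEN = 192                             # assumed maximum sequence length

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def encode(texts):
    # Convert raw comments into fixed-length token id arrays.
    return tokenizer(
        texts,
        max_length=MAX_LEN,
        truncation=True,
        padding="max_length",
        return_tensors="tf",
    )["input_ids"]

def build_model():
    # Token ids in, a single sigmoid toxicity probability out.
    input_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")
    backbone = TFAutoModel.from_pretrained(MODEL_NAME)
    sequence_output = backbone(input_ids)[0]   # (batch, seq_len, hidden)
    cls_token = sequence_output[:, 0, :]       # representation of the <s> token
    output = tf.keras.layers.Dense(1, activation="sigmoid")(cls_token)

    model = tf.keras.Model(inputs=input_ids, outputs=output)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.AUC()],
    )
    return model
```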

Published Kaggle kernels:

What you will find

Jigsaw Multilingual Toxic Comment Classification

Use TPUs to identify toxic comments across multiple languages

Kaggle competition: https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification
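Since the competition ran on Kaggle's TPUs, training code usually begins by detecting the accelerator and creating a distribution strategy. A minimal sketch, assuming TensorFlow 2.x in a Kaggle TPU notebook, with a fallback to the default strategy when no TPU is attached:

```python
import tensorflow as tf

try:
    # Detect and initialize the TPU attached to the notebook.
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
except ValueError:
    # No TPU found: fall back to the default CPU/GPU strategy.
    strategy = tf.distribute.get_strategy()

print("Replicas in sync:", strategy.num_replicas_in_sync)

# Building the model inside the strategy scope replicates its variables
# across all TPU cores, e.g.:
# with strategy.scope():
#     model = build_model()
```

The global batch size is typically scaled by `strategy.num_replicas_in_sync` so that each TPU core processes a full per-replica batch.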

Overview

It only takes one toxic comment to sour an online discussion. The Conversation AI team, a research initiative founded by Jigsaw and Google, builds technology to protect voices in conversation. A main area of focus is machine learning models that can identify toxicity in online conversations, where toxicity is defined as anything rude, disrespectful or otherwise likely to make someone leave a discussion. If these toxic contributions can be identified, we could have a safer, more collaborative internet.

In the previous 2018 Toxic Comment Classification Challenge, Kagglers built multi-headed models to recognize toxicity and several subtypes of toxicity. In 2019, in the Unintended Bias in Toxicity Classification Challenge, you worked to build toxicity models that operate fairly across a diverse range of conversations. This year, we're taking advantage of Kaggle's new TPU support and challenging you to build multilingual models with English-only training data.

Jigsaw's API, Perspective, serves toxicity models and others in a growing set of languages (see our documentation for the full list). Over the past year, the field has seen impressive multilingual capabilities from the latest model innovations, including few- and zero-shot learning. We're excited to learn whether these results "translate" (pun intended!) to toxicity classification. Your training data will be the English data provided for our previous two competitions and your test data will be Wikipedia talk page comments in several different languages.

As our computing resources and modeling capabilities grow, so does our potential to support healthy conversations across the globe. Develop strategies to build effective multilingual models and you'll help Conversation AI and the entire industry realize that potential.

Disclaimer: The dataset for this competition contains text that may be considered profane, vulgar, or offensive.