Exploring NLP weak supervision approaches to train text classification models. The project is also a prototype for a semi-automated text data labelling platform. Approaches: Snorkel and Zero-Shot Learning.

JayThibs/Weak-Supervised-Learning-Case-Study

A Case Study on Weakly Supervised Learning

View our write-up of the project here: A Case Study on Weakly Supervised Learning.

This project was created for the Full Stack Deep Learning 2021 course. It was chosen as one of the top projects from the course and presented at the project showcase.

Goal of the project

  • Create a text data labeling service where the user inputs text data and receives a labeled dataset.
  • Experiment with weakly supervised learning and compare different approaches.

Notebooks

How to use this Project

To use only the Snorkel approach to weak supervision, run the following notebooks in this order: 01, 03, 05, 06.
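
To give a feel for what the Snorkel notebooks do, here is a minimal sketch of the labeling-function idea in plain Python. It is not Snorkel's API and not the code from the notebooks: the three labeling functions, their keyword lists, and the `weak_label` helper are hypothetical, and a simple majority vote stands in for Snorkel's learned label model.

```python
from collections import Counter

# Label constants (hypothetical values chosen for this sketch).
ABSTAIN = -1
CLEAN = 0
TOXIC = 1


def lf_insult_keywords(text):
    """Vote TOXIC if the text contains a word from a small insult list."""
    insults = {"idiot", "stupid", "moron"}  # illustrative list only
    words = set(text.lower().split())
    return TOXIC if words & insults else ABSTAIN


def lf_polite_phrases(text):
    """Vote CLEAN if the text contains a polite marker."""
    lowered = text.lower()
    return CLEAN if ("thank" in lowered or "please" in lowered) else ABSTAIN


def lf_shouting(text):
    """Vote TOXIC if a reasonably long text is mostly upper case."""
    letters = [c for c in text if c.isalpha()]
    if len(letters) >= 10 and sum(c.isupper() for c in letters) / len(letters) > 0.8:
        return TOXIC
    return ABSTAIN


LABELING_FUNCTIONS = [lf_insult_keywords, lf_polite_phrases, lf_shouting]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting votes with equal support
    return counts[0][0]


if __name__ == "__main__":
    for comment in ["You are an idiot", "Thank you, please continue", "please stop, idiot"]:
        print(comment, "->", weak_label(comment))
```

Examples that every function abstains on, or where votes tie, stay unlabeled. In a real pipeline those rows would typically be dropped before training the downstream classifier.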

To use only the model distillation approach to weak supervision, run the following notebooks in this order: 02, 04.
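
The distillation approach can be pictured roughly as follows: a zero-shot teacher scores each candidate label, and confident predictions become pseudo-labels for training a smaller student. The sketch below is an assumption-laden illustration in plain Python, not the logic of `distill_classifier.py`: the `pseudo_label` helper, the `threshold` parameter, and the example scores are all hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw teacher scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(teacher_scores, labels, threshold=0.6):
    """Return (label, probabilities); label is None when the teacher is unsure.

    `threshold` is a hypothetical confidence cut-off for this sketch.
    """
    probs = softmax(teacher_scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs
    return labels[best], probs


if __name__ == "__main__":
    # Three DBpedia-style class names with made-up teacher scores.
    labels = ["Company", "Artist", "Athlete"]
    print(pseudo_label([3.0, 0.5, 0.1], labels))  # confident teacher
    print(pseudo_label([1.0, 0.9, 0.8], labels))  # unsure teacher
```

The full probability vector is returned alongside the hard label so that a student could be trained on soft targets instead, if preferred.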

For more information on how to deploy a Streamlit app of this project, see the README in our webapp directory.

Project Tree

.
|-- ./pyproject.toml
|-- ./requirements
|   |-- ./requirements/dev.in
|   |-- ./requirements/dev.txt
|   |-- ./requirements/prod.in
|   `-- ./requirements/prod.txt
|-- ./setup.cfg
|-- ./project_proposal.md
|-- ./tasks
|   `-- ./tasks/lint.sh
|-- ./Dockerfile
|-- ./distill_classifier.py
|-- ./service.py
|-- ./test_request.json
|-- ./train_baseline_dbpedia_model.py
|-- ./tree-md
|-- ./text_classifier
|   |-- ./text_classifier/__init__.py
|   |-- ./text_classifier/models
|   |   `-- ./text_classifier/models/__init__.py
|   |-- ./text_classifier/lit_models
|   |   `-- ./text_classifier/lit_models/__init__.py
|   `-- ./text_classifier/notebooks
|       |-- ./text_classifier/notebooks/01_dbpedia_14_bert_classification_exploration.ipynb
|       |-- ./text_classifier/notebooks/04_transformers-multi-label-classification-toxicity.ipynb
|       |-- ./text_classifier/notebooks/03_dbpedia_14_snorkel_dataset_labeling.ipynb
|       |-- ./text_classifier/notebooks/05_toxicity_classification_snorkel_dataset.ipynb
|       |-- ./text_classifier/notebooks/02_dbmedia_14_distilling_with_zero_shot_classification.ipynb
|       `-- ./text_classifier/notebooks/06_AMLS_model_deployment.ipynb
|-- ./data
|   |-- ./data/toxic_comments
|   |   |-- ./data/toxic_comments/test.csv
|   |   |-- ./data/toxic_comments/toxic_dev_200_examples.csv
|   |   |-- ./data/toxic_comments/toxic_test_630_examples.csv
|   |   |-- ./data/toxic_comments/toxic_train_2100_examples.csv
|   |   |-- ./data/toxic_comments/toxic_val_70_examples.csv
|   |   |-- ./data/toxic_comments/train.csv
|   |   |-- ./data/toxic_comments/toxicity_snorkel_dataset_3014ex.csv
|   |   `-- ./data/toxic_comments/toxicity_test_675ex.csv
|   `-- ./data/readme.md
|-- ./README.md
`-- ./webapp
    |-- ./webapp/Dockerfile
    |-- ./webapp/app.py
    |-- ./webapp/backend.py
    |-- ./webapp/demo_config.json
    |-- ./webapp/requirements.txt
    |-- ./webapp/run_webapp.sh
    |-- ./webapp/utils.py
    `-- ./webapp/README.md

Project Proposal

Find our project proposal in project_proposal.md.
