This repository contains Python code to create a corpus of 12,215 terms-of-service documents scraped from TOSDR, intended for legal, privacy, and natural language processing research.
Updated Mar 14, 2023 - HTML
Repository for the experiments and dataset described in "Simple Queries as Distant Labels for Predicting Gender on Twitter" presented at W-NUT 2017.
Discovers new ontological categories (WordNet synsets) for words based on their lexicosyntactic patterns in Wikipedia
🌐 ANT Corpus website repository.
Linguistic data on the Nuuchahnulth (Wakashan) language
A search engine for classical Chinese poetry
Corpora of speeches from Lessing's plays (German), manually annotated with sentiment and emotion
Practical demonstration of the scikit-learn library for building various classification and regression models
Simple text summariser using NLTK in Python
This American Life audio downloader. The dataset is from the paper "Speech Recognition and Multi-Speaker Diarization of Long Conversations".
Tool to identify plaintext from ciphertext word lengths
Mozilla Firefox places.sqlite tables exported to XML files. A Bash script.