corpus

Here are 849 public repositories matching this topic...

sonu-gupta / tosdr-terms-of-service-corpus

This repository contains Python code to create a corpus of 12,215 terms of service documents scraped from TOSDR, intended for legal, privacy, and natural language processing research.

python corpus language-resources tosdr terms-of-service-agreements

Updated Mar 14, 2023
HTML

cmry / simple-queries

Repository for the experiments and dataset described in "Simple Queries as Distant Labels for Predicting Gender on Twitter" presented at W-NUT 2017.

machine-learning text-mining twitter corpus author prediction dataset profiling gender distant-supervision

Updated Oct 31, 2017
Python

mgabilo / concept-discovery

Discovers new ontological categories (WordNet synsets) for words based on their lexicosyntactic patterns in Wikipedia

nlp corpus pmi ontology wordnet concept feature-vector concept-vectors upper-level-concepts

Updated Mar 6, 2018
Python

antcorpus / antcorpus.github.io

🌐 ANT Corpus website repository.

corpus corpus-data arabic arabic-language

Updated Feb 7, 2021
JavaScript

jojolebarjos / wikipedia-text

Extract semi-structured text from Wikipedia dumps

wikipedia xml corpus words

Updated Oct 28, 2018
Python

doctt / doctt-frontend

Angular Frontend for DocTT, a document tagging tool.

python angular webpack tool tagging corpus document tag

Updated May 19, 2019
TypeScript

ansvver / ChineseSTS

Construction of a Chinese semantic text similarity (Chinese STS) corpus

nlp corpus

Updated Mar 7, 2018

dwhieb / Nuuchahnulth

Linguistic data on the Nuuchahnulth (Wakashan) language

corpus linguistics corpora corpus-linguistics language-documentation documentary-linguistics nuuchahnulth wakashan

Updated Sep 4, 2021
JavaScript

phueb / WikiCount

Count words in Wikipedia articles on multiple machines

nlp wikipedia corpus

Updated Jun 11, 2022
Python

YuyuZha0 / corpus

A search engine for classical Chinese poetry

search-engine corpus lucene vertx-web chinese-poetry

Updated Feb 10, 2023
Java

EMarquer / topic-corpus

Corpus of topic-specific sentences and queries, created by parsing forums

python reddit parsing forms corpus python3 quora

Updated Aug 12, 2019
Python

lauchblatt / LessingSentimentEmotionCorpus

Corpora of speeches from Lessing's plays (German), annotated with sentiment and emotion

german corpus sentiment sentiment-annotation annotation-study

Updated Sep 15, 2021

gcdunn / ntc_analytics_2020

NTC Analytics Summit 2020

corpus topic-models

Updated Sep 21, 2020
Jupyter Notebook

ankit013 / Projects-Python

Practical demonstration of the scikit-learn library for building various classification and regression models

nlp corpus topic-modeling gensim text-processing coherence lda mallet nlp-machine-learning perplexity mallet-lda

Updated May 15, 2020
Jupyter Notebook

kamaravichow / text-summariser-python

Simple text summariser using NLTK in Python

corpus nltk stopwords sentences nltk-python text-summariser

Updated Sep 27, 2020
Jupyter Notebook

amandeep25 / NLP_series

corpus sms-messages ham spam-messages sms-spam

Updated Aug 3, 2021
Jupyter Notebook

jovistos / TALAD

This American Life audio downloader. The dataset is from the paper "Speech Recognition and Multi-Speaker Diarization of Long Conversations".

audio downloader podcast life mp3 corpus this dataset american thisamericanlife

Updated Apr 5, 2021
Python

Serene-Arc / word-length-matcher

Tool to identify plaintext from ciphertext word lengths

cryptography cipher corpus ciphertext plaintext ciphertext-attack word-length

Updated Sep 4, 2021
Python

deutschestextarchiv / nschatz_deu

Neuer Deutscher Novellenschatz (1884–1887)

corpus corpora tei-xml

Updated Sep 13, 2023

apple-fritter / muffin.tin

Mozilla Firefox places.sqlite tables exported to XML files. A Bash script.

nlp firefox machine-learning corpus places database-migrations sqlite3 machinelearning mit-license bash-script nlp-machine-learning metadata-extraction corpus-processing bookmarks-export nlp-par corpus-parsing

Updated Jul 20, 2023
Shell
