TaxoRL

Code for the ACL 2018 paper "End-to-End Reinforcement Learning for Automatic Taxonomy Induction" [arXiv].

Requirements

Python 2.7
DyNet 2.0
tqdm

Data

Preprocessed, pickled data for the WordNet dataset (including all other required files) can be downloaded here.

Preprocessed, pickled data for both the WordNet and SemEval-2016 datasets (including all other required files) can be downloaded here. To run on SemEval-2016, use dev_twodatasets.tsv instead of dev_wnbo_hyper.tsv. Caution: this setting may require more than 40 GB of memory.
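The downloaded files are ordinary Python pickles, presumably written under Python 2.7 as the requirements suggest. If you want to inspect one from Python 3 as well, the following minimal sketch may help. `load_pickle` is a hypothetical helper, not part of this repository; passing `encoding='latin1'` is the standard way to let Python 3 read byte strings pickled by Python 2.

```python
from __future__ import print_function
import pickle
import sys


def load_pickle(path):
    """Hypothetical helper: load a pickle file under either Python 2 or 3.

    Pickles written by Python 2 may contain byte strings; under Python 3,
    encoding='latin1' decodes them instead of raising UnicodeDecodeError.
    """
    with open(path, 'rb') as f:
        if sys.version_info[0] >= 3:
            return pickle.load(f, encoding='latin1')
        return pickle.load(f)
```

Pass it the path of one of the downloaded files. Keep in mind that a pickle referencing project-specific classes can only be loaded when the modules defining those classes are importable, and that the memory caution above applies to loading as well.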

Visualizations of the WordNet subtrees are available at https://morningmoni.github.io/wordnet-vis/.

DIY

To build everything from scratch, first download corpora such as Wikipedia, UMBC, and the 1 Billion Word Language Model Benchmark.
To preprocess the corpus, generate a vocabulary file and use the scripts under ./corpus/ to find dependency paths between terms in the vocabulary. The scripts are adapted from LexNET; instructions can be found here. This step may take several hours.
Then run train_RL.py, which computes all the features and saves them to pickle files.
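How to generate the vocabulary file is not spelled out above. Here is a minimal sketch, assuming the dataset TSV files hold one term pair per line with the two terms in the first two tab-separated columns, and assuming the ./corpus/ scripts expect one term per line. Check both assumptions against the actual files and the LexNET instructions. `build_vocabulary` is a hypothetical helper, not part of this repository.

```python
from __future__ import print_function
import io


def build_vocabulary(tsv_paths, out_path):
    """Hypothetical helper: collect every term from term-pair TSV files.

    ASSUMPTION: each non-empty line starts with two tab-separated terms
    (e.g. hyponym, hypernym); any further columns such as a label are ignored.
    Writes one unique term per line (sorted) and returns the vocabulary size.
    """
    vocab = set()
    for path in tsv_paths:
        with io.open(path, encoding='utf-8') as f:
            for line in f:
                fields = line.rstrip('\n').split('\t')
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                vocab.update(term.strip() for term in fields[:2])
    vocab.discard(u'')  # never write an empty term
    with io.open(out_path, 'w', encoding='utf-8') as f:
        for term in sorted(vocab):
            f.write(term + u'\n')
    return len(vocab)
```

Sorting is not required. It just makes the output deterministic, so that two runs over the same inputs produce identical files.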

Run

Run train_RL.py for training and testing. All parameters are defined with argparse and have default values, so the script can be run without specifying any of them (though you are free to tune them).
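Because every parameter has a default, running with no arguments uses the defaults, and any subset can be overridden on the command line. The sketch below illustrates that argparse pattern with two hypothetical options, `--epochs` and `--lr`. These are not the real flags of train_RL.py. Its actual options and defaults should be listed by `python train_RL.py --help`, since argparse adds `--help` automatically.

```python
from __future__ import print_function
import argparse


def build_parser():
    # HYPOTHETICAL options for illustration only; see train_RL.py for the real ones.
    parser = argparse.ArgumentParser(description='Illustrative training options')
    parser.add_argument('--epochs', type=int, default=10,
                        help='number of training epochs')
    parser.add_argument('--lr', type=float, default=0.001,
                        help='learning rate')
    return parser


if __name__ == '__main__':
    # With no command-line arguments, every option falls back to its default.
    args = build_parser().parse_args()
    print(args.epochs, args.lr)
```

Calling `build_parser().parse_args([])` returns the defaults, and `parse_args(['--epochs', '3'])` overrides only that one value.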

Performance on the training, validation, and test sets is reported after each epoch. You can stop the program at any time.

Cite

@InProceedings{P18-1229,
  author = 	"Mao, Yuning
		and Ren, Xiang
		and Shen, Jiaming
		and Gu, Xiaotao
		and Han, Jiawei",
  title = 	"End-to-End Reinforcement Learning for Automatic Taxonomy Induction",
  booktitle = 	"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"2462--2472",
  location = 	"Melbourne, Australia",
  url = 	"http://aclweb.org/anthology/P18-1229"
}
