
Learning Input Conditional Language Models for Natural Language Generation

Natural Language Processing (CS6370) Project.

Our objective in this project is to find neural network architectures capable of learning surface realisation, sentence planning and content determination end-to-end, going from raw data (images, parameters, etc.) directly to complete paragraphs.

LSTMs naturally seemed the best choice. We came up with a hybrid LSTM design that learns an input-conditional language model, where the raw inputs are passed through another network before being fed to the LSTM.

This project contains 5 variants of LSTM designs that we tried out:

  1. **Running-Input LSTM (RI-LSTM)**: our first attempt at an input-conditioned language model. The raw data is fed to the LSTM as input at every step.
  2. **Running-Input Language Model LSTM (RILM-LSTM)**: integrates an LM-LSTM with the Running-Input LSTM.
  3. **Input-Initialized LM-LSTM (IILM-LSTM)**: improves upon RILM-LSTM by leveraging the long-term memory aspect of LSTMs.
  4. **Read-Only Memory RILM-LSTM (ROM-LSTM)**: combines a read-only variant of memory networks with RILM-LSTM to let the network copy-paste inputs into its output. In other words, in addition to the word that an LSTM outputs at each step, the ROM-LSTM can select an input semantic instead.
  5. **Word2Vec RILM-LSTM**: combines the dense representations of Word2Vec with RILM-LSTM to address scalability and variability issues with the model.

RI-LSTM

The same set of raw inputs is provided to the LSTM at every step, along with the previous hidden and cell states. This allows the LSTM to learn an input-conditional language model.
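A minimal sketch of this idea, assuming a PyTorch-style `LSTMCell` (the framework, class and parameter names here are illustrative, not the repository's actual code):

```python
import torch
import torch.nn as nn

class RILSTM(nn.Module):
    """Running-Input LSTM sketch: the same raw-input vector is fed to the
    LSTM cell at every step, together with the recurrent (h, c) states."""

    def __init__(self, input_dim, hidden_dim, vocab_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, data, seq_len):
        # data: (batch, input_dim) raw inputs, identical at every step
        batch = data.size(0)
        h = data.new_zeros(batch, self.cell.hidden_size)
        c = data.new_zeros(batch, self.cell.hidden_size)
        logits = []
        for _ in range(seq_len):
            h, c = self.cell(data, (h, c))   # same inputs each step
            logits.append(self.out(h))       # word scores at this step
        return torch.stack(logits, dim=1)    # (batch, seq_len, vocab_size)
```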

Running-Input Language Model LSTM (RILM-LSTM)

Training Mode:

Testing Mode:

In the language model part, we pass the previous word as an input to the current step of the LSTM so that it knows which path has been sampled. This is important to allow the LSTM to handle ambiguous phrases. See the full report for more details.
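A sketch of the generation loop, again with assumed PyTorch-style names: during training the gold previous word would be fed in, while at test time the sampled word is fed back.

```python
import torch
import torch.nn as nn

class RILMLSTM(nn.Module):
    """RILM-LSTM sketch: each step sees the raw inputs plus the previous
    word (the gold word when training, the sampled word when generating)."""

    def __init__(self, vocab_size, embed_dim, input_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim + input_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, prev_word, data, state):
        x = torch.cat([self.embed(prev_word), data], dim=-1)
        h, c = self.cell(x, state)
        return self.out(h), (h, c)

    def generate(self, data, bos_id, max_len):
        # Testing mode: the word sampled at step t is fed back at step t+1.
        batch = data.size(0)
        state = (data.new_zeros(batch, self.cell.hidden_size),
                 data.new_zeros(batch, self.cell.hidden_size))
        word = data.new_full((batch,), bos_id, dtype=torch.long)
        words = []
        for _ in range(max_len):
            logits, state = self.step(word, data, state)
            word = torch.distributions.Categorical(logits=logits).sample()
            words.append(word)
        return torch.stack(words, dim=1)     # (batch, max_len) token ids
```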

Input-Initialized LM-LSTM (IILM-LSTM)

This LSTM uses a simple perceptron to transform the raw input into the LSTM's initial cell state. This perceptron is the input network and can take a different structure depending on the nature of the inputs: CNNs for images, LSTMs for sequences, and standard feed-forward networks for low-dimensional parameters.
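A minimal sketch of the initialisation, with assumed names (the actual input network depends on the input type, as noted above):

```python
import torch
import torch.nn as nn

class IILMLSTM(nn.Module):
    """IILM-LSTM sketch: the raw input only initialises the cell state via
    a small input network; after that the LSTM runs as an ordinary LM."""

    def __init__(self, vocab_size, embed_dim, input_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Input network: a perceptron here; could be a CNN for images or
        # another LSTM for sequential inputs.
        self.init_cell = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Tanh())
        self.cell = nn.LSTMCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, data):
        # words: (batch, seq_len) gold tokens, data: (batch, input_dim)
        c = self.init_cell(data)          # the input decides the initial memory
        h = torch.zeros_like(c)
        logits = []
        for t in range(words.size(1)):
            h, c = self.cell(self.embed(words[:, t]), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)
```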

Read-Only Memory (Copy/Paste) LSTM

Training Mode:

Testing Mode:

This LSTM uses a memory matrix composed of the word forms of the inputs. At each step the LSTM is allowed to select an input semantic from this memory instead of an ordinary one-hot word output. This gives good generalisation on the Prodigy-METEO dataset, because the parts of the output text that are simply copy-pasted from the input are handled effectively.
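One way such a copy/generate output layer can look, sketched with assumed PyTorch-style names (the repository's exact formulation may differ; see the report):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ROMHead(nn.Module):
    """ROM-LSTM output layer sketch: besides scoring every vocabulary word,
    the decoder scores each memory slot (one per input semantic), so it can
    copy an input word form straight into the output."""

    def __init__(self, hidden_dim, vocab_size, mem_dim):
        super().__init__()
        self.gen_scores = nn.Linear(hidden_dim, vocab_size)
        self.query = nn.Linear(hidden_dim, mem_dim)

    def forward(self, h, memory):
        # h: (batch, hidden_dim) LSTM state
        # memory: (batch, n_slots, mem_dim), one row per input semantic
        gen = self.gen_scores(h)                                            # generate
        copy = torch.bmm(memory, self.query(h).unsqueeze(-1)).squeeze(-1)   # copy
        # Joint distribution over [vocabulary words ; memory slots]:
        # an index >= vocab_size means "emit the word form stored in that slot".
        return F.log_softmax(torch.cat([gen, copy], dim=-1), dim=-1)
```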

Word2Vec LSTM

The Word2Vec LSTM resembles the RILM-LSTM in everything except the output sampling method and the loss function (a sketch of both appears after the list).

  • The new loss function is the total dot product of the expected (predicted) output vectors with their corresponding target word vectors.
  • Output sampling searches for the K Word2Vec vectors most similar to the predicted vector and then samples among those K words using the similarity scores.
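A minimal sketch of the two pieces; the value of K, the use of cosine similarity and the softmax normalisation of the scores are illustrative assumptions, not necessarily the report's exact scheme:

```python
import torch
import torch.nn.functional as F

def w2v_loss(predicted, target_vecs):
    """Negative total dot product between the predicted vectors and the
    Word2Vec vectors of the target words (minimised during training)."""
    # predicted, target_vecs: (batch, seq_len, w2v_dim)
    return -(predicted * target_vecs).sum()

def sample_word(predicted, w2v_matrix, k=10):
    """Pick the K Word2Vec vectors most similar to the prediction and
    sample one of them in proportion to the similarity scores."""
    # predicted: (w2v_dim,)  w2v_matrix: (vocab_size, w2v_dim)
    sims = F.cosine_similarity(w2v_matrix, predicted.unsqueeze(0), dim=-1)
    top_sims, top_ids = sims.topk(k)
    probs = torch.softmax(top_sims, dim=-1)    # similarity-weighted choice
    return top_ids[torch.multinomial(probs, 1)].item()
```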

Prodigy-METEO dataset analysis

prodigy-example.png

Memory Attention Heatmap (Proof of Memory Hypothesis)

memory-attention.png

Examples

ROM-LSTM on Prodigy METEO

rom-eg-end.png

RILM-LSTM on Synthetic Weather Forecast dataset

rilm-eg-end.png
