
MoS-Tensorflow

TensorFlow implementation of the mixture of softmaxes (MoS) algorithm described in the paper Breaking the Softmax Bottleneck: A High-Rank RNN Language Model (Yang et al., 2017).
See https://github.com/zihangdai/mos for an implementation using PyTorch.

Why does mixture of softmaxes matter?

In natural language processing, how well a network can approximate the true probability distribution over appropriate responses depends on its ability to express probabilities.
The problem with a single softmax is that, when applied to the logits of a neural network, a substantial amount of information is lost: the matrix of log-probabilities it can produce is bounded in rank by the embedding dimension.
This loss of information, signified by the low rank of that matrix, encourages the network to fit generic responses to each input.
Ideally, the rank of the matrix should be high, which entails more expressiveness and allows the network to use more information when generating and analyzing responses. By mixing several softmaxes, each with its own context vector, the mixture of softmaxes network accomplishes exactly this.
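The mixture described above can be sketched in a few lines. This is not the repository's code; it is a minimal NumPy illustration under assumed shapes (all weight names here are illustrative): K mixture weights come from one softmax over the hidden state, K context vectors come from a tanh projection, and the output distribution is the prior-weighted sum of K component softmaxes over the vocabulary.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_softmaxes(h, W_prior, W_proj, W_emb, K):
    """Mixture of softmaxes over a vocabulary.

    h:       (batch, d)   hidden states
    W_prior: (d, K)       produces mixture weights
    W_proj:  (d, K*d)     produces K context vectors
    W_emb:   (V, d)       output embedding matrix
    """
    prior = softmax(h @ W_prior)                 # (batch, K) mixture weights
    ctx = np.tanh(h @ W_proj)                    # (batch, K*d)
    ctx = ctx.reshape(h.shape[0], K, -1)         # (batch, K, d)
    logits = ctx @ W_emb.T                       # (batch, K, V)
    comps = softmax(logits, axis=-1)             # K component distributions
    return (prior[:, :, None] * comps).sum(axis=1)  # (batch, V)

# Illustrative usage with random weights
rng = np.random.default_rng(0)
batch, d, V, K = 4, 8, 20, 3
h = rng.standard_normal((batch, d))
p = mixture_of_softmaxes(
    h,
    rng.standard_normal((d, K)),
    rng.standard_normal((d, K * d)),
    rng.standard_normal((V, d)),
    K,
)
# each row of p is a valid distribution over the vocabulary
```

Because the final probabilities are a convex combination of K softmaxes taken after the nonlinearity, the log-probability matrix is no longer constrained to rank d, which is the source of the added expressiveness.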

The code is incomplete and heavily under construction.
