
MoS-Tensorflow

TensorFlow implementation of the mixture of softmaxes (MoS) algorithm described in the paper Breaking the Softmax Bottleneck: A High-Rank RNN Language Model (Yang et al., 2017).
See https://github.com/zihangdai/mos for an implementation using PyTorch.

Why does mixture of softmaxes matter?

In natural language processing, how well a network can approximate the true probability distribution over appropriate responses depends on its ability to express probabilities.
The problem with a single softmax is that, when applied to the logits of a neural network, a substantial amount of information is lost: the matrix of log-probabilities it can produce is bounded in rank by the embedding dimension.
This loss of information, signified by the low rank of that matrix, encourages the network to fit generic responses to each input.
Ideally, the rank of the matrix should be high, which entails more expressiveness and allows the network to use more information when generating and analyzing responses. By mixing several softmaxes, each with its own context vector, the mixture of softmaxes network accomplishes exactly this.
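The mixture described above can be sketched in a few lines. This is not the repository's code; it is a minimal NumPy illustration under assumed shapes (all weight names here are illustrative): K mixture weights come from one softmax over the hidden state, K context vectors come from a tanh projection, and the output distribution is the prior-weighted sum of K component softmaxes over the vocabulary.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixture_of_softmaxes(h, W_prior, W_proj, W_emb, K):
    """Mixture of softmaxes over a vocabulary.

    h:       (batch, d)   hidden states
    W_prior: (d, K)       produces mixture weights
    W_proj:  (d, K*d)     produces K context vectors
    W_emb:   (V, d)       output embedding matrix
    """
    prior = softmax(h @ W_prior)                 # (batch, K) mixture weights
    ctx = np.tanh(h @ W_proj)                    # (batch, K*d)
    ctx = ctx.reshape(h.shape[0], K, -1)         # (batch, K, d)
    logits = ctx @ W_emb.T                       # (batch, K, V)
    comps = softmax(logits, axis=-1)             # K component distributions
    return (prior[:, :, None] * comps).sum(axis=1)  # (batch, V)

# Illustrative usage with random weights
rng = np.random.default_rng(0)
batch, d, V, K = 4, 8, 20, 3
h = rng.standard_normal((batch, d))
p = mixture_of_softmaxes(
    h,
    rng.standard_normal((d, K)),
    rng.standard_normal((d, K * d)),
    rng.standard_normal((V, d)),
    K,
)
# each row of p is a valid distribution over the vocabulary
```

Because the final probabilities are a convex combination of K softmaxes taken after the nonlinearity, the log-probability matrix is no longer constrained to rank d, which is the source of the added expressiveness.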

The code is incomplete and heavily under construction.
