
How to train Chinese word+character+ngram context features #16

Open
sherrytong opened this issue Jan 21, 2019 · 4 comments

@sherrytong

Hi, I've been using the ngram2vec toolkit recently and I'm a bit confused. To obtain word+character+ngram context features, how should my corpus be preprocessed: segmented into words, or into characters?
If it is segmented into words, what arguments should be passed to the scripts to get character features? I looked through the code but couldn't find this part.

@yongzhuo

Same question here; I also want to train word+character+ngram. With a window of 5, how are the characters or ngrams chosen: are the characters taken directly from the word-to-word pairs, or are only the 5 characters before and after the center taken?

@light0415

Same question; a detailed explanation would be appreciated.

@sunnychou0330

Hi, I have two questions for the author:
1. About the formula in the paper "ngram2vec: Learning Improved Word..." that is labeled as Equation (2):

[image: Equation (2) from the paper]

Could you explain its parameters in detail, especially the E() term?
2. About model training: could you give a high-level description of the model, i.e. its input, output, and intermediate layers (convolution, pooling, softmax, and so on)?
Many thanks.

@sunnychou0330

@sherrytong Same question as the OP. Has there been any progress on this? I also want to train word+character+ngram, and I'd like to know what the input should look like: is it the concatenation (concatenate()) of <word\character\ngram>, or something else? The exact input format is unclear to me. Unlike word2vec with gensim, where you train word embeddings by simply feeding in a word-segmented corpus, here I've been confused the whole time. Hoping for an answer.
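For what it's worth, here is a minimal sketch (NOT the ngram2vec implementation — the function name, window size, and feature choices are my own assumptions) of one common way such features are combined: the corpus stays word-segmented, and for each context word inside the window you additionally emit its characters and character bigrams as extra context features, rather than re-segmenting the corpus into characters:

```python
# Illustrative sketch only: this is NOT the code from the ngram2vec repo.
# It shows one plausible way to build (center, context) training pairs
# where the context side mixes words, characters, and character bigrams.

def context_features(words, window=2):
    """Yield (center_word, context_feature) pairs from a word-segmented sentence.

    For each context word inside the window we emit:
      - the word itself (word feature),
      - each character of the word (character features),
      - each character bigram of the word (ngram features).
    """
    pairs = []
    for i, center in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            ctx = words[j]
            pairs.append((center, ctx))                       # word feature
            pairs.extend((center, ch) for ch in ctx)          # character features
            pairs.extend((center, ctx[k:k + 2])               # bigram features
                         for k in range(len(ctx) - 1))
    return pairs

# A pre-segmented Chinese sentence (whitespace-separated words):
sentence = "我们 喜欢 自然 语言".split()
pairs = context_features(sentence, window=1)
```

Under this reading, the characters come from the words inside the window (not from a separate character-level window), which is the interpretation @yongzhuo's question hints at; whether ngram2vec itself does it this way would need confirmation from the maintainers.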
