MemoryError: Unable to allocate 1.47 TiB for an array with shape (635969, 635970) and data type float32 #160

Open
KevinYe553 opened this issue Feb 26, 2023 · 1 comment


@KevinYe553

from gensim.models import KeyedVectors

# model_file = r"fan_word2vec_binary.bin"
model_file = r"D:\code\python\MachineLearning\word2evc\test\ppmi.baidubaike.word" 
# Load the model
model = KeyedVectors.load_word2vec_format(model_file, binary=True)
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_7920\2878082268.py in <module>
      4 model_file = r"D:\code\python\MachineLearning\word2evc\test\ppmi.baidubaike.word"
      5 # Load the model
----> 6 model = KeyedVectors.load_word2vec_format(model_file, binary=True)
D:\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype, no_header)
   1627 
   1628         """
-> 1629         return _load_word2vec_format(
   1630             cls, fname, fvocab=fvocab, binary=binary, encoding=encoding, unicode_errors=unicode_errors,
   1631             limit=limit, datatype=datatype, no_header=no_header,

D:\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in _load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype, no_header, binary_chunk_size)
   1967         if limit:
   1968             vocab_size = min(vocab_size, limit)
-> 1969         kv = cls(vector_size, vocab_size, dtype=datatype)
   1970 
   1971         if binary:

D:\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in __init__(self, vector_size, count, dtype, mapfile_path)
    241         self.key_to_index = {}
    242 
--> 243         self.vectors = zeros((count, vector_size), dtype=dtype)  # formerly known as syn0
    244         self.norms = None
    245 

MemoryError: Unable to allocate 1.47 TiB for an array with shape (635969, 635970) and data type float32

Am I using it wrong?

@MasterSongM

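You don't have enough memory. PPMI vectors are sparse, but load_word2vec_format allocates a dense array of shape (vocab_size, vector_size), so loading this file would need 1.47 TiB (about 1,507 GiB) of memory.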
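
For reference, the 1.47 TiB figure in the error message can be reproduced from the reported shape and dtype alone. The sketch below is only that arithmetic; the helper name `dense_bytes` is made up for illustration and is not part of gensim or NumPy.

```python
# Reproduce the allocation size from the MemoryError message.
# Assumption: float32 uses 4 bytes per element (itemsize=4).

def dense_bytes(rows: int, cols: int, itemsize: int = 4) -> int:
    """Bytes needed for a dense rows x cols array (hypothetical helper)."""
    return rows * cols * itemsize

# Shape taken from the error message: (635969, 635970)
rows, cols = 635969, 635970
total = dense_bytes(rows, cols)

tib = total / 2**40   # tebibytes, the unit NumPy reports
gib = total / 2**30   # gibibytes
tb = total / 10**12   # decimal terabytes

print(f"{total:,} bytes = {tib:.2f} TiB = {gib:,.0f} GiB = {tb:.2f} TB")
```

The second dimension is almost equal to the vocabulary size, which fits a sparse word-by-context matrix rather than a dense embedding with a few hundred dimensions.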
