This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

bug: zipfile.BadZipFile using pretrained BIST model #217

Open
mastreips opened this issue Apr 5, 2021 · 4 comments
Labels: bug (Something isn't working)

Comments

@mastreips

Describe the bug
A clear and concise description of what the bug is.
Model/procedure: what model or procedure were you running?

Running nlp_architect/models/absa/train/train.py fails with zipfile.BadZipFile: File is not a zip file when it tries to download the pretrained model for SpacyBISTParser(). Updating spaCy to 3.0 results in ImportError: cannot import name 'LEMMA_EXC'. That error comes from a change between spaCy v2.1 and v2.2 that moved the large lookup tables out of the main library: the lemmatizer data is now stored in the separate package spacy-lookups-data, and the Lemmatizer is initialized with a Lookups object instead of the individual variables.
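
For anyone checking which side of that change an environment is on, here is a minimal sketch. The helper name spacy_bundles_lemma_tables is hypothetical, and the 2.2 cutoff is taken from the description above rather than verified against every spaCy release.

```python
def spacy_bundles_lemma_tables(version):
    """Return True for spaCy versions before 2.2 (cutoff assumed from the report)."""
    # Only the major and minor components matter for this comparison.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (2, 2)
```

With spaCy installed, this could be called as spacy_bundles_lemma_tables(spacy.__version__) before importing anything that relies on LEMMA_EXC.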

To Reproduce
Steps to reproduce the behavior:

  1. pip_packages = ['nlp-architect','spacy==2.1.8','numpy==1.19.5']

Expected behavior

**Environment setup:**

  • OS (Linux/Mac OS): Azure AML
  • Python version: 3.6.9
  • Backend:

Additional context

Log Output

You can now load the model via spacy.load('en')
Using pre-trained BIST model.
Downloading pre-trained BIST model...
Unable to determine total file size.
Downloading file to: /root/nlp-architect/cache/bist-pretrained/bist-pretrained.zip

0MB [00:00, ?MB/s]
1MB [00:00, 579.96MB/s]
Download Complete
Unzipping...

[2021-04-05T14:57:17.529886] The experiment failed. Finalizing run...
2021-04-05 14:57:17,535 INFO Exiting context: TrackUserError
2021-04-05 14:57:17,536 INFO Exiting context: RunHistory
Cleaning up all outstanding Run operations, waiting 900.0 seconds
1 items cleaning up...
Cleanup took 0.07420921325683594 seconds
2021-04-05 14:57:30,901 INFO Exiting context: ProjectPythonPath
Traceback (most recent call last):
  File "train.py", line 46, in <module>
    max_iter=args.max_iter)
  File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/site-packages/nlp_architect/models/absa/train/train.py", line 49, in __init__
    self.parser = SpacyBISTParser()
  File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/site-packages/nlp_architect/pipelines/spacy_bist.py", line 46, in __init__
    _download_pretrained_model()
  File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/site-packages/nlp_architect/pipelines/spacy_bist.py", line 170, in _download_pretrained_model
    uncompress_file(zip_path, outpath=str(SpacyBISTParser.dir))
  File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/site-packages/nlp_architect/utils/io.py", line 85, in uncompress_file
    with zipfile.ZipFile(filepath) as z:
  File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/zipfile.py", line 1108, in __init__
    self._RealGetContents()
  File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/zipfile.py", line 1175, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
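
The log reports a download of about 1MB with no known total size, followed immediately by the unzip failure, so one plausible first step is to look at what was actually saved. The following is a rough diagnostic sketch, not part of nlp-architect; describe_download is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def describe_download(path, preview_bytes=64):
    """Summarize a downloaded file to help see why it might not unzip."""
    p = Path(path)
    with p.open("rb") as f:
        head = f.read(preview_bytes)
    return {
        "size_bytes": p.stat().st_size,
        "is_zip": zipfile.is_zipfile(str(p)),
        # ZIP archives normally begin with the bytes b"PK".
        "starts_with_pk": head.startswith(b"PK"),
        # Raw leading bytes, useful for spotting an HTML or XML error page.
        "head": head,
    }
```

For the run above it could be called as describe_download("/root/nlp-architect/cache/bist-pretrained/bist-pretrained.zip"). If head turns out to contain HTML or XML, the server probably returned an error page instead of the archive.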

@mastreips mastreips added the bug Something isn't working label Apr 5, 2021
@mastreips
Author

mastreips commented Apr 5, 2021

I was able to fix the issue by changing the code in io.py:

    with open(destfile, "wb") as f:
        for data in tqdm(req.iter_content(chunksz), total=nchunks, unit="MB", file=sys.stdout):
            f.write(data)
    print("Download Complete")

to:

    import urllib.request

    url = "https://d2zs9tzlek599f.cloudfront.net/models/dep_parse/bist-pretrained.zip"
    # Read the whole archive into memory, then write it to destfile.
    with urllib.request.urlopen(url) as remote:
        data = remote.read()
    with open(destfile, "wb") as local:
        local.write(data)
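
A variant of this workaround that streams the response to disk and checks the result before unzipping might look like the sketch below. It is an illustration only: download_zip is a hypothetical name, not a function in nlp_architect/utils/io.py, and it assumes the URL is reachable with plain urllib.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_zip(url, destfile):
    """Stream url to destfile, then verify that the result is a ZIP archive."""
    dest = Path(destfile)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so the whole archive never has to sit in memory.
    with urllib.request.urlopen(url) as remote, dest.open("wb") as local:
        shutil.copyfileobj(remote, local)
    if not zipfile.is_zipfile(str(dest)):
        dest.unlink()  # remove the bad file so a retry starts clean
        raise ValueError("{} did not return a ZIP archive".format(url))
    return dest
```

Failing at download time gives a clearer message than the later BadZipFile from uncompress_file, and it avoids leaving a corrupt file in the cache directory.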

@danielkorat
Collaborator

Hi @mastreips
According to the stack trace in your issue, you are using an old version of the code (updated 6 months ago according to git blame). I have tested SpacyBist downloading and ABSA execution end-to-end and everything works fine.

@vkurpad

vkurpad commented Apr 18, 2021

@danielkorat I am running into the same issue and I believe the issue is an outdated nlp-architect package.

Ran into the same issue with pip install nlp-architect

Resolved with build from cloned repo.

Can you validate that you ran your test with the package version?
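
To see which nlp-architect build an environment actually has, one option is to query the installed distribution metadata. This is a sketch assuming Python 3.8 or later (importlib.metadata is not in the 3.6 standard library used in the original report); installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version of the pip-installed package, or None if it is missing.
print(installed_version("nlp-architect"))
```

Comparing that value between a pip install and a build from the cloned repo would show whether the two tests ran against the same code.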

@danielkorat
Collaborator

danielkorat commented Apr 18, 2021

Hi @vkurpad,
The pip package URLs might be outdated. @peteriz can you confirm?
I installed from source, see the installation instructions here.

@danielkorat danielkorat reopened this Apr 18, 2021