Incorrect file reading mode and offset in Conll.py #21

Open
userofgithub1 opened this issue Sep 7, 2018 · 0 comments

Hi,

I noticed that the mode in the file-opening line in Conll.py is incorrect. It is 'rd', but it should be 'rb':

with open(path, 'rd') as f:
    doc_id = None
    doc_tokens = None
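
For anyone who wants to reproduce this, here is a minimal sketch of how the three modes behave, assuming Python 3. The helper name `try_mode` and the temporary sample file are made up for illustration and are not part of Conll.py.

```python
import os
import tempfile


def try_mode(path, mode):
    """Open `path` with `mode`; return the type name of what read() yields,
    or the error message if open() rejects the mode."""
    try:
        with open(path, mode) as f:
            return type(f.read()).__name__
    except ValueError as exc:
        return "ValueError: {}".format(exc)


# Write a small throwaway file (placeholder content, not real dataset lines).
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"token\n")

# Compare the mode from the issue ('rd') with binary and text read modes.
results = {mode: try_mode(sample_path, mode) for mode in ("rd", "rb", "r")}
os.remove(sample_path)

for mode, outcome in results.items():
    print(mode, "->", outcome)
```

Note that 'rb' yields bytes while 'r' yields str, so any later string handling on the lines has to match whichever mode is chosen.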

Also, the mention positions are calculated incorrectly, both when only reading the dataset and after linking.

The incorrect mention offsets are probably caused by these lines in Conll.py:

begin = sum(len(t)+1 for t in doc_tokens)
dodgy_tokenisation_bs_offset = 1 if re.search('[A-Za-z],',parts[2]) else 0
position = (begin, begin + len(parts[2]) + dodgy_tokenisation_bs_offset)
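
One way to check the offsets is sketched below. It assumes the document text is rebuilt by joining the tokens with single spaces, which is what the `len(t)+1` sum implies. I have not confirmed that this is how the text is actually reconstructed elsewhere in the code. The function names and the sample tokens are made up for illustration.

```python
def mention_span(preceding_tokens, mention):
    """Character span of `mention` when it starts right after
    `preceding_tokens`, assuming tokens are joined by single spaces."""
    begin = sum(len(t) + 1 for t in preceding_tokens)
    return begin, begin + len(mention)


def span_matches(all_tokens, span, mention):
    """True if slicing the space-joined text with `span` gives `mention`."""
    text = " ".join(all_tokens)
    begin, end = span
    return text[begin:end] == mention


# Made-up example tokens, not taken from the dataset.
tokens = ["EU", "rejects", "German", "call"]
span = mention_span(tokens[:2], "German")

print(span, span_matches(tokens, span, "German"))

# A span whose end is shifted by one no longer matches the mention text.
shifted = (span[0], span[1] + 1)
print(shifted, span_matches(tokens, shifted, "German"))
```

Running a check like `span_matches` over every mention in a document would show which positions are off and by how much.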

Hope this is helpful and that the files get fixed :)
Thanks :)
