The number of cells in 'structure' file and 'chunk' file does not match. #27

Open
xuewenyuan opened this issue Oct 28, 2020 · 2 comments

Comments

@xuewenyuan

xuewenyuan commented Oct 28, 2020

For many annotations, the number of cells in the 'structure' file and the 'chunk' file does not match. The cell ids do not match each other either.
E.g.,

| File | # cells |
| --- | --- |
| train/chunk/0907.1815v1.2.chunk | 221 |
| train/structure/0907.1815v1.2.json | 252 |

The maximum cell id in the 'rel' file is 218.

@maheshp1

That's because empty cells were added to the json files, whereas the chunk files contain only cells with content. I am facing the same problem when extracting the cell structure from the chunk and json files together. The json files contain the cell id, but the chunk files don't. If anyone has a way to match the two files, that would be good.
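One possible workaround, as a rough sketch rather than a confirmed solution: drop the empty cells from the json side, then pair the remaining cells with chunks by their text. The schema used here is an assumption, not something stated in this thread: a structure file with a `cells` list whose entries have `id` and a `content` token list, and a chunk file with a `chunks` list whose entries have `text`. The helper names are hypothetical, and text matching can fail when the two files tokenize a cell differently.

```python
import re


def normalize(text):
    """Drop all whitespace so tokenization differences do not matter."""
    return re.sub(r"\s+", "", text)


def nonempty_cells(structure):
    """Return the cells that have at least one non-blank content token."""
    return [
        cell for cell in structure["cells"]
        if any(tok.strip() for tok in cell.get("content", []))
    ]


def match_cells_to_chunks(structure, chunk_file):
    """Greedily pair each non-empty cell with the first unused chunk of equal text.

    Returns (pairs, unmatched_cell_ids, unmatched_chunk_indices), where
    pairs maps a cell id to the index of its chunk in chunk_file["chunks"].
    """
    # Index chunks by normalized text; keep a list because text can repeat.
    unused = {}
    for idx, chunk in enumerate(chunk_file["chunks"]):
        unused.setdefault(normalize(chunk["text"]), []).append(idx)

    pairs, unmatched_cells = {}, []
    for cell in nonempty_cells(structure):
        key = normalize("".join(cell["content"]))
        candidates = unused.get(key)
        if candidates:
            pairs[cell["id"]] = candidates.pop(0)
        else:
            unmatched_cells.append(cell["id"])

    unmatched_chunks = sorted(i for idxs in unused.values() for i in idxs)
    return pairs, unmatched_cells, unmatched_chunks


# Toy data in the assumed schema (not taken from the dataset).
structure = {"cells": [
    {"id": 0, "content": ["Method"]},
    {"id": 1, "content": []},            # empty cell, present only in the json
    {"id": 2, "content": ["F1", "score"]},
]}
chunk_file = {"chunks": [
    {"text": "F1 score"},
    {"text": "Method"},
    {"text": "extra"},
]}

pairs, unmatched_cells, unmatched_chunks = match_cells_to_chunks(structure, chunk_file)
print(len(nonempty_cells(structure)), len(chunk_file["chunks"]))
print(pairs, unmatched_cells, unmatched_chunks)
```

Comparing `len(nonempty_cells(structure))` with the chunk count is a quick check of whether empty cells explain the whole gap for a given file. Anything left in the unmatched lists would need a fallback, for example matching by position, which this sketch does not attempt.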

@xuewenyuan
Author

Thank you for your comment. It seems that most work uses this dataset only as a training set for cell detection and tests on the ICDAR13-Table dataset.
