
Questions & Requests? #20

Open
akshowhini opened this issue Jul 5, 2020 · 34 comments

Comments

@akshowhini
Contributor

  1. eval_relations(gt=[ground_truth_relations], res=[your_relations], cmp_blank=True)
  • Since the objective is to find the merged relations of neighboring cells, isn't comparing blank relations the wrong evaluation metric? (See the sketch below.)
  2. rel_gen.py
  • That assumes every cell has unique, case-sensitive text, even though it is only used to generate relations for comparison against the predictions. Couldn't this produce false negatives?
  3. Evaluation stats & trained models
  • Can we expect the evaluation stats & trained models to be published?
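
For context, here's roughly how I'm calling the evaluation (a minimal sketch; json2Relations and the (precision, recall) return value are my reading of the repo, so please correct me if that's off):

import json
from scitsr.eval import json2Relations, eval_relations

# Ground-truth structure file for one table, converted to relations
# (json2Relations / splitted_content are assumptions from my reading of the repo).
gt_json = json.load(open("some_table_structure.json"))
ground_truth_relations = json2Relations(gt_json, splitted_content=True)

# your_relations: the relation list your model produced for the same table
precision, recall = eval_relations(
    gt=[ground_truth_relations], res=[your_relations], cmp_blank=True)

# cmp_blank=False would, as I understand it, skip blank relations in the
# comparison, which is what the first question above is about.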
@abhyantrika

Hi,
Do you know how to recover the graph/table structure from the model predictions?

@akshowhini
Contributor Author

akshowhini commented Jul 7, 2020

Unfortunately not, I'm still trying to understand that piece of it. I'm hoping for help from @CZWin32768, the author of this repo.

@rmporsch
Contributor

rmporsch commented Jul 7, 2020

@akshowhini I am not one of the authors; I just fixed an issue when I was looking over this repo. I believe table reconstruction is not included, and in my experience it is a topic of its own. Table reconstruction, including accounting for potential errors, isn't that trivial.

@abhyantrika

@akshowhini @rmporsch
I had a doubt regarding the annotations.
x1, x2, y1, y2 are with respect to what? The original PDF image, or the cropped table image?
Neither seems to match. I am assuming x1, x2, y1, y2 are xmin, xmax, ymin, and ymax.

@rmporsch
Contributor

rmporsch commented Jul 8, 2020

@abhyantrika I will say that this also confused me a bit. I believe it's also not mentioned in the paper:
You can see in

def transform_coord(chunks):
    # Get table width and height
    coords_x, coords_y = [], []
    for chunk in chunks:
        coords_x.append(chunk.x1)
        coords_x.append(chunk.x2)
        coords_y.append(chunk.y1)
        coords_y.append(chunk.y2)
    # table_width = max(coords_x) - min(coords_x)
    # table_height = max(coords_y) - min(coords_y)
    # Coordinate transformation for chunks
    table_min_x, table_max_y = min(coords_x), max(coords_y)
    chunks_new = []
    for chunk in chunks:
        x1 = chunk.x1 - table_min_x
        x2 = chunk.x2 - table_min_x
        y1 = table_max_y - chunk.y2
        y2 = table_max_y - chunk.y1
        chunk_new = Chunk(
            text=chunk.text,
            pos=(x1, x2, y1, y2),
        )
        chunks_new.append(chunk_new)
    # return table_width, table_height
    return chunks_new

the chunks are actually transformed. You will need to re-convert them to visualize the results.

Please note that the evaluation stats are published in the paper, @akshowhini. I guess this should also answer a number of your other questions.

@abhyantrika

@rmporsch
I saw that.
So basically there are chunks in the chunk folder, and once loaded they are transformed.
But I visualized both the table and the PDF (as images), and no coordinates (original or transformed) matched the boxes in either image. I am seriously confused.
Do you have any intuition on that?

@rmporsch
Contributor

rmporsch commented Jul 8, 2020

@abhyantrika
The following works fine for me:

        # table_min_x / table_max_y are the same offsets computed in
        # transform_coord (here stored during preprocessing).
        for chunk in chunks:
            copy_chunk = copy.deepcopy(chunk)
            chunk.x1 = copy_chunk.x1 + self.table_min_x
            chunk.x2 = copy_chunk.x2 + self.table_min_x
            chunk.y1 = table_max_y - copy_chunk.y2
            chunk.y2 = table_max_y - copy_chunk.y1
            chunk.pos = [chunk.x1, chunk.x2, chunk.y1, chunk.y2]
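
To actually draw the recovered boxes I do something like the following (a minimal sketch; it assumes the coordinates are PDF points with a bottom-left origin and that the page image was rendered at 1 pt = 1 px):

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

def draw_chunks(image_path, chunks):
    """Draw recovered chunk boxes on a page image.

    Assumes chunk.pos = (x1, x2, y1, y2) in PDF points with a bottom-left
    origin, and that the image was rendered at 1 pt = 1 px.
    """
    img = Image.open(image_path)
    height = img.height
    fig, ax = plt.subplots()
    ax.imshow(img)
    for chunk in chunks:
        x1, x2, y1, y2 = chunk.pos
        # Flip y: the image origin is top-left, the PDF origin is bottom-left.
        top = height - y2
        ax.add_patch(patches.Rectangle((x1, top), x2 - x1, y2 - y1,
                                       fill=False, edgecolor="red"))
    plt.show()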

@abhyantrika

@rmporsch
Is this a preprocessing function you apply in utils.py?
And I assume they refer to xmin, xmax, ymin, ymax of the table image (not the PDF image).

@rmporsch
Contributor

rmporsch commented Jul 8, 2020

This is a post-processing step I did to visualize the prediction results when I took a look at the paper. table_min_x etc. refer to the same ones as in data/utils.py.

@abhyantrika

abhyantrika commented Jul 8, 2020

@rmporsch
Okay.
Here is my code for the relation predictions:

for d in dataset:
    outputs = model(d.nodes, d.edges, d.adj, d.incidence)
    output_rel = outputs.max(dim=1)[1]

Now, does your post-processing need to be applied to d.chunks?

@abhyantrika

@rmporsch @akshowhini
Sorry for the silly questions.
Also, what exactly do the coordinates in the .chunk files represent?
I tried visualizing them for both the PDF and the table image; the coordinates do not seem to match any boxes.

@rmporsch
Contributor

@abhyantrika Yes, since the chunks are changed during preprocessing (in the data loader), you will need to change them back.

@kbrajwani

@rmporsch Hey, can you provide steps or an example notebook for training and running inference with the model?

@rmporsch
Contributor

rmporsch commented Oct 30, 2020 via email

@kbrajwani

@rmporsch @akshowhini @abhyantrika Can you help me run inference with the trained model? I only have an image containing the table, so how can I create the other required files and get the table structure from the image?

@rmporsch
Contributor

For that you would need a text segmentation model and probably an OCR model first; this model requires at least the coordinates of the text.
Have a look at other projects if you need something more end-to-end. If you really want to use this model, please aim to understand the paper first. This is not a production-ready repo but a research project.

@kbrajwani

Thanks for the reply. I am using the Tesseract model, which gives me text coordinates along with the transcriptions.
Yes, I have experience with CascadeTabNet, which gives me the table coordinates; then I do line detection to get the structure of the table.
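
Roughly, I plan to build the chunk file from the Tesseract output like this (just a sketch; the chunk JSON layout and the bottom-left y-axis convention are my assumptions from looking at the dataset files, so please correct me if that's wrong):

import json
import pytesseract
from PIL import Image

def image_to_chunks(image_path, out_path):
    """Sketch: turn Tesseract word boxes into a SciTSR-style chunk file.

    Assumptions (verify against a real .chunk file from the dataset):
      * the JSON looks like {"chunks": [{"pos": [x1, x2, y1, y2], "text": ...}]}
      * pos uses a bottom-left origin (PDF-style), so the image y is flipped
    """
    img = Image.open(image_path)
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    chunks = []
    for text, left, top, w, h in zip(data["text"], data["left"], data["top"],
                                     data["width"], data["height"]):
        if not text.strip():
            continue
        x1, x2 = left, left + w
        y1, y2 = img.height - (top + h), img.height - top  # flip to bottom-left origin
        chunks.append({"pos": [x1, x2, y1, y2], "text": text})
    with open(out_path, "w") as f:
        json.dump({"chunks": chunks}, f)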

@kbrajwani

@rmporsch As per your guidance I was reading the paper and I found the following.
The model takes a chunk file, which holds the transcriptions and coordinate boxes that we can generate from an OCR model, and a structure file, which holds the cell positions; that is the part I need help with.
Then the model finds the structure of the table.


@kbrajwani

@CZWin32768 @rmporsch Hey, I have a doubt: the structure file is essentially the complete table, right? So if we give it the complete table, what is the model actually doing?
Or does the model not take the structure into account while training, and predict from the chunk and rel files instead?

The rel file is just the relations (horizontal and vertical), plus the cells between them, right?

@kbrajwani

@rmporsch Hey, can you give me a few hints on how to go ahead? I am stuck.

@rmporsch
Contributor

rmporsch commented Dec 3, 2020

@kbrajwani The model predicts the edges (a column relation, a row relation, or no relationship) between each pair of nodes (the text boxes). Hence, during inference the model predicts, from the features present in the chunk file, the relationship between each pair of text chunks.
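
Very roughly, collecting the predictions could look like this (a sketch only; I'm assuming outputs has one score row per edge, in the same order as the edge list, and the integer-to-label order is something you need to check against the data loader):

def collect_relations(edge_pairs, outputs, label_names):
    """Turn per-edge scores into (src_chunk_idx, dst_chunk_idx, label) triples.

    edge_pairs: list of (src, dst) chunk indices, one per edge (assumed order)
    outputs: [num_edges, num_classes] score tensor from the model
    label_names: e.g. {0: "no relation", 1: "horizontal", 2: "vertical"};
                 the exact index order is an assumption, check the loader.
    """
    pred = outputs.max(dim=1)[1]  # predicted class index per edge
    relations = []
    for (src, dst), label_id in zip(edge_pairs, pred.tolist()):
        label = label_names[label_id]
        if label != "no relation":
            relations.append((src, dst, label))
    return relations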

@kbrajwani

@rmporsch Thanks for the support. I am studying this type of work, so if you know of another project similar to this one, could you please share it here? It would be appreciated.

@levanpon98

Hi,
How do I recover the relations predicted by the model into a CSV?
For example: if node_1 has 2 relations by column, then when exporting to CSV, node_1 is a merged cell.

@Darenar

Darenar commented Feb 4, 2021

Hi, everyone.
For those who also struggle with identifying the bounding boxes in the PDF, I think I've got a solution.
First of all, to read the PDF correctly through Python, you have to be sure that the size of the loaded PDF is exactly the same as the original one.
For example, I've been loading pages from the PDF as images using pdf2image's convert_from_path function. By default it uses dpi=200, which turns out to be incorrect. What you need to do is identify the correct size of the PDF, using for example PyPDF2 as in https://stackoverflow.com/questions/6230752/extracting-page-sizes-from-pdf-in-python , and then call convert_from_path with the size parameter set to the correct shape of the PDF.

Another vital point is that the coordinates are different from those you would apply to your PIL.Image object, for example. What you have to do is adjust your y1 and y2 by y1 = PAGE_HEIGHT - y2 and y2 = PAGE_HEIGHT - y1. This comes from the different y-axis conventions in Python images and in the PDF itself.
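
Putting it together, roughly (a sketch; note that PyPDF2's API names differ between versions, and I'm using the newer PdfReader/mediabox naming here):

from pdf2image import convert_from_path
from PyPDF2 import PdfReader

def load_page_at_pdf_size(pdf_path, page_index=0):
    """Render a PDF page as an image whose pixel size matches the PDF points."""
    page = PdfReader(pdf_path).pages[page_index]
    width = float(page.mediabox.width)
    height = float(page.mediabox.height)
    img = convert_from_path(pdf_path, size=(int(width), int(height)))[page_index]
    return img, height

# The chunk coordinates use a bottom-left origin, so flip y before drawing
# on the image: y1_img = page_height - y2_pdf, y2_img = page_height - y1_pdf.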

I hope this will be handy to someone, because I've struggled with it for almost the whole day!

@brabbit61

brabbit61 commented Mar 1, 2021

@Darenar Hey, that tip was useful, thanks for that! Could you also please shed some light on the inference script? I can't seem to find it.

@JBBalling

Has anyone done inference with the provided model? I am stuck loading the TableInferDataset: the script wants to load the relation files from relation_path, but at inference time there shouldn't be a relation file, because that is what we want to predict, isn't it?
https://github.com/Academic-Hammer/SciTSR/blob/master/scitsr/data/loader.py#L121

I have a structure.json and a chunks.json file matching the input of the training routine. What am I missing?
Any help is appreciated, thank you!

@abhyantrika @rmporsch @kbrajwani

@JBBalling

JBBalling commented Aug 4, 2022

I have managed to get this running and added an inference script to my fork. I also had to make minor changes to the source code in the TableInferDataset class. Here is the link to the repo. Have fun, and I hope it helps.

Where:
[0, 0, 1] -> vertical relation
[0, 1, 0] -> horizontal relation
[1, 0, 0] -> no relation

https://github.com/JBBalling/SciTSR
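
In code, the mapping above boils down to something like this (assuming you take the argmax over the three outputs per edge):

# argmax index -> relation type, following the one-hot targets above
REL_LABELS = {0: "no relation", 1: "horizontal relation", 2: "vertical relation"}

# outputs: the [num_edges, 3] score tensor the model returns for one table
pred_indices = outputs.max(dim=1)[1]
pred_labels = [REL_LABELS[i] for i in pred_indices.tolist()]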

@MathamPollard


@Darenar Do you know how to run inference? In other words, how do I make predictions?

@MathamPollard


@akshowhini Do you know how to run inference? In other words, how do I make predictions?

@MathamPollard


@abhyantrika Do you know how to run inference? In other words, how do I make predictions?

@MathamPollard


@rmporsch Do you know how to run inference? In other words, how do I make predictions?

@MathamPollard


@kbrajwani Do you know how to run inference? In other words, how do I make predictions?

@MathamPollard


@brabbit61 Do you know how to run inference? In other words, how do I make predictions?

@MathamPollard


@Darenar Do you know how to run inference? In other words, how do I make predictions?
