Import PDF files from a dir #9

Open
spectramaster opened this issue Apr 26, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@spectramaster

@freedmand

Good job! Semantra runs smoothly on my linux PC!

I think the following command options would be useful and helpful:

semantra [dir]
semantra [dir1] [dir2] [....]

Each one would import one or more directories containing many PDF files.

freedmand added the enhancement (New feature or request) label Apr 26, 2023
@freedmand
Owner

Agreed! This seems useful. I'm thinking the behavior that makes sense would be to recursively include .txt and .pdf files when you specify a directory. Do you also think that makes sense?
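
The behavior proposed above could be sketched roughly as follows. This is only an illustration, not Semantra's actual code: `expand_inputs` and `SUPPORTED_SUFFIXES` are hypothetical names, and the choices to match suffixes case-insensitively and to pass explicitly named files through unchanged are assumptions.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: only these two types are collected from directories.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def expand_inputs(inputs: Iterable[str]) -> List[Path]:
    """Expand a mix of file and directory arguments into a flat file list.

    Directories are searched recursively for supported files; paths that
    are not directories are kept as given, so the user can still name a
    single file explicitly.
    """
    found: List[Path] = []
    seen = set()
    for raw in inputs:
        path = Path(raw)
        if path.is_dir():
            # rglob("*") walks the whole tree under the directory.
            candidates = sorted(
                p
                for p in path.rglob("*")
                if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
            )
        else:
            candidates = [path]
        for candidate in candidates:
            # Skip files reached twice, e.g. via overlapping directories.
            key = candidate.resolve()
            if key not in seen:
                seen.add(key)
                found.append(candidate)
    return found
```

With something like this, `semantra [dir1] [dir2]` would reduce to the existing multi-file case by calling `expand_inputs` on the command-line arguments before processing.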

@spectramaster
Author

Of course! Importing many files of various types, including .txt and .pdf, from a directory would really improve the experience of using Semantra.

I think the "Unstructured" package, which LangChain wraps as a document loader and which can parse different file types including .txt and .pdf, may be a good technical solution.

https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html

https://github.com/Unstructured-IO/unstructured
