The current version of Semantra (0.1.3) is a good start, but unfortunately cannot be used as a generic PDF search engine (though it is very close!).
I indexed every PDF on my Linux machine by running the following command:
find / -iname "*.pdf" -type f -print0 | xargs -0 semantra --no-server
I note that:
- Only one CPU core is used.
- It is not clear to me that I could rewrite my find command to safely run 10 copies of Semantra in parallel, so I didn't.
- Once I had everything indexed, there was no way to tell Semantra to just search over all already-indexed files.
- Using the same command for indexing and viewing violates separation of concerns.
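One possible way to spread indexing across cores is to let xargs launch several Semantra processes at once with -P. This is only a sketch: -P and -r are GNU/BSD xargs extensions rather than POSIX, index_pdfs and the INDEX_CMD override are hypothetical names, and it assumes (unverified) that concurrent Semantra processes can safely share one cache directory, which is exactly the open question above.

```shell
# index_pdfs ROOT [JOBS]
# Hypothetical helper: find every PDF under ROOT and index them in parallel.
# ASSUMPTION (unverified): several semantra processes may safely write to
# the same cache directory at the same time.
index_pdfs() {
  root="$1"
  jobs="${2:-4}"   # number of worker processes to run at once
  # -print0 / -0 keep file names containing spaces or newlines intact.
  # -n 8 gives each worker a batch of up to 8 files, so process start-up
  #      is paid once per batch rather than once per file.
  # -P runs up to $jobs batches concurrently; -r skips the run if no PDFs.
  # INDEX_CMD (hypothetical) lets you substitute a dry-run command; it is
  # left unquoted on purpose so "semantra --no-server" splits into words.
  find "$root" -iname '*.pdf' -type f -print0 |
    xargs -0 -r -n 8 -P "$jobs" ${INDEX_CMD:-semantra --no-server}
}
```

For example, INDEX_CMD=echo index_pdfs . 2 would print the batches that would be indexed instead of indexing them.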
Also, I note that when run without arguments, Semantra's usage line indicates that the filename is optional (it is shown in []), but Semantra does not actually accept being run without one:
$ semantra
Usage: semantra [OPTIONS] [FILENAME]...
Try 'semantra --help' for help.
Error: Must provide a filename to process/query
I appreciate the work done so far, and have the following suggestions:
- If nothing else, provide a command-line option that makes Semantra search all already-indexed files, or make this the default behaviour when no file name is provided.
- Split Semantra into two programs, one for indexing and one for searching, and allow indexers to run in parallel.
- Allow the search webapp to run independently of the indexers, so I can add files to the index without fear of breaking the webapp's search capabilities and can leave the search window open 24/7. This could hopefully be as simple as moving the index files out of a temporary directory and into the Semantra search directory only once indexing has completed.
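The "build in a temporary directory, then move into place" idea can be sketched in shell. Because mv within a single filesystem is a rename(2), a search process opening an index file sees either the old or the new version, never a half-written one. The layout is hypothetical: publish_index, BUILD_DIR and LIVE_DIR are illustrative names, and the sketch assumes the index is a flat set of regular, non-hidden files, which may not match how Semantra actually stores its data.

```shell
# publish_index BUILD_DIR LIVE_DIR
# Hypothetical helper: move finished index files from BUILD_DIR into
# LIVE_DIR (the directory a search server would read from).
# ASSUMPTION: the index is a flat set of regular, non-hidden files.
publish_index() {
  build="$1"
  live="$2"
  mkdir -p "$live" || return 1
  # Stage inside LIVE_DIR so the final mv is a same-filesystem rename,
  # which replaces an existing file atomically. A reader would need to
  # ignore dot-prefixed entries such as this staging directory.
  stage=$(mktemp -d "$live/.staging.XXXXXX") || return 1
  # The slow, non-atomic copy (possibly across filesystems) happens
  # out of sight in the staging directory.
  if ! cp -R "$build"/. "$stage"/; then
    rm -rf "$stage"
    return 1
  fi
  for f in "$stage"/*; do
    [ -e "$f" ] || continue    # BUILD_DIR was empty: the glob did not match
    mv -f "$f" "$live"/ || return 1
  done
  rmdir "$stage"               # fails (non-zero) if anything was left behind
}
```

A separate indexer could run something like this as its last step, leaving a long-running search server untouched until the new files are complete.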
Thanks for the detailed write-up. This is clearly the way to go for 0.2.0 (and it was not obvious to me when I first started creating Semantra <1mo ago that it could actually be useful in this sense). I think this separation of concerns also ties into an idea of being able to add/remove files from the frontend itself (e.g. you could launch semantra without args and then open things in the UI).
This will take some design/thought to do elegantly, so I'll think further on this. But I'm interested in any more detailed design ideas you or anyone else might have on it.
It would be nice to be able to create indexes that cover only selected folders, as dtSearch allows. I am writing a book and have folders for books, articles, documents, and news; my other projects have their own folders. I would like to search only within the folders of a given project.
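Until per-project indexes exist, something close may already be possible with the current command line, since Semantra searches over whichever files it is given. The sketch below (hypothetical search_folders helper and SEMANTRA_CMD override) collects the PDFs in a project's folders and opens Semantra on just those. It assumes previously processed files are reused from Semantra's cache rather than re-embedded, and xargs may split a very long file list across several invocations, which would start more than one server.

```shell
# search_folders FOLDER...
# Hypothetical helper: run Semantra over the PDFs in the given folders only,
# e.g. the articles/ and news/ folders of one project.
search_folders() {
  if [ "$#" -eq 0 ]; then
    echo "usage: search_folders FOLDER..." >&2
    return 2
  fi
  # SEMANTRA_CMD is a hypothetical override (default: plain "semantra"),
  # left unquoted so a multi-word command splits into separate words.
  # CAVEAT: with very many files xargs may split the list into several
  # invocations, each of which would start its own search server.
  find "$@" -iname '*.pdf' -type f -print0 |
    xargs -0 -r ${SEMANTRA_CMD:-semantra}
}
```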