We designed an information retrieval system in Python based on the vector space model. We also implemented bigram indices for phrase query search, along with champion-list retrieval. Our project report compares the overall retrieval times.

Krutash/Vector-Space-IR-model

Author: UTKARSH KUMAR


  1. Please refer to "requirements.txt" for the libraries required to run the code.

  2. Run "corpusProcess.py" first to generate corpus files.

  3. The default corpus is "wiki_56", but a different corpus or a list of corpora can be given as command-line arguments when running "corpusProcess.py".

  4. To test queries, please provide your query in "query.txt" and run "test_queries.py".

  5. By default, "test_queries.py" uses the files generated by "corpusProcess.py".

  6. "test_queries.py" also accepts command-line arguments, with the file names given in this order:

    1. Query file
    2. Index file
    3. Bigram_Index file
    4. Document IDs file
  7. Please allow 5-10 minutes for each script to preprocess the data, perform file I/O, and construct the required data structures.
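
The ordered arguments in step 6 can be pictured with a short sketch. This is not the code in "test_queries.py"; it only illustrates one way positional arguments could override defaults in the stated order. The function name `resolve_paths` and every default file name except "query.txt" are hypothetical placeholders, since this README does not say what "corpusProcess.py" names its output files.

```python
# Order taken from step 6: query file, index file, bigram index file,
# document IDs file. Only "query.txt" comes from the README; the other
# three names are hypothetical placeholders.
DEFAULTS = ["query.txt", "index_file", "bigram_index_file", "doc_ids_file"]


def resolve_paths(args, defaults=DEFAULTS):
    """Overlay the given positional arguments onto the defaults, in order.

    Arguments beyond the number of defaults are ignored; missing trailing
    arguments fall back to their default values.
    """
    given = list(args[:len(defaults)])
    return given + list(defaults[len(given):])


# Example: only the query file is overridden, the rest stay at defaults.
query_file, index_file, bigram_file, doc_ids_file = resolve_paths(["my_queries.txt"])
```

In a real script, `args` would typically be `sys.argv[1:]`.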


To explore how the model performs on other corpora, some corpus files are available at: https://drive.google.com/drive/folders/1ZsnuEm7_N6aUwhjFpv-TZXFt4DiYex4t?usp=sharing
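
For readers experimenting with a new corpus, it may help to see what vector space ranking looks like in miniature. The sketch below is an assumption-laden illustration, not the repository's implementation: it uses whitespace tokenization, log-scaled tf-idf weights, and cosine-style scoring, and it omits the bigram index and champion lists. The names `build_index` and `rank` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Return (index, norms) for a {doc_id: text} mapping.

    index maps term -> {doc_id: tf-idf weight}; norms maps doc_id -> the
    Euclidean length of that document's weight vector.
    """
    n_docs = len(docs)
    tf = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter(term for counts in tf.values() for term in counts)
    index = defaultdict(dict)
    squared = defaultdict(float)
    for doc_id, counts in tf.items():
        for term, count in counts.items():
            weight = (1 + math.log10(count)) * math.log10(n_docs / df[term])
            index[term][doc_id] = weight
            squared[doc_id] += weight * weight
    return index, {doc_id: math.sqrt(v) for doc_id, v in squared.items()}


def rank(query, index, norms, k=10):
    """Return up to k doc_ids ordered by cosine-style score, best first.

    The query vector is not length-normalized; that scales every score by
    the same factor, so the ordering is unaffected.
    """
    scores = defaultdict(float)
    for term, q_count in Counter(query.lower().split()).items():
        q_weight = 1 + math.log10(q_count)
        for doc_id, d_weight in index.get(term, {}).items():
            scores[doc_id] += q_weight * d_weight
    ranked = sorted(
        ((score / norms[doc_id], doc_id)
         for doc_id, score in scores.items() if norms[doc_id] > 0),
        reverse=True,
    )
    return [doc_id for _, doc_id in ranked[:k]]
```

A champion list, in the usual sense of the term, would restrict each term's postings to its few highest-weighted documents before scoring, trading some recall for speed.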
