Semantra 0.2.0

No due date 0% complete

A redesign of Semantra to be more efficient and versatile. With these changes, Semantra will be easy to install and able to run stand-alone, with documents added and removed through the UI. The changes will likely not be backwards-compatible with previously stored embeddings, but they are a strong step towards stability.

Robustness

  • Unit tests
  • Linting
  • Pre-commit hooks
  • GitHub Actions, including a workflow to deploy to PyPI

Faster document storage and retrieval

  • Using annlite and docarray
  • Deprecate Annoy, which doesn't scale well to large document collections and poses installation problems

Additional formats

  • Rewrite PDF frontend renderer to use PDF.js to avoid needing backend PDF rendering
  • CSV, with indexing of selected columns
  • Audio and video, with transcription using faster-whisper
  • Ability to represent different processing options per file and memoize results (potentially requires a central SQLite database)
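
The per-file processing options and memoization item could be sketched roughly as follows. This is only an illustration, not the planned design: `cache_key`, `ResultCache`, the `results` table layout, and the JSON serialization of results are all assumptions made for the example. The idea is to key each cached result on the file's contents plus its processing options, so that changing either one triggers reprocessing.

```python
import hashlib
import json
import sqlite3


def cache_key(content: bytes, options: dict) -> str:
    """Hash the file bytes together with a canonical form of the options."""
    digest = hashlib.sha256(content)
    digest.update(b"\0")  # separator between file bytes and options
    digest.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


class ResultCache:
    """Hypothetical central cache; the table layout is an assumption."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results (key TEXT PRIMARY KEY, value TEXT)"
        )

    def get_or_compute(self, content: bytes, options: dict, compute):
        """Return the stored result, or run `compute` once and store it."""
        key = cache_key(content, options)
        row = self.conn.execute(
            "SELECT value FROM results WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])
        result = compute(content, options)  # must be JSON-serializable
        self.conn.execute(
            "INSERT INTO results (key, value) VALUES (?, ?)",
            (key, json.dumps(result)),
        )
        self.conn.commit()
        return result
```

Passing a file path instead of `":memory:"` would persist results across runs, which is where a central database becomes useful.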

Ease of installation

  • Use PyInstaller to create an installer that non-technical users can run
  • Ability to export document collections as entirely web-runnable demos using Transformers.js

Website

  • A dedicated documentation and demo website at semantra.ai (already registered)

Extensibility and documentation

  • A plug-in system for building additional document loaders and frontend document renderers
  • Well-documented APIs
  • Welcoming to contributors
  • Additional guides (contributing, installing, deploying on a server, recipes, how embeddings are stored/cached)
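
One possible shape for the document-loader side of such a plug-in system is a small registry keyed by file extension. The sketch below is an assumption for illustration only: `register_loader`, `LOADERS`, and `load_document` are hypothetical names, not part of any existing Semantra API.

```python
import os
from typing import Callable, Dict

# Hypothetical registry mapping a lowercase file extension to a loader.
LOADERS: Dict[str, Callable[[bytes], str]] = {}


def register_loader(extension: str):
    """Decorator that registers a loader function for one file extension."""

    def decorator(func: Callable[[bytes], str]) -> Callable[[bytes], str]:
        LOADERS[extension.lower()] = func
        return func

    return decorator


@register_loader(".txt")
def load_text(data: bytes) -> str:
    """Example plug-in: decode plain-text files as UTF-8."""
    return data.decode("utf-8")


def load_document(filename: str, data: bytes) -> str:
    """Dispatch to the loader registered for the file's extension."""
    extension = os.path.splitext(filename)[1].lower()
    loader = LOADERS.get(extension)
    if loader is None:
        raise ValueError(f"No loader registered for {extension!r}")
    return loader(data)
```

With this layout, a third-party package could add a format by decorating its own function with `register_loader`, without changes to the core dispatch code.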

Probably not for this release

  • Add a terminal-only search UI using Textual