Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.6.0] - 2024-05-07

Added

Changed

  • Renamed the csv_rag example to structured_data_rag.
  • Model engine name updates
    • The nv-ai-foundation and nv-api-catalog LLM engines are renamed to nvidia-ai-endpoints.
    • The nv-ai-foundation embedding engine is renamed to nvidia-ai-endpoints.
  • Embedding model updates
    • The developer_rag example now uses the UAE-Large-V1 embedding model.
    • API catalog examples now use ai-embed-qa-4 instead of nvolveqa_40k as the embedding model.
  • Ingested data now persists across multiple sessions.
  • Updated langchain-nvidia-ai-endpoints to version 0.0.11, enabling support for models such as Llama 3 (see the connector sketch after this list).
  • File-extension-based validation now raises an error for unsupported file types.
  • Increased the default output token length in the UI from 250 to 1024 for more comprehensive responses.
  • Stricter chain-server API validation to enhance API security.
  • Updated versions of llama-index and pymilvus.
  • Updated the pgvector container to pgvector/pgvector:pg16.
  • LLM model updates.
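
The engine rename, the ai-embed-qa-4 embedding switch, and the 0.0.11 connector update all surface through the langchain-nvidia-ai-endpoints package. Below is a minimal sketch of exercising that connector, assuming NVIDIA_API_KEY is exported; the Llama 3 model ID and the prompt are illustrative assumptions, not code from this release.

```python
# Minimal sketch, not this repository's code: the model ID and prompt are illustrative.
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# Chat model served through the NVIDIA API Catalog (Llama 3 support arrives with the 0.0.11 connector).
llm = ChatNVIDIA(model="meta/llama3-8b-instruct", temperature=0.2, max_tokens=1024)

# Embedding model referenced by the API catalog examples in this release.
embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")

print(llm.invoke("Summarize retrieval-augmented generation in one sentence.").content)
print(len(embedder.embed_query("vector search")))
```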

[0.5.0] - 2024-03-19

This release adds new dedicated RAG examples showcasing state-of-the-art use cases, switches to the latest NVIDIA API Catalog endpoints, and refactors the API interface of the chain-server. It also improves the developer experience by adding GitHub Pages based documentation and streamlining the example deployment flow with dedicated compose files.

Added

Changed

  • Switched from NVIDIA AI Foundation to NVIDIA API Catalog endpoints for accessing cloud-hosted LLM models.
  • Refactored the API schema of the chain-server component to support runtime configuration of LLM parameters such as temperature, max tokens, and chat history (a request sketch follows this list).
  • Renamed the llm-playground service in compose files to rag-playground.
  • Switched the base container for all components from PyTorch to Ubuntu and optimized both container build time and container size.
  • Deprecated YAML-based configuration to avoid confusion; all configuration is now driven by environment variables.
  • Removed the requirement to hardcode NVIDIA_API_KEY in the compose.env file.
  • Upgraded all Python dependencies for the chain-server and rag-playground services.
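
A hedged sketch of what calling the refactored chain-server with runtime LLM parameters might look like; the endpoint path, port, and field names are illustrative assumptions rather than the exact published schema.

```python
# Illustrative only: the endpoint path, port, and field names are assumptions,
# not the exact chain-server schema shipped in this release.
import requests

payload = {
    "messages": [{"role": "user", "content": "What topics do the ingested documents cover?"}],
    "use_knowledge_base": True,   # toggle retrieval at request time
    "temperature": 0.2,           # runtime LLM parameter
    "max_tokens": 512,            # runtime LLM parameter
}
response = requests.post("http://localhost:8081/generate", json=payload, timeout=60)
print(response.status_code, response.text[:200])
```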

Fixed

  • Fixed a bug that caused hallucinated answers when the retriever failed to return any documents.
  • Fixed accuracy issues across all examples.

[0.4.0] - 2024-02-23

Added

  • New dedicated notebooks showcasing cloud-hosted NVIDIA AI Playground models through LangChain connectors, as well as local model deployment using Hugging Face.
  • Upgraded the Milvus container version to enable GPU-accelerated vector search.
  • Added support for interacting with models behind NeMo Inference Microservices using the new model engines nemo-embed and nemo-infer.
  • Added support for providing an example-specific collection name for vector databases through an environment variable named COLLECTION_NAME.
  • Added FAISS as a generic vector database solution behind utils.py (see the sketch after this list).
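
A small sketch of the two additions above, assuming the faiss-cpu and langchain-community packages are installed; the fallback collection name, sample texts, and fake embeddings are illustrative stand-ins, not code from this release.

```python
# Illustrative sketch: the fallback collection name, sample texts, and fake
# embeddings are assumptions, not this repository's actual utilities.
import os

from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import FAISS

# COLLECTION_NAME selects an example-specific collection in the vector database.
collection_name = os.environ.get("COLLECTION_NAME", "developer_rag")

# FAISS serves as a generic, locally hosted vector database option.
store = FAISS.from_texts(
    ["GPU-accelerated vector search", "retrieval-augmented generation"],
    embedding=FakeEmbeddings(size=512),
)
print(f"collection '{collection_name}' holds {store.index.ntotal} vectors")
```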

Changed

  • Upgraded the base container for all components to pytorch:23.12-py3.
  • Added a LangChain-specific vector database connector in utils.py.
  • Changed speech support to use a single channel for Riva ASR and TTS.
  • Changed the get_llm utility in utils.py to return LangChain wrappers instead of LlamaIndex wrappers (a hypothetical sketch follows this list).
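
A hypothetical sketch of a get_llm-style helper returning a LangChain wrapper, in the spirit of the change above; the engine handling and the model ID are illustrative assumptions, not the repository's actual implementation.

```python
# Hypothetical sketch only: engine handling and the model ID are illustrative.
from langchain_core.language_models import BaseChatModel


def get_llm(model_engine: str) -> BaseChatModel:
    """Return a LangChain chat-model wrapper for the configured model engine."""
    if model_engine == "nv-ai-foundation":
        from langchain_nvidia_ai_endpoints import ChatNVIDIA

        return ChatNVIDIA(model="meta/llama3-8b-instruct")  # illustrative model ID
    raise ValueError(f"Unsupported model engine: {model_engine}")
```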

Fixed

  • Fixed a bug that caused empty ratings in the evaluation notebook.
  • Fixed the document search implementation of the query decomposition example.

[0.3.0] - 2024-01-22

Added

Changed

  • Upgraded LangChain and LlamaIndex dependencies for all containers.
  • Restructured README files to be more intuitive.
  • Added a provision to plug in multiple examples using a common base class (see the sketch after this list).
  • Changed the minio service's port from 9000 to 9010 in Docker-based deployments.
  • Moved the evaluation directory from the top level to under tools and created a dedicated compose file for it.
  • Added an experimental directory for plugging in experimental features.
  • Modified notebooks to use TensorRT-LLM and NVIDIA AI Foundation based connectors from LangChain.
  • Changed the ai-playground model engine name to nv-ai-foundation in configurations.
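
A hypothetical sketch of the common-base-class idea; the class name, method names, and signatures are illustrative assumptions, not the repository's actual interface.

```python
# Hypothetical sketch only: names and signatures are illustrative.
from abc import ABC, abstractmethod
from typing import Generator, List


class BaseExample(ABC):
    """Common interface that each pluggable RAG example implements."""

    @abstractmethod
    def ingest_docs(self, file_path: str) -> None:
        """Parse a document and store its embeddings in the vector database."""

    @abstractmethod
    def llm_chain(self, question: str, chat_history: List[dict]) -> Generator[str, None, None]:
        """Stream an answer generated directly by the LLM, without retrieval."""

    @abstractmethod
    def rag_chain(self, question: str, chat_history: List[dict]) -> Generator[str, None, None]:
        """Stream an answer grounded in retrieved document chunks."""
```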

Fixed

[0.2.0] - 2023-12-15

Added

Changed

  • Restructured the repository to better support open-source contributions.
  • Upgraded dependencies for the chain-server container.
  • Upgraded the NeMo Inference Framework container version; no separate sign-up is needed for access.
  • The main README now provides more details.
  • Documentation improvements.
  • Better error handling and reporting for corner cases.
  • Renamed the triton-inference-server container to llm-inference-server.

Fixed