The Gemma AI Toolkit offers a user-friendly way to leverage Google's latest open-source model, Gemma, for text generation and processing tasks such as conversation, summarization, and question answering.
This toolkit provides both a Python wrapper and a command-line interface, so no deep technical knowledge is required. It supports various model versions, including special instruction-tuned variants, and allows for offline use once the models are downloaded to your system.
- Conversational AI: Create interactive, real-time chat experiences (chatbots) or AI assistants.
- Text Generation: Produce coherent and contextually relevant text and answers from simple prompts.
- Offline Capability: Utilize models offline once downloaded, reducing dependency on an internet connection.
- Highly Customizable: Tailor settings like model version, maximum tokens, and more to suit your specific requirements.
- Lightweight Integration: Efficiently designed with minimal dependencies, requiring only the `torch` and `transformers` packages for core functionality.
- Python 3.6 or newer.
- Internet connection for downloading model weights and dependencies (only required for download, not use).
- An API key from Hugging Face (if accessing models that are not cached locally).
The following Python packages are required:
- `torch`: Also known as PyTorch, this package provides the underlying framework for tensor operations and neural network layers, enabling the loading and execution of the Gemma models.
- `transformers`: Developed by Hugging Face, this library is used for easy access to the Gemma models and their tokenizers, facilitating tasks like model downloading, loading, and inference with just a few lines of code.
The following Python packages are optional:
- `python-dotenv`: For managing API keys and other environment variables.
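Before running the toolkit, it can be handy to confirm that the required packages are importable. The snippet below is an illustrative sketch, not part of the toolkit itself; `check_dependencies` is a hypothetical helper:

```python
import importlib.util

def check_dependencies(required=("torch", "transformers"), optional=("dotenv",)):
    """Return (missing_required, missing_optional) package lists.

    Note: "dotenv" is the import name of the python-dotenv package.
    """
    missing_required = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]
    missing_optional = [pkg for pkg in optional if importlib.util.find_spec(pkg) is None]
    return missing_required, missing_optional
```

Running this before first use gives a clearer error than a mid-run `ModuleNotFoundError`.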
To use the Gemma AI Toolkit, clone the repository to your local machine and install the required Python packages.
- Clone the repository:

  ```shell
  git clone https://github.com/RMNCLDYO/gemma-ai-toolkit.git
  ```

- Navigate to the repository's folder:

  ```shell
  cd gemma-ai-toolkit
  ```

- Install the required dependencies:

  ```shell
  pip install -r requirements.txt
  ```
This wrapper and command-line interface support gated models like Google's `gemma-2b-it` and `gemma-7b-it` instruct models. To gain access to these models:
- Visit Google AI and sign in with your Google account.
- Click on "Get Started" to be redirected to the official Google 'Kaggle' page here.
- Authorize Hugging Face to access the gated model by following the prompts on Kaggle.
Once granted access, you can use your Hugging Face API key with this wrapper to download the Gemma models and automatically set up your cache. Once the weights and tokenizer are downloaded to your system and cached, neither the Hugging Face API key nor an internet connection is required.
- If not using locally cached models, obtain an API key (token) from Hugging Face.
- Create or rename the `.env` file in the project's root directory and add your API key:

  ```
  API_KEY=your_api_key
  ```
To start a chat with the default Gemma model:
```shell
python gemma_cli.py chat
```
To ask a question with the default Gemma model:
```shell
python gemma_cli.py text --prompt "Your prompt here"
```
For additional options and help:
```shell
python gemma_cli.py --help
```
To initiate a conversation using the default Gemma model with the wrapper:
```python
from gemma_chat import ChatAPI

ChatAPI().chat()
```
To ask the model a question using the default Gemma model with the wrapper:
```python
from gemma_text import TextAPI

TextAPI(prompt="Your question goes here.").text()
```
The tool allows for advanced configurations including specifying the model version, adjusting the maximum number of tokens, and utilizing different computational precisions for optimizing performance on specific hardware.
- `--model`: Specify the Gemma model version.
- `--api_key`: Your Hugging Face API key.
- `--max_tokens`: The maximum number of tokens to generate.
- `--prompt`: The question you would like to ask the model.
- `--model`: Specify the Gemma model version.
- `--api_key`: Your Hugging Face API key.
- `--max_tokens`: The maximum number of tokens to generate.
To exit the program at any time, type `exit` or `quit`. This works the same whether you're interacting via the CLI or the Python wrapper, so you can safely conclude your session with the Gemma AI Toolkit without resorting to interrupt signals or forcibly closing the terminal or command prompt.
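Under the hood, this kind of exit handling is typically a simple check inside the input loop. A minimal sketch follows (hypothetical names; the toolkit's actual implementation may differ):

```python
EXIT_COMMANDS = {"exit", "quit"}

def should_exit(user_input: str) -> bool:
    """True when the user asked to end the session (case-insensitive)."""
    return user_input.strip().lower() in EXIT_COMMANDS

def chat_loop(read_input, respond):
    """Minimal REPL skeleton: read prompts until an exit command is seen."""
    while True:
        prompt = read_input()
        if should_exit(prompt):
            break
        respond(prompt)
```

Because the check runs before each model call, the session ends cleanly without interrupting generation mid-stream.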
When using the Python wrapper, these configurations can be adjusted by passing parameters to the `ChatAPI` class:

```python
ChatAPI(model="google/gemma-2b-it", api_key="your_api_key", max_tokens=150)
```
When using the Python wrapper, these configurations can be adjusted by passing parameters to the `TextAPI` class:

```python
TextAPI(prompt="Your question goes here.", model="google/gemma-2b-it", api_key="your_api_key", max_tokens=150)
```
Once the Gemma model weights are downloaded to your system, they are cached locally, allowing for offline access thereafter. This means subsequent uses do not require an internet connection or the API key, provided you're using the same system and the model weights remain in the cache.
- Performance: Running large models like Gemma locally can be resource-intensive. Users with limited CPU capabilities may experience slower response times.
- Offline Use: For offline execution, ensure that the model and tokenizer files are correctly cached. All necessary dependencies must be installed while online.
- Hardware Requirements: Performance can vary significantly based on the hardware configuration. Users are encouraged to adjust model parameters or use optimized versions of the model for better performance on constrained devices.
Contributions are welcome!
Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.
Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:
- Check if the issue has already been reported.
- Use the Bug Report template to create a detailed report.
- Submit the report here.
Your report will help us make the project better for everyone.
Got an idea for a new feature? Feel free to suggest it. Here's how:
- Check if the feature has already been suggested or implemented.
- Use the Feature Request template to create a detailed request.
- Submit the request here.
Your suggestions for improvements are always welcome.
Stay up-to-date with the latest changes and improvements in each version:
- CHANGELOG.md provides detailed descriptions of each release.
Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md. Please refrain from disclosing any vulnerabilities publicly until they have been reported and addressed.
Licensed under the MIT License. See LICENSE for details.