Flock: Configurable ML Pipeline for Domain-Specific LLMs

Flock is a configurable machine learning (ML) pipeline for building Large Language Models (LLMs) for domain-specific tasks. It supports popular LLM architectures such as wizardlm, bloom, falcon, and llama. The project also includes a deep document mining system that extracts data from both text and images.

Features

Configurable ML pipeline for domain-specific Large Language Models (LLMs).
Supports multiple LLM architectures: wizardlm, bloom, falcon, and llama.
Deep document mining system for data extraction from text and images.
Developed using Python, PDFMiner, LangChain, and Streamlit.

Installation

Clone the repository:

git clone https://github.com/yourusername/flock.git
cd flock

Install the required dependencies:

pip install -r requirements.txt

Run the Flock application:

python app.py

Usage

Choose an LLM architecture: wizardlm, bloom, falcon, or llama.
Configure the pipeline settings according to your domain-specific task.
Prepare your text and image data for training and evaluation.
Run the pipeline using the provided scripts.
Evaluate the trained LLM and fine-tune as necessary.
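
The first two steps above could be captured in a small configuration object. This is a minimal sketch, not Flock's actual API: the `PipelineConfig` class, its field names, and its default values are hypothetical. Only the four architecture names come from this README.

```python
from dataclasses import dataclass

# Architecture names listed in this README.
SUPPORTED_ARCHITECTURES = ("wizardlm", "bloom", "falcon", "llama")


@dataclass
class PipelineConfig:
    """Hypothetical settings object; all field names are illustrative."""

    architecture: str
    domain: str
    text_data_dir: str = "data/text"      # assumed location of text data
    image_data_dir: str = "data/images"   # assumed location of image data
    epochs: int = 3
    learning_rate: float = 2e-5

    def __post_init__(self) -> None:
        # Normalize the name so "Falcon" and "falcon" are treated the same.
        self.architecture = self.architecture.strip().lower()
        if self.architecture not in SUPPORTED_ARCHITECTURES:
            raise ValueError(
                f"Unsupported architecture {self.architecture!r}; "
                f"choose one of {', '.join(SUPPORTED_ARCHITECTURES)}"
            )
        if self.epochs < 1:
            raise ValueError("epochs must be at least 1")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


config = PipelineConfig(architecture="Falcon", domain="legal-contracts")
print(config.architecture)
```

Validating in `__post_init__` means a misspelled architecture name fails when the configuration is created rather than partway through a training run.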

Action Plan

Phase 1: Setup and Data Collection

Set up the project repository with a basic directory structure.
Create a virtual environment and install necessary dependencies.
Implement data collection mechanisms for text and image data.
Preprocess and clean the collected data for further processing.
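
The preprocessing step could start from something like the sketch below. It is an assumption about what "clean" means here (Unicode normalization, control-character removal, whitespace collapsing, and exact-duplicate removal), and the function names `clean_text` and `deduplicate` are hypothetical rather than part of Flock.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize Unicode, drop control characters, and collapse whitespace."""
    # NFKC folds compatibility forms (ligatures, non-breaking spaces) into
    # their plain equivalents.
    text = unicodedata.normalize("NFKC", raw)
    # Remove control and format characters, keeping newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    # Re-join words hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs, then runs of blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()


def deduplicate(documents: list[str]) -> list[str]:
    """Clean each document and drop empty or exactly repeated ones."""
    seen: set[str] = set()
    unique = []
    for doc in documents:
        cleaned = clean_text(doc)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


print(clean_text("Data  extrac-\ntion\x0c from\t\tPDFs "))
```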

Phase 2: LLM Architecture Integration

Integrate support for wizardlm architecture.
Integrate support for bloom architecture.
Integrate support for falcon architecture.
Integrate support for llama architecture.

Phase 3: Deep Document Mining System

Implement a data extraction system for text documents.
Implement a data extraction system for image documents.
Develop mechanisms to combine text and image data for comprehensive analysis.
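
The combination step might look like the sketch below, which merges per-page text from a PDF's text layer with text recovered from that page's images. It assumes pdfminer.six is installed for `extract_pdf_pages`. The README does not name an OCR tool, so image text is taken as an already-computed input. `PageContent`, `extract_pdf_pages`, and `merge_pages` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PageContent:
    """Text recovered from one page: its text layer plus its images."""

    page: int
    body_text: str
    image_text: str

    @property
    def combined(self) -> str:
        parts = [self.body_text.strip(), self.image_text.strip()]
        return "\n\n".join(part for part in parts if part)


def extract_pdf_pages(path: str) -> dict[int, str]:
    """Return {page_number: text} for a PDF, using pdfminer.six."""
    # Imported lazily so the rest of the module works without pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    pages = {}
    for number, layout in enumerate(extract_pages(path), start=1):
        pages[number] = "".join(
            element.get_text()
            for element in layout
            if isinstance(element, LTTextContainer)
        )
    return pages


def merge_pages(
    body_by_page: dict[int, str], image_by_page: dict[int, str]
) -> list[PageContent]:
    """Pair text-layer and image text by page number, in page order."""
    page_numbers = sorted(set(body_by_page) | set(image_by_page))
    return [
        PageContent(n, body_by_page.get(n, ""), image_by_page.get(n, ""))
        for n in page_numbers
    ]


# Illustrative inputs; in practice these would come from extract_pdf_pages
# and from whichever OCR step is used for images.
merged = merge_pages(
    {1: "Intro", 2: ""},
    {2: "Figure 1: revenue", 3: "Scanned page"},
)
for item in merged:
    print(item.page, repr(item.combined))
```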

Phase 4: Configuration and Pipeline Development

Create a configuration interface for setting pipeline parameters.
Develop the ML pipeline to train and evaluate LLMs based on selected architectures.
Implement mechanisms for fine-tuning LLMs using domain-specific data.
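
A pipeline of this kind can be modeled as an ordered list of named stages that pass a shared context along. The sketch below illustrates that structure under stated assumptions: `run_pipeline` and the three stage functions are hypothetical, and the training and evaluation bodies are placeholders that do no real work.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def run_pipeline(stages: List[Tuple[str, Stage]], context: Context) -> Context:
    """Run each named stage in order, recording which ones completed."""
    for name, stage in stages:
        context = stage(context)
        context.setdefault("completed", []).append(name)
    return context


def load_data(ctx: Context) -> Context:
    # Placeholder data; a real stage would read the prepared datasets.
    ctx["examples"] = ["sample domain text A", "sample domain text B"]
    return ctx


def train(ctx: Context) -> Context:
    # Placeholder for training or fine-tuning the selected architecture.
    ctx["model"] = f"{ctx['architecture']}-finetuned"
    return ctx


def evaluate(ctx: Context) -> Context:
    # Placeholder; only reports how many examples were seen.
    ctx["metrics"] = {"num_examples": len(ctx["examples"])}
    return ctx


result = run_pipeline(
    [("load_data", load_data), ("train", train), ("evaluate", evaluate)],
    {"architecture": "bloom"},
)
print(result["completed"])
```

Keeping stages as plain functions makes it straightforward to reorder them, skip evaluation, or insert a fine-tuning stage from configuration.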

Phase 5: User Interface and Visualization

Build a user-friendly interface using Streamlit for interacting with the pipeline.
Implement visualization tools to display training progress and evaluation metrics.

Phase 6: Testing and Optimization

Test the pipeline with sample domain-specific tasks and datasets.
Optimize the pipeline for performance and efficiency.
Identify and resolve any bugs or issues.

Phase 7: Documentation and Deployment

Write comprehensive documentation for setting up, using, and extending the pipeline.
Prepare the repository for deployment, including proper version control and packaging.

Contribution

Contributions are welcome! If you'd like to contribute to Flock, please follow the guidelines in the CONTRIBUTING.md file.

License

This project is licensed under the MIT License.

swainshashwat/Flock
