LLM-FastAPI

NimbleBox Apprenticeship ML Engineer Task - 1

This project implements a language model server using FastAPI and gRPC. It uses a trained language model to generate coherent text from user input.

Getting Started

To set up and run the project, follow the steps below:

Install the required Python packages by running pip install -r requirements.txt.

Train the language model using trainer.py. Provide the dataset file (--fp argument) and other training arguments as needed. The trained model weights will be saved in a specified location.
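As a rough illustration of what a command-line interface for the training step could look like, here is a minimal sketch using the standard-library argparse module. Only the --fp argument comes from this README; the other flags (--max-iters, --batch-size, --out) and the default output path are hypothetical placeholders, and the actual trainer.py may be organised differently (the task description suggests python-fire, for instance).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser resembling what a training script might accept."""
    parser = argparse.ArgumentParser(description="Fine-tune a small language model.")
    # --fp is named in the README; every other flag below is a hypothetical example.
    parser.add_argument("--fp", required=True, help="path to the dataset file")
    parser.add_argument("--max-iters", type=int, default=1000,
                        help="hypothetical: number of training steps")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="hypothetical: examples per batch")
    parser.add_argument("--out", default="model_weights/model.pt",
                        help="hypothetical: where to save the trained weights")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--fp", "dataset.json", "--max-iters", "200"])
print(f"Would train on {args.fp} for {args.max_iters} steps and save to {args.out}")
```

Passing an explicit list to parse_args keeps the example self-contained; a real script would call parse_args() with no arguments so that it reads sys.argv.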

Start the language model server by running uvicorn server:app --host 0.0.0.0 --port 8000. The server binds to all network interfaces on port 8000, so it is reachable locally at http://localhost:8000 and accepts text generation requests there.

Use the provided APIs or client.py to generate text by sending requests to the server. Example curl command:

curl -X POST -H "Content-Type: application/json" -d '{"text": "Hello"}' http://localhost:8000/generate
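The same request can be sent from Python. The sketch below mirrors the curl command using only the standard library; the function names build_generate_request and generate are illustrative and are not taken from client.py, and the shape of the JSON response is an assumption, so the result is returned as parsed JSON without further interpretation.

```python
import json
import urllib.request


def build_generate_request(text: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request that the curl example sends to /generate."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(text: str, base_url: str = "http://localhost:8000", timeout: float = 30.0):
    """Send the request and return the decoded JSON body (schema assumed, not verified)."""
    request = build_generate_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, generate("Hello") should return whatever JSON the /generate endpoint produces for that prompt.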

Optionally, use test.py to stress test the server's performance and evaluate its response time under load.
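For readers curious how a multithreaded stress test can be structured, here is a minimal sketch built on concurrent.futures. It is not the contents of test.py: the stress_test function, its parameters, and the fake_send stand-in are all hypothetical. The sender is passed in as a callable so the timing logic can be tried without a running server; in practice it would be replaced by a function that posts to /generate.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def stress_test(send: Callable[[str], object], prompt: str,
                total_requests: int = 50, workers: int = 10) -> dict:
    """Call send(prompt) total_requests times across a thread pool and time each call."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        send(prompt)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map re-raises the first exception from send when results are consumed.
        latencies: List[float] = list(pool.map(timed_call, range(total_requests)))
    wall_time = time.perf_counter() - wall_start

    return {
        "requests": total_requests,
        "mean_latency_s": sum(latencies) / len(latencies),
        "max_latency_s": max(latencies),
        "wall_time_s": wall_time,
    }


def fake_send(prompt: str) -> str:
    """Stand-in for a real HTTP call: sleeps briefly to simulate server latency."""
    time.sleep(0.01)
    return prompt.upper()


report = stress_test(fake_send, "Hello", total_requests=20, workers=5)
print(report)
```

Threads suit this workload because each request spends most of its time waiting on the network, so many requests can be in flight at once despite Python's global interpreter lock.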

Define the Protobuf service and message types in text_generator.proto:

syntax = "proto3";

package textgenerator;

service TextGenerator {
  rpc GenerateText(TextRequest) returns (TextResponse) {}
}

message TextRequest {
  string text = 1;
}

message TextResponse {
  string generated_text = 1;
}

Generate the gRPC code with the protoc compiler. First install the protobuf and grpcio-tools packages, then run the compiler on the .proto file:

pip install protobuf grpcio-tools

python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. text_generator.proto

After running the command, you will see two new files generated in the current directory:

text_generator_pb2.py: Contains the generated code for the Protobuf messages.
text_generator_pb2_grpc.py: Contains the generated code for the gRPC service.

Crux: ML Engineer

Bonus points:

if the filepath can be a GitHub gist (e.g. this gist)
if everything can be run via a single shell file
if the LLM can give a coherent reply
a file test.py that can:
- stress test the server using multithreading
- provide a CLI for using the model quickly

Ultra bonus points:

you use gRPC instead of HTTP/REST
you use something other than Python (but not C++ or JavaScript, FFS)

Train a language model and serve it with FastAPI.

create a GitHub repository
create a file called trainer.py that can be run from the CLI to train an LLM (protip: take a look at python-fire). It should take the following arguments:
- fp: the file to fine-tune the model on
- some training arguments as well (protip: don't use huggingface; try karpathy/minGPT)
- the result should be the model weights saved in some location
create a file called server.py that serves the LLM over HTTP/REST APIs (protip: use pydantic for models)
a curl command to call the model and get a response
an IPython notebook that contains the steps to run this


visheshc14/LLM-FastAPI
