A Legal Decision Support System

This repository was created by Andrew Tsai as part of the Data-zoomcamp ML engineering course. It has been submitted as the final capstone project for the course.

Problem scope 📝

In today's complex legal landscape, integrating Artificial Intelligence (AI) into legal assistance services is becoming increasingly important. The multifaceted nature of legal matters often makes it hard for individuals to get clarity on which laws apply and what sentence may result. An AI legal assistant capable of predicting both the applicable laws and the potential imprisonment could help address these challenges efficiently and accurately.

Starting from a Taiwanese legal judgement dataset that specifically targets drug-related crimes, which I crawled from an open-source website and published on Hugging Face, the goal is to have a model read a judgement and:

classify the legal articles that are violated by the defendant
predict the length of the imprisonment

Example screenshot of the Gradio interface: The predict function is deployed on a Hugging Face Space with Gradio. This endpoint will remain available until the end of the evaluation period.

Demo: Click to go to the Gradio interface

Running the project ⚙️

Prepare the repository

git clone https://github.com/AndrewTsai0406/AI_Judge.git

Start a virtual environment

I advise using a virtual environment to run this project. Below are instructions for doing so with Conda, which helps manage multiple environments.

# create virtual environment
conda create -n project-legal python=3.10

# start the virtual environment
conda activate project-legal

# install requirements
pip install -r requirements.txt

Data preparation

The dataset can be downloaded here on Hugging Face. One dataset is needed for training: the finalized data, which should be placed under a './data' directory and is used to train the two classifiers.
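
Before training, it can be worth checking that the data actually landed where the training code expects it. The snippet below is a minimal sketch, assuming only that the files sit directly under './data'. The helper name `list_data_files` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List


def list_data_files(data_dir: str = "./data") -> List[Path]:
    """Return the files found directly under data_dir, sorted by name.

    Hypothetical helper: it only checks that the directory exists and is
    not empty. It does not validate the contents of the dataset.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Data directory not found: {root}")

    files = sorted(p for p in root.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"No files found in: {root}")
    return files
```

Calling `list_data_files()` from the repository root before running the training step fails early with a readable message if the download step was skipped.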

Training & saving the models

This is a multi-label classification problem, with 25 predicted classes corresponding to the legal articles violated by the defendant, combined with a multi-class classification problem for predicting the length of imprisonment. All features are categorical.
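
To make the two target types concrete, here is a small sketch of how such labels could be encoded. It is an illustration under stated assumptions, not the encoding used in train.py: the three article names and the bin edges (in months) are hypothetical placeholders, whereas the real project has 25 article classes.

```python
from bisect import bisect_right
from typing import Iterable, List, Sequence

# Hypothetical label vocabulary (the real project has 25 article classes).
ARTICLES: Sequence[str] = ("article_a", "article_b", "article_c")

# Hypothetical bin edges in months, used to turn a sentence length
# into a class index for multi-class classification.
BIN_EDGES: Sequence[int] = (6, 12, 36)


def multi_hot(violated: Iterable[str],
              vocabulary: Sequence[str] = ARTICLES) -> List[int]:
    """Multi-label target: one 0/1 entry per article in the vocabulary."""
    present = set(violated)
    unknown = present - set(vocabulary)
    if unknown:
        raise ValueError(f"Unknown articles: {sorted(unknown)}")
    return [1 if article in present else 0 for article in vocabulary]


def sentence_class(months: int, edges: Sequence[int] = BIN_EDGES) -> int:
    """Multi-class target: index of the bin that contains `months`.

    With edges (6, 12, 36): below 6 is class 0, 6 to 11 is class 1,
    12 to 35 is class 2, and 36 or more is class 3.
    """
    return bisect_right(edges, months)
```

A single judgement can set several entries of the multi-hot vector at once, while the imprisonment target always falls into exactly one class.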

Five models were tested, and their hyperparameters tuned, using the Lightning Flash library.

All of these steps are described in detail in train_flash.ipynb.

To run the training and save the models, use the script train.py with the command:

python train.py

The final models will be saved in the ./models directory.

Where to find the files for Datacamp project ML evaluation

Exploratory Data Analysis
The characteristics of each judgement that are used as inputs for prediction are explored in the exploratory data analysis (EDA) part of the notebook. I ran a single notebook for the analysis, and a copy of it is in the repository.
Training
The logic for training the model is exported to a separate script, train.py, which runs the training for the final models.
Deployment
Predict: Predictions can be run with a local Gunicorn service (the predict function can be found in app.py).

To run it, type the following command in a terminal:
```
gunicorn app:app -b 0.0.0.0:8000 -k uvicorn.workers.UvicornWorker --timeout 300
```
Then open your browser and go to the address below (if 0.0.0.0 does not resolve on your system, use localhost instead):
```
http://0.0.0.0:8000/gradio/
```
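
As a quick sanity check, the following sketch uses only the Python standard library to test whether the local service answers at the Gradio path. The helper names `gradio_url` and `is_up` are hypothetical, and `localhost` is assumed to reach the server bound to 0.0.0.0.

```python
from http.client import HTTPException
from urllib.request import urlopen


def gradio_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the URL of the Gradio interface served by the local service."""
    return f"http://{host}:{port}/gradio/"


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a success or redirect status.

    Connection errors, timeouts, and HTTP error responses all count
    as "not up" and return False.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, HTTPException):
        # URLError and HTTPError are subclasses of OSError.
        return False
```

With the Gunicorn service running, `is_up(gradio_url())` should return `True`. If it returns `False`, check that the port in the `-b` option matches the one passed to `gradio_url`.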

