Advitiya Hackathon

Submission for Advitiya Techfest AI Hackathon

BERT fine-tuning using the Hugging Face Transformers library.

Training code file: BertV2_Final_Version.ipynb

Submission file (test data output): submission.csv, in the folder Submission. Multiple submission files are present, each produced by a different model, and the models behave differently for different classes. All files give almost the same output, differing on only 4-5 patents. If not all of them can be evaluated, the file "submission.csv" should be considered.
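To check which patents differ between two submission files, a comparison along the following lines could be used. This is a minimal sketch: the column names `patent_id` and `label` are assumptions (the actual CSV layout is not described here), and the rows are made-up stand-ins for real submission files.

```python
import csv
import io


def read_predictions(csv_file, id_column="patent_id", label_column="label"):
    """Return {id: label} from a submission CSV (column names are assumed)."""
    reader = csv.DictReader(csv_file)
    return {row[id_column]: row[label_column] for row in reader}


def differing_ids(preds_a, preds_b):
    """IDs present in both files whose predicted labels disagree."""
    shared = preds_a.keys() & preds_b.keys()
    return sorted(pid for pid in shared if preds_a[pid] != preds_b[pid])


# Tiny in-memory stand-ins for two submission files (made-up rows).
file_a = io.StringIO("patent_id,label\nP1,0\nP2,1\nP3,3\n")
file_b = io.StringIO("patent_id,label\nP1,0\nP2,1\nP3,2\n")

diff = differing_ids(read_predictions(file_a), read_predictions(file_b))
print(diff)
```

With real files, `open("Submission/submission.csv", newline="")` would replace the `io.StringIO` objects.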

Model file link: https://drive.google.com/drive/folders/1LvRBKB9kPmhkL1iyjrywyjaN_W9FKKFj?usp=sharing

The link above contains all the data and model files.

The code was written to run on Google Colab, so some file paths might need to be changed.

Validation accuracy was 100% for 2 of the 3 models used and 99% for the third. Final models to use:

finalbertv4_claims_512_8_20.pth
finalbertv5_abstract_512_8_20_spacecorrected.pth
finalbertv6_classification_512_8_20.pth

All are present in the folder named Models at the link above.
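How the predictions of the three models are combined is not stated here. One plausible approach is a per-patent majority vote, sketched below. The tie-breaking rule (fall back to the first model) and the example predictions are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(votes, fallback_index=0):
    """Return the most common label; on a tie, use the vote at fallback_index."""
    counts = Counter(votes)
    top_count = max(counts.values())
    winners = [label for label, count in counts.items() if count == top_count]
    if len(winners) > 1:
        # No single winner (e.g. all three models disagree).
        return votes[fallback_index]
    return winners[0]


def combine(predictions_per_model):
    """Combine equal-length prediction lists, one list per model."""
    return [majority_vote(list(votes)) for votes in zip(*predictions_per_model)]


# Made-up label codes for five patents from three hypothetical models.
model_1 = [0, 1, 3, 2, 0]
model_2 = [0, 1, 2, 2, 1]
model_3 = [0, 0, 3, 1, 2]

combined = combine([model_1, model_2, model_3])
print(combined)
```

The last patent shows the tie case: all three models disagree, so the first model's prediction is kept.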

Software stack: PyTorch, Transformers, BeautifulSoup, NumPy, Pandas, Matplotlib

File details:

        - BasicEDA.ipynb : EDA of the given data, and conversion of label names to label codes in TrainCSV
                        code - "0" - Non-Alcohol
                        code - "1" - Alcohol
                        code - "2" - Non-Autonomous Vehicle
                        code - "3" - Autonomous Vehicles

        - Scrapper.ipynb : Scrapes patent data from the web, then cleans and organises it into a separate JSON file for each patent.

        - BertV2_Final_Version.ipynb : Fine-tunes a BERT model for patent classification, using the Hugging Face Transformers library and PyTorch.

                        Training details : Multiple models were trained with different token sizes, learning rates, batch sizes and numbers of epochs.
                                           Good results were obtained by training at LR = 2e-5 for 20 epochs and then another 20 epochs at
                                           LR = 3e-5 (4e-5 also works), with a maximum sentence length of 512. Performance increases with
                                           sentence length, but GPU memory is the limitation.

                                           Finally, 3 models are used for prediction on the scraped data. A single model also performs
                                           extremely well, but more models are used to increase accuracy further.

                                           2 of the 3 models showed a validation accuracy of 100%, and the last one 99%.

        - google_patent_scrapper.py : Scraping library (specific to patent data scraping), developed mostly by the team, starting from some
                                      initial library code from an open-source GitHub account.

        - The Drive folder contains all the data files, including model weights, scraped data in JSON format, cleaned data in CSV format, etc.
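The label codes and training settings listed above can be collected into a small configuration sketch. The values come from the list; the names (`LABEL_CODES`, `PHASES`, `encode_labels`, `learning_rate_for_epoch`) are hypothetical, the codes are stored as integers here although the list writes them quoted, and treating the second 20 epochs as a continuation of the first run is an assumption.

```python
# Label codes as listed for BasicEDA.ipynb.
LABEL_CODES = {
    "Non-Alcohol": 0,
    "Alcohol": 1,
    "Non-Autonomous Vehicle": 2,
    "Autonomous Vehicles": 3,
}

MAX_LENGTH = 512  # maximum sentence (token) length reported above

# (number of epochs, learning rate) for each training phase, in order.
PHASES = [(20, 2e-5), (20, 3e-5)]


def encode_labels(label_names):
    """Map label names to integer codes; raises KeyError on an unknown name."""
    return [LABEL_CODES[name] for name in label_names]


def learning_rate_for_epoch(epoch, phases=PHASES):
    """Return the learning rate for a 0-based epoch index."""
    first_epoch_of_phase = 0
    for num_epochs, lr in phases:
        if epoch < first_epoch_of_phase + num_epochs:
            return lr
        first_epoch_of_phase += num_epochs
    raise ValueError(f"epoch {epoch} is beyond the configured schedule")


codes = encode_labels(["Alcohol", "Autonomous Vehicles"])
schedule = [learning_rate_for_epoch(e) for e in range(40)]
print(codes)
print(schedule[19], schedule[20])
```

In a PyTorch training loop, the value returned for each epoch could be written into the optimizer's `param_groups` before that epoch starts.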

The code was designed to run on Google Colab for training, evaluation and scraping. All the data was kept on my Google Drive due to the large size of the model weights and data, so appropriate path changes might be required.
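One way to keep such path changes in a single place is to build every path from one root directory. The sketch below is an illustration, not what the notebooks currently do: the environment variable `ADVITIYA_DATA_ROOT` and the default Drive folder name are hypothetical, while `Models` is the folder name mentioned above.

```python
import os
from pathlib import Path

# Hypothetical default: where the shared Drive folder might sit once Drive is mounted in Colab.
DEFAULT_DATA_ROOT = "/content/drive/MyDrive/AdvitiyaHackathon"


def data_root():
    """Root folder for data and weights; override with ADVITIYA_DATA_ROOT."""
    return Path(os.environ.get("ADVITIYA_DATA_ROOT", DEFAULT_DATA_ROOT))


def model_path(file_name):
    """Path to a checkpoint inside the Models folder under the data root."""
    return data_root() / "Models" / file_name


print(model_path("finalbertv4_claims_512_8_20.pth"))
```

Running outside Colab would then only require setting the environment variable to the local copy of the Drive folder.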

Some garbage files and some unused, uncommented code are still here; these will be cleared shortly.

Nilanshrajput/AdvitiyaHackathon
