
The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA

This repository is the official implementation of The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA.

Requirements

To install requirements:

```bash
pip install -r requirements_cuda118.txt
```

📋 The experiments were run under CUDA 11.8.
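Before running any stage, it may help to confirm that PyTorch can see a CUDA device. A minimal sketch, assuming PyTorch was installed via the requirements file:

```python
# Quick environment check: confirms PyTorch was built with CUDA support
# and that a GPU is visible before launching any experiments.
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
print(f"CUDA version:    {torch.version.cuda}")  # expected: 11.8
if torch.cuda.is_available():
    print(f"GPU device:      {torch.cuda.get_device_name(0)}")
```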

Dataset

  1. Open stage_1_create_canonical_smiles.ipynb and run the cells. (You can skip this step if you want to use the canonical_smiles_xxx.csv file provided on GitHub.)
  2. Run python stage_2_descriptors_for.py to compute molecular descriptors (an illustrative sketch of steps 1 and 2 appears below this list).
  3. Open stage_3_preprocess_smiles_property.ipynb and run the cells.

📋 check instruction_dataset_mtr.txt
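For orientation, here is a minimal sketch of what stages 1 and 2 do: canonicalize SMILES strings, then compute molecular descriptors. It assumes RDKit and pandas are available; the file and column names are hypothetical, and the repository's own scripts remain the authoritative implementation.

```python
# Illustrative sketch of the two dataset stages: canonicalizing SMILES and
# computing molecular descriptors. Names here are hypothetical placeholders.
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors

def canonicalize(smiles: str) -> str | None:
    """Return the RDKit canonical SMILES, or None for unparsable input."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

df = pd.read_csv("raw_smiles.csv")  # hypothetical input file
df["canonical_smiles"] = df["smiles"].map(canonicalize)
df = df.dropna(subset=["canonical_smiles"])

# Two example descriptors per molecule (stage 2 computes many more).
mols = df["canonical_smiles"].map(Chem.MolFromSmiles)
df["mol_wt"] = mols.map(Descriptors.MolWt)
df["logp"] = mols.map(Descriptors.MolLogP)

df.to_csv("smiles_with_descriptors.csv", index=False)
```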

Training

To train the model(s) in the paper, move to transformers_and_chemistry (the main directory) and run:

For Multitask Regression (MTR) model training:

```bash
python -m models_mtr.train_chemXXX
```
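The train_chemXXX modules implement the paper's actual MTR training. As a minimal sketch of the idea, the Hugging Face sequence-classification head can serve as a K-target regression head; the checkpoint name and number of targets below are placeholders, not the paper's settings.

```python
# Minimal sketch of multitask regression (MTR) with a transformer encoder:
# one model predicts K continuous properties per SMILES string.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

K = 10  # number of regression targets (hypothetical)
name = "roberta-base"  # placeholder; the paper uses chemistry-specific models

tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name,
    num_labels=K,
    problem_type="regression",  # applies MSE loss over all K targets
)

batch = tokenizer(["CCO", "c1ccccc1"], padding=True, return_tensors="pt")
targets = torch.randn(2, K)  # stand-in for true property values
out = model(**batch, labels=targets)
out.loss.backward()  # an optimizer step would follow in a real training loop
```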

For Finetune model training:

```bash
python -m finetune.run_auto_XXX_bulk
```

📋 check instruction_models_mtr.txt and instruction_finetune.txt
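Conceptually, fine-tuning adapts a pre-trained encoder to a single downstream property. A minimal single-task sketch with the Hugging Face Trainer; the checkpoint, data, and hyperparameters are all placeholders, not the repository's actual configuration:

```python
# Minimal single-task fine-tuning sketch: adapt a pre-trained encoder to
# predict one molecular property. All names and values are placeholders.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

name = "roberta-base"  # would be an MTR checkpoint from the previous stage
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=1, problem_type="regression")

class SmilesDataset(Dataset):
    """Tiny stand-in dataset of (SMILES, property value) pairs."""
    def __init__(self, smiles, values):
        self.enc = tokenizer(smiles, truncation=True, padding=True)
        self.values = values
    def __len__(self):
        return len(self.values)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor([self.values[i]], dtype=torch.float)
        return item

train = SmilesDataset(["CCO", "c1ccccc1", "CC(=O)O"], [0.1, 2.3, 0.7])
args = TrainingArguments(output_dir="ft_out", num_train_epochs=1,
                         per_device_train_batch_size=2, logging_steps=1)
Trainer(model=model, args=args, train_dataset=train).train()
```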

Evaluation

To evaluate the Finetune models, move to transformers_and_chemistry/eval and:

  1. Open Eval_Tabular.ipynb and run the cells
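Eval_Tabular.ipynb is the authoritative evaluation. A minimal sketch of the kind of regression metrics typically tabulated per fine-tuned model; the file and column names are assumptions:

```python
# Sketch of tabulating regression metrics for fine-tuned models.
# The notebook is the authoritative evaluation; names here are hypothetical.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

preds = pd.read_csv("predictions.csv")  # assumed columns: y_true, y_pred
y_true, y_pred = preds["y_true"], preds["y_pred"]

print(f"RMSE: {np.sqrt(mean_squared_error(y_true, y_pred)):.4f}")
print(f"MAE:  {mean_absolute_error(y_true, y_pred):.4f}")
print(f"R^2:  {r2_score(y_true, y_pred):.4f}")
```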

Pre-trained Models

Since these experiments yield multiple MTR and Finetune models, pre-trained model downloads are not provided, but you can reproduce all of them with the code in this repository.

Contributing

📋 This project is released under the MIT License.

Authors' Note

Please use this code only for social good and positive impact.
