Protein2SMILES Transformer

Protein2SMILES Transformer applies the Transformer neural network architecture, described in the paper "Attention Is All You Need", to de novo drug discovery: it generates novel drug candidates that are specific to a given protein.

Introduction

Protein2SMILES Transformer is a de novo drug discovery approach that generates SMILES strings, a text-based representation of molecules, for specific protein targets. The model is trained on a large database of molecules and protein sequences collected from BindingDB, and can generate new molecules intended to bind to a target protein.

Dataset Availability

All datasets used in this project are available in my Google Drive. You can access them using the following links:

Requirements

Protein2SMILES Transformer requires the following dependencies:

  • Python 3.7 or later
  • PyTorch 1.13.1
  • Torchtext 0.14.1
  • NumPy 1.22.4
  • PyQt5 5.15.9
  • rdkit 2022.9.5
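A quick way to confirm that an environment matches the pinned versions above is to query the installed package metadata. The snippet below is a minimal sketch, not part of this repository: the `REQUIRED` table and the `check_requirements` helper are hypothetical names, the distribution names are assumed to match PyPI, and `importlib.metadata` requires Python 3.8 or later.

```python
from importlib import metadata  # standard library since Python 3.8

# Versions copied from the requirements list above. The keys are assumed
# to be the PyPI distribution names of each dependency.
REQUIRED = {
    "torch": "1.13.1",
    "torchtext": "0.14.1",
    "numpy": "1.22.4",
    "PyQt5": "5.15.9",
    "rdkit": "2022.9.5",
}


def check_requirements(required):
    """Return {name: (wanted, installed)}; installed is None if missing."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_requirements(REQUIRED).items():
        # Prefix match so local build tags such as "+cu117" still count.
        ok = installed is not None and installed.startswith(wanted)
        print(f"{name}: wanted {wanted}, found {installed} [{'ok' if ok else 'MISMATCH'}]")
```

A mismatch is only a warning sign; nearby versions may still work, but they are not the ones listed here.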

Usage

To use Protein2SMILES Transformer, follow these steps:

  1. Clone this repository to your local machine using git clone:
  $ git clone https://github.com/atilmohamine/protein2smiles-transformer.git
  2. Install the required dependencies by running the following command:
  $ pip install -r requirements.txt
  3. Run the predict.py script with the desired protein sequence as input. For example:
  $ python predict.py --input MGLSDGEWQLVLNVWGKVEGARQPL

This will generate a SMILES string that is optimized for binding to the specified protein.
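Before passing a sequence to predict.py, it may help to normalize it and check that it contains only the 20 standard one-letter amino-acid codes. The following is a minimal sketch under that assumption. `clean_protein_sequence` is a hypothetical helper that is not part of this repository, and how predict.py itself treats non-standard codes is not documented here.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_sequence(raw):
    """Strip whitespace, uppercase, and reject non-standard residue codes."""
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("empty protein sequence")
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"non-standard residue codes: {', '.join(invalid)}")
    return sequence


if __name__ == "__main__":
    # Sequences pasted from FASTA files often contain line breaks.
    print(clean_protein_sequence("mglsdgewqlvl\nnvwgkvegarqpl"))
```

The cleaned string can then be supplied as the value of --input.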

The prediction script accepts the following key arguments:

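| Argument | Description | Default | Type |
| --- | --- | --- | --- |
| --input | Input protein sequence | none (required) | string |
| --vis | Molecule visualization | True | boolean |
| --max | Maximum generated sequence length | 150 | integer |
| --pad | Padding token | 1 | integer |
| --sos | SOS (start-of-sequence) token | 2 | integer |
| --eos | EOS (end-of-sequence) token | 3 | integer |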
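For reference, the argument table could be mirrored with Python's argparse roughly as follows. This is an illustrative sketch, not the actual contents of predict.py: `build_parser` and `str_to_bool` are hypothetical names, and the way the real script parses the boolean --vis flag is an assumption.

```python
import argparse


def str_to_bool(value):
    """Parse common textual booleans; argparse has no built-in bool type."""
    text = value.strip().lower()
    if text in {"true", "1", "yes", "y"}:
        return True
    if text in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser whose defaults follow the argument table."""
    parser = argparse.ArgumentParser(description="Generate a SMILES string for a protein")
    parser.add_argument("--input", type=str, required=True, help="input protein sequence")
    parser.add_argument("--vis", type=str_to_bool, default=True, help="molecule visualization")
    parser.add_argument("--max", type=int, default=150, help="maximum generated sequence length")
    parser.add_argument("--pad", type=int, default=1, help="padding token")
    parser.add_argument("--sos", type=int, default=2, help="start-of-sequence token")
    parser.add_argument("--eos", type=int, default=3, help="end-of-sequence token")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["--input", "MGLSDGEWQLVLNVWGKVEGARQPL", "--vis", "false"])
print(args.input, args.vis, args.max, args.pad, args.sos, args.eos)
```

Under this sketch, disabling the visualization window would look like `--vis false`; check the script itself for the exact form it expects.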
  4. The output SMILES string can be used for further analysis, such as molecular docking or structure-based drug design.
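Generated strings are not guaranteed to be valid SMILES, so a cheap syntactic pre-filter can be useful before downstream analysis. The sketch below only checks that brackets and branches are balanced and that every ring-closure label is paired. `smiles_looks_wellformed` is a hypothetical helper, not part of this repository, and it says nothing about valence or chemistry. Since RDKit is already a dependency, `Chem.MolFromSmiles`, which returns `None` when parsing fails, is the more reliable check.

```python
def smiles_looks_wellformed(smiles):
    """Cheap syntactic pre-filter; not a substitute for a chemistry toolkit."""
    if not smiles:
        return False
    depth = 0           # current nesting depth of branch parentheses
    open_rings = set()  # ring-closure labels seen once, awaiting a partner
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms: digits inside are isotopes, H counts, charges.
            end = smiles.find("]", i)
            if end == -1:
                return False
            i = end + 1
            continue
        if ch == "]":
            return False  # closing bracket without an opening one
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring-closure label, e.g. %12.
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False
            open_rings ^= {int(label)}  # toggle: open on first sight, close on second
            i += 3
            continue
        elif ch.isdigit():
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not open_rings


if __name__ == "__main__":
    for candidate in ["CC(=O)Oc1ccccc1C(=O)O", "c1ccccc", "CC(=O"]:
        print(candidate, smiles_looks_wellformed(candidate))
```

Strings that fail this filter can be discarded immediately; strings that pass still need a full parse before docking or QSAR work.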

Citation

If you find this project useful in your research, please consider citing our paper:

Amine, A.M.E., Fadila, A. Transformer neural network for protein-specific drug discovery and validation using QSAR. J Proteins Proteom (2023). https://doi.org/10.1007/s42485-023-00124-6

BibTeX:

@article{AmineFadila2023,
  author    = {Atil Mohamed El Amine and Atil Fadila},
  title     = {Transformer neural network for protein-specific drug discovery and validation using QSAR},
  journal   = {Journal of Proteins and Proteomics},
  year      = {2023},
  doi       = {10.1007/s42485-023-00124-6}
}

License

Protein2SMILES Transformer is released under the MIT License.
