Hoss3inf/Protein-sequence-labeler

Protein-sequence-labeler


This snippet labels protein sequences based on the name of the file that contains them. For example, a file named ClassA_Amine_Serotonin.txt contains FASTA-style records such as the following:

>gi|73954222|ref|XP_546316.2| PREDICTED: similar to 5-hydroxytryptamine (serotonin) receptor 4 isoform b [Canis familiaris]
MDELDANVSSKEGFGSVEKVVLLTFLSAVILMAILGNLLVMVAVCRDRQLRKIKTNYFIVSLAFVDLLVSVLVMPFGAIELVQDIWIYGEMFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPLVYRNKMTPLRIALMLGGCWIIPMFISFLPIMQGWNNIGIIDLIEKRKFNQNSNSTYCIFMVNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHAHQIQMLQRAGAPSEGRPQPADQHSTHRMRTETKAAKTLCIIMGCFCLCWAPFFVTNIVDPFIDYTVPGQVWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYRRPSILGQTVPCSTTTINGSTHVLRDAVECGGQWESHCHPPATSSLVAAHPSDP

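The output for that record would be a CSV file containing this line, where the underscore-separated parts of the file name become the leading columns and the sequence becomes the last column: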

ClassA,Amine,Serotonin,MDELDANVSSKEGFGSVEKVVLLTFLSAVILMAILGNLLVMVAVCRDRQLRKIKTNYFIVSLAFVDLLVSVLVMPFGAIELVQDIWIYGEMFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPLVYRNKMTPLRIALMLGGCWIIPMFISFLPIMQGWNNIGIIDLIEKRKFNQNSNSTYCIFMVNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHAHQIQMLQRAGAPSEGRPQPADQHSTHRMRTETKAAKTLCIIMGCFCLCWAPFFVTNIVDPFIDYTVPGQVWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYRRPSILGQTVPCSTTTINGSTHVLRDAVECGGQWESHCHPPATSSLVAAHPSDP

The raw data is available in the /data/ directory, and the final output is in sequences.txt.
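
The labeling step described above can be sketched in Python as follows. This is a minimal illustration and not necessarily the repository's own code: the function names (`labels_from_filename`, `read_sequences`, `label_file`, `label_directory`) are hypothetical, and it assumes that input files end in `.txt`, that labels in the file name are separated by underscores, and that every record starts with a `>` header line.

```python
import csv
from pathlib import Path


def labels_from_filename(path):
    """Split a name like ClassA_Amine_Serotonin.txt into its labels."""
    return Path(path).stem.split("_")


def read_sequences(lines):
    """Yield one sequence per '>' header, joining sequences wrapped over lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def label_file(path):
    """Return one row per sequence: the file-name labels, then the sequence."""
    labels = labels_from_filename(path)
    with open(path, encoding="utf-8") as handle:
        return [labels + [sequence] for sequence in read_sequences(handle)]


def label_directory(data_dir, output_path):
    """Write the labeled rows of every .txt file in data_dir as CSV."""
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in sorted(Path(data_dir).glob("*.txt")):
            writer.writerows(label_file(path))
```

Calling `label_directory("data", "sequences.txt")` would then write one CSV line per sequence. In this sketch the output file should live outside the data directory, since any `.txt` file inside it is treated as input.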
