codon-usage

This collection includes a number of functions and models that I have used repeatedly in many sequence analysis and engineering projects, and that are often not available out of the box in any of the standard tools. They are organized into the following categories:

models

CUFS: The Codon Usage Frequency Similarity (Diament, Pinter & Tuller, 2014) defines a distance metric between gene pairs based on their coding sequences, and specifically on their codon and amino acid usage. Files: calc_CUFS, calc_synCUFS
CAI: The Codon Adaptation Index (Sharp & Li, 1987) defines an optimality score for codons based on their frequency of appearance in a given reference set of genes. This is one of the most widely used models for estimating the translational efficiency of a coding sequence. The weights assigned to codons can also be used to generate vectors for estimating the local optimality of the sequence. Files: calc_CAI_weights, calc_score_from_weights, calc_vec_from_weights
tAI: The tRNA Adaptation Index (dos Reis, Savva & Wernisch, 2004) defines an optimality score for codons based on a simple biophysical model of codon recognition during the translation process. Files: calc_tAI_weights, calc_score_from_weights, calc_vec_from_weights
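The CAI description above can be illustrated with a minimal Python sketch. It is not the repository's MATLAB code: the function names (cai_weights, weight_vector, cai_score) are hypothetical, and two choices are assumptions made for this sketch: codons absent from the reference set receive a pseudocount of 0.5, and stop codons and single-codon amino acids (Met, Trp) are left out of the score. Each weight is the codon's count divided by the count of the most frequent synonymous codon, and the gene score is the geometric mean of the weights along the sequence.

```python
from collections import Counter
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Split a coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def cai_weights(reference_seqs, pseudocount=0.5):
    """Relative adaptiveness of each codon within its synonymous family."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    weights = {}
    for codon, aa in CODON_TABLE.items():
        family = [c for c in CODONS if CODON_TABLE[c] == aa]
        if aa == "*" or len(family) == 1:
            continue  # assumption: stops and single-codon families are skipped
        # Assumption: unseen codons get a pseudocount so that log(w) is defined.
        adjusted = {c: counts[c] or pseudocount for c in family}
        weights[codon] = adjusted[codon] / max(adjusted.values())
    return weights


def weight_vector(seq, weights):
    """Per-codon weights along the sequence (a local optimality profile)."""
    return [weights[c] for c in split_codons(seq) if c in weights]


def cai_score(seq, weights):
    """Geometric mean of the codon weights of a sequence."""
    vec = weight_vector(seq, weights)
    if not vec:
        raise ValueError("sequence contains no scorable codons")
    return exp(sum(log(w) for w in vec) / len(vec))
```

For example, with the single reference sequence "GCTGCTGCTGCC", the codon GCT gets weight 1 and GCC gets weight 1/3, so cai_score("GCTGCC", weights) is the square root of 1/3.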

next-generation sequencing (ngs)

Resampling: Read count data depends heavily on the number of samples that were available when the dataset was generated. Downsampling (e.g., when comparing two experiments with widely different sample sizes), as well as sampling from a theoretical or simulated distribution, is often used in analysis and result validation. Files: resample_reads, resample_profiles
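As a rough illustration of downsampling, the Python sketch below draws a fixed number of reads without replacement from a per-position read count profile. The function name downsample_counts and its parameters are hypothetical and do not reflect the interface of resample_reads or resample_profiles.

```python
import random
from collections import Counter


def downsample_counts(counts, target_total, seed=None):
    """Draw target_total reads without replacement from a read count profile.

    counts[i] is the number of reads observed at position i.
    """
    total = sum(counts)
    if target_total > total:
        raise ValueError("cannot draw more reads than the profile contains")
    rng = random.Random(seed)
    # Expand the profile into one entry per read, labeled by its position.
    pool = [pos for pos, n in enumerate(counts) for _ in range(n)]
    drawn = Counter(rng.sample(pool, target_total))
    return [drawn[pos] for pos in range(len(counts))]
```

Expanding the profile into one entry per read keeps the sketch simple but uses memory proportional to the total read count; for large datasets a multivariate hypergeometric draw gives the same distribution without building the pool.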

optimize

Codon usage bias: The most popular methods for coding sequence optimization select the optimal codon to encode each amino acid in the protein according to some model. The converse (selecting the least optimal codon) can be applied just as easily. A 'balanced', randomly generated profile based on a desired codon distribution may also be useful at times. Files: maximize_CUB, minimize_CUB, .random/randseq_CUB
Combinatorial sequences: More elaborate optimization methods search the space of all possible sequences based on some objective function. The following functions are dedicated to the exhaustive generation of synonymous sequences (producing the same protein), non-synonymous sequences, and their combinations. Files: all_synonym_seq, all_nt_seq, all_combined_seq
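The two ideas in this category can be sketched in Python as follows. This is an illustration under stated assumptions, not the repository's MATLAB code: optimize_bias and all_synonymous are hypothetical names, the standard genetic code is assumed, and codons missing from the weight table are treated as having weight 0.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Map each amino acid to the list of codons that encode it.
SYNONYMS = {}
for codon, aa in zip(CODONS, AMINO):
    SYNONYMS.setdefault(aa, []).append(codon)


def optimize_bias(protein, weights, pick=max):
    """Encode a protein with the best (pick=max) or worst (pick=min) codon.

    Assumption: codons missing from `weights` count as weight 0.
    """
    return "".join(
        pick(SYNONYMS[aa], key=lambda c: weights.get(c, 0.0)) for aa in protein
    )


def all_synonymous(protein):
    """Lazily yield every coding sequence that translates to `protein`."""
    for combo in product(*(SYNONYMS[aa] for aa in protein)):
        yield "".join(combo)
```

The number of synonymous sequences is the product of the family sizes along the protein, so it grows exponentially with length; a generator keeps the enumeration lazy, but exhaustive search remains practical only for short segments.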

random

Sampling synonymous sequences from a given codon distribution can be very useful (see above) and has been surprisingly missing from MATLAB's Bioinformatics Toolbox, so here it is. Files: randseq_CUB
Permuting sequences synonymously allows one to preserve the exact same codon usage while generating many samples. Codon permutations can be performed within each gene independently - changing the order of codons while preserving the codon composition, and any optimality score that is determined by this composition of the gene - or globally, where codons may be exchanged between genes. Files: shuffle_codons
Generalized feature permutation: Permutations of features along a vector based on their similarity / equivalence can be used as a control test. The following function returns a permutation for any vector of features with some distance defined between the features. For example, in the above scenario multiple codons that encode the same amino acid would have a small defined distance between them. But the approach here can further be extended to amino acids with similar biophysical properties, to features other than coding sequences, and so on. Files: randperm_conserv_feat
Generalized feature sampling: Similarly, we can generalize the sampling approach to arbitrary features. Files: rand_conserv_feat
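The within-gene synonymous permutation described above can be sketched in Python as follows. The function name shuffle_synonymous is hypothetical and the standard genetic code is assumed; this is not the interface of shuffle_codons. Codons are only exchanged between positions that encode the same amino acid, so both the protein and the codon composition of the gene are preserved.

```python
import random

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def shuffle_synonymous(seq, seed=None):
    """Permute codons among positions that encode the same amino acid."""
    rng = random.Random(seed)
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

    # Group codon positions by the amino acid they encode.
    positions = {}
    for i, codon in enumerate(codons):
        positions.setdefault(CODON_TABLE[codon], []).append(i)

    shuffled = list(codons)
    for idx in positions.values():
        pool = [codons[i] for i in idx]
        rng.shuffle(pool)  # reorder the codons of this amino acid only
        for i, codon in zip(idx, pool):
            shuffled[i] = codon
    return "".join(shuffled)
```

Because the codon composition is unchanged, any score that depends only on composition is identical for the original and shuffled sequences, which is what makes such permutations usable as a control.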

alondmnt/codon-usage
