Annotation Paused #61

zgerbec · 2023-02-22T20:39:27Z

Hello,

I recently ran Tormes on 3 samples. The pipeline completed, and I was able to generate an HTML report file and assembly.tgz files for each sample. However, I am missing what seems to be the bulk of the Prokka outputs. A directory was created for each sample, but the only files inside were the FNA file and the log. All three runs ended at the identical step, where there seemed to be an issue with the contig ID name (see representative log file below).

Sample-1.log

I was wondering if there is a way to solve this issue within Tormes, and perhaps rerun a shorter version of the pipeline starting from the assembled genomes so that it provides annotation as well.

Any help would be greatly appreciated and thank you for the time.

Best,
Zack

biobrad · 2023-02-23T23:34:40Z

Hi Zack,

The issue arises because Prokka doesn't accept the long names of some of the contigs.
This is rare and I have never experienced it personally.

You can rename the contigs so that Prokka will accept them and then rerun just the fasta files through tormes.
I have written a script for you that will chop off the '.' and the numbers following it in your contig names. (Those details are only there for 'Bandage', which is software that you can use to visualize SPAdes assemblies.)

Copy this script into a file on your linux computer:

type in:

nano fastacontigchange.sh

then paste this:

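#!/bin/bash
# usage: ./fastacontigchange.sh sequences.fasta > newsequence.fasta
# Cuts each FASTA header at the first '.'; sequence lines pass through unchanged.

while IFS= read -r line || [ -n "$line" ]; do
    if [ "${line:0:1}" == ">" ]; then
        # Keep only the part of the header before the first '.'
        printf '%s\n' "${line%%.*}"
    else
        printf '%s\n' "$line"
    fi
done < "$1"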

then type ctrl-x
and y for yes and hit enter.

Then make the script executable:

chmod +x fastacontigchange.sh

Then run your fasta files through the script sending the output to a new filename like so:

./fastacontigchange.sh Sample-1.fasta > Sample-1a.fasta
./fastacontigchange.sh Sample-2.fasta > Sample-2a.fasta
./fastacontigchange.sh Sample-3.fasta > Sample-3a.fasta

Do that for your three samples so you have three new files.
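If you would rather not repeat the command per sample, the same trimming can be sketched with sed and a loop. This is only an illustration: `shorten_headers` is a made-up helper name, and the demo files are created in a temporary directory so the snippet runs as-is; point the loop at your own fasta files instead.

```shell
#!/bin/bash
# Sketch: trim everything from the first '.' onward in FASTA header lines.
# shorten_headers is a hypothetical helper name, not part of TORMES or Prokka.
shorten_headers() {
    # On lines starting with '>', delete from the first '.' to end of line.
    sed '/^>/ s/\..*$//' "$1"
}

# Small demo input (assumption) so the sketch can be run as-is.
workdir=$(mktemp -d)
printf '>NODE_1_length_81586_cov_70.951870\nGTGGGGTG\n' > "$workdir/Sample-1.fasta"
printf '>NODE_2_length_500_cov_3.2\nCAGAATCG\n' > "$workdir/Sample-2.fasta"

# Write a renamed copy (Sample-1a.fasta, ...) next to each original.
for f in "$workdir"/Sample-?.fasta; do
    shorten_headers "$f" > "${f%.fasta}a.fasta"
done
```

The `Sample-?.fasta` pattern matches a single character after the dash, so the new `Sample-1a.fasta` files are not picked up if the loop is run again.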

Here is how the contig naming differs before and after the script:

before the script:

$ head Sample-1.fasta

>NODE_1_length_81586_cov_70.951870
GTGGGGTGCGGCCACCATGGCCGACAGGGGATTTCTGCCGGCGCGGTTCGGTAGCGGCGC
CAGAATCGTGCACTTTCCGCCCCATCCTTTGGGGCGCCCCATCCACTGGGCGCGCGTCAA

after the script:

$ head Sample-1a.fasta

>NODE_1_length_81586_cov_70
GTGGGGTGCGGCCACCATGGCCGACAGGGGATTTCTGCCGGCGCGGTTCGGTAGCGGCGC
CAGAATCGTGCACTTTCCGCCCCATCCTTTGGGGCGCCCCATCCACTGGGCGCGCGTCAA
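Before rerunning, it may be worth checking that no header is still too long. The sketch below lists any contig ID over a given length. `long_headers` is a hypothetical helper, and the default limit of 37 characters is an assumption; use whatever figure your own Prokka log reports.

```shell
#!/bin/bash
# Sketch: list FASTA contig IDs longer than a given limit.
# long_headers is a hypothetical helper; the default of 37 is an assumption.
long_headers() {
    local fasta=$1 max=${2:-37}
    # Take the first word of each header, drop the leading '>',
    # and print the ID if it is longer than max characters.
    awk -v max="$max" '/^>/ { id = substr($1, 2); if (length(id) > max) print id }' "$fasta"
}

# Demo input (assumption): one short header and one deliberately long one.
demo=$(mktemp)
printf '>NODE_1_length_81586_cov_70\nGTGG\n' > "$demo"
printf '>a_very_long_contig_name_that_goes_past_the_limit\nCAGA\n' >> "$demo"

long_headers "$demo"
```

An empty result means every ID fits within the limit you passed.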

Then run those three new files through tormes and it should annotate properly.

zgerbec · 2023-02-24T00:43:34Z

Thank you so much for the reply. I will try this and report back.

One question for the new tormes run: the contig fasta files are obviously not paired forward/reverse reads, so are they submitted as genomes with a corresponding metadata file?

Thanks again for the help.

biobrad · 2023-02-24T04:33:52Z

Hi Zack,

In the metadata file, put the word GENOME in the Read1 column and the fasta file name in the Read2 column (the columns are tab-separated).
It's probably easier to just give you an example. :)

Samples         Read1   Read2             Description
Sample-1a       GENOME  Sample-1a.fasta   Reads from Sample-1a
Sample-2a       GENOME  Sample-2a.fasta   Reads from Sample-2a
Sample-3a       GENOME  Sample-3a.fasta   Reads from Sample-3a
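A file with this layout can also be generated rather than typed by hand. The following is a rough sketch, not an official TORMES tool: the output name `metadata.txt` and the temporary demo directory with empty placeholder fasta files are assumptions made so the snippet runs on its own.

```shell
#!/bin/bash
# Sketch: write a tab-separated metadata file for already-assembled genomes.
# The output name metadata.txt and the demo directory are assumptions.
workdir=$(mktemp -d)
touch "$workdir/Sample-1a.fasta" "$workdir/Sample-2a.fasta" "$workdir/Sample-3a.fasta"
meta="$workdir/metadata.txt"

# Header row, then one row per fasta file with GENOME in the Read1 column.
printf 'Samples\tRead1\tRead2\tDescription\n' > "$meta"
for f in "$workdir"/*.fasta; do
    name=$(basename "$f" .fasta)
    printf '%s\tGENOME\t%s\t%s\n' "$name" "$(basename "$f")" "Reads from $name" >> "$meta"
done

cat "$meta"
```

Using `printf` with `\t` guarantees real tab characters between columns, which is easy to get wrong when copying a table out of a web page.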

nmquijada · 2023-03-01T09:49:34Z

Hi Zack,

Brad is right! The issue comes from Prokka and long contig names (such as the ones generated automatically by SPAdes).
Please check whether Brad's solution fulfills your needs.

You can find a shortcut to generate metadata files here: https://github.com/nmquijada/tormes/wiki/Shortcut-to-generate-the-metadata-file-for-TORMES

Let us know!
Narciso

nmquijada added the question Further information is requested label Mar 7, 2023
