Annotation Paused #61

Open
zgerbec opened this issue Feb 22, 2023 · 4 comments
Labels
question Further information is requested

Comments

@zgerbec

zgerbec commented Feb 22, 2023

Hello,

I recently ran Tormes. The pipeline completed, and I was able to generate an HTML report file and assembly.tgz files for each sample (3 samples total). However, I am missing what seems to be the bulk of the Prokka outputs. A directory was created for each sample, but the only files in it were the FNA file and the log for that sample. All three runs ended at the identical step, where there seemed to be an issue with the contig ID name (see the representative log file below).

Sample-1.log

I was wondering if there is a way to solve this issue within Tormes and rerun a shorter version of the pipeline, perhaps starting from the assembled genomes, so that I get the annotation as well.

Any help would be greatly appreciated and thank you for the time.

Best,
Zack

@biobrad
Collaborator

biobrad commented Feb 23, 2023

Hi Zack,

The issue is caused by Prokka not accepting the long names of some of the contigs.
This is rare and I have never experienced it personally.

You can rename the contigs so that Prokka accepts them and then rerun just the fasta files through Tormes.
I have written a script for you that chops off the '.' and the digits that follow it in each contig name. (Those details are only there for Bandage, which is software that you can use to visualize SPAdes assemblies.)

Copy this script into a file on your linux computer:

type in:

nano fastacontigchange.sh

then paste this:

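#!/bin/bash
# usage: fastacontigchange.sh sequences.fasta > newsequence.fasta
# Cuts each FASTA header at its first '.'; sequence lines pass through unchanged.

# IFS= and -r keep each line intact; the || test also handles a missing final newline
while IFS= read -r line || [ -n "$line" ]; do
    if [[ $line == ">"* ]]; then
        # Header line: keep only the part before the first '.'
        printf '%s\n' "${line%%.*}"
    else
        printf '%s\n' "$line"
    fi
done < "$1"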

then type ctrl-x
and y for yes and hit enter.

Then make the script executable:

chmod +x fastacontigchange.sh

Then run your fasta files through the script sending the output to a new filename like so:

./fastacontigchange.sh Sample-1.fasta > Sample-1a.fasta
./fastacontigchange.sh Sample-2.fasta > Sample-2a.fasta
./fastacontigchange.sh Sample-3.fasta > Sample-3a.fasta

Do that for your three samples so you have three new files.
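
If there are more than a handful of samples, the same renaming can be done in a loop. The following is only a sketch of an alternative, not the script above: it uses sed to drop a trailing ".digits" from header lines, and the function names shorten_headers and rename_all are made up for illustration. The "a" suffix on the output files simply follows the naming used in this thread.

```shell
#!/bin/bash
# Sketch: shorten SPAdes-style headers in every FASTA file passed as an argument.
# Function names are hypothetical; the "a" output suffix mirrors this thread.

shorten_headers() {
    # On header lines only, drop a trailing ".<digits>" (e.g. ".951870").
    sed -E '/^>/ s/\.[0-9]+$//' "$1"
}

rename_all() {
    local f
    for f in "$@"; do
        # Sample-1.fasta -> Sample-1a.fasta, written next to the input file
        shorten_headers "$f" > "${f%.fasta}a.fasta"
    done
}
```

After sourcing or pasting the functions, it would be called as: rename_all Sample-1.fasta Sample-2.fasta Sample-3.fasta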

Here is the difference between two examples in the contig naming:

before the script:

$ head Sample-1.fasta

>NODE_1_length_81586_cov_70.951870
GTGGGGTGCGGCCACCATGGCCGACAGGGGATTTCTGCCGGCGCGGTTCGGTAGCGGCGC
CAGAATCGTGCACTTTCCGCCCCATCCTTTGGGGCGCCCCATCCACTGGGCGCGCGTCAA

after the script:

$ head Sample-1a.fasta

>NODE_1_length_81586_cov_70
GTGGGGTGCGGCCACCATGGCCGACAGGGGATTTCTGCCGGCGCGGTTCGGTAGCGGCGC
CAGAATCGTGCACTTTCCGCCCCATCCTTTGGGGCGCCCCATCCACTGGGCGCGCGTCAA

Then run those three new files through tormes and it should annotate properly.
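
Before rerunning, it may be worth checking that no long contig IDs remain. The sketch below lists IDs longer than a given limit. The function name long_headers is hypothetical, and the default of 37 is an assumption rather than something stated in this thread, so use whatever limit your own Prokka log reports.

```shell
#!/bin/bash
# Sketch: print the length and ID of every contig whose ID exceeds a limit.
# The default limit (37) is an ASSUMPTION; take the real value from your Prokka log.

long_headers() {
    local fasta=$1 max_len=${2:-37}
    # The ID is the first whitespace-delimited word of the header, minus the leading ">".
    awk -v max="$max_len" '
        /^>/ { id = substr($1, 2); if (length(id) > max) print length(id), id }
    ' "$fasta"
}
```

For example, long_headers Sample-1a.fasta prints nothing if every ID is within the limit.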

@zgerbec
Author

zgerbec commented Feb 24, 2023

Thank you so much for the reply. I will try this and report back.

One question about the new Tormes run: the contig fasta files are obviously not paired forward/reverse reads, so are they submitted as genomes with a corresponding metadata file?

Thanks again for the help.

@biobrad
Collaborator

biobrad commented Feb 24, 2023

Hi Zack,

In the metadata file, put the word GENOME in place of read 1 and put the fasta file name in read 2.
It is probably easier to just give you an example. :)

Samples         Read1   Read2            Description
Sample-1a       GENOME  Sample-1a.fasta  Reads from Sample-1a
Sample-2a       GENOME  Sample-2a.fasta  Reads from Sample-2a
Sample-3a       GENOME  Sample-3a.fasta  Reads from Sample-3a
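
The table above can also be generated rather than typed by hand. This is a minimal sketch, assuming TORMES expects the columns to be separated by tabs (worth confirming against the TORMES documentation). The function name make_genome_metadata and the output file name metadata.txt are made up for illustration.

```shell
#!/bin/bash
# Sketch: print a metadata table for already-assembled genomes.
# ASSUMPTION: columns are tab-separated; function name is hypothetical.

make_genome_metadata() {
    local f sample
    printf 'Samples\tRead1\tRead2\tDescription\n'
    for f in "$@"; do
        # Sample name = file name without directory and without ".fasta"
        sample=$(basename "$f" .fasta)
        printf '%s\tGENOME\t%s\tReads from %s\n' "$sample" "$f" "$sample"
    done
}
```

It would be used as: make_genome_metadata Sample-1a.fasta Sample-2a.fasta Sample-3a.fasta > metadata.txt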

@nmquijada
Owner

Hi Zack,

Brad is right! The issue comes from Prokka and long contig names (such as the ones SPAdes generates automatically).
Please check if Brad's solution fulfills your needs.

You can find a shortcut to generate metadata files here: https://github.com/nmquijada/tormes/wiki/Shortcut-to-generate-the-metadata-file-for-TORMES

Let us know!
Narciso

@nmquijada nmquijada added the question Further information is requested label Mar 7, 2023