reference-guided genome assembly #69

Open
ariasamin opened this issue Jun 15, 2023 · 6 comments



ariasamin commented Jun 15, 2023

Hello,
I am running a reference-guided genome assembly, but I have run into a problem that appears to be related to the reference genome. The reference genome can be in FASTA or GBK format; however, neither of them worked.

The command is:

tormes  \
--metadata my-metadata.txt  \
--output OUTPUT  \
--reference ./ref/GCA_000013425.1_ASM1342v1_genomic.fasta  \
--threads 100  >  stdout_Tormes

and I got this error:
ref file: org.gel.mauve.contigs.ContigMauveAlignFrame[panel0,0,0,400x383,invalid]
Would you please help me with this error?
I appreciate your help.

PS: as the gbk format is no longer available in GenBank, I renamed the gb file to gbk, but that failed as well.

Collaborator

biobrad commented Jun 15, 2023 via email

@ariasamin
Author

Dear Brad,
I appreciate your prompt reply.
I tried the truncated name you suggested but got the same result.

ref file: org.gel.mauve.contigs.ContigMauveAlignFrame[panel0,0,0,400x383,invalid]
shown
Copying...
        /scratch1/narges/tormes_06.09.2023/tormes4_ref/./ref/GCA_000013425.fasta
to
        /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genome_ordering/S12/alignment1/GCA_000013425.fasta
Copying...
        /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genomes/S12.fasta
to
        /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genome_ordering/S12/alignment1/S12.fasta
trying path ./linux-x64/progressiveMauve
Running alignment.
Executing
  progressiveMauve
    --output=/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genome_ordering/S12/alignment1/alignment1
    --skip-refinement
    --weight=200
    --output-guide-tree=/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genome_ordering/S12/alignment1/alignment1.guide_tree
    --backbone-output=/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genome_ordering/S12/alignment1/alignment1.backbone
    /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genome_ordering/S12/alignment1/GCA_000013425.fasta
    /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genome_ordering/S12/alignment1/S12.fasta
Storing raw sequence at /local_scratch/pbs.739821.pbs02/rawseq1343900.000
Sequence loaded successfully.
/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genome_ordering/S12/alignment1/GCA_000013425.fasta 2821361 base pairs.
Storing raw sequence at /local_scratch/pbs.739821.pbs02/rawseq1343900.001
Sequence loaded successfully.
/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_739821.pbs02/genome_ordering/S12/alignment1/S12.fasta 2865382 base pairs.
Using weight 15 mers for initial seeds
Creating sorted mer list
Create time was: 0 seconds.
Creating sorted mer list
Create time was: 1 seconds.
0%..1%..2%..3%..4%..5%..6%..7%..8%..9%..10%..
11%..12%..13%..14%..15%..16%..17%..18%..19%..20%..
21%..22%..23%..24%..25%..26%..27%..28%..29%..30%..
31%..32%..33%..34%..35%..36%..37%..38%..39%..40%..
41%..42%..43%..44%..45%..46%..47%..48%..49%..50%..
51%..52%..53%..54%..55%..56%..57%..58%..59%..60%..
61%..62%..63%..64%..65%..66%..67%..68%..69%..70%..

I believe tormes continues working but skips the reference genome.
What should the output of a reference-guided de novo assembly look like? Is there an example or test dataset I can try?

I appreciate your help,

PS: There is a stdout file attached, saved right before tormes finished the job.

Sstdout_Tormes_735766.pbs02_report_03_42_10.zip

Collaborator

biobrad commented Jul 3, 2023

Hi there, sorry for the late reply; I had university exams and then some family issues.
I will have a look at this over the next few days.
cheers
Brad

Collaborator

biobrad commented Jul 3, 2023

Looking at the information you provided, it looks like the alignment actually works and is successful.
Check the output folders and I think you will find that the aligned files are there.

Running alignment.
Executing
progressiveMauve
--output=/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/alignment2
--skip-refinement
--weight=200
--output-guide-tree=/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/alignment2.guide_tree
--backbone-output=/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/alignment2.backbone
/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/GCA_000013425.1_ASM1342v1_genomic.fasta
/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/S07.fasta
shown
Storing raw sequence at /local_scratch/pbs.735766.pbs02/rawseq3354017.000
Sequence loaded successfully.
/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/GCA_000013425.1_ASM1342v1_genomic.fasta 2821361 base pairs.
Storing raw sequence at /local_scratch/pbs.735766.pbs02/rawseq3354017.001
Sequence loaded successfully.

^^^^^^^SUCCESSFUL LOAD OF YOUR REFERENCE AND YOUR QUERY SEQUENCES

/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/S07.fasta 2826041 base pairs
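
One way to check the output folders is sketched below. It assumes the layout seen in the log above (`genome_ordering/<sample>/alignment<N>/` inside the tormes output directory) and a `find` that supports `-mindepth`/`-maxdepth` (GNU and BSD both do). The function name `list_alignment_dirs` is made up for this illustration and is not part of tormes.

```shell
#!/usr/bin/env bash
# List the Mauve alignment folders under a tormes output directory,
# or report that genome_ordering is not there at all.
# NOTE: list_alignment_dirs is a hypothetical helper, not a tormes command.
list_alignment_dirs() {
    local outdir="$1"
    if [ ! -d "$outdir/genome_ordering" ]; then
        echo "genome_ordering is missing under $outdir" >&2
        return 1
    fi
    # Assumed layout, taken from the log: genome_ordering/<sample>/alignment<N>/
    find "$outdir/genome_ordering" -mindepth 2 -maxdepth 2 -type d -name 'alignment*' | sort
}
```

For the run in this thread it would be called as `list_alignment_dirs /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02`. A non-zero exit status means the `genome_ordering` folder itself is absent.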

@ariasamin
Author

Dear Brad,
I am sorry to hear you had a tough time. I hope everything is going well for you now.
Thank you for getting back to me.

I see your point about the reference being loaded; however, another part of the report refers to this output path:
/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/alignment2

Moreover, there is no output at the location mentioned in the report:

$ cd /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/alignment2
-bash: cd: /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2/alignment2: No such file or directory
$ cd /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2
-bash: cd: /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07/alignment2: No such file or directory
$ cd /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07
-bash: cd: /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering/S07: No such file or directory
$ cd /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering
-bash: cd: /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02/genome_ordering: No such file or directory
$ cd /scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02
$ pwd
/scratch1/narges/tormes_06.09.2023/tormes4_ref/OUTPUT_735766.pbs02
$ tree -L 1
.
├── annotation
├── antimicrobial_resistance_genes
├── assembly
├── citations.txt
├── cleaned_reads
├── genomes
├── genome_stats
├── mlst
├── pangenome
├── report_files.tgz
├── rRNA-genes
├── sequencing_assembly_report.txt
├── taxonomic_identification
├── tormes.log
├── tormes_report.html
└── virulence_genes

11 directories, 5 files

I've attached a full report of the tree command.
Thank you for your time,

tree_OUTPUT_735766.pbs02.zip

Collaborator

biobrad commented Jul 6, 2023

Hi Ariasamin,
I have found the problem! (Sorry I didn't pick it up the first time around.)
For some reason (probably left over from testing), the script deletes the genome_ordering results folder.

But this is how you can fix it.

The instructions involve using the 'nano' editor.
If you don't have it installed, you can install it with:

sudo apt install nano

then we need to edit the tormes script.
Type in the following:

nano /home/narges/.conda/envs/tormes_1.3.0.0/bin/tormes

then search for a line number by using the following keyboard shortcut

ctrl+shift+ -

type in the line number 1198 and hit enter

it should take you to:

$PARALLEL -j $CPUS -a $OUTWD/list.tmp --gnu rm -rf $OUTWD/genome_ordering/

change it so it has a # at the front

#$PARALLEL -j $CPUS -a $OUTWD/list.tmp --gnu rm -rf $OUTWD/genome_ordering/

then use the keyboard commands:

ctrl+x
then press y to confirm the save, and Enter to keep the same file name

After this is done, you can run the pipeline again and it should work.
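
For anyone who prefers not to edit interactively, the same change can be sketched with `grep` and `sed`. This is only an illustration under a few assumptions: the cleanup line looks exactly like the one quoted above, `sed` accepts `-i.bak` (GNU sed does), and matching by content is used because line 1198 may sit elsewhere in other tormes versions. The function name `disable_genome_ordering_cleanup` is hypothetical.

```shell
#!/usr/bin/env bash
# Comment out the cleanup line that removes genome_ordering/ in a tormes script.
# NOTE: disable_genome_ordering_cleanup is a hypothetical helper, not part of tormes.
disable_genome_ordering_cleanup() {
    local script="$1"
    # Show the still-active cleanup line(s) with line numbers so the change can be reviewed.
    grep -n '^[[:space:]]*[^#[:space:]].*rm -rf \$OUTWD/genome_ordering/' "$script" || {
        echo "no active genome_ordering cleanup line found in $script" >&2
        return 1
    }
    # Keep a backup as <script>.bak, then prefix matching lines that are not
    # already commented with '#'.
    sed -i.bak '/^[[:space:]]*#/!{/rm -rf \$OUTWD\/genome_ordering\//s/^/#/;}' "$script"
}
```

With the path from this thread the call would be `disable_genome_ordering_cleanup /home/narges/.conda/envs/tormes_1.3.0.0/bin/tormes`. The original script stays next to it as `tormes.bak`, so the change can be undone by copying the backup over the edited file.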

I did a test run and you can see that genome_ordering folder is now present:

harbj019@ssventer ~/genomeresults/referenceguided/refresults
$ tree -L 1
.
|-- annotation
|-- antimicrobial_resistance_genes
|-- assembly
|-- citations.txt
|-- cleaned_reads
|-- genome_ordering
|-- genome_stats
|-- genomes
|-- mlst
|-- rRNA-genes
|-- report_files.tgz
|-- sequencing_assembly_report.txt
|-- taxonomic_identification
|-- tormes.log
|-- tormes_report.html
`-- virulence_genes

11 directories, 5 files

harbj019@ssventer ~/genomeresults/referenceguided/refresults/genome_ordering/Sm3119/alignment7
$ ls -1
K279a.fasta
K279a.fasta.sslist
Sm3119.fasta
Sm3119.fasta.sslist
Sm3119_contigs.tab
alignment7
alignment7.backbone
alignment7.bbcols
alignment7.guide_tree

Let me know how you go.
I will check with the creator of tormes if there is any reason the code is configured that way before making a permanent change in the repository.

cheers
Brad
