-
Notifications
You must be signed in to change notification settings - Fork 698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variant calls after local realignment : is it the most accurate ? #763
Comments
Hi, Thanks for pointing this out! We are going to investigate this issue more closely and provide updates as we make progress. Thank you for providing links to the files, those are extremely helpful. |
Local realignment algorithm works as following:
To answer your question. Reads can move either direction as a result of the realignment. To investigate it further it would be helpful to see what haplotype was created as a result of the realignment. Would you be able to provide the reference file that you used? |
Thank you for these explanations. |
Thank you for providing the reference. We were able to reproduce the issue locally. |
Hi @Fred-07, As @akolesnikov mentions, the reason that the reads are re-aligned in this way has to do with some fundamental aspects of how alignment penalties work. The alignment method, Smith Waterman, has alignment score penalties for gap open and gap extend. These penalties make the alignments you see as scoring higher than ones that open and extend a gap early in the repeat. Those alignment penalties are determined from looking at mismatch, insertion, and deletion rates across large amounts of homologous sequences. They work very well for high sequence complexity regions which have standard types of DNA replication error. In this highly repetitive context, the ways that sequence (specifically repeats) can be extended or deleted is a bit different. It might, in theory, be possible to derive better gap extend scores for regions like this, but this is both very hard and a more fundamental problem. I don't think this case is one we can easily solve without very extensive work. I hope that the @akolesnikov suggestion of turning off realigner in regions of interest will be satisfactory. Thank you, |
Dear all,
I would like to share an observation with you.
Deepvariant reported a lot of variants in a region where no reads are mapped.
It is the consequence of Deepvariant local realignment as expected.
But the realigned reads do not seem to belong to this region : the orignal alignment shows less variants than the local realignment.
The genome contains a big region repeated several times (see on the screenshots below). The repeats could be the reason why some reads are moved to the left. The given example is for X chromosome but another region in chromosome 1 (FLG2 gene region) showed a lot of variants.
Why the reads are moved to the left?
Are the reads moved to the most possible left position?
Original alignment and detected variants (bwa-mem)
Local realignment made by DeepVariant (X_140993145_140994144)
Example of reads that were moved:
Original alignment
NB501857:464:HH7FWBGXV:2:23210:26812:14806 99 X 140993994 50 79M = 140994064 149 CCAGATTCCTGTGAGCCGCTCCTTCTCCTCCACTTTAGTGAGTCTTTTCCAGAGTTCCCCTGAGAGAACTCAGAGTACT AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE<EEEEEEEEEEEEAEEEEE XA:Z:X,+140993784,79M,2; PG:Z:MarkDuplicates AS:i:74 XS:i:69 MD:Z:17C61 NM:i:1 RG:Z:DM_23_2198
NB501857:464:HH7FWBGXV:2:23210:26812:14806 147 X 140994064 57 79M = 140993994 -149 CAGAGTACTTTTGAGGGTTTTCCCCAGTCTCCTCTCCAGATTCCTGTGAGCTCCTCCTCCTCCTCCACTTTATTGAGTC AAEEEEEEEEEEEEEE<66EEEEEEEEEE/EAEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA XA:Z:X,-140994589,50M3D29M,4; PG:Z:MarkDuplicates AS:i:79 XS:i:67 MD:Z:79 NM:i:0 RG:Z:DM_23_2198
Local realignment
X:140993145-140994144/ X_140993145_140994144realigned_reads.bam X_140993145_140994144realigned_reads.bam.bai
frmascla@frt:DeepV-TEST$ samtools view Local/X_140993145_140994144realigned_reads.bam | grep NB501857:464:HH7FWBGXV:2:23210:26812:14806
NB501857:464:HH7FWBGXV:2:23210:26812:14806 99 X 140993784 50 79M = 140994064 149 CCAGATTCCTGTGAGCCGCTCCTTCTCCTCCACTTTAGTGAGTCTTTTCCAGAGTTCCCCTGAGAGAACTCAGAGTACT AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE<EEEEEEEEEEEEAEEEEE
NB501857:464:HH7FWBGXV:2:23210:26812:14806 19 X 140993854 57 79M = 140993994 -149 CAGAGTACTTTTGAGGGTTTTCCCCAGTCTCCTCTCCAGATTCCTGTGAGCTCCTCCTCCTCCTCCACTTTATTGAGTC AAEEEEEEEEEEEEEE<66EEEEEEEEEE/EAEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA
Original alignment, bam file
https://www.dropbox.com/scl/fi/c9tc01sdtf2sroxj3u3bj/original_alignment.bam?rlkey=jgxnyhyse2ekcu6t1s3l3lnnl&dl=0
Local realignment, bam file
https://www.dropbox.com/scl/fi/oqhny0s7h9hu3zcyrprig/X_140993145_140994144realigned_reads.bam?rlkey=zmbon72t19vjlcdht1zt6m5xg&dl=0
Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.6/docs/FAQ.md:
Yes
Setup
The text was updated successfully, but these errors were encountered: