Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcftools view with region filter drops variants at POS=0 (telomeres) #1571

Open
colin-nolan opened this issue Feb 24, 2023 · 5 comments · May be fixed by #1573
Open

bcftools view with region filter drops variants at POS=0 (telomeres) #1571

colin-nolan opened this issue Feb 24, 2023 · 5 comments · May be fixed by #1573
Assignees

Comments

@colin-nolan
Copy link

colin-nolan commented Feb 24, 2023

Given a bgzipped VCF that contains a telomere indicated by position 0 (as per the VCF spec):

$ zcat example.vcf.gz | grep -v '#'
22      0       ID1     C       G       .       PASS    .       GT      1/0
22      1       ID2     C       G       .       PASS    .       GT      1/0

(Disclaimer: I know next to nothing about telomeres, and have really only ended up here due to invalid, non-int positions getting converted to 0, as discussed in #1570. My example undoubtedly makes no realistic sense!)

A simple bcftools view works as expected:

$ bcftools view -H  example.vcf.gz
22      0       ID1     C       G       .       PASS    .       GT      1/0
22      1       ID2     C       G       .       PASS    .       GT      1/0

Applying a region filter however unexpectidly drops the telomere (albeit with a warning referring to a flag that I don't think exists in this context?):

$ bcftools view -H --regions 22 example.vcf.gz
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
22      1       ID2     C       G       .       PASS    .       GT      1/0
$ echo $?
0

It seems like the telomere should be included with this filter, given its chr; and I haven't encountered anywhere in the docs that indicates the observed behaviour. However, there certainly could be some biological reason why that doesn't make sense!

The way it gets dropped can cause an interesting issue when considred with #1570, as it means variants where (for whatever reason...) POS gets corruped, get converted to 0 and then dropped when a pipe goes on to have a region filter!

@pd3
Copy link
Member

pd3 commented Feb 24, 2023

What is the version of bcftools you are using? I am unable to reproduce with the latest

@colin-nolan
Copy link
Author

I'm seeing this with a build of the latest version (BCFtools fe98a6b54536246ce650ad5fde0b75e99bb22fa0; htslib 70df0479771fa28e927c01de4c5ca95f1e63ebba), on Ubuntu 22.04 with:

cd htslib
autoreconf --install
./configure
make

cd ../bcftools
autoreconf
./configure
make

Also the same with 1.17.

$ bcftools view -H --regions 22  example.vcf.gz
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
22      1       ID2     C       G       .       PASS    .       GT      1/0
$ bcftools --version
bcftools 1.17-4-gfe98a6b
Using htslib 1.17-1-g70df047
Copyright (C) 2023 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

@pd3
Copy link
Member

pd3 commented Feb 27, 2023

Just to confirm, I was able to reproduce the problem now

$ echo -e "##fileformat=VCFv4.2\n##contig=<ID=22>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n22\t0\tID0\tC\tG\t.\tPASS\t.\n22\t1\tID1\tC\tG\t.\tPASS\t." | bcftools view -o rmme.vcf.gz

$ bcftools index -f rmme.vcf.gz
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?

$ bcftools --version
bcftools 1.17-2-gdb8d6573
Using htslib 1.17-1-g70df047

The BCF format works

$ echo -e "##fileformat=VCFv4.2\n##contig=<ID=22>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n22\t0\tID0\tC\tG\t.\tPASS\t.\n22\t1\tID1\tC\tG\t.\tPASS\t." | bcftools view -o rmme.bcf

$ bcftools index -f rmme.bcf

@pd3 pd3 transferred this issue from samtools/bcftools Feb 28, 2023
pd3 added a commit to pd3/htslib that referenced this issue Mar 1, 2023
Review of `bcftools merge` code and possibly others is necessary because it is relying
on uninitialized POS==-1 in synced_bcf_reader

This should resolve samtools#1571
@pd3 pd3 linked a pull request Mar 1, 2023 that will close this issue
pd3 added a commit to samtools/bcftools that referenced this issue Mar 1, 2023
pd3 added a commit to pd3/htslib that referenced this issue Mar 1, 2023
Review of `bcftools merge` code and possibly others is necessary because it is relying
on uninitialized POS==-1 in synced_bcf_reader

This should resolve samtools#1571
@pd3 pd3 reopened this May 31, 2024
@pd3
Copy link
Member

pd3 commented May 31, 2024

I don't think this has been completed

@colin-nolan
Copy link
Author

I don't think this has been completed

Apologies! I spent a few minutes yesterday follow up my old PRs/issues - but I was clearly too trigger happy here when I saw that commits had been added referencing this ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants