When I use bwa for mapping with grch37.p13.fa and hg19.fa, there are differences in some regions. #410

Open
zhangshouwei309194 opened this issue Dec 28, 2023 · 0 comments

Comments

@zhangshouwei309194

Dear author,
When I use bwa for mapping against grch37.p13.fa and hg19.fa, the results differ in some regions.
grch37.p13.fa: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/
hg19.fa: https://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/
Here is an example. I ran the same command against both genomes. For a SNP at chr1:206647742, the result is correct for hg19 but not for grch37.p13.fa.
hg19:
samtools mpileup -d 200000 -q 0 -r chr1:206647742-206647743 -f hg19.fa test1.markdup.bam
[mpileup] 1 samples in 1 input files
chr1 206647742 A 1316 G$G$G$G$G$GGGGGgggggGGGggggGGggGGGgggGGGGgggggggggggGGggggggGGGGGGGggGGgggGGGGGGGGGGGgggGGGGGGGGGGGGGGGGGggggGGGGGggggggGGGGGGGggggGGGGGgggGGGggggggggGgGgggGGGGGGGgggggggGGggggggGGGGgGGGGGGGGGGGGGGGGgGGGGGGGgGGggGGGgggGgggGGGGGGGGGGGGGGGGGGGGGGGgggGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGgGGGGGGGGGGGGGGGGGgGGGGGGggggGGGGGGGGGGgGGGGGGGGGGgggggggggGGGGGggggggggGGGGGGGGGGGGGGGggggGGGGGGGgggGGGGGGGGGGGGGGGgggGGGGGGGGGGGgGGGGGGGggGGGgggggGGGGGGGGGGGGGGggggGGGGGGGggGgggggggGGGgggggGGGGGGGGGGGGGGGGGGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGGGggGGGGGggGGGGGGGGGGgGGGGGGGGGGgggGGGGGGGGGGGGggggGGGgggGGGGGGGGGGGGGGgGGGGGggggGGGGGGGGggggGGGGGggGGGGGGGGGGGGGGGgggGGGGGGGGGGggggggggGGGGGGGGGGggGGGggggGGGGGgGGGGGGGGgggGgggggggGGggggggGGGGGGGGgGGGGGGGGGGgggGgggggGgGgGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGggGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGGGgggGGGGGGGGGGGGGGggggGGgggGGGGGGGGGgGGGGGGGGGGGGGGGGGGGGGggGGGGGggGGGGGggGGGGggGGggggggGGGGGGGGGGGgggGGGGGGGGGGGGGGGGGgGGGGGGgGGGGGGGgGGGGGGGGGGGGGGgGGGGGGGGGgggggggGGGGGGGGGGGGGggGGGGGGGGGGgggggGGGGGGGGGGGGGGGGGGGGGGGGGggGGGGGGGGGGGGgGGGGGGGGGGGGGGGggGGGgGGGGGGGGGggggGGGGGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGGGGGGGGGGgGGGGGGGGGGGGggGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGgGGGGggGGGgGGGGGgGGGGGGGGGGGGGGGGGGGGGGgGGGGGgggGGGGGGGGGGGGggGggggggGGGgggGGGGGGGGGGGGgGGGGGGGGGGGGGGGGGggGGGgg 
FFFFFOOIO_:::::FFkFFFFkFFFkkFFFFkkkkFFFFF:FFFFFFFFFFFFFFkkkkFkFFkkFFFkkkkkkkFkkFFFFkkFFFFkFFk^FkFkFFFFFFkkkkFFFFFFFkkkFkkkFFFFkkkkkFF:kFkFFFFFFFFkFkFFFkFkF^FFFFFFFFFFFFFFFFFkkkkFFkkkkkFkFkkkFkkkFSkkkFkkFkkFFFkFFFFkFFFFkkkkkFkkkkkFFkkk^kkkkFFFFkkkkkFkkkkkFkkSkkkkkFkkkFkkSkFkkkkkFkkFkkkFkFkFkFkkkkFkkkkkkFFFFkkkkFkkkSkFkFkkFkkFFFFFFFFFFFFkkFkFFFFFFFFFFkSFFkk^kkFFkFkFFFFkkkFFFkFFFkkFFkkkkkkFkkkkFFF3kkkkkFFkF_FkkkkkkkFFkkkFFFFFFkkkkkkkFkFFkFFFFFkkkFkFkFFFFFFFFFFkkkFFFFFFFkkFFkkkkFkkFkkFkFkkkFkkFFFkFFkkkkkFFFFkFkFkkkkkFFkkFFkkFkFFFFkFkFFkkkkFkkkkkkkkFkFFFkkkFFkFFkFFkFFFFkkkFFFFFkkFkkkkkkkkkFFFFFkFFFFkkFkkkkFFFFFkFFkFFFkkFFkkFkkkkkkF:FFFkkFkkFkFkkFFFFFFFFkkFFFkkFFkFFkkFFFFFkFkkkFkkkFkkFkFFFFFFFFFFFkkFFFFFFkkkkFkFkFkFkFFFFFkkFFFkFFFFFkFFFFkFFkkkFkkFFFFFFFFFFkFFFkkFkFkkkFFFkFkkFFFkk_kFFFFkFFFFkFkkkkkkkFFFFFFFFFkFFFkFkkFFkkkFkFkkFkFFFkFkkFkFFFkFFkFFFFFFkFFFFkkFFkFkFFFFFFkkkkFFFkFkFkkkkkkFFFFFkF:FFkkFkFFFFFkFFFFFFFFFFFkFkkFFFkkFFFFkkFkFkkFFkFFFFkkFFkkkFkFFFFkFFkkFkkkkkkkkFFkFkkFFkFFkFkFFFFFFFFFkFkkFFFkkFkkFFFkFFFFFkkkFFFFFFkFFFkkFkFkFFkkkFkFFkFFFFFFFFFFkFkFFkFkFFFFFFFkFkkFFkkkFFFFkFFkkkkFkFkFFFFFFFkFFFFFFFFkFkFFFFFkFFFkkFFFkFkkkkkFkFFFFFkFFFFFFFkFFFFFFFFkFFkkkFFkkFFFFFkFFFFFFFFFkkFFFFkFkFFFkFFFFFFFkFF:FkFFFFFFFkFkFFFFFFFFFFFFkkFFkFkFFFkFFkFFFFFFFkFFFkFFFkFFkFFFFFFkFFFFFFFFkFkF:FFkkFk^FFFFkFFkFFFkFkFFFQ9Q99
chr1 206647743 G 1322 .$.$.$.$.,,,,,...,,,,..,,...,,,....,,,,,,,,,,,..,,,,,,.......,,..,,,...........,,,.................,,,,.....,,,,,,.......,,,,.....,,,...,,,,,,,,.,.,,,.......,,,,,,,..,,,,,,....,................,.......,..,,,...,,,.,,,.......................,,,...................................,,.................,......,,,,..........,..........,,,,,,,,,.....,,,,,,,,...............,,,,.......,,,...............,,,.$..........,.......,,...,,,,,..............,,,,.......,,.,,,,,,,,...,,,,,.............................,,......................,,.....,,..........,..........,,,............,,,,...,,,..............,.....,,,,........,,,,.....,,...............,,,..........,,,,,,,..........,,...,,,,.....,........,,,.,,,,,,,..,,,,,,........,..........,,,.,,,,,.,.,.................................,,...........,,,....................,,......................,,,..............,,,,,..,,,.........,.....................,,.....,,.....,,....,,..,,,,,,...........,,,.................,......,.......,..............,.........,,,,,,,.............,,..........,,,,,.........................,,............,...............,,...,.........,,,,................,,.............................,...........,,........................................,....,,,...,.....,......................,.....,,,............,,.,,,,,,...,,,............,.................,,...,,^].^].^].^].^].^].^].^], 
OOIO_:::::FFkFFFFkFFFkkFFFFkkkkFFFFFFFFFFFFFFFFFFFFkkkkFkFFkkFFFkkkkkkkFkkFFFFkkFFFFkFFk^FkFkF:FFFFkkkkFFFFFFFkkkFkkkFFFFkkkkkFFFkFkFFFFFFFFkFkFFFkFkF^FFFFFFFFFFFFFFFFFkkkkFFkkkkkFkFkkkFkkkFSkkkFkkFkkFFFFkFFFFkFFFFkkkkkFkkkkkFFkkk^kkkkFFFFkkkkkFkkkkkFkkSkkkkkFkkkFkkSkFkkkk_FFkkFkkkFkFkFkFkkkkFkkkkkkFFFFkkkkFkkkSkFkFkkFkkFFFFFFFFFFFFkkFkFFFFFFFFFFkSFFkk^kkFFkFkFFFFkkkFFFkFFFkkFFkkkkkkFkkkkFFF3kkkkkFFkFkFkkkkkkkFFkQkFFFFFFkkkkkkkFkFFkFFFFFkkkFkFkFFFFFFFFFFFkkkFFFFFFFkkFFkkkkFkkJkkFkFkkkFkkFFFkFFkkkkkFFFFkFkFkkkkkFFkkFFkkFkFFFJkFkFFkkkkFkkkkkk_kFkFFFkkkFFkFFkFFkFFFFkkkFFFFjkkFkkkkkkkkkFFFFFkFFFFkkFkkkkFFFFFkFFkFFFkkFFkkFkkkkkkFFFFFkkFkkFkFkkFFFFFFFkkFFFkkFFkFFkkFFFFFk:kkkFkkkFkkFkFFFFFFFFFFFkkFFFFFFkkkkFkFkFkFkFFFFFkkFFFkFFFFFkFFFFkFFkkkFkkFFFFFFFFFFkFFFkkFkFkkkFFFkFkkFFFkkkFF>FFkFFFFkFkkkkkkkFFFFFFFFFkFFFkFkkFFkkkFkFkkFkFFFkFkkFkFFFkFFkFFFFFFFk:FFFk_FFkFkFFFFFFkkkkFFFkFkFkkkkkkFFFFFkFFFFkkFkFFFFFkFFFFFFFFFFFkFkkFFFkkFFFFkkFkFkkFFkFFFFkkFFkkkFkFFFFkFFkkFkkkkkkkkFFkJkkFFkFFkFkFFFFFFFFFkFkkFFFkkFkkFFFkFFFFFkkkFFFFFFkFFFkkFkFkFFkkkFkFFkFFFFFFFFFFkFkFFkFkFFFFFFFkFkkFFkkkFFFFkFFkkkkFkFkFFFFFFFkJFFFFFFFkFkFFFFFkFFFkkFJFkFkkkkkFkFFFFFkFFFFFFFkFFFFFFFkFFkkkFFkkFFFFFkFFFFFFFFFkkFFFFkFkFFFkFFFFFFFkFFFFFkFFFFFF:kFkFFFFFFjFFFFFkkFFkFkFFFkFFkFFFFFFFkFFFkFFFkFFkFFFFFFkFFFFFFFFkFkFFF:kkFk^FFFFkFFkFFFkFkFFFQ9Q99iEEiiEEE

grch37.p13:
samtools mpileup -d 200000 -r 1:206647742-206647743 -q 0 -f GRCh37.p13.genome.fa test.markdup.bam
[mpileup] 1 samples in 1 input files
1 206647742 A 1 T F
1 206647743 G 6 .^!.^!.^!.^!.^!, FiEiiE
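One detail in the two outputs above is worth decoding. In the mpileup base column, `^` marks the start of a read, and the character right after it encodes that read's mapping quality as its ASCII code minus 33. The hg19 output ends with `^]`, while the grch37.p13 output shows `^!`. The following is a minimal Python sketch for extracting those values; the function name `read_start_mapqs` is made up for illustration and is not part of samtools.

```python
from typing import List


def read_start_mapqs(bases: str) -> List[int]:
    """Return the MAPQ encoded after each '^' read-start marker in a pileup base string."""
    mapqs = []
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^" and i + 1 < len(bases):
            # The character after '^' is the mapping quality, stored as ASCII - 33.
            mapqs.append(ord(bases[i + 1]) - 33)
            i += 2
        elif ch in "+-":
            # Indel notation: sign, length digits, then that many bases. Skip it all.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j]) if j > i + 1 else 0
            i = j + length
        else:
            i += 1
    return mapqs


# Base column of the grch37.p13 line for position 206647743 shown above.
print(read_start_mapqs(".^!.^!.^!.^!.^!,"))  # [0, 0, 0, 0, 0]
# Tail of the hg19 base column for the same position.
print(read_start_mapqs("^].^].^],"))         # [60, 60, 60]
```

A MAPQ of 0 generally means the aligner considered another placement of the read equally good. One possible explanation, which is an assumption to verify and not a confirmed cause, is that the GENCODE GRCh37.p13 FASTA contains additional sequences (for example patches or haplotypes) that duplicate this region.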

Then I extracted 500 bp on either side of the position from both genomes and aligned the two sequences:

samtools faidx hg19.fa chr1:206647242-206648242

chr1:206647242-206648242
tgcagtgagctgagatcttgacactgcactccagcctgggtgacagagcgaggctccgtc
tcaaaaaaaaaaaaaaaaaaaaaaaagaaTTGGAGCCATACAGACCAGGTTCCAATCCCT
TCCCTGCTGCTAACCCCAGGGAGTGTTAGCTGCCCTGTGATGATTGTCAATAGCAATTGT
AATAATGACAACAAGCCATCCCCTGCAGAAGATCAGAGTGTCAGGATCTTGTCACCTCCC
AGTGCTGGACTCTCTACCCCTTGAGAGGGAAAGGCGGTGCGGATGGGAGCCCCCATCCAA
CCAGGCTAATCTCTGGGGTTGGGCTGGCCGGAGAGGCTGAATGGAGGCCCAGGAGAGGGT
GGCTGCTCCCCTGTGGGAGTGGGACATGTGCTAATCCCATGCTGTCTCCCACTGCTCCCT
CCCCAATGGCAGAAATCCGGAGAGCTGGTTGCTGTGAAGGTCTTCAACACTACCAGCTAC
CTGCGGCCCCGCGAGGTGCAAGTGAGGGAGTTTGAGGTCCTGCGGAAGCTGAACCACCAG
AACATTGTCAAGCTCTTTGCGGTGGAGGAGACGGTAGGTCCGGTGCTTGGTCAGAGAATG
GTCTTGTCCTTGACCCTTATGGTCTGGGGAGAATCAGGCCACATGATAACAGAGATTTGG
TCCCATGCTCATCAGCAGGTCAGAGACAGCAGGCAAATTGCAGAAGGGAGCAAAGGGGGC
AAGGGGGTGGGGGCGGTGCACTGGAAAGGAACGATGGACAGAATCAGTACCTAAGCAGAG
GGCTTCCTGGAATAACTGACTTTGGATTCCAGTGTGCGGGATCAGTGTGAGGCCAAGGAG
GGAAGGCCAGGCCAGAAGCTGGGACCTGGAGAATGGGGGCTCTGGGCTCCAGGCTGAGCC
ACTTCTTCCTGGTGGGTGGGGAGGAGAAGTGCCGTCCTCATGAGCCCCTCTCTGTCCCAC
CCATAGGGCGGAAGCCGGCAGAAGGTACTGGTGATGGAGTA

samtools faidx GRCh37.p13.genome.fa 1:206647242-206648242

1:206647242-206648242
TGCAGTGAGCTGAGATCTTGACACTGCACTCCAGCCTGGGTGACAGAGCGAGGCTCCGTC
TCAAAAAAAAAAAAAAAAAAAAAAAAGAATTGGAGCCATACAGACCAGGTTCCAATCCCT
TCCCTGCTGCTAACCCCAGGGAGTGTTAGCTGCCCTGTGATGATTGTCAATAGCAATTGT
AATAATGACAACAAGCCATCCCCTGCAGAAGATCAGAGTGTCAGGATCTTGTCACCTCCC
AGTGCTGGACTCTCTACCCCTTGAGAGGGAAAGGCGGTGCGGATGGGAGCCCCCATCCAA
CCAGGCTAATCTCTGGGGTTGGGCTGGCCGGAGAGGCTGAATGGAGGCCCAGGAGAGGGT
GGCTGCTCCCCTGTGGGAGTGGGACATGTGCTAATCCCATGCTGTCTCCCACTGCTCCCT
CCCCAATGGCAGAAATCCGGAGAGCTGGTTGCTGTGAAGGTCTTCAACACTACCAGCTAC
CTGCGGCCCCGCGAGGTGCAAGTGAGGGAGTTTGAGGTCCTGCGGAAGCTGAACCACCAG
AACATTGTCAAGCTCTTTGCGGTGGAGGAGACGGTAGGTCCGGTGCTTGGTCAGAGAATG
GTCTTGTCCTTGACCCTTATGGTCTGGGGAGAATCAGGCCACATGATAACAGAGATTTGG
TCCCATGCTCATCAGCAGGTCAGAGACAGCAGGCAAATTGCAGAAGGGAGCAAAGGGGGC
AAGGGGGTGGGGGCGGTGCACTGGAAAGGAACGATGGACAGAATCAGTACCTAAGCAGAG
GGCTTCCTGGAATAACTGACTTTGGATTCCAGTGTGCGGGATCAGTGTGAGGCCAAGGAG
GGAAGGCCAGGCCAGAAGCTGGGACCTGGAGAATGGGGGCTCTGGGCTCCAGGCTGAGCC
ACTTCTTCCTGGTGGGTGGGGAGGAGAAGTGCCGTCCTCATGAGCCCCTCTCTGTCCCAC
CCATAGGGCGGAAGCCGGCAGAAGGTACTGGTGATGGAGTA

########################################
Program: needle
Rundate: Thu 28 Dec 2023 01:45:15
Commandline: needle
    -auto
    -stdout
    -asequence emboss_needle-I20231228-014512-0468-51177144-p1m.asequence
    -bsequence emboss_needle-I20231228-014512-0468-51177144-p1m.bsequence
    -datafile EDNAFULL
    -gapopen 10.0
    -gapextend 0.5
    -endopen 10.0
    -endextend 0.5
    -aformat3 pair
    -snucleotide1
    -snucleotide2
Align_format: pair
Report_file: stdout
########################################

#=======================================
Aligned_sequences: 2
1: 206647242-206648242
2: 206647242-206648242
Matrix: EDNAFULL
Gap_penalty: 10.0
Extend_penalty: 0.5
Length: 1001
Identity: 1001/1001 (100.0%)
Similarity: 1001/1001 (100.0%)
Gaps: 0/1001 ( 0.0%)
Score: 5005.0

The two sequences are exactly the same, so I don't understand why the mapping results differ so much.
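The needle result can be cross-checked locally without an aligner. Since hg19.fa soft-masks repeats in lowercase while the GRCh37.p13 file is uppercase, the comparison has to ignore case. Below is a minimal Python sketch under that assumption; `normalize` and `first_difference` are illustrative names, not functions from any library.

```python
from typing import Optional


def normalize(seq: str) -> str:
    """Remove line breaks and whitespace, and fold soft-masked lowercase to uppercase."""
    return "".join(seq.split()).upper()


def first_difference(seq_a: str, seq_b: str) -> Optional[int]:
    """Return the 0-based index of the first differing base, or None if identical."""
    a, b = normalize(seq_a), normalize(seq_b)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One sequence is a prefix of the other; they diverge where the shorter ends.
        return min(len(a), len(b))
    return None


# First FASTA line of each extract shown above.
hg19_part = "tgcagtgagctgagatcttgacactgcactccagcctgggtgacagagcgaggctccgtc"
grch37_part = "TGCAGTGAGCTGAGATCTTGACACTGCACTCCAGCCTGGGTGACAGAGCGAGGCTCCGTC"
print(first_difference(hg19_part, grch37_part))
```

Identical sequence at this locus only shows that the chromosome 1 records agree here. It says nothing about what else is in each FASTA file that reads could also align to.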
For both genomes, the SNV/InDel calls are the same in most regions, but they differ in some. I don't know how to resolve this. grch37 and hg19 are identical in most regions, yet the alignments differ this much even in a region where the two sequences are exactly the same, as shown above.
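One check that may help narrow this down is to compare which sequences each FASTA file contains, because reads can only be placed on sequences present in the index. `samtools faidx` writes a `.fai` file whose first tab-separated column is the sequence name. The sketch below is a hedged illustration in Python: the helper names are invented, the toy index text and the name `PATCH_X` are made up, and only the `chr` prefix is normalized (other naming differences, such as `chrM` versus `MT`, are not handled).

```python
from typing import List, Set


def contig_names(fai_text: str) -> Set[str]:
    """Collect sequence names (first tab-separated column) from .fai index text."""
    names = set()
    for line in fai_text.splitlines():
        if line.strip():
            names.add(line.split("\t")[0])
    return names


def strip_chr(name: str) -> str:
    """Drop a leading 'chr' so that 'chr1' and '1' compare as equal."""
    return name[3:] if name.startswith("chr") else name


def only_in_first(fai_a: str, fai_b: str) -> List[str]:
    """Names present in the first index but absent from the second, after normalization."""
    b_names = {strip_chr(n) for n in contig_names(fai_b)}
    return sorted(n for n in contig_names(fai_a) if strip_chr(n) not in b_names)


# Toy index text; offsets and the PATCH_X entry are hypothetical.
grch37_fai = "1\t249250621\t3\t60\t61\nPATCH_X\t1000\t253404800\t60\t61\n"
hg19_fai = "chr1\t249250621\t6\t50\t51\n"
print(only_in_first(grch37_fai, hg19_fai))  # ['PATCH_X']
```

With real data, the two strings would be read from the `.fai` files that sit next to `GRCh37.p13.genome.fa` and `hg19.fa`. Any names reported only for one file are candidates for where the missing reads went.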
Looking forward to your reply! Thank you!
