Hi, I want to report a potentially problematic behaviour when using pysam.fetch on AWS S3 bucket infrastructure. Running the following pseudocode on a BAM file in an S3 bucket creates requests without a defined end range.
Code
with pysam.AlignmentFile(bamfile_S3, filepath_index=baifile_S3) as f:
    for r in f.fetch(chrom, start, end):
        pass  # process each read here
Request
This kind of 'open' request results in high egress costs because AWS logs the whole file after the start byte as delivered, even if you stop reading the data at the end of your fetch coordinates.
For comparison, requests from IGV on S3 data incur low egress costs because only the exact byte range is logged.
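To make the difference concrete, here is a minimal sketch of the two kinds of HTTP Range header involved. It is generic HTTP and does not reproduce the exact requests that pysam or IGV send; the helper names `range_header` and `build_request` and the example URL are hypothetical.

```python
import urllib.request


def range_header(start, end=None):
    """Return an HTTP Range header value (byte offsets, end inclusive)."""
    if end is None:
        # Open-ended: asks for everything from `start` to the end of the file.
        return f"bytes={start}-"
    # Bounded: asks only for bytes start..end.
    return f"bytes={start}-{end}"


def build_request(url, start, end=None):
    """Build (but do not send) a GET request carrying a Range header."""
    req = urllib.request.Request(url)
    req.add_header("Range", range_header(start, end))
    return req


# Hypothetical URL, for illustration only; nothing is sent over the network.
open_req = build_request("https://example.com/sample.bam", 4096)
bounded_req = build_request("https://example.com/sample.bam", 4096, 8191)
print(open_req.get_header("Range"))     # bytes=4096-
print(bounded_req.get_header("Range"))  # bytes=4096-8191
```

With the open-ended form the server has no way of knowing how much the client intends to read, which matches the logging behaviour described above.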
That looks unfortunate. We'll investigate and see if we can make these requests less open-ended. It may need a bit of rework to how our HTTP requests work though, so I can't be sure how long it will take.
Thx, initially I used pysam.fetch, which created the problematic open-ended requests.
After some debugging I switched to pysam.view (more or less a wrapper around the samtools view cmd), which created clean range requests.
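A minimal sketch of what such a pysam.view call could look like. This is not the code actually used in the comment above. It assumes the samtools bundled with pysam supports the `-X` option for passing the index file explicitly, and the helper names `build_view_args` and `view_region` are hypothetical.

```python
def build_view_args(bam_url, bai_url, chrom, start, end):
    """Build samtools-view style arguments for a single region.

    start/end are 0-based, half-open (as in AlignmentFile.fetch). samtools
    region strings are 1-based and inclusive, hence the +1 on start.
    """
    region = f"{chrom}:{start + 1}-{end}"
    # Assumption: -X is available, so the index follows the alignment file.
    return ["-X", bam_url, bai_url, region]


def view_region(bam_url, bai_url, chrom, start, end):
    """Return the SAM records overlapping the region as a list of text lines."""
    import pysam  # imported lazily so build_view_args has no dependencies

    sam_text = pysam.view(*build_view_args(bam_url, bai_url, chrom, start, end))
    return sam_text.splitlines()
```

Usage would be along the lines of `view_region(bamfile_S3, baifile_S3, chrom, start, end)`. Note that the result is SAM text, not the AlignedSegment objects that fetch yields.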
Hi @StephanHolgerD, did pysam.view end up producing clean range requests? I wonder if you would be able to share some code showing how you implemented it, as I am looking for similar functionality and haven't found an easy solution!
Initially I reported this here:
pysam-developers/pysam#1215