Merges sometimes do lots of work even after being aborted #13354

Open
DaveCTurner opened this issue May 9, 2024 · 0 comments

DaveCTurner commented May 9, 2024

Description

We see some Lucene indices taking many seconds (occasionally minutes) to abort merges during rollback. During that time they do a lot of now-pointless IO, with the merge thread spending all its time inside one of the various checkIntegrity methods, which read a file from beginning to end. For instance:

    ⋮
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:46)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.DataInput.readBytes(DataInput.java:73)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.skipByReading(ChecksumIndexInput.java:79)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.seek(ChecksumIndexInput.java:64)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:619)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.checkIntegrity(Lucene90CompressingStoredFieldsReader.java:725)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.merge(Lucene90CompressingStoredFieldsWriter.java:609)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:234)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger$$Lambda/0x00000080029f0c00.merge(Unknown Source)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:273)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:110)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:118)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)

Here is another example. Although this one uses an org.elasticsearch.index.codec.postings.ES812PostingsReader, it doesn't appear to be doing anything different from, e.g., org.apache.lucene.codecs.lucene99.Lucene99PostingsReader#checkIntegrity:

    ⋮
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:46)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.DataInput.readBytes(DataInput.java:73)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.skipByReading(ChecksumIndexInput.java:79)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.store.ChecksumIndexInput.seek(ChecksumIndexInput.java:64)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:619)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.codec.postings.ES812PostingsReader.checkIntegrity(ES812PostingsReader.java:1975)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.checkIntegrity(Lucene90BlockTreeTermsReader.java:338)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.checkIntegrity(PerFieldPostingsFormat.java:370)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.perfield.PerFieldMergeState$FilterFieldsProducer.checkIntegrity(PerFieldMergeState.java:296)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:83)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:205)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:209)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger$$Lambda/0x0000007802b71ab8.merge(Unknown Source)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
    app/org.elasticsearch.server@8.15.0/org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:118)
    app/org.apache.lucene.core@9.10.0/org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
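
Both traces reach CodecUtil.checksumEntireFile, whose move to the footer goes through ChecksumIndexInput.seek and then skipByReading. Because a checksumming input has to feed every byte into the checksum, a forward "seek" cannot actually skip anything; it has to read. The following is a simplified, self-contained model of that behaviour using java.io and java.util.zip.CRC32. ChecksummingInput and seekToFooter are hypothetical names and this is not Lucene's implementation; the footer length is just a parameter here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

/**
 * Hypothetical, simplified model of a checksumming input (not Lucene's code).
 * Every byte that passes through is fed to a CRC32, so moving forward while
 * keeping the checksum valid means reading all the bytes in between.
 */
final class ChecksummingInput {
    private final InputStream in;
    private final CRC32 crc = new CRC32();
    private long position;

    ChecksummingInput(InputStream in) {
        this.in = in;
    }

    /** Forward-only seek implemented by reading and discarding, in the spirit of skipByReading. */
    void seek(long target) throws IOException {
        if (target < position) {
            throw new IllegalArgumentException("cannot seek backwards on a checksumming input");
        }
        byte[] skipBuffer = new byte[1024];
        while (position < target) {
            int len = (int) Math.min(skipBuffer.length, target - position);
            int n = in.read(skipBuffer, 0, len);
            if (n < 0) {
                throw new IOException("unexpected EOF at position " + position);
            }
            crc.update(skipBuffer, 0, n); // every "skipped" byte is still read and checksummed
            position += n;
        }
    }

    long position() {
        return position;
    }

    long checksum() {
        return crc.getValue();
    }

    /** Models the whole-file check: "seeking" to the footer reads everything before it. */
    static ChecksummingInput seekToFooter(byte[] file, int footerLength) throws IOException {
        ChecksummingInput input = new ChecksummingInput(new ByteArrayInputStream(file));
        input.seek(file.length - footerLength);
        return input;
    }
}
```

Nothing in this loop consults any merge state, which matches what the traces suggest: once the seek starts, it runs to the end of the file however slow the underlying storage is.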

The data in these cases is in rather cold storage, so we would expect this end-to-end read to take quite some time (possibly minutes) to complete. That's OK: we don't need such merges to complete especially quickly. It is troublesome, though, that the merge takes so long to react to the abort signal in these situations. Is there something we can do to abort this read more promptly? For instance, could we add an abort-sensitive wrapper to the DataInput that's reading the data?
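
One possible shape for the abort-sensitive wrapper suggested above, sketched with java.io rather than Lucene's DataInput/IndexInput so that it is self-contained: a FilterInputStream that consults an abort flag before each read and throws once the flag is set. AbortableInputStream, ReadAbortedException, AbortableScanDemo and the BooleanSupplier flag are all hypothetical stand-ins (in Lucene the flag would presumably come from the merge's own abort state), and the sketch does not attempt to show how such a wrapper would be installed around the inputs that checkIntegrity reads.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical exception signalling that a read was abandoned because the merge was aborted. */
final class ReadAbortedException extends IOException {
    ReadAbortedException(String message) {
        super(message);
    }
}

/** Hypothetical wrapper: checks the abort flag before every read, so a long scan stops early. */
final class AbortableInputStream extends FilterInputStream {
    private final BooleanSupplier aborted;

    AbortableInputStream(InputStream in, BooleanSupplier aborted) {
        super(in);
        this.aborted = aborted;
    }

    private void checkAborted() throws ReadAbortedException {
        if (aborted.getAsBoolean()) {
            throw new ReadAbortedException("merge aborted; abandoning read");
        }
    }

    @Override
    public int read() throws IOException {
        checkAborted();
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        checkAborted(); // FilterInputStream.read(byte[]) also routes through here
        return super.read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        checkAborted();
        return super.skip(n);
    }
}

final class AbortableScanDemo {
    /** Scans data in 1 KiB chunks, raising the abort flag once abortAfter bytes are read; returns bytes read. */
    static long scan(byte[] data, long abortAfter) throws IOException {
        AtomicBoolean aborted = new AtomicBoolean(false);
        long total = 0;
        byte[] buf = new byte[1024];
        try (InputStream in = new AbortableInputStream(new ByteArrayInputStream(data), aborted::get)) {
            int n;
            while ((n = in.read(buf)) > 0) {
                total += n;
                if (total >= abortAfter) {
                    aborted.set(true); // stands in for a rollback arriving from another thread
                }
            }
        } catch (ReadAbortedException e) {
            // the scan stopped early instead of reading to the end
        }
        return total;
    }
}
```

Because the check runs once per read call, an abort is noticed within one buffer's worth of IO (1 KiB in this demo) rather than after the whole file has been read, at the cost of one flag check per read.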

Version and environment details

Lucene 9.10 embedded in Elasticsearch (here the main branch, currently targeting 8.15.0-SNAPSHOT), but this behaviour does not seem to be new.
