Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporarily disable file ingestion and LockWAL combination #12642

Closed

Conversation

jowlyzhang
Copy link
Contributor

As titled. A proper fix should probably be failing file ingestion if the DB is in a lock wal state as it promises to "Freezes the logical state of the DB".

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235
Copy link
Contributor

hx235 commented May 10, 2024

Is this why sequence number can be advanced under LockWAL()?

@jowlyzhang
Copy link
Contributor Author

Is this why sequence number can be advanced under LockWAL()?

Yeah, that's my guess, we are discussing more context on the task T187711884.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@jowlyzhang jowlyzhang force-pushed the disable_file_ingestion_for_lockwal branch from aae90f1 to c0a5489 Compare May 11, 2024 00:11
@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pdillinger
Copy link
Contributor

Is this why sequence number can be advanced under LockWAL()?

Btw, the documented function of LockWAL is just my attempt at reverse engineering, because the original author did not document its purpose or function. It's possible that certain logical state changes should be allowed. 🤷‍♂️

@jowlyzhang
Copy link
Contributor Author

Btw, the documented function of LockWAL is just my attempt at reverse engineering, because the original author did not document its purpose or function. It's possible that certain logical state changes should be allowed. 🤷‍♂️

Right, I can sync with MyRocks to see if this is a legitimate combination and decide if we should test LockWAL with more restrictive ways in stress test or work on a fix that disallow this combination.

@@ -858,6 +858,9 @@ def finalize_and_sanitize(src_params):
elif (dest_params.get("use_put_entity_one_in") > 1 and
dest_params.get("use_timed_put_one_in") == 1):
dest_params["use_timed_put_one_in"] = 3
# TODO: renable this when file ingestion is failed during wal lock.
if dest_params.get("lock_wal_one_in") != 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most configurations, lock_wal_one_in is always != 0. I'm not sure there are any configurations remaining with a chance of testing ingestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! I updated this to randomly disable one of the two features so that they do not run together, but can both get tested.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nice solution, thanks. I'll stamp the internal diff on re-import.

Unit test update I was working on is in #12652. On pause while we're making sure we understand the requirements

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jowlyzhang jowlyzhang changed the title Temporarily disable file ingestion if LockWAL is tested Temporarily disable file ingestion and LockWAL combination May 13, 2024
@facebook-github-bot
Copy link
Contributor

@jowlyzhang merged this pull request in 1567d50.

@jowlyzhang jowlyzhang deleted the disable_file_ingestion_for_lockwal branch May 15, 2024 17:47
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request May 20, 2024
Summary: We recently noticed that some memtable flushed and file
ingestions could proceed during LockWAL, in violation of its stated
contract. (Note: we aren't 100% sure its actually needed by MySQL, but
we want it to be in a clean state nonetheless.)

Despite earlier skepticism that this could be done safely (facebook#12666), I
found a place to wait to wait for LockWAL to be cleared before allowing
these operations to proceed: WaitForPendingWrites()

Test Plan: Added to unit tests. Extended how db_stress validates LockWAL
and re-enabled combination of ingestion and LockWAL in crash test, in
follow-up to facebook#12642

Ran blackbox_crash_test for a long while with relevant features
amplified.

Suggested follow-up: fix FaultInjectionTestFS to report file sizes
consistent with what the user has requested to be flushed.
facebook-github-bot pushed a commit that referenced this pull request May 21, 2024
Summary:
We recently noticed that some memtable flushed and file
ingestions could proceed during LockWAL, in violation of its stated
contract. (Note: we aren't 100% sure its actually needed by MySQL, but
we want it to be in a clean state nonetheless.)

Despite earlier skepticism that this could be done safely (#12666), I
found a place to wait to wait for LockWAL to be cleared before allowing
these operations to proceed: WaitForPendingWrites()

Pull Request resolved: #12652

Test Plan:
Added to unit tests. Extended how db_stress validates LockWAL
and re-enabled combination of ingestion and LockWAL in crash test, in
follow-up to #12642

Ran blackbox_crash_test for a long while with relevant features
amplified.

Suggested follow-up: fix FaultInjectionTestFS to report file sizes
consistent with what the user has requested to be flushed.

Reviewed By: jowlyzhang

Differential Revision: D57622142

Pulled By: pdillinger

fbshipit-source-id: aef265fce69465618974b4ec47f4636257c676ce
konstantinvilin pushed a commit to konstantinvilin/rocksdb that referenced this pull request May 22, 2024
…12652)

Summary:
We recently noticed that some memtable flushed and file
ingestions could proceed during LockWAL, in violation of its stated
contract. (Note: we aren't 100% sure its actually needed by MySQL, but
we want it to be in a clean state nonetheless.)

Despite earlier skepticism that this could be done safely (facebook#12666), I
found a place to wait to wait for LockWAL to be cleared before allowing
these operations to proceed: WaitForPendingWrites()

Pull Request resolved: facebook#12652

Test Plan:
Added to unit tests. Extended how db_stress validates LockWAL
and re-enabled combination of ingestion and LockWAL in crash test, in
follow-up to facebook#12642

Ran blackbox_crash_test for a long while with relevant features
amplified.

Suggested follow-up: fix FaultInjectionTestFS to report file sizes
consistent with what the user has requested to be flushed.

Reviewed By: jowlyzhang

Differential Revision: D57622142

Pulled By: pdillinger

fbshipit-source-id: aef265fce69465618974b4ec47f4636257c676ce
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants