[SUPPORT] Archive operation always release lock on the timeline when try lock failed #11104

Open
Ytimetravel opened this issue Apr 26, 2024 · 9 comments

Comments

@Ytimetravel
Contributor

Dear community,
I have discovered an issue when using Hudi. If multiple archive tasks run on a COW table with "hoodie.archive.automatic=false", it may cause data problems. With hoodie.archive.automatic=true, the issue does not occur.
I then found that when the archive operation fails to acquire the lock, it still always releases the lock (if one exists).
image
image
image
I suspect that this lock release operation may have affected other normal operations.
Perhaps the problem could be avoided by doing it this way?
image
Looking forward to your valuable suggestions.

Hudi version: 0.14.0

@danny0405
Contributor

Archive itself holds a transaction lock, so we need to release it in any case. What is the failure case you have encountered?

@Ytimetravel
Contributor Author

@danny0405 Sorry, I don't remember the details of the problem (I will confirm with my colleagues and report back later). But based on the logic here, wouldn't it be better to attempt to release the lock only when it was actually acquired? Otherwise, is there a chance of mistakenly releasing the lock of other operations and causing problems?

@danny0405
Contributor

> otherwise is there a chance of mistakenly releasing the lock of other operations and causing problems?

That's reasonable, we should ensure the lock has been acquired in the first place.
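
The idea of releasing only after a successful acquire can be sketched as follows. This is a minimal illustration, not Hudi's code: `GuardedRelease`, `tableLock`, and `runIfLocked` are hypothetical names, and a binary `java.util.concurrent.Semaphore` stands in for a lock that performs no owner check on release.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for an archive-like operation guarded by a table lock.
final class GuardedRelease {
    // Binary semaphore: release() does not check who acquired the permit,
    // so releasing without having acquired would free someone else's lock.
    public final Semaphore tableLock = new Semaphore(1);

    /** Runs the action only if the lock was obtained; returns whether it ran. */
    public boolean runIfLocked(Runnable action) {
        boolean acquired = tableLock.tryAcquire();
        if (!acquired) {
            // Nothing to release: the lock belongs to another operation.
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            // Only reached when this call acquired the lock itself.
            tableLock.release();
        }
    }
}
```

If the release sat in a `finally` block that also covered the failed `tryAcquire()`, a failed attempt would hand the lock back on behalf of whoever holds it, which is the hazard raised above.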

@beyond1920
Contributor

@danny0405 @Ytimetravel The behavior of TransactionManager#endTransaction itself is correct: it checks whether the current lock is held by itself before unlocking. However, there is a bug in HoodieTimelineArchiver, because archiving itself is not a transaction and does not correspond to any instant in the timeline. When an exception occurs, it might mistakenly delete locks held by others.
image
@Ytimetravel Would you like to fix this issue?
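
For readers unfamiliar with owner-checked release, here is a minimal sketch of the general technique. It does not reproduce TransactionManager or HoodieTimelineArchiver: `OwnerCheckedLock` and the string tokens are hypothetical, and refusing a release from a caller with no identity is just one possible policy for an operation that has no instant of its own.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical lock that remembers who holds it.
final class OwnerCheckedLock {
    private String owner; // null while the lock is free

    /** Takes the lock for the given token if it is free. */
    synchronized boolean tryAcquire(String token) {
        if (owner != null) {
            return false;
        }
        owner = Objects.requireNonNull(token);
        return true;
    }

    /** Releases only when the caller's token matches the current holder. */
    synchronized boolean release(Optional<String> callerToken) {
        // A caller without an identity has nothing to compare against,
        // so in this sketch it can never release the lock.
        if (!callerToken.isPresent() || !callerToken.get().equals(owner)) {
            return false;
        }
        owner = null;
        return true;
    }

    /** Current holder, if any. */
    synchronized Optional<String> holder() {
        return Optional.ofNullable(owner);
    }
}
```

With this shape, a cleanup path that runs after a failed acquire cannot remove a lock held by another writer, because its token never matches the holder.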

@Ytimetravel
Contributor Author

@beyond1920 Yes, I am very willing to fix this issue.
image
This fix has already been tested and verified~

@danny0405
Contributor

What is the general reason that the transaction start of archival is failing?

@Ytimetravel
Contributor Author

@danny0405 Failed to acquire lock.

@Ytimetravel
Contributor Author

image

@danny0405
Contributor

Okay, it would be great if you could submit a fix for it.
