Read Repair vs healing to satisfy ReplicaCount #55

Open · cobexer opened this issue Sep 26, 2020 · 3 comments
Assignees: buraksezer
Labels: enhancement (New feature or request)
cobexer commented Sep 26, 2020

During an upgrade of my application, which uses olric embedded, all replicas get restarted in quick succession.
Most keys will never be read, or will only be read very infrequently, which with read repair means that the cache is effectively "empty" immediately after an upgrade.

That behavior means that olric doesn't actually provide useful functionality for my use case, unless I add code that reads the entire keyspace after startup to repair the cache redundancy before letting Kubernetes know that the Pod has started successfully.
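
For illustration, here is a rough sketch of that workaround against the v0.3-era embedded API (olric.New / NewDMap / Get); the knownKeys parameter is a hypothetical, application-maintained key index, since this sketch deliberately does not assume any key-iteration API in olric itself, and the API may differ in other versions:

```go
package cachewarmup

import "github.com/buraksezer/olric"

// warmUp touches every key the application knows about so that a Get on
// each one triggers olric's read repair and restores the configured
// ReplicaCount. knownKeys is a hypothetical, application-maintained key
// index; olric is only asked to Get each key.
func warmUp(db *olric.Olric, dmapName string, knownKeys []string) error {
	dm, err := db.NewDMap(dmapName)
	if err != nil {
		return err
	}
	for _, key := range knownKeys {
		// The value is discarded; the Get itself is what repairs the key.
		// A missing key is not an error for warm-up purposes.
		if _, err := dm.Get(key); err != nil && err != olric.ErrKeyNotFound {
			return err
		}
	}
	return nil
}
```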

I believe that olric could do this internally more efficiently, and that such functionality would be generally useful:

From the documentation it seems that olric could "easily" know that a part of the keyspace doesn't satisfy the requested ReplicaCount, and actively transfer that data to a newly joined member to repair the cache when a node restarts.

So this is a request for:

  • when joining a cluster, ask it to transfer some data to the new node to satisfy ReplicaCount
  • provide an API to detect when this initial sync is finished, so that the embedding application can tell the Kubernetes API when it is safe to continue the rollout (see the sketch after this list)
  • detect node joins and departures quickly enough to keep such a rollout fast
  • a useful notion of node identity in a Kubernetes cluster, where IP addresses are basically useless
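
To make the second bullet concrete, here is a hypothetical readiness gate in the same package as the previous sketch; nothing below is an existing olric API. It simply wires the warm-up helper (or, ideally, a future built-in "initial sync finished" signal) into an HTTP endpoint that a Kubernetes readiness probe can poll:

```go
package cachewarmup

import (
	"log"
	"net/http"
	"sync/atomic"

	"github.com/buraksezer/olric"
)

// serveReadiness exposes /readyz for a Kubernetes readiness probe. The Pod
// only reports ready once warm-up has finished, so a rolling update waits
// until the cache redundancy has been restored.
func serveReadiness(db *olric.Olric, dmapName string, knownKeys []string) {
	var ready atomic.Bool

	go func() {
		// warmUp is the hypothetical helper sketched above; a built-in
		// "initial sync finished" signal from olric would replace it.
		if err := warmUp(db, dmapName, knownKeys); err != nil {
			log.Printf("cache warm-up failed: %v", err)
			return
		}
		ready.Store(true)
	}()

	http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
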
buraksezer self-assigned this Sep 26, 2020

wliuroku commented Dec 18, 2020

Having the same issue here. I have to do a full read repair of all keys to trigger data transfer when a new node joins. Do we know which version will address this issue?


hacktmz commented Aug 30, 2021

Having the same issue.

buraksezer (Owner) commented:

Hi all,

I'm aware that this is one of the most wanted features among users. I have started working on a solution based on a technique called vector clocks. It may be ready for initial tests in a couple of months, and I plan to make it production-ready by the end of this year.

For anyone who is curious about version vectors, here is some info:

  1. https://riak.com/posts/technical/vector-clocks-revisited/index.html?p=9545.html
  2. https://haslab.wordpress.com/2011/07/08/version-vectors-are-not-vector-clocks/
  3. https://en.wikipedia.org/wiki/Vector_clock
  4. https://github.com/hazelcast/hazelcast/blob/master/docs/design/partitioning/03-fine-grained-anti-entropy-mechanism.md
  5. https://people.cs.rutgers.edu/~pxk/417/notes/logical-clocks.html
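
For readers who want something concrete, here is a minimal version-vector sketch in Go illustrating only the increment/compare/merge idea from the links above; it is not olric code:

```go
package versionvector

// VersionVector maps a node ID to the number of updates that node has
// applied to a value. Comparing two vectors tells us whether one replica's
// copy strictly descends from the other or whether they conflict.
type VersionVector map[string]uint64

// Increment records a local update on the given node.
func (v VersionVector) Increment(node string) {
	v[node]++
}

// Descends reports whether v has seen every update that other has seen,
// i.e. v is equal to or newer than other.
func (v VersionVector) Descends(other VersionVector) bool {
	for node, count := range other {
		if v[node] < count {
			return false
		}
	}
	return true
}

// Merge takes the element-wise maximum, producing a vector that descends
// from both inputs. Used after conflicting copies have been reconciled.
func (v VersionVector) Merge(other VersionVector) {
	for node, count := range other {
		if count > v[node] {
			v[node] = count
		}
	}
}
```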

buraksezer added the enhancement (New feature or request) label Aug 30, 2021