Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 5724: LTS: Expose new utility methods for implementing admin tools. #6725

Open
wants to merge 54 commits into
base: master
Choose a base branch
from

Conversation

AJadhav29
Copy link
Contributor

@AJadhav29 AJadhav29 commented May 4, 2022

Change log description
LTS - Expose utility methods for admin command.
Expose following utility methods for implementing admin commands.
-Evict metadata cache
-Evict read index cache
-Evict read index cache for segment
-Check chunk segment storage sanity

Implement the following commands for respective utility methods listed above:

  • segmentstore evict-meta-data-cache [container-id] [segmentStore-endpoint]
  • segmentstore evict-read-index-cache [container-id] [segmentStore-endpoint] [fully-qualified-segment-name]
  • segmentstore check-chunk-sanity [container-id] [qualified-chunk-name] [data-size] [segmentStore-endpoint]

Note: [fully-qualified-segment-name] is an optional parameter EvictReadIndexCacheCommand

Purpose of the change
Fixes #6724

What the code does
CLI changes:
Added following commands which makes the call to the segment store using the AdminSegmentHelper :

  • EvictMetaDataCacheCommand
  • EvictReadIndexCacheCommand
  • CheckChunkSanityCommand

How to verify it
All tests must pass.

AJadhav29 and others added 22 commits March 30, 2022 11:16
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
@AJadhav29 AJadhav29 marked this pull request as ready for review May 6, 2022 19:08
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
AJadhav29 and others added 5 commits December 15, 2022 13:21
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
@sachin-j-joshi
Copy link
Contributor

@AJadhav29 There are several non-code files like jpeg and css that somehow got checked in.
Please fix it.

Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
AJadhav29 and others added 3 commits January 10, 2023 00:53
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
return new CommandDescriptor(COMPONENT, "check-chunk-sanity", "Check sanity of the given chunk with range of operations performed on it.",
new ArgDescriptor("segmentstore-endpoint", "Address of the Segment Store we want to send this request."),
new ArgDescriptor("container-id", "The container Id of the Segment Container for which sanity operation is performed."),
new ArgDescriptor("qualified-chunk-name", "Fully qualified name of the Chunk to perform sanity operations like (e.g., create, check if exists, write, read, delete)."),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you "delete" a chunk as part of the sanity test? Isn't this dangerous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, we can't as it would lead to failing in creating a new chuck. But, after the creation fails, we do delete it irrespective of anything. Also, we randomly generate a unique name for chunk each time it is created which makes the operation much easier.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class EvictMetaDataCacheCommand extends StorageCommand {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need to evict SLTS metadata? What problem that we have observed can this command help us to fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the storage metadata is modified using admin tool, it still won't reflect in a running container if the metadata is cached by SLTS. These two commands provide way to drain/empty the cache and therefore force SLTS to reload metadata from table store/LTS. These commands are handy for evicting SLTS metadata cache whenever we want.

AJadhav29 and others added 3 commits January 11, 2023 15:44
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Copy link
Contributor

@sachin-j-joshi sachin-j-joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs minor fixes. But otherwise looking okay.

Signed-off-by: Aditi Jadhav <Aditi.Jadhav@dell.com>
Copy link
Contributor

@sachin-j-joshi sachin-j-joshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Please address remaining comments from others.

Copy link
Contributor

@RaulGracia RaulGracia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have 3 main concerns about the PR as it is right now:

  • This PR is adding a new API to the Segment Container called SegmentAdminApi which is intended to encapsulate the administration operations against the container. While this may be good, I remember that there was another attempt to do the same, and in this case the interface used was StreamSegmentStore. In that case, the flushToStorage() operation was added to the Segment Container as part of the Admin CLI process to recover a Segment Container from LTS. I would prefer us to think about a way to cleanly encapsulate the existing and future administration commands in the Segment Store in a single location, as otherwise the code will get even more complex.
  • I think I understand the usefulness of the checkChunkSanity() method, as it is mainly to show that from the Segment Store point of view, we can operate against LTS normally. I also wonder if this could be implemented as part of the health check mechanism, which would let us know at any moment whether LTS is working fine or not. What would be the advantages of having this CLI command to check LTS health status compared to integrating it as part of the Segment Store health check?
  • I don't understand the purpose of the other functions: evictStorageMetaDataCache(), evictStorageReadIndexCache(), and evictStorageReadIndexCacheForSegment(). It is not that we cannot add them. Is that I have not yet seen the justification of why we need these. I have been involved in some repair work of Pravega clusters and I have not yet seen any scenario in which we missed these commands to recover the Pravega cluster. Some examples or arguments supporting why we need these would be great to have for us to understand the motivation of these new functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LTS: Expose new utility methods for implementing admin tools.
5 participants