Feature: support sampling user reads and recording their details in a detail log file #1853

Open
ninsmiracle opened this issue Jan 15, 2024 · 0 comments
Labels
type/enhancement Indicates new feature requests

Comments

@ninsmiracle
Contributor

Feature Request

Is your feature request related to a problem? Please describe:
When operating and maintaining Pegasus, we often get feedback from users who need to know which specific hash_key and sort_key were read, and which client read them.
Recording this for every read would undoubtedly incur a significant computational cost. Because the relevant logic has to be added to the main flow of the read operation, we have to handle it with caution. We should be able to configure the sampling rate dynamically, for example 1/10000, which means that on average only one out of every 10000 reads is recorded in the detail log.
This feature will help users better understand which of their data is actually read and which is redundant. They may no longer need to write unnecessary data, which will reduce the online write traffic and the storage capacity used in Pegasus.
In addition, we can also configure a size threshold. When the size of a key or value is greater than this threshold, the key-value pair will be recorded. This lets us notify users so that they can improve their data and achieve better read performance.
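
A minimal sketch of how the sampling rate and the size threshold might combine into a single per-read decision. It is written in Python for brevity and is not Pegasus code: `should_record`, `sampling_rate`, and `size_threshold` are hypothetical names, and treating oversized entries as always recorded (bypassing the sampling rate) is an assumption about the intended behavior.

```python
import random


def should_record(key_size, value_size, sampling_rate, size_threshold, rng=random):
    """Decide whether one read should be written to the detail log.

    All names here are illustrative, not existing Pegasus options.
    """
    # Assumption: a key or value larger than the threshold is always recorded,
    # independent of sampling. A threshold of 0 disables this check.
    if size_threshold > 0 and (key_size > size_threshold or value_size > size_threshold):
        return True
    # Otherwise record with probability `sampling_rate`, e.g. 1 / 10000.
    return rng.random() < sampling_rate
```

The check is a single comparison plus one random draw per read, which keeps the extra cost on the read path small when the feature is enabled.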

Describe the feature you'd like:

In my opinion, there are five parameters that we could make configurable:

  • Log path
    The path of the detail log file can be configured by the user. These logs are kept separate from the main log path currently used by Pegasus.

  • Sampling function switch
    When we don't need to use this feature, it should be possible to dynamically turn it off.

  • Sampling Rate
    Set a certain sampling rate. Each time a get or multi_get operation is performed, there is a certain probability of being recorded, instead of logging every time.

  • Filter size
    A built-in Bloom filter is included, mainly to reduce the size of the generated detail logs. The size of the filter can be configured to prevent excessive memory usage.

  • Sampling status check time
    The interval at which we periodically check whether the sampling status has changed.
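
To make the five parameters concrete, here is a rough Python sketch of a config object together with a small Bloom filter. Nothing in it is existing Pegasus code: every field name and default value is hypothetical, and using the filter to skip keys that have already been logged is only my assumption about how it would reduce the log size.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DetailLogConfig:
    """Hypothetical names and defaults for the five proposed parameters."""
    log_path: str = "/tmp/pegasus_detail.log"  # placeholder path
    enabled: bool = False                      # sampling function switch
    sampling_rate: float = 1 / 10000           # probability of recording a read
    filter_bits: int = 1 << 20                 # Bloom filter size in bits
    status_check_interval_s: int = 60          # how often to re-read the settings


class BloomFilter:
    """Fixed-size Bloom filter; memory use is bounded by num_bits / 8 bytes."""

    def __init__(self, num_bits, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add_if_absent(self, item):
        """Mark item as seen; return True only if it was definitely not seen before."""
        is_new = False
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        return is_new
```

A sampled read would then be logged only when `add_if_absent(hash_key + sort_key)` returns True. Because a Bloom filter can report false positives, a few distinct keys may be skipped, which seems acceptable for a sampled log.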

@ninsmiracle ninsmiracle added the type/enhancement Indicates new feature requests label Jan 15, 2024