remote write 2.0 - benchmarking framework; integrated with prombench #13995
Hello @cstyan, my name is Olorunfemi Daramola and I'm a software engineer. I'm interested in this project; is it still open?
There's a bit of work ongoing already but nothing's finished yet. This is also open as a project for the upcoming LFX mentorship session.
Since it's on LFX, could you send the link so I can apply as a mentee?
This is the general LFX website: https://lfx.linuxfoundation.org/tools/mentorship/ IIRC applications for the summer session aren't open yet.
I wanted the exact link to apply for this project, but since you said applications aren't open yet, would you do me a favor and update this thread with the link when it is?
Hi @cstyan, my name is Avigyan Sinha. I am pretty interested in this project for LFX; could you recommend some resources to get started with this?
Hi @cstyan, my name is Emeka Uzowulu. I am interested in this task and would love to participate.
I have been thinking about this idea, specifically about how we can extend Avalanche, and had a few thoughts.
In the TSDB OOO feature talk given in 2022 there are a number of claims about how OOO samples affect performance: the percentage of active series receiving OOO samples, the cost falling more on CPU than on memory, and finally no noticeable difference in read/write speeds. I personally dislike it when a talk makes a number of claims and, a few years down the line, there is no available code to test those claims and it's very difficult to test them yourself. Maintaining an elaborate test suite for every claim is difficult, but a versatile benchmarking system with examples can be extremely useful to users so that they can test the claims themselves.
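As an illustration of the kind of reproducible test such a system could offer, here is a minimal Go sketch (all names here are hypothetical, not part of Avalanche or any existing tool) that generates a sample stream in which a configurable fraction of samples have timestamps shifted into the past, i.e. arrive out of order at a receiver:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// sample is a bare-bones (timestamp, value) pair; a real generator
// would attach labels and speak remote write, this only shows the
// timestamp manipulation.
type sample struct {
	ts    time.Time
	value float64
}

// genSamples emits n samples at the given interval, shifting the
// timestamp of roughly oooFraction of them back by up to maxDelay,
// so a receiver sees them as out-of-order.
func genSamples(n int, interval, maxDelay time.Duration, oooFraction float64) []sample {
	out := make([]sample, 0, n)
	now := time.Now()
	for i := 0; i < n; i++ {
		ts := now.Add(time.Duration(i) * interval)
		if rand.Float64() < oooFraction {
			// Push this sample's timestamp into the past.
			ts = ts.Add(-time.Duration(rand.Int63n(int64(maxDelay))))
		}
		out = append(out, sample{ts: ts, value: rand.Float64() * 100})
	}
	return out
}

func main() {
	// 30% of samples out of order by up to two minutes.
	for _, s := range genSamples(10, 15*time.Second, 2*time.Minute, 0.3) {
		fmt.Println(s.ts.Format(time.RFC3339), s.value)
	}
}
```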
Nice @Ellipse0934, feel free to help with Avalanche issues, and if you are consistent I am sure the maintainers would love to have you on officially too. Per 2: what's the idea exactly? Is it to convert Avalanche into a PRW-capable receiver? That could be done, I guess, but it could equally be a separate project/repo (e.g. something planned but never finished). Given the small amount of activity on Avalanche, I think it would make sense NOT to increase Avalanche's scope, but to create a dedicated small tool in the Prometheus ecosystem. We have something similar in the compliance tests - https://github.com/prometheus/compliance/tree/main/remote_write_sender - a dedicated, small receiver. What would be productive is perhaps an exact plan/design for the benchmark (the end result). Given that, it will be easier to decide where to put what functionality, perhaps even starting with something small/tailored in prombench.
I mean allowing more granular control over Avalanche. For example, right now although we can set the number of series, labels, etc., I feel it's still too coarse. I want to support a larger number of patterns so that performance effects can be measured more easily: if someone wants to specifically test Prometheus's OOO performance, that should be possible; if someone wants to test only a specific compression algorithm for their workload, it should be possible to set up such a test without too much effort.
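One way to picture this kind of granularity is a small per-pattern generator interface. A rough Go sketch, with made-up names (this is not Avalanche's actual API):

```go
package main

import (
	"fmt"
	"math/rand"
)

// ValueGen produces the next value for a series; each workload
// pattern (counter rates, bounded gauges, error bursts, ...) would
// be one implementation.
type ValueGen interface {
	Next() float64
}

// counterGen increases monotonically by a random step in [min, max),
// mimicking e.g. http_requests_total.
type counterGen struct {
	total, min, max float64
}

func (c *counterGen) Next() float64 {
	c.total += c.min + rand.Float64()*(c.max-c.min)
	return c.total
}

// gaugeGen random-walks while staying inside [lo, hi],
// mimicking e.g. cpu_temp.
type gaugeGen struct {
	cur, lo, hi float64
}

func (g *gaugeGen) Next() float64 {
	g.cur += rand.Float64()*6 - 2 // random drift step in [-2, 4)
	if g.cur < g.lo {
		g.cur = g.lo
	}
	if g.cur > g.hi {
		g.cur = g.hi
	}
	return g.cur
}

func main() {
	gens := []ValueGen{&counterGen{min: 30, max: 50}, &gaugeGen{cur: 65, lo: 0, hi: 110}}
	for i := 0; i < 5; i++ {
		for _, g := range gens {
			fmt.Printf("%.1f ", g.Next())
		}
		fmt.Println()
	}
}
```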
No, I was not thinking along those lines. For now I think that passing in flags/additional PRW config into
This is a good idea. I'll do a PoC and get back rather than trying to pull attention towards Avalanche.
Note that some work here will be done via the related LFX project; I believe LFX selection notifications go out next week, on the 12th PST.
An oversight on my part; I assumed that since I was in
Not sure what you mean here? It already runs forever (in essence): it will restart if there is an error or config reload, but otherwise keeps running until Prometheus itself shuts down. What are the proposed use cases for having Avalanche as a library?
My bad, I meant Avalanche's remote write mode. This was also pointed out in Avalanche#41.
I see. I had not envisioned using anything like a "remote write directly from avalanche" mode as a way of benchmark-testing remote write itself, just using avalanche as a way of generating various scrape loads' worth of data to stress different parts of remote write.
Okay, so you meant only adding the capability for Avalanche to generate more realistic data? Then it's scraped by one Prometheus and sent to another Prometheus as part of an e2e benchmark suite? Whether or not remote write is used directly from Avalanche, this capability (different data patterns) is important and should come first in any case. Currently what Avalanche generates looks non-representative of a realistic workload. test-infra/tools/fake-webserver looks better.
I still feel that benchmarks should be reproducible, and hence either the data or the generator scripts should be part of the toolkit. When I used the phrase "making avalanche more versatile/scalable" I was first thinking of writing a small DSL to generate data:

```yaml
capacity: 3000
max_shards: 10
name: 'test'
writers:
  - name: 'http_requests_total'
    type: counter
    value:
      rate:
        between: [30, 50]
  - name: 'cpu_temp'
    type: gauge
    value:
      bounds: [0, 110]
      rate: [-2, 4]
      avg: 65
  - name: 'DB_error'
    type: counter
    value:
      inc:
        expr: [t, 300, 'div', 0, 'max', 200, 'min'] # reverse polish notation, t = time
```

But then I felt it would still be hard to model many problems where a general-purpose programming language is much better. Hence the suggestion that Avalanche should be able to act like a library. But perhaps all of this is overkill and we just need to add a small set of writer patterns and config options to Avalanche to make the workload more realistic.
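As an aside, the reverse-Polish `expr` above evaluates to min(max(t/300, 0), 200). A minimal Go sketch of such an evaluator, purely illustrative (this is not an existing Avalanche or Prometheus feature):

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// evalRPN evaluates a reverse-Polish token list like
// ["t", "300", "div", "0", "max", "200", "min"], substituting the
// current time (in seconds) for the variable "t".
func evalRPN(tokens []string, t float64) (float64, error) {
	var stack []float64
	// pop2 removes and returns the top two operands.
	pop2 := func() (float64, float64) {
		a, b := stack[len(stack)-2], stack[len(stack)-1]
		stack = stack[:len(stack)-2]
		return a, b
	}
	for _, tok := range tokens {
		switch tok {
		case "t":
			stack = append(stack, t)
		case "div":
			a, b := pop2()
			stack = append(stack, a/b)
		case "max":
			a, b := pop2()
			stack = append(stack, math.Max(a, b))
		case "min":
			a, b := pop2()
			stack = append(stack, math.Min(a, b))
		default:
			v, err := strconv.ParseFloat(tok, 64)
			if err != nil {
				return 0, fmt.Errorf("bad token %q: %w", tok, err)
			}
			stack = append(stack, v)
		}
	}
	return stack[0], nil
}

func main() {
	v, _ := evalRPN([]string{"t", "300", "div", "0", "max", "200", "min"}, 90000)
	fmt.Println(v) // min(max(90000/300, 0), 200) = 200
}
```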
I'm not very familiar with the … I would prefer to have the tool we build/extend be as "intelligent" as possible in terms of its data generation. Obviously we will need a few more knobs to turn in terms of telling it what data to generate, but I don't want to have to write a whole config file in order to use it.
Proposal
We need a more formal and repeatable way of benchmarking changes within remote write. It makes sense to include this as a (non-blocking) task for the remote write 2.0 tracking issue.
We can extend the avalanche project plus build a /dev/null-esque sink that accepts remote write metrics, introduces latency, etc. These could be used within prombench to provide a way of benchmarking changes to remote write in a realistic environment: k8s, multiple pods, etc.
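For the sink half of the proposal, here is a minimal sketch of such a /dev/null-esque receiver in Go, using only the standard library (the flag names are made up for illustration, and a real implementation would also snappy-decode and unmarshal the protobuf payload to validate it rather than just discarding bytes):

```go
package main

import (
	"flag"
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	// Hypothetical knobs: where to listen and how much artificial
	// latency to inject per request.
	addr := flag.String("listen", ":9999", "listen address")
	latency := flag.Duration("latency", 0, "artificial latency added to every request")
	flag.Parse()

	http.HandleFunc("/api/v1/write", func(w http.ResponseWriter, r *http.Request) {
		// Drain and discard the remote write payload; we only care
		// about exercising the sender, not storing samples.
		n, err := io.Copy(io.Discard, r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Simulate a slow or overloaded receiver.
		time.Sleep(*latency)
		log.Printf("discarded %d bytes", n)
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.ListenAndServe(*addr, nil))
}
```

Pointing a sender's remote write URL at e.g. http://localhost:9999/api/v1/write would then let you measure sender-side behavior (sharding, retries, WAL growth) under controlled receiver latency.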