
Draft: chore: Add an end-to-end batching benchmark #20479

Open

wants to merge 2 commits into base: master
Conversation

@orf orf commented May 12, 2024

This is a bit of a weird PR, sorry. I have been playing a lot with the internals of Vector recently for a project of mine that integrates Vector as a Python library. Vector is a really nicely structured project, and after a few wrong turns I was able to integrate it in-process with Python as a library.

To explore how everything works (especially the service abstractions), I built the following dummy source and sink, and I figured I'd try to contribute it as an end-to-end Criterion benchmark for a couple of reasons:

  1. I found it quite hard to find completely minimal/bare-bones non-test sources and sinks, so perhaps this could serve as an example for contributors?
  2. An end-to-end benchmark involving both producing and consuming might be a useful thing to include?
    • I was curious about the potential overhead of having to repeatedly allocate `Vec`s to call `SourceSender::send_batch`, and I had no idea where to start profiling this (see the sketch below this list).
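To make the allocation question concrete, here is a minimal sketch of the kind of Criterion micro-benchmark one could start from. It is not the benchmark in this PR: the batch size, the plain `String` standing in for a Vector event, and the two allocation strategies compared are all assumptions of mine for illustration only.

```rust
// Sketch only: compares building a fresh Vec per "send" (mirroring a call
// like SourceSender::send_batch that consumes its input) against cloning a
// prebuilt batch. All names and sizes here are illustrative assumptions.
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};
use std::hint::black_box;

const BATCH_SIZE: usize = 1_000;

// Stand-in for constructing a Vector event.
fn make_event(i: usize) -> String {
    format!("event {i}")
}

fn bench_batch_allocation(c: &mut Criterion) {
    // Allocate a brand-new Vec for every iteration.
    c.bench_function("allocate_vec_per_batch", |b| {
        b.iter(|| {
            let batch: Vec<String> = (0..BATCH_SIZE).map(make_event).collect();
            black_box(batch);
        })
    });

    // Clone from a prebuilt template instead, with the setup cost excluded
    // from the measurement via iter_batched.
    c.bench_function("clone_prebuilt_batch", |b| {
        let template: Vec<String> = (0..BATCH_SIZE).map(make_event).collect();
        b.iter_batched(
            || template.clone(),
            |batch| black_box(batch),
            BatchSize::SmallInput,
        )
    });
}

criterion_group!(benches, bench_batch_allocation);
criterion_main!(benches);
```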

Feel free to close this, but I figured it might be useful to someone in the future and I didn't want it to go to waste.

@bits-bot bits-bot commented May 12, 2024

CLA assistant check
All committers have signed the CLA.

@jszwedko jszwedko (Member) left a comment

Hi @orf !

Thanks for opening this for discussion!

Once upon a time we had actually planned on, and started, using criterion for end-to-end benchmarks, but we ended up swapping it for the alternate approach first proposed in https://github.com/vectordotdev/vector/blob/1b57acd6bb806a09556a8d5cd3a64709c5f44354/rfcs/2021-02-23-6531-performance-testing.md, which relies on an external benchmark tool to run. This was primarily due to issues we had with eliminating noise, and because we wanted more control over how the benchmark analysis is performed.

That being said, I do see it being useful to have a dummy source and sink to serve as examples for contributors. I just don't know that the benchmarks will end up being used much.

I think I'd be ok with accepting this contribution if we add a comment to benches/integration/mod.rs explaining what it is and what it should be used for. I'll tag some other team members for review too.
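One possible shape for such a module-level comment is sketched below. This is purely illustrative; the final wording (and whether it lives in benches/integration/mod.rs at all) would be settled in review.

```rust
//! Illustrative draft of the requested module comment, not agreed wording.
//!
//! End-to-end Criterion benchmark built from a minimal dummy source and sink.
//! It is intended as a bare-bones example of wiring a source and sink
//! together for contributors exploring Vector's internals, not as a
//! replacement for the externally driven performance tests described in
//! rfcs/2021-02-23-6531-performance-testing.md.
```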
