
Higher Latency compared to Tokio #27

Open
rohitjoshi opened this issue Jan 17, 2018 · 11 comments
@rohitjoshi

Thank you for creating this high-performance library and making async easier. Reading your blog, it seems the average and peak latency are higher compared to the tokio-based implementation.

Is this due to higher throughput or to coroutine scheduling overhead?

e.g.

Tokio threaded:

Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    52.54us   56.39us  12.14ms   99.36%

vs

May coroutine:

Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.26ms    6.48ms 105.13ms   98.35%
@Xudong-Huang
Owner

Xudong-Huang commented Jan 18, 2018

Good question! It needs deeper investigation. I have a blog post that analyzes why the hyper-based port is slow. Those numbers come from these conditions: the tokio version uses the latest master branch of hyper, while the may version uses the old 0.10.x branch, which is not well optimized. So the comparison is not really fair.

I have a minihttp implementation that should be a fair comparison.

tokio_minihttp

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.98ms  284.06us  12.53ms   98.92%

vs
may_minihttp

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.70ms  812.42us  20.17ms   97.94%

may is faster, but with bigger jitter.

@rohitjoshi
Author

👍 Looks good. I am planning to convert my existing threaded application to be may coroutine-based. It is a near-real-time application with 99.9% of latencies below 1ms and a max latency of ~3ms. The application runs on a 16-core machine but actively uses only 9 threads (1 thread per TCP connection + 1 cache-updater thread).
The number of TCP connections is fixed at 8. Let me know if you have any suggestions.

@Xudong-Huang
Owner

Your application is quite special: there are enough resources for every thread, with only 8 connections running on 16 cores! You can try cloning each stream into two parts, one for reading and one for writing, so that reads and writes don't block each other, and spawn coroutines to handle each request. Don't forget to configure io_workers to a reasonable number so the CPU is fully utilized. A rough sketch of this setup follows.
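
A minimal sketch of that suggestion, assuming may's `TcpStream` exposes `try_clone()` like std's and that the config still exposes `set_io_workers` as it did at the time of this thread; the address, payload, and worker counts are placeholders:

```rust
#[macro_use]
extern crate may;

use std::io::{Read, Write};
use may::net::TcpStream;

fn main() {
    // Assumption: 16-core box with 8 connections; tune both pools to the hardware.
    may::config().set_workers(8).set_io_workers(8);

    let stream = TcpStream::connect("127.0.0.1:9000").unwrap();
    // Clone the stream so reads and writes don't block each other.
    let mut reader = stream.try_clone().unwrap();
    let mut writer = stream;

    // One coroutine drains responses from the connection...
    let reader_handle = go!(move || {
        let mut buf = [0u8; 4096];
        while let Ok(n) = reader.read(&mut buf) {
            if n == 0 {
                break; // peer closed the connection
            }
            // hand &buf[..n] off to application logic here
        }
    });

    // ...while this side keeps writing requests independently.
    writer.write_all(b"request payload\n").unwrap();

    let _ = reader_handle.join();
}
```

Each of the 8 connections would get its own reader/writer pair like this, and per-request handling can likewise be spawned with `go!`.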

@alkis
Contributor

alkis commented Jan 18, 2018

Is it possible to reduce jitter if we use a work-stealing queue per worker thread instead of a global mpmc queue for coroutines?

@Xudong-Huang
Owner

Xudong-Huang commented Jan 19, 2018

@alkis
I don't think so. IO-related tasks are not scheduled through the mpmc channel (which is used by the normal worker threads); they run directly on the io_workers threads. I think the jitter comes from the fact that a coroutine will not suspend if there is already data available on the IO. For reading, the coroutine only suspends when the non-blocking read returns no data, so a coroutine can spin on the CPU for a while until it has drained all of the available data, while other coroutines have to wait for the CPU to be released.
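
To picture the effect described above, here is a hedged sketch (not from the thread): a read loop only suspends once the non-blocking read returns no data, so one way to bound how long a busy coroutine holds a worker is to call `may::coroutine::yield_now()` between chunks. This is a hypothetical mitigation in user code, not something the scheduler does on its own; `handle_chunk` is a placeholder for application logic.

```rust
use std::io::{self, Read};
use may::coroutine;
use may::net::TcpStream;

// Drain a connection, yielding between chunks so other coroutines on the same
// worker thread get a chance to run even while data keeps arriving.
fn drain<F: FnMut(&[u8])>(mut stream: TcpStream, mut handle_chunk: F) -> io::Result<()> {
    let mut buf = [0u8; 4096];
    loop {
        let n = stream.read(&mut buf)?; // suspends only when no data is available
        if n == 0 {
            return Ok(()); // peer closed the connection
        }
        handle_chunk(&buf[..n]);
        // Voluntarily give up the worker before reading the next chunk.
        coroutine::yield_now();
    }
}
```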

@alkis
Contributor

alkis commented Jan 19, 2018

Is there some documentation on the tradeoffs and/or different designs for the scheduler? In particular, why do we need separate io_workers? How does it compare to N generic workers plus a single epoll_wait() worker whose only job is to handle the slow path of parking/unparking coroutines that wait for I/O?

@Xudong-Huang
Owner

Xudong-Huang commented Jan 19, 2018

Well, at the beginning I did use a unified thread to handle all of the events, but it was not as efficient as I expected. The sync-primitive events and timeout events do not go through the io thread; they are scheduled on the normal worker threads. Routing them through the epoll system would add another system call that is not necessary. And running a coroutine on a normal worker thread for an io event would add a noticeable delay between the io event being generated in the io worker thread and the coroutine actually running.

@antoyo

antoyo commented Jan 20, 2018

For the record, I wrote (with GuillaumeGomez) an FTP server with tokio and one with may, and the may version seems a little bit faster when benchmarking with this tool, so I think may is fast enough.

@rohitjoshi
Author

Can you please share your results? Did you measure latency differences?

@antoyo

antoyo commented Jan 20, 2018

I'll write a blog post with Guillaume (hopefully soon) to talk about our experience and show these results.
This benchmark only shows the number of connections that remained at the desired speed, so I guess the latency was measured somehow, but I only have the number of users.
If you know a better tool to benchmark an FTP server, I'll be happy to try it.
Thanks.

@pedrozaalex

pedrozaalex commented Mar 11, 2023

Hello @Xudong-Huang, thanks for your great work!

I'm very interested in may and would like to know: do you have more recent data comparing tokio and may?
