
v1.3.0 ingest performance poor compared to v1.2.2 #676

Open
lslate opened this issue Mar 16, 2023 · 6 comments
lslate commented Mar 16, 2023

Hello. I recently updated from 1.2.2 to 1.3.0 (for the rollup issue #457 fix) and am now seeing only about 20% of the previous http.ingest_count and double the http.ingest_time, which results in Kairos ingest falling behind RabbitMQ.

For reference the data flow is rabbitmq -> python(pika) -> kairosdb -> scylladb 5.0.

I translated the previous kairosdb.properties to the new kairosdb.conf hocon format and I believe I have all the settings the same (threads, batch sizes, connections etc).
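For anyone doing the same translation, here is a minimal sketch of the mapping. The property name is illustrative (concurrentQueryThreads is one of the stock kairosdb.properties keys, but check the kairosdb.conf shipped with 1.3.0 for the authoritative key names and nesting):

```hocon
# kairosdb.properties (1.2.2 flat style):
#   kairosdb.datastore.concurrentQueryThreads=5

# kairosdb.conf (1.3.0 HOCON style) — same setting, nested:
kairosdb {
  datastore {
    concurrentQueryThreads: 5
  }
}
```

HOCON also accepts the flat dotted form (`kairosdb.datastore.concurrentQueryThreads: 5`), which makes a mechanical translation easier to diff against the old file.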

Java is 1.8.0; I have also tried 11, but it makes no difference.
Starting with a fresh keyspace makes no difference either.
KairosDB 1.3.0 is the only change to the stack, and when I change back to 1.2.2 it's all happy again.

I've trawled the commits and issues but nothing stands out. Any ideas?

@brianhks
Member

Well, this is disappointing. Tell me a bit about your setup: how many clients, how many Kairos nodes, and how big is your Scylla cluster? What are your ingest numbers before and after 1.3.0? You can email me your config files directly and I'll take a look to see if anything sticks out as wrong.

@lslate
Author

lslate commented Mar 21, 2023 via email

@brianhks
Member

I've done some testing on a single Kairos node and a single Cassandra 4 node. I'm getting almost identical performance from Kairos 1.2 and 1.3.
When using 1.3 I did have to make a change in cassandra.yaml for the batch warn and fail config:

```yaml
batch_size_warn_threshold: 75KiB
batch_size_fail_threshold: 150KiB
```
All the warnings did seem to affect the performance of the inserts.

I'm working on the 1.4 release right now and I'm upgrading to the latest Cassandra driver. I'll test it alongside the other versions before I release to see whether it makes any difference.

@lslate
Author

lslate commented May 24, 2023

Thanks for that Brian. I'll have a play with those thresholds and get back to you.

@brianhks
Member

Also, someone else made a comment that made me think this may be the issue: key caching may have changed. Have a look at the Kairos Cassandra metrics for writes to the different tables: query kairosdb.datastore.cassandra.write_batch_size.sum and group by table. In an ideal state you are only writing to the data_points table and everything else gets cached. If the cache is too small you will see a lot of writes to the other tables. It would be interesting to compare the two versions and see if there is a difference there.
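For reference, a sketch of that check as a KairosDB REST query built in Python. The metric name comes from the comment above; the `/api/v1/datapoints/query` endpoint, `sum` aggregator, and tag group-by are standard KairosDB query API, but the time range and sampling interval here are just placeholder assumptions:

```python
import json

# Sum kairosdb.datastore.cassandra.write_batch_size, grouped by the
# "table" tag, over the last hour (adjust the range to taste).
payload = {
    "start_relative": {"value": 1, "unit": "hours"},
    "metrics": [
        {
            "name": "kairosdb.datastore.cassandra.write_batch_size",
            "aggregators": [
                {"name": "sum", "sampling": {"value": 1, "unit": "minutes"}}
            ],
            "group_by": [{"name": "tag", "tags": ["table"]}],
        }
    ],
}

print(json.dumps(payload, indent=2))
# POST this JSON to http://<kairos-host>:8080/api/v1/datapoints/query
```

Each group in the response then corresponds to one Cassandra/Scylla table; healthy caching shows nearly all writes landing in data_points.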

@lslate
Author

lslate commented May 31, 2023

I've played with the batch size thresholds but see no improvement. Looking at kairosdb.datastore.cassandra.write_batch_size.sum grouped by table, the only significant writes are to data_points; writes to the other tables are negligible or zero, so key caching doesn't seem to be the problem. I'll update Scylla.
