
Operations timeout while inserting data into ScyllaDB cluster at very low throughput #18632

Closed
amitesh88 opened this issue May 12, 2024 · 11 comments

@amitesh88

I have a 3-node ScyllaDB cluster:
32 CPUs, 64 GB RAM, Scylla version 5.4.3
io_properties.yaml:
read_iops: 36764
read_bandwidth: 769690880
write_iops: 42064
write_bandwidth: 767818944
When the application increased write operations from 1,200 to 10,000 TPS, which is far below the claimed write_iops, it started getting the error below:
Error inserting Data : Operation timed out for xxx_xxx.xxx_xxxxx_240512 - received only 1 responses from 2 CL=QUORUM.
The only notable log line on the ScyllaDB nodes is:
[shard 8:comp] large_data - Writing large partition xxx_xxx.xxx_xxxxx_240512: xxx (37041816 bytes) to me-3gg2_13mq_3jyhc2r2wxx7hvxxw4-big-Data.db
CPU utilisation on each node is barely 15%, yet the application fails to write.
Note: the RF of system_auth and the other keyspaces is already equal to the number of nodes.
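
For reference, the partitions Scylla has flagged as large can also be listed directly on a node; the query below is only a sketch, assuming the system.large_partitions table available in this Scylla version (<node_ip> is a placeholder):

cqlsh <node_ip> -e "SELECT * FROM system.large_partitions LIMIT 10;"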

Need insights on this
Thanks in advance

@mykaul
Contributor

mykaul commented May 13, 2024

Hi @amitesh88 - you can't compare the io_properties IOPS to CQL ops in any way - ScyllaDB performs a lot more 'raw' IOPS per CQL transaction, for example for the commit log or for compaction.
However, I do encourage you to test the disk with fio - it may be that iotune is configuring far fewer IOPS than the disks can sustain, and you may be able to raise the numbers somewhat. Unsure if that will solve your issue, but it's worth a try.
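
As a starting point, a minimal fio write test might look like the sketch below (file path, block size and queue depth are assumptions to adjust for your setup; delete the scratch file afterwards):

fio --name=disk_write_test --filename=/var/lib/scylla/fio_test_file --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=64 --numjobs=8 --size=1G --runtime=60 --time_based --group_reporting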

@amitesh88
Author

[image: GCP SSD disk I/O throughput figures]

We are using SSD disks on GCP VMs, which have good I/O throughput - see the image above.

Could writing large partitions be the issue, i.e. the partition key not distributing the load properly?

@mykaul
Contributor

mykaul commented May 14, 2024

@amitesh88 - as you can see, the numbers quoted above and the iotune results are vastly different. I'd also compare with fio. If fio is substantially better, I'd change the numbers manually to higher values and try again. See scylladb/seastar#1297 for reference.
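
A rough sketch of that manual override (the path assumes the default package layout and the numbers are placeholders, to be validated with fio first):

# edit the generated properties, then restart Scylla; the values below are illustrative only
sudo vi /etc/scylla.d/io_properties.yaml
#   disks:
#     - mountpoint: /var/lib/scylla
#       write_iops: 60000            # example: raised after confirming with fio
#       write_bandwidth: 900000000   # example
sudo systemctl restart scylla-server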

@amitesh88
Author

Using fio, I am getting the result below:
scylla_io_2: (groupid=0, jobs=16): err= 0: pid=16100: Wed May 15 16:28:36 2024
write: IOPS=37.8k, BW=185MiB/s (193MB/s)(10.8GiB/60008msec); 0 zone resets

@mykaul
Contributor

mykaul commented May 15, 2024

Using FIO , I am getting below result scylla_io_2: (groupid=0, jobs=16): err= 0: pid=16100: Wed May 15 16:28:36 2024 write: IOPS=37.8k, BW=185MiB/s (193MB/s)(10.8GiB/60008msec); 0 zone resets

That's a bit low - I expected more. Can you share the full fio command line and results?

@amitesh88
Author

Below is the command with output:

fio --filename=/var/lib/scylla/a --direct=1 --rw=randrw --refill_buffers --size=1G --norandommap --randrepeat=0 --ioengine=libaio --bs=5kb --rwmixread=0 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=scylla_io_2
scylla_io_2: (g=0): rw=randrw, bs=(R) 5120B-5120B, (W) 5120B-5120B, (T) 5120B-5120B, ioengine=libaio, iodepth=16
...
fio-3.16
Starting 16 processes
scylla_io_2: Laying out IO file (1 file / 1024MiB)
Jobs: 16 (f=16): [w(16)][100.0%][w=179MiB/s][w=36.8k IOPS][eta 00m:00s]
scylla_io_2: (groupid=0, jobs=16): err= 0: pid=16100: Wed May 15 16:28:36 2024
write: IOPS=37.8k, BW=185MiB/s (193MB/s)(10.8GiB/60008msec); 0 zone resets
slat (usec): min=3, max=1636, avg=10.87, stdev=13.92
clat (usec): min=391, max=26669, avg=6759.47, stdev=1146.82
lat (usec): min=549, max=26699, avg=6770.58, stdev=1146.98
clat percentiles (usec):
| 1.00th=[ 2245], 5.00th=[ 4621], 10.00th=[ 5932], 20.00th=[ 6390],
| 30.00th=[ 6587], 40.00th=[ 6783], 50.00th=[ 6915], 60.00th=[ 7046],
| 70.00th=[ 7177], 80.00th=[ 7373], 90.00th=[ 7635], 95.00th=[ 7963],
| 99.00th=[ 9110], 99.50th=[10421], 99.90th=[13566], 99.95th=[15008],
| 99.99th=[20579]
bw ( KiB/s): min=179810, max=321463, per=99.99%, avg=188943.08, stdev=1561.74, samples=1920
iops : min=35962, max=64289, avg=37788.25, stdev=312.34, samples=1920
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.02%
lat (msec) : 2=0.62%, 4=3.40%, 10=95.36%, 20=0.58%, 50=0.01%
cpu : usr=1.39%, sys=3.04%, ctx=1606741, majf=0, minf=194
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2267856,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=185MiB/s (193MB/s), 185MiB/s-185MiB/s (193MB/s-193MB/s), io=10.8GiB (11.6GB), run=60008-60008msec

Disk stats (read/write):
sdb: ios=0/2266088, merge=0/0, ticks=0/15152759, in_queue=15152760, util=99.87%

@mykaul
Contributor

mykaul commented May 15, 2024

Very strange. This is what I'm getting on my laptop:
Run status group 0 (all jobs):
WRITE: bw=3684MiB/s (3863MB/s), 3684MiB/s-3684MiB/s (3863MB/s-3863MB/s), io=16.0GiB (17.2GB), run=4447-4447msec

And of course, if I switch to 4KB bs, it's slightly better.
Run status group 0 (all jobs):
WRITE: bw=4011MiB/s (4206MB/s), 4011MiB/s-4011MiB/s (4206MB/s-4206MB/s), io=16.0GiB (17.2GB), run=4085-4085msec

@avikivity
Member

Please check the advanced dashboard in per-shard view mode to see if some shard is the bottleneck.
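
If the dashboards are not handy, per-shard counters can also be pulled straight from a node's Prometheus endpoint (9180 is the default metrics port; the grep pattern below is only an example, and <node_ip> is a placeholder):

# every metric carries a shard label, so an overloaded shard stands out here
curl -s http://<node_ip>:9180/metrics | grep -E 'timeout|queue' | head -n 20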

@amitesh88
Author

Thanks a lot.
Can we check this on open source Scylla?

@mykaul
Contributor

mykaul commented May 15, 2024

Thanks a lot. Can we check this on open source Scylla?

Yes, you can use the monitor with open source Scylla.
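
A rough outline of bringing up the monitoring stack against an existing cluster (script flags can differ between releases, so treat this as a sketch):

git clone https://github.com/scylladb/scylla-monitoring.git
cd scylla-monitoring
# list the cluster nodes in prometheus/scylla_servers.yml, then start the stack for the matching Scylla version
./start-all.sh -v 5.4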

@amitesh88
Author

I found the issue. It was due to the partition key, which did not distribute the data evenly across the nodes - that is why we were seeing
large_data - Writing large partition
We have changed the partition key to a uuid and the data is now evenly distributed across both DC1 and DC2.
Thanks for your time.
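
For anyone hitting the same symptom, the shape of the change was roughly the following (keyspace, table and column names are made up purely for illustration):

cqlsh -e "
CREATE TABLE demo_ks.events_v2 (
    id uuid,                       -- high-cardinality partition key spreads writes across nodes and shards
    event_time timestamp,
    payload text,
    PRIMARY KEY (id, event_time)
);"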
