Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zfree_with_size #453

Open
wants to merge 3 commits into
base: unstable
Choose a base branch
from
Open

Add zfree_with_size #453

wants to merge 3 commits into from

Conversation

poiuj
Copy link

@poiuj poiuj commented May 7, 2024

Description

zfree updates memory statistics. It gets the size of the buffer from jemalloc by calling zmalloc_size. This operation is costly. We can avoid it if we know the buffer size. For example, we can calculate size of sds from the data we have in its header.

This commit introduces zfree_with_size function that accepts both pointer to a buffer, and its size. zfree is refactored to call zfree_with_size.

sdsfree uses the new interface for all but SDS_TYPE_5.

Benchmark

Dataset is 3 million strings. Each benchmark run uses its own value size (8192, 512, and 120). The benchmark is 100% write load for 5 minutes.

value size       new tps      old tps      %       new us/call    old us/call    %
8k               272088.53    269971.75    0.78    1.83           1.92           -4.69
512              356881.91    352856.72    1.14    1.27           1.35           -5.93
120              377523.81    368774.78    2.37    1.14           1.19           -4.20

Copy link

codecov bot commented May 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.94%. Comparing base (ba9dd7b) to head (883f63d).
Report is 10 commits behind head on unstable.

❗ Current head 883f63d differs from pull request most recent head 779ba8c. Consider uploading reports for the commit 779ba8c to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #453      +/-   ##
============================================
- Coverage     68.95%   68.94%   -0.02%     
============================================
  Files           109      109              
  Lines         61793    61798       +5     
============================================
- Hits          42611    42604       -7     
- Misses        19182    19194      +12     
Files Coverage Δ
src/sds.c 86.96% <100.00%> (+0.04%) ⬆️
src/zmalloc.c 83.65% <100.00%> (+0.18%) ⬆️

... and 9 files with indirect coverage changes

@lipzhu
Copy link
Contributor

lipzhu commented May 8, 2024

I think the zmalloc_size is really expensive to update the used_memory, we had a similar discussion in #308 (comment), for this patch, I'm curious if the assertion of sds->alloc == allocate size in all cases, what happened if call the sdsResize ?

zfree updates memory statistics. It gets the size of the buffer from
jemalloc by calling zmalloc_size. This operation is costly. We can
avoid it if we know the buffer size. For example, we can calculate
size of sds from the data we have in its header.

This commit introduces zfree_with_size function that accepts both
pointer to a buffer, and its size. zfree is refactored to call
zfree_with_size.

sdsfree uses the new interface for all but SDS_TYPE_5.

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
@zvi-code
Copy link

zvi-code commented May 8, 2024

@lipzhu the update of used_memory has 2 cost aspects, the zmalloc_size and also the fact it's an atomic operation.

  1. regarding zmalloc_size cost: it is actually called twice, once in zmalloc code and enother one inside the je_free because the tcache needs to know what bin the buffer belongs to. So, if you remove the zmalloc_size call for stat, you are still paying the cost. When using je_sdallocx to free (this PR) it uses, internally in jemalloc, fast_path that avoids fetching the size to begin with. BTW: I think for cpp delete operator is actually overloaded in jemalloc.cpp to use sized delete.
  2. [edit I see you already have PR for this] regarding atomic cost: this I believe can be also eliminated with use of thread local vars(I guess the team has considered this before)

@poiuj
Copy link
Author

poiuj commented May 8, 2024

@lipzhu good point about sds->alloc == allocate size. I was sure it is. But now I see that sds->alloc is truncated when the sds len is close enough to sds type max len (2^8, 2^16, etc.). I'll update the PR with a fix.

poiuj added 2 commits May 9, 2024 09:10
sds type should be determined based on the size of the underlying
buffer, not the logical length of the sds. Otherwise it's a waste of
memory on jemalloc.

For example, let's consider creation of sds of length 253. According
to the length, the appropriate type is SDS_TYPE_8. But we allocate
`253 + sizeof(struct sdshdr8) + 1` bytes, which sums to 257 bytes. In
this case jemalloc allocates buffer from the next size bucket. With
current configuration on Linux, for example, it's 320 bytes. So we end
up with 320 bytes buffer, while we can't address more than 255.

The same happens with other types and length close enough to the
appropriate powers of 2.

The downside is that with allocators that do not allocate larger than
requested chunks (like GNU allocator), we switch to a larger type "too
early". It leads to small waste of memory. Specifically:
sds of length 31 takes 35 bytes instead of 33 (2 bytes wasted)
sds of length 255 takes 261 bytes instead of 259 (2 bytes wasted)
sds of length 65,535 takes 65,545 bytes instead of 65,541 (4 bytes wasted)
sds of length 4,294,967,295 takes 4,294,967,313 bytes instead of
4,294,967,305 (8 bytes wasted)

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
sdsalloc returns sds's length for SDS_TYPE_5. It's not correct for
SDS_TYPE_5.

This patch makes sdsAllocSize call zmalloc_size for
SDS_TYPE_5. sdsalloc is a lower level function that continues to
return length for SDS_TYPE_5.

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
@poiuj poiuj mentioned this pull request May 15, 2024
@@ -39,6 +39,7 @@
#include "sds.h"
#include "sdsalloc.h"
#include "util.h"
#include "zmalloc.h"
Copy link
Member

@madolson madolson May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really want zmalloc in sds, since it uses sdsalloc.

EDIT: I see now we call zmalloc in this file anyways, and probably should have been imported before, but we would still like not have a dependency on zmalloc in this file.

char header = s[-1];
/* SDS_TYPE_5 header doesn't contain the size of the allocation */
if ((header & SDS_TYPE_MASK) == SDS_TYPE_5) {
return zmalloc_size(sdsAllocPtr(s));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zmalloc_size is not on all platforms! You can see the previous usage of zmalloc_size was inside a jemalloc ifdef

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If zmalloc_size from jemalloc is unavailable, we use our own implementation – 

size_t zmalloc_size(void *ptr) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I initially thought wasn't ideal because it wasn't accurate, but if we don't have alloc_size on the system we are going to call free anyways.

madolson pushed a commit that referenced this pull request May 16, 2024
…to align the logic with sdsnewlen function. (#476)

This patch try to correct the actual allocated size from allocator
when call sdsRedize to align the logic with sdsnewlen function.

Maybe the #453 optimization
should depend on this.

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
@madolson madolson requested a review from ranshid May 16, 2024 16:53
srgsanky pushed a commit to srgsanky/valkey that referenced this pull request May 19, 2024
…to align the logic with sdsnewlen function. (valkey-io#476)

This patch try to correct the actual allocated size from allocator
when call sdsRedize to align the logic with sdsnewlen function.

Maybe the valkey-io#453 optimization
should depend on this.

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants