Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smp: allocate hugepages eagerly when kernel support is available #2146

Closed
wants to merge 1 commit into from

Conversation

avikivity
Copy link
Member

@avikivity avikivity commented Mar 16, 2024

Instead of deferring merging pages into hugepages to the transparent
hugepage scanner, advise the kernel to do so immediately using the new
MADV_POPULATE_WRITE and MADV_COLLAPSE advices.

Refactor the prefaulter to attempt first to use MAP_POPULATE_WRITE
to fault in a whole hugepage's worth of memory. This should fault
the range as a hugepage but for good measure use MADV_COLLAPSE too
(which would be a no-op if the work was done in MADV_POPULATE_WRITE).

@avikivity avikivity requested a review from xemul March 16, 2024 19:19
@avikivity avikivity force-pushed the madvise-collapse branch 2 times, most recently from 52a65cb to ff55fa7 Compare March 16, 2024 19:29
@avikivity
Copy link
Member Author

@xemul please review

@xemul
Copy link
Contributor

xemul commented Mar 21, 2024

Side note -- https://lwn.net/Articles/846501/
There's MADV_POPULATE flag that "prefaults" memory for us. Do you think it's worthwhile? Should be present since 5.14

src/core/smp.cc Outdated
auto aligned = addr & ~(page_size - 1);
auto offset_in_hugepage = aligned & (*huge_page_size_opt - 1);
auto first_page_offset = forwards ? 0 : *huge_page_size_opt - page_size;
auto hugepage_addr = aligned - offset_in_hugepage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just hugepage_addr = addr & ~(huge_page_size_opt - 1) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avikivity ,

Why not just hugepage_addr = addr & ~(huge_page_size_opt - 1) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That just happened to be the way I wrote it. I'll reconsider, but more important is to use MAP_POPULATE_WRITE instead.

@avikivity
Copy link
Member Author

MADV_POPULATE_WRITE is probably better than MADV_COLLAPSE. We could just issue both.

  1. If MADV_POPULATE_WRITE is available, issue it, followed by MADV_COLLAPSE which is likely a no-op
  2. Otherwise do the funky stuff with manually faulting in pages

Instead of deferring merging pages into hugepages to the transparent
hugepage scanner, advise the kernel to do so immediately using the new
MADV_POPULATE_WRITE and MADV_COLLAPSE advices.

Refactor the prefaulter to attempt first to use MAP_POPULATE_WRITE
to fault in a whole hugepage's worth of memory. This should fault
the range as a hugepage but for good measure use MADV_COLLAPSE too
(which would be a no-op if the work was done in MADV_POPULATE_WRITE).
@avikivity avikivity changed the title smp: advise kernel to collapse transparent hugepages eagerly when prefaulting smp: allocate hugepages eagerly when kernel support is available May 18, 2024
@avikivity
Copy link
Member Author

v2: rewrite; use MADV_POPULATE_WRITE if available to have the kernel do the faulting. Add new helper populate_memory() to work on a hugepage's worth of memory

src/core/smp.cc Show resolved Hide resolved
src/core/smp.cc Show resolved Hide resolved
static std::regex re(R"(Hugepagesize:\s*(\d+) kB)");
auto m = std::smatch{};
if (std::regex_search(meminfo, m, re)) {
return std::stoi(m[1]) * size_t(1024);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it m[1] and not m[0] because of that extra outer parentheses in the regular expression?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why is this not m[0]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m[0] is the entire match, not a subexpression.

https://en.cppreference.com/w/cpp/regex/match_results/operator_at

src/core/smp.cc Show resolved Hide resolved
populate_memory(batch_start, range.end);
range.end = batch_start;

if (range.start >= range.end) {
ranges.erase(ranges.begin() + current_range);
current_range = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pre-existing code, but I still wonder - why do we have this current_range at all? It seems it's set to 0 initially, and then always remains 0 or set to 0 again. Where is it set to anything else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's incremented below.

@nyh nyh closed this in 5918cb0 May 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants