Add a fast-path to Debug ASCII &str #121150

Merged: 5 commits, May 24, 2024

Conversation

Swatinem
Contributor

Instead of going through the EscapeDebug machinery, we can just skip over ASCII chars that don’t need any escaping.


This is an alternative / a companion to #121138.

The other PR is adding the fast path deep within EscapeDebug, whereas this skips as early as possible.
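
For readers who want to see the idea in isolation, here is a minimal sketch of such a fast path. It is not the code from this PR: `is_plain_ascii` and `debug_str_sketch` are hypothetical names, and the sketch builds a `String` instead of writing to a `Formatter`. It copies runs of printable ASCII verbatim and falls back to `char::escape_debug` for everything else.

```rust
/// Hypothetical helper: true for bytes that can be copied verbatim,
/// i.e. printable ASCII except the double quote and the backslash.
fn is_plain_ascii(b: u8) -> bool {
    matches!(b, 0x20..=0x7e) && b != b'"' && b != b'\\'
}

/// Sketch of a `Debug`-style escaper with an ASCII fast path.
fn debug_str_sketch(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    let mut rest = s;
    while !rest.is_empty() {
        // Fast path: length of the leading run that needs no escaping.
        let plain_len = rest
            .bytes()
            .position(|b| !is_plain_ascii(b))
            .unwrap_or(rest.len());
        // All bytes before `plain_len` are ASCII, so this is a char boundary.
        let (plain, tail) = rest.split_at(plain_len);
        out.push_str(plain);
        // Slow path: escape exactly one char, then try the fast path again.
        let mut chars = tail.chars();
        if let Some(c) = chars.next() {
            out.extend(c.escape_debug());
        }
        rest = chars.as_str();
    }
    out.push('"');
    out
}
```

The single quote stays on the fast path on purpose, since `Debug` for `str` does not escape it while `char::escape_debug` would.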

@rustbot
Collaborator

rustbot commented Feb 15, 2024

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 15, 2024
@Swatinem
Contributor Author

While we are still bikeshedding the implementation details, I would appreciate a benchmark run. I am not sure whether the compiler performance test suite will show any change. In a micro-benchmark, I got a 10x improvement for a pure-ASCII string, and even a tiny improvement in the "pure" Unicode case (because even a "pure" Unicode sentence still contains ASCII spaces).
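
A micro-benchmark of this kind could look roughly like the sketch below. This is an assumption about the shape of such a benchmark, not the one used for the numbers above; `time_debug` is a hypothetical helper that uses only `std::time::Instant` and `std::hint::black_box`.

```rust
use std::fmt::Write;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: formats `input` with `{:?}` `iters` times
/// and returns the total elapsed wall-clock time.
fn time_debug(input: &str, iters: u32) -> Duration {
    // Reuse one buffer so allocation does not dominate the measurement.
    let mut buf = String::with_capacity(input.len() * 2 + 2);
    let start = Instant::now();
    for _ in 0..iters {
        buf.clear();
        // `black_box` keeps the optimizer from hoisting or removing the work.
        write!(buf, "{:?}", black_box(input)).unwrap();
        black_box(&buf);
    }
    start.elapsed()
}
```

Comparing `time_debug` on a pure-ASCII sentence and on a mostly non-ASCII sentence, built once with each standard library version, would give the two cases mentioned above. A proper harness would also handle warm-up and variance.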

@the8472
Member

the8472 commented Feb 18, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 18, 2024
@bors
Contributor

bors commented Feb 18, 2024

⌛ Trying commit 04602a2 with merge d3c44b1...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 18, 2024
@bors
Contributor

bors commented Feb 18, 2024

☀️ Try build successful - checks-actions
Build commit: d3c44b1 (d3c44b181fb6776a28574e754c3db10da6172bd9)

@rust-timer
Collaborator

Finished benchmarking commit (d3c44b1): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.4% | [1.4%, 1.4%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.8% | [1.8%, 1.8%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 640.535s -> 641.862s (0.21%)
Artifact size: 308.83 MiB -> 308.80 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 18, 2024
@Swatinem
Contributor Author

Not too surprisingly, the compile-time benchmarks either did not budge or look bogus.

Though if you select "runtime", there is a very significant improvement to fmt-debug-derive, which sounds like exactly the benchmark touching this code.

https://perf.rust-lang.org/compare.html?start=6f726205a1b7992537ddec96c83f2b054b03e04f&end=d3c44b181fb6776a28574e754c3db10da6172bd9&stat=instructions%3Au&tab=runtime&nonRelevant=true&showRawData=true


So the question remains: what can I do to improve confidence in this change? I have looked through similar existing methods, but none of them does exactly what I expect.

Another option would be to also perf-test #121138, as in my local benchmarking I found that the slowness came from binary searching the Grapheme_Extend table. Maybe the compiler is smart enough to generate equally fast code with an arguably simpler change?
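
To illustrate why a table lookup can dominate for ASCII input, here is a small sketch. It does not reproduce the standard library's Unicode tables or either PR's code: `SAMPLE_RANGES` is a tiny illustrative excerpt of combining-mark ranges, and both function names are hypothetical. The second function shows the kind of early exit that lets ASCII skip the binary search entirely.

```rust
use std::cmp::Ordering;

/// Illustrative excerpt only, not the full Grapheme_Extend table.
/// Sorted, non-overlapping, inclusive code point ranges.
const SAMPLE_RANGES: &[(u32, u32)] = &[(0x0300, 0x036F), (0x0483, 0x0489), (0x0591, 0x05BD)];

/// Range lookup by binary search; runs for every char, including ASCII.
fn in_sample_table(c: char) -> bool {
    let cp = c as u32;
    SAMPLE_RANGES
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

/// Same result, but ASCII returns before touching the table.
fn in_sample_table_fast(c: char) -> bool {
    if c.is_ascii() {
        return false;
    }
    in_sample_table(c)
}
```

Whether the optimizer can derive the early exit on its own is exactly the open question raised above; the sketch only shows that the two functions are interchangeable.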

@cuviper
Member

cuviper commented Mar 6, 2024

Let's see what this now does on top of #121138...

@bors try @rust-timer queue

And for good measure, let's mention that #122013 is also making changes around this.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 6, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 6, 2024
@bors
Contributor

bors commented Mar 6, 2024

⌛ Trying commit 04602a2 with merge d253c69...

@bors
Contributor

bors commented Mar 6, 2024

☀️ Try build successful - checks-actions
Build commit: d253c69 (d253c6933da1078e2c622d865089469a722c376b)

@rust-timer
Collaborator

Finished benchmarking commit (d253c69): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.5% | [1.4%, 7.7%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.6% | [-2.6%, -2.6%] | 1 |
| All ❌✅ (primary) | 4.5% | [1.4%, 7.7%] | 2 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 644.65s -> 646.087s (0.22%)
Artifact size: 175.06 MiB -> 175.06 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 6, 2024
@Swatinem
Contributor Author

Swatinem commented Mar 8, 2024

Another -26.93% on the fmt-debug-derive runtime benchmark.
I can take another look to see whether I can improve vectorization of the loop, as suggested by @the8472. I will reach out on Zulip if I have questions :-)

@cuviper
Member

cuviper commented Mar 19, 2024

AIUI you're still investigating...
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 19, 2024
@Swatinem
Contributor Author

I rebased, reduced the number of `unsafe` uses, and documented the remaining ones.

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 12, 2024
Swatinem and others added 4 commits May 20, 2024 10:04
Instead of writing each `char` of an escape sequence one by one,
this delegates to `Display`, which uses `write_str` internally
in order to write the whole escape sequence at once.
Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping.
Instead of having a single loop that works on utf-8 `char`s,
this splits the implementation into a loop that quickly skips over
printable ASCII, falling back to per-char iteration for other chunks.
Surprisingly, benchmarks have shown that using `&str`
instead of `&[u8]` with some `unsafe` code is actually faster.
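
The difference between writing an escape sequence one `char` at a time and handing it to `Display` can be observed with a small counting writer. This is a sketch, not code from the commits: `CountingWriter`, `escape_per_char`, and `escape_via_display` are hypothetical names, and the number of `write_str` calls on the `Display` path depends on the standard library version in use.

```rust
use std::fmt::{self, Write};

/// Hypothetical writer that records how often `write_str` is called.
struct CountingWriter {
    out: String,
    calls: usize,
}

impl Write for CountingWriter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.calls += 1;
        self.out.push_str(s);
        Ok(())
    }
}

/// Writes each char of the escape sequence separately.
/// The default `write_char` forwards to `write_str` once per char.
fn escape_per_char(c: char) -> CountingWriter {
    let mut w = CountingWriter { out: String::new(), calls: 0 };
    for e in c.escape_debug() {
        w.write_char(e).unwrap();
    }
    w
}

/// Delegates to the `Display` impl of `EscapeDebug`.
fn escape_via_display(c: char) -> CountingWriter {
    let mut w = CountingWriter { out: String::new(), calls: 0 };
    write!(w, "{}", c.escape_debug()).unwrap();
    w
}
```

Both functions produce the same text; comparing their `calls` fields for an input such as `'\n'` shows how many separate writes each approach issues.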
Contributor

@joboet joboet left a comment

This looks great and is ready to merge (modulo the comment nit), thank you!

I have some ideas on how to optimize this further (I don't think the len_utf8 gets optimized away, even though it is redundant), but I think we can leave that for a follow-up PR...

This avoids having to collect a non-ASCII-printable run before processing it.
@joboet
Contributor

joboet commented May 24, 2024

Thank you!
@bors r+

@bors
Contributor

bors commented May 24, 2024

📌 Commit 004100c has been approved by joboet

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 24, 2024
@bors
Contributor

bors commented May 24, 2024

⌛ Testing commit 004100c with merge 213ad10...

@bors
Contributor

bors commented May 24, 2024

☀️ Test successful - checks-actions
Approved by: joboet
Pushing 213ad10 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 24, 2024
@bors bors merged commit 213ad10 into rust-lang:master May 24, 2024
7 checks passed
@rustbot rustbot added this to the 1.80.0 milestone May 24, 2024
@rust-timer
Collaborator

Finished benchmarking commit (213ad10): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.5% | [-0.9%, -0.4%] | 7 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (secondary 4.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.1% | [3.5%, 4.8%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results (primary -2.4%, secondary -4.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.4% | [-3.0%, -2.0%] | 5 |
| Improvements ✅ (secondary) | -4.4% | [-8.8%, -2.4%] | 16 |
| All ❌✅ (primary) | -2.4% | [-3.0%, -2.0%] | 5 |

Binary size

Results (primary 0.0%, secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.0% | [0.0%, 0.0%] | 1 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 8 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 1 |
| All ❌✅ (primary) | 0.0% | [0.0%, 0.0%] | 1 |

Bootstrap: 673.01s -> 673.59s (0.09%)
Artifact size: 315.73 MiB -> 315.74 MiB (0.00%)

flip1995 pushed a commit to flip1995/rust-clippy that referenced this pull request May 24, 2024
Add benchmarks for `impl Debug for str`

In order to inform future perf improvements and prevent regressions, let's add some benchmarks that stress `impl Debug for str`.

---

As I am currently working on improving the perf in rust-lang/rust#121150, it's nice to have these benchmarks.

Writing them, I also saw that escapes are written out one char at a time, even though other parts of the code are already optimizing that via `as_str`, which I intend to do as well as a followup improvement.

r? ``@cuviper``
☝🏻 as you were also assigned to rust-lang/rust#121150, CC ``@the8472`` if you want to steal the review :-)
@Swatinem Swatinem deleted the debug-ascii-str branch May 25, 2024 20:23
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.