recursive implementation for `in(::Any, ::Tuple)` #54411

nsajko · 2024-05-08T15:36:05Z

Avoids relying on `tail(::Tuple)`, so it can be performant for tuples of moderate length. Before this change the recursive implementation could only be used for tuples of length less than about thirty; now the recursion is used for lengths up to 600.

This means that `in` will be both faster and amenable to constant folding for larger tuples than before.

This introduces the `TupleView` type, which will hopefully be used in a few other places in the future, too. It's meant to help with recursing over every tuple element in a compiler-friendly way. In particular, `TupleView` could be useful for `_isdisjoint`, `isdisjoint(::Tuple, ::Tuple)` and other places where `tail(::Tuple)` is currently used.
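To illustrate the general idea of recursing over a tuple without rebuilding it through `tail`, here is a minimal sketch. The names `TailView`, `dropfirst` and `in_view` are hypothetical and are not taken from this PR or from StaticViews.jl; the sketch also ignores `missing` handling, which the real `in` has to deal with. Whether the compiler unrolls or constant-folds such a recursion depends on details not shown here.

```julia
# Hypothetical stand-in for the idea behind `TupleView`; not the PR's definition.
struct TailView{T<:Tuple}
    parent::T   # the original tuple, which is never rebuilt
    start::Int  # index of the first element still in view
end

TailView(t::Tuple) = TailView(t, 1)

Base.isempty(v::TailView) = v.start > length(v.parent)
Base.first(v::TailView) = v.parent[v.start]

# "Tail" of the view: same parent tuple, start index advanced by one.
dropfirst(v::TailView) = TailView(v.parent, v.start + 1)

# Recursive membership test over the view (assumes `==` returns a Bool).
function in_view(x, v::TailView)
    isempty(v) && return false
    first(v) == x && return true
    return in_view(x, dropfirst(v))
end

in_view(x, t::Tuple) = in_view(x, TailView(t))
```

Each recursive step only changes an integer, so no shorter tuple is ever constructed.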

nsajko · 2024-05-08T15:37:26Z

The `TupleView` type is a simplified version of my unregistered package StaticViews.jl. This PR was prompted by @jishnub's recent message on Slack about `in(::Any, ::Tuple)`.

jishnub · 2024-05-08T16:56:50Z

We may want a cutoff length for this, as the present loop-based implementation is faster for large tuples.
On nightly v"1.12.0-DEV.467":

julia> t = ntuple(x->'A', 1000);

julia> @btime 'C' in $t;
  354.118 ns (0 allocations: 0 bytes)

With this PR:

julia> @btime 'C' in $t;
  2.889 μs (0 allocations: 0 bytes)

aviatesk · 2024-05-09T05:54:33Z

Can you explain the advantages of using StaticView, especially how it makes it unnecessary to use `tail`? I'm particularly interested in its advantages over the technique that uses the length of the argument tuple as a threshold to split dispatch between the recursive and loop implementations, as in the implementation of `map`.
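For readers unfamiliar with the threshold technique mentioned here, a rough sketch follows. The cutoff value and the names `CUTOFF`, `has_element`, `has_element_rec` and `has_element_loop` are assumptions made for illustration; this is not how `Base.map` or `Base.in` is actually written, and `missing` handling is left out.

```julia
# Hypothetical cutoff; the thread mentions "about thirty" for the old recursion.
const CUTOFF = 32

# Recursive path: intended for short tuples only.
has_element_rec(x, ::Tuple{}) = false
has_element_rec(x, t::Tuple) = first(t) == x || has_element_rec(x, Base.tail(t))

# Loop path: used for longer tuples.
function has_element_loop(x, t::Tuple)
    for y in t
        y == x && return true
    end
    return false
end

# The length of a concretely typed tuple is known from its type, so this
# branch can typically be resolved at compile time.
has_element(x, t::Tuple) =
    length(t) < CUTOFF ? has_element_rec(x, t) : has_element_loop(x, t)
```

With this split, only tuples shorter than the cutoff ever reach the recursive method.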

aviatesk · 2024-05-09T05:54:42Z

Also xref: #54026

nsajko · 2024-05-10T07:37:10Z

Performance comparisons show 3x-4x improvement:

Before:

julia> using BenchmarkTools

julia> t = ntuple(Returns(7), 100);

julia> minimum(@benchmark 3 ∈ $t)
BenchmarkTools.TrialEstimate: 
  time:             34.215 ns
  gctime:           0.000 ns (0.00%)
  memory:           0 bytes
  allocs:           0

julia> t = ntuple(Returns(7), 500);

julia> minimum(@benchmark 3 ∈ $t)
BenchmarkTools.TrialEstimate: 
  time:             203.996 ns
  gctime:           0.000 ns (0.00%)
  memory:           0 bytes
  allocs:           0

julia> versioninfo()
Julia Version 1.12.0-DEV.495
Commit 3c966a5107a (2024-05-09 11:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-17.0.6 (ORCJIT, znver2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

After:

julia> using BenchmarkTools

julia> t = ntuple(Returns(7), 100);

julia> minimum(@benchmark 3 ∈ $t)
BenchmarkTools.TrialEstimate:
  time:             9.829 ns
  gctime:           0.000 ns (0.00%)
  memory:           0 bytes
  allocs:           0

julia> t = ntuple(Returns(7), 500);

julia> minimum(@benchmark 3 ∈ $t)
BenchmarkTools.TrialEstimate:
  time:             53.974 ns
  gctime:           0.000 ns (0.00%)
  memory:           0 bytes
  allocs:           0

julia> versioninfo()
Julia Version 1.12.0-DEV.502
Commit 9598e8dce1 (2024-05-10 07:13 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-17.0.6 (ORCJIT, znver2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

aviatesk · 2024-05-10T10:56:41Z

The benchmark isn't fair because this PR increases the threshold for tuple lengths that permit unrolling. Essentially, the performance difference we're seeing is between the loop implementation (on master) and the recursive implementation (in this PR) for very large (100-element) tuples. If comparisons are needed, they should be made between recursive implementations. Something like this:

julia> @benchmark i ∈ t setup=(n = rand((2:10...,25,100)); t = ntuple(n) do _; rand(1:10); end; i = rand(1:10))
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  17.117 ns … 83.709 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     22.253 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.440 ns ±  2.913 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▁▁   ▃▂▄▁▁▂█▇▄                                     
  ▁▁▂▂▂▄▄▄▃▅██▅▅▇█████████▇▇▇▇▄▃▃▅▅▃▃▃▄▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  17.1 ns         Histogram: frequency by time        31.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

nsajko · 2024-05-10T11:37:14Z

> The benchmark isn't fair because this PR increases the threshold for tuple lengths that permit unrolling.

Increasing the threshold is the point of this PR (now that the competing PR is already merged).

> If comparisons are needed, they should be made between recursive implementations.

That obviously makes this PR's perf wins even greater, by a lot. The recursive implementation on master allocates when given tuples larger than intended (because it relies on `tail(::Tuple)`):

julia> f(x, t) = Base._in_tuple(x, t, false)
f (generic function with 1 method)

julia> using BenchmarkTools

julia> t = ntuple(Returns(7), 100);

julia> minimum(@benchmark f(3,$t))
BenchmarkTools.TrialEstimate: 
  time:             175.159 μs
  gctime:           0.000 ns (0.00%)
  memory:           37.48 KiB
  allocs:           69
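A way to probe this kind of allocation without BenchmarkTools is sketched below. The function `in_tail` is a simplified, hypothetical tail-based recursion written for this example; it is not `Base._in_tuple`, and the numbers it reports will vary by Julia version and machine. `Returns` requires Julia 1.7 or later.

```julia
# Simplified tail-based recursion (hypothetical; no `missing` handling).
in_tail(x, ::Tuple{}) = false
in_tail(x, t::Tuple) = first(t) == x || in_tail(x, Base.tail(t))

# Measure bytes allocated by one call, after a warm-up call so that
# compilation is not included in the measurement.
function allocated_bytes(x, t)
    in_tail(x, t)
    return @allocated in_tail(x, t)
end

short = ntuple(Returns(7), 10)
long = ntuple(Returns(7), 100)

println("10 elements:  ", allocated_bytes(3, short), " bytes")
println("100 elements: ", allocated_bytes(3, long), " bytes")
```

Comparing the two printed values gives a quick indication of whether the recursion starts allocating as the tuple grows.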

nsajko · 2024-05-10T21:59:50Z

base/operators.jl

+                let f = is_missing | any_missing, rest = tail(v)
+                    _in_tupleview(x, rest, f)


I should probably just get rid of the `let` and inline `f` and `rest`.
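The two forms being discussed can be compared side by side in a small sketch. Here `combine` is a hypothetical stand-in for the recursive call (the body of `_in_tupleview` is not shown in this thread), and `Base.tail` on a plain tuple is assumed in place of whatever `tail` means for the PR's view type.

```julia
# Stand-in for the recursive call; it only returns its arguments.
combine(x, rest, flag) = (x, rest, flag)

# Form quoted in the review comment: intermediate names bound with `let`.
function with_let(x, v::Tuple, is_missing::Bool, any_missing::Bool)
    let f = is_missing | any_missing, rest = Base.tail(v)
        combine(x, rest, f)
    end
end

# Form proposed in the comment: the same expressions inlined at the call site.
inlined(x, v::Tuple, is_missing::Bool, any_missing::Bool) =
    combine(x, Base.tail(v), is_missing | any_missing)
```

Both functions pass the same arguments to the stand-in, so the change is purely stylistic.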

nsajko · 2024-05-13T07:59:02Z

Closing in favor of an upcoming, more comprehensive PR.

nsajko force-pushed the in_tuple branch from 6841418 to 3d6e427 Compare May 9, 2024 07:48

nsajko marked this pull request as draft May 9, 2024 08:05

nsajko mentioned this pull request May 9, 2024

Type-stable codegen for specialized in(v, ::Tuple) #54026

Merged

tecosaur added the domain:collections Data structures holding multiple items, e.g. sets label May 9, 2024

nsajko marked this pull request as ready for review May 10, 2024 07:35

nsajko force-pushed the in_tuple branch from c587dbf to 9598e8d Compare May 10, 2024 07:35

nsajko added the performance Must go faster label May 10, 2024

nsajko closed this May 13, 2024
