RFC: Refactor MethodInstance to allow for more general specialization #54373

Keno · 2024-05-06T08:15:13Z

Overview

This refactors the base MethodInstance data structure to the following:

mutable struct MethodSpecialization{D}
    # If def is a MethodSpecialization, inherits edges from parent
    const def::Union{Module, Method, MethodSpecialization}
    specTypes::Type{<:Tuple} # To be renamed `abi` in the future
    backedges::Vector
    cache::CodeInstance
    next::MethodSpecialization # N.B.: No {D}
    data::D
end

struct DefaultSpec
    sparam_vals::SimpleVector
    inInference::UInt8
    cache_with_orig::UInt8
    precompiled::UInt8
end

struct UninferredSpec; end # Replaces owner === :uninferred

const MethodInstance = MethodSpecialization{DefaultSpec}

The owner field of CodeInstance is removed in favor of using
separate toplevel MethodInstances.
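
To make the shape of the proposal concrete, here is a minimal, self-contained sketch using mock types. All `Mock*` names and the `is_native` helper are hypothetical stand-ins invented for illustration; they are not the real `Core` types, and several fields (`cache`, `next`, the lock bits) are left out. It only shows how the tag parameter lets ordinary dispatch tell a default specialization from any other kind.

```julia
# Mock stand-ins for illustration only; these are NOT the real Core types.
struct MockMethod
    name::Symbol
end

mutable struct MockSpecialization{D}
    # A parent specialization here would mean "inherit edges from the parent".
    const def::Union{Module, MockMethod, MockSpecialization}
    specTypes::Type{<:Tuple}
    backedges::Vector{Any}
    data::D
end

# Reduced payload of the default tag (assumption: the real one has more fields).
struct MockDefaultSpec
    sparam_vals::Core.SimpleVector
    precompiled::Bool
end

struct MockUninferredSpec end  # singleton tag without payload

# The familiar name becomes an alias for one instantiation of the tag.
const MockMethodInstance = MockSpecialization{MockDefaultSpec}

# Dispatch on the tag separates default specializations from all others.
is_native(::MockSpecialization{MockDefaultSpec}) = true
is_native(::MockSpecialization) = false

m = MockMethod(:f)
sig = Tuple{typeof(sin), Int}
mi = MockSpecialization{MockDefaultSpec}(m, sig, Any[], MockDefaultSpec(Core.svec(), false))
un = MockSpecialization{MockUninferredSpec}(m, sig, Any[], MockUninferredSpec())
```

With these definitions, `mi isa MockMethodInstance` holds while `un isa MockMethodInstance` does not, which is the property the alias relies on.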

Motivation

This refactor aims to unify a number of recent requirements on the
internal cache. Broadly speaking, we'd like to cache (with proper
invalidation and world age semantics) several classes of data:

1. World-age partitioned, type-specialized native code instances (the
   traditional MethodInstance/CodeInstance cache)
2. World-age partitioned, type-specialized non-native code instances (the
   GPUCompiler use case)
3. World-age partitioned, type-specialized non-inferred code (e.g.
   generated function results, some expensive-to-compute intermediate
   results in external absint)
4. World-age partitioned, finer-than-type (think constant arguments,
   return values, more fancy external absint specializations) native and
   non-native code instances
5. World-age partitioned, type-specialized derived code instances (e.g.
   effect preconditions, see RFC: Effect Preconditions - or - the future of @inbounds #50641)

Now, some of these are expected to be compiled by the standard julia
execution engine (1, 4, 5), some of these have ABIs that match the
type specialization (1, 2, 3), but generally they are not all the same.
Most of these are invalidated along with the original method instance,
but not all. Additionally, some of these (1, 4, 5) likely have more
edges than the default method instance, leading to over-invalidation.

Recently, we added the owner field to CodeInstance, which allowed us
to put all of these into the cache, but that didn't give them support
to be compiled/executed. I tried to fix that in #52797, but we didn't
manage to figure out good precompile semantics, so that stalled.

This PR pulls up the owner field one level into the type tag of
MethodSpecialization. This is partly to save the extra pointer in
every CodeInstance, but also to allow partitioning the edges between
native MethodInstances and those used by external abstract interpreters.
There's a few different usage modes:

1. The external absint sets def to a Method. In this case, the set of
   edges is completely partitioned between the internal and external
   absint and can be managed according to absint requirements (e.g. this
   makes sense if the external absint is using an overlay table).
2. The external absint sets def to another MethodSpecialization. In
   this case the set of edges is extended. This is intended to be used
   by absints that wrap another absint and produce more fine-grained
   specializations.
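
The two modes can be sketched with mock types as follows. This is an illustration under stated assumptions, not the PR's implementation: the `edges` field, the tag names, and `effective_edges` are all hypothetical, and the edge logic itself is described in this PR as not yet implemented.

```julia
# Mock stand-ins; names and fields are assumptions made for this sketch.
struct MockMethod
    name::Symbol
end

mutable struct MockSpecialization{D}
    const def::Union{MockMethod, MockSpecialization}
    edges::Vector{Symbol}   # hypothetical: edges recorded on this specialization
    data::D
end

struct NativeTag end
struct OverlayTag end   # absint with its own edge set (mode 1)
struct RefinedTag end   # absint wrapping another absint (mode 2)

# Edges relevant to `spec`: its own, plus those of every parent
# specialization reachable through `def`.
function effective_edges(spec::MockSpecialization)
    edges = copy(spec.edges)
    parent = spec.def
    while parent isa MockSpecialization
        append!(edges, parent.edges)
        parent = parent.def
    end
    return edges
end

m = MockMethod(:f)
native  = MockSpecialization{NativeTag}(m, [:g], NativeTag())
overlay = MockSpecialization{OverlayTag}(m, [:gpu_g], OverlayTag())    # def is a Method
refined = MockSpecialization{RefinedTag}(native, [:h], RefinedTag())  # def is a parent
```

Here `effective_edges(overlay)` contains only `:gpu_g`, so it is fully partitioned from the native edges, while `effective_edges(refined)` contains `:h` followed by the inherited `:g`.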

Additionally, all non-MethodInstance MethodSpecializations are allowed
in Expr(:invoke). There's some TBD still for how to handle recovery
on reload, but in principle everything should just go through. This thus
closes #52797 as it addresses the same use case, but with proper edge
tracking.

Currently, there are two D tags that the runtime system uses:
DefaultSpec, which holds the memoization of sparams and the various
lock bits that the runtime uses, and UninferredSpec, which is a singleton
and replaces the owner === :uninferred CodeInstances introduced in #54362.
I anticipate extending this further for effect preconditions and various
finer-than-type specializations in Base.

Current Status

This PR changes the data structures, but does not yet provide the
Core.Compiler utilities for cache lookup in non-default
MethodSpecializations. That's on my immediate to-do list.
Additionally, the new edge/invalidation logic described above
is not yet implemented. I also haven't tried the #52797 replacement
yet to make sure it actually works properly. I'm putting this up
as a draft to make sure that all relevant package developers have
a chance to complain if I missed something important.

vchuravy

Recently, we added the owner field to CodeInstance, which allowed us
to put all of these into the cache, but that didn't give them support
to be compiled/executed.

Making them compilable/executable is #52964 since that requires tracking the current compiler instance through dynamic dispatch.

I will need to see how this actually interacts with foreign abstract interpreters. While working on #52964 I often needed to know the origin of a CodeInstance. I suppose in your proposal we would add a GPUSpec or ForeignSpec? But right now, in many places the code assumes that it is DefaultSpec.

Part of my goal with owner is that we could re-use the code in many places and reduce the reliance on solutions like spinning up a separate OrcJIT.

One ask that Jeff had for #52964 was whether we could provide a "separate MethodTable"; outside of generic dispatch, that currently works.

vchuravy · 2024-05-06T21:55:37Z

src/precompile_utils.c

-        if (codeinst->owner != jl_nothing) {
-            // TODO(vchuravy) native code caching for foreign interpreters
-        }
-        else if (jl_atomic_load_relaxed(&codeinst->invoke) != jl_fptr_const_return) {


This was needed so that CUDA or other foreign CodeInstances don't bleed into the native cache. I am unsure how this alternative handles this.

The idea is to only handle the DefaultSpec here. Additional work may be required in various places to disambiguate.

Yeah, you will have to filter out everything that isn't "DefaultSpec".
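
A rough sketch of the filtering discussed in this thread, written in Julia with mock types rather than against the C code in src/precompile_utils.c. `MockGPUSpec` and `native_only` are hypothetical names used only for illustration.

```julia
# Mock tags and specialization type; assumptions for illustration only.
struct MockDefaultSpec end
struct MockGPUSpec end   # hypothetical foreign tag

struct MockSpecialization{D}
    name::Symbol
    data::D
end

# Keep only default-tagged entries, so foreign specializations
# never reach the native cache.
native_only(specs) = filter(s -> s isa MockSpecialization{MockDefaultSpec}, specs)

specs = Any[
    MockSpecialization(:f, MockDefaultSpec()),
    MockSpecialization(:f, MockGPUSpec()),
    MockSpecialization(:g, MockDefaultSpec()),
]
kept = native_only(specs)
```

In this sketch `kept` holds the two default-tagged entries and drops the GPU-tagged one.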

Keno · 2024-05-06T22:31:16Z

Making them compilable/executable is #52964 since that requires tracking the current compiler instance through dynamic dispatch.

It requires tracking them through dynamic dispatch if you want dynamic dispatch to do something other than the default. That's an orthogonal feature to this. This just lets you have multiple specializations with different ABIs for one particular method. They may :invoke each other, but if they dynamically dispatch, that still goes through the default compiler. #52964 is still desirable, but a separate concern (though it may be able to re-use part of the mechanism here).

I suppose in your proposal we would add a GPUSpec or ForeignSpec?

Yes

But right now, in many places the code assumes that it is DefaultSpec.

Yes, I need to go through and disambiguate the assumptions, but that's a fair bit of work, so I wanted to get agreement on the direction first.

Part of my goal with owner is that we could re-use the code in many places and reduce the reliance on solutions like spinning up a separate OrcJIT.

That's still in scope, but again, not addressed by this. I think the cleanest way to address that is to also make CodeInstance parametric, so that the .inferred field can be something other than julia IR (e.g. LLVM IR) with corresponding helpers that know how to serialize that.

vchuravy · 2024-05-07T13:48:33Z

That's still in scope, but again, not addressed by this. I think the cleanest way to address that is to also make CodeInstance parametric, so that the .inferred field can be something other than julia IR (e.g. LLVM IR) with corresponding helpers that know how to serialize that.

Yeah the challenge here is to figure out what fields have meaning and what is needed.

What are the requirements for something going through the native pipeline (but not wanting to poison it)? The ci->owner goal was to maximize re-use up until it becomes hard.

Almost everyone reuses inferred to store the result of the high-level pipeline.
Most GPUCompiler targets don't need specptr & co, but Enzyme would reuse those
if we add the option to customize the LLVM pipeline.

aviatesk · 2024-05-14T15:58:43Z

There's a few different usage modes:

The external absint sets def to a Method. In this case, the set of
edges is completely partitioned between the internal and external
absint and can be managed according to absint requirements (e.g. this
makes sense if the external absint is using an overlay table)

The external absint sets def to another MethodSpecialization. In
this case the set of edges is extended. This is intended to be used
by absints that wrap another absint and produce more fine-grained
specializations.

Is this behavior already implemented, or is it something that needs to be implemented?

Also, I'm curious about how this change should affect the existing cache overload system based on cache_owner. If external abstract interpreters use InternalCodeCache, cache_owner no longer seems necessary. And if they use an external code cache, only code_cache is needed, which makes the system look like it did before.

Keno · 2024-05-15T04:06:22Z

Is this behavior already implemented, or is it something that needs to be implemented?

It's partially implemented. Some caching/invalidation/precompile logic is missing.

Also, I'm curious about how this change should affect the existing cache overload system based on cache_owner

InternalCodeCache takes a specialization type, which replaces the cache_owner mechanism.

vchuravy · 2024-05-15T05:31:13Z

base/compiler/cicache.jl

@@ -73,7 +101,7 @@ function setindex!(wvc::WorldView{InternalCodeCache}, ci::CodeInstance, mi::Meth
 end

 function code_cache(interp::AbstractInterpreter)


Should this now be code_cache(interp::NativeInterpreter, ...)? Otherwise a foreign interp will silently leak information into the native cache.

Sure, we can force external absints to explicitly declare their cache.
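
One way to picture this suggestion, as a sketch with mock types: all `Mock*` names are hypothetical, and the real `code_cache` and `InternalCodeCache` signatures in this PR may differ. The idea is that if no fallback method exists for the abstract interpreter type, an interpreter that has not declared its cache fails loudly instead of silently sharing the native one.

```julia
# Mock interpreter hierarchy; assumptions for illustration only.
abstract type MockInterpreter end
struct MockNativeInterpreter <: MockInterpreter end
struct MockGPUInterpreter <: MockInterpreter end
struct MockUndeclaredInterpreter <: MockInterpreter end

struct MockDefaultSpec end
struct MockGPUSpec end

# A cache handle keyed on the specialization tag type.
struct MockCodeCache
    spec_type::DataType
end

# No method for the abstract supertype: each interpreter must declare its cache.
code_cache(::MockNativeInterpreter) = MockCodeCache(MockDefaultSpec)
code_cache(::MockGPUInterpreter) = MockCodeCache(MockGPUSpec)

# An interpreter without a declaration hits a MethodError.
undeclared_fails = try
    code_cache(MockUndeclaredInterpreter())
    false
catch err
    err isa MethodError
end
```

The trade-off is that every external interpreter needs one extra method definition, in exchange for ruling out accidental writes to the native cache.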

vchuravy · 2024-05-15T05:42:31Z

I don't yet see the end design and how it will work. An example of a non-DefaultSpec specialization would help me understand the goal.

Right now I think that the owner concept and this mechanism are orthogonal to each other. In particular, I am worried that the polymorphism here will be limited since the C code only expects DefaultSpec.

Maybe the answer is for Base to provide OtherSpec that has an owner field?

My original attempt when working on the integrated cache started with adding the owner field on the MethodInstance, but that was awkward since we use mi as a "result type" of a method query, and at least in the GPUCompiler world we first performed the query independently and then used it to drive compilation.

Keno · 2024-05-15T05:49:41Z

Right now I think that the owner concept and this mechanism are orthogonal to each other. In particular, I am worried that the polymorphism here will be limited since the C code only expects DefaultSpec.

The C code is mostly fine. I've been using it extensively with non-DefaultSpec things. There's a few places that still need to be updated, but it largely works fine.

Maybe the answer is for Base to provide OtherSpec that has an owner field?

That would be fine.

My original attempt when working on the integrated cache started with adding the owner field on the MethodInstance, but that was awkward since we use mi as a "result type" of a method query, and at least in the GPUCompiler world we first performed the query independently and then used it to drive compilation.

Yeah, it's slightly awkward, but you can still do that and just look through the next field as InternalCodeCache does in this PR.
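
As a rough illustration of "looking through the next field", here is a sketch with mock types. `find_spec` and the `Mock*` names are hypothetical, and using `nothing` to terminate the chain is an assumption of this sketch; it is not how InternalCodeCache is written in this PR.

```julia
# Mock tags; assumptions for illustration only.
struct MockDefaultSpec end
struct MockGPUSpec end

mutable struct MockSpecialization{D}
    const specTypes::Type{<:Tuple}
    data::D
    # Untagged on purpose: one chain can mix different tags.
    next::Union{Nothing, MockSpecialization}
end

# Walk the chain starting at `head` and return the first entry tagged `D`.
function find_spec(::Type{D}, head::Union{Nothing, MockSpecialization}) where {D}
    cur = head
    while cur !== nothing
        cur isa MockSpecialization{D} && return cur
        cur = cur.next
    end
    return nothing
end

# Query result first (the default-tagged entry), then look up the GPU entry.
gpu = MockSpecialization{MockGPUSpec}(Tuple{Int}, MockGPUSpec(), nothing)
mi  = MockSpecialization{MockDefaultSpec}(Tuple{Int}, MockDefaultSpec(), gpu)
found = find_spec(MockGPUSpec, mi)
```

This keeps the "query first, compile later" workflow: the method query yields the default-tagged entry, and the foreign compiler walks from there to its own specialization.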

Keno requested review from vchuravy, vtjnash, maleadt and aviatesk May 6, 2024 08:15

vchuravy reviewed May 6, 2024


Keno force-pushed the kf/54360 branch from b58ed0d to a3c1309 Compare May 7, 2024 01:02

Keno force-pushed the kf/mirefactor branch from 3c3378e to b56d0a7 Compare May 7, 2024 01:06

Base automatically changed from kf/54360 to master May 7, 2024 17:01

Keno mentioned this pull request May 7, 2024

Allow CodeInstance in Expr(:invoke) #52797

Closed

Keno added 6 commits May 14, 2024 05:01

Restructure MethodInstance

749eaa9

MethodInstance -> MethodSpecialization{D}

5189243

rm CodeInstance owner

3144f8e

Fixes

5d856dc

InternalCodeCache update

9ba6ef7

fixup

cb66300

Keno force-pushed the kf/mirefactor branch from 3cc128a to cb66300 Compare May 14, 2024 05:02

vchuravy reviewed May 15, 2024


vtjnash added the needs nanosoldier run This PR should have benchmarks run on it label May 15, 2024
