RFC: Refactor MethodInstance to allow for more general specialization #54373

Draft
wants to merge 6 commits into
base: master

Member

@Keno Keno commented May 6, 2024

Overview

This refactors the base MethodInstance data structure to the following:

mutable struct MethodSpecialization{D}
    # If def is a MethodSpecialization, inherits edges from parent
    const def::Union{Module, Method, MethodSpecialization}
    specTypes::Type{<:Tuple} # To be renamed `abi` in the future
    backedges::Vector
    cache::CodeInstance
    next::MethodSpecialization # N.B.: No {D}
    data::D
end

struct DefaultSpec
    sparam_vals::SimpleVector
    inInference::UInt8
    cache_with_orig::UInt8
    precompiled::UInt8
end

struct UninferredSpec; end # Replaces owner === :uninferred

const MethodInstance = MethodSpecialization{DefaultSpec}

The owner field of CodeInstance is removed in favor of using
separate toplevel MethodInstances.
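
As a rough illustration of how a type tag can stand in for the owner field, here is a toy model in plain Julia. It is only a sketch: `ToySpecialization`, `ToyDefault`, `ToyUninferred`, `AbstractToySpec`, and `is_native` are hypothetical names, the abstract supertype is an assumption made to keep the toy self-contained, and most fields of the proposed struct are omitted.

```julia
# Toy model only: none of these names exist in Core or Base.
abstract type AbstractToySpec end

struct ToyDefault
    sparam_vals::Core.SimpleVector
    precompiled::Bool
end

struct ToyUninferred end  # singleton tag, analogous to UninferredSpec

mutable struct ToySpecialization{D} <: AbstractToySpec
    def::Union{Module, Method, AbstractToySpec}
    specTypes::Type
    data::D
end

const ToyMethodInstance = ToySpecialization{ToyDefault}

# Dispatch on the tag replaces a runtime check such as `owner === :uninferred`.
is_native(::ToySpecialization{ToyDefault}) = true
is_native(::ToySpecialization) = false

toyf(x) = x + 1
m = first(methods(toyf))
native = ToySpecialization(m, Tuple{typeof(toyf), Int}, ToyDefault(Core.svec(), false))
uninferred = ToySpecialization(m, Tuple{typeof(toyf), Int}, ToyUninferred())
```

In this toy, `native isa ToyMethodInstance` holds while `uninferred` has a different concrete type, so code that only understands the default tag can select it by type.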

Motivation

This refactor aims to unify a number of recent requirements on the
internal cache. Broadly speaking, we'd like to cache (with proper
invalidation and world age semantics) several classes of data:

  1. World-age partitioned, type-specialized native code instances (the
    traditional MethodInstance/CodeInstance cache)
  2. World-age partitioned, type-specialized non-native code instances (the
    GPUCompiler use case)
  3. World-age partitioned, type-specialized non-inferred code (e.g.
    generated function results, some expensive-to-compute intermediate
    results in external absint)
  4. World-age partitioned, finer-than-type (think constant arguments,
    return values, more fancy external absint specializations) native and
    non native code instances
  5. World-age partitioned, type-specialized derived code instance (e.g.
    effect preconditions, see RFC: Effect Preconditions - or - the future of @inbounds #50641)

Now, some of these are expected to be compiled by the standard Julia
execution engine (1, 4, 5), some of these have ABIs that match the
type specialization (1, 2, 3), but generally they are not all the same.
Most of these are invalidated along with the original method instance,
but not all. Additionally, some of these (1, 4, 5) likely have more
edges than the default method instance, leading to over-invalidation.

Recently, we added the owner field to CodeInstance, which allowed us
to put all of these into the cache, but that didn't give them support
to be compiled/executed. I tried to fix that in #52797, but we didn't
manage to figure out good precompile semantics, so that stalled.

This PR pulls up the owner field one level into the type tag of
MethodSpecialization. This is partly to save the extra pointer in
every CodeInstance, but also to allow partitioning the edges between
native MethodInstances and those used by external abstract interpreters.
There are a few different usage modes:

  • The external absint sets def to a Method. In this case, the set of
    edges is completely partitioned between the internal and external
    absint and can be managed according to absint requirements (e.g. this
    makes sense if the external absint is using an overlay table)
  • The external absint sets def to another MethodSpecialization. In
    this case the set of edges is extended. This is intended to be used
    by absints that wrap another absint and produce more fine-grained
    specializations.
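
The two modes can be sketched with a toy model. This is an illustration under assumptions, not the PR's implementation (the edge logic is still to be written): `ToySpec`, `effective_edges`, and the tag types are hypothetical, a `Symbol` stands in for a `Method`, and edges are plain symbols.

```julia
# Toy model only; every name here is hypothetical.
abstract type AbstractToySpec end

mutable struct ToySpec{D} <: AbstractToySpec
    def::Union{Symbol, AbstractToySpec}  # Symbol stands in for a Method
    edges::Vector{Symbol}
    data::D
end

struct NativeTag end
struct OverlayTag end
struct RefinedTag end

# Edges relevant to `spec`: its own, plus those inherited through `def`
# whenever `def` is itself a specialization.
function effective_edges(spec::ToySpec)
    edges = copy(spec.edges)
    parent = spec.def
    while parent isa ToySpec
        append!(edges, parent.edges)
        parent = parent.def
    end
    return edges
end

native  = ToySpec(:f, [:caller_a], NativeTag())
overlay = ToySpec(:f, [:gpu_caller], OverlayTag())        # mode 1: def is the method
refined = ToySpec(native, [:const_caller], RefinedTag())  # mode 2: def is a specialization
```

Here `overlay` shares no edges with `native` (a full partition), while `refined` sees its own edge followed by the one it inherits from `native`.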

Additionally, all non-MethodInstance MethodSpecializations are allowed
in Expr(:invoke). There's some TBD still for how to handle recovery
on reload, but in principle everything should just go through. This thus
closes #52797 as it addresses the same use case, but with proper edge
tracking.

Currently, there are two D tags that the runtime system uses:
DefaultSpec, which has the memoization of sparams and the various
lock bits that the runtime uses, and UninferredSpec, which is a singleton
and replaces the owner === :uninferred CodeInstances introduced in #54362.
I anticipate extending this further for effect preconditions and various
finer-than-type specializations in Base.

Current Status

This PR changes the data structures, but does not yet provide the
Core.Compiler utilities for cache lookup in non-default
MethodSpecializations. That's on my immediate to-do list.
Additionally, the new edge/invalidation logic described above
is not yet implemented. I also haven't tried the #52797 replacement
yet to make sure it actually works properly. I'm putting this up
as a draft to make sure that all relevant package developers have
a chance to complain if I missed something important.

Sponsor Member

@vchuravy vchuravy left a comment

Recently, we added the owner field to CodeInstance, which allowed us
to put all of these into the cache, but that didn't give them support
to be compiled/executed.

Making them compilable/executable is #52964 since that requires tracking the current compiler instance through dynamic dispatch.

I will need to see how this actually interacts with foreign abstract interpreters. While working on #52964 I often needed to know the origin of a CodeInstance. I suppose in your proposal we would add a GPUSpec or ForeignSpec? But right now, in many places the code assumes that it is DefaultSpec.

Part of my goal with owner is that we could re-use the code in many places and reduce the reliance on solutions like spinning up a separate OrcJIT.

One ask that Jeff had for #52964 was whether we could provide a "separate MethodTable"; outside of generic dispatch, that currently works.

Comment on lines -185 to -188
if (codeinst->owner != jl_nothing) {
// TODO(vchuravy) native code caching for foreign interpreters
}
else if (jl_atomic_load_relaxed(&codeinst->invoke) != jl_fptr_const_return) {
Sponsor Member

This was needed so that CUDA or other foreign CodeInstances don't bleed into the native cache. I am unsure how this alternative handles this.

Member Author

The idea is to only handle the DefaultSpec here. Additional work may be required in various places to disambiguate.

Sponsor Member

Yeah, you will have to filter out everything that isn't "DefaultSpec".

Member Author

Keno commented May 6, 2024

Making them compilable/executable is #52964 since that requires tracking the current compiler instance through dynamic dispatch.

It requires tracking them through dynamic dispatch if you want dynamic dispatch to do something other than the default. That's an orthogonal feature to this. This just lets you have multiple specializations with different ABIs for one particular method. They may :invoke each other, but if they dynamically dispatch, that still goes through the default compiler. #52964 is still desirable, but a separate concern (though it may be able to re-use part of the mechanism here).

I suppose in your proposal we would add a GPUSpec or ForeignSpec?

Yes

But right now, in many places the code assumes that it is DefaultSpec.

Yes, I need to go through and disambiguate the assumptions, but that's a fair bit of work, so I wanted to get agreement on the direction first.

Part of my goal with owner is that we could re-use the code in many places and reduce the reliance on solutions like spinning up a separate OrcJIT.

That's still in scope, but again, not addressed by this. I think the cleanest way to address that is to also make CodeInstance parametric, so that the .inferred field can be something other than julia IR (e.g. LLVM IR) with corresponding helpers that know how to serialize that.
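
A minimal sketch of what a parametric CodeInstance with per-payload serialization helpers might look like, assuming nothing about the eventual design: `ToyCodeInstance`, `ToyJuliaIR`, `ToyLLVMIR`, and `toy_serialize` are hypothetical names, and strings stand in for real IR.

```julia
# Toy model only; all names are hypothetical.
struct ToyJuliaIR
    stmts::Vector{String}
end

struct ToyLLVMIR
    text::String
end

struct ToyCodeInstance{I}
    inferred::I   # payload type is chosen by whoever owns the instance
end

# Each payload type supplies its own serializer; dispatch selects it.
toy_serialize(ir::ToyJuliaIR) = join(ir.stmts, "\n")
toy_serialize(ir::ToyLLVMIR) = ir.text
toy_serialize(ci::ToyCodeInstance) = toy_serialize(ci.inferred)

julia_ci = ToyCodeInstance(ToyJuliaIR(["%1 = x + 1", "return %1"]))
llvm_ci  = ToyCodeInstance(ToyLLVMIR("ret i64 %1"))
```

The point of the sketch is only that generic cache code can call one helper and leave the payload format to the type parameter.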

Sponsor Member

vchuravy commented May 7, 2024

That's still in scope, but again, not addressed by this. I think the cleanest way to address that is to also make CodeInstance parametric, so that the .inferred field can be something other than julia IR (e.g. LLVM IR) with corresponding helpers that know how to serialize that.

Yeah the challenge here is to figure out what fields have meaning and what is needed.

What are the requirements for something going through the native pipeline (but not wanting to poison it)? The ci->owner goal was to maximize re-use up until it becomes hard.

Almost everyone reuses inferred to store the result of the high-level pipeline.
Most GPUCompiler targets don't need specptr & co, but Enzyme would reuse those
if we add the option to customize the LLVM pipeline.

Base automatically changed from kf/54360 to master May 7, 2024 17:01
@aviatesk
Sponsor Member

There are a few different usage modes:

  • The external absint sets def to a Method. In this case, the set of
    edges is completely partitioned between the internal and external
    absint and can be managed according to absint requirements (e.g. this
    makes sense if the external absint is using an overlay table)
  • The external absint sets def to another MethodSpecialization. In
    this case the set of edges is extended. This is intended to be used
    by absints that wrap another absint and produce more fine-grained
    specializations.

Is this behavior already implemented, or is it something that needs to be implemented?

Also, I'm curious about how this change should affect the existing cache overload system based on cache_owner. If external abstract interpreters use InternalCodeCache, cache_owner no longer seems to be necessary. And if they use an external code cache, only code_cache will be needed, so this makes the system look like how it was before.

Member Author

Keno commented May 15, 2024

Is this behavior already implemented, or is it something that needs to be implemented?

It's partially implemented. Some caching/invalidation/precompile logic is missing.

Also, I'm curious about how this change should affect the existing cache overload system based on cache_owner

InternalCodeCache takes a specialization type, which replaces the cache_owner mechanism.

@@ -73,7 +101,7 @@ function setindex!(wvc::WorldView{InternalCodeCache}, ci::CodeInstance, mi::Meth
end

function code_cache(interp::AbstractInterpreter)
Sponsor Member

Should this now be code_cache(interp::NativeInterpreter, ...)? Otherwise a foreign interp will silently leak information into the native cache.

Member Author

Sure, we can force external absints to explicitly declare their cache.
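
One way to picture "explicitly declare their cache" is to drop the fallback method, so that an interpreter without its own definition fails loudly instead of sharing the native cache. The sketch below is a toy under that assumption; `ToyInterpreter`, `ToyCache`, `toy_code_cache`, and the interpreter types are hypothetical and are not the Core.Compiler API.

```julia
# Toy model only; these names are hypothetical and shadow nothing in Core.Compiler.
abstract type ToyInterpreter end
struct ToyNativeInterp <: ToyInterpreter end
struct ToyGPUInterp <: ToyInterpreter end
struct ToyUndeclaredInterp <: ToyInterpreter end

struct ToyCache
    spec_tag::Symbol  # stands in for the specialization type the cache is keyed on
end

# Deliberately no method for the abstract `ToyInterpreter`:
# each interpreter has to opt in to a cache explicitly.
toy_code_cache(::ToyNativeInterp) = ToyCache(:DefaultSpec)
toy_code_cache(::ToyGPUInterp) = ToyCache(:GPUSpec)
```

With this shape, calling `toy_code_cache(ToyUndeclaredInterp())` raises a `MethodError` rather than silently returning the native cache.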

@vchuravy
Sponsor Member

I don't yet see the end design and how it will work. An example of a non-DefaultSpec might help me understand the goal.

Right now I think that the owner concept and this mechanism are orthogonal to each other. In particular I am worried that the polymorphism here will be limited since the C code only expects DefaultSpec.

Maybe the answer is for Base to provide OtherSpec that has an owner field?

My original attempt when working on the integrated cache started with adding the owner field on the MethodInstance, but that was awkward since we use mi as a "result type" of a method query and at least in the GPUCompiler world we first performed the query independently and then used it to drive compilation.

Member Author

Keno commented May 15, 2024

Right now I think that the owner concept and this mechanism are orthogonal to each other. In particular I am worried that the polymorphism here will be limited since the C code only expects DefaultSpec.

The C code is mostly fine. I've been using it extensively with non-DefaultSpec things. There's a few places that still need to be updated, but it largely works fine.

Maybe the answer is for Base to provide OtherSpec that has an owner field?

That would be fine.

My original attempt when working on the integrated cache started with adding the owner field on the MethodInstance, but that was awkward since we use mi as a "result type" of a method query and at least in the GPUCompiler world we first performed the query independently and then used it to drive compilation.

Yeah, it's slightly awkward, but you can still do that and just look through the next field as InternalCodeCache does in this PR.
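
The "look through the next field" idea can be sketched as a walk over a linked chain of specializations that picks out the entry with the wanted tag. This is a toy under assumptions, not the lookup in this PR: `ToySpec`, `find_spec`, and the tag types are hypothetical, and `nothing` is used here to terminate the chain.

```julia
# Toy model only; names are hypothetical and do not match the runtime's API.
abstract type AbstractToySpec end

mutable struct ToySpec{D} <: AbstractToySpec
    data::D
    next::Union{Nothing, AbstractToySpec}  # untyped link, i.e. no {D}
end

struct DefaultTag end
struct GPUTag end

# Walk the chain starting at `spec` and return the first entry tagged `D`,
# or `nothing` if the chain has no such entry.
function find_spec(spec::Union{Nothing, AbstractToySpec}, ::Type{D}) where {D}
    while spec !== nothing
        spec isa ToySpec{D} && return spec
        spec = spec.next
    end
    return nothing
end

gpu = ToySpec(GPUTag(), nothing)
mi  = ToySpec(DefaultTag(), gpu)  # what an ordinary method query would hand back
```

A GPUCompiler-style workflow could then run the query first, obtain `mi`, and call `find_spec(mi, GPUTag)` afterwards to reach its own specialization.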

@vtjnash vtjnash added the needs nanosoldier run This PR should have benchmarks run on it label May 15, 2024