Basic use of StaticArrays and begin ... end leads to GC error (probable corruption) #54422

Open
kbarros opened this issue May 9, 2024 · 6 comments · May be fixed by #54433
Labels
backport 1.10: Change should be backported to the 1.10 release
backport 1.11: Change should be backported to release-1.11
compiler:codegen: Generation of LLVM IR and native code
GC: Garbage collector
kind:bug: Indicates an unexpected problem or unintended behavior

Comments

@kbarros

kbarros commented May 9, 2024

Version Info

Can reproduce with both Julia 1.10.3 and 1.11.0-beta1 with StaticArrays@1.9.3. Cannot reproduce on Julia 1.9. Reproduced on two different Mac laptops.

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)

Installed with juliaup:

juliaup status
 Default  Channel  Version                                Update 
-----------------------------------------------------------------
          1.9      1.9.4+0.aarch64.apple.darwin14                
          beta     1.11.0-beta1+0.aarch64.apple.darwin14         
       *  release  1.10.3+0.aarch64.apple.darwin14    

MWE

Executing the following in Julia 1.10 or 1.11 will (usually) crash the terminal with a GC error (probable corruption).

# minimal_script_for_crash.jl

using StaticArrays

dims = (2, 2, 2)
si = SVector(3.2)

B = zeros(1)
A = zeros(1)

GC.gc()

begin
    dims = (2, 2, 2)
    for i in eachindex(A)
        B[i] = A[i]
    end
end

GC.gc()
GC error (probable corruption)
Allocations: 918415 (Pool: 917408; Big: 1007); GC: 2
<?#0x109498aa0::0x0>

thread 0 ptr queue:
~~~~~~~~~~ ptr queue top ~~~~~~~~~~
Task(next=nothing, queue=nothing, storage=Base.IdDict{Any, Any}(ht=Array{Any, (32,)}[
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  :SOURCE_PATH,
  "/Users/kbarros/Desktop/debug/minimal_script_for_crash.jl",
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>], count=1, ndel=8), donenotify=nothing, result=nothing, logstate=nothing, code=#<null>, rngState0=0xc24ed66f14aaa175, rngState1=0x3a832cbdd1879a3b, rngState2=0xc0df06be2ad7e4ec, rngState3=0x7a8ff91b83216bb0, rngState4=0xa0157dbde05f2e94, _state=0x00, sticky=true, _isexception=false, priority=0x0000)
==========
Core.Binding(value=Main, globalref=Main.Main, owner=<circular reference @-1>, ty=Any, flags=0x03)
==========
Core.Binding(value=Core, globalref=Main.Core, owner=<circular reference @-1>, ty=Any, flags=0x01)
==========
Core.Binding(value=#<null>, globalref=Main.getfield, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=Base, globalref=Main.Base, owner=<circular reference @-1>, ty=#<null>, flags=0x01)
==========
Core.Binding(value=#<null>, globalref=Main.Float64, owner=Core.Binding(value=Float64, globalref=Core.Float64, owner=<circular reference @-1>, ty=Any, flags=0x03), ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.Float32, owner=Core.Binding(value=Float32, globalref=Core.Float32, owner=<circular reference @-1>, ty=Any, flags=0x03), ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.Float16, owner=Core.Binding(value=Float16, globalref=Core.Float16, owner=<circular reference @-1>, ty=Any, flags=0x03), ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.remotecall_wait, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.Distributed, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.procs, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=Base.IdDict{Any, Any}(ht=Array{Any, (32,)}[
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  Base.Docs.Binding(mod=Main, var=:Main),
  Base.Docs.MultiDoc(order=Array{Type, (1,)}[Union{}], docs=Base.IdDict{Any, Any}(ht=Array{Any, (32,)}[
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  Union{},
  Base.Docs.DocStr(text=svec("    Main

`Main` is the top-level module, and Julia starts with `Main` set as the current module.  Variables defined at the prompt go in `Main`, and `varinfo` lists variables in `Main`.
```jldoctest
julia> @__MODULE__
Main
```
"), object=nothing, data=Base.Dict{Symbol, Any}(slots=Array{UInt8, (16,)}[0x00, 0xc2, 0x00, 0xa8, 0xda, 0xbb, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00], keys=Array{Symbol, (16,)}[
  #<null>,
  :typesig,
  #<null>,
  :module,
  :linenumber,
  :binding,
  :path,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>], vals=Array{Any, (16,)}[
  #<null>,
  Union{},
  #<null>,
  Base.BaseDocs,
  3165,
  Base.Docs.Binding(mod=Main, var=:Main),
  "docs/basedocs.jl",
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>], ndel=0, count=5, age=0x0000000000000007, idxfloor=2, maxprobe=1)),
  #<null>,
  #<null>], count=1, ndel=0)),
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>,
  #<null>], count=1, ndel=0), globalref=Main.:(##meta#58), owner=<circular reference @-1>, ty=Any, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.eval, owner=Core.Binding(value=Base.MainInclude.eval, globalref=Base.MainInclude.eval, owner=<circular reference @-1>, ty=#<null>, flags=0x01), ty=#<null>, flags=0x04)
==========
Core.Binding(value=#<null>, globalref=Main.include, owner=Core.Binding(value=Base.MainInclude.include, globalref=Base.MainInclude.include, owner=<circular reference @-1>, ty=#<null>, flags=0x01), ty=#<null>, flags=0x04)
==========
Core.Binding(value=#<null>, globalref=Main.:(@__MODULE__), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(@__FILE__), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.abspath, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.pushfirst!, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.GC, owner=Core.Binding(value=Base.GC, globalref=Base.GC, owner=<circular reference @-1>, ty=#<null>, flags=0x03), ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(@eval), owner=Core.Binding(value=Base.var"@eval", globalref=Base.:(@eval), owner=<circular reference @-1>, ty=#<null>, flags=0x03), ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(@elapsed), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.current_task, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.maximum, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.textwidth, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.string, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.Module, owner=Core.Binding(value=Module, globalref=Core.Module, owner=<circular reference @-1>, ty=Any, flags=0x03), ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(-), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(^), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(*), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(==), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(:), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.push!, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.empty!, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.LOAD_PATH, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.Cvoid, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.isfile, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(+), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.println, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.print, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.stdout, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(=>), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.IOContext, owner=Core.Binding(value=Base.IOContext{IO_t} where IO_t<:IO, globalref=Base.IOContext, owner=<circular reference @-1>, ty=Any, flags=0x03), ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(/), owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.show, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.DEPOT_PATH, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.Sys, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.Int64, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.IOBuffer, owner=Core.Binding(value=Base.GenericIOBuffer{Array{UInt8, 1}}, globalref=Base.IOBuffer, owner=<circular reference @-1>, ty=Any, flags=0x03), ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.:(@warn), owner=Core.Binding(value=Base.CoreLogging.var"@warn", globalref=Base.CoreLogging.:(@warn), owner=<circular reference @-1>, ty=#<null>, flags=0x03), ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.IJulia, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=#<null>, globalref=Main.Atom, owner=#<null>, ty=#<null>, flags=0x00)
==========
Core.Binding(value=Revise, globalref=Main.Revise, owner=<circular reference @-1>, ty=Any, flags=0x05)
==========
Core.Binding(value=Infiltrator, globalref=Main.Infiltrator, owner=<circular reference @-1>, ty=Any, flags=0x05)
==========
Core.Binding(value=Main.template, globalref=Main.template, owner=<circular reference @-1>, ty=#<null>, flags=0x01)
==========
Core.Binding(value=typeof(Main.template), globalref=Main.:(#template), owner=<circular reference @-1>, ty=Any, flags=0x01)
==========
Core.Binding(value=StaticArrays, globalref=Main.StaticArrays, owner=<circular reference @-1>, ty=Any, flags=0x05)
==========
~~~~~~~~~~ ptr queue bottom ~~~~~~~~~~

[67809] signal (6): Abort trap: 6
in expression starting at /Users/kbarros/Desktop/debug/minimal_script_for_crash.jl:20
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 918415 (Pool: 917408; Big: 1007); GC: 2
zsh: abort      julia minimal_script_for_crash.jl

The use of GC.gc() makes the crashes more reproducible, but we originally observed this "in the wild" without those calls.

Almost any tweak to this code will make the crash go away: for example, removing the begin ... end wrapper around the final block, changing the SVector to a Tuple allocation, or removing the two assignments to dims.
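
As an illustration of one of those tweaks, here is a sketch of the script with the SVector swapped for a plain Tuple, which also drops the StaticArrays dependency. It is only the variant described above, reported not to crash; it does not address the underlying problem.

```julia
# Variant of minimal_script_for_crash.jl with the SVector replaced by a Tuple.
# Per the report above, this change made the crash go away.

dims = (2, 2, 2)
si = (3.2,)          # was: SVector(3.2)

B = zeros(1)
A = zeros(1)

GC.gc()

begin
    dims = (2, 2, 2)
    for i in eachindex(A)
        B[i] = A[i]
    end
end

GC.gc()
```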

kbarros changed the title from "Basic use of StaticArrays leads to GC error (probable corruption)" to "Basic use of StaticArrays and begin ... end leads to GC error (probable corruption)" on May 9, 2024
@vtjnash
Sponsor Member

vtjnash commented May 9, 2024

It looks like we did something odd when assigning (2, 2, 2) (allocated during inference) to a binding (at runtime), as it was GC'd in the meantime:

(rr) p jl_(obj8_parent)
Core.Binding(value=<?#0x7fce30124690::<?#0x7fce301246a0::(nil)>>, globalref=Main.dims, owner=<circular reference @-1>, ty=Any, flags=0x00)

@vtjnash
Sponsor Member

vtjnash commented May 9, 2024

I think what happened here is that llvm-late-lowering.cpp assumes code must be permanently allocated, so it skips emitting the expected write barrier for this store (of a constant value into the slot for the binding value). That object then gets GC'd after it is stored there, and the next GC segfaults on the bad object in that slot. I couldn't make a small reproducer, but I can show a similar example of the problematic IR pattern:

julia> f() = (global dims = (2,2,2); nothing);

julia> @code_llvm raw=true f()
; Function Signature: f()
;  @ REPL[16]:1 within `f`
define swiftcc void @julia_f_1476(ptr nonnull swiftself %pgcstack) #0 !dbg !5 {
top:
  store atomic ptr @"jl_global#1479.jit", ptr @"*Main.dims#1478.jit" release, align 128, !dbg !20, !tbaa !21, !alias.scope !24, !noalias !27
  ret void, !dbg !20
}
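
To make the role of the write barrier concrete, here is a toy model of a generational collector. It is a sketch only: `Obj`, `Heap`, `alloc!`, `store!`, `minor_gc!`, and `demo` are hypothetical names invented for illustration and do not correspond to anything in Julia's runtime. It shows that when an old object (standing in for the binding) is made to point at a young object without the store being recorded, an incremental collection can free the young object while it is still referenced.

```julia
# Toy model of a generational collector; NOT Julia's actual GC.
mutable struct Obj
    name::String
    old::Bool                    # true once promoted to the old generation
    ref::Union{Obj,Nothing}      # single outgoing reference
end

struct Heap
    young::Vector{Obj}           # objects allocated since the last collection
    remembered::Vector{Obj}      # old objects that may point at young ones
end
Heap() = Heap(Obj[], Obj[])

function alloc!(h::Heap, name)
    o = Obj(name, false, nothing)
    push!(h.young, o)
    return o
end

# Store `child` into `parent.ref`. With `barrier=true`, an old parent that
# now points at a young child is recorded so the next minor GC rescans it.
function store!(h::Heap, parent::Obj, child::Obj; barrier::Bool)
    parent.ref = child
    if barrier && parent.old && !child.old
        push!(h.remembered, parent)
    end
    return nothing
end

# Minor (incremental) collection. Old roots are not rescanned; only young
# roots and the targets of remembered old objects are traced.
# Returns the names of the young objects that were freed.
function minor_gc!(h::Heap, roots::Vector{Obj})
    live = Set{Obj}()
    worklist = Obj[]
    for r in roots
        r.old || push!(worklist, r)
    end
    for p in h.remembered
        p.ref === nothing || push!(worklist, p.ref)
    end
    while !isempty(worklist)
        o = pop!(worklist)
        (o.old || o in live) && continue
        push!(live, o)
        o.ref === nothing || push!(worklist, o.ref)
    end
    freed = [o.name for o in h.young if !(o in live)]
    for o in live
        o.old = true             # survivors are promoted
    end
    empty!(h.young)
    empty!(h.remembered)
    return freed
end

function demo(; barrier::Bool)
    h = Heap()
    binding = alloc!(h, "binding")
    minor_gc!(h, [binding])            # binding survives and becomes old
    value = alloc!(h, "value")         # freshly allocated, still young
    store!(h, binding, value; barrier)
    return minor_gc!(h, [binding])     # names freed by the next minor GC
end

freed_with_barrier = demo(barrier=true)
freed_without_barrier = demo(barrier=false)
```

With the barrier, the second collection frees nothing; without it, `"value"` is freed even though the old object still refers to it, which is the dangling-slot situation described above.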

@gbaraldi
Member

gbaraldi commented May 9, 2024

It also gets an alignment of 128, which is bad but is fixed on master. I'm not sure if it's missing a write barrier or if the code here is missing a root, because if the code can keep a reference to it in any way, it has to be perma-rooted.

@vtjnash
Sponsor Member

vtjnash commented May 9, 2024

MWE

global dims # allocate the Binding
GC.gc(); GC.gc(); # force the binding to be old
GC.enable(false); # prevent new objects from being old
@eval begin
       Base.Experimental.@force_compile # use the compiler
       dims = $([])
       nothing
end
GC.enable(true); GC.gc(false) # incremental collection
@show dims # any interaction with `dims` will likely cause a segfault (or other incorrect results) from somewhere

@KristofferC
Sponsor Member

KristofferC commented May 9, 2024

> any interaction with dims will likely cause a segfault (or other incorrect results) from somewhere

julia> global dims # allocate the Binding

julia> GC.gc(); GC.gc(); # force the binding to be old


julia> GC.enable(false); # prevent new objects from being old

julia> @eval begin
              Base.Experimental.@force_compile # use the compiler
              dims = $([])
              nothing
       end

julia> GC.enable(true); GC.gc(false) # incremental collection

julia> @show dims
dims = Any[]
Any[]

julia> dims
Any[]

julia> dims
Any[]

julia> dims
Any[]

julia> dims
Any[]

on v"1.11.0-beta1"

@vtjnash
Sponsor Member

vtjnash commented May 9, 2024

ah, I tried to simplify too much. You need to first assign a type to dims to trigger it.

julia> global dims = [] # allocate the Binding
Any[]

julia> GC.gc(); GC.gc(); # force the binding to be old

julia> GC.enable(false); # prevent new objects from being old

julia> @eval begin
              Base.Experimental.@force_compile # use the compiler
              dims = $([])
              nothing
       end

julia> GC.enable(true); GC.gc(false) # incremental collection

julia> dims
"enumerate" 😱 

vtjnash added the compiler:codegen, GC, kind:bug, backport 1.10, and backport 1.11 labels on May 9, 2024

4 participants