Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Virtual methods that return pointers should only be implementable through an unsafe trait #649

Open
lilizoey opened this issue Mar 27, 2024 · 17 comments
Labels
bug c: register Register classes, functions and other symbols to GDScript ub Undefined behavior

Comments

@lilizoey
Copy link
Member

Currently when generating virtual methods that take pointers as arguments or return pointers we label the method itself unsafe. However generally speaking, a method that returns a pointer shouldn't cause any UB if it's called wrong. For instance Box::into_raw is a safe method. So it's wrong (though generally harmless) for methods that only return pointers, but doesn't take them as arguments, to be labelled as unsafe.

However when someone implements this trait, they must ensure that the pointer they return from the method satisfies certain criteria. Otherwise they may cause UB. This means that the trait the method is in must be an unsafe trait.

@lilizoey lilizoey added bug c: register Register classes, functions and other symbols to GDScript ub Undefined behavior labels Mar 27, 2024
@lilizoey
Copy link
Member Author

I can see a couple of solutions to this:

  1. Label the I* trait that these methods are in as unsafe
  2. Create a separate trait called something like UnsafeI* for these methods.
  3. Don't codegen methods like this, instead we manually check each one and create a custom virtual method

We should check how many methods this applies to. If there are very few then the third option would be fairly viable, because sometimes we can create a safe custom virtual method (see for instance my comment on IScriptExtension::instance_create). However if there are a lot then one of the first two options may be better.

The first option is definitely simplest, but could cause an issue where someone who doesn't care about overriding the methods in question would suddenly need to add unsafe to implement a trait where this isn't even required at all.

The second option limits how much people need to use unsafe but could make it harder for people to find the methods. However these are unsafe APIs so it may be fine to just assume that people who want to use these lower level apis need to dig a bit more to find them.

@Bromeon
Copy link
Member

Bromeon commented Mar 27, 2024

However when someone implements this trait, they must ensure that the pointer they return from the method satisfies certain criteria. Otherwise they may cause UB. This means that the trait the method is in must be an unsafe trait.

I wrote a comment about this here:

// Functions are marked unsafe as soon as raw pointers are involved, irrespectively of whether they appear in parameter or return type
// position. In cases of virtual functions called by Godot, a returned pointer must be valid and of the expected type. It might be possible
// to only use `unsafe` for pointers in parameters (for outbound calls), and in return values (for virtual calls). Or technically more
// correct, make the entire trait unsafe as soon as one function can return pointers, but that's very unergonomic and non-local.
// Thus, let's keep things simple and more conservative.


  1. Label the I* trait that these methods are in as unsafe

While "technically correct", this approach is super impractical:

  1. A user that never overrides the method returning pointers still has to mark the interface trait unsafe.
  2. If Godot in the future adds a method that returns pointers, this breaks all existing impl code.
  3. It is non-local. If you mark the trait unsafe and have 10 overridden methods, of which one involves actual unsafety, how do you know which method needs to take extra care?

  1. Create a separate trait called something like UnsafeI* for these methods.

That might be an option, but we'd need to see how much this bloats the API. Since virtual methods are duplicated across all derived classes, this can cause a proliferation of extra traits just in support of one method.


  1. Don't codegen methods like this, instead we manually check each one and create a custom virtual method

I like this 🙂 For the user, it would likely be more ergonomic if no unsafety is involved at all. But I could imagine there are cases where this can be desired for more low-level control; would need to look more closely into the affected APIs.


I also have another possible idea:

  1. Change the signature so that returning the pointers becomes an unsafe operation, rather than declaring the method.
// Instead of:
fn instance_create(&self, for_object: Gd<Object>) -> *mut c_void {
    do_thing(self, for_object)
}

// We would have:
fn instance_create(&self, for_object: Gd<Object>) -> Checked<*mut c_void> {
    let ptr: *mut c_void = do_thing(self, for_object);
    unsafe { Checked::from_raw(ptr) }
}

Not the nicest, but at least it would re-introduce locality and could be applied in a generic way.

Of course this would not necessarily exclude your suggestion 3). We could also combine the two (either by manually selecting some methods for which we have a safe counterpart, and remove them; or keep both and check that only one is overridden). But this depends a bit on how often such methods occur.

@lilizoey
Copy link
Member Author

However when someone implements this trait, they must ensure that the pointer they return from the method satisfies certain criteria. Otherwise they may cause UB. This means that the trait the method is in must be an unsafe trait.

I wrote a comment about this here:

// Functions are marked unsafe as soon as raw pointers are involved, irrespectively of whether they appear in parameter or return type
// position. In cases of virtual functions called by Godot, a returned pointer must be valid and of the expected type. It might be possible
// to only use `unsafe` for pointers in parameters (for outbound calls), and in return values (for virtual calls). Or technically more
// correct, make the entire trait unsafe as soon as one function can return pointers, but that's very unergonomic and non-local.
// Thus, let's keep things simple and more conservative.

i hadn't seen that before. i do disagree with this being "technically more correct", it is in fact currently incorrect and making the trait unsafe is one of several correct options. soundness of an api isn't a matter of degree really. except ig in how many different ways you can break it, but that's just varying degrees of incorrect not varying degrees of correctness. a particular api either is sound, all other cases are unsound.

regarding Checked

i dont think that fourth option really works to be honest. because like what is the actual safety invariant of Checked here? would it be something like
"when returned from a function, this pointer is known to be safe to use in the context that the return value of that function is intended to be used in"? that sounds incredibly vague and not very useful. it also means that now the safety invariant of Checked is tied to every place it is used in gdext.

it's also not the case that constructing the Checked can actually ever lead to any UB, the UB happens when the Checked is returned. and that's something we can't make require an unsafe block.

i dont think it's actually possible to create a coherent safety invariant for that type which doesn't just become exhaustively enumerating every possible method it can be returned from.

I'd much rather do option 2 or even option 1 over that one.

@Bromeon
Copy link
Member

Bromeon commented Mar 27, 2024

i do disagree with this being "technically more correct", it is in fact currently incorrect [...] soundness of an api isn't a matter of degree really

Not sure if you're getting hung up on the "more" here, but this incorrectness is the exact reason why I'm looking into the different alternatives now 😉 I just didn't feel like solving all problems at once, which is why I wrote that comment.

"Technically correct" still is "correct", but the context is that it's impractical to blindly follow rules without thinking about the further API implications. Instead, we should look for a solution which is both correct and ergonomic (which 1) isn't). I'm not advocating to sacrifice correctness for ergonomics.


it's also not the case that constructing the Checked can actually ever lead to any UB, the UB happens when the Checked is returned. and that's something we can't make require an unsafe block.

That's not how unsafe works though. If you violate invariants, the UB doesn't have to appear during the operation, it can occur in the future through a causal relation.

Example: Vec::set_len()


would [the safety invariant] be something like "when returned from a function, this pointer is known to be safe to use in the context that the return value of that function is intended to be used in"? that sounds incredibly vague and not very useful.

Context is important -- people would only encounter Checked in such situations. It doesn't need to be useful anywhere else.
And it can be named more explicitly, of course.


i dont think it's actually possible to create a coherent safety invariant for that type which doesn't just become exhaustively enumerating every possible method it can be returned from.

I would not even start trying this. It's fine to be vague, people need to understand the context when using it. I really don't see the problem.


I'd much rather do option 2 or even option 1 over that one.

I don't see option 1) happening, for the stated reasons.

Option 2) can be discussed but this being a niche use case, I'm not sure if someone truly benefits from extra traits. On the contrary, this may become harder to discover as people wonder where certain methods are, etc.

And code organization wise, it's still non-local. If I have multiple unsafe methods, then I need to describe their contracts in the trait description, for each:

// SAFETY:
// - why method1 is correct
// - why method2 is correct
unsafe impl UnsafeIThing for MyClass {
    fn method1() -> void* { 
        do_thing()
    }

    fn method2() -> void* { 
        do_thing()
    }
}

instead of near the place where the actual unsafety can occur:

impl IThing for MyClass {
    fn method1() -> Checked<void*> { 
        // SAFETY: why the return value is correct
        unsafe { Checked::from_raw(do_thing()) }
    }

    fn method2() -> Checked<void*> { 
        // SAFETY: why the return value is correct
        unsafe { Checked::from_raw(do_thing()) }
    }
}

Conceptually, each method has its own invariant, there is no inherent unsafety in the trait itself. To demonstrate this, one can implement the trait and override nothing, in which case there is no invariant to uphold.

Per-method unsafe blocks also avoid the problem that implementing a new method makes people forget updating the trait safety docs, etc.

@lilizoey
Copy link
Member Author

lilizoey commented Mar 27, 2024

it's also not the case that constructing the Checked can actually ever lead to any UB, the UB happens when the Checked is returned. and that's something we can't make require an unsafe block.

That's not how unsafe works though. If you violate invariants, the UB doesn't have to appear during the operation, it can occur in the future through a causal relation.

Example: Vec::set_len()

My point was that we can choose where to put safety obligations, and the obligation should be returning this type. But we can't do that without making the trait unsafe.

would [the safety invariant] be something like "when returned from a function, this pointer is known to be safe to use in the context that the return value of that function is intended to be used in"? that sounds incredibly vague and not very useful.

Context is important -- people would only encounter Checked in such situations. It doesn't need to be useful anywhere else. And it can be named more explicitly, of course.

i dont think it's actually possible to create a coherent safety invariant for that type which doesn't just become exhaustively enumerating every possible method it can be returned from.

I would not even start trying this. It's fine to be vague, people need to understand the context when using it. I really don't see the problem.

I'd much rather do option 2 or even option 1 over that one.

I don't see option 1) happening, for the stated reasons.

Option 2) can be discussed but this being a niche use case, I'm not sure if someone truly benefits from extra traits. On the contrary, this may become harder to discover as people wonder where certain methods are, etc.

And code organization wise, it's still non-local. If I have multiple unsafe methods, then I need to describe their contracts in the trait description, for each:

// SAFETY:
// - why method1 is correct
// - why method2 is correct
unsafe impl UnsafeIThing for MyClass {
    fn method1() -> void* { 
        do_thing()
    }

    fn method2() -> void* { 
        do_thing()
    }
}

instead of near the place where the actual unsafety can occur:

impl IThing for MyClass {
    fn method1() -> Checked<void*> { 
        // SAFETY: why the return value is correct
        unsafe { Checked::from_raw(do_thing()) }
    }

    fn method2() -> Checked<void*> { 
        // SAFETY: why the return value is correct
        unsafe { Checked::from_raw(do_thing()) }
    }
}

Conceptually, each method has its own invariant, there is no inherent unsafety in the trait itself. To demonstrate this, one can implement the trait and override nothing, in which case there is no invariant to uphold.

Per-method unsafe blocks also avoid the problem that implementing a new method makes people forget updating the trait safety docs, etc.

Safe methods cannot have safety invariants on their implementation unless they are in a unsafe trait. What you're suggesting is removing the safety invariants imposed on implementing the methods, and instead requiring the user to return a type that must fulfill some safety invariant separate to the method.

Conceptually i am fine with this, for instance making instance_create return an impl ScriptInstance removes that safety invariant from the trait. This specific case is also the ideal case where it's feasible, since now the method is completely safe by construction, no unsafe needed.

What i am against here is making a catch-all such return type. Because the safety invariant of a function should ideally be as minimal as possible. To make it feasible to understand and check the uses of unsafe that such a function imposes on the user. But you're suggesting making an extremely broad safety invariant which encompasses any method that happens to return a pointer.

At the very least if we're gonna do this, we can codegen separate like CheckedMethodNameHere which is specific to each method that needs it.

@Bromeon
Copy link
Member

Bromeon commented Mar 27, 2024

From a quick look, it seems only 5 virtual methods have pointer returns?

trait IScriptExtension {
  fn instance_create(&self, for_object: Gd<Object>)             -> *mut c_void;
  fn placeholder_instance_create(&self, for_object: Gd<Object>) -> *mut c_void;
}

trait ITextServerExtension {
  fn shaped_text_get_glyphs(&self, shaped: Rid)          -> *const Glyph;
  fn shaped_text_sort_logical(&self, shaped: Rid)        -> *const Glyph;
  fn shaped_text_get_ellipsis_glyphs(&self, shaped: Rid) -> *const Glyph;
}

See Godot docs for ScriptExtension and TextServerExtension.


The first 2 methods create new independent objects, which makes a safe abstraction a bit easier.

The second 3 however seem to return a pointer to internal glyphs (there are no docs, so I have to guess) -- which means they are coupled to a lifetime which we don't know.

A safe abstraction like this might be possible, but it would need more thorough investigation?

fn shaped_text_get_glyphs(&'a self, shaped: Rid)          -> &'a [Glyph];
// (lifetime for explicitness, can be elided)

@Bromeon
Copy link
Member

Bromeon commented Mar 27, 2024

(Btw, unrelated to the discussion here: I just noticed that functions accepting Rid parameters would need to be unsafe to call. But it's a simpler problem to address).

@lilizoey
Copy link
Member Author

ok so that does mean that option 3 is likely entirely viable which makes the discussion partially moot. we could still have option 2 as a fallback in cases where we havent made a custom impl.

@Bromeon
Copy link
Member

Bromeon commented Mar 28, 2024

Yeah, if we can get by with safe methods without losing any functionality, that would be the best case scenario 🙂

Maybe we could add a mechanism that stops nightly CI when there are new methods that don't have a safe counterpart, otherwise we'll forget those.


Regarding the above case, I dug a bit in the Godot source code, and the returned glyph pointers are indirectly used as follows:

TypedArray<Dictionary> TextServer::_shaped_text_get_glyphs_wrapper(const RID &p_shaped) const {
	TypedArray<Dictionary> ret;

	const Glyph *glyphs = shaped_text_get_glyphs(p_shaped);
	int gl_size = shaped_text_get_glyph_count(p_shaped);
	for (int i = 0; i < gl_size; i++) {
		Dictionary glyph;

		glyph["start"] = glyphs[i].start;
		glyph["end"] = glyphs[i].end;
		glyph["repeat"] = glyphs[i].repeat;
		glyph["count"] = glyphs[i].count;
		glyph["flags"] = glyphs[i].flags;
		glyph["offset"] = Vector2(glyphs[i].x_off, glyphs[i].y_off);
		glyph["advance"] = glyphs[i].advance;
		glyph["font_rid"] = glyphs[i].font_rid;
		glyph["font_size"] = glyphs[i].font_size;
		glyph["index"] = glyphs[i].index;

		ret.push_back(glyph);
	}

	return ret;
}

So it really looks like the pointers are just used for temporary reading, and all the contents are copied. Returning a slice with lifetime of self should thus be sound.

We can add an assertion that shaped_text_get_glyph_count() matches the length of the slice, to prevent out-of-bounds access. (Or implementing the count method ourselves, but that might be too intrusive...)

@lilizoey
Copy link
Member Author

lilizoey commented Mar 28, 2024

unfortunately we cannot rely on a user-generated count method always returning the right length of such a slice, since there's nothing stopping a user from just returning different lengths every single time. and if we were to check the length every time when calling the method then that's basically the same as us writing the count method but with extra steps.

so that leaves either us making the count method or an unsafe trait.

@Bromeon
Copy link
Member

Bromeon commented Mar 28, 2024

This should probably be fine:

// overridden by user
fn shaped_text_get_glyphs(&self, shaped: Rid) -> &[Glyph]; 

// hidden outside trait, generated when user overrides the above
fn shaped_text_get_glyph_count(&self, shaped: Rid) -> i64 {
    self.shaped_text_get_glyphs(shaped).len() as i64
}

Since the former returns a reference, it's unlikely to incur expensive computations, so calling it twice shouldn't be an issue.

@lilizoey
Copy link
Member Author

lilizoey commented Mar 28, 2024

the user is still free to return differently sized slices from shaped_text_get_glyphs for the same Rid, which could cause UB in this case.

@Bromeon
Copy link
Member

Bromeon commented Mar 28, 2024

A supervillain could theoretically implement shaped_text_get_glyphs() in a way that its size changes between the two invocations. The only way to protect against this would be to cache the size across the two Godot calls. However that is also quite brittle, as it relies on the current Godot implementation.

Currently we have the following call sequences:

// _shaped_text_get_glyphs_wrapper
	const Glyph *glyphs = shaped_text_get_glyphs(p_shaped);
	int gl_size = shaped_text_get_glyph_count(p_shaped);

// _shaped_text_sort_logical_wrapper
	const Glyph *glyphs = shaped_text_sort_logical(p_shaped);
	int gl_size = shaped_text_get_glyph_count(p_shaped);

// _shaped_text_get_ellipsis_glyphs_wrapper
	const Glyph *glyphs = shaped_text_get_ellipsis_glyphs(p_shaped);
	int gl_size = shaped_text_get_ellipsis_glyph_count(p_shaped);

And of course people can query the count directly.

I don't like where this is heading 🙈 a ton of work for an API that might not ever be used at all

@lilizoey
Copy link
Member Author

lilizoey commented Mar 28, 2024

if we're gonna fix this to be correct we should actually make it correct. if we can't find some good relatively simple way to do this correctly then we should just go for something like option 2. (even option 1 may be fine since it's gonna be a single rarely-used virtual method trait that'll be affected)

half-assing the correctness here is gonna give people false confidence. since it'll look like we've taken extra care to make this particular api completely sound, when in reality we've not. it'd arguably be better to have something that is obviously unsound at that point.

@Bromeon
Copy link
Member

Bromeon commented Mar 28, 2024

This problem is almost certainly not limited to methods returning pointers. There are tons of APIs that return some sort of count, or multiple arrays that are expected to have the same length. E.g. font_get_face_count() is used as an iteration bound. The index variable face_idx is then stored in best_match, which later is assigned to sysf.index, and who knows where that data flows from there. Can we be sure that providing logically incorrect results would never cause UB?

Given the vast implementation, how can we be sure that logic errors never result in UB?


half-assing the correctness here is gonna give people false confidence. since it'll look like we've taken extra care to make this particular apiccompletely sound, when in reality we've not. it'd arguably be better to have something that is obviously unsound at that point.

What's your suggestion then? We would need to conservatively make every single interface trait unsafe because we never know what invariants Godot relies on. Even if we know things are perfectly safe right now, Godot can change implementation in ways that invalidate our understanding.

And making everything unsafe is a cheap cop-out, it doesn't help the user in the slightest way to make their code more robust.

@Bromeon
Copy link
Member

Bromeon commented Mar 28, 2024

Maybe a useful delineation is "known to cause UB if incorrect".

The methods returning pointers can definitely cause UB, so making it unsafe to implement them is standing to reason. If we identify other virtual methods that have this property in the future, we can extend the unsafe area.

So what remains is:

  • should we still return slices?
  • which approach to use 😀

@lilizoey
Copy link
Member Author

Maybe a useful delineation is "known to cause UB if incorrect".

yeah basically.

The methods returning pointers can definitely cause UB, so making it unsafe to implement them is standing to reason. If we identify other virtual methods that have this property in the future, we can extend the unsafe area.

So what remains is:

* should we still return slices?

* which approach to use 😀

if we want to provide a safe abstraction over it then there are a lot of options, but all of them have their own drawbacks. the main issue is that we need to find the length given a pointer to the array, and there isn't a very good way to do that in rust. (well if you have a pointer to a slice it's easy, but we have a pointer to the data in the slice not a pointer to the slice itself)

some examples (these are things i've considered for property lists which have many of the same issues):

  1. use a #[repr(C)] DST header struct with the data stored inline after the length, drawback: there is no stable way to construct a value of such a type with a dynamic length, it can only be constructed from an array
  2. pass a null-terminated list, drawback: need to traverse the entire list to find the length
  3. cache the length, drawback: where do we cache it? when do we invalidate the cache? also extra memory use

there isn't really a perfect solution for that case. so i do think the best would be to put it into an unsafe trait, since this is an issue that is easily solvable by just implementing the functions the intended way.

making the methods return slices could help guide people towards doing the right thing better so i'd be in favor of that. but it also isn't super important.

having the ability to codegen a subset of some virtual method trait into a separate unsafe trait could be useful in the future too if this comes up again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug c: register Register classes, functions and other symbols to GDScript ub Undefined behavior
Projects
None yet
Development

No branches or pull requests

2 participants