Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use objc2-metal #5641

Open
wants to merge 2 commits into
base: trunk
Choose a base branch
from
Open

Use objc2-metal #5641

wants to merge 2 commits into from

Conversation

madsmtm
Copy link
Contributor

@madsmtm madsmtm commented Apr 30, 2024

Description

Use the objc2-metal crate instead of the metal crate. This will:

  • Improve memory management and soundness.
  • Make it easier to quickly support new Metal APIs (they're either already generated for you, or is basically just an update of Xcode away).
  • Likely allow reducing the usage of Arc and Mutex, as Metal objects are already reference-counted (depending on thread-safety details, not entirely sure).
  • Likely improve performance, we use objc_retainAutoreleasedReturnValue underneath the hood to avoid putting objects into the autorelease pool when possible, reducing memory pressure.

Background

The metal crate contains bindings to the Metal framework. This uses objc to manually perform the message sends, which is quite error-prone, see gfx-rs/metal-rs#284, gfx-rs/metal-rs#319 and gfx-rs/metal-rs#209 for a few examples of unsoundness (out of many).

To solve such problems in the Rust ecosystem in general, I created a successor to objc called objc2, which contains most notably the smart-pointer Retained and the macro msg_send_id!, which together ensure that Objective-C's memory-management rules are upheld correctly.

This is only part of the solution though - we'd still have to write the bindings manually. To solve this, I created a tool (planning to integrate it with bindgen, but that's likely a multi-year project) to generate such framework crates automatically. In acknowledgement that this tool is by far not perfect, and never will be, I've ensured that there's a bunch of options to modify each generated crate.

The modifications for objc2-metal in particular are currently just a few hundred lines of code, weak evidence that the generator is fairly good at this point. I'll also bring attention to the file where unsafe methods are marked safe - I have plans to investigate ways to semi-automatically figure out if something is safe or not, or at least reduce the burden of doing so, but it's a hard problem to ensure is completely sound, so for now it's a bit verbose.

Connections

gpu-allocator is also transitioning to objc2-metal in Traverse-Research/gpu-allocator#225.

gfx-rs/metal-rs#241 is an old open PR for using objc2 in metal internally instead. There currently isn't really a clear path forwards there, and it's a lot of work for less direct benefits to the ecosystem (wgpu-hal is by far the biggest user of metal). But more fundamentally IMO, it's a problem of separation of concerns; metal defines several Foundation types like NSArray, NSString, NSError and so on, and even CAMetalLayer from QuartzCore, and that's a bad idea for interoperability compared to having separate crates for each of these frameworks.

#5752 removed the link feature which is not available in these crates.

Implementation

The first commit implements the actual migration, by using a branch of objc2, with a method naming scheme that (more closely) matches metal, to make it easier to review and test what's changed.

The second commit moves to the real naming scheme that objc2 uses.

I'd strongly recommend you review these two commits separately.

Testing

Tested by using the checklist below, as well as running each example individually, and checking that they seem to work.

During the development of this I made two quite critical typos, which were luckily found by the test suite, but there's bound to be at least one more lurking in here somewhere, please test this thoroughly!

Checklist

  • Run cargo fmt.
  • Run cargo clippy.
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.

madsmtm added a commit to madsmtm/objc2 that referenced this pull request May 20, 2024
This may technically be a breaking change if the user implemented these
protocols themselves on a `Mutable` class, but that'd be unsound anyhow,
so I'll consider this a correctness fix.

This is useful for wgpu, see gfx-rs/wgpu#5641,
and the hack will become unnecessary after
#563.
@madsmtm madsmtm force-pushed the objc2-metal branch 3 times, most recently from bd69f0d to 943f2a7 Compare May 23, 2024 01:24
@madsmtm madsmtm marked this pull request as ready for review May 23, 2024 01:35
@madsmtm madsmtm requested a review from a team as a code owner May 23, 2024 01:35
Copy link
Member

@teoxoy teoxoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great step forward from having to maintain our own set of bindings manually.

I do have some questions/comments:

  • We still seem to be using the msg_send! macro in a bunch of places, could we use the generated methods instead?
  • We also use autoreleasepools in a bunch of places because we had issues with leaks but as far as I understand from the docs of Retained, a lot of these (if not all) should now be unnecessary. Is this correct? I think it would make sense to remove them in this PR.
  • While I appreciate the naming being more consistent with metal's, some of the enum variants now have the enum name as their prefix which feels redundant; but not always which feels odd. Examples:
    • MTLFeatureSet's variants are not prefixed
    • MTLLanguageVersion's variants are all prefixed
    • one of MTLReadWriteTextureTier's variants is not prefixed while the other 2 are)
  • Some of the function names are quite long now (ex: copyFromTexture_sourceSlice_sourceLevel_sourceOrigin_sourceSize_toBuffer_destinationOffset_destinationBytesPerRow_destinationBytesPerImage_options 😆) but I'm not sure how they can be shortened by the generator while also keeping things easy to search for on Apple's docs.
  • It would be great if all objects had docs from Apple's docs website and/or at least a link to the page on said website but no hurry, just giving my +1 for it :).

wgpu-hal/src/metal/device.rs Outdated Show resolved Hide resolved
Comment on lines +1326 to +1400
// TODO: `newComputePipelineStateWithDescriptor:error:` is not exposed on
// `MTLDevice`, is this always correct?
fn new_compute_pipeline_state_with_descriptor(
device: &ProtocolObject<dyn MTLDevice>,
descriptor: &MTLComputePipelineDescriptor,
) -> Result<Retained<ProtocolObject<dyn MTLComputePipelineState>>, Retained<NSError>> {
unsafe { msg_send_id![device, newComputePipelineStateWithDescriptor: descriptor, error: _] }
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's odd that it doesn't exist since newRenderPipelineStateWithDescriptor:error: does.
We could use newComputePipelineStateWithDescriptor:options:reflection:error: though and pass MTLPipelineOptionNone for options and nil for reflection.

@crowlKats
Copy link
Collaborator

crowlKats commented May 23, 2024

@madsmtm

The first commit removes the link feature which is not available in these crates. This was added by @crowlKats in #3853, but Deno doesn't actually use it from what I can tell, they instead specify this using -weak_framework, which is the correct solution to this problem IMO. (If I'm wrong about this, please say so, I could be persuaded to add a similar feature to the objc2-... crates).

so we did use it for a while, but then were able to remove the use on it, so yes, its not necessary anymore

@madsmtm
Copy link
Contributor Author

madsmtm commented May 23, 2024

Thanks for taking a look!

We still seem to be using the msg_send! macro in a bunch of places, could we use the generated methods instead?

I've already done this in a separate branch, but there were some issues around the semantics not being quite the same, so I wanted to keep it for a separate PR where that discussion could be more focused.

We also use autoreleasepools in a bunch of places because we had issues with leaks but as far as I understand from the docs of Retained, a lot of these (if not all) should now be unnecessary. Is this correct? I think it would make sense to remove them in this PR.

There's a lot of nuance to this question, including the fact that the optimization that allows us to avoid an autorelease is not guaranteed, only likely (depends on the exact emitted assembly, inlining, and the phase of the moon).

What has changed though is that autorelease pools no longer have any effect on the program, other than in terms of memory usage (autoreleased objects can be reclaimed sooner) and runtime performance (pushing and popping an autorelease pool has a slight overhead).

I'd really prefer to keep it out of this PR, mostly because it'll screw with the diff even more than it already is (indenting of large function bodies will change), but also partly because I do not know why each of these are here, and I'd like to retain that memory-usage profile until deemed wise to do otherwise.

While I appreciate the naming being more consistent with metal's, some of the enum variants now have the enum name as their prefix which feels redundant; but not always which feels odd. Examples:

  • MTLFeatureSet's variants are not prefixed
  • MTLLanguageVersion's variants are all prefixed
  • one of MTLReadWriteTextureTier's variants is not prefixed while the other 2 are)

Yup, that's a bug, the logic for implementing this name translation is very simplistic and a bit hastily thrown together - Swift has the correct rules written down, but it'll be breaking to change, so I'll fix it in v0.3 of the framework crates.

Some of the function names are quite long now (ex: copyFromTexture_sourceSlice_sourceLevel_sourceOrigin_sourceSize_toBuffer_destinationOffset_destinationBytesPerRow_destinationBytesPerImage_options 😆) but I'm not sure how they can be shortened by the generator while also keeping things easy to search for on Apple's docs.

Yeah :/

Feel free to comment on madsmtm/objc2#284 if you think of a good solution (or just any solution)!

It would be great if all objects had docs from Apple's docs website and/or at least a link to the page on said website but no hurry, just giving my +1 for it :).

There's madsmtm/objc2#309 open for it, the local Xcode documentation is stored in an undocumented format that I spent a few hours on, but couldn't immediately reverse-engineer.

Linking is similarly difficult, not for class names, but for methods, they have some ID in them, which is why I didn't pursue this - but I guess just linking to the class name would still be a step up, will try to prioritize it.

@madsmtm
Copy link
Contributor Author

madsmtm commented May 23, 2024

I just tried to actually benchmark this, but it's not an art that I'm familiar with, and the results seemed inconclusive (both improvements and regressions, without any clear pattern that I could discern). I'd suggest that someone more familiar with that tries to benchmark this.

@grovesNL
Copy link
Collaborator

@madsmtm For what it's worth we used to see a huge amount of time in retain/release for a lot of the sampling profilers we tried years ago, so reference counting overhead could be a useful data point.

@madsmtm madsmtm mentioned this pull request May 29, 2024
4 tasks
To keep the diff smaller and easier to review, this uses a temporary
fork of `objc2-metal` and `objc2-quartz-core` whose methods use the
naming scheme of the `metal` crate.

One particular difficult part with this is that the `metal` crate has
several methods where the order of the arguments are swapped relative
to the corresponding Objective-C methods.

This includes most perilously (since these have both an offset and an
index argument, both of which are integers):
- `set_bytes`
- `set_vertex_bytes`
- `set_fragment_bytes`
- `set_buffer`
- `set_vertex_buffer`
- `set_fragment_buffer`
- `set_threadgroup_memory_length`

But also:
- `set_vertex_texture`
- `set_fragment_texture`
- `set_sampler_state`
- `set_vertex_sampler_state`
- `set_fragment_sampler_state`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants