[RFC] Yield-WaitFor syscall #3577

Draft: wants to merge 11 commits into master
Conversation

@ppannuto (Member) commented Jul 28, 2023

Pull Request Overview

Following the discussion at TockWorld6, this describes the proposed Yield-WaitFor and provides an (untested) rough implementation to show how the kernel could easily support it.

For ease of viewing, this draft PR edits TRD 104 directly so it can be seen as a diff. A final PR would follow the proper, full TRD process.

The primary motivation to move this functionality from a userspace yield_for into a specialized system call is to simplify correctness for userspace applications. Userspace upcall handlers do not have to worry about reentrancy if the kernel guarantees that exactly one upcall, a specific one of userspace's choosing, will be called. It becomes an opt-in synchronous API for userspace without compromising the fundamental asynchronous design of Tock.

Testing Strategy

Compiling (and no more!).

TODO or Help Wanted

This is currently designed and architected as a minimal-impact change. In particular, if you want the actual status or return value from the upcall, you still need to have supplied a callback function. If you don't care about the return success/failure (e.g., often the case for prints), then you can just leave the default Null Upcall in place and this will do what you want.
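(To make the two usage modes concrete, a minimal hypothetical sketch; yield_wait_for(), the subscribe()/command() wrappers, and the DRIVER_NUM/SUB_NUM/CMD_START constants are illustrative names, not the actual libtock-c API.)

#include <stddef.h>

// Hypothetical sketch of the two modes described above; all names are illustrative.
static int result;
static void done_cb(int status, int a1, int a2, void* userdata) {
  result = status;   // callback captures the status for the caller
}

// Mode 1: the caller wants the result, so it still supplies a callback.
int op_with_result(void) {
  subscribe(DRIVER_NUM, SUB_NUM, done_cb, NULL);
  command(DRIVER_NUM, CMD_START, 0, 0);
  yield_wait_for(DRIVER_NUM, SUB_NUM);   // only done_cb may run here
  return result;
}

// Mode 2: fire-and-forget (e.g., prints): leave the default Null Upcall in
// place; the yield still returns once the waited-for event fires.
void op_fire_and_forget(void) {
  command(DRIVER_NUM, CMD_START, 0, 0);
  yield_wait_for(DRIVER_NUM, SUB_NUM);
}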

The implementation is not intended as final. Rather, it's just trying to demonstrate how this can be a very lightweight change in the kernel (indeed, one should not write careful code while also trying to participate in a meeting 😅 ).

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

Rendered

@bradjc (Contributor) commented Jul 31, 2023

I think we want something like this. This yield-for makes mixing async and sync code in userspace easier. Currently, it is fairly easy to write all-async or all-sync code in userspace, but when mixing them, async callbacks can lead to unexpected call chains when using yield_for() in userspace. With this yield-for syscall, sync code knows no other callbacks will happen.

The decision to make is whether to go with this lightweight version, or something more integrated (or something else).

Instead (or in addition), we could add yield-for-blocking, which would remove the upcall entirely and instead pass the upcall arguments to userspace via the return arguments of yield-for-blocking.

Advantages of yield-for-blocking:

  1. Userspace sync code is easy to write:

    allow();
    command();
    ret = yield_for_blocking(UPCALL_ID, &arg1, &arg2, &arg3);
    unallow();

    instead of:

    // The upcall handler stores its arguments into globals the caller can read.
    static int arg1, arg2, arg3;
    void callback(int e, int a1, int a2, int a3) {
        arg1 = a1;
        arg2 = a2;
        arg3 = a3;
    }

    allow();
    subscribe(callback);
    command();
    yield_for(UPCALL_ID);
    unallow();
  2. Eliminates the context switches for subscribe and the upcall.

Disadvantages:

  1. Semantic confusion: schedule_upcall() in a capsule may not, actually, schedule an upcall any more. Instead the "upcall" happens as the return to a yield.
  2. If userspace does call subscribe, but then only uses yield-for-blocking, that upcall would never get called, which is odd. But with a mix of yield-for-blocking and yield, the upcall would get called sometimes. This inconsistency complicates the interface.

Having articulated this, I see the appeal of yield-for. I think Pat pointed out that the complexity in userspace code is fairly easy to hide in a library layer, so that driver code would look like the yield-for-blocking case. Perhaps the prudent move is to address the complexity of writing sync and async code in userspace with yield-for, and not try to introduce blocking semantics before we know the async version is insufficient.

@ppannuto (Member Author):

A quick note: UPCALL_ID is made up of driver_number and subscribe_number in practice, so it has to be two arguments. The upcall signature is currently three integer arguments and a user data pointer; while the blocking call can eliminate the last argument, as the pointer can just be local to the userland call site, one of the remaining arguments would have to overwrite one passed in. That makes this pseudocode more awkward:

allow();
command();
// Can't do this:   ret = yield_for_blocking(UPCALL_ID, &arg1, &arg2, &arg3);
int subscribe_num_and_arg1 = SUBSCRIBE_NUMBER;
ret = yield_for_blocking(DRIVER_NUMBER, &subscribe_num_and_arg1, &arg2, &arg3);
unallow();

There are probably other games that could be played to ease this, but it's all more complex than the lightweight change I think we should start with as-proposed here.

@bradjc (Contributor) commented Jul 31, 2023

allow();
command();
// Can't do this:   ret = yield_for_blocking(UPCALL_ID, &arg1, &arg2, &arg3);
int subscribe_num_and_arg1 = SUBSCRIBE_NUMBER;
ret = yield_for_blocking(DRIVER_NUMBER, &subscribe_num_and_arg1, &arg2, &arg3);
unallow();

But the arg1 arg2 arg3 are going back up (kernel to userspace) and driver num and subscribe num are going down (userspace to kernel).

@ppannuto (Member Author):

Oh, you were thinking of some more library magic here? I guess that works so long as the lower layer does the remapping and the pointer writing. I don't think there is any way to pass all the pointers for arg1-arg3 in the syscall and have the kernel do the writes, but that's not necessary... I think the below would work?

// libtock.c
int yield_for_blocking(struct upcall_id upcall_id, int* arg1, int* arg2, int* arg3) {
    register uint32_t r0 __asm__ ("r0") = YIELD_FOR_BLOCKING_ID;
    register uint32_t r1 __asm__ ("r1") = upcall_id.drv_num;
    register uint32_t r2 __asm__ ("r2") = upcall_id.sub_num;
    register uint32_t r3 __asm__ ("r3");
    // After the svc, r0 holds the return value and r1-r3 hold the upcall arguments.
    __asm__ volatile (
    "svc 0"
    : "+r" (r0), "+r" (r1), "+r" (r2), "=r" (r3)
    :
    : "memory");

    int retval = r0;
    if (retval & YIELD_FOR_ACTUALLY_YIELDED_FLAG_MASK) {
        *arg1 = r1;
        *arg2 = r2;
        *arg3 = r3;
    }
    return retval;
}

@bradjc (Contributor) commented Jul 31, 2023

Yes that is what I was imagining.

@brghena (Contributor) commented Jul 31, 2023

The C wrapper could just take five arguments though instead of a struct:

int yield_for_blocking(driver_num, subscribe_num, &arg1, &arg2, &arg3) {

@ppannuto (Member Author):

True, though, since we're size-optimizing, if that function didn't get inlined everywhere, adding the fifth argument would require spilling to the stack [assuming struct upcall_id were passed by ref, unlike the implicit copy I wrote above..] — for something low-level and apparently on the hot path like this, I'd push to limit it to four args generally.

@hudson-ayers (Contributor):

True, though, since we're size-optimizing, if that function didn't get inlined everywhere, adding the fifth argument would require spilling to the stack [assuming struct upcall_id were passed by ref, unlike the implicit copy I wrote above..] — for something low-level and apparently on the hot path like this, I'd push to limit it to four args generally.

This function seems like something that should get inlined everywhere, provided that it is being built with reasonable optimization settings, which should be the case for any project particularly concerned about size. So I would advocate for the five argument function definition, though I don't feel super strongly about it. If we used your approach, wouldn't we need to pass a pointer to upcall_id in order to not spill to the stack?

That said, I think that the disadvantages Brad described are valid:

Disadvantages:

Semantic confusion: schedule_upcall() in a capsule may not, actually, schedule an upcall any more. Instead the "upcall" happens as the return to a yield.
If userspace does call subscribe, but then only uses yield-for-blocking, that upcall would never get called, which is odd. But with a mix of yield-for-blocking and yield, the upcall would get called sometimes. This inconsistency complicates the interface.

These are both disadvantages that do not exist with blocking_command() as Alyssa described it at TockWorld.

yield_for(), in cases where the return value matters, does not remove any system calls compared to the current status quo, or the need for a callback function. It just allows userspace to ensure that only one callback will ever be called. Note that libtock-rs today only even allows one outstanding callback, so this seems to be of very limited usefulness in libtock-rs.

yield_for_blocking(), as Brad described, removes the context switch for subscribe (and unsubscribe in libtock-rs), as well as the need to implement a callback. But it comes with the two disadvantages that Brad mentioned, and still requires one more context switch than blocking_command(). The potential for mixing of functions that call yield_for_blocking() and subscribe() takes away a lot of the purported advantage here of "making it easier to simplify correctness for userspace apps", since there would be a new class of bug if apps mix async and sync driver methods that use the same callbacks.

@ppannuto (Member Author) commented Aug 2, 2023

wouldn't we need to pass a pointer to upcall_id in order to not spill to the stack?

Yes, that's what I meant by "[assuming struct upcall_id were passed by ref, unlike the implicit copy I wrote above..]"

I'm generally wary about relying on compiler optimizations for performance correctness, but I think we can shelve this side-hypothetical for now.


Note that libtock-rs today only even allows one outstanding callback

Why is that limitation in place? That's a pretty crippling limitation for a lot of applications — can't have something like a periodic timer and an event-based sensor subscription live at the same time? That's not really an acceptable or realistic limitation for the long term.

If the issue is concurrent execution concerns of callbacks, yield_for as-proposed can be used to ensure that only one callback stack/chain is active at any given time.


I'm not opposed to something of the spirit of yield_for_blocking in addition to yield_for, but I think that is a more complex and nuanced interface for the reasons we've articulated, and that the simpler yield_for is useful on its own merits.

@ppannuto (Member Author) commented Aug 2, 2023

One thing I had been ruminating on was whether there would be value in something closer to a select/poll style interface, or in this context a yield_for_any_of([array of upcall ids]) (with the return value being the index of the array entry that fired).

In practice, I think the ergonomics around first allowing the array and then sending the next syscall with the pointer are awkward, especially for the likely common case of just wanting to yield on one specific upcall. I'd be inclined to steer clear of this more complex interface until it's proven necessary.
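(To illustrate the two-step ergonomics being described, a purely hypothetical sketch; yield_for_any_of(), the allow_readonly() target used here, and every constant below are invented for illustration and are not a proposed ABI.)

#include <stdint.h>

// Hypothetical sketch only: the array of waited-for upcall ids would have to be
// shared with the kernel first, and only then could the yield refer to it.
typedef struct { uint32_t driver_num; uint32_t subscribe_num; } upcall_id_t;

int wait_for_either(void) {
  upcall_id_t ids[2] = {{DRIVER_A, SUB_A}, {DRIVER_B, SUB_B}};

  // Step 1: an extra allow round-trip just to describe what we are waiting for.
  allow_readonly(YIELD_HELPER_DRIVER, 0, ids, sizeof(ids));

  // Step 2: the yield itself, returning the index of the entry that fired.
  return yield_for_any_of(2);
}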

@bradjc (Contributor) commented Aug 2, 2023

Note that libtock-rs today only even allows one outstanding callback

Why is that limitation in place? That's a pretty crippling limitation for a lot of applications — can't have something like a periodic timer and an event-based sensor subscription live at the same time? That's not really an acceptable or realistic limitation for the long term.

I'm not sure it is? In my little dabble yesterday, it seems like subscribe in libtock-rs is essentially the with statement in Python: the subscribe is removed when the context goes away, but you can nest "with"s. Maybe that doesn't actually work in practice, but that's the impression the libtock-rs API gives.

But it comes with the two disadvantages that Brad mentioned, and still requires one more context switch than blocking_command().

It's really hard to discuss blocking command without something like this PR. What exactly is blocking command? Critically, how does a capsule give the blocking command its return data? Also, can someone write its #### Disadvantages section? I kinda made it easy by writing my own for the idea I was supporting.

It seems like the major disadvantage is going to be: when writing a capsule driver, after my interrupt fires, do I call grant.sched_upcall() or kernel.command_return()? And when using a capsule, can I call subscribe or do I have to call blocking command?


Imagine we implement yield-for-blocking and it turns out that we like it, and who knows maybe we rename grant.sched_upcall() to grant.deliver() or something and we end up with a nice way to merge an async kernel with a convenient sequential userspace.

It seems like it might not be too big of a step to implement command-yield-for-blocking, which is a command immediately followed by a yield-for-blocking in a single system call (how exactly to do that tbd).

I say this because that to me seems more plausible than re-writing all of our capsules to have both an async and a sync version (if in fact that is what is required for blocking_command).

@jrvanwhy (Contributor) commented Aug 2, 2023

Note that libtock-rs today only even allows one outstanding callback

Why is that limitation in place? That's a pretty crippling limitation for a lot of applications — can't have something like a periodic timer and an event-based sensor subscription live at the same time? That's not really an acceptable or realistic limitation for the long term.

libtock-rs allows multiple outstanding callbacks (e.g. you can wait for multiple sensors concurrently, with a timeout alarm), but it does not allow for independent tasks. That is, you cannot have a callback running in the background that blinks an LED continuously while your code does something completely unrelated to that.

There is more context on the decision to not support independent tasks at tock/libtock-rs#334. The short version is that supporting that requires making a lot of important design decisions that I did not have the data to make at the time.

I also want to re-evaluate whether futures are acceptable in libtock-rs. Over the past few years, I've received a lot of suggestions on how to do more lightweight Future-based APIs than I achieved when I benchmarked the code size impacts of Future. I haven't had the time to investigate those suggestions, but I want to do so, and if any of them pan out then that would be wonderful. Using futures would allow us to support independent tasks without having to make those painful design tradeoffs.

@ppannuto (Member Author) commented Aug 2, 2023

That state of libtock-rs is much more sane.

w.r.t. an eventual more-async world, there's some more complex stuff in libtock-c apps, but really nothing that would demand more capability than what it sounds like libtock-rs already supports. I do think one of the good outcomes from TockWorld6 is a push for the core team to get more parity in libtock-c / libtock-rs examples, so if I'm wrong, I suspect we'll see soon :).

_Note:_ In the case that the specified upcall is set to the Null Upcall,
no upcall will execute, but Yield-WaitFor will still return. In this
case, the contents of r0-r3 upon return are UNDEFINED and should not be
relied on in any way.
Member:

OK, this is great except why not have r0-r3 be the values that would have been passed to a non-null upcall? This would allow many cases to avoid callbacks entirely and just process the results directly, and doesn't cost anything (I don't think).

@bradjc (Contributor) commented Aug 4, 2023

That is yield-for-blocking. Or a variant at least. I see what you are asking. But, given

The Tock kernel MUST NOT invoke the Null Upcall.

what would those be?

@ppannuto (Member Author):

Okay, so, the reason I didn't do this is that I couldn't decide on the right behavior for r3.

TRD104 starts by describing upcalls abstractly as something that has (up to) 4 return arguments passed in r0-r3. We have one type of upcall the kernel actually emits right now, namely the upcall produced in response to subscribe [n.b., nothing in this TRD forbids other upcall signatures in the future*]. For clarity, I will use SubscribableUpcall to refer to an upcall invoked on a pointer passed via Subscribe.

Much to my surprise, as far as I can tell, the signature of SubscribableUpcall is not expressly defined anywhere except in §5.2 of this TRD, where the C prototype for a subscribe_upcall is given as an example.
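(For reference, the §5.2 example prototype being referred to has this general shape, as used by libtock-c; the exact parameter names may differ:)

typedef void (subscribe_upcall)(int arg0, int arg1, int arg2, void* userdata);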

Indeed, we define the FunctionCall object passed to UKB with this note:

Struct that defines a upcall that can be passed to a process. The upcall takes four arguments that are Driver and upcall specific, so they are represented generically here.

In practice, the type signature for an upcall is established by the kernel in kernel::upcall::schedule() and by an implicit assumption in the definition/creation of an Upcall object that upcalls come only from subscribes. But that's just an implementation artifact at the moment, not part of our ABI*.

Now, if YieldFor were to say something like 'sets r0-r3 to the Upcall Arguments if the current Upcall is the NullUpcall', then we would expect all Upcall Arguments to be set, right? Now, for the case of a SubscribableUpcall r0-r2 are clear, but what do we do with r3, which would be the userdata pointer? Does YieldFor need to do different things based on which type of Upcall it's passing?

We could further amend TRD104 to formally specify the function signature for all upcalls, and then define YieldFor to skip r3. Or define SubscribableUpcall and specify YieldFor behavior only for that upcall type.

I was shooting for a minimum delta with this first RFC, and I don't think we can do this "skip the callback" without also making more changes around the Upcall definition. I lean toward formalizing SubscribableUpcall as a no-overhead option for today (since there is only one Upcall type still) while leaving the door open to other Upcall signatures in the future.

I do think the "skip the callback" behavior would be nice. I can update this RFC to capture a version of that.


*Subscribe does start like this:

The Subscribe system call class is how a userspace process registers upcalls with the kernel.

Which, if we really read into the "a" there, maybe suggests the 1:1 mapping, but I wouldn't blink an eye at a future section that said something like "The GetASignal system call is another way userspace processes register upcalls with the kernel". If there really is OneTrueUpcallSignature, this TRD needs to say that much more directly.

Contributor:

*Subscribe does start like this:

The Subscribe system call class is how a userspace process registers upcalls with the kernel.

Which, if we really read into the "a" there, maybe suggests the 1:1 mapping, but I wouldn't blink an eye at a future section that said something like "The GetASignal system call is another way userspace processes register upcalls with the kernel". If there really is OneTrueUpcallSignature, this TRD needs to say that much more directly.

But wouldn't we need to add that everywhere? A command syscall is THE ONLY way to issue a command. Yield is THE ONLY way to yield, etc. It seems clear to me that subscribe is the only way to subscribe, since that is all it does.

But yes, we need to document the upcall signature.

@ppannuto (Member Author):

I agree it's clear that "subscribe is the only way to subscribe".

What I do not think is clear is that "subscribe is the only way to get upcalls". Upcalls are defined completely independently of subscribe in the previous section. As written, subscribe is just one thing (and currently happens to be the only thing) that can generate an upcall.

Contributor:

I don't think we need to say anything about subscribe being the only way to set upcalls, any more than we need to say command is the only way to instruct the kernel to start an operation. Commands are documented elsewhere. But, we do need to define what an upcall is, in my opinion.

@alevy (Member) commented Aug 4, 2023

I'm strongly supportive of something very close to this.

High bit

Yield-WaitFor solves several related problems concisely and elegantly:

  1. It allows for true blocking semantics, most notably for debug statements in callbacks, where "simulated" blocking using the user-space yield_for relies heavily on user-written callbacks not themselves calling yield_for recursively. This is a big semantic improvement for userspace.

  2. It can dramatically reduce the code size required of user-space applications that use a synchronous I/O API---they could avoid callbacks altogether, avoid the corresponding subscribe calls, and reduce the number of system calls invoked per operation for some common operations. This seems particularly useful in Rust applications, where having callbacks always subscribed in the background (without unsubscribing them) may be unsound.

  3. It doesn't require any additional functionality in capsules to support this. Everything is entirely differentiated in the process scheduler. This is important because it means that any code size implications in the kernel are fixed, and it means that userspace doesn't need to wait for a particular capsule to adapt to the new API in order to use this new yield variant.

Some more details separated so they can be quoted separately

More on 1: it's worth looking through our various libtock-c examples and drivers and seeing where we would actually replace calls to the current library yield_for with a new Yield-WaitFor. In practice, these have slightly different semantics, so it's not obvious it would be everywhere, but it might be everywhere or almost everywhere.

More on 2: with some specialization in userspace as well as some sort of combined system call ("please do a command followed by a yield-waitfor"), I think we can get nearly optimal code size in applications (compared to a functionality-specific specialized blocking call). So we should view this as a step towards even more code size optimization.

(There are no!) Disadvantages

@bradjc points out some disadvantages. These are very well worth considering. I think an evolution of this proposal avoids these pitfalls.

  1. Semantic confusion: schedule_upcall() in a capsule may not, actually, schedule an upcall any more. Instead the "upcall" happens as the return to a yield.

Yes. As suggested later in the thread, either renaming or aliasing schedule_upcall to deliver may help avoid this confusion. I think this is just a naming issue, which we should resolve, but doesn't break this proposal.

  2. If userspace does call subscribe, but then only uses yield-for-blocking, that upcall would never get called, which is odd. But with a mix of yield-for-blocking and yield, the upcall would get called sometimes. This inconsistency complicates the interface.

Agree completely. Yield-WaitFor should use the existence of a subscribed callback to determine whether to execute a callback or return directly to the system call site with the values passed to schedule_upcall. If a process has a callback subscribed, that callback should always execute. If it doesn't, Yield-WaitFor should always return the values that would have been passed to a callback.

There is a remaining question of what to do with r0-r3 if a callback was executed. I think, in this case, it might be reasonable to allow the callback (which currently returns void) to determine what those values should be.

@alevy changed the title from "[RFC] YieldFor syscall" to "[RFC] Yield-WaitFor syscall" on Aug 4, 2023
@alevy (Member) commented Aug 4, 2023

Naming versions of the Yield variant

We've now described several variations of the proposed yield variant and I believe I've added to the confusion by conflating names for them above. Let's refer to them as follows:

  • Yield-WaitFor is the original proposal as described in this revision of the TRD text
  • Yield-WaitFor-NoCallback (proposed by @bradjc) never executes a callback, regardless of whether one is registered, and instead returns the values that would be passed to the callback as return values to the system call site.
  • Yield-WaitFor-CallbackIfPresent (proposed implicitly by me) always executes a callback if one is present; otherwise it returns directly to the call site, returning the values that would have been passed to the callback in registers r0-r3.
  • Yield-WaitFor-OptionalSubscribe (proposed by @kupiakos): the yield passes an upcall handler if it wants to use an upcall (which replaces any existing subscribed upcall handler); otherwise upcall values are passed as return values to the system call site.

@bradjc (Contributor) commented Aug 4, 2023

What is the use case for calling subscribe and using Yield-WaitFor-CallbackIfPresent? Maybe writing async code with ordering semantics, where the code can know in which order the callbacks will happen?

@alevy (Member) commented Aug 4, 2023

What is the use case for calling subscribe and using Yield-WaitFor-CallbackIfPresent? Maybe writing async code with ordering semantics, where the code can know in which order the callbacks will happen?

One simple example may be logging statistics on how often something happens.

If the callback is able to return different values to the syscall site in r0-r3, it might be useful for filtering or customizing the result...

Just general modularity.

Indeed, I suspect this will rarely be used.

@ppannuto (Member Author) commented Aug 5, 2023

I've updated this to match the Yield-WaitFor-CallbackIfPresent proposed by @alevy.

There are some design choices not fully elucidated in the RFC yet, as I'd like us to decide what we want to do about #3577 (comment) before fleshing that text out completely.

PoC code is updated (again, compile-tested only) to realize the proposed interface.

@bradjc (Contributor) commented Aug 7, 2023

If the callback is able to return different values to the syscall site in r0-r3, it might be useful for filtering or customizing the result...

Is Yield-WaitFor-CallbackIfPresent either-or, or both? I think it is either-or: either the upcall is called or yield returns r0-r3, but not both. So if you use Yield-WaitFor-CallbackIfPresent with a subscribe, then it can't also use r0-r3. Or are you saying those would be some other values instead of the upcall return values? But, reading the TRD again, I don't think we can both call an upcall and return values.


The Yield system call class has no return value. This is because
Contributor:

This would no longer be true.

@bradjc (Contributor) commented Aug 7, 2023

With Yield-WaitFor-CallbackIfPresent, is it true that userspace can't determine at runtime if r0-r3 are valid, or if an upcall executed?

So libtock tries to write:

int my_command_sync() {
  driver_command();
  asm("yield wait-for-callbackifpresent");
  // did an upcall execute?
  // or is r0-r3 valid?
  return ??
}

But then the user at some point called subscribe(my_driver) and forgot about it. So when the user calls my_command_sync(), it seems to work OK, but the values are wrong.

Is this possible or am I missing something?

@ppannuto (Member Author) commented Aug 7, 2023

With Yield-WaitFor-CallbackIfPresent, is it true that userspace can't determine at runtime if r0-r3 are valid, or if an upcall executed?

As currently written, this is correct, yes. It assumes that userspace correctly tracks whether it has subscribed a callback, and if userspace cares that a callback ran, the callback can set a flag.

We could use yield-param-3 to do the same pointer-indicator thing as Yield-NoWait if desired.
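(For concreteness, a minimal sketch of the "track the subscription, let the callback set a flag" pattern being described; command()/yield_wait_for() and the constants are illustrative names, not an actual libtock API.)

#include <stdbool.h>
#include <stddef.h>

// Minimal sketch, not real libtock code: if the app subscribed cb earlier, the
// flag tells the sync wrapper that the event was delivered through the
// callback, so the yield's register values should be ignored.
static bool upcall_ran = false;
static int cb_status;

static void cb(int status, int a1, int a2, void* userdata) {
  cb_status = status;
  upcall_ran = true;
}

int op_sync(void) {
  upcall_ran = false;
  command(DRIVER_NUM, CMD_START, 0, 0);
  yield_wait_for(DRIVER_NUM, SUB_NUM);  // CallbackIfPresent: cb runs only if subscribed
  if (upcall_ran) {
    return cb_status;                   // results came via the callback
  }
  // Otherwise the kernel returned the would-be upcall arguments directly;
  // userspace has to have tracked that nothing was subscribed to trust them.
  return 0;
}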

@bradjc (Contributor) commented Aug 7, 2023

With Yield-WaitFor-CallbackIfPresent, is it true that userspace can't determine at runtime if r0-r3 are valid, or if an upcall executed?

As currently written, this is correct, yes. It assumes that userspace correctly tracks whether it has subscribed a callback, and if userspace cares that a callback ran, the callback can set a flag.

We could use yield-param-3 to do the same pointer-indicator thing as Yield-NoWait if desired.

I propose we pass all yield return arguments via a pointer specified by yield-param-3 unconditionally.

Otherwise, I don't know how we would use Yield-WaitFor-CallbackIfPresent in userspace libraries other than calling subscribe(driver, upcall, 0) before Yield-WaitFor-CallbackIfPresent every time.

OR, I prefer Yield-WaitFor-NoCallback to avoid this potential error entirely (and it would not be a fun bug to debug I suspect).
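(A rough sketch of the first option above, passing everything through a pointer in yield-param-3; the struct layout and the yield_waitfor() helper are invented purely for illustration.)

#include <stdint.h>

// Invented layout and helper name: the kernel writes everything through a
// single pointer passed in yield-param-3, so userspace never has to guess
// whether the registers after the yield are valid.
struct yield_waitfor_result {
  uint32_t arg0;
  uint32_t arg1;
  uint32_t arg2;
  uint8_t  upcall_ran;   // did a subscribed upcall actually execute?
};

void example(void) {
  struct yield_waitfor_result res;
  yield_waitfor(DRIVER_NUM, SUB_NUM, &res);   // &res goes in yield-param-3
  if (!res.upcall_ran) {
    // res.arg0-arg2 hold the values that would have gone to the upcall.
  }
}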

@bradjc (Contributor) left a comment:

It would be great if I were missing something, but it seems that with Yield-WaitFor-CallbackIfPresent a userspace implementation of yield_for is unable to tell whether the registers are valid after the yield_for syscall returns, without some other tracking state or program knowledge. Marking as blocked to resolve this, because that seems like a problematic interface we do not want.

@ppannuto (Member Author):

Okay, I've tried to update this to capture the current discussion here and to reflect a current-best proposal.

Start with the summary of design discussion, here: https://github.com/tock/tock/blob/yeild-for/doc/reference/trd104-syscalls.md#design-rationale-and-alternatives

  • @kupiakos There are a few open questions with your proposal documented in this section — the biggest problem is that an upcall id takes two registers, so there aren't two registers left for the subscribe pointer and app data.

The biggest noteworthy changes:

  • I followed @bradjc's thinking and updated the whole Yield Syscall Class to always/explicitly use r1 as a "yield result" mechanism — this addresses the "did a callback run?" question for all of the Yield-WaitFor-* variations
  • I updated the main proposal to follow @alevy's implicit CallbackIfPresent proposal

I think this is getting somewhere pretty reasonable. Not written up in the RFC yet, because it occurred to me as I was writing this comment, but I think that @kupiakos's 'choose which callback when calling yield' can be implemented in userspace with CallbackIfPresent as-written. The low-level libtock can take an optional callback argument, which it can then call itself, assuming no already-subscribed callback was invoked (of course, the low-level impl could defensively subscribe the null callback before yield-for'ing, but that's not really the target design here..). As I write that sentence, though, I realize this may be a case for the NoCallback variant, as it lets the low-level userspace always make the callback decision based on what was passed to libtock::yield-for.

@bradjc (Contributor) commented Aug 16, 2023

  • I followed @bradjc's thinking and updated the whole Yield Syscall Class to always/explicitly use r1 as a "yield result" mechanism — this addresses the "did a callback run?" question for all of the Yield-WaitFor-* variations

I thought it was the case that if in userspace you have:

yield_X() {
  svc yield, arg1, arg2, etc
  // PC is eventually set to instruction after svc.
  // Code here has no way of knowing if the registers were set by
  // the kernel as the return arguments to the svc, or if they were
  // set by the upcall handler the kernel called (which then returned to
  // this PC). The upcall handler may very well have changed r1.
}

How are you addressing this?

relevant: #3577 (comment) #3577 (comment)

@ppannuto (Member Author) commented Aug 16, 2023

yield-result is a pointer passed in r1 to the syscall; the kernel uses that pointer to write a byte into process memory. The "return value" or result from yield is that byte in memory. (This is what Yield-NoWait already does [and what Yield-Wait currently does by implementation bug].) The pointer is lost by the time userspace executes again.

Code view:

uint8_t result;

yield_X() {
  r0 = YIELD_TYPE;
  r1 = &result;
  r2 = ARG1;
  r3 = ARG2;
  svc yield
}

is closer to truth.

@bradjc (Contributor) commented Aug 16, 2023

Do we need a new error code?

int my_sync_syscall(int* len) {
  uint8_t yield_result
  command()
  yield_for(upcall_id, &yield_result)
  if (yield_result == 0) {
    *len = r2
    return SUCCESS
  } else {
    return ???
  }
}

What happens if the sync API is expecting yield to return r0-r3 but it doesn't?

@ppannuto (Member Author) commented Aug 16, 2023

Oh shoot, I meant to call attention to this too. At the end of §4.1.0 I added another open question:

OPEN QUESTION: ^^ This documents what we currently do, but it is the case right now that there is no way for userspace to identify an errorneous call to yield (e.g. invalid value in r0)---should instead yeild-result be written no matter what and an error case defined?

I went ahead with the first half of that (yield-result written no matter what), but the only two options currently are:

  • *result == 0 — No upcall executed
  • *result == 1 — An upcall executed

I think a more general approach would have the kernel setting *result == {§3.2-style Return Code}, but that would require two words at minimum to fully support just the Success and Failure cases. It's already a bit weird that a syscall is writing a non-Allowed byte; writing two words (or four words if we wanted to expose the whole Return Code space) gets a bit messier perhaps. We could support that by saying *result == -1 (or whatever) means that r0-r3 are set as a §3.2-style Return Code.


For a generic userspace layer, however, there is still the ugly question of how one function can return "Error-from-Yield" or "Error-from-Command" in a sane way. Not that "Error-from-Yield" should ever happen....

Comment on lines +388 to +391
**OPEN QUESTION:** ^^ This documents what we currently do, but it is the
case right now that there is no way for userspace to identify an
errorneous call to yield (e.g. invalid value in r0)---should instead
`yeild-result` be written no matter what and an error case defined?
Contributor:

I'm not sure what this means. What is an erroneous call to yield? Also, yield-result is written no matter what, assuming it is a valid process pointer. What would the error case capture? When are you going to enable spell check in your text editor :)?

@ppannuto (Member Author):

The only one I can think of is an illegal yield type, i.e., an invalid value in r0. FWIW, this is what the current TRD says:
[image: screenshot of the current TRD text]
which basically kind of just doesn't say anything about what else does or doesn't happen if you make an impossible syscall.

Frankly, I don't really think it matters too much what we do, as so much else has likely gone wrong if you're asking for an invalid Yield, but I do feel like we should define what does or doesn't happen—especially if we have an "r1 is the yield-result" paradigm.


When are you going to enable spell check in your text editor :)?

😬 , never... I learned a long time ago that spellcheck is too distracting for my brain to write/synthesize anything non-trivial. I do try to make a point of spellchecking non-draft PRs before pushing though :)

Contributor:

Frankly, I don't really think it matters too much what we do,

I think we are already clear on what happens:

If userspace tries to invoke a system call that the kernel does not support, the system call will return a Failure result with an error code of NODEVICE or NOSUPPORT (Section 4).

Contributor:

I think that a system call with class number 0 and r0== is an unknown syscall and not an unknown yield.

@ppannuto (Member Author):

I think that a system call with class number 0 and r0== is an unknown syscall and not an unknown yield.

The problem here is "how does the kernel tell userspace it was an invalid syscall?"

I think the challenge with the current text / design is that these two parts are at odds:

the system call will return a Failure result with an error code...

The Yield system call class has no return value.

How can a system call class that has no return value return a Failure result?

We're in a pretty deep never-should-happen corner case here, but what I'm struggling with is 'what is the right thing for userspace to do'? Once userspace invokes a syscall with svc 0, i.e. something in the Yield class, it cannot first 'read the return value to see if the syscall failed' as the Yield class defines different return semantics than every other syscall.

@bradjc (Contributor) commented Aug 18, 2023

I am opposed to Yield-WaitFor-CallbackIfPresent. It adds uncertainty to userspace for reasons I don't think are very important. I think the only argument in favor is that the relationship with subscribe is easier to articulate. I don't say intuitive because I don't think it is intuitive for userspace. Yield-WaitFor-CallbackIfPresent also makes our in-kernel interfaces a little more consistent (i.e. schedule_upcall() always has the chance of scheduling an upcall). From the ideas in this comment: #3577 (comment) I'm not sure how keeping track of statistics would work. And having upcall handlers return arguments to yield seems like an obscure pattern no one would use.

While I articulated the drawback of schedule_upcall() not actually scheduling anything depending on how yield-waitfor works, I don't actually think it's a particularly interesting reason. The syscall interface is primarily there to support userspace, and we can adapt our in-kernel APIs to suit what we want to expose to userspace, not the other way around.

Instead, I propose we implement Yield-WaitFor and (Yield-WaitFor-NoCallback OR Yield-WaitFor-OptionalSubscribe). This gives agency to userspace without the subtle drawbacks of userspace libraries not actually being able to tell what is going to happen when they call yield.

Advantages

  1. Userspace libraries don't have to handle the case where they thought they were going to get a blocking yield and instead got a callback (to somewhere). We don't have to add an error code that would be returned should this case have happened.
  2. We don't have to expand the *yield_result = what_happened pattern, which feels like a hack to me. This API in Process always gives me pause:
    unsafe fn set_byte(&self, addr: *mut u8, value: u8) -> bool;
  3. Userspace can actively choose in declarative code which yield it wants. Readers of the code know exactly which yield the author intended without having to reason about whether subscribe was called somewhere in the program.
  4. Userspace can write async code but be able to prioritize callbacks. Should a program want only a certain upcall handler to execute there is a way to support that.

Disadvantages

  1. Double the system calls to implement. Unclear to me how complex that would be.
  2. It is unclear to me how to reason about having both Yield-WaitFor (kernel) and yield_for() (in userspace). Is this all getting more complicated or less complicated for userspace developers? There would be (at least) four different ways to wait in userspace. I'm not sure I could teach someone which one to use when.

@bradjc (Contributor) commented Aug 30, 2023

I remain in favor of Yield-WaitFor-NoCallback. Are there any want-to-implement-today use cases that Yield-WaitFor-NoCallback doesn't support? I'm counting efficiency in terms of number of syscalls as something else, and not a use case.

Are there any use cases that want to use yield_waitfor_X with upcalls? Amit has mentioned hypothetical ones, but are there any we would implement immediately if we could? Is anyone asking for this?

The discord of doing a blocking syscall with an asynchronous callback makes me averse to any of the upcall yield-waitfor options. Trying to explain that to someone would be difficult.

I propose we implement the thing which 1. supports our use case and only the use cases we know of and 2. is low-impact to implement in the kernel, which is Yield-WaitFor-NoCallback. This provides userspace with a clearly desired API without adding features and complexity no one is asking for.

@hudson-ayers (Contributor):

@bradjc to clarify your most recent comment, are you now in favor of just Yield-WaitFor-NoCallback? Your previous comment indicated you wanted both it and Yield-WaitFor.

If the former, I think I agree. Yield-WaitFor-NoCallback seems clearly useful, efficient, and relatively easy to understand -- it is the most similar to blocking_command(). I could not immediately think of a use case where I would really need in-kernel Yield-WaitFor (over userspace yield-for()) and not prefer Yield-WaitFor-NoCallback. I understand the idea that this enables prioritizing some particular ordering of callback handling, but the same thing can be handled with Yield-WaitFor-NoCallback followed by a manual call to the relevant "callback" function.

The main point of confusion with Yield-WaitFor-NoCallback is still the scenario in which a capsule calls subscribe() with an upcall and then calls yield_for_blocking() -- the upcall is defined, but never called, which seems confusing. And if another function is using that upcall to receive periodic events, it is possible that it's not totally clear that calling Yield-WaitFor-NoCallback would clear that upcall from future events.
One way to prevent this confusion would be for Yield-WaitFor-NoCallback to return an error if a non-null callback is already set (perhaps we would add another sentinel value in the kernel to indicate a "fake upcall"). This would mean adding another error code to userspace, but IMO would prevent some possibly confusing bugs.

@bradjc (Contributor) commented Sep 1, 2023

@bradjc to clarify your most recent comment, are you now in favor of just Yield-WaitFor-NoCallback? Your previous comment indicated you wanted both it and Yield-WaitFor.

My current opinion is to add only Yield-WaitFor-NoCallback.

@bradjc (Contributor) commented Sep 1, 2023

The main point of confusion with Yield-WaitFor-NoCallback is still the scenario in which a capsule calls subscribe() with an upcall and then calls yield_for_blocking() -- the upcall is defined, but never called, which seems confusing. And if another function is using that upcall to receive periodic events, it is possible that it's not totally clear that calling Yield-WaitFor-NoCallback would clear that upcall from future events.
One way to prevent this confusion would be for Yield-WaitFor-NoCallback to return an error if a non-null callback is already set (perhaps we would add another sentinel value in the kernel to indicate a "fake upcall"). This would mean adding another error code to userspace, but IMO would prevent some possibly confusing bugs.

I think it's fine if there is an upcall handler subscribed that doesn't get used. Having Yield-WaitFor-NoCallback be able to error if there is a subscription means every call now has to look like:

r0, r1, r2, r3 = yield_wait_for_no_callback();
if (r0 == SUBSCRIPTION_ERROR) {
  subscribe(NULL);
  r0, r1, r2, r3 = yield_wait_for_no_callback();
}

And if you want to be even more correct:

old_subscribe = NULL;
r0, r1, r2, r3 = yield_wait_for_no_callback();
if (r0 == SUBSCRIPTION_ERROR) {
  old_subscribe = subscribe(NULL);
  r0, r1, r2, r3 = yield_wait_for_no_callback();
  subscribe(old_subscribe);
}

Because having a library secretly unregister your subscription would also be a subtle bug.

@brghena (Contributor) commented Sep 1, 2023

Of these choices, I also support Yield-WaitFor-NoCallback. From there, the library could still make a callback with very few additional cycles. I also think that even if someone has subscribed previously, it's pretty clear that calling this version of yield won't lead to a callback.

I think it's worth starting an implementation for Yield-WaitFor-NoCallback and seeing what it feels like in terms of effort, code size, possible bugs, etc.

@phil-levis (Contributor) commented Sep 1, 2023

Yield-WaitFor-NoCallback also makes sense to me. It captures an extremely common case of completion callbacks, which is just waiting on a stack-based boolean. With respect to @hudson-ayers's concern about subscribed callbacks not being invoked, I think this is consistent with subscribe semantics. In particular, calling this variant of yield is equivalent to subscribe being called to install a new upcall, yielding for this upcall, then re-installing the old one before returning.

To follow up on the discussion in the call today, I'd suggest looking at how four different system call drivers (userspace and kernel) would change if re-implemented with Yield-WaitFor-NoCallback: Console, Alarm, ADC, and SPI Controller. They cover a bunch of different cases. The initial stated reasoning for this system call was:

The primary motivation to move this functionality from a userspace yield_for into a specialized system call is to simplify correctness for userspace applications. Userspace upcall handlers do not have to worry about reentrancy if the kernel guarantees that exactly one upcall, a specific one of userspace's choosing, will be called. It becomes an opt-in synchronous API for userspace without compromising the fundamental asynchronous design of Tock.

Can we quantify "simplify" as "shorter code text", "fewer branches", or "smaller binary"? Are there other considerations we care about? Are there specific userspace complexity examples that we want to use as data points?

@ppannuto (Member Author) commented Sep 8, 2023

Main text is updated for Yield-WaitFor-NoCallback semantics.

Can we quantify "simplify" as...

The original / primary motivation discussed at TockWorld was not any quantitative metric. It was correctness-oriented—i.e., no reentrancy concerns for userspace developers. I think the current text captures that as-intended.

The subsequent paragraphs already discussed reduction in syscall overhead as a secondary motivation. I also added a paragraph on code size. I think those are all the driving motivations here.


Will work on the proposed impls next...

@ppannuto (Member Author) commented Sep 8, 2023

The userspace updates are actually pretty substantial if the intent is to switch the library fully to a synchronous interface, so I just did console to start: tock/libtock-c#345

With the enormous caveat that all of these are currently compile-tested only, for the kernel:

size before:

   text	   data	    bss	    dec	    hex	filename
 110636	      0	  19928	 130564	  1fe04	/Volumes/code/helena-project/tock/target/thumbv7em-none-eabi/release/hail

size after:

   text	   data	    bss	    dec	    hex	filename
 112228	      0	  19928	 132156	  2043c	/Volumes/code/helena-project/tock/target/thumbv7em-none-eabi/release/hail

So for a rough, order-of-magnitude sense of impact, this adds ~1500 bytes to the kernel while removing ~500 bytes from the userspace console implementation — i.e., a net code size improvement for systems with >3 apps.

For the simplest case (getnstr), this results in a reduction of >=1 syscall (removing the subscribe saves 1; more if other upcalls would have run).

For the primary goal of code complexity reduction, the putnstr implementation is qualitatively night and day different. To try to quantify, console.c went from 119 to 36 SLOC (source lines of code, from cloc).
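(For a flavor of the simplification, a hypothetical sketch of what a blocking putnstr can collapse to once the kernel does the waiting. This is not the actual tock/libtock-c#345 code; allow_readonly(), command(), yield_wait_for(), and the console numbers are invented for illustration.)

#include <stddef.h>

// Hypothetical sketch only: share the buffer, start the write, block until the
// write-done event for this driver/subscribe pair, then unshare.
int putnstr_sync(const char* str, size_t len) {
  int ret = allow_readonly(CONSOLE_DRIVER, 1, str, len);
  if (ret < 0) return ret;

  ret = command(CONSOLE_DRIVER, 1, len, 0);
  if (ret < 0) return ret;

  // No subscribe and no callback: the kernel resumes us when the event is ready.
  yield_wait_for(CONSOLE_DRIVER, 1);

  allow_readonly(CONSOLE_DRIVER, 1, NULL, 0);
  return ret;
}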

@phil-levis (Contributor):

The original / primary motivation discussed at TockWorld was not any quantitative metric. It was correctness-oriented—i.e., no reentrancy concerns for userspace developers. I think the current text captures that as-intended.

That makes sense, but how do you evaluate that? The flipside is that you re-order callbacks, which has its own issues. E.g., if I do a yield-wait-for on packet reception, I can delay a timer upcall indefinitely. This seems a little tricky -- how do you write a userspace library when any other library (particularly one written by a developer who doesn't want to handle re-entrancy) can arbitrarily delay an upcall? Part of the challenge is that this is about the implementation of a library, i.e., which yield call it makes.

This makes me skittish because it's an extremely large architectural change (i.e., how upcalls are handled) without a substantive evaluation or tradeoff analysis plan. I buy the arguments -- it seems like it could be a good idea and so is worth exploring. But you have to build it. Is there a way to evaluate the cost of reentrancy concerns? I'm not arguing that this isn't the best solution, just that it's going to create different correctness problems. It seems very likely that those correctness problems are much less bad than the current ones. But this isn't a silver bullet.

Perhaps this would call for labeling particular library functions "non-reentrant", and if you call a non-reentrant function, you are non-reentrant? That way, userspace functions that need reentrancy can know not to call these functions. I.e., if you need to handle a timer, you don't call "nr_receive" or something like that.

It's important that the final TRD not include long discussions of design rationales; this isn't an academic paper. Following the command 0 example, we could include a link?

Comment on lines 944 to +949
#[derive(Copy, Clone)]
pub enum Task {
    /// A task that should not actually be executed. This is used to resume a
    /// suspended process without invoking any callbacks in userspace (e.g.,
    /// when YieldFor resolves to a Null Upcall).
    NullSubscribableUpcall(NullSubscribableUpcall),
Contributor:

This addition seems not strictly necessary. If we make fn_ptr in FunctionCall optional, we can re-use that enum option.

Perhaps it could be FunctionCallOrReturn?

_ => false,
},
Task::NullSubscribableUpcall(nu) => nu.upcall_id == upcall_id,
Task::IPC(_) => todo!(),
Contributor:

When benchmarking, keep in mind todo! adds ~100 bytes vs. false.

@alevy (Member) commented Nov 17, 2023

RE: @ppannuto's initial evaluation:

So for a rough, order-of-magnitude sense of impact, this adds ~1500 bytes to the kernel while removing ~500 bytes from the userspace console implementation — i.e., a net code size improvement for systems with >3 apps.

and

For the primary goal of code complexity reduction, the putnstr implementation is qualitatively night and day different. To try to quantify, console.c went from 119 to 36 SLOC (source lines of code, from cloc).

I buy this generally, but note I don't think this is an apples-to-apples comparison for putnstr, since the current main-branch implementation queues prints to avoid blocking while the modified version doesn't. Most of the complexity in the user-space console driver is from this queuing and can be removed. As an exercise, I did this a while ago and got substantially more concise code and substantial savings, without changing the kernel (I don't think I got 500 bytes, and it was also untested, so I'm not saying there are no savings here, just that this particular comparison probably shouldn't be taken completely at face value).

Also, I didn't even share my findings in a place I can now find them, so @ppannuto's exercise is about 1000x more valuable and useful than mine and my comment here should not be interpreted as a complaint.

@alevy (Member) commented Nov 17, 2023

Brief summary of our state

  • It seems that at least a plurality are in favor of exploring Yield-WaitFor-NoCallback.
  • It seems to me that Yield-WaitFor-OptionalSubscribe is different in interface (and I am partial to the options it might provide, particularly for Rust userland) but not so different in how it might affect most application correctness, code size, etc. So I think it might be a good idea to go ahead exploring the plurality opinion here, and gather evidence one way or the other for whether/how to include subscribe options at the yield call site.
  • We don't have a concrete plan for how to "measure" whether this change will be a net benefit in terms of correctness. I think some of this comes down to articulating what choices belong in the userland library vs. the app itself, but also to agreeing on what precisely we're looking for.
  • We have a concrete idea of what we want from a performance perspective: smaller application code size with a limited code size increase in the kernel (such that savings in a small number of apps "wins"), and secondarily performance gains. It would be best to have some specific threshold of app complexity/size (i.e., some examples) we might care about, but we can probably defer on that.

I suggest we forge a plan for exploring this completely, which would include a prototype implementation in the kernel (which I believe we already have) and porting several drivers & apps in both C and Rust to use this new interface.

@alevy (Member) commented Nov 17, 2023

The relevant text describing Yield-WaitFor-NoCallback (YWFNC) is this:

| Argument      | Register |
|---------------|----------|
| Yield number  | r0       |
| yield-result  | r1       |
| yield-param-A | r2       |
| yield-param-B | r3       |

Yield-WaitFor-NoCallback suspends the process until one
specific upcall---the WaitedForUpcallId---is ready to execute. If
other events arrive that would invoke an upcall on this process, they
are queued by the kernel, and will be delivered after the event for the
WaitedForUpcallId. Event order in this queue is maintained.

This process will resume execution when an event in the kernel generates
an upcall that matches the WaitedForUpcallId. No userspace callback
function will be invoked by the kernel. Instead, the contents of r0-r3 will
be set to Upcall Arguments as-feasible, specifically:

  • A SubscribeUpcall will set r0-r2 and leave r3 untouched.

yield-param-A is the Driver number and yield-param-B is the
Subscribe number to wait for. Together, these make up the
WaitedForUpcallId.

Important properties:

  • Between YWFNC being called and returning, no callbacks will run.
  • There may be a callback subscribed to that subscribe number, and it will be ignored.
  • A library/application may also call normal yield elsewhere, which would induce callbacks, but because YWFNC blocks the process completely, these are never in competition.
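(To make the usage concrete, a hypothetical sketch of a synchronous wrapper built on Yield-WaitFor-NoCallback as quoted above. The yield_waitfor_nocallback() helper, the command() wrapper, and the driver/subscribe/command numbers are invented for illustration; the helper is assumed to pass the Driver and Subscribe numbers as yield-param-A/B and to copy the returned r0-r2 upcall arguments out through the pointers.)

#include <stdint.h>

// Hypothetical sketch only: no subscribe call and no callback are needed.
int read_sensor_sync(int* value) {
  int ret = command(SENSOR_DRIVER, CMD_READ, 0, 0);        // start the operation
  if (ret < 0) return ret;

  uint32_t a0, a1, a2;
  yield_waitfor_nocallback(SENSOR_DRIVER, SUB_READ_DONE, &a0, &a1, &a2);

  // For many drivers the first upcall argument is a status code and the second
  // carries the result.
  if ((int)a0 != 0) return (int)a0;
  *value = (int)a1;
  return 0;
}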

@brghena (Contributor) commented Nov 17, 2023

From discussion: the arguments originally listed above are incorrect. There should only be three arguments.

EDIT: specifically, it's yield-result which shouldn't be an argument. The other three are necessary.

@bradjc (Contributor) commented Feb 20, 2024

This would also make it much easier to mix sync and async code in userspace, wouldn't it? I'm currently having my app crash from calling printf in a callback. I would like to see us do this just to simplify writing libtock-c apps.
