Make individual print calls thread safe #1290

Open
wants to merge 17 commits into
base: main

Thomas1664
Contributor

@Thomas1664 Thomas1664 commented Nov 28, 2023

Currently, printing isn't thread safe because colorization requires multiple calls to the system API. Furthermore, functions like println wrap multiple print calls, so the trailing \n may be printed in the wrong place when another thread prints in between.

This PR fixes this by synchronizing access to the system API. With this PR, all calls to the Console's print functions are atomic with respect to each other.

However, there is no synchronization mechanism for printing multiple things after one another. Therefore, the next step is to delete println().

Member

@BillyONeal BillyONeal left a comment

This is, at a minimum, missing documentation on what the thread safety guarantees are to callers. In the header there needs to be a statement like "This function is safe to call from multiple threads. When called from multiple threads, calls are atomic with respect to other callers, but are not atomic with respect to the underlying console due to buffering."

That said, I'm not sure this is the correct approach. It doesn't matter how thread safe print is if callers are doing

print("the answer is ");
print(std::to_string(42));

For example, println is still broken:

inline void println(Color c, const LocalizedString& s)
{
    msg::write_unlocalized_text_to_stdout(c, s);
    msg::write_unlocalized_text_to_stdout(Color::none, "\n");
}

My intent in pointing out the printing issue in the previous PR was that it indicated to me that the careful analysis necessary to safely parallelize had not happened, not that I believed the fix was to make that API thread safe.
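
A small sketch of the distinction raised in this review, under stated assumptions: `AtomicPrint` is a hypothetical callable that models a print function which is already atomic per call, and both `println_*` helpers are invented names. With two calls, another thread can print between the text and the newline; composing the line first reduces it to one atomic call.

```c++
#include <functional>
#include <string>
#include <string_view>

// Hypothetical: models a print function that is atomic per call.
using AtomicPrint = std::function<void(std::string_view)>;

// Two atomic calls: another thread may print between the text and the newline.
inline void println_two_calls(const AtomicPrint& print, std::string_view s)
{
    print(s);
    print("\n");
}

// One atomic call: the newline is appended before anything is printed.
inline void println_one_call(const AtomicPrint& print, std::string_view s)
{
    std::string line(s);
    line.push_back('\n');
    print(line);
}
```

The same reasoning applies to the `print("the answer is "); print(std::to_string(42));` pattern: per-call locking cannot help unless the caller builds the whole message before printing.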

@Thomas1664

This comment was marked as resolved.

@Thomas1664

This comment was marked as resolved.

@Thomas1664 Thomas1664 marked this pull request as draft December 16, 2023 20:49
BillyONeal added a commit to BillyONeal/vcpkg-tool that referenced this pull request Dec 21, 2023
…ome clear that we need a thread safe channel through which errors and warnings can both be reasonably reported. Now that microsoft#1279 is landed and functionally everything in the codebase uses ExpectedL, we can look at what the new thing that fixes issues is.

Consider the following:

```c++
    ExpectedL<T> example_api(int a);

    ExpectedL<std::unique_ptr<SourceControlFile>> try_load_port_manifest_text(StringView text,
                                                                              StringView control_path,
                                                                              MessageSink& warning_sink);
```

The reason this can't return the warnings through the ExpectedL channel is that we don't want the 'error' state to be engaged when there are merely warnings. Moreover, because these are different channels, situations that might want to return errors and warnings together, as happens when parsing files, lose the order relationships between errors and warnings. It is probably a good idea in general to put warnings and errors about the same location next to each other in the output, but that's hard to do with this interface.

Rather than multiplexing everything through the return value, this proposal is to multiplex only the success or failure through the return value, and report any specific error information through an out parameter.

1. Distinguish whether an overall operation succeeded or failed in the return value, but
2. record any errors or warnings via an out parameter.

Applying this to the above gives:

```c++
    Optional<T> example_api(MessageContext& context, int a);

    // unique_ptr is already 'optional'
    std::unique_ptr<SourceControlFile> try_load_port_manifest_text(MessageContext& context,
                                                                   StringView text,
                                                                   StringView control_path);
```

Issues this new mechanism fixes:

* Errors and warnings can share the same channel and thus be printed together
* The interface between code wanting to report events and the code wanting to consume them is a natural thread synchronization boundary. Other attempts to fix this have been incorrect by synchronizing individual print calls ( microsoft#1290 ) or complex enough that we are not sure they are correct by trying to recover boundaries by reparsing our own error output ( microsoft#908 )
* This shuts down the "error: error:" and similar bugs where it isn't clear who is formatting the overall error message vs. talking about individual components

Known issues that are not fixed by this change:

* This still doesn't make it easy for callers to programmatically handle specific types of errors. Currently, we have some APIs that still use explicit `std::error_code` because they want to do different things for 'file does not exist' vs. 'there was an I/O error'. Given that this condition isn't well served by the ExpectedL mechanism I don't want to wait until we have a better solution to it to proceed.
* Because we aren't making the context parameter the 'success carrier' it's more complex to implement 'warnings as errors' or similar functionality where the caller decides how 'important' something is. I would be in favor of moving all success tests to the context parameter but I'm not proposing that because the other vcpkg maintainers do not like it.
* Contextual information / stack problems aren't solved. However, the context parameter might be extended in the future to help with this.
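
As a rough sketch of the shape described in this commit message (not the actual vcpkg-tool implementation): the `MessageContext` below, its `report` and `snapshot` members, the `Severity` and `Diagnostic` types, and the `parse_count` function are all hypothetical, and `std::optional` stands in for the codebase's `Optional<T>`. Success or failure travels in the return value, while warnings and errors go to the context in the order they were reported.

```c++
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
enum class Severity { warning, error };

struct Diagnostic
{
    Severity severity;
    std::string text;
};

class MessageContext
{
public:
    // One report() call is the unit of synchronization: a whole diagnostic
    // is recorded atomically, and warnings and errors share one ordered channel.
    void report(Severity severity, std::string text)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_diagnostics.push_back(Diagnostic{severity, std::move(text)});
    }

    std::vector<Diagnostic> snapshot() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_diagnostics;
    }

private:
    mutable std::mutex m_mutex;
    std::vector<Diagnostic> m_diagnostics;
};

// Returns the parsed value on success, std::nullopt on failure.
// Details about what went wrong are reported through the context only.
std::optional<int> parse_count(MessageContext& context, const std::string& text)
{
    if (text.empty() || text.size() > 9)
    {
        context.report(Severity::error, "expected 1 to 9 digits");
        return std::nullopt;
    }

    if (text.size() > 1 && text[0] == '0')
    {
        context.report(Severity::warning, "leading zero ignored");
    }

    int value = 0;
    for (char ch : text)
    {
        if (ch < '0' || ch > '9')
        {
            context.report(Severity::error, "not a non-negative integer");
            return std::nullopt;
        }

        value = value * 10 + (ch - '0');
    }

    return value;
}
```

A warning does not engage the failure state here: `parse_count(context, "07")` still returns a value, and the warning sits in the same list as any later error.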
BillyONeal added a commit to BillyONeal/vcpkg-tool that referenced this pull request Dec 22, 2023
BillyONeal added a commit to BillyONeal/vcpkg-tool that referenced this pull request Dec 29, 2023
BillyONeal added a commit to BillyONeal/vcpkg-tool that referenced this pull request Jan 26, 2024
@Thomas1664 Thomas1664 changed the title from "Make printing thread safe" to "Make individual print calls thread safe" Apr 28, 2024
@Thomas1664
Contributor Author

> That said, I'm not sure this is the correct approach. It doesn't matter how thread safe print is if callers are doing
>
> print("the answer is ");
> print(std::to_string(42));

I know but this isn't the point of this PR. Here, the goal really is to make single prints thread safe. However, the Console also allows printing multiple "buffered" strings with different colours at once. Additional PRs are required to ensure output of multiple things isn't interleaved.

@Thomas1664 Thomas1664 marked this pull request as ready for review April 29, 2024 06:17
@autoantwort
Contributor

> I know but this isn't the point of this PR. Here, the goal really is to make single prints thread safe.

Motivation, benefits?

@Thomas1664
Contributor Author

Thomas1664 commented Apr 30, 2024

> > I know but this isn't the point of this PR. Here, the goal really is to make single prints thread safe.
>
> Motivation, benefits?

Anything that prints can't be parallelized right now. Protecting prints with mutexes at the call site doesn't work because prints may hide behind function calls.
This PR would allow tasks that potentially print to the console to be parallelized, i.e. it would let PRs like #908 and #1256 move forward.

@autoantwort
Contributor

> This PR would allow tasks that potentially print to the console to be parallelized, i.e. PRs like #908

At least not #908, because the printing would still be interleaved with this PR.

@autoantwort
Contributor

In #1256 you could also use a MessageSink that internally locks before printing. But then you must still be sure that every message is printed via one print call. You have to check that manually, whereas #1323 would solve it at the type system level.
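
A minimal sketch of the "sink that internally locks" idea, assuming a hypothetical `Sink` interface with a single `print` member. This is not the real `MessageSink` declaration, and `LockingSink` and `StringSink` are invented names. The wrapper makes each call atomic but, as noted above, it cannot verify that a caller emits each message in a single call.

```c++
#include <mutex>
#include <string>
#include <string_view>

// Hypothetical interface; the real MessageSink may look different.
struct Sink
{
    virtual ~Sink() = default;
    virtual void print(std::string_view message) = 0;
};

// Decorator that serializes calls to the wrapped sink.
class LockingSink final : public Sink
{
public:
    explicit LockingSink(Sink& inner) : m_inner(inner) { }

    void print(std::string_view message) override
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_inner.print(message);
    }

private:
    std::mutex m_mutex;
    Sink& m_inner;
};

// Simple sink that collects output in memory, used here for demonstration.
struct StringSink final : Sink
{
    std::string text;

    void print(std::string_view message) override { text.append(message); }
};
```

Worker threads would share one `LockingSink` and pass complete messages, including the trailing newline, in a single `print` call.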

@Thomas1664
Contributor Author

> In #1256 you could also use a MessageSink that internally locks before printing. But then you must still be sure that every message is printed via one print call. You have to check that manually, whereas #1323 would solve it at the type system level.

I don't like these approaches because you would have to pass the sink down the call stack, potentially all the way to things like value_or_exit. IMO it's easier to not worry about this at all.

@Thomas1664
Contributor Author

> > This PR would allow tasks that potentially print to the console to be parallelized, i.e. PRs like #908
>
> At least not #908, because the printing would still be interleaved with this PR.

Yeah, but again the point here is about within a single print call.
