Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

address warnings and races using atomics #1378

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

yogo1212
Copy link
Contributor

@yogo1212 yogo1212 commented Nov 19, 2022

Based on #1105 and #1363

This should make the code safer but there's now an atomic operation and a loop (to prevent using the debug map before it has been or while it's being initialised) where debug mode is checked.

Fixes: #777
Fixes: #779
Closes: #1105

@yogo1212 yogo1212 marked this pull request as draft November 19, 2022 21:34
@yogo1212
Copy link
Contributor Author

yogo1212 commented Nov 22, 2022

i only re-arranged the macros - why is it now unhappy with the 'static inline' function?
need to boot the windows laptop again...

@azat
Copy link
Member

azat commented Nov 22, 2022

i only re-arranged the macros - why is it now unhappy with the 'static inline' function?

Maybe EVENT__inline will help? (from util-internal.h)
It is one of inline/__inline/__inline__

P.S. but I has almost zero knowledge about win32 stuff

@yogo1212
Copy link
Contributor Author

thanks for the pointer - i hadn't noticed those!
msvc should support plain inline, so i'm leaving it as is.

i manged to get visual code installed and configured with openssl (what a pain).
the character position from the error log is misleading. in vc17, the identifier 'bool' is highlighted.
and by the looks of it, #include <stdbool.h> was missing. so i did move more than just the macros around...

@yogo1212 yogo1212 marked this pull request as ready for review November 22, 2022 13:13
@yogo1212
Copy link
Contributor Author

the build runs through on my windows computer, so i'm marking this as ready for review

@yogo1212
Copy link
Contributor Author

yogo1212 commented Nov 23, 2022

tests for openbsd were failing in actions as well - should i look into the failed checks for this PR?

@widgetii
Copy link
Member

I guess that the issue is in CI environment itself:

[warn] Error from accept() call: Too many open files
[warn] Error from accept() call: Too many open files
[warn] Error from accept() call: Too many open files
[warn] Error from accept() call: Too many open files
[warn] Error from accept() call: Too many open files
[warn] Error from accept() call: Too many open files
[warn] Error from accept() call: Too many open files

Can be fixed with adjusting maximum of open file descriptors

@azat
Copy link
Member

azat commented Nov 24, 2022

I guess that the issue is in CI environment itself:

There is one test that needs to get EMFILE error (dns AFAIR), so the test sets rlimit_max to some small number and fill the fds, so maybe it is this test.

tests for openbsd were failing in actions as well - should i look into the failed checks for this PR?

No, they are not stable (I wish they can be marked as skipped instead of failed, but still run)

@yogo1212
Copy link
Contributor Author

@azat what does this PR need to go continue on its way?

Copy link
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@azat what does this PR need to go continue on its way?

This is a very desired patch! But let's address the comment first.

event.c Show resolved Hide resolved
atomic-internal.h Outdated Show resolved Hide resolved
event.c Outdated Show resolved Hide resolved
@azat azat self-assigned this Feb 11, 2023
@azat
Copy link
Member

azat commented Feb 12, 2023

@yogo1212 I've changed the primitives (copied one from ffmpeg) and also made some fixes and now everything should be safe. There is a simple test that before shows the data-race for both variables (event_debug_mode_too_late and event_debug_created_threadable_ctx_) that can find in suppression for TSAN, but now there are no suppressions for TSAN required.

How does this looks to you?

@yogo1212
Copy link
Contributor Author

yogo1212 commented Feb 13, 2023

@azat Sure!

The single definition looks much nicer and the windows-only header is a neat way to not have to do that in C code 👍
The switch to the Interlocked*Pointer endpoint, I also like.

No idea how atomic_store in the imported header will ever terminate.. But I assume it does.
It looks like the authors also came to the conclusion that Interlocked doesn't distinguish between weak and strong, so they implemented _weak using the "strong" function. This is cruft at the moment (nothing uses weak in libevent) but will save some work should there ever be a reason to use atomic*_weak in libevent.

About protecting event_debug_mode_too_late and event_debug_created_threadable_ctx_...
While investigating whether a load with relaxed ordering is any different from a plain load with sync, I found that I missed something about stdatomics.
(on Windows, there shouldn't be any problem because the Interlocked operation is a full barrier)

seq_cst only guarantees a barrier in regard to seq_cst variables. While compiler and CPU won't reorder the operations, they don't require waiting for changes to propagate through caches - to my understanding, at least on armv7 and x86.
So, both too_late and threadable_ctx_ would technically have to be accessed using seq_cst as well (on top of event_debug_mode_state).
But: That would be two complete barriers in the respective critical paths.

Atm, the best way I see to avoid using more costly synchronisation while having 100% thread safety, is using only one combined variable for event_debug_mode_state, event_debug_mode_too_late, and event_debug_created_threadable_ctx_.
For instance, lets call this new variable _state_new here.
The highest bit of event_debug_mode_state would be reserved for what is now event_debug_mode_too_late and event_debug_created_threadable_ctx_ - make that two bits if the distinction is important.
An atomic_fetch_xor_explicit with seq_cst could set too_late (or created_threadable) return the state. (there is an InterlockedXor)

(EDIT: relaxed + volatile might also work but i wouldn't bet any money on it)

Libevent requires users to do the debug initialisation before creating any event bases (or events). Imho, the goals are 1. deal with the warnings and 2. make sure the code doesn't fail in a fatal way.
Whether it's really worth protecting event_debug_created_threadable_ctx_ and event_debug_mode_too_late, I have no strong opinion on. We're adding memory safety for illegal use of the library at the cost of performance for everyone (and CO2 even for those without a computer).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

evsig_base is thread unsafe event_debug_mode_too_late is thread unsafe
4 participants