Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix waiting for mutiple extensions to be loaded not correctly waiting for timeout #8067

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sroache
Copy link

@sroache sroache commented Jun 21, 2023

Fixes #7256

  • Use a common delay when waiting for extensions.
  • Only set waited to finish waiting for extensions early on failure (timeout)

…nish waiting for extensions early on failure (timeout)
@sroache sroache requested review from a team as code owners June 21, 2023 10:16
@mike-myers-tob mike-myers-tob added ready for review Pull requests that are ready to be reviewed by a maintainer extensions Related to osquery extension SDK or to extensions themselves labels Jun 28, 2023
@directionless directionless added this to the 5.10.0 milestone Aug 1, 2023
@directionless
Copy link
Member

Close and reopen to kick ci

Copy link
Member

@Smjert Smjert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a stab at this.

Looking at how it was written I think that the intention was to apply the same timeout for each extension.
Here though we are changing it to be a global timeout (or wait time when an extension fails to load, not the real total time it take for it load + retries); more specifically I think the original implementation was to avoid waiting indefinitely for an extension that will not register either because it never have been started or because it's stuck somewhere in its own code.

With this instead the real wait time becomes: time taken by each successful extension + wait time for each retry for the successful extensions and failed one (if any). Only the second part is what it's being controlled by the global timeout.

I'm mostly stating things out loud here, but I think that this behavior should be documented properly in the wiki.

I also think that the waited mechanism doesn't make sense anymore.
The applyExtensionDelay function returns a failed status only when the predicate has failed (meaning no extension found yet or the ping failed) and the timeout has been reached (delay >= timeout) or a shutdown has been requested.
In both cases they enter in that if you moved the variable in, which returns, so there's no way for the passed lambda function to check it again.

I think the waited variable and the if inside the lambda have to be removed because they are a no-op.
Its existence I believe was to attempt to prevent further waiting if the timeout was being reached, given that if you have the first extension be successful, but also require retries that consume all the time, the second extension if not successful would still get at least one sleep of 20 ms (due to the do/while).
But the only way to stop that (where honestly I don't think is necessary) is to check again the timeout just before the sleep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extensions Related to osquery extension SDK or to extensions themselves ready for review Pull requests that are ready to be reviewed by a maintainer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--extensions_require=extension1,extension2 does not wait correctly for extensions after the first in the list
4 participants