Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process IN constraints all at once in xFilter #8263

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

Micah-Kolide
Copy link
Contributor

Currently when processing the IN constraint in xFilter, sqlite core likes to call xFilter once per value. This PR tells sqlite core to hand off all values on an IN constraint to xFilter to reduce the amount of calls to xFilter.

Example:

Current osquery:

osquery> SELECT * FROM users WHERE uid IN (1, 501);
osquery planner: xBestIndex Evaluating constraints for table: users [index=0 column=0 term=0 usable=1]
osquery planner: xBestIndex Adding index constraint for table: users [column=uid arg_index=1 op=2]
osquery planner: xBestIndex Recording constraint set for table: users [cost=1.000000 size=1 idx=0]
osquery planner: xBestIndex Evaluating constraints for table: users [index=0 column=0 term=0 usable=0]
osquery planner: xBestIndex Recording constraint set for table: users [cost=1000000.000000 size=0 idx=1]
osquery planner: xOpen Opening cursor (0) for table: users
osquery planner: xFilter Filtering called for table: users [constraint_count=2 argc=1 idx=0]
osquery planner: xFilter Adding constraint to cursor (0): uid = 1
osquery planner: Scanning rows for cursor (0)
osquery planner: xFilter users generate returned row count:1
osquery planner: xFilter Filtering called for table: users [constraint_count=2 argc=1 idx=0]
osquery planner: xFilter Adding constraint to cursor (0): uid = 501
osquery planner: Scanning rows for cursor (0)
osquery planner: xFilter users generate returned row count:1
osquery planner: Closing cursor (0)


This PR's build of osquery:

osquery> SELECT * FROM users WHERE uid IN (1, 501);
osquery planner: xBestIndex Evaluating constraints for table: users [index=0 column=0 term=0 usable=1]
osquery planner: xBestIndex Adding index constraint for table: users [column=uid arg_index=1 op=2]
osquery planner: xBestIndex Recording constraint set for table: users [cost=1.000000 size=1 idx=0]
osquery planner: xOpen Opening cursor (0) for table: users
osquery planner: xFilter Filtering called for table: users [constraint_count=1 argc=1 idx=0]
osquery planner: xFilter Adding constraint to cursor (0): uid = 1
osquery planner: xFilter Adding constraint to cursor (0): uid = 501
osquery planner: Scanning rows for cursor (0)
osquery planner: xFilter users generate returned row count:2
osquery planner: Closing cursor (0)

@Micah-Kolide Micah-Kolide requested review from a team as code owners January 31, 2024 19:56
Copy link
Member

@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty neat!

I'm not sure who a good reviewer is. @zwass, @Smjert either of you feel okay here?

@zwass
Copy link
Member

zwass commented Feb 13, 2024

Do we think there are any table implementations that rely on this behavior? Any thoughts on a strategy to determine that?

@directionless
Copy link
Member

Do we think there are any table implementations that rely on this behavior? Any thoughts on a strategy to determine that?

Feels unlikely. When I've brought this up previously, people thought (a) didn't work this way or (b) was inherent to the implementation and unfixable.

@Smjert
Copy link
Member

Smjert commented Feb 13, 2024

Yeah definitely it wasn't possible to do before, they added sqlite3_vtab_in 2 years ago.

The question here is if the IN operator gets translated to EQUALS in our code.

The problem I see here is that if the tables do not account for this (multiple constraints values for the same constraint), then the result is wrong, because they will filter only with the first value, while with the old method, the table logic would've been called multiple times, each with one of the values in the IN list, so it would've worked correctly.

If IN constraints are matched by EQUALS then we have to check the tables as I mentioned above

Here is a case, in the users table, where it should work fine for instance, but again, not all tables loop on the returned constraints.

if (context.constraints["uid"].exists(EQUALS)) {
    auto uids = context.constraints["uid"].getAll(EQUALS);
    for (const auto& uid : uids) {
      auto const auid_exp = tryTo<long>(uid, 10);
      if (auid_exp.isValue()) {
        getpwuid_r(auid_exp.get(), &pwd, buf.get(), bufsize, &pwd_results);
        if (pwd_results != nullptr) {
          genUser(pwd_results, results);
        }
      }
    }
  }

Also I think due to this, none of the table logic inside the loop has been tested with a loop count > 1. So I suspect we will encounter bugs.

@directionless
Copy link
Member

We talked about this briefly at Office Hours today.

@Smjert suggest that incorrect loops should be relatively to audit for -- we can examine tables that have getAll. But I'm not sure we can make comprehensive tests or catch things.

@Smjert
Copy link
Member

Smjert commented Feb 14, 2024

We talked about this briefly at Office Hours today.

@Smjert suggest that incorrect loops should be relatively to audit for -- we can examine tables that have getAll. But I'm not sure we can make comprehensive tests or catch things.

Just to clarify, there are 2 potential issues:

  1. Tables that do not handle multiple constraint values on a column
  2. Tables that do handle it, but the looping code has never been tested (which I fear is all of them)

For 1., the code will use getAll as the example above. The problematic code won't loop over the elements and will access only the first.

@Micah-Kolide
Copy link
Contributor Author

Sorry that I've been silent on this thread, I've been testing/fixing the various table generations this would affect. This was a big oversight I made on my part for the initial push.

For now I'm restricting this to only apply to indexed columns and not columns where additional=True as I felt refactoring all of the generates that would use it would be too big a scope for this PR.

@Micah-Kolide Micah-Kolide force-pushed the micah/allow_xfilter_to_process_in_constraint branch from bf05a19 to c2edd50 Compare February 16, 2024 06:49
@Micah-Kolide Micah-Kolide force-pushed the micah/allow_xfilter_to_process_in_constraint branch from 66f7878 to c5baea5 Compare February 26, 2024 20:40
@Micah-Kolide
Copy link
Contributor Author

Micah-Kolide commented Feb 26, 2024

I believe I've pushed all of the changes needed for safely utilizing this sqlite optimization. I'm still doing testing and will update the pr once I complete that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants