Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new ADS table #8190

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Add new ADS table #8190

wants to merge 7 commits into from

Conversation

nachorpaez
Copy link
Contributor

@nachorpaez nachorpaez commented Nov 13, 2023

This PR adds a new table called ads to query Alternate Data Streams (ADS) in Windows. Resolves #5250

The table uses FindFirstStreamW and FindNextStreamW to enumerate the streams in a file and read the content.

If the stream name is Zone.Identifier it parses the values in a separate function.

I tried to keep the logic similar to the extended_attributes table and how it handles the kMDItemWhereFroms and quarantine attributes.

Screenshots

image

Additional resources

Practical Guide to Alternative Data Streams in NTFS

Alternate Data Streams Documentation

Highway To The Danger Zone.Identifier

About URL Security Zones

Open question

NTFS ADS also support executable files, for example we can hide the notepad executable in test.txt.

C:\WINDOWS>echo Test>test.txt
C:\WINDOWS>type notepad.exe>test.txt:note.exe
C:\WINDOWS>type test.txt
Test

As of now the table will base64 encode the binary but would love some guidance on what would be the best option for this cases.

@nachorpaez nachorpaez requested review from a team as code owners November 13, 2023 20:01
Copy link

linux-foundation-easycla bot commented Nov 13, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link
Member

@zwass zwass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This is a Windows "feature" I did not know about previously 😅

specs/windows/ads.table Outdated Show resolved Hide resolved
specs/windows/ads.table Outdated Show resolved Hide resolved
osquery/tables/system/windows/ads.cpp Outdated Show resolved Hide resolved
Comment on lines +137 to +142
// Folders can have ADS streams too
if (!(boost::filesystem::is_regular_file(path, ec) ||
boost::filesystem::is_directory(path, ec))) {
continue;
}
enumerateStreams(results, path.string());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this miss directories that are included in the directory constraint?


if (hFind != INVALID_HANDLE_VALUE) {
do {
// Skip first stream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? As a reader previously unfamiliar with ADS, I don't understand why the first stream is not relevant. Is it because the first stream is the standard file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, per FindFirstStreamW docs:

The FindFirstStreamW function opens a search handle and returns information about the first $DATA stream in the specified file or directory. For files, this is always the default, unnamed data stream, "::$DATA". Directories do not have $DATA streams by default and cannot have an unnamed data stream, but may have named data streams set after they have been created.

The unnamed data stream is the file contents so I thought it's best to skip that one. Although I have just noticed that by the way the table works now it will skip the first stream of a directory when it shouldn't do it since they can't have unnamed data streams.

I will address this alongside the other comments.

const std::string& value) {
Row r;
r["path"] = path;
r["directory"] = boost::filesystem::path(path).parent_path().string();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of a directory would this erroneously return the parent directory? I'm thinking in that case path and directory should be the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if a directory is specified the table will return something like:

> select * from ads where path = 'C:\Users\ignacior\Downloads\subdir';
+------------------------------------+-----------------------------+----------+-----------------+--------+
| path                               | directory                   | key      | value           | base64 |
+------------------------------------+-----------------------------+----------+-----------------+--------+
| C:\Users\ignacior\Downloads\subdir | C:\Users\ignacior\Downloads | hide.txt | secret          | 0      |
+------------------------------------+-----------------------------+----------+-----------------+--------+

Which is a similar behaviour to the extended_attributes table:

Using a virtual database. Need help, type '.help'
osquery> select version from osquery_info;
+---------+
| version |
+---------+
| 5.4.0   |
+---------+
osquery> select * from extended_attributes where path = '/Users/ignacior/Downloads';
+---------------------------+-----------------+----------------+--------------------------------------+--------+
| path                      | directory       | key            | value                                | base64 |
+---------------------------+-----------------+----------------+--------------------------------------+--------+
| /Users/ignacior/Downloads | /Users/ignacior | com.apple.macl | BAAoRInjLHdMmrS8U0MEzCBvBADshbS1+VFBm| 1      |
+---------------------------+-----------------+----------------+--------------------------------------+--------+

I'm happy to change it though if it's better to keep path and directory the same.

@nachorpaez
Copy link
Contributor Author

@zwass I applied some of your suggested changes. I will take a look at having the tests create a file later this week.

}
}

QueryData genAds(QueryContext& context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little unsure of how the path and directory expansion logic work.

I think the intent is when one can query a directory, and get back all the files inside. (Similar to the file table). Or one can query a specific path. (And both probably support LIKE). This makes sense to me.

But the implementation enumerates the the path first, and then the directory. And I think that will result in equal work being done. Sometimes? I guess it's confusing.

If both are part of the query predicate, sqlite will filter to only return rows that match both. Eg select * from ads where directory = '/tmp/' AND path = '/var/tmp/foo' would result first in enumerating /var/tmp/foo, then all the files in /tmp, and would ultimately return nothing, because sqlite filtered them.

Though, to correct myself, that's not at all true there's an OR in there.

I guess I'd suggest a reasonable pattern is to generate the list of things to enumerate, and then find the union of them. This would, at least, prevent duplicate enumeration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is a copy paste from the extended_attributes table since it was from where I based by work form. I agree is a bit confusing 😅

I can take a look into making it more efficient.

Column("path", TEXT, "Absolute file path", required=True, index=True),
Column("directory", TEXT, "Directory of file(s)", required=True),
Column("key", TEXT, "Name of the value generated from the stream"),
Column("value", TEXT, "The parsed information from the attribute"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of data shows up here? Some internet things talk about malware smuggling entire file contents here. Will that be okay to push back in a column? (We don't generally push that much data through osquery, so it feels a little amiss)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree. I think the main value of this table is the content of the Zone.Identifier stream which can help during investigations to identify where the file was downloaded from.
In my PR description I left an open question about how to handle cases where a stream contains an entire file. Maybe we can set a hard limit on the length of the content and warn users if the content is too large to be displayed by osquery?
Happy to hear other thoughts.

I'm also not sure how the extended_attributes table handles this type of cases in nix systems.

Copy link
Member

@Smjert Smjert Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure how the extended_attributes table handles this type of cases in nix systems.

The possible max value size is much more limited: https://en.wikipedia.org/wiki/Extended_file_attributes

The Linux kernel allows extended attribute to have names of up to 255 bytes and values of up to 64 KiB, as do XFS and ReiserFS, but ext2/3/4 and btrfs impose much smaller limits, requiring all the attributes (names and values) of one file to fit in one "filesystem block" (usually 4 KiB)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it probably depends on what we're concerned about.

If it's content, the most conservative approach would be to only fetch the value for keys on a compiled allowlist. (pushing people to use carves if they want the content)

If it's size, truncation and warnings probably make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tables: alternate data streams on Windows
4 participants