Protect against malicious nested archives (aka zip bombs) #210

Open
qkaiser opened this issue Jan 27, 2022 · 7 comments
Labels: enhancement, format:archive

Comments

@qkaiser
Contributor

qkaiser commented Jan 27, 2022

We should check the extracted file size during chunk parsing to avoid filling up the disk when extracting malicious nested archives.

Samples can be found here: https://www.bamsoftware.com/hacks/zipbomb/

For zip bombs, a check can be implemented in is_valid:

def is_valid(self, file: io.BufferedIOBase) -> bool:
    has_encrypted_files = False
    try:
        with zipfile.ZipFile(file) as zip:  # type: ignore
            for zipinfo in zip.infolist():
                if zipinfo.flag_bits & ENCRYPTED_FLAG:
                    has_encrypted_files = True
        if has_encrypted_files:
            logger.warning("There are encrypted files in the ZIP")
        return True
    except (zipfile.BadZipFile, UnicodeDecodeError, ValueError):
        return False

Something similar to this could work:

with zipfile.ZipFile(file) as zip:  # type: ignore
    extracted_size = sum(e.file_size for e in zip.infolist())
    if extracted_size > SOME_CONSTANT:
        return False  # bail out, likely a zip bomb

I'll check if similar behavior (i.e. "let's fill the whole disk") can be triggered with other formats.

@qkaiser qkaiser added enhancement New feature or request format:archive labels Jan 27, 2022
@qkaiser qkaiser self-assigned this Jan 27, 2022
@qkaiser
Contributor Author

qkaiser commented Jan 27, 2022

Please note that compression bombs only work with implementations that recursively extract the archive content, which technically we don't do. I still think this is a good enhancement / hardening of unblob to protect against these files. I'll do some tests locally and get back with some notes :)

@flukavsky
Contributor

Instead of checking the extracted file size for the content of a single compressed archive, we should keep an eye on the overall extracted size.

With the proposed approach, we could run into issues if a compressed archive has nested archives that are each below the threshold - e.g., 42.zip only extracts 530 KB in the first pass, but if we continue recursively, we will eventually extract a ton of 4.3 GB files, which by themselves may fly below the radar but still fill up the disk.

I'd suggest adding an option (e.g., --max-extraction-size) that would override a reasonable default value (e.g., 100 GB). Alternatively, we could also monitor available disk space and abort once we have used, e.g., 90% of the available free disk space.
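
A minimal sketch of what such a global budget could look like. The ExtractionBudget class, the default limit, and the free-disk-space check below are illustrative assumptions for this discussion, not existing unblob code:

# Hypothetical sketch: a process-wide extraction budget combining a
# --max-extraction-size limit with a free-disk-space check.
import shutil
from pathlib import Path

DEFAULT_MAX_EXTRACTION_SIZE = 100 * 1024**3  # e.g. 100 GB default
MIN_FREE_DISK_RATIO = 0.1  # abort once ~90% of the initial free space is used


class ExtractionBudget:
    def __init__(self, extract_root: Path, max_size: int = DEFAULT_MAX_EXTRACTION_SIZE):
        self.extract_root = extract_root
        self.max_size = max_size
        self.extracted_bytes = 0
        self.initial_free = shutil.disk_usage(extract_root).free

    def charge(self, new_bytes: int) -> None:
        """Account for newly extracted bytes; raise once any limit is exceeded."""
        self.extracted_bytes += new_bytes
        if self.extracted_bytes > self.max_size:
            raise RuntimeError("extraction size limit exceeded, possible decompression bomb")
        if shutil.disk_usage(self.extract_root).free < self.initial_free * MIN_FREE_DISK_RATIO:
            raise RuntimeError("running out of disk space, aborting extraction")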

@qkaiser qkaiser added this to the v3.0 milestone Feb 8, 2022
@vlaci
Contributor

vlaci commented Feb 22, 2022

It is very hard to come up with a general solution for this. I'd suggest extracting to a separate volume or using quotas to mitigate this issue.

@qkaiser qkaiser removed their assignment May 6, 2022
@qkaiser
Contributor Author

qkaiser commented Feb 10, 2023

I agree with the recommendation to use a separate volume and quotas. I'm sure we can provide sound recommendations on that subject in our documentation.

however

I was toying with the idea of implementing a "best effort" protection against zip bombs by collecting size information from StatReport.

A basic implementation is available on this branch: main...zip-bomb

I'm saying "best effort" because the smallest unit we can work on is a Task, so the increase in size is computed only after a file has been processed and decompressed/extracted.

I got good results against zip bombs with this; the only thing missing is proper cancellation/cleanup of running processes (marked as TODO in the code).

There's no urgency to it, but I would be glad if anyone from the team could have a look and share their thoughts.
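
For reference, the idea boils down to something like the sketch below: after each task completes, sum the sizes of the StatReports it produced and stop scheduling new tasks once a limit is crossed. The import path, the task_result shape, and the helper name are assumptions based on the description above, not the code on the linked branch:

# Illustrative sketch only; the import path and task_result fields are assumptions.
from unblob.report import StatReport  # assumed import path

MAX_EXTRACTION_SIZE = 10 * 1024**3  # example limit
total_extracted = 0


def accumulate_task_size(task_result) -> None:
    """Sum sizes from the task's StatReports and bail out once over budget."""
    global total_extracted
    for report in getattr(task_result, "reports", []):
        if isinstance(report, StatReport) and not report.is_dir:
            total_extracted += report.size
    if total_extracted > MAX_EXTRACTION_SIZE:
        raise RuntimeError("extraction budget exhausted, stopping further tasks")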

@vlaci
Contributor

vlaci commented Feb 10, 2023

I also had a weird idea: writing a wrapper that can be LD_PRELOAD-ed, captures write calls, tracks total disk consumption, and aborts when a limit is reached.

This could also be used to wire in files created by extractors to our reports in real-time.
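
If such a shim existed, wiring it into extractor subprocess invocations would be straightforward on the Python side. The shared object path and the WRITE_LIMIT_BYTES variable below are purely hypothetical:

# Hypothetical wiring for an LD_PRELOAD write-limiting shim; neither the
# shared object nor the environment variable exists in unblob today.
import os
import subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = "/usr/lib/unblob/libwritelimit.so"  # hypothetical shim
env["WRITE_LIMIT_BYTES"] = str(100 * 1024**3)  # hypothetical limit

# Example external extractor invocation running under the shim.
subprocess.run(["7z", "x", "archive.zip"], env=env, check=True)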

@qkaiser
Contributor Author

qkaiser commented Feb 10, 2023

This would go against our attempt at supporting OSX (or at least Apple M1/M2).

Another idea I had was to launch a "watcher process" registering inotify callbacks on the extracted directories recursively, killing watched processes once the size limit is reached. However, inotify only exists on Linux, and the vast majority of inotify libraries in Python suck (although it wouldn't be super complex to write a modern one).
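
As a rough illustration of the watcher idea (using simple polling here instead of inotify, purely to keep the sketch short and portable; all names are illustrative):

# Sketch of a watcher that kills the extractor once the extraction
# directory grows past a size limit. Polling stands in for inotify here.
import os
import signal
import time
from pathlib import Path


def dir_size(path: Path) -> int:
    total = 0
    for p in path.rglob("*"):
        try:
            if p.is_file():
                total += p.stat().st_size
        except OSError:
            pass  # files may disappear while we walk
    return total


def watch_extraction(extract_dir: Path, pid: int, max_bytes: int, interval: float = 1.0) -> None:
    while True:
        if dir_size(extract_dir) > max_bytes:
            os.kill(pid, signal.SIGKILL)  # stop the watched extractor
            break
        time.sleep(interval)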

@vlaci
Contributor

vlaci commented Feb 10, 2023

> This would go against our attempt at supporting OSX (or at least Apple M1/M2).

There is an equivalent of LD_PRELOAD on macOS (DYLD_INSERT_LIBRARIES), but it would certainly complicate things.
