Col-E/LL-Java-Zip

LLJ-ZIP

A closer-to-the-spec implementation of ZIP parsing for Java.

Relevant ZIP information

Official spec

The notes and structure outlines are the basis for most of LLJ-ZIP.

JVM zip parsing & JLI

The JVM zip reader implementation is based off this piece.

This is a zip format reader for seekable files, that tolerates leading and trailing garbage, and tolerates having had internal offsets adjusted for leading garbage (as with Info-Zip's zip -A).

But that's not all it does. That's just what that one comment says. Some other fun quirks of the JVM zip parser:

  • The end central directory entry is found by scanning from the end of the file, rather than from the beginning.
  • The central directory values are authoritative. Names/values defined by the local file headers are ignored.
  • The file data of local file headers is not size bound by the file header's compressed size field. Instead, it uses the central directory header's declared size.
  • Class names are allowed to end in a trailing /, which most tools interpret as a directory.
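
To illustrate why the scan direction matters, here is a minimal sketch (not the JVM's or LLJ-ZIP's actual code) that looks for the end of central directory signature from both ends of a byte array. The class name, method names, and sample data are hypothetical; the only facts taken from the ZIP spec are the signature value 0x06054b50 and the 22-byte minimum record size. Real parsers also validate the record's fields, which this sketch skips.

```java
import java.util.Arrays;

class EocdScanSketch {
    // End of central directory signature ("PK\5\6"), stored little-endian in the file.
    static final int EOCD_SIGNATURE = 0x06054b50;
    // Size of the fixed portion of the record, without the optional comment.
    static final int EOCD_MIN_SIZE = 22;

    /** Reads a little-endian 32-bit integer at the given offset. */
    static int readIntLE(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | (data[offset + 1] & 0xFF) << 8
                | (data[offset + 2] & 0xFF) << 16
                | (data[offset + 3] & 0xFF) << 24;
    }

    /** Backward scan: returns the offset of the last signature match, or -1. */
    static int findEocdFromEnd(byte[] data) {
        for (int i = data.length - EOCD_MIN_SIZE; i >= 0; i--)
            if (readIntLE(data, i) == EOCD_SIGNATURE)
                return i;
        return -1;
    }

    /** Forward scan: returns the offset of the first signature match, or -1. */
    static int findEocdFromStart(byte[] data) {
        for (int i = 0; i <= data.length - EOCD_MIN_SIZE; i++)
            if (readIntLE(data, i) == EOCD_SIGNATURE)
                return i;
        return -1;
    }

    /** Writes an empty (all-zero fields) record at the given offset. */
    static void writeEmptyEocd(byte[] data, int offset) {
        data[offset] = 'P';
        data[offset + 1] = 'K';
        data[offset + 2] = 5;
        data[offset + 3] = 6;
        Arrays.fill(data, offset + 4, offset + EOCD_MIN_SIZE, (byte) 0);
    }

    /** Hypothetical sample: filler bytes with one record at offset 5 and another at offset 30. */
    static byte[] sample() {
        byte[] data = new byte[56];
        Arrays.fill(data, (byte) '#');
        writeEmptyEocd(data, 5);
        writeEmptyEocd(data, 30);
        return data;
    }

    public static void main(String[] args) {
        byte[] data = sample();
        System.out.println("from end:   " + findEocdFromEnd(data));
        System.out.println("from start: " + findEocdFromStart(data));
    }
}
```

When an input contains more than one candidate record, the two scan directions settle on different ones, so two tools can disagree about the contents of the same file.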

Additional features

  • Reads ZIP files using MemorySegment backed mapped files.
  • Highly configurable, offering 3 ZIP reading strategies out of the box (See ZipIO for convenience calls)
    • Std / Forward scanning: Scans for EndOfCentralDirectory from the front of the file, like many other tools
    • Naive: Scans only for LocalFileHeader values from the front of the file, the fastest implementation, but obviously naive
    • JVM: Matches the behavior of the JVM's ZIP parser, including a number of odd edge cases. Useful for opening JAR files to mirror java -jar <path> behavior.
  • Inputs do not have to be on disk to be read; you can supply ZIP data in memory.
  • Tracks data in front of ZIP contents as ZipArchive.getPrefixData()
    • Useful for cases like keeping track of the executable header of Jar2Exe archives.
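
As a rough illustration of what "data in front of ZIP contents" means, the sketch below uses only java.util.zip to build an in-memory ZIP with a fake executable stub in front of it. It is not how ZipArchive.getPrefixData() is implemented; the class, method names, and stub bytes are hypothetical, and searching for the first local file header signature ("PK\3\4") is a simple heuristic that assumes the stub itself does not contain those bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class PrefixDataSketch {
    /** Builds prefix bytes followed by a one-entry ZIP, entirely in memory. */
    static byte[] zipWithPrefix(byte[] prefix) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(prefix, 0, prefix.length);
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    /** Returns the offset of the first local file header signature, or -1. */
    static int firstLocalHeaderOffset(byte[] data) {
        for (int i = 0; i + 4 <= data.length; i++)
            if (data[i] == 'P' && data[i + 1] == 'K' && data[i + 2] == 3 && data[i + 3] == 4)
                return i;
        return -1;
    }

    /** Counts the entries ZipInputStream yields when reading from the given offset. */
    static int countEntries(byte[] data, int offset) throws IOException {
        int count = 0;
        try (ZipInputStream in = new ZipInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            while (in.getNextEntry() != null)
                count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] prefix = "MZ-fake-stub".getBytes(StandardCharsets.US_ASCII);
        byte[] data = zipWithPrefix(prefix);
        int zipStart = firstLocalHeaderOffset(data);
        System.out.println("prefix length:            " + zipStart);
        System.out.println("entries read from 0:      " + countEntries(data, 0));
        System.out.println("entries read from prefix: " + countEntries(data, zipStart));
    }
}
```

ZipInputStream expects a local file header at the very start of the stream, so it yields nothing until the prefix is skipped. Keeping the prefix as separate data lets a tool preserve it when rewriting the archive.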

Usage

Maven dependency:

<dependency>
    <groupId>software.coley</groupId>
    <artifactId>lljzip</artifactId>
    <version>${zipVersion}</version> <!-- See release page for latest version -->
</dependency>

Gradle dependency:

// Either notation works, pick one
implementation group: 'software.coley', name: 'lljzip', version: zipVersion
implementation "software.coley:lljzip:${zipVersion}"

Basic usage:

// ZipIO offers a number of different utility calls for using different ZipReader implementations
ZipArchive archive = ZipIO.readJvm(path);

// Local files have the actual file data/bytes.
// These entries mirror data also declared in central directory entries.
List<LocalFileHeader> localFiles = archive.getLocalFiles();
for (LocalFileHeader localFile : localFiles) {
    // Data model mirrors how a byte-buffer works.
    ByteData data = localFile.getFileData();
    
    // You can extract the data to raw byte[]
    byte[] decompressed = ZipCompressions.decompress(localFile);
    
    // Or do so with a specific decompressor implementation
    byte[] inflated = localFile.decompress(DeflateDecompressor.INSTANCE);
}

// Typically used for authoritative definitions of properties.
// Some ZIP logic will ignore properties of 'LocalFileHeader' values and use these instead.
//  - Try using a hex editor to play around with this idea. Plenty of samples in the test cases to look at.
List<CentralDirectoryFileHeader> centralDirectories = archive.getCentralDirectories();

// Information about the archive and its contents.
EndOfCentralDirectory end = archive.getEnd();

For more detailed example usage see the tests.

How does each ZipReader implementation map to standard Java ZIP handling?

To see which implementation models each standard way of reading ZIP files in Java, use this table for reference:

Java closest equivalent | LL-Java-Zip
------------------------+------------------------------------------------
ZipFile                 | JvmZipReader / ZipIO.readJvm(...)
ZipInputStream          | ForwardScanZipReader / ZipIO.readStandard(...)
N/A                     | NaiveLocalFileZipReader / ZipIO.readNaive(...)

There is also a ZipFile delegating reader AdaptingZipReader but it should primarily be used only for debugging purposes.
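
For a feel of how the two standard Java readers in the table differ, here is a small sketch using only java.util.zip (no LLJ-ZIP calls). The class and method names are hypothetical. It writes a one-entry ZIP with ZipOutputStream, then asks both readers for the entry's uncompressed size: ZipFile reads it from the central directory, while ZipInputStream only sees the local file header as it streams forward.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class JdkZipReadersSketch {
    /** Writes a temporary ZIP holding a single deflated entry, "hello.txt". */
    static Path writeSampleZip() throws IOException {
        Path path = Files.createTempFile("sample", ".zip");
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(path))) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return path;
    }

    /** Size as reported by ZipFile, which uses the central directory. */
    static long sizeViaZipFile(Path path) throws IOException {
        try (ZipFile zip = new ZipFile(path.toFile())) {
            return zip.getEntry("hello.txt").getSize();
        }
    }

    /** Size as reported by ZipInputStream right after reading the local file header. */
    static long sizeViaZipInputStream(Path path) throws IOException {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(path))) {
            ZipEntry entry = in.getNextEntry();
            return entry.getSize();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = writeSampleZip();
        try {
            System.out.println("ZipFile size:        " + sizeViaZipFile(path));
            System.out.println("ZipInputStream size: " + sizeViaZipInputStream(path));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

ZipOutputStream does not know the size of a deflated entry when it writes the local file header, so it stores the sizes in a trailing data descriptor and in the central directory. ZipFile therefore reports 5 here, while ZipInputStream reports -1 (unknown) until the entry data has been read.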

Building

Due to some sun.misc.Unsafe hacks (for faster deflate performance), you will get compiler warnings when first opening the project in IntelliJ. You can resolve this by changing the compiler target:

[Image: IntelliJ compiler settings]