You can install stream-zip from PyPI using pip.
pip install stream-zip
This installs the latest version of stream-zip.
If you regularly install stream-zip, such as during application deployment, to avoid unexpected changes as new versions are released, you can pin to a specific version. Poetry or pip-tools are popular tools that can be used for this.
Some understanding of ZIP files is needed to use stream-zip. A ZIP file is a collection of member files, where each member file has 5 properties.
- File name
- Modification time
- Mode (File type and permissions)
- Method (Compression mechanism)
- Binary contents
stream-unzip does not offer defaults for any of these 5 properties.
A single function is exposed,
stream_zip. This function takes an iterable of member files, and returns an iterable yielding the bytes of the ZIP file. Each member file must be a tuple of the above 5 properties.
from datetime import datetime from stat import S_IFREG from stream_zip import ZIP_32, stream_zip member_files = ( ( 'my-file-1.txt', # File name datetime.now(), # Modification time S_IFREG | 0o600, # Mode - regular file that owner can read and write ZIP_32, # ZIP_32 has good support but limited to 4GiB (b'Some bytes 1',), # Iterable of chunks of contents ), ( 'my-file-2.txt', datetime.now(), S_IFREG | 0o600, ZIP_32, (b'Some bytes 2',), ), ) zipped_chunks = stream_zip(member_files): for zipped_chunk in zipped_chunks: print(zipped_chunk)
In the above example
member_files is a tuple. However, any iterable that yields tuples can be used, for example a generator.
from datetime import datetime from stat import S_IFREG from stream_zip import ZIP_32, stream_zip def member_files(): modified_at = datetime.now() mode = S_IFREG | 0o600 yield ('my-file-1.txt', modified_at, mode, ZIP_32, (b'Some bytes 1',)) yield ('my-file-2.txt', modified_at, mode, ZIP_32, (b'Some bytes 2',)) zipped_chunks = stream_zip(member_files()): for zipped_chunk in zipped_chunks: print(zipped_chunk)
Each iterable of binary chunks of file contents could itself be a generator.
from datetime import datetime from stat import S_IFREG from stream_zip import ZIP_32, stream_zip def member_files(): modified_at = datetime.now() mode = S_IFREG | 0o600 def file_1_data(): yield b'Some bytes 1' def file_2_data(): yield b'Some bytes 2' yield ('my-file-1.txt', modified_at, mode, ZIP_32, file_1_data()) yield ('my-file-2.txt', modified_at, mode, ZIP_32, file_2_data()) zipped_chunks = stream_zip(member_files()): for zipped_chunk in zipped_chunks: print(zipped_chunk)
This pattern of generators is typical for stream-unzip. Depending on how the generators are defined, it allows avoiding loading all the bytes of member files into memory at once.
Symbolic links can be stored in ZIP files. The mode must have
stat.S_IFLNK, and the binary contents of the file must be the path to the target of the symbolic link.
from datetime import datetime from stat import S_IFLNK from stream_zip import ZIP_32 link = ('source.txt', datetime.now(), S_IFLNK | 0o600, ZIP_32, (b'target.txt',))
Directories can be stored in ZIP files as empty member files whose name ends with a forward slash
/ that also have
stat.S_IFDIR in the mode.
from datetime import datetime from stat import S_IFDIR from stream_zip import ZIP_32 directory = ('my-dir/', datetime.now(), S_IFDIR | 0o700, ZIP_32, ())
stat.S_IFDIR on the file is technically optional, but is probably good practice since it will match the resulting directory after extraction. The permissions on the directory, in the above case
0o700, are not preserved by all clients on extraction.
It is not required to have a directory member file in order to have files in that directory. So this pattern is most useful to have empty directories in the ZIP.
Each member file is compressed with a method that must be specified in client code. See Methods for an explanation of each.
In general, not all valid ZIP files are possible to be stream unzipped. However, all files generated by stream-zip are suitable for stream unzipping, for example by stream-unzip.
NO_COMPRESSION_64 methods do not stream - they buffer the entire binary contents of the file in memory before output. They do this to calculate the length and CRC 32 to output them before the binary contents in the ZIP. This is required in order for ZIP to be stream unzippable.
However, if you are able to calculate the length and CRC 32 ahead of time, you can pass
NO_COMPRESSION_32(uncompressed_size, crc_32) or
NO_COMPRESSION_64(uncompressed_size, crc_32) as the method. These methods do not buffer the binary contents in memory, and so are streaming methods. See Methods for details of all supported methods.
Note that even for the streaming methods, it’s not possible to completely stream-write ZIP files. Small bits of metadata for each member file, such as its name, must be placed at the end of the ZIP. In order to do this, stream-zip buffers this metadata in memory until it can be output. This is likely to only to make a meaningful difference to memory usage for extremely high numbers of member files.