Contributor guide

Development environment

If you’re reading this, you’re probably interested in contributing to py7zr. Thank you very much! The purpose of this guide is to get you to the point where you can make improvements to py7zr and share them with the rest of the team.

Setup Python

py7zr is written in the Python programming language. Python can be installed on various platforms in various ways. You need a Python environment that supports the pip command. Using venv or virtualenv is recommended for development.

The test suite runs against Python 3.6, 3.7, 3.8 and pypy3. If you want to run all the tests against these versions and variants locally, you need to install each of them. You can also run the tests in the CI environment on GitHub Actions.

Get Early Feedback

If you are contributing, do not feel the need to sit on your contribution until it is perfectly polished and complete. It helps everyone involved for you to seek feedback as early as you possibly can. Submitting an early, unfinished version of your contribution for feedback in no way prejudices your chances of getting that contribution accepted, and can save you from putting a lot of work into a contribution that is not suitable for the project.

Code Contributions

Steps for submitting code

When contributing code, you’ll want to follow this checklist:

  1. Fork the repository on GitHub.

  2. Run the tox tests to confirm they all pass on your system. If they don’t, you’ll need to investigate why they fail. If you’re unable to diagnose this yourself, raise it as a bug report.

  3. Write tests that demonstrate your bug or feature. Ensure that they fail.

  4. Make your change.

  5. Run the entire test suite again using tox, confirming that all tests pass including the ones you just added.

  6. Send a GitHub Pull Request to the main repository’s master branch. GitHub Pull Requests are the expected method of code collaboration on this project.

Code review

Contributions will not be merged until they have been code reviewed. The team has a limited number of reviewers, so reviews from other contributors are also welcome. You should implement review feedback unless you strongly object to it.

Code style

py7zr uses the PEP8 code style. In addition to standard PEP8, we have extended guidelines:

  • Line length should not exceed 125 characters.

  • Static type checking with MyPy is enforced.

Profiling

CPU and memory profiling

Run-time memory errors and leaks are among the most difficult errors to locate and the most important to correct. Memory profiling is used to detect memory leaks and unwanted memory usage.

Improving performance is also difficult work. CPU profiling helps us understand where the hot spots in a program's execution are.
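As a rough illustration of both kinds of profiling, the sketch below uses only the standard library (cProfile, pstats and tracemalloc). It is not the project's tox-based profiling setup; workload(), cpu_report() and peak_memory() are hypothetical names introduced for this example, and workload() merely stands in for the code you want to investigate.

```python
import cProfile
import io
import pstats
import tracemalloc
import zlib


def workload() -> bytes:
    """Hypothetical stand-in for the code under investigation."""
    return zlib.compress(b"py7zr " * 200_000)


def cpu_report(func, top: int = 5) -> str:
    """Run func under cProfile and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def peak_memory(func) -> int:
    """Return the peak number of bytes traced by tracemalloc while func runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print(cpu_report(workload))
    print("peak bytes:", peak_memory(workload))
```

The CPU report points at the functions with the largest cumulative time, which is usually where to start looking for a hot spot. The peak figure only covers allocations made through Python's allocators.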

mprofile

mprofile is a memory profiling tool for Python. The py7zr project has a test configuration for memory profiling.

env PYTEST_ADDOPTS=--run-slow tox -e mprof

This example runs all the test cases, including those that take a long time to run.

After running the tests, you can find a chart, memory-profile.png, in the project root, along with the raw data as mprofile_yyyyMMddhhmmss.dat.

Class and module design

py7zr's classes are organized into several submodules according to their roles.

The main class is py7zr.SevenZipFile, which provides the API for library users. The main internal classes are in the submodule py7zr.archiveinfo, whose class structure mirrors the structure of the .7z file format.

Another important submodule is py7zr.compressor, which holds all the compression and encryption proxy classes. These wrap the corresponding libraries and convert their various interfaces into the common ISevenZipCompressor and ISevenZipDecompressor interfaces.

All UI-related classes and functions are separated from the core modules. The cli submodule is the place for command-line functions and pretty-printing.

digraph "packages" { charset="utf-8" rankdir=BT "0" [label="py7zr", shape="box"]; "1" [label="py7zr.__main__", shape="box"]; "2" [label="py7zr.archiveinfo", shape="box"]; "3" [label="py7zr.callbacks", shape="box"]; "4" [label="py7zr.cli", shape="box"]; "5" [label="py7zr.compressor", shape="box"]; "6" [label="py7zr.exceptions", shape="box"]; "7" [label="py7zr.helpers", shape="box"]; "8" [label="py7zr.properties", shape="box"]; "9" [label="py7zr.py7zr", shape="box"]; "10" [label="py7zr.win32compat", shape="box"]; "0" -> "6" [arrowhead="open", arrowtail="none"]; "0" -> "8" [arrowhead="open", arrowtail="none"]; "0" -> "9" [arrowhead="open", arrowtail="none"]; "2" -> "5" [arrowhead="open", arrowtail="none"]; "2" -> "6" [arrowhead="open", arrowtail="none"]; "2" -> "7" [arrowhead="open", arrowtail="none"]; "2" -> "8" [arrowhead="open", arrowtail="none"]; "4" -> "3" [arrowhead="open", arrowtail="none"]; "4" -> "5" [arrowhead="open", arrowtail="none"]; "4" -> "7" [arrowhead="open", arrowtail="none"]; "4" -> "8" [arrowhead="open", arrowtail="none"]; "4" -> "9" [arrowhead="open", arrowtail="none"]; "5" -> "6" [arrowhead="open", arrowtail="none"]; "5" -> "7" [arrowhead="open", arrowtail="none"]; "5" -> "8" [arrowhead="open", arrowtail="none"]; "9" -> "2" [arrowhead="open", arrowtail="none"]; "9" -> "3" [arrowhead="open", arrowtail="none"]; "9" -> "5" [arrowhead="open", arrowtail="none"]; "9" -> "6" [arrowhead="open", arrowtail="none"]; "9" -> "7" [arrowhead="open", arrowtail="none"]; "9" -> "8" [arrowhead="open", arrowtail="none"]; }

Here is the complete class diagram. The following sections describe it part by part.

digraph "classes" { charset="utf-8" rankdir=BT "0" [label="{AESCompressor|AES_CBC_BLOCKSIZE : int\lbuf\lcipher\lcycles : int\lflushed : bool\liv\lmethod\lsalt : bytes\l|compress(data)\lencode_filter_properties()\lflush()\l}", shape="record"]; "1" [label="{AESDecompressor|buf\lcipher\l|decompress(data)\l}", shape="record"]; "2" [label="{ArchiveCallback|\l|}", shape="record"]; "4" [label="{ArchiveFile|archivable\lcompressed\lcrc32\lemptystream\lfilename\lfolder\lid\lis_directory\lis_junction\lis_socket\lis_symlink\llastwritetime\lorigin\lposix_mode\lreadonly\lst_fmt\luncompressed\l|file_properties()\l}", shape="record"]; "5" [label="{ArchiveFileList|files_list : list\lindex : int\loffset : int\l|append(file_info)\l}", shape="record"]; "7" [label="{ArchiveInfo|blocks\lfilename\lheader_size\lmethod_names\lsize\lsolid\luncompressed\l|}", shape="record"]; "13" [label="{Buffer|view : memoryview\l|add(data)\lget()\lreset()\lset(data)\l}", shape="record"]; "16" [label="{Callback|\l|report_end(processing_file_path, wrote_bytes)\lreport_postprocess()\lreport_start(processing_file_path, processing_bytes)\lreport_start_preparation()\lreport_warning(message)\l}", shape="record"]; "19" [label="{CompressionMethod|ARM\lARMT\lBCJ\lBCJ_ARM\lBCJ_ARMT\lBCJ_IA64\lBCJ_PPC\lBCJ_SPARC\lCOPY\lCRYPT_AES256_SHA256\lCRYPT_RAR29AES\lCRYPT_ZIPCRYPT\lDELTA\lIA64\lLZMA\lLZMA2\lMISC_BROTLI\lMISC_BZIP2\lMISC_DEFLATE\lMISC_DEFLATE64\lMISC_LIZARD\lMISC_LZ4\lMISC_LZH\lMISC_LZS\lMISC_Z\lMISC_ZIP\lMISC_ZSTD\lNSIS_BZIP2\lNSIS_DEFLATE\lP7Z_BCJ\lP7Z_BCJ2\lPPC\lPPMD\lSPARC\lSWAP2\lSWAP4\l|}", shape="record"]; "20" [label="{CompressorChain|digest : int\lfilters : list\lmethods_map\lpacksize : int\lunpacksizes\l|add_filter(filter)\lcompress(data)\lflush()\l}", shape="record"]; "22" [label="{CopyCompressor|\l|compress(data)\lflush()\l}", shape="record"]; "23" [label="{CopyDecompressor|\l|decompress(data)\l}", shape="record"]; "26" [label="{DecompressorChain|filters : list\l|add_filter(filter)\ldecompress(data, 
max_length)\l}", shape="record"]; "27" [label="{DeflateCompressor|\l|compress(data)\lflush()\l}", shape="record"]; "28" [label="{DeflateDecompressor|flushed : bool\l|decompress(data)\l}", shape="record"]; "30" [label="{ExtractCallback|\l|}", shape="record"]; "31" [label="{FileInfo|archivable\lcompressed\lcrc32\lcreationtime\lfilename\lis_directory\luncompressed\l|}", shape="record"]; "32" [label="{FilesInfo|emptyfiles : list\lfiles : list\l|retrieve(cls, file)\lwrite(file)\l}", shape="record"]; "33" [label="{Folder|bindpairs : list\lcoders : list\lcompressor : NoneType\lcrc : int, NoneType\ldecompressor : NoneType\ldigestdefined : bool\lfiles : NoneType\lpacked_indices : list\lsolid : bool\lunpacksizes : list\l|get_compressor()\lget_decompressor(packsize, reset)\lget_unpack_size()\lis_simple(coder)\lprepare_coderinfo(filters)\lretrieve(cls, file)\lwrite(file)\l}", shape="record"]; "34" [label="{Header|files_info : NoneType\lmain_streams : NoneType\lsize : int\lsolid : bool\l|build_header(folders)\lretrieve(cls, fp, buffer, start_pos)\lwrite(file, afterheader, encoded, encrypted)\l}", shape="record"]; "35" [label="{HeaderStreamsInfo|packinfo\lunpackinfo\l|write(file)\l}", shape="record"]; "37" [label="{ISevenZipCompressor|\l|compress(data)\lflush()\l}", shape="record"]; "38" [label="{ISevenZipDecompressor|\l|decompress(data)\l}", shape="record"]; "41" [label="{MemIO|parent\l|close()\lflush()\lmkdir(parents, exist_ok)\lopen(mode)\lread(length)\lseek(position)\lwrite(data)\l}", shape="record"]; "44" [label="{NullIO|parent\l|close()\lflush()\lmkdir()\lopen(mode)\lread(length)\lwrite(data)\l}", shape="record"]; "45" [label="{PackInfo|crcs : list\lenable_digests : bool\lnumstreams : int\lpackpos : int\lpackpositions\lpacksizes : list\l|retrieve(cls, file)\lwrite(file)\l}", shape="record"]; "53" [label="{SevenZipCompressor|cchain\lcoders : list\ldigest\lfilters : NoneType, list\lmethods_map\lpacksize\lunpacksizes\l|compress(data)\lflush()\l}", shape="record"]; "54" 
[label="{SevenZipDecompressor|cchain\lconsumed : int\lcrc\ldigest : NoneType, int\linput_size\lmethods_map\lunpacksizes\l|check_crc()\ldecompress(data, max_length)\l}", shape="record"]; "55" [label="{SevenZipFile|afterheader\ldereference : bool\lencoded_header_mode : bool\lfilename : str\lfiles : NoneType\lfolder : NoneType\lfp\lheader : NoneType\lmode : str\lpassword : NoneType\lpassword_protected : bool\lq\lreporterd : NoneType\lsig_header : NoneType\lworker : NoneType\l|archiveinfo()\lclose()\lextract(path, targets)\lextractall(path, callback)\lgetnames()\llist()\lread(targets)\lreadall()\lreporter(callback)\lreset()\lset_encoded_header_mode(mode)\ltest()\ltestzip()\lwrite(file, arcname)\lwriteall(path, arcname)\l}", shape="record"]; "56" [label="{SignatureHeader|nextheadercrc : int\lnextheaderofs : int\lnextheadersize : int\lstartheadercrc : int\lversion : tuple\l|calccrc(length, header_crc)\lretrieve(cls, file)\lwrite(file)\l}", shape="record"]; "57" [label="{StreamsInfo|packinfo : NoneType\lsubstreamsinfo : NoneType\lunpackinfo : NoneType\l|read(file)\lretrieve(cls, file)\lwrite(file)\l}", shape="record"]; "58" [label="{SubstreamsInfo|digests : list\ldigestsdefined : list\lnum_unpackstreams_folders : list\lunpacksizes : list, NoneType\l|retrieve(cls, file, numfolders, folders)\lwrite(file, numfolders)\l}", shape="record"]; "59" [label="{SupportedMethods|formats : list\lmethods : list\l|get_coder(cls, filter)\lget_filter_id(cls, coder)\lget_method_id(cls, filter)\lis_compressor(cls, filter)\lis_crypto(cls, filter)\lis_native_coder(cls, coder)\lis_native_filter(cls, filter)\l}", shape="record"]; "64" [label="{UnpackInfo|datastreamidx : NoneType\lfolders : list\lnumfolders : NoneType, int\l|retrieve(cls, file)\lwrite(file)\l}", shape="record"]; "66" [label="{Worker|files\lheader\lsrc_start\ltarget_filepath : dict\l|archive(fp, folder, deref)\ldecompress(fp, folder, fq, size, compressed_size, src_end)\lextract(fp, parallel, q)\lextract_single(fp, files, 
src_start, src_end, q)\lregister_filelike(id, fileish)\l}", shape="record"]; "67" [label="{ZstdCompressor|\l|compress(data)\lflush()\l}", shape="record"]; "69" [label="{ZstdDecompressor|\l|decompress(data)\l}", shape="record"]; "0" -> "37" [arrowhead="empty", arrowtail="none"]; "1" -> "38" [arrowhead="empty", arrowtail="none"]; "2" -> "16" [arrowhead="empty", arrowtail="none"]; "22" -> "37" [arrowhead="empty", arrowtail="none"]; "23" -> "38" [arrowhead="empty", arrowtail="none"]; "27" -> "37" [arrowhead="empty", arrowtail="none"]; "28" -> "38" [arrowhead="empty", arrowtail="none"]; "30" -> "16" [arrowhead="empty", arrowtail="none"]; "35" -> "57" [arrowhead="empty", arrowtail="none"]; "37" -> "20" [arrowhead="empty", arrowtail="none"]; "38" -> "26" [arrowhead="empty", arrowtail="none"]; "67" -> "37" [arrowhead="empty", arrowtail="none"]; "69" -> "38" [arrowhead="empty", arrowtail="none"]; "5" -> "55" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="files", style="solid"]; "5" -> "55" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="files", style="solid"]; "13" -> "0" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="buf", style="solid"]; "13" -> "1" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="buf", style="solid"]; "20" -> "53" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="cchain", style="solid"]; "26" -> "54" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="cchain", style="solid"]; "32" -> "34" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="files_info", style="solid"]; "32" -> "34" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="files_info", style="solid"]; "33" -> "64" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="folders", style="solid"]; "34" -> "55" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="header", style="solid"]; "34" -> "55" [arrowhead="diamond", arrowtail="none", 
fontcolor="green", label="header", style="solid"]; "45" -> "35" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="packinfo", style="solid"]; "45" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="packinfo", style="solid"]; "45" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="packinfo", style="solid"]; "53" -> "33" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="compressor", style="solid"]; "54" -> "33" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="decompressor", style="solid"]; "56" -> "55" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="sig_header", style="solid"]; "56" -> "55" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="sig_header", style="solid"]; "57" -> "34" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="additional_streams", style="solid"]; "57" -> "34" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="main_streams", style="solid"]; "57" -> "34" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="main_streams", style="solid"]; "58" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="substreamsinfo", style="solid"]; "58" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="substreamsinfo", style="solid"]; "64" -> "35" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="unpackinfo", style="solid"]; "64" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="unpackinfo", style="solid"]; "64" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="unpackinfo", style="solid"]; "66" -> "55" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="worker", style="solid"]; }

Header classes

Header-related classes are in the py7zr.archiveinfo submodule.

digraph "classes" { charset="utf-8" rankdir=BT "33" [label="{Folder|bindpairs : list\lcoders : list\lcompressor : NoneType\lcrc : int, NoneType\ldecompressor : NoneType\ldigestdefined : bool\lfiles : NoneType\lpacked_indices : list\lsolid : bool\lunpacksizes : list\l|get_compressor()\lget_decompressor(packsize, reset)\lget_unpack_size()\lis_simple(coder)\lprepare_coderinfo(filters)\lretrieve(cls, file)\lwrite(file)\l}", shape="record"]; "34" [label="{Header|files_info : FilesInfo\lmain_streams : StreamsInfo\lsize : int\lsolid : bool\l|build_header(folders)\lretrieve(cls, fp, buffer, start_pos)\lwrite(file, afterheader, encoded, encrypted)\l}", shape="record"]; "35" [label="{HeaderStreamsInfo|packinfo : PackInfo\lunpackinfo : UnpackInfo\l|write(file)\l}", shape="record"]; "45" [label="{PackInfo|crcs : list\lenable_digests : bool\lnumstreams : int\lpackpos : int\lpackpositions\lpacksizes : list\l|retrieve(cls, file)\lwrite(file)\l}", shape="record"]; "55" [label="{SevenZipFile}", shape="record"]; "56" [label="{SignatureHeader|nextheadercrc : int\lnextheaderofs : int\lnextheadersize : int\lstartheadercrc : int\lversion : tuple\l|calccrc(length, header_crc)\lretrieve(cls, file)\lwrite(file)\l}", shape="record"]; "57" [label="{StreamsInfo|packinfo : NoneType\lsubstreamsinfo : NoneType\lunpackinfo : NoneType\l|read(file)\lretrieve(cls, file)\lwrite(file)\l}", shape="record"]; "58" [label="{SubstreamsInfo|digests : list\ldigestsdefined : list\lnum_unpackstreams_folders : list\lunpacksizes : list\l|retrieve(cls, file, numfolders, folders)\lwrite(file, numfolders)\l}", shape="record"]; "64" [label="{UnpackInfo|folders : list\lnumfolders : int\l|retrieve(cls, file)\lwrite(file)\l}", shape="record"]; "35" -> "57" [arrowhead="empty", arrowtail="none"]; "33" -> "64" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="folders", style="solid"]; "34" -> "55" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="header", style="solid"]; "34" -> "55" 
[arrowhead="diamond", arrowtail="none", fontcolor="green", label="header", style="solid"]; "45" -> "35" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="packinfo", style="solid"]; "45" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="packinfo", style="solid"]; "45" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="packinfo", style="solid"]; "56" -> "55" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="sig_header", style="solid"]; "56" -> "55" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="sig_header", style="solid"]; "57" -> "34" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="additional_streams", style="solid"]; "57" -> "34" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="main_streams", style="solid"]; "57" -> "34" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="main_streams", style="solid"]; "58" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="substreamsinfo", style="solid"]; "58" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="substreamsinfo", style="solid"]; "64" -> "35" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="unpackinfo", style="solid"]; "64" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="unpackinfo", style="solid"]; "64" -> "57" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="unpackinfo", style="solid"]; }

Compressor classes

Compression-related classes are in the py7zr.compressor submodule.

digraph "compressor_classes" { charset="utf-8" rankdir=BT "0" [label="{AESCompressor|cycles : int\liv\lmethod\lsalt : bytes\l|compress(data)\lencode_filter_properties()\lflush()\l}", shape="record"]; "1" [label="{AESDecompressor|\l|decompress(data)\l}", shape="record"]; "19" [label="{CompressionMethod|ARM\lARMT\lBCJ\lBCJ_ARM\lBCJ_ARMT\lBCJ_IA64\lBCJ_PPC\lBCJ_SPARC\lCOPY\lCRYPT_AES256_SHA256\lCRYPT_RAR29AES\lCRYPT_ZIPCRYPT\lDELTA\lIA64\lLZMA\lLZMA2\lMISC_BROTLI\lMISC_BZIP2\lMISC_DEFLATE\lMISC_DEFLATE64\lMISC_LIZARD\lMISC_LZ4\lMISC_LZH\lMISC_LZS\lMISC_Z\lMISC_ZIP\lMISC_ZSTD\lNSIS_BZIP2\lNSIS_DEFLATE\lP7Z_BCJ\lP7Z_BCJ2\lPPC\lPPMD\lSPARC\lSWAP2\lSWAP4\l|}", shape="record"]; "20" [label="{CompressorChain|digest : int\lfilters : list\lpacksize : int\lunpacksizes\l|add_filter(filter)\lcompress(data)\lflush()\l}", shape="record"]; "22" [label="{CopyCompressor|\l|compress(data)\lflush()\l}", shape="record"]; "23" [label="{CopyDecompressor|\l|decompress(data)\l}", shape="record"]; "26" [label="{DecompressorChain|filters : list\l|add_filter(filter)\ldecompress(data, max_length)\l}", shape="record"]; "27" [label="{DeflateCompressor|\l|compress(data)\lflush()\l}", shape="record"]; "28" [label="{DeflateDecompressor|\l|decompress(data)\l}", shape="record"]; "33" [label="{Folder}", shape="record"]; "37" [label="{ISevenZipCompressor|\l|compress(data)\lflush()\l}", shape="record"]; "38" [label="{ISevenZipDecompressor|\l|decompress(data)\l}", shape="record"]; "53" [label="{SevenZipCompressor|cchain\lcoders : list\ldigest\lfilters : list\lpacksize\lunpacksizes\l|compress(data)\lflush()\l}", shape="record"]; "54" [label="{SevenZipDecompressor|cchain : list\lcrc\ldigest : int\lunpacksizes\l|check_crc()\ldecompress(data, max_length)\l}", shape="record"]; "59" [label="{SupportedMethods|formats : list\lmethods : list\l|get_coder(cls, filter)\lget_filter_id(cls, coder)\lget_method_id(cls, filter)\lis_compressor(cls, filter)\lis_crypto(cls, filter)\lis_native_coder(cls, 
coder)\lis_native_filter(cls, filter)\l}", shape="record"]; "67" [label="{ZstdCompressor|\l|compress(data)\lflush()\l}", shape="record"]; "69" [label="{ZstdDecompressor|\l|decompress(data)\l}", shape="record"]; "0" -> "37" [arrowhead="empty", arrowtail="none"]; "1" -> "38" [arrowhead="empty", arrowtail="none"]; "22" -> "37" [arrowhead="empty", arrowtail="none"]; "23" -> "38" [arrowhead="empty", arrowtail="none"]; "27" -> "37" [arrowhead="empty", arrowtail="none"]; "28" -> "38" [arrowhead="empty", arrowtail="none"]; "37" -> "20" [arrowhead="empty", arrowtail="none"]; "38" -> "26" [arrowhead="empty", arrowtail="none"]; "67" -> "37" [arrowhead="empty", arrowtail="none"]; "69" -> "38" [arrowhead="empty", arrowtail="none"]; "20" -> "53" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="cchain", style="solid"]; "26" -> "54" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="cchain", style="solid"]; "53" -> "33" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="compressor", style="solid"]; "54" -> "33" [arrowhead="diamond", arrowtail="none", fontcolor="green", label="decompressor", style="solid"]; }
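The idea of wrapping different libraries behind one compressor interface and chaining them can be sketched with the standard library alone. This is an illustration under stated assumptions, not py7zr's implementation: the class names CompressorInterface, CopySketch, DeflateSketch and ChainSketch are hypothetical, and only the method names compress(), flush() and add_filter() are taken from the diagram. Raw deflate via zlib is chosen here simply because it needs no third-party package.

```python
import zlib
from abc import ABC, abstractmethod
from typing import List


class CompressorInterface(ABC):
    """Hypothetical stand-in for a common compressor interface."""

    @abstractmethod
    def compress(self, data: bytes) -> bytes: ...

    @abstractmethod
    def flush(self) -> bytes: ...


class CopySketch(CompressorInterface):
    """Pass data through unchanged."""

    def compress(self, data: bytes) -> bytes:
        return bytes(data)

    def flush(self) -> bytes:
        return b""


class DeflateSketch(CompressorInterface):
    """Adapt zlib's raw deflate stream to the common interface."""

    def __init__(self) -> None:
        self._z = zlib.compressobj(wbits=-15)

    def compress(self, data: bytes) -> bytes:
        return self._z.compress(data)

    def flush(self) -> bytes:
        return self._z.flush()


class ChainSketch:
    """Feed the output of each filter into the next one."""

    def __init__(self) -> None:
        self.filters: List[CompressorInterface] = []

    def add_filter(self, filter: CompressorInterface) -> None:
        self.filters.append(filter)

    def compress(self, data: bytes) -> bytes:
        for f in self.filters:
            data = f.compress(data)
        return data

    def flush(self) -> bytes:
        data = b""
        for f in self.filters:
            # Push whatever earlier stages flushed through this stage, then flush it.
            data = f.compress(data) + f.flush()
        return data


chain = ChainSketch()
chain.add_filter(CopySketch())
chain.add_filter(DeflateSketch())
payload = b"common interface " * 100
packed = chain.compress(payload) + chain.flush()
```

Because every stage exposes the same two methods, the chain does not need to know which library sits behind each filter.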

IO Abstraction classes

There are two I/O abstraction classes: MemIO provides an in-memory API, and NullIO supports the check (test) method.

digraph "abstractio" { charset="utf-8" rankdir=BT "41" [label="{MemIO|\l|close()\lflush()\lmkdir(parents, exist_ok)\lopen(mode)\lread(length)\lseek(position)\lwrite(data)\l}", shape="record"]; "44" [label="{NullIO|\l|close()\lflush()\lmkdir()\lopen(mode)\lread(length)\lwrite(data)\l}", shape="record"]; }

Callback classes

Here is the callback interface class. ExtractCallback is a concrete class used in the CLI.

digraph "callbacks" { charset="utf-8" rankdir=BT "16" [label="{Callback|\l|report_end(processing_file_path, wrote_bytes)\lreport_postprocess()\lreport_start(processing_file_path, processing_bytes)\lreport_start_preparation()\lreport_warning(message)\l}", shape="record"]; "30" [label="{ExtractCallback|\l|}", shape="record"]; "30" -> "16" [arrowhead="empty", arrowtail="none"]; }
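To make the shape of the callback interface concrete, here is a minimal sketch of a reporter that records the notifications it receives. CallbackSketch is a local stand-in that copies the method names from the diagram; it is not imported from py7zr. RecordingCallback and simulate_extraction() are hypothetical, the argument types are guesses, and the call order used by the driver is an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class CallbackSketch(ABC):
    """Local stand-in that mirrors the method names shown in the diagram."""

    @abstractmethod
    def report_start_preparation(self) -> None: ...

    @abstractmethod
    def report_start(self, processing_file_path: str, processing_bytes: int) -> None: ...

    @abstractmethod
    def report_end(self, processing_file_path: str, wrote_bytes: int) -> None: ...

    @abstractmethod
    def report_warning(self, message: str) -> None: ...

    @abstractmethod
    def report_postprocess(self) -> None: ...


class RecordingCallback(CallbackSketch):
    """Concrete callback that records every notification in order."""

    def __init__(self) -> None:
        self.events: List[Tuple] = []

    def report_start_preparation(self) -> None:
        self.events.append(("prepare",))

    def report_start(self, processing_file_path: str, processing_bytes: int) -> None:
        self.events.append(("start", processing_file_path, processing_bytes))

    def report_end(self, processing_file_path: str, wrote_bytes: int) -> None:
        self.events.append(("end", processing_file_path, wrote_bytes))

    def report_warning(self, message: str) -> None:
        self.events.append(("warning", message))

    def report_postprocess(self) -> None:
        self.events.append(("postprocess",))


def simulate_extraction(callback: CallbackSketch, files: List[Tuple[str, int]]) -> None:
    """Hypothetical driver; the call order is an assumption, not py7zr's."""
    callback.report_start_preparation()
    for path, size in files:
        callback.report_start(path, size)
        callback.report_end(path, size)
    callback.report_postprocess()
```

A progress bar or logger would implement the same five methods and update its display instead of appending to a list.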

Classes details

Here is detailed interface documentation for implementers.

ArchiveFile Objects

Read 7zip format archives.

class py7zr.py7zr.ArchiveFile(id: int, file_info: Dict[str, Any])[source]

Represents the metadata of each file inside an archive. It holds file properties: filename, permissions, and type (directory, link, or regular file).

Instances of the ArchiveFile class are returned by iterating files_list of SevenZipFile objects. Each object stores information about a single member of the 7z archive. Most users simply use extractall().

The class also holds a parameter identifying the archive folder (container) in which the file is stored.

property archivable: bool

File has a Windows archive flag.

property compressed: int | None

Compressed size

property crc32: int | None

CRC of archived file (optional)

property emptystream: bool

True if file is empty (0-byte file), otherwise False

file_properties() Dict[str, Any][source]

Return file properties as a dictionary. The following keys are included: ‘readonly’, ‘is_directory’, ‘posix_mode’, ‘archivable’, ‘emptystream’, ‘filename’, ‘creationtime’, ‘lastaccesstime’, ‘lastwritetime’, ‘attributes’

property filename: str

Return the filename of the archived file.

has_strdata() bool[source]

True if file content is set by writestr() method otherwise False.

property is_directory: bool

True if file is a directory, otherwise False.

property is_junction: bool

True if file is a junction/reparse point on windows, otherwise False.

property is_socket: bool

True if file is a socket, otherwise False.

property is_symlink: bool

True if file is a symbolic link, otherwise False.

property lastwritetime: ArchiveTimestamp | None

Return last written timestamp of a file.

property posix_mode: int | None

POSIX mode when a member has a Unix extension property, or None.

Returns:

The file stat mode, which can be set with os.chmod()

property readonly: bool

True if file is readonly, otherwise False.

property st_fmt: int | None

Returns:

The portion of the file mode that describes the file type
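The relation between a POSIX mode and its file-type portion can be illustrated with the standard stat module. The sketch assumes the mode value is a full st_mode-style integer (type bits plus permission bits); describe_mode() is a hypothetical helper, not part of py7zr.

```python
import stat


def describe_mode(posix_mode: int) -> str:
    """Classify a st_mode-style integer by its file-type bits."""
    file_type = stat.S_IFMT(posix_mode)  # keep only the file-type portion
    if file_type == stat.S_IFDIR:
        return "directory"
    if file_type == stat.S_IFLNK:
        return "symlink"
    if file_type == stat.S_IFSOCK:
        return "socket"
    if file_type == stat.S_IFREG:
        return "regular file"
    return "other"


# The permission bits are what os.chmod() expects.
permissions = stat.S_IMODE(0o100644)
```

stat.S_IFMT() and stat.S_IMODE() split the same integer into its two halves, so a value such as 0o100644 yields the regular-file type and the permissions 0o644.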

class py7zr.py7zr.ArchiveFileList(offset: int = 0)[source]

Iterable container of ArchiveFile.

class py7zr.py7zr.ArchiveInfo(filename: str, stat: stat_result, header_size: int, method_names: List[str], solid: bool, blocks: int, uncompressed: List[int])[source]

Hold archive information

class py7zr.py7zr.FileInfo(filename, compressed, uncompressed, archivable, is_directory, creationtime, crc32)[source]

Hold archived file information.

class py7zr.py7zr.SevenZipFile(file: BinaryIO | str | Path, mode: str = 'r', *, filters: List[Dict[str, int]] | None = None, dereference=False, password: str | None = None, header_encryption: bool = False, blocksize: int | None = None, mp: bool = False)[source]

The SevenZipFile Class provides an interface to 7z archives.

close()[source]

Flush all the data into the archive and close it. On close, py7zr starts reading the targets and writing the actual archive file.

extractall(path: Any | None = None, callback: ExtractCallback | None = None) None[source]

Extract all members from the archive to the current working directory and set owner, modification time and permissions on directories afterwards. path specifies a different directory to extract to.

getnames() List[str][source]

Return the members of the archive as a list of their names. It has the same order as the list returned by getmembers().

list() List[FileInfo][source]

Returns contents information

reset() None[source]

In read mode, reset the file pointer, the decompress worker and the decompressor.

write(file: Path | str, arcname: str | None = None)[source]

Write single target file into archive.

writeall(path: Path | str, arcname: str | None = None)[source]

Write files in target path into archive.
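A short round trip through the methods documented above may help: write one file, list the archive, and extract it again. This is a sketch that assumes py7zr is installed and that SevenZipFile can be used as a context manager; roundtrip_demo() and the file names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def roundtrip_demo(workdir: Path) -> Tuple[List[str], str]:
    """Archive one file, then list and extract it again."""
    import py7zr  # third-party dependency, imported lazily: pip install py7zr

    source = workdir / "hello.txt"
    source.write_text("hello py7zr\n")
    archive = workdir / "demo.7z"

    # Write mode: add a single file under an explicit archive name.
    with py7zr.SevenZipFile(archive, "w") as z:
        z.write(source, arcname="hello.txt")

    # Read mode: list member names, then extract everything to a new directory.
    target = workdir / "out"
    target.mkdir()
    with py7zr.SevenZipFile(archive, "r") as z:
        names = z.getnames()
        z.extractall(path=target)

    return names, (target / "hello.txt").read_text()
```

Use writeall() instead of write() when the target is a directory tree rather than a single file.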

class py7zr.py7zr.Worker(files, src_start: int, header, mp=False)[source]

Extract worker class to invoke handler.

archive(fp: BinaryIO, files, folder, deref=False)[source]

Run archive task for specified 7zip folder.

decompress(fp: BinaryIO, folder, fq: IO[Any], size: int, compressed_size: int | None, src_end: int, q: Queue | None = None) int[source]

decompressor wrapper called from extract method.

Parameters:
  • fp – archive source file pointer

  • folder – Folder object that has a decompressor object.

  • fq – output file pathlib.Path

  • size – uncompressed size of target file.

  • compressed_size – compressed size of target file.

  • src_end – end position of the folder

  • q – the queue for the reporter

:returns None

extract(fp: BinaryIO, path: Path | None, parallel: bool, skip_notarget=True, q=None) None[source]

Extract worker method to handle a 7zip folder and decompress each file.

extract_single(fp: BinaryIO | str, files, path, src_start: int, src_end: int, q: Queue | None, exc_q: Queue | None = None, skip_notarget=True) None[source]

Single-threaded extractor that takes the file list of a single 7zip folder.

register_filelike(id: int, fileish: MemIO | Path | None) None[source]

register file-ish to worker.

py7zr.py7zr.is_7zfile(file: BinaryIO | str | Path) bool[source]

Quickly see if a file is a 7Z file by checking the magic number. The file argument may be a filename or file-like object too.
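A magic-number check of this kind can be sketched in a few lines. The six-byte signature below is the one that begins every 7z archive; looks_like_7z() is a hypothetical helper written for illustration and is not py7zr's implementation.

```python
import io
from pathlib import Path
from typing import BinaryIO, Union

# Signature bytes at the start of a 7z archive: '7', 'z', 0xBC, 0xAF, 0x27, 0x1C.
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def looks_like_7z(file: Union[str, Path, BinaryIO]) -> bool:
    """Return True if the file starts with the 7z signature."""
    if isinstance(file, (str, Path)):
        with open(file, "rb") as fp:
            return fp.read(len(SEVEN_ZIP_MAGIC)) == SEVEN_ZIP_MAGIC
    position = file.tell()
    try:
        return file.read(len(SEVEN_ZIP_MAGIC)) == SEVEN_ZIP_MAGIC
    finally:
        file.seek(position)  # leave a caller-supplied stream where we found it


sample = io.BytesIO(SEVEN_ZIP_MAGIC + b"\x00\x04")
zip_like = io.BytesIO(b"PK\x03\x04")
```

Only the first six bytes are read, which is why the check is quick; it says nothing about whether the rest of the archive is valid.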

py7zr.py7zr.pack_7zarchive(base_name, base_dir, owner=None, group=None, dry_run=None, logger=None)[source]

Function for registering with shutil.register_archive_format().

py7zr.py7zr.unpack_7zarchive(archive, path, extra=None)[source]

Function for registering with shutil.register_unpack_format().
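The two functions above are meant to be handed to shutil. A minimal sketch of that registration follows, assuming py7zr is installed and exports both functions at package level; the format name "7zip" and the helper register_7z_with_shutil() are choices made for this example.

```python
import shutil


def register_7z_with_shutil() -> None:
    """Register 7z pack and unpack handlers with shutil."""
    import py7zr  # third-party dependency, imported lazily: pip install py7zr

    shutil.register_archive_format("7zip", py7zr.pack_7zarchive, description="7zip archive")
    # Raises shutil.RegistryError if the ".7z" extension is already registered.
    shutil.register_unpack_format("7zip", [".7z"], py7zr.unpack_7zarchive)
```

After registration, shutil.make_archive(base_name, "7zip", root_dir) and shutil.unpack_archive(filename, extract_dir) can handle 7z archives like any built-in format.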

archiveinfo module

class py7zr.archiveinfo.Bond(incoder, outcoder)[source]

Represent bindings between two methods. bonds[i] = (incoder, outstream) means methods[i].stream[outstream] output data go to method[incoder].stream[0]

class py7zr.archiveinfo.FilesInfo[source]

holds file properties

class py7zr.archiveinfo.Folder[source]

A “Folder” represents a stream of compressed data.

  • coders: list of coder

  • num_coders: length of coders

  • coder: hash list; keys of coders: method, numinstreams, numoutstreams, properties

  • unpacksizes: uncompressed sizes of outstreams

class py7zr.archiveinfo.Header[source]

the archive header

class py7zr.archiveinfo.HeaderStreamsInfo[source]

Header version of StreamsInfo

class py7zr.archiveinfo.PackInfo[source]

information about packed streams

class py7zr.archiveinfo.SignatureHeader[source]

The SignatureHeader class holds the information of an archive's signature header.

class py7zr.archiveinfo.StreamsInfo[source]

information about compressed streams

class py7zr.archiveinfo.SubstreamsInfo[source]

defines the substreams of a folder

class py7zr.archiveinfo.UnpackInfo[source]

combines multiple folders

class py7zr.archiveinfo.WriteWithCrc(fp: BinaryIO)[source]

Thin wrapper around a file object that calculates crc32 whenever write is called.

tell()[source]

Return current stream position.
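The wrapper idea can be sketched with zlib.crc32, which accepts a running value so the checksum can be updated on every write. CrcWriterSketch and its digest attribute are hypothetical names for this illustration and do not describe the internals of WriteWithCrc.

```python
import io
import zlib
from typing import BinaryIO


class CrcWriterSketch:
    """Forward writes to a file object while keeping a running CRC32."""

    def __init__(self, fp: BinaryIO) -> None:
        self._fp = fp
        self.digest = 0

    def write(self, data: bytes) -> int:
        # Update the running checksum first, then pass the data through.
        self.digest = zlib.crc32(data, self.digest)
        return self._fp.write(data)

    def tell(self) -> int:
        return self._fp.tell()


buffer = io.BytesIO()
writer = CrcWriterSketch(buffer)
writer.write(b"hello ")
writer.write(b"world")
```

Computing the checksum while writing avoids a second pass over the data once the header has been emitted.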

py7zr.archiveinfo.read_real_uint64(file: BinaryIO) Tuple[int, bytes][source]

read 8 bytes, return unpacked value as a little endian unsigned long long, and raw data.

py7zr.archiveinfo.read_uint32(file: BinaryIO) Tuple[int, bytes][source]

read 4 bytes, return unpacked value as a little endian unsigned long, and raw data.
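Fixed-width little-endian reads like these map directly onto the struct module. The sketch below is illustrative only: read_u32_le() and read_u64_le() are hypothetical names, and returning the raw bytes alongside the value mirrors the documented return type (a caller could, for instance, feed them to a checksum, though that motivation is an assumption).

```python
import io
import struct
from typing import BinaryIO, Tuple


def _read_exact(fp: BinaryIO, size: int) -> bytes:
    """Read exactly `size` bytes or raise EOFError."""
    raw = fp.read(size)
    if len(raw) != size:
        raise EOFError(f"expected {size} bytes, got {len(raw)}")
    return raw


def read_u32_le(fp: BinaryIO) -> Tuple[int, bytes]:
    """Read 4 bytes as a little-endian unsigned 32-bit integer."""
    raw = _read_exact(fp, 4)
    return struct.unpack("<L", raw)[0], raw


def read_u64_le(fp: BinaryIO) -> Tuple[int, bytes]:
    """Read 8 bytes as a little-endian unsigned 64-bit integer."""
    raw = _read_exact(fp, 8)
    return struct.unpack("<Q", raw)[0], raw


value, raw = read_u32_le(io.BytesIO(b"\x78\x56\x34\x12"))
```

The "<" prefix in the format string forces little-endian byte order regardless of the host platform.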

py7zr.archiveinfo.read_uint64(file: BinaryIO) int[source]

Read a UINT64; the encoding is defined in write_uint64().

py7zr.archiveinfo.read_utf16(file: BinaryIO) str[source]

read a utf-16 string from file

py7zr.archiveinfo.write_real_uint64(file: BinaryIO | WriteWithCrc, value: int)[source]

write 8 bytes, as an unsigned long long.

py7zr.archiveinfo.write_uint32(file: BinaryIO | WriteWithCrc, value)[source]

write uint32 value in 4 bytes.

py7zr.archiveinfo.write_uint64(file: BinaryIO | WriteWithCrc, value: int)[source]

UINT64 means real UINT64 encoded with the following scheme:

Size of encoding sequence depends on the first byte:

First_Byte  Extra_Bytes   Value
(binary)
0xxxxxxx                : ( xxxxxxx           )
10xxxxxx    BYTE y[1]   : (  xxxxxx << (8 * 1)) + y
110xxxxx    BYTE y[2]   : (   xxxxx << (8 * 2)) + y
...
1111110x    BYTE y[6]   : (       x << (8 * 6)) + y
11111110    BYTE y[7]   :                         y
11111111    BYTE y[8]   :                         y

py7zr.archiveinfo.write_utf16(file: BinaryIO | WriteWithCrc, val: str)[source]

write a utf-16 string to file
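The variable-length UINT64 scheme described for write_uint64() can be made concrete with a small encoder and decoder. This is a sketch written from the table, not py7zr's code: encode_uint64() and decode_uint64() are hypothetical names, and the extra bytes y are assumed to be stored in little-endian order.

```python
import io
from typing import BinaryIO


def encode_uint64(value: int) -> bytes:
    """Encode value using the first-byte/extra-bytes scheme from the table."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    for extra in range(8):
        high_bits = 7 - extra  # value bits left in the first byte
        if value < 1 << (high_bits + 8 * extra):
            prefix = (0xFF << (8 - extra)) & 0xFF  # `extra` leading one bits
            first = prefix | (value >> (8 * extra))
            low = value & ((1 << (8 * extra)) - 1)
            return bytes([first]) + low.to_bytes(extra, "little")
    return b"\xff" + value.to_bytes(8, "little")


def decode_uint64(stream: BinaryIO) -> int:
    """Decode one value; the count of leading one bits gives the extra bytes."""
    head = stream.read(1)
    if not head:
        raise EOFError("no data to decode")
    first = head[0]
    extra, mask = 0, 0x80
    while extra < 8 and first & mask:
        extra += 1
        mask >>= 1
    high = first & (mask - 1) if extra < 8 else 0
    low = int.from_bytes(stream.read(extra), "little")
    return (high << (8 * extra)) + low


small = encode_uint64(127)      # b"\x7f": fits in the first byte
medium = encode_uint64(128)     # b"\x80\x80": one extra byte
large = encode_uint64(0x4000)   # b"\xc0\x00\x40": two extra bytes
```

Small values, which dominate archive headers, therefore cost a single byte, while a full 64-bit value costs nine.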

compressor module

class py7zr.compressor.AESCompressor(password: str, blocksize: int | None = None)[source]

AES compression (encryption) class. It accepts a pre-processing filter, which may be an LZMA compression.

compress(data)[source]

Compression + AES encryption with 16byte alignment.

flush()[source]

Flush the output buffer (interface).

Returns:

output data

class py7zr.compressor.AESDecompressor(aes_properties: bytes, password: str, blocksize: int | None = None)[source]

Decrypt data

decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).

Parameters:
  • data – input data

  • max_length – maximum length of the output data; honored when possible, otherwise ignored

Returns:

output data

class py7zr.compressor.BCJDecoder(size: int)[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).

Parameters:
  • data – input data

  • max_length – maximum length of the output data; honored when possible, otherwise ignored

Returns:

output data

class py7zr.compressor.BCJEncoder[source]
compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush()[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.BcjArmDecoder(size: int)[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.BcjArmEncoder[source]
compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush()[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.BcjArmtDecoder(size: int)[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.BcjArmtEncoder[source]
compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush()[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.BcjPpcDecoder(size: int)[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.BcjPpcEncoder[source]
compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush()[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.BcjSparcDecoder(size: int)[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.BcjSparcEncoder[source]
compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush()[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.BrotliCompressor(level)[source]
compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush() bytes[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.BrotliDecompressor(properties: bytes, block_size: int)[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1)[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.CopyCompressor[source]
compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush()[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.CopyDecompressor[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.Deflate64Compressor[source]
compress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush() bytes[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.Deflate64Decompressor[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.DeflateCompressor[source]
compress(data)[source]

Compress data (interface).
:param data: input data
:return: output data

flush()[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.DeflateDecompressor[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.ISevenZipCompressor[source]
abstract compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

abstract flush() bytes[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.ISevenZipDecompressor[source]
abstract decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data
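To show how the compress()/flush() and decompress() contracts fit together, here is a minimal sketch built on the standard zlib module. The abstract classes below only mirror the documented method signatures and are hypothetical stand-ins, not imports from py7zr; the raw-deflate pair is likewise an illustration, and carrying unconsumed input between calls is one possible way to honor max_length.

```python
import zlib
from abc import ABC, abstractmethod
from typing import Union

Buffer = Union[bytes, bytearray, memoryview]


class CompressorInterface(ABC):  # hypothetical mirror of the interface
    @abstractmethod
    def compress(self, data: Buffer) -> bytes: ...

    @abstractmethod
    def flush(self) -> bytes: ...


class DecompressorInterface(ABC):  # hypothetical mirror of the interface
    @abstractmethod
    def decompress(self, data: Buffer, max_length: int = -1) -> bytes: ...


class RawDeflateCompressor(CompressorInterface):
    def __init__(self) -> None:
        self._obj = zlib.compressobj(wbits=-15)  # raw deflate, no header

    def compress(self, data: Buffer) -> bytes:
        return self._obj.compress(data)

    def flush(self) -> bytes:
        return self._obj.flush()


class RawDeflateDecompressor(DecompressorInterface):
    def __init__(self) -> None:
        self._obj = zlib.decompressobj(wbits=-15)
        self._pending = b""  # input not consumed because of max_length

    def decompress(self, data: Buffer, max_length: int = -1) -> bytes:
        data = self._pending + bytes(data)
        if max_length < 0:
            self._pending = b""
            return self._obj.decompress(data)
        out = self._obj.decompress(data, max_length)
        self._pending = self._obj.unconsumed_tail
        return out


payload = b"py7zr " * 1000
compressor = RawDeflateCompressor()
packed = compressor.compress(payload) + compressor.flush()

decompressor = RawDeflateDecompressor()
restored = decompressor.decompress(packed, max_length=100)
for _ in range(1000):  # bounded loop: keep draining with empty input
    chunk = decompressor.decompress(b"", max_length=4096)
    if not chunk:
        break
    restored += chunk
```

A caller that passes max_length keeps calling decompress() with empty input until no more output arrives, which is why the decompressor has to remember the unconsumed tail.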

class py7zr.compressor.LZMA1Compressor(filters)[source]
compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush() bytes[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.LZMA1Decompressor(filters, unpacksize)[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.MethodsType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
class py7zr.compressor.PpmdCompressor(properties: bytes)[source]

Compress with PPMd compression algorithm

compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush()[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.PpmdDecompressor(properties: bytes, blocksize: int | None = None)[source]

Decompress PPMd compressed data

decompress(data: bytes | bytearray | memoryview, max_length=-1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

class py7zr.compressor.SevenZipCompressor(filters=None, password=None, blocksize: int | None = None)[source]

Main compressor object, configured for each 7zip folder.

class py7zr.compressor.SevenZipDecompressor(coders: List[Dict[str, Any]], packsize: int, unpacksizes: List[int], crc: int | None, password: str | None = None, blocksize: int | None = None)[source]

Main decompressor object, configured for and bound to a single 7zip folder, because each 7zip folder can have its own compression method.

class py7zr.compressor.SupportedMethods[source]

Holds the list of supported methods.

class py7zr.compressor.ZstdCompressor(level: int)[source]
compress(data: bytes | bytearray | memoryview) bytes[source]

Compress data (interface).
:param data: input data
:return: output data

flush() bytes[source]

Flush the output buffer (interface).
:return: output data

class py7zr.compressor.ZstdDecompressor(properties: bytes, blocksize: int)[source]
decompress(data: bytes | bytearray | memoryview, max_length: int = -1) bytes[source]

Decompress data (interface).
:param data: input data
:param max_length: maximum length of the output data; honored when the decompressor can respect it, otherwise ignored
:return: output data

helpers module

class py7zr.helpers.ArchiveTimestamp[source]

Windows FILETIME timestamp.

as_datetime()[source]

Convert FILETIME to Python datetime object.

totimestamp() float[source]

Convert 7z FILETIME to Python timestamp.
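For readers unfamiliar with FILETIME, the sketch below shows the arithmetic behind both conversions. A FILETIME value counts 100-nanosecond ticks since 1601-01-01 UTC. The function names are hypothetical and the code is an illustration rather than the ArchiveTimestamp implementation.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
# Number of 100-nanosecond ticks between 1601-01-01 and 1970-01-01.
EPOCH_DIFF_TICKS = 116_444_736_000_000_000


def filetime_to_datetime(ticks: int) -> datetime:
    """Convert FILETIME ticks to an aware datetime (microsecond precision)."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def filetime_to_unix(ticks: int) -> float:
    """Convert FILETIME ticks to seconds since the Unix epoch."""
    return (ticks - EPOCH_DIFF_TICKS) / 10_000_000


unix_epoch = filetime_to_datetime(EPOCH_DIFF_TICKS)
one_second = filetime_to_unix(EPOCH_DIFF_TICKS + 10_000_000)
```

The datetime conversion drops the last decimal digit of the tick count, because datetime only resolves microseconds.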

exception py7zr.helpers.BufferOverflow[source]
class py7zr.helpers.LocalTimezone[source]
dst(dt)[source]

datetime -> DST offset as timedelta positive east of UTC.

fromutc(dt)[source]

datetime in UTC -> datetime in local time.

tzname(dt)[source]

datetime -> string name of time zone.

utcoffset(dt)[source]

datetime -> timedelta showing offset from UTC, negative values indicating West of UTC

class py7zr.helpers.MemIO(buf: BinaryIO)[source]

pathlib.Path-like IO class that writes to memory (io.BytesIO).

class py7zr.helpers.NullIO[source]

pathlib.Path-like IO class that behaves like /dev/null.

class py7zr.helpers.UTC[source]
dst(dt)[source]

datetime -> DST offset as timedelta positive east of UTC.

tzname(dt)[source]

datetime -> string name of time zone.

utcoffset(dt)[source]

datetime -> timedelta showing offset from UTC, negative values indicating West of UTC
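The three methods documented above are the standard datetime.tzinfo protocol. A minimal sketch of a UTC implementation of that protocol could look like the following; FixedUTC is a hypothetical name and this is not the py7zr.helpers.UTC source.

```python
from datetime import datetime, timedelta, tzinfo

ZERO = timedelta(0)


class FixedUTC(tzinfo):
    """UTC time zone: zero offset and no daylight saving time."""

    def utcoffset(self, dt):
        return ZERO

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return ZERO


stamp = datetime(2020, 1, 1, 12, 0, tzinfo=FixedUTC())
text = stamp.isoformat()
```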

py7zr.helpers.calculate_crc32(data: bytes, value: int = 0, blocksize: int = 1048576) int[source]

Calculate CRC32 of strings with arbitrary lengths.
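The idea of feeding data to the checksum block by block can be sketched with zlib.crc32, which accepts a running value. The function name crc32_in_blocks is hypothetical, and the sketch only illustrates the chunking approach suggested by the blocksize parameter; it does not reproduce the library code.

```python
import zlib


def crc32_in_blocks(data: bytes, value: int = 0, blocksize: int = 1024 * 1024) -> int:
    """Compute CRC32 over data in blocksize chunks, starting from value."""
    view = memoryview(data)  # slicing a memoryview avoids copying each block
    for start in range(0, len(view), blocksize):
        value = zlib.crc32(view[start:start + blocksize], value)
    return value & 0xFFFFFFFF


sample = bytes(range(256)) * 100
checksum = crc32_in_blocks(sample, blocksize=1000)
```

Because the running value is passed from block to block, the result equals the checksum of the whole input computed in one call.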

py7zr.helpers.calculate_key(password: bytes, cycles: int, salt: bytes, digest: str) bytes

Calculate the 7zip AES encryption key. Values are concatenated to reduce the number of calls to Hash.update().
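The sketch below illustrates the kind of derivation involved, under stated assumptions: SHA-256 is the digest, the hash is fed salt + password + an 8-byte little-endian round counter for 2**cycles rounds, and cycles == 0x3F is a special case that skips hashing. The name derive_key_sketch is hypothetical, and the code is meant to show the "concatenate, then update once per round" idea rather than to replace calculate_key().

```python
import hashlib


def derive_key_sketch(password: bytes, cycles: int, salt: bytes) -> bytes:
    """Derive a 32-byte key (illustrative; assumes SHA-256)."""
    if not 0 <= cycles <= 0x3F:
        raise ValueError("cycles must be between 0 and 0x3F")
    if cycles == 0x3F:
        # Assumed special case: salt + password, zero-padded or cut to 32 bytes.
        return (salt + password + bytes(32))[:32]
    sha = hashlib.sha256()
    for counter in range(1 << cycles):
        # One update() call per round on the concatenated value,
        # instead of three separate calls.
        sha.update(salt + password + counter.to_bytes(8, "little"))
    return sha.digest()


# Password bytes are assumed to be UTF-16LE encoded by the caller.
key = derive_key_sketch("secret".encode("utf-16-le"), 4, b"\x01\x02")
```

A small cycles value is used here so the example runs instantly; the number of rounds doubles with every increment.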

py7zr.helpers.canonical_path(target: PurePath) PurePath[source]

Return a canonical path of target argument.

py7zr.helpers.check_archive_path(arcname: str) bool[source]

Check that the arcname argument is valid for an archive. Returns False if arcname is an absolute path or a path that attempts directory traversal; otherwise returns True.

py7zr.helpers.filetime_to_dt(ft)[source]

Convert Windows NTFS file time into python datetime object.

py7zr.helpers.get_sanitized_output_path(fname: str, path: Path | None) Path[source]

Check whether the file name contains invalid directory traversals. Raises Bad7zFile when the check fails.

py7zr.helpers.is_path_valid(target: Path, parent: Path) bool[source]

Check whether the target path is valid relative to the parent path. Returns False when the target path contains ‘..’ and points outside the parent path; otherwise returns True.
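The checks described by check_archive_path() and is_path_valid() guard against archive members that escape the extraction directory. A purely lexical version of such a check can be sketched as follows. The function stays_inside_root is hypothetical, it works on POSIX-style member names only, and it does not resolve symbolic links, so it should be read as an illustration of the idea rather than a complete defense.

```python
from pathlib import PurePosixPath


def stays_inside_root(name: str) -> bool:
    """Return False for absolute names or names that climb above the root."""
    path = PurePosixPath(name)
    if path.is_absolute():
        return False
    depth = 0
    for part in path.parts:
        if part == "..":
            depth -= 1
            if depth < 0:  # stepped above the extraction root
                return False
        else:
            depth += 1
    return True


results = {name: stays_inside_root(name)
           for name in ("docs/readme.txt", "a/../b", "../evil", "a/../../evil", "/etc/passwd")}
```

Counting directory depth this way accepts harmless names such as a/../b while rejecting any name whose ‘..’ components would leave the target directory.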

py7zr.helpers.is_relative_to(my: PurePath, *other) bool[source]

Return True when path is relative to other path, otherwise False.

Cross-platform islink implementation. Supports Windows NT symbolic links and reparse points.

Cross-platform compatibility implementation of os.readlink() and Path.readlink(). Supports Windows NT symbolic links and reparse points. When called with a path-like str argument, it returns a str. When called with a Path object, it returns a Path object. When called with a bytes argument, it returns bytes.

py7zr.helpers.remove_relative_path_marker(path: str) str[source]

Removes ‘./’ from the beginning of a path-like string