Advanced usage

Stream 7z content

The py7zr provide a way to stream archive content with extract and extractall methods. The methods accept an object which implement py7zr.io.WriterFactory interface as factory arugment. Your custom class which implements WriterFactory interface should have create method. The create method should return your custom class object which implements Py7zIO interface. The Py7zIO interface is as similar as Python Standard BinaryIO but it also have size method.

When the py7zr extract the archive contents, it calls the factory object with filename and ask creating the io object. The py7zr write the io object with write method, which is as similar as an object returned by the standard open method.

You can process write in your custom class. The method is called on-the-fly when the py7zr move to extract the target archive file.

Note: Because the py7zr may run in multi-threaded, your custom class should be thread-safe.

The py7zr provide the way to stream content without buffering all the archive content in a memory.

Example to extract into network storage

Here is a pseudo code to demonstrate a way to extract contents into cloud storage. To simplify things in an example code, all the authentication, bucket operation, all the mandatory headers are omitted.

def extract_stream():
    factory = StreamIOFactory()
    with py7zr.SevenZipFile("target.7z") as archive:
        archive.extractall(factory=factory)


class StreamIO(py7zr.io.Py7zIO):
    """Example network storage writer."""

    def __init__(self, fname: str):
        self.fname = fname
        self.length = 0

    def write(self, data: [bytes, bytearray]):
        self.length += len(data)
        # the py7zr will call write multiple time, so you need to append data
        requests.put("https://your.custom.network.storage.example.com/append/to/file/command/", data=data)

    def read(self, size: Optional[int] = None) -> bytes:
        return b''

    def seek(self, offset: int, whence: int = 0) -> int:
        return offset

    def flush(self) -> None:
        pass

    def size(self) -> int:
        return self.length


class StreamIOFactory(py7zr.io.WriterFactory):
    """Factory class to return StreamWriter object."""

    def __init__(self):
        self.products = {}

    def create(self, filename: str) -> Py7zIO:
        product = StreamIO(filename)
        self.products[filename] = product
        return product