􀀂􀀟􀀍􀀅 􀀂􀀝􀀉 Python zip applications and static includes

When I started building progfiguration, I settled on Python, mostly because I know it very well and it has a batteries-included standard library. However, there were a few things I wanted that Python doesn’t have by default1:

  • Single binary deployment without unpacking
  • Very fast package builds

After some research, I discovered that Python has supported a simple solution (with some caveats) for this since version 2.6: the zip application archive format.

Single binary deployment

Python programs are typically distributed as pip packages, and installing them is a little slow. They are unpacked in special filesystem locations (/usr/lib/python3.xx/site-packages/..., venv/lib/python3.xx/site-packages/..., etc) before they can be run, and you cannot run a pip package without this unpacking step, but I’d like to be able to just copy and run my program like any regular executable.

And unfortunately, building Pip packages is even slower, especially when taking the recommended approach of building in an isolated environment (the default when running a simple python -m build command, see the build docs.)

A zip application solves these problems.

Python has been able to execute zip files which contain a __main__.py file since version 2.6. In order to be executed by Python, an application archive simply has to be a standard zip file containing a __main__.py file which will be run as the entry point for the application.

The Python Zip Application Archive Format

This is really exciting because it gets the distribution benefits of a compiled binary. You can distribute a package that anyone with a Python interpreter already on their system can run. They don’t have to worry about virtual environments, pip packages, etc. It works on Windows and Unix. All you have to do is zip up your project directory and add a __main__.py. It’s really cool!

Zip application caveats

  1. Running the code from inside a pyz file is a bit slower. I suspect this is not signficant for the average case.
  2. Only pure Python code can be executed this way. Python packages that use compiled extensions will not work. While we can include our dependencies as well (see below), they must not require copmiled extensions, which rules out important libraries like cryptography, and important packages which depend on them like requests.
  3. Unlike pip packages, zip applications have no concept of their own version.

Static includes

This gets us most of the way there, but what if we require dependencies? We can make them available to our program as well just by copying them into the zip archive. Once you do, you can import them and use them in your program the same as if they were installed by pip. As noted in the previously linked documentation:

As usual for any Python script, the parent of the script (in this case the zip file) will be placed on sys.path and thus further modules can be imported from the zip file.

The Python Zip Application Archive Format

This reminds me of static linking, which is possible in programming environments like C, and the default for code written in Go. Python programs don’t have a linking step, so the name doesn’t apply exactly, and I’ve taken to thinking of them as static includes2.

Examples

Python includes a module called zipapp for creating these archives. You can use its command-line interface:

# See help
python3 -m zipapp --help

# Create a pyz for a package with an existing main function
python3 -m zipapp /path/to/yourpackage --output=yourpackage.pyz --main="yourpackage.cmd:main"

You can use its Python API instead:

import zipapp
zipapp.create_archive(
    "/path/to/yourpackage",
    target="yourpackage.pyz",
    main="yourpackage:main"
)

But any tool that creates a zip file will work; the only requirement is a __main__.py that contains code to execute. Here’s an example adapted from progfiguration’s zipapp function. It uses the regular zipfile library, without zipapp at all.

import pathlib
import stat
import zipfile

yourpackage_path = pathlib.Path("/path/to/yourpackage")
somedep_path = pathlib.Path("/path/to/somedependency")
pyz_path = pathlib.Path("yourpackage.pyz")
main_py = "import yourpackage; yourpackage.main()"

with open(pyz_path, "wb") as fp:
    # Writing a shebang like this is optional in zipapp,
    # but there's no reason not to since it's a valid zip file either way.
    fp.write(b"#!/usr/bin/env python3\n")

    # Note that we cannot combine the zipfile context manager with the open() context manager,
    # because the zipfile context manager writes a zip header when it opens,
    # and we need to write the shebang before the zip header.
    with zipfile.ZipFile(fp, "w") as z:

        # Copy your package into the zipfile
        for child in yourpackage_path.rglob("*"):
            child_relname = child.relative_to(yourpackage_path)
            z.write(child, "yourpackage/" + child_relname.as_posix())

        # Copy the dependency package into the zipfile
        for child in somedep_path.rglob("*"):
            child_relname = child.relative_to(somedep_path)
            z.write(child, "somedependency/" + child_relname.as_posix())

        # Add the __main__.py file to the zipfile root, which is required for zipapps
        z.writestr("__main__.py", main_py.encode("utf-8"))

# Make the zipapp executable
pyz_path.chmod(pyz_path.stat().st_mode | stat.S_IEXEC)

How fast is this?

I haven’t measured hard numbers here, but I’m quite happy with the result in progfiguration.

Building zip applications is extremely fast for programs of a few thousand lines across a few dozen files, especially with compression disabled in the zip archive. It beats pip by a mile.

Executing them is fast enough that I don’t notice a slowdown compared to doing things like running external commands. For the kind of code I’m running, zip archive reading will not be a bottleneck so pulling files out of it will feel fast. YMMV.


  1. Aside from those packaging and deployment items, I also wanted a more convenient way to run system commands, and ended up writing a magicrun() function to make system commands a bit more ergonomic. ↩︎

  2. I think it’s worth differentiating between vendoring and static inclusion. Vendoring implies copying third party code into your own repository. Static inclusion implies copying third party code into the package you are distributing. You might statically include code that isn’t vendored by pulling it down at build time or copying it from your operating system, and you might vendor code that isn’t statically included if you need it during development but you don’t want to distribute it. ↩︎

Responses

Comments are hosted on this site and powered by Remark42 (thanks!).

Webmentions are hosted on remote sites and syndicated via Webmention.io (thanks!).