
The Python Packaging Problem

At PyCon 2009, the fact that Python needs to solve the "packaging problem" came up a few times. This is not a new discussion. However, the problem is still not completely solved, so here I'll point out the details of the problem, the unsolved parts, the solved parts, and how the solved parts could be solved better.

#1 Gimme A Module

You want to take a module that someone else built and install it into your Python installation. Easy: download the module from the Python Package Index (PyPI), untar it, and run

$ sudo python setup.py install

Or, if you don't mind having setuptools installed, you can do all that in one command. For a concise example, imagine you want to install docutils:

$ sudo easy_install docutils

If you have a module but you haven't made it available on PyPI yet, simply create a standard setup.py script and run

$ python setup.py sdist register upload -s
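For reference, the setup.py script that command expects looks something like this minimal sketch (the name, version, and module are placeholders, not a real project):

# A minimal distutils setup.py; all metadata below is hypothetical.
from distutils.core import setup

setup(
    name='mymodule',
    version='0.1',
    description='An example module',
    py_modules=['mymodule'],
)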

Let's go back to easy_install for a minute. All this script does is look up the package by name on PyPI. In its most straightforward form, it downloads the source package, patches distutils so it can run python setup.py egg_info within the downloaded source (for Python < 2.5), then runs python setup.py install just as you would manually, as shown above. If an egg is available, it will download the egg instead (more on egg_info and eggs later).

Why don't people like easy_install? Judging from the first couple of hits on Google, it's because no one understands easy_install. But there are two things in particular that I don't like:

  • It adds an easy-install.pth file (this is standard Python, see .pth files), but that file contains a hack that alters sys.path, so once you install a package globally with easy_install you cannot easily override it locally using PYTHONPATH or sys.path (see the sketch after this list).
  • It installs module directories within a version-stamped directory. This only works because of the easy-install.pth file, which ties you to the site directories and makes it hard to work with local packages (more on that below). Setuptools uses this convention so that you can have simultaneous versions and namespace packages if you want them. There are better solutions for this use case now.
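To make the hack concrete, an easy-install.pth file looks roughly like this (reconstructed from memory, so treat the exact lines as illustrative):

import sys; sys.__plen = len(sys.path)
./docutils-0.4-py2.5.egg
import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)

The first and last lines conspire to move the egg entries toward the front of sys.path at interpreter startup, which is why a local copy on PYTHONPATH loses to the globally installed egg.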

#2 Gimme A Module For Just This One Project

Installing everything into your global system makes it hard to work on multiple projects on one machine. This is mostly a development problem but it's also a deployment problem because it's generally overkill to build a new machine (or new virtual machine) for each Python project you want to deploy.

The vanilla Python solution to this is PYTHONPATH or sys.path. Pretty straightforward.
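For example, a minimal sketch (the paths and module names are hypothetical):

$ export PYTHONPATH=$HOME/projects/myproject/lib
$ python myapp.py

Modules under that lib directory are found ahead of site-packages, with no installer involved.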

However, due to the easy_install problems I pointed out above, the vanilla solution is not sufficient if you want to mix and match. Instead you need to use virtualenv, which works well for both development and deployment. However, it's a little bit overkill. The vanilla approach is simple: just tell me where the other modules are. It shouldn't require you to symlink your entire Python installation into a new location, which is how virtualenv works.
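For reference, the usual virtualenv workflow is short (the environment name is arbitrary):

$ virtualenv myenv
$ source myenv/bin/activate
(myenv)$ easy_install docutils

Everything installed while the environment is active lands inside myenv rather than in the global site-packages.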

#3 Gimme A Module Greater Than Version X But Less Than Version Y

But wait! It's all fine and well to download and install a module into your global system, but how do you upgrade it? And what version did that other-developer-who-no-longer-works-here install, anyway? Some modules define a version attribute __version__ in __init__.py (and django defines VERSION) but there is no standard, and most modules don't define a version at all except in their setup.py script. Between setuptools and PyPI this is solved.
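To illustrate how inconsistent the conventions are, compare these two (the exact values shown are illustrative):

>>> import docutils
>>> docutils.__version__
'0.4'
>>> import django
>>> django.VERSION
(1, 0, 'final')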

In easy_install you can manage versions like this (the quotes keep your shell from interpreting > and < as redirects):

$ easy_install "docutils==0.4"
$ easy_install "docutils>=0.4"
$ easy_install "docutils>=0.4,<0.5"

Those do what you'd expect. You can also upgrade a module like this

$ easy_install -U docutils

The way this all works is by storing metadata on disk in the egg-info format and by simply making HTTP requests to PyPI. So, to avoid the easy-install.pth problem there is now a new tool for this called pip. It works the same way, but instead of using version-stamped subdirectories it installs modules "flat," just as you would if you were to manually run python setup.py install. Pip also adds a module.egg-info file next to the flat module so that the currently installed version can be detected (for upgrading, requirements, etc.). Pip even handles namespace packages by preserving the egg-info dirs and simply stitching each package together into a flat, Python-compatible module. Pip does not support installing multiple versions of the same module in the same place, but you can use virtualenv for that.
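Pip's command line mirrors easy_install's closely; a rough sketch of the equivalents:

$ pip install "docutils==0.4"
$ pip install "docutils>=0.4,<0.5"
$ pip install --upgrade docutils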

#3b. Gimme A Module Greater Than Version X But I Don't Want an Alpha Release

The version request against the PyPI site gets tricky when people release "alpha" or "beta" versions. For example

$ easy_install "SQLAlchemy>=0.5,<0.6"

This will work as expected unless a package named SQLAlchemy-0.6-alpha exists; in that case it will download 0.6-alpha even though your code is only compatible with 0.5. This may be a bug in easy_install and pip, but there is a lot of ambiguity around these types of version numbers. This is an unsolved problem.

#4 Gimme A Module At Version X For Just This One Project

This is the most important use case. When you start to work with lots of projects that have lots of dependencies (e.g. Pylons), you need a way to specify the different versions each one requires and to keep them independent of each other so the dependencies do not conflict. You can do that with install_requires=['SQLAlchemy>=0.5,<0.6'] in a setuptools-enabled setup.py script, but then you need to use easy_install and virtualenv, or pip and virtualenv.
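For concreteness, a minimal setuptools-enabled setup.py might look like this (the project name and version are placeholders):

# Hypothetical project metadata; install_requires is the relevant part.
from setuptools import setup, find_packages

setup(
    name='myproject',
    version='0.1',
    packages=find_packages(),
    install_requires=['SQLAlchemy>=0.5,<0.6'],
)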

That's fine, but what if you want to provide your users with all dependencies right out of the box? Of all the projects that want to distribute dependencies (i.e. the Google App Engine SDK, Django, Pinax, and others), I have not seen one adopt egg-info. So how can you be sure of what versions they are distributing? (I think these projects all have the version numbers documented in human-readable form, but you see my point.) It also seems that pip and easy_install are already disagreeing on an egg-info format (see PEP 376). Sigh.

Conclusions

I think flattening modules to make them fully Python compatible and tacking on an egg-info directory is the way to go. This is how pip does it. Pip is not yet a drop-in replacement for easy_install, though, because it does not support binary packages. This is a problem if you don't keep build tools (gcc) on your production server, because then you can't download the source and build it with pip.

What should we do? For starters, apply a patch to pip for binary handling (in other words, so that it can download eggs). Next, we need better tools for managing a directory of modules that can be committed to version control and distributed. I'm working on a pip wrapper named eco for that, but I'm still working out some kinks. Feel free to play around with it.

Why Not Just Use Nifty-Package-Manager-Foo?

I have heard that the answer to all of this is to use rpm, apt-get, macports, fink, yum, BSD ports, or whatever. I don't really see how this is a solution, since each package manager still has to decide how to install the package and where to store the version metadata.

Did I miss anything? Any other suggestions?

UPDATE: Tarek Ziadé is working on the metadata standardization process and posting drafts and links to PEPs here: http://wiki.python.org/moin/Distutils

  • Re: The Python Packaging Problem

    How about uninstalling things?

  • Re: The Python Packaging Problem

    I think you missed something in the argument about the "rpm, apt-get, macports, fink, yum, BSD ports, or whatever" part. The point here is not that these solve any problems for Python in general, but that for any particular user, their respective packaging system already does what easy_install could do.

    If you use one of these package-managers, then they are a one-stop solution for all packaging problems:

    You want to know if there are updates available for any software? The package-manager knows (except if you used easy_install for that package).

    You want to know if there is a clash in dependencies for any software (e.g. a ruby package)? Your package manager knows (except if you used easy_install for anything that is affected).

    You want to upgrade your whole system in one command that takes care of everything? Your package-manager can do that (except if you used easy_install for any kind of software).

    Easy_install might be a good idea (as every package manager is), but only for systems that don't have a capable packaging system of their own.

    So I think the best approach would be to make it very easy for any package manager to extract all the necessary meta-information, but not to provide any alternative package manager.

  • Re: The Python Packaging Problem

    How about zc.buildout Kumar?

  • Re: The Python Packaging Problem

    Installing files into the global site-packages is a big problem IMHO. Maybe it's not a big deal for Python developers (they *all* recommend "sudo ez_install ...") but it's a big deal for distro maintainers.

    Example: I didn't upload python-sqlalchemy 0.5.x to Debian unstable (it waited in experimental for quite some time) until I was sure all reverse dependencies were ready for 0.5.

    It's frustrating when you spend so much time gluing all the libraries together and then you receive bugs with eggs in the tracebacks (installed using ez_install). Please don't tell me that I should add pkg_resources.require() to all the other modules/applications - setuptools is not stdlib and there are some upstreams (like me) who don't (refuse to?) use it.

    Please also note that there's no way to cleanly (!!!) uninstall what ez_install installs (and obviously rpm/dpkg/etc. cannot do it, as these files are not registered).

  • Re: The Python Packaging Problem

    easy_install (and pip?) lets you create scripts that will work on Windows in a virtual environment. For scripts created by distutils, you would have to type "python x:\path\to\virtual\env\Scripts\some_script.py some_arg".

    If you target webapp deployment, lacking Windows support is fine, but it is not good enough for a general-purpose package distribution system.

  • Re: The Python Packaging Problem

    As far as I can see, the "packaging problem" will keep returning as long as the people developing the multitude of apparently inadequate solutions keep ignoring and brushing aside "rpm, apt-get, macports, fink, yum, BSD ports, or whatever".

    I noticed that in Ian Bicking's PyCon keynote (with the Beavis and Butthead-level IRC commentary) it was remarked that suggesting that anyone take a look at how others solve this problem is "stop energy": the usual brush-off when people have their own "not invented here" issues. As long as everyone keeps trying to make end-to-end solutions ("let's solve all the problems again"/"let's play with all the toys again") and overriding apt-get and friends, their work will be limited to a neutered form when it meets existing package and dependency management solutions.

    I use distutils to populate Debian packages, although you'd have a hard time expressing that in the recent packaging survey - it was "distutils, setuptools or other" as far as I recall - and what I and the rest of the not insignificant number of system packaging users would like to see is some sanity in what distutils does, not to see clever reinventions of stuff we don't need.

    I'd spell out what would be required, but I imagine that it's just "stop energy" to everyone who thinks that making sane package images is not a sexy enough use-case for their attention. As a result, I can see people using better tools for such purposes in the longer term (just as there are things like python-central available right now).

  • Re: The Python Packaging Problem

    More problems with rpm, apt-get, macports, fink, etc:

    1. I don't have root on every machine that I use.

    2. I don't have root on every machine that I use. (Some people don't seem to remember this case, so I want to emphasize it.)

    3. You can't pick one package manager, because there is not one that is already installed everywhere. (Unless you would like to do a fink port for linux...)

    4. I can't assemble a distribution for all of the package managers, because I don't have any way to build/test for most of them.

    Therefore, I need some way to distribute my software that does not rely on those packaging tools. Somebody who makes those packages is welcome to derive one from my distribution, with the caveat that I expect them not to break anything.

  • Re: The Python Packaging Problem

    Maybe mod yum/apt/rpm/etc to allow:

    aptitude install --subsys=python --subsys-opt=version:>=0.5,<0.6;location:$HOME/pylib SQLAlchemy

  • Re: The Python Packaging Problem

    Mark S: "Therefore, I need some way to distribute my software that does not rely on those packaging tools. Somebody who makes those packages is welcome to derive one from my distribution, with the caveat that I expect them not to break anything."

    I wasn't arguing that people should rely on those tools. But those tools should be able to readily build on the output of distutils. The solution to the packaging problem is not trying to do everything and ignoring downstream solutions.

    And you can install Debian packages in a sandbox using tools like fakeroot, fakechroot and schroot, so you don't need to be root at all. This is more powerful than various Python-only solutions because you're then getting access to the entire range of dependency-managed software in Debian.

  • Re: The Python Packaging Problem

    @gryf Uninstalling a module is dead easy (you just remove it and delete the scripts) *unless* easy-install.pth was involved in the installation. In other words, the answer here is not to munge paths in confusing ways during installation.

    @Stefan I completely agree. I'm in fact making the same argument: let's provide metadata, not a package manager. We have it in egg-info, but it needs to be standardized and made part of the Python standard library. But also I think - in a perfect world - something like pip could install a module and a package manager like rpm could read and query metadata about the module pip installed.

    @Ben Ford: I did forget about zc.buildout, my bad. But for the recipes I have used, buildout will use eggs or flat modules, so the arguments about avoiding versioned subdirs and needing standard metadata still apply.

    @Damien: excellent point. I am missing this user story: "Gimme A Module At Version X And Gimme A Script To Use It On Windows"

    @Paul Boddie: I didn't interpret the "stop energy" remark during Ian's talk as dismissing what other package managers do. I agree that it's really important to learn from other tools. I think we also need to learn from easy_install, since it itself learned from Java jars and rpm. It got some things right and some things wrong. We can't afford to make those same mistakes twice.

  • Re: The Python Packaging Problem

    For #3b, use "<0.6dev", as "dev" is the lowest-possible pre-release tag.
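    Applied to the example above, that tip would look something like this (a sketch):

    $ easy_install "SQLAlchemy>=0.5,<0.6dev"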

  • Re: The Python Packaging Problem

    @Piotr: FWIW, it makes me cry every time I see sudo in front of any install-from-source method, be it make install, python setup.py install, or worse, easy_install. I keep telling people not to do that on (to :) ) their machines.

    @Mark s: on every platform, the best solution for a *stable* program is the native installer, be it .msi, .deb, .rpm, whatever. You will never be able to beat that. But it takes a lot of effort that many people can't make time for. That's where metadata and the isolation of different tasks are crucial: by making the packager's job easier, your package is more likely to get packaged natively by the distribution.

    It does not solve installing as non-root, but in a sense that is a simpler problem, because there is no need to interact with the general packaging system. Again, by supporting common metadata that everybody agrees on, tools for local and *reliable* installation can be built. All the hacks with sys.path and co. done by setuptools and everything built on top of it have brought the situation close to the insane level - this cannot work without support from Python itself, but doing it at the Python level is not that difficult.

