Like many of you, I've had my jaw on the floor since the release of Google App Engine. Although there are skeptics out there, a careful read of their terms will show you that it's for real — Google has released GOOGLE to the world and it's not for scary marketing purposes. In fact, I've been growing tired of paranoid Google haters; I'm hoping this will shut them up for a while.
Why is App Engine such a breakthrough? The concept of a hosted web application is nothing new but it has never been done this well. Mundane server maintenance? Gone. Infinite scalability? Check. 100% uptime? Let's face it, if Google went offline you'd probably be down in a nuclear bunker playing Parcheesi.
So ... how should we leverage this tool for the greater good of the community? I can't count the ways without getting dizzy. How about we start with a mirror of PyPi, the Python Package Index?
PyPi on the App Engine
I barely spent two days on it, but here it is: http://pypi.appspot.com/. Test it out, play with it, try to break it.
As Python grows, especially due to App Engine, PyPi needs to scale too. Zope has put together a PyPi mirror but that's the only other one I know of (actually, I can't even find the link to it right now). Coincidentally, PyPi even went offline for a few minutes while I was writing this blog post.
Issues...
- urlfetch can't get big packages
  - fetching SQLAlchemy triggered the ResponseTooLarge error but its package was barely over 1MB. We might be able to create a download proxy to work around this.
- May not be fully compatible with easy_install yet
- Can't run scripts on App Engine
  - Not too big a deal, you have to think of web pages as scripts and use a separate crontab to make timed requests. There are security issues with this, but for mirroring pypi it's not a problem.
- Note that I haven't actually mirrored all of PyPi, I'm still testing it out.
- It's not very pretty yet (see below)
- The prototype works, now it needs some tests :)
You Can Help
I'm not dedicated to this project, I just thought it sounded like a good idea and would be a fun way to experiment with the App Engine. If anyone is interested in working on it just let me know: kumar.mcmillan@gmail.com. If there is enough interest I'll put it on Google Code.

Possibly the most exciting feature of App Engine is the Datastore API (aka BigTable) and Ben Bangert agrees. It's a little hard for me to wrap my head around it but so far the Expando class, besides being the coolest name for a class, seems to work great for storing package data. If EGG-INFO grows a new parameter, it just gets tacked on to the row dynamically.
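To make the "tacked on dynamically" idea concrete, here is a rough sketch in plain Python. It is not the Datastore API and not the mirror's actual code: `PackageRecord` and `record_from_metadata` are hypothetical names, and the class is only a stand-in for an Expando-style entity. It assumes the package metadata arrives as `Key: value` header text, the way PKG-INFO files are written.

```python
from email.parser import Parser


class PackageRecord(object):
    """Hypothetical stand-in for an Expando-style entity: no fixed schema."""

    def __init__(self, **fields):
        # Every keyword becomes an attribute, whatever its name is.
        for name, value in fields.items():
            setattr(self, name, value)


def record_from_metadata(text):
    """Build a record from 'Key: value' metadata text (assumed format)."""
    headers = Parser().parsestr(text, headersonly=True)
    fields = {}
    for key, value in headers.items():
        # "Home-page" becomes "home_page" so it is a valid attribute name.
        # Repeated keys keep only the last value in this sketch.
        fields[key.lower().replace("-", "_")] = value
    return PackageRecord(**fields)
```

A package whose metadata carries an extra field simply ends up with an extra attribute, and older records are untouched. On App Engine the same role would be played by an entity class derived from the Datastore's Expando class rather than this stand-in.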
This has also been a great way to dig up bugs, some of which have already been fixed.

Re: PyPi (Cheeseshop) on Google App Engine
posted by Christopher Arndt on Tuesday Apr 15th, 2008 at 11:25a.m.
While GAE might be cool, it is also a completely new platform, to which applications must be ported. Also, I like to have control over the environment where I run my apps.
As for PyPI, yes we need mirrors, but sometimes the centralized approach of PyPI isn't the right solution for all needs. For example, if your app relies on many third-party packages, you have no control over their PyPI pages and the packages and download links they make available. This means that your installation with easy_install can break at any time when some of your dependencies get updated. That's why sometimes you need to maintain your own package index. I have written a small PyPI server with the TurboGears framework, called EggBasket (http://chrisarndt.de/projects/EggBasket). Check it out! (It doesn't run on GAE, though ;-)
Re: PyPi (Cheeseshop) on Google App Engine
posted by Kumar McMillan on Tuesday Apr 15th, 2008 at 11:34a.m.
Christopher, many thanks for the EggBasket! I owe a lot to your code since it provided me with a good starting point. At first I thought I could write the appengine version on top of yours, but I ran into many snags where EggBasket wanted to work with the file system. I.e., for all the file handling, I have to use StringIO buffers and this required fiddly changes everywhere. Also, yolk has given me some great ideas, thanks for that too ;)
I think EggBasket solves the problem of hosting a private egg index very nicely. What I'm doing is in no way a replacement for that. For example, my company uses eggs for all Python deployment and most of these are not open source (trust me, you don't want them). For this we use a private repository that easy_install can access while we are on our intranet. In fact, after seeing EggBasket the other day I plan to migrate our home-grown server to EggBasket since ours doesn't support the upload command.
I see this PyPi mirror more as a way to achieve better redundancy for publicly available eggs.
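For anyone curious what the in-memory file handling mentioned above looks like, here is a minimal sketch, not the mirror's actual code. It uses `io.BytesIO`, the modern equivalent of a StringIO buffer for binary data, to open a downloaded `.tar.gz` without touching the file system. The function name `read_pkg_info` is made up for illustration, and it assumes the archive contains a file named PKG-INFO.

```python
import io
import tarfile


def read_pkg_info(sdist_bytes):
    """Return the text of the first PKG-INFO in an in-memory .tar.gz, or None."""
    # Wrap the raw bytes in a buffer so tarfile can treat them like a file.
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        for member in archive:
            if member.isfile() and member.name.split("/")[-1] == "PKG-INFO":
                return archive.extractfile(member).read().decode("utf-8")
    return None
```

The fiddly part is that every helper that used to take a path now has to accept bytes or a file-like object instead.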
Re: PyPi (Cheeseshop) on Google App Engine
posted by mike bayer on Tuesday Apr 15th, 2008 at 12:21p.m.
100% uptime ? Don't count on it. I have a few google groups going, which is an application with far more operational constraints than GAE, and those certainly go down every few months, usually for half a day or so the group will be "temporarily unavailable". When they go down, there is *nobody* to complain to, either...you just have to wait it out and hope they realize something is broke.
So GAE may be paradigm shifting and all that, but for more serious host consumers like me I can't see what Google could do to make GAE more appealing than a straight VPS where I can run whatever software I want without restriction, still have no hardware issues, and get guaranteed service and portability.
Re: PyPi (Cheeseshop) on Google App Engine
posted by Kumar McMillan on Tuesday Apr 15th, 2008 at 12:32p.m.
ok, ok, 98% uptime ;)
for me, App Engine offers invisible hardware and a guaranteed set of dependencies. I like that. They have HUGE incentive to make it "just work". If it just works then I don't have to spend time upgrading the OS on my VPS, updating other libs for bugfixes, troubleshooting why the VPS is suddenly slow, or whatever.
But, of course, we all know that "just works" is always a lie so it has yet to be determined how reliable App Engine will be. Only time will tell.
Re: PyPi (Cheeseshop) on Google App Engine
posted by Christopher Arndt on Tuesday Apr 15th, 2008 at 7:55p.m.
Kumar, glad that you like EggBasket. BTW, I released version 0.4a a few hours ago (Changelog on the website). Yes, EggBasket relies very much on the file system because I wanted to avoid maintaining the packages in a database. You should be able to just point it at a directory with packages and be good to go. And add packages by just copying them to the right places (for example with scp). At the moment each package needs its own subdirectory, but I might implement scanning for packages in the top-level directory as well (but then caching would have to be implemented first).
For intranets and closed-source packages, I should implement options to protect package listings and downloads with a login, but in TurboGears it is not straightforward to restrict access to a group and at the same time make it an option to allow access to anonymous users as well. It's not a feature I need right now, but probably will soon. So stay tuned...
Re: PyPi (Cheeseshop) on Google App Engine
posted by Ian Bicking on Wednesday Apr 16th, 2008 at 8:59p.m.
For large downloads, you could use Range requests to fetch pieces, with a maximum size (I think if you give an over-expansive range, the server should just ignore it -- so you could send your first request with the maximum size range you can handle, and get back whatever you get).
And hey, WebOb has Range support! You'd just need to make a WSGI proxy for urlfetch. I bet such a proxy would be really easy to write.
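The Range idea above could be sketched roughly like this. It is only an illustration under stated assumptions: `fetch` is a hypothetical callable taking `(url, headers)` and returning `(status, response_headers, body)`, standing in for whatever HTTP client is available (on App Engine that would be a thin wrapper around urlfetch). The default `chunk_size` is an arbitrary guess, not a documented limit.

```python
import re

# Matches a response header such as "Content-Range: bytes 0-499/1234".
CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def fetch_in_chunks(url, fetch, chunk_size=512 * 1024):
    """Download url in pieces using HTTP Range requests.

    fetch(url, headers) is a hypothetical callable that returns
    (status_code, response_headers_dict, body_bytes).
    """
    parts = []
    start = 0
    while True:
        end = start + chunk_size - 1
        status, headers, body = fetch(url, {"Range": "bytes=%d-%d" % (start, end)})
        if status == 200:
            # The server ignored the Range header and sent everything.
            return body
        if status == 416:
            # The requested start is past the end of the file.
            break
        if status != 206:
            raise IOError("unexpected status %s for %s" % (status, url))
        if not body:
            break
        parts.append(body)
        start += len(body)
        headers = dict((k.lower(), v) for k, v in headers.items())
        match = CONTENT_RANGE.match(headers.get("content-range", ""))
        total = match.group(3) if match else "*"
        if total != "*":
            if start >= int(total):
                break
        elif len(body) < chunk_size:
            # Total size unknown: a short piece means we reached the end.
            break
    return b"".join(parts)
```

One caveat: if the server ignores Range entirely and answers 200 with the whole file, the response is just as large as before, so this only helps against servers that honor Range.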