Thoughts on Python

Creating a subversion checkout/dev target for easy_install

Nov 29 2006 posted in Python

I was a little confused by setuptools' explanation of how to make an egg installable as

easy_install yourproject==dev

How do you tell it where your repository is? You actually have to put a URL somewhere for setuptools to find, i.e. in the long_description, in the body of the page that your project's url loads, or perhaps elsewhere. The URL needs to look like this:

http://svn.yourproject.com/trunk#egg=yourproject-dev

In other words, a URL to the repository containing a fragment in the correct format.
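Here is a minimal sketch of the long_description route, written for a current Python 3. The URL and the project name yourproject are the placeholders from above, and egg_fragment() is a hypothetical helper of my own that only spells out how the fragment is laid out; it is not part of setuptools, and its split-at-the-last-hyphen rule is a simplification of how setuptools actually parses egg names.

```python
from urllib.parse import urlparse

# Placeholder repository URL; substitute your own trunk.
TRUNK_URL = "http://svn.yourproject.com/trunk#egg=yourproject-dev"

# One place setuptools can find it: the long_description passed to setup().
long_description = """\
yourproject does useful things.

The in-development version is available from the subversion trunk:
%s
""" % TRUNK_URL

# In setup.py this would then be passed along, e.g.:
# setup(name="yourproject", version="0.1", long_description=long_description)


def egg_fragment(url):
    """Split a '#egg=project-version' fragment into (project, version).

    Hypothetical helper, simplified for illustration: it splits at the
    last hyphen, which is not the full matching logic setuptools uses.
    """
    key, _, value = urlparse(url).fragment.partition("=")
    if key != "egg":
        raise ValueError("no #egg= fragment in %r" % url)
    project, _, version = value.rpartition("-")
    return project, version
```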

You vs. The Real World: Testing With Fixtures (Coming Soon)

Nov 30 2006 posted in Python, Testing

I'm very pleased to announce that my proposal for a talk at PyCon 2007 was accepted (#83). I'm pretty new to PyCon but, wow, the reviewers had a tough time this year! Of 104 submissions, only 50-60 could be accepted; ouch.

The talk is titled You vs. The Real World: Testing With Fixtures and is a way to demonstrate some practical usage for the fixtures portion of the testtools module, as well as talk about why testing with real data is highly effective and fairly easy.

Actually, before submitting the proposal I began pulling out the fixtures logic into a new module for distribution, named simply fixture. This will be a sort-of 1.0 of testtools and will allow me to address the many problems I've run into by changing the interface some.

I hope to have a somewhat stable 1.0 release of the new module out before my talk (grins), along with a few new features, i.e.: an official release of the command line fixture generator; a better interface for defining rows in a fixture, like cloning a super row and handling id sequences automatically; support for the with statement (this will work like the current @with_fixtures decorator); and better docs, examples, etc.
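To illustrate what "work like the decorator" could mean, here is a sketch in which the decorator is just a thin wrapper around a context manager, so both forms share the same load/teardown logic. fixture_data(), with_data() and the LOADED list are hypothetical stand-ins, not the fixture module's actual API.

```python
import functools
from contextlib import contextmanager

# Hypothetical stand-in for a database holding test data.
LOADED = []


@contextmanager
def fixture_data(*datasets):
    """Load datasets on entry and always unload them on exit."""
    LOADED.extend(datasets)
    try:
        yield list(datasets)
    finally:
        # Tear down in reverse order, even if the test failed.
        for ds in reversed(datasets):
            LOADED.remove(ds)


def with_data(*datasets):
    """Decorator form: wraps the test in the same context manager."""
    def decorate(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            with fixture_data(*datasets) as data:
                return test(data, *args, **kwargs)
        return wrapper
    return decorate


@with_data("users", "orders")
def check_orders(data):
    # The data is loaded only while the decorated function runs.
    return list(LOADED)
```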

Python gets true closures in 3000 - do I care?

Dec 2 2006 posted in Python

Really the only thing stopping python 2.x from having closures like how Martin Fowler defines them is that, aside from the global module space, you can't re-bind names in enclosed scopes; you can only have a local reference. An *ahem* unnamed coworker of mine who mainly uses Ruby seems to enjoy pointing this out. I actually like how you only get local refs in python; it seems safer. (As an aside I think Rite, Ruby2, is deprecating non-local refs by default but maybe that has changed.)

Python 3000 will solve this problem once and for all with the nonlocal keyword. Rejoice? I guess so. The PEP demonstrates the current workaround, a class they call Namespace, which—in the rare case that it's necessary—is how I do it and that never bothered me ... but doing without a workaround will certainly be nice.
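A quick side-by-side, assuming a simple counter as the use case: the first version mutates an attribute on a Namespace-style holder (my own minimal class, not copied from the PEP), the second uses nonlocal (Python 3 syntax), and the third shows the Python 2 limitation itself.

```python
class Namespace(object):
    """Bare attribute holder; my own minimal version of the workaround."""


def make_counter_namespace():
    # Workaround: mutate an attribute instead of rebinding a name.
    ns = Namespace()
    ns.count = 0

    def increment():
        ns.count += 1
        return ns.count
    return increment


def make_counter_nonlocal():
    # 'nonlocal' lets the inner function rebind the enclosing 'count'.
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count
    return increment


def make_counter_broken():
    # Without either, 'count += 1' makes 'count' local to increment(),
    # so calling increment() raises UnboundLocalError.
    count = 0

    def increment():
        count += 1
        return count
    return increment
```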

I should also note that Python 2.5 added the with statement which provides a nice shortcut for transactional closures. This was spawned, I think, from a request to add ruby-style blocks to python.
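A sketch of what I mean by a transactional closure with the with statement: the body of the with block plays the part of a Ruby block, and the context manager decides whether to commit or roll back. sqlite3 is used here only because it ships with Python; sqlite3 connections can already be used directly in a with statement for the same commit/rollback behaviour, so transaction() is just an illustration of the pattern.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit if the block succeeds; roll back and re-raise if it fails."""
    try:
        yield conn
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT)")

with transaction(conn):
    conn.execute("INSERT INTO events VALUES ('click')")

try:
    with transaction(conn):
        conn.execute("INSERT INTO events VALUES ('submit')")
        raise RuntimeError("something went wrong mid-transaction")
except RuntimeError:
    pass  # the 'submit' row was rolled back
```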

Generating python with python

Jan 17 2007 posted in Python

I've been working on this generate command that creates fixture code from an existing database and, I just have to say, I really like that I can do:

template.add_import("from %s import %s" % (cls.__module__, cls.__name__))

It just feels so ... right ;)
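For the curious, here is a hypothetical sketch of the pattern: ModuleTemplate, add_line() and import_for() are made-up names, not the fixture generator's real code. It builds the import line from cls.__module__ and cls.__name__ just like the line above, then compiles and executes the generated source as a quick sanity check. It assumes top-level classes; nested classes would need __qualname__ handling.

```python
from collections import OrderedDict


class ModuleTemplate(object):
    """Hypothetical template that accumulates a generated module's source."""

    def __init__(self):
        self.imports = []
        self.lines = []

    def add_import(self, statement):
        # Keep each import once, in the order it was first requested.
        if statement not in self.imports:
            self.imports.append(statement)

    def add_line(self, line):
        self.lines.append(line)

    def render(self):
        return "\n".join(self.imports + [""] + self.lines) + "\n"


def import_for(cls):
    """Build an import statement for a top-level, importable class."""
    return "from %s import %s" % (cls.__module__, cls.__name__)


template = ModuleTemplate()
template.add_import(import_for(OrderedDict))
template.add_import(import_for(OrderedDict))  # duplicate is ignored
template.add_line("data = OrderedDict([('id', 1)])")

source = template.render()
# Compiling and running the result is a cheap check on generated code.
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
```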

Housecall from the pydoctor (finally, a doc generator that works!)

Jan 24 2007 posted in Python

On a fair day in a far away land there was once pydoc, a cunning warrior indeed. Generating docs for distribution wasn't too hard, just a few lines like ...


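import os
import pydoc
import mymod  # the package being documented

os.chdir(build_dir)  # writedocs() writes its .html files into the current directory
pydoc.writedocs(os.path.dirname(mymod.__file__), '%s.' % mymod.__name__)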

... or something. But the output is hard to customize. Along came epydoc, which looked to be based loosely on javadoc. Getting over the cringe induced by that last statement, I tried it out anyway and immediately got a headache trying to figure out where my docstrings were ... or how to navigate the frames within frames of whatever I was looking at.

There was also once HappyDoc but I had never tried that one. Somewhere, PythonDoc popped up and I have a vague memory of trying it out and getting a similar epydoc headache ... or being turned off by its javadocness, etc. Let me remind you this is all just one humble, impatient coder's opinion ;)

Then ... there was a very hopeful pudge which aimed to solve a lot of problems (templating, RST formatting) but seemed to break easily if your module did something out of the ordinary. I think we can all agree that generating python docs is hard! Let's hear a round of applause for pudge. Seriously, I tried very hard to use it and it served me well, up to a point. I think I even submitted a patch once.

Now...

I just tried out pydoctor and ... I like it! It still seems to be at an early stage, but it built a very nice navigation for my fixture module (I'll post that once I add it to the build). A friend of mine tried it out on a module with rST docstrings and reports that this works nicely too, minus some CSS tweaks for doctests. Did I mention that it's fast?? The main page has some links to examples of generated API docs.

The link above also explains how to get it from subversion. Which brings me to a big disclaimer: all observations here are with revision 37310! It also needs Twisted and the dev version of Nevow (this wasn't available from easy_install, but the link to subversion is on the Nevow page). There was also a confusing warning, something like "no epytext found", but that was actually a swallowed ImportError, and easy_install epydoc was the necessary fix. Epydoc isn't mandatory, but it is an option, and I am about to read up on epytext now.

This is definitely one to keep an eye on.

Coffee! ... and python

Jan 31 2007 posted in Chicago, Python

I'm going to try my first Tech Coffee—early morning coder's meetup—this Thursday to work on the fixture module somewhere other than the purple line to and from work. When Tech Coffee first started it was on Mondays. That was the worst idea, ever. The last thing I am is a morning person but I am negatively-last a Monday morning person!

2 stupid things I coded this week

Feb 9 2007 posted in Python

#1


try:
    do_stuff()
except:
    etype, val, tb = sys.exc_info()
    raise etype, "%s (%s)" % (val, "happened in the context of X"), tb

#2


class UsedToBeADict(object):
    foobar = make_foobar('with sugar'),
    bazbar = 1
    fezbar = 2

If you don't see the mistakes already, here are some hints...

#1
- Hmm, I'm looking at the constructor of exceptions.SQLError, which the traceback led me to, and it is getting 3 arguments just like it should, but why am I getting an error saying there aren't enough arguments?!
#2
- ok, wtf, why is UsedToBeADict.foobar a tuple??! I'm looking at the return value of make_foobar() and it is definitely not a tuple!
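For anyone who wants the answers spelled out, here is a runnable Python 3 reproduction of both mistakes. In #1, Python 2's raise etype, value, tb calls etype(value) when value isn't already an instance, so the exception class is constructed with a single string argument; that is why a 3-argument constructor complains about missing arguments. SQLError, do_stuff(), ContextError and make_foobar() are hypothetical stand-ins, and chaining with raise ... from is just one possible fix.

```python
class SQLError(Exception):
    """Hypothetical stand-in for an exception whose constructor needs 3 args."""

    def __init__(self, query, params, message):
        Exception.__init__(self, query, params, message)
        self.query, self.params, self.message = query, params, message


def do_stuff():
    raise SQLError("SELECT * FROM users", (), "database went away")


def reraise_like_mistake_1():
    # Same idea as mistake #1: rebuild the exception from its type and one
    # string. SQLError("...") is missing two arguments, so TypeError.
    try:
        do_stuff()
    except SQLError as exc:
        raise type(exc)("%s (happened in the context of X)" % exc)


class ContextError(Exception):
    """Wrapper that adds context without rebuilding the original exception."""


def reraise_with_context():
    # Chain a new exception; the original stays available on __cause__.
    try:
        do_stuff()
    except SQLError as exc:
        raise ContextError("happened in the context of X") from exc


def make_foobar(flavor):
    return "foobar " + flavor


class UsedToBeADict(object):
    foobar = make_foobar('with sugar'),       # mistake #2: trailing comma -> tuple
    foobar_fixed = make_foobar('with sugar')  # no comma -> plain str
```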

Why People Don't Use Hand Dryers

Feb 14 2007 posted in Python, The Future

It's because they don't know that you have to 1) wring out or shake off extra water then 2) rub your hands vigorously under the dryer. If you don't do that, it doesn't work (unless it's some turbo dryer I've seen in restaurants recently). Most dryers do not have these instructions printed on the dryer. Are people stupid? I won't answer that, but let's be fair, it's not very intuitive. Tools need to be intuitive and dead simple to succeed in today's society. This is why a lot of people curse computers and/or hate certain desktop apps.

Having said all this, I was a little surprised to hear that Humanized, a Windows application [1] that works like Quicksilver in that you can just start typing something you want to do, is gaining popularity. I was surprised because I thought it was mainly programmers who "think with their fingers", since they [at least I...] find their 10 fingers to be some of the most useful tools on the body. However, it also doesn't surprise me, because this is a very intuitive way to work: start typing (or, possibly, start saying) what you want to do ... then watch it happen.

Does this mean we are about to see a counter-revolution to the mouse? I certainly hope so.

[1] behind the scenes I believe Humanized is written in Python and Freetype, but that's not so relevant.

You vs. The Real World: Writing Tests With Fixtures (Sunday at Pycon!)

Feb 19 2007 posted in Python, Testing

I've been having some trouble trying to edit my proposal summary (maybe it's not possible anymore?) so I wanted to point out that the talk will cover the fixture module, not the testtools.fixtures module. This is actually a new module, a rewrite of the testtools one from scratch, that I started in November when my proposal was accepted.

I'll post the slides tonight, but the gist of the talk is how to use fixture to load and reference test data. Here is an updated summary:

One of the biggest challenges of testing is creating an environment akin to the real world that your code lives in. This talk will focus on general strategies for tackling the problem and then will move into specific examples using the fixture module for setting up and interacting with data stored in databases and other storage media.

The goal of the talk is to promote better code coverage in tests, more maintainable test suites, and techniques for easy and painless refactoring.

PyCon: A Star Schema in pure python code? Is this guy INSANE?

Feb 24 2007 posted in Pycon 2007, Python

Steven Lott gave a talk roughly about implementing the star schema in Python (slides) suggesting that ETL operations—that is, the process of extracting, transforming, and loading data into a dimensional model (the Star Schema)—then analyzing that data can be done completely in Python memory. No databases. Yes, that's right, he's saying: create these traditional ETL entities as Python objects. Just so you are hearing me correctly: create the Entity, the Dimension, and the Fact AS PYTHON CODE, and report on the facts IN MEMORY!

In case you're not sure what this is all about: the Entity is something used to characterize the fact. Take, for example, a consumer event related to advertising: a click on a banner, a submit on the landing page, a product order, etc. Say this comes from an external data source, a CSV file from the client containing simply the session ID, the date the event occurred, and the event type. Next, for line 1 of that file, we create an Entity object in python code with a level of ('session ID foo', '2007-02-26 09:25:05', 'submit'). Simple, right? Just a tuple. The level is the hierarchy, the segments of your data.

Since this data is important to things other than reporting, you would load this into the ODS (operational data store). In this example, it would end up [roughly] as a "submit" in the Event Log with a foreign key pointing to the session that started the user's activity. But then a separate process would add it to the Event Fact table, so that reports can be generated on the lifecycle of the user's activity, i.e. the fact that he/she clicked on the banner at 9:15 Monday morning then submitted the form on the offer page at 9:25.

More specifically, the Fact is "a measurement plus the entities which characterize the measurement"; it has many dimensions, which are "a collection of related entities [think: sub-entities]". In our example, we have the Start Date Of The Session as one dimension of the fact, the Campaign that the user clicked on (derived from the session) as another, the Event Type itself, and so on. Great, we have this in a big database because reports are generated off these tables daily.
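To make the vocabulary concrete, here is roughly what "the Entity, the Dimension, and the Fact as Python code" could look like. This is my own minimal reading of the idea, not Steven's code: the class and method names follow the talk's terms, the sample rows are made up, and every fact has a measure of 1 so the totals are simply event counts.

```python
class Entity(object):
    """A member of a dimension, identified by its level tuple."""

    def __init__(self, level):
        self.level = level  # e.g. ('session ID foo', '2007-02-26 09:25:05', 'submit')
        self.facts = []     # facts this entity helps characterize


class Dimension(object):
    """A collection of related entities, keyed by level."""

    def __init__(self, name):
        self.name = name
        self.entities = {}

    def entity(self, level):
        # Create each entity once and reuse it for every later fact.
        if level not in self.entities:
            self.entities[level] = Entity(level)
        return self.entities[level]

    def items(self):
        return self.entities.items()


class Fact(object):
    """A measurement plus the entities which characterize it."""

    def __init__(self, measure, *entities):
        self.measure = measure
        self.entities = entities
        for entity in entities:
            entity.facts.append(self)


def totals(dimension):
    """Sum the measure of every fact, per entity of one dimension."""
    return {level: sum(f.measure for f in e.facts)
            for level, e in dimension.items()}


events = Dimension("event type")
campaigns = Dimension("campaign")
rows = [("click", "banner A"), ("submit", "banner A"), ("click", "banner B")]
for event, campaign in rows:
    Fact(1, events.entity((event,)), campaigns.entity((campaign,)))
```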

So why would I ever consider doing this all in python code? That's exactly what I asked Steven after the talk. Several back-and-forths and a whiteboard scribble later, he showed me the light. I saw his proposal not as "do all this in python memory" but instead as an intelligent caching mechanism for ETL processing, one that is still free to use databases. Caching is very hard to make transparent. Well, it's only hard because it's usually an afterthought, so you end up with stupid, error-prone functions that say: if the object is in the cache, return it, otherwise create one. With these "dumb" objects, instead, you end up with a transparent representation of what's going on, and thus a perfect jumping-off point for implementing caching. Funny how it sometimes takes something ridiculously simple to solve a complex problem.

However, to get into the nitty gritty, I still don't see how you would ever want to run reports off any real business data from within python code. Steven suggests:

def report(dim1):
    for k, e in dim1.items():
        if k[2] == 'criteria':
            total = 0  # renamed from sum so the built-in isn't shadowed
            for f in e.facts:
                total += f.measure
            yield e.level, total

Bwahh! Yes, that is insane. This is what SQL is for. But I can actually see that when you are loading facts, you can start building dimensions by first creating a link to that dimension (here, the select query occurs) then propagating an instance of that dimension (cached) for future facts. When it's time to insert the facts, the dimension instances all become foreign keys. Steven says if you must use a database then this approach should start by caching the entire table! I like the lazy select/cache approach better but he says sometimes dimensions can be small data sets and I can see how that might yield efficient results.
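Here is a sketch of the lazy select/cache approach for loading facts, with Steven's cache-the-whole-table variant as preload(). It uses an in-memory sqlite3 database; DimensionKeys and the table and column names are assumptions for illustration. The table name is interpolated into the SQL, so it must be a trusted constant, never user input.

```python
import sqlite3


class DimensionKeys(object):
    """Cache of natural value -> surrogate key for one dimension table.

    Each value is looked up lazily (SELECT, then INSERT if missing) the
    first time it is seen; later facts reuse the cached key with no query.
    """

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table  # trusted, hard-coded table name only
        self.keys = {}
        self.lookups = 0    # how many times the database was consulted

    def key_for(self, name):
        if name not in self.keys:
            self.lookups += 1
            row = self.conn.execute(
                "SELECT id FROM %s WHERE name = ?" % self.table, (name,)
            ).fetchone()
            if row is None:
                cur = self.conn.execute(
                    "INSERT INTO %s (name) VALUES (?)" % self.table, (name,))
                row = (cur.lastrowid,)
            self.keys[name] = row[0]
        return self.keys[name]

    def preload(self):
        # Variant for small dimensions: cache the entire table up front.
        for key, name in self.conn.execute("SELECT id, name FROM %s" % self.table):
            self.keys[name] = key


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("CREATE TABLE event_fact (event_type_id INTEGER, measure INTEGER)")

event_types = DimensionKeys(conn, "event_type")
for name in ["click", "submit", "click", "click"]:
    # The cached dimension instance becomes the fact's foreign key.
    conn.execute("INSERT INTO event_fact VALUES (?, ?)",
                 (event_types.key_for(name), 1))

# Reporting stays in plain SQL.
report = dict(conn.execute(
    "SELECT t.name, SUM(f.measure) FROM event_fact f "
    "JOIN event_type t ON t.id = f.event_type_id GROUP BY t.name"))
```

Four facts cost only two dimension lookups, and the report itself stays in SQL.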

There's nothing like a good rebellious idea to get one's gears greased up and turning. My jaw was literally on the floor after this talk; I couldn't believe someone would ever suggest such a ridiculous idea. Steven, thank you :)

Live doctest in TextMate (IPython + Twisted?)

Mar 1 2007 posted in Python, TextMate

I use TextMate for all python dev and I've had this idea to make typing doctests yield live results. In other words, when you are editing a module, if you were in a doctest you would type a line of code and the next line of your module would contain the result.

Just imagine typing and getting back...

This is how you use the class::
>>> foo = Barbaz()
>>> if foo.babulated:
...     foo.unbabulate()
...
'foo is now unbabulated'
>>>

...I dunno about you, but I think that'd be slick!

The implementation is pretty simple with IPython:

from IPython.iplib import InteractiveShell
s = InteractiveShell('shell', user_global_ns=globals())
s.push('class foo(object):')
s.push('    pass')
s.push('')
s.push('foo')
# <class '__main__.foo'>

...huzzah. In TextMate, you can execute a script on the <return> key, and the script gets $TM_CURRENT_LINE, the full text of the line you're on. However, the current framework would execute the script anew each time you pressed return, so there is a problem with shared memory (no record of the previous lines you fed into it).

This is the part I'm still trying to figure out. I could fork another process and make some kind of interface (REST, pyro?) for feeding it new lines. The shared namespaces could be hashed per file, using $TM_FILENAME, and could simply use timeouts to do cleanup. It would need some security—maybe a block of all non-local IPs + some kind of permission check per user, but this is all I can think of. Anyone know of a better way? Anyone interested in this kind of thing? I am getting a heavy feeling that I need to learn Twisted! Can anyone point me in a good direction for how to do this in Twisted?
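One possible shape for this, as a sketch that is not wired into TextMate: a long-running process keeps one interpreter per file, keyed by $TM_FILENAME, and throws away idle ones after a timeout. It swaps in the standard library's code.InteractiveConsole for IPython and xmlrpc.server for REST/Pyro/Twisted; LiveConsole, ConsoleRegistry, the 30-minute timeout and port 8765 are all hypothetical. The only security here is binding to localhost, which does not cover the per-user permission check mentioned above.

```python
import code
import contextlib
import io
import time


class LiveConsole(code.InteractiveConsole):
    """Interactive console that returns its output instead of printing it."""

    def __init__(self):
        code.InteractiveConsole.__init__(self, locals={"__name__": "__live__"})
        self.last_used = time.time()
        self._out = io.StringIO()

    def write(self, data):
        # InteractiveConsole reports syntax errors and tracebacks via write().
        self._out.write(data)

    def feed(self, line):
        """Push one line; return (needs_more_input, captured_output)."""
        self.last_used = time.time()
        self._out = io.StringIO()
        with contextlib.redirect_stdout(self._out):  # catches echoed results
            more = self.push(line)
        return more, self._out.getvalue()


class ConsoleRegistry(object):
    """One console per file (e.g. keyed by $TM_FILENAME), dropped when idle."""

    def __init__(self, timeout=30 * 60):
        self.timeout = timeout
        self.consoles = {}

    def feed(self, filename, line):
        self.expire()
        if filename not in self.consoles:
            self.consoles[filename] = LiveConsole()
        return self.consoles[filename].feed(line)

    def expire(self, now=None):
        now = time.time() if now is None else now
        for name, console in list(self.consoles.items()):
            if now - console.last_used > self.timeout:
                del self.consoles[name]


def serve(port=8765):
    """Expose a registry over XML-RPC on localhost (not called here)."""
    from xmlrpc.server import SimpleXMLRPCServer
    # Localhost only: this runs arbitrary code for whoever can connect.
    server = SimpleXMLRPCServer(("127.0.0.1", port), allow_none=True,
                                logRequests=False)
    server.register_function(ConsoleRegistry().feed, "feed")
    server.serve_forever()
```

The TextMate command would then be a tiny client that sends the filename and $TM_CURRENT_LINE to feed() (for instance with xmlrpc.client.ServerProxy) and inserts whatever output comes back.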

unicode and unicorns

Mar 12 2007 posted in Python

Feihong Hsu gave an excellent talk on handling unicode in python at the last ChiPy meeting. The slides are on his blog and, even if you missed the talk, they still read well on their own. He also posted his demos, which contain plenty of useful code for working out common problems. Hopefully python 3000 will make this all much, much easier.

testing just got easier (a few nose plugins)

Mar 22 2007 posted in Python, Testing

While procrastinating on writing documentation for fixture I managed to code up a few nose plugins. (Seriously though, the fixture docs are nearing a stage of completion, I swear it!)

If you're not familiar with nose and its nosetests command for running test files, then it's worth checking out. Titus Brown even wrote a comprehensive introduction and usage guide.

The coolest part, of course, is that you can write plugins very easily (they're even installable via easy_install). Second, nosetests is for programmers and ... programmers are motivated to create software that makes their lives easier! So here are a few useful plugins that others and I have released lately:

nosetty
- A plugin to run nosetests more interactively. The crux of this is to give you some convenient ways to edit code based on the traceback ... with your favorite editor, of course. I'm getting some great feedback from users on how to use it with different editors; all this is detailed on the recipes page.
nosetrim
- A nose plugin that reports only unique exceptions. This is a small thing I wrote to reduce the "blowup" effect when you do something stupid that causes the same exception to pollute many, many tests. It still needs a little work.
spec
- by Michal Kwiatkowski
- Generate test description from test class/method names. I had never used testdox so this was a new concept for me. I think it's really clever and it's already gotten me to name my tests better ;) However, I'd like to see a little more flexibility out of it and I might try to present a patch or two.
outputsave
- by Titus Brown
- Save your stdout into files. Since I have a lot of tests that deal with data processing, they spew lots of messages, and this plugin is perfect for managing that output. In fact, I liked it so much that I added a command to nosetty for opening stdout captures.
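The core of the nosetrim idea fits in a few lines of plain Python. This is a sketch, not nosetrim's actual code, and it is not hooked into nose's plugin interface; treating errors as "the same" when they share type, message, file and line is an assumption. In a real plugin, add() would sit behind the hooks that receive each error.

```python
import sys
import traceback


def error_signature(exc_info):
    """Identify an error by type, message and the line that raised it."""
    etype, value, tb = exc_info
    origin = traceback.extract_tb(tb)[-1]
    return (etype.__name__, str(value), origin.filename, origin.lineno)


class UniqueErrors(object):
    """Group failing tests by error signature so each error is shown once."""

    def __init__(self):
        self.groups = {}

    def add(self, test_name, exc_info):
        """Record an error; return True only the first time it is seen."""
        sig = error_signature(exc_info)
        first = sig not in self.groups
        self.groups.setdefault(sig, []).append(test_name)
        return first

    def run(self, test_name, func):
        """Call func(); record any exception it raises (None if it passed)."""
        try:
            func()
        except Exception:
            return self.add(test_name, sys.exc_info())
        return None

    def report(self):
        return ["%s: %s (%d tests)" % (sig[0], sig[1], len(tests))
                for sig, tests in self.groups.items()]


def broken_setup():
    # The same stupid mistake, hit by every test that uses this setup.
    raise KeyError("db_url")


errors = UniqueErrors()
first_seen = [errors.run(name, broken_setup)
              for name in ("test_a", "test_b", "test_c")]
```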

On a separate but related note, in writing these plugins I came up with a fairly easy way to make functional tests for plugins themselves. It's a combination of two classes, PluginTester and NoseStream; these will probably be part of nose 0.10, but if you want a sneak peek, take a looksee at the nosetrim test suite. The nosetty test suite also makes use of it, but that one is a little more confusing to read because it automates interaction with the subprocess.

multiple inheritance woes

Mar 29 2007 posted in Python

Multiple inheritance in python starts to fall apart when you want to mash together two very similar objects. I haven't found a clean way to get this example to work (without hardcoding the super calls to self.rollback_db() or doubling up implementations of rollback_db, both of which seem like they shouldn't be necessary).

Does anyone have a suggestion? Aside from this gotcha, I often find multiple inheritance to be an elegant solution.

"""
>>> rolledback = set()
>>> class A(object):
...     def rollback(self):
...         self.rollback_db()
...         
...     def rollback_db(self):
...         rolledback.add('A')
... 
>>> class B(object):
...     def rollback(self):
...         self.rollback_db()
...         
...     def rollback_db(self):
...         rolledback.add('B')
... 
>>> class C(A, B):
...     def rollback(self):
...         A.rollback(self)
...         B.rollback(self)
... 
>>> c = C()
>>> c.rollback()
>>> rolledback
set(['A', 'B'])
"""
import doctest
doctest.testmod()

output is...

kumar$ python test_mro.py 
**********************************************************************
File "test_mro.py", line 24, in __main__
Failed example:
    rolledback
Expected:
    set(['A', 'B'])
Got:
    set(['A'])
**********************************************************************

and, yes, this was not a fun one to debug :( Thankfully this was discovered in a functional test of --dry-run!

UPDATE

There are some helpful suggestions in the comments, but I still don't see a solution! Here is an example fixed a little bit using super(), but still only 50% there:

"""
>>> rolledback = []
>>> class Rollbackable(object):
...     def rollback(self):
...         pass
... 
>>> class A(Rollbackable):
...     def rollback(self):
...         super(A, self).rollback()
...         rolledback.append('A')
...         self.rollback_db()
...         
...     def rollback_db(self):
...         rolledback.append('db(A)')
... 
>>> class B(Rollbackable):
...     def rollback(self):
...         super(B, self).rollback()
...         rolledback.append('B')
...         self.rollback_db()
...         
...     def rollback_db(self):
...         rolledback.append('db(B)')
... 
>>> class C(A, B):
...     def rollback(self):
...         super(C, self).rollback()
... 
>>> c = C()
>>> c.rollback()
>>> rolledback
['B', 'db(B)', 'A', 'db(A)']
"""
import doctest
doctest.testmod()

... and the output...

kumar$ python test_mro.py 
**********************************************************************
File "test_mro.py", line 31, in __main__
Failed example:
    rolledback
Expected:
    ['B', 'db(B)', 'A', 'db(A)']
Got:
    ['B', 'db(A)', 'A', 'db(A)']
**********************************************************************
1 items had failures:
   1 of   8 in __main__
***Test Failed*** 1 failures.

UPDATE #2

Thanks for all the helpful comments. It appears that the only way to accomplish this in python is to change self.rollback_db() into a private method, via python's magic double underscores, like so: self.__rollback_db(). The downside to this, of course, is that a subclass can no longer simply override __rollback_db(). This can be inflexible at times. For example, I've wanted to override private methods before but couldn't (coincidentally, when trying to extend doctest; instead I had to copy/paste about 20 lines of code). So use private methods only if you absolutely have to. In this case it makes sense that rollback_db() is private, since it only rolls back a single db transaction and that's it. Also note that class C doesn't need to call super at all, since everything just works itself out (because of super in the base classes).

Now ... here is what passes the doctest!

"""
>>> rolledback = []
>>> class Rollbackable(object):
...     def rollback(self):
...         pass
... 
>>> class A(Rollbackable):
...     def rollback(self):
...         super(A, self).rollback()
...         rolledback.append('A')
...         self.__rollback_db()
...         
...     def __rollback_db(self):
...         rolledback.append('db(A)')
... 
>>> class B(Rollbackable):
...     def rollback(self):
...         super(B, self).rollback()
...         rolledback.append('B')
...         self.__rollback_db()
...         
...     def __rollback_db(self):
...         rolledback.append('db(B)')
... 
>>> class C(A, B):
...     pass
... 
>>> c = C()
>>> c.rollback()
>>> rolledback
['B', 'db(B)', 'A', 'db(A)']
"""
import doctest
doctest.testmod()
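Why the double-underscore version works: inside a class body, Python rewrites self.__name to self._ClassName__name, so A's and B's helpers end up with different attribute names and MRO lookup can no longer let one shadow the other. Here is a small standalone demo (class and method names are made up for illustration); the last class also shows that spelling out the mangled name still lets a subclass override a "private" method, it just can't happen by accident.

```python
class A(object):
    def call_public(self):
        return self.helper()     # looked up along type(self).__mro__

    def call_private(self):
        return self.__helper()   # compiled as self._A__helper()

    def helper(self):
        return "A"

    def __helper(self):
        return "A"


class B(object):
    def call_public(self):
        return self.helper()

    def call_private(self):
        return self.__helper()   # compiled as self._B__helper()

    def helper(self):
        return "B"

    def __helper(self):
        return "B"


class C(A, B):
    pass


class D(C):
    def _B__helper(self):        # overriding via the mangled name on purpose
        return "D"


c = C()
mro = [cls.__name__ for cls in C.__mro__]  # ['C', 'A', 'B', 'object']
from_b_public = B.call_public(c)    # 'A': C's MRO finds A.helper before B's
from_b_private = B.call_private(c)  # 'B': _B__helper can only mean B's
```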

documentation for fixture module

Apr 17 2007 posted in Projects, Python, Testing

In a mad sea of too-busy-to-blink I've managed to write documentation on the fixture module for python. If this interests anyone could he/she please let me know how the docs read? Too much? Too little? Hard to navigate? Examples too complicated? Thanks.

For reference, here is the main fixture project page
