Friday, January 30, 2015

Python logging for maximum output and maximum speed

Log everything in development and test, and run full speed in production

After writing a high(ish)-frequency trading system, one thing I learned is that you need to log a lot and this means pretty much everything when you are developing and testing. It is crucial you understand what your code does, and logging is huge part of making this happen. I log every local variable, so that if something goes wrong I can track what just happened. Without it, you are flying blind.

But you can't have it slow down your code when in production. In production, as the code solidifies, you want to not have all that logging code run. So you set you're log level to WARN, and you are happy...

Well you would be unless you looked at your profile stats. I was looking at the time taken on a recent performance run in the Python trading application I am writing for trading equity options, and noticed that half the time taken was doing str.format() - all my logging. Being somewhat new to Python, I realized my error.

It is all to do with lazy string evaluation. If you log code looks like this, you have a problem:

logger.debug('Placing order {} at {}, zs={}'.format(o, t, zs))

Even if you have disabled all logging except for WARNING and above, this is still an expensive call. The call to debug is causing the parameters passed to be evaluated, leading to a a string being created and formatted, which can be an expensive operation if you do it a lot.

One approach advocated on Stack Exchange and in the documentation is:

if logger.isEnabledFor(DEBUG): 
    logger.debug('Placing order {} at {}, zs={}'.format(o, t, zs))

This is good if one of the parameters is expensive to evaluate for some reason. However decorating every log method in your project with isEnabledFor() is just ugly.

The way to do it is this:

logger.debug('Placing order %s at %s, zs=%s', o, t, zs)

This way the logging framework will call format only if isEnabledFor()is True which is just what we want. The Python Logging documentation does mention this early on briefly, but not in the Optimization section - perhaps it should.

Except I already have thousands of log lines in my system - arrgg!

PyCharm to the rescue.  Edit-Find-Replace.In.Path (Ctrl-Shift-R) allows entry of regular expressions. Its a two pass process, but it works:

Pass 1: Search for(logger.*)\.format\((.*)\) and replace with $1, $2
Pass 2: Search for(logger.*)\{\} and replace with $1%s

For Pass 2, it will replace all the {} in your format lines with %s ({ is a special character, so I escaped it with \.)  I had luckily used {} pretty much everywhere. However if you use multiple {} on a single line, it just replaces one. So you need to re-do this multiple times (i.e. re-do the Replace-In-Path Pass 2)

Also, it seems like PyCharm is somewhat broken. I tried clicking 'Replace in All Files' but it mixed up the various log messages in my code, moving them around and generally jumbling them up - yikes.

However just clicking 'Replace'-'Replace'-'Replace'... one at a time worked fine. 

This didn't work for multi-line logging, but for me this was a minority of cases and I fixed these by hand.

And so I just implemented a 50% speed increase for an hour or so of work. Good news!


No comments:

Post a Comment