Case study: Porting and modularizing an in-house Python application
May 30, 2011
Is it just a feeling, or has the number of technical posts decreased on the Planet? I definitely need to change that! (Update: The post got quite long, so I’ve shortened the syndicated version.)
These days, I’m usually working on my Diplomarbeit at the Institute of Theoretical Physics at the Technical University here in Dresden. (A Diplomarbeit is roughly comparable to a master’s thesis, though the preparation time is one year, since there was no bachelor’s thesis before it.) Our working group has an in-house application for inspecting iterated functions, which is, quite intuitively, called Iterator. Like many scientific applications, it has a very sharply defined purpose on the one hand, but contains tons of different tools and plugins on the other.
As a part-time project besides my usual work, I’m porting the Iterator, which is currently based on wxWidgets, to Qt. Of course, a port offers the opportunity to refactor badly engineered parts of the application. The code clearly shows that all involved developers are first and foremost physicists who have not received any formal training in software design, and many antipatterns can be observed in it. The most interesting one is the existence of a god class called IteratorApp.
IteratorApp is not the application instance which drives the event loop, as one might expect from the name, but the main window. The iterator.gui.iteratorapp module imports nearly all other core modules and instantiates them inside the IteratorApp class. A reference to the IteratorApp instance is then passed around to all components inside the IteratorApp, and also to all plugins loaded by it. The IteratorApp also contains most of the basic business logic: loading plugins, creating and managing all widgets in the main window, etc.
There is evidence that it was even worse some years ago. People have already extracted modules out of IteratorApp, but these are still closely tied to IteratorApp and thus have direct connections to all other parts of the code. The most interesting thing about this design is that, in Python, it makes the dependencies between modules very hard to discover. Consider the following Python code:
from external import BarComponent
class FooApp:
    def __init__(self):
        self.bar = BarComponent()
FooApp is our equivalent to IteratorApp here. Now when you have a reference “foo” to a FooApp instance somewhere, “foo.bar” will give you the corresponding BarComponent instance, which you can use directly if you need functions of BarComponent. You can also pass it on to other modules under different variable names. All this makes it quite hard to track which components the module you are looking at depends on. One can grep for “foo.bar” if “foo” is the established name for FooApp references, but that won’t help either once the BarComponent reference has been passed on to some other code. The problem is that, unlike FooApp, the code using BarComponent never needs to import it explicitly. This is a non-issue in C++, where used classes are usually forward-declared:
class BarComponent; // forward declaration only, no #include needed here
class FooApp { public: BarComponent* bar() const; };
If you use methods of “foo->bar()” somewhere in your code, you absolutely need to include the BarComponent header, so the dependency of the component in question on BarComponent is obvious. (Executive summary for this part: Forward declarations are good not only for reducing compilation time.) How can we transfer this advantage of C++ to Python?
The Qt port of the Iterator does not include anything as powerful as the IteratorApp. However, in an application consisting mostly of plugins (i.e. the iterable functions, and the tools that the different group members use to solve their distinct problems), there must be some authority that holds everything together and is passed to the plugins. In the Qt Iterator, this role is played by a new PluginLoader class.
The PluginLoader itself is, on purpose, very lightweight: Its tasks are restricted to creating application-global instances of other classes, which are called application plugins. For example, the Qt main window is an application plugin. One can ask the plugin loader to load a specific plugin by calling it (i.e. its __call__ method) with the type of this plugin. To instantiate a PluginLoader and get a MainWindow from it, one writes:
from iterator.common.pluginloader import PluginLoader
from iterator.qtgui.mainwindow import MainWindow
pl = PluginLoader()
mw = pl(MainWindow)
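The loader’s implementation itself is not shown at this point. To give a rough idea of how small such a class can be, here is a minimal sketch. It assumes that every plugin type can be instantiated without arguments and that exactly one instance per type is shared application-wide. The stand-in MainWindow class is hypothetical.

```python
class PluginLoader:
    """Minimal sketch: one shared instance per plugin type, created on first request."""

    def __init__(self):
        self._instances = {}  # plugin class -> its single instance

    def __call__(self, plugin_type):
        instance = self._instances.get(plugin_type)
        if instance is None:
            # Assumption: plugins can be constructed without arguments.
            instance = plugin_type()
            # Back-reference to the loader, comparable to the old "app" attribute.
            instance.pl = self
            self._instances[plugin_type] = instance
        return instance


class MainWindow:
    """Hypothetical stand-in for iterator.qtgui.mainwindow.MainWindow."""


pl = PluginLoader()
mw = pl(MainWindow)
print(pl(MainWindow) is mw)  # True: the same instance is returned every time
print(mw.pl is pl)           # True
```

Because the instances are keyed by type, asking for MainWindow twice yields the same window, which is what makes such a loader usable as the single authority that plugins talk to.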
The plugin loader places a reference to itself in the “pl” attribute of every plugin instance it loads. This is similar to the IteratorApp, which distributed “app” references all over the place. The important difference is that all business logic resides in application plugins. These can be obtained from the plugin loader at any time, but only with a syntax that requires naming the type. Naming the type in turn requires the Python developer to import the type explicitly, which makes module interdependencies visible, e.g. to automated dependency graphing tools. As a follow-up to this initial refactoring, such tools can then help to decide on subsequent refactoring steps.
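No specific graphing tool is named here. As an illustration of why the explicit import matters, here is a small sketch built on the standard ast module that derives a module dependency graph from import statements alone. The module names and sources are made up and mirror the FooApp example: the legacy-style module reaches BarComponent through an “app” reference, while the new-style one asks the plugin loader for it by type.

```python
import ast


def module_dependencies(source):
    """Return the names of all modules imported by a piece of Python source."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports without a module name ("from . import x") are skipped.
            deps.add(node.module)
    return deps


def dependency_graph(sources):
    """Map each module name to the sorted list of project modules it imports."""
    project = set(sources)
    return {name: sorted(module_dependencies(src) & project)
            for name, src in sources.items()}


# Hypothetical modules mirroring the FooApp/BarComponent example.
sources = {
    "legacy_tool": (
        "def run(app):\n"
        "    app.bar.frobnicate()  # uses BarComponent, but imports nothing\n"
    ),
    "new_tool": (
        "from external import BarComponent\n"
        "def run(pl):\n"
        "    pl(BarComponent).frobnicate()\n"
    ),
    "external": "class BarComponent:\n    pass\n",
}

print(dependency_graph(sources))
# {'legacy_tool': [], 'new_tool': ['external'], 'external': []}
```

Only the new-style module produces an edge to “external”; the dependency of the legacy-style module is invisible to this kind of analysis.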
Along with the plugin loader, I defined (and documented!) a simple protocol which plugins must follow. It defines a common set of functions which plugins can implement, e.g. to be stopped at runtime or to get notified when other plugins are started or stopped. The latter allows for optional dependencies between modules. This protocol enables some nifty features: The plugin loader installs a sys.excepthook and stops interface plugins which throw unhandled exceptions. The exception and its traceback are displayed directly in the interface, as can be seen in the lower right of the screenshot.
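The protocol’s actual function names are not given, so the following loader is only a sketch under assumptions: the hook names plugin_started, plugin_stopped and stop are hypothetical, every hook is optional (the loader looks them up with getattr), and instead of displaying the exception in the interface, the sketch falls back to Python’s default traceback printing. The exception hook walks the traceback and stops the innermost loaded plugin whose method was executing when the exception occurred.

```python
import sys


class PluginLoader:
    """Sketch of a loader implementing a small plugin protocol with optional hooks."""

    def __init__(self):
        self._instances = {}  # plugin class -> running instance

    def __call__(self, plugin_type):
        if plugin_type not in self._instances:
            plugin = plugin_type()
            plugin.pl = self
            self._instances[plugin_type] = plugin
            self._notify("plugin_started", plugin)
        return self._instances[plugin_type]

    def stop(self, plugin):
        """Stop a running plugin and tell the remaining plugins about it."""
        if self._instances.get(type(plugin)) is not plugin:
            return  # not running (or already stopped)
        del self._instances[type(plugin)]
        hook = getattr(plugin, "stop", None)
        if callable(hook):
            hook()
        self._notify("plugin_stopped", plugin)

    def _notify(self, hook_name, subject):
        # Hooks are optional: plugins without the method are simply skipped.
        for other in list(self._instances.values()):
            if other is not subject:
                hook = getattr(other, hook_name, None)
                if callable(hook):
                    hook(subject)

    def excepthook(self, exc_type, exc_value, exc_tb):
        """Stop the innermost running plugin involved in an unhandled exception."""
        culprit = None
        tb = exc_tb
        while tb is not None:
            candidate = tb.tb_frame.f_locals.get("self")
            if candidate is not None and self._instances.get(type(candidate)) is candidate:
                culprit = candidate  # later frames lie deeper in the call stack
            tb = tb.tb_next
        if culprit is not None:
            self.stop(culprit)
        # A GUI would show the traceback in a widget; this sketch just prints it.
        sys.__excepthook__(exc_type, exc_value, exc_tb)
```

To activate it, one would assign sys.excepthook = pl.excepthook after creating the loader. Identifying the culprit through the “self” local of each frame is a heuristic: it misses exceptions raised in plain functions or in callbacks that do not run inside a plugin method.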
The PluginLoader has been designed with the migration path from wx to Qt in mind: It has no Qt dependency, and can also be used in the old Iterator. (Legacy installations might not have PyQt available, so Qt can only be an optional code path at this point.) In fact, some of the new application plugins, which are used for internal data management, are also used in the old IteratorApp now, although the data is additionally made available in the old attributes for compatibility.
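A common way to keep Qt an optional code path is to probe for a binding when the program starts and to fall back to pure-Python code when none can be imported. The sketch below does exactly that. The candidate list and the names detect_qt, QT_BINDING and HAVE_QT are my own assumptions, not taken from the Iterator code.

```python
import importlib


def detect_qt(candidates=("PyQt4", "PySide")):
    """Return the name of the first Qt binding whose QtCore can be imported, or None."""
    for name in candidates:
        try:
            importlib.import_module(name + ".QtCore")
        except ImportError:
            continue  # binding not installed; try the next one
        return name
    return None


QT_BINDING = detect_qt()
HAVE_QT = QT_BINDING is not None
```

The rest of the code can then branch on HAVE_QT, e.g. to choose between Qt’s signals and a pure-Python replacement.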
For communication between application plugins, Qt signals/slots have been chosen. As these are obviously not available in wx or Python itself, I’ve written a basic, sufficiently source-compatible implementation which is used as a drop-in replacement in non-Qt environments.
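The replacement implementation itself is not shown here, so the following is a minimal sketch of what such a drop-in could look like. It assumes that only the commonly used subset of PyQt’s new-style signals is needed: class-level declarations like valueChanged = Signal(int), plus connect, disconnect and emit on instances. The names BoundSignal and Counter are hypothetical. Queued delivery across threads, overloaded signals and slot decorators are not covered, and the declared types are only used for an argument count check.

```python
class BoundSignal:
    """Per-instance signal object: stores connected slots and calls them on emit()."""

    def __init__(self, types):
        self._types = types
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def disconnect(self, slot):
        self._slots.remove(slot)  # ValueError if the slot was never connected

    def emit(self, *args):
        if len(args) != len(self._types):
            raise TypeError("signal expects %d argument(s), got %d"
                            % (len(self._types), len(args)))
        for slot in list(self._slots):  # copy, so slots may disconnect themselves
            slot(*args)


class Signal:
    """Class-level declaration in the style of Qt: valueChanged = Signal(int)."""

    def __init__(self, *types):
        self._types = types
        self._attr = None

    def __set_name__(self, owner, name):
        self._attr = "_signal_" + name  # per-instance storage key

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not on an instance
        bound = instance.__dict__.get(self._attr)
        if bound is None:
            bound = BoundSignal(self._types)
            instance.__dict__[self._attr] = bound
        return bound


class Counter:
    """Hypothetical example class using the fallback signal."""

    valueChanged = Signal(int)

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        self.valueChanged.emit(self.value)


c = Counter()
c.valueChanged.connect(lambda v: print("value is now", v))
c.increment()  # prints: value is now 1
```

Because the descriptor creates the BoundSignal lazily per instance, each Counter gets its own list of slots, much like each QObject has its own connections.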
Here’s the source code for the plugin loader (minus some highly application-specific parts like plugin shutdown, which can easily be reconstructed by copying the startup code) and for the platform detection, including the optionally Qt-based signals and slots. However, I cannot upload text files or tar archives to this blog, and any external site to which I upload the archive might go away at any time, which I do not like. (I tend to get mad about 404 links.) I therefore include the bzip2-compressed tar archive as Base64. To get the archive, copy the following character block into a file, e.g. “foo.txt”, and then run “base64 -d foo.txt | tar xjf -”.