So I’ve been coding some Python lately, because this is the language of choice for the “Computational Physics” course I’m attending. As the name says, it is more about physical and numerical problems than about programming, and the instructors chose Python as it’s easy to get into (most Physics students had not coded up to now).
I was quite excited about that, as I have wanted to learn Python for about a year. The fact that the course has its own mailing list where students can ask questions to their fellows (and, of course, the teachers), allowed for some insight on how beginners in programming get into Python.
First off, Python is mostly good for beginners. “print” works on almost everything, and allows you to very easily inspect your program. You can load your work-in-progress code in IPython, and inspect the values of the computations. (I must admit though that I never used this feature, I’m just too used to inserting “print” statements everywhere and re-running the program.)
In the course of the course (pun intended), the only problem that many people ran into was copy-by-reference. When you’re working on some numerical problem, you’re dealing with number arrays most of the time (we were instructed to use NumPy, together with matplotlib for graphical output). And those are, unlike simple numbers, copied by reference. If you say “b = a”, where a is an array, and then modify the contents of b, you’ll also modify a, as b is just another name for a. The solution is “b = 1 * a”. The multiplication operation causes a deep copy of a to be created.
A big mistake which I (as a senior C++ programmer) did often in the first few weeks is the excessive usage of for-loops with numpy arrays. For example, the following two code pieces are equivalent:
a = numpy.ndarray((N, N))
for x in xrange(N):
for y in xrange(N):
a[x, y] = x ** 2 + y ** 2
# or
a = numpy.arange(N)[:,newaxis] ** 2 + numpy.arange(N)[newaxis,:] ** 2
On the first look, the first one looks more well-arranged than the second one (given we do not worry about the double parentheses), but the first one is sllllllloooooooowwwww. For N = 2000, the first one needs 4.16 seconds on my system, while the second one is processed in 0.03 seconds. The simple reason that the for-loops in the second codes are hidden deep in Numpy’s implementation, which is done mostly in C (and some Fortran). In an actual example, I had to compare the numerical precision of various discrete integration algorithms. The first version needed 16 seconds to calculate everything and start up, and after transforming all for-loops into Numpy operator expressions, the startup time was reduced to 2 seconds (most of this time was actually spent on loading all modules and initializing matplotlib).
But even this is just a problem of how accustomed you are to an API. I would not say that Python is inferior because it takes ages to execute for-loops in Python itself, rather than in underlying C libraries. Rather, Python is an elegant wrapper on top of many nice libraries, which helps to greatly reduce boilerplate code. Or, as I said to a fellow student, “every scripting language has its use case: Perl has extremely fast string operations, PHP is for website coding, and Python is for extremely rapidly escaping back to C code”.
Today I’ve been working on my final project. (We have to build a bigger application around some numerical or physical problem and present it to the rest of the class.) This has mostly been fun, as this was the first time where I could escape to well-known APIs through the magical key called PyQt4. While I’ve been working on this project, I finally found a way to criticize Python: It’s not optimized on escaping back to C++ code, because the OOP concepts in Python and C++ are, well, completely different things. In C++, classes are blueprints which can be used to create objects. Compare it to civil engineering: Using a construction plan, you put together various materials until the building has emerged.
In Python, a class is essentially a special type of object that works as a function. When this class function is called, it creates a copy of itself and calls the __init__ method of this copy. Here we see that these concepts do not quite fit: It’s like one would just copy the construction plan of a building, lay it out at the building site, connect an air pump to it, inflate it and hope that a building emerges.
There is nothing bad about the class concept in Python in general, the bad thing for me is that it does not match my perspective that I gained to OOP in the last ~10 years. And because it does not match the OOP model in C++, creating bindings seems to be more complicated. For example, I could not create a class that subclasses QObject and QGraphicsItem (a pattern which I regularly used e.g. in Kolf, to get a 2D representation together with signals/slots and properties), and I could not create a Borg-style class that derives from QTimer.
Apart from that, my criticism of Python sums up to missing explicit namespace declaration (why am I bound to file and even directory structure here?) and missing visibility modifiers for class members (“public”, “protected” and “private”) and, most prominently, that duck typing.
The last one is usually destined to be discussed most controversial, so I’ll just neutrally put down my two cents on this topic. If I can choose, I’ll only use strongly typed languages, as in: languages where each variable has a fixed type and unsafe conversions are not allowed, just because it eliminates a very big mass of problem sources. Actually, I would like a programming language which ensures that the most problems do never happen to me (i.e. a strongly typed language with garbage collection, bounds checking and stuff) _and_ runs fast, with very little overhead, and always with the possibility to bypass these checks if the overhead becomes too big. Sadly, I do not know of any language currently in existence that offers me such things.
I do not feel that I’m at a good point to end this blog post, so I’ll just show off some random bling-bling from my Python code at the end: The actual goal of the project is to implement simple pathfinding with genetic algorithms. That means that you start with some random paths, kill bad ones, and mutate and combine the good ones to produce possibly better paths (you know, “survival of the fittest”). I decided to make the thing a bit fancier by allowing obstacles to move around during the calculation, to see how the population reacts on them. This was the perfect moment to try something I’ve wanted to try since a long time: a fluent programming interface. These are programming interfaces where the statements read like sentences:
s = SimulationArea(500, 500)
s.addObject(WallObstacle(400, 0)
.moveTo(-200, 100).immediately()
.standStill().forSeconds(2)
.moveTo(200, 100).forSeconds(5)
)
s.addObject(WallObstacle(400, 0)
.moveTo(700, 200).immediately()
.standStill().forSeconds(2)
.moveTo(300, 200).forSeconds(5)
)
“Hey, wall, move to (700, 200) immediately, wait there for 2 seconds, then move to (300, 300) over 5 seconds.” You get the trick?
If someone’s interested in the full code (once it is finished), I can put it up somewhere (although it will be uninteresting for most of you).
Update: Mh, this blog post got much longer than I intended.