julia from python, part 1

Recently I've gotten frustrated with how slowly my Python-based dynamical modeling code was running. There may be a little nominative determinism at work here - slomo stands for SLOw rotator MOdels - but the abbreviation was originally supposed to be tongue-in-cheek. The main bottleneck in the code is a series of numeric integrations which get called at each likelihood evaluation. While the integration routines found in scipy are well-tested and reliable, if the input function is written in pure Python, the resulting integration is going to be slow. There are, of course, solution within Python-land. I could re-write the inner function in Cython (or even C or Fortran). I could also try waving some magic decorators at my Python functions and hope for the best. But with Julia finally reaching its 1.0 release, I decided it was worth checking it out.

I set up a few benchmark tests for the inner integration functions last month and I found I could get a decrease in run-time by a factor of anywhere from 10 to 50! That's the difference between running a model over my lunch break and running one over a night or two. Over the past few weeks I've been re-writing the code in Julia, and I've finally got a version that's close to feature-parity with the Python version. To boot, I think it's a lot cleaner looking and easier to understand, though this would have likely been the case anyway had I done a complete re-write in Python again.

In this this post and hopefully more posts to follow, I'll go over some of the difficulties (and joys!) I had in picking up Julia and how it compares with Python for the purpose of scientific computing.

Multiple dispatch vs. object-oriented programming

At the heart of the differences between Julia and Python is the decision of how a function call gets mapped (or "dispatched") to a particular method definition. In most object-oriented languages like Python, the first argument determines what method of a function is actually run. With Julia, the types of all arguments of a function call determine which method is run.

Two implementations of complex numbers

Let's look at an example of a simple implementation of complex numbers to get a sense of this difference.


We want to make a Complex class in Python with attributes a (the real part) and b (the imaginary part) that can be added or subtracted with other complex numbers.

class Complex:

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        return f"<Complex: {self.a} + {self.b}i>"

    def __add__(self, z):
        return Complex(self.a + z.a, self.b + z.b)

    def __sub__(self, z):
        return Complex(self.a - z.a, self.b - z.b)

Here we've used some magic methods (__add__ and __sub__) as specified in the Python docs for emulating numeric types. This lets us write z1 + z2 as a shortcut for z1.__add__(z2) when both z1 and z2 are instances of Complex.

This works okay for adding two complex numbers;

z1 = Complex(1, 2)
z2 = Complex(-1, 1)
z1 + z2

spits out <Complex: 0 + 3i> as expected. But what if we tried to add a float and a complex number? Well, it depends on the order in which we add things! If we try to add an instance of Complex to a float, we get an attribute error:

AttributeError                            Traceback (most recent call last)
<ipython-input-49-283f5fb7e931> in <module>
----> 1 z1 + 1.0

<ipython-input-37-a544e0189dbe> in __add__(self, z)
      9     def __add__(self, z):
---> 10         return Complex(self.a + z.a, self.b + z.b)
     12     def __sub__(self, z):

AttributeError: 'float' object has no attribute 'a'

If we instead try adding a float to a complex number, we get a type error:

TypeError                                 Traceback (most recent call last)
<ipython-input-50-f151bcb23a3f> in <module>
----> 1 1.0 + z1

TypeError: unsupported operand type(s) for +: 'float' and 'Complex'

We could go back into the addition method for Complex and deal with cases where z is a real number. But that would only work for the case where we add a complex number to a real number.


Julia's solution to this conundrum is to break away from the object-oriented paradigm and decouple methods from their first argument.

import Base: +, -

struct MyComplex <: Number 

function +(z1::MyComplex, z2::MyComplex)
    return MyComplex(z1.a + z2.a, z1.b + z2.b)

function -(z1::MyComplex, z2::MyComplex)
    return MyComplex(z1.a - z2.a, z1.b - z2.b)

Let's break this down for those coming into Julia fresh from Python. That first import line isn't necessary to let us use addition and subtraction on pre-defined numeric types - you don't have to import basic numeric operations. Rather the import lets us re-define addition and subtraction for newly defined types. The struct keyword lets us define a new immutable data type. The <: Number part says that the type MyComplex is a subtype of Number. The new data type has two fields (or attributes), a and b, both of which we specify to be instances of subtypes of Real. Then when defining methods for addition and subtraction, we specify the the types for which they apply, again using the double colon notation, z::MyComplex.

This ends up looking quite similar to the Python implementation:

z1 = MyComplex(1, 2)
z2 = MyComplex(-1, 1)
z1 + z2

returns MyComplex(0, 3). In fact, if we try to do the same real plus complex operation with this implementation in Julia, we still get an error. z1 + 1.0 results in

promotion of types MyComplex and Float64 failed to change any arguments

 [1] error(::String, ::String, ::String) at ./error.jl:42
 [2] sametype_error(::Tuple{MyComplex,Float64}) at ./promotion.jl:308
 [3] not_sametype(::Tuple{MyComplex,Float64}, ::Tuple{MyComplex,Float64}) at ./promotion.jl:302
 [4] promote at ./promotion.jl:285 [inlined]
 [5] +(::MyComplex, ::Float64) at ./promotion.jl:313
 [6] top-level scope at In[29]:1

However this message offers hope for a solution, namely in the judicious use of the promote function. For a detailed discussion of type conversion and promotion, see the official Julia docs. Conceptually, any operation that works generically with complex numbers also works with real numbers (which are just complex numbers with b = 0). Here we want to say that we want any real number is promotable to a complex number.

To do this we have to do two things. First we need to say how a real number can be converted to a complex number, and second we need to specify that any operations with only complex numbers will also work with any combination of real and complex numbers. To do this we'll need to implement methods for convert and promote_rule.

import Base: promote_rule, convert
convert(::Type{MyComplex}, x::Real) = MyComplex(x, 0)
promote_rule(::Type{T} where T <: Real, ::Type{MyComplex}) = MyComplex

The first line is, again, necessary for defining additional methods for newly defined data types. The second line says that any real number can be converted to a complex number with imaginary component equal to 0. The third line says that any type which is a subtype of Real can be promoted to a complex number to try to find an applicable method to which to dispatch.

Miraculously, this works!

z1 + 1 == MyComplex(2, 2)
42.0 - z1 == MyComplex(41.0, -2)

Both these Python and Julia implementations are missing lots of features built into the respective standard library implementations of complex numbers. But with just two additional lines of Julia code, we've managed to get our implementation of complex numbers to play nicely with real numbers. This has pleasant unforeseen consequences on anything that relies on basic numeric operations. For instance, sum([1, 2, z1, z2]) does what you might expect; it sums up the elements of the array, returning MyComplex(3, 3).

MD vs OOP wrap-up

To me, multiple dispatch is one of the shining features of Julia. It makes it really easy to express relationships between data types that you define and data types that others have defined. For example, the wonderful DifferentialEquations.jl package can solve differential equations for user-defined data types such Unitful quantities and dual numbers.

Coming from a background in object-oriented programming in Python, it's been a bit of a learning process to get comfortable with the multiple-dispatch model. But it's changed (for the better hopefully!) how I think about organizing code, clarifying the separation between data structures and the functions that operate on them.