Scientific Computing in Python

December 10, 2007

Recently there has been a lot of momentum behind Python, which seems to have a very active scientific computing community which places a lot of emphasis on efficiency. I spent much of the weekend reading about various Python packages and tools for numerical and mathematical work and my curiosity is piqued. I’ll try to summarize some of the things I’ve discovered.

My primary reason for considering a “scripting language” over a “compiled language” is that such languages tend to result in shorter programs, less error-prone code, and thus a decrease in development time. A second concern is the need for completely reproducible research. That is, there should be a way for another researcher to completely replicate what you have done from start to finish. It seems to me that this is most easily accomplished through open source tools and libraries which are usually widely-available for many different platforms.

SciPy

SciPy is the center of the scientific Python community. The two primary libraries are NumPy and SciPy. NumPy is the foundation, providing a multidimensional array object, linear algebra functions, and random number generators. SciPy is built around NumPy and contains modules for optimization, integration, interpolation, etc.

In this brief example from the SciPy Tutorial, you can see that at a basic level, the code is no more complicated than Matlab:

from numpy import matrix
from scipy.linalg import inv, det, eig

A = matrix([[1,1,1],[4,4,3],[7,8,5]]) # 3 lines 3 rows
b = matrix([1,2,1]).transpose()       # 3 lines 1 rows.

print det(A)     # Determinant of A
print inv(A)*b   # The solution of the linear system Ax=b.
print eig(A)     # Eigenvalues and eigenvectors of A

Of course, being a fully functional programming language, it has many much more powerful features.

On Ubuntu or Debian Linux, installing SciPy and NumPy is as simple as:

sudo apt-get install python2.4 python-scipy python-numpy

At first I had a few questions regarding the implementation and performance of the various SciPy algorithms, but the FAQ quickly answered them:

How can SciPy be fast if it is written in an interpreted language like Python?

Actually, the time-critical loops are usually implemented in C or Fortran. Much of SciPy is a thin layer of code on top of the scientific routines that are freely available at <www.netlib.org>. Netlib is a huge repository of incredibly valuable and robust scientific algorithms written in C and Fortran.

Source: SciPy FAQ

So, many of the algorithms are implemented as wrappers to existing libraries such as LAPACK and the GSL. Also, in practice it is very easy to include inline Fortran or C++ code using the Weave or f2py packages.

OpenOpt

OpenOpt is a free Python-based alternative to commercial frameworks like AMPL and GAMS. It provides a unified framework for solving many types of problems through the use of various solvers. Here is an example from the website which optimizes a simple nonlinear objective function f(x)=(x1) 2 starting at x 0=4 using the ralg solver:

from openopt import NLP
p = NLP(lambda x: (x-1)**2, 4)
r = p.solve('ralg')

You only need to choose a solver which is appropriate for your problem, based on the class of problems it belongs to (e.g., nonlinear minimization or linear least-squares problems) and the kind of constraints you need to impose. Some of the solvers, such as ralg, were written by the OpenOpt developers. You can also use solvers from SciPy, such as scipy.optimize.fmin_bfgs, or LAPACK, such as lapack_dgelss for linear least-squares problems. The latest OpenOpt news can be found in the development forum.

PyMC

PyMC is a Python module for conducting Bayesian estimation through Markov Chain Monte Carlo (MCMC) sampling. It implements the Metropolis Hastings algorithm as a Python class and provides routines to assist in generating plots and summary statistics. The author has written a very nice users guide which also provides a good amount of theory along with usage information.

The SAGE project

The SAGE project aims to be a sort of open source Matlab-Mathematica hybrid. Instead of rewriting every algorithm under the sun, it provides a unified interface to various open source programs such as Octave and Maxima alongside commercial offerings such as Matlab and Mathematica. It’s language is essentially Python code so it is much more versatile than your typical math environment. There are several options for interacting with SAGE, including a web based interface called the Sage Notebook, an interactive shell, or by generating compiled code via cython. There is also an Emacs mode.

Additional Resources