Thursday, July 24, 2008

Austria, ISSAC, and Hidden Markov Models

Yesterday, I gave a controversial plenary lecture on Sage at ISSAC 2008, the conference on symbolic and algebraic computation. It was well received by some proportion of the large audience of about 170 people, and will hopefully influence that research community to be more supportive of open source. In particular, I hope professors doing computer algebra research will allow their Ph.D. students to use open source software on research projects instead of forcing them to use Maple or Mathematica, as most of them currently do at RISC.

Many people asked me what I thought of the ISSAC conference. It was very similar to the biennial ANTS (Algorithmic Number Theory Symposium) meetings we number theorists have, but without number theorists. The meeting has a generally positive "vibe" and participants are enthusiastic about doing computation. My only criticism compared to ANTS is that the publication process for the proceedings isn't nearly as professional. The ISSAC publisher's website was in my opinion hell to use, working with the publisher to get my abstract in shape was no fun, and the final paper proceedings look like they were done at Kinko's. The ANTS proceedings, by contrast, are part of Springer-Verlag's Lecture Notes in Computer Science series, so they look very professional *and* are available online.

I also started looking at getting Hidden Markov Model functionality into Sage, since HMMs are very relevant to certain areas of machine learning, language processing, statistics, financial time series, etc., and Sage doesn't do much in that direction yet. I was prepared to write something from scratch myself in Cython, but quickly found GHMM.org, which is GPLv2+, actively used and developed, written in C with a Python interface, and with some work could possibly work very well for Sage.

I would certainly rather spend a solid week writing high-quality documentation and tests (and reporting bugs) than spend months learning, implementing, and optimizing algorithms, then a solid week writing high-quality documentation and tests, then more months building a community of developers to maintain said code.

GHMM as linked above is distributed only via svn, and it depends on libxml and SWIG. I've created an spkg that one can build into Sage and that doesn't depend on libxml; it does assume you have SWIG installed, and takes about 30 seconds to install from source. It's installed into the system-wide Sage on sage.math.washington.edu:

was@sage:~/patches$ sage
----------------------------------------------------------------------
| SAGE Version 3.0.5, Release Date: 2008-07-11 |
| Type notebook() for the GUI, and license() for information. |
----------------------------------------------------------------------

sage: import ghmm
sage: ghmm.[tab key]
ghmm.Alphabet ghmm.AminoAcids
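
For readers who haven't used an HMM package, the basic computation such a library provides is the likelihood of an observation sequence under a model. The sketch below shows the standard forward algorithm in plain Python. It does not use GHMM's API at all; the function name forward_likelihood and the two-state "fair coin versus biased coin" model are made up purely for illustration.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations) for a discrete HMM using the forward algorithm.

    initial[i]       -- probability of starting in state i
    transition[i][j] -- probability of moving from state i to state j
    emission[i][k]   -- probability that state i emits symbol k
    observations     -- list of symbol indices
    """
    if not observations:
        return 1.0  # the empty sequence has probability 1
    n_states = len(initial)
    # alpha[i] = P(symbols seen so far, current state is i)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical model: state 0 is a fair coin, state 1 favors symbol 0.
initial = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.1, 0.9]]
emission = [[0.5, 0.5],
            [0.9, 0.1]]

likelihood = forward_likelihood(initial, transition, emission, [0, 0, 1])
```

This version keeps only the current column of forward probabilities, so it uses memory proportional to the number of states. A production implementation would also rescale or work in log space to avoid underflow on long sequences, which is one of the details a mature library already handles.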



In a few hours Michael Abshoff and I are heading to Vienna to meet Harald Schilly, the new sagemath.org webmaster, whom I've never met.