A couple of exciting new projects

This week I’m happy to announce that I’m starting two new projects that I’ll be working on over the course of the next year. The first will be my master’s thesis: a program to help bioinformaticians analyze gene expression networks using high-performance computing clusters. The program will parse the data and suggest workflows based on that data. It will collect the best-of-breed algorithms from the literature, provide a clean interface for interacting with the data flow, and deploy to clients’ clusters in a portable, standard way. There will also be an accompanying book that explains the ins and outs of gene network inference. This project is going to be a blast!

The second project is a personal one. For a long time now, I’ve been unhappy with music-playing solutions. The closest I’ve gotten to a program that I love is Amarok; however, I use Ubuntu’s Gnome interface, and Amarok does not integrate cleanly with Gnome. Also, since I am rarely at home anymore, I want my music to travel with me. Thus, the obvious solution was to write a web application that my server would run. I’ve decided to start this project in Scala using the Lift framework with Comet actors. The client-side player will be Flash-based, embedded in the webpage or in a pop-up, and all of the application functions will be handled through the web interface. A similar program is Ampache, which is a wonderful program. However, its interface is not what I want, and its on-the-fly transcoding can be flaky. I require something that has a very fast interface, is playlist-oriented like Amarok (as opposed to library-oriented like iTunes), is web accessible, transcodes and streams on the fly, and supports large-scale databases. To my knowledge, nothing like that exists. It’ll use an Apache Derby database by default, but it will also support MySQL and PostgreSQL.

I’m very excited to start these projects this week, and I’ll be blogging about both of them as they happen (with, hopefully, some screenshots).

Of course, both of these projects will be free and open source under the GPL when they are released.

Posted in Projects

The Actor/Message Passing System of Concurrency

Lately, a lot of research has gone into how to make concurrency easier, both conceptually and programmatically. One of the models, popularized by Erlang (a language developed at Ericsson), is the actor/message-passing system. The idea behind this system is that you have a dispatcher creating tasks in response to some data. Information about each task is bundled up into an object, and each object is sent to an actor. The actors sit on top of a thread pool, waiting for tasks. When a task is sent to an actor, it goes into that actor’s mailbox. The next time the actor checks its mailbox, it retrieves the information in the message, grabs a thread from the thread pool, and begins performing the task. When it finishes, it sends the output either back to the dispatcher or to another actor, and it checks its mailbox again.
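The mailbox-plus-thread-pool idea described above can be sketched in a few lines on top of java.util.concurrent. This is only an illustration of the concept, not how Erlang or Scala’s actor library is actually implemented; the MiniActor class and everything in it are hypothetical names made up for this sketch.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, ExecutorService}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical minimal actor: a mailbox plus a handler, run on a shared thread pool.
final class MiniActor[A](pool: ExecutorService)(handler: A => Unit) {
  private val mailbox = new ConcurrentLinkedQueue[A]()
  // True while a pool thread is (or is about to be) draining this actor's mailbox.
  private val scheduled = new AtomicBoolean(false)

  // Asynchronous send, mirroring the `!` operator used by actor libraries.
  def !(msg: A): Unit = {
    mailbox.add(msg)
    trySchedule()
  }

  // Borrow a pool thread only if the actor is not already running.
  private def trySchedule(): Unit =
    if (scheduled.compareAndSet(false, true))
      pool.execute(new Runnable { def run(): Unit = drain() })

  // Handle messages one at a time, in arrival order, then give the thread back.
  private def drain(): Unit = {
    try {
      var next = Option(mailbox.poll())
      while (next.isDefined) {
        handler(next.get)
        next = Option(mailbox.poll())
      }
    } finally {
      scheduled.set(false)
      // A message may have arrived after the last poll; reschedule if so.
      if (!mailbox.isEmpty) trySchedule()
    }
  }
}
```

An actor built this way only occupies a thread while it has mail to process, which is what lets many actors share a small pool. Note that the pool should not be shut down while messages are still in flight.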

The above model is conceptually very simple as long as your tasks are easily broken up into repeatable chunks. For example, it would be very easy to write a log parser. One actor polls the log, and when a new line comes in, it sends the line to one of the parser actors. The parser actor figures out what to do with it, such as printing it to a different file if it is a particularly important event, and then returns to checking its mailbox. This works wonderfully where, say, the parser must go to the web or to disk to fetch some data to complete the task. The parser gives up its thread while waiting for the network or disk to return the data, and another parser gladly picks the thread up and begins executing. When the waiting parser is ready to complete its task, it gets back in line for a thread in the thread pool. Eventually, the scheduler will assign it a thread, and it can finish its task.

This works well for systems where you have many similar tasks to perform concurrently, such as running the same program with different parameters many times. It also works well for networked solutions, as the messages are easily encapsulated and sent over a network to a waiting receiving dispatcher.

A modern implementation that has been gaining a lot of traction lately is the actor library in Scala. Scala is a language that blends functional and object-oriented programming, runs on the JVM, and has a concise syntax reminiscent of Python or Ruby. It embraces the actor/message-passing system and makes it trivial to write code that utilizes concurrency with this model. As an example, I’ve written a simple IRC bot that uses this model. Polling the socket is done by a function running on one thread. The entire body of the polling function, once we’re connected to the socket, is:

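val responder = new IRCResponder(connect, ircBotNick, ircBotDescription, homeChannel)
responder.start()

// readLine() returns null once the connection closes, so stop there instead of spinning.
var line = in.readLine()
while (line != null) {
  // regionMatches is safe on lines shorter than four characters, unlike substring(0, 4).
  if (line.regionMatches(true, 0, "ping", 0, 4)) {
    // Answer the server's keep-alive with the same payload; drop(5) is safe on short lines.
    val pongmsg = "pong " + line.drop(5)
    responder ! IRC_Response(pongmsg)
  }
  else {
    responder ! IRC_Message(line)
    println("SENT TO RESPONDER")
    println(line)
  }
  line = in.readLine()
}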

Note the lines with the exclamation point. Those are the lines that send the IRC_Message and IRC_Response objects to the responder actor. Really easy, right? responder.start() tells the responder object we just created to start running as an actor. When objects are sent to the responder, different actions are taken depending on what type of object was sent.
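To make the two message types concrete, here is a rough sketch of what they might look like, along with the ping check pulled out into a function that can be tested without a socket. The field names are inferred from how the messages are read elsewhere in the bot (message.message and response.response); the IrcLines object, the IrcEvent trait, and the classify helper are hypothetical and exist only for this illustration.

```scala
object IrcLines {
  // Assumed common parent so one function can return either message type.
  sealed trait IrcEvent
  // Assumed shapes of the two messages sent to the responder.
  final case class IRC_Message(message: String) extends IrcEvent
  final case class IRC_Response(response: String) extends IrcEvent

  // Hypothetical helper: decide what the polling loop should send for a given line.
  def classify(line: String): IrcEvent =
    // Case-insensitive prefix check that returns false for lines shorter than four characters.
    if (line.regionMatches(true, 0, "ping", 0, 4))
      // Echo back whatever followed "PING "; drop never throws on short strings.
      IRC_Response("pong " + line.drop(5))
    else
      IRC_Message(line)
}
```

With something like this in place, the polling loop would shrink to sending IrcLines.classify(line) to the responder.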

  def act() {
    // This thread will throttle the parsing threads by only allowing one thing to write at once.
    // So when data is sent back to here, write it out in the order that it came in.
    loop {
      react {
        case message: IRC_Message =>
          // Figure out which parser to send this message to and send it
          listOfParsers(findParser()) ! message

        case response: IRC_Response =>
          // Write the response to the socket
          sendData(response.response)
      }
    }
  }

The above code is what the Responder object uses to figure out how to handle the object sent to it. If an IRC_Message is sent to it, it finds a parser (findParser returns the parser with the smallest mailbox) and sends the message to that parser for processing. If it is an IRC_Response, then we are getting a response back from a parser, so it writes the response to the socket. The Parser’s act function looks a lot like this, except that the only case is for IRC_Messages, and then it probes the string in that message to see how it should respond.
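The body of findParser is not shown here, but the "smallest mailbox" rule it follows could be sketched roughly like this. The ParserSelection object and leastLoaded function are hypothetical names, and the sketch assumes the caller has already collected each parser’s current mailbox size into a sequence.

```scala
object ParserSelection {
  // Hypothetical stand-in for findParser(): given each parser's current mailbox size,
  // return the index of the least-loaded parser. Ties go to the lowest index, because
  // (size, index) pairs are compared by size first and index second.
  def leastLoaded(mailboxSizes: Seq[Int]): Int = {
    require(mailboxSizes.nonEmpty, "need at least one parser")
    mailboxSizes.zipWithIndex.min._2
  }
}
```

Mailbox sizes can change between the moment they are read and the moment the message is sent, so this is a load-balancing heuristic rather than a guarantee.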

The code is similar for the parser:

  def act() {
    loop {
      react {
        case message: IRC_Message =>
          val line = message.message.toLowerCase

          // So we've received a message from the responder, figure out how to parse it.
          if ( line contains "define" ) {
            sender ! IRC_Response(toChat(getDef(line)))
          }
          else if ( line contains "find" ) {
            sender ! IRC_Response(toChat(getTorr(line)))
          }
          else if ( line contains "roll the dice" ) {
            sender ! IRC_Response(toChat("4."))
          }
      }
    }
  }

Using this method, the same bot could easily handle several chat rooms at once, including requests to grab data from the internet (such as torrents, Wikipedia summaries, definitions, etc.) using screen scraping or APIs. It would also be trivial to write a simple web server using this model: just treat each HTTP request as a separate task to be completed by an actor. In fact, using a very simple web server with a file cache, I was able to easily handle over 3,000 connections per second with under 150 lines of code. Try it out; I think you might end up enjoying it. It is a very fun model to program with.

Posted in Coding Tips

First month of Vim

I’ve recently made the upgrade from the wonderful Mac text editor TextMate to Vim (in the form of MacVim). What initially led me away was a hope to be able to run SPSS syntax from within Vim, which would allow me to avoid SPSS 16’s abysmal syntax editor. While this turned out not to work quite as easily as I had hoped, and is still a bit broken (a post for another time), I learned quite a bit about Vim’s scripting language. Seeing how easily extensible Vim was, I decided to make the switch full time. I downloaded a copy of MacVim and started creating my .vimrc.

The first thing I did was create an easy way to update my Git repositories when I was finished editing them.  This turned out to be ridiculously simple. Just put this in your .vimrc:

command -nargs=+ Commit :!git commit -a -m "<args>"; git push

This creates a new command called “Commit” in Vim’s command-line mode. So if you hit “:” to enter command-line mode in Vim, and type:

Commit Version 0.96 Updated XXX to do XXX, fixed bugs in XXX

and hit return, it will commit the changes to the repository and then push the code for you. This makes it effortless to keep your repositories up to date from within Vim.

I’m still retraining myself to navigate around the document using the shortcut keys. The shortcut keys, when I am in the mood to actually use them instead of arrowing around, speed up my code development greatly. Also, being able to run terminal commands from within Vim has provided me with countless new ways of interacting with my code. I’d highly recommend everyone spend a month or so using Vim. The learning curve is a bit harsh at first, but the payoff is fantastic.

Here is what my MacVim looks like: [screenshot]

More Vim tips to come, along with some zsh tips and the SPSS/Vim bridge.

Posted in Coding Tips

First Post

I’ve decided to give the website a revamp graphically, finally update my CV, and begin adding details on my research.  This website is primarily here in order for me to share handy programming tips and tricks, espouse my viewpoints on various things, and generally engage in narcissism.

Stay tuned; I’ll try to actually update this thing this time.

Posted in Blogging