
GTC 2010 Trip Report

I spent the week of September 20th through 23rd at NVIDIA’s GPU Technology Conference, and I’m now reporting back on some of the events that particularly interested me. I went to a whole range of talks, varying wildly in subject and quality, so I will gracefully ignore some of them and focus on the events that relate to my research or that particularly struck my fancy.

As is usually the case with these conferences, I find that the most valuable experience is meeting the community of researchers from around the world. Talks, papers and the like can be read on your own, but the conversations and networking at these conferences make the trip absolutely worthwhile. GTC was no different, and I connected with several people that I aim to stay in touch with as research colleagues. This also seems to be an apt place to start my trip report! So, I met the following people during the week:

  • Tobias Brandvik (Cambridge), doing stencil abstractions for fluid flow solvers on multi-GPU clusters
  • Bryan Catanzaro (UC Berkeley), building Copperhead, a data-parallel Python derivative that completely abstracts away the GPU.
  • Vasily Volkov (UC Berkeley), working on low-level performance tuning of GPUs and breaking down the CUDA abstractions.
  • Jason Cohen (NVIDIA), part of the Parallel Nsight debugger team (Rick Shane from Adobe introduced us)
  • Nathan Bell (NVIDIA), the other half of the Thrust developer team (Jared Hoberock introduced us)

I will now attempt to distill the most interesting talks I attended to their core observation, and any notes I found especially interesting.

Opening Keynote

Jen-Hsun Huang, NVIDIA CEO

Jen-Hsun was very much pushing the importance of parallel programming, reiterating the arguments about the power wall the industry has hit, and stressing that NVIDIA has been building highly parallel hardware for years now. The Senior Vice President of Content and Technology, Tony Tamasi, showed off several demos of Fermi’s tessellation capability (an endless city, procedurally generated, and tessellated on the GPU to give the equivalent of a 100 billion triangle scene). He moved on to the physics simulation capabilities of these GPUs by showing a re-imagination of Ron Fedkiw’s lighthouse scene running in real time. A multi-grid height field combined with particles gives real-time water, while flotsam is simulated as rigid bodies. All three simulations are coupled, and run in real time. Although it still looks simulated, it’s definitely ahead of today’s games.

The big statistic here for developers is the rate of CUDA adoption. NVIDIA very much pushes the idea that they have 100 million GPUs out in the field, all of which can run CUDA programs. The reality of the situation is, naturally, not nearly this good, but it’s a nice statistic to have. The Folding@Home and SETI@Home people are reporting statistics massively skewed towards people running massively parallel processors, so there’s surely some truth to these numbers.

NVIDIA announced CUDA-x86, a new compiler from PGI that compiles CUDA code to x86 code, allowing developers to write programs that run on multicore processors or throughput-based GPUs. In my mind this is just a nice-to-have, since none of the serious optimizations you do for the GPU (think coalesced memory accesses, specific thread groupings to exploit vector lanes and vector lane synchronization) will carry over to x86, and they might even hurt performance (cache misses being caused by GPU-focused optimizations). Still, the write-once-run-anywhere dream is clearly very important, which is great for the research I’m working on.

Several other impressive demos were also shown: Dr. Black’s beating-heart surgery that tracks a heart in real time to make incisions with a robotic scalpel, Adobe’s David Salesin showing off refocusing by using plenoptic lenses (originally done by Ren Ng from Stanford) and the iRay photorealistic raytracer running on 64 Fermis, rendering to your web browser at interactive rates. Clearly graphics has lots of evolution left as it enters the world of massively distributed computing.

Lastly, NVIDIA announced the codenames of their next two chips, Kepler and Maxwell, which will aim for 3 times and 10 times the performance per watt of today’s Fermis.

A Fast, Scalable High-Order Unstructured Compressible Flow Solver

David Williams & Patrice Castonguay (Stanford)

I was curious to find out how this group built their flow solver to run on a GPU cluster. This is an example of what we’d like Liszt (our research language) to be able to do, so seeing a hand-written version was profitable. They followed the same MPI ideas as are generally used: partition your unstructured mesh and create ghost cells for the data you want to share across partition boundaries, placing a partition on each machine. They implemented their algorithm using a gather approach: the GPU would perform two stages of work, a first stage to calculate cell-based values, and a second stage to reduce these values to edge-based values. The synchronization between these two stages would include the MPI all-to-all step to resolve ghost cell values.

Since they wrote a specific instance of a RANS algorithm, they did not do any scheduling work or fine-grain synchronization; their two-stage gather was enough to run the algorithm. They were getting good linear speedups on their cluster, and managed to achieve a sustained 1.3 teraflops on a cluster of 16 GPUs using a mesh of 320 000 cells.
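To make the two-stage gather concrete, here is a minimal single-machine sketch in Python. The tiny 1D mesh, the squaring in the cell stage and the averaging in the edge stage are all made up for illustration; they are not the Stanford group's actual kernels, only the shape of the computation.

```python
# Hypothetical 1D "mesh": 4 cells in a row, 3 interior edges.
# edge_to_cells[e] lists the two cells that share edge e.
edge_to_cells = [(0, 1), (1, 2), (2, 3)]
cell_state = [1.0, 2.0, 4.0, 8.0]

def cell_stage(state):
    """Stage 1: compute one value per cell (here, arbitrarily, the square)."""
    return [s * s for s in state]

def edge_stage(cell_values, edges):
    """Stage 2: every edge gathers from its two neighbouring cells."""
    return [0.5 * (cell_values[a] + cell_values[b]) for a, b in edges]

cell_values = cell_stage(cell_state)
# In a cluster version, the ghost-cell exchange over MPI would sit here,
# between the two stages.
edge_values = edge_stage(cell_values, edge_to_cells)
print(edge_values)
```

Because each edge only reads cell values and writes its own slot, no two edges ever write to the same location, which is why the gather formulation needs no fine-grain synchronization.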

New Programming Tools for GPU Computing

Sequoia, Copperhead, GMAC, Thrust

Unfortunately, panel discussions with 3 minute introductions for each project are never enough to really understand any of the projects. The most striking part of this panel was the obvious programming language direction researchers have taken. Except for Thrust (although it can be considered an embedded domain specific language) all the work has a programming language spin on it. The major concern of the audience was clearly the support issues and feature-scarcity of new programming languages, which the different projects addressed differently: Sequoia tries to be a runtime more than a full language, Copperhead attempts to be deeply coupled to Python, Thrust passes itself off as a library and GMAC aims to be language-agnostic, creating a universal address space between accelerators (GPUs) and processors that any language can take advantage of.

PyCUDA (2041)

Andreas Klöckner

Andreas’ PyCUDA talk was mostly an introduction to PyCUDA and a brief overview of how it works and the motivation behind it. I found this talk especially interesting, since he took an approach very similar to the way web frameworks use templates to generate web pages. Kernels in PyCUDA are strings of text, with embedded Python variables that are replaced when you ask his framework to compile the kernel. He built this JITting engine as an extension of Python, allowing you to write kernels at runtime and pass them off to the nvcc compiler to generate GPU code. I liked the fairly low level control he allows you to achieve inside of Python, but PyCUDA does not attempt to abstract away CUDA or the GPU. It is, rather, very similar in spirit to the Boost Python bindings: you build software in Python, rewrite the slow parts in C (or CUDA), and call these low-level functions directly from Python. PyCUDA has the added benefit that you do not even need to leave the Python interpreter. His whole approach was fascinating, especially since this is what I would have done were I faced with a similar problem, given my web framework experience. Andreas likens this to the LISP-style metaprogramming that’s been around since the 60s: manipulating string kernels, “pasting” in values on the fly.

PyCUDA in general is built to interface tightly with numpy and scipy, two Python packages that supply MATLAB-like functionality to Python users. PyCUDA does not attempt to address the type inference issue of moving from a dynamically typed language to a statically typed one, since it depends on the user to write kernels with the correct types, and on numpy to supply runtime types of the multi-dimensional arrays that PyCUDA works with. Copperhead, Bryan Catanzaro’s data-parallel version of Python, abstracts away the GPU entirely, so it has to deal with type inference, and he built a Hindley-Milner style type inference system into Python to handle this. Copperhead is built on top of PyCUDA, so he uses the JITting capabilities of PyCUDA to get to the GPU. That is a great decision in my mind, since someone else is now responsible for the low level details of catching errors and generating kernels.
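The "kernel as a string template" idea can be sketched with nothing but the Python standard library. The kernel, the template variables (block_size, n, factor) and the render_kernel helper below are my own invention for illustration, and I use string.Template rather than whatever templating engine you might prefer; with PyCUDA installed, the rendered string is the kind of thing you would hand to pycuda.compiler.SourceModule.

```python
from string import Template

# CUDA source with $placeholders that get "pasted in" at runtime.
kernel_template = Template("""
__global__ void scale(float *data)
{
    int i = blockIdx.x * $block_size + threadIdx.x;
    if (i < $n)
        data[i] *= ${factor}f;
}
""")

def render_kernel(block_size, n, factor):
    """Fill in the template; the result is plain CUDA C source text."""
    return kernel_template.substitute(block_size=block_size, n=n, factor=factor)

source = render_kernel(block_size=256, n=1000, factor=2.0)
print(source)
```

The nice side effect is that values such as the block size become compile-time constants in the generated kernel, so the compiler can optimize around them.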

Better Performance at Lower Occupancy (2238, Wed 15:00)

Vasily Volkov

(Slides here) Vasily has published several papers on understanding GPU hardware and tuning codes for the GPU, and his talk questioned the focus on massive multi-threading in GPU apps, showing that Instruction Level Parallelism is still a very important approach for the GPU. In the process of demonstrating this, he disproved several of NVIDIA’s claims in their Programming Guide. This talk was very interesting to me, since it addressed many of the low level architectural questions that Kayvon, Solomon, Zach, Jorge and I have discussed in detail. The word “occupancy” in this talk refers to the percentage of threads spawned out of the total supported number on the multiprocessor.

The general recommendation for hiding latencies is to use more threads per block and more threads per multiprocessor. Vasily demonstrated that faster codes tend to run at lower occupancies, citing as examples the differences between CUBLAS and CUFFT versions: every performance improvement came with a lowering of threads per block. Vasily showed in the talk how to hide arithmetic latency and memory latency using fewer threads, and get a total performance increase with fewer threads. He also set out to debunk the fallacy that shared memory is as fast as the register file, addressing the bandwidth differences between the two.

The heart of his talk is the fact that Streaming Multiprocessors are still pipelined machines, regardless of the multi-threaded wide-vector-lane nature of these processors. By writing your code as sets of independent operations, keeping data dependencies to a minimum and structuring code to keep the pipeline filled, you can get massive performance regardless of the machine’s occupancy. He showed the roofline model for a simple SAXPY code, and how he can influence the memory-bound part of the model by doing multiple independent SAXPY operations in a single thread (since all but one of the input values are the same in a SAXPY operation). He went on to show that he can get 87% of the peak bandwidth available to an SMP at only 8% occupancy (while cudaMemcpy achieves only 71% of peak). Lastly he made the point that the banked nature of shared memory makes it impossible for shared memory codes to achieve full bandwidth utilization. This leads to the recommendation to use as few threads as possible, each using as many registers as possible.
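The pipelining argument boils down to Little's law: the number of operations that must be in flight equals latency times throughput, and those operations can come from many threads or from several independent instructions inside each thread. Here is a back-of-the-envelope sketch; the latency and throughput figures are placeholder numbers I picked for illustration, not measurements from the talk.

```python
import math

def threads_needed(latency_cycles, ops_per_cycle, independent_ops_per_thread):
    """Little's law: operations in flight = latency * throughput.
    Each thread contributes as many in-flight operations as it has
    independent instructions (its instruction-level parallelism)."""
    in_flight = latency_cycles * ops_per_cycle
    return math.ceil(in_flight / independent_ops_per_thread)

# Hypothetical pipeline: 24-cycle arithmetic latency, 32 ops issued per cycle.
LATENCY = 24
THROUGHPUT = 32

for ilp in (1, 2, 4):
    print(ilp, threads_needed(LATENCY, THROUGHPUT, ilp))
```

With these made-up numbers, quadrupling the independent work per thread cuts the thread count needed to saturate the pipeline by a factor of four, which is the sense in which lower occupancy can still mean full performance.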

The attention to detail in this talk, as Vasily played with the limits of the GPU, allowed him to break down several of the “nice” abstractions CUDA provides.

Large-Scale Gas Turbine Simulations on GPU Clusters (2118, Wed 16:00)

Tobias Brandvik

Tobias and his group at Cambridge have addressed a problem very similar to the one we are tackling with Liszt. They want to write a mesh-based simulation system that runs on today’s high performance machines, without having to rewrite the code for each architecture. Specifically, they are building a production-quality solver for use in the aerospace industry. Their approach has many overlaps with Liszt, targeting GPU clusters while avoiding the need to rewrite all their code for every variation of today’s heterogeneous machines. In contrast to our approach, they work at a fairly low level, since they only attempt to rewrite the approximately 10% of their code base that is stencil-based calculations. A stencil is defined as the mesh accesses and data reads/writes a kernel will perform, which allows them to generate specialized CUDA source code for each stencil in their application. This 10% of the code is roughly responsible for 90% of the run time, and can be abstracted as math kernels running specific stencils across a mesh. The cluster aspect of their code is still hand-written MPI code, but rather than write GPU or SMP specific codes that run on an MPI node, they use these stencils and math kernels.

In terms of domain-specific optimizations, he referred to the 2008 Supercomputing paper by Datta et al. that showed a set of optimizations to run stencil codes at high performance on GPU devices. They attempted to implement these optimizations as part of their source-to-source compilation process for kernels.

Their approach requires the programmer to hand-write the stencil and the math kernel. This approach allowed them to embed this stencil language partially inside Fortran. They then took their current simulation system (TBLOCK, approximately 40kloc in Fortran) and factored out the stencil-based calculations into separate stencil definitions and math kernels. This allowed them to keep most of their current simulation code while spending “a couple of months” (Tobias’ words) on rewriting calculations that fit this stencil scheme into their embedded language with accompanying stencil definition files. Their system, called TurboStream, has on the order of 15 different stencils in it, with 3000 different stencil definitions, and they run simulations on a 64-GPU cluster at the University of Cambridge.

Tobias made an interesting comment during the talk, saying that their biggest concern is not pure speed, since their solvers are relatively simple, but that they want to be able to handle much more data than they currently do. This was their biggest motivation for moving to GPU clusters. By way of example, he showed the fine detail of turbofan fins spinning through slits cut in the housing of a jet engine, and the fine-grain simulation details around these slits, geometry that was previously ignored.
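The split between a stencil (which neighbours get read) and a math kernel (what is computed from them) can be sketched in a few lines of Python. This is only my illustration of the separation on a 1D grid; the names three_point, laplacian and apply_stencil are hypothetical and have nothing to do with TurboStream's actual stencil language or its generated CUDA.

```python
# The stencil: relative offsets the kernel reads around each grid point.
three_point = (-1, 0, 1)

def laplacian(left, centre, right):
    """The math kernel: knows nothing about the mesh, only its inputs."""
    return left - 2.0 * centre + right

def apply_stencil(stencil, kernel, field):
    """Generic driver: run `kernel` over every interior point of `field`.
    Boundary points that the stencil cannot reach are copied unchanged."""
    reach = max(abs(offset) for offset in stencil)
    out = list(field)
    for i in range(reach, len(field) - reach):
        out[i] = kernel(*(field[i + offset] for offset in stencil))
    return out

field = [0.0, 1.0, 4.0, 9.0, 16.0]  # x squared sampled at x = 0..4
result = apply_stencil(three_point, laplacian, field)
print(result)
```

Since the driver is the only part that touches memory layout, it is the only part that has to be regenerated for a new architecture; the math kernel stays as written.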

Overall Impressions

The biggest gain of the conference was the networking with several other researchers, and getting an overall view of the field as several groups attempt to solve the same problem: how do we write codes that can run on all of today’s hardware choices?

I find myself using OpenTerminal a lot, mostly to open a terminal in a directory, followed by “mate .” to open a TextMate project in this directory. This quickly becomes annoying, so after looking into AppleScript, I took the plunge and wrote my first AppleScript. What a weird language. Anyway, you can dump this into AppleScript Editor, and when you run it, it opens a TextMate project of the front-most Finder window:

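on run
    tell application "Finder"
        try
            -- HFS path of the folder shown in the front-most Finder window
            set frontWin to folder of front window as string
            -- Convert it to a POSIX path that TextMate can open
            set frontWinPath to (get POSIX path of frontWin)
            tell application "TextMate"
                open frontWinPath
            end tell
        on error error_message
            -- For example, when no Finder window is open
            display dialog error_message buttons {"OK"} default button 1
        end try
    end tell
end run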

Save this as an Application (not a Script), and drag it onto your Finder toolbar. Voila! TextMate at your fingertips.

Thank you, Mac OS X Hints, from where I got the pattern to do this.

Comparing floating point numbers for equality.

Everyone who’s taken an architecture course (or messed around with scientific computing) knows that floating point arithmetic is not associative. That means mathematically:

(a+b)+c \neq a+(b+c)

Or, in layman’s terms:

The order of operations influences the result of the calculation

This implies that floating point calculations that mathematically give the same answer do not necessarily produce exactly the same floating point number. So, when comparing two floating point results, using a == b will not reliably give the correct result.
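A quick way to convince yourself is to regroup a three-term sum. This little Python check uses ordinary double-precision floats; the particular values 0.1, 0.2 and 0.3 are just a convenient example.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

print(left == right)      # the two groupings disagree
print(abs(left - right))  # ...but only in the last bit or so
```

The two results are mathematically identical and differ only by a rounding error, which is exactly the situation where a plain == comparison lets you down.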

You can attempt to remedy this by using the mathematical approach of allowing an absolute error metric:

(a-b)^2 < E

which does not account for the fact that floating point numbers are unequally distributed over the real number line. We can attempt to use a relative error metric:

\frac{|a-b|}{|b|} < E

but this does not take into account the difference between very small positive and negative numbers (including positive and negative zero, since floats have both).

So, from the very enlightening “comparing floats” article by Bruce Dawson, we try something quite different.

Floats can be lexicographically ordered if you consider their bit patterns to be sign-magnitude integers. We can exploit this fact to calculate exactly how many representable floating point numbers there are between two floats. So, for example, we can find that there is only one floating point number between 9,999.99999 and 10,000.00001 and use an error metric that states “I will consider floats to be equal if they are within E representable floats of each other.”

The details of this routine are in the comparing floats article, but I will mirror the code here:

// Usable AlmostEqual function
#include <assert.h>   // assert
#include <stdbool.h>  // bool (only needed when compiling as C)
#include <stdlib.h>   // abs

bool AlmostEqual2sComplement(float A, float B, int maxUlps)
{
    // Make sure maxUlps is non-negative and small enough that the
    // default NAN won't compare as equal to anything.
    assert(maxUlps > 0 && maxUlps < 4 * 1024 * 1024);
    // Reinterpret the float's bits as an int (assumes 32-bit int and float).
    int aInt = *(int*)&A;
    // Make aInt lexicographically ordered as a twos-complement int
    if (aInt < 0)
        aInt = 0x80000000 - aInt;
    // Make bInt lexicographically ordered as a twos-complement int
    int bInt = *(int*)&B;
    if (bInt < 0)
        bInt = 0x80000000 - bInt;
    // Number of representable floats between A and B
    int intDiff = abs(aInt - bInt);
    if (intDiff <= maxUlps)
        return true;
    return false;
}

This has saved me a huge number of headaches comparing CUDA and CPU generated results for our CUDA programming class, CS193G.

Compiling the pbrt 1.04 raytracer on Mac OS X 10.6
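If you want to play with the same idea without a C compiler, here is my own rough Python translation for single-precision floats. It is a sketch, not Bruce Dawson's code: the function names are mine, it uses struct to get at the bits, and it makes no attempt to treat NaN or infinity specially.

```python
import struct

def float_to_ordered_int(x):
    """Reinterpret x (rounded to float32) as a signed 32-bit int, then remap
    negative floats so that integer order matches float order."""
    (i,) = struct.unpack("<i", struct.pack("<f", x))
    if i < 0:
        # Sign-magnitude to ordered: drop the sign bit, negate the magnitude.
        i = -(i & 0x7FFFFFFF)
    return i

def ulp_distance(a, b):
    """Number of representable float32 steps between a and b."""
    return abs(float_to_ordered_int(a) - float_to_ordered_int(b))

def almost_equal(a, b, max_ulps=4):
    return ulp_distance(a, b) <= max_ulps

def float_from_bits(bits):
    """Helper for the demo: build a float32 from a raw bit pattern."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    return x

next_after_one = float_from_bits(0x3F800001)  # the float32 just above 1.0
print(ulp_distance(1.0, next_after_one))
print(ulp_distance(0.0, -0.0))
print(almost_equal(1.0, 1.001))
```

Note that positive and negative zero come out zero steps apart, which is one of the cases the relative error metric handles badly.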

I’m taking Prof. Pat Hanrahan’s CS348B “Advanced Rendering” course this quarter, and we’re extending the pbrt renderer as part of the course assignments. It’s probably worth documenting how I compiled this on my Snow Leopard machine.

After downloading and extracting pbrt 1.04 from the pbrt downloads page I had to install OpenEXR using MacPorts:

sudo port install OpenEXR

MacPorts installs libraries like this one in /opt/local/ to prevent conflicts with libraries from other sources (it has a handy pkgconfig directory for each library in /opt/local/var/macports/software/.../lib/ that is full of info). We need to update pbrt’s makefile to point here. We modify lines 13 and 14 in the Makefile to read:


You should now be able to run make in the directory and produce pbrt. Remember, you need Xcode installed!

Now you need to set the PBRT_SEARCHPATH environment variable. I did this the easy way: I cd’d to the pbrt bin directory and ran:

export PBRT_SEARCHPATH="`pwd`"

Installing Numpy and SciPy on Snow Leopard – the easy way

Just a quick note on the easiest way I’ve found to install NumPy and SciPy on Snow Leopard. It can be quite a pain, since you have to leave the Apple python install alone, even though it includes NumPy by default. It does not support SciPy, and anyway it’s not the latest.

So, don’t try to build from SVN and all that fancy stuff. Just do:

1) Download the latest Python from
2) Download the numpy dmg from sourceforge
3) Download the scipy dmg from sourceforge

Install in that order and move on.

RexPC: Running EEEbuntu on eeePc 701 as a Carputer in my WRX

My latest big project is installing my eeePC 701 into my Subaru WRX. This is going to be a long project with several stages, and I’ll talk more later on the actual requirements. For now I first want to see if the eeePC 701 is actually powerful enough to make it worth installing into my car. Its low power requirements (22 watts) and tiny size make it such a perfect fit that I want to make it work.

The eeePc I am planning to build into my wrx

The Xandros packaged with the 701 was more of a joke than anything else. It is fantastic for the very inexperienced user who will only ever use Skype, Pidgin and Firefox. I need something more powerful and less buggy, so I’m going with EEEbuntu 3.0 base, since this is the most stripped down “full-featured” ready-to-go OS for the 701 I could find.

After downloading, burning and installing (luckily I have an external CDROM) I went through the following steps:

  1. Run Update Manager and install all the updates (it’s based on Ubuntu 9.04)
  2. Install GPSD, gpsdrive, tangogps, python-gps through Synaptic
  3. Plug in GPS (gpsd launches automatically), and go for a test drive

I drove around the neighborhood with the GPS pushed into my sunroof (smart huh?) and the eeePC on the passenger seat. I was very impressed with tangogps and logged my trip around campus with it. At first glance everything worked seamlessly with no configuration needed!

A quick test trip logged using tangogps

I’ll be looking into touchscreens to install in place of the regular cd/radio/tape head that is currently in there, and keep writing about my progress.

RexPC: Planning my WRX’s built-in computer

What if... WRX + eeePC

I’m planning to build a computer into my 2004 Subaru Impreza WRX. And not just any computer, but hopefully my small eeePC 701 – the original netbook that started the revolution.

So, as with any big project, this one starts with a list of dreams I wish I had, and some research online into what other people are doing. Actually, any big project starts with a glass of Zinfandel (check) and brainstorming for a name. In this case, it was easy. My WRX is called Rex. And it’s getting a computer. So, RexPC. Onwards, then!

I should quickly address those of you thinking “Why is this guy not upgrading his exhaust to a full catback, or getting new rims and high performance tires, or upgrading his intercooler and air intake, or rechipping the engine ECU, or, or, or… why is he installing a *computer*??” (TJ I know what’s going through your mind). Simple, really, the computer is the cheapest mod I can do at this point, since the eeepc is just collecting dust on my shelf at the moment, and it’ll be awesome once it’s in the car.

My list of requirements. The design process (courtesy of CS147 with Scott Klemmer), once a certain type of user has been identified, kicks off with a need-finding phase to explore possible problems to address in your product. Since I’m the user, this should be easy. Here’s the list of things I want Rex to be able to do:

RexPC User Requirements (aka Dreams)


  • Play my complete (160gb) music library. And stay sync’d with my desktop.
  • Play other people’s plugged-in iPods
  • Play AM/FM Radio
  • Record video with backup and front camera (obviously, tagged with audio and location)
  • Show backup camera when in reverse


  • Map my trips in detail
  • Provide navigation when I get lost
  • Provide weather and road condition information (incl. current temperature, etc)


  • Provide Engine Diagnostics (OBDII readout)
  • Provide extra gauges (for example, oil temperature and pressure)
  • Provide chassis orientation information (angles, direction)
  • Provide performance information (acceleration, cornering, wheel slip, lap timing, bodyroll, etc.)
  • Control interior and exterior lighting, and windows.

Communication and Countersurveillance

  • HAM Radio abilities including APRS
  • Family Talk radio
  • Police Scanner
  • Show traffic cameras
  • Radar/Laser detector alert logging and tagging with location
  • Bluetooth phone integration


  • Work seamlessly with the car’s ignition system to provide startup and shutdown of electronic components with the rest of the car.

Now I have this initial list of things I want my car to do (some of them probably beyond the scope of this project since it will demand always-on interwebs, and I don’t know if I’m ready to shell out for a monthly 3g contract). So what do we need to do this?

Hardware Requirements

  • eeepc running custom linux
  • external usb-powered harddrive (250gb or more)
  • microphone
  • audio-out to current speaker system
  • GPS
  • accelerometer
  • compass
  • network connectivity (wifi definitely, possibly 3G for always-on)
  • OBD II interface to car diagnostics
  • outside temperature sensor
  • ham radio, computer-controlled
  • police scanner (hopefully part of ham radio)
  • integration into current radar detector (I’m not building one of these things…)
  • touchscreen built into car

Current Carputers out in the wild

I’m obviously not the first to attempt this project, and the work of several awesome people is inspiring me to do this. Avatar-X built a Dell laptop into his Subaru Legacy that does most of these things and more! His process is nice to read (although not very well documented in terms of replication) and inspired me to look into this. Redian has a much more detailed post on installing a real motherboard into his WRX wagon, which is also very informative. has, in general, been a good source of inspiration, information and encouragement, so check them out.

Next up, testing the eeepc’s abilities to handle this kind of workload, and looking at bashing out the list of dreams into a more concrete set of features interacting with each other.

Windows to Mac Screen Sharing

My old black macbook has been collecting dust for no reason whatsoever, so I decided to use it as the dev machine for our HCI class (CS147) since none of my team members had their own mac machines. Surely setting up a windows-to-mac screen sharing session couldn’t be too hard!

Unfortunately Mac’s fantastic screen sharing implementation doesn’t play well with Windows. You can connect to it with a VNC client (I recommend TightVNC) but it’s incredibly slow.

An easy fix is to use the awesome Vine VNC Server for OS X on your Mac, and connect to it with TightVNC. To get that buttery smooth feel, use TightVNC’s CoRRE encoding as your compression medium. Boom! A usable remote desktop connection from a Windows to a Mac box.

Resizing your HFS+ partition? Oh boy, Adobe licenses suck!

After deleting a partition on my hackintosh, I needed to resize my Mac partition to use the extra free space. Boy oh boy was I about to get acquainted with a monster. After lots of research online, I found that the only free way to increase the size of an existing HFS+ partition is to trick Bootcamp into creating a partition in whatever free space you have, then telling Bootcamp to reclaim that partition into your Mac partition. And for some reason my install didn’t have Bootcamp on it, which you can’t download since Bootcamp is now integrated into the OS.

So I came up with a smart alternative. And boy did that create problems… But I did get something running in the end!

Arduino 16 SD and SPI interfacing

Just a quick note – the latest Arduino software does something terribly wrong in its interfacing with SD cards through the SPI interface (dunno if this affects all SPI connections or not, maybe!). I’ve struggled with this for days on end until I downgraded to 0014 and everything started working just fine!

Domain specific knowledge in Music. Mainstream hip-hop’s problem.

As a follow up to one of my previous opinions of the importance of domain specific knowledge to be productive, I happened across an interesting example worth sharing.

Domain specific knowledge not only helps with productivity, it also makes a big difference in accuracy. That’s where the example comes in.

I’m a big fan of Lupe Fiasco’s music, and his old mixtapes were some of the best works in hip hop since the early 90s. So I was listening to his “Happy Industries”, enjoying his brilliant lyrics and mash-up abilities, and figured I should check out the full lyrics and post them on Facebook. This is what I found on each of the top 5 Google results for “Lupe Fiasco Happy Industries Lyrics”:

Once upon a time not long ago
An ID yeah that’s what I had
To take DNA
As a little pro two
With my MCing ways and make em mad
Just having fun not chasing cash
Apologise now for it make ya mad
Had to call g wall tell em warm up the mic
Put the pendant on the wall tell em make some maaagiicc
Shorty it’s nothing lavish
Matter of fact
It’s just an attic
Background noise from the family
Hearing the mic slaying in the outside traffic
Still turned out fantastic
Turn my vocals up just a tad bit
Fresh from the first and fifteen
Quarantine touching you super cool that asset

I’m sorry, but this is utter crap. Some fan with very little knowledge about the music industry must have transcribed this. It makes no sense whatsoever, and unfortunately the state of hip-hop is such that most people will accept the fact that it makes no sense. But Lupe tends to have great lyrics, so on listening to the song again, this is what he’s really saying:

Once upon a time not long ago
An idea yeah that’s what I had
To take demon days
And a little pro tools
With my MCing ways and make a mash.
Just having fun not chasing cash
Apologize now if I make ya mad
Had to call g wall tell em warm up the mic
Put the pendant on the wall tell em make some maaagiicc
Studio is nothing lavish
Matter of fact
It’s just an attic
Background noise from the fan
Hearing the mic slaying in the outside traffic
Still turned out fantastic
Turn my vocals up just a tad bit
Fresh from the first and fifteen
Quarantine touching you super cool thats just ah sick!

Notice what just happened. The lyrics went from some song filled with what we can only consider to be slang we don’t understand and randomly “slaying the mic” to a song about him making “sick” music using just his laptop and his little home studio in his attic, not here to make money but here for the magic. He talks about using Pro Tools, something that people with experience in the music industry know about, and making mash-ups between tracks. Funny, because the song itself is exactly a mashup of Gorillaz’ Demon Days album and his lyrics. It should be obvious that these are the correct lyrics.

If hip-hop is so infused with the ideas of making money that a song saying “it’s not about money” can so quickly become so convoluted… You be the judge.

Python is Wrong

I recently did about 3 days of solid hacking in Python, and discovered some limitations and some nice features of the language and its libraries in the process.
I can complain about how limited the lambda is compared to my experiences with Scheme, or how lacking its process management utilities are, but more importantly, there’s something fundamentally wrong with Python.

You see, it has this neat easter egg. “import this” prints the following poem, see if you can state the gross error. To make it easier, I’m putting the gross error in BOLD.

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!

Oh come ON! Anyone who’s ever done numerical simulation or any kind of computational physics knows that Implicit has the same error as Explicit but is unconditionally stable!

Give me implicit euler integration or give me death.
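
For anyone who hasn’t been bitten by this yet, here is a minimal sketch of the difference on the simplest stiff test problem I can think of, y' = -k*y. The function names, the test equation and the values of k and h are my own picks for illustration, not from any particular simulator. Both methods are first-order accurate, but with a step size this large the explicit update multiplies y by (1 - h*k) = -4 every step and blows up, while the implicit update multiplies by 1/(1 + h*k) = 1/6 and decays like the true solution does.

```python
def explicit_euler(y0, k, h, steps):
    """Forward Euler for y' = -k*y: y_next = y + h * f(y)."""
    y = y0
    for _ in range(steps):
        y = y + h * (-k * y)
    return y


def implicit_euler(y0, k, h, steps):
    """Backward Euler for y' = -k*y: y_next = y + h * f(y_next).

    For this linear problem the implicit equation can be solved by hand:
    y_next = y / (1 + h*k).
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h * k)
    return y


# Illustrative values: h*k = 5 is far outside the explicit stability limit h*k < 2.
k, h, steps = 50.0, 0.1, 20
print("explicit:", explicit_euler(1.0, k, h, steps))  # magnitude grows every step
print("implicit:", implicit_euler(1.0, k, h, steps))  # decays toward zero
```

For a nonlinear system the implicit step needs an actual solver (Newton or similar) instead of a one-line division, which is the price you pay for the stability.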

Real Time Raytracing Success!

Oh man oh man oh man, two bottles of 5 hour energy and a delicious mug of Peet’s Major Dickason’s freshly roasted coffee later and I’m doing real time raytracing!

It’s nothing super fancy, but as part of the assignments I’ve been working out for the graphics class I’m TAing (CS184 at UC Berkeley) I’ve been putting together a framework for the students to explore raytracing in. And while we’re at it, why not try to make it run in realtime? Turns out that cutting out disk access and loading everything up into RAM, using OpenGL as a final pixel buffer to display images, gives you gobs of performance for free. Now who would have thought that? ;)

So, I’ll clean this stuff up and post some demos. Phong shading has never looked so good as when you can swing the camera around objects!

I got into Stanford, now what?

As the good news keeps rolling in, with PhD acceptances suddenly going from scarce to abundant, I’m being slapped in the face by the question I should have been asking while applying – WHICH ONE?!!!?!

Undergrad was a fairly easy decision. Go to the best school you get into. Grad school… A little more complicated. The questions range from “Can I afford the area?” (easy, fellowships!) to “Will I want to marry someone from here?” (interesting… but not very informative still) to “Are there people I want to work with?” (crucial… but true for too many!) to “Do I want to live here?” (which just makes it harder).

I’ve had a fantastic run at Berkeley, and although there’s plenty I don’t agree with and plenty I’ve loved, I came out on top overall. But now that I need to again ask the question of where to go, life gets a lot more complicated really quickly!

On the plus side, it is Presidents’ Day, so maybe I’ll spend some money on two new monitors to complete my 4-screen desktop setup. Hmmmmmmmmm how does 3800 by 2400 pixels on your desk sound?

Valve Complete Pack

I took the plunge and shelled out $99.99 for the Valve Complete Pack, which includes enough games to keep anyone busy for many hours. Too many of my roommates are playing Left 4 Dead, and if you’re going to spend money, this is a sweet deal to get everything! Counter-Strike, the Half-Life series, and of course Portal are such fun and innovative games (if not quite revolutionary), and this collection has them all.


The Ending of an Era.

With graduate applications sent out and another semester coming to an end, I can’t help but look back at where I came from. If I have to choose one expression that really influences and reflects on life, something that touched me, that changed my outlook on life and that reinforced my awe and wonder at our magnificent world, it would have to be the words of Albert Knag in Jostein Gaarder’s novel “Sophie’s World”:

“Life is both sad and solemn. We are let into a wonderful world, we meet one another here, greet each other and wander together for a brief time. Then we lose each other and disappear as suddenly and unreasonably as we arrived.”

My response to this was (and still is) a humble “Wow”. Gaarder expresses both the majestic highs of exuberance and the unthinkable but ultimately true end of life without judging or diminishing either. And is that not how life truly is? Although this quote deals on a first level with life as a whole, it is just as true for our daily lives. It amazes me to experience the daily comings and goings of people, the connections we make with humans whom we meet one evening and then, as we walk away, do not realize we will never see again. The profound sorrow that is a part of all existence, but also the profound joy of every moment that we share amongst those we connect with. I sometimes wish that we could hold on to the beautiful moments, the great achievements and the times of joy and happiness, that we could freeze time, that we could relive our profound moments in more than just memory. But as this quote so aptly conveys, this is not the way of the world. But that is not a reason for despair or sorrow. No, it is just a motivation to cherish every moment for all that it encompasses. If we could relive times at our slightest whims, if we could get a second chance at life, maybe we would find that, instead of finding recaptured glory and awe, we are only diminishing the worth of the moment. Maybe the biggest factor in creating the exuberance and awe that we experience is the fact that we can’t relive it. Why would we walk the extra mile now if we can do it tomorrow? But still, our heart yearns for the chance to recapture and relive. And not in vain, for by doing so, I believe we keep the memories unstained and unspoilt, the memories of our “brief time” in this wonderful world. Although we all spend only a limited time here, this world is filled with so much emotion, so much strength and weakness, so much love and hate, so much exaltation and so much sorrow, that “wonderful” fails to describe the awe, humility, joy and love that we find here on earth.
I would not exchange my memories for any riches or glory.

Installing Eclipse on Fedora Core 5 (with its own Java JRE)

I’ve been dying to try out the new Eclipse Ganymede, especially throwing the multi-million-line codebase I’m working on at Pixar into the new CDT version to see what will happen. Until now I haven’t been able to get Eclipse working on Fedora Core 5 – the machine I’m using at work.

The main difficulty is to get Fedora 5 to use the latest JVM from Sun rather than the default GNU 1.4.2 compiler. There are several resources on how to make the global switch (this being the most complete I’ve found) but for some reason Eclipse was still not using it. So here’s how I managed to do it:

* Download and extract Eclipse to a local directory
* Download the self-extracting Java version
* Run the Java .bin file and extract its contents.
* Copy the directory extracted from the .bin file (“jdk1.6.0_06″ in my case) into the eclipse directory
* Create a symbolic link called “jre” in the eclipse directory to the jdkx.x.x/jre directory
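
The last step is the one that matters, since the Eclipse launcher checks for a “jre” directory inside its own install directory before falling back to whatever Java is on the system. Here is a rough sketch of that step in Python. The helper name link_bundled_jre and the paths in the usage comment are made up for illustration, and it assumes the JDK directory has already been copied into the Eclipse directory as described above.

```python
import os


def link_bundled_jre(eclipse_dir, jdk_dirname):
    """Create eclipse_dir/jre as a symlink to eclipse_dir/<jdk_dirname>/jre.

    Hypothetical helper: jdk_dirname is the directory extracted from the
    Java .bin file and already copied into eclipse_dir.
    """
    # Use a relative target so the Eclipse directory can be moved as a whole.
    target = os.path.join(jdk_dirname, "jre")
    if not os.path.isdir(os.path.join(eclipse_dir, target)):
        raise FileNotFoundError("no jre directory under " + jdk_dirname)
    link = os.path.join(eclipse_dir, "jre")
    os.symlink(target, link)
    return link


# Example call (paths are placeholders, adjust to your own layout):
# link_bundled_jre(os.path.expanduser("~/eclipse"), "jdk1.6.0_06")
```

Doing the same thing by hand with ln -s from inside the Eclipse directory works just as well; the function is only there to make the direction of the link explicit.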


Rockband – with Computer Vision

Rockband Vision from njoubert on Vimeo.

We built this computer vision system that can play Rockband as our final project for CS184 at UC Berkeley.

In the space of two weekends we designed and built a system that uses computer vision to monitor the Xbox display through a camera and play the game. More details can be found at

We were, well, sleep deprived for a good section of this work, which explains the craziness in some parts of the video.

Our system is similar to Slashbot and AutoGuitarHero, but we do not take a video feed from the console – no, we’re doing it through a Panasonic handycam pointed at the screen. We’re interested in machine vision, and this was a fun project to get into the field!

Building OpenCV in Ubuntu 8.04

I’m using OpenCV for my current computer graphics project – hacking Harmonix’ Rock Band – so naturally I have to build it from source in Ubuntu. I downloaded the source from Sourceforge.

The procedure was fairly simple – the most important part was the packages needed to satisfy all the requirements. OpenCV depends on several other libraries to really get the full potential of our system (although simple installs are possible).

Since I wanted to do image input/output I apt-get’ted the following packages:

  • libpng-dev
  • libjpeg-dev

To do ffmpeg development – which is the library OpenCV uses for video capture:

  • libavcodec-dev – development files for libavcodec
  • libavformat-dev – development files for libavformat
  • libavutil-dev – development files for libavutil
  • libpostproc-dev – development files for libpostproc
  • libswscale-dev – development files for libswscale
  • libdlna-dev – development files for libdlna
  • libmpeg4ip-dev – end-to-end system to explore streaming multimedia

For all the funky GUI development:

  • libgtk2.0-0
  • libgtk2.0-dev

You can install all of this using sudo apt-get install followed by the package names.

Once this is done, I unpacked the TAR file, cd’d to the directory and ran the good old standard set of building commands:

    sudo ./configure
    sudo make
    sudo make install

That’s it!

CHI 2008: Yay we made it in!

The CHI 2008 conference is winding down today, and I’m still excited that our Work in Progress paper got accepted to the conference!

CHI is arguably the biggest HCI/design conference around – from their website: “CHI 2008 focuses on the balance between art and science, design and research, practical motivation and the process that leads the way to innovative excellence.”

Our paper was titled “Enhancing online personal connections through the synchronized sharing of online video” and came from the work that Ayman, Marcello, Yiding and I did at Yahoo Research Berkeley during 2007. Some of our prototypes are making it into that allows for synchronous sharing of video. Also, our Yahoo Messenger plugin Zync is now officially integrated into Yahoo Messenger – just click the “Watch with me” button when dropping in a video, and you get to watch video synchronously with the person you’re chatting with. Cool stuff!