GTC 2010 Trip Report

I spent the week of September 20th through 23rd at NVIDIA’s GPU Technology Conference, and I’m now reporting back on some of the events that particularly interested me. I went to a whole range of talks, varying wildly in subject and quality, so I will gracefully skip over some of them and focus on the events that relate to my research or that particularly struck my fancy.

As is usually the case with these conferences, I find that the most valuable part is meeting the community of researchers from around the world. Talks, papers and the like you can read on your own, but the conversation and networking at these conferences make the visit absolutely worthwhile. GTC was no different, and I connected with several people whom I aim to stay in touch with as research colleagues. This also seems an apt place to start my trip report! So, I met the following people during the week:

• Tobias Brandvik (Cambridge), doing stencil abstractions for fluid flow solvers on multi-GPU clusters
• Bryan Catanzaro (UC Berkeley), building Copperhead, a data-parallel Python derivative that completely abstracts away the GPU.
• Vasily Volkov (UC Berkeley), working on low-level performance tuning of GPUs and breaking down the CUDA abstractions.
• Jason Cohen (NVIDIA), part of the Parallel Nsight debugger team (Rick Shane from Adobe introduced us)
• Nathan Bell (NVIDIA), the other half of the Thrust developer team (Jared Hoberock introduced us)

I will now attempt to distill the most interesting talks I attended down to their core observations, along with any notes I found especially interesting.

Opening Keynote

Jen-Hsun Huang, NVIDIA CEO

Jen-Hsun was very much pushing the importance of parallel programming, reiterating the arguments about the power wall the industry has hit, and stressing that NVIDIA has been building highly parallel hardware for years now. The Senior Vice President of Content and Technology, Tony Tamasi, showed off several demos of Fermi’s tessellation capability (an endless city, procedurally generated and tessellated on the GPU to give the equivalent of a 100-billion-triangle scene). He moved on to the physics simulation capabilities of these GPUs by showing a re-imagination of Ron Fedkiw’s lighthouse scene running in real time. A multi-grid height field combined with particles gives real-time water, while flotsam is simulated as rigid bodies, and all three simulations are coupled. Although it still looks simulated, it’s definitely ahead of today’s games.

The big statistic here for developers is the rate of CUDA adoption. NVIDIA is very much pushing the idea that they have 100 million GPUs out in the field, all of which can run CUDA programs. The reality is, naturally, not nearly this good, but it’s a nice statistic to have. That said, the Folding@Home and SETI@Home people report statistics heavily skewed towards users running massively parallel processors, so there’s surely some truth to these numbers.

NVIDIA announced CUDA-x86, a new compiler from PGI that compiles CUDA code to x86, allowing developers to write programs that run on either multicore processors or throughput-based GPUs. In my mind this is just a nice-to-have, since none of the serious optimizations you do for the GPU (think coalesced memory accesses, or specific thread groupings to exploit vector lanes and vector lane synchronization) will carry over to x86, and might even hurt performance there (GPU-focused optimizations causing cache misses, for example). Still, the write-once-run-anywhere dream is clearly very important to people, which is great for the research I’m working on.

Several other impressive demos were also shown: Dr. Black’s beating-heart surgery that tracks the heart in real time to make incisions with a robotic scalpel, Adobe’s David Salesin showing off refocusing using plenoptic lenses (originally done by Ren Ng from Stanford), and the iRay photorealistic raytracer running on 64 Fermis, rendering to your web browser at interactive rates. Clearly graphics has lots of evolution left as it enters the world of massively distributed computing.

Lastly, NVIDIA announced that their next two chips will be codenamed Kepler and Maxwell, aiming for 3 times and 10 times the performance per watt of today’s Fermi.

A Fast, Scalable High-Order Unstructured Compressible Flow Solver

David Williams & Patrice Castonguay (Stanford)

I was curious to find out how this group built their flow solver to run on a GPU cluster. This is an example of what we’d like Liszt (our research language) to be able to do, so seeing a hand-written version was profitable. They followed the usual MPI approach – partition your unstructured mesh, create ghost cells for the data you want to share across partition boundaries, and place a partition on each machine. They implemented their algorithm using a gather approach: the GPU performs two stages of work, a first stage to calculate cell-based values, and a second stage to reduce these values to edge-based values. The synchronization between these two stages includes the MPI all-to-all step to resolve ghost cell values.
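The partition-plus-ghost-cell pattern they described can be sketched without MPI. In this toy Python version (all names are my own illustration, not their code), a plain copy stands in for the all-to-all exchange:

```python
def smooth(cells):
    """Stage-2 stand-in: a 3-point average, leaving the endpoints alone."""
    return [cells[i] if i == 0 or i == len(cells) - 1
            else (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
            for i in range(len(cells))]

# Two partitions of the "mesh" [1..6]; each keeps one ghost cell at the seam.
left = [1.0, 2.0, 3.0, 0.0]   # last entry is the ghost
right = [0.0, 4.0, 5.0, 6.0]  # first entry is the ghost

# The exchange step: fill each ghost from the neighbour's boundary cell.
# In their solver this is the MPI all-to-all between the two gather stages.
left[-1] = right[1]
right[0] = left[-2]

# Compute on each partition independently, then drop the ghosts.
result = smooth(left)[:-1] + smooth(right)[1:]
```

Because the ghosts were refreshed before the compute step, the cells at the seam come out identical to running the same kernel over the unpartitioned mesh.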

Since they wrote a specific instance of a RANS algorithm, they did not need any scheduling work or fine-grained synchronization; their two-stage gather was enough to run the algorithm. They were getting good linear speedups on their cluster, and managed to sustain 1.3 teraflops on a cluster of 16 GPUs using a mesh of 320,000 cells.

New Programming Tools for GPU Computing

Unfortunately, a panel discussion with 3-minute introductions for each project is never enough to really understand any of the projects. The most striking part of this panel was the obvious programming-language direction researchers have taken. Except for Thrust (though even it can be considered an embedded domain-specific language), all the work has a programming language spin on it. The major concern of the audience was clearly the support issues and feature-scarcity of new programming languages, which the different projects address differently – Sequoia tries to be a runtime more than a full language, Copperhead is deeply coupled to Python, Thrust passes itself off as a library, and GMAC aims to be language-agnostic, creating a universal address space between accelerators (GPUs) and processors that any language can take advantage of.

PyCUDA (2041)

Andreas Klöckner

Andreas’ PyCUDA talk was mostly an introduction to PyCUDA, a brief overview of how it works, and the motivation behind it. I found this talk especially interesting, since he took an approach very similar to the way web frameworks use templates to generate web pages. Kernels in PyCUDA are strings of text, with embedded Python variables that are replaced when you ask his framework to compile the kernel. He built this JITting engine as an extension of Python, allowing you to write kernels at runtime and pass them off to the nvcc compiler. I liked the fairly low-level control he lets you achieve from inside Python; PyCUDA does not attempt to abstract away CUDA or the GPU. It is, rather, very similar in spirit to the Boost Python bindings – you build your software in Python, rewrite the slow parts in C (or, here, CUDA), and call these low-level functions directly from Python. PyCUDA has the added benefit that you never even need to leave the Python interpreter. His whole approach was fascinating, especially since this is what I would have done were I faced with a similar problem, given my web framework experience. Andreas likens this to the LISP-style metaprogramming that’s been around since the 60s – manipulating string kernels, “pasting” in values on the fly.
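The web-template analogy is easy to demonstrate without a GPU. The snippet below is not PyCUDA’s actual API – just my own sketch of the idea, rendering a CUDA kernel string from runtime Python values the way a framework fills in a page template:

```python
from string import Template

# A CUDA kernel as text, with template variables to fill in at runtime.
kernel_template = Template("""
__global__ void scale(float *x)
{
    int i = threadIdx.x + blockIdx.x * ${block_size};
    if (i < ${n})
        x[i] = x[i] ${op} ${constant}f;
}
""")

# The "metaprogramming" step: paste concrete values into the kernel source.
# PyCUDA would now hand this string to nvcc and load the compiled module.
source = kernel_template.substitute(block_size=256, n=1024, op="*", constant=2.0)
```

The point is that the kernel’s constants, operators, even whole code fragments can be decided at runtime, then compiled on the fly.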

PyCUDA in general is built to interface tightly with numpy and scipy, two Python packages that supply MATLAB-like functionality to Python users. PyCUDA does not attempt to address the type inference issue of moving from a dynamically typed language to a statically typed one, since it depends on the user to write kernels with the correct types, and on numpy to supply runtime types for the multi-dimensional arrays that PyCUDA works with. Copperhead, Bryan Catanzaro’s data-parallel version of Python, abstracts away the GPU entirely, so it does have to deal with type inference, and he built a Hindley-Milner-style type inference system into Python to handle this. Copperhead is built on top of PyCUDA, using its JITting capabilities to get to the GPU – a great decision in my mind, since someone else is now responsible for the low-level details of catching errors and generating kernels.

Better Performance at Lower Occupancy (2238, Wed 15:00)

Vasily Volkov

(Slides here.) Vasily has published several papers on understanding GPU hardware and tuning code for it, and his talk took aim at the focus on massive multi-threading in GPU apps, showing that instruction-level parallelism is still a very important technique on the GPU. In the process, he disproved several of NVIDIA’s claims in their Programming Guide. This talk was very interesting to me, since it addressed many of the low-level architectural questions that Kayvon, Solomon, Zach, Jorge and I have discussed in detail. The word “occupancy” in this talk refers to the percentage of threads spawned out of the total number the multiprocessor supports.

The general recommendation for hiding latencies is to use more threads per block and more threads per multiprocessor. Vasily demonstrated that faster codes tend to run at lower occupancies, citing as examples the differences between successive CUBLAS and CUFFT versions – every performance improvement came with a lowering of threads per block. He showed how to hide both arithmetic and memory latency using fewer threads, gaining a net performance increase. He also attacked the fallacy that shared memory is as fast as the register file, addressing the bandwidth differences between the two.

The heart of his talk is the fact that streaming multiprocessors are still pipelined machines, regardless of their multi-threaded, wide-vector-lane nature. By writing your code as sets of independent operations, keeping data dependencies to a minimum and structuring code to keep the pipeline filled, you can get massive performance regardless of the machine’s occupancy. He showed the roofline model for a simple SAXPY code, and how he can influence the memory-bound part of the model by doing multiple independent SAXPY operations in a single thread (since all but one of the input values are the same across SAXPY operations). He went on to show that he can get 87% of the peak bandwidth available to an SM at only 8% occupancy (while cudaMemcpy achieves only 71% of peak). Lastly he made the point that the banked nature of shared memory makes it impossible for shared-memory codes to achieve full bandwidth utilization. All this leads to the recommendation to use as few threads, each using as many registers, as possible.
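The restructuring behind his SAXPY numbers – each thread handling several independent elements instead of one – looks like this in spirit. This is only a Python sketch of the per-thread work to show the shape of the code; the latency-hiding payoff exists only on real hardware:

```python
def saxpy_thread(tid, elems_per_thread, a, x, y):
    """The work of one 'thread': several independent SAXPY updates.
    On a GPU these updates have no data dependencies between them,
    so they can overlap in the pipeline and hide latency without
    needing more threads (i.e. without raising occupancy)."""
    for j in range(elems_per_thread):
        i = tid * elems_per_thread + j
        y[i] = a * x[i] + y[i]

# 4 "threads" doing 2 elements each, instead of 8 threads doing 1 each.
a, x, y = 2.0, [1.0] * 8, [1.0] * 8
for tid in range(4):
    saxpy_thread(tid, 2, a, x, y)
```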

The attention to detail in this talk, as Vasily played with the limits of the GPU, allowed him to break down several of the “nice” abstractions CUDA provides.

Large-Scale Gas Turbine Simulations on GPU Clusters (2118, Wed 16:00)

Tobias Brandvik

Tobias and his group at Cambridge have addressed a problem very similar to ours with Liszt. They want to write a mesh-based simulation system that runs on today’s high-performance machines without having to rewrite the code for each architecture. Specifically, they are building a production-quality solver for use in the aerospace industry. Their approach overlaps with Liszt in many ways, targeting GPU clusters while avoiding the need to rewrite all their code for every variation of today’s heterogeneous machines. In contrast to our approach, they work at a fairly low level, since they only attempt to rewrite the roughly 10% of their code base that consists of stencil-based calculations. A stencil here is defined as the mesh accesses and data reads/writes a kernel will perform, which allows them to generate specialized CUDA source code for each stencil in their application. This 10% of the code is responsible for roughly 90% of the run time, and can be abstracted as math kernels running specific stencils across a mesh. The cluster aspect of their code is still hand-written MPI code, but rather than write GPU- or SMP-specific code that runs on each MPI node, they use these stencils and math kernels.

In terms of domain-specific optimizations, he referred to the 2008 Supercomputing paper by Datta et al. that showed a set of optimizations for running stencil codes at high performance on GPU devices. They attempted to implement these optimizations as part of their source-to-source compilation process for kernels.

Their approach requires the programmer to hand-write the stencil and the math kernel, which allowed them to embed this stencil language partially inside Fortran. They then took their current simulation system (TBLOCK, approximately 40 kloc of Fortran) and factored out the stencil-based calculations into separate stencil definitions and math kernels. This let them keep most of their current simulation code while spending “a couple of months” (Tobias’ words) rewriting the calculations that fit this stencil scheme into their embedded language with accompanying stencil definition files. Their system, called TurboStream, has on the order of 15 different stencils in it, with 3000 different stencil definitions, and they run simulations on a 64-GPU cluster at the University of Cambridge.
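The split they described – a stencil naming which neighbours are read, plus a math kernel applied at every cell – can be illustrated with a toy 1D example. This is my own sketch, not TurboStream’s actual definition format:

```python
# Stencil definition: which neighbour offsets does the kernel read?
STENCIL = (-1, 0, 1)  # a three-point stencil

def math_kernel(left, centre, right):
    """The numeric part, written purely against stencil slots."""
    return centre + 0.1 * (left - 2.0 * centre + right)  # one diffusion step

def apply_stencil(field, stencil, kernel):
    """Generic driver over a periodic 1D 'mesh'. A pairing of a stencil
    with a math kernel like this is what their system turns into
    specialized CUDA source for each case."""
    n = len(field)
    return [kernel(*(field[(i + off) % n] for off in stencil))
            for i in range(n)]

new_field = apply_stencil([0.0, 0.0, 1.0, 0.0, 0.0], STENCIL, math_kernel)
```

The math kernel never touches the mesh directly – it only sees the slots the stencil hands it, which is what makes the specialization per (stencil, kernel) pair possible.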

Tobias made an interesting comment during the talk, saying that their biggest concern is not pure speed, since their solvers are relatively simple, but the ability to handle much more data than they currently do – this was their biggest motivation for moving to GPU clusters. By way of example, he showed turbofan fins spinning through slits cut in the housing of a jet engine, and the fine-grained simulation detail around these slits – geometry that was previously ignored.

Overall Impressions

The biggest gain of the conference was the networking with several other researchers, and getting an overall view of the field as several groups attempt to solve the same problem: how do we write code that can run on all of today’s hardware choices?

I find myself using OpenTerminal a lot – mostly to open a terminal in a directory, followed by “mate .” to open a TextMate project there. This quickly becomes annoying, so after looking into AppleScript, I took the plunge and wrote my first script. What a weird language. Anyway, you can dump the following into AppleScript Editor, and when you run it, it opens a TextMate project for the front-most Finder window:

on run
    tell application "Finder"
        try
            activate
            set frontWin to folder of front window as string
            set frontWinPath to (get POSIX path of frontWin)
            tell application "TextMate"
                activate
                open frontWinPath
            end tell
        on error error_message
            beep
            display dialog error_message buttons {"OK"} default button 1
        end try
    end tell
end run

Save this as an Application (not a Script), and drag it onto your Finder toolbar. Voila! TextMate at your fingertips.

Thank you Mac OS X Hints, from which I got the pattern to do this.

Comparing floating point numbers for equality.

Everyone who’s taken an architecture course (or messed around with scientific computing) knows that floating point arithmetic does not obey the usual algebraic laws – it is neither associative nor distributive. That means, mathematically:

$a*(b+c) \neq a*b+a*c$

Or, in layman’s terms:

The order of operations influences the result of the calculation

This implies that floating point calculations that mathematically give the same answer do not necessarily produce exactly the same floating point number. So, when comparing two floating point results, using $a == b$ will not give the correct result.
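A quick Python demonstration – the two sums are mathematically identical, yet compare unequal:

```python
# Same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right_to_left = 0.1 + (0.2 + 0.3)   # 0.6

print(left_to_right == right_to_left)  # False
```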

You can attempt to remedy this by using the mathematical approach of allowing an absolute error metric:

$|a-b| < E$

which does not account for the fact that floating point numbers are unequally distributed over the real number line. We can attempt to use a relative error metric:

$\frac{|a-b|}{|b|} < E$

but this does not take into account the difference between very small positive and negative numbers (including positive and negative zero, since floats have both).

So, from the very enlightening “comparing floats” article by Bruce Dawson, we try something quite different.

Floats can be lexicographically ordered if you consider their bit patterns to be sign-magnitude integers. We can exploit this fact to calculate exactly how many representable floating point numbers there are between two floats. So, for example, we might find that there is only one representable float between 9,999.99999 and 10,000.00001, and use an error metric that states “I will consider floats to be equal if they are within $E$ representable floats of each other.”

The details of this routine are in the comparing floats article, but I will mirror the code here:

// Usable AlmostEqual function
// (needs <assert.h> and <stdlib.h> for assert and abs)

bool AlmostEqual2sComplement(float A, float B, int maxUlps)
{
    // Make sure maxUlps is non-negative and small enough that the
    // default NAN won't compare as equal to anything.
    assert(maxUlps > 0 && maxUlps < 4 * 1024 * 1024);

    // Make aInt lexicographically ordered as a twos-complement int
    int aInt = *(int*)&A;
    if (aInt < 0)
        aInt = 0x80000000 - aInt;

    // Make bInt lexicographically ordered as a twos-complement int
    int bInt = *(int*)&B;
    if (bInt < 0)
        bInt = 0x80000000 - bInt;

    int intDiff = abs(aInt - bInt);
    return intDiff <= maxUlps;
}

This has saved me huge amounts of headaches comparing CUDA and CPU generated results for our CUDA programming class, CS193G.
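For the curious, here is a rough Python port of the same trick (my own translation, so treat it as a sketch): struct reinterprets the float’s bits as IEEE-754 single precision, and the sign-magnitude pattern is remapped onto a monotonic integer line before taking the difference.

```python
import struct

def float_to_ordered_int(f):
    """Bits of f as a single-precision float, remapped so the integers
    sort in the same order as the floats themselves."""
    i = struct.unpack('<i', struct.pack('<f', f))[0]
    # Negative floats have sign-magnitude encodings; fold them onto the
    # negative integers (this also maps -0.0 and +0.0 to the same value).
    return i if i >= 0 else -0x80000000 - i

def almost_equal_ulps(a, b, max_ulps=4):
    """True if at most max_ulps representable floats lie between a and b."""
    return abs(float_to_ordered_int(a) - float_to_ordered_int(b)) <= max_ulps
```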

Compiling the pbrt 1.04 raytracer on mac OS X 10.6

I’m taking Prof. Pat Hanrahan’s CS348B “Advanced Rendering” course this quarter, and we’re extending the pbrt renderer as part of the course assignments. It’s probably worth documenting how I compiled this on my Snow Leopard machine.

First, install OpenEXR through MacPorts:

sudo port install OpenEXR

MacPorts installs libraries like this one in /opt/local/ to prevent conflicts with libraries from other sources (it keeps a handy pkgconfig directory for each library, full of info, under /opt/local/var/macports/software/.../lib/). We need to update pbrt’s makefile to point here, so modify lines 13 and 14 of the Makefile to read:

EXRINCLUDE=-I/opt/local/include/OpenEXR
EXRLIBDIR=-L/opt/local/lib

You should now be able to run make in the pbrt directory and produce the pbrt binary. Remember, you need Xcode installed!

Now you need to set the PBRT_SEARCHPATH environment variable. I did this the easy way: cd to the pbrt bin directory, and run:

export PBRT_SEARCHPATH=`pwd`

Installing Numpy and SciPy on Snow Leopard – the easy way

Just a quick note on the easiest way I’ve found to install NumPy and SciPy on Snow Leopard. It can be quite a pain, since you have to leave the Apple Python install alone, even though it includes NumPy by default – Apple’s build doesn’t support SciPy, and it isn’t the latest version anyway.

So, don’t try to build from SVN and all that fancy stuff. Just do:

Install in that order and move on.

RexPC: Running EEEbuntu on eeePc 701 as a Carputer in my WRX

My latest big project is installing my eeePC 701 into my Subaru WRX. This is going to be a long project with several stages, and I’ll talk more later about the actual requirements. For now I first want to see whether the eeePC 701 is actually powerful enough to be worth installing into my car. Its low power draw (22 watts) and tiny size make it such a perfect fit that I want to make it work.

The eeePC I am planning to build into my WRX

The Xandros packaged with the 701 was more of a joke than anything else – fantastic for the very inexperienced user who will only ever use Skype, Pidgin and Firefox. I need something more powerful and less buggy, so I’m going with EEEbuntu 3.0 base, the most stripped-down “full-featured” ready-to-go OS for the 701 I could find.

After downloading, burning and installing (luckily I have an external CDROM) I went through the following steps:

1. Run Update Manager and install all the updates (it’s based on Ubuntu 9.04)
2. Install GPSD, gpsdrive, tangogps and python-gps through Synaptic
3. Plug in GPS (gpsd launches automatically), and go for a test drive

I drove around the neighborhood with the GPS pushed up into my sunroof (smart, huh?) and the eeePC on the passenger seat. I was very impressed with tangogps and logged my trip around campus with it. At first glance everything worked seamlessly with no configuration needed!

A quick test trip logged using tangogps

I’ll be looking into touchscreens to install in place of the regular cd/radio/tape head that is currently in there, and keep writing about my progress.

RexPC: Planning my WRX’s built-in computer

What if... WRX + eeePC

I’m planning to build a computer into my 2004 Subaru Impreza WRX. And not just any computer, but hopefully my small eeePC 701 – the original netbook that started the revolution.

So, as with any big project, this one starts with a list of dreams I wish I had, and some research online into what other people are doing. Actually, any big project starts with a glass of Zinfandel (check) and brainstorming for a name. In this case, it was easy. My WRX is called Rex. And it’s getting a computer. So, RexPC. Onwards, then!

I should quickly address those of you thinking “Why is this guy not upgrading his exhaust to a full catback, or getting new rims and high performance tires, or upgrading his intercooler and air intake, or rechipping the engine ECU, or, or, or… why is he installing a *computer*??” (TJ, I know what’s going through your mind). Simple, really: the computer is the cheapest mod I can do at this point, since the eeePC is just collecting dust on my shelf at the moment, and it’ll be awesome once it’s in the car.

On to my list of requirements. The design process (courtesy of CS147 with Scott Klemmer) kicks off, once a certain type of user has been identified, with a need-finding phase to explore possible problems to address in your product. Since I’m the user, this should be easy. Here’s the list of things I want Rex to be able to do:

RexPC User Requirements (aka Dreams)

Media

• Play my complete (160 GB) music library. And stay synced with my desktop.
• Play other people’s plugged-in iPods
• Record video with backup and front camera (obviously, tagged with audio and location)
• Show backup camera when in reverse

Location

• Map my trips in detail
• Provide navigation when I get lost
• Provide weather and road condition information (incl. current temperature, etc)

Vehicle

• Provide Engine Diagnostics (OBDII readout)
• Provide extra gauges (for example, oil temperature and pressure)
• Provide chassis orientation information (angles, direction)
• Provide performance information (acceleration, cornering, wheel slip, lap timing, bodyroll, etc.)
• Control interior and exterior lighting, and windows.

Communication and Countersurveillance

• HAM Radio abilities including APRS
• Police Scanner
• Show traffic cameras
• Bluetooth phone integration

Infrastructure

• Work seamlessly with the car’s ignition system to provide startup and shutdown of electronic components with the rest of the car.

Now I have this initial list of things I want my car to do (some of them probably beyond the scope of this project, since they demand always-on interwebs, and I don’t know if I’m ready to shell out for a monthly 3G contract). So what do we need to do this?

Hardware Requirements

• eeepc running custom linux
• external usb-powered harddrive (250gb or more)
• microphone
• audio-out to current speaker system
• GPS
• accelerometer
• compass
• network connectivity (wifi definitely, possibly 3G for always-on)
• OBD II interface to car diagnostics
• outside temperature sensor
• police scanner (hopefully part of ham radio)
• integration into current radar detector (I’m not building one of these things…)
• touchscreen built into car

Current Carputers out in the wild

I’m obviously not the first to attempt this project, and several awesome people’s work is inspiring me to do it. Avatar-X built a Dell laptop into his Subaru Legacy that does most of these things and more! His process is a nice read (although not well documented in terms of replication) and inspired me to look into this. Redian has a much more detailed post on installing a real motherboard into his WRX wagon, which is also very informative. mp3car.com has, in general, been a good source of inspiration, information and encouragement, so check them out.

Next up: testing the eeePC’s ability to handle this kind of workload, and bashing the list of dreams into a more concrete set of features interacting with each other.

Windows to Mac Screen Sharing

My old black MacBook has been collecting dust for no reason whatsoever, so I decided to use it as the dev machine for our HCI class (CS147), since none of my team members had their own Macs. Surely setting up a Windows-to-Mac screen sharing session couldn’t be too hard!

Unfortunately the Mac’s otherwise fantastic Screen Sharing implementation doesn’t play well with Windows. You can connect to it with a VNC client (I recommend TightVNC), but it’s incredibly slow.

An easy fix is to run the awesome Vine VNC server on your Mac, and connect to it with TightVNC. To get that buttery-smooth feel, use TightVNC’s CoRRE encoding as your compression medium. Boom! A usable remote desktop connection from a Windows box to a Mac.

After deleting a partition on my hackintosh, I needed to resize my Mac partition to use the extra free space. Boy oh boy was I about to get acquainted with a monster. After lots of research online, I found that the only free way to increase the size of an existing HFS+ partition is to trick Boot Camp into creating a partition in whatever free space you have, then telling Boot Camp to reclaim that partition into your Mac partition. And for some reason my install didn’t have Boot Camp on it, which you can’t download separately, since Boot Camp only ships with the OS.

So I came up with a smart alternative. And boy did that create problems… But I did get something running in the end!

Arduino 0016, SD cards and SPI interfacing

Just a quick note – the latest Arduino software does something terribly wrong in its interfacing with SD cards through SPI (I don’t know whether this affects all SPI connections or not – maybe!). I struggled with this for days on end until I downgraded to 0014, and everything started working just fine!

Students aren’t made the way they used to be

A recent talk at Berkeley about the Engineering mentality, Freedom and Patents blew me out of the water. Here it is, transcribed. The author wished to remain anonymous.

Intelligentsia, and what I hope to see in more cafés

The wonderful Gleb Denisov and Ashley Brown took me to the newly-opened Intelligentsia Coffee Bar in Venice Beach on my last visit to Los Angeles. As a fan of Blue Bottle and Ritual Coffee, both in San Francisco, my idea of a funky, fun, excellent coffee house involves highly-trained baristas serving me connoisseur-quality coffee drinks, magically prepared alongside a row of other drinks for those ordering with me. I then take this little cup of heaven, find a place to sit or stand, and rave about the quality of the coffee and the hipness of the atmosphere. After finishing the delightful drink (possibly over some textbook or code), I leave with a happy caffeine high, none the wiser about where all this magic came from.

Intelligentsia does things a little differently. As you walk into their glass enclosure of a shop, a barista offers to help you at their own espresso and coffee station, on one of the four corners of what looks like a big lunch cart that fills most of the store. As you follow him to his espresso machine, you pass delicate pastries with shocking price tags (absolutely worth it, might I add), to the tune of, in my case, the great RJD2, one of my favorite electronic musicians. The decor and architecture make you want to hang around this place, and the large assortment of coffee-related merchandise (from $1800 espresso machines to $10 milk frother jugs) pleases any coffee fanatic looking for that missing piece of their home setup.

But then comes the best part of Intelligentsia. Once you and the barista get comfortable at his espresso machine, he makes you what you want, to order, right in front of you. You get to look at the whole process, and see how your cup is made. This is your espresso, made as you want it, while you’re chatting away with your obviously skilled and very down-to-earth barista. After sitting down with my espresso and finding an absurd amount of enjoyment in sipping the dark drink, I found myself more attached to this espresso than I usually am at coffee bars. In fact, it felt just a little bit like one of my own espressos that I brew at home. Better in quality, yes, but also more personal. This is not your Starbucks/Peet’s/insert-other-coffeehouse experience of being handed a drink from behind some mysterious silver machine. No no, you were involved! By golly, you might not have turned any knobs or pushed any buttons, but you were there for every step of the way. And that makes a difference. Because if espresso is art, visiting Intelligentsia is like commissioning the artist. And I loved it.

Domain specific knowledge in Music. Mainstream hip-hop’s problem.

As a follow-up to one of my previous posts on the importance of domain-specific knowledge for productivity, I happened across an interesting example worth sharing.

Domain specific knowledge not only helps with productivity, it also makes a big difference in accuracy. That’s where the example comes in.

I’m a big fan of Lupe Fiasco’s music, and his old mixtapes were some of the best work in hip hop since the early 90s. So I was listening to his “Happy Industries”, enjoying his brilliant lyrics and mash-up abilities, and figured I should look up the full lyrics and post them on Facebook. This is what I found in each of the top 5 Google results for “Lupe Fiasco Happy Industries Lyrics”:

Once upon a time not long ago
An ID yeah that’s what I had
To take DNA
As a little pro two
With my MCing ways and make em mad
Just having fun not chasing cash
Apologise now for it make ya mad
Had to call g wall tell em warm up the mic
Put the pendant on the wall tell em make some maaagiicc
Shorty it’s nothing lavish
Matter of fact
It’s just an attic
Background noise from the family
Hearing the mic slaying in the outside traffic
Still turned out fantastic
Turn my vocals up just a tad bit
Fresh from the first and fifteen
Quarantine touching you super cool that asset

I’m sorry, but this is utter crap. Some fan with very little knowledge of the music industry must have transcribed this. It makes no sense whatsoever, and unfortunately the state of hip-hop is such that most people will accept the fact that it makes no sense. But Lupe tends to have great lyrics, so on listening to the song again, this is what he’s really saying:

Once upon a time not long ago
An idea yeah that’s what I had
To take demon days
And a little pro tools
With my MCing ways and make a mash.
Just having fun not chasing cash
Apologize now if I make ya mad
Had to call g wall tell em warm up the mic
Put the pendant on the wall tell em make some maaagiicc
Studio is nothing lavish
Matter of fact
It’s just an attic
Background noise from the fan
Hearing the mic slaying in the outside traffic
Still turned out fantastic
Turn my vocals up just a tad bit
Fresh from the first and fifteen
Quarantine touching you super cool thats just ah sick!

Notice what just happened. The lyrics went from a song filled with what we can only assume is slang we don’t understand, randomly “slaying the mic”, to a song about him making “sick” music using just his laptop and his little home studio in his attic, not there to make money but there for the magic. He talks about using Pro Tools, something people with experience in the music industry know about, and about making mash-ups between tracks. Fitting, because the song itself is exactly that: a mashup of Gorillaz’s Demon Days album and his own lyrics. It should be obvious that these are the correct lyrics.

If Hip-Hop is so infused with the idea of making money that a song saying “it’s not about money” can so quickly become so convoluted… You be the judge.

BruteSoft comes out of stealth!

In between my studies and research (which is drawing to a close, by the way!) I’m also involved in BruteSoft, a startup pushing a dramatically different system for enterprise software distribution. We’ve been working hard at this over the last 6 months (although preliminary talks started almost 2 years ago), and we’re ready to come out and play!

To give you a snippet of the kind of things we do, I’ll pull out some highlights from the product page:

Today, BruteSoft provides enterprises with a radically new approach to managing their computers in an efficient and effective way, saving you money and reducing your carbon footprint.

BruteSoft innovates software solutions based on our patent-pending federated distribution technology (DBx). Our solutions are secure, exponentially scalable and self-healing, eliminating hardware layers and delivering unrivalled speeds in an energy efficient way. DBx decouples client demand from distribution servers, which enables software distribution to an unlimited number of clients without the need for additional infrastructure.

Our products have reached the pinnacle of software distribution efficiency. As a proof point, our products are capable of transferring the equivalent of a DVD of 5GB within 5 minutes to 10,000 desktops on a 1Gbit LAN/WAN.
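To put that proof point in perspective, here’s a quick back-of-the-envelope calculation of my own (not from the product page): pushing 5GB to 10,000 desktops individually over a single 1Gbit link would take days, not minutes.

```python
# Back-of-envelope: naive unicast vs. the claimed 5-minute federated push.
# My own sanity check; only the claim itself comes from BruteSoft.

payload_gb = 5            # DVD-sized payload, in gigabytes
clients = 10_000
link_gbit_per_s = 1       # 1Gbit LAN

total_gbit = payload_gb * 8 * clients          # 400,000 Gbit in total
naive_seconds = total_gbit / link_gbit_per_s   # every client served from one link
print(f"naive unicast: {naive_seconds / 86400:.1f} days")   # roughly 4.6 days

claimed_seconds = 5 * 60
print(f"claimed: {claimed_seconds // 60} minutes")
```

The gap between roughly 4.6 days and 5 minutes is exactly why the distribution has to be federated: clients re-share the payload among themselves instead of all pulling from one server.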

Go check out the website, and if you’re managing a large number of computers (or know people who do!) send them our way!

BruteSoft.com

Python is Wrong

I recently did about 3 days of solid hacking in Python, and discovered some limitations and some nice features of the language and its libraries in the process.
I could complain about how limited the lambda is compared to my experiences with Scheme, or how lacking its process management utilities are, but more importantly, there’s something fundamentally wrong with Python.

You see, it has this neat easter egg. “import this” prints the following poem; see if you can spot the gross error.

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!

Oh come ON! Anyone who’s ever done numerical simulation or any kind of computational physics knows that Implicit has the same order of error as Explicit but is unconditionally stable!

Give me implicit Euler integration or give me death.
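For anyone who hasn’t met this in practice, here’s a toy demonstration of my own: integrate dy/dt = -k·y with a step size that violates the explicit stability limit, and watch explicit Euler explode while implicit Euler calmly decays.

```python
# Explicit vs. implicit Euler on the stiff test equation dy/dt = -k*y.
# With h*k > 2 the explicit update multiplies y by (1 - h*k), |1 - h*k| > 1,
# so it diverges; the implicit update divides by (1 + h*k), so it always decays.

def explicit_euler(y, k, h, steps):
    for _ in range(steps):
        y = y + h * (-k * y)        # y_{n+1} = y_n + h * f(y_n)
    return y

def implicit_euler(y, k, h, steps):
    for _ in range(steps):
        y = y / (1 + h * k)         # solve y_{n+1} = y_n + h * (-k * y_{n+1})
    return y

k, h = 10.0, 0.5                    # h*k = 5, well outside explicit stability
print(explicit_euler(1.0, k, h, 50))   # huge magnitude: it blew up
print(implicit_euler(1.0, k, h, 50))   # tiny value, decaying toward 0
```

Same order of accuracy, wildly different stability: that is the whole joke above.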

The 10x programmer’s secret – Domain Specific Knowledge

There’s an interesting discussion going on on Hacker News, about “coding fast” and the mythical “10x programmer”. I know at times I’ve been that 10x coder, and at other times I was the 0.1x guy confused in the back, so I was curious to see what others were thinking.

The discussion centers around learning languages and becoming comfortable with the features, the APIs and your tools, but some comments focused on another area of programmer productivity that can be called “Knowing what to write”. Domain specific knowledge allows you to have huge boosts in productivity since you only code what is really necessary, and you don’t waste time coding peripheral features or get mired down in struggling with where to start and how to move forward.

The discussion is here: http://news.ycombinator.com/item?id=590460

So I have one suggestion for both building domain specific knowledge and avoiding the slump of getting stuck or writing unnecessary code: Prototype and Iterate! It’s already a fairly well established idea in design and programmer circles, but the advantages of prototyping become even clearer if you consider it in the light of learning a domain.

That 10x programming speedup you’re looking for probably lies in coding simple systems and building on top of them, rather than spending hours writing code that “will come in handy later” or attempting to complete some set of the code before moving on to the next.

My friend Marcello mentioned the number of projects he’s started – many more than he’s ever finished. I think this points to going through the process of learning a domain by building prototypes, throwing them away, and letting your ideas organically grow as you build things.

So let’s go be productive!

Repair: Rewiring your Sennheiser HD 280 Pro

2 years ago I, with much excitement, ordered a pair of Sennheiser HD 280 Pros. Both Gleb and Matt have a pair, and after listening to theirs… Apple’s little iPod buds just didn’t cut it anymore. I loved my pair so much that their connector ended up severely bent when I squeezed past a, um, slightly oversized person sitting next to me in economy class on a flight back to South Africa. Anyway, the deed was done and the headphones became pretty much unusable, since only one channel was getting through the bent pin!

I finally got around to rewiring them, which was trickier than I expected! So here’s a post for others trying to do the same thing.

On stripping the connector, you discover 4 wires, rather than the expected 3:

First things first, TIN THESE WIRES WITH SOME SOLDER! I had no idea that the copper strands themselves were covered by a thin film of resin, which needs to be burned off with some solder. If you try to connect an alligator clip straight to the bare wire, you get no connection, causing much confusion.

The 4 wire mystery was solved when I peeked into the left earphone. The two drivers are separately wired all the way to the connector. Two wires per channel = 4 wires. The mapping I discovered is as follows:

White – LEFT, GROUND
Black – LEFT, SIGNAL
Blue – RIGHT, GROUND
Red – RIGHT, SIGNAL

If you’re interested, inside the left headphone there’s a little splitter board:

I bought a nice connector from RadioShack and wired it up. Be really careful when soldering the wires to the connector, and don’t use too much heat! The shielding melts quickly and you don’t want your cable all melted together inside. I connected the two grounds from the two drivers together, which worked just fine.

After doing this, I was rewarded with a fantastic set of cans working again!

Real Time Raytracing Success!

Oh man oh man oh man, two bottles of 5-hour Energy and a delicious mug of Peet’s freshly roasted Major Dickason’s coffee later and I’m doing real time raytracing!

It’s nothing super fancy, but as part of the assignments I’ve been working out for the graphics class I’m TAing (CS184 at UC Berkeley), I’ve been putting together a framework for the students to explore raytracing in. And while we’re at it, why not try to make it run in real time? Turns out that cutting out disk access, loading everything up into RAM, and using OpenGL as a final pixel buffer to display images gives you gobs of performance for free. Now who would have thought that?
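The per-pixel math at the core of a raytracer really is tiny, which is why real time is even feasible. Here’s a minimal ray-sphere intersection plus a Lambertian diffuse term – a sketch of my own, not the actual CS184 framework code:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ray_sphere(origin, direction, center, radius):
    """Nearest positive hit distance along a normalized ray, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * dot(direction, oc)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None                       # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2.0      # nearer of the two roots
    return t if t > 0 else None

def diffuse(hit_point, center, light_dir):
    """Lambertian term max(0, N.L); N is the sphere normal at the hit point."""
    n = [p - c for p, c in zip(hit_point, center)]
    norm = math.sqrt(dot(n, n))
    n = [x / norm for x in n]
    return max(0.0, dot(n, light_dir))

# A ray cast straight down -z hits a unit sphere centered 5 units away at t = 4.
t = ray_sphere((0, 0, 0), (0, 0, -1), (0, 0, -5), 1.0)
print(t)  # 4.0
```

The real-time part is then just running this over a pixel grid held in RAM and handing the finished buffer to OpenGL each frame, instead of writing image files to disk.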

So, I’ll clean this stuff up and post some demos. Phong shading has never looked so good as when you can swing the camera around objects!

Monitors monitors monitors! What’s with 16:9 and shiny plastic bevels?

Microcenter rudely and unceremoniously canceled my in-store pickup order of the Samsung 2343BWX (errors in their inventory database…). Apparently this monitor is pretty hard to get – 23″ and 2048×1152 for $199 is such a sweet deal – and neither Fry’s, Central Computers nor any of the online retailers could get me one (although there were some refurbished models around).

At this point I’m very happy with my dual 20″ Samsungs, both at 1680×1050, but I had some reason to add another two screens to my setup. The 16:10 aspect ratio of that resolution is really great for coding with side-by-side editor windows; they’re just not big enough to keep me from constantly resizing windows. Whenever I work at home on my dad’s 1920×1200 screen, I have far fewer of these issues. My preferred coding setup is two 80-character-wide text editors with a project explorer and outline view flanking them, and that just fits well in 1920 by 1200 pixels. Which is why I’m out shopping for a good pair.

Fry’s had one of the 2343BWX monitors on the showroom floor, so I had the opportunity to see it in bad light running at a shitty resolution. Hmm, the 16:9 did look a little less “coding-friendly” than what I currently had, and the incredibly shiny bevel looks plasticky next to the matte bevels found on monitors aimed at professionals. Since I didn’t want to order a monitor online only to find I didn’t like its features, right now I’m trying out a comparable Samsung 2233 monitor ($199) – also a shiny bevel, also 16:9, but a lower resolution of 1920×1080 (compared to 2048×1152) – and I find that my worries were unfounded.

The shiny bevel, although nothing to be excited about, becomes unnoticeable against the very bright screens and impressive contrast of the latest Samsung releases. The 16:9 is great for movies, but a 1080p monitor has less vertical resolution than a 16:10 monitor of equivalent width. After messing with Eclipse at 1920×1080, I came to the conclusion that upgrading would only be worth it if I gained a decent number of pixels both vertically and horizontally. So I’m taking these back and waiting until the larger 23″ 2048×1152 monitors are back in stock.

Of course, the real answer comes in a much simpler package – 30″ of pure viewing bliss like the monitors in the Graphics lab!

Dawn on a Rainy Day – Hackathon 09

There’s something quite magical about watching dawn from your apartment, rain streaking the windows. And it’s an ideal time to reflect on why you’re up at this hour, and what you’ve been doing over the last 48 hours.

In my case, I’ve been hacking at Prycr.com all through Friday and Saturday. The website is blank, since it’s not a web service (yet) and this was for Hackathon 09, so no time was wasted on nice frivolities like “websites” and “marketing”. All the focus was on our SMS application, which does price lookups for UPC codes texted to it.

The scenario is as follows: you’re standing in Fry’s, looking at some piece of tech gadgetry that you just have to have. But are you going to be angry that you bought it there if there’s a sweet deal online? Or even better, what if Best Buy across the street had it for 20% off and you didn’t know? Send off a text message with the UPC product code to our service, and you’ll receive a reply looking something like this:

"WD 250gb My Passport Hard Drive. (4.5/5) $52 at CompUPlus.com, average price of $69. Locally at Best Buy for $75"

I built this with an impromptu team of three other Berkeley students – Timothy Liu, Dounan Shi and Irving Lin – and we decided to do a text message based service similar to DialPrice.com (which, BTW, is also a very cool service, but I find that whenever I use it I’m extremely frustrated that I have to make a call and stand there waiting for the voice prompts to read me info on my product). It was a really fun experience, and although we didn’t win anything we’re planning to build this out into a serious web service that people can use.

For future hackers: if you’re doing a mobile app, have it ready to demo on the judges’ phones. Let them whip out their cells and use it. We didn’t do this, and we realized after showing it to people later that day that the coolness factor is just about zero until someone can do it themselves. And good luck!

Another cool thing I saw at Hackathon was Mugasha.com – online electronic music sets from premier DJs. I’m jamming out to it right now! They release DJ sets (those hour-long musical journeys that DJs create by mixing many different tracks) in a track-by-track form in their music player. Finally, you can get both the awesomeness of these DJ sets and the convenience of knowing which song is being played, and jump to the songs you particularly like.

Finally, it was interesting to see a different interpretation of the “Hack day” concept from the Yahoo hack days I’ve been involved in. Yahoo hack days are known for their 90 second presentations, and I wish they had done that here as well. We had 4 minute presentations, and it was a lot harder to follow people’s main points! 90 seconds is an excellent time limit to explain hacks done in 24 hours. Schwag, pizza and beer were also notoriously missing… is the recession taking its toll?
Hmm, no, because they had sushi (which disappeared in a matter of minutes) and burritos in the afternoon. Possibly the lack of alcohol explains the productivity! The turnout was amazing – 25 teams in total! – which they managed by offering $200 to the student group that turned out the most entries. So they had CSUA, IEEE and UPE all working for them, which was utter genius.

Anyway, the sun is rising and I’m off to pick up my new Samsung 23″ HD monitor. My quest for desk domination through sheer pixel count is nearing completion, since I’m about to put down the third monitor on my desk. Once I upgrade to four by duplicating the current purchase (yeah, I’m waiting for the end-of-the-month paycheck) I’ll finally have 3840 by 2300 pixels of screen space on my desktop. 30″ monitors be damned!