
Wednesday, February 24, 2010

Fixing a versioning problem with CUDA 2.3

In an earlier posting, I observed that CUDA 2.3 wants to use GCC 4.3, which is a problem for Fedora 11 and Ubuntu 9.10. I've been itching to upgrade my distribution on my NVIDIA Linux box, and particularly itching to move to Ubuntu. I found some help on Thomas Moelhave's blog. Thanks, Thomas!

In addition to his instructions, I needed to install some stuff.
sudo aptitude install freeglut3 \
   freeglut3-dev libxmu-dev libxi-dev
Once I did that and completed his instructions, everything worked great. The rest of my Ubuntu 9.10 installation is completely intact and happy.
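
For anyone hitting the same GCC mismatch, the gist of the workaround is to install GCC 4.3 alongside Ubuntu 9.10's default GCC 4.4 and point nvcc at it. This is just a sketch with paths of my own choosing (and it assumes the gcc-4.3 packages are still in the archive); see Thomas's post for the full details.
# Install GCC 4.3 next to the default GCC 4.4
sudo aptitude install gcc-4.3 g++-4.3
# Make a directory where "gcc" and "g++" resolve to the 4.3 versions
mkdir -p ~/gcc43
ln -s /usr/bin/gcc-4.3 ~/gcc43/gcc
ln -s /usr/bin/g++-4.3 ~/gcc43/g++
# Tell nvcc to use that compiler directory when building
# (hello.cu stands in for whatever you are actually compiling)
nvcc --compiler-bindir=$HOME/gcc43 -o hello hello.cu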

Monday, October 12, 2009

Hacking CUDA and OpenCL on Fedora 10

I discovered Fedora 11 is not compatible with NVIDIA's CUDA toolkit (now on version 2.3; see note about driver version below) because the latter requires GCC 4.3 where Fedora 11 provides GCC 4.4. So I'll have to back down to Fedora 10. Here are some handy notes for setting up Fedora 10. I installed a number of RPMs to get CUDA to build.
sudo yum install eclipse-jdt eclipse-cdt \
freeglut freeglut-devel kernel-devel \
mesa-libGLU-devel libXmu-devel libXi-devel

The Eclipse stuff wasn't all necessary for CUDA but I wanted it.
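
For reference, here is the environment setup the toolkit typically needs once it's installed. This is a sketch assuming the default install locations, which is what the installers offered me; adjust if you put things elsewhere, and use lib64 on a 64-bit install.
# Put these in ~/.bashrc, assuming the toolkit installed to /usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
# Then build the SDK samples (the 2.3 SDK unpacks to this directory by default)
cd ~/NVIDIA_GPU_Computing_SDK/C
make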

In a comment to an earlier posting, Jesper told me about OpenCL, a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. NVIDIA supports this and has an OpenCL implementation which required updating my NVIDIA drivers to version 190.29, more recent than the version 190.18 drivers on NVIDIA's CUDA 2.3 page. When I installed 190.29, it warned me that it was uninstalling the 190.18 drivers.
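
A quick way to confirm the new driver actually took after the upgrade:
# Reports the version of the NVIDIA kernel module currently loaded
cat /proc/driver/nvidia/version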

Python enthusiasts will be interested in PyOpenCL.

NVIDIA provides a lot of resources and literature for getting started with OpenCL.

Tuesday, July 21, 2009

Building a GPU machine

I've been reading lately about what NVIDIA has been doing with CUDA and it's quite impressive. CUDA is a programming environment for their GPU boards, available for Windows, Linux, and Mac. I am putting together a Linux box with an NVIDIA 9600GT board to play with this stuff. The NVIDIA board cost me $150 at Staples. Eventually I intend to replace it with a GTX280 or GTX285, which both have 240 processor cores to the 9600GT's 64. I purchased the following from Magic Micro, which was about $300 including shipping:
Intel Barebones #2

* Intel Pentium Dual Core E2220 2.4 GHz, 800 MHz FSB, 1 MB cache
* Spire Socket 775 Intel fan
* ASRock 4Core1600, G31 chipset, 1600 MHz FSB, onboard video, PCI Express, sound, LAN
* 4 GB (2x2 GB) PC6400 DDR2 800, dual channel
* AC'97 3D full-duplex sound (onboard)
* Ethernet network adapter (onboard)
* Nikao Black Neon ATX case w/ side window & front USB
* Okia 550W ATX power supply w/ 6-pin PCI-E

I scavenged an old DVD-ROM drive and a 120-gig HD from an old machine, plus a keyboard, mouse, and 1024x768 LCD monitor. I installed Slackware Linux. I went to the CUDA download website and picked up the driver, the toolkit, the SDK, and the debugger.
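
Since Slackware isn't one of the distributions NVIDIA packages for, the installs are just the .run files straight from the download page, roughly like this (the exact file names depend on the versions you grab):
# As root, run the driver installer from a text console with X shut down
sh NVIDIA-Linux-x86*.run
# The toolkit installs under /usr/local/cuda by default (also run as root)
sh cudatoolkit_*_linux.run
# The SDK unpacks into your home directory; run this as your normal user
sh cudasdk_*_linux.run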

This is the most powerful PC I've ever put together, and it was a total investment of just a few hundred dollars. For many years I've drooled at the prospect of networking a number of Linux boxes and using them for scientific computation, but now I can do it all in one box. It's a real live supercomputer sitting on my table, and it's affordable.

I am really starting to like NVIDIA. They provide a lot of support for scientific computation. They are very good about sharing their knowledge. They post lots of videos of scientific uses for their hardware.
NVIDIA's SDK includes several demos, some of them visually attractive: n-body, smoke particles, a Julia set, and a fluid dynamics demo. When running the n-body demo, the 9600GT claims to be going at 125 gigaflops.
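
To see that number yourself, the samples build from the top of the SDK tree; the paths here assume the SDK's default install location for this release, and if memory serves the -benchmark flag is what prints the gigaflops figure.
# Build all the SDK samples, then run the n-body demo in benchmark mode
cd ~/NVIDIA_CUDA_SDK
make
./bin/linux/release/nbody -benchmark
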
A few more resources...

Friday, July 10, 2009

Moore's Law and GPUs

Way back when, Gordon Moore of Intel came up with his "law" that the number of transistors that fit on a chip would double roughly every two years (the figure is often quoted as 18 months). Currently chip manufacturers use a 45 nm process and are preparing to move to a 32 nm process; the International Technology Roadmap for Semiconductors lays all this out. As feature sizes shrink, we need progressively more exotic technology to fabricate chips. The ITRS timeframe for a 16 nm process is 2018, well beyond the expectation set by Moore's Law. There is a lot of punditry around these days about how Moore's Law is slowing down.

That's process technology. The other way to improve computer performance is processor architecture. As advances in process technology become more expensive and less frequent, architecture plays an increasingly important role. It's always been important, and in the last 20 years, microprocessors have taken on innovations that had previously appeared only in big iron: microcode, RISC, pipelining, caching of instructions and data, and branch prediction.

Every time process technology hits a bump in the road, it's a boost for parallelism. In the 1980s, a lot of start-ups tried to build massively parallel computers. I was a fan of Thinking Machines in Cambridge, having read Danny Hillis's PhD thesis. The premise of these machines was to make thousands of processors, individually fairly feeble, arranged in a broadcast architecture. The Transputer chip was another effort in a similar direction. One issue then was that people wanted compilers that would automatically parallelize code written for serial processors, but that turned out to be an intractable problem.

Given the slowing of Moore's Law these days, it's good to be a GPU manufacturer. The GPU guys never claim to offer a parallelizing compiler -- one that can be applied to existing code written for a serial computer -- instead they just make it very easy to write new parallel code. Take a look at NVIDIA's GPU Gems, and notice there's a lot of math and very little code. Because you write GPU code in plain old C, they don't need to spend a lot of ink explaining weird syntax.

Meanwhile, the scientific community has realized over the last five years that, despite the unsavory association with video games, GPUs now offer the most bang for the buck available in commodity computing hardware. Reading about NVIDIA's CUDA technology just makes me drool. The claims are that for scientific computation, an inexpensive GPU represents a speed-up of 20x to 100x over a typical CPU.

When I set out to write this, GPUs seemed to me like the historically inevitable next step. Having now recalled some of the earlier pendulum swings between process technology and processor architecture, I see that would be an overstatement of the case. But certainly GPU architecture and development will be important for those of us whose retirements are yet a few years off.

Friday, May 29, 2009

Molecular modeling with Hadoop?

Hadoop is Apache's implementation of the MapReduce distributed computing scheme pioneered by Google. Amazon rents out Hadoop services on its cluster. It's fairly straightforward to set up Hadoop on a cluster of Linux boxes. Having a long-standing interest in distributed computing approaches to molecular modeling, I have been trying to figure out how Hadoop could be applied to very large-scale molecular simulations.

MapReduce is great for problems where large chunks of computation can be done in isolation. The difficulty with molecular modeling is that every atom is pushing or pulling on every other atom on every single time step. The problem doesn't nicely partition into large isolated chunks. One could run a MapReduce cycle on each time step, but that would be horribly inefficient - on each time step, every map job needs as input the position and velocity of every atom in the entire simulation.
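
To make the bookkeeping concrete, one time step would look something like the Hadoop streaming job sketched below. The mapper and reducer scripts are hypothetical placeholders, and the point is that the entire atom state has to be shipped in and written back out on every single step.
# One full MapReduce cycle per time step; compute_forces.py and
# integrate_step.py are hypothetical scripts, not real code
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
    -input atoms/step_0000 \
    -output atoms/step_0001 \
    -mapper compute_forces.py \
    -reducer integrate_step.py \
    -file compute_forces.py -file integrate_step.py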

There are existing solutions like NAMD, which uses DPMTA for the long-range forces between atoms. For a cluster of limited size these are the appropriate tools. For large clusters with hundreds or thousands of machines, the rate of hardware failures becomes a consideration that can't be ignored.

MapReduce provides a few principles for working in the very-large-cluster domain:
  • Let your infrastructure handle hardware failures, just as the Internet invisibly routes around dead servers.
  • Individual machines are anonymous. You never write application code that directly addresses an individual machine.
  • Don't waste too much money trying to make the hardware more reliable. It won't pay off in the end.
  • Use a distributed file system that reliably retains the inputs to a task until that task has been successfully completed.

Could the tasks that NAMD assigns to each machine be anonymized with respect to which machine they run on, and the communications routed through a distributed filesystem like Hadoop's HDFS? Certainly it's possible in principle. Whether I'll be able to make any reasonable progress on it in my abundant spare time is another matter.