Sunday, November 18, 2012

Setting up an RDF server in VirtualBox, part 1

VirtualBox is open-source virtualization software that you can use on Windows, Mac OS X, or Linux to run another operating system as a virtual machine (VM). Here I'll be using VBox on an Ubuntu Linux desktop machine to set up an Ubuntu server VM. Why do that when the two operating systems are so similar? It keeps the two environments separate, and it shows me what I'll need when I move the server to a VPS.

I'm doing this on a laptop with a 160 gig partition with Ubuntu 10.04 desktop. I can comfortably dedicate 20 gig to the virtual hard disk image. The VM will run Ubuntu 12.04 server. No shared folders because they won't be available on the VPS.

Make sure you have a good fast Internet connection for the Ubuntu desktop machine, and download the Ubuntu server ISO. It's available as 32-bit or 64-bit. If you're not sure about your CPU, you're probably better off with 32-bit. Set up VirtualBox:
$ sudo apt-get install virtualbox-ose

Now you'll find "VirtualBox OSE" in the Applications->Accessories menu in the upper left of the screen. Click on that, and when the window comes up, click on the light blue "New" icon. Pick a machine name, choose Linux/Ubuntu as the virtual machine type, and give yourself a decent amount of RAM and hard disk space. 20 or 30 gig for the hard drive is nice if you can spare it. Once the VM is created, go into its settings, click "Storage", and click the third line, the one with the CDROM icon. To the right of "CD/DVD device", click the yellow folder icon and navigate to the Ubuntu server ISO that you downloaded earlier. Set "Network" to the "Bridged adapter" option so the VM gets its own address on your local subnet. If you want a shared folder anyway, despite the VPS caveat above, add your home directory (read only) under "Shared Folders" and give it a name you'll remember. Before I could start the VM on my laptop, I found I needed to go into Settings->System->CPU and enable PAE. Now it's time to start the VM and install Ubuntu server on the virtual hard disk you've created.

During the installation process, you'll be asked what kinds of servers you want to run. Select "OpenSSH server" (so you can ssh/scp into the VM) and "LAMP server" (to get Apache and MySQL) and "Tomcat Java server" (to pick up a bunch of Java stuff you'll want for Jena). If you select an empty password for the root user on MySQL, you'll need to enter it multiple times, so you may want to select something almost as trivial like "root". You don't need to worry too much about security with a VM that will only be reachable on the local subnet.

When the installation is done, the machine will reboot, and the VM window will close and re-open. Go to the "Devices" menu at the top and, under "CD/DVD devices", uncheck the Ubuntu server ISO.

On to Jena. There seems to be some good advice here, but some of it is dated, so I'm tweaking it a bit. For one thing, Fuseki has since replaced Joseki as Jena's SPARQL server: http://ricroberts.com/articles/installing-jena-and-joseki-on-os-x-or-linux

Log into the virtual machine and type:
$ sudo chmod 777 /opt    # lazy, but this VM is only reachable on the local subnet
$ cd /opt
$ wget http://www.apache.org/dist/jena/binaries/apache-jena-2.7.4.tar.gz
$ wget http://www.apache.org/dist/jena/binaries/jena-fuseki-0.2.5-distribution.tar.gz
$ for x in *.gz; do tar xfz "$x"; done
$ rm *.gz
$ mv apache-jena-2.7.4 apache-jena
$ mv jena-fuseki-0.2.5 jena-fuseki
$ chmod u+x jena-fuseki/s-*

Add these lines to your .bashrc:
export FUSEKIROOT="/opt/jena-fuseki"
export JENAROOT="/opt/apache-jena"
export PATH="$FUSEKIROOT:$PATH"
# Java expands dir/* to every jar in dir; a literal *.jar is not expanded
export CLASSPATH=".:$JENAROOT/lib/*:$FUSEKIROOT/*"

You'll want to copy some client-side Ruby scripts from the server's Fuseki directory to your host machine. My VM is at 192.168.2.7, so on the host machine I typed:
$ scp 192.168.2.7:/opt/jena-fuseki/s-* .
I also needed to install Ruby on the host machine.

Now you can start up the Fuseki server and load it with some data. The docs for Fuseki are here. On the server:
$ cd /opt/jena-fuseki
$ ./fuseki-server --update --mem /dataset

This starts an empty database of RDF triples. The database is in-memory and non-persistent, so it will vanish when you hit Control-C. Back on the host machine, you can load some data into it:
$ # s-put replaces the contents of a graph, s-post adds to it
$ ./s-put http://192.168.2.7:3030/dataset/data default family.rdf
$ ./s-post http://192.168.2.7:3030/dataset/data default wares.rdf

Together these make up a small semantic graph about who in my family is married and whose kids are whose. To make sure the data was actually stored, we can query it:
$ ./s-get http://192.168.2.7:3030/dataset/data default

This prints out the entire database in Turtle, a simpler subset of N3. Or we can get the same thing in JSON:
$ ./s-query --service http://192.168.2.7:3030/dataset/query 'SELECT * {?s ?p ?o}'
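The s-* scripts are thin wrappers around plain HTTP, so the same query can be sent from any language. Here's a minimal Node.js sketch, assuming a reasonably recent Node and the VM address and /dataset name used above. The helper names (buildQueryRequest, bindingsToRows, runQuery) are my own and are not part of Fuseki. The sketch relies only on the standard SPARQL 1.1 Protocol, which puts the query in a query= URL parameter, and on the standard SPARQL JSON results format.

```javascript
// Sketch: query Fuseki over plain HTTP using the SPARQL 1.1 Protocol,
// then flatten the standard SPARQL JSON results into simple objects.
// SERVICE is the VM address used in this post; change it to suit.
const http = require('http');

const SERVICE = 'http://192.168.2.7:3030/dataset/query';

// A SELECT can be sent as a GET with the query in the "query" parameter.
function buildQueryRequest(service, query) {
  return {
    url: service + '?query=' + encodeURIComponent(query),
    headers: { Accept: 'application/sparql-results+json' },
  };
}

// {head: {vars}, results: {bindings}} -> [{s: '...', p: '...', o: '...'}, ...]
function bindingsToRows(results) {
  const vars = results.head.vars;
  return results.results.bindings.map((binding) => {
    const row = {};
    for (const v of vars) {
      // A variable left unbound in a row is simply missing from its binding.
      row[v] = binding[v] ? binding[v].value : null;
    }
    return row;
  });
}

// Send the query and hand the flattened rows to callback(err, rows).
function runQuery(service, query, callback) {
  const req = buildQueryRequest(service, query);
  http.get(req.url, { headers: req.headers }, (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      if (res.statusCode !== 200) {
        callback(new Error('HTTP ' + res.statusCode + ': ' + body));
        return;
      }
      let parsed;
      try {
        parsed = JSON.parse(body);
      } catch (err) {
        callback(err);
        return;
      }
      callback(null, bindingsToRows(parsed));
    });
  }).on('error', callback);
}
```

From the host machine, something like runQuery(SERVICE, 'SELECT * {?s ?p ?o}', (err, rows) => console.log(err || rows)) should print the same triples that s-query shows, one object per row. Nothing is sent until runQuery is called.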

I can see that I don't have the time and energy to accomplish everything I'd hoped to in one sitting. So this is part 1. Part 2 will follow later, and I'll make sure the two posts link to each other.

Tuesday, November 13, 2012

A Semantic Network of Patient Data

This idea has two inspirations. One is this TED talk by Dave deBronkart or "e-Patient Dave". The other is the work that has been done on the semantic web and linked data.

Dave's talk is about patients taking control of their medical records and sharing them with other like-minded patients, so that they can learn from one another's histories and experiences. Some of these patients, including Dave, had terminal diagnoses and were able to improve or resolve those conditions because of having shared data with others.


The semantic web is the idea of formatting information so that computers can do more with it than simply store it or transmit it or display it on a screen. Computers can understand the meaning of the information much as a human would, so they can reason about it and draw new conclusions that aren't already spelled out. I first learned about it in a 2001 article in Scientific American. There are some more details here. I've blogged in the past about some of the basic ideas.

In the semantic web, all "things" (nouns, basically) are assigned URIs (web addresses). Relationships between things (and relationships are also things) are represented in RDF, where every statement is a triple made of a subject, a predicate, and an object, each normally a URI. These statements are often printed or transmitted in XML, but the N3 language is more readable for people. Typical relationships look something like this (simplified, not literal N3 syntax):
  Will, town, "Framingham MA".
  Will, name, "William Ware".
  Will, pet, cat#12345.
  cat#12345, name, "Kokopelli".
  cat#12345, birthyear, 2003.
Strings ("William Ware", "Kokopelli") and numbers (2003) can be raw data, which RDF calls literals; everything else is a URI. The idea is that a URI connects you to the rest of the semantic web of meaning, so if you don't know what a "pet" is, you can follow that URI, or query other triples with "pet" in them, to find out more.
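To make the triple idea a bit more concrete, here's a toy in-memory triple store in JavaScript (assuming a reasonably recent Node.js), loaded with the statements above. It's a sketch, not how real RDF stores work. The http://example.org/ URIs are hypothetical stand-ins for real ones, and the match function, where null plays the role of a query variable, is my own illustration.

```javascript
// Toy triple store: every statement is [subject, predicate, object].
// URIs are plain strings; literals are JS strings or numbers.
// The example.org URIs are hypothetical.
const EX = 'http://example.org/';

const triples = [
  [EX + 'Will', EX + 'town', 'Framingham MA'],
  [EX + 'Will', EX + 'name', 'William Ware'],
  [EX + 'Will', EX + 'pet', EX + 'cat#12345'],
  [EX + 'cat#12345', EX + 'name', 'Kokopelli'],
  [EX + 'cat#12345', EX + 'birthyear', 2003],
];

// Return every triple matching the pattern; null acts as a wildcard,
// much like a variable in a SPARQL query.
function match(store, s, p, o) {
  return store.filter(([ts, tp, to]) =>
    (s === null || ts === s) &&
    (p === null || tp === p) &&
    (o === null || to === o));
}

// "Query other triples with 'pet' in them":
const petTriples = match(triples, null, EX + 'pet', null);

// Follow the link: what is Will's pet called?
const petName = match(triples, petTriples[0][2], EX + 'name', null)[0][2];
console.log(petName); // Kokopelli
```

Real RDF databases answer the same kind of pattern query, only with indexes instead of a linear scan, and with SPARQL as the query language.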

You might wonder if it's silly to have such a primitive representation for knowledge. It allows the same kinds of economies of scale that we get by representing information in a computer with ones and zeroes. Because the format is so simple and uniform, we can build processing architectures that can be very efficient, and people have been doing that for over ten years. We have scalable databases for RDF, and when we set up rules that mimic set theory, we can build reasoning engines that extract new conclusions from the data.
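Here's roughly what "rules that mimic set theory" can look like, as a deliberately naive forward-chaining reasoner in JavaScript (assuming a reasonably recent Node.js). It applies two RDFS-style rules until nothing new turns up. The first is that subClassOf is transitive (a subset of a subset is a subset). The second is that a member of a class is also a member of every superclass. The class names are hypothetical, and the short predicate names stand in for the real RDFS URIs.

```javascript
// Naive forward chaining over [subject, predicate, object] triples.
// Rule 1 (transitivity): A subClassOf B, B subClassOf C  =>  A subClassOf C
// Rule 2 (inheritance):  x type A,      A subClassOf B  =>  x type B
// Short predicate names stand in for rdfs:subClassOf and rdf:type.
const SUB = 'subClassOf';
const TYPE = 'type';

function infer(input) {
  const triples = input.slice();
  const seen = new Set(triples.map((t) => JSON.stringify(t)));
  let added = true;
  while (added) {              // keep going until a pass adds nothing new
    added = false;
    const snapshot = triples.slice();
    for (const [a, p1, b] of snapshot) {
      if (p1 !== SUB && p1 !== TYPE) continue;
      for (const [b2, p2, c] of snapshot) {
        if (p2 !== SUB || b2 !== b) continue;
        const t = [a, p1, c];  // same predicate as the first triple
        const key = JSON.stringify(t);
        if (!seen.has(key)) {
          seen.add(key);
          triples.push(t);
          added = true;
        }
      }
    }
  }
  return triples;
}

// Hypothetical example data.
const facts = [
  ['Kokopelli', TYPE, 'Cat'],
  ['Cat', SUB, 'Mammal'],
  ['Mammal', SUB, 'Animal'],
];
const everything = infer(facts);
console.log(everything.slice(facts.length)); // only the new conclusions
```

From three stated facts it concludes, among other things, that Kokopelli is of type Animal, which nobody spelled out. Real reasoning engines reach the same answers far more efficiently, for example by only re-examining triples added on the previous pass.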

When data is formatted with an appropriate ontology, it can be searched in rich complex ways, and computers can look for patterns and correlations that a human might not notice. When applied to patients' medical data, the results might be new medical knowledge or new treatment options.

There are other ways to find new information hidden in patient data. Semantic web technology is great for pure logic, but for quantitative measures (a dosage increase in this medication seems to cause a decreased amount of that neurotransmitter) we can turn to machine learning, where progress in the last decade or two has been explosive, given the data available on the web and the economic rewards for finding patterns in it.
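The simplest quantitative starting point, well short of machine learning, is a correlation coefficient. As an illustration only, here's the Pearson correlation coefficient in JavaScript (assuming a reasonably recent Node.js). It measures how closely two series follow a straight-line relationship. The dose and level numbers are invented toy values, not data from anywhere. A strong correlation on its own doesn't show that one thing causes the other.

```javascript
// Pearson correlation coefficient r, between -1 and 1.
// r near -1: as x goes up, y tends to go down; near 0: no linear relationship.
// If either series is constant, the result is NaN (division by zero).
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('need two equal-length series with at least 2 points');
  }
  const n = xs.length;
  const mean = (v) => v.reduce((a, b) => a + b, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Toy numbers, invented for illustration only.
const dose = [10, 20, 30, 40];
const level = [8, 6, 4, 2];
console.log(pearson(dose, level)); // -1: a perfect inverse linear relationship
```

Machine learning picks up where this leaves off: many variables at once, nonlinear relationships, and far more data than anyone would eyeball.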

An idea I've blogged about in the past (and spoken about at a couple of very small conferences) is applying this to general scientific literature, with the goal of hastening scientific progress and in particular medical progress (since I'm an old fart now and interested in that sort of thing).

If this topic interests you and you wish to discuss it, I'm starting a Google Groups forum for that purpose.

UPDATE: I've discovered that there is a company in Cambridge, MA called PatientsLikeMe which already pools patient data into a database, and sells subscriptions to that database. I don't know if they place the same emphasis on machine-tractable formats that I've done above. But knowing that somebody is doing it on a commercial basis, I don't see much point in trying to replicate that effort in my evenings and weekends.

Thursday, November 08, 2012

Node.JS on the Raspberry Pi

Most of this procedure is taken from a posting on Github by Sander Tolsma. His post is a little bit old and some of the steps he included can be skipped because the versions of things have become better synchronized. So very briefly, here is what to do, assuming you've successfully booted into Raspbian.

$ sudo apt-get install git-core build-essential
$ # IIRC, build-essential is already present on Raspbian
$ git clone https://github.com/joyent/node.git
$ cd node
$ git checkout v0.8.14-release
$ #             ^^^^^^ update to most recent stable version
$ ./configure
$ make        # this takes a while
$ sudo make install

Voila, you're done. Type "node" at the Linux prompt and you'll get Node's ">" prompt. Then you can type in JavaScript and watch it run interactively, or you can create a file of JavaScript and run it.

pi@raspberrypi ~ $ cat > foo.js
for (var i = 0; i < 3; i++)
  console.log(i);
^D
pi@raspberrypi ~ $ node foo.js
0
1
2

I'd like to bring events from hardware into Node so that I can attach handlers to them, just like DOM event handlers in a browser. One approach would be to run an HTTP server in Node and set up endpoints for the events I want to handle, but that sounds like quite a bit of overhead for hardware events.

Alternatively, I could do what looks like the right thing, involving eventfd and its cohorts. I need to dig into the Node source code to see how to do that, and do more research in general.
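Whatever the event source turns out to be, Node's built-in EventEmitter already provides the DOM-style handler model. Here's a minimal sketch that polls a pin-reading function and emits 'change' events. It assumes a modern Node.js; the class syntax won't run on the v0.8 build above. The readPin function and the PolledPin class are hypothetical. On the Raspberry Pi, readPin might read a sysfs file such as /sys/class/gpio/gpio17/value, but that is an assumption I haven't tried. Polling is exactly the overhead eventfd would avoid, so treat this as a placeholder, not the real thing.

```javascript
const EventEmitter = require('events');

// Wraps a hardware input as an EventEmitter so handlers can be attached
// the way DOM handlers are: pin.on('change', fn).
// readPin is a hypothetical function returning the current value (0 or 1).
class PolledPin extends EventEmitter {
  constructor(readPin, intervalMs) {
    super();
    this.readPin = readPin;
    this.intervalMs = intervalMs;
    this.last = readPin();     // initial reading; no event for it
    this.timer = null;
  }

  // Read once and emit 'change' (new value, old value) if it differs.
  poll() {
    const value = this.readPin();
    if (value !== this.last) {
      const previous = this.last;
      this.last = value;
      this.emit('change', value, previous);
    }
  }

  start() {
    if (!this.timer) this.timer = setInterval(() => this.poll(), this.intervalMs);
    return this;
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
    return this;
  }
}
```

Usage would be something like new PolledPin(readPin, 10).start().on('change', (v) => console.log('pin is now', v)). Swapping the polling loop for a real interrupt-driven source later wouldn't change the handler code at all, which is the point of using EventEmitter.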

Friday, October 26, 2012

NodeJS and JSHint on Fedora

Yesterday I blogged that it's a hassle to install these on Fedora. Apparently I was suffering from brain fog. It's not so bad once you do enough research and stumble across the right advice online.

First, if you're a bonehead and you've made a mess trying unsuccessfully to install v8/nodejs/npm/jshint eight or ten times already, clean things up:

sudo yum -y remove v8

The repository for picking up nodejs, npm, and v8 is http://nodejs.tchol.org/ which you can enable on your system as follows:

sudo yum localinstall --nogpgcheck \
  http://nodejs.tchol.org/repocfg/fedora/nodejs-stable-release.noarch.rpm

You want to avoid installing the wrong version of the V8 JavaScript engine, so edit /etc/yum.repos.d/fedora-updates.repo and add the line

exclude=v8*

to the "[updates]" section.

Now you're ready to install everything:

sudo yum install npm
sudo npm install jshint -g

Thursday, October 25, 2012

Setting up Ubuntu 12.04

My Linux distribution of preference is Ubuntu. Debian did a nice job on the package manager and Canonical did a nice job making it user-friendly. But there are some things in the 12.04 desktop version that I don't need, like the Unity interface. So here's what I like to do to make 12.04 a little friendlier. This is particularly worthwhile when running Ubuntu in VirtualBox, which is a necessity for me. My employer is a Fedora shop, I need Node.js to run JSHint to sanity-check our JavaScript code, and running Node.js on Fedora is a big hassle.

I'm starting with a 40 gig disk image and for networking, a bridged adapter, so that when I run the server on the Ubuntu instance it will be accessible on the rest of the subnet. So first let's get rid of that silly Unity interface. Open a terminal window and run:

$ sudo apt-get install gnome-panel

Close the terminal window and log out. On the login screen, to the right of your name, you'll see a circular logo. Click on that to get a menu and select "GNOME Classic". Type in your password, and notice your blood pressure gently lowering as the familiar old Ubuntu desktop appears before you.

Assuming you're on a virtual machine, you'll really want to stop the screen locking up and requiring your password. So click on the gear in the upper right and select "System Settings", then select "Brightness and Lock". Toggle the "Lock" switch to "OFF" and uncheck the box "Require my password when waking from suspend". Dismiss that window.

Now install Node.js, JSHint, and a few other conveniences.

$ sudo su -
# apt-get install python-software-properties
# add-apt-repository ppa:chris-lea/node.js
# # You'll need to hit the Enter key to continue...
# apt-get update
# apt-get install nodejs npm
# npm install jshint -g
# apt-get install vim git gitk emacs subversion meld
# apt-get install apache2 mysql-server mysql-client php5 libapache2-mod-php5
# ^D
$

I like to package up my .ssh directory into a tarball and bring it to new machines when I set them up. Having stored my id_dsa.pub key in the .ssh/authorized_keys file on the Subversion server, I don't get constantly bothered to supply a password for every Subversion operation.

By all means set up a shared folder with your host machine. I map the host machine's /home/wware directory to a directory in /media. I need to be root to access it but it's still the easiest way to move things back and forth.

Friday, October 05, 2012

Blogging from a Raspberry Pi board

I've plugged an Ethernet cable into the Raspberry Pi and brought up the Midori web browser (which I had previously never heard of). As a web-browsing experience, the RPi is extremely slow, but it works. Google's new authentication scheme is a little dubious about Midori, not too surprising. But it rendered Facebook readably and I was able to look at some of my pictures on Flickr.

I intend to find some interesting hardware hack for the board, probably a musical instrument of some sort. Of course it won't be running X Windows at that point. If I get ambitious I might try to figure out how to create a stripped-down X-less Raspbian distribution. I've done a little bit of distro hacking on Ubuntu in the past.

Even without X, I'm concerned about performance. Linux isn't normally used for real-time work, but apparently I'm not the first person to wonder whether that's feasible. Well, I think I've exhausted my patience here.

Back on a normal computer. That was interesting, but enough is enough. Any RPi hacking that reaches a state of readiness for public consumption will be posted on Github, and notifications will appear here.

Before I forget, one handy note to other Yanks thrown by Raspbian's curious keyboard mapping. You might have skipped over most of the options in raspi-config as I did (the first thing that appears on your screen after all those boot messages finish). Look for "keyboard configuration" and look for the canonical U.S. keyboard choices.
Here's a custom search engine for Raspberry Pi stuff, courtesy of Google.

Friday, September 28, 2012

Fun with the Raspberry Pi

If you've heard about the Raspberry Pi, (wikipedia, elinux.org) a $35 single-board Linux computer, you probably won't learn much new here. The main point of this post is that I got mine to boot, so I do have one or two small bits of advice to pass on to those working toward that goal. First, photographic evidence (veracity: if I were faking it I wouldn't put a big reflected flash in the middle of the screen).

Here's the board booting. Booting (and everything else) is a little slower on the Raspberry Pi than you're probably accustomed to. But then, hey, it's $35, and you can stuff it into whatever piece of hardware you're building and have full-blown Linux with X Windows. So live with it.

My experience has been that, at least for now, it's not really $35. Here's what happens: the Raspberry Pi folks build a bunch of boards and sell them for $35, but they get picked up by people who want to resell them, so you end up getting yours on eBay for somewhere in the $55 to $70 range. Eventually this mischief will end and the price will stabilize.

Photographic evidence number two: the board has booted into X Windows. This is using the recommended-for-beginners Debian-based distribution, 2012-09-18-wheezy-raspbian.zip, which unpacks into 2012-09-18-wheezy-raspbian.img. Obviously the date in that name will be updated periodically.

So here are the tips for fellow beginners:
  • First, remember this board is designed for complete novices, and even, heaven help us, artists. Do not despair, you ARE smart enough to get it working.
  • The Raspberry Pi folks warn you away from micro-SD cards, but the one I'm using works fine.
  • You may find that your keyboard is mapped in a funny way. My (@) key and (") key were swapped, and the (#) key was mapped to a British pound sign (£). One solution to this appears here. Mine was to create the file /home/pi/.xsessionrc which contained this line:
setxkbmap us -option grp:ctrl_shift_toggle

A better approach to the keyboard issue is one of the options in raspi-config (the first thing that appears on your screen after all those boot messages finish). Look for "keyboard configuration" and select the canonical U.S. keyboard choices.

Everything else seems to be working, but I haven't tried to do much yet. I have no idea how to talk to GPIOs or other peripherals yet, but I've done that with other Linux boards and expect that my past experience will get me there pretty painlessly. That, and there is a HUGE community for this thing.

Looking forward to checking out Adafruit's WebIDE when time permits. The development system runs as a web server on the board, and you develop in the browser on your laptop over a network connection.



Thursday, September 27, 2012

Cool new 3D printers

I don't want to fall into the habit of only blogging once per year about MakerFaire. So this post is actually about a crop of cool new 3D printers, and I'll probably see a few of them there, but it's not about MakerFaire proper. These all fall in the $1500 to $2500 price range.


First up is Makerbot's Replicator 2. There is some controversy around this one, because it's a mix of open source technology under the GPL, and some new technology that's very likely not open source, which allows for a much higher print quality. The open source 3D printer advocates are concerned that it violates the GNU General Public License. The open source technology is primarily the work of Adrian Bowyer who started the RepRap project, and he's given (unenthusiastic) permission to Makerbot to use it.

One of the RepRap enthusiasts is my friend Jeff, who will have a table at MakerFaire this year to show off the printer that has occupied two or three years of his nights and weekends. I like Jeff, and I suspect he won't be too happy with Makerbot's decision to include closed-source technology. But the step up in quality for the price is pretty appealing to a non-GPL-purist like myself. After all, I don't worry about running GPL software on closed-source laptops.

Second is the FORM1 from some Media Lab folks. I don't know much about these folks or their history, but the Media Lab has been at the cutting edge of high-end 3D printing for a couple of decades now, so they've probably got something pretty interesting. I think their raw material is a liquid rather than the long plastic spaghetti sticks used by most other affordable machines (based on one photo on their Kickstarter page). This is the most expensive of the lot, price currently listing as $2500.

Third is the UP!Plus from 3D Printing Systems. Their output doesn't look as nice as the Replicator 2 or the FORM1, but they are at the more affordable end.


What's cool about all these printers, and some other new ones, is that their user-friendliness and output quality have been improving rapidly in recent years. Before long, these things will be popping up in homes, dorm rooms, high schools, and the local mall.

MakerFaire NYC 2012 is this weekend, and I'll be there to check out 3D printers, microcontroller boards, art installations, and whatever else is around. I'll blog about what I see.