Sunday, November 18, 2012

Setting up an RDF server in VirtualBox, part 1

VirtualBox is an open-source virtual machine that you can use on Windows, Mac OSX, or Linux to run one of the other operating systems. Here I'll be using VBox on an Ubuntu Linux desktop machine to set up an Ubuntu server machine. The point in doing that when the two operating systems are so similar is to keep the two environments separate, and to discover what I'll need when I move the server to a VPS.

I'm doing this on a laptop with a 160 gig partition with Ubuntu 10.04 desktop. I can comfortably dedicate 20 gig to the virtual hard disk image. The VM will run Ubuntu 12.04 server. No shared folders because they won't be available on the VPS.

Make sure you have a good fast Internet connection for the Ubuntu desktop machine, and download the Ubuntu server ISO. It's available as 32-bit or 64-bit. If you're not sure about your CPU, you're probably better off with 32-bit. Set up VirtualBox:
$ sudo apt-get install virtualbox-ose

Now you'll find "VirtualBox OSE" in the Applications->Accessories menu in the upper left of the screen. Click on that, and when the window comes up, click on the light blue "New" icon. Pick a machine name, and Linux/Ubuntu as the virtual machine type, and give yourself a decent amount of RAM and hard disk space. It's nice to start the hard drive with 20 or 30 gig if you can spare it. Once the VM is created, go into settings for it, click "Storage" and click the third line with the CDROM icon. To the right of "CD/DVD device" click the yellow folder icon and navigate to the Ubuntu server ISO that you downloaded earlier. Set "Network" to the "Bridged adapter" option. Under shared folders, add your home directory (read only) and give it a name you'll remember. Now it's time to start the VM and install Ubuntu server on the virtual hard disk you've created. Before I could do this on my laptop, I found I needed to go into Settings->System->CPU and enable PAE.

During the installation process, you'll be asked what kinds of servers you want to run. Select "OpenSSH server" (so you can ssh/scp into the VM) and "LAMP server" (to get Apache and MySQL) and "Tomcat Java server" (to pick up a bunch of Java stuff you'll want for Jena). If you select an empty password for the root user on MySQL, you'll need to enter it multiple times, so you may want to select something almost as trivial like "root". You don't need to worry too much about security with a VM that will only be reachable on the local subnet.

When the installation is done, the machine will reboot, and the VM window will close and re-open. Go to the "Devices" menu at the top and under "CD/DVD devices", unclick the Ubuntu server ISO.

On to Jena. Looks like there is some good advice here, but some of it is dated so I'm tweaking it a bit: http://ricroberts.com/articles/installing-jena-and-joseki-on-os-x-or-linux

Log into the virtual machine and type:
$ sudo su -
# chmod 777 /opt
# exit
$ cd /opt
$ wget http://www.apache.org/dist/jena/binaries/apache-jena-2.7.4.tar.gz
$ wget http://www.apache.org/dist/jena/binaries/jena-fuseki-0.2.5-distribution.tar.gz
$ for x in *.gz; do tar xfz $x; done
$ rm *gz               
$ mv apache-jena-2.7.4 apache-jena
$ mv jena-fuseki-0.2.5 jena-fuseki
$ chmod u+x jena-fuseki/s-*

Add these lines to your .bashrc:
export FUSEKIROOT="/opt/jena-fuseki"
export JENAROOT="/opt/apache-jena"
export PATH="$FUSEKIROOT:$PATH"
export CLASSPATH=".:$JENAROOT/lib/*.jar:$FUSEKIROOT/*.jar"

You'll want to copy some client-side Ruby scripts from the server's Fuseki directory to your host machine. My VM is at 192.168.2.7, so on the host machine I typed:
$ scp 192.168.2.7:/opt/jena-fuseki/s-* .
I also needed to install Ruby on the host machine.

Now you can start up the Fuseki server and load it with some data. The docs for Fuseki are here. On the server:
$ cd /opt/jena-fuseki
$ ./fuseki-server --update --mem /dataset

This starts an empty database of RDF triples. This database is in-memory and non-persistent, and will vanish when you control-C. Back on the host machine, you can enter some data into the database:
$ ./s-put http://192.168.2.7:3030/dataset/data default family.rdf
$ ./s-put http://192.168.2.7:3030/dataset/data default wares.rdf

This is a small semantic graph talking about who in my family is married and whose kids are whose. To  make sure the data was actually stored, we can query it.
$ ./s-get http://192.168.2.7:3030/dataset/data default

This prints out the entire database in Turtle, an update of N3. Or we can get the same thing in JSON:
./s-query --service http://192.168.2.7:3030/dataset/query 'SELECT * {?s ?p ?o}'

I can see that I don't have the time and energy to get everything done in one sitting that I hoped I would accomplish. So this is part 1, and a part 2 will follow later, and I'll make sure they include links to each other.

No comments: