Monday, March 21, 2011

March 2011 trip to Shanghai, Suzhou

Last week I was in Suzhou, a city a little west of Shanghai, and took some photos. Very interesting place with a lot of rather ancient history. I liked very much the Humble Administrator's Garden. It's quite large, with several small buildings and waterways and paths, and very pretty as you can see here.

Suzhou is a very pleasant place. I felt quite safe walking around after dark. There are lots of little outdoor markets. I stopped at one to get some squid on a stick, which was spicy and tasty. My photo of the squid-on-a-stick guy is unfortunately a little blurry.
I'm still kind of tired with jet lag. When my energy level is a little higher I will add more stuff to this. Generally it left me with a very positive impression of mainland China, which was a surprise as I'd been told to expect it to be a bit backward culturally. We were there for electronics manufacturing and there was certainly plenty of that, and plenty of heavy industry in the Shanghai area. Lots of construction, lots of big cranes all over the place.

Wednesday, March 09, 2011

AT91SAM7S and Android help you bang bits

There are plenty of test instruments (oscilloscopes, logic analyzers, spectrum analyzers, etc) where you plug some hardware into your laptop's USB port, and the laptop screen shows a display that would have appeared on a cathode-ray tube in decades past. It's very cool that we can do this, and these USB instruments are much more affordable (and much much easier to carry) than the old-school stuff that I grew up with.

The BluetoothBitBang is a gadget that comprises two boards from Sparkfun Electronics. One is a AT91SAM7S-64 header board, the other is a Bluetooth serial interface. You can see there are also some AA batteries in there to power the thing. This connects over Bluetooth to your phone, running a free app available on the Android Market. You can use buttons on your phone's screen to set or clear six output bits, and you can read six input bits. The two boards cost $71, and if you're willing to do some fine soldering and use the bare version of the Bluetooth module, you can knock off twenty bucks. If I'm energetic, maybe I'll see about putting together some kind of significantly cost-reduced version. That might depend on the level of interest I see in the thing. I've posted a Wikipedia page with a lot more information, including the schematic of how the boards are wired up.

The SAM7 firmware and the Android app source code are both publicly available on Github. I'm an Android fan, but the Bluetooth protocol for talking to the board is quite simple and if anybody is interested in writing an iPhone or BlackBerry app for the thing, I'll be happy to provide some support to make that relatively easy.

I think this whole thing gets a lot more interesting when (1) you move from a phone to an Android tablet, which will be cost-effective as tablets flood the market over the next year or two, and (2) start building much more sophisticated data acquisition front-ends. This is just about the simplest acquisition hardware I could imagine that would still be worth the effort of building and debugging it, but no reason one couldn't do a Bluetooth-connected oscilloscope or logic analyzer.

Tuesday, March 01, 2011

Sparkfun's Bluetooth serial-port board

This was preparation for the project in the next post.

I've been tinkering with the BTM-182 Bluetooth serial port module, available from Sparkfun as either a raw module or a convenient breakout board. I've set the baud rate to 115.2 kbaud and connected it to a USB serial port (appearing as /dev/ttyUSB0 on my Linux netbook) and getting power from a USBMOD4 board from Hobby Engineering, whose only purpose here is to provide 3.3 volts. The serial port uses a RS-232 level shifter from Sparkfun.

I wrote some Python code that runs on the Linux netbook. It opens the serial port and provides a teeny calculator-like command interpreter to anybody connecting over the Bluetooth serial connection offered by the BTM-182. Currently I'm using CoolTerm running on a Macbook for that, pairing with the "Serial Adaptor" device using PIN "1234".

Using the calculator-over-Bluetooth looks like this:
Good morning
multiply 3 4 5
60.000000
add 6 8 12
26.000000
Later I'll replace the netbook with a AT91SAM7 microcontroller board, also running a little command interpreter, and use the Bluetooth connection to talk to my Android phone. The next step is to hang some analog data acquisition hardware off the SAM7 and make a low-speed oscilloscope, displaying waveforms on the phone.

Friday, February 25, 2011

Will tries interval training

Lately I've been doing some high-intensity interval training, where "high" takes into account that I'm a baby boomer with a desk job. If you have my sort of Homer-Simpson-esque physique, then start with things you can do without injury, not what the 18-year-old neighbor can do without injury. ObDisclaimer: Talk to your doctor before beginning any exercise regimen.

Studies (1, 2) conclude that brief interval training periods two or three times per week, totaling just a few minutes of high-intensity exercise per session, can produce benefits similar to those from tedious 90- to 120-minute aerobic Jane Fonda workouts. The clearly measurable part appears to be that you can bump up the oxygen usage of your muscles, which means that your muscle mass has increased as has the number of mitochondria. Interval training also causes your muscles to continue burning extra calories for several hours following your workout (1, 2, 3). If I leave my heart-rate monitor on, I see that my heart rate remains mildly elevated long after I've stopped.

After about a month of doing this, I haven't seen any visible shrinkage of my midsection, but I've definitely got better stamina. I have a much easier time climbing stairs or getting up from sitting on the floor. All my exercise has been lower body, to take advantage of the larger muscles, but I'm pretty sure I've gained strength in my upper body as well.

I think I would benefit faster if I were more careful with my diet. I keep thinking about cutting back on carbs, maybe I'll actually do that. Silly to put in the trouble to exercise and not add the piece that would actually allow me to lose some body fat. Sillier still to regard interval training as a license to eat donuts.

Friday, February 18, 2011

Random punditry regarding IBM's Watson

I followed with considerable interest this week the game show Jeopardy! where one of the contestants was an artificial intelligence built by IBM called Watson.

Ordinarily I would try to offer some unique insight of my own about Watson. I would be tempted to acknowledge Ken Jennings' rephrasing of the now-ubiquitous Simpsons quote, "I, for one, welcome our new XYZ overlords". And I'd give my thoughts about what problems of modern society might be effectively addressed by this new technology, possibly in economics, medicine, or social policy.

But so many large buckets of ink have already been poured over the topic of Watson that I think I'll kick back and let the harder-working pundits and bloggers have this one. So let's get started.

An online publication called Washington Technology, whose business is to ensure that Beltway contractors know just enough 1337speak to get by, mentions that Watson will now be working with some medical schools, presumably to suck their knowledge into its database. The original source for that information appears to be an AP news story. Then it will absorb speech recognition technology from Nuance, Inc who had previously absorbed Dragon Systems. This will address the problem Watson faced during game play that it could only receive queries as electronic text messages.

Not much insight from EETimes, alas. They talk about a couple of pedestrian applications of data mining (basically what Netflix or Amazon does all day) in medical diagnosis where, like Watson's possible Jeopardy answers, each is assigned a confidence level, and in... wait for it... identifying patterns in shopping behavior, like the card readers at my local grocery store. Gee, that sounds world-transforming.

MSNBC talks about the same stuff Washington Technology talked about, and adds the data mining angle, this time playing Whack-a-Terrorist with license plates, credit card transactions, Internet activity, flight manifests, phone records, bank records, blah blah blah, every dystopian movie you've seen since 1993.

That appears to cover 99% of the recent writings about Watson. A little disappointing. Maybe I'll need to come up with something myself after all. Hmm. Maybe Watson's next skill set should be online punditry.

Saturday, February 12, 2011

Watson competes on Jeopardy

Watson is a computer developed by IBM researchers with the goal of competing on the game show Jeopardy. Watson's appearance on Jeopardy is in only two days, during which it will compete against the planet's two best human Jeopardy players, Ken Jennings and Brad Rutter. Watson will be appearing on Monday, Tuesday and Wednesday evenings.

This is a publicity event for IBM in the same spirit as the 1997 six-game chess match in which Deep Blue defeated Garry Kasparov. But this is much more important. Deep Blue's technology was applicable only to chess and other deterministic games, amounting to a deep search of the tree of possible future moves.

Watson uses a much broader range of technologies in natural language processing, data mining, machine learning, and resolving ambiguities of communication. It is much likelier that work done on Watson will be applicable to really important problems in medicine, economics, foreign policy, and other areas where there is a significant opportunity to raise the quality of human life.

I don't ordinarily go around recommending that people watch a particular television program, but I'll make an exception here. I'll make this easy: go to Jeopardy's When-to-Watch page, click on your state, and see history unfold. As if Egypt wasn't enough history unfolding for the month of February. If you're in the Boston area, Jeopardy is at 7:30 PM on WBZ (channel 4).

Thursday, February 03, 2011

Android app: a timer for Sprint 8 workouts

I recently learned about an interesting exercise technique called Sprint 8, promoted by a guy named Phil Campbell, due to my sister's interest in Joseph Mercola, a doctor who took an interest in Sprint 8. The idea is pretty simple. Pick a favorite exercise, maybe a stairmaster or a stationary bike, and do eight sprints in the following way. Remember to consult your physician before starting any exercise program.
  • Do two or three minutes of warm-up, nothing too strenuous.
  • Push yourself for 30 seconds. Work as hard as you can without risk of injury. This is a "sprint".
  • For 90 seconds to 2 minutes, move to a slower easier pace. Catch your breath. This is called "active resting".
  • Do a second 30-second sprint, followed by another 90-to-120-second active rest.
  • Repeat until you've done a total of eight sprints.
The Android app is a timer for doing Sprint 8, and the source code is posted on Github. If you're set up for Android app development, feel free to compile it and try it on your Android phone. The app is now available in the Android Market.

There's a lot of exercise physiology knowledge to Sprint 8 that, in all honesty, I haven't yet studied. Maybe I will in future, and possibly blog about it. But I do know that after just a couple of short Sprint 8 workouts I feel really good. My back pain is way down and I get less winded when I climb a flight of stairs. Sprint 8 workouts are claimed to produce human growth hormone (the stuff outlawed in Olympic and professional sports because it gives athletes an unfair competitive advantage) which appears to have anti-aging effects. Also see "interval training", believed to work well for fat loss.

Tuesday, December 21, 2010

Developing for the AT91SAM7 microcontroller

I once purchased a SAM7 P256 board from Sparkfun for $72. This post is a bunch of pointers to the resources I’ll need to develop for it. The same code will work on the H64 header board (only $35), which can be used in future USB projects. UPDATE: Sparkfun no longer sells the H64 header board, but they have a H256 board for $45.

This post assumes an Ubuntu Linux environment.
Set up the development environment.
Use Sam_I_Am to access the SAM-BA bootloader.
Some easily available demos and example programs
Other resources

Saturday, November 06, 2010

Remastering Ubuntu Mini Remix

Ubuntu Mini Remix is a very small, very efficient Ubuntu distribution created by Fabrizio Balliano. I discovered it when I needed to create a kiosk-like boot disk image that booted the user directly into a simple command-line application. I didn't need X Windows, OpenOffice, web access, or email, I just needed to run my application, and UMR was perfect for that. The baseline UMR image is about 165 MB, and is available for i386 and for amd64.

In my previous blog post, I explained how to create a debian package. Since then I've written a shell script that takes one or more debian packages and uses them to remaster UMR. If you wanted to use the debian package from the last blog post to update UMR, you could simply type:
(cd ../make-debian-pkg/; make)
./customize.sh \
    ../make-debian-pkg/foo-bar_1.0_i386.deb
If you have multiple debian packages to be included, simply add them as additional command line arguments. The result will be a file called "customized-umr-10.10-i386.iso", located in your home directory.

For very small additions, a package may be overkill. It might make better sense to edit the script to simply add the files you need, and in my situation at work, that was the approach I ultimately chose. But if your change is more substantial, and especially if you want to include other packages that you'll rely upon, you'll want to think about using package management.

If you want your boot image to take the user directly into your application as I did, you can go into the chroot jail and replace the file /etc/skel/.bashrc with a short script that calls your application. The command would look like this:
sudo cp my-bashrc ${CHROOT_JAIL}/etc/skel/.bashrc
and you'd want to put that with the lines where the comment says "Prepare to chroot", near line 63. (Take the line number with a grain of salt as I may end up editing it at some point.)

Tuesday, November 02, 2010

Making a debian release package

I've been doing a lot with Ubuntu Linux lately, and decided it was time to find out how to put together a debian package. Ubuntu shares Debian's package management system, having descended from Debian. I've banged my way through the code so you don't have to. Here I will just discuss a few highlights. There isn't that much to it. Since this is just an example, I didn't get very creative with the name or the contents. Running make will build foo-bar_1.0_i386.deb.

First a quick look at the files.
./Makefile
./control
./copyright
./postinst
./prerm
./usr/bin/say-hello
./usr/lib/python2.6/foobar.py
./usr/share/doc/foo-bar/foobar.txt
The three files in the "usr" directory tree will be copied into the target system when you say "dpkg -i foo-bar_1.0_i386.deb". If you then say "dpkg -r foo-bar", those three files will be removed. The postinst and prerm scripts can be used to perform actions after installation and before removal respectively, but here they just print messages to the console. The real meat is in the two files control and Makefile. control specifies the package name, version number, dependencies, and other information. Makefile takes pieces of that information to build makefile variables that will be used in creating the package. There isn't a heck of a lot to say about the process other than there is a special DEBIAN directory with meta-information about the package and how it should be installed and removed.

This is the minimal possible example. You can build a deb file this way and install it on an Ubuntu machine. But there are a lot of things that could be improved and cleaned up, and put in better compliance with common practices for making these packages more maintainable. The two big areas are (1) there are recommendations about additional fields to go into control, and (2) there is a tool called lintian, a sort of lint command for deb packages, whose advice should be applied. When I build this package, the advice I get is the following:
E: foo-bar: changelog-file-missing-in-native-package
W: foo-bar: third-party-package-in-python-dir usr/lib/python2.6/foobar.py
W: foo-bar: binary-without-manpage usr/bin/say-hello
W: foo-bar: maintainer-script-ignores-errors postinst
W: foo-bar: maintainer-script-ignores-errors prerm
where "E" and "W" are "Error" and "Warning". So that's not too bad.

Thursday, October 28, 2010

Customized Ubuntu distributions part three

Apparently most of the stuff in my previous posting was the wrong approach. I think I've finally got what I want. All this time, I've been trying to figure out how to do a live Linux CD that (a) includes some code we've been developing at work, and (b) boots very quickly and simply to where the user can use that code. The goal is to provide software tools to a partner company where everybody's laptop runs Windows, but all our stuff is written in Linux.

First I tried to build our code in Cygwin using the Windows version of libusb. I found that fraught with complexities of all sorts and eventually decided the Live CD approach sounded a lot easier. Besides, we wanted the Live CD/bootable USB stick anyway for some later plans.

Theoretically there are small Linux distributions (the most famous being Damn Small Linux) that can be used for this sort of thing. As soon as I started getting into that, I found that DSL is no longer maintained, the documentation for it is insufficient and the pieces that do exist contradict one another. I struggled to resolve dependency and version issues in porting our code to DSL and finally gave up. By that time, I had already discovered how to make an Ubuntu Live CD, and so I delivered one with the first piece of our code to our partner.

But I really wanted a much shorter boot time. I don't need X Windows or networking or OpenOffice or a web browser. I'd prefer to have a development environment on there in case the code required modification but even that is unnecessary.

In the past several days I've tinkered with about a dozen Linux distributions claiming to be "small" and found them all deficient in one way or another. I've tried dozens and dozens of permutations of dumb little tricks involving VirtualBox and QEMU and Ubuntu Customization Kit and burning CD-Rs and USB sticks. I've looked at what feels like hundreds of different web pages and blog postings, each claiming to have an authoritative and trustworthy solution to my problem. Each involves failures to account for discrepancies between versions, or the document I've found is old and inapplicable to what I'm doing, or the author made several minor assumptions that don't work in my environment.

Currently I'm looking at something called Ununtu Mini Remix which looks promising. It's looking very good so far, as I am remastering it with the information on the Ubuntu help website. Adding a shell script to /bin to make sure I can, and adding an "echo HELLO" to /etc/skel/.bashrc to make sure it appears when the disk boots into a bash session.

Everything was going great and then mksquashfs got hung up on the proc directory -- AH, this happened because when you finish a chroot session you must do three umounts (dev/pts, proc, sys) even if you weren't aware of having mounted them. Apparently chroot mounts them without telling you. Umount those in the chroot environment, exit, then umount edit/dev, and the mksquashfs goes just dandy.

So my two dumb tricks in /bin and /etc/skel/.bashrc worked like a champ in VirtualBox and now I'm going to try to make a bootable USB stick. Ubuntu's Startup Disk Creator likes the file (it's very picky about what ISO files are considered bootable) and the USB stick works great in my Windows laptop. Now we make the Angry Birds WHEEEE noise, however dumb some people might find the game. The next step is to make my tweaks into a Deb file using this HOWTO so they go in painlessly.

Grazie mille to Fabrizio Balliano for creating Ubuntu Mini Remix.


165MB Ubuntu livecd containing only the minimal Ubuntu package set. Wonderful as a rescue disk or as a base for Ubuntu derivative creation.

Saturday, October 23, 2010

Customized Ubuntu distributions part two

Turns out I was mostly still spinning my wheels here -- move on to part three of this tale for the solution to the problem.

After a brief visit to the world of Damn Small Linux and subsequent narrow escape from eternal damnation, I returned to Ubuntu with the idea of reducing the size (not as big a priority as I initially thought), speeding up the boot time, and running a program when the user logs into X.

First a quick note about the 9.10/10.04 UCK thing. What's working well for me is to run UCK on a box that has 10.04 installed, and apply it to a 9.10 ISO. Do not attempt to run UCK inside a VMware instance, it's just a lot of pain. You can run the resulting ISO in QEMU but don't forget the "-boot d" argument.

To shrink the distro, I removed OpenOffice, Ubuntu Docs, and Evolution. Things I added included autoconf, emacs, git, guile, and openssh-server.

The low-hanging fruit for speeding up boot time is to bring up networking as a background process, described here. Doing this in Ubuntu 9.10 means editing /etc/init.d/networking, adding "&" here:
case "$1" in
start)
        /lib/init/upstart-job networking start &
        ;;
People have given a lot of thought to other approaches to speed up Ubuntu's boot time, and maybe I'll blog more about that as I investigate it further. I really need to dig deeper into the boot time topic. It will likely warrant another blog post.

Having an app start immediately when the user logs in is somewhat interesting. The X session startup stuff is all in /etc/X11/Xsession.d/ and the main thing here is /etc/X11/Xsession.d/40x11-common_xsessionrc where we find a call to the user's ".xsessionrc" file. The user's directory is populated from /etc/skel, so the trick here is to create /etc/skel/.xsessionrc:
export LC_ALL=C
${HOME}/hello.sh &
where hello.sh is a sample shell script just to make sure I've got the principle down pat:
#!/bin/sh
sleep 2   # wait for other xinit stuff to finish
xterm -geometry 120x50+0+0 -e "echo HELLO WORLD; sleep 5"

Hmm... that worked for a bit, then stopped working. I've since discovered another file, /etc/gdm/PreSession/Default, which seems more relevant. But that starts the app just a little too early, before the user is actually logged into the X session, so maybe I should put a time delay in my app? Annoying.

Monday, October 18, 2010

Fun with customized Ubuntu distributions

At work we have an interesting problem. We are working with some companies in Taiwan. Obviously there's a language difference, but there is another difference as well. We are an Ubuntu Linux shop, and they all have Windows laptops. Periodically we have bits of test code that they need to use, and the OS gulf needs to be overcome.

My first whack at this issue was to try to use Cygwin to rebuild our tools from source on a Windows platform. But after I'd spent a few days dealing with libusb, and making not a whole lot of progress, a co-worker suggested a bootable USB stick. The Taiwanese folks get to keep their Windows laptops, but with a quick reboot they can temporarily use Linux machines just like ours. So I set about learning the art of bootable USB sticks, which in Ubuntu 9.10 is pretty painless. (This is not the case with Ubuntu 10.04. If you need to do this, stick with 9.10.)

Not to keep you in suspense, the two magical things are
  • Ubuntu Customization Kit, (sudo apt-get install uck) which produces an ISO file suitable for burning a CD or DVD which you can boot from, and
  • USB Startup Disk Creator (already present in your System>Administration menu) which puts that ISO file onto a USB stick and makes the stick bootable.
These are amazingly easy-to-use tools, given the complexity of what they're doing. In the bad old days, the Knoppix distribution existed solely for the purpose of rendering this feat possible for mortals. That said, I learned a few tricks about these things which I'll pass along here. Do NOT use Ubuntu 10.04, as there is a serious bug in that version of UCK plus a handful of annoying behavioral oddities. These are fixed in a future UCK release, but that's not available in the Ubuntu 10.04 repositories. In order to produce a USB stick which could be used with a Windows laptop to produce another bootable USB stick, I put a copy of the ISO file onto the USB stick. The instructions for copying the USB stick then go like this.
  • Boot into Windows and insert the first USB stick. Copy the ISO file somewhere memorable. Restart the laptop.
  • Boot into Ubuntu using the USB stick. Once you're booted, insert the second USB stick.
  • Bring up USB Startup Disk Creator. The original ISO file on the first USB stick (from which you are now running) will not be visible in the file system. But the Windows hard drive will be readable, so dig around in it to find the ISO file copy you just used. Use that as the source, and select the 2nd USB stick as the destination. Push the button.
  • Once that installation is complete, copy the ISO file from the Windows hard drive onto the second USB stick. Voila, a copy.
Using the first part (making an ISO image) I was able to produce a DVD with some of our tools for the Taiwanese folks to use. I set up Traditional Chinese and English as languages, with the default to boot into Traditional Chinese. But then because it had some of our source code, I encrypted this entire 725 MB file, which is ironic given that Ubuntu is open source. But there had to be a way to encrypt only the proprietary stuff.

On the next boot image I send them (which will be a USB stick, not a DVD, since USB sticks are oh so much sexier), the contents of the stick will be open source, and the proprietary stuff will be pulled down from a little tarball on some handy little server. The thing that pulls down the tarball and handles security is my little tarball runner script. The new ISO is at http://willware.net/tbr-disk.iso, and if you need to share some closed-source Linux code with people in China or Taiwan, feel free to use it.

To use this bit of cleverness, build some code on your Linux box, package it up as a tarball (including a run.sh shell script at the root level, in case you need to do installation stuff), and if necessary, encrypt it using (my tweaked version of) the Twofish algorithm found on Sourceforge. Then post it to the Internet and email the password only to your intended recipients.

If I find the time and energy, I'll package up the tarball runner and the Twofish module as a Deb package to make the installation painless.

Sunday, October 10, 2010

Preliminary things from the Life Extesion Conference

This weekend I'm attending Christine Peterson's Life Extension Conference in San Francisco. Chris wanted to put together information that is both scientifically valid and actionable, so she lined up a lot of really high-quality speakers. One thing I learned pretty quickly is that there are a large number of areas of expertise, generally interrelated, all pretty deep. I'll try to do a series of blog postings about these topics so this one will just skim a few highlights.

Here are some very quick bits of advice.
  • Completely stop eating sugar.
  • Exercise.
  • Eat spinach and other leafy greens, take vitamin D and drink green tea.
  • The health of your brain is crucial to your overall health. Meditation is better for your brain than puzzles and games.
  • Intermittent fasting (e.g. 24 hours every 2 or 3 day) is good for you.
The popular aging theory that our bodies wear out over time is false. We know this because there are animals and plants thousands of years old which may die from accidents or mishaps, but they do not age biologically. Michael Rose has been breeding long-lived "Methuselah" fruit flies for over 30 years and he discussed his approach. There were a lot of great talks but I found this one clarified some bsic information about aging for me.

Simplistically assume that flies always begin reproducing at age A and always stop reproducing at age B. Any heritable cause of death that takes effect before age A will be strongly selected against, and any heritable cause of death that takes effect after age B will face no selection pressure at all. What Rose did was to tinker with A and B, delaying both, and discardiing the flies who didn't live very long, and he did this from 1980 to the present day. I think I'll have more to say about this when I've gone over my notes more, but a few quick things about these Methuselah flies.

We couldn't do this in 1980 but we can now sequence the DNA of these flies and compare it to the DNA of normal flies. What you see is that there are a lot of teeny differences widely spread over the genome. This leads me to think that there's no silver bullet longevity gene, but rather a lot of small tweaks that address a large number of heritable causes of death.

More stuff to come as I sift through my notes. Chris has talked about posting all the slides online and making the presentation videos available as a DVD.

Thursday, September 16, 2010

Tim BL's talk on Linked Data

I learned a lot from Tim Berners-Lee's TED talk from February 2009 about Linked Data. He talks a bit about his motivation for inventing the Web, which was that the data he encountered at CERN was in all different formats and on all different computer architectures and he spent a huge fraction of his time writing code to translate one format to another. He talks about how much of the world's data is still locked up in information silos -- a million disconnected little islands -- and how many of the world's most urgent problems require that data be made available across the boundaries between corporations, organizations, laboratories, universities, and nations. He has laid out two sets of guidelines for linked data. The first is for the technical crowd:

  1. Use URIs to identify things.
  2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
  3. Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
  4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.

The second set is for a less technical crowd:

  1. All kinds of conceptual things, they have names now that start with HTTP.
  2. I get important information back. I will get back some data in a standard format which is kind of useful data that somebody might like to know about that thing, about that event.
  3. I get back that information it's not just got somebody's height and weight and when they were born, it's got relationships. And when it has relationships, whenever it expresses a relationship then the other thing that it's related to is given one of those names that starts with HTTP.

It's a very eloquent talk, reminding me in places of David Gelernter's prophetic book Mirror Worlds.



What's remarkable about the Linked Data idea is that, as much as people tend to dismiss the whole semantic web vision, it really is making remarkable progress. The diagram above shows several interlinked websites with large and mutually compatible data sets.
  • DBPedia aims to extract linked data from Wikipedia and make it publicly available.
  • YAGO is a huge semantic knowledge base. Currently, YAGO knows more than 2 million entities (like persons, organizations, cities, etc.). It knows 20 million facts about these entities.
  • Lexvo.org brings information about languages, words, characters, and other human language-related entities to the Linked Data Web and Semantic Web.
  • The Calais web service is an API that accepts unstructured text (like news articles, blog postings, etc.), processes them using natural language processing and machine learning algorithms, and returns RDF-formatted entities, facts and events. It takes about 0.5 to 1.0 second depending on how big a document you send and the size of your pipe.
  • Freebase is an open repository of structured data of more than 12 million entities. An entity is a single person, place, or thing. Freebase connects entities together as a graph.
  • LinkedCT is a website full of linked data about past and present clinical trials.
Berners-Lee has recommended a very small set of Linked Data principles.
  • Use URIs as names for things.
  • Use HTTP URIs so that people can look up those names.
  • When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  • Include links to other URIs so that they can discover more things.

Monday, September 13, 2010

Set theory, OWL, and the Semantic Web

Despite my interest in semantic web technology, there is one area I've had a little mental block about, which is OWL. If you just sit down and try to read the available technical information about OWL, it's clear as mud. Imagine my surprise when clarity dawned in the form of the book Semantic Web for Dummies by Jeffrey Pollock, who explains in Chapter 8 that OWL amounts to set theory. The book is surprisingly good, I recommend it.

I attended elementary school in the 1960s, when the U.S. was trying a stupid educational experiment called New Math. The basic premise was that little kids needed to know axiomatic set theory, in order for the U.S. to raise a generation of uber math geeks who could outperform the Soviet engineers who put Sputnik into orbit. If only I'd taken more seriously all this nonsense about unions and intersections and empty sets, I might have avoided all that trouble with schoolyard bullies. Oh wait.... Anyway, in order to fulfill this obviously pointless requirement, our teacher would spend the first three weeks of every school year drilling us on exercises in set theory and then move on to whatever math we actually really needed to learn for that year. The take-home lesson was that intersection was preferable to union, because writing the result of a union operation meant I had to do more writing and it made my hand hurt. In retrospect it's amazing that I retained any interest in mathematics.

Set theory came into vogue as guys like David Hilbert and Bertrand Russell were fishing around for a formal bedrock on which to place the edifice of mathematics. The hope was to establish a mathematics that was essentially automatable, in the belief that as a result it would be infallible. So they went around formalizing the definitions of various mathematical objects by injecting bits of set theory. One of the more successful examples was to use Dedekind cuts to define the real numbers in terms of the rational numbers.

Hopes of the infallibility of mathematics' new foundation were dashed by Kurt Godel's brilliant incompleteness theorem, described as “the most signal paper in logic in two thousand years.” It was possible to define mathematical ideas in set theoretic terms, and to formalize the axioms, and to automate the proof process, but at a cost. Godel proved the existence of mathematical truths that were formally undecidable -- they could neither be proved nor disproved. Hilbert had hoped that once mathematics was formalized, no stone would be left unturned, and all true mathematical statements would be provable. The story of Godel's theorem (not the history, just an outline of the proof itself) is a wonderful story, well told in Hofstatder's book Godel, Escher, Bach.

But getting back to semantic web stuff. Here are some basic ideas of OWL.
  • Everything is an instance of owl:Thing. Think of it as a base class like java.lang.Object.
  • Within an ontology, you have "instances", "classes", and "properties".
  • "Classes" are essentially sets. "Individuals" are elements of sets.
  • A "property" expresses some relationship between two individuals.
  • OWL includes representations for:
    • unions and intersections of classes (sets)
    • the idea that a set is a subset of another
    • the idea that two sets are disjoint
    • the idea that two sets are the same set
    • the idea that two instances are the same instance
  • Properties can by symmetric (like "sibling") or transitive (like "equals")
  • A property can be "functional", or a function in a mathematical sense. If p is functional, and you assert that p(x)=y and p(x)=z, then the reasoning engine will conclude that y=z.
  • One property can be declared to be the inverse of another.
  • One can declare a property to have specific classes (sets) as its domain and range.
It would be really nice if, at this point, I had some brilliantly illustrative examples of OWL hacking ready to include here. Hopefully those will be forthcoming.

    Saturday, August 21, 2010

    Setting up a small electronics lab

    At Litl, I will be setting up a small electrical engineering lab. What should I get? There's a lot of great stuff at SparkFun, and more at Digi-Key.

    Here's my shopping list, all from Sparkfun. No, I'm not a fan boy. Thanks for asking.
    Honorable mention, stuff that is not a near-term priority. Most of these are entirely non-work-related toys that just look fun.

    Saturday, August 14, 2010

    In Taiwan next week

    I'm not a big traveler generally speaking, but my new job with Litl is bringing me to Taipei and Hsinchu in Taiwan for a few days next week. I'm excited and a little nervous. I've tried to pick up a few words of Mandarin over the past week using Rosetta Stone, but it's a tough language for an American with only dim memories of high school French.

    Hopefully I'll be posting some cool pictures soon, if I get a chance to wander anywhere interesting.

    Tuesday evening

    What the heck is Tuesday when you're 12 time zones from home? Here in Taiwan it's 8:30 PM. Back home in Massachusetts, it's 8:30 AM on Wednesday morning. Between the time difference and the jet lag, not a lot of luck in reasoning about time.

    I'm doing very slightly better with language, strangely. I've identified two glyphs. One (å­—) looks like a seven digit with a horizontal line through it. It's pronounced "tze" (the vowel is a schwa) and I've seen it occur at the ends of several words or small phrases but I don't know its meaning. The other, I can't remember now because I'm too jet-lagged. Another thing this morning was that I identified a glyph that I believe is a very recent invention without any of the historical roots of the other characters. It's an outline of the Red Cross symbols, with crossbars along the top and bottom edges, and my guess is that it indicates a hospital. Some of the mechanics of how these characters are formed is fascinating.

    Today I visited Hsinchu with a couple of other engineers, one also from Litl, and one from Motorola. We did a bunch of debug on some boards that a contractor is designing for us.

    I'm surprised how normal things feel in Taiwan. I had expected it to feel more alien. But everything kinda fits and makes sense. It's interesting to be immersed in a culture that's a little different but not very, and a language that is thoroughly alien. (Though I suppose the clicking languages of Australian aborigines would be even more alien.)

    Friday evening, Taiwan time

    你好 East Coast folks! I should be back in about 24 hours. Just in time to liberate the cat from cat jail and spend the rest of Saturday morning napping.

    I really wish I'd gotten an earlier start on learning some Chinese and applied myself more diligently. It was frustrating to look around and see and hear all this interesting language and understand nearly none of it. Oh well, there should be more opportunities. I'm given to understand that my work will bring me to mainland China before too long.

    I also wish I'd thought to take more photos. I just totally spaced on the fact that I'd brought along a camera.

    Saturday evening, back home

    Still a bit dazed about time zones. Spent 18 hours on airplanes getting home, with a layover in SF long enough to stroll around Fisherman's Wharf. Both airplanes were useless for sleeping so I needed to nap. Gonna try to use melatonin to get my biorhythms resynchronized.

    I think I was mistaken in thinking the Red-Cross-like character was a recent invention. I later saw other usages that were inconsistent with that theory. It just doesn't feel calligraphic to me in the same way as the rest of the written language.

    Here's something humorous: most of the comments to most of my blog postings are in Chinese, with a string of periods, each an HTML link to some Chinese porn site. They're doing this to try to crank up the Google ratings of their porn industry, obviously. The same is true of this posting, there is currently one comment from a friend in Kolkata and four of these porn-site-promoting comments. It just seems kinda funny that they're in response to a posting about visiting Taiwan. I dunno, it sounded funnier when I first thought of it. If anybody knows how to block such comments on one's blog without blocking any legitimate comments from the same geographical area, I'd love to hear about it.

    Thursday, August 12, 2010

    This Google-Verizon deal and Net Neutrality and all that...

    Like everybody else, I'm disappointed with Google on this one. The stuff about the wired Internet is good, it's actually a stronger stance on net neutrality than has existed to date. But the wireless Internet is now supposed to be the Wild West of high tech, a lawless place where anybody big enough can do anything they want. Google should know better. But Google is not the important party in all this.

    My feelings about Verizon are very different. Verizon paid for the network (having purchased it from its builders and/or previous owners) and now pays to maintain it. When the network in my neighborhood goes down, the trucks that come to fix it are Verizon trucks. It's fair and reasonable for Verizon to decide which packets its network will carry, and how those packets will be prioritized.

    What would not be fair or reasonable would be to allow Verizon to block other efforts to build traffic-bearing networks.

    I would love to see a parallel Internet built by hobbyists and local communities and small businesses. A few years back there was a wonderful book called Building Wireless Community Networks by Rob Flickinger. It seemed to me that Flickinger envisioned a nation-wide and perhaps world-wide community network. Maybe I was projecting my own hopes, but I like to think he might have shared that sentiment.

    The right response to the Google-Verizon deal is not to complain about Google's duplicity. They are a publicly traded company, with all that entails. The right response is to start building a network that isn't supported by already-large corporations, where individuals and small new companies don't need to worry about policy decisions by the Googles and Verizons of the world.

    Maybe this should replace Amateur Radio, which has been in decline since the Internet came along.

    Saturday, July 31, 2010

    Evolution and randomness

    Lately I've been watching a video of Richard Dawkins reading from his new book "The Greatest Show on Earth". As always, he is fascinating and lucid.



    Sometimes people criticize evolution on the grounds that "it's all about randomness". They ask questions like this:
    If I spread a bunch of airplane parts on a football field, and a tornado comes around and stirs up all the parts, what is the likelihood that the result will be a correctly assembled, functioning airplane? This is the same likelihood that the human body (or the eye, or the brain, or the hand) could have arisen out of evolution, a process characterized entirely by randomness.
    Evolution consists of two parts. One is variation, which can be random but need not be, and the other is selection, which is not random at all. The part of evolution that is random, the point mutations and crossovers among chromosomes, is not where its explanatory power resides. If that were the whole story, then complex forms really would be no more likely than working airplanes popping out of tornadoes. These random bits of variation merely supply the variety upon which the filter of selection operates.

    It is in selection that the explanatory power of evolution resides. Selection is the non-random part of evolution, where the signal (this trait works) is separated from the noise (that trait doesn't work). It is because selection is consistent and non-random that we see the re-appearance of traits at very different times and places in the history of life. Tyrannosaurus Rex and my cat both have sharp claws. Are cats direct ancestors of T. Rex, and did those sharp-claw genes somehow survive tens of millions of years unmodified? No, but they are both hunting predators faced with problems that sharp claws solve. Likewise, complex eyes with focusing lenses have independently evolved dozens of times, because clear vision is useful.

    Randomness in the physical world is of two types, fundamental randomness and consequent randomness. Fundamental randomness is the stuff of quantum mechanics. When particles appear to act randomly, are there hidden variables which, if we could see them, we'd be able to see through the apparent randomness to an underlying determinism? If there aren't, then the universe includes a component of fundamental randomness -- some things are just random and there's nothing you can do about it. My understanding is that it's still an open question among physicists whether fundamental randomness exists in the universe, but the weight of opinion favors it, as experimentation has ruled out local hidden variables and only non-local hidden variables remain as a possibility.

    Consequent randomness is the appearance of randomness among things that are individually deterministic. A cryptographic hash algorithm is a good example. If we feed this deterministic process with a deterministic input sequence (e.g. 1, 2, 3, 4, 5...) what we get is an output sequence of large integers that look entirely random. They pass every statistical test of randomness with flying colors. Yet in some important sense they aren't random at all, because we can start the input sequence over and we get exactly the same output sequence repeated. So we have apparent randomness arising from deterministic pieces in a complicated Rube Goldberg fashion.

    Consequent randomness can easily arise where there is a mixing of data with different explanations or from different domains. Peoples' cholesterol levels and their telephone numbers are unrelated, so if telephone numbers are put in order of the person's cholesterol level, the sequence appears random.

    Often people object to evolution on the grounds that it requires fundamental randomness, and these same people often find the notion of fundamental randomness personally abhorrent, and so they accept this situation as a disproof of the theory of evolution. In fact, evolution works just fine when variation is driven by consequent randomness. All genetic algorithms running on computers work this way.

    In his talk above, Dawkins discusses a much better potential disproof of evolution, for which he thinks the creationists ought to be scrambling to find evidence. If we found fossils in the wrong geological strata, for instance a rabbit fossil among dinosaur fossils or trilobite fossils, then the case for evolution would be significantly weakened. Such fossils, which Dawkins calls "anachronistic", have never been found among the many hundreds of thousands of fossils recorded in natural history museums and universities around the world. While we may find gaps in the fossil record, we never find such temporal discrepancies.

    So why do I personally believe in evolution? I have two answers to that. The first is that the entire process can be done on a computer. It's a standard thing, people have been doing it for years, it reliably solves hard problems, it's a classic technique in computer science. For me to disbelieve in the efficacy of genetic algorithms would be akin to an auto mechanic whose personal convictions prevent him believing in the inflation of tires, while his colleagues inflate tires on a daily or weekly basis in the shop around him.

    Second, the fundamental idea of evolution is so simple. It has so few parts. There are only a very small number of places it could possibly go wrong. If it went wrong in one of those places, no malicious cabal or conspiracy of evolutionary biologists could cover up its failure for long. The logic of evolution is simply too simple and too compelling to be incorrect.

    Objectors to evolution are sometimes motivated by the fear that it rules out the possibility of an afterlife. Having lost loved ones and myself being mortal, I have some appreciation for this concern. Personally I cannot rule out the possibility of a universe of Cartesian dualism, and in fact I very much hope it's the case. As far as I am aware the strongest arguments against dualism are Occam's razor and Dennett's objection that in a dualist universe, a philosopher like himself would have no hope of understanding or explaining anything because everything would be arbitrary. I also appreciate Dennett's position, but it seems to me to lack imagination -- perhaps there is a dualism that is lawful, understandable, and explainable, and which could ultimately become part of science, but which also allows for some piece of a person's mind or personality that outlasts the physical body. Then it might be possible that such minds and physical bodies might undergo parallel processes of evolution as organisms increase in complexity over billions of years.