I once purchased a SAM7 P256 board from Sparkfun for $72. This post is a bunch of pointers to the resources I'll need to develop for it. The same code will work on the H64 header board (only $35), which can be used in future USB projects. UPDATE: Sparkfun no longer sells the H64 header board, but they have an H256 board for $45.
Ubuntu Mini Remix is a very small, very efficient Ubuntu distribution created by Fabrizio Balliano. I discovered it when I needed to create a kiosk-like boot disk image that booted the user directly into a simple command-line application. I didn't need X Windows, OpenOffice, web access, or email; I just needed to run my application, and UMR was perfect for that. The baseline UMR image is about 165 MB, and is available for i386 and for amd64.
In my previous blog post, I explained how to create a debian package. Since then I've written a shell script that takes one or more debian packages and uses them to remaster UMR. If you wanted to use the debian package from the last blog post to update UMR, you could simply type:
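sudo ./remaster-umr.sh foo-bar_1.0_i386.deb

(The script name here is a placeholder; substitute whatever you've named the script. foo-bar_1.0_i386.deb is the package built in the last post.)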
If you have multiple debian packages to be included, simply add them as additional command line arguments. The result will be a file called "customized-umr-10.10-i386.iso", located in your home directory.
For very small additions, a package may be overkill. It might make better sense to edit the script to simply add the files you need, and in my situation at work, that was the approach I ultimately chose. But if your change is more substantial, and especially if you want to include other packages that you'll rely upon, you'll want to think about using package management.
If you want your boot image to take the user directly into your application as I did, you can go into the chroot jail and replace the file /etc/skel/.bashrc with a short script that calls your application. The command would look like this:
sudo cp my-bashrc ${CHROOT_JAIL}/etc/skel/.bashrc
and you'd want to put that with the lines where the comment says "Prepare to chroot", near line 63. (Take the line number with a grain of salt as I may end up editing it at some point.)
I've been doing a lot with Ubuntu Linux lately, and decided it was time to find out how to put together a debian package. Ubuntu shares Debian's package management system, having descended from Debian. I've banged my way through the process so you don't have to. Here I will just discuss a few highlights. There isn't that much to it. Since this is just an example, I didn't get very creative with the name or the contents. Running make will build foo-bar_1.0_i386.deb.
First a quick look at the files.

./Makefile
./control
./copyright
./postinst
./prerm
./usr/bin/say-hello
./usr/lib/python2.6/foobar.py
./usr/share/doc/foo-bar/foobar.txt
The three files in the "usr" directory tree will be copied into the target system when you say "dpkg -i foo-bar_1.0_i386.deb". If you then say "dpkg -r foo-bar", those three files will be removed. The postinst and prerm scripts can be used to perform actions after installation and before removal respectively, but here they just print messages to the console. The real meat is in the two files control and Makefile. control specifies the package name, version number, dependencies, and other information. Makefile takes pieces of that information to build makefile variables that will be used in creating the package. There isn't a heck of a lot to say about the process other than there is a special DEBIAN directory with meta-information about the package and how it should be installed and removed.
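For reference, a minimal control file looks something like this (the maintainer line below is a made-up placeholder):

Package: foo-bar
Version: 1.0
Architecture: i386
Maintainer: Your Name <you@example.com>
Description: minimal example of a debian package
 Installs a hello script, a Python module, and a doc file.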
This is the minimal possible example. You can build a deb file this way and install it on an Ubuntu machine. But there are a lot of things that could be improved and cleaned up, and put in better compliance with common practices for making these packages more maintainable. The two big areas are (1) there are recommendations about additional fields to go into control, and (2) there is a tool called lintian, a sort of lint command for deb packages, whose advice should be applied. When I build this package, the advice I get is the following:
Apparently most of the stuff in my previous posting was the wrong approach. I think I've finally got what I want. All this time, I've been trying to figure out how to do a live Linux CD that (a) includes some code we've been developing at work, and (b) boots very quickly and simply to where the user can use that code. The goal is to provide software tools to a partner company where everybody's laptop runs Windows, but all our stuff is written in Linux.
First I tried to build our code in Cygwin using the Windows version of libusb. I found that fraught with complexities of all sorts and eventually decided the Live CD approach sounded a lot easier. Besides, we wanted the Live CD/bootable USB stick anyway for some later plans.
Theoretically there are small Linux distributions (the most famous being Damn Small Linux) that can be used for this sort of thing. As soon as I started getting into that, I found that DSL is no longer maintained, the documentation for it is insufficient and the pieces that do exist contradict one another. I struggled to resolve dependency and version issues in porting our code to DSL and finally gave up. By that time, I had already discovered how to make an Ubuntu Live CD, and so I delivered one with the first piece of our code to our partner.
But I really wanted a much shorter boot time. I don't need X Windows or networking or OpenOffice or a web browser. I'd prefer to have a development environment on there in case the code requires modification, but even that is unnecessary.
In the past several days I've tinkered with about a dozen Linux distributions claiming to be "small" and found them all deficient in one way or another. I've tried dozens and dozens of permutations of dumb little tricks involving VirtualBox and QEMU and Ubuntu Customization Kit and burning CD-Rs and USB sticks. I've looked at what feels like hundreds of different web pages and blog postings, each claiming to have an authoritative and trustworthy solution to my problem. Each involves failures to account for discrepancies between versions, or the document I've found is old and inapplicable to what I'm doing, or the author made several minor assumptions that don't work in my environment.
Currently I'm looking at something called Ubuntu Mini Remix, which is looking very promising so far as I remaster it with the information on the Ubuntu help website. I added a shell script to /bin to make sure I could, and added an "echo HELLO" to /etc/skel/.bashrc to make sure it appears when the disk boots into a bash session.
Everything was going great until mksquashfs got hung up on the proc directory. Ah -- this happened because when you finish a chroot session you must do three umounts (dev/pts, proc, sys), even if you weren't aware of having mounted them; apparently the chroot procedure mounts them without telling you. Umount those in the chroot environment, exit, then umount edit/dev, and the mksquashfs goes just dandy.
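Spelled out as commands (the "edit" directory is where the remastering procedure unpacks the live filesystem):

# inside the chroot
umount /proc
umount /sys
umount /dev/pts
exit
# back outside the chroot
sudo umount edit/dev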
So my two dumb tricks in /bin and /etc/skel/.bashrc worked like a champ in VirtualBox and now I'm going to try to make a bootable USB stick. Ubuntu's Startup Disk Creator likes the file (it's very picky about what ISO files are considered bootable) and the USB stick works great in my Windows laptop. Now we make the Angry Birds WHEEEE noise, however dumb some people might find the game. The next step is to make my tweaks into a Deb file using this HOWTO so they go in painlessly.
Grazie mille to Fabrizio Balliano for creating Ubuntu Mini Remix.
Turns out I was mostly still spinning my wheels here -- move on to part three of this tale for the solution to the problem.
After a brief visit to the world of Damn Small Linux and subsequent narrow escape from eternal damnation, I returned to Ubuntu with the idea of reducing the size (not as big a priority as I initially thought), speeding up the boot time, and running a program when the user logs into X.
First a quick note about the 9.10/10.04 UCK thing. What's working well for me is to run UCK on a box that has 10.04 installed, and apply it to a 9.10 ISO. Do not attempt to run UCK inside a VMware instance; it's just a lot of pain. You can run the resulting ISO in QEMU, but don't forget the "-boot d" argument.
To shrink the distro, I removed OpenOffice, Ubuntu Docs, and Evolution. Things I added included autoconf, emacs, git, guile, and openssh-server.
The low-hanging fruit for speeding up boot time is to bring up networking as a background process, described here. Doing this in Ubuntu 9.10 means editing /etc/init.d/networking, adding "&" here:

case "$1" in
start)
        /lib/init/upstart-job networking start & ;;
People have given a lot of thought to other approaches to speeding up Ubuntu's boot time. I need to dig deeper into that topic, and it will likely warrant another blog post.
Having an app start immediately when the user logs in is somewhat interesting. The X session startup stuff is all in /etc/X11/Xsession.d/ and the main thing here is /etc/X11/Xsession.d/40x11-common_xsessionrc where we find a call to the user's ".xsessionrc" file. The user's directory is populated from /etc/skel, so the trick here is to create /etc/skel/.xsessionrc:

export LC_ALL=C
${HOME}/hello.sh &
where hello.sh is a sample shell script just to make sure I've got the principle down pat:
#!/bin/sh
sleep 2    # wait for other xinit stuff to finish
xterm -geometry 120x50+0+0 -e "echo HELLO WORLD; sleep 5"
Hmm... that worked for a bit, then stopped working. I've since discovered another file, /etc/gdm/PreSession/Default, which seems more relevant. But that starts the app just a little too early, before the user is actually logged into the X session, so maybe I should put a time delay in my app? Annoying.
At work we have an interesting problem. We are working with some companies in Taiwan. Obviously there's a language difference, but there is another difference as well. We are an Ubuntu Linux shop, and they all have Windows laptops. Periodically we have bits of test code that they need to use, and the OS gulf needs to be overcome.
My first whack at this issue was to try to use Cygwin to rebuild our tools from source on a Windows platform. But after I'd spent a few days dealing with libusb, and making not a whole lot of progress, a co-worker suggested a bootable USB stick. The Taiwanese folks get to keep their Windows laptops, but with a quick reboot they can temporarily use Linux machines just like ours. So I set about learning the art of bootable USB sticks, which in Ubuntu 9.10 is pretty painless. (This is not the case with Ubuntu 10.04. If you need to do this, stick with 9.10.)
Not to keep you in suspense, the two magical things are
Ubuntu Customization Kit (sudo apt-get install uck), which produces an ISO file suitable for burning a CD or DVD which you can boot from, and
USB Startup Disk Creator (already present in your System>Administration menu) which puts that ISO file onto a USB stick and makes the stick bootable.
These are amazingly easy-to-use tools, given the complexity of what they're doing. In the bad old days, the Knoppix distribution existed solely for the purpose of rendering this feat possible for mortals. That said, I learned a few tricks about these things which I'll pass along here. Do NOT use Ubuntu 10.04, as there is a serious bug in that version of UCK plus a handful of annoying behavioral oddities. These are fixed in a newer UCK release, but that's not available in the Ubuntu 10.04 repositories. In order to produce a USB stick which could be used with a Windows laptop to produce another bootable USB stick, I put a copy of the ISO file onto the USB stick. The instructions for copying the USB stick then go like this.
Boot into Windows and insert the first USB stick. Copy the ISO file somewhere memorable. Restart the laptop.
Boot into Ubuntu using the USB stick. Once you're booted, insert the second USB stick.
Bring up USB Startup Disk Creator. The original ISO file on the first USB stick (from which you are now running) will not be visible in the file system. But the Windows hard drive will be readable, so dig around in it to find the ISO file copy you just used. Use that as the source, and select the 2nd USB stick as the destination. Push the button.
Once that installation is complete, copy the ISO file from the Windows hard drive onto the second USB stick. Voila, a copy.
Using the first part (making an ISO image) I was able to produce a DVD with some of our tools for the Taiwanese folks to use. I set up Traditional Chinese and English as languages, with the default to boot into Traditional Chinese. But then, because it had some of our source code, I encrypted the entire 725 MB file, which is ironic given that Ubuntu is open source. There had to be a better way: encrypt only the proprietary stuff.
On the next boot image I send them (which will be a USB stick, not a DVD, since USB sticks are oh so much sexier), the contents of the stick will be open source, and the proprietary stuff will be pulled down from a little tarball on some handy little server. The thing that pulls down the tarball and handles security is my little tarball runner script. The new ISO is at http://willware.net/tbr-disk.iso, and if you need to share some closed-source Linux code with people in China or Taiwan, feel free to use it.
To use this bit of cleverness, build some code on your Linux box, package it up as a tarball (including a run.sh shell script at the root level, in case you need to do installation stuff), and if necessary, encrypt it using (my tweaked version of) the Twofish algorithm found on Sourceforge. Then post it to the Internet and email the password only to your intended recipients.
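Building the tarball goes something like this, where "mytool" stands in for whatever you actually built:

mkdir payload
cp mytool payload/
cat > payload/run.sh << 'EOF'
#!/bin/sh
# installation/launch stuff goes here
./mytool "$@"
EOF
chmod +x payload/run.sh
tar czf mytool.tar.gz -C payload .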
If I find the time and energy, I'll package up the tarball runner and the Twofish module as a Deb package to make the installation painless.
This weekend I'm attending Christine Peterson's Life Extension Conference in San Francisco. Chris wanted to put together information that is both scientifically valid and actionable, so she lined up a lot of really high-quality speakers. One thing I learned pretty quickly is that there are a large number of areas of expertise, generally interrelated, all pretty deep. I'll try to do a series of blog postings about these topics so this one will just skim a few highlights.
Here are some very quick bits of advice.
Completely stop eating sugar.
Exercise.
Eat spinach and other leafy greens, take vitamin D and drink green tea.
The health of your brain is crucial to your overall health. Meditation is better for your brain than puzzles and games.
Intermittent fasting (e.g. 24 hours every 2 or 3 days) is good for you.
The popular aging theory that our bodies wear out over time is false. We know this because there are animals and plants thousands of years old which may die from accidents or mishaps, but they do not age biologically. Michael Rose has been breeding long-lived "Methuselah" fruit flies for over 30 years and he discussed his approach. There were a lot of great talks but I found this one clarified some basic information about aging for me.
Simplistically assume that flies always begin reproducing at age A and always stop reproducing at age B. Any heritable cause of death that takes effect before age A will be strongly selected against, and any heritable cause of death that takes effect after age B will face no selection pressure at all. What Rose did was to tinker with A and B, delaying both, and discarding the flies who didn't live very long, and he did this from 1980 to the present day. I think I'll have more to say about this when I've gone over my notes more, but a few quick things about these Methuselah flies.
We couldn't do this in 1980 but we can now sequence the DNA of these flies and compare it to the DNA of normal flies. What you see is that there are a lot of teeny differences widely spread over the genome. This leads me to think that there's no silver bullet longevity gene, but rather a lot of small tweaks that address a large number of heritable causes of death.
More stuff to come as I sift through my notes. Chris has talked about posting all the slides online and making the presentation videos available as a DVD.
I learned a lot from Tim Berners-Lee's TED talk from February 2009 about Linked Data. He talks a bit about his motivation for inventing the Web, which was that the data he encountered at CERN was in all different formats and on all different computer architectures and he spent a huge fraction of his time writing code to translate one format to another. He talks about how much of the world's data is still locked up in information silos -- a million disconnected little islands -- and how many of the world's most urgent problems require that data be made available across the boundaries between corporations, organizations, laboratories, universities, and nations. He has laid out two sets of guidelines for linked data. The first is for the technical crowd:
Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
The second set is for a less technical crowd:
All kinds of conceptual things, they have names now that start with HTTP.
I get important information back. I will get back some data in a standard format which is kind of useful data that somebody might like to know about that thing, about that event.
When I get back that information, it's not just got somebody's height and weight and when they were born; it's got relationships. And whenever it expresses a relationship, the other thing that it's related to is given one of those names that starts with HTTP.
It's a very eloquent talk, reminding me in places of David Gelernter's prophetic book Mirror Worlds.
What's remarkable about the Linked Data idea is that, as much as people tend to dismiss the whole semantic web vision, it really is making remarkable progress. The diagram above shows several interlinked websites with large and mutually compatible data sets.
DBPedia aims to extract linked data from Wikipedia and make it publicly available.
YAGO is a huge semantic knowledge base. Currently, YAGO knows more than 2 million entities (like persons, organizations, cities, etc.). It knows 20 million facts about these entities.
Lexvo.org brings information about languages, words, characters, and other human language-related entities to the Linked Data Web and Semantic Web.
The Calais web service is an API that accepts unstructured text (like news articles, blog postings, etc.), processes them using natural language processing and machine learning algorithms, and returns RDF-formatted entities, facts and events. It takes about 0.5 to 1.0 second depending on how big a document you send and the size of your pipe.
Freebase is an open repository of structured data of more than 12 million entities. An entity is a single person, place, or thing. Freebase connects entities together as a graph.
LinkedCT is a website full of linked data about past and present clinical trials.
Berners-Lee has recommended a very small set of Linked Data principles.
Use URIs as names for things.
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
Include links to other URIs so that they can discover more things.
Despite my interest in semantic web technology, there is one area I've had a little mental block about, which is OWL. If you just sit down and try to read the available technical information about OWL, it's clear as mud. Imagine my surprise when clarity dawned in the form of the book Semantic Web for Dummies by Jeffrey Pollock, who explains in Chapter 8 that OWL amounts to set theory. The book is surprisingly good, I recommend it.
I attended elementary school in the 1960s, when the U.S. was trying a stupid educational experiment called New Math. The basic premise was that little kids needed to know axiomatic set theory, in order for the U.S. to raise a generation of uber math geeks who could outperform the Soviet engineers who put Sputnik into orbit. If only I'd taken more seriously all this nonsense about unions and intersections and empty sets, I might have avoided all that trouble with schoolyard bullies. Oh wait.... Anyway, in order to fulfill this obviously pointless requirement, our teacher would spend the first three weeks of every school year drilling us on exercises in set theory and then move on to whatever math we actually really needed to learn for that year. The take-home lesson was that intersection was preferable to union, because writing the result of a union operation meant I had to do more writing and it made my hand hurt. In retrospect it's amazing that I retained any interest in mathematics.
Set theory came into vogue as guys like David Hilbert and Bertrand Russell were fishing around for a formal bedrock on which to place the edifice of mathematics. The hope was to establish a mathematics that was essentially automatable, in the belief that as a result it would be infallible. So they went around formalizing the definitions of various mathematical objects by injecting bits of set theory. One of the more successful examples was to use Dedekind cuts to define the real numbers in terms of the rational numbers.
Hopes of the infallibility of mathematics' new foundation were dashed by Kurt Gödel's brilliant incompleteness theorem, described as "the most signal paper in logic in two thousand years." It was possible to define mathematical ideas in set-theoretic terms, and to formalize the axioms, and to automate the proof process, but at a cost. Gödel proved the existence of mathematical truths that were formally undecidable -- they could neither be proved nor disproved. Hilbert had hoped that once mathematics was formalized, no stone would be left unturned, and all true mathematical statements would be provable. The story of Gödel's theorem (not the history, just an outline of the proof itself) is a wonderful story, well told in Hofstadter's book Gödel, Escher, Bach.
But getting back to semantic web stuff. Here are some basic ideas of OWL.
Everything is an instance of owl:Thing. Think of it as a base class like java.lang.Object.
Within an ontology, you have "individuals", "classes", and "properties".
"Classes" are essentially sets. "Individuals" are elements of sets.
A "property" expresses some relationship between two individuals.
OWL includes representations for:
unions and intersections of classes (sets)
the idea that a set is a subset of another
the idea that two sets are disjoint
the idea that two sets are the same set
the idea that two individuals are the same individual
Properties can be symmetric (like "sibling") or transitive (like "equals")
A property can be "functional", or a function in a mathematical sense. If p is functional, and you assert that p(x)=y and p(x)=z, then the reasoning engine will conclude that y=z. (There's a small sketch of this just after the list.)
One property can be declared to be the inverse of another.
One can declare a property to have specific classes (sets) as its domain and range.
It would be really nice if, at this point, I had some brilliantly illustrative examples of OWL hacking ready to include here. Hopefully those will be forthcoming.
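In the meantime, here's a minimal sketch of the "functional property" item above, using Python's rdflib (my choice of tool here; the names are invented for illustration):

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, OWL

EX = Namespace("http://example.org/")
g = Graph()

# Declare hasBirthMother to be a functional property.
g.add((EX.hasBirthMother, RDF.type, OWL.FunctionalProperty))

# Assert p(x) = y and p(x) = z.
g.add((EX.alice, EX.hasBirthMother, EX.carol))
g.add((EX.alice, EX.hasBirthMother, EX.carla))

# Feed this graph to an OWL reasoner (the owlrl package, say) and it
# will conclude owl:sameAs(carol, carla), i.e. y = z.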
I'm not a big traveler generally speaking, but my new job with Litl is bringing me to Taipei and Hsinchu in Taiwan for a few days next week. I'm excited and a little nervous. I've tried to pick up a few words of Mandarin over the past week using Rosetta Stone, but it's a tough language for an American with only dim memories of high school French.
Hopefully I'll be posting some cool pictures soon, if I get a chance to wander anywhere interesting.
Tuesday evening
What the heck is Tuesday when you're 12 time zones from home? Here in Taiwan it's 8:30 PM. Back home in Massachusetts, it's 8:30 AM on Wednesday morning. Between the time difference and the jet lag, not a lot of luck in reasoning about time.
I'm doing very slightly better with language, strangely. I've identified two glyphs. One (子) looks like a seven with a horizontal line through it. It's pronounced "tze" (the vowel is a schwa) and I've seen it occur at the ends of several words or small phrases, but I don't know its meaning. The other I can't remember now, because I'm too jet-lagged. Another thing this morning: I identified a glyph that I believe is a very recent invention without any of the historical roots of the other characters. It's an outline of the Red Cross symbol, with crossbars along the top and bottom edges, and my guess is that it indicates a hospital. Some of the mechanics of how these characters are formed are fascinating.
Today I visited Hsinchu with a couple of other engineers, one also from Litl, and one from Motorola. We did a bunch of debug on some boards that a contractor is designing for us.
I'm surprised how normal things feel in Taiwan. I had expected it to feel more alien. But everything kinda fits and makes sense. It's interesting to be immersed in a culture that's a little different but not very, and a language that is thoroughly alien. (Though I suppose the click languages of southern Africa would be even more alien.)
Friday evening, Taiwan time
你好, East Coast folks! I should be back in about 24 hours. Just in time to liberate the cat from cat jail and spend the rest of Saturday morning napping.
I really wish I'd gotten an earlier start on learning some Chinese and applied myself more diligently. It was frustrating to look around and see and hear all this interesting language and understand nearly none of it. Oh well, there should be more opportunities. I'm given to understand that my work will bring me to mainland China before too long.
I also wish I'd thought to take more photos. I just totally spaced on the fact that I'd brought along a camera.
Saturday evening, back home
Still a bit dazed about time zones. Spent 18 hours on airplanes getting home, with a layover in SF long enough to stroll around Fisherman's Wharf. Both airplanes were useless for sleeping so I needed to nap. Gonna try to use melatonin to get my biorhythms resynchronized.
I think I was mistaken in thinking the Red-Cross-like character was a recent invention. I later saw other usages that were inconsistent with that theory. It just doesn't feel calligraphic to me in the same way as the rest of the written language.
Here's something humorous: most of the comments to most of my blog postings are in Chinese, with a string of periods, each an HTML link to some Chinese porn site. They're doing this to try to crank up the Google ratings of their porn industry, obviously. The same is true of this posting, there is currently one comment from a friend in Kolkata and four of these porn-site-promoting comments. It just seems kinda funny that they're in response to a posting about visiting Taiwan. I dunno, it sounded funnier when I first thought of it. If anybody knows how to block such comments on one's blog without blocking any legitimate comments from the same geographical area, I'd love to hear about it.
Like everybody else, I'm disappointed with Google on this one. The stuff about the wired Internet is good, it's actually a stronger stance on net neutrality than has existed to date. But the wireless Internet is now supposed to be the Wild West of high tech, a lawless place where anybody big enough can do anything they want. Google should know better. But Google is not the important party in all this.
My feelings about Verizon are very different. Verizon paid for the network (having purchased it from its builders and/or previous owners) and now pays to maintain it. When the network in my neighborhood goes down, the trucks that come to fix it are Verizon trucks. It's fair and reasonable for Verizon to decide which packets its network will carry, and how those packets will be prioritized.
What would not be fair or reasonable would be to allow Verizon to block other efforts to build traffic-bearing networks.
I would love to see a parallel Internet built by hobbyists and local communities and small businesses. A few years back there was a wonderful book called Building Wireless Community Networks by Rob Flickinger. It seemed to me that Flickinger envisioned a nation-wide and perhaps world-wide community network. Maybe I was projecting my own hopes, but I like to think he might have shared that sentiment.
The right response to the Google-Verizon deal is not to complain about Google's duplicity. They are a publicly traded company, with all that entails. The right response is to start building a network that isn't supported by already-large corporations, where individuals and small new companies don't need to worry about policy decisions by the Googles and Verizons of the world.
Maybe this should replace Amateur Radio, which has been in decline since the Internet came along.
Lately I've been watching a video of Richard Dawkins reading from his new book "The Greatest Show on Earth". As always, he is fascinating and lucid.
Sometimes people criticize evolution on the grounds that "it's all about randomness". They ask questions like this:
If I spread a bunch of airplane parts on a football field, and a tornado comes around and stirs up all the parts, what is the likelihood that the result will be a correctly assembled, functioning airplane? This is the same likelihood that the human body (or the eye, or the brain, or the hand) could have arisen out of evolution, a process characterized entirely by randomness.
Evolution consists of two parts. One is variation, which can be random but need not be, and the other is selection, which is not random at all. The part of evolution that is random, the point mutations and crossovers among chromosomes, is not where its explanatory power resides. If that were the whole story, then complex forms really would be no more likely than working airplanes popping out of tornadoes. These random bits of variation merely supply the variety upon which the filter of selection operates.
It is in selection that the explanatory power of evolution resides. Selection is the non-random part of evolution, where the signal (this trait works) is separated from the noise (that trait doesn't work). It is because selection is consistent and non-random that we see the re-appearance of traits at very different times and places in the history of life. Tyrannosaurus Rex and my cat both have sharp claws. Are cats direct descendants of T. Rex, and did those sharp-claw genes somehow survive tens of millions of years unmodified? No, but they are both hunting predators faced with problems that sharp claws solve. Likewise, complex eyes with focusing lenses have independently evolved dozens of times, because clear vision is useful.
Randomness in the physical world is of two types, fundamental randomness and consequent randomness. Fundamental randomness is the stuff of quantum mechanics. When particles appear to act randomly, are there hidden variables which, if we could see them, we'd be able to see through the apparent randomness to an underlying determinism? If there aren't, then the universe includes a component of fundamental randomness -- some things are just random and there's nothing you can do about it. My understanding is that it's still an open question among physicists whether fundamental randomness exists in the universe, but the weight of opinion favors it, as experimentation has ruled out local hidden variables and only non-local hidden variables remain as a possibility.
Consequent randomness is the appearance of randomness among things that are individually deterministic. A cryptographic hash algorithm is a good example. If we feed this deterministic process with a deterministic input sequence (e.g. 1, 2, 3, 4, 5...) what we get is an output sequence of large integers that look entirely random. They pass every statistical test of randomness with flying colors. Yet in some important sense they aren't random at all, because we can start the input sequence over and we get exactly the same output sequence repeated. So we have apparent randomness arising from deterministic pieces in a complicated Rube Goldberg fashion.
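Here's a toy demonstration in Python using SHA-256:

import hashlib

# A boring, deterministic input sequence (1, 2, 3, ...) produces output
# that passes statistical tests of randomness, yet rerunning the loop
# reproduces the same outputs exactly.
for i in range(1, 6):
    print(i, hashlib.sha256(str(i).encode()).hexdigest())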
Consequent randomness can easily arise where there is a mixing of data with different explanations or from different domains. People's cholesterol levels and their telephone numbers are unrelated, so if telephone numbers are put in order of the person's cholesterol level, the sequence appears random.
Often people object to evolution on the grounds that it requires fundamental randomness, and these same people often find the notion of fundamental randomness personally abhorrent, and so they accept this situation as a disproof of the theory of evolution. In fact, evolution works just fine when variation is driven by consequent randomness. All genetic algorithms running on computers work this way.
In his talk above, Dawkins discusses a much better potential disproof of evolution, for which he thinks the creationists ought to be scrambling to find evidence. If we found fossils in the wrong geological strata, for instance a rabbit fossil among dinosaur fossils or trilobite fossils, then the case for evolution would be significantly weakened. Such fossils, which Dawkins calls "anachronistic", have never been found among the many hundreds of thousands of fossils recorded in natural history museums and universities around the world. While we may find gaps in the fossil record, we never find such temporal discrepancies.
So why do I personally believe in evolution? I have two answers to that. The first is that the entire process can be done on a computer. It's a standard thing, people have been doing it for years, it reliably solves hard problems, it's a classic technique in computer science. For me to disbelieve in the efficacy of genetic algorithms would be akin to an auto mechanic whose personal convictions prevent him believing in the inflation of tires, while his colleagues inflate tires on a daily or weekly basis in the shop around him.
Second, the fundamental idea of evolution is so simple. It has so few parts. There are only a very small number of places it could possibly go wrong. If it went wrong in one of those places, no malicious cabal or conspiracy of evolutionary biologists could cover up its failure for long. The logic of evolution is too simple and too compelling to be incorrect.
Objectors to evolution are sometimes motivated by the fear that it rules out the possibility of an afterlife. Having lost loved ones and myself being mortal, I have some appreciation for this concern. Personally I cannot rule out the possibility of a universe of Cartesian dualism, and in fact I very much hope it's the case. As far as I am aware the strongest arguments against dualism are Occam's razor and Dennett's objection that in a dualist universe, a philosopher like himself would have no hope of understanding or explaining anything because everything would be arbitrary. I also appreciate Dennett's position, but it seems to me to lack imagination -- perhaps there is a dualism that is lawful, understandable, and explainable, and which could ultimately become part of science, but which also allows for some piece of a person's mind or personality that outlasts the physical body. Then it might be possible that such minds and physical bodies might undergo parallel processes of evolution as organisms increase in complexity over billions of years.
In December I wrote a very lame Android app. It had a couple of buttons, a date picker, and a green background. The buttons incremented and decremented a counter. Clay Shirky's talk on cognitive surplus referred to LOLcats as the minimal creative act, the feeblest teeny quantum of effort one can make in a creative direction. He only said that because he wasn't familiar with my first Android app.
My second app, written over the last few days, is way cooler. So much so that I'm willing to post the source code, risking public humiliation. Good thing nobody ever reads this blog.
This app actually serves a purpose. The Motorola Droid phone can find your location using its GPS receiver, but there is no convenient way to then share that location information with a friend (via email, SMS, Twitter, or what have you). The app determines your location, converts it to a Google Maps URL, and then you copy/paste it into an email, an SMS message, or a tweet.
When I tweet a location from my Droid phone it looks like this. I don't know if that location bit appearing under the tweet (which allowed me to pop up the map) came from the location I tweeted, or whether it was some kind of metadata that the phone's Twitter client somehow attached to the tweet separately.
If you have some strange urge to try this on your own Android phone, you can download the unsigned APK file, load it onto your phone's SD card, and install it with AppInstaller, available in the Android app market.
The interesting things I ran into with this app are mostly in the single Java source file. I learned that you need to make sure that your onPause and onResume methods call the parent. I think (not sure) it's a bad idea to call removeUpdates() on a LocationListener more than once.
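The resume path looks roughly like this (a sketch, not the app's literal code; locationManager and locationListener are fields of the Activity):

@Override
protected void onResume() {
    super.onResume();  // omit this and Android throws SuperNotCalledException
    locationManager.requestLocationUpdates(
            LocationManager.GPS_PROVIDER, 1000L, 1.0f, locationListener);
}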
Still tinkering and figuring my way around Android's JSR-179 implementation. Interesting cool stuff. I look forward to writing an Android client for some sort of web-accessible database thing that I'd probably throw up on App Engine at some point, when I can come up with a worthwhile application.
I recently got a fortune cookie that said something along these lines:
Skillful actions come from experience. Experience comes from unskillful actions.
Maybe I can share some experience and save somebody a little grief. I anticipate I'll post more things along these lines so I'm calling this "part 1".
I've spent some months building a Django website. The website has been growing more complex and the requirement has been a moving target, so I have been developing practices accordingly.
The first thing is automated testing. We all know it's good, but we don't always remember just how good. Plan your test strategy early.
Django's templating system works well with nested Python dictionaries, so a data structure like
{"foo":
{"bar":
{"baz": "some content here"},
... }
... }
can be included in an HTML page with a notation like this.
Let's put {{ foo.bar.baz }} in our web page.
Python dictionaries are essentially the same as JSON data structures. So all my view functions produce nested Python dictionaries, which can either be plugged into HTML templates, or returned as JSON if "json=1" is present in the HTTP request parameters. In the short term, the JSON output makes it very easy to write automated tests that don't have to scrape HTML to find the content. In the longer term, I'll want JSON when I move to AJAX some day.
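The pattern is roughly this (view and template names invented for illustration, Django 1.x vintage):

import json
from django.http import HttpResponse
from django.shortcuts import render_to_response

def profile(request):
    data = {"foo": {"bar": {"baz": "some content here"}}}
    if request.GET.get("json") == "1":
        # automated tests and future AJAX calls take this branch
        return HttpResponse(json.dumps(data), mimetype="application/json")
    # ordinary page loads plug the same dictionary into the template
    return render_to_response("profile.html", data)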
The second topic is URL design. I've discovered a new way to write spaghetti code -- I've produced a profusion of URLs as I've grown the functionality rapidly. Each URL ("/foo/", "/bar/", "/profile/", etc) has an entry in urls.py and a function in views.py. If you're careless about planning, these tend to sprawl.
I think the right thing is to draw something like a state machine diagram for your website. The nodes are the URLs, each mapping to a HTML page. The edges are the actions users take to go from page to page, clicking buttons or controls or submitting HTML forms. Somewhere in there you need notations for the stuff happening in the back end, things being fetched from the DB or stored, various computations being done, various complex data structures being constructed. My thoughts on how to construct a proper state diagram are not yet complete.
The Symphony No. 3 in C minor, Op. 78, was completed by Camille Saint-Saëns in 1886 at what was probably the artistic zenith of his career. It is also popularly known as the "Organ Symphony", even though it is not a true symphony for organ, but simply an orchestral symphony where two sections out of four use the pipe organ. The French title of the work is more accurate: Symphonie No. 3 "avec orgue" (with organ).
Of composing the work Saint-Saëns said that he had "given everything to it I was able to give." The composer seemed to know it would be his last attempt at the symphonic form, and he wrote the work almost as a type of "history" of his own career: virtuoso piano passages, brilliant orchestral writing characteristic of the Romantic period, and the sound of a cathedral-sized pipe organ. The work was dedicated to Saint-Saëns's friend Franz Liszt, who died that year, on July 31, 1886.
On Saturday I went to the New England Steampunk Festival in Waltham, Massachusetts. It was delightful. A lot of people in Victorian dress with complicated goggles, and elaborate gadgets hanging off their belts or strapped to their backs. I took some pictures and did some twittering while I was there.
Steampunk has its apologists, but I'm not sure it needs them. I heard a few complaints that some gadgets were simply props and did nothing, and some gadgets were built with obviously modern pieces. To the first I'd say that steampunk is a style, not a technology (the enthusiasts are very clear on this, and unashamedly use the word "prop" for their toys) and to the second, I'd say that you can't expect them all to be equally skilled and ambitious, and if they're having fun and not hurting anyone, is it really so terrible that you can see the plastic Coke bottle cap on their ray-gun?
I'm surprised that there aren't more steampunk graphic novels. That strikes me as a natural fit. I also wish they weren't quite so obsessed with "airships", the way Fifties sci-fi was obsessed with flying saucers and robots.
Attending Steampunk Festival, Charles River Museum of Industry and Innovation, Waltham MA
Steam engines http://twitgoo.com/u4fvs
More steampunk stuff http://twitgoo.com/u4g12
The guy calls it a "spirit harvester" http://twitgoo.com/u4g74
Best costume imho http://twitgoo.com/u4gb6
I later found this woman's blog, mostly a compendium of sci-fi and fantasy events happening around New England. Useful and interesting.
Won a piece of optometry equipment in the raffle at the New England Steampunk Festival in Waltham MA http://twitgoo.com/u4yn5
The next day I was thinking about pipe organs, and about all the cool stuff I saw on Saturday, and it occurred to me that it would be feasible (even for me) to build a small USB-controlled pipe organ rank. My one area of uncertainty is the solenoid valves; it seems difficult to find them at a price that's affordable if I want to put in fifty or so pipes.
Yesterday I had fun giving a talk on the automation of science at Bar Camp Boston. I was very fortunate to (A) have very little to say myself, so that I quickly got out of the way for others to discuss, and (B) have some very smart people in the room who got the idea immediately, some of them able to give the scientist's-eye view of this idea.
Discussion centered around a few topics. One was how comprehensive a role would computers play in the entire scientific process. There seemed to be consensus that computers could easily identify statistical patterns in data, could perform symbolic regression in cases of limited complexity and not too many variables, but that in the creation of scientific theories and hypotheses, there are necessary intuitive leaps that a machine can't make. Personally I believe that's true but I imagine that computers might demonstrate an ability to make leaps we can't make as humans, and I have no idea what those leaps would look like because they would be the product of an alien intelligence. If no such leaps occur, at least the collection of tools available to human scientists will hopefully have grown in a useful direction.
Another topic was the willingness of scientists to provide semantic markup for research literature. Only those expert in the field are qualified to provide such markup since it requires an in-depth understanding of the field as a whole, and the paper's reasoning process in particular. It's also likely to be a lot of work, at least initially, and there is as yet no incentive to offer scientists in exchange for such work. The notion of posting papers on some kind of wiki and hoping that semantic markup could be crowd-sourced was quickly dismissed. Crowd-sourcing doesn't work when there is a very precise correct answer and the number of people with that answer is very small.
There has been a lot of Twitter traffic around Bar Camp Boston, and I was able to find a few comments on my talk afterward. It looks like people enjoyed it and found it stimulating and engaging, so that's very cool. It turned out to be a good limbering-up for an immediately following talk on Wolfram Alpha. I found one particularly evocative tweet:
Has anyone approached a CS journal to have their content semantically marked up? #BCBos @BarCampBoston
Thinking about that question, I realized that computer science is the right branch of science to begin this stuff, and that the way to make it most palatable to scientists is to publish papers that demonstrate how to do semantic markup as easily as possible at time of publication (not as a later retrofit), how a scientist can benefit himself or herself by doing that work, and how to do interesting stuff with the markup of papers that have already been published. My quick guess is that some sort of literate programming approach (wiki) is appropriate. So lots to think about.
If you attended my talk, thanks very much for being there. I had a lot of fun, and hope you did too.
I came across this tidbit on the Lambda the Ultimate website. It's a pointer to a juicy paper by some Carnegie Mellon folks.
Abstract. We present a formal system, E, which provides a faithful model of the proofs in Euclid’s Elements, including the use of diagrammatic reasoning.
"Diagrammatic reasoning" is the interesting part. People have recognized the Elements as an exemplar of rigorous reasoning for many centuries, but it took some time for the question to emerge, "are the diagrams a necessary component of the logical argument?" Liebniz believed they were not:
...it is not the figures which furnish the proof with geometers, though the style of the exposition may make you think so. The force of the demonstration is independent of the figure drawn, which is drawn only to facilitate the knowledge of our meaning, and to fix the attention; it is the universal propositions, i.e. the definitions, axioms, and theorems already demonstrated, which make the reasoning, and which would sustain it though the figure were not there.
The authors note that "there is no [historical] chain linking our contemporary diagrams with the ones that Euclid actually drew; it is likely that, over the years, diagrams were often reconstructed from the text". Their abstract seems to say that the design of E recognizes some essential role for the diagrams, so I assume one must exist. I haven't finished reading the paper yet. But the whole thing is very interesting.
We software developers have a knee-jerk hatred of specifications. Rather than write a document describing work we plan to do, we would rather throw together a quick prototype and grow it into the final system. We sometimes feel like specs are for liberal-arts sissies and pointy-haired bosses. Our prehistoric brains want us to dismiss specifications as a waste of time or even an intentional misdirection of energy.
The truth of it is that specs build consensus between developers, testers, tech writers, managers, and customers. They make sure everybody agrees about what to build, how to test it, how to write a user manual for it, and what the priorities are.
The Agile guys talk about the exponentially increasing cost of fixing a bug. The later in the process you find that bug, the more troublesome and expensive it is to fix it. Fixing bugs in code is hard, even prototype code, and fixing text is easy.
Let's learn to trick our brains to work around our reluctance. The Head-First books always start with a great little explanation about how our prehistoric brain circuitry divvies up our attention, classifying things as interesting or boring, and determines what sticks in our memories. Sesame Street learned how to make stuff sticky by
repetition
lighting up more brain circuitry
infusing the topic with emotional content
relating it to things that were already sticky
One way to infuse your spec with emotional content would be to make it a turf war. That hooks into all our brain circuitry for tribes and feuds. But turf wars are traumatic and damaging to people and projects, so let's not do this.
To light up more brain circuitry, sketch out pieces of the spec on a big whiteboard. Draw a lot of pictures and diagrams. Use different colored markers. Get a few people together and generate consensus (not a turf war), and ask them to help identify issues that you forgot. That meeting is called a design review, like a code review for specs.
Who should write and own the spec? Part three of Joel Spolsky's great four-part (1, 2, 3, 4) article answers this question, drawing on his experience at Microsoft. One person should write and own the spec, and the programmers should not report to that person. At Microsoft, that person is a program manager.
It's important to differentiate between
a functional spec (what the user sees and experiences, what the customer wants) dealing with features, screens, dialog boxes, UI and UX, work flow
and a technical spec (the stuff under the hood) dealing with system components, data structures and algorithms, communication protocols, database schemas, tools, languages, test methodologies, and external dependencies which may have hard-to-predict schedule impacts
Write the functional spec first, then the technical spec, then the code. If you love test-driven development then write the specs, then the tests, then the code.
Write simply, clearly, and briefly. Don't pontificate.
Re-read your own spec, many times. Eat your own literary dogfood. If you can't stay awake, nobody else will either.
Avoid working to a template unless politically necessary.
How do you know when the spec is done?
The functional spec is done when the system can be designed, built, tested, and deployed without asking more questions about the user interface or user experience.
The technical spec is done when each component of the system can be designed, built, tested, and deployed without asking more questions about the rest of the system.
This doesn't mean that these documents can never be updated or renegotiated. But the goal is to aim for as little subsequent change as possible.
I am still sorely tempted by the idea of a quick prototype, an "executable spec" that exposes bugs in design or logical consistency. Maybe it's OK to co-develop this with the spec, or tinker with it on one's own time, or consider it as a first phase of the coding. I'm still sorting this out. The basic rationale of a spec, that fixing bugs in text is easier and cheaper than fixing bugs in code, still needs to be observed.
Back in the days of its founding, Apple championed hobbyists and experimenters, even including circuit board schematics with the Apple ][+ to help people who wanted to tinker with the electronics. Not so now. Cory Doctorow (brilliant guy, read his Disneyland sci-fi novel) recently blogged about how Apple has switched its loyalty to the DRM-and-eternal-copyright crowd, and like the iPhone, the iPad reflects this. Consequently, the common temptation to covet an iPad is an evil one.
I like my Android phone (a Motorola Droid from Verizon) except for the PHONE part, the one thing it does poorly. Every other function, I adore. Also I'd like a bigger keyboard and screen, maybe Kindle size. So: Android tablet with bigger keyboard and screen, and no phone (therefore no messy dependency on mobile carriers).
I wouldn't want to try to build a tablet from scratch, but the Touch Book from AlwaysInnovating looks good. The tablet piece (sans keyboard, which makes it a netbook) is $300, loaded with their custom Linux OS. The OS can be replaced with Ubuntu, Android, Chrome, etc. An SD card makes it easy to get apps and files onto and off the tablet. There's a wiki to help developers get up to speed.
In another video, the inventor shows how to enable route tracking on Google Maps by popping off the back cover and plugging a GPS receiver into an internal USB connector. I am currently between jobs, but this is going on my shopping list for later.
Earlier I blogged about how it seemed like web app development had just zoomed past me. Since then, I've buckled down and actually started to study this stuff. My earlier posting only talked about the presentation layer, HTML, Javascript, and CSS. I still have more to learn about those, but the really interesting stuff happens on the server.
In December I went to a two-day session on Hibernate and Spring, and it was full of mysterious jargon that made me sleepy: dependency injection, inversion of control, aspects, object-relational mapping, convention over configuration, blah blah blah. I kept at it, though, looking at Rails and later Django. I'm now waist-deep in building a MySQL-backed Django site. What I learned is that (A) all these web app frameworks are remarkably similar to one another, and (B) those jargon terms are a lot simpler than they seem.
Inversion of control means that the framework makes calls into your app code, rather than you calling the framework from a main() function. Dependency injection is a set of tricks to minimize dependencies between different Java source files. Aspects are Java tricks that you can do by wrapping your methods in other methods with the same signatures, a lot like decorators in Python. Object-relational mapping is creating classes to represent your DB tables: each instance represents a row, each column is represented by a setter and getter. The MVC pattern gives the lay of the land for all these frameworks, and all the presentation stuff I talked about before is limited to the "view" piece.
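The decorator analogy makes aspects concrete. A toy Python example, wrapping a function in another function with the same signature:

import functools

def logged(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("calling %s" % func.__name__)  # the cross-cutting concern
        return func(*args, **kwargs)         # the original behavior
    return wrapper

@logged
def transfer(amount):
    return amount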
As I find my footing in the basics, I start to notice where the interesting bits of more advanced topics pop up. If I put a Django app and a Mediawiki on the same server, can I do a single sign-on for both of them? I think I can, by writing an AuthPlugin extension to make the Mediawiki accept Django's authentication cookie.
Don't ask Django to serve a PHP page because it doesn't include a PHP interpreter (what mod_php does for Apache). Your Apache config file must deal with PHP files before routing to Django.
One thing I haven't quite understood is why the Django community seems to love Prototype and hate jQuery. Is that just because Prototype is included in the standard Django package? Is it purely historical, with jQuery the abandoned but superior Betamax to Prototype's VHS?
Digilent, a partner of Xilinx, makes eval boards for Xilinx FPGAs. I bought one and plan to hack some Verilog with it. My past experiments involved a board of my own design with an FPGA and a USB-enabled microcontroller. I successfully programmed the microcontroller over the USB cable to wiggle GPIO pins, which should have allowed me to program the FPGA via JTAG. But for some reason, JTAG programming of the FPGA didn't work. This time the JTAG programming pins are wired directly to parallel port pins, and there is a Linux library for programming them, so I should have better luck: fewer unknowns and variables, more easily probed.
Attention (hey, shiny!) deficit break: I stumbled across a couple of very affordable logic analyzers. Amazing stuff, just the thing for debugging errant JTAG signals.
Some nice folks have released a PCI soft core under the LGPL. I'm not ready to tackle that yet, but hope to get there before too long. Speaking of PCI, here is a nice FPGA board for a PCI bus slot from Enterpoint in the UK. They also have a PCI soft core but the licensing is a bit pricey for a hobbyist. I wonder if the LGPLed PCI core would work on the Enterpoint board.
Earlier I posted about TA-65, a telomerase activator, which some hope could reverse some of the effects of aging. Amiya Sarkar is a doctor in Calcutta who writes a fascinating blog on physiology and physics. He and I have emailed back and forth for a couple years now, starting with a very cool idea he had for an inexpensive open-source electrocardiogram. (One of these days we really need to get that project back on track.)
Amiya expressed the concern that any telomerase activator could be viewed as a potential cancer risk. Cancerous cells use telomerase to support the unlimited replication that characterizes cancer. The folks at Sierra Sciences openly recognize this concern, and give reasons why they believe it's a red herring, on this webpage:
In most cases (85–95%), cancers accomplish this indefinite cell division by turning on telomerase. For this reason, forcing telomerase to turn off throughout the body has been suggested as a cure for cancer, and there are several telomerase inhibitor drugs presently being tested in clinical trials.
So, anti-aging scientists must be out of their minds to want to turn the telomerase gene on, right?
No! Although telomerase is necessary for cancers to extend their lifespan, telomerase does not cause cancer. This has been repeatedly demonstrated: at least seven assays for cancer have been performed on telomerase-positive human cells: the soft agar assay, the contact inhibition assay, the mouse xenograft assay, the karyotype assay, the serum inhibition assay, the gene expression assay, and the checkpoint analysis assay. All reported negative results...
Paradoxically, even though cells require telomerase to become dangerous cancers, turning on telomerase may actually prevent cancer. This is not just because the risk of chromosome rearrangements is reduced, but also because telomerase can extend the lifespan of our immune cells, improving their ability to seek out and destroy cancer cells.
In support of this, they list several papers.
Jiang, X.-R. et al. Telomerase expression in human somatic cells does not induce changes associated with a transformed phenotype. Nature Genet., 21, 111–114 (1999)
Morales, C. P. et al. Absence of cancer-associated changes in human fibroblasts immortalized with telomerase. Nature Genet., 21, 115–118 (1999)
Harley, C. B. Telomerase is not an oncogene. Oncogene 21(4): 494-502 (2002).
From other writings on their website, and from their postings to Twitter and Facebook, it's clear that the Sierra Sciences folks are 100% confident that telomerase activators pose zero cancer risk. They are in a much better position to know about this than I. But if I started taking TA-65 and they were somehow mistaken, they wouldn't be the ones at risk for cancer. I hope to find out about those seven assays and try to read those three papers in my abundant spare time, and maybe discuss the matter with my doctor. (My present circumstances do not permit me to afford TA-65 even if I decide I want it.) Wouldn't it be cool if the Sierra Sciences people turn out to be correct...
I'm learning Ruby on Rails to help a friend with his website and to be able to put it on my resume, and keeping notes as I go.
I've gotten the thing to do typical CGI script stuff, and now I'm figuring out how database access works. One big surprise is that as Rails advanced to version 2.0, one of the basic commands for setting up database access changed. Google "rails 2.0 scaffolding" for details.
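As I understand the change (model and attribute names here are invented): Rails 2.0 wants the model's attributes spelled out on the generator's command line,

ruby script/generate scaffold Product name:string price:decimal

where older Rails would let you scaffold a model without listing its fields.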
The Jena code has two representations for nodes in an RDF graph. One is the class Node, which has several subclasses: Node_Variable, Node_Literal, Node_URI, etc. The other is the interface RDFNode, which has many subinterfaces: Literal, Resource, Property, etc.
These two node representations have very different roles and very different idiomatic usages, and this doesn't appear to be spelled out in the Jena documentation anywhere. RDFNode is in the com.hp.hpl.jena.rdf.model package, while Node is in the com.hp.hpl.jena.graph package, but I don't think the packaging by itself is a big enough hint.
The Jena tutorials mostly talk only about the RDFNode variants, usually instantiating them by calling a "create" method on the Model. The poorly documented distinction between RDFNode and Node extends to the distinction between Model and Graph, and between Statement and Triple.
Since this information didn't appear in the documentation, we need to look at the Jena mailing list to find it.
A key difference between Resource and Node is that Resources know which model they are in, and Nodes are general. That's what makes resource.getProperty() work. Now in a query that is not a concept that has any meaning in the general case and patterns can span graphs.
We have found that Model/Statement/RDFNode (the API) works as an application interface but it's not the right thing for storage abstractions and the Graph/Triple/Node (the SPI) works better where the regularity is more valuable. That is, we have split the application-facing design from the sub-system-facing design.
So an instance of RDFNode is associated with a specific Model, where an instance of Node is free-floating, and is used to build Rules, which are also model-independent. The two representations can be connected by URIs. If you have a Node and a Model, and you want the corresponding RDFNode, do this (or use createProperty or createLiteral as needed):
Resource r = model.createResource(uri1);
and if you have an RDFNode, you can do this to get a Node:
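Node n = r.asNode();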
So I can understand that there are two very different interfaces, one appropriate for writing Jena apps and one for interfacing to a storage system. What I don't get is why I would ever see the latter while writing an application. If I define a Rule, I need to deal in Nodes; presumably that's because I've been constructing Rules programmatically rather than just reading them in from a file. Maybe I should just read them in from a file.