Sunday, April 18, 2010

Autosci talk at Bar Camp Boston

Yesterday I had fun giving a talk on the automation of science at Bar Camp Boston. I was very fortunate to (A) have very little to say myself, so that I quickly got out of the way for others to discuss, and (B) have some very smart people in the room who got the idea immediately, some of them able to give the scientist's-eye view of this idea.

Discussion centered around a few topics. One was how comprehensive a role computers would play in the entire scientific process. There seemed to be consensus that computers could easily identify statistical patterns in data and could perform symbolic regression in cases of limited complexity and not too many variables, but that the creation of scientific theories and hypotheses requires intuitive leaps that a machine can't make. Personally I believe that's true, but I imagine that computers might demonstrate an ability to make leaps we can't make as humans, and I have no idea what those leaps would look like because they would be the product of an alien intelligence. If no such leaps occur, at least the collection of tools available to human scientists will hopefully have grown in a useful direction.

Another topic was the willingness of scientists to provide semantic markup for research literature. Only those expert in the field are qualified to provide such markup since it requires an in-depth understanding of the field as a whole, and the paper's reasoning process in particular. It's also likely to be a lot of work, at least initially, and there is as yet no incentive to offer scientists in exchange for such work. The notion of posting papers on some kind of wiki and hoping that semantic markup could be crowd-sourced was quickly dismissed. Crowd-sourcing doesn't work when there is a very precise correct answer and the number of people with that answer is very small.

There has been a lot of Twitter traffic around Bar Camp Boston, and I was able to find a few comments on my talk afterward. It looks like people enjoyed it and found it stimulating and engaging, so that's very cool. It turned out to be a good limbering-up for an immediately following talk on Wolfram Alpha. I found one particularly evocative tweet:
Has anyone approached a CS journal to have their content semantically marked up? #BCBos @BarCampBoston
Thinking about that question, I realized that computer science is the right branch of science to begin this stuff, and that the way to make it most palatable to scientists is to publish papers that demonstrate how to do semantic markup as easily as possible at time of publication (not as a later retrofit), how a scientist can benefit himself or herself by doing that work, and how to do interesting stuff with the markup of papers that have already been published. My quick guess is that some sort of literate programming approach (wiki) is appropriate. So lots to think about.

If you attended my talk, thanks very much for being there. I had a lot of fun, and hope you did too.

Monday, April 12, 2010

A Formal System For Euclid's Elements

I came across this tidbit on the Lambda the Ultimate website. It's a pointer to a juicy paper by some Carnegie Mellon folks.
Abstract. We present a formal system, E, which provides a faithful model of the proofs in Euclid’s Elements, including the use of diagrammatic reasoning.
"Diagrammatic reasoning" is the interesting part. People have recognized the Elements as an exemplar of rigorous reasoning for many centuries, but it took some time for the question to emerge, "are the diagrams a necessary component of the logical argument?" Leibniz believed they were not:
...it is not the figures which furnish the proof with geometers, though the style of the exposition may make you think so. The force of the demonstration is independent of the figure drawn, which is drawn only to facilitate the knowledge of our meaning, and to fix the attention; it is the universal propositions, i.e. the definitions, axioms, and theorems already demonstrated, which make the reasoning, and which would sustain it though the figure were not there.
The authors note that "there is no [historical] chain linking our contemporary diagrams with the ones that Euclid actually drew; it is likely that, over the years, diagrams were often reconstructed from the text". Their abstract seems to say that the design of E recognizes some essential role for the diagrams, so I assume one must exist. I haven't finished reading the paper yet. But the whole thing is very interesting.

Saturday, April 10, 2010

Learning to live with software specifications

We software developers have a knee-jerk hatred of specifications. Rather than write a document describing work we plan to do, we would rather throw together a quick prototype and grow it into the final system. We sometimes feel like specs are for liberal-arts sissies and pointy-haired bosses. Our prehistoric brains want us to dismiss specifications as a waste of time or even an intentional misdirection of energy.

The truth of it is that specs build consensus between developers, testers, tech writers, managers, and customers. They make sure everybody agrees about what to build, how to test it, how to write a user manual for it, and what the priorities are.

The Agile guys talk about the exponentially increasing cost of fixing a bug. The later in the process you find that bug, the more troublesome and expensive it is to fix it. Fixing bugs in code is hard, even prototype code, and fixing text is easy.

Let's learn to trick our brains to work around our reluctance. The Head-First books always start with a great little explanation about how our prehistoric brain circuitry divvies up our attention, classifying things as interesting or boring, and determines what sticks in our memories. Sesame Street learned how to make stuff sticky by
  • repetition
  • lighting up more brain circuitry
  • infusing the topic with emotional content
  • relating it to things that were already sticky
One way to infuse your spec with emotional content would be to make it a turf war. That hooks into all our brain circuitry for tribes and feuds. But turf wars are traumatic and damaging to people and projects, so let's not do this.

To light up more brain circuitry, sketch out pieces of the spec on a big whiteboard. Draw a lot of pictures and diagrams. Use different colored markers. Get a few people together and generate consensus (not a turf war), and ask them to help identify issues that you forgot. That meeting is called a design review, like a code review for specs.

Who should write and own the spec? Part three of Joel Spolsky's great four-part (1, 2, 3, 4) article answers this question, drawing on his experience at Microsoft. One person should write and own the spec, and the programmers should not report to that person. At Microsoft, that person is a program manager.

It's important to differentiate between
  • a functional spec (what the user sees and experiences, what the customer wants) dealing with features, screens, dialog boxes, UI and UX, work flow
  • and a technical spec (the stuff under the hood) dealing with system components, data structures and algorithms, communication protocols, database schemas, tools, languages, test methodologies, and external dependencies which may have hard-to-predict schedule impacts
Write the functional spec first, then the technical spec, then the code. If you love test-driven development then write the specs, then the tests, then the code.

Joel's article includes some great points on keeping the spec readable.
  • Use humor. It helps people stay awake.
  • Write simply, clearly, and briefly. Don't pontificate.
  • Re-read your own spec, many times. Eat your own literary dogfood. If you can't stay awake, nobody else will either.
  • Avoid working to a template unless politically necessary.
How do you know when the spec is done?
  • The functional spec is done when the system can be designed, built, tested, and deployed without asking more questions about the user interface or user experience.
  • The technical spec is done when each component of the system can be designed, built, tested, and deployed without asking more questions about the rest of the system.
This doesn't mean that these documents can never be updated or renegotiated. But the goal is to aim for as little subsequent change as possible.

I am still sorely tempted by the idea of a quick prototype, an "executable spec" that exposes bugs in design or logical consistency. Maybe it's OK to co-develop this with the spec, or tinker with it on one's own time, or consider it as a first phase of the coding. I'm still sorting this out. The basic rationale of a spec, that fixing bugs in text is easier and cheaper than fixing bugs in code, still needs to be observed.

Sunday, April 04, 2010

Don't covet Apple's new iPad

Back in the days of its founding, Apple championed hobbyists and experimenters, even including circuit board schematics with the Apple ][+ to help people who wanted to tinker with the electronics. Not so now. Cory Doctorow (brilliant guy, read his Disneyland sci-fi novel) recently blogged about how Apple has switched its loyalty to the DRM-and-eternal-copyright crowd, and like the iPhone, the iPad reflects this. Consequently, the common temptation to covet an iPad is an evil one.

I like my Android phone (a Motorola Droid from Verizon) except for the PHONE part, the one thing it does poorly. Every other function, I adore. Also I'd like a bigger keyboard and screen, maybe Kindle size. So: Android tablet with bigger keyboard and screen, and no phone (therefore no messy dependency on mobile carriers).

I wouldn't want to try to build a tablet from scratch, but the Touch Book from AlwaysInnovating looks good. The tablet piece (sans keyboard, which makes it a netbook) is $300, loaded with their custom Linux OS. The OS can be replaced with Ubuntu, Android, Chrome, etc. An SD card makes it easy to get apps and files onto and off the tablet. There's a wiki to help developers get up to speed.

In another video, the inventor shows how to enable route tracking on Google Maps by popping off the back cover and plugging a GPS receiver into an internal USB connector. I am currently between jobs, but this is going on my shopping list for later.

All web app frameworks lead to Rome

Earlier I blogged about how it seemed like web app development had just zoomed past me. Since then, I've buckled down and actually started to study this stuff. My earlier posting only talked about the presentation layer: HTML, JavaScript, and CSS. I still have more to learn about those, but the really interesting stuff happens on the server.

In December I went to a two-day session on Hibernate and Spring, and it was full of mysterious jargon that made me sleepy: dependency injection, inversion of control, aspects, object-relational mapping, convention over configuration, blah blah blah. I kept at it, though, looking at Rails and later Django. I'm now waist-deep in building a MySQL-backed Django site. What I learned is that (A) all these web app frameworks are remarkably similar to one another, and (B) those jargon terms are a lot simpler than they seem.

Inversion of control means that the framework makes calls into your app code, rather than you calling the framework from a main() function. Dependency injection means a class is handed the objects it depends on (usually by the framework) instead of constructing them itself, which minimizes hard-wired dependencies between different Java source files. Aspects are Java tricks that you can do by wrapping your methods in other methods with the same signatures, a lot like decorators in Python. Object-relational mapping is creating classes to represent your DB tables: each instance represents a row, and each column is represented by a setter and getter. The MVC pattern gives the lay of the land for all these frameworks, and all the presentation stuff I talked about before is limited to the "view" piece.
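To make that jargon concrete, here is a minimal sketch in plain Python. It is not Django, Rails, or Spring code: TinyFramework, logged, and Person are made-up names used only for illustration. The toy framework calls the handler (inversion of control), the decorator wraps a function in another with the same signature (the aspect-like trick), and each Person instance stands for one row of a hypothetical "person" table.

```python
import functools


class TinyFramework:
    """Hypothetical stand-in for a web framework: it calls your code."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        # Registering a handler hands control to the framework.
        def register(handler):
            self.routes[path] = handler
            return handler
        return register

    def handle(self, path):
        # The framework decides when (and whether) app code runs.
        handler = self.routes.get(path)
        if handler is None:
            return "404 Not Found"
        return handler()


def logged(log):
    """Aspect-like wrapper: same signature, extra behavior around the call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            log.append("enter " + func.__name__)
            try:
                return func(*args, **kwargs)
            finally:
                log.append("exit " + func.__name__)
        return wrapper
    return decorate


class Person:
    """ORM-style class: one instance per row, one attribute per column."""
    table = "person"            # assumed table name
    columns = ("id", "name")    # assumed column names

    def __init__(self, id, name):
        self.id = id
        self.name = name

    def to_row(self):
        return tuple(getattr(self, column) for column in self.columns)


app = TinyFramework()
call_log = []


@app.route("/hello")
@logged(call_log)
def hello():
    return "Hello, " + Person(1, "Ada").name
```

Calling app.handle("/hello") runs hello() on the framework's terms, and call_log records the wrapper's enter and exit entries. A real ORM would also generate the SQL; this sketch only shows the row-to-object shape.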

As I find my footing in the basics, I start to notice where the interesting bits of more advanced topics pop up. If I put a Django app and a MediaWiki on the same server, can I do a single sign-on for both of them? I think I can, by writing an AuthPlugin extension to make the MediaWiki accept Django's authentication cookie.

Don't ask Django to serve a PHP page: it doesn't include a PHP interpreter (that's what mod_php provides for Apache). Your Apache config file must deal with PHP files before routing everything else to Django.
    AliasMatch /([^/]*\.php) ..../phpdir/$1
    WSGIScriptAlias / ..../djangodir/django.wsgi
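To check which request paths the AliasMatch pattern above would catch, here is a rough sketch that reuses the same regular expression with Python's re module. It assumes Apache applies the pattern as an unanchored search and consults the PHP rule before falling through to Django, as described above; the pick_handler function and its return values are made up for illustration and are not part of Apache or Django.

```python
import re

# Same pattern as the AliasMatch line; the captured group plays the role of $1.
PHP_PATTERN = re.compile(r"/([^/]*\.php)")


def pick_handler(url_path):
    """Return ("php", filename) if the PHP rule matches, else ("django", path)."""
    match = PHP_PATTERN.search(url_path)
    if match:
        return ("php", match.group(1))
    return ("django", url_path)
```

Under those assumptions, pick_handler("/wiki/index.php") picks the PHP rule with index.php as the captured name, while pick_handler("/blog/2010/") falls through to Django. Because the pattern has no trailing $, a path such as /a.phpx would also match, so anchoring it may be worth considering.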

One thing I haven't quite understood is why the Django community seems to love Prototype and hate jQuery. Is that just because Prototype is included in the standard Django package? Is it purely historical, with jQuery the abandoned but superior Betamax to Prototype's VHS?