Monday, September 13, 2010

Set theory, OWL, and the Semantic Web

Despite my interest in semantic web technology, there is one area I've had a little mental block about, which is OWL. If you just sit down and try to read the available technical information about OWL, it's clear as mud. Imagine my surprise when clarity dawned in the form of the book Semantic Web for Dummies by Jeffrey Pollock, who explains in Chapter 8 that OWL amounts to set theory. The book is surprisingly good, I recommend it.

I attended elementary school in the 1960s, when the U.S. was trying a stupid educational experiment called New Math. The basic premise was that little kids needed to know axiomatic set theory, in order for the U.S. to raise a generation of uber math geeks who could outperform the Soviet engineers who put Sputnik into orbit. If only I'd taken more seriously all this nonsense about unions and intersections and empty sets, I might have avoided all that trouble with schoolyard bullies. Oh wait.... Anyway, in order to fulfill this obviously pointless requirement, our teacher would spend the first three weeks of every school year drilling us on exercises in set theory and then move on to whatever math we actually really needed to learn for that year. The take-home lesson was that intersection was preferable to union, because writing the result of a union operation meant I had to do more writing and it made my hand hurt. In retrospect it's amazing that I retained any interest in mathematics.

Set theory came into vogue as guys like David Hilbert and Bertrand Russell were fishing around for a formal bedrock on which to place the edifice of mathematics. The hope was to establish a mathematics that was essentially automatable, in the belief that as a result it would be infallible. So they went around formalizing the definitions of various mathematical objects by injecting bits of set theory. One of the more successful examples was to use Dedekind cuts to define the real numbers in terms of the rational numbers.

Hopes of the infallibility of mathematics' new foundation were dashed by Kurt Godel's brilliant incompleteness theorem, described as “the most signal paper in logic in two thousand years.” It was possible to define mathematical ideas in set theoretic terms, and to formalize the axioms, and to automate the proof process, but at a cost. Godel proved the existence of mathematical truths that were formally undecidable -- they could neither be proved nor disproved. Hilbert had hoped that once mathematics was formalized, no stone would be left unturned, and all true mathematical statements would be provable. The story of Godel's theorem (not the history, just an outline of the proof itself) is a wonderful story, well told in Hofstatder's book Godel, Escher, Bach.

But getting back to semantic web stuff. Here are some basic ideas of OWL.
  • Everything is an instance of owl:Thing. Think of it as a base class like java.lang.Object.
  • Within an ontology, you have "instances", "classes", and "properties".
  • "Classes" are essentially sets. "Individuals" are elements of sets.
  • A "property" expresses some relationship between two individuals.
  • OWL includes representations for:
    • unions and intersections of classes (sets)
    • the idea that a set is a subset of another
    • the idea that two sets are disjoint
    • the idea that two sets are the same set
    • the idea that two instances are the same instance
  • Properties can by symmetric (like "sibling") or transitive (like "equals")
  • A property can be "functional", or a function in a mathematical sense. If p is functional, and you assert that p(x)=y and p(x)=z, then the reasoning engine will conclude that y=z.
  • One property can be declared to be the inverse of another.
  • One can declare a property to have specific classes (sets) as its domain and range.
It would be really nice if, at this point, I had some brilliantly illustrative examples of OWL hacking ready to include here. Hopefully those will be forthcoming.

    No comments: