Friday, February 05, 2010

Playing with the Jena semantic web framework

I've begun tinkering with Jena, a semantic web framework written in Java. It embodies a lot of ideas and technologies that were once considered AI or expert systems, and which I neglected at the time (1980s).
The semantic web frames a body of knowledge as a collection of three-word sentences called triples. These can be diagrammed as a directed graph such as the one below.


The corresponding three-word sentences appear below, written in N3, a human-readable formal language used by the semantic web community.
@prefix :  <#> .
:Cat :has :Fur .
:Bear :has :Fur .
:Cat :is-a :Mammal .
:Bear :is-a :Mammal .
:Mammal :has :Vertebrae .
:Whale :is-a :Mammal .
:Whale :lives-in :Water .
:Mammal :is-a :Animal .
:Fish :is-a :Animal .
:Fish :lives-in :Water .
In Jena, a Model holds a collection of triples like these. Having constructed one (or loaded it from a file, either on the internet or on your own computer), you can apply rules that let you draw conclusions. So let's step through that process. First we need to read in the file.
private static final String baseUri =
    "file:///home/wware/wware-autosci/" +
    "semweb/java/simpleNet.n3#";
private static void modelReadFile(String filename,
                                  Model model) {
    // With no language argument, the Reader is parsed as RDF/XML.
    // try-with-resources closes the reader even if parsing fails.
    try (FileReader fr = new FileReader(filename)) {
        model.read(fr, baseUri);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
To print the contents of a model, we can use the SPARQL query language, which looks a lot like SQL.
private static void printModel(Model model) {
    String queryString = 
        "SELECT ?x ?y ?z " +
        "WHERE {" +
        "    ?x ?y ?z . " +
        "}";
    Query query = QueryFactory.create(queryString);
    QueryExecution qe =
      QueryExecutionFactory.create(query, model);
    ResultSet results = qe.execSelect();
    ResultSetFormatter.out(System.out, results, query);
    qe.close();
}
We'll call that method from our main method. I personally find it appalling that the graph above fails to recognize that fish have vertebrae, so we'll add a triple for that before printing.
public static void main(String[] args) {
    Model model = ModelFactory.createDefaultModel();
    modelReadFile("simpleNet.rdf", model);
    model.createResource(baseUri + "Fish")
         .addProperty(model.createProperty(
                         baseUri + "has"),
                      model.createResource(
                         baseUri + "Vertebrae"));
    printModel(model);
}
The RDF/XML file that the main method reads, simpleNet.rdf, was translated from the N3 above using CWM.

<rdf:RDF
xmlns="file:///home/wware/wware-autosci/semweb/java/simpleNet.n3#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="#Bear">
        <has rdf:resource="#Fur"/>
        <is-a rdf:resource="#Mammal"/>
    </rdf:Description>
    <rdf:Description rdf:about="#Cat">
        <has rdf:resource="#Fur"/>
        <is-a rdf:resource="#Mammal"/>
    </rdf:Description>
    <rdf:Description rdf:about="#Fish">
        <is-a rdf:resource="#Animal"/>
        <lives-in rdf:resource="#Water"/>
    </rdf:Description>
    <rdf:Description rdf:about="#Mammal">
        <has rdf:resource="#Vertebrae"/>
        <is-a rdf:resource="#Animal"/>
    </rdf:Description>
    <rdf:Description rdf:about="#Whale">
        <is-a rdf:resource="#Mammal"/>
        <lives-in rdf:resource="#Water"/>
    </rdf:Description>
</rdf:RDF>
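
Both listings, the N3 and the RDF/XML, describe the same ten triples. As a rough illustration of what a query pattern such as "?x ?y ?z" is doing, here is a minimal plain-Java sketch that does not use Jena at all. The class name TriplePatternSketch and its methods are made up for this example, and a null argument stands in for a SPARQL variable. Real SPARQL engines do much more (joins across patterns, filters, indexes), so treat this only as a mental model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not how Jena stores or queries triples.
final class TriplePatternSketch {

    // One three-word sentence: subject, predicate, object.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    // The ten triples from the N3 listing, with the ":" prefix dropped.
    static List<Triple> animals() {
        return Arrays.asList(
            new Triple("Cat", "has", "Fur"),
            new Triple("Bear", "has", "Fur"),
            new Triple("Cat", "is-a", "Mammal"),
            new Triple("Bear", "is-a", "Mammal"),
            new Triple("Mammal", "has", "Vertebrae"),
            new Triple("Whale", "is-a", "Mammal"),
            new Triple("Whale", "lives-in", "Water"),
            new Triple("Mammal", "is-a", "Animal"),
            new Triple("Fish", "is-a", "Animal"),
            new Triple("Fish", "lives-in", "Water"));
    }

    // Return every triple that fits the pattern. A null position matches
    // anything, playing the role of a variable such as ?x.
    static List<Triple> match(List<Triple> triples, String s, String p, String o) {
        List<Triple> hits = new ArrayList<Triple>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.s))
                    && (p == null || p.equals(t.p))
                    && (o == null || o.equals(t.o))) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Triple> triples = animals();
        // Like "?x ?y ?z": every triple matches.
        System.out.println(match(triples, null, null, null).size() + " triples in total");
        // Like "?x :lives-in :Water": only the water dwellers match.
        for (Triple t : match(triples, null, "lives-in", "Water")) {
            System.out.println(t);
        }
    }
}
```

Fixing the predicate and object to "lives-in" and "Water" leaves only the Whale and Fish triples, which is the kind of narrowing a WHERE clause with constants would do in the SPARQL query above.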

There is a Model.write(OutputStream) method, so we can write a model directly to a file instead of stepping through the triples explicitly.

So how about some actual reasoning? We should be able to conclude that a cat is an animal, and has vertebrae. This will require that we define two rules of inference, rule1 ("is-a" is transitive) and rule2 (a member of a species has the things the species has).
String rules =
        "[ rule1: (?x " + baseUri+"is-a ?y) " +
                 "(?y " + baseUri+"is-a ?z) -> " +
                 "(?x " + baseUri+"is-a ?z) ] " +
        "[ rule2: (?x " + baseUri+"is-a ?y) " +
                 "(?y " + baseUri+"has ?z) -> " +
                 "(?x " + baseUri+"has ?z) ]";
Reasoner reasoner = new
    GenericRuleReasoner(Rule.parseRules(rules));
reasoner.setDerivationLogging(true);
InfModel inf =
  ModelFactory.createInfModel(reasoner, model);
printModel(inf);
Simply creating the InfModel is enough to draw all the relevant inferences. The Reasoner's setDerivationLogging method tells the model to remember the derivations that led to any new conclusions, and these derivations can be examined for debugging purposes.
PrintWriter out = new PrintWriter(System.out);
for (StmtIterator i = inf.listStatements();
             i.hasNext(); ) {
    Statement s = i.nextStatement();
    for (Iterator<Derivation> id = inf.getDerivation(s);
             id.hasNext(); ) {
        Derivation deriv = id.next();
        deriv.printTrace(out, true);
    }
}
out.flush();
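
To make the two rules concrete, here is a small plain-Java sketch of naive forward chaining over the same triples, including the added "Fish has Vertebrae". It does not use Jena, and the class name ForwardChainingSketch and its methods are invented for this illustration. It simply reapplies rule1 and rule2 until no new triples appear. Jena's GenericRuleReasoner is far more general, so this only shows the idea behind the conclusions it draws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration of naive forward chaining; not Jena's rule engine.
final class ForwardChainingSketch {

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public boolean equals(Object other) {
            if (!(other instanceof Triple)) return false;
            Triple t = (Triple) other;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }
        @Override public int hashCode() { return Objects.hash(s, p, o); }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    // The ten triples from the file plus the one added in main.
    static List<Triple> animals() {
        return Arrays.asList(
            new Triple("Cat", "has", "Fur"),
            new Triple("Bear", "has", "Fur"),
            new Triple("Cat", "is-a", "Mammal"),
            new Triple("Bear", "is-a", "Mammal"),
            new Triple("Mammal", "has", "Vertebrae"),
            new Triple("Whale", "is-a", "Mammal"),
            new Triple("Whale", "lives-in", "Water"),
            new Triple("Mammal", "is-a", "Animal"),
            new Triple("Fish", "is-a", "Animal"),
            new Triple("Fish", "lives-in", "Water"),
            new Triple("Fish", "has", "Vertebrae"));
    }

    // Apply both rules repeatedly until a pass adds nothing new.
    static Set<Triple> infer(List<Triple> asserted) {
        Set<Triple> known = new LinkedHashSet<Triple>(asserted);
        boolean changed = true;
        while (changed) {
            List<Triple> derived = new ArrayList<Triple>();
            for (Triple a : known) {
                if (!a.p.equals("is-a")) continue;      // premise 1: (x is-a y)
                for (Triple b : known) {
                    if (!b.s.equals(a.o)) continue;     // premise 2 must start at y
                    // rule1: (y is-a z) -> (x is-a z)
                    // rule2: (y has z)  -> (x has z)
                    if (b.p.equals("is-a") || b.p.equals("has")) {
                        derived.add(new Triple(a.s, b.p, b.o));
                    }
                }
            }
            changed = known.addAll(derived);            // true only if something was new
        }
        return known;
    }

    public static void main(String[] args) {
        List<Triple> asserted = animals();
        for (Triple t : infer(asserted)) {
            if (!asserted.contains(t)) System.out.println("inferred: " + t);
        }
    }
}
```

On this data the loop adds six triples: Cat, Bear and Whale each turn out to be an Animal and to have Vertebrae, which matches the conclusions we hoped the reasoner would reach.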

1 comment:

Gustavo Freitas said...

Hi Will,

I'm Gustavo Freitas from Brazil and this is a very interesting and useful post. I intend to start my conclusion work in a few days and to work in something like Jena. I wanna build a web application using those concepts. I'll be happy to give you a feedback while I work on.

Thanks guy!