Wednesday, November 11, 2009

Getting counts via SPARQL

Many applications need to acquire a count of either statements or subjects to provide some statistics about the information in an RDF graph. There are multiple ways to accomplish this:

  1. Programmatically via the Semantics.SDK API.
  2. Using the SPARQL aggregation function: count.

In this article we will focus on the SPARQL count function. The example below shows a SPARQL query that uses count to return the number of unique subjects whose x:gender property equals x:Male. The rulebase in the example exists only to generate some sample data.

prefix x: <http://example.org/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

#infer some sample data
rulebase (
  construct {
    x:id-1 rdf:type x:Person.
    x:id-1 rdf:type x:Employee.
    x:id-1 rdf:type x:SalesPerson.
    x:id-1 x:gender x:Male.
    x:id-2 rdf:type x:Customer.
    x:id-2 x:gender x:Male.}
)
select count(?s) where {?s x:gender x:Male.}

When the query above is executed it returns a value of:

"2"^^<http://www.w3.org/2001/XMLSchema#int>
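For readers who want to check the arithmetic outside the query engine, the same count can be sketched in plain Python. This is only an illustration over a hand-copied version of the sample data. The TRIPLES list and the count_subjects helper are assumptions made for this sketch and are not part of the Semantics.SDK API.

```python
# Sample statements copied from the construct block above,
# with prefixed names kept as plain strings.
TRIPLES = [
    ("x:id-1", "rdf:type", "x:Person"),
    ("x:id-1", "rdf:type", "x:Employee"),
    ("x:id-1", "rdf:type", "x:SalesPerson"),
    ("x:id-1", "x:gender", "x:Male"),
    ("x:id-2", "rdf:type", "x:Customer"),
    ("x:id-2", "x:gender", "x:Male"),
]


def count_subjects(triples, predicate, obj):
    """Count distinct subjects that carry the given predicate/object pair."""
    return len({s for s, p, o in triples if p == predicate and o == obj})


male_count = count_subjects(TRIPLES, "x:gender", "x:Male")
print(male_count)  # 2
```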

The count function can be used with multiple arguments to provide a count of distinct combinations of variable values. For example, to count the distinct combinations of subject and object values for a particular predicate (rdf:type), the query could be modified to:

select count(?s, ?o) where {
  ?s ?p ?o. filter(?p=rdf:type)
} group by ?p

Notice we include a GROUP BY clause to force an aggregation of ?s and ?o. This query produces a result of:

"4"^^<http://www.w3.org/2001/XMLSchema#int>
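The result of 4 can be verified by hand with a small Python sketch that collects the distinct (subject, object) pairs for rdf:type. As before, the TRIPLES list and the count_distinct_pairs helper are hypothetical names used only for illustration.

```python
# Sample statements copied from the construct block in the first query.
TRIPLES = [
    ("x:id-1", "rdf:type", "x:Person"),
    ("x:id-1", "rdf:type", "x:Employee"),
    ("x:id-1", "rdf:type", "x:SalesPerson"),
    ("x:id-1", "x:gender", "x:Male"),
    ("x:id-2", "rdf:type", "x:Customer"),
    ("x:id-2", "x:gender", "x:Male"),
]


def count_distinct_pairs(triples, predicate):
    """Count distinct (subject, object) combinations for one predicate."""
    return len({(s, o) for s, p, o in triples if p == predicate})


type_pairs = count_distinct_pairs(TRIPLES, "rdf:type")
print(type_pairs)  # 4
```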

To get a count of all the statements in one or more graphs, you can execute the following query:

select count(?s, ?p, ?o) where {?s ?p ?o. filter(?x=1)} group by ?x

We need to force a grouping on a dummy variable ?x to avoid having the query default to grouping on all variables (?s, ?p and ?o). A future release will change the default to group on no variables, so the dummy variable will no longer be required. The query above produces the result:

"6"^^<http://www.w3.org/2001/XMLSchema#int>
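The effect of the grouping key can be illustrated with a short Python sketch. It only models the behavior described above and does not reflect how the query engine is implemented. The grouped_counts helper and the constant key 1 are assumptions made for the example.

```python
from collections import Counter

# Sample statements copied from the construct block in the first query.
TRIPLES = [
    ("x:id-1", "rdf:type", "x:Person"),
    ("x:id-1", "rdf:type", "x:Employee"),
    ("x:id-1", "rdf:type", "x:SalesPerson"),
    ("x:id-1", "x:gender", "x:Male"),
    ("x:id-2", "rdf:type", "x:Customer"),
    ("x:id-2", "x:gender", "x:Male"),
]


def grouped_counts(triples, key):
    """Count statements per group, where key maps a statement to its group."""
    return Counter(key(t) for t in triples)


# Grouping on all of (?s, ?p, ?o): one group per statement, each with count 1.
per_statement = grouped_counts(TRIPLES, key=lambda t: t)

# Grouping on a constant dummy key: a single group holding every statement.
total = grouped_counts(TRIPLES, key=lambda t: 1)

print(len(per_statement), set(per_statement.values()))  # 6 {1}
print(total[1])  # 6
```

Grouping on every variable yields six groups of one statement each, which is why a single dummy group is needed to obtain the overall total.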

I will try to post an article on using the Semantics.SDK API to get counts sometime soon.

Friday, November 6, 2009

OWL Union and Intersection Rules

This article provides a simple example of how to use inference rules to map owl:unionOf and owl:intersectionOf class descriptions into rdfs:subClassOf relationships.

The example below shows a SPARQL query that has rules to:

(1) Interpret the RDF collection notation to assign membership.

(2) Create RDFS subclass relationships based on the OWL union and intersection descriptions.

(3) Generate some sample data to demonstrate the first two rules.

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix x: <http://www.example.org/>

rulebase (
# (1) some rdf collection rules
  construct {?l rdfs:member ?i} where {?l rdf:first ?i}
  construct {?l rdfs:member ?i} where {?l rdf:rest ?r. ?r rdfs:member ?i}

# (2) owl to rdfs subclass rules
  construct {?s rdfs:subClassOf ?c} where {?s owl:intersectionOf ?l. ?l rdfs:member ?c}
  construct {?c rdfs:subClassOf ?u} where {?u owl:unionOf ?l. ?l rdfs:member ?c}

# (3) some sample data
  construct {x:HumanBeing  owl:unionOf  (x:Man x:Woman)}
  construct {x:TallMan  owl:intersectionOf  (x:Man x:TallThing)}
)

select ?sub ?super where {?sub rdfs:subClassOf ?super}

The SELECT query lists each class alongside its corresponding base class. When the query is executed it produces the following results:

<http://www.example.org/Man> <http://www.example.org/HumanBeing>
<http://www.example.org/TallMan> <http://www.example.org/Man>
<http://www.example.org/TallMan> <http://www.example.org/TallThing>
<http://www.example.org/Woman> <http://www.example.org/HumanBeing>

Note that the RDFS subclass rules are not recursive, so the results show only the immediate base classes.
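To make the rule evaluation easier to follow, here is a minimal Python sketch that applies the same four rules by naive forward chaining over the sample data. It is an illustration, not the engine's algorithm. The rdf_list and infer helpers, and the blank node labels such as _:a0, are hypothetical names chosen for this sketch.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"
MEMBER, SUBCLASS = "rdfs:member", "rdfs:subClassOf"


def rdf_list(head, items):
    """Expand a collection such as (x:Man x:Woman) into rdf:first/rdf:rest triples."""
    triples = set()
    for i, item in enumerate(items):
        node = f"{head}{i}"
        is_last = i == len(items) - 1
        triples.add((node, RDF_FIRST, item))
        triples.add((node, RDF_REST, RDF_NIL if is_last else f"{head}{i + 1}"))
    return triples


def infer(triples):
    """Apply the four rules repeatedly until no new triple is derived."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == RDF_FIRST:  # rule 1a: the first element is a member
                new.add((s, MEMBER, o))
            elif p == RDF_REST:  # rule 1b: members of the rest are members
                new |= {(s, MEMBER, i) for r, q, i in facts if r == o and q == MEMBER}
            elif p == "owl:intersectionOf":  # rule 2a: intersection is a subclass of each member
                new |= {(s, SUBCLASS, c) for lst, q, c in facts if lst == o and q == MEMBER}
            elif p == "owl:unionOf":  # rule 2b: each member is a subclass of the union
                new |= {(c, SUBCLASS, s) for lst, q, c in facts if lst == o and q == MEMBER}
        if new <= facts:
            return facts
        facts |= new


# Rule 3: the sample data, with each collection expanded into list nodes.
data = rdf_list("_:a", ["x:Man", "x:Woman"]) | rdf_list("_:b", ["x:Man", "x:TallThing"])
data |= {
    ("x:HumanBeing", "owl:unionOf", "_:a0"),
    ("x:TallMan", "owl:intersectionOf", "_:b0"),
}

pairs = sorted((s, o) for s, p, o in infer(data) if p == SUBCLASS)
for sub, sup in pairs:
    print(sub, sup)
```

The sketch derives the same four subclass pairs as the query. Because none of the rules chains rdfs:subClassOf statements together, x:TallMan is not reported as a subclass of x:HumanBeing.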