Wednesday, November 11, 2009

Getting counts via SPARQL

Many applications need a count of statements or subjects in order to report statistics about the information in an RDF graph. There are multiple ways to accomplish this:

  1. Programmatically via the Semantics.SDK API.
  2. Using the SPARQL aggregation function: count.

In this article we will focus on the SPARQL count function. The example below shows a SPARQL query that uses the count function to return the number of unique subjects whose x:gender property is x:Male. The inference rules in the example serve only to generate some sample data.

prefix x: <http://example.org/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

#infer some sample data
rulebase (
  construct {
    x:id-1 rdf:type x:Person.
    x:id-1 rdf:type x:Employee.
    x:id-1 rdf:type x:SalesPerson.
    x:id-1 x:gender x:Male.
    x:id-2 rdf:type x:Customer.
    x:id-2 x:gender x:Male.}
)
select count(?s) where {?s x:gender x:Male.}

When the query above is executed it returns a value of:

"2"^^<http://www.w3.org/2001/XMLSchema#int>
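As a sanity check on that result, the sample data can be modeled outside the SDK. The sketch below is plain Python, not Semantics.SDK code; the tuple representation and the names TRIPLES and count_subjects are assumptions made for illustration.

```python
# Sample graph from the rulebase above, modeled as
# (subject, predicate, object) tuples.
# Illustration only: this is not the Semantics.SDK API.
TRIPLES = [
    ("x:id-1", "rdf:type", "x:Person"),
    ("x:id-1", "rdf:type", "x:Employee"),
    ("x:id-1", "rdf:type", "x:SalesPerson"),
    ("x:id-1", "x:gender", "x:Male"),
    ("x:id-2", "rdf:type", "x:Customer"),
    ("x:id-2", "x:gender", "x:Male"),
]


def count_subjects(triples, predicate, obj):
    """Count the distinct subjects that have the given predicate and object."""
    return len({s for s, p, o in triples if p == predicate and o == obj})


male_count = count_subjects(TRIPLES, "x:gender", "x:Male")
print(male_count)
```

Both x:id-1 and x:id-2 carry x:gender x:Male, which is why the query reports two subjects.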

The count function can be used with multiple arguments to provide a count of distinct combinations of variables. For example, to count the distinct combinations of subject and object values for a particular predicate value (rdf:type), the query could be modified to:

select count(?s, ?o) where {
  ?s ?p ?o. filter(?p=rdf:type)
} group by ?p

Notice that we include a GROUP BY clause on ?p to force an aggregation over ?s and ?o. This query produces a result of:

"4"^^<http://www.w3.org/2001/XMLSchema#int>
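The same number can be checked by hand against the sample data. The following is a minimal Python sketch, assuming the graph is modeled as plain tuples; count_distinct_pairs is a hypothetical helper, not part of the SDK.

```python
# Sample graph from the first example, as (subject, predicate, object) tuples.
# Illustration only: this is not the Semantics.SDK API.
TRIPLES = [
    ("x:id-1", "rdf:type", "x:Person"),
    ("x:id-1", "rdf:type", "x:Employee"),
    ("x:id-1", "rdf:type", "x:SalesPerson"),
    ("x:id-1", "x:gender", "x:Male"),
    ("x:id-2", "rdf:type", "x:Customer"),
    ("x:id-2", "x:gender", "x:Male"),
]


def count_distinct_pairs(triples, predicate):
    """Count the distinct (subject, object) combinations for one predicate."""
    return len({(s, o) for s, p, o in triples if p == predicate})


type_pair_count = count_distinct_pairs(TRIPLES, "rdf:type")
print(type_pair_count)
```

Three rdf:type statements belong to x:id-1 and one to x:id-2, giving four distinct subject/object combinations.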

To get a count of all the statements in one or more graphs, you can execute the following query:

select count(?s, ?p, ?o) where {?s ?p ?o. filter(?x=1)} group by ?x

We need to force a grouping on a dummy variable ?x to keep the query from defaulting to a grouping on all variables (?s, ?p and ?o). This behavior will be modified in a future release so that the default is to group on no variables, which makes the dummy variable unnecessary. The query above produces the result:

"6"^^<http://www.w3.org/2001/XMLSchema#int>
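To see why the grouping matters, the two behaviors can be imitated in plain Python. This is only a sketch: it assumes that grouping on all variables places each distinct statement in its own group, which the article does not show directly, and grouped_counts is a hypothetical helper rather than an SDK function.

```python
from collections import defaultdict

# Sample graph from the first example, as (subject, predicate, object) tuples.
# Illustration only: this is not the Semantics.SDK API.
TRIPLES = [
    ("x:id-1", "rdf:type", "x:Person"),
    ("x:id-1", "rdf:type", "x:Employee"),
    ("x:id-1", "rdf:type", "x:SalesPerson"),
    ("x:id-1", "x:gender", "x:Male"),
    ("x:id-2", "rdf:type", "x:Customer"),
    ("x:id-2", "x:gender", "x:Male"),
]


def grouped_counts(triples, key):
    """Group statements by key(triple) and count the distinct statements per group."""
    groups = defaultdict(set)
    for triple in triples:
        groups[key(triple)].add(triple)
    return {group: len(members) for group, members in groups.items()}


# Assumed default behavior: group on ?s, ?p and ?o, so one group per statement.
per_statement = grouped_counts(TRIPLES, key=lambda t: t)

# Dummy grouping: every statement lands in the same group, like "group by ?x".
single_group = grouped_counts(TRIPLES, key=lambda t: "dummy")

print(len(per_statement), "groups under the assumed default grouping")
print(single_group["dummy"], "statements in the single dummy group")
```

Under that assumption, the default grouping yields many groups of one statement each, while the dummy grouping collapses everything into a single group whose count is the total number of statements.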

I will try to post an article on using the Semantics.SDK API to get counts sometime soon.
