Friday, February 5, 2010

Querying for intersections between sets.

A SPARQL query is often needed to find the intersection between sets in a data model. This article provides a simple example based on landmarks and regions. In this data model a landmark belongs to a region and regions can belong to other regions creating a hierarchical collection of sets. The goal is to find the common regions for all landmarks. The trick is we want the best match for the region. For example Boston Common and Fenway Park are both located in Boston; therefore, we do not want to return results showing regions that contain Boston such as Massachusetts, United States of America or North America.

The SPARQL query below uses a rule to generate some data for the examples. It uses a couple of other rules to recursively generation all the containment relationships for the regions. The query itself uses negation to eliminate results that have a region that contains another region for the same two landmarks.

prefix x: <http://example.org/>

rulebase (
# generate some facts for our example
 construct {
 
# places
 x:Boston x:isPartOf x:Massachusetts.    
 x:Massachusetts x:isPartOf x:UnitedStatesOfAmerica.    
 x:UnitedStatesOfAmerica x:isPartOf x:NorthAmerica
 x:Boston x:isPartOf x:Massachusetts.    
 x:Paris x:isPartOf x:France.    
 x:France x:isPartOf x:Europe.
 x:Toronto x:isPartOf x:Ontario.    
 x:Ontario x:isPartOf x:Canada.    
 x:Canada x:isPartOf x:NorthAmerica.
 
 
#landmarks
 x:BostonCommon x:hasLocation x:Boston.
 x:FenwayPark x:hasLocation x:Boston.
 x:EiffelTower x:hasLocation x:Paris.
 x:CNTower x:hasLocation x:Toronto.
 x:MississippiRiver x:hasLocation x:UnitedStatesOfAmerica.
 x:GreatLakes x:HasLocation x:NorthAmerica.
}

# walk the region classification
construct {?a x:isPartOf ?b} where {?a x:isPartOf ?c. ?c x:isPartOf ?b}
construct {?a x:isSameAsOrPartOf ?a} where {{?a x:isPartOf ?x.}union{?x x:isPartOf ?a}}
construct {?a x:isSameAsOrPartOf ?b} where {?a x:isPartOf ?b}
)

# find the smallest common location between landmarks
select ?lm1 ?lm2 ?region where {
  ?lm1 x:hasLocation ?loc1.
  ?lm2 x:hasLocation ?loc2.
  filter(?lm1 != ?lm2)
  ?loc1 x:isSameAsOrPartOf ?region.
  ?loc2 x:isSameAsOrPartOf ?region.
 
#make sure there is not a closer match
  not(?loc1 ?loc2 ?region)
  {
    ?loc1 x:isSameAsOrPartOf ?region2.
    ?loc2 x:isSameAsOrPartOf ?region2.
    ?region2 x:isPartOf ?region.
  }
}

When the query above is executed it produces the following results.

   1: <http://example.org/BostonCommon>    <http://example.org/CNTower>    <http://example.org/NorthAmerica>.
   2: <http://example.org/BostonCommon>    <http://example.org/FenwayPark>    <http://example.org/Boston>.
   3: <http://example.org/BostonCommon>    <http://example.org/MississippiRiver>    <http://example.org/UnitedStatesOfAmerica>.
   4: <http://example.org/CNTower>    <http://example.org/BostonCommon>    <http://example.org/NorthAmerica>.
   5: <http://example.org/CNTower>    <http://example.org/FenwayPark>    <http://example.org/NorthAmerica>.
   6: <http://example.org/CNTower>    <http://example.org/MississippiRiver>    <http://example.org/NorthAmerica>.
   7: <http://example.org/FenwayPark>    <http://example.org/BostonCommon>    <http://example.org/Boston>.
   8: <http://example.org/FenwayPark>    <http://example.org/CNTower>    <http://example.org/NorthAmerica>.
   9: <http://example.org/FenwayPark>    <http://example.org/MississippiRiver>    <http://example.org/UnitedStatesOfAmerica>.
  10: <http://example.org/MississippiRiver>    <http://example.org/BostonCommon>    <http://example.org/UnitedStatesOfAmerica>.
  11: <http://example.org/MississippiRiver>    <http://example.org/CNTower>    <http://example.org/NorthAmerica>.
  12: <http://example.org/MississippiRiver>    <http://example.org/FenwayPark>    <http://example.org/UnitedStatesOfAmerica>.

Wednesday, February 3, 2010

Using rules to implement a cascading delete

In an RDF model some entities are wholly owned by other entities. These are often referenced to as contained objects. When deleting an entity the desired logic is:

  1. Delete all statements about the entity
  2. Delete all statements about any contained objects referenced by the entity
  3. Delete any statements that reference the contained object.

One way to implement this logic is to add some additional information about the predicates that are used to reference a contained object. For example for a property “foo” we can add the following statement to the ontology.

x:foo x:refType x:containedSubject.

This statement denotes that any entity that is the object value of a statement with the predicate x:foo is a contained object. Using this approach a generic set of recursive delete rules can be implemented for removing contained objects and the statements that reference them. It traditional database terminology this is referred to a a cascading delete.

The SPARQL rulebase and DELETE query shown below provides an example of how to implement this logic.

prefix x: <http://example.org/>

rulebase (
# remove references to the contained object.
select ?s ?p ?o ?r where {?s ?p ?o. 
  ?p x:refType x:containedObject. filter(?o = ?r)} as x:cascadeDelete

# remove all statements about the contained object.
select ?s ?p ?o ?r where {?s ?p ?o. 
  filter(?s=?r).} as x:cascadeDelete

# remove all referenced contained objects
select ?s ?p ?o ?r where {?r ?x ?y. 
  ?x x:refType x:containedObject. 
  x:cascadeDelete(?s, ?p, ?o, ?y).} as x:cascadeDelete
)

# delete a node where @target = the target URI
delete from <xxx> {?s ?p ?o} 
from <xxx>
where {x:cascadeDelete(?s, ?p, ?o, @target).}