Wednesday, December 16, 2009

Using the SQL Data Source

The SQL Data Source is a component available in the Semantics.SDK. It allows an application to connect to a traditional relational database and expose its data as RDF data. It is most commonly used by an semantic application to import relational data in to its RDF database. This article will provide a short example of how to use it.

Sample Data

Let’s first examine a very simple relational database. It contains one table “Employee” with its schema described below.

Column Description
ID The unique identifier of the employee. (PK, int)
Name The employee name. (NOT NULL, varchar(255))

The table contains the following two records.

ID Name
1 Fred Flintstone
2 Barney Rubble
Model Manager

I will first show how you can query the relational data as RDF using SPARQL via the Model Manager. First open the Model Manager and use the context menu on the Data Sources folder under the default model to add a SQL Data Source. The following dialog box will be displayed.

image

The table below describes the fields in the dialog box.

Data Source Uri The URI for the graph representing the relational data in your local model. This will be the URI you will use in the FROM clause of your SPARQL queries.
Base Uri The base URI is used by the SQL Data Source when generating URIs to describe the relational data as RDF data.
SQL Connection String The connection string for the relational database.
Query Mode The SQL Data Source can be configured to return RDF data that describes the data in the relational tables, that describes the relational schema, or both.

I configured the SQL Data Source to return both Table & Schema Data so we can see both. Using the Model Manager query window I executed the following SPARQL query.

select ?s ?p ?o 
from <http://example.org/graph/Employee>
where {?s ?p ?o} order by ?s ?p

I save the results as to turtle file (via a context menu on the query results) that contains the RDF data shown below:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix ns1: <http://www.w3.org/2001/XMLSchema#>.
@prefix ns2: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix ns3: <http://example.org/data/>.
@prefix ns4: <http://www.w3.org/2002/07/owl#>.

ns3:Employee_Name a ns4:DatatypeProperty;
    ns2:range [
        ns1:maxLength "255"^^ns1:int;
        ns2:subClassOf ns1:string];
    ns2:domain ns3:Employee.
ns3:Employee a ns4:Class.
ns3:Employee_ID a ns4:DatatypeProperty;
    ns2:range ns1:int;
    ns2:domain ns3:Employee.
<http://example.org/data/Employee,1> ns3:Employee_Name "Fred Flintstone"^^ns1:string;
    a ns3:Employee;
    ns3:Employee_ID "1"^^ns1:int.
<http://example.org/data/Employee,2> ns3:Employee_Name "Barney Rubble"^^ns1:string;
    a ns3:Employee;
    ns3:Employee_ID "2"^^ns1:int.

If you examine the RDF data closely you will see the table schema is described as an ontology using RDFS and OWL and the table data is described in terms of that ontology.

Using .NET

The Semantics.SDK allows you do perform the same task I performed above in your own .NET application. This is because the Model Manage itself is based on the Semantics.SDK .NET API. The code segment shown below creates SQL Data Source connected to the relation database, executes a SPARQL query on it, and writes the results to a turtle file.

ClientModel cm = new ClientModel();

SqlDatasource ds = new SqlDatasource(
    "trusted_connection=yes;;server=localhost\mssql2008;database=test", 
    "http://www.example.org/data/");

ds.QueryScope = QUERY_SCOPE.DATA | QUERY_SCOPE.ONTOLOGY;

cm.DataSources["http://www.example.org/graph/Employee"] = ds;

Table res = cm.Query(@"
        select ?s ?p ?o from <http://www.example.org/graph/Employee> 
        where {?s ?p ?o} order by ?s ?p");

GraphDataSource g = new GraphDataSource(res);

using (StreamWriter s = new StreamWriter("c:\\sample.ttl"))
{
    g.Format<TurtleFormatter>(s);
}
Ontology Mapping

Note that the ontology used by the SQL Data Source to describe the relational data isn’t always how you want to model the data in your application. However, now that you have the data described as RDF it in one ontology it is easy to map that data into another ontology using inference rules. That topic I will save for another posting.

Wednesday, December 9, 2009

Full Text Expressions

Many applications need to efficiently search a RDF graph for any statements that contain a set of terms in a literal value. Semantics.SDK and Semantics.Server both provide support for full text expressions to meet this need. In the case of Semantics.Server these full text expressions are evaluated using full text capabilities of Microsoft SQL Server.

Full text expressions are based on the following constructs:

pattern Any simple word match that contains no spaces or punctuation. The matched text must contain the exact word as specified in the pattern.

ex: bluebird
prefix pattern Any valid pattern followed by an asterisk (‘*’). The matched text must contain a word that starts with the specified pattern.

ex: blue*
simple expression A simple expression contains one or more patterns. The matched text must match all the patterns specified.

ex: song bluebird
set expression A set expression contains one or more sets. The matched text must match all the sets in the expression.

ex: {blue red}{large medium}!{dog}
set A set contains one or more patterns delimited by curly braces (‘{}’). The text must match at least one of the patterns in the set.

ex: {blue* red*}
negated set A negated set is a set that is prefixed with an exclamation mark (‘!’). The text must not match any of the patterns in the set.

ex: !{dog cat}

Examples:

The following examples are based on the familiar input string:

“The quick fox jumps over the lazy brown dog.”

Expression Match?
dog laz* yes
{dog laz*}{quick} yes
{dog laz*}!{quick} no
umps no

SPARQL:

The contains function allows you to use a full text expression in a SPARQL query. The example below shows the use of a full text expression in a SPARQL query.

select 1 where {filter(contains("The quick fox jumps over the lazy brown dog.", "dog laz*"))}