Data Source Configuration

In sapi-nt, a data source is defined as a resource which accepts queries (usually SPARQL) and emits result sets in response. In other words, a data source is an access point which provides data from the underlying store to the sapi-nt application.

Data source specs are characterised by the following type values:

Type	Meaning
`dataSource.sparql.file`	File-based data source
`dataSource.sparql.classpath`	Classpath resource-based data source
`dataSource.sparql.remote`	Remotely stored data source
`fileSparqlSource`	Deprecated - Please use `dataSource.sparql.file`
`classpathSparqlSource`	Deprecated - Please use `dataSource.sparql.classpath`
`remoteSparqlSource`	Deprecated - Please use `dataSource.sparql.remote`

You can define a data source with a name value of defaultDataSource to declare it the default spec of this type. If you do, sapi-nt will use that data source wherever one is required and not specified explicitly.

Note

Sapi-nt is specialised for querying RDF data using SPARQL. However, you can also integrate other databases such as Cassandra and PostgreSQL with your sapi-nt application.

Text Indexing

You must define and build a Lucene text index on your data source in order to expose the text search features of the sapi-nt API.

For file-based data sources, you can configure an indexSpec on the data source spec itself. Its value is either a comma-separated list of predicates (in full URI form) from which to create a text index, or default, which sets up a simple text index on rdfs:label alone. The property rdfs:label is automatically included in any index created from a predicate list. Sapi-nt builds the text index when it initialises the store.
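As an illustration, a file data source that indexes two predicates in addition to rdfs:label might look like the sketch below. The source name, directory and choice of SKOS predicates are assumptions made for this example; only the property names come from this document. The predicate list is quoted so that the # characters inside the URIs cannot be read as YAML comments.

```yaml
# Hypothetical file data source with a custom text index.
name       : indexedSource      # assumed name
type       : dataSource.sparql.file
sourceDir  : data/indexed       # assumed path
# Full predicate URIs, comma-separated; rdfs:label is included automatically.
indexSpec  : "http://www.w3.org/2004/02/skos/core#prefLabel,http://www.w3.org/2004/02/skos/core#altLabel"
multiIndex : true               # one text index per predicate instead of a shared index
```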

For remote data sources, the text index is determined by the configuration of your SPARQL server. For a Fuseki server, follow the official Apache Jena documentation on text indexing.

Batch Processing

When batch processing is enabled, it may be necessary to define a separate data source that uses the same SPARQL store but has a higher resource allowance (for example, a longer timeout). You can then assign it as the “batch data source” for the endpoints that have batch processing enabled. This is the intended way to implement batch functionality in sapi-nt applications.
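A minimal sketch of such a data source follows, assuming the default data source is a remote Fuseki source at http://localhost:3030/ds/query. The name batchDataSource and the timeout value are hypothetical, and how the source is assigned to an endpoint is not shown here.

```yaml
# Hypothetical batch data source: same store as the default source, longer timeout.
name     : batchDataSource                  # assumed name
type     : dataSource.sparql.remote
endpoint : http://localhost:3030/ds/query   # same endpoint as the default data source
timeout  : 600                              # seconds; illustrative value (Fuseki only)
```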

Data Source Types

File

File data sources, having a type of dataSource.sparql.file, use an RDF file or set of files on the local file system as their underlying store. They can issue SPARQL queries against the data in those files. File data sources are suited to storing small amounts of static data, for example, for testing.

A file data source spec can have the following properties:

Property	Meaning	Default
`sourceDir`	The location of a file or directory from which to load RDF data.
`indexSpec`	A comma-separated list of predicates (in full URI form) from which to create a text index, or `default` to index rdfs:label only.
`multiIndex`	If true, and an `indexSpec` is given, configures a separate text index for each predicate. Otherwise, all predicates are stored in the same index.	false

Example

name      : localSource
type      : dataSource.sparql.file
sourceDir : src/test/data/fileSparqlData
indexSpec : default

Classpath

Classpath data sources, having a type of dataSource.sparql.classpath, are similar to file data sources except that they locate RDF files under the application’s classpath resources (src/main/resources).

A classpath data source spec can have the following properties:

Property	Meaning	Default
`sourceLocation`	Comma-separated list of classpath resources from which to load RDF data. Should be prefixed with `classpath:`.

The sourceLocation value can use Spring path-matching pattern syntax.

Example

name           : packagedSource
type           : dataSource.sparql.classpath
sourceLocation : classpath:def/*
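A sketch of a source that combines several locations with Spring's Ant-style wildcards follows. The name, the paths and the assumption that the data is stored as Turtle (.ttl) files are all hypothetical.

```yaml
# Hypothetical classpath data source loading from two locations.
name           : vocabularySource   # assumed name
type           : dataSource.sparql.classpath
# "**" matches nested directories, so this picks up every .ttl file under def/,
# plus one explicitly named file.
sourceLocation : classpath:def/**/*.ttl,classpath:data/codes.ttl
```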

Remote

Remote data sources, having a type of dataSource.sparql.remote, use a remote SPARQL server such as Fuseki as their underlying store. They can issue SPARQL queries over HTTP to the store’s API. Usually, the default data source for a sapi-nt application will be a remote data source.

A remote data source spec can have the following properties:

Property	Meaning	Default
`endpoint`	URL for the SPARQL query endpoint.
`graphEndpoint`	URL for the graph store protocol endpoint.
`postgisSupported`	A map of geo filter query names to the PostGIS query names defined on the SPARQL source.
`timeout`	The timeout, in seconds, that sapi-nt will request from the SPARQL server (Fuseki only).
`initialTimeout`	The initial timeout (until the first query result), in seconds, that sapi-nt will request from the SPARQL server (Fuseki only).
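The endpoint and timeout properties above can be combined as in the following sketch. The name, the timeout values and the graph store URL are assumptions; /ds/data is a common Fuseki service path, but yours may differ.

```yaml
# Hypothetical remote data source with explicit timeouts.
name           : timedSource                     # assumed name
type           : dataSource.sparql.remote
endpoint       : http://localhost:3030/ds/query
graphEndpoint  : http://localhost:3030/ds/data   # assumed graph store protocol URL
timeout        : 30                              # seconds requested for the query (Fuseki only)
initialTimeout : 5                               # seconds requested until the first result (Fuseki only)
```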

Example

name     : defaultDataSource
type     : dataSource.sparql.remote
endpoint : http://localhost:3030/ds/query
postgisSupported:
  "CircleFilter"  : "findWithin"
  "PolygonFilter" : "findGeoJson"

Cassandra

Time series endpoints can use a Cassandra database as their data source. See the Cassandra integration documentation for more information.

SQL

Time series endpoints can use a PostgreSQL database as their data source. See the PostgreSQL integration documentation for more information.