In sapi-nt, a data source is defined as a resource which accepts queries (usually SPARQL) and emits result sets in response. In other words, a data source is an access point which provides data from the underlying store to the sapi-nt application.
Data source specs are characterised by the following type values:
| Type | Meaning |
|---|---|
| dataSource.sparql.file | File-based data source |
| dataSource.sparql.classpath | Classpath resource-based data source |
| dataSource.sparql.remote | Remotely stored data source |
| fileSparqlSource | Deprecated - Please use dataSource.sparql.file |
| classpathSparqlSource | Deprecated - Please use dataSource.sparql.classpath |
| remoteSparqlSource | Deprecated - Please use dataSource.sparql.remote |
You can define a data source with a name value of defaultDataSource to declare it the default spec of this type.
If you do, sapi-nt will use that data source wherever one is required and not specified explicitly.
Sapi-nt is specialised for querying RDF data using SPARQL. However, you can also integrate other databases such as Cassandra and PostgreSQL with your sapi-nt application.
You must define and build a Lucene text index on your data source in order to expose the text search features of the sapi-nt API.
For file-based data sources, you can configure an indexSpec on the data source spec itself. Its value is a comma-separated list of predicates (in full URI form) from which to create a text index.
The property rdfs:label is automatically included in any index created in this way.
Alternatively, you can set indexSpec to default to set up a simple default text index (on rdfs:label).
Sapi-nt builds the text index when it initialises the store.
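As an illustration, a file data source with an explicit text index might look like the sketch below. The name, directory, and choice of SKOS predicates are assumptions made for this example. Only the property names come from this page.

```yaml
# Hypothetical file data source with an explicit text index.
name : labelledSource          # assumed name
type : dataSource.sparql.file
sourceDir : data/vocab         # assumed location of the RDF files
# Predicates in full URI form, comma separated.
# rdfs:label does not need to be listed; it is included automatically.
indexSpec : "http://www.w3.org/2004/02/skos/core#prefLabel,http://www.w3.org/2004/02/skos/core#altLabel"
```

The value is quoted here only to keep the `#` and `:` characters unambiguous to YAML parsers.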
For remote data sources, the text index will be determined by the particular configuration of your SPARQL server. For a Fuseki server, follow the official documentation.
When batch processing is enabled, it may be necessary to define a separate data source that uses the same SPARQL store
but has a higher resource allowance (for example, timeout), which can be assigned as the “batch data source”
for enabled endpoints. This is the intended way to implement batch functionality in sapi-nt applications.
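A minimal sketch of this arrangement is two remote data source specs that share one endpoint but differ in their timeout. The names, URL, and timeout values below are assumptions, and the two specs are shown as separate YAML documents purely for illustration. How the second spec is assigned to an endpoint as its batch data source is configured elsewhere and is not shown.

```yaml
# Interactive data source (hypothetical values).
name : defaultDataSource
type : dataSource.sparql.remote
endpoint : http://localhost:3030/ds/query
timeout : 30            # seconds, assumed value
---
# Batch data source: same store, higher resource allowance.
name : batchDataSource  # assumed name
type : dataSource.sparql.remote
endpoint : http://localhost:3030/ds/query
timeout : 600           # seconds, assumed value
```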
File data sources, having a type of dataSource.sparql.file, use an RDF file or set of files on the local file system as their underlying store.
They can issue SPARQL queries against the data loaded from those files.
File data sources are suited to storing small amounts of static data, for example for testing.
A file data source spec can have the following properties:
| Property | Meaning | Default |
|---|---|---|
| sourceDir | The location of a file or directory from which to load RDF data. | |
| indexSpec | A comma separated list of predicates from which to create a text index. | |
| multiIndex | If true, and an indexSpec is given, configures a separate text index for each predicate. Otherwise, all predicates are stored in the same index. | false |
name : localSource
type : dataSource.sparql.file
sourceDir : src/test/data/fileSparqlData
indexSpec : default
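To show the multiIndex property in context, the following sketch requests a separate text index per predicate. The name, path, and predicates are hypothetical.

```yaml
# Hypothetical file data source with one text index per predicate.
name : multiIndexSource        # assumed name
type : dataSource.sparql.file
sourceDir : data/vocab         # assumed location of the RDF files
indexSpec : "http://www.w3.org/2004/02/skos/core#prefLabel,http://www.w3.org/2004/02/skos/core#notation"
multiIndex : true              # defaults to false (single shared index)
```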
Classpath data sources, having a type of dataSource.sparql.classpath, are similar to file data sources except that they
locate RDF files under the application’s classpath resources (src/main/resources).
A classpath data source spec can have the following properties:
| Property | Meaning | Default |
|---|---|---|
| sourceLocation | Comma separated list of classpath resources from which to load RDF data. Should be prefixed with classpath:. | |
The sourceLocation value can use Spring path-matching pattern syntax.
name : packagedSource
type : dataSource.sparql.classpath
sourceLocation : classpath:def/*
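A sourceLocation can also list several patterns. The sketch below assumes Turtle files under two hypothetical resource directories, def and vocab. In Spring's Ant-style patterns, `*` matches within a single path segment and `**` matches any number of nested directories.

```yaml
# Hypothetical classpath data source loading from two locations.
name : vocabSource             # assumed name
type : dataSource.sparql.classpath
# Each entry carries its own classpath: prefix.
sourceLocation : "classpath:def/*.ttl,classpath:vocab/**/*.ttl"
```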
Remote data sources, having a type of dataSource.sparql.remote, use a remote SPARQL server such as Fuseki as their underlying store.
They can issue SPARQL queries over HTTP to the store’s API.
Usually, the default data source for a sapi-nt application will be a remote data source.
A remote data source spec can have the following properties:
| Property | Meaning | Default |
|---|---|---|
| endpoint | URL for the SPARQL query endpoint. | |
| graphEndpoint | URL for the graph store protocol endpoint. | |
| postgisSupported | A map of geo filter query names to the postgis query names defined on the SPARQL source. | |
| timeout | The timeout, in seconds, that sapi-nt will request from the SPARQL server (Fuseki only). | |
| initialTimeout | The initial timeout (until the first query result), in seconds, that sapi-nt will request from the SPARQL server (Fuseki only). | |
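As a sketch of the optional properties, a remote spec that also sets the graph store endpoint and both timeouts might look like this. The name, the timeout values, and the `/ds/data` path (a common Fuseki graph store service name) are assumptions to adapt to your server.

```yaml
# Hypothetical remote data source with optional properties set.
name : tunedRemoteSource                        # assumed name
type : dataSource.sparql.remote
endpoint : http://localhost:3030/ds/query
graphEndpoint : http://localhost:3030/ds/data   # assumed Fuseki graph store path
timeout : 60                                    # seconds, assumed value
initialTimeout : 10                             # seconds until the first result, assumed value
```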
name : defaultDataSource
type : dataSource.sparql.remote
endpoint : http://localhost:3030/ds/query
postgisSupported:
  "CircleFilter" : "findWithin"
  "PolygonFilter" : "findGeoJson"
Time series endpoints can use a Cassandra database as their data source. See the Cassandra integration documentation for more information.
Time series endpoints can use a PostgreSQL database as their data source. See the PostgreSQL integration documentation for more information.