SAPI-NT supports asynchronous (batch) processing of requests. This is intended to help the API respond to long-running and high-volume requests by using the following measures:

- Incoming batch requests are added to a queue. The queue is typically a data store such as a SQL database, but could also be an in-memory store.
- Completed results are written to a cache. The cache is typically a local or remote file system, such as Amazon S3, where results can be stored in an appropriate file format.
The underlying framework is armlib-spring, which is derived from armlib by adding Kotlin and Spring framework integration.
SAPI-NT does not control the maintenance or clearing of the queue or cache. If you want to invalidate or add pre-made entries to the cache, you can do so independently of your SAPI-NT application.
Batch processing is configured at the API level by certain Spring application properties that are stored under the sapi-nt.batch namespace.
The following properties determine how the API will respond to batch requests.
| Property | Meaning | Default |
|---|---|---|
| enabled | Controls whether to respond to batch requests and add them to the queue. DEPRECATED: Use sapi-nt.web.controller.batch.enabled instead. | false |
| path | The URL path segment that the user will append to an existing endpoint path to make a batch request to that endpoint. DEPRECATED: Use sapi-nt.web.controller.batch.path instead. | /batch |
The following properties determine how batch requests that are queued should be processed.
| Property | Meaning | Default |
|---|---|---|
| process.enabled | Controls whether SAPI-NT will take batch requests from the head of the queue and execute them. | false |
| process.retry | The number of times a request should be retried before marking it as failed and moving on to the next. | 0 |
| process.timeout | The interval (in ms) at which to poll the queue for pending requests. | 1000 |
| process.retryInit | The interval (in ms) at which to retry the batch processor initialisation, if it fails on start up. | |
The most important properties are sapi-nt.batch.enabled and sapi-nt.batch.process.enabled,
which must both be set to true in order to fully enable the batch processing features of SAPI-NT.
However, you may want to use a different application to process queued requests while making the results accessible through your SAPI-NT API.
In this case, you should set the sapi-nt.batch.enabled property to true and the sapi-nt.batch.process.enabled property to false.
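For example, to fully enable batch processing in a single application, an application.yaml fragment might look like the following sketch (this assumes the standard Spring mapping of the dotted sapi-nt.batch.* property names to nested YAML keys; the values shown are illustrative):

sapi-nt:
  batch:
    enabled: true          # accept batch requests and add them to the queue
    path: /batch           # path segment appended to an endpoint path to make a batch request
    process:
      enabled: true        # also take requests from the head of the queue and execute them
      retry: 1             # retry a failed request once before marking it as failed
      timeout: 1000        # poll the queue for pending requests every 1000 ms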
With batch processing enabled at the API level, you can configure whether it will be available for each of your endpoints. This is determined by the following properties in the endpoint specification:
| Property | Meaning | Default |
|---|---|---|
| batch.enabled | Controls whether batch processing will be available for the endpoint. | false |
| batch.dataSource | The identifying name of a data source that has been configured by your application, similar to the usual dataSource property. | |
| batch.contentType | The MIME type that the results of batch processing will be served as. | text/csv |
| batch.estimate | The estimated time in milliseconds to complete one request. Determines the ETA value on batch status responses. | 60000 |
The data source that you specify should be configured to sustain long-running and high-volume queries, for example by having an increased time limit. The underlying store may be the same as the usual data source for the endpoint.
Note that if you have configured a softLimit for your endpoint (which limits the number of items retrieved when the user does not supply a limit), it will be ignored when executing the query in batch mode.
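As an illustration only, these properties might appear in a YAML endpoint specification roughly as follows; the nesting under a batch key and the surrounding field names are assumptions, not taken from this documentation:

# Hypothetical endpoint specification fragment. Only the batch.* properties
# come from the table above; the other fields and the nesting are assumed.
name: packageList
dataSource: default
batch:
  enabled: true
  dataSource: batchSource    # a separately configured data source suited to long-running queries
  contentType: text/csv
  estimate: 60000            # estimated ms per request, used to compute the ETA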
The queue is a data store which keeps track of requests and their status. SAPI-NT interacts with the store through the QueueManager interface
defined in armlib-spring.
The cache is a file system which stores the results of previously completed requests as files. SAPI-NT interacts with the cache through
the CacheManager interface defined in armlib-spring.
You can use one of the standard queue and cache manager implementations that are included in armlib-spring-impl
or write a custom implementation. If you want to use a standard implementation, you must add the armlib-spring-impl dependency to your project POM.
<dependency>
  <groupId>com.epimorphics</groupId>
  <artifactId>armlib-spring-impl</artifactId>
  <version>0.2.2</version>
</dependency>
To configure the queue and cache managers, and to make them available to the SAPI-NT application context, define them as Spring Beans by using Spring annotations or XML configuration. The exact configuration options will depend on the implementations you choose.
If you use XML configuration, you must make sure that your application imports the corresponding resource.
You can do this by adding the @ImportResource("classpath:your-file-here.xml") annotation to your main class (usually SapiNtApplication).
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <bean id="cacheManager" class="com.epimorphics.armlib.impl.S3CacheManager">
    <property name="compressed" value="true" />
  </bean>

  <bean id="queueManager" class="com.epimorphics.armlib.impl.SqlQueueManager">
    <constructor-arg name="config" ref="queueManagerConfig"/>
  </bean>

  <bean id="queueManagerConfig" class="com.epimorphics.armlib.impl.SqlQueueManager.Config">
    <constructor-arg name="deleteOnComplete" value="false"/>
  </bean>

</beans>
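If you use the @ImportResource approach, your main class might look like the following sketch (shown in Java for illustration; the file name armlib-config.xml is a placeholder for whatever name you give the resource above):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ImportResource;

// Imports the queue and cache manager bean definitions from the XML resource above.
// "armlib-config.xml" is a hypothetical file name.
@SpringBootApplication
@ImportResource("classpath:armlib-config.xml")
public class SapiNtApplication {
    public static void main(String[] args) {
        SpringApplication.run(SapiNtApplication.class, args);
    }
}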
SAPI-NT and armlib make use of Spring auto-configuration to initialise certain beans, such as DataSource, if needed.
This allows any implementation to be injected into the application automatically, simply by adding the dependency to the classpath.
However, this may result in errors at start-up if the auto-configured beans are misconfigured for your environment.
In that case, you should use profile-based configuration to define your intended usage.
When using the SQL queue manager implementation, you will need to configure a data source. If you do not, and auto-configuration is enabled, your application will emit the following error on start-up:
***************************
APPLICATION FAILED TO START
***************************
Description:
Failed to configure a DataSource: 'url' attribute is not specified and no embedded datasource could be configured.
Reason: Failed to determine a suitable driver class
You can prevent this error by either disabling auto-configuration of data sources (see the section on profile-based configuration),
or by configuring the spring.datasource.* properties in your application.yaml file.
The basic properties you will need to initialise your data source are the following:
| Property | Meaning |
|---|---|
| url | The JDBC URL of your database. |
| driverClassName | The fully qualified name of the driver to use (usually org.postgresql.Driver). |
| username | The username to log in to the database, if necessary. |
| password | The password to log in to the database, if necessary. |
Read the Spring documentation for additional options.
spring:
  datasource:
    driverClassName: org.postgresql.Driver
    url: jdbc:postgresql://localhost:5432/db
You may need to use distinct configurations for production and development environments. You can do this by using Spring profiles and defining profile-specific config options and Spring beans.
For profile-based YAML configuration, create a file named application-{profile}.yaml in the resources directory of your application.
For example, to disable the auto-configuration of the SQL data source in development, create the following application-dev.yaml:
spring:
  autoconfigure:
    exclude:
      - "org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration"
For profile-based XML bean configuration, create a separate beans element with a profile attribute for each of your profiles.
For example, to use S3 and SQL cache and queue implementations by default, and file and memory implementations in development,
change your XML configuration to the following:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <beans profile="default">
    <bean id="cacheManager" class="com.epimorphics.armlib.impl.S3CacheManager">
      <property name="compressed" value="true" />
    </bean>
    <bean id="queueManager" class="com.epimorphics.armlib.impl.SqlQueueManager">
      <constructor-arg name="config" ref="queueManagerConfig"/>
    </bean>
    <bean id="queueManagerConfig" class="com.epimorphics.armlib.impl.SqlQueueManager.Config">
      <constructor-arg name="deleteOnComplete" value="false"/>
    </bean>
  </beans>

  <beans profile="dev">
    <bean id="cacheManager" class="com.epimorphics.armlib.impl.FileCacheManager"/>
    <bean id="queueManager" class="com.epimorphics.armlib.impl.MemQueueManager"/>
  </beans>

</beans>
To run the SAPI-NT application under a certain profile, you can use the --spring.profiles.active={profile} program argument.
For example, to use the dev profile, supply the argument --spring.profiles.active=dev.
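For instance, if your application is packaged as an executable jar (the jar name below is a placeholder):

java -jar my-sapi-nt-app.jar --spring.profiles.active=dev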
API Endpoints which have batch processing enabled can be called in “batch mode” by appending /batch to the end
of the URL path (note that the path can be configured by setting the sapi-nt.batch.path Spring property).
Path variables and query parameters can be used in the same way as with the usual endpoint.
For example, the URL
/stream/AB-01/package?_view=compact&min-mass=1000
could be rewritten as
/stream/AB-01/package/batch?_view=compact&min-mass=1000
in order to execute the same request asynchronously.
If the results of the request have already been cached, they will be downloaded immediately. Otherwise, you will be redirected to the batch status page, which shows information about the state of the request in the queue. Refresh this page to check the progress of an ongoing request; once it completes, the page will show the download URL for the results. Alternatively, once processing is complete, you can navigate back to the original batch endpoint to download the results (in the current implementation, this is the same URL as the download link on the status page).
Some requests may fail due to invalid parameters, or by exceeding resource limitations. In this case, the batch status page will indicate that the request has failed, and the reason for the failure will be logged by the SAPI-NT application. Failed requests can be attempted again by sending another request to the batch endpoint.
You can use POST requests to initiate and inspect the status of batch requests. Unlike GET requests, they will not return the result data for the request.
You can send a POST request to any batch endpoint to add the request to the queue if it hasn’t been added already. If the request was added successfully, or was already added, the response will redirect you to the status page for your request. You will be redirected even if the request has already been completed.
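For example, using curl against a hypothetical deployment (the host name is a placeholder):

curl -i -X POST "https://api.example.org/stream/AB-01/package/batch?_view=compact&min-mass=1000"

The -i option prints the response headers, so you can see the redirect (Location header) pointing at the status page for the request.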
Similarly, you can append /status to the end of the batch URL path to get information about the state of the batch request
in the queue. So the URL above could be written as
/stream/AB-01/package/batch/status?_view=compact&min-mass=1000
The response will be in JSON format, having the following properties:
| Property | Meaning |
|---|---|
| key | The uniquely identifying key for the request. This will usually be some composition of the request path and query parameters. |
| status | The status of the request. Possible values are Pending, InProgress, Completed and Failed. |
| url | The URL from which the results of the request can be downloaded, if it is complete. |
| positionInQueue | The position of the request in the queue. A value of 0 denotes that the request is already in progress. |
| eta | The estimated time until the request will be complete. This may not be representative of the actual time to completion. |
| started | The Unix timestamp of the time when the request started executing, if it has been started. |
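As an illustration only, a status response for a request that is currently being processed might look like the following (the key format, the eta units, and all values shown are illustrative, not specified by this documentation):

{
  "key": "stream-AB-01-package-_view=compact-min-mass=1000",
  "status": "InProgress",
  "positionInQueue": 0,
  "eta": 60000,
  "started": 1700000000000
}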