SAPI-NT supports asynchronous (batch) processing of requests. This is intended to help the API respond to long-running and high-volume requests by using the following measures:

- Incoming batch requests are added to a queue. The queue is typically a data store such as a SQL database, but could also be an in-memory store.
- Completed results are written to a cache. The cache is typically a local or remote file system, such as Amazon S3, where results can be stored in an appropriate file format.
The underlying framework is armlib-spring, which is derived from armlib by adding Kotlin and Spring framework integration.
SAPI-NT does not control the maintenance or clearing of the queue or cache. If you want to invalidate or add pre-made entries to the cache, you can do so independently of your SAPI-NT application.
Batch processing is configured at the API level by certain Spring application properties that are stored under the sapi-nt.batch namespace.
The following properties determine how the API will respond to batch requests.
| Property | Meaning | Default |
|---|---|---|
| enabled | Controls whether to respond to batch requests and add them to the queue. DEPRECATED: Use sapi-nt.web.controller.batch.enabled instead. | false |
| path | The URL path segment that the user will append to an existing endpoint path to make a batch request to that endpoint. DEPRECATED: Use sapi-nt.web.controller.batch.path instead. | /batch |
The following properties determine how batch requests that are queued should be processed.
| Property | Meaning | Default |
|---|---|---|
| process.enabled | Controls whether SAPI-NT will take batch requests from the head of the queue and execute them. | false |
| process.retry | The number of times a request should be retried before marking it as failed and moving on to the next. | 0 |
| process.timeout | The interval (in ms) at which to poll the queue for pending requests. | 1000 |
| process.retryInit | The interval (in ms) at which to retry the batch processor initialisation, if it fails on start up. | |
The most important properties are sapi-nt.batch.enabled and sapi-nt.batch.process.enabled,
which must both be set to true in order to fully enable the batch processing features of SAPI-NT.
However, you may want to use a different application to process queued requests while making the results accessible through your SAPI-NT API.
In this case, you should set the sapi-nt.batch.enabled property to true and the sapi-nt.batch.process.enabled property to false.
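For example, to fully enable batch processing in a single application, an application.yaml fragment might look like the following sketch (this assumes the standard Spring mapping of the dotted sapi-nt.batch.* property names to nested YAML keys; the values shown are illustrative):

sapi-nt:
  batch:
    enabled: true          # accept batch requests and add them to the queue
    path: /batch           # path segment appended to an endpoint path to make a batch request
    process:
      enabled: true        # also take requests from the head of the queue and execute them
      retry: 1             # retry a failed request once before marking it as failed
      timeout: 1000        # poll the queue for pending requests every 1000 ms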
With batch processing enabled at the API level, you can configure whether it will be available for each of your endpoints. This is determined by the following properties in the endpoint specification:
| Property | Meaning | Default |
|---|---|---|
| batch.enabled | Controls whether batch processing will be available for the endpoint. | false |
| batch.dataSource | The identifying name of a data source that has been configured by your application, similar to the usual dataSource property. | |
| batch.contentType | The MIME type that the results of batch processing will be served as. | text/csv |
| batch.estimate | The estimated time in milliseconds to complete one request. Determines the ETA value on batch status responses. | 60000 |
The data source that you specify should be configured to sustain long-running and high-volume queries, for example by having an increased time limit. The underlying store may be the same as the usual data source for the endpoint.
Note that if you have configured a softLimit for your endpoint (which limits the number of items retrieved when the user does not supply a limit), it will be ignored when executing the query in batch mode.
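As an illustration only, these properties might appear in a YAML endpoint specification roughly as follows; the nesting under a batch key and the surrounding field names are assumptions, not taken from this documentation:

# Hypothetical endpoint specification fragment. Only the batch.* properties
# come from the table above; the other fields and the nesting are assumed.
name: packageList
dataSource: default
batch:
  enabled: true
  dataSource: batchSource    # a separately configured data source suited to long-running queries
  contentType: text/csv
  estimate: 60000            # estimated ms per request, used to compute the ETA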
The queue is a data store which keeps track of requests and their status. SAPI-NT interacts with the store through the QueueManager interface
defined in armlib-spring.
The cache is a file system which stores the results of previously completed requests as files. SAPI-NT interacts with the cache through
the CacheManager interface defined in armlib-spring.
You can use one of the standard queue and cache manager implementations that are included in armlib-spring-impl
or write a custom implementation. If you want to use a standard implementation, you must add the armlib-spring-impl dependency to your project POM.
<dependency>
  <groupId>com.epimorphics</groupId>
  <artifactId>armlib-spring-impl</artifactId>
  <version>0.2.2</version>
</dependency>
To configure the queue and cache managers, and to make them available to the SAPI-NT application context, define them as Spring Beans by using Spring annotations or XML configuration. The exact configuration options will depend on the implementations you choose.
If you use XML configuration, you must make sure that your application imports the corresponding resource.
You can do this by adding the @ImportResource("classpath:your-file-here.xml") annotation to your main class (usually SapiNtApplication).
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <bean id="cacheManager" class="com.epimorphics.armlib.impl.S3CacheManager">
    <property name="compressed" value="true" />
  </bean>

  <bean id="queueManager" class="com.epimorphics.armlib.impl.SqlQueueManager">
    <constructor-arg name="config" ref="queueManagerConfig"/>
  </bean>

  <bean id="queueManagerConfig" class="com.epimorphics.armlib.impl.SqlQueueManager.Config">
    <constructor-arg name="deleteOnComplete" value="false"/>
  </bean>

</beans>
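If you use the @ImportResource approach, your main class might look like the following sketch (shown in Java for illustration; the file name armlib-config.xml is a placeholder for whatever name you give the resource above):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ImportResource;

// Imports the queue and cache manager bean definitions from the XML resource above.
// "armlib-config.xml" is a hypothetical file name.
@SpringBootApplication
@ImportResource("classpath:armlib-config.xml")
public class SapiNtApplication {
    public static void main(String[] args) {
        SpringApplication.run(SapiNtApplication.class, args);
    }
}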
SAPI-NT and armlib make use of Spring auto-configuration to initialise certain beans, such as DataSource, if needed.
This allows any implementation to be injected into the application automatically, simply by adding the dependency to the classpath.
However, this may result in errors at start-up if the auto-configured beans are misconfigured for your environment.
In that case, you should use profile-based configuration to define your intended usage.
When using the SQL queue manager implementation, you will need to configure a data source. If you do not, and auto-configuration is enabled, your application will emit the following error on start-up:
***************************
APPLICATION FAILED TO START
***************************
Description:
Failed to configure a DataSource: 'url' attribute is not specified and no embedded datasource could be configured.
Reason: Failed to determine a suitable driver class
You can prevent this error by either disabling auto-configuration of data sources (see the section on profile-based configuration),
or by configuring the spring.datasource.* properties in your application.yaml file.
The basic properties you will need to initialise your data source are the following:
| Property | Meaning |
|---|---|
| url | The JDBC URL of your database. |
| driverClassName | The fully qualified name of the driver to use (usually org.postgresql.Driver). |
| username | The username to log in to the database, if necessary. |
| password | The password to log in to the database, if necessary. |
Read the Spring documentation for additional options.
spring:
  datasource:
    driverClassName: org.postgresql.Driver
    url: jdbc:postgresql://localhost:5432/db
You may need to use distinct configurations for production and development environments. You can do this by using Spring profiles and defining profile-specific config options and Spring beans.
For profile-based YAML configuration, create a file named application-{profile}.yaml in the resources directory of your application.
For example, to disable the auto-configuration of the SQL data source in development, create the following application-dev.yaml:
spring:
  autoconfigure:
    exclude:
      - "org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration"
For profile-based XML bean configuration, create a separate beans element with a profile attribute for each of your profiles.
For example, to use S3 and SQL cache and queue implementations by default, and file and memory implementations in development,
change your XML configuration to the following:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <beans profile="default">
    <bean id="cacheManager" class="com.epimorphics.armlib.impl.S3CacheManager">
      <property name="compressed" value="true" />
    </bean>
    <bean id="queueManager" class="com.epimorphics.armlib.impl.SqlQueueManager">
      <constructor-arg name="config" ref="queueManagerConfig"/>
    </bean>
    <bean id="queueManagerConfig" class="com.epimorphics.armlib.impl.SqlQueueManager.Config">
      <constructor-arg name="deleteOnComplete" value="false"/>
    </bean>
  </beans>

  <beans profile="dev">
    <bean id="cacheManager" class="com.epimorphics.armlib.impl.FileCacheManager"/>
    <bean id="queueManager" class="com.epimorphics.armlib.impl.MemQueueManager"/>
  </beans>

</beans>
To run the SAPI-NT application under a certain profile, you can use the --spring.profiles.active={profile} program argument.
For example, to use the dev profile, supply the argument --spring.profiles.active=dev.
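For instance, if your application is packaged as an executable jar (the jar name below is a placeholder):

java -jar my-sapi-nt-app.jar --spring.profiles.active=dev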
API Endpoints which have batch processing enabled can be called in “batch mode” by appending /batch to the end
of the URL path (note that the path can be configured by setting the sapi-nt.batch.path Spring property).
Path variables and query parameters can be used in the same way as with the usual endpoint.
For example, the URL
/stream/AB-01/package?_view=compact&min-mass=1000
could be rewritten as
/stream/AB-01/package/batch?_view=compact&min-mass=1000
in order to execute the same request asynchronously.
If the results of the request have already been cached, they will be downloaded immediately. Otherwise, you will be redirected to the batch status page, which shows information about the state of the request in the queue. Refresh this page to check the progress of an ongoing request; once it completes, the page will show the download URL for the results. Alternatively, once processing is complete, you can navigate back to the original batch endpoint to download the results (in the current implementation, this is the same URL as the download link on the status page).
Some requests may fail due to invalid parameters, or by exceeding resource limitations. In this case, the batch status page will indicate that the request has failed, and the reason for the failure will be logged by the SAPI-NT application. Failed requests can be attempted again by sending another request to the batch endpoint.
You can use POST requests to initiate and inspect the status of batch requests. Unlike GET requests, they will not return the result data for the request.
You can send a POST request to any batch endpoint to add the request to the queue if it hasn’t been added already. If the request was added successfully, or was already added, the response will redirect you to the status page for your request. You will be redirected even if the request has already been completed.
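For example, using curl against a hypothetical deployment (the host name is a placeholder):

curl -i -X POST "https://api.example.org/stream/AB-01/package/batch?_view=compact&min-mass=1000"

The -i option prints the response headers, so you can see the redirect (Location header) pointing at the status page for the request.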
Similarly, you can append /status to the end of the batch URL path to get information about the state of the batch request
in the queue. So the URL above could be written as
/stream/AB-01/package/batch/status?_view=compact&min-mass=1000
The response will be in JSON format, having the following properties:
| Property | Meaning |
|---|---|
| key | The uniquely identifying key for the request. This will usually be some composition of the request path and query parameters. |
| status | The status of the request. Possible values are Pending, InProgress, Completed and Failed. |
| url | The URL from which the results of the request can be downloaded, if it is complete. |
| positionInQueue | The position of the request in the queue. A value of 0 denotes that the request is already in progress. |
| eta | The estimated time until the request will be complete. This may not be representative of the actual time to completion. |
| started | The Unix timestamp of the time when the request started executing, if it has been started. |
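As an illustration only, a status response for a request that is currently being processed might look like the following (the key format, the eta units, and all values shown are illustrative, not specified by this documentation):

{
  "key": "stream-AB-01-package-_view=compact-min-mass=1000",
  "status": "InProgress",
  "positionInQueue": 0,
  "eta": 60000,
  "started": 1700000000000
}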