The datahub uses the Tomcat web server to provide a web service interface and to run a background process that processes messages.
There are two ways to run the web server: single instance or multi-instance.
A single instance web server uses a single properties file and schema folder.
A multi instance web server uses multiple properties files and schema folders to access multiple "instances" of the datahub. These could be databases used by different applications, or, in a service offering, databases used by different organisations.
In most cases, it makes sense to use the multi instance web
server, even if you only have one instance within it, as it
provides more functionality such as a get option to retrieve data
and the ability to store and retrieve files in the data hub.
However, some solutions may work better with the single instance
web server because it allows some items of configuration to be
hard coded into the properties file, providing a more locked-down
API. For example, the single instance web server allows the user to
be defaulted on an incoming message, which might be useful for
data feeds from different systems that do not otherwise
self-identify.
The web.xml configuration is different for single and multi instance web servers.
Single instance:
<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                             http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">
  <!-- Background monitor that searches for and processes messages -->
  <servlet>
    <display-name>Monitor</display-name>
    <servlet-name>Monitor</servlet-name>
    <servlet-class>com.metrici.datahub.MonitorServlet</servlet-class>
    <init-param>
      <param-name>propertyFile</param-name>
      <param-value>/etc/datahub/datahub.properties</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <!-- Web service through which source systems POST messages -->
  <servlet>
    <display-name>Receive</display-name>
    <servlet-name>Receive</servlet-name>
    <servlet-class>com.metrici.datahub.ReceiveMessageServlet</servlet-class>
    <init-param>
      <param-name>system</param-name>
      <param-value>*</param-value>
    </init-param>
    <init-param>
      <param-name>propertyFile</param-name>
      <param-value>/etc/datahub/datahub.properties</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>Receive</servlet-name>
    <url-pattern>/receive</url-pattern>
  </servlet-mapping>
</web-app>
Multi-instance:
<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">
  <context-param>
    <param-name>propertyDir</param-name>
    <param-value>/var/datahub/config</param-value>
  </context-param>
  <servlet>
    <display-name>Monitor</display-name>
    <servlet-name>Monitor</servlet-name>
    <servlet-class>com.metrici.datahub.MonitorServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet>
    <display-name>Instance</display-name>
    <servlet-name>Instance</servlet-name>
    <servlet-class>com.metrici.datahub.InstanceServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>Instance</servlet-name>
    <url-pattern>/*</url-pattern>
  </servlet-mapping>
</web-app>
The MonitorServlet runs a background process that searches for and processes messages. It is parameterised by properties read from the datahub properties file.
The Monitor Servlet can run in single instance mode or multi
instance. The default is single instance. Add the property multi=true
to datahub.properties to switch on multi instance mode.
The MonitorServlet searches for:
Messages are queued in ascending processed_timestamp sequence, i.e. oldest first.
It requires a monitor.delay property set to the number of
seconds before it first checks for work and a monitor.interval
property set to the number of seconds between checks for new
messages. These default to 0 and 60 respectively.
It requires a monitor.queue property set to the maximum
number of messages that will be queued at any one time. This is
useful in systems that can receive large numbers of messages in a
small period of time, to make the system more manageable. This can
be omitted or set to 0 for no limit. The default is 10.
It requires a monitor.threads property to set the maximum number of threads, which defaults to 1. This controls the number of messages that can be processed at once. (monitor.threads can be set to 0 to run all processing on the main thread and wait for it to continue. This is intended for use from some command-line utilities, and should not be used for the servlet.)
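As a rough illustration of how these monitor properties and their defaults fit together, the following Python sketch parses properties text and fills in the defaults listed above. The function name and the simplified parsing (plain key=value lines, no escapes or line continuations) are assumptions made for illustration; this is not the datahub's own loader.

```python
# Defaults as described in the text; the dictionary itself is illustrative.
MONITOR_DEFAULTS = {
    "monitor.delay": 0,      # seconds before the first check for work
    "monitor.interval": 60,  # seconds between checks for new messages
    "monitor.queue": 10,     # maximum queued messages (0 means no limit)
    "monitor.threads": 1,    # maximum number of processing threads
}


def load_monitor_settings(properties_text):
    """Parse simple key=value lines and apply the documented defaults.

    Simplified: skips blank lines and '#' or '!' comment lines, and does
    not handle escapes or line continuations.
    """
    settings = dict(MONITOR_DEFAULTS)
    multi = False  # single instance is the default mode
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in MONITOR_DEFAULTS:
            settings[key] = int(value)
        elif key == "multi":
            multi = value.lower() == "true"
    settings["multi"] = multi
    return settings
```

Properties that the sketch does not know about, such as fileStore, are simply ignored.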
In multi-instance mode, if you don't want to run the monitor for
one instance, set monitor.run=false in just that instance.
The monitor will skip over the instance. This can be useful for
instances that don't use the normal datahub processing, for
example ones containing only executed services.
If a message returns a "requires reprocessing" status, the
processor will update the next attempt timestamp. The delay before
the next run depends on the duration of the last run and the age
of the message.
It calculates this using the most recent runtime (in seconds) and the age of the message (processed_timestamp minus message_timestamp). It uses the following properties.
| Property | Description | Default |
| monitor.minimumAge | The minimum age of a message before it can be processed. Setting a minimum age is required in a configuration in which some messages are submitted for immediate processing, to prevent the monitor from attempting to process those messages in the short time after the message is written and before immediate processing starts. | 15 |
| monitor.minimumDelayMultiplier | A number which is multiplied by the most recent runtime to give the minimum delay for the next run. For example, if this is set to 10, and a run took 10 seconds it will not be scheduled to run again for a minimum of 100 seconds. | 10 |
| monitor.minimumDelay | The minimum delay between runs. Typically set to 60 seconds. | 60 |
| monitor.maximumDelay | The maximum number of seconds before the message should be reprocessed. This would typically be a number of hours, such as 14400 for 4 hours. | 62400 |
| monitor.messageTimeout | The number of seconds after the message timestamp when the message should be considered timed out. This is typically set to a reasonably long period, e.g. 3456000 for 40 days. Can be set to 0 to have no timeout. | 3456000 |
| monitor.ageExponent | Used to calculate reprocessing delay. | 1 |
| monitor.ageMultiplier | Used to calculate reprocessing delay. | 1 |
The next attempt timestamp is calculated by adding a delay to the
processed timestamp. The delay is calculated using:
min(
  max(
    minimumDelay + ageMultiplier x age ^ ageExponent,
    duration x minimumDelayMultiplier
  ),
  maximumDelay
)
If the next attempt timestamp is later than the message timestamp plus the message timeout period, the message is marked as in error and not processed.
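The calculation above can be sketched in Python as follows. This is a minimal illustration, not the datahub's implementation: the function names are hypothetical, timestamps are assumed to be plain numbers of seconds, the exponent is assumed to apply to the age before the multiplier, and returning None stands in for marking the message as in error.

```python
def reprocessing_delay(duration, age,
                       minimum_delay=60, minimum_delay_multiplier=10,
                       maximum_delay=62400,
                       age_multiplier=1, age_exponent=1):
    """Delay in seconds before the next attempt, using the documented defaults."""
    age_based = minimum_delay + age_multiplier * age ** age_exponent
    run_based = duration * minimum_delay_multiplier
    return min(max(age_based, run_based), maximum_delay)


def next_attempt(processed_ts, message_ts, duration,
                 message_timeout=3456000, **delay_options):
    """Return the next attempt timestamp, or None if the message has timed out."""
    age = processed_ts - message_ts
    candidate = processed_ts + reprocessing_delay(duration, age, **delay_options)
    # A message_timeout of 0 means the message never times out.
    if message_timeout and candidate > message_ts + message_timeout:
        return None
    return candidate
```

With the defaults, a run that took 10 seconds on a brand-new message gives reprocessing_delay(10, 0) = max(60, 100) = 100 seconds, which matches the monitor.minimumDelayMultiplier example in the table.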
The monitor is shut down when the servlet ends. It will wait for current work to complete, up to a maximum of monitor.shutdownTimeout seconds.
The monitor can run in one of two modes: single instance or multi-instance. To turn on multi-instance, set multi=true, and set propertyDir to a directory that contains directories, each of which represents a single instance and has a datahub.properties file within it. propertyDir defaults to the folder containing the main properties file.
For debugging, there is a verbosity parameter to control the amount of logging. The default is 0. Set it to 1 or 2 for progressively more logging.
When running as a multi-instance server, there is one datahub.properties file for overall processing and one datahub.properties file for each instance.
The main datahub.properties file would typically be in
/var/datahub/config/datahub.properties
The properties file for each instance needs to be in a directory named the same as the instance name, relative to the config directory. So the sales properties file would be
/var/datahub/config/sales/datahub.properties
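To make the directory layout concrete, here is a small Python sketch that lists the instances found under a config directory by looking for subdirectories that contain a datahub.properties file. The function name is hypothetical and the scan is only an assumption about how such a layout could be read; it is not taken from the datahub code.

```python
from pathlib import Path


def discover_instances(property_dir):
    """Map each instance name to the path of its datahub.properties file."""
    root = Path(property_dir)
    found = {}
    for child in sorted(root.iterdir()):
        candidate = child / "datahub.properties"
        # Only subdirectories with their own properties file count as
        # instances; the main datahub.properties in the root is skipped.
        if child.is_dir() and candidate.is_file():
            found[child.name] = candidate
    return found
```

For the layout described above, discover_instances("/var/datahub/config") would include an entry for sales pointing at /var/datahub/config/sales/datahub.properties.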
The main datahub.properties file contains properties that pertain to the overall server (or the overall monitor, for running the monitor in multi-instance mode).
This would typically include monitor properties, and the root directory for the filestore.
multi=true
monitor.threads=5
fileStore=/var/datahub/filestore
The individual instance properties would include the database connection, as well as the specific fileStore value for that instance.
dbDriver=com.mysql.jdbc.Driver
dbURL=jdbc:mysql://127.0.0.1/sales
dbUser=sales
dbPassword=hare93
verbosity=1
timestampPrecision=second
elastic=com.metrici.datahub.ElasticSearchDataStore
elastic.host=127.0.0.1
elastic.port=9200
elastic.scheme=http
elastic.timestampPrecision=second
elastic.prefix=sales
authenticator=com.metrici.datahub.AllowAllAuthenticator
authorizer=com.metrici.datahub.AllowAllAuthorizer
fileStore=/var/datahub/filestore/sales
fileURLPrefix=http://localhost:8080/datahub/sales/
The receive message servlet provides a simple web service through which source systems can pass data into the data hub. The source systems POST messages to the web service. See message format for details of the format.
This takes init parameters that follow the same pattern as those for the command line program. Properties are specified separately using the load properties servlet, and reprocess and help are not supported.
| system | This specifies a pipe-delimited list of valid values for system. If a single value for system is given, it is used as a default. Use a value of * to mean the default mapping (i.e. to send in data that conforms to the target entities). |
| entity | Similar to system, a pipe-delimited list of valid values for entity. If a single value for entity is given, it is used as a default. This option is required with the allData option. |
| user | Similar to system, a pipe-delimited list of valid values for user. If a single value for user is given, it is used as a default. |
| refresh | If set to true, indicates that the message contains all the data for the target entity, and that existing target system records should be deleted. |
| format | If set to "data" indicates that the incoming request contains only data, not a formatted message. If set to "file" indicates that the incoming request should be stored as a file, and data about this file passed in the message. Default value is "message". If this is "data" or "file", then the entity option must be set, and the system option may well be required. |
| process | If set to true, indicates that the message should be processed straight away. If set to false, the default, the message is loaded and can be processed later. |
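The pipe-delimited parameters (system, entity and user) can be pictured with the following Python sketch. It is an illustration under assumptions: the function name is hypothetical, a missing value with more than one configured option is assumed to be an error, and * is treated as an ordinary single value rather than given its special default-mapping meaning.

```python
def resolve_option(configured, supplied=None):
    """Resolve a request value against a pipe-delimited init parameter."""
    allowed = [value.strip() for value in configured.split("|") if value.strip()]
    if supplied is None:
        # A single configured value acts as the default.
        if len(allowed) == 1:
            return allowed[0]
        raise ValueError("no value supplied and no single default configured")
    if supplied not in allowed:
        raise ValueError(f"{supplied!r} is not one of {allowed}")
    return supplied
```

For example, resolve_option("sales") returns "sales" as the default, while resolve_option("sales|crm", "crm") accepts the supplied value because it appears in the list.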
The receive message servlet can then be mapped to a URL, in this case /receive.
You can define multiple servlets based on the com.metrici.datahub.ReceiveMessageServlet class, each with different parameters (for example, different systems and entities). These can be mapped to different URLs and you can set different security constraints on these to secure them separately.
The multi-instance web server uses the instance servlet in place
of the receive message servlet. The servlet to use is identified
by an instance reference that appears as the first part of the
path (after the context path in Tomcat).
Unlike the receive message servlet, it does not allow for default properties (this would not make sense in a multi-tenant environment). However, different actions can be selected using different URL patterns.
The instance servlet supports a get interface backed by the Query
class, and the storage and retrieval of files.
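The idea of taking the instance reference from the first part of the path can be sketched in Python as below. The function name is hypothetical, and the remainder of the path is returned untouched because the URL patterns that select each action are not described here.

```python
def split_instance_path(path_info):
    """Split the path after the context path into (instance, remainder)."""
    trimmed = path_info.lstrip("/")
    if not trimmed:
        raise ValueError("no instance reference in path")
    # The first path segment names the instance; the rest selects the action.
    instance, _, rest = trimmed.partition("/")
    return instance, "/" + rest
```

With this sketch, a path of /sales/something yields the instance sales and the remainder /something, where something is only a placeholder.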
From Linux, you can use the curl command-line tool to send data to the servlet.
The general format of this is
curl -d @file -X POST url
Where file contains the data to be sent to the datahub,
and url is the URL to which the receive message servlet
has been mapped.
To upload a file to be stored (when the format is set to "file"), send the file as the request body in binary form, with headers describing it.
For example, to upload an image called mypicture.jpg, you would
do something like
curl --data-binary @mypicture.jpg -H "Content-Type: image/jpeg" -H "File-Name: mypicture.jpg" -X POST url
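If curl is not available, the same upload can be approximated from Python's standard library. This is a sketch only: the helper name is hypothetical, the URL is whatever the servlet has been mapped to in your installation, and the headers simply mirror the curl example above.

```python
import urllib.request


def build_upload_request(url, data, file_name, content_type):
    """Build a POST request carrying the file bytes and descriptive headers."""
    return urllib.request.Request(
        url,
        data=data,  # raw bytes, sent unchanged like curl --data-binary
        headers={"Content-Type": content_type, "File-Name": file_name},
        method="POST",
    )
```

To send it, read the file in binary mode, pass the bytes to build_upload_request along with the file name and content type, and hand the result to urllib.request.urlopen.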