The data hub processing is written in Java.
The processing is split into a number of classes, in package
com.metrici.datahub. See the API
documentation for details.
Some of the classes are specific to the main data hub implementation. These work only with database tables (for the message store and the target database) and with file-based schemas.
The main processing is in separate, more general-purpose classes that can be used with different sources of messages, different schema implementations, and different target data structures. This allows the data hub logic to be reused against different technologies, such as within the Metrici platform.
There are three interfaces that define how the data hub logic
interacts with persistent data. Implementations of these are
required in any solution.
| Schema |
The schema. This has a method to return the
definition of a source system as a SourceSystem object. This
in turn can return a set of EntityMapping objects for a
source entity defined in the source system. |
| DataStore |
The target data store. |
| Message |
A single message. |
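The relationship between Schema, SourceSystem and EntityMapping described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the method names (`getSourceSystem`, `getEntityMappings`), the fields of `EntityMapping`, and the in-memory implementation are hypothetical, and the real signatures are in the API documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical shapes only: the real interfaces live in com.metrici.datahub,
// and their method names and signatures may differ from these.
class SchemaSketch {

    /** Assumed minimal mapping: a source entity mapped to a target entity. */
    static final class EntityMapping {
        final String sourceEntity;
        final String targetEntity;

        EntityMapping(String sourceEntity, String targetEntity) {
            this.sourceEntity = sourceEntity;
            this.targetEntity = targetEntity;
        }
    }

    /** Definition of one source system; returns the mappings for a source entity. */
    interface SourceSystem {
        Set<EntityMapping> getEntityMappings(String sourceEntity);
    }

    /** The schema; returns the definition of a named source system. */
    interface Schema {
        SourceSystem getSourceSystem(String name);
    }

    /** In-memory Schema for illustration; unknown names yield empty results. */
    static Schema inMemorySchema(Map<String, Map<String, Set<EntityMapping>>> definitions) {
        return name -> {
            Map<String, Set<EntityMapping>> entities = definitions.getOrDefault(name, Map.of());
            return sourceEntity -> entities.getOrDefault(sourceEntity, Set.of());
        };
    }

    public static void main(String[] args) {
        Map<String, Set<EntityMapping>> crm = new LinkedHashMap<>();
        crm.put("customer", Set.of(new EntityMapping("customer", "CUSTOMER")));
        Schema schema = inMemorySchema(Map.of("crm", crm));
        for (EntityMapping m : schema.getSourceSystem("crm").getEntityMappings("customer")) {
            System.out.println(m.sourceEntity + " -> " + m.targetEntity);
        }
    }
}
```

An alternative implementation would supply its own Schema in the same way, backed by whatever storage it uses, alongside its own DataStore and Message implementations.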
The main processing proceeds through a number of classes.
| MessageProcessor |
This is instantiated with a schema, data
store and message. It reads and validates the message and
then passes control to the entity processor to perform
further processing. |
| EntityProcessor |
The entity processor is responsible for
processing all the target entities for a valid message. It
is separate from the MessageProcessor because it is used
recursively to process child entities. It iterates over all entity mappings for the source entity and all records in the data, and invokes a RecordProcessor for each one. |
| RecordProcessor |
Processes one record using one set of entity
mappings. |
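The control flow of the entity processor can be sketched roughly as follows. This is not the actual implementation: mappings are reduced to target names, records are plain maps, and the way child entities are found (list-valued fields keyed by the child entity name) is an assumption made only so that the recursion has something to work on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mappings are reduced to target names, and child entities
// are ASSUMED to be list-valued fields keyed by the child entity name.
class ProcessingSketch {

    /** Stand-in for RecordProcessor: handles one record against one mapping. */
    static void processRecord(String mapping, Map<String, Object> record, List<String> log) {
        log.add(mapping + " <- " + record.get("id"));
    }

    /** Stand-in for EntityProcessor: all mappings, all records, then child entities. */
    static void processEntity(String sourceEntity, List<Map<String, Object>> records,
                              Map<String, List<String>> mappings, List<String> log) {
        for (String mapping : mappings.getOrDefault(sourceEntity, List.of())) {
            for (Map<String, Object> record : records) {
                processRecord(mapping, record, log);
            }
        }
        // Recurse into child entities held inside each record.
        for (Map<String, Object> record : records) {
            for (Map.Entry<String, Object> field : record.entrySet()) {
                if (field.getValue() instanceof List) {
                    @SuppressWarnings("unchecked")
                    List<Map<String, Object>> children = (List<Map<String, Object>>) field.getValue();
                    processEntity(field.getKey(), children, mappings, log);
                }
            }
        }
    }

    /** Helper that builds a record with an "id" field. */
    static Map<String, Object> record(String id) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        return r;
    }

    public static void main(String[] args) {
        Map<String, Object> order = record("o1");
        order.put("lines", List.of(record("l1"), record("l2")));
        Map<String, List<String>> mappings = new LinkedHashMap<>();
        mappings.put("order", List.of("ORDERS", "ORDER_AUDIT"));
        mappings.put("lines", List.of("ORDER_LINES"));
        List<String> log = new ArrayList<>();
        processEntity("order", List.of(order), mappings, log);
        log.forEach(System.out::println);
    }
}
```

Keeping the entity-level loop in its own method is what makes the recursion straightforward, which matches the stated reason for separating EntityProcessor from MessageProcessor.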
Some actions are carried out through plug-ins, the class names of which are given in the properties file. These are based on classes, not interfaces, because they are instantiated using standard constructors.
| PlugIn |
This is a general container for user-defined plug-ins,
such as message readers and post processors. The ScriptPlugIn class allows plug-ins to be written using JavaScript. The ScriptPlugIn is passed a global attributes value, with get(key) and put(key,value) methods, from which parameters can be read and to which return values can be written. Different types of plug-in have different parameters and return values.
|
| UserAuthenticator |
For the InstanceServlet, identifies and
authenticates the user. Subclasses provide no
authentication, basic authentication (user and password) or
JSON web token (JWT) based authentication. |
| Authorizer |
Determines whether the stated user is
permitted to perform an action. Subclasses provide no
authorization, and authorization based on a permissions
file. |
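The plug-in pattern described for PlugIn above (a class named in configuration, instantiated through a standard constructor, exchanging parameters and return values through an attributes map) can be sketched as follows. The base class, its `run` method, the `UpperCasePlugIn` example and the `"input"`/`"output"` keys are all hypothetical; the real PlugIn API is in the API documentation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: the real PlugIn class and its methods may differ.
class PlugInSketch {

    /** Assumed base class; plug-ins are subclasses, not interface implementations. */
    static abstract class PlugIn {
        abstract void run(Map<String, Object> attributes);
    }

    /** Example plug-in: reads the "input" attribute and writes "output". */
    static class UpperCasePlugIn extends PlugIn {
        @Override
        void run(Map<String, Object> attributes) {
            String input = (String) attributes.get("input");
            attributes.put("output", input.toUpperCase(Locale.ROOT));
        }
    }

    /** Instantiates a plug-in from its class name using the no-argument constructor. */
    static PlugIn load(String className) {
        try {
            return (PlugIn) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load plug-in " + className, e);
        }
    }

    public static void main(String[] args) {
        // In the data hub the class name would come from the properties file.
        PlugIn plugIn = load(UpperCasePlugIn.class.getName());
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("input", "hello");
        plugIn.run(attributes);
        System.out.println(attributes.get("output")); // prints HELLO
    }
}
```

Loading by class name is why a concrete class with a standard constructor is needed: reflection has to be able to create an instance without knowing the type at compile time.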
Other relevant classes include:
| MessageStatus |
Provides constants for message processing
outcomes. |
| MessageReader |
Extends PlugIn to provide a default base class for plug-ins that interpret the data in messages. The plug-in is passed the following attributes:
It should return in the attribute "records" one of the following:
If the config is JSON and contains a readFile of true,
then instead of reading data directly from the message,
assume that the message represents a file loaded into the
data hub (see the Files
topic), and read data from the file itself. This option is
useful when dealing with very large files, which can be
loaded as files into the data hub and then read with
readFile of true. |
| CSVMessageReader |
An extension of MessageReader that reads CSV data. The CSV is converted to JSON by taking field names from the first non-empty row. Parsing follows RFC 4180, in particular in its handling of new lines within individual cells. CSVMessageReader takes the following configuration properties:
|
| MessageFilter |
An extension of MessageReader that invokes a reader and
then filters the records through another plug-in. The
filter can change and delete records returned by the
reader but not add new records. This takes the following configuration properties:
The filter plug-in is called for each record. It is passed "options" and "context" attributes like the message reader. It is also passed the record itself in the "record" attribute. The filter can:
|
| ListPlugIn |
An extension of PlugIn that runs other plug-ins. The config should be a JSON array of objects. Each object should have a plugIn property, which identifies the class of the plug-in to run, and a plugInConfig property, which holds the configuration for that class. The plug-ins are run in turn. The same attributes map is
passed to each plug-in, allowing them to share data. |
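The CSV handling described for CSVMessageReader above (field names taken from the first non-empty row, with RFC 4180 quoting so that a cell may contain commas, doubled quotes and new lines) can be illustrated with a small stand-alone sketch. This is not the CSVMessageReader code: the class and method names are hypothetical, records are returned as maps rather than JSON, and skipping blank rows after the header is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone illustration; not the CSVMessageReader implementation.
class CsvSketch {

    /** Splits RFC 4180 style text into rows of cells, honouring quoted cells. */
    static List<List<String>> parseRows(String text) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (quoted) {
                if (c != '"') {
                    cell.append(c);               // includes commas and new lines
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    cell.append('"');             // doubled quote is a literal quote
                    i++;
                } else {
                    quoted = false;               // closing quote
                }
            } else if (c == '"') {
                quoted = true;
            } else if (c == ',') {
                row.add(cell.toString());
                cell.setLength(0);
            } else if (c == '\n' || c == '\r') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;                          // treat CRLF as one line break
                }
                row.add(cell.toString());
                cell.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else {
                cell.append(c);
            }
        }
        if (cell.length() > 0 || !row.isEmpty()) { // last row without a line break
            row.add(cell.toString());
            rows.add(row);
        }
        return rows;
    }

    /** Uses the first non-empty row as field names for the rows that follow. */
    static List<Map<String, String>> toRecords(String text) {
        List<Map<String, String>> records = new ArrayList<>();
        List<String> header = null;
        for (List<String> row : parseRows(text)) {
            boolean empty = row.stream().allMatch(String::isEmpty);
            if (empty) {
                continue;                         // ASSUMPTION: blank rows are skipped
            }
            if (header == null) {
                header = row;
                continue;
            }
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < header.size() && i < row.size(); i++) {
                record.put(header.get(i), row.get(i));
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "\r\nid,note\r\n1,\"line one\nline two\"\r\n2,\"say \"\"hi\"\"\"\r\n";
        toRecords(csv).forEach(System.out::println);
    }
}
```

The example input starts with an empty row, so the header is taken from the second row, and the first data record has a note that spans two lines.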
The main implementation has some specific classes. Equivalents to
these would be required in alternative implementations.
| DataHubSchema |
Implementation of the Schema interface
which reads schema definitions from JSON files. |
| DataHubDataStore |
Implementation of the DataStore interface
which stores data in database tables. |
| DataHubMessage |
Implementation of the Message interface
which reads messages from the message store. |
| MessageReceiver |
Provides logic to load messages into the
message store. |
| MessageLoader |
Command-line tool that invokes the
MessageReceiver and (if the process option is used) the
DataHubMessageProcessor. |
| ReceiveMessageServlet |
Servlet to provide web service access to
the data hub. Like the message loader, it passes control
to the MessageReceiver and DataHubMessageProcessor. |
| DataHubMessageMonitor |
Looks for unprocessed messages on the message store and passes control (via DataHubMessageMonitorRun and DataHubMessageMonitorTask) to a configured instance of MessageProcessor to process them. Can be run from the command line and is used by DataHubMessageMonitorServlet.
|
| DataHubMessageMonitorRun |
Task that performs a single run of the
DataHubMessageMonitor. Can also be run from the command
line to process messages. |
| DataHubMessageProcessingThread |
Task that processes a single message. |
| MessageMonitorServlet |
Uses the DataHubMessageMonitor to perform
background processing of messages. |
| InstanceServlet |
Performs data access in a multi-tenant
environment. |
| ConnectionManager |
Manages connections to the database. |
| PropertiesManager |
Manages retrieval of data hub properties. |
| DataHubRetriever |
Default retriever for data hub. |