Data hub Java classes

The data hub processing is written in Java.

The processing is split into a number of classes, in package com.metrici.datahub. See the API documentation for details.

Some of the classes are specific to the main data hub implementation. These work only with file-based schemas and with database tables for the message store and target database.

The main processing is in separate, more general-purpose classes that can be used with different sources of messages, different schema implementations and different target data structures. This allows the data hub logic to be reused against different technologies, such as within the Metrici platform.

General-purpose classes

Interfaces

There are three interfaces that define how the data hub logic interacts with persistent data. Implementations of these are required in any solution.

Schema
The schema. This has a method to return the definition of a source system as a SourceSystem object. This in turn can return a set of EntityMapping objects for a source entity defined in the source system.
DataStore
The target data store.
Message
A single message.

Processing

The main processing proceeds through a number of classes.

MessageProcessor
This is instantiated with a schema, data store and message. It reads and validates the message and then passes control to the entity processor to perform further processing.
EntityProcessor
The entity processor is responsible for processing all the target entities for a valid message. It is separate from the MessageProcessor because it is used recursively to process child entities.
It iterates over all entity mappings for the source entity and all records in the data, and invokes a RecordProcessor for each combination.
RecordProcessor
Processes one record using one set of entity mappings.
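The flow described above can be sketched roughly as follows. This is an illustration only: the Mapping class, the method names and the trace list are hypothetical stand-ins rather than the real classes in com.metrici.datahub, and the sketch assumes that child records are nested inside the parent record and that the recursion is triggered while processing a record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: Mapping, processEntity and processRecord are hypothetical
// stand-ins, not the real classes in com.metrici.datahub.
final class ProcessingFlowSketch {

    /** Assumed mapping shape: a target entity name plus mappings for child records. */
    static final class Mapping {
        final String target;
        final String childField;           // record field holding child records, or null
        final List<Mapping> childMappings;

        Mapping(String target, String childField, List<Mapping> childMappings) {
            this.target = target;
            this.childField = childField;
            this.childMappings = childMappings;
        }
    }

    /** Entity-processor role: every mapping is applied to every record. */
    static void processEntity(List<Mapping> mappings,
                              List<Map<String, Object>> records,
                              List<String> trace) {
        for (Mapping mapping : mappings) {
            for (Map<String, Object> record : records) {
                processRecord(mapping, record, trace);
            }
        }
    }

    /** Record-processor role: handle one record with one mapping, then recurse into children. */
    @SuppressWarnings("unchecked")
    static void processRecord(Mapping mapping, Map<String, Object> record, List<String> trace) {
        trace.add(mapping.target + ":" + record.get("id"));
        if (mapping.childField == null) {
            return;
        }
        Object children = record.get(mapping.childField);
        if (children instanceof List) {
            // Assumption: child records are nested inside the parent record.
            processEntity(mapping.childMappings, (List<Map<String, Object>>) children, trace);
        }
    }

    public static void main(String[] args) {
        Mapping line = new Mapping("order_line", null, List.of());
        Mapping order = new Mapping("order", "lines", List.of(line));
        Map<String, Object> record = Map.of(
            "id", "o1",
            "lines", List.of(Map.of("id", "l1"), Map.of("id", "l2")));
        List<String> trace = new ArrayList<>();
        processEntity(List.of(order), List.of(record), trace);
        System.out.println(trace); // prints [order:o1, order_line:l1, order_line:l2]
    }
}
```

The point of the sketch is the shape of the loops: one record processor call per mapping and record, with the entity-level loop reused for child entities.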

Plug-ins

Some actions are carried out through plug-ins, the class names of which are given in the properties file. These are based on classes, not interfaces, because they are instantiated using standard constructors.

PlugIn

This is a general container for user-defined plug-ins, such as message readers and post processors.

The ScriptPlugIn class allows plug-ins to be written using JavaScript. The ScriptPlugIn is passed a global attributes object, with get(key) and put(key,value) methods, from which parameters can be read and to which return values can be written. Different types of plug-in have different parameters and return values.

UserAuthenticator
For the InstanceServlet, identifies and authenticates the user. Subclasses provide no authentication, basic authentication (user and password) or JSON web token (JWT) based authentication.
Authorizer
Determines whether the stated user is permitted to perform an action. Subclasses provide no authorization, and authorization based on a permissions file.

Other classes

Other relevant classes include:

MessageStatus
Provides constants for message processing outcomes.
MessageReader

Extends PlugIn to provide the default message reader, and the base class for classes that interpret the data in messages. The plug-in is passed the following attributes:

  • "data" - message data, as a string.
  • "options" - message options, as a string.
  • "context" - processing context
  • "message" - the message object
  • "entityMapping" - the entity mapping object

It should return in the attribute "records" one of the following:

  • An iterator or array of JSON objects or JSON object strings.
  • A string containing a JSON array of objects.

If the config is JSON and contains readFile set to true, then instead of reading data directly from the message, the reader assumes that the message represents a file loaded into the data hub (see the Files topic) and reads the data from the file itself. This option is useful for very large files, which can be loaded as files into the data hub and then read with readFile set to true.

CSVMessageReader

An extension of MessageReader that reads CSV data. The CSV is converted to JSON by taking field names from the first non-empty row. Parsing follows RFC 4180, in particular in its treatment of line breaks within individual cells.

CSVMessageReader takes the following configuration properties:

readFile
As for MessageReader.
delimiter
Field delimiter. Defaults to comma.
quote
Quote character. Defaults to double-quote.
convertNumbers
Indicates that unquoted numbers should be returned as numbers, not strings, i.e. that 1 in the input should be the number 1, not the string "1". Default is true.
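The behaviour described above can be sketched as a small stand-alone parser. This is not the CSVMessageReader source: the class and method names are made up for illustration, records are represented as maps rather than JSON objects, numbers are represented as BigDecimal, and the handling of blank rows after the header and of cells beyond the header width are assumptions.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not the real CSVMessageReader.
final class CsvToRecordsSketch {

    /** One parsed cell, remembering whether it was quoted in the input. */
    static final class Cell {
        final String text;
        final boolean quoted;
        Cell(String text, boolean quoted) { this.text = text; this.quoted = quoted; }
    }

    /** Splits CSV text into rows of cells; quoted cells may contain delimiters and line breaks. */
    static List<List<Cell>> parseRows(String csv, char delimiter, char quote) {
        List<List<Cell>> rows = new ArrayList<>();
        List<Cell> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean quoted = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (inQuotes) {
                if (c != quote) {
                    field.append(c);
                } else if (i + 1 < csv.length() && csv.charAt(i + 1) == quote) {
                    field.append(quote);   // doubled quote is an escaped quote
                    i++;
                } else {
                    inQuotes = false;      // closing quote
                }
            } else if (c == quote) {
                inQuotes = true;
                quoted = true;
            } else if (c == delimiter || c == '\n' || c == '\r') {
                row.add(new Cell(field.toString(), quoted));
                field.setLength(0);
                quoted = false;
                if (c != delimiter) {      // end of row; treat CRLF as one line break
                    if (c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                        i++;
                    }
                    rows.add(row);
                    row = new ArrayList<>();
                }
            } else {
                field.append(c);
            }
        }
        if (field.length() > 0 || quoted || !row.isEmpty()) {
            row.add(new Cell(field.toString(), quoted));  // last row without trailing line break
            rows.add(row);
        }
        return rows;
    }

    static boolean isBlank(List<Cell> row) {
        for (Cell cell : row) {
            if (cell.quoted || !cell.text.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    static Object toNumberOrText(String text) {
        try {
            return new BigDecimal(text);
        } catch (NumberFormatException e) {
            return text;
        }
    }

    /** Field names come from the first non-empty row; later rows become records. */
    static List<Map<String, Object>> toRecords(String csv, char delimiter, char quote,
                                               boolean convertNumbers) {
        List<String> header = null;
        List<Map<String, Object>> records = new ArrayList<>();
        for (List<Cell> row : parseRows(csv, delimiter, quote)) {
            if (isBlank(row)) {
                continue;                  // assumption: blank rows are skipped everywhere
            }
            if (header == null) {
                header = new ArrayList<>();
                for (Cell cell : row) {
                    header.add(cell.text);
                }
                continue;
            }
            Map<String, Object> record = new LinkedHashMap<>();
            for (int i = 0; i < header.size() && i < row.size(); i++) {
                Cell cell = row.get(i);
                boolean convert = convertNumbers && !cell.quoted;
                record.put(header.get(i), convert ? toNumberOrText(cell.text) : cell.text);
            }
            records.add(record);
        }
        return records;
    }
}
```

With convertNumbers set to true, an unquoted 1 becomes a number while a quoted "1" stays a string, which is why the sketch tracks whether each cell was quoted.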

MessageFilter

An extension of MessageReader that invokes a reader and then filters the records through another plug-in. The filter can change and delete records returned by the reader but cannot add new records.

This takes the following configuration properties:

messageReader
The name of the message reader plug-in class, which is the source of records to be filtered.
If not set, the default message reader is used, which expects a JSON array of JSON objects.
messageReaderConfig
The configuration of the message reader.
messageFilter
The name of the message filter plug-in class, which is used to filter or amend the records.
messageFilterConfig
Configuration of the message filter.
parseJSON
If set to true, converts the messages to JSON objects before passing them to the filter.

The filter plug-in is called for each record.

It is passed "options" and "context" attributes like the message reader. It is also passed the record itself in the "record" attribute.

The filter can:

  • Do nothing, in which case the record is passed through as-is.
  • Set the "record" attribute to a modified record.
  • Set the "record" attribute to null, to remove the record.
  • If parseJSON is true, update the record object in place, rather than creating a new record and setting the "record" attribute.
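The per-record contract above can be sketched as a simple loop. This is a hedged illustration, not the MessageFilter source: the filter plug-in is modelled as a hypothetical Consumer over a plain attributes map, and the "options" and "context" attributes are left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative only: the Consumer stands in for a filter plug-in.
final class FilterLoopSketch {

    /** Calls the filter once per record and keeps whatever is left in "record". */
    static List<Object> applyFilter(List<Object> records, Consumer<Map<String, Object>> filter) {
        List<Object> kept = new ArrayList<>();
        for (Object record : records) {
            Map<String, Object> attributes = new HashMap<>();
            attributes.put("record", record);
            filter.accept(attributes);
            Object result = attributes.get("record");
            if (result != null) {          // null means the filter removed the record
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Object> out = applyFilter(List.of("a", "skip", "b"), attributes -> {
            Object record = attributes.get("record");
            if ("skip".equals(record)) {
                attributes.put("record", null);   // remove
            } else if ("b".equals(record)) {
                attributes.put("record", "B");    // replace
            }                                      // otherwise pass through as-is
        });
        System.out.println(out); // prints [a, B]
    }
}
```

Because the output list is built only from records that came from the reader, a filter written this way can change or drop records but has no way to add new ones.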

ListPlugIn

An extension of PlugIn that runs other plug-ins.

The config should be a JSON array of objects. Each object should have a plugIn property, which identifies the class of the plug-in to run, and a plugInConfig property, which holds the configuration for that class.

The plug-ins are run in turn. The same attributes map is passed to each plug-in, allowing them to share data.
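As a rough sketch of this behaviour, the following runs a list of plug-ins against one shared attributes map. The class names in the example config, the "count" attribute and the use of Consumer as a stand-in for a plug-in are all assumptions made for illustration; the sketch does not parse the config or instantiate classes by name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative only: each Consumer stands in for one configured plug-in.
final class ListPlugInSketch {

    // Assumed config shape (class names are made up):
    // [
    //   {"plugIn": "com.example.FirstPlugIn",  "plugInConfig": {"label": "one"}},
    //   {"plugIn": "com.example.SecondPlugIn", "plugInConfig": {"label": "two"}}
    // ]

    /** Runs each plug-in in turn, passing the same attributes map to all of them. */
    static void runAll(List<Consumer<Map<String, Object>>> plugIns,
                       Map<String, Object> attributes) {
        for (Consumer<Map<String, Object>> plugIn : plugIns) {
            plugIn.accept(attributes);
        }
    }

    public static void main(String[] args) {
        Consumer<Map<String, Object>> first = attributes -> attributes.put("count", 1);
        Consumer<Map<String, Object>> second =
            attributes -> attributes.put("count", (Integer) attributes.get("count") + 1);
        Map<String, Object> attributes = new HashMap<>();
        runAll(List.of(first, second), attributes);
        System.out.println(attributes.get("count")); // prints 2
    }
}
```

The second plug-in sees the value written by the first, which is the data sharing that the common attributes map makes possible.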

Implementation-specific classes

The main implementation has some specific classes. Equivalents to these would be required in alternative implementations.

DataHubSchema
Implementation of the Schema interface which reads schema definition from JSON files.
DataHubDataStore
Implementation of the DataStore interface which stores data in database tables.
DataHubMessage
Implementation of the Message interface which reads messages from the message store.
MessageReceiver
Provides logic to load messages into the message store.
MessageLoader
Command-line tool that invokes the MessageReceiver and (if the process option is used) the DataHubMessageProcessor.
ReceiveMessageServlet
Servlet to provide web service access to the data hub. Like the message loader, it passes control to the MessageReceiver and DataHubMessageProcessor.
DataHubMessageMonitor

Looks for unprocessed messages on the message store and passes control (via DataHubMessageMonitorRun and DataHubMessageMonitorTask) to a configured instance of MessageProcessor to process them.

Can be run from the command line and is used by DataHubMessageMonitorServlet.

DataHubMessageMonitorRun
Task that performs a single run of the DataHubMessageMonitor. Can also be run from the command line to process messages.
DataHubMessageProcessingThread
Task that processes a single message.
MessageMonitorServlet
Uses the DataHubMessageMonitor to perform background processing of messages.
InstanceServlet
Performs data access in a multi-tenant environment.
ConnectionManager
Manages connections to the database.
PropertiesManager
Manages retrieval of data hub properties.
DataHubRetriever
Default retriever for the data hub.