Data hub schema

Data hub processing is driven by definitions of the source data and the target databases. These definitions are collectively known as the schema. You may need to read the processing section and this section together to fully understand the schema.

The schema is structured around entities (tables) and fields (columns). The schema maps from source entities and fields to target entities and fields.

There is one definition for each system that can send data to the data hub. A source system can send one or more entities. Each source entity can be mapped to one or more entities in the target data structures, and each of these entity mappings can contain multiple field mappings.

Where the schema is held

The schema is coded as JSON files, held in a directory identified in the data hub properties file.

There is one JSON file for each system that sends data to the data hub which defines the mapping from that system to the data hub. There is also one default file which defines the data in the data hub with no mapping.

The files are named using the system reference, followed by .json. So a system with a reference "system_one" would have its schema definition in system_one.json in the schema directory.

The default schema is always called default.json.
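
As a minimal sketch of the naming rule above, the following Python resolves a system reference to its schema file. The `schema_dir` argument stands in for the directory configured in the data hub properties file. The function names, and the way the directory is passed in, are assumptions made for illustration and are not part of the data hub.

```python
from pathlib import Path

# The default schema file name stated above.
DEFAULT_SCHEMA = "default.json"


def schema_file(schema_dir: str, system_reference: str) -> Path:
    """Return the path of the schema file for a system reference.

    Hypothetical helper: the data hub resolves this internally.
    """
    return Path(schema_dir) / f"{system_reference}.json"


def default_schema_file(schema_dir: str) -> Path:
    """Return the path of the default schema file."""
    return Path(schema_dir) / DEFAULT_SCHEMA
```

For example, `schema_file("schemas", "system_one")` points at `system_one.json` inside the `schemas` directory.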

Format of the schema

{
  "reference": "system_reference",
  "name": "System Name",
  "mappings": {
    "source_entity_reference": [
      {
        ... source entity mapping object (see below) ...
      }, ... more ...
    ]
  }
}

The schema is an object with the following properties.

reference
A reference for the system. This should be the same as the reference used to build the schema file name.
name
An optional name for the system. May be used for documentation. Defaults to the system reference.
mappings
An object keyed by source entity reference. Each property holds the array of source entity mapping objects for that source entity.
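
The structure above can be read with any JSON parser. The following Python is a sketch only: the entity names (`customer`, `customer_audit`) and the helper functions are hypothetical, and returning an empty list for an unknown source entity is an assumption rather than documented data hub behaviour.

```python
import json

# A small schema for a hypothetical system with one source entity
# that is mapped to two target entities.
SCHEMA_TEXT = """
{
  "reference": "system_one",
  "mappings": {
    "customer": [
      {"reference": "customer", "fields": []},
      {"reference": "customer_audit", "fields": []}
    ]
  }
}
"""


def load_schema(text: str) -> dict:
    """Parse a schema and apply the documented default for name."""
    schema = json.loads(text)
    # name is optional and defaults to the system reference.
    schema.setdefault("name", schema["reference"])
    return schema


def mappings_for(schema: dict, source_entity: str) -> list:
    """Look up the entity mappings for one source entity.

    Assumption: an unknown source entity yields an empty list.
    """
    return schema.get("mappings", {}).get(source_entity, [])


schema = load_schema(SCHEMA_TEXT)
targets = [m["reference"] for m in mappings_for(schema, "customer")]
```

Here `targets` lists the target entities that the `customer` source entity is mapped to, in the order they appear in the schema.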

Source entity mapping object

{
  "reference": "target_entity_reference",
  "name": "Target Entity Name",
  "sourceReference": "source_entity_reference",
  "sourceName": "Source Entity Name",
  "messageReader": "message.reader.name",
  "messageReaderConfig": "config",
  "unique": true|false,
  "uniqueIdentifier": "target_identifier_field_reference",
  "sourceUniqueIdentifier": "source_identifier_field_reference",
  "uniqueSequence": "target_sequence_field_reference",
  "sourceUniqueSequence": "source_sequence_field_reference",
  "dataStore": "dataStoreReference",
  "priority": priority,
  "export": false|true,
  "exportIdentifier": "id specifier",
  "exportStandardFields": false|true,
  "processor": "processor.name",
  "processorConfig": "config",
  "messageRetentionPeriod": days,
  "retainFiles": false|true,
  "fields": [
    ... fields ...
  ],
  "execute": false|true
}

The source entity mapping object maps a source system entity to a target entity. It has the following properties.

reference

Reference to target entity (table name).

This can be set to the string "NULL" or "null" (but not the JSON literal null) to indicate that the data should not be mapped to any entity. NULL entities can have fields of type children, allowing them to be used as containers for otherwise unrelated sets of data.

Unlike other children, the children of null entities are not refreshed unless a refresh is passed to the null entity itself.

name
Name of target entity. Optional. Used for documentation.
sourceReference
Reference by which the entity is known in the source. Defaults to the reference.
sourceName
Name of source entity. Optional. Defaults to name.
messageReader

Optional identifier of the plug-in class used to read the message. This is described in more detail in the Java classes topic.

If there is more than one target entity, the message reader and message reader config specified for the first target entity are used for all target entities.

If you want to process the message through two message readers, use the message reprocessor post-processing plug-in - see post processing for details.

messageReaderConfig
Config string used to instantiate the message reader. (This is a string even if config is JSON.)
unique
If set to true, indicates that this represents a unique set of records, for example transactions. Default is false.
uniqueIdentifier
If unique is true, the reference of the field on the target table that identifies this set of records. This will be populated with the message identifier. The field should be a character field at least 36 characters long. Both this field and the uniqueSequence field should appear in the list of fields for the target table, and together they should be identified as the key.
sourceUniqueIdentifier
Reference of field on the source message to which the uniqueIdentifier will be written. Defaults to uniqueIdentifier.
uniqueSequence
If unique is true, reference to integer field on the target table that is used to hold the record sequence.
sourceUniqueSequence
Reference of field on the source message to which the uniqueSequence will be written. Defaults to uniqueSequence.
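
As an illustration of how the unique properties fit together, the Python below builds an entity mapping for a unique set of records and serialises it as JSON. The entity and field names (`transaction`, `message_id`, `sequence`) and the helper function are hypothetical. The field definitions follow the guidance above: a character field of 36 characters for the identifier, an integer for the sequence, and both marked as key.

```python
import json


def unique_entity_mapping(entity: str, id_field: str, seq_field: str) -> dict:
    """Build an entity mapping for a unique set of records (illustrative)."""
    return {
        "reference": entity,
        "unique": True,
        "uniqueIdentifier": id_field,
        "uniqueSequence": seq_field,
        "fields": [
            # Holds the message identifier: character field, 36 long.
            {"reference": id_field, "type": "text", "length": 36, "key": True},
            # Holds the record sequence within the message.
            {"reference": seq_field, "type": "integer", "key": True},
        ],
    }


# Hypothetical names, chosen only for this example.
mapping = unique_entity_mapping("transaction", "message_id", "sequence")
mapping_json = json.dumps(mapping, indent=2)
```

Because sourceUniqueIdentifier and sourceUniqueSequence are omitted, they default to the target field references.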
dataStore

Optional reference for the type of data store in which data for this entity should be stored. Defaults to "database".

priority
Controls whether messages from this source system/entity can delete or create records created or deleted by other systems. Higher priorities take precedence. Defaults to 0.
export
Set to true to indicate this is an export entity. Export entities are write-only structures to which the data hub sends data, but which the data hub does not read.
exportIdentifier
Specifies how the record id should be built for an export. Depends on the data store class being used, but would typically hold a field reference.
exportStandardFields
Indicates that standard fields should be exported.
processor

Optional identifier of plug-in class used to define additional processing for this entity.

In a put operation, the processor is run after the data has been processed. It is passed the message options as a string in the attribute "options".

If there is more than one target entity, the processor is called once after all the target entities have been populated. The processor and processor config specified for the first target entity are used for all target entities.

If an execution has been called, the processor defines the execution, and performs all actions.

processorConfig
Config used to create processor.
messageRetentionPeriod
How long, in days, the message should be retained in the message store. Default is 0, which means indefinitely.
The retention period runs from the processed timestamp on the message process table, and only applies where the message has been processed successfully or has a permanent error.
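
The retention rule can be sketched as a small predicate. This is an illustration, not the data hub's implementation: the status labels are hypothetical, and treating a message as expired exactly when the period has fully elapsed is an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical status labels; the data hub's own values may differ.
FINISHED_STATUSES = {"processed", "permanent_error"}


def retention_expired(processed_at: datetime, status: str,
                      retention_days: int, now: datetime) -> bool:
    """Return True if a message is past its retention period."""
    if retention_days == 0:
        return False  # 0 means the message is retained indefinitely
    if status not in FINISHED_STATUSES:
        return False  # only successful or permanently failed messages
    # Assumption: the message expires once the full period has elapsed.
    return now >= processed_at + timedelta(days=retention_days)
```

With a 30 day period, a message processed on 1 January would be reported as expired from 31 January onwards, provided it finished successfully or with a permanent error.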
retainFiles
Whether files associated with this entity should be retained after the associated message has been deleted. Default is false, which means the files may be deleted when the message is deleted.
fields
An array of fields to be read from the record and how they should be mapped to the target tables.
execute

If set to true, indicates that an execution is required. If there is an execution, the input data is not written to the message store and no processing other than the execution is triggered. Instead, the data is passed to the plug-in defined by the "processor" property and configured by the "processorConfig" property. Other properties are not used.

execute must be set on the first entity mapping for a source entity.

A simple "Hello World" execution, using the script plug-in, would be defined like this.

{
  "reference": "myexecute",
  "name": "My Execute",
  "execute": true,
  "processor": "com.metrici.datahub.ScriptPlugIn",
  "processorConfig": "attributes.put('response',{message:'Hello world'});"
}
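
Several properties are taken from the first entity mapping for a source entity: the message reader, the processor, their configs, and execute. The following Python sketch shows that rule. The function is hypothetical, the second mapping's names are invented for the example, and treating a missing execute as false is an assumption.

```python
def effective_settings(entity_mappings: list) -> dict:
    """Return the settings that are taken from the first entity mapping."""
    if not entity_mappings:
        raise ValueError("a source entity needs at least one entity mapping")
    first = entity_mappings[0]
    keys = ("messageReader", "messageReaderConfig",
            "processor", "processorConfig")
    settings = {key: first.get(key) for key in keys}
    # Assumption: execute is treated as false when it is not given.
    settings["execute"] = first.get("execute", False)
    return settings


mappings = [
    {
        "reference": "myexecute",
        "execute": True,
        "processor": "com.metrici.datahub.ScriptPlugIn",
        "processorConfig": "attributes.put('response',{message:'Hello world'});",
    },
    # The processor on a later mapping is ignored (hypothetical names).
    {"reference": "other_target", "processor": "example.OtherPlugIn"},
]
settings = effective_settings(mappings)
```

In this example `settings` carries the script plug-in and its config from the first mapping, and the processor named on the second mapping plays no part.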

Field object

{
  "reference": "target_field_reference",
  "name": "Target Field Name",
  "sourceReference": "source_field_reference",
  "sourceName": "Source Field Name",
  "key": true|false,
  "type": "text|number|smallint|integer|bigint|double|decimal|date|timestamp|boolean|link|children",
  "length": length,
  "scale": scale,
  "precision": precision,
  "linkEntity": "target_link_entity_reference",
  "sourceLinkEntity": "source_link_entity_reference",
  "linkKey": "link_key",
  "sourceLinkKey": "source_link_key",
  "autocreate": false|true,
  "childEntity": "target_child_entity_reference",
  "sourceChildEntity": "source_child_entity_reference",
  "parentIdentifier": "target_parent_identifier_reference",
  "sourceParentIdentifier": "source_parent_identifier_reference",
  "childSequence": "target_sequence_field_reference",
  "sourceChildSequence": "source_sequence_field_reference",
  "priority": priority
}

The field object identifies a field in the source entity that should be mapped to the target entity. It has the following properties.

reference
Reference to target field (column name).
name
Name for target field. Optional, used in documentation. Defaults to the reference.
sourceReference
Reference of field in source entity. Defaults to reference.
sourceName
Name for source field. Optional, used in documentation. Defaults to the name.
key
Set to true to indicate this field is part of the key of the record.
type

The data type of the target field.

number
A general number data type. Implemented as a double.
text

A text data type. If length is set and is 255 or less, will be held as a varchar. Otherwise it will be held as a long text object of indeterminate length.

Trailing spaces are not considered significant and are removed from input data. This allows for consistency between fixed-length strings and variable-length strings in source systems.
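
The two rules for text fields can be sketched in Python as follows. The function names and the returned labels are hypothetical, and removing only space characters (rather than all trailing whitespace) is an assumption.

```python
from typing import Optional


def text_storage(length: Optional[int]) -> str:
    """Return how a text field would be held, given its length property."""
    # A length that is set and is 255 or less gives a varchar.
    if length and length <= 255:
        return "varchar"
    # No length, a length of 0, or a length above 255 gives long text.
    return "long text"


def normalise_text(value: str) -> str:
    """Remove trailing spaces, which are not considered significant.

    Assumption: only space characters are removed, not tabs or newlines.
    """
    return value.rstrip(" ")
```

A fixed-length source value such as `"ABC   "` and a variable-length value `"ABC"` therefore normalise to the same text.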

smallint
A signed 2-byte integer number.
integer
A signed 4-byte integer number.
bigint
A signed 8-byte integer number.
double
A signed double-precision (8-byte) floating point number.
decimal
A signed number with a fixed number of decimal places. The total number of digits is specified by the precision property. The number of digits after the decimal point is specified by the scale property.
date
A date. In the incoming data this should be a string in the format yyyy-mm-dd.
timestamp
A date and time. In the incoming data this should be a string in the format yyyy-mm-ddThh:mm:ss.ttt. Note that there is a T between the date and time portions; the more common database convention of using a single space in place of the T is also permitted. Seconds and fractional seconds are optional.
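
A parser that accepts the timestamp variations described above might look like the following Python sketch. It is illustrative only and is not the data hub's own parser; the function name is hypothetical.

```python
from datetime import datetime

# Accepted layouts, most specific first. Seconds and fractional
# seconds are optional.
_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M",
)


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp that uses either T or a space as separator."""
    # Normalise the space separator to T so one set of formats suffices.
    candidate = text.strip().replace(" ", "T", 1)
    for fmt in _FORMATS:
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            continue
    raise ValueError(f"not a valid timestamp: {text!r}")
```

Both `2024-03-05T14:30:15.250` and `2024-03-05 14:30` are accepted, while a bare date such as `2024-03-05` is rejected.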
boolean
A true/false value. This can be boolean true or false, the strings "true" or "false", a non-zero number (true) or zero (false), or a string containing a number.
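
The accepted boolean inputs can be sketched as a conversion function. This is an illustration under stated assumptions: matching "true" and "false" without regard to case, and rejecting strings that are neither of those nor a number, are assumptions rather than documented behaviour.

```python
def to_boolean(value) -> bool:
    """Convert an incoming value to a boolean (illustrative sketch)."""
    # Check bool first, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0  # non-zero is true, zero is false
    if isinstance(value, str):
        text = value.strip()
        # Assumption: the strings are matched case-insensitively.
        if text.lower() == "true":
            return True
        if text.lower() == "false":
            return False
        # A string containing a number; raises ValueError otherwise.
        return float(text) != 0
    raise TypeError(f"cannot convert {type(value).__name__} to boolean")
```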
link

A foreign key relationship. The parent entity is identified by the linkEntity. The field to be used to look up the key is identified by the linkKey. If more than one field is involved in the key, the linkKey will contain an array of source field references.

See link examples for examples of specifying links and children.

children

A one-to-many parent to child link, where the parent is part of the identifying key of the children.

See link examples for examples of specifying links and children.

length

For type of text, maximum length of the text. If omitted or 0, no maximum is applied.

precision
For type of decimal, the total number of digits (including those after the decimal point).
scale
For type of decimal, the number of digits after the decimal point.
linkEntity
For type of link, the target reference of the entity whose retriever should be used to map the key.
sourceLinkEntity
For type of link, the source reference of the entity whose retriever should be used to map the key. The first source entity mapping for the entity is used for key resolution. Defaults to linkEntity.
linkKey

For type of link, references of the field or fields required to resolve the link entity. Can be a single string or an array of strings.

sourceLinkKey

For type of link, source references of the field or fields required to resolve the link entity. Can be a single string or an array of strings.

These keys should appear in the input record, and should match the source references of the parent entity's keys.

Defaults to the linkKey.

autocreate
For a type of link, if the parent entity record does not exist, should it be created if possible? Default is false.
childEntity
For type of children, the reference to the entity which should be used for the child rows.
sourceChildEntity
For type of children, the reference to the source entity which should be used for child rows. Defaults to childEntity.
parentIdentifier
For type of children, the reference of the field on the children entity which holds the link back to the parent.
sourceParentIdentifier
For type of children, the reference of the parent identifier field to be added to the source entity. Defaults to parentIdentifier.
childSequence
For type of children, reference of the field on the children entity which holds a sequence number. If not given, then no sequence number is generated.
sourceChildSequence
For type of children, the reference of the sequence field to be added to the source entity. Defaults to childSequence.
priority
Controls the priority of values from different source systems and entities. A value with the same or higher priority overwrites the existing field value. Defaults to the priority defined on the entity mapping.
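
The priority defaults and the overwrite rule can be sketched in Python as below. The helper names are hypothetical, and how the data hub records the priority of an existing value is not covered here.

```python
def field_priority(field: dict, entity_mapping: dict) -> int:
    """Return the effective priority for a field mapping.

    The field priority defaults to the entity mapping priority,
    which itself defaults to 0.
    """
    return field.get("priority", entity_mapping.get("priority", 0))


def may_overwrite(incoming_priority: int, existing_priority: int) -> bool:
    """The same or a higher priority overwrites the existing value."""
    return incoming_priority >= existing_priority
```

For example, a field with no priority of its own, on an entity mapping with priority 5, would overwrite a value previously written at priority 5 or lower.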