Data hub processing is driven by
definitions of the source data and the target databases. These
definitions are collectively known as the schema. You may need to
read the processing section and this section together to fully
understand the schema.
The schema is structured around entities (tables) and fields (columns). The schema maps from source entities and fields to target entities and fields.
There is one definition for each system that can send data to the data hub. A source system can send one or more entities. Each source entity can be mapped to one or more entities in the target data structures, and each of these entity mappings can contain multiple field mappings.
The schema is coded as JSON files, held in a directory identified in the data hub properties file.
There is one JSON file for each system that sends data to the data hub; it defines the mapping from that system to the data hub. There is also one default file, which defines the data in the data hub with no mapping.
The files are named using the system reference, followed by .json. So a system with a reference "system_one" would have its schema definition in system_one.json in the schema directory.
The default schema is always called default.json.
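As a minimal sketch of the naming rule above, the following Python helper builds the expected file path for a system reference. The function name `schema_file_for` and the `schemas` directory are hypothetical, chosen only for illustration; the real directory is whichever one the data hub properties file identifies.

```python
from pathlib import Path
from typing import Optional


def schema_file_for(schema_dir: str, system_reference: Optional[str] = None) -> Path:
    """Return the expected schema file path for a system reference.

    Hypothetical helper: passing no reference selects the default schema.
    """
    file_name = f"{system_reference}.json" if system_reference else "default.json"
    return Path(schema_dir) / file_name


# "schemas" is an assumed directory name, not one defined by the data hub.
print(schema_file_for("schemas", "system_one"))
print(schema_file_for("schemas"))
```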
{
  "reference": "system_reference",
  "name": "System Name",
  "mappings": {
    "source_entity_reference": [
      {
        ... source entity mapping object – see below ...
      }, ... more ...
    ]
  }
}
The schema is an object with the following properties.
| reference | A reference for the system. This should be the same as the reference used to build the schema file name. |
| name | An optional name for the system. May be used for documentation. Defaults to the system reference. |
| mappings | An object, keyed by source entity reference, that holds the array of entity mapping objects for each source entity. |
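The structure described above can be illustrated with a small, hedged Python sketch that parses a schema, applies the documented default for `name`, and walks the `mappings` object. The system, entity, and field references in `SCHEMA_TEXT` are invented for illustration, and raising an error on a reference mismatch is an assumption of this sketch rather than documented data hub behaviour.

```python
import json

# Hypothetical schema for a system called "system_one".
SCHEMA_TEXT = """
{
  "reference": "system_one",
  "mappings": {
    "customer": [
      {"reference": "hub_customer",
       "fields": [{"reference": "id", "key": true, "type": "text"},
                  {"reference": "full_name", "sourceReference": "name", "type": "text"}]}
    ],
    "order": [
      {"reference": "hub_order",
       "fields": [{"reference": "order_id", "key": true, "type": "text"}]},
      {"reference": "hub_order_audit", "fields": []}
    ]
  }
}
"""


def load_schema(text, file_stem):
    """Parse a schema and apply the documented default for "name"."""
    schema = json.loads(text)
    # Assumption: treat a reference that differs from the file name as an error.
    if schema["reference"] != file_stem:
        raise ValueError("schema reference does not match the file name")
    schema.setdefault("name", schema["reference"])
    return schema


def entity_mappings(schema):
    """Yield (source entity, target entity, number of field mappings)."""
    for source_ref, mappings in schema["mappings"].items():
        for mapping in mappings:
            yield source_ref, mapping["reference"], len(mapping.get("fields", []))


schema = load_schema(SCHEMA_TEXT, "system_one")
for row in entity_mappings(schema):
    print(row)
```

Here the source entity `order` is mapped to two target entities, which is the one-to-many case the schema allows.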
{
  "reference": "target_entity_reference",
  "name": "Target Entity Name",
  "sourceReference": "source_entity_reference",
  "sourceName": "Source Entity Name",
  "messageReader": "message.reader.name",
  "messageReaderConfig": "config",
  "unique": true|false,
  "uniqueIdentifier": "target_identifier_field_reference",
  "sourceUniqueIdentifier": "source_identifier_field_reference",
  "uniqueSequence": "target_sequence_field_reference",
  "sourceUniqueSequence": "source_sequence_field_reference",
  "dataStore": "dataStoreReference",
  "priority": priority,
  "export": false|true,
  "exportIdentifier": "id specifier",
  "exportStandardFields": false|true,
  "processor": "processor.name",
  "processorConfig": "config",
  "messageRetentionPeriod": days,
  "retainFiles": false|true,
  "fields": [
    .. fields ..
  ],
  "execute": false|true
}
The source entity mapping object maps a source system entity to a target entity. It has the following properties.
| reference | Reference to the target entity (table name). This can be set to the string "NULL" (or "null", but not the JSON value null) to indicate that the data should not be mapped to any entity. NULL entities can have children fields, allowing them to be used as containers for otherwise unrelated sets of data. Unlike other children, the children of NULL entities are not refreshed unless a refresh is passed to the NULL entity itself. |
| name | Name of the target entity. Optional. Used for documentation. |
| sourceReference | Reference by which the entity is known in the source. Defaults to the reference. |
| sourceName | Name of the source entity. Optional. Defaults to name. |
| messageReader | Optional identifier of the plug-in class used to read the message. This is described in more detail in the Java classes topic. If there is more than one target entity, the message reader and message reader config specified for the first target entity are used for all target entities. If you want to process the message through two message readers, use the message reprocessor post-processing plug-in; see post processing for details. |
| messageReaderConfig | Config string used to instantiate the message reader. (This is a string even if the config is JSON.) |
| unique | If set to true, indicates that this represents a unique set of records, for example transactions. Default is false. |
| uniqueIdentifier | If unique is true, the reference of the field on the target table that identifies this set of records. This will be populated with the message identifier. The field should be a character field of at least 36 characters in length. This and the uniqueSequence should be on the list of fields for the target table, and should be identified as the key. |
| sourceUniqueIdentifier | Reference of the field on the source message to which the uniqueIdentifier will be written. Defaults to uniqueIdentifier. |
| uniqueSequence | If unique is true, the reference of the integer field on the target table that is used to hold the record sequence. |
| sourceUniqueSequence | Reference of the field on the source message to which the uniqueSequence will be written. Defaults to uniqueSequence. |
| dataStore | Optional reference for the type of data store in which data for this entity should be stored. Defaults to "database". |
| priority | Controls whether messages from this source system/entity can delete or create records created or deleted by other systems. Higher priorities take precedence. Defaults to 0. |
| export | Set to true to indicate this is an export entity. Export entities are write-only structures to which the data hub sends data, but which the data hub does not read. |
| exportIdentifier | Specifies how the record id should be built for an export. Depends on the data store class being used, but would typically hold a field reference. |
| exportStandardFields | Indicates that standard fields should be exported. |
| processor | Optional identifier of the plug-in class used to define additional processing for this entity. In a put operation, the processor is run after the data has been processed. It is passed the message options as a string in the attribute "options". If there is more than one target entity, the processor is called once after all the target entities have been populated. The processor and processor config specified for the first target entity are used for all target entities. If an execution has been called, the processor defines the execution and performs all actions. |
| processorConfig | Config used to create the processor. |
| messageRetentionPeriod | For how long, in days, the message should be retained in the message store. Default is 0, which means indefinitely. The retention period applies from the processed timestamp on the message process table, and only applies where the message has been processed successfully or has a permanent error. |
| retainFiles | Whether files associated with this entity should be retained after the associated message has been deleted. Default is false, which means the files may be deleted when the message is deleted. |
| fields | An array of fields to be read from the record and how they should be mapped to the target tables. |
| execute | If set to true, indicates that an execution is required. If there is an execution, the input data is not written to the message store and no processing other than the execute is triggered. Instead, the data is passed to the plug-in defined by the "processor" property and configured by the "processorConfig" property. Other properties are not used. execute must be set on the first entity mapping for a source entity. A simple "Hello World" execution, using the script plug-in, would be defined like this. { |
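The defaulting rules in the table above can be summarised in a short Python sketch. It is not the data hub's own implementation; the function names, the entity references, and the reader names `example.reader` and `ignored.reader` are all hypothetical. Only defaults that the property descriptions state explicitly are applied.

```python
from copy import deepcopy

# Properties whose default is the value of another property on the same mapping.
SOURCE_DEFAULTS = (
    ("sourceReference", "reference"),
    ("sourceName", "name"),
    ("sourceUniqueIdentifier", "uniqueIdentifier"),
    ("sourceUniqueSequence", "uniqueSequence"),
)

# Fixed defaults stated in the property descriptions.
VALUE_DEFAULTS = {
    "unique": False,
    "dataStore": "database",
    "priority": 0,
    "messageRetentionPeriod": 0,  # 0 means the message is retained indefinitely
    "retainFiles": False,
}


def apply_entity_defaults(mapping):
    """Return a copy of an entity mapping with the documented defaults filled in."""
    resolved = deepcopy(mapping)
    for source_key, target_key in SOURCE_DEFAULTS:
        if target_key in resolved:
            resolved.setdefault(source_key, resolved[target_key])
    for key, value in VALUE_DEFAULTS.items():
        resolved.setdefault(key, value)
    return resolved


def shared_plugins(mappings):
    """Reader and processor settings come from the first target entity mapping."""
    first = mappings[0]
    keys = ("messageReader", "messageReaderConfig", "processor", "processorConfig")
    return {key: first.get(key) for key in keys}


# Hypothetical mappings for one source entity with two target entities.
mappings = [
    {"reference": "hub_transaction", "sourceReference": "transaction",
     "unique": True, "uniqueIdentifier": "message_id",
     "uniqueSequence": "sequence", "messageReader": "example.reader"},
    {"reference": "hub_transaction_audit", "messageReader": "ignored.reader"},
]

resolved = apply_entity_defaults(mappings[0])
print(resolved["sourceUniqueIdentifier"], resolved["dataStore"])
print(shared_plugins(mappings)["messageReader"])
```

The second mapping's `messageReader` is never used, which reflects the rule that the first target entity's reader applies to all target entities.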
{
  "reference": "target_field_reference",
  "name": "Target Field Name",
  "sourceReference": "source_field_reference",
  "sourceName": "Source Field Name",
  "key": true|false,
  "type": "text|number|date|timestamp|boolean|link|children",
  "length": length,
  "scale": scale,
  "precision": precision,
  "linkEntity": "target_link_entity_reference",
  "sourceLinkEntity": "source_link_entity_reference",
  "linkKey": "link_key",
  "sourceLinkKey": "source_link_key",
  "autocreate": false|true,
  "childEntity": "target_child_entity_reference",
  "sourceChildEntity": "source_child_entity_reference",
  "parentIdentifier": "target_parent_identifier_reference",
  "sourceParentIdentifier": "source_parent_identifier_reference",
  "childSequence": "target_sequence_field_reference",
  "sourceChildSequence": "source_sequence_field_reference",
  "priority": priority
}
The field object identifies a field in the source entity that should be mapped to the target entity. It has the following properties.
| reference | Reference to the target field (column name). |
| name | Name for the target field. Optional, used in documentation. Defaults to the reference. |
| sourceReference | Reference of the field in the source entity. Defaults to reference. |
| sourceName | Name for the source field. Optional, used in documentation. Defaults to the name. |
| key | Set to true to indicate this field is part of the key of the record. |
| type | The data type of the target field. |
| length | For a type of text, the maximum length of the text. If omitted or 0, no maximum is applied. |
| precision | For a type of decimal, the total number of digits (including those after the decimal point). |
| scale | For a type of decimal, the number of digits after the decimal point. |
| linkEntity | For a type of link, the target reference of the entity whose retriever should be used to map the key. |
| sourceLinkEntity | For a type of link, the source reference of the entity whose retriever should be used to map the key. The first source entity mapping for the entity is used for key resolution. Defaults to linkEntity. |
| linkKey | For a type of link, references of the field or fields required to resolve the link entity. Can be a single string or an array of strings. |
| sourceLinkKey | For a type of link, source references of the field or fields required to resolve the link entity. Can be a single string or an array of strings. These keys should appear in the input record, and should match the source references of the parent entity's keys. Defaults to the linkKey. |
| autocreate | For a type of link, whether the parent entity record should be created, if possible, when it does not exist. Default is false. |
| childEntity | For a type of children, the reference to the entity which should be used for the child rows. |
| sourceChildEntity | For a type of children, the reference to the source entity which should be used for child rows. Defaults to childEntity. |
| parentIdentifier | For a type of children, the reference of the field on the children entity which holds the link back to the parent. |
| sourceParentIdentifier | For a type of children, the reference of the parent identifier field to be added to the source entity. Defaults to parentIdentifier. |
| childSequence | For a type of children, the reference of the field on the children entity which holds a sequence number. If not given, no sequence number is generated. |
| sourceChildSequence | For a type of children, the reference of the sequence field to be added to the source entity. Defaults to childSequence. |
| priority | Controls the priority of values from different source systems and entities. Values with the same or a higher priority overwrite existing fields. Defaults to the priority defined on the entity mapping. |
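As a hedged illustration of the field defaults and the priority rule above, the following Python sketch resolves a field object and decides whether an incoming value may overwrite an existing one. The function names and the field references are hypothetical, and the sketch is not the data hub's own implementation.

```python
def as_list(value):
    """linkKey and sourceLinkKey may be a single string or an array of strings."""
    return [value] if isinstance(value, str) else list(value)


def apply_field_defaults(field, entity_priority=0):
    """Return a copy of a field object with the documented defaults filled in."""
    resolved = dict(field)
    resolved.setdefault("name", resolved["reference"])
    resolved.setdefault("sourceReference", resolved["reference"])
    resolved.setdefault("sourceName", resolved["name"])
    # Field priority falls back to the priority of the entity mapping.
    resolved.setdefault("priority", entity_priority)
    if resolved.get("type") == "link":
        resolved["linkKey"] = as_list(resolved["linkKey"])
        resolved["sourceLinkKey"] = as_list(
            resolved.get("sourceLinkKey", resolved["linkKey"]))
        resolved.setdefault("sourceLinkEntity", resolved["linkEntity"])
        resolved.setdefault("autocreate", False)
    elif resolved.get("type") == "children":
        resolved.setdefault("sourceChildEntity", resolved["childEntity"])
    return resolved


def may_overwrite(incoming_priority, existing_priority):
    """Values with the same or a higher priority overwrite existing fields."""
    return incoming_priority >= existing_priority


# Hypothetical link field on an order entity that points at a customer entity.
link_field = apply_field_defaults(
    {"reference": "customer", "type": "link",
     "linkEntity": "hub_customer", "linkKey": "customer_id"},
    entity_priority=5,
)
print(link_field["sourceLinkKey"], link_field["priority"])
print(may_overwrite(5, 5), may_overwrite(4, 5))
```

Normalising `linkKey` to a list lets later code treat single and composite keys the same way.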