Data hub message format


File and web service format

Assuming the allData option is not set, the message read from the file (when using the command line) or the data posted to the web service should contain a JSON object in the following format:

{
    "system": "source_system_reference",
    "entity": "source_entity_reference",
    "timestamp": "effective timestamp",
    "user": "user name",
    "refresh": false|true,
    "process": false|true,
    "options": options,
    "data": data,
    "msgid": "identifier"
}
system
The source system. Defaults to *, which means that the data conforms to the target database structures.
entity
Reference to the source entity. Required.
timestamp
Point in time when data is valid. Defaults to now.
user
User or system that produced the data. Optional.
refresh
Set to true to indicate that this represents a complete refresh of the data.
process
Set to true to indicate that the message should be processed straight away. Omit or set to false to process the message later.
options
A JSON object used to parameterize the processing of the data. May contain anything, or be omitted.
data
The data itself.
guid
Unique identifier for the message. This would typically be omitted, in which case the data hub generates a suitable unique identifier. If passed, it should be a 36-character UUID. Messages with a duplicate guid will be dropped.
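As an illustration, a client could assemble the envelope described above along these lines. This is a minimal sketch: `build_message` is a hypothetical helper, not part of the data hub. It always sends system, entity, refresh, process and data, includes the other optional fields only when supplied, and leaves out the message identifier so that the data hub generates one.

```python
import json
from typing import Any, Optional


def build_message(
    entity: str,
    data: Any,
    system: str = "*",
    timestamp: Optional[str] = None,
    user: Optional[str] = None,
    refresh: bool = False,
    process: bool = False,
    options: Optional[dict] = None,
) -> str:
    """Build a data hub message envelope as a JSON string (hypothetical helper)."""
    if not entity:
        raise ValueError("entity is required")
    message = {
        "system": system,  # "*" means the data conforms to the target structures
        "entity": entity,
        "refresh": refresh,
        "process": process,
        "data": data,
    }
    # Optional fields are only sent when supplied, so the data hub's defaults apply
    if timestamp is not None:
        message["timestamp"] = timestamp
    if user is not None:
        message["user"] = user
    if options is not None:
        message["options"] = options
    # No message identifier is added: the data hub generates one
    return json.dumps(message)


body = build_message("product", [{"product_number": 1234567}], user="nightly_load")
```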

If the allData option is set to true, the message contains only the data. Other options may be specified on the command line or in the web service initialisation parameters.
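On the receiving side, the two modes could be told apart roughly as follows. This is a sketch under assumptions: the function name `unwrap` is hypothetical, and it assumes that in allData mode the raw body is handed unchanged to the message reader, while in envelope mode the body is parsed as JSON and split into the envelope fields and the data.

```python
import json
from typing import Any, Tuple


def unwrap(body: str, all_data: bool) -> Tuple[dict, Any]:
    """Split a raw message body into (envelope fields, data). Hypothetical helper."""
    if all_data:
        # The whole body is the data; other settings come from the command
        # line or the web service initialisation parameters instead.
        return {}, body
    parsed = json.loads(body)
    envelope = {key: value for key, value in parsed.items() if key != "data"}
    return envelope, parsed.get("data")
```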

Data format

The message can contain any string. It is processed by a message reader, which returns an object that can be used to iterate over JSON objects. If no message reader is specified, the message must contain a JSON array of objects.

Each object represents one record. The properties of the object represent the fields.

[
    {
        "product_number": 1234567,
        "product_description": "Breville Toaster"
    },
    {
        "product_number": 2345678,
        "product_description": "JCB Back Hoe"
    }
]
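The reader contract can be pictured as a function that takes the message string and yields one record object at a time. The sketch below is illustrative only: `array_reader` mimics the default behaviour (a JSON array of objects), and `ndjson_reader` is a hypothetical custom reader for newline-delimited JSON, shown to suggest why a reader makes other string formats possible. Neither name comes from the data hub itself.

```python
import json
from typing import Any, Dict, Iterator


def array_reader(message: str) -> Iterator[Dict[str, Any]]:
    """Default-style reader: the message is a JSON array of record objects."""
    records = json.loads(message)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of objects")
    for record in records:
        if not isinstance(record, dict):
            raise ValueError("each record must be a JSON object")
        yield record  # one record; its properties are the fields


def ndjson_reader(message: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical custom reader: one JSON object per non-blank line."""
    for line in message.splitlines():
        if line.strip():
            yield json.loads(line)
```

Both functions are generators, so a large message can be consumed record by record without building a second list of results.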

Text, number and boolean fields are written using string, number and boolean JSON types.

Dates and timestamps are written using ISO 8601 formats (yyyy-mm-dd or yyyy-mm-ddThh:mm:ss.ttt).
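When producing messages from Python, the standard `json` module does not serialise dates on its own, but a `default` hook can render them in the two formats above. This sketch assumes naive (timezone-less) datetimes, since an aware datetime would also get a UTC offset appended; the field name `loaded_at` is hypothetical.

```python
import json
from datetime import date, datetime


def iso_default(value):
    """json.dumps hook: render dates and timestamps in ISO 8601."""
    # datetime is a subclass of date, so it must be checked first
    if isinstance(value, datetime):
        return value.isoformat(timespec="milliseconds")  # yyyy-mm-ddThh:mm:ss.ttt
    if isinstance(value, date):
        return value.isoformat()  # yyyy-mm-dd
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


row = {"price_date": date(2019, 1, 21), "loaded_at": datetime(2019, 3, 1, 9, 30)}
print(json.dumps(row, default=iso_default))
# {"price_date": "2019-01-21", "loaded_at": "2019-03-01T09:30:00.000"}
```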

Link types are represented by one or more fields used to resolve the links. (The rules for link resolution are in the schema.)

Children types are represented by an array of child rows. For example, adding price history to products might look like this:

[
    {
        "product_number": 1234567,
        "product_description": "Breville Toaster",
        "price_history": [
            {
                "price_date": "2019-01-21",
                "price": 17.99
            },
            {
                "price_date": "2019-03-01",
                "price": 16.99
            }
        ]
    },
    {
        "product_number": 2345678,
        "product_description": "JCB Back Hoe",
        "price_history": [
            {
                "price_date": "2019-03-01",
                "price": 11.99
            }
        ]
    }
]

These child rows do not carry the parent key or a sequence number. The schema defines the child row columns used to link back to the parent row and, optionally, a generated sequence number for the links.

Child rows can be nested indefinitely.
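To make the parent linking concrete, here is a recursive sketch of how nested child arrays could be flattened into rows that carry their parent's key and a generated sequence number. The real link columns are defined by the schema; the `<table>_seq` column naming, the treatment of every list-valued field as a child table, and the function name `flatten` are all assumptions for illustration. Because the function calls itself for each child array, it handles nesting to any depth.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def flatten(
    rows: List[Row],
    table: str,
    key_fields: List[str],
    parent_link: Optional[Row] = None,
    out: Optional[List[Tuple[str, Row]]] = None,
) -> List[Tuple[str, Row]]:
    """Flatten nested child arrays into (table, row) pairs (illustrative sketch)."""
    if out is None:
        out = []
    for seq, row in enumerate(rows, start=1):
        # Scalar fields stay on the row; list-valued fields are child tables
        scalars = {f: v for f, v in row.items() if not isinstance(v, list)}
        children = {f: v for f, v in row.items() if isinstance(v, list)}
        if parent_link is None:
            # Top-level row: its own key fields identify it
            link = {f: scalars[f] for f in key_fields}
            flat = scalars
        else:
            # Child row: parent key columns plus a generated sequence number
            link = {**parent_link, f"{table}_seq": seq}
            flat = {**link, **scalars}
        out.append((table, flat))
        for child_table, child_rows in children.items():
            flatten(child_rows, child_table, key_fields, link, out)
    return out
```

Applied to the first product in the price history example, this would yield one `product` row and two `price_history` rows, each child row carrying `product_number` and a `price_history_seq` of 1 or 2.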

The same format is used for both inserts and updates. Deletes are specified by passing an object with appropriate key fields and with the "deleted_indicator" field set to true.

[
    {
        "product_number": 1234567,
        "product_description": "Breville Toaster"
    },
    {
        "product_number": 2345678,
        "product_description": "JCB Back Hoe"
    },
    {
        "product_number": 1239812,
        "deleted_indicator": true
    }
]
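Since inserts, updates and deletes share one format, a consumer has to separate them by the "deleted_indicator" field. A minimal sketch, assuming that only an explicit `true` marks a delete and that a missing or false indicator means insert or update; `split_changes` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def split_changes(records: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate insert/update records from delete records (sketch)."""
    upserts: List[Row] = []
    deletes: List[Row] = []
    for record in records:
        # Only an explicit true marks a delete; otherwise insert or update
        if record.get("deleted_indicator") is True:
            deletes.append(record)
        else:
            upserts.append(record)
    return upserts, deletes
```

With the example above, the first two products would land in the insert/update list and product 1239812 in the delete list.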