The data hub query component retrieves data from the data hub data store.
It is used in three places:
This page describes the syntax of the query object that drives query processing.
Note that the query component is intended to support data extraction. It does not attempt to contain all the logic that may be required for business applications, which may instead need to run directly from the target tables.
Query is driven by a JSON query object. The format of the query object is:
{
"entity": "entity",
"timestamp": "yyyy-mm-ddThh:mm:ss",
"where": {
.. where clause ..
},
"fields": [
.. fields ..
],
"deletes": true|false
}
Only the entity is mandatory. If passed just an entity, the query returns as a JSON array all the current records from the entity, exactly in the form that would be used to load the default entity (i.e. without the valid from and to timestamps and other control fields).
The other options control what data is retrieved and what is returned to the user.
| timestamp |
The point in time for which records should be
returned. Defaults to now. You can use this to return the
data as they would have been at some time in the past. |
| where |
A JSON object that contains field names and values, and which can be used to select the rows you want to process. For example, if you wanted to retrieve the product with a product_number field of 123456, you would pass "where": {
You can pass more than one field in the where clause, and they will be ANDed together. The where clause is only applied to the entity, not to any link entities (see below). If you pass an array in place of a single value, the field must match one of the values. For example: "where": {
There is no support for more complex expressions, such as
OR or NOT. How the expression is executed depends on the underlying data store. The default database data store indexes on guid, key fields, foreign keys and source message. If you use other fields in your where clause your query may take a long time to execute. The Elasticsearch query interface does not support list
values, as it is only intended to support the logic of
writing to Elasticsearch. Elasticsearch itself does
support more complex expressions. |
| fields |
The fields you want to return in your query,
as a list of fields names. It defaults to all the fields in
the entity. If you want fewer fields, just list the field
names, as in this example which returns product_number and
product_name from a product entity.{
The fields clause can also specify field renames and
retrieval from linked entities. See the section below. |
| deletes |
Set to true to return rows that hold logical
deletes. These rows will have the field deleted_indicator
set to true, even if deleted_indicator was not requested in
the list of fields. |
The components that use query have their own methods to set or
override properties in the query object.
For an entity with no link fields and no children fields, the
default field list is the list of all fields in the entity.
System-generated fields are not included unless you have set
deletes to true, in which case deleted_indicator will be set for
deleted fields.
You can select which fields you want by listing them in the fields array. You can include system-generated fields: guid, valid_from_timestamp, valid_to_timestamp, deleted_indicator, source_message and field_provenance.
For example, you might want to understand when prices for a product came into force by returning the valid_from_timestamp.
"fields": [
"valid_from_timestamp",
"product_number",
"product_name",
"price_inc_vat"
]
You can rename a field by replacing its name in the field list with an object. The object should have a "name" property to identify the field and an "as" propery to give a new name.
For example, if you wanted to rename "price_inc_vat" as "price", you could use:
"fields": [
"valid_from_timestamp",
"product_number",
"product_name",
{
"name": "price_inc_vat",
"as": "price"
}
]
If there are link fields, then by default the keys of the linked row are inserted.
For example, consider a product entity with a link field "range" which links to a range entity. The range entity has a key field of range_reference and then a fields of range_name. A field specification of:
"fields": [
"product_number",
"product_name",
"range"
]
Might return a structure such as this.
[
{
"product_number": 123456,
"product_name": "Breville Kettle",
"price_inc_vat": 17.99,
"range_reference": "ELEC1"
}
]
This is the same as the data format that would have been used to write the product in the first place.
Link fields can identify entity mappings as target entities, which can provide alternative source field names for the key fields of the linked entity. This is required where the key field names in the link entity do not match the source field names used when loading the entity, for example because there is more than one link to the same parent entity. Where this is the case, if you specify a link field by itself without a fields list, then the generated fields will use source field names. However, if you specify a field list for a link field then the target field names (or any renames) will be used, not the source field names. In general, this means that the linked fields will do what you would expect to recreate the input format.
Instead of using the field name, you can use an object that specifies the fields to be returned. This object should have a property for the field name and a fields array of its own.
For example,
"fields": [
"product_number",
"product_name",
{
"name": "range",
"fields": [
"range_reference",
"range_name"
]
}
]
Might return a structure such as this.
[
{
"product_number": 123456,
"product_name": "Breville Kettle",
"price_inc_vat": 17.99,
"range_reference": "ELEC1",
"range_name": "Small electricals"
}
]
Nested fields lists can contain field objects with "as" properties, or further link or children field specifications.
Like link fields, children fields can be included in the output. The children fields are represented by an array of child records, as they are in the input data format. By default, the child records contain all the child fields except for the fields used to link back to the parent (this is inverse of link fields, where the default is the key fields only, and makes sense when you consider that the child records are wholly owned by the parent).
A field specification object can be used to specify the fields to be shown on the child records.
Using the example above, imagine that range had a children field of "products".
The default get of range "ELEC1" might return.
[
{
"range_reference": "ELEC1",
"range_name": "Small electricals",
"products": [
{
"product_number": 123456,
"product_name": "Breville Kettle",
"price_inc_vat": 17.99
},
{
"product_number": 234567,
"product_name": "Kenwood Kettle",
"price_inc_vat": 23.99 } ]
}
]
If you don't want all the child fields, then you have to specify which fields you want, for example:
"fields": [
"range_reference",
"range_name",
{
"name": "products",
"fields": [
"product_number"
]
}
]
This would then return only the specified fields for the children:
[
{
"range_reference": "ELEC1",
"range_name": "Small electricals",
"products": [
{
"product_number": 123456
},
{
"product_number": 234567
} ]
}
]
When including link and children fields, it is possible to specify recursive relationships. To prevent this, the query processor limits the depth of link entity and children subqueries to 25, and raises an error after this. If you hit this limit, you are doing something wrong.