Data hub properties

The data hub is configured with a Java properties file. This can be called anything and kept anywhere, but on a Linux system the standard is to call it datahub.properties and hold it in /etc/datahub.

Here is an example properties file.

# Data hub properties

database=com.metrici.datahub.DataHubDataStore

dbDriver=org.mariadb.jdbc.Driver
dbURL=jdbc:mariadb://127.0.0.1/datahub
dbUser=datahub
dbPassword=jfu7Kjh023
commitFrequency=1000

databaseLimit=true
timestampPrecision=second

schema=/etc/datahub/schema

multi=true
monitor.delay=0
monitor.interval=60
monitor.processTimeout=21600
monitor.queue=10
monitor.threads=1
monitor.minimumDelay=60
monitor.maximumDelay=14400
monitor.minimumDelayMultiplier=10
monitor.ageExponent=1
monitor.ageMultiplier=1
monitor.messageTimeout=3456000
monitor.shutdownTimeout=600

fileStore=/var/datahub/filestore
fileURLPrefix=https://datahub.metrici.com/files/
generateFileName=random

load.cacheSize=10000
query.cacheSize=10000

verbosity=1

authenticator=com.metrici.datahub.BasicAuthenticator
authorizer=com.metrici.datahub.BasicAuthorizer

messageStore=com.metrici.datahub.DataHubMessageStore
messageControl=com.metrici.datahub.DataHubMessageControl

The properties file follows normal Java properties file conventions: comment lines start with a #, blank lines are allowed, and there can be spaces around the =.
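As an illustration of these conventions, the following minimal sketch parses a small fragment with the standard java.util.Properties class. The class name PropertiesSyntaxSketch is made up for this example, and this is not necessarily how the data hub itself reads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical example class, not part of the data hub.
final class PropertiesSyntaxSketch {

    // Parses properties text using the standard Java properties loader.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "# Data hub properties",   // comment line, ignored
            "",                        // blank line, ignored
            "dbUser = datahub",        // spaces around the = are skipped
            "commitFrequency=1000");
        Properties props = parse(text);
        System.out.println(props.getProperty("dbUser"));           // prints "datahub"
        System.out.println(props.getProperty("commitFrequency"));  // prints "1000"
    }
}
```

Note that all values are read as strings, so a numeric setting such as commitFrequency has to be converted by whatever code uses it.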

Database properties

database maps the "database" data store reference to a data store class. The default value is shown here.

dbDriver, dbURL, dbUser and dbPassword are used to connect to the database to access the message store tables and the target database tables. You can hold these with a "database." prefix, but they default to the values without the prefix.

If you want to target more than one database, set the dataStore property of some entities to something else, e.g. "database2". You can then give the data store class, and the parameters for the data store with a prefix of database2, e.g.

database2=com.metrici.datahub.DataHubDataStore

database2.dbDriver=org.mariadb.jdbc.Driver
database2.dbURL=jdbc:mariadb://127.0.0.1/datahub2
database2.dbUser=datahub2
database2.dbPassword=jkh98jhkjh

The data store class defaults to com.metrici.datahub.DataHubDataStore, and the dbname.xxx properties default to the xxx properties, so to introduce a second database with the same driver you might just need something like:

database2.dbURL=jdbc:mariadb://127.0.0.1/datahub2
database2.dbUser=datahub2
database2.dbPassword=jkh98jhkjh
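
The fallback from dbname.xxx to xxx can be pictured with a small lookup helper. This is only a sketch of the rule as described here; the class and method names (DataStorePropertySketch, resolve) are hypothetical and the data hub's own lookup code may differ.

```java
import java.util.Properties;

// Hypothetical example class, not part of the data hub.
final class DataStorePropertySketch {

    // Returns the value of "<reference>.<name>" if it is set,
    // otherwise falls back to the unprefixed "<name>".
    static String resolve(Properties props, String reference, String name) {
        String prefixed = props.getProperty(reference + "." + name);
        return prefixed != null ? prefixed : props.getProperty(name);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("dbDriver", "org.mariadb.jdbc.Driver");
        props.setProperty("dbURL", "jdbc:mariadb://127.0.0.1/datahub");
        props.setProperty("database2.dbURL", "jdbc:mariadb://127.0.0.1/datahub2");

        // The prefixed value wins when present.
        System.out.println(resolve(props, "database2", "dbURL"));
        // No database2.dbDriver is set, so the unprefixed dbDriver is used.
        System.out.println(resolve(props, "database2", "dbDriver"));
    }
}
```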

Set databaseLimit to true if the database supports the LIMIT clause.

Set timestampPrecision to the precision to be used for timestamps: one of "second", "millisecond", "microsecond" or "nanosecond". The default is "second". This should match the precision used in the database, or be less precise than the database. These properties can also be prefixed by a data store reference, e.g. database2.limit=false.
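
To show what the four precision values mean in practice, the sketch below truncates a timestamp to each of them using java.time. It assumes that a coarser precision simply drops the finer digits. The data hub's actual handling is not documented here, and the class name TimestampPrecisionSketch is made up for the example.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical example class, not part of the data hub.
final class TimestampPrecisionSketch {

    // Maps a timestampPrecision value to the matching java.time unit.
    static ChronoUnit unitFor(String precision) {
        switch (precision) {
            case "second":      return ChronoUnit.SECONDS;
            case "millisecond": return ChronoUnit.MILLIS;
            case "microsecond": return ChronoUnit.MICROS;
            case "nanosecond":  return ChronoUnit.NANOS;
            default:
                throw new IllegalArgumentException("Unknown precision: " + precision);
        }
    }

    // Assumption: reducing precision means truncating, not rounding.
    static Instant truncate(Instant timestamp, String precision) {
        return timestamp.truncatedTo(unitFor(precision));
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2024-01-02T03:04:05.123456789Z");
        System.out.println(truncate(t, "second"));       // 2024-01-02T03:04:05Z
        System.out.println(truncate(t, "millisecond"));  // 2024-01-02T03:04:05.123Z
    }
}
```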

Set commitFrequency to the number of inserts or updates that should be performed before a commit. Default is 1000.
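
The effect of commitFrequency can be sketched as a counter that triggers a commit after every N writes and once more at the end for any remainder. This is an illustration of the general technique, not the data hub's code; CommitFrequencySketch and its methods are hypothetical names, and the final commit of the remainder is an assumption.

```java
// Hypothetical example class, not part of the data hub.
final class CommitFrequencySketch {

    private final int commitFrequency;
    private final Runnable commit;
    private int pending = 0;

    // commit is whatever performs the commit; with JDBC it would wrap
    // Connection.commit() and handle its SQLException.
    CommitFrequencySketch(int commitFrequency, Runnable commit) {
        this.commitFrequency = commitFrequency;
        this.commit = commit;
    }

    // Call once after each insert or update.
    void recordWrite() {
        pending++;
        if (pending >= commitFrequency) {
            commit.run();
            pending = 0;
        }
    }

    // Call at the end of the load to commit any remaining writes (assumed behaviour).
    void finish() {
        if (pending > 0) {
            commit.run();
            pending = 0;
        }
    }
}
```

With a frequency of 1000, a load of 2500 rows would commit twice during the load and once more when it finishes. Larger values mean fewer commits but more uncommitted work held in each transaction.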

The messageStore and messageControl properties identify the classes that access the message store and the message process and history tables, respectively.

Elasticsearch properties

You can use Elasticsearch as a data store.

elastic=com.metrici.datahub.ElasticSearchDataStore
elastic.url=http://localhost:9200
elastic.user=admin
elastic.password=admin
elastic.prefix=yourcompany
elastic.commitFrequency=1000
elastic.insecure=false

The "elastic" reference is just a data store reference, like "database".

The prefix is added to the entities to give index names, e.g. "yourcompany.entity_one". If you omit prefix, the reference is used, which would be "elastic.entity_one".
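
The naming rule can be written as a one-line function. The sketch below follows the description above; the class and method names are hypothetical, and any further normalisation the data hub applies to index names is not covered.

```java
// Hypothetical example class, not part of the data hub.
final class ElasticIndexNameSketch {

    // Uses the prefix when one is configured, otherwise the data store reference.
    static String indexName(String reference, String prefix, String entity) {
        String effectivePrefix = (prefix == null || prefix.isEmpty()) ? reference : prefix;
        return effectivePrefix + "." + entity;
    }

    public static void main(String[] args) {
        System.out.println(indexName("elastic", "yourcompany", "entity_one")); // yourcompany.entity_one
        System.out.println(indexName("elastic", null, "entity_one"));          // elastic.entity_one
    }
}
```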

The commitFrequency works in much the same way as that on a database connection, and also defaults to 1000.

Set insecure to true to allow SSL certificates that are self-signed or for the wrong host. This can be useful for testing or where the Elasticsearch server is on a secure internal network. Do not use this for accessing an Elasticsearch instance over the public Internet.

Schema properties

schema identifies a folder that holds schema definition files. This is evaluated relative to the properties file, and defaults to "schema", which means that the schemas are held in a folder called schema in the same folder as the properties file.
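
The sketch below shows one way such a relative path could be resolved with java.nio.file, assuming Unix-style paths. The names SchemaPathSketch and schemaFolder are made up for this example, and the data hub's own resolution logic may differ in detail.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical example class, not part of the data hub.
final class SchemaPathSketch {

    // Resolves the schema property against the folder containing the properties file.
    // An absolute schema value is returned unchanged (apart from normalisation).
    static Path schemaFolder(Path propertiesFile, String schemaProperty) {
        String value = (schemaProperty == null || schemaProperty.isEmpty())
            ? "schema"   // documented default
            : schemaProperty;
        Path base = propertiesFile.toAbsolutePath().getParent();
        return base.resolve(value).normalize();
    }

    public static void main(String[] args) {
        Path props = Paths.get("/etc/datahub/datahub.properties");
        System.out.println(schemaFolder(props, null));           // default: /etc/datahub/schema
        System.out.println(schemaFolder(props, "/srv/schemas")); // absolute path kept as is
    }
}
```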

Monitor properties

The monitor.* properties are used to schedule the processing of files and the automatic reprocessing of files. They are described in the web server topic.

File storage properties

If you are loading files, set fileStore to the directory in which files should be written and set fileURLPrefix to a prefix that should be added to the start of relative file paths so that they can be served from the web server. The generateFileName property specifies the strategy to use when generating file names. The default is "random", which generates secure random file names (so that URLs need not be otherwise secured). This can also be set to "sequential", which generates simple numerical file names (e.g. 1.txt, 2.txt, ...).
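
The two naming strategies and the URL prefix can be illustrated as follows. This is a sketch only: the format of the random names (32 hexadecimal characters here) and the class and method names are assumptions, not the data hub's actual behaviour.

```java
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class, not part of the data hub.
final class FileNameSketch {

    private static final SecureRandom RANDOM = new SecureRandom();
    private static final AtomicLong COUNTER = new AtomicLong();
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // "random" strategy: an unguessable name from 16 secure random bytes (assumed format).
    static String randomName(String extension) {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb + "." + extension;
    }

    // "sequential" strategy: 1.txt, 2.txt, ...
    static String sequentialName(String extension) {
        return COUNTER.incrementAndGet() + "." + extension;
    }

    // Adds fileURLPrefix to the start of a relative file path.
    static String fileUrl(String fileURLPrefix, String relativePath) {
        return fileURLPrefix + relativePath;
    }
}
```

Sequential names are easy to guess, so with that strategy the file URLs would need to be protected by other means.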

Load properties

load.cacheSize controls the maximum size of the cache used when resolving foreign keys during a load. The default of 10000 will be sufficient in most cases. In very memory-limited implementations it can be reduced. Implementations that use queries which involve resolving a large number of foreign keys may benefit from a larger size.

Query properties

query.cacheSize controls the maximum size of the cache used by the query component to cache parent records, the use of which can speed up queries by a factor of 2 or more. The default of 10000 will be sufficient in most cases. In very memory-limited implementations it can be reduced. Implementations that use queries which involve looking up a large number of different parent rows may benefit from a larger size.
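
A size-limited cache of this kind is often built on LinkedHashMap. The sketch below assumes least-recently-used eviction, which this documentation does not specify; BoundedCacheSketch is a hypothetical class, not the data hub's cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class, not part of the data hub.
final class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    // maxEntries plays the role of query.cacheSize (or load.cacheSize).
    BoundedCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // true = order entries by access, for LRU eviction
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; evicts the
    // least recently used entry once the limit is exceeded.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With such a cache, memory use grows with the configured size and the size of each cached row, which is why a smaller value suits memory-limited installations and a larger one suits queries that touch many distinct parent rows.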

Debugging properties

verbosity indicates the level of logging that is required. 0 means warnings and errors only, 1 shows progress, and 2 shows diagnostics. Most applications will want a verbosity of 1.

Security properties

When using the instance servlet, set authenticator to the class to perform user authentication (who's who) and authorizer to the class to authorize the action (who can do what). See user authentication and authorization, which describes further properties for different authenticators and authorizers.

Message store class properties

The messageStore and messageControl provide the names of the classes used to access the message store table and the message control tables (process and history), respectively.