
Tagged Data Format (TDF)

Tagged Data Format is a block-oriented time-series data logging format. The Infuse-IoT implementation is an extension of the format described in The Big Night Out (Sommer et al., 2014).

The TDF abstraction is most commonly used via the TDF Data Logger API, which automatically handles the logging of completed blocks.

Reading Format

The basic format of a TDF reading on a block is the following:

Header    Size     Timestamp      Array Header   Reading Data
-------   ------   ------------   ------------   --------------
2 bytes   1 byte   0 to 6 bytes   0 or 3 bytes   0 to 255 bytes

Due to the inclusion of a size field in the logged data, TDF readings can be either a fixed size (e.g. tdf_acc_4g), or a variable size (e.g. tdf_algorithm_class_histogram).

TDF Header

The 2 byte header contains the metadata that parsers use to consume additional bytes from the buffer.

Field                         Size
---------------------------   -------
Array type                    2 bits
Timestamp type                2 bits
Reading ID (tdf_builtin_id)   12 bits
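
As an illustration of how a parser might split the header, the sketch below decodes the three fields from two raw bytes. The bit positions (timestamp type in bits 15 to 14, array type in bits 13 to 12, reading ID in bits 11 to 0), the little-endian byte order, and every `ex_`/`EX_` name are assumptions made for this example and are not taken from the text above.

```c
#include <stdint.h>

/* Hypothetical masks: the bit positions are assumed for illustration only */
#define EX_TDF_TIMESTAMP_MASK 0xC000u /* assumed: bits 15..14 */
#define EX_TDF_ARRAY_MASK     0x3000u /* assumed: bits 13..12 */
#define EX_TDF_ID_MASK        0x0FFFu /* assumed: bits 11..0 */

struct ex_tdf_header {
   uint8_t timestamp_type; /* 0..3 */
   uint8_t array_type;     /* 0..3 */
   uint16_t id;            /* 0..4095 */
};

/* Split a 2 byte header (assumed little-endian) into its three fields */
static struct ex_tdf_header ex_tdf_header_decode(const uint8_t raw[2])
{
   uint16_t value = (uint16_t)(raw[0] | ((uint16_t)raw[1] << 8));
   struct ex_tdf_header hdr = {
      .timestamp_type = (uint8_t)((value & EX_TDF_TIMESTAMP_MASK) >> 14),
      .array_type = (uint8_t)((value & EX_TDF_ARRAY_MASK) >> 12),
      .id = (uint16_t)(value & EX_TDF_ID_MASK),
   };

   return hdr;
}
```

Under these assumptions, the bytes `{0x0A, 0x50}` decode to timestamp type 1, array type 1 and reading ID 10.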

Timestamp Types

Each TDF reading is associated with a single time point using the Epoch Time API. To increase packing efficiency, the last logged timestamp on a block is preserved while parsing, so that readings spaced closely together in time can use delta values instead of complete timestamps.

Type                              Size (bytes)   Description
-------------------------------   ------------   ---------------------------------------------------
TDF_TIMESTAMP_NONE                0              No reading timestamp
TDF_TIMESTAMP_ABSOLUTE            6              Absolute timestamp
TDF_TIMESTAMP_RELATIVE            2              [0 to 1] second increment from block timestamp
TDF_TIMESTAMP_EXTENDED_RELATIVE   3              [-128 to 127] second increment from block timestamp
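
The table above can be read as a small decoding routine. The sketch below is one possible interpretation, assuming timestamps count in units of 1/65536 of a second, deltas are stored little-endian, the 2 byte delta is unsigned and the 3 byte delta is a signed 24 bit value. The enum, its numeric values and the `ex_` function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical enum: the numeric values are assumed for illustration */
enum ex_timestamp_type {
   EX_TIMESTAMP_NONE,
   EX_TIMESTAMP_ABSOLUTE,
   EX_TIMESTAMP_RELATIVE,
   EX_TIMESTAMP_EXTENDED_RELATIVE,
};

/* Number of timestamp bytes present for each timestamp type */
static uint8_t ex_timestamp_len(enum ex_timestamp_type type)
{
   switch (type) {
   case EX_TIMESTAMP_ABSOLUTE:
      return 6;
   case EX_TIMESTAMP_RELATIVE:
      return 2;
   case EX_TIMESTAMP_EXTENDED_RELATIVE:
      return 3;
   default:
      return 0;
   }
}

/* Apply a 2 byte unsigned delta (in ticks) to the preserved block timestamp */
static uint64_t ex_apply_relative(uint64_t block_time, const uint8_t raw[2])
{
   uint16_t delta = (uint16_t)(raw[0] | ((uint16_t)raw[1] << 8));

   return block_time + delta;
}

/* Apply a 3 byte signed delta (in ticks), sign-extending from 24 bits */
static uint64_t ex_apply_extended(uint64_t block_time, const uint8_t raw[3])
{
   int32_t delta = (int32_t)(raw[0] | ((uint32_t)raw[1] << 8) |
                             ((uint32_t)raw[2] << 16));

   if (delta & 0x800000) {
      delta -= 0x1000000;
   }
   return (uint64_t)((int64_t)block_time + delta);
}
```

With 65536 ticks per second, a 16 bit unsigned delta spans just under 1 second and a 24 bit signed delta spans roughly -128 to +128 seconds, which is consistent with the ranges in the table.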

Array Types

TDF array types enable improved data packing efficiency when multiple instances of the same reading are being logged with predictable timestamps.

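Type             Description
--------------   ------------------------------------------------------------------------
TDF_ARRAY_NONE   Single TDF reading
TDF_ARRAY_TIME   Array of TDF readings evenly spaced in time
TDF_ARRAY_DIFF   Array of TDF readings evenly spaced in time and closely spaced in values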

No Array

With TDF_ARRAY_NONE, the reading is a single TDF with no additional headers.

Time Array

When a reading is of type TDF_ARRAY_TIME, an additional 3 byte header is present after the timestamp structure.

struct tdf_time_array_header {
   uint8_t num;
   uint16_t period;
} __packed;

The num field specifies how many copies of the TDF reading exist in the payload, while period specifies the time between readings. The timestamp of each reading in the array is calculated using the following formula: timestamp[N] = timestamp_base + (N * period).
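
The formula above translates directly into code. The helper below is a minimal sketch with a hypothetical name. It assumes period is expressed in the same tick units as the base timestamp, as the logging examples on this page suggest.

```c
#include <stdint.h>

/* Timestamp of reading n in a time array: timestamp_base + (n * period) */
static uint64_t ex_time_array_timestamp(uint64_t timestamp_base, uint16_t period, uint8_t n)
{
   return timestamp_base + ((uint64_t)n * period);
}
```

For example, readings taken at 100 Hz with 65536 ticks per second have a period of 655 ticks, so reading 3 is timestamped 1965 ticks after the base. Because period is a uint16_t, the largest representable spacing is 65535 ticks.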

Diff Array

TDF_ARRAY_DIFF is an extension of TDF_ARRAY_TIME that relies on struct fields being close in value to each other, and on implicit knowledge of the structure layout, to achieve additional compression. Consider the following arbitrary TDF definition:

struct tdf_example {
   /** Ambient temperature (millidegrees) */
   int32_t temperature;
   /** Atmospheric pressure (pascals) */
   uint32_t pressure;
} __packed;

When samples are taken at short intervals, the values of each field are unlikely to change by a large amount. Instead of saving each reading as a complete 8 byte struct, it could instead be stored as one base reading, and then a repeating array of differences on each field. For example:

struct tdf_example_diff_array {
   /** Base reading */
   struct tdf_example base;
   /** Difference from the previous reading */
   struct {
      int8_t diff_temperature;
      int8_t diff_pressure;
   } diffs[];
} __packed;

As long as the differences on each field fall within int8_t from the previous value, this can lead to large packing efficiencies (75% in this example). The original values can be reconstructed as follows:

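/* num_diffs: number of entries in array.diffs, taken from the array header */
struct tdf_example reading[num_diffs + 1];

reading[0] = array.base;
for (int n = 0; n < num_diffs; n++) {
   reading[n + 1].temperature = reading[n].temperature + array.diffs[n].diff_temperature;
   reading[n + 1].pressure = reading[n].pressure + array.diffs[n].diff_pressure;
}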
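
The encoding direction can be sketched in the same way. The function below is an illustration only, not the Infuse-IoT encoder: it walks an array of struct tdf_example readings and emits int8_t diffs until one no longer fits. The struct is redeclared here (without the packed attribute, since its two 32 bit fields need no padding) so that the block is self-contained, and `ex_diff`, `ex_fits_int8` and `ex_diff_encode` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct tdf_example {
   int32_t temperature;
   uint32_t pressure;
};

/* Hypothetical per-reading diff entry, mirroring the diffs[] layout above */
struct ex_diff {
   int8_t diff_temperature;
   int8_t diff_pressure;
};

/* Returns 1 if the difference can be stored in an int8_t */
static int ex_fits_int8(int64_t diff)
{
   return (diff >= INT8_MIN) && (diff <= INT8_MAX);
}

/*
 * Encode readings[1..num-1] as diffs against the previous reading.
 * Returns the number of diffs written, stopping early at the first
 * difference that does not fit in an int8_t.
 */
static size_t ex_diff_encode(const struct tdf_example *readings, size_t num,
                             struct ex_diff *diffs)
{
   size_t written = 0;

   for (size_t i = 1; i < num; i++) {
      int64_t dt = (int64_t)readings[i].temperature - readings[i - 1].temperature;
      int64_t dp = (int64_t)readings[i].pressure - (int64_t)readings[i - 1].pressure;

      if (!ex_fits_int8(dt) || !ex_fits_int8(dp)) {
         break;
      }
      diffs[written].diff_temperature = (int8_t)dt;
      diffs[written].diff_pressure = (int8_t)dp;
      written++;
   }
   return written;
}
```

This sketch simply stops at the first out-of-range difference and leaves the caller to decide how to log the remaining readings.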

To limit the complexity of encoding and reconstruction, there are 3 supported variants of diff encoding.

Enum                               Input Type           Diff Type
--------------------------------   ------------------   ---------
TDF_DATA_FORMAT_DIFF_ARRAY_16_8    uint16_t / int16_t   int8_t
TDF_DATA_FORMAT_DIFF_ARRAY_32_8    uint32_t / int32_t   int8_t
TDF_DATA_FORMAT_DIFF_ARRAY_32_16   uint32_t / int32_t   int16_t

The input data type defines how the encoder views the TDF struct, for example with TDF_DATA_FORMAT_DIFF_ARRAY_32_8 the encoder will interpret the input as uint32_t chunks. The diff type defines the maximum value difference between input chunks that can be encoded as a valid diff.

Generally, the input data type will be self-evident from the TDF type being encoded. For example, struct tdf_example from above should use either TDF_DATA_FORMAT_DIFF_ARRAY_32_8 or TDF_DATA_FORMAT_DIFF_ARRAY_32_16. The choice comes down to the expected differences between subsequent values. A larger diff type can handle larger differences without falling back to TDF_ARRAY_TIME, but consumes more space in the output buffer.

When a reading is of this type, an additional 3 byte header is present after the timestamp structure.

struct tdf_diff_array_header {
   uint8_t mode_num;
   uint16_t period;
} __packed;

In contrast to the TDF_ARRAY_TIME header, the mode_num field encodes both the number of diffs present and the mode of the diff encoding.

Note

Encoding data in the TDF_ARRAY_DIFF format requires CONFIG_TDF_DIFF.

Index Array

For high-frequency data sets, the accuracy of timestamping individual samples (logged individually or in a TDF_ARRAY_TIME) starts to degrade as the sample period approaches the timestamp resolution (~15 us, 1/65536 of a second). An example of this is raw audio samples, which might be sampled at 48 kHz (sample period ~20.8 us). Using the per-sample timestamp options previously described would result in decoded samples not having a consistent period, despite the input data being sampled at a consistent frequency. This problem only gets worse as the frequency increases further.
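
The effect can be seen with a little arithmetic. The sketch below assumes 65536 ticks per second and rounds each ideal sample time down to a whole tick. The names are hypothetical and the rounding mode is an assumption made for this example.

```c
#include <stdint.h>

#define EX_TICKS_PER_SEC 65536u /* timestamp resolution described above */

/* Timestamp (in ticks) of sample n at the given rate, rounded down to a whole tick */
static uint64_t ex_sample_ticks(uint32_t sample_rate_hz, uint64_t n)
{
   return (n * EX_TICKS_PER_SEC) / sample_rate_hz;
}

/* Tick difference between consecutive samples n and n + 1 */
static uint32_t ex_tick_delta(uint32_t sample_rate_hz, uint64_t n)
{
   return (uint32_t)(ex_sample_ticks(sample_rate_hz, n + 1) -
                     ex_sample_ticks(sample_rate_hz, n));
}
```

At 48000 Hz the first samples land on ticks 0, 1, 2, 4 and 5, so consecutive deltas are 1, 1, 2 and 1 ticks. A decoder would therefore see periods alternating between roughly 15 us and 31 us, even though the true period is a constant ~20.8 us.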

To enable these use-cases, TDF data can be stored as an “Index Array”, where instead of attempting to record timestamps for each individual sample, only the timestamp of the first sample is recorded (accurate to the 15 us resolution), and all future samples are timestamped according to the sample index. This mode does not attempt to store the actual sampling frequency. It is recommended to log an instance of TDF_IDX_ARRAY_FREQ or TDF_IDX_ARRAY_PERIOD with the same base timestamp to record this information.

When a reading is of this type, an additional 3 byte header is present after the timestamp structure.

struct tdf_idx_array_header {
   uint8_t num;
   uint16_t sample_idx;
} __packed;

The sample_idx field stores the current (rotating 16 bit) sample index of the recording, with index 0 corresponding to the first sample.
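
A decoder could rebuild per-sample timestamps from the index roughly as sketched below. This assumes headers are processed in order with fewer than 65536 samples between them, that the sampling frequency is known (for instance from a logged TDF_IDX_ARRAY_FREQ), and that there are 65536 ticks per second. The function names are hypothetical.

```c
#include <stdint.h>

#define EX_TICKS_PER_SEC 65536u

/*
 * Extend the rotating 16 bit sample index to 32 bits, given the extended
 * index of the previously seen header.
 */
static uint32_t ex_unwrap_idx(uint32_t prev_idx, uint16_t sample_idx)
{
   uint16_t advance = (uint16_t)(sample_idx - (uint16_t)prev_idx);

   return prev_idx + advance;
}

/* Timestamp (in ticks) of a sample from the first-sample timestamp and its index */
static uint64_t ex_idx_timestamp(uint64_t first_sample_time, uint32_t sample_idx,
                                 uint32_t frequency_hz)
{
   return first_sample_time + (((uint64_t)sample_idx * EX_TICKS_PER_SEC) / frequency_hz);
}
```

Because every timestamp is derived from the index instead of being accumulated from rounded deltas, the rounding error per sample stays below one tick and does not grow over the recording.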

Size

The size field allows parsers to jump forward to the next TDF on a block, even if the parser is not aware of the data type of the preceding reading. The meaning of this field changes depending on the array type of the reading.

No Array

When the array type is TDF_ARRAY_NONE, the field is simply the size of the trailing reading data.

Payload Size   size
Number TDFs    1

Time/Index Array

When the array type is TDF_ARRAY_TIME or TDF_ARRAY_IDX, the field is the size of a single reading in the array. To obtain the complete payload size it must be multiplied by the num field in the array header.

Payload Size   size * time_header.num
Number TDFs    time_header.num

Diff Array

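When the array type is TDF_ARRAY_DIFF, the field is the size of the base reading in the array. To obtain the complete payload size it must be combined with the number of diffs encoded in the diff array header and the element sizes of the diff encoding in use. Each diff entry holds one diff value per input chunk of the base reading.

Payload Size   size + num_diffs * (size / input_type_size) * diff_type_size
Number TDFs    1 + num_diffs

Note

num_diffs is the number of diffs encoded in the mode_num field of the diff array header. input_type_size is the size of an individual input chunk, i.e. 2 if the input type is 16 bit and 4 if the input type is 32 bit. diff_type_size is the size of an individual diff value, i.e. 1 if the diff value is 8 bit and 2 if the diff value is 16 bit. For struct tdf_example encoded with TDF_DATA_FORMAT_DIFF_ARRAY_32_8, each diff entry is (8 / 4) * 1 = 2 bytes.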
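
The three cases can be gathered into a single helper, sketched below with hypothetical names and enum values. For diff arrays it treats each diff entry as holding one diff value per input chunk of the base reading, so an entry occupies (size / input_type_size) * diff_type_size bytes. For struct tdf_example with 32 bit inputs and 8 bit diffs that is (8 / 4) * 1 = 2 bytes, matching the two int8_t fields of the diffs[] layout shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum: the numeric values are assumed for illustration */
enum ex_array_type {
   EX_ARRAY_NONE,
   EX_ARRAY_TIME,
   EX_ARRAY_DIFF,
   EX_ARRAY_IDX,
};

/*
 * Number of payload bytes that follow the headers.
 *   size:            value of the 1 byte size field
 *   num:             count from the array header (ignored for EX_ARRAY_NONE)
 *   input_type_size: bytes per input chunk (2 or 4), EX_ARRAY_DIFF only
 *   diff_type_size:  bytes per diff value (1 or 2), EX_ARRAY_DIFF only
 */
static size_t ex_payload_size(enum ex_array_type type, uint8_t size, uint8_t num,
                              uint8_t input_type_size, uint8_t diff_type_size)
{
   switch (type) {
   case EX_ARRAY_TIME:
   case EX_ARRAY_IDX:
      return (size_t)size * num;
   case EX_ARRAY_DIFF:
      assert(input_type_size != 0);
      /* Base reading plus num diff entries of one diff value per input chunk */
      return size + ((size_t)num * (size / input_type_size) * diff_type_size);
   default:
      return size;
   }
}
```

A parser that does not recognise a reading ID can still skip the reading by advancing this many bytes past the headers.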

Reading Data

The remainder of a reading is the trailing data, stored as a binary packed array. The currently defined ID to data structure mappings can be found at this page:

Logging Examples

When possible, the type safe logging macros should be preferred, as they validate that the type of the passed data pointer matches the type associated with the TDF ID.

Note

The type safe macros cannot be used for TDFs with a trailing variable length array, since instantiations of the type with a defined length are by definition a different type.

Single TDF

Logging a single TDF at an arbitrary point in time.

Low-Level API
static uint8_t buffer[32];
struct tdf_buffer_state state;
struct tdf_acc_4g reading = {
   .sample = {0, -1000, 2000},
};

net_buf_simple_init_with_data(&state.buf, buffer, sizeof(buffer));
tdf_buffer_state_reset(&state);

/* TDF_ADD is preferred */
TDF_ADD(&state, TDF_ACC_4G, 1, epoch_time_now(), 0, &reading);
tdf_add(&state, TDF_ACC_4G, sizeof(reading), 1, epoch_time_now(), 0, &reading);

Data Logger API
struct tdf_acc_4g reading = {
   .sample = {0, -1000, 2000},
};

/* TDF_DATA_LOGGER_LOG is preferred */
TDF_DATA_LOGGER_LOG(TDF_DATA_LOGGER_BT_ADV, TDF_ACC_4G, epoch_time_now(), &reading);
tdf_data_logger_log(TDF_DATA_LOGGER_BT_ADV, TDF_ACC_4G, sizeof(reading), epoch_time_now(), &reading);

Time Array

Logging an array of TDFs, evenly spaced in time.

Warning

It is important that the timestamp provided to TDF logging functions in array mode is the timestamp of the FIRST reading, not the LAST reading.

Low-Level API
static uint8_t buffer[256];
struct tdf_buffer_state state;
struct tdf_acc_4g readings[] = {...};
uint32_t reading_period = INFUSE_EPOCH_TIME_TICKS_PER_SEC / 100;
uint64_t base_time = epoch_time_now() - ((ARRAY_SIZE(readings) - 1) * reading_period);

net_buf_simple_init_with_data(&state.buf, buffer, sizeof(buffer));
tdf_buffer_state_reset(&state);

/* TDF_ADD is preferred */
TDF_ADD(&state, TDF_ACC_4G, ARRAY_SIZE(readings), base_time,
        reading_period, readings);
tdf_add(&state, TDF_ACC_4G, sizeof(readings[0]), ARRAY_SIZE(readings),
        base_time, reading_period, readings);

Data Logger API
struct tdf_acc_4g readings[] = {...};
uint32_t reading_period = INFUSE_EPOCH_TIME_TICKS_PER_SEC / 100;
uint64_t base_time = epoch_time_now() - ((ARRAY_SIZE(readings) - 1) * reading_period);

/* TDF_DATA_LOGGER_LOG_ARRAY is preferred */
TDF_DATA_LOGGER_LOG_ARRAY(TDF_DATA_LOGGER_BT_ADV, TDF_ACC_4G, ARRAY_SIZE(readings),
                          base_time, reading_period, readings);
tdf_data_logger_log_array(TDF_DATA_LOGGER_BT_ADV, TDF_ACC_4G, sizeof(readings[0]), ARRAY_SIZE(readings),
                          base_time, reading_period, readings);

Index Array

Logging an array of high-frequency TDFs, evenly spaced in time.

Warning

It is important that the timestamp provided to TDF logging functions in array mode is the timestamp of the FIRST reading, not the LAST reading.

Low-Level API
static uint8_t buffer[256];
struct tdf_buffer_state state;
struct tdf_acc_4g readings[] = {...};
struct tdf_idx_array_freq array_info;
/* 1 kHz sampling, matching array_info.frequency below */
uint32_t reading_period = INFUSE_EPOCH_TIME_TICKS_PER_SEC / 1000;
uint64_t base_time = epoch_time_now() - ((ARRAY_SIZE(readings) - 1) * reading_period);
uint8_t remaining = ARRAY_SIZE(readings);
uint8_t to_log, chunk_size = 16;
int idx = 0;

net_buf_simple_init_with_data(&state.buf, buffer, sizeof(buffer));
tdf_buffer_state_reset(&state);

/* Log the sampling frequency with the same base timestamp as the samples */
array_info.tdf_id = TDF_ACC_4G;
array_info.frequency = 1000;
TDF_ADD(&state, TDF_IDX_ARRAY_FREQ, 1, base_time, 0, &array_info);

while (remaining) {
     to_log = MIN(remaining, chunk_size);
     tdf_add_core(&state, TDF_ACC_4G, sizeof(readings[0]), to_log, base_time, idx,
                  readings + idx, TDF_DATA_FORMAT_IDX_ARRAY);
     /* Only the first sample gets an explicit timestamp */
     base_time = 0;
     idx += to_log;
     remaining -= to_log;
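}

Data Logger API
struct tdf_acc_4g readings[] = {...};
struct tdf_idx_array_freq array_info = {
   .tdf_id = TDF_ACC_4G,
   .frequency = 1000,
};
/* 1 kHz sampling, matching array_info.frequency above */
uint32_t reading_period = INFUSE_EPOCH_TIME_TICKS_PER_SEC / 1000;
uint64_t base_time = epoch_time_now() - ((ARRAY_SIZE(readings) - 1) * reading_period);
int idx = 0;

TDF_DATA_LOGGER_LOG(TDF_DATA_LOGGER_FLASH, TDF_IDX_ARRAY_FREQ, base_time, &array_info);
/* num_buffers: number of sample buffers to log (application defined) */
for (int i = 0; i < num_buffers; i++) {
   tdf_data_logger_log_core(TDF_DATA_LOGGER_FLASH, TDF_ACC_4G, sizeof(readings[0]), ARRAY_SIZE(readings),
                           TDF_DATA_FORMAT_IDX_ARRAY, base_time, idx, readings);
   idx += ARRAY_SIZE(readings);
   /* Only the first buffer gets an explicit timestamp */
   base_time = 0;
}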

Embedded Parsing

If required, embedded devices can parse a TDF block through the tdf_parse() API.

uint8_t tdf_block[] = { /* TDF payload exists in here */ };
struct tdf_buffer_state state;
struct tdf_parsed parsed;

/* Initialise parser */
tdf_parse_start(&state, tdf_block, sizeof(tdf_block));
/* Loop while TDFs exist on block */
while (tdf_parse(&state, &parsed) == 0) {
   /* Handle parsed TDF data */
   LOG_INF("ID: %d Timestamp: %lld Num: %d Data: %p", parsed.tdf_id, parsed.time, parsed.tdf_num, parsed.data);
}

User-defined Types

Infuse-IoT also allows custom user-defined TDFs to be integrated with the framework with CONFIG_INFUSE_DEFS_GENERATED_DOWNSTREAM. See Extending Infuse-IoT Definitions for more details.

API Reference

Tagged Data Format APIs
TDF util APIs