aboutsummaryrefslogtreecommitdiff
path: root/READMEs/README.cbor-lecp.md
blob: 109dc0c8b62b6e8fae5674c1ae3e944a5537b921 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
# RFC8949 CBOR Stream Parsing and Writing

|||
|---|---|---|
|cmake| `LWS_WITH_CBOR`, `LWS_WITH_CBOR_FLOAT`|
|Header| ./include/libwebsockets/lws-lecp.h|
|api-test| ./minimal-examples/api-tests/api-test-lecp/|
|test app| ./test-apps/test-lecp.c -> libwebsockets-test-lecp|

LECP is the RFC8949 CBOR stream parsing counterpart to LEJP for JSON.

## Features

 - Completely immune to input fragmentation, give it any size blocks of CBOR as
   they become available; 1 byte, or 100K at a time give identical parsing
   results
 - Input chunks discarded as they are parsed, whole CBOR never needed in memory
 - Nonrecursive, fixed stack usage of a few dozen bytes
 - No heap allocations at all, just requires ~500 byte context usually on
   caller stack
 - Creates callbacks to a user-provided handler as members are parsed out
 - No payload size limit, supports huge / endless strings or blobs bigger than
   system memory
 - Collates utf-8 text and blob payloads into a 250-byte chunk buffer for ease
   of access
 - Write apis don't use any heap allocations or recursion either
 - Write apis use an explicit context with its own lifecycle, and printf style
   vaargs including sized blobs, C strings, double, int, unsigned long etc
 - Completely immune to output fragmentation, supports huge strings and blobs
   into small buffers, api returns to indicates unfinished if it needs to be
   called again to continue; 1 byte or 100K output buffer give same results
 - Write apis completely fill available buffer and if unfinished, continues
   into same or different buffer when called again with same args; no
   requirement for subsequent calls to be done sequentially or even from same
   function

## Type limits

CBOR allows negative integers of up to 64 bits, these do not fit into a `uint64_t`.
LECP has a union for numbers that includes the types `uint64_t` and `int64_t`,
but it does not separately handle negative integers.  Only -2^63.. 2^64 -1 can
be handled by the C types, the oversize negative numbers wrap and should be
avoided.

## Floating point support

Floats are handled using the IEEE memory format, it means they can be parsed
from the CBOR without needing any floating point support in the build.  If
floating point is available, you can also enable `LWS_WITH_CBOR_FLOAT` and
a `float` and `double` types are available in the number item union.  Otherwise
these are handled as `ctx->item.u.u32` and `ctx->item.u.u64` union members.

Half-float (16-bit) is defined in CBOR and always handled as a `uint16_t`
number union member `ctx->item.u.hf`.

## Callback reasons

The user callback does not have to handle any callbacks, it only needs to
process the data for the ones it is interested in.

|Callback reason|CBOR structure|Associated data|
|---|---|---|
|`LECPCB_CONSTRUCTED`|Created the parse context||
|`LECPCB_DESTRUCTED`|Destroyed the parse context||
|`LECPCB_COMPLETE`|The parsing completed OK||
|`LECPCB_FAILED`|The parsing failed||
|`LECPCB_VAL_TRUE`|boolean true||
|`LECPCB_VAL_FALSE`|boolean false||
|`LECPCB_VAL_NULL`|explicit NULL||
|`LECPCB_VAL_NUM_INT`|signed integer|`ctx->item.u.i64`|
|`LECPCB_VAL_STR_START`|A UTF-8 string is starting||
|`LECPCB_VAL_STR_CHUNK`|The next string chunk|`ctx->npos` bytes in `ctx->buf`|
|`LECPCB_VAL_STR_END`|The last string chunk|`ctx->npos` bytes in `ctx->buf`|
|`LECPCB_ARRAY_START`|An array is starting||
|`LECPCB_ARRAY_END`|An array has ended||
|`LECPCB_OBJECT_START`|A CBOR map is starting||
|`LECPCB_OBJECT_END`|A CBOR map has ended||
|`LECPCB_TAG_START`|The following data has a tag index|`ctx->item.u.u64`|
|`LECPCB_TAG_END`|The end of the data referenced by the last tag||
|`LECPCB_VAL_NUM_UINT`|Unsigned integer|`ctx->item.u.u64`|
|`LECPCB_VAL_UNDEFINED`|CBOR undefined||
|`LECPCB_VAL_FLOAT16`|half-float available as host-endian `uint16_t`|`ctx->item.u.hf`|
|`LECPCB_VAL_FLOAT32`|`float` (`uint32_t` if no float support) available|`ctx->item.u.f`|
|`LECPCB_VAL_FLOAT64`|`double` (`uint64_t` if no float support) available|`ctx->item.u.d`|
|`LECPCB_VAL_SIMPLE`|CBOR simple|`ctx->item.u.u64`|
|`LECPCB_VAL_BLOB_START`|A binary blob is starting||
|`LECPCB_VAL_BLOB_CHUNK`|The next blob chunk|`ctx->npos` bytes in `ctx->buf`|
|`LECPCB_VAL_BLOB_END`|The last blob chunk|`ctx->npos` bytes in `ctx->buf`|
|`LECPCB_ARRAY_ITEM_START`|A logical item in an array is starting|
|`LCEPDB_ARRAY_ITEM_END`|A logical item in an array has completed|

## CBOR indeterminite lengths

Indeterminite lengths are supported, but are concealed in the parser as far as
possible, the CBOR lengths or its indeterminacy are not exposed in the callback
interface at all, just chunks of data that may be the start, the middle, or the
end.

## Handling CBOR UTF-8 strings and blobs

When a string or blob is parsed, an advisory callback of `LECPCB_VAL_STR_START` or
`LECPCB_VAL_BLOB_START` occurs first.  The `_STR_` callbacks indicate the
content is a CBOR UTF-8 string, `_BLOB_` indicates it is binary data.

Strings or blobs may have indeterminite length, but if so, they are composed
of logical chunks which must have known lengths.  When the `_START` callback
occurs, the logical length either of the whole string, or of the sub-chunk if
indeterminite length, can be found in `ctx->item.u.u64`.

Payload is collated into `ctx->buf[]`, the valid length is in `ctx->npos`.

For short strings or blobs where the length is known, the whole payload is
delivered in a single `LECPCB_VAL_STR_END` or `LECPCB_VAL_BLOB_END` callback.

For payloads larger than the size of `ctx->buf[]`, `LECPCB_VAL_STR_CHUNK` or
`LECPCB_VAL_BLOB_CHUNK` callbacks occur delivering each sequential bufferload.
If the CBOR indicates the total length, the last chunk is delievered in a
`LECPCB_VAL_STR_END` or `LECPCB_VAL_BLOB_END`.

If the CBOR indicates the string end after the chunk, a zero-length `..._END`
callback is provided.

## Handling CBOR tags

CBOR tags are exposed as `LECPCB_TAG_START` and `LECPCB_TAG_END` pairs, at
the `_START` callback the tag index is available in `ctx->item.u.u64`.

## CBOR maps

You can check if you are on the "key" part of a map "key:value" pair using the
helper api `lecp_parse_map_is_key(ctx)`.

## Parsing paths

LECP maintains a "parsing path" in `ctx->path` that represents the context of
the callback events.  As a convenience, at LECP context creation time, you can
pass in an array of path strings you want to match on, and have any match
checkable in the callback using `ctx->path_match`, it's 0 if no active match,
or the match index from your path array starting from 1 for the first entry.

|CBOR element|Representation in path|
|---|---|
|CBOR Array|`[]`|
|CBOR Map|`.`|
|CBOR Map entry key string|`keystring`|

## Accessing raw CBOR subtrees

Some CBOR usages like COSE require access to selected raw CBOR from the input
stream.  `lecp_parse_report_raw(ctx, on)` lets you turn on and off buffering of
raw CBOR and reporting it in the parse callback with `LECPCB_LITERAL_CBOR`
callbacks.  The callbacks mean the temp buffer `ctx->cbor[]` has `ctx->cbor_pos`
bytes of raw CBOR available in it.  Callbacks are triggered when the buffer
fills, or reporting is turned off and the buffer has something in it.

By turning the reporting on and off according to the outer CBOR parsing state,
it's possible to get exactly the raw CBOR subtree that's needed.

Capturing and reporting the raw CBOR does not change that the same CBOR is being
passed to the parser as usual as well.

## Comparison with LEJP (JSON parser)

LECP is based on the same principles as LEJP and shares most of the callbacks.
The major differences:

 - LEJP value callbacks all appear in `ctx->buf[]`, ie, floating-point is
   provided to the callback in ascii form like `"1.0"`.  CBOR provides a more
   strict typing system, and the different type values are provided either in
   `ctx->buf[]` for blobs or utf-8 text strtings, or the `item.u` union for
   converted types, with additional callback reasons specific to each type.

 - CBOR "maps" use `_OBJECT_START` and `_END` parsing callbacks around the
   key / value pairs.  LEJP has a special callback type `PAIR_NAME` for the
   key string / integer, but in LECP these are provided as generic callbacks
   dependent on type, ie, generic string callbacks or integer ones, and the
   value part is represented according to whatever comes.


# Writing CBOR

CBOR is written into a `lws_lec_pctx_t` object that has been initialized to
point to an output buffer of a specified size, using printf type formatting.

Output is paused if the buffer fills, and the write api may be called again
later with the same context object, to resume emitting to the same or different
buffer.

This allows bufferloads of encoded CBOR to be produced on demand, it's designed
to fit usage in WRITEABLE callbacks and Secure Streams tx() callbacks where the
buffer size for one packet is already fixed.

CBOR array and map lengths are deduced from the format string, as is whether to
use indeterminite length formatting or not.  For indeterminite text or binary
strings, a container of < > 

|Format|Arg(s)|Meaning|
|---|---|---|
|`123`||unsigned literal number|
|`-123`||signed literal number|
|`%u`|`unsigned int`|number|
|`%lu`|`unsigned long int`|number|
|`%llu`|`unsigned long long int`|number|
|`%d`|`signed int`|number|
|`%ld`|`signed long int`|number|
|`%lld`|`signed long long int`|number|
|`%f`|`double`|floating point number|
|`123(...)`||literal tag and scope|
|`%t(...)`|`unsigned int`|tag and scope|
|`%lt(...)`|`unsigned long int`|tag and scope|
|`%llt(...)`|`unsigned long long int`|tag and scope|
|`[...]`||Array (fixed len if `]` in same format string)|
|`{...}`||Map (fixed len if `}` in same format string)|
|`<t...>`||Container for indeterminite text string frags|
|`<b...>`||Container for indeterminite binary string frags|
|`'string'`||Literal text of known length|
|`%s`|`const char *`|NUL-terminated string|
|`%.*s`|`int`, `const char *`|length-specified string|
|`%.*b`|`int`, `const uint8_t *`|length-specified binary|
|`:`||separator between Map items (a:b)|
|`,`||separator between Map pairs or array items|

Backslash is used as an escape in `'...'` literal strings, so `'\\'` represents
a string consisting of a single backslash, and `'\''` a string consisting of a
single single-quote.

For integers, various natural C types are available, but in all cases, the
number is represented in CBOR using the smallest valid way based on its value,
the long or long-long modifiers just apply to the expected C type in the args.

For floats, the C argument is always expected to be a `double` type following
C type promotion, but again it is represented in CBOR using the smallest valid
way based on value, half-floats are used for NaN / Infinity and where possible
for values like 0.0 and -1.0.

## Examples

### Literal ints

```
	uint8_t buf[128];
	lws_lec_pctx_t cbw;

	lws_lec_init(&cbw, buf, sizeof(buf));
	lws_lec_printf(ctx, "-1");
```
|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`ctx->used`|1|
|`buf[]`|20|

### Dynamic ints

```
	uint8_t buf[128];
	lws_lec_pctx_t cbw;
	int n = -1; /* could be long */

	lws_lec_init(&cbw, buf, sizeof(buf));
	lws_lec_printf(ctx, "%d", n); /* use %ld for long */
```
|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`ctx->used`|1|
|`buf[]`|20|

### Maps, arrays and dynamic ints

```
	...
	int args[3] = { 1, 2, 3 };

	lws_lec_printf(ctx, "{'a':%d,'b':[%d,%d]}", args[0], args[1], args[2]);
```

|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`ctx->used`|9|
|`buf[]`|A2 61 61 01 61 62 82 02 03|

### String longer than the buffer

Using `%s` and the same string as an arg gives same results

```
	uint8_t buf[16];
	lws_lec_pctx_t cbw;

	lws_lec_init(&cbw, buf, sizeof(buf));
	lws_lec_printf(ctx, "'A literal string > one buf'");
	/* not required to be in same function context or same buf,
	 * but the string must remain the same */
	lws_lec_setbuf(&cbw, buf, sizeof(buf));
	lws_lec_printf(ctx, "'A literal string > one buf'");
```

First call

|||
|---|---|
|Return| `LWS_LECPCTX_RET_AGAIN`|
|`ctx->used`|16|
|`buf[]`|78 1A 41 20 6C 69 74 65 72 61 6C 20 73 74 72 69|

Second call

|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`ctx->used`|12|
|`buf[]`|6E 67 20 3E 20 6F 6E 65 20 62 75 66|

### Binary blob longer than the buffer

```
	uint8_t buf[16], blob[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 };
	lws_lec_pctx_t cbw;

	lws_lec_init(&cbw, buf, sizeof(buf));
	lws_lec_printf(ctx, "%.*b", (int)sizeof(blob), blob);
	/* not required to be in same function context or same buf,
	 * but the length and blob must remain the same */
	lws_lec_setbuf(&cbw, buf, sizeof(buf));
	lws_lec_printf(ctx, "%.*b", (int)sizeof(blob), blob);
```

First call

|||
|---|---|
|Return| `LWS_LECPCTX_RET_AGAIN`|
|`ctx->used`|16|
|`buf[]`|52 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F|

Second call

|||
|---|---|
|Return| `LWS_LECPCTX_RET_FINISHED`|
|`ctx->used`|3|
|`buf[]`|10 11 12|