examples/blockStreaming_lineByLine.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122

# LZ4 Streaming API Example : Line by Line Text Compression
by *Takayuki Matsuoka*

`blockStreaming_lineByLine.c` is LZ4 Streaming API example which implements line by line incremental (de)compression.

Please note the following restrictions :

 - Firstly, read "LZ4 Streaming API Basics".
 - This is relatively advanced application example.
 - Output file is not compatible with lz4frame and platform dependent.


## What's the point of this example ?

 - Line by line incremental (de)compression.
 - Handle huge file in small amount of memory
 - Generally better compression ratio than Block API
 - Non-uniform block size


## How the compression works

First of all, allocate "Ring Buffer" for input and LZ4 compressed data buffer for output.

```
(1)
    Ring Buffer

    +--------+
    | Line#1 |
    +---+----+
        |
        v
     {Out#1}


(2)
    Prefix Mode Dependency
          +----+
          |    |
          v    |
    +--------+-+------+
    | Line#1 | Line#2 |
    +--------+---+----+
                 |
                 v
              {Out#2}


(3)
          Prefix   Prefix
          +----+   +----+
          |    |   |    |
          v    |   v    |
    +--------+-+------+-+------+
    | Line#1 | Line#2 | Line#3 |
    +--------+--------+---+----+
                          |
                          v
                       {Out#3}


(4)
                        External Dictionary Mode
                +----+   +----+
                |    |   |    |
                v    |   v    |
    ------+--------+-+------+-+--------+
          |  ....  | Line#X | Line#X+1 |
    ------+--------+--------+-----+----+
                            ^     |
                            |     v
                            |  {Out#X+1}
                            |
                          Reset


(5)
                                    Prefix
                                    +-----+
                                    |     |
                                    v     |
    ------+--------+--------+----------+--+-------+
          |  ....  | Line#X | Line#X+1 | Line#X+2 |
    ------+--------+--------+----------+-----+----+
                            ^                |
                            |                v
                            |            {Out#X+2}
                            |
                          Reset
```

Next (see (1)), read first line to ringbuffer and compress it by `LZ4_compress_continue()`.
For the first time, LZ4 doesn't know any previous dependencies,
so it just compress the line without dependencies and generates compressed line {Out#1} to LZ4 compressed data buffer.
After that, write {Out#1} to the file and forward ringbuffer offset.

Do the same things to second line (see (2)).
But in this time, LZ4 can use dependency to Line#1 to improve compression ratio.
This dependency is called "Prefix mode".

Eventually, we'll reach end of ringbuffer at Line#X (see (4)).
This time, we should reset ringbuffer offset.
After resetting, at Line#X+1 pointer is not adjacent, but LZ4 still maintain its memory.
This is called "External Dictionary Mode".

In Line#X+2 (see (5)), finally LZ4 forget almost all memories but still remains Line#X+1.
This is the same situation as Line#2.

Continue these procedures to the end of text file.


## How the decompression works

Decompression will do reverse order.

 - Read compressed line from the file to buffer.
 - Decompress it to the ringbuffer.
 - Output decompressed plain text line to the file.
 - Forward ringbuffer offset. If offset exceeds end of the ringbuffer, reset it.

Continue these procedures to the end of the compressed file.