
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84


XZ data compression in Linux
============================

    The xz_dec module provides an XZ decoder that supports the LZMA2
    filter and CRC32 for integrity checking. The usage of the xz_dec
    module is documented in include/linux/xz.h.

Userspace tools

    XZ Utils include a zlib-like compression library and a gzip-like
    command line tool. XZ Utils can be downloaded from
    <http://tukaani.org/xz/>. The same website also has the latest
    development versions of the XZ code used in Linux, and information
    on how to use that code outside the Linux kernel.

Notes on compression options

    Since the XZ implementation in the kernel supports only streams with
    either no integrity check or CRC32, make sure that you don't use any
    other integrity check type when encoding files that are supposed to
    be decoded by the kernel. In liblzma, you need to use either
    LZMA_CHECK_NONE or LZMA_CHECK_CRC32 when encoding. In the xz command
    line tool, use --check=none or --check=crc32 (-Cnone and -Ccrc32 are
    the short option counterparts); the xz default, CRC64, is not
    supported by the kernel decoder.
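
    As an illustration (a sketch, not code from the kernel or XZ Utils),
    the check type chosen at encode time can be read back from a file.
    Per the .xz file format specification, a stream begins with six magic
    bytes (FD 37 7A 58 5A 00) followed by two Stream Flags bytes; the low
    four bits of the second flag byte hold the Check ID (0x00 = none,
    0x01 = CRC32, 0x04 = CRC64, 0x0A = SHA-256). The helper name
    xz_check_is_kernel_ok() is hypothetical, and it does not verify the
    header CRC32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* .xz Stream Header magic bytes (from the .xz file format specification). */
static const uint8_t XZ_MAGIC[6] = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };

/* Check IDs stored in the low four bits of the second Stream Flags byte. */
enum { XZ_CHECK_ID_NONE = 0x00, XZ_CHECK_ID_CRC32 = 0x01 };

/*
 * Hypothetical helper: return true if the Stream Header in buf declares an
 * integrity check type the in-kernel decoder supports (none or CRC32).
 * Only the first 8 bytes are inspected; the header CRC32 is NOT verified.
 */
bool xz_check_is_kernel_ok(const uint8_t *buf, size_t len)
{
	if (len < 8 || memcmp(buf, XZ_MAGIC, sizeof(XZ_MAGIC)) != 0)
		return false;

	/* The first flag byte and the upper nibble of the second are reserved. */
	if (buf[6] != 0x00 || (buf[7] & 0xF0) != 0)
		return false;

	uint8_t check_id = buf[7] & 0x0F;

	return check_id == XZ_CHECK_ID_NONE || check_id == XZ_CHECK_ID_CRC32;
}
```

    A file written with --check=crc32 should start with the bytes
    FD 37 7A 58 5A 00 00 01, which this helper accepts; with the xz
    default (CRC64) the eighth byte is 04 and the file is rejected.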

    Using CRC32 is strongly recommended unless there is some other layer
    that will verify the integrity of the uncompressed data anyway.
    Double checking the integrity would probably be a waste of CPU
    cycles, so in that case feel free to use LZMA_CHECK_NONE or
    --check=none. Note that the headers will always have a CRC32 which
    will be validated by the decoder; you can only change the integrity
    check type (or disable it) for the actual uncompressed data.
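
    The header CRC32 mentioned above covers the two Stream Flags bytes
    and is stored little-endian right after them, in bytes 8 to 11 of the
    12-byte Stream Header (again per the .xz specification). The sketch
    below shows the CRC32 variant that .xz uses (the same one as zlib and
    gzip) in a compact bitwise form, plus a hypothetical
    xz_stream_header_crc_ok() check; neither function is the kernel's own
    implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32: reflected polynomial 0xEDB88320, initial value and final
 * XOR 0xFFFFFFFF. A table-driven version would be faster; this favors brevity.
 */
uint32_t crc32_calc(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		/* Shift out one bit at a time, XORing the polynomial when it was 1. */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/*
 * Hypothetical helper: check the CRC32 stored little-endian in bytes 8..11
 * of a 12-byte Stream Header against the two Stream Flags bytes (6 and 7).
 */
bool xz_stream_header_crc_ok(const uint8_t hdr[12])
{
	uint32_t stored = (uint32_t)hdr[8]
			| (uint32_t)hdr[9] << 8
			| (uint32_t)hdr[10] << 16
			| (uint32_t)hdr[11] << 24;

	return crc32_calc(hdr + 6, 2) == stored;
}
```

    A quick sanity check for any CRC32 implementation of this kind is the
    standard check value: the CRC32 of the ASCII string "123456789" is
    0xCBF43926.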

    In userspace, LZMA2 is typically used with dictionary sizes of several
    megabytes. The decoder needs to have the dictionary in RAM, thus big
    dictionaries cannot be used for files that are intended to be decoded
    by the kernel. 1 MiB is probably the maximum reasonable dictionary
    size for in-kernel use. The presets in XZ Utils may not be optimal
    when creating files for the kernel, so don't hesitate to use custom
    settings. Example:

        xz --check=crc32 --lzma2=dict=128KiB,nice=273,depth=512 inputfile
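
    The dict= value is stored in the .xz file as a single LZMA2 properties
    byte, which is how a decoder learns the dictionary size a stream
    needs. The sketch below decodes that byte using the formula from the
    .xz file format specification and compares the result against the
    1 MiB guideline above. lzma2_dict_size() and
    lzma2_dict_ok_for_kernel() are hypothetical names, and the 1 MiB
    limit is this document's suggestion rather than a limit enforced by
    the decoder.

```c
#include <stdbool.h>
#include <stdint.h>

/* The "maximum reasonable" in-kernel dictionary size suggested above. */
#define KERNEL_DICT_LIMIT (UINT32_C(1) << 20) /* 1 MiB */

/*
 * Decode the LZMA2 properties byte into a dictionary size. Returns false
 * if reserved bits are set or the value is above 40.
 */
bool lzma2_dict_size(uint8_t props, uint32_t *dict_size)
{
	if (props & 0xC0)		/* bits 6-7 are reserved */
		return false;
	if (props > 40)
		return false;
	if (props == 40) {
		*dict_size = UINT32_MAX; /* 4 GiB - 1 */
		return true;
	}
	/* One-bit mantissa (2 or 3) shifted by the exponent props / 2 + 11. */
	*dict_size = (uint32_t)(2 | (props & 1)) << (props / 2 + 11);
	return true;
}

/* Hypothetical policy check: does the dictionary fit the 1 MiB guideline? */
bool lzma2_dict_ok_for_kernel(uint8_t props)
{
	uint32_t size;

	return lzma2_dict_size(props, &size) && size <= KERNEL_DICT_LIMIT;
}
```

    With the example command above, dict=128KiB is stored as the value
    10 (2 << 16 bytes), while the largest value that passes the check is
    16, exactly 1 MiB.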

    An exception to the above dictionary size limitation is when the
    decoder is used in single-call mode. Compressed kernel and initramfs
    images are examples of this situation. In this case the memory usage
    doesn't depend on the dictionary size, and it is perfectly fine to
    use a big dictionary: for maximum compression, the dictionary should
    be at least as big as the uncompressed data itself.
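
    Because only a limited set of dictionary sizes can be stored (2^n or
    2^n + 2^(n-1) bytes, from 4 KiB up to 3 GiB, plus 4 GiB - 1), one way
    to pick a dict= value for single-call use is to take the smallest
    storable size that covers the uncompressed data. The helper below is a
    hypothetical sketch of that rule, not something XZ Utils or the kernel
    provides.

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest dictionary size expressible in the LZMA2
 * properties byte (values 0..39 give 2^k or 2^k + 2^(k-1) bytes, minimum
 * 4 KiB) that is at least data_size bytes. Anything above 3 GiB falls
 * through to the 4 GiB - 1 maximum (properties value 40).
 */
uint32_t lzma2_dict_for_single_call(uint64_t data_size)
{
	for (unsigned int props = 0; props < 40; props++) {
		uint32_t size = (uint32_t)(2 | (props & 1)) << (props / 2 + 11);

		if (size >= data_size)
			return size;
	}
	return UINT32_MAX;
}
```

    For example, a 7 MiB uncompressed image gives 8 MiB, which could be
    passed to xz as --lzma2=dict=8MiB. The encoder then needs
    correspondingly more memory, but the single-call decoder does not.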

Future plans

    Add support for BCJ (Branch/Call/Jump) filters for different
    instruction sets. This could be useful both at boot and with
    compressed file systems that store executables. BCJ filters are
    small, improve the compression ratio a little, and have minimal
    effect on performance.
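
    To make the BCJ idea concrete, here is a deliberately simplified C
    sketch for x86. Relative CALL instructions (opcode E8 followed by a
    32-bit little-endian displacement) that target the same function have
    different operands at different call sites; converting each
    displacement to an absolute address makes them identical, which LZMA2
    can then exploit as repeated data. The function bcj_x86_simple() is
    hypothetical and is not the real x86 BCJ filter, which also handles
    E9 (JMP rel32) and uses heuristics to skip bytes that only look like
    CALL opcodes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified x86 BCJ illustration (hypothetical, NOT the real filter):
 * after each 0xE8 byte, convert the rel32 operand to an absolute address
 * when encoding, or back to a relative one when decoding.
 * start_pos is the offset of buf[0] within the whole uncompressed stream.
 */
void bcj_x86_simple(uint8_t *buf, size_t size, uint32_t start_pos,
		    bool encode)
{
	size_t i = 0;

	while (size >= 5 && i <= size - 5) {
		if (buf[i] != 0xE8) {
			i++;
			continue;
		}

		uint32_t v = (uint32_t)buf[i + 1]
			   | (uint32_t)buf[i + 2] << 8
			   | (uint32_t)buf[i + 3] << 16
			   | (uint32_t)buf[i + 4] << 24;
		/* Position of the next instruction, which rel32 is relative to. */
		uint32_t next = start_pos + (uint32_t)i + 5;

		v = encode ? v + next : v - next;

		buf[i + 1] = (uint8_t)v;
		buf[i + 2] = (uint8_t)(v >> 8);
		buf[i + 3] = (uint8_t)(v >> 16);
		buf[i + 4] = (uint8_t)(v >> 24);

		/* Skip the operand so its bytes are never treated as opcodes. */
		i += 5;
	}
}
```

    Decoding runs the same scan and subtracts instead of adding. Because
    the scan skips each four-byte operand and never modifies the opcode
    bytes, both passes visit exactly the same positions, so decoding
    restores the original bytes.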

    Creating a limited XZ encoder may be considered if people think it is
    useful. LZMA2 is slower to compress than e.g. Deflate or LZO even at
    the fastest settings, so it isn't clear whether an LZMA2 encoder is
    wanted in the kernel.

Conformance to the .xz file format specification

    There are a couple of corner cases where things have been simplified
    at the expense of detecting errors as early as possible. These should
    not matter in practice at all, since they don't cause security
    issues. But it is good to know this when testing the code, e.g. with
    the test files from XZ Utils.

Reporting bugs

    Report bugs to <lasse.collin@tukaani.org> or (preferably) visit
    #tukaani on Freenode and talk to Larhzu. I don't actively read
    LKML or other kernel-related mailing lists, so if there's something
    I should know, you must email me personally or use IRC.

    Don't bother Igor Pavlov with questions about the XZ implementation
    in the kernel or about XZ Utils. While these two implementations
    include essential code that is directly based on Igor Pavlov's code,
    they are neither maintained nor supported by him.