librsync  2.3.1
# File formats {#page_formats}

## Generalities

There are two file formats used by `librsync` and `rdiff`: the
*signature* file, which summarizes a data file, and the *delta* file,
which describes the edits from one data file to another.

librsync does not know or care about the internal format of the data files;
it treats them as opaque byte streams.

All integers in signature and delta files are big-endian (network byte order).

## Magic numbers

All librsync files start with a u32 \ref rs_magic_number identifying them.
These are declared in `librsync.h`, and there is a different number for each
signature and delta file type. Note that magic numbers for newer file types
are not supported by older versions of librsync, which fail immediately with
an error when they encounter a file type they don't support.

## Signatures

Signatures consist of a header followed by one block signature for each
block in the data file.

The signature header is:

    u32 magic;          // Some RS_*_SIG_MAGIC value.
    u32 block_len;      // Bytes per block.
    u32 strong_sum_len; // Bytes per strong sum in each block.

Each block signature contains a weak checksum followed by a truncated strong
hash, both computed over one block of `block_len` bytes from the input data
file. The strong hash is truncated to the `strong_sum_len` given in the header.
The final data block may be shorter than `block_len`. The number of blocks in
the signature is therefore:

    ceil(input_len / block_len)

The weak checksum is a rolling checksum, used to find data that has moved; the
strong hash is then used to confirm that a weak-checksum match is correct.
Depending on the magic number, the weak checksum is either a rollsum (based on
Adler-32) or the better alternative, rabinkarp (a Rabin-Karp rolling hash), and
the strong hash is either MD4 or BLAKE2.

Truncating the strong hash makes signatures smaller at the cost of a greater
chance of collisions. The hash is truncated by keeping its leftmost (first)
bytes after computation.

The format of each block signature is (see `rs_sig_do_block`):

    u32 weak_sum;
    u8[strong_sum_len] strong_sum;

## Delta files

A delta file consists of the delta magic constant `RS_DELTA_MAGIC` followed by
a series of commands. The commands tell the patch logic how to construct the
result file (new version) from the basis file (old version).

There are three kinds of commands: the literal command, the copy command, and
the end command. A command consists of a single command byte followed by zero
or more big-endian integer arguments, each 1, 2, 4 or 8 bytes wide. The number
and width of the arguments for each command byte are defined in `prototab.c`.

A literal command describes data not present in the basis file. It has one
argument: `length`. The format is:

    u8 command;          // in the range 0x41 through 0x44 inclusive
    u8[arg1_len] length;
    u8[length] data;     // new data to append

A copy command describes a range of data in the basis file. It has two
arguments: `start` and `length`. The format is:

    u8 command;          // in the range 0x45 through 0x54 inclusive
    u8[arg1_len] start;  // offset in the basis to begin copying data
    u8[arg2_len] length; // number of bytes to copy from the basis

The end command marks the end of the delta file. It consists of a single zero
byte (`0x00`) and has no arguments.
