F.1 LZSS compression

23.0403GPPRelease 17Technical realization of the Short Message Service (SMS)TS

F.1.1 Introduction

The LZSS compression algorithm is one of a number of compression algorithms generally referred to as “Dictionary Methods”. These algorithms rely upon the fact that (in general) an input data buffer will contain repeating “patterns” or matching sequences of bytes.

The algorithms fall into 2 groups. Systems like LZ78 and LZW scan an input buffer and construct a “dictionary” of the most commonly occurring byte sequences or “phrases”. This dictionary is pre-pended with the compressed data and the compressed data comprises an array of indices into the dictionary.

A second set is a modification of this in that the data dictionary is implicit in the uncompressed data buffer. All are based upon an algorithm developed and published in 1977 by Abraham Lempel and Jakob Ziv LZ77. A refinement of this algorithm, which is the basis for practically all the later methods in this group, is the LZSS algorithm developed in 1982 by Storer and Szymanski. These methods try to find if the character sequence currently being compressed has already occurred earlier in the input data and then, instead of repeating it, output only a pointer to the earlier occurrence. This is illustrated in the following diagram:

Figure F.1 Illustration of “Implicit Dictionary” compression methods

F.1.2 LZSS Basic Algorithm

The algorithm searches the window (a buffer moving back from the current position in the input data). It searches for the longest match with the beginning of the look-ahead buffer (a buffer moving forward from the current position in the input data) and outputs a pointer to that match. This pointer indicates a position and length of that data match. It is referred to here as a “Slice Descriptor”.

Since it is possible that not even a one-character match can be found, the output cannot contain just pointers. Accordingly at times it is necessary to write literal octets into the output buffer. A block of literal octets is preceded by a “Literal Block Identifier” which indicates the length of the literal octet sequence that follows.

F.1.3 Informative Example.

The following is provided as an informative example using the input buffer shown below.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

0x01

0x02

0x03

0x01

0x02

0x03

0x04

0x01

0x02

0x03

0x01

0x02

0x03

0x01

0x02

0x03

Figure F.2 Sample input buffer (16 octets long)

Step 1:

Starting position is byte 1 in the input buffer. For octets 1 to 3 there are no octet matches in the window for the look-ahead buffer. So write a literal octet sequence of 3 octets following a literal block header.

1

2

3

4

0x83

0x01

0x02

0x03

Figure F.3 Output buffer after initial literal block is written

Step 2:

Current position is octet 4. Examining the look-ahead buffer and the window a 3 octet match is found beginning 3 octets before (octet 1) and of 3 octets in length. A 2 octet slice descriptor is added to the output buffer. The current position moves to octet 7 of the input buffer.

1

2

3

4

5

6

0x83

0x01

0x02

0x03

0x06

0x03

Figure F.4 Output buffer after the first slice descriptor is written

Step 3:

Current position is octet 7 in the input buffer (0x04). There are no matches in the window for this value so a 2 octet literal sequence is written to the end of the output buffer. The current position moves to octet 8 of the input buffer.

1

2

3

4

5

6

7

8

0x83

0x01

0x02

0x03

0x06

0x03

0x81

0x04

Figure F.5 Second literal block is written into output buffer

Step 4:

Current position is octet 8 of the input buffer. Comparing the window with the look-ahead buffer reveals a octet match from the current position with octets 1 to 6 of the input buffer. That is a 6 octet sequence beginning 7 octets back from the current position.. A two-octet slice descriptor for this match is added to the output buffer. The current position moves to octet 14 of the input buffer (6 octets further on).

1

2

3

4

5

6

7

8

9

10

0x83

0x01

0x02

0x03

0x06

0x03

0x81

0x04

0x0C

0x07

Figure F.6 Octet match slice descriptor is written into output buffer

Step 5:

Current position is octet 14 of the input buffer. Comparing the window with the look-ahead buffer reveals another 3 octet sequence match (0x01, 0x02, 0x03). This octet sequence occurs several times in the window within the 511 octets that the slice descriptor allows. Therefore several different (but valid) slice descriptors could be written (this would be implementation dependent). However in this example we will reference the initial 3 octets of the input buffer and write a slice descriptor indicating a 3 octet match beginning 13 octets behind the current position.

1

2

3

4

5

6

7

8

9

10

11

12

0x83

0x01

0x02

0x03

0x06

0x03

0x81

0x04

0x0C

0x07

0x06

0x0D

Figure F.7 Octet match slice descriptor is written into output buffer: the final output buffer

Annex G (normative):
WVG (Wireless Vector Graphics) data format

WVG (Wireless Vector Graphics) is a compact binary data format for vector graphics. WVG data is represented by a bit stream, composed of a header, codec parameters and graphical elements. The bit representation of the drawing and contained graphical elements is designed such that the bit stream can be optimized for smallest possible size.