5 CBS Data Coding Scheme

3GPP TS 23.038, Alphabets and language-specific information, Release 18

The CBS Data Coding Scheme indicates the intended handling of the message at the MS, the character set/coding, and the language (when applicable). Any reserved codings shall be assumed to be the GSM 7 bit default alphabet (the same as codepoint 00001111) by a receiving entity. The octet is used according to a coding group which is indicated in bits 7..4. The octet is then coded as follows:

Coding Group (bits 7..4)    Use of bits 3..0

0000

Language using the GSM 7 bit default alphabet

Bits 3..0 indicate the language:

0000 German

0001 English

0010 Italian

0011 French

0100 Spanish

0101 Dutch

0110 Swedish

0111 Danish

1000 Portuguese

1001 Finnish

1010 Norwegian

1011 Greek

1100 Turkish

1101 Hungarian

1110 Polish

1111 Language unspecified

0001

0000 GSM 7 bit default alphabet; message preceded by language indication.

The first 3 characters of the message are a two-character representation of the language encoded according to ISO 639 [12], followed by a CR character. The CR character is then followed by 90 characters of text (NOTE 1).

0001 UCS2; message preceded by language indication

The message starts with a two GSM 7-bit default alphabet character representation of the language encoded according to ISO 639 [12]. This is padded to the octet boundary with two bits set to 0 and then followed by 40 characters of UCS2-encoded message (NOTE 1).

An MS not supporting UCS2 coding will present the two character language identifier followed by improperly interpreted user data.

0010..1111 Reserved

0010

0000 Czech

0001 Hebrew (NOTE 2)

0010 Arabic (NOTE 2)

0011 Russian (NOTE 2)

0100 Icelandic

0101..1111 Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS

0011

0000..1111 Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS

01xx

General Data Coding indication

Bits 5..0 indicate the following:

Bit 5, if set to 0, indicates the text is uncompressed

Bit 5, if set to 1, indicates the text is compressed using the compression algorithm defined in 3GPP TS 23.042 [13]

Bit 4, if set to 0, indicates that bits 1 to 0 are reserved and have no message class meaning

Bit 4, if set to 1, indicates that bits 1 to 0 have a message class meaning:

Bit 1 Bit 0 Message Class:

0 0 Class 0

0 1 Class 1 Default meaning: ME-specific.

1 0 Class 2 (U)SIM specific message.

1 1 Class 3 Default meaning: TE-specific (see 3GPP TS 27.005 [8])

Bits 3 and 2 indicate the character set being used, as follows:

Bit 3 Bit 2 Character set:

0 0 GSM 7 bit default alphabet

0 1 8 bit data

1 0 UCS2 (16 bit) [10]

1 1 Reserved

1000

Reserved coding groups

1001

Message with User Data Header (UDH) structure:

Bit 1 Bit 0 Message Class:

0 0 Class 0

0 1 Class 1 Default meaning: ME-specific.

1 0 Class 2 (U)SIM specific message.

1 1 Class 3 Default meaning: TE-specific (see 3GPP TS 27.005 [8])

Bits 3 and 2 indicate the alphabet being used, as follows:

Bit 3 Bit 2 Alphabet:

0 0 GSM 7 bit default alphabet

0 1 8 bit data

1 0 UCS2 (16 bit) [10]

1 1 Reserved

1010..1100

Reserved coding groups

1101

I1 protocol message defined in 3GPP TS 24.294 [19]

1110

Defined by the WAP Forum [15]

1111

Data coding / message handling

Bit 3 is reserved, set to 0.

Bit 2 Message coding:

0 GSM 7 bit default alphabet

1 8 bit data

Bit 1 Bit 0 Message Class:

0 0 No message class.

0 1 Class 1 user defined.

1 0 Class 2 user defined.

1 1 Class 3 Default meaning: TE-specific (see 3GPP TS 27.005 [8])

NOTE 1: The language indication shall appear at the start of each Message Information Page (see 3GPP TS 23.041 [5]) and the language indication on each Message Information Page shall be for the same language.

NOTE 2: Message text in Hebrew, Arabic and Russian cannot be encoded in the GSM 7-bit default alphabet. For these languages UCS2 encoding shall be used.
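The coding-group table above can be read as a decision tree over the DCS octet. The following Python sketch illustrates one way to decode it; the `CbsDcs` type, the `decode_cbs_dcs` function and the charset labels (`"gsm7"`, `"8bit"`, `"ucs2"`) are hypothetical names chosen for this example, and reserved codings are mapped to the GSM 7 bit default alphabet as stated at the start of this clause.

```python
from dataclasses import dataclass
from typing import Optional

# Language tables transcribed from coding groups 0000 and 0010 above.
GROUP_0000_LANGUAGES = [
    "German", "English", "Italian", "French", "Spanish", "Dutch",
    "Swedish", "Danish", "Portuguese", "Finnish", "Norwegian", "Greek",
    "Turkish", "Hungarian", "Polish", None,  # 1111 = language unspecified
]
GROUP_0010_LANGUAGES = ["Czech", "Hebrew", "Arabic", "Russian", "Icelandic"]
# Indexed by bits 3..2; the reserved value 11 falls back to the default alphabet.
CHARSETS = ["gsm7", "8bit", "ucs2", "gsm7"]


@dataclass
class CbsDcs:  # hypothetical result type, not defined by the specification
    charset: str = "gsm7"
    language: Optional[str] = None
    language_prefix: bool = False   # message text starts with a language indication
    compressed: bool = False
    msg_class: Optional[int] = None
    has_udh: bool = False
    note: str = ""


def decode_cbs_dcs(dcs: int) -> CbsDcs:
    """Decode one CBS Data Coding Scheme octet according to the table above."""
    group, low = (dcs >> 4) & 0x0F, dcs & 0x0F
    charset = CHARSETS[(dcs >> 2) & 0x03]
    if group == 0b0000:
        return CbsDcs(language=GROUP_0000_LANGUAGES[low])
    if group == 0b0001 and low == 0b0000:
        return CbsDcs(language_prefix=True)
    if group == 0b0001 and low == 0b0001:
        return CbsDcs(charset="ucs2", language_prefix=True)
    if group == 0b0010 and low < len(GROUP_0010_LANGUAGES):
        # See NOTE 2 regarding Hebrew, Arabic and Russian message text.
        return CbsDcs(language=GROUP_0010_LANGUAGES[low])
    if (group & 0b1100) == 0b0100:  # 01xx: General Data Coding indication
        has_class = bool(dcs & 0x10)
        return CbsDcs(charset=charset, compressed=bool(dcs & 0x20),
                      msg_class=(dcs & 0x03) if has_class else None)
    if group == 0b1001:  # message with User Data Header structure
        return CbsDcs(charset=charset, msg_class=dcs & 0x03, has_udh=True)
    if group == 0b1101:
        return CbsDcs(note="I1 protocol message")  # charset field not meaningful here
    if group == 0b1110:
        return CbsDcs(note="defined by the WAP Forum")  # charset field not meaningful here
    if group == 0b1111:  # data coding / message handling; class 00 = no message class
        return CbsDcs(charset="8bit" if dcs & 0x04 else "gsm7",
                      msg_class=(dcs & 0x03) or None)
    # Everything else is reserved and assumed to be the GSM 7 bit default alphabet.
    return CbsDcs(note="reserved")
```

For example, `decode_cbs_dcs(0x11)` reports UCS2 with a language indication at the start of the message, and `decode_cbs_dcs(0x48)` reports uncompressed UCS2 with no message class.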

These codings may also be used for USSD and MMI/display purposes.

The message lengths specified in this subclause apply only to GSM; they do not apply to UTRAN, E-UTRAN, or NG-RAN.

See 3GPP TS 24.090 [11] for specific coding values applicable to USSD for MS originated USSD messages and MS terminated USSD messages. USSD messages using the default alphabet are coded with the GSM 7-bit default alphabet given in clause 6.2.1. The message can then consist of up to 182 user characters.

Cell Broadcast messages using the default alphabet are coded with the GSM 7-bit default alphabet given in clause 6.2.1. The Message Information Page then consists of 93 user characters.
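The character counts quoted above follow from packing 7-bit characters into octets. The short Python sketch below reproduces the arithmetic. The 82-octet page size is stated later in this clause; the 160-octet USSD string size is an assumption made here for illustration and is not stated in this clause, and the function name is hypothetical.

```python
CB_PAGE_OCTETS = 82        # size of one Message Information Page payload
USSD_STRING_OCTETS = 160   # assumption for illustration, not stated in this clause


def max_septets(octets: int) -> int:
    """Number of whole 7-bit characters that fit into the given number of octets."""
    return (octets * 8) // 7


cb_chars = max_septets(CB_PAGE_OCTETS)        # 93 user characters per page
ussd_chars = max_septets(USSD_STRING_OCTETS)  # 182 user characters
# With DCS 0001 0000, two language characters plus CR occupy the first 3 positions.
cb_chars_after_language = cb_chars - 3        # 90 characters of text
```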

If the GSM 7 bit default alphabet extension mechanism is used, the number of displayable characters is reduced by one for every instance where the GSM 7 bit default alphabet extension table is used.

Cell Broadcast Message Information Pages using 8-bit data have user-defined coding, and are each 82 octets in length.
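The effect of the extension mechanism on capacity can be sketched as a septet count: each character taken from the extension table costs two 7-bit positions (escape plus character) instead of one. The Python example below is illustrative only. The set of extension characters is an assumption based on the extension table referenced from clause 6.2.1, the function names are hypothetical, and the code does not check that the remaining characters exist in the default alphabet.

```python
# Characters assumed to come from the GSM 7 bit default alphabet extension table
# (form feed, ^ { } \ [ ~ ] | and the euro sign).
GSM7_EXTENSION_CHARS = set("\f^{}\\[~]|\u20ac")


def septets_needed(text: str) -> int:
    """Count 7-bit positions used, charging two for each extension character."""
    return sum(2 if ch in GSM7_EXTENSION_CHARS else 1 for ch in text)


def fits_in_page(text: str, page_septets: int = 93) -> bool:
    """True if the text fits into one page of the given septet capacity."""
    return septets_needed(text) <= page_septets
```

A page holding 93 septets can therefore display 93 ordinary characters, but only 92 if one of them is, say, a curly bracket.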

UCS2 character set indicates that the message is coded in UCS2 [10]. The General notes specified in clause 6.1.1 override any contrary specification in UCS2, so for example even in UCS2 a <CR> character will cause the MS to return to the beginning of the current line and overwrite any existing text with the characters which follow the <CR>. Cell Broadcast Message Information Pages encoded in UCS2 consist of 41 characters each.
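As an illustration of the UCS2 page layout, the Python sketch below splits a page into its optional language indication and its text. It rests on several assumptions that are not spelled out in this clause: the two language characters are packed as 7-bit characters with the least significant bits first, the ISO 639 letters have the same code values in the GSM 7 bit default alphabet as in ASCII, and UCS2 is decoded as big-endian UTF-16. The function names are hypothetical.

```python
from typing import Optional, Tuple

CB_PAGE_OCTETS = 82


def ucs2_capacity(has_language_prefix: bool) -> int:
    """UCS2 characters per page: 41, or 40 when 2 octets carry the language."""
    prefix_octets = 2 if has_language_prefix else 0
    return (CB_PAGE_OCTETS - prefix_octets) // 2


def decode_ucs2_page(page: bytes, has_language_prefix: bool) -> Tuple[Optional[str], str]:
    """Return (language, text) for a UCS2-coded page."""
    language = None
    body = page
    if has_language_prefix:
        b0, b1 = page[0], page[1]
        first = b0 & 0x7F                         # septet 1: low 7 bits of octet 0
        second = ((b0 >> 7) | (b1 << 1)) & 0x7F   # septet 2: spans octets 0 and 1
        language = chr(first) + chr(second)       # top two bits of octet 1 are padding
        body = page[2:]
    return language, body.decode("utf-16-be")
```

With this layout, the octets `F2 3A` at the start of a page decode to the language identifier `ru`.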

When a CBS message received by the MS is message class 0 and the MS has the capability of displaying CBS messages, the MS shall display the message immediately. The message shall not be automatically stored in the (U)SIM or ME.

The ME may make provision through MMI for the user to selectively prevent the message from being displayed immediately.

If the ME is incapable of displaying CBS messages or if the immediate display of the message has been disabled through MMI then the ME shall treat the CBS message as though there was no message class, i.e. it will ignore bits 0 and 1 in the TP-DCS but may store the message either on the ME or on the (U)SIM.

Class 1 and Class 2 messages may be routed by the ME to user-defined destinations, but the user may override any default meaning and select their own routing.

Class 3 messages will normally be selected for transfer to a TE, in cases where a ME supports an SMS/CBS interface to a TE, and the TE requests "TE-specific" cell broadcast messages (see 3GPP TS 27.005 [8]). The user may be able to override the default meaning and select their own routing.
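The message class handling described above can be summarised as a small decision function. The Python sketch below is one possible reading, not a normative procedure: the function name, its parameters and the returned action labels are all hypothetical, and user overrides of the default routing are not modelled.

```python
from typing import Optional


def handle_cbs_message(msg_class: Optional[int],
                       can_display: bool = True,
                       immediate_display_disabled: bool = False,
                       te_requested: bool = False) -> str:
    """Pick a default action for a received CBS message based on its class."""
    if msg_class == 0:
        if can_display and not immediate_display_disabled:
            # Class 0: display immediately, do not store automatically.
            return "display_immediately"
        # Otherwise treat the message as though there was no message class.
        msg_class = None
    if msg_class == 3 and te_requested:
        # Class 3: transfer to the TE when it asked for TE-specific messages.
        return "transfer_to_te"
    if msg_class in (1, 2):
        # Class 1 and Class 2: routed to user-defined destinations.
        return "route_user_defined"
    return "default_handling"
```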

Messages with a User Data Header Structure are encoded as described in 3GPP TS 23.040 [4] for SMS, in subclauses 3.10 and 9.2.3.24.

The use of Cell Broadcast DCS values for messages with a User Data Header structure implies that the 82-byte CB payload has a User Data Header structure.

The CBS message information field will contain the IEs as described in 3GPP TS 23.040 [4]. The concatenation IEs will not be used, as CB concatenation will rely in that case on the existing CB mechanism. Note that IEs that cannot be split and IEs that are too large to fit in one CB segment cannot be transmitted using this mechanism. Also, some IEs as defined for SMS are not applicable for CB:

VALUE (hex)  MEANING

00  Concatenated short messages, 8-bit reference number

01  Special SMS Message Indication

06  SMSC Control Parameters

08  Concatenated short message, 16-bit reference number

20  RFC 822 E-Mail Header

23  Enhanced Voice Mail Information

70-7F  (U)SIM Toolkit Security Headers

80-89  SME to SME specific use
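A receiver or test tool could use the table above to flag information elements that are not applicable for CB. The Python sketch below walks a User Data Header and reports such IEs. It assumes the header layout of 3GPP TS 23.040 [4] (a length octet, followed by IEI, IE data length and IE data for each element); the identifier values and ranges are transcribed from the table above as listed, and the function and constant names are hypothetical.

```python
from typing import List, Tuple

# IEI values listed in the table above as not applicable for CB.
NOT_APPLICABLE_FOR_CB = (
    {0x00, 0x01, 0x06, 0x08, 0x20, 0x23}
    | set(range(0x70, 0x80))   # 70-7F
    | set(range(0x80, 0x8A))   # 80-89, as listed in the table
)


def parse_udh(payload: bytes) -> List[Tuple[int, bytes]]:
    """Split the User Data Header at the start of the payload into (IEI, data) pairs."""
    udhl = payload[0]                 # header length, not counting this octet
    header = payload[1:1 + udhl]
    elements = []
    i = 0
    while i + 2 <= len(header):
        iei, iedl = header[i], header[i + 1]
        elements.append((iei, header[i + 2:i + 2 + iedl]))
        i += 2 + iedl
    return elements


def inapplicable_ies(payload: bytes) -> List[int]:
    """Return the IEIs in the payload's header that are not applicable for CB."""
    return [iei for iei, _ in parse_udh(payload) if iei in NOT_APPLICABLE_FOR_CB]
```

For instance, a header carrying an 8-bit concatenation IE (IEI 00) followed by another element would have only the value 00 reported.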