5 CBS Data Coding Scheme
3GPP TS 23.038 Alphabets and language-specific information (Release 18)
The CBS Data Coding Scheme indicates the intended handling of the message at the MS, the character set/coding, and the language (when applicable). A receiving entity shall treat any reserved coding as the GSM 7 bit default alphabet (i.e. the same as codepoint 00001111). The octet is interpreted according to the coding group indicated in bits 7..4, and is coded as follows:
Coding group (bits 7..4) | Use of bits 3..0
---|---
0000 | Language using the GSM 7 bit default alphabet. Bits 3..0 indicate the language: 0000 German; 0001 English; 0010 Italian; 0011 French; 0100 Spanish; 0101 Dutch; 0110 Swedish; 0111 Danish; 1000 Portuguese; 1001 Finnish; 1010 Norwegian; 1011 Greek; 1100 Turkish; 1101 Hungarian; 1110 Polish; 1111 Language unspecified.
0001 | 0000: GSM 7 bit default alphabet; message preceded by language indication. The first 3 characters of the message are a two-character representation of the language encoded according to ISO 639 [12], followed by a CR character. The CR character is then followed by 90 characters of text (NOTE 1). 0001: UCS2; message preceded by language indication. The message starts with a two-character representation of the language, encoded in the GSM 7 bit default alphabet according to ISO 639 [12]. This is padded to the octet boundary with two bits set to 0 and then followed by 40 characters of UCS2-encoded message (NOTE 1). An MS not supporting UCS2 coding will present the two-character language identifier followed by improperly interpreted user data. 0010..1111: Reserved.
0010 | 0000 Czech; 0001 Hebrew (NOTE 2); 0010 Arabic (NOTE 2); 0011 Russian (NOTE 2); 0100 Icelandic; 0101..1111 Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS.
0011 | 0000..1111: Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS.
01xx | General data coding indication. Bits 5..0 indicate the following. Bit 5: 0 = text is uncompressed; 1 = text is compressed using the compression algorithm defined in 3GPP TS 23.042 [13]. Bit 4: 0 = bits 1..0 are reserved and have no message class meaning; 1 = bits 1..0 have a message class meaning. Bits 1..0 (message class): 00 Class 0; 01 Class 1, default meaning ME-specific; 10 Class 2, (U)SIM-specific message; 11 Class 3, default meaning TE-specific (see 3GPP TS 27.005 [8]). Bits 3..2 (character set): 00 GSM 7 bit default alphabet; 01 8 bit data; 10 UCS2 (16 bit) [10]; 11 Reserved.
1000 | Reserved coding group.
1001 | Message with User Data Header (UDH) structure. Bits 1..0 (message class): 00 Class 0; 01 Class 1, default meaning ME-specific; 10 Class 2, (U)SIM-specific message; 11 Class 3, default meaning TE-specific (see 3GPP TS 27.005 [8]). Bits 3..2 (alphabet): 00 GSM 7 bit default alphabet; 01 8 bit data; 10 UCS2 (16 bit) [10]; 11 Reserved.
1010..1100 | Reserved coding groups.
1101 | I1 protocol message defined in 3GPP TS 24.294 [19].
1110 | Defined by the WAP Forum [15].
1111 | Data coding / message handling. Bit 3 is reserved, set to 0. Bit 2 (message coding): 0 GSM 7 bit default alphabet; 1 8 bit data. Bits 1..0 (message class): 00 No message class; 01 Class 1, user defined; 10 Class 2, user defined; 11 Class 3, default meaning TE-specific (see 3GPP TS 27.005 [8]).

NOTE 1: The language indication shall appear at the start of each Message Information Page (see 3GPP TS 23.041 [5]), and the language indication on each Message Information Page shall be for the same language.
NOTE 2: Message text in Hebrew, Arabic and Russian cannot be encoded in the GSM 7 bit default alphabet. For these languages, UCS2 encoding shall be used.
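To illustrate how the coding groups partition the octet, the following Python sketch decodes a DCS value into its character set, message class and language. It is a reading aid under stated assumptions, not a reference decoder: the `CbsDcs` type, its field names and the charset labels ("GSM7", "8BIT", "UCS2", "I1", "WAP") are hypothetical. Reserved codings fall back to the GSM 7 bit default alphabet as required above. For the NOTE 2 languages the sketch leaves the character set unresolved (`None`), since coding group 0010 does not itself signal UCS2.

```python
from dataclasses import dataclass
from typing import Optional

# Bits 3..0 of coding group 0000 (GSM 7 bit default alphabet); 1111 = unspecified.
GROUP_0000_LANGUAGES = {
    0x0: "German", 0x1: "English", 0x2: "Italian", 0x3: "French",
    0x4: "Spanish", 0x5: "Dutch", 0x6: "Swedish", 0x7: "Danish",
    0x8: "Portuguese", 0x9: "Finnish", 0xA: "Norwegian", 0xB: "Greek",
    0xC: "Turkish", 0xD: "Hungarian", 0xE: "Polish", 0xF: None,
}
# Bits 3..0 of coding group 0010; 0101..1111 are reserved.
GROUP_0010_LANGUAGES = {0x0: "Czech", 0x1: "Hebrew", 0x2: "Arabic",
                        0x3: "Russian", 0x4: "Icelandic"}
NOTE_2_LANGUAGES = {"Hebrew", "Arabic", "Russian"}
# Bits 3..2 of groups 01xx and 1001; 11 is reserved.
CHARSETS = {0b00: "GSM7", 0b01: "8BIT", 0b10: "UCS2"}


@dataclass(frozen=True)
class CbsDcs:  # hypothetical result type
    charset: Optional[str]               # "GSM7", "8BIT", "UCS2", "I1", "WAP" or None
    message_class: Optional[int] = None  # None = no message class
    compressed: bool = False             # group 01xx, bit 5
    language: Optional[str] = None
    language_indication: bool = False    # group 0001: ISO 639 code precedes the text
    has_udh: bool = False                # group 1001


RESERVED = CbsDcs("GSM7")  # treated the same as codepoint 00001111


def decode_cbs_dcs(dcs: int) -> CbsDcs:
    if not 0 <= dcs <= 0xFF:
        raise ValueError("the DCS is a single octet")
    group, low = dcs >> 4, dcs & 0x0F
    if group == 0b0000:
        return CbsDcs("GSM7", language=GROUP_0000_LANGUAGES[low])
    if group == 0b0001 and low in (0b0000, 0b0001):
        return CbsDcs("GSM7" if low == 0b0000 else "UCS2", language_indication=True)
    if group == 0b0010 and low in GROUP_0010_LANGUAGES:
        lang = GROUP_0010_LANGUAGES[low]
        # NOTE 2 languages need UCS2, which this group does not signal: left as None.
        return CbsDcs(None if lang in NOTE_2_LANGUAGES else "GSM7", language=lang)
    if group >> 2 == 0b01 or group == 0b1001:
        charset = CHARSETS.get((dcs >> 2) & 0b11)
        if charset is None:
            return RESERVED                  # bits 3..2 = 11 are reserved
        if group == 0b1001:
            return CbsDcs(charset, message_class=dcs & 0b11, has_udh=True)
        return CbsDcs(charset,
                      message_class=(dcs & 0b11) if dcs & 0x10 else None,  # bit 4
                      compressed=bool(dcs & 0x20))                          # bit 5
    if group == 0b1101:
        return CbsDcs("I1")    # I1 protocol message (3GPP TS 24.294)
    if group == 0b1110:
        return CbsDcs("WAP")   # defined by the WAP Forum
    if group == 0b1111:
        msg_class = dcs & 0b11
        return CbsDcs("8BIT" if dcs & 0b100 else "GSM7",
                      message_class=msg_class if msg_class else None)  # 00 = no class
    return RESERVED  # groups 0011, 1000, 1010..1100 and reserved values in other groups
```

For example, `decode_cbs_dcs(0x11)` reports UCS2 with a preceding language indication, and `decode_cbs_dcs(0xF5)` reports 8 bit data of class 1.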
These codings may also be used for USSD and MMI/display purposes.
The message lengths specified in this subclause apply only to GSM; they are not applicable to UTRAN, E-UTRAN or NG-RAN.
See 3GPP TS 24.090 [11] for specific coding values applicable to USSD for MS originated USSD messages and MS terminated USSD messages. USSD messages using the default alphabet are coded with the GSM 7-bit default alphabet given in clause 6.2.1. The message can then consist of up to 182 user characters.
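The figure of 182 user characters follows from packing 7-bit characters (septets) into octets. A minimal Python sketch, assuming a maximum USSD string length of 160 octets (a value not stated in this subclause) and the usual least-significant-bit-first packing; `max_septets` and `pack_septets` are illustrative names, and the CR padding rules for a final free septet are not modelled.

```python
from __future__ import annotations

USSD_MAX_OCTETS = 160  # assumption: maximum USSD string length in octets


def max_septets(octets: int) -> int:
    """Number of whole 7-bit characters that fit in the given number of octets."""
    return octets * 8 // 7


def pack_septets(septets: list[int]) -> bytes:
    """Pack 7-bit values into octets, least significant bit first."""
    out = bytearray()
    acc, nbits = 0, 0  # bit accumulator and the number of bits it holds
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError(f"not a 7-bit value: {s:#x}")
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:  # emit every complete octet
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:              # final partial octet; unused high bits are zero
        out.append(acc)
    return bytes(out)


print(max_septets(USSD_MAX_OCTETS))     # 182
print(len(pack_septets([0x41] * 182)))  # 160
# Lower-case letters have the same code points in the GSM 7 bit default alphabet as in ASCII.
print(pack_septets([ord(c) for c in "hellohello"]).hex())  # e8329bfd4697d9ec37
```

182 septets occupy 1274 bits, leaving 6 spare bits in the last of the 160 octets; a 183rd character would need a 161st octet.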
Cell Broadcast messages using the default alphabet are coded with the GSM 7-bit default alphabet given in clause 6.2.1. The Message Information Page then consists of 93 user characters.
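The 93-character figure is 82 x 8 / 7 = 93.7, rounded down. The Python sketch below builds the septet sequence for one GSM page in coding group 0001 with bits 3..0 = 0000, where the two-character ISO 639 code and a <CR> take the first 3 of the 93 positions, leaving 90 for text. It assumes an 82-octet page and, as a simplification, accepts only ASCII letters, digits and space, whose default-alphabet code points coincide with ASCII; padding of a partially filled page is not covered. The function names are hypothetical.

```python
from __future__ import annotations

CBS_PAGE_OCTETS = 82                         # GSM CB Message Information Page
CBS_PAGE_SEPTETS = CBS_PAGE_OCTETS * 8 // 7  # 93 septets
CR = 0x0D                                    # <CR> in the GSM 7 bit default alphabet


def to_septets(text: str) -> list[int]:
    """Simplified GSM 7-bit mapping: ASCII letters, digits and space only."""
    septets = []
    for ch in text:
        if not (ch.isascii() and (ch.isalnum() or ch == " ")):
            raise ValueError(f"character outside this sketch's subset: {ch!r}")
        septets.append(ord(ch))
    return septets


def page_with_language(lang: str, text: str) -> list[int]:
    """Group 0001, bits 3..0 = 0000: ISO 639 code, <CR>, then at most 90 characters."""
    if len(lang) != 2:
        raise ValueError("expected a two-character ISO 639 language code")
    body = to_septets(text)
    if len(body) > CBS_PAGE_SEPTETS - 3:
        raise ValueError("at most 90 characters follow the language indication")
    return to_septets(lang) + [CR] + body


page = page_with_language("en", "Test message")
print(CBS_PAGE_SEPTETS, page[:3])  # 93 [101, 110, 13]
```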
If the GSM 7 bit default alphabet extension mechanism is used, the number of displayable characters is reduced by one for every instance where the GSM 7 bit default alphabet extension table is used.
Cell Broadcast Message Information Pages using 8-bit data have user-defined coding and are each 82 octets in length.
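The reduction by one per extension character arises because each such character is sent as the escape septet 0x1B followed by its code in the extension table, i.e. two septets. A minimal Python sketch of that accounting for a 93-septet GSM page; `EXTENSION_CHARS` is a subset of the extension table (form feed and the printable characters ^ { } \ [ ~ ] | €), and the sketch does not check that the remaining characters exist in the default alphabet.

```python
# Subset of the GSM 7 bit default alphabet extension table; each entry costs two
# septets on the air interface (escape 0x1B followed by the character's code).
EXTENSION_CHARS = frozenset("\f^{}\\[~]|€")
CBS_PAGE_SEPTETS = 93


def septet_cost(text: str) -> int:
    """Septets needed for text assumed to be representable in the default alphabet."""
    return sum(2 if ch in EXTENSION_CHARS else 1 for ch in text)


def fits_on_page(text: str) -> bool:
    return septet_cost(text) <= CBS_PAGE_SEPTETS


# With k extension characters, at most 93 - k characters fit on one page.
print(septet_cost("Price 10€"))  # 10
print(fits_on_page("{" * 46))    # True  (92 septets)
print(fits_on_page("{" * 47))    # False (94 septets)
```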
The UCS2 character set indicates that the message is coded in UCS2 [10]. The general notes specified in clause 6.1.1 override any contrary specification in UCS2. For example, even in UCS2 a <CR> character will cause the MS to return to the beginning of the current line and overwrite any existing text with the characters that follow the <CR>. Cell Broadcast Message Information Pages encoded in UCS2 consist of 41 characters each.
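The 41-character figure is 82 octets / 2 octets per UCS2 character. When coding group 0001 with bits 3..0 = 0001 is used, the two GSM 7-bit language characters occupy 14 bits, two zero bits complete the second octet, and 40 UCS2 characters fill the remaining 80 octets. A minimal Python sketch of that layout, assuming big-endian (most significant octet first) UCS2 and a basic Latin ISO 639 code; the function names are hypothetical.

```python
CBS_PAGE_OCTETS = 82
UCS2_CHARS_PER_PAGE = CBS_PAGE_OCTETS // 2  # 41


def ucs2_encode(text: str) -> bytes:
    """UCS2 covers only the Basic Multilingual Plane; big-endian order is assumed."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the BMP cannot be coded in UCS2")
    return text.encode("utf-16-be")


def ucs2_page_with_language(lang: str, text: str) -> bytes:
    """Group 0001, bits 3..0 = 0001: two GSM 7-bit language characters in 14 bits,
    two 0 bits up to the octet boundary, then at most 40 UCS2 characters."""
    if len(lang) != 2 or not (lang.isascii() and lang.isalpha()):
        raise ValueError("expected a two-letter ISO 639 code")
    if len(text) > UCS2_CHARS_PER_PAGE - 1:
        raise ValueError("at most 40 UCS2 characters follow the language indication")
    s1, s2 = ord(lang[0]), ord(lang[1])  # letters share code points with ASCII
    header = bytes([
        s1 | ((s2 & 0x01) << 7),  # septet 1 plus bit 0 of septet 2
        s2 >> 1,                  # bits 6..1 of septet 2; bits 7..6 are the 0 padding
    ])
    return header + ucs2_encode(text)


page = ucs2_page_with_language("ru", "Привет")
print(UCS2_CHARS_PER_PAGE, len(page), page[:2].hex())  # 41 14 f23a
```

Russian is one of the NOTE 2 languages that require UCS2, which makes it a natural test case for this layout.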
When the MS receives a CBS message of message class 0 and the MS is capable of displaying CBS messages, the MS shall display the message immediately. The message shall not be automatically stored in the (U)SIM or ME.
The ME may make provision through MMI for the user to selectively prevent the message from being displayed immediately.
If the ME is incapable of displaying CBS messages or if the immediate display of the message has been disabled through MMI then the ME shall treat the CBS message as though there was no message class, i.e. it will ignore bits 0 and 1 in the TP-DCS but may store the message either on the ME or on the (U)SIM.
Class 1 and Class 2 messages may be routed by the ME to user-defined destinations, but the user may override any default meaning and select their own routing.
Class 3 messages will normally be selected for transfer to a TE in cases where an ME supports an SMS/CBS interface to a TE and the TE requests "TE-specific" cell broadcast messages (see 3GPP TS 27.005 [8]). The user may be able to override the default meaning and select their own routing.
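The class-dependent defaults in the paragraphs above can be summarised as a small decision function. This Python sketch is hypothetical (the `Handling` values and parameter names are not defined by any specification) and does not model user overrides of the default routing.

```python
from enum import Enum
from typing import Optional


class Handling(Enum):
    DISPLAY_IMMEDIATELY = "display at once; do not store automatically"
    NO_CLASS = "handle as a message without class; may store on the ME or the (U)SIM"
    DEFAULT_ROUTING = "route per the class's default meaning unless the user overrides it"
    TRANSFER_TO_TE = "transfer to the TE"


def cbs_class_handling(msg_class: Optional[int], can_display: bool,
                       immediate_display_enabled: bool = True,
                       te_requested_te_specific: bool = False) -> Handling:
    """te_requested_te_specific: the ME supports an SMS/CBS interface to a TE and
    the TE has requested "TE-specific" cell broadcast messages."""
    if msg_class == 0:
        if can_display and immediate_display_enabled:
            return Handling.DISPLAY_IMMEDIATELY
        return Handling.NO_CLASS  # bits 1..0 of the DCS are ignored
    if msg_class is None:
        return Handling.NO_CLASS
    if msg_class == 3 and te_requested_te_specific:
        return Handling.TRANSFER_TO_TE
    if msg_class in (1, 2, 3):
        return Handling.DEFAULT_ROUTING
    raise ValueError(f"invalid message class: {msg_class}")
```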
Messages with a User Data Header Structure are encoded as described in 3GPP TS 23.040 [4] for SMS, in subclauses 3.10 and 9.2.3.24.
The use of Cell Broadcast DCS values for messages with a User Data Header structure implies that the 82-byte CB payload has a User Data Header structure.
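As a sketch of what a User Data Header structure in the 82-octet payload looks like, the following Python function splits it into information elements, using the 3GPP TS 23.040 layout of a length octet (UDHL) followed by IEI, IEDL and IE data triples. The names are hypothetical, and decoding of the text that follows the header (including fill bits needed to reach a septet boundary for 7-bit text) is out of scope.

```python
from __future__ import annotations

from typing import NamedTuple

CB_PAYLOAD_OCTETS = 82


class InformationElement(NamedTuple):
    iei: int     # information element identifier
    data: bytes  # information element data (IED)


def parse_udh(payload: bytes) -> tuple[list[InformationElement], bytes]:
    """Split a CB payload into UDH information elements and the remaining octets.
    Layout: UDHL (1 octet), then UDHL octets of IEI, IEDL, IED triples."""
    if not 0 < len(payload) <= CB_PAYLOAD_OCTETS:
        raise ValueError("payload must be 1..82 octets")
    udhl = payload[0]
    if 1 + udhl > len(payload):
        raise ValueError("UDHL points past the end of the payload")
    header, rest = payload[1:1 + udhl], payload[1 + udhl:]
    ies: list[InformationElement] = []
    i = 0
    while i < len(header):
        if i + 2 > len(header):
            raise ValueError("truncated IEI/IEDL")
        iei, iedl = header[i], header[i + 1]
        end = i + 2 + iedl
        if end > len(header):
            raise ValueError("IE data runs past the UDH")
        ies.append(InformationElement(iei, header[i + 2:end]))
        i = end
    return ies, rest


ies, rest = parse_udh(bytes([0x03, 0x24, 0x01, 0x03]) + b"AB")
print(ies, rest)
```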
The CBS message information field will contain the IEs as described in 3GPP TS 23.040. The concatenation IEs will not be used, since in that case CB concatenation relies on the existing CB mechanism. Note that IEs that cannot be split and are too large to fit in one CB segment cannot be transmitted using this mechanism. Also, some IEs defined for SMS are not applicable for CB:
VALUE (hex) | MEANING
---|---
00 | Concatenated short messages, 8-bit reference number
01 | Special SMS Message Indication
06 | SMSC Control Parameters
08 | Concatenated short message, 16-bit reference number
20 | RFC 822 E-Mail Header
23 | Enhanced Voice Mail Information
70-7F | (U)SIM Toolkit Security Headers
80-89 | SME to SME specific use
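A sender or receiver could check a UDH against the table above before using it in a CB message. A minimal Python sketch; the set and function names are hypothetical, and the ranges are taken exactly as tabulated (inclusive).

```python
from typing import Iterable, List

# IEIs listed above as not applicable for CB (ranges inclusive, as tabulated).
NOT_APPLICABLE_FOR_CB = frozenset(
    [0x00, 0x01, 0x06, 0x08, 0x20, 0x23]
    + list(range(0x70, 0x7F + 1))  # (U)SIM Toolkit Security Headers
    + list(range(0x80, 0x89 + 1))  # SME to SME specific use
)


def non_applicable_ieis(ieis: Iterable[int]) -> List[int]:
    """Return the IEIs that must not be used in a CB message, in input order."""
    return [iei for iei in ieis if iei in NOT_APPLICABLE_FOR_CB]


print(non_applicable_ieis([0x00, 0x24, 0x75]))  # [0, 117]
```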