5 CBS Data Coding Scheme

3GPP TS 23.038, Alphabets and language-specific information, Release 18

The CBS Data Coding Scheme indicates the intended handling of the message at the MS, the character set/coding, and the language (when applicable). Any reserved codings shall be assumed to be the GSM 7 bit default alphabet (the same as codepoint 00001111) by a receiving entity. The octet is used according to a coding group which is indicated in bits 7..4. The octet is then coded as follows:

Coding Group Bits 7..4	Use of bits 3..0
0000	Language using the GSM 7 bit default alphabet

	Bits 3..0 indicate the language:
	0000 German
	0001 English
	0010 Italian
	0011 French
	0100 Spanish
	0101 Dutch
	0110 Swedish
	0111 Danish
	1000 Portuguese
	1001 Finnish
	1010 Norwegian
	1011 Greek
	1100 Turkish
	1101 Hungarian
	1110 Polish
	1111 Language unspecified
0001	0000 GSM 7 bit default alphabet; message preceded by language indication.
	The first 3 characters of the message are a two-character representation of the language encoded according to ISO 639 [12], followed by a CR character. The CR character is then followed by 90 characters of text (NOTE 1).

	0001 UCS2; message preceded by language indication.
	The message starts with a two-character GSM 7-bit default alphabet representation of the language encoded according to ISO 639 [12]. This is padded to the octet boundary with two bits set to 0 and then followed by 40 characters of UCS2-encoded message (NOTE 1). An MS not supporting UCS2 coding will present the two-character language identifier followed by improperly interpreted user data.

	0010..1111 Reserved
0010	0000 Czech
	0001 Hebrew (NOTE 2)
	0010 Arabic (NOTE 2)
	0011 Russian (NOTE 2)
	0100 Icelandic
	0101..1111 Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS
0011	0000..1111 Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS
01xx	General Data Coding indication
	Bits 5..0 indicate the following:

	Bit 5, if set to 0, indicates the text is uncompressed
	Bit 5, if set to 1, indicates the text is compressed using the compression algorithm defined in 3GPP TS 23.042 [13]

	Bit 4, if set to 0, indicates that bits 1 to 0 are reserved and have no message class meaning
	Bit 4, if set to 1, indicates that bits 1 to 0 have a message class meaning:

	Bit 1 Bit 0 Message Class:
	0 0 Class 0
	0 1 Class 1 Default meaning: ME-specific.
	1 0 Class 2 (U)SIM specific message.
	1 1 Class 3 Default meaning: TE-specific (see 3GPP TS 27.005 [8])

	Bits 3 and 2 indicate the character set being used, as follows:
	Bit 3 Bit 2 Character set:
	0 0 GSM 7 bit default alphabet
	0 1 8 bit data
	1 0 UCS2 (16 bit) [10]
	1 1 Reserved
1000	Reserved coding groups
1001	Message with User Data Header (UDH) structure:

	Bit 1 Bit 0 Message Class:
	0 0 Class 0
	0 1 Class 1 Default meaning: ME-specific.
	1 0 Class 2 (U)SIM specific message.
	1 1 Class 3 Default meaning: TE-specific (see 3GPP TS 27.005 [8])

	Bits 3 and 2 indicate the alphabet being used, as follows:
	Bit 3 Bit 2 Alphabet:
	0 0 GSM 7 bit default alphabet
	0 1 8 bit data
	1 0 UCS2 (16 bit) [10]
	1 1 Reserved
1010..1100	Reserved coding groups
1101	I1 protocol message defined in 3GPP TS 24.294 [19]
1110	Defined by the WAP Forum [15]
1111	Data coding / message handling

	Bit 3 is reserved, set to 0.

	Bit 2 Message coding:
	0 GSM 7 bit default alphabet
	1 8 bit data

	Bit 1 Bit 0 Message Class:
	0 0 No message class.
	0 1 Class 1 user defined.
	1 0 Class 2 user defined.
	1 1 Class 3 Default meaning: TE-specific (see 3GPP TS 27.005 [8])
NOTE 1: The language indication shall appear at the start of each Message Information Page (see 3GPP TS 23.041 [5]) and the language indication on each Message Information Page shall be for the same language.
NOTE 2: Message text in Hebrew, Arabic and Russian cannot be encoded in the GSM 7-bit default alphabet. For these languages UCS2 encoding shall be used.
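
As an illustration only, the table above can be read as a small decoder for the DCS octet. The following Python sketch is not part of the specification; the function name `decode_cbs_dcs` and the dictionary keys are hypothetical. It assumes that reserved codings fall back to the GSM 7 bit default alphabet, as stated at the start of this clause, and it does not model NOTE 2 (UCS2 for Hebrew, Arabic and Russian).

```python
# Languages for coding group 0000 (index = bits 3..0); 1111 is "unspecified".
LANGUAGES_0000 = [
    "German", "English", "Italian", "French", "Spanish", "Dutch",
    "Swedish", "Danish", "Portuguese", "Finnish", "Norwegian", "Greek",
    "Turkish", "Hungarian", "Polish", None,
]
# Languages for coding group 0010; other values are reserved for other languages.
LANGUAGES_0010 = {0: "Czech", 1: "Hebrew", 2: "Arabic", 3: "Russian", 4: "Icelandic"}
# Character set for bits 3..2 (groups 01xx and 1001); 11 is reserved.
CHARSETS = ["gsm7", "8bit", "ucs2", None]


def decode_cbs_dcs(dcs: int) -> dict:
    """Decode one CBS Data Coding Scheme octet into a descriptive dict."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("DCS must be a single octet")
    group, low = dcs >> 4, dcs & 0x0F
    info = {
        "charset": "gsm7", "language": None, "language_prefix": False,
        "compressed": False, "message_class": None, "udh": False,
        "reserved": False,
    }
    if group == 0b0000:
        info["language"] = LANGUAGES_0000[low]
    elif group == 0b0001:
        if low == 0b0000:
            info["language_prefix"] = True
        elif low == 0b0001:
            info["charset"], info["language_prefix"] = "ucs2", True
        else:
            info["reserved"] = True
    elif group == 0b0010:
        info["language"] = LANGUAGES_0010.get(low)
    elif group == 0b0011:
        pass  # other languages, GSM 7 bit default alphabet
    elif group in (0b0100, 0b0101, 0b0110, 0b0111, 0b1001):
        charset = CHARSETS[(dcs >> 2) & 0x03]
        if charset is None:
            info["reserved"] = True  # keep the gsm7 fallback
        else:
            info["charset"] = charset
        if group == 0b1001:
            info["udh"] = True
            info["message_class"] = dcs & 0x03
        else:
            info["compressed"] = bool(dcs & 0x20)
            if dcs & 0x10:
                info["message_class"] = dcs & 0x03
    elif group == 0b1101:
        info["charset"] = "i1-protocol"  # 3GPP TS 24.294, not decoded here
    elif group == 0b1110:
        info["charset"] = "wap"  # defined by the WAP Forum, not decoded here
    elif group == 0b1111:
        info["charset"] = "8bit" if dcs & 0x04 else "gsm7"
        info["message_class"] = (dcs & 0x03) or None  # 00 = no message class
    else:
        info["reserved"] = True  # groups 1000 and 1010..1100
    return info
```

For example, `decode_cbs_dcs(0x11)` reports UCS2 with a language prefix, and `decode_cbs_dcs(0x01)` reports English in the GSM 7 bit default alphabet.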

These codings may also be used for USSD and MMI/display purposes.
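
The UCS2 language indication described for coding group 0001 (bits 3..0 = 0001) can be illustrated with a short sketch. It rests on several assumptions that are not stated in this clause: that the two 7-bit characters are packed with the first character in the low-order bits of the first octet (the usual SMS septet packing), that the language code consists of basic Latin letters whose GSM 7-bit values coincide with ASCII, and that the UCS2 text is big-endian. The function name `split_ucs2_language_prefix` is hypothetical.

```python
def split_ucs2_language_prefix(page: bytes) -> tuple[str, str]:
    """Split a UCS2 page into (language code, text).

    The first two octets hold two packed 7-bit characters plus two
    padding bits; the remaining octets hold the UCS2 text.
    """
    if len(page) < 2 or len(page) % 2:
        raise ValueError("expected an even number of octets, at least 2")
    # Assumed packing: septet 0 in bits 0..6 of octet 0, septet 1 spread
    # over bit 7 of octet 0 and bits 0..5 of octet 1.
    packed = page[0] | (page[1] << 8)
    first = packed & 0x7F
    second = (packed >> 7) & 0x7F
    # For basic Latin letters the GSM 7-bit value equals the ASCII value.
    language = chr(first) + chr(second)
    # UCS2 is decoded here as big-endian UTF-16 (an assumption).
    text = page[2:].decode("utf-16-be")
    return language, text
```

With the octets `0x65 0x37` followed by the UCS2 encoding of "Hi", this sketch returns `("en", "Hi")`.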

The message length specified in this subclause is not applicable for UTRAN, E-UTRAN, and NG-RAN, but only applicable for GSM.

See 3GPP TS 24.090 [11] for specific coding values applicable to USSD for MS originated USSD messages and MS terminated USSD messages. USSD messages using the default alphabet are coded with the GSM 7-bit default alphabet given in clause 6.2.1. The message can then consist of up to 182 user characters.

Cell Broadcast messages using the default alphabet are coded with the GSM 7-bit default alphabet given in clause 6.2.1. The Message Information Page then consists of 93 user characters.

If the GSM 7 bit default alphabet extension mechanism is used, then the number of displayable characters will reduce by one for every instance where the GSM 7 bit default alphabet extension table is used.

Cell Broadcast Message Information Pages using 8-bit data have user-defined coding, and will each be 82 octets in length.
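
The effect of the extension mechanism on page capacity can be estimated with a simple count: a character taken from the extension table occupies two 7-bit positions instead of one. The sketch below is illustrative only. The set of extension characters is taken from the default extension table in clause 6.2.1.1, which is not reproduced in this clause, so it should be checked against that table; the function names are hypothetical, and the code does not verify that the other characters belong to the default alphabet.

```python
# Characters assumed to come from the GSM 7 bit default alphabet
# extension table (clause 6.2.1.1): form feed ^ { } \ [ ~ ] | and euro.
GSM7_EXTENSION_CHARS = set("\f^{}\\[~]|€")


def septets_needed(text: str) -> int:
    """Count 7-bit positions: 2 per extension character, 1 otherwise."""
    return sum(2 if ch in GSM7_EXTENSION_CHARS else 1 for ch in text)


def fits_in_cb_page(text: str, capacity: int = 93) -> bool:
    """Check the text against the 93-character page capacity."""
    return septets_needed(text) <= capacity
```

For instance, `septets_needed("a[b]")` is 6 rather than 4, because each bracket uses the extension table once.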

The UCS2 character set indicates that the message is coded in UCS2 [10]. The General notes specified in clause 6.1.1 override any contrary specification in UCS2, so for example even in UCS2 a <CR> character will cause the MS to return to the beginning of the current line and overwrite any existing text with the characters which follow the <CR>. Cell Broadcast Message Information Pages encoded in UCS2 consist of 41 characters each.
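
The page capacities quoted in this clause (93, 90, 41 and 40 characters) all follow from the 82-octet Message Information Page. The arithmetic can be checked with the following sketch; the helper names are hypothetical.

```python
PAGE_OCTETS = 82  # length of one Message Information Page


def gsm7_capacity(octets: int) -> int:
    """Number of whole 7-bit characters that fit in the given octets."""
    return octets * 8 // 7


def ucs2_capacity(octets: int) -> int:
    """Number of 16-bit UCS2 characters that fit in the given octets."""
    return octets // 2


# GSM 7 bit default alphabet: 82 * 8 = 656 bits, 656 // 7 = 93 characters.
gsm7_chars = gsm7_capacity(PAGE_OCTETS)

# With a language indication (two characters plus CR), 3 of the 93 are used.
gsm7_chars_after_language = gsm7_chars - 3

# Plain UCS2: 82 // 2 = 41 characters.
ucs2_chars = ucs2_capacity(PAGE_OCTETS)

# UCS2 with a language indication: two 7-bit characters plus two padding
# bits occupy (2 * 7 + 2) // 8 = 2 octets, leaving 80 octets for text.
prefix_octets = (2 * 7 + 2) // 8
ucs2_chars_after_language = ucs2_capacity(PAGE_OCTETS - prefix_octets)
```

These values evaluate to 93, 90, 41 and 40 respectively, matching the figures given in the text.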

When a CBS message received by the MS is message class 0 and the MS has the capability of displaying CBS messages, the MS shall display the message immediately. The message shall not be automatically stored in the (U)SIM or ME.

The ME may make provision through MMI for the user to selectively prevent the message from being displayed immediately.

If the ME is incapable of displaying CBS messages or if the immediate display of the message has been disabled through MMI then the ME shall treat the CBS message as though there was no message class, i.e. it will ignore bits 0 and 1 in the TP-DCS but may store the message either on the ME or on the (U)SIM.

Class 1 and Class 2 messages may be routed by the ME to user-defined destinations, but the user may override any default meaning and select their own routing.

Class 3 messages will normally be selected for transfer to a TE, in cases where a ME supports an SMS/CBS interface to a TE, and the TE requests "TE-specific" cell broadcast messages (see 3GPP TS 27.005 [8]). The user may be able to override the default meaning and select their own routing.
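
The default handling of message classes described above can be summarised as a small decision function. This is a simplified sketch, not a normative algorithm: the `Action` names and the function parameters are hypothetical, user overrides of the default routing are not modelled, and the class 1 and class 2 destinations follow the default meanings (ME-specific and (U)SIM-specific) given in the table for coding groups 01xx and 1001.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DISPLAY_IMMEDIATELY = "display immediately, no automatic storage"
    NO_CLASS_HANDLING = "treat as though there was no message class"
    ROUTE_TO_ME = "ME-specific handling"
    ROUTE_TO_USIM = "(U)SIM-specific handling"
    ROUTE_TO_TE = "transfer to TE"


def route_cbs_message(
    message_class: Optional[int],
    can_display: bool = True,
    immediate_display_enabled: bool = True,
    te_requests_te_specific: bool = False,
) -> Action:
    """Pick a default action for a received CBS message."""
    if message_class == 0:
        # Class 0 is displayed at once unless the ME cannot display CBS
        # messages or the user has disabled immediate display through MMI.
        if can_display and immediate_display_enabled:
            return Action.DISPLAY_IMMEDIATELY
        return Action.NO_CLASS_HANDLING
    if message_class == 1:
        return Action.ROUTE_TO_ME
    if message_class == 2:
        return Action.ROUTE_TO_USIM
    if message_class == 3 and te_requests_te_specific:
        # Only when the ME supports a TE interface and the TE has asked
        # for "TE-specific" cell broadcast messages.
        return Action.ROUTE_TO_TE
    return Action.NO_CLASS_HANDLING
```

A class 0 message with immediate display disabled therefore falls through to the same handling as a message without a class.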

Messages with a User Data Header Structure are encoded as described in 3GPP TS 23.040 [4] for SMS, in subclauses 3.10 and 9.2.3.24.

The use of Cell Broadcast DCS values for messages with a User Data Header structure implies that the 82-byte CB payload has a User Data Header structure.

The CBS message information field will contain the IEs as described in 3GPP TS 23.040. The concatenation IEs will not be used, as CB concatenation will rely in that case on the existing CB mechanism. Note that IEs that cannot be split and that IEs that are too large to fit in one CB segment cannot be transmitted using this mechanism. Also, some IEs as defined for SMS are not applicable for CB:

VALUE (hex)	MEANING
00	Concatenated short messages, 8-bit reference number
01	Special SMS Message Indication
06	SMSC Control Parameters
08	Concatenated short message, 16-bit reference number
20	RFC 822 E-Mail Header
23	Enhanced Voice Mail Information
70-7F	(U)SIM Toolkit Security Headers
80-89	SME to SME specific use
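
A sender or receiver could use the table above to screen Information Element Identifiers before placing them in, or accepting them from, a CB message with a User Data Header. The sketch below simply encodes the values exactly as listed in the table; the function name `ie_applicable_to_cb` is hypothetical, and any identifier not listed is treated as applicable, which is an assumption.

```python
# Single IEI values listed above as not applicable for CB.
_NOT_APPLICABLE_VALUES = frozenset({0x00, 0x01, 0x06, 0x08, 0x20, 0x23})
# Ranges listed above: 70-7F and 80-89 (inclusive, hexadecimal).
_NOT_APPLICABLE_RANGES = (range(0x70, 0x80), range(0x80, 0x8A))


def ie_applicable_to_cb(iei: int) -> bool:
    """Return False for IEIs the table marks as not applicable for CB."""
    if not 0 <= iei <= 0xFF:
        raise ValueError("IEI must be a single octet")
    if iei in _NOT_APPLICABLE_VALUES:
        return False
    return not any(iei in r for r in _NOT_APPLICABLE_RANGES)
```

For example, `ie_applicable_to_cb(0x00)` is false because the concatenation IEs are replaced by the existing CB paging mechanism.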