6 Individual parameters
3GPP TS 23.038 Release 18: Alphabets and language-specific information
6.1 General principles
6.1.1 General notes
Except where otherwise indicated, the following shall apply to all character sets:
1: The characters marked "1)" are not used but are displayed as a space.
2: The characters of this set, when displayed, should approximate to the appearance of the relevant characters specified in ISO 1073 [16] and the relevant national standards.
3: Control characters:
Code | Meaning |
LF | Line feed: Any characters following LF which are to be displayed shall be presented as the next line of the message, commencing with the first character position. |
CR | Carriage return: Any characters following CR which are to be displayed shall be presented as the current line of the message, commencing with the first character position. |
SP | Space character. |
4: The display of characters within a message is achieved by taking each character in turn and placing it in the next available space from left to right and top to bottom.
6.1.2 Character packing
6.1.2.1 SMS Packing
6.1.2.1.1 Packing of 7-bit characters
If a character number α is noted in the following way:
b7 b6 b5 b4 b3 b2 b1
αa αb αc αd αe αf αg
The packing of the 7-bit characters in octets is done by completing the octets with zeros on the left.
For example, packing: α
– one character in one octet:
– bits number:
7 6 5 4 3 2 1 0
0 1a 1b 1c 1d 1e 1f 1g
– two characters in two octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
0 0 2a 2b 2c 2d 2e 2f
– three characters in three octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
0 0 0 3a 3b 3c 3d 3e
– seven characters in seven octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
4e 4f 4g 3a 3b 3c 3d 3e
5d 5e 5f 5g 4a 4b 4c 4d
6c 6d 6e 6f 6g 5a 5b 5c
7b 7c 7d 7e 7f 7g 6a 6b
0 0 0 0 0 0 0 7a
– eight characters in seven octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
4e 4f 4g 3a 3b 3c 3d 3e
5d 5e 5f 5g 4a 4b 4c 4d
6c 6d 6e 6f 6g 5a 5b 5c
7b 7c 7d 7e 7f 7g 6a 6b
8a 8b 8c 8d 8e 8f 8g 7a
The bit number zero is always transmitted first.
Therefore, in 140 octets, it is possible to pack (140×8)/7=160 characters.
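As an informative illustration, the packing rule above can be sketched in Python. The function name `pack_septets` is hypothetical and not defined by this specification; the sketch assumes each character has already been mapped to its 7-bit code.

```python
def pack_septets(septets):
    """Pack 7-bit character codes into octets, low-order bits first.

    Hypothetical helper: the last octet is completed with zeros on the left.
    """
    acc = 0     # bit accumulator; each new septet is placed above the bits held
    nbits = 0   # number of valid bits currently in acc
    out = bytearray()
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError("character code does not fit in 7 bits")
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)   # emit one complete octet
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc)              # unused high-order bits are already zero
    return bytes(out)


print(pack_septets([0x68, 0x65, 0x6C, 0x6C, 0x6F]).hex())  # e8329bfd06
```

With this layout, 160 character codes fill exactly 140 octets, and eight codes fill seven octets with no spare bits.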
6.1.2.2 CBS Packing
6.1.2.2.1 Packing of 7-bit characters
If a character number α is noted in the following way:
b7 b6 b5 b4 b3 b2 b1
αa αb αc αd αe αf αg
the packing of the 7-bit characters in octets is done as follows:
bit number
7 6 5 4 3 2 1 0
octet number
1 2g 1a 1b 1c 1d 1e 1f 1g
2 3f 3g 2a 2b 2c 2d 2e 2f
3 4e 4f 4g 3a 3b 3c 3d 3e
4 5d 5e 5f 5g 4a 4b 4c 4d
5 6c 6d 6e 6f 6g 5a 5b 5c
6 7b 7c 7d 7e 7f 7g 6a 6b
7 8a 8b 8c 8d 8e 8f 8g 7a
8 10g 9a 9b 9c 9d 9e 9f 9g
.
.
81 93d 93e 93f 93g 92a 92b 92c 92d
82 0 0 0 0 0 93a 93b 93c
The bit number zero is always transmitted first.
Therefore, in 82 octets, it is possible to pack (82×8)/7 = 93.7, that is 93 characters. The 5 remaining bits are set to zero as stated above.
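The reverse operation can be sketched as follows. `unpack_septets` is a hypothetical helper, and the caller is assumed to know how many characters to extract (93 for a full 82-octet page, as computed above).

```python
CBS_PAGE_OCTETS = 82
CBS_PAGE_CHARS = (CBS_PAGE_OCTETS * 8) // 7                   # 93 characters
CBS_SPARE_BITS = CBS_PAGE_OCTETS * 8 - CBS_PAGE_CHARS * 7     # 5 zero bits


def unpack_septets(octets, count):
    """Recover `count` 7-bit character codes from packed octets."""
    acc = 0     # bit accumulator; each new octet is placed above the bits held
    nbits = 0   # number of valid bits currently in acc
    out = []
    for o in octets:
        acc |= o << nbits
        nbits += 8
        while nbits >= 7 and len(out) < count:
            out.append(acc & 0x7F)   # the low 7 bits form the next character
            acc >>= 7
            nbits -= 7
    if len(out) < count:
        raise ValueError("not enough octets for the requested characters")
    return out
```

For example, `unpack_septets(bytes.fromhex("e8329bfd06"), 5)` returns the codes 0x68, 0x65, 0x6C, 0x6C, 0x6F.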
6.1.2.3 USSD packing
6.1.2.3.1 Packing of 7 bit characters
If a character number α is noted in the following way:
b7 b6 b5 b4 b3 b2 b1
αa αb αc αd αe αf αg
The packing of the 7-bit characters in octets is done by completing the octets with zeros on the left.
For example, packing: α
– one character in one octet:
– bits number:
7 6 5 4 3 2 1 0
0 1a 1b 1c 1d 1e 1f 1g
– two characters in two octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
0 0 2a 2b 2c 2d 2e 2f
– three characters in three octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
0 0 0 3a 3b 3c 3d 3e
– six characters in six octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
4e 4f 4g 3a 3b 3c 3d 3e
5d 5e 5f 5g 4a 4b 4c 4d
6c 6d 6e 6f 6g 5a 5b 5c
0 0 0 0 0 0 6a 6b
– seven characters in seven octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
4e 4f 4g 3a 3b 3c 3d 3e
5d 5e 5f 5g 4a 4b 4c 4d
6c 6d 6e 6f 6g 5a 5b 5c
7b 7c 7d 7e 7f 7g 6a 6b
0 0 0 1 1 0 1 7a
The bit number zero is always transmitted first.
– eight characters in seven octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
4e 4f 4g 3a 3b 3c 3d 3e
5d 5e 5f 5g 4a 4b 4c 4d
6c 6d 6e 6f 6g 5a 5b 5c
7b 7c 7d 7e 7f 7g 6a 6b
8a 8b 8c 8d 8e 8f 8g 7a
– nine characters in eight octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
4e 4f 4g 3a 3b 3c 3d 3e
5d 5e 5f 5g 4a 4b 4c 4d
6c 6d 6e 6f 6g 5a 5b 5c
7b 7c 7d 7e 7f 7g 6a 6b
8a 8b 8c 8d 8e 8f 8g 7a
0 9a 9b 9c 9d 9e 9f 9g
– fifteen characters in fourteen octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
4e 4f 4g 3a 3b 3c 3d 3e
5d 5e 5f 5g 4a 4b 4c 4d
6c 6d 6e 6f 6g 5a 5b 5c
7b 7c 7d 7e 7f 7g 6a 6b
8a 8b 8c 8d 8e 8f 8g 7a
10g 9a 9b 9c 9d 9e 9f 9g
11f 11g 10a 10b 10c 10d 10e 10f
12e 12f 12g 11a 11b 11c 11d 11e
13d 13e 13f 13g 12a 12b 12c 12d
14c 14d 14e 14f 14g 13a 13b 13c
15b 15c 15d 15e 15f 15g 14a 14b
0 0 0 1 1 0 1 15a
– sixteen characters in fourteen octets:
– bits number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
4e 4f 4g 3a 3b 3c 3d 3e
5d 5e 5f 5g 4a 4b 4c 4d
6c 6d 6e 6f 6g 5a 5b 5c
7b 7c 7d 7e 7f 7g 6a 6b
8a 8b 8c 8d 8e 8f 8g 7a
10g 9a 9b 9c 9d 9e 9f 9g
11f 11g 10a 10b 10c 10d 10e 10f
12e 12f 12g 11a 11b 11c 11d 11e
13d 13e 13f 13g 12a 12b 12c 12d
14c 14d 14e 14f 14g 13a 13b 13c
15b 15c 15d 15e 15f 15g 14a 14b
16a 16b 16c 16d 16e 16f 16g 15a
The bit number zero is always transmitted first.
Therefore, in 160 octets, it is possible to pack (160×8)/7 = 182.8, that is 182 characters. The remaining 6 bits are set to zero as stated above.
Packing of 7 bit characters in USSD strings is done in the same way as for SMS (clause 6.1.2.1). The character stream is bit padded to octet boundary with binary zeroes as shown above.
If the total number of characters to be sent equals (8n‑1) where n=1,2,3 etc. then there are 7 spare bits at the end of the message. To avoid the situation where the receiving entity confuses 7 binary zero pad bits as the @ character, the carriage return or <CR> character (defined in clause 6.1.1) shall be used for padding in this situation, just as for Cell Broadcast.
If <CR> is intended to be the last character and the message (including the wanted <CR>) ends on an octet boundary, then another <CR> must be added together with a padding bit 0. The receiving entity will perform the carriage return function twice, but this will not result in misoperation as the definition of <CR> in clause 6.1.1 is identical to the definition of <CR><CR>.
The receiving entity shall remove the final <CR> character where the message ends on an octet boundary with <CR> as the last character.
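The three <CR> rules above can be sketched at the level of 7-bit character codes, before packing on the sending side and after unpacking on the receiving side. The names `ussd_pad` and `ussd_strip` are hypothetical; the sketch assumes the receiver derives the character count from the octet count.

```python
CR = 0x0D  # carriage return code in the GSM 7 bit default alphabet


def ussd_pad(septets):
    """Sending side: add <CR> where the rules above require it."""
    s = list(septets)
    if len(s) % 8 == 7:
        # 8n-1 characters leave 7 spare bits: fill them with <CR>, not zeros.
        s.append(CR)
    elif s and len(s) % 8 == 0 and s[-1] == CR:
        # A wanted <CR> ends on an octet boundary: add another <CR>.
        # The single remaining pad bit is zero once the codes are packed.
        s.append(CR)
    return s


def ussd_strip(septets):
    """Receiving side: drop a final <CR> that ends on an octet boundary."""
    s = list(septets)
    if s and len(s) % 8 == 0 and s[-1] == CR:
        s.pop()
    return s
```

A seven-character message therefore travels as eight codes and is restored to seven on receipt, while an eight-character message ending in <CR> travels as nine codes and is displayed with both <CR> characters.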
6.2 Character sets and coding
This section provides a list of character sets and codings to be supported by SMS, CBS, USSD and IEs included in NAS messages as specified in 3GPP TS 24.008 [20] and 3GPP TS 24.301 [21]. Implementation of the GSM 7 bit default alphabet is mandatory. Support of other character sets is optional.
It should be noted that support of Latin and non-Latin languages by the GSM 7 bit default alphabet is limited. It is therefore essential to introduce the UCS2 character set in mobile stations, SCs and systems handling SMSs, CBSs, USSDs, and IEs included in NAS messages.
6.2.1 GSM 7 bit Default Alphabet
Bits per character: 7
CBS/USSD/IE of NAS message pad character: CR
Character table:
b7 |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
b6 |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
b5 |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
b4 b3 b2 b1 |  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
0 0 0 0 | 0 | @ | Δ | SP | 0 | ¡ | P | ¿ | p |
0 0 0 1 | 1 | £ | _ | ! | 1 | A | Q | a | q |
0 0 1 0 | 2 | $ | Φ | " | 2 | B | R | b | r |
0 0 1 1 | 3 | ¥ | Γ | # | 3 | C | S | c | s |
0 1 0 0 | 4 | è | Λ | ¤ | 4 | D | T | d | t |
0 1 0 1 | 5 | é | Ω | % | 5 | E | U | e | u |
0 1 1 0 | 6 | ù | Π | & | 6 | F | V | f | v |
0 1 1 1 | 7 | ì | Ψ | ' | 7 | G | W | g | w |
1 0 0 0 | 8 | ò | Σ | ( | 8 | H | X | h | x |
1 0 0 1 | 9 | Ç | Θ | ) | 9 | I | Y | i | y |
1 0 1 0 | 10 | LF | Ξ | * | : | J | Z | j | z |
1 0 1 1 | 11 | Ø | 1) | + | ; | K | Ä | k | ä |
1 1 0 0 | 12 | ø | Æ | , | < | L | Ö | l | ö |
1 1 0 1 | 13 | CR | æ | - | = | M | Ñ | m | ñ |
1 1 1 0 | 14 | Å | ß | . | > | N | Ü | n | ü |
1 1 1 1 | 15 | å | É | / | ? | O | § | o | à |
NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.
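As an informative sketch, the character table can be held as a 128-character string indexed by the 7-bit code (b7 to b5 select the column, b4 to b1 the row). The names `GSM7_DEFAULT` and `decode_default` are hypothetical. The literal is an illustrative transcription, column by column, and should be checked against the table before being relied on. The decoder models a receiving entity that does not understand the escape mechanism and therefore displays code 0x1B as a space.

```python
# One 16-character line per table column (codes 0x00-0x0F, 0x10-0x1F, ...).
GSM7_DEFAULT = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

ESCAPE = 0x1B  # the position marked "1)" in the table


def decode_default(septets):
    """Map 7-bit codes to characters, showing the escape code as a space."""
    return "".join(" " if s == ESCAPE else GSM7_DEFAULT[s] for s in septets)


print(decode_default([0x48, 0x69, 0x00]))  # Hi@
```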
6.2.1.1 GSM 7 bit default alphabet extension table
The table below is reserved for symbols of international significance (e.g. currency symbols). It also contains a mechanism to permit escape (Note 1) to additional tables for symbols of international significance in the event that the table below becomes fully populated.
b7 |  | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
b6 |  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
b5 |  | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
b4 b3 b2 b1 |  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
0 0 0 0 | 0 |  |  |  |  | | (vertical bar) |  |  |  |
0 0 0 1 | 1 |  |  |  |  |  |  |  |  |
0 0 1 0 | 2 |  |  |  |  |  |  |  |  |
0 0 1 1 | 3 |  |  |  |  |  |  |  |  |
0 1 0 0 | 4 |  | ^ |  |  |  |  |  |  |
0 1 0 1 | 5 |  |  |  |  |  |  | € |  |
0 1 1 0 | 6 |  |  |  |  |  |  |  |  |
0 1 1 1 | 7 |  |  |  |  |  |  |  |  |
1 0 0 0 | 8 |  |  | { |  |  |  |  |  |
1 0 0 1 | 9 |  |  | } |  |  |  |  |  |
1 0 1 0 | 10 | 3) |  |  |  |  |  |  |  |
1 0 1 1 | 11 |  | 1) |  |  |  |  |  |  |
1 1 0 0 | 12 |  |  |  | [ |  |  |  |  |
1 1 0 1 | 13 |  |  |  | ~ |  |  |  |  |
1 1 1 0 | 14 |  |  |  | ] |  |  |  |  |
1 1 1 1 | 15 |  |  | \ |  |  |  |  |  |
In the event that an MS receives a code where a symbol is not represented in the above table then the MS shall display either the character shown in the main GSM 7 bit default alphabet table in subclause 6.2.1, or the character from the National Language Locking Shift Table in the case where the locking shift mechanism as defined in subclause 6.2.1.2.3 is used.
NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined. It is not intended that this extension mechanism should be used as an alternative to UCS2 to enhance the 7bit default alphabet character repertoire for national specific character sets.
NOTE 2): Void
NOTE 3): This code is defined as a Page Break character and may be used for example in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
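The escape handling and the fallback rule can be sketched as follows. The names `GSM7_EXTENSION` and `decode_with_extension` are hypothetical, and `main_table` stands for any 128-entry sequence holding the table of subclause 6.2.1. The code positions in the dictionary are a transcription of the extension table and should be checked against it. Representing the Page Break as a form feed character, and showing an escape code with nothing after it as a space, are assumptions of this sketch.

```python
ESCAPE = 0x1B

# Code following the escape -> character from the extension table.
GSM7_EXTENSION = {
    0x0A: "\f",  # page break (NOTE 3), shown here as form feed: an assumption
    0x14: "^",
    0x28: "{",
    0x29: "}",
    0x2F: "\\",
    0x3C: "[",
    0x3D: "~",
    0x3E: "]",
    0x40: "|",
    0x65: "€",
}


def decode_with_extension(septets, main_table):
    """Decode 7-bit codes, resolving <escape><character> pairs."""
    out = []
    codes = iter(septets)
    for s in codes:
        if s != ESCAPE:
            out.append(main_table[s])
            continue
        following = next(codes, None)
        if following is None or following == ESCAPE:
            # NOTE 1: escape to a further table is displayed as a space.
            # A trailing escape is treated the same way (assumption).
            out.append(" ")
        else:
            # Codes with no symbol in the extension table fall back
            # to the character at the same position in the main table.
            out.append(GSM7_EXTENSION.get(following, main_table[following]))
    return "".join(out)
```

When a National Language Locking Shift Table is in use, that table would be passed as `main_table`, which gives the second fallback case described above.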
6.2.1.2 National Language Identifier
6.2.1.2.1 Introduction
The national language tables are used for adding the special characters of certain languages that cannot be expressed using the GSM 7 bit default alphabet.
The principle is to use the National Language Identifier to indicate to a receiving entity that the message has been encoded using a national language table. Both single shift and locking shift mechanisms are defined.
The single shift mechanism, as defined in subclause 6.2.1.2.2, applies to a single character and it replaces the GSM 7 bit default alphabet extension table defined in subclause 6.2.1.1 with a National Language Single Shift Table (see subclause A.2).
The locking shift mechanism, as defined in subclause 6.2.1.2.3, applies throughout the message, or the current segment in case of a concatenated message, and it replaces the GSM 7 bit default alphabet defined in subclause 6.2.1 with a National Language Locking Shift Table (see subclause A.3) that defines the whole character set needed for the language.
If several languages requiring different national language tables are used, it is recommended to encode the message in UCS2. It is, however, possible to use both single shift and locking shift with the corresponding tables in a single message.
Implementations based on older reference versions (so-called "legacy implementations") will use the fallback mechanisms that are defined in the earlier versions of the specification for handling of unknown characters.
6.2.1.2.2 Single shift mechanism
In the case where single shift is not combined with locking shift, single shift means that the receiving entity shall decode all characters in the message (or the current segment in case of a concatenated message) using the GSM 7 bit default alphabet unless the escape mechanism is used, i.e. <escape><character>, as defined in subclause 6.2.1.
The case where single shift and locking shift (which may be for the same or different languages) are combined is described in subclause 6.2.1.2.3.
If the escape mechanism is used then, instead of the GSM 7 bit default alphabet extension table in subclause 6.2.1.1, the receiving entity shall decode the subsequent character using the National Language Single Shift Table for the indicated language in table 6.2.1.2.4.1. Each time a sending entity needs to send a character from the National Language Single Shift Table, the sending entity shall encode this as <escape><character>, where the <character> is encoded using the indicated National Language Single Shift Table.
6.2.1.2.3 Locking shift mechanism
Locking Shift means that the receiving entity shall decode all characters in the message (or the current segment in case of a concatenated message) using the National Language Locking Shift Table unless the escape mechanism is used, i.e. <escape><character>, as defined in subclause 6.2.1.
If the escape mechanism is used and no National Language Single Shift Table is indicated (see subclause 6.2.1.2.4), the receiving entity shall decode the message (or the current segment in case of a concatenated message) using the GSM 7 bit default alphabet extension table as defined in subclause 6.2.1.1.
If the escape mechanism is used and a National Language Single Shift Table is indicated (see subclause 6.2.1.2.4), the receiving entity shall decode the message (or the current segment in case of a concatenated message) using the National Language Single Shift Table as defined in subclause 6.2.1.2.2.
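The combinations described in subclauses 6.2.1.2.2 and 6.2.1.2.3 reduce to two independent choices: which table decodes ordinary characters, and which table decodes the character following an escape. The sketch below is informative only; `select_tables` is a hypothetical name and the tables are represented by their titles rather than their contents.

```python
DEFAULT_ALPHABET = "GSM 7 bit default alphabet"
DEFAULT_EXTENSION = "GSM 7 bit default alphabet extension table"
LOCKING_SHIFT = "National Language Locking Shift Table"
SINGLE_SHIFT = "National Language Single Shift Table"


def select_tables(locking_shift_indicated, single_shift_indicated):
    """Return (table for ordinary characters, table used after <escape>)."""
    base = LOCKING_SHIFT if locking_shift_indicated else DEFAULT_ALPHABET
    escaped = SINGLE_SHIFT if single_shift_indicated else DEFAULT_EXTENSION
    return base, escaped
```

The selection applies to the whole message, or to the current segment of a concatenated message.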
6.2.1.2.4 National Language Identifier
A National Language Single Shift IE and a National Language Locking Shift IE can be included in the TP User Data Header, as defined in 3GPP TS 23.040 [4]. The receiving entity shall decode using single shift or locking shift as applicable for the language indicated in the National Language Identifier within these IEs.
The National Language Identifier octet is encoded as shown in table 6.2.1.2.4.1.
Table 6.2.1.2.4.1
Language code b7...b0 | Language | National Language Single Shift Table | National Language Locking Shift Table |
00000000 | Reserved | n/a | n/a |
00000001 | Turkish | Subclause A.2.1 | Subclause A.3.1 |
00000010 | Spanish | Subclause A.2.2 | Not defined – fallback to GSM 7 bit default alphabet (see subclause 6.2.1) |
00000011 | Portuguese | Subclause A.2.3 | Subclause A.3.3 |
00000100 | Bengali | Subclause A.2.4 | Subclause A.3.4 |
00000101 | Gujarati | Subclause A.2.5 | Subclause A.3.5 |
00000110 | Hindi | Subclause A.2.6 | Subclause A.3.6 |
00000111 | Kannada | Subclause A.2.7 | Subclause A.3.7 |
00001000 | Malayalam | Subclause A.2.8 | Subclause A.3.8 |
00001001 | Oriya | Subclause A.2.9 | Subclause A.3.9 |
00001010 | Punjabi | Subclause A.2.10 | Subclause A.3.10 |
00001011 | Tamil | Subclause A.2.11 | Subclause A.3.11 |
00001100 | Telugu | Subclause A.2.12 | Subclause A.3.12 |
00001101 | Urdu | Subclause A.2.13 | Subclause A.3.13 |
00001110 to 11111111 | Reserved | n/a | n/a |
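Table 6.2.1.2.4.1 can be represented as a simple lookup. The sketch below is informative; `NATIONAL_LANGUAGES` and `describe_identifier` are hypothetical names, and the subclause strings are built from the pattern visible in the table (A.2.n and A.3.n for language code n).

```python
NATIONAL_LANGUAGES = {
    0x01: "Turkish", 0x02: "Spanish", 0x03: "Portuguese", 0x04: "Bengali",
    0x05: "Gujarati", 0x06: "Hindi", 0x07: "Kannada", 0x08: "Malayalam",
    0x09: "Oriya", 0x0A: "Punjabi", 0x0B: "Tamil", 0x0C: "Telugu",
    0x0D: "Urdu",
}

# Languages for which the table defines no locking shift table.
NO_LOCKING_SHIFT_TABLE = {0x02}  # Spanish


def describe_identifier(code):
    """Return the table row for one National Language Identifier octet."""
    if not 0 <= code <= 0xFF:
        raise ValueError("identifier must fit in one octet")
    language = NATIONAL_LANGUAGES.get(code)
    if language is None:
        # 0x00 and 0x0E to 0xFF are reserved.
        return {"language": "Reserved", "single_shift": None, "locking_shift": None}
    locking = None if code in NO_LOCKING_SHIFT_TABLE else f"A.3.{code}"
    return {"language": language, "single_shift": f"A.2.{code}", "locking_shift": locking}
```

A `locking_shift` value of `None` for a defined language corresponds to the fallback to the GSM 7 bit default alphabet.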
6.2.1.2.5 Processing of national language characters
When supporting a specific national language, the sending entity shall support the encoding of messages using the corresponding National Language Identifier defined in subclause 6.2.1.2.4.
The receiving entity should be able to decode messages using the National Language Identifiers defined in subclause 6.2.1.2.4 for the languages that are supported by that entity.
If a message is received, containing a National Language Identifier indicating a reserved value or a value that is not supported by the receiving entity, the receiving entity shall ignore the IE (see 3GPP TS 23.040 [4]) in which the National Language Identifier was indicated.
The receiving entity shall be capable of processing both single shift and locking shift within the same message.
It is an implementation option for the sending entity whether to use the single shift mechanism, the locking shift mechanism or both.
NOTE 1: A message using the locking shift mechanism cannot make use of characters from the GSM 7 bit Default Alphabet table unless such characters are replicated in the National Language Locking Shift Table or (in the case of locking shift and single shift) in the National Language Single Shift Table.
NOTE 2: Encoding of a message using the national locking shift mechanism is not intended to be implemented until a formal request is issued by the relevant national regulatory body. This is because a receiving entity not supporting the relevant locking-shift decoding will present different characters from the ones intended by the sending entity.
NOTE 3: An SMS message using a locking shift table for a language may not be properly displayed when the terminal does not support the locking shift table for that language. When the network is aware of the list of the locking shift tables supported by the UE, the network can deliver the SMS messages using an appropriate encoding.
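The receiving-side rules of this subclause can be sketched as a small filter applied to the identifiers carried in the two IEs. The function names and the `supported` set are hypothetical; `None` stands for "IE absent or ignored", in which case decoding proceeds as if the IE had not been sent.

```python
SPANISH = 0x02  # no locking shift table defined in table 6.2.1.2.4.1


def usable_identifier(code, supported):
    """Return the identifier if it can be used, otherwise None (IE ignored)."""
    if code is None:
        return None                      # IE not present
    if code == 0x00 or code >= 0x0E:
        return None                      # reserved value
    if code not in supported:
        return None                      # language not supported by this entity
    return code


def resolve_shifts(single_id, locking_id, supported):
    """Return the (single shift, locking shift) identifiers to decode with."""
    single = usable_identifier(single_id, supported)
    locking = usable_identifier(locking_id, supported)
    if locking == SPANISH:
        locking = None                   # fall back to the default alphabet
    return single, locking
```

Both identifiers are resolved independently because single shift and locking shift may be present in the same message and may indicate different languages.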
6.2.2 8 bit data
8 bit data is user defined.
Padding: CR in the case of an 8 bit character set; otherwise user defined
Character table: User Specific
6.2.3 UCS2
Bits per character: 16
CBS/USSD pad character: CR
Character table: ISO/IEC 10646 [10]
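As an informative sketch, text restricted to 16-bit code points can be produced with Python's built-in UTF-16 codec, since that encoding coincides with a 16 bit per character coding for such code points. The function name `encode_ucs2` is hypothetical, and the most-significant-octet-first order is an assumption of this sketch that is not stated in this subclause.

```python
def encode_ucs2(text):
    """Encode text using 16 bits per character, high-order octet first."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            # Needs more than 16 bits, so it has no single 16-bit code.
            raise ValueError(f"U+{ord(ch):04X} does not fit in 16 bits")
    return text.encode("utf-16-be")


print(encode_ucs2("A€").hex())  # 004120ac
```

Each character occupies two octets, so a 140-octet payload holds at most 70 characters under this coding.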
Annex A (normative):
National Language Tables