7 Implementation considerations

35.2313GPPDocument 1: Algorithm specificationRelease 17Specification of the TUAK algorithm set: A second example algorithm set for the 3GPP authentication and key generation functions f1, f1*, f2, f3, f4, f5 and f5*TS

7.1 TOPC computed on or off the UICC?

It will be seen in clause 6.1 that TOP C is computed from OP and K, and that it is only TOP C, not TOP, that is ever used in subsequent computations.

As for OPC in MILENAGE, it is recommended that TOPC be computed off the UICC where possible, and that TOPC rather than TOP be loaded to the UICC for use in subsequent computations. This should also apply when updating an embedded UICC (eUICC) : the value of TOPC (and not TOP) should be loaded to the eUICC in conjunction with the new K and other operator customization parameters.

This gives the following benefits:

– The complexity of the algorithms run on the UICC is reduced.

– It is more likely that TOP can be kept secret. (If TOP is stored on the UICC, it only takes one UICC to be reverse engineered for TOP to be discovered and published. But it should be difficult for someone who has discovered even a large number of (TOPC, K) pairs to deduce TOP. That means that the TOPC associated with any other value of K will be unknown, which may make it harder to mount some kinds of cryptanalytic and forgery attacks. The algorithms are designed to be secure whether or not TOP is known to the attacker, but a secret TOP is one more hurdle in the attacker’s path.)

7.2 Further customization

TOP obviously allows for some degree of operator customization.

Further, as described in clause 5.1, the lengths of K, and of MAC-A/MAC-S, RES, CK and IK can be chosen by the operator, although they have to be fixed in each particular implementation of Tuak. In a flexible implementation (e.g. UICC), these operator-chosen parameter lengths could be loaded to the UICC in conjunction with the associated K and TOPC.

Where compatibility is required with existing 3GPP specifications, the operator shall set the length of the K to 128 bits, the length of the RES to between 32 and 128 bits, the length of the MAC-A/MAC-S to 64 bits, the length of CK to 128 bits, and the length of IK to 128 bits.

If an even more secure version of this algorithm is required, this could be done by adding extra applications of the Keccak permutation before extracting output. These would be used in the derivation of TOP C (clause 6.1), and each of the algorithms f1, f1*, f2-f5, f5* (clauses 6.2 to 6.5). In each case, instead of:

Construct IN

OUT = Π(IN)

Extract outputs

the approach could be

Construct IN

OUT = Π(Π(IN))

Extract outputs

or

Construct IN

OUT = Π(Π(Π(IN)))

Extract outputs

or however many extra applications of the permutation are required. Again, in a flexible implementation (e.g. UICC), the number of iterations of Π may be loaded to the UICC as an operator-chosen parameter.

7.3 Resistance to side channel attacks

When these algorithms are implemented on a UICC, consideration should be given to protecting them against side channel attacks such as differential power analysis (DPA). [4, 6, 7, 8, 9, 10, 11] may be useful references.

Annex A (normative):
Tuak diagrams

Figure A.1 Tuak operation

The first diagram illustrates the derivation of TOPC.

The second diagram illustrates the derivation of either MAC-A (using the f1 function) or MAC-S (using the f1* function), with different values of the INSTANCE byte in each case.

The third diagram illustrates the derivation of RES (using the f2 function), CK (using f3), IK (using f4) and AK (using f5) or alternatively the derivation of AK using f5* (in which case the other three outputs should be ignored).

In all cases it is assumed that just one iteration of the Keccak permutation is used (see clause 7.2).

Note the 512-bit "capacity" of Keccak: only zeroes are input to the rightmost 512 bits, and no outputs are extracted from the rightmost 512 bits.

Annex B (informative):
TuakApplication Programme Interface ( AP) in ANSI CI

/* ————————————————————-

Constants and Typedefs

————————————————————-

*/

typedef unsigned char uint8;

static const uint8 ALGONAME[] = "TUAK1.0";

uint8 TOP[32]; /* Operator’s Configuration */

uint8 KEY_sz; /* = 16/32 bytes */

uint8 RES_sz; /* = 4/8/16/32 bytes */

uint8 CK_sz; /* = 16/32 bytes */

uint8 IK_sz; /* = 16/32 bytes */

uint8 MAC_sz; /* = 8/16/32 bytes */

uint8 KeccakIterations; /* >=1, number of iterations */

/* ————————————————————-

TUAK API Declaration

————————————————————-

*/

void TUAK_ComputeTOPC( uint8 *key, /* in, uint8[KEY_sz] */

uint8 *TOPC /* out, uint8[32] */

);

void TUAK_f1 ( uint8 *key, /* in, uint8[KEY_sz] */

uint8 *rand, /* in, uint8[16] */

uint8 *sqn, /* in, uint8[6] */

uint8 *amf, /* in, uint8[2] */

uint8 *mac /* out, uint8[MAC_sz] */

);

void TUAK_f2345 ( uint8 *key, /* in, uint8[KEY_sz] */

uint8 *rand, /* in, uint8[16] */

uint8 *res, /* out, uint8[RES_sz] */

uint8 *ck, /* out, uint8[CK_sz] */

uint8 *ik, /* out, uint8[IK_sz] */

uint8 *ak /* out, uint8[6] */

);

void TUAK_f1s ( uint8 *key, /* in, uint8[KEY_sz] */

uint8 *rand, /* in, uint8[16] */

uint8 *sqn, /* in, uint8[6] */

uint8 *amf, /* in, uint8[2] */

uint8 *mac /* out, uint8[MAC_sz] */

);

void TUAK_f5s ( uint8 *key, /* in, uint8[KEY_sz] */

uint8 *rand, /* in, uint8[16] */

uint8 *ak /* out, uint8[6] */

);

Annex C (normative):
Specification of the Keccak permutation used within Tuak

The following specification of the permutation П is extracted from the Keccak reference specification [3], section 1.2.

The permutation П is described as a sequence of operations on a state a that is a three-dimensional 5 x 5 x 64 array of elements of GF(2). The expression a[x][y][z] with x, y ∈ Z5 and z ∈ Z64, denotes the bit in position (x, y, z), with the indices starting from zero. The array a is mapped to the bits of an input and output string s by s[64.(5y + x) + z] = a[x][y][z], so that s is a 1600-bit string, indexed from bit 0 up to bit 1599. Note that expressions in the x and y coordinates should be taken modulo 5 and expressions in the z coordinate modulo 64. It is possible to omit the [z] index, both the [y][z] indices, or all three indices, implying that the statement is valid for all values of the omitted indices.

П is an iterated permutation, consisting of a sequence of 24 rounds R, indexed by ir from 0 to 23. Each round consists of five steps, R = ι ◦ χ ◦ π ◦ ρ ◦ θ, performed in the following order:

θ : a[x][y][z] ← a[x][y][z] + Σ4y′=0 a[x − 1][y′][z] + Σ4y′=0 a[x + 1][y′][z − 1],

ρ : a[x][y][z] ← a[x][y][z − (t + 1)(t + 2)/2],

with t satisfying 0 ≤ t < 24 and

( ) ( ) = ( ) in GF(5)2×2

0 1 t 1 x

2 3 0 y

or t = 1 if x = y = 0,

so that t is given by the following table: and (t + 1)(t + 2)/2 Z64 is given by:

x=

0

1

2

3

4

x=

0

1

2

3

4

y=0

-1

0

18

6

12

y=0

0

1

62

28

27

y=1

7

23

2

9

22

y=1

36

44

6

55

20

y=2

1

3

17

16

20

y=2

3

10

43

25

39

y=3

13

8

4

5

15

y=3

41

45

15

21

8

y=4

19

10

21

14

11

y=4

18

2

61

56

14

π : a[x][y] ← a[x′][y′], with

x 0 1 x′

y 2 3 y′ ,

χ : a[x] ← a[x] + (a[x + 1] + 1)a[x + 2],

ι : a ← a + RC[ir].

GF(2). With the exception of

the value of the round constants RC[ir], these rounds are identical.

The round constants are given by (with the first index denoting the round number):

RC[ir][0][0][2j1] = rc[j + 7ir] for all 0 ≤ j ≤ 6,

RC[ir][x][y][z] = 0 for all other values of x, y, z

The values rc[t] GF(2) are defined as the output of a binary linear feedback shift register (LFSR):

rc[t] = ( xt mod x8 + x6 + x5 + x4 + 1) mod x in GF(2)[x],

so that rc[j + 7ir] is given by the following table:

j=

0

1

2

3

4

5

6

j=

0

1

2

3

4

5

6

ir=0

1

0

0

0

0

0

0

ir=12

1

1

1

1

1

1

0

ir=1

0

1

0

1

1

0

0

ir=13

1

1

1

1

0

0

1

ir=2

0

1

1

1

1

0

1

ir=14

1

0

1

1

1

0

1

ir=3

0

0

0

0

1

1

1

ir=15

1

1

0

0

1

0

1

ir=4

1

1

1

1

1

0

0

ir=16

0

1

0

0

1

0

1

ir=5

1

0

0

0

0

1

0

ir=17

0

0

0

1

0

0

1

ir=6

1

0

0

1

1

1

1

ir=18

0

1

1

0

1

0

0

ir=7

1

0

1

0

1

0

1

ir=19

0

1

1

0

0

1

1

ir=8

0

1

1

1

0

0

0

ir=20

1

0

0

1

1

1

1

ir=9

0

0

1

1

0

0

0

ir=21

0

0

0

1

1

0

1

ir=10

1

0

1

0

1

1

0

ir=22

1

0

0

0

0

1

0

ir=11

0

1

1

0

0

1

0

ir=23

0

0

1

0

1

1

1

Annex D (informative):
Example source code for Tuak (ANSI C)

Tuak is built on top of the Keccak function, as described in annex C. In the following example code, a Tuak configuration is defined as a set of global variables, so that these can be initialized or modified at run-time, and there is no need to fix them in the code as static constants. The read/write data to the Keccak state, including the padding, will depend on the Keccak state representation. Included in this example are 8-bit, 32-bit and 64-bit implementations of the Keccak function, each having a different representation of the state.

The Tuak f-functions do similar calculations in their core, therefore, it was possible to combine most of the computations in one function called TUAK_Main(). Lengths are given in bytes, rather than bits, in order to support 8-bit environments (e.g., length=256 does not fit in type uint8). All the example codes are endianness-free, i.e. can be used on both big and little endian machines.

/* This code may be freely used or adapted.

This implementation of TUAK is endianness-free.

It supports 64-bit, 32-bit and 8-bit environments.

*/

/* ———————————————————————

Constants, typedefs, macros, compilation settings

———————————————————————

*/

/* This macro selects Keccak_f implementation instance – 8/32/64-bit version */

#define KECCAK_VERSION_BITS 32

/* Depending on the version of Keccak we do:

– map KECCAK_F macro to relevant Keccak_f (8/32/64-bit) instance

– declare Keccak’s state INOUT[] as global, for simplicity

– define method for TUAK padding TUAK_ADD_PADDING()

*/

#if KECCAK_VERSION_BITS==64

static uint64 INOUT[25]; /* state to Keccak_f for 64-bit version */

extern void Keccak_f_64(uint64 *s);

# define KECCAK_F Keccak_f_64

# define TUAK_ADD_PADDING() INOUT[12] = 0x1FULL, INOUT[16] = (0x01ULL<<63)

#elif KECCAK_VERSION_BITS==32

static uint32 INOUT[50]; /* state to Keccak_f for 32-bit version */

extern void Keccak_f_32(uint32 *s);

# define KECCAK_F Keccak_f_32

# define TUAK_ADD_PADDING() INOUT[24] = 0x1FUL, INOUT[33] = 0x80000000

#elif KECCAK_VERSION_BITS==8

static uint8 INOUT[200]; /* state to Keccak_f for 8-bit version */

extern void Keccak_f_8(uint8 s[200]);

# define KECCAK_F Keccak_f_8

# define TUAK_ADD_PADDING() INOUT[96] = 0x1F, INOUT[135] = 0x80

#else

# error The requested version of Keccak_f is not implemented!

#endif

static const uint8 ALGONAME[] = "TUAK1.0";

void TUAK_ComputeTOPC(uint8*, uint8*);

/* ———————————————————————

TUAK Instance Configuration

if dynamic => can be set/modified on run-time

if constants => fixed instance of the algorithm

———————————————————————

*/

uint8 TOP[32]; /* Operator’s Configuration */

uint8 KEY_sz = 16; /* = 16/32 bytes */

uint8 RES_sz = 8; /* = 4/8/16/32 bytes */

uint8 CK_sz = 32; /* = 16/32 bytes */

uint8 IK_sz = 32; /* = 16/32 bytes */

uint8 MAC_sz = 16; /* = 8/16/32 bytes */

uint8 KeccakIterations = 1; /* >=1, number of Keccak_f iterations */

/* ———————————————————————

PUSH_DATA / PULL_DATA, TUAK_Main()

———————————————————————

*/

void PUSH_DATA(const uint8 * data, uint8 n, uint8 location)

{ while(n–)

#if KECCAK_VERSION_BITS==64

INOUT[location>>3] |= ((uint64)data[n]) << ((location++ & 7)<<3);

#elif KECCAK_VERSION_BITS==32

INOUT[location>>2] |= ((uint32)data[n]) << ((location++ & 3)<<3);

#elif KECCAK_VERSION_BITS==8

INOUT[location++] = data[n]; /* Note: reversed order of bytes */

#endif

}

void PULL_DATA(uint8 * data, uint8 n, uint8 location)

{ while(n–)

#if KECCAK_VERSION_BITS==64

data[n] = (uint8)(INOUT[location>>3] >> ((location++ & 7)<<3));

#elif KECCAK_VERSION_BITS==32

data[n] = (uint8)(INOUT[location>>2] >> ((location++ & 3)<<3));

#elif KECCAK_VERSION_BITS==8

data[n] = INOUT[location++]; /* Note: reversed order of bytes */

#endif

}

/* Universal function used by TUAK API functions */

void TUAK_Main ( uint8 instance, /* in, uint8 */

uint8 *rand, /* in, uint8[16] */

uint8 *amf, /* in, uint8[2] */

uint8 *sqn, /* in, uint8[6] */

uint8 *key /* in, uint8[16/32] */

)

{ uint8 i, TOPC[32];

TUAK_ComputeTOPC(key, TOPC); /* compute TOPC */

memset((uint8*)INOUT , 0, 200); /* clean INOUT */

PUSH_DATA(TOPC , 32, 0); /* TOPC */

PUSH_DATA(&instance , 1 , 32); /* INSTANCE */

PUSH_DATA(ALGONAME , 7 , 33); /* ALGONAME */

PUSH_DATA(rand , 16, 40); /* RAND */

if(amf) PUSH_DATA(amf, 2 , 56); /* AMF , if !=NULL */

if(sqn) PUSH_DATA(sqn, 6 , 58); /* SQN , if !=NULL */

PUSH_DATA(key, (instance & 1)?32:16, 64); /* KEY-128/256 bits */

TUAK_ADD_PADDING(); /* Padding bits 768-1087 */

for(i=0; i<KeccakIterations; ++i)

KECCAK_F(INOUT);

}

/* ———————————————————————

TUAK API Definition

———————————————————————

*/

void TUAK_ComputeTOPC( uint8 *key, /* in, uint8[16/32] */

uint8 *TOPC /* out, uint8[32] */

)

{ uint8 i, inst = KEY_sz>>5;

memset(INOUT, 0, 200);

PUSH_DATA(TOP , 32, 0 ); /* TOP */

PUSH_DATA(&inst , 1 , 32); /* INSTANCE for TOPC */

PUSH_DATA(ALGONAME , 7 , 33); /* ALGONAME */

PUSH_DATA(key , KEY_sz, 64); /* KEY-128/256 */

TUAK_ADD_PADDING(); /* Padding bits 768-1087 */

for(i=0; i<KeccakIterations; ++i)

KECCAK_F(INOUT);

PULL_DATA(TOPC, 32, 0); /* get the result */

}

void TUAK_f1 ( uint8 *key, /* in, uint8[KEY_sz] */

uint8 *rand, /* in, uint8[16] */

uint8 *sqn, /* in, uint8[6] */

uint8 *amf, /* in, uint8[2] */

uint8 *mac /* out, uint8[MAC_sz] */

)

{ TUAK_Main( (KEY_sz>>5) | MAC_sz, rand, amf, sqn, key);

PULL_DATA(mac, MAC_sz, 0);

}

void TUAK_f2345 ( uint8 *key, /* in, uint8[KEY_sz] */

uint8 *rand, /* in, uint8[16] */

uint8 *res, /* out, uint8[RES_sz] */

uint8 *ck, /* out, uint8[CK_sz] */

uint8 *ik, /* out, uint8[IK_sz] */

uint8 *ak /* out, uint8[6] */

)

{ TUAK_Main( (KEY_sz>>5) | ((IK_sz>>4)&0x02) | ((CK_sz>>3)&0x04)

| (RES_sz&0x38) | 0x40, rand, 0, 0, key);

PULL_DATA(res, RES_sz, 0 );

PULL_DATA(ck , CK_sz , 32);

PULL_DATA(ik , IK_sz , 64);

PULL_DATA(ak , 6 , 96);

}

void TUAK_f1s ( uint8 *key, /* in, uint8[KEY_sz] */

uint8 *rand, /* in, uint8[16] */

uint8 *sqn, /* in, uint8[6] */

uint8 *amf, /* in, uint8[2] */

uint8 *mac /* out, uint8[MAC_sz] */

)

{ TUAK_Main( (KEY_sz>>5) | MAC_sz | 0x80, rand, amf, sqn, key);

PULL_DATA(mac, MAC_sz, 0);

}

void TUAK_f5s ( uint8 *key, /* in, uint8[KEY_sz] */

uint8 *rand, /* in, uint8[16] */

uint8 *ak /* out, uint8[6] */

)

{ TUAK_Main( (KEY_sz>>5) | 0xc0, rand, 0, 0, key);

PULL_DATA(ak, 6, 96);

}

Annex E (informative):
Example source code for Keccak (ANSI C)

Comments to the example code for a 64-bit implementation of Keccak

This 64-bit implementation of the Keccak permutation follows the specification in annex C. It has 24 rounds of 5 basic steps in the order Theta, Rho, Pi, Chi, Iota. The state of Keccak is 1600 bits, represented as 25 blocks of 64 bits each (uint64 s[25]). Such a block is referred to as a "lane" in the Keccak reference specification [3].

In the example code, bit n=0..1599 of Keccak state is mapped to the state representation array as:

bit(n) = (s[n/64]>>(n%64))&1. The mapping between s[w] and a[x][y][z] is: a[x][y][z] = (s[5y+x]>>z)&1, for any triple x, y=0..4, z=0..63 (and w=5y+x).

An efficient way to implement Keccak is to code it as an update function. For the purpose of efficiency, there are three pre-computed constant tables Rho[25], Pi[25], Iota[24] (74 bytes in total). Run-time requires 43+sizeof(uint64*) bytes of stack, excluding possible alignment of input arguments.

In the Theta function, it is necessary to perform the following computation:

θ : a[x][y][z] ← a[x][y][z] + Σ4y′=0 a[x − 1][y′][z] + Σ4y′=0 a[x + 1][y′][z − 1].

To do this, first compute 5 sums Σ4y′=0 a[x][y′][z] for each x=0..4 (and all z=0..63) and store them in five temporary 64-bit variables (uint64 t[5]). Such an operation as a[x][y][z] a[x][y][z-1] is just a rotation of the corresponding 64-bit value of s by 1 to the left, and a[x][y][z] a[x′][y′][z] is a corresponding assignment s[w]s[w’]. This way, in the second loop of Theta, run over y=0..4 and update 25 words of s based on pre-computed temporary sums.

The Rho function is just a circular rotation of bits for each of 64-bit words individually of the state s[25]. The number of bits a word s[n] is to be rotated by is Rho[n].

The Pi function is a full-cycle permutation of 24 words of s, excluding s[0]. To avoid duplicating the state for this operation, remember the first word in the permutation chain s[1] somewhere in a temporary variable T, then perform s[1]  s[Pi[1]], where the value Pi[1] is the index of the word that has to go to s[1]. In the next step, assign s[Pi[1]]s[Pi[Pi[1]]], and so on. After 23 such steps the permutation is almost complete, except that the saved value should be located to the last referenced place s[Pi^23[1]]T.

In the Chi function, reuse 5 temporary 64-bit words to compute somewhat mixed Boolean expressions; the function is straightforward to implement.

The Iota function updates 7 bits indexed by 0, 1, 3, 7, 15, 31, 63, of only one word s[0], by XORing them with some constant values that depend on the current round index (r=0..23). Each cell of the pre-computed constant table Iota[24] encodes those 7 bits in one byte (uint8) in the following way.

Bits related to indices 0, 1, 3, 7 stay in their correct place of the byte Iota[r]. Bits related to indices 15, 31, 63 are mapped to bits 4, 5, 6 of the byte Iota[r], correspondingly. When XORing, the lower bits 0-7 appear well mapped (Iota[round]&0x8B), and the upper bits 31-63 are received by the relevant three shifts of that byte to the left to certain positions (note, the byte Iota[r] is first converted to uint64). It is also possible to OR all the shifts first, and then to mask those 7 bits before XORing the result to s[0]. This trick is possible since ORing does not overlap those 7 bits, but the result of ORing will, however, interfere some of the other 57 bits, and by the final masking with the constant 0x800000008000808BULL, remove that unwanted influence.

Comments to the example code for an 8-bit implementation of Keccak

The 8-bit implementation of Keccak is basically a step-by-step refactored version of the 64-bit version. The state of Keccak is now represented as 200 bytes of 8 bits each (uint8 s[200]). Each block of 8 bytes is a mapping of a relevant 64-bit word in the 64-bit version. The Keccak state is mapped to the state representation array as:
bit(n) = (s[n/8]>>(n%8))&1, for n=0..1599.

A 64-bit word (that is now represented as 8 bytes) can be rotated by 1-7 bits easily –just save the first byte of the 8-byte block, then shift and propagate 1-7 bits from one byte to the next, and, finally, complete the operation at the end by using the saved byte.

However, in the Rho function, the rotation parameter – r bits – can be any value in the range 1..63. Then perform the following technique. Split r as r=8*A+B, where A is the number of full bytes of a 64-bit word to be rotated, and B is the remaining number of bits of the rotation value r. In the first loop, rotate an 8-byte block of s by A bytes, and store the result in some other temporary 8-byte block (in t[0..7]). In the second step. copy that temporary block t[0..7] back to the relevant location of the state s, and, in parallel, perform the rotation of the block by B bits in the way mentioned earlier.

Comments to the example code for a 32-bit implementation of Keccak

The 32-bit implementation of Keccak is yet another step-by-step refactoring of the 64-bit version. The state of Keccak is represented as uint32[50], where each two 32-bit consecutive words are mapped to one 64-bit word. i.e., state64[k] = (state32[2*k+1]<<32) | state32[2*k], for k=0..24.

In the Rho function, every such pair of 32-bit words (A; B), where A=state32[2*k] and B=state32[2*k+1], for some k=0..24, needs to be rotated by n=0..63 bits. The resulting pair will be (A’; B’) = (A<<n | B>>(32-n); B<<n | A>>(32-n)), if n<32. In case n>=32 then take n%32 as the shifting parameter for each of the 32-bit words, and the resulting A’ and B’ are swapped. The refactoring of other functions is straightforward.

/* This code may be freely used or adapted.

*/

typedef unsigned char uint8;

typedef unsigned long uint32;

typedef unsigned long long uint64;

const uint8 Rho[25] = {0,1,62,28,27,36,44,6,55,20,3,10,43,25,39,41,45,

15,21,8,18,2,61,56,14};

const uint8 Pi[25] = {0,6,12,18,24,3,9,10,16,22,1,7,13,19,20,4,5,11,17,

23,2,8,14,15,21};

const uint8 Iota[24] = {1,146,218,112,155,33,241,89,138,136,57,42,187,203,

217,83,82,192,26,106,241,208,33,120};

#define ROTATE64(value, n) \

((((uint64)(value))<<(n)) | (((uint64)(value))>>(64-(n))))

/* ———————————————————————

64-bit version of Keccak_f(1600)

———————————————————————

*/

void Keccak_f_64(uint64 *s)

{ uint64 t[5];

uint8 i, j, round;

for(round=0; round<24; ++round)

{ /* Theta function */

for(i=0; i<5; ++i)

t[i] = s[i] ^ s[5+i] ^ s[10+i] ^ s[15+i] ^ s[20+i];

for(i=0; i<5; ++i, s+=5)

{ s[0] ^= t[4] ^ ROTATE64(t[1], 1);

s[1] ^= t[0] ^ ROTATE64(t[2], 1);

s[2] ^= t[1] ^ ROTATE64(t[3], 1);

s[3] ^= t[2] ^ ROTATE64(t[4], 1);

s[4] ^= t[3] ^ ROTATE64(t[0], 1);

}

s -= 25;

/* Rho function */

for(i=1; i<25; ++i)

s[i] = ROTATE64(s[i], Rho[i]);

/* Pi function */

for(t[1] = s[i=1]; (j=Pi[i]) > 1; s[i]=s[j], i=j);

s[i] = t[1];

/* Chi function */

for(i=0; i<5; ++i, s += 5)

{ t[0] = (~s[1]) & s[2];

t[1] = (~s[2]) & s[3];

t[2] = (~s[3]) & s[4];

t[3] = (~s[4]) & s[0];

t[4] = (~s[0]) & s[1];

for(j=0; j<5; ++j) s[j] ^= t[j];

}

s -= 25;

/* Iota function */

t[0] = Iota[round];

*s ^= (t[0] | (t[0]<<11) | (t[0]<<26) | (t[0]<<57))

& 0x800000008000808BULL; /* set & mask bits 0,1,3,7,15,31,63 */

}

}

/* ———————————————————————

8-bit version of Keccak_f(1600)

———————————————————————

*/

void Keccak_f_8(uint8 s[200])

{ uint8 t[40], i, j, k, round;

for(round=0; round<24; ++round)

{ /* Theta function */

for(i=0; i<40; ++i)

t[i]=s[i]^s[40+i]^s[80+i]^s[120+i]^s[160+i];

for(i=0; i<200; i+=8)

for(j = (i+32)%40, k=0; k<8; ++k)

s[i+k] ^= t[j+k];

for(i=0; i<40; t[i] = (t[i]<<1)|j, i+=8)

for(j = t[i+7]>>7, k=7; k; –k)

t[i+k] = (t[i+k]<<1)|(t[i+k-1]>>7);

for(i=0; i<200; i+=8)

for(j = (i+8)%40, k=0; k<8; ++k)

s[i+k] ^= t[j+k];

/* Rho function */

for(i=8; i<200; i+=8)

{ for(j = Rho[i>>3]>>3, k=0; k<8; ++k) /* j:=bytes to shift, s->t */

t[(k+j)&7] = s[i+k];

for(j = Rho[i>>3]&7, k=7; k; –k) /* j:=bits to shift, t->s */

s[i+k] = (t[k]<<j) | (t[k-1]>>(8-j));

s[i] = (t[0]<<j) | (t[7]>>(8-j));

}

/* Pi function */

for(k=8; k<16; ++k) t[k] = s[k]; /* =memcpy(t+8, s+8, 8) */

for(i=1; (j=Pi[i])>1; i=j)

for(k=0; k<8; ++k) /* =memcpy(s+(i<<3), s+(j<<3), 8) */

s[(i<<3)|k] = s[(j<<3)|k];

for(k=0; k<8; ++k) /* =memcpy(s+(i<<3), t+8, 8) */

s[(i<<3)|k] = t[k+8];

/* Chi function */

for(i=0; i<200; i+=40)

{ for(j=0; j<40; ++j)

t[j]=(~s[i+(j+8)%40]) & s[i+(j+16)%40];

for(j=0; j<40; ++j) s[i+j]^=t[j];

}

/* Iota function */

k = Iota[round];

s[0] ^= k & 0x8B; /* bits 0, 1, 3, 7 */

s[1] ^= (k<<3)&0x80; /* bit 15 */

s[3] ^= (k<<2)&0x80; /* bit 31 */

s[7] ^= (k<<1)&0x80; /* bit 63 */

}

}

/* ———————————————————————

32-bit version of Keccak_f(1600)

———————————————————————

*/

void Keccak_f_32(uint32 *s)

{ uint32 t[10];

uint8 i, j, round, k;

for(round=0; round<24; ++round)

{ /* Theta function */

for(i=0; i<10; ++i)

t[i] = s[i] ^ s[10+i] ^ s[20+i] ^ s[30+i] ^ s[40+i];

for(i=0; i<5; ++i)

for(j=8, k=2; ; j%=10, k=(k+2)%10)

{ *s++ ^= t[j++] ^ ((t[k]<<1)|(t[k+1]>>31));

*s++ ^= t[j++] ^ ((t[k+1]<<1)|(t[k]>>31));

if(j==8) break;

}

s -= 50;

/* Rho function */

for(i=2; i<50; i+=2)

{ k = Rho[i>>1] & 0x1f;

t[0] = (s[i+1] << k) | (s[i] >> (32-k));

t[1] = (s[i] << k) | (s[i+1] >> (32-k));

k = Rho[i>>1] >> 5;

s[i] = t[1-k], s[i+1] = t[k];

}

/* Pi function */

for(i=2, t[0]=s[2], t[1]=s[3]; (j=(Pi[i>>1]<<1))>2; i=j)

s[i]=s[j], s[i+1]=s[j+1];

s[i]=t[0], s[i+1]=t[1];

/* Chi function */

for(i=0; i<5; ++i, s+=10)

{ for(j=0; j<10; ++j)

t[j] = (~s[(j+2)%10]) & s[(j+4)%10];

for(j=0; j<10; ++j)

s[j] ^= t[j];

}

s -= 50;

/* Iota function */

t[0] = Iota[round];

s[0] ^= (t[0] | (t[0]<<11) | (t[0]<<26)) & 0x8000808B;

s[1] ^= (t[0]<<25) & 0x80000000;

}

}

Annex F (informative):
Change history

Change history

Date

Meeting

TDoc

CR

Rev

Cat

Subject/Comment

New version

2013-12

Version after approval

12.0.0

2013-12

Update in Introduction with the spec numbers

12.0.1

2014-09

SA#64

SP-140316

0001

2

Overall editorial modification to the Tuak specification TS 35.231

12.1.0

2016-01

0002

1

F

Update to Rel-13 version (MCC).

13.0.0

2017-03

SA#75

Promotion to Release 14 without technical change

14.0.0

2018-06

Update to Rel-15 version (MCC)

15.0.0

2019-09

SA#85

SP-190683

0005

1

A

Removing references of TS 103 383 in TS 35.231

15.1.0

2020-07

Update to Rel-16 version (MCC)

16.0.0

2022-03

Update to Rel-17 version (MCC)

17.0.0