large memory buffer, zeroed out.
sponge such as Keccak, initialized by absorbing some initialization string.
squeeze an address out of the sponge. read the memory at that address and absorb the data into the sponge. squeeze out data and write it to the same memory address. repeat until most of the memory has probably been read and written many times.
how much data to squeeze from the sponge and write to memory remains to be specified; it need not be the same amount as was read from memory. it might make sense to squeeze out a whole block (the rate: 1600 bits minus twice the output size for the SHA-3 instances). if using the same sponge as SHA3-224, the block is 1152 bits (144 bytes, 18 64-bit words); if using the same sponge as SHA3-512, it is 576 bits (72 bytes, 9 words).
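a minimal sketch of this basic loop, assuming a hypothetical Sponge type with absorb() and squeeze() methods that work one 64-bit word at a time (the Sponge type and the scramble name are illustrative, not a real library API; it squeezes one word per cycle, though per the above a whole block might be better):

#include <cstdint>
#include <vector>

// hypothetical sponge interface; a real implementation would wrap Keccak-f[1600]
struct Sponge {
    void absorb(uint64_t word); // absorb one 64-bit word into the state
    uint64_t squeeze();         // squeeze one 64-bit word out of the state
};

// basic (non-pipelined) loop: squeeze address, read and absorb, squeeze and write
void scramble(Sponge &sponge, std::vector<uint64_t> &memory, long maxcycles) {
    for (long c = 0; c < maxcycles; ++c) {
        uint64_t address = sponge.squeeze() % memory.size(); // truncate to memory size
        sponge.absorb(memory[address]);                      // read and absorb
        memory[address] = sponge.squeeze();                  // squeeze and overwrite
    }
}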
the memory access pattern depends on the initialization string, so this is vulnerable to side-channel attack. the initialization string should not be a password if side-channel attacks are a threat. this construction may be useful for proof-of-work, or for processing a salt.
fancier: interleave many memory operations (but still just one sponge) to keep the CPU busy while waiting for a memory read to complete. DEPTH is the pipeline depth, the amount of interleaving. details such as truncating the address to the size of memory are omitted.
uint64_t r[DEPTH] = {...}; // array of registers holding pending reads
sponge.absorb(...); // initialize sponge
uint64_t address = ...; // initial address
for (long c = 0; c < MAXCYCLES; ++c) {
  for (int i = 0; i < DEPTH; ++i) { // unroll this loop
    sponge.absorb(r[i]); // absorb the read issued DEPTH iterations ago
    uint64_t data = sponge.squeeze();
    memory[address] = data; // write to the previous iteration's address
    address = sponge.squeeze();
    r[i] = memory[address]; // issue the next read
  }
}
we specify r to be an array of registers and the loop to be unrolled so that an out-of-order processor can tell that the r[i] are independent across iterations of the loop. the next iteration of the i loop can then begin even if the previous iteration's read from memory has not yet completed. is there a better way to do this?
note that we reuse "address" from the previous i iteration: a memory write to that location gets launched soon after the memory read of the same location from the previous iteration, which likely hasn't completed yet. would there be benefit in instead ensuring that writes and reads to the same address don't happen so close in time? or, maybe, in avoiding consecutive operations on the same memory bank?
the result, the final state of the sponge, depends on the pipeline depth DEPTH.
do things still work if virtual memory is used, swapping to a very slow disk? use a very deep pipeline.
we could implement the pipeline explicitly with two threads, one doing the memory operations in order and the other doing the sponge operations in order, connected by bounded queues, as in the sketch below.
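a sketch of this two-thread pipeline, reusing the hypothetical Sponge interface from the earlier sketch. the capacity of the bounded queue plays the role of DEPTH, and, unlike the unrolled loop above, the first DEPTH absorbs are simply skipped instead of being fed initial r values (Channel, Op, and run are all illustrative names):

#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// simple bounded blocking queue connecting the two threads
template <typename T>
class Channel {
    std::queue<T> q;
    std::mutex m;
    std::condition_variable not_full, not_empty;
    size_t cap;
public:
    explicit Channel(size_t cap) : cap(cap) {}
    void push(T v) {
        std::unique_lock<std::mutex> lk(m);
        not_full.wait(lk, [&]{ return q.size() < cap; });
        q.push(v);
        not_empty.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lk(m);
        not_empty.wait(lk, [&]{ return !q.empty(); });
        T v = q.front(); q.pop();
        not_full.notify_one();
        return v;
    }
};

struct Op { uint64_t write_data, read_address; };

void run(Sponge &sponge, std::vector<uint64_t> &memory, long maxcycles, long depth) {
    Channel<Op> to_mem(depth);          // capacity = pipeline depth
    Channel<uint64_t> to_sponge(depth); // read results flowing back
    uint64_t address0 = sponge.squeeze() % memory.size();

    // memory thread: performs the memory operations in order
    std::thread mem([&] {
        uint64_t address = address0;
        for (long c = 0; c < maxcycles; ++c) {
            Op op = to_mem.pop();
            memory[address] = op.write_data;           // write to the previous address
            address = op.read_address % memory.size(); // truncate to memory size
            to_sponge.push(memory[address]);           // read and send back
        }
    });

    // sponge thread (the calling thread): sponge operations in order;
    // read results are absorbed with a lag of `depth` cycles, mirroring r[i]
    for (long c = 0; c < maxcycles; ++c) {
        if (c >= depth)
            sponge.absorb(to_sponge.pop()); // result of cycle c - depth
        uint64_t data = sponge.squeeze();
        uint64_t next = sponge.squeeze();
        to_mem.push(Op{data, next});
    }
    for (long i = 0; i < depth && i < maxcycles; ++i)
        sponge.absorb(to_sponge.pop()); // drain the pipeline
    mem.join();
}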
how can we determine the proper value of DEPTH? is there harm in making DEPTH too large?
we expect the sponge to fit in cache, so it should not be a memory-latency bottleneck. the SHA3 internal state is 200 bytes. (previously, we attempted to make the sponge internal state larger.)
the maximum amount of memory is that which can be addressed by one sponge block, so, if using the same sponge as SHA3-224, 2^1152 words = 2^1155 bytes. the motivation was to create a memory-hard function that can scale beyond Argon2's 4 TiB = 2^42 byte limit (future post hskhldmx).
what is a good value of MAXCYCLES? we want to thwart time-memory trade-offs (TMTO).