Sunday, January 06, 2019

[qqcvkken] Million-bit RSA

The following Haskell program deterministically prints three 500000-bit (150515-digit) prime numbers not of any special form.

module Main(main) where {
import Data.Function((&)); import Control.Category((>>>)); import System.Random.TF(seedTFGen); {- tf-random-0.5 is known to work -} import Data.Word(Word64,Word8); import Data.List(foldl'); import System.Random(randoms); import Data.Bits((.|.)); import GHC.Integer(orInteger);

main :: IO();
main = if tf_do_validation == tf_validation_correct_result
then mapM_ (gen_int >>> print) primes
else error "prng validation failed";

newtype Offset = Offset Int deriving(Show);
type Bits256 = (Word64,Word64,Word64,Word64);
newtype TFseed = TFseed Bits256 deriving (Show);
data GenParams = GenParams Offset TFseed deriving (Show);

{- These parameters were found through extensive random search. -}
primes :: [GenParams];
primes =
[GenParams (Offset 6) (TFseed (17713156516097407808, 14297790823966464650, 5518682532734221150, 7708414198874846232))
,GenParams (Offset 4) (TFseed (11689060907666845202, 4518458191795426877, 4522717457826316412, 15146264857744232431))
,GenParams (Offset 21) (TFseed (13505286482281835764, 3738884481498903559, 6913871810196357451, 4212719527986366550))];

numbytes :: Int;
numbytes = div 500000 8;

gen_int :: GenParams -> Integer;
gen_int (GenParams (Offset offset) (TFseed seed)) =
seedTFGen seed
& randoms
& drop (numbytes*offset)
& take numbytes
& set_highbit
& foldl' (\i x -> i*256+fromIntegral x) 0 {- radix conversion from base 256 -}
& orInteger 1; {- set low bit, i.e., make odd -}

set_highbit :: [Word8] -> [Word8];
set_highbit s = (head s .|. 0x80) : tail s;

{- We validate to catch potential changes in the behavior of future versions of the PRNG -}
tf_do_validation :: [Word8];
tf_do_validation = seedTFGen tf_validation_seed & randoms & take validation_size;

tf_validation_seed :: Bits256;
tf_validation_seed = get_seed $ head primes;

get_seed :: GenParams -> Bits256;
get_seed (GenParams _ (TFseed seed)) = seed;

validation_size :: Int;
validation_size = 100;

tf_validation_correct_result :: [Word8];
tf_validation_correct_result = [208, 21, 0, 166, 195, 123, 224, 100, 74, 12, 118, 155, 94, 65, 238, 46, 158, 61, 191, 253, 46, 9, 1, 189, 159, 67, 137, 206, 41, 39, 135, 146, 5, 247, 84, 189, 73, 236, 235, 134, 236, 159, 59, 241, 8, 32, 100, 166, 34, 142, 205, 3, 179, 202, 127, 160, 30, 186, 145, 152, 231, 191, 45, 22, 165, 190, 243, 91, 54, 1, 87, 36, 61, 31, 173, 136, 43, 89, 83, 230, 101, 133, 190, 169, 32, 173, 101, 89, 5, 166, 61, 86, 171, 211, 231, 235, 63, 155, 245, 160];
}

(This incidentally demonstrates how arbitrarily large pseudo randomly generated prime numbers can be "compressed" by storing the seed and program used to generate them.)

The first and last 10 digits of the three prime numbers are as follows.  We elide approximately 150,000 digits from the middle of each number.

9640823696...2708146347
7144631805...4559093721
5084291883...7493881601

These prime numbers were found through extensive random search.  Each attempt first sampled a 256-bit random number seed from /dev/urandom, then repeatedly sampled 500000-bit pseudorandom bitstrings from the seeded PRNG.  Each bitstring then had its Least and Most Significant Bits set to 1, forming an odd 500000-bit number.  (Note how we are not searching for primes by testing consecutive odd numbers, because that selects primes preceded by unusually large prime gaps.)  Each number was first tested with trial division through the first 28 million prime numbers, that is, all primes less than 533,000,401.  Once one number passing trial division was found, it was passed on to Pari/GP and tested with its ispseudoprime(), concluding the attempt with that seed.  We do our own trial division before the call to ispseudoprime() because ispseudoprime() by itself only does divisibility testing up to 101 before switching to BPSW.  The optimal amount of trial division is described in this post.

We tested 354704 odd numbers for primality, of which all but 19766 were eliminated by trial division.  Finding a number not eliminated by trial division test took 2 to 4 minutes, depending on computer speed.  A BPSW rejection (almost certainly its first-stage base-2 Miller-Rabin test failing) with ispseudoprime() took 40 to 70 minutes, depending on computer speed.  Not all 19766 candidates were tested with ispseudoprime(); about 20 were interrupted due to machines going down, etc.  The computation was done over several months across many computers.

According to the Prime Number Theorem, we should find a half-million bit prime once every log(2^500000)/2 = 173286 odd numbers, so the expected number of successes in 354704 tries is 2.05.  We were a bit lucky to find 3.

We used the Threefish-based tf-random random number generator, version 0.5 with its 256-bit seed to avoid birthday-type collisions of separate random runs accidentally choosing the same seed.  In retrospect, we didn't need all 256 bits of entropy to avoid collisions; 64 bits would have been sufficient: we could have set the rest of the words to 0 to have shorter code above.  Also, in the code above, we do some validation of the PRNG to verify that it is functioning the same as it did when we originally generated the primes.

In retrospect, we should have considered GMP ECM's -primetest flag, which is about 25% faster than Pari/GP's ispseudoprime() for a failing test of primality on a number of this size.

Of course, the fun thing to do with half-million-bit prime numbers is to multiply two of them and form a million-bit RSA public and private key pair.  We do so with the first 2 prime numbers above:

? #
timer = 1 (on)
? allocatemem(50*10^6)
*** Warning: new stack size = 50000000 (47.684 Mbytes).
? p=96408236965415345708875393107808886322713415419852952002...
time = 100 ms.
? ispseudoprime(p)
time = 3h, 36min, 22,319 ms.
1
? q=71446318054747116524562445875606788566702601347360872663...
time = 101 ms.
? ispseudoprime(q)
time = 3h, 36min, 36,438 ms.
1
? n=p*q;
time = 2 ms.
? print("n=",n)
n=6888013561328492774669469263192786297690177936933339147055...
time = 109 ms.
? addprimes([p,q]);
time = 0 ms.
? f=eulerphi(n);
time = 2,768 ms.
? print("f=",f)
f=6888013561328492774669469263192786297690177936933339147055...
time = 73 ms.
? e=65537;
time = 0 ms.
? d=lift(Mod(e,f)^(-1));
time = 0 ms.
? print("d=",d);
d=1168094095253137444461548154342197947915355258725256439574...
time = 56 ms.
? numdigits=floor(log(n)/log(10))
time = 0 ms.
301029
? plaintext=10^numdigits;
time = 4 ms.
? ciphertext=Mod(plaintext,n)^e;
time = 268 ms.
? print("ciphertext=",lift(ciphertext));
ciphertext=3741633683418547797066016235359093753574537162813...
time = 78 ms.
? decipher=lift(ciphertext^d);
time = 4h, 18min, 11,928 ms.
? print("plaintext=",decipher);
plaintext=10000000000000000000000000000000000000000000000000...
time = 72 ms.
? print("difference=",decipher-plaintext);
difference=0
time = 0 ms.

The RSA public key is (n,e), and the private key is (n,d).

Here are the complete scripts and log files of key generation, which give the primes p and q printed out in their entirety.  The files are best read without line wrapping.  Firefox now wraps lines in text files by default.  To disable it in Desktop Firefox, modify View -> Page Style.  (Don't know how to do it on mobile.)  Other ways: less -S on the command line, and Emacs M-x toggle-truncate-lines.

As seen above, encryption with a million-bit public key is very quick: 268 milliseconds.  Decryption is painfully slow: 4 hours. (Previously, demonstrating a 16384-bit key.)

The public encryption exponent, following tradition, is 65537.  It should be noted that the modulus n=p*q satisfies 39199^65537 < n < 39200^65537.  Therefore, very short one and two-byte messages won't complete even one loop of modular arithmetic, so they would be trivial to decipher: just take the 65537th root in Z (not Zn).  This kind of problem only occurs with gargantuan keys like ours.  It is not a problem for keys smaller than 65537 bits, e.g., the common 2048-bit RSA key size.  Perhaps we should have chosen an exponent greater than 1000000, maybe 1048609.  Of course, you should be properly padding your RSA plaintexts anyway, so this should never be a problem.

We considered publishing as "the most secure RSA key ever" only the million-bit RSA public key and not the constituent primes or private key, but because the prime search was not done on secure computers, the key is not anywhere near as practically secure as its size would suggest.  Furthermore, the key is at most only as strong as the 256 bits of entropy that went into seeding the random number generator, and tf-random has a warning not to use it for cryptographic applications.

2 comments :

Unknown said...

> We considered publishing as "the most secure RSA key ever" only the million-bit RSA public key and not the constituent primes or private key

if you mean `longest` with `secure`, then you have been beaten to it: https://eprint.iacr.org/2017/351, who created a 1-terabyte public key.

Ken said...

Thanks for the reference; I was not aware of their work. They took a rather different approach in constructing their large key.