## Friday, November 23, 2018

### [dlhmfrld] Compact factorization challenges

Find short mathematical expressions for numbers which factor into 2 (-ish) large primes.  After some experimentation, the best way I found to do this is to sample a short expression template, for example, x!+y, reject immediately if it is prime, then do the elliptic curve factorization method with GMP ECM for "enough" iterations that you are convinced that it has no small factors.  We have to test for primality first because GMP ECM 7.0.4 has a bug/feature in which a prime as input or a composite it is unable to factor as input both have the same behavior.  (Update: the -primetest flag will do a primality test, though it confusingly doesn't explicitly say an input is prime when it encounters one; the program just exits.)

Here are some semiprimes found in the vicinity of factorials:

58! -874271 -811073 -716207 -566851 -347447 -297251 -156679 +496289 +634507 +868877 +869591

59! -737489 -257801 -241333 -129607 -55577 +29591 +98431 +602489 +768533

64! -907363 -714277 -699767 -586543 -530597 -518503 -503731 -489691 -472357 -333743 -305593 -294401 -278609 -253927 -231293 -130909 +15811 +59497 +114913 +183017 +184361 +314231 +461207 +483209 +491461 +494149 +497347 +501521 +607531 +641489 +757937 +787057 +842293 +865429 +936437 +969907

70! -810457 -689837 -638003 -482663 -424747 -317321 -275377 -122443 -26489 -10429 +292937 +339613 +362507 +373871 +419429 +432697 +515173 +726217

82! -862931 -725869 -722011 -678467 -659947 -581293 -487349 -338531 -260759 -253423 -106399 -97577 -77291 -14957 +151493 +218371 +327263 +359707 +398471 +45649 +487013 +492197 +610921 +656143 +695887 +761489 +900821 +936959 +985513

58! is 261 bits or 79 digits.  59! is 267 bits 81 digits.  64! is 296 bits or 90 digits.  70! is 333 bits or 101 digits.  82! is 408 bits or 123 digits.  This method does not find numbers large enough to do RSA cryptography securely.  To do RSA with these numbers (perhaps for toy purposes), you need to factor these numbers first, which is not too difficult at these sizes with tools like Pari/GP or Yafu.  (Cado-NFS is new; I have yet to try it.)  Previously on factoring numbers of approximately these sizes.  The quadratic sieve (msieve) or number field sieve (ggnfs) invoked by Yafu is much faster than the quadratic sieve used by Pari/GP.

Finding pairs of large prime numbers which multiply to a semiprime which is compactly expressible and large enough for secure RSA but does not give away the factorization in that compact expression seems like an impossible problem.  If it could be solved, people could publish compact RSA public keys.

We use the form x!+y because factorial is the fastest growing function with a commonly known short notation.  (Incidentally, n!>9^n for n>21 and 9^9^4 < 9999! < 9^9^5.)  The region immediately around a factorial, between n!-n and n!+n, is free of primes with the possible exceptions of offsets -1 and +1 (as seen in the proof that there are arbitrarily long prime gaps) with all numbers in that range having a small factor.  Research question: does the region slightly beyond that also have a statistically unusual (perhaps depleted) distribution of primes?  OEIS A073308 and A073443 seem to be less common than the factorial primes A002982 and A002981.

Here is a bash script which spawns `nproc` copies of GMP ECM, one for each processor.

nproc=4 ; for p in \$(seq 1 \$nproc) ; do for i in \$(seq -f "%03.0f" 0 999) ; do if [ -e stop ] ; then break ; fi ; fn=a\${i}p\$p.out ; echo \$fn ; perl -lwe '\$r=999999;\$i=int(rand(2*\$r+1))-\$r;\$i="+".\$i if\$i>=0;print "82!\$i"' | nice -19 ecm -one -c 2000 -I 1 1e5 > \$fn ; done & done

This script should be modified to do -primetest.

Then `grep -l 'Factor found' *.out | xargs -i mv '{}' ...` is useful to move aside those which were factored.

For numbers around 58! we did 1000 iterations starting at B1=100000 (then autoincrement with the -I flag) as indicated above, but of course exiting early (the -one flag) if a small factor was found.  For the larger factorials, we did 2000 iterations.

Are expressions of this form amenable to the special number field sieve?  Usually we see SNFS applied to numbers offset by only 1 from a smooth form, e.g., a^b+-1.  Can SNFS do larger offsets?

In earlier experimentation of this project, we tried using Pari/GP's threshold of "enough" first-stage ECM (the point where it would switch over to MPQS if we hadn't given the flag 9 to factorint).  We switched to using GMP ECM because the latter was faster and more convenient.

The original motivation was to provide a guest access WiFi access point whose name is a puzzle and whose password is the solution to the puzzle.  We wanted a puzzle hard enough to deter casual wifi scanning users but solvable by those willing to put in some effort.

Wifi access point names are constrained to 32 characters or less; WPA2 PSK passwords must be 63 characters are less.  An AP name like "Prime factor of 58!-874271" weighs in at 26 characters, just under the 32-character maximum, which is why we cared that the numerical expression be compactly expressible.  The smaller factor is guaranteed to be 40 digits or less (because 58! is 79 digits), well under the 63-character maximum.  Of course, there are 2 solutions (because 2 factors): a user not knowing that the smaller one is the desired password would try both, because WPA2-PSK has no penalty for incorrect password attempts.  It is not guaranteed that these factorization puzzles have exactly 2 factors; some numbers with 3 factors might have snuck in because ECM is a randomized algorithm, which is why we said at the outset that these numbers factor into 2-ish large primes  This is not a problem for this application; the user can try all 3 (or more) as possible passwords.

To be clear, the wifi administrator does need to first solve a puzzle in order to set the password.

Elliptic curve public-key cryptography promises compactness compared to RSA.  However, directly using a famous curve like Curve25519 yields a 32-byte public key (which becomes longer than 32 characters after the binary bytes are encoded to something human-readable) that someone would have to crack, which is too long for an access point name.  We would have to shrink it somehow as we have done with RSA.

Just for fun, the following numbers (615 digits, 2043 bits) also have no small factors:

300! -825533 -803911 -450809 -354839 +38183 +313151 +892433

We did GMP ECM with autoincrement up to B1=27900000.  Each required about 1 CPU month of computation.