Thursday, January 16, 2020

[mkbewoae] String to PRNG

Here is a Haskell function that seeds a tf-random pseudo random number generator with a String.  Unlike previously when we used PBKDF2, this time we use straight unsalted unstretched SHA-256 because this is not intended for cryptographic application (consistent with the caveat given in the documentation to tf-random).  Also, this time we use cereal instead of binary to avoid lazy ByteStrings.

We do Unicode normalization because we want different representations of the same String to result in the same random number generator.  We chose normalization method NFC arbitrarily.

It's cool to glue functions from about 5 different packages together so nicely.

For pedagogical purposes, we provide many type annotations, and we also often use fully qualified package names.

Data.Serialize.decode "just works" to unpack a 256-bit (32 byte) ByteString into a (Word64, Word64, Word64, Word64) tuple, the seed type for TFGen, because there are built-in decoders for Word64 and (a,b,c,d).  Neither have metadata.  It happens to do big-endian for Word64, but we do not care about endianness for this application, so long as it is consistent.

{-# LANGUAGE PackageImports #-}
import Control.Category((>>>));
import Prelude hiding((.),(>>)); --optional
import qualified System.Random.TF;
import qualified Data.Text;
import qualified Data.Text.Encoding;
import qualified Data.Text.ICU;
import qualified "cryptohash-sha256" Crypto.Hash.SHA256; -- PackageImports
import qualified Data.Serialize;
import qualified Data.ByteString as Strict;
import qualified Data.Either.Combinators;
import Data.Word(Word64);

rnginit :: String -> System.Random.TF.TFGen;
rnginit = (Data.Text.pack :: String -> Data.Text.Text)
>>> (Data.Text.ICU.normalize Data.Text.ICU.NFC :: Data.Text.Text -> Data.Text.Text)
>>> (Data.Text.Encoding.encodeUtf8 :: Data.Text.Text -> Strict.ByteString)
>>> (Crypto.Hash.SHA256.hash :: Strict.ByteString -> Strict.ByteString)
>>> (Data.Serialize.decode :: Strict.ByteString -> Either String (Word64,Word64,Word64,Word64))
>>> (Data.Either.Combinators.fromRight' :: Either String (Word64,Word64,Word64,Word64) -> (Word64,Word64,Word64,Word64))
>>> (System.Random.TF.seedTFGen :: (Word64,Word64,Word64,Word64) -> System.Random.TF.TFGen);

Here's a quick test, rolling a d6 die 10 times:

*Main> take 10 $ System.Random.randomRs (1,6) (rnginit "foo")

Connecting the random number generator to the random-fu package gives us access to many more probability distributions.  random-fu provides entropy source instances (MonadRandom and RandomSource) for StdGen but not general RandomGen (but this seems to be being worked on).  We thought of writing an instance for TFGen, but because of the way instances in Haskell work, if anyone else in the universe (only a bit of an exaggeration) also ever writes an instance for TFGen, the instances will overlap or conflict -- there's no way to mask someone else's instance.

Therefore, we instead use the GetPrim hook to locally create a wrapped Control.Monad.State.Strict monad to be a RandomSource.  Fortunately, getRandomPrimFromRandomGenState is polymorphic over RandomGen.

{-# LANGUAGE ScopedTypeVariables #-}
import Data.Function((&));
import Data.RVar(runRVar,RVar);
import Control.Monad.State.Strict(evalState,State); -- lazy state monad also works
import Data.Random.Internal.Source(GetPrim(GetPrim));
import Data.Random.Source.StdGen(getRandomPrimFromRandomGenState);
import qualified System.Random.TF as TF;

-- at least one of the GetPrim type annotations is required
rgen :: forall a . RVar a -> TF.TFGen -> a;
rgen r gen = (GetPrim getRandomPrimFromRandomGenState :: GetPrim (State TF.TFGen))
& (runRVar r :: GetPrim (State TF.TFGen) -> State TF.TFGen a)
& ((\state -> evalState state gen) :: State TF.TFGen a -> a);

stringr :: RVar a -> String -> a;
stringr r = rnginit >>> rgen r;

Here's a quick test, generating a standard normal deviate from a TFGen initialized with the string "foo":

*Main> stringr (stdNormal :: RVar Double) "foo"

No comments :