Sunday, August 27, 2006

foldM

We first had some code like this:

(: test3 :fun (IO :unit)()
      (:do
       (:= input getContents)
       (:= db (cascade-substitutions 1 
         (:rpipe input lines (map words) 
                 (map create-line))))

       (:= db (cascade-substitutions 2 db))
       (:= db (cascade-substitutions 3 db))
       (return :nothing)
       ))

or in Haskell:

test3 = (do{
  input<-getContents;
  db<-(cascade_substitutions 1 
    ((map create_line)((map words)(lines(input)))));
  db<-(cascade_substitutions 2 db);
  db<-(cascade_substitutions 3 db);
  (return ());
});

cascade-substitutions runs in the IO monad and produces output, as well as updates the db.

However, cut-and-paste growing the code results in the compiler (ghc 6.4) quickly running out of memory after about 1000 repetitions of the cascade-substitutions line.

After manually rewriting the do-notation we get

(>>=
       (>>=
        (>>=
         (>>= getContents 
              (:lambda f1 ((input))
                (cascade-substitutions 1 
                    (create-many-lines input))))
         (:lambda f2 ((db))
                  (cascade-substitutions 2 db)))
        (:lambda f3 ((db))
                 (cascade-substitutions 3 db)))
       (:lambda finish ((_)) (return :nothing)))

or in Haskell:

((>>=) ((>>=) ((>>=) ((>>=) getContents (\ (input) -> (cascade_substitutions 1 (create_many_lines input)))) (\ (db) -> (cascade_substitutions 2 db))) (\ (db) -> (cascade_substitutions 3 db))) (\ (_) -> (return ())))

We note that >>= is left-associative. We can put the inner loop in a list.

(:mlist (:lambda f2 ((db)) (cascade-substitutions 2 db)) (:lambda f3 ((db)) (cascade-substitutions 3 db)))

We can then generate that list with an map that generates a list of lambda functions, and (foldl >>=) them.

(>>=(foldl >>= (>>= getContents 
       (:lambda f1 ((input))
            (cascade-substitutions 1 
                 (create-many-lines input))))
                 (map
                  (:lambda create-f ((n))
                      (:lambda f ((db))
                          (cascade-substitutions n db)))
                  (enumFromTo 2 10000))
                 )
          (:lambda finish ((_)) (return :nothing)))

or in Haskell:

((>>=) (foldl (>>=) ((>>=) getContents (\ (input) -> (cascade_substitutions 1 (create_many_lines input)))) (map (\ (n) -> (\ (db) -> (cascade_substitutions n db))) (enumFromTo 2 10000))) ((_) -> (return ())))

On one hand, I'm surprised there isn't already a "foldl bind", however this has a problem: If (enumFromTo 2 10000) is replaced with (enumFrom 2), i.e., the infinite list [2..], then the program runs out of memory during execution before producing any output. Let's try foldM.

(: test5 :fun (IO :unit) ()
      (:do
       (:= input getContents)
       (:= _ (foldM (flip cascade-substitutions)
                    (create-many-lines input)
                    (enumFrom 1 )))
       (return :nothing)
       ))

or in Haskell:

test5 = (do{
  input<-getContents;
  _<-(foldM (flip cascade_substitutions) 
            (create_many_lines input) (enumFrom 1));
  (return ());
})

This one runs well without running out of memory. foldM is the answer.

Additional keywords: sequence, state, foldr, sequence_

No comments :