There are many different ways to code Eratosthenes sieve, depending on how much memory you want to use. One method is to keep a list of primes found so far, and check the current number for divisibility by those primes up to the square root of current number. If the current number is prime, add it to the list of primes found so far.
The straightforward implementation uses a lot of memory: the list of primes found so far grows quite quickly. We observe that the list of primes have a "leading edge" and "trailing edge". At the leading edge are the most current prime found. At the trailing edge is the prime approximately at the square root of the current number. Only the primes between zero and the trailing edge are needed for the divisibility test. By storing only those primes, we can save N/log(N)-sqrt(N) memory, a tremendous savings.
However, it appears that when we discover a prime on the leading edge, we must forget it (to save memory) and somehow unforget it when it exists within the trailing edge. There is no free lunch: the solution is to run two parallel sieves, one up to the leading edge and the other up to the trailing edge. Sure, the trailing edge computation repeats something the leading edge already did (but forgot); this is the price we pay for memory savings. However, the trailing edge moves quite slowly -- it's that square root again -- so the amount of extra computation is quite small. Benchmarking reveals no time difference for sieving the first million primes.
It is a complex interleaving of leading edge and trailing edge sieving, but a lazy programming language like Haskell takes care of it automatically and painlessly. The trailing edge is a 0-ary function that is simply a list of "small" primes. The leading edge is a one-argument function, taking an input of the Unit type, i.e., "()" and returning a list of primes.