Monday, December 07, 2020

[orptuqka] Bits instance of Integer

It's a bit surprising that the arbitrary precision Integer type in Haskell is an instance of Data.Bits.  We investigate its behavior as of base, comparing it with fixed-width Int.

Side note: as of base, there is also the Numeric.Natural.Natural arbitrary precision integer type without negative numbers.  I have not experimented with it.

(Another digression: in Bluespec, the Bits typeclass represents synthesizable types, e.g., one can create a hardware register holding that type.  Arbitrary precision integers are not synthesizable so not an instance of Bluespec Bits.)

import qualified Data.Bits as Bits;

*Main> Bits.complement (0::Integer)
*Main> Bits.complement (0::Int)

*Main> Bits.complement (-1::Integer)
*Main> Bits.complement (-1::Int)

*Main> Bits.complement (1::Integer)
*Main> Bits.complement (1::Int)

*Main> Bits.shift (-2::Integer) (-1)
*Main> Bits.shift (-2::Int) (-1)

So far, so good.  Negative Integer acts like two's complement.  (Perhaps also like a p-adic number, though I don't understand p-adic numbers.)  The negative shift amount (-1) means right shift; right shift of Integer seemingly does sign extension, pushing a 1 bit onto its infinitely most significant (leftmost) end.

However, left rotate (positive rotate) of a negative Integer shifts in a 0, not a 1 from the infinite left end.

*Main> Bits.rotate (-2 :: Integer) 1
*Main> Bits.rotate (-2 :: Int) 1

Right rotate is denoted by a negative rotation amount.  If we try to create an Integer with just its infinitely most significant bit set:

*Main> Bits.rotate (1::Integer) (-1)
*Main> Bits.rotate (1::Int) (-1)

The above two issues are because rotate of Integer is implemented as shift.  I think it would be better if rotate crashed instead (like division by zero), because that would avoid surprise.  In particular, right rotate by 1 bit should be an error if the rightmost bit (LSB) does not match the infinitely leftmost "sign" bit.  Only even positive integers and odd negative integers may be right rotated by 1 bit.  Larger right rotation amounts require more consecutive 0s or 1s in their least significant bits.

In another instance of surprising behavior, popCount returns a negative number:

*Main> Bits.popCount (-1 :: Integer)
*Main> Bits.popCount (-1 :: Int)

*Main> Bits.popCount (-2::Integer)
*Main> Bits.popCount (-2::Int)

*Main> Bits.popCount (-3::Integer)
*Main> Bits.popCount (-3::Int)

*Main> Bits.popCount (-4::Integer)
*Main> Bits.popCount (-4::Int)

*Main> Bits.popCount (-5::Integer)
*Main> Bits.popCount (-5::Int)

For popCount, Integer stops acting like two's complement and acts more like ones' complement.  popCount is implemented as the negative of the popcount of the number made positive, in Ghc.Integer.popCountInteger in the integer-gmp package.  It would be more consistent if popCount acted like two's complement offset by a constant "infinity".  For the small examples above, subtracting 65 from the value returned by Int popCount would have kept things vaguely consistent with fixed-width types.

Unlike the above functions, bitSize unsurprisingly does error for Integer arguments.  If it were consistent with popCount, it could have returned -1 as the size of Integer.

*Main> Bits.bitSize (-5 ::Integer)
<interactive>:19:1: warning: [-Wdeprecations]
In the use of ‘bitSize’ (imported from Data.Bits):
Deprecated: "Use 'bitSizeMaybe' or 'finiteBitSize' instead"
*** Exception: Data.Bits.bitSize(Integer)

*Main> Bits.bitSize (-5 ::Int)
<interactive>:20:1: warning: [-Wdeprecations]
In the use of ‘bitSize’ (imported from Data.Bits):
Deprecated: "Use 'bitSizeMaybe' or 'finiteBitSize' instead"

Perhaps all the problematic methods of Data.Bits should be deprecated and similarly replaced with Maybe versions?

To do: find mailing list discussion (if any) when the decision to implement the Bits instance of Integer was first made.

One of the problems with Haskell is that it is impossible to mask an instance.  One cannot import Data.Bits but avoid importing instance Bits Integer.  It is reasonable for a user to write code which uses Bits functions only on fixed width types (e.g., Int) and to want any accidental use of Bits functions on an Integer to be flagged as a compile-time error; however, such an import is not possible.  (Another user may actually want to use the Bits instance of Integer, being careful to avoid its gotchas, so we wish to keep that available as an option as well.)

A well known version of this problem happened with the Foldable Traversable Proposal (FTP), also called the Burning Bridges Proposal.  It is reasonable for a user to write code that uses tuples but to want any use of a Foldable function on a tuple, e.g., length (which confusingly always returns 1), to be flagged as a compile-time error; however, this is not possible.

The FiniteBits typeclass is for types of finite width.  Presumably it was invented to exclude Integer and Natural.  countTrailingZeros is annoyingly in this typeclass and not Bits.  countTrailingZeros is a function that we reasonably want to call on Integer, accepting that some sort of error will occur if called on zero, like division by zero.

What is a good implementation of countTrailingZeros for arbitrary precision Integer?  In order to get linear time, I suspect we need access to the internal representation.  I don't think it can be done faster than linear time: one has to look at the trailing bits to confirm that they are actually zero.

1 comment :

Sylvain HENRY said...

There are several quirks in Integer/natural instances as you've found. When I implemented ghc-bignum (replacing integer-gmp/integer-simple in GHC 9.0) I asked if we could fix some of them on the mailing list:

We can't because too much code relies on it :'(

Methods of `Bits` class are really a mess (as are those of `Num`).