Tuesday, April 02, 2013

From Double Precision Normal Density to Double Precision Cumulative Normal Distribution

Marsaglia in his paper on Normal Distribution made the same mistake I initially did while trying to verify the accuracy of the normal density.

In his table of values comparing the true value computed by Maple for some values of x to the values computed by Sun or Ooura erfc, he actually does not really use the same input for the comparison. One example is the last number: 16.6. 16.6 does not have an exact representation in double precision, even though it is displayed as 16.6 because of the truncation at machine epsilon precision. Using Python mpmath, one can see that:

>>> mpf(-16.6)
mpf('-16.6000000000000014210854715202004')


This is the more accurate representation if one goes beyond double precision (here 30 digits). And the value of the cumulative normal distribution is:

>>> ncdf(-16.6)
mpf('3.4845465199503256054808152068743e-62')


It is different from:

>>> ncdf(mpf("-16.6"))
mpf('3.48454651995040810217553910503186e-62')


where in this case it is really evaluated around -16.6 (up to 30 digits precision). Marsaglia gives this second number as reference. But all the other algorithms will actually take as input the first input. It is more meaningful to compare results using the exact same input. Using human readable but computer truncated numbers is not the best. The cumulative normal distribution will often be computed using some output of some calculation where one does not have an exact human readable input.

The standard code for Ooura and Schonfelder (as well as Marsaglia) algorithms for the cumulative normal distribution don't use Cody's trick to evaluate the exp(-x*x). This function appears in all those implementations because it is part of the dominant term in the usual expansions. Out of curiosity, I replaced this part with Cody trick. For Ooura I also made minor changes to make it work directly on the CND instead of going through the error function erfc indirection. Here are the results without the Cody trick (except for Cody):
and with it:


All 3 algorithms are now of similiar accuracy (note the difference of scale compared to the previous graph), with Schonfelder being a bit worse, especially for x >= -20. If one uses only easily representable numbers (for example -37, -36,75, -36,5, ...) in double precision then, of course, Cody trick importance won't be visible and here is how the 3 algorithms would fare with or without Cody trick:

Schonfelder looks now worse than it actually is compared to Cody and Ooura.

To conclude, if someone claims that a cumulative normal distribution is up to double precision accuracy and it does not use any tricks to compute exp(-x*x), then beware, it probably is quite a bit less than double precision.

1 comment :

  1. Did you consider other places where Cody's trick might be interesting. The normal (or Bachelier) analytic solution has a call to CND and also a call to the normal density.

    ReplyDelete