Wednesday, May 14, 2008

DecimalFormat Is Broken

A friend of mine recently noticed that the good old DecimalFormat class is "broken". If you try to parse a string that starts with a number but is not a number as a whole, DecimalFormat.parse returns whatever it managed to parse.

IMHO the correct behavior would be to throw a ParseException. Judging from an old post in the Sun bug tracker, the folks at Sun don't think this is a bug: they call the default parsing mode "lenient", meaning that it accepts bad input. But then why throw ParseException at all, and why not return 0 or NaN when the first character is not a digit? Why accept 1toto2 as a number and not toto2?
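To make the inconsistency concrete, here is a minimal sketch of the two cases. The class name LenientParseDemo, the helper tryParse, and the pattern "#,##0.###" are my own choices for illustration; only DecimalFormat.parse itself comes from the JDK.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

class LenientParseDemo {
    // Hypothetical helper: returns the parsed value as text,
    // or the string "ParseException" if parse() throws.
    static String tryParse(String input) {
        // Fixed US symbols so the result does not depend on the default locale.
        DecimalFormat format =
                new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(Locale.US));
        try {
            return String.valueOf(format.parse(input));
        } catch (ParseException e) {
            return "ParseException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("1toto2")); // prints 1: the trailing "toto2" is silently ignored
        System.out.println(tryParse("toto2"));  // prints ParseException: nothing parseable at index 0
    }
}
```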

In practice this can cause unexpected problems. For example, in France 0.1 is written 0,1 because of the locale conventions. If a user enters 0.1 in a French locale, a method using DecimalFormat.parse will interpret it as 0 without throwing any exception.
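The French case can be reproduced with a few lines. This is only a sketch: the class FrenchLocaleDemo and the helper parseFrench are names I made up, and the helper wraps the checked ParseException just to keep the example short.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class FrenchLocaleDemo {
    // Hypothetical helper: parses the input with French number conventions.
    static Number parseFrench(String input) {
        try {
            return NumberFormat.getNumberInstance(Locale.FRANCE).parse(input);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + input, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFrench("0,1")); // prints 0.1: the comma is the French decimal separator
        System.out.println(parseFrench("0.1")); // prints 0: parsing stops at the '.', no exception
    }
}
```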

Note that DateFormat does not have this problem: at some point Sun added a setLenient flag so that it can run in non-lenient mode. It would be very simple to do the same for DecimalFormat; I did it myself as an exercise. In DecimalFormat.subparse, the last two break statements should quietly stop processing only in lenient mode; in non-lenient mode the method should report a failure instead. Lines 1528 to 1531:
        sawExponent = true;
    }
    break; // Whether we fail or succeed, we exit this loop
}
else {
    break;
}

become:

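        sawExponent = true;
    } else {
        // Malformed exponent: in non-lenient mode, report a parse failure
        if (!isLenient()) {
            parsePosition.index = oldStart;
            parsePosition.errorIndex = oldStart;
            return false;
        }
    }
    break; // Lenient failure or success: we exit this loop
}
else {
    // Unexpected character: in non-lenient mode, report a parse failure
    if (!isLenient()) {
        parsePosition.index = oldStart;
        parsePosition.errorIndex = oldStart;
        return false;
    }
    break;
}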
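
If patching the JDK is not an option, a similar strict behavior can be approximated from the outside with the parse(String, ParsePosition) overload, by checking that the whole input was consumed. This is a workaround sketch, not the patch described above; StrictParse, parseStrict and isStrictlyParseable are hypothetical names.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

class StrictParse {
    // Hypothetical helper: rejects any input that is not consumed entirely.
    static Number parseStrict(String input, Locale locale) throws ParseException {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number result = format.parse(input, position);
        // result is null when nothing could be parsed; otherwise the index
        // tells us where the lenient parser stopped reading.
        if (result == null || position.getIndex() != input.length()) {
            int errorAt = position.getErrorIndex() >= 0
                    ? position.getErrorIndex()
                    : position.getIndex();
            throw new ParseException("Unparseable number: \"" + input + "\"", errorAt);
        }
        return result;
    }

    // Convenience wrapper for callers that only need a yes/no answer.
    static boolean isStrictlyParseable(String input, Locale locale) {
        try {
            parseStrict(input, locale);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }
}
```

With this helper, 1toto2 is rejected in any locale, and 0.1 is rejected in a French locale while 0,1 is accepted.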
