Thursday, November 26, 2009

double[][] Is Fine

In my previous post, I suggest that keeping a double[] performs better than keeping a double[][] if you do matrix multiplications and other operations.

This is actually not true. I benchmarked 3 libraries, Colt (uses double[]), Apache Commons Math (uses double[][]) and Jama (uses double[][] cleverly). At first it looks like Jama has a similar performance as Colt (they avoid [][] slow access by a clever algorithm). But once hotspot hits, the difference is crazy and Jama becomes the fastest (Far ahead).




JDK 1.6.0 Linux 1000x1000 matrix multiplication on Intel Q6600
loop indexColtCommons MathJama
111.88074824.4551259.828977
211.87497524.2651029.848916
39.77261614.3741539.826572
49.75967914.3681052.655915
59.79962215.2389282.649129
69.78055614.7418632.668104
79.7283115.5099092.646811
89.7983815.7243482.646069
99.72614315.9887622.646052
109.78450515.1217822.644572
We don't include matrix construction time, and fetching the result. Only the multiplication is taken into account.

The difference is less pronounced on smaller matrices, but still there. Jama looks very good in this simple test case. In more real scenarios, the difference is not so obvious. For example Commons Math SVD is faster than Jama one.

double[][] Is Fine

In my previous post, I suggest that keeping a double[] performs better than keeping a double[][] if you do matrix multiplications and other operations.

This is actually not true. I benchmarked 3 libraries, Colt (uses double[]), Apache Commons Math (uses double[][]) and Jama (uses double[][] cleverly). At first it looks like Jama has a similar performance as Colt (they avoid [][] slow access by a clever algorithm). But once hotspot hits, the difference is crazy and Jama becomes the fastest (Far ahead).




JDK 1.6.0 Linux 1000x1000 matrix multiplication on Intel Q6600
loop indexColtCommons MathJama
111.88074824.4551259.828977
211.87497524.2651029.848916
39.77261614.3741539.826572
49.75967914.3681052.655915
59.79962215.2389282.649129
69.78055614.7418632.668104
79.7283115.5099092.646811
89.7983815.7243482.646069
99.72614315.9887622.646052
109.78450515.1217822.644572
We don't include matrix construction time, and fetching the result. Only the multiplication is taken into account.

The difference is less pronounced on smaller matrices, but still there. Jama looks very good in this simple test case. In more real scenarios, the difference is not so obvious. For example Commons Math SVD is faster than Jama one.

The Pain of Java Matrix Libraries

Looking for a good Java Matrix (and actually also math) library, I was a bit surprised to find out there does not seem to be any really serious one still maintained.

Sure, there is Apache Commons Math, but it is still changing a lot, and it is not very performance optimized yet, while it has been active for several years already. There is also Java3D, it does Matrix through GMatrix, but not much linear algebra and if you look at their implementation, it is very basic, not performance oriented.

The other candidates seem to be 1-man projects that can disappear any other day (some of them look quite good like ojalgo, most of them are not interesting). Then you also have the serious but not maintained anymore Cern Colt library.

Compared to C/C++, the Java world is worrying if you want to do maths.

In those libraries, a dense matrix of double can be implemented two ways:
  1. by maintaining internally a double[][]. Usually those libraries allow for not copying the array, so it can be neat if your interfaces have this kind of arrays.
  2. by maintaining internally a double[]. The reason is for performance, but then each time you build a matrix from a double[][], an expensive copy will happen. So you need to use the Matrix object in your interfaces instead of double[][].
This is a pain because you can be very quickly stuck in one or the other Matrix library. A "solution" is to have your own interface, but that is a pain to write. There is UJMP, but it can hide some important methods (like transpose & multiply in one go from Colt or the ability to reuse an existing matrix in various operations to avoid allocating new memory), it is a students project (like parallel colt), but if it was a standard, it could be much more interesting.

In summary it does really look like scientific people, universities don't use Java for computation otherwise Colt surely would have been maintained.

The Pain of Java Matrix Libraries

Looking for a good Java Matrix (and actually also math) library, I was a bit surprised to find out there does not seem to be any really serious one still maintained.

Sure, there is Apache Commons Math, but it is still changing a lot, and it is not very performance optimized yet, while it has been active for several years already. There is also Java3D, it does Matrix through GMatrix, but not much linear algebra and if you look at their implementation, it is very basic, not performance oriented.

The other candidates seem to be 1-man projects that can disappear any other day (some of them look quite good like ojalgo, most of them are not interesting). Then you also have the serious but not maintained anymore Cern Colt library.

Compared to C/C++, the Java world is worrying if you want to do maths.

In those libraries, a dense matrix of double can be implemented two ways:
  1. by maintaining internally a double[][]. Usually those libraries allow for not copying the array, so it can be neat if your interfaces have this kind of arrays.
  2. by maintaining internally a double[]. The reason is for performance, but then each time you build a matrix from a double[][], an expensive copy will happen. So you need to use the Matrix object in your interfaces instead of double[][].
This is a pain because you can be very quickly stuck in one or the other Matrix library. A "solution" is to have your own interface, but that is a pain to write. There is UJMP, but it can hide some important methods (like transpose & multiply in one go from Colt or the ability to reuse an existing matrix in various operations to avoid allocating new memory), it is a students project (like parallel colt), but if it was a standard, it could be much more interesting.

In summary it does really look like scientific people, universities don't use Java for computation otherwise Colt surely would have been maintained.