Sunday, June 29, 2014

On the Role of Static Types and Generic Types on Productivity

Most developers have strong opinions on dynamically typed vs statically typed programming languages. The former are often assumed to be good for small projects and prototyping, while the latter are assumed to be better for bigger projects. But there are surprisingly few studies to back those claims.

One such study, "An experiment about static and dynamic type systems: doubts about the positive impact of static type systems on development time", came to the conclusion that on a small project, static typing did not decrease programming time and actually increased debugging time. However, four years later, "An empirical comparison of static and dynamic type systems on API usage in the presence of an IDE: Java vs. groovy with eclipse" showed that a developer is about twice as productive with Java as with Groovy when using an unknown API. This contrasts a bit with (but does not contradict) their previous study, "Static Type Systems (Sometimes) have a Positive Impact on the Usability of Undocumented Software: An Empirical Evaluation", which showed Groovy to be more productive on small projects. One problem is that all these studies stem from the same person.

It's more interesting to look at generic types vs raw types, where even fewer studies have been done. "Do developers benefit from generic types?: an empirical comparison of generic and raw types in java" concludes that generic types do not provide any advantage when fixing type errors, which is hardly surprising in my opinion. Generics (especially with type erasure, as in Java) are the typical idea that sounds good but in practice does not really help: they make the code more awkward to read and tend to make developers too lazy to create new classes that would often be more appropriate than a nested generic type (think Map<String, List<Map<String, Date>>>).

Tuesday, June 24, 2014

Moore-Penrose Inverse & Gauss-Newton SABR Minimization

I have found a particularly nice initial guess to calibrate SABR. As it is quite close to the true best fit, it is tempting to use a very simple minimizer to reach that best fit. Levenberg-Marquardt works well on this problem, but can we shave off a few iterations?

I first considered the basic Newton's method, but for least squares minimization the Hessian (second derivatives) is needed. It's possible to obtain it, even analytically with SABR, but it's quite annoying to derive and code without an automatic differentiation tool. As I experimented with a numerical Hessian, I noticed that it did not actually help convergence on this problem. Gauss-Newton converges similarly (likely because the initial guess is good), and what's great about it is that you only need the Jacobian (first derivatives). Here is a good overview of the Newton, Gauss-Newton and Levenberg-Marquardt methods.
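
To fix ideas, here is a minimal sketch of a plain Gauss-Newton iteration in Python. This is not my SABR calibration code; the toy residuals and Jacobian are only there for illustration.

```python
import numpy as np

def gauss_newton(residuals, jacobian, x0, iterations=10):
    """Plain Gauss-Newton: at each step solve (J^T J) dx = -J^T r."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        r = residuals(x)                         # residual vector (model - target)
        J = jacobian(x)                          # Jacobian of the residuals
        dx = np.linalg.solve(J.T @ J, -J.T @ r)  # normal equations
        x = x + dx
    return x

# toy usage: fit y = a * exp(b * t), starting close to the solution
# (in the spirit of the good initial guess above)
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-3.0 * t)
res = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.column_stack((np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)))
print(gauss_newton(res, jac, x0=[1.5, -2.5]))    # converges to (2, -3)
```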

While Gauss-Newton worked on many input data sets, I noticed it also failed on some long-maturity equity smiles. The full Newton's method did not fare better. I had to take a close look at the matrices involved to understand what was going on. It turns out that sometimes, mostly when the SABR rho parameter is close to -1, the Jacobian becomes nearly rank deficient (a row close to 0), but not exactly rank deficient. So everything appears to work, but the iteration actually misbehaves badly.

My first idea was to solve a reduced problem when a row of the Jacobian is too small: simply remove that row and keep the previous value of the corresponding guess (a rough sketch of this fallback follows the figure). This simplistic approach made the process work on all my input data. Here is the difference in RMSE compared to a highly accurate Levenberg-Marquardt minimization, for 10 iterations:


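To make that more concrete, here is a rough Python sketch of my reading of this fallback, working on the normal equations; the threshold and the exact row criterion are illustrative, not a precise specification:

```python
import numpy as np

def reduced_gauss_newton_step(J, r, x, tol=1e-12):
    """One Gauss-Newton step that drops the parameters whose row of J^T J is
    nearly zero, leaving those parameters at their previous value."""
    A = J.T @ J
    b = -J.T @ r
    keep = np.linalg.norm(A, axis=1) > tol            # parameters still identifiable
    dx = np.zeros(len(x))
    dx[keep] = np.linalg.solve(A[np.ix_(keep, keep)], b[keep])
    return np.asarray(x, dtype=float) + dx
```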
Later, while reading some more material related to least square optimization, I noticed the use of the Moore-Penrose inverse in cases where a matrix is rank deficient. The Moore-Penrose inverse is defined as:
$$ M^\star = V S^\star U^T$$
where \( S^\star \) is the diagonal matrix of the inverted singular values (set to 0 when a singular value is deemed numerically close to 0), and \(U, V\) are the orthogonal matrices of the singular value decomposition:
$$M=U S V^T$$
It turns out to work very well. Besides being simpler to code, I expected it to be more or less equivalent to the previous approach: a tiny bit slower, but that does not matter since we deal with small matrices, and the really slow part is the computation of the objective function and of the Hessian, which is why looking at iterations is more important.
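
In code, the pseudo-inverse is just an SVD with truncated singular values. A minimal Python sketch (the absolute threshold mirrors the 1E-15 criterion mentioned below; numpy.linalg.pinv with rcond does essentially the same job):

```python
import numpy as np

def pseudo_inverse(M, tol=1e-15):
    """Moore-Penrose inverse via SVD: invert singular values above tol, zero the rest."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_inv = np.where(s > tol, 1.0 / s, 0.0)   # truncate the (near) zero singular values
    return Vt.T @ np.diag(s_inv) @ U.T

# The Gauss-Newton update then reads dx = -pseudo_inverse(J) @ r, which gives the
# minimum-norm least squares solution even when J is (nearly) rank deficient.
```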

It seems to converge a little less quickly, likely due to the threshold criterion that I picked (1E-15).
Three iterations are, most of the time (90% of the cases), more than enough to achieve good accuracy (the absolute RMSE is between 1E-4 and 5E-2), as the following graph shows. The few spikes near 1E-3 represent errors that are too large; the rest is accurate enough compared to the absolute RMSE.

To conclude, we have seen that using the Moore-Penrose inverse in a Gauss-Newton iteration allows the Gauss-Newton method to work on (nearly) rank-deficient systems.
I am not sure how general this is: in my example, the true minimum either lies inside the region of interest, or on its border, where the system becomes deficient. Of course, this is related to a "physical" constraint, namely rho > -1.



Thursday, June 19, 2014

One Interview Question for Job Seekers in Finance

I mentioned in an earlier post that I am mostly disillusioned with interview questions: it's better to find out whether you can learn something from a candidate.

Still, there is maybe one very simple question that could be revealing for people who claim to be vaguely familiar with Black-Scholes:

What is the price of an at-the-money binary option under very high volatility?
Alternatively, it can be asked with just an at-the-money European option under very high volatility.

What made me think of it is that some "product manager" recently tested risk with volatilities at 300%, wondered why they did not see any vega (based on a 1% additive shift), opened bugs, generated noise...
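
For reference, the limit is easy to see with the usual Black-Scholes formulas written on the forward \(F\), taking the at-the-money strike \(K = F\):
$$ d_{1,2} = \frac{\ln(F/K) \pm \tfrac{1}{2}\sigma^2 T}{\sigma\sqrt{T}} = \pm\frac{\sigma\sqrt{T}}{2}. $$
As \( \sigma\sqrt{T} \) becomes large, \( N(d_1) \to 1 \) and \( N(d_2) \to 0 \): the vanilla call \( e^{-rT}\left(F N(d_1) - K N(d_2)\right) \) tends to the discounted forward \( e^{-rT}F \), a cash-or-nothing binary call \( e^{-rT}N(d_2) \) tends to 0, and the vega \( e^{-rT} F \sqrt{T}\,\varphi(d_1) \) vanishes. At 300% volatility the price hardly depends on \( \sigma \) anymore, hence the invisible vega.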

Thursday, June 12, 2014

On the importance of accuracy for bpvol solvers

While I was playing around with the calibration of the arbitrage-free SABR model from Hagan (using the PDE on the probability density approach), I noticed a misbehavior for some short-maturity smiles. I thought it was due to the PDE implementation. Some of it actually was, but the remaining large error was due to the bpvol (basis point volatility) solver.

I initially took the same approach as Choi et al. in my solver, that is, working with in-the-money prices (they work with straddles), because it's nice and convenient. I thought it was no big deal if prices lower than 1E-16 were not solved. It turns out I was wrong. The Choi et al. solver has the same issue.
In the figure above, CKK denotes the Choi et al. algorithm (similar to my old algorithm) and Chebyshev is my updated algorithm, which stays accurate for far out-of-the-money options. What happens is that even though the market price at the lowest strike is not very low, the price at the lowest strike stemming from the best-fit smile is extremely low; when we want to invert it, CKK produces a large error because it indirectly relies on the in-the-money price, and double precision cannot represent numbers near 1.0 finely enough. That's where it introduces a particularly big error in this case.
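
A toy illustration of the representation issue, with prices scaled so that the intrinsic value is around 1 (just double-precision arithmetic, not the actual solver code):

```python
import numpy as np

eps = np.finfo(float).eps      # ~2.2e-16: spacing of doubles around 1.0
otm = 1e-20                    # a far out-of-the-money (time) value
itm = 1.0 + otm                # in-the-money price = intrinsic (~1) + time value
print(eps, itm - 1.0)          # prints 2.22e-16 0.0: the tiny time value is lost
```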


I have since updated my solver to work with out-of-the-money option prices as well, and it now achieves near machine accuracy on the whole range. I also reduced the number of Chebyshev polynomials used in the process. All the details are in my updated paper at http://papers.ssrn.com/abstract=2420757



