Showing posts with label Boost.Function.

Wednesday, April 27, 2016

QuantLib : Least squares method implementation

A couple of years ago I published a blog post on using Microsoft Solver Foundation for curve fitting. In this post we go through that example again, but this time using the corresponding QuantLib (QL) tools. In order to make everything a bit more concrete, the fitting scheme (as solved by Frontline Solver in Microsoft Excel) is presented in the screenshot below.

[Screenshot: second-degree polynomial fit of observed rates, solved with Frontline Solver in Microsoft Excel]

A second-degree polynomial has been fitted in order to find the set of coefficients which minimizes the sum of squared differences between observed rates and estimated rates. The same minimization scheme is used in the example program presented later below.
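In other words, the fit searches for the polynomial coefficients minimizing the sum of squared residuals over the nine curve points, where t_i denotes the maturity and r_i the observed rate:

\min_{a_0, a_1, a_2} \sum_{i=1}^{9} \left( r_i - \left( a_0 + a_1 t_i + a_2 t_i^2 \right) \right)^2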


LeastSquares class


After some initial woodshedding with QL optimization tools, I started to wonder whether it would be possible to create a generic class which could then handle all kinds of different curve fitting schemes.

When using the LeastSquares class, the client creates an instance by giving the desired optimization method (QuantLib::OptimizationMethod) and all parameters needed to define the end criteria (QuantLib::EndCriteria) for the optimization procedure. The actual fitting procedure is started by calling the Fit method of the LeastSquares class. The client passes this method the independent values (QuantLib::Array), a vector of function pointers (boost::function), the initial parameters (QuantLib::Array) and the parameter constraints (QuantLib::Constraint).
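As a minimal sketch (the full example program is presented further below), the intended client-side usage is roughly the following, assuming the observed rates, function pointers, initial parameters and constraint have already been prepared:

 // choose solver and end criteria, then fit the coefficients
 LeastSquares leastSquares(boost::shared_ptr<OptimizationMethod>(new LevenbergMarquardt), 10000, 1000, 1E-09, 1E-09, 1E-09);
 Array coefficients = leastSquares.Fit(independentValues, dependentFunctions, initialParameters, parametersConstraint);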

The objective function implementation (QuantLib::CostFunction) used by the QL optimizers is hidden as a nested class inside the LeastSquares class. Since the actual task performed by the least squares method is always to minimize a sum of squared errors, this class is not expected to change. The objective function uses the given independent values and function pointers (boost::function) to calculate the sum of squared errors between the independent (observed) values and the estimated values. By using function pointers (boost::function), the algorithm is not hard-coded into a class method.

Finally, for the function pointers used in the objective function inside the LeastSquares class, an implementation class with the correct method signature is expected to be available. In the example program, this is the CurvePoint class, which models a rate approximation for a given t (maturity) and a set of given external parameters.


Example program


Create a new C++ console project and copy-paste the following into the corresponding header and implementation files. Help is available here if needed. On the documentation page, there are three excellent presentation slides on the Boost and QuantLib libraries by Dimitri Reiswich. "Hello world" examples of QuantLib optimizers can be found within these files.

Finally, thanks again for reading my blog.
-Mike


// CurvePoint.h
#pragma once
#include <ql/quantlib.hpp>
using namespace QuantLib;
//
// class modeling rate approximation for a single 
// curve point using a set of given parameters
class CurvePoint
{
private:
 Real t;
public:
 CurvePoint(Real t);
 Real operator()(const Array& coefficients);
};
//
//
//
// CurvePoint.cpp
#include "CurvePoint.h"
//
CurvePoint::CurvePoint(Real t) : t(t) { }
Real CurvePoint::operator()(const Array& coefficients)
{
 Real approximation = 0.0;
 for (unsigned int i = 0; i < coefficients.size(); i++)
 {
  approximation += coefficients[i] * pow(t, i);
 }
 return approximation;
}
//
//
//
// LeastSquares.h
#pragma once
#include <ql/quantlib.hpp>
using namespace QuantLib;
//
class LeastSquares
{
public:
 LeastSquares(boost::shared_ptr<OptimizationMethod> solver, Size maxIterations, Size maxStationaryStateIterations, 
  Real rootEpsilon, Real functionEpsilon, Real gradientNormEpsilon);
 //
 Array Fit(const Array& independentValues, std::vector<boost::function<Real(const Array&)>> dependentFunctions, 
  const Array& initialParameters, Constraint parametersConstraint);
private:
 boost::shared_ptr<OptimizationMethod> solver;
 Size maxIterations; 
 Size maxStationaryStateIterations;
 Real rootEpsilon; 
 Real functionEpsilon;
 Real gradientNormEpsilon;
 //
 // cost function implementation as private nested class 
 class ObjectiveFunction : public CostFunction
 {
 private:
  std::vector<boost::function<Real(const Array&)>> dependentFunctions;
  Array independentValues;
 public:
  ObjectiveFunction(std::vector<boost::function<Real(const Array&)>> dependentFunctions, const Array& independentValues);
  Real value(const Array& x) const;
  Disposable<Array> values(const Array& x) const;
 };
};
//
//
//
// LeastSquares.cpp
#include "LeastSquares.h"
//
LeastSquares::LeastSquares(boost::shared_ptr<OptimizationMethod> solver, Size maxIterations, 
 Size maxStationaryStateIterations, Real rootEpsilon, Real functionEpsilon, Real gradientNormEpsilon)
 : solver(solver), maxIterations(maxIterations), maxStationaryStateIterations(maxStationaryStateIterations),
 rootEpsilon(rootEpsilon), functionEpsilon(functionEpsilon), gradientNormEpsilon(gradientNormEpsilon) { }
//
Array LeastSquares::Fit(const Array& independentValues, std::vector<boost::function<Real(const Array&)>> dependentFunctions, 
 const Array& initialParameters, Constraint parametersConstraint)
{
 ObjectiveFunction objectiveFunction(dependentFunctions, independentValues);
 EndCriteria endCriteria(maxIterations, maxStationaryStateIterations, rootEpsilon, functionEpsilon, gradientNormEpsilon);
 Problem optimizationProblem(objectiveFunction, parametersConstraint, initialParameters);
 EndCriteria::Type solution = solver->minimize(optimizationProblem, endCriteria);
 return optimizationProblem.currentValue();
}
LeastSquares::ObjectiveFunction::ObjectiveFunction(std::vector<boost::function<Real(const Array&)>> dependentFunctions, 
 const Array& independentValues) : dependentFunctions(dependentFunctions), independentValues(independentValues) { }
//
Real LeastSquares::ObjectiveFunction::value(const Array& x) const
{
 // calculate squared sum of differences between
 // observed and estimated values
 Array differences = values(x);
 Real sumOfSquaredDifferences = 0.0;
 for (unsigned int i = 0; i < differences.size(); i++)
 {
  sumOfSquaredDifferences += differences[i] * differences[i];
 }
 return sumOfSquaredDifferences;
}
Disposable<Array> LeastSquares::ObjectiveFunction::values(const Array& x) const
{
 // calculate differences between all observed and estimated values using 
 // function pointers to calculate estimated value using a set of given parameters
 Array differences(dependentFunctions.size());
 for (unsigned int i = 0; i < dependentFunctions.size(); i++)
 {
  differences[i] = dependentFunctions[i](x) - independentValues[i];
 }
 return differences;
}
//
//
//
// tester.cpp
#include "LeastSquares.h"
#include "CurvePoint.h"
//
int main()
{
 // 2nd degree polynomial least squares fitting for a curve
 // create observed market rates for a curve
 Array independentValues(9);
 independentValues[0] = 0.19; independentValues[1] = 0.27; independentValues[2] = 0.29; 
 independentValues[3] = 0.35; independentValues[4] = 0.51; independentValues[5] = 1.38; 
 independentValues[6] = 2.46; independentValues[7] = 3.17; independentValues[8] = 3.32;
 //
 // create corresponding curve points to be approximated
 std::vector<boost::shared_ptr<CurvePoint>> curvePoints;
 curvePoints.push_back(boost::shared_ptr<CurvePoint>(new CurvePoint(0.083)));
 curvePoints.push_back(boost::shared_ptr<CurvePoint>(new CurvePoint(0.25)));
 curvePoints.push_back(boost::shared_ptr<CurvePoint>(new CurvePoint(0.5)));
 curvePoints.push_back(boost::shared_ptr<CurvePoint>(new CurvePoint(1.0)));
 curvePoints.push_back(boost::shared_ptr<CurvePoint>(new CurvePoint(2.0)));
 curvePoints.push_back(boost::shared_ptr<CurvePoint>(new CurvePoint(5.0)));
 curvePoints.push_back(boost::shared_ptr<CurvePoint>(new CurvePoint(10.0)));
 curvePoints.push_back(boost::shared_ptr<CurvePoint>(new CurvePoint(20.0)));
 curvePoints.push_back(boost::shared_ptr<CurvePoint>(new CurvePoint(30.0)));
 //
 // create container for function pointers for calculating rate approximations
 std::vector<boost::function<Real(const Array&)>> dependentFunctions;
 //
 // for each curve point object, bind function pointer to operator() and add it into container
 for (unsigned int i = 0; i < curvePoints.size(); i++)
 {
  dependentFunctions.push_back(boost::bind(&CurvePoint::operator(), curvePoints[i], _1));
 }
 // perform least squares fitting and print optimized coefficients
 LeastSquares leastSquares(boost::shared_ptr<OptimizationMethod>(new LevenbergMarquardt), 10000, 1000, 1E-09, 1E-09, 1E-09);
 NoConstraint parametersConstraint;
 Array initialParameters(3, 0.0);
 Array coefficients = leastSquares.Fit(independentValues, dependentFunctions, initialParameters, parametersConstraint);
 for (unsigned int i = 0; i < coefficients.size(); i++)
 {
  std::cout << coefficients[i] << std::endl;
 }
 return 0;
}

Sunday, December 13, 2015

Implementing statistics gatherer design by using Boost library in C++

As a result of a Monte Carlo simulation process, we get a lot of simulated values. After this, we usually want to calculate a set of desired statistics from those simulated values. A great statistics gatherer design has been thoroughly presented in the book by Mark Joshi. However, just for the curious and for the sake of trying to create something original (while respecting traditions), I wanted to implement my own statistics gatherer design, in which several different statistical measures can be calculated independently using the same set of processed data. Moreover, the number of such calculation objects and their algorithmic implementations should be fully configurable. Finally, I wanted to avoid any direct coupling between the data (the object which holds the source data) and the algorithms (the objects which perform the actual statistical calculations).

Why bother?


In the example program below, a sample of discounted payoffs from a Monte Carlo process has been hard-coded, and two statistical measures are then calculated and printed out.

#include <iostream>
#include <iterator>
#include <vector>
#include "AlgorithmLibrary.h"
//
void MonteCarloProcess()
{
 // process has been producing the following discounted payoffs
 // for this sample : average = 10.4881, standard error = 1.58502
 double discountedPayoffs[] = { 18.5705, 3.31508, 0.0, 3.64361, 0.0, 0.0, 
  47.2563, 10.6534, 85.5559, 0.0, 30.2363, 0.0, 17.8391, 2.15396, 0.0, 
  66.587, 0.0, 9.19303, 0.0, 0.0, 24.2946, 29.6556, 0.0, 0.0, 0.0, 65.926, 
  0.0, 14.0329, 1.43328, 0.0, 0.0, 0.0, 1.37088, 0.0, 2.49095, 21.4755, 
  36.5432, 0.0, 16.8795, 0.0, 0.0, 0.0, 19.8927, 11.3132, 37.3946, 10.2666, 
  26.1932, 0.0, 0.551356, 29.7159, 0.0, 31.5357, 0.0, 0.0, 4.64357, 4.45376, 
  21.6076, 12.693, 16.0065, 0.0, 0.0, 0.0, 0.0, 25.9665, 18.7169, 2.55222, 
  25.6431, 8.5027, 0.0, 0.0, 29.8704, 0.0, 22.7266, 22.8463, 0.0, 0.0, 0.0, 
  0.0, 4.90832, 13.2787, 0.0, 0.0, 9.77076, 24.5855, 12.6094, 0.0, 0.0, 1.92343, 
  5.66301, 0.0, 0.0, 13.6968, 0.0, 0.0, 35.2159, 0.0, 8.44648, 7.21964, 0.0, 19.2949 };
 //
 // dump array data into vector
 std::vector<double> data(discountedPayoffs, std::end(discountedPayoffs));
 //
 // calculate and print mean and standard error
 double mean = AlgorithmLibrary::Mean(data);
 double standardError = AlgorithmLibrary::StandardError(data);
 std::cout << mean << std::endl;
 std::cout << standardError << std::endl;
}

There is nothing fundamentally wrong with this example, but a great deal of it could be done in a somewhat different manner, and the scheme presented in this post offers a chance to explore some modern tools for implementing flexible and configurable C++ programs.

Chinese walls


Personally, I would prefer to separate the data (discounted payoffs) and the algorithms (mean, standard error) completely. Ultimately, I would like to have a design in which a process (Monte Carlo) generates results and adds them into a separate container. Whenever this process is finished, it sends a reference to this container to several entities, which calculate all required statistics independently.

There are many ways to implement this kind of scheme, but I have been heavily influenced by the delegates I have learned from C#. Needless to say, the closest equivalent of a C# delegate in C++ is a function pointer. However, instead of raw function pointers I will use the Boost.Function and Boost.Bind libraries.
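Just to show the mechanics before the actual design (a small toy sketch, separate from the classes presented below), boost::function can wrap a free function and boost::bind can turn a member function plus a target object into the same kind of callable:

// toy example : wrapping a free function and binding a member function
#include <iostream>
#include <vector>
#include <boost/function.hpp>
#include <boost/bind.hpp>
//
double Sum(const std::vector<double>& v)
{
 double s = 0.0;
 for (unsigned int i = 0; i < v.size(); i++) s += v[i];
 return s;
}
class Printer
{
public:
 void print(const std::vector<double>& v) const { std::cout << v.size() << " items" << std::endl; }
};
int main()
{
 std::vector<double> data(3, 1.0);
 // wrap a free function into boost::function
 boost::function<double(const std::vector<double>&)> f = Sum;
 std::cout << f(data) << std::endl; // prints 3
 // bind a member function to a specific object (_1 is the placeholder for the argument)
 Printer p;
 boost::function<void(const std::vector<double>&)> g = boost::bind(&Printer::print, &p, _1);
 g(data); // prints "3 items"
 return 0;
}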

My new design proposal will have the following components :
  • AlgorithmLibrary for calculating different statistical measures. The header file contains a collection of functions for calculating different statistical measures.
  • ResultContainer class for storing processed results and function pointers (boost function) which send the processed results for further calculation.
  • StatisticsElement class for storing the value of a statistical measure and a function pointer (boost function) to the algorithm used for calculating that measure.

 

Strangers in the night


In the first stage, a StatisticsElement object is created. This object hosts a single statistical measure and a pointer to an algorithm (boost function) for calculating this specific measure. By giving the required algorithm as a pointer (boost function), the object is not tied to any hard-coded algorithm. In case another algorithm implementation were needed for calculating the required statistical measure, the scheme is flexible enough to be adjusted. This pointer to an algorithm (boost function) is given in the constructor of the class. Moreover, the object is indirectly connected to the ResultContainer object with a function pointer (boost function): a function pointer (boost function) is created and bound (boost bind) to the specific method (process) of the StatisticsElement object.

In the second stage, a ResultContainer object is created and all previously created function pointers (boost function, boost bind) for processing calculation results are added into it. This object is ultimately shared with a process. The process generates its results (double) and adds them into the container object. When the process is finished, the container object method sendResults is called. The call to this method triggers a loop, which iterates through the vector of function pointers (boost function) and sends a reference to the result vector to all connected StatisticsElement objects.

Finally, the client program (in this example: main) requests the calculated statistical measures directly from all StatisticsElement objects. It should be stressed that the two classes described above do not have any knowledge about each other at any point. ResultContainer just stores updates from a process and finally sends the results "to somewhere" when the processing is over. StatisticsElement objects run their own calculation procedures as soon as they receive results "from somewhere". It should also be noted that this design actually implements the observer pattern, where the ResultContainer object is the Observable and the StatisticsElement objects are the Observers.
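In a nutshell, the wiring order (exactly as done in the main program of the proposal below) is the following:

 // 1. create a statistics element with an algorithm (boost function)
 StatisticsElement mean(boost::function<double(const std::vector<double>&)>(AlgorithmLibrary::Mean));
 // 2. bind its process method into a sender (boost bind) and register the sender into the container
 ResultContainer resultContainer;
 resultContainer.addResultSender(boost::bind(&StatisticsElement::process, &mean, _1));
 // 3. run the process, which fills the container and finally calls sendResults
 MonteCarloProcess(resultContainer);
 // 4. query the calculated measure directly from the statistics element
 std::cout << mean() << std::endl;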


Proposal


// ResultContainer.h
#pragma once
#include <vector>
#include <boost/function.hpp>
//
// class for storing processed results and function pointers 
// which are sending results for further processing
class ResultContainer
{
private:
 // container for storing processed results
 std::vector<double> results;
 // container for storing function pointers
 std::vector<boost::function<void
  (const std::vector<double>&)>> resultSenders;
public:
 // method for adding one processed result into container
 void addResult(double result);
 // method for adding one function pointer into container
 void addResultSender(boost::function<void
  (const std::vector<double>&)> resultSender);
 // method for sending all processed results
 void sendResults() const;
};
//
//
//
// StatisticsElement.h
#pragma once
#include <vector>
#include <boost/function.hpp>
//
// class for storing a value of statistical measure and 
// function pointer for an algorithm which can be used 
// for calculating required statistical measure
class StatisticsElement
{
private:
 // statistical measure
 double statisticsValue;
 // function pointer to an algorithm which calculates 
 // required statistical measure
 boost::function<double(const std::vector<double>&)> algorithm;
public:
 // parameter constructor
 StatisticsElement(boost::function<double
  (const std::vector<double>&)> algorithm);
 // method for processing data in order to 
 // calculate required statistical measure
 void process(const std::vector<double>& data);
 // method (overloaded operator) for accessing 
 // calculated statistical measure
 double operator()() const;
};
//
//
//
// AlgorithmLibrary.h
#pragma once
#include <vector>
#include <numeric>
//
// algorithms library for calculating statistical measures
namespace AlgorithmLibrary
{
 // calculate arithmetic average
 inline double Mean(const std::vector<double>& data) 
 { 
  return std::accumulate(data.begin(), data.end(), 0.0) 
   / data.size(); 
 }
 // calculate standard error estimate
 inline double StandardError(const std::vector<double>& data) 
 { 
  double mean = AlgorithmLibrary::Mean(data);
  double squaredSum = std::inner_product(data.begin(), 
   data.end(), data.begin(), 0.0);
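  // note : squaredSum / n - mean * mean is the population variance (division by n, not n - 1)
  // and the standard error of the mean is sqrt(variance) / sqrt(n)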
  return std::sqrt(squaredSum / data.size() - mean * mean) 
   / std::sqrt(data.size());
 }
}
//
//
//
// ResultContainer.cpp
#include "ResultContainer.h"
//
void ResultContainer::addResult(double result)
{
 results.push_back(result);
}
void ResultContainer::addResultSender(boost::function<void
 (const std::vector<double>&)> resultSender)
{
 resultSenders.push_back(resultSender);
}
void ResultContainer::sendResults() const
{
 std::vector<boost::function<void
  (const std::vector<double>&)>>::const_iterator it;
 for(it = resultSenders.begin(); it != resultSenders.end(); it++)
 {
  (*it)(results);
 }
}
//
//
//
// StatisticsElement.cpp
#include "StatisticsElement.h"
//
StatisticsElement::StatisticsElement(boost::function<double
 (const std::vector<double>&)> algorithm) 
 : statisticsValue(0.0), algorithm(algorithm) 
{
 //
}
void StatisticsElement::process(const std::vector<double>& data)
{
 if (algorithm) statisticsValue = algorithm(data);
}
double StatisticsElement::operator()() const
{
 return statisticsValue;
}
//
//
//
// MainProgram.cpp
#include <boost/bind.hpp>
#include <iterator>
#include <iostream>
#include "AlgorithmLibrary.h"
#include "ResultContainer.h"
#include "StatisticsElement.h"
//
void MonteCarloProcess(ResultContainer& resultContainer)
{
 // process has been producing the following discounted payoffs
 // for this sample : average = 10.4881, standard error = 1.58502
 double discountedPayoffs[] = { 18.5705, 3.31508, 0.0, 3.64361, 0.0, 0.0, 
  47.2563, 10.6534, 85.5559, 0.0, 30.2363, 0.0, 17.8391, 2.15396, 0.0, 
  66.587, 0.0, 9.19303, 0.0, 0.0, 24.2946, 29.6556, 0.0, 0.0, 0.0, 65.926, 
  0.0, 14.0329, 1.43328, 0.0, 0.0, 0.0, 1.37088, 0.0, 2.49095, 21.4755, 
  36.5432, 0.0, 16.8795, 0.0, 0.0, 0.0, 19.8927, 11.3132, 37.3946, 10.2666, 
  26.1932, 0.0, 0.551356, 29.7159, 0.0, 31.5357, 0.0, 0.0, 4.64357, 4.45376, 
  21.6076, 12.693, 16.0065, 0.0, 0.0, 0.0, 0.0, 25.9665, 18.7169, 2.55222, 
  25.6431, 8.5027, 0.0, 0.0, 29.8704, 0.0, 22.7266, 22.8463, 0.0, 0.0, 0.0, 
  0.0, 4.90832, 13.2787, 0.0, 0.0, 9.77076, 24.5855, 12.6094, 0.0, 0.0, 1.92343, 
  5.66301, 0.0, 0.0, 13.6968, 0.0, 0.0, 35.2159, 0.0, 8.44648, 7.21964, 0.0, 19.2949 };
 //
 // dump array data into vector
 std::vector<double> data(discountedPayoffs, std::end(discountedPayoffs));
 // create vector iterator, loop through data and add items into result container object
 std::vector<double>::const_iterator it;
 for(it = data.begin(); it != data.end(); it++)
 {
  resultContainer.addResult(*it);
 }
 // trigger result processing for all 'connected' statistical element objects
 resultContainer.sendResults();
}
//
int main()
{
 // create : function pointer to mean algorithm, statistics element 
 // and function pointer to process method of this statistics element
 boost::function<double(const std::vector<double>&)> meanAlgorithm = 
  AlgorithmLibrary::Mean;
 StatisticsElement mean(meanAlgorithm);
 boost::function<void(const std::vector<double>&)> resultSenderForMean = 
  boost::bind(&StatisticsElement::process, &mean, _1);
 //
 // create : function pointer to standard error algorithm, statistics element and 
 // function pointer to process method of this statistics element
 boost::function<double(const std::vector<double>&)> standardErrorAlgorithm = 
  AlgorithmLibrary::StandardError;
 StatisticsElement standardError(standardErrorAlgorithm);
 boost::function<void(const std::vector<double>&)> resultSenderForStandardError = 
  boost::bind(&StatisticsElement::process, &standardError, _1);
 //
 // create : result container and add previously created function 
 // pointers (senders) into container
 ResultContainer resultContainer;
 resultContainer.addResultSender(resultSenderForMean);
 resultContainer.addResultSender(resultSenderForStandardError);
 //
 // run (hard-coded) monte carlo process
 MonteCarloProcess(resultContainer);
 //
 // print results from the both statistics elements
 std::cout << mean() << std::endl;
 std::cout << standardError() << std::endl;
 return 0;
}


Help


Concerning the actual installation and configuration of the Boost libraries with a compiler, there is a great tutorial by eefelix available on YouTube. For using the Boost libraries, there is a document written by Dimitri Reiswich. I would like to express my appreciation to these people for their great contributions.

Thanks for reading my blog. Have a pleasant wait for Christmas.
-Mike