Sunday, January 8, 2017

C++11 : String Replacer for Directory Files using PPL

Sequential algorithms can be easily replaced with the corresponding Parallel Algorithms. However, these algorithms can only be applied, if the loop operations are completely independent and they are not using any shared variables. The second condition can be relaxed by using concurrent data structures. This scheme is perfect for cases, in which we need to process a chain of independent items, such as modifying files in a directory.

The devil is in the details


Let us think a real-life case, in which we have a lot of CSV files in one specific directory. For example, we may have transaction files (containing cash flow tables, date schedule tables, parameters) produced by some front office system batch on a daily basis. These files are then used as feed for a third-party analytical software for some intensive calculations. The only problem is, that some of the information in feed files is not in recognizable form for third-party software. In a nutshell, feed files may contain some specific strings, which should be replaced with another string.


War stories


The next question is how to do this operation quickly and on a safe manner? Needless to say, there are pretty much as many approaches as there are people performing this heartbreaking operation. Personally, I have used

  • find-and-replace strategy with notepad++, until the amount of files to be processed and parameters to be replaced grew a bit too large.
  • custom batch script, until I realized that the script was sometimes not working as expected, by either leaving some cases out from replacement operations or resulting incorrect replacements. The scariest part was, that the script was actually working well most of the time.
  • custom Powershell script (created by someone else), which was using hard-coded configurations for all replacement strings and all strings which should be replaced. All configurations (hosted in source code file) needed to be set in a specific order. This was actually working well up to a point, where those hard-coded configurations should have been changed to correspond changes made in other systems. Moreover, execution times were a bit too high.

Finally, I decided to create my own program for handling this task.


United we fall, divided we stand


After learning a bit of parallel algorithms, I soon realized that this kind of a scheme (due to task independency) would be suitable candidate for implementing parallelism, in order to improve program execution speed. Each source file (containing transaction information) has to be processed separately and all key-value pairs (string-to-be-replaced and corresponding replacement string) can be stored in concurrent-safe concurrent unordered map as string pairs.


FileHandler.h


This header file is consisting free functions for all required file handling operations.

#pragma once
//
#include <filesystem>
#include <fstream>
#include <string>
#include <map>
#include <vector>
#include <algorithm>
#include <sstream>
//
namespace MikeJuniperhillFileSystemUtility
{
 std::string ReadFileContentToString(const std::string &filePathName)
 {
  std::string data;
  std::ifstream in(filePathName.c_str());
  std::getline(in, data, std::string::traits_type::to_char_type(
   std::string::traits_type::eof()));
  return data;
 }
 //
 void WriteStringContentToFile(const std::string& content, const std::string &filePathName)
 {
  std::ofstream out(filePathName.c_str());
  out << content;
  out.close();
 }
 //
 std::vector<std::string> GetDirectoryFiles(const std::string& folderPathName)
 {
  std::vector<std::string> directoryFiles;
  using itr = std::tr2::sys::directory_iterator;
  for (itr it = itr(folderPathName); it != itr(); ++it)
  {
   directoryFiles.push_back(static_cast<std::string>((*it).path()));
  }
  return directoryFiles;
 }
 //
 // replace all substrings found (stringToBeReplaced) with another 
 // string (replacementString) in a given string (string)
 std::string ReplaceAllSubstringsInString(const std::string& string, 
  const std::string& stringToBeReplaced, const std::string& replacementString)
 {
  std::string result;
  size_t find_len = stringToBeReplaced.size();
  size_t pos, from = 0;
  while (std::string::npos != (pos = string.find(stringToBeReplaced, from))) 
  {
   result.append(string, from, pos - from);
   result.append(replacementString);
   from = pos + find_len;
  }
  result.append(string, from, std::string::npos);
  return result;
 }
 // read key-value pairs into std::map
 // (key = string to be found and replaced, value = replacement string)
 std::map<std::string, std::string> ReadKeysFromFileToMap(const std::string& filePathName, char separator)
 {
  std::map<std::string, std::string> keys;
  std::ifstream file(filePathName);
  std::string stream;
  std::string key;
  std::string value;
  //
  while (std::getline(file, stream))
  {
   std::istringstream s(stream);
   std::getline(s, key, separator);
   std::getline(s, value, separator);
   keys.insert(std::pair<std::string, std::string>(key, value));
  }
  return keys;
 }
}
//


Tester.cpp


First, there has to be a directory containing all source files, which are going to be processed by file processing program. Also, there has to be a specific file, which contains key-value pairs (key = string to be found and replaced, value = replacement string) for all desired string replacement cases. Main program is creating file processors into vector container (there will be as many processors as there are source files to be processed). Then, main program is executing all file processors. A single file processor is first reading source file information into string, looping through all key-value pairs and checking whether any occurences are to be found, performing string replacements and finally, writing modified information back to original source file.

#include <ppl.h>
#include <concurrent_vector.h>
#include <concurrent_unordered_map.h>
#include <functional>
#include <memory>
#include <chrono>
#include <iostream>
#include "FileHandler.h"
//
namespace MJ = MikeJuniperhillFileSystemUtility;
//
using Files = std::vector<std::string>;
using Pair = std::pair<std::string, std::string>;
using Keys = std::map<std::string, std::string>;
using GlobalKeys = concurrency::concurrent_unordered_map<std::string, std::string>;
using ProcessorMethod = std::function<void(void)>;
using FileProcessor = std::shared_ptr<ProcessorMethod>;
//
// create file processor
FileProcessor Factory(const GlobalKeys& keys, const std::string& filePathName)
{
 // deferred execution
 ProcessorMethod processor = [=, &keys](void) -> void
 {
  std::string fileUnderProcessing = MJ::ReadFileContentToString(filePathName);
  for each (Pair pair in keys)
  {
   // extract key (string to be found and replaced) and value (replacement string)
   Pair keyValuePair = pair;
   std::string key = std::get<0>(keyValuePair);
   std::string value = std::get<1>(keyValuePair);
   //
   // check if key exists in a given string (containing all information in current file)
   size_t found = fileUnderProcessing.find(key);
   if (found != std::string::npos)
   {
    // if exists, replace key with value
    fileUnderProcessing = MJ::ReplaceAllSubstringsInString(fileUnderProcessing, key, value);
   }
  }
  // finally write change into original file
  MJ::WriteStringContentToFile(fileUnderProcessing, filePathName);
 };
 FileProcessor fileProcessor = FileProcessor(new ProcessorMethod(processor));
 return fileProcessor;
}
//
int main(int argc, char* argv[])
{
 // get required path strings from command line
 std::string keysFilePathName = argv[1];
 std::string fileDirectoryPathName = argv[2];
 //
 // create list of files (files to be modified) and list of key-value pairs 
 // (key = string to be found and replaced, value = replacement string)
 Files files = MJ::GetDirectoryFiles(fileDirectoryPathName);
 char separator = ',';
 Keys keys = MJ::ReadKeysFromFileToMap(keysFilePathName, separator);
 //
 // insert keys into concurrency-safe unordered map
 GlobalKeys Keys;
 std::for_each(keys.begin(), keys.end(),
  [&](Pair pair) -> void { Keys.insert(pair); });
 //
 // create file processors :
 // each processor will be processing exactly one file (given filePathName) 
 // by looping through all keys-value pairs (in GlobalKeys) and searching for 
 // all occurrences of substring (key) and replacing this with another string (value)
 std::vector<FileProcessor> processors;
 std::for_each(files.begin(), files.end(),
  [&](std::string filePathName) -> void { processors.push_back(Factory(Keys, filePathName)); });
 //
 // execute file processor method for all file processors
 auto start = std::chrono::steady_clock::now();
 concurrency::parallel_for_each(processors.begin(), processors.end(),
  [](FileProcessor fp) -> void { (*fp)(); });
 auto end = std::chrono::steady_clock::now();
 auto timeElapsed = std::chrono::duration_cast<std::chrono::seconds>(end - start);
 std::cout << "processed in " << timeElapsed.count() << " seconds" << std::endl;
 //
 return 0;
}
//


User settings


In this particular case, a picture is worth more than thousand words.

A : arguments for executable to be used with command prompt
B : directory for key-value pairs file
C : the content of key-value pairs file
D : directory for source files.























Finally, thanks a lot again for reading this blog.-Mike

Saturday, January 7, 2017

C++11 : Multi-Threaded PathGenerator using PPL

 

FINAL DESTINATION


The circle has been closed. This post is kind of an aggregation, based on the last four posts published on generating random numbers. Initially, I started just with a simple template class for distributional random generator, then continued with a path generator using any one-factor stochastic process and finally, ended up with a multi-threaded distributional random generation scheme using Parallel algorithms. This final post (hopefully) is opening up my larger goal : to be able to generate asset price paths for any one-factor process, using multi-threading scheme.


GROUNDHOG DAY


Again, I have tested the both sequential (for_each) and parallel (parallel_for_each) schemes by using four generators, 10000 paths and 250 time steps for a single run. After this, I repeated this run for 250 times. Conclusion :

  • The average running time for this sample was 17116 milliseconds (sequential) and 8209 milliseconds (parallel). So, parallel scheme will be completed about two times faster. 
  • The actual CPU usage profiles during the simulation processes are behaving exactly as reported in this post. 
  • I also analyzed processed asset price paths for parallel scheme, just to be absolutely sure there are no path duplicates (random number generation would not be independent). Based on my analysis made in Excel, all processed asset price paths are different and there are no duplicates. 

Presented scheme for path generator is again fulfilling my two initial requirements : faster creation of asset price paths following any one-factor process and independency of random generators.


RandomGenerator.h


The basic functionality of this template class has not been changed, except for construction part : second constructor is allowing a client to give any probability distribution for uniform generator from outside of this class. Even there is actually no need for having this kind of optionality in real-life (most of the stuff in Monte Carlo is randomized by using standard normal distribution), I decided to implement this optionality for the sake of completeness.

#pragma once
#include <algorithm>
#include <functional>
#include <vector>
#include <random>
#include <memory>
//
namespace MikeJuniperhillRandomGeneratorTemplate
{
 template <typename Generator = std::mt19937, typename Distribution = std::normal_distribution<double>>
 /// <summary>
 /// Template class for creating random number paths using mt19937 as default uniform 
 /// random generator and Standard Normal as default probability distribution.
 /// </summary> 
 class RandomGenerator
 {
 public:
  /// <summary>
  /// Constructor with explicit seed value
  /// </summary>
  RandomGenerator(unsigned long seed)
  {
   // construct function for processing distributional random number
   randomGenerator = [this](double x)-> double
   {
    x = distribution(uniformGenerator);
    return x;
   };
   // seed generator once
   uniformGenerator.seed(seed);
  }
  /// <summary> 
  /// Constructor for explicit seed value and client-given probability distribution.
  /// </summary>  
  RandomGenerator(unsigned long seed, const Distribution& distribution)
   // constructor delegation
   : RandomGenerator(seed)
  {
   // assign client-given probability distribution
   this->distribution = distribution;
  }
  /// <summary>
  /// Fill a given vector reference with distributional random numbers
  /// </summary> 
  void operator()(std::vector<double>& v) const
  {
   std::transform(v.begin(), v.end(), v.begin(), randomGenerator);
  }
 private:
  std::function<double(double)> randomGenerator;
  Generator uniformGenerator;
  Distribution distribution;
 };
}
//


OneFactorProcess.h


I decided to tag drift and diffusion functions with const declaration, since these functions should not modify the internal state of class data members.

#pragma once
//
namespace MikeJuniperhillOneFactorProcessLibrary
{
 /// <summary>
 /// Abstract base class for all one-factor processes for customizing 
 /// drift and diffusion functions for different types of processes.
 /// </summary>
 class OneFactorProcess
 {
 public:
  virtual double drift(double x, double t) const = 0;
  virtual double diffusion(double x, double t) const = 0;
 };
 //
 /// <summary>
 /// Implementation for Vasicek short-rate model.
 /// </summary>
 class Vasicek : public OneFactorProcess
 {
 public:
  Vasicek(double meanReversion, double longTermRate, double rateVolatility)
   : meanReversion(meanReversion), longTermRate(longTermRate), rateVolatility(rateVolatility) { }
  //
  double drift(double x, double t) const override { return meanReversion * (longTermRate - x); }
  double diffusion(double x, double t) const override { return rateVolatility; }
 private:
  double meanReversion;
  double longTermRate;
  double rateVolatility;
 };
}
//


PathGenerator.h


As in the case with RandomGenerator, the basic functionality of this template class has not been changed either, except for construction part : second constructor is allowing a client to give any probability distribution to be delivered for distributional random generator.

#pragma once
//
#include "RandomGenerator.h"
#include "OneFactorProcess.h"
namespace MJRandom = MikeJuniperhillRandomGeneratorTemplate;
namespace MJProcess = MikeJuniperhillOneFactorProcessLibrary;
//
namespace MikeJuniperhillPathGenerator
{
 template <typename Generator = std::mt19937, typename Distribution = std::normal_distribution<double>>
 class PathGenerator
 {
 public:
  /// <summary>
  /// Constructor for PathGenerator template class.
  /// </summary>
  PathGenerator(double spot, double maturity, unsigned long seed,
   const std::shared_ptr<MJProcess::OneFactorProcess>& process)
   : spot(spot), maturity(maturity), process(process)
  {
   // create random generator
   generator = std::unique_ptr<MJRandom::RandomGenerator<Generator, Distribution>>
    (new MJRandom::RandomGenerator<Generator, Distribution>(seed));
  }
  /// <summary>
  /// Constructor for PathGenerator template class, with a client-given probability distribution
  /// </summary>
  PathGenerator(double spot, double maturity, unsigned long seed,
   const std::shared_ptr<MJProcess::OneFactorProcess>& process, const Distribution& distribution)
   : spot(spot), maturity(maturity), process(process)
  {
   // create random generator with client-given probability distribution
   generator = std::unique_ptr<MJRandom::RandomGenerator<Generator, Distribution>>
    (new MJRandom::RandomGenerator<Generator, Distribution>(seed, distribution));
  }
  /// <summary> 
  /// Fill a given vector reference with asset prices, following a given stochastic process.
  /// </summary>  
  void operator()(std::vector<double>& v) const
  {
   // transform initialized vector into a path containing random numbers
   (*generator)(v);
   //
   double dt = maturity / (v.size() - 1);
   double dw = 0.0;
   double s = spot;
   double t = 0.0;
   v[0] = s; // 1st path element will always be the current spot price
   //
   // transform distributional random number vector into a path containing 
   // asset prices from a given stochastic one-factor process
   for (auto it = v.begin() + 1; it != v.end(); ++it)
   {
    t += dt;
    dw = (*it) * std::sqrt(dt);
    (*it) = s + (*process).drift(s, t) * dt + (*process).diffusion(s, t) * dw;
    s = (*it);
   }
  }
 private:
  double spot;
  double maturity;
  std::shared_ptr<MJProcess::OneFactorProcess> process;
  std::unique_ptr<MJRandom::RandomGenerator<Generator, Distribution>> generator;
 };
}
//


Tester.cpp


Tester program is closely tracking the program presented in previous post. For the sake of additional clarity, I have used new type definitions in order to improve code readability and get rid of some lengthy variable names. The program is again using simple factory method for creating PathGenerator (function wrapped in shared pointer). In this program, OneFactorProcess implementation is created and delivered for factory method for processing. Finally, there is a method for printing processed paths to console for testing purposes.

#include <iostream>
#include <chrono>
#include <ppl.h>
#include <concurrent_vector.h>
#include "PathGenerator.h"
namespace MJGenerator = MikeJuniperhillPathGenerator;
//
// type definitions
using Path = std::vector<double>;
using Paths = concurrency::concurrent_vector<Path>;
using Process = std::shared_ptr<MJProcess::OneFactorProcess>;
using Processor = std::function<void(void)>;
using PathGenerator = std::shared_ptr<Processor>;
//
// thread-safe container for storing asset price paths, processed by path generators
Paths paths;
//
// printer for generated asset price paths
void Printer()
{
 std::for_each(paths.begin(), paths.end(),
  [](Path path) -> void
 {
  std::for_each(path.begin(), path.end(),
   [](double s) -> void
  {
   std::cout << s << ",";
  });
  std::cout << std::endl;
 });
}
//
// factory method :
// return path-generating function as function wrapper
// input arguments are common for all types of generators
PathGenerator Factory(double spot, double maturity, int nPaths, 
 int nSteps, unsigned long seed, const Process& process, Paths& paths)
{
 // create function for processing one-factor paths
 auto generator = [=, &process, &paths]() -> void
 {
  MJGenerator::PathGenerator<> oneFactorProcess(spot, maturity, seed, process);
  Path path(nSteps);
  for (auto i = 0; i != nPaths; ++i)
  {
   oneFactorProcess(path);
   paths.push_back(path);
  }
 };
 // return generator function as function wrapper
 return PathGenerator(new Processor(generator));
}
//
int main()
{
 // create vasicek process
 double longTermRate = 0.05;
 double meanReversion = 0.2;
 double rateVolatility = 0.0075; 
 Process vasicek = Process(new MJProcess::Vasicek(meanReversion, longTermRate, rateVolatility));
 //
 // define parameters and seed values for path generators
 int nGenerators = 4;
 int nPaths = 100;
 int nSteps = (250 + 1);
 std::vector<unsigned long> seed = { 10322854, 65947, 387528, 772399573 };
 //
 // use factory method for creating path generators
 double spot = 0.0095;
 double maturity = 3.0;
 std::vector<PathGenerator> generators;
 for (auto i = 0; i < nGenerators; i++) generators.push_back(
  Factory(spot, maturity, nPaths, nSteps, seed[i], vasicek, paths));
 //
 // parallel processing
 auto start = std::chrono::steady_clock::now();
 concurrency::parallel_for_each(generators.begin(), generators.end(),
  [](PathGenerator pg) -> void { (*pg)(); });
 auto end = std::chrono::steady_clock::now();
 auto timeElapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
 std::cout << timeElapsed.count() << std::endl;
 //
 // print paths
 Printer();
 return 0;
}
//


Finally, thanks again for reading this blog.
-Mike

Sunday, January 1, 2017

C++11 : Multi-Threaded Random Generator using PPL

After some woodshedding with C++11 random and publishing distributional random generator in my blog a couple of weeks ago, the next question was, how to make this thing to happen a bit faster? Since I already have been snooping PPL (Parallel Patterns Library), I realized that Parallel Algorithms in that library would be an excellent candidate for this task. There is a great and comprehensive internet book available on this topic. After getting familiar with some of the issues presented in these sources, one should be able to parallelize algorithms succesfully.

There are some very specific issues, related to random generator seeding. As I started to parallelize my random number generation (creating several generators and processing these using parallel algorithm), I realized that my random generators were not working as expected. After some further woodshedding I came to conclusion, that (in a multi-threaded setting) each generator has to be seeded independently. Finally, concerning my current generator implementation, I made a decision that a client has to provide explicit seed values for all generators. Based on my hands-on experience with a system used for counterparty credit risk simulations (widely ranked as state-of-the-art numerical software), I dare to say that this approach is well justified. Now, the question of what kind of tools will be used for creating the actual seed values, has been left to a client judgement. The rabbit hole of these profound issues related to random generators have been chewed in this blog. Do not enter, if you are not absolutely ready to take the red pill.


ROUGH DIAGNOSTICS


I tested the both sequential (for_each) and parallel (parallel_for_each) schemes by using four generators, 10000 paths and 250 time steps for a single run. After this, I repeated this run for 250 times. The average running time for this sample was 5187 milliseconds (sequential) and 2317 milliseconds (parallel). So, parallel scheme will be completed about two times faster. The actual CPU usages during simulation process for the both schemes are shown in the screenshot below. When using parallel algorithm, computer is able to utilize all four cores in my computer.



I also tested all processed random paths for the both schemes, just to be absolutely sure there are no path duplicates (random number generation would not be independent). Based on my analysis made in Excel, all processed paths for the both schemes are different and there are no duplicates. Presented scheme is hereby fulfilling my two requirements : faster creation of random paths and independency of random generators.


RandomGenerator.h


Current implementation for distributional random path generator is presented below. Note, that the actual seeder creation takes place outside of this class.

#pragma once
#include <algorithm>
#include <functional>
#include <vector>
#include <random>
//
namespace MikeJuniperhillRandomGeneratorTemplate
{
 template <typename Generator = std::mt19937, typename Distribution = std::normal_distribution<double>>
 /// <summary>
 /// Template class for creating random number paths using
 /// Mersenne Twister as default uniform random generator and
 /// Standard Normal (0.0, 1.0) as default probability distribution.
 /// </summary> 
 class RandomGenerator
 {
 public:
  /// <summary>
  /// set explicit seed value
  /// </summary>
  RandomGenerator(unsigned long seed)
  {
   // construct lambda method for processing standard normal random number
   randomGenerator = [this](double x)-> double
   {
    x = distribution(uniformGenerator);
    return x;
   };
   // seed generator
   uniformGenerator.seed(seed);
  }
  /// <summary>
  /// fill vector reference with distributional random numbers
  /// </summary> 
  void operator()(std::vector<double>& v)
  {
   std::transform(v.begin(), v.end(), v.begin(), randomGenerator);
  }
 private:
  std::function<double(double)> randomGenerator;
  Generator uniformGenerator;
  Distribution distribution;
 };
}
//


tester.cpp


First, all simulation parameters and explicit seed values for all generators are defined. Simple factory method is then used to create the actual generator function as lambda function. Next, factory method is returning generator function wrapped inside a shared pointer. Main program is then storing each created generator function into vector container and finally processing all generators (sequentially or parallel). Each generator is storing generated paths into thread-safe nested vector (std::vector inside concurrency::concurrent_vector).

#include <iostream>
#include <memory>
#include <chrono>
#include <ppl.h>
#include <concurrent_vector.h>
#include "RandomGenerator.h"
namespace MJ = MikeJuniperhillRandomGeneratorTemplate;
//
//
// thread-safe nested vector for storing random paths processed by generators
concurrency::concurrent_vector<std::vector<double>> paths;
//
// prototype for generator function
using Generator = std::function<void(void)> ;
//
// factory method :
// return generator function as function wrapper
std::shared_ptr<Generator> Factory(int nPaths, int nSteps, unsigned long seed, 
 concurrency::concurrent_vector<std::vector<double>>& paths)
{
 // create generator function
 auto generator = [=, &paths]()->void
 {
  // step 1 : create distributional random generator
  MJ::RandomGenerator<> randomGenerator(seed);
  // step 2 : create container for a random path
  std::vector<double> path(nSteps);
  // step 3 : create path using distributional generator
  // and import created path into thread-safe container
  for (auto i = 0; i != nPaths; ++i)
  {
   randomGenerator(path);
   paths.push_back(path);
  }
 };
 // return generator function as function wrapper
 return std::shared_ptr<Generator>(new Generator(generator));
}
//
int main()
{
 // define parameters and seed values for generator functions
 int nGenerators = 4;
 int nPaths = 10000;
 int nSteps = 250;
 std::vector<unsigned long> seed = { 10322854, 65947, 387528, 772399573 };
 //
 // use factory method for creating generator function
 std::vector<std::shared_ptr<Generator>> generators;
 for (auto i = 0; i < nGenerators; i++) generators.push_back(Factory(nPaths, nSteps, seed[i], paths));
 //
 // parallel processing
 auto start = std::chrono::steady_clock::now();
 concurrency::parallel_for_each(generators.begin(), generators.end(),
  [](std::shared_ptr<Generator> ppg) -> void { (*ppg)(); });
 //
 //// uncomment for sequential processing
 //std::for_each(generators.begin(), generators.end(),
 // [](std::shared_ptr<Generator> ppg) -> void { (*ppg)(); });
 //
 auto end = std::chrono::steady_clock::now();
 auto timeElapsed = (end - start);
 std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(timeElapsed).count() << std::endl;
 return 0;
}
//


Finally, thanks for reading my blog. I would like to wish Happy New Year for everybody.
-Mike

Monday, December 26, 2016

XLW : Interfacing C++11 PathGenerator to Excel

In my previous post on generating paths for different types of one-factor processes, I was writing all processed paths sequentially into separate CSV files. Later, these files could be opened in Excel for any further use. This approach is far from being an efficient way to interface C++ program to Excel. While being possible to create in-house tool for this task, the actual implementation part requires relatively deep understanding and hands-on experience on Excel/C API. The amount of issues to be learned in order to produce truly generic and reliable tool, is far from being something which could be internalized in very short period of time. There are limited amount of books available on this topic, but the most recommended book is the one written by Steve Dalton.

This time, I wanted to present kind of "industry standard" way to accomplish this task using "easy-to-use" Excel/C++ interfacing tool, which has been there already since 2002. XLW is an application which wraps Excel/C API into a simple C++ interface, which can then be used to customize Excel with user-defined worksheet functions. For any newcomer into this issue, it is highly recommended first to watch instructional "how-to-use" clips from project homepage. The complete workflow of using XLW wrapper is presented there from downloading to debugging. Also, AdaptiveRisk blog is presenting extremely useful stuff, well enough to get started.


cppinterface.h


For this project, I have extracted a new XLW xll template and opened corresponding solution in my Visual Studio Express 2013. The content of this header file in my current project is presented below. I have method declarations for two functions, which are both returning matrix object (XLW data structure). It is my intention to use my previous PathGenerator project, in order to fill these matrix objects with paths using desired one-factor processes.

#ifndef TEST_H
#define TEST_H
//
#include "xlw/MyContainers.h"
#include <xlw/CellMatrix.h>
#include <xlw/DoubleOrNothing.h>
#include <xlw/ArgList.h>
#include <xlw/XlfServices.h>
//
using namespace xlw;
//
//<xlw:libraryname=XLWPathGenerator
//
// method for requesting vasicek paths
MyMatrix // return matrix of random paths following Vasicek SDE
//<xlw:volatile
GetPaths_Vasicek(double t, // time to maturity
double r, // current short rate
double longTermRate, // long-term average rate
double meanReversion, // mean reversion speed
double rateVolatility // rate volatility
);
//
// method for requesting GBM paths
MyMatrix // return matrix of random paths following Geometric Brownian Motion SDE
//<xlw:volatile
GetPaths_BrownianMotion(double t, // time to maturity
double s, // current spot rate
double rate, // risk-free rate
double volatility // volatility
);
//
#endif

Commenting may seem a bit strange first, but the following screenshot containing Excel function argument input box may help to catch the point.















source.cpp


Implementations for two methods declared in header file are presented below. Information concerning number of time steps (for a path) and number of paths to be created are extracted from matrix dimensions using XlfServices object. After this, desired OneFactorProcess and PathGenerator objects are created. Finally, PathGenerator object is used to process a path, which will be imported into resulting matrix object (paths) and returned for the client (Excel).

#include <cppinterface.h>
#include "PathGenerator.h"
#pragma warning (disable : 4996)
//
MyMatrix GetPaths_Vasicek(double t, double r, double longTermRate, double meanReversion, double rateVolatility)
{
 // request dimensions for calling matrix
 const unsigned int nPaths = XlfServices.Information.GetCallingCell().columns();
 const unsigned int nSteps = XlfServices.Information.GetCallingCell().rows();
 // create container for all processed paths
 MyMatrix paths(nSteps, nPaths);
 // create container for a single path to be processed
 MyArray path(nSteps);
 //
 // create vasicek process and path generator
 std::shared_ptr<MJProcess::OneFactorProcess> vasicek =
  std::shared_ptr<MJProcess::Vasicek>(new MJProcess::Vasicek(meanReversion, longTermRate, rateVolatility));
 PathGenerator<> shortRateProcess(r, t, vasicek);
 //
 // process paths using path generator
 for (unsigned int i = 0; i != nPaths; ++i)
 {
  shortRateProcess(path);
  // import processed path into paths container
  for (unsigned j = 0; j != nSteps; ++j)
  {
   paths[j][i] = path[j];
  }
 }
 return paths;
}
//
MyMatrix GetPaths_BrownianMotion(double t, double s, double rate, double volatility)
{
 // request dimensions for calling matrix
 const unsigned int nPaths = XlfServices.Information.GetCallingCell().columns();
 const unsigned int nSteps = XlfServices.Information.GetCallingCell().rows();
 // create container for all processed paths
 MyMatrix paths(nSteps, nPaths);
 // create container for a single path to be processed
 MyArray path(nSteps);
 //
 // create geometric brownian motion process and path generator
 std::shared_ptr<MJProcess::OneFactorProcess> brownianMotion =
  std::shared_ptr<MJProcess::GBM>(new MJProcess::GBM(rate, volatility));
 PathGenerator<> equityPriceProcess(s, t, brownianMotion);
 //
 // process paths using path generator
 for (unsigned int i = 0; i != nPaths; ++i)
 {
  equityPriceProcess(path);
  // import processed path into paths container
  for (unsigned j = 0; j != nSteps; ++j)
  {
   paths[j][i] = path[j];
  }
 }
 return paths;
}
//


In order to get this thing up and running, header file for PathGenerator has to be included. I have set the current XLL project as startup project. As a side, I have opened my PathGenerator project (containing header files for RandomGenerator, OneFactorProcess and PathGenerator). Since this side project is still unaccessible, it has to be linked to my current XLL project : Project - Properties - Configuration Properties - C/C++ - General - Additional Include Directories (Browse folder containing source files for the project to be linked). After completing these steps and building this project succesfully, I am finally ready to test the provided functionality in Excel.


Excel


After opening a new Excel, I need to drag-and-drop (or open from Excel) newly created xll template from my \\Projects\XLWTester\Debug folder to Excel. In my Excel (screenshot below), I have two boxes for input parameters (Vasicek, Brownian Motion) and two ranges for resulting one-factor process paths (36 steps, 15 paths). As soon as I hit F9 button, my one-factor paths will be re-created. Finally, it should be noted that the functions are both array formulas.





















Finally, thanks for reading this blog. Pleasant waiting for a new year for everybody.
-Mike

Sunday, December 18, 2016

C++11 : PathGenerator Class

This blog posting is a sequel for the post I published just a couple of days ago. Having tools for generating random numbers is nice, but I still wanted to put that Random Generator template class to do something even more useful, like to produce paths for asset prices.

The idea was to create some practical means for transforming random number paths into asset price paths, following any desired (one-factor) stochastic process. For this purpose, I have created one template class (PathGenerator), which is technically just a wrapper for RandomGenerator template class and OneFactorProcess polymorphic class hierarchy. The purpose of RandomGenerator is to produce random numbers from desired probability distribution (usually that is standard normal) and the purpose of OneFactorProcess implementation is to provide information for PathGenerator on how to calculate drift and diffusion coefficients for a chosen stochastic process.


PATHS


For a marketing reasons, let us see the end product first. Simulated asset price paths for Geometric Brownian Motion and Vasicek processes are presented in Excel screenshot below. Test program (presented below) has been created in a way, that all processed asset price paths for a chosen stochastic process are exported into CSV file (which can then be imported into Excel for further investigation).

















ONE-FACTOR PROCESS


Abstract base class (OneFactorProcess) is technically just an interface, which provides practical means for a client for customizing drift and diffusion functions for different types of stochastic processes. I decided to implement polymorphic class hierarchy, since class is still pretty compact place for storing private member data and corresponding algorithms using that member data.

In the first draft, I was actually implementing drift and diffusion coefficient algorithms by using functions and lambdas, which (being initially created in main program) would then have been used inside PathGenerator object. It was technically working well, but from the viewpoint of possible end user (say, having member data and algorithm implementations in different files) it would have been quite a different story.

#pragma once
//
namespace MikeJuniperhillOneFactorProcessLibrary
{
 /// <summary>
 /// Abstract base class for all one-factor processes
 /// is technically just an interface, which provides practical means
 /// for customizing drift and diffusion functions for different types of processes.
 /// </summary>
 class OneFactorProcess
 {
 public:
  virtual double drift(double x, double t) = 0;
  virtual double diffusion(double x, double t) = 0;
 };
 //
 /// <summary>
 /// Implementation for Vasicek short-rate model.
 /// </summary>
 class Vasicek : public OneFactorProcess
 {
 public:
  Vasicek(double meanReversion, double longTermRate, double rateVolatility)
   : meanReversion(meanReversion), longTermRate(longTermRate), rateVolatility(rateVolatility) { }
  double drift(double x, double t) override { return meanReversion * (longTermRate - x); }
  double diffusion(double x, double t) override { return rateVolatility; }
 private:
  double meanReversion;
  double longTermRate;
  double rateVolatility;
 };
 //
 /// <summary>
 /// Implementation for Geometric Brownian Motion.
 /// </summary>
 class GBM : public OneFactorProcess
 {
 public:
  GBM(double rate, double volatility) : rate(rate), volatility(volatility) { }
  double drift(double x, double t) override { return rate * x; }
  double diffusion(double x, double t) override { return x * volatility; }
 private:
  double rate;
  double volatility;
 };
}


PATH GENERATOR


PathGenerator object is using RandomGenerator object for creating random numbers from standard normal probability distribution, by using default seeder for default uniform generator (Mersenne Twister). At some point, this class was having several different constructors for client-given seeder and client-given probability distribution. However, for the sake of clarity, I decided to remove all that optionality. In most of the cases, we want to simulate random numbers from standard normal distribution and Mersenne Twister generator still does the uniform part pretty well. It should be noted, that if such a customizing need sometimes arises, this class can be modified accordingly.

#pragma once
//
#include "RandomGenerator.h"
#include "OneFactorProcess.h"
namespace MJRandom = MikeJuniperhillRandomGeneratorTemplate;
namespace MJProcess = MikeJuniperhillOneFactorProcessLibrary;
//
template <typename Generator = std::mt19937, typename Distribution = std::normal_distribution<double>>
class PathGenerator
{
public:
 /// <summary>
 /// Constructor for PathGenerator template class.
 /// <para> - using default seeder from chrono library </para>
 /// <para> - using standard normal distribution </para>
 /// <para> - using mersenne twister uniform generator </para>
 /// </summary>
 PathGenerator(double spot, double maturity, std::shared_ptr<MJProcess::OneFactorProcess>& process)
  : spot(spot), maturity(maturity), process(process)
 {
  // create default seeder lambda function for random generator
   this->seeder = [](void) -> unsigned long { return static_cast<unsigned long>
    (std::chrono::steady_clock::now().time_since_epoch().count()); };
  // create random generator
  generator = std::unique_ptr<MJRandom::RandomGenerator<Generator, Distribution>>
   (new MJRandom::RandomGenerator<Generator, Distribution>(this->seeder));
 }
 /// <summary> 
 /// Functor, which fills auxiliary vector reference with asset prices following a given stochastic process.
 /// </summary>  
 void operator()(std::vector<double>& v)
 {
  // transform initialized vector into a path containing random numbers
  (*generator)(v);
  //
  double dt = maturity / (v.size() - 1);
  double dw = 0.0;
  double s = spot;
  double t = 0.0;
  v[0] = s; // 1st path element is always the current spot price
  //
  // transform random number vector into a path containing asset prices
  for (auto it = v.begin() + 1; it != v.end(); ++it)
  {
   t += dt;
   dw = (*it) * std::sqrt(dt);
   (*it) = s + (*process).drift(s, t) * dt + (*process).diffusion(s, t) * dw;
   s = (*it);
  }
 }
private:
 double spot;
 double maturity;
 std::function<unsigned long(void)> seeder;
 std::shared_ptr<MJProcess::OneFactorProcess> process;
 std::unique_ptr<MJRandom::RandomGenerator<Generator, Distribution>> generator;
};


TESTER


Presented test program is creating two different one-factor processes (Geometric Brownian Motion, Vasicek) and using PathGenerator for simulating asset price paths. All processed paths for the both cases will then be printed into CSV file for further investigations (Excel). I dare to say, that the process of generating asset price paths for one-factor process is easy and straightforward with PathGenerator class. For testing purposes, RandomGenerator header file should be included in the project.

#include "PathGenerator.h"
#include <fstream>
//
// printer : after a simulation iteration, append a path content into a file
// this inefficient idiom is not suitable for anything else but crude testing
void Print(const std::vector<double>& v, const std::string& filePathName)
{
 std::ofstream file(filePathName, std::ios_base::app);
 for (auto& element : v) { file << std::to_string(element) << ";"; }
 file << std::endl;
}
//
int main()
{
 // define and delete output CSV test files
 const std::string fileForVasicekProcess = "C:\\temp\\paths.vasicek.csv";
 const std::string fileForBrownianMotion = "C:\\temp\\paths.brownianMotion.csv";
 std::remove(fileForVasicekProcess.c_str());
 std::remove(fileForBrownianMotion.c_str());
 //
 // create auxiliary vector container for path processing
 double t = 3.0; // time to maturity
 int nPaths = 250; // number of simulated paths
 int nSteps = 1095; // number of time steps in one path (3 years * 365 days = 1095 steps)
 // the size of vector must be the number of desired time steps 
 // plus one, since the first vector item will be the current asset spot price
 std::vector<double> path(nSteps + 1);
 //
 // example 1 : create paths using vasicek process
 double r = 0.0095;
 double longTermRate = 0.05;
 double meanReversion = 0.2;
 double rateVolatility = 0.0075;
 std::shared_ptr<MJProcess::OneFactorProcess> vasicek =
  std::shared_ptr<MJProcess::Vasicek>(new MJProcess::Vasicek(meanReversion, longTermRate, rateVolatility));
 PathGenerator<> shortRateProcess(r, t, vasicek);
 for (int i = 0; i != nPaths; ++i)
 {
  shortRateProcess(path); // generate a path
  Print(path, fileForVasicekProcess); // print a path
 }
 //
 // example 2 : create paths using geometric brownian process
 double s = 100.0;
 double rate = 0.02;
 double volatility = 0.25;
 std::shared_ptr<MJProcess::OneFactorProcess> brownianMotion =
  std::shared_ptr<MJProcess::GBM>(new MJProcess::GBM(rate, volatility));
 PathGenerator<> equityPriceProcess(s, t, brownianMotion);
 for (int i = 0; i != nPaths; ++i)
 {
  equityPriceProcess(path);  // generate a path
  Print(path, fileForBrownianMotion); // print a path
 }
 return 0;
}

Finally, thanks for reading my blog and have a very pleasant waiting time for Christmas.
-Mike