Sunday, January 1, 2017

C++11 : Multi-Threaded Random Generator using PPL

After some woodshedding with C++11 random and publishing distributional random generator in my blog a couple of weeks ago, the next question was, how to make this thing to happen a bit faster? Since I already have been snooping PPL (Parallel Patterns Library), I realized that Parallel Algorithms in that library would be an excellent candidate for this task. There is a great and comprehensive internet book available on this topic. After getting familiar with some of the issues presented in these sources, one should be able to parallelize algorithms succesfully.

There are some very specific issues, related to random generator seeding. As I started to parallelize my random number generation (creating several generators and processing these using parallel algorithm), I realized that my random generators were not working as expected. After some further woodshedding I came to conclusion, that (in a multi-threaded setting) each generator has to be seeded independently. Finally, concerning my current generator implementation, I made a decision that a client has to provide explicit seed values for all generators. Based on my hands-on experience with a system used for counterparty credit risk simulations (widely ranked as state-of-the-art numerical software), I dare to say that this approach is well justified. Now, the question of what kind of tools will be used for creating the actual seed values, has been left to a client judgement. The rabbit hole of these profound issues related to random generators have been chewed in this blog. Do not enter, if you are not absolutely ready to take the red pill.


ROUGH DIAGNOSTICS


I tested the both sequential (for_each) and parallel (parallel_for_each) schemes by using four generators, 10000 paths and 250 time steps for a single run. After this, I repeated this run for 250 times. The average running time for this sample was 5187 milliseconds (sequential) and 2317 milliseconds (parallel). So, parallel scheme will be completed about two times faster. The actual CPU usages during simulation process for the both schemes are shown in the screenshot below. When using parallel algorithm, computer is able to utilize all four cores in my computer.



I also tested all processed random paths for the both schemes, just to be absolutely sure there are no path duplicates (random number generation would not be independent). Based on my analysis made in Excel, all processed paths for the both schemes are different and there are no duplicates. Presented scheme is hereby fulfilling my two requirements : faster creation of random paths and independency of random generators.


RandomGenerator.h


Current implementation for distributional random path generator is presented below. Note, that the actual seeder creation takes place outside of this class.

#pragma once
#include <algorithm>
#include <functional>
#include <vector>
#include <random>
//
namespace MikeJuniperhillRandomGeneratorTemplate
{
 template <typename Generator = std::mt19937, typename Distribution = std::normal_distribution<double>>
 /// <summary>
 /// Template class for creating random number paths using
 /// Mersenne Twister as default uniform random generator and
 /// Standard Normal (0.0, 1.0) as default probability distribution.
 /// </summary> 
 class RandomGenerator
 {
 public:
  /// <summary>
  /// set explicit seed value
  /// </summary>
  RandomGenerator(unsigned long seed)
  {
   // construct lambda method for processing standard normal random number
   randomGenerator = [this](double x)-> double
   {
    x = distribution(uniformGenerator);
    return x;
   };
   // seed generator
   uniformGenerator.seed(seed);
  }
  /// <summary>
  /// fill vector reference with distributional random numbers
  /// </summary> 
  void operator()(std::vector<double>& v)
  {
   std::transform(v.begin(), v.end(), v.begin(), randomGenerator);
  }
 private:
  std::function<double(double)> randomGenerator;
  Generator uniformGenerator;
  Distribution distribution;
 };
}
//


tester.cpp


First, all simulation parameters and explicit seed values for all generators are defined. Simple factory method is then used to create the actual generator function as lambda function. Next, factory method is returning generator function wrapped inside a shared pointer. Main program is then storing each created generator function into vector container and finally processing all generators (sequentially or parallel). Each generator is storing generated paths into thread-safe nested vector (std::vector inside concurrency::concurrent_vector).

#include <iostream>
#include <memory>
#include <chrono>
#include <ppl.h>
#include <concurrent_vector.h>
#include "RandomGenerator.h"
namespace MJ = MikeJuniperhillRandomGeneratorTemplate;
//
//
// thread-safe nested vector for storing random paths processed by generators
concurrency::concurrent_vector<std::vector<double>> paths;
//
// prototype for generator function
using Generator = std::function<void(void)> ;
//
// factory method :
// return generator function as function wrapper
std::shared_ptr<Generator> Factory(int nPaths, int nSteps, unsigned long seed, 
 concurrency::concurrent_vector<std::vector<double>>& paths)
{
 // create generator function
 auto generator = [=, &paths]()->void
 {
  // step 1 : create distributional random generator
  MJ::RandomGenerator<> randomGenerator(seed);
  // step 2 : create container for a random path
  std::vector<double> path(nSteps);
  // step 3 : create path using distributional generator
  // and import created path into thread-safe container
  for (auto i = 0; i != nPaths; ++i)
  {
   randomGenerator(path);
   paths.push_back(path);
  }
 };
 // return generator function as function wrapper
 return std::shared_ptr<Generator>(new Generator(generator));
}
//
int main()
{
 // define parameters and seed values for generator functions
 int nGenerators = 4;
 int nPaths = 10000;
 int nSteps = 250;
 std::vector<unsigned long> seed = { 10322854, 65947, 387528, 772399573 };
 //
 // use factory method for creating generator function
 std::vector<std::shared_ptr<Generator>> generators;
 for (auto i = 0; i < nGenerators; i++) generators.push_back(Factory(nPaths, nSteps, seed[i], paths));
 //
 // parallel processing
 auto start = std::chrono::steady_clock::now();
 concurrency::parallel_for_each(generators.begin(), generators.end(),
  [](std::shared_ptr<Generator> ppg) -> void { (*ppg)(); });
 //
 //// uncomment for sequential processing
 //std::for_each(generators.begin(), generators.end(),
 // [](std::shared_ptr<Generator> ppg) -> void { (*ppg)(); });
 //
 auto end = std::chrono::steady_clock::now();
 auto timeElapsed = (end - start);
 std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(timeElapsed).count() << std::endl;
 return 0;
}
//


Finally, thanks for reading my blog. I would like to wish Happy New Year for everybody.
-Mike

No comments:

Post a Comment