Tuesday, January 8, 2019

Python: Market Scenario Files Generator for Third-party Analytics Software

Third-party analytics software usually requires a specific set of market data to perform its calculations. In this post, I am publishing one of my utility Python programs for creating different types of stress scenario markets, based on a given base market and a set of prepared XML configurations. The complete program can be found in my GitHub repository.


The following screenshot shows the configuration for this program. The SourceFilePath attribute captures the source market data CSV file, and TargetFolderPath captures the folder into which all market scenario files will be created. Finally, ScenarioConfigurationsPath captures the folder that contains all XML scenario configuration files. This configuration XML file should be stored in a chosen directory.

    <!-- attributes for scenario generator settings -->
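Since the screenshot of this configuration file is not reproduced here, the following is only a minimal sketch of how such settings could be read with the standard library. The element names and the paths are hypothetical; only the three attribute names (SourceFilePath, TargetFolderPath, ScenarioConfigurationsPath) come from the description above, and the actual file in the GitHub repository may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings file: element names and paths are illustrative only;
# only the three attribute names come from the post.
SAMPLE_SETTINGS = """<Configurations>
  <!-- attributes for scenario generator settings -->
  <ScenarioGeneratorSettings
      SourceFilePath="/home/user/markets/base_market.csv"
      TargetFolderPath="/home/user/markets/scenarios/"
      ScenarioConfigurationsPath="/home/user/markets/configurations/"/>
</Configurations>"""

SETTING_NAMES = ("SourceFilePath", "TargetFolderPath", "ScenarioConfigurationsPath")


def read_settings(xml_text):
    """Return the three path attributes from the first element that defines them."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() also visits the root element itself
        if all(name in element.attrib for name in SETTING_NAMES):
            return {name: element.attrib[name] for name in SETTING_NAMES}
    raise ValueError("no scenario generator settings found")


if __name__ == "__main__":
    for name, value in read_settings(SAMPLE_SETTINGS).items():
        print(name, "=", value)
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring`, with the same attribute lookup afterwards.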

Market data

The following screenshot shows the given base market data. For brevity, only the EUR swap curve has been used here as an example. All market data points are defined as key-value pairs (ticker, value).


We can clearly see that the system used for constructing market data tickers leads to a scheme in which every market data point has one and only one unique ticker. This guarantees that we can drill down and stress individual market data points with regular expressions, if so desired. This data should be copied into a CSV file (its path is defined by SourceFilePath in the previous configuration file).
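Because every ticker is unique, the base market fits naturally into a Python dict, and selecting market data points becomes a regex filter over the keys. The sketch below assumes the CSV has two columns (ticker, value) and no header row; the ticker suffixes and values are hypothetical, only the `IR.EUR-EURIBOR.CASH` prefix appears in this post.

```python
import csv
import io
import re

# Hypothetical excerpt: the ticker suffixes and values are illustrative only.
SAMPLE_CSV = """IR.EUR-EURIBOR.CASH.1W,-0.0037
IR.EUR-EURIBOR.CASH.3M,-0.0031
IR.EUR-EURIBOR.SWAP.5Y,0.0020
"""


def load_market(csv_file):
    """Read (ticker, value) rows into a dict; assumes no header row."""
    return {row[0]: float(row[1]) for row in csv.reader(csv_file) if row}


def select_tickers(market, pattern):
    """Return the tickers in which the regex pattern is found, in file order."""
    regex = re.compile(pattern)
    return [ticker for ticker in market if regex.search(ticker)]


if __name__ == "__main__":
    market = load_market(io.StringIO(SAMPLE_CSV))
    print(select_tickers(market, r"^IR\.EUR-EURIBOR\.CASH"))
```

With a real file, `with open(path, newline="") as f: market = load_market(f)` gives the same dict.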

Scenario configurations

The following screenshot shows the XML configuration for one market scenario. One scenario can have several different scenario items (say, stress these rates up, stress those rates down, apply these changes to all FX rates against EUR and set hard-coded values for all CDS curves). In this configuration, ID and description are self-explanatory. The regExpression attribute holds the regular expression of each scenario item, which will be searched for in the risk factor tickers. As soon as a regex match is found, the program uses the corresponding operationType attribute to identify the desired stress operation (addition, multiplication or hard-coded value). Finally, the amount of change applied to the risk factor value is defined in the stressValue attribute. This XML configuration should be stored in the scenario configurations folder (defined by ScenarioConfigurationsPath in the program configuration file).

<!-- operation types : 0 = ADDITION, 1 = MULTIPLICATION, 2 = HARD-CODED VALUE -->
  <description>custom stress scenario for EUR swap curve</description>
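The matching logic described above can be sketched as follows. This is an illustration, not the program from the repository: scenario items are given as plain `(regExpression, operationType, stressValue)` tuples instead of being parsed from XML, the operation codes follow the XML comment above, and it assumes that only the first matching item is applied to a ticker. The tickers and values in the usage block are hypothetical.

```python
import csv
import io
import re

# operation type codes, as documented in the XML comment above
ADDITION, MULTIPLICATION, HARD_CODED = 0, 1, 2


def stress_value(value, operation_type, stress):
    """Apply one scenario item's operation to a single risk factor value."""
    if operation_type == ADDITION:
        return value + stress
    if operation_type == MULTIPLICATION:
        return value * stress
    if operation_type == HARD_CODED:
        return stress
    raise ValueError(f"unknown operationType: {operation_type}")


def apply_scenario(market, items):
    """Return a stressed copy of market.

    items: list of (regExpression, operationType, stressValue) tuples.
    Assumption: only the first item whose pattern is found in a ticker is applied;
    tickers without any match keep their base value.
    """
    compiled = [(re.compile(pattern), op, stress) for pattern, op, stress in items]
    stressed = {}
    for ticker, value in market.items():
        for regex, op, stress in compiled:
            if regex.search(ticker):
                value = stress_value(value, op, stress)
                break
        stressed[ticker] = value
    return stressed


def write_market(market, csv_file):
    """Write (ticker, value) rows, i.e. the same layout as the base market file."""
    writer = csv.writer(csv_file)
    for ticker, value in market.items():
        writer.writerow([ticker, value])


if __name__ == "__main__":
    base = {"IR.EUR-EURIBOR.CASH.3M": -0.0031, "IR.EUR-EURIBOR.SWAP.5Y": 0.0020}  # hypothetical
    scenario = [(r"^IR\.EUR-EURIBOR\.CASH", ADDITION, 0.0010),
                (r"^IR\.EUR-EURIBOR\.SWAP", MULTIPLICATION, 1.5)]
    out = io.StringIO()
    write_market(apply_scenario(base, scenario), out)
    print(out.getvalue())
```

If several items could match the same ticker and should all be applied, removing the `break` would apply them one after another in configuration order.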

Finally, the following screenshot shows the resulting market data when all configured scenario items have been applied. This is the content of the output CSV file created by this Python program.


A handy way to create and test regular expressions is to use one of the online regex tools available. As an example, the first scenario item (^IR.EUR-EURIBOR.CASH) has been applied to the given base market data. The last screenshot below shows all regex matches.
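The same check can also be made locally with Python's `re` module. One detail worth knowing: in `^IR.EUR-EURIBOR.CASH` the dots are not escaped, so each one matches any character. With the ticker scheme above this probably makes no difference in practice, but escaping the dots (or building the pattern with `re.escape`) makes the match exact. The tickers below are hypothetical, and the third one is constructed only to show the difference.

```python
import re

loose = r"^IR.EUR-EURIBOR.CASH"                  # pattern as used in the first scenario item
strict = r"^IR\.EUR-EURIBOR\.CASH"               # same pattern with the dots escaped
exact = "^" + re.escape("IR.EUR-EURIBOR.CASH")   # escaping done by the re module

# hypothetical tickers; the last one exists only to show what an unescaped dot allows
tickers = ["IR.EUR-EURIBOR.CASH.3M", "IR.EUR-EURIBOR.SWAP.5Y", "IRXEUR-EURIBOR-CASH"]


def matches(pattern, names):
    """Return the names in which the pattern is found."""
    return [name for name in names if re.search(pattern, name)]


if __name__ == "__main__":
    for pattern in (loose, strict, exact):
        print(pattern, matches(pattern, tickers))
```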

Have a great start to the year 2019, and thanks a lot again for reading my blog.

Wednesday, December 26, 2018

QuantLib-Python: Multiprocessing Method Wrapper

In a previous post, I published a program for simulating the term structure up to 30 years with a daily time step, using the Hull-White one-factor model. The resulting curve was able to replicate the given initial yield curve without any notable differences. The only serious issue was the running time of the program.

In order to improve on this specific issue, I got familiar with some of the multithreading possibilities in Python. Now, there are some deep issues related to the way Python threads are actually implemented, especially the thread locking mechanism known as the Global Interpreter Lock (GIL). The topic has been thoroughly discussed here. To avoid GIL-related issues, another way is to use Python multiprocessing, which allows the programmer to fully leverage multiple processors on a given machine. Moreover, separate processes have their own memory, so one process cannot affect another's variables.
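The claim that one process cannot affect another's variables is easy to verify. In this minimal sketch (an illustration, not code from the post), a child process increments a module-level counter, but the parent still sees its own, unchanged value afterwards. It assumes a POSIX system where the fork start method is available, as appears to be the case for the Linux laptop used in this post.

```python
import multiprocessing as mp

counter = 0  # module-level variable; each process ends up with its own copy


def increment():
    """Runs in the child process and changes only the child's copy of counter."""
    global counter
    counter += 1
    print("child sees counter =", counter)


def run_demo():
    # assumption: a POSIX system where the fork start method is available
    context = mp.get_context("fork")
    child = context.Process(target=increment)
    child.start()
    child.join()
    return counter  # the parent's copy is untouched by the child


if __name__ == "__main__":
    print("parent sees counter =", run_demo())
```

This isolation is also why results have to be sent back explicitly, for example through a queue.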

Comparison statistics

Again, I simulated the term structure up to 30 years with a daily time step, using 10000 paths. I did some quick profiling in order to find the bottlenecks in the original (sequential) program. Looking at the task share column, we can see that two tasks consume almost all of the running time: path generation (~54%) and path integration (~46%). I then isolated these two parts and processed them using multiprocessing.

By using multiprocessing (two configured processes), I managed to decrease the complete running time from 163 seconds to 107 seconds. In general, for all the parts that benefited from multiprocessing, the improvement ratio (multiprocessing time divided by sequential time) is around 0.65.
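The overall figure can be checked directly. For comparison, the second expression gives the ideal ratio for two processes if all of the work parallelized perfectly; the gap to the measured value is presumably a mix of multiprocessing overhead (starting processes, moving results between them) and the parts of the program that still run sequentially.

```latex
\[
r_{\text{total}} = \frac{T_{\text{multiprocessing}}}{T_{\text{sequential}}}
= \frac{107\ \text{s}}{163\ \text{s}} \approx 0.66,
\qquad
r_{\text{ideal}} = \frac{1}{n_{\text{processes}}} = \frac{1}{2}
\]
```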

CPU puzzle

In order to understand the reason for this improvement, let us take a look at the processor architecture of my laptop. First, let us check the "Grand Promise" made by System Monitor.

Based on this view, I actually expected to have four CPUs for processing. I was then really surprised that adding a third and a fourth process did not decrease the running time any further than having just two processes. After the usual Stack Overflow gymnastics, I finally found the definition for calculating the real number of CPUs available in my laptop.

The number of CPUs actually available is "Core(s) per socket * Socket(s)", which is 2 * 1 in my laptop. The four CPUs shown by System Monitor are logical CPUs (two hardware threads on each of the two physical cores), so all in all I have only two CPUs available for this kind of processing, not four. This means that on a machine with more than two physical cores, one should expect an even better (lower) improvement ratio than reported in my statistics here.
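The field names in that formula match the output of the Linux `lscpu` command. As a small sketch, the calculation can be done from that text in Python; the sample below is an illustrative excerpt consistent with the numbers in this post (4 logical CPUs, 2 cores, 1 socket), not the actual output from my laptop. Note that `os.cpu_count()` returns the number of logical CPUs, i.e. the same optimistic figure that System Monitor shows.

```python
# Illustrative lscpu excerpt consistent with the laptop in this post
# (4 logical CPUs, 2 cores per socket, 1 socket); not the actual output.
SAMPLE_LSCPU = """Architecture:        x86_64
CPU(s):              4
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
"""


def parse_lscpu(text):
    """Turn 'Key:   value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def physical_cpus(text):
    """Core(s) per socket * Socket(s): the count that limited the speed-up."""
    fields = parse_lscpu(text)
    return int(fields["Core(s) per socket"]) * int(fields["Socket(s)"])


def logical_cpus(text):
    """Logical CPUs, i.e. what System Monitor (and os.cpu_count()) report."""
    return int(parse_lscpu(text)["CPU(s)"])


if __name__ == "__main__":
    print("logical:", logical_cpus(SAMPLE_LSCPU), "physical:", physical_cpus(SAMPLE_LSCPU))
```

On an English-locale Linux system, the real output could be passed in with `subprocess.run(["lscpu"], capture_output=True, text=True).stdout`.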

Wrapper method

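In order to avoid code duplication and to come up with something a bit more generic, I started to dream about creating a mechanism for applying multiprocessing to any given method, if so desired. Such a solution is possible by using Python lambdas. Note that this relies on the fork start method (the traditional default on Linux): under the spawn start method, the default on Windows and on macOS since Python 3.8, the lambdas and the nested worker function would have to be pickled, which fails.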

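import multiprocessing as mp

# method for executing given lambdas in parallel
def MultiprocessingWrapper(targetFunctionList):
    # lambdas and the nested Worker cannot be pickled, so use fork (POSIX only)
    ctx = mp.get_context("fork")
    processList = []
    aggregatedResults = []
    manager = ctx.Manager()
    queue = manager.Queue()

    # execute lambda from a given list based on a given index
    # storing results into queue
    def Worker(index):
        result = targetFunctionList[index]()
        queue.put(result)

    # start processes, call worker method with index number
    for i in range(len(targetFunctionList)):
        process = ctx.Process(target = Worker, args = (i,))
        processList.append(process)
        process.start()
    # join processes, extract queue into results list
    # (results arrive in completion order, not in list order)
    for process in processList:
        process.join()
    for _ in processList:
        aggregatedResults.append(queue.get())
    manager.shutdown()
    # return list of results for a client
    return aggregatedResults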

The wrapper method receives a list of lambdas as its argument. The idea is that there is always one lambda for each process to be started. Why? We might face a situation in which a different set of parameters is required for each lambda (say, all lambdas use the same uniform random generator, but with different seed values). The wrapper then creates and starts one process for each configured lambda, and the result calculated by each lambda is stored in a queue. Finally, all results are moved from the queue into a result list and returned to the client. Note that the results arrive in the order in which the processes finish, which is not necessarily the order of the lambda list.

As a concrete example, the following program segment first creates two lambdas for starting the GeneratePaths method, with a different seed value for each process. The wrapper method is then fed with the list of lambdas, and the processed paths are received into the list of results.

    # task of generating paths is highly time critical
    # use multiprocessing for path generation
    # create lambdas for multiprocessing wrapper
    # target signature: def GeneratePaths(seed, process, timeGrid, n)
    nPaths = 10000
    nProcesses = 2
    seeds = [1834, 66023]
    nPathsPerProcess = int(nPaths / nProcesses)
    target_1 = lambda:GeneratePaths(seeds[0], HW1F, grid.GetTimeGrid(), nPathsPerProcess)
    target_2 = lambda:GeneratePaths(seeds[1], HW1F, grid.GetTimeGrid(), nPathsPerProcess)
    targetFunctionList = [target_1, target_2]
    results = MultiprocessingWrapper(targetFunctionList)
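If the results should come back in the same order as the list of targets, or the code has to run under the spawn start method, one option (a swapped-in technique, not the approach used in this post) is `concurrent.futures.ProcessPoolExecutor` combined with `functools.partial`, which, unlike a lambda, can be pickled as long as it wraps a top-level function. In the sketch below, `generate_uniforms` is a hypothetical stand-in for GeneratePaths so that the example runs without QuantLib, and the fork context is chosen only to keep the sketch predictable on Linux; it can be dropped on other platforms when the script is run as a file with the `__main__` guard.

```python
import multiprocessing as mp
import random
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def generate_uniforms(seed, n):
    """Hypothetical stand-in for GeneratePaths: n uniform draws from a seeded generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def run_parallel(target_functions, n_processes):
    """Run zero-argument callables in worker processes; results keep the list order.

    The executor pickles each task, so the callables must be picklable:
    use functools.partial of top-level functions instead of lambdas.
    """
    # assumption: POSIX platform with the fork start method, as in the original setup
    context = mp.get_context("fork")
    with ProcessPoolExecutor(max_workers=n_processes, mp_context=context) as executor:
        futures = [executor.submit(target) for target in target_functions]
        return [future.result() for future in futures]


if __name__ == "__main__":
    seeds = [1834, 66023]
    targets = [partial(generate_uniforms, seed, 5) for seed in seeds]
    results = run_parallel(targets, n_processes=2)
    print(results)
```

Collecting `future.result()` in submission order is what keeps the results aligned with the seeds, at the cost of waiting for the slowest task before later results are read.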

The complete program can be found in my GitHub repository. Finally, a couple of discussion threads that may be useful for understanding some QuantLib-related issues can be found here and here. Thanks for reading my blog and, once again, Merry Christmas to everyone.

Monday, December 17, 2018

QuantLib-Python: Exposure Simulation

This Python program uses QuantLib library tools to simulate exposures for one selected Bloomberg vanilla benchmark swap transaction. Based on the simulated exposures, the program then calculates Expected Positive Exposure (EPE) and Expected Negative Exposure (ENE), as well as the corresponding CVA and DVA statistics.
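For reference, one common way to turn simulated exposures into EPE, ENE, CVA and DVA is sketched below in plain Python. It is a textbook discretization, not necessarily what the program in the repository does: the simulated values are assumed to be already discounted to today, default probabilities come from a flat hazard rate obtained with the credit-triangle approximation h = s / (1 - R), and the 40% recovery rate and the tiny set of paths in the usage block are assumptions.

```python
import math


def expected_exposures(paths):
    """EPE and ENE per time step: mean of max(V, 0) and min(V, 0) over all paths."""
    n_paths = len(paths)
    n_steps = len(paths[0])
    epe = [sum(max(path[i], 0.0) for path in paths) / n_paths for i in range(n_steps)]
    ene = [sum(min(path[i], 0.0) for path in paths) / n_paths for i in range(n_steps)]
    return epe, ene


def survival_probability(t, spread, recovery):
    """Flat-hazard survival probability; credit-triangle approximation h = s / (1 - R)."""
    hazard = spread / (1.0 - recovery)
    return math.exp(-hazard * t)


def valuation_adjustment(exposure, times, spread, recovery):
    """(1 - R) * sum_i |exposure(t_i)| * PD(t_{i-1}, t_i), with t_0 = 0.

    Pass EPE with the counterparty's credit data for CVA, and ENE with your own
    credit data for DVA (returned here as a positive number).
    """
    total = 0.0
    t_previous = 0.0
    for e, t in zip(exposure, times):
        default_probability = (survival_probability(t_previous, spread, recovery)
                               - survival_probability(t, spread, recovery))
        total += abs(e) * default_probability
        t_previous = t
    return (1.0 - recovery) * total


if __name__ == "__main__":
    # hypothetical discounted swap values: 2 paths x 2 time steps
    paths = [[1.0, -2.0], [3.0, 2.0]]
    times = [0.5, 1.0]
    epe, ene = expected_exposures(paths)
    print("EPE", epe, "ENE", ene)
    print("CVA", valuation_adjustment(epe, times, 0.01, 0.4))  # 100 bps flat CDS spread
    print("DVA", valuation_adjustment(ene, times, 0.01, 0.4))
```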

Then, some (unfortunate) limitations: the program can handle only one transaction at a time, so simulating exposures for netting sets containing several transactions is not possible. Also, the program can simulate only one risk factor at a time, so simulating exposures for transactions exposed to more than one risk factor is not possible. However, with some careful redesign, these features could also be implemented using QuantLib library tools.

The complete program can be found in my GitHub repository. Thanks for reading this blog. Merry Christmas to everyone.

Simulated exposures


A few notes on data.
  • The swap transaction is a 5Y receiver swap against 3M USD Libor + spread. At inception, the swap PV has been solved to zero. Details can be found in the screenshot below.
  • Interest rate data for the spot term structure (discount factors) has been retrieved from Bloomberg Swap Manager as of 12.12.2018.
  • Default term structures for both parties (counterparty, self) are created from flat CDS term structures (100 bps), as seen on the Bloomberg Swap Manager CVA tab.
  • Short rates are simulated using the Hull-White one-factor model, with parameters calibrated to a given set of flat 20% swaption volatilities, as seen on the Bloomberg Swap Manager CVA tab.
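The program itself relies on QuantLib's Hull-White process. Purely as an illustration of what the model does, here is a dependency-free sketch of exact Hull-White one-factor simulation in the usual decomposition r(t) = x(t) + alpha(t), where x is an Ornstein-Uhlenbeck process started at zero and alpha(t) fits the model to the initial curve. To keep it short, the initial forward curve is assumed to be flat, and the parameters a and sigma used below are hypothetical rather than the calibrated values mentioned above.

```python
import math
import random


def simulate_hw1f(a, sigma, flat_forward, t_end, n_steps, n_paths, seed=0):
    """Exact simulation of Hull-White 1F short rates r(t) = x(t) + alpha(t).

    Assumes a flat initial forward curve f(0, t) = flat_forward.
    Returns (times, paths), where each path holds r(t_1), ..., r(t_n).
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    times = [dt * (i + 1) for i in range(n_steps)]
    decay = math.exp(-a * dt)
    step_vol = sigma * math.sqrt((1.0 - math.exp(-2.0 * a * dt)) / (2.0 * a))
    # deterministic shift that makes the model reproduce the (flat) initial curve
    alpha = [flat_forward + (sigma ** 2 / (2.0 * a ** 2)) * (1.0 - math.exp(-a * t)) ** 2
             for t in times]
    paths = []
    for _ in range(n_paths):
        x = 0.0
        path = []
        for i in range(n_steps):
            x = x * decay + step_vol * rng.gauss(0.0, 1.0)  # exact OU transition over dt
            path.append(x + alpha[i])
        paths.append(path)
    return times, paths


if __name__ == "__main__":
    # hypothetical parameters; weekly steps over 5 years, 1000 paths
    times, paths = simulate_hw1f(a=0.05, sigma=0.01, flat_forward=0.025,
                                 t_end=5.0, n_steps=260, n_paths=1000, seed=1)
    mean_end = sum(path[-1] for path in paths) / len(paths)
    print("mean short rate at", times[-1], "years:", mean_end)
```

Because the OU transition is sampled exactly, the time step only controls where exposures are observed, not the accuracy of the short rate distribution at those dates.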


Bloomberg Swap Manager results: CVA = 6854 and DVA = 1557. Program results (for one run): CVA = 6727 and DVA = 1314, using weekly time steps and 1000 paths. However, "close enough" results can be achieved with a considerably smaller number of paths and a less dense time grid.


Bloomberg swap transaction

Bloomberg CVA

Bloomberg DVA

Bloomberg EPE

Program EPE

Bloomberg ENE

Program ENE