Saturday, January 25, 2014

Data structure for matrix in VBA

Large part of programming financial models has something to do with handling and operating matrices. However, for handling any matrix data, there is not ready-made data structure available for this purpose in VBA. The only way to get around this limitation, is to write your own custom data structure by using existing data structures, such as arrays. Within my previous posting on Gaussian Copula implementation, I created one such custom data structure, a Matrix class. Technically speaking, this class is just wrapping arrays into a manageable object with setters and getters, plus provides some of the most common matrix operations for the user.

The idea was very noble and I was quite happy with the first development candidate. I especially liked the idea of data structure being an object, also for schematic and semantic reasons. Plus, I thought that I would finally get rid of all those boring extra lines of code needed, when using plain arrays. I also thought that only a small performance penalty would be paid, with all function calls made using class accessors (push, at). However, as I was prototyping Monte Carlo basket equity option pricing and doing matrix operations with large matrices, I quickly realized that the cost of processing matrix operations was definitely way too high.

Testing processing times

The issue was bothering me in a such way, that I finally wanted to get some hard-tested facts on handling and operating matrix structures. For this reason, I prepared test cases for three different matrix schemes. Within these test cases, matrix data structure was implemented by using
  1. Two-dimensional array
  2. Variant array of double arrays (jagged array)
  3. Matrix class
The use of Collection and Dictionary data structures has been left out of this experiment completely, since these data structures are empirically known to be more costly than the use of arrays. Nick Webber has been doing some serious woodshedding with these issues in his excellent book Implementing Models of Financial Derivatives: Object Oriented Applications with VBA within the chapter 16.

Within test cases mentioned, a simple procedural program performs the following operations for all matrix schemes described above
  1. Creates three matrices (A, B, C)
  2. Fills two matrices (A, B) with random numbers
  3. Performs matrix multiplication (A, B) and returns the result into matrix (C)
Time elapsed was recorded only for the actual matrix multiplication operation. For each matrix schemes, matrix B rows (A columns) were dimensioned from 100 000  to 2 000 000 and matrix B columns (A rows) were assumed to be constant 10. For example, in the first calculation, we multiplied matrix A (10 * 100 000) with matrix B (100 000 * 10) and received matrix C (10 * 10). In the second calculation matrix dimensions were A (10 * 200 000) and B (200 000 * 10) and we received matrix C (10 * 10), and so on.

Cold shower

The following chart is presenting the findings of this experiment.



We can clearly see, that a simple two-dimensional array is the most efficient data structure for handling large matrix operations in VBA. There is just no way out of this, period. Testing program is presented below. You can just copy-paste it into a new standard VBA module, if you are interested to run it in your own laptop. Remember to create reference to Microsoft Scripting Runtime library.

Option Explicit
'
Private Declare Function GetTickCount Lib "kernel32.dll" () As Long
Private Const filePath As String = "C:\temp\matrix_log.txt"
Private fileSystem As Scripting.FileSystemObject
Private textWriter As TextStream
Private textWriterEnabled As Boolean
Private stream As String
Private startTime As Long
Private endTime As Long
Private timeElapsed As Double
'
Sub tester()
    '
    Set fileSystem = New Scripting.FileSystemObject
    Set textWriter = fileSystem.OpenTextFile(filePath, ForWriting, True)
    textWriterEnabled = True
    '
    Dim r As Long: r = 100000
    Dim c As Long: c = 10
    Dim i As Long
    For i = 0 To 19
        testRun_2DArray r + i * r, c
        testRun_arrayOfArrays r + i * r, c
        testRun_matrixClass r + i * r, c
    Next i
    '
    MsgBox "completed", vbInformation, "timer log file for matrix operations"
    Set textWriter = Nothing
    Set fileSystem = Nothing
End Sub
'
Private Function testRun_2DArray(ByVal r As Long, ByVal c As Long)
    '
    ' create 2-dim arrays needed
    Dim m1() As Double: ReDim m1(1 To r, 1 To c)
    Dim m2() As Double: ReDim m2(1 To c, 1 To r)
    Dim m3() As Double: ReDim m3(1 To c, 1 To c)
    Dim i As Long, j As Long, k As Long
    '
    ' fill array 1
    For i = 1 To r
        For j = 1 To c
            m1(i, j) = Rnd
        Next j
    Next i
    '
    ' fill array 2
    For i = 1 To r
        For j = 1 To c
            m2(j, i) = Rnd
        Next j
    Next i
    '
    ' perform matrix multiplication
    startTime = GetTickCount()
    Dim v As Double
    '
    For i = 1 To c
        For j = 1 To c
            v = 0
            '
            For k = 1 To r
                v = v + m2(i, k) * m1(k, j)
            Next k
            m3(i, j) = v
        Next j
    Next i
    '
    ' write timer results into log file
    endTime = GetTickCount()
    timeElapsed = (endTime - startTime) / 1000
    stream = "testRun_2DArray;" & r & ";" & timeElapsed
    If (textWriterEnabled) Then textWriter.WriteLine stream
End Function
'
Private Function testRun_arrayOfArrays(ByVal r As Long, ByVal c As Long)
    '
    ' create arrays of arrays needed
    Dim i As Long, j As Long, k As Long
    Dim inner() As Double
    '
    Dim m1() As Variant: ReDim m1(1 To r)
    For i = 1 To r
        ReDim inner(1 To c)
        m1(i) = inner
    Next i
    '
    Dim m2() As Variant: ReDim m2(1 To c)
    For i = 1 To c
        ReDim inner(1 To r)
        m2(i) = inner
    Next i
    '
    Dim m3() As Variant: ReDim m3(1 To c)
    For i = 1 To c
        ReDim inner(1 To c)
        m3(i) = inner
    Next i
    '
    ' fill array 1
    For i = 1 To r
        For j = 1 To c
            m1(i)(j) = Rnd
        Next j
    Next i
    '
    ' fill array 2
    For i = 1 To r
        For j = 1 To c
            m2(j)(i) = Rnd
        Next j
    Next i
    '
    ' perform matrix multiplication
    startTime = GetTickCount()
    Dim v As Double
    '
    For i = 1 To c
        For j = 1 To c
            v = 0
            '
            For k = 1 To r
                v = v + m2(i)(k) * m1(k)(j)
            Next k
            m3(i)(j) = v
        Next j
    Next i
    '
    ' write timer results into log file
    endTime = GetTickCount()
    timeElapsed = (endTime - startTime) / 1000
    stream = "testRun_arrayOfArrays;" & r & ";" & timeElapsed
    If (textWriterEnabled) Then textWriter.WriteLine stream
End Function
'
Private Function testRun_matrixClass(ByVal r As Long, ByVal c As Long)
    '
    ' create 2 matrix objects
    Dim m1 As New Matrix: m1.init r, c
    Dim m2 As New Matrix: m2.init c, r
    Dim i As Long, j As Long
    '
    ' fill matrix 1
    For i = 1 To r
        For j = 1 To c
            m1.push i, j, Rnd
        Next j
    Next i
    '
    ' fill matrix 2
    For i = 1 To c
        For j = 1 To r
            m2.push i, j, Rnd
        Next j
    Next i
    '
    ' perform matrix multiplication
    Dim m3 As New Matrix
    startTime = GetTickCount()
    Set m3 = m3.multiplication(m2, m1)
    '
    ' write timer results into log file
    endTime = GetTickCount()
    timeElapsed = (endTime - startTime) / 1000
    stream = "testRun_matrixClass;" & r & ";" & timeElapsed
    If (textWriterEnabled) Then textWriter.WriteLine stream
    '
    Set m3 = Nothing
    Set m2 = Nothing
    Set m1 = Nothing
End Function
'

Small matrices

Additionally, I also tested using MMULT function with simple two-dimensional arrays. Efficiency of this method is only marginally better, than using two-dimensional arrays with the code provided above (testRun_2DArray). Moreover, there is a limit of the sizes of matrices what we can feed for this worksheet function and those are surprisingly low. For example, trying to multiply A (10 * 100 000) with B (100 000 * 10) leads to runtime error.

The chart below is presenting the results for test cases with small matrices, including test case for using MMULT worksheet function. For each matrix schemes, matrix B rows (A columns) were dimensioned from 1 000  to 65 000 and matrix B columns (A rows) were assumed to be constant 10. For example, in the first calculation, we multiplied matrix A (10 * 1 000) with matrix B (1 000 * 10) and received matrix C (10 * 10). In the second calculation matrix dimensions were A (10 * 2 000) and B (2 000 * 10) and we received matrix C (10 * 10), and so on.



The direction of the results is the same as with large matrices. Using MMULT worksheet function is the most efficient choice, but only marginally better than using simple two-dimensional arrays. The use of Matrix wrapper class for small matrix operations can still be seen as reasonable choice, since the time loss compared to more efficient choices is after all, relatively small.

Final run

Just for the curious, I wanted to compare VBA matrix operations efficiency results with the corresponding C++ results. For this reason, I used dynamically allocated arrays. Otherwise, the actual testing program was basically the same as for VBA cases: allocate memory for arrays, fill arrays with random numbers, perform matrix multiplication and finally release the allocated memory. Time elapsed was recorded only for the actual matrix multiplication operation. The chart below is presenting the results.



In a nutshell, average efficiency ratio (VBA processing time / C++ processing time) is 5.24 for this experiment sample. Moreover, larger arrays can be handled in C++ than in VBA, since the memory is allocated from the heap memory instead of stack memory.

Afterthoughts

So, for any large and time-critical matrix operations performed in VBA, a simple two-dimensional array is the most efficient data structure which can be provided by VBA. For a small matrix operations, arrays wrapped in class can still be used. For real hardcore calculations (very large matrices, fast processing times), VBA is unfortunately not efficient tool for handling such calculations.

Life goes on - have a great weekend!

-Mike

2 comments:

  1. Ah great! I will not have to write it from scratch. :) Well, I would argue that you could have used 2d array instead of variant inside your class. That would make the speed comparable.

    ReplyDelete
    Replies
    1. Good point, indeed.

      Matrix implementation is found in the post "Gaussian Copula implementation revisited". Find "standard VBA module (name = MatrixOperations)".

      -Mike

      Delete