The idea was very noble and I was quite happy with the first development candidate. I especially liked the idea of data structure being an object, also for schematic and semantic reasons. Plus, I thought that I would finally get rid of all those boring extra lines of code needed, when using plain arrays. I also thought that only a small performance penalty would be paid, with all function calls made using class accessors (push, at). However, as I was prototyping Monte Carlo basket equity option pricing and doing matrix operations with large matrices, I quickly realized that the cost of processing matrix operations was definitely way too high.
Testing processing times
The issue was bothering me in a such way, that I finally wanted to get some hard-tested facts on handling and operating matrix structures. For this reason, I prepared test cases for three different matrix schemes. Within these test cases, matrix data structure was implemented by using
- Two-dimensional array
- Variant array of double arrays (jagged array)
- Matrix class
Within test cases mentioned, a simple procedural program performs the following operations for all matrix schemes described above
- Creates three matrices (A, B, C)
- Fills two matrices (A, B) with random numbers
- Performs matrix multiplication (A, B) and returns the result into matrix (C)
Cold shower
The following chart is presenting the findings of this experiment.
We can clearly see, that a simple two-dimensional array is the most efficient data structure for handling large matrix operations in VBA. There is just no way out of this, period. Testing program is presented below. You can just copy-paste it into a new standard VBA module, if you are interested to run it in your own laptop. Remember to create reference to Microsoft Scripting Runtime library.
Option Explicit ' Private Declare Function GetTickCount Lib "kernel32.dll" () As Long Private Const filePath As String = "C:\temp\matrix_log.txt" Private fileSystem As Scripting.FileSystemObject Private textWriter As TextStream Private textWriterEnabled As Boolean Private stream As String Private startTime As Long Private endTime As Long Private timeElapsed As Double ' Sub tester() ' Set fileSystem = New Scripting.FileSystemObject Set textWriter = fileSystem.OpenTextFile(filePath, ForWriting, True) textWriterEnabled = True ' Dim r As Long: r = 100000 Dim c As Long: c = 10 Dim i As Long For i = 0 To 19 testRun_2DArray r + i * r, c testRun_arrayOfArrays r + i * r, c testRun_matrixClass r + i * r, c Next i ' MsgBox "completed", vbInformation, "timer log file for matrix operations" Set textWriter = Nothing Set fileSystem = Nothing End Sub ' Private Function testRun_2DArray(ByVal r As Long, ByVal c As Long) ' ' create 2-dim arrays needed Dim m1() As Double: ReDim m1(1 To r, 1 To c) Dim m2() As Double: ReDim m2(1 To c, 1 To r) Dim m3() As Double: ReDim m3(1 To c, 1 To c) Dim i As Long, j As Long, k As Long ' ' fill array 1 For i = 1 To r For j = 1 To c m1(i, j) = Rnd Next j Next i ' ' fill array 2 For i = 1 To r For j = 1 To c m2(j, i) = Rnd Next j Next i ' ' perform matrix multiplication startTime = GetTickCount() Dim v As Double ' For i = 1 To c For j = 1 To c v = 0 ' For k = 1 To r v = v + m2(i, k) * m1(k, j) Next k m3(i, j) = v Next j Next i ' ' write timer results into log file endTime = GetTickCount() timeElapsed = (endTime - startTime) / 1000 stream = "testRun_2DArray;" & r & ";" & timeElapsed If (textWriterEnabled) Then textWriter.WriteLine stream End Function ' Private Function testRun_arrayOfArrays(ByVal r As Long, ByVal c As Long) ' ' create arrays of arrays needed Dim i As Long, j As Long, k As Long Dim inner() As Double ' Dim m1() As Variant: ReDim m1(1 To r) For i = 1 To r ReDim inner(1 To c) m1(i) = inner Next i ' Dim m2() As Variant: ReDim m2(1 To c) For i = 1 To c ReDim inner(1 To r) m2(i) = inner Next i ' Dim m3() As Variant: ReDim m3(1 To c) For i = 1 To c ReDim inner(1 To c) m3(i) = inner Next i ' ' fill array 1 For i = 1 To r For j = 1 To c m1(i)(j) = Rnd Next j Next i ' ' fill array 2 For i = 1 To r For j = 1 To c m2(j)(i) = Rnd Next j Next i ' ' perform matrix multiplication startTime = GetTickCount() Dim v As Double ' For i = 1 To c For j = 1 To c v = 0 ' For k = 1 To r v = v + m2(i)(k) * m1(k)(j) Next k m3(i)(j) = v Next j Next i ' ' write timer results into log file endTime = GetTickCount() timeElapsed = (endTime - startTime) / 1000 stream = "testRun_arrayOfArrays;" & r & ";" & timeElapsed If (textWriterEnabled) Then textWriter.WriteLine stream End Function ' Private Function testRun_matrixClass(ByVal r As Long, ByVal c As Long) ' ' create 2 matrix objects Dim m1 As New Matrix: m1.init r, c Dim m2 As New Matrix: m2.init c, r Dim i As Long, j As Long ' ' fill matrix 1 For i = 1 To r For j = 1 To c m1.push i, j, Rnd Next j Next i ' ' fill matrix 2 For i = 1 To c For j = 1 To r m2.push i, j, Rnd Next j Next i ' ' perform matrix multiplication Dim m3 As New Matrix startTime = GetTickCount() Set m3 = m3.multiplication(m2, m1) ' ' write timer results into log file endTime = GetTickCount() timeElapsed = (endTime - startTime) / 1000 stream = "testRun_matrixClass;" & r & ";" & timeElapsed If (textWriterEnabled) Then textWriter.WriteLine stream ' Set m3 = Nothing Set m2 = Nothing Set m1 = Nothing End Function '
Small matrices
Additionally, I also tested using MMULT function with simple two-dimensional arrays. Efficiency of this method is only marginally better, than using two-dimensional arrays with the code provided above (testRun_2DArray). Moreover, there is a limit of the sizes of matrices what we can feed for this worksheet function and those are surprisingly low. For example, trying to multiply A (10 * 100 000) with B (100 000 * 10) leads to runtime error.
The chart below is presenting the results for test cases with small matrices, including test case for using MMULT worksheet function. For each matrix schemes, matrix B rows (A columns) were dimensioned from 1 000 to 65 000 and matrix B columns (A rows) were assumed to be constant 10. For example, in the first calculation, we multiplied matrix A (10 * 1 000) with matrix B (1 000 * 10) and received matrix C (10 * 10). In the second calculation matrix dimensions were A (10 * 2 000) and B (2 000 * 10) and we received matrix C (10 * 10), and so on.
The direction of the results is the same as with large matrices. Using MMULT worksheet function is the most efficient choice, but only marginally better than using simple two-dimensional arrays. The use of Matrix wrapper class for small matrix operations can still be seen as reasonable choice, since the time loss compared to more efficient choices is after all, relatively small.
Final run
Just for the curious, I wanted to compare VBA matrix operations efficiency results with the corresponding C++ results. For this reason, I used dynamically allocated arrays. Otherwise, the actual testing program was basically the same as for VBA cases: allocate memory for arrays, fill arrays with random numbers, perform matrix multiplication and finally release the allocated memory. Time elapsed was recorded only for the actual matrix multiplication operation. The chart below is presenting the results.
In a nutshell, average efficiency ratio (VBA processing time / C++ processing time) is 5.24 for this experiment sample. Moreover, larger arrays can be handled in C++ than in VBA, since the memory is allocated from the heap memory instead of stack memory.
Afterthoughts
So, for any large and time-critical matrix operations performed in VBA, a simple two-dimensional array is the most efficient data structure which can be provided by VBA. For a small matrix operations, arrays wrapped in class can still be used. For real hardcore calculations (very large matrices, fast processing times), VBA is unfortunately not efficient tool for handling such calculations.
Life goes on - have a great weekend!
-Mike
Ah great! I will not have to write it from scratch. :) Well, I would argue that you could have used 2d array instead of variant inside your class. That would make the speed comparable.
ReplyDeleteGood point, indeed.
DeleteMatrix implementation is found in the post "Gaussian Copula implementation revisited". Find "standard VBA module (name = MatrixOperations)".
-Mike