Code Abstraction for the Options Backtester
In the last two posts, we wrote code to test for the variance risk premium in selling ATM option straddles on SPX:
and on single stock equity options:
The code there is downloadable. The code in this post is for paid readers only, and over the next few posts we will improve our code base from those two files to vastly enhance our options backtesting engine.
The code is attached, so we are only going to make comments at the central points that I want to highlight - for the implementation details, please peruse the code on your own. Again, we are going to greatly improve this over a few posts, so you may also choose to wait until we get to a later point to inspect the Python code.
We are also going to publish QT202 in a few days, which does this in depth and step by step. You may choose to follow along there instead, if the textual format is too demanding.
The first order of business is abstracting out common code logic. You can place the two code files side by side - the similarities should be jarring. Let’s implement an abstract base class, with the abstract methods enforced via a custom Exception, and use this as a parent class in an object inheritance paradigm:
import os
import pytz
import zipfile
import numpy as np
import pandas as pd
from datetime import datetime
from collections import defaultdict
from dateutil.relativedelta import relativedelta
class AbstractImplementationException(Exception):
    pass

class OptAlpha():

    def __init__(self, instruments, trade_range, dfs):
        self.instruments = instruments
        self.trade_range = trade_range
        self.dfs = dfs

    def instantiate_variables(self):
        raise AbstractImplementationException()

    def load_buffer(self, load_from, min_buffer_len=100, min_hist_len=2):
        raise AbstractImplementationException()

    def compute_buffer(self):
        raise AbstractImplementationException()

    def compute_signals(self, date, capital):
        raise AbstractImplementationException()

    def _default_pos(self):
        return defaultdict(lambda: {"S": 0, "C": [], "P": [], "CU": [], "PU": []})
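To see the contract in action, here is a minimal, self-contained sketch. The child class `StraddleAlpha` and its trivial return value are hypothetical, purely for illustration; the parent class is trimmed down to one abstract method:

```python
class AbstractImplementationException(Exception):
    pass

class OptAlpha():
    def __init__(self, instruments, trade_range, dfs):
        self.instruments = instruments
        self.trade_range = trade_range
        self.dfs = dfs

    def compute_signals(self, date, capital):
        # the parent only specifies the contract; children must implement it
        raise AbstractImplementationException()

# hypothetical child: only the strategy-specific logic is overridden
class StraddleAlpha(OptAlpha):
    def compute_signals(self, date, capital):
        # contract selection would live here; return a dummy result
        return {"positions": {}, "capital": capital}

parent = OptAlpha(["SPX"], None, {})
child = StraddleAlpha(["SPX"], None, {})

try:
    parent.compute_signals("2020-01-02", 100_000)
except AbstractImplementationException:
    print("parent enforces the contract")

print(child.compute_signals("2020-01-02", 100_000)["capital"])
```

Calling an unimplemented method on the parent fails loudly, which is exactly the point: the base class specifies what a child must provide, without caring how.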
Clearly these methods are implemented differently, depending on our contract selection and data format. The buffer interacts with the source of data, and should match your own needs.
Clearly, one of the things that should NOT depend on your strategy or your data source should be things like pnl accounting:
    def get_pnl(self, date, last):
        if date not in self.data_buffer_idx:
            return 0.0
        cur_idx = self.data_buffer_idx.index(date)
        curr = self.data_buffer[cur_idx]
        prev = self.data_buffer[cur_idx - 1]
        pnl = 0.0
        for ticker, positions in last.items():
            for call, unit in zip(positions["C"], positions["CU"]):
                pricedelta = curr.at[call, "last"] - prev.at[call, "last"] if call in curr.index and call in prev.index else 0.0
                pnl += pricedelta * unit
            for put, unit in zip(positions["P"], positions["PU"]):
                pricedelta = curr.at[put, "last"] - prev.at[put, "last"] if put in curr.index and put in prev.index else 0.0
                pnl += pricedelta * unit
        return pnl
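To make the accounting concrete, here is the same pnl logic as a standalone sketch on a toy two-snapshot buffer. The contract ids, prices, and position sizes are made up for illustration:

```python
import pandas as pd

# two daily snapshots of the option chain, indexed by contract id;
# the "last" price column is what the data schema guarantees
prev = pd.DataFrame({"last": [10.0, 8.0]}, index=["SPX_C_4000", "SPX_P_4000"])
curr = pd.DataFrame({"last": [12.5, 6.5]}, index=["SPX_C_4000", "SPX_P_4000"])

# one short straddle: short 1 call and short 1 put (units of -1)
positions = {"SPX": {"S": 0,
                     "C": ["SPX_C_4000"], "CU": [-1],
                     "P": ["SPX_P_4000"], "PU": [-1]}}

pnl = 0.0
for ticker, pos in positions.items():
    for c, u in zip(pos["C"], pos["CU"]):
        delta = curr.at[c, "last"] - prev.at[c, "last"] if c in curr.index and c in prev.index else 0.0
        pnl += delta * u
    for p, u in zip(pos["P"], pos["PU"]):
        delta = curr.at[p, "last"] - prev.at[p, "last"] if p in curr.index and p in prev.index else 0.0
        pnl += delta * u

print(pnl)  # call moved +2.5, put moved -1.5; on short units: -2.5 + 1.5 = -1.0
```

The call rallied, the put bled, and the short straddle lost 1.0 on net. Contracts that drop out of the chain between snapshots contribute zero, which is a simplification to keep in mind.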
If you compare this function across the two original implementations, there is one minor difference: the column naming, "Last" versus "last", makes them imperfect substitutes.
So if we use this inside the parent class, the SPX backtester would work fine, but the equity options pnl function would raise an exception, since the "last" column would not exist. The fix is simple: rename "Last" to "last" inside the screen_universe method.
But the lessons are more profound than the fix itself. Again, our objective as a quant blog is to develop you into a better quant, not throw code at you. The inherent principle of abstraction is specification.
When you abstract, the implementation details become obscure. In return, like a contract, you need to make explicit the guarantees of your software component. Think of it like this: you are Ray Allen, and your job is to drill 3s. The implementation, be it waking up at 2am to train or eating hamburgers, is not our problem.
The load_buffer function’s job is to interact with the data layer, and guarantee some format inside the dataframe (what we call the data schema). Every other software component should not need to know where or how this data was prepared. This is the main principle of abstraction.
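One way to make that guarantee explicit is a schema check at the buffer boundary. The following is a hypothetical sketch, not part of the attached code; the column set simply mirrors the post's data schema:

```python
import pandas as pd

# the columns every buffer dataframe must carry, whatever the vendor
REQUIRED_COLS = {"optionroot", "underlying", "underlying_last", "type",
                 "expiration", "quotedate", "strike", "last",
                 "openinterest", "volume"}

def validate_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Raise if the dataframe violates the guaranteed data schema."""
    missing = REQUIRED_COLS - set(df.columns)
    if missing:
        raise ValueError(f"buffer violates data schema, missing: {sorted(missing)}")
    return df

ok = pd.DataFrame(columns=sorted(REQUIRED_COLS))
validate_schema(ok)  # conforms, passes silently

bad = pd.DataFrame(columns=["Last", "Strike"])  # raw vendor casing, pre-rename
try:
    validate_schema(bad)
except ValueError as e:
    print(e)
```

Running such a check at the end of load_buffer turns the schema from an implicit convention into an enforced contract, so downstream components like pnl accounting can rely on it blindly.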
Inside the equity options screen universe function, we make a small change:
df = df.rename(columns={
    "OptionRoot": "optionroot",
    "UnderlyingSymbol": "underlying",
    "UnderlyingPrice": "underlying_last",
    "Type": "type",
    "Expiration": "expiration",
    "DataDate": "quotedate",
    "Strike": "strike",
    "Last": "last",
    "OpenInterest": "openinterest",
    "Volume": "volume"
})
Now our data schema is matched in both data sources, and our pnl functionality can be abstracted away.
This is just the main point; the rest of the changes are similar in spirit. The options backtester is still begging for much more functionality and flexibility. We will look at these in rapid-fire over the next few days, so brace yourself for the posts.
Download code here and cheers: