A Box of Alphas to Rock Your Socks; Benchmark Datasets for Portfolio Optimization

Apr 23, 2023


In the previous post, we discussed Ledoit-Wolf constant-correlation shrinkage, which improves out-of-sample estimation of covariance matrices for portfolio optimization.

HangukQuant’s Newsletter

Ledoit-Wolf Constant Correlation Shrinkage (Python)


As we extend the discussion of portfolio optimization methods, we want to get our hands dirty with real market data and implement these methods. Rather than optimizing over stocks or other instruments directly, we want to optimize a portfolio of active alphas trading those instruments, whose returns exhibit different correlation dynamics from the underlying instruments. To advance the discussion, we are releasing benchmark datasets of returns generated by formulaic alphas (some previously discussed on HangukQuant), which readers can download to follow along or to implement their own optimization models.
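As a rough illustration of what "correlation dynamics" means in practice, the sketch below computes a pairwise Pearson correlation matrix across alpha return streams. The alpha names and return values are made up for illustration and are not taken from the benchmark.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_matrix(streams):
    """Nested dict of pairwise correlations, keyed by stream name."""
    names = list(streams)
    return {a: {b: pearson(streams[a], streams[b]) for b in names} for a in names}

# Hypothetical daily returns for two alphas (illustrative numbers only).
alpha_returns = {
    "alpha_0": [0.01, -0.02, 0.015, 0.00],
    "alpha_1": [-0.005, 0.01, -0.01, 0.002],
}
corr = correlation_matrix(alpha_returns)
```

The same function can be run on the instrument returns to compare the two correlation structures side by side.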

The benchmark datasets contain the master.txt file, which lists our formulaic alphas. Each formulaic alpha has a generated portfolio dataset containing each day's asset weights, returns, leverage, pnl and various costs. The pnl on each day reflects profit net of costs (execution cost of 0.1% of notional volume traded, with no fixed or holding costs), while capital_ret reflects portfolio returns before costs. We have purposely included high-turnover portfolios in the benchmark (some with absurd Sharpe penalties), so that we can demonstrate cost optimization in later discussions on cost computation. When we arrive at the discussion on multi-period optimization, we will add benchmark alphas with varying turnover to experiment with varying signal decay. The pricing data for market instruments are also included. All files (except the master text file) are Python pickles, which you can unpickle directly as Python objects for tinkering. All strategy datasets are generated by the Russian Doll backtesting engine, the code for which can be found in our posts. We will extend the Russian Doll code with multi-strategy optimization in coming posts.
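To make the cost convention concrete, here is a minimal sketch of how a net return relates to capital_ret under a 0.1% proportional execution cost. It is not the Russian Doll implementation: the `turnover` field, the file name and the plain-dict layout are hypothetical stand-ins. The real pickles may hold other object types (for example pandas objects), in which case the matching library would need to be installed before unpickling.

```python
import pickle
import tempfile
from pathlib import Path

COST_RATE = 0.001  # 0.1% of traded notional, as stated in the post

def net_returns(gross_returns, turnover, cost_rate=COST_RATE):
    """Subtract a proportional execution cost from gross daily returns.

    turnover[t] is the notional traded on day t as a fraction of capital.
    This field is an assumption; the real datasets may store costs differently.
    """
    return [g - cost_rate * x for g, x in zip(gross_returns, turnover)]

# Stand-in for one strategy file: a plain dict pickled to a temporary path.
record = {
    "capital_ret": [0.010, -0.004, 0.006],  # gross daily returns
    "turnover": [2.0, 1.5, 3.0],            # traded notional / capital
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_alpha.pkl"  # hypothetical file name
    path.write_bytes(pickle.dumps(record))
    with path.open("rb") as fh:
        loaded = pickle.load(fh)  # only unpickle files from a trusted source

net = net_returns(loaded["capital_ret"], loaded["turnover"])
```

With no fixed or holding costs, the only deduction is the traded notional times the cost rate, which is why turnover drives the gap between gross and net returns.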

Master Subset

404B ∙ CBZ file


some alpha drop for all readers


Simulated log returns, assuming zero costs, on representative datasets (in order: nasdaq_live, nasdaq_delisted, nyse_live; more at the link).
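For readers who want to reproduce a curve of this kind from daily returns, a minimal sketch follows. It assumes the plotted series is the running sum of daily log returns computed from simple returns; the input values are illustrative only.

```python
import math

def cumulative_log_returns(simple_returns):
    """Running sum of log(1 + r) for a series of simple daily returns."""
    total = 0.0
    curve = []
    for r in simple_returns:
        total += math.log1p(r)
        curve.append(total)
    return curve

# Illustrative gross daily returns (not taken from the benchmark).
curve = cumulative_log_returns([0.01, -0.02, 0.03])
```

Log returns add across days, so the last point of the curve equals the log of the compounded growth factor over the whole period.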

We will elaborate on these datasets and their performance under cost constraints in the coming posts, with notes on how heavily costs weigh on high-turnover portfolios and why cost optimization is necessary to remain profitable.
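A toy calculation shows how quickly costs can erase a high-turnover edge. The sketch below compares the annualised Sharpe ratio of a return stream before and after a 0.1% proportional cost. The 252-day annualisation factor, the return values and the constant turnover of 2x capital per day are all assumptions made for illustration.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualisation factor
COST_RATE = 0.001   # 0.1% of traded notional

def annualised_sharpe(daily_returns, periods=TRADING_DAYS):
    """Mean over sample standard deviation, scaled by sqrt(periods)."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(periods)

# Illustrative gross returns and a hypothetical daily turnover of 2x capital.
gross = [0.004, -0.002, 0.003, 0.001, -0.001]
turnover = [2.0] * len(gross)
net = [g - COST_RATE * x for g, x in zip(gross, turnover)]

sharpe_gross = annualised_sharpe(gross)
sharpe_net = annualised_sharpe(net)
penalty = sharpe_gross - sharpe_net  # Sharpe lost to execution costs
```

In this toy case the average gross return is 0.1% per day while costs are 0.2% per day, so the strategy turns from profitable to loss-making once costs are charged.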

We will continue to add to the benchmark as we experiment with the impact of dimensionality on optimization problems. The files at the link should always contain the most up-to-date version of the benchmark.

MEGA Link to the Benchmark Dataset and Alphas (paid):

HangukQuant Research
