HangukQuant Research

A Box of Alphas to Rock Your Socks; Benchmark Datasets for Portfolio Optimization

HangukQuant
Apr 23, 2023
In the previous post, we discussed Ledoit-Wolf constant-correlation shrinkage, which improves out-of-sample (OOS) estimation of covariance matrices for portfolio optimization.
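As a quick refresher on that idea, here is a minimal sketch of the constant-correlation shrinkage target. It assumes a plain NumPy returns matrix; the function names and the fixed `delta` are hypothetical choices for illustration. In particular, the shrinkage intensity is passed in by hand here rather than estimated from the data, so this is not the full Ledoit-Wolf estimator.

```python
import numpy as np

def constant_correlation_target(returns):
    """returns: (T, N) array of period returns. Gives (sample_cov, target)."""
    sample = np.cov(returns, rowvar=False)
    std = np.sqrt(np.diag(sample))
    corr = sample / np.outer(std, std)
    n = sample.shape[0]
    # Average of the off-diagonal correlations (the diagonal contributes n ones).
    rbar = (corr.sum() - n) / (n * (n - 1))
    # Target keeps the sample variances and gives every pair the same correlation.
    target = rbar * np.outer(std, std)
    np.fill_diagonal(target, std ** 2)
    return sample, target

def shrink(sample, target, delta):
    """Convex combination of target and sample; delta in [0, 1] is an assumption."""
    return delta * target + (1.0 - delta) * sample

# Synthetic returns, purely illustrative.
rng = np.random.default_rng(0)
rets = rng.normal(0.0, 0.01, size=(250, 5))
sample, target = constant_correlation_target(rets)
shrunk = shrink(sample, target, delta=0.3)
```

Because the target shares its diagonal with the sample covariance, shrinking only pulls the pairwise correlations toward their average and leaves the variances untouched.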

Ledoit-Wolf Constant Correlation Shrinkage (Python)

As we deepen the discussion of portfolio optimization methods, we want to get our hands dirty with real market data and implement these methods. Instead of optimizing over stock/instrument data, however, we want to optimize a portfolio of active alphas trading those instruments, whose returns exhibit different correlation dynamics from the underlying instruments. To advance the discussion, we are releasing benchmark datasets of returns generated by formulaic alphas (some previously discussed on HangukQuant), which readers can download to follow along or to implement their own optimization models.

The benchmark datasets contain the master.txt file, which lists our formulaic alphas. Each formulaic alpha has a generated portfolio dataset containing each day’s asset weights, returns, leverage, pnl and various costs. The pnl on each day reflects profit net of costs (execution at 0.1% of notional volume, no fixed costs, no holding costs), while capital_ret reflects portfolio returns before costs. We have purposely included high-turnover portfolios in the benchmark (some with absurd Sharpe penalties) to demonstrate cost optimization in later discussions on cost computation. When we arrive at the discussion on multi-period optimization, we will add benchmark alphas with varying turnover to experiment with varying signal decay. The pricing data for market instruments are also included. All files (except the master text file) are Python pickles, which you can unpickle directly into Python objects for tinkering. All strategy datasets are generated with the Russian Doll backtesting engine, the code for which can be found in our posts. We will extend the Russian Doll code for multi-strategy optimization in coming posts.
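To make the pickle workflow concrete, here is a minimal sketch of loading one strategy file and comparing gross and net performance. The exact object layout of the benchmark files is not described here, so the sketch writes and reads back a synthetic pandas DataFrame as a stand-in. Only `capital_ret` is a name taken from the description above; the file name, the `net_ret` column and the constant cost drag are assumptions for illustration.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Synthetic stand-in for one strategy file; the real benchmark layout may differ.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2020-01-01", periods=500)
gross = rng.normal(0.0005, 0.01, size=len(dates))
cost = np.full(len(dates), 0.0002)  # hypothetical constant daily cost drag
demo = pd.DataFrame({"capital_ret": gross, "net_ret": gross - cost}, index=dates)

# Hypothetical file name, written to a temporary directory.
path = Path(tempfile.mkdtemp()) / "alpha_demo.pickle"
with open(path, "wb") as fh:
    pickle.dump(demo, fh)

# Unpickle the file back into a Python object.
with open(path, "rb") as fh:
    portfolio = pickle.load(fh)

def annualized_sharpe(daily_returns, periods=252):
    """Annualized Sharpe ratio of daily returns, with a zero risk-free rate."""
    return float(np.sqrt(periods) * daily_returns.mean() / daily_returns.std())

gross_sharpe = annualized_sharpe(portfolio["capital_ret"])
net_sharpe = annualized_sharpe(portfolio["net_ret"])
```

With the real files you would point `path` at a downloaded pickle and inspect the loaded object first (for example with `type(portfolio)`) before assuming any column names.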

Master Subset (404B, CBZ file): some alpha drop for all readers

Simulated (log, net of zero cost) returns on representative datasets (in order: nasdaq_live, nasdaq_delisted, nyse_live, and more in the link…):

We will elaborate on these datasets and their performance under cost constraints in the coming posts, and show how large the impact of costs is on high-turnover portfolios. We will demonstrate that cost optimization is necessary to remain profitable.
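To give a feel for why turnover matters, here is a minimal sketch of the cost arithmetic under the 0.1% execution cost used in the benchmark. It is a simplification, not the Russian Doll engine's accounting: it assumes weights are expressed as fractions of capital, charges the cost on the absolute change in weights, and ignores weight drift between rebalances. The function name and the synthetic data are hypothetical.

```python
import numpy as np

COST_RATE = 0.001  # 0.1% of traded notional, as in the benchmark description

def net_returns(weights, asset_returns, cost_rate=COST_RATE):
    """weights, asset_returns: (T, N) arrays; weights are fractions of capital."""
    gross = (weights * asset_returns).sum(axis=1)
    # Previous day's weights; the portfolio starts flat, so day one trades in fully.
    prev = np.vstack([np.zeros((1, weights.shape[1])), weights[:-1]])
    # Traded notional per day as a fraction of capital (ignores weight drift).
    turnover = np.abs(weights - prev).sum(axis=1)
    return gross - cost_rate * turnover, turnover

rng = np.random.default_rng(2)
T, N = 252, 10
asset_returns = rng.normal(0.0, 0.01, size=(T, N))

# Low turnover: equal weights held all year, traded once at the start.
slow = np.tile(np.full(N, 1.0 / N), (T, 1))
# High turnover: each position picks a random side every day.
fast = rng.choice([-1.0, 1.0], size=(T, N)) / N

slow_net, slow_turnover = net_returns(slow, asset_returns)
fast_net, fast_turnover = net_returns(fast, asset_returns)
slow_drag = COST_RATE * slow_turnover.sum()
fast_drag = COST_RATE * fast_turnover.sum()
```

In this toy setup the slow portfolio pays the cost once, while the fast portfolio turns over about 100% of capital per day in expectation, which at 0.1% adds up to roughly a quarter of capital over 252 trading days.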

We will continue to add to the benchmark as we experiment with the impact of dimensionality on optimization problems. The files at the link should always contain the most up-to-date version of the benchmark.

MEGA Link to the Benchmark Dataset and Alphas (paid):

This post is for paid subscribers