Why Machine Learning Funds Fail
An interesting insight into the problems associated with attempts to implement machine learning in trading:
Author: Marcos López de Prado
Title: The 7 Reasons Most Machine Learning Funds Fail
The rate of failure in quantitative finance is high, and particularly so in financial machine learning. The few managers who succeed amass a large amount of assets, and deliver consistently exceptional performance to their investors. However, that is a rare outcome, for reasons that will become apparent in this presentation. Over the past two decades, I have seen many faces come and go, firms started and shut down. In my experience, there are 7 critical mistakes underlying most of those failures.
Notable quotations from the academic research paper:
• Over the past 20 years, I have seen many new faces arrive in the financial industry, only to leave shortly after.
• The rate of failure is particularly high in machine learning (ML).
• In my experience, the reasons boil down to 7 common errors:
1. The Sisyphus paradigm
2. Integer differentiation
3. Inefficient sampling
4. Wrong labeling
5. Weighting of non-IID samples
6. Cross-validation leakage
7. Backtest overfitting
Sisyphus paradigm. The complexities involved in developing a true investment strategy are overwhelming. Even if the firm provides shared services for some of these tasks, you are like a worker at a BMW factory who has been asked to build an entire car alone, using all the workshops around you. It takes almost as much effort to produce one true investment strategy as to produce a hundred. Every successful quantitative firm I am aware of applies the meta-strategy paradigm. Your firm must set up a research factory in which the assembly line is clearly divided into subtasks, quality is independently measured and monitored for each subtask, and each quant specializes in a particular subtask, aiming to become the best there is at it while keeping a holistic view of the entire process.
Integer differentiation. To perform inferential analyses, researchers need to work with invariant processes, such as returns on prices (or changes in log-prices), changes in yield, or changes in volatility. These integer differencing operations make the series stationary, at the expense of removing all memory from the original series, and memory is the basis of a model's predictive power. The dilemma is that returns are stationary but memoryless, while prices have memory but are non-stationary. De Prado's proposed way out is fractional differentiation: differencing by a non-integer order d between 0 and 1, which can make a series stationary while preserving much of its memory.
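Fractional differencing weights come from expanding (1 - B)^d, where B is the backshift operator. The sketch below is a minimal fixed-width-window version, assuming NumPy; the function names and the `threshold` cutoff for dropping negligible weights are illustrative choices, not a specific library's API.

```python
import numpy as np


def frac_diff_weights(d: float, threshold: float = 1e-4, max_len: int = 10_000) -> np.ndarray:
    """Weights of (1 - B)^d, truncated once a weight drops below `threshold` in magnitude."""
    weights = [1.0]
    for k in range(1, max_len):
        # Recursion: w_k = -w_{k-1} * (d - k + 1) / k
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < threshold:
            break
        weights.append(w)
    return np.array(weights)


def frac_diff(series, d: float, threshold: float = 1e-4) -> np.ndarray:
    """Fixed-width-window fractional difference of `series`.

    The first len(weights) - 1 outputs are NaN because their window is incomplete.
    """
    x = np.asarray(series, dtype=float)
    w = frac_diff_weights(d, threshold)
    width = len(w)
    out = np.full(x.shape, np.nan)
    for t in range(width - 1, len(x)):
        # Reverse the window so w[0] multiplies x[t], w[1] multiplies x[t-1], and so on.
        window = x[t - width + 1 : t + 1][::-1]
        out[t] = np.dot(w, window)
    return out
```

With d = 1 the weights reduce to [1, -1] and the output is the ordinary first difference; with d = 0 the series comes back unchanged. An intermediate d such as 0.4 gives slowly decaying weights that keep part of the price memory. How to pick d (for instance, the smallest value that passes a stationarity test) is left out of this sketch.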
Inefficient sampling. Information does not reach the market at a constant entropy rate, so sampling data at fixed chronological intervals means that the informational content of individual observations is far from constant. A better approach is to sample observations as a subordinated process of the amount of information exchanged, for example trade bars, volume bars, dollar bars, volatility or runs bars, order imbalance bars, and entropy bars.
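Dollar bars are one of the information-driven alternatives listed above. The sketch below assumes tick-level prices and volumes and a user-chosen dollar threshold; `threshold` and the `Bar` record are hypothetical names, and in practice the threshold would be calibrated to the instrument's turnover.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    dollar_value: float  # total price * volume traded inside the bar
    n_ticks: int


def dollar_bars(prices: Sequence[float], volumes: Sequence[float], threshold: float) -> List[Bar]:
    """Group ticks into bars that each close once cumulative dollar value reaches `threshold`.

    Ticks left over after the last complete bar are discarded.
    """
    bars: List[Bar] = []
    start = 0
    cum_value = 0.0
    for i, (price, volume) in enumerate(zip(prices, volumes)):
        cum_value += price * volume
        if cum_value >= threshold:
            chunk = prices[start : i + 1]
            bars.append(Bar(chunk[0], max(chunk), min(chunk), chunk[-1], cum_value, i + 1 - start))
            start = i + 1
            cum_value = 0.0
    return bars
```

Because each bar represents roughly the same dollar value traded, bars form more often during busy periods and less often during quiet ones. Trade bars and volume bars follow the same pattern, accumulating `1` or `volume` per tick instead of `price * volume`.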
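Wrong labeling. Virtually all ML papers in finance label observations using the fixed-time horizon method: each observation is labeled -1, 0, or 1 depending on whether its return over a fixed number of bars falls below, within, or above a threshold band. There are several reasons to avoid this labeling approach: it is typically applied to time bars, which do not exhibit good statistical properties, and the same threshold is applied regardless of the observed volatility. There are a couple of better alternatives, but even these improvements miss a key flaw of the fixed-time horizon method: it ignores the path followed by prices before the horizon is reached. De Prado's remedy is the triple-barrier method, which labels an observation by whichever barrier the price touches first: a profit-taking barrier, a stop-loss barrier, or a vertical barrier set by a maximum holding period.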
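To make the path dependence concrete, the sketch below contrasts a fixed-time horizon label with a simple triple-barrier label. Assumptions: the barriers are fixed fractional returns passed in by the caller (in practice they would be scaled by a volatility estimate), and touching the vertical barrier yields label 0, which is only one of several possible conventions. Function names are illustrative.

```python
from typing import Sequence, Tuple


def fixed_horizon_label(prices: Sequence[float], t0: int, horizon: int, tau: float) -> int:
    """Label by the return over exactly `horizon` bars: +1 above tau, -1 below -tau, else 0."""
    ret = prices[t0 + horizon] / prices[t0] - 1.0
    if ret > tau:
        return 1
    if ret < -tau:
        return -1
    return 0


def triple_barrier_label(
    prices: Sequence[float], t0: int, horizon: int, upper: float, lower: float
) -> Tuple[int, int]:
    """Return (label, index of the first barrier touched) for an entry at t0.

    +1: price rose by `upper` first (profit-taking barrier).
    -1: price fell by `lower` first (stop-loss barrier).
     0: neither was touched before the vertical barrier at t0 + horizon.
    """
    p0 = prices[t0]
    end = min(t0 + horizon, len(prices) - 1)
    for t in range(t0 + 1, end + 1):
        ret = prices[t] / p0 - 1.0
        if ret >= upper:
            return 1, t
        if ret <= -lower:
            return -1, t
    return 0, end
```

For the price path 100, 97, 96, 103 with a 3-bar horizon and 2% thresholds, the fixed-horizon method labels the observation +1, whereas the triple-barrier method labels it -1, because the 2% stop-loss would have been hit at the first bar.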
Weighting of non-IID samples. Most non-financial ML researchers can assume that observations are drawn from IID processes. For example, you can obtain blood samples from a large number of patients and measure their cholesterol. Of course, various underlying common factors will shift the mean and standard deviation of the cholesterol distribution, but the samples are still independent: there is one observation per subject. Now suppose someone in your laboratory spills blood from each tube into the 9 tubes to its right. You now need to determine the features predictive of high cholesterol (diet, exercise, age, etc.) without knowing for sure the cholesterol level of each patient. That is the equivalent challenge we face in financial ML.
–Labels are decided by outcomes.
–Outcomes are decided over multiple observations.
–Because labels overlap in time, we cannot be certain about what observed features caused an effect.
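One way to weight overlapping observations, which de Prado calls average uniqueness, is to count how many labels are active at each bar (the concurrency) and average 1/concurrency over each label's lifespan. The sketch assumes each label's span is given as an inclusive pair of bar indices; the function name is illustrative.

```python
from typing import List, Tuple

import numpy as np


def average_uniqueness(spans: List[Tuple[int, int]], n_bars: int) -> np.ndarray:
    """Average uniqueness of each label.

    spans[i] = (start, end) is the inclusive range of bars over which
    label i's outcome is determined.
    """
    # Concurrency: number of labels whose span covers each bar.
    concurrency = np.zeros(n_bars)
    for start, end in spans:
        concurrency[start : end + 1] += 1
    # Uniqueness of label i at bar t is 1 / concurrency[t]; average it over the span.
    return np.array([np.mean(1.0 / concurrency[start : end + 1]) for start, end in spans])
```

With spans (0, 2), (1, 3), and (5, 5), the first two labels share two of their three bars, so each gets weight 2/3, while the isolated third label keeps full weight 1. Such values can be passed as sample weights to a learner that supports them.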
Cross-validation leakage. One reason k-fold cross-validation (CV) fails in finance is that observations cannot be assumed to be drawn from an IID process. Leakage takes place when the training set contains information that also appears in the testing set. In the presence of irrelevant features, leakage leads to false discoveries. One way to reduce leakage is to remove from the training set all observations whose labels overlap in time with the labels included in the testing set. I call this process purging.
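Purging can be sketched by comparing label time spans directly. This minimal version assumes each observation's label span is an inclusive (start, end) pair of bar indices, that spans are sorted by start time, and that folds are contiguous; the function names are hypothetical, and refinements beyond purging are not shown.

```python
from typing import Iterator, List, Tuple

import numpy as np


def purged_train_indices(spans: List[Tuple[int, int]], test_idx: List[int]) -> List[int]:
    """Indices of observations whose label span does not overlap any test label span."""
    test_set = set(test_idx)
    test_spans = [spans[j] for j in test_idx]
    train = []
    for i, (start, end) in enumerate(spans):
        if i in test_set:
            continue
        # Two inclusive intervals overlap iff each starts no later than the other ends.
        if any(start <= t_end and t_start <= end for t_start, t_end in test_spans):
            continue  # purged: its label overlaps a test label in time
        train.append(i)
    return train


def purged_kfold(spans: List[Tuple[int, int]], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Contiguous (unshuffled) k-fold splits with purging; assumes spans are sorted by start."""
    for fold in np.array_split(np.arange(len(spans)), k):
        test_idx = fold.tolist()
        yield purged_train_indices(spans, test_idx), test_idx
```

Folds are kept contiguous rather than shuffled: shuffling overlapping observations would place near-duplicate labels on both sides of almost every split, which is exactly the leakage purging tries to remove.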
Backtest overfitting due to data dredging. Solution: use the Deflated Sharpe Ratio (DSR), which computes the probability that the Sharpe Ratio (SR) is statistically significant after controlling for the inflationary effect of multiple trials, data dredging, non-normal returns, and shorter sample lengths.
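The DSR was introduced by Bailey and López de Prado. The sketch below follows its commonly cited form: the Probabilistic Sharpe Ratio (PSR) evaluated against the expected maximum SR among N trials with zero true skill. Assumptions: SR values are per-period (not annualized), `kurt` is ordinary (non-excess) kurtosis, so 3 for normal returns, `sr_variance` is the variance of the SR estimates across all trials, and at least two trials were run. Function names are illustrative.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def expected_max_sr(sr_variance: float, n_trials: int) -> float:
    """Approximate expected maximum SR across `n_trials` trials whose true SR is zero."""
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def probabilistic_sr(sr: float, sr_benchmark: float, n_obs: int, skew: float, kurt: float) -> float:
    """Probability that the true SR exceeds `sr_benchmark`, adjusting for length, skewness, kurtosis."""
    z = (sr - sr_benchmark) * math.sqrt(n_obs - 1) / math.sqrt(
        1 - skew * sr + (kurt - 1) / 4 * sr**2
    )
    return _STD_NORMAL.cdf(z)


def deflated_sr(
    sr: float, sr_variance: float, n_trials: int, n_obs: int, skew: float, kurt: float
) -> float:
    """PSR evaluated against the SR expected from the best of `n_trials` unskilled trials."""
    return probabilistic_sr(sr, expected_max_sr(sr_variance, n_trials), n_obs, skew, kurt)
```

For example, a per-period SR of 0.1 measured over 1,000 returns with zero skewness and kurtosis 3 looks highly significant against a benchmark of zero (PSR above 0.99). If it was the best of 100 trials whose SRs have a standard deviation of 0.05, the DSR drops below 0.5.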