Ten Reasons Why ML Fails at Time Series

And Why FMI Succeeds

Time series prediction is important in many fields: weather prediction, traffic control, financial prediction, and innumerable areas of physics. Yet the various avatars of AI have mostly underperformed at time series prediction (see de Prado, 2017). In that paper, de Prado explains the roadblocks encountered when implementing ML algorithms for time series analysis, especially for financial time series. He proposes many partial solutions, but most of the problems are inherent to ML and difficult to combat. Enter FMI, a technique that combines elements of chaos theory with stochastic methods from quantum field theory and is inherently free of these problems. The development of field theory closely mirrored that of modern quantum and statistical field theory. To say that it is one of the most powerful tools for modeling natural phenomena, which are mostly noisy, nonlinear and chaotic, is an understatement. We expound its power vis-à-vis the ML methods shown to be ineffectual in de Prado’s paper. Quoting de Prado, “When misused, ML algorithms will confuse statistical flukes with patterns. This fact, combined with the low signal-to-noise ratio that characterizes finance, all but ensures careless users will produce false discoveries at an ever-greater speed.”

On the other hand, FMI neither underfits nor overfits a given dataset, extracting only as much statistical signal as is present. Consequently, it is easily shown that it satisfies the accuracy bound defining the limit to which a time series can be predicted. The following presents the ten pitfalls encountered in applying ML-based models to time series (mirroring de Prado), along with a discussion of how FMI deals with each.

  • Pitfall 1: Training Data Needed: ML-based techniques are notorious for the sheer amount of training data they require in order to make predictions. By contrast, some signals/correlations captured by FMI stabilize on as few as 20-30 data points. This is consistent with the mathematical proof outlined in our paper that FMI will capture all possible signals present in a given time series dataset at a given granularity and time horizon, without overfitting to spurious events.

  • Pitfall 2: Research Through Backtesting: Repeatedly backtesting an ML algorithm over the same data points produces ‘pseudo-discoveries’: strategies with unexpectedly high Sharpe ratios. In principle, the algorithm should be allowed to access the data points only a finite and limited number of times; in practice, even large hedge funds often fail at this. In field theory this problem does not arise, since we access the data points only as many times as needed to obtain an $n$-point correlation function, which contains all the possible information within the time series. We neither oversample nor undersample data points.

  • Pitfall 3: Sampling Difference: Time bars oversample information during periods of low trading activity and undersample information during periods of high trading activity. This can introduce spurious correlations and non-Gaussianities in the data. De Prado, and Mandelbrot before him, have propounded the use of a volume clock, or trading time. This is quite naturally accommodated in field theory, where the Lifshitz symmetry of the market is evident precisely when one puts trading time, as opposed to physical time, on the $x$-axis. In fact, changing the time parameterization in field theory is just a coordinate transformation.

  • Pitfall 4: Absence of Stationarity: It is notoriously difficult to make financial predictions from non-stationary time series, which contain valuable information that is lost when the series is differenced to obtain a stationary distribution. A non-stationary time series corresponds to a field theory out of equilibrium, where a slight modification of standard perturbative and non-perturbative techniques can model the system. The Schwinger-Keldysh formalism addresses precisely this issue and is widely used; here one need not know the ‘final state’ of the system. The correlations are inferred without any assumption about the future behavior of the time series.

  • Pitfall 5: Transparency: Neural networks are essentially black boxes: they take in a set of inputs and produce outputs without revealing what happens in between. This 'black box' nature makes diagnostics extremely difficult. FMI, on the other hand, can be used seamlessly in conjunction with human intuition. Unlike the weights of a neural net, the 'couplings' in a field theory have very clear financial interpretations. Consequently, if a trader has a 'hunch' about the market based on fundamental information, they can test it by calculating the strength and stability of the corresponding coupling.

  • Pitfall 6: Learning Side and Size Simultaneously: A common mistake in financial ML models is to learn the side and size of a position simultaneously, which introduces unnecessary complexity in the model. The alternative, modeling side and size separately, creates dissonance in the model. Field theory has no such problem: since we are summing over an infinite number of possibilities, side and size are by necessity lumped together with no additional complexity.

  • Pitfall 7: IID: Standard ML assumes that all samples are IID, that is, independent and identically distributed: each random variable has the same probability distribution as the others, and all of them are mutually independent. This assumption misses all market anomalies arising from residual memory effects, such as volatility arbitrage. Field theory, on the other hand, can systematically account for non-IID anomalies. An IID model is a rather special case in which a series of interesting couplings between the prices and volatilities vanish.

  • Pitfall 8: Cross Validation Leakage: One reason cross validation (CV) fails in finance is that, because the time series is non-IID, information from the training set shows up in the test set. When a classifier is trained on $(X_{t},Y_{t})$ and then asked to predict $E(Y_{t+1}\mid X_{t+1})$, it is likely to achieve $E(Y_{t+1}\mid X_{t+1})=Y_{t+1}$ even if $X$ was unimportant in the prediction of $Y$. This can lead to false discoveries that spell disaster for long-term prediction.

  • Pitfall 9: Walk Forward Backtesting: Walk-forward (WF) backtesting has three drawbacks. First, since a single scenario is tested, the path can be easily overfitted. Second, WF can be easily biased by the particular sequence of data points; eventually, buy forecasts will prevail over sell forecasts because the strategy is exploiting that particular sequence. Third, initial decisions are based on a small portion of the entire sample. Field theory easily overcomes these three drawbacks. Instead of testing one scenario, we sum over an infinite number of scenarios. We do not learn a particular historical sequence; instead, we test the statistical properties of the system, wherein spurious movements are averaged out in the calculation of the correlation functions. Since we are not ‘training’ on a subset of the data, we use the entire information of the time series in the field theory.

  • Pitfall 10: Fluke Sharpes: Suppose we have a sample of iid variables $x_{i}$ following a standard Gaussian distribution. Now we test $I$ strategies on an instrument that is a martingale, with Sharpe ratios $\{y_{i}\}_{i=1,\dots,I}$ such that $E(y_{i})=0$ and $\sigma^{2}(y_{i})>0$. Clearly the true Sharpe ratio is zero; yet we should expect to find one strategy with a Sharpe ratio $E(\max_{i} y_{i})=E(\max_{i} x_{i})\,\sigma(y_{i})$. In most ML applications, WF implies that most of the decisions are based on a small region of the dataset. That implies $\sigma(y_{i}) \gg 0$, and hence an artificially inflated Sharpe ratio. In field theory there is no such problem, as correlations across time scales are measured.
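
The selection effect in Pitfall 10 can be checked numerically. The following is a minimal simulation sketch, not taken from de Prado's paper or from FMI: it assumes the Sharpe ratios of the tested strategies are independent Gaussians with zero mean, and the function name `expected_max_sharpe` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np

def expected_max_sharpe(n_strategies, sigma_sharpe, n_trials=2000, seed=0):
    """Monte Carlo estimate of E[max_i y_i] when every true Sharpe ratio is zero.

    Assumption for illustration: y_i = sigma_sharpe * x_i with x_i ~ N(0, 1), i.i.d.
    """
    rng = np.random.default_rng(seed)
    # One row per simulated research campaign, one column per strategy tried.
    x = rng.standard_normal((n_trials, n_strategies))
    y = sigma_sharpe * x
    # Keep only the best-looking strategy in each campaign, then average.
    return y.max(axis=1).mean()

# The best observed Sharpe ratio grows with the number of strategies tried,
# even though no strategy has any real edge.
for n in (1, 10, 100, 1000):
    print(n, round(expected_max_sharpe(n, sigma_sharpe=1.0), 2))
```

Because the draws are simply rescaled by `sigma_sharpe`, the estimate is proportional to it, which is the content of the relation $E(\max_{i} y_{i})=E(\max_{i} x_{i})\,\sigma(y_{i})$.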

Other unique advantages of FMI include:

  • Advantage 1: Symmetries of a time series: Each symmetry or approximate symmetry that a time series possesses drastically reduces the number of degrees of freedom that characterize it. Most real-life time series possess various symmetries: time translation invariance, reparameterisation invariance, and various forms of scaling symmetry. The last are especially common in pricing time series. While field theory exploits these symmetries fully, ML does not. As an example, time translation symmetry halves the degrees of freedom of the time series. In the most extreme case, scaling symmetry might reduce the number of parameters to be fit to possibly one or two under certain conditions (when $z=1,2$). Using neural networks, one has to deal with an infinite number of degrees of freedom.

  • Advantage 2: Discontinuities: The ability to model discontinuities and phase shifts in markets through the language of phase transitions is unique to field theory. While ML can be used to detect phase changes separately, field theory inherently accounts for the possibility that the order parameter can jump. This may occur frequently, especially in pricing time series.

  • Advantage 3: Correlations Across Time Scales: Importantly, field theory provides an ideal toolbox for studying correlations in market data across different time scales. Not all market anomalies will be linear in time, and field theory will capture those that are invisible to time-domain methods such as ML.

  • Advantage 4: Stochastic Volatilities: Field theory is the most rigorous way to model volatilities that are stochastic (cit) or that have multiple timescales in them. While modeling a two-scale volatility is cumbersome using traditional methods, modeling a volatility distribution with a continuously infinite number of scales is extremely simple using field theory.
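
Scaling behaviour of the kind mentioned under Advantages 1 and 3 can be probed with a simple, generic diagnostic: how the variance of aggregated returns grows with the horizon. The sketch below is not the FMI procedure. It is an ordinary log-log regression on simulated data, and the function name `variance_scaling_exponent` and the choice of horizons are assumptions made for illustration.

```python
import numpy as np

def variance_scaling_exponent(returns, horizons=(1, 2, 4, 8, 16, 32)):
    """Fit Var[h-step return] ~ h**alpha by least squares in log-log space."""
    returns = np.asarray(returns, dtype=float)
    variances = []
    for h in horizons:
        n_blocks = len(returns) // h
        # Sum non-overlapping blocks of h one-step returns.
        agg = returns[: n_blocks * h].reshape(n_blocks, h).sum(axis=1)
        variances.append(agg.var(ddof=1))
    # np.polyfit returns coefficients from highest degree down: slope first.
    slope, _intercept = np.polyfit(np.log(horizons), np.log(variances), 1)
    return slope

# Simulated i.i.d. Gaussian returns (an assumption, not market data).
rng = np.random.default_rng(0)
iid_returns = rng.standard_normal(100_000)
alpha = variance_scaling_exponent(iid_returns)
print(f"scaling exponent for i.i.d. returns: {alpha:.2f}")
```

For independent returns the variance grows linearly with the horizon, so the fitted exponent should sit close to 1. An exponent that departs from 1 on real data would point to correlations across time scales of the sort described above.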

Click for link to de Prado's influential paper on the reasons most ML hedge funds fail.
