In recent years, machine learning has emerged as a powerful tool for extracting insights from massive financial datasets. Academic researchers and practitioners have published numerous studies exploring its applications in finance and investing. This article provides an overview of several influential papers and key findings on using machine learning models to improve investment analysis and portfolio decisions. By uncovering predictive signals hidden in financial data, machine learning techniques can help investors identify mispriced assets, forecast risk and return, and potentially earn returns above the market. However, simply throwing algorithms at data often leads to poor out-of-sample performance. This overview also highlights best practices in feature engineering, model tuning, and trading implementation that are critical to successfully deploying machine learning in live portfolios.

Forecasting fundamentals boosts performance of factor investing strategies
A 2020 paper published in Machine Learning for Factor Investing examines augmenting a value and quality factor model with machine learning predictions of future fundamentals. The authors first demonstrate via backtests that selecting stocks based on future fundamentals, calculated with perfect foresight, substantially outperforms standard factor portfolio construction based on current fundamentals. Motivated by this analysis, they train deep neural networks to forecast fundamentals 5 years ahead. The forecasts achieve a significantly lower mean squared error (MSE) than naive benchmarks. Moreover, backtests that incorporate the machine learning forecasts into industry-level stock portfolio simulations earn 17.1% annualized returns, versus 14.4% for a standard factor model.
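To make the MSE comparison concrete, here is a minimal sketch of how a forecast might be scored against a naive benchmark. It assumes the naive benchmark is "the fundamental stays at its current value", which is a common choice but not necessarily the one used in the paper. The function names and all numbers are hypothetical and for illustration only.

```python
from statistics import mean


def mse(forecasts, actuals):
    """Mean squared error between two equal-length sequences."""
    if len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must have the same length")
    return mean((f - a) ** 2 for f, a in zip(forecasts, actuals))


def mse_improvement(model_forecasts, naive_forecasts, actuals):
    """Fractional MSE reduction of the model relative to the naive benchmark."""
    naive_error = mse(naive_forecasts, actuals)
    model_error = mse(model_forecasts, actuals)
    return 1.0 - model_error / naive_error


# Made-up values of a normalized fundamental for four companies.
current = [0.10, 0.05, 0.08, 0.12]    # today's value, used as the naive forecast (assumption)
future = [0.12, 0.04, 0.09, 0.10]     # value realized at the forecast horizon
model = [0.11, 0.045, 0.085, 0.11]    # hypothetical model forecast

improvement = mse_improvement(model, current, future)
print(f"MSE reduction vs. naive benchmark: {improvement:.1%}")
```

A positive value means the model's forecasts were closer to the realized fundamentals than the naive forecast. A value near zero or below suggests the model adds little over assuming nothing changes.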
Deep learning predicts cross-sectional stock returns
A 2017 paper in Expert Systems with Applications tests various deep learning architectures for predicting monthly stock returns in the cross-section of Japanese equities. The results show that deep neural networks generally outperform shallow networks, and that the best networks also beat representative machine learning benchmarks. The findings suggest deep learning holds promise as a machine learning method for predicting cross-sectional stock returns.
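Cross-sectional predictions like these are often judged by ranking stocks on their predicted return and comparing the top and bottom groups. The sketch below shows that evaluation under simple assumptions: equal weighting, a single month, and a hypothetical `long_short_return` helper. The tickers and returns are made up, and this is not necessarily the exact procedure used in the paper.

```python
def long_short_return(predicted, realized, fraction=0.2):
    """Return of an equal-weighted top-minus-bottom portfolio for one month.

    predicted and realized map ticker -> return for the same month.
    The top `fraction` of stocks by predicted return is held long and
    the bottom `fraction` is held short.
    """
    tickers = sorted(predicted, key=predicted.get, reverse=True)
    n = max(1, int(len(tickers) * fraction))
    if 2 * n > len(tickers):
        raise ValueError("fraction too large: long and short legs would overlap")
    longs, shorts = tickers[:n], tickers[-n:]
    long_ret = sum(realized[t] for t in longs) / n
    short_ret = sum(realized[t] for t in shorts) / n
    return long_ret - short_ret


# Made-up monthly returns for five hypothetical tickers.
predicted = {"A": 0.03, "B": 0.01, "C": 0.00, "D": -0.01, "E": -0.02}
realized = {"A": 0.02, "B": 0.00, "C": 0.01, "D": -0.01, "E": -0.03}

spread = long_short_return(predicted, realized, fraction=0.2)
print(f"Long-short spread for the month: {spread:.2%}")
```

Repeating this calculation each month gives a return series for the long-short portfolio, which can then be compared across models.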
Machine learning models forecast ETF price movements
A 2016 paper published in the Journal of Finance and Data Science investigates the ability of machine learning algorithms to forecast the directional movement of liquid ETFs representing major asset classes. Using 5 years of daily data from 2011-2016, the authors test the supervised learning classifiers available in Python’s scikit-learn, including deep neural networks, random forests, and support vector machines. The research introduces a ‘profit score’ to help compare classifier performance. It finds predictability difficult over short horizons of a few days, which supports the random walk hypothesis for prices. It also highlights the importance of cross-sectional and inter-temporal volume for a powerful information set, and shows that many features are necessary for predictability because each contributes only a small amount.
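The setup for a directional forecast like this can be sketched in a few lines. The example below is a simplification under stated assumptions: it builds features from lagged returns only (the paper also uses volume), splits the data chronologically so the test period follows the training period, and scores a trivial "always predict the majority class" baseline by hit rate. The paper's profit score is not reproduced here, and all function names and prices are hypothetical.

```python
def make_direction_dataset(prices, n_lags=3):
    """Features are the previous n_lags daily returns; the label is 1 if the
    next day's return is positive, else 0."""
    returns = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]
    features, labels = [], []
    for i in range(n_lags, len(returns)):
        features.append(returns[i - n_lags:i])
        labels.append(1 if returns[i] > 0 else 0)
    return features, labels


def chronological_split(features, labels, train_fraction=0.7):
    """Split without shuffling so the test set lies strictly after the training set."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], labels[:cut]), (features[cut:], labels[cut:])


def hit_rate(predicted, actual):
    """Share of days on which the predicted direction matched the realized one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Made-up closing prices for a hypothetical ETF.
prices = [100, 101, 100, 102, 103, 102, 104, 105, 104, 106, 107]

features, labels = make_direction_dataset(prices, n_lags=3)
(train_x, train_y), (test_x, test_y) = chronological_split(features, labels)

# Baseline: always predict the direction seen most often in the training period.
majority = 1 if sum(train_y) * 2 >= len(train_y) else 0
baseline_predictions = [majority] * len(test_y)
baseline_hit_rate = hit_rate(baseline_predictions, test_y)
print(f"Baseline hit rate on the test period: {baseline_hit_rate:.1%}")
```

Any classifier can be fitted on `train_x` and `train_y` and scored on the test period in the same way. A model is only interesting if it beats this kind of baseline out of sample.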
Time series models underperform machine learning methods
A 2017 paper published in the Journal of Risk and Financial Management empirically compares the predictive accuracy of traditional time series models such as ARIMA with that of leading machine learning techniques, including regression, neural networks, and autoencoders. Using daily data for three major stock indices (Dow Jones, S&P 500, and Nasdaq), the authors find that machine learning significantly outperforms the traditional models in forecasting financial time series.
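A fair comparison between forecasting methods requires that each forecast uses only data available at the time it is made. One way to do this is a walk-forward evaluation with an expanding window, sketched below. The two forecasters are deliberately simple, hypothetical stand-ins: they are not ARIMA or a neural network, and the series is made up. The point is only to show the evaluation loop.

```python
from math import sqrt


def walk_forward_rmse(series, forecaster, min_history=5):
    """One-step-ahead RMSE with an expanding window: each forecast only sees
    observations that precede the value being predicted."""
    squared_errors = []
    for t in range(min_history, len(series)):
        prediction = forecaster(series[:t])
        squared_errors.append((prediction - series[t]) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


def last_value(history):
    """Random-walk style forecast: the next value equals the latest one."""
    return history[-1]


def window_mean(history, window=3):
    """Hypothetical stand-in for a fitted model: mean of the last few values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# Made-up series with a steady upward trend.
series = [1, 2, 3, 4, 5, 6, 7, 8]

rmse_last = walk_forward_rmse(series, last_value)
rmse_mean = walk_forward_rmse(series, window_mean)
print(f"RMSE last-value: {rmse_last:.2f}, RMSE window-mean: {rmse_mean:.2f}")
```

Because `walk_forward_rmse` accepts any function that maps a history to a forecast, a time series model and a machine learning model can be passed through the same loop and compared on equal terms.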
LSTM networks show promise for market timing strategies
A 2016 paper in the Journal of Forecasting examines Long Short-Term Memory (LSTM) networks for predicting price movements of S&P 500 constituents from 1992-2015. An LSTM-based day trading strategy earns daily returns of 0.46% and a Sharpe ratio of 5.8 before transaction costs, outperforming memory-free classification baselines including random forests, deep neural networks, and logistic regression. The authors also look inside the ‘black box’ of the neural network and find common patterns among the stocks it selects for trading: high volatility and short-term return reversals.
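For readers less familiar with the metric, the sketch below shows one common way to compute an annualized Sharpe ratio from daily returns, before and after a flat per-day trading cost. It assumes 252 trading days per year and a zero risk-free rate by default. The cost figure and the return series are made up and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev

TRADING_DAYS_PER_YEAR = 252  # common convention, an assumption here


def annualized_sharpe(daily_returns, daily_risk_free=0.0):
    """Mean excess daily return divided by its sample standard deviation,
    scaled to an annual figure."""
    excess = [r - daily_risk_free for r in daily_returns]
    return mean(excess) / stdev(excess) * sqrt(TRADING_DAYS_PER_YEAR)


def net_of_costs(daily_returns, daily_cost):
    """Subtract a flat, hypothetical trading cost from every daily return."""
    return [r - daily_cost for r in daily_returns]


# Made-up daily strategy returns.
gross_returns = [0.010, -0.005, 0.007, 0.002, -0.003, 0.004]
net_returns = net_of_costs(gross_returns, daily_cost=0.001)

gross_sharpe = annualized_sharpe(gross_returns)
net_sharpe = annualized_sharpe(net_returns)
print(f"Sharpe before costs: {gross_sharpe:.2f}, after costs: {net_sharpe:.2f}")
```

A flat cost lowers the mean return but leaves the volatility unchanged, so the Sharpe ratio falls in proportion to the lost return. This is why results reported before costs can look much stronger than what a high-turnover strategy delivers in practice.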
Machine learning offers immense potential for gaining an edge in investment analysis and decision making. However, research shows that performance depends critically on choices around data inputs, modeling approach, and portfolio integration. Thoughtful application built on market intuition is essential.