With the rapid development of artificial intelligence, more and more researchers and practitioners are exploring cutting-edge algorithms in quantitative finance. In portfolio optimization in particular, deep reinforcement learning has shown great potential and promising results. The open-source framework FinRL offers an accessible way to get hands-on experience in this domain. In this article, we demonstrate step by step how to use FinRL to train intelligent agents for automated portfolio weighting and rebalancing. To keep the reinforcement learning process manageable, we generate simulated stock market data that approximates real-world complexity and randomness. We also compare value-based and policy-gradient reinforcement learning algorithms in the context of portfolio optimization, discuss problems with action sampling, and propose solutions. By working through this tutorial, you will learn both the theory and the engineering practices needed to apply deep reinforcement learning to quantitative investment.

Generating Simulated Stock Market Data to Train a Reinforcement Learning Agent
In order to quickly test algorithms in portfolio optimization scenarios, we need a reliable source of stock price data. However, high-quality financial data usually requires payment, which is not ideal for education and research. We therefore write a function that automatically generates simulated daily prices for an arbitrary number of stocks over a customizable time range. The simulation is designed to mimic key properties of real stock markets, such as correlated randomness, trends, and mean reversion. By injecting different levels and types of noise, we can control the complexity of the simulated data, from very simple to near-realistic. This flexibility lets us incrementally ratchet up the environment's difficulty so the agent can learn from easy to hard. Generating stock data on the fly also enables faster iteration on algorithms.
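Below is a minimal sketch of such a generator. The function name simulate_stock_prices and its parameters are illustrative assumptions rather than part of the FinRL API; it produces correlated log-price paths with a small drift and mean-reversion term.

```python
import numpy as np
import pandas as pd

def simulate_stock_prices(n_stocks=5, n_days=1000, seed=42,
                          drift=0.0002, volatility=0.01,
                          correlation=0.3, mean_reversion=0.02):
    """Generate correlated daily prices with drift and mild mean reversion."""
    rng = np.random.default_rng(seed)

    # Simple equicorrelated matrix and its Cholesky factor for correlated shocks
    corr = np.full((n_stocks, n_stocks), correlation)
    np.fill_diagonal(corr, 1.0)
    chol = np.linalg.cholesky(corr)

    log_prices = np.zeros((n_days, n_stocks))
    for t in range(1, n_days):
        shocks = chol @ rng.standard_normal(n_stocks) * volatility
        # Mean reversion pulls log prices back toward their starting level
        reversion = -mean_reversion * log_prices[t - 1]
        log_prices[t] = log_prices[t - 1] + drift + reversion + shocks

    prices = 100.0 * np.exp(log_prices)  # every stock starts at 100
    dates = pd.bdate_range("2020-01-01", periods=n_days)
    return pd.DataFrame(prices, index=dates,
                        columns=[f"STOCK_{i}" for i in range(n_stocks)])

prices = simulate_stock_prices()
print(prices.head())
```

Raising the volatility or lowering the correlation is one way to dial the environment from easy to hard, as described above.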
Setting Up Environment for Deep Reinforcement Learning in Quantitative Finance
The environment defines the universe in which the reinforcement learning agent resides and interacts. To construct a financial trading environment for our portfolio-managing agent, we need to specify the actions, states, rewards, and transitions. At each timestep, the agent observes the current portfolio allocation across stocks and cash, as well as recent price history. It then decides and submits target portfolio weights for the next timestep. Upon receiving the agent's action, the environment steps forward: it adjusts holdings accordingly, charges transaction fees, and calculates investment performance. The portfolio-adjustment mechanism is designed to mimic real-world trading constraints such as integer share quantities, liquidity slippage, and leftover cash handling. By encapsulating these complications behind the scenes, the environment allows the agent to focus its learning on portfolio weighting alone.
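The sketch below shows one way such an environment could look as a Gymnasium-style class. The name PortfolioEnv and its simplifications (fractional shares, a flat fee on turnover, no slippage) are assumptions for illustration, not the exact FinRL environment.

```python
import gymnasium as gym
import numpy as np

class PortfolioEnv(gym.Env):
    """Minimal portfolio-rebalancing environment: actions are target weights."""

    def __init__(self, prices, window=30, fee_rate=0.001, initial_cash=1e6):
        self.prices = prices.values            # (n_days, n_stocks) price matrix
        self.window = window
        self.fee_rate = fee_rate
        self.initial_cash = initial_cash
        self.n_stocks = self.prices.shape[1]
        # Target weights over the stocks plus a cash slot, each in [0, 1]
        self.action_space = gym.spaces.Box(0.0, 1.0, shape=(self.n_stocks + 1,))
        self.observation_space = gym.spaces.Box(
            -np.inf, np.inf, shape=(window, self.n_stocks))

    def _observe(self):
        # Recent price history normalized by the most recent price
        hist = self.prices[self.t - self.window:self.t]
        return (hist / hist[-1]).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = self.window
        self.value = self.initial_cash
        self.weights = np.zeros(self.n_stocks + 1)
        self.weights[-1] = 1.0                  # start fully in cash
        return self._observe(), {}

    def step(self, action):
        target = np.asarray(action, dtype=np.float64)
        target = target / target.sum()          # project onto the simplex
        # Fee proportional to turnover; weight drift between steps is ignored
        turnover = np.abs(target - self.weights).sum()
        fee = self.fee_rate * turnover * self.value

        # Portfolio return over the next day (the cash slot earns nothing)
        asset_returns = self.prices[self.t] / self.prices[self.t - 1] - 1.0
        portfolio_return = target[:-1] @ asset_returns
        new_value = (self.value - fee) * (1.0 + portfolio_return)

        reward = np.log(new_value / self.value)  # log return as the reward
        self.value, self.weights = new_value, target
        self.t += 1
        terminated = self.t >= len(self.prices)
        return self._observe(), float(reward), terminated, False, {}
```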
Comparing Deep Reinforcement Learning Methods for Portfolio Optimization
There are two major branches of deep reinforcement learning algorithms: value-based and policy-based methods. Value-based methods estimate the expected long-term return of state-action pairs, while policy-gradient techniques directly adjust the action probability distribution to maximize reward. In theory, the value-estimation approach requires a well-designed reward signal, which is very challenging in portfolio optimization; even risk-adjusted returns are too noisy to serve as a good reward function. Empirically, we also find that actor-critic algorithms degrade as market randomness increases. In contrast, pure policy-gradient methods demonstrate more robust performance after hyperparameter tuning. This supports our suspicion that value estimation is ill-suited to portfolio management under common assumptions. Nonetheless, techniques such as quantile-regression Q-learning show promise in this domain according to recent research.
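To make the policy-gradient side of this comparison concrete, here is a minimal REINFORCE-style loss, under the assumption that the per-step rewards are the log returns produced by the environment above; the standardization of returns is one simple variance-reduction choice, not the only option.

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE loss: weight action log-probabilities by discounted returns."""
    returns, running = [], 0.0
    for r in reversed(rewards):                 # discounted return-to-go
        running = r + gamma * running
        returns.insert(0, running)
    returns = torch.tensor(returns)
    # Standardizing the returns acts as a simple variance-reduction baseline
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -(torch.stack(log_probs) * returns).sum()
```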
Leveraging Dirichlet Distribution for Investment Portfolio Action Sampling
A key question in policy-gradient algorithms is how to sample actions efficiently from a continuous multidimensional space. Standard practice adds Gaussian noise to the current action, but this is awkward for portfolio weights, which must be non-negative and sum to one. The Dirichlet distribution provides an excellent alternative: the policy network outputs concentration parameters, and every sampled weight vector automatically satisfies the simplex constraint. Compared with perturbing weights with random noise and projecting them back onto the simplex via softmax, the Dirichlet approach also demonstrates far better training stability and final performance. Intuitively, this is because the randomness is structured according to the intrinsic properties of the weight parameters. By respecting the mathematical characteristics of portfolio allocation percentages, the Dirichlet sampler aligns much better with the optimization objective.
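A minimal PyTorch sketch of this idea follows. The class DirichletPolicy, its layer sizes, and the small constant added to the concentrations are illustrative assumptions; the key point is that the network outputs positive concentration parameters and actions are drawn from torch.distributions.Dirichlet.

```python
import torch
from torch.distributions import Dirichlet

class DirichletPolicy(torch.nn.Module):
    """Policy network whose outputs are Dirichlet concentration parameters."""

    def __init__(self, obs_dim, n_assets, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, n_assets),
            torch.nn.Softplus(),                # concentrations must be positive
        )

    def forward(self, obs):
        alpha = self.net(obs) + 1e-3            # keep parameters away from zero
        return Dirichlet(alpha)

# Example: a flattened 30-day window of 5 stocks, and 5 stocks plus cash
policy = DirichletPolicy(obs_dim=30 * 5, n_assets=6)
obs = torch.randn(30 * 5)
dist = policy(obs)
weights = dist.sample()                         # non-negative and sums to one
log_prob = dist.log_prob(weights)               # feeds the policy-gradient loss
```

The sampled weights can be passed directly to the environment's step method, and the log-probability plugs into the REINFORCE loss sketched earlier.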
In this article, we use the open-source deep reinforcement learning framework FinRL to tackle the quantitative finance problem of investment portfolio optimization. Through a comprehensive tutorial, we cover techniques such as simulating stock price data, setting up a financial trading environment, comparing policy-based and value-based methods, and leveraging the Dirichlet distribution for action exploration. These engineering practices and solutions can serve as references for students and professionals interested in applying cutting-edge AI algorithms in the finance industry.