Taxable Stock Trading with Deep Reinforcement Learning

Can machines beat humans in stock market trading? 

Machine Learning has attracted great interest from traders because of the large amounts of money that can be made in the stock market, and Deep Reinforcement Learning can arguably outperform human traders. However, most Deep Reinforcement Learning trading models built to date maximize profit measured only as the selling price minus the buying price. In the real world, transaction costs on buys and sells, as well as taxes, reduce the profit an investor actually keeps. 

Shan Huang addresses this problem in his research paper titled “Taxable Stock Trading with Deep Reinforcement Learning,” which forms the basis of the following text.

Importance of this research 

The paper demonstrates that ignoring taxes can cause a loss of more than 62% in average portfolio returns. It aims to maximize net portfolio returns with Deep Reinforcement Learning while accounting for transaction costs and taxes. 

About the research 

The research paper gives the mathematical details of the Deep Reinforcement Learning model. Its main settings are:


  • The data set contains the daily closing prices and trading volumes of SPY (an exchange-traded fund that tracks the S&P 500 index) from 13th November 2008 to 13th November 2018.
  • The minimum time step is dt = 1, representing one trading day, and a year is taken to have 252 trading days.
  • The model uses the following US tax regime:
    • Capital gains on a stock held for more than one year are taxed at 15%.
    • Capital gains on a stock held for less than one year are taxed at 25%.
    • Capital losses earn a tax rebate that offsets capital gains.
  • The trading period is five years long.
  • A transaction cost of 0.1% is included to make the calculations more realistic.
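Together, these settings determine what an investor keeps from a single round-trip trade. The Python sketch below illustrates this under assumptions that the paper summary does not state: the 0.1% cost is charged on the traded value of both the buy and the sell, fees are folded into the taxable gain, a position held exactly one year (252 trading days) counts as short-term, and a loss earns a rebate at the same rate a gain would be taxed. The names `tax_rate` and `net_profit` are hypothetical.

```python
# Sketch of the net (after-cost, after-tax) profit of one round-trip trade.
# Assumptions (not from the paper): 0.1% cost on both legs, fees folded into
# the taxable gain, and losses rebated at the rate a gain would be taxed.

TRADING_DAYS_PER_YEAR = 252                       # dt = 1 is one trading day
TRADING_PERIOD_DAYS = 5 * TRADING_DAYS_PER_YEAR   # five-year trading period
LONG_TERM_RATE = 0.15                             # held more than one year
SHORT_TERM_RATE = 0.25                            # held one year or less (assumed boundary)
TRANSACTION_COST = 0.001                          # 0.1% of traded value


def tax_rate(holding_days: int) -> float:
    """Capital-gains rate for a position held `holding_days` trading days."""
    return LONG_TERM_RATE if holding_days > TRADING_DAYS_PER_YEAR else SHORT_TERM_RATE


def net_profit(buy_price: float, sell_price: float, shares: float, holding_days: int) -> float:
    """Profit after transaction costs and capital-gains tax (or rebate)."""
    buy_cost = buy_price * shares * (1 + TRANSACTION_COST)
    sell_proceeds = sell_price * shares * (1 - TRANSACTION_COST)
    gain = sell_proceeds - buy_cost
    # A negative gain produces a negative tax, i.e. a rebate that offsets other gains.
    tax = tax_rate(holding_days) * gain
    return gain - tax
```

For example, 10 shares bought at 100 and sold at 110 net roughly 83.2 if held for 300 trading days (long-term), but only roughly 73.4 if sold after 100 days (short-term).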

Research Environment

The author built a new OpenAI Gym environment in which the observation at each time step consists of SPY’s daily closing price, trading volume, averaged cost basis, and average holding period. 
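The two position features in the observation need bookkeeping as the agent buys and sells. The minimal sketch below assumes that the averaged basis is the share-weighted average purchase price, that the average holding period is the share-weighted age of the held shares in trading days, and that a sale removes shares at the average basis without changing either average. The `Position` class and `observation` helper are hypothetical, and the Gym `reset`/`step` wrapper is omitted.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical bookkeeping for the averaged basis and average holding period."""
    shares: float = 0.0
    avg_basis: float = 0.0    # share-weighted average purchase price (assumed definition)
    avg_holding: float = 0.0  # share-weighted age of held shares, in trading days (assumed)

    def buy(self, shares: float, price: float) -> None:
        if shares <= 0:
            return
        total = self.shares + shares
        # Blend the new lot (age 0, bought at `price`) into the share-weighted averages.
        self.avg_basis = (self.shares * self.avg_basis + shares * price) / total
        self.avg_holding = self.shares * self.avg_holding / total
        self.shares = total

    def sell(self, shares: float) -> None:
        # Assumed average-cost method: selling leaves both averages unchanged.
        self.shares -= min(max(shares, 0.0), self.shares)
        if self.shares <= 0:
            self.shares = 0.0
            self.avg_basis = 0.0
            self.avg_holding = 0.0

    def advance(self, dt: int = 1) -> None:
        # Every held share ages by dt trading days.
        if self.shares > 0:
            self.avg_holding += dt


def observation(price: float, volume: float, pos: Position) -> tuple:
    """Per-step observation: closing price, volume, averaged basis, average holding period."""
    return (price, volume, pos.avg_basis, pos.avg_holding)
```

For example, buying 10 shares at 100, holding them for 10 trading days, and then buying 10 more at 110 leaves an averaged basis of 105 and an average holding period of 5 days.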

About the Model

In the author’s words:

To represent the policy, we use the same default neural network architecture as PPO with fixed-length trajectory segments, which was a fully-connected MLP with two hidden layers of 64 and 64 tanh units respectively. The final output layer has a linear activation. policy and value function are estimated through separated network. The number of steps of interaction (state-action pairs) for the agent and the environment in each epoch is 5000 and the number of epochs is 50. The hyperparameter for clipping in the policy objective is chosen to be 0.2 and the GAE-Lambda is 0.97. The learning rate for policy and value function optimizer is 0.001 and 0.0003 respectively.

If tax is not included in the model, the average expected return is 0.44 which seems quite promising. This considerable return is the result of exploiting price trending and frequently adjusting holding positions correspondingly, similar as the results of other AI platforms.

However, this is not compelling since tax is heavily charged in a taxable year. Rather than ignoring taxes, the learning of stock trading should consider the effect of tax costs. We use PPO to train the stock trading policy in the environment with tax costs. The optimal stock trading policy in the model with taxes can achieve 0.13 average returns. To illustrate the suboptimality of the policy trained in the model without considering taxes, we apply this trained policy in the environment with tax costs, the average expected return drops to only 0.05.
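The clipping ratio and GAE-Lambda quoted above enter PPO through two computations: advantage estimation and the clipped policy objective. The pure-Python sketch below shows both with the quoted values. The discount factor `GAMMA = 0.99` is an assumption because the quote does not give one, the helper names are hypothetical, the segment is assumed to contain no episode terminations, and the policy/value networks and optimizers are left out.

```python
import math

# Settings quoted above, listed for reference; GAMMA is assumed (not stated in the quote).
CLIP_RATIO = 0.2
GAE_LAMBDA = 0.97
GAMMA = 0.99              # assumption
PI_LR = 1e-3              # policy optimizer learning rate
VF_LR = 3e-4              # value-function optimizer learning rate
STEPS_PER_EPOCH = 5000
EPOCHS = 50
HIDDEN_SIZES = (64, 64)   # tanh hidden layers, linear output layer


def gae_advantages(rewards, values, last_value, gamma=GAMMA, lam=GAE_LAMBDA):
    """Generalized Advantage Estimation over one trajectory segment.

    `values[t]` is V(s_t); `last_value` bootstraps the state after the segment.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then an exponentially weighted (gamma * lam) sum of them.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(logp_new, logp_old, advantages, clip=CLIP_RATIO):
    """PPO-clip objective (to be maximized), averaged over samples."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)
        clipped = max(min(ratio, 1 + clip), 1 - clip)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The `min` in `clipped_surrogate` makes the objective pessimistic: once the probability ratio leaves the range [0.8, 1.2] in the direction the advantage favors, pushing it further no longer increases the objective, which keeps each policy update small.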

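Research Results

  • Without taxes in the model, the trained policy achieves an average expected return of 44%.
  • Trained in the environment with tax costs, the optimal stock trading policy achieves an average return of 13%.
  • When the policy trained without taxes is applied in the environment with tax costs, its average expected return drops to only 5%.
  • The 5% is about 62% below the 13% achieved by the tax-aware policy: a policy that ignores taxes during training gives up roughly 62% of the after-tax return it could have earned. 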
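The 62% figure is the relative gap between the two after-tax returns, which can be checked directly. A small sketch using the rounded returns quoted above (`relative_loss` is a hypothetical helper):

```python
def relative_loss(optimal_return: float, naive_return: float) -> float:
    """Fraction of the tax-aware return lost by a policy trained without taxes."""
    return (optimal_return - naive_return) / optimal_return


loss = relative_loss(0.13, 0.05)
print(f"{loss:.1%}")  # prints 61.5%
```

With the rounded returns this prints 61.5%, close to the roughly 62% figure.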



Investing in public markets is a way to own shares of the world’s top companies, and their returns have, on average, been higher than the rates offered by banks and other investment options. Reinforcement learning can help trading agents increase their portfolio returns, but Deep Reinforcement Learning models built for stock trading sometimes neglect taxes, which can substantially reduce overall returns. Shan Huang’s paper integrates tax calculations into the net return that a Deep Reinforcement Learning agent optimizes. The objective is to maximize net returns after taxes, which are the returns an investor actually keeps.

Source: Shan Huang, “Taxable Stock Trading with Deep Reinforcement Learning”
