This post is a tutorial for how to build a recurrent neural network using Tensorflow to predict stock market prices. The full working code is available in github.com/lilianweng/stock-rnn.

This is a tutorial for how to build a recurrent neural network using Tensorflow to predict stock market prices. The full working code is available in github.com/lilianweng/stock-rnn. If you don’t know what is recurrent neural network or LSTM cell, feel free to check my previous post.

There are many tutorials on the Internet, like:

Despite all these existing tutorials, I still want to write a new one mainly for three reasons:

1. Early tutorials cannot cope with the new version any more, as Tensorflow is still under development and changes on API interfaces are being made fast.
2. Many tutorials use synthetic data in the examples. Well, I would like to play with the real world data.
3. Some tutorials assume that you have known something about Tensorflow API beforehand, which makes the reading a bit difficult.

After reading a bunch of examples, I would like to suggest taking the official example on Penn Tree Bank (PTB) dataset as your starting point. The PTB example showcases a RNN model in a pretty and modular design pattern, but it might prevent you from easily understanding the model structure. Hence, here I will build up the graph in a very straightforward manner.

Table of Content

## The Goal

I will explain how to build an RNN model with LSTM cells to predict the prices of S&P500 index. The dataset can be downloaded from Yahoo! Finance ^GSPC. In the following example, I used S&P 500 data from Jan 3, 1950 (the maximum date that Yahoo! Finance is able to trace back to) to Jun 23, 2017. The dataset provides several price points per day. For simplicity, we will only use the daily close prices for prediction. Meanwhile, I will demonstrate how to use TensorBoard for easily debugging and model tracking.

As a quick recap: the recurrent neural network (RNN) is a type of artificial neural network with self-loop in its hidden layer(s), which enables RNN to use the previous state of the hidden neuron(s) to learn the current state given the new input. RNN is good at processing sequential data. Long short-term memory (LSTM) cell is a specially designed working unit that helps RNN better memorize the long-term context.

### Data Preparation

The stock prices is a time series of length $N$, defined as $p_0, p_1, \dots, p_{N-1}$ in which $p_i$ is the close price on day $i$, $% $. Imagine that we have a sliding window of a fixed size $w$ (later, we refer to this as input_size) and every time we move the window to the right by size $w$, so that there is no overlap between data in all the sliding windows.

Fig. 1 The S&P 500 prices in time. We use content in one sliding windows to make prediction for the next, while there is no overlap between two consecutive windows.

The RNN model we are about to build has LSTM cells as basic hidden units. We use values from the very beginning in the first sliding window $W_0$ to the window $W_t$ at time $t$:

to predict the prices in the following window $w_{t+1}$:

Essentially we try to learn an approximation function, $f(W_0, W_1, \dots, W_t) \approx W_{t+1}$.

Fig. 2 The unrolled version of RNN.

Considering how back propagation through time (BPTT) works, we usually train RNN in a “unrolled” version so that we don’t have to do propagation computation too far back and save the training complication.

Here is the explanation on input_size from Tensorflow’s tutorial:

By design, the output of a recurrent neural network (RNN) depends on arbitrarily distant inputs. Unfortunately, this makes backpropagation computation difficult. In order to make the learning process tractable, it is common practice to create an “unrolled” version of the network, which contains a fixed number (num_steps) of LSTM inputs and outputs. The model is then trained on this finite approximation of the RNN. This can be implemented by feeding inputs of length num_steps at a time and performing a backward pass after each such input block.

The sequence of prices are first split into non-overlapped small windows. Each contains input_size numbers and each is considered as one independent input element. Then any num_steps consecutive input elements are grouped into one training input, forming an “un-rolled” version of RNN for training on Tensorfow. The corresponding label is the input element right after them.

For instance, if input_size=3 and num_steps=2, my first few training examples would look like:

Here is the key part for formatting the data:

The complete code of data formatting is here.

### Train / Test Split

Since we always want to predict the future, we take the latest 10% of data as the test data.

### Normalization

The S&P 500 index increases in time, bringing about the problem that most values in the test set are out of the scale of the train set and thus the model has to predict some numbers it has never seen before. Sadly and unsurprisingly, it does a tragic job. See Fig. 3.

Fig. 3 A very sad example when the RNN model have to predict numbers out of the scale of the training data.

To solve the out-of-scale issue, I normalize the prices in each sliding window. The task becomes predicting the relative change rates instead of the absolute values. In a normalized sliding window $W'_t$ at time $t$, all the values are divided by the last unknown price—the last price in $W_{t-1}$:

## Model Construction

### Definitions

• lstm_size: number of units in one LSTM layer.
• num_layers: number of stacked LSTM layers.
• keep_prob: percentage of cell units to keep in the dropout operation.
• init_learning_rate: the learning rate to start with.
• learning_rate_decay: decay ratio in later training epochs.
• init_epoch: number of epochs using the constant init_learning_rate.
• max_epoch: total number of epochs in training
• input_size: size of the sliding window / one training data point
• batch_size: number of data points to use in one mini-batch.

The LSTM model has num_layers stacked LSTM layer(s) and each layer contains lstm_size number of LSTM cells. Then a dropout mask with keep probability keep_prob is applied to the output of every LSTM cell. The goal of dropout is to remove the potential strong dependency on one dimension so as to prevent overfitting.

The training requires max_epoch epochs in total; an epoch is a single full pass of all the training data points. In one epoch, the training data points are split into mini-batches of size batch_size. We send one mini-batch to the model for one BPTT learning. The learning rate is set to init_learning_rate during the first init_epoch epochs and then decay by $\times$ learning_rate_decay during every succeeding epoch.

### Define Graph

A tf.Graph is not attached to any real data. It defines the flow of how to process the data and how to run the computation. Later, this graph can be fed with data within a tf.session and at this moment the computation happens for real.

— Let’s start going through some code —

(1) Initialize a new graph first.

(2) How the graph works should be defined within its scope.

(3) Define the data required for computation. Here we need three input variables, all defined as tf.placeholder because we don’t know what they are at the graph construction stage.

• inputs: the training data X, a tensor of shape (# data examples, num_steps, input_size); the number of data examples is unknown, so it is None. In our case, it would be batch_size in training session. Check the input format example if confused.
• targets: the training label y, a tensor of shape (# data examples, input_size).
• learning_rate: a simple float.

(4) This function returns one LSTMCell with or without dropout operation.

(5) Let’s stack the cells into multiple layers if needed. MultiRNNCell helps connect sequentially multiple simple cells to compose one cell.

(6) tf.nn.dynamic_rnn constructs a recurrent neural network specified by cell (RNNCell). It returns a pair of (model outpus, state), where the outputs val is of size (batch_size, num_steps, lstm_size) by default. The state refers to the current state of the LSTM cell, not consumed here.

(7) tf.transpose converts the outputs from the dimension (batch_size, num_steps, lstm_size) to (num_steps, batch_size, lstm_size). Then the last output is picked.

(8) Define weights and biases between the hidden and output layers.

(9) We use mean square error as the loss metric and the Adam algorithm for gradient descent optimization.

### Start Training Session

(1) To start training the graph with real data, we need to start a tf.session first.

(2) Initialize the variables as defined.

(0) The learning rates for training epochs should have been precomputed beforehand. The index refers to the epoch index.

(3) Each loop below completes one epoch training.

(4) Don’t forget to save your trained model at the end.

The complete code is available here.

### Use TensorBoard

Building the graph without visualization is like drawing in the dark, very obscure and error-prone. Tensorboard provides easy visualization of the graph structure and the learning process. Check out this hand-on tutorial, only 20 min, but it is very practical and showcases several live demos.

Brief Summary

• Use with [tf.name_scope](https://www.tensorflow.org/api_docs/python/tf/name_scope)("your_awesome_module_name"): to wrap elements working on the similar goal together.
• Many tf.* methods accepts name= argument. Assigning a customized name can make your life much easier when reading the graph.
• Methods like tf.summary.scalar and tf.summary.histogram help track the values of variables in the graph during iterations.
• In the training session, define a log file using tf.summary.FileWriter.

Later, write the training progress and summary results into the file.

Fig. 4a The RNN graph built by the example code. The “train” module has been “removed from the main graph”, as it is not a real part of the model during the prediction time.

Fig. 4b Click the “output_layer” module to expand it and check the structure in details.

The full working code is available in github.com/lilianweng/stock-rnn.

## Results

In my experiment, I used the following configuration:

### input_size = 1; lstm_size = 32

(Final MSE = 9.1462e-5) The result looks noisy with a tiny input_size.

Fig. 5a Predictoin results for the first and last 200 days in test data. Model is trained with input_size=1 and lstm_size=32.

### input_size = 5; lstm_size = 32

(Final MSE = 2.9298e-4) Overall, pretty good.

Fig. 5b Predictoin results for the first and last 200 days in test data. Model is trained with input_size=5 and lstm_size=32.

### input_size = 5; lstm_size = 128

(Final MSE = 3.9621e-4) More LSTM units makes the prediction less table.

Fig. 5c Predictoin results for the first and last 200 days in test data. Model is trained with input_size=5 and lstm_size=128.

### input_size = 30; lstm_size = 32

(Final MSE = 1.4477e-3) When the input_size is large, the prediction is still in need of improvment, as it cannot capture many extreme points in the series.

Fig. 5d Predictoin results for the first and last 200 days in test data. Model is trained with input_size=30 and lstm_size=32.

## What to Expect in Part 2

In “Predict Stock Prices Using RNN: Part 2” I will demonstrate how to use an RNN model to prediction prices of different stocks. Probably using embeddings, but not fully decided yet ;)