The LSTM network is a kind of recurrent neural network, commonly used to predict sequences such as natural language. Here we use it to predict simple functions such as sin(x) * k + b, and discuss the factors that influence the accuracy of the LSTM.

parameter tuning

base model

We first generate a dataset with x = np.linspace(0, 20, 100), and use 30 consecutive values to predict the 31st.

The toy dataset is really simple:

def f(x_array):
    return list(np.sin(np.array(x_array)) * 5 + 10)
# inside DatasetUtil.__init__ (full source code at the end of this post):
self.x = list(np.linspace(0, 20, 100))
self.y = list(f(self.x))
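
Each training sample is then a sliding window over y: seq_len = 30 consecutive values as the input, and the value right after the window as the label. A minimal sketch of that windowing (the make_windows helper is ours, for illustration only):

import numpy as np

def make_windows(y, seq_len=30):
    # pair each window y[i:i+seq_len] with the next value y[i+seq_len]
    data = [y[i:i + seq_len] for i in range(len(y) - seq_len)]
    labels = [y[i + seq_len] for i in range(len(y) - seq_len)]
    return np.array(data), np.array(labels)

x = np.linspace(0, 20, 100)
data, labels = make_windows(list(np.sin(x) * 5 + 10))
print(data.shape, labels.shape)  # (70, 30) (70,)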

Then we pick some hyperparameters and let the model learn from the training data:

config = {
    "forget_bias": 1.0,     # constant added to the LSTM forget gate
    "num_units": 128,       # hidden units per LSTM cell
    "layers": 12,           # number of stacked LSTM layers
    "learning_rate": 0.1,   # Adam learning rate
    "epoch": 300,           # training steps (one random batch per step)
    "batch_size": 32,
    "seq_len": 30,          # input window length
    "keep_prob": 0.8        # dropout keep probability
}

Upsettingly, with so many layers and such a large learning rate, the model learns nothing. The prediction result is:

So we adjust the parameters above; the base model's parameters become:

config = {
    "forget_bias": 1.0,
    "num_units": 128,
    "layers": 2,
    "learning_rate": 0.01,
    "epoch": 300,
    "batch_size": 32,
    "seq_len": 300,
    "keep_prob": 0.8
}

And the base result is:

It looks better, but still doesn't reach our goal. We will improve it in several ways below.

learning_rate

We change learning_rate = 0.001, and the result is:

We also try learning_rate = 0.0001, and the result is:

Finally, we try learning_rate = 0.00001, and the result is:

So the best parameters so far seem to be:

config = {
    "forget_bias": 1.0,
    "num_units": 128,
    "layers": 2,
    "learning_rate": 0.001,
    "epoch": 300,
    "batch_size": 32,
    "seq_len": 30,
    "keep_prob": 0.8
}
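
Rather than editing the config by hand for every run, the sweep itself can be scripted. A minimal sketch, assuming the LSTM class from the source code at the end of this post (each run builds a fresh TensorFlow graph):

import tensorflow as tf

for lr in [0.01, 0.001, 0.0001, 0.00001]:
    tf.reset_default_graph()  # clear the previous run's graph
    run_config = dict(config, learning_rate=lr)  # copy config, override the learning rate
    lstm = LSTM(run_config)
    lstm.build()
    lstm.fit()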

forget bias

The bigger the forget bias, the more the model keeps of its previous cell state, i.e. the more it remembers about recent inputs. We try setting forget_bias = 0.5, and the result is:

Then we also try forget_bias = 0.1:

Neither is better than forget_bias = 1.0, so the best parameters still seem to be:

config = {
    "forget_bias": 1.0,
    "num_units": 128,
    "layers": 2,
    "learning_rate": 0.001,
    "epoch": 200,
    "batch_size": 32,
    "seq_len": 30,
    "keep_prob": 0.8
}
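
Why does a larger forget_bias preserve more information? The forget gate computes f_t = sigmoid(W_f [h_{t-1}, x_t] + b_f + forget_bias), and early in training, while the weights are still near zero, the added constant alone decides how much of the previous cell state survives each step. A back-of-the-envelope check (the sigmoid helper here is our own, not part of the model code):

import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

# forget-gate activation at initialization, when weights are ~0 and z ~ forget_bias
for fb in [1.0, 0.5, 0.1]:
    print(fb, sigmoid(fb))
# 1.0 keeps ~73% of the cell state per step, 0.5 keeps ~62%, 0.1 only ~52%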

layers

Our base model has 2 layers, but to increase the model's capacity we can stack more (a rough parameter-count sketch follows the config below). We first try layers = 1, and the result is:

Then we try layers = 3, and the result is:

Ok, how about layers = 5?

Of course, we also try more layers such as layers = 7:

So the best parameters now seem to be:

config = {
    "forget_bias": 1.0,
    "num_units": 128,
    "layers": 5,
    "learning_rate": 0.001,
    "epoch": 300,
    "batch_size": 32,
    "seq_len": 30,
    "keep_prob": 0.8
}
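
Why does stacking layers add capacity? Each LSTM layer has four gates, each with an (input_dim + num_units) x num_units kernel and a num_units bias, which matches the BasicLSTMCell parameterization used in the source code below. A rough count for our settings:

def lstm_layer_params(input_dim, num_units):
    # 4 gates, each with an (input_dim + num_units) x num_units kernel plus bias
    return 4 * ((input_dim + num_units) * num_units + num_units)

num_units = 128
print(lstm_layer_params(1, num_units))          # first layer: 66,560 parameters
print(lstm_layer_params(num_units, num_units))  # each further layer: 131,584 parameters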

augment dataset

Until now we have trained the model only on the toy dataset. But what happens when we use more data to train it?

At first, we simply generate more data:

def f(x_array):
    return list(np.sin(np.array(x_array)) * 5 + 10)
self.x = list(np.linspace(0, 100, 1000))
self.y = list(f(self.x))

The result seems good, but we all know that sin(x) * k + b is periodic, so we can use just part of the data and still obtain good performance:
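
How much is "part of the data"? The period of sin is 2π ≈ 6.28, and np.linspace(0, 100, 1000) spaces points about 0.1 apart, so a single period is already covered by roughly 63 samples. A quick check:

import numpy as np

x = np.linspace(0, 100, 1000)
step = x[1] - x[0]                    # ~0.1001
points_per_period = 2 * np.pi / step  # ~62.8 samples per full period of sin
print(step, points_per_period)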

Next we increase the standard deviation of the dataset by adding random noise:

def f(x_array):
    return list(np.sin(np.array(x_array)) * 5 + 10. * np.random.random(len(x_array)))
self.x = list(np.linspace(0, 100, 1000))
self.y = list(f(self.x))

The result is:

But what about other functions, such as sin(x) * x + b * random()?

def f(x_array):
    return list(np.sin(np.array(x_array)) * np.array(x_array) + 10. * np.random.random(len(x_array)))
self.x = list(np.linspace(0, 100, 1000))
self.y = list(f(self.x))

Let's try something even more difficult:

def f(x_array):
    # np.random.random() is drawn once here, so the whole sine gets a single random amplitude
    return list(np.sin(np.array(x_array)) * np.random.random() + 10. * np.random.random(len(x_array)))
self.x = list(np.linspace(0, 50, 1000))
self.y = list(f(self.x))

The result is not as good as we expected:

Finally, we try these parameters:

config = {
    "forget_bias": 1.0,
    "num_units": 128,
    "layers": 3,
    "learning_rate": 0.0001,
    "epoch": 500,
    "batch_size": 32,
    "seq_len": 30,
    "keep_prob": 1.0
}

And get the following result:

source code

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

class DatasetUtil(object):
    """Generates the toy series and manages appended predictions."""

    def __init__(self, config):
        self.config = config

        def f(x_array):
            # a single random amplitude for the sine, plus per-point noise
            return list(np.sin(np.array(x_array)) * np.random.random() + 10. * np.random.random(len(x_array)))
        self.x = list(np.linspace(0, 50, 1000))
        self.y = list(f(self.x))
        self.pred_time = 0

    def train_sample(self):
        # sample batch_size random windows of seq_len values; the label is the next value
        train_data = []
        train_label = []
        for _ in range(self.config['batch_size']):
            start_id = int(np.random.random() * (len(self.y) - self.config['seq_len']))
            train_data.append(self.y[start_id: start_id + self.config['seq_len']])
            train_label.append(self.y[start_id + self.config['seq_len']])
        return train_data, train_label

    def predict_dataset(self):
        # repeat the last seq_len values batch_size times to fill the input placeholder
        return [self.y[-self.config['seq_len']:] for _ in range(self.config['batch_size'])]

    def append_pred(self, idx_pred):
        # append a prediction so the next prediction window can include it
        self.pred_time += 1
        self.y.append(idx_pred)
        sub = self.x[1] - self.x[0]
        self.x.append(self.x[-1] + sub)

    def plot(self):
        # black line: ground truth; red dots: the model's rolled-out predictions
        fig, ax = plt.subplots()
        ax.plot(self.x[:-self.pred_time], self.y[:-self.pred_time], 'k-')
        ax.plot(self.x[-self.pred_time:], self.y[-self.pred_time:], 'ro')
        plt.show()
        # plt.savefig('example.png')


class LSTM(object):

    def __init__(self, config):
        self.config = config
        self.datasetutil = DatasetUtil(config)
        self.predict_result = []

    def build(self):
        print('############building lstm network############')
        self.inputs = tf.placeholder(tf.float32, [self.config['batch_size'], self.config['seq_len']])
        self.labels = tf.placeholder(tf.float32, [self.config['batch_size']])
        self.keep_prob = tf.placeholder(tf.float32)
        # add a feature dimension: (batch, seq_len) -> (batch, seq_len, 1)
        inputs = tf.expand_dims(self.inputs, -1)
        labels = tf.expand_dims(self.labels, -1)

        def lstm_cell():
            lstmcell = tf.contrib.rnn.BasicLSTMCell(self.config['num_units'], forget_bias=self.config['forget_bias'])
            lstmcell = tf.contrib.rnn.DropoutWrapper(lstmcell, input_keep_prob=self.keep_prob, output_keep_prob=self.keep_prob)
            return lstmcell
        mlstm_cell = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(self.config['layers'])], state_is_tuple=True)
        init_state = mlstm_cell.zero_state(self.config['batch_size'], dtype=tf.float32)
        outputs, state = tf.nn.dynamic_rnn(mlstm_cell, inputs=inputs, initial_state=init_state)
        # keep only the output at the last time step
        lstm_output = outputs[:, -1, :]

        # linear regression head: project num_units -> 1 predicted value
        W = tf.Variable(tf.random_normal([self.config['num_units'], 1], stddev=0.35))
        b = tf.Variable(tf.zeros([1]))
        y = tf.matmul(lstm_output, W) + b

        self.result = y
        # mean squared error over the batch
        self.loss = tf.reduce_sum((self.result - labels) ** 2) / self.config['batch_size']
        # self.loss = tf.reduce_sum(tf.abs(self.result - labels) / labels) / self.config['batch_size']
        self.train_op = tf.train.AdamOptimizer(self.config['learning_rate']).minimize(self.loss)

    def fit(self):
        print('############training lstm network############')
        # saver = tf.train.Saver()
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for epoch in range(self.config['epoch']):
                inputs, labels = self.datasetutil.train_sample()
                result, loss, _ = sess.run([self.result, self.loss, self.train_op], feed_dict={
                    self.inputs: inputs, self.labels: labels, self.keep_prob: self.config['keep_prob']})
                if (epoch + 1) % 50 == 0:
                    print('epoch={}, loss={}'.format(epoch + 1, loss))
                    # print('inputs=', inputs)
                    print('result=', result)
                    print('labels=', labels)
            # roll out 30 predictions, feeding each one back into the input window
            for _ in range(30):
                inputs = self.datasetutil.predict_dataset()
                # print('predict inputs=', inputs)
                [result] = sess.run([self.result], feed_dict={self.inputs: inputs, self.keep_prob: 1.0})
                self.datasetutil.append_pred(result[0][0])
            self.datasetutil.plot()


if __name__ == '__main__':
    config = {
        "forget_bias": 1.0,
        "num_units": 128,
        "layers": 1,
        "learning_rate": 0.001,
        "epoch": 500,
        "batch_size": 32,
        "seq_len": 30,
        "keep_prob": 1.0
    }

    lstm = LSTM(config)
    lstm.build()
    lstm.fit()

enjoy :)
