TensorFlow in Practice (1): Implementing a Deep Recurrent Neural Network

Created by ypyu, last updated by ypyu on 2023-06-14 03:02

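Recurrent neural networks (RNNs) can capture the temporal information in data and build deep representations of its semantic content. They are widely used in speech recognition, language modeling, machine translation, and time-series analysis.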

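Previous articles covered many aspects of RNNs:

Introduction to Recurrent Neural Networks (RNN): introduces RNNs
Autoencoder and Its TensorFlow Implementation: introduces autoencoders and how to implement them in TensorFlow
Feedforward Neural Networks and Notation: introduces the basic architecture of neural networks and the notation used
Deriving the Back Propagation Algorithm: covers backpropagation, including its four fundamental equations, their derivation, and how backpropagation is used for training
RNN Part 3-Back Propagation Through Time and Vanishing Gradients: covers the BPTT algorithm and the vanishing gradient problem
Understanding LSTM Networks covers:
The core idea behind LSTMs
Step-by-Step LSTM Walk Through
Forget Gate Layer
Input Gate Layer
Output Gate Layer
RNN Part 4-LSTM: introduces the LSTM network structure
RNN part 5-GRU (Gated Recurrent Unit): introduces the GRU network structure
RNN part 6-Bidirectional RNN: introduces bidirectional RNNs, including:
What bidirectional RNNs are for: giving each time step access to future context, which a unidirectional RNN cannot see
The network structure of a Bi-Directional RNN
Forward and backward propagation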

This article shows how to implement an LSTM-based recurrent neural network in TensorFlow and walks through a sequence prediction example.

Implementing a single-layer LSTM

TensorFlow provides the module tf.nn.rnn_cell, which contains the following 10 classes:

[class BasicLSTMCell](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/BasicLSTMCell): Basic LSTM recurrent network cell.
[class BasicRNNCell](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/BasicRNNCell): The most basic RNN cell.
[class DeviceWrapper](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/DeviceWrapper): Operator that ensures an RNNCell runs on a particular device.
[class DropoutWrapper](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/DropoutWrapper): Operator adding dropout to inputs and outputs of the given cell.
[class GRUCell](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/GRUCell): Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078).
[class LSTMCell](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/LSTMCell): Long short-term memory unit (LSTM) recurrent network cell.
[class LSTMStateTuple](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/LSTMStateTuple): Tuple used by LSTM Cells for state_size, zero_state, and output state.
[class MultiRNNCell](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/MultiRNNCell): RNN cell composed sequentially of multiple simple cells.
[class RNNCell](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/RNNCell): Abstract object representing an RNN cell.
[class ResidualWrapper](https://link.zhihu.com/?target=https%3A//tensorflow.google.cn/api_docs/python/tf/contrib/rnn/ResidualWrapper): RNNCell wrapper that ensures cell inputs are added to the outputs.

For a basic LSTM cell we use the first class in the list, BasicLSTMCell. It is the same class as tf.contrib.rnn.BasicLSTMCell and is defined in [tensorflow/python/ops/rnn_cell_impl.py](https://link.zhihu.com/?target=https%3A//www.github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/ops/rnn_cell_impl.py). Its constructor is:

```python
__init__(
    num_units,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None,
    name=None
)
```

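The parameters are:

num_units: the number of units in the LSTM cell
forget_bias: a bias added to the forget gate before its sigmoid (default 1.0). It pushes the forget gate towards "keep", which reduces forgetting at the beginning of training; it does not mean that a value of 1 forgets nothing or that 0 forgets everything
state_is_tuple: True by default, meaning the accepted and returned states are 2-tuples (c_state, m_state), i.e. the cell state c and the output h
activation: the activation function of the inner states, tanh by default
reuse: whether to reuse the variables in an existing scope
name: the name of the layer. Layers with the same name share weights, but to avoid mistakes TensorFlow requires reuse=True in that case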
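To make the role of forget_bias concrete, here is a minimal NumPy sketch of one step of a BasicLSTMCell-style cell. It follows the gate layout we understand the TF 1.x rnn_cell_impl.py to use (one kernel over the concatenated [x, h], split into gates in the order i, j, f, o), but the function basic_lstm_step and its arguments are illustrative names, not part of TensorFlow's API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def basic_lstm_step(x, c, h, kernel, bias, forget_bias=1.0, activation=np.tanh):
    """One step of a BasicLSTMCell-style cell (illustrative sketch).

    x: [batch, input_dim]; c, h: [batch, num_units]
    kernel: [input_dim + num_units, 4 * num_units]; bias: [4 * num_units]
    Returns (output, (new_c, new_h)); the output equals new_h.
    """
    # All four gate pre-activations come from one matmul over [x, h]
    gates = np.concatenate([x, h], axis=1) @ kernel + bias
    # i: input gate, j: candidate values, f: forget gate, o: output gate
    i, j, f, o = np.split(gates, 4, axis=1)
    # forget_bias is added to the forget gate's pre-activation
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * activation(j)
    new_h = activation(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)

# Usage: all weights and biases are zero, so only forget_bias shapes the result
rng = np.random.default_rng(0)
batch, input_dim, num_units = 2, 3, 4
x = rng.standard_normal((batch, input_dim))
c = np.ones((batch, num_units))
h = np.zeros((batch, num_units))
kernel = np.zeros((input_dim + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)
out, (new_c, new_h) = basic_lstm_step(x, c, h, kernel, bias, forget_bias=1.0)
print(new_c[0, 0])  # sigmoid(1.0), about 0.731
```

With zero weights the candidate tanh(j) is 0, so the new cell state is just c · sigmoid(1.0) ≈ 0.73 · c: a forget bias of 1 keeps most, but not all, of the old state.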

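The following pseudocode, with comments, shows how to declare an LSTM structure with BasicLSTMCell.

```python
import tensorflow as tf  # TensorFlow 1.x
# Pseudocode: hidden_size, batch_size, num_steps, current_input, expected_output,
# fully_connected and calc_loss are placeholders.

# Define the LSTM structure; in TensorFlow a complete LSTM cell takes a single call
lstm = tf.nn.rnn_cell.BasicLSTMCell(hidden_size)  # hidden_size: number of units in the LSTM cell

# Initialize the LSTM state to all zeros; BasicLSTMCell provides zero_state() for this
state = lstm.zero_state(batch_size, tf.float32)

# Accumulated loss
loss = 0

for i in range(num_steps):
    # From the second step on, reuse the variables created at the first step,
    # so every time step shares the same LSTM weights
    if i > 0: tf.get_variable_scope().reuse_variables()

    # Process one time step: feeding the current input and the previous state into the LSTM
    # gives the current output (h_t) and the updated state (c_t, h_t).
    # lstm_output is passed to other layers; state is passed to the next time step.
    lstm_output, state = lstm(current_input, state)
    # Pass the LSTM output at this step through a fully connected layer to get the final output
    final_output = fully_connected(lstm_output)
    # Add the loss at the current time step
    loss += calc_loss(final_output, expected_output)

# Optimize the accumulated loss
# .......
```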
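The reuse_variables() call means every time step runs with the same weights. Stripped of TensorFlow, the unrolled loop looks like the NumPy sketch below. To keep it short, a plain tanh RNN cell stands in for the LSTM, and rnn_step, unroll and the squared-error loss are our illustrative choices, not TensorFlow functions.

```python
import numpy as np

def rnn_step(x, h, W_x, W_h, b):
    # A basic tanh RNN cell: h_t = tanh(x_t W_x + h_{t-1} W_h + b)
    return np.tanh(x @ W_x + h @ W_h + b)

def unroll(inputs, targets, h0, W_x, W_h, b, W_out):
    """Run the cell over all time steps with ONE shared set of weights
    and accumulate a squared-error loss over the steps."""
    h = h0
    loss = 0.0
    outputs = []
    for x_t, y_t in zip(inputs, targets):
        h = rnn_step(x_t, h, W_x, W_h, b)    # same W_x, W_h, b at every step
        y_hat = h @ W_out                     # stand-in for the fully connected layer
        outputs.append(y_hat)
        loss += np.mean((y_hat - y_t) ** 2)   # per-step loss added to the total
    return outputs, loss

rng = np.random.default_rng(0)
num_steps, batch, input_dim, hidden = 5, 2, 3, 4
inputs = rng.standard_normal((num_steps, batch, input_dim))
targets = rng.standard_normal((num_steps, batch, 1))
W_x = rng.standard_normal((input_dim, hidden)) * 0.1
W_h = rng.standard_normal((hidden, hidden)) * 0.1
b = np.zeros(hidden)
W_out = rng.standard_normal((hidden, 1)) * 0.1
outputs, loss = unroll(inputs, targets, np.zeros((batch, hidden)), W_x, W_h, b, W_out)
print(len(outputs), outputs[0].shape, loss)
```

Only one parameter set exists regardless of num_steps, which is exactly what variable reuse achieves in the TensorFlow graph.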

Implementing a deep RNN

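A deep RNN is a variant of the RNN. To increase the model's expressive power, several recurrent layers are stacked, and the output of each layer is passed to the next layer for further processing. Within a layer, the parameters are shared across all time steps, while different layers can have different parameters. TensorFlow provides the class tf.contrib.rnn.MultiRNNCell to implement the forward pass of a deep RNN.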

Figure 1: Schematic of a deep RNN
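The stacking rule (layer l's output at time t is layer l+1's input at time t; weights shared over time but not across layers) can be sketched in NumPy as follows. Plain tanh layers stand in for LSTM cells, and make_layer and deep_rnn_step are illustrative names; the sketch mirrors what MultiRNNCell does conceptually, not its actual code.

```python
import numpy as np

def make_layer(rng, input_dim, hidden):
    # Each layer owns its weights: different layers -> different parameters
    return (rng.standard_normal((input_dim, hidden)) * 0.1,
            rng.standard_normal((hidden, hidden)) * 0.1,
            np.zeros(hidden))

def deep_rnn_step(x, states, layers):
    """One time step of a stacked tanh RNN.
    states[l] is layer l's hidden state; returns (top-layer output, new states)."""
    new_states = []
    layer_input = x
    for (W_x, W_h, b), h in zip(layers, states):
        h_new = np.tanh(layer_input @ W_x + h @ W_h + b)
        new_states.append(h_new)
        layer_input = h_new  # this layer's output is the next layer's input
    return layer_input, new_states

rng = np.random.default_rng(0)
batch, input_dim, hidden, num_layers, num_steps = 2, 3, 4, 3, 6
# The first layer reads the raw input; higher layers read hidden states
layers = [make_layer(rng, input_dim if l == 0 else hidden, hidden) for l in range(num_layers)]
states = [np.zeros((batch, hidden)) for _ in range(num_layers)]
for t in range(num_steps):
    x_t = rng.standard_normal((batch, input_dim))
    # The same `layers` are reused at every time step
    top_output, states = deep_rnn_step(x_t, states, layers)
print(top_output.shape, len(states))
```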

MultiRNNCell is initialized as follows:

```python
__init__(
    cells,
    state_is_tuple=True
)
```

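where

cells: a list of RNNCells, one per recurrent layer, ordered from the input to the output
state_is_tuple: if True, the accepted and returned states are n-tuples, where n = len(cells); True is recommended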

MultiRNNCell likewise provides a function for initializing the state:

```python
zero_state(
    batch_size,
    dtype
)
```
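With state_is_tuple=True and BasicLSTMCell layers, the stacked state is nested: one entry per layer, and each entry is an LSTMStateTuple holding that layer's (c, h). A NumPy sketch of the structure zero_state produces, under those assumptions; lstm_zero_state and multi_zero_state are hypothetical helpers, and only the (c, h) field names are taken from TensorFlow's LSTMStateTuple, which is a namedtuple.

```python
from collections import namedtuple
import numpy as np

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple, a namedtuple with fields (c, h)
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])

def lstm_zero_state(batch_size, num_units, dtype=np.float32):
    # One LSTM layer: zero cell state c and zero hidden state h
    zeros = np.zeros((batch_size, num_units), dtype=dtype)
    return LSTMStateTuple(c=zeros, h=zeros.copy())

def multi_zero_state(batch_size, layer_sizes, dtype=np.float32):
    # The stacked state is an n-tuple, one entry per layer,
    # ordered from the input layer to the output layer
    return tuple(lstm_zero_state(batch_size, n, dtype) for n in layer_sizes)

state = multi_zero_state(batch_size=32, layer_sizes=[30, 30])
print(len(state), state[0].c.shape, state[1].h.shape)  # 2 (32, 30) (32, 30)
```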

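The following pseudocode with comments shows how to implement a deep RNN:

```python
import tensorflow as tf  # TensorFlow 1.x
# Pseudocode: lstm_size, number_of_layers, batch_size, num_steps, current_input,
# expected_output, fully_connected and calc_loss are placeholders.

# Use a basic LSTM as the building block of each recurrent layer (other cell types work too)
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell
# MultiRNNCell implements the deep RNN; number_of_layers is the number of layers and
# lstm_size the number of units per layer. Each layer is a separate cell instance.
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(
    [lstm_cell(lstm_size) for _ in range(number_of_layers)])
# Get the all-zero initial state (one state per layer)
state = stacked_lstm.zero_state(batch_size, tf.float32)

loss = 0
for i in range(num_steps):
    if i > 0:
        tf.get_variable_scope().reuse_variables()
    # From the current input current_input (x_t) and every layer's previous state (c_{t-1}, h_{t-1}),
    # compute the new state (c_t, h_t) and the top-layer output stacked_lstm_output (h_t)
    stacked_lstm_output, state = stacked_lstm(current_input, state)
    # Feed the output into a fully connected layer
    final_output = fully_connected(stacked_lstm_output)
    # Accumulate the loss
    loss += calc_loss(final_output, expected_output)

# Optimize the accumulated loss
# .......
```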

Predicting sin(x) with TensorFlow

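We take a short stretch of the curve $y=\sin(x)$, namely the points $\{(x_i, y_i) \mid i = n+1, \dots, n+m\}$, and use these m points to predict the value $y_{n+m+1}$. This is a typical sequence prediction problem.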

Figure 2: The sin(x) curve
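Concretely, every training sample is a window of m consecutive values, labelled with the value that follows it. A pure-Python sketch of this sliding-window split; sliding_windows is a hypothetical helper, and the article's own generate_data in the code below does the same with NumPy, using m = timesteps.

```python
import math

def sliding_windows(seq, m):
    """Split seq into (window, next value) pairs: m consecutive values predict the next one."""
    X, Y = [], []
    for i in range(len(seq) - m):
        X.append(seq[i:i + m])  # m consecutive samples
        Y.append(seq[i + m])    # the sample that follows them
    return X, Y

gap = 0.01  # sampling interval on the x axis
seq = [math.sin(k * gap) for k in range(15)]
X, Y = sliding_windows(seq, m=10)
print(len(X), len(X[0]))  # 5 10
```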

The process is explained through the code and its comments:

```python
import numpy as np
import tensorflow as tf  # requires TensorFlow 1.x (tf.contrib, tf.Session)
import matplotlib.pyplot as plt

# 1. Hyperparameters
hidden_size = 30  # number of hidden units in each LSTM layer
num_layers = 2  # two stacked LSTM layers

timesteps = 10  # length of the input window fed to the RNN
training_steps = 1000  # number of training iterations
batch_size = 32  # mini-batch size

training_examples = 10000  # number of training samples
testing_examples = 1000  # number of test samples
sample_gap = 0.01  # sampling interval on the x axis

# 2. Data generation
def generate_data(seq):
    X = []
    Y = []
    # Input: the window seq[i : i + timesteps]; label: the next value seq[i + timesteps]
    for i in range(len(seq) - timesteps):
        X.append([seq[i : i + timesteps]])  # X ends up with shape [N, 1, timesteps]
        Y.append([seq[i + timesteps]])  # Y ends up with shape [N, 1]
    return np.array(X, dtype = np.float32), np.array(Y, dtype = np.float32)

# The test interval starts where the training interval ends
test_start = (training_examples + timesteps) * sample_gap
test_end = (testing_examples + timesteps) * sample_gap + test_start
# Training data
train_X, train_Y = generate_data(np.sin(np.linspace(0, test_start, training_examples + timesteps, dtype = np.float32)))
# Test data
test_X, test_Y = generate_data(np.sin(np.linspace(test_start, test_end, testing_examples + timesteps, dtype = np.float32)))

# 3. Deep LSTM model
def lstm_model(X, Y, is_training):
    '''
    LSTM model
    X: features
    Y: labels
    is_training: True builds the training ops, False only builds the prediction graph
    '''
    # Stack num_layers LSTM cells; each layer is its own cell instance with its own weights
    cell = tf.nn.rnn_cell.MultiRNNCell([tf.nn.rnn_cell.BasicLSTMCell(hidden_size) for _ in range(num_layers)])
    # outputs: [batch, time, hidden_size], the top layer's output at every time step
    outputs, _ = tf.nn.dynamic_rnn(cell, X, dtype = tf.float32)

    # Keep only the output at the last time step
    output = outputs[:, -1, :]
    # A fully connected layer without activation maps it to one predicted value
    predictions = tf.contrib.layers.fully_connected(output, 1, activation_fn = None)

    # When only predicting, return the predictions directly
    if not is_training:
        return predictions, None, None

    # Mean squared error loss
    loss = tf.losses.mean_squared_error(labels = Y, predictions = predictions)
    # Optimizer op (Adagrad, learning rate 0.1)
    train_op = tf.contrib.layers.optimize_loss(loss, tf.train.get_global_step(),
                                               optimizer = 'Adagrad', learning_rate = 0.1)
    return predictions, loss, train_op

# 4. Training
def train(sess, train_X, train_Y):
    '''
    Training loop
    '''
    # Repeat, shuffle and batch the training set; x, y are the tensors of one mini-batch
    ds = tf.data.Dataset.from_tensor_slices((train_X, train_Y))
    ds = ds.repeat().shuffle(1000).batch(batch_size)
    x, y = ds.make_one_shot_iterator().get_next()

    # Build the model on the mini-batch tensors: predictions, loss and training op
    with tf.variable_scope('model'):
        predictions, loss, train_op = lstm_model(x, y, True)
    sess.run(tf.global_variables_initializer())
    for i in range(training_steps):
        _, l = sess.run([train_op, loss])
        if i % 100 == 0:
            print('train step: ' + str(i) + ', loss: ' + str(l))

# 5. Evaluating the trained model on the test set
def run_eval(sess, test_X, test_Y):
    '''
    Evaluate the trained model
    '''
    # Feed the test samples one at a time, in order
    ds = tf.data.Dataset.from_tensor_slices((test_X, test_Y))
    ds = ds.batch(1)
    X, y = ds.make_one_shot_iterator().get_next()

    # Reuse the trained variables in the 'model' scope; labels are not needed for prediction
    with tf.variable_scope('model', reuse = True):
        prediction, _, __ = lstm_model(X, [0.0], False)

    predictions = []
    labels = []
    for i in range(testing_examples):
        p, l = sess.run([prediction, y])
        predictions.append(p)
        labels.append(l)

    # Use the RMSE as the evaluation metric
    predictions = np.array(predictions).squeeze()
    labels = np.array(labels).squeeze()
    rmse = np.sqrt(((predictions - labels) ** 2).mean(axis=0))
    print('Mean Square Error is: %f' % rmse)  # note: the printed value is the RMSE

    # Plot predictions against the true values
    plt.figure()
    plt.plot(predictions, label = 'predictions')
    plt.plot(labels, label = 'real')
    plt.legend()
    plt.show()

# 6. Train and evaluate within one session
with tf.Session() as sess:
    train(sess, train_X, train_Y)
    run_eval(sess, test_X, test_Y)
```
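One detail worth checking in generate_data: each window is wrapped in an extra list, so X has shape [N, 1, timesteps]. tf.nn.dynamic_rnn reads its input as [batch_size, max_time, depth] when time_major=False (the default), so the LSTM above actually sees a single time step with timesteps features rather than a sequence of timesteps scalar values. The NumPy snippet below compares that layout with a one-feature-per-step layout; switching to the latter is our suggestion, not the article's, and it would change the model and therefore the numbers reported below.

```python
import numpy as np

timesteps = 10
seq = np.sin(np.linspace(0, 1, 20, dtype=np.float32))

# Layout produced by the article's generate_data: each window wrapped in a list
X_as_written = np.array([[seq[i:i + timesteps]] for i in range(len(seq) - timesteps)])
# Alternative layout: one scalar feature per time step
X_per_step = np.array([seq[i:i + timesteps].reshape(timesteps, 1)
                       for i in range(len(seq) - timesteps)])

print(X_as_written.shape)  # (10, 1, 10): 1 time step with 10 features
print(X_per_step.shape)    # (10, 10, 1): 10 time steps with 1 feature
```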

The output is:

```text
train step: 0, loss: 0.4845183
train step: 100, loss: 0.0055590835
train step: 200, loss: 0.003970341
train step: 300, loss: 0.003067538
train step: 400, loss: 0.0025000838
train step: 500, loss: 0.0021075562
train step: 600, loss: 0.0018126869
train step: 700, loss: 0.0015769802
train step: 800, loss: 0.0013801942
train step: 900, loss: 0.0012110542
Mean Square Error is: 0.033419
```
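Note that run_eval prints the label "Mean Square Error" but takes np.sqrt of the mean squared difference, so 0.033419 is a root mean square error. Squaring it gives an MSE of about 0.00112, which is the quantity directly comparable to the training losses above (computed with tf.losses.mean_squared_error). A tiny pure-Python sketch of the two metrics; mse and rmse are illustrative helpers, not TensorFlow functions.

```python
import math

def mse(predictions, labels):
    # Mean of the squared errors
    return sum((p - l) ** 2 for p, l in zip(predictions, labels)) / len(labels)

def rmse(predictions, labels):
    # What run_eval computes: the square root of the MSE
    return math.sqrt(mse(predictions, labels))

print(mse([2, 2, 2, 2], [0, 0, 0, 0]))   # 4.0
print(rmse([2, 2, 2, 2], [0, 0, 0, 0]))  # 2.0

reported = 0.033419  # value printed by run_eval above
print(reported ** 2)  # roughly 0.00112, the corresponding test MSE
```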

The resulting curves are: