Q&A

Getting the weights and biases of a Keras model


Question

How do I get the weights and biases of a Keras model?

Answer

Take the model from the post "LSTM Networks应用于股票市场之Functional Model" as an example; its model (in Keras 2 form) is:

from keras.layers import Input, LSTM, Dense, concatenate
from keras.models import Model

activation = 'tanh'  # the original post sets `activation` elsewhere; 'tanh' is assumed here

lstm_input = Input(shape=(30, 6), name='lstm_input')
# Keras 1's dropout_W/dropout_U are dropout/recurrent_dropout in Keras 2
lstm_output = LSTM(128, activation=activation, dropout=0.2, recurrent_dropout=0.1)(lstm_input)
aux_input = Input(shape=(1,), name='aux_input')
# Keras 1's merge(..., mode='concat', concat_axis=-1) is concatenate(..., axis=-1) in Keras 2
merged_data = concatenate([lstm_output, aux_input], axis=-1)
dense_output_1 = Dense(64, activation='linear')(merged_data)
dense_output_2 = Dense(16, activation='linear')(dense_output_1)
predictions = Dense(1, activation=activation)(dense_output_2)
model = Model(inputs=[lstm_input, aux_input], outputs=predictions)
model.compile(optimizer='adam', loss='mse', metrics=['mse'])

The training step:

# Keras 1's nb_epoch argument is epochs in Keras 2
model.fit([train_x, train_aux], train_y, batch_size=conf.batch, epochs=10, verbose=2)

Once this has run, we can inspect the model's weights. Keras provides several ways to get at them.

# See the trainable weights and biases of each layer ('kernel' refers to weights)
model.trainable_weights

# Return value:
[<tf.Variable 'lstm_1/kernel:0' shape=(6, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/recurrent_kernel:0' shape=(128, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/bias:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'dense_1/kernel:0' shape=(129, 64) dtype=float32_ref>,
<tf.Variable 'dense_1/bias:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'dense_2/kernel:0' shape=(64, 16) dtype=float32_ref>,
<tf.Variable 'dense_2/bias:0' shape=(16,) dtype=float32_ref>,
<tf.Variable 'dense_3/kernel:0' shape=(16, 1) dtype=float32_ref>,
<tf.Variable 'dense_3/bias:0' shape=(1,) dtype=float32_ref>]

As you can see, this model's trainable parameters (weights and biases) live in nine places.
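
For a quick overview of those nine entries, you can loop over them and print each variable's name and shape; a minimal sketch, assuming the TensorFlow backend (so the names look like the ones above):

# Print each trainable variable's name and shape.
for var in model.trainable_weights:
    print(var.name, var.shape)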

# Pull out all weights and biases at once
model.get_weights()

The return value is a list of nine ndarrays, corresponding one-to-one with the nine entries of model.trainable_weights. So if you want the weights or bias at one particular position, e.g. the bias of dense_1, you can just index into the list:

model.get_weights()[4]
# Return value:
array([-0.00225463,  0.01296113,  0.00273713, -0.0065364 ,  0.01660943,
0.00623776, -0.00092952, -0.00890288,  0.00431062,  0.0162892 ,
-0.00205688, -0.00469067,  0.00429582, -0.00396401,  0.00565233,
-0.00254946,  0.02485307,  0.00086826,  0.0006156 ,  0.00458527,
0.00521648,  0.00385924,  0.00105498, -0.00517886, -0.01677693,
-0.00254344, -0.04660135, -0.0042565 , -0.01070292, -0.00978546,
0.00395998, -0.00091199,  0.00476804, -0.00296541,  0.0037867 ,
-0.00378863,  0.00216215, -0.00275317,  0.0001033 , -0.0028793 ,
-0.00472449, -0.02478764, -0.00794014,  0.00807714,  0.00265896,
0.00280038, -0.00391497, -0.00142031,  0.00072159,  0.00286194,
-0.00627549, -0.00609946, -0.00522796,  0.00402372, -0.00050308,
0.00776461,  0.00257295, -0.00229076,  0.00437025, -0.02685707,
-0.00500122, -0.00216331, -0.00430452, -0.00292455], dtype=float32)
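
Counting list positions by hand is error-prone; Keras's get_layer lets you fetch a layer's weights by name instead. For example:

# get_weights() on a single layer returns its [kernel, bias] pair.
kernel, bias = model.get_layer('dense_1').get_weights()
print(kernel.shape)  # (129, 64)
print(bias.shape)    # (64,), the same array as model.get_weights()[4]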

If you want to know how a trained model actually works, knowing each layer's trainable parameters alone is not enough; you also need a reasonably clear picture of each layer's structure. A neural network is like a black box: once you understand its internal structure and can read out all the trainable parameters, the box goes from 100% black to maybe 99%.
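
For that layer-by-layer picture, Keras's model.summary() is a good starting point:

# Prints one row per layer: name, type, output shape, parameter count,
# and (for functional models) the layers that feed into it.
model.summary()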


Replies

Q1: Thanks so much, OP, and grabbing the first reply. For LSTM models, the literature says the most important weights are the forget gate's. Taking this post's model as an example, how can I get the weights of the LSTM layer's forget gate?

A1: The Keras LSTM source code defines them like this:

self.kernel_i = self.kernel[:, :self.units]
self.kernel_f = self.kernel[:, self.units: self.units * 2]
self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]
self.kernel_o = self.kernel[:, self.units * 3:]

self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]

if self.use_bias:
    self.bias_i = self.bias[:self.units]
    self.bias_f = self.bias[self.units: self.units * 2]
    self.bias_c = self.bias[self.units * 2: self.units * 3]
    self.bias_o = self.bias[self.units * 3:]

Here i stands for the input gate, f for the forget gate, o for the output gate, and c for the candidate cell state.

For the model in this example:

model.trainable_weights
# Return value:
[<tf.Variable 'lstm_1/kernel:0' shape=(6, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/recurrent_kernel:0' shape=(128, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/bias:0' shape=(512,) dtype=float32_ref>,
...
]

The forget gate's weights come in two kinds. First, the weights applied to the input X at each time step (X has shape (1, 6)):

F1 = model.get_weights()[0][0:6, 128:2*128]
# shape is (6, 128)

Second, the weights applied to H, the hidden output returned at the previous time step (H has shape (1, 128)):

F2 = model.get_weights()[1][0:128, 128:2*128]
# shape is (128, 128)

And the bias:

B = model.get_weights()[2][128:2*128]
# shape is (128,)

The forget gate then computes:

results = X * F1 + H * F2 + B
results = activation(results)

All the * in the formula are matrix (dot) products, and activation is the gate's activation function (Keras's recurrent_activation, a sigmoid variant by default), so results has shape (1, 128), corresponding to the 128 LSTM cells. The i-th LSTM cell decides how strongly to forget based on the size of results[i].
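
To make those shapes concrete, here is a minimal numpy sketch of the forget-gate computation; x and h are made-up inputs, and a plain sigmoid stands in for the gate activation:

import numpy as np

# Hypothetical single time step: x is the current input, h is the hidden
# output from the previous step. F1, F2 and B are the slices extracted above.
x = np.random.rand(1, 6).astype('float32')
h = np.random.rand(1, 128).astype('float32')

results = x.dot(F1) + h.dot(F2) + B      # shape (1, 128)
forget = 1.0 / (1.0 + np.exp(-results))  # sigmoid stand-in for the gate activation
print(forget.shape)                      # (1, 128), one forget factor per cell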


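The same slicing generalizes to the other three gates. A small sketch for this model (units = 128; columns in i, f, c, o order, as in the source quoted above):

units = 128
kernel, recurrent_kernel, bias = model.get_weights()[:3]

# Each concatenated array holds `units` columns (or entries) per gate.
for n, name in enumerate(['input', 'forget', 'candidate', 'output']):
    W = kernel[:, n * units:(n + 1) * units]            # input weights, (6, 128)
    U = recurrent_kernel[:, n * units:(n + 1) * units]  # recurrent weights, (128, 128)
    b = bias[n * units:(n + 1) * units]                 # bias, (128,)
    print(name, W.shape, U.shape, b.shape)
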
Q2: How can I see the weights for aux in this example? Thanks!

A2:

lstm_output = LSTM(128, activation=activation, dropout=0.2, recurrent_dropout=0.1)(lstm_input)
aux_input = Input(shape=(1,), name='aux_input')
merged_data = concatenate([lstm_output, aux_input], axis=-1)
<tf.Variable 'dense_1/kernel:0' shape=(129, 64) dtype=float32_ref>

The LSTM layer produces 128 outputs, so the first Dense layer has 129 inputs, the last of which is aux. The weights from the aux input to each unit of the first Dense layer are:

model.get_weights()[3][128,:]

Since aux's influence has already spread to every unit, all the weights after dense_1 are related to aux.
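
Equivalently, by layer name (row 128 is the aux row because concatenate put aux_input last):

# dense_1's kernel has shape (129, 64): rows 0-127 multiply the LSTM
# output, and row 128 multiplies the scalar aux input.
kernel, bias = model.get_layer('dense_1').get_weights()
aux_weights = kernel[128, :]  # shape (64,)
print(aux_weights)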

Tags

Stock Market