Getting the weights and biases of a Keras model
Created by small_q; last reply by small_q; viewed by 4 users
Question
Getting the weights and biases of a Keras model
Answer
Take the post "LSTM Networks applied to the stock market with the Functional Model" as an example. Its model is:
# Keras 1.x-style functional API (merge, dropout_W/dropout_U, input=/output=)
from keras.layers import Input, LSTM, Dense, merge
from keras.models import Model

lstm_input = Input(shape=(30, 6), name='lstm_input')
lstm_output = LSTM(128, activation=activation, dropout_W=0.2, dropout_U=0.1)(lstm_input)
aux_input = Input(shape=(1,), name='aux_input')
merged_data = merge([lstm_output, aux_input], mode='concat', concat_axis=-1)
dense_output_1 = Dense(64, activation='linear')(merged_data)
dense_output_2 = Dense(16, activation='linear')(dense_output_1)
predictions = Dense(1, activation=activation)(dense_output_2)
model = Model(input=[lstm_input, aux_input], output=predictions)
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
Training:
model.fit([train_x, train_aux], train_y, batch_size=conf.batch, nb_epoch=10, verbose=2)
Once this has run, we can inspect the model's weights. Keras provides several functions for getting at them.
# List the trainable weights and biases of each layer (kernel means weights)
model.trainable_weights
# Return value:
[<tf.Variable 'lstm_1/kernel:0' shape=(6, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/recurrent_kernel:0' shape=(128, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/bias:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'dense_1/kernel:0' shape=(129, 64) dtype=float32_ref>,
<tf.Variable 'dense_1/bias:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'dense_2/kernel:0' shape=(64, 16) dtype=float32_ref>,
<tf.Variable 'dense_2/bias:0' shape=(16,) dtype=float32_ref>,
<tf.Variable 'dense_3/kernel:0' shape=(16, 1) dtype=float32_ref>,
<tf.Variable 'dense_3/bias:0' shape=(1,) dtype=float32_ref>]
As you can see, for this model the trainable parameters (weights and biases) live in 9 places.
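These shapes follow directly from the layer sizes. As a minimal sketch (pure arithmetic, no Keras needed), the LSTM packs its four gates side by side, so its kernel has 4 × units columns:

```python
# Derive the nine parameter shapes from the layer sizes alone.
units, input_dim, aux_dim = 128, 6, 1

# The LSTM stores all four gates (i, f, c, o) concatenated along the last axis.
lstm_kernel = (input_dim, 4 * units)    # (6, 512)
lstm_recurrent = (units, 4 * units)     # (128, 512)
lstm_bias = (4 * units,)                # (512,)

# dense_1 sees the LSTM output concatenated with aux_input: 128 + 1 = 129.
dense_1_kernel = (units + aux_dim, 64)  # (129, 64)

shapes = [
    lstm_kernel, lstm_recurrent, lstm_bias,
    dense_1_kernel, (64,),              # dense_1
    (64, 16), (16,),                    # dense_2
    (16, 1), (1,),                      # dense_3
]
print(len(shapes))  # 9
```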
# Fetch all the weights and biases at once
model.get_weights()
The return value is a list of nine ndarrays, matching the 9 entries of model.trainable_weights in order. So to look at one particular set of weights/biases, e.g. the bias of dense_1, you can do:
model.get_weights()[4]
# Return value:
array([-0.00225463, 0.01296113, 0.00273713, -0.0065364 , 0.01660943,
0.00623776, -0.00092952, -0.00890288, 0.00431062, 0.0162892 ,
-0.00205688, -0.00469067, 0.00429582, -0.00396401, 0.00565233,
-0.00254946, 0.02485307, 0.00086826, 0.0006156 , 0.00458527,
0.00521648, 0.00385924, 0.00105498, -0.00517886, -0.01677693,
-0.00254344, -0.04660135, -0.0042565 , -0.01070292, -0.00978546,
0.00395998, -0.00091199, 0.00476804, -0.00296541, 0.0037867 ,
-0.00378863, 0.00216215, -0.00275317, 0.0001033 , -0.0028793 ,
-0.00472449, -0.02478764, -0.00794014, 0.00807714, 0.00265896,
0.00280038, -0.00391497, -0.00142031, 0.00072159, 0.00286194,
-0.00627549, -0.00609946, -0.00522796, 0.00402372, -0.00050308,
0.00776461, 0.00257295, -0.00229076, 0.00437025, -0.02685707,
-0.00500122, -0.00216331, -0.00430452, -0.00292455], dtype=float32)
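Hard-coding the index 4 is fragile; you can locate an entry by its variable name instead. A minimal sketch using the names listed above (with a live model you would read them from `model.trainable_weights`, e.g. via each variable's `.name`):

```python
# Names as printed by model.trainable_weights above.
names = [
    'lstm_1/kernel:0', 'lstm_1/recurrent_kernel:0', 'lstm_1/bias:0',
    'dense_1/kernel:0', 'dense_1/bias:0',
    'dense_2/kernel:0', 'dense_2/bias:0',
    'dense_3/kernel:0', 'dense_3/bias:0',
]

# model.get_weights() returns arrays in the same order, so the position
# of a name is also the index into the weights list.
idx = names.index('dense_1/bias:0')
print(idx)  # 4 -> model.get_weights()[idx] is dense_1's bias
```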
If you want to understand how a trained model works, knowing each layer's trainable parameters alone is not enough; you also need a reasonably clear picture of each layer's internal structure. A neural network is like a black box: if you can grasp the model's internal structure and obtain all of its trainable parameters, the box's opacity drops from 100% to 99%.
Replies
Q1: Thanks a lot, OP; grabbing the first reply. For LSTM models, the literature says the most important weights are those of the forget gate. Taking this example, how can we get the forget-gate weights of the LSTM layer?
A1: The Keras LSTM source code defines them like this:
self.kernel_i = self.kernel[:, :self.units]
self.kernel_f = self.kernel[:, self.units: self.units * 2]
self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]
self.kernel_o = self.kernel[:, self.units * 3:]
self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]
if self.use_bias:
    self.bias_i = self.bias[:self.units]
    self.bias_f = self.bias[self.units: self.units * 2]
    self.bias_c = self.bias[self.units * 2: self.units * 3]
    self.bias_o = self.bias[self.units * 3:]
Here i is the input gate, f the forget gate, o the output gate, and c the candidate cell state.
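The same slicing can be reproduced with NumPy on the arrays returned by `model.get_weights()`. A minimal sketch on a random stand-in for the (6, 512) kernel (with a real model, replace the random array with `model.get_weights()[0]`):

```python
import numpy as np

units = 128
kernel = np.random.randn(6, 4 * units)  # stand-in for model.get_weights()[0]

# Split into the four gates, mirroring the Keras source above.
kernel_i = kernel[:, :units]
kernel_f = kernel[:, units: units * 2]
kernel_c = kernel[:, units * 2: units * 3]
kernel_o = kernel[:, units * 3:]

print(kernel_f.shape)  # (6, 128)
```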
For the model in this example:
model.trainable_weights
# Return value:
[<tf.Variable 'lstm_1/kernel:0' shape=(6, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/recurrent_kernel:0' shape=(128, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/bias:0' shape=(512,) dtype=float32_ref>,
...
]
The forget gate has two kinds of weights. First, the weights applied to the input data X at each timestep (X has shape (1, 6)):
F1 = model.get_weights()[0][0:6, 128:2*128]
# shape is (6, 128)
Second, the weights applied to the previous timestep's hidden output H (shape (1, 128)):
F2 = model.get_weights()[1][0:128, 128:2*128]
# shape is (128, 128)
And the bias:
B = model.get_weights()[2][128:2*128]
# shape is (128,)
The forget gate then computes
results = X·F1 + H·F2 + B
results = activation(results)
where · denotes matrix multiplication and activation is the gate's activation function (in Keras this is the recurrent activation, a sigmoid by default). So results has shape (1, 128), one entry per LSTM cell; the i-th cell decides how strongly to forget based on results[i].
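The computation above can be sketched in NumPy. F1, F2, and B below are random stand-ins for the slices extracted from `model.get_weights()`, and a sigmoid plays the role of the gate's activation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

units, input_dim = 128, 6
rng = np.random.default_rng(0)

X = rng.standard_normal((1, input_dim))       # input at one timestep
H = rng.standard_normal((1, units))           # previous hidden state
F1 = rng.standard_normal((input_dim, units))  # stand-in for forget-gate input weights
F2 = rng.standard_normal((units, units))      # stand-in for forget-gate recurrent weights
B = rng.standard_normal((units,))             # stand-in for forget-gate bias

# One forget factor in (0, 1) per LSTM cell.
results = sigmoid(X @ F1 + H @ F2 + B)
print(results.shape)  # (1, 128)
```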
Q2: In this example, how can I look at the weights for aux? Thanks!
A2:
lstm_output = LSTM(128, activation=activation, dropout_W=0.2, dropout_U=0.1)(lstm_input)
aux_input = Input(shape=(1,), name='aux_input')
merged_data = merge([lstm_output, aux_input], mode='concat', concat_axis=-1)
<tf.Variable 'dense_1/kernel:0' shape=(129, 64) dtype=float32_ref>
The LSTM layer has 128 outputs, so the first Dense layer has 129 inputs, the last of which is aux. The weights from the aux input to each unit of the first Dense layer are:
model.get_weights()[3][128,:]
Because aux's influence has already spread across all of dense_1's units, every weight after dense_1 is related to aux.
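This row indexing can be checked with NumPy on a stand-in for the (129, 64) dense_1 kernel (with a real model, use `model.get_weights()[3]`):

```python
import numpy as np

dense1_kernel = np.random.randn(129, 64)  # stand-in for model.get_weights()[3]

# Rows 0..127 multiply the LSTM output; the last row, 128, multiplies aux_input.
lstm_rows = dense1_kernel[:128, :]
aux_weights = dense1_kernel[128, :]

print(aux_weights.shape)  # (64,) -- one weight per dense_1 unit
```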