Getting the weights and biases of a Keras model
Created by small_q; last reply by small_q; viewed by 4 users
Question
Getting the weights and biases of a Keras model
Answer
Take the post "LSTM Networks applied to the stock market with the Functional Model" as an example. Its model is:
# Keras 1.x-style functional API (merge, dropout_W/dropout_U, input=/output=)
from keras.layers import Input, LSTM, Dense, merge
from keras.models import Model

lstm_input = Input(shape=(30, 6), name='lstm_input')
lstm_output = LSTM(128, activation=activation, dropout_W=0.2, dropout_U=0.1)(lstm_input)
aux_input = Input(shape=(1,), name='aux_input')
merged_data = merge([lstm_output, aux_input], mode='concat', concat_axis=-1)
dense_output_1 = Dense(64, activation='linear')(merged_data)
dense_output_2 = Dense(16, activation='linear')(dense_output_1)
predictions = Dense(1, activation=activation)(dense_output_2)
model = Model(input=[lstm_input, aux_input], output=predictions)
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
Training:
model.fit([train_x, train_aux], train_y, batch_size=conf.batch, nb_epoch=10, verbose=2)
Once this has run, we can inspect the model's weights. Keras provides several functions for getting at them.
# List the trainable weights and biases of each layer (kernel means weights)
model.trainable_weights
# Return value:
[<tf.Variable 'lstm_1/kernel:0' shape=(6, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/recurrent_kernel:0' shape=(128, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/bias:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'dense_1/kernel:0' shape=(129, 64) dtype=float32_ref>,
<tf.Variable 'dense_1/bias:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'dense_2/kernel:0' shape=(64, 16) dtype=float32_ref>,
<tf.Variable 'dense_2/bias:0' shape=(16,) dtype=float32_ref>,
<tf.Variable 'dense_3/kernel:0' shape=(16, 1) dtype=float32_ref>,
<tf.Variable 'dense_3/bias:0' shape=(1,) dtype=float32_ref>]
As you can see, for this model the trainable parameters (weights and biases) live in 9 places.
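These shapes follow directly from the layer sizes. As a minimal sketch (pure arithmetic, no Keras needed), the LSTM packs its four gates side by side, so its kernel has 4 × units columns:

```python
# Derive the nine parameter shapes from the layer sizes alone.
units, input_dim, aux_dim = 128, 6, 1

# The LSTM stores all four gates (i, f, c, o) concatenated along the last axis.
lstm_kernel = (input_dim, 4 * units)    # (6, 512)
lstm_recurrent = (units, 4 * units)     # (128, 512)
lstm_bias = (4 * units,)                # (512,)

# dense_1 sees the LSTM output concatenated with aux_input: 128 + 1 = 129.
dense_1_kernel = (units + aux_dim, 64)  # (129, 64)

shapes = [
    lstm_kernel, lstm_recurrent, lstm_bias,
    dense_1_kernel, (64,),              # dense_1
    (64, 16), (16,),                    # dense_2
    (16, 1), (1,),                      # dense_3
]
print(len(shapes))  # 9
```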
# Fetch all the weights and biases at once
model.get_weights()
The return value is a list of nine ndarrays, matching the 9 entries of model.trainable_weights in order. So to look at one particular set of weights/biases, e.g. the bias of dense_1, you can do:
model.get_weights()[4]
# Return value:
array([-0.00225463, 0.01296113, 0.00273713, -0.0065364 , 0.01660943,
0.00623776, -0.00092952, -0.00890288, 0.00431062, 0.0162892 ,
-0.00205688, -0.00469067, 0.00429582, -0.00396401, 0.00565233,
-0.00254946, 0.02485307, 0.00086826, 0.0006156 , 0.00458527,
0.00521648, 0.00385924, 0.00105498, -0.00517886, -0.01677693,
-0.00254344, -0.04660135, -0.0042565 , -0.01070292, -0.00978546,
0.00395998, -0.00091199, 0.00476804, -0.00296541, 0.0037867 ,
-0.00378863, 0.00216215, -0.00275317, 0.0001033 , -0.0028793 ,
-0.00472449, -0.02478764, -0.00794014, 0.00807714, 0.00265896,
0.00280038, -0.00391497, -0.00142031, 0.00072159, 0.00286194,
-0.00627549, -0.00609946, -0.00522796, 0.00402372, -0.00050308,
0.00776461, 0.00257295, -0.00229076, 0.00437025, -0.02685707,
-0.00500122, -0.00216331, -0.00430452, -0.00292455], dtype=float32)
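Hard-coding the index 4 is fragile; you can locate an entry by its variable name instead. A minimal sketch using the names listed above (with a live model you would read them from `model.trainable_weights`, e.g. via each variable's `.name`):

```python
# Names as printed by model.trainable_weights above.
names = [
    'lstm_1/kernel:0', 'lstm_1/recurrent_kernel:0', 'lstm_1/bias:0',
    'dense_1/kernel:0', 'dense_1/bias:0',
    'dense_2/kernel:0', 'dense_2/bias:0',
    'dense_3/kernel:0', 'dense_3/bias:0',
]

# model.get_weights() returns arrays in the same order, so the position
# of a name is also the index into the weights list.
idx = names.index('dense_1/bias:0')
print(idx)  # 4 -> model.get_weights()[idx] is dense_1's bias
```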
If you want to understand how a trained model works, knowing each layer's trainable parameters alone is not enough; you also need a reasonably clear picture of each layer's internal structure. A neural network is like a black box: if you can grasp the model's internal structure and obtain all of its trainable parameters, the box's opacity drops from 100% to 99%.
Replies
Q1: Thanks a lot, OP; grabbing the first reply. For LSTM models, the literature says the most important weights are those of the forget gate. Taking this example, how can we get the forget-gate weights of the LSTM layer?
A1: The Keras LSTM source code defines them like this:
self.kernel_i = self.kernel[:, :self.units]
self.kernel_f = self.kernel[:, self.units: self.units * 2]
self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]
self.kernel_o = self.kernel[:, self.units * 3:]
self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]
if self.use_bias:
    self.bias_i = self.bias[:self.units]
    self.bias_f = self.bias[self.units: self.units * 2]
    self.bias_c = self.bias[self.units * 2: self.units * 3]
    self.bias_o = self.bias[self.units * 3:]
Here i is the input gate, f the forget gate, o the output gate, and c the candidate cell state.
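The same slicing can be reproduced with NumPy on the arrays returned by `model.get_weights()`. A minimal sketch on a random stand-in for the (6, 512) kernel (with a real model, replace the random array with `model.get_weights()[0]`):

```python
import numpy as np

units = 128
kernel = np.random.randn(6, 4 * units)  # stand-in for model.get_weights()[0]

# Split into the four gates, mirroring the Keras source above.
kernel_i = kernel[:, :units]
kernel_f = kernel[:, units: units * 2]
kernel_c = kernel[:, units * 2: units * 3]
kernel_o = kernel[:, units * 3:]

print(kernel_f.shape)  # (6, 128)
```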
For the model in this example:
model.trainable_weights
# Return value:
[<tf.Variable 'lstm_1/kernel:0' shape=(6, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/recurrent_kernel:0' shape=(128, 512) dtype=float32_ref>,
<tf.Variable 'lstm_1/bias:0' shape=(512,) dtype=float32_ref>,
...
]
The forget gate has two kinds of weights. First, the weights applied to the input data X at each timestep (X has shape (1, 6)):
F1 = model.get_weights()[0][0:6, 128:2*128]
# shape is (6, 128)
Second, the weights applied to the previous timestep's hidden output H (shape (1, 128)):
F2 = model.get_weights()[1][0:128, 128:2*128]
# shape is (128, 128)
And the bias:
B = model.get_weights()[2][128:2*128]
# shape is (128,)
The forget gate then computes
results = X·F1 + H·F2 + B
results = activation(results)
where · denotes matrix multiplication and activation is the gate's activation function (in Keras this is the recurrent activation, a sigmoid by default). So results has shape (1, 128), one entry per LSTM cell; the i-th cell decides how strongly to forget based on results[i].
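The computation above can be sketched in NumPy. F1, F2, and B below are random stand-ins for the slices extracted from `model.get_weights()`, and a sigmoid plays the role of the gate's activation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

units, input_dim = 128, 6
rng = np.random.default_rng(0)

X = rng.standard_normal((1, input_dim))       # input at one timestep
H = rng.standard_normal((1, units))           # previous hidden state
F1 = rng.standard_normal((input_dim, units))  # stand-in for forget-gate input weights
F2 = rng.standard_normal((units, units))      # stand-in for forget-gate recurrent weights
B = rng.standard_normal((units,))             # stand-in for forget-gate bias

# One forget factor in (0, 1) per LSTM cell.
results = sigmoid(X @ F1 + H @ F2 + B)
print(results.shape)  # (1, 128)
```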
Q2: In this example, how can I look at the weights for aux? Thanks!
A2:
lstm_output = LSTM(128, activation=activation, dropout_W=0.2, dropout_U=0.1)(lstm_input)
aux_input = Input(shape=(1,), name='aux_input')
merged_data = merge([lstm_output, aux_input], mode='concat', concat_axis=-1)
<tf.Variable 'dense_1/kernel:0' shape=(129, 64) dtype=float32_ref>
The LSTM layer has 128 outputs, so the first Dense layer has 129 inputs, the last of which is aux. The weights from the aux input to each unit of the first Dense layer are:
model.get_weights()[3][128,:]
Because aux's influence has already spread across all of dense_1's units, every weight after dense_1 is related to aux.
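This row indexing can be checked with NumPy on a stand-in for the (129, 64) dense_1 kernel (with a real model, use `model.get_weights()[3]`):

```python
import numpy as np

dense1_kernel = np.random.randn(129, 64)  # stand-in for model.get_weights()[3]

# Rows 0..127 multiply the LSTM output; the last row, 128, multiplies aux_input.
lstm_rows = dense1_kernel[:128, :]
aux_weights = dense1_kernel[128, :]

print(aux_weights.shape)  # (64,) -- one weight per dense_1 unit
```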