Deep Learning Fundamentals, Part 6: LSTM (with Code Analysis)

Author: 笔触狂放9 | 2024-03-29 09:34:07


1. Model diagram

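The LSTM model is shown in Figure 1. The line running horizontally across the top of the cell is called the $\mathbf{c}$ bus, and the line across the bottom is called the $\mathbf{h}$ bus. This means that both $\mathbf{c}_{t - 1}$ and $\mathbf{h}_{t - 1}$ affect the computation at time $t$. In particular: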

From $x_t$ and the lower

Figure 1. LSTM model

2. Related techniques

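As the name (long short-term memory) suggests, LSTM is designed for sequences that contain both long-range and short-range dependencies.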

3. Code analysis

The code is available at: https://github.com/garstka/char-rnn-java
To learn it, I go through it method by method.

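// Core forward-propagation code
// acts stores and retrieves real-valued 2-D matrices by string key
public void active(int t, Map<String, DoubleMatrix> acts) {
    // Get the input at time t
    DoubleMatrix x = acts.get("x" + t);
    // h and c from the previous time step (zeros at t == 0)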
    DoubleMatrix preH = null, preC = null;
    if (t == 0) {
        preH = new DoubleMatrix(1, getOutSize());
        preC = preH.dup();
    } else {
        preH = acts.get("h" + (t - 1));
        preC = acts.get("c" + (t - 1));
    }
    
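    // Input gate, with a peephole connection from the previous cell state
    DoubleMatrix i = Activer.logistic(x.mmul(Wxi).add(preH.mmul(Whi)).add(preC.mmul(Wci)).add(bi));
    // Forget gate, also with a peephole connection from the previous cell state
    DoubleMatrix f = Activer.logistic(x.mmul(Wxf).add(preH.mmul(Whf)).add(preC.mmul(Wcf)).add(bf));
    // Candidate cell state
    DoubleMatrix gc = Activer.tanh(x.mmul(Wxc).add(preH.mmul(Whc)).add(bc));
    // New cell state: keep part of the old state, add part of the candidate (element-wise)
    DoubleMatrix c = f.mul(preC).add(i.mul(gc));
    // Output gate, with a peephole connection from the new cell state
    DoubleMatrix o = Activer.logistic(x.mmul(Wxo).add(preH.mmul(Who)).add(c.mmul(Wco)).add(bo));
    // Squashed cell state
    DoubleMatrix gh = Activer.tanh(c);
    // Hidden state, which is the output of this time step
    DoubleMatrix h = o.mul(gh);

    // Store each matrix under its name plus the time index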
    acts.put("i" + t, i);
    acts.put("f" + t, f);
    acts.put("gc" + t, gc);
    acts.put("c" + t, c);
    acts.put("o" + t, o);
    acts.put("gh" + t, gh);
    acts.put("h" + t, h);
}
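
The `acts` map above is keyed by a name plus the time index ("h0", "h1", ...), and the state for step t is looked up under t - 1, with zeros at t = 0. The following is a minimal sketch of just that storage scheme, using plain `double[]` instead of `DoubleMatrix`. The class `ActsKeySketch`, its methods, and the "+1" update are hypothetical placeholders for illustration, not part of the original project.

```java
import java.util.HashMap;
import java.util.Map;

final class ActsKeySketch {
    /** Returns the vector stored for step t - 1, or a zero vector when t == 0. */
    static double[] previous(Map<String, double[]> acts, String name, int t, int size) {
        if (t == 0) {
            return new double[size];          // no earlier step: start from zeros
        }
        return acts.get(name + (t - 1));      // e.g. "h" + 2 -> "h2"
    }

    /** Runs a placeholder recurrence and stores every step under "h" + t. */
    static Map<String, double[]> run(int steps, int size) {
        Map<String, double[]> acts = new HashMap<>();
        for (int t = 0; t < steps; t++) {
            double[] preH = previous(acts, "h", t, size);
            double[] h = new double[size];
            for (int j = 0; j < size; j++) {
                h[j] = preH[j] + 1.0;         // placeholder update, NOT an LSTM step
            }
            acts.put("h" + t, h);
        }
        return acts;
    }

    public static void main(String[] args) {
        Map<String, double[]> acts = run(4, 3);
        System.out.println(acts.get("h3")[0]);
    }
}
```

Keeping every step in the map, rather than only the latest state, is presumably what makes the stored values available again later, for example during backpropagation through time.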

In the program I ran, $x_t$ is a one-hot encoded $1 \times 62$ vector, and $i_t$ through $h_t$ are all $1 \times 100$ vectors.
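
To make the one-hot input concrete, here is a small sketch that encodes a single character as a length-62 vector. The post does not say which 62 characters are used, so the alphabet below (digits plus lowercase and uppercase letters) is an assumption chosen only because it has 62 symbols. The class name `OneHotSketch` is likewise hypothetical.

```java
final class OneHotSketch {
    // Assumed alphabet of exactly 62 symbols; the original program may use a different set.
    static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Returns a vector that is 1.0 at the character's index and 0.0 elsewhere. */
    static double[] oneHot(char ch) {
        int index = ALPHABET.indexOf(ch);
        if (index < 0) {
            throw new IllegalArgumentException("Character not in alphabet: " + ch);
        }
        double[] x = new double[ALPHABET.length()];
        x[index] = 1.0;
        return x;
    }

    public static void main(String[] args) {
        double[] x = oneHot('a');
        System.out.println(x.length);   // prints 62
    }
}
```

Because only one entry of $x_t$ is nonzero, multiplying it by a $62 \times 100$ weight matrix simply selects one row of that matrix.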

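The code carries more information than Figure 1. Operations between the matrix variables usually involve multiplying by a weight matrix, and Figure 1 sacrifices that accuracy to keep the structure clear. Below, the vector computations are translated into mathematical expressions; all of these vectors are stored in the model.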

The vector $\mathbf{i}_t$ is the input gate at time $t$:
$$\mathbf{i}_t = \sigma(\mathbf{W}^{xi} \cdot \mathbf{x}_t + \mathbf{W}^{hi} \cdot \mathbf{h}_{t - 1} + \mathbf{W}^{ci} \cdot \mathbf{c}_{t - 1} + \mathbf{b}^{i}) \tag{1}$$
The vector $\mathbf{f}_t$ is the forget gate:
$$\mathbf{f}_t = \sigma(\mathbf{W}^{xf} \cdot \mathbf{x}_t + \mathbf{W}^{hf} \cdot \mathbf{h}_{t - 1} + \mathbf{W}^{cf} \cdot \mathbf{c}_{t - 1} + \mathbf{b}^{f}) \tag{2}$$
The vector $\mathbf{gc}_t$ is the candidate cell state:
$$\mathbf{gc}_t = \tanh(\mathbf{W}^{xc} \cdot \mathbf{x}_t + \mathbf{W}^{hc} \cdot \mathbf{h}_{t - 1} + \mathbf{b}^{c}) \tag{3}$$
The vector $\mathbf{c}_t$ is the new cell state:
$$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t - 1} + \mathbf{i}_t \odot \mathbf{gc}_t \tag{4}$$
The vector $\mathbf{o}_t$ is the output gate:
$$\mathbf{o}_t = \sigma(\mathbf{W}^{xo} \cdot \mathbf{x}_t + \mathbf{W}^{ho} \cdot \mathbf{h}_{t - 1} + \mathbf{W}^{co} \cdot \mathbf{c}_t + \mathbf{b}^{o}) \tag{5}$$
The vector $\mathbf{gh}_t$ is the squashed cell state:
$$\mathbf{gh}_t = \tanh(\mathbf{c}_t) \tag{6}$$
The vector $\mathbf{h}_t$ is the output of the current time step:
$$\mathbf{h}_t = \mathbf{o}_t \odot \mathbf{gh}_t \tag{7}$$
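
As a cross-check, the same step can be sketched with plain arrays, following the computation order of the `active` method: the cell state is the gated sum itself, and tanh is applied only when forming $\mathbf{h}_t$. This is a sketch under stated assumptions, not the project's implementation. The class `LstmStepSketch` and its helpers are hypothetical, all weights are filled with one constant purely for illustration, and vectors are treated as row vectors multiplied on the left, as in the Java code.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

final class LstmStepSketch {
    // Names mirror the post's code; the values are illustrative, not trained weights.
    final double[][] Wxi, Whi, Wci, Wxf, Whf, Wcf, Wxc, Whc, Wxo, Who, Wco;
    final double[] bi, bf, bc, bo;

    /** Fills every weight with the constant w and every bias with 0. */
    LstmStepSketch(int inSize, int outSize, double w) {
        Wxi = filled(inSize, outSize, w);  Whi = filled(outSize, outSize, w);
        Wci = filled(outSize, outSize, w); Wxf = filled(inSize, outSize, w);
        Whf = filled(outSize, outSize, w); Wcf = filled(outSize, outSize, w);
        Wxc = filled(inSize, outSize, w);  Whc = filled(outSize, outSize, w);
        Wxo = filled(inSize, outSize, w);  Who = filled(outSize, outSize, w);
        Wco = filled(outSize, outSize, w);
        bi = new double[outSize]; bf = new double[outSize];
        bc = new double[outSize]; bo = new double[outSize];
    }

    static double[][] filled(int rows, int cols, double w) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) Arrays.fill(row, w);
        return m;
    }

    /** Row vector times matrix: result[j] = sum over k of v[k] * W[k][j]. */
    static double[] vecMat(double[] v, double[][] W) {
        double[] r = new double[W[0].length];
        for (int k = 0; k < v.length; k++)
            for (int j = 0; j < r.length; j++)
                r[j] += v[k] * W[k][j];
        return r;
    }

    /** Applies f to the element-wise sum of the given vectors. */
    static double[] activate(DoubleUnaryOperator f, double[]... terms) {
        double[] r = new double[terms[0].length];
        for (int j = 0; j < r.length; j++) {
            double s = 0.0;
            for (double[] t : terms) s += t[j];
            r[j] = f.applyAsDouble(s);
        }
        return r;
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    /** One time step; returns {h_t, c_t}. */
    double[][] step(double[] x, double[] preH, double[] preC) {
        DoubleUnaryOperator sig = LstmStepSketch::sigmoid;
        double[] i = activate(sig, vecMat(x, Wxi), vecMat(preH, Whi), vecMat(preC, Wci), bi); // input gate
        double[] f = activate(sig, vecMat(x, Wxf), vecMat(preH, Whf), vecMat(preC, Wcf), bf); // forget gate
        double[] gc = activate(Math::tanh, vecMat(x, Wxc), vecMat(preH, Whc), bc);            // candidate
        double[] c = new double[preC.length];
        for (int j = 0; j < c.length; j++) c[j] = f[j] * preC[j] + i[j] * gc[j];              // no tanh here
        double[] o = activate(sig, vecMat(x, Wxo), vecMat(preH, Who), vecMat(c, Wco), bo);    // output gate
        double[] h = new double[c.length];
        for (int j = 0; j < h.length; j++) h[j] = o[j] * Math.tanh(c[j]);                     // gh, then h
        return new double[][] { h, c };
    }
}
```

With all weights set to zero, every gate evaluates to $\sigma(0) = 0.5$ and the candidate to $\tanh(0) = 0$, so the step reduces to $\mathbf{c}_t = 0.5\,\mathbf{c}_{t-1}$ and $\mathbf{h}_t = 0.5 \tanh(\mathbf{c}_t)$, which is an easy case to verify by hand.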
