Matrix-by-Matrix Derivatives


1 The Differential Method for Matrix-by-Matrix Derivatives

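Suppose we want to differentiate a matrix function $\bm{F} \in \mathbb{R}^{p \times q}$ with respect to a matrix $\bm{X} \in \mathbb{R}^{m \times n}$. The differential method works in three steps: first take the differential of the matrix function, then vectorize both sides, and finally read off a vector-by-vector derivative:

$$\mathrm{vec}(d\bm{F})=\frac{\partial \mathrm{vec}(\bm{F})}{\partial \mathrm{vec}(\bm{X})^{\top}}\mathrm{vec}(d\bm{X})$$

Here $\mathrm{vec}(\cdot)$ stacks the columns of a matrix into a single vector, and $\frac{\partial \mathrm{vec}(\bm{F})}{\partial \mathrm{vec}(\bm{X})^{\top}}$ is the $pq \times mn$ Jacobian. In the examples below, the matrix derivative is taken to be its transpose, $\frac{\partial \bm{F}}{\partial \bm{X}} = \left(\frac{\partial \mathrm{vec}(\bm{F})}{\partial \mathrm{vec}(\bm{X})^{\top}}\right)^{\top} \in \mathbb{R}^{mn \times pq}$.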
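
The recipe can be checked numerically. The following is a minimal sketch, assuming NumPy, a column-major $\mathrm{vec}$ and central finite differences; the helper names `vec`, `numerical_jacobian` and `matrix_derivative` are hypothetical. It builds $\frac{\partial \mathrm{vec}(\bm{F})}{\partial \mathrm{vec}(\bm{X})^{\top}}$ one column per entry of $\mathrm{vec}(\bm{X})$ and then transposes it, matching the convention used by the worked examples in this article.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into one vector (column-major vec)."""
    return M.reshape(-1, order="F")


def numerical_jacobian(f, X, eps=1e-6):
    """Approximate d vec(f(X)) / d vec(X)^T with central differences.

    Column k of the result is the response of vec(F) to a small
    perturbation of the k-th entry of vec(X).
    """
    F0 = f(X)
    J = np.zeros((F0.size, X.size))
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = eps
        dX = E.reshape(X.shape, order="F")
        J[:, k] = (vec(f(X + dX)) - vec(f(X - dX))) / (2 * eps)
    return J


def matrix_derivative(f, X):
    """dF/dX under the transpose convention: an (mn) x (pq) matrix."""
    return numerical_jacobian(f, X).T


# Sanity check: for F(X) = X the derivative is the mn x mn identity.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3))
D = matrix_derivative(lambda Z: Z, X)
print(D.shape)                      # (6, 6)
print(np.allclose(D, np.eye(6)))    # True
```

With $\bm{F}(\bm{X}) = \bm{X}$ the result is the $mn \times mn$ identity, a quick check that the column ordering of $\mathrm{vec}$ is consistent.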

2 Computing $\frac{\partial \bm{AXB}}{\partial \bm{X}}$

Let $\bm{A} \in \mathbb{R}^{l \times m}$, $\bm{X} \in \mathbb{R}^{m \times n}$ and $\bm{B} \in \mathbb{R}^{n \times q}$. Taking the differential of $\bm{F} = \bm{AXB}$ gives

$$d\bm{F}=\bm{A}\,d\bm{X}\,\bm{B}$$

Vectorizing both sides with the identity $\mathrm{vec}(\bm{UVW}) = (\bm{W}^{\top}\otimes\bm{U})\,\mathrm{vec}(\bm{V})$ yields

$$\mathrm{vec}(d\bm{F})=\mathrm{vec}(\bm{A}\,d\bm{X}\,\bm{B})=(\bm{B}^{\top}\otimes \bm{A})\,\mathrm{vec}(d\bm{X})$$

so the derivative is

$$\frac{\partial \bm{AXB}}{\partial \bm{X}}=(\bm{B}^{\top}\otimes \bm{A})^{\top}=\bm{B}\otimes\bm{A}^{\top}$$

Setting $\bm{B} = \bm{I}_n$ or $\bm{A} = \bm{I}_m$ in this result gives

$$\frac{\partial \bm{AX}}{\partial \bm{X}}=\bm{I}_n\otimes\bm{A}^{\top}, \qquad \frac{\partial \bm{XB}}{\partial \bm{X}}=\bm{B}\otimes\bm{I}_m$$
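
These results can be verified with a short NumPy sketch, assuming a column-major $\mathrm{vec}$ and arbitrary sizes $l, m, n, q = 2, 3, 4, 5$. Because $\bm{AXB}$ is linear in $\bm{X}$, the Jacobian can be built exactly, without finite differences: its $k$-th column is $\mathrm{vec}(\bm{A}\bm{E}_k\bm{B})$, where $\bm{E}_k$ is the $k$-th basis matrix. The helper `jacobian_of_linear_map` is a hypothetical name used only for illustration.

```python
import numpy as np


def vec(M):
    """Column-major vec: stack the columns of M."""
    return M.reshape(-1, order="F")


def jacobian_of_linear_map(f, shape):
    """Exact Jacobian d vec(f(X)) / d vec(X)^T of a linear map f.

    Column k is vec(f(E_k)), where E_k is the k-th basis matrix
    in column-major order.
    """
    size = shape[0] * shape[1]
    cols = []
    for k in range(size):
        E = np.zeros(size)
        E[k] = 1.0
        cols.append(vec(f(E.reshape(shape, order="F"))))
    return np.column_stack(cols)


rng = np.random.default_rng(0)
l, m, n, q = 2, 3, 4, 5
A = rng.standard_normal((l, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, q))

# vec(AXB) = (B^T kron A) vec(X)
print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))  # True

# dAXB/dX = (B^T kron A)^T = B kron A^T, an (mn) x (lq) matrix
J = jacobian_of_linear_map(lambda Z: A @ Z @ B, (m, n))
dF_dX = J.T
print(np.allclose(dF_dX, np.kron(B, A.T)))  # True

# Special cases: dAX/dX = I_n kron A^T and dXB/dX = B kron I_m
print(np.allclose(jacobian_of_linear_map(lambda Z: A @ Z, (m, n)).T,
                  np.kron(np.eye(n), A.T)))  # True
print(np.allclose(jacobian_of_linear_map(lambda Z: Z @ B, (m, n)).T,
                  np.kron(B, np.eye(m))))  # True
```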

3 Computing $\frac{\partial \bm{A}\exp(\bm{BXC})\bm{D}}{\partial \bm{X}}$

Here $\exp(\cdot)$ is applied element-wise, and $\bm{A}, \bm{B}, \bm{C}, \bm{D}$ are constant matrices of compatible sizes. Taking the differential of $\bm{F} = \bm{A}\exp(\bm{BXC})\bm{D}$ gives

$$d\bm{F}=\bm{A}[d\exp(\bm{BXC})]\bm{D}=\bm{A}[\exp(\bm{BXC})\odot(\bm{B}\,d\bm{X}\,\bm{C})]\bm{D}$$

where $\odot$ denotes the element-wise (Hadamard) product. Vectorizing both sides, and using $\mathrm{vec}(\bm{U}\odot\bm{V}) = \mathrm{diag}(\mathrm{vec}(\bm{U}))\,\mathrm{vec}(\bm{V})$, gives

$$\begin{aligned}
\mathrm{vec}(d\bm{F})&=(\bm{D}^{\top}\otimes \bm{A})\,\mathrm{vec}[\exp(\bm{BXC})\odot(\bm{B}\,d\bm{X}\,\bm{C})]\\
&=(\bm{D}^{\top}\otimes \bm{A})\,\mathrm{diag}(\mathrm{vec}(\exp(\bm{BXC})))\,\mathrm{vec}(\bm{B}\,d\bm{X}\,\bm{C})\\
&=(\bm{D}^{\top}\otimes\bm{A})\,\mathrm{diag}(\mathrm{vec}(\exp(\bm{BXC})))\,(\bm{C}^{\top}\otimes \bm{B})\,\mathrm{vec}(d\bm{X})
\end{aligned}$$

Transposing the Jacobian (the diagonal matrix is unchanged by transposition) finally gives

$$\begin{aligned}
\frac{\partial \bm{A}\exp(\bm{BXC})\bm{D}}{\partial \bm{X}}&=[(\bm{D}^{\top}\otimes\bm{A})\,\mathrm{diag}(\mathrm{vec}(\exp(\bm{BXC})))\,(\bm{C}^{\top}\otimes \bm{B})]^{\top}\\
&=(\bm{C}\otimes \bm{B}^{\top})\,\mathrm{diag}(\mathrm{vec}(\exp(\bm{BXC})))\,(\bm{D}\otimes \bm{A}^{\top})
\end{aligned}$$
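
A minimal NumPy sketch that compares this closed form against a central-difference Jacobian. It assumes `np.exp` as the element-wise exponential, a column-major $\mathrm{vec}$, and arbitrary sizes chosen only for illustration ($\bm{A}$: 2×3, $\bm{B}$: 3×2, $\bm{X}$: 2×3, $\bm{C}$: 3×2, $\bm{D}$: 2×4); the helper names are hypothetical.

```python
import numpy as np


def vec(M):
    """Column-major vec: stack the columns of M."""
    return M.reshape(-1, order="F")


def numerical_jacobian(f, X, eps=1e-5):
    """Central-difference approximation of d vec(f(X)) / d vec(X)^T."""
    J = np.zeros((f(X).size, X.size))
    for idx in range(X.size):
        E = np.zeros(X.size)
        E[idx] = eps
        dX = E.reshape(X.shape, order="F")
        J[:, idx] = (vec(f(X + dX)) - vec(f(X - dX))) / (2 * eps)
    return J


rng = np.random.default_rng(1)
# Arbitrary compatible sizes: A (l x k), B (k x m), X (m x n), C (n x r), D (r x s)
l, k, m, n, r, s = 2, 3, 2, 3, 2, 4
A = rng.standard_normal((l, k))
B = rng.standard_normal((k, m))
X = 0.3 * rng.standard_normal((m, n))   # small X keeps exp() well scaled
C = rng.standard_normal((n, r))
D = rng.standard_normal((r, s))


def F(Z):
    # np.exp acts element-wise, matching the Hadamard product in the derivation
    return A @ np.exp(B @ Z @ C) @ D


# Closed form: (C kron B^T) diag(vec(exp(BXC))) (D kron A^T), shape (mn) x (ls)
closed_form = (np.kron(C, B.T)
               @ np.diag(vec(np.exp(B @ X @ C)))
               @ np.kron(D, A.T))

numeric = numerical_jacobian(F, X).T   # dF/dX is the transposed Jacobian
print(closed_form.shape)                              # (6, 8)
print(np.allclose(closed_form, numeric, atol=1e-6))   # True
```

The tolerance is loosened slightly (`atol=1e-6`) because central differences carry a small truncation and rounding error.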

Source: https://yundeesoft.com/15984.html
