1 The differential method for matrix-by-matrix derivatives
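Suppose a matrix-valued function $\bm{F} \in \mathbb{R}^{p \times q}$ is to be differentiated with respect to a matrix $\bm{X} \in \mathbb{R}^{m \times n}$. The differential method has three steps: take the differential of the matrix function, vectorize both sides, and then read the result as a vector-by-vector derivative:

$$\mathrm{vec}(d\bm{F})=\frac{\partial\,\mathrm{vec}(\bm{F})}{\partial\,\mathrm{vec}(\bm{X})^{\top}}\,\mathrm{vec}(d\bm{X})$$

Here $\mathrm{vec}(\cdot)$ stacks the columns of a matrix into a single vector, so the coefficient matrix has size $pq \times mn$.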
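The relation above can be checked numerically. The following is a minimal NumPy sketch, not part of the original derivation: the helper names `vec` and `numerical_jacobian` are hypothetical, the test function $\bm{F}(\bm{X})=\bm{X}^{\top}$ is chosen only because it is linear, and the coefficient matrix is estimated by central differences rather than derived analytically.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

def numerical_jacobian(f, X, eps=1e-6):
    """Central-difference estimate of d vec(F) / d vec(X)^T, shape (p*q, m*n)."""
    m, n = X.shape
    cols = []
    for k in range(m * n):
        # Perturb only the k-th entry of vec(X), then fold back to m x n.
        E = np.zeros(m * n)
        E[k] = eps
        E = E.reshape(m, n, order="F")
        cols.append(vec(f(X + E) - f(X - E)) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))              # m x n = 3 x 2
J = numerical_jacobian(lambda M: M.T, X)     # F(X) = X^T, so p x q = 2 x 3

# For a linear F, vec(dF) = J vec(dX) holds up to rounding error.
dX = 1e-3 * rng.standard_normal(X.shape)
lhs = vec((X + dX).T - X.T)
rhs = J @ vec(dX)
print(J.shape, np.allclose(lhs, rhs))
```

For a nonlinear $\bm{F}$ the two sides would agree only to first order in $d\bm{X}$, which is why a linear test function is used here.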
2 Computing $\frac{\partial \bm{AXB}}{\partial \bm{X}}$
Let $\bm{A} \in \mathbb{R}^{l \times m}$, $\bm{X} \in \mathbb{R}^{m \times n}$, and $\bm{B}\in \mathbb{R}^{n \times q}$ be matrices, and let $\bm{F}=\bm{AXB}$. Taking the differential of the matrix function gives

$$d\bm{F}=\bm{A}\,d\bm{X}\,\bm{B}$$

Vectorizing both sides with the identity $\mathrm{vec}(\bm{AXB})=(\bm{B}^{\top}\otimes\bm{A})\,\mathrm{vec}(\bm{X})$ gives

$$\mathrm{vec}(d\bm{F})=\mathrm{vec}(\bm{A}\,d\bm{X}\,\bm{B})=(\bm{B}^{\top}\otimes \bm{A})\,\mathrm{vec}(d\bm{X})$$

The matrix gradient is the transpose of this coefficient matrix:

$$\frac{\partial \bm{AXB}}{\partial \bm{X}}=(\bm{B}^{\top}\otimes \bm{A})^{\top}=\bm{B}\otimes\bm{A}^{\top}$$

Setting $\bm{B}=\bm{I}_n$ or $\bm{A}=\bm{I}_m$ in this result also yields

$$\frac{\partial \bm{AX}}{\partial \bm{X}}=\bm{I}_n\otimes\bm{A}^{\top}$$

$$\frac{\partial \bm{XB}}{\partial \bm{X}}=\bm{B}\otimes\bm{I}_m$$
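As a sanity check on the Kronecker-product step, here is a small NumPy sketch. It assumes random test matrices with arbitrarily chosen sizes and a column-stacking `vec` helper (a hypothetical name, not from the original text); `np.kron` is NumPy's Kronecker product.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
l, m, n, q = 2, 3, 4, 5                  # illustrative sizes only
A = rng.standard_normal((l, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, q))

J = np.kron(B.T, A)       # coefficient of vec(dX), shape (l*q, m*n)
grad = np.kron(B, A.T)    # claimed gradient, shape (m*n, l*q)

# F = AXB is linear in X, so vec(AXB) = (B^T kron A) vec(X) holds exactly.
identity_ok = np.allclose(vec(A @ X @ B), J @ vec(X))
# The gradient is the transpose of the coefficient matrix.
transpose_ok = np.allclose(grad, J.T)
# Special cases: B = I_n for AX, and A = I_m for XB.
ax_ok = np.allclose(vec(A @ X), np.kron(np.eye(n), A.T).T @ vec(X))
xb_ok = np.allclose(vec(X @ B), np.kron(B, np.eye(m)).T @ vec(X))
print(identity_ok, transpose_ok, ax_ok, xb_ok)
```

The check relies on `order="F"` in `reshape`; with NumPy's default row-major flattening the identity would instead read $(\bm{A}\otimes\bm{B}^{\top})$.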
3 Computing $\frac{\partial \bm{A}\exp(\bm{BXC})\bm{D}}{\partial \bm{X}}$
Let $\bm{F}=\bm{A}\exp(\bm{BXC})\bm{D}$, where $\exp$ is applied elementwise. Taking the differential of the matrix function gives

$$d\bm{F}=\bm{A}\,[d\exp(\bm{BXC})]\,\bm{D}=\bm{A}\,[\exp(\bm{BXC})\odot(\bm{B}\,d\bm{X}\,\bm{C})]\,\bm{D}$$

where $\odot$ is the elementwise (Hadamard) product. Vectorizing both sides, and using $\mathrm{vec}(\bm{P}\odot\bm{Q})=\mathrm{diag}(\bm{P})\,\mathrm{vec}(\bm{Q})$, gives

$$\begin{aligned}\mathrm{vec}(d \bm{F})&=(\bm{D}^{\top}\otimes \bm{A})\,\mathrm{vec}[\exp(\bm{BXC})\odot(\bm{B}\,d\bm{X}\,\bm{C})]\\&=(\bm{D}^{\top}\otimes \bm{A})\,\mathrm{diag}(\exp(\bm{BXC}))\,\mathrm{vec}(\bm{B}\,d\bm{X}\,\bm{C}) \\&=(\bm{D}^{\top}\otimes\bm{A})\,\mathrm{diag}(\exp(\bm{BXC}))\,(\bm{C}^{\top}\otimes \bm{B})\,\mathrm{vec}(d\bm{X})\end{aligned}$$

Here $\mathrm{diag}(\exp(\bm{BXC}))$ denotes the diagonal matrix whose diagonal entries are $\mathrm{vec}(\exp(\bm{BXC}))$. Transposing the coefficient matrix finally gives

$$\begin{aligned}\frac{\partial \bm{A}\exp(\bm{BXC})\bm{D}}{\partial \bm{X}}&=[(\bm{D}^{\top}\otimes\bm{A})\,\mathrm{diag}(\exp(\bm{BXC}))\,(\bm{C}^{\top}\otimes \bm{B})]^{\top}\\&=(\bm{C}\otimes \bm{B}^{\top})\,\mathrm{diag}(\exp(\bm{BXC}))\,(\bm{D}\otimes \bm{A}^{\top})\end{aligned}$$
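The closed form can be compared against finite differences. The sketch below is only an illustration under stated assumptions: the text gives no matrix sizes for this section, so the sizes here are arbitrary compatible choices; $\bm{B}$ and $\bm{C}$ are scaled by 0.5 just to keep the exponentials moderate; and `vec` and `F` are hypothetical helper names.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)
p, r, m, n, s, q = 2, 3, 2, 3, 2, 4      # assumed sizes, not from the text
A = rng.standard_normal((p, r))
B = 0.5 * rng.standard_normal((r, m))
X = rng.standard_normal((m, n))
C = 0.5 * rng.standard_normal((n, s))
D = rng.standard_normal((s, q))

def F(X):
    """F(X) = A exp(BXC) D with an elementwise exponential."""
    return A @ np.exp(B @ X @ C) @ D

E = np.exp(B @ X @ C)                    # r x s
# Coefficient of vec(dX) from the derivation, shape (p*q, m*n).
J = np.kron(D.T, A) @ np.diag(vec(E)) @ np.kron(C.T, B)
# Claimed gradient, which should equal the transpose of J.
grad = np.kron(C, B.T) @ np.diag(vec(E)) @ np.kron(D, A.T)

# Central-difference estimate of the same coefficient matrix.
eps = 1e-6
J_fd = np.empty((p * q, m * n))
for k in range(m * n):
    dX = np.zeros(m * n)
    dX[k] = eps
    dX = dX.reshape(m, n, order="F")
    J_fd[:, k] = vec(F(X + dX) - F(X - dX)) / (2 * eps)

rel_err = np.linalg.norm(J - J_fd) / np.linalg.norm(J)
print(rel_err, np.allclose(grad, J.T))
```

A relative error norm is used instead of an entrywise comparison because finite-difference rounding error is absolute, which makes small entries of the coefficient matrix look worse than they are.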