
Sparsity Tricks

Sparse Linear Weights

Linear Layer

Instead of this

$Y = W X + B$

Just use a low-rank $W$, factored as $W = U V$:

$Y = (U V) X + B$

Where $W$ has shape $(m, n)$, $U$ has shape $(m, t)$, $V$ has shape $(t, n)$, and $t < m, n$.
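
As a rough illustration of the factorization above, here is a minimal NumPy sketch. The sizes `m`, `n`, `t`, the batch size, and the random initialization are assumptions chosen only for the example, and inputs are stored as columns of `X`. It checks that applying `V` and then `U` gives the same output as the dense product, and compares parameter counts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: output dim m, input dim n, rank t, with t < m, n.
m, n, t = 64, 128, 8
batch = 4

U = rng.standard_normal((m, t))      # shape (m, t)
V = rng.standard_normal((t, n))      # shape (t, n)
B = rng.standard_normal((m, 1))      # bias, broadcast over the batch columns
X = rng.standard_normal((n, batch))  # each column is one input vector

# Dense form: build W = U V explicitly, then Y = W X + B.
W = U @ V
Y_dense = W @ X + B

# Factored form: apply V first, then U, without ever forming W.
Y_low_rank = U @ (V @ X) + B

# Weights stored by each form (bias excluded).
dense_params = m * n            # 64 * 128
low_rank_params = t * (m + n)   # 8 * (64 + 128)

print(np.allclose(Y_dense, Y_low_rank))
print(dense_params, low_rank_params)
```

Note that the factored form only stores fewer weights when $t(m + n) < mn$, which is a stricter condition than $t < m, n$ when $t$ is close to $m$ or $n$.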

Shared input multiple projections