Published on June 11, 2026

Sparsity Tricks

Sparse Linear Weights

Linear Layer

Instead of this

$$Y = W \cdot X + B$$

just use a low-rank $W$:

$$Y = (U \cdot V) \cdot X + B$$

where $W$ has shape $(m, n)$, $U$ has shape $(m, t)$, $V$ has shape $(t, n)$, and $t < m, n$.

Shared input multiple projections
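
Returning briefly to the low-rank linear layer from the Linear Layer section: the snippet below is a minimal sketch of that factorization, not a reference implementation. It uses plain Python lists, and the sizes `m`, `n`, `t`, `batch` as well as the helpers `matmul`, `add_bias`, and `random_matrix` are hypothetical names chosen for illustration. It checks that applying `V` and then `U` gives the same output as building `W = U.V` first, and compares the number of stored weights.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(y, bias):
    """Add one bias value per output row to every column of y."""
    return [[v + b for v in row] for row, b in zip(y, bias)]

def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform random values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes for illustration, chosen so that t < m and t < n.
m, n, t, batch = 6, 8, 2, 3
rng = random.Random(0)

U = random_matrix(m, t, rng)                     # shape (m, t)
V = random_matrix(t, n, rng)                     # shape (t, n)
B = [rng.uniform(-1.0, 1.0) for _ in range(m)]   # one bias per output row
X = random_matrix(n, batch, rng)                 # inputs as columns, shape (n, batch)

# Build W = U.V explicitly, then apply it: Y = (U.V).X + B
y_dense = add_bias(matmul(matmul(U, V), X), B)

# Same output without ever materializing W: Y = U.(V.X) + B
y_factored = add_bias(matmul(U, matmul(V, X)), B)

dense_params = m * n             # entries in a full W
low_rank_params = t * (m + n)    # entries in U plus entries in V

# Largest elementwise gap between the two ways of computing Y.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(y_dense, y_factored)
    for a, b in zip(row_a, row_b)
)

print(dense_params, low_rank_params)  # prints "48 28"
```

With these sizes the factored form stores 28 weights instead of 48. The saving grows as `t` shrinks relative to `m` and `n`, since the factored count is `t * (m + n)` against `m * n` for the full matrix.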