Retrofitting Analysis
To understand how the retrofitting [1] update rule follows from its objective, we work through the derivation in both directions.
Forward Derivation
\[\psi(Q) = \sum_{i=1}^{n}\left[ \alpha_i||q_i-\hat{q_i}||^2 + \sum_{j}\beta_{ij}||q_i-q_j||^2 \right] \\
\frac{\partial \psi(Q)}{\partial q_i} = 2\alpha_i(q_i-\hat{q_i}) + 2\sum_{j}\beta_{ij}(q_i-q_j) = 0 \\
(\alpha_i+\sum_{j}\beta_{ij})q_i -\alpha_i\hat{q_i} -\sum_{j}\beta_{ij}q_j = 0 \\
q_i = \frac{\sum_{j}\beta_{ij}q_j+\alpha_i\hat{q_i}}{\sum_{j}\beta_{ij}+\alpha_i}
\]
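The closed-form update can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the paper's reference implementation: the function names `retrofit_step` and `retrofit`, the dictionary-based data layout, and the default of 10 sweeps are assumptions made for this example.

```python
def retrofit_step(i, q, q_hat, neighbors, alpha, beta):
    """Return the new q_i = (sum_j beta_ij q_j + alpha_i q_hat_i) / (sum_j beta_ij + alpha_i).

    q, q_hat  : dict mapping word -> list of floats (current and original vectors)
    neighbors : dict mapping word -> list of neighbouring words
    alpha     : dict mapping word -> float
    beta      : dict mapping (word, neighbour) -> float
    (This layout is a hypothetical choice for the example.)
    """
    dim = len(q_hat[i])
    numer = [alpha[i] * x for x in q_hat[i]]  # alpha_i * q_hat_i
    denom = alpha[i]
    for j in neighbors[i]:
        b = beta[(i, j)]
        denom += b                            # accumulate sum_j beta_ij
        for d in range(dim):
            numer[d] += b * q[j][d]           # accumulate sum_j beta_ij * q_j
    return [x / denom for x in numer]


def retrofit(q_hat, neighbors, alpha, beta, n_iters=10):
    """Apply the update to every word with neighbours, for n_iters sweeps."""
    q = {w: list(v) for w, v in q_hat.items()}  # start from the original vectors
    for _ in range(n_iters):
        for i in q:
            if neighbors.get(i):
                q[i] = retrofit_step(i, q, q_hat, neighbors, alpha, beta)
    return q


# Toy example: two words that are each other's only neighbour.
q_hat = {"a": [0.0, 0.0], "b": [2.0, 0.0]}
neighbors = {"a": ["b"], "b": ["a"]}
alpha = {"a": 1.0, "b": 1.0}
beta = {("a", "b"): 1.0, ("b", "a"): 1.0}
q = retrofit(q_hat, neighbors, alpha, beta, n_iters=50)
```

In this toy setting the two vectors are pulled toward each other while staying anchored to their original positions, so each ends up between its starting point and its neighbour.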
Backward Derivation
This is how I came to understand the update equation.
The paper [1] states: "We take the first derivative of \(\psi\) with respect to one \(q_i\) vector, and by equating it to zero", which gives the following condition:
\[\frac{\partial\psi(Q)}{\partial q_i} = 0
\]
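Starting from the update equation, multiplying through by the denominator and collecting terms gives:
\[q_i = \frac{\sum_{j}\beta_{ij}q_j+\alpha_i\hat{q_i}}{\sum_{j}\beta_{ij}+\alpha_i} \\
\alpha_iq_i - \alpha_i\hat{q_i} + \sum_{j}\beta_{ij}q_i - \sum_{j}\beta_{ij}q_j = 0 \\
\alpha_i(q_i-\hat{q_i})+ \sum_{j}\beta_{ij}(q_i-q_j) = 0
\]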
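Up to a constant factor of 2, the left-hand side is the derivative of the objective with respect to \(q_i\), so the update equation is equivalent to:
\[\frac{\partial\psi(Q)}{\partial q_i} = 2\left[\alpha_i(q_i-\hat{q_i})+ \sum_{j}\beta_{ij}(q_i-q_j)\right] = 0
\]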
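The equivalence can also be checked numerically: plugging the closed-form \(q_i\) into the gradient expression should give a zero vector. The sketch below assumes the neighbours \(q_j\) are held fixed and writes the gradient with its constant factor of 2, which does not change where it vanishes. The helper names `grad_i` and `closed_form` and the sample numbers are made up for illustration.

```python
def grad_i(q_i, q_hat_i, neighbor_vecs, alpha_i, betas):
    """Gradient w.r.t. q_i: 2*alpha_i*(q_i - q_hat_i) + 2*sum_j beta_ij*(q_i - q_j)."""
    g = [2 * alpha_i * (a - b) for a, b in zip(q_i, q_hat_i)]
    for q_j, b_ij in zip(neighbor_vecs, betas):
        for d in range(len(g)):
            g[d] += 2 * b_ij * (q_i[d] - q_j[d])
    return g


def closed_form(q_hat_i, neighbor_vecs, alpha_i, betas):
    """q_i = (sum_j beta_ij q_j + alpha_i q_hat_i) / (sum_j beta_ij + alpha_i)."""
    denom = alpha_i + sum(betas)
    return [
        (alpha_i * q_hat_i[d]
         + sum(b * q_j[d] for q_j, b in zip(neighbor_vecs, betas))) / denom
        for d in range(len(q_hat_i))
    ]


# Hypothetical sample values, chosen only to exercise the formulas.
q_hat_i = [1.0, -2.0]
neighbor_vecs = [[3.0, 0.0], [-1.0, 4.0]]
alpha_i = 1.0
betas = [0.5, 0.5]

q_star = closed_form(q_hat_i, neighbor_vecs, alpha_i, betas)
g = grad_i(q_star, q_hat_i, neighbor_vecs, alpha_i, betas)
assert all(abs(x) < 1e-12 for x in g)  # gradient vanishes at the closed-form q_i
```

Evaluating `grad_i` at any other point, such as `q_hat_i` itself, gives a non-zero vector, which is a quick way to see that the update moves \(q_i\) to the stationary point of its own terms.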
Reference
[1] Faruqui M, Dodge J, Jauhar S K, et al. Retrofitting Word Vectors to Semantic Lexicons. NAACL-HLT, 2015.