t-distributed Stochastic Neighbor Embedding
t-SNE vs PCA
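To make the comparison concrete, here is a minimal sketch that embeds the same dataset with both methods using scikit-learn; the digits dataset and the parameter values are illustrative assumptions, not taken from the slides.

```python
# Sketch: embed the same data with PCA and t-SNE and compare the 2-D layouts.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, labels = load_digits(return_X_y=True)

# PCA: linear projection onto the two directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: non-linear embedding that tries to preserve local neighborhoods.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in [(axes[0], X_pca, "PCA"), (axes[1], X_tsne, "t-SNE")]:
    ax.scatter(emb[:, 0], emb[:, 1], c=labels, s=5, cmap="tab10")
    ax.set_title(title)
plt.show()
```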
$$p_{j|i} = \frac{g(|x_i - x_j|)}{\sum_{k \neq i} g(|x_i - x_k|)}$$

Worked examples:

$$p_{j|i} = \frac{0.177}{0.177 + 0.177 + 0.164 + 0.0014 + 0.0013 + \dots} \approx 0.34$$

$$p_{j|i} = \frac{0.077}{0.077 + 0.064 + 0.064 + 0.032 + 0.031 + \dots} \approx 0.27$$
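The worked numbers above are just a normalization of pairwise similarity scores; a minimal sketch, reusing the truncated values from the first example:

```python
import numpy as np

# Unnormalized similarities g(|x_i - x_j|) from point i to the other points,
# truncated to the terms shown in the example above.
g = np.array([0.177, 0.177, 0.164, 0.0014, 0.0013])

# p_{j|i}: each similarity divided by the sum over all k != i.
p_cond = g / g.sum()
print(round(p_cond[0], 2))  # ~0.34, matching the slide
```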
$$\mathrm{Perp}(P_i) = 2^{-\sum_j p_{j|i} \log_2 p_{j|i}}$$

The perplexity acts as the target number of neighbors.
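The perplexity is two raised to the Shannon entropy of $P_i$; in t-SNE each $\sigma_i$ is then tuned (typically by binary search) until $\mathrm{Perp}(P_i)$ matches the user-chosen value. A minimal sketch of the perplexity computation itself:

```python
import numpy as np

def perplexity(p_cond):
    """Perp(P_i) = 2 ** H(P_i), with H the Shannon entropy in bits."""
    p = p_cond[p_cond > 0]              # skip zeros to avoid log2(0)
    return 2.0 ** (-np.sum(p * np.log2(p)))

# A uniform distribution over 5 neighbors has perplexity of 5,
# which is why perplexity reads as an effective number of neighbors.
print(perplexity(np.full(5, 0.2)))      # ~5.0
```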
$$p_{j|i} = \frac{g(|x_i - x_j|)}{\sum_{k \neq i} g(|x_i - x_k|)} \quad\Longrightarrow\quad p_{j|i} = \frac{\exp(-|x_i - x_j|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-|x_i - x_k|^2 / 2\sigma_i^2)}$$

where the kernel $g$ is the Gaussian density $\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x_i - x_j}{\sigma}\right)^2\right)$; its normalization constant cancels in the ratio.
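A minimal sketch of the Gaussian conditional probabilities for a single point $i$, assuming a precomputed row of squared Euclidean distances and a given $\sigma_i$:

```python
import numpy as np

def p_conditional_row(dist_sq_row, sigma_i, i):
    """p_{j|i} = exp(-|x_i - x_j|^2 / 2 sigma_i^2), normalized over all k != i."""
    logits = -dist_sq_row / (2.0 * sigma_i ** 2)
    logits[i] = -np.inf                        # p_{i|i} = 0: exclude the point itself
    weights = np.exp(logits - np.max(logits))  # shift by the max for numerical stability
    return weights / weights.sum()

# Example: squared distances from x_i to five points (the point itself at index 0).
row = np.array([0.0, 1.0, 1.1, 4.0, 9.0])
print(p_conditional_row(row, sigma_i=1.0, i=0))
```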
In the low-dimensional map, SNE uses a Gaussian kernel while t-SNE replaces it with a Student-t kernel with one degree of freedom:

$$q_{ij} = \frac{\exp(-|y_i - y_j|^2)}{\sum_{k \neq l} \exp(-|y_k - y_l|^2)} \quad\text{vs.}\quad q_{ij} = \frac{(1 + |y_i - y_j|^2)^{-1}}{\sum_{k \neq l} (1 + |y_k - y_l|^2)^{-1}}$$
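The change from SNE to t-SNE is only a change of kernel in the map space; a minimal sketch of the Student-t version over an assumed n x 2 array `Y` of map points:

```python
import numpy as np

def q_student_t(Y):
    """q_{ij} = (1 + |y_i - y_j|^2)^-1 / sum_{k != l} (1 + |y_k - y_l|^2)^-1."""
    dist_sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + dist_sq)
    np.fill_diagonal(inv, 0.0)          # q_{ii} = 0 by convention
    return inv / inv.sum()
```

The heavier tails of the Student-t kernel let moderately dissimilar points sit further apart in the map, which is what counters the crowding problem.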
Random Points
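Assuming this refers to the random initialization of the low-dimensional map, the original paper samples the initial $y_i$ from a small isotropic Gaussian; a short sketch with an assumed dataset size:

```python
import numpy as np

n_samples = 500                                   # assumed number of data points
rng = np.random.default_rng(0)
# Initial map points drawn from N(0, 1e-4 * I), i.e. standard deviation 1e-2.
Y = rng.normal(loc=0.0, scale=1e-2, size=(n_samples, 2))
```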
$$p_{ij} = \frac{\exp(-|x_i - x_j|^2 / 2\sigma^2)}{\sum_{k \neq l} \exp(-|x_k - x_l|^2 / 2\sigma^2)} \qquad q_{ij} = \frac{(1 + |y_i - y_j|^2)^{-1}}{\sum_{k \neq l} (1 + |y_k - y_l|^2)^{-1}}$$
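In practice the original paper builds the joint $P$ by symmetrizing the conditional probabilities rather than using the joint Gaussian directly; a minimal sketch, where `P_cond` is an assumed n x n matrix whose row i holds $p_{j|i}$:

```python
import numpy as np

def joint_p(P_cond):
    """p_{ij} = (p_{j|i} + p_{i|j}) / (2n): symmetrized high-dimensional affinities."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)
```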
Kullback-Leibler divergence:

$$C = D_{\mathrm{KL}}(P \parallel Q) = \sum_{x \in X} P(x) \log\frac{P(x)}{Q(x)}$$
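A minimal sketch of the cost evaluated over the pairwise matrices `P` and `Q` (the small epsilon is an assumption to avoid dividing by an exact zero):

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """C = sum_ij p_ij * log(p_ij / q_ij); entries with p_ij = 0 contribute nothing."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))
```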
Derivative:

$$\frac{\delta C}{\delta y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + |y_i - y_j|^2\right)^{-1}$$
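A minimal sketch of that gradient, vectorized over all points; `P`, `Q` and `Y` are the matrices and map from the sketches above:

```python
import numpy as np

def tsne_grad(P, Q, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) * (1 + |y_i - y_j|^2)^-1."""
    diff = Y[:, None, :] - Y[None, :, :]               # (n, n, 2): y_i - y_j
    dist_sq = np.sum(diff ** 2, axis=-1)               # (n, n) squared map distances
    weights = (P - Q) / (1.0 + dist_sq)                # (n, n) pairwise coefficients
    return 4.0 * np.sum(weights[:, :, None] * diff, axis=1)   # (n, 2) gradient per point
```

Plain gradient descent then moves `Y` against this gradient; the original paper additionally uses momentum and an adaptive learning rate.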
Early Compression: prevents the map points from clumping into clusters too early by adding an L2 regularization penalty on the map coordinates during the first iterations (sketched together with early exaggeration below).
Early Exaggeration: multiplies all $p_{ij}$ values by a constant factor during the early stages, which makes clusters tighter and lets them move further apart in the map.
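A sketch of how both tricks slot into a plain gradient-descent loop, reusing the `q_student_t` and `tsne_grad` helpers sketched above; the exaggeration factor, penalty strength, learning rate and cut-off iteration are assumptions in the spirit of the original paper:

```python
def optimize(P, Y, n_iter=1000, lr=200.0):
    """Gradient descent with early exaggeration and early compression (sketch)."""
    P_active = P * 4.0                      # early exaggeration: inflate all p_ij
    compression = 0.01                      # assumed strength of the L2 penalty
    for it in range(n_iter):
        if it == 100:
            P_active = P                    # stop exaggerating after the early phase
            compression = 0.0               # and drop the early-compression penalty
        Q = q_student_t(Y)                  # Student-t similarities in the map
        grad = tsne_grad(P_active, Q, Y)    # gradient of the KL cost
        grad += 2.0 * compression * Y       # gradient of compression * sum_i |y_i|^2
        Y = Y - lr * grad
    return Y
```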
"There's no such thing as a stupid question!"
Kemal Erdem | @burnpiro
https://erdem.pl
https://github.com/burnpiro