5002 Subspace clustering Memo
Methods
- Dense Unit-based Method
- Entropy-Based Method
- Transformation-Based Method
why subspace clustering?
curse of dimensionality: as the number of dimensions increases, the distance between any two points becomes nearly the same, so distance-based clustering in the full space loses its discriminating power.
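This concentration effect is easy to verify empirically. The snippet below is my own illustration (not part of the course material): it samples uniform points and compares the nearest and farthest pairwise distances as the dimension grows.

```python
import numpy as np

# Empirical check of the curse of dimensionality: as d grows,
# the nearest and farthest pairwise distances become almost equal.
rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    points = rng.random((100, d))                    # 100 uniform points in [0, 1]^d
    diffs = points[:, None, :] - points[None, :, :]  # pairwise difference vectors
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists = dists[np.triu_indices(100, k=1)]         # distinct pairs only
    print(f"d={d:5d}  min/max distance ratio = {dists.min() / dists.max():.3f}")
# The ratio approaches 1 as d grows, which is why clustering must be
# done in lower-dimensional subspaces.
```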
Dense Unit-based Method
Property
If a set of points forms a cluster in a k-dimensional space, then it is also part of a cluster in every (k-1)-dimensional projection of that space.
Algorithm
- Identify the subspaces that contain dense units (an Apriori-style search finds the relevant dimensions, e.g. x, y, z)
- Identify the clusters in each subspace that contains dense units (based on the dimensions x, y, z found above, merge adjacent dense units into clusters such as {x: [1, 10], y: [11, 20]}, etc.)
adapting Apriori
For example, take a grid of width 10 and mark $X_0$ as [1, 10], $X_1$ as [11, 20], and so on (likewise for $Y$); the density threshold is 1 point. The five data points fall into these cells:

| Point | X cell | Y cell |
| --- | --- | --- |
| (1, 19) | $X_0$ | $Y_1$ |
| (11, 29) | $X_1$ | $Y_2$ |
| (21, 39) | $X_2$ | $Y_3$ |
| (31, 49) | $X_3$ | $Y_4$ |
| (41, 59) | $X_4$ | $Y_5$ |
Each occupied cell contains 1 point, meeting the threshold, so the 1-dimensional dense units are $L_1 = \{X_0, X_1, X_2, X_3, X_4, Y_1, Y_2, Y_3, Y_4, Y_5\}$, which means the dense 1-dimensional subspaces are $\{X\}$ and $\{Y\}$. Joining them gives the candidate subspace $C_2 = \{\{X, Y\}\}$. For the attribute pair $\{X, Y\}$, the candidate units are $C_2 = \{(X_0, Y_1), (X_0, Y_2), \cdots\}$; counting points in each candidate leaves $L_2 = \{(X_0, Y_1), (X_1, Y_2), (X_2, Y_3), (X_3, Y_4), (X_4, Y_5)\}$, and so on for higher dimensions.
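The bottom-up pass is short enough to sketch in code. This is my own CLIQUE-style illustration of the example above (the cell encoding and variable names are mine; a full implementation would also merge adjacent dense units into clusters, as in step 2 of the algorithm):

```python
from collections import Counter
from itertools import combinations

# Dense unit-based subspace search on the memo's example:
# grid width 10 with cell 0 = [1, 10], cell 1 = [11, 20], ...; threshold 1.
points = [(1, 19), (11, 29), (21, 39), (31, 49), (41, 59)]
dims, width, threshold = 2, 10, 1

def cell(value):
    return (value - 1) // width          # map a coordinate to its grid cell index

# L1: 1-dimensional dense units, encoded as (dimension, cell),
# e.g. (0, 0) means X_0 and (1, 1) means Y_1.
counts = Counter((d, cell(p[d])) for p in points for d in range(dims))
L1 = {unit for unit, c in counts.items() if c >= threshold}
print("1-d dense units:", sorted(L1))    # X_0..X_4 and Y_1..Y_5

# C2 -> L2: join pairs of 1-d units on different dimensions (Apriori
# candidate generation), then keep the candidates that are actually dense.
C2 = {(u, v) for u, v in combinations(sorted(L1), 2) if u[0] != v[0]}
counts2 = Counter()
for p in points:
    key = ((0, cell(p[0])), (1, cell(p[1])))
    if key in C2:
        counts2[key] += 1
L2 = {unit for unit, c in counts2.items() if c >= threshold}
print("2-d dense units:", sorted(L2))    # (X_0,Y_1), (X_1,Y_2), ...
```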
Entropy-Based Method
conditional entropy
H denotes entropy.
$$H(Y|X)=\sum_{x\in A} p(x)\,H(Y|X=x)$$
$$H(Y|X=x)= -\sum_{y \in B} p(y|X=x)\log p(y|X=x)$$
And
$$H(X, Y) = H(X) + H(Y|X)$$
$$H(X_1, X_2, \cdots, X_k) = H(X_1,\cdots, X_{k-1}) + H(X_k|X_1,\cdots, X_{k-1})$$
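To make the chain rule concrete, here is a tiny numeric check on a made-up joint distribution (the probability values are arbitrary illustration, not from the course):

```python
import numpy as np

# Verify H(X, Y) = H(X) + H(Y|X) on a small made-up joint distribution p(x, y).
p = np.array([[0.30, 0.10],     # rows: values of X, columns: values of Y
              [0.20, 0.40]])

def H(probs):
    probs = probs[probs > 0]
    return -(probs * np.log2(probs)).sum()

px = p.sum(axis=1)                                    # marginal p(x)
H_joint = H(p.ravel())                                # H(X, Y)
H_X = H(px)                                           # H(X)
H_Y_given_X = sum(px[i] * H(p[i] / px[i]) for i in range(len(px)))
print(H_joint, H_X + H_Y_given_X)                     # both ~1.846 bits
```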
So we get the lemma: if a k-dimensional subspace has good clustering ($H(X_1,\cdots, X_k) \le w$), then each of its (k-1)-dimensional projections also has good clustering. This holds because conditional entropy is non-negative, so $H(X_1,\cdots, X_{k-1}) \le H(X_1,\cdots, X_k) \le w$. This monotonicity is exactly what lets us reuse Apriori.
Apriori
We can calculate $H(Age)=0.12$ and $H(Income)=0.08$, with threshold $w = 0.2$. Both entropies are at most $w$, so:
- $L_1 = \{\{Age\}, \{Income\}\}$
- $C_2 = \{\{Age, Income\}\}$, etc.
$H(Age, Income)$ is calculated on a grid: partition each attribute into intervals, take each cell's probability to be the fraction of points falling in it, and plug those cell probabilities into the entropy formula.
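A minimal sketch of this grid-based entropy computation, with made-up data and an assumed threshold (neither comes from the course example above):

```python
import numpy as np
from itertools import combinations

# Entropy-based subspace search sketch: estimate H over a grid and keep
# subspaces whose entropy is at most the threshold w (illustrative data).
rng = np.random.default_rng(0)
# Two clustered attributes plus one uniform "noise" attribute.
age    = np.concatenate([rng.normal(25, 2, 100), rng.normal(55, 2, 100)])
income = np.concatenate([rng.normal(30, 3, 100), rng.normal(90, 3, 100)])
noise  = rng.uniform(0, 100, 200)
data = {"Age": age, "Income": income, "Noise": noise}

def grid_entropy(columns, bins=10):
    """Joint entropy (bits) of the given columns, estimated on a grid."""
    counts, _ = np.histogramdd(np.column_stack(columns), bins=bins)
    p = counts.ravel() / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

w = 3.0                                   # entropy threshold (assumed value)
L1 = [a for a in data if grid_entropy([data[a]]) <= w]
print("good 1-d subspaces:", L1)          # clustered attributes pass, noise fails
# Apriori step: only pairs whose 1-d projections are both in L1 are candidates.
for a, b in combinations(L1, 2):
    print((a, b), round(grid_entropy([data[a], data[b]]), 2))
```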
comparison with the dense unit-based method
The entropy-based method is only good for finding the subspaces; it does not produce the clusters themselves, so a separate clustering step is still needed within each subspace. The dense unit-based method finds both the subspaces and the clusters.
Transformation-based Method
steps
For example, take the data points (3, 3), (0, 2), (-1, -1), (2, 0).
- Calculate the mean vector of the data points: (1, 1).
- Subtract the mean from each point to get the difference vectors: (2, 2), (-1, 1), (-2, -2), (1, -1).
- Arrange the difference vectors as the columns of a matrix: $$Y = \begin{bmatrix}2 &-1 &-2 &1 \\ 2 &1 &-2 &-1\end{bmatrix}$$
- Compute the covariance matrix: $\Sigma = \frac{1}{4} YY^T = \begin{bmatrix}2.5 &1.5 \\ 1.5 &2.5\end{bmatrix}$
- Find the eigenvalues: solve the characteristic equation $\det(\Sigma - \lambda I)=0$ (for a $2\times 2$ matrix the determinant is just $ad - bc$), which here gives $\lambda_1 = 4$ and $\lambda_2 = 1$.
- Find the eigenvector for each eigenvalue by solving $(\Sigma - \lambda_i I)x = 0$; here $\lambda_1 = 4$ gives $(1, 1)$ and $\lambda_2 = 1$ gives $(1, -1)$.
- Normalize each eigenvector to unit length, e.g. $(1, 1)/\sqrt{2}$.
- Arrange the eigenvectors as the rows of the transformation matrix $T$ (one row per eigenvector, ordered by eigenvalue), so that $y = Tx$ expresses a vector in the eigenvector coordinates.
- Transform each difference vector with $y = T x$.
- Keep only the rows of the transformed data corresponding to the largest eigenvalues (here the row for $\lambda_1 = 4$), i.e. drop the direction with the smallest eigenvalue.
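These steps can be reproduced with a few lines of numpy. This is my sketch of the worked example, not the course's reference code; note that the sign of each eigenvector is arbitrary, so the transformed values may come out negated.

```python
import numpy as np

# Transformation-based method on the memo's example points.
points = np.array([[3, 3], [0, 2], [-1, -1], [2, 0]], dtype=float)

mean = points.mean(axis=0)                 # (1, 1)
Y = (points - mean).T                      # difference vectors as columns:
                                           # [[ 2 -1 -2  1]
                                           #  [ 2  1 -2 -1]]
Sigma = Y @ Y.T / Y.shape[1]               # covariance matrix [[2.5 1.5]
                                           #                    [1.5 2.5]]
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues [1, 4] (ascending)

T = eigvecs.T[::-1]                        # rows = unit eigenvectors, largest
                                           # eigenvalue first: (1,1)/sqrt(2), (1,-1)/sqrt(2)
transformed = T @ Y                        # y = T x for every difference vector
print(transformed[0])                      # row for the largest eigenvalue:
                                           # [ 2.83  0.   -2.83  0. ] (up to sign)
```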
why the largest eigen value?
The variance of the data along an eigenvector equals that eigenvector's eigenvalue, so information loss is minimized if we project onto the eigenvectors with large eigenvalues and discard the low-variance directions with the smallest eigenvalues.