CS outline
Reference lists
- textbook & youtube lecture lists by Steve Brunton
- textbook:
[0037]_T03_R16- Brunton, Steven L., and J. Nathan Kutz. Data-driven science and engineering: Machine learning, dynamical systems, and control. Cambridge University Press, 2019. - Youtube lecture series by Steve Brunton(Total 23)
- textbook:
-
textbook:
[0832]_T01_R08- Eldar, Yonina C., and Gitta Kutyniok, eds. Compressed sensing: theory and applications. Cambridge university press, 2012. - Compressive Sensing with matlab code
- Michigan center for CS
[0832]_T01_R02 - matlab
[0832]_T01_R01mathworks Cleve’s Corner - Colorado - Begineers Code for compressive sensing 09Y
[0832]_T01_R07
- Michigan center for CS
- SBL Mike Tipping 논문 및 코드
- 코드가 너무 어렵다;;;
- SBL DOA Peter 논문 및 코드
- 코드는 peter SBL 및 cvx 의 성능을 상호 비교하고 있음
- 신명인 작성 논문 및 코드
[0832]_T06_C02그리고 참고자료- ㅇㅇㅇ
CS
model
- 자연에 존재하는 데이터는 적절한 coordinate system(i.e., $\boldsymbol{\Psi}$, such as DCT, FT etc)을 이용한다면 sparse representation이 가능함.
[0037]_T03_R16, p96 \(\boldsymbol{x} = \boldsymbol{\Psi} \boldsymbol{s}\) - $x$ 는 원본신호이고 앞서 이야기하였지만 내재된 sparsity 특성에 의해서 transform basis에 의해 K-sparse vector로 표현될 수 있다.
한편 원본신호($\boldsymbol{x} \in \mathbb{R}^{n}$)는 measurement(혹은 sampling) matrix($\boldsymbol{C} \in \mathbb{R}^{p \times n}$)에 의해 측정신호($\boldsymbol{y} \in \mathbb{R}^{p}$)로 표현된다.
\(\begin{aligned}
\boldsymbol{y} & = \boldsymbol{C} \boldsymbol{x} \\
& = \boldsymbol{C} \boldsymbol{\Psi} \boldsymbol{s} \\
& = \boldsymbol{\Theta} \boldsymbol{s}
\end{aligned}\)
- sparse representation의 목적은 measurement $\boldsymbol{y}$로 부터 sparsest vector $\boldsymbol{s}$을 찾는 것이다.
- 혹은 measurement, $\boldsymbol{y}$로 부터 원본신호, $\boldsymbol{x}$을 복원하는 것이다.
CS in action
matlab L1-magic example
- Michigan center for CS
[0832]_T01_R02- matrix $\boldsymbol{B} \in \mathbb{R}^{481 \times 481}$ 는 DFT 연산을 수행하는 matrix
- matrix $\boldsymbol{A} \in \mathbb{R}^{90 \times 481}$ 는 matrix $\boldsymbol{B}$ 중 일부 row을 랜덤하게 선택하는 measurement matrix로 랜덤하게 선택된 데이터 포인트에 대해 DFT 연산을 수행, i.e., $\boldsymbol{y} = \boldsymbol{A} \boldsymbol{x}^{T}$
- toy example의 핵심은 measurement $ \boldsymbol{y} $ 와 measurement matrix $ \boldsymbol{A} $ 로 부터 source signal $ \boldsymbol{x} $ 을 찾아내는 것
- 코드의 논리 진행 과정에서 twice DFT는 원신호로 돌아오게 된다는 것을 알고 있어야 함.
- matlab simple code
[0832]_T01_R01mathworks Cleve’s Corner- 기본적으로 Michigan center for CS의 예제와 동일하나, L1-magic에 의한 연산 결과를 pinv에 의한 연산 결과와 비교하였음
- L1-magic - 1-norm, pinv - 2-norm
CVX, convex optimization example [0832]_T01_R07
CS with Bayesian: SBL, Sparse bayesian learning [0832]_T06_C02
- SBL 제안, Mike Tipping: Web and Paper(2001)
- SBL의 개념을 underwater acoustic 분야에 적용하여 연구를 시작한 것은 UCSD Noise Lab, Peter 에 의해서 진행이 되었음. 주로 DOA estimation, Localization 분야에 적용
- 신명인은 SBL을 주파수 추정 분야에 적용
SBL 알고리즘의 Hyper-parameter 업데이트 시 Mike는 EM-algorithms을 사용하고, Peter는 통계적인 방식을 사용하고 있음
regression comparison: SBL algorithms in depth
SBL tutorial은 Mike Tipping lecture note 1, 2, 3, 4을 참고하였다. lecture note의 논리전개는 estimation parameter(혹은 regression - model prediction)을 위하여 1) classic과 probabilistic approach 을 비교하고 2) probabilistic approach에서 condition(i.e., sparcity)을 추가하여 Sparse Bayesian Learning 알고리즘을 유도한다.
model for CLASSIC approach
given model
- data set comprising $N=15$ samples generated from the function $y=sin(x)$ with Gaussian noise of $\sigma^2=0.2$.
- model this data with some parameterised function $y(x;\boldsymbol{w})$
- linearly-weighted sum of $M$ fixed basis function $\phi _m(x)$ (RBF, radial basis function) \(y(x;\boldsymbol{w}) = \sum_{m=1}^{M} w_{m} \phi _{m}(x)\)
Goal is model prediction - model parameter prediction
CLASSIC approach #1: Least-squares approach
Criterion: minimising the error measure \(E_D(\boldsymbol{w}) = \frac{1}{2} \sum_{n=1}^{N} \left[ t_n - \sum_{m=1}^{M} w_{m} \phi _{m}(x_n) \right]^2\)
solution \(\boldsymbol{w}_{LS} = \left( \Phi^T \Phi \right)^{-1} \Phi^T \boldsymbol{t}\)
disadvantage: over-fitting
CLASSIC approach #2: PLS, penalized least-squares
idea or motivation
- we typically prefer smoother function, which typically have smaller weights $\boldsymbol{w}$
- complexity control - Regularisation (over-fitting problem)
Criterion: minimising the error with weight penalty So, add a weight penalty term to the error function so that, \(E(\boldsymbol{w}) = E_D(\boldsymbol{w}) + \color{red} \lambda \color{a} E_W(\boldsymbol{w})\)
- where standard choice of weight penalty is the squared-weight penalty, $E_W(\boldsymbol{w}) = \frac{1}{2} \sum_{m=1}^{M} w_{m}^{2}$
- the hyperparameter $ \color{red} \lambda $ balances the trade-off between $E_D(\boldsymbol{w})$ and $E_W(\boldsymbol{w})$
solution \(\boldsymbol{w}_{PLS} = \left( \Phi^T \Phi + \lambda I \right)^{-1} \Phi^T \boldsymbol{t}\)
model for probalistic approach
probalistic approach 라는 것이 곧 Bayesian approach을 의미하는 것은 아니다. also refer to Kay estimation text.
\(t_n = y(x_n;\boldsymbol{w}) + \epsilon _n\)
- where $\epsilon _n$ is noise with Gaussian distribution, $N(0, \sigma ^2)$:
-
$p(t_n \boldsymbol{w}, \sigma^2)$ ~ $N(y(x_n; \boldsymbol{w}), \sigma ^2)$
the likelihood of the data set is \(\color{green} p(\boldsymbol{t}|\boldsymbol{w}, \sigma ^2) \color{a} = \prod_{n=1}^{N} p(t_n | \boldsymbol{w}, \sigma^2)\)
Goal is model prediction - model parameter prediction
Probabilistic approach #1: Maximum Likelihood
estimate for $\boldsymbol{w}$ that maximised the $\color{green} p(\boldsymbol{t}|\boldsymbol{w}, \sigma ^2)$ this is identical to the least-squares solution
Probabilistic approach #2: MAP, maximum a posteriori
idea or motivation similar with previous PLS - regularisation weight penalty, we now control the complexity of the model via a prior distribution which expresses our ‘degree of belief’ over values that $\boldsymbol{w}$ might take: \(p(\boldsymbol{w}|\alpha) = \prod_{m=1}^{M} \sqrt{\frac{\alpha}{2\pi}} \exp \left( -\frac{1}{2} \alpha w_{m}^{2} \right)\)
- zero-mean Guassian prior, independent for each weight $w_m$
- common inverse variance hyperparameter $\alpha$
Bayes rule with more familiar notation \(\begin{aligned} p(\theta | x) & = \frac{ p(x | \theta) p(\theta) }{ p(x) } \\ p(\boldsymbol{w} | \boldsymbol{t}, \alpha, \sigma ^2) & = \frac{ p(\boldsymbol{t} | \boldsymbol{w}, \sigma ^2) p(\boldsymbol{w} | \alpha) }{ p(\boldsymbol{t} | \alpha, \sigma ^2) } \end{aligned}\)
- so, the posterior is Gaussian: $ p(\boldsymbol{w} | \boldsymbol{t}, \alpha, \sigma ^2) = N(\mu, \Sigma) $ with $\mu = $ $\Sigma = $
Bayesian learning is about the better estimation, REF youtube video.
marginalisation - the distinguishing element of Bayesian methods is marginalisation: attempt to integrate out all ‘unisance’ variables
making prediction - learned from the training values $\boldsymbol{t}$, how we make a prediction for data $t_$ given a new input datum $x_$
- reference youtube by KU DSBA lab 해설