My graduation thesis uses this, and I assumed I already understood PCA well enough. I did not expect that, while I was rehearsing my slide presentation, my other mentor would point out something that opened my eyes: using PCA for modeling and prediction.
I learned a lot today, so I am writing it down here so that I do not forget.
PCA projection classification:
Given N samples, projecting them directly with PCA lets you see how they cluster. The SVD yields three matrices U, S and V: U is the score matrix (strictly, the scores are the columns of U scaled by the singular values in S) and V is the loading matrix. If each sample occupies one row of the data matrix X, then X has N rows and M columns, where M is the number of variables. U therefore has N rows, and each column of U corresponds to one principal component. V has M rows; after transposition it has M columns, and each of its rows corresponds to one principal component: the first row to PC1, the second row to PC2, and so on.
The following figure explains V after transposition.
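As a rough check of the shapes described above, here is a minimal NumPy sketch. The data are random and the sizes (6 samples, 4 variables) are made up for illustration; centering the columns before the SVD is my assumption, since the notes do not say how X was preprocessed.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 4                          # hypothetical sizes: N samples, M variables
X = rng.normal(size=(N, M))          # made-up data, one sample per row
Xc = X - X.mean(axis=0)              # assumption: center each variable first

# Economy-size SVD: Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T                             # loading matrix, not transposed

print(U.shape)    # (6, 4): N rows, one column per principal component
print(V.shape)    # (4, 4): M rows, one column per principal component
print(Vt.shape)   # (4, 4): after transposition, one ROW per principal component
```

Here `Vt[0]` holds the loadings of PC1 and `Vt[1]` those of PC2, which is the "one row per component" layout of the transposed V.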
Modeling with PCA:
The correct way is:
First take the training samples, say n of them, and arrange them into a matrix X. Then compute the SVD of X to obtain V (note that V is not transposed here).
Next, arrange the new samples into a matrix Y and multiply Y by v, i.e. Y * v. (V is not transposed, so the number of rows of V equals the number of variables, and the number of columns of Y also equals the number of variables, so the product Y * v is defined.)
Note that the red v is always made up of part of the V from the original model. Which columns of V, then? Without transposition, the columns of V correspond to the principal components. So if the model was plotted as a projection onto PC1 and PC2, the v here consists of the first and second columns of the V obtained in the original modeling, and likewise for any other choice of principal components.
In this way, Y * v gives a matrix U'. U' has one row per sample in Y (N rows if Y holds N samples), and its number of columns equals the number of principal components selected in v; for a projection onto a plane, U' has two columns. U' has the same meaning as the U obtained from the SVD in the original modeling.
Plotting U' on the original projection map shows where the test samples fall relative to the modeling samples. If the modeling samples separate into three classes and a test sample belongs to one of them, then on the final projection map the test sample should fall roughly within the region of the corresponding class (assuming that none of the data are abnormal, i.e. all of them are normal standard samples).
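The projection of new samples onto an existing model can be sketched as follows. This is only an illustration on random data: the array sizes and the names `v` and `U_new` are mine, and subtracting the training mean from Y is an assumption (it is the usual practice when the model was built on centered data, but the notes do not mention it).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))     # training samples (made-up data)
Y = rng.normal(size=(4, 5))      # new samples with the same 5 variables

# Build the model from the training samples only.
mean = X.mean(axis=0)            # assumption: model uses column-centered data
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
V = Vt.T                         # columns of V are the principal components

k = 2                            # project onto PC1 and PC2
v = V[:, :k]                     # the "red v": first k columns of the model's V

# Scores of the new samples in the model's coordinate system.
U_new = (Y - mean) @ v
print(U_new.shape)               # (4, 2): one row per new sample, one column per PC
```

The rows of `U_new` can then be drawn on the same PC1/PC2 scatter plot as the training scores `(X - mean) @ v`.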
If you instead multiply the modeling matrix X itself by the red v, the resulting U' matches the U from the modeling column for column (note: exactly the corresponding columns; each column of U' is the matching column of U multiplied by its singular value, because X V = U S). For example, if the selected components are PC1 and PC2, then U' has only two columns, and they correspond exactly to the first and second columns of U.
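This identity is easy to verify numerically. The sketch below uses random data and assumes column centering; it checks that X * v reproduces the first two columns of U once the singular values are accounted for.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))             # made-up modeling data
Xc = X - X.mean(axis=0)                  # assumption: centered before SVD

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
v = Vt.T[:, :2]                          # first two columns of V (PC1, PC2)

U_prime = Xc @ v                         # project the modeling data itself

# X V = U S, so U_prime equals the first two columns of U scaled by s.
print(np.allclose(U_prime, U[:, :2] * s[:2]))   # True
# Dividing out the singular values recovers the columns of U themselves.
print(np.allclose(U_prime / s[:2], U[:, :2]))   # True
```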
This is how to use PCA for modeling and prediction.
My original approach, stacking the modeling data and the test data into a single matrix and redoing the projection, can only be used to observe how the whole set of samples clusters; it is not a sound way to do prediction.
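One way to see why the stacked approach is not a prediction: the test samples themselves change the loadings, so the axes are no longer those of the model. The sketch below illustrates this on random data; the helper `loadings`, the sizes and the deliberate shift of the test data are all hypothetical choices of mine.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 5))             # modeling samples (made-up data)
Y = rng.normal(size=(10, 5)) + 2.0       # test samples, deliberately shifted

def loadings(A, k=2):
    """Return the first k columns of V from the SVD of column-centered A."""
    _, _, Vt = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)
    return Vt.T[:, :k]

v_model = loadings(X)                    # axes from the modeling data only
v_all = loadings(np.vstack([X, Y]))      # axes after stacking X and Y

# Compare absolute values because the sign of each component is arbitrary.
same_axes = np.allclose(np.abs(v_model), np.abs(v_all))
print(same_axes)
```

When `same_axes` is false, the stacked projection is drawn on different axes from the original model, so it describes the combined data rather than predicting where new samples fall in the existing model.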