If you’re wondering what linear separability is, the Python Machine Learning book we recently reviewed has a nice graphic illustrating it. Assuming we know that the data comes from two groups, if the data is linearly separable, we can easily split it into the two groups with a straight line, as shown below. If the data is not linearly separable, a more complex function, such as a polynomial, may be required to separate the groups. Since classical PCA simply computes each PC as a linear combination of the original features, it cannot separate such nonlinear data.
Linear vs. nonlinear problem
What happens when you apply regular PCA to a dataset that is not linearly separable? And how can we handle such a dataset? In this post, we will explore these questions with example code.
Let’s start by loading all the packages needed to illustrate the use of kernel PCA. We first import make_circles from the sklearn datasets module to create nonlinear datasets. Then we load the two classes that will be useful for performing regular PCA and kernel PCA with sklearn.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
To create nonlinear data, we will use make_circles() to generate circular data with two groups. Here we generate 200 data points from two groups: one group forms a circle, and the other group consists of noisy points concentrated near the center of that circle. The function make_circles() returns the data and the group assignment for each observation.
# Create data that is not linearly separable
X, y = make_circles(n_samples=200, random_state=1, noise=0.1, factor=0.1)
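As a quick sanity check on what make_circles() returned, the sketch below regenerates the same dataset and looks at each point's distance from the origin. The radius, outer_mean and inner_mean names are illustrative additions, not part of the original walkthrough, and the code assumes that label 0 marks the outer circle and label 1 the inner one, which is how scikit-learn builds the labels.

```python
import numpy as np
from sklearn.datasets import make_circles

# Same call as in the post
X, y = make_circles(n_samples=200, random_state=1, noise=0.1, factor=0.1)

# X holds the 2-D coordinates, y the group label (0 or 1) of each point
print(X.shape, y.shape)

# Distance of every point from the origin (illustrative variable)
radius = np.sqrt((X ** 2).sum(axis=1))

# Assumption: label 0 is the outer circle, label 1 the inner cluster
outer_mean = radius[y == 0].mean()
inner_mean = radius[y == 1].mean()
print("mean radius, outer group:", outer_mean)
print("mean radius, inner group:", inner_mean)
```

The two groups differ in their distance from the center, which is a nonlinear function of the two coordinates. That is why no straight line in the original coordinates can split them.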
We will store the data in a Pandas data frame along with the group assignment variable.
df = pd.DataFrame(X)
df.columns = ["a", "b"]
# Add the group label as a column named "y"
df["y"] = y
We can use Seaborn’s scatterplot() function to visualize the nonlinearity of the data.
sns.scatterplot(data=df, x="a", y="b", hue=y)
As expected, in this example we see that we have data from two groups with a distinct nonlinear pattern.
Non-linear circular data for kernel PCA
PCA for non-linear data
Now let’s apply regular PCA to this nonlinear data and see what the PCs look like. We use sklearn’s PCA function to perform the PCA.
scikit_pca = PCA(n_components=2)
X_pca = scikit_pca.fit_transform(X)
To visualize the results of regular PCA, we make a scatter plot of PC1 against PC2. Let us first store the PCA results in a Pandas data frame with the known group assignment.
pc_res = pd.DataFrame(X_pca)
pc_res.columns = ["pc1", "pc2"]
pc_res.head()
pc_res["y"] = y
The PCA plot looks very similar to the original data, and no straight line can separate the two groups.
sns.scatterplot(data=pc_res, x="pc1", y="pc2", hue=y)
PCA on non-linear circular data
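One way to see why the PCA plot resembles the original data: with two features and two components, PCA only centers the data and rotates (or reflects) it. The sketch below checks this on the same dataset. The names W, X_rotated, radius_before and radius_after are illustrative, not from the original post.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA

X, y = make_circles(n_samples=200, random_state=1, noise=0.1, factor=0.1)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Rows of components_ are the principal axes; with 2 of 2 components
# kept they form an orthogonal matrix (a rotation or reflection)
W = pca.components_
is_orthogonal = np.allclose(W @ W.T, np.eye(2))

# The PCA scores are the centered data expressed in the new axes
X_centered = X - X.mean(axis=0)
X_rotated = X_centered @ W.T
matches = np.allclose(X_pca, X_rotated)

# Distances from the data mean are unchanged, so the ring stays a ring
radius_before = np.linalg.norm(X_centered, axis=1)
radius_after = np.linalg.norm(X_pca, axis=1)

print("orthogonal axes:", is_orthogonal)
print("scores equal rotated data:", matches)
print("radii preserved:", np.allclose(radius_before, radius_after))
```

Because a rotation cannot move the inner cluster out from inside the ring, the two groups remain just as inseparable by a line after PCA as before.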
Dimensionality reduction with kernel PCA using scikit-learn
We now use the same data, but this time we apply kernel PCA using the KernelPCA() function in sklearn. The basic idea of kernel PCA is that we use a kernel function to project the nonlinear data into a higher-dimensional space where the groups are linearly separable, and then use regular PCA to reduce the dimensionality.
Here we use the KernelPCA() function with the rbf kernel to perform kernel PCA.
kpca = KernelPCA(kernel="rbf",
                 fit_inverse_transform=True,
                 gamma=10,
                 n_components=2)
X_kpca = kpca.fit_transform(X)
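To make the idea behind kernel PCA more concrete, here is a minimal from-scratch sketch with NumPy: build the RBF kernel matrix, center it, and take its leading eigenvectors. The function rbf_kernel_pca and the variables manual and reference are hypothetical names introduced for illustration. This is a sketch of the standard textbook procedure, not scikit-learn's actual implementation, and it is compared with KernelPCA only up to the sign of each component.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA


def rbf_kernel_pca(X, gamma, n_components):
    """Illustrative RBF kernel PCA on the training data (hypothetical helper)."""
    # Pairwise squared Euclidean distances
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0)  # guard against tiny negative values

    # RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2)
    K = np.exp(-gamma * sq_dists)

    # Center the kernel matrix in the implicit feature space
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order, so reverse them
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]

    # Projected training points: eigenvectors scaled by sqrt(eigenvalue)
    return eigvecs * np.sqrt(eigvals)


X, y = make_circles(n_samples=200, random_state=1, noise=0.1, factor=0.1)

manual = rbf_kernel_pca(X, gamma=10, n_components=2)
reference = KernelPCA(kernel="rbf", gamma=10, n_components=2).fit_transform(X)

# Eigenvectors are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(manual), np.abs(reference), atol=1e-6))
```

Note that the higher-dimensional space is never constructed explicitly. Everything is computed from the 200 x 200 kernel matrix of pairwise similarities.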
Let’s keep the results in the data frame as before.
kpca_res = pd.DataFrame(X_kpca)
kpca_res.columns = ["kpc1", "kpc2"]
kpca_res["y"] = y
kpca_res.head()
We can now visualize the PCs from kernel PCA with a scatter plot and clearly see that the two groups are linearly separable.
sns.scatterplot(data=kpca_res, x="kpc1", y="kpc2", hue=y)
PCA plot of nonlinear data with kernel PCA
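The visual impression can also be checked with a number. The sketch below asks how well a single cut point on the first component alone (one particular straight line in each plot) separates the two groups, for regular PCA and for kernel PCA. The helper best_threshold_accuracy is a hypothetical function written for this illustration, and the measure is a rough check on the training data rather than a proper classifier evaluation.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA


def best_threshold_accuracy(feature, labels):
    """Best accuracy of a single cut point on a 1-D feature (hypothetical helper)."""
    best = 0.0
    for t in np.unique(feature):
        pred = (feature >= t).astype(int)
        acc = np.mean(pred == labels)
        # The component's sign is arbitrary, so also allow the flipped rule
        best = max(best, acc, 1.0 - acc)
    return best


X, y = make_circles(n_samples=200, random_state=1, noise=0.1, factor=0.1)

X_pca = PCA(n_components=2).fit_transform(X)
X_kpca = KernelPCA(kernel="rbf", gamma=10, n_components=2).fit_transform(X)

acc_pca = best_threshold_accuracy(X_pca[:, 0], y)
acc_kpca = best_threshold_accuracy(X_kpca[:, 0], y)

print("best single-cut accuracy on PC1 (regular PCA):", acc_pca)
print("best single-cut accuracy on PC1 (kernel PCA): ", acc_kpca)
```

A cut on the first kernel PC should recover the groups far better than a cut on the first regular PC, which matches what the two scatter plots show.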