Principal component analysis (PCA) is one of the most widely used methods for dimensionality reduction in unsupervised learning. One of PCA's assumptions is that the data are linearly separable. Kernel PCA is a variant of PCA that can process nonlinear data and make it linearly separable.
If you’re wondering what linear separability is, the Python Machine Learning book we recently reviewed has a nice graphic illustrating it. Assuming we know the data come from two groups, if the data are linearly separable we can easily split them into their groups with a straight line, as shown below. If the data are nonlinear, however, a more complex polynomial function may be needed to partition them. Since classical PCA simply computes principal components as linear combinations of the original features, it cannot separate nonlinear data.
Linear vs. nonlinear problem
What happens when you apply regular PCA to a data set that is not linearly separable? And how can we handle such a data set? In this post, we will explore these questions with a scripted example.
Let’s start by loading all the packages needed to illustrate kernel PCA. We first use sklearn’s datasets module to create a nonlinear data set. Then we load the two modules that are useful for performing both regular PCA and kernel PCA with scikit-learn.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
To create nonlinear data, we will use make_circles() to generate circular data with two groups. Here we generate 200 data points from two groups: one group has a circular structure and the other contains random points concentrated at the center of the circle. The function make_circles() returns the data and the group assignment for each observation.
# Create nonlinear, circular data
X, y = make_circles(n_samples=200, random_state=1, noise=0.1, factor=0.1)
We will store the data in a pandas DataFrame together with the group-assignment variable.
We can use Seaborn’s scatterplot function to visualize the nonlinearity of the data.
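The original post's code for these two steps isn't shown here; below is a minimal sketch, reusing the X and y arrays from make_circles() above (the column names X1, X2, and label are our choice, not from the original):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles

# Same data as generated above with make_circles()
X, y = make_circles(n_samples=200, random_state=1, noise=0.1, factor=0.1)

# Store the data and the group assignment in a DataFrame
df = pd.DataFrame(X, columns=["X1", "X2"])
df["label"] = y

# Scatter plot colored by group reveals the two concentric circles
sns.scatterplot(data=df, x="X1", y="X2", hue="label")
plt.show()
```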
As expected, in this example we see that we have data from two groups with a distinct nonlinear pattern.
Non-linear circular data for kernel PCA
PCA for non-linear data
Now let’s apply regular PCA to this nonlinear data and see what the principal components look like. We use scikit-learn’s PCA function to perform the PCA.
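The post's own snippet isn't reproduced here; a minimal sketch of the regular-PCA step, again assuming the X and y arrays from make_circles() above (variable names are ours):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA

# Same circular data as above
X, y = make_circles(n_samples=200, random_state=1, noise=0.1, factor=0.1)

# Ordinary (linear) PCA with two components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Plot the first two principal components, colored by group;
# because PCA is a linear projection of 2-D data, the circles stay entangled
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```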
Dimensionality reduction with kernel PCA using scikit-learn
We now use the same data, but this time we apply kernel PCA using the KernelPCA() function in sklearn. The basic idea of kernel PCA is that we use a kernel function to project the nonlinear data into a higher-dimensional space where the groups become linearly separable, and then use regular PCA in that space to reduce the dimensionality.
Here we use the KernelPCA() function with the rbf kernel to perform kernel PCA.
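A minimal sketch of that call, again assuming the X and y arrays from above; the gamma value here is an illustrative choice, not taken from the original post:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Same circular data as above
X, y = make_circles(n_samples=200, random_state=1, noise=0.1, factor=0.1)

# Kernel PCA with an RBF kernel; gamma controls the kernel width
# (gamma=10 is a hypothetical value chosen for illustration)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)

# In the kernel PCA space the two groups become linearly separable
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y)
plt.xlabel("KPC1")
plt.ylabel("KPC2")
plt.show()
```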
James Gordon is a content manager for the website Feedbuzzard.