Most modern datasets contain a large number of variables; in other words, the data is distributed along many dimensions. Exploring such data visually is challenging and often practically impossible to do manually, yet visual exploration is an important step in any data-related problem. It is therefore key to understand how to visualise high-dimensional datasets, which can be achieved with techniques known as dimensionality reduction. Principal component analysis (PCA) is a technique for reducing the number of dimensions in a dataset while retaining most of the information. It exploits the correlation between dimensions and seeks the smallest set of new variables that preserves as much as possible of the variation in the original data.
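As a minimal sketch of the idea, the snippet below projects a 30-variable dataset onto two principal components and reports how much of the total variance each component keeps. It assumes scikit-learn is installed and uses its bundled copy of the Breast Cancer Wisconsin (Diagnostic) dataset as an illustrative input; the choice of two components and the standardisation step are assumptions made for plotting convenience.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled copy of the dataset (569 samples, 30 variables).
X, y = load_breast_cancer(return_X_y=True)

# Standardise first: PCA is sensitive to the scale of each variable.
X_scaled = StandardScaler().fit_transform(X)

# Keep two components so the result can be drawn as a 2-D scatter plot.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)                    # (n_samples, 2)
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```

The `explained_variance_ratio_` values give a rough indication of how much information is lost by dropping the remaining dimensions.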
“t-Distributed stochastic neighbor embedding (t-SNE) minimizes the divergence between two distributions: a distribution that measures pairwise similarities of the input objects and a distribution that measures pairwise similarities of the corresponding low-dimensional points in the embedding”.
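The divergence in this definition is usually taken to be the Kullback-Leibler divergence. A sketch of the objective in the standard t-SNE notation is given below; the symbols are not taken from the text above and are assumed here for illustration: p_ij is the pairwise similarity of input objects i and j (built from Gaussian neighbourhoods in the original space), y_i is the low-dimensional point for object i, and q_ij is the corresponding similarity in the embedding, modelled with a Student t-distribution with one degree of freedom.

```latex
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}}
```

The points y_i are then moved, typically by gradient descent, so that C decreases.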
One drawback of PCA is that it is a linear projection, so it cannot capture non-linear dependencies. For instance, PCA would not be able to “unroll” a Swiss roll structure. This is because a linear projection is essentially like casting a shadow: there is no direction from which we can look at the Swiss roll that would open it up. This is the motivation behind t-SNE. Unlike PCA, t-SNE is not limited to linear projections, which makes it suited to a much wider range of datasets. We apply both techniques to the Breast Cancer Wisconsin (Diagnostic) dataset to detect whether a tumour is benign or malignant. On this data, t-SNE gives 8 to 16% higher accuracy than PCA. This result was presented by Rhishikesh Kadam and Vanita Mhaske at the National Conference on “Recent Advances in Statistics and Statistical Practice”, held at the Department of Statistics, Sardar Patel University, in February 2020.
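The text does not state how accuracy was measured, so the following is only a hedged sketch of one possible comparison, not the authors' procedure. It assumes scikit-learn's bundled copy of the dataset, reduces it to two dimensions with PCA and with t-SNE, and scores a k-nearest-neighbours classifier on each embedding with 5-fold cross-validation. The classifier, `n_neighbors=5`, `perplexity=30` and the `scores` dictionary are all illustrative choices. Because t-SNE cannot embed unseen points, both embeddings are fitted on the full dataset before cross-validation, which is acceptable for a visual comparison but is not a strict held-out evaluation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled copy of the dataset.
X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Two-dimensional embeddings from each technique (labels are not used here).
embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X_scaled),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_scaled),
}

# Hypothetical evaluation: k-NN accuracy on each 2-D embedding.
scores = {}
for name, emb in embeddings.items():
    clf = KNeighborsClassifier(n_neighbors=5)
    scores[name] = cross_val_score(clf, emb, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {scores[name]:.3f}")
```

The figures this prints depend on the classifier and on the t-SNE settings, so they should not be expected to reproduce the 8 to 16% gap reported above.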