6 crazy things Deep Learning and Topological Data Analysis can do with your data

Say you have a thousand columns and a million rows in your data set. Whichever way you look at it (small, medium or big data), you won’t be able to actually look at it: you can’t zoom in or out, and you can’t fit it on one screen. Blame human nature, but most of us understand a subject better when we get to see the bigger picture. Is there a way to put your data in one image and navigate it almost as you would a map?

Deep Learning combined with Topological Data Analysis can do exactly that and more. Here are 6 of the craziest things this technology can do with your data:

Based on the correlations between items and on learned patterns, the system places similar items together in groups. The result is a unique representation of your data that gives you better insight into it. Each node in the visualisation contains one or more data points, while links between nodes represent a high level of similarity between the items.

In this example, the algorithm identifies two distinct groups just by analysing users’ activities. A surprising characteristic separates the yellow dots from the blue ones: gender (females and males).

If we analyse by type of activity, one group mostly sends messages (males), while the other mostly receives them (females).
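The grouping idea described above can be sketched in a few lines of plain Python. This is only an illustration, not the tool's actual algorithm: it assumes that similarity is measured by Pearson correlation, that a link is drawn whenever the correlation reaches a hypothetical `threshold` of 0.9, and that a group is simply a set of items connected by links. The activity numbers are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length numeric rows."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a constant row carries no correlation signal
    return cov / (norm_a * norm_b)


def similarity_links(rows, threshold=0.9):
    """Return index pairs whose correlation is at least `threshold` (assumed cut-off)."""
    return [(i, j) for i, j in combinations(range(len(rows)), 2)
            if pearson(rows[i], rows[j]) >= threshold]


def groups_from_links(n_items, links):
    """Merge linked items into groups (connected components via union-find)."""
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in links:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up activity profiles: each row is a user, the columns are
# (messages sent, messages received, profile views, logins).
rows = [
    [9, 1, 4, 5],
    [8, 2, 5, 6],
    [1, 9, 5, 4],
    [2, 8, 4, 5],
]
links = similarity_links(rows)
groups = groups_from_links(len(rows), links)
print(links, groups)
```

With these toy rows, the two users who mostly send messages end up linked to each other, and so do the two who mostly receive them, which gives two separate groups.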

Segmentation is performed on many levels – from high-level categories to groups with similar data items.

In the example of a Netflix dataset, each data item is a movie. The highest-level groups are music, kids, foreign and adult movies. The middle level contains different segments, from Indian and Hong Kong to thriller and horror movies. On the lower level we’ve got a group of TV series such as “Jeeves and Wooster”, “The Office”, “Doctor Who” and others.
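One way to picture segmentation on several levels is classic agglomerative clustering: keep merging the two closest groups, and stop early for fine segments or late for broad categories. The sketch below is a swapped-in textbook technique (single linkage, Euclidean distance) used purely for illustration, and is not necessarily what produced the Netflix map. The eight 2-D points are hypothetical stand-ins for movie feature vectors.

```python
from math import dist  # available from Python 3.8


def agglomerate(points, n_groups):
    """Single-linkage agglomerative clustering down to `n_groups` clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                gap = min(dist(points[i], points[j])
                          for i in clusters[a] for j in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because b > a
    return sorted(clusters)


# Hypothetical 2-D feature vectors for eight items.
points = [
    (0.0, 0.0), (0.0, 1.0),      # fine segment A
    (3.0, 0.0), (3.0, 1.0),      # fine segment B
    (20.0, 0.0), (20.0, 1.0),    # fine segment C
    (23.0, 0.0), (23.0, 1.0),    # fine segment D
]
high_level = agglomerate(points, 2)  # broad categories
low_level = agglomerate(points, 4)   # finer segments
print(high_level)
print(low_level)
```

Every fine segment sits entirely inside one broad category, which is the nesting you see when zooming from high-level categories down to small groups.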

Any data can be segmented and understood if it can be presented as a matrix of numbers, where every row is a data item and every column is a parameter. These are the most common use cases:

Select some items and group them, and the algorithm will find all related or similar items. Repeat this process a few times and the neural network will learn the difference between, for example, texts about Mac hardware, PC hardware and general electronics.

An initial analysis of 20,000 articles on 20 different topics resulted in a dense cloud of points (left image). After Deep Learning was applied a few times, the algorithm grouped them with an error rate of just 1.2% (right image).

Deep Learning and Autoencoders loosely mimic human brain activity and can automatically identify high-level patterns in a dataset. For example, in the Google Brain project, autoencoders successfully trained themselves to recognise human and cat faces from 10 million digital images taken from YouTube videos.
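To make the autoencoder idea concrete, here is a deliberately tiny sketch in plain Python. It is nowhere near the scale or architecture of the Google Brain network. It assumes a linear autoencoder with a single hidden unit that squeezes each 2-D point into one number and then tries to rebuild the point from it. The data, starting weights, learning rate and epoch count are all made-up choices.

```python
def train_autoencoder(data, lr=0.02, epochs=2000):
    """Train a linear autoencoder with a single hidden unit.

    encode: code  = w . x       (2 numbers -> 1 number)
    decode: x_hat = v * code    (1 number  -> 2 numbers)
    """
    w = [0.1, 0.1]  # encoder weights (arbitrary small start)
    v = [0.1, 0.1]  # decoder weights (arbitrary small start)
    n = len(data)
    history = []    # mean squared reconstruction error per epoch
    for _ in range(epochs):
        grad_w, grad_v, loss = [0.0, 0.0], [0.0, 0.0], 0.0
        for x in data:
            code = w[0] * x[0] + w[1] * x[1]
            err = [v[k] * code - x[k] for k in range(2)]
            loss += (err[0] ** 2 + err[1] ** 2) / n
            back = err[0] * v[0] + err[1] * v[1]  # error passed back to the code
            for k in range(2):
                grad_v[k] += 2 * err[k] * code / n
                grad_w[k] += 2 * back * x[k] / n
        history.append(loss)
        for k in range(2):  # plain gradient-descent step
            w[k] -= lr * grad_w[k]
            v[k] -= lr * grad_v[k]
    return w, v, history


def reconstruct(x, w, v):
    """Compress a point to one number, then rebuild it."""
    code = w[0] * x[0] + w[1] * x[1]
    return [v[0] * code, v[1] * code]


# Made-up data with a hidden pattern: the second value is always twice the first.
data = [(t, 2 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, v, history = train_autoencoder(data)
rebuilt = reconstruct((1.0, 2.0), w, v)
print(history[0], history[-1], rebuilt)
```

Nobody tells the network that the second column is twice the first. It discovers that pattern on its own because that is the only way to rebuild two numbers from a single one, and the same principle lets much larger autoencoders pick up high-level patterns.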

I’ve been playing around with topological data analysis and deep learning lately and have developed a tool that brings these technologies together in one user-friendly interface, to help people see their data and the new possibilities it offers. Have a look at the website and let me know if you’d like to create a map of your data.
