This document introduces clustering and describes the clustering algorithms that Datagran offers, along with their output tables. Hope you enjoy it.

Unsupervised Techniques

No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).

Unlabeled data: We have the input features X, but we do not have the labels Y.

The goal in such unsupervised learning problems may be to discover groups of similar examples within the data or to determine how the data is distributed in the space.

Clustering

Clustering is often considered the most important unsupervised learning problem. Like every other problem of this kind, it deals with finding structure in a collection of unlabeled data.

It is the task of identifying similar instances and assigning them to clusters, or groups of similar instances.
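To make the idea of "assigning similar instances to clusters" concrete, here is a minimal sketch of k-means, one common clustering algorithm. It is an illustration only, not necessarily what Datagran runs: the function name, the sample points, and the simple farthest-point initialization are all assumptions chosen to keep the example short and deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Tiny k-means sketch: returns (centroids, labels) for a list of numeric tuples."""
    # Assumed initialization: start at the first point, then repeatedly add
    # the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Two visibly separate groups of 2-D points (made-up values).
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (8.0, 8.1), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The first three points end up in one cluster and the last three in the other, without any labels being provided.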

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/7762de39-3e97-4e32-9c23-70b945a09b6b/unnamed_(4).png

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/3656be9f-c78c-482f-8a52-9adea0f71264/unnamed_(5).png

Clustering Applications

Customer Segmentation

You can cluster your customers based on their purchases and their activity on your website.

This is useful to understand who your customers are and what they need.
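Before customers can be clustered, their purchases and activity have to be summarized as one row of numbers per customer. The sketch below shows one possible way to do that. The event log, the customer names, and the three features are hypothetical and only meant to illustrate the shape of the input a clustering algorithm expects.

```python
from collections import defaultdict

# Hypothetical event log: (customer_id, order_value, page_views_in_session).
events = [
    ("ana", 20.0, 3),
    ("ana", 35.0, 5),
    ("ben", 250.0, 12),
    ("ana", 15.0, 2),
    ("ben", 310.0, 9),
]

# Running totals per customer: [order count, total spend, total page views].
totals = defaultdict(lambda: [0, 0.0, 0])
for customer, value, views in events:
    totals[customer][0] += 1
    totals[customer][1] += value
    totals[customer][2] += views

# One feature vector per customer:
# (order count, average order value, average page views per order).
features = {
    customer: (orders, spend / orders, views / orders)
    for customer, (orders, spend, views) in totals.items()
}
print(features["ben"])  # (2, 280.0, 10.5)
```

In practice the features would usually be scaled to comparable ranges before clustering, so that a large-valued column such as spend does not dominate the distance calculation.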

Semi-supervised learning

If you only have a few labels, you could perform clustering and propagate the labels to all instances in the same cluster. This increases the number of labels available to subsequent supervised algorithms and can thus improve their performance.
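The propagation step can be sketched as follows, assuming the cluster assignments already exist. The function name, the majority-vote rule, and the "churn" / "loyal" labels are illustrative assumptions, not a description of a specific product feature.

```python
from collections import Counter


def propagate_labels(cluster_ids, known_labels):
    """Fill in missing labels with the most common known label in each cluster.

    cluster_ids: cluster index per instance.
    known_labels: label per instance, or None where unlabeled.
    """
    votes = {}
    for cluster, label in zip(cluster_ids, known_labels):
        if label is not None:
            votes.setdefault(cluster, Counter())[label] += 1
    majority = {
        cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()
    }
    # Existing labels are kept; clusters without any labeled member stay None.
    return [
        label if label is not None else majority.get(cluster)
        for cluster, label in zip(cluster_ids, known_labels)
    ]


cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
known_labels = ["churn", None, None, None, "loyal", "loyal", None, None]
full_labels = propagate_labels(cluster_ids, known_labels)
print(full_labels)
# ['churn', 'churn', 'churn', 'loyal', 'loyal', 'loyal', None, None]
```

Three known labels become six. Cluster 2 has no labeled member, so its instances remain unlabeled rather than receiving a guess.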

Anomaly Detection

Any instance that has low affinity to all clusters is likely to be an anomaly. If you have clustered the users of your website based on their behavior, you can detect users with unusual behavior, such as an unusual number of requests per second. The same idea is useful for detecting defects in manufacturing and for fraud detection.
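One simple way to read "low affinity to all clusters" is "far from every cluster center". The sketch below flags such points. The centroids, the user features, and the threshold of 5.0 are hypothetical values picked for illustration, and distance to the nearest centroid is only one of several possible affinity measures.

```python
import math


def find_anomalies(points, centroids, threshold):
    """Return the points whose nearest centroid is farther away than threshold."""
    anomalies = []
    for p in points:
        nearest = min(math.dist(p, c) for c in centroids)
        if nearest > threshold:
            anomalies.append(p)
    return anomalies


# Hypothetical features per user: (requests per second, pages per session).
# The centroids are assumed to come from an earlier clustering run.
centroids = [(0.5, 4.0), (2.0, 12.0)]
users = [(0.4, 5.0), (2.2, 11.0), (0.6, 3.5), (45.0, 1.0)]
anomalies = find_anomalies(users, centroids, threshold=5.0)
print(anomalies)  # [(45.0, 1.0)]
```

The user sending 45 requests per second sits far from both cluster centers and is the only one flagged. Choosing the threshold is a judgment call that depends on the data.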