Clustering is a popular data mining technique, often used for segmentation and for outlier (exception) detection. In this short, free video, Rafal introduces these concepts, focusing on why clustering is useful for finding non-traditional segments. Log in or get a free account to watch it!
For example, you may be used to seeing customer sales segmented by geographical region. That is a common way to discuss and compare financial results at company meetings. Unfortunately, looking at sales this way might not show you what is really happening in a way that could help your company improve its performance. What if there were a completely different way to segment your customers, one that would show major, yet otherwise unknown, differences between sales?
You can find new ways to cluster data using SQL Server Data Mining. We will explain this process in the next, full-length video in this series. Once you have found your clusters, you need to analyse them so that you understand them and can give them meaningful names. Then, if you have tested your clustering model and it works, you will be able to apply it to any similar data to categorise it automatically. In the demo you can see how Excel data is categorised by using a clustering model: all you need to do is use the Query button from the free Microsoft Data Mining Add-ins for Office. Excel queries the model (which runs in SQL Server Analysis Services) and asks it to predict the name of the cluster to which each row in your sheet should belong. This process is very fast and very useful, and it is also an important step in getting to know your clusters: it is a good idea to apply a model to different sets of data to verify that the names you have given the clusters make sense. Indeed, spot the comment in Rafal’s video which shows that an even better name should have been applied to one of the clusters shown!
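To make the idea of applying a named clustering model to new rows more concrete, here is a minimal Python sketch. It is not how SQL Server Analysis Services or the Excel add-in work internally: it assumes a simple centroid-based model in which each cluster is summarised by one central point, and a new row is given the name of the nearest one. The cluster names, the two columns (average order value, orders per year) and all numbers are hypothetical, invented purely for illustration.

```python
import math

# Hypothetical named clusters, each summarised by a centroid:
# (average order value, orders per year). In practice these would come
# from a trained and tested model, not be typed in by hand.
CLUSTERS = {
    "Occasional big spenders": (900.0, 2.0),
    "Frequent small buyers": (40.0, 30.0),
    "Steady regulars": (150.0, 12.0),
}


def predict_cluster(row, clusters=CLUSTERS):
    """Return the name of the cluster whose centroid is closest to the row.

    Uses plain Euclidean distance on unscaled values to keep the sketch
    short; real data would normally be scaled first.
    """
    return min(clusters, key=lambda name: math.dist(row, clusters[name]))


# New, previously unseen rows, standing in for the rows of a worksheet.
new_rows = [(850.0, 3.0), (35.0, 28.0), (160.0, 10.0)]
labels = [predict_cluster(row) for row in new_rows]

for row, label in zip(new_rows, labels):
    print(row, "->", label)
```

Reading through the predicted names for rows you know well is a quick way to check whether the names you chose still make sense on fresh data.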
If you are interested in clustering, make sure to watch the 1-hour 50-minute, in-depth module Clustering in Depth, and please also review the remaining modules in this online course, starting with the Introduction to Data Mining.