Decision Trees in Depth

29 March 2013 · 4872 views

Classification, tree-based and linear regression, and associative analysis

Decision Trees are the most useful Microsoft data mining technique: they are easy to use, simple to interpret, and they work fast, even on very large data sets. In essence, a decision tree is just a tree of nodes. Each node represents a logical decision, which you can think of as a choice of a value of one of your inputs that would make the most profound difference to the output that you wish to study. Once you try a decision tree a few times, you will realise how easy and useful they are for understanding any set of data. This almost 2-hour, in-depth video by Rafal starts with an explanation of the three key uses of decision trees, which are: data classification, regression, and associative analysis, and then takes you on a comprehensive tour of this data mining algorithm, covering it in slides and detailed, hi-def demos. As this is a large module, make sure to use the “Jump to chapter” links, in the right-hand column of the page.
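The idea of a node as a single, most-informative decision can be sketched in plain Python. This is a generic illustration of the concept, not Microsoft's implementation, and the attribute names and thresholds are hypothetical:

```python
# A decision tree is a tree of nodes; each node tests one input attribute.
# Minimal sketch: classify a customer's loyalty by walking such nodes.
# Attribute names and split values are made up for illustration.

def classify(customer):
    # Each branch is one "logical decision" on a single input.
    if customer["purchases_per_year"] >= 12:
        if customer["years_as_customer"] >= 3:
            return "loyal"
        return "promising"
    return "occasional"

print(classify({"purchases_per_year": 15, "years_as_customer": 5}))  # loyal
```

A real algorithm chooses which attribute to test at each node automatically, by measuring which input makes the biggest difference to the output; here the choices are hard-coded purely to show the shape of the result.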

You can create a decision tree in several ways. It is simplest to start in Excel, using the Classify button on the Data Mining ribbon, as shown in the first demo, in which you can see how to classify customers in terms of their long-term loyalty to a retailer, as measured by the number of lifetime purchases. It is, however, more convenient to use SQL Server Data Tools (SSDT) to work with your decision trees on an ongoing basis, especially if you plan to change parameters, or you want to experiment with different content types, for example changing from discrete to continuous data, and so on. Rafal shows you the just-introduced version of this tool, now based on the shell of Visual Studio 2012.

Microsoft Decision Trees behave as three related, but significantly different, techniques. The simplest, a flattened-data (case-level) decision tree, is the one that you might use most often. A more advanced form of the trees uses nested cases to perform associative analysis, which is similar in nature to the Association Rules algorithm. It is used to find relationships between case-level attributes and the values of the nested key, as well as relationships between those keys. This technique builds a forest of decision trees, one for each value of the nested key, and then looks for relationships between the nodes of the trees in that forest. For example, you could use this technique to analyse customers and their demographic information (case-level inputs) and the purchases made by those customers (nested cases), as is shown in the extensive demo.
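The nested-case idea can be made concrete with a rough sketch: treat each nested-key value (each product) as its own yes/no target, and ask which case-level attribute predicts it. The data and attribute names below are hypothetical, and this only hints at what the algorithm does; it is not the SSAS implementation:

```python
# Associative analysis with nested cases, sketched by hand.
# Each customer is a "case" with case-level attributes plus a nested
# set of purchases. One mini-analysis ("tree") per product.

customers = [
    {"age_group": "young",  "city": "Oxford", "bought": {"bike", "helmet"}},
    {"age_group": "young",  "city": "London", "bought": {"bike"}},
    {"age_group": "senior", "city": "Oxford", "bought": {"car"}},
    {"age_group": "senior", "city": "London", "bought": {"car", "bike"}},
]

def buy_rate(product, attr, value):
    # Share of customers with attr == value who bought the product.
    rows = [c for c in customers if c[attr] == value]
    return sum(product in c["bought"] for c in rows) / len(rows)

# A tiny "forest": one question per nested-key value (product).
for product in ("bike", "car"):
    for value in ("young", "senior"):
        print(product, value, buy_rate(product, "age_group", value))
```

In this toy data, every young customer bought a bike and only seniors bought cars, which is exactly the kind of case-level-to-nested-key relationship the forest of trees surfaces.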

The third form of the trees is known as Regressive Decision Trees and it is used to model continuous data, such as income, profit, or sales, as opposed to discrete, or discretised, data (if you are not sure what those terms mean, follow our Data Mining Concepts and Tools tutorial). Regressive trees are based on the well-known statistical concept of regression analysis, which creates a formula to predict an outcome by means of a mathematical function of known, continuous inputs. There is, however, an additional benefit of using a regressive decision tree over a simple regression formula. A tree is capable of including discrete data in a clever way: instead of building one formula, the tree is actually a tree of regression formulas, where each node is formed, as in a traditional decision tree, by making the best split based on the input that provides the most information, or, in other words, the one that has the largest impact on the predictable outcome. This is, conceptually, related to splines. Our demo briefly shows how to test such a model, before using it, within Excel, to perform a live prediction (scoring) of profit potential for a set of prospective customers. Incidentally, the Microsoft Linear Regression algorithm is simply a Regressive Decision Tree without any children, that is with only one, top-level, root node!
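The "tree of regression formulas" idea can be sketched in a few lines. The split, the attribute names, and the coefficients below are all hypothetical, chosen only to show the structure; a real regression tree would learn both the split and the formulas from data:

```python
# A regressive decision tree: a discrete split first (as in a normal
# decision tree), then a separate linear regression formula per node.
# Coefficients are invented for illustration only.

def predict_profit(customer):
    if customer["segment"] == "business":
        # Regression formula for the "business" node.
        return 120.0 + 0.08 * customer["income"]
    # A different formula for the other node.
    return 40.0 + 0.02 * customer["income"]

print(predict_profit({"segment": "business", "income": 50_000}))  # 4120.0
```

Note how collapsing the tree to a single root node would leave exactly one linear formula, which matches the observation above that Microsoft Linear Regression is a regressive tree with only the root.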

To get the most from Microsoft Decision Trees, you can parametrise them. The COMPLEXITY_PENALTY parameter lets you choose between a bushier tree, which is often easier to understand, and a slender, deeper tree, which may be more accurate, yet harder to read, in some cases. SPLIT_METHOD makes it possible to build binary trees, where each node has exactly two children, or complete trees, where each node represents all possible (and meaningful) values. SCORE_METHOD is the most interesting, but perhaps the least useful, parameter, as it entirely changes the tree-building process by using a different formula for deciding when to make a split, that is when to create a new node, and how to select the most meaningful attribute (input column). There are three options that you can use: Entropy, Bayesian with K2 Prior, and Bayesian Dirichlet Equivalent with Uniform Prior (BDE). The entropy technique is the simplest: it finds attributes that have the largest chance to make a difference to the output, but it disregards prior knowledge already encoded in the higher levels of the tree, therefore it can be somewhat blind to what a person would consider an important insight. The remaining two methods use that knowledge, referred to in data mining as priors, but they do it in slightly different ways: K2 uses a constant value, while BDE creates a weighted support for each predictable state based on the level in the tree and node support. Our video also explains the remaining parameters, which are more generic in nature: MAXIMUM_INPUT_ATTRIBUTES, MAXIMUM_OUTPUT_ATTRIBUTES, and MINIMUM_SUPPORT.
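The entropy option of SCORE_METHOD is easiest to see with a small, self-contained sketch: score a candidate split by the information it gains, i.e. how much it reduces the weighted entropy of the target. The data is made up, and this is the textbook information-gain calculation, not Microsoft's exact scoring code:

```python
import math

# Entropy-based split scoring: a split is good when its branches are
# "purer" (lower weighted entropy) than the unsplit data.

def entropy(labels):
    total = len(labels)
    return -sum(
        (labels.count(v) / total) * math.log2(labels.count(v) / total)
        for v in set(labels)
    )

def info_gain(rows, attr, target):
    base = entropy([r[target] for r in rows])
    weighted = 0.0
    for value in {r[attr] for r in rows}:
        branch = [r[target] for r in rows if r[attr] == value]
        weighted += len(branch) / len(rows) * entropy(branch)
    return base - weighted  # larger gain = better split candidate

rows = [
    {"city": "Oxford", "loyal": "yes"},
    {"city": "Oxford", "loyal": "yes"},
    {"city": "London", "loyal": "no"},
    {"city": "London", "loyal": "yes"},
]
print(round(info_gain(rows, "city", "loyal"), 3))  # 0.311
```

The Bayesian options (K2 and BDE) replace this score with one that also weighs prior knowledge from higher levels of the tree, which is exactly the shortcoming of plain entropy described above.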


  • Introduction to Data Mining with Microsoft SQL Server 24-min Watch with Free Subscription

  • Data Mining Concepts and Tools 50-min

  • Data Mining Model Building, Testing and Predicting with Microsoft SQL Server and Excel 1-hour 20-min

  • What Are Decision Trees? 10-min Free—Watch Now

  • Decision Trees in Depth 1-hour 54-min

  • Why Cluster and Segment Data? 9-min Watch with Free Subscription

  • Clustering in Depth 1-hour 50-min

  • What is Market Basket Analysis? 10-min Watch with Free Subscription

  • Association Rules in Depth 1-hour 35-min

  • HappyCars Sample Data Set for Learning Data Mining

  • Additional Code and Data Samples (R, ML Services, SSAS) Get with Free Subscription

Purchase a Full Access Subscription

Individual Subscription

Access all content on this site for 1 year.

Group Purchase

from $480/year

For small business & enterprise.
  • You can also redeem a prepaid code.
  • Payments are instant and you will receive a tax invoice straight away.
  • We offer sales quotes/pro-forma invoices, and we accept purchase orders and bank transfers.
  • Your satisfaction is paramount: we offer a no-quibble refund guarantee.
  • See pricing FAQ for more detail.
In collaboration with
Project Botticelli · Oxford Computer Training · SQLBI · Prodata