Decision Trees are the most useful Microsoft data mining technique: they are easy to use, simple to interpret, and they work fast, even on very large data sets. In essence, a decision tree is just a tree of nodes. Each node represents a logical decision, which you can think of as the choice of a value of one of your inputs that makes the most profound difference to the output that you wish to study. Once you have tried a decision tree a few times, you will realise how easy and useful they are for understanding any set of data. This almost 2-hour, in-depth video by Rafal starts with an explanation of the three key uses of decision trees, which are: data classification, regression, and associative analysis, and then takes you on a comprehensive tour of this data mining algorithm, covering it in slides and detailed, hi-def demos. As this is a large module, make sure to use the “Jump to chapter” links in the right-hand column of the page.
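To make the idea of “the input that makes the most profound difference” concrete: the classic way to measure this is information gain, the reduction in entropy achieved by splitting on an attribute. This is a toy, language-agnostic sketch in Python, with made-up customer data; it is not the exact formula SSAS uses internally, just the standard entropy-based idea discussed later in the module.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one input."""
    total = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return total - weighted

# Hypothetical customer rows: the attribute with the highest gain
# becomes the split at the root node of the tree.
rows = [
    {"age": "young", "member": "yes"},
    {"age": "young", "member": "no"},
    {"age": "old",   "member": "yes"},
    {"age": "old",   "member": "yes"},
]
labels = ["loyal", "casual", "loyal", "loyal"]

best = max(["age", "member"], key=lambda a: information_gain(rows, labels, a))
print(best)  # → member
```

Here the "member" attribute separates loyal from casual customers perfectly, so it wins the split, exactly the kind of decision each node of the tree encodes.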

You can create a decision tree in several ways. It is simplest to start in Excel, using the Classify button on the Data Mining ribbon, as shown in the first demo, in which you can see how to classify customers in terms of their long-term loyalty to a retailer, as measured by the number of lifetime purchases. It is, however, more convenient to use SQL Server Data Tools (SSDT) to work with your decision trees on an ongoing basis, especially if you plan to change parameters, or you want to experiment with different content types, for example changing from discrete to continuous data, and so on. Rafal shows you the just-introduced version of this tool, now based on the shell of Visual Studio 2012.

Microsoft Decision Trees behave as three related, but significantly different techniques. The simplest, a flattened-data, case-level decision tree, is the one that you might use most often. A more advanced form of the trees uses nested cases to perform associative analysis, which is similar in nature to the Association Rules algorithm. It is used to find relationships between case-level attributes and the values of the nested key, as well as relationships between those keys. This technique builds a forest of decision trees, one for each value of the nested key, and then looks for relationships between the nodes of the trees in that forest. For example, you could use this technique to analyse customers and their demographic information (case-level inputs) and the purchases made by those customers (nested cases), as is shown in the extensive demo.
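To see what “one tree per value of the nested key” means in practice, it helps to picture how nested cases get flattened. This is a hypothetical sketch (invented customers and products, not the SSAS internals): each customer’s variable-length basket becomes one yes/no column per product, and each such column can then serve, in turn, as the predictable output of its own tree, with the case-level inputs and the other product columns as candidate inputs.

```python
# Hypothetical nested cases: each customer has a case-level attribute
# ("city") plus a variable-length basket, which plays the nested table.
customers = [
    {"city": "Leeds", "basket": {"milk", "bread"}},
    {"city": "Leeds", "basket": {"milk"}},
    {"city": "York",  "basket": {"bread", "jam"}},
]

# Every distinct nested-key value becomes its own column.
products = sorted(set().union(*(c["basket"] for c in customers)))

# Flatten: one boolean column per product, alongside the case-level inputs.
flattened = [
    {"city": c["city"], **{p: p in c["basket"] for p in products}}
    for c in customers
]

print(flattened[0])  # → {'city': 'Leeds', 'bread': True, 'jam': False, 'milk': True}
```

Growing one decision tree per product column, and then comparing the trees in the resulting forest, is the essence of the associative analysis described above.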

The third form of the trees is known as Regressive Decision Trees and it is used to model continuous data, such as income, profit, or sales, as opposed to discrete, or discretised data—if you are not sure what those terms mean, follow our Data Mining Concepts and Tools tutorial. Regressive trees are based on the well-known statistical concept of regression analysis, which creates a formula to predict an outcome by means of a mathematical function of known, continuous inputs. There is, however, an additional benefit of using a regressive decision tree over a simple regression formula. A tree is capable of including discrete data in a clever way: instead of building one formula, the tree is actually a tree of regression formulas, where each node is formed, as in a traditional decision tree, by making the best split in the tree, based on the input that provides the most information, or, in other words, that has the largest impact on the predictable outcome. This is, conceptually, related to splines. Our demo briefly shows how to test such a model, before using it, within Excel, to perform a live prediction (scoring) of profit potential for a set of prospective customers. Incidentally, the Microsoft Linear Regression algorithm is simply a Regressive Decision Tree without any children, that is with only one, top-level, root node!
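The “tree of regression formulas” idea can be sketched in a few lines of plain Python. This toy example (invented data, hypothetical segment names, ordinary least squares rather than the SSAS implementation) makes one discrete split on a sales channel and then fits a separate linear formula under each branch, a miniature of how a regressive tree embeds a formula in every leaf:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for the cases in one leaf."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Toy data: profit depends linearly on income, but with a different
# formula per channel -- exactly what a single global line cannot capture.
data = [
    ("retail", 10, 25), ("retail", 20, 45), ("retail", 30, 65),  # 2*income + 5
    ("online", 10, 35), ("online", 20, 65), ("online", 30, 95),  # 3*income + 5
]

# One discrete split on the channel, then one regression formula per leaf.
leaves = {}
for channel in {"retail", "online"}:
    xs = [inc for ch, inc, _ in data if ch == channel]
    ys = [p for ch, _, p in data if ch == channel]
    leaves[channel] = fit_line(xs, ys)

def predict(channel, income):
    a, b = leaves[channel]
    return a * income + b

print(predict("online", 40))  # → 125.0
```

Note how the footnote in the text falls out naturally: if the tree never splits, there is one leaf and therefore exactly one formula, which is linear regression.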

To get the most from Microsoft Decision Trees, you can parametrise them. The COMPLEXITY_PENALTY parameter lets you control tree growth: a lower value permits a larger, deeper tree that may be more accurate, yet harder to read, while a higher value yields a smaller, often easier-to-understand tree. SPLIT_METHOD makes it possible to build binary trees, where each node has exactly two children, or complete trees, where each node branches into all possible (and meaningful) values. SCORE_METHOD is the most interesting, but perhaps the least useful parameter, as it entirely changes the tree-building process by using a different formula for deciding when to make a split, that is when to create a new node, and how to select the most meaningful attribute (input column). There are three options that you can use: Entropy, Bayesian with K2 Prior, and Bayesian Dirichlet Equivalent with Uniform Prior (BDE). The entropy technique is the simplest: it finds attributes that have the largest chance of making a difference to the output, but it disregards prior knowledge already encoded in the higher levels of the tree, so it can be somewhat blind to what a person would consider an important insight. The remaining two methods use that knowledge, referred to in data mining as priors, but they do so in slightly different ways: K2 uses a constant value, while BDE creates a weighted support for each predictable state based on the level in the tree and node support. Our video also explains the remaining parameters, which are more generic in nature: MAXIMUM_INPUT_ATTRIBUTES, MAXIMUM_OUTPUT_ATTRIBUTES, and MINIMUM_SUPPORT.
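The effect of a complexity penalty is easy to demonstrate with a toy gate on the split decision. The exact formula SSAS applies for COMPLEXITY_PENALTY is internal to the algorithm, so treat this as a stand-in that shows the behaviour, not the implementation: a split is accepted only if its entropy reduction beats the penalty, so a higher penalty suppresses marginal splits and keeps the tree smaller.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def should_split(parent, children, penalty):
    """Accept a candidate split only if its information gain beats the penalty.

    A toy stand-in for COMPLEXITY_PENALTY: higher penalty -> fewer splits
    survive -> a smaller, shallower tree.
    """
    n = len(parent)
    gain = entropy(parent) - sum(len(c) / n * entropy(c) for c in children)
    return gain > penalty

parent = ["loyal", "loyal", "loyal", "casual"]
children = [["loyal", "casual"], ["loyal", "loyal"]]

print(should_split(parent, children, penalty=0.1))  # → True: marginal split kept
print(should_split(parent, children, penalty=0.5))  # → False: same split suppressed
```

The same candidate split survives a low penalty and is pruned away by a high one, which is the accuracy-versus-readability trade-off described above.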