Additional Code and Data Samples (R, ML Services, SSAS)

23 December 2016 · 1215 views

Classifier performance, mortgage default logistic regression, cross-sell and recommendations

Whilst testing classifier performance, it is helpful to compare a number of classification accuracy visualisations. There are many ways to do that in R, but SSAS Data Mining and Azure Machine Learning do not support the same broad set of diagnostic visualisations. You can run this simple R code to validate a classifier built in SSAS, Azure ML, or any other environment, by downloading it from the list below.

It plots ROC, precision-recall, cost and lift curves, calculates the optimum probability threshold, and prints a confusion matrix and additional metrics for any two-class machine learning classifier. The only required inputs are a vector of known outcomes and a vector of predicted probabilities. As a bonus, this code will look up the optimum prediction probability threshold for a given ratio of the cost of a False Positive to the cost of a False Negative.
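To make the threshold lookup more concrete, here is a minimal Python sketch of the general idea. It is not the downloadable R script, and it may differ from it in detail. The function names `confusion_counts` and `best_threshold`, the `fp_cost` and `fn_cost` parameters, and the toy data are all hypothetical, chosen for illustration only. The sketch assumes outcomes are coded 1 (positive) and 0 (negative), and that a case is predicted positive when its probability is at or above the threshold.

```python
from typing import List, Sequence, Tuple


def confusion_counts(
    actual: Sequence[int], prob: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_threshold(
    actual: Sequence[int],
    prob: Sequence[float],
    fp_cost: float = 1.0,
    fn_cost: float = 1.0,
) -> Tuple[float, float]:
    """Return (threshold, total_cost) minimising fp_cost*FP + fn_cost*FN.

    Candidate thresholds are the distinct predicted probabilities, plus
    infinity (meaning: never predict positive). Ties keep the lowest threshold.
    """
    candidates: List[float] = sorted(set(prob)) + [float("inf")]
    best_t, best_cost = candidates[0], float("inf")
    for t in candidates:
        _, fp, _, fn = confusion_counts(actual, prob, t)
        cost = fp_cost * fp + fn_cost * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Toy data (hypothetical): known outcomes and predicted probabilities.
actual = [0, 0, 1, 0, 1, 1, 0, 1]
prob = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

if __name__ == "__main__":
    # Equal costs, then a False Positive costing five times a False Negative.
    print(best_threshold(actual, prob))
    print(best_threshold(actual, prob, fp_cost=5, fn_cost=1))
    print(confusion_counts(actual, prob, 0.35))
```

On this toy data, making False Positives more expensive pushes the chosen threshold upwards, because the search then prefers to miss a few positives rather than raise false alarms.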

These sample data sets have been prepared and used by Rafal on his Practical Machine Learning and Data Science classroom-style courses. They include a working SQL Server 2017/2019 Machine Learning Services mortgage analysis R script and the SQL Server 2016+ .bak file containing the 10 million rows of data for this demo, as well as code showing how to write DMX for predicting recommendations using the Association Rules algorithm in SSAS Data Mining, with and without buyer-level demographic details.

Please note that we do not provide any support for the demo files available below, and they are provided as-is. By downloading them you hereby accept the terms of the Apache License 2.0 under which they are being distributed.

  • Classifier performance in R (ROC, lift, cost, precision-recall, prediction probability threshold) or get it from GitHub
  • SQL Server SSAS Association Rules recommendation/cross-sell predictions in DMX (with and without demographics)
  • Mortgage default logistic regression script in Microsoft R for SQL Server ML Services
  • Mortgages.bak SQL Server 2016+ database backup with 10 million rows (approx. 86 MB download). Please note that this data has been derived from a Microsoft-owned demo available on MSDN. Although Microsoft do not provide this data as a SQL Server database, they retain the rights to this data, and so this file, although created by, is not owned by Tecflix Ltd or Project Botticelli Ltd.
  • If you are looking for our educational machine learning data set, HappyCars, it is available here



