Tecflix

Microsoft Machine Learning Technologies: View Towards 2020

24 December 2019 · 1128 views

The Future Series (2019)

New Azure Machine Learning: Performance Metrics

If you would like to learn about what is new (literally just a couple of weeks old!) in Azure Machine Learning, and what is about to come in the next few months and beyond, watch this video. I, Rafal Lukawiecki, present a live, 30-min demo tour of the new platform, comparing it to its predecessor, now called Azure ML Studio Classic. Following the overview of the current Microsoft platform as available at the start of 2020, I discuss the current state and the future of the remainder of the Microsoft platform for machine learning, which includes the powerful ML Server and SQL Server ML Services, the 2019 Big Data Clusters, and a number of machine learning frameworks, notably Automated ML, ML.NET, MicrosoftML, RevoScale, and MMLSpark. I also summarise other, non-Microsoft ML technologies in terms of their current popularity in our industry.

I have specialised in machine learning for over a decade, since I started my first data mining projects for business customers. I have used almost every Microsoft ML technology since 2008, and plenty of non-Microsoft ones, like RapidMiner and WEKA, and, of course, a lot of R and a smattering of Python-based ones in the last few years. I have used Azure ML (Classic) extensively, mainly for prototyping and starting new projects, as well as teaching. I am pleased that after several false starts Microsoft have, at last, provided us with a relatively easy to use, powerful, and reasonably priced machine learning environment in the shape of the new Azure ML. To be honest, I had almost given up on it, as some of the earlier iterations felt like you had to plumb your way all the way to the moon while figuring out how to forge hot iron! Azure ML was getting so much in your way of doing real-world projects that I kept giving it a wide berth.

To my pleasant surprise, the current iteration actually makes sense, and thankfully, gets out of the way of doing data science. Everything is provisioned for you, no more figuring out what obscure, undocumented Kubernetes parameters will make the difference between drowning in a rabbit hole or winning the bingo. You can forget about the plumbing, until you need it. You might be thinking this is because Microsoft, at last, added the visual UI, now called the Designer—this is name number three I think, and it may change again—and I can hear some of you thinking “maaaybe this guy does not like to write code”. Well, thank you, but having written some half a million lines of code—which is not that many, actually—I know when I want to code and when I would rather get started visually, by dragging stuff around.

When is the visual designer appropriate, then? Often. When in front of the customer, when prototyping or designing models. When working with higher abstractions, or users who know what they need, like your business sponsors, but who are allergic to code. Or when I am lazy. Or when I just want the pretty visual thing. So, is the new designer great? Not yet. It is getting there, it almost has what the old Azure ML Studio Classic had, and it will get there. Let me share some bad and good news with you now.

The bad is that it is pretty slow. I realise that Microsoft have not optimised it yet, it seems like there is no caching of anything—and I remember Classic being like that some 5 years ago when Roger Barga, who managed that team at Microsoft at the time, showed it to me in an early preview. Over time, AML Studio Classic got very snappy. For a small-to-medium data set you click a button and you have a decision tree and a simple neural network in 2–5 minutes. Need to re-run it? No probs, the Classic version of Azure ML will give you the goods in 54 seconds or so. The new AML Designer will take…22 minutes. Maybe 12 if you have just run it before. Why is it so? I am not sure, but I suppose shuffling all of those Docker containers is a bit like trying to load and unload a packed ship in a terminal where everybody is queuing and watching. I am pretty sure Microsoft will sort it out—maybe they need some pre-provisioned, Azure ML-optimised compute for us at the ready when we press Run?

Now the good news. First of all, the bad news isn’t so bad if you have a lot of data, as a 20-minute overhead on a job that takes hours now but used to take days is good news. But I have even better, real good news: the new AML Studio architecture allows you to mix the work you are doing in the visual Designer with anything you prefer to code, and even with the models that come from Automated ML. No matter how you build your models, all those approaches are now first-class citizens of the new Azure ML. It was impossible in the old Azure ML Classic: there was the GUI, you could stick some R or Python inside it, but there was no way to access the models or any of the plumbing by code, except by calling the web service. Some clever PowerShell scripts tried to reverse-engineer it, but it was a hack at best. The old Azure ML Classic simply lacked programmability and scriptability. It was impossible to manipulate or extract the models. Managing deployed services was a pain too, even though creating new ones was a breeze.

Of course, you, or someone you work with, will code a lot even if you use the visual designer. Most of my work is not deep learning, but normal, business-focused data science, which, by the way, I expect to continue and grow in popularity for all of us (see my previous video about the trends in ML and DS). I use a lot of statistics and plenty of traditional machine learning/statistical learning like Decision Trees, Logistic Regression or all sorts of approaches for anomaly detection, or for time series. I use a lot of R, because that is where most of the cutting-edge development happens in this discipline. I am glad that I can run all of that code in the new Azure ML while being able to deploy and operationalise it as the new web services, now called Realtime Endpoints.

Two more interesting features of the new Azure ML, which I show in some detail in the 30-min demo, concern the way you can use your compute resources from within the new Azure ML Studio. The obvious compute resources are the Training and Inference (Deployment) Compute clusters. If you need something bigger at your disposal, like hundreds of compute cores, it is easy to allocate them to your training Runs.

On the subject of VMs to use, consider using Low Priority VMs for your training clusters, as they are about 80% cheaper than the regular ones. Oh, and Microsoft are in the process of renaming those as Spot pricing, matching what Amazon Web Services (AWS) have called them for a hundred years. The downside is that they can disappear while you are using them, but that is rarely an issue for ML training. Besides, new ones will re-spawn, as long as they are available in the region. I suspect at some stage Microsoft will copy the AWS way of designing a fleet of VMs that contains dedicated and spot instances to have a balance of price and reliability, which would be useful for Inference Compute. By the way, make sure to request an increase in the core quota for your favourite machines, as the default of 24 is a bit skimpy. I got my requests approved within 2 minutes. Thank you, Microsoft, for making it easier for me to spend money with you!
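To get a feel for that trade-off, here is a minimal back-of-the-envelope sketch in Python. It assumes the roughly 80% discount quoted above; the hourly rate, the cluster size, and the share of work repeated after nodes disappear are all made-up numbers, and the `training_cost` helper is hypothetical, not part of any Azure SDK.

```python
def training_cost(node_hours, hourly_rate, discount=0.0, rework_fraction=0.0):
    """Estimate the cost of a training run.

    rework_fraction is the extra share of node-hours that has to be
    repeated because low-priority nodes were taken away mid-run.
    """
    effective_hours = node_hours * (1.0 + rework_fraction)
    return effective_hours * hourly_rate * (1.0 - discount)


# All figures below are illustrative assumptions, not real Azure prices.
HOURLY_RATE = 1.00   # hypothetical price per node-hour on regular VMs
NODE_HOURS = 40.0    # e.g. 10 nodes running for 4 hours
DISCOUNT = 0.80      # "about 80% cheaper", as quoted in the text

dedicated = training_cost(NODE_HOURS, HOURLY_RATE)
low_priority = training_cost(
    NODE_HOURS, HOURLY_RATE, discount=DISCOUNT, rework_fraction=0.25
)

# Rework fraction at which low priority stops being cheaper:
# (1 + r) * (1 - d) = 1  =>  r = d / (1 - d)
break_even_rework = DISCOUNT / (1.0 - DISCOUNT)

print(f"dedicated:    {dedicated:.2f}")
print(f"low priority: {low_priority:.2f}")
print(f"break-even rework fraction: {break_even_rework:.1f}")
```

Under these assumptions, even a run that loses a quarter of its work to disappearing nodes costs about a quarter of the dedicated price, and you would have to redo the whole job roughly four extra times before the discount stopped paying off.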

The other compute type is the very new Compute Instance (CI). Compute Instances will replace Notebook VMs in all regions, eventually, but for now you can only find them in North Central US and UK South regions. I really like that idea: those machines are my development environment, running the new JupyterLab, or the older Jupyter, and of course, my favourite, RStudio and RStudio Server, which I use all the time, and which I teach in my courses. CIs also support VS Code and you can ssh into them to kill an errant process etc. Best of all, storage for your files is mounted to your CI so you can share it with other CIs and with your Pipeline Drafts, the new name for the models or programs that you build in code or via the Designer. I show much more about the new Azure ML in the video: check out how I build a very simple model and examine its accuracy metrics.

My demo also shows the new-ish Automated ML. Any automated machine learning is limited, but it does save a good bit of time at the early stages of your projects, or if you don’t know your data yet. Its principle is simple: throw everything you can at the data and see what sticks to the wall. More seriously, it will try different combinations of algorithms and some automated data preparation steps, and then use your chosen performance metric (please do not use Accuracy as shown in the video but something better like Precision-Recall or F1 score) to tell you which model is the “best” one. Once you are in the right ballpark, you can start feature engineering and manual hyperparameter tuning to get a great model, that is, one which is not necessarily very accurate, but which is sufficiently reliable, meaning that it will work on real-world data. These are subjects my students learn extensively in my courses.
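The warning about Accuracy is easy to see on an imbalanced data set. The sketch below is plain Python with invented numbers (1000 customers, 20 of whom churn); neither "model" comes from the video, and the helper names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def confusion(y_true, y_pred):
    """Count the four outcomes of a binary problem where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def metrics(c):
    """Accuracy, precision, recall and F1, with 0.0 where a ratio is undefined."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented data: 20 churners (positive class) followed by 980 loyal customers.
y_true = [1] * 20 + [0] * 980

# A "model" that never predicts churn.
always_no = [0] * 1000

# A hypothetical model that finds 12 of the 20 churners and raises 18 false alarms.
some_model = [1] * 12 + [0] * 8 + [1] * 18 + [0] * 962

lazy = metrics(confusion(y_true, always_no))
useful = metrics(confusion(y_true, some_model))

print("never predicts churn:", lazy)
print("hypothetical model:  ", useful)
```

The model that never predicts churn wins on accuracy (0.98 against 0.974) while being useless, with an F1 of zero against about 0.48 for the model that actually finds churners. That is why the choice of the primary metric matters when an automated search picks the "best" model for you.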

Towards the end of this video I share a few observations about the likely future of other Microsoft technologies, notably Microsoft ML Server and ML Services inside SQL Server. I am a fan of this SQL+ML combination, but it has been neglected by Microsoft recently, and I am not sure there will be much innovation going that way. Why? I think Microsoft are coalescing their frameworks around Apache Spark, sold as Azure Databricks and SQL Server 2019 Big Data Clusters, as their preferred parallelisation engine, and less so using their older, proprietary methods. However, there is a lot of great stuff in SQL Server ML today! The PREDICT T-SQL statement is amazing, and with sp_rxPredict you can get nanosecond-scale scoring inside the database. My customers like that approach a lot. No web services need to be killed in the process, it is reliable, simple, almost old-tech, except very relevant to real-world applications. Do you think Netflix calculate recommendations when you click on something? No way. They have a batch that precalculates all predictions for the 160 million users on a 4-hour rolling cycle, straight into database tables. I hope that the various MLs will not forget about this common use case.
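The batch pattern described above, where predictions are precalculated into a table and simply looked up at request time, can be sketched in a few lines. This is an illustration only: it uses Python's built-in SQLite in place of SQL Server, the `score` function is a made-up stand-in for a trained model, and the table and column names are hypothetical. In SQL Server the scoring step is where PREDICT or sp_rxPredict would come in.

```python
import sqlite3


def score(recency_days, purchases):
    """Made-up stand-in for a trained model; returns a value between 0 and 1."""
    activity = min(1.0, purchases / 10.0)
    freshness = 1.0 / (1.0 + recency_days / 30.0)
    return activity * freshness


# In-memory database standing in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, recency_days INTEGER, purchases INTEGER)"
)
conn.execute(
    "CREATE TABLE predictions ("
    "customer_id INTEGER PRIMARY KEY, score REAL, scored_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 3, 12), (2, 40, 1), (3, 10, 5)],
)


def refresh_predictions(conn):
    """Batch job: rescore every customer and replace the predictions table."""
    rows = conn.execute(
        "SELECT customer_id, recency_days, purchases FROM customers"
    ).fetchall()
    scored = [(cid, score(recency, purchases)) for cid, recency, purchases in rows]
    with conn:  # commits on success, rolls back if anything fails
        conn.execute("DELETE FROM predictions")
        conn.executemany(
            "INSERT INTO predictions (customer_id, score, scored_at) "
            "VALUES (?, ?, datetime('now'))",
            scored,
        )
    return len(scored)


def lookup(conn, customer_id):
    """Request-time path: a plain key lookup, with no model call involved."""
    row = conn.execute(
        "SELECT score FROM predictions WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return None if row is None else row[0]


scored_count = refresh_predictions(conn)  # would run on a schedule
print("customers scored:", scored_count)
print("score for customer 1:", lookup(conn, 1))
print("score for unknown customer:", lookup(conn, 999))
```

The point of the design is that the application never waits for a model: the scheduled job does all the scoring, and serving a prediction is just an indexed read from a table.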

I am clearly getting excited about Azure ML again, and I am glad that all of this work is taking place. Sure, the bugs need to go, cryptic error messages need to become clearer, and it must get snappier and faster. Give me all of that and I can see myself living under the new Azure ML tent for a while to come.

I have started developing a new online video course focused on the new Azure ML, which I hope to launch in the spring of 2020. I will also be teaching it in my live classes. Stay in touch by making sure you subscribe to my newsletter: register, if you have not, or make sure to tick the tick box at the bottom of your My Account/Edit page to get it.

Welcome to the era of machine learning for the real world, and please enjoy doing good data science!

Rafal
