Machine Learning for IT Security: From ML to Security AI

4 October 2019 · 320 views

The Future Series (2019)

Security Artificial Intelligence

This video explains how to get started with machine learning for IT security purposes. Using ML for security can offer valuable new opportunities, leading to better, more thorough, proactive security, with fewer breaches. For that, you can thank your existing, and some new, security-related data, such as logs, that we rarely rely on in a proactive manner. Logs are usually analysed to find out what happened, rather than to predict what is going to happen. Further, their size, their messiness, and the general difficulty of drawing valid conclusions from them make them hard to work with using traditional approaches.

Interestingly, this is exactly where machine learning can help. ML is good at working with large amounts of potentially confusing data. Above all, you can use your ML models to draw conclusions about the security risk of new, previously unseen situations, assuming you have followed the foundations of any good data science process: you have validated your models thoroughly, to a level of statistical significance, and you have tested them before deployment.

As with all machine learning and data science projects, you need to prepare your data carefully, structuring it into the common flat-table, row-per-event (case) format, where columns represent attributes that you can legally, ethically, and morally analyse. If you are going to follow the supervised approach, you will need one additional column, the predictable target, also known as the label. This label column must exist before you build your model, and it needs to contain a known fact, ideally a Boolean of some sort, denoting whether the known event was or was not a security issue. Most likely you will build a classification model using an algorithm such as decision trees, logistic regression, or a neural network. This classifier will enable you to predict whether an event, a security credential, a person, an application, or anything else that is currently performing some action is threatening enough to warrant an intervention.
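To make the table format concrete, here is a minimal sketch in plain Python. The column names (`failed_logins`, `off_hours`, `new_device`, `is_threat`) and the eight rows are invented for illustration, and the "model" is only a one-level decision tree (a decision stump), which is far simpler than anything you would deploy. It shows one row per event, one column per attribute, and a Boolean label that is known before training.

```python
# Hypothetical flat table: one row per login event, one column per attribute,
# plus the Boolean label column "is_threat" holding a known fact.
EVENTS = [
    {"failed_logins": 0, "off_hours": 0, "new_device": 0, "is_threat": False},
    {"failed_logins": 1, "off_hours": 1, "new_device": 0, "is_threat": False},
    {"failed_logins": 2, "off_hours": 0, "new_device": 1, "is_threat": False},
    {"failed_logins": 1, "off_hours": 0, "new_device": 0, "is_threat": False},
    {"failed_logins": 6, "off_hours": 1, "new_device": 1, "is_threat": True},
    {"failed_logins": 8, "off_hours": 1, "new_device": 0, "is_threat": True},
    {"failed_logins": 5, "off_hours": 0, "new_device": 1, "is_threat": True},
    {"failed_logins": 9, "off_hours": 1, "new_device": 1, "is_threat": True},
]
LABEL = "is_threat"


def train_stump(rows, label=LABEL):
    """Find the single rule 'feature > threshold => threat' with the fewest errors."""
    features = [name for name in rows[0] if name != label]
    best = None  # (errors, feature, threshold)
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            errors = sum((row[feature] > threshold) != row[label] for row in rows)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def predict(stump, row):
    """Apply the learned rule to a new, unlabelled event."""
    feature, threshold = stump
    return row[feature] > threshold


stump = train_stump(EVENTS)
new_event = {"failed_logins": 7, "off_hours": 0, "new_device": 0}
print("rule:", stump, "-> threat predicted:", predict(stump, new_event))
```

In a real project you would replace the stump with a properly validated classifier and hold back part of the labelled data for testing.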

What if you do not have such data? Or what if you are still at an earlier stage and you are merely doing some detective work to find out if there were any security issues that your data has captured? The unsupervised approach can help. Your data needs to be prepared in an identical way to the supervised approach: one row per case, with one column for each analysable attribute. However, you will not have, or will not need to use, the label (predictable) column. In this approach you will use algorithms such as clustering or association rules to find natural groupings of cases, looking for smaller, anomalous clusters, for outliers that belong to no cluster, or for links that connect one suspicious event to another. There are many other ways to use combinations of algorithms to build your security model using machine learning.
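The idea of looking for small clusters and lone outliers can be sketched as follows. This is an illustration only: the points, the distance limit `eps`, and the minimum cluster size are all made-up assumptions, and the grouping rule (any two cases within `eps` of each other share a cluster) is a deliberately simple stand-in for a proper clustering algorithm.

```python
from collections import Counter
from math import dist

# Hypothetical unlabelled cases: (logins per hour, MB transferred), no label column.
CASES = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (20, 1)]


def cluster_by_distance(points, eps):
    """Group points so that any two points within eps of each other share a cluster."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        frontier = [start]
        while frontier:
            current = frontier.pop()
            for other in range(len(points)):
                close = dist(points[current], points[other]) <= eps
                if labels[other] is None and close:
                    labels[other] = next_label
                    frontier.append(other)
        next_label += 1
    return labels


labels = cluster_by_distance(CASES, eps=1.5)
sizes = Counter(labels)

# Assumption: clusters with fewer than 3 cases deserve a closer look.
MIN_NORMAL_SIZE = 3
suspicious = [CASES[i] for i, lab in enumerate(labels) if sizes[lab] < MIN_NORMAL_SIZE]
print("cluster labels:", labels)
print("cases to investigate:", suspicious)
```

A small or single-member cluster is only a prompt for investigation, not proof of a security issue.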

Sooner or later, when you try to use your model in production, you will face a major issue: false positives (FPs). Simply speaking, your system will make mistakes and it will class benign events as threats! Dealing with them in a just, legal, ethical, and common-sense manner is extremely important, but not easy. Otherwise you will cause damage and upset, and you may even expose your organisation to legal expenses or reputational losses, perhaps even straying into morally reprehensible territory: denying people a service they deserve simply because a computer thinks there is something wrong with them is unacceptable.

Having a process for dealing with false positives is fundamentally important, but so is tuning your model to balance their occurrence against its ability not to miss threats, measured as false negatives (FNs). My courses teach a great deal about this: there are many ways to find a balance between those two opposing goals (FP vs FN) that is good for everyone involved, whilst being cost-effective. By the way, that is, essentially, practical machine learning, as applied to real-world projects and situations, rather than treated as a mathematical optimisation problem.
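One common way to find that balance is to attach a cost to each kind of mistake and pick the decision threshold with the lowest total cost. The sketch below assumes a model that outputs a risk score between 0 and 1. The scores, the candidate thresholds, and the two cost figures are invented for illustration, and in practice the costs come from the business, not from the data.

```python
# Hypothetical validation results: (predicted risk score, was it really a threat?)
SCORED = [
    (0.05, False), (0.10, False), (0.20, False), (0.35, False), (0.40, True),
    (0.55, False), (0.60, True), (0.80, True), (0.90, True), (0.95, True),
]

# Assumed costs: a missed threat hurts five times more than a false alarm.
COST_FP = 1.0
COST_FN = 5.0


def count_errors(scored, threshold):
    """Return (false positives, false negatives) when flagging score >= threshold."""
    fp = sum(1 for score, threat in scored if score >= threshold and not threat)
    fn = sum(1 for score, threat in scored if score < threshold and threat)
    return fp, fn


def total_cost(scored, threshold):
    """Total cost of all mistakes made at this threshold."""
    fp, fn = count_errors(scored, threshold)
    return fp * COST_FP + fn * COST_FN


CANDIDATES = [0.1, 0.3, 0.5, 0.7, 0.9]
for t in CANDIDATES:
    print(t, count_errors(SCORED, t), total_cost(SCORED, t))

best_threshold = min(CANDIDATES, key=lambda t: total_cost(SCORED, t))
print("cheapest threshold:", best_threshold)
```

Changing the two cost figures moves the chosen threshold, which is why the balance is a business decision as much as a modelling one.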

If all of this works well, you may want to take a step into the future of Artificial Intelligence for Security, aka Security AI. To do that, you need to introduce two interesting but risky automations. First of all, your machine learning model needs to be updated regularly, taking in new data (new logs etc.), but also the results of its own (i.e. the model’s) actions. In other words, when we know whether your model is making good or wrong decisions, that very fact becomes additional, important data that is continually used to update the model. This is something we have done for decades, but more recently it became popular under the trendy name of reinforcement learning. Secondly, and very carefully, you may want to automate the security decisions that your model makes. Yes, that means the machine will decide to proactively deny access when it predicts a sufficiently high level of security risk.

Those two steps, autonomous security decisions and automatic model updates using new data and knowledge of the model’s own mistakes, are what turn your ML model into a Security AI system, one that learns and improves its chance of success automatically.
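The loop formed by those two automations might look like the sketch below. Everything here is a simplifying assumption: the class name `FeedbackModel`, the single `failed_logins` attribute, and the cut-off rule are hypothetical, and the update is a plain retrain on confirmed outcomes, not a full reinforcement learning algorithm.

```python
from statistics import mean


class FeedbackModel:
    """Toy risk model: deny when failed_logins exceeds a learned cut-off."""

    def __init__(self, labelled_rows):
        # Each row is (failed_logins, was_threat), a known historical fact.
        self.rows = list(labelled_rows)
        self.retrain()

    def retrain(self):
        """Refit the cut-off from all labelled rows seen so far."""
        benign = [x for x, threat in self.rows if not threat]
        threats = [x for x, threat in self.rows if threat]
        # Cut-off halfway between the average benign and average threat event.
        self.cutoff = (mean(benign) + mean(threats)) / 2

    def decide(self, failed_logins):
        """Automated decision: deny access above the cut-off."""
        return "deny" if failed_logins > self.cutoff else "allow"

    def record_outcome(self, failed_logins, was_threat):
        """Feed the confirmed result of a past decision back into the model."""
        self.rows.append((failed_logins, was_threat))
        self.retrain()


model = FeedbackModel([(0, False), (2, False), (9, True), (11, True)])
print(model.cutoff, model.decide(6))   # the model denies this event

# A reviewer later confirms the event was benign: a false positive.
model.record_outcome(6, was_threat=False)
print(model.cutoff, model.decide(6))   # the updated model now allows it
```

Retraining after every single outcome keeps the example short; a real system would more likely batch the updates and validate each new model before it replaces the old one.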

With automatic Security AI it is very important to continually validate your models, because they will deteriorate if left to their own devices. Let me stress again that you also need a wonderful, people-friendly process for dealing with false positives. I have customers who have built that, for example for fraud detection, which is a form of security AI. In their case, I am happy to say that all autonomous actions of the system are vetted by a human controller, ensuring no “computer says no” annoyances, not to mention unethical actions. I wish everyone did it that way.
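A simple form of continual validation is to compare the model's recent precision, as judged by the human reviewers, against the figure measured at deployment. The sketch below is an assumption-laden illustration: the function names, the baseline of 0.90, and the tolerance of 0.10 are all hypothetical choices.

```python
def precision(decisions):
    """Share of flagged events that a human reviewer confirmed as real threats.

    Each decision is (was_flagged, confirmed_threat). Returns None when
    nothing was flagged, because precision is undefined in that case.
    """
    flagged = [confirmed for was_flagged, confirmed in decisions if was_flagged]
    return sum(flagged) / len(flagged) if flagged else None


def has_deteriorated(baseline, recent_decisions, tolerance=0.10):
    """True when recent precision fell more than `tolerance` below the baseline."""
    current = precision(recent_decisions)
    return current is not None and current < baseline - tolerance


# Hypothetical review log for the latest period: 10 flagged events,
# of which the human controller confirmed only 7, plus 5 unflagged events.
recent = [(True, True)] * 7 + [(True, False)] * 3 + [(False, False)] * 5

print("recent precision:", precision(recent))
print("needs attention:", has_deteriorated(baseline=0.90, recent_decisions=recent))
```

Precision only covers false positives; a fuller check would also sample unflagged events to estimate how many threats are being missed.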
