After a relentless stream of high-profile data breaches, the spotlight is on corporations to improve their security operations. As the recent disclosure from Experian demonstrated, even enterprises with the most sophisticated prevention layers are vulnerable. Most CISOs are now searching for “post-prevention” systems that can provide visibility into active attacks that have already defeated their prevention systems.
While nearly all cyber security companies have at least one PhD data scientist on staff and claim to have “machine learning” capabilities, what does that really mean? As a CISO or a SOC manager, what should you look for when your team is evaluating a new security tool that uses machine learning? In a vastly understaffed industry, selecting the right tool not only saves time; it can make the difference between falling victim to the next major data breach and containing the damage as quickly as possible.
Supervised vs. Unsupervised Machine Learning
The first factor to consider is whether the machine learning is based on supervised or unsupervised algorithms. A supervised machine learning algorithm learns from a training set, usually a predefined set of known-bad data and a predefined set of known-good data provided to the algorithm as its initial input. When new, unknown data arrives, the algorithm determines whether it is good or bad. Supervised machine learning also gives you a confidence level for how likely the data is to be bad, which you can use to triage results or to focus only on the high-confidence ones.
An unsupervised machine learning algorithm does not use a training set. Instead, it tries to cluster the incoming data or find anomalies in it. With unsupervised machine learning, you receive indicators, such as clusters of suspicious behavior or individual anomalies, and drilling down into them to investigate requires additional effort and domain expertise.
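To make the supervised workflow concrete, here is a minimal sketch using k-nearest neighbors, chosen only because it is one of the simplest supervised techniques; it is not meant to represent what any particular product uses. The features (`failed_logins_per_hour`, `megabytes_sent_offsite`), the labeled examples, and the 0.66 triage threshold are all hypothetical. The "confidence level" here is the share of an event's nearest labeled neighbors that are known-bad.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical features per event: (failed_logins_per_hour, megabytes_sent_offsite)
Sample = Tuple[float, float]


def knn_confidence(train: List[Tuple[Sample, int]], x: Sample, k: int = 3) -> float:
    """Score an unknown sample as the share of its k nearest labeled neighbors that are bad (label 1)."""
    # Rank every labeled training example by Euclidean distance to the new sample
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return sum(label for _, label in nearest) / k


# Training set: predefined known-good (0) and known-bad (1) examples (hypothetical data)
training_set: List[Tuple[Sample, int]] = [
    ((0.0, 1.0), 0), ((1.0, 2.0), 0), ((2.0, 0.5), 0),
    ((30.0, 400.0), 1), ((25.0, 350.0), 1), ((40.0, 500.0), 1),
]

# New, unknown events to classify
new_events: Dict[str, Sample] = {
    "evt-a": (1.5, 1.0),
    "evt-b": (35.0, 420.0),
    "evt-c": (15.0, 200.0),
}
scores = {name: knn_confidence(training_set, feats) for name, feats in new_events.items()}

# Triage: a short-staffed team reviews only high-confidence results (threshold is an assumption)
HIGH_CONFIDENCE = 0.66
to_investigate = [name for name, score in scores.items() if score >= HIGH_CONFIDENCE]

print(scores)
print(to_investigate)
```

With this data, `evt-b` scores 1.0 and is the only event that clears the threshold, while `evt-c`, with one bad neighbor out of three, scores about 0.33 and can wait. For brevity the features are not normalized, so the megabytes column dominates the distance; a real system would scale features first.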
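For contrast, a minimal unsupervised sketch: no good or bad labels are supplied, and the baseline is learned from the data itself. It uses the modified z-score (median and median absolute deviation, with the commonly used 3.5 cutoff) as one simple anomaly detector; the per-host upload figures and host names are hypothetical.

```python
import statistics
from typing import Dict, List


def robust_anomalies(values: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return keys whose modified z-score (median/MAD based) exceeds the threshold.

    No labels are involved: what counts as "normal" comes from the data itself.
    """
    data = list(values.values())
    med = statistics.median(data)
    mad = statistics.median([abs(v - med) for v in data])
    if mad == 0:
        return []  # no spread around the median to scale deviations by
    # 0.6745 scales the MAD so scores are comparable to standard z-scores for normal data
    return [key for key, v in values.items() if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical telemetry: megabytes uploaded per host in one day (unlabeled)
uploads_mb = {
    "host-01": 120, "host-02": 130, "host-03": 110, "host-04": 125,
    "host-05": 118, "host-06": 122, "host-07": 900,
}
print(robust_anomalies(uploads_mb))
```

The output is only an indicator: it says `host-07` is unusual, not whether that is a scheduled backup or data exfiltration. Answering that question is the drill-down work that calls for domain expertise.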
If your team has enough security experts as well as data scientists, a security tool focused on unsupervised machine learning is the right choice. Your team will be able to drill down into the clusters and determine which anomalies warrant further investigation.
However, if you are short of security experts or don’t have data scientists on your security team, you should choose a security solution that focuses on supervised machine learning. That way, your team only needs to focus on the high-confidence results, which reduces its workload.
Cloud-based vs. On-premises
Another common, yet ambiguous, term security companies use is “big data.” Big data is not just about accumulating a large volume of data; it is also about the kinds of data a solution processes. While many companies claim that their products can look at many types of data, most of those tools actually look at only one or two. In addition, both the time span covered and the amount of data processed have a significant impact on the effectiveness of the machine learning algorithm. Running a machine learning algorithm over a long period of time (weeks, months, or even years) requires a lot of computing power, memory, and storage, which a cloud-based solution can deliver easily. On-premises security solutions are often constrained by the flexibility and cost of the data center. This limits the time scale available to the machine learning algorithms and reduces their efficacy.
Recent breach statistics show that the average time to detect and contain an attack is well over six months. For example, the Experian breach, which T-Mobile’s CEO recently disclosed publicly, was detected and remediated in September 2015, but the actual breach occurred in September 2013, two years earlier. An algorithm that only analyzes a short window of data could miss much of such a long-running attack. So if you take these statistics seriously, or if you need the solution to provide the best price-performance, you should choose a security solution built from the ground up to take advantage of the cloud.
Companies have varying security needs, but one thing is clear: adopting the latest advances in security is a company’s best bet for fending off cyber attacks. Machine learning is an important tactic, but understanding it well enough to pick a solution that fits a company’s needs is just as critical. Machine learning based security solutions differ along two primary dimensions, giving four possible combinations: supervised or unsupervised algorithms, and cloud-based or on-premises implementation. Supervised machine learning reduces the workload on the security team. A cloud-based implementation provides the scale and flexibility needed to run the machine learning algorithm over a long period of time. Therefore, if you are like most security operations, limited in both human resources (security experts and data scientists) and data center resources, cloud-based supervised machine learning is the way to go.