This Is Your SQL Server on Machine Learning
Applying Machine Learning models to database management turns the old paradigms upside down. Folks of a certain age remember the old “this is your brain on drugs” commercials from the 80s. For this post, we are going to borrow from this analogy to observe your SQL Server on Machine Learning.
What Is the Benefit of Applying Machine Learning Models to SQL Server?
Machine Learning enables you to:
- Predict performance trends, capacity and potential security and/or compliance breaches
- Correlate system spikes and/or anomalous behavior to specific events, actions and code
- Model all possible fixes and identify the remediation that has the highest likelihood for success
The Power of Influence
It all starts with understanding what factors within the database itself influence each other. This varies with each use case and is influenced by business requirements, maintenance patterns and available system resources. Basically, databases are like people. Would you expect your doctor to prescribe the same medication for three random people just because they share the characteristic of being human?
Delta Bravo’s machine learning algorithms track the relationships between critical performance metrics for each SQL Server database. Here is a heatmap that shows, for this particular database, what metrics influence each other the most. High influence is reflected by a positive number and dark red tones, no influence is zero and gray tones. Negative influence is reflected by negative numbers and black tones.
Translating Models into Action
For the sake of brevity (further detail is available in our whitepaper), we’re going to focus on the following Use case:
- Identify a problematic system trend that has NOT reached a threshold*/been alerted on
- Quantify the trend and verify that trend is going to continue into the future
- Associate the trend with a specific event, measure impact of event
- Identify root cause, quantify impact, identify specific action causing impact
- Provide remediation recommendation
The work you are about to see was performed in 4 clicks (45 seconds) using the Delta Bravo UI.
Let’s start with a quick view of the Delta Bravo System Health panel for SQL Server Instance DemoSQL-2.
We observe a problematic trend with this SQL Server Instance’s CPU. Is this trend temporary? Seasonal? Let’s use Predictive Analytics to find out.
We see that the problematic trend is forecasted to continue, growing at a rate of nearly 90% over the next 14 days. However, our system thresholds* have not been hit yet. This means the system is acting in an anomalous fashion. Let’s identify the specific anomalies that are influencing this CPU trend.
In the graphs above, the gray shadow is a machine learning algorithm that represents the “acceptable range” or baseline for system behavior associated with that metric. We see that, while no thresholds have been reached for these metrics, behavior is outside the scope of the baselined “norm.” Why?
By selecting one of the graphs, we’re able to zoom in for more detail. The Blue lines represent specific Events that influenced the rise in that metric.
By selecting the line prior to the large red spike, we see that an Object was altered. This procedure impacted Query behavior adversely. We are able to see the code that was used to alter the Object, as well as the quantified impact this change had on Query performance.
Using AI to Recommend and Implement a Fix
From here, the AI runs through a series of possible fixes and identifies which ones will have the highest likelihood of success and prioritizes their impact. In this case, the recommended fix is adding a series of Indexes.
Similar workflows are applied to Security, Capacity planning and other aspects of database management. We believe the use case is changing; its no longer about monitoring, daily care and feeding. Using Machine Learning and AI to manage large database deployments helps your best people scale where you need them most, and for your systems to run at peak efficiency and performance.
*Delta Bravo has the ability to set thresholds, but we feel this is a dated and reactive way to monitor/manage system behavior.