This Is Your SQL Server on Machine Learning

Applying Machine Learning models to database management turns the old paradigms upside down. Folks of a certain age remember the old “this is your brain on drugs” commercials from the 80s. For this post, we are going to borrow from this analogy to observe your SQL Server on Machine Learning.

What Is the Benefit of Applying Machine Learning Models to SQL Server?

Machine Learning enables you to:

  • Predict performance trends, capacity and potential security and/or compliance breaches
  • Correlate system spikes and/or anomalous behavior to specific events, actions and code
  • Model all possible fixes and identify the remediation that has the highest likelihood for success

The Power of Influence

It all starts with understanding what factors within the database itself influence each other. This varies with each use case and is influenced by business requirements, maintenance patterns and available system resources. Basically, databases are like people. Would you expect your doctor to prescribe the same medication for three random people just because they share the characteristic of being human?

Delta Bravo’s machine learning algorithms track the relationships between critical performance metrics for each SQL Server database. Here is a heatmap that shows, for this particular database, which metrics influence each other the most. High influence is reflected by a positive number and dark red tones; no influence is reflected by zero and gray tones; negative influence is reflected by negative numbers and black tones.
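One simple way to put a number on how two metrics move together is pairwise Pearson correlation, which produces the same kind of positive, zero and negative values a heatmap like this displays. The sketch below is an illustration only and is not Delta Bravo’s actual algorithm; the metric names and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no linear relationship to report
    return cov / sqrt(var_x * var_y)


def correlation_matrix(metrics):
    """Build a nested dict of pairwise correlations, one row per metric."""
    names = list(metrics)
    return {a: {b: pearson(metrics[a], metrics[b]) for b in names} for a in names}


# Hypothetical samples taken at the same five points in time.
samples = {
    "cpu_pct": [20, 35, 50, 65, 80],
    "batch_requests": [100, 180, 260, 330, 410],
    "page_life_expectancy": [900, 700, 500, 350, 200],
}
matrix = correlation_matrix(samples)
```

In this made-up data, CPU and batch requests rise together (strongly positive), while Page Life Expectancy falls as CPU rises (strongly negative).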

Translating Models into Action

For the sake of brevity (further detail is available in our whitepaper), we’re going to focus on the following use case:

  • Identify a problematic system trend that has NOT reached a threshold* or triggered an alert
  • Quantify the trend and verify that trend is going to continue into the future
  • Associate the trend with a specific event, measure impact of event
  • Identify root cause, quantify impact, identify specific action causing impact
  • Provide remediation recommendation

The work you are about to see was performed in 4 clicks (45 seconds) using the Delta Bravo UI.

Let’s start with a quick view of the Delta Bravo System Health panel for SQL Server Instance DemoSQL-2.

We observe a problematic trend with this SQL Server Instance’s CPU. Is this trend temporary?  Seasonal? Let’s use Predictive Analytics to find out.

We see that the problematic trend is forecasted to continue, growing at a rate of nearly 90% over the next 14 days. However, our system thresholds* have not been hit yet. This means the system is acting in an anomalous fashion. Let’s identify the specific anomalies that are influencing this CPU trend.
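As an aside, the simplest way to check whether a trend is likely to continue is to fit a straight line to recent samples and extend it. The sketch below does that with ordinary least squares; it is an assumption for illustration, not the forecasting model Delta Bravo uses, and the daily CPU values are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values sampled once per day."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_growth_pct(values, days_ahead):
    """Percentage change between the fitted value today and days_ahead later."""
    slope, intercept = linear_trend(values)
    today = intercept + slope * (len(values) - 1)
    if today == 0:
        raise ValueError("fitted value for today is zero; growth is undefined")
    future = today + slope * days_ahead
    return (future - today) / today * 100


# Hypothetical average CPU % for the last seven days.
daily_cpu = [30, 32, 34, 36, 38, 40, 42]
growth = projected_growth_pct(daily_cpu, days_ahead=14)
```

A straight line cannot capture seasonality, which is one reason a production forecast would use a richer model.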

In the graphs above, the gray shadow is produced by a machine learning algorithm and represents the “acceptable range,” or baseline, for system behavior associated with that metric. We see that, while no thresholds have been reached for these metrics, behavior is outside the baselined “norm.” Why?
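To make the idea of a baseline band concrete, here is a minimal sketch that builds an “acceptable range” from history as the mean plus or minus a multiple of the standard deviation, then flags recent samples that fall outside it. This is a stand-in technique chosen for illustration, not the algorithm behind the gray shadow; the sample values and the multiplier `k` are assumptions.

```python
from statistics import mean, stdev


def baseline_band(history, k=2.0):
    """Return (low, high) bounds: mean of history plus/minus k standard deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def out_of_band(history, recent, k=2.0):
    """List (index, value) pairs from recent that fall outside the baseline band."""
    low, high = baseline_band(history, k)
    return [(i, v) for i, v in enumerate(recent) if v < low or v > high]


# Hypothetical CPU % samples: a quiet history followed by a rising trend.
history = [40, 42, 41, 39, 43, 40, 41, 42]
recent = [41, 44, 52, 58]
flagged = out_of_band(history, recent)
```

Note that every flagged value here is still far below a typical static threshold such as 80%, which is exactly the situation described above: anomalous behavior with no alert.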

By selecting one of the graphs, we’re able to zoom in for more detail. The blue lines represent specific Events that influenced the rise in that metric.

By selecting the line prior to the large red spike, we see that an Object was altered. This procedure impacted Query behavior adversely. We are able to see the code that was used to alter the Object, as well as the quantified impact this change had on Query performance.

Using AI to Recommend and Implement a Fix

From here, the AI runs through a series of possible fixes and identifies which ones will have the highest likelihood of success and prioritizes their impact. In this case, the recommended fix is adding a series of Indexes.

Similar workflows are applied to Security, Capacity planning and other aspects of database management. We believe the use case is changing; it’s no longer about monitoring and daily care and feeding. Using Machine Learning and AI to manage large database deployments helps your best people scale where you need them most and helps your systems run at peak efficiency and performance.

*Delta Bravo has the ability to set thresholds, but we feel this is a dated and reactive way to monitor/manage system behavior.

Delta Bravo Database Security: How It Works

Delta Bravo Database Security features an instant Security Analysis of all databases connected to the system. Within 2 minutes of launching Delta Bravo, users can understand how their current database security levels stack up to standards ranging from PCI and HIPAA all the way up to the US Department of Defense STIG standards.

For each security rule, Delta Bravo instantly provides a breakdown of the rule, scripts to validate that condition in your environment and scripts to fix it.

Delta Bravo Database Security does not perform a full IT stack compliance check; our scans are specific to the database we are connected to. We only flag items that MAY be out of compliance for SQL Server, MySQL and PostgreSQL, depending on the type of data stored in the databases.

SOX

The Sarbanes-Oxley (SOX) Act of 2002 is intended to be a revision of federal securities laws which apply to publicly traded companies. Its stated goal is “To protect investors by improving the accuracy and reliability of corporate disclosures made pursuant to the securities laws, and for other purposes.” In short, it makes companies and their leadership responsible for accurate financial reporting, much of which depends on reliable and secure information systems.

Specific to SQL Server, Delta Bravo scans and monitors for the following potential SOX compliance issues:

  1. Access and Authentication: Only people who are authorized to use the system can access it.
  2. Monitoring: The capture of events such as authentication attempts, system and account changes, and backup status.
  3. Data Integrity: Ensuring that data is not illegally modified and is backed up, archived or retained to preserve its integrity.

HIPAA

The Health Insurance Portability and Accountability Act of 1996 establishes a set of national standards for protecting certain individual health information. The primary goal is to ensure that individuals’ health information is properly protected while allowing certain information to be securely shared to promote high-quality health care and to protect the public’s health and wellbeing. It covers:

  1. Health plans
  2. Health care clearinghouses
  3. Health care providers who conduct certain financial and administrative transactions electronically.

To meet HIPAA standards, the organization must constantly audit and report all access attempts and events related to the databases which contain sensitive Protected Health Information (PHI) records.

Delta Bravo scans and monitors for the following potential HIPAA compliance issues:

  1. Access and Authentication: Only people who are authorized to use the system can access it.
  2. Monitoring: The capture of events such as authentication attempts, system and account changes, and backup status.
  3. Data Integrity: Ensuring that data is not illegally modified at rest or in transit and is backed up, archived or retained to preserve its integrity.

PCI

Originally released in 2004, the Payment Card Industry Data Security Standard (PCI DSS) applies to all entities involved in payment card processing who store, process or transmit cardholder data or sensitive authentication data. It is intended to minimize the risk of storing credit card data and is overseen by the Payment Card Industry Security Standards Council which is made up of representatives from most major credit card providers.

PCI DSS is made up of twelve security requirements which encompass the entire network. Specific to SQL Server, Delta Bravo scans and monitors for the following potential PCI compliance issues:

  1. SQL default usernames and passwords
  2. Protection of cardholder data at rest
  3. Encrypted transmission of cardholder data
  4. Overall security of the system
  5. Restriction of access to cardholder data by business need to know
  6. Authentication access to the system
  7. Monitoring and recording of network access to cardholder data

Delta Bravo Database Security Summary

Delta Bravo Database Security features add instant value for administrators, line of business stakeholders and executives. Within hours, companies can significantly strengthen their security posture at the data tier.

While database security is more important than ever, it’s still an overlooked part of day-to-day administration. Security does not ship in the box, and each application is unique in its SQL Server security requirements. Developers need to understand which combination of features and functionality is most appropriate to counter known threats, and to anticipate threats that may arise in the future.

Delta Bravo Performance Counters: SQL Server Target vs Total Memory

Why is this an important SQL Server Performance Indicator?

Delta Bravo uses this counter to assess the degree of memory pressure the system is under.  High memory pressure is a cost driver, necessitating additional resources before user experience is impacted.

Target Server Memory (KB) is the amount of memory that SQL Server is willing to allocate to the buffer pool under its current load. Total Server Memory (KB) is the amount of memory currently assigned to SQL Server. When SQL Server starts, its total memory is low; it grows throughout the warm-up period, while SQL Server is bringing pages into its buffer pool, until it reaches a steady state. Once the steady state is reached, Total Server Memory should not decrease significantly, as that would indicate that SQL Server is being forced to dynamically deallocate its memory due to system-level memory pressure.

If Total Server Memory is still growing, the server has not yet reached its steady state; it is still trying to populate the cache and get pages loaded into memory. Performance will likely be somewhat slower during this time, since more disk I/O is required at this stage. This behavior is normal. Eventually Total Server Memory should approximate Target Server Memory, keeping a ratio close to 1.

If the Total Server Memory value is significantly lower than the Target Server Memory value during normal SQL Server operation, it can mean that there’s memory pressure on the server so SQL Server cannot get as much memory as needed, or that the Maximum server memory option is set too low.
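The Total-to-Target comparison can be sketched as a small check. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the 0.9 cutoff for “significantly lower” is an arbitrary example value rather than a documented limit.

```python
def memory_ratio(total_kb, target_kb):
    """Ratio of Total Server Memory to Target Server Memory (both in KB)."""
    if target_kb <= 0:
        raise ValueError("target_kb must be positive")
    return total_kb / target_kb


def classify_memory(total_kb, target_kb, still_growing, low_ratio=0.9):
    """Rough reading of the ratio; low_ratio is an assumed example cutoff."""
    ratio = memory_ratio(total_kb, target_kb)
    if ratio >= low_ratio:
        return "steady"        # Total approximates Target
    if still_growing:
        return "warming up"    # cache is still being populated; normal
    return "investigate"       # possible memory pressure or low max memory setting


# Hypothetical readings: 48 GB allocated against a 64 GB target.
status = classify_memory(48 * 1024 * 1024, 64 * 1024 * 1024, still_growing=False)
```

A result of "investigate" is only a prompt to look at the supporting counters, not proof that more memory is needed.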

So when do I need to add more memory?

If Total Server Memory is less than Target Server Memory, it can be a sign of memory pressure. But before going to the business to ask for more money for more memory, evaluate some other counters to validate that SQL Server is actually under memory contention.

Start with Page Life Expectancy, which should be well above 300. This tells you how long, in seconds, pages are staying in the buffer pool, so a value of 300 equates to 5 minutes. If you have 120GB of buffer pool and it is churning over every 5 minutes, that equates to 409.6 MB/sec of sustained disk I/O, which is a lot of disk activity for the system to sustain.
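The arithmetic behind that figure is the buffer pool size divided by the Page Life Expectancy. A quick sketch, using a hypothetical helper name and assuming 1 GB = 1024 MB:

```python
def buffer_churn_mb_per_sec(buffer_pool_gb, ple_seconds):
    """Sustained read rate needed to replace the whole buffer pool once per PLE."""
    if ple_seconds <= 0:
        raise ValueError("ple_seconds must be positive")
    return buffer_pool_gb * 1024 / ple_seconds


# 120 GB buffer pool turning over every 300 seconds.
rate = buffer_churn_mb_per_sec(120, 300)
print(f"{rate:.1f} MB/sec")  # prints "409.6 MB/sec"
```

The same function shows why a fixed 300-second floor matters more on large servers: the bigger the buffer pool, the more disk I/O a given Page Life Expectancy implies.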

Examine Lazy Writes/sec, which tells you the number of times per second the buffer pool flushed dirty pages to disk outside of the CHECKPOINT process. This should be near zero. Also review Free Pages and Free List Stalls/sec. You don’t want to see Free Pages bottom out, which results in a Free List Stall while the buffer pool frees pages for use. Lastly, look at Memory Grants Pending, which tells you whether processes are waiting on workspace memory in order to execute.

If these supporting counters exhibit excessive behavior, then it may be time to increase memory allocation.
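The supporting-counter review can be expressed as a simple checklist. The sketch below is an assumption for illustration: the dictionary keys are hypothetical names, the 300-second floor comes from the text above, and `near_zero` is an example cutoff for counters that “should be near zero.”

```python
def memory_pressure_signals(counters, ple_floor=300, near_zero=1.0):
    """Return the names of supporting counters that point to memory pressure."""
    signals = []
    if counters["page_life_expectancy"] < ple_floor:
        signals.append("page_life_expectancy")
    for name in ("lazy_writes_per_sec",
                 "free_list_stalls_per_sec",
                 "memory_grants_pending"):
        if counters[name] > near_zero:
            signals.append(name)
    return signals


# Hypothetical samples from a healthy and a pressured instance.
healthy = {"page_life_expectancy": 4200, "lazy_writes_per_sec": 0,
           "free_list_stalls_per_sec": 0, "memory_grants_pending": 0}
pressured = {"page_life_expectancy": 180, "lazy_writes_per_sec": 35,
             "free_list_stalls_per_sec": 4, "memory_grants_pending": 0}

print(memory_pressure_signals(healthy))
print(memory_pressure_signals(pressured))
```

The more counters that show up in the list, the stronger the case for adding memory.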

Delta Bravo Performance Counters: CPU % vs. Process Privileged Time (Total)

Why is this an important SQL Server Performance Indicator?

Delta Bravo uses this metric to determine whether the processor problems originate from internal Windows processes, or are caused by a user application. If Delta Bravo identifies high CPU usage on a SQL Server instance, the next step is to narrow down the high CPU problem to the lowest possible level–the component which is causing high CPU.

The CPU % vs. Process Privileged Time (Total) counter helps Delta Bravo understand how much time the system spends on Windows kernel commands, such as SQL Server I/O requests. If the Process Privileged Time value is high, kernel-mode processes are using a lot of processor time: the machine is busy executing basic operating system tasks and cannot run user processes and other applications, such as SQL Server. The recommended value for Process Privileged Time is 5 to 10%, with a maximum of 20%, of the % Total Processor Time.
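Expressed as a calculation, the check compares privileged time to total processor time. This is a minimal sketch assuming both inputs are percentages sampled over the same interval; the function names and labels are hypothetical, while the 10% and 20% bands come from the recommendation above.

```python
def privileged_share_pct(privileged_pct, total_pct):
    """Privileged time as a percentage of total processor time."""
    if total_pct <= 0:
        raise ValueError("total_pct must be positive")
    return privileged_pct / total_pct * 100


def classify_privileged_time(privileged_pct, total_pct):
    """Label the share using the 10% and 20% bands from the text."""
    share = privileged_share_pct(privileged_pct, total_pct)
    if share <= 10:
        return "recommended"
    if share <= 20:
        return "acceptable"
    return "high"


# Hypothetical sample: 60% total CPU, of which 30 points are kernel time.
label = classify_privileged_time(30, 60)
```

A "high" result suggests the time is going to the operating system rather than to user-mode SQL Server work, which is the cue to look at interrupts and DPCs.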

Do you know what’s keeping your processor busy?

There are two different states to be aware of when talking about processors executing instructions: Privileged mode and User mode.  Some operating system threads and interrupts (including all device driver functions) as well as Kernel-mode threads execute in privileged mode.

When dealing with Privileged mode operations, there are two modes to consider: Interrupt mode and Deferred Procedure Call (DPC) mode.

Interrupt mode is reserved for interrupt service routines, which are device driver functions. In Performance Monitor, % Interrupt Time is the time the processor spends receiving and servicing hardware interrupts during sample intervals. This value is an indirect indicator of the activity of devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network interface cards and other peripheral devices. These devices normally interrupt the processor when they have completed a task or require attention. Normal thread execution is suspended during interrupts. Most system clocks interrupt the processor every 10 milliseconds, creating a background of interrupt activity.

DPC mode is time spent in routines known as deferred procedure calls, which are routines scheduled by device drivers to complete interrupt processing. DPCs are often referred to as soft interrupts. In Performance Monitor, the % DPC Time counter shows the percentage of time that the system was executing in DPC mode. Measuring these counters separately can provide insight into whether there are issues with the interrupt service routine or its DPC.

Taking Action with Delta Bravo

If you notice that the % Interrupt Time counter is much higher than your baseline, you may have a problem with a device driver or a piece of hardware. Compare the Interrupts/sec counter between the baseline and your current performance log. If the current rate is proportional to the level in the baseline, then the device driver code is the most likely culprit. If the interrupt rate is significantly higher, you are probably experiencing a hardware issue.
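That decision can be sketched as a small triage function. Everything numeric here is an assumption for illustration: “much higher” is taken to mean at least double the baseline % Interrupt Time, and “proportional” is taken to mean an Interrupts/sec rate within 25% of the baseline. The dictionary keys and return labels are hypothetical.

```python
def triage_interrupts(baseline, current, time_factor=2.0, rate_tolerance=0.25):
    """Suggest where to look when % Interrupt Time rises above the baseline."""
    if baseline["interrupts_per_sec"] <= 0:
        raise ValueError("baseline interrupts_per_sec must be positive")
    # Only act when interrupt time is well above the baseline.
    if current["interrupt_time_pct"] < baseline["interrupt_time_pct"] * time_factor:
        return "no action"
    rate_ratio = current["interrupts_per_sec"] / baseline["interrupts_per_sec"]
    if rate_ratio > 1 + rate_tolerance:
        return "suspect hardware"       # far more interrupts than usual
    return "suspect device driver"      # similar rate, but each costs more time


# Hypothetical baseline and current samples.
baseline = {"interrupt_time_pct": 2.0, "interrupts_per_sec": 5000}
current = {"interrupt_time_pct": 9.0, "interrupts_per_sec": 5300}
verdict = triage_interrupts(baseline, current)
```

In this example the interrupt rate is close to the baseline while the time spent servicing interrupts has more than quadrupled, so the sketch points at driver code rather than at a device generating excess interrupts.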