Delta Bravo Performance Counters: SQL Server Target vs Total Memory

Why is this an important SQL Server Performance Indicator?

Delta Bravo uses this counter to assess the degree of memory pressure the system is under.  High memory pressure is a cost driver, necessitating additional resources before user experience is impacted.

Target Server Memory (KB) is the amount of memory that SQL Server is willing to allocate to the buffer pool under its current load. Total Server Memory (KB) is the amount SQL Server currently has allocated. When SQL Server starts, Total Server Memory is low; it grows throughout the warm-up period, while SQL Server brings pages into its buffer pool, until it reaches a steady state. Once the steady state is reached, Total Server Memory should not decrease significantly, as that would indicate that SQL Server is being forced to deallocate memory dynamically because of system-level memory pressure.

If Total Server Memory is still growing, the server has not yet reached its steady state: it is still populating the cache and loading pages into memory. Performance will likely be somewhat slower during this time, since more disk I/O is required at this stage. This behavior is normal. Eventually Total Server Memory should approximate Target Server Memory, giving a Total-to-Target ratio close to 1.

If the Total Server Memory value is significantly lower than the Target Server Memory value during normal SQL Server operation, it can mean either that there is memory pressure on the server, so SQL Server cannot get as much memory as it needs, or that the max server memory option is set too low.
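The comparison described above can be sketched as a small helper. This is a minimal illustration, not Delta Bravo's implementation: the function names and the 0.9 cutoff for "significantly lower" are assumptions, and the counter values are expected to be read elsewhere (for example from Performance Monitor).

```python
def memory_ratio(total_kb: float, target_kb: float) -> float:
    """Return Total Server Memory (KB) divided by Target Server Memory (KB)."""
    if target_kb <= 0:
        raise ValueError("target_kb must be positive")
    return total_kb / target_kb


def classify_memory(total_kb: float, target_kb: float,
                    still_growing: bool, low_ratio: float = 0.9) -> str:
    """Classify the Total/Target relationship.

    low_ratio is a hypothetical cutoff for "significantly lower";
    tune it against your own baseline.
    """
    ratio = memory_ratio(total_kb, target_kb)
    if ratio >= low_ratio:
        return "steady state"
    if still_growing:
        # Total is low but climbing: the buffer pool is still warming up.
        return "warming up"
    # Total is low and flat: memory pressure or a low max server memory setting.
    return "possible memory pressure"


print(classify_memory(60_000_000, 64_000_000, still_growing=False))
```

A ratio near 1 is the healthy case; a low, flat ratio is only a prompt to check the supporting counters, not proof of a memory shortage.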

So when do I need to add more memory?

If Total Server Memory is less than Target Server Memory, it can be a sign of memory pressure. But before going to the business to ask for more money for more memory, evaluate some other counters to confirm that SQL Server is actually under memory contention.

Start with Page Life Expectancy, which should be well above 300.  This tells you how long, in seconds, pages are staying in the buffer pool; a value of 300 equates to 5 minutes.  If you have 120 GB of buffer pool and it turns over completely every 5 minutes, that equates to 409.6 MB/sec of sustained disk I/O, which is a lot of disk activity for the system to sustain.
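The arithmetic behind that figure is simply buffer pool size divided by Page Life Expectancy. A minimal sketch follows; the function name is hypothetical, and it assumes the whole buffer pool turns over once per PLE interval.

```python
def churn_mb_per_sec(buffer_pool_gb: float, ple_seconds: float) -> float:
    """Approximate sustained read rate implied by a given Page Life Expectancy.

    Assumes the entire buffer pool is replaced once every ple_seconds.
    """
    if ple_seconds <= 0:
        raise ValueError("ple_seconds must be positive")
    return buffer_pool_gb * 1024 / ple_seconds  # GB -> MB, then per second


# 120 GB of buffer pool turning over every 300 seconds
print(f"{churn_mb_per_sec(120, 300):.1f} MB/sec")  # 409.6 MB/sec
```

The same calculation shows why a fixed threshold of 300 means very different I/O loads on servers with different amounts of memory.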

Examine Lazy Writes/sec, which tells you how many buffers per second the buffer pool is flushing to disk outside of the CHECKPOINT process.  This should be near zero.  Also review Free Pages and Free List Stalls/sec.  You don’t want to see Free Pages bottom out, which results in a Free List Stall while the buffer pool has to free pages for use.  Lastly, look at Memory Grants Pending, which tells you whether processes are waiting on workspace memory before they can execute.

If these supporting counters also point to memory pressure (low Page Life Expectancy, sustained lazy writes, free list stalls, or pending memory grants), then it may be time to increase memory allocation.
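One way to combine these supporting counters is a simple checklist. The sketch below is illustrative only: the class, field names, and the lazy-writes ceiling are assumptions, while the 300-second floor for Page Life Expectancy comes from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryCounters:
    """Hypothetical container for one sample of the supporting counters."""
    page_life_expectancy: float      # seconds
    lazy_writes_per_sec: float
    free_list_stalls_per_sec: float
    memory_grants_pending: int


def memory_pressure_signals(c: MemoryCounters,
                            ple_floor: float = 300,
                            lazy_writes_ceiling: float = 1.0) -> List[str]:
    """Return the counters that suggest memory pressure.

    lazy_writes_ceiling is an assumed stand-in for "near zero".
    """
    signals = []
    if c.page_life_expectancy <= ple_floor:
        signals.append("Page Life Expectancy is low")
    if c.lazy_writes_per_sec > lazy_writes_ceiling:
        signals.append("Lazy Writes/sec is not near zero")
    if c.free_list_stalls_per_sec > 0:
        signals.append("Free List Stalls/sec is non-zero")
    if c.memory_grants_pending > 0:
        signals.append("Memory Grants Pending is non-zero")
    return signals


sample = MemoryCounters(page_life_expectancy=180, lazy_writes_per_sec=25,
                        free_list_stalls_per_sec=0, memory_grants_pending=2)
for line in memory_pressure_signals(sample):
    print(line)
```

An empty list suggests the Total versus Target gap has another cause, such as a low max server memory setting; several signals together make a stronger case for more memory.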

Delta Bravo Performance Counters: CPU % vs. Process Privileged Time (Total)

Why is this an important SQL Server Performance Indicator?

Delta Bravo uses this metric to determine whether the processor problems originate from internal Windows processes, or are caused by a user application. If Delta Bravo identifies high CPU usage on a SQL Server instance, the next step is to narrow down the high CPU problem to the lowest possible level–the component which is causing high CPU.

The CPU % vs. Process Privileged Time (Total) counter helps Delta Bravo understand how much processor time the system spends executing Windows kernel code, such as servicing SQL Server I/O requests. If the value is high, kernel-mode processes are using a lot of processor time: the machine is busy executing basic operating system tasks and has less capacity left for user processes and applications such as SQL Server. The recommended value for Process Privileged Time is 5 to 10%, with a maximum of 20%, of % Total Processor Time.
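The recommended ranges can be expressed as a quick check. This sketch reads the guidance as "privileged time as a share of total processor time", which is one interpretation of the text; the function names and the labels are hypothetical.

```python
def privileged_share(privileged_pct: float, total_pct: float) -> float:
    """Privileged time as a percentage of total processor time."""
    if total_pct <= 0:
        return 0.0  # an idle processor has no meaningful share
    return 100.0 * privileged_pct / total_pct


def classify_privileged(privileged_pct: float, total_pct: float) -> str:
    """Apply the 5-10% recommended / 20% maximum guidance from the text."""
    share = privileged_share(privileged_pct, total_pct)
    if share <= 10:
        return "normal"
    if share <= 20:
        return "elevated"
    return "high"


# 30% privileged time out of 80% total processor time
print(f"{privileged_share(30, 80):.1f}%", classify_privileged(30, 80))  # 37.5% high
```

A "high" result points toward kernel-mode work such as I/O or driver activity rather than query execution as the source of the CPU load.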

Do you know what’s keeping your processor busy?

There are two different states to be aware of when talking about processors executing instructions: Privileged mode and User mode.  Some operating system threads and interrupts (including all device driver functions) as well as Kernel-mode threads execute in privileged mode.

When dealing with Privileged mode operations, there are two modes to consider – Interrupt mode and Deferred Procedure Call (DPC) mode.

Interrupt mode is reserved for interrupt service routines, which are device driver functions.  In Performance Monitor, % Interrupt Time is the time the processor spends receiving and servicing hardware interrupts during the sample interval.  This value is an indirect indicator of the activity of devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network interface cards and other peripheral devices.  These devices normally interrupt the processor when they have completed a task or require attention.  Normal thread execution is suspended during interrupts.  Most system clocks interrupt the processor every 10 milliseconds, creating a background of interrupt activity.

DPC mode is time spent in routines known as deferred procedure calls – routines scheduled by device drivers to complete interrupt processing.  DPCs are often referred to as soft interrupts.  In Performance Monitor, the % DPC Time counter shows the percentage of time that the system was executing in DPC mode.  Measuring these counters separately can provide insight into whether a problem lies in the interrupt service routine or in its DPC.

Taking Action with Delta Bravo

If you notice that the % Interrupt Time counter is much higher than in your baseline, you may have a problem with a device driver or a piece of hardware.  Compare the Interrupts/sec counter between the baseline and your current performance log.  If the current rate is proportional to the level in the baseline, then the device driver code is the most likely culprit.  If the interrupt rate is significantly higher, you are probably experiencing a hardware issue.
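That decision rule can be sketched as follows. It assumes % Interrupt Time has already been found to be elevated, and the 25% tolerance for "proportional to the baseline" is an arbitrary illustrative value, as is the function name.

```python
def diagnose_interrupts(baseline_per_sec: float, current_per_sec: float,
                        tolerance: float = 0.25) -> str:
    """Compare Interrupts/sec with a baseline when % Interrupt Time is high.

    tolerance is a hypothetical band for "roughly the same rate".
    """
    if baseline_per_sec <= 0:
        raise ValueError("baseline_per_sec must be positive")
    ratio = current_per_sec / baseline_per_sec
    if ratio > 1 + tolerance:
        # Far more interrupts than usual: a device is misbehaving.
        return "suspect hardware"
    if ratio >= 1 - tolerance:
        # Same interrupt rate but more time spent servicing them.
        return "suspect device driver"
    return "inconclusive"


print(diagnose_interrupts(1000, 1100))  # suspect device driver
print(diagnose_interrupts(1000, 5000))  # suspect hardware
```

The result is only a starting point for investigation; it narrows where to look and does not identify the specific driver or device.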