The main cause of VMware host CPU saturation is that the VMs running on a host demand more CPU resources than the host can provide.
Definition
VMware host CPU saturation typically arises in one of three scenarios, each calling for a slightly different approach:
- The host has a low number of VMs each with high CPU demand: In this scenario, you may want to first try increasing the efficiency of CPU usage within the VMs. VMs with high CPU demands are those most likely to benefit from simple performance tuning. In addition, unless care is taken with VM placement, moving VMs with high CPU demands to a new host may simply move the problem to that host.
- The host has a high number of VMs each with low or moderate CPU demand: In this scenario, reducing the number of VMs on the host by re-balancing load is likely to be the simplest solution. Performance tuning on VMs with low CPU demands yields smaller benefits per VM, and can be time-consuming if the VMs run different applications.
- The host has a wide mix of VMs with both high and low CPU demand: The appropriate solution will depend on available resources and skill sets. If additional VMware ESX hosts are available, it may be best to re-balance load. If additional hosts are not available, or expertise exists for tuning the high-demand VMs, then increasing efficiency may be the best approach.
Solutions
Without adding compute resources, there are four potential approaches to resolving VMware host CPU saturation:
Solution #1: Reducing the number of VMs
The most common and easiest solution is to reduce the demand for CPU resources on the host by migrating VMs to ESX hosts with available CPU resources. To do this successfully, use the performance charts to determine the CPU usage of each VM on the host, and monitor and record the available CPU resources on the target hosts, so that the migrated VMs do not cause CPU saturation there. Remember to account for the peak usage of all Oracle on VMware and SQL Server on VMware applications. This data is available in the historical performance charts when vCenter is used to manage the ESX hosts.
If vSphere's VMotion feature is configured, and the VMs and hosts meet the requirements for VMotion, the load can be re-balanced with no downtime for the affected VMs.
If additional ESX hosts are not available, host CPU saturation can be eliminated by powering off non-critical VMs, which makes additional CPU resources available for critical applications. Even mostly idle VMs consume CPU resources, so power them off if you can. If you cannot, consider increasing their efficiency or using resource controls.
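The capacity check described above can be sketched in a few lines of Python. This is only an illustration: the `Host` and `VM` records, the host and VM names, and the 20% headroom figure are assumptions, and the peak values are expected to be read by hand from the historical performance charts. Summing peaks is deliberately conservative, since the peaks of different VMs rarely coincide exactly.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity_mhz: int    # total CPU capacity of the host
    peak_used_mhz: int   # peak CPU usage taken from the historical charts


@dataclass
class VM:
    name: str
    peak_mhz: int        # peak CPU usage of the VM, also from the charts


def fits(vm, host, headroom=0.20):
    """Return True if the host keeps `headroom` spare capacity at peak
    after receiving the VM. The 20% default is an assumed safety margin."""
    budget = host.capacity_mhz * (1.0 - headroom)
    return host.peak_used_mhz + vm.peak_mhz <= budget


def candidate_targets(vm, hosts, headroom=0.20):
    """List the hosts that could take the VM without becoming saturated."""
    return [h.name for h in hosts if fits(vm, h, headroom)]


if __name__ == "__main__":
    hosts = [Host("esx02", 20000, 15000), Host("esx03", 20000, 9000)]
    print(candidate_targets(VM("db01", 4000), hosts))
```

In this example only `esx03` is returned, because adding the VM's 4000 MHz peak to `esx02` would push it past the 16000 MHz budget.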
Solution #2: Increasing CPU Resources with DRS Clusters
Using DRS clusters to increase available CPU resources is similar to manually reducing the number of VMs on a host. In a DRS cluster, however, load is re-balanced automatically, so it is not necessary to manually compute the load compatibility of specific VMs and hosts, or to account for peak-usage periods.
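To give a feel for what automated re-balancing involves, the following Python sketch greedily moves VMs from the most loaded host to the least loaded one. It is not the DRS algorithm, which is not described in this article; it assumes identical hosts, uses made-up host and VM names, and ignores memory, affinity rules, and migration cost.

```python
def rebalance(hosts, max_moves=10):
    """Greedy illustration of load re-balancing (not the DRS algorithm).

    hosts maps host name -> {vm name: CPU demand in MHz} and is modified
    in place. Returns the list of (vm, source, destination) moves.
    """
    moves = []
    for _ in range(max_moves):
        load = {h: sum(vms.values()) for h, vms in hosts.items()}
        src = max(load, key=load.get)   # most loaded host
        dst = min(load, key=load.get)   # least loaded host
        gap = load[src] - load[dst]
        # Moving a VM with demand d changes the gap to |gap - 2d|,
        # so only VMs with 0 < d < gap narrow it.
        movable = {vm: d for vm, d in hosts[src].items() if 0 < d < gap}
        if not movable:
            break
        vm = min(movable, key=lambda v: abs(gap - 2 * movable[v]))
        hosts[dst][vm] = hosts[src].pop(vm)
        moves.append((vm, src, dst))
    return moves


if __name__ == "__main__":
    cluster = {
        "esx01": {"a": 6000, "b": 5000, "c": 1000},
        "esx02": {"d": 2000},
    }
    print(rebalance(cluster))
    print(cluster)
```

Here a single move of VM `b` leaves both hosts at 7000 MHz, after which no further move would narrow the gap and the loop stops.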
Solution #3: Increasing Efficiency in Oracle VMware and SQL Server VMs
The amount of CPU resources available on a given host is finite. In order to increase the amount of work that can be performed on a saturated host, it is necessary to increase the efficiency with which applications and Oracle and SQL Server VMs use those resources. CPU resources may be wasted due to sub-optimal tuning of applications and operating systems within VMs, or inefficient assignment of host resources to VMs.
The efficiency with which an application/OS combination uses CPU resources depends on many factors specific to the application, OS, and hardware. While a discussion of how to tune applications, databases, and OSes is a topic for another article, the following steps may provide value:
- Tune the applications using the most CPU. Most application vendors provide performance tuning guides that document best-practices and procedures for their application. These guides often include OS-level tuning advice and best-practices.
- Tune the operating systems to be more efficient from a resource usage perspective by consulting OS vendor resources. There are also many books and white-papers that discuss the general principles of performance tuning.
These OEM procedures, best practices, and industry recommendations apply to both virtualized and non-virtualized environments. There are, however, some application and OS-level tunings that are particularly effective in a virtualized environment. They include:
- Configuring the application and guest OS to use large pages when allocating memory. See the VMware technical paper on Large Page Performance for more information and instructions on enabling large pages.
- Reducing the timer-interrupt rate for the guest OS. A high timer-interrupt rate does not necessarily cause performance problems, but it can add overhead that may impact the performance of a VM. To check the timer-interrupt rate from esxtop, select the CPU screen (press c) and add SUMMARY STATS (press f then h). Review the TIMER/S measurement for all VMs to see if it is higher than 1000 for any VM. If some VMs have a high timer-interrupt rate, it may be possible to reduce the rate and thus the overhead. See the VMware technical paper Timekeeping in VMware Virtual Machines for more information.
- Allocating more memory to a VM may enable the application running in the VM to operate more efficiently. Additional memory may enable the application to reduce I/O overhead or allocate more space for critical resources. Check the performance tuning information for the specific application to see if additional memory will improve efficiency. Remember that some applications need to be explicitly configured to use additional memory.
- Reducing the number of vCPUs allocated to VMs that are not using their full CPU resource will make more resources available to other VMs. Even if the extra vCPUs are idle, they still incur a cost in CPU resources, both in the VMware ESX scheduler, and in the guest OS overhead involved in managing the extra vCPU.
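Two of the checks in the list above, the timer-interrupt rate and over-allocated vCPUs, lend themselves to a simple review script. The Python sketch below assumes the figures have already been collected by hand (for example from esxtop and the performance charts). The function name, its parameters, and the rule that 100% of usage corresponds to one fully busy vCPU are illustrative assumptions; only the 1000 interrupts per second threshold comes from the text.

```python
import math

# Threshold quoted in the text: more than 1000 timer interrupts per second.
TIMER_RATE_THRESHOLD = 1000


def review_vm(name, vcpus, peak_used_pct, timers_per_sec):
    """Return a list of tuning findings for one VM.

    peak_used_pct is the VM's peak CPU usage, where 100 is assumed to mean
    one fully busy vCPU (so 250 means two and a half vCPUs' worth of work).
    """
    findings = []
    if timers_per_sec > TIMER_RATE_THRESHOLD:
        findings.append(
            f"{name}: timer rate {timers_per_sec}/s exceeds "
            f"{TIMER_RATE_THRESHOLD}/s"
        )
    # Smallest vCPU count that still covers the observed peak.
    needed = max(1, math.ceil(peak_used_pct / 100))
    if needed < vcpus:
        findings.append(
            f"{name}: peak usage fits in {needed} vCPU(s), {vcpus} allocated"
        )
    return findings


if __name__ == "__main__":
    for finding in review_vm("web01", vcpus=4, peak_used_pct=130.0,
                             timers_per_sec=250):
        print(finding)
```

A finding is a prompt for investigation, not a verdict: whether a vCPU can actually be removed still depends on the application and on how representative the observed peak is.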
Solution #4: Using Resource Controls
If it is not possible to re-balance CPU load or increase efficiency, or if all possible steps have been taken and host CPU saturation still exists, then using resource controls may be the solution. Some applications, such as batch jobs or long-running numerical calculations, respond to a shortage of CPU resources by taking longer to complete, but their results are still correct and useful. Other resource-sensitive applications may experience failures, or may be unable to meet critical business requirements, when denied sufficient CPU resources. The resource controls available in vSphere 4 can be used to ensure that these resource-sensitive applications always get sufficient CPU resources, even when VMware host CPU saturation exists. Additional information about resource controls is available in the vSphere Resource Management Guide.
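The effect of resource controls under contention can be illustrated with a simplified proportional-share model in Python. This is a sketch under stated assumptions, not the ESX scheduler: the `allocate` function and the `shares`, `reservation`, `limit`, and `demand` keys are hypothetical names that mirror the vSphere concepts. Each VM first receives its reservation, and the remaining capacity is then divided in proportion to shares, never exceeding a VM's demand or limit.

```python
def allocate(capacity, vms):
    """Divide `capacity` MHz among VMs (simplified model, not the ESX scheduler).

    vms maps name -> dict with 'shares' and 'demand', plus optional
    'reservation' (default 0) and 'limit' (default None, meaning no limit).
    """
    # The most a VM can use: its demand, capped by its limit if one is set.
    cap = {}
    for name, vm in vms.items():
        limit = vm.get("limit")
        cap[name] = vm["demand"] if limit is None else min(vm["demand"], limit)

    # Step 1: honour reservations, up to what each VM can actually use.
    alloc = {n: min(vms[n].get("reservation", 0), cap[n]) for n in vms}
    remaining = capacity - sum(alloc.values())
    active = {n for n in vms if alloc[n] < cap[n]}

    # Step 2: hand out the rest in proportion to shares. Capacity that a
    # satisfied VM does not need is re-offered to the others.
    while remaining > 1e-9 and active:
        total_shares = sum(vms[n]["shares"] for n in active)
        satisfied, distributed = set(), 0.0
        for n in active:
            offer = remaining * vms[n]["shares"] / total_shares
            take = min(offer, cap[n] - alloc[n])
            alloc[n] += take
            distributed += take
            if alloc[n] >= cap[n] - 1e-9:
                satisfied.add(n)
        remaining -= distributed
        if not satisfied:
            break
        active -= satisfied
    return alloc


if __name__ == "__main__":
    vms = {
        "batch": {"shares": 1000, "demand": 4000},
        "db":    {"shares": 2000, "demand": 4000},
        "idle":  {"shares": 1000, "demand": 500},
    }
    print(allocate(6000, vms))
```

With 6000 MHz available, the lightly loaded VM receives its full 500 MHz, and the database VM, holding twice the shares of the batch VM, receives twice as much of what is left. The batch job still runs, only more slowly.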
Confio IgniteVM
Confio IgniteVM helps identify the impact of CPU ready time for sites running Oracle on VMware, SQL Server on VMware, and other virtual databases. IgniteVM helps DBAs maintain performance and availability on virtual servers. IgniteVM is the only virtualization-aware database monitoring solution.