Wait-Time Analysis
Introduction to Wait-Time Analysis
Until recently, tuning IT application performance has been largely a guessing game. DBAs are finding this unacceptable, given the relentless focus their IT organizations place on cost efficiency and productivity. The traditional approaches to database and application tuning, which involve collecting large volumes of statistics and making trial-and-error changes, are still in widespread use, but these "server-oriented" statistics do not give DBAs a true reflection of the end-user experience.
In response to the increasing need for more customer-relevant data, leading consultants, DBAs and training organizations are focusing on performance tuning practices, such as Wait-Time analysis, that are directly tied to end-user service levels and improvements in operating efficiency.
What is Wait-Time Analysis?
Wait-Time analysis is a new approach to application and database performance improvement that allows users to make tuning decisions based on optimal service impact. The principles of Wait-Time analysis allow DBAs, developers and application owners to align their efforts with the service levels desired by their IT customers.
Wait-Time analysis for IT applications focuses solely on measuring and improving the service time delivered to the IT customer. By identifying exactly what contributes to longer service time, IT professionals can focus not on the thousands of available statistics, but on the most important bottlenecks that have a direct and quantifiable impact on the IT customer.
The Problem with Conventional Statistics
Typical database performance monitoring tools can be sophisticated in their measurement and presentation of event counts and execution ratios. While somewhat meaningful and easy to capture, these statistics do not reflect a relevant view of the end-user experience, nor do they reveal with any precision where the problem originated. Assessing performance without focusing on time impact leaves DBAs guessing about what actions to take to address their most important user-oriented problems.
There are several critical distinctions between Wait-Time analysis and standard performance monitoring tools. The first is that Wait-Time analysis specifically measures the length of time it takes for an action to take place, from the moment of request to completion. Also, instead of relying on system-wide averages, an effective Wait-Time implementation measures the contribution made by each step performed by the server to that total wait time. The result is data that allows DBAs to precisely isolate not only which server queries are causing wait time for the end user, but which SQL statements within those queries are causing the delays.
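To make the idea of per-step contribution concrete, here is a minimal Python sketch. It assumes wait samples have already been captured as (SQL statement, wait step, seconds) tuples; the sample data, the statement identifiers, the step names, and the function names are all hypothetical and do not come from any particular monitoring product.

```python
from collections import defaultdict

# Hypothetical samples: (sql_id, wait_step, seconds_waited).
# The identifiers and step names are illustrative only.
SAMPLES = [
    ("SQL_A", "disk read", 4.0),
    ("SQL_A", "CPU", 1.0),
    ("SQL_B", "lock wait", 12.0),
    ("SQL_B", "CPU", 0.5),
    ("SQL_A", "disk read", 2.5),
]


def wait_breakdown(samples):
    """Return {sql_id: {wait_step: total_seconds}} for the given samples."""
    totals = defaultdict(lambda: defaultdict(float))
    for sql_id, step, seconds in samples:
        totals[sql_id][step] += seconds
    return {sql_id: dict(steps) for sql_id, steps in totals.items()}


def rank_by_total_wait(breakdown):
    """Return [(sql_id, total_seconds)], largest total wait first."""
    totals = {sql_id: sum(steps.values()) for sql_id, steps in breakdown.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    breakdown = wait_breakdown(SAMPLES)
    for sql_id, total in rank_by_total_wait(breakdown):
        print(sql_id, total, breakdown[sql_id])
```

Ranking statements by accumulated wait, rather than by how often they run, puts the statement that costs end users the most time at the top of the list, and the per-step totals show where within that statement the time goes.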

Another key limitation of typical IT monitoring tools is that they create individual information 'silos' that localize statistics for a single type of system and do not expose the end user's view of performance.
Without the ability to track the flow of transactions across multiple systems, each IT group can only try to optimize its own statistics, not the response time to the customer. And without a collaborative view of the contributions of each system to customer wait-time, the result is a 'finger-pointing' session where blame is deflected from one group to the next with no real resolution.
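A collaborative view of this kind can be as simple as expressing each system's time as a share of the end-to-end response time. The sketch below assumes the time one transaction spent in each tier has already been measured; the tier names, the timings, and the function names are hypothetical.

```python
def tier_shares(tier_seconds):
    """Return each tier's fraction of the total end-to-end response time."""
    total = sum(tier_seconds.values())
    if total == 0:
        return {tier: 0.0 for tier in tier_seconds}
    return {tier: seconds / total for tier, seconds in tier_seconds.items()}


def largest_contributor(tier_seconds):
    """Return the name of the tier that accounts for the most wait time."""
    return max(tier_seconds, key=tier_seconds.get)


if __name__ == "__main__":
    # Hypothetical timings (seconds) for one end-user transaction.
    transaction = {
        "web server": 0.25,
        "application server": 0.75,
        "database": 3.0,
    }
    for tier, share in tier_shares(transaction).items():
        print(f"{tier}: {share:.1%}")
    print("largest contributor:", largest_contributor(transaction))
```

Because every group sees the same breakdown of the same customer-facing number, the discussion can start from which tier owns the largest share rather than from each group's local statistics.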
Managing Service Levels
Because Wait-Time analysis measures the collective time delays that cause end users to wait for an information request, it is the measurement technique most closely matched to end-user service levels. For organizations focused on Service Level Management (SLM) techniques, or those bound by Service Level Agreements (SLAs), Wait-Time analysis techniques allow the IT department to measure the performance that is most relevant to achieving the stated service level goals. Service level management typically identifies technical metrics that define whether performance is adequate, and Wait-Time data is the basis for evaluating these metrics.
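As an illustration of evaluating a service level metric from Wait-Time data, the following Python sketch checks measured response times against a target. The form of the service level (a required fraction of requests completing within a time limit), the numbers, and the function names are assumptions made for the example, not terms taken from any specific SLA.

```python
def sla_compliance(response_seconds, target_seconds):
    """Return the fraction of requests that completed within the target time."""
    if not response_seconds:
        raise ValueError("at least one response time is required")
    within = sum(1 for seconds in response_seconds if seconds <= target_seconds)
    return within / len(response_seconds)


def meets_sla(response_seconds, target_seconds, required_fraction):
    """Return True if enough requests completed within the target time."""
    return sla_compliance(response_seconds, target_seconds) >= required_fraction


if __name__ == "__main__":
    # Hypothetical end-user response times, in seconds.
    observed = [0.5, 1.0, 1.5, 3.0]
    print("compliance:", sla_compliance(observed, target_seconds=2.0))
    print("meets 95% goal:", meets_sla(observed, 2.0, required_fraction=0.95))
```

The input here is the time the end user actually waited, so the result maps directly onto a stated service level goal rather than onto a server-side ratio.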
Practical Considerations for Wait-Time Analysis
The Wait-Time approach to performance monitoring described here is only practical if it can be implemented efficiently in a performance-sensitive production environment. Effective, reliable, and low-impact approaches are available to meet these emerging requirements for efficiency. Here are some practical considerations:
| Key Considerations | Importance |
|---|---|
| Low-Impact Data Capture | Data capture should not place a burden on your production systems. Agentless architectures offload processing to a separate system, which reduces production impact. |
| Agentless Database Operation | Further reduce production impact by capturing session data on remote servers that allow data to be stored and analyzed offline. |
| Lightweight Application Monitoring | Byte Code Instrumentation (BCI) allows for full, granular, and continuous monitoring of J2EE applications with an overhead of less than 1%. |
| Passive Monitoring of Production Data | Monitor real production transactions, not simulated test transactions. |
| Continuous Monitoring | Insist on continuous monitoring across all sessions to ensure any operation can be deeply examined at any time. |
With increased focus on service levels as the most important measure of IT productivity, Wait-Time analysis has emerged as the preferred monitoring technique for customer-focused organizations. Wait-Time analysis tells IT organizations the exact origin of a problem, what impact that problem is having on the end user, and which organization can best fix it.