System Tuning Instead of Database Tuning

 

If you ask experts in application development about the best ways to do performance tuning, they will give you a hundred possible answers. If you ask expert DBAs about the same issue, the number of ideas will be in the thousands. From the point of view of the system architect, it seems as though everybody knows perfectly well how to make the part of the system that is their responsibility perform as quickly as possible. However, sooner or later every project hits exactly the same brick wall: the system does not run fast enough. This paper proposes a more structured, big picture “performance grammar” approach that can be helpful in all stages of the project development lifecycle.

Tuning Approaches

Historically, all tuning methodologies could be separated into two main groups:

1.        Product-based

a.        Native

b.       Third-party

2.        Function-based

 

This section will discuss each type in turn.

 

Native Product-Based Monitoring Tools

The first group (native tools) is really a matter of utilizing whatever a specific product provides. As a result, the whole methodology is based on two dimensions:

 

·         What information about the behavior of the product is available at each point in time;

·         How that information can be used to make the system perform better.

 

The first dimension is richest if you are looking at the RDBMS or the network. For example, in the Oracle RDBMS everyone is accustomed to STATSPACK and TKPROF; version 10g introduced a whole new set of automatic monitors, and the number of data sources grows with each release. But the analysis of the information provided by all of these tools is still an art. There are just too many permutations and nuances, even for “gurus,” to understand exactly what is going on. Recommendations often look like “J. Lewis thinks…” or “Tom says…,” because replicating the whole scenario and the reasoning behind it takes too much time. And can the scenario from that example really be applied to your case?
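As a small illustration of the first dimension (the raw data a native tool can produce), the following sketch shows how a session might be flagged for SQL tracing from JDBC so that the resulting trace file can later be formatted with TKPROF; the connection details are placeholders and not part of any particular system.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TraceExample {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders; adjust to your environment.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/orcl", "scott", "tiger");
        try (Statement stmt = conn.createStatement()) {
            // Tag the trace file so it is easy to find in the trace directory.
            stmt.execute("ALTER SESSION SET tracefile_identifier = 'perf_check'");
            // Enable SQL tracing for this session; afterwards format the file with:
            //   tkprof <trace file> report.txt
            stmt.execute("ALTER SESSION SET sql_trace = TRUE");

            // ... run the workload to be analyzed here ...

            stmt.execute("ALTER SESSION SET sql_trace = FALSE");
        } finally {
            conn.close();
        }
    }
}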

 

In the author’s experience, the native product-based approach leaves little time for real problem-solving. First, you search for a more-or-less similar problem pattern, and then you modify your case to look like the closest one. Although this may move the project forward, it does not solve the problem itself: you have merely avoided it with an often less effective solution that happened to work.

Table 1 summarizes the pros and cons of using native product-based monitoring tools.

Pro: Monitoring tools are tightly integrated with the kernel, which minimizes the impact of monitoring on the main process and avoids other side effects.
Con: Nobody can guarantee that in the next version the tool will still be available or have the same features. This may lead to migration issues.

Pro: Many tools are used by the manufacturers themselves (such as TKPROF), which makes them highly effective and largely bug-free.
Con: We have to trust the manufacturer; an independent second opinion is missing.

Pro: If a manufacturer has a number of products, it is easier to get integrated information across them.
Con: It is problematic to integrate information coming from different levels and manufacturers.

Pro: These tools usually provide a lot of information and focus on low-level data.
Con: Huge data volumes make identifying the needed information problematic.

Table 1: Pros and Cons of Native Product-Based Monitoring Tools

 

Third-Party Product-Based Monitoring Tools

Third-party tools try to resolve some issues that cannot be resolved using native tools. Any significant player on the market has (or thinks it has) a product to make developers’ lives better. Confio Ignite for Java/Oracle, Quest SQL Tuning, and Embarcadero Performance Center, despite their differences, also have a lot in common. If you look at the changes in these products over the last few years, you can see that what started as “yet another performance monitoring tool” has gradually evolved into something close to a decision support system (DSS) with the following features:

·         Versioning of solutions

·         Possibility of comparing multiple solutions by a user-defined set of parameters

·         Libraries of possible solutions for each class of problem

·         Advanced reporting capabilities

 

Oracle version 10g aggressively entered this arena by providing a large number of performance advisors, but we have to be thankful to other software manufacturers for these changes. Everybody can now recognize that having hundreds of init.ora parameters is not enough to tune the system. You need both a DSS input matrix and an analytical matrix (a small code sketch of these matrices follows the list):

 

·         Input matrix (Methods/Parameters/Results) – specifies the decisions that could be made to resolve a given problem. There are different ways of describing the set of possible solutions, but the idea stays the same: you identify all valid algorithms and, for each algorithm, describe all input data (variables/conventions/definitions). For each combination of input parameters, you provide either the expected or the tested result (whenever possible).

·         Analytical matrix (Precision/Effectiveness/Cost) – evaluates each valid option to a level where more quantitative priorities can be set. First, you define the solution boundaries (for example, some memory-tuning techniques are applicable in one operating system and not in another). Next, you make an expert judgment about how good the solution is based purely on its effectiveness (complete/mostly complete/partial, etc.). The last task is to estimate all costs associated with both the development and the implementation of the solution.
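As a purely hypothetical illustration of the two matrices (none of the names below come from any real tool), a candidate solution can be represented with its input-matrix attributes and its analytical-matrix attributes, and the applicable option with the best effectiveness-to-cost ratio can then be selected:

import java.util.List;

/** Hypothetical illustration of the input/analytical matrices described above. */
public class TuningDecision {

    /** One candidate solution from the input matrix. */
    record Candidate(String method,
                     String parameters,
                     String expectedResult,
                     boolean applicable,      // solution boundaries (e.g. OS-specific)
                     double effectiveness,    // 0.0 = partial ... 1.0 = complete
                     double cost) {           // development + implementation cost
    }

    /** Pick the applicable candidate with the best effectiveness-to-cost ratio. */
    static Candidate choose(List<Candidate> candidates) {
        return candidates.stream()
                .filter(Candidate::applicable)
                .max((a, b) -> Double.compare(a.effectiveness() / a.cost(),
                                              b.effectiveness() / b.cost()))
                .orElseThrow(() -> new IllegalStateException("no applicable solution"));
    }

    public static void main(String[] args) {
        List<Candidate> options = List.of(
                new Candidate("Increase SGA", "sga_target=2G", "fewer disk reads",
                              true, 0.6, 2.0),
                new Candidate("Rewrite query with bind variables", "n/a", "less parsing",
                              true, 0.9, 5.0));
        System.out.println("Chosen: " + choose(options).method());
    }
}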

 

With the current level of complexity of database-related projects, there are very few problems that have one and only one correct solution. Even if such a solution exists, the time allocated to the problem is usually less than the time required to find the optimal answer. As a result, in the near future the integration of DSS and artificial intelligence algorithms into the tuning process will increase. Table 2 summarizes the pros and cons of third-party product-based monitoring tools.

 

Pro: Pre-processing of raw data makes the process of performance analysis simpler.
Con: We have to trust the software manufacturer, because the pre-processing technique is hidden.

Pro: DSS methods are becoming more and more popular. Self-educating advisors look like a logical step forward.
Con: Nobody yet knows where to draw the line between automatic and manual processes.

Pro: These tools provide better integration with non-Oracle software.
Con: The additional cost of third-party tools must be added to the cost of Oracle licenses.

Table 2: Pros and Cons of Third-Party Monitoring Tools

 

There is one major problem with any product-based performance tuning concept: it is barely a “methodology,” because it is usually nothing more than a set of tools and best practices for using those tools. There could be a number of reasons, but the major one is very simple: there is no single product that can be used to build the whole system. Each tool may be perfect for its intended purpose, but there are currently no third-party tools that provide true end-to-end solutions.

Function-Based Tuning

Function-based tuning concepts have naturally evolved from the problems that developers encountered with product-based solutions. It has been the author’s experience that, sooner or later, most developers decide to build some kind of structure to capture performance information about the specific module they are working on. When writing a module, you always know its major functional parameters. As a result, you have a significantly better chance of setting the correct tuning priorities (for example, who cares if an overnight batch process takes 80% of the CPU as long as it loads all of the data before people come to the office). You also know what additional information could be crucial for tuning and debugging. Therefore, you can create your own tuning methodology by defining the following (a minimal logging sketch follows the list):

·         A set of “things” to be monitored – explicitly in the code

·         A set of additional data to be logged – explicitly in the code

·         A set of measured quantities that define the appropriate and inappropriate behavior of the module – in specifications

·         Best practices of interpreting the collected data – usually in your head
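As an illustration of this home-grown, function-based style of instrumentation, the following minimal sketch (all names are hypothetical) logs a module’s elapsed time together with whatever additional context its author decided was worth capturing:

import java.util.Map;
import java.util.logging.Logger;

/** Minimal home-grown instrumentation: each module logs what its author decided to monitor. */
public final class ModuleMonitor {
    private static final Logger LOG = Logger.getLogger("tuning");

    private final String module;
    private final long startNanos = System.nanoTime();

    public ModuleMonitor(String module) {
        this.module = module;
    }

    /** Called at the end of the module; extra holds whatever additional data was worth logging. */
    public void finish(Map<String, Object> extra) {
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
        LOG.info(() -> module + " took " + elapsedMs + " ms, context=" + extra);
    }

    public static void main(String[] args) throws InterruptedException {
        ModuleMonitor m = new ModuleMonitor("nightly_batch_load");
        Thread.sleep(120);                         // stand-in for the real work
        m.finish(Map.of("rowsLoaded", 100_000));   // extra data chosen by the module's author
    }
}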

Some development teams are starting to standardize these “home-grown” monitoring tools and reuse them across projects. But since these tools were built to answer very precise questions, they carry over the assumptions of the first module for which they were designed. Thus the “global” tuning methodology of the project is nothing more than an extension and summary of the methods of each functional module. Table 3 summarizes the pros and cons of function-based tuning.

 

Pro: A function-based approach is usually implemented in the most critical system areas. By looking at which areas have been monitored most actively, tuning efforts can be better focused throughout the development and implementation processes.
Con: This type of tuning is very limited by design and often difficult to extend or alter.

Pro: Function-based techniques are built by developers for developers, so there is no need for extra “translation.”
Con: When modifications are needed, the tuning concepts are not always used in the way originally intended, which can lead to unexpected results and unwanted side effects.

Pro: Since the team that built the monitoring tool is usually the same one using the information it provides, this information is clearly communicated and properly used.
Con: “Bottom-up” development often misses system-wide effects (integration of functions is the most complicated part).

Con: Tuning aspects of different modules may not be compatible.

Table 3: Pros and Cons of Function-Based Tuning

In general, the function-based approach works best as long as the project is not very large and the number of developers is limited. Otherwise, too many ideas end up hidden inside the code: you may be blinded by nice monitoring data and miss the real problem, or drive yourself crazy looking for a problem that is really a side effect of incompatible tuning methods. Neither situation is desirable.

System Tuning Methods

From the point of view of a rank-and-file developer, the current state of system tuning may be acceptable: there are new tools with new features, and so on. However, from the broader view of a system architect, the situation is far from satisfactory. The number of projects that fail because unacceptable performance is discovered in the later stages is enormous. There are a number of reasons for these failures:

1.        Communication gap between different teams involved in the same project

·         Having so many different tools available makes selecting the right one problematic. Architectural decisions become more complex because, in some cases, the decision is forced not by the current information but by market expectations. This is especially true with user interfaces (JSP, JSP/Struts, JSF, …).

·         There are too many details in each tool. Because of the increased complexity of the development environment and the increasingly narrow specialization of different teams, explaining why one solution is valid for the whole project and another is not can be problematic.

2.        “Wars” between different teams working on large projects

·         The question of where to place business logic can become a political rather than a functional decision. Without an influential architect at the top, the whole process can become an issue of who can first declare ownership of the problem.

3.        “Managerial” decisions instead of architectural ones

·         Because of the high project failure rate, system architects often have less decision-making power in a project than managers. This may decrease the initial cost of the system (so everything looks good on paper for top management), but it can cause major problems down the road during development and implementation, when cut corners produce disastrous results.

4.        Original goals were unreachable from the very beginning

·         There is no trusted way of predicting performance in the early stages of a project. It is still an architect’s vision that drives the original specifications. In large projects, it is impossible to predict how the whole system will “mutate” between the original specifications and the final implementation.

Many of these reasons for project failures can be avoided by asking two questions:

1.        How can you get different teams to speak the same language?

2.        How can you create a generic performance measuring mechanism?

Having a system tuning methodology is a reasonable way to answer both of these questions. The proposed tuning language consists of the following elements (a minimal sketch of how they can be recorded in code follows the list):

·         Command is an atomic part of the process (any command on any tier).

·         Step is a single logically complete operation, which consists of a number of commands.

·         Request is an action consisting of a number of steps. A request is passed between different processing tiers.

·         Arc is a complete processing cycle in one direction (always one-way) that could be either a communication step between one tier and another, or a set of steps on the same tier.

·         Token transfer is a point of the algorithm where the current activity is transferred from one level to another.

·         Round-trip is a complete cycle from the moment the request leaves the tier to the point when it comes back with some response information.
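The following is a minimal sketch (all class and method names are hypothetical) of how this vocabulary could be recorded at run time, with each request traced as a list of arcs and the round-trip time derived from them:

import java.util.ArrayList;
import java.util.List;

/** Hypothetical recorder for the tuning vocabulary: requests made of arcs/round-trips. */
public class RequestTrace {

    /** One arc: a one-way processing cycle on a tier or between tiers. */
    record Arc(String description, String fromTier, String toTier, long elapsedMs) { }

    private final String requestName;
    private final List<Arc> arcs = new ArrayList<>();

    public RequestTrace(String requestName) {
        this.requestName = requestName;
    }

    public void addArc(String description, String fromTier, String toTier, long elapsedMs) {
        arcs.add(new Arc(description, fromTier, toTier, elapsedMs));
    }

    /** Total round-trip time for the request, as seen by the tier that initiated it. */
    public long totalMs() {
        return arcs.stream().mapToLong(Arc::elapsedMs).sum();
    }

    public static void main(String[] args) {
        RequestTrace trace = new RequestTrace("move customer from state A to state B");
        trace.addArc("pre-process request", "client", "client", 12);
        trace.addArc("send request", "client", "app server", 35);
        trace.addArc("process request", "app server", "database", 180);
        System.out.println(trace.requestName + " took " + trace.totalMs() + " ms over "
                + trace.arcs.size() + " arcs");
    }
}

In practice such a trace would be written to a log table or file so that each request can be compared against its defined performance boundaries.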

Using this terminology, a generic 3-tier application could always be described as either a 9-arc or a 5-round-trip structure as shown in Figure 1.

Figure 1: Diagram of Generic 3-Tier Application

 

Using the example of a 3-tier system (JClient/Oracle IAS 10g/Oracle DB 10g) where a client application needs to move a customer from state A (newly received data) to state B (potential interest), the following steps apply:

The 9-arc structure represents the whole set of actions in the following way (the application-server arcs 3 through 7 are sketched in code after the list):

1.        Pre-process the request on the client

·         Collect information about currently connected users and other global parameters

2.        Send the request to the application server

·         Call a servlet with appropriate parameters

3.        Process the request on the application server

·         Take passed parameters and prepare required PL/SQL calls (set bind variables)

4.        Send the request to the database

·         Fire a database call via JDBC

5.        Process the request in the database

·         Execute requested procedures and set appropriate output bind variables

6.        Send the request back to the application server

·         Pass output variables back to the servlet

7.        Interpret the processed request on the application server

·         Analyze received data and prepare a response that can be understood by the client application

8.        Send the request back to the client

·         Return a response to the client

9.        Interpret the processed request on the client

·         Interpret the response and execute corresponding UI-activities
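A rough sketch of arcs 3 through 7 as they might look on the application server is shown below. The servlet, the PL/SQL procedure name, the parameters, and the JNDI data source name are all hypothetical; the sketch simply shows the passed parameters being bound, the JDBC call being fired, and the output bind variable being turned into a response the client can understand.

import java.io.IOException;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.Types;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

/** Hypothetical servlet covering arcs 3-7: prepare the call, fire it via JDBC, interpret the result. */
public class MoveCustomerServlet extends HttpServlet {

    private DataSource dataSource;  // assumed to be configured in the application server

    @Override
    public void init() throws ServletException {
        try {
            // JNDI name is a placeholder for a container-managed connection pool.
            dataSource = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/OrclDS");
        } catch (NamingException e) {
            throw new ServletException(e);
        }
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        long customerId = Long.parseLong(req.getParameter("customerId"));  // arc 3: take passed parameters
        try (Connection conn = dataSource.getConnection();
             CallableStatement call = conn.prepareCall(
                     "{ call customer_pkg.move_to_state(?, ?, ?) }")) {     // hypothetical PL/SQL API
            call.setLong(1, customerId);                                    // bind variables, not literals
            call.setString(2, "POTENTIAL_INTEREST");
            call.registerOutParameter(3, Types.VARCHAR);                    // output bind variable
            call.execute();                                                 // arcs 4-5: database call and processing
            String status = call.getString(3);                              // arc 6: output passed back
            resp.setContentType("text/plain");
            resp.getWriter().print(status);                                 // arcs 7-8: response the client understands
        } catch (Exception e) {
            throw new ServletException("move-to-state request failed", e);
        }
    }
}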

 

The 5-round-trip structure is significantly simpler and can be used when measuring a single-direction arc is problematic (this usually happens with arcs #2, 4, 6, and 8 – networking activity is always a bit tricky). A sketch of timing one such round-trip follows the list:

 At the client level:

1.        From the moment that a request is initiated to the end of processing

·          User clicked the button/response is displayed

2.        From the moment that a request is sent to the application server to the moment that the response is received

·         Start of servlet call to end of servlet call

At the application server level:

1.        From the moment that a request is accepted to the moment that a response is sent back

·         Start of processing in the servlet to end of processing in the servlet

2.        From the moment that a request is sent to the database to the moment that the response is received

·         Start of JDBC call to end of JDBC call

At the database level:

1.        From the moment that a request is accepted to the moment that a response is sent back

·         Start of the block/end of the block
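A minimal sketch of measuring the second application-server round-trip (start of JDBC call to end of JDBC call) is shown below; the query, table, and helper names are hypothetical, and the timing is simply written to a log:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.logging.Logger;

/** Hypothetical helper that times the JDBC round-trip (start of call to end of call). */
public class DbRoundTrip {
    private static final Logger LOG = Logger.getLogger("tuning");

    public static int countCustomersInState(Connection conn, String state) throws Exception {
        long start = System.nanoTime();                       // round-trip starts: request leaves the tier
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT COUNT(*) FROM customers WHERE state = ?")) {
            ps.setString(1, state);                           // bind variable
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            LOG.info("db round-trip took " + elapsedMs + " ms");  // round-trip ends: response received
        }
    }
}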

 

The described process, with corresponding analytical models, allows both developers and architects to communicate more precisely and articulate problems in a way that is understandable to all. Another advantage is that there is a common measuring point: since all actions are within the scope of the same request, it is possible to make comparisons. Switching from code-driven tuning (initiated by some programming event) to user-driven tuning (initiated by a user request) provides a significantly better understanding of end-user needs. Since IT systems are built to make the lives of end users better, the users’ feelings about the system can make or break a project.

The next section describes a number of real cases where the system tuning process described here was used.

 

Case Studies

The system tuning methodology described above is already working in a number of projects at Dulcian, Inc. Two examples representing both the classical client-server and the more contemporary 3-tier architectures will be described.

 

Multi-tier OLTP system with hundreds of simultaneous customers

For this project, we encountered the “famous” problem of inheriting a project from another team. As always, the architecture had already been defined, and there was a huge amount of existing code that was “supposed to work.” As the project got closer to completion, the following issues arose:

·         Parts of the code were written by different teams.

·         There were three different communication channels plus a number of internal pseudo-languages being used.

·         There were no defined end-user goals.

·         There were no defined performance measurements.

 

Because of time pressure, our team decided to tackle one problem at a time, but very soon we began to lose control of the system. At that point, the system tuning approach was implemented, which ended up saving both time and the system itself.

As a starting point, we needed to identify the most critical end-user requests. It is very important for development teams to understand the prioritization of system modules and their performance boundaries; otherwise it is impossible to say what is adequate and what is not. In this case, we had a set of tasks performed by a single operator a few times each week (OrgTree maintenance). Originally, this appeared to be a low-priority issue, but later we discovered that a single set of changes could be quite extensive and could have side effects in the running system. In the end, we had to not only modify the architecture but also spend a significant amount of time on tuning.

 

The next logical step was to identify the full path from initiation to completion (the start/end points of each arc and the points of “token transfer”) for each request, including all required resources for each arc/round-trip. The process of laying out the path can itself be helpful in identifying problems. When we described the whole process of creating a new customer from start to finish, we found that the last step (on the database) required too many exclusive row locks in a generic table whose rows were much shorter than a single block. Because Oracle keeps lock-related bookkeeping in the block itself, packing many locked rows into each block created more contention than was really needed. That discovery led us to completely rethink the database locking points in the system.

 

Once all of the request paths have been identified, it is possible to start defining standardized practices, since you have enough information to group requests by common properties. This step ensures that you do not have multiple implementations of the same task. In our project, a whole set of problems was caused by the fact that the Java programmers did not know SQL well and did not use bind variables. As a result, there were a number of very similar SELECT statements with hard-coded parameters. Replacing these queries with a more generic mechanism decreased wait times on initial loads from 25 seconds to 7; after some fine-tuning, that number was reduced again to 3 seconds.
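The following sketch illustrates the kind of change involved (the table and column names are hypothetical): a literal-laden statement is replaced by a single reusable statement with bind variables, so the database can share one parsed cursor instead of hard-parsing every variation.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CustomerQueries {

    // Anti-pattern: a new, literal-laden statement for every customer id forces a hard parse each time:
    //   "SELECT name FROM customers WHERE customer_id = 12345"

    /** Reusable version: one shared SQL text, values supplied as bind variables. */
    public static String findCustomerName(Connection conn, long customerId) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT name FROM customers WHERE customer_id = ?")) {
            ps.setLong(1, customerId);            // bind variable instead of a hard-coded literal
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}

The generic version also provides a single, well-defined place to attach the round-trip timing described earlier.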

 

The last part of the process is to validate the scalability of the system. For each request type, we defined an average and a maximum number of requests per period of time and re-evaluated the required resource consumption of each step in the path. A problem area was detected because of the nature of Dulcian’s State-Transition Engine (STE). Even in a 3-tier environment, at some point we had to keep the connection stateful (a series of commands from the same user should run in the same session). But if you have several hundred simultaneous users, you may have to keep several hundred unique connections just for the STE. Since this was a waste of resources, we created a more advanced mechanism for creating, releasing, and reusing sessions.
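The real mechanism is Dulcian’s own; the sketch below is only a hypothetical illustration of the idea: stateful sessions are keyed by user and released after an idle timeout instead of holding hundreds of idle connections open indefinitely.

import java.sql.Connection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;

/** Hypothetical sketch: reuse one stateful session per user instead of holding hundreds of idle connections. */
public class StatefulSessionPool {

    private record Entry(Connection conn, long lastUsedMillis) { }

    private final DataSource dataSource;
    private final long idleTimeoutMillis;
    private final Map<String, Entry> sessions = new ConcurrentHashMap<>();

    public StatefulSessionPool(DataSource dataSource, long idleTimeoutMillis) {
        this.dataSource = dataSource;
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Return the user's existing session, or create one if none is active. */
    public synchronized Connection acquire(String userId) throws Exception {
        Entry entry = sessions.get(userId);
        if (entry == null) {
            entry = new Entry(dataSource.getConnection(), System.currentTimeMillis());
        }
        sessions.put(userId, new Entry(entry.conn(), System.currentTimeMillis()));
        return entry.conn();
    }

    /** Periodically close sessions that have been idle longer than the timeout. */
    public synchronized void releaseIdleSessions() throws Exception {
        long now = System.currentTimeMillis();
        for (var it = sessions.entrySet().iterator(); it.hasNext(); ) {
            var e = it.next();
            if (now - e.getValue().lastUsedMillis() > idleTimeoutMillis) {
                e.getValue().conn().close();   // release the database session back to the pool
                it.remove();
            }
        }
    }
}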

 

The system is now fully operational worldwide, new modules are released regularly, and the whole development cycle is supported by a team about one-third the size of the initial one (which was inherited from the client side of the project).

 

Client-server OLTP system with a small number of users but highly complex code

Another system currently in production started as a pilot project in Oracle Forms 9i to test the application repository before moving the whole solution to a JavaServer Pages-based system. From the very beginning, nobody cared about real production usage, scalability, or extensions. At some point the client started to wonder whether it was really necessary to migrate to JSPs if the test application ran well enough. At that point, evaluating production readiness became the most critical success factor.

Life became even more complicated because, although the project was physically a client-server application, internally it could be described in terms of 3-tier logic: client application/application repository/production data. Since the system’s end users would actually interact with only a small number of modules using a very limited set of commands, the process of identifying requests, arcs, and standard practices appeared simple. However, the application repository itself was a big black box, with too many requests and a very complicated structure. By validating the existing repository against the required workload as part of the system tuning analysis, we were able to show that the system could go to production successfully without the rewrite.

From the same test results, it was possible to identify all of the required resources and potential bottlenecks. This left the question of scalability. We were able to start extending the system in all possible dimensions with no problems, but when the number of repository rules reached a high level, we started to see significant degradation in performance. Since we were still in testing mode, we were able to identify the large number of rules as a possible danger. However, a significant amount of work was required to make the rules reusable.

In this case, even though the system tuning methodology did not help with the original development cycle, it provided a valuable resource for decision-making in later project stages, because we had an articulated way of describing problems and identifying potential areas of difficulty.

Summary

System tuning is a product-independent methodology applicable to most projects. Its top-to-bottom approach increases the chances of making correct system-wide architectural decisions. The unified vocabulary makes the discussion of these decisions between the different teams involved in a project significantly easier. In addition, using a system tuning mechanism makes it easier to detect possible dangers in the early stages of the project.

That said, having a system tuning methodology is not a “silver bullet,” but the difference between no methodology at all and something complete enough to provide meaningful results is significant and leads to more successful, working systems.

 

About the Author

Michael Rosenblum is a development DBA at Dulcian, Inc. He is responsible for system tuning and application architecture. He supports Dulcian developers by writing complex PL/SQL routines and researching new features. Mr. Rosenblum is the co-author of PL/SQL for Dummies (Wiley Press, 2006). Michael is a frequent presenter at various regional and national Oracle user group conferences. In his native Ukraine, he received the scholarship of the President of Ukraine, a Master’s degree in Information Systems, and a Diploma with Honors from the Kiev National University of Economics.