Logical Partitioning of a Database: A New Way of
Thinking about Enhancing Performance
Dr.
Dulcian, Inc.
Overview
A number of months ago, I was asked to review a technology company’s product strategy. The company, InfoCruiser, was developing a main memory database that could run either with Oracle as the core engine or with their own proprietary engine.
Their idea was to replicate a portion of the database in a memory structure and use a dedicated DBMS to support queries to that portion of the database. At the time, I was not terribly impressed by the idea. Why not simply pin tables into memory or place them on a RAM disk? After all, the cost of RAM disks has recently dropped significantly. With the advent of Oracle 9i’s Real Application Cluster (RAC) ability to partition workload across multiple machines, does InfoCruiser’s approach make sense?
I took the opportunity to review the company’s product, dbCruiser, and observed some good performance statistics from their system tests. I had to ask myself: Did InfoCruiser have a great idea? Is partitioning the database based on its usage patterns something worth doing?
At its root, the idea behind the dbCruiser technology is system partitioning, which is the core idea behind many performance tuning strategies. The available partitioning strategies currently include:
· Pinning tables into memory
· Striping tables across disks
· Placing database objects into separate tablespaces on separate disk drives
· Oracle 9i’s RAC technology
None of these techniques can be tied very closely to how tables are used in the system. However, there is a partitioning based strategy that is based on system usage. We have been using it for years. We call it “data warehousing.”
Data Warehousing as a Performance Tuning
Strategy
Data warehousing is not often thought of as a performance tuning strategy; but it is a classic partitioning strategy. Using a data warehouse, a subset of the OLTP database is massaged, denormalized and placed into a separate database. Some, if not all, of the reporting activity is offloaded to the warehouse.
In data warehouses, a copy is made of a large portion (perhaps all) of the system’s data and placed on a separate database server. This removes the reporting activity from the main system as well as allowing for a warehouse that is tuned to specifically support the reporting requirements without having to worry about any OLTP considerations.
A traditional warehouse structure is shown in Figure 1. As you can see from the diagram, applications are partitioned between the core system and the warehouse.

Figure 1: Data Warehouse
Architecture
Advantages
It is well-established that data warehousing leads to improvements in overall system performance. Not only do reports run much faster because they are running on a database explicitly tuned for this purpose, but also because the workload has been partitioned. All of the transaction processing takes place in the core database, while all of the reporting takes place in the data warehouse. Using this strategy means that two independent machines are responsible for the entire system’s activity.
By creating a data warehouse, overall system performance is
improved. The warehouse structure is designed explicitly to support the reports
that
Disadvantages
Using a data warehouse to improve performance does not improve performance/usability without costs:
Expanding the data warehousing strategy for use in OLTP
systems
The database architecture of most applications relies on a
single database system to meet both read (queries) and write (updates, inserts,
deletes) requirements. This means
that the DBA must tune the system to balance the needs of both kinds of
transactions. Just as an example,
consider indexing. For an OLTP system with a heavy insert, update and delete
load, every index added means an additional index that must be maintained. The
more indexes there are, the slower the inserts, updates and deletes
The role of the DBA in tuning the system is to balance these
two conflicting types of tuning.
Therefore, the DBA is always making tradeoffs in tuning a system. The
system must be tuned to balance the needs of inserts, updates and deletes
against the needs of query access. Frequently, it is very difficult to tune a
system to support both kinds of activities. As data volume increases, the
complexity of system increases, and as the number of users increases, every
system
The idea behind using a data warehouse can also be applied to performance tuning of an OLTP system using data partitioning.
Not all tables are the same. Tables fall into several categories:
a) Reference tables that are small, and not volatile (e.g. Status and Type tables)
b) Volatile, core data tables that are very large and highly volatile (e.g. Sales, Purchase Order)
c) Access tables that are medium in size, relatively non-volatile but are highly accessed (e.g. product catalog)
The access table portion of the system is relatively small. This presents an opportunity to partition the system by examining usage patterns. Is there something different that can be done with the less volatile, less frequently accessed tables?
The optimal solution is to divide the database into two areas:
· Access table area (lower volatility, high level-access)
· Volatile table area (higher volatility, lower level access)
Relevant reference tables can be duplicated and placed in memory in each area.
For the volatile table area, Oracle technology is perfectly suited to handle this using traditional performance tuning techniques.
Take the tables that are heavily queried (but not heavily updated) and place them in a separate database. Utilities can be written to automatically decide what tables to copy into this structure to minimize the maintenance overhead.
This “query mostly” database would be tuned differently, probably using RAM disks and an optimized temporary tablespace.
Then all of the queries can be automatically intercepted and a decision made about which database to use to execute the queries. This would be done either by writing a low level routine or by taking advantage of Oracle’s materialized views to force the queries into the new system.
This environment would achieve the same kind of performance
benefits as a data warehousing approach. The queries would run much faster since
they are located on a dedicated machine and the queries have been offloaded from the main OLTP system so the rest of
the DML
The
proposed architecture is shown below in Fig 2.

Figure 2: Partitioned Database
Architecture
This idea of partitioning of the query data can be implemented in various ways.
Each strategy
The first strategy is a simple solution both from a technical and DBA perspective. It involves identifying tables that would best benefit by being in a high-performance area and moving them to that faster medium. Using this strategy eliminates the concern about placing a table in the wrong area since the cost of placing a table incorrectly is very low. The only disadvantage to this solution is that only modest performance gains can be achieved with this approach.
In most cases, half of all of the CPU cycles in a system are spent in the access area (based on examination of several internal systems). If a typical system is assumed, then half of the database activity could be moved to a dedicated access area.
Using a Strategy 1 implementation, moving the access area into main memory would double the performance of the access area SQL. There would be no effect on the volatile area since the same processor would still be supporting both areas. This would lead to approximately a 30% improvement in throughput.
Note that the user perception of these performance benefits would be less dramatic because network traffic is unaffected.
In the second strategy, all access area SQL is handled with in its own database or a dedicated computer. This strategy is obviously a much more aggressive approach to the problem.
There are two possible methods to support this strategy:
a) Partition the databases - Place the volatile area in one instance and the access area in a second instance. The databases communicate using some type of transfer protocol or database links so that the two areas can access each other for queries that span the two systems. This is a very difficult strategy to implement.
b) Store the entire database physically in one place and replicate the access area in a second instance. The problem of keeping the access partition synchronized remains; however, the low volatility of the access area makes this a straightforward problem to solve.
This second strategy has several advantages:
·
Queries to the access area
·
Offloading a significant percentage of the work
in the volatile area
Implementation of this strategy would require preprocessing of all SQL entering the system to intercept and reroute it to either the access or volatile areas. Returned results must be synchronized.
An efficient method of keeping the two copies of the access area in synch with each other is critical. Depending upon the application, either periodic refreshes of the access area or two-face commits would be necessary. Materialized views may be used to support this strategy.
Using a Strategy 2 implementation results in at least the
same performance improvements for the access area SQL as in Strategy 1. However, removing all of the access area
SQL from the volatile area would decrease the load on the first server by
50%. Some overhead
This strategy is comprised of the Strategy 2 approach with the addition of a separate non-Oracle database to support the access area. The idea is that Oracle is a well developed product for OLTP systems, but it is not specifically designed to operate optimally in a main memory only environment. There have been tests of products that have attempted to outperform Oracle in the main memory, query-only environment. Indications are that such a system could outperform Oracle by about 10 to 1.
The advantage is that the access area would be able to run very quickly. The disadvantage is that this technology is still evolving and is not ready for production usage as of this writing.
Using Strategy 3, the highest benefits can be realized. As
mentioned before, main memory databases can outperform an Oracle database by a
factor of 10 to 1 on query performance. This
Of course, we can always resort to the easiest tuning solution of all: “Throw a bigger, faster computer at the problem.” In organizations with infinite money, this is a viable alternative.
Within Oracle, the database activity can be partitioned using a parallel query option and Real Application Clustering (RAC), where the workload is spread across various processors or machines. The impact of these strategies is unknown.
All of these techniques involve significant costs, whether for hardware, software or expertise to create the necessary structures and setup. What is needed is a way to provide substantial performance improvements without the need for professional expertise or large hardware/software expenditures.
With the proposed architecture, the idea is to take into account how the tables (objects) in the system are used as the basis for creating database subsets. Not all tables are used in the same way and handling these tables differently can drastically improve overall system performance.
Separating query traffic from transactions in an OLTP system is a challenging problem. Proper implementation requires the following:
It is important to identify the correct tables to place in the “query mostly” partition. Ideally, this should be automated by a utility that identifies good candidate tables/views.
If most of the query traffic is removed from the database,
then standard tuning logic
The query database
Queries must be routed to the appropriate part of the system. This can either be done using a custom query router or, if materialized views are used as the implementation structure, then Oracle can be tricked into acting as the query router.
The query-mostly area
System performance may be significantly enhanced through partitioning the tables in the system based upon usage patterns. There is an opportunity to move tables that are heavily queried but not heavily updated to a “query-mostly” database. This partitioning results in a dramatic improvement in overall system performance.