Advanced Web-Based Reporting Techniques in PL/SQL

One of the biggest shifts in software development in recent years has been the necessity of making everything accessible on the Internet. Building systems that are accessible over the web requires new strategies and careful development and deployment. The already high rate of project failures has climbed even higher given the complexity of developing and implementing web-based systems. It is necessary to build on past experience while developing new methodologies to accommodate different technologies.

Reporting is often overlooked in the early stages of an IT development project. This can lead to disaster later, when users begin listing all of the reporting requirements that the system must satisfy. Reporting must therefore be taken into account early in the project development life cycle. Web-based reporting systems pose additional challenges. This paper describes some techniques that have been used successfully at Dulcian to create these types of systems.

The Reporting Process in a Database System

“A database report presents information retrieved from a table or query in a preformatted, attractive manner” (Mike Chapple, databases.about.com). Using this definition, the reporting process can be broken down into a series of actions:

1. Sending the request

2. Translating the request from the UI into database terms

3. Processing the request

4. Retrieving the result

5. Formatting the result

6. Presenting the result to the user

Figure 1 shows how these actions are mapped to a web-based environment.

Figure 1: Reports in the Web Environment

In reviewing the diagram, several potential problems are evident:

1. Communication between the client and the application server goes over the web and may not use the fastest possible channel.

2. The application server may be processing hundreds of requests at once.

3. The database may contain tables with millions of records.

The following section will discuss how these problems can be addressed when building a web-based reporting system.

Reporting System Requirements

One of the most desirable features of a reporting system is flexibility, but that same flexibility makes such systems more difficult to build while still achieving adequate performance. This section discusses sample requirements from two different systems built at Dulcian. The problems encountered in fulfilling these requirements, and some possible solutions, are discussed in the sections that follow.

1. Search engine requirement

In the first system example, there is a specialized repository (conceptually close to the Oracle data dictionary) built on top of the regular database. This repository is maintained by system architects and typically contains about 50 tables with 1,000 columns in total. End users should be able to search through the tables registered in the repository, but not every element can be made available to them.

Additionally, the system architects needed to maintain a list of available options (tables and columns) as a subset of the total repository. This means that users can only access selected elements (usually 50 columns from 20 tables). The data dictionary is also partitioned by project so that searches can be done within a single project or across all of the available data. Figure 2 shows the two search paths, one for users and one for system architects.

Figure 2: Generic Selection Model

2. Reporting engine requirement

An apparently simple report was needed, based on a small number of warehouse tables. Although the real system was more complex, for the purposes of this paper assume a single Customer table with 100 columns and some simple lookup tables. The simplified data model is shown in Figure 3.

Figure 3: Simplified Data Model

Users were able to specify the desired level of detail in the report as well as where the breaks occurred. Up to 10 report columns also needed to be user-specified. Within each cell, users could select any number of statistics (up to 20) to appear, such as average number of phone calls, average age, or count of customers in the cell.

This reporting environment provided generic filtering as well as on-the-fly specification of report structure, columns, and statistics. A typical user-specified report required 2-3 filter criteria, 3-5 levels of breaks (resulting in 200-400 rows in the report), 5 or more columns, and 4 or more statistics in each cell. The Customer table contained 5-10 million customers.

Problem Analysis

Building flexibility into a reporting system requires solving some complex problems in order to provide users with the reports that they want and still achieve adequate performance. This section discusses these issues and some possible solutions to the problems encountered in reporting systems described above.

Search Engine

For the reporting system requiring a search engine, the first problem involves flexible data sources. Even once it has been determined how the report should look, each time it is run it may be created from a different set of tables. In addition, the list of available options may change, so there cannot be any hard-coded structures at the database level.

Structure of the problem

For the six reporting system actions listed in the first section, flexible data sources produce some potential complications:

1. Sending the request - This is problematic because the data sources are flexible.

2. Translating the request from the UI into database terms - This is straightforward: even when the available options are flexible, each of them corresponds to a precise database object.

3. Processing the request - This is difficult because the data sources are defined dynamically.

4. Retrieving the result - This is straightforward because the structure of the result is always the same.

5. Formatting the result - No formatting is required.

6. Presenting the result to the user - This is easy since no modification is required.

From this list, it is evident that the critical part of the solution is the way in which the request is defined and how it is processed on the database side, because of the dynamic nature of the options available to the user.

Reporting engine

The first problem involved the calculation of certain statistics. A few algorithms existed to do this, but their results were inadequate. Sending the reporting request, even when the definition of the report was complex, was fairly straightforward: once the report was defined, there were no problems in calling it. Translating the report request from the user interface to the database was also easily handled. However, complications arose in the remaining reporting system processes.

Structure of the problem

Using the six actions of the reporting process mentioned above, the problems were as follows:

1. Sending the request - This is straightforward because the report structure is known and only a few input parameters are needed from the user.

2. Translating the request from the UI into database terms - This is also straightforward because, on the database side, the input parameters are exactly the same as they are for the user.

3. Processing the request - This is difficult because the system in question had a very large data source with many different statistics and dynamically defined columns.

4. Retrieving the result - This is difficult because columns are defined on-the-fly.

5. Formatting the result - This is difficult because breaks are defined on-the-fly; there is no completely static model of the outcome.

6. Presenting the result to the user - This is difficult because everything is defined dynamically.

Solution #1 (Hypothetical)

The most obvious solution was to set up a dynamic matrix report using Oracle Reports with Smart functions in each cell to calculate the statistics. The problem with this approach is that each statistic requires the overall filter criteria of the report plus additional filtering for its row and column, so each statistic must be calculated individually. For the report described, this means 200 rows x 5 columns x 4 statistics, a total of 4,000 independent queries to generate a single report.

Again, of the six actions of the reporting process mentioned above, this solution had the following impact:

1. Sending the request - Successful

2. Translating the request from the UI into database terms - Successful

3. Processing the request - Failed

4. Retrieving the result - Not handled adequately

5. Formatting the result - Not handled adequately

6. Presenting the result to the user - Successful

This solution was not feasible since the report performance would clearly be unacceptable.

Solution #2 (Attempted - not successful)

This solution was actually used on a project to create a similar but slightly less complex report with only 7 pre-defined rows (for specified regions).

This approach used global temporary tables, which are ideal for building a single-use, session-specific working table, typically for reporting or as an intermediate staging area in a data migration. These tables are defined like any other table, using a CREATE TABLE command with a GLOBAL TEMPORARY clause. Inserts, updates, and deletes work as usual, but the data is only visible to the session that created it. When the session terminates, the data is lost.
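As a sketch of this technique, a temporary table for one region might be declared as follows; the table and column names here are illustrative, not the structures from the actual project:

```sql
-- Session-private staging table for one report region (names are illustrative).
-- ON COMMIT PRESERVE ROWS keeps the rows until the session ends; the default
-- (ON COMMIT DELETE ROWS) would clear them at each commit.
CREATE GLOBAL TEMPORARY TABLE temp_region_customer (
    customer_id   NUMBER,
    age           NUMBER,
    nr_of_calls   NUMBER
) ON COMMIT PRESERVE ROWS;
```

The table definition itself is permanent and created only once; each session that inserts into it sees only its own rows.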

In order to make the report run quickly, report-level and row-level filters were applied first to populate seven independent global temporary tables. Statistics were then run independently against these pre-populated tables.

Therefore, of the six actions of the reporting process mentioned above, this solution had the following impact:

1. Sending the request - Successful

2. Translating the request from the UI into database terms - Successful

3. Processing the request - Successful

4. Retrieving the result - Failed because the available statistics were not flexible enough and the solution was not scalable.

5. Formatting the result - Failed because the available list of breaks was not flexible and the solution was not scalable.

6. Presenting the result to the user - Partially successful, with some flexibility in the report structure.

This was a reasonable solution in that it was possible to extract the relevant customer data into much smaller tables, against which the 7 rows x 5 columns x 4 statistics (140 queries) could be run. Although the report code was nicely modularized and reasonably easy to maintain, performance was barely adequate. Clearly, this approach would not scale to the more complex reporting problem described above, since up to 400 global temporary tables and up to 1,500 queries would have been necessary, resulting in completely inadequate performance.

The Solution that Worked

In order to solve both problems, an entirely new architecture had to be created that would leverage some of the recent additions to SQL and PL/SQL. The idea was to create a complete image of the final report on the database side. All of the report logic was moved out of Oracle Reports and into procedural code resulting in a much simpler report. A diagram of the solution is shown in Figure 4.

Figure 4: Structure of the solution

There are two core concepts in the proposed algorithm: object collections and dynamic SQL. Both contribute to solving the reporting system problems described above: object collections provide a mechanism for preparing, storing, and retrieving the report, while dynamic cursors allow the data to be queried in the most convenient way. A detailed description of these features is beyond the scope of this paper.

Implementing the Solution

The following steps are necessary to implement the solution described above using object collections and dynamic SQL.

A. Preparation

The first question to answer is: what is the most convenient representation of the information for the reporting tool? Since the bulk of the processing will be handled by the database, this should be decided at the very beginning of the project. A new object type should be built exactly as required, with an object collection type based upon it.

A virtual “report table” is then declared as a variable of the collection type in the specification of the package to be used. Since packaged variables cannot be referenced directly in SQL, a function that simply returns the variable is also required.

One of the major advantages of object collections is that they can be cast to a table. This “table” can be processed like any other table; in this case, it is used to build a view that includes everything a developer may need to visualize the report.
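The chain described above (object type, collection type, packaged variable, wrapper function, view) might be sketched as follows; all of the names and object attributes are illustrative assumptions, not the structures from the actual system:

```sql
-- Object type shaped exactly as the reporting tool expects one report row.
CREATE OR REPLACE TYPE report_row_oty AS OBJECT (
    break_level  NUMBER,
    row_label    VARCHAR2(200),
    stat1        NUMBER,
    stat2        NUMBER
);
/
CREATE OR REPLACE TYPE report_tab_nt AS TABLE OF report_row_oty;
/
CREATE OR REPLACE PACKAGE report_pkg AS
    -- the virtual "report table"
    v_report report_tab_nt := report_tab_nt();
    -- SQL cannot reference v_report directly, so expose it via a function
    FUNCTION f_report RETURN report_tab_nt;
END report_pkg;
/
CREATE OR REPLACE PACKAGE BODY report_pkg AS
    FUNCTION f_report RETURN report_tab_nt IS
    BEGIN
        RETURN v_report;
    END f_report;
END report_pkg;
/
-- The view the reporting tool queries like any other table.
CREATE OR REPLACE VIEW report_v AS
    SELECT * FROM TABLE(CAST(report_pkg.f_report() AS report_tab_nt));
```

Once the collection has been populated, a simple SELECT against report_v returns the finished report image.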

B. Data Access

Now that the proper communication structure is known, the next task is to populate the object collection with the appropriate data. The core of the data querying is a dynamic cursor: a new variable of type REF CURSOR, together with a large string used to store the SQL statement to be built, and a generalized record (to handle breaks, statistics, etc.).

The required SQL string is created based on the user’s request and must exactly match the expected generalized record (with some placeholders). At that point, there is enough information to process all of the required records.
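A minimal sketch of this step might look like the following; the table, columns, and filter are invented for illustration:

```sql
DECLARE
    TYPE rc_t IS REF CURSOR;
    c_data  rc_t;
    v_sql   VARCHAR2(32767);          -- the SQL statement being built
    -- generalized record: fixed slots for break values and statistic inputs
    TYPE data_rt IS RECORD (
        break1  VARCHAR2(100),
        break2  VARCHAR2(100),
        val1    NUMBER,
        val2    NUMBER
    );
    r data_rt;
BEGIN
    -- assembled from the user's request; the shape matches the record exactly
    v_sql := 'SELECT region_name, customer_type, age, nr_of_calls'
          || '  FROM customer WHERE status = :1';
    OPEN c_data FOR v_sql USING 'ACTIVE';
    LOOP
        FETCH c_data INTO r;
        EXIT WHEN c_data%NOTFOUND;
        -- each fetched record feeds the statistics collector (next section)
        NULL;
    END LOOP;
    CLOSE c_data;
END;
```

Because the string always selects into the same generalized record, the processing loop does not change no matter which tables and columns the user picked.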

C. Statistics

At this point in the process, there is enough information to walk through all of the targeted records at once. This alone solves the first problem (the search engine), because the algorithm described can retrieve information from any column in any table, exactly as specified.

But for the reporting engine requirement, there is another set of issues to contend with, namely multiple statistics.

Solving this problem is also the crux of speeding up the report’s performance. A single query was used to walk through all of the customers (see the data model in Figure 3). Rather than executing a query for each statistic, it is possible to update all of the appropriate statistics based on the values associated with the currently processed customer. The statistics were placed into an individual PL/SQL table using a simple hash function that concatenates the row, column, and statistic numbers. This enabled easy insertion and retrieval of statistic values. (A separate routine was used for each statistic type, which helped modularize the code and keep it well organized.)
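A minimal sketch of such a collector follows; the numeric key layout (row*10000 + column*100 + statistic) and all names are illustrative assumptions:

```sql
DECLARE
    -- one PL/SQL table per kind of running value; keys encode (row, col, stat)
    TYPE stat_tab IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
    v_sum  stat_tab;
    v_cnt  stat_tab;

    -- "hash": concatenate row, column and statistic numbers into one key
    FUNCTION f_key (p_row NUMBER, p_col NUMBER, p_stat NUMBER)
        RETURN BINARY_INTEGER IS
    BEGIN
        RETURN p_row * 10000 + p_col * 100 + p_stat;
    END f_key;

    -- called once per statistic per customer while walking the single query
    PROCEDURE p_add (p_row NUMBER, p_col NUMBER, p_stat NUMBER, p_val NUMBER) IS
        v_k BINARY_INTEGER := f_key(p_row, p_col, p_stat);
    BEGIN
        IF NOT v_sum.EXISTS(v_k) THEN
            v_sum(v_k) := 0;
            v_cnt(v_k) := 0;
        END IF;
        v_sum(v_k) := v_sum(v_k) + p_val;   -- running total (sums, averages)
        v_cnt(v_k) := v_cnt(v_k) + 1;       -- running count (counts, averages)
    END p_add;
BEGIN
    p_add(1, 1, 1, 3);   -- e.g. row 1, column 1, statistic 1 = phone calls
END;
```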

The rest of the process is fairly straightforward:

·         Calculate each statistical element into the special collector (a set of PL/SQL tables serving as temporary storage). Performance-wise, it is better to perform all of the necessary calculations for each record at one time and keep them in memory for later grouping.

·         Spin through the collector and update the corresponding objects in the collection.

·         Roll up the object collection to the specified break-points.
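The "spin through the collector" step above amounts to a standard walk of a PL/SQL table; the keys and values in this sketch are illustrative:

```sql
DECLARE
    TYPE stat_tab IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
    v_stats stat_tab;
    v_key   BINARY_INTEGER;
BEGIN
    v_stats(10101) := 17;   -- row 1, column 1, statistic 1
    v_stats(10201) := 42;   -- row 1, column 2, statistic 1
    v_key := v_stats.FIRST;
    WHILE v_key IS NOT NULL LOOP
        -- decode row/column/statistic from the key here and update the
        -- matching element of the report object collection
        v_key := v_stats.NEXT(v_key);
    END LOOP;
END;
```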

NOTE: Code examples are described in detail in another ODTUG 2004 conference paper entitled “Object Types vs. PL/SQL Tables: 2 Practical Examples” by Michael Rosenblum.

Conclusions

Solving the search engine problem required a new style of development, with an additional level of abstraction (based on object collections and dynamic SQL) between the user and the database. Achieving adequate performance and ease of maintenance validated the architecture developed at Dulcian for building web-based applications, and the reporting engine demonstrated that the method scales well.

Object collections helped to decrease the complexity of the reports on the developer’s side. The result looks like a regular view and does not require much pre-processing on the client side. Moving most of the logic into the database makes the system much easier to use and maintain.

About the Author

Michael Rosenblum is a DBA at Dulcian, Inc. He also supports the Dulcian developers by writing complex PL/SQL and researching new features.