Fusion Development: A Database-Centric Approach

· The database interaction (including the SQL) will be more efficient using the thick database approach.

Code that is database-intensive frequently has to make many round trips to the database to retrieve the data. Object-oriented (OO) developers frequently forget that a getter or setter to a piece of information stored in the database is not a zero cost operation. It is common for single screens to execute hundreds or even thousands of round trips to the database to retrieve information. Applications built this way may even transmit more information to the database than they return, since the query to request the information may actually be larger than the information retrieved.

It is very easy to make an otherwise excellently built application perform poorly because of the SQL. Either individual SQL statements are not written optimally or there are opportunities where several (or even hundreds) of SQL statements could be replaced with a single statement. This kind of optimization is database-specific and requires specialized talent to accomplish. Most Java programmers are not database programming experts.

In some percentage of systems, the database is being "overheated" and there is the perception that performance could be improved by moving processing out of the database. These situations are very rare. Overheated databases are almost always the result of poorly written SQL or code that is passing many more SQL statements than would be necessary if the code were written efficiently.

Moving code out of the database may improve performance when the applications are very computation-intensive. For example, if the code is performing a linear programming algorithm, or operating on a large data matrix where many complex operations are being executed (for example large matrix inversions), moving the data to a dedicated server to execute the computational routines could improve performance. Only situations requiring running hundreds of calculations per value justify moving the data to a separate server (and even then the database may do this just as quickly). Improvements in Oracle’s 10g DBMS floating point performance mean that computations should rarely take place outside of the DBMS. Code that includes hundreds of data validation business rules is not a situation where conventional wisdom will prevail. This type of code runs very well in the database.

If the development team includes PL/SQL coders who are not particularly skilled and you are competing with a very skilled team using a good data migration product (like Ab Initio) on a large dataset (10 million records or more), then the PL/SQL code may not outperform that of the Ab Initio team. However, skilled database developers with experience in writing code for large datasets will usually produce more efficient code that provides better performance.

2. Less code required

When writing database-intensive code, a skilled SQL and PL/SQL developer will write far less code that runs faster and is easier to maintain than an equally skilled Java programmer. Database developers can usually write highly efficient database code that can outperform the code written by their highly skilled Java counterparts. The reason for this is that there are so many options available to skilled database programmers that are not in the skilled Java programmer's "bag of tricks." Therefore, database-intensive code will always be more efficiently written in the database.

3. Less development time needed

The thick database approach provides a clear roadmap for application development, which simplifies the decisions to be made with respect to the application architecture. Furthermore, the development tasks can be neatly divided between the database team and the UI team. This partitioning of the application development effort effectively means building two smaller applications rather than one large one, which is usually easier.

Because the UI can be built more quickly, it can be shown to users right away. This provides faster feedback to the development team and helps to identify design errors much earlier in the process.

4. Easier to maintain

Because the application being built is divided into two parts, each has less code to maintain. In addition, the application is clearly partitioned so that when a business rule changes, it is only necessary to look through half of the code to find it. The usual rule of thumb is that as the number of lines of code in an application doubles, the complexity increases by a factor of four.

5. Easier to refactor when UI technology changes

The general argument for placing all of the logic in the middle tier is to make the application database-independent thus enabling easy movement from one database platform to another. The reality is that very few applications need to be database-independent. Companies change their underlying database platform about as often as they move their corporate headquarters.

There are some applications that require database independence and have low enough computational requirements that a thick database approach is not necessary. Product vendors, like BMC (who developed the Remedy Service Management tool) deal with many clients working on different platforms. If an application only needs a small database and the computational requirements are minimal, there is no reason to use this database approach.

Most organizations do not really require database platform-independence. Web architecture is inherently more volatile than the database platform. Even if you are committed to the J2EE platform, it continues to evolve. In the last few years, Oracle has moved its preferred platform from JavaServer Pages (JSPs) to JSP/Struts (with or without ADF), to JavaServer Faces (with or without ADF). Simultaneously, changes are being made to the .Net platform. Organizations may switch between the two major platforms or to any of the variants (like Swing). There are also other platforms that developers may use such as Cold Fusion or 4GL-like platforms such as Portal or Application Express. Recently SOA architecture has been added to the mix to further complicate the picture.

Every year, the choices for web development change. There has been a constant evolution in the web environment that forces a complete generational shift every 6-12 months, and there seems to be no end in sight to this trend. The latest evolution is towards “rich internet applications” where it is possible to achieve even greater sophistication than in client/server applications a few years ago. The J2EE platform is working with Faces and AJAX technology where others deploy applications using Macromedia Flash technology (such as Lazlo). You can do a Google search for the term “Rich Internet Applications” to see current news about this topic.

At Dulcian, the first version of the graphical Business Rules Information Manager (BRIM^Ò) Repository Manager was built using JDeveloper 3.0 including Data Aware Components (DACs) and Business Components for Java (BC4J). Since then the application has needed to be rewritten five times in order to stay consistent with the current JDeveloper architecture. But thanks to the use of the thick database approach, the rewrites now only take a few weeks.

The defense against the chaos of a rapidly evolving standard is to not build web applications with very much logic placed in these volatile web technologies. The larger the percentage of the application in the database, the less rewriting will be required with the next shift in web architecture.

6. Better use of different talent levels

Using a thick database approach allows organizations with existing database talent to use that talent more effectively. With minimal additional training, skilled SQL and PL/SQL developers can help build web applications with no web skills whatsoever.

If an organization has limited UI development skills, then thick database usage allows minimally skilled UI developers to build applications with little new training needed. If sophisticated UI developers are available, they can focus on delivering very high quality user interfaces.

The Thick Database Development Process

One of the real strengths of the thick database approach is that the two portions of an application can be coded independently. Once the interface between these two parts of the application is defined, the teams can work in isolation until substantive portions are working.

In practice, this approach works very well. Usually the first version of the UI is built within a few days so it can be used as the first version testing environment for the database team and feedback can be received from users.

To maximize the efficiency of this approach, an Agile process is used rather than a linear set of development steps. Minimal design work is done to produce a partially working system. Additional functionality is then created in an iterative design process.

Working in the BRIMÒ environment simplifies some things for us that might be harder in a non-BRIMÒ environment. For example, it is not necessary to worry about workflow or data validation business rules as part of the UI design since the BRIM-Object functionality takes care of this part of the process.

The basic steps of the thick database development process are as follows:

1. UI Design

First, it is necessary to design the application. Screens are designed on paper and white boards are used for page flows. Real screen mock-ups are usually a waste of time. A careful diagram on a piece of paper suffices for the initial UI design.

2. Interface Design

Once the UI design is complete, the exact views required are determined along with the APIs that will be called.

3. Interface Stubbing

The first coding performed is stubbing out the code for the views and APIs. The views can be as simple as select <values> from dual. The APIs are nothing more than functions that return a correct value (usually hard-coded). The idea is to write the stubs as quickly as possible so that the UI developers can get started.

Interfaces will change as the application matures. Sometimes what is returned changes, additional columns are needed, or even whole new views or APIs can be added. In each case, even if the development time is fairly short, they should be stubbed out first so that the UI team can incorporate them while they are being developed.

4. UI and Database Development

UI and database development take place at the same time. The UI team takes the APIs and incorporates them into the application while the database team makes them work.

How thick is too thick?

What would happen if the thick database approach were followed completely and 100% of all UI logic were placed in the database?

Placing all of the logic in the database and requiring round trips for everything means that:

· Tabbing out of a field requires a round trip to find out whether or not the field is required.

· Opening an LOV requires populating it from the database.

· The database is queried every time you leave a page to determine which page to navigate to next.

This would be a pathologically complete way to implement the thick database approach. Clearly a system built this way would be sub-optimal.

At Dulcian, an application using exactly this approach was created for a web-based customer order entry system that had a very large number of business rules. For example, based upon the value in a given field, some fields would be enabled while others were disabled.

All rules for the application were stored in a database repository. The easiest way to build the first version of the system was to leave all of the rules in the database and access them at runtime. Surprisingly, this version of the system worked. Users complained that the screens “flickered” as data is entered, but they were otherwise satisfied with the applications and did not request additional changes to improve performance. Since the screens have minimal logic in them, screen refresh is sub-second over a normal internet connection.

This application demonstrates an amazing proof of concept and supports the idea that using a 100% thick database approach is viable. The system mentioned above was created with no logic whatsoever coded in the UI. Every UI behavior required a trip to the database to determine what should happen next. Despite this, the system still performed reasonably well and the database was not overwhelmed. However, this approach is clearly not optimal since it requires at least one potentially unnecessary round trip for each UI action and there are some extra unnecessary database queries and executions being performed. However, these round trips are limited and the superfluous database execution is minimal. Systems do not fail because of a few round trips between the application server and the database. Systems fail because of single screens requiring hundreds of round trips or due to passing poorly formed SQL to the database to execute.

Based on the results described here, it is possible to create a thick database system without fear of performance issues. Marginal performance improvements can be made by reducing trips to the database where appropriate, but this step is not essential to creating a viable system. This does not mean that poorly written code is acceptable as long as you use the thick-database approach. A slowly running query will run slowly no matter where it is executed. Thousands of unnecessary queries will bring a system to a halt even if they are all initiated from within a database-side procedure, but a 100% thick database approach is possible.

How thin is too thin?

Can a skilled team successfully build applications that are 100% database “thin”? In order to have any hope of success, building a system with no reliance on the database will have to be done by a highly skilled team.

UI developers must understand how to minimize round trips to the database to avoid difficulties. Under the best of circumstances, this can be a very difficult problem. Consider the requirement of displaying a customer information screen with information taken from many different sources. Many round trips to the database will be needed unless the developer is very clever and uses nested SELECT statements like this:

Select

(Select … from … ) value1,

(Select … from … ) value2,

(Select … from … ) value3

From dual

Use of other middle tier technologies such as BPEL can also be a performance killer. If a large percentage of the nodes in the BPEL flow require access to the database, then many round trips will be needed. BPEL users should try to cache all necessary data required by a BPEL process in the middle tier prior to executing the flow.

A first rate team can usually survive ignoring the database, but this is a very difficult way to develop.

Stateless Programming

Client/server development was inherently stateful. Developers counted on user sessions being persistent. Developers used persistent global variables and assumed that transactions were only being committed to the database on demand.

Web development usually needs to be stateless. Typically there is a reusable set of sessions in a session pool, which means that you cannot rely on session-specific information to support the application.

Database developers tend to strongly resist moving into a stateless environment. Stateful coding techniques are very hard to give up. However, developers must abandon a stateful development style in order to successfully build web applications.

There are times when you can use a stateful web development style. You must ensure that the connection pool is larger than the maximum number of concurrent users. For internal applications, this can be an option, but you are really just postponing the inevitable shift to stateless programming.

Thick Database Success Stories

This section describes several thick-database success stories.

1. Complex Batch Routine

The first case consisted of a data-intensive task. The task was to calculate and distribute sales goals to a sales force. The algorithm was driven by parameters for each salesperson and region stored in the database. A user interface was built in Java and all calculations were done in Java. The routine consisted of 40 classes, about 10,000 lines of code, and required six weeks to write.

Annual sales goals for 300 sales persons were calculated by the completed application in about 20 minutes. The system made about 5000 round trips from the application server to the database in order to run the routine. The most significant performance problems were due to poorly coded SQL. After refactoring and tuning the SQL, the application required only 20 seconds to execute.

At some point, significant changes to the algorithm were required, so it was decided to discard the existing Java and rewrite it as a database-side routine.

The resulting database routine consisted of about 500 lines of PL/SQL code written in about three days (less than a week including testing and debugging). Because of the number of data entry screens, the Java UI code still required about 3000 lines of code. When the routine executes, it makes a single round trip to the database and completes in .2 seconds.

The results are summarized in Table 1:

Database code

# of lines

Java code

# of lines

Execution speed

Database development time

Java development Time

Conventional development

10000

20 mins, reduced to 20 seconds

1 week (SQL tuning)

6 weeks

Thick database

development

500

3000

.2 seconds

1 week

Table 1: Comparison of conventional vs. thick database approach – Complex Batch Routine

2. Complex UI Tree - Version 1

Trying to support a complex tree navigation structure (something like the object navigator in products such as TOAD or SQL Navigator) provided the first evidence that conventional wisdom was not always appropriate. This project was developed in 2000 when trying to build the tree control for Dulcian’s BRIMÒ tool Repository Manager.

This was our first attempt at creating a complex Java program, so we followed conventional wisdom to the letter. Everything was placed in the middle tier. All queries were stored on the application server and no code was placed in the database. As a result, queries spun off child queries which in turn spun off other child queries (7 levels deep), and a single tree refresh took 12 minutes. Since then, the tree control has been rewritten a few times. Each time more and more logic was moved into the database.

Over time, views that flattened the hierarchical database structure were created and the UI design was refactored so that the tree is less deep, only requiring a single query. As a result, the tree refresh executes with sub-second response.

3. Complex UI Tree - Version 2

On a more recent project, we inherited a sophisticated tree written in Java using conventional wisdom. There was virtually no code in the database, and with the exception of a few views, all of the code was Java.

Performance was poor, mainly because of excessive round trips and unnecessary full tree refreshes. There were 12,000 lines of code spread across 50 classes making the code difficult to modify.

Using a complete thick database approach, the design was refactored so that all tree display logic was stored in the database. The tree is sent by the database to the user interface as an XML file. Modifications to the tree are also sent as small XML files that indicate tree-node refresh commands.

The revised structure still required quite a lot of code (4000 lines of database code and 4000 lines of Java), but the new design resulted in five times better performance for tree-related actions. Network traffic was decreased by 50%-98% for individual actions. Modifications to the tree now require several hours of development time rather than several weeks.

In the original system design, the initial load of the tree required transferring 350K of information and 12 round trips. The revised architecture required a 2K load in a single round trip.

The results are summarized in Table 2:

Database Code

# of lines

Java code

# of lines

Network traffic

Performance

Database development time

Java development Time

Conventional development

1000

16000

5-500K per round trip

3-20 seconds per transaction

1 week

6 months

Thick database

development

4000

.2-15k per round trip

<1-4 seconds per transaction

6 weeks

Table 2: Comparison of conventional vs. thick database approach in Complex UI Tree – Version 2

A Tale of Two Systems

It is somewhat unfair to apply the thick database approach to existing, poorly performing systems since this may only highlight its possible effects on projects that would particularly benefit from using the approach. Also, it is always easier to build something better the second time.

This example compares two standard OLTP systems that are similar in complexity. One was built using conventional wisdom and the other using the thick database approach.

The comparison described here convincingly demonstrated the success of the thick database approach. Dulcian created two systems within the last year. The first was an internal Modification Request Tracker built using the conventional approach by an experienced Java development team. The second was a complex order entry system built using the thick database approach by a team with equivalent experience.

An earlier version of the Modification Request Tracker had been built by an offshore, inadequately skilled development team. To create a working version of the system took about 6 months, at which point it was decided that the architecture was flawed. After another six months of conventional development, the next version of the system was still not adequate. Performance was poor, the code was hard to maintain, and the system was quite buggy. It consisted of 13,000 lines of Java code based on a few database-side views requiring about 2300 lines of code. The strutsconfig.xml file was 657 lines long.

A different team was tasked with supporting a complex order entry system. The thick database approach was used from the start with all navigation supported using a tree on the left hand side of the screen. The tree itself is built into the database and all navigation logic is handled in the database as well. When the user clicks a tree element, the application server makes a request to the database to determine the navigation path. Screen-specific views were created for each screen, so there is very little logic in the user interface Java code. The result was 2800 lines of Java code and about 4000 lines of database code. The strutsconfig.xml file consisted of 100 lines. The entire system is much easier to maintain. Significant refactoring has proved necessary in order to support changing requirements, handle bugs in the tools being used, and fix architectural errors. This refactoring has never required more than two weeks. The results are summarized in Table 3.

Database code

# of lines

Java code

# of lines

StrutsConfig.xml

# of lines

Database development time

Java development time

Conventional development

2300

13000

657

2 weeks

6 months

Thick database

development

3900

2800

2 weeks

1 week

Table 3: Comparison of conventional vs. thick database approach for two OLTP systems

Conclusions

In our experience, moving code from the middle tier to the database had the following benefits:

· Reduced the total amount of code

· Reduced development time

· Improved performance

· Reduced network traffic

· Reduced the complexity of the application

Therefore, the thick database approach is a viable alternative to the conventional wisdom of reducing reliance on the database. It leverages existing database talent and can result in dramatic improvements in performance. The current trend in moving logic into the middle tier may have been premature given the experiences detailed in this paper.

About the Author

Dr. Paul Dorsey is the founder and president of Dulcian, Inc. an Oracle consulting firm specializing in business rules and web based application development. He is the chief architect of Dulcian's Business Rules Information Manager (BRIM^®) tool. Paul is the co-author of seven Oracle Press books on Designer, Database Design, Developer, and JDeveloper, which have been translated into nine languages as well as the Wiley Press book PL/SQL for Dummies. Paul is an Oracle Fusion Middleware Regional Director. He is President Emeritus of NYOUG and the Associate Editor of the International Oracle User Group’s SELECT Journal. In 2003, Dr. Dorsey was honored by ODTUG as volunteer of the year, in 2001 by IOUG as volunteer of the year and by Oracle as one of the six initial honorary Oracle 9i Certified Masters. Paul is also the founder and Chairperson of the ODTUG Symposium, currently in its eighth year. Dr. Dorsey's submission of a Survey Generator built to collect data for The Preeclampsia Foundation was the winner of the 2007 Oracle Fusion Middleware Developer Challenge.