Repository-Based
Dr. Paul Dorsey, Dulcian, Inc.
Introduction
Ensuring that data is valid is one of the core requirements of most information systems. Many data validation requirements are easily supported by the database or application through simple properties. Properties such as data type or length are so simple that it is tempting to say that these rules should not even be independently handled, but simply a part of the data structure. However, this is often not the case. For example, in the situation where customer information is periodically accepted from an external data source, the raw data should be entered into a table for manipulation and validation, even if it violates rules. The mechanism handled by SQL Loader which takes the rejected records and writes them to a separate file is neither convenient nor user-friendly. It is much easier to allow the data file to enter into the system with no validation rules whatsoever and then apply the validation rules to the records now comfortably stored in the database. At that point, it is possible to identify which business rules have been violated by the entire logical record.
Many business rules, especially complex rules involving multiple records, cannot be practically enforced by the database. Even if it is possible to enforce these rules in the database, it is often not desirable to do so. It is frequently better to allow the invalid data to be entered into the system so that it can be corrected and made valid at a later time. There is another reason for handling validation rules independently. The validation of information often does not take place until a particular point in a logical object’s process flow. For example, a merchandise order needs to be valid before it is processed. Sometimes particular validation rules are contingent and may only apply to an object under specific circumstances. Numbers on a budget may be considered valid if revenues “are close to expenses” whereas in an accounting system, for each journal entry, debits must equal credits to the penny.
Validation rules are often very complex and span multiple database tables as demonstrated in the following examples:
· Ensuring that the dates where a person resided in a specific place are later than that person’s date of birth involves comparing information that resides in multiple tables.
These are just a few examples demonstrating that validation business rules are not simply properties in the database and cannot be easily handled through database triggers.
What is the
appropriate solution for handling validation?
Traditionally, validation rules are embedded in code
scattered throughout data entry applications, creating a maintenance nightmare
where even finding a specific rule can be difficult, let alone changing it.
“What are all of the validation business rules being enforced?” is an almost
impossible question to answer. One solution to this dilemma is to collect all of
the business rules into a single code structure. Traditional Oracle shops may
create a large PL/SQL package. Java shops might create a large collection of
classes to reside in the middle tier. Users of JDeveloper may use the
Application Development Framework Business Components
(
Not only are there a large number of validation rules, but these rules are constantly changing. Validation rules are more volatile than most other kinds of system requirements. This is not only due to changes in the business, but also due to the fact that users do not always recognize that the specified rules have many exceptions. The conditions under which many of the rules are enforced need to be continually refined.
Why haven’t existing systems collapsed under the weight of
The optimal solution is to develop some type of rule grammar and/or repository to support all of the validation logic. This paper proposes a solution involving both elements, namely, a simple grammar as well as a small repository.
Case Study:
The Problem – Description of Project
The project that inspired the solution described in this paper was a recruiting system designed and built for the U.S Air Force Reserve. During the recruiting process, there is a great deal of information gathered about individuals being admitted into the Reserve. In addition to the standard information, for security reasons, an individual’s education, employment, residence and medical histories are all gathered. Security clearances are handled by a different agency. Therefore, all of the information collected must be bundled into an XML file and sent off for processing by a separate organization. There are hundreds of logical rules associated with the data in this XML file. Some of these rules are very complex. For example, there cannot be any gaps in an individual’s employment history. The document describing all of the rules is summarized in a 22-page Excel spreadsheet, where each row is a separate rule. The narrative description of the rules is contained in a 250-page monograph.
From looking at these rules and performing initial testing, it was clear that the task of validating the XML file would have required thousands of lines of code. In addition, the intention was to reuse the solution for other military service branches, some of which might be interested in a Java-based solution or some other programming language. Therefore, having to create massive amounts of code that might have to be re-written in another language was not a viable option.
For these reasons, the decision was made to use a repository-based approach.
Case Study:
The Solution – Validation Rules to Support Security Clearance
Validation
Initially, a one-table repository to handle the validation rules was envisioned. For rules involving multiple tables, views would be created joining those tables, and rules would be written against these views. For example, in the Contract table, the “Start Date must be less than End Date” rule was written as follows:
:StartDate < :EndDate
For a rule stating: “Every department must have at least one employee,” a view would be created for the Department table that would include the number of employees in that department. The rule would look like this:
:NumberofEmployees > 0
This architecture would have worked, but would have meant building and maintaining a large number of views with a lot of hand-coded PL/SQL to generate the calculated columns. Therefore this approach was abandoned fairly early on in the project. The architecture was too limited. Managing the rules in this environment would not have been significantly easier than writing them all in code. Further, if forced to move into a Java/XML environment, potentially not even using a database, the entire concept of using views would fall apart.
It was therefore decided to extend the validation rule grammar to support, not only references to columns in the table, but also take advantage of the parent-child relationships between the tables in order to unambiguously reference data values in a different table when running a validation rule.
The syntax used to accomplish this is as follows:
:_Child to refer to child table (the “many” table in a 1-many relationship)
:_Parent to refer to parent table (the “1” table in a 1-many relationship)
In order to say that the employee hire date must be greater than the create date for the parent department, the rule would be associated with the Employee table and be written as:
:_Parent.Department.CreateDate < :StartDate
For referencing the child table, there are usually many child records, so some type of aggregator method is called. To indicate that each department must have at least one employee, the rule would be associated with the Department table and be written as:
:_Child.Employee.Employee_OID.Count > 0
The grammar was designed to support all standard aggregation functions (Sum, Count, Min, Max, etc.).
It was also necessary for this project to filter the child rows, so a WHERE clause was added to the method call. For example, to enforce the rule that the department must have at least one active (isActive = ‘Y’) employee, the rule would be associated with the Department table and be written as:
:_Child.Employee.Employee_OID.Count(where = :_Child.Employee.isActive = ‘Y’) > 0
If there were more than one relationship between the two tables, there had to be some way to identify which relationship was the correct one to use for the join. In this case, the foreign key column was specified in the code. In the above example, if the foreign key column in the Employee table were DepartmentEmployer_OID, then the rule would be associated with the Department table and be written as:
:_Child.Employee(DepartmentEmployer_OID).Employee_OID.Count(
where
= :_Child.Employee.isActive = ‘Y’) >
0
It is apparent that these rules can eventually become quite complex to read so it is just as easy to write “real” code. There was one rule that was so complex that hand-coding was used, even though the grammar could support the description. However, out of the 22 pages of rules, there were only a few rules that required any hand coding. Even the “Debits must equal Credits” rule for a journal entry is not too difficult to enforce. This rule would be associated with the JournalEntry table and written as:
:_Child.JEDetail.Amount.SUM(where = :_Child.JEDetail.DrCrType = ‘Dr’) =
:_Child.JEDetail.Amount.SUM(where = :_Child.JEDetail.DrCrType = ‘Cr’)
Contingent
Rule Execution
The next problem encountered was that the rules were only contingently being executed. For example, if the “Debits must equal credits” rule is only for “Financial” transactions that are “ReadyToBeProcessed,” then this rule would be associated with the JournalEntry table and be written as:
(:_Child.JEDetail.Amount.SUM(where = :_Child.JEDetail.DrCrType = ‘Dr’) =
:_Child.JEDetail.Amount.SUM(where = :_Child.JEDetail.DrCrType = ‘Cr’) )
and :JEType = ‘Financial’
and :Status = ‘ReadyToBeProcessed’
Now the rule is starting to look a lot more complex. In this system, there might be many similar rules for “Financial - Ready to be processed” journal entries. It was decided to move the conditional part of the rule to its own property. Then the rule would be associated with the JournalEntry table and be written as:
Condition: :JEType = ‘Financial’
and :Status = ‘ReadyToBeProcessed’
Rule: (:_Child.JEDetail.Amount.SUM(where = :_Child.JEDetail.DrCrType = ‘Dr’) =
:_Child.JEDetail.Amount.SUM(where = :_Child.JEDetail.DrCrType = ‘Cr’) )
This was also the point where the decision was made to use a multi-table repository. Since object reuse was the driving force in the repository design, every element was created to be reusable.
Description
of the Rules Repository
The rules repository was implemented in an Oracle
database. A Rule is grouped
by Project. A Rule can belong to any number of Projects through the RuleUsage table.
This was done so that the same rules could be used in various
contexts. For example, there is one
set of rules that validates whether or not a customer can be sent for a security
clearance, but only a subset of those rules are relevant if the customer is
being processed without a security clearance. Note that there is an active
indicator in RuleUsage. This is so that a rule could
be declared as active for debugging purposes.
A rule is optionally attached to a Condition. A Condition is a Boolean expression
indicating whether or not the rule should be invoked. In practice, about half of
all rules have conditions attached to them.
A Validation is the basic rule that is enforced (e.g. :StartDate < :EndDate). It
has a many-many attachment to the Rule table through RuleDetail. A Validation can also be grouped using a ValidationGroup and then attached as a group to a
rule.
Note that a Rule, Validation, and Condition are all attached
to a specific table. Theoretically, these objects could be reused across tables
if the column names were exactly the same. However, this idea seemed to
unnecessarily complicate the model.
Error messages are built from the Condition and the Validation error_tx. The
user enters a user-friendly error message for the Condition and the Validation.
The system then uses that text to build the error text if the rule fails. For example, assume a rule on the
Department table with the following repository values:
Condition
Rule_tx = :Active_YN = ‘N’
Error_tx = ‘the department is inactive’
Validation
Rule_TX = :endDate is not null
Error_tx = ‘there must be a valid end date’
If the rule fails, the generated error message would be: “If
the department is inactive then there must be a valid end date.”
Since the system is created with reusable components, the error messages can be overridden for a specific usage. The Condition error_tx override is stored in the Rule class, and the Validation error_tx override is stored in the RuleDetail class.
A UML diagram of the rules repository is shown in Figure 1.

Figure 1: Rule repository data model
User-Friendly
Rules
“For each Department
where the department is inactive (active indicator = ‘N’), there must be a valid
end date (end date is not null).”
Note that the names of the columns in the generated rule include the user-friendly names from the repository rather than the actual column names.
The
Validation Repository Manager
The level of object reuse in the system made it difficult to create a user interface for entering and maintaining rules. It is helpful to think of a simple master-detail relationship between the Rule table and individual validations. However, rules are reusable across different logical applications and individual validations are reusable across individual rules. The result was an application built with a simple master-detail relationship between rules and validations from a user perspective. The complexity of the many-to-many relationship is hidden from the person entering the rules. When a user specifies a new validation in a rule, a selection can be made from an existing validation, or a new one can be entered. When viewing validations or rules, users can see how many times each of these objects is used elsewhere in the system. This prevents those entering the rules from modifying an existing rule or individual validation and causing inadvertent side effects elsewhere in the system.
It was also challenging to communicate clearly to users what actual error message would be generated for a particular rule. The error message could be specified at the object or object usage level, both for conditions and individual validations. A button was added to the user interface to generate a sample error message that users could read to validate the appropriateness of the message in each context.
Screen real estate in such a complex system was also a problem. In addition to rules, error messages and descriptions in large text boxes, there was an English language translation of the rule displayed. This problem was solved by using a combination of on demand popup windows and scrolling text fields.
It became important to find and navigate to particular
validations and rules. A separate Rule Finder was built where users could enter
search criteria. Rules satisfying the entered criteria would appear in a list.
Double clicking the rule navigated to the selected rule in the
system.
Rule
Enforcement
Once the rules have been specified, they must be enforced. For validation, this means that when objects fail a validation rule, some error message is generated. Up to this point, rule enforcement has not even been mentioned, nor has any specific environment been specified. This is the core idea of the business rules approach. The representation of the business rules is independent of the enforcement of the rules. It is possible to determine the appropriate logical grammar to describe the rules and in what context they will be enforced without making any assumptions about the physical representation of the data or the way in which the rules will be enforced.
In using the rules approach to building systems, approximately 30% of the time is spent designing the repository and developing the rule grammar. Another 10% of the time is spent creating the appropriate user interface to work with the repository. Writing the code generator/enforcement mechanism requires approximately 10% of the project time. Entering the rules into the repository consumes the remaining 50% of the time. This breakdown indicates that even drastic changes to the way in which rules are enforced will only require a relatively small amount of effort (a few days to a few weeks of effort).
There are various alternatives, each of which is achievable and surprisingly simple to implement:
1.
Interpreted mode: Leave the rules in the
repository and generate the code using an EXECUTE IMMEDIATE mechanism. This is
the simplest alternative and is frequently used for a proof-of-concept. The
downside is performance. Every rule that is enforced requires an EXECUTE
IMMEDIATE operation. If there are only a few rules, this mechanism works well;
however, if the number of rules is in the hundreds or thousands, severe
performance degradation will be experienced.
2.
Generate PL/SQL into the database (Compiled
mode): Using this alternative assumes that the data exists in Oracle tables and
straightforward PL/SQL will be generated to execute the validation logic. The
downside to this approach is that changes to the rules require regenerating the
code. If everything is generated into a single package, this can require up to
30 seconds each time the code is regenerated. A 30-second recompile time might
not seem long, but in a debugging cycle with numerous iterations of rule
tweaking, waiting 30 seconds for each recompile is not desirable. The main
benefit of this approach is its speed at runtime. By contrast, using the
interpreted mode instead might require 30-40 seconds at runtime.
3.
Generate Java to run against XML: In order to
perform validation away from the Oracle database (for example, to run on an
isolated laptop for later upload to the database), it is necessary to validate
without access to Oracle. This can be done by generating Java code to perform
the same validations against an XML file. PL/SQL code generators have been
successfully migrated to an XML data source using the Sunopsis utility that allows the use of SQL to access an XML
data source.
For the project described in this paper, the Compiled mode approach was used for version one. For the next version, the third approach will be used.
In addition to simple error messages, it was also important for the Validator to provide information allowing users to identify the reason(s) for the validation failure. The following information was provided to users:
· Object being validated when the rule failed
· Value of each field referenced in the rule
· Place in the user interface where the information could be corrected
The ultimate goal was to allow users to double click any field referenced in the error message and navigate to the appropriate portion of the application user interface.
A screen shot of the repository manager is shown in Figure 2:
Figure 2: Repository Manager User Interface
Lessons
Learned: The Benefits of Repository-Based Development
This project was a great success for the repository-based development approach. The time spent building the architecture and generators was more than offset by the speed at which hundreds of rules could be supported. The validation engine built makes further rule specification extremely rapid.
As more and more business rules are moved into repositories and out of code, the benefits of repository-based coding have become more and more apparent. It is easier to make objects reusable and they are far easier to manage in a repository than as code objects. It is also easer to make changes to the repository and perform an impact analysis of potential changes.
At some level, it feels like a “better” way to do
programming. With very little
development time required to create a repository manager, a more efficient IDE
for application development is created than can be built using tools with
millions of dollars of development effort such as TOAD for PL/SQL, JDeveloper or
Eclipse for Java. One can only wonder what could be achieved in repository-based
development if it attracted the attention of one of the major software
development vendors.
About the
Author
Dr.