Looping the Loop: Different Ways of Working with Recursive Structures

Introduction

Recursion in data modeling is an unfortunately underutilized feature. The current crop of developers (most of whom come from the JavaEE side of the world) seems even less inclined to use recursive structures.

This paper will describe the single most useful recursive modeling pattern in significant depth. This pattern has stood the test of time. It has been used on many projects, large and small over the years and solves a host of messy problems.

This paper assumes that the reader understands basic recursive database tables and can write a simple CONNECT BY clause. In addition, it is assumed that you know how to store data using a tree structure in a single recursive table.

Recursion can be used for various structures such as linked lists for contract versions, etc. But by far the most common problem that recursion solves is the storage of tree structures. Even the venerable EMP table in the SCOTT/TIGER schema represents a hierarchical structure of employees.

However, a simple recursion on an object table is not as useful as one might think. The problem is that data values tend to change over time, and those changes are “of interest” (meaning they must be kept). So, this leads many people to create a clumsy model that is very difficult to work with, as shown in Figure 1.

Figure 1: Simple Commonly Used Recursion Model

 

In order to recognize the fact that the object changes over time, or that associations change over time, it is tempting to put Start and End Dates everywhere or create a VERSIONING table off to the side of the THING table. This leads to a model that is incomprehensible to "mortals," and the developers begin to question the sanity of the data modelers.

Herein lies the main problem with recursion. Simple recursion is inadequate to model business needs. The way in which the basic model is adapted to meet those needs results in a convoluted mess. In order to avoid this mistake, this paper describes a clean, elegant, recursive design that can support a wide range of requirements.

Business Case

As an example, this paper will describe a business case based on a real system and lay out the solution to the recursion problem. The case is based on a tree structure, which is the most common place to use recursion. The other main use for recursion is object versioning (for example: contract modifications). The various alternatives for how to deal with object versioning deserve their own paper and will not be discussed here.

System Requirements:

·         Support an organizational tree structure. There are 6 levels in the organization. People must be placed in the organizational tree.

·         The tree changes over time. Events occurring at a specific point in time should roll up using the tree that was valid at the time of the event.

·         Identify changes to take place at a future time and automatically put them into effect at the appropriate time.

·         For reporting purposes, organize information using both functional and geographic hierarchies. This will allow for different rollup structures for reports.

The Basic Model

The core pattern is a reusable tree structure as shown in Figure 2.

 

 

Figure 2: Reusable tree structure

 

The idea here is to pull the tree and tree node out of the core object (Thing1 and Thing2) classes.

Tree Class

Each object in the Tree class represents a tree structure of a particular type for a period of time. The Tree class tends to have the following attributes:

·         Name – A logical name for the tree. Frequently not used, but seems like a good idea at design time.

·         Description – Also a nice idea that is rarely ever used.

·         StartDate and EndDate – Dates for which the tree is valid.

·         Type – VERY important. This is the type of the tree; in this example, “FUNCTIONAL” or “GEOGRAPHIC”. Even if there is only one kind of tree in the system, always include a Type attribute. You will almost certainly need more than one tree type, often before your system even goes into production.

·         Status – Whether the tree is Current, Future, Past, or Potential.

 

Placing dates on the tree structure may seem like an odd idea. This means that the tree will need to be versioned for every little change in the structure. If you want to make a change to your Org structure, you will need to clone the whole tree. However, this tends to be the best alternative.

Tree Node Class

Each object in the Tree Node class is a node on the tree. It is primarily a set of pointers to “Thing” classes. It can also be a simple Folder node that does not point to any “thing” object.

There is no Start/End Date attribute in this class. Time dependency is tracked only at the tree level. This means that more records are stored than in a design with node-level dates, but the result is a very clear design that is easy to work with.

 The Tree Node class tends to have the following attribute:

·         FolderName – A node either points to a “thing” or is itself a grouping folder; this attribute holds the folder’s name.

Other than the pointers to the “Thing” classes and some kind of primary key, there are no other attributes in this class.
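As a concrete illustration, the Tree and Tree Node classes might be implemented roughly as follows. This is a minimal sketch; all table and column names are illustrative assumptions, since real systems tend to use domain-specific names such as OrgTree and OrgTreeNode. Note that there are no dates on the node table; time dependency lives entirely on the tree table.

```sql
-- Hypothetical DDL sketch of the reusable tree pattern.
-- All names are illustrative, not taken from the production system.
CREATE TABLE tree (
    tree_oid    NUMBER       PRIMARY KEY,
    name_tx     VARCHAR2(100),
    descr_tx    VARCHAR2(4000),
    start_dt    DATE         NOT NULL,
    end_dt      DATE,                       -- open-ended if NULL
    type_cd     VARCHAR2(20) NOT NULL,      -- e.g. FUNCTIONAL / GEOGRAPHIC
    status_cd   VARCHAR2(20) NOT NULL       -- CURRENT / FUTURE / PAST / POTENTIAL
);

CREATE TABLE tree_node (
    treenode_oid  NUMBER PRIMARY KEY,
    tree_oid      NUMBER NOT NULL REFERENCES tree,
    treenode_rfk  NUMBER REFERENCES tree_node,  -- recursive parent pointer
    foldername_tx VARCHAR2(100),                -- populated for folder nodes only
    orgunit_oid   NUMBER,                       -- pointer to "Thing1"
    person_oid    NUMBER                        -- pointer to "Thing2"
);
```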

Thing1 and Thing2 Classes

These represent the standard object classes in your model. They could be OrgUnit or Person classes. The point of Thing1 and Thing2 is to demonstrate that you can have a tree with more than one kind of thing in it. Note that in real systems, the class names tend to be domain-specific. For example: OrgTree, OrgTreeNode, and Org.

Strengths of the Model

This model pattern has many advantages:

1.       Each tree exists as its own recursive structure. Once you have identified the tree of interest, the query is a simple recursive query and there is no need to deal with dates in node elements.

2.       The model pattern is reusable whenever you need a tree structure. It becomes very easy to see the “trees” in your data model. Once developers have coded one such structure, the coding for other structures is similar.

3.       The model makes it clear what you are modeling. Rather than seeing odd recursive relationships here and there, or obscure association classes, this kind of structure is explicitly storing trees.   
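To illustrate the first strength, a traversal under this pattern might look like the following sketch. It assumes the TreeNode PK, RFK, and FK columns used in the performance discussion of this paper; the Tree lookup columns are illustrative.

```sql
-- Sketch: find the functional tree valid on a given date, then walk it.
-- No date logic is needed at the node level; dates live on the tree.
select level, n.*
from   TreeNode n
start with n.TreeNode_rfk is null
       and n.Tree_oid = (select t.Tree_oid
                           from Tree t
                          where t.Type_cd = 'FUNCTIONAL'   -- illustrative columns
                            and :as_of_date between t.StartDate
                                                and nvl(t.EndDate, :as_of_date))
connect by n.TreeNode_rfk = prior n.TreeNode_oid
```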

Limitations of the Model

It is a good idea to have your structural business rules enforced by your data model. However, this model does not completely enforce everything. There are various rules in our real system example that are not enforced by this model:

1.       Only specific types of Orgs are allowed to be children of other types of Orgs; or Orgs (Thing1) are not allowed to be children of People (Thing2). This model allows any “thing” to be a child of any other “thing.”

2.       There should only be one tree of a specific type that can be valid for any date range. This is obviously not enforced.
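Rules that the model cannot enforce declaratively have to be policed procedurally. For the second rule, one possibility is a periodic validation query along these lines (a sketch; column names are illustrative):

```sql
-- Report pairs of trees of the same type whose validity periods overlap;
-- an empty result means the one-tree-per-type-per-period rule holds.
select a.Tree_oid, b.Tree_oid
from   Tree a,
       Tree b
where  a.Type_cd  = b.Type_cd
and    a.Tree_oid < b.Tree_oid                      -- report each pair once
and    a.StartDate <= nvl(b.EndDate, a.StartDate)   -- intervals overlap
and    b.StartDate <= nvl(a.EndDate, b.StartDate)
```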

Model Extensions and Implementation Issues

In the process of working with this model, a few unexpected and interesting issues arise:

·         How can you manage scheduled changes to the model (ones that have not yet happened)?

·         How can you correctly report over the desired period of time?

There are obviously alternative solutions to these problems. This paper describes the actual solution that is now being used in production.

Future Trees – The Time Machine

"We all have our time machines, don't we? Those that take us back are memories...And those that carry us forward, are dreams." – H.G Wells

There are different ways of handling future changes, but after a number of experiments we arrived at the following solution:

·         Create a special LOG table with a complete description of the required change and the moment when the change should be applied.

·         Create an AUDIT table to store and maintain the implemented change transactions.

·         Apply changes using a database job that fires after midnight.

·         Mutually exclusive future changes (a person received a privilege, but it was later revoked) null themselves out.

·         Meaningless future changes are automatically detected and removed (e.g., a scheduled change of a person’s role can be deleted if the person is to be removed from the tree entirely).

·         Successfully applied changes are moved from the LOG table into the AUDIT table; unsuccessful changes generate an emergency email to the support team.
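The nightly job described above might be sketched as follows. The table and procedure names (CHANGE_LOG, CHANGE_AUDIT, P_APPLY_CHANGE, P_NOTIFY_SUPPORT) are assumptions for illustration only, not the production code:

```sql
-- Hypothetical body of the after-midnight database job.
begin
    for c in (select * from change_log
               where apply_dt <= sysdate
               order by apply_dt) loop
        begin
            p_apply_change(c.log_oid);      -- assumed change applier
            insert into change_audit        -- successful change moves to AUDIT
                select * from change_log where log_oid = c.log_oid;
            delete from change_log where log_oid = c.log_oid;
        exception
            when others then
                p_notify_support(c.log_oid, sqlerrm); -- assumed emergency e-mail
        end;
    end loop;
    commit;
end;
```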

The process of maintaining future changes is as follows:

·         When changes must be scheduled, a special temporary “Future Tree” is created as a clone of the current one. It is also possible to clone only a part of the tree starting from the specified node (if you need to work only on a single department, it is possible to check out only that department)

·         To prevent concurrent future modifications, when a user enters a tree modification request, the root node of the relevant portion of the tree is locked, which automatically locks all dependent nodes.

·         If there are scheduled changes in the LOG table between now and the required future date for the checked-out part of the tree, these changes are also applied to the future tree.

·         All changes to the temporary tree are converted into an event in the LOG table with the appropriate date of scheduling.

·         When edits to the tree are complete, the temporary tree is removed
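The check-out step above amounts to a recursive copy. A minimal sketch is shown here; the sequence name is an assumption, and a real implementation must also remap the copied parent pointers to the newly generated keys:

```sql
-- Clone the subtree under :root_node into the temporary future tree.
insert into TreeNode (TreeNode_oid, TreeNode_rfk, Tree_oid, OrgUnit_oid, Person_oid)
select treenode_seq.nextval,      -- assumed sequence
       TreeNode_rfk,              -- old parent key: must be remapped afterwards
       :future_tree_oid,
       OrgUnit_oid,
       Person_oid
from   TreeNode
start with TreeNode_oid = :root_node
connect by TreeNode_rfk = prior TreeNode_oid;
```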

 

The main advantage of using these strategies is the clear visibility of all changes and the sequence of their application. This allows you to resolve all scheduling conflicts directly rather than using some kind of complex analysis.

Information Over Time

Technically, this question should be split into two possible cases, namely information as of now, and information between two dates. Both of these cases require similar thinking patterns, but with slight variations in implementation.

The problem of data "as of now” is purely performance-related. It is not difficult to find the most recent Org tree and its children. The issue is that, in the active system, the frequency of such requests is too high. If the total number of rows in the TreeNode table is in the millions, the cost of processing could become unacceptable. The solution to this problem is the introduction of the concept of “current snapshot" with the following characteristics:

·         Implemented as a de-normalized materialized view, where all levels of the tree are represented as separate columns. For example, if the depth of the hierarchy = 6, there will be 6 separate columns to store IDs of corresponding objects.

·         In addition to the flattened hierarchical structure, the view also contains the most frequently requested data elements (names of organizational units, personal first/last names, phone numbers, addresses, email, etc.). The idea is to have the majority of necessary information directly accessible without additional joins.

·         Refresh is done either by request (if there are critical changes to be “published” immediately) or during the midnight database job.

·         There are a lot of indexes on all possible columns.
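A simplified sketch of such a snapshot view is shown below. For readability, it flattens the hierarchy into a path column rather than the six separate ID columns used in production, and the joined name columns are illustrative assumptions:

```sql
-- Hypothetical "current snapshot" materialized view.
create materialized view mv_current_snapshot
build immediate
refresh complete on demand
as
select n.TreeNode_oid,
       level                                    depth_nr,
       sys_connect_by_path(n.TreeNode_oid,'|')  path_tx,  -- production: 6 ID columns
       o.UnitName_tx,
       p.fullName_tx
from   TreeNode n, OrgUnit o, Person p
where  n.OrgUnit_oid = o.OrgUnit_oid (+)
and    n.Person_oid  = p.Person_oid  (+)
start with n.TreeNode_rfk is null
       and n.Tree_oid in (select Tree_oid from Tree
                           where Status_cd = 'CURRENT')  -- illustrative status column
connect by n.TreeNode_rfk = prior n.TreeNode_oid;
```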

 

The other problem (data between two dates) usually arises in regard to reporting requirements. It is clear that there may be many valid hierarchies over a period of time. So, what happens if you request a report like First Quarter Sales?   Should the tree as of the beginning or the end of the quarter be used?  Obviously, neither of these solutions works. If a sale made at the beginning of the quarter is made by an Org that rolls up to OrgA and a sale at the end of the quarter is made by an Org that rolls up to OrgB, the report should correctly credit Orgs A and B respectively.

To be able to provide meaningful reporting information, the system needs two sets of data:

·         Day-to-day application logic – All required events must record a full hierarchical rollup as of the moment of occurrence (using the same example, 6 levels mean 6 columns in the log) and exact timestamp. Because of the extra processing cost involved in this recording, it makes sense to do this kind of extended logging only when necessary.

·         Data maintenance and storage – It is critical to be able to query which hierarchical chains were active between defined timestamps as quickly as possible. Even if all of the information is available, having hundreds of OrgStructures and millions of nodes could make the reporting process very user-unfriendly. Our solution is to introduce a table that can be appended to answer the “flipped” question. Instead of answering the question "What organization belongs to what tree?," the table shows the length of time that a specified hierarchical chain existed (i.e., Office A11 was part of Department A1 in Sector A between Jan 1 and Feb 1). This table is appended any time that a scheduled change to the organizational tree is applied (a secondary task of the “time machine”).
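The “flipped” table might look roughly like the following sketch. The names and the six-level layout follow the example above; all identifiers are illustrative:

```sql
-- One row per hierarchical chain and the interval during which it was active.
create table hier_chain (
    chain_oid  NUMBER PRIMARY KEY,
    lvl1_oid   NUMBER,           -- e.g. Sector
    lvl2_oid   NUMBER,           -- e.g. Department
    lvl3_oid   NUMBER,           -- e.g. Office
    lvl4_oid   NUMBER,
    lvl5_oid   NUMBER,
    lvl6_oid   NUMBER,
    start_dt   DATE NOT NULL,
    end_dt     DATE              -- NULL while the chain is still active
);

-- Which chains were active at any point inside the reporting period?
select *
from   hier_chain
where  start_dt <= :period_end
and    nvl(end_dt, :period_end) >= :period_start;
```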

Programming Language Aspects

From the coding side of working with hierarchical models, there are three “generic” topics to consider:

1.       Recursion in PL/SQL

2.       CONNECT BY clause and all related functions

3.       Common table expressions (CTE), aka recursive sub-query factoring

Before going back to the main example and showing how things are done in the real production environment, a review of the aforementioned options is useful.

 

Recursion in PL/SQL

Surprisingly enough there has not been a lot written about recursion in PL/SQL. Everybody “knows” that any PL/SQL unit can call itself, as shown in this classic factorial example:

CREATE OR REPLACE FUNCTION f_factorial_nr(in_nr INTEGER) RETURN NUMBER AS
BEGIN
    IF in_nr IN (0,1) THEN -- base case: 0! = 1! = 1
        RETURN 1;
    ELSIF in_nr < 0 THEN
        RETURN NULL;
    ELSE
        RETURN in_nr * f_factorial_nr(in_nr-1); -- recursive call
    END IF;
END;

 

The problem is that this example does not really represent the tasks that people usually solve with recursive PL/SQL code. To be fair, server-side code is rarely used for heavy mathematical tasks. Real-world recursion is usually less about pure number-crunching and more about some kind of repetitive data-related process, which completely changes the situation. The list of potential issues becomes much broader than “how not to create an infinite loop,” as explained here:

1.       Cursors: Creating FOR-loops with the recursive call inside is a VERY BAD idea. The reason is that until you reach the bottom of the parent-child chain, all cursors are kept open. If different processes are using the same program, you may have scores of cursors active at once, which is very resource-intensive. If you must implement such logic, the right approach is to bulk fetch the results of the query into a collection at each level and spin through the elements of the collection, as shown here:

 

Bad idea

create or replace function f_processLevelDown_tx (i_fk number) return varchar2 is
    v_out_tx varchar2(32000);
begin
    for c in (select * from emp where mgr = i_fk)
    loop
        dbms_output.put_line(c.ename);
        p_processEmp(c.empno); -- some kind of processing
        v_out_tx:=f_processLevelDown_tx(c.empno); -- recursion while the cursor is open
        if v_out_tx!='OK' then
            raise_application_error(-20999,v_out_tx);
        end if;
    end loop;

    return 'OK';
exception
    when others then return 'E:FK'||i_fk||'> error:'||sqlerrm;
end;

 

Good idea

create or replace function f_processLevelDown_tx (i_fk number) return varchar2 is
    v_out_tx varchar2(32000);
    type rec_tt is table of emp%rowtype;
    v_tt rec_tt;
begin
    select * bulk collect into v_tt from emp where mgr = i_fk;

    if v_tt.count()>0 then
        for i in v_tt.first..v_tt.last loop
            p_processEmp(v_tt(i).empno); -- some kind of processing
            v_out_tx:=f_processLevelDown_tx(v_tt(i).empno); -- cursor is already closed here
            if v_out_tx!='OK'
            then
                raise_application_error(-20999,v_out_tx);
            end if;
        end loop;
    end if;

    return 'OK';
exception
    when others then return 'E:FK'||i_fk||'> error:'||sqlerrm;
end;

 

2.       Variables: All local variables exist only in the current scope. This means that if you need some values to be visible across the whole recursive call chain, these values must either be passed down to the child call as input parameters or stored as global PL/SQL variables in a separate package. My usual rule of thumb is the following: if the value is used directly in the child and nowhere else, it is a parameter; if the same value could be used in multiple places, it is a global variable (of scalar type if the value can be overridden, or collection type if multiple copies of the variable should be kept active). The previous example perfectly illustrates a case when passing values as parameters is appropriate.

 

The other case can be illustrated by the following example:


-- Processing should continue, but the department with the failure should be marked as corrupted
          ...
        for i in v_tt.first..v_tt.last
        loop
            if main_pkg.v_process_tt(v_tt(i).deptno)!='Processing failed!'
            then
                p_processEmp(v_tt(i).empno); -- some kind of processing
                v_out_tx:=f_processLevelDown_tx(v_tt(i).empno);
            end if;

            if v_out_tx!='OK'
            then
                main_pkg.v_process_tt(v_tt(i).deptno):='Processing failed!'; -- special package variable
            end if;
        end loop;
        ...

 

3.       Transaction control: Recursive PL/SQL calls run inside the caller’s transaction and inherit all of its transaction-level properties. However, if your function is marked as an “autonomous transaction,” each recursive call spawns another autonomous transaction, so all transaction-level resources are separate for each call.

4.       Exception handling: There are two main issues with recursive structures, namely how to know precisely where an error occurred, and what to do after the problem is detected. From our experience, the first issue should be handled manually: Oracle can present a full trace of the exception, but that information is normally meaningless unless additional user-defined values are associated with it, such as the chain of input parameters down to the point of failure. After the problem is detected, the most efficient handling is usually to roll back completely to the moment before the recursive structure was called. This is much simpler than a full hierarchical cleanup (although this option may not be available if there are explicit commits or autonomous transactions inside the recursion), as the following example shows.

-- Main caller
declare
    v_tx varchar2(32000);
begin
    savepoint beforeLoop;
    begin
        v_tx:=f_processLevelDown_tx(7839);
        if v_tx!='OK' then
            -- business logic failure
            rollback to savepoint beforeLoop;
        end if;
    exception
        when others then
            -- abnormal failure
            rollback to savepoint beforeLoop;
            raise;
    end;
end;

 

CONNECT BY Clause and Related Functions

Oracle’s way of working with recursive data structures could be described in a single statement, but then you need two hours of additional explanation to cover the related details. For the purposes of this paper, there is no good reason to discuss the syntax details, but the following example covers all of the key elements:

 

select SYS_CONNECT_BY_PATH(empno,'|') path_tx,
       CONNECT_BY_ROOT ename root_tx,
       CONNECT_BY_ISCYCLE isCycle_yn,
       CONNECT_BY_ISLEAF isLeaf_yn,
       LEVEL level_nr,
       a.*
from emp a
start with mgr is null
connect by nocycle mgr = prior empno
order siblings by ename

 

This example includes the clause to link the parent/child structure and the list of Oracle built-in functions. These elements cover everything needed to work with hierarchical data structures. However, when you step out of the SCOTT/TIGER environment, the real-life problems are slightly different.

WHERE Clause

Sometimes developers do not recognize that they are falling into a performance trap until it is too late and the code has already been in production for some period of time. One classic case is using WHERE clauses in a recursive query. Using the original example of retrieving all Tree Nodes belonging to a certain tree, assume a data volume of 1.6 million nodes in 800 trees (2,000 nodes per tree), up to 6 levels deep, with indexes created on all related columns (TreeNode PK, TreeNode RFK, TreeNode FK).

Unfortunately, the simplest version of the code took significantly more time than expected, namely 2 minutes of runtime, a full table scan on the Detail table, and 3.6 M consistent gets.

select *
from TreeNode
where Tree_oid = :1
connect by TreeNode_rfk = prior TreeNode_oid
start with TreeNode_rfk is null

 

The reason is very simple. Filters from the WHERE clause are applied to the result set only AFTER the whole hierarchical lookup is finished. This means that the START WITH clause will have to scan the whole table to identify all root nodes (RFK is null), build all trees, and only afterwards filter out what exactly is needed. In short, this is a bad idea! Oracle did exactly what was requested; we received correct results; but the path took significantly longer than needed.

To help Oracle, explicitly specify that only objects related to a certain tree are needed, as shown here:

select *
from TreeNode
where Tree_oid = :1
connect by TreeNode_rfk = prior TreeNode_oid
start with TreeNode_rfk is null and Tree_oid = :1

 

This version of the query behaves much better, requiring only 0.08 seconds of execution time, 5K consistent gets, and all index operations. However, the query is still not 100% perfect. Due to the nature of the tree, if the root node belongs to the tree, all detected children will also belong to the same tree. This means that the additional WHERE-clause filter is completely redundant, so the final version should look like this:

select *
from TreeNode
connect by TreeNode_rfk = prior TreeNode_oid
start with TreeNode_rfk is null and Tree_oid = :1

 

It is also reasonable to mention a popular alternative solution to the described performance problem, namely “nested SELECTs,” as shown here:

select *
from (select *
      from TreeNode
      where Tree_oid = :1)
connect by TreeNode_rfk = prior TreeNode_oid
start with TreeNode_rfk is null

 

Since this query explicitly articulates exactly what we are trying to do, it also serves as “self-documenting” code. It runs as fast as the previous case, but if you take a closer look at the execution plan, it is clear that Oracle rewrote the query in the following way:

select *
from TreeNode
connect by TreeNode_rfk = prior TreeNode_oid and Tree_oid = :1
start with TreeNode_rfk is null and Tree_oid = :1

 

So, we are back to square one (plus an extra redundant condition in the CONNECT BY).

Overall, as you can see from the examples in this section, you need to help Oracle correctly identify the root node in the most efficient way. It should also be obvious that having indexes on all columns involved in the hierarchical queries is mandatory.
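For the example schema used in this section, the “mandatory” indexes boil down to something like the following sketch (index names are illustrative; the primary key constraint already covers TreeNode_oid):

```sql
create index ix_treenode_rfk  on TreeNode (TreeNode_rfk);  -- recursive FK
create index ix_treenode_tree on TreeNode (Tree_oid);      -- FK to Tree
```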

Joins

Life becomes even more interesting when you need to join multiple tables. Assume that you need to build a tree containing all names of all nodes, but only primary assignments from the people involved are needed. The straightforward solution would look as follows:

 

select nvl(r.fullName_tx,o.UnitName_tx) childName_tx,
       d.*
from TreeNode d,
     Person r,
     OrgUnit o
where d.Person_oid = r.Person_oid (+)
and   d.OrgUnit_oid = o.OrgUnit_oid (+)
and   (d.Person_oid is null or d.role_cd = 'Primary')
connect by d.TreeNode_rfk = prior d.TreeNode_oid
start with d.TreeNode_rfk is null and d.Tree_oid = :1

 

Examining the Explain Plan, it is clear that the CBO takes different approaches to different parts of the WHERE-clause. Joins are applied before (and while) linking the tree, while filters are applied afterwards. But if the filters reference only the TreeNode table, this means performing joins on more nodes than is really necessary. The alternative could look as follows:

 

select nvl(r.fullName_tx,o.UnitName_tx) childName_tx,
       d.*
from (select *
      from TreeNode
      connect by TreeNode_rfk = prior TreeNode_oid
      start with TreeNode_rfk is null and Tree_oid = :1) d,
     Person r,
     OrgUnit o
where d.Person_oid = r.Person_oid (+)
and   d.OrgUnit_oid = o.OrgUnit_oid (+)
and   (d.Person_oid is null or d.role_cd = 'Primary')

 

In this variation of the same query, the process of tree building is done first, the filter is applied to its result, and only then are all joins made. This change of order saves about 10% of the processing time (at least in this example).

Common Table Expressions

It is valid to say that a standards-based development environment is usually a good thing. Unfortunately, it is also valid to say that the standards may or may not make sense. From the point of view of open-source adherents, CONNECT-BY has a major flaw: it is Oracle’s own extension of SQL, while the majority of other database vendors support a feature called “Common Table Expressions” (CTE). Starting with Oracle version 11g R2, the same mechanism was also introduced as “Recursive Sub-query Factoring.” The following is the core example:

with employees (empno, name, mgr) as
(
 -- anchor
 select empno, ename, mgr
 from   emp
 where  mgr is null
 union all
 -- recursive block
 select e.empno, e.ename, e.mgr
 from   emp e,
        employees m
 where  m.empno = e.mgr
) search depth first by name set seq
select empno, name, mgr, seq
from   employees

 

The mechanism works in the following steps:

1.       Run the anchor part of the UNION ALL query to get root elements.

2.       Pass a set of root elements to the second part of the query and get the next set (second level) of records

3.       Repeat step 2 until no rows are accessed.

The biggest conceptual difference between CTE and CONNECT-BY is that, theoretically, CTE works with SETs of rows, while CONNECT-BY works row-by-row: instead of passing down a set of parent objects, Oracle takes one record at a time and looks for its children. This is usually the most highly touted advantage of CTE, because this kind of SET-based thinking should significantly improve performance and allow for a higher level of flexibility in your SQL statements.

The reality is slightly more complex. CTE can do everything that can be done by CONNECT BY, while the reverse statement is not true. However:

·         Since Oracle has had CONNECT-BY since version 2, it is very well optimized. We ran a number of tests and found that the CBO can build much more efficient plans for it.

·         CONNECT-BY has built-in functions, and although it is possible to hand-code their equivalents with CTE, the resulting code may be difficult to read. Many others may not agree, but it is up to the reader to decide (see the following links: http://rwijk.blogspot.com/2009/11/recursive-subquery-factoring.html and http://technology.amis.nl/blog/6104/oracle-rdbms-11gr2-goodbye-connect-by-or-the-end-of-hierarchical-querying-as-we-know-it)

·         In SQL Server, under the hood, the engine is doing row-level processing (the analysis is explained here  http://explainextended.com/2009/11/18/sql-server-are-the-recursive-ctes-really-set-based/). This means that if you are trying to build platform-independent code, the real behavior may still be different.

 

Overall, from the technical side, there is no good reason to use CTE at its current level of implementation. From the functional side, proponents of CTE tout the fact that it allows reuse of intermediate results within the recursion, while CONNECT-BY needs everything to be pre-calculated. Here they may have a point, but only up to a certain extent, as shown in this simple example involving a set of employees (with salaries) and a set of bonuses.

·         Bonuses are described as percentages of salaries.

·         Employees may have multiple bonuses associated with them.

·         If multiple bonuses are applied, each subsequent one is applied to the already corrected amount (not the original).

o    10% and 15% bonuses <> a 25% bonus: the result is X + 0.1*X + (X + 0.1*X)*0.15 = 1.265*X

·         Total salary after the application of all bonuses cannot be more than 20% above the original amount.

·         We need to create a query that runs the payroll.

This is a case when CTE looks like a logical solution because it is necessary to apply the first set of bonuses, check conditions, apply the second set of bonuses, check conditions, and repeat, as shown here:

 

with bonussummary (empno_nr, name_tx, salary_nr, maxsalary_nr, bonustotal_nr,
                   bonuslist_tx, paycheck, bonuscount_nr)
as
(
select empno_nr, name_tx, salary_nr, salary_nr*1.2 maxsalary_nr,
       0 bonustotal_nr, ' ' bonuslist_tx, 0 paycheck, 0 bonuscount_nr
from t_emp
  union all
select empbonus.empno_nr, empbonus.name_tx, empbonus.salary_nr,
       bonussummary.maxsalary_nr,
       bonussummary.bonustotal_nr + empbonus.salary_nr / 100 * empbonus.bonus_nr,
       bonussummary.bonuslist_tx || '<B:' || empbonus.bonusid_nr || '>',
       least(empbonus.salary_nr + bonussummary.bonustotal_nr
               + empbonus.salary_nr / 100 * empbonus.bonus_nr,
             bonussummary.maxsalary_nr) paycheck,
       bonussummary.bonuscount_nr + 1
  from (select e.empno_nr, e.name_tx, e.salary_nr, b.bonusid_nr, b.bonus_nr, b.bonustype_tx
          from t_empbonus eb,
               t_bonus b,
               t_emp e
         where e.empno_nr = eb.empno_nr
           and b.bonusid_nr = eb.bonusid_nr) empbonus,
       bonussummary
 where empbonus.empno_nr = bonussummary.empno_nr
   and instr(bonussummary.bonuslist_tx, '<B:' || empbonus.bonusid_nr || '>') = 0
   and empbonus.salary_nr + bonussummary.bonustotal_nr < bonussummary.maxsalary_nr
),
FinalPaycheck as
(
  select bonussummary.*,
         row_number() over (partition by empno_nr order by paycheck desc) as RowNumber
  from bonussummary
)
select *
from FinalPaycheck
where RowNumber = 1
order by empno_nr, maxsalary_nr

 

It is clear that the solution is slightly less convenient and readable than desired. Although it is possible to understand what is going on, it requires a skill set significantly above that of the average developer (especially in terms of debugging). The same problem implemented as PL/SQL code would be much more maintainable and simple enough that it does not need to be presented here. Overall, our recommendation would be to be aware of CTE, but stay away from it (unless there are very solid reasons to do otherwise).

Conclusion

There is a lot more to be said about managing recursive structures at all levels (coding best practices, database storage tuning, performance issues, etc.), but these topics cannot be covered in a single paper. The critical message is that developers should not be afraid of hierarchical data and coding structures. It is possible to use them effectively to solve real-life problems. In addition, as with any other complex structure, it is important to understand what is going on “under the hood” in order to make the most of this functionality. The Oracle RDBMS environment is too rich to make architectural decisions blindly. For example, switching to CTE instead of CONNECT BY is currently probably premature. However, being aware of all of the existing built-ins and ways of internal query optimization can save you from reinventing the wheel.

About the Authors

Dr. Paul Dorsey is the founder and president of Dulcian, Inc. an Oracle consulting firm specializing in business rules and web based application development. He is the chief architect of Dulcian's Business Rules Information Manager (BRIM®) tool. Paul is the co-author of seven Oracle Press books on Designer, Database Design, Developer, and JDeveloper, which have been translated into nine languages as well as the Wiley Press book PL/SQL for Dummies. Paul is an Oracle ACE Director. He is President Emeritus of NYOUG and an Editor of the International Oracle User Group’s SELECT Journal. In 2003, Dr. Dorsey was honored by ODTUG as volunteer of the year, in 2001 by IOUG as volunteer of the year and by Oracle as one of the six initial honorary Oracle 9i Certified Masters. Paul is also the founder and Chairperson of the ODTUG Symposium, currently in its eleventh year. Dr. Dorsey's submission of a Survey Generator built to collect data for The Preeclampsia Foundation was the winner of the 2007 Oracle Fusion Middleware Developer Challenge and Oracle selected him as the 2007 PL/SQL Developer of the Year. Paul can be contacted at paul_dorsey@dulcian.com

 

Michael Rosenblum is a Development DBA at Dulcian, Inc. He is responsible for system tuning and application architecture. He supports Dulcian developers by writing complex PL/SQL routines and researching new features. Mr. Rosenblum is the co-author of PL/SQL for Dummies (Wiley Press, 2006). Michael is a frequent presenter at various regional and national Oracle user group conferences. He won the ODTUG 2009 Best Speaker Award. In his native Ukraine, he received the scholarship of the President of Ukraine, a Masters Degree in Information Systems, and a Diploma with Honors from the Kiev National University of Economics.