INTRODUCTION TO OBJECT MODELING IN ORACLE8

 

Dr. Paul Dorsey

Dulcian, Inc.

 

Overview of the Modeling Environment

Currently, within relational database design, the standard way we do logical modeling for databases is using Entity Relationship Modeling (ER Modeling). The most common syntax used within Oracle is that laid out by Richard Barker in his relational modeling book, Case Method: Entity Relationship Modelling (Computer Aided Systems Engineering, Addison Wesley, 1990). Relational modeling has served us very well for the last ten years. But relational modeling has a somewhat limited vocabulary for representing data-related business rules. I am not advocating that we jump on the object modeling bandwagon simply because it is the latest trend, but rather because most modelers feel hampered and constrained by the limitations of the ER modeling structure.

The Unified Modeling Language (UML) Alternative

 Unified Modeling Language (UML) will not answer all of our modeling needs. We will still deal with an environment where some of the data-related business rules cannot be represented by the UML data model. However, UML certainly gives us a more flexible system that allows us to accurately represent a much greater percentage of the data-related business rules. As the limitations of ER modeling are enumerated, hopefully, you will understand why there is a compelling need to replace the current industry standard of ER modeling with the much more flexible and robust object-oriented UML.

 

Problems with Entity Relationship Modeling

 The first problem with the ER paradigm is reflected in its name. The core idea behind ER modeling is that we can reasonably represent the data model of an organization through entities and their relationships with one another. Even on the surface, this seems to be a grossly over-simplified and inadequate language for describing an organization’s information. The obvious advantage of UML is the addition of process information to the data model. Even without this information, ER modeling still has serious data-related limitations.

In order to better understand the differences between ER Modeling and UML, the terminology must be clearly defined. First, a discussion of the types of entities used in ER modeling is necessary. There is really only one type of entity in an Entity Relationship Diagram (ERD), yet there are different kinds of entities in the world. The most obvious and serious limitation of ERDs is the inability to distinguish between commonly used core entities such as Purchase Orders (POs) or Employees (frequently referred to as Major Entities) and so-called Minor Entities such as Code Description Lookup Tables.

 

"Association" or "Intersection" entities that arise from a many-to-many relationship between two other tables are clearly a different type of entity. However, there is no way to indicate this in a traditional ERD. Entities that are time-related (i.e. time is part of the unique identifier) are also not able to be distinguished. In many cases, this distinction would enhance the effectiveness of the diagram.

Another type of entity is one that is an abstraction of other entities. For example, in a retail model, you might recognize that sales, purchases, invoices and receipts of goods are all similar types of structures and can be represented as commodity movements. There is no way in an ERD to show that the commodity movement structure is an abstraction or genericization of other structures.

Within the ER modeling environment, we are limited to one type of modeling entity. One might argue that there are really two types of entities in ER modeling:

  1. Subtypes
  2. Supertypes

In reality, these types reflect a relationship between two entities rather than a different type. All that an entity has to distinguish it, other than its name, is some number of attributes. It is not possible to represent what the allowable actions are or how that entity functions within the organization.

 

Using UML 

The important thing to recognize is that, for people familiar with ER modeling, the shift to UML will not be a traumatic one. Serious modelers are well acquainted with the limitations of Entity Relationship Diagrams. With UML, some, but by no means all, of the limitations are eliminated. Some of the things we dislike about ERDs are still present with UML. Even worse, there is one place where Barker’s ERD notation is actually more robust than UML.

Keep in mind that the notion of an "Entity" in ER modeling is being replaced with the notion of an "Object Class" in UML. There is some discrepancy in the literature concerning what is meant by an "object." Here are some definitions from various sources:

 

  1. "…Object-oriented systems represent information in units called objects which consist of data and a set of operations to manipulate them. An object can be an invoice, a filing cabinet, a type of employee, or a computerized representation of a part in a jet engine. Each includes data and logic enabling it to do certain things. For example, the engine object includes data about the characteristics of the engine and software that determine the kinds of things that the engine can do, such as rotate in a certain direction." Don Tapscott & Art Caston, Paradigm Shift, McGraw-Hill, 1993, p. 172.
  2. "Object: An element of a computer system that has a unique identity, state (represented by public and private data), and public and private operations (methods) that represent the behaviour of the object over time." Glossary definition from George Wilkie, Object-Oriented Software Engineering, Addison-Wesley Publishing Co., 1993, p. 386.
  3. "An object is some distinct, identifiable thing for which the system must store data in order to perform the fundamental activities in the system." James A. Koval, Analyzing Systems, Prentice-Hall, 1988, p. 184.
  4. "Objects are active components that exhibit behavior when stimulated by messages, or transactions." Andersen Consulting – Arthur A. Andersen & Co. S.C., Foundations of Business Systems, 2nd Edition, The Dryden Press, 1989, p. 233.
  5. "Object: A component of a logical database description that represents a real-world entity about which information is stored." Glossary definition from James Martin, Information Engineering: Book II – Planning & Analysis, Prentice-Hall, 1990, p. 474.
  6. "Object….(2) A structure in an object-oriented program that contains an encapsulated data structure and data methods. Such objects are arranged in a hierarchy so that objects can inherit methods from their parents. (3) In DB2, a term used to refer to databases, tables, views, indexes and other structures." Glossary definition from David M. Kroenke, Database Processing, Prentice-Hall, 1995, p. 583.
  7. "Object…(2) In object-oriented approaches, a distinct person, place or thing with relevant knowledge or actions." Glossary definition from Thomas A. Bruce, Designing Quality Databases with IDEF1X Information Models, Dorset House Publishing, 1992, p. 535.

Note that with the exception of Methods, which are PL/SQL functions and procedures, the definitions of an Object Class are the same as the definition of an entity. In particular, James Martin’s definition (#5) is exactly how most of us would describe an entity.

The proper definition of an object class in this context is broader than that of an entity. You can use objects to describe process information as well. However, for this discussion, "object" will be used in the context of data objects. An object class is something of interest to the organization or a means of classifying something of interest. The crucial point is that an entity or object class always represents something in the real world. This is one of the key mechanisms you can use to check the validity of your data model. Can you articulate precisely the real-world "thing" that each Object Class corresponds to? If it cannot be precisely articulated, then you do not have a valid data model. THERE ARE NO EXCEPTIONS TO THIS RULE!

 

Object Classes and Methods

The most important thing to do is to clearly describe each object or object class. A good description never contains the phrase "This object contains information about…." Keep in mind that an object must be something in the real world. An object class is, in itself, a generalization of something of interest. We can discuss object classes and objects (instantiations) of those classes.

 

Let’s look at the simplest sounding example, which is actually a fairly complex modeling problem: Names, Addresses, Phone numbers, and Contacts for an organization. In the ER modeling world, most people have reached the conclusion that creating a separate special entity for Address, Phone number and Contact is a good idea. It allows people in an organization to have multiple addresses for different purposes as well as tracking old addresses, multiple phones used for different purposes (general information, billing information, order status), appropriate contacts for mailings, calls, faxes, etc.

We have a class called Address and the object is a specific physical address at a specific physical location. For those people familiar with ER modeling, this is the same as an entity and an instance of that entity, which frequently translates to each row in a particular table. This construct of "object of a class/instance of an entity/row in a table" is useful in naming object classes.

For the Address object class, a good definition would be "a specific, physical address" since each object belonging to this class is a specific, physical address. You could also define each instance of the entity or each row in the Address table in the same way. All mean the same thing. Without this clause, it is easy to be sloppy about object class names. It is common to describe a whole class as "all the addresses for the company" in one case and, in another case, describe it in terms of specific elements. Descriptions at the object level are the most useful since it is ultimately objects that you will work with. Using this convention for describing an object class with a description of a representative object of that class is highly recommended. An accurate description of the object class is much more important than coming up with a good name for the class.

To try to capture all of this richness in attributes within every table that can have an address or phone number is a very difficult problem. The obvious object classes are clearly Address, Phone number and Contact. These will certainly be kept track of. An object can also be a classification of something of interest such as address use (billing, home), phone use (voice, FAX), etc. There is an independent classification needed for phones as well, i.e., physical type of phone. Alternatively, we may want physical classifications for addresses, tracking whether they are apartments, homes, offices, etc. There are some additional wrinkles that pose interesting modeling problems such as multiple addresses that are linked (multiple offices within one address), multiple office buildings, multiple mail stops. However, that level of detail is rarely necessary.

For ER modelers, simply think of an Object Class as being synonymous with an Entity. However, there is one very significant difference, namely that Object Classes are associated with "Methods." Methods can be thought of as PL/SQL functions and procedures that act on the Object Class. If you think about Object Classes translating into tables, Methods are PL/SQL functions that interact with the tables. For example, for an Employee Object Class, the associated Methods would include Hire, Fire, Give Raise, Assign to Dept., Assign to Committee. It is possible to make this list of methods exhaustive so that developers need only interact with these Methods rather than directly manipulating the data in the Object Classes. This ability to isolate the developers from the physical structure of the Object Classes is considered a primary advantage of the UML technique.
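To make the idea of Methods concrete, here is a minimal sketch in Python rather than PL/SQL. The method names follow the Employee example above; the attribute names (salary, department, committees) and the rule that a raise must be positive are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Employee:
    """Hypothetical Employee object class; the attributes are illustrative."""
    name: str
    salary: float = 0.0
    department: Optional[str] = None
    active: bool = False
    committees: set = field(default_factory=set)

    def hire(self, department: str, salary: float) -> None:
        # Callers never set the attributes directly; they go through the Method.
        self.active = True
        self.department = department
        self.salary = salary

    def fire(self) -> None:
        self.active = False
        self.department = None
        self.committees.clear()

    def give_raise(self, percent: float) -> None:
        # Assumed rule: a raise is always a positive percentage.
        if percent <= 0:
            raise ValueError("a raise must be a positive percentage")
        self.salary = round(self.salary * (1 + percent / 100), 2)

    def assign_to_dept(self, department: str) -> None:
        self.department = department

    def assign_to_committee(self, committee: str) -> None:
        self.committees.add(committee)
```

A developer working only through these Methods never needs to know how the salary or department is physically stored, which is the isolation described above.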

 

Of course, this is all theoretical nonsense. In reality, in the near future, I don’t foresee throwing this layer of abstraction on top of Object Classes, thereby hiding the structure from developers. However, the ability to create these Methods for the purpose of building partial APIs to the data structure is a particularly powerful concept, especially in the development of packaged applications. Through Oracle roles, you might want to restrict some class of developers to accessing only an Object Class’ Methods without direct access to the Object Class itself.

 

Diagramming Relationships

There are several classifications of relationships in ER Diagramming:

  1. Cardinality
  2. Dependency (the UID bar)
  3. Subtype/Supertype

This list does not provide nearly enough flexibility to describe the myriad of possible relationships in a complex data model. It is not possible to visually indicate one entity being a genericization of another or show that one entity is a classification (code lookup) for another entity.

A. Cardinality

Of the relationships that do exist, cardinality is extremely limited. The appropriate cardinality model between two entities is:

Somewhere between N1 and N2 instances of entity1 relate to somewhere between M1 and M2 instances of entity2.

This concept of full cardinality representation in modeling is not new with UML. Even some of the early ER diagramming techniques used similar notation. Barker’s "crow’s feet" notation is slightly more readable, but still terribly limiting. All that can be described in Barker’s cardinality is that 0, 1, or N of one entity is related to 0, 1, or N of another entity. This prevents us from representing many data-related business rules. One-to-many relationships can easily be represented as shown in Line 1 of Figure 1. This reads as 0 or 1 objects in Object Class A relate to any number of objects in Object Class B. Note: * is shorthand for 0…N

Figure 1: Cardinality Relationships

 

Other relationships cannot be represented easily at all in an ERD and in Designer/2000. Trying to represent these relationships would result in the generation of very convoluted tables. For example, if our business rule is that there are exactly five people on a team, trying to represent such a business rule with an ERD is quite difficult. UML makes this much simpler as shown in line 2 of Figure 1.

 For a business rule that says "we have at least two members on a committee," again, a clumsy ERD is necessary to even describe this rule but is easily represented in UML as shown in line 3 of Figure 1.

 Finally, a rule that says "you cannot have more than 20 people on a committee" is virtually impossible to represent clearly on an ERD but is simple in UML as shown in line 4 of Figure 1.

 The previous examples demonstrate that the cardinality flexibility of UML is far superior to that of ERDs.
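The cardinality rules in these examples can be sketched as a lower and upper bound on the number of related objects. The Multiplicity class below is a hypothetical helper written for this illustration, not part of UML or of any Oracle product; an upper bound of None stands for "*".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Multiplicity:
    """A lower..upper bound on how many objects may take part in a relationship."""
    lower: int
    upper: Optional[int] = None  # None stands for "*" (no upper bound)

    def allows(self, count: int) -> bool:
        if count < self.lower:
            return False
        return self.upper is None or count <= self.upper


ANY = Multiplicity(0)                 # "*", shorthand for 0..N
EXACTLY_FIVE = Multiplicity(5, 5)     # exactly five people on a team
AT_LEAST_TWO = Multiplicity(2)        # at least two members on a committee
AT_MOST_TWENTY = Multiplicity(0, 20)  # no more than 20 people on a committee
```

Each business rule becomes a single pair of numbers, which is why the UML notation can state it on one end of a relationship line.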

 

B. Dependent Relationships

Dependent relationships can be represented in three different ways: aggregation, composition, and qualified association.

In all three cases, the same idea is represented. The definitions discriminating between each of these dependent relationship types are not well defined. Aggregation is the weaker bundling. Composition is a stronger bundling. It is still not clear to me where qualified association fits into this group. To add to the confusion, Oracle’s Object Database Designer treats aggregation and composition in the same way and doesn’t include qualified associations at all.

Conceptually, using aggregation to indicate a definition of the parent object where the child objects may exist independently makes the most sense. For example, if I were representing the relationship between committee and people, a committee is composed of people; but people have an independent existence outside of that committee. Therefore, aggregation is the logical representation for this relationship. Composition denotes an even stronger relationship where the child records have no meaning outside of the context of the parent records. For example, for Purchase Order and Purchase Order Details, composition is the appropriate representation since PO Detail has no meaning in isolation from the PO. The obvious structures to use in each case would be an array of references in the parent table for aggregation and nested tables for composition. However, you might want to consider an array of references in both instances to allow for independent access to the detail (child) table.
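The difference between the two bundlings can be sketched in a few lines of Python. The class and attribute names are hypothetical; the point is only that a Committee holds references to people who exist on their own, while PO details are created by, and live inside, their Purchase Order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str


@dataclass
class Committee:
    """Aggregation: members are references to independently existing people."""
    name: str
    members: List[Person] = field(default_factory=list)


@dataclass(frozen=True)
class PODetail:
    item: str
    quantity: int


@dataclass
class PurchaseOrder:
    """Composition: details are created by the PO and have no life outside it."""
    number: int
    details: List[PODetail] = field(default_factory=list)

    def add_detail(self, item: str, quantity: int) -> PODetail:
        detail = PODetail(item, quantity)
        self.details.append(detail)
        return detail


people = [Person("Ann"), Person("Bob")]
budget = Committee("Budget", members=list(people))
budget.members.clear()  # disbanding the committee leaves the people untouched

po = PurchaseOrder(1001)
po.add_detail("widget", 12)  # the only way to create a detail is through its PO
```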

Qualified association requires object class names to be written in multiple places on the diagram, so it should not be used. See Figure 2:

 

Figure 2: Dependent Relationships

 C. Subtypes

At first glance, UML syntax appears to be equivalent to ERD syntax when it comes to subtypes. Using the example of hourly/salaried employees, there is no apparent advantage to UML syntax over ERD syntax as shown in Figure 3:

  

 Figure 3: Subtype Example A

 

However, because of the UML syntax, we can represent a particular object class being subtyped simultaneously in multiple ways. An example of this is an Object Class for Contracts, which can be classified both as fixed-price or time and materials (T&M) and as government or private. This is very simple to represent in UML, as shown in Figure 4.

 

Figure 4: Subtype Example B

 

Representing this as an ERD requires a Cartesian product of all possible subtypes. In this example, it is possible. However, if each of the subtypes had five or more values, then the Cartesian product becomes unwieldy.
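The growth of the Cartesian product is easy to check with a short sketch. The helper name combined_subtypes is hypothetical; it simply lists every combined subtype an ERD would need for a set of independent classifications.

```python
from itertools import product


def combined_subtypes(*classifications):
    """Return every combination of values, one per classification."""
    return [" / ".join(combo) for combo in product(*classifications)]


pricing = ["fixed-price", "T&M"]
sector = ["government", "private"]

# The contract example needs 2 x 2 = 4 combined subtypes in an ERD.
erd_subtypes = combined_subtypes(pricing, sector)

# Two classifications with five values each would already need 5 x 5 = 25.
five_by_five = combined_subtypes(
    [f"a{i}" for i in range(5)],
    [f"b{i}" for i in range(5)],
)
```

In UML the same model needs only one subtype per value (four in the first case, ten in the second), because each classification is drawn independently.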

 

Many-to-Many Relationships

ERDs are again limited in the ability to represent many-to-many relationships. In an ERD, a many-to-many relationship is represented with an intersection table with 2 one-to-many dependent relationships; whereas with UML, we can use a much more natural notation that identifies the intersection entity as clearly belonging to the relationship as shown in Figure 5.

 

 

 

Figure 5: Many-to-Many Relationships

In UML, the intersection entity is called an "Association Class." According to UML theory, this Association Class is a formal intersection whose unique identifier is exactly one object from each of the subordinate classes. In this case, it would provide an unreasonable restriction that an employee can only be associated with a particular company, at most, one time. This structure does not yet exist in Oracle’s Object Database Designer. I hope that when this is implemented, they will not adhere to standards requiring such a restriction.
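The restriction can be illustrated with a small sketch. The function add_employment and the start_year attribute are assumptions made for this example. With the strict key (one object from each class), a second association between the same employee and company is rejected; widening the key with a start year allows it.

```python
def add_employment(store, employee, company, start_year, key_includes_start=False):
    """Record an employee/company association in the dict `store`.

    With key_includes_start=False the key is exactly one object from each
    class, which is the strict Association Class rule described above.
    """
    if key_includes_start:
        key = (employee, company, start_year)
    else:
        key = (employee, company)
    if key in store:
        raise ValueError(f"duplicate association for {key}")
    store[key] = {"employee": employee, "company": company, "start_year": start_year}
    return key


strict = {}
add_employment(strict, "Ann", "Acme", 1995)
# A second stint at the same company cannot be recorded under the strict rule:
# add_employment(strict, "Ann", "Acme", 1998) raises ValueError.

relaxed = {}
add_employment(relaxed, "Ann", "Acme", 1995, key_includes_start=True)
add_employment(relaxed, "Ann", "Acme", 1998, key_includes_start=True)
```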

 

Multiple Classification

A particular object class may exist which consists of two or more other object classes. For example, a person can be both a customer and an employee. In ERDs, this is referred to as "non-exclusive subtypes." The standard PO construct of classification is traditionally handled with either a single entity with all attributes from all types included and filled in where appropriate or with separate entities for each structure using optional mandatory 1-1 relationships as shown in Figure 6:

 

 

Figure 6: Multiple Classification

The UML notation for this is somewhat cleaner and clearly shows the classification relationship between the object classes.

 

 Disadvantages of UML Diagramming

As mentioned earlier, UML is not a panacea. It does not solve all of the problems associated with ER modeling but does make our task somewhat easier. There are some disadvantages to UML:

 

  1. There is no concept of an arc. Consequently, if you have "OR" relationships, in order to represent those in UML, you have to create an aggregating structure. For example, if you wanted to say that a contract can be with either a person or a company, you need to create an artificial object class called "a contracting entity" and relate that to the contract as shown in Figure 7:


     Figure 7: UML "OR" Relationship

     

  2. We are still missing a way of denoting time-related entities. These are an important type of Object Class, just as they are an important type of Entity requiring special treatment. It would be useful to be able to visually identify time-related object classes. This problem in ERDs has still not been solved in UML.
  3. One of my ongoing complaints with respect to ER modeling had been an inability to represent two relationships going to the same entity sharing information. For example, in Figure 8, a contract is initiated either with a company or through a particular agent’s affiliation with a company. If you represent your model this way in Designer/2000, you will bring the company ID into the Contract table twice. In order to get around this problem for your physical data model, you either need to physically modify tables in the left structure. Or, in order to generate cleanly from
  4. Reference tables are still a problem. It is possible to use dynamic classification; however, I would prefer an ability to somehow flag an Object Class as a "Reference Object Class." These can be handled fairly well in UML. However, Oracle’s Object Database Designer does not support this. "Dynamic Classification" refers to the fact that the number of referencing values is indeterminate. The UML notation for this is shown in Figure 9. One useful feature of this notation is that not only is the name of the reference table visible, but also a few sample values. This information is extremely helpful.

 

Figure 9: UML Notation for Reference Tables
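Returning to the first disadvantage in the list above, the artificial "contracting entity" class can be sketched as an abstract parent with Person and Company as its children. The attribute names and the display_name method are hypothetical and serve only to make the sketch runnable.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ContractingEntity(ABC):
    """Artificial aggregating class standing in for the ERD arc."""

    @abstractmethod
    def display_name(self) -> str:
        ...


@dataclass
class Person(ContractingEntity):
    first_name: str
    last_name: str

    def display_name(self) -> str:
        return f"{self.first_name} {self.last_name}"


@dataclass
class Company(ContractingEntity):
    legal_name: str

    def display_name(self) -> str:
        return self.legal_name


@dataclass
class Contract:
    number: int
    party: ContractingEntity  # a person OR a company, never both


contracts = [
    Contract(1, Person("Ann", "Lee")),
    Contract(2, Company("Acme")),
]
```

The cost is exactly the one noted above: ContractingEntity does not correspond to anything the business talks about, and it exists only because the notation has no arc.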

 

Physical Implementation of Relationships

In general, objects (entities) translate into tables in the database. In a relational environment, this means directly translating the entity (object) to a table that is a flat file. Each column in the table corresponds to one of the entity’s attributes. Each row corresponds to a specific instantiation of the entity.

The idea is to link the records from one table back to another. In Oracle7, the way this is done is through foreign key relationships. In Oracle8, it’s not so simple. The implementation of new UML structures in Oracle8 is incomplete. As a result, even though these structures are available, limitations in the current implementation may cause us to question whether to use them. There will be other Oracle8 papers at this conference so it is not the intention of this paper to discuss the strengths and weaknesses of Oracle8 objects. However, we will briefly discuss two different kinds of new structures:

  1. Nested Tables: In nested tables, for all intents and purposes, the datatype is a table itself. For example, in a PO table, you can have a column called "PO Detail," which is itself a table.

    The disadvantage to such a structure is that the only access to a PO detail is through the PO table making inventory usage reports more complex.

  2. Array of References: In Oracle8, rather than using foreign keys, you can use references similar to CODASYL pointers. Because you can now use Arrays as datatypes, you can store arrays of references.

This allows you to store parent/child relationships as an array of references in the parent table. This construct preserves the detail table independently.
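The access-path difference between the two structures can be sketched with plain Python containers. This is an analogy only, not Oracle8 syntax, and the item names and quantities are invented for the illustration.

```python
# Nested-table style: the details live inside each PO row.
nested_pos = [
    {"po": 1, "details": [{"item": "bolt", "qty": 100}, {"item": "nut", "qty": 50}]},
    {"po": 2, "details": [{"item": "bolt", "qty": 25}]},
]


def usage_nested(pos, item):
    """An inventory usage report has to walk every PO to reach its details."""
    return sum(d["qty"] for po in pos for d in po["details"] if d["item"] == item)


# Array-of-references style: details sit in their own table and each PO
# holds a list of references (here, keys) to its detail rows.
detail_table = {
    10: {"item": "bolt", "qty": 100},
    11: {"item": "nut", "qty": 50},
    12: {"item": "bolt", "qty": 25},
}
ref_pos = [
    {"po": 1, "detail_refs": [10, 11]},
    {"po": 2, "detail_refs": [12]},
]


def usage_refs(details, item):
    """The same report can read the detail table directly, ignoring the POs."""
    return sum(d["qty"] for d in details.values() if d["item"] == item)
```

Both functions return the same total; the difference is that the second one never touches the parent structure.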

 

 

 

Inheritance

One of the most commonly discussed central features of object-oriented development is inheritance. If an object class (similar to a table) has associated functions or procedures (Methods) and is a generalization of another object class, those Methods are inherited by the child object class. For example, in the classic entity relationship subtype example of salaried and hourly employees, salaried employees would inherit the Methods of employees. Inheritance is not yet implemented in Oracle8, which causes one to question what the advantages of using object-oriented design might be.
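Since Oracle8 cannot show this yet, a short Python sketch illustrates what inheritance of Methods means for the salaried and hourly employee example. The attribute names and the pay calculations are assumptions added for illustration.

```python
class Employee:
    """Parent object class; its Methods are available to every subtype."""

    def __init__(self, name):
        self.name = name
        self.department = None

    def assign_to_dept(self, department):
        self.department = department


class SalariedEmployee(Employee):
    def __init__(self, name, annual_salary):
        super().__init__(name)
        self.annual_salary = annual_salary

    def monthly_pay(self):
        return self.annual_salary / 12


class HourlyEmployee(Employee):
    def __init__(self, name, hourly_rate):
        super().__init__(name)
        self.hourly_rate = hourly_rate

    def pay_for(self, hours):
        return self.hourly_rate * hours


lee = SalariedEmployee("Lee", 60000)
lee.assign_to_dept("Finance")  # inherited from Employee, not redefined
```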

 

Advantages of Using an Object-Oriented Approach

I think that, with Oracle8, we are still in roughly the same environment conceptually as we were with Oracle6. With Oracle6, we could declare referential integrity constraints but the database didn’t enforce them. Similarly, we have some object-oriented structures in Oracle8 but they are not yet fully implemented.

This does not mean that we shouldn’t start doing object oriented design. There are some places where we can take advantage of the new Oracle8 structures. Even if we choose to implement our object-oriented designs using version 7.3 constructs, our designs will be better if we shift our thinking in an object-oriented direction.

I started shifting my logical database design towards object-oriented structures about a year ago. Since that time, I have seen my database designs improve greatly because of the object-oriented approach. Specifically, the biggest advantage I’ve noticed is that the number of entities (and ultimately tables) in my models has greatly decreased. The structures are much more robust and flexible so that, as the inevitable new user requirement pops up, the probability that I will have to make a significant change to the data model is greatly reduced. Two recent examples have made this advantage very clear to me.

In designing a retail system, there was a large number of small sets of structures that all looked similar from a structural standpoint, namely PO Requests, POs, Invoices, Receipts of Goods and Inter-company Transfers. Applying object-oriented thinking allowed me to recognize that, in all of these cases, we were talking about movement of merchandise. As a result, instead of having separate structures for all of these constructs, we created a single structure called "Merchandise Movement" that greatly simplified the model, in addition to making it more robust. Late in the design process, the users came up with a new requirement to support Purchase Returns back to vendors. Because of the object-oriented approach, we were able to easily accommodate this requirement without major changes to the data model.
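The following sketch shows the general shape of such a generic structure. It is not the actual project model: the attribute names (source, destination, item, quantity) are assumptions, and only the movement types named above are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum


class MovementType(Enum):
    PO_REQUEST = "PO Request"
    PURCHASE_ORDER = "Purchase Order"
    INVOICE = "Invoice"
    RECEIPT_OF_GOODS = "Receipt of Goods"
    INTERCOMPANY_TRANSFER = "Inter-company Transfer"
    # The late requirement is one more value, not a new set of structures.
    PURCHASE_RETURN = "Purchase Return"


@dataclass
class MerchandiseMovement:
    """One generic structure in place of a separate structure per document type."""
    movement_type: MovementType
    source: str
    destination: str
    item: str
    quantity: int


movements = [
    MerchandiseMovement(MovementType.PURCHASE_ORDER, "Vendor A", "Store 1", "widget", 100),
    MerchandiseMovement(MovementType.PURCHASE_RETURN, "Store 1", "Vendor A", "widget", 5),
]
```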

In another situation at a large financial organization, we were faced with different aggregation paths for stock portfolios. The existing legacy system required two sets of structures, one for the main aggregations and a separate one for the ad hoc rollup structure. This required a great deal of manual intervention and maintenance. Because of the two separate structures, the original database had 81 entities. After redesign using object-oriented structures, the total number of entities was reduced to 27. This not only created a smaller, easier to maintain system, but also included more flexibility and capabilities than the original system.

 

In both of these cases, object-oriented design resulted in cleaner data models, increased system flexibility and easier maintenance. However, because we are still waiting for fully functional UML products to hit the market, both systems were created using Designer/2000.

UML will not magically make you design object-oriented databases any more than using ERDs made you develop 3NF data structures. It will be just as easy to build bad databases using UML as it has been to build bad databases using ERDs. In addition, because UML products such as Oracle’s Object Database Designer (the UML portion of Designer/2000) will generate some Oracle8 structures automatically, there may be a greater likelihood of building databases that are completely unusable than was the case with the more limited 7.3 structures.

 

Oracle’s Object Database Designer

Oracle’s Object Database Designer product (ODD) is now in beta release. ODD uses a partial implementation of UML to help design object-relational databases. It will be interfaced with the Designer/2000 product to assist designers in building Oracle8 DDL. Unfortunately, a number of key structures are missing, including multiple classification, intersection object classes and dynamic classification for reference tables. Nevertheless, the product is a good first effort and could be used to design object-relational databases.

 

Conclusion

Though still somewhat flawed, UML represents a significant step forward in modeling notation. Not only are we able to represent more data-related business rules using UML, it also encourages us to think in a more object-oriented fashion with respect to our designs. Unfortunately, Oracle’s Object Database Designer is not a full implementation of UML. Nevertheless, it is a significant improvement over ER modeling and should be used.

 

ERDs are dead. There is absolutely no reason that people should still use a design method that, in almost every way, has been improved upon by UML.

©1998 Dulcian, Inc.