Dr. Paul Dorsey, Dulcian, Inc.
Many people question whether any part of the Unified Modeling Language (UML) can be used for data modeling. Some have suggested creating a new tool to explicitly support data modeling. However, with some extensions, the UML can be used very effectively to design databases.
With the advent of increasingly complex systems, a clear and concise way of representing them visually became increasingly important. The Unified Modeling Language (UML) was developed by Grady Booch, Jim Rumbaugh, and Ivar Jacobson as a response to that need. In order to create a single system for modeling and documenting information systems and business processes, UML was created with an underlying object-oriented analysis and design philosophy. To build successful systems, a sound model is essential. It communicates the overall system plan to the entire development team. As stated in the UML Summary Document (UML Summary, version 1.1, 1 September 1997, Rational Software et al.), the primary goals in designing UML were the following:
· Provide extensibility and specialization mechanisms to extend the core concepts.
· Be independent of particular programming languages and development processes.
· Provide a formal basis for understanding the modeling language.
· Encourage the growth of the object-oriented tools market.
· Support higher-level development concepts such as collaboration, frameworks, patterns, and components.
· Integrate best practices.
For the last several years, I have been investigating the use of UML class diagrams to design databases and I now use this mechanism almost exclusively to design data models. Much to my surprise, the UML has proven itself to be a superior tool to ERDs. UML is now the standard environment for object-oriented design and development. The most commonly recognized parts of the UML modeling environment are class diagrams, which somewhat resemble ERDs. There is some debate within both the object-oriented and relational communities concerning the applicability of UML class diagrams for representing structural data business rules such as those traditionally articulated in ERDs.
Six years ago, I attempted to resolve this issue in conjunction with the writing of Oracle8 Design Using UML Object Modeling (Dorsey & Hudicka, Oracle Press, 1999). In the course of writing that book, I concluded that UML class diagrams had not been originally intended for designing data models but were suited to the task and, in some cases, superior to ERDs. UML class diagrams with the appropriate extensions represent a significant step forward in data modeling. Structural business rules can be represented more easily and completely using an extended UML syntax than was ever possible with ERDs. This paper will show the important extensions to UML and demonstrate the advantages of using UML for creating data models that represent the structural business rules of any system.
For all of the reasons just stated, UML should be the
language of choice for building object-oriented systems.
I. Overview of UML Class Diagrams
The first version of UML included the following diagram types:
1. Class diagram: This is the data modeling diagramming language. It is similar in scope to ER modeling.
2. Object diagram: This is a class diagram for only one set of objects. Think of it as a data model where you show example data rather than the whole data model. This is very useful for explaining complex diagrams.
3. Use case diagram: A use case is similar to the idea of a “function” in Oracle’s CASE method. A use case diagram shows the interaction among actors (for example, customers and employees) and use cases. There is no analogue to this diagram in Oracle’s methodology.
4. Sequence diagram: A sequence diagram shows an interaction of objects arranged in a time sequence. This is similar to the process flow diagrammer in Oracle Designer.
5. Collaboration diagram (also called communication diagrams): A collaboration diagram shows the objects and messages that are passed between those objects in order to perform some function. There is no analogue to this diagram in Oracle’s methodology.
6. Statechart diagram: Statechart diagrams are standard state transition diagrams. They show what states an object can be in and what causes the object to change states. There is no analogue to this diagram in Oracle’s methodology.
7. Activity diagram: An activity diagram is a type of flowchart. It represents operation and decision points. This is similar to the data flowchart in Oracle Designer.
8. Component diagram: A component diagram shows dependencies and organization among components.
9. Deployment diagram (also called implementation diagrams): A deployment diagram includes the run-time processing node configuration.
UML 2.0 added four new types:
10. Interaction overview diagrams: Variation of activity diagrams that includes an overview of the system process flows.
11. Package diagrams: Subset of class diagrams used to organize elements into related groups.
12. Composite structure diagrams: These diagrams show the internal structure of items such as classes or use cases, including their interaction with other parts of the system.
13. Timing diagrams: These diagrams are used to show changes in states over time.
The three diagrams most commonly used in data modeling are Use Case, Class, and Activity diagrams. In JDeveloper, support is provided only for these three diagram types.
A. Classes
B. Attributes
Attributes are given very little attention in the UML. Most UML books barely mention them. This is unfortunate. Attributes in an OO design are a much richer construct than in ERDs. Not only do you have the normal attributes familiar to data modelers, you also have things like derived attributes. However, the biggest complication arises from generalization. Attributes may be inherited from abstract or concrete classes and the class you are in may itself be abstract or concrete. Each type of attribute in each case is handled differently. It is beyond the scope of this paper to fully explore this topic.
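As a small illustration of the difference between ordinary, derived, and inherited attributes, the following Python sketch may help. The Person and Student classes and all of their attributes are hypothetical names chosen for this example; they do not come from any model discussed in this paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    """Hypothetical class with ordinary (stored) attributes."""
    first_name: str
    last_name: str
    birth_date: date

    @property
    def full_name(self) -> str:
        # Derived attribute: computed from stored attributes, never stored itself.
        return f"{self.first_name} {self.last_name}"

    def age_on(self, as_of: date) -> int:
        # Derived value that also depends on an outside input (the reference date).
        before_birthday = (as_of.month, as_of.day) < (
            self.birth_date.month,
            self.birth_date.day,
        )
        return as_of.year - self.birth_date.year - int(before_birthday)


@dataclass
class Student(Person):
    """Subclass: inherits the stored and derived attributes and adds its own."""
    major: str = "Undeclared"


s = Student("Ada", "Lovelace", date(1815, 12, 10), major="Mathematics")
```

Here full_name and age_on are available on Student without being redeclared, which is the kind of inherited, derived attribute that a flat list of ERD attributes has no direct way to express.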
B. Associations/Close Associations
An association indicates that one object has a link to another object. It does not tell anything about the nature of that link. Sometimes it is useful to model a tighter association between objects. You might want to say that one object is part of another, or that an object is partially defined by its associations to other objects. Such associations between classes are called close associations to distinguish them from regular associations.
From a modeling perspective, the association between a purchase order and a purchase order detail is quite different from the association between a project and an employee who is acting as project manager. In the purchase order/purchase order detail case, it does not make sense to have a purchase order detail without a purchase order. The detail is part of the parent object. A purchase is, to some extent, defined by its details. The details indicate what was purchased. In addition, details about purchase orders never move from one purchase order to another.
The association between a project and its project manager is quite different. Projects are independent objects. They are of interest to the organization regardless of who manages them. A project is not defined by who the manager is; and managers can easily be replaced on projects. Similarly, project managers are simply employees who, as one of their roles, can act as a manager of a project. An employee can also have other roles. An employee need not even be associated with any project and can manage several projects at once.
The relationship between a purchase order and its purchase order details is an example of a close association, because the child cannot exist without the parent. That is, they are closely related. From an implementation perspective, close associations are interesting because items that are close associations may have different requirements. Objects built from closely associated classes should be retrieved together, so they should be stored in such a way that makes retrieval of those objects efficient. Once a class is closely associated, you may want to prevent the changing of that link or to create, update, and delete closely associated objects together, so having them stored as some kind of grouped object makes sense.
Close Associations
Using the UML, there are two different types of close
associations between object classes:
· Aggregation (also called “weak aggregation”)
· Composition (also called “strong aggregation”)
Composition and aggregation are new concepts for ER
modelers and will require careful explanation.
Aggregation
In the UML aggregation association, objects from one class collectively define the objects in the aggregation class. Class A is said to be an aggregation of class B if an object in class A is defined as a collection of objects from class B. Objects from class B need not be attached to any object from class A. The classic example of this kind of association is the one between a committee and a person, in that a committee is made up of the people on the committee. A committee can be defined as a collection of people.
Aggregation does not correspond to any concept in entity relationship modeling. This is a new concept of a relationship that is much weaker than the dependency relationship. Aggregation means that the two classes are more strongly related than a simple association, but they can still exist independently.
In the relational dependent relationship, the child object cannot be thought of outside the context of its parent. In aggregation, the parent usually cannot be thought of outside the context of its children. The aggregation is represented by an unfilled diamond in UML. Some classic examples of aggregation relationships in UML are shown here:
The other aspect of an aggregation association is that the
details (Person and Team in these examples) have relevance outside of the
context of their masters (Committee and League).
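The committee example can be sketched in a few lines of Python. This is only an illustration of the aggregation idea: the class and variable names are hypothetical, and allowing one person to sit on several committees is an assumption of the sketch.

```python
class Person:
    def __init__(self, name):
        self.name = name


class Committee:
    """Aggregate: defined by its members, but it does not own them."""

    def __init__(self, name):
        self.name = name
        self.members = []

    def add(self, person):
        self.members.append(person)


alice = Person("Alice")
bob = Person("Bob")        # belongs to no committee and is still a valid object

budget = Committee("Budget")
hiring = Committee("Hiring")
budget.add(alice)
hiring.add(alice)          # the same person may be part of several aggregates

del budget                 # discarding a committee does not discard its members
```

After the Budget committee is discarded, alice is still a member of the Hiring committee and bob still exists, which is the independence of details from masters described above.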
Sometimes, aggregation is used because of a unique workflow. One system encountered by one of the authors included some government contract change requests. These requests came in individually over an extended period and were eventually bundled together into a contract modification, as shown in Figure 1.
Figure 1: Aggregation example in ERD and UML formats
If you show an aggregation in your class model, what does this mean for the generated structure? The child object is closely associated with the parent object, but may also exist independently. The only impact is that if any part of the object is being modified, it should lock all related records in the aggregation.
If you are generating your user interface, aggregation
could also be used to indicate that the child objects can optionally be viewed
as parts of the parent object.
Composition
Composition (also called “strong aggregation”) is similar
to aggregation. Class A is said to be a composition of class B if each
object of class B is a part of an object of class A. Objects of class B may not
exist unless they are part of a specific object from class A. Class B objects
may not exist independently. An object from class B may not be a composition
child of more than one object at a time, whether it is from class A or another
class. In an aggregation association, the master is composed of its details, but
the details can be independent of the master. In a composition association, the
master is still composed of the details, but these details cannot be thought of
outside the context of the master. The dependency examples of
The formal rule in the UML is that the detail can exist independently of the master until it is attached to a master. However, from that point forward, the detail must always be associated with some master. The distinction suggested here between aggregation and composition is a more restrictive condition than required by the formal UML syntax, but is more logically clean and consistent with the way in which databases interact with these constructs.
Figure 3: Simple composition association
The definition of composition is similar to the dependency relationship in ERDs, but a bit more restrictive than ERD dependency.
Composition is used only to indicate that objects in the detail object class always belong to one and only one master and have no independent meaning apart from that master. Therefore, the PO/PO Detail association is a good example of composition.
Composition in UML is slightly more restrictive than dependency in an ERD. For example, in an ERD you might want to say that a Course at a university is dependent upon the Department where it is offered. Furthermore, specific Offerings of this course are dependent upon the Course, as shown in Figure 4. However, this would not be a composition in the UML.
Figure 4: Composition (UML) Dependency (ERD) comparison
Notice how the UML in Figure 4 uses simple association. Actually, using composition in this case would not violate the composition definition in UML, nor would it violate our more restrictive definition. However, in practice, object-oriented designers only use composition when the composition detail objects are created and destroyed at the same time as the parent object. Because Courses and Course Offerings are created completely independently from their parents, composition should not be used in this situation. Thus, from an implementation perspective, you might want to use composition even though UML tradition would argue against it.
The implementation of composition is similar to aggregation. Modification of any one of the related objects should lock the whole group.
If you are generating your user interface, composition
could also be used to indicate that the child objects can only be viewed or
modified as parts of the parent object.
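One possible relational rendering of the PO/PO Detail composition is sketched below using Python's built-in sqlite3 module. The table and column names are hypothetical, and the mapping (a mandatory foreign key with cascading delete) is an assumption of this sketch, not the output of any tool discussed in this paper. Group locking is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE purchase_order (
        po_id  INTEGER PRIMARY KEY,
        vendor TEXT NOT NULL
    );
    CREATE TABLE po_detail (
        -- NOT NULL: a detail must always belong to exactly one purchase order
        po_id   INTEGER NOT NULL
                REFERENCES purchase_order(po_id) ON DELETE CASCADE,
        line_no INTEGER NOT NULL,
        item    TEXT NOT NULL,
        PRIMARY KEY (po_id, line_no)
    );
""")

conn.execute("INSERT INTO purchase_order VALUES (1, 'Acme')")
conn.executemany(
    "INSERT INTO po_detail VALUES (?, ?, ?)",
    [(1, 1, "Bolts"), (1, 2, "Nuts")],
)

# A detail that names a nonexistent master is rejected.
try:
    conn.execute("INSERT INTO po_detail VALUES (99, 1, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the master removes its details with it.
conn.execute("DELETE FROM purchase_order WHERE po_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM po_detail").fetchone()[0]
```

The mandatory foreign key captures "cannot exist without a master", and the cascading delete captures "destroyed together with the parent".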
C. XOR
In ERDs, you can specify that a particular instance of an entity can be associated with either an instance of one entity or another but not both. This is shown by a line that connects the two relationships. Of course, the same construct exists in UML. However, this structure is used far less frequently. The stronger generalization model in UML means that modelers will usually create an abstract generalization class attached to the association, thus eliminating the need for the XOR relationship.
In UML, you need not restrict yourself to XOR as the only
relationship among associations.
Other interactions among associations are possible, but are unusual and
beyond the scope of this paper.
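If an XOR between two associations does have to be carried into a relational schema, one common approach is a check constraint over the two foreign keys. The sketch below uses Python's sqlite3 module; the Account, Person, and Corporation tables are hypothetical, and treating the XOR as "exactly one" rather than "at most one" is an assumption of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person      (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE corporation (corp_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        person_id  INTEGER REFERENCES person(person_id),
        corp_id    INTEGER REFERENCES corporation(corp_id),
        -- XOR: exactly one of the two owner links must be present
        CHECK ((person_id IS NULL) <> (corp_id IS NULL))
    );
    INSERT INTO person      VALUES (1, 'Pat Lee');
    INSERT INTO corporation VALUES (1, 'Acme');
""")


def try_insert(row):
    """Return True if the account row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


owned_by_person = try_insert((1, 1, None))
owned_by_corp = try_insert((2, None, 1))
owned_by_both = try_insert((3, 1, 1))
owned_by_none = try_insert((4, None, None))
```

Only the first two rows are accepted. With an abstract generalization class standing in for both owner types, the account would need only a single foreign key and the check constraint would disappear.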
D. Generalization
The generalization association is a concept similar to that of a supertype/subtype relationship in an ERD. For example, an Employee can be either hourly or salaried as shown in this ERD:
In UML, the same concept can be represented as shown here:
This UML diagram indicates that you have an Employee class. The {abstract} constraint indicates that the class cannot have any independent objects. If the abstract constraint were omitted, this would indicate that it is possible to have an employee who is neither hourly nor salaried. Since the employee class cannot have any objects, what is its purpose? If there are attributes defined for the Employee class, they are inherited by the Hourly and Salaried classes. For example, a First Name and Last Name attribute defined for the Employee class would automatically be inherited by the Hourly and Salaried classes.
Associations to the Employee class are also inherited. An association between Employee and Department as shown in diagram A of Figure 5 also means that the salaried and hourly classes inherit the association to the department class just as if it had been drawn as shown in diagram B of Figure 5.
Figure 5: Association Inheritance
Methods are also inherited. For example, defined methods such as “Hire,” “Fire,” or “Give Raise” in the Employee class would automatically be inherited by the Hourly and Salaried classes.
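The inheritance of attributes, associations, and methods from an abstract Employee class can be sketched in Python as follows. The class names follow the example in the text; the constructor arguments, the give_raise signature, and the rate and salary attributes are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Department:
    def __init__(self, name):
        self.name = name


class Employee(ABC):
    """Abstract class: it cannot have objects of its own."""

    def __init__(self, first_name, last_name, department):
        self.first_name = first_name   # attributes declared once, inherited below
        self.last_name = last_name
        self.department = department   # association declared once, on Employee

    def give_raise(self, percent):
        # Inherited method; each subclass decides what a raise actually changes.
        self._apply_raise(1 + percent / 100)

    @abstractmethod
    def _apply_raise(self, factor):
        ...


class Hourly(Employee):
    def __init__(self, first_name, last_name, department, hourly_rate):
        super().__init__(first_name, last_name, department)
        self.hourly_rate = hourly_rate

    def _apply_raise(self, factor):
        self.hourly_rate *= factor


class Salaried(Employee):
    def __init__(self, first_name, last_name, department, annual_salary):
        super().__init__(first_name, last_name, department)
        self.annual_salary = annual_salary

    def _apply_raise(self, factor):
        self.annual_salary *= factor
```

Attempting to create a plain Employee raises a TypeError, while Hourly and Salaried objects carry the name attributes, the link to a Department, and the give_raise method without redeclaring any of them.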
II. Translation of Class Diagrams to a Relational Database
Translation from a class diagram to a relational database is not obvious. Of course, classes more or less map to tables and attributes map to columns. But the situation is more complex.
Support for generalization is particularly problematic. The traditional approach is to generate a table for each class. Either the inherited attributes are repeated in each subclass table, resulting in denormalized tables, or they are kept only in the parent table, requiring a multi-table join to assemble a complete object. Neither of these situations is viable. This leaves the modeler with two alternatives:
1. Don’t use generalization.
2. Only use generalization for analysis and remove it for the implementation model.
Derived attributes pose a similar problem. If designers use them, you end up with redundant columns in the database, usually resulting in 3NF violations. If a generator is going to translate classes into tables and attributes into columns, then either the modeler must avoid many class diagram elements or the resultant database will not be in 3NF, will not be usable, or both.
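The multi-table join that results from a table-per-class translation can be seen in a short sketch using Python's sqlite3 module. The table names, columns, and sample row are hypothetical and show only the normalized variant, in which inherited attributes stay in the parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per class; inherited attributes live only in the parent table.
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE hourly (
        emp_id      INTEGER PRIMARY KEY REFERENCES employee(emp_id),
        hourly_rate REAL NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Pat', 'Lee');
    INSERT INTO hourly   VALUES (1, 18.5);
""")

# Reading one complete Hourly object requires a join across both tables.
row = conn.execute(
    """
    SELECT e.first_name, e.last_name, h.hourly_rate
    FROM hourly h JOIN employee e ON e.emp_id = h.emp_id
    WHERE h.emp_id = ?
    """,
    (1,),
).fetchone()
```

Every additional level of generalization adds another table to this join, which is why a direct class-to-table translation becomes awkward for application developers.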
The alternative to direct translation of classes to tables is to generate both the table and an interface object (a view,
Several products that use UML for data modeling are worth discussing.
A. Oracle’s JDeveloper 10g
Oracle’s JDeveloper 10g product has two mechanisms for generating tables from class diagrams. The first is from the Entity Object Modeler in the Application Development Framework (ADF) Business Components (BC) portion of the product. Oracle has architected its own middle tier component originally called Business Components for Java (BC4J) and now marketed as the business component portion of its Application Development Framework.
The normal usage of the business component framework is to start out with a fully-formed relational database and build middle tier components, generating most of the structure from the database. Developers can then modify this structure, adding significant business logic in the middle tier.
To accommodate requests from users wanting to model within the same tool, Oracle added the capability of first building the middle tier components and then using these components directly to generate the database. The problem with this approach is that the mapping from business component elements to the database is relatively simple-minded:
· Business component entity objects are directly translated to relational tables.
· Entities become tables.
· Attributes become columns.
Using this approach, foreign key attributes must be manually specified prior to generation. During generation, tables are dropped and recreated so that even the simple addition of an attribute cannot be done if there is already data in the tables.
The second JDeveloper
mechanism for data modeling is an explicit class diagram where classes can be
stereotyped as tables. This is a relatively straightforward database modeler
where users define tables, columns, foreign keys, check constraints, etc. as in
any other modeling tool. There is no notion of generalization or derived
attributes included.
In both of the JDeveloper modeling mechanisms, the metadata repository is stored in XML files where the information is readable and, to some extent, editable. However, users should stick to the IDE or underlying APIs to interact with these XML files.
Oracle’s support for generating the database from JDeveloper 10g does not approach the vision of this paper where a designer might create a logical class diagram and have the product generate the appropriate database.
Pros: Tight integration with database, part of JDeveloper
Developers want to be able to create models within JDeveloper. JDeveloper 10g introduces a physical database modeler that allows users to specify tables, columns, and foreign key relationships using UML class diagrams.
JDeveloper is second to none as a Java IDE. As data modeling evolves, it will be a valuable part of the tool.
OO-centric designers who are satisfied with the generation algorithm will be happy with the product.
Cons: Somewhat limited in scope
Full UML-based modeling including inheritance, aggregation, and composition capabilities would be a welcome addition to JDeveloper at some point. The 10g release includes the beginnings of a solid data modeling tool using a limited subset of the UML.
The Database Modeler cannot be used yet for complete logical and physical database design. Oracle Designer should still be used for that purpose. JDeveloper users without access to a full-featured database design tool such as Oracle Designer may find JDeveloper's modeling capabilities adequate for simpler applications.
B. Rational Rose
Paradoxically,
The repository for Rational Rose is in its own proprietary
document format, which is accessible and editable but making changes is not very
easy. Where this tool excels is in the development of the software management
process. Many resources have been devoted to the software development
architecture, but resources devoted to database design are lacking. It would be
useful to find out how the Rational team views database design from reading
white papers on the
In both Rational Rose and JDeveloper, database design seems almost an afterthought, perhaps reflecting the relatively minor role played by database design in many OO-centric development teams.
Rational Rose Data Modeler is a visual modeling tool that
makes it possible for database designers, analysts, architects, developers and
anyone else on your development team to work together, capturing and sharing
business requirements, and tracking them as they change throughout the process.
It provides the realization of the ER methodology using UML notation to bring
database designers together with the software development team. With UML, the
database designer can capture information such as constraints, triggers and
indexes directly on the diagram rather than representing them with hidden
properties behind the scenes. Rational Rose Data Modeler gives you the freedom
to transfer between object and data models and take advantage of basic
transformation types such as many-to-many relationships. This tool provides an
intuitive way to visualize the architecture of the database and how it ties into
the application.
Pros: Industry-standard, Java-friendly
It is the best tool for generating tables that look like
classes.
Cons: Odd generation algorithm
For all of its strengths, either you will end up with a poor database design or you will not use much of the richness of class diagrams. Classes and attributes will get directly translated into tables and columns.
C. Dulcian’s BRIM®[1]
Dulcian, Inc.’s offering for data modeling using UML employs a business rules approach to fully generate systems using an “executable UML” approach. BRIM works exclusively in the Oracle environment and is not portable to other database structures.
Within BRIM, a class
diagram is specified including inheritance, derived attributes, etc. Views and
relational database tables to support the class diagrams are simultaneously
generated. Using this approach enables BRIM to include a much richer
specification within the class diagram than is available in other tools. Derived
attributes and generalization are also explicitly supported. The BRIM
repository is stored in an Oracle database which can be queried or updated
through APIs.
One disadvantage of the additional functionality that BRIM provides is some loss of developer control and flexibility regarding how things are generated. By comparison, JDeveloper and Rational Rose offer less functionality than BRIM, but they do not force a specific generation algorithm as BRIM does.
BRIM chose to use object IDs (OIDs) as the physical primary key for all tables. It still stores and enforces the logical primary key, but uses OIDs to keep the implementation simpler.
Generalization sometimes causes redundant columns to be
generated in underlying tables. The
views that interface with the tables keep the data from getting out of
synch.
Pros: Repository-based, rich functionality
BRIM generates both tables and views, so translating from a class diagram to a database yields both a good database design to make DBAs happy and a set of structures to make an OO development team happy.
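The general idea of pairing normalized tables with an interface view can be sketched with Python's sqlite3 module. This is not BRIM's actual generated output: every table, column, and view name below is hypothetical, and the sketch only illustrates a surrogate object ID as physical key, an enforced logical key, and a view that exposes inherited and derived attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        oid        INTEGER PRIMARY KEY,      -- surrogate physical key
        emp_no     TEXT NOT NULL UNIQUE,     -- logical key, still enforced
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE salaried (
        oid           INTEGER PRIMARY KEY REFERENCES employee(oid),
        annual_salary REAL NOT NULL
    );
    -- Interface view: presents the Salaried class with its inherited
    -- attributes and with derived attributes that are never stored.
    CREATE VIEW salaried_v AS
        SELECT e.oid, e.emp_no, e.first_name, e.last_name,
               e.first_name || ' ' || e.last_name AS full_name,
               s.annual_salary,
               s.annual_salary / 12.0 AS monthly_salary
        FROM salaried s JOIN employee e ON e.oid = s.oid;
    INSERT INTO employee VALUES (1, 'E-100', 'Sam', 'Ortiz');
    INSERT INTO salaried VALUES (1, 60000);
""")

row = conn.execute(
    "SELECT full_name, monthly_salary FROM salaried_v WHERE emp_no = ?",
    ("E-100",),
).fetchone()
```

The tables stay normalized for the DBA, while application code reads the class-shaped view and never sees the join or recomputes the derived values.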
Once the business rules have been placed in the BRIM
repository, the system is generated. This is one of the real strengths of the
BRIM environment. It is not necessary to wait for the system to be complete
before generating a first version. BRIM developers should get into the habit of
generating a system as soon as enough of the system has been entered to test.
Additional system pieces can be quickly generated, supporting a
The BRIM repository is a set of Oracle tables. Population of the repository need not be done exclusively through the Repository Manager. A complete set of APIs exists (some used by the Repository Manager), any or all of which can be used to manipulate the repository.
Cons: Oracle only; highly proprietary solution
BRIM only works in the Oracle environment and takes a very
strong stand on the “right” way to generate a database (and indeed the whole
system). If you buy into the
philosophy of the product, you will be very happy, but if you are looking for
many options in the generation algorithms, BRIM will not support these
options.
Conclusions
UML class diagrams are a great way to do data modeling.
Unfortunately the tools to support this approach are still evolving. OO centric
tools (like
About the Author
Dr.