Data Modeling

ID#

Question

Answer

1

Is anyone aware of a recommendation regarding the use of surrogate keys within an Oracle implementation ?

Does using the natural key aid developer understanding of the data model (that is, what type of data a table holds)? 

 

There are technical pros and cons to both sides. The biggest problem with using surrogate keys is that it is much easier to make a bad data model by using surrogate keys. Data Models we have encountered that use surrogate keys usually have more normalization errors. 

Consider many of the object-oriented models that most  people generate. The unclear thinking could be helped by considering what the logical primary key should be.

Using a mixture may be the best solution. Use surrogate keys whenever there is a recursion. Allow the primary key to propagate to the child table in dependant relationships (UID bars for you Designer folks). We used to use codes as primary keys in reference tables but we have now gone to a generic structure that has replaced all reference tables.   Replace the traditional primary key with a surrogate key in chained dependencies to avoid 7 table primary keys.

If you are going to use surrogate keys, then you should model first using logical keys, then replace them with surrogate keys after the data model audit, much as you would treat a denormalization for performance reasons.

2

What is a "natural key" vs. a "surrogate key?" Is it better to use one or the other?

Natural Key - The logical primary key for the table

Surrogate key - A system generated numeric primary key (PK)

Surrogate keys are sometime called "soft" keys. 

 

There is debate as to which PK philosophy is best.  Some folks want to use only system  generated keys, others want to never use them. Others will use natural keys in some circumstances and surrogate keys in others. 

 

The desire to use one method over the other is based on some quite complex trade offs.

 

Natural keys:

  • are more logical

  • can sometimes can mean fewer joins

  • help to encourage good modeling

  • are traditional

  • make snooping around in the data easier

Surrogate keys:

  • are shorter

  • are easier to join to

  • take less storage

  • enable natural key fields to be easily changed

  • are what Object Oriented (and object relational) databases use

Ultimately there is no perfect solution. Each designer will have to make

his or her own decision.

3

Is there a comprehensive document about database normalization?

From Stephen Edelstein:

 

Normalization is a specific term that does not imply anything but a poor design at a certain point in the design process. Think of it with this adage:

The key, the whole key, and nothing but the key: 

  • In 1NF (first normal form) all of the columns depend on the primary key 

  • In 2NF, all of the columns depend on the whole (i.e. compound) primary key 

  • in 3NF all of the columns depend on nothing but the key, i.e. not on each other

This is relevant at the level of defining the attributes for an entity. Each attribute MUST conform to these 3 forms, or else they

are not normalized. 

From Paul Dorsey:

Normalization is a very specific, very formal definition of relation construction. The real definitions of normalization are almost unreadable by anyone outside of mathematics.

Note that in computer science, the "standard" database text is CJ Date's

which itself does not use formal definitions. If you design

databases for a living, then Stephen's definitions are probably fine. The

next level up in rigor would be any book called "Intro to Database" at your

college bookstore (like I said Date is the standard there). If you want to

get serious about this, there is a book written by Ulman

which uses formal mathematical definitions. 

 

Although it is a common perception that everything bad in a database is a denormalization, this is not necessarily the case.

4

For a field that is going to store True or False, what is the best data type?

1. Char(1) value(y,n)

or

2. number(1) value(1,0)

 

I prefer Char(1). The value is clear from the data. If you use a numeric, it is always a guess as to what values correspond to true and false.

If you need an unknown value, we use 'X'.

 

There are trivial performance/storage gains that can be achieved through

using a numeric value, but I think that what is lost in clarity does not

justify using a numeric value.

Someone suggested using a Boolean datatype, but that will not work in the DB. There is no Boolean datatype in any version of the database that I have access to.

5

  We are going to start modeling/developing a multi-language application. The application must run on the  web from several countries and we have decided to use Designer & Developer 6i. One requirement is to have the ability to select the language at application logon time.  We want to have just one copy of the application so, at runtime we have to be able to programmatically change item labels, titles, messages, etc. to the selected language.  Even our data must be multi-language, too.

 

Has anybody had experience doing something like this? Does anybody know about papers, articles about modeling/developing applications like this?

 

This is a very interesting problem.  It divides itself into 2 problems:

 

1) Application customization (multi lingual labels, help, error messages)

and

 

2) Multi-lingual data

 

For the application customization we built a repository to store application elements and their properties. Then we used the idea of a "label set" to specify the changes to the object properties.  Then you write an interface in your product (we did for forms) to read the repository at startup and modify itself.  Once you do this, you not only have multi-lingual customization but also the ability to have user preferences in your applications.  We built our repository to be product independent so we can have the same functionality when we move to Java or any other environment.

 

For data there are no easy alternatives.  Some people make multi-columns in the tables for each language, others make lots of child tables for each object to store the multi-lingual values.  We embedded the logic to support multi-lingual data in our business rules engine.  The designer specifies an attribute as "multi-lingual" and then the engine does the following:

 

1) When the table is built, each multi lingual attribute gets a column for each language named <attrib_name>_<lang_code> e.g. dname_eng would be the English column for dname in the dept table.

2)Then we generate a view that sits on top of this table that presents the correct language depending on active language for the user.  This way the developers build applications that they do not even know are multi-lingual as the "engine" takes care of the logic.

3) Multi-lingual inserts/modifications are handled via a simple built-in that displays the values for an attribute in a two column layout

LANGUAGE----VALUE.

6

How can I implement a Primary Key for Updateable Key ?

 

In Designer, Delete is the only option that I could apply.

 

If I force updating row from PL/SQL I got an error: ORA-02292: integrity constraint violated - child record found

Must I create a PL/SQL package for this?

 

An updatable PK in a relational DB is a pretty dangerous thing to do. Many people will tell you that it "can't" be done, which is not exactly correct.

 

The problem is that RDBMS's use the primary key value as the record pointer. This means that if you change the PK value you have to change all  of the records pointing to that PK value.  This could mean you would be required to update millions of records. Therefore, it also greatly increases your chances for deadlock.

 

To answer your question... you shouldn't do it.

If you need this functionality, create a synthetic primary key (numeric, system generated) that you will use for the physical primary key. Represent your logical primary key as a unique constraint on the table.

7

 Dave Freyer wrote:

 

We are an engineering organization, and as a longtime and devoted user of Designer and Developer, I have to say that the new 9i IDS is the most complete and integrated package available - that satisfies 90% of our requirements.  The remaining 10% is accomplished through Embarcadero which is used for our UML modeling. 

 

JDeveloper 9i is a great tool, HOWEVER, it falls short in its modeling implementation.  It would be great if Designer or JDeveloper were as enthusiastic about its adoption of UML modeling as Rational or Embarcadero. 

 

With respect to JDeveloper, class and activity diagrams are important, however to get there we do UML off line -- we should be able to create use cases, scenarios, collaboration, sequence, component, deployment, state, package diagrams in addition class and activity -- then we could all really 'rock'.

 

 

 

 

 

I want to make a sort of important point here.  There have been a number of posts mentioning UML in general and UML in JDeveloper in particular.  I'm doing a paper on this at ODTUG so I have been giving it a lot of thought.

 

The UML has 9 parts.  It was explicitly created to support OO programming projects, not relational DB projects.  The obvious hole is how to design your DB.  There is no "data modeling tool" in the UML.  Some vendors (including Dulcian) have used class diagrams to design databases.

 

JDeveloper uses class diagrams for 2 purposes:

  1) to model BC4J entity objects and associations  and

  2) to model Java "things" (classes, interfaces, etc.)

 

JDeveloper does not particularly support data modeling yet, though you can see the beginnings there.

 

The activity modeler in JDeveloper is likewise used to model a specific kind of activity for web application design.

 

In neither case are the UML utilities in JDeveloper particularly suited for high level logical analysis, although they can be used for this purpose.

 

OO people largely use UML for analysis as a way to capture the logical requirements of the system and then use it as a road map to manually build their systems.  As a side effect, I find that when I examine UML class models that look (superficially anyway) like data models, they look "sloppy" to me.  But all a class model "generates" for an OO person is Java class (think PL/SQL procedure) declarations. You get a named Java class for the UML class and the attributes turn into variable declarations. You get to write the rest of the system by hand. If the class model is not exactly "right" from a relational perspective, then you can fix it in the way in which you write your code, in much the same way that you can work around a bad data model by working harder in your application logic.

 

This is not a "business rule" based approach.  There is not enough depth in the UML to support full system specification.  This is the direction we went at Dulcian, and we had to make big, significant extensions to UML to pull it off.

 

The point is that there are 2 major ways to use the UML:

1) As an analysis tool

In this case, you are essentially creating an analysis document.  You are using an OO structure to capture your requirements.  What the specific elements in UML mean exactly is up to you. You must then decide how to implement them.

 

2) As a code generator

Various portions of the UML (usually class and activity diagrams) can be used to actually generate real "stuff."  Different vendors have used the diagrams to generate different kinds of objects.  In this case, you need to completely understand how the vendors are generating "stuff" so that you can design your models correctly. You can't use your intuitive understanding of "composition" to decide when to use it, any more than you could use your intuitive understanding of a "dependent" relationship to know when to choose to use a UID bar in Designer.

 

We always blurred this distinction in Designer.  We drew "logical" ER diagrams, and then generated to a relational DB. But I think it is smart to blur the distinction.  By having an implementation algorithm for how to translate an ERD into a database, we applied a certain rigor to our modeling that the OO people are still searching for.

 

----

JDeveloper will eventually catch up with Rose or Embarcadero for analysis, but I don't think we will get there for a while.  Designer went its whole life without really supporting the early steps in the life cycle design very well.  It was very good at designing the system when you got closer to implementation, but its sweet spot was not supporting strategy or creating a good analysis document.

 

I am much more interested in seeing JDeveloper get really good at data modeling and system generation.  I suspect that we will see great strides in this area in the next year.

 

I do NOT want the JDeveloper team to divert energy to give me a way to enter a use case or scenario.  There are lots of tools on the market that do that well. I would be happy if they left that one alone for a while.

 

So, for your logical OO methodology, use Rose or Embarcadero. When you are ready to build a system in an Oracle environment, use JDeveloper. For a business rules based approach,... call me.

 

8

Can UML be used for data modeling?

 

UML class diagrams do not explicitly support data modeling.  They can be extended to do so. Dulcian's BRIM® product does this.  I am not sure that anyone else credibly plays in that space yet.

 

There is talk about the UML standard being enhanced to explicitly support data modeling.  I am not aware of this being done yet.

 

Oracle JDeveloper will support UML class diagrams for data modeling in the next year or so; but it is not there yet.  You can get an idea of the directions they are thinking by looking at the way in which the product generates tables from BC4J entity objects.

 

UML class diagrams will not be included in Designer.

 

I wrote a paper about UML data modeling for DM Review online.  One link to it is at http://www.datawarehouse.com/article/?articleid=3040

I guess they also put on their data warehouse SIG page.

 

The book I wrote in 1998 Database Design Using UML Object Modeling (Dorsey & Hudicka, Oracle Press) is an in-depth (though now dated) treatment of the topic.

 

9

John Flack wrote:
I have an application system definition where I did a System Design capture and then a Table-to-Entity retrofit.  This system has three intersection tables, each implementing a many-to-many relationship between two tables. The intersection tables have, of course, been retrofitted to three entities, each with relationships to the two entities that they connect.
 
 I now want to rename the relationships, as I consider relationship names to be very important to understanding and validating the design.  I want to be able to say to a user:
 

 Is it true that "Each entity_name must be/may be relationship_name "one and
 only one"/'one or more" other_entity_name"?
 

And I want to be able to do this in both directions as in:

·       Each "person" must be "located at" one or more "addresses."

·       Each "address" must be "the location of" one and only one "person".
 

But in this case, the entity in between the two other entities represents the relationship between them - in other words it isn't a real entity. Suppose an "address" can actually belong to more than one "person", and we have an entity called "person address" to implement this, because we have an attribute, "address type" to tell if this is a business or home address. The same address might be Mary's home, but John's office and his home. 

 

What should I call the relationships between "person" and "person address" and between "address" and "person address" so that the Each statements work?

 

Martin wrote:

 

I would argue that, at any case, an additional conceptual model should be created to be used for communication with business people. In the conceptual model the intersection entities are removed and replaced by many to many relationships. This must be separated from the other logical model (different containers).
 

-------------------------------------------------------------------------------------------------------

 

Ken Atkins wrote:

 

Yeah!!  Someone else is advocating *true* conceptual modeling!!!   I've found it very effective for years, but not many people want to do it, maybe because no tools really support it (for instance, there is no "live" connection between your conceptual model in one container to your logical model in another).

 

In the conceptual model, you leave in the many-to-many relationships, and if you actually need information in the cross table, then it *is* an entity (otherwise why do you need information about it?).  If it *is* an entity, you need to name it and give the relationships meaningful names.

 

-----

 

When I said "not many tools support it", I meant that not many tools (none that I know of) support having three levels of models (conceptual, logical, and physical) that are all connected.   You can do both conceptual and logical ERD models in Designer, but there is no easy way to maintain a "live" connection (like you can between logical and physical), which makes it more work to maintain a conceptual model going forward.

 

 ---------------------------------

 

 

Paul Dorsey wrote:

 

I think the problem is broader than “no tools support it”.  I have thought long and hard about this, being in a position to implement it in BRIM®, I concluded that the problem has no nice solution. And if we provided the capability, few would use it.

 

Consider the problem of keeping just two models (logical and physical) in synch.  You would need to somehow decide what elements belonged in the logical model and what elements belonged in the physical model.  I think we are already in trouble here.  But let’s say for a moment that you can come up with some decision that would allow you to know which elements belonged on each side. You would need a way to have each logical model element “inherited” down to the physical.  If that element changes in any way on the physical side (for implementation reasons), you would want to preserve those changes if the logical model changed.

 

Of course, some changes between logical and physical are trivial (element name changes, data types, etc.), but others require complex transformations (additional entities, generic structures replacing concrete ones).  The logic to keep all this straight is daunting. Then when you are all finished, you have a tool that is so complex that the designers can barely understand it, and less than 3% of the user community will understand it. We have some empirical precedent here.  Consider the percentage of Oracle shops that achieved 100% generation of Forms (and what percent of those spent more doing it than if they had coded by hand?). Consider the percentage of people who use SCM (after it finally worked).  It is possible to build features that finally outstrip our ability to economically use them.  I think we may even be running the risk of that in the J2EE environment.  I wonder what J2EE is doing to the percentage of failed projects.  I do not think it is helping the statistic.

 

Now, couple that with the fact that the number of shops that are interested in keeping multi-models is pretty small.  A few years back at ODTUG we had a modeling Lunch Bunch. About one third of the people there used the two-model approach. Two-thirds of the people there had abandoned the multi-model approach and lived with a single model.  Keep in mind that these people were the Oracle modeling elite. 

 

But really, the most compelling argument against multiple models is that I can see little value in maintaining them. Even if the perfect solution could be found for tool support, there will always be a cost associated with maintaining multiple models.  We also need to live with the increased risk that more models probably means more models getting out of synch. 

 

What do we really gain by using multiple models? Arguably, we get a little better ability to communicate with lower level users.  I am not impressed by that argument.  We can pretend that users read ERDs or UML class diagrams, but the reality is that the models are merely an aid to communication.  It is our skill in explaining the models that is the key to success in having users understand the model.  If you put an enterprise ERD in front of your average user, it doesn’t matter whether or not it is a conceptual model, a logical model, or a physical model, it will be lovely wall paper without someone to walk through the model with them. 

 

When I abandoned the multi-model approach years ago, I described it this way.  I believe that you can have a data model that fairly represents the logical requirements of the system and can be directly implemented.  As we move through our understanding of the system, from conceptual, through logical to physical, the model will evolve.  We will continue to gather requirements and deepen our understanding of the system.  However, rather than having to ask the question: “Did this change our conceptual or logical and or physical understanding of the system?” and then having to decide what models should be undated with every little change, I live with one model and allow it to evolve as the design process progresses.  Is this a perfect solution? Probably not. But it is certainly “good enough”.  In engineering we are taught “good enough is best”, perhaps that is not a bad philosophy here.

 

 

P.S.

 

Actually, there is a happy alternative for people who want to support multiple models.  Just do it.  I am sure most products (including Designer and BRIM®) support a full and open API.  You have the ability to make model synchronization as easy as you want. I am willing to commit to the fact that if you need a repository change to support your vision, I’ll make the change. But I think BRIM® already has the hooks to support you when we added a clone_id to support our vision of configuration management.

 

   What is the difference between a many-to-many relationship in ERDs and UML?

Jeff Jacobs said:

Good modeling practice says that all M:M should be resolved before generating database schema.  Association entities created by the modeler from M:M need to have good names, and in almost all situations there is in fact corresponding business terminology.

If you want to be sloppy, Designer will create an association table if you failed to resolve an M:M in your ERD and apply the transformer. The PK and FK columns will be filled in.

--------------------------------------------------------------------- 

Someone replied:

Hugely disagree...

M:M typically hide a great deal of information, which of course can be useful at times. But the resulting association entities should be carefully explored for additional business-related information.  This is why naming is so important.  My typical classroom examples start with the M:M: "each EMPLOYEE may be assigned to one or more PROJECTs", Each PROJECT may be
worked on by one or more EMLOYEEs".

Inexperienced modelers will produce a resulting association entity called "EMP/PROJ", and everything will die there. But an experienced modeler will name it "ASSIGNMENT".  Reviewing the needs for assignment with the users will typically result in it becoming "WEEKLY ASSIGNMENT", and uncovering additional attributes, such as HOURS BILLED.

Every resolution of an M:M should result in an iterative review and analysis of relevant areas in the model.

I almost wish we would discard the terminology "association entity".  It's only meaningful when talking about resolving an M:M; after that, it should be treated like any other entity.

I agree that the best practice, AS THE ERD IS CURRENTLY DESIGNED is as you say –to  explicitly create the association entities.  However, association entities are not, IMHO, actual entities, and really shouldn't be drawn as such.  They are simply there to give us a place to hang attributes of a relationship, and to associate an intersection table where the logical design meets the physical design.  I'd like to have the ability to draw the relationship as a many-to-many, add attributes to the relationship, and have the database design transformer generate the intersection table definition properly with the link back to the relationship.  As a bonus, I'd like the Table to Entity retrofit recognize intersection tables and retrofit them to many-to-many relationships, not association entities or maybe have it selectable as a preference.

--------------------------------------------------------------------

Martin Persson said:

I do agree with this, this is an excellent contribution to this topic - and I have seen these lazy "intersection entities" just combining the names from A to B with A_B alot, which says nothing. UML has one advantage here where you can put attributes to the association, as the association can be given three names: two "roles" and the association itself. Your example would be solved by "Employee assigned to" (role) and a "Project may be worked on by" (role) and the association itself  will be "Assignment," which can have attributes.

 

By putting in an intersection entity called Assignment, we run into the problem where this topic "naming relationships" all started. Is an Employee assigned to an assignment only?
Is a project worked on by an assignment only? No - we IS/IT people can easily read the model and understand that these names go "above" the assignment, and of course a few inexperienced people can understand this too. For clarity, we always need to consider the naming. There must be names from the assignment to the employee and the project too. Here the lazy way is to just put a dash or dot or something. How is it clear to the model reader that the intersection entity cannot stand alone? (an inexperienced reader may not directly understand the ERD syntax that the relationship builds the primary unique identifier) and that this is not just another entity that happens to have a m:1 relationship to two different other entities.

I would say that UML shows the association much more clearly. The Employee may be assigned to (zero) or many projects - by an assignment (the name of the association). A project may be worked on by (1) or many projects - (and there is an assignment for each.) So, how do we give good relationship names in this situation in an ERD - so the reader of the model understand that the relationship is from A to B but in the model there is a intersection "A/B" (even with a good name) between? The situation is not always as obvious as employee and project.

-----------------------------------------------------------------------------------------------

 

Paul Dorsey said:

 

I want to point out that UML is not a panacea either. A many-many with an association class in UML class diagrams is a little different than the same construct in ERDs. 

 

UML is actually a bit more restrictive.  If you say:

                C

                 |

                 |

A ------------------------- B

     *                    *

 

The primary key of C MUST be A_OID, B_OID.  i.e. duplicates are not allowed.

 

Since we frequently want to have duplicates (to support history if nothing else), this construct is frequently too restrictive for our purposes.

 

I went around and around on this issue when I started doing real UML models with BRIM®.  I eventually came up with 2 acceptable solutions:

 

  1. The first was to stereotype the association with <<duplicates allowed>>.  UML explicitly allows extensions like this.  By placing the stereotype on the association, I effectively (and radically) redefined what is meant by the UML many-to-many.  This solution is a little unorthodox. 

  2. I finally came up with what I believe is a better solution. Now I put a {history} keyword on the C class.  This automatically adds a StartDate and EndDate attribute and then adds engine-level logical deletion of the records without their removal from the DB.  So the rule becomes: “there can only be one ACTIVE C for each A,B pair.”  This is, of course, not as flexible as the first solution, but works far more often than you might think.  In BRIM® we use the {history} but dropped <<duplicates allowed>> on the association. 

Oracle Designer FAQs
Oracle Developer FAQs
Oracle RDBMS FAQs
PL/SQL FAQs
Data Modeling FAQs
JDeveloper FAQs
BRIM® FAQs
Business Rules FAQs

Oracle Consulting Services | Application Development | Custom Training | Data Migration | Data Warehousing

  About Dulcian | Papers & Presentations | Related Links

Publications | Employment Opportunities | Products | Home | Contact Us

©2000 Dulcian, Inc.