DataMIGä – A Data Migration Management Tool Suite
Joseph R. Hudicka
Dulcian, Inc.
June 9, 1998
Abstract
This paper describes DataMIGä – a sophisticated product capable of handling arbitrarily complex maps, supporting data migration for legacy systems and data warehouses. DataMIGä is a complete Data Migration Management Tool Suite. DataMIGä offers the following key features:
The tool uses a point and click GUI interface built in Developer/2000. The back-end, which actually generates the scripts and triggers, is written in PL/SQL. All maps are stored within a Designer/2000 style repository, capable of storing arbitrarily complex data maps.
Current Environment
Within our company, we have built very sophisticated templates to support the Designer 2000/Developer/2000 product suites. Currently, these products give us a nearly ideal environment for the design and development of applications. They allow is to design and build more quickly than we could have thought possible even a year ago. The coming advances in Designer 2000 2.0, Developer 2000 2.0 and Oracle8 will allow us to achieve even greater productivity.
This increase in productivity enables most portions of the SDLC to be accomplished easily and quickly. In addition, the systems that we build are designed flexibly so that they can be relatively easily modified as business requirements change. The increased productivity in design and development has brought the issue of data migration more clearly into focus.
Prior to the development of DataMIGä , a standard systems development project required spending approximately 50% of the time doing nothing but data migration. Data migration was the single largest component of any development project.
For data warehouses, the problem of data migration is even more acute. In one case, a data warehouse project spent 80% of the total budget on the data migration script. In these types of projects, not only do one-time migrations need to be accommodated, but periodic refreshes of the warehouse are also required. These periodic migration scripts require much more careful construction and tuning. Such scripts must also be maintained over time as the warehouse and legacy systems evolve. A similar problem exists in OLTP systems with data feeds that are external to the organization. These scripts, which must serve as the foundation for EDI applications between organizations, must also be tuned and maintained for periodic data feeds.
Classes of Data Migration
There are two distinct classes of data migrations:
Any data migration requires a variety of mapping strategies. Migrations may include any or all of the following: one-to-one table mappings, aggregated summary tables based upon conditional logic and update/purge logic for periodic loads. All of these formats, and a variety of others, must be supported for a migration to succeed.
DataMIGä Background
We encountered a project with a very complex data migration component from a source outside the organization. We recognized that simply writing the data migration script in PL/SQL would have left the organization with an unmaintainable program, several hundred pages in length. It was therefore essential that we use some type of data migration tool where data map specifications could be stored in the repository and an efficient user interface was present to generate and maintain those data maps.
Data Migration Tools
In looking for a product, we set forth a number of criteria for our search:
Over the past year, several product offerings have come to market that enable analysts to map legacy systems to future production systems, and, in turn, generate the code with which to perform the data migration.
Our vision was to have an integrated repository where the data structure definitions for all of an organization’s systems including the mappings between legacy and future systems can be stored and maintained. This same repository should generate SQL and PL/SQL code to perform data migrations since these languages are native to Oracle.
Needless to say, the tools on the market fell far short of our expectations. Several flaws were apparent:
Therefore, to satisfy the needs of the project, we decided to build our own tool, which would meet all of our search criteria. DataMIGä was designed to incorporate all of the following aspects of a robust data migration utility:
The Structure and Functions of DataMIGä
To support complex data maps, we needed a rich mapping environment. The traditional construct of mapping tables and columns from a legacy system to tables and columns in a new system or data warehouse is not nearly sufficient to support a data migration project. We need to support the following:
Each map takes as its source any of these objects or already migrated destination tables. Then, the map writes to a destination table. Finally, maps need to be able to be called as sub-maps. This allows relatively simple maps to be combined into a single, complex map. This structure is illustrated in Figure 1.

Figure 1 – DataMIGä model
The following are examples of the complexity that DataMIGä is capable of supporting.
The source system has two columns for zip code, ZIP and ZIP4. We want to combine them into one ZIP column in the destination system. We can create the following formula:
SOURCE.ZIP || ‘-‘ || SOURCE.ZIP4
You select the source columns from an LOV, which shows all columns for the source tables for the current map.
The source system has one-name columns with the format LAST NAME, FIRST NAME. The new system has LNAME and FNAME columns. We can perform this by building the following formulas:
Last Name
SUBSTR(EMP.NAME,1,INSTR(EMP.NAME,’,’,1,1)-1);
First Name
SUBSTR(EMP.NAME,INSTR(EMP,NAME,’,’,1,1)+1);
The old system has a status field that uses "A" for active and "I" for inactive. The new system has an ACTIVE_YN field that should be set to Y for active and N for inactive. This can be performed by building the following formula:
DECODE(SOURCE.STATUS,’A’,’Y’,’N’);
For an accounting system, in order to perform a complex operation on a value before copying it to the new system, put this computation into a function at the database level. Next, enter the syntax for the function call into the system. Then, use the function wherever you need to.
We have provided this functionality to support conditions where the standard GUI interface does not support the necessary complexity. In practice, this feature is rarely used. Our experience has been that 95-100% of maps in a data migration project are supported through the basic functionality within DataMIGä .
To ensure that the old key is copied into the new table, after the master table has been populated, you can link the master destination table with the source table using the old primary key. You can then store the new primary key in the detail table.
For an accounting system, four different tables were holding accounting information for one record. We wanted to combine these into one table in the new system. DataMIG can support any number of joins with any filter clauses necessary.
In the old system, we had client name, address, and phone number information on one record in one table. In the new system, we wanted to create one record in the client, address, and phone tables. We created a master map that inserted the client information into the new client table. We then created two child maps, one for the address and one for the phone. To populate the Client ID foreign key for these tables, we used the client sequence number. Since DataMIGä recognized that the name of the sequence number was not the name of the table, CURRVAL instead of NEXTVAL was used for the sequence number.
We only want to create a record in the address table if the address was filled in. We can create conditional logic such as the following:
IF source.addr is not null THEN
CALL_MAP(‘ADDR_CLIENT’);
END IF;
Some legacy systems do not have lookup tables. We can populate them in one of two ways:
DataMIGä can build a script that will check to see if the data exists. If it does, update the record, otherwise, insert a new record.
Data Transformation and Cleansing
In order to be useful, all migration tools must handle data transformation and cleansing. Data transformation is the process by which we massage the format of source data into that of the destination data. A good example of transformation would be taking a source column which stores DATE information in a textual format, and changing it to one of Oracle's proprietary data formats.
Cleansing on the other hand is the process of cleaning invalid data during its migration to the destination. This is where we might change those STATE = '.' values to something like "NA" for "NOT AVAILABLE."
Data transformation and cleansing are implemented in DataMIG through functions and conditional logic.
Script Generation Languages
DataMIG supports generation in 2 Oracle native formats.
SQL
SQL is useful for two very common migration needs: one-to-one mappings, and aggregations. One-to-one mappings are the kind where a particular source file, i.e. Departments, is used to populate one, and only one, destination table, i.e. DEPT. In this case, there is no conditional logic to apply, thus eliminating the need for a PL/SQL solution. Aggregates are summary tables that snapshot a data set at a level higher than the atomic level. While parallel querying and mega-processor servers are becoming more and more abundant, aggregates now serve reporting needs in a cost-effective manner. Aggregates rarely depend upon conditional logic, and are easily implemented through a SQL solution.
PL/SQL
Periodic loads require more than simple "INSERT INTO" syntax. "Updates" and, at times, "Deletes" are necessary when refreshing a warehouse instance.
It should be clear, however, that one set of code can and should accomplish both the initial migration and the periodic refreshes. This is where PL/SQL is extremely useful.
PL/SQL's conditional logic provides us with the capability to determine whether or not a particular row already exists in the destination, and conclude whether an Update or Insert is appropriate, row by row.
Parallelism
DataMIG supports a full network structure of tasks. Every time a task is completed, the available task list is searched and every appropriate task is started. The true objective of any data migration is to seamlessly integrate every source element into a new database instance. All too often, however, the focus tends to be on "How quickly can we migrate this data set?" DataMIGä offers two methods for the execution of your migration scripts: Sequential and Parallel
A. Sequential Migration
In a sequential data load, scripts are executed based upon relational dependencies between tables (i.e. lookup tables are populated before tables that refer to them, etc.). Scripts are executed one at a time, until all scripts have been processed.
For example, I have three tables, PRODUCT, DEPT and EMP, where EMP has one foreign key, to DEPT. By the sequential method, we would load these tables in the order of PRODUCT, DEPT and EMP, one after the other, as a result of EMP's dependency upon DEPT (see Figure 2), realizing a total migration time of seventy minutes.

Figure 2 - Sequential Data Migration
B. Parallel Migration
In Parallel mode, DataMIGä analyzes the dependencies between tables in further detail, and executes scripts the moment that every dependant table is populated.
The parallel scheduling engine realizes that PRODUCT has no dependencies to any other table, and would therefore execute the PRODUCT map simultaneously with the DEPT and EMP maps, which do have a dependency between them (see Figure 3). The resulting total migration time is now reduced to forty minutes, a savings of 30 minutes overall, or better than 40% of the total migration!

Figure 3 - Parallel Data Migration
As you can see, the Parallel mode provides a clean method to drastically reduce the overall processing time of any data migration.
C. Constraint/Index Management Utility
Constraints and Indexes cause sub-optimal performance of migration scripts. Every record inserted into a table also requires updates to all existing indexes. DataMIGä offers a point and click interface to seamlessly manage constraints and indexes during migration process.
D. Automated Error and Integrity Violation Tracking
The parallel scheduling engine offers a compiler to ensure that mappings are syntactically correct prior to their execution. This step alone eliminates 90% of most migration script failures, prior to running them. From here, the scheduling engine tracks any and all execution errors electronically.
Integrity Violation tracking can be performed during execution or after. The most productive method of tracking integrity violations is after the maps have been executed. This way, every integrity violation will be recorded. If you choose to track these violations during execution, you will only identify the first violation, as it causes an error, forcing the map to stop running.
Comprehensive Reporting
The completeness of migration scripts is very difficult to identify. Migration itself is iterative in nature. DataMIGä offers a series of quality assurance reports that help determine the state of the migration project. Reports are available by map, across maps for impact analysis, impact analysis between Designer/2000 and DataMIGä , and audit reports that identify both SOURCE and TARGET columns that have not been mapped. Another report drills into the nature of your source data by providing counts of total records, total values per column and total unique values per column, highlighting columns that require special transformation rules, or may not require mapping, at all.
Conclusions
Data migration is becoming an increasingly important part of the overall systems development process. DataMIGä can simplify and expedite this process by doing the following:
DataMIGä is a sophisticated product, capable of handling arbitrarily complex maps, supporting data migration for legacy systems and data warehouses. It will also generate triggers to keep any two heterogeneous Oracle databases in synch.