Oracle 9i Data Warehouse Review

 

Robert F. Edwards

Dulcian, Inc.

Overview

The Oracle9i Server is the first major release of the database since Oracle 8, and 8i, and contains several major new features, such as Real Application Clusters (Cache Fusion), and e-Business and Business Intelligence platforms.  This review will focus on the features of the Business Intelligence platform, and the new version release 3i of Oracle Warehouse Builder, Oracle’s warehouse design and ETL (extraction, transformation and loading) tool. 

 

The following data warehousing features are introduced on Oracle9i, and will be discussed in this paper:

·         OLAP Server

·         Additional analytical SQL functions

·         ETL infrastructure

·         Data Mining.

 

Oracle Warehouse Builder (OWB), although it is a separate product from the 9i server, is a key tool in the design and deployment of data warehouse applications, including data migration (ETL), mapping and code generation.  Key new features of Warehouse Builder 3i are the change in the mapping architecture, associated new mapping features, and other major enhancements.

 

True Data Warehouse Platform

Since Release 7.3, Oracle has been building data warehousing support into the database server with such features as bit-mapped indexing, optimizer recognition of star-joins, partitioning, ROLLUP, CUBE and the first of the analytical SQL functions.  These and many additional significant capabilities have been added to releases 8 and 8i, to support the basic core requirements of performance, scalability and manageability that Oracle believes are critical to the success of a data warehouse.  The additions to Oracle 9i support the core requirements and beyond with integrated OLAP, ETL, Data Mining, and added functions to become a true data warehouse platform.

 

A data warehouse, data marts, analytical OLAP services, data mining, along with ETL and analytical SQL functions, can all be resident on one or more instances of Oracle9i Server.  Furthermore, the 9i server itself is a core component of Oracle’s E-Business Intelligence platform that consists of Warehouse Builder interfacing the application data on the input side of the server, and Oracle9i Application Server, with Discoverer and Reports, interfacing the server on the output, or user access side.  Interfacing with Warehouse Builder and 9i Application Server, is the Oracle Repository, resident on Oracle9i, maintaining the metadata for the data warehouse.  The repository meets the industry accepted Common Warehouse Metamodel (CWM) standard for enabling tools to communicate with each other.

 

Data Warehousing Core Requirements

Oracle established three core requirements for success of the data warehouse:

·         Performance

·         Scalability

·         Manageability

 

The most important requirement of all, performance, is measured by the response of the warehouse in returning a result set to a user in response to that user’s query - in other words - query response time.  Scalability is the ability of the warehouse to support growth in both data volume and new users accessing the data warehouse.  Over time, the data volume will grow to terabyte size, and new users will clamber to get at all the new data.  The third requirement, manageability, is to ensure the data warehouse, regardless of data and user growth, will continue to be simple to maintain, without additional DBA or other administrator resources.

 

Performance Enhancements

Oracle adds the following enhancements to improve the performance of Oracle9i and the warehouse:

·         Bitmap Join Indexes – index structure spans multiple tables, to favor index access over table access

·         Enhanced Materialized Views – enhanced query rewrite to support a broader class of queries, and fast, incremental refreshes to quickly update a wider variety of materialized views

·         Full Outer Joins supported – ANSI join syntax; restrictions removed from the one-sided outer joins

·         SQL With Clause –improves performance of complex queries, including sub-queries used in multiple places

·         Adaptive Direct I/O Operations – dynamic adjustment of number of I/O buffers for efficient data transfer

·         Automatic Memory Tuning – dynamic adjustment of runtime memory allocation for query execution

 

Scalability Enhancements

The following enhancements improve server and warehouse scalability:

·         List Partitioning – permits lists of discrete values, rather than ranges of values, to provide precise control of placement of data into table partitions; for example, a list of country names for partitions based on continents

·         Parallel Queries – provides for a finer grain of breakdown of queries for distribution among processors

Dynamic load balancing is enhanced, and inter-node parallel query is improved for clustered and massively parallel processor (MPP) platforms.

 

Manageability Enhancements

Manageability enhancements provide for better control of resources, to support a growing workload resulting from additional users running queries on the data warehouse, and ensure that:

·         Appropriate resources are allocated to queries

·         Throughput of the entire data warehouse platform is maximized

·         Warehouse Administrators and users can view the status of ongoing jobs

·         DBAs can specify automatic abort and queue of queries running outside established parameters, to maintain an optimal system load

 

The Database Resource Manager (introduced in Oracle8i) controls the allocation of resources provided to Resource Consumer Groups (RCG), which are assigned among the multiple populations of warehouse end-users.  The RCGs are specified by the DBA, who controls how resources are allocated to each group.  8i provided control for the CPU time allocated to each RCG.  Oracle9i adds the following additional controls on RCGs:

·         Limit the number of sessions

·         Govern the running time of queries

·         Change of the RCG itself, based on DBA specified criteria, if parameters limits are exceeded

 

Additional enhancements supporting manageability requirements are:

·         Updates to the DBMS_STATS built-in package, to improve statistics gathering

·         Execution plans are now placed in cache, to permit viewing the plan while queries are executing

·         Summary Advisor – makes recommendations to the DBA, based on schema characteristics and workload history

·         Query Rewrite facility has capability to force queries to use materialized views, or be aborted

 

OLAP Services

OLAP, or on-line analytical processing, is a high performance calculation engine, designed for providing fast response times to an analyst’s queries.  OLAP Services, like its predecessor, Oracle Express, is a specially designed database that meets the requirements of a business intelligence analysis tool.  It provides aggregated (summary) and customized data structures that can rapidly respond to complex analytical queries based on multiple dimensions. 

 

Oracle Express, a traditional OLAP server, is a stand-alone, proprietary multidimensional database engine, complete with its own data store, metadata, ETL, administration and support facilities.  In the past, the high costs of running a separate database server to support the needs of business intelligence was justified, because servers such as Express had the high performance and complex analytical capability lacking in relational databases.  This is no longer the case.

 

Oracle9i OLAP Server converges Oracle Express technology and the Oracle database into an integrated relational OLAP engine.  All data resides on the relational database.  Most queries are satisfied by new analytical SQL functions, and the remainder is handled with functions written to the Java API interface also introduced with 9i Server.  Oracle Enterprise Manager handles all administration tasks for OLAP Services, just as it does for the relational database. 

 

Oracle9i OLAP is replacing Oracle Express, although the nearly 30 year-old analytical database will continue to be supported for several years to come.  Since Oracle 7.3, Oracle has added significant upgrades to the database, to support large data warehouses and the analysis requirements of business intelligence applications.  A couple of these enhancements are analytical SQL functions and Java OLAP API, which combine to support the complex analytical calculations typically found on multidimensional servers.

 

Business Intelligence Applications

Oracle9i OLAP supports all four categories of business intelligence applications:

·         Reporting

·         Ad-hoc query and Reporting

·         Multidimensional analysis

·         Planning

 

Reporting applications generally provide static or parameterized reports, which can include large volume canned batch reports.  These reports typically have minimal analytical requirements, and are based on relational databases, using SQL. 

 

Ad-hoc query applications offer the user a high level of interaction, allowing them to explore data using a variety of data access techniques.  Ad-hoc reporting also uses a relational database for its data, often with a multidimensional data model, and offer very useful analytic functions.  Ad-hoc applications are usually the most sophisticated reporting applications used directly against a data warehouse. 

 

Analytical applications offer the capability to answer complex questions as well as perform ad-hoc data exploration.  Queries may be multidimensional, with nested rankings along separate dimensions, and complex calculations on yet another dimension.  Most analytical applications use stand-alone analytical databases, such as Oracle Express, to support their complex multidimensional queries. 

 

Planning applications are used to predict outcomes or results, and are very different from query and reporting applications.  Planning applications generate data using models, forecasts, special aggregation and allocation methods, and scenario management.  Corporate budgeting and financial analysis are common examples of planning applications.  These applications can analyze past performance and profitability, build revenue and spending plans, and model the effects of change on the financial plan.  Demand planning applications assist in predicting market demand.

 

New Trends in Business Intelligence

·         Collaborative analysis and planning

·         Convergence of operational and business intelligence applications

 

Collaborative analysis is the ability to support widely geographically distributed users, with a central data store, shared user defined analytical objects, a convenient means for consolidating data, and deployment on the Internet. 

 

Analytical and planning applications and operational applications are typically separated from each other, on analytical and relational databases, respectively, that provide the most appropriate environment for the application.  For example, accounting systems reside on relational databases but do not provide analytical tools.  Oracle9i provides a database platform that supports both operational and analytical applications.  This will permit the operational and analytical systems to converge into a single, integrated application with analytical tools making it more functional and less costly.

 

Architecture

Oracle’s new OLAP server is now integrated with the relational database.  The relational database, not a proprietary multidimensional array structure, is used to store OLAP data.  The relational database provides scalability for very large data volumes, terabytes, which is difficult for most multidimensional databases.  Overhead is also lower for the relational database, which supports faster loads. 

 

Administration and Support

The Oracle Enterprise Manager handles all administration and management, including data modeling, high availability and security for OLAP server, as well as for the database.  These services include instance management, session management, configuration management, data modeling, performance monitoring, and job scheduling.  Data modeling is managed via the OEM OLAP tool.  The multidimensional elements defined by the OLAP tool are cubes, facts, dimensions, hierarchies, levels, and attributes, which are mapped to the data warehouse fact and dimension tables. 

 

High availability capability supports the demands made on large, corporate-level OLAP applications.  A significant high availability feature is partitioning, which allows administrators to take small portions of the OLAP data set off-line for maintenance, while keeping most of the data on-line and accessible to users.  Backup and recovery technology introduced in Oracle8 is enhanced to provide improved and faster recovery for important analytical applications.

 

Scalability and Performance

The success of Oracle9i OLAP is its capability to perform at or above the response speeds for stand-alone, multidimensional analytical databases, with very large data sets, and to efficiently manage aggregated data.  Traditional analytic databases have faster response times than Oracle9i OLAP for small data volumes (under 250 GB), but the response times for both are in the sub-second range, so end users will be satisfied with both technologies.  For large data sets (over 250 GB) 9i OLAP gains a significant advantage over the competition, as 9i OLAP’s response time slows marginally, and the latter’s response time slows dramatically, as the data set size grows.  This is because Oracle builds an optimum execution plan that pays dividends for complex queries and large data sets.

 

SQL for OLAP

Oracle provided a support foundation for OLAP in Oracle8i, with star join transformations, bitmap indexes, parallel query, function indexes and some OLAP functions, such as CUBE and ROLLUP.  Oracle9i introduces the following additional performance features used by OLAP, such as bitmap join indexes, grouping sets, WITH clause, and automatic memory tuning.  An overview of these features appears in the Performance Enhancement and Analytical SQL sections of this paper.

 

Java OLAP API

The Java OLAP API provides an object-oriented environment for building complex analytical queries.  A full range of functions is provided by the Java API to support analytical applications.  Support is also provided for asymmetrical queries, like nested rankings, and the calculation of dimension members and measures (facts).  The Java OLAP API insulates the developer from SQL by offering OLAP specific functionality.  An OLAP SQL generator builds the SQL necessary to perform the very complex multidimensional queries. 

 

The Java OLAP API also interfaces Oracle9i with a metadata provider (data model), a query processor (execution plan) and a multidimensional engine (analytic workspace).  The query processor determines the execution plan for the query, and chooses between the SQL generator and the multidimensional engine for accessing data.  The multidimensional engine manages the data in special external data files called analytic workspaces, which are used to perform certain predictive analytic calculations.  The metadata provider provides data model and source information to the API, for both the OLAP catalog and analytic workspace.    

 

Development Environment

Business Intelligence (BI) Beans are analytically aware building blocks designed for building sophisticated business intelligence applications.  BI Beans are Java beans that can provide conventional tabular, multidimensional and graphic views of data, build queries with a multidimensional context, or define customized measures (facts). 

 

JDeveloper 9i, totally re-written in a new release itself, is the primary development tool for building business intelligence applications around Oracle9i OLAP.  Analytical objects built in JDeveloper, using the BI Beans, are stored in the Beans Repository, on the database.

 

OLAP Migration Plan

Oracle is formed OLAP Services by integrating Oracle Express technology with the Oracle database.  Oracle will add all new features, including a new analytical and planning application, to 9i OLAP Services.  Oracle Express, the Financial and Sales Analyzers, and the Planning application, are now in maintenance mode.  All products will receive support for several years, and will receive enhancements to stay compatible with the 9i database.  Express will be able to interface with 9i OLAP Server, but the Financial, Sales and Planning applications will only continue to support Express, not the new OLAP services.  Migration tools will be available to move data and applications from Express to 9i OLAP Server.

 

Analytical SQL

Analytical SQL functions were introduced as early as Oracle8i, with a large group of functions added in 8i, and another batch in 9i.  The analytical SQL supports OLAP with aggregation and grouping functions, and data mining, with sampling and statistics capabilities.  Most of the new SQL functions have been added to the latest ANSI SQL standard.

 

The analytical functions added in Oracle8i include:

·         Ranking – “Find the top 5 sales reps in each sales office, ordered by sales”

·         Moving-window aggregates – cumulative sums and moving averages

·         Period over period comparisons – “What is the sales growth from December 2000 over December 1999?”

·         Ratio-to-report – “What is December’s sales, as a percentage of the entire year’s sales?”

·         Statistical functions – linear regression and correlations

 

Functions added in 9i are:

·         Inverse percentiles – find data which corresponds to a specified percentile value

·         Hypothetical rank and distributions – find what rank or percentile value a hypothetical data value would have if added to an existing data set

·         Histograms – create a width-balanced histogram of data, by returning a number representing the bucket number

·         First and last aggregates – compares the first or last element with any other element in a group

·         Grouping sets – allows a query to specify the exact levels of aggregation of interest

·         Composite Group By – grouping by a specified group of columns which are treated as a single unit for grouping

 

Aggregation functions support OLAP-based analysis.  Introduced in Oracle8i, the ROLLUP and CUBE functions provide groupings at the upper levels of grouped columns not previously available with the GROUP BY clause.  For example, when grouping by department and job for sum(sal) on the Emp table in the SCOTT/TIGER schema, you could get aggregated salaries for job within department, only.  ROLLUP will give you a total for all employees in each Department and for the whole company.  CUBE will give you totals for job, only, at the company level, in addition to the ROLLUP totals.

Grouping sets allows you to group aggregates by sets of columns.  Concatenated grouping sets will give you all combinations of the columns in grouping sets, working like a Cartesian product.  Composite columns in a group by will give you groupings with fixed sets of columns that can not be broken up, that is, the column sets will always be grouped together, or not at all. 

 

Data Mining

The objective of data mining is to discover patterns and relationships in data.  Data mining tools use machine-learning algorithms to search through large volumes of data, seeking out unknown patterns and relationships.  The algorithms learn the patterns and relationships discovered and build models that can be used to score new data with.  Scoring is the process of predicting outcomes, with use of the models.  All of the data mining functions in 9i are built using a Java API interface.  As of this writing, data mining is only available as an extra cost option on Oracle9i Enterprise Edition.

 

Oracle acquired data mining technology by buying Darwin and its parent company, Thinking Machines Corp, in the late ‘90s. In its acquired configuration, Darwin was a stand-alone data mining tool that ran only on Unix platforms, and accepted only flat file data sets as input.  Today, Darwin Release 3.7 is capable of accessing the Oracle database and the network, and offers wizards for user interface, along with a Windows GUI, and web browsers.  Integrated with Oracle9i, data mining performs all data processing and access using the relational database.  Darwin, which is part of the Data Mining Suite, is also integrated with the new CRM 11i marketing analysis application. 

 

Data Mining Concepts

Data mining is divided into two learning techniques: supervised and unsupervised.  The supervised technique requires the analyst to specify a target field or dependent variable.  It then sifts through data searching for patterns and relationships between the independent and dependent variables.  The unsupervised learning technique does not work with a specific objective assigned by an analyst.  Instead, unsupervised algorithms search for associations and clusters without an objective.

 

Using the supervised learning technique, the algorithm sifts through the data finding patterns and builds a model that describes the functional relationships.  The data is divided into three equal sampling groups, for training, testing and evaluation phases.  The training sample is used to build the initial model.  The testing data sample is used to refine the model.  The model is then applied to the evaluation data, which is used to make predictions, and evaluates the model’s accuracy.

 

The unsupervised learning algorithms are usually applied against historical data to find associations and clusters, without a specific business objective.  Sometimes, the models can be used to make predictions. 

 

Machine-learning Techniques

Oracle9i Data Mining uses two machine-learning analysis techniques, Classification and Prediction, also known as the Naïve Bayes algorithm, and Association Rules. 

 

The Bayes classification and prediction algorithm is a supervised learning technique.  Bayes can produce both binary and multiple results outcomes.  The binary outcome is a yes-no, true-false Boolean result.  The multiple outcome can have more than two results, such as A-B-C-D, minus-zero-positive, yes-no-possible-not applicable, percentiles, and so on.   Each outcome has an associated percentage probability of occurring.  

 

The Association Rules algorithm is used to detect associated, or co-occurring, events hidden in data.  Association analysis is commonly used in market basket analysis, to seek out popular combinations of products sold to customers, such as milk, cereal and bananas.  “Association Rules” is an unsupervised learning technique. 

 

Model Scoring

The results of analysis by either the Bayes or Association Rules techniques are used to build models by which new data can be scored.  Scoring is the process of predicting outcomes with the use of models.  Scoring may be done in batch or on-demand mode.  Batch mode is used to score multiple records stored in a table, the results of which are stored in another table.  On-demand scoring is used for single records, and the result is passed back to the application calling the scoring function.

 

Data Mining in the 9i Database

Oracle Data Mining simplifies the extraction of business intelligence from large volumes of data.  Instead of extracting data into flat files for an external tool, Data Mining obtains its source data from the relational database.  The data mining functionality is embedded in Oracle9i, so the data preparation, model building and model scoring processes are all located on the database. 

 

Java Functions

Java API is used to build all functions used in 9i Data Mining, including data preparation, model building and model-based scoring.  The API is adaptable to different user groups.  Complex configuration settings can be exposed for use by advanced users, while a fully automated mode can be used by business users. 

 

ETL Infrastructure

ETL stands for Extraction, Transformation and Loading, the industry’s term for the data migration process in data warehousing.  As indicated by its name, ETL consists of three processing activities:

·         Getting data from the source application systems

·         Converting the data per the warehouse standard format, and cleaning it of errors and inconsistencies

·         Loading the transformed data into the warehouse dimension and fact tables. 

 

ETL also performs aggregation and customization work on data moving out to the data marts.

 

In the past ETL had to be performed by in-house built procedures written in PL/SQL or other languages, or with a third-party tool.  Data migration activities consume very extensive resources during a project, even for an operational application.  Furthermore, data warehouse ETL is an ongoing activity; it does not end after a legacy application moves its data to its replacement system and is shut down. 

 

Past and Future Practices

In the past, ETL processing flows could be placed in one of two categories: External or Staged.  In External ETL, most of the transformation processing took place outside of the data warehouse database, and was performed mostly by third party transformation tools.  Data from the source applications is fed into the ETL tool, processed in memory, then is inserted into the target warehouse tables.  If warehouse data is needed during transformation processing, such as for key lookups, the warehouse database is treated as an external data source.  The principal database functions are loading data, maintaining indexes and validating constraints.  This approach is called transform-then-load.

 

In Staged ETL, the extracted data is loaded into staging tables on the warehouse database, and the transform processing is performed in the database.  Data from all sources is loaded raw into ‘neutral’ staging tables, i.e. varchar2(4000) columns.  Transformation is performed on the raw text, manipulated in memory, and perhaps loaded into intermediate staging tables for processing by additional functions.  Finally, the warehouse ready data is loaded into the target tables.  This approach is called load-then-transform.

 

Oracle9i’s database capabilities are enhanced to perform specific ETL tasks. The ETL process flow has been significantly upgraded so that the database becomes part of the ETL solution.  The new approach, called transform-while-loading,

is performed without first loading source data into staging tables.  The extracted data is processed in memory and then loaded directly into the warehouse tables. 

 

Oracle markets an ETL toolset called Warehouse Builder, which can also be used to design a warehouse database schema.  Warehouse Builder is used to map source and target data, and build transform processes.  The new Warehouse Builder, Release 3i, is reviewed later in this paper.

 

Oracle9i ETL ‘Toolkit’

Oracle9i Server offers a scalable ETL infrastructure that can grow with the data volume and new applications added to the warehouse.  Major new features or enhancements to ETL, described in detail below, include:

·         Change data capture

·         External file access

·         Upserts

·         Multiple table inserts

·         Table functions

·         Transportable tablespaces

·         Resumable statements

·         Parallel DML operations

·         Partitioning and partition maintenance operations

 

The server’s ETL functions can be leveraged by Warehouse Builder’s design and deployment functions.

 

New Features/Enhancements

The following are new features or enhancements in the 9i version:

 

Change Data Capture detects changes in source data on the application systems.  There are two ways changes can be detected, by using the Replication option or by mining the archived redo logs.  Neither method adversely impacts the application system performance from ETL queries.  The Replication method copies the application tables to another database, where they can be queried as required.  The second method, using Oracle’s LogMiner tool, the archived redo logs can be queried in a search for changes in data.

 

External Tables are files residing on the server’s file system, not database files or tablespaces.  Oracle gives a user the capability to access external flat files with SQL as if they were database tables and the barrier between the database and the file system is transparent.  The files do not need to be loaded into staging tables first in order to access them.

 

Upserts is a new function that updates a row in a table if it exists, and inserts data if a row doesn’t already exist.  This function operates in a single step, and eliminates having to build a PL/SQL routine to perform this common task.

 

Multiple Table Inserts allows the insertion of data from a common source into more than one table, in a single step.

This functionality previously had to be performed by repeating insert routines for each table.

 

Table Functions are program units that perform complex transformations that cannot be performed by a single SQL function.  The routines can be written in any language supported by the Oracle Server, PL/SQL, C/C++, Java, and others.

Table functions can build complex transformations by pipelining SQL functions, passing data from one function to the next.  These functions can run in parallel and are scalable. 

 

Transportable Tablespaces were introduced in Oracle 8 and are used to move data from one database tablespace to another database’s tablespace.  Previously, the operating system version, database version and blocksize had to be the same between the source and target database.  In Oracle9i, the blocksize restriction is now removed.  The source and target tablespaces may now have different blocksizes for transportable tablespaces to be used.  Since blocksizes for applications are small and for a warehouse are large, this enhancement makes data movement an easier task with 9i.

 

Resumable Statements allow for suspension and resumption of major database operations in the event of a repairable failure (space limit and out-of-space errors). 

 

Parallel DML Operations allow for unrestricted parallel direct path inserts into partitioned tables.  Previously, parallel inserts were limited by the number of table partitions.

 
Partitioning and Partition Maintenance Operations are key features for handling very large data warehouses, reaching into the terabyte range.  Partitioning allows for distributing data across distinct physical data files for very large tables.  Previous partitioning techniques distributed the data using ranges of dates or primary key values.  Oracle9i introduces distribution by value lists, such as groups of countries in a region or a continent.  Individual partitions can be taken off-line for maintenance or drop/add operations, leaving the remaining partitions on-line for user access.

 

Warehouse Builder 3i

Warehouse Builder is Oracle’s data warehouse design and ETL development tool.  Warehouse Builder designs, deploys and manages warehouses, data marts and business intelligence applications.  In the ETL area, Warehouse Builder maps data sources and target tables and generates transform code. 

 

Overview

Using Warehouse Builder, a user can:

·         Graphically model relational and multidimensional target systems

·         Extract data from relational databases, flat files, Web click streams and packaged (ERP) applications

·         Cleanse, validate and integrate data using pre-defined or custom transformations

·         Load the data into the Oracle database

·         Manage all operational aspects of a data warehouse, including loads, monitoring, scheduling, workflow and life-cycle changes

 
Typical Use of Warehouse Builder

The following are activities that can be performed with Warehouse Builder:

1.       Defining Sources – Acquire metadata about source data on relational databases and flat files

2.       Designing the Data Warehouse – Build a relational or star schema data model for the target tables

3.       Mapping Sources to Targets – Map columns to columns and define transformations

4.       Generating Code – Generates SQL DDL for target table definitions and source to target mappings; for flat files, SQL Loader control file code is generated

5.       Instantiating the Warehouse – Execution of the target definition SQL DDL, to build and ‘deploy’ the warehouse

6.       Extracting Data – Execution of transformation code to extract, transform and load data into the warehouse

 

Major Version Release

Release 3i is a major version release of Warehouse Builder.  Like the releases before it, 3i is tightly integrated with the Oracle database server and works closely with the database and ETL functions.  Release 3i is particularly suited for working with Oracle9i, and was designed to leverage 9i’s new and enhanced ETL functions.  Another feature of Warehouse Builder is to build and maintain the Oracle Repository, which stores the metadata for the warehouse.

 

The major new features on Warehouse Builder 3i is the new mapping architecture and mapping and transform enhancements.  A number of significant enhancements to other functions were also added to release 3i.

 

Mapping Architecture

The previous mapping architecture provided for a two-level map, high-level and detailed.  Source and target mappings must also be made before transformations could be built.  If source and target maps had to be re-built, the transformations had to be dropped then re-created, based on the new mappings. 

 

With release 3i, this is no longer necessary.  Transformations are no longer dependent on source and target mappings.  Connectivity between transforms and mappings is no longer direct but through parameters.  Mappings can now be dropped and re-created without affecting the transformations.

 

There are two types of transformation operators: operators and transforms.  Operators take input rows and generate a subset number of output rows.  Operators are typically AGGREGATE and FILTER functions, both of which produce fewer output rows than input rows.  Transforms have a direct one-for-one relationship between input and output rows, and are typically standard SQL functions, such as TO_CHAR. 

 

Mapping and Transform Features

Warehouse Builder 3i comes with the following new or enhanced mapping and transform features:

·         Join Operator – joins multiple sources, for insert into one target table

·         Split Operator – output to multiple targets, from a single source table

·         Multi-stage Transformation – pipeline SQL expressions

·         Multiple Targets – insert or update multiple tables in a single pass

·         Key Lookup – lookup with application key and replace with warehouse key

·         Set Operator – support for UNION, UNION ALL, MINUS, INTERSECT

·         Inline Expression – SQL expression in SELECT clause

·         Incremental Code Generation – generate and view code up to any point within the data flow

 

Other Enhanced Features

In addition to mapping and transforms, other enhancements to 3i include:

·         Multi-stage transforms – chaining together of SQL and PL/SQL transforms

·         Enhanced flat file integrator – supports multiple record types and multiple targets, re-sampling

·         Metadata Reconciliation – supports user reconciliation/direction of actual source system changes

·         Upgrade or drop warehouse objects – provides impact analysis of changes, maintains data integrity during changes

·         Expression Builder – builds SQL expressions for transformations

·         Multiple-user support on Warehouse Builder Repository –more than one user may log on to the repository

·         International Name and Address Cleansing – Oracle Pure Integrate™ tool, support for 30 countries

·         Reverse Engineering of Code – reverse engineer existing PL/SQL scripts into the transformation repository

·         Total Data Warehousing Management – Manage all operational aspects of the data warehouse

·         Performance enhancements – partitioning technique, foreign key constraint manipulation, batch update/insert/delete

 

Summary

Oracle9i Server is positioned by Oracle as a true data warehouse platform and the core of its E-Business Intelligence Platform.  9i Server comes with an integrated relational database, OLAP server, data mining, analytical SQL and ETL infrastructures.  All data is stored and processed on the relational database.  ETL provides support for the data warehouse and data mining.  Analytical SQL and Java API support OLAP and data mining.  9i Server is Internet and Web enabled, and is supported with the JDeveloper Java environments.  Warehouse Builder 3i, a major version release, offers changes in the mapping architecture and enhanced mapping and transformation features, along with several significant enhancements in other functional areas of the product.

 

References:

All of the following are Oracle white papers, which may be downloaded from www.oracle.com or otn.oracle.com:

Lumpkin, George, Oracle9i for e-Business: Business Intelligence, October 2000, Oracle Corp.

Endress, Bud, Oracle9i OLAP: A Scalable Web-based Business Intelligence Platform, April 2001, Oracle Corp.

Berger, Charles, Oracle9i Data Mining, June 2001, Oracle Corp

Baer, Hermann, and Patrick Amor, The Oracle Server as an ETL Toolkit, October 2000, Oracle Corp.

Khachatourian, Henrick, New Features Overview of the New Oracle Warehouse Builder, October 2000, Oracle Corp.

 

Robert F. Edwards is a Senior Developer with Dulcian, Inc., a consulting firm based in Iselin, New Jersey.  He may be contacted at 732-744-1116, email: redwards@dulcian.com.