Managing Unstructured Data: LOBs, SecureFiles, BasicFiles

When you start learning databases, about 99% of the time the focus is on the three core datatypes (DATE, NUMBER, VARCHAR2). In reality, however, media content (pictures, movies, documents, and sounds) represents the largest and fastest growing part of any contemporary IT system. As a result, correct handling of such data is just as critical to overall project success as effective manipulation of financial information or number-crunching. Also, since the total volume of media content is usually concentrated in a reasonably small number of attributes, the cost of mishandling each of them is much higher.

It is common for all database solutions (not only Oracle) to provide a class of datatypes designed to work with large objects, called LOBs. The maximum size differs between versions of the Oracle RDBMS, but it currently goes up to 8-128TB depending on configuration and environment. These LOBs can be divided into two groups based on the way in which the data is stored:

Internal large objects are stored within the database itself and are accessed by a special mechanism (described below), which is separate from regular table data access. There are three datatypes of internal LOBs (all supported on multiple platforms):

·         BLOBs are used to store binary information (usually, multimedia).

·         CLOBs are used for textual information.

·         NCLOBs are used to store textual information in the National Character Set (similar to CLOBs).

External large objects are stored in the file system, and only a pointer is stored in the database. This pointer is represented by the Oracle-proprietary BFILE datatype. BFILE is used to point to files stored in the operating system and provides read-only access to these files (if you need to write to operating system files, you should use either the UTL_FILE package or custom-built Java procedures).

This paper describes some important factors to keep in mind when working with large objects in a system. Before jumping into details, it is worth mentioning that in version 11gR1 Oracle introduced an extended internal storage mechanism for handling CLOBs/BLOBs called SecureFile (in beta releases and in some early papers it was also called FastFile) to differentiate it from the traditional BasicFile implementation. In short, the whole story can be summarized as follows:

·         BasicFile

    o    Exactly the same implementation as in 9i/10g

    o    Kept for backward compatibility

    o    No extra licenses required

·         SecureFile

    o    Introduced in 11gR1

    o    Already extended in 11gR2

    o    Will be extended even further (rumors)

    o    A lot of interesting features, but some of them require additional licenses.

Since a lot of people are still on 10g, or are on 11g but have not switched to SecureFile, in this paper I will explicitly outline behavior differences wherever they exist.

Underlying Concepts to Understand

There are some issues specific to large objects that you need to understand before viewing the actual code syntax associated with them.

Data Access

Since you may have gigabytes of data in each field of a column in your system, the problem of accessing the data becomes the focus of the whole architecture. Oracle has a fairly elegant solution, namely to separate the data itself from the mechanism of accessing that data. This results in two separate entities: LOB data and a LOB locator that points to LOB data and allows communication with the data.

To understand this data structure, imagine a huge set of barrels of water and a pipe that can take water from a barrel, do something with it, and put it back. The analogy makes the whole picture clear: if you want to make a barrel (LOB) accessible to a different person (sub-routine), you don’t need to extract and pass the whole amount of water (LOB data); you just need to pass the pipe (locator) pointing to the right barrel. On the other hand, if you need to pour water from one barrel into another, the same pipe can be used as a tunnel.

This leads us to understanding of two kinds of LOB operations:

·         Copy semantics are used when the data alone is copied from the source to the destination and a new locator is created for the new LOB.

·         Reference semantics are used when only the locator is copied without any change to the underlying data.
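A minimal sketch of the difference, using the goods_tab table, object_seq sequence, and pkg_global package that appear later in this paper:

-- Copy semantics: INSERT...SELECT duplicates the LOB data itself;
-- the new row gets a brand-new locator.
insert into goods_tab (item_id, manual_cl)
select object_seq.nextval, manual_cl
  from goods_tab
 where item_id = pkg_global.v_current_id;

-- Reference semantics: only the locator is copied into the variable;
-- no LOB data moves anywhere.
declare
    v_cl CLOB;
begin
    select manual_cl
      into v_cl
      from goods_tab
     where item_id = pkg_global.v_current_id;
end;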

Data states

By the nature of relational database theory, any attribute may have either a NULL or NOT NULL value. Because of the existence of locators, from the practical point of view LOBs have not two, but three possible data states:

·         Null – The variable or column in the row exists, but is not initialized

·         Empty – The variable or column in the row exists and has a locator, but that locator is not pointing to any data. An IS NULL check returns FALSE for the Empty state.

·         Populated - The variable or column in the row exists, has a locator, and contains data of non-zero length.

The Empty state is very important. Since you can access LOBs only via locators, you must first create them. In some environments you must have an initial NULL value, but for PL/SQL activities it makes sense to initialize any LOB column as Empty immediately to save an extra step.
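A minimal sketch of the three states (a hypothetical anonymous block relying only on built-ins):

declare
    v_cl CLOB;   -- Null state: the variable exists, but there is no locator
begin
    if v_cl is null then
        dbms_output.put_line('Null - no locator yet');
    end if;

    v_cl := empty_clob();   -- Empty state: a locator exists, but points to no data
    if v_cl is not null and length(v_cl) = 0 then
        dbms_output.put_line('Empty - IS NULL is now FALSE, length is 0');
    end if;
end;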

Data storage

LOBs can be tricky from the DBA’s point of view. While external LOBs (BFILE) are nothing more than pointers to files stored in the operating system, internal LOBs leave a lot of room for configuration. There are two different kinds of internal LOBs:

·         Persistent LOBs are represented as a value in the column of the table. As a result, they participate in the transaction (changes could be committed/rolled back) and generate logs (if configured to do so).

·         Temporary LOBs are created when you instantiate the LOB variable. But when you insert the temporary LOB into the table, it becomes a persistent LOB.

Since LOBs are designed to support large volumes of data, it is completely logical that these datatypes also have extended ways of handling UNDO retention. This became even more critical with the introduction of FLASHBACK functionality, because inappropriate generation of UNDO for LOB columns could significantly increase the space required to guarantee the retention period. Currently, the following options are available (a small sketch follows the list):

1.       BasicFile

a.       Disabled (default) – supports consistent reads only and does not participate in the FLASHBACK logic

b.       Enabled – the same UNDO_RETENTION parameter is applied to the LOB column as to regular data

2.       SecureFile

a.       Auto (default) – supports consistent reads only and does not participate in the FLASHBACK logic

b.       None – no UNDO is generated at all

c.        MAX <N> - keep up to N megabytes of UNDO

d.       MIN <N> - guarantee at least N seconds of retention. This allows setting a value different from the overall UNDO_RETENTION setting.
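To make these options concrete, here is a hedged sketch (the table and column names are hypothetical) of requesting guaranteed retention for a SecureFile column:

create table docs_tab
   (doc_id  number,
    body_cl CLOB)
LOB(body_cl) store as SecureFile (
    retention min 3600);  -- guarantee at least 3600 seconds of UNDO retention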

 

The major difference between LOBs and other datatypes is that even variables are not created in memory; everything happens via physical storage. Temporary LOBs are created in the temporary tablespace and released when they are no longer needed. With persistent LOBs, each LOB attribute has its own storage structure, separate from the table in which it is located. As usual in Oracle, each storage structure is represented as a separate segment.

While regular table data is stored in blocks, LOB data is stored in chunks. Each chunk may consist of one or more database blocks (up to 32KB). Setting the chunk size may have a significant performance impact, since Oracle reads/writes one chunk at a time; the wrong chunk size could significantly increase the number of I/O operations. In the SecureFile implementation, chunks are dynamic (in an attempt to allocate as much contiguous space as possible) and can no longer be managed manually (at least for now).

To navigate chunks, Oracle uses a special LOB index (also physically represented as a separate segment). As a result, each LOB column is physically represented by two segments: one to store the data and one to store the index. These segments have the same properties as regular tables: tablespace, initial extent, next extent, etc. The ability to articulate the physical storage properties for each internal LOB column can come in handy for making the database structure more manageable. You can locate a tablespace on a separate drive, set a different block size, etc. In earlier versions of Oracle you could even specify different properties for the index and data segments. Currently, they must be the same, and there are restrictions on what you can do with LOB indexes; for example, you cannot drop or rebuild them.
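A quick way to see both segments is to query the data dictionary; a hedged sketch, assuming the goods_tab table defined later in this paper:

select l.column_name, l.segment_name, l.index_name, s.segment_type
  from user_lobs l,
       user_segments s
 where s.segment_name in (l.segment_name, l.index_name)
   and l.table_name = 'GOODS_TAB';  -- one LOBSEGMENT and one LOBINDEX row per LOB column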

Performance considerations

Each operation with a LOB chunk requires physical I/O. As a result, you may end up with a high number of I/O-related wait events in the system. But it is reasonable to ask the following question: why place data in a special storage structure if some rows may only contain a small amount of data? Using the online ordering system example, you might have remarks about some goods that only require between 1KB and 1MB of space. To handle such cases, Oracle allows you to store data in the row (instead of outside of the row) if it takes less than 3964 bytes. This causes all small remarks from the example to be processed as if they were regular VARCHAR2(4000) columns. When their size exceeds this limit, the data is moved to LOB storage. In rare cases you might consider disabling this feature, although in almost all cases keeping it enabled is the best option.

Another critical performance question is how operations on such large data volumes impact the buffer cache. As usual, Oracle provides enough options to adjust caching in a number of ways:

·         NOCACHE is the default option. It is designed to be used only if you rarely access the LOBs or the LOBs are extremely large. Physically, the existing implementations are completely different:

    o    BasicFile – uses DirectRead/DirectWrite. Although these mechanisms allow a lot of data to be funneled to storage, in an I/O-active system (especially OLTP) they can cause significant “hiccups”.

    o    SecureFile – utilizes a special shared pool area (managed by SHARED_IO_POOL).

·         CACHE is the best option for LOBs requiring a lot of read/write activity.

·         CACHE READS helps when you create the LOB once, read data from it often, and the total size of LOBs being read at any time does not take too much space out of the buffer pool. “Write” processes are implemented in the same way as with the NOCACHE option.

If your database is running in ARCHIVELOG mode (as the majority of databases are), the problem of generating too many logs becomes a real headache for DBAs. Since LOBs have their own storage segments, from the very beginning it was possible to set a logging option that differs from that of the table owning the LOB column. Unfortunately, having NOLOGGING on a column meant that, in the case of a catastrophic crash, whole rows would not be accessible until the LOB columns were reset to a stable state. To solve this problem, the SecureFile mechanism introduced FILESYSTEM_LIKE_LOGGING, which logs all metadata changes while not logging changes to the LOB data itself. This makes the whole table accessible even in the case of a major failure or a switch to a standby. This option may be viable if the data in the CLOB can easily be retrieved from other sources or is of a temporary nature. By the way, the CACHE/CACHE READS options always imply the LOGGING option (for all implementations).

Enabling SecureFile

It should also be mentioned that Oracle introduced a special parameter that controls the usage of the SecureFile storage mechanism – DB_SECUREFILE. This parameter can take the following values:

·         Permitted (Default) – BasicFile is created unless SecureFile is explicitly specified

·         Always – SecureFile is created unless BasicFile is explicitly specified

·         Force – always create SecureFile

·         Ignore – allow creation of SecureFile attributes, but treat them as BasicFile (all SecureFile features are disabled)

·         Never – raise exceptions if SecureFile storage mechanism is selected.

Selection of the appropriate value should be driven by your system policies, but I would strongly recommend using either FORCE or NEVER to preserve code consistency.
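For example, a hedged sketch of forcing SecureFile creation instance-wide (this parameter is dynamic, so ALTER SYSTEM privileges are sufficient):

alter system set db_securefile = 'FORCE';

-- verify the current setting
select value from v$parameter where name = 'db_securefile';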

Standard Use of LOBs

As an example, let’s create an online shopping catalog of electronic goods where each record contains the name of the item, the user manual text, a front page image, and a link to the original text file with the manual stored on the server. Because of the nature of the required data, LOB datatypes are mandatory, as shown here:

 

create table goods_tab
   (item_id      number primary key,
    name_tx      varchar2(256),
    remarks_cl   CLOB DEFAULT empty_clob(),
    manual_cl    CLOB DEFAULT empty_clob(),
    firstpage_bl BLOB DEFAULT empty_blob(),
    mastertxt_bf BFILE)
LOB(remarks_cl) store as SecureFile remarks_seg(
    tablespace USERS
    enable storage in row
    cache)
LOB(manual_cl) store as SecureFile manual_seg(
    tablespace LOBS_BIG
    disable storage in row
    nocache
    filesystem_like_logging)
LOB(firstpage_bl) store as BasicFile firstpage_seg(
    tablespace LOBS_BIG
    disable storage in row
    chunk 32768
    cache reads);

 

This example includes all three datatypes (CLOB, BLOB, BFILE) and both implementations (BasicFile, SecureFile). Also, each internal LOB has its own storage clause at the end of the table definition. There are a number of factors to consider when using this approach:

·         Each internal LOB has an explicit segment name (remarks_seg, manual_seg, firstpage_seg) instead of a system-generated one. This is done for the convenience of working with the user_segments dictionary view.

·         Since we are planning to work with LOBs in PL/SQL, all internal LOBs are initialized to empty values (so they immediately contain a locator that can be retrieved) via the special functions empty_clob() and empty_blob().

·         The column remarks_cl is accessed and modified very often, but the amount of data is not very large. Therefore, the best option is to place the column in the same tablespace as the main data (so Oracle needs to do less work when you join to other tables) and to enable storage in the row for small amounts of data. The cache option should also be enabled for performance reasons: since a lot of people will be working with that column, you definitely don’t want to generate extra wait events because of direct read/direct write operations.

·         The column manual_cl is not accessed very often and can always be reloaded from the master files. That’s why an independent tablespace, no storage in the row, no caching, and filesystem_like_logging are appropriate here.

·         The difference between firstpage_bl and manual_cl is that, although firstpage_bl will never be updated, it could be read by different users often enough. This is the reason why you should enable caching on reads.

This table also illustrates that both the BasicFile and SecureFile storage mechanisms can be used in the same table simultaneously, although for the sake of code maintenance this should be avoided unless there are good reasons.

At this point, you have the physical storage configured, but somewhere on the server there are a number of manuals, and it is necessary to get their contents into the database. If you have had some experience with UTL_FILE, you know that starting with version 9i the only way PL/SQL can access operating system files is via directories. You can create a directory using the following code:

create directory IO as 'C:\IO';
grant read, write on directory IO to public;

 

The following examples use a global variable to store/reuse the ID of the record with which you are currently working. Since there is no need to leave the context of the same database session, a packaged variable is a valid storage place:

 

create or replace package pkg_global
is
  v_current_id number;
end;

 

 

Load BFILE

Now you can move the required files into that directory and start loading them using this code:

declare
    v_bf BFILE := BFILENAME ('IO', 'book.txt');
begin
    insert into goods_tab (item_id, name_tx, mastertxt_bf)
    values (object_seq.nextval, 'The New Book', v_bf) -- object_seq is an existing sequence
    returning item_id into pkg_global.v_current_id;
end;

 

To show that you can use the same datatypes in PL/SQL, the code creates a variable of type BFILE using the special built-in function BFILENAME, which takes a directory and file name and returns a temporary locator. After that, the code inserts the newly created locator into the table, making it permanent. In the current version of Oracle, it does not matter whether or not the specified file really exists; it is your responsibility to check before using it.
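A minimal sketch of such a check, using DBMS_LOB.FILEEXISTS (which returns 1 when the file is present):

declare
    v_bf BFILE := BFILENAME ('IO', 'book.txt');
begin
    if DBMS_LOB.fileexists (v_bf) = 1 then
        dbms_output.put_line('File is present - safe to load');
    else
        dbms_output.put_line('Missing file!');
    end if;
end;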

Load CLOB and BLOB from files

Assuming that there is a file with the book in the folder IO, there is an easy way of loading its contents into the corresponding CLOB using the special PL/SQL APIs provided by the built-in package DBMS_LOB shown here:

 

declare
    v_file_bf     BFILE;
    v_manual_cl   CLOB;
    lang_ctx      NUMBER := DBMS_LOB.default_lang_ctx;
    charset_id    NUMBER := 0;
    src_offset    NUMBER := 1;
    dst_offset    NUMBER := 1;
    warning       NUMBER;
begin
    select mastertxt_bf, manual_cl
    into v_file_bf, v_manual_cl
    from goods_tab
    where item_id = pkg_global.v_current_id
    for update of manual_cl;

    DBMS_LOB.fileopen (v_file_bf, DBMS_LOB.file_readonly);
    DBMS_LOB.loadclobfromfile (v_manual_cl,
                               v_file_bf,
                               DBMS_LOB.getlength (v_file_bf),
                               dst_offset, src_offset, -- destination offset comes first
                               charset_id, lang_ctx,
                               warning);
    DBMS_LOB.fileclose (v_file_bf);
end;

 

This code illustrates the real meaning of locators. There is no UPDATE in the block, but the value in the table will be changed. Using SELECT…INTO…FOR UPDATE locks the record and returns the locators for the LOBs. These special locators contain the ID of the current transaction (more about transaction issues a bit later), which means that you can not only read data from the LOB, but also write to it. Using the “barrel and pipe” analogy, you have your own pipe and your own barrel to do whatever you want with. The way to read data is very straightforward: open the file via the locator, read the data, close the file. Because of possible language issues, reading textual information into a CLOB is a bit tricky; that is why you have the extra options of specifying the language and character set. The source and destination offset parameters are also very interesting. They are of type IN/OUT and initially specify the starting points for reading and writing, but when the procedure call completes, they are set to the ending points. That way you always know how many bytes (for BLOBs) or characters (for CLOBs) were read, and how many of them were written.

The process of reading the image of the first page is even simpler. Since it contains binary information you don’t need to worry about languages as shown here:

 

declare
    v_file_bf        BFILE := BFILENAME ('IO','picture.gif');
    v_firstpage_bl   BLOB;
    src_offset       NUMBER := 1;
    dst_offset       NUMBER := 1;
begin
    select firstpage_bl
      into v_firstpage_bl
      from goods_tab
     where item_id = pkg_global.v_current_id
    for update of firstpage_bl;

    DBMS_LOB.fileopen (v_file_bf, DBMS_LOB.file_readonly);
    DBMS_LOB.loadblobfromfile (v_firstpage_bl,
                               v_file_bf,
                               DBMS_LOB.getlength (v_file_bf),
                               dst_offset, src_offset);
    DBMS_LOB.fileclose (v_file_bf);
end;

 

As an illustration of the available options, instead of using an existing locator to the external file, this block creates a temporary one on the fly and processes it using the same locking logic and the DBMS_LOB package.

SecureFiles extras (for extra money)

Since different production companies have different IT budgets, this paper mostly focuses on the “common denominator” – Oracle features available in all editions and for all possible installation types. But it would be unfair to ignore a number of advanced options introduced with the SecureFile storage implementation just because of the additional licensing fees involved:

·         “Oracle Advanced Compression Option” gives you access to:

    o    De-duplication – only one copy of exactly matching LOBs is preserved

    o    Compression (High/Medium/Low) – a built-in basic archiver to compress the data. Technically, a trade-off between CPU and extra storage

·         “Oracle Advanced Security Option” gives you access to:

    o    Encryption – a direct implementation of Transparent Data Encryption per LOB column

Some of the mentioned features will be shown further down in the paper, but at this point my overall remarks about these extra options are the following:

·         De-duplication – useless for smaller systems, but could save some space in a large system where the same file might be sent to hundreds of people.

·         Encryption – it is always nice to have higher granularity over what you can and cannot encrypt.

·         Compression – definitely makes sense in a lot of cases, but should not be applied blindly because of the CPU cost.

Special cases and special problems

The level of complexity introduced by LOBs requires a number of restrictions to be placed on their use (although later versions of Oracle attempt to remove as many of them as possible). For beginners, there are three major areas of concern: generic restrictions, string processing problems, and transaction limitations, which are discussed here.

Generic Restrictions

As of Oracle 11g Release 2, there are a couple of restriction sets that you need to be aware of:

1.       SQL activity restrictions:

a.       You cannot have LOB columns in ORDER BY or GROUP BY clauses or any aggregate functions.

b.       You cannot have an LOB column in a SELECT DISTINCT statement.

c.        You cannot join two tables using LOB columns.

d.       Direct binding of string variables is limited to 4000 characters if you are passing a string into a CLOB column. This restriction is a bit tricky and requires an example. In the following code, the first output will return 4000 (because the string was directly passed into the UPDATE statement), but in the second case the output will be 6000 (because the string was passed via a PL/SQL variable). If you need to write more than 32K of information, you must use the DBMS_LOB package.

declare
    v_tx varchar2(6000):=lpad('*',6000,'*');
    v_count_nr number;
begin
    update goods_tab
    set remarks_cl = lpad('*',6000,'*')  -- direct binding: capped at 4000
    where item_id = pkg_global.v_current_id
    returning length (remarks_cl) into v_count_nr;
    dbms_output.put_line('Length:'||v_count_nr);

    update goods_tab
    set remarks_cl = v_tx                -- PL/SQL variable: the full 6000 characters
    where item_id = pkg_global.v_current_id
    returning length (remarks_cl) into v_count_nr;
    dbms_output.put_line('Length:'||v_count_nr);
end;

2.       DDL restrictions:

a.       LOB columns cannot be a part of a primary key

b.       LOB columns cannot be a part of an index (unless you are using a domain index such as Oracle Text, or a function-based index).

c.        You cannot specify a LOB column in the trigger clause FOR UPDATE OF.

d.       If you change LOBs using the locator with the DBMS_LOB package, no update trigger is fired on the table.

3.       DBLink restrictions:

a.       You can only use CREATE TABLE AS SELECT and INSERT AS SELECT if the remote table contains LOBs. No other activity is permitted.

4.       Administration restrictions:

a.       Only a limited number of BFILEs can be opened at the same time. The maximum number is set up by the initialization parameter SESSION_MAX_OPEN_FILES. The default value is 10, but it can be modified by the DBA.

b.       Once a table with an internal LOB is created, only some LOB parameters can be modified. You can change the tablespace, storage properties, and caching options, but you cannot modify the chunk size or the storage-in-row option (see the sketch below).
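To illustrate restriction 4b, here is a hedged sketch against the goods_tab table from the earlier example:

-- permitted: changing the caching option of an existing LOB column
alter table goods_tab modify lob(firstpage_bl) (cache);

-- permitted: moving the LOB segment to another tablespace
alter table goods_tab move lob(firstpage_bl) store as (tablespace USERS);

-- not permitted: changing the chunk size of an existing LOB raises an error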

String restrictions

Oracle tries to simplify string activities for CLOBs by including overloads of standard built-in functions that support larger amounts of data. You can now also convert between the datatypes directly: for example, you can assign a CLOB column to a VARCHAR2 PL/SQL variable as long as the variable can hold all of the data from the CLOB, and, conversely, you can initialize a CLOB variable with a VARCHAR2 value. As a result, some activities can be done using either SQL semantics (built-in functions) or API semantics (the DBMS_LOB package); a short comparison sketch follows the list below.

Even though SQL semantics are much easier to work with, there are some reasons for when not to use them:

·         If you are working with more than 100K of data for each CLOB, APIs handle caching significantly better.

·         If there is a great deal of random access to the data in the CLOB, APIs utilize LOB indexes much more efficiently.
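A minimal sketch contrasting the two semantics (note that the argument order differs between SUBSTR and DBMS_LOB.SUBSTR):

declare
    v_cl CLOB;
    v_tx varchar2(100);
begin
    select remarks_cl
      into v_cl
      from goods_tab
     where item_id = pkg_global.v_current_id;

    v_tx := substr(v_cl, 1, 20);          -- SQL semantics: (source, position, length)
    v_tx := dbms_lob.substr(v_cl, 20, 1); -- API semantics: (source, amount, offset)
end;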

There are also some differences between PL/SQL code and SQL statements (even inside PL/SQL routines) in terms of what you can and cannot do with LOBs. You can compare LOBs (>, !=, BETWEEN) only as a part of a PL/SQL routine, as shown here:

declare
    v_remarks_cl CLOB;
    v_manual_cl CLOB;
begin
    select remarks_cl, manual_cl
    into v_remarks_cl, v_manual_cl
    from goods_tab
    where item_id = pkg_global.v_current_id
    --and remarks_cl!=manual_cl -- INVALID
    ;

    if v_remarks_cl!=v_manual_cl -- VALID
    then
        dbms_output.put_line('Compared');
    end if;
end;

 

Using SQL semantics can get you into a lot of trouble with some built-in functions. INITCAP, SOUNDEX, TRANSLATE, DECODE, and some other functions will process only the first 4K (in SQL statements) or 32K (in PL/SQL code) of your data. In the following example, the second statement raises an exception because TRANSLATE cannot process 6000 characters:

declare
    v_tx varchar2(6000):=lpad('a',6000,'a');
    v_count_nr number;
begin
    update goods_tab
    set remarks_cl = v_tx
    where item_id = pkg_global.v_current_id
    returning length (remarks_cl) into v_count_nr;
    dbms_output.put_line('Length:'||v_count_nr);

    update goods_tab
    set remarks_cl = translate (remarks_cl,'a','A') -- raises an exception: 6000 characters
    where item_id = pkg_global.v_current_id
    returning length (remarks_cl) into v_count_nr;
    dbms_output.put_line('Length:'||v_count_nr);
end;

 

Transaction restrictions

There are a number of restrictions when using LOBs for transaction control:

1.       Each locator may or may not contain a transaction ID.

·         If you already started a new transaction (SELECT FOR UPDATE, INSERT/UPDATE/DELETE, PRAGMA autonomous transaction), your locator will contain the transaction ID.

·         If you use SELECT FOR UPDATE of an LOB column, the transaction is started implicitly and your locator will contain the transaction ID.

2.       You cannot read using a locator that contains an old transaction ID (for example, you made a number of data changes and committed them) while your session isolation level is set to SERIALIZABLE. This is a very rare case.

3.       First write using the locator:

·         You need to have a lock on the record containing the LOB at the moment you perform the update (not necessarily at the moment of acquiring the locator). That lock can be the result of SELECT FOR UPDATE, INSERT, or UPDATE. (Updating any column in the record is enough to create the lock.)

   

declare
    v_manual_cl   CLOB;
    v_add_tx      VARCHAR2 (2000) :='Loaded: '||TO_CHAR(SYSDATE,'mm/dd/yyyy hh24:mi');
begin
    select manual_cl
      into v_manual_cl
      from goods_tab
     where item_id = pkg_global.v_current_id;

    -- updating any column locks the row and starts the transaction
    update goods_tab
    set name_tx = '<'||name_tx||'>'
    where item_id = pkg_global.v_current_id;

    DBMS_LOB.writeappend (v_manual_cl, LENGTH (v_add_tx), v_add_tx);
end;

 

·         If your locator did not contain a transaction ID but was used to update the LOB, it will now contain the transaction ID (as in the previous example). If your locator already contained a transaction ID, nothing changes.

4.       Consecutive write using the locator:

·         If your locator contains a transaction ID that differs from the current one, the update will always fail because locators cannot span transactions as shown here:

 

declare
    v_manual_cl   CLOB;
    v_add_tx      VARCHAR2 (2000) :='Loaded: '||TO_CHAR(SYSDATE,'mm/dd/yyyy hh24:mi');
begin
    select manual_cl
      into v_manual_cl
      from goods_tab
     where item_id = pkg_global.v_current_id;

    update goods_tab
    set name_tx = name_tx||'>'
    where item_id = pkg_global.v_current_id;

    DBMS_LOB.writeappend (v_manual_cl, LENGTH (v_add_tx), v_add_tx);

    rollback; -- end of transaction

    DBMS_LOB.writeappend (v_manual_cl, LENGTH (v_add_tx), v_add_tx); -- FAIL!
end;

 

This information can be simplified into three rules:

1.       You can perform read operations using locators as much as you want.

2.       If you want to write using a locator, you need to have a lock on the record.

3.       If you want to write using the same locator multiple times, you have to do it in the same transaction.

 

11g Additions (SecureFiles only)

In terms of pure PL/SQL, the SecureFile storage implementation brought a number of interesting additions. The first one continues the recent trend of providing extra granularity to the developer: you can now set advanced parameters not only per column, but per individual LOB. The following example illustrates logic that was previously impossible to implement in pure PL/SQL, namely applying compression only to sufficiently large LOBs:

-- enable compression for the whole column
alter table goods_tab modify lob(manual_cl) (compress high);

-- check compression
select l.column_name, l.segment_name, s.bytes/1024/1024 Mb, l.compression, l.securefile
  from user_lobs l,
       user_segments s
 where l.segment_name = s.segment_name
   and l.table_name = 'GOODS_TAB';

-- example of the procedure
create or replace procedure p_loadLob(i_id number, i_cl CLOB)
is
    v_cl CLOB;
begin
    insert into goods_tab (item_id, manual_cl)
    values (i_id, i_cl)
    returning manual_cl into v_cl;

    if length(v_cl)<100000 then
        -- small documents are not worth the CPU cost: switch compression off
        DBMS_LOB.setoptions(v_cl, DBMS_LOB.opt_compress, DBMS_LOB.compress_off);
    end if;
END;
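A quick usage sketch for the procedure above, assuming the object_seq sequence from the earlier examples:

declare
    v_cl CLOB := 'Short manual';  -- below the threshold: compression is switched off
begin
    p_loadLob(object_seq.nextval, v_cl);
    commit;
end;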

The other set of new tools has to do with advanced automatic control of the data chunks written to the tablespace. Since Oracle manages chunks on the fly, it seems logical that it should be possible to directly access and modify any data up to the highest possible chunk size (32K). And indeed, the DBMS_LOB package now includes the procedures FRAGMENT_INSERT, FRAGMENT_DELETE, FRAGMENT_MOVE, and FRAGMENT_REPLACE, with logic as described:

declare
    v_cl CLOB;
begin
    select manual_cl
      into v_cl
      from goods_tab
     where item_id = 1
    for update;

    dbms_lob.fragment_insert(v_cl, 3, 5, '111');      -- insert 3 characters at offset 5
    dbms_output.put_line(substr(v_cl,1,10));

    dbms_lob.fragment_move(v_cl, 3, 5, 2);            -- move 3 characters from offset 5 to offset 2
    dbms_output.put_line(substr(v_cl,1,10));

    dbms_lob.fragment_replace(v_cl, 3, 4, 2, '9999'); -- replace 3 characters at offset 2 with 4 new ones
    dbms_output.put_line(substr(v_cl,1,10));

    dbms_lob.fragment_delete(v_cl, 4, 2);             -- delete 4 characters starting at offset 2
    dbms_output.put_line(substr(v_cl,1,10));
end;

Real-World examples

The following information is based on an actual project that the author worked on, in which handling large objects was particularly relevant.

HTML on the fly

Since most modern development environments now support CLOBs, they are very useful as a way of communicating large amounts of read-only information to the client. Assume that you have a large organizational structure that can be versioned. However, you must validate the new structural model before rolling it over the old one. There may be no errors, a few errors, or many errors.

The final solution involved building an HTML page on the fly and returning it as a CLOB:

·         A function takes a parameter and returns a CLOB in one round-trip.

·         Temporary CLOBs are released automatically.

·         Full formatting can be done in the CLOB itself. Since most current development environments understand HTML, it is the best choice.

This implementation solved all the potential problems of alternative solutions:

·         It is a single-call operation – no session-dependency

·         All temporary data is cleaned up automatically

·         Formatting could be adjusted at any time.

In simplified form, the code looks as follows:

function f_validateOrgStruct_CL (...) return CLOB
is
    v_out_cl CLOB;
    v_break_tx varchar2(4):='<BR>';
    v_hasErrors_yn varchar2(1):='N';
    ...
    <<number of cursors>>
    ...
    procedure p_addToClob (in_tx varchar2) is
    begin
        dbms_lob.writeappend(v_out_cl,length(in_tx),in_tx);
    end;
begin
    dbms_lob.createtemporary(v_out_cl,true,dbms_lob.call);

    p_addToClob('------------------VALIDATE RECRUITERS-------------'||v_break_tx);
    for rec_recrtr in c_recrtr loop
        if rec_recrtr.ric_tx is null then
            p_addToClob('  * '||rec_recrtr.rc_recrtr_dsp||' - missing code'||v_break_tx);
            v_hasErrors_yn:='Y';
        end if;
    end loop;
    ...
    if v_hasErrors_yn='Y' then
        p_addToClob(v_break_tx||'<font color="red">*** ERRORS ARE DETECTED! ***</font>');
    end if;

    return v_out_cl;
exception
when others then
    return '<font color="red">*** FATAL ERRORS! ***</font>'||v_break_tx||sqlerrm;
end;

 

The code is very straightforward. First, you create a temporary CLOB. Considering that the resulting size is not very large, you can make it cached (the second Boolean parameter set to TRUE). The third parameter is set to DBMS_LOB.Call (other options are DBMS_LOB.Session and, in 11gR2, DBMS_LOB.Transaction). Since you are not planning to reuse the LOB, it makes sense to mark it ready to be released immediately after the function finishes its execution. Now that you have initialized the CLOB, you can start writing error messages, remarks, headers, etc. This example only includes two HTML tags, <BR> and <FONT>, but the idea is clear. If, at some point, you need to change the formatting, you can simply alter the function without touching either the middle-tier or the client code.
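If you do not want to wait for the duration to expire, a temporary LOB can also be released explicitly; a minimal stand-alone sketch:

declare
    v_cl CLOB;
begin
    dbms_lob.createtemporary(v_cl, true, dbms_lob.session);
    dbms_lob.writeappend(v_cl, 5, 'Hello');
    dbms_output.put_line('Length: '||dbms_lob.getlength(v_cl));
    dbms_lob.freetemporary(v_cl);  -- releases the temporary tablespace space immediately
end;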

CLOBs and XML

One of the most critical lessons learned over years of working with Oracle’s XMLType is that internally (unless you specify otherwise – you can also use object-relational or binary XML storage) it contains a CLOB column created with the default settings. As already described, the default settings may or may not be the best option, depending on your implementation. For example, if the column is very actively accessed, having NOCACHE would be a really bad idea – and that is exactly what happened in one of my production environments. I noticed an abnormally high number of direct I/O operations, and the suspect was pretty obvious. What was not obvious was how to correctly adjust the parameters. The following example walks you through the whole chain of events:

create table goods_xml (id number, data_xml XMLType);

-- find the system-generated name of the underlying CLOB column
select *
from user_lobs
where table_name = 'GOODS_XML';

-- adjust the hidden column (the generated name will differ in your database)
alter table GOODS_XML modify lob(SYS_NC00003$) (cache);

-- new XMLType columns can be given explicit storage options up front
alter table GOODS_XML add review_xml XMLType
XMLType column review_xml
store as SecureFile CLOB
    review_seg(
        tablespace USERS
        enable storage in row
        cache);
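A quick hypothetical check that the new table behaves as expected:

insert into goods_xml (id, data_xml)
values (1, XMLType('<item><name>The New Book</name></item>'));
commit;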

Emails from the database

Another area where large objects can be very useful is when you need to send email directly from the database using the UTL_SMTP package. Sending attachments is particularly challenging. Due to space limitations, this code is based on the well-known Oracle package DEMO_MAIL (found in the manuals or on OTN):

 

procedure p_attach_file (conn in out nocopy utl_smtp.connection,
    i_filename_tx varchar2,
    i_directory_tx varchar2)
is
   v_file_bl     BLOB;
   v_file_bf     BFILE:=BFILENAME (i_directory_tx, i_filename_tx); -- directory first, then file name
   src_offset    pls_integer := 1;
   dst_offset    pls_integer := 1;
   v_length_nr   pls_integer;

   v_mod_nr     pls_integer;
   v_pieces_nr  pls_integer;
   v_counter_nr pls_integer         := 1;

   v_amt_nr     binary_integer      := 54; -- amount of bytes to be read at once
   v_filepos_nr pls_integer         := 1;

   v_buf_raw    raw (100); -- buffer to communicate with UTL_SMTP
begin
    dbms_lob.createtemporary (v_file_bl, true, dbms_lob.call);
    dbms_lob.fileopen (v_file_bf, dbms_lob.file_readonly);
    v_length_nr:=dbms_lob.getlength(v_file_bf); -- length of the source file
    dbms_lob.loadblobfromfile(v_file_bl, v_file_bf, v_length_nr, dst_offset, src_offset);
    dbms_lob.fileclose (v_file_bf);

    demo_mail.begin_attachment (conn,'application/octet-stream', true, i_filename_tx, 'base64');

    v_mod_nr    := mod (v_length_nr, v_amt_nr);
    v_pieces_nr := trunc (v_length_nr / v_amt_nr);

    while (v_counter_nr <= v_pieces_nr) loop
        dbms_lob.read (v_file_bl, v_amt_nr, v_filepos_nr, v_buf_raw);
        demo_mail.write_raw (conn, utl_encode.base64_encode (v_buf_raw));
        v_filepos_nr := v_counter_nr * v_amt_nr + 1;
        v_counter_nr := v_counter_nr + 1;
    end loop;

    if (v_mod_nr <> 0) then
        -- read the remaining piece of the original file
        dbms_lob.read (v_file_bl, v_mod_nr, v_filepos_nr, v_buf_raw);
        -- the best way to send binary data is via BASE64 encoding
        demo_mail.write_raw (conn, utl_encode.base64_encode (v_buf_raw));
    end if;

    demo_mail.end_attachment (conn);
end;

 

Assuming that you have already established a connection to an SMTP server, the logic of the current routine is as follows:

1.       Load the file you are planning to attach into a temporary LOB. This step is needed for performance reasons. Of course, you could read the data directly via the BFILE, but in that case each operation would cause a direct read (and a large number of wait events with any significant number of users). Even loading a lot of data into the temporary tablespace is not the best idea; however, since you never send more than a few megabytes, the database can easily handle it.

2.       The second step is to attach the data to the email. By SMTP protocol standards, you cannot just pour binary data into the body of the email: the data should be encoded into the special BASE64 format (a textual representation of binary data), and you should only send a limited number of bytes at a time. This is the reason why you need a buffer. Since BASE64 expands every 3 source bytes into 4 output characters, reading 54 bytes at a time produces 72-character chunks that fit comfortably into the RAW(100) buffer. The amount of data retrieved in one step can be tuned for your environment (up to 2000 bytes at once); common knowledge recommends not setting this value higher than 100.

 

Summary

Large objects can be very useful in current system development because most information can now be stored in the database. But as with any advanced feature, you need a thorough understanding of its core mechanisms, ideas, and principles; otherwise, you can do more harm to your system than good. Don’t ever use new features in production systems before they have gone through a full testing cycle, and don’t believe everything you read without testing it for yourself (not even this paper)!

About the Author

Michael Rosenblum is a Software Architect/Development DBA at Dulcian, Inc. where he is responsible for system tuning and application architecture. Michael supports Dulcian developers by writing complex PL/SQL routines and researching new features. He is the co-author of PL/SQL for Dummies (Wiley Press, 2006), contributing author of Expert PL/SQL Practices (APress, 2011), and author of a number of database-related articles (IOUG Select Journal, ODTUG Tech Journal) and conference papers. Michael is a frequent presenter at various regional and national Oracle user group conferences and winner of the ODTUG Kaleidoscope 2009 Best Speaker Award.