loading an external table using Oracle Data Pump (NEW 10g FEATURE)


The oracle_datapump specifies an “access driver”, a method whereby you can load an external table using Oracle Data Pump.You can now unload and transform large volumes of data into a platform-independent flat file for data propagation or storage, either serially or in parallel. To unload data using an external table, utilize the Oracle Database 10g new feature called the oracle_datapump access driver.
In versions prior to Oracle Database 10g, the only access method available for Oracle external tables was oracle_loader, essentially a sqlloader interface that only allowed reading from external tables into the Oracle database. Now, with the advent of the oracle_datapump, we can unload internal tables in external flat files as well.

For example, here is a command that creates an external table as an unload of existing data:

CREATE TABLE proj_ext
ORGANIZATION EXTERNAL (TYPE ORACLE_DATAPUMP DEFAULT DIRECTORY proj_dir
ACCESS PARAMETERS (LOGFILE log_dir:proj) LOCATION (‘proj_dmp’))
AS SELECT * FROM pl.projects;

The primary clause that determines if an external table is used for loading or unloading is the access_parameters clause that follows the ORGANIZATION EXTERNAL in the above example.

Oracle 10g Unloading Data Using External Tables and the oracle_datapump Access Driver

To unload data using external tables you use the oracle_datapump access driver. The data stream that is utilized to unload data is a proprietary format and contains all the column data for every row being unloaded. There is no realistic upper bound on the size of a stream, and for this reason, an oracle_datapump stream is usually handled in pieces that are called stream fragments.
The oracle_datapump VERSION Clause

When data is unloaded from an Oracle database into an earlier version of the database, the VERSION clause must be used. In the case where the source database is of a later version than the target, the VERSION clause should be specified for the external table used to unload the data, and the value specified must be the version that will be used to read the data.

When using external tables to export data, the oracle_datapump access driver must be used to populate the external file. The oracle_datapump driver writes the data in a binary format that can only be read by the oracle_datapump access driver. Once the file has been populated with data, that file can be used as the data file for another external table in the same database or in a different database, as long as it is Oracle Database 10g or later, and has the oracle_datapump access driver available.

The process by which you use the oracle_datapump access driver to unload and load data is shown below. (The example assumes that the directory object proj_dir already exists and that user PL has read and write access to it.)

To unload data into an external file, the external table must be created using the AS SELECT clause. The next example creates an external table named projects_ext and uses AS SELECT to populate the external data file with the data from table projects in the PL schema.

SQL> CREATE TABLE projects_ext
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY proj_dir
6 LOCATION (‘proj_ext.dmp’)
7 )
8 AS SELECT * FROM projects;
Table created.

To verify that the creation worked, describe both the projects and the new external table. They should both match.

SQL> DESCRIBE projects

Name Null? Type
————————- ——— ——–
PROJ_NO NOT NULL NUMBER(6)
PROJ_NAME NOT NULL VARCHAR2(20)
DESCRIPTION NOT NULL VARCHAR2(255)

SQL> DESCRIBE projects_ext

Name Null? Type
————————- ——– ——–
PROJ_NO NOT NULL NUMBER(6)
PROJ_NAME NOT NULL VARCHAR2(20)
DESCRIPTION NOT NULL VARCHAR2(255)

Once the external table is created, it can be queried just like any internal table.

To show this, let us select the count of records in the external table:

SQL> SELECT COUNT(*) FROM projects_ext;

COUNT(*)
———-
120

You can also verify a proper creation by comparing the data in the external table against the data in projects. There should be no differences.

SQL> SELECT * FROM projects MINUS SELECT * FROM projects_ext;

no rows selected

Once an external table has been created and the data file populated by the CREATE TABLE AS SELECT statement, then no data may be added, updated or deleted from the external table. Any future attempt to modify the data in the external table will fail with an error.

The following example shows an attempt to perform DML on an existing external table. This will return the ORA-30657 error, as shown.

SQL> DELETE FROM projects_ext WHERE proj_no = 16;
DELETE FROM projects_ext WHERE proj_no = 16
*
ERROR at line 1:
ORA-30657: operation not supported on external organized table

You can now use the data file created for the external table as the data file for another external table in the same database or a different database.

Note that when you create an external table that uses an existing file, there is no AS SELECT clause for the CREATE TABLE statement. Also note that the datatype for any number compatible datatype (integer, decimal, etc.) column will be converted to a NUMBER datatype.

To speed the creation of an external data file using the external table oracle_datapump access method, the table can be created in parallel, let’s look at that next.

Oracle Parallel Unload

To perform a successful parallel unload operation, the data file must be placed on a disk big enough to hold all the data being written. If there is not sufficient space for all of the data in the location specified by the directory object, then an error will be returned for the CREATE TABLE AS SELECT statement.

One way to alleviate the space problem is to create multiple files on multiple disk assets when executing the CREATE TABLE AS SELECT.

Multiple files are created by specifying multiple independent locations in the LOCATION clause and by specifying the PARALLEL clause. Each parallel slave that is created to populate the external table will write to its own file.

The number of files in the LOCATION clause must match the degree of parallelization, since each slave requires its own file. Extra files that are specified will be ignored. If you do not specify enough files for the degree of parallelization, then the degree of parallelization will be silently lowered to match the number of files specified in the LOCATION clause.

Here is an example of unloading the projects table into two files.

SQL> CREATE TABLE projects_ext_2
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY proj_dir
6 LOCATION (‘proj_ext1.dmp’, ‘proj_ext2.dmp’)
7 )
8 PARALLEL 2
9 AS SELECT * FROM pl.projects;

Table created.

The degree of parallelization is not tied to the number of files in the LOCATION clause when reading from oracle_datapump external tables. The information in the data files about the granulation of the data is provided so that multiple parallel slaves can read different portions of the same file. Therefore, even if there is only one external data file, the degree of parallelization can be increased to reduce the time required to read the file.

Oracle 10g Supported Datatypes

When moving external table data sets between two platforms, you may encounter the following situations:

– The database character set and the database national character set may be different between the two platforms.

– The endianness of the platforms for the two databases may be different.

– The oracle_datapump access driver can automatically resolve some of these situations.
You will receive an ORA-30656 error if you attempt to use a datatype that is not supported for external tables. For example, if the unsupported datatype, LONG, is used.
Oracle 10g Unsupported Datatypes

The external table feature supports only a subset of all possible column datatypes. In particular, it supports character datatypes (except LONG), the RAW datatype, all numeric datatypes, and all date, timestamp, and interval datatypes.

Leave a comment