Oracle Database provides the SAMPLE clause that can be issued with a SELECT statement over a table. In the following query, we’re randomly selecting records from the CUSTOMERS table with a 20% probability.
This means that each record has a 20% chance of being selected. Perhaps contrary to expectations, this will normally not produce a result with 20% of the records from the CUSTOMERS table. This approximate sample size is quite adequate for most situations.
A variant of the SAMPLE clause is SAMPLE BLOCK, where each block of records has the same chance of being selected, 20% in our example. Since records are selected at the block level, this offers a performance improvement for large tables and should not adversely impact the randomness of the sample.
SAMPLE and SAMPLE BLOCK allow the sample percent to range from .000001 to, but not including, 100. An optional second parameter is the seed value, used to help ensure repeatability between executions. The seed value can range from 0 to 4294967295.
SAMPLE BLOCK (20, 8621);
Repeatability is important if, e.g., you want to compare models or test results across model builds, where the only variation is model settings.
In 10g the syntax for the sample clause is:
#1. sample [block] ( N [,M] ) [seed(S)]
You can sample rows or blocks (optional keyword [block])
You must specify the percentage sample size (N) – recognizing that the result will only approximate that size
You can specify that the sample should consist of groups of ([,M]) consecutive rows or blocks
You can supply a seed value ([seed (S)]) to the random number generator that drives the sample so that the sample is repeatable.
The ‘seed’ option appeared in 10g, and is documented there.
The ‘consecutive rows/blocks’ option doesn’t seem to be documented but appears in 9i.
While SAMPLE is built into the SQL syntax, it has limitations when applied to views or complex joins. SAMPLE relies on the existence of primary keys. If the underlying tables have the necessary primary keys, you may be able to use a view, but if the primary keys are absent, the SAMPLE clause may not work.
To address this limitation, we can use a technique that uses ORA_HASH.
The ORA_HASH technique can be used with both tables and views, whether or not primary keys are specified. The Oracle Data Miner user interface uses this technique to support sampling of both tables and views.