Instance recovery is required when an instance goes down abruptly, whether through a SHUTDOWN ABORT, the killing of a background process, or a crash of the node or of the instance itself. After such an ungraceful shutdown, the database must roll forward all changes recorded in the redo logs and then roll back any transactions that had not yet been committed. This process is known as instance recovery and is normally performed automatically by the SMON process.
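The roll-forward and roll-back phases can be pictured with a small toy model. This is only a sketch and not how Oracle stores or applies redo internally: the RedoRecord class, its fields, and the instance_recovery function are hypothetical names invented for illustration, and the old value carried in each record stands in for the undo information a real database keeps separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    """Hypothetical, simplified redo record (not Oracle's real format)."""
    txn: str    # transaction identifier
    block: str  # data block that was changed
    old: int    # value before the change (stands in for undo data)
    new: int    # value after the change


def instance_recovery(datafile, redo_log, committed):
    """Return block contents after roll forward followed by roll back."""
    blocks = dict(datafile)  # work on a copy of the on-disk state

    # Roll forward: reapply every logged change, committed or not.
    for rec in redo_log:
        blocks[rec.block] = rec.new

    # Roll back: undo the changes of uncommitted transactions, newest first.
    for rec in reversed(redo_log):
        if rec.txn not in committed:
            blocks[rec.block] = rec.old

    return blocks


# On-disk state at the time of the crash: no logged change had been written.
datafile = {"A": 1, "B": 2, "C": 3}
redo_log = [
    RedoRecord("T1", "A", 1, 10),  # T1 committed before the crash
    RedoRecord("T2", "B", 2, 20),  # T2 was still open at the crash
    RedoRecord("T2", "C", 3, 30),
]
recovered = instance_recovery(datafile, redo_log, committed={"T1"})
print(recovered)  # {'A': 10, 'B': 2, 'C': 3}
```

Only the committed change to block A survives; the changes made by the open transaction T2 are first reapplied and then undone, which mirrors the order of the two phases described above.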
The redo logs for all RAC instances are located either on an OCFS shared disk asset or on raw devices that are visible to all the other RAC instances. This allows any surviving node to perform recovery on behalf of a failed RAC node in the event of instance failure.
There are two basic types of failure in a RAC environment: instance and media. Instance failure involves the loss of one or more RAC instances, whether due to node failure or connectivity failure. Media failure involves the loss of one or more of the disk assets used to store the database files themselves.
Recovery from an instance failure in RAC proceeds through the following steps:
1. All nodes available.
2. One or more RAC instances fail.
3. Node failure is detected by any one of the remaining instances.
4. The Global Resource Directory (GRD) is reconfigured and distributed among
the surviving instances.
5. The instance that first detected the failure reads the failed
instance's redo logs to determine which logs are needed for recovery.
This task is performed by the SMON process of the instance that detected the failure.
6. Database activity is frozen until this point. SMON then issues recovery requests
for all the blocks that are needed for recovery. Once all of those blocks are available,
the remaining blocks, which are not needed for recovery, become available for normal processing.
7. Oracle rolls forward the blocks that were modified by the failed instance
but not yet written to disk, using the transactions recorded in the redo logs.
8. Once the redo logs have been applied, uncommitted transactions are rolled back using
undo data.
9. The database on the RAC is now fully available.
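The GRD reconfiguration in step 4 can be pictured as reassigning mastership of every resource that the failed instance was mastering. The sketch below is an illustration only: the remaster function, the dictionary layout, and the round-robin placement are all assumptions made for the example and do not describe Oracle's actual remastering algorithm.

```python
def remaster(grd, failed, survivors):
    """Reassign resources mastered by failed instances to the survivors.

    grd       -- dict mapping resource name to its mastering instance
    failed    -- set of instance names that have failed
    survivors -- ordered list of instance names still running

    Round-robin placement is an assumption made for this example only.
    """
    if not survivors:
        raise ValueError("no surviving instance is left to master resources")

    new_grd = {}
    next_slot = 0
    for resource, master in sorted(grd.items()):
        if master in failed:
            # Hand the orphaned resource to the next survivor in turn.
            new_grd[resource] = survivors[next_slot % len(survivors)]
            next_slot += 1
        else:
            # Resources mastered by healthy instances stay where they are.
            new_grd[resource] = master
    return new_grd


grd = {"blk1": "inst1", "blk2": "inst2", "blk3": "inst2", "blk4": "inst3"}
new_grd = remaster(grd, failed={"inst2"}, survivors=["inst1", "inst3"])
print(new_grd)
# {'blk1': 'inst1', 'blk2': 'inst1', 'blk3': 'inst3', 'blk4': 'inst3'}
```

Resources that were already mastered by a surviving instance are left untouched in this sketch, so only the entries owned by the failed instance change hands.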