Cluster Ready Services (CRS) and Cluster Synchronization Services (CSS)


CRS manages high-availability (HA) operations within the cluster. The crsd process carries out CRS operations. CRS manages two kinds of resources:

  • Cluster resources
  • Local resources

A cluster resource is cluster aware and is managed across the entire cluster with the crsctl command. Cluster resources are subject to cross-node switchover and failover: a resource can be assigned to one or more nodes, but it may be reassigned to a different node (or failed over to a different node) on demand. Cluster resources are managed by the CRS daemon (crsd), and CRS uses the OCR to manage them.

A local resource runs on each node of the cluster. Examples of cluster resources are RAC instances and listeners. CRS can control these services, starting them, stopping them and restarting them in the event of a failure.
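As a rough illustration of how these resources can be inspected, the sketch below wraps crsctl in a small helper. It assumes CRS_HOME points at the Grid Infrastructure home, as in the commands later in this post; the function name list_crs_resources is made up for this example.

```shell
#!/bin/sh
# list_crs_resources
# Prints the resources known to CRS, if crsctl can be found under CRS_HOME.
# CRS_HOME is assumed to be set to the Grid Infrastructure home.
list_crs_resources() {
    crsctl_bin="${CRS_HOME:-}/bin/crsctl"
    if [ ! -x "$crsctl_bin" ]; then
        echo "crsctl not found at $crsctl_bin"
        return 1
    fi
    # -t prints a tabular view that lists local and cluster resources
    "$crsctl_bin" status resource -t
}

# Usage on a cluster node:
#   list_crs_resources
```

On a host without Clusterware the helper only prints a message and returns a non-zero status, so it is safe to try anywhere.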

CSS

CSS is the service that determines which nodes are available to the cluster. CSS also supports other cluster processes by providing node membership information and locking services. CSS communicates over the private interconnect and through the Clusterware voting disks. By combining heartbeat messages sent over the interconnect with the information on the voting disks, CSS determines the status of each node in the cluster.

CSS is also responsible for interfacing with any third-party Clusterware vendors. In these configurations CSS will interface with the vendor Clusterware and maintain the node membership information.

The CSS service is critical to Clusterware operations as it fences the operations of the nodes of the cluster. For example, if the interconnect fails on a given node then the failed node will no longer be able to communicate with the rest of the cluster. Without CSS controlling the situation, the isolated node could cause severe issues on the cluster including corruption of database data. This is what is known as a split-brain condition.

To avoid split-brain conditions, CSS sends heartbeat messages across the cluster interconnect. If a node fails (say the interconnect fails or the node freezes), that node will no longer send heartbeat messages, and the surviving nodes will detect that its heartbeats have stopped. CSS then uses the voting disks to determine which node has gone offline and works with Oracle Clusterware to evict the missing node from the cluster.
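The timeout idea behind the network heartbeat can be shown with a toy model. This is only a sketch of the general technique, not how ocssd is implemented: it ignores the voting disks entirely, the function name node_status is invented, and the 30-second MISSCOUNT is an illustrative value rather than a statement about any particular installation.

```shell
#!/bin/sh
# Toy model of a heartbeat timeout check (not the real CSS logic).
# MISSCOUNT is an assumed timeout in seconds, chosen for illustration.
MISSCOUNT=30

# node_status LAST_HEARTBEAT NOW
# Both arguments are timestamps in seconds.
# Prints "alive" if the node was heard from less than MISSCOUNT seconds ago,
# otherwise prints "evict".
node_status() {
    last=$1
    now=$2
    if [ $((now - last)) -ge "$MISSCOUNT" ]; then
        echo "evict"
    else
        echo "alive"
    fi
}

node_status 100 110   # 10 seconds of silence
node_status 100 135   # 35 seconds of silence
```

The first call prints alive and the second prints evict. On a real cluster the configured timeout can be read with crsctl get css misscount.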

CSS uses several different processes. Failure of any of these processes will result in a restart of the cluster. The CSS processes are:

  • CSS daemon (ocssd) – Manages cluster node membership information. It’s also used in non-RAC installs to provide Group Services (GS). ASM uses GS to register itself and its disk groups.
  • CSS Agent (cssdagent) – Monitors the cluster and provides fencing services (this was the oprocd daemon in previous versions). The CSS Agent is also responsible for monitoring vendor Clusterware.
  • CSS Monitor (cssdmonitor) – Monitors for node hangs, monitors the OCSSD processes for hangs, and is also responsible for monitoring vendor Clusterware.
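A quick way to see whether these three processes are present is to scan ps output for their binary names. The helper below is a minimal sketch; the name check_css_processes is made up, and it reads the ps listing from standard input so it can also be run against saved output.

```shell
#!/bin/sh
# check_css_processes
# Reads `ps -ef` output on stdin and reports, for each CSS process name,
# whether a binary of that name under a bin/ directory appears in it.
check_css_processes() {
    ps_output=$(cat)
    for proc in ocssd.bin cssdagent cssdmonitor; do
        # -q: no output, status only; -F: match the name as a fixed string
        if printf '%s\n' "$ps_output" | grep -qF "/bin/$proc"; then
            echo "$proc: running"
        else
            echo "$proc: NOT running"
        fi
    done
}

# Usage on a live node:
#   ps -ef | check_css_processes
```

Matching on the path fragment /bin/ followed by the name keeps the wrapper shell script ocssd from being counted as ocssd.bin.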

Clusterware Process Startup

Oracle Clusterware 11g Release 2 changes the way that Clusterware is started. In a Linux install, Clusterware is now started with a single init script, init.ohasd, which replaces a number of scripts that were previously used.
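To confirm that the init script is registered, one option is to look for it in the init configuration. The sketch below assumes the entry lives in /etc/inittab, which is an assumption about the host and will not hold on systems that use upstart or systemd; the function name has_ohasd_entry is made up for this example.

```shell
#!/bin/sh
# has_ohasd_entry [FILE]
# Succeeds if FILE mentions init.ohasd.
# FILE defaults to /etc/inittab, which is an assumed location.
has_ohasd_entry() {
    inittab=${1:-/etc/inittab}
    [ -r "$inittab" ] && grep -q 'init\.ohasd' "$inittab"
}

if has_ohasd_entry; then
    echo "init.ohasd is registered in /etc/inittab"
else
    echo "no init.ohasd entry found in /etc/inittab"
fi
```

A different file can be passed as the first argument if the entry is kept elsewhere on your platform.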

———————————————————————————————————

Stop (shut down) the Oracle cluster stack

su - root
cd $CRS_HOME/bin
# ./crsctl stop crs (must be run on each node)

./srvctl stop nodeapps -n node_name --> in 11.2 stops only ONS and eONS because of some dependencies.

To check whether the database is running, look for its smon process. If the only line returned is the grep command itself, as here, the database is down:

ps -ef | grep smon
oracle 246196 250208 0 14:29:11 pts/0 0:00 grep smon

To check whether the database listeners are running, look for lsnr. Here only the grep command itself is listed, so no listener is running:

ps -ef | grep lsnr
root 204886 229874 0 14:30:07 pts/0 0:00 grep lsnr

Here the listeners are running:

ps -ef | grep lsnr

oracle 282660 1 0 14:07:34 - 0:00 /oracle/grid/crs/11.2/bin/tnslsnr LISTENER_SCAN2 -inherit
oracle 299116 250208 0 14:30:00 pts/0 0:00 grep lsnr
oracle 303200 1 0 14:23:44 - 0:00 /oracle/grid/crs/11.2/bin/tnslsnr LISTENER_SCAN1 -inherit
oracle 315432 1 0 14:07:35 - 0:00 /oracle/grid/crs/11.2/bin/tnslsnr LISTENER_SCAN3 -inherit
oracle 323626 1 0 14:07:34 - 0:00 /oracle/grid/crs/11.2/bin/tnslsnr LISTENER -inherit

To check whether any clusterware component is running, look for crs. Here only the grep command itself is listed, so the cluster stack is down:

ps -ef | grep crs
root 204842 229874 0 14:27:19 pts/0 0:00 grep crs

Here the clusterware components (resources) are running:

ps -ef | grep crs
root 155856 1 1 14:05:47 - 0:22 /oracle/grid/crs/11.2/bin/ohasd.bin reboot
root 159940 1 0 14:07:08 - 0:01 /oracle/grid/crs/11.2/bin/oclskd.bin
oracle 221270 1 0 14:06:43 - 0:02 /oracle/grid/crs/11.2/bin/gpnpd.bin
root 225322 1 0 14:06:45 - 0:02 /oracle/grid/crs/11.2/bin/cssdmonitor
oracle 229396 1 0 14:06:41 - 0:00 /oracle/grid/crs/11.2/bin/gipcd.bin
oracle 233498 1 0 14:06:41 - 0:00 /oracle/grid/crs/11.2/bin/mdnsd.bin
root 253952 1 0 14:06:46 - 0:01 /oracle/grid/crs/11.2/bin/orarootagent.bin
root 258060 1 0 14:06:59 - 0:00 /oracle/grid/crs/11.2/bin/octssd.bin reboot
root 262150 1 0 14:06:47 - 0:00 /bin/sh /oracle/grid/crs/11.2/bin/ocssd
oracle 270344 262150 1 14:06:47 - 0:11 /oracle/grid/crs/11.2/bin/ocssd.bin
oracle 274456 156062 0 14:07:10 - 0:00 /oracle/grid/crs/11.2/bin/evmlogger.bin -o /oracle/grid/crs/11.2/evm/log/evmlogger.info -l /oracle/grid/crs/11.2/evm/log/evmlogger.log
oracle 282660 1 0 14:07:34 - 0:00 /oracle/grid/crs/11.2/bin/tnslsnr LISTENER_SCAN2 -inherit
root 286742 1 6 14:07:17 - 0:36 /oracle/grid/crs/11.2/bin/orarootagent.bin
oracle 303200 1 1 14:23:44 - 0:00 /oracle/grid/crs/11.2/bin/tnslsnr LISTENER_SCAN1 -inherit
oracle 315432 1 0 14:07:35 - 0:00 /oracle/grid/crs/11.2/bin/tnslsnr LISTENER_SCAN3 -inherit
oracle 323626 1 0 14:07:34 - 0:00 /oracle/grid/crs/11.2/bin/tnslsnr LISTENER -inherit
oracle 156062 1 0 14:07:01 - 0:02 /oracle/grid/crs/11.2/bin/evmd.bin
root 229692 1 0 14:06:46 - 0:02 /oracle/grid/crs/11.2/bin/cssdagent
oracle 233762 1 0 14:06:40 - 0:01 /oracle/grid/crs/11.2/bin/oraagent.bin
oracle 246226 250208 0 14:32:34 pts/0 0:00 grep crs
oracle 254218 1 0 14:06:53 - 0:01 /oracle/grid/crs/11.2/bin/diskmon.bin -d -f
root 258554 1 0 14:07:01 - 0:09 /oracle/grid/crs/11.2/bin/crsd.bin reboot
oracle 270612 1 0 14:07:28 - 0:03 /oracle/grid/crs/11.2/bin/oraagent.bin
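
In every listing above, grep reports its own process, which makes the "nothing is running" case easy to misread. The three checks can be wrapped in a small helper that filters that line out. This is a minimal sketch; the function name find_procs is made up for this example.

```shell
#!/bin/sh
# find_procs NAME
# Reads `ps -ef` output on stdin and prints the lines that mention NAME,
# dropping any line that contains the word grep (which includes the
# grep command itself).
# The exit status is non-zero when no line is left.
find_procs() {
    grep -- "$1" | grep -v 'grep'
}

for name in smon lsnr crs; do
    if ps -ef | find_procs "$name" >/dev/null; then
        echo "$name: running"
    else
        echo "$name: not running"
    fi
done
```

An alternative with the same effect is to write the pattern as a bracket expression, for example ps -ef | grep '[s]mon', because the grep command line then no longer contains the literal text it is searching for.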

REFERENCE: http://oraclerac-rips.blogspot.in/     http://www.in-oracle.com/
