This post lists the steps for removing a failed node from a cluster. The steps differ from those in the previous node deletion posts (11gR1, 11gR2 and 12c) in that one node has suffered a catastrophic failure and is not available for any kind of command or script execution. Therefore all the activities involved in removing the failed node are executed from a surviving node.
The environment used in this case is a two-node RAC with role separation (11.2.0.4). Under normal operation it has the following resources and status (status output reformatted for readability).
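The status tables in this post are not raw clusterware output; they were reformatted for readability. A minimal sketch of how such a listing can be gathered on 11.2, assuming the grid user's environment points at the Grid home:

$ crsctl stat res -t                    # built-in tabular status of all resources
$ crsctl stat res ora.std11g2.db -v     # verbose attributes of a single resource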
Resource Name                   Type                       Target   State         Host
------------------------------  -------------------------  -------  ------------  -------
ora.CLUSTER_DG.dg               ora.diskgroup.type         ONLINE   ONLINE        rhel6m1
ora.CLUSTER_DG.dg               ora.diskgroup.type         ONLINE   ONLINE        rhel6m2
ora.DATA.dg                     ora.diskgroup.type         ONLINE   ONLINE        rhel6m1
ora.DATA.dg                     ora.diskgroup.type         ONLINE   ONLINE        rhel6m2
ora.FLASH.dg                    ora.diskgroup.type         ONLINE   ONLINE        rhel6m1
ora.FLASH.dg                    ora.diskgroup.type         ONLINE   ONLINE        rhel6m2
ora.MYLISTENER.lsnr             ora.listener.type          ONLINE   ONLINE        rhel6m1
ora.MYLISTENER.lsnr             ora.listener.type          ONLINE   ONLINE        rhel6m2
ora.MYLISTENER_SCAN1.lsnr       ora.scan_listener.type     ONLINE   ONLINE        rhel6m2
ora.asm                         ora.asm.type               ONLINE   ONLINE        rhel6m1
ora.asm                         ora.asm.type               ONLINE   ONLINE        rhel6m2
ora.cvu                         ora.cvu.type               ONLINE   ONLINE        rhel6m2
ora.gsd                         ora.gsd.type               OFFLINE  OFFLINE
ora.gsd                         ora.gsd.type               OFFLINE  OFFLINE
ora.net1.network                ora.network.type           ONLINE   ONLINE        rhel6m1
ora.net1.network                ora.network.type           ONLINE   ONLINE        rhel6m2
ora.oc4j                        ora.oc4j.type              ONLINE   ONLINE        rhel6m2
ora.ons                         ora.ons.type               ONLINE   ONLINE        rhel6m1
ora.ons                         ora.ons.type               ONLINE   ONLINE        rhel6m2
ora.registry.acfs               ora.registry.acfs.type     ONLINE   ONLINE        rhel6m1
ora.registry.acfs               ora.registry.acfs.type     ONLINE   ONLINE        rhel6m2
ora.rhel6m1.vip                 ora.cluster_vip_net1.type  ONLINE   ONLINE        rhel6m1
ora.rhel6m2.vip                 ora.cluster_vip_net1.type  ONLINE   ONLINE        rhel6m2
ora.scan1.vip                   ora.scan_vip.type          ONLINE   ONLINE        rhel6m2
ora.std11g2.db                  ora.database.type          ONLINE   ONLINE        rhel6m1
ora.std11g2.db                  ora.database.type          ONLINE   ONLINE        rhel6m2
ora.std11g2.myservice.svc       ora.service.type           ONLINE   ONLINE        rhel6m1
ora.std11g2.myservice.svc       ora.service.type           ONLINE   ONLINE        rhel6m2
ora.std11g2.abx.domain.net.svc  ora.service.type           ONLINE   ONLINE        rhel6m2
ora.std11g2.abx.domain.net.svc  ora.service.type           ONLINE   ONLINE        rhel6m1
After node 2 (rhel6m2 in this case) suffers a catastrophic failure, the resources and their status are as below. There are offline and failed-over (VIP) resources from rhel6m2.

Resource Name                   Type                       Target   State         Host
------------------------------  -------------------------  -------  ------------  -------
ora.CLUSTER_DG.dg               ora.diskgroup.type         ONLINE   ONLINE        rhel6m1
ora.DATA.dg                     ora.diskgroup.type         ONLINE   ONLINE        rhel6m1
ora.FLASH.dg                    ora.diskgroup.type         ONLINE   ONLINE        rhel6m1
ora.MYLISTENER.lsnr             ora.listener.type          ONLINE   ONLINE        rhel6m1
ora.MYLISTENER_SCAN1.lsnr       ora.scan_listener.type     ONLINE   ONLINE        rhel6m1
ora.asm                         ora.asm.type               ONLINE   ONLINE        rhel6m1
ora.cvu                         ora.cvu.type               ONLINE   ONLINE        rhel6m1
ora.gsd                         ora.gsd.type               OFFLINE  OFFLINE
ora.net1.network                ora.network.type           ONLINE   ONLINE        rhel6m1
ora.oc4j                        ora.oc4j.type              ONLINE   ONLINE        rhel6m1
ora.ons                         ora.ons.type               ONLINE   ONLINE        rhel6m1
ora.registry.acfs               ora.registry.acfs.type     ONLINE   ONLINE        rhel6m1
ora.rhel6m1.vip                 ora.cluster_vip_net1.type  ONLINE   ONLINE        rhel6m1
ora.rhel6m2.vip                 ora.cluster_vip_net1.type  ONLINE   INTERMEDIATE  rhel6m1
ora.scan1.vip                   ora.scan_vip.type          ONLINE   ONLINE        rhel6m1
ora.std11g2.db                  ora.database.type          ONLINE   ONLINE        rhel6m1
ora.std11g2.db                  ora.database.type          ONLINE   OFFLINE
ora.std11g2.myservice.svc       ora.service.type           ONLINE   ONLINE        rhel6m1
ora.std11g2.myservice.svc       ora.service.type           ONLINE   OFFLINE
ora.std11g2.abx.domain.net.svc  ora.service.type           ONLINE   OFFLINE
ora.std11g2.abx.domain.net.svc  ora.service.type           ONLINE   ONLINE        rhel6m1
Removal of the failed node's resources begins at the database resource level. There are two services running, and both have the DB instance on the failed node as a preferred instance (output is condensed).

$ srvctl config service -d std11g2
Service name: myservice
Service is enabled
Server pool: std11g2_myservice
Cardinality: 2
...
Preferred instances: std11g21,std11g22
Available instances:
Service name: abx.domain.net
Service is enabled
Server pool: std11g2_abx.domain.net
Cardinality: 2
...
Preferred instances: std11g21,std11g22
Available instances:
Modify the service configuration so that only the surviving instances are set as preferred instances (with -n, the instance list given with -i replaces the existing preferred/available configuration):

$ srvctl modify service -s myservice -d std11g2 -n -i std11g21 -f
$ srvctl modify service -s abx.domain.net -d std11g2 -n -i std11g21 -f
$ srvctl config service -d std11g2
Service name: myservice
Service is enabled
Server pool: std11g2_myservice
Cardinality: 1
..
Preferred instances: std11g21
Available instances:
Service name: abx.domain.net
Service is enabled
Server pool: std11g2_abx.domain.net
Cardinality: 1
..
Preferred instances: std11g21
Available instances:
$ srvctl status service -d std11g2
Service myservice is running on instance(s) std11g21
Service abx.domain.net is running on instance(s) std11g21
Remove the database instance on the failed node. The current database configuration still lists both instances:

$ srvctl config database -d std11g2
Database unique name: std11g2
Database name: std11g2
...
Database instances: std11g21,std11g22
Disk Groups: DATA,FLASH
Mount point paths:
Services: myservice,abx.domain.net
Type: RAC
Database is administrator managed
The instance removal is done using DBCA's instance management option. If the listener has a non-default name and port, accessing the database from DBCA will fail; to fix this, create a default listener (name LISTENER, port 1521). Also, if VNCR is used, remove the failed node from the registration list. Proceed to instance deletion by selecting the inactive instance on the failed node. As node 2 is not available, a warning is issued; click continue and proceed. During the execution various other warnings appear, such as being unable to remove /etc/oratab, and all of these can be ignored. However, DBCA did not run to the end: at 67% (observed through repeated runs on this 11.2.0.4 environment) a dialog box appeared with no message, just an OK button. Clicking it does not end the DBCA session but returns to the beginning; exit DBCA by clicking cancel afterwards. This does not appear to be a failure of DBCA to remove the instance. In fact the instance is removed, as subsequent instance operations list only the instance on the surviving node. Querying the database also shows that the undo tablespace and redo logs of instance 2 (std11g22 in this case) have been removed, and only the surviving instance's undo tablespace and redo logs remain.

SQL> select name from v$tablespace;
NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
EXAMPLE
TEST
7 rows selected.
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ ---------
1 1 1598 52428800 512 2 NO CURRENT 68471125 07-JUL-16 2.8147E+14
2 1 1597 52428800 512 2 YES INACTIVE 68467762 07-JUL-16 68471125 07-JUL-16
After the instance deletion, the database configuration lists only the surviving instance:

$ srvctl config database -d std11g2
Database unique name: std11g2
Database name: std11g2
...
Database instances: std11g21
Disk Groups: DATA,FLASH
Mount point paths:
Services: myservice,abx.domain.net
Type: RAC
Database is administrator managed
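As an alternative to the DBCA GUI, the instance deletion can also be scripted with DBCA in silent mode. A minimal sketch, not used in this run; the SYSDBA password argument is a placeholder:

$ dbca -silent -deleteInstance -nodeList rhel6m2 \
    -gdbName std11g2 -instanceName std11g22 \
    -sysDBAUserName sys -sysDBAPassword <sys_password>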
Once the database resources are removed, the next step is to remove the Oracle database home entry for the failed node from the inventory. As the node is unavailable, there is no un-installation involved; simply run the inventory update command listing only the surviving nodes. The inventory content for the Oracle home before the failed node is removed:
<HOME NAME="OraDb11g_home2" LOC="/opt/app/oracle/product/11.2.0/dbhome_4" TYPE="O" IDX="4">After the inventory update
<NODE_LIST>
<NODE NAME="rhel6m1"/>
<NODE NAME="rhel6m2"/>
</NODE_LIST>
</HOME>
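The <HOME> snippets shown here come from the central inventory's inventory.xml. Assuming the default inventory location from /etc/oraInst.loc (shown as /opt/app/oraInventory in the runInstaller output below), the entry can be inspected with something like:

$ grep -A 4 'OraDb11g_home2' /opt/app/oraInventory/ContentsXML/inventory.xml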
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={rhel6m1}"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4095 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oraInventory
'UpdateNodeList' was successful.
<HOME NAME="OraDb11g_home2" LOC="/opt/app/oracle/product/11.2.0/dbhome_4" TYPE="O" IDX="4">
<NODE_LIST>
<NODE NAME="rhel6m1"/>
</NODE_LIST>
</HOME>

The next step is to remove the cluster resources and the node itself. If any node is in a pinned state, unpin it; in this case both nodes are unpinned.

olsnodes -s -t
rhel6m1 Active Unpinned
rhel6m2 Inactive Unpinned
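Both nodes are unpinned here, so no action is needed. Had a node shown as Pinned, it could be unpinned from the surviving node as root, e.g.:

# crsctl unpin css -n rhel6m2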
Stop and remove the VIP resource of the failed node:

# srvctl stop vip -i rhel6m2-vip -f
# srvctl remove vip -i rhel6m2-vip -f
Remove the failed node from the cluster configuration:

# crsctl delete node -n rhel6m2
CRS-4661: Node rhel6m2 successfully deleted.
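As a quick sanity check before the formal cluvfy validation, the node list and the removed VIP can be re-queried; olsnodes should now list only rhel6m1, and the VIP resource lookup should report that the resource no longer exists:

$ olsnodes -s -t
$ crsctl stat res ora.rhel6m2.vip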
<HOME NAME="Ora11g_gridinfrahome2" LOC="/opt/app/11.2.0/grid4" TYPE="O" IDX="3" CRS="true">After inventory update
<NODE_LIST>
<NODE NAME="rhel6m1"/>
<NODE NAME="rhel6m2"/>
</NODE_LIST>
</HOME>
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={rhel6m1}" CRS=TRUE
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4095 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /opt/app/oraInventory
'UpdateNodeList' was successful.
<HOME NAME="Ora11g_gridinfrahome2" LOC="/opt/app/11.2.0/grid4" TYPE="O" IDX="3" CRS="true">
<NODE_LIST>
<NODE NAME="rhel6m1"/>
</NODE_LIST>
</HOME>
Validate the node removal with cluvfy:

cluvfy stage -post nodedel -n rhel6m2
Performing post-checks for node removal
Checking CRS integrity...
Clusterware version consistency passed
CRS integrity check passed
Node removal check passed
Post-check for node removal was successful.

Remove the default listener if one was created during the instance removal step.
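A sketch of the cleanup, assuming the temporary default listener was registered with the clusterware as LISTENER:

$ srvctl stop listener -l LISTENER
$ srvctl remove listener -l LISTENER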
The final status of the resources is as below.

Resource Name                   Type                       Target   State         Host
------------------------------  -------------------------  -------  ------------  -------
ora.CLUSTER_DG.dg               ora.diskgroup.type         ONLINE   ONLINE        rhel6m1
ora.DATA.dg                     ora.diskgroup.type         ONLINE   ONLINE        rhel6m1
ora.FLASH.dg                    ora.diskgroup.type         ONLINE   ONLINE        rhel6m1
ora.MYLISTENER.lsnr             ora.listener.type          ONLINE   ONLINE        rhel6m1
ora.MYLISTENER_SCAN1.lsnr       ora.scan_listener.type     ONLINE   ONLINE        rhel6m1
ora.asm                         ora.asm.type               ONLINE   ONLINE        rhel6m1
ora.cvu                         ora.cvu.type               ONLINE   ONLINE        rhel6m1
ora.gsd                         ora.gsd.type               OFFLINE  OFFLINE
ora.net1.network                ora.network.type           ONLINE   ONLINE        rhel6m1
ora.oc4j                        ora.oc4j.type              ONLINE   ONLINE        rhel6m1
ora.ons                         ora.ons.type               ONLINE   ONLINE        rhel6m1
ora.registry.acfs               ora.registry.acfs.type     ONLINE   ONLINE        rhel6m1
ora.rhel6m1.vip                 ora.cluster_vip_net1.type  ONLINE   ONLINE        rhel6m1
ora.scan1.vip                   ora.scan_vip.type          ONLINE   ONLINE        rhel6m1
ora.std11g2.db                  ora.database.type          ONLINE   ONLINE        rhel6m1
ora.std11g2.myservice.svc       ora.service.type           ONLINE   ONLINE        rhel6m1
ora.std11g2.abx.domain.net.svc  ora.service.type           ONLINE   ONLINE        rhel6m1
Useful Metalink notes:
How to remove/delete a node from Grid Infrastructure Clusterware when the node has failed [ID 1262925.1]
Steps to Remove Node from Cluster When the Node Crashes Due to OS/Hardware Failure and cannot boot up [ID 466975.1]
RAC on Windows: How to Remove a Node from a Cluster When the Node Crashes Due to OS/Hardware Failure and Cannot Boot [ID 832054.1]
Related Posts
Deleting a Node From 12cR1 RAC
Deleting a Node From 11gR2 RAC
Deleting a 11gR1 RAC Node