Oracle RAC Inspection Process Explained (Part 5)

2014-11-24 16:22:20
s:0 dropped:0 overruns:0 frame:0

TX packets:58 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:45226 (44.1 KiB) TX bytes:9567 (9.3 KiB)

Interrupt:169 Base address:0x18a4

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

inet6 addr: ::1/128 Scope:Host

UP LOOPBACK RUNNING MTU:16436 Metric:1

RX packets:49025 errors:0 dropped:0 overruns:0 frame:0

TX packets:49025 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:11292111 (10.7 MiB) TX bytes:11292111 (10.7 MiB)

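Looking at the interface addresses, the IP address of the private interface eth1, which had been taken away, is back. This is because node 2 was just rebooted: a reboot reinitializes all network interfaces, so the eth1 interface we had disabled was brought up again and its IP address restored.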
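A quick way to confirm that eth1 has its address back is to filter the ifconfig listing for a single interface. The following is a minimal sketch, assuming the older net-tools ifconfig output format shown above (an "inet addr:" field, with each interface block starting at column 0). The function name iface_ipv4 is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the IPv4 address of one interface from
# net-tools style "ifconfig" output read on stdin.
# Exit status: 0 if the interface has an "inet addr:" line, 1 otherwise.
iface_ipv4() {
  local want="$1"
  awk -v want="$want" '
    /^[^ \t]/ { cur = $1 }              # a line starting at column 0 opens a new interface block
    cur == want && /inet addr:/ {
      sub(/.*inet addr:/, "")           # drop everything up to the address
      print $1                          # the address is now the first field
      found = 1
      exit
    }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, ifconfig | iface_ipv4 eth1 || echo "eth1 has no IPv4 address" prints the private address, or the warning if the interface is still down.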

Check the status of the CRS daemons; all of them are healthy:

[root@rac2 cssd]# crsctl check crs

CSS appears healthy

CRS appears healthy

EVM appears healthy
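When this check is part of a routine inspection script, the three lines can be tested automatically instead of being read by eye. A minimal sketch, assuming the 10g-style messages shown above ("CSS appears healthy" and so on); crs_healthy is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "crsctl check crs" output on stdin and succeed
# only if CSS, CRS and EVM all report "appears healthy".
crs_healthy() {
  awk '
    /CSS appears healthy/ { css = 1 }
    /CRS appears healthy/ { crs = 1 }
    /EVM appears healthy/ { evm = 1 }
    END { exit (css && crs && evm ? 0 : 1) }
  '
}
```

Usage: crsctl check crs | crs_healthy && echo "CRS stack OK".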

Check the status of the cluster, instance, database, listener, and ASM resources. Everything is intact and all resources are started:

[root@rac2 cssd]# crs_stat -t

Name           Type           Target    State     Host
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    rac1
ora....B2.inst application    ONLINE    ONLINE    rac2
ora....DB1.srv application    ONLINE    ONLINE    rac1
ora.....TAF.cs application    ONLINE    ONLINE    rac1
ora.RACDB.db   application    ONLINE    ONLINE    rac1
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2
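With more nodes and services this listing gets long, so it can help to print only the resources whose Target or State is not ONLINE. A minimal sketch, assuming the five-column crs_stat -t layout shown above (Name, Type, Target, State, Host); crs_not_online is a hypothetical name. Note that crs_stat -t truncates long resource names, as the listing above shows, so the printed names may be truncated too.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "crs_stat -t" output on stdin and print the name
# of every resource whose Target or State column is not ONLINE.
# Exit status: 0 if every resource is ONLINE/ONLINE, 1 otherwise.
crs_not_online() {
  awk '
    $1 == "Name" || /^-/ { next }                 # skip the header and separator lines
    NF >= 4 && ($3 != "ONLINE" || $4 != "ONLINE") {
      print $1                                    # column 1 is the (possibly truncated) name
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Usage: crs_stat -t | crs_not_online || echo "some CRS resources are not online".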

This concludes the complete process of analyzing and resolving the RAC failure.

III. Simulating an unavailable OCR disk: what symptoms RAC shows, and the complete fault-diagnosis process

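OCR disk: the OCR (Oracle Cluster Registry) disk registers information about all RAC resources: the cluster, databases, instances, listeners, services, ASM, storage, network, and so on. Only resources registered in the OCR can be managed by CRS, and the CRS daemons manage resources according to what the OCR records. In day-to-day operations, OCR information can be lost, for example while adding or removing nodes, or while adding or removing OCR disks. Next, we simulate the loss of OCR information and show how to locate and fix the fault.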

Experiment

1. Check the OCR disk and the CRS daemons

(1) Check the OCR disk. CRS can manage the cluster properly only when the OCR disk is healthy.

[root@rac2 cssd]# ocrcheck

Status of Oracle Cluster Registry is as follows :

Version : 2

Total space (kbytes) : 104344

Used space (kbytes) : 4344

Available space (kbytes) : 100000

ID : 1752469369

Device/File Name : /dev/raw/raw1   (this is the raw device that holds the OCR)

Device/File integrity check succeeded

Device/File not configured

Cluster registry integrity check succeeded   (the full integrity check completed without problems)
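The same idea applies to ocrcheck: pull out the OCR device name(s) and treat the run as failed unless the final integrity line is present. A minimal sketch, assuming the output format shown above; ocr_check_ok is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "ocrcheck" output on stdin, print each OCR device,
# and succeed only if "Cluster registry integrity check succeeded" appears.
ocr_check_ok() {
  awk '
    /Device\/File Name/ {
      dev = $0
      sub(/.*:[ \t]*/, "", dev)       # keep only the text after the last colon
      print "OCR device: " dev
    }
    /Cluster registry integrity check succeeded/ { ok = 1 }
    END { exit (ok ? 0 : 1) }
  '
}
```

Usage: ocrcheck | ocr_check_ok || echo "OCR integrity check did not succeed".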

(2) Check the CRS status

[root@rac2 cssd]# crsctl check crs

CSS appears healthy

CRS appears healthy

EVM appears healthy

All cluster daemons are healthy.

(3) Stop the CRS daemons

[root@rac2 sysconfig]# crsctl stop crs

Stopping resources.

Successfully stopped CRS resources

Stopping CSSD.

Shutting down CSS daemon.

Shutdown request successfully issued.

The shutdown request was issued successfully.

[root@rac2 sysconfig]# crsctl check crs

Failure 1 contacting CSS daemon

Cannot communicate with CRS

Cannot communicate with EVM

None of the CSS, CRS, or EVM daemons can be reached, which confirms that the CRS stack is down.
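Before taking the export in the next step, the stack should really be down. That check can be scripted by requiring both "cannot reach" messages for CRS and EVM plus the CSS failure line, and no "appears healthy" line. A minimal sketch that matches only the message texts shown above; crs_stack_down is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "crsctl check crs" output on stdin and succeed
# only if CSS, CRS and EVM are all unreachable and nothing reports healthy.
crs_stack_down() {
  awk '
    /appears healthy/ { up = 1 }
    /contacting CSS daemon|Cannot communicate with (CRS|EVM)/ { down++ }
    END { exit (!up && down >= 3 ? 0 : 1) }
  '
}
```

Usage: crsctl check crs 2>&1 | crs_stack_down && echo "CRS stack is stopped" (the 2>&1 is there in case some messages are written to stderr).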

2. As root, export the OCR contents to back up the OCR

[root@rac2 sysconfig]# ocrconfig -export /home/oracle/ocr.exp

[oracle@rac2 ~]$ pwd

/home/oracle

[oracle@rac2 ~]$ ll

total 108

-rw-r--r-- 1 root root 98074 Jul 18 11:20 ocr.exp   (the OCR export file has been created)
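Because the export file name is fixed, a second export would overwrite the first. One option is to add a timestamp to the file name. A minimal sketch, assuming the same /home/oracle directory as above; ocr_export_file and backup_ocr are hypothetical names, and the ocr_YYYYMMDD_HHMMSS.exp naming pattern is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for timestamped OCR exports.

# Print a timestamped export path under the given directory
# (default /home/oracle, as in the example above).
ocr_export_file() {
  local dir="${1:-/home/oracle}"
  printf '%s/ocr_%s.exp\n' "$dir" "$(date +%Y%m%d_%H%M%S)"
}

# Export the OCR to a new timestamped file; run as root, as the article does.
backup_ocr() {
  local file
  file=$(ocr_export_file "$1") || return 1
  ocrconfig -export "$file" || return 1
  ls -l "$file"                        # confirm that the export file was created
}
```

Usage (as root): backup_ocr /home/oracle.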

3. Restart the CRS daemons

[root@rac2 sysconfig]# crsctl start crs

Attempting to start CRS stack

The CRS stack will be started shortly

Check the CRS status again. After the restart, everything is back to normal:

[root@rac2 sysconfig]# crsctl check crs

CSS appears healthy

CRS appears healthy

EVM appears healthy
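Since crsctl start crs only reports that the stack "will be started shortly", an immediate check can still show failures. A polling loop can wait for the three daemons instead. A minimal sketch: the function name wait_for_crs, the default 300-second timeout and 10-second interval, and the CRSCTL override variable are all assumptions, not part of the original procedure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll "crsctl check crs" until CSS, CRS and EVM all
# report "appears healthy", or give up after a timeout.
# Arguments: timeout in seconds (default 300), poll interval (default 10).
# Set CRSCTL to override the command (e.g. a full path).
wait_for_crs() {
  local timeout="${1:-300}" interval="${2:-10}" waited=0 n
  while [ "$waited" -lt "$timeout" ]; do
    # Count the "appears healthy" lines; 3 means all daemons are up.
    n=$(${CRSCTL:-crsctl} check crs 2>/dev/null | grep -c 'appears healthy')
    if [ "$n" -ge 3 ]; then
      echo "CRS stack healthy after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "CRS stack not healthy after ${timeout}s" >&2
  return 1
}
```

Usage: crsctl start crs && wait_for_crs 300 10.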

4. Overwrite the OCR disk contents with zeros using the raw-device command dd, to simulate losing the OCR

[root@rac2 sysconfig]# dd if=/dev/ze