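Unless marked as a repost, all articles on this site are original. Reposted from: love wife & love life (Roger's Oracle technical blog)
Article link: Root cause of the Rac Instance crash ?
At around 21:00 on November 8, 2014, a customer's database cluster ran out of swap space, leaving the database unusable. At that time the Oracle alert log showed the following errors: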
Sat Nov 08 20:48:36 CST 2014
Thread 1 advanced to log sequence 10722 (LGWR switch)
  Current log# 2 seq# 10722 mem# 0: /dev/rlvxxxredo121
  Current log# 2 seq# 10722 mem# 1: /dev/rlvxxxredo122
Sat Nov 08 20:50:23 CST 2014
Process startup failed, error stack:
Sat Nov 08 20:50:41 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Sat Nov 08 20:50:41 CST 2014
Process m000 died, see its trace file
Sat Nov 08 20:50:41 CST 2014
ksvcreate: Process(m000) creation failed
......
Sat Nov 08 21:51:33 CST 2014
Thread 1 advanced to log sequence 10745 (LGWR switch)
  Current log# 1 seq# 10745 mem# 0: /dev/rlvxxxredo111
  Current log# 1 seq# 10745 mem# 1: /dev/rlvxxxredo112
Sat Nov 08 21:59:20 CST 2014
Process startup failed, error stack:
Sat Nov 08 21:59:21 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Sat Nov 08 21:59:21 CST 2014
Process PZ95 died, see its trace file
......
Process PZ95 died, see its trace file
Sat Nov 08 22:04:09 CST 2014
Process startup failed, error stack:
Sat Nov 08 22:04:09 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Sat Nov 08 22:04:10 CST 2014
Process PZ95 died, see its trace file
Sat Nov 08 22:06:11 CST 2014
Thread 1 advanced to log sequence 10747 (LGWR switch)
  Current log# 3 seq# 10747 mem# 0: /dev/rlvxxxredo131
  Current log# 3 seq# 10747 mem# 1: /dev/rlvxxxredo132
Sat Nov 08 22:41:05 CST 2014
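When an alert log is long, it can help to pull out only the timestamps at which a given error was raised. The following is a minimal Python sketch of that idea, not a tool used in the original investigation. It assumes the 10g-style layout seen in this log, where a standalone timestamp line precedes the messages it applies to. The function name find_error_times and the embedded sample are illustrative only.

```python
import re
from datetime import datetime

# Standalone timestamp line in the 10g-style alert log,
# e.g. "Sat Nov 08 20:50:41 CST 2014". The time zone token is skipped.
TS_RE = re.compile(r"^\w{3} (\w{3} \d{2} \d{2}:\d{2}:\d{2}) \w+ (\d{4})$")


def find_error_times(lines, code="ORA-27300"):
    """Return the most recent timestamp seen before each line starting with `code`."""
    current = None
    hits = []
    for raw in lines:
        line = raw.strip()
        match = TS_RE.match(line)
        if match:
            # Remember the latest timestamp; it applies to the lines that follow.
            current = datetime.strptime(
                f"{match.group(1)} {match.group(2)}", "%b %d %H:%M:%S %Y"
            )
        elif line.startswith(code) and current is not None:
            hits.append(current)
    return hits


# Short excerpt copied from the alert log above (illustrative sample only).
sample = """
Sat Nov 08 20:50:23 CST 2014
Process startup failed, error stack:
Sat Nov 08 20:50:41 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Sat Nov 08 21:59:21 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
"""

times = find_error_times(sample.splitlines())
for t in times:
    print(t.isoformat())
```

In practice the same function could be fed the lines of the real alert log file instead of the embedded sample.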
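Based on the errors in the database alert log, we can tell that the ORA-27300 and ORA-27301 errors began to appear at around 20:56 on the 8th. According to the Oracle MOS note
Troubleshooting ORA-27300 ORA-27301 ORA-27302 errors [ID 579365.1], this error is caused by insufficient memory.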
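The "status: 12" in the ORA-27300 line is the operating system errno returned by the failed fork, and errno 12 is ENOMEM, which matches the "Not enough space" text in ORA-27301. The sketch below shows one way to decode that status from a log line in Python. It assumes the machine running the script uses the same errno numbering as the database host (ENOMEM is 12 on the common Unix platforms), and the helper name decode_status is hypothetical. The message text returned by os.strerror varies by platform, so it may not read "Not enough space" everywhere.

```python
import errno
import os
import re

# Matches the status code at the end of an ORA-27300 line.
STATUS_RE = re.compile(r"ORA-27300: .*failed with status: (\d+)")


def decode_status(line):
    """Return (code, symbolic errno name, OS message) for an ORA-27300 line, else None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numbers to names on the local platform (assumption:
    # the numbering matches the host that produced the log).
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


line = "ORA-27300: OS system dependent operation:fork failed with status: 12"
result = decode_status(line)
print(result)
```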
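The host that reported the error is node xxx1 of the Oracle RAC. It has 96 GB of physical memory; the Oracle SGA is configured at 30 GB, the PGA at 6 GB, and the operating system swap at 16 GB.
Under normal conditions the host's physical memory is sufficient for the workload. Since processes could no longer be forked from 20:56 onward, that is, memory could no longer be allocated, the host's memory usage must already have been in trouble before that point.
The Nmon monitoring data shows the following: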
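We can see that free physical memory on host xxxdb1 began to drop sharply at 18:01, and by around 18:14 free memory was already below 2 GB. Most of the host's physical memory was consumed by Process%, as shown below: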