Root cause of the RAC instance crash?


Unless marked as a repost, all articles on this site are original. Reposted from love wife & love life, Roger's Oracle technical blog

Article link: Root cause of the RAC instance crash?

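At around 21:00 on November 8, 2014, a customer's database cluster ran out of swap space, which left the database unusable. The Oracle alert log showed the following errors at the time: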

Sat Nov 08 20:48:36 CST 2014
Thread 1 advanced to log sequence 10722 (LGWR switch)
  Current log# 2 seq# 10722 mem# 0: /dev/rlvxxxredo121
  Current log# 2 seq# 10722 mem# 1: /dev/rlvxxxredo122
Sat Nov 08 20:50:23 CST 2014
Process startup failed, error stack:
Sat Nov 08 20:50:41 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Sat Nov 08 20:50:41 CST 2014
Process m000 died, see its trace file
Sat Nov 08 20:50:41 CST 2014
ksvcreate: Process(m000) creation failed
......
Sat Nov 08 21:51:33 CST 2014
Thread 1 advanced to log sequence 10745 (LGWR switch)
  Current log# 1 seq# 10745 mem# 0: /dev/rlvxxxredo111
  Current log# 1 seq# 10745 mem# 1: /dev/rlvxxxredo112
Sat Nov 08 21:59:20 CST 2014
Process startup failed, error stack:
Sat Nov 08 21:59:21 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Sat Nov 08 21:59:21 CST 2014
Process PZ95 died, see its trace file
......
Process PZ95 died, see its trace file
Sat Nov 08 22:04:09 CST 2014
Process startup failed, error stack:
Sat Nov 08 22:04:09 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Sat Nov 08 22:04:10 CST 2014
Process PZ95 died, see its trace file
Sat Nov 08 22:06:11 CST 2014
Thread 1 advanced to log sequence 10747 (LGWR switch)
  Current log# 3 seq# 10747 mem# 0: /dev/rlvxxxredo131
  Current log# 3 seq# 10747 mem# 1: /dev/rlvxxxredo132
Sat Nov 08 22:41:05 CST 2014
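When the alert log is long, it can help to pull out just the timestamps at which the fork failure was reported. The following is a minimal Python sketch, not part of the original investigation. It assumes the alert log is available as plain text with one entry per line and timestamps in the format seen here; the function name `fork_failure_times` and the `SAMPLE` constant are made up for illustration, and the sample lines are copied from the log excerpt.

```python
import re

# Matches alert-log timestamp lines such as "Sat Nov 08 20:50:41 CST 2014".
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} [A-Z]+ \d{4}$"
)


def fork_failure_times(alert_log_text):
    """Return the timestamps of entries that contain ORA-27300, in log order."""
    times = []
    current = None  # most recent timestamp line seen so far
    for line in alert_log_text.splitlines():
        line = line.strip()
        if TIMESTAMP.match(line):
            current = line
        elif line.startswith("ORA-27300") and current is not None:
            times.append(current)
    return times


# Lines copied from the alert log excerpt (hypothetical variable name).
SAMPLE = """\
Sat Nov 08 20:50:23 CST 2014
Process startup failed, error stack:
Sat Nov 08 20:50:41 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Sat Nov 08 21:59:21 CST 2014
Errors in file /oracle/product/10.2.0/admin/xxx/bdump/xxx1_psp0_1835540.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
"""

if __name__ == "__main__":
    for ts in fork_failure_times(SAMPLE):
        print(ts)
```

Each ORA-27300 line is attributed to the nearest preceding timestamp line, which matches how the entries are laid out in this excerpt.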

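Based on the errors in the database alert log, ORA-27300 and ORA-27301 began to appear at around 20:56 on the 8th. According to the Oracle MOS note
Troubleshooting ORA-27300 ORA-27301 ORA-27302 errors [ID 579365.1], these errors are caused by insufficient memory.
The host reporting the errors is node xxx1 of the Oracle RAC. It has 96 GB of physical memory, with the Oracle SGA configured at 30 GB, the PGA at 6 GB, and operating system swap at 16 GB.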
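The figures above can be turned into a rough memory budget. The short Python sketch below only does the arithmetic on the numbers quoted in the article and decodes the `status: 12` value from ORA-27300 using the standard `errno` table. Two assumptions: the 6 GB PGA figure is treated as a simple fixed allocation, and the errno numbering on the machine running the script is taken to match that of the database host. The variable and function names are illustrative only.

```python
import errno

# Figures quoted in the article, in GB.
PHYSICAL_GB = 96
SGA_GB = 30
PGA_GB = 6    # assumption: treated as a fixed allocation for this estimate
SWAP_GB = 16

oracle_gb = SGA_GB + PGA_GB               # memory planned for Oracle
headroom_gb = PHYSICAL_GB - oracle_gb     # left for the OS and other processes


def describe_status(status):
    """Map the numeric status from ORA-27300 to its symbolic errno name."""
    return errno.errorcode.get(status, "unknown")


if __name__ == "__main__":
    print(f"Oracle SGA + PGA: {oracle_gb} GB")
    print(f"Left for OS and other processes: {headroom_gb} GB")
    print(f"Swap: {SWAP_GB} GB")
    print(f"fork status 12 -> {describe_status(12)}")
```

With 36 GB planned for Oracle out of 96 GB, the configuration alone leaves about 60 GB of headroom, which is why exhausting both physical memory and swap points to something other than the configured SGA and PGA.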
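Under normal conditions, the host's physical memory is sufficient for this workload. Since process forks began failing at 20:56, meaning that memory could no longer be allocated, memory usage on the host
must already have gone wrong before that point. The Nmon monitoring data shows the following: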

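We can see that free physical memory on host xxxdb1 began dropping sharply at 18:01, and by around 18:14 less than 2 GB remained free. Most of the host's physical memory was consumed by Process%, as shown below: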
