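Problem description:
Right after I got to the office this morning, the monitoring staff emailed to report that the NDMCDB407 database had been restarted last night, and asked me to analyze the cause of the restart. Because a business release went live last night, SMS alerting had been switched off, so no alert was sent to my phone, and nobody notified me while the failure was happening.
1 Check the alert log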
The alert log shows that a job first failed at 03:29:

Fri Aug 22 03:29:29 2014
Errors in file /opt/oracle/diag/rdbms/ndmcdb/NDMCDB/trace/NDMCDB_j000_28856.trc:
ORA-12012: error on auto execute of job 31
ORA-04023: Object NDMC.DELETE_ANONY_RSHARE_INFO could not be validated or authorized
ORA-06512: at "NDMC.PROC_NDMC_CANCEL_OPEN", line 5
ORA-06512: at line 1

Then, at 03:49, inbound connections started to time out, and errors kept appearing until 05:00:08:

Fri Aug 22 03:49:43 2014
***********************************************************************
Fatal NI connect error 12170.
  VERSION INFORMATION:
    TNS for Linux: Version 11.1.0.7.0 - Production
    Oracle Bequeath NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
    TCP/IP NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
  Time: 22-AUG-2014 03:49:43
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535
    TNS-12535: TNS:operation timed out
    ns secondary err code: 12606
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.130.87)(PORT=36628))
WARNING: inbound connection timed out (ORA-3136)
Fri Aug 22 03:49:44 2014
……

The instance also ran out of available processes, so no new connections could be accepted:

Fri Aug 22 03:49:50 2014
ORA-00020: maximum number of processes 0 exceeded
    ns secondary err code: 12560
    ns secondary err code: 12560
    ns main err code: 12537
Fri Aug 22 03:49:50 2014
……
Fri Aug 22 03:51:48 2014
***********************************************************************
Fatal NI connect error 12537, connecting to:
 (LOCAL=NO)
  VERSION INFORMATION:
    TNS for Linux: Version 11.1.0.7.0 - Production
    Oracle Bequeath NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
    TCP/IP NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
  Time: 22-AUG-2014 03:51:48
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12537
    TNS-12537: TNS:connection closed
    ns secondary err code: 12560
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
ORA-609 : opiodr aborting process unknown ospid (30476_47044991385184)
Fri Aug 22 04:14:15 2014
ORA-28 : opiodr aborting process unknown ospid (24925_46986315964000)
Fri Aug 22 04:16:27 2014
ORA-28 : opiodr aborting process unknown ospid (22475_47013891882592)
Fri Aug 22 04:16:28 2014
ORA-28 : opiodr aborting process unknown ospid (21356_47116835528288)
Fri Aug 22 04:16:29 2014
ORA-28 : opiodr aborting process unknown ospid (24947_47774766210656)
ORA-28 : opiodr aborting process unknown ospid (14958_47053435166304)
……
Fri Aug 22 05:00:05 2014
ORA-28 : opiodr aborting process unknown ospid (25765_46941307182688)
Fri Aug 22 05:00:08 2014
ORA-28 : opiodr aborting process unknown ospid (4949_47396524895840)

The database was then shut down at 05:04. Judging from the log, this was an orderly shutdown (shutdown immediate), so my initial suspicion is that it was shut down either manually or automatically by the VCS cluster:

Fri Aug 22 05:04:10 2014
Stopping background process SMCO
Stopping background process FBDA
Shutting down instance: further logons disabled
Fri Aug 22 05:04:12 2014
Stopping background process CJQ0
Stopping background process QMNC
Stopping background process MMNL
Stopping background process MMON
Shutting down instance (immediate)
License high water mark = 1220
Stopping Job queue slave processes, flags = 7
Fri Aug 22 05:04:20 2014
Waiting for Job queue slaves to complete
Job queue slave processes stopped
Fri Aug 22 05:09:11 2014
License high water mark = 1220
USER (ospid: 25110): terminating the instance
Termination issued to instance processes. Waiting for the processes to exit
Fri Aug 22 05:09:21 2014
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 25110
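The timeline above was pieced together by reading the alert log by hand. As a rough sketch of how the same pass could be automated, the snippet below groups ORA-/TNS- error codes under the timestamp line that precedes them. It assumes an 11g-style text alert log in which each entry begins with a line such as "Fri Aug 22 03:29:29 2014", and an English (C) locale for date parsing. The names error_timeline, TS_RE, and ERR_RE are illustrative and do not come from the original analysis.

```python
import re
from datetime import datetime

# Assumption: timestamp lines look like "Fri Aug 22 03:29:29 2014".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
# Error codes such as ORA-12012, ORA-28, or TNS-12535.
ERR_RE = re.compile(r"\b(?:ORA|TNS)-\d+\b")


def error_timeline(lines):
    """Return [(timestamp, [error codes])] for each stamped entry that contains errors."""
    timeline = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            # A new timestamp line starts a new log entry.
            current_ts = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        codes = ERR_RE.findall(line)
        if not codes or current_ts is None:
            continue
        if timeline and timeline[-1][0] == current_ts:
            # Same entry as the previous error line: append to its code list.
            timeline[-1][1].extend(codes)
        else:
            timeline.append((current_ts, list(codes)))
    return timeline


if __name__ == "__main__":
    sample = [
        "Fri Aug 22 03:29:29 2014",
        "ORA-12012: error on auto execute of job 31",
        "Fri Aug 22 03:49:50 2014",
        "ORA-00020: maximum number of processes 0 exceeded",
    ]
    for ts, codes in error_timeline(sample):
        print(ts.isoformat(), ", ".join(codes))
```

Run against the full alert log (for example, by passing an open file object as lines), this gives a compact view of when each class of error started and stopped, which makes it easier to see that the ORA-00020 errors began at 03:49 and the shutdown followed at 05:04.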