Here is a list of key points about oprocd.
oprocd is a process Oracle introduced in RAC to perform I/O fencing.
On Unix systems, the oprocd process exists only when no third-party (non-Oracle) cluster software is in use.
On Linux, the oprocd process exists only from version 10.2.0.4 onward.
On Windows there is no oprocd process. Instead there is a service named oraFenceService that provides the same function; it relies on Windows-specific technology and works differently from oprocd.
oprocd can run in two modes: fatal and non-fatal. In fatal mode, if the system hangs or something else triggers oprocd, oprocd automatically reboots the server. In non-fatal mode, if the system hangs or something else triggers oprocd, oprocd writes a warning to its log but does not reboot the system.
oprocd takes two parameters: timeout specifies the interval at which oprocd is invoked, and margin specifies the allowed time deviation. If the deviation exceeds margin, oprocd reboots the system or writes an error to the log.
The oprocd log files are located in /etc/oracle/oprocd or /var/opt/oracle/oprocd
oprocd is spawned from the cssd process and runs as the root user
[root@node2 init.d]# ps -ef | grep oprocd
root 5109 11227 0 20:37 pts/0 00:00:00 grep oprocd
root 5758 4849 0 19:14 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
root 6084 5758 0 19:14 ? 00:00:00 /u01/app/crs_home/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
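To read the timeout and margin off a running oprocd, the command line in the ps output above can be parsed. Below is a minimal Python sketch. The helper name parse_oprocd_args is hypothetical, and it assumes that -t is the timeout and -m the margin (both in milliseconds) and that -f selects fatal mode; the ps output suggests this reading, but it is not confirmed here.

```python
import shlex


def parse_oprocd_args(cmdline):
    """Pull timeout, margin and the fatal flag out of an oprocd.bin command line.

    Assumptions (not confirmed by the post): -t = timeout in ms,
    -m = margin in ms, -f = fatal mode.
    """
    tokens = shlex.split(cmdline)
    opts = {"timeout_ms": None, "margin_ms": None, "fatal": False}
    value_flags = {"-t": "timeout_ms", "-m": "margin_ms"}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in value_flags and i + 1 < len(tokens):
            opts[value_flags[tok]] = int(tokens[i + 1])
            i += 1  # skip the value that was just consumed
        elif tok == "-f":
            opts["fatal"] = True
        i += 1
    return opts


# Command line copied from the ps output above.
cmd = "/u01/app/crs_home/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f"
print(parse_oprocd_args(cmd))
```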
If a node hangs for a long time, the other nodes in the cluster evict it. In that situation the hung node has to be rebooted in order to achieve I/O fencing. oprocd is configured with two parameters, timeout and margin. The process is woken up once every timeout interval; if the time between this wake-up and the previous one exceeds timeout + margin, oprocd concludes that the Oracle node is hung and either reboots the node automatically or writes a warning to the log.
In general, the reasons for an oprocd-initiated reboot fall into four categories:
1: Operating system scheduling problems
2: Hardware or driver problems in the operating system
3: Heavy system load that prevents the scheduler from running oprocd in time
4: Oracle bugs
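The wake-up check described above can be sketched in a few lines of Python. This is only an illustration of the timeout + margin rule, not oprocd's actual code; the function name check_wakeup, the return strings, and the sample values (timeout 1000 ms, margin 500 ms) are assumptions made for the example.

```python
def check_wakeup(prev_wake_ms, now_ms, timeout_ms, margin_ms, fatal=True):
    """Decide what an oprocd-like watchdog would do at a wake-up.

    The node is treated as hung when the gap between two consecutive
    wake-ups exceeds timeout + margin. In fatal mode the reaction is a
    reboot, in non-fatal mode only a log warning.
    """
    elapsed_ms = now_ms - prev_wake_ms
    if elapsed_ms <= timeout_ms + margin_ms:
        return "ok"
    return "reboot" if fatal else "log-warning"


# Illustrative values: timeout 1000 ms, margin 500 ms.
print(check_wakeup(0, 1400, 1000, 500))               # gap within 1500 ms
print(check_wakeup(0, 1600, 1000, 500))               # gap too large, fatal mode
print(check_wakeup(0, 1600, 1000, 500, fatal=False))  # gap too large, non-fatal mode
```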
Bug 5015469 - OPROCD may reboot the node whenever the system date is moved backwards.
Fixed in 10.2.0.3+
Bug 4206159 - Oprocd is prone to time regression due to current API used (AIX only)
Fixed in 10.1.0.3 + One off patch for Bug 4206159.
Diagnostic Fixes (VERY NECESSARY IN MOST CASES):
Bug 5137401 - Oprocd logfile is cleared after a reboot
Fixed in 10.2.0.4+
Bug 5037858 - Increase the warning levels if a reboot is approaching
Fixed in 10.2.0.3+
The default values of oprocd's two parameters, timeout and margin, are set in the init.cssd file, for example:
[root@node2 init.d]# cat init.cssd | grep ^OPROCD_DEFAULT_
OPROCD_DEFAULT_TIMEOUT=1000
OPROCD_DEFAULT_MARGIN=500
OPROCD_DEFAULT_HISTORGRAM=
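So by default, if the interval between two oprocd wake-ups exceeds 1.5 s (1000 ms timeout + 500 ms margin), oprocd reboots the system. That is often too tight. Editing the default values in init.cssd by hand should only be done with Oracle Support's involvement.
If the 1.5 s limit needs to be relaxed, this can be done through init.cssd, which takes two parameters into account: reboottime and diagwait. If diagwait > reboottime, then margin = diagwait - reboottime. Before setting diagwait, all processes on all nodes of the cluster must be stopped, otherwise data corruption may result. The change only needs to be made on one node of the RAC. The recommended value for diagwait is 13.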
[root@node2 bin]# ./crsctl get css reboottime
3
[root@node2 bin]# ./crsctl get css diagwait
13
[root@node2 bin]# ./crsctl set css diagwait 13 -force
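The margin arithmetic above can be worked through with a short Python sketch. The function name effective_margin_ms is hypothetical. It assumes that diagwait and reboottime are given in seconds and the margin in milliseconds, which matches the -m 10000 seen in the earlier ps output, and that the init.cssd default margin applies when diagwait is not greater than reboottime.

```python
def effective_margin_ms(diagwait_s, reboottime_s, default_margin_ms=500):
    """Derive the oprocd margin from the CSS diagwait and reboottime settings.

    Rule from the text: if diagwait > reboottime, margin = diagwait - reboottime.
    Assumption: otherwise the default margin from init.cssd is kept.
    """
    if diagwait_s > reboottime_s:
        return (diagwait_s - reboottime_s) * 1000  # seconds to milliseconds
    return default_margin_ms


timeout_ms = 1000  # OPROCD_DEFAULT_TIMEOUT
margin_ms = effective_margin_ms(diagwait_s=13, reboottime_s=3)
print("margin (ms):", margin_ms)
print("largest tolerated wake-up gap (ms):", timeout_ms + margin_ms)
```

With diagwait 13 and reboottime 3, the tolerated gap between wake-ups grows from 1.5 s to 11 s under these assumptions.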
From 11.2.0.1 onward, diagwait no longer needs to be changed, because the architecture has changed.
On Windows diagwait can also be modified, but unlike on Linux, changing it does not produce the margin change described above.
Next, let's look at hangcheck_timer. hangcheck_timer and oprocd can provide the same function, but the two are not necessarily related.
Hangcheck-Timer Module
Hangcheck-Timer Module Requirements for Oracle 9i, 10g, and 11g RAC on Linux
Starting with release 9.2.0.2, Oracle RAC environments required a new I/O fencing model, the hangcheck-timer module. This module was implemented to replace the Watchdog module, which provided similar fencing functionality. Hangcheck-timer was subsequently delivered as part of the standard kernel distribution for Linux kernel releases 2.4 and above.
Hangcheck-timer should be loaded at boot time, and monitors the Linux kernel for long operating system hangs that could affect the reliability of a RAC node. It runs in kernel mode and uses the Time Stamp Counter (TSC) to catch scheduling delays or node hangs. This is done by setting a timer, then checking when the timer fires