as to whether it was delayed by more than the allowed margin of error. If the duration exceeds the allowed time of (hangcheck_tick + hangcheck_margin) seconds, the machine is restarted. Hangcheck-timer will not cause reboots to occur due to CPU starvation.

Hangcheck-timer requires three configuration parameters:

hangcheck_tick - defines how often, in seconds, the hangcheck-timer checks the node for hangs. The default value is 60 seconds.

hangcheck_margin - defines how much margin is allowed, in seconds, between the expected scheduling time and the real scheduling time. The default value is 180 seconds.

hangcheck_reboot - determines whether the hangcheck-timer restarts the node if the kernel fails to respond within the sum of the hangcheck_tick and hangcheck_margin parameter values. If the value of hangcheck_reboot is equal to or greater than 1, the hangcheck-timer module restarts the system. If hangcheck_reboot is set to zero, the hangcheck-timer module will not reboot the node, even if a hang is detected. The default value varies by kernel version: in the 2.4 kernel the default is 1, and in 2.6 kernels the default is 0.

Hangcheck-timer writes to the system messages log when a failure is detected and a node restart is initiated by the module. When Hangcheck-timer reboots the node, it may leave the message "Hangcheck: hangcheck is restarting the machine" in /var/log/messages.

If you see the message "Hangcheck: hangcheck value past margin!" in /var/log/messages, a reboot was required but was not performed, because hangcheck_reboot was not set to 1. If this message is seen, you must reload the hangcheck module as described earlier in this note, with the hangcheck_reboot value set to 1.

Note: Hangcheck-timer is not required starting with Oracle Clusterware 11gR2.
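
The timing rule and the effect of hangcheck_reboot described in this note can be sketched in Python. This is an illustration only, not the kernel module's code: the function names hangcheck_action and classify_log_line and their return labels are hypothetical, and the default values are the ones quoted in this note. With those defaults the allowed time is 60 + 180 = 240 seconds.

```python
# Illustrative sketch only: models the decision described in this note,
# not the actual hangcheck-timer kernel module.

REBOOT_MSG = "Hangcheck: hangcheck is restarting the machine"
PAST_MARGIN_MSG = "Hangcheck: hangcheck value past margin!"


def hangcheck_action(elapsed_seconds, hangcheck_tick=60,
                     hangcheck_margin=180, hangcheck_reboot=0):
    """Return what the module would do for a given scheduling delay.

    elapsed_seconds is the time since the previous check (hypothetical input).
    """
    allowed = hangcheck_tick + hangcheck_margin
    if elapsed_seconds <= allowed:
        return "ok"          # within tick + margin: nothing happens
    if hangcheck_reboot >= 1:
        return "reboot"      # hang detected and reboot enabled
    return "log-only"        # hang detected, reboot disabled


def classify_log_line(line):
    """Map a /var/log/messages line to the situation it indicates."""
    if REBOOT_MSG in line:
        return "node was restarted by hangcheck-timer"
    if PAST_MARGIN_MSG in line:
        return "hang detected, no reboot (hangcheck_reboot not set to 1)"
    return "unrelated"


if __name__ == "__main__":
    print(hangcheck_action(100))                      # short delay
    print(hangcheck_action(241, hangcheck_reboot=1))  # past 240 s, reboot on
    print(hangcheck_action(241))                      # past 240 s, reboot off
    print(classify_log_line("kernel: " + PAST_MARGIN_MSG))
```

The "log-only" outcome corresponds to the "value past margin" message: the hang was detected, but the node stays up until the module is reloaded with hangcheck_reboot set to 1.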