Optimizing Oracle RAC (Part 5)

In earlier releases you could manually set LMS to run at a higher-than-default OS priority to alleviate this situation. From 10.2, LMS runs at a higher priority by default. Changing the number of LMS processes might mask the symptom; however, preventing instances from becoming overloaded is a more effective overall solution.
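One way to confirm whether LMS is in fact registered for elevated priority is to inspect the undocumented _high_priority_processes parameter, which from 10.2 onward typically defaults to a pattern matching the LMS processes. The sketch below assumes SYS access (the X$ tables are not exposed to ordinary users) and that the parameter exists in your release; undocumented parameters should never be changed without Oracle Support guidance.

-- Hedged sketch: run as SYS. _high_priority_processes is
-- undocumented and its default value varies by release.
SELECT p.ksppinm parameter, v.ksppstvl value
  FROM x$ksppi p, x$ksppsv v
 WHERE p.indx = v.indx
   AND p.ksppinm = '_high_priority_processes';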

gc cr/current block lost: Lost block waits occur when a block that has been transmitted is never received. If UDP is used – which is an “unreliable” protocol in the sense that a network operation does not require an acknowledgement – then a small number of lost blocks is to be expected. Moderate rates might suggest that the interconnect is overloaded; high rates probably indicate network hardware issues. We’ll look more closely at lost blocks later in this article.
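As a first step toward that closer look, the cumulative lost-block counters can be sampled from GV$SYSSTAT. This is a sketch: the statistic names below (‘gc blocks lost’ and ‘gc blocks corrupt’) are the usual ones in 10g and later, but names and availability can vary by release.

-- Counters are cumulative since instance startup; sample twice
-- and take the difference to derive a loss rate.
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name IN ('gc blocks lost', 'gc blocks corrupt')
 ORDER BY inst_id, name;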

Reducing Global Cache latency

The RAC architecture requires and expects that instances will fetch data blocks across the interconnect as an alternative to reading those blocks from disk. The performance of RAC is therefore very sensitive to the time it takes to retrieve a block from the Global Cache, which we will call Global Cache latency.

Some documents and presentations suggest that Global Cache latency is primarily or exclusively interconnect latency: the time it takes to send the block across the interconnect network. Interconnect latency is certainly an important part of overall Global Cache latency, but it’s not the only part. Oracle processes such as the Global Cache Service (LMS) have to perform a significant amount of CPU-intensive processing each time a block is transferred, and this CPU time is usually at least as significant as any other factor in overall Global Cache latency. In certain circumstances non-CPU operations – such as flushing redo entries to disk – will also contribute to Global Cache latency.

Interconnect latency is an important factor in Global Cache latency; however, Oracle CPU and I/O are also important contributors.
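To get a feel for where serving-side time goes, the time-component statistics in GV$SYSSTAT can be compared: flush time reflects the redo-write contribution, while build, pin, and send times are largely CPU. This is a sketch: the statistic names below are the usual ones in 10g and later, and in most releases the values are reported in centiseconds, but both should be verified for your version.

-- Time components recorded on the instance that SERVES blocks.
-- Values are cumulative; most releases report centiseconds.
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name IN ('gc cr block build time',
                'gc cr block flush time',
                'gc cr block send time',
                'gc current block pin time',
                'gc current block flush time',
                'gc current block send time')
 ORDER BY inst_id, name;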

To measure Global Cache latency, we use the wait interface as exposed by GV$SYSTEM_EVENT (the “V$” views report data for the current instance; “GV$” views report across the entire cluster). The following query reports average times for each of the Global Cache request types as well as single-block read times (for comparison):

SELECT event, SUM(total_waits) total_waits,
       ROUND(SUM(time_waited_micro) / 1000000, 2) time_waited_secs,
       ROUND(SUM(time_waited_micro) / 1000 / SUM(total_waits), 2) avg_ms
  FROM gv$system_event
 WHERE wait_class <> 'Idle'
   AND (event LIKE 'gc%block%way'
        OR event LIKE 'gc%multi%'
        OR event LIKE 'gc%grant%'
        OR event = 'db file sequential read')
 GROUP BY event
HAVING SUM(total_waits) > 0
 ORDER BY event;

Wait event                       Total Waits   Time (secs)   Avg Wait (ms)
------------------------------  ------------  ------------  -------------
db file sequential read              283,192         1,978            6.99
gc cr block 2-way                    356,193           396            1.11
gc cr block 3-way                    162,158           214            1.32
gc cr grant 2-way                    141,016            25             .18
gc cr multi block request            503,265           242             .48
gc current block 2-way               325,065           227             .70
gc current block 3-way               117,913            93             .79
gc current grant 2-way                45,580            20             .44
gc current grant busy                168,459           296            1.76
gc current multi block request        91,690            42             .46

This example output gives cause for concern. The average wait for Global Cache consistent read requests (as shown by ‘gc cr block 2-way’ and ‘gc cr block 3-way’) is more than 1 millisecond, and more than one tenth of the time for a db file sequential read. While the Global Cache is still faster than disk, it’s taking longer than we’d expect if the interconnect and RAC were f