Optimizing Oracle RAC (Part 9)

2014-11-24 11:35:25
r is too small, then utilities such as netstat or ifconfig may report various network errors. Symptoms include dropped packets, overflows, and fragmentation or reassembly errors.

Oracle installation prerequisites require that the value be increased, typically to about 4 MB.

In Linux, the kernel parameter net.core.rmem_max controls the receive buffer size. The sysctl command can be used to obtain the current value:

# sysctl -n net.core.rmem_max
4194304

Ensure that your UDP receive buffer size is set above the default value – probably to the OS maximum.
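The check described above can be scripted. The following is a minimal sketch, not an Oracle-supplied tool: the function name check_rmem is hypothetical, and the 4194304-byte default threshold is simply the 4 MB figure quoted above, so adjust it to whatever your installation guide specifies.

```shell
#!/bin/sh
# check_rmem CURRENT [MINIMUM]
# Hypothetical helper: compares a receive-buffer size (in bytes) against a
# minimum. The default minimum of 4194304 bytes is an assumption taken from
# the "about 4 MB" figure in the text, not a value mandated by this script.
check_rmem() {
    current=$1
    minimum=${2:-4194304}
    if [ "$current" -lt "$minimum" ]; then
        echo "LOW: net.core.rmem_max=$current is below $minimum"
        return 1
    fi
    echo "OK: net.core.rmem_max=$current"
    return 0
}

# Usage on a Linux host (reads the live kernel value):
#   check_rmem "$(sysctl -n net.core.rmem_max)"
```

If the check reports a low value, the setting can usually be raised at runtime as root with sysctl -w net.core.rmem_max=4194304, and made persistent by adding the same setting to /etc/sysctl.conf.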

LMS waits

Interconnect performance is at the heart of Global Cache latency, but high Global Cache latencies are as often the result of delays in the Oracle software layers. The LMS service on the remote instances contributes most of the non-network latency to Global Cache requests; it is responsible for constructing and returning the requested blocks. The following query shows LMS latencies for each instance for current and consistent read requests:

WITH sysstats AS (
    -- Sum the LMS time components (build/pin, flush, send) and the
    -- blocks-served counters separately for CR and current requests.
    SELECT instance_name,
           SUM(CASE WHEN name LIKE 'gc cr%time'
                    THEN VALUE END) cr_time,
           SUM(CASE WHEN name LIKE 'gc current%time'
                    THEN VALUE END) current_time,
           SUM(CASE WHEN name LIKE 'gc current blocks served'
                    THEN VALUE END) current_blocks_served,
           SUM(CASE WHEN name LIKE 'gc cr blocks served'
                    THEN VALUE END) cr_blocks_served
      FROM gv$sysstat JOIN gv$instance
           USING (inst_id)
     WHERE name IN
           ('gc cr block build time',
            'gc cr block flush time',
            'gc cr block send time',
            'gc current block pin time',
            'gc current block flush time',
            'gc current block send time',
            'gc cr blocks served',
            'gc current blocks served')
     GROUP BY instance_name)
-- Multiplying by 10 converts the summed times to milliseconds;
-- NULLIF avoids a divide-by-zero on an instance that has served no blocks.
SELECT instance_name, current_blocks_served,
       ROUND(current_time*10/NULLIF(current_blocks_served,0),2) avg_current_ms,
       cr_blocks_served,
       ROUND(cr_time*10/NULLIF(cr_blocks_served,0),2) avg_cr_ms
  FROM sysstats;

             Current Blks    Avg      CR Blks    Avg
Instance           Served  CU ms       Served  CR ms
------------ ------------ ------ ------------ ------
MELRAC1         7,342,829    .03    7,647,581    .05
MELRAC2         7,330,661    .03    7,418,901    .04
MELRAC3         7,310,866    .03   12,696,127    .08
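The arithmetic behind the averages in this report is simple enough to check by hand. The sketch below assumes, as the *10 factor in the query implies, that the summed time statistics are in centiseconds; the function name avg_ms and the sample figures are illustrative only and are not taken from the report above.

```shell
#!/bin/sh
# avg_ms TIME_CS BLOCKS
# Hypothetical helper: average per-block latency in milliseconds, given a
# cumulative time in centiseconds (assumed unit) and a blocks-served count.
avg_ms() {
    awk -v t="$1" -v b="$2" 'BEGIN {
        # Guard against an instance that has served no blocks.
        if (b == 0) { print "n/a"; exit }
        # 1 centisecond = 10 milliseconds.
        printf "%.2f\n", t * 10 / b
    }'
}

# Illustrative figures: 150 centiseconds spread over 1000 blocks.
avg_ms 150 1000    # prints 1.50
```

Because the underlying statistics are cumulative since instance startup, the result is a long-run average; a short spike in LMS latency can be hidden by hours of normal operation.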

If the network is responsive and fast, but LMS latency is high, then one of the following might be implicated:

An overloaded instance is unable to respond fast enough to Global Cache requests. In particular, the LMS processes might be overloaded with requests or starved for CPU.

I/O bottlenecks, particularly in redo log I/O, are slowing down the response to Global Cache requests.

In the first case, the LMS process on the remote instance is simply too busy to process the Global Cache request. This can be due to an excessive volume of requests, or because CPU load on the host is making it impossible for the LMS to obtain CPU. The latter situation is less common from Oracle 10.2 onwards, because Oracle now runs the LMS processes at an elevated priority. Severe memory pressure may also lead to a lack of LMS responsiveness.

The “too busy” phenomenon is probably a result of an imbalanced cluster: if any instance in the cluster is significantly overloaded, then Global Cache response times on the idle instances will suffer, because their requests must be served by the overloaded instance. The best solution is to try to achieve a better cluster balance; see the section on Cluster Balance below.

High Global Cache latencies can occur when one or more instances in the cluster become overloaded. Balancing the workload across the cluster is indicated.

The other typical cause of high latencies is when the LMS process must flush uncommitted change