SQL> SELECT name, SUM(VALUE)
  2  FROM   gv$sysstat
  3  WHERE  name LIKE 'gc%lost'
  4     OR  name LIKE 'gc%received'
  5     OR  name LIKE 'gc%served'
  6  GROUP BY name
  7  ORDER BY name;
NAME                                                             SUM(VALUE)
---------------------------------------------------------------- ----------
gc blocks lost                                                           99
gc claim blocks lost                                                      0
gc cr blocks received                                              14207701
gc cr blocks served                                                14207721
gc current blocks received                                         14471301
gc current blocks served                                           14471393
Time spent waiting for lost block retransmission is recorded in the wait events ‘gc cr request retry’, ‘gc cr block lost’ and ‘gc current block lost’. These waits should be insignificant: lost blocks should typically amount to less than 1 percent of the total number of blocks recorded in the ‘gc cr/current blocks received/served’ statistics.
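One way to see how much time has actually been recorded against these lost block waits is to query GV$SYSTEM_EVENT (a standard dynamic performance view not shown in the listing above); a query along these lines should do the job:

SQL> SELECT event, SUM(total_waits) AS total_waits,
  2         ROUND(SUM(time_waited_micro) / 1000000, 2) AS time_waited_secs
  3  FROM   gv$system_event
  4  WHERE  event IN ('gc cr request retry', 'gc cr block lost',
  5                   'gc current block lost')
  6  GROUP BY event
  7  ORDER BY event;

Comparing the resulting times with total database time puts the lost blocks into perspective.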
If there are very high lost block counts (relative to blocks received) – or if the time associated with lost blocks becomes significant compared to total database time – then the most likely cause is a network hardware issue. This can be as simple as a poorly seated network card, badly crimped network cables or faulty network components.
Moderate lost block counts – especially if associated with very high levels of activity – might indicate an overloaded interconnect. The network optimizations described below might alleviate the problem, or you may need to increase the throughput of the interconnect hardware (by upgrading to Gigabit Ethernet, 10 Gigabit Ethernet or InfiniBand, for instance).
Global Cache “lost blocks” can be indicative of an overloaded or misconfigured interconnect or (at high levels) faulty network hardware.
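To put a number on the lost block rate, a calculation along the following lines – a sketch based on the same GV$SYSSTAT statistics queried earlier – expresses lost blocks as a percentage of blocks received:

SQL> SELECT ROUND(100 *
  2           SUM(CASE WHEN name LIKE 'gc%lost' THEN value END) /
  3           NULLIF(SUM(CASE WHEN name LIKE 'gc%received' THEN value END), 0),
  4           4) AS pct_blocks_lost
  5  FROM   gv$sysstat
  6  WHERE  name LIKE 'gc%lost'
  7     OR  name LIKE 'gc%received';

For the output shown earlier this works out to well under a hundredth of a percent – comfortably below the 1 percent rule of thumb.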
Optimizing the interconnect
If the interconnect is identified as a problem – or even if we simply want to squeeze Global Cache latencies down as far as possible – there are a few networking options we can try.
Network hardware and protocols
It’s possible to use dual Network Interface Cards (NICs) to reduce points of failure in the overall RAC architecture. If so, you should use NIC “bonding” (also known as link aggregation) to present the two NICs to Oracle as a single logical interface. This allows the aggregate network bandwidth of both cards to be fully utilized.
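As an illustration only, on a Linux system using the kernel bonding driver the bonded interconnect might be described by interface files along these lines; the device names, addresses and bonding mode here are assumptions for the sketch, and the exact syntax varies by distribution and release:

# /etc/sysconfig/network-scripts/ifcfg-bond0 -- bonded private interconnect
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.10.1
NETMASK=255.255.255.0
BONDING_OPTS="mode=802.3ad miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth1 -- first physical NIC (eth2 is configured the same way)
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

Mode 802.3ad (LACP) aggregates the bandwidth of both links but requires switch support; an active-backup mode provides redundancy without aggregating bandwidth.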
The two most commonly used link and transport protocol combinations for the RAC interconnect are:
Gigabit Ethernet (GbE) or 10 Gigabit Ethernet (10GbE) in combination with UDP.
InfiniBand in combination with either Reliable Datagram Sockets (RDS) or Internet Protocol (IP).
The GbE/UDP option has the advantage of using standards-based commodity hardware and of being supported across a wide range of hardware and operating systems. InfiniBand offers superior throughput and latency, but at greater cost and administration effort. Note that Oracle uses InfiniBand/RDS inside Exadata both to connect the RAC instances and to attach the database nodes to the storage nodes: it is clearly the highest-performance solution.
However, Gigabit Ethernet is able to sustain very high throughput – somewhere in the vicinity of 5,000-10,000 Global Cache transfers per second. Most RAC databases – especially those with an OLTP-style workload – are unlikely to overload a GbE or 10GbE interconnect.
Many RAC databases – especially those with OLTP-style workloads – will be adequately served by a Gigabit Ethernet or 10 Gigabit Ethernet interconnect. However, InfiniBand offers superior throughput and scalability.
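Whichever combination is chosen, you can confirm which interface and address each instance is actually using for the interconnect – and where that setting came from – by querying the GV$CLUSTER_INTERCONNECTS view (not shown earlier in this section):

SQL> SELECT inst_id, name, ip_address, is_public, source
  2  FROM   gv$cluster_interconnects
  3  ORDER BY inst_id;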
Ethernet Jumbo frames
By default, the maximum sized packet that can be transmitted across an Ethernet network is 1,500 bytes (the Maximum Transmission Unit, or MTU).