hash join、nested loop,sort merge join(四)
ogether. See "PGA Memory Management" to learn how to size SQL work areas. See "Hints for Join Operations" to learn about the USE_HASH hint.
11.3.5 Sort Merge Joins
Sort merge joins can join rows from two independent sources. Hash joins generally perform better than sort merge joins. However, sort merge joins can perform better than hash joins if both of the following conditions exist:
The row sources are sorted already.
A sort operation does not have to be done.
However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality condition such as <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.
In a merge join, there is no concept of a driving table. The join consists of two steps:
1.Sort join operation: Both the inputs are sorted on the join key.
2.Merge join operation: The sorted lists are merged together.
If the input is sorted by the join column, then a sort join operation is not performed for that row source. However, a sort merge join always creates a positionable sort buffer for the right side of the join so that it can seek back to the last match in the case where duplicate join key values come out of the left side of the join.
11.3.5.1 When the Optimizer Uses Sort Merge Joins
The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:
The join condition between two tables is not an equijoin.
Because of sorts required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
11.3.5.2 Sort Merge Join Hints
To instruct the optimizer to use a sort merge join, apply the USE_MERGE hint. You might also need to give hints to force an access path.
There are situations where it makes sense to override the optimizer with the USE_MERGE hint. For example, the optimizer can choose a full scan on a table and avoid a sort operation in a query. However, there is an increased cost because a large table is accessed through an index and single block reads, as opposed to faster access through a full table scan.
参考:http://docs.oracle.com/cd/E11882_01/server.112/e41573/optimops.htm#PFGRF94645
总结一下,在哪种情况下用哪种连接方法比较好:(一下摘自摘自:SunnyXu)
排序 - - 合并连接(Sort Merge Join, SMJ):
a) 对于非等值连接,这种连接方式的效率是比较高的。
b) 如果在关联的列上都有索引,效果更好。
c) 对于将2个较大的row source做连接,该连接方法比NL连接要好一些。
d) 但是如果sort merge返回的row source过大,则又会导致使用过多的rowid在表中查询数据时,数据库性能下降,因为过多的I/O。
嵌套循环(Nested Loops, NL):
a) 如果driving row source(外部表)比较小,并且在inner row source(内部表)上
有唯一索引,或有高选择性非唯一索引时,使用这种方法可以得到较好的效率。
b) NESTED LOOPS有其它连接方法没有的的一个优点是:可以先返回已经
连接的行,而不必等待所有的连接操作处理完才返回数据,
这可以实现快速的响应时间。
哈希连接(Hash Join, HJ):
a) 这种方法是在oracle7后来引入的,使用了比较先进的连接理论,
一般来说,其效率应该好于其它2种连接,但是这种连接只能用在
CBO优化器中,而且需要设置合适的hash_area_size参数,
才能取得较好的性能。
b) 在2个较大的row source之间连接时会取得相对较好的效率,在一个
row source较小时则能取得更好的效率。
c) 只能用于等值连接中