hash join、nested loop,sort merge join
在oracle 执行计划中存在三种表的连接方式,hash join、nested loop,sort merge join
1、hash join
hash join
A join in which the database uses the smaller of two tables or data sources to build a hash table in memory. The database scans the larger table, probing the hash table for the addresses of the matching rows in the smaller table.
也就是说hash join是 在两个表中连接的时候存在的,散列连接是CBO 做大数据集连接时的常用方式,可以把一个小表或是数据源整合到内存中建立一个hash table,然后
数据库开始扫描这个大表,使用内存中的hash table的地址来匹配外表中的数据行。
一个小的表开始joined一个大表的时候,oracle的hash join与nested loop join相比显的非常快。oracle对hash join占用的内存是有限制的,这个值是5%pga_aggregate_target。
The
Oracle DBA controls the optimizers' propensity to invoke hash joins because the DBA must allocate the RAM resources to Oracle (using the hash_area_size and pga_aggregate_target parameters) for the optimizer to choose a hash join. The CBO will only choose a hash join if you have allocated Oracle enough RAM area in which to perform the hash join.
我们可以使用use_hash强制使用hash join
另外,oracle 的hash join需要很多的内存来建立hash表,为了提高效率,我们必须设置hash_area_size足够大,如果hash表占用的内存超出了hash_area_size的大小,那么就会分页到临时表空间,这会带来一定的消耗影响性能。
eg:
SQL> select count(*) from amy_emp;
COUNT(*)
----------
7340032
SQL> select count(*) from amy_dept;
COUNT(*)
----------
4
SQL> set autotrace on explain
SQL> select /*+use_hash(amy_emp,amy_dept*/ count(*) from amy_emp,amy_dept where amy_emp.deptno=amy_dept.deptno;
COUNT(*)
----------
7340032
Elapsed: 00:00:03.35
Execution Plan
----------------------------------------------------------
Plan hash value: 653649851
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | HASH JOIN | | 14 | 84 | 4 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | REVERSE_INDEX | 4 | 12 | 1 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| AMY_EMP | 14 | 42 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("AMY_EMP"."DEPTNO"="AMY_DEPT"."DEPTNO")
SQL> select count(*) from amy_emp,amy_dept where amy_emp.deptno=amy_dept.deptno;
COUNT(*)
----------
7340032
Elapsed: 00:00:25.10
Execution Plan
----------------------------------------------------------
Plan hash value: 1472773524
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
| 2 | NESTED LOOPS | | 14 | 84 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| AMY_EMP | 14 | 42 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | REVERSE_INDEX | 1 | 3 | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("AMY_EMP"."