Big Table and Small Table Join strategy in Oracle(七)
01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| T2 | 100 | 2400 | 101 (0)| 00:00:02 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("T1"."C1">=7300 AND "T1"."C1"<=1000000)
4 - access("T1"."C3"="T2"."C3")
5 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)
As shown above, table T1 is still the driving table in the nestedloops join. Let’s test retrieving 993,001 rows from T1:
SELECT /*+ USE_NL(T1 T2) */
T1.C1,
T1.C2,
T1.C3,
T1.C4,
T2.C1,
T2.C2,
T2.C3,
T2.C4
FROM
T1,
T2
WHERE
T1.C3=T2.C3
AND T1.C1 BETWEEN 7000 AND 1000000
AND T2.C2 BETWEEN 900000 AND 1000000;
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99M| 4545M| 100M (1)|334:26:13 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 99M| 4545M| 100M (1)|334:26:13 |
|* 3 | TABLE ACCESS FULL | T2 | 100K| 2343K| 889 (1)| 00:00:11 |
|* 4 | INDEX RANGE SCAN | IND_T1_C3 | 1000 | | 3 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| T1 | 993 | 23832 | 1003 (0)| 00:00:13 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)
4 - access("T1"."C3"="T2"."C3")
5 - filter("T1"."C1">=7000 AND "T1"."C1"<=1000000)
As shown above, table T2 is now the driving table for the nestedloops join. So, there must be other factors beyond table (or betterworded row source) size and the number of rows that will be retrieved from thetables. You might be wondering if the CLUSTERING_FACTOR of the indexesalso plays a role in determining which table is the driving table:
SET AUTOTRACE OFF
SELECT
TABLE_NAME,
INDEX_NAME,
CLUSTERING_FACTOR,
NUM_ROWS
FROM
USER_INDEXES
WHERE
TABLE_NAME IN ('T1','T2')
ORDER BY
TABLE_NAME,
INDEX_NAME;
TABLE_NAME INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
---------- ---------- ----------------- ----------
T1 IND_T1_C1 32259 1000000
T1 IND_T1_C2 32259 1000000
T1 IND_T1_C3 1000000 1000000
T2 IND_T2_C1 3226 100000
T2 IND_T2_C2 3226 100000
T2 IND_T2_C3 100000 100000
I suggested (without checking) in the OTN thread that theCLUSTERING_FACTOR of the index on columns C2 would be higher than theCLUSTERING_FACTOR of the index on columns C1 because of the reverse (descending)order in which the C2 column values were inserted into the tables. Surprisingly (at least to me), the optimizer set the CLUSTERING_FACTOR of theC1 and C2 columns to be the same values, and set the CLUSTERING_FACTOR ofcolumn C3 to be the same as the number of rows in the table. Maybe one ofthe readers of this blog article can explain what happened to theCLUSTERING_FACTOR.
So, the answer to