(6 rows)
嵌套连接顺序扫描 源码:
for (i = 0; i < length(outer); i++)
for (j = 0; j < length(inner); j++)
if (outer[i] == inner[j])
output(outer[i], inner[j]);
这里和oracle 的原理是一样的,就不再多解释。
HASH JOIN:
EXPLAIN SELECT sample1.junk
FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
WHERE sample2.id > 33;
QUERY PLAN
----------------------------------------------------------------------
Hash Join (cost=33.55..1097.55 rows=24131 width=32)
Hash Cond: (sample1.id = sample2.id)
-> Seq Scan on sample1 (cost=0.00..194.01 rows=10701 width=36)
-> Hash (cost=27.91..27.91 rows=451 width=4)
-> Seq Scan on sample2 (cost=0.00..27.91 rows=451 width=4)
Filter: (id > 33::oid)
HASH JOIN源码:
for (j = 0; j < length(inner); j++)
hash_key = hash(inner[j]);
append(hash_store[hash_key], inner[j]);
for (i = 0; i < length(outer); i++)
hash_key = hash(outer[i]);
for (j = 0; j < length(hash_store[hash_key]); j++)
if (outer[i] == hash_store[hash_key][j])
output(outer[i], inner[j]);
merge join:
EXPLAIN SELECT sample1.junk
FROM sample1 JOIN sample2 ON (sample1.id = sample2.id);
QUERY PLAN
--------------------------------------------------------------------------
Merge Join (cost=1005.10..2097.74 rows=72392 width=32)
Merge Cond: (sample2.id = sample1.id)
-> Sort (cost=94.90..98.28 rows=1353 width=4)
Sort Key: sample2.id
-> Seq Scan on sample2 (cost=0.00..24.53 rows=1353 width=4)
-> Sort (cost=910.20..936.95 rows=10701 width=36)
Sort Key: sample1.id
-> Seq Scan on sample1 (cost=0.00..194.01 rows=10701 width=36)
这里两表位置变化无影响。
sort(outer);
sort(inner);
i = 0;
j = 0;
save_j = 0;
while (i < length(outer))
if (outer[i] == inner[j])
output(outer[i], inner[j]);
if (outer[i] <= inner[j] && j < length(inner))
j++;
if (outer[i] < inner[j])
save_j = j;
else
i++;
j = save_j;
ANALYZE sample1;
ANALYZE sample2;
有了统计信息后:
postgres=# EXPLAIN SELECT sample2.junk
postgres-# FROM sample1 JOIN sample2 ON (sample1.id = sample2.id);
QUERY PLAN
------------------------------------------------------------------------
Hash Join (cost=17.39..139.45 rows=284 width=254)
Hash Cond: (sample1.id = sample2.id)
-> Seq Scan on sample1 (cost=0.00..110.43 rows=2343 width=4)
-> Hash (cost=13.84..13.84 rows=284 width=258)
-> Seq Scan on sample2 (cost=0.00..13.84 rows=284 width=258)
CREATE INDEX i_sample1 on sample1 (id);
CREATE INDEX i_sample2 on sample2 (id);
建立相关索引后:
postgres=# EXPLAIN SELECT sample2.junk
postgres-# FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
postgres-# WHERE sample1.id = 33;
QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop (cost=0.00..16.55 rows=1 width=254)
-> Index Scan using i_sample1 on sample1 (cost=0.00..8.27 rows=1 width=4)
Index Cond: (id = 33::oid)
-> Index Scan using i_sample2 on sample2 (cost=0.00..8.27 rows=1 width=258)
Index Cond: (id = 33::oid)
不再是Inner Sequential Scan,而是Inner Index Scan