解释直方图信息(四)

2015-07-24 06:48:55 · 作者: · 浏览: 13
0 Distinct Values:

?

exec DBMS_STATS.GATHER_TABLE_STATS (NULL,'HTAB1', method_opt => 'FOR COLUMNS B SIZE 8'); 

?

So now we have gathered a HEIGHT BALANCED HISTOGRAM for Column B:

?

COL   NUM_DISTINCT        LOW       HIGH DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED        SAMPLE_SIZE HISTOGRAM
----- ------------ ---------- ---------- ------- --------- ----------- -------------------- ----------- ---------------
A            10000          1      10000       0         0           1 31-jan-2013 09:58:01       10000 NONE
B               10          1      10000       0         0           8 31-jan-2013 09:59:09       10000 HEIGHT BALANCED

TAB        COL        ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ---------- --------------- --------------
     HTAB1          A               0              1
     HTAB1          A               1          10000
     HTAB1          B               0              1
     HTAB1          B               7              5
     HTAB1          B               8          10000

?


Notice that there are 8 Buckets against B now.

Oracle puts the same number of values in each bucket and records the endpoint of each bucket.

With HEIGHT BALANCED Histograms, the ENDPOINT_NUMBER is the actual bucket number and ENDPOINT_VALUE is the endpoint value of the bucket determined by the column value.

From the above, bucket 0 holds the low value for the column.

Because buckets 1-7 have the same endpoint, Oracle does not store all these rows to save space. But we have: bucket 1 with an endpoint of 5, bucket 2 with an endpoint of 5, bucket 3 with an endpoint of 5, bucket 4 with an endpoint of 5, bucket 5 with an endpoint of 5, bucket 6 with an endpoint of 5, bucket 7 with an endpoint of 5 AND bucket 8 with an endpoint of 10000 So bucket 1 contains values between 1 and 5, bucket 8 contains values between 5 and 10000.

All buckets contain the same number of values (which is why they are called height-balanced histograms), except the last bucket may have fewer values then the other buckets.

?

Storing Character Values in Histograms

For character columns, Oracle only stores the first 32 bytes of any string (there are also limits on numeric columns, but these are less frequently an issue since the majority of numbers are insufficiently large to encounter any problems). See:

Document 212809.1 Limitations of the Oracle Cost Based Optimizer

Any predicates that contain strings greater than 32 characters will not use histogram information and the selectivity will be 1 / Number of DISTINCT Values. Data in histogram endpoints is normalized to double precision floating point arithmetic.

?

For Example

?

SQL> select * from example;

A
----------
a
b
c
d
e
e
e
e
  

?

The table contains 5 distinct values. There is one occurence of 'a', 'b', 'c' and 'd' There are 4 occurrences of 'e'. If we create a histogram: Looking in user_histograms:

?

TABLE      COL   ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ----- --------------- --------------
   EXAMPLE     A               1     5.0365E+35
   EXAMPLE     A               2     5.0885E+35
   EXAMPLE     A               3     5.1404E+35
   EXAMPLE     A               4     5.1923E+35
   EXAMPLE     A               8     5.2442E+35

?

So:

?

 ENDPOINT_VALUE 	5.0365E+35 represents a
			5.0885E+35 represents b
			5.1404E+35 represents c
			5.1923E+35 represents d
			5.2442E+35 represents e

?

Then, if you look at the cumulative values for ENDPOINT_NUMBER, the corresponding ENDPOINT_VALUE's are correct.

?

?