作业。如果在内存中不存在足够的空间,就会占用磁盘的临时空间。对于estimation方式,oracle仅仅需要扫描和排序所采样的内容。如果我们统计的对象是索引,computation方式不会占用太多的时间和空间,因此对于索引我们最好采用compute方式。
某些统计信息总是精确计算的,例如表所占用的数据块数量和索引的深度。
对于表和聚簇,我们建议使用estimation的方式,除非真的需要精确的统计信息。因此estatimation方式通常不会发生排序,速度更快,尤其在分析大表时。
ESTIMATE STATISTICS
ESTIMATE STATISTICS instructs Oracle to estimate statistics about the analyzed object and stores them in the data dictionary.
When estimating statistics, Oracle gathers representative information from portions of an object. This subset of information provides reasonable, estimated statistics about the object. The accuracy of estimated statistics depends upon how representative the sampling used by Oracle is. Only parts of an object are scanned to gather information for estimated statistics, so an object can be analyzed quickly. You can optionally specify the number or percentage of rows that Oracle should use in making the estimate.
estimate statistics 使得oracle评估待分析对象的统计信息并将它们存储在数据字典中。当评估统计信息时,oracle在待分析对象的部分区间内收集信息。这部分信息为分析对象提供了足够的内容。estimate方式的准确程度主要依赖于oracle是如何采样的。由于只有部分内容被扫描,因此速度更快。我们可以指定oracle采样的百分比。
To estimate statistics, Oracle selects a random sample of data. You can specify the sampling percentage and whether sampling should be based on rows or blocks.
对于estimate方式,oracle会随机采样数据。我们可以指定采样的百分比,也可以指定是根据记录还是根据块来采样。
Row sampling reads rows without regard to their physical placement on disk. This provides the most random data for estimates, but it can result in reading more data than necessary. For example, in the worst case a row sample might select one row from each block, requiring a full scan of the table or index.
基于记录的采用不会考虑记录的物理存储位置。这种方式提供了更好的随机性,但是可能会造成读取更多的数据。在最坏的情况下,oracle可能会在每个数据块中读取一条记录,从而会全表扫描表或者索引
Block sampling reads a random sample of blocks and uses all of the rows in those blocks for estimates. This reduces the amount of I/O activity for a given sample size, but it can reduce the randomness of the sample if rows are not randomly distributed on disk. Block sampling is not available for index statistics.
基于块的采样会随机读取数据块,然后利用数据块中的所有记录来进行分析统计工作。这无疑减少了输入输出的数量,但是如果记录在块内的分布不是随机的,这种方式会影响采样的随机性。对于索引,基于块的采样方式是不可用的。
Notes on estimating statistics
The default estimate of the analyze command reads the first approx 1064 rows of the table so the results often leave a lot to be desired.
默认情况下,oracle会读取表中的前1064条记录来作为采样数据。
The general consensus is that the default value of 1064 is not sufficient for accurate statistics when dealing with tables of any size. Many claims have shown that estimating statistics on 30 percent produces very accurate results. I personally have been running estimate 35 percent. This seems to produce very accurate numbers. It also saves a lot of time over full scans.
通常情况下,默认采样1064条记录是不充分的。多数人认为30%的采样会产生比较准确的结果。我个人常常将采样比例设置为35%
Note that if an estimate does 50% or more of a table Oracle converts the estimate to a full compute statistics.
如果采样比超过50%,oracle会将其转换为full compute statiistics
DBMS_STATS functions and variable definitions
Most of the DBMS_STATS procedures include the three parameters statown, stattab, and statid. These parameters allow you to store statistics in your own tables (outside of the dictionary), which does not affect the optimizer. Therefore, you can maintain and experiment with sets of statistics.
大部分DBMS_STAT过程包含三个参数STATOWN,STATTAB和statid。这些参数允许我们将统计信息存放到自己的表中,这些统计信息不回影响优化器。因此,我们可以维护和测试统计信息。
The stattab parameter s