?
In fact it may be preferable to use a Full Table Scan for the select where b=5 and index lookups for the others.
?
Gathering Histogram Statistics
If we collect histogram statistics with the recommended settings:
?
exec DBMS_STATS.GATHER_TABLE_STATS (NULL,'HTAB1', method_opt => 'FOR ALL COLUMNS SIZE AUTO');
?
The b=5 query now does a Full Table Scan
?
select * from htab1 where b=5; --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9991 | 69937 | 7 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| HTAB1 | 9991 | 69937 | 7 (0)| 00:00:01 | ---------------------------------------------------------------------------
?
The query where B is 3 still uses an index:
?
select * from htab1 where b=3; --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 7 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| HTAB1 | 1 | 7 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | HTAB1_B | 1 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------
?
This is because a FREQUENCY Histogram has been created:
?
COL NUM_DISTINCT LOW HIGH DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE HISTOGRAM
----- ------------ ---------- ---------- ------- --------- ----------- -------------------- ----------- ---------------
A 10000 1 10000 0 0 1 31-jan-2013 09:58:01 10000 NONE
B 10 1 10000 0 0 10 31-jan-2013 09:58:01 10000 FREQUENCY
TAB COL ENDPOINT_NUMBER ENDPOINT_VALUE
---------- ---------- --------------- --------------
HTAB1 A 0 1
HTAB1 A 1 10000
HTAB1 B 1 1
HTAB1 B 2 2
HTAB1 B 3 3
HTAB1 B 4 4
HTAB1 B 9995 5
HTAB1 B 9996 9996
HTAB1 B 9997 9997
HTAB1 B 9998 9998
HTAB1 B 9999 9999
HTAB1 B 10000 10000
12 rows selected.
?
On Column B there are 10 buckets matching up with the 10 distinct values.
The ENDPOINT_VALUE shows the column value and the ENDPOINT_NUMBER shows the cumulative number of rows. So the number of rows for ENDPOINT_VALUE 2, it has an ENDPOINT_NUMBER 2, the previous ENDPOINT_NUMBER is 1, hence the number of rows with value 2 is 1. Another example is ENDPOINT_VALUE 5. Its ENDPOINT_NUMBER is 9995. The previous bucket ENDPOINT_NUMBER is 4, so 9995 - 4 = 9991 rows containing the value 5.
Frequency histograms work fine with a low number of distinct values, but when the number exceeds the maximum number of buckets, you cannot create a bucket for each value. In this case the Optimizer creates Height balanced histograms.
?
Height Balanced Histograms
You can demonstrate this situation by forcing the optimizer to create fewer buckets than the Number of Distinct Values. i.e. using 8 buckets for 1