As we saw in the previous section, the query planner needs to estimate
   the number of rows retrieved by a query in order to make good choices
   of query plans.  This section provides a quick look at the statistics
   that the system uses for these estimates.
  
   One component of the statistics is the total number of entries in each
   table and index, as well as the number of disk blocks occupied by each
   table and index.  This information is kept in
   pg_class's reltuples
   and relpages columns.  We can look at it
   with queries similar to this one:
regression=# SELECT relname, relkind, reltuples, relpages FROM pg_class
regression-# WHERE relname LIKE 'tenk1%';
    relname    | relkind | reltuples | relpages
---------------+---------+-----------+----------
 tenk1         | r       |     10000 |      233
 tenk1_hundred | i       |     10000 |       30
 tenk1_unique1 | i       |     10000 |       30
 tenk1_unique2 | i       |     10000 |       30
(4 rows)
   Here we can see that tenk1 contains 10000
   rows, as do its indexes, but the indexes are (unsurprisingly) much
   smaller than the table.
  
   For efficiency reasons, reltuples 
   and relpages are not updated on-the-fly,
   and so they usually contain only approximate values (which is good
   enough for the planner's purposes).  They are initialized with dummy
   values (presently 1000 and 10 respectively) when a table is created.
   They are updated by certain commands, presently VACUUM,
   ANALYZE, and CREATE INDEX.  A stand-alone
   ANALYZE, that is one not part of VACUUM,
   generates an approximate reltuples value
   since it does not read every row of the table.
  
   Most queries retrieve only a fraction of the rows in a table, due
   to having WHERE clauses that restrict the rows to be examined.
   The planner thus needs to make an estimate of the
   selectivity of WHERE clauses, that is, the fraction of
   rows that match each clause of the WHERE condition.  The information
   used for this task is stored in the pg_statistic
   system catalog.  Entries in pg_statistic are
   updated by ANALYZE and VACUUM ANALYZE commands,
   and are always approximate even when freshly updated.
  
   Rather than look at pg_statistic directly,
   it's better to look at its view pg_stats
   when examining the statistics manually.  pg_stats
   is designed to be more easily readable.  Furthermore,
   pg_stats is readable by all, whereas
   pg_statistic is only readable by the superuser.
   (This prevents unprivileged users from learning something about
   the contents of other people's tables from the statistics.  The
   pg_stats view is restricted to show only
   rows about tables that the current user can read.)
   For example, we might do:
regression=# SELECT attname, n_distinct, most_common_vals FROM pg_stats WHERE tablename = 'road';
 attname | n_distinct |                                                                                                                                                                                  most_common_vals                                                                                                                                                                                   
---------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 name    |  -0.467008 | {"I- 580                        Ramp","I- 880                        Ramp","Sp Railroad                       ","I- 580                            ","I- 680                        Ramp","I- 80                         Ramp","14th                          St  ","5th                           St  ","Mission                       Blvd","I- 880                            "}
 thepath |         20 | {"[(-122.089,37.71),(-122.0886,37.711)]"}
(2 rows)
regression=#
  
   Table 10-1 shows the columns that
   exist in pg_stats.
  
Table 10-1. pg_stats Columns
| Name | Type | Description | 
|---|
| tablename | name | Name of the table containing the column | 
| attname | name | Column described by this row | 
| null_frac | real | Fraction of column's entries that are null | 
| avg_width | integer | Average width in bytes of the column's entries | 
| n_distinct | real | If greater than zero, the estimated number of distinct values
      in the column.  If less than zero, the negative of the number of
      distinct values divided by the number of rows.  (The negated form
      is used when ANALYZE believes that the number of distinct values
      is likely to increase as the table grows; the positive form is used
      when the column seems to have a fixed number of possible values.)
      For example, -1 indicates a unique column in which the number of
      distinct values is the same as the number of rows. | 
| most_common_vals | text[] | A list of the most common values in the column. (Omitted if
      no values seem to be more common than any others.) | 
| most_common_freqs | real[] | A list of the frequencies of the most common values,
      i.e., number of occurrences of each divided by total number of rows. | 
| histogram_bounds | text[] | A list of values that divide the column's values into
      groups of approximately equal population.  The 
      most_common_vals, if present, are omitted from the
      histogram calculation.  (Omitted if column data type does not have a
      < operator, or if the most_common_vals
      list accounts for the entire population.) | 
| correlation | real | Statistical correlation between physical row ordering and
      logical ordering of the column values.  This ranges from -1 to +1.
      When the value is near -1 or +1, an index scan on the column will
      be estimated to be cheaper than when it is near zero, due to reduction
      of random access to the disk.  (Omitted if column data type does
      not have a < operator.) | 
   The maximum number of entries in the most_common_vals
   and histogram_bounds arrays can be set on a
   column-by-column basis using the ALTER TABLE SET STATISTICS
   command.  The default limit is presently 10 entries.  Raising the limit
   may allow more accurate planner estimates to be made, particularly for
   columns with irregular data distributions, at the price of consuming
   more space in pg_statistic and slightly more
   time to compute the estimates.  Conversely, a lower limit may be
   appropriate for columns with simple data distributions.