Specifying Query Access Plans in the InterBase Optimizer
Deej Bredenberg - January 18, 1993
Various customers have requested the ability to view the access plan being
used by InterBase in query processing, as well as to modify that plan when
they believe they can improve on it. Here are the requirements for setting
and getting the access plan in the optimizer:
1. Provide a readable display of the access plan generated by the optimizer
for any given query (ISQL only).
2. Provide extended SQL syntax to specify the plan to be used in a query.
(DSQL/ISQL and embedded SQL).
3. Make the output of requirement 1 acceptable as input to requirement
2, so that the user may simply modify the plan rather than generating it
from scratch. Limitations
If the user specifies the plan to be used in a query, we will make no attempt
to analyze the submitted plan for errors of omission. That is, the optimizer
is either on or off; it will not attempt to complete a partial plan or
make recommendations for doing so. Of course, if the query cannot be accomplished
via the submitted plan an error will be reported.
Elements of Plan Selection
To determine the options needed in setting a plan, it is important to survey
the methods used by the optimizer to generate a plan. In this discussion,
the term stream refers generically to a set of records, such as a relation
or a relation with restrictions placed on it.
Assigning Indices
InterBase uses indices for four basic purposes:
1. Equalities and Inequalities: When a field is compared to a constant,
andthe field is indexed, the index can be used to retrieve values greater
than, less than, or equal to the constant. Utilizing the index rather than
a sequential scan avoids fetching records that don't meet the selection
criteria.
2. Matching joined streams: When two streams are joined by matching two
fields, an index can be used to perform the join. If an index is available
for one of the fields, the other stream is retrieved first and key values
matched against the index.
3. Sorting and grouping: When a query is sorted by a field or fields which
are the keys of an index, the index is used to retrieve the records in
order rather than performing a sort. This is called a "navigational"
walk of the index. It is not navigational in the sense that the index is
walked backwards, but in that no bitmap is generated for the stream.
4. Navigation: Indices are used to walk forwards and backwards through
a stream in key order. This is used to support IDAPI-style navigation.
(This functionality does not currently surface through SQL.)
Combining Indices
For each selection criterion which matches an index, InterBase generates
a bitmap of the selected records. It can then combine the bitmaps by ANDing
and ORing them together according to the logic of the query. This generates
a single bitmap which describes all the selected records from a single
stream. This is advantageous when iterating through a stream multiple times,
since it is much faster to scan the bitmap than a set of indices.
Determining Join Order
In joining two or more streams together, it is often crucial to determine
which stream should be retrieved first. In general, the stream which is
the most expensive to retrieve should be retrieved first. If you think
of a join as a set of nested loops, it makes sense that the innermost loop
will be executed the most times. Similarly, the last stream in the join
list will be fetched the most times, so it had better be the cheapest to
retrieve.
Generating Rivers
When two or more streams are combined together, they form a river. The
optimizer first attempts to find the longest river which can be retrieved
via indices alone. Then it attempts to form the remaining streams into
the longest river, and so on. Once it has a set of rivers which are retrievable
via indexed access, it joins them together in order from left to right
based on the estimated cost of retrieving each river.
Cost Estimation
The cost of retrieval of a stream depends on several variables: whether
there is an index, the selectivity of that index, whether there are selection
criteria, the cardinality of the underlying relation, and whether the stream
needs to be sorted. The optimizer uses all these variables to determine
the cost of a retrieval of a stream. For example, when retrieving an indexed
equality, the cost is estimated as: cardinality of base relation * selectivity
of index + overhead of index retrieval Since rough estimation is used to
determine the values of all these variables, and the relative importance
of each of these variables cannot be precisely determined for all queries,
the optimizer can often select an inappropriate plan. The user may simply
wish to determine a plan based on empirical data for a particular query.
But we should advise the customer that the best plan may change over time
as the data changes, so specifying a plan as part of the query implies
an added maintenance burden for the SQL programmer.
Sort Merges
When a join term is specified between two streams, and no index is available,
the optimizer specifies a sort merge between the two streams. This means
that both streams will be sorted and the results merged together. Once
the streams are sorted, the engine can scan through both streams just once
to form the join, rather than iterating through the second stream for every
record in the first stream.
Set Plan Syntax
Syntax has been added to the SELECT expression in GPRE and DSQL/ISQL to
allow the user to set the plan for a query. The syntax should work for
SELECT statements used in creating a view, a stored procedure, or a trigger.
It should also work for subselects in an UPDATE or DELETE statement. Syntax
to specify a plan for UPDATE or DELETE is also an option, though it is
not planned for now.
The Select Statement
A PLAN clause has been added to the SELECT expression. The syntax supports
all current optimizer options, and is extensible enough to allow for new
options. An abridged syntax for the SELECT statement is:
select_expression [ UNION select_expression ... ] [ ORDER BY field_list ] select_expression := SELECT [ DISTINCT | ALL ] [ field_list | '*' ] FROM relation_list [ WHERE search_condition ] [ GROUP BY field_list ] [ HAVING search condition ] [ PLAN plan_expression ]Plans can be specified independently for the query and any subqueries. Specifying a plan on the main select or any subselect does not require a plan to be specified on any of the other select expressions in the query. The Plan Expression The syntax for the plan expression is as follows:
plan_expression := [join_type] '(' plan_item_list ')' plan_item_list := plan_item | plan_item ',' plan_item_list plan_item := table_specifier access_type | plan_expression
This syntax allows specification of a single relation or a join of two
or more relations in one pass. Nested parentheses can be used to specify
any combination of joins. Reverse polish notation is used to specify the
join operator, mostly because an infix operator would confuse the parser.
join_type := JOIN | [SORT] MERGE
The default join type is a JOIN. The keyword MERGE can be used to specify
a sort merge of the substreams. The optional SORT keyword can be used for
clarification purposes.
table_specifier := { table_name | alias_name } [ table_specifier ]
The table name or the alias for the table can be specified here. If a table
is used more than once in the query, aliases must be used to distinguish
them in the plan. More than one table specifier can be used to specify
a base table of a view (see "Handling Views", below).
access_type := [ NATURAL | INDEX '(' index_list ')' | ORDER index_name
]
index_list := index_name | index_name ',' index_list
The default access is NATURAL order, which means sequentially accessing
the records in no defined order.
The INDEX keyword allows specification of one or more indices for satisfying
booleans and join terms in the query. The user must specify all the indices
to be used; if any booleans or join terms remain after all indices are
exhausted, they will be evaluated without benefit of an index. If any indices
are specified which cannot be utilized, an error will be returned.
The ORDER keyword specifies that a sort will be effected through the use
of the index. In order to cause the result of the query to be sorted, the
stream must be placed leftmost in the join order. If the resultant stream
is not sorted as specified by the ORDER clause, a physical sort will be
performed.
Display Plan Syntax
The following command is now available in ISQL (QLI for V3.2L):
ISQL> SET PLAN;
Once this option has been set, all select-type statements will display
the plan chosen by the optimizer for the provided query, as shown in the
examples below. The SET PLAN command toggles this feature on and off, just
as with other ISQL options.
Examples of Plan Syntax
These examples demonstrate the plans that ISQL would print for some sample
queries. The plan clause itself can then be placed at the end of the query,
guaranteeing that the plan will always be used by the query in retrieving
the result set. Optionally, the user may use the plan clause as a starting
point to make modifications to the plan.
1. Simple retrieval.
ISQL> SELECT * FROM CITIES; PLAN (CITIES NATURAL)
2. Ordered retrieval, utilizing an index for ordering.
ISQL> SELECT * FROM CITIES ORDER BY CITY; PLAN (CITIES ORDER CITIES_1)
3. Simple join, resulting in a cross product.
ISQL> SELECT * FROM CITIES C, STATES S; PLAN JOIN (S NATURAL, C NATURAL)
*Note that when aliases are specified in the query, they are also used
in the plan printed by the optimizer, to avoid confusion when the same
table is referenced twice. The user may use either aliases or table names
as long as the table is used only once in the select expression.
4. Join with indexed equality.
ISQL> SELECT * FROM CITIES C, STATES S WHERE C.STATE = S.STATE; PLAN
JOIN (S NATURAL, C INDEX (DUPE_CITY))
5. Equijoin with index used for ordering.
ISQL> SELECT * FROM CITIES C, STATES S WHERE C.STATE = S.STATE ORDER
BY S.STATE; PLAN JOIN (S ORDER STATE1, C INDEX (DUPE_CITY))
6. Equijoin with no available indices, resulting in a sort merge.
ISQL> DROP INDEX STATE1; ISQL> DROP INDEX DUPE_CITY; ISQL> SELECT
* FROM CITIES C, STATES S WHERE C.STATE = S.STATE; PLAN SORT MERGE (C NATURAL,
S NATURAL)
7. Three-way join with two indexed equalities.
ISQL> SELECT * FROM CITIES C, STATES S, MAYORS M WHERE C.CITY = M.CITY
AND C.STATE = S.STATE; PLAN JOIN (S NATURAL, C INDEX (DUPE_CITY), M INDEX
(MAYORS_1));
8. Three-way join with only one indexed equality, resulting in a sort merge
of the result
. ISQL> SELECT * FROM CITIES C, STATES S, MAYORS M WHERE C.STATE = M.STATE
AND C.STATE = S.STATE; PLAN SORT MERGE (M NATURAL, JOIN (S NATURAL, C INDEX
(DUPE_CITY)));
Handling Views
Views may present some difficulty for users of the PLAN feature. Ordinarily,
users may treat a view the same as a table. But in optimizing a query,
the user must be cognizant of the base table(s) participating in the view.
Expanding Views
The PLAN clause must treat a view reference as if the base tables used
in creating the view were inserted into the FROM list of the query. For
example, assume a view is created as:
ISQL> CREATE VIEW CITIES_VIEW AS SELECT * FROM CITIES C, STATES S WHERE
C.STATE = S.STATE;
If that view were referenced in a simple query, the resultant plan would
be printed as follows:
ISQL> SELECT * FROM CITIES_VIEW; PLAN JOIN (CITIES_VIEW S NATURAL, CITIES_VIEW
C INDEX (DUPE_CITY))
To retrieve a view, a join is formed which retrieves the base tables in
the view. The join is executed according to the logic of the query which
defines the view. Thus the PLAN writer must be familiar with the query
which defines the view, and tell the optimizer how to retrieve the base
tables in the view.
Specifying a Base Table
When printing a plan for a query which involves a view, ISQL will use multiple
aliases to completely specify the base tables of the view. This is to avoid
confusion when a table is referenced more than once in the query. The user
may abbreviate the syntax to specify only the table name, or only the alias
for that table name. Thus any of the following alternative syntaxes are
also legal for the view defined above:
PLAN JOIN (STATES NATURAL, CITIES INDEX (DUPE_CITY))
PLAN JOIN (S NATURAL, C INDEX (DUPE_CITY))
PLAN JOIN (CITIES_VIEW STATES NATURAL, CITIES_VIEW CITIES INDEX (DUPE_CITY))
Unavailable Indices
If an index is specified in the plan, and the index is deleted or deactivated,
an error is returned when the query is compiled. It is assumed that if
the user goes to the trouble to specify the index to be used by the query,
it is an error for the query to be executed without benefit of the index.
Passing the Plan to the Engine
DSQL and GPRE generate new BLR to pass an access plan to the engine. The
BLR takes the form of a new plan clause on the record selection expression:
blr_rse count stream_clause... [first_clause] [boolean] [sort_clause] [project_clause] [plan_clause] stream_clause := table_name | aggregate_expression | union_expression table_name := blr_relation relation_name context_variable | blr_rid relation_id context_variable where the plan clause is defined as: plan_clause := blr_plan plan_expression plan_expression := {blr_join | blr_merge} count plan_expression... | blr_retrieve context_variable access_type
The context variable in the blr_retrieve must match a context variable
from the stream_clause. There must be a blr_retrieve for each stream in
the stream_clause.
access_type := blr_sequential | blr_navigational index_name | blr_indices count index_name...
Last Modified: 17-JAN-03