Attribute Domain Partitions for Efficient Query Optimization
A major factor in the optimization of processing relational database queries is the determination of the expected size of relations resulting from various relational operations. The binary operation Join is frequently used in queries and requires a high processing cost. The ability to accurately estimate the size of joined relations without actually performing the Join operation can greatly improve query processing. For example, knowing the expected join sizes for each pair of three relations, we can determine ahead of time the best join sequence of these three relations. This is especially useful in a distributed database environment where the communication cost is a key factor of processing cost. An expected join size formula and an exact join size formula using statistical parameters will be described with examples. Since join sizes can vary in a wide range depending on the distribution of join attribute values, the estimation error can be large. To have practical applications, one must be able to bound the estimation error. This presentation will describe a process, called A-partitioning, which divides attribute domains into partitions. With this process we can bound the maximal estimation error to within reasonable limits.