## Integer Predicates

### September 19, 2007

Another thought provoking OTN exchange.

To recap, given a relation with column *QTY*, for example:

KEY | STRING | QTY |

1 | a | 2 |

2 | b | 3 |

produce a multiset such that each row of the original relation is duplicated the *QTY* times:

KEY | STRING | QTY |

1 | a | 2 |

1 | a | 2 |

2 | b | 3 |

2 | b | 3 |

2 | b | 3 |

We’ll focus on the two solutions, the one suggested by Joe:

`select t.*`

from T

CONNECT BY CONNECT_BY_ROOT KEY = KEY AND LEVEL <= QTY

and “the standard” one

`select t.* from T, (`

SELECT level L from DUAL

connect by level <= (select max(qty) from T)

) iv where L <= QTY

On the surface it appears as if Joe just merged the inline view *iv* from the standard solution!

Let’s contrast them in more detail. The inline view in the standard solution is a segment of the infinite integer numbers relation. The eyesore *level <= (select max(qty) from T)* expression is a safety predicate: it is logically not required! We can think of joining the original table t with the set of all integers, the filtering out only those rows that satisfy the *L <= QTY* condition. Alternatively, we can just join the original table with the infinite *L <= QTY* predicate

L | QTY |

1 | 1 |

1 | 2 |

1 | 3 |

1 | 4 |

2 | 2 |

2 | 3 |

2 | 4 |

3 | 3 |

3 | 4 |

4 | 4 |

This imaginary SQL solution is remarkably succinct:

`select t.* from T, LE`

where LE.QTY = T.QTY

It could be reduced even more

`select t.* from T`

where L <= QTY

The normal SQL syntax wouldn’t allow any of these, of course. The first query refers to some relation *LE* which is not defined, while the second one refers to a variable *L* which is free. The answer is that SQL should be extended to admit predicates as first class citizens. The query execution engine should become aware of the infinite relations and be extended with the methods allowing to join predicates and relations.

In this particular example what is the join method? Clearly any join with *LE* as a leading relation is going to fail. The only feasible execution is to scan the finite *T* relation first, and try to find the matching tuples from *LE* by some sort of index lookup, otherwise we, again, have to make the full scan of *LE*. It is pretty clear what kind of index do we need, it has to be a function that accepts the *QTY* as an integer argument, and computes the set of all the *(L,QTY)* pairs that match.

With this idea in mind, here is the **proposed** syntax:

`create predicate LE (`

L integer,

QTY integer

);

create index IDX_LE

on LE (QTY) implemented as a function...;

As you see it differs from the traditional table/index SQL definition in two aspects:

- a user is not allowed to to modify the LE relation
- he has to supply the index implementation

This is the first step, which “fixes” the first query. The second step is for SQL engine to automatically recognize “built-in” predicates, so that the query

`select t.* from T`

where L <= QTY

become legitimate as well.