One more Nested Intervals encoding
June 20, 2008
Dan Hazel developed one more version of continued fraction tree encoding. It is equivalent to matrix based approach with atomic matrices of the kind
[n + 1 n]
[ ]
[ 1 1]
but should be appealing to the readers who find continued fractions more intuitive compared to matrices.
Here is matrix perspective to his work. The atomic matrix in the foundation of Dan’s approach is
[n + 1 n]
[ ]
[ 1 1]
It has determinant equal to one, therefore general path determinants are not alternating, and nested interval orientation does not change when we ascend to the next tree level. Each Dan’s atomic matrix is a product of the familiar atomic matrix
[n 1]
[ ]
[1 1]
with an auxiliary matrix
[1 1]
[ ]
[1 0]
This is why Dan’s continued fractions contain interleaving sequences of 1s.
This issue is becoming a recurring theme of mine. This time it has been triggered by OTN exchange. To my surprise, people have difficulty accepting this seemingly obvious idea.
In SQL there is syntactic difference between a predicate and a table. However, let’s look beyond what you may have learned from syntax diagrams in Oracle SQL Reference (or other vendor for that matter). For our purposes let’s create a simple table Dept10:
create table Dept10 (
deptno integer;
);
insert into Dept10 values (10);
Here is the question: is there a difference between this table and the predicate deptno = 10?
Sure there is: we can examine the content of Dept10 with a simple query:
select * from Dept10
and we have no means to “examine the content” of a predicate! However, let’s include the predicate deptno = 10 and the Dept10 table into a bigger query:
select dname from dept where deptno = 10
select dname from Dept natural join Dept10;
Now, there is no difference between the two! In fact, you can construct many more queries: for every query involving Dept10 we may compose equivalent query without Dept10 and vice versa, with the only exception being the early mentioned single table query:
select * from Dept10
Therefore, I propose to extend SQL to allow queries like this
select * where deptno = 10
One consequence of this amendment is that the famous Dual table is no longer required. The other implication of equating tables with predicates is that physical query optimization can be achieved in a simpler optimizer framework.
Some people suggested to go even further and just write
select 10
how this proposal fits? Let’s analyze this expression step by step. The legitimate SQL query which achieves this is
select 10 from Dual
where the Dual could be any table or view containing one row. Next, what is this constant 10 in the select clause? I’m asking because in relational algebra you can’t just take a projection operator and substitute a column with an arbitrary expression. We may think of this expression being evaluated for each row, but a better idea is that we invoke some function. In our case it happen to be a function taking an empty argument list and returning a constant
function return10 return integer is
BEGIN
RETURN 10;
END return10;
Now the query transforms to
select return10() from dual
Next, every function is a relation (or predicate) that matches its input with output. Which predicate the function return10 corresponds to? Well, it must be deptno=10! Certainly function output is unnamed, so the predicate argument deptno can be anything. With the query output column being aliased
select return10() as deptno from dual
however, we can interpret row-wise function invocation as a join:
select deptno from dual natural join Dept10
Removing the redundant Dual we get
select deptno from Dept10
Once again, the Dept10 can be thought as a table which materializes the deptno = 10 predicate, and we established that there is unambiguous interpretation of the queries
select 10
and
select 10 as deptno
Package content ordering in SQLDeveloper
April 22, 2008
Explain Plan enhancements in SQL Developer 1.5
April 16, 2008
Prior to version 1.5 Explain Plan and Autotrace followed different code paths. This manifested itself in many ways. The most obvious distinction was that the tree nodes were sugarcoated with different set of icons. Then, the predicates were shown differently.
Explain plan implementation change in the Early Access release 3 (EA3) went not completely unnoticed. Many users complained that the explain plan functionality regressed. Most of the problems were fixed for production, although couple of final amendments may still be delayed to the patch release. This article describes the new explain plan.
There are many columns in the PLAN_TABLE competing for the window real estate. Which columns are displayed is controlled from the preferences page:
The check boxes in the autotrace and explain plan columns are self describing. You select one and the corresponding table column in the SQL worksheet autotrace or explain plan tab would appear. There is minimal post processing (if any at all). Few exceptions are
Here is explain plan for a pretty complex view ALL_OBJECTS:
You may wonder why parse the predicates and to show them as nodes in the parse tree, especially that they take quite a lot of space for complex queries. Well, the reason is the same why the views are expanded into the defining queries, so that view nodes become branches in the explain plan tree. From theoretical perspective, a view is a relational algebra expression over base relations, while complex predicate is a boolean algebra expression over primitive predicates. Please refer to that paper for more detailed exposition.
Although predicates are important, the complaint about them consuming a lot of space is valid as well. Tree structure, no matter how deep it is, can be trimmed to a manageable size. In our case all the predicate branches can be collapsed en masse with little context menu assistance:
For collapsed predicate nodes a tooltip displays the predicate information like this
For complex predicate a user can expand the tree node
Those users who are still unimpressed by the feature, can simply disable the predicate option as shown at the very first preferences screenshot for autotrace. The display looks pretty much like the legacy explain plan:
Code Completion for Mixed Case Identifiers
March 24, 2008
In general, programming environment obey its design principles, and one of them is whether the language is conceived to be case sensitive or not. This is why mixed case identifiers in oracle are something like bastard children. What is the perspective of SQLDeveloper code completion?
Let’s create a couple of test tables
CREATE TABLE "SCOTT"."MixedCase"
( “MixedId” VARCHAR2(20 BYTE)
);
CREATE TABLE "SCOTT"."Mixed Case"
( “Mixed Id” VARCHAR2(20 BYTE)
);
If you type
select 1 from
and assuming that these two tables are in your schema, then they should appear on the list. Likewise, if you type
select 1 from "Mi
fast enough not to give any chance to code completion popup to surface after the from keyword, then you are still prompted with both table names. Furthermore, you erase the column expression “1″ you are prompted with column name. So it sort of works.
However, as several posters on SQLDeveloper forum indicated, this functionality is naturally expected to be extended so that a user don’t know if the table is in mixed case, and specifically what case the letters should be! Unfortunately, the
select 1 from Mi
won’t prompt you any mixed case identifiers.
The reason is that code completion executes the following dictionary query:
select ... from all_objects where object_name like 'MI%'
which misses the mixed case identifiers, and the query which “obviously” should be executing is
select ... from all_objects where upper(object_name) like 'MI%'
To understand why the second query is a bad idea let see the execution statistics in a moderately sized database (such as oracle Apps). Here is the first case captured by SQLDeveloper autotrace
As you see, the execution is lean 336 buffer gets and the crux is index skip scan on I_OBJ2 (where the value the first column OBJ$.OWNER# is not known). Compare this to the second case
where full OBJ$ table scan results in 5K buffer gets, which translates into the execution time of about 250 ms. This number is somewhat on the boundary of acceptable performance for code completion, which is designed to be fast, otherwise it directly affects the product end user experience. I decided that compromising performance for 99.9% of those users who don’t use pitiful mixed case identifiers is not worth it.
There is a straightforward fix on server side: the function based index with the first column upper(object_name), but unfortunately it is not there (even in oracle version 11).
P.S. In another SQLDeveloper forum thread a poster suggested that “obviously” code completion should be designed to cache all the identifiers into memory which would guarantee a “good performance”. This sort of ideas usually come from object propellerheads who are ignorant in database fundamentals and learned nothing from OODBMS failure. It doesn’t warrant a more detailed reply.
Getting most of Code Completion in SQL Developer
January 31, 2008
Here are few tips to make yourself comfortable with the redesigned code completion in SQL Developer 1.5, which is currently available as a preview release.
The most ambitious code completion feature is that it is triggered by any character typed, not only the “.”. If you make a slight pause when typing, you’ll notice a list of suggestions. The delay is set on the preferences page (Code Editor -> Code Insight) and the recommended value is the default 300 ms.
Code completion is designed to respond in less than 0.5 sec. Everything longer than that is logged onto java console. If you experience performance problems, you can disable the “eager” mode of code completion on the preferences page so that it would be triggered by ctrl-space only. However, before you do that I suggest to spend 5 min and report the db version, how large it is (the number of objects), and possibly the output of java console. (The executable with java console sqldeveloper.exe is located in sqldev_home/sqldeveloper/bin).
In case if you are still reading, begin typing
select *
and the list of 3 keywords appears. However, if you begin the query with
select 1
then no suggestion would appear at all. This is because the list of suggested keywords is artificially restricted to 3 items. Having more than 3 keywords in the suggestion list is too overwhelming for the end user so that it was decided not to prompt anything in this case. Sure when we type “1″, the expression can be amended at least with any of the 4 arithmetic operators (and even more choices). Continued the second query we get the “from” keyword suggestion as soon as the “f” is typed. A list of reserved oracle SQL&PL/SQl keywords contains four items starting with “F”: FETCH, FORM, FOR, FROM, but the code completion is syntax driven, so that the other options are out of context. Therefore we arrived to
select 1 from
At this point you would be prompted with the list of the tables and views in your current schema, and the list of the users. So you either start typing the other schema name, or continue with the table in current schema. You can also may start typing the name in the other schema without schema prefix, say “all_objects”. The list of options would shrink to match the prefix you typed, and it may well be that your choice is not on the list. Say, you are connected as a familiar SCOTT user, but want to query all_objects. Therefore, as soon as you typed
select 1 from a
you would get the following prompt:
The list shows all the tables and views in all the schemas, and again, in order not to make it overwhelmingly large for the end user, it is truncated to 20 items for each object type. Hence the the “…” item at the end of the list of tables. You can either select from the list, or type more in order to narrow the selection. At some point you may get a list with fewer than 20 items, yet there would still be the “…” list entry at the end.
This is because the list of items may have been narrowed by pure selection of the items matching the prefix. You may select the “…” item, which would retrigger the code completion, and you’ll get the list repopulated from the database, as opposed to narrowing the previous incomplete list of 20 items:
Also this retriggering happens automatically whenever the narrowed list becomes empty.
So now that we have the basic skeleton of the query, you can continue on and make it more sophisticated. Join more tables or views, alias them, build the where clause, or complete the select clause. Note that if you just start typing
select emp.
then no code completion prompt would appear. There are several reasons why.
First, here is how ANSI SQL standard specifies the order of query evaluation:
- Start in the FROM clause specify the list of all tables/views used in the query block. Use aliases for disambiguation/references from the other clauses.
- Go to the WHERE clause and remove rows that do not pass criteria.
- Go to the optional GROUP BY clause…
- Go to the optional HAVING clause…
- Go to the SELECT clause and construct the expressions in the list.
Optimization can affect this order, of course, but the point is that you don’t complete the SELECT clause until the FROM clause is finished (or is partially constructed, at least). Something like select * from is a good template to start.
Beyond basic single table queries you most likely would want table aliases, not full table names in the SELECT clause, and there is no way telling what “e” in the
select e.
is without having the FROM clause build in advance.
Second, you type more this way. Compare
select emp.empno, emp.ename, emp.sal from emp
to
select e.empno, e.ename, e.sal from emp e
or even
select empno, ename, sal from emp e
(and note that “emp” is pretty short name for a table!). Code completion handles all the three cases seamlessly, but only after you complete the FROM clause (which you have to do anyway!)
The identifier case is controlled from the formatter preferences page. You can set the keywords be upper/lower, the whole statement to be upper/lower, or select “no change” so that you have explicit control.
Finally there are few quirks, that look odd at first. Suppose you are editing the following PL/SQL fragment:
PROCEDURE PROCEDURE2 (
in_arg IN NUMBER ,
out_arg1 OUT NUMBER)
AS
i NUMBER;
BEGIN
DECLARE
x NUMBER;
i NUMBER;
y VARCHAR2(1);
BEGIN
x:=1
END;
END PROCEDURE2;
You’ll be prompted local variable names correctly scoped when you type “x:=1″, but not the trailing semicolon which is “obviously” missing. This impression is wrong, however. There is lot more keywords and operation symbols that can be inserted after the constant 1 (including arithmetic operators again!), so that it overwhelms the list of suggestions.
The other example where code completion fails to make a suggestion is the case expression. Suppose you have
PROCEDURE PROCEDURE2 (
in_arg IN NUMBER ,
out_arg1 OUT NUMBER)
AS
i NUMBER;
BEGIN
DECLARE
x NUMBER;
i NUMBER;
y VARCHAR2(1);
BEGIN
i:=CASE WHEN
END;
END PROCEDURE2;
Then no local variable is suggested. This is because the completion algorithm tries only few insertions. More insertions would explode the search space and cause slow response time, this is why the completion tries to insert no more than 3 grammar symbols (terminals or nonterminals). The grammatical structure of the incomplete case expression can’t be repaired by adding 3 symbols, so the completion fails. You are advised to refine the case statement, say by adding the “END;” then returning to where you’ll get code completion prompted the local variables as it should.
Multiplication overflow in Matrix Encoding
January 14, 2008
I just noticed an exchange Help Implementing Nested Invervals using Maxtrix Encoding which never showed up in Google groups and, therefore, left unanswered. There Steven Johnston pointed out that multiplication of 32 bit integers (which is used when comparing if one tree node is a descendant of the other)would normally produce 64 bit integer. Then, what is the point using slightly more efficient Matrix encoding/Nested Intervals with continued fractions if integer multiplication overflow seems to render the whole method invalid? Well, the fix is straightforward, although not pretty. To reliably evaluate if
descendant.a11*node.a21-descendant.a11*node.a22 <=
node.a11*descendant.a21-node.a12*descendant.a21
one have to introduce a user defined function:
function isDescendant( da11 integer, na21 integer, da11 integer, ….)
Within the function implementation chop each 32 bit function argument into two 16 bit chunks. When multiplying the numbers, use the formula:
(A*216 + B)*(C*216 + D) = A*C*232 + (A*D+B*C)*216 + B*D
I understand the frustration of people who have to spend their time to workaround snags like this, but this is not really an interesting issue. More exciting question may be what if we change the ordinary addition&multiplication algebra (+,*) into tropical semiring (min,+)? There is no multiplication, so no matter what calculations we do the overflow is impossible. Unfortunately this idea don’t work ![]()
No Dual Required
September 20, 2007
Two corollaries to yesterday’s post.
1. The Dual table becomes redundant, because now
select X
where X = 1
is allowed.
Question: how does the query execution engine figure out the type of the variable X?
2. Integer generator query:
select X
where X < 100
No level, rownum, or connect by!
Integer Predicates
September 19, 2007
Another thought provoking OTN exchange.
To recap, given a relation with column QTY, for example:
| KEY | STRING | QTY |
| 1 | a | 2 |
| 2 | b | 3 |
produce a multiset such that each row of the original relation is duplicated the QTY times:
| KEY | STRING | QTY |
| 1 | a | 2 |
| 1 | a | 2 |
| 2 | b | 3 |
| 2 | b | 3 |
| 2 | b | 3 |
We’ll focus on the two solutions, the one suggested by Joe:
select t.*
from T
CONNECT BY CONNECT_BY_ROOT KEY = KEY AND LEVEL <= QTY
and “the standard” one
select t.* from T, (
SELECT level L from DUAL
connect by level <= (select max(qty) from T)
) iv where L <= QTY
On the surface it appears as if Joe just merged the inline view iv from the standard solution!
Let’s contrast them in more detail. The inline view in the standard solution is a segment of the infinite integer numbers relation. The eyesore level <= (select max(qty) from T) expression is a safety predicate: it is logically not required! We can think of joining the original table t with the set of all integers, the filtering out only those rows that satisfy the L <= QTY condition. Alternatively, we can just join the original table with the infinite L <= QTY predicate
| L | QTY |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 3 | 3 |
| 3 | 4 |
| 4 | 4 |
This imaginary SQL solution is remarkably succinct:
select t.* from T, LE
where LE.QTY = T.QTY
It could be reduced even more
select t.* from T
where L <= QTY
The normal SQL syntax wouldn’t allow any of these, of course. The first query refers to some relation LE which is not defined, while the second one refers to a variable L which is free. The answer is that SQL should be extended to admit predicates as first class citizens. The query execution engine should become aware of the infinite relations and be extended with the methods allowing to join predicates and relations.
In this particular example what is the join method? Clearly any join with LE as a leading relation is going to fail. The only feasible execution is to scan the finite T relation first, and try to find the matching tuples from LE by some sort of index lookup, otherwise we, again, have to make the full scan of LE. It is pretty clear what kind of index do we need, it has to be a function that accepts the QTY as an integer argument, and computes the set of all the (L,QTY) pairs that match.
With this idea in mind, here is the proposed syntax:
create predicate LE (
L integer,
QTY integer
);
create index IDX_LE
on LE (QTY) implemented as a function…;
As you see it differs from the traditional table/index SQL definition in two aspects:
- a user is not allowed to to modify the LE relation
- he has to supply the index implementation
This is the first step, which “fixes” the first query. The second step is for SQL engine to automatically recognize “built-in” predicates, so that the query
select t.* from T
where L <= QTY
become legitimate as well.
String Decomposition with Regular Expressions
July 12, 2007
Once in a while an interesting problem pops up on Oracle OTN forums
http://forums.oracle.com/forums/thread.jspa?threadID=530584&tstart=0
where Joe Fuda published an inspiring solution leveraging Regular Expressions.
The fact that regular expressions are so powerful that can generate a very elegant solution is not surprising at all, because they are based on deep and fruitful Programming Language Theory. Furthermore, there is a certain similarity between database and language theory operators. They both have a union operator which have identical properties, and they both have join which is slightly different. This partially explains why regular expressions feature fits snugly into SQL.
Let’s see how regular expressions can be leveraged upon a standard SQL pattern — String Decomposition. Given a table of strings
create table sources (
num integer,
src varchar2(1000)
);
insert into sources values(1, ‘123,45,6,7,8′);
insert into sources values(2, ‘8,9′);
commit;
select * from sources;
| NUM | SRC |
| 1 | 123,45,6,7,8 |
| 2 | 8,9 |
the query should output a relation with tuples containing individual substrings that were “separated by commas” in the original table
| NUM | SRC | Decomposed String |
| 1 | 123,45,6,7,8 | 8 |
| 1 | 123,45,6,7,8 | 123 |
| 1 | 123,45,6,7,8 | 7 |
| 1 | 123,45,6,7,8 | 6 |
| 1 | 123,45,6,7,8 | 45 |
| 2 | 8,9 | 8 |
| 2 | 8,9 | 9 |
When we decompose a string into a set of substrings, we change concatenation into union. In our example, concatenation is defined via comma character, so what we want to do is to replace the comma with regular expression disjunction operator. Here is the query:
select distinct num, src,
regexp_substr
( src,
replace( src, ‘,’, ‘|’ ),
1,level
)
from sources, dual
connect by
regexp_substr
( src,
replace( src, ‘,’, ‘|’ ),
1,level
) is not null
An alternative solution leveraging regular expressions negation operator
select distinct src, regexp_substr(src,’[^,]+’,1,level)
from sources
connect by regexp_substr(src,’[^,]+’,1,level) is not null
is slightly shorter, but I fail to notice much theory behind it.
What makes the method less practical, however, is the fact that each string fragment is derived multiple times — a virtue of CONNECT BY execution. An alternative solution with MODEL operator is much more verbose, but also gives a programmer much more control how the execution is driven.













