September 1, 2011
Here is curious identity:
TABLE_DUM divided by a relation evaluates to the relation’s complement.
Formally in QBQL:
TABLE_DUM /= x = <NOT> x.
What division am I talking about? Googling “relational division” brings up a list of usual suspects — articles by Date, Celko, etc — with some confusion leading to “Todd’s division”, ternary (!) division operation and so on. QBQL follows much cleaner definitions of set equality join, set containment join and alike readily found in academic literature. In the above identity the “/=” is binary set equality join operation, informally known as relational division.
Dual identity is also interesting. First, the dual of
R00 is the universal relation
R11. The dual of unary complement operation is inversion. Duality among multiple division-like operations is not evident, but the reader might verify that set intersection join analogous to composition in the algebra of Binary Relations fits the bill:
R11 /^ x = <INV> x.
July 29, 2011
How weird is the language you are programming on? User survey on stackoverflow is the second most voted thread. Somewhere on page 2 we encounter the following SQL code snippet:
create table wtf (key number primary key, animal varchar2(10));
insert into wtf values (1,'dog');
insert into wtf values (2,'');
insert into wtf values (3,'cat');
select * from wtf where animal 'cat';
I don’t agree with the sentiment of that post. How exactly returning an animal (or any other thing for that matter) named with empty string of characters would make user happy? For all practical purposes empty string of data is useless and equating it with NULL just reduces complexity.
June 22, 2011
Oracle 11g introduced “syntax aware” code search facility — PL/Scope. One can look up identifier definitions, usages, exceptions, which is much more sophisticated and targeted search compared to the old way of finding all of the occurrences of a string in the
ALL_SOURCE. Certainly few developers are expected to use PL/Scope À la carte. IDEs, such as SQL Developer, accommodated it within its [object] search. Here is a screen shot illustrating a search of formal parameter
So, with PL/Scope available is the old way, that is finding all of the occurrences of a string in the
ALL_SOURCE, obsolete? Yes, it has been effectively deprecated, which turned out to be ill-considered decision. Apparently, some people like the old way better, especially that there is no way to search the comments. The fix is expected in next SQL Developer release, but for now here is a workaround. Among the other things it demonstrates that many features are nearly duplicated, and certainly a search can be accommodated by SQL Developer reports. In fact, there is a off-the-shelf report “Search Source Code” already, although many would find the
owner = USER predicate impeding their goal. With little effort, however, one can customize it to approximate to what the Search Panel does (for RDBMS versions below 11.1). One can even prettify the output with html formatting; here are the steps:
1. Go to reports tab -> User Defined Reports and select “Add Report” from context menu
2. Name the report, e.g. “Fancy all_source”, and copy and paste the following prettified query
select owner "Owner", name "PL/SQL Object Name", type "Type", line "Line", '<html><font bgcolor=#dddddd>' ||substr(text,1,instr(upper(text),upper(:TEXT_STRING))-1)||'</font>' ||'<font bgcolor=#bbffbb>' ||substr(text,instr(upper(text),upper(:TEXT_STRING)),length(:TEXT_STRING)) ||'</font>' ||'<html><font bgcolor=#dddddd>' ||substr(text,instr(upper(text),upper(:TEXT_STRING))+length(:TEXT_STRING)) "Text", owner sdev_link_owner, name sdev_link_name, type sdev_link_type, line sdev_link_line from sys.all_source where (:TEXT_STRING is null or instr(upper(text),upper(:TEXT_STRING)) > 0) and length(text)<3000 --not like '%wrapped%' and name not like 'BIN$%' order by owner, name, type, line
3. Use Go To option from context menu on PL/SQL object name to navigate to the PL/SQL compilation unit.
Here is a screen shot of the report output:
March 2, 2011
Here is unconventional way to write a query in ANSI SQL join syntax
SELECT 1 FROM DEPARTMENTS C JOIN EMPLOYEES A JOIN JOBS B ON C.DEPARTMENT_ID = A.DEPARTMENT_ID ON A.JOB_ID = B.JOB_ID
It is valid syntax according to ANSI 92. Indeed, the fragment
EMPLOYEES A JOIN JOBS B ON C.DEPARTMENT_ID = A.DEPARTMENT_ID
qualified join which is a
joined table which is a
Therefore, one can just take conventional restrict-project-cartesian product query, and replace all the commas in the
FROM clause with the
JOIN keyword, then replace
AND keywords with the
An argument in favor of ANSI style is allegedly cleaner textbook example
SELECT 1 FROM DEPARTMENTS C JOIN EMPLOYEES A ON C.DEPARTMENT_ID = A.DEPARTMENT_ID JOIN JOBS B ON A.JOB_ID = B.JOB_ID
although, as we have witnessed, ANSI grammar doesn’t enforce it. A developer is free to mix the first and the second variant in unbelievably messy query!
My feelings is reflected in a quote by Anthony Molinaro (author of “SQL Cookbook”):
“Old style is short and sweet and perfect. ANSI dumbed it down, and for people who’ve been developing for sometime, it’s wholly unnecessary”.
February 14, 2011
Here is a solution to Iggy Fernandez The Second SQL challenge:
WITH T(W2,OUT) AS (
SELECT WORD2, WORD1 || ' ' || WORD2 || ' ' || WORD3 FROM RIDDLE
T1.out || ' ' || R.WORD2 || ' ' || T2.out
FROM T T1, RIDDLE R, T T2
WHERE T1.W2 = R.WORD1 AND T2.W2 = R.WORD3
) select out from t
(Be sure to have table columns of sufficient width). It outputs something like:
TRYING TO TYPE ONE HUNDRED DISTINCT WORDS IN A SINGLE PARAGRAPH IS REALLY TOUGH IF I CANNOT REPEAT ANY OF THEM THEN PROBABLY THOSE WITH MANY LETTERS SHOULD BE USED MAYBE SOME READERS WILL UTILIZE DICTIONARIES THESAURUSES THESAURI OR POSSIBLY EVEN ENCYCLOPEDIAS BUT MY PREFERENCE HAS ALWAYS BEEN THAT GRAY MATTER BETWEEN YOUR EARS SERIOUSLY MARILYN CHALLENGES SUCH AS THIS REQUIRE SKILLS BEYOND
P.S. It looks like I didn’t figure out tree structure correctly as the rhs of the tree is almost gibberish:
SCIENCE AND PHYSICS SO WHAT DO YOU ASK READING COMPREHENSION WRITING ABILITY GOOD OLD FASHIONED ELBOW GREASE SCIENTISTS DONT CARE ABOUT STRUCTURE THEY WANT RESULTS HEY LOOK ONLY ELEVEN MORE LEFT
Why the forementioned query didn’t connect it to the first part is another mystery…
November 20, 2010
October 14, 2010
SQL has many flaws, and the fact that its syntax poorly reflects duality of pivot and unpivot queries is probably somewhere at the bottom of the list. To be fair, relational algebra doesn’t do any better. What are pivot and unpivot queries in QBQL?
Consider the following matrix example:
This 2×2 matrix can be represented relationally in two ways:
• as ternary relation with two subscripts and one entry attribute:
Au=[i j s] 1 1 3 1 2 1 2 1 1 2 2 0 ;
• as rectangular array with dimension j pivoted into the columns j1 and j2:
Ap=[i j1 j2] 1 3 1 2 1 0 ;
Ap in terms of
Au is pivot operation, while representing
Au in terms of
Ap is unpivot.
Au = ((Ap /^ [j1=s]) /^ [j]1) v ((Ap /^ [j2=s]) /^ [j]2) . Ap = ((Au /^ [j1=s]) /^ [j]1) ^ ((Au /^ [j2=s]) /^ [j]2) .
Notice that both expressinos are almost identical, so the duality of pivot and unpivot is exposed as formal relational lattice duality of join and union.
October 8, 2010
QBQL is, to put it bluntly, “Relational Algebra done right”; and this posting is about the recent improvements.
include directive allows nesting QBQL programs inside each other. For example, the code snippet
[x] 1 2 <OR> [x] 2 3;
refers to user-defined operations introduced in the udf.def file (with generalized union operation
<OR> among them) and then executes the query.
3. Artificial distinction between database and program files is gone. To run the program you specify a single argument — the program name. For example,
C:\> java -jar C:\qbql_trunk\dist\qbql.jar C:\qbql_trunk\src\qbql\program\current.prg
runs the program
current.prg. It is up to the
current.prg to instantiate any database(s) with the include directive. (As we have seen in the previous bullet one might actually don’t need any explicit database initilalization).
Little more detailed instructions how to run QBQL are on updated Getting Started page.
September 30, 2010
The previous post about Boolean datatype sparkled a lively discussion at comp.databases.theory. Here, I’ll embark upon Tony Andrews comment that SQL DBMS world somehow abhors unary relations. This is very strange from theoretical perspective, as one can expect lower arity relations be prevalent according to Zipf law.
This is one more artifact of infamous impedance mismatch between front-end application and DBMS. The client behavior is imperative: “Tell me if my customer is creditworthy or not, so I can proceed with her application. (I don’t care about the list of all other creditworthy customers!)” . It clearly wants a boolean answer. What database reply might be? “Look, I supplied all the [relevant] information, isn’t decision and taking action your responsibility?”
P.S. We all know that ORM tools (Hibernate, Linq, etc) solved impedance problem already, so the issue is no longer there! Just kidding.
September 16, 2010
The question if
Boolean is a legitimate SQL datatype is the source of long debates; witness the 10 years long discussion at Ask Tom, for example. Here is my perspective.
To begin with, what is Boolean datatype, and why it is ubiquitous in each and every procedural programming language? Mathematically, depending how you look into it, Boolean objects form a ring or lattice. Then, any algebraic system prompts a datatype. We have integers (of finite and infinite precision), floating point numbers (as bastardized version of reals), alphanumeric strings, complex numbers, vectors, matrices, and so on. Boolean is just yet another type among many (and not the one that is terribly hard to implement, either).
However, relational databases are different. Relations are predicates and the simplest form of predicate is a proposition. For example, the assertion “Employee ___ has been hired at ___” is a binary predicate, while “Employee SCOTT has been hired at 01/01/2010″ is a proposition. What values does a proposition have? Here, you have it: True and False. The relations whose values are True and False are sometimes known in database world under colorful names of
DUM, but practitioner would have hard time finding anything of resemblance in his favorite RDBMS. The takeout lesson here is that if we allow propositional values of relational attributes then, perhaps, why don’t we allow predicates as well? This leads us to the old idea of nested relations.
Let’s get back to earth, and borrow an example from the above Ask Tom discussion. Consider the Boolean attribute
IS_CREDIT_WORTHY of the relation
CUSTOMERS. This is sloppy database design: it is obvious that for credit agency this Boolean attribute is calculated, while genuine customer data contains credit score. Given a relation
CUSTOMER with attribute
CREDIT_LIMIT one can always make a view
create view CREDIT_WORTHY_CUSTOMERS as
select * from CUSTOMERS where CREDIT_LIMIT > 0
Therefore, I interpret all these pleas for Boolean datatype as “I’m too lazy (or impatient) to do proper database design, please give us Boolean shortcut. (After all, you have stuffed RDBMS with other things that don’t fit there, like Blob, XML and Java!)”. Customer is always right isn’t e?