Introduction

SQL proved to be quite resilient language to dethrone. It occupies a narrow niche so that it is generally not even listed among general-purpose programming languages. This artifact originated after relational database theory inventor — E.Codd — who put emphasis on database access sublanguage in order to avoid unnecessary competition with mainstream programming languages.

One of the reasons why SQL remains nearly ubiquitous for so long is that relational theory didn’t progress much after its glory days. Relational Algebra and Relational Calculus remain the state of the art, so that any new language based on them wouldn’t suggests any new insights (Darwen&Date Tutorial D notwithstanding).

In this article we introduce new general purpose programming language — QBQL (cubicle, quasi boolean query language?) QBQL advances several new ideas. Its foundation is not Relational Algebra but Relational Lattice. This results in succinct language which is friendly with query optimizations. Constraints (which are known in programming language community as assertions) are naturally expressed as Relational Lattice identities and inequalities. QBQL also generalizes finite relations into predicates. Each predicate is defined by its indexes which are procedural functions; contrast it with OOP perspective where object behaviour is defined through methods.

Relations and Operations

Data in Relational Lattice are organized as relations, which are explicitly defined as finite sets of tuples, or predicates which could also be viewed as relations with potentially infinite content. In this section we focus exclusively on relations, which QBQL syntax

People = [name    sex]
          max     male
          peter   male
          joe     male
          claire  female
;

is reminiscent of the database table import/export CSV format. There are two more pragmatic reasons why to prefer it over standard set-theoretic notation

People = {
  <name=max,sex=male>,
  <name=peter,sex=male>,
  <name=joe,sex=male>,
  <name=claire,sex=female>;
};

First, there is only one empty set, while relational theory needs more than one empty relation. As a consequence, set-theoretic notation can’t express an empty relation. Second, QBQL outputs queries and relations in table form; it is a matter of convenience cutting-and-pasting them back into QBQL input worksheet. After switching to table syntax a year or so ago, I’ve never looked back to set-theoretic syntax and, eventually, deprecated it.

Technically, both expressions define named relations; they are assignments of a relational value to relational variable People. Any relational expression can use either (previously defined) relational variable, or inline their defintitions. For example, given the two relations

Pets=[nickname  specie]
      folly    dog
      scruffy  cat
;

and

Fed=[name  nickname  time]
     claire  folly   200
     max   folly    200
     max   folly    300
     max   scruffy  200
;

the expressions

Pets ^ Fed;

and

[nickname  specie]
      folly  dog
      scruffy  cat ^ Fed;

produce the same output

Pets^Fed=[name nickname specie time]
claire folly dog 200
max folly dog 200
max folly dog 300
max scruffy cat 200
;

The caret symbol ^ is the [natural] join operation. Join in Relational Lattice remains identical to that of Relational Algebra. Unlike Relational Algebra, however, Relational Lattice generalizes the set union operation to be applied to any pair of relations. In our example

Pets v Fed;

outputs

PetsvFed=[nickname]
folly
scruffy
;

Much of the Relational Lattice theoretic work exploits the duality between lattice meet ^ and join v. The fact that database join in lattice terminology is a meet, and not a join, is unfortunate and confusing, but has no effect on QBQL programmatic capabilities. Most of other Relational Algebra operations are expressed in terms of [natural] join and [generalized] union, e.g. projection of Fed onto the set attributes {time} is expressed as

Fed v [time];

which outputs

Fedv[time]=[time]
200
300
;

Lattice operations enjoy many algebraic properties, and one of them — symmetry — has immediate consequence for QBQL programming. The alternative way to express projection is

[time] v Fed;

The other property — associativity — allows us to omit parenthesis in a chain of joins or chain of unions, e.g.

Pets ^ Fed ^ People;  

QBQL program consists of a sequence of either assignments or constraints. For this tutorial we’ll limit our scope to identity constraints which are syntactically equations with trailing dot. For example,

[time] v Fed = Fed v [time].

asserts the algebraic symmetry property that we had mentioned above. QBQL checks every assertion and doesn’t output anything unless the constraint is violated.

Predicates

A system that works with finite relations is quite limited, this is why QBQL allows infinite relations that is predicates. For example, it is useful to have the predicate Integers that would generalize its finite incarnations, such as the relation

IntegersLessThanTen =  [num]
0 1 2 3 4 5 6 7 8 9;

Arguably, ternary relations corresponding to integer arithmetic operations would be even more useful

FiniteDivision =  [numerator denominator fraction]
2 1 1
4 2 2
6 3 2
6 2 3
8 4 2;

Naturally predicates are no longer defined as sets of tuples. The key idea is to define predicates via a set of functional dependencies. Each functional dependency (which abusing the language we’ll call just a function) has to have a procedural implementation. QBQL is implemented in java, therefore it is convenient to have functions coded as java methods. By convention when QBQL finds an expression containing undefined relational variable (that is a name which syntactically resolves to relation or predicate) it assumes it to be a predicate. Then, it looks up for function implementations. For example, the statement

Division ^ [denominator] 4 ^ [fraction] 2;

has the Division relvar undefined. Therefore, QBQL assumes that the Division is a predicate defined in java class Division. Then QBQL invokes java reflection and lists all the Division methods. Let’s assume we implemented it as

public class Division {
    
    static int numerator_denominator_fraction( int den, int fra ) {
        return den * fra;
    }
    
    static int denominator_numerator_fraction( int num, int fra ) {
        return num / fra;
    }

}

Since java arguments are not named but positional, and named perspective is paramount for relational databases, we had to encode argument names into function name. In our example, the two methods numerator_denominator_fraction and denominator_numerator_fraction encode functional dependencies

{denominator,fraction}->{numerator}
{numerator,fraction}->{denominator}

Why did we define predicates this way, and, more important, how do we possibly relationally join or union predicates? This is the theme of the next section.

Architecture: System R meets Relational Lattice

The central optimization idea pioneered by System R was combinatorial enumeration of all possible join orders and comparing them based on their cost. For example, the join of three relations x, y and z

x ^ y ^ z

can be evaluated in one of 12 possible ways

(x ⋈ y) ⋈ z        z ⋈ (x ⋈ y)
(y ⋈ x) ⋈ z        z ⋈ (y ⋈ x)
(y ⋈ z) ⋈ x        x ⋈ (y ⋈ z)
(z ⋈ y) ⋈ x        x ⋈ (z ⋈ y)
(x ⋈ z) ⋈ y        y ⋈ (x ⋈ z)
(z ⋈ x) ⋈ y        y ⋈ (z ⋈ x)

Each join can be evaluated with different join method, but we are primarily interested in only one — Nested Loops. The name of the method reflects the algorithm of joining relation x with y. For each tuple of x Nested Loop algorithm scans the second relation y, finds the matching tuples, and combines them into the result tuples. This is clearly quite inefficient, the amended version of the algorithm called Indexed Nested Loops leverages an index on the relation y so that it doesn’t have to fully scan it.

The Indexed Nested Loop algorithm provides us with critical idea how to calculate join of relation with predicate. Consider evaluating x ^ y where x is a predicate and y is a relation. If we evaluate it literally in the same order as written, then no matter what algorithm we use — indexed or vanilla nested loops — we wold face an insurmountable obstacle of scanning the whole x, which potentially contains infinite number of tuples! In other words, we fail on the very first step, let alone trying to somehow perform the second nested loop phase. Therefore, the only feasible alternative is to scan finite relation y first. The second phase depends upon which variety of the algorithm — indexed or vanilla nested loops — we chose. Again, the vanilla nested loops won’t work because we cant scan predicate. Therefore the indexed nested loops is our the only option. Yet, what may serve the role of predicate index? It is the predicate defining functions that we’ll leverage as indexes.

Let walk through the evaluation of our earlier predicate expression

Division ^ [denominator fraction] 4 2;

The Division is a predicate, therefore the relation [denominator fraction] 4 2 is evaluated first. It contains a single tuple which is supposed to be matched against the predicate. QBQL finds the Division class, reflects upon its methods, and derives that the output relation must contain the set of attributes {numerator,denominator,fraction}. Out of three values of the output tuple we know the two: denominator=4 and fraction=2. The missing third value is calculated via “index access path” by invocation of the numerator_denominator_fraction() method.

“Hello World” in QBQL

OOP groups methods into objects. QBQL associates them into predicates. It is instructive to take a ubiquitous Java class, such as java.lang.String and check if its methods aggregate into anything meaningful in QBQL. Unfortunately, we can’t do this without applying little thought, because objects contain state, and QBQL operate stateless (that is static java) methods. Further analysis evidences that the String functionality is captured into two predicates Cat and Substr, which we can illustrate with their finite reincarnations:

FiniteCat = 
Hello 1 H ello
Hello 2 He llo
Hello 3 Hel lo
World 1 W orld
World 2 Wo rld
;
FiniteSubstr = 
Hello 1 2 e
Hello 1 3 el
Hello 1 4 ell
World 1 2 o
World 1 3 or
;

Relational programming is about expressing one relation or predicate in terms of the others, and in this case we have

Substr = 
( Cat(source,from,prefix,postfix) 
^ Cat(postfix,to,fragment,dummy) ) 
v ; 

Here we used comma separated lists of arguments to denote argument renaming. In the first case the canonical set {source from prefix postfix} of the Cat arguments is renamed into {source,from,prefix,postfix}, while in the second case it is replaced with {postfix,to,fragment,dummy}. (Aside: Relational lattice allows to represent renaming in terms of fundamental operations join and union together with equality predicate, although this observation is too weak to capitalize upon).

The Cat java definition is straightforward:

public class Cat {

    public static String prefix_source_from( String source, int from ) {
        return source.substring(0,from);
    }

    public static String postfix_source_from( String source, int from ) {
        return source.substring(from);
    }

}

Now we can run some QBQL queries:

Cat ^  Hello 3;

which outputs

Cat^[sourcefrom]Hello3=[from postfix prefix source]
3 lo Hel Hello
;

while

Cat ^  Hello World ^ [from] 3;

outputs

Cat^HelloWorld^[from]3=[from  postfix  prefix  source]
                                3  ld  Wor  World
                                3  lo  Hel  Hello

The fact that it worked in the second case is remarkable because we didn’t specify the order of evaluation. Unlike System R, QBQL doesn’t evaluate all join orders (let alone cost them). It evaluates the first legitimate permutation of joins which doesn’t generate an intermediate infinite relation. In the second case, we were just fortunate that it evaluated the expression in this order

Cat ^ ( Hello World ^ [from] 3);

as opposed to

(Cat ^  Hello World) ^ [from] 3;

The latter fails because there is no appropriate index!

Lets check some constraints

Cat ^ [] = [from  postfix  prefix  source].

which asserts that the atrribute set of the Cat predicate is the set {from,postfix,prefix,source}. Here the [] is the empty relation with empty set of attributes (aka DUM in D&D terminology).

Conclusion and Further Directions

QBQL began as relational theory experimentation tool, and it looks promising to turn into a general purpose programming language for wider acceptance. However, there are several implementation holes to be patched. Join between two predicates is not implemented yet. For example, the following program (“Find all the occurencies of the substring o inside the set of strings {Hello,World}“)

Substr = (Cat(source,from,prefix,postfix) ^ Cat(postfix,to,fragment,dummy)) v ;

(substr ^  Hello World ^ [fragment] o) v [from];

would fail to execute. This is not a fundamental flaw: certainly a complex predicate which is a join (or even more complex expression) can be defined via composition of functions. Alternatively, complex predicates can be named and supplied with their own indexes, so that the above expression can be rewritten as constraint

Substr = (Cat(source,from,prefix,postfix) ^ Cat(postfix,to,fragment,dummy)) v .

More important is interpreation of recursion. Relational theory contains somewhat ad-hock definition of recursion — transitive closure — and the challenge is to equip QBQL with transitive closure in a natural fashion. An obvious QBQL candidate for recursive closure is an equation with a relvar on both sides, so that transitive closure is a relation value that satisfies it. Yet, we after not any solution but the “least fixed point” one, and it is unclear how to express it formally.

2 Responses to “Relational Programming with QBQL”

  1. Eric Says:

    Fascinating! I look forward to seeing further QBQL developments.
    (Btw, I noticed a small typo: “let along” should be “let alone”.)

  2. Luke Bradley Says:

    Congratulations on your work with this, its a brilliant idea. I work as a web developer, which basically always consists of mapping databases to web forms, using no less than 5 languages. PHP, SQL, HTML, CSS and Javascript. Having a language which can function as database and programming language would really be useful, especially if its expressive enough to deal with the tree structures of well formed XHTML pages, thus combing three languages to one.
    I’m also impressed by how mathematically expressive this language could be. It always shocks me how hard it is to implement any but the most basic mathematical concepts in programming languages, without something like MATLAB or Mathematica. Keep up the good work.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: