## User Defined Operations, Nested Relations, and Sorting

In this chapter QBQL programming is advanced in several directions. First, we introduce user defined operations, which, among other things, demote inner join and outer union from being explicitly implemented within QBQL kernel. As a side effect, we deprecated the “*” and “+” notation — using arithmetic symbols operating over anything but numbers was just bad programming taste. Now, Algebra A inspired syntax for user-defined operations should appeal to Tutorial D enthusiasts. Our second theme is nested relations, which allows alternative expression for set joins. The final venue is getting relational insight into ubiquitous programming exercises such as finding maximum element in a set and sorting.

User Defined Operations

User defined operation is just an expression introducing a new operation in terms of built-in primitives and, perhaps, some other user defined operations. For example, the two assertions

```x <AND> y = x ^ y.
x <OR> y = (x ^ (y v R11)) v (y ^ (x v R11)).
```

have two binary infix operations `x <AND> y` and `x <OR> y` on the left hand side. Syntactically, all user defined operations are angle bracket delimited alphanumeric strings; please refer to QBQL grammar for exact syntax. Assuming that QBQL didn’t come across definitions of these operations before, it would interpret the above two code fragments as definitions in terms of operations at the right hand side: the natural join `^` and inner union `v` built-in operations.

In earlier versions of QBQL all operations, including `x <OR> y`, were built-in. Now that we have demoted some of them to the rank of user-defined ones, QBQL kernel becomes more succinct. The other benefit is that we can try alternative definitions and choose the best performer. For example, definition leveraging De Morgan law

```x <OR> y = (x' ^ y')'.
```

is more intuitive, but 70% slower.

What about unary operations, can these be user defined as well? One may be tempted to introduce an operation of emptying relation content, that is getting relation header:

```(<Header> x ) = x ^ R00.
```

Parenthesis at the left hand side of definition are just implementation quirk: QBQL requires to wrap unary user-defined operations with them just to make the grammar less ambiguous. The utility of this is operation debatable; perhaps more useful might be binary projection operator

```(R <Project> A) = R v (<Header> A).
```

which is clearly inferior to inner union.

Let’s study more useful example. Newer versions of QBQL generalized implementation of the built-in complement operation `'`, which now can be applied to any predicate. As we’ll see later, defining the “greater than” predicate as complement of “less or equal”

```GT = LE';
```

is much more elegant than coding index functions in Java. However, the built-in complement is allowed to be applied only within safe (that is domain-independent) context. For example, the expression

```[lft rgt] 1 1 ^ LE';
```

where we join a single tuple binary relation `[lft rgt] 1 1` with predicate `LE'` is legal. In general, the complement of relation `x` is allowed to be joined with any relation which attributes are superset of that of `x`. This appears to be quite restrictive, but we will see that QBQL query transformation engine gets around this limitation, so that we’ll be able to define quite sophisticated predicates like this one

```GE = LE' v [lft=rgt];
```

Here we have unioned two infinite objects: a complement to “less or equal” and equality predicates!

A complement operation in unsafe context, for example, QBQL query “what all the names which are not Smith”

```([name] Smith)';
```

throws an exception. Compare this to SQL world, where the safety problem is resolved by replacing unsafe unary complement operation with binary safe operations of set difference and, more general, antijoin. Unlike SQL, QBQL is focused both on programming and research, and unary complement operation is superior from theoretical perspective. Even if it were possible to express De Morgan law in terms of antijoin, it would hardly be reminiscent of the familiar succinct Boolean algebra equality. Now that we have limited the scope of complement operation, is it still possible to express DeMorgan law?

We introduce unsafe complement as user defined operation:

```(<NOT> x) = (R11 v x) ^ x'.
```

Here, the left hand side introduces unary prefix operation `<NOT> x` in terms of built in complement within the safe context. It is the relation `R11` that indicates that the whole expression is potentially domain-dependent. This the idea: the domain dependence is conveyed via explicit reference to domain dependent relations such as `R11` (and `R10`).

Here is DeMorgan law in new notation:

```(<NOT> x) <OR> (<NOT> y) = (<NOT> (x <AND> y)).
```

Nested Relations

Relational model is agnostic about the nature of attribute values; those can be anything, including relations themselves. QBQL syntax requires nested relations be demarcated with parenthesis, for example

```x = [name set]
"{0}" ([m] 0)
"{0,1}" ([m] 0 1)
"{2}" ([m] 2) ;
```

defines relation `x` with “scalar” attribute `name` and “composite” attribute `set`. Parenthesis are mandatory even in the case where nested relations are refered by name, rather than inlined:

```m0 = [m] 0;
m01 = [m] 0 1;
m2 = [m] 2;
x = [name set]
"{0}" (m0)
"{0,1}" (m01)
"{2}" (m2) ;
```

without them

```x = [name set]
"{0}" m0
"{0,1}" m01
"{2}" m2);
```

the `m0`, `m01`, and `m2` are interpreted as ordinary string literals (remember, the double quote symbols around literals are optional; and designed to wrap “complex” literals like `"{0,1}"`).

QBQL has built-in set join operations, but nested relations provide an alternative. For example, set equality join (aka relational division) can be expressed as

```x = [lset lft]
"{0}" ([m] 0)
"{0,1}" ([m] 0 1)
"{2}" ([m] 2) ;
y = [rgt rset]
([m] 0 1) "{0,1}"
([m] 0 2) "{0,2}";
(x ^ y) /^ [lft=rgt];
```

which outputs

```(x^y)/^[lft=rgt]=[lset rset] {0,1} {0,1} ; ```

Formally speaking, we have used set intersection join, but it is really a clever shortcut for natural join projected to distinct attributes:

```x /^ y = (x^y) v (R00 ^ (y v x))`.
```

Next, set containment join query

```x = [lset lft]
"{0}" ([m] 0)
"{0,1}" ([m] 0 1)
"{2}" ([m] 2) ;
y = [rgt rset]
([m] 0 1) "{0,1}"
([m] 0 2) "{0,2}";
(x ^ y) /^ LE;
```

produces

```(x^y)/^LE=[lset rset] {0,1} {0,1} {0} {0,1} {0} {0,2} {2} {0,2} ; ```

The `less-or-equal` predicate implementation is found in QBQL library:

```public class LE {
public static Relation lft_rgt( Object lft, Object rgt ) {
if( !(lft instanceof Comparable) || !(rgt instanceof Comparable) )
throw new AssertionError("LE !lft/rgt instanceof Comparable");
Comparable l = (Comparable) lft;
Comparable r = (Comparable) rgt;
if( l.compareTo(r)<=0 )
return Database.R01;
else
return Database.R00;
}
}
```

It required the values `lft` and `rgt` to be comparable, and relations are declared to be comparable by the virtue of relational lattice order (which generalizes set containment aka subset relation).

Set intersection join can be expressed as well

```x = [lset lft]
"{0}" ([m] 0)
"{0,1}" ([m] 0 1)
"{2}" ([m] 2) ;
y = [rgt rset]
([m] 0 1) "{0,1}"
([m] 0 2) "{0,2}";
(x ^ y) /^ IX;
```

with the output

```(x^y)/^IX=[lset rset] {0,1} {0,1} {0,1} {0,2} {0} {0,1} {0} {0,2} {2} {0,2} ```

where the set intersection predicate `IX` implementation is

```public class IX {
public static Relation lft_rgt( Object lft, Object rgt ) {
if( !(lft instanceof Relation) || !(rgt instanceof Relation) )
throw new AssertionError("LE !lft/rgt instanceof Relation");
Relation l = (Relation) lft;
Relation r = (Relation) rgt;
return Relation.union(Relation.join(l,r),Database.R00);
}
}
```

`Finding Maximum`

Sorting algorithms are at the hart of procedural programming. Does Relational theory suggest any insight beyond pedestrian concept of `order by` clause in SQL? Before engaging into sorting, let’s examine much simpler problem of finding maximal element. We have explored set joins and aggregates already, however, querying maximum could be expressed without these tools. So, given an unary relation of totally ordered elements (numbers)

```x = [t] 1 3 5 17;
```

together with binary comparison relation `LE`, which we have introduced in the previous section, what relational algebra expression returns the maximum? Writing a query in any language is akin to solving a miniature mathematical problem, and mathematical problems are often solved by divide-and-concur method. For our purposes the `LE` predicate seems to be too overwhelming; for one thing we can’t even print its content! If we restrict the scope of the `LE `predicate to the domain of numbers in the relation `x`, then we don’t seem to loose anything, and yet we get a handy little relation to ponder:

```lex = ((x /^ [t=lft]) ^ (x /^ [t=rgt])) ^ LE;
lex;
```

This code produces the following QBQL output:

```lex=[lft rgt] 1 17 1 1 1 3 1 5 17 17 3 17 3 3 3 5 5 17 5 5 ;```

Under close examination it looks like the set maximum (that is `17`) on the right side is linked to all the numbers in `x`. There we have it — set equality join (aka relational division):

```(x /^ [t=lft]) /=  lex;
```

which outputs

```(x/^[t=lft])/=lex=[rgt] 17 ;```

as expected.

At this point a reader might be protesting: “Wait, all that cryptic code to find a maximum? A trivial loop with an if statement? Really?” Hold on, we’ll get into sophisticated stuff.

Next, aggregation with grouping “find maximum in each group” for

```x = [grp elem]
a    1
a    3
a    5
b  17
b  20;
```

Let’s try the previous query (with obvious attribute renaming to match changed attribute names in the new input):

```lex = ((x /^ [elem=lft]) ^ (x /^ [elem=rgt])) ^ LE;
(x /^ [elem=lft]) /=  lex;
```

Its empty output caught me by surprise, because I was expecting the query to scale up without any code modification. Moreover the output relation have one less attribute, which hints that, perhaps, renaming the group name attribute before performing set equality join might help. Indeed

```(x /^ [elem=lft] /^ [grp=g]) /=  lex;
```

produces

```(x/^[elem=lft]/^[grp=g])/=lex=[g grp rgt] a a 5 b b 20 ; ```

We solved the new challenge with smaller modifications of the original maximum query compared to procedural programming. Still not convinced?

Let’s query maximum in a set which is not totally ordered. From relational lattice theory we know that relations themselves are such objects (with the ordering being generalization of set containment); so why don’t we try nested relations from the previous section? Here is the unmodified original query applied to a new input:

```A=[p]
1
;
B=[p]
1
2
;
C=[q]
a
;
D=[q]
a
b
;
E=[p]
2
;
x = [t] (A) (B) (C) (D) (E);
lex = ((x /^ [t=lft]) ^ (x /^ [t=rgt])) ^ LE;
(x /^ [t=lft]) /=  lex;
```

with the empty output. In this case the query is wrong because looking for elements which are greater or equal than all the others would not work for partially ordered sets. We should have queried all the elements for which no greater elements exists. Let’s introduce two new relations

```GE = LE' v [lft=rgt];
gex = ((x /^ [t=lft]) ^ (x /^ [t=rgt])) ^ GE;
```

Here we abused the relation naming convention a little: the “GE” name indicates that this relation has something to do with “greater or equal”, but this is true only for total orders. The second point, that we teased a reader in previous section, is worth repeating. Let’s reflect on the fearless definition of the `GE` predicate. It is complement of some predicate unioned with some other predicate. QBQL has no facilities to perform this sequence of operations verbose. When `GE` is joined with some relation, however, QBQL would try to apply Spight distributivity criteria and rewrite a query with join over union into union over join.

With amended comparison relation our query

```(x /^ [t=rgt]) /=  gex;
```

returns

```(x/^[t=rgt])/=gex=[lft] ([p] 1 2 ) ([q] a b ) ;```

as expected.