Renaming is an intuitive operation, although somewhat obscure from mathematical perspective. Indeed, mathematics don’t make much ado about renaming variables: it is generally assumed that variable names are not important, and that variables themselves can be renamed at will (subject to common-sense restrictions). It is Relational Algebra that elevated renaming into first-rank operation. There renaming is both explicit and honors several query transformation laws. This article casts Relational Lattice perspective into renaming.

Basic Renaming properties

Relational algebra wikipedia web page lists the following renaming properties

\rho_{a / b}(\rho_{b / c}(R)) = \rho_{a / c}(R)\,\!
\rho_{a / b}(\rho_{c / d}(R)) = \rho_{c / d}(\rho_{a / b}(R))\,\!

Let’s rewrite these equalities in Relational Lattice / QBQL terms:

(R /^ "a=b") /^ "b=c" = R /^ "a=c".
(R /^ "a=b") /^ "c=d" = (R /^ "c=d") /^ "a=b".

Here the /^ is set intersection join operation, and "a=b" is an identity relation with attributes a and b. Set intersection join is a natural join projected into the set of distinct relation attributes. QBQL queries in the other articles at this site should convince the reader that set intersection join is at least as common and important as natural join. Set intersection join resembles composition operation from the Algebra of Dyadic Relations, hence we’ll carry it over to our field (i.e. Algebra of Relations with Named Attributes).

Let’s leverage commutativity of composition, and rewrite the second equality into the form that would make the issue more transparent:

("a=b" /^ R) /^ "c=d" = "a=b" /^ (R /^ "c=d").

A similar trick with the first equality and leveraging the identity

"a=b" /^ "b=c" = "a=c".

demonstrates that both properties of renaming operation are essentially associativity laws of composition in disguise. Unfortunately, composition doesn’t honor associativity in general.

Conditional Associativity

If the three relations x, y and z have no attributes in common, then composition is associative. Formally:

R00 ^ ((x v y) v z) = R00 -> x /^ (y /^ z) = (x /^ y) /^ z.

where R00 is the empty relation with empty set of attributes. This is exactly what is needed to prove renaming properties. In the first case, the identity relations "a=b" and "b=c" have common attribute b, but relation R is not supposed to have it. In the second case, the identity relations "a=b" and "c=d" don’t have any common attributes to begin with.

Let’s try establishing conditional associativity formally: could it be proved from relational lattice axioms? We have general lattice axioms

x ^ y = y ^ x.
(x ^ y) ^ z = x ^ (y ^ z).
x ^ (x v y) = x.
x v y = y v x.
(x v y) v z = x v (y v z).
x v (x ^ y) = x.

amended with three axioms borrowed from the Litak et.al. paper

x ^ (y v z) = (x ^ (z v (R00 ^ y))) v (x ^ (y v (R00 ^ z))).
(x ^ y) v (x ^ z) = x ^ ( ((x v z) ^ y) v ((x v y) ^ z) ).
(R00 ^ (x ^ (y v z))) v (y ^ z) = ((R00 ^ (x ^ y)) v z) ^ ((R00 ^ (x ^ z)) v y).

Next, we need to introduce universal relation R11 captured with laws of Fundamental Decomposition Identity and its “dual”

x = (x ^ R00) v (x ^ R11).
R00 ^ (x v R11) = x ^ R00.

and inversion defined via

x` ^ x = R11 ^ x.
x` v x = R00 v x.

Now we have everything for defining composition:

x /^ y = (R00 ^ (x` ^ y`)) v (y ^ x).

Unfortunately, proving conditional associativity doesn’t turn to be a simple matter of copying and pasting these axioms and the goal into Prover9. The problem, at least in this form, seems to be too challenging for automated prover abilities. Mace4 established that the conditional associativity is valid for the domains up to size 20, at least.

The reader might wonder what was the purpose of this exercise; aren’t renaming rules obvious to begin with? Yes, they are. However, we learned several lessons here. From relational algebra perspective renaming rules (and others) are essentially ad-hoc. In one of the earlier papers we have already established that rewrite rules such as push-select-through-project can be firmly rooted into relational lattice foundation. We see that there is nothing special about renaming rules as well: the crux of the matter is associativity of some other operation. Knowing that renaming is essentially join with an identity relation and projecting away unwanted attribute, one may have suggest that properties of identity relation might be important too. This may have an implication for relational lattice theory, as perhaps we should add identity relation axioms as well. However, we have witnessed what really matters is the condition for the three relation not to have common attribute(s).


Distributivity properties

The rest of renaming properties

\rho_{a / b}(R \setminus P) = \rho_{a / b}(R) \setminus \rho_{a / b}(P)
\rho_{a / b}(R \cup P) = \rho_{a / b}(R) \cup \rho_{a / b}(P)
\rho_{a / b}(R \cap P) = \rho_{a / b}(R) \cap \rho_{a / b}(P)

handles just as easily. Rewritten into relational lattice notation the above assertions become

"a=b" /^ (R ^ P') = ("a=b" /^ R) ^ ("a=b" /^ P').
"a=b" /^ (R v P) = ("a=b" /^ R) v ("a=b" /^ P).
"a=b" /^ (R ^ P) = ("a=b" /^ R) ^ ("a=b" /^ P).

In relational lattice we have the following analog of Spight Distributivity Criteria (SDC):

R00 ^ (x v y) = R00 ^ (x v z) -> x /^ (y v z) = (x /^ y) v (x /^ z).

R and P have the same set of attributes, hence they satisfy the premise of SDC, and distributivity of composition/renaming over union follows. Again, we see that the nature of the identity relation "a=b" is irrelevant.

The other two identities reduce to distributivity of composition/renaming over join. Again, this is conditional property. It is easy to see that general distributivity of composition over join is invalid with relations on both sides of equality failing to have the same set of attributes! Run QBQL with formal assertion that distributivity has relations with the same set of attributes on each side of the equality, that is

(x /^ (y ^ z)) ^ R00 = ((x /^ y) ^ (x /^ z)) ^ R00.

and witness QBQL producing the following counterexample

z = [q]
     a
;
y = [p]
     1
;
x = [p]
     1
;

Previously cited wikipedia article claims modest distributivity of renaming over intersection, rather than more general distributivity of renaming over join. Both arguments of intersection operation having the same set of attributes in relational lattice terms is formal equality

y ^ R00 = z ^ R00.

which is stronger than the previous assertion that we refuted with QBQL. Neither of those premises fixes the distributivity law that we are after; we have to add the familiar condition that the three relations in questions have no attributes in common. Formally:

R00 ^ (x v y v z) = R00 &
(x /^ (y ^ z)) ^ R00 = ((x /^ y) ^ (x /^ z)) ^ R00 ->
x /^ (y ^ z) = (x /^ y) ^ (x /^ z).

Advertisements

7 Responses to “The Laws of Renaming”

  1. AntC Says:

    Hi Vadim: something you don’t cover, but I wonder how QBQL deals with it?:
    natural join of two relations with same attribute name but at different types.
    (This makes natural join not a total function, so breaks the lattice structure?)

    AntC


  2. QBQL is type agnostic. Otherwise, IMO type should always be a part of attribute name. Therefore, attributes with the same name and different types never match in natural join.

    • AntC Says:

      Your piece on ‘Independence of Relational Operations’ got me thinking:
      * the relational lattice is a full meet-and-join lattice.
      * so Natural join is a (partial) ordering all the way from R01 to R10.
      * You define relational complement equationally in terms of lattice meet and join.
      So is it possible to define Inner union equationally as the inverse of Natural Join?
      (That is, as an ordering that is upside-down relative to Natural Join.)

      Then we only need a single primitive operation(?)

      (BTW, apologies for posting on this thread: my browser isn’t liking wordpress’s comment boxes. This seems to be the only way I can post something.)


      • Let check your idea against simpler structure — Boolean algebra. Although, there is axiomatization with single operation, there is no axiomatization with single lattice join (or union). Likewise, for Tarski relation algebras there exist axiomatization with single operation, but this operation is not as intuitive as basic ones. For relational lattice it is unknown if there is such an operation (and it certainly has to be something more general than natural join).

  3. AntC Says:

    Thank you Vadim. So what ‘goes wrong’?
    Is NatJoin not well-defined for some relation values?
    Is Inner Union not well-defined?
    Is Inner Union not the ‘upside down’ of NatJoin?
    Does it help to make controlling assumptions — for example that the number of attributes in the schema’s Universe is finite, or that the domains are all finite?
    Does the lack of axiomisation make a difference to the practical inferencing you need for a DBMS?
    (Perhaps we can ‘make do’ with some weaker characterisation of the lattice space?)


    • Yes, inner union is “upside down” to natural join, and this connection is both well defined and formal (lattice duality). Regarding database practice and other applications, it is not clear yet what full implication of this fact is. Even when you have nice theory (think of Tarski relation algebras) its application success is not guaranteed. In terms of theoretical development relational lattice is decades (if not full century) behind relation/fork algebras. Speaking of relation/fork algebras, one of their latest development is algebraic formalization of database dependency theory (J.N.Oliveira) — pretty impressive.

      • AntC Says:

        Thanks Vadim. I’m finding that lattice duality and equational reasoning with partial ordering is challenging enough!
        Re Oliveira, I’m familiar with ‘Pointfree’ from functional programming (especially Haskell), but Relation Algebra (Tarski) is more mind-twisting.
        BTW I see that Oliveira makes frequent reference to Haskell’s so-called Functional Dependencies for type-level inference. This was perhaps legitimate in 2005, but not these days (IMO). Haskell FunDeps are not really comparable to relational database FDs, because they work across patterns not individual values. So they have always had their problems. (The most common symptom is undecidability and/or incoherence of type inference.) Starting around 2006, Haskell has introduced type-level functions, which are much more tractable.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: