Introduction to Nested Sets

There are many ways to perform hierarchical queries in SQL database, and the Nested Sets model is one of them. For our purpose, we’ll generalize it.

Consider the following tree example:

figure_5_2
Figure 1: A tree.

If we collect the leaf nodes at each tree branch into sets, we’ll get something like this:

figure_5_2b

Figure 2: Nested sets structure for the tree at fig. 1. Set elements are boxes, and sets are the ovals including them. Every parent set contains its children sets, for example the set {C,D} corresponding to the child node B is a subset of the set {C,D,E} corresponding to the parent node A.

This is the essence of the Nested Sets model: a tree node is a parent of another node whenever the corresponding sets are in the subset relation. One can invent many variations of the theme, for example, Nested Sets in Joe Celko’s interpretation are intervals of integers. Interval containment is much easier to check than subset relationship, which partially explains why you never heard of any other Nested Sets model than the one popularized by Joe. Yet, for the terminology sake, let’s brand our model as Generalized Nested Sets.

Adjacency Relation

Even though Nested Sets are pretty intuitive, it is the the Adjacency Relation that is crowned as the king of all the hierarchy models. Any hierarchy is a graph, and graphs are defined by its nodes and edges. The Adjacency Relation is just another term for the graph Edges. Recalling our example from figure 1, we have the following Adjacency Relation:

 Tail   Head 
 A   B 
 B   C 
 B   D 
 A   E 

This table looks deceptively simple, but querying it is not an easy matter! For example, a the grandparent of a node could be found with the following self-join query:

select a.head AS grandparent, b.tail AS grandchild
from Edges a, Edges b
where a.tail=b.head

Finding node’s great grand father is not a problem either. However, querying node’s ancestors to unlimited depth is not possible with basic SQL (which is standard Relational Algebra amended with aggregation). One way to get around the problem is transforming the Adjacency Relation to make it transitively and reflectively closed. Informally, we add the edges which are loops, for example A->A, and edges made of all adjacent edges joined together, for example, A->C which is a composition of A->B and B->C. Here is the result:

 Tail   Head 
 A   B 
 B   C 
 B   D 
 A   E 
 A   A 
 B   B 
 C   C 
 D   D 
 E   E 
 A   C 
 A   D 

A reader interested how to automatically maintain the transitive closure relation is referred to the work of Dong et.al. and/or chapter 6 of my book.

A typical query “find all the ancestors of node C” is no longer a problem, because we listed all the ancestor nodes in the transitive closure relation T. Expressed in SQL it is

select tail from T
where head = 'C'

Transitive Adjacency Relation and Nested Sets

There is another perspective into Transitive Adjacency Relation. Let’s rename the Tail column into the Set, and the Head column into the Element. Let’s also rearrange the rows, and transform the Element column values into the lower case, in order to emphasize the set structure:

 Set   Elem 
 A   a 
 A   b 
 A   c 
 A   d 
 A   e 
 B   b 
 B   c 
 B   d 
 C   c 
 D   d 
 E   e 

If you haven’t already noticed a Nested Set structure hidden in the table above, here are those sets written in traditional notation:

A = {a,b,c,d,e}
B = {b,c,d}
C = {c}
D = {d}
E = {e}

We have mentioned that in the Generalized Nested Sets model we perform hierarchical query via set containment, which is relational division. “find all the ancestors of node B (including itself)” is done in 2 steps:

1. Find all the elements of the set at node B. This would be

select elem from T
where set = 'B'

which evaluates into the following unary relation

 Elem 
 b 
 c 
 d 

It is natural to wonder if we are selecting and projecting onto the wrong columns. It appears as if we look for all the descendants of the node B! Hang on to step 2.

2. Divide the relation T(Set,Elem) by the relation obtained at step1, that is find all the sets that contain the set {b,c,d}. That would be the set {A,B}, which is the right answer.

Let’s summarize our finding:

Transitive Adjacency Relation divided by the node’s descendants produces node’s ancestors.

Symmetrically:

Transitive Adjacency Relation divided by the node’s ancestors produces node’s descendants.

5 Responses to “Nested Sets and Relational Division”

  1. Eric DeCosta Says:

    A typical query “find all the ancestors of node C” is no longer a problem, because we listed all the ancestor nodes in the transitive closure relation T. Expressed in SQL it is

    select head from T
    where tail = ‘C’

    This will only select the one self-referential node. The believe that in order to find all the ancestors of C, the correct query would be:

    select head from T
    where head = ‘C’

    Which would return {A,B,C}


  2. You are right as written the query looks for all the descendants (including the node itself) which should return one node only. To find ancestors we need to exchange the tail and head.

    In your version of the query you select the same column as you placed constraint, sure the output would be
    [C,C,C], and not [A,B,C]

  3. jeffrey04 Says:

    if I understand this correctly, to maintain the transitive closure relation, if I am given a tree as follows

    a->b
    b->c
    b->d
    a->e
    e->f
    f->g

    I know the additional transitive transitions as follows needs to be stored into the table

    a->c
    a->d
    a->f
    a->g

    so is it correct that I need to store this transition as well?

    e->g

    I like the simplicity of the queries (although it took me quite sometime to figure out how to do relation division) but both nested set and this transitive adjacency relation approaches seems to do a lot of background work when inserting and updating the tree. (but node/tree deletion seems simpler in transitive adjacency)

    thanks for sharing :D

    • jeffrey04 Says:

      just realize that insertion is easier than deletion


    • In your example a join of e->f, f->g is e->g, therefore it also belongs to transitive closure. It appears that you are confused by the fact that graph is a tree in which the root node “a” is distinguished. There is nothing special about node “a” when the tree is viewed as a graph!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: