**Introduction to Nested Sets**

There are many ways to perform hierarchical queries in SQL database, and the Nested Sets model is one of them. For our purpose, we’ll generalize it.

Consider the following tree example:

If we collect the leaf nodes at each tree branch into sets, we’ll get something like this:

*Figure 2: Nested sets structure for the tree at fig. 1. Set elements are boxes, and sets are the ovals including them. Every parent set contains its children sets, for example the set {C,D} corresponding to the child node B is a subset of the set {C,D,E} corresponding to the parent node A.*

This is the essence of the Nested Sets model: *a tree node is a parent of another node whenever the corresponding sets are in the subset relation*. One can invent many variations of the theme, for example, Nested Sets in Joe Celko’s interpretation are intervals of integers. Interval containment is much easier to check than subset relationship, which partially explains why you never heard of any other Nested Sets model than the one popularized by Joe. Yet, for the terminology sake, let’s brand our model as *Generalized Nested Sets*.

**Adjacency Relation**

Even though Nested Sets are pretty intuitive, it is the the *Adjacency Relation* that is crowned as the king of all the hierarchy models. Any hierarchy is a *graph*, and graphs are defined by its *nodes* and *edges*. The Adjacency Relation is just another term for the graph Edges. Recalling our example from figure 1, we have the following Adjacency Relation:

Tail | Head |

A | B |

B | C |

B | D |

A | E |

This table looks deceptively simple, but querying it is not an easy matter! For example, a the grandparent of a node could be found with the following self-join query:

`select a.head AS grandparent, b.tail AS grandchild`

from Edges a, Edges b

where a.tail=b.head

Finding node’s great grand father is not a problem either. However, querying node’s ancestors to unlimited depth is not possible with basic SQL (which is standard Relational Algebra amended with aggregation). One way to get around the problem is transforming the Adjacency Relation to make it *transitively* and *reflectively* closed. Informally, we add the edges which are loops, for example A->A, and edges made of all adjacent edges joined together, for example, A->C which is a composition of A->B and B->C. Here is the result:

Tail | Head |

A | B |

B | C |

B | D |

A | E |

A | A |

B | B |

C | C |

D | D |

E | E |

A | C |

A | D |

A reader interested how to automatically maintain the transitive closure relation is referred to the work of Dong et.al. and/or chapter 6 of my book.

A typical query “*find all the ancestors of node C*” is no longer a problem, because we listed all the ancestor nodes in the transitive closure relation T. Expressed in SQL it is

`select tail from T`

where head = 'C'

**Transitive Adjacency Relation and Nested Sets**

There is another perspective into Transitive Adjacency Relation. Let’s rename the *Tail* column into the *Set*, and the *Head* column into the *Element*. Let’s also rearrange the rows, and transform the *Element* column values into the lower case, in order to emphasize the set structure:

Set | Elem |

A | a |

A | b |

A | c |

A | d |

A | e |

B | b |

B | c |

B | d |

C | c |

D | d |

E | e |

If you haven’t already noticed a Nested Set structure hidden in the table above, here are those sets written in traditional notation:

A = {a,b,c,d,e}

B = {b,c,d}

C = {c}

D = {d}

E = {e}

We have mentioned that in the Generalized Nested Sets model we perform hierarchical query via set containment, which is relational division. “*find all the ancestors of node B (including itself)*” is done in 2 steps:

1. *Find all the elements of the set at node B*. This would be

`select elem from T`

where set = 'B'

which evaluates into the following unary relation

Elem |

b |

c |

d |

It is natural to wonder if we are selecting and projecting onto the wrong columns. It appears as if we look for all the **descendants** of the node B! Hang on to step 2.

2. *Divide the relation T(Set,Elem) by the relation obtained at step1*, that is find all the sets that contain the set {b,c,d}. That would be the set {A,B}, which is the right answer.

Let’s summarize our finding:

Transitive Adjacency Relation divided by the node’s descendants produces node’s ancestors.

Symmetrically:

Transitive Adjacency Relation divided by the node’s ancestors produces node’s descendants.

June 23, 2008 at 9:05 pm

A typical query “find all the ancestors of node C” is no longer a problem, because we listed all the ancestor nodes in the transitive closure relation T. Expressed in SQL it is

select head from T

where tail = ‘C’

This will only select the one self-referential node. The believe that in order to find all the ancestors of C, the correct query would be:

select head from T

where head = ‘C’

Which would return {A,B,C}

June 24, 2008 at 7:48 pm

You are right as written the query looks for all the descendants (including the node itself) which should return one node only. To find ancestors we need to exchange the tail and head.

In your version of the query you select the same column as you placed constraint, sure the output would be

[C,C,C], and not [A,B,C]

June 9, 2009 at 5:20 am

if I understand this correctly, to maintain the transitive closure relation, if I am given a tree as follows

a->b

b->c

b->d

a->e

e->f

f->g

I know the additional transitive transitions as follows needs to be stored into the table

a->c

a->d

a->f

a->g

so is it correct that I need to store this transition as well?

e->g

I like the simplicity of the queries (although it took me quite sometime to figure out how to do relation division) but both nested set and this transitive adjacency relation approaches seems to do a lot of background work when inserting and updating the tree. (but node/tree deletion seems simpler in transitive adjacency)

thanks for sharing :D

June 9, 2009 at 9:09 am

just realize that insertion is easier than deletion

June 10, 2009 at 11:12 pm

In your example a join of e->f, f->g is e->g, therefore it also belongs to transitive closure. It appears that you are confused by the fact that graph is a tree in which the root node “a” is distinguished. There is nothing special about node “a” when the tree is viewed as a graph!