One of the main problems with using a relational database such as MySQL is that it can be tricky to store and retrieve hierarchical information. Hierarchies are expressed with tree structures whereas relational databases use flat tables. So how can one fit this square peg into a round hole? Fortunately, there are several methods out there. Unfortunately most of these methods rely on recursion for either populating, updating, or deleting parts of tree. Check out this evolt article to learn more about the pitfalls of recursion (and even more ways to store hierarchical data). Here’s a quick overview of two methods which require recursion and one which never requires it.
Method 1: Parent/child
This is the most straightforward method.
| id | element | parent |
|---|---|---|
| 1 | fruit | null |
| 2 | orange | 1 |
| 3 | apple | 1 |
| 4 | granny-smith | 3 |
| 5 | red delicious | 3 |
Each row contains the name of the element, its unique id and the unique id of its parent element. Every child has one and only one parent but a parent may have any number of children.

A php function for converting such a table into a hierarchical tree could look like this:
< ?php require_once "dbconnect.php"; function show_hierarchy_1($rootid){ $query = "SELECT * FROM table WHERE parent '$rootid'"; $result = mysql_query($query); while ($row = mysql_fetch_array($result)){ echo $row[element]; echo ""; show_hierarchy_1($row[id]); } } ?>
Notice that this function is recursive. While dead-simple to understand, this function will use more and more processing resources as the database grows in size. There must be a better way. Enter method two.
Method 2: the modified preorder tree traversal algorithm
When I first read Sitepoint’s article about this, I was excited to find a method that did not require recursion to retrieve the hierarchy. Have a look-see:
| element | lt | rt |
|---|---|---|
| fruit | 1 | 10 |
| orange | 2 | 3 |
| apple | 4 | 9 |
| granny-smith | 5 | 6 |
| red-delicious | 7 | 8 |
Each element has a left value and a right value. The values are determined by traversing the tree starting from the left of the root element. This diagram can explain the process far easier than words can:

Now this is where the magic happens. A php function to retrieve the whole tree would look like this:
< ?php require_once "dbconnect.php"; function show_hierarchy_2($rootid){ $query = "SELECT * FROM table WHERE id = '$rootid'"; $result = mysql_query($query); while ($row = mysql_fetch_array($result)){ $query2 = "SELECT * FROM table WHERE lt > $row[lt] AND rt < $row[rt]"; $result2 = mysql_query($query2); while ($row2 = mysql_fetch_array($result2)){ echo $row2[element]; echo ""; } } } ?>
The simplicity of retrieving a hierarchy given a parent element’s left and right values is astounding. With a single query of the database, an entire tree is displayed, no recursion necessary. Unfortunately the modified preorder tree traversal algorithm has some potentially deal-breaking downsides. The first downside is the issue of populating a table with left and right values. Gijs Van Tulder (author of the Sitepoint article) provides such a method, given a table that already has parent/child values for each row.
The following is Tulder’s work:
< ?php function rebuild_tree($parent, $left) { // the right value of this node is the left value + 1 $right = $left+1; // get all children of this node $result = mysql_query('SELECT title FROM tree '. 'WHERE parent="'.$parent.'";'); while ($row = mysql_fetch_array($result)) { // recursive execution of this function for each // child of this node // $right is the current right value, which is // incremented by the rebuild_tree function $right = rebuild_tree($row['title'], $right); } // we've got the left value, and now that we've processed // the children of this node we also know the right value mysql_query('UPDATE tree SET lft='.$left.', rgt='. $right.' WHERE title="'.$parent.'";'); // return the right value of this node + 1 return $right+1; } ?>
Tulder writes:
The recursion makes this a fairly complex function to understand. However, this function achieves the same result we did by hand at the beginning of this section. It walks around the tree, adding one for each node it sees. After you’ve run this function, you’ll see that the left and right values are still the same (a quick check: the right value of the root node should be twice the number of nodes).
Remember, we want to avoid recursion. Recursion is slow. Furthermore, modifying the tree requires recursion. Tulder explains two methods:
The first option is simple. You use the adjacency list method for updating, and the modified preorder tree traversal algorithm for retrieval. If you want to add a new node, you just add it to the table and set the parent column. Then, you simply rerun the rebuild_tree() function. This is easy, but not very efficient with large trees.
The second way to add, and delete nodes is to update the left and right values of all nodes to the right of the new node.
Tulder’s second method may be very light on system resources in one circumstance, and very heavy in another. For example, adding an element whose parent is the root element will not spare much processing power.
And this brings us to the wonderful third way.
Method 3: David Chandler’s patent
David Chandler, apparently is a programmer. Google him around if you want. The important thing is that he devised a method of storing hierarchical data in a relational database where recursion is never needed.
You can read the patent here. (Requires free registration)
Here’s the table:
| element | L1 | L2 | L3 |
|---|---|---|---|
| fruit | 1 | 0 | 0 |
| orange | 1 | 1 | 0 |
| apple | 1 | 2 | 0 |
| granny-smith | 1 | 2 | 1 |
| red-delicious | 1 | 2 | 2 |
With this method, there will be a database column for each level of depth. The root element will only have the L1 column set. Its children will have the L1 and L2 columns set and their children will have the L1, L2 and L3 columns set and so on and so forth. If you have no idea how “deep” your hierarchy might be, this method may not be for you. Otherwise, keep reading.

And here’s a php script to display the whole tree:
< ?php require_once "dbconnect.php"; function show_hierarchy_3(){ $query = "SELECT * FROM table WHERE l1 != '0' ORDER BY l1, l2, l3, l4"; $result = mysql_query($query); while ($row = mysql_fetch_array($result)){ $level = 1; if ($row[l1] != '0' ){ $level = $level + 1; } if ($row[l2] != '0' ){ $level = $level + 1; } if ($row[l3] != '0' ){ $level = $level + 1; } echo str_repeat('___',$level); echo $row[element]; echo ""; } } ?>
Let’s say I want to add a child to the “orange” node:
< ?php require_once "dbconnect.php"; mysql_query("INSERT INTO `table` ( `element` , `l1` , `l2` , `l3` ) VALUES ( 'valencia', '1', '1', '1' );"); ?>

And that’s it. The tree was updated without recursion. Deleting nodes is very similar to adding, just use DELETE FROM. There’s no update_tree() function to run, you can just run the show_hierarchy_3() script.
Chandler’s method could be considered an adaptation of the path enumeration model. The main difference between the two is that the path enumeration model uses a single database column with a delimiter to show an element’s hierarchy, whereas Chandler’s method uses multiple columns. In the patent, Candler writes:
Storing the digits of the Outline Number in separate columns allows for constructing very efficient queries of a type not possible with the prior art methods. In particular reference to the method shown in FIG. 3 , consider the steps required to move a data element within a tree (i.e. changing the Outline Number). First the Outline Number string needs to be parsed into the respective digits, then adding and subtracting the appropriate digits as necessary would need to be performed, and then the digits need to be converted back to string values and the new Outline Number reconstructed. By storing the digits of the Outline Number in separate columns the present invention substantially simplifies these operations by avoiding performing string operations that are typically slow and cumbersome to write, and by quickly isolating and operating on the appropriate digit in the Outline Number.
So there you have it. Enjoy.
Comments 9
David Chandler’s patent seems to only be good if you know the depth limit of the tree - e.g. in your example, the limit is 3, so if you want a 4th child, you will have to create a new column. Therefore, this method is only good in certain cases, and not others, such as a discussion forum, because this method could create hundreds of columns in your table.
Posted 16 Jul 2008 at 2:12 am ¶I agree. For a forum, I would probably end up using Chandler’s method to store topics down to the level of an individual thread. Within each thread, discussions would be stored using the flat table model mentioned in the evolt article.
Posted 17 Jul 2008 at 8:51 pm ¶I can’t believe he was granted a patent for this! It’s basic art; like building a picket fence. Good grief.
This method is effectively the “Lineage” method mentioned elsewhere, but is more limited as the max depth must be known ahead of time. In the Lineage approach, an extremely large VARCHAR can be used which will allow MANY layers, but will only use the amount of storage actually required. This patented approach ALWAYS stores the maximum depth, even when it’s not needed. I considered this method for a BOM (Bill Of Materials) program (before I knew the stupid approach was patented!), but ultimately decided it was way too wasteful, and went with the Lineage approach instead. Presently I’m considering rewriting the app to use the Nested Set (or MPTTA) approach.
Posted 18 Jul 2008 at 4:25 pm ¶Once again, I agree that the need to know the depth is the main disadvantage of this method. However, while there are many cases in which the depth is unknown/impossible to predict, there are just as many cases where the depth will be constant. Take for example the storage of a course catalog at a college. I would have one column for semester, one for campus, one for department, one for course name, and one for sections. I have tried both the mptta and Chandler’s method for storing course data. The mptta becomes VERY slow.
The two comments I’ve received for this post are essentially examples in which Chandler’s method will not work. That still does not negate the usefulness of it. It is one of many options, including the mptta and the path enumeration model.
Posted 18 Jul 2008 at 5:21 pm ¶The admin said:
“That still does not negate the usefulness of it. It is one of many options”
How can it be an option when it is patented?
I’ve not used patented software before (unless perhaps by accident because the patent office was so obtuse at to grant a patent to something that any reasonable programmer like myself just “came up with”), but it occurs to me that attempting to manage the rights to use it would become an absolute nightmare unless you KNEW it was only going to be used in one app, and you yourself entirely controlled the distribution. Even then, without some automated way of keeping your distribution mechanism synchronized with your patented code use, you still run the risk of being in violation. A violation COULD spell financial ruin for your employer, and ultimately for yourself.
I appreciate your point about it being a valid technical option in certain limited circumstances. But for me, the complexity of introducing patented software into my projects far outweighs any performance benefit I might get. In my app with over a quarter million records, the Lineage method (or “path enumeration method”) yielded queries so fast I didn’t even bother to time them. Virtually instantaneous.
Thanks for the article though. Good explanations and diagrams. I appreciate the work you put in on it.
Posted 21 Jul 2008 at 1:44 pm ¶The modified preorder tree traversal way of storing the hierarchy (also called nested set model) does not require recursion to update the tree. The rebuild_tree() function shown above converts a parent/child hierarchy table to a modified preorder tree traversal table.
To update the nested set table directly, one only has to update the left/right indexes appropriately. See this article at mysql.org for more information on how: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
Posted 28 Jul 2008 at 8:46 am ¶This is so sad… I have just wrote a piece of code to maintain a very large organizational chart (more than 100,000 users) with a fixed depth (12 levels). After used an adjacency list (the first method) without success (too slow) I decided to use a flat list, with a field for each level… and then I discovered the damn thing has already been patented. Oh, my…
I agree with DQNOK, using patented algorithms is not that simple if you want to avoid any violation.
I searched to check if the patent was not really just an application (meaning no patent was granted) but it seems the patent was granted. The algorithm is so simple, why did they do this?
Urgh.
Anyway, back to the rewrite my code. I think the mptt will be my answer.
Regards,
MV
Posted 29 Jul 2008 at 4:46 am ¶I’m in search of the “golden” algorithm to store hierarchical data in a relational database, probably a futile quest, I know. Any one got any ideas?!
p.s. David Chandler’s patent is a joke, let me guess it’s an USA patent.
Posted 30 Jul 2008 at 1:35 am ¶Cool. As I said before, after trying to use the adjacency list method (one of the recursive methods) and failing due to performance issues, and after discovering my “new” simpler method has been patented by somebody, I started to search for new ways to do build my organization-tree manager.
After searching (thanks to the ACM for the ACM Digital Library) I discovered several variants of the Lineage method (or model) also called Path Enumeration or Materialized Path (already mentioned by DQNOK), which indeed are part of a family of models called Tree Encodings.
Anyway, now my software fly: not tens of times, not hundreds of times, but thousands of times faster. From several minutes to few milliseconds with minimal memory usage. (Hey, this is a web application, memory usage is critical.)
I also learned a bit about some variants of the nested set model, but honestly, maybe I am too stupid but got a headache trying to understand them. Ok, maybe I’m not stupid, but I was in a hurry and the Path Enumeration thing is so simple and intuitive.
Regards,
MV
Posted 20 Aug 2008 at 7:43 pm ¶Post a Comment