Sunday 18 November 2007

SQL Server Recursion

Have you ever come across a problem where you need to traverse a tree and MS SQL Server does not allow more than 32 level of recursion? I often work with trees and this is a common problem I need to solve. I can actually remember at least four occasions in which I needed to solve it so I thought if I blog the solution at least I will know where to look for it instead of reinventing it again ;)
The idea is not new I guess but probably not the one you would want to implement in SQL. First step is to create a “list” – a table which has a column for status so you can filter out the nodes you have already visited. This table will also need to have any primary keys or significant data you can’t do without depending on your specific task. My pick usually is an in memory table because I almost every time have two integers and a bit field. So with 17 bytes per row even with 100k rows the memory taken is still 1.62MB which should be alright for everyone these days. Of course if you’re dealing with a lot bigger data volumes you may want to go for a temp table.
My example below is based on traversing a tree or a branch starting from certain tree node and visiting all child nodes to construct a list of values which are later one updated with a single update statement in the big data table.
Step 1 – The list
So I start by declaring my “list”:

-- create the in memory table here
DECLARE @list table (nodeId int, metaId int, done bit)
DECLARE @curNID int -- current nodeID
DECLARE @cnt int

Node Id is my key field here and you can add PRIMARY KEY if performance isn’t good enough, meta ID is the filed I want to update back in the big table and the done field is how I filter the nodes I already visited.
As you notice there are a few more declarations here. The curNID (short from current node id for the curious ones) is keeping my key field in the loop, and cnt is just a temporary variable to store the global @@rowcount.
Next step is to insert the initial node details so that’s a simple insert statement:

-- insert the initial values
INSERT INTO @list VALUES(@nodeID, @metaID, 0)
SET @curNID = @nodeID

The @nodeID and @metaID presumably come from the stored proc argument list. And here I also set mu current node id to be the initial value.
And here comes the loop.

SET @cnt=1
WHILE @cnt>0
BEGIN
-- get next unprocessed
SELECT top 1 @curNID = nodeId
FROM @list
WHERE done<>1

SET @cnt = @@ROWCOUNT

IF @cnt>0
BEGIN
-- this inserts the set of child nodes
INSERT INTO @list
SELECT theNodeID,
CASE WHEN metaID = 0 THEN defaultMetaID
ELSE metaID
END,
0
FROM bigDataTable
WHERE theParentNodeID = @curNID
END

-- update in the list that it is done
UPDATE @list
SET done=1
WHERE nodeId=@curNID
END

Briefly we run the loop until every row in the table gets its done field updated to 1. In the loop we pick top 1 of the not done rows and insert all its children nodes in the list with done set to 0. There is a bit of logic also in the case if you need to do that case gives you a lot of options. Finally before going back to check the while condition we update the row we worked with so it is no longer selected for processing.
Finally we need to update the values in the big table:

UPDATE bigDataTable
SET big.MetaID = list.MetaID
FROM bigDataTable big
INNER JOIN @list list ON big.theNodeID = list.nodeID

As you can see this is also achieved through a simple update statement.

So overall I think this doesn’t look too scary, does it?

No comments: