Tuesday, June 12, 2012

SQL Server 2005: Recursive Hierarchies to XML - CTEs vs. UDFs


Suppose we have a recursive hierarchy stored in a relational database and we want to write it to XML.  This might be for a variety of reasons – e.g. as a pre-cached input to a UI control, to export to another system using a pre-defined format, etc.

In SQL Server 2000, in order to get it straight to XML using FOR XML EXPLICIT, we would have to know the depth of the lowest node beforehand (without doing some very ugly dynamic SQL), so this does not help us.

It would be useful to access the data in the same order that it will appear in the XML.  I.e.
  • Node1
    • Node2
      • Node3
      • Node4
    • Node5

Getting at the data in this order will allow us to iterate through the nodes in sequential order.  This avoids using the DOM and is significantly quicker and more efficient as it avoids loading the whole structure into memory.

We could achieve this in SQL Server 2000 using a recursive table-valued UDF.  In SQL Server 2005, we also have the option of using a recursive Common Table Expression (CTE) to achieve the same functional result.  Let’s compare the two ways of doing it.

A CTE is a temporary named resultset referenced by a subsequent “outer query”.  They can provide similar functionality to views and derived tables, but their real value is in recursive queries.  Recursive CTE’s contain an “anchor member” and a “recursive member”, which are connected by a UNION ALL operator.  They can be encapsulated by UDFs for reusability.



Data Preparation

Let’s create a table and insert hierarchical values.

CREATE TABLE Employees
(
  empid   int         NOT NULL,
  mgrid   int         NULL,
  empname varchar(25) NOT NULL,
  CONSTRAINT PK_Employees PRIMARY KEY(empid),
  CONSTRAINT FK_Employees_mgrid_empid
    FOREIGN KEY(mgrid)
    REFERENCES Employees(empid)
)
CREATE INDEX idx_nci_mgrid ON Employees(mgrid)

SET NOCOUNT ON
INSERT INTO Employees VALUES(, NULL, 'Nancy')
INSERT INTO Employees VALUES(, 1   , 'Andrew')
INSERT INTO Employees VALUES(, 1   , 'Janet')
INSERT INTO Employees VALUES(, 1   , 'Margaret')
INSERT INTO Employees VALUES(, 2   , 'Steven')
INSERT INTO Employees VALUES(, 2   , 'Michael')
INSERT INTO Employees VALUES(, 3   , 'Robert')
INSERT INTO Employees VALUES(, 3   , 'Laura')
INSERT INTO Employees VALUES(, 3   , 'Ann')
INSERT INTO Employees VALUES(10, 4   , 'Ina')
INSERT INTO Employees VALUES(11, 7   , 'David')
INSERT INTO Employees VALUES(12, 7   , 'Ron')
INSERT INTO Employees VALUES(13, 7   , 'Dan')
INSERT INTO Employees VALUES(14, 11  , 'James')


Recursive Table-Valued UDF Example

  
CREATE FUNCTION dbo.fn_EmpRec
(
      @empid      int,
      @depth      int
)
RETURNS      @Emps TABLE
(
      empid       int,
      empname      varchar(25),
      mgrid       int,
      depth       int
)
AS
BEGIN
      -- insert current employee into working table
      INSERT INTO @Emps
            SELECT empid,
                     empname,
                     mgrid,
                     @depth
            FROM Employees
            WHERE empid = @empid
     
      -- holding variable to keep track of current child employee
      DECLARE @curempid int
     
      -- get the first child
      SELECT @curempid = MIN(empid)
      FROM Employees
      WHERE mgrid = @empid
     
      -- iterate each child and make the recursive call
      WHILE @curempid IS NOT NULL
      BEGIN
            INSERT INTO @Emps
                  SELECT *
                  FROM dbo.fn_EmpRec(@curempid, @depth + 1)
                 
            SELECT @curempid = MIN(empid)
                  FROM Employees
                  WHERE empid > @curempid AND
                          mgrid = @empid
      END
      RETURN
END
GO

And if we run the following query,

SELECT
  REPLICATE('|         ', depth)
    + '(' + (CAST(empid AS VARCHAR(10))) + ') '
    + empname AS empname
FROM dbo.fn_EmpRec(1, 0)

The resultset is:

Recursive UDF Result

Great!  - (until we try it on 25,000 rows or above that is)



Common Table Expression Example


Now let’s do it the CTE way.

WITH EmpCTE(empid, empname, mgrid, depth, sortcol)
AS
(
  -- anchor member
  SELECT empid, empname, mgrid, 0,
    CAST(empid AS VARBINARY(900))
  FROM employees
  WHERE empid = 1

  UNION ALL

  -- recursive member
  SELECT E.empid, E.empname, E.mgrid, M.depth+1,
    CAST(sortcol + CAST(E.empid AS BINARY(4)) AS VARBINARY(900))
  FROM Employees AS E
    JOIN EmpCTE AS M
      ON E.mgrid = M.empid
)
-- outer query
SELECT
  REPLICATE('|         ', depth)
    + '(' + (CAST(empid AS VARCHAR(10))) + ') '
    + empname AS empname
FROM EmpCTE
ORDER BY sortcol


This returns the same resultset as the recursive UDF.

No comments: