Tuesday, May 31, 2011

SQL Server 2008: GroupBy Enhancements


Microsoft SQL Server 2008 introduces several enhancements & changes to how we group and aggregate relational result sets in TSQL. What follows is a simplistic listing and overview of the grouping/aggregate changes we are seeing in Microsoft SQL Server 2008 CTP5:
1.       New GROUPING SETS Operator
2.       ROLLUP & CUBE Operator Syntax Changes
3.       New GROUPING_ID() Function
 
GROUP BY Operator Addition: GROUPING SETS
The first and most noted change in how we can perform relational grouping is the GROUPING SETS operator addition to the GROUP BY clause. This new operator allows you to perform several grouping set in one query. The older equivalent of this functionality is to perform several GROUP BY queries and then perform a UNION ALL between them. The GROUPING SETS operator supports concatenating column groupings and an optional grand total row. GROUPING SETS can be used in conjunction with the ROLLUP & CUBE operators (which we will use the new syntax described below for ROLLUP|CUBE). Below is some sample TSQL code to demonstrate this functionality:
 
*All sample code requires the AdventureWorks & AdventureWorks DW sample databases.
 
USE AdventureWorksDW;
GO
--------OLD METHOD, UNION ALL
SELECT NULL AS [ProductKey], OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY OrderDateKey, DueDateKey, ShipDateKey
UNION ALL
SELECT ProductKey, NULL AS [OrderDateKey], DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, DueDateKey, ShipDateKey
UNION ALL
SELECT ProductKey, OrderDateKey, NULL AS [DueDateKey], ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, OrderDateKey, ShipDateKey
UNION ALL
SELECT ProductKey, OrderDateKey, DueDateKey, NULL AS [ShipDateKey]
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, OrderDateKey, DueDateKey
GO
--------NEW METHOD, GROUPING SETS
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS(ProductKey, OrderDateKey, DueDateKey, ShipDateKey);
GO
--------NEW METHOD, GROUPING SETS w/Grand Total Row (record #1125 is the grand total row)
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS(ProductKey, OrderDateKey, DueDateKey, ShipDateKey, ());
GO
--------NEW METHOD, GROUPING SETS w/Composite Columns
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS((ProductKey, OrderDateKey), DueDateKey, ShipDateKey);
GO
--------NEW METHOD, GROUPING SETS w/ROLLUP Composite Columns
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS(ROLLUP(ProductKey, OrderDateKey), DueDateKey, ShipDateKey);
GO
--------NEW METHOD, GROUPING SETS w/CUBE Composite Columns
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY GROUPING SETS(CUBE(ProductKey, OrderDateKey), DueDateKey, ShipDateKey);
GO
 
GROUP BY Operator Syntax Changes: ROLLUP & CUBE
Next up are the changes in the ROLLUP|CUBE operators' syntax. These operators produce aggregate groupings and are appended to the GROUP BY clause. Prior to SQL Server 2008 you would specify WITH ROLLUP|WITH CUBE. In SQL Server 2008, you will first designate ROLLUP|CUBE and then pass the grouped columns into these operators. Below is some sample TSQL code to demonstrate this functionality:
 
USE AdventureWorksDW;
GO
--------OLD METHOD, WITH ROLLUP
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, OrderDateKey, DueDateKey, ShipDateKey WITH ROLLUP;
GO
--------NEW METHOD, ROLLUP()
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ROLLUP(ProductKey, OrderDateKey, DueDateKey, ShipDateKey);
GO
--------OLD METHOD, WITH CUBE
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY ProductKey, OrderDateKey, DueDateKey, ShipDateKey WITH CUBE;
GO
--------NEW METHOD, CUBE()
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
    ,SUM (SalesAmount) AS [Sales]
FROM dbo.FactInternetSales
GROUP BY CUBE(ProductKey, OrderDateKey, DueDateKey, ShipDateKey);
GO
 
New Function: GROUPING_ID()
Last but not least, there is a new function, GROUPING_ID which returns the level of grouping for a particular column. This function is an enhancement from the older GROUPING() function. GROUPING() returns a 0 or a 1 to indicate if the column is aggregated or not. Below is some sample TSQL code to demonstrate this functionality:
 
USE AdventureWorks;
GO
SELECT
     D.Name
    ,GROUPING_ID(D.Name, E.Title) AS [Name_Title_GroupID]
    ,GROUPING(D.Name) AS [Name_GroupAggregated?]
    ,GROUPING(E.Title) AS [Title_GroupAggregated?]
    ,COUNT(E.EmployeeID) AS N'Employee Count'
FROM HumanResources.Employee E
   INNER JOIN HumanResources.EmployeeDepartmentHistory DH
        ON E.EmployeeID = DH.EmployeeID
    INNER JOIN HumanResources.Department D
        ON D.DepartmentID = DH.DepartmentID    
WHERE DH.EndDate IS NULL
    AND D.DepartmentID IN (12,14)
GROUP BY ROLLUP(D.Name, E.Title);

GROUPING SETS SQL 2008


have been reading and working a lot with Grouping Sets in SQL lately. So, I decided to write about it this month.
Grouping Sets is a feature which was introduced in SQL 2005 however, MSSQL 2008 has a different syntax which I will mention on this blog. The old syntax which was introduced with MSSQL 2005 will no longer be available for future versions of MSSQL so, that is an important thing to take into consideration.
As a reminder, a general GROUP BY clause includes GROUPING SETS, CUBE, ROLLUP, WITH CUBE, or WITH ROLLUP.
Let’s start creating an example table first.
1
2
3
4
5
6
7
CREATE TABLE [dbo].[Song](
    [SongID] [int] NOT NULL,
    [Name] [varchar](100) NOT NULL,
    [Artist] [varchar](100) NULL,
    [Length] [decimal](3, 2) NULL,
    [Status] [char](1) NULL
) ON [PRIMARY]
Now lets insert some data into the table.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
INSERT dbo.Song VALUES(1,'E.T.','Katy Perry', 2.65, 'A' )
INSERT dbo.Song VALUES(2,'Judas','Lady GaGa', 1.35, 'A' )
INSERT dbo.Song VALUES(4,'Rolling In The Deep','Adele', 2.24, 'A' )
INSERT dbo.Song VALUES(3,'S&M','    Rihanna', 2.63, 'A' )
 
INSERT dbo.Song VALUES(2,'California Gurls','Katy Perry', 3.04, 'A' )
INSERT dbo.Song VALUES(3,'Money Honey','Lady GaGa', 2.58, 'A' )
INSERT dbo.Song VALUES(4,'Wonderful','Lady GaGa', 4.61, 'A' )
INSERT dbo.Song VALUES(1,'Disturbia','Rihanna', 1.35, 'A' )
 
INSERT dbo.Song VALUES(4,'Unfaithful','Rihanna', 2.14, 'A' )
INSERT dbo.Song VALUES(3,'Poker Face','Lady GaGa', 2.55, 'A' )
INSERT dbo.Song VALUES(1,'Daydreamer','Adele', 2.55, 'A' )
INSERT dbo.Song VALUES(2,'We Ride','Rihanna', 2.62, 'A' )
Ok, now that we have some information there, let’s run a couple of select statements using the old With Cube and With ROLLUP syntax.
1
2
3
SELECT SongID, Artist, SUM(Length)
FROM Song
GROUP BY SongID, Artist WITH CUBE
1
2
3
SELECT SongID,  Artist, SUM(Length)
FROM Song
GROUP BY SongID, Artist WITH ROLLUP
In the previous example the ROLLUP returned a list of artists grouped by SongId and Artist (Name). It also returned the length’s total for every group and at the end it returned and extra line which is the sum of all the totals.
In contrast, the CUBE outputted a grouping for all permutations of expressions in the composite element list.
They are definitely great tools, especially if you are working with mathematical operations or writing reports. The scope can be endless.
We have seen how the Rollup and the Cube work, now, let’s see an example of what the new syntax looks like.
1
2
3
SELECT SongID, Artist, SUM(Length) As Length
FROM Song
GROUP BY ROLLUP (SongID, Artist)
1
2
3
SELECT SongID,  Artist, SUM(Lenght) AS Lenght
FROM Song
GROUP BY CUBE (SongID, Artist)
Noticed that GROUP BY is followed by the either ROLLUP or Cube keywords, and the selected columns are placed between parentheses.
If you have been able to run the code so far, you may have noticed that the results between different versions are the same. However, like I previously mentioned, the old syntax will no longer be supported for new MS SQL versions so, it is better to start getting used to it now.
Regarding Grouping sets, it specifies multiple groupings of data in one statement. In the following example you can see how it basically creates 2 groups, one by SongIds and another one by Artists.
1
2
3
SELECT SongID, Artist, SUM(Length) as Length
FROM Song
GROUP BY GROUPING SETS (SongID, Artist)
It was just a small demonstration of what can be done on the Grouping Sets, Cube and Rollup queries