# Saturday, November 08, 2008

Here is a cool TSQL tip I will be showing off next week at TechEd Europe at one of my sessions (Rock Star Common Table Expressions).

Let’s say you have a product table that looks like this:

image

As you can see in the image, we have some duplicate data. You can use a Common Table Expression in SQL Server 2005 or 2008 to look up the “parents” using an aggregate HAVING clause. Then you join to the CTE to the Product table on the Product_Name field as well as the Product_ID field, but using the > indicator, so we only return the “children” or products with a higher product ID. Here is the CTE and code:

--Find the Dupes with a CTE

WITH CTEMinProductRecords

AS

(

 SELECT MIN(Product_ID) AS Product_ID, Product_Name

 FROM Products

 GROUP BY Product_Name

 HAVING COUNT(*) > 1

)

SELECT cte.Product_ID as DupeProductID,

cte.Product_Name as DupeProduct,

p.Product_ID as ParentID, p.Product_Name as ParentProduct,

p.Price as ParentPrice

FROM Products p

JOIN CTEMinProductRecords cte ON

 p.Product_Name = cte.Product_Name

 AND p.Product_ID > cte.Product_ID

Here are the results:

image

Let’s say you want to automatically delete the children. While the business case for this may exist (my old business did have a rule, the higher product id was the dupe), you will want to update all of the “many” tables to the lower product ID first. To then do the delete using the CTE all you have to do is convert the above select statement to a delete statement:

WITH CTEMinProductRecords

AS

(

 SELECT MIN(Product_ID) AS Product_ID, Product_Name

 FROM Products

 GROUP BY Product_Name

 HAVING COUNT(*) > 1

)

DELETE Products

FROM Products p

JOIN CTEMinProductRecords cte ON

 p.Product_Name = cte.Product_Name

 AND p.Product_ID > cte.Product_ID

Enjoy!