Delete Duplicate Rows in SQL Server 2005

A new addition to the DELETE command in SQL Server 2005 is the TOP statement. The DELETE TOP does the same thing as a SELECT TOP WHERE only the TOP number of rows are deleted. This can be very helpful when there are duplicate rows of data present.

DELETE TOP (1)
FROM Sales.Customer
WHERE CustomerID = 1

This would delete one of the duplicate rows for Customer number 1 Suppose somehow the whole customer table got duplicated. I duplicated the Sales.Customer table into a tmpCustomer table.

SELECT Top 1 CustomerID, COUNT(CustomerID) AS Cnt
FROM tmpCustomer 
GROUP BY CustomerID
HAVING COUNT(CustomerID) > 1

WHILE @@RowCount > 0
BEGIN
    DELETE Top (1)
    FROM tmpCustomer
    WHERE CustomerID = (SELECT Top (1) CustomerID
                        FROM tmpCustomer 
                        GROUP BY CustomerID
                        HAVING COUNT(CustomerID) > 1)

END

While this worked just fine, it ran about 4 minutes for 38K rows. Let's try the dreaded CURSOR. Notice I can stick a variable in where the TOP () statement is. I subtracted -1 because we don't want to delete every row.

DECLARE @cnt int, @custID as int

DECLARE dupCursor CURSOR FAST_FORWARD
FOR SELECT CustomerID, COUNT(CustomerID) AS Cnt
    FROM tmpCustomer 
    GROUP BY CustomerID
    HAVING COUNT(CustomerID) > 1

OPEN dupCursor 

FETCH NEXT FROM dupCursor 
INTO @custID, @cnt

WHILE @@FETCH_STATUS = 0
BEGIN
    DELETE Top (@cnt-1)
    FROM tmpCustomer
    WHERE CustomerID = @custID
    
    FETCH NEXT FROM dupCursor 
    INTO @custID, @cnt
END

CLOSE dupCursor
DEALLOCATE dupCursor

This ran much better at 18 seconds. Enjoy.

Developers Mania

Search This Blog

Delete Duplicate Rows in SQL Server 2005

Comments

Post a Comment

Popular posts from this blog

Call User-defined Function on Linked Server :SQL Server

Microsoft SQL Server 2005 Service Pack 3 for Windows 7 (64 bit)

RDLC Report - Add column to a dataset in existing report and save report in older format ( 2008)