
Delete Duplicate Rows in SQL Server 2005

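A new addition to the DELETE command in SQL Server 2005 is the TOP clause. DELETE TOP (n) works like SELECT TOP (n): only the specified number of rows matching the WHERE clause are deleted. SQL Server does not guarantee which of the matching rows are removed, but that does not matter when the rows are exact duplicates, so this can be very helpful for cleaning up duplicate data.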
DELETE TOP (1)
FROM Sales.Customer
WHERE CustomerID = 1



This would delete one of the duplicate rows for customer number 1. Now suppose the whole customer table somehow got duplicated. To simulate that, I duplicated the Sales.Customer table into a tmpCustomer table.
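The post does not show how tmpCustomer was populated. One way to build a comparable test table is sketched here, assuming the AdventureWorks sample database and that tmpCustomer does not exist yet. The UNION ALL approach is only an illustration, not necessarily how the original table was made.

```sql
-- Hypothetical setup, not from the original post.
-- Creates tmpCustomer holding two copies of every row in Sales.Customer.
SELECT c.*
INTO tmpCustomer
FROM (
    SELECT * FROM Sales.Customer
    UNION ALL
    SELECT * FROM Sales.Customer
) AS c;
```

Because the source query uses UNION ALL, the new table should come out as a plain heap without the identity property on CustomerID, so each CustomerID appears exactly twice.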


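-- Delete one duplicate row per iteration until no CustomerID appears more than once.
-- This first SELECT sets @@ROWCOUNT so the loop is entered only if duplicates exist.
SELECT TOP (1) CustomerID, COUNT(CustomerID) AS Cnt
FROM tmpCustomer
GROUP BY CustomerID
HAVING COUNT(CustomerID) > 1

WHILE @@ROWCOUNT > 0
BEGIN
    -- When no duplicated CustomerID remains, the subquery returns NULL,
    -- nothing is deleted, @@ROWCOUNT becomes 0, and the loop ends.
    DELETE TOP (1)
    FROM tmpCustomer
    WHERE CustomerID = (SELECT TOP (1) CustomerID
                        FROM tmpCustomer
                        GROUP BY CustomerID
                        HAVING COUNT(CustomerID) > 1)
END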



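While this worked just fine, it took about 4 minutes for 38K rows, most likely because every iteration re-runs the GROUP BY query and removes only a single row. Let's try the dreaded CURSOR. Notice that TOP () accepts a variable expression, not just a constant. I subtract 1 from the count because we want to keep one row per customer rather than delete every copy.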



DECLARE @cnt int, @custID int

-- One cursor row per duplicated CustomerID, with the number of copies found.
DECLARE dupCursor CURSOR FAST_FORWARD
FOR SELECT CustomerID, COUNT(CustomerID) AS Cnt
    FROM tmpCustomer
    GROUP BY CustomerID
    HAVING COUNT(CustomerID) > 1

OPEN dupCursor

FETCH NEXT FROM dupCursor
INTO @custID, @cnt

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Remove all but one copy of the current customer.
    DELETE TOP (@cnt - 1)
    FROM tmpCustomer
    WHERE CustomerID = @custID

    FETCH NEXT FROM dupCursor
    INTO @custID, @cnt
END

CLOSE dupCursor
DEALLOCATE dupCursor



This ran much better at 18 seconds. Enjoy.
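After either script finishes, it is worth confirming that the cleanup worked. A minimal check is sketched here, assuming tmpCustomer was built from Sales.Customer as described above. The column aliases TmpRows and SourceRows are made up for this illustration.

```sql
-- Any rows returned here are CustomerIDs that still have duplicates.
SELECT CustomerID, COUNT(*) AS Cnt
FROM tmpCustomer
GROUP BY CustomerID
HAVING COUNT(*) > 1;

-- Assumes tmpCustomer was copied from Sales.Customer:
-- after de-duplication the two counts should match.
SELECT (SELECT COUNT(*) FROM tmpCustomer)    AS TmpRows,
       (SELECT COUNT(*) FROM Sales.Customer) AS SourceRows;
```

If the first query returns no rows and the two counts agree, each customer is back to a single row.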

