<< SQL Server Hosting - HostForLIFE :: DATE_BUCKET Function New T-SQL Enhancements in SQL Server | SQL Server Hosting - HostForLIFE :: STRING_SPLIT() Ordinal New T-SQL Enhancements in SQL Server >>

SQL Server Hosting - HostForLIFE :: How Can I Remove Duplicate Rows From A SQL Server Table?

October 14, 2024 07:54 by author

Peter

I'll go over several techniques in this post for removing duplicate rows from a SQL Server table. The issues listed below will be covered in detail in this article.

Introduction
Various methods to remove duplicate rows from a table in SQL Server
Points to Remember
Conclusion

First of all, we are going to create a new database using the SQL Server. You can still do this step if you already have an existing database.

Create a new Database
The following SQL query creates a new database and a table. Copy this query and execute it in Query Explorer or the command line.
-- Execute the following query to create the database...
IF (DB_ID('Hostforlife_DeleteDuplicateRows') IS NOT NULL)
BEGIN
    USE master
    PRINT 'Database exists'
    DROP DATABASE Hostforlife_DeleteDuplicateRows
    PRINT 'Database Dropped...'
END
GO
CREATE DATABASE Hostforlife_DeleteDuplicateRows
PRINT 'New Database ''Hostforlife_DeleteDuplicateRows'' Created'
GO
USE [Hostforlife_DeleteDuplicateRows]
GO
-- Employee Table
CREATE TABLE [dbo].[Employee] (
    EmployeeID INT IDENTITY(31100, 1),
    EmployerID BIGINT NOT NULL DEFAULT 228866,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Email VARCHAR(255) NOT NULL,
    DepartmentID VARCHAR(100) NOT NULL,
    Age INT NOT NULL,
    GrossSalary BIGINT NOT NULL,
    PerformanceBonus BIGINT,
    ContactNo VARCHAR(25),
    PRIMARY KEY (EmployeeID)
);

Next, you can insert data to the tables using the SQL INSERT statement or by adding data directly to the tables in SSMS.

Let's check our table using the following query.

To get the data from the "Employee" table, use the following query.
SELECT * FROM Hostforlife_DeleteDuplicateRows..Employee

3 Ways to Delete Duplicate Rows From A Table In SQL Server
Here are 3 common methods that you can use to delete duplicate records from a table In SQL Server.

Method 1. Using GROUP BY and Having Clause
In this method, the SQL GROUP BY clause is used to identify and remove duplicate rows from a table.

Syntax
DELETE FROM <Table_Name>
WHERE ID NOT IN
(
    SELECT MAX(ID) AS MaxRecordID
    FROM <Table_Name>
    GROUP BY column1, columns2, ...
);

Example
DELETE FROM [Employee]
    WHERE EmployeeID NOT IN
    (
        SELECT MAX(EmployeeID) AS MaxRecordID
        FROM [Employee]
        GROUP BY [EmployerID], [FirstName], [LastName], [Email], [DepartmentID], [Age], [GrossSalary], [PerformanceBonus], [ContactNo]
    );

To verify the deletion, use the following query.

SELECT * FROM Hostforlife_DeleteDuplicateRows..Employee

Method 2. Using CTE (Common Table Expression)
CTE (Common Table Expression) can also be used to remove duplicate rows from a table in SQL Server.

Syntax
WITH CTE AS (
    SELECT
        column1,
        column2,
        ...
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2, ...
            ORDER BY column1, column2, ...
        ) AS RowNumber
    FROM
        <Table_Name>
)
DELETE FROM CTE
WHERE RowNumber > 1;

Example
WITH CTE AS
(
    SELECT
        [EmployeeID],
        [EmployerID],
        [FirstName],
        [LastName],
        [Email],
        [DepartmentID],
        [Age],
        [GrossSalary],
        [PerformanceBonus],
        [ContactNo],
        ROW_NUMBER() OVER (
            PARTITION BY [EmployerID], [FirstName], [LastName], [Email], [DepartmentID], [Age], [GrossSalary], [PerformanceBonus], [ContactNo]
            ORDER BY [EmployerID], [FirstName], [LastName], [Email], [DepartmentID], [Age], [GrossSalary], [PerformanceBonus], [ContactNo]
        ) AS RowNumber
    FROM
        Employee
)
DELETE FROM CTE
WHERE RowNumber > 1;

To verify the deletion, use the following query.
SELECT * FROM Hostforlife_DeleteDuplicateRows..Employee

Method 3. Using Rank Function
The RANK function with PARTITION BY can also be used to remove duplicate rows from a table in SQL Server.

Syntax
DELETE E
    FROM <Table_Name> E
      INNER JOIN
    (
     SELECT *,
            RANK() OVER(PARTITION BY column1, column2, ...
            ORDER BY ID) rank
     FROM <Table_Name>
    ) T ON E.ID = t.ID
    WHERE rank > 1;

Example
DELETE E
FROM [Employee] E
INNER JOIN
(
    SELECT *,
           RANK() OVER (
               PARTITION BY [EmployerID], [FirstName], [LastName], [Email], [DepartmentID], [Age], [GrossSalary], [PerformanceBonus], [ContactNo]
               ORDER BY [EmployeeID]
           ) AS rank
    FROM [Employee]
) T ON E.[EmployeeID] = T.[EmployeeID]
WHERE rank > 1;

To verify the deletion, use the following query.
SELECT *
FROM Hostforlife_DeleteDuplicateRows..Employee;

Points to Remember
I would recommend you follow the points given below before deleting any type of record.

Back up your data.
Be sure to test your DELETE query with a SELECT statement.
Choose an effective method as per the requirement to remove duplicate rows.

See you in the next article, till then, take care and be happy learning.

HostForLIFE.eu SQL Server 2022 Hosting
HostForLIFE.eu is European Windows Hosting Provider which focuses on Windows Platform only. We deliver on-demand hosting solutions including Shared hosting, Reseller Hosting, Cloud Hosting, Dedicated Servers, and IT as a Service for companies of all sizes.