European Windows 2012 Hosting BLOG

BLOG about Windows 2012 Hosting and SQL 2012 Hosting - Dedicated to European Windows Hosting Customer

SQL Server Hosting - HostForLIFE :: How Can I Remove Duplicate Rows From A SQL Server Table?

clock October 14, 2024 07:54 by author Peter

I'll go over several techniques in this post for removing duplicate rows from a SQL Server table. The issues listed below will be covered in detail in this article.

  • Introduction
  • Various methods to remove duplicate rows from a table in SQL Server
  • Points to Remember
  • Conclusion

First of all, we are going to create a new database using the SQL Server. You can still do this step if you already have an existing database.

Create a new Database

The following SQL query creates a new database and a table. Copy this query and execute it in Query Explorer or the command line.
-- Execute the following query to create the database...
IF (DB_ID('Hostforlife_DeleteDuplicateRows') IS NOT NULL)
BEGIN
    USE master
    PRINT 'Database exists'
    DROP DATABASE Hostforlife_DeleteDuplicateRows
    PRINT 'Database Dropped...'
END
GO
CREATE DATABASE Hostforlife_DeleteDuplicateRows
PRINT 'New Database ''Hostforlife_DeleteDuplicateRows'' Created'
GO
USE [Hostforlife_DeleteDuplicateRows]
GO
-- Employee Table
CREATE TABLE [dbo].[Employee] (
    EmployeeID INT IDENTITY(31100, 1),
    EmployerID BIGINT NOT NULL DEFAULT 228866,
    FirstName VARCHAR(50) NOT NULL,
    LastName VARCHAR(50) NOT NULL,
    Email VARCHAR(255) NOT NULL,
    DepartmentID VARCHAR(100) NOT NULL,
    Age INT NOT NULL,
    GrossSalary BIGINT NOT NULL,
    PerformanceBonus BIGINT,
    ContactNo VARCHAR(25),
    PRIMARY KEY (EmployeeID)
);

Next, you can insert data to the tables using the SQL INSERT statement or by adding data directly to the tables in SSMS.

Let's check our table using the following query.

To get the data from the "Employee" table, use the following query.
SELECT * FROM Hostforlife_DeleteDuplicateRows..Employee

3 Ways to Delete Duplicate Rows From A Table In SQL Server
Here are 3 common methods that you can use to delete duplicate records from a table In SQL Server.

Method 1. Using GROUP BY and Having Clause
In this method, the SQL GROUP BY clause is used to identify and remove duplicate rows from a table.

Syntax

DELETE FROM <Table_Name>
WHERE ID NOT IN
(
    SELECT MAX(ID) AS MaxRecordID
    FROM <Table_Name>
    GROUP BY column1, columns2, ...
);

Example

DELETE FROM [Employee]
    WHERE EmployeeID NOT IN
    (
        SELECT MAX(EmployeeID) AS MaxRecordID
        FROM [Employee]
        GROUP BY [EmployerID], [FirstName], [LastName], [Email], [DepartmentID], [Age], [GrossSalary], [PerformanceBonus], [ContactNo]
    );

To verify the deletion, use the following query.

SELECT * FROM Hostforlife_DeleteDuplicateRows..Employee

Method 2. Using CTE (Common Table Expression)
CTE (Common Table Expression) can also be used to remove duplicate rows from a table in SQL Server.

Syntax
WITH CTE AS (
    SELECT
        column1,
        column2,
        ...
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2, ...
            ORDER BY column1, column2, ...
        ) AS RowNumber
    FROM
        <Table_Name>
)
DELETE FROM CTE
WHERE RowNumber > 1;


Example
WITH CTE AS
(
    SELECT
        [EmployeeID],
        [EmployerID],
        [FirstName],
        [LastName],
        [Email],
        [DepartmentID],
        [Age],
        [GrossSalary],
        [PerformanceBonus],
        [ContactNo],
        ROW_NUMBER() OVER (
            PARTITION BY [EmployerID], [FirstName], [LastName], [Email], [DepartmentID], [Age], [GrossSalary], [PerformanceBonus], [ContactNo]
            ORDER BY [EmployerID], [FirstName], [LastName], [Email], [DepartmentID], [Age], [GrossSalary], [PerformanceBonus], [ContactNo]
        ) AS RowNumber
    FROM
        Employee
)
DELETE FROM CTE
WHERE RowNumber > 1;


To verify the deletion, use the following query.
SELECT * FROM Hostforlife_DeleteDuplicateRows..Employee

Method 3. Using Rank Function
The RANK function with PARTITION BY can also be used to remove duplicate rows from a table in SQL Server.

Syntax
DELETE E
    FROM <Table_Name> E
      INNER JOIN
    (
     SELECT *,
            RANK() OVER(PARTITION BY column1, column2, ...
            ORDER BY ID) rank
     FROM <Table_Name>
    ) T ON E.ID = t.ID
    WHERE rank > 1;


Example

DELETE E
FROM [Employee] E
INNER JOIN
(
    SELECT *,
           RANK() OVER (
               PARTITION BY [EmployerID], [FirstName], [LastName], [Email], [DepartmentID], [Age], [GrossSalary], [PerformanceBonus], [ContactNo]
               ORDER BY [EmployeeID]
           ) AS rank
    FROM [Employee]
) T ON E.[EmployeeID] = T.[EmployeeID]
WHERE rank > 1;


To verify the deletion, use the following query.
SELECT *
FROM Hostforlife_DeleteDuplicateRows..Employee;


Points to Remember
I would recommend you follow the points given below before deleting any type of record.

  • Back up your data.
  • Be sure to test your DELETE query with a SELECT statement.
  • Choose an effective method as per the requirement to remove duplicate rows.

See you in the next article, till then, take care and be happy learning.

HostForLIFE.eu SQL Server 2022 Hosting
HostForLIFE.eu is European Windows Hosting Provider which focuses on Windows Platform only. We deliver on-demand hosting solutions including Shared hosting, Reseller Hosting, Cloud Hosting, Dedicated Servers, and IT as a Service for companies of all sizes.




SQL Server Hosting - HostForLIFE :: DATE_BUCKET Function New T-SQL Enhancements in SQL Server

clock October 7, 2024 10:14 by author Peter

Microsoft has included new features and improvements to SQL Server 2022 in order to boost functionality, usability, and speed. The DATE_BUCKET function is one of these new capabilities; it's a useful tool for developers and data professionals working with time-based data.  The DATE_BUCKET function streamlines interval management and date grouping, facilitating the aggregation and analysis of time-based data over predetermined timeframes. For SQL developers, data engineers, and DBAs who regularly handle temporal data analysis, this is especially helpful.

Understanding the DATE_BUCKET Function
What is DATE_BUCKET?
The DATE_BUCKET function in SQL Server 2022 is designed to help group and truncate dates into fixed intervals (or "buckets"). This makes it easier to group data for analysis over consistent time periods such as days, weeks, months, quarters, or years. This is especially useful for reporting and data visualization when you want to group time-based data into periods like hourly or weekly aggregates.

DATE_BUCKET (datepart, number, date [, origin ] )

--datepart: The part of the date you want to group by, such as a day, week, month, etc. This can be
--number: The size of the time bucket, which must be an integer. year, month, week, day, hour, minute, second, millisecond, etc.
--date: The date to be truncated and grouped by the interval and datepart.


Return Type
The function returns a date/time value that is rounded down to the start of the specified bucket. This allows for grouping time-based data into logical intervals.

How is DATE_BUCKET Different from Other T-SQL Functions?

Other T-SQL functions, like DATEADD, DATEDIFF, and DATEPART, are typically used to manipulate dates, extract parts of dates, or compute the difference between dates. However, these functions don't natively support the concept of fixed time intervals (buckets). DATE_BUCKET, on the other hand, allows for grouping dates into regular intervals, which can be critical for generating time-based reports.

Example 1. Month Interval Example

This example groups dates into 2-month intervals, starting from January 1, 2024.
DECLARE @DateOrigin date = '2024-01-01'
SELECT
    '1/2m' = DATE_BUCKET(MONTH, 2, CONVERT(date, '2024-01-01'), @DateOrigin),
    '1/2m' = DATE_BUCKET(MONTH, 2, CONVERT(date, '2024-02-01'), @DateOrigin),
    '2/2m' = DATE_BUCKET(MONTH, 2, CONVERT(date, '2024-03-01'), @DateOrigin),
    '2/2m' = DATE_BUCKET(MONTH, 2, CONVERT(date, '2024-04-01'), @DateOrigin),
    '1/2m' = DATE_BUCKET(MONTH, 2, CONVERT(date, '2024-05-01'), @DateOrigin),
    '1/2m' = DATE_BUCKET(MONTH, 2, CONVERT(date, '2024-06-01'), @DateOrigin),
    '2/2m' = DATE_BUCKET(MONTH, 2, CONVERT(date, '2024-07-01'), @DateOrigin),
    '2/2m' = DATE_BUCKET(MONTH, 2, CONVERT(date, '2024-08-01'), @DateOrigin)
GO


Output

Example 2. Week Interval Example
This example groups dates into 2-week intervals, starting from January 1, 2024.
DECLARE @DateOrigin date = '2024-01-01'
SELECT
    '1/2w' = DATE_BUCKET(WEEK, 2, CONVERT(date, '2024-01-01'), @DateOrigin),
    '1/2w' = DATE_BUCKET(WEEK, 2, CONVERT(date, '2024-01-08'), @DateOrigin),
    '2/2w' = DATE_BUCKET(WEEK, 2, CONVERT(date, '2024-01-15'), @DateOrigin),
    '2/2w' = DATE_BUCKET(WEEK, 2, CONVERT(date, '2024-01-22'), @DateOrigin),
    '1/2w' = DATE_BUCKET(WEEK, 2, CONVERT(date, '2024-01-29'), @DateOrigin),
    '1/2w' = DATE_BUCKET(WEEK, 2, CONVERT(date, '2024-02-05'), @DateOrigin),
    '2/2w' = DATE_BUCKET(WEEK, 2, CONVERT(date, '2024-02-12'), @DateOrigin),
    '2/2w' = DATE_BUCKET(WEEK, 2, CONVERT(date, '2024-02-19'), @DateOrigin)
GO


Output


Example 3. Day Interval Example
This example groups dates into 2-day intervals, starting from January 1, 2022.
DECLARE @DateOrigin date = '2024-01-01'
SELECT
    '1/2d' = DATE_BUCKET(DAY, 2, CONVERT(date, '2024-01-01'), @DateOrigin),
    '2/2d' = DATE_BUCKET(DAY, 2, CONVERT(date, '2024-01-02'), @DateOrigin),
    '1/2d' = DATE_BUCKET(DAY, 2, CONVERT(date, '2024-01-03'), @DateOrigin),
    '2/2d' = DATE_BUCKET(DAY, 2, CONVERT(date, '2024-01-04'), @DateOrigin),
    '1/2d' = DATE_BUCKET(DAY, 2, CONVERT(date, '2024-01-05'), @DateOrigin),
    '2/2d' = DATE_BUCKET(DAY, 2, CONVERT(date, '2024-01-06'), @DateOrigin),
    '1/2d' = DATE_BUCKET(DAY, 2, CONVERT(date, '2024-01-07'), @DateOrigin),
    '2/2d' = DATE_BUCKET(DAY, 2, CONVERT(date, '2024-01-08'), @DateOrigin)
GO


Output


Use Cases
1. Grouping Sales Data by Weekly Buckets
Suppose we want to analyze sales orders in AdventureWorks2022 and group the orders into weekly intervals. Using DATE_BUCKET, we can easily create these weekly buckets based on the OrderDate from the Sales.SalesOrderHeader table.

In this query
DATE_BUCKET(1, WEEK, OrderDate) groups the sales orders into weekly buckets, starting from the earliest OrderDate.
The query aggregates the total number of orders and the total sales (TotalDue) within each week.

USE AdventureWorks2022
GO

SELECT
    DATE_BUCKET(WEEK, 1, OrderDate) AS OrderWeek,
    COUNT(SalesOrderID) AS TotalOrders,
    SUM(TotalDue) AS TotalSales
FROM
    Sales.SalesOrderHeader
GROUP BY
    DATE_BUCKET(WEEK, 1, OrderDate)
ORDER BY
    OrderWeek

2. Monthly Sales Data Analysis
For longer-term trends, we may want to aggregate sales on a monthly basis. DATE_BUCKET makes it simple to group the data into months, just as easily as weeks.

Example: Monthly Sales Performance

This query aggregates the sales orders into monthly intervals using DATE_BUCKET(1, MONTH, OrderDate). You can easily visualize this data in a line graph or bar chart to track monthly sales performance over time.

USE AdventureWorks2022
GO

SELECT
    DATE_BUCKET(MONTH, 1, OrderDate) AS OrderMonth,
    COUNT(SalesOrderID) AS TotalOrders,
    SUM(TotalDue) AS TotalSales
FROM
    Sales.SalesOrderHeader
GROUP BY
    DATE_BUCKET(MONTH, 1, OrderDate)
ORDER BY
    OrderMonth


Output

3. Grouping Data in Custom Intervals (e.g., 10-Day Buckets)
While DATE_BUCKET allows for standard intervals like weeks or months, you can also group dates into custom intervals. For instance, if you want to create a report based on 10-day periods instead of full months or weeks, DATE_BUCKET can handle that too.

Here, we specify an interval of 10 days, and the sales orders are grouped into periods based on that interval. This can be useful in scenarios where typical calendar boundaries like weeks or months are too coarse or too fine.

USE AdventureWorks2022
GO

SELECT
    DATE_BUCKET(DAY, 10, OrderDate) AS OrderPeriod,
    COUNT(SalesOrderID) AS TotalOrders,
    SUM(TotalDue) AS TotalSales
FROM
    Sales.SalesOrderHeader
GROUP BY
    DATE_BUCKET(DAY, 10, OrderDate)
ORDER BY
    OrderPeriod


Output

Comparing DATE_BUCKET to Other Functions
1. DATEADD and DATEDIFF
In the past, SQL developers would use combinations of DATEADD and DATEDIFF to group dates into intervals. For example, you could group sales data by year with these functions:

While this method works, it is less intuitive and more cumbersome than using DATE_BUCKET, which allows for direct and flexible interval grouping.
USE AdventureWorks2022
GO

SELECT
    DATEADD(YEAR, DATEDIFF(YEAR, 0, OrderDate), 0) AS OrderYear,
    COUNT(SalesOrderID) AS TotalOrders,
    SUM(TotalDue) AS TotalSales
FROM
    Sales.SalesOrderHeader
GROUP BY
    DATEADD(YEAR, DATEDIFF(YEAR, 0, OrderDate), 0)


Output

2. FLOOR or CEILING on Date Calculations
Another workaround for bucketing dates was using FLOOR or CEILING in conjunction with date calculations. While effective, this approach was error-prone and harder to maintain.

For example, to group dates into weekly intervals using FLOOR, you might write something like this:

This code is not as readable as using DATE_BUCKET. The DATE_BUCKET function simplifies and abstracts away the complexity, making it easier to reason about your queries.
USE AdventureWorks2022
GO


SELECT
    FLOOR(DATEDIFF(DAY, '1900-01-01', OrderDate) / 7) AS WeekNumber,
    COUNT(SalesOrderID) AS TotalOrders
FROM
    Sales.SalesOrderHeader
GROUP BY
    FLOOR(DATEDIFF(DAY, '1900-01-01', OrderDate) / 7)


Output

Benefits of Using DATE_BUCKET

1. Simplified Code

One of the most apparent benefits of DATE_BUCKET is the simplification of code when compared to older methods of date bucketing. Instead of using complex expressions with DATEADD and DATEDIFF, you can now achieve the same result with a single, readable function.

2. Flexibility and Power

DATE_BUCKET provides a powerful tool for aggregating time-based data in flexible ways. Whether you need to group data by week, month, or even custom intervals like ten days or 15 minutes, DATE_BUCKET makes it easy to express and execute these groupings.

3. Improved Performance
By natively supporting time-based intervals in a straightforward function, DATE_BUCKET improves performance over workarounds that rely on complex date manipulation functions such as combinations of DATEADD, DATEDIFF, and FLOOR. These traditional approaches often require multiple calculations and transformations to achieve similar results, which can increase both complexity and computational overhead.

Conclusion
The introduction of the DATE_BUCKET function in SQL Server 2022 marks a significant enhancement for SQL developers, data engineers, and DBAs who frequently work with time-based data. By simplifying the process of grouping dates into consistent intervals, DATE_BUCKET not only makes queries more readable and easier to maintain but also improves performance by reducing reliance on complex, manual date manipulation functions. With its ability to streamline queries, improve code maintainability, and optimize performance, DATE_BUCKET represents a valuable addition to the SQL Server toolkit, empowering professionals to better manage and analyze their time-series data.

HostForLIFE.eu SQL Server 2022 Hosting
HostForLIFE.eu is European Windows Hosting Provider which focuses on Windows Platform only. We deliver on-demand hosting solutions including Shared hosting, Reseller Hosting, Cloud Hosting, Dedicated Servers, and IT as a Service for companies of all sizes.




About HostForLIFE.eu

HostForLIFE.eu is European Windows Hosting Provider which focuses on Windows Platform only. We deliver on-demand hosting solutions including Shared hosting, Reseller Hosting, Cloud Hosting, Dedicated Servers, and IT as a Service for companies of all sizes.

We have offered the latest Windows 2016 Hosting, ASP.NET Core 2.2.1 Hosting, ASP.NET MVC 6 Hosting and SQL 2017 Hosting.


Tag cloud

Sign in