How Can You Find Duplicate Records in a SQL Table?

πŸ’‘ Concept Name

Duplicate Records in SQL – Rows in a table where one or more columns have the same values repeated, potentially causing data integrity issues.

πŸ“˜ Quick Intro

Finding duplicates involves identifying rows where specific column values appear more than once, using grouping and filtering techniques.

🧠 Analogy / Short Story

Imagine a guest list for a party β€” if the same person is accidentally invited twice, the list needs cleaning to avoid confusion. Similarly, in databases, duplicate records can cause redundancy and errors, so identifying them is important.

πŸ”§ Technical Explanation

  • Use the GROUP BY clause on columns you want to check for duplicates.
  • Apply HAVING COUNT(*) > 1 to filter groups that appear multiple times.
  • You can find duplicates for a single column or a combination of columns (composite duplicates).
  • Using window functions like ROW_NUMBER() can help identify duplicates with more control.

πŸ’» Real Code Example

-- Find duplicates based on one column (e.g., Email)
SELECT Email, COUNT(*)
FROM Users
GROUP BY Email
HAVING COUNT(*) > 1;

-- Find duplicates based on multiple columns (e.g., FirstName and LastName)
SELECT FirstName, LastName, COUNT(*)
FROM Employees
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1;

-- Using ROW_NUMBER to find duplicates and keep one record
WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY (SELECT 0)) AS rn
    FROM Users
)
SELECT * FROM CTE WHERE rn > 1;

❓ Interview Q&A

Q1: What SQL clause is used to group rows when finding duplicates?
A: The GROUP BY clause.

Q2: How do you filter groups having duplicates?
A: Using HAVING COUNT(*) > 1.

Q3: Can you find duplicates based on multiple columns?
A: Yes, by grouping on multiple columns.

Q4: What does the ROW_NUMBER() function do in duplicate detection?
A: It assigns unique sequential numbers to rows within a partition to help identify duplicates.

Q5: Why is it important to find duplicates?
A: Duplicates can cause data inconsistency and errors in analysis.

Q6: Can duplicates occur in primary key columns?
A: No, primary key columns must be unique.

Q7: How can you delete duplicates after finding them?
A: Use CTEs with ROW_NUMBER() to delete rows where row number > 1.

Q8: Can indexes help prevent duplicates?
A: Yes, unique indexes enforce uniqueness.

Q9: Are NULL values considered duplicates?
A: It depends on the database system’s handling of NULLs.

Q10: What is a composite duplicate?
A: Duplicate records identified based on multiple column values combined.

πŸ“ MCQs

Q1. Which SQL clause groups rows to find duplicates?

  • WHERE
  • ORDER BY
  • GROUP BY
  • HAVING

Q2. How do you filter groups with duplicates?

  • WHERE COUNT(*) > 1
  • HAVING COUNT(*) > 1
  • GROUP BY COUNT(*) > 1
  • ORDER BY COUNT(*) > 1

Q3. Can you find duplicates based on multiple columns?

  • No
  • Yes
  • Only single columns
  • Depends on DBMS

Q4. What does ROW_NUMBER() do in duplicate detection?

  • Counts rows
  • Assigns unique numbers within partitions
  • Deletes duplicates
  • Filters data

Q5. Why find duplicates?

  • To increase duplicates
  • To avoid data inconsistency
  • To slow queries
  • To increase storage

Q6. Can primary keys have duplicates?

  • Yes
  • No
  • Sometimes
  • Only with composite keys

Q7. How to delete duplicates after finding?

  • Use GROUP BY
  • Use ROW_NUMBER() with CTE
  • Use WHERE clause
  • Use DISTINCT

Q8. Do indexes prevent duplicates?

  • No
  • Yes, unique indexes do
  • Sometimes
  • Depends on DBMS

Q9. Are NULLs considered duplicates?

  • Always
  • Never
  • Depends on DBMS
  • Only for primary keys

Q10. What is a composite duplicate?

  • Single column duplicates
  • Duplicates based on multiple columns
  • No duplicates
  • Depends on index

πŸ’‘ Bonus Insight

Using window functions like ROW_NUMBER() allows you to identify and remove duplicates efficiently without losing the original record.

πŸ“„ PDF Download

Need a handy summary for your notes? Download this topic as a PDF!

πŸ’¬ Feedback
πŸš€ Start Learning
Share:

Tags: