Detecting and Removing Duplicates in SQL
Concept Name
Duplicate detection involves identifying rows with the same values in specified columns. Duplicate removal ensures only unique entries remain in a table.
Quick Intro
Duplicates often creep into data through imports or bad design. SQL provides tools like GROUP BY, ROW_NUMBER(), and DISTINCT to detect and remove such redundancy cleanly.
Analogy / Short Story
Imagine sorting a stack of resumes. If you find multiple copies of the same resume, you only keep one and discard the rest. SQL does the same with duplicates, identifying repeated entries and keeping the most relevant or earliest one.
Technical Explanation
- GROUP BY with HAVING COUNT(*) > 1 finds duplicates.
- ROW_NUMBER() numbers rows within a partition, allowing precise duplicate removal.
- DISTINCT is a simple way to remove duplicates during data retrieval; it does not remove them permanently.
- Common Table Expressions (CTEs) help isolate duplicates before deletion.
- Unique indexes and constraints like UNIQUE prevent future duplicates.
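To see how the ranking described in the points above plays out, here is a minimal sketch that previews duplicate rows without deleting anything. The sample table and its rows are hypothetical, and window functions need a reasonably recent engine (for example MySQL 8.0+ or SQLite 3.25+).

```sql
-- Hypothetical sample data, for illustration only
CREATE TABLE Employees (
    id    INT PRIMARY KEY,
    name  VARCHAR(100),
    email VARCHAR(100)
);

INSERT INTO Employees (id, name, email) VALUES
    (1, 'Alice', 'alice@example.com'),
    (2, 'Bob',   'bob@example.com'),
    (3, 'Alice', 'alice@example.com'),
    (4, 'Alice', 'a.smith@example.com');

-- Number the rows within each name; rn = 1 marks the row that would be kept
WITH Ranked AS (
    SELECT id, name, email,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
    FROM Employees
)
SELECT id, name, email, rn
FROM Ranked
WHERE rn > 1      -- only the extra copies
ORDER BY id;
-- With the sample rows above, this returns ids 3 and 4 (the second and third 'Alice')
```

Running a preview like this before any DELETE makes it easy to confirm which copy of each row survives.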
Purpose & Use Case
- Clean up repeated records in transactional tables.
- Ensure integrity of reports and analytics by removing redundant data.
- Prepare data for migration or export.
- Avoid violations of unique constraints in updates or inserts.
Real Code Example
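-- Detect duplicates: list each name that appears more than once, with its count
SELECT name, COUNT(*) AS occurrences
FROM Employees
GROUP BY name
HAVING COUNT(*) > 1;

-- Remove duplicates using CTE + ROW_NUMBER
-- Note: deleting through a CTE like this is SQL Server syntax;
-- PostgreSQL, MySQL, and SQLite do not accept it.
WITH RankedDuplicates AS (
    -- Number the rows within each name; the lowest id gets rn = 1
    SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rn
    FROM Employees
)
-- Keep the first row per name and delete the rest
DELETE FROM RankedDuplicates WHERE rn > 1;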
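
Deleting through a CTE, as in the example above, is accepted by SQL Server but not by every engine. As a hedged alternative, the sketch below keeps the lowest id per name using a plain subquery. It assumes id is a unique, non-null key. The extra derived table (aliased keepers, a name chosen here for illustration) is there only because MySQL rejects a subquery that reads directly from the table being deleted from.

```sql
-- Keep the row with the smallest id for each name and delete every other row.
-- Assumes Employees.id is unique and never NULL.
DELETE FROM Employees
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM Employees
        GROUP BY name
    ) AS keepers
);
```

This form should run on PostgreSQL, MySQL, SQLite, and SQL Server, at the cost of less control over which row is kept than ROW_NUMBER() with a custom ORDER BY.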

Interview Q&A
Q1: How do you find duplicates in a table?
A: Use GROUP BY with HAVING COUNT(*) > 1 to detect repeated entries.
Q2: Can DISTINCT permanently remove duplicates?
A: No, it only removes duplicates in the result set, not from the table itself.
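A short illustration of that point, assuming the Employees table from the code example: the first query returns each name once, while the table itself keeps every row.

```sql
-- Returns one row per distinct name; nothing in Employees is modified
SELECT DISTINCT name
FROM Employees;

-- The table still holds all of its rows, duplicates included
SELECT COUNT(*) AS total_rows
FROM Employees;
```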
Q3: What does ROW_NUMBER() do in duplicate removal?
A: It assigns a sequential number to each row within its partition, so every row after the first (rn > 1) can be deleted.
Q4: Why use CTE with ROW_NUMBER() for deletion?
A: A CTE lets you identify and number the duplicates in one named step, so you can inspect them before deleting.
Q5: How can we avoid duplicates in the future?
A: Apply UNIQUE constraints or keys on important columns.
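One way to apply this, sketched under the assumption that Employees has an email column that should never repeat (the index name ux_employees_email is made up for illustration). The statement fails if duplicates are still present, so clean the table first.

```sql
-- Reject any future row whose email already exists in the table
CREATE UNIQUE INDEX ux_employees_email
    ON Employees (email);
```

How NULL emails are treated varies by engine: SQL Server allows only one NULL in a unique index, while PostgreSQL, MySQL, and SQLite allow several.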
Q6: What happens if duplicate rows have different IDs?
A: You can still detect them by checking for duplicate values in key fields like name or email.
Q7: What's better for large tables: DISTINCT or ROW_NUMBER()?
A: ROW_NUMBER() gives more control for deletion tasks: it lets you choose which row in each group to keep, whereas DISTINCT only affects query output.
Q8: Can you find duplicates across multiple columns?
A: Yes, by grouping on multiple columns (e.g., name, email).
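For instance, assuming Employees has both a name and an email column, a multi-column check might look like this:

```sql
-- A row counts as a duplicate only when both name and email match
SELECT name, email, COUNT(*) AS occurrences
FROM Employees
GROUP BY name, email
HAVING COUNT(*) > 1;
```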
Q9: Should you use DELETE or TRUNCATE to remove duplicates?
A: DELETE is used for selective removal; TRUNCATE wipes the entire table, so it cannot target duplicates.
Q10: Can you use joins to detect duplicates?
A: Yes, self-joins can help find duplicates based on matching criteria.
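A minimal self-join sketch, assuming Employees has a unique id column and that a matching name is the duplicate criterion (the aliases dup and prev are illustrative):

```sql
-- Pair each row with every earlier row (smaller id) that has the same name
SELECT dup.id  AS duplicate_id,
       prev.id AS matches_id,
       dup.name
FROM Employees AS dup
JOIN Employees AS prev
  ON dup.name = prev.name
 AND dup.id > prev.id       -- avoids self-matches and mirrored pairs
ORDER BY dup.id, prev.id;
```

Unlike the GROUP BY approach, this shows exactly which rows collide, though it can produce many pairs when a value repeats often.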
MCQs
Q1. Which SQL clause helps detect duplicates?
- WHERE
- GROUP BY with HAVING
- JOIN
- ORDER BY
Q2. Which function assigns a unique sequential number to each row within a partition?
- COUNT()
- RANK()
- ROW_NUMBER()
- PARTITION()
Q3. What keyword is used to remove duplicates in a SELECT?
- UNIQUE
- DISTINCT
- FILTER
- ONLY
Q4. Can CTEs be used to delete duplicates?
- No
- Yes
- Only in MySQL
- Only with JOINs
Q5. How can future duplicates be prevented?
- Adding indexes
- Using DISTINCT
- Using WHERE clause
- Using UNIQUE constraints
Q6. What does HAVING COUNT(*) > 1 detect?
- Nulls
- Primary keys
- Duplicate records
- Foreign keys
Q7. Which is not used for removing duplicates?
- ROW_NUMBER()
- DELETE
- CTE
- INSERT
Q8. How do you rank rows within a group?
- Use GROUP BY
- Use ORDER BY only
- Use JOIN
- Use PARTITION BY with ROW_NUMBER()
Q9. What does ROW_NUMBER() OVER (PARTITION BY...) generate?
- Sum of values
- Count of rows
- Row rank in each group
- Filtered data
Q10. Which clause removes specific rows from a table?
- SELECT
- TRUNCATE
- DELETE
- DISTINCT
Bonus Insight
For better performance, ensure relevant indexes exist before running duplicate detection queries on large datasets. Also, always back up your data before bulk deletion operations.
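A minimal sketch of both suggestions, using hypothetical object names (ix_employees_name, Employees_backup). CREATE TABLE ... AS SELECT is accepted by PostgreSQL, MySQL, and SQLite; SQL Server uses SELECT ... INTO instead.

```sql
-- Index the column used for grouping/partitioning so duplicate checks can use it
CREATE INDEX ix_employees_name
    ON Employees (name);

-- Copy the current rows aside before any bulk DELETE
-- (not SQL Server syntax; there, use SELECT * INTO Employees_backup FROM Employees)
CREATE TABLE Employees_backup AS
SELECT * FROM Employees;
```

A table copy like this preserves the data only, not indexes or constraints, so it complements a proper database backup and does not replace one.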