Thursday, 1 November 2018

Query to Remove the Duplicates from a Data Set

Assume I have a table TMP with structure as given below.


  1. CREATE TABLE TMP (
  2. NAME VARCHARW(30)
  3. )

I have populated the duplicate values as well.


  1. SELECT * FROM TEMP;

  2. Output
  3. ------
  4. Rahim
  5. Vicky
  6. Sai
  7. Tom
  8. Rahim
  9. Rahim
  10. Vicky
  11. Sai
  12. Jaz
  13. Apple
  14. Apple
  15. Rahim

Query to remove the Duplicate.

Method 1.

  1. WITH ds AS ( SELECT rownum AS ID, name FROM tmp),  
  2. dup_set AS (SELECT id, RANK() OVER(PARTITION    
  3. BY name ORDER BY ROWNUM) AS rk, name from ds)  
  4. SELECT name FROM dup_set WHERE rk=1;  

Output:

  1. Rahim
  2. Vicky
  3. Sai
  4. Tom
  5. Jaz
  6. Apple

Method 2.

  1. WITH set_a AS  
  2.   ( SELECT ROWNUM AS id, name FROM tmp a  
  3.   ),  
  4.   set_b AS  
  5.   (SELECT ROWNUM AS id, name FROM tmp b  
  6.   )  
  7. SELECT a.name  
  8. FROM set_a a  
  9. WHERE a.id >= ALL  
  10.   (SELECT id FROM set_b b   
  11.   WHERE a.name=b.name  
  12.   );  


Method 3.


  1. WITH rem_dup_set AS (
  2. SELECT ROW_NUMBER() OVER(PARTITION BY name
  3.    ORDER BY name) AS rk, name
  4. FROM temp )
  5. SELECT name FROM rem_dup_set WHERE rk=1;

Output:

  1. Rahim
  2. Vicky
  3. Sai
  4. Tom
  5. Jaz
  6. Apple