Different strategies for removing duplicate records in SQL Server
Written By: Arshad Ali -- 1/8/2010
From: http://www.mssqltips.com/tip.asp?tip=1918
Problem
In data warehousing applications during ETL (Extraction, Transformation and Loading), or even in OLTP (Online Transaction Processing) applications, we often encounter duplicate records in a table. To make the table data consistent and accurate, we need to get rid of these duplicates, keeping only one copy of each record. In this tip I discuss different strategies you can use for this, along with their pros and cons.
Solution
There are several methods for removing duplicate records (de-duplication) from a table. I am going to discuss each method, its prerequisites, and its pros and cons.
Using correlated subquery
Using temporary table
Creating a new table with distinct records and renaming it
Using Common Table Expression (CTE)
Using Fuzzy Group Transformation in SSIS
Using MERGE Statement
1. Using correlated subquery
If you already have an identity column on your table, your work is half done. You can use a correlated subquery to get rid of the duplicates.
First, let me briefly explain how a correlated subquery works. A correlated subquery references columns of the outer query, so conceptually it is evaluated once for each row the outer query considers: the values of the current outer row are passed to the inner subquery, and the outcome of the inner subquery is then used by the outer query to decide whether that row belongs in the final result set.
In the example below, for the data deletion I am joining the inner query columns with the outer query to find the record with
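A minimal sketch of the correlated-subquery approach described in this section, under stated assumptions: the table dbo.Employee and its columns EmployeeID, FirstName and LastName are hypothetical names chosen for illustration and are not taken from the original tip. Two rows are treated as duplicates when FirstName and LastName match, and the row with the lowest identity value is the one kept. The multi-row VALUES clause assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample table; all names here are assumptions for illustration.
CREATE TABLE dbo.Employee
(
    EmployeeID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,  -- identity column
    FirstName  VARCHAR(50) NOT NULL,
    LastName   VARCHAR(50) NOT NULL
);

-- Sample data: ('Ann', 'Lee') appears three times.
INSERT INTO dbo.Employee (FirstName, LastName)
VALUES ('Ann', 'Lee'), ('Ann', 'Lee'), ('Bob', 'Ray'), ('Ann', 'Lee');

-- For each outer row E1, the inner query finds the smallest EmployeeID among
-- the rows that share its FirstName and LastName. A row whose EmployeeID is
-- greater than that minimum is a duplicate and is deleted.
DELETE E1
FROM dbo.Employee AS E1
WHERE E1.EmployeeID >
(
    SELECT MIN(E2.EmployeeID)
    FROM dbo.Employee AS E2
    WHERE E2.FirstName = E1.FirstName
      AND E2.LastName  = E1.LastName
);

-- Inspect what is left: one row per distinct (FirstName, LastName) pair.
SELECT EmployeeID, FirstName, LastName
FROM dbo.Employee
ORDER BY EmployeeID;
```

The columns in the sketch are declared NOT NULL on purpose: with nullable columns, the `=` comparisons in the inner query would not match rows containing NULL, so such rows would need separate handling.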