Notes of a Junior on the Path to Oracle Mastery
On this blog I am writing about thoughts, solutions, failures and pitfalls on the path to Oracle Mastery. It’s still a long way to go. On this blog, the main focus is learning Oracle.
Removing Duplicate Rows
Author: HP Fuchs | Category: Oracle Database, SQL Server
Oracle
For our example we need to insert a duplicate row into HR.EMPLOYEES first. The new row has to be identical on first_name, last_name, email and department_id columns. To insert such a row, we have to disable the unique constraint on the HR.EMPLOYEES table.
ALTER TABLE employees DISABLE CONSTRAINT emp_email_uk;
INSERT INTO employees (
EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER,
HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID,
DEPARTMENT_ID
) VALUES (
‘501′, ‘Donald’, ‘OConnell’, ‘DOCONNEL’, ‘650.507.9833′, TO_DATE(’21.06.99′,’DD.MM.RR’), ‘IT_PROG’, ‘2600′,null,’124′,’50′;
Duplicate rows can be removed like this:
DELETE FROM employees
WHERE rowid IN (
SELECT rowid
FROM employees
MINUS
SELECT MAX(rowid)
FROM employees
GROUP BY first_name, last_name, email, department_id
);
Rows will be considered duplicate, if they match on first_name, last_name, email and department_id. In production it is not wise to delete duplicate rows without copying the table first.
SELECT * FROM employees INTO employees_copy;
To enable the emp_email_uk unique constraint again:
ALTER TABLE employees ENABLE CONSTRAINT emp_email_uk;
SQL Server
In SQL Server it is a bit trickier to delete duplicate rows. It is not possible without a second table, for example a temporary table. For our examples we need a table with a few rows, one of them is duplicate.
CREATE TABLE t (
c1 INT NULL,
c2 INT NULL,
c3 INT NULL);
INSERT INTO t (c1, c2, c3) VALUES (1, 2, 3);
INSERT INTO t (c1, c2, c3) VALUES (4, 5, 6);
INSERT INTO t (c1, c2, c3) VALUES (7, 8, 9);
INSERT INTO t (c1, c2, c3) VALUES (7, 8, 9);
It’s easy if a row is considered duplicate when all column values are identical:
SELECT DISTINCT * INTO #t_tmp FROM t;
TRUNCATE TABLE t;
INSERT t SELECT * FROM #t_tmp;
DROP TABLE #t_tmp;
SELECT * FROM t;
If somebody by accident has loaded the same file twice into our table, and we have a column to log the insert date, rows are duplicate but not identical. SELECT DISTINCT won’t help anymore here.
ALTER TABLE t ADD d DATETIME NOT NULL DEFAULT getdate();
INSERT INTO t (c1, c2, c3) VALUES (1, 2, 3);
INSERT INTO t (c1, c2, c3) VALUES (4, 5, 6);
INSERT INTO t (c1, c2, c3) VALUES (7, 8, 9);
SELECT * FROM t;
Here a UNIQUE INDEX with the IGNORE_DUP_KEY option may help.
CREATE TABLE #t_tmp (
c1 INT NULL,
c2 INT NULL,
c3 INT NULL,
d DATETIME NULL);
CREATE UNIQUE INDEX ui ON #t_tmp (c1, c2, c3) WITH IGNORE_DUP_KEY;
INSERT #t_tmp SELECT * FROM t;
TRUNCATE TABLE t;
INSERT t SELECT * FROM #t_tmp;
DROP TABLE #t_tmp;
SELECT * FROM t;
RSS Full


May 2nd, 2008 at 12:01 pm
I get useful information. Thank you.
Have a nice day ^^