January 16, 2018 by Robert Gravelle

The majority of duplicate records fall into one of two categories: Duplicate Meaning and Non-unique Keys. The How to Spot and Delete Values with Duplicate Meaning in MySQL blog dealt with Duplicate Meaning; the follow-up addressed how to identify Non-unique Keys. That’s where two records in the same table have the same key, but may or may not have different values and meanings. Today’s blog will cover how to delete rows with duplicated data, but with different keys.

Identifying Duplicates by Type

The last query presented in the How to Identify Duplicates with Non-unique Keys in MySQL blog listed all the duplicates in a format that was easy to visually scan through:

Repetitions		row_data
----------------------------------------------------------------
2		22 (DAVIS, JENNIFER) \| 22 (DAVIS, JENNIFER)
2		23 (LOLLOBRIGIDA, JOHNNY) \| 23 (GABLE, CHRISTIAN)
2		41 (WAHLBERG, NICK) \| 12 (WAHLBERG, NICK)

Having identified all of the duplicated keys and values, we can decide how best to deal with the redundant data.

JENNIFER DAVIS appears in two records with the same key of 22, making those rows exact duplicates. Nick Walberg’s name fields are duplicated, but the IDs are not. There is also a duplicated key that is associated with two unrelated actors: #23 for “JOHNNY LOLLOBRIGIDA” and “CHRISTIAN GABLE”. With regards to the duplicated keys of 22 and 23, the first is a true duplicate, whereas the second only needs a new key to be generated for one of the records.

Deleting Rows using DELETE JOIN

In the How to Spot and Delete Values with Duplicate Meaning in MySQL blog, we removed duplicates from SELECT result sets by performing a Search & Replace on values. Here we will permanently delete one of the duplicated rows using the DELETE JOIN statement.

Since we are comparing fields from the same table, we have to join the table to itself. We can choose to keep either the lower or higher id number by comparing the ids in the WHERE clause. The following statement keeps the highest id:

DELETE a FROM wp.amalgamated_actors a
  INNER JOIN wp.amalgamated_actors a2
WHERE a.id < a2.id
AND   a.first_name = a2.first_name
AND   a.last_name  = a2.last_name;

1 row(s) affected	0.093 sec

In this case, the affected (deleted) row is NICK WAHLBERG with an id of 12. A quick SELECT confirms the result:

id	first_name	last_name
-------------------------------------
10	PENELOPE	GUINESS
12	NICK	WAHLBERG
14	ED	CHASE
22	JENNIFER	DAVIS
23	JOHNNY	LOLLOBRIGIDA
27	BETTE	NICHOLSON
34	GRACE	MOSTEL
39	JOE	SWANK
23	CHRISTIAN	GABLE
22	JENNIFER	DAVIS

If you wanted to keep the lowest id, you would just change the a.id < a2.id expression to a.id > a2.id.

id	first_name	last_name
-------------------------------------
10	PENELOPE	GUINESS
14	ED	CHASE
22	JENNIFER	DAVIS
23	JOHNNY	LOLLOBRIGIDA
27	BETTE	NICHOLSON
34	GRACE	MOSTEL
41	NICK	WAHLBERG
39	JOE	SWANK
23	CHRISTIAN	GABLE
22	JENNIFER	DAVIS

Deleting Rows with Non-unique Keys

In the case of JENNIFER DAVIS, who appears twice with the same id of 22, we would need to employ a different approach because running the above statement with a.id = a2.id will target every row in the table! The reason is that we are essentially matching every row against itself! In the next blog, we’ll learn how to delete rows with non-unique keys such as these.

Navicat Blog

How to Delete Duplicate Rows with Different IDs in MySQL (Part 3)

Identifying Duplicates by Type

Deleting Rows using DELETE JOIN

Deleting Rows with Non-unique Keys