Как найти дубликаты в таблице sql

Создание игр на Unreal Engine 5
Данный курс научит Вас созданию игр на Unreal Engine 5. Курс состоит из 12 модулей, в которых Вы с нуля освоите этот движок и сможете создавать самые разные игры.
В курсе Вы получите всю необходимую теоретическую часть, а также увидите массу практических примеров. Дополнительно, почти к каждому уроку идут упражнения для закрепления материала.
Помимо самого курса Вас ждёт ещё 8 бесплатных ценных Бонусов: «Chaos Destruction», «Разработка 2D-игры», «Динамическая смена дня и ночи», «Создание динамической погоды», «Создание искусственного интеллекта для NPC», «Создание игры под мобильные устройства», «Создание прототипа RPG с открытым миром» и и весь курс «Создание игр на Unreal Engine 4» (актуальный и в 5-й версии), включающий в себя ещё десятки часов видеоуроков.
Подпишитесь на мой канал на YouTube, где я регулярно публикую новые видео.
Подписаться

Подписавшись по E-mail, Вы будете получать уведомления о новых статьях.
Подписаться

Добавляйтесь ко мне в друзья ВКонтакте! Отзывы о сайте и обо мне оставляйте в моей группе.
SQL Find Duplicates Like a Pro: 3 Guaranteed Techniques

Aah…. duplicates! They are everywhere! Look around you – multiple charger cables, headphones, pictures in your smartphone! But we are not here to talk about those duplicates. No, Sir! We are here to address the duplicates in sql, how to find them and possibly resolve them in your SQL code.
In this SQL find duplicates post, let us look at 3 ways to identify duplicate rows/columns and then conclude by looking at 2 ways to mitigate them.
Let us start by looking at a very simple database table, USER_DIET. The below listed table shows the Fruit consumption of Sam and John over two days.
Just by looking at the data can you tell if there are duplicates in the table, say for the column “NAME”?
| NAME | FRUIT | DAY |
| John | Apple | Monday |
| Sam | Orange | Monday |
| John | Orange | Tuesday |
| Sam | Banana | Tuesday |
| John | Peach | Wednesday |
| Sam | Banana | Wednesday |
The most obvious answer is YES! John occurs 3 times and so does Sam.
How about if we were to look at columns NAME and FRUIT? Once again, the answer would be YES, because “Sam” and “Banana” occurs twice. Apparently, Sam loves bananas, while John prefers a different fruit every day.
Finally, let’s look at columns NAME, FRUIT and DAY. Do you see any duplicates now?
The answer is NO. There are no duplicates because both Sam and John had a different fruit on each day.
The point I would like to drive home is this! To truly understand if data is duplicate, you need to understand the context and the functionality behind it.
Note: All SQL examples below use Oracle SQL syntax. However, they should work across most relational databases with minimal changes.
1. SQL Find Duplicates using Count
The most common method to find duplicates in sql is using the count function in a select statement. There are two other clauses that are key to finding duplicates: GROUP BY and HAVING.
Let us continue using the database table (USER_DIET) from the previous example and see if we can find duplicates for the NAME column.
a. Duplicates in a single column
In this second example, let us look at finding duplicates in multiple columns: NAME and FRUIT.
Lets think this thru and put things in context before diving into our select statement. As yourself, what am I trying to find here ?
We are trying to find if any of the users, in this case, Sam/John had the same fruit twice. That it ! This context is based on the two fields NAME and FRUIT.
b. Duplicates in multiple columns
Key to remember, the columns in the select statement, excluding the count(*) should be the exact same in the group by clause as well.
Also note that using the count(*) function gives you a count of the number of occurrences of a value. In this case, “Sam” + “Banana” occurs twice in the table, but in actuality we only have one duplicate row.
c. SQL to find duplicate rows
The SQL to find duplicate rows in a table is not the same as checking for duplicates in a column.
Ideally, if the database table has the right combination of key columns, you should not have duplicate rows. Regardless, if you are suspicious that your table has duplicate rows, perform the below steps.
- Determine they Key columns on your table.
- If the table does not have keys defined, determine which column(s) makes a row unique. Often times this depends on the functional use case of the data.
- Add the fields from Step 1 or Step 2 to your SQL COUNT(*) clause.
Using the USER_DIET table above, lets assume no keys were defined on the table. Our next option would be determining which column(s) makes a row unique.
Note that the table has 3 rows. If Sam or Jon had the same fruit more than once on the same day, this would create a duplicate row.
Could Sam or Jon eating different fruits on the same day be considered a duplicate row?
The answer – Maybe! It depends on the functional use case of the data.
Find duplicate records in MySQL
I want to pull out duplicate records in a MySQL Database. This can be done with:
Which results in:
I would like to pull it so that it shows each row that is a duplicate. Something like:
Any thoughts on how this can be done? I’m trying to avoid doing the first one then looking up the duplicates with a second query in the code.
28 Answers 28
The key is to rewrite this query so that it can be used as a subquery.
Why not just INNER JOIN the table with itself?
A DISTINCT is needed if the address could exist more than two times.
![]()
I tried the best answer chosen for this question, but it confused me somewhat. I actually needed that just on a single field from my table. The following example from this link worked out very well for me:
Isn’t this easier :
10 000 duplicate rows in order to make them unique, much faster than load all 600 000 rows.
This is the similar query you have asked for and its 200% working and easy too. Enjoy.
Find duplicate users by email address with this query.
![]()
we can found the duplicates depends on more then one fields also.For those cases you can use below format.
![]()
Finding duplicate addresses is much more complex than it seems, especially if you require accuracy. A MySQL query is not enough in this case.
I work at SmartyStreets, where we do address validation and de-duplication and other stuff, and I’ve seen a lot of diverse challenges with similar problems.
There are several third-party services which will flag duplicates in a list for you. Doing this solely with a MySQL subquery will not account for differences in address formats and standards. The USPS (for US address) has certain guidelines to make these standard, but only a handful of vendors are certified to perform such operations.
So, I would recommend the best answer for you is to export the table into a CSV file, for instance, and submit it to a capable list processor. One such is LiveAddress which will have it done for you in a few seconds to a few minutes automatically. It will flag duplicate rows with a new field called "Duplicate" and a value of Y in it.
Finding duplicate values in a SQL table
This query will give us John, Sam, Tom, Tom because they all have the same email .
However, what I want is to get duplicates with the same email and name .
That is, I want to get "Tom", "Tom".
The reason I need this: I made a mistake, and allowed inserting duplicate name and email values. Now I need to remove/change the duplicates, so I need to find them first.
39 Answers 39
Simply group on both of the columns.
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of «functional dependency»:
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need sql_mode=only_full_group_by :
-
; (see comments in accepted answer).
![]()
if you want the IDs of the dups use this:
to delete the duplicates try:
![]()
![]()
If you want to delete the duplicates, here’s a much simpler way to do it than having to find even/odd rows into a triple sub-select:
And so to delete:
Much more easier to read and understand IMHO
Note: The only issue is that you have to execute the request until there is no rows deleted, since you delete only 1 of each duplicate each time
In contrast to other answers you can view the whole records containing all columns if there are any. In the PARTITION BY part of row_number function choose the desired unique/duplicit columns.
When you want to select ALL duplicated records with ALL fields you can write it like
![]()
![]()
A little late to the party but I found a really cool workaround to finding all duplicate IDs:
![]()
This selects/deletes all duplicate records except one record from each group of duplicates. So, the delete leaves all unique records + one record from each group of the duplicates.
Be aware of larger amounts of records, it can cause performance problems.
![]()
In case you work with Oracle, this way would be preferable:
![]()
Well this question has been answered very neatly in all the above answers. But I would like to list all the possible manners, we can do this in various ways which may impart the understanding how we can do it and seeker can pick one of the solution which best fits to his/her need as this is one of the most common query SQL developer come across different business usecases or sometime in interviews as well.
Creating Sample Data
I will start with setting up some sample data from this question only.

1. USING GROUP BY CLAUSE

- the GROUP BY clause groups the rows into groups by values in both name and email columns.
- Then, the COUNT() function returns the number of occurrences of each group (name,email).
- Then, the HAVING clause keeps only duplicate groups, which are groups that have more than one occurrence.
2. Using CTE:
To return the entire row for each duplicate row, you join the result of the above query with the NewTable table using a common table expression (CTE):

3. Using ROW_NUMBER() function

- ROW_NUMBER() distributes rows of the NewTable table into partitions by values in the name and email columns. The duplicate rows will have repeated values in the name and email columns, but different row numbers
- Outer query removes the first row in each group.
Well Now I believe, you can have sound Idea of how to find duplicates and apply the logic to find duplicate in all possible scenarios. Thanks.