SQL - Find Duplicates, then add to another table with count - sql

There are tons of listings on how to find duplicate rows and remove them, or list out the duplicates. In the masses of responses I've searched through on here, those are the only ones I've found. I figured I would just post my question, since it's been an hour and still no luck.
This is the example data I have
Table Name: Customers
_____________________________
ID | CompanyName
--------------
1 | Joes
2 | Wendys
3 | Kellys
4 | Ricks
5 | Wendys
6 | Kellys
7 | Kellys
I need to be able to find all the duplicates in this table, then put the results into another table that lists what the company name is, and how many duplicates it found.
For example, from the table above I should get a new table that looks something like this:
Table Name: CustomerTotals
_______________________________
ID | CompanyName | Totals
-------------------------------
1 | Joes | 1
2 | Wendys | 2
3 | Kellys | 3
4 | Ricks | 1
-----EDIT Added after 2 responses, ran into another question------
Thanks for the responses! What about the opposite? Say I only want to add items from the Customers table to a new table "UniqueCustomers" if they don't exist in the CustomerTotals table?

Try this :
INSERT INTO CustomerTotals
(CompanyName, Totals)
SELECT CompanyName, COUNT(*)
FROM Customers
GROUP BY CompanyName
Use an identity column for the ID field.
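As a rough, runnable sketch of the INSERT ... SELECT approach above, the following uses Python's built-in sqlite3 module with the sample rows from the question. SQLite is only a stand-in here: its AUTOINCREMENT column plays the role of the identity column, and the helper names make_customers and build_customer_totals are made up for illustration.

```python
import sqlite3

def make_customers():
    """Build an in-memory Customers table holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, CompanyName TEXT)")
    conn.executemany(
        "INSERT INTO Customers (ID, CompanyName) VALUES (?, ?)",
        [(1, "Joes"), (2, "Wendys"), (3, "Kellys"), (4, "Ricks"),
         (5, "Wendys"), (6, "Kellys"), (7, "Kellys")],
    )
    return conn

def build_customer_totals(conn):
    """Create CustomerTotals and insert one counted row per company name."""
    # Assumption: AUTOINCREMENT stands in for a SQL Server identity column.
    conn.execute(
        "CREATE TABLE CustomerTotals ("
        " ID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " CompanyName TEXT NOT NULL,"
        " Totals INTEGER NOT NULL)"
    )
    # The ID column is omitted from the insert so the database generates it.
    conn.execute(
        "INSERT INTO CustomerTotals (CompanyName, Totals) "
        "SELECT CompanyName, COUNT(*) FROM Customers GROUP BY CompanyName"
    )
    return conn.execute(
        "SELECT CompanyName, Totals FROM CustomerTotals ORDER BY CompanyName"
    ).fetchall()

totals = build_customer_totals(make_customers())
print(totals)
```

Every company gets a row, including those that appear only once, which matches the expected CustomerTotals table in the question.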

To get the duplicates, you can do:
Insert into CustomerTotals (Id, CompanyName, Totals)
Select min(Id), CompanyName, count(*) From Customers
group by CompanyName
That gives you the results, keeping the minimum ID for each company name (if you really need the first ID from your original table, as I see in your expected results).
If not, you can use an identity column.

If you only want the duplicates to be inserted into the second table, use this:
INSERT INTO CustomerTotals
(CompanyName, Totals)
SELECT CompanyName, COUNT(*)
FROM Customers
GROUP BY CompanyName
HAVING Count(*) > 1
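To see what the HAVING filter changes, here is a small sketch using Python's sqlite3 module and the question's sample rows. SQLite is an assumption standing in for the asker's database, and the variable name duplicates_only is invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (ID INTEGER PRIMARY KEY, CompanyName TEXT);
    INSERT INTO Customers (ID, CompanyName) VALUES
        (1, 'Joes'), (2, 'Wendys'), (3, 'Kellys'), (4, 'Ricks'),
        (5, 'Wendys'), (6, 'Kellys'), (7, 'Kellys');

    -- Assumption: AUTOINCREMENT stands in for an identity column.
    CREATE TABLE CustomerTotals (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        CompanyName TEXT NOT NULL,
        Totals INTEGER NOT NULL
    );

    -- HAVING is applied after grouping, so it can filter on COUNT(*).
    INSERT INTO CustomerTotals (CompanyName, Totals)
    SELECT CompanyName, COUNT(*)
    FROM Customers
    GROUP BY CompanyName
    HAVING COUNT(*) > 1;
""")

duplicates_only = conn.execute(
    "SELECT CompanyName, Totals FROM CustomerTotals ORDER BY CompanyName"
).fetchall()
print(duplicates_only)
```

Names that occur only once (Joes and Ricks in the sample data) are left out of CustomerTotals.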

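The above examples are fine. You may see advice to use count(1) instead of count(*) to improve performance, but most database engines, including SQL Server, treat the two identically, so either is fine.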
Read this question.

I think below script is simplest...
SELECT CompanyName, COUNT(*) AS Total
INTO #tempTable
FROM Customers
GROUP BY CompanyName
HAVING Count(*) > 1

Related

SQL Server: Insert row into table, for all id's not existing yet

I have three tables in MS SQL Server, one with addresses, one with addresstypes and one with assignments of addresstypes:
Address:
IdAddress | Name | ...
1 | xyz
2 | abc |
...
AddressTypes
IdAddresstype | Caption
1 | Customer
2 | Supplier
...
Address2AddressType
IdAddress2AddressType | IdAddress | IdAddressType
1 | 1 | 2
3 | 3 | 2
Now I want to insert a row into Address2AddressType with the addresstype Customer for each address that is not yet assigned, i.e. does not appear in this table.
So to select those addresses, I use this query:
SELECT adresses.IdAddress
FROM [dbo].[Address] AS adresses
WHERE adresses.IdAddress NOT IN (SELECT adresstypeassignment.IdAddress
FROM [dbo].[Address2AddressType] AS adresstypeassignment)
Now I need to find a way to loop through all those results to insert like this:
INSERT INTO Address2AddressType (IdAddress, IdAddresstype)
VALUES (<IdAddress from result>, 1)
Can anybody help, please?
Thanks in advance.
Regards
Lars
Use INSERT ... SELECT:
INSERT INTO Address2AddressType (IdAddress, IdAddresstype)
SELECT a.IdAddress, 1
FROM [dbo].[Address] a
WHERE a.IdAddress NOT IN (SELECT ata.IdAddress FROM [dbo].Address2AddressType ata);
I also simplified the table aliases.
Note: I don't recommend NOT IN for this purpose, because it does not handle NULLs the way you expect (if any values returned by the subquery are NULL no rows at all will be inserted). I recommend NOT EXISTS instead:
INSERT INTO Address2AddressType (IdAddress, IdAddresstype)
SELECT a.IdAddress, 1
FROM [dbo].[Address] a
WHERE NOT EXISTS (SELECT 1
FROM [dbo].Address2AddressType ata
WHERE ata.IdAddress = a.IdAddress
);
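The NULL behaviour described above can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server; the third address ('def') and the assignment row with a NULL IdAddress are hypothetical rows added only to trigger the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (IdAddress INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Address2AddressType (
        IdAddress2AddressType INTEGER PRIMARY KEY,
        IdAddress INTEGER,        -- nullable on purpose for this demo
        IdAddressType INTEGER
    );
    INSERT INTO Address VALUES (1, 'xyz'), (2, 'abc'), (3, 'def');
    -- The last row has a NULL IdAddress (hypothetical, not in the question).
    INSERT INTO Address2AddressType VALUES (1, 1, 2), (3, 3, 2), (4, NULL, 2);
""")

# NOT IN: the NULL makes the comparison unknown, so no address qualifies.
not_in_rows = conn.execute("""
    SELECT a.IdAddress FROM Address a
    WHERE a.IdAddress NOT IN (SELECT ata.IdAddress FROM Address2AddressType ata)
""").fetchall()

# NOT EXISTS: only checks whether a matching row exists, so NULLs are harmless.
not_exists_rows = conn.execute("""
    SELECT a.IdAddress FROM Address a
    WHERE NOT EXISTS (SELECT 1 FROM Address2AddressType ata
                      WHERE ata.IdAddress = a.IdAddress)
""").fetchall()

# The actual insert, using the NOT EXISTS form.
conn.execute("""
    INSERT INTO Address2AddressType (IdAddress, IdAddressType)
    SELECT a.IdAddress, 1 FROM Address a
    WHERE NOT EXISTS (SELECT 1 FROM Address2AddressType ata
                      WHERE ata.IdAddress = a.IdAddress)
""")
inserted = conn.execute(
    "SELECT IdAddress, IdAddressType FROM Address2AddressType WHERE IdAddressType = 1"
).fetchall()
print(not_in_rows, not_exists_rows, inserted)
```

With the NULL row present, the NOT IN query finds nothing, while NOT EXISTS still finds address 2 and the insert adds it with address type 1.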

Why is INNER JOIN producing more records than original file?

I have two tables. Table A & Table B. Table A has 40516 rows, and records sales by seller_id. The first column in Table A is the seller_id that repeats every time a sale is made.
Example: Table A (40516 rows)
seller_id | item | cost
------------------------
1 | dog | 5000
1 | cat | 50
4 |lizard| 80
5 |bird | 20
5 |fish | 90
The seller_id is also present in Table B, and also contains the corresponding name of the seller.
Example: Table B (5851 rows)
seller_id | seller_name
-------------------------
1 | Dog and Cat World INC
4 | Reptile Love.com
5 | Ocean Dogs Inc
I want to join these two tables, but only display the seller name from Table B and all other columns from Table A. When I do this with an INNER JOIN I get 40864 rows (348 extra rows). Shouldn't the query produce only the original 40516 rows?
Also not sure if this matters, but the seller_id can contain several zeros before the number (e.g., 0000845, 0000549).
I've looked around on here and haven't really found an answer. I've tried LEFT and RIGHT joins and get the same results for one and way more results for the other.
SQL Code Example:
SELECT public.table_B.seller_name, *
FROM public.table_A
INNER JOIN public.table_B ON public.table_A.seller_id =
public.table_B.seller_id;
Expected Results:
seller_name | seller_id | item | cost
------------------------------------------------
Dog and Cat World INC | 1 | dog | 5000
Dog and Cat World INC | 1 | cat | 50
Reptile Love.com | 4 |lizard| 80
Ocean Dogs Inc | 5 |bird | 20
Ocean Dogs Inc | 5 |fish | 90
I expected the results to contain the same number of rows as Table A. Instead I got the names matching up and an additional 348 rows...
Update:
I changed "unique_id" to "seller_id" in the question.
I guess I should have chosen a better name for unique_id in the original example. I didn't mean it to be unique in the sense of a key. It is just the seller's id that repeats every time there is a sale (in Table A). The seller's ID does repeat in Table A because it is supposed to. I simply want to pair up the seller IDs with the seller names.
Thanks again everyone for their help!
unique_id is already not correctly named in the first table, so there is no reason to assume it is unique in the second table either.
Run this query to find the duplicates:
select unique_id
from table_b
group by unique_id
having count(*) > 1;
You can fix the query using distinct on:
SELECT b.seller_name, a.*
FROM public.table_A a JOIN
(SELECT DISTINCT ON (b.unique_id) b.*
FROM public.table_B b
ORDER BY b.unique_id
) b
ON a.unique_id = b.unique_id;
In this case, you may get fewer records, if there are no matches. To fix that, use a LEFT JOIN.
Because unique id column is not unique.
Gordon Linoff was correct. The seller_id (formerly listed as unique_id) was indeed duplicated throughout the data set. I foolishly assumed otherwise. Also the seller_name had many duplicates too! In the end I had to use the CONCAT() function to join the seller_id with a second identifier to create a type of foreign key. After I did this the join worked as expected. Thanks everyone!
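The row multiplication discussed in this thread can be reproduced with a few rows. The sketch below uses Python's sqlite3 module rather than PostgreSQL, so DISTINCT ON is not available; deduplicating with GROUP BY and MIN(seller_name) is a swapped-in alternative, not the method from the answer. The second row for seller 5 ('Ocean Dogs LLC') is a hypothetical duplicate added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (seller_id INTEGER, item TEXT, cost INTEGER);
    CREATE TABLE table_b (seller_id INTEGER, seller_name TEXT);
    INSERT INTO table_a VALUES
        (1, 'dog', 5000), (1, 'cat', 50), (4, 'lizard', 80),
        (5, 'bird', 20), (5, 'fish', 90);
    -- Seller 5 appears twice in table_b (hypothetical duplicate).
    INSERT INTO table_b VALUES
        (1, 'Dog and Cat World INC'), (4, 'Reptile Love.com'),
        (5, 'Ocean Dogs Inc'), (5, 'Ocean Dogs LLC');
""")

# Each table_a row for seller 5 matches two table_b rows, so rows multiply.
joined_count = conn.execute("""
    SELECT COUNT(*) FROM table_a a
    JOIN table_b b ON a.seller_id = b.seller_id
""").fetchone()[0]

# Find the seller_ids that are duplicated on the lookup side.
duplicate_ids = [row[0] for row in conn.execute("""
    SELECT seller_id FROM table_b GROUP BY seller_id HAVING COUNT(*) > 1
""")]

# Collapse table_b to one row per seller_id before joining.
deduped = conn.execute("""
    SELECT b.seller_name, a.seller_id, a.item, a.cost
    FROM table_a a
    JOIN (SELECT seller_id, MIN(seller_name) AS seller_name
          FROM table_b GROUP BY seller_id) b
      ON a.seller_id = b.seller_id
    ORDER BY a.seller_id, a.item
""").fetchall()
print(joined_count, duplicate_ids, len(deduped))
```

Picking MIN(seller_name) is arbitrary; which of the duplicate names is the right one depends on the data, as the asker's own follow-up suggests.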

How to return unique rows having count() of multiple columns = 1 using group by?

So here is my situation:
____________________________________________
| idnumber | name | sectiongroup |
--------------------------------------------
| 123 | Joe | one |
| 123 | Barry | two |
| 1234 | Laura | one |
| 1234 | LauraCopyCat | one |
--------------------------------------------
I am trying to build a query which will return any unique (i.e. COUNT(idnumber) = 1) id numbers in a given sectiongroup. So if you are in sectiongroup number one and no one else in your sectiongroup has the same ID number as you, then I want your idnumber. If someone in group two happens to have the same idnumber, that is okay, I still want your idnumber.
For example, Barry and Joe have the same id number but they are in separate sectiongroups, so I want to return their idnumbers. However, Laura and LauraCopyCat have the SAME sectiongroup, so I do NOT want their idnumbers to be returned. So far I have the following:
SELECT idnumber
FROM namestable
GROUP BY idnumber, sectiongroup
HAVING(COUNT(idnumber) = 1)
Is there a way to add sectiongroup into the COUNT()=1 condition?
Just use COUNT(*) to avoid confusion. This will count the number of records in the particular group. Remember, a group consists of the unique combinations of values in the fields specified in your GROUP BY statement.
SELECT idnumber
FROM namestable
GROUP BY idnumber, sectiongroup
HAVING COUNT(*) = 1
Note that this will return duplicate idnumbers if you have records that share an id but have different sectiongroups. To remove the duplicates, just change SELECT to SELECT DISTINCT.
Tested here: http://sqlfiddle.com/#!9/b0a50c/3
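For a version that runs without the fiddle, here is a minimal sketch using Python's sqlite3 module with the four rows from the question. SQLite is an assumption; the variable names per_group and distinct_ids are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE namestable (idnumber INTEGER, name TEXT, sectiongroup TEXT);
    INSERT INTO namestable VALUES
        (123, 'Joe', 'one'), (123, 'Barry', 'two'),
        (1234, 'Laura', 'one'), (1234, 'LauraCopyCat', 'one');
""")

# One output row per (idnumber, sectiongroup) pair that occurs exactly once.
per_group = conn.execute("""
    SELECT idnumber, sectiongroup
    FROM namestable
    GROUP BY idnumber, sectiongroup
    HAVING COUNT(*) = 1
    ORDER BY sectiongroup
""").fetchall()

# DISTINCT collapses the repeated idnumber that comes from different groups.
distinct_ids = conn.execute("""
    SELECT DISTINCT idnumber
    FROM namestable
    GROUP BY idnumber, sectiongroup
    HAVING COUNT(*) = 1
""").fetchall()
print(per_group, distinct_ids)
```

Joe and Barry each form their own group, so 123 qualifies twice before DISTINCT is applied; Laura and LauraCopyCat share a group of size two, so 1234 never qualifies.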

SQL - find rows having n duplicate values

Let's say I have a table like this:
OrderId | CustomerId | ProductName
======================================
73 | 301 | Sponge
74 | 508 | Garbage Bag
75 | 301 | Spoon
76 | 301 | Bacon
77 | 508 | Dog treats
78 | 301 | Paper
79 | 905 | Text book
and I want to find a customer who has made two orders in the past. How would I set up the query?
For the table above, the query would return the two rows for customer 508.
How would I modify it to return customers who have one previous order, so that it would return the row for customer 905?
select customerId, count(*)
from mytable
group by customerId
having count(*) >= 2
If you need only CustomerId of those who have exactly one order in table (they exist once) then the following query groups customers and counts how many times they appear in a table (here showing only those who appear once, modify as you wish).
SELECT
CustomerId
FROM
mytable
GROUP BY 1
HAVING COUNT(*) = 1
Let's say you want to list every customer and the number of orders they've placed, but only those with at least 2. Modify the above query by adding COUNT(*) to the select list and changing the HAVING condition, like this:
SELECT
CustomerId,
COUNT(*) AS no_of_orders
FROM
mytable
GROUP BY 1
HAVING COUNT(*) > 1
I used a SQL query of something like that just yesterday.
I adapted it to your table, but I'm not 100% sure it will work.
CREATE TEMPORARY TABLE COMMANDS
SELECT CustomerId, COUNT(*) AS NbCmds
FROM your_table
GROUP BY CustomerId;
SELECT CustomerId
FROM COMMANDS
WHERE NbCmds > 1;
Edit : You can also do it with a HAVING clause. Take a look here
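The answers in this thread return customer IDs, while the question asks for the order rows themselves. One way to get both is sketched below with Python's sqlite3 module and the question's sample data. The table name Orders and the two helper functions are hypothetical, and the count is matched exactly (= n), which is one reading of "has made two orders".

```python
import sqlite3

def make_orders():
    """Build an in-memory Orders table holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY,"
        " CustomerId INTEGER, ProductName TEXT)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
        (73, 301, "Sponge"), (74, 508, "Garbage Bag"), (75, 301, "Spoon"),
        (76, 301, "Bacon"), (77, 508, "Dog treats"), (78, 301, "Paper"),
        (79, 905, "Text book"),
    ])
    return conn

def customers_with_order_count(conn, n):
    """Return the IDs of customers that have exactly n orders."""
    rows = conn.execute(
        "SELECT CustomerId FROM Orders GROUP BY CustomerId"
        " HAVING COUNT(*) = ? ORDER BY CustomerId", (n,)
    )
    return [row[0] for row in rows]

def orders_of_customers_with_count(conn, n):
    """Return the full order rows of customers that have exactly n orders."""
    return conn.execute(
        "SELECT OrderId, CustomerId, ProductName FROM Orders"
        " WHERE CustomerId IN (SELECT CustomerId FROM Orders"
        "                      GROUP BY CustomerId HAVING COUNT(*) = ?)"
        " ORDER BY OrderId", (n,)
    ).fetchall()

conn = make_orders()
print(customers_with_order_count(conn, 2))
print(orders_of_customers_with_count(conn, 1))
```

Passing n as a parameter covers both cases in the question: 2 selects customer 508 and 1 selects customer 905. Changing = to >= would give the "at least n" behaviour used in the first answer.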

Select ID given the list of members

I have a table for the link/relationship between two other tables, a table of customers and a table of groups. A group is made up of one or more customers. The link table looks like this:
APP_ID | GROUP_ID | CUSTOMER_ID
1 | 1 | 123
1 | 1 | 124
1 | 1 | 125
1 | 2 | 123
1 | 2 | 125
2 | 3 | 123
3 | 1 | 123
3 | 1 | 124
3 | 1 | 125
I now need, given a list of customer IDs, to be able to get the group ID for that list. A group ID may not be unique: the same group ID will always contain the same list of customer IDs, but the group may exist in more than one app_id.
I'm thinking that
SELECT APP_ID, GROUP_ID, COUNT(CUSTOMER_ID) AS COUNT
FROM GROUP_CUST_REL
WHERE CUSTOMER_ID IN ( <list of ids> )
GROUP BY APP_ID, GROUP_ID
HAVING COUNT(CUSTOMER_ID) = <number of ids in list>
will return me all of the group IDs that contain all of the customer ids in the given list and only those group ids. So for a list of (123,125) only group id 2 would be returned from the above example
I will then have to link with the app table to use its created timestamp to identify the most recent application that the group existed in so that I can then pull the correct/most up to date info from the group table.
Does anyone have any thoughts on whether this is the most efficient way to do this? If there is another quicker/cleaner way I'd appreciate your thoughts.
This smells like a division:
Division sample
Other related stack overflow question
Taking a look at the provided links, you'll see solutions to similar issues from relational algebra's point of view; they don't seem to be quicker, and whether they are cleaner is arguable.
I didn't look at your solution at first, and when I worked this out myself I ended up solving it the same way you did.
Actually, I thought this:
<number of ids in list>
Could be turned into something like this (so that you don't need the extra parameter):
select count(*) from (<list of ids>) as t
But clearly, I was wrong. I'd stay with your current solution if I were you.
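One detail worth checking with the sample data: the query from the question filters rows with WHERE before counting, so a group that contains the listed customers plus others also reaches the required count. The sketch below, using Python's sqlite3 module, runs that query next to a variant that additionally requires the group's total size to equal the list length. The function names are hypothetical, and the variant assumes each (APP_ID, GROUP_ID, CUSTOMER_ID) combination appears at most once.

```python
import sqlite3

def make_link_table():
    """Build an in-memory GROUP_CUST_REL table with the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE GROUP_CUST_REL"
        " (APP_ID INTEGER, GROUP_ID INTEGER, CUSTOMER_ID INTEGER)"
    )
    conn.executemany("INSERT INTO GROUP_CUST_REL VALUES (?, ?, ?)", [
        (1, 1, 123), (1, 1, 124), (1, 1, 125), (1, 2, 123), (1, 2, 125),
        (2, 3, 123), (3, 1, 123), (3, 1, 124), (3, 1, 125),
    ])
    return conn

def groups_containing(conn, customer_ids):
    """The question's query: groups that contain every listed customer."""
    ids = sorted(set(customer_ids))
    marks = ",".join("?" * len(ids))  # one placeholder per id
    sql = f"""
        SELECT APP_ID, GROUP_ID FROM GROUP_CUST_REL
        WHERE CUSTOMER_ID IN ({marks})
        GROUP BY APP_ID, GROUP_ID
        HAVING COUNT(CUSTOMER_ID) = ?
        ORDER BY APP_ID, GROUP_ID"""
    return conn.execute(sql, [*ids, len(ids)]).fetchall()

def groups_matching_exactly(conn, customer_ids):
    """Variant: groups whose members are exactly the listed customers."""
    ids = sorted(set(customer_ids))
    marks = ",".join("?" * len(ids))
    sql = f"""
        SELECT APP_ID, GROUP_ID FROM GROUP_CUST_REL
        GROUP BY APP_ID, GROUP_ID
        HAVING COUNT(*) = ?  -- group has no extra members
           AND SUM(CASE WHEN CUSTOMER_ID IN ({marks}) THEN 1 ELSE 0 END) = ?
        ORDER BY APP_ID, GROUP_ID"""
    return conn.execute(sql, [len(ids), *ids, len(ids)]).fetchall()

conn = make_link_table()
print(groups_containing(conn, [123, 125]))
print(groups_matching_exactly(conn, [123, 125]))
```

For the list (123, 125), the first function also returns group 1 in apps 1 and 3, because those groups include both customers alongside 124; the second returns only group 2. Which behaviour is wanted depends on whether "contains" or "equals" is the intended match.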