SQL Server: Retrieve the duplicate value in a column - sql

How could I filter out the duplicate value in a column with SQL syntax?
Thanks.

A common question, though "filtering out" suggests you want to ignore the duplicated values. If so, this lists only the values that occur exactly once:
SELECT col
FROM table
GROUP BY col
HAVING COUNT(col) = 1
If you just want to filter so you are left with the duplicates:
SELECT col, COUNT(col) AS dup_count
FROM table
GROUP BY col
HAVING (COUNT(col) > 1)
Used www.mximize.com/how-to-find-duplicate-values-in-a-table- as a base in 2009; website content has now gone (2014).
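A minimal sketch of the two HAVING filters above, run through Python's built-in sqlite3 module as a stand-in for SQL Server. The table name t and the sample rows are made up for illustration; GROUP BY and HAVING behave the same way in both engines for this case.

```python
import sqlite3

# In-memory database with a hypothetical one-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("a",), ("b",), ("b",), ("c",), ("c",), ("c",)],
)

# Values that occur exactly once.
unique_once = conn.execute(
    "SELECT col FROM t GROUP BY col HAVING COUNT(col) = 1 ORDER BY col"
).fetchall()

# Values that occur more than once, with how often they occur.
duplicates = conn.execute(
    "SELECT col, COUNT(col) AS dup_count "
    "FROM t GROUP BY col HAVING COUNT(col) > 1 ORDER BY col"
).fetchall()

print(unique_once)  # [('a',)]
print(duplicates)   # [('b', 2), ('c', 3)]
```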

Use the DISTINCT or GROUP BY clause.
Select DISTINCT City from TableName
OR
Select City from TableName GROUP BY City

Depending on what you mean by "filter", you could use either DISTINCT or GROUP BY; both are used to remove or group duplicate entries.
Check the links for more information.
A snippet from the DISTINCT-link above:
SELECT DISTINCT od.productid
FROM [order details] OD;
SELECT od.productid
FROM [order details] OD
GROUP BY od.productid;
Both of these generally result in the same output.
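As a rough check of that claim, here is a sketch using Python's sqlite3 module in place of SQL Server. The table order_details and its rows are invented for the example, and an ORDER BY is added only so the two result lists can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_details (productid INTEGER)")
conn.executemany(
    "INSERT INTO order_details VALUES (?)",
    [(1,), (2,), (2,), (3,), (3,)],
)

# DISTINCT collapses repeated values into one row each.
distinct_rows = conn.execute(
    "SELECT DISTINCT productid FROM order_details ORDER BY productid"
).fetchall()

# GROUP BY without aggregates produces one row per group, i.e. the same set.
grouped_rows = conn.execute(
    "SELECT productid FROM order_details GROUP BY productid ORDER BY productid"
).fetchall()

print(distinct_rows == grouped_rows)  # True
```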

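You asked for a list of the duplicates. Here is a simple way using the EXCEPT operator (SQL 2008 +). EXCEPT is a set difference, so subtract the values that occur exactly once and the duplicated values remain:
select [column] from Table1
except
select [column] from Table1 group by [column] having count(*) = 1;
Alternatively, you could use NOT IN:
select distinct [column] from Table1
where [column] not in
(select [column] from Table1 group by [column] having count(*) = 1);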
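One thing worth checking with EXCEPT is what gets subtracted. EXCEPT is a set difference, so subtracting a table's distinct values from the same table leaves nothing; the right-hand side has to be the values that occur exactly once. The sketch below shows both behaviours with Python's sqlite3 module (a stand-in for SQL Server; the column name col and the rows are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (col TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?)",
    [("a",), ("b",), ("b",), ("c",), ("c",)],
)

# Subtracting the distinct values of the same column removes every value.
minus_distinct = conn.execute(
    "SELECT col FROM Table1 EXCEPT SELECT DISTINCT col FROM Table1"
).fetchall()

# Subtracting only the values that occur once leaves the duplicated ones.
duplicated = conn.execute(
    "SELECT col FROM Table1 "
    "EXCEPT "
    "SELECT col FROM Table1 GROUP BY col HAVING COUNT(*) = 1 "
    "ORDER BY col"
).fetchall()

print(minus_distinct)  # []
print(duplicated)      # [('b',), ('c',)]
```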

DISTINCT would be the keyword to filter duplicates.
Maybe you can explain a little more about what you're trying to achieve?

Related

Get Max from a joined table

I wrote this script in SQL Server, and I want to get the food name with the maximum order count from this joined table. I can get the max value correctly, but when I add FoodName to the SELECT list it gives me an error.
SELECT S.FoodName, MAX(S.OrderCount) FROM
(SELECT FoodName,
SUM(Number) AS OrderCount
FROM tblFactor
INNER JOIN tblDetail
ON tblFactor.Factor_ID = tblDetail.Factor_ID
WHERE FactorDate = '2020-10-30'
GROUP BY FoodName)S
Here is the error message:
Column 'S.FoodName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I also know I can use ORDER BY and TOP to get the food name and max order count, but I want to use the approach in this script. Thank you for your answers.
If I follow you correctly, you can use ORDER BY and TOP (1) directly on the result of the join query:
SELECT TOP (1) f.FoodName, SUM(d.Number) AS OrderCount
FROM tblFactor f
INNER JOIN tblDetail d ON f.Factor_ID = d.Factor_ID
WHERE f.FactorDate = '2020-10-30'
GROUP BY f.FoodName
ORDER BY OrderCount DESC
Notes:
I added table aliases to the query and prefixed each column with the table it (presumably!) comes from; you might need to review that, as I had to make assumptions.
If you want to allow top ties, use TOP (1) WITH TIES instead.
You have an aggregation function, MAX(), in the outer query together with an unaggregated column. Hence, the database expects a GROUP BY.
Instead, use ORDER BY and TOP (1), since SQL Server does not support LIMIT:
SELECT TOP (1) FoodName, SUM(Number) AS OrderCount
FROM tblFactor f INNER JOIN
tblDetail d
ON f.Factor_ID = d.Factor_ID
WHERE FactorDate = '2020-10-30'
GROUP BY FoodName
ORDER BY OrderCount DESC;
Note: In a query that references multiple tables, you should qualify all column references. It is not clear where the columns come from, so I cannot do that for this query.
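If the goal is to keep the aggregated derived table and still pick the row with the maximum, one option is to compare each row's OrderCount with the overall MAX. The sketch below is only an illustration under assumptions: it runs on Python's sqlite3 module rather than SQL Server, the sample rows are invented, and it guesses that FoodName and Number live in tblDetail while FactorDate lives in tblFactor. The CTE syntax used here is also accepted by SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblFactor (Factor_ID INTEGER, FactorDate TEXT);
CREATE TABLE tblDetail (Factor_ID INTEGER, FoodName TEXT, Number INTEGER);
INSERT INTO tblFactor VALUES (1, '2020-10-30'), (2, '2020-10-30'), (3, '2020-10-31');
INSERT INTO tblDetail VALUES
    (1, 'pizza', 2), (2, 'pizza', 3), (1, 'soup', 1), (3, 'soup', 10);
""")

# S holds one row per food with its summed order count for the day;
# the outer query keeps only the row(s) whose count equals the maximum.
query = """
WITH S AS (
    SELECT d.FoodName, SUM(d.Number) AS OrderCount
    FROM tblFactor f
    INNER JOIN tblDetail d ON f.Factor_ID = d.Factor_ID
    WHERE f.FactorDate = '2020-10-30'
    GROUP BY d.FoodName
)
SELECT S.FoodName, S.OrderCount
FROM S
WHERE S.OrderCount = (SELECT MAX(OrderCount) FROM S)
"""
top_foods = conn.execute(query).fetchall()
print(top_foods)  # [('pizza', 5)]
```

Unlike TOP (1), this form returns every food that ties for the maximum.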

What can I use other than Group By?

I have a question that uses this DML statement
SELECT SupplierID, COUNT(*) AS TotalProducts
FROM Products
GROUP BY SupplierID;
I'm trying to get the same results without using "Group By". I can use a table variable or a temp table, with Insert and Update if needed. Also, using While and IF-Else is allowed.
I'm really lost; any help would be awesome. Thanks, SO Community.
This is used in SQL Server. Thanks again.
You can always use SELECT DISTINCT with window functions:
SELECT DISTINCT SupplierID,
COUNT(*) OVER (PARTITION BY SupplierId) AS TotalProducts
FROM Products;
But GROUP BY is the right way to write an aggregation query.
You may also use the following query:
select distinct P.SupplierID,
(select count(*) from Products where SupplierID = P.SupplierID) TotalProducts
from Products P
You will get the same result using the above query, but I don't think avoiding GROUP BY is a good idea!
Using a subquery:
SELECT DISTINCT SupplierID
,(SELECT COUNT(*)
FROM Products P2
WHERE P2.SupplierID = P.SupplierID
) AS TotalProducts
FROM Products P
The DISTINCT is there to remove duplicates: the count executes for every row, so without DISTINCT you would get the same SupplierID and count repeated once per product.
Another way:
select distinct supplierId, p2.ttl
from products p1
cross apply
(
select count(*)
from products p2
where p1.supplierId = p2.supplierId
) p2(ttl);
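To see that a DISTINCT plus correlated subquery really matches the GROUP BY version, here is a small sketch on Python's sqlite3 module (standing in for SQL Server). The Products rows and the ProductID column are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER, SupplierID INTEGER)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3)],
)

# Reference result: the original GROUP BY query.
with_group_by = conn.execute(
    "SELECT SupplierID, COUNT(*) AS TotalProducts "
    "FROM Products GROUP BY SupplierID ORDER BY SupplierID"
).fetchall()

# Same result without GROUP BY: count per row, then collapse repeats.
without_group_by = conn.execute(
    "SELECT DISTINCT P.SupplierID, "
    "       (SELECT COUNT(*) FROM Products P2 "
    "        WHERE P2.SupplierID = P.SupplierID) AS TotalProducts "
    "FROM Products P ORDER BY P.SupplierID"
).fetchall()

print(with_group_by)     # [(1, 3), (2, 1), (3, 2)]
print(without_group_by)  # [(1, 3), (2, 1), (3, 2)]
```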

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code would you mind a small explanation as to why you did it that way (so I can actually learn something.)
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused about how to pull in the EmailAddress column from the Clients table, since in order to join it I have to bring in other tables that aren't being used. I assume there is an easier way to do this using subqueries, as that is what we are working on at the moment.
Think of SQL as working with sets of data as opposed to just tables. Tables are merely a set of data. So when you view data this way you immediately see that the query below returns a set of data consisting of the entirety of another set, being a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of data as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
) AS t
This returns both columns of the inner set. Note that SQL Server requires an alias (here t) on a subquery used in the FROM clause.
So, your inner query, which I won't write for you since you appear to be a student and it wouldn't be right to hand you the entire answer, would consist of a GROUP BY clause and a SUM of the order value field. The key thing to understand is this set thinking: you can wrap the ENTIRE query in parentheses and treat it as a table, the way I have done above. Hopefully this helps.
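The "query as a table" idea can be tried out quickly. This is a minimal sketch using Python's sqlite3 module instead of SQL Server; the third column col3, the alias t, and the sample row are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable1 (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.execute("INSERT INTO MyTable1 VALUES (1, 'x', 'ignored')")

# The inner SELECT defines a two-column set; the outer SELECT * sees
# only those two columns, exactly as if the set were a table named t.
cursor = conn.execute("SELECT * FROM (SELECT col1, col2 FROM MyTable1) AS t")
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['col1', 'col2']
print(rows)          # [(1, 'x')]
```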
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Client Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItem SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by Cl.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) as Totals -- Close subquery (SQL Server requires an alias here)
group by emailaddress -- group for the max()
For the first query you can join the Clients to Shipments (on ClientId).
And Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is useful, especially when you select fields from joined tables that have the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query in a subquery, you can get the Maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
Note that an ORDER BY inside a subquery is meaningless without TOP; SQL Server rejects it unless TOP, OFFSET or FOR XML is also specified.
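The two-level aggregation (SUM per shipment inside, MAX per client outside) can be checked on a tiny data set. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server; the schema follows the column names used in the answer above, while the rows, the integer prices, and the alias q are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clients (ClientId INTEGER, EmailAddress TEXT);
CREATE TABLE Shipments (ShipmentId INTEGER, ClientId INTEGER);
CREATE TABLE ShipItems (ShipmentId INTEGER, ShipItemPrice INTEGER,
                        ShipItemDiscountAmount INTEGER, Quantity INTEGER);
INSERT INTO Clients VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO Shipments VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO ShipItems VALUES
    (1, 10, 0, 2), (1, 5, 1, 1), (2, 100, 10, 1), (3, 7, 0, 3);
""")

# Inner query: one total per (client, shipment).
# Outer query: the largest of those totals per client.
query = """
SELECT q.EmailAddress, MAX(q.OrderTotal) AS LargestOrder
FROM (
    SELECT c.EmailAddress, s.ShipmentId,
           SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) AS OrderTotal
    FROM ShipItems i
    JOIN Shipments s ON s.ShipmentId = i.ShipmentId
    JOIN Clients c ON c.ClientId = s.ClientId
    GROUP BY c.EmailAddress, s.ShipmentId
) AS q
GROUP BY q.EmailAddress
ORDER BY q.EmailAddress
"""
largest_orders = conn.execute(query).fetchall()
print(largest_orders)  # [('a@example.com', 90), ('b@example.com', 21)]
```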

Select the countries with fewest number of tuples

At http://www.dofactory.com/sql/sandbox I'm experimenting with submitting my own SQL queries against their sample database to become better at SQL. What I want to do is to select all countries from Customer that have exactly the fewest number of tuples. Here is my query attempt:
SELECT a.Country
FROM [Customer] a, (SELECT COUNT(*) AS Tot
FROM [Customer]
GROUP BY Country) b
GROUP BY a.Country
HAVING COUNT(*) = MIN(b.Tot)
However, the website returns an empty table instead of the correct result which is (Ireland, Norway, Poland). The correct result is easily realized by grouping the table by country and using COUNT(*), and then looking at the countries that have the smallest COUNT(*) value out of all COUNT(*) values. I would like some advice on how to generate the correct result without any assumptions about the table's data.
I would do this using SELECT TOP 1 WITH TIES:
SELECT TOP 1 WITH TIES c.Country
FROM Customer c
GROUP BY c.Country
ORDER BY COUNT(*) ASC;
Two notes:
When using table aliases, make them abbreviations for the tables. This makes the query much easier to follow.
Never use commas in the FROM clause. Always use proper, explicit JOIN syntax.
Learned something new (WITH TIES) from Gordon Linoff, again...
Here is my solution without it:
Select a.Country from [Customer] a
group by a.Country
having count(*) = (select min(b.Tot) from (SELECT COUNT(*) AS Tot FROM [Customer] GROUP BY Country) b)
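The empty result of the comma join in the question comes from row multiplication: every Customer row is paired with every row of the per-country counts, so each group's COUNT(*) is inflated and never equals the minimum once there is more than one country. The sketch below reproduces both behaviours with Python's sqlite3 module standing in for SQL Server; the Customer rows are invented and do not match the dofactory sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER, Country TEXT)")
countries = ["Ireland", "Norway", "Poland", "France", "France",
             "Germany", "Germany", "Germany"]
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    list(enumerate(countries, start=1)),
)

# The comma join pairs each customer with all 5 per-country counts,
# so a country with n customers has COUNT(*) = 5 * n, never the minimum (1).
cross_join_attempt = conn.execute(
    "SELECT a.Country "
    "FROM Customer a, (SELECT COUNT(*) AS Tot FROM Customer GROUP BY Country) b "
    "GROUP BY a.Country HAVING COUNT(*) = MIN(b.Tot)"
).fetchall()

# Computing the minimum in a scalar subquery avoids the multiplication.
fewest = conn.execute(
    "SELECT Country FROM Customer GROUP BY Country "
    "HAVING COUNT(*) = (SELECT MIN(Tot) FROM "
    "    (SELECT COUNT(*) AS Tot FROM Customer GROUP BY Country) AS b) "
    "ORDER BY Country"
).fetchall()

print(cross_join_attempt)  # []
print(fewest)              # [('Ireland',), ('Norway',), ('Poland',)]
```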
If you are not using SQL Server 2012, then:
declare @Fewer int = 2
;With CTE as
(
select c.*
,ROW_NUMBER() over (partition by countryid order by customerid) rn
from dbo.Customers C
)
select * from CTE
where rn <= @Fewer

Column 'ITEMS_MASTER.QUANTITY' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

Select
ID.ITEM_MODEL,
SUM(ID.AMOUNT) as Total_Amount,
AVG(ID.RATE) as Avg_Rate,
IM.QUANTITY
From
ITEM_DETAILS ID
inner join
ITEMS_MASTER IM on ID.ITEM_MODEL = IM.ITEM_MODEL
where IM.ITEM_MODEL='keyboard'
Group by
ID.ITEM_MODEL
I wrote the query above to extract data from two tables, ITEM_DETAILS and ITEMS_MASTER, but when I run it I get this error:
Column 'ITEMS_MASTER.QUANTITY' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Can anyone suggest the correct way of doing this?
Try this:
SELECT IM.ITEM_MODEL, IM.QUANTITY,
SUM(ID.AMOUNT) as Total_Amount,
AVG(ID.RATE) as Avg_Rate
FROM ITEMS_MASTER IM
INNER JOIN ITEM_DETAILS ID ON ID.ITEM_MODEL = IM.ITEM_MODEL
WHERE IM.ITEM_MODEL = 'keyboard'
GROUP BY IM.ITEM_MODEL, IM.QUANTITY
Generally speaking, when you have a parent table (ITEMS_MASTER) that has multiple rows from a child table (ITEM_DETAILS), you want to GROUP BY the columns in your parent table. This is slightly more logical and on some databases this performs better.
You could try adding IM.QUANTITY to the GROUP BY clause:
Group by
ID.ITEM_MODEL,
IM.QUANTITY
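A small sketch of the fix, with every non-aggregated column listed in GROUP BY. It uses Python's sqlite3 module in place of SQL Server, and the rows are invented. One caveat: SQLite would also accept the original query without complaint, so this only demonstrates the result of the corrected form, not the SQL Server error itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ITEMS_MASTER (ITEM_MODEL TEXT, QUANTITY INTEGER);
CREATE TABLE ITEM_DETAILS (ITEM_MODEL TEXT, AMOUNT INTEGER, RATE INTEGER);
INSERT INTO ITEMS_MASTER VALUES ('keyboard', 10), ('mouse', 4);
INSERT INTO ITEM_DETAILS VALUES
    ('keyboard', 100, 10), ('keyboard', 300, 20), ('mouse', 50, 5);
""")

# QUANTITY is neither aggregated nor constant by syntax, so it joins
# ITEM_MODEL in the GROUP BY list; one master row still yields one group.
query = """
SELECT IM.ITEM_MODEL, IM.QUANTITY,
       SUM(ID.AMOUNT) AS Total_Amount,
       AVG(ID.RATE) AS Avg_Rate
FROM ITEMS_MASTER IM
INNER JOIN ITEM_DETAILS ID ON ID.ITEM_MODEL = IM.ITEM_MODEL
WHERE IM.ITEM_MODEL = 'keyboard'
GROUP BY IM.ITEM_MODEL, IM.QUANTITY
"""
summary = conn.execute(query).fetchall()
print(summary)  # [('keyboard', 10, 400, 15.0)]
```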