Using query result as subquery syntax - sql

I have a table in which I need to identify duplicate entries so that I can delete them. I can find the duplicates using the following query:
select s.*, t.*
from [tableXYZ] s
join (
select [date], [product], count(*) as qty
from [tableXYZ]
group by [date], [product]
having count(*) > 1
) t on s.[date] = t.[date] and s.[product] = t.[product]
ORDER BY s.[date], s.[product], s.[id]
I then need to use the result of this query to show only the rows where [fieldZ] IS NULL.
I've tried the following, but I get the error: The column 'date' was specified multiple times for 'subquery'.
select * from
(
select s.*, t.*
from [tableXYZ] s
join (
select [date], [product], count(*) as qty
from [tableXYZ]
group by [date], [product]
having count(*) > 1
) t on s.[date] = t.[date] and s.[product] = t.[product]
) as subquery
where [fieldZ] is null

The date column appears twice in your subquery because you select both s.* and t.*, which returns s.date and t.date. If you need both columns, alias one of them.
You will run into the same problem with the product column: a derived table cannot return multiple columns with the same name. Select only the columns you need in the subquery instead of selecting all columns. This is good practice in general and solves the issue.
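A minimal sketch of that fix, run through Python's built-in sqlite3 module so it can be tried without a server. SQLite stands in for SQL Server here, and the sample rows and the id column values are assumptions made up for illustration. The inner query lists the s columns explicitly and takes only qty from t, so no column name is repeated.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableXYZ (id INTEGER PRIMARY KEY, date TEXT, product TEXT, fieldZ TEXT);
    -- Hypothetical sample rows: ids 1 and 2 are duplicates on (date, product).
    INSERT INTO tableXYZ (id, date, product, fieldZ) VALUES
        (1, '2020-01-01', 'A', NULL),
        (2, '2020-01-01', 'A', 'x'),
        (3, '2020-01-02', 'B', NULL);
""")

rows = conn.execute("""
    SELECT id, [date], product, qty
    FROM (
        -- Only s columns plus t.qty: each column name appears once.
        SELECT s.id, s.[date], s.product, s.fieldZ, t.qty
        FROM tableXYZ s
        JOIN (
            SELECT [date], product, COUNT(*) AS qty
            FROM tableXYZ
            GROUP BY [date], product
            HAVING COUNT(*) > 1
        ) t ON s.[date] = t.[date] AND s.product = t.product
    ) AS subquery
    WHERE fieldZ IS NULL
    ORDER BY id
""").fetchall()

print(rows)
```

Only the duplicate row whose fieldZ is NULL comes back; the non-duplicate row 3 is filtered out by the join.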

Related

Second minimum value for every customer

I am using MySQL database. So, there are two columns I am working on, CustomerId, and OrderDate. I want to find a second-order date (2nd minimum order date) for each customer.
If you are using MySQL 8+, then ROW_NUMBER can be used here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY OrderDate) rn
FROM yourTable
)
SELECT CustomerId, OrderDate
FROM cte
WHERE rn = 2;
I would recommend DENSE_RANK, as it returns the second distinct order date even when a customer has duplicate OrderDate values:
SELECT * FROM
(SELECT t.*, DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY OrderDate) dr
FROM yourTable t
) t where dr = 2;
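To see where the two ranking functions differ, here is a small sketch using Python's sqlite3 module in place of MySQL 8 (an assumption; it needs an SQLite build with window functions, 3.25 or later). The sample rows and the helper name second_date are hypothetical. Customer 1 has the same first date twice, which is the case where ROW_NUMBER and DENSE_RANK disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (CustomerId INTEGER, OrderDate TEXT);
    -- Hypothetical data: customer 1 ordered twice on the same first date.
    INSERT INTO yourTable VALUES
        (1, '2021-01-01'), (1, '2021-01-01'), (1, '2021-02-01'),
        (2, '2021-03-01'), (2, '2021-04-01');
""")

def second_date(func):
    """Return (CustomerId, OrderDate) pairs ranked 2 by the given window function."""
    sql = f"""
        SELECT DISTINCT CustomerId, OrderDate
        FROM (
            SELECT t.*, {func}() OVER (PARTITION BY CustomerId ORDER BY OrderDate) AS rnk
            FROM yourTable t
        )
        WHERE rnk = 2
        ORDER BY CustomerId
    """
    return conn.execute(sql).fetchall()

with_row_number = second_date("ROW_NUMBER")
with_dense_rank = second_date("DENSE_RANK")

print(with_row_number)
print(with_dense_rank)
```

For customer 1, ROW_NUMBER picks the repeated first date again, while DENSE_RANK moves on to the next distinct date. Which one is right depends on what "second" is supposed to mean.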
You can use a correlated subquery as follows if your MySQL version does not support window functions. It keeps the rows for which exactly one distinct earlier date exists for the same customer:
SELECT T.*
FROM YOURTABLE T
WHERE 1 = (
SELECT COUNT(DISTINCT TT.ORDER_DATE)
FROM YOURTABLE TT
WHERE TT.CUSTOMER_ID = T.CUSTOMER_ID
AND TT.ORDER_DATE < T.ORDER_DATE
)
I would use a subquery like this:
select o.*
from orders o
where o.order_date = (select o2.order_date
from orders o2
where o2.customer_id = o.customer_id
order by o2.order_date
limit 1 offset 1
);
The subquery is a correlated subquery that returns the second date. If you want the second date with other columns, it can be moved to the select.
With an index on (customer_id, order_date), this is likely to be the fastest solution.
This assumes that there is one row per date (or that, if there are multiple rows, "second" can be the earliest date). If you want the second distinct date, use select distinct in the subquery; however, select distinct and group by would incur additional overhead.
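The difference between the plain and the select distinct form of that subquery can be sketched as follows. This uses Python's sqlite3 module rather than MySQL (an assumption; both accept LIMIT 1 OFFSET 1 in a scalar subquery), and the id column, the index name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    -- The index suggested above; the name is made up.
    CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
    -- Hypothetical data: customer 1 repeats its first date, customer 2 has one order.
    INSERT INTO orders (id, customer_id, order_date) VALUES
        (1, 1, '2021-01-01'), (2, 1, '2021-01-01'), (3, 1, '2021-02-01'),
        (4, 2, '2021-03-01'),
        (5, 3, '2021-05-01'), (6, 3, '2021-06-01');
""")

def second_order_ids(distinct):
    """Ids of the rows that fall on each customer's second (distinct) date."""
    keyword = "DISTINCT" if distinct else ""
    sql = f"""
        SELECT o.id
        FROM orders o
        WHERE o.order_date = (SELECT {keyword} o2.order_date
                              FROM orders o2
                              WHERE o2.customer_id = o.customer_id
                              ORDER BY o2.order_date
                              LIMIT 1 OFFSET 1)
        ORDER BY o.id
    """
    return [row[0] for row in conn.execute(sql)]

plain_ids = second_order_ids(distinct=False)
distinct_ids = second_order_ids(distinct=True)

print(plain_ids)
print(distinct_ids)
```

Customer 2 has no second order, so the subquery yields NULL and the customer drops out of both results.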

SQL show orders where 2 values are distinct and match the first

I'm looking for a way to let me select all orders that have multiple distinct names within the same order-number, it looks like this:
order - name
111-Paul
112-Paula
113-John
113-John
113-Jessica
114-Eric
114-Eric
114-John
115-Zack
115-Zack
115-Zack
etc.
so that I would get all the orders that have 2 or more distinct names in them:
113-John
113-Jessica
114-Eric
114-John
with which I could do further queries but I'm stuck. Can anyone give me some hints on how to tackle this problem please? I've tried it with count(*) which looked like this:
select order, name, count(name) from dbo.orders
group by order, name
having count(name) > 1
which gave me all the orders which had more than 1 name in it but I don't know how to let it only show orders with the distinct names.
Here's one approach using exists:
select distinct [order], name
from orders o
where exists (
select 1
from orders o2
where o.[order] = o2.[order] and o.name != o2.name)
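I would use window functions for this. The ranking has to be partitioned by order alone: dense_rank() gives every distinct name within an order its own number, so any order that reaches 2 has at least two distinct names.
For example:
select distinct [order]
from
(select
[order],
dense_rank() over(partition by [order] order by name asc) as rn
from orders
) as t1
where rn > 1
A windowed count partitioned by both columns tells you how often each name repeats within an order:
count(*) over(partition by [order], name) as cnt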
Here's a straightforward implementation in SQL Server:
select distinct *
from table1
where [order] in (
select [order]
from (select distinct * from table1) iq
group by [order]
having count(*) > 1)
It's essentially breaking down the problem into:
Finding the orders that have more than one distinct value.
Finding the pairs of distinct order - name that belong to the list previously calculated.
When you use HAVING COUNT(name) > 1 with GROUP BY order, name, it counts all of the rows in each group, including duplicate rows (rows 113-John and 113-John are 2 rows for order 113). I would query distinct rows from your table first, and then group that result by order alone so the count is the number of distinct names:
SELECT [order] FROM (
SELECT DISTINCT [order], [name] FROM dbo.orders
) A
GROUP BY [order]
HAVING COUNT([name]) > 1
As a note, if a [name] is null, then it will not be counted with COUNT(name). If you want nulls to be counted, use COUNT(*) instead.
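The NULL behaviour is easy to check with a tiny table. The sketch below uses Python's sqlite3 module instead of SQL Server (an assumption; the counting rule is standard SQL), with made-up rows in which order 113 has one NULL name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders ([order] INTEGER, name TEXT);
    -- Hypothetical rows: order 113 has one real name and one NULL.
    INSERT INTO orders VALUES (113, 'John'), (113, NULL), (114, 'Eric');
""")

# COUNT(name) skips NULLs, COUNT(*) counts every row in the group.
counts = conn.execute("""
    SELECT [order], COUNT(name) AS named_rows, COUNT(*) AS all_rows
    FROM orders
    GROUP BY [order]
    ORDER BY [order]
""").fetchall()

print(counts)
```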
You can use count(distinct name) to get the number of unique names for each order:
select [order], count(distinct name)
from orders
group by [order]
To just get the order for those you can use having:
select [order]
from orders
group by [order]
having count(distinct name) > 1
To get the details for those orders you can put that in the where clause to just return the rows with order in that list:
select *
from orders
where [order] in (
select [order]
from orders
group by [order]
having count(distinct name) > 1
)
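As a quick check of the count(distinct name) approach against the sample data from the question, here is a sketch using Python's sqlite3 module in place of SQL Server (an assumption). A select distinct is added on the outer query so the repeated 113-John and 114-Eric rows collapse to one line each, matching the output the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders ([order] INTEGER, name TEXT)")

# Sample rows copied from the question.
sample = [
    (111, "Paul"), (112, "Paula"),
    (113, "John"), (113, "John"), (113, "Jessica"),
    (114, "Eric"), (114, "Eric"), (114, "John"),
    (115, "Zack"), (115, "Zack"), (115, "Zack"),
]
conn.executemany("INSERT INTO orders VALUES (?, ?)", sample)

pairs = conn.execute("""
    SELECT DISTINCT [order], name
    FROM orders
    WHERE [order] IN (
        SELECT [order]
        FROM orders
        GROUP BY [order]
        HAVING COUNT(DISTINCT name) > 1
    )
    ORDER BY [order], name
""").fetchall()

print(pairs)
```

Order 115 has three rows but only one distinct name, so it is excluded.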
I would use RANK (or DENSE_RANK) for this as shown below.
SELECT [Order]
FROM (SELECT
[Order],
RANK() OVER(PARTITION BY [Order] ORDER BY Name) AS NameRank
FROM [StackOverflow].[dbo].[OrderAndName]) ranked
WHERE ranked.NameRank > 1
GROUP BY [Order]
The sub-query ranks (gives a seeding) to the names in an order according to their value. Names with the same value would have the same rank i.e. when an order has one name multiple times (like 115) the rank of all names would be 1.
The partition is important here as otherwise you would get the rank for all names for all orders which wouldn't give you the result you'd like.
It is then just a case of pulling out the orders that have a RANK greater than 1 and grouping (could use distinct if that's a preference).
You can then join to this table to get the orders and names as follows:
SELECT oan.[Order], [Name]
FROM [StackOverflow].[dbo].[OrderAndName] oan
INNER JOIN (SELECT [Order]
FROM (SELECT [Order],
RANK() OVER(PARTITION BY [Order] ORDER BY Name) AS NameRank
FROM [StackOverflow].[dbo].[OrderAndName]) ranked
WHERE ranked.NameRank > 1
GROUP BY [Order]) twoOrMore ON oan.[Order] = twoOrMore.[Order]

How to get the records from inner query results with the MAX value

The results are below. I need to get the records (seller and purchaser) with the max count- grouped by purchaser (marked with yellow)
You can use window functions:
with q as (
<your query here>
)
select q.*
from (select q.*,
row_number() over (order by seller desc) as seqnum_s,
row_number() over (order by purchaser desc) as seqnum_p
from q
) q
where seqnum_s = 1 or seqnum_p = 1;
Try this:
SELECT COUNT,seller,purchaser FROM YourTable ORDER BY seller,purchaser DESC
SELECT T2.MaxCount,T2.purchaser,T1.Seller FROM <Yourtable> T1
Inner JOIN
(
Select Max(Count) as MaxCount, purchaser
FROM <Yourtable>
GROUP BY Purchaser
)T2
On T2.Purchaser=T1.Purchaser AND T2.MaxCount=T1.Count
First you select the Seller from <Yourtable>, which gives you a list of all 5 sellers. Then you write another query that selects only the Purchaser and the Max(Count), grouped by Purchaser, which gives you the two yellow-marked lines. Join the two queries on Purchaser and on Max(Count) = Count, and add the columns from the joined query to your first query.
I can't think of a faster way, but this performs well even with rather large queries. You can then order the fields as needed.
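The join against a grouped MAX can be sketched as below. The original result set is not shown, so everything here is an assumption: the table name sales_counts, the column name cnt and the five sample rows are made up, and Python's sqlite3 module stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the inner query's result.
    CREATE TABLE sales_counts (seller TEXT, purchaser TEXT, cnt INTEGER);
    INSERT INTO sales_counts VALUES
        ('S1', 'P1', 5), ('S2', 'P1', 3),
        ('S3', 'P2', 7), ('S4', 'P2', 2), ('S5', 'P2', 1);
""")

# Join each row to the maximum count of its purchaser and keep the matches.
top_rows = conn.execute("""
    SELECT t2.purchaser, t1.seller, t2.max_cnt
    FROM sales_counts t1
    JOIN (
        SELECT purchaser, MAX(cnt) AS max_cnt
        FROM sales_counts
        GROUP BY purchaser
    ) t2 ON t2.purchaser = t1.purchaser AND t2.max_cnt = t1.cnt
    ORDER BY t2.purchaser
""").fetchall()

print(top_rows)
```

If two sellers tie for a purchaser's maximum, this form returns both rows.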

Trying to find duplicate values in TWO rows and TWO columns - SQL Server

Using SQL Server, I'm not a DBA but I can write some general SQL. Been pulling my hair out for about an hour now. Searching I've found several solutions but they all fail due to how GROUP BY works.
I have a table with two columns that I'm trying to check for duplicates:
userid
orderdate
I'm looking for rows that have BOTH userid and orderdate as duplicates. I want to display these rows.
If I use group by, I can't pull any other data, such as the order ID, because it's not in the group by clause.
You could use the grouped query in a subquery:
SELECT *
FROM mytable a
WHERE EXISTS (SELECT userid, orderdate
FROM mytable b
WHERE a.userid = b.userid AND a.orderdate = b.orderdate
GROUP BY userid, orderdate
HAVING COUNT(*) > 1)
You can also use a windowed function:
; With CTE as
(Select *
, count(*) over (partition by UserID, OrderDate) as DupRows
from MyTable)
Select *
from CTE
where DupRows > 1
order by UserID, OrderDate
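Both the EXISTS form and the windowed count keep the full rows, including the order ID. The sketch below checks that on a small made-up table, using Python's sqlite3 module instead of SQL Server (an assumption; it needs SQLite 3.25 or later for window functions). The orderid column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (orderid INTEGER PRIMARY KEY, userid INTEGER, orderdate TEXT);
    -- Hypothetical rows: orders 1 and 2 share both userid and orderdate.
    INSERT INTO mytable VALUES
        (1, 10, '2020-01-01'),
        (2, 10, '2020-01-01'),
        (3, 10, '2020-01-02'),
        (4, 20, '2020-01-01');
""")

# Windowed count: every row carries the size of its (userid, orderdate) group.
window_ids = [row[0] for row in conn.execute("""
    WITH cte AS (
        SELECT *, COUNT(*) OVER (PARTITION BY userid, orderdate) AS DupRows
        FROM mytable
    )
    SELECT orderid FROM cte WHERE DupRows > 1 ORDER BY orderid
""")]

# Correlated EXISTS over the grouped query.
exists_ids = [row[0] for row in conn.execute("""
    SELECT a.orderid
    FROM mytable a
    WHERE EXISTS (SELECT 1
                  FROM mytable b
                  WHERE a.userid = b.userid AND a.orderdate = b.orderdate
                  GROUP BY b.userid, b.orderdate
                  HAVING COUNT(*) > 1)
    ORDER BY a.orderid
""")]

print(window_ids, exists_ids)
```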
You can get the duplicates by using GROUP BY and HAVING, like so:
SELECT
userid,orderdate, COUNT(*)
FROM
yourTable
GROUP BY
userid,orderdate
HAVING
COUNT(*) > 1
EDIT:
SELECT * FROM yourTable
WHERE CONCAT(userid,orderdate) IN
(
SELECT
CONCAT(userid,orderdate)
FROM
yourTable
GROUP BY
userid,orderdate
HAVING
COUNT(*) > 1
)
SELECT *
FROM myTable
WHERE CAST(userid as Varchar) + '/' + CONVERT(varchar(10),orderdate,103) In
(
SELECT
CAST(userid as Varchar) + '/' + CONVERT(varchar(10),orderdate,103)
FROM myTable
GROUP BY userid , orderdate
HAVING COUNT(*) > 1
);

Is there SQL Query using SQL Server 2016 to display only record with no duplicates?

I have
How can I query so that the results only display records that have no duplicates, without hard-coding? See results
If you want empcodes with no duplicates, then one simple way uses aggregation:
select empcode, min(leavecode) as leavecode
from t
group by empcode
having count(*) = 1;
This works, because if there is only one row for an empcode, then min(leavecode) is the leavecode.
An alternative method uses window functions:
select empcode, leavecode
from (select t.*, count(*) over (partition by empcode) as cnt
from t
) t
where cnt = 1;
Or, if leavecodes are unique when there are duplicates, perhaps the most efficient way:
select t.*
from t
where not exists (select 1
from t t2
where t2.empcode = t.empcode and t2.leavecode <> t.leavecode
);
You just select those records that have the empcode in a table of empcodes that only have one occurrence.
SELECT
empcode,
leavecode
FROM mytable
WHERE empcode in (
SELECT empcode FROM mytable GROUP BY empcode HAVING count(1)=1
)
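The original data is not shown, so the following sketch uses made-up rows to compare the aggregation form with the not exists form, run through Python's sqlite3 module instead of SQL Server 2016 (an assumption). Employee E3 has two rows with the same leavecode, which is exactly the case where the not exists variant stops being equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (empcode TEXT, leavecode TEXT);
    -- Hypothetical rows: E1 has two different codes, E3 repeats one code.
    INSERT INTO t VALUES
        ('E1', 'VL'), ('E1', 'SL'),
        ('E2', 'VL'),
        ('E3', 'SL'), ('E3', 'SL');
""")

# Aggregation: keep empcodes that occur exactly once.
by_count = conn.execute("""
    SELECT empcode, MIN(leavecode) AS leavecode
    FROM t
    GROUP BY empcode
    HAVING COUNT(*) = 1
""").fetchall()

# NOT EXISTS: keep rows with no other row carrying a different leavecode.
by_not_exists = conn.execute("""
    SELECT t.empcode, t.leavecode
    FROM t
    WHERE NOT EXISTS (SELECT 1
                      FROM t t2
                      WHERE t2.empcode = t.empcode AND t2.leavecode <> t.leavecode)
    ORDER BY t.empcode
""").fetchall()

print(by_count)
print(by_not_exists)
```

The not exists query still returns both E3 rows, so it is only safe when duplicates always differ in leavecode.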