Finding all duplicate rows with the given conditions

I have a table called Fruit with two columns (id, cost). I want to select id and cost and find all duplicate rows where the cost is the same but the id is different. How can I write this query?
I wrote this query:
SELECT id,cost
From Fruit a
WHERE (SELECT COUNT(*)
FROM Fruit b
WHERE a.cost = b.cost
) > 1
This works, but it also returns rows where the cost is the same and the id is the same as well. I only want results where the cost is the same but the id is different.

This is what you need:
SELECT DISTINCT F1.*
FROM Fruit F1
INNER JOIN Fruit F2 ON F1.id <> F2.id AND F1.cost = F2.cost
If you want repeated id-cost pairs to be listed too, just remove the DISTINCT.
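The self-join above can be tried out with a minimal sketch like the following. It assumes Python's built-in sqlite3 module as a stand-in database and borrows the sample rows from the ab table in a later answer on this page; neither is part of the original answer.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fruit (id INTEGER, cost INTEGER)")
conn.executemany(
    "INSERT INTO Fruit (id, cost) VALUES (?, ?)",
    [(1, 5), (2, 5), (3, 15), (3, 15), (4, 24), (5, 68), (6, 13), (7, 3)],
)

# Self-join: keep a row only if another row has the same cost and a different id.
duplicate_rows = conn.execute(
    """
    SELECT DISTINCT F1.id, F1.cost
    FROM Fruit F1
    INNER JOIN Fruit F2 ON F1.id <> F2.id AND F1.cost = F2.cost
    ORDER BY F1.id
    """
).fetchall()

print(duplicate_rows)
```

The two (3, 15) rows share both id and cost, so they are not reported; only ids 1 and 2 (cost 5) qualify.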

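You can add a simple condition requiring that the ids are not equal. Because the subquery then counts only the other rows with a different id, compare the count with 0 rather than 1.
SELECT id,cost From Fruit a WHERE (SELECT COUNT(*) FROM Fruit b WHERE a.cost = b.cost and a.id <> b.id ) > 0
Here <> is the not-equal operator.
I hope this helps :)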

You could select all the costs that occur with more than one id using a group by and having count(distinct id) > 1,
and then get all the rows that match:
select a.id, a.cost
from Fruit a
where cost in ( select b.cost
from Fruit b
group by b.cost
having count(distinct b.id) > 1
)
To avoid duplicated results you can add distinct:
select distinct a.id, a.cost
from Fruit a
where cost in ( select b.cost
from Fruit b
group by b.cost
having count(distinct b.id) > 1
)

This works. I ran it in SQL Server because Oracle is broken on SQL Fiddle, but it should work on either system.
MS SQL Server 2014 Schema Setup:
CREATE TABLE ab
([id] int, [cost] int)
;
INSERT INTO ab
([id], [cost])
VALUES
(1, 5),
(2, 5),
(3, 15),
(3, 15),
(4, 24),
(5, 68),
(6, 13),
(7, 3)
;
Query 1:
with a1 as (
SELECT id
,cost
,rank () over (partition by cost order by id) dup
From ab
)
select * from a1 where dup > 1
Results:
| id | cost | dup |
|----|------|-----|
| 2 | 5 | 2 |
And then to return all values where there was a duplicate cost:
with a1 as (
SELECT id
,cost
,rank () over (partition by cost order by id) dup
From ab
)
,a2 as ( select * from a1 where dup > 1)
select * from ab
join a2 on ab.cost = a2.cost

Related

ratio from different condition in Group By clause SQL

I have a table t with three columns: a, b, c. For every category in c, I want to calculate the number of distinct a where b = 1 divided by the number of distinct a where b = 2. Some pseudocode (MySQL):
select count(distinct a) where b = 1 / count(distinct a) where b = 2
from t
group by c
But this won't work in SQL, because a where condition cannot be applied separately to each aggregate within group by c.

You don't mention which database you are using, so I'll assume it implements FULL OUTER JOIN.
Also, you don't say what to do when a division by zero could happen. Anyway, this query gets you the separate counts, so you can compute the division as needed (here -1 marks a missing or zero denominator):
select
coalesce(x.c, y.c) as c,
coalesce(x.e, 0) as b1,
coalesce(y.f, 0) as b2,
case when y.f is null or y.f = 0 then -1 else x.e / y.f end as ratio
from (
select c, count(distinct a) as e
from t
where b = 1
group by c
) x
full join (
select c, count(distinct a) as f
from t
where b = 2
group by c
) y on x.c = y.c

You can do this in SQL Server, PostgreSQL, MySQL:
create table test (a int, b int, c varchar(10));
insert into test values
(1, 1, 'food'), (2, 1, 'food'), (3, 1, 'food'),
(1, 2, 'food'), (2, 2, 'food'),
(1, 1, 'drinks'), (2, 1, 'drinks'), (2, 1, 'drinks'),
(1, 2, 'drinks')
;
select cat.c, cast(sum(b1_count) as decimal)/sum(b2_count), sum(b1_count), sum(b2_count) from
(select distinct c from test) as cat
left join
(select c, count(distinct a) b1_count from test where b = 1 group by c) b1 on cat.c = b1.c
left join
(select c, count(distinct a) b2_count from test where b = 2 group by c) b2
on cat.c = b2.c
group by cat.c;
Result
c | (No column name) | (No column name) | (No column name)
:----- | ---------------: | ---------------: | ---------------:
drinks | 2.00000000000 | 2 | 1
food | 1.50000000000 | 3 | 2
Examples:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=2003e0baa46bfbb197152b829ea57d2d
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=2003e0baa46bfbb197152b829ea57d2d
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=2003e0baa46bfbb197152b829ea57d2d

You can use conditional aggregation:
select count(distinct case when b = 1 then a end) / count(distinct case when b = 2 then a end)
from t
group by c;
You don't mention your database, but some do integer division, which can result in unexpected truncation. You might want * 1.0 / instead of / to get non-integer division.
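As a rough check of the conditional aggregation approach, the sketch below runs it against the sample rows from the previous answer. The use of Python's sqlite3 module, the AS ratio alias, and the NULLIF guard against a zero denominator are assumptions added for illustration, not part of the original query.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
conn.executemany(
    "INSERT INTO t (a, b, c) VALUES (?, ?, ?)",
    [
        (1, 1, "food"), (2, 1, "food"), (3, 1, "food"),
        (1, 2, "food"), (2, 2, "food"),
        (1, 1, "drinks"), (2, 1, "drinks"), (2, 1, "drinks"),
        (1, 2, "drinks"),
    ],
)

# CASE returns NULL for non-matching rows, and COUNT(DISTINCT ...) ignores NULLs.
# Multiplying by 1.0 forces non-integer division; NULLIF turns a zero
# denominator into NULL instead of raising or truncating.
ratios = conn.execute(
    """
    SELECT c,
           COUNT(DISTINCT CASE WHEN b = 1 THEN a END) * 1.0
           / NULLIF(COUNT(DISTINCT CASE WHEN b = 2 THEN a END), 0) AS ratio
    FROM t
    GROUP BY c
    ORDER BY c
    """
).fetchall()

print(ratios)
```

The ratios agree with the result table shown in the previous answer (2 for drinks, 1.5 for food).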

Perform ranking depend on category

I have a table that looks like this:
RowNum category Rank4A Rank4B
-------------------------------------------
1 A
2 A
3 B
5 A
6 B
9 B
My requirement: based on the RowNum order, create two new ranking columns that depend on category. Rank4A works like DENSE_RANK() over the rows with category = A, but a row with category B takes the most recent rank assigned to a category A row, ordered by RowNum. Rank4B follows similar logic, but ordered by RowNum in DESC order. So the result would look like this (W means I don't care about the value of that cell):
RowNum category Rank4A Rank4B
-------------------------------------------
1 A 1 W
2 A 2 W
3 B 2 3
5 A 3 2
6 B W 2
9 B W 1
One additional requirement is that CROSS APPLY and CURSOR are not allowed because the dataset is large. Any neat solutions?
Edit: Also no CTE (due to MAX 32767 limit)

You can use the following query:
SELECT RowNum, category,
SUM(CASE
WHEN category = 'A' THEN 1
ELSE 0
END) OVER (ORDER BY RowNum) AS Rank4A,
SUM(CASE
WHEN category = 'B' THEN 1
ELSE 0
END) OVER (ORDER BY RowNum DESC) AS Rank4B
FROM mytable
ORDER BY RowNum
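The running-sum idea in this query can be checked with a small sketch. It assumes Python's sqlite3 module with a SQLite build that supports window functions (3.25 or later) and uses the six sample rows from the question; the table name mytable is taken from the answer.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window function support) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (RowNum INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO mytable (RowNum, category) VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (5, "A"), (6, "B"), (9, "B")],
)

# Rank4A is a running count of 'A' rows in ascending RowNum order;
# Rank4B is a running count of 'B' rows in descending RowNum order.
ranked = conn.execute(
    """
    SELECT RowNum, category,
           SUM(CASE WHEN category = 'A' THEN 1 ELSE 0 END)
               OVER (ORDER BY RowNum) AS Rank4A,
           SUM(CASE WHEN category = 'B' THEN 1 ELSE 0 END)
               OVER (ORDER BY RowNum DESC) AS Rank4B
    FROM mytable
    ORDER BY RowNum
    """
).fetchall()

for row in ranked:
    print(row)
```

The cells marked W in the question simply carry the running count along, which is why no join or CTE is needed.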

Giorgos Betsos' answer is better, please read it first.
Try this out. I believe each CTE is clear enough to show the steps.
IF OBJECT_ID('tempdb..#Data') IS NOT NULL
DROP TABLE #Data
CREATE TABLE #Data (
RowNum INT,
Category CHAR(1))
INSERT INTO #Data (
RowNum,
Category)
VALUES
(1, 'A'),
(2, 'A'),
(3, 'B'),
(5, 'A'),
(6, 'B'),
(9, 'B')
;WITH AscendentDenseRanking AS
(
SELECT
D.RowNum,
D.Category,
AscendentDenseRanking = DENSE_RANK() OVER (ORDER BY D.Rownum ASC)
FROM
#Data AS D
WHERE
D.Category = 'A'
),
LaggedRankingA AS
(
SELECT
D.RowNum,
AscendentDenseRankingA = MAX(A.AscendentDenseRanking)
FROM
#Data AS D
INNER JOIN AscendentDenseRanking AS A ON D.RowNum > A.RowNum
WHERE
D.Category = 'B'
GROUP BY
D.RowNum
),
DescendantDenseRanking AS
(
SELECT
D.RowNum,
D.Category,
DescendantDenseRanking = DENSE_RANK() OVER (ORDER BY D.Rownum DESC)
FROM
#Data AS D
WHERE
D.Category = 'B'
),
LaggedRankingB AS
(
SELECT
D.RowNum,
AscendentDenseRankingB = MAX(A.DescendantDenseRanking)
FROM
#Data AS D
INNER JOIN DescendantDenseRanking AS A ON D.RowNum < A.RowNum
WHERE
D.Category = 'A'
GROUP BY
D.RowNum
)
SELECT
D.RowNum,
D.Category,
Rank4A = ISNULL(RA.AscendentDenseRanking, LA.AscendentDenseRankingA),
Rank4B = ISNULL(RB.DescendantDenseRanking, LB.AscendentDenseRankingB)
FROM
#Data AS D
LEFT JOIN AscendentDenseRanking AS RA ON D.RowNum = RA.RowNum
LEFT JOIN LaggedRankingA AS LA ON D.RowNum = LA.RowNum
LEFT JOIN DescendantDenseRanking AS RB ON D.RowNum = RB.RowNum
LEFT JOIN LaggedRankingB AS LB ON D.RowNum = LB.RowNum
/*
Results:
RowNum Category Rank4A Rank4B
----------- -------- -------------------- --------------------
1 A 1 3
2 A 2 3
3 B 2 3
5 A 3 2
6 B 3 2
9 B 3 1
*/
This isn't a recursive CTE, so the 32k limit doesn't apply.

To Add multiple values from different columns from two tables

Table A1              Table B1
A -- 100   id = 1     A -- 100   id = 1
B -- 100   id = 2     A -- 100   id = 1
C -- 200   id = 3     A -- 100   id = 1
I need to sum all values from the two tables where id = 1.
select (SUM(A1.A) + SUM(nvl(B1.A,0))) SUM from A1 a, B1 b where a.id='1' AND b.id='1';
I am getting a sum of 600, but it should be 400.

Something like this should work.
SELECT id, SUM(A) as Totals FROM
(
SELECT id, A FROM A1
UNION ALL
SELECT id, A FROM B1
) AS tblData
GROUP BY id

You can use:
SELECT COALESCE((SELECT SUM(A) FROM A1 WHERE ID = 1), 0) +
COALESCE((SELECT SUM(A) FROM B1 WHERE ID = 1), 0)

You can do it this way:
select sum(A) as total
from
(
select A
from A1 where id = 1
union all
select A
from B1 where id = 1
) t

Here is a method that doesn't use subqueries (although I am not recommending it):
select coalesce(sum(a.A), 0) + coalesce(sum(b.A), 0)
from A1 a full outer join
B1 b
on 1 = 0
where a.id = 1 or b.id = 1;

You used a CROSS JOIN (Cartesian product), so you get 600.
You can use UNION ALL to combine the two tables instead:
SELECT SUM(T.Total)
FROM
(
select SUM(a.A) Total from A1 a where a.id='1'
UNION ALL
select SUM(b.A) Total from B1 b where b.id='1'
) T
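To see where 600 comes from, here is a minimal sketch comparing the cross join with the UNION ALL approach. It assumes Python's sqlite3 module and reads the question's data as a numeric column A plus an id column in each table; that reading of the sample data is an assumption.

```python
import sqlite3

# Assumption: each table has columns (id, A), with the values from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A1 (id INTEGER, A INTEGER)")
conn.execute("CREATE TABLE B1 (id INTEGER, A INTEGER)")
conn.executemany("INSERT INTO A1 (id, A) VALUES (?, ?)", [(1, 100), (2, 100), (3, 200)])
conn.executemany("INSERT INTO B1 (id, A) VALUES (?, ?)", [(1, 100), (1, 100), (1, 100)])

# Cross join: the single A1 row is repeated once per matching B1 row (3 times),
# so its value is summed three times.
cross_join_sum = conn.execute(
    "SELECT SUM(a.A) + SUM(b.A) FROM A1 a, B1 b WHERE a.id = 1 AND b.id = 1"
).fetchone()[0]

# UNION ALL: stack the matching rows from both tables, then sum once.
union_sum = conn.execute(
    """
    SELECT SUM(A) FROM (
        SELECT A FROM A1 WHERE id = 1
        UNION ALL
        SELECT A FROM B1 WHERE id = 1
    ) AS stacked
    """
).fetchone()[0]

print(cross_join_sum, union_sum)
```

The cross join yields three combined rows, so A1's 100 is counted three times (300 + 300), while stacking the rows counts each value exactly once (100 + 300).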

How to find max value from each group and display their information when using "group by"

For example, I created a table of people who contributed to 2 campaigns:
+-------------------------------------+
| ID Name Campaign Amount (USD) |
+-------------------------------------+
| 1 A 1 10 |
| 2 B 1 5 |
| 3 C 2 7 |
| 4 D 2 9 |
+-------------------------------------+
Task: For each campaign, find the person (Name, ID) who contributed the most to it.
The expected result is:
+-----------------------------------------+
| Campaign Name ID |
+-----------------------------------------+
| 1 A 1 |
| 2 D 4 |
+-----------------------------------------+
I used "group by Campaign", but the result has only 2 columns, "Campaign" and "max value", when I need "Name" and "ID".
Thanks for your help.
Edited: I fixed some values, really sorry.

You can use analytic functions for this:
select name, id, amount
from (select t.*, max(amount) over (partition by campaign) as max_amount
from t
) t
where amount = max_amount;
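A small sketch of this analytic-function approach, run on the question's four rows, is shown below. It assumes Python's sqlite3 module with window function support (SQLite 3.25 or later); the table name t follows the answer, and the campaign column is added to the output to match the expected result.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, campaign INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO t (id, name, campaign, amount) VALUES (?, ?, ?, ?)",
    [(1, "A", 1, 10), (2, "B", 1, 5), (3, "C", 2, 7), (4, "D", 2, 9)],
)

# Attach each campaign's maximum amount to every row, then keep the rows
# whose own amount equals that maximum.
top_contributors = conn.execute(
    """
    SELECT campaign, name, id
    FROM (
        SELECT t.*, MAX(amount) OVER (PARTITION BY campaign) AS max_amount
        FROM t
    ) AS with_max
    WHERE amount = max_amount
    ORDER BY campaign
    """
).fetchall()

print(top_contributors)
```

If two people tie for the maximum in a campaign, both rows are returned.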

You can also do it by assigning a rank/row_number partitioned by campaign and ordered by amount descending.
Query
;with cte as(
select [num] = dense_rank() over(
partition by [Campaign]
order by [Amount] desc
), *
from [your_table_name]
)
select [Campaign], [Name], [ID]
from cte
where [num] = 1;

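Try the following query, which joins each row to the maximum amount of its campaign:-
SELECT t.Campaign , t.Name , t.ID
FROM MyTable t
INNER JOIN (
SELECT Campaign , MAX (Amount) AS MaxAmount
FROM MyTable
GROUP BY Campaign
) temp ON temp.Campaign = t.Campaign AND temp.MaxAmount = t.Amount;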

Simply use a where clause that compares against the max amount of the same campaign:-
As in the following generic code:-
select a, b , c
from tablename t
where d =
(
select max(d)
from tablename m
where m.a = t.a
)
Demo:-
Create table #MyTable (ID int , Name char(1), Campaign int , Amount int)
go
insert into #MyTable values (1,'A',1,10)
insert into #MyTable values (2,'B',1,5)
insert into #MyTable values (3,'C',2,7)
insert into #MyTable values (4,'D',2,9)
go
select Campaign, Name , ID
from #MyTable t
where Amount =
(
select max(Amount)
from #MyTable m
where m.Campaign = t.Campaign
)
drop table #MyTable
Result:-
Campaign Name ID
1 A 1
2 D 4

The following code does the same:
SELECT *
FROM #MyTable T
OUTER APPLY (
SELECT COUNT(1) record
FROM #MyTable T1
where t.Campaign = t1.Campaign
and t.amount < t1.amount
)E
where E.record = 0

T-SQL using SUM for a running total

I have a simple table with some dummy data setup like:
|id|user|value|
---------------
1 John 2
2 Ted 1
3 John 4
4 Ted 2
I can select a running total by executing the following SQL (MSSQL 2008) statement:
SELECT a.id, a.user, a.value, SUM(b.value) AS total
FROM table a INNER JOIN table b
ON a.id >= b.id
AND a.user = b.user
GROUP BY a.id, a.user, a.value
ORDER BY a.id
This will give me results like:
|id|user|value|total|
---------------------
1 John 2 2
3 John 4 6
2 Ted 1 1
4 Ted 2 3
Now is it possible to only retrieve the most recent rows for each user? So the result would be:
|id|user|value|total|
---------------------
3 John 4 6
4 Ted 2 3
Am I going about this the right way? Any suggestions or a new path to follow would be great!

No join is needed, you can speed up the query this way:
select id, [user], value, total
from
(
select id, [user], value,
row_number() over (partition by [user] order by id desc) rn,
sum(value) over (partition by [user]) total
from users
) a
where rn = 1
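The window-function version above can be sanity-checked with the sketch below. It assumes Python's sqlite3 module (SQLite 3.25 or later) and the four sample rows from the question; the column name user is double-quoted here instead of using SQL Server's square brackets.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE users (id INTEGER, "user" TEXT, value INTEGER)')
conn.executemany(
    'INSERT INTO users (id, "user", value) VALUES (?, ?, ?)',
    [(1, "John", 2), (2, "Ted", 1), (3, "John", 4), (4, "Ted", 2)],
)

# rn = 1 marks each user's most recent row (highest id);
# total is the sum of all of that user's values.
latest_rows = conn.execute(
    """
    SELECT id, "user", value, total
    FROM (
        SELECT id, "user", value,
               ROW_NUMBER() OVER (PARTITION BY "user" ORDER BY id DESC) AS rn,
               SUM(value) OVER (PARTITION BY "user") AS total
        FROM users
    ) AS a
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(latest_rows)
```

Because only the last row per user is kept, the per-user sum equals the running total at that row, so no self-join is required.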

Try this:
;with cte as
(SELECT a.id, a.[user], a.value, SUM(b.value) AS total
FROM users a INNER JOIN users b
ON a.id >= b.id
AND a.[user] = b.[user]
GROUP BY a.id, a.[user], a.value
),
cte1 as (select *,ROW_NUMBER() over (partition by [user]
order by total desc) as row_num
from cte)
select id,[user],value,total from cte1 where row_num=1

Add a where clause:
select * from
(
your select statement
) t
where t.id in (select max(id) from table group by user)
You can also use this query, which sums each user's values to get the total:
SELECT a.id, a.user, a.value,
(select sum(b.value) from table b where b.user=a.user) AS total
FROM table a
where a.id in (select max(id) from table group by user)
ORDER BY a.id

Adding a right join would perform better than nested select.
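Or even simpler, if you only need the aggregates (note that MAX(value) is the user's largest value, not necessarily the value of the latest row):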
SELECT MAX(id), [user], MAX(value), SUM(value)
FROM table
GROUP BY [user]

Compatible with SQL Server 2008 or later
DECLARE @AnotherTbl TABLE
(
id INT
, somedate DATE
, somevalue DECIMAL(18, 4)
, runningtotal DECIMAL(18, 4)
)
INSERT INTO @AnotherTbl
(
id
, somedate
, somevalue
, runningtotal
)
SELECT LEDGER_ID
, LL.LEDGER_DocDate
, LL.LEDGER_Amount
, NULL
FROM ACC_Ledger LL
ORDER BY LL.LEDGER_DocDate
DECLARE @RunningTotal DECIMAL(18, 4)
SET @RunningTotal = 0
UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl
SELECT *
FROM @AnotherTbl
