Find the count of IDs that have the same value - sql

I'd like to get a count of all of the Ids that have have the same value (Drops) as other Ids. For instance, the illustration below shows you that ID 1 and 3 have A drops so the query would count them. Similarly, ID 7 & 18 have B drops so that's another two IDs that the query would count totalling in 4 Ids that share the same values so that's what my query would return.
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 2 | C |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
I've tried the several approaches but the following query was my last attempt.
With cte1 (Id1, D1) as
(
select Id, Drops
from Posts
),
cte2 (Id2, D2) as
(
select Id, Drops
from Posts
)
Select count(distinct c1.Id1) newcnt, c1.D1
from cte1 c1
left outer join cte2 c2 on c1.D1 = c2.D2
group by c1.D1
The result if written out in full would be a single value output but the records that the query should be choosing should look as follows:
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
Any advice would be great. Thanks

You can use a CTE to generate a list of Drops values that have more than one corresponding ID value, and then JOIN that to Posts to find all rows which have a Drops value that has more than one Post:
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT P.*
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output:
ID Drops
1 A
3 A
7 B
18 B
If desired you can then count those posts in total (or grouped by Drops value):
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT COUNT(*) AS newcnt
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output
newcnt
4
Demo on SQLFiddle

You may use dense_rank() to resolve your problem. if drops has the same ID then dense_rank() will provide the same rank.
Here is the demo.
with cte as
(
select
drops,
count(distinct rnk) as newCnt
from
( select
*,
dense_rank() over (partition by drops order by id) as rnk
from myTable
) t
group by
drops
having count(distinct rnk) > 1
)
select
sum(newCnt) as newCnt
from cte
Output:
|newcnt |
|------ |
| 4 |

First group the count of the ids for your drops and then sum the values greater than 1.
select sum(countdrops) as total from
(select drops , count(id) as countdrops from yourtable group by drops) as temp
where countdrops > 1;

Related

Select first rows where condition [duplicate]

Here's what I'm trying to do. Let's say I have this table t:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
2 | 18 | 2012-05-19 | y
3 | 18 | 2012-08-09 | z
4 | 19 | 2009-06-01 | a
5 | 19 | 2011-04-03 | b
6 | 19 | 2011-10-25 | c
7 | 19 | 2012-08-09 | d
For each id, I want to select the row containing the minimum record_date. So I'd get:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
The only solutions I've seen to this problem assume that all record_date entries are distinct, but that is not this case in my data. Using a subquery and an inner join with two conditions would give me duplicate rows for some ids, which I don't want:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
5 | 19 | 2011-04-03 | b
4 | 19 | 2009-06-01 | a
How about something like:
SELECT mt.*
FROM MyTable mt INNER JOIN
(
SELECT id, MIN(record_date) AS MinDate
FROM MyTable
GROUP BY id
) t ON mt.id = t.id AND mt.record_date = t.MinDate
This gets the minimum date per ID, and then gets the values based on those values. The only time you would have duplicates is if there are duplicate minimum record_dates for the same ID.
I could get to your expected result just by doing this in mysql:
SELECT id, min(record_date), other_cols
FROM mytable
GROUP BY id
Does this work for you?
To get the cheapest product in each category, you use the MIN() function in a correlated subquery as follows:
SELECT categoryid,
productid,
productName,
unitprice
FROM products a WHERE unitprice = (
SELECT MIN(unitprice)
FROM products b
WHERE b.categoryid = a.categoryid)
The outer query scans all rows in the products table and returns the products that have unit prices match with the lowest price in each category returned by the correlated subquery.
I would like to add to some of the other answers here, if you don't need the first item but say the second number for example you can use rownumber in a subquery and base your result set off of that.
SELECT * FROM
(
SELECT
ROW_NUM() OVER (PARTITION BY Id ORDER BY record_date, other_cols) as rownum,
*
FROM products P
) INNER
WHERE rownum = 2
This also allows you to order off multiple columns in the subquery which may help if two record_dates have identical values. You can also partition off of multiple columns if needed by delimiting them with a comma
This does it simply:
select t2.id,t2.record_date,t2.other_cols
from (select ROW_NUMBER() over(partition by id order by record_date)as rownum,id,record_date,other_cols from MyTable)t2
where t2.rownum = 1
If record_date has no duplicates within a group:
think of it as of filtering. Simpliy get (WHERE) one (MIN(record_date)) row from the current group:
SELECT * FROM t t1 WHERE record_date = (
select MIN(record_date)
from t t2 where t2.group_id = t1.group_id)
If there could be 2+ min record_date within a group:
filter out non-min rows (see above)
then (AND) pick only one from the 2+ min record_date rows, within the given group_id. E.g. pick the one with the min unique key:
AND key_id = (select MIN(key_id)
from t t3 where t3.record_date = t1.record_date
and t3.group_id = t1.group_id)
so
key_id | group_id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
8 | 19 | 2009-06-01 | e
will select key_ids: #1 and #4
SELECT p.* FROM tbl p
INNER JOIN(
SELECT t.id, MIN(record_date) AS MinDate
FROM tbl t
GROUP BY t.id
) t ON p.id = t.id AND p.record_date = t.MinDate
GROUP BY p.id
This code eliminates duplicate record_date in case there are same ids with same record_date.
If you want duplicates, remove the last line GROUP BY p.id.
This a old question, but this can useful for someone
In my case i can't using a sub query because i have a big query and i need using min() on my result, if i use sub query the db need reexecute my big query. i'm using Mysql
select t.*
from (select m.*, #g := 0
from MyTable m --here i have a big query
order by id, record_date) t
where (1 = case when #g = 0 or #g <> id then 1 else 0 end )
and (#g := id) IS NOT NULL
Basically I ordered the result and then put a variable in order to get only the first record in each group.
The below query takes the first date for each work order (in a table of showing all status changes):
SELECT
WORKORDERNUM,
MIN(DATE)
FROM
WORKORDERS
WHERE
DATE >= to_date('2015-01-01','YYYY-MM-DD')
GROUP BY
WORKORDERNUM
select
department,
min_salary,
(select s1.last_name from staff s1 where s1.salary=s3.min_salary ) lastname
from
(select department, min (salary) min_salary from staff s2 group by s2.department) s3

How can i find all linked rows by array values in postgres?

i have a table like this:
id | arr_val | grp
-----------------
1 | {10,20} | -
2 | {20,30} | -
3 | {50,5} | -
4 | {30,60} | -
5 | {1,5} | -
6 | {7,6} | -
I want to find out which rows are in a group together.
In this example 1,2,4 are one group because 1 and 2 have a common element and 2 and 4. 3 and 5 form a group because they have a common element. 6 Has no common elments with anybody else. So it forms a group for itself.
The result should look like this:
id | arr_val | grp
-----------------
1 | {10,20} | 1
2 | {20,30} | 1
3 | {50,5} | 2
4 | {30,60} | 1
5 | {1,5} | 2
6 | {7,6} | 3
I think i need recursive cte because my problem is graphlike but i am not sure how to that.
Additional info and background:
The Table has ~2500000 rows.
In reality the problem i try to solve has more fields and conditions for finding a group:
id | arr_val | date | val | grp
---------------------------------
1 | {10,20} | -
2 | {20,30} | -
Not only do the element of a group need to be linked by common elements in arr_val. They all need to have the same value in val and need to be linked by a timespan in date (gaps and islands). I solved the other two but now the condition of my question was added. If there is an easy way to do all three together in one query that would be awesome but it is not necessary.
----Edit-----
While both answer work for the example of five rows they do not work for a table with a lot more rows. Both answers have the problem that the number of rows in the recursive part explodes and only reduce them at the end.
A solutiuon should work for data like this too:
id | arr_val | grp
-----------------
1 | {1} | -
2 | {1} | -
3 | {1} | -
4 | {1} | -
5 | {1} | -
6 | {1} | -
7 | {1} | -
8 | {1} | -
9 | {1} | -
10 | {1} | -
11 | {1} | -
more rows........
Is there a solution to that problem?
You can handle this as a recursive CTE. Define the edges between the ids based on common values. Then traverse the edges and aggregate:
with recursive nodes as (
select id, val
from t cross join
unnest(arr_val) as val
),
edges as (
select distinct n1.id as id1, n2.id as id2
from nodes n1 join
nodes n2
on n1.val = n2.val
),
cte as (
select id1, id2, array[id1] as visited, 1 as lev
from edges
where id1 = id2
union all
select cte.id1, e.id2, visited || e.id2,
lev + 1
from cte join
edges e
on cte.id2 = e.id1
where e.id2 <> all(cte.visited)
),
vals as (
select id1, array_agg(distinct id2 order by id2) as id2s
from cte
group by id1
)
select *, dense_rank() over (order by id2s) as grp
from vals;
Here is a db<>fiddle.
Here is an approach at this graph-walking problem:
with recursive cte as (
select id, arr_val, array[id] path from mytable
union all
select t.id, t.arr_val, c.path || t.id
from cte c
inner join mytable t on t.arr_val && c.arr_val and not t.id = any(c.path)
)
select c.id, c.arr_val, dense_rank() over(order by min(x.id)) grp
from cte c
cross join lateral unnest(c.path) as x(id)
group by c.id, c.arr_val
order by c.id
The common-table-expression walks the graph, recursively looking for "adjacent" nodes to the current node, while keeping track of already-visited nodes. Then the outer query aggregates, identifies groups using the least node per path, and finally ranks the groups.
Demo on DB Fiddle:
id | arr_val | grp
-: | :------ | --:
1 | {10,20} | 1
2 | {20,30} | 1
3 | {50,5} | 2
4 | {30,60} | 1
5 | {1,5} | 2
6 | {7,6} | 3
While Gordon Linoffs solution is the fastest i found for a small amount of data where the groups are not so big it will not work for bigger datasets and bigger groups.
I changed his solution to make it work.
I moved the edges to an indexed table:
create table edges
(
id1 integer not null,
id2 integer not null,
constraint staffel_group_nodes_pk
primary key (id1, id2)
);
insert into edges(id1, id2) with
nodes(id, arr_val) as (
select id, arr_val
from my_table
)
select n1.id as id1, n2.id as id2
from nodes n1
join
nodes n2
on n1.arr_val && n2.arr_val ;
Tha alone didnt help. I changed his recursive part too:
with recursive
cte as (
select id1, array [id1] as visited
from edges
where id1 = id2
union all
select unnested.id1, array_agg(distinct unnested.vis) as visited
from (
select cte.id1,
unnest(cte.visited || e.id2) as vis
from cte
join
staffel_group_edges e
on e.id1 = any (cte.visited)
and e.id2 <> all (cte.visited)) as unnested
group by unnested.id1
),
vals as (
select id1, array_agg(distinct vis) as id2s
from (
select cte.id1,
unnest(cte.visited) as vis
from cte) as unnested
group by unnested.id1
)
select id1,id2s, dense_rank() over (order by id2s) as grp
from vals;
Every step i group all searches by their starting point. This reduces the amount of parallel walked paths a lot and works surprisingly fast.

Get row which matched in each group

I am trying to make a sql query. I got some results from 2 tables below. Below results are good for me. Now I want those values which is present in each group. for example, A and B is present in each group(in each ID). so i want only A and B in result. and also i want make my query dynamic. Could anyone help?
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |
In the following query, I have placed your current query into a CTE for further use. We can try selecting those values for which every ID in your current result appears. This would imply that such values are associated with every ID.
WITH cte AS (
-- your current query
)
SELECT Value
FROM cte
GROUP BY Value
HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM cte);
Demo
The solution is simple - you can do this in two ways at least. Group by letters (Value), aggregate IDs with SUM or COUNT (distinct values in ID). Having that, choose those letters that have the value for SUM(ID) or COUNT(ID).
select Value from MyTable group by Value
having SUM(ID) = (SELECT SUM(DISTINCT ID) from MyTable)
select Value from MyTable group by Value
having COUNT(ID) = (SELECT COUNT(DISTINCT ID) from MyTable)
Use This
WITH CTE
AS
(
SELECT
Value,
Cnt = COUNT(DISTINCT ID)
FROM T1
GROUP BY Value
)
SELECT
Value
FROM CTE
WHERE Cnt = (SELECT COUNT(DISTINCT ID) FROM T1)

Group by minimum value in one field while selecting distinct rows

Here's what I'm trying to do. Let's say I have this table t:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
2 | 18 | 2012-05-19 | y
3 | 18 | 2012-08-09 | z
4 | 19 | 2009-06-01 | a
5 | 19 | 2011-04-03 | b
6 | 19 | 2011-10-25 | c
7 | 19 | 2012-08-09 | d
For each id, I want to select the row containing the minimum record_date. So I'd get:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
The only solutions I've seen to this problem assume that all record_date entries are distinct, but that is not this case in my data. Using a subquery and an inner join with two conditions would give me duplicate rows for some ids, which I don't want:
key_id | id | record_date | other_cols
1 | 18 | 2011-04-03 | x
5 | 19 | 2011-04-03 | b
4 | 19 | 2009-06-01 | a
How about something like:
SELECT mt.*
FROM MyTable mt INNER JOIN
(
SELECT id, MIN(record_date) AS MinDate
FROM MyTable
GROUP BY id
) t ON mt.id = t.id AND mt.record_date = t.MinDate
This gets the minimum date per ID, and then gets the values based on those values. The only time you would have duplicates is if there are duplicate minimum record_dates for the same ID.
I could get to your expected result just by doing this in mysql:
SELECT id, min(record_date), other_cols
FROM mytable
GROUP BY id
Does this work for you?
To get the cheapest product in each category, you use the MIN() function in a correlated subquery as follows:
SELECT categoryid,
productid,
productName,
unitprice
FROM products a WHERE unitprice = (
SELECT MIN(unitprice)
FROM products b
WHERE b.categoryid = a.categoryid)
The outer query scans all rows in the products table and returns the products that have unit prices match with the lowest price in each category returned by the correlated subquery.
I would like to add to some of the other answers here, if you don't need the first item but say the second number for example you can use rownumber in a subquery and base your result set off of that.
SELECT * FROM
(
SELECT
ROW_NUM() OVER (PARTITION BY Id ORDER BY record_date, other_cols) as rownum,
*
FROM products P
) INNER
WHERE rownum = 2
This also allows you to order off multiple columns in the subquery which may help if two record_dates have identical values. You can also partition off of multiple columns if needed by delimiting them with a comma
This does it simply:
select t2.id,t2.record_date,t2.other_cols
from (select ROW_NUMBER() over(partition by id order by record_date)as rownum,id,record_date,other_cols from MyTable)t2
where t2.rownum = 1
If record_date has no duplicates within a group:
think of it as of filtering. Simpliy get (WHERE) one (MIN(record_date)) row from the current group:
SELECT * FROM t t1 WHERE record_date = (
select MIN(record_date)
from t t2 where t2.group_id = t1.group_id)
If there could be 2+ min record_date within a group:
filter out non-min rows (see above)
then (AND) pick only one from the 2+ min record_date rows, within the given group_id. E.g. pick the one with the min unique key:
AND key_id = (select MIN(key_id)
from t t3 where t3.record_date = t1.record_date
and t3.group_id = t1.group_id)
so
key_id | group_id | record_date | other_cols
1 | 18 | 2011-04-03 | x
4 | 19 | 2009-06-01 | a
8 | 19 | 2009-06-01 | e
will select key_ids: #1 and #4
SELECT p.* FROM tbl p
INNER JOIN(
SELECT t.id, MIN(record_date) AS MinDate
FROM tbl t
GROUP BY t.id
) t ON p.id = t.id AND p.record_date = t.MinDate
GROUP BY p.id
This code eliminates duplicate record_date in case there are same ids with same record_date.
If you want duplicates, remove the last line GROUP BY p.id.
This a old question, but this can useful for someone
In my case i can't using a sub query because i have a big query and i need using min() on my result, if i use sub query the db need reexecute my big query. i'm using Mysql
select t.*
from (select m.*, #g := 0
from MyTable m --here i have a big query
order by id, record_date) t
where (1 = case when #g = 0 or #g <> id then 1 else 0 end )
and (#g := id) IS NOT NULL
Basically I ordered the result and then put a variable in order to get only the first record in each group.
The below query takes the first date for each work order (in a table of showing all status changes):
SELECT
WORKORDERNUM,
MIN(DATE)
FROM
WORKORDERS
WHERE
DATE >= to_date('2015-01-01','YYYY-MM-DD')
GROUP BY
WORKORDERNUM
select
department,
min_salary,
(select s1.last_name from staff s1 where s1.salary=s3.min_salary ) lastname
from
(select department, min (salary) min_salary from staff s2 group by s2.department) s3

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
SQL Fiddle
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
If the ids are truly sequential, you can do:
select t.*,
(id - rowNumber) as grp
from t
Also you can use recursive CTE
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
Demo on SQLFiddle
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.