Selecting the first row of group with additional group by columns


Say I have a table with the following results:
How can I select one row per distinct parent_id, namely the row with the minimum object0_behaviour?
Expected output:
parent_id | id | object0_behaviour | type
------------------------------------------
1 | 1 | 5 | IP
2 | 3 | 5 | IP
3 | 5 | 7 | ID
4 | 6 | 7 | ID
5 | 8 | 5 | IP
6 | 18 | 7 | ID
7 | 10 | 7 | ID
8 | 9 | 5 | IP
I have tried:
SELECT parent_id, min(object0_behaviour) FROM table GROUP BY parent_id
It works, but if I want the other two columns, I am required to add them to the GROUP BY clause, and I am back to square one.
I saw examples in R: Select the first row by group
The output there is similar to what I need, but I can't seem to convert it into SQL.

You can try using the row_number() window function:
select *
from (
    select *, row_number() over (partition by parent_id order by object0_behaviour) as rn
    from tablename
) A
where rn = 1
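As a quick check of the row_number() approach, here is a minimal sketch that runs it through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table name tablename comes from the answer; the sample rows are made up for illustration, since the question's input table is not shown, and the extra id in ORDER BY is an assumed tie-breaker that the answer does not include.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "id INTEGER, parent_id INTEGER, object0_behaviour INTEGER, type TEXT)"
)
# Hypothetical sample rows: up to two candidates per parent_id.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (1, 1, 5, "IP"),
        (2, 1, 7, "ID"),
        (3, 2, 5, "IP"),
        (4, 2, 7, "ID"),
        (5, 3, 7, "ID"),
    ],
)

first_rows = conn.execute(
    """
    SELECT parent_id, id, object0_behaviour, type
    FROM (
        SELECT *,
               row_number() OVER (
                   PARTITION BY parent_id
                   ORDER BY object0_behaviour, id  -- id: assumed tie-breaker
               ) AS rn
        FROM tablename
    ) AS a
    WHERE rn = 1
    ORDER BY parent_id
    """
).fetchall()

for row in first_rows:
    print(row)
```

Each parent_id yields exactly one row, and all columns of that row come along without being listed in a GROUP BY.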

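Alternatively, join the table to its per-group minimum. Note that this returns every row tied for the minimum within a parent_id:
select t.*
from tablename t
join (
    select parent_id, min(object0_behaviour) as object0_behaviour
    from tablename
    group by parent_id
) grouped
  on grouped.parent_id = t.parent_id
 and grouped.object0_behaviour = t.object0_behaviour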
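One thing worth checking with the join-on-minimum approach is how it behaves when two rows of the same parent_id share the minimum value. The sketch below uses Python's sqlite3 module with made-up rows (the table name tablename and the alias t are assumptions for illustration) and suggests that both tied rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "id INTEGER, parent_id INTEGER, object0_behaviour INTEGER, type TEXT)"
)
# Hypothetical rows: parent_id 1 has two rows tied at the minimum value 5.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [(1, 1, 5, "IP"), (2, 1, 5, "ID"), (3, 2, 7, "ID")],
)

joined = conn.execute(
    """
    SELECT t.parent_id, t.id, t.object0_behaviour, t.type
    FROM tablename AS t
    JOIN (
        SELECT parent_id, min(object0_behaviour) AS object0_behaviour
        FROM tablename
        GROUP BY parent_id
    ) AS grouped
      ON grouped.parent_id = t.parent_id
     AND grouped.object0_behaviour = t.object0_behaviour
    ORDER BY t.parent_id, t.id
    """
).fetchall()

for row in joined:
    print(row)
```

If exactly one row per parent_id is required, a row_number() filter or an extra tie-breaking condition would be needed on top of this join.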

Related

Get some values from the table by selecting

I have a table:
| id | Number | Address |
| -- | ------ | ------- |
| 1  | 0      | NULL    |
| 1  | 1      | NULL    |
| 1  | 2      | 50      |
| 1  | 3      | NULL    |
| 2  | 0      | 10      |
| 3  | 1      | 30      |
| 3  | 2      | 20      |
| 3  | 3      | 20      |
| 4  | 0      | 75      |
| 4  | 1      | 22      |
| 4  | 2      | 30      |
| 5  | 0      | NULL    |
I need to get: the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this select does not correctly handle the case where the address first changed and then changed back to a previous value. For example, for id=1 the group by returns:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.

You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
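These two values are equal only for the most recent rows of an id that all share its latest address, i.e. the trailing run with no address change after it (the most recent row itself always qualifies, with both values equal to 1). The aggregation then pulls back the earliest row of that run, which is where the last address change happened.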
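To see the double row_number() trick on the question's own data, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or newer for window functions). The table name t follows the answer; the subquery alias s and the column alias last_change are assumptions added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, number INTEGER, address INTEGER)")
# Rows copied from the question; None stands for NULL.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 0, None), (1, 1, None), (1, 2, 50), (1, 3, None),
        (2, 0, 10),
        (3, 1, 30), (3, 2, 20), (3, 3, 20),
        (4, 0, 75), (4, 1, 22), (4, 2, 30),
        (5, 0, None),
    ],
)

last_changes = conn.execute(
    """
    SELECT id, min(number) AS last_change
    FROM (
        SELECT t.*,
               row_number() OVER (PARTITION BY id
                                  ORDER BY number DESC) AS seqnum1,
               row_number() OVER (PARTITION BY id, address
                                  ORDER BY number DESC) AS seqnum2
        FROM t
    ) AS s
    WHERE seqnum1 = seqnum2   -- keeps only the trailing run of equal addresses
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in last_changes:
    print(row)
```

For id=1 the address goes NULL, NULL, 50, NULL, so only the row with number 3 survives the filter; for id=3 the rows with numbers 3 and 2 both survive and min(number) picks 2.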

I answered my own question. In case anyone needs it, here is my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);

Get N results for each group without using join

Can I solve this without using a join? There is a lot of data in this table, and I want to do it more efficiently.
One of my ideas is to get the ID list with a group_concat subquery, but it doesn't work well with the IN clause.
SELECT * FROM table WHERE id IN (group_concat subquery)
May I get your advice?
data
ID SERVER_ID ...
--------------------
1 1 ...
2 1
3 1
4 2
5 2
6 2
7 3
8 3
9 3
10 3
...
expected result with limit 2 per each group:
ID SERVER_ID ...
--------------------
1 1 ...
2 1
4 2
5 2
7 3
8 3

You can try the following using row_number(). This solution will work for PostgreSQL, MySQL 8.0, Oracle and SQL Server.
select
id,
server_id
from
(
select
id,
server_id,
row_number() over (partition by server_id order by id) as rnk
from yourTable
) val
where rnk <= 2
Result for the sample data:
| id | server_id |
| --- | --------- |
| 1 | 1 |
| 2 | 1 |
| 4 | 2 |
| 5 | 2 |
| 7 | 3 |
| 8 | 3 |
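The same top-N-per-group query can be tried locally. This is a minimal sketch with Python's sqlite3 module (SQLite 3.25 or newer), using the ten sample rows from the question; the variable per_group is a hypothetical parameter standing in for the hard-coded limit of 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, server_id INTEGER)")
# ids 1..10 with the server_id values shown in the question.
server_ids = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
conn.executemany(
    "INSERT INTO yourTable (id, server_id) VALUES (?, ?)",
    list(enumerate(server_ids, start=1)),
)

per_group = 2  # hypothetical parameter: rows to keep per server_id
top_rows = conn.execute(
    """
    SELECT id, server_id
    FROM (
        SELECT id,
               server_id,
               row_number() OVER (PARTITION BY server_id ORDER BY id) AS rnk
        FROM yourTable
    ) AS val
    WHERE rnk <= ?
    ORDER BY server_id, id
    """,
    (per_group,),
).fetchall()

for row in top_rows:
    print(row)
```

No join is involved: the table is read once, numbered per server_id, and then filtered on that number.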

H2 SQL Sequence count with duplicate values

I have a table of IDs, with some duplicates and I need to create a sequence based on the IDs. I'm trying to achieve the following.
[ROW] [ID] [SEQID]
1 11 1
2 11 2
3 12 1
4 13 1
5 13 2
I'm using an old version of the H2 DB which doesn't support window functions, so I have to do this using plain SQL. I have tried joining the table to itself, but I'm not getting the result I want because the duplicate values cause issues. Any ideas? I have a unique identifier in the row number, but I'm not sure how to use it to achieve what I want.
SELECT A.ID, COUNT(*) FROM TABLE A
JOIN TABLE B
ON A.ID = B.ID
WHERE A.ID >= B.ID
GROUP BY A.ID;

Use a subquery that counts the seqid:
select
t.row, t.id,
(select count(*) from tablename where id = t.id and row <= t.row) seqid
from tablename t
It's not as efficient as window functions but it does what you expect.
The demo was run on MySQL, but the query is standard SQL.
Results:
| row | id | seqid |
| --- | --- | ----- |
| 1 | 11 | 1 |
| 2 | 11 | 2 |
| 3 | 12 | 1 |
| 4 | 13 | 1 |
| 5 | 13 | 2 |
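Because the correlated count uses no window functions, it can be checked on almost any engine. The sketch below runs it with Python's sqlite3 module on the question's five rows; the column is named row_no here (an assumption, to avoid any clash with the ROW keyword) where the answer uses row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (row_no INTEGER PRIMARY KEY, id INTEGER)")
conn.executemany(
    "INSERT INTO tablename (row_no, id) VALUES (?, ?)",
    [(1, 11), (2, 11), (3, 12), (4, 13), (5, 13)],
)

sequenced = conn.execute(
    """
    SELECT t.row_no,
           t.id,
           -- count the rows with the same id up to and including this one
           (SELECT count(*)
            FROM tablename
            WHERE id = t.id AND row_no <= t.row_no) AS seqid
    FROM tablename AS t
    ORDER BY t.row_no
    """
).fetchall()

for row in sequenced:
    print(row)
```

The subquery runs once per outer row, so an index on (id, row_no) would likely matter on larger tables.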

Struggling to find the right WHERE clause

I'm struggling with a SQL query and I need your help. To be honest, I'm starting to wonder if what I want to achieve can be done the way I did it so far but maybe your collective brains can come up with a better solution than mine and prove me I took the good way at the beginning (Or that I was totally wrong and I should start from scratch).
The Dataset
A row has 4 important fields: ItemID, Item, Priority and Group. Those fields contain the only valuable piece of information, the one that will be displayed in the end.
As I'm using SQL Server 2008, I don't have access to the LAG and LEAD functions, so I needed to simulate them (or at least, I did it because I thought it would be useful to me, but I'm not so sure anymore). To obtain this result, I used the code from this article from SQLscope, which provides a LAG and LEAD equivalent that I restrict to a set of rows that have the same ItemID. This adds 7 new functional columns to my dataset: Rn, RnDiv2, RnPlus1Div2, PreviousPriority, NextPriority, PreviousGroup and NextGroup.
ItemID | Item | Priority | Group | Rn | RnDiv2 | RnPlus1Div2 | PreviousPriority | NextPriority | PreviousGroup | NextGroup
-------- | ------- | -------- | ------- | ----- | ------ | ----------- | ---------------- | ------------ | ------------- | ---------
16777397 | Item 1 | 5 | Group 1 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777403 | Item 2 | 5 | Group 2 | 1 | 0 | 1 | NULL | 5 | NULL | Group 2
16777403 | Item 2 | 10 | Group 2 | 2 | 1 | 1 | 5 | NULL | Group 2 | NULL
16777429 | Item 3 | 1000 | Group 3 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777430 | Item 4 | 5 | Group 1 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777454 | Item 5 | 5 | Group 4 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777455 | Item 6 | 5 | Group 5 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777459 | Item 6 | 5 | Group 6 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777468 | Item 8 | 5 | Group 7 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777479 | Item 9 | 5 | Group 4 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777481 | Item 10 | 5 | Group 4 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777496 | Item 11 | 5 | Group 6 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777514 | Item 12 | 5 | Group 4 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
16777518 | Item 13 | 5 | Group 8 | 1 | 0 | 1 | NULL | 10 | NULL | Group 8
16777518 | Item 13 | 10 | Group 8 | 2 | 1 | 1 | 5 | 100 | Group 8 | Group 1
16777518 | Item 13 | 100 | Group 1 | 3 | 1 | 2 | 10 | NULL | Group 8 | NULL
16777520 | Item 14 | 5 | Group 9 | 1 | 0 | 1 | NULL | NULL | NULL | NULL
The problem
The problem in my SQL query is the WHERE clause. I will always filter the rows based on their Group column, but there is a subtlety: however many Groups an Item is a member of, I want it to appear in one and only one Group, based on these criteria:
1. If the Item appears in the same Group more than once, only the line with the lowest Priority should be returned. If an Item appears more than once in the same Group with the same Priority, only the first occurrence should be kept. Example: for Item 2, only the line with a Priority value of 5 should be returned.
2. If the Item appears in the Group but is also present in another Group with a lower Priority, it shouldn't be displayed. Example: Group 1 is selected as a filter. Item 1 should be displayed but Item 13 shouldn't, because it is also present in Group 8 with a lower Priority (Item 13 would appear only in Group 8).
Note that this is just a sample. My real dataset has more than 3000 rows and some other cases are probably possible that I haven't listed in my sample.
Unsuccessful Attempts
Like I said, there is one constant in the WHERE clause and that is the Group filtering.
Because of the criterion #2, I can't simply start my clause like that : WHERE Group = 'Group 1' and I need to have something a bit more complex.
I have tried the following clause without success: WHERE Group = 'Group 1' AND (Group = NextGroup AND Priority < NextPriority). That works well for an Item that is in no more than 2 groups. But for Item 13, it would return the first two rows. And if I add something like AND NOT (CorrectedPriority >= PreviousPriority) to the WHERE clause, I get no results at all.
Last attempt so far: (SiteName <> PreviousSiteName AND CorrectedPriority >= PreviousPriority). The problem is that I will never return a line where Rn = 1 because PreviousSiteName will be NULL. Adding a check on NULL doesn't work either. I must have been tired when trying this particular clause because it's complete garbage.
I will continue to try and find the good WHERE clause but I have the feeling that my whole approach is wrong. I don't see how I could solve the problem when there are more than two entries for the same Item. It is worth noting that this query is used in a SSRS report so I could maybe use custom code to parse the dataset and filter the rows (Working with tables might help solving the issue of Items with more than two entries). But if there's a SQL genius around here with a working solution, that would be great.
PS : if someone knows how to fix this table and can explain it to me, extra cookies for him. :D
Edit :
This is the modified query that I'm using at the moment. I will consider using @Yellowbedwetter's latest query, as it seems more robust.
SELECT *
FROM (SELECT ItemID,
Item,
Priority,
Group_,
MIN(Priority) OVER
( PARTITION BY item
) AS interItem_MinPriority
FROM (SELECT ItemID,
Item,
Priority,
Group_,
ROW_NUMBER() OVER
( PARTITION BY Item
ORDER BY Priority ASC
) AS interGrp_Rank
FROM Test_Table
) AS TMP
WHERE interGrp_Rank = 1 -- Exclude all records with the same item/group, but higher priority.
) AS TMP2
WHERE Priority = interItem_MinPriority; -- Exclude which aren't the lowest priority across groups.

If I understand the question correctly this should work
SELECT *
FROM (SELECT ItemID,
Item,
Priority,
Group_,
MIN(Priority) OVER
( PARTITION BY item
) AS interItem_MinPriority
FROM (SELECT ItemID,
Item,
Priority,
Group_,
ROW_NUMBER() OVER
( PARTITION BY Item,
Group_
ORDER BY Priority ASC
) AS interGrp_Rank
FROM Test_Table
) AS TMP
WHERE interGrp_Rank = 1 -- Exclude all records with the same item/group, but higher priority.
) AS TMP2
WHERE Priority = interItem_MinPriority; -- Exclude which aren't the lowest priority across groups.
I don't know if your version of SQL Server supports MIN() OVER()..., but if not you should be able to work around that easily enough.
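To check the two-step idea (best row per item and group first, then the lowest priority across groups), here is a hedged sketch in Python's sqlite3 module (SQLite 3.25 or newer) on a subset of the question's rows. Applying the Group 1 filter in the outermost WHERE is an assumption about how the report would use the query; it has to come last, after the cross-group minimum has been computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Test_Table (ItemID INTEGER, Item TEXT, Priority INTEGER, Group_ TEXT)"
)
# Subset of the question's sample data; Priority is stored as a number.
conn.executemany(
    "INSERT INTO Test_Table VALUES (?, ?, ?, ?)",
    [
        (16777397, "Item 1", 5, "Group 1"),
        (16777403, "Item 2", 5, "Group 2"),
        (16777403, "Item 2", 10, "Group 2"),
        (16777430, "Item 4", 5, "Group 1"),
        (16777518, "Item 13", 5, "Group 8"),
        (16777518, "Item 13", 10, "Group 8"),
        (16777518, "Item 13", 100, "Group 1"),
    ],
)

shown = conn.execute(
    """
    SELECT Item, Priority, Group_
    FROM (SELECT ItemID, Item, Priority, Group_,
                 MIN(Priority) OVER (PARTITION BY Item) AS interItem_MinPriority
          FROM (SELECT ItemID, Item, Priority, Group_,
                       ROW_NUMBER() OVER (PARTITION BY Item, Group_
                                          ORDER BY Priority ASC) AS interGrp_Rank
                FROM Test_Table
               ) AS TMP
          WHERE interGrp_Rank = 1      -- best row per item and group
         ) AS TMP2
    WHERE Priority = interItem_MinPriority  -- lowest priority across groups
      AND Group_ = ?                        -- assumed report filter, applied last
    ORDER BY ItemID
    """,
    ("Group 1",),
).fetchall()

for row in shown:
    print(row)
```

Item 13 does not appear under Group 1 because its lowest priority (5) belongs to Group 8, which matches criterion #2 in the question.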
Edit: To handle tie breaks.
WITH TEST_TABLE (ItemID, Item, Priority, Group_) AS
(
-- Priority is left unquoted so MIN() and ORDER BY compare numbers, not strings ('10' sorts before '5' as text).
SELECT '16777397','Item 1',5,'Group 1' UNION
SELECT '16777403','Item 2',5,'Group 2' UNION
SELECT '16777403','Item 2',10,'Group 2' UNION
SELECT '16777429','Item 3',1000,'Group 3' UNION
SELECT '16777430','Item 4',5,'Group 1' UNION
SELECT '16777454','Item 5',5,'Group 4' UNION
SELECT '16777455','Item 6',5,'Group 5' UNION
SELECT '16777459','Item 6',5,'Group 6' UNION
SELECT '16777468','Item 8',5,'Group 7' UNION
SELECT '16777479','Item 9',5,'Group 4' UNION
SELECT '16777481','Item 10',5,'Group 4' UNION
SELECT '16777496','Item 11',5,'Group 6' UNION
SELECT '16777514','Item 12',5,'Group 4' UNION
SELECT '16777518','Item 13',5,'Group 8' UNION
SELECT '16777518','Item 13',10,'Group 8' UNION
SELECT '16777518','Item 13',100,'Group 1' UNION
SELECT '16777520','Item 14',5,'Group 9'
)
SELECT ItemID,
Item,
Priority,
Group_
FROM (SELECT ItemID,
Item,
Priority,
Group_,
ROW_NUMBER() OVER
( PARTITION BY item
ORDER BY Group_ ASC -- or however you want to break the tie
) AS grp_minPriority_TieBreak
FROM (SELECT ItemID,
Item,
Priority,
Group_,
MIN(Priority) OVER
( PARTITION BY item
) AS interItem_MinPriority
FROM (SELECT ItemID,
Item,
Priority,
Group_,
ROW_NUMBER() OVER
( PARTITION BY Item,
Group_
ORDER BY Priority ASC
) AS interGrp_Rank
FROM TEST_TABLE
) AS TMP
WHERE interGrp_Rank = 1 -- Exclude all records with the same item/group, but higher priority.
) AS TMP2
WHERE Priority = interItem_MinPriority -- Exclude which aren't the lowest priority across groups.
) AS TMP3
WHERE grp_minPriority_TieBreak = 1;

If I understand your problem well
about these criteria
If the Item appears in the same Group more than one time, only the
line with the lowest priority should be returned. Example: for Item
2, only the line with a Priority value of 5 should be returned;
If the Item appears in the Group but is also present in another
Group with a lowest Priority, it shouldn't be displayed. Example:
Group 1 is selected as a filter. Item 1 should be displayed but Item
13 shouldn't because it is also present in Group 8 with a lower
Priority (Item 13 would appear only in Group 8).
I think we can get the right result by using the minimum priority per item without considering the group of item , because in the two cases above we took the minimum priority of the item.
so the following query might be helpful.(I tested it with your sample data)
with minPriority as
(
    select ItemID, Item, Priority, Group_,
           ROW_NUMBER() over (partition by ItemId order by Priority) as rn
    from Test_table
)
select * from minPriority where rn = 1

Haven't tried it, but something like: select max(priority) as mp ..... From ... Where group = 'group1' and mp not in (select max(priority).... from ... Where group <> 'group1')
Apologies for the typing, on my phone no glasses :)

SQL distinct/groupby on combination of columns

I am trying to do a SQL select on a table based on two columns, but not in the usual way where the combination of values in both columns must be unique; I want to select where the value can only appear once in either column.
Given the dataset:
|pkid | fkself | otherData |
|-----+--------+-----------|
| 1 | 4 | there |
| 4 | 1 | will |
| 3 | 6 | be |
| 2 | 5 | other |
| 5 | 2 | data |
| 6 | 3 | columns |
I need to return either
|pkid | fkself | otherData |
|-----+--------+-----------|
| 1 | 4 | there |
| 3 | 6 | be |
| 2 | 5 | other |
or
|pkid | fkself | otherData |
|-----+--------+-----------|
| 4 | 1 | will |
| 5 | 2 | data |
| 6 | 3 | columns |
The only way I can think of to do this is to concatenate pkid and fkid in order so that both row 1 and row 2 would concatenate to 1,4, but I'm not sure how to do that, or if it is even possible.
The rows will have other data columns, but it does not matter which row I get, only that I get each ID only once, whether the value is in pkid or fkself.

You can use least and greatest to get the smallest or biggest value of the two. That allows you to put them in the right order to generate those keys for you. You could concatenate the values as you suggested, but it's not needed in this solution. With dense_rank you can generate a sequence for each of those fictional keys. Then, you can get the first OtherData from that sequence.
select
pkid,
fkself,
otherData
from
(select
pkid,
fkself,
otherData,
dense_rank() over (partition by least(pkid, fkself), greatest(pkid, fkself) order by pkid) as rnk
from
YourTable t) ranked
where
rnk = 1
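The pair-normalising idea can be tried in Python's sqlite3 module (SQLite 3.25 or newer). SQLite has no least() or greatest(), so this sketch swaps in its two-argument scalar min() and max(), which play the same role here; the alias names rnk and ranked are assumptions chosen to avoid reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (pkid INTEGER, fkself INTEGER, otherData TEXT)")
# Rows copied from the question.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (1, 4, "there"), (4, 1, "will"), (3, 6, "be"),
        (2, 5, "other"), (5, 2, "data"), (6, 3, "columns"),
    ],
)

one_per_pair = conn.execute(
    """
    SELECT pkid, fkself, otherData
    FROM (
        SELECT pkid, fkself, otherData,
               dense_rank() OVER (
                   -- min/max with two arguments act like least/greatest
                   PARTITION BY min(pkid, fkself), max(pkid, fkself)
                   ORDER BY pkid
               ) AS rnk
        FROM YourTable
    ) AS ranked
    WHERE rnk = 1
    ORDER BY pkid
    """
).fetchall()

for row in one_per_pair:
    print(row)
```

Rows (1, 4) and (4, 1) fall into the same partition because both normalise to the pair (1, 4), and ordering by pkid keeps the row with the smaller pkid.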

Your idea is possible, and it should produce the results you want.
SELECT DISTINCT joinedID
FROM (
SELECT min(id) & "," & max(id) as joinedID
FROM (
SELECT pkid as id, someUniqueValue
FROM table
UNION ALL
SELECT fkself as id, someUniqueValue
FROM table)
GROUP BY someUniqueValue )
This will give you a unique list of IDs, concatenated as you like. You can easily include other fields by adding them to each SELECT statement. Also, someUniqueValue can be either an existing unique field, a new unique field, or the concatenated pkid and fkself, if that combination is unique.

The only way I can think of to do this is to concatenate pkid and
fkid in order so that both row 1 and row 2 would concatenate to 1,4,
but I'm not sure how to do that, or if it is even possible.
You could do it using a CASE statement in Oracle:
SQL> SELECT * FROM sample
2 /
PKID FKSELF
---------- ----------
1 4
4 1
3 6
2 5
5 2
7 7
6 rows selected.
SQL> l
1 SELECT DISTINCT *
2 FROM (
3 SELECT CASE WHEN pkid <= fkself THEN pkid||','||fkself
4 ELSE fkself||','||pkid
5 END "JOINED"
6 FROM sample
7* )
SQL> /
JOINED
-------------------------------------------------------------------------------
1,4
2,5
3,6
7,7
