How to get the complete row from a maximum calculation? - sql

I do struggle with a GROUP BY -- again. The basics I can handle, but here it is: how do I get at columns other than the ones I named in the GROUP BY, without destroying my grouping? Note that GROUP BY is only my own idea; there may be other approaches that work better. It must work in Oracle, though.
Here is my example:
create table xxgroups (
groupid int not null primary key,
groupname varchar2(10)
);
insert into xxgroups values(100, 'Group 100');
insert into xxgroups values(200, 'Group 200');
drop table xxdata;
create table xxdata (
num1 int,
num2 int,
state_a int,
state_b int,
groupid int,
foreign key (groupid) references xxgroups(groupid)
);
-- "ranks" are 90, 40, null, 70:
insert into xxdata values(10, 10, 1, 4, 100);
insert into xxdata values(10, 10, 0, 4, 200);
insert into xxdata values(11, 11, 0, 3, 100);
insert into xxdata values(20, 22, 5, 7, 200);
The task is to create a result row for each distinct (num1, num2) pair and print the groupname belonging to the highest calculated "rank" from state_a and state_b.
Note that the first two rows have the same nums, so only the higher-ranking one should be selected -- with the groupname being "Group 100".
I got quite far with the basic group by, I think.
SELECT xd.num1||xd.num2 nummer, max(ranking.goodness)
FROM xxdata xd
, xxgroups xg
,( select state_a, state_b, r as goodness
from dual
model return updated rows
dimension by (0 state_a, 0 state_b) measures (0 r)
rules (r[1,4]=90, r[3,7]=80,r[5,7]=70, r[4,7]=60, r[0,7]=50, r[0,4]=40)
order by goodness desc
) ranking
WHERE xd.groupid=xg.groupid
and ranking.state_a (+) = xd.state_a
and ranking.state_b (+) = xd.state_b
GROUP BY xd.num1||xd.num2
ORDER BY nummer
;
The result is 90% of what I need:
nummer ranking
----------------
1010 90
1111
2022 70
100% perfect would be
nummer groupname
-------------------
1010 Group 100
1111 Group 100
2022 Group 200
The tricky part is that I want the groupname in the result. I cannot simply include it in the SELECT, because then I would have to put it into the GROUP BY as well -- which I do not want (then I would no longer be selecting the best-ranking entry across all groups).
In my solution I use a MODEL subquery to calculate the "rank". There are other solutions, I am sure. The point is that it is a non-trivial calculation that I do not want to do twice.
I know from other examples that one could use a second query to get back to the original row and its groupname, but I cannot see how I could do this here without duplicating my ranking calculation.
A nice suggestion was to replace the GROUP BY with a LIMIT 1 / ORDER BY goodness and use this calculating select as a filtering subselect. But a) there is no LIMIT in Oracle, and I doubt a rownum <= 1 would do in a subselect, and b) I cannot wrap my brain around it anyway. Maybe there is a way?
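For what it's worth, the general Oracle idiom for a LIMIT 1 seems to be applying ROWNUM outside an ordered inline view, since ROWNUM is assigned before ORDER BY within the same query block -- a minimal sketch on a hypothetical table t with a goodness column, not yet applied to my case:
-- order first inside the inline view, filter on ROWNUM outside it
select *
from ( select t.* from t order by t.goodness desc )
where rownum <= 1;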

You can use the FIRST aggregation modifier to selectively apply your function over a subset of rows of a group -- here a single row (SQLFiddle demo):
SELECT xd.num1||xd.num2 nummer,
MAX(xg.groupname) KEEP (DENSE_RANK FIRST
ORDER BY ranking.goodness DESC) grp,
max(ranking.goodness)
FROM xxdata xd
, xxgroups xg
,( select state_a, state_b, r as goodness
from dual
model return updated rows
dimension by (0 state_a, 0 state_b) measures (0 r)
rules (r[1,4]=90, r[3,7]=80,r[5,7]=70, r[4,7]=60, r[0,7]=50, r[0,4]=40)
order by goodness desc
) ranking
WHERE xd.groupid=xg.groupid
and ranking.state_a (+) = xd.state_a
and ranking.state_b (+) = xd.state_b
GROUP BY xd.num1||xd.num2
ORDER BY nummer;
Your method with analytics works as well but since we already use aggregations here, we may as well use the FIRST modifier to get all columns in one go.
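To see the modifier in isolation -- a minimal sketch against a hypothetical table scores(grp, name, points), picking each group's top-scoring name:
-- KEEP restricts the aggregate to the rows ranked FIRST by points DESC,
-- so MAX(name) runs over just the best-ranked row(s) of each group
SELECT grp,
       MAX(name) KEEP (DENSE_RANK FIRST ORDER BY points DESC) AS best_name,
       MAX(points) AS best_points
FROM scores
GROUP BY grp;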

Wow, I did search before, but only now found this answer, which I could adapt to my question. The Oracle solution here is OVER (PARTITION BY ... ORDER BY ...) with ROW_NUMBER():
select *
from ( select data.*, row_number()
over (partition by num1, nummer order by goodness desc) as seqnum -- partition by both nums: one group per distinct (num1, num2) pair
from (
SELECT xd.num1, xd.num2 nummer, xg.groupname, ranking.goodness
FROM xxdata xd
, xxgroups xg
,( select state_a, state_b, r as goodness
from dual
model return updated rows
dimension by (0 state_a, 0 state_b) measures (0 r)
rules (r[1,4]=90, r[3,7]=80,r[5,7]=70, r[4,7]=60, r[0,7]=50, r[0,4]=40)
) ranking
WHERE xd.groupid=xg.groupid
and ranking.state_a (+) = xd.state_a
and ranking.state_b (+) = xd.state_b
ORDER BY nummer
) data )
where seqnum = 1
;
The result is
10 10 Group 100 90     1
11 11 Group 100 (null) 1
20 22 Group 200 70     1
which is beautiful.
Now I have to try to understand what OVER in the SELECT exactly does....
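If I read it correctly so far: OVER computes the ranking for every row without collapsing rows the way GROUP BY does -- roughly, on a hypothetical table t(grp, val):
-- every input row survives; rn just numbers the rows within each grp
select grp, val,
       row_number() over (partition by grp order by val desc) as rn -- 1 = highest val within its grp
from t;
That is why the outer WHERE seqnum = 1 above keeps exactly the best row per group.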

Related

Count length of consecutive duplicate values for each id

I have a table as shown in the screenshot (first two columns) and I need to create a column like the last one. I'm trying to calculate the length of each sequence of consecutive values for each id.
For this, the last column is required. I played around with
row_number() over (partition by id, value)
but did not have much success, since the circled number was (quite predictably) computed as 2 instead of 1.
Please help!
First of all, we need to have a way to define how the rows are ordered. For example, in your sample data there is no way to be sure that the 'first' row (1, 1) will always be displayed before the 'second' row (1, 0).
That's why in my sample data I have added an identity column. In your real case, the rows can be ordered by a row ID, a date column or something else, but you need to ensure the rows can be sorted by some unique criterion.
So, the task is pretty simple:
calculate a trigger switch - set to 1 whenever the value changes
calculate groups - a running sum of the switches
calculate row numbers within each group
That's it. I have used common table expressions and left all columns in, to make the logic easy for you to understand. You are free to break this into separate statements and remove some of the columns.
DECLARE #DataSource TABLE
(
[RowID] INT IDENTITY(1, 1)
,[ID] INT
,[value] INT
);
INSERT INTO #DataSource ([ID], [value])
VALUES (1, 1)
,(1, 0)
,(1, 0)
,(1, 1)
,(1, 1)
,(1, 1)
--
,(2, 0)
,(2, 1)
,(2, 0)
,(2, 0);
WITH DataSourceWithSwitch AS
(
SELECT *
,IIF(LAG([value]) OVER (PARTITION BY [ID] ORDER BY [RowID]) = [value], 0, 1) AS [Switch]
FROM #DataSource
), DataSourceWithGroup AS
(
SELECT *
,SUM([Switch]) OVER (PARTITION BY [ID] ORDER BY [RowID]) AS [Group]
FROM DataSourceWithSwitch
)
SELECT *
,ROW_NUMBER() OVER (PARTITION BY [ID], [Group] ORDER BY [RowID]) AS [GroupRowID]
FROM DataSourceWithGroup
ORDER BY [RowID];
You want results that depend on the actual ordering of the data in the data source. In SQL you operate on relations, sometimes on ordered sets of relation rows. Your desired end result is not well-defined in terms of SQL unless you introduce an additional column in your source table over which your data is ordered (e.g. an auto-increment or timestamp column).
Note: this answers the original question and doesn't take into account additional timestamp column mentioned in the comment. I'm not updating my answer since there is already an accepted answer.
One way to solve it could be through a recursive CTE:
create table #tmp (i int identity,id int, value int, rn int);
insert into #tmp (id,value) VALUES
(1,1),(1,0),(1,0),(1,1),(1,1),(1,1),
(2,0),(2,1),(2,0),(2,0);
WITH numbered AS (
SELECT i,id,value, 1 seq FROM #tmp WHERE i=1 UNION ALL
SELECT a.i,a.id,a.value, CASE WHEN a.id=b.id AND a.value=b.value THEN b.seq+1 ELSE 1 END
FROM #tmp a INNER JOIN numbered b ON a.i=b.i+1
)
SELECT * FROM numbered -- OPTION (MAXRECURSION 1000)
This will return the following:
i id value seq
1 1 1 1
2 1 0 1
3 1 0 2
4 1 1 1
5 1 1 2
6 1 1 3
7 2 0 1
8 2 1 1
9 2 0 1
10 2 0 2
See my little demo here: https://rextester.com/ZZEIU93657
A prerequisite for the CTE to work is a sequenced table (e.g. a table with an identity column in it) as a source. In my example I introduced the column i for this. As a starting point I need to find the first entry of the source table - in my case the entry with i=1.
For a longer source table you might run into a recursion-limit error, as the default for MAXRECURSION is 100. In this case you should uncomment the OPTION setting behind my SELECT clause above. You can either set it to a higher value (like shown) or switch the limit off completely by setting it to 0, as shown below.
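For instance, switching the recursion limit off entirely would look like this:
SELECT * FROM numbered
OPTION (MAXRECURSION 0); -- 0 = no limit on recursion depth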
IMHO, this is easier to do with a cursor and a loop.
Maybe there is a way to do the job with a self-join:
declare #t table (id int, val int)
insert into #t (id, val)
select 1 as id, 1 as val
union all select 1, 0
union all select 1, 0
union all select 1, 1
union all select 1, 1
union all select 1, 1
;with cte1 (id , val , num ) as
(
select id, val, row_number() over (ORDER BY (SELECT 1)) as num from #t
)
, cte2 (id, val, num, N) as
(
select id, val, num, 1 from cte1 where num = 1
union all
select t1.id, t1.val, t1.num,
case when t1.id=t2.id and t1.val=t2.val then t2.N + 1 else 1 end
from cte1 t1 inner join cte2 t2 on t1.num = t2.num + 1 where t1.num > 1
)
select * from cte2

Identify Consecutive Chunks in SQL Server Table

I have this table:
ValueId bigint // (identity) item ID
ListId bigint // group ID
ValueDelta int // item value
ValueCreated datetime2 // item created
What I need is to find consecutive Values within the same Group ordered by Created, not ID. Created and ID are not guaranteed to be in the same order.
So the output should be:
ListID bigint
FirstId bigint // from this ID (first in LID with Value ordered by Date)
LastId bigint // to this ID (last in LID with Value ordered by Date)
ValueDelta int // all share this value
ValueCount // and this many occurrences (number of items between FirstId and LastId)
I can do this with Cursors but I'm sure that's not the best idea so I'm wondering if this can be done in a query.
Please, for the answer (if any), explain it a bit.
UPDATE: SQLfiddle basic data set
It does look like a gaps-and-islands problem.
Here is one way to do it. It would likely work faster than your variant.
The standard idea for gaps-and-islands is to generate two sets of row numbers, partitioning them in two different ways. The difference between such row numbers (rn1-rn2) remains the same within each consecutive chunk. Run the query below CTE-by-CTE and examine the intermediate results to see what is going on.
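For instance, with the sample data posted further down the page: within ListId 1, ordered by ValueCreated, the three ValueDelta = 0 rows get rn1 = 3,4,5 and rn2 = 1,2,3, so rn1-rn2 = 2 for the whole chunk; the trailing ValueDelta = 1 row gets rn1 = 8 but rn2 = 3, so its difference (5) keeps it apart from the leading chunk of ValueDelta = 1 rows, whose difference is 0.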
WITH
CTE_RN
AS
(
SELECT
[ValueId]
,[ListId]
,[ValueDelta]
,[ValueCreated]
,ROW_NUMBER() OVER (PARTITION BY ListID ORDER BY ValueCreated) AS rn1
,ROW_NUMBER() OVER (PARTITION BY ListID, [ValueDelta] ORDER BY ValueCreated) AS rn2
FROM [Value]
)
SELECT
ListID
,MIN(ValueID) AS FirstID
,MAX(ValueID) AS LastID
,MIN(ValueCreated) AS FirstCreated
,MAX(ValueCreated) AS LastCreated
,ValueDelta
,COUNT(*) AS ValueCount
FROM CTE_RN
GROUP BY
ListID
,ValueDelta
,rn1-rn2
ORDER BY
FirstCreated
;
This query produces the same result as yours on your sample data set.
It is not quite clear whether FirstID and LastID can simply be MIN and MAX, or whether they must really come from the first and last rows when ordered by ValueCreated. If you need the true first and last, the query becomes a bit more complicated.
In your original sample data set the "first" and "min" values for FirstID coincide. Let's change the sample data set a little to highlight the difference:
insert into [Value]
([ListId], [ValueDelta], [ValueCreated])
values
(1, 1, '2019-01-01 01:01:02'), -- 1.1
(1, 0, '2019-01-01 01:02:01'), -- 2.1
(1, 0, '2019-01-01 01:03:01'), -- 2.2
(1, 0, '2019-01-01 01:04:01'), -- 2.3
(1, -1, '2019-01-01 01:05:01'), -- 3.1
(1, -1, '2019-01-01 01:06:01'), -- 3.2
(1, 1, '2019-01-01 01:01:01'), -- 1.2
(1, 1, '2019-01-01 01:08:01'), -- 4.2
(2, 1, '2019-01-01 01:08:01') -- 5.1
;
All I did was swap the ValueCreated between the first and seventh rows, so now the FirstID of the first group is 7 and its LastID is 1. Your query returns the correct result; my simple query above doesn't.
Here is the variant that produces correct result. I decided to use FIRST_VALUE and LAST_VALUE functions to get the appropriate IDs. Again, run the query CTE-by-CTE and examine intermediate results to see what is going on.
This variant produces the same result as your query even with the adjusted sample data set.
WITH
CTE_RN
AS
(
SELECT
[ValueId]
,[ListId]
,[ValueDelta]
,[ValueCreated]
,ROW_NUMBER() OVER (PARTITION BY ListID ORDER BY ValueCreated) AS rn1
,ROW_NUMBER() OVER (PARTITION BY ListID, ValueDelta ORDER BY ValueCreated) AS rn2
FROM [Value]
)
,CTE2
AS
(
SELECT
ValueId
,ListId
,ValueDelta
,ValueCreated
,rn1
,rn2
,rn1-rn2 AS Diff
,FIRST_VALUE(ValueID) OVER(
PARTITION BY ListID, ValueDelta, rn1-rn2 ORDER BY ValueCreated
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FirstID
,LAST_VALUE(ValueID) OVER(
PARTITION BY ListID, ValueDelta, rn1-rn2 ORDER BY ValueCreated
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastID
FROM CTE_RN
)
SELECT
ListID
,FirstID
,LastID
,MIN(ValueCreated) AS FirstCreated
,MAX(ValueCreated) AS LastCreated
,ValueDelta
,COUNT(*) AS ValueCount
FROM CTE2
GROUP BY
ListID
,ValueDelta
,rn1-rn2
,FirstID
,LastID
ORDER BY FirstCreated;
Use a CTE that adds a Row_Number column, partitioned by GroupId and Value and ordered by Created.
Then select from the CTE, GROUP BY GroupId and Value; use COUNT(*) to get the Count, and use correlated subqueries to select the ValueId with the MIN(RowNumber) (which will always be 1, so you can just use that instead of MIN) and the MAX(RowNumber) to get FirstId and LastId.
Although, now that I've noticed you're using SQL Server 2017, you should be able to use First_Value() and Last_Value() instead of correlated subqueries.
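A rough sketch of that idea (untested; column names taken from the question; note that partitioning by ListId and ValueDelta alone lumps together non-adjacent runs with the same value, so for strictly consecutive chunks you would still combine this with the row-number difference from the other answer):
WITH cte AS
(
    SELECT *
          ,FIRST_VALUE(ValueId) OVER (PARTITION BY ListId, ValueDelta
               ORDER BY ValueCreated) AS FirstId
          ,LAST_VALUE(ValueId) OVER (PARTITION BY ListId, ValueDelta
               ORDER BY ValueCreated
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastId
    FROM [Value]
)
SELECT ListId, FirstId, LastId, ValueDelta, COUNT(*) AS ValueCount
FROM cte
GROUP BY ListId, ValueDelta, FirstId, LastId;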
After many iterations I think I have a working solution. I'm absolutely sure it's far from optimal but it works.
Link is here: http://sqlfiddle.com/#!18/4ee9f/3
Sample data:
create table [Value]
(
[ValueId] bigint not null identity(1,1),
[ListId] bigint not null,
[ValueDelta] int not null,
[ValueCreated] datetime2 not null,
constraint [PK_Value] primary key clustered ([ValueId])
);
insert into [Value]
([ListId], [ValueDelta], [ValueCreated])
values
(1, 1, '2019-01-01 01:01:01'), -- 1.1
(1, 0, '2019-01-01 01:02:01'), -- 2.1
(1, 0, '2019-01-01 01:03:01'), -- 2.2
(1, 0, '2019-01-01 01:04:01'), -- 2.3
(1, -1, '2019-01-01 01:05:01'), -- 3.1
(1, -1, '2019-01-01 01:06:01'), -- 3.2
(1, 1, '2019-01-01 01:01:02'), -- 1.2
(1, 1, '2019-01-01 01:08:01'), -- 4.2
(2, 1, '2019-01-01 01:08:01') -- 5.1
The Query that seems to work:
-- this is the actual order of data
select *
from [Value]
order by [ListId] asc, [ValueCreated] asc;
-- there are 4 sets here
-- set 1 GroupId=1, Id=1&7, Value=1
-- set 2 GroupId=1, Id=2-4, Value=0
-- set 3 GroupId=1, Id=5-6, Value=-1
-- set 4 GroupId=1, Id=8-8, Value=1
-- set 5 GroupId=2, Id=9-9, Value=1
with [cte1] as
(
select [v1].[ListId]
,[v2].[ValueId] as [FirstId], [v2].[ValueCreated] as [FirstCreated]
,[v1].[ValueId] as [LastId], [v1].[ValueCreated] as [LastCreated]
,isnull([v1].[ValueDelta], 0) as [ValueDelta]
from [dbo].[Value] [v1]
join [dbo].[Value] [v2] on [v2].[ListId] = [v1].[ListId]
and isnull([v2].[ValueDelta], 0) = isnull([v1].[ValueDelta], 0) -- match on the delta value
and [v2].[ValueCreated] <= [v1].[ValueCreated] and not exists (
select 1
from [dbo].[Value] [v3]
where 1=1
and ([v3].[ListId] = [v1].[ListId])
and ([v3].[ValueCreated] between [v2].[ValueCreated] and [v1].[ValueCreated])
and [v3].[ValueDelta] != [v1].[ValueDelta]
)
), [cte2] as
(
select [t1].*
from [cte1] [t1]
where not exists (select 1 from [cte1] [t2] where [t2].[ListId] = [t1].[ListId]
and ([t1].[FirstId] != [t2].[FirstId] or [t1].[LastId] != [t2].[LastId])
and [t1].[FirstCreated] between [t2].[FirstCreated] and [t2].[LastCreated]
and [t1].[LastCreated] between [t2].[FirstCreated] and [t2].[LastCreated]
)
)
select [ListId], [FirstId], [LastId], [FirstCreated], [LastCreated], [ValueDelta] as [ValueDelta]
,(select count(*) from [dbo].[Value] where [ListId] = [t].[ListId] and [ValueCreated] between [t].[FirstCreated] and [t].[LastCreated]) as [ValueCount]
from [cte2] [t];
How it works:
join the table to itself on the same list, but only on older (or equal, to handle single-row sets) dates
join to itself again and exclude any overlapping ranges, keeping only the largest date range
once the largest sets are identified, count the entries between each set's dates
If anyone can find a better / friendlier solution, you get the answer.
PS: The dumb straightforward Cursor approach seems a lot faster than this. Still testing.

Returning the row with the most recent timestamp from each group

I have a table (Postgres 9.3) defined as follows:
CREATE TABLE tsrs (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL REFERENCES customers,
timestamp TIMESTAMP WITHOUT TIME ZONE,
licensekeys_checksum VARCHAR(32));
The pertinent details here are the customer_id, the timestamp, and the licensekeys_checksum. There can be multiple entries with the same customer_id; some of those may have matching licensekeys_checksum entries, and some may differ. There will never be rows with equal checksums and equal timestamps.
I want to return a table containing 1 row for each group of rows with matching licensekeys_checksum entries. The row returned for each group should be the one with the newest / most recent timestamp.
Sample Input:
1, 2, 2014-08-21 16:03:35, 3FF2561A
2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
4, 5, 2013-12-13 08:14:26, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891
Desired Output:
2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891
I have managed to piece together a query based on the comments below, and hours of searching on the internet. :)
select * from tsrs
inner join (
select licensekeys_checksum, max(timestamp) as mts
from tsrs
group by licensekeys_checksum
) x on x.licensekeys_checksum = tsrs.licensekeys_checksum
and x.mts = tsrs.timestamp;
It seems to work, but I am unsure. Am I on the right track?
Your query in the question should perform better than the queries in the (previously) accepted answer. Test with EXPLAIN ANALYZE.
DISTINCT ON is typically simpler and faster:
SELECT DISTINCT ON (licensekeys_checksum) *
FROM tsrs
ORDER BY licensekeys_checksum, timestamp DESC NULLS LAST;
Detailed explanation:
Select first row in each GROUP BY group?
Alternative deduplication, using NOT EXISTS(...)
SELECT *
FROM tsrs t
WHERE NOT EXISTS (
SELECT *
FROM tsrs x
WHERE x.customer_id = t.customer_id -- same customer
AND x.licensekeys_checksum = t.licensekeys_checksum -- same checksum
AND x.timestamp > t.timestamp -- but more recent
);
Try this
select *
from tsrs
where (timestamp,licensekeys_checksum) in (
select max(timestamp)
,licensekeys_checksum
from tsrs
group by licensekeys_checksum)
or
with cte as (
select id
,customer_id
,timestamp
,licensekeys_checksum
,row_number () over (partition by licensekeys_checksum ORDER BY timestamp DESC) as rk
from tsrs)
select id
,customer_id
,timestamp
,licensekeys_checksum
from cte where rk=1 order by id
Reference: Window Functions, row_number(), and CTE

DB2 SQL - median with GROUP BY

First of all, I am running on DB2 for i5/OS V5R4. I have ROW_NUMBER(), RANK() and common table expressions. I do not have TOP n PERCENT or LIMIT OFFSET.
The actual data set I'm working with is hard to explain, so let's just say I have a weather history table where the columns are (city, temperature, timestamp). I want to compare medians to averages for each group (city).
This was the cleanest way I found to get a median for a whole table aggregation. I adapted it from the IBM Redbook here:
WITH base_t AS
( SELECT temperature, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t
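For example, with four rows base_count is 5, so FLOOR(5/2e0) = 2 and CEILING(5/2e0) = 3 (the 2e0 forces floating-point division), and the median is the average of the 2nd and 3rd values; with five rows base_count is 6, both expressions yield 3, and the single middle row is picked.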
That works well for getting a single row back, but it seems to fall apart for grouping. Conceptually, this is what I want:
SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...
city | mean_temp | median_temp
===================================================
'Minneapolis' | 60 | 64
'Milwaukee' | 65 | 66
'Muskegon' | 70 | 61
There could be an answer that makes me look stupid, but I'm having a mental block and this isn't my #1 thing to work on right now. It seems like it should be possible, but I can't use anything extremely complex, since it's a large table and I want to be able to customize which columns are aggregated.
In SQL Server, aggregate functions like COUNT(*) can be partitioned and calculated without a GROUP BY. I looked quickly through the referenced Redbook, and it looks like DB2 has the same feature. But if not, then this won't work:
create table TemperatureHistory
(City varchar(20)
, Temperature decimal(5, 2)
, DateTaken datetime)
insert into TemperatureHistory values ('Minneapolis', 61, '20090101')
insert into TemperatureHistory values ('Minneapolis', 59, '20090102')
insert into TemperatureHistory values ('Milwaukee', 65, '20090101')
insert into TemperatureHistory values ('Milwaukee', 65, '20090102')
insert into TemperatureHistory values ('Milwaukee', 100, '20090103')
insert into TemperatureHistory values ('Muskegon', 80, '20090101')
insert into TemperatureHistory values ('Muskegon', 70, '20090102')
insert into TemperatureHistory values ('Muskegon', 70, '20090103')
insert into TemperatureHistory values ('Muskegon', 20, '20090104')
; with base_t as
(select city
, Temperature
, row_number() over (partition by city order by temperature) as RowNum
, (count(*) over (partition by city)) + 1 as CountPlusOne
from TemperatureHistory)
select City
, avg(Temperature) as MeanTemp
, avg(case
when RowNum in (FLOOR(CountPlusOne/2.0), CEILING(CountPlusOne/2.0))
then Temperature
else null end) as MedianTemp
from base_t
group by City

Simple Query to Grab Max Value for each ID

OK I have a table like this:
ID Signal Station OwnerID
111 -120 Home 1
111 -130 Car 1
111 -135 Work 2
222 -98 Home 2
222 -95 Work 1
222 -103 Work 2
This is all for the same day. I just need the Query to return the max signal for each ID:
ID Signal Station OwnerID
111 -120 Home 1
222 -95 Work 1
I tried using MAX() and the aggregation messes up with the Station and OwnerID being different for each record. Do I need to do a JOIN?
Something like this? Join your table with itself, and exclude the rows for which a higher signal was found.
select cur.id, cur.signal, cur.station, cur.ownerid
from yourtable cur
where not exists (
select *
from yourtable high
where high.id = cur.id
and high.signal > cur.signal
)
This would list one row for each highest signal, so there might be multiple rows per id.
You are doing a group-wise maximum/minimum operation. This is a common trap: it feels like something that should be easy to do, but in SQL it aggravatingly isn't.
There are a number of approaches (both standard ANSI and vendor-specific) to this problem, most of which are sub-optimal in many situations. Some will give you multiple rows when more than one row shares the same maximum/minimum value; some won't. Some work well on tables with a small number of groups; others are more efficient for a larger number of groups with smaller rows per group.
Here's a discussion of some of the common ones (MySQL-biased but generally applicable). Personally, if I know there are no multiple maxima (or don't care about getting them) I often tend towards the null-left-self-join method, which I'll post as no-one else has yet:
SELECT reading.ID, reading.Signal, reading.Station, reading.OwnerID
FROM readings AS reading
LEFT JOIN readings AS highersignal
ON highersignal.ID=reading.ID AND highersignal.Signal>reading.Signal
WHERE highersignal.ID IS NULL;
In classic SQL-92 (not using the OLAP operations used by Quassnoi), you can use:
SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM t
GROUP BY id) AS g
JOIN t ON g.id = t.id AND g.MaxSignal = t.Signal;
(Unchecked syntax; assumes your table is 't'.)
The sub-query in the FROM clause identifies the maximum signal value for each id; the join combines that with the corresponding data row from the main table.
NB: if there are several entries for a specific ID that all have the same signal strength and that strength is the MAX(), then you will get several output rows for that ID.
Tested against IBM Informix Dynamic Server 11.50.FC3 running on Solaris 10:
+ CREATE TEMP TABLE signal_info
(
id INTEGER NOT NULL,
signal INTEGER NOT NULL,
station CHAR(5) NOT NULL,
ownerid INTEGER NOT NULL
);
+ INSERT INTO signal_info VALUES(111, -120, 'Home', 1);
+ INSERT INTO signal_info VALUES(111, -130, 'Car' , 1);
+ INSERT INTO signal_info VALUES(111, -135, 'Work', 2);
+ INSERT INTO signal_info VALUES(222, -98 , 'Home', 2);
+ INSERT INTO signal_info VALUES(222, -95 , 'Work', 1);
+ INSERT INTO signal_info VALUES(222, -103, 'Work', 2);
+ SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM signal_info
GROUP BY id) AS g
JOIN signal_info AS t ON g.id = t.id AND g.MaxSignal = t.Signal;
111 -120 Home 1
222 -95 Work 1
I named the table Signal_Info for this test - but it seems to produce the right answer.
This only shows that there is at least one DBMS that supports the notation. However, I am a little surprised that MS SQL Server does not - which version are you using?
It never ceases to surprise me how often SQL questions are submitted without table names.
WITH q AS
(
SELECT c.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY signal DESC) rn
FROM mytable c
)
SELECT *
FROM q
WHERE rn = 1
This will return one row even if there are duplicates of MAX(signal) for a given ID.
Having an index on (id, signal) will greatly improve this query.
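For instance (syntax varies by DBMS; this is the SQL Server form, on the hypothetical table name mytable used above):
CREATE INDEX IX_mytable_id_signal ON mytable (id, signal DESC);
The descending key matches the ORDER BY signal DESC in the window, so each id's rows can be read pre-sorted.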
with tab(id, sig, sta, oid) as
(
select 111 as id, -120 as signal, 'Home' as station, 1 as ownerId union all
select 111, -130, 'Car', 1 union all
select 111, -135, 'Work', 2 union all
select 222, -98, 'Home', 2 union all
select 222, -95, 'Work', 1 union all
select 222, -103, 'Work', 2
) ,
tabG(id, maxS) as
(
select id, max(sig) as sig from tab group by id
)
select g.*, p.* from tabG g
cross apply ( select top(1) * from tab t where t.id=g.id order by t.sig desc ) p
We can do this using a self-join:
SELECT T1.ID,T1.Signal,T2.Station,T2.OwnerID
FROM (select ID,max(Signal) as Signal from mytable group by ID) T1
LEFT JOIN mytable T2
ON T1.ID=T2.ID and T1.Signal=T2.Signal;
Or you can also use the following query
SELECT t0.ID,t0.Signal,t0.Station,t0.OwnerID
FROM mytable t0
LEFT JOIN mytable t1 ON t0.ID=t1.ID AND t1.Signal>t0.Signal
WHERE t1.ID IS NULL;
select a.id, b.signal, a.station, a.ownerid
from mytable a
join (SELECT ID, MAX(Signal) as Signal FROM mytable GROUP BY ID) b
on a.id = b.id AND a.Signal = b.Signal
SELECT * FROM StatusTable st
WHERE Signal IN (
SELECT A.maxSignal FROM
(
SELECT ID, MAX(Signal) AS maxSignal
FROM StatusTable
GROUP BY ID
) AS A
WHERE A.ID = st.ID -- correlate on ID, otherwise a row could match another ID's maximum
);
select
id,
signal,
station,
ownerId
FROM (
select *, rank() over (partition by id order by signal desc) as signal_rank
from mytable
) ranked
where signal_rank = 1;