Simple Query to Grab Max Value for each ID - sql

OK I have a table like this:
ID Signal Station OwnerID
111 -120 Home 1
111 -130 Car 1
111 -135 Work 2
222 -98 Home 2
222 -95 Work 1
222 -103 Work 2
This is all for the same day. I just need the Query to return the max signal for each ID:
ID Signal Station OwnerID
111 -120 Home 1
222 -95 Work 1
I tried using MAX() and the aggregation messes up with the Station and OwnerID being different for each record. Do I need to do a JOIN?

Something like this? Join your table with itself, and exclude the rows for which a higher signal was found.
select cur.id, cur.signal, cur.station, cur.ownerid
from yourtable cur
where not exists (
select *
from yourtable high
where high.id = cur.id
and high.signal > cur.signal
)
This would list one row for each highest signal, so there might be multiple rows per id.

You are doing a group-wise maximum/minimum operation. This is a common trap: it feels like something that should be easy to do, but in SQL it aggravatingly isn't.
There are a number of approaches (both standard ANSI and vendor-specific) to this problem, most of which are sub-optimal in many situations. Some will give you multiple rows when more than one row shares the same maximum/minimum value; some won't. Some work well on tables with a small number of groups; others are more efficient for a larger number of groups with smaller rows per group.
Here's a discussion of some of the common ones (MySQL-biased but generally applicable). Personally, if I know there are no multiple maxima (or don't care about getting them) I often tend towards the null-left-self-join method, which I'll post as no-one else has yet:
SELECT reading.ID, reading.Signal, reading.Station, reading.OwnerID
FROM readings AS reading
LEFT JOIN readings AS highersignal
ON highersignal.ID=reading.ID AND highersignal.Signal>reading.Signal
WHERE highersignal.ID IS NULL;

In classic SQL-92 (not using the OLAP operations used by Quassnoi), then you can use:
SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM t
GROUP BY id) AS g
JOIN t ON g.id = t.id AND g.MaxSignal = t.Signal;
(Unchecked syntax; assumes your table is 't'.)
The sub-query in the FROM clause identifies the maximum signal value for each id; the join combines that with the corresponding data row from the main table.
NB: if there are several entries for a specific ID that all have the same signal strength and that strength is the MAX(), then you will get several output rows for that ID.
Tested against IBM Informix Dynamic Server 11.50.FC3 running on Solaris 10:
+ CREATE TEMP TABLE signal_info
(
id INTEGER NOT NULL,
signal INTEGER NOT NULL,
station CHAR(5) NOT NULL,
ownerid INTEGER NOT NULL
);
+ INSERT INTO signal_info VALUES(111, -120, 'Home', 1);
+ INSERT INTO signal_info VALUES(111, -130, 'Car' , 1);
+ INSERT INTO signal_info VALUES(111, -135, 'Work', 2);
+ INSERT INTO signal_info VALUES(222, -98 , 'Home', 2);
+ INSERT INTO signal_info VALUES(222, -95 , 'Work', 1);
+ INSERT INTO signal_info VALUES(222, -103, 'Work', 2);
+ SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM signal_info
GROUP BY id) AS g
JOIN signal_info AS t ON g.id = t.id AND g.MaxSignal = t.Signal;
111 -120 Home 1
222 -95 Work 1
I named the table Signal_Info for this test - but it seems to produce the right answer.
This only shows that there is at least one DBMS that supports the notation. However, I am a little surprised that MS SQL Server does not - which version are you using?
It never ceases to surprise me how often SQL questions are submitted without table names.

WITH q AS
(
SELECT c.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY signal DESC) rn
FROM mytable
)
SELECT *
FROM q
WHERE rn = 1
This will return one row even if there are duplicates of MAX(signal) for a given ID.
Having an index on (id, signal) will greatly improve this query.

with tab(id, sig, sta, oid) as
(
select 111 as id, -120 as signal, 'Home' as station, 1 as ownerId union all
select 111, -130, 'Car', 1 union all
select 111, -135, 'Work', 2 union all
select 222, -98, 'Home', 2 union all
select 222, -95, 'Work', 1 union all
select 222, -103, 'Work', 2
) ,
tabG(id, maxS) as
(
select id, max(sig) as sig from tab group by id
)
select g.*, p.* from tabG g
cross apply ( select top(1) * from tab t where t.id=g.id order by t.sig desc ) p

We can do using self join
SELECT T1.ID,T1.Signal,T2.Station,T2.OwnerID
FROM (select ID,max(Signal) as Signal from mytable group by ID) T1
LEFT JOIN mytable T2
ON T1.ID=T2.ID and T1.Signal=T2.Signal;
Or you can also use the following query
SELECT t0.ID,t0.Signal,t0.Station,t0.OwnerID
FROM mytable t0
LEFT JOIN mytable t1 ON t0.ID=t1.ID AND t1.Signal>t0.Signal
WHERE t1.ID IS NULL;

select a.id, b.signal, a.station, a.owner from
mytable a
join
(SELECT ID, MAX(Signal) as Signal FROM mytable GROUP BY ID) b
on a.id = b.id AND a.Signal = b.Signal

SELECT * FROM StatusTable
WHERE Signal IN (
SELECT A.maxSignal FROM
(
SELECT ID, MAX(Signal) AS maxSignal
FROM StatusTable
GROUP BY ID
) AS A
);

select
id,
max_signal,
owner,
ownerId
FROM (
select * , rank() over(partition by id order by signal desc) as max_signal from table
)
where max_signal = 1;

Related

LIMIT by distinct values in PostgreSQL

I have a table of contacts with phone numbers similar to this:
Name Phone
Alice 11
Alice 33
Bob 22
Bob 44
Charlie 12
Charlie 55
I can't figure out how to query such a table with LIMITing the rows not just by plain count but by distinct names. For example, if I had a magic LIMIT_BY clause, it would work like this:
SELECT * FROM "Contacts" ORDER BY "Phone" LIMIT_BY("Name") 1
Alice 11
Alice 33
-- ^ only the first contact
SELECT * FROM "Contacts" ORDER BY "Phone" LIMIT_BY("Name") 2
Alice 11
Charlie 12
Alice 33
Charlie 55
-- ^ now with Charlie because his phone 12 goes right after 11. Bob isn't here because he's third, beyond the limit
How could I achieve this result?
In other words, select all rows containing top N distinct Names ordered by Phone
I don't think that PostgreSQL provides any particularly efficient way to do this, but for 6 rows it doesn't need to be very efficient. You could do a subquery to compute which people you want to see, then join that subquery back against the full table.
select * from
"Contacts" join
(select name from "Contacts" group by name order by min(phone) limit 2) as limited
using (name)
You could put the subquery in an IN-list rather than a JOIN, but that often performs worse.
If you want all names that are in the first n rows, you can use in:
select t.*
from t
where t.name in (select t2.name
from t t2
order by t2.phone
limit 2
);
If you want the first n names by phone:
select t.*
from t
where t.name in (select t2.name
from t t2
group by t2.name
order by min(t2.phone)
limit 2
);
try this:
SELECT distinct X.name
,X.phone
FROM (
SELECT *
FROM (
SELECT name
,rn
FROM (
SELECT name
,phone
,row_number() OVER (
ORDER BY phone
) rn
FROM "Contacts"
) AA
) DD
WHERE rn <= 2 --rn is the "limit" variable
) EE
,"Contacts" X
WHERE EE.name = X.name
above seems to be working correctly on following dataset:
create table "Contacts" (name text, phone text);
insert into "Contacts" (name, phone) VALUES
('Alice', '11'),
('Alice', '33'),
('Bob', '22'),
('Bob', '44'),
('Charlie', '13'),
('Charlie', '55'),
('Dennis', '12'),
('Dennis', '66');

Count length of consecutive duplicate values for each id

I have a table as shown in the screenshot (first two columns) and I need to create a column like the last one. I'm trying to calculate the length of each sequence of consecutive values for each id.
For this, the last column is required. I played around with
row_number() over (partition by id, value)
but did not have much success, since the circled number was (quite predictably) computed as 2 instead of 1.
Please help!
First of all, we need to have a way to defined how the rows are ordered. For example, in your sample data there is not way to be sure that 'first' row (1, 1) will be always displayed before the 'second' row (1,0).
That's why in my sample data I have added an identity column. In your real case, the details can be order by row ID, date column or something else, but you need to ensure the rows can be sorted via unique criteria.
So, the task is pretty simple:
calculate trigger switch - when value is changed
calculate groups
calculate rows
That's it. I have used common table expression and leave all columns in order to be easy for you to understand the logic. You are free to break this in separate statements and remove some of the columns.
DECLARE #DataSource TABLE
(
[RowID] INT IDENTITY(1, 1)
,[ID]INT
,[value] INT
);
INSERT INTO #DataSource ([ID], [value])
VALUES (1, 1)
,(1, 0)
,(1, 0)
,(1, 1)
,(1, 1)
,(1, 1)
--
,(2, 0)
,(2, 1)
,(2, 0)
,(2, 0);
WITH DataSourceWithSwitch AS
(
SELECT *
,IIF(LAG([value]) OVER (PARTITION BY [ID] ORDER BY [RowID]) = [value], 0, 1) AS [Switch]
FROM #DataSource
), DataSourceWithGroup AS
(
SELECT *
,SUM([Switch]) OVER (PARTITION BY [ID] ORDER BY [RowID]) AS [Group]
FROM DataSourceWithSwitch
)
SELECT *
,ROW_NUMBER() OVER (PARTITION BY [ID], [Group] ORDER BY [RowID]) AS [GroupRowID]
FROM DataSourceWithGroup
ORDER BY [RowID];
You want results that are dependent on actual data ordering in the data source. In SQL you operate on relations, sometimes on ordered set of relations rows. Your desired end result is not well-defined in terms of SQL, unless you introduce an additional column in your source table, over which your data is ordered (e.g. auto-increment or some timestamp column).
Note: this answers the original question and doesn't take into account additional timestamp column mentioned in the comment. I'm not updating my answer since there is already an accepted answer.
One way to solve it could be through a recursive CTE:
create table #tmp (i int identity,id int, value int, rn int);
insert into #tmp (id,value) VALUES
(1,1),(1,0),(1,0),(1,1),(1,1),(1,1),
(2,0),(2,1),(2,0),(2,0);
WITH numbered AS (
SELECT i,id,value, 1 seq FROM #tmp WHERE i=1 UNION ALL
SELECT a.i,a.id,a.value, CASE WHEN a.id=b.id AND a.value=b.value THEN b.seq+1 ELSE 1 END
FROM #tmp a INNER JOIN numbered b ON a.i=b.i+1
)
SELECT * FROM numbered -- OPTION (MAXRECURSION 1000)
This will return the following:
i id value seq
1 1 1 1
2 1 0 1
3 1 0 2
4 1 1 1
5 1 1 2
6 1 1 3
7 2 0 1
8 2 1 1
9 2 0 1
10 2 0 2
See my little demo here: https://rextester.com/ZZEIU93657
A prerequisite for the CTE to work is a sequenced table (e. g. a table with an identitycolumn in it) as a source. In my example I introduced the column i for this. As a starting point I need to find the first entry of the source table. In my case this was the entry with i=1.
For a longer source table you might run into a recursion-limit error as the default for MAXRECURSION is 100. In this case you should uncomment the OPTION setting behind my SELECT clause above. You can either set it to a higher value (like shown) or switch it off completely by setting it to 0.
IMHO, this is easier to do with cursor and loop.
may be there is a way to do the job with selfjoin
declare #t table (id int, val int)
insert into #t (id, val)
select 1 as id, 1 as val
union all select 1, 0
union all select 1, 0
union all select 1, 1
union all select 1, 1
union all select 1, 1
;with cte1 (id , val , num ) as
(
select id, val, row_number() over (ORDER BY (SELECT 1)) as num from #t
)
, cte2 (id, val, num, N) as
(
select id, val, num, 1 from cte1 where num = 1
union all
select t1.id, t1.val, t1.num,
case when t1.id=t2.id and t1.val=t2.val then t2.N + 1 else 1 end
from cte1 t1 inner join cte2 t2 on t1.num = t2.num + 1 where t1.num > 1
)
select * from cte2

Firebird select from table distinct one field

The question I asked yesterday was simplified but I realize that I have to report the whole story.
I have to extract the data of 4 from 4 different tables into a Firebird 2.5 database and the following query works:
SELECT
PRODUZIONE_T t.CODPRODUZIONE,
PRODUZIONE_T.NUMEROCOMMESSA as numeroco,
ANGCLIENTIFORNITORI.RAGIONESOCIALE1,
PRODUZIONE_T.DATACONSEGNA,
PRODUZIONE_T.REVISIONE,
ANGUTENTI.NOMINATIVO,
ORDINI.T_DATA,
FROM PRODUZIONE_T
LEFT OUTER JOIN ORDINI_T ON PRODUZIONE_T.CODORDINE=ORDINI_T.CODORDINE
INNER JOIN ANGCLIENTIFORNITORI ON ANGCLIENTIFORNITORI.CODCLIFOR=ORDINI_T.CODCLIFOR
LEFT OUTER JOIN ANGUTENTI ON ANGUTENTI.IDUTENTE = PRODUZIONE_T.RESPONSABILEUC
ORDER BY right(numeroco,2) DESC, left(numeroco,3) desc
rows 1 to 500;
However the query returns me double (or more) due to the REVISIONE column.
How do I select only the rows of a single NUMEROCOMMESSA with the maximum REVISIONE value?
This should work:
select COD, ORDER, S.DATE, REVISION
FROM TAB1
JOIN
(
select ORDER, MAX(REVISION) as REVISION
FROM TAB1
Group By ORDER
) m on m.ORDER = TAB1.ORDER and m.REVISION = TAB1.REVISION
Here you go - http://sqlfiddle.com/#!6/ce7cf/4
Sample Data (as u set it in your original question):
create table TAB1 (
cod integer primary key,
n_order varchar(10) not null,
s_date date not null,
revision integer not null );
alter table tab1 add constraint UQ1 unique (n_order,revision);
insert into TAB1 values ( 1, '001/18', '2018-02-01', 0 );
insert into TAB1 values ( 2, '002/18', '2018-01-31', 0 );
insert into TAB1 values ( 3, '002/18', '2018-01-30', 1 );
The query:
select *
from tab1 d
join ( select n_ORDER, MAX(REVISION) as REVISION
FROM TAB1
Group By n_ORDER ) m
on m.n_ORDER = d.n_ORDER and m.REVISION = d.REVISION
Suggestions:
Google and read the classic book: "Understanding SQL" by Martin Gruber
Read Firebird SQL reference: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25.html
Here is yet one more solution using Windowed Functions introduced in Firebird 3 - http://sqlfiddle.com/#!6/ce7cf/13
I do not have Firebird 3 at hand, so can not actually check if there would not be some sudden incompatibility, do it at home :-D
SELECT * FROM
(
SELECT
TAB1.*,
ROW_NUMBER() OVER (
PARTITION BY n_order
ORDER BY revision DESC
) AS rank
FROM TAB1
) d
WHERE rank = 1
Read documentation
https://community.modeanalytics.com/sql/tutorial/sql-window-functions/
https://www.firebirdsql.org/file/documentation/release_notes/html/en/3_0/rnfb30-dml-windowfuncs.html
Which of the three (including Gordon's one) solution would be faster depends upon specific database - the real data, the existing indexes, the selectivity of indexes.
While window functions can make the join-less query, I am not sure it would be faster on real data, as it maybe can just ignore indexes on order+revision cortege and do the full-scan instead, before rank=1 condition applied. While the first solution would most probably use indexes to get maximums without actually reading every row in the table.
The Firebird-support mailing list suggested a way to break out of the loop, to only use a single query: The trick is using both windows functions and CTE (common table expression): http://sqlfiddle.com/#!18/ce7cf/2
WITH TMP AS (
SELECT
*,
MAX(revision) OVER (
PARTITION BY n_order
) as max_REV
FROM TAB1
)
SELECT * FROM TMP
WHERE revision = max_REV
If you want the max revision number in Firebird:
select t.*
from tab1 t
where t.revision = (select max(t2.revision) from tab1 t2 where t2.order = t.order);
For performance, you want an index on tab1(order, revision). With such an index, performance should be competitive with any other approach.

Linking together different columns in a SQL table?

I'm not very experienced with advance SQL queries, I'm familiar with basic statements and basic joins, currently trying to figure out how to write a query that seems to be out of my depth and I haven't been able to find a solution from google so far and I'm hoping somebody might be able to point me in the right direction.
The table I'm working with has an ID column, and a 'parent id' column.
I'm looking for all descendants of ID '1' - rows with a parent ID of '1', rows with a parent ID equal to any row's ID with a parent ID of '1' etc. Currently I've been doing this manually but there are hundreds of descendants so far and I feel like there's a way to put this into one query.
Any help would be appreciated, if this is unclear I can also try to clarify.
EDIT - I got it working with the following query:
with cteMappings as (
select map_id, parent_map_id, map_name
from admin_map
where map_id = '1'
union all
select a.map_id, a.parent_map_id, a.map_name
from admin_map a
inner join cteMappings m
on a.parent_map_id = m.map_id
)
select map_id, parent_map_id, map_name
from cteMappings
Sounds like it can be achieved by Common Table Expression:
DECLARE #temp TABLE (id INT IDENTITY(1, 1), parent_id INT);
INSERT #temp
SELECT NULL
UNION ALL
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT NULL
SELECT * FROM #temp
; WITH HierarchyTemp (id, parent_id, depth) AS (
SELECT id, parent_id, 0
FROM #temp
WHERE id = 1
UNION ALL
SELECT t.id, t.parent_id, ht.depth + 1
FROM #temp t
INNER JOIN HierarchyTemp ht ON ht.id = t.parent_id
)
SELECT *
FROM HierarchyTemp
So the above example is creating a table variable with 4 rows:
id parent_id
1 NULL
2 1
3 2
4 NULL
Result of descendent of id '1' (also including itself, but can be excluded with an additional WHERE clause):
id parent_id depth
1 NULL 0
2 1 1
3 2 2

How to get the complete row from a maximum calculation?

I do struggle with a GROUP BY -- again. The basics I can handle, but there it is: How do I get to different columns I named in the group by, without destroying my grouping? Note that group by is only my own idea, there may be others that work better. It must work in Oracle, though.
Here is my example:
create table xxgroups (
groupid int not null primary key,
groupname varchar2(10)
);
insert into xxgroups values(100, 'Group 100');
insert into xxgroups values(200, 'Group 200');
drop table xxdata;
create table xxdata (
num1 int,
num2 int,
state_a int,
state_b int,
groupid int,
foreign key (groupid) references xxgroups(groupid)
);
-- "ranks" are 90, 40, null, 70:
insert into xxdata values(10, 10, 1, 4, 100);
insert into xxdata values(10, 10, 0, 4, 200);
insert into xxdata values(11, 11, 0, 3, 100);
insert into xxdata values(20, 22, 5, 7, 200);
The task is to create a result-row for each distinct (num1, num2) and print that groupname with the highest calculated "rank" from state_a and state_b.
Note that the first two rows have the same nums and thus only the higher ranking should be selected -- with the groupname being "Group 200".
I got quite far with the basic group by, I think.
SELECT xd.num1||xd.num2 nummer, max(ranking.goodness)
FROM xxdata xd
, xxgroups xg
,( select state_a, state_b, r as goodness
from dual
model return updated rows
dimension by (0 state_a, 0 state_b) measures (0 r)
rules (r[1,4]=90, r[3,7]=80,r[5,7]=70, r[4,7]=60, r[0,7]=50, r[0,4]=40)
order by goodness desc
) ranking
WHERE xd.groupid=xg.groupid
and ranking.state_a (+) = xd.state_a
and ranking.state_b (+) = xd.state_b
GROUP BY xd.num1||xd.num2
ORDER BY nummer
;
The result is 90% of what I need:
nummer ranking
----------------
1010 90
1111
2022 70
100% perfect would be
nummer groupname
-------------------
1010 Group 100
1111 Group 100
2022 Group 200
The tricky part is, that I want the groupname in the result. And I can not include it in the select, because then I would have to put it into the group by as well -- which I do not want (then I would not select the best ranking entry from over all groups)
In my solution a use a model table to calculate the "rank". There are other solution I am sure. The point is, that it is a non-trivial calculation that I do not want to do twice.
I know from other examples that one could use a second query to get back to the original row to get to the groupname, but I can not see how I could to this here,
without duplicating my ranking calculation.
A nice suggestion was to replace the group by with a LIMIT 1/ORDER BY goodness and use this calculating select as a filtering subselect. But a) there is no LIMIT in Oracle, and I doubt a rownum<=1 would do in a subselect and b) I can not wrap my brain around it anyway. Maybe there is a way?
You can use the FIRST aggregation modifier to selectively apply your function over a subset of rows of a group -- here a single row (SQLFiddle demo):
SELECT xd.num1||xd.num2 nummer,
MAX(xg.groupname) KEEP (DENSE_RANK FIRST
ORDER BY ranking.goodness DESC) grp,
max(ranking.goodness)
FROM xxdata xd
, xxgroups xg
,( select state_a, state_b, r as goodness
from dual
model return updated rows
dimension by (0 state_a, 0 state_b) measures (0 r)
rules (r[1,4]=90, r[3,7]=80,r[5,7]=70, r[4,7]=60, r[0,7]=50, r[0,4]=40)
order by goodness desc
) ranking
WHERE xd.groupid=xg.groupid
and ranking.state_a (+) = xd.state_a
and ranking.state_b (+) = xd.state_b
GROUP BY xd.num1||xd.num2
ORDER BY nummer;
Your method with analytics works as well but since we already use aggregations here, we may as well use the FIRST modifier to get all columns in one go.
Whow, I did search before, but now I found this answer, which I could adopt to my question. The Oracle-solution here is over, partition by with order by and row_number():
select *
from ( select data.*, row_number()
over (partition by nummer order by goodness desc) as seqnum
from (
SELECT xd.num1, xd.num2 nummer, xg.groupname, ranking.goodness
FROM xxdata xd
, xxgroups xg
,( select state_a, state_b, r as goodness
from dual
model return updated rows
dimension by (0 state_a, 0 state_b) measures (0 r)
rules (r[1,4]=90, r[3,7]=80,r[5,7]=70, r[4,7]=60, r[0,7]=50, r[0,4]=40)
) ranking
WHERE xd.groupid=xg.groupid
and ranking.state_a (+) = xd.state_a
and ranking.state_b (+) = xd.state_b
ORDER BY nummer
) data )
where seqnum = 1
;
The result is
10 10 Group 100 90 1
11 11 Group 100 1
20 22 Group 200 70 1
which is beautiful.
Now I have to try to understand what over in the select excactly does....