How to use SQL GROUP BY to filter rows with the maximum date value

I have the following table
CREATE TABLE Test
(`Id` int, `value` varchar(20), `adate` varchar(20))
;
INSERT INTO Test
(`Id`, `value`, `adate`)
VALUES
(1, 100, '2014-01-01'),
(1, 200, '2014-01-02'),
(1, 300, '2014-01-03'),
(2, 200, '2014-01-01'),
(2, 400, '2014-01-02'),
(2, 30 , '2014-01-04'),
(3, 800, '2014-01-01'),
(3, 300, '2014-01-02'),
(3, 60 , '2014-01-04')
;
I want to select, for each Id, only the row with the maximum date, i.e.:
Id ,value ,adate
1, 300,'2014-01-03'
2, 30 ,'2014-01-04'
3, 60 ,'2014-01-04'
How can I achieve this using GROUP BY? I tried the following, but it does not work:
Select Id,value,adate
from Test
group by Id,value,adate
having adate = MAX(adate)
Can someone help with the query?

Select the maximum dates for each id.
select id, max(adate) max_date
from test
group by id
Join on that to get the rest of the columns.
select t1.*
from test t1
inner join (select id, max(adate) max_date
from test
group by id) t2
on t1.id = t2.id and t1.adate = t2.max_date;
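As a quick check of the join approach above, here is a minimal sketch that runs it against the sample data from the question. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the asker's database, and it stores value as an integer; both are assumptions made only to keep the example self-contained.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the asker's DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Test (Id INTEGER, value INTEGER, adate TEXT);
    INSERT INTO Test (Id, value, adate) VALUES
        (1, 100, '2014-01-01'), (1, 200, '2014-01-02'), (1, 300, '2014-01-03'),
        (2, 200, '2014-01-01'), (2, 400, '2014-01-02'), (2, 30,  '2014-01-04'),
        (3, 800, '2014-01-01'), (3, 300, '2014-01-02'), (3, 60,  '2014-01-04');
""")

# Step 1 (inner query): the maximum date per Id.
# Step 2 (join): keep only the rows of Test that carry that date.
rows = conn.execute("""
    SELECT t1.Id, t1.value, t1.adate
    FROM Test AS t1
    INNER JOIN (SELECT Id, MAX(adate) AS max_date
                FROM Test
                GROUP BY Id) AS t2
            ON t1.Id = t2.Id AND t1.adate = t2.max_date
    ORDER BY t1.Id
""").fetchall()

print(rows)
# [(1, 300, '2014-01-03'), (2, 30, '2014-01-04'), (3, 60, '2014-01-04')]
```

The dates compare correctly as text here only because they are in ISO yyyy-mm-dd format; a real date column would be safer.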

Please try:
select
*
from
Test a
where
a.adate=(select MAX(adate) from Test b where b.Id=a.Id)

If you are using a DBMS that has analytical functions you can use ROW_NUMBER:
SELECT Id, Value, ADate
FROM ( SELECT ID,
Value,
ADate,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Adate DESC) AS RowNum
FROM Test
) AS T
WHERE RowNum = 1;
Otherwise, join Test to the maximum date per Id (an aggregated subquery), so that only the rows whose date matches that Id's maximum remain:
SELECT Test.Id, Test.Value, Test.ADate
FROM Test
INNER JOIN
( SELECT ID, MAX(ADate) AS ADate
FROM Test
GROUP BY ID
) AS MaxT
ON MaxT.ID = Test.ID
AND MaxT.ADate = Test.ADate;
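One practical difference between the two queries above is how they treat ties. The sketch below assumes SQLite 3.25 or later (for window functions) via Python's sqlite3 module, and uses a small made-up data set in which Id 1 has two rows sharing the same latest date. ROW_NUMBER keeps exactly one row per Id, while the join keeps every tied row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Test (Id INTEGER, value INTEGER, adate TEXT);
    -- Hypothetical data: Id 1 has two rows on its latest date.
    INSERT INTO Test VALUES
        (1, 100, '2014-01-01'),
        (1, 300, '2014-01-03'),
        (1, 350, '2014-01-03'),
        (2, 30,  '2014-01-04');
""")

# ROW_NUMBER: one row per Id; which tied row wins is not defined
# unless the ORDER BY includes a tie-breaker.
row_number_rows = conn.execute("""
    SELECT Id, value, adate
    FROM (SELECT Id, value, adate,
                 ROW_NUMBER() OVER (PARTITION BY Id ORDER BY adate DESC) AS RowNum
          FROM Test) AS T
    WHERE RowNum = 1
""").fetchall()

# Join to the aggregated maximum: every row on the latest date is returned.
join_rows = conn.execute("""
    SELECT Test.Id, Test.value, Test.adate
    FROM Test
    INNER JOIN (SELECT Id, MAX(adate) AS adate
                FROM Test
                GROUP BY Id) AS MaxT
            ON MaxT.Id = Test.Id AND MaxT.adate = Test.adate
""").fetchall()

print(len(row_number_rows))  # 2
print(len(join_rows))        # 3
```

If one row per Id is required and ties are possible, adding a second column to the ORDER BY makes the choice deterministic.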

I would try something like this
Select t1.Id, t1.value, t1.adate
from Test as t1
where t1.adate = (select max(t2.adate)
from Test as t2
where t2.id = t1.id)
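As for why the attempt in the question does not filter anything: grouping by Id, value and adate puts every distinct row in its own group, so within each group MAX(adate) is simply that row's own adate and the HAVING condition is always true. A small sketch, again assuming SQLite through Python's sqlite3 module as a stand-in database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Test (Id INTEGER, value INTEGER, adate TEXT);
    INSERT INTO Test (Id, value, adate) VALUES
        (1, 100, '2014-01-01'), (1, 200, '2014-01-02'), (1, 300, '2014-01-03'),
        (2, 200, '2014-01-01'), (2, 400, '2014-01-02'), (2, 30,  '2014-01-04'),
        (3, 800, '2014-01-01'), (3, 300, '2014-01-02'), (3, 60,  '2014-01-04');
""")

# The query from the question: each group holds a single row,
# so adate = MAX(adate) holds for every group.
rows = conn.execute("""
    SELECT Id, value, adate
    FROM Test
    GROUP BY Id, value, adate
    HAVING adate = MAX(adate)
""").fetchall()

print(len(rows))  # 9, i.e. nothing was filtered out
```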

Related

Get 'most recent' grouped record (with order by)

I have a query like the below
SELECT
t1.Supplier,
t2.Product
FROM
t1
INNER JOIN
t2 ON t1.ProductCode = t2.ProductCode
GROUP BY
t1.Supplier, t2.Product
On table t1, there are also columns called 'Timestamp' and 'Price' - I want to get the most recent price, i.e. SELECT Price ORDER BY Timestamp DESC. Can I do this with any aggregate functions, or would it have to be a subquery?
One standard way of doing this is to use ROW_NUMBER() to create an additional column in the source data, allowing you to identify which row is "first" within each "partition".
WITH
supplier_sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY supplier, ProductCode
ORDER BY timestamp DESC
)
AS recency_id
FROM
supplier
)
SELECT
s.Supplier,
p.Product,
COUNT(*)
FROM
supplier_sorted AS s
INNER JOIN
product AS p
ON s.ProductCode = p.ProductCode
WHERE
s.recency_id = 1
GROUP BY
s.Supplier,
p.Product
You can use cross apply:
SELECT t2.*, t1.*
FROM t2 CROSS APPLY
(SELECT TOP (1) t1.*
FROM t1
WHERE t1.ProductCode = t2.ProductCode
ORDER BY t1.TimeStamp DESC
) t1;
So, GROUP BY is not necessary.
You can use ROW_NUMBER() partitioned by Supplier and Product and ordered by Timestamp descending, so that the latest record in each partition gets row number 1. Filtering on that number in the same query returns the desired result without any aggregation.
For this kind of question, window functions are a better fit than GROUP BY.
SELECT
A.Supplier
,A.Product
,A.Price
FROM
(
SELECT
t1.Supplier,
t2.Product,
T1.Price,
ROW_NUMBER () OVER ( PARTITION BY t1.Supplier,t2.Product ORDER BY T1.[Timestamp] DESC ) AS row_num
FROM t1
INNER JOIN t2
ON t1.ProductCode = t2.ProductCode
) AS A WHERE A.row_num = 1
Tested with the sample data below.
CREATE TABLE t1
( Supplier varchar(100)
,ProductCode int
, Price Decimal (10,2)
, [TimeStamp] datetime
)
CREATE TABLE t2
(
ProductCode int
,Product varchar(100)
)
insert into t1 values ('A', 1, 100.00, GetDate())
insert into t1 values ('A', 1, 80.00, GetDate())
insert into t1 values ('b', 2, 190.00, GetDate())
insert into t1 values ('b', 2, 500.00, GetDate())
insert into t2 values (1, 'Pro1')
insert into t2 values (2, 'Pro2')
insert into t2 values (3, 'Pro3')
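For anyone who wants to try the ROW_NUMBER query without SQL Server, here is a rough equivalent. It assumes SQLite 3.25 or later via Python's sqlite3 module, and it replaces GetDate() with fixed, distinct timestamps (my own values, not from the answer) so that "most recent" is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (Supplier TEXT, ProductCode INTEGER, Price REAL, "Timestamp" TEXT);
    CREATE TABLE t2 (ProductCode INTEGER, Product TEXT);
    -- Timestamps are made up for the example; the original used GetDate().
    INSERT INTO t1 VALUES
        ('A', 1, 100.00, '2021-01-01 10:00:00'),
        ('A', 1,  80.00, '2021-01-02 10:00:00'),
        ('b', 2, 190.00, '2021-01-01 10:00:00'),
        ('b', 2, 500.00, '2021-01-03 10:00:00');
    INSERT INTO t2 VALUES (1, 'Pro1'), (2, 'Pro2'), (3, 'Pro3');
""")

# Number the rows of each (Supplier, Product) pair from newest to oldest,
# then keep only the newest one.
rows = conn.execute("""
    SELECT Supplier, Product, Price
    FROM (SELECT t1.Supplier, t2.Product, t1.Price,
                 ROW_NUMBER() OVER (PARTITION BY t1.Supplier, t2.Product
                                    ORDER BY t1."Timestamp" DESC) AS row_num
          FROM t1
          INNER JOIN t2 ON t1.ProductCode = t2.ProductCode) AS A
    WHERE row_num = 1
    ORDER BY Supplier
""").fetchall()

print(rows)  # [('A', 'Pro1', 80.0), ('b', 'Pro2', 500.0)]
```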

Count rows from another table matching 2 ids and after a date in Postgresql

I have 2 tables in Postgresql 13 with the following sample structure:
table1
-------
client_id
member_id
email_count
last_date
table2
-------
client_id
member_id
created_at
I'm trying to update the email_count column for each record in table1 with the count of rows from table2 where client_id and member_id match and created_at is later than the last_date column.
I've tried multiple approaches but can't seem to get the right combination. My latest approach, using a CTE, looks like this:
with counted as (
select t.client_id,
t.member_id,
t.last_date,
(select count(*)
from table2 t2
where t2.client_id = t.client_id
and t2.member_id = t.member_id
and t2.created_at > t.last_engagement_date
) as count
from (
select t1.client_id,
t1.member_id,
t1.last_date
from table1 t1
) t
)
update table1
set email_count = counted.count
where table1.client_id = counted.client_id
and table1.member_id = counted.member_id;
But all of the counts are coming up as zero. I've verified the data and should be getting counts as high as 200 in some cases.
Thanks in advance for any assistance!
EDIT
Example structure with additional data from first answer:
create table table1 (
client_id int,
member_id int,
email_count int,
last_date date
);
create table table2 (
client_id int,
member_id int,
created_at date
);
insert into table1
values (1, 1, null, '2021-06-01')
,(2, 3, null, '2021-05-01')
,(2, 4, null, '2021-04-01');
insert into table2
values (1, 1, '2021-05-01')
,(1, 1, '2021-07-01')
,(2, 3, '2021-06-01')
,(2, 3, '2021-07-01')
,(2, 4, '2021-04-01')
,(2, 4, '2021-05-01')
,(2, 4, '2021-06-01')
,(2, 4, '2021-07-01');
From this data I'm expecting to get the following results in the email_count field:
client_id|member_id|email_count
1 | 1 | 1
2 | 3 | 2
2 | 4 | 3
Your code throws errors on dbfiddle.
This would be easier with a working fiddle to start with, including sample data.
create table table1 (
client_id int,
member_id int,
email_count int,
last_date date
);
create table table2 (
client_id int,
member_id int,
created_at date
);
insert into table1
values (1, 1, null, '2021-06-01');
insert into table2
values (1, 1, '2021-05-01')
, (1, 1, '2021-07-01');
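update table1
set email_count = t2.email_count
from (
-- count matching table2 rows per (client_id, member_id)
select table1.client_id
, table1.member_id
, count(table2.*) email_count
from table1
inner join table2 on table2.client_id = table1.client_id
and table2.member_id = table1.member_id
and table2.created_at > table1.last_date
group by table1.client_id
, table1.member_id
) t2
-- correlate each target row with its own count
where table1.client_id = t2.client_id
and table1.member_id = t2.member_id
;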
select *
from table1
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=71d887b17cd91793a719786ed829b58d
You can use the following code to do the update in PostgreSQL.
UPDATE
table1
SET
email_count = result.count
FROM
(select t.client_id,
t.member_id,
t.email_count,
t.last_date,
(select count(*)
from table2 t2
where t2.client_id = t.client_id
and t2.member_id = t.member_id
and t2.created_at > t.last_date) as count
from (
select t1.client_id,
t1.member_id,
t1.email_count,
t1.last_date
from table1 t1) t) result
where table1.client_id = result.client_id and table1.member_id = result.member_id
With a CTE, note that PostgreSQL cannot update the CTE itself; update table1 and reference the CTE in the FROM clause:
with counted as (
select t.client_id,
t.member_id,
t.email_count,
t.last_date,
(select count(*)
from table2 t2
where t2.client_id = t.client_id
and t2.member_id = t.member_id
and t2.created_at > t.last_date) as count
from (
select t1.client_id,
t1.member_id,
t1.email_count,
t1.last_date
from table1 t1
) t)
update table1
set email_count = counted.count
from counted
where table1.client_id = counted.client_id
and table1.member_id = counted.member_id;
If this is so easy to query...
select table2.client_id
, table2.member_id
, count(table2.*) email_count
from table1
inner join table2 on table2.client_id = table1.client_id
and table2.member_id = table1.member_id
and table2.created_at > table1.last_date
group by table2.client_id
, table2.member_id
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=a0df6cef6916a321d97669fd4d0cec63
...why would you need to persist the value in email_count? That's a form of duplication that, if there isn't a reason to persist the data to improve system performance, you may not want.
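If the count does need to be stored, a correlated subquery in the SET clause is another way to write the update, and it avoids the FROM clause entirely. The sketch below is not taken from any of the answers; it assumes SQLite via Python's sqlite3 module as a stand-in for PostgreSQL and stores the dates as ISO text. Unlike an inner join, it also writes 0 for rows with no matching emails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (client_id INTEGER, member_id INTEGER,
                         email_count INTEGER, last_date TEXT);
    CREATE TABLE table2 (client_id INTEGER, member_id INTEGER, created_at TEXT);
    INSERT INTO table1 VALUES
        (1, 1, NULL, '2021-06-01'),
        (2, 3, NULL, '2021-05-01'),
        (2, 4, NULL, '2021-04-01');
    INSERT INTO table2 VALUES
        (1, 1, '2021-05-01'), (1, 1, '2021-07-01'),
        (2, 3, '2021-06-01'), (2, 3, '2021-07-01'),
        (2, 4, '2021-04-01'), (2, 4, '2021-05-01'),
        (2, 4, '2021-06-01'), (2, 4, '2021-07-01');

    -- For each table1 row, count the table2 rows for the same client and
    -- member that were created after that row's last_date.
    UPDATE table1
    SET email_count = (SELECT COUNT(*)
                       FROM table2
                       WHERE table2.client_id = table1.client_id
                         AND table2.member_id = table1.member_id
                         AND table2.created_at > table1.last_date);
""")

rows = conn.execute("""
    SELECT client_id, member_id, email_count
    FROM table1
    ORDER BY client_id, member_id
""").fetchall()

print(rows)  # [(1, 1, 1), (2, 3, 2), (2, 4, 3)]
```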

Selecting minimal dates, or nulls in SQL

This is grossly oversimplified, but:
I have a table, something like the following:
CREATE TABLE Table1
([ID] int, [USER] varchar(5), [DATE] date)
;
INSERT INTO Table1
([ID], [USER], [DATE])
VALUES
(1, 'A', '2018-10-01'),
(2, 'A', '2018-09-01'),
(3, 'A', NULL),
(4, 'B', '2018-05-03'),
(5, 'B', '2017-04-01'),
(6, 'C', NULL)
;
And for each user, I wish to retrieve the whole row of details where the DATE variable is minimal.
SELECT T.USER FROM TABLE1 T
WHERE T.DATE = (SELECT MIN(DATE) FROM TABLE1 T1 WHERE T1.USER = T.USER)
Works great, however in the instance there is no row with a populated DATE field, there will be a row with a NULL, like the final row of my table above, which I also wish to select.
So my ideal output in this case is:
(2, 'A', '2018-09-01'),
(5, 'B', '2017-04-01'),
(6, 'C', NULL)
SQL fiddle: http://www.sqlfiddle.com/#!9/df42b5/6
I think something could be done using an EXCLUDE statement but it gets complex very quickly.
You can try row_number():
select * from
(select *, row_number() over(partition by [user] order by [user],case when
[date] is null then 0 else 1 end desc,[date]) as rn
from Table1)x where rn=1
Use UNION with a correlated subquery and the MIN() function:
CREATE TABLE Table1 (ID int, usr varchar(50), DATE1 date)
;
INSERT INTO Table1 VALUES
(1, 'A', '2018-10-01'),
(2, 'A', '2018-09-01'),
(3, 'A', NULL),
(4, 'B', '2018-05-03'),
(5, 'B', '2017-04-01'),
(6, 'C', NULL)
;
select * from Table1 t where
DATE1= (select min(date1) from Table1 t1 where t1.usr=t.usr
) and date1 is not null
union
select * from Table1 t where date1 is null
and t.usr not in ( select usr from Table1 where date1 is not null)
Result:
ID usr DATE1
2 A 01/09/2018 00:00:00
5 B 01/04/2017 00:00:00
6 C
You can use GROUP BY and JOIN to output the desired results.
select t.Id
, x.[User]
, x.[MinDate] as [Date]
from
(select [User]
, min([Date]) as MinDate
from table1
group by [User]) x
inner join table1 t on t.[User] = x.[User] and (t.[Date] = x.[MinDate] or x.[MinDate] is null)
You can use a Common Table Expression:
;WITH chronology AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY [USER]
ORDER BY ISNULL([DATE], '2900-01-01') ASC
) Idx
FROM TABLE1
)
SELECT ID, [USER], [DATE]
FROM chronology
WHERE Idx=1;
Using a CTE in this solution simplifies the query improving its readability, maintainability and extensibility. Furthermore, I expect this approach to be optimal in terms of performance.
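The key point in the ROW_NUMBER solutions is pushing NULL dates to the end of each user's ordering, so a NULL row is only picked when the user has no dated rows. Here is a minimal sketch of that idea, assuming SQLite 3.25 or later via Python's sqlite3 module. The column names usr and d are my own, chosen to avoid the reserved words USER and DATE, and sorting on "d IS NULL" plays the role of ISNULL([DATE], '2900-01-01').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, usr TEXT, d TEXT);
    INSERT INTO Table1 VALUES
        (1, 'A', '2018-10-01'),
        (2, 'A', '2018-09-01'),
        (3, 'A', NULL),
        (4, 'B', '2018-05-03'),
        (5, 'B', '2017-04-01'),
        (6, 'C', NULL);
""")

# "d IS NULL" is 0 for dated rows and 1 for NULLs, so dated rows sort first;
# among the dated rows the earliest date gets rn = 1.
rows = conn.execute("""
    SELECT ID, usr, d
    FROM (SELECT ID, usr, d,
                 ROW_NUMBER() OVER (PARTITION BY usr
                                    ORDER BY d IS NULL, d) AS rn
          FROM Table1) AS x
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

print(rows)  # [(2, 'A', '2018-09-01'), (5, 'B', '2017-04-01'), (6, 'C', None)]
```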

SQL DISTINCT id when SUM weight in sub query

I have two kinds of subquery in the statement.
First of all some sample data.
Table
CAT ID Weight GROUP
1 1 200 A
1 2 300 B
1 3 250 B
1 1 200 A
1 4 200 A
One sub query is a count of distinct IDs which works as expected.
( SELECT COUNT (distinct t1.ID)
FROM table t1
WHERE t1.group = 'A'
GROUP BY t1.cat)
AS [count],
The other sub query is a sum of the weight
( SELECT SUM(t1.weight)
FROM table t1
WHERE t1.group = 'A'
GROUP BY t1.cat)
AS [weight],
This doesn't give me what I need: it totals 600, but I want 400, counting each unique ID only once as the first query does.
However by adding distinct...
( SELECT SUM(DISTINCT t1.weight)
FROM table t1
WHERE t1.group = 'A'
GROUP BY t1.cat)
AS [weight],
This returns only 200 because it applies DISTINCT to the weight. What I want is DISTINCT applied to the ID, but how can I do that while still selecting only the weight?
Something like (logically speaking as this doesn't work)
( SELECT SUM(t1.weight)
FROM table t1
WHERE t1.group = 'A'
AND t1.ID IS DISTINCT
GROUP BY t1.cat)
AS [weight],
SELECT cat,SUM(weight) AS [weight] FROM
(SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) as rn
FROM table ) as tbl
WHERE [group] = 'A' AND rn=1
GROUP BY cat
I might be missing something, as I've looked at your sample data and what I believe is your desired output, but can you not just do a simple GROUP BY and SUM:
CREATE TABLE SampleData
([CAT] int, [ID] int, [Weight] int, [GROUP] varchar(1))
;
INSERT INTO SampleData
([CAT], [ID], [Weight], [GROUP])
VALUES
(1, 1, 200, 'A'),
(1, 2, 300, 'B'),
(1, 3, 250, 'B'),
(1, 1, 200, 'A'),
(1, 4, 200, 'A')
;
SELECT ID, COUNT(ID) AS [Counter], SUM(Weight) AS SumWeight
FROM SampleData
WHERE [GROUP] = 'A'
GROUP BY ID
To produce:
ID Counter SumWeight
1 2 400
4 1 200
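If the goal is a single total per cat in which each ID contributes once, another option is to remove the duplicate rows in a derived table before summing. This is only a sketch under two assumptions: SQLite via Python's sqlite3 module stands in for the real database, and each ID always has the same weight (otherwise DISTINCT would keep one row per differing weight). The column name grp replaces GROUP, which is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SampleData (cat INTEGER, id INTEGER, weight INTEGER, grp TEXT);
    INSERT INTO SampleData VALUES
        (1, 1, 200, 'A'),
        (1, 2, 300, 'B'),
        (1, 3, 250, 'B'),
        (1, 1, 200, 'A'),
        (1, 4, 200, 'A');
""")

# The derived table collapses the repeated (cat, id, weight) row for ID 1,
# so both the count and the sum see each ID exactly once.
rows = conn.execute("""
    SELECT cat, COUNT(*) AS id_count, SUM(weight) AS total_weight
    FROM (SELECT DISTINCT cat, id, weight
          FROM SampleData
          WHERE grp = 'A') AS d
    GROUP BY cat
""").fetchall()

print(rows)  # [(1, 2, 400)]
```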

How to get the closest dates in Oracle sql

For example, I have 2 time tables:
T1
id time
1 18:12:02
2 18:46:57
3 17:49:44
4 12:19:24
5 11:00:01
6 17:12:45
and T2
id time
1 18:13:02
2 17:46:57
I need to get the times from T1 that are closest to the times in T2. There is no relationship between these tables.
It should be something like this:
select T1.calldatetime
from T1, T2
where T1.calldatetime between
T2.calldatetime-(
select MIN(ABS(T2.calldatetime-T1.calldatetime))
from T2, T1)
and
T2.calldatetime+(
select MIN(ABS(T2.calldatetime-T1.calldatetime))
from T2, T1)
But I can't get it. Any suggestions?
You only need a single Cartesian join to solve your problem, unlike the other solutions, which use several. I assume the time is stored as a VARCHAR2. If it is stored as a date, you can remove the TO_DATE functions. If it is stored as a date (which I would highly recommend), you will have to remove the date portions before comparing.
I've made it slightly verbose so it's obvious what's going on.
select *
from ( select id, tm
, rank() over ( partition by t2id order by difference asc ) as rnk
from ( select t1.*, t2.id as t2id
, abs( to_date(t1.tm, 'hh24:mi:ss')
- to_date(t2.tm, 'hh24:mi:ss')) as difference
from t1
cross join t2
) a
)
where rnk = 1
Basically, this works out the absolute difference between every time in T1 and T2 then picks the smallest difference by T2 ID; returning the data from T1.
Here it is in SQL Fiddle format.
The less pretty (but shorter) format is:
select *
from ( select t1.*
, rank() over ( partition by t2.id
order by abs(to_date(t1.tm, 'hh24:mi:ss')
- to_date(t2.tm, 'hh24:mi:ss'))
) as rnk
from t1
cross join t2
) a
where rnk = 1
I believe this is the query you are looking for:
CREATE TABLE t1(id INTEGER, time DATE);
CREATE TABLE t2(id INTEGER, time DATE);
INSERT INTO t1 VALUES (1, TO_DATE ('18:12:02', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (2, TO_DATE ('18:46:57', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (3, TO_DATE ('17:49:44', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (4, TO_DATE ('12:19:24', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (5, TO_DATE ('11:00:01', 'HH24:MI:SS'));
INSERT INTO t1 VALUES (6, TO_DATE ('17:12:45', 'HH24:MI:SS'));
INSERT INTO t2 VALUES (1, TO_DATE ('18:13:02', 'HH24:MI:SS'));
INSERT INTO t2 VALUES (2, TO_DATE ('17:46:57', 'HH24:MI:SS'));
SELECT t1.*, t2.*
FROM t1, t2,
( SELECT t2.id, MIN (ABS (t2.time - t1.time)) diff
FROM t1, t2
GROUP BY t2.id) b
WHERE ABS (t2.time - t1.time) = b.diff;
Make sure that the time columns have the same date part, because the t2.time - t1.time part won't work otherwise.
EDIT: Thanks for the accept, but Ben's answer below is better. It uses Oracle analytic functions and will perform much better.
This one selects the row(s) from T1 that have the smallest distance to any row in T2:
select T1.id, T1.calldatetime from T1, T2
where ABS(T2.calldatetime-T1.calldatetime)
=( select MIN(ABS(T2.calldatetime-T1.calldatetime))from T1, T2);
(I tested it with MySQL; I hope you don't get an ORA error from it.)
Edit: according to the last comment, it should be like that:
drop table t1;
drop table t2;
create table t1(id int, t time);
create table t2(id int, t time);
insert into t1 values (1, '18:12:02');
insert into t1 values (2, '18:46:57');
insert into t1 values (3, '17:49:44');
insert into t1 values (4, '12:19:24');
insert into t1 values (5, '11:00:01');
insert into t1 values (6, '17:12:45');
insert into t2 values (1, '18:13:02');
insert into t2 values (2, '17:46:57');
select ot2.id, ot2.t, ot1.id, ot1.t from t2 ot2, t1 ot1
where ABS(ot2.t-ot1.t)=
(select min(abs(t2.t-t1.t)) from t1, t2 where t2.id=ot2.id)
Produces:
id t id t
1 18:13:02 1 18:12:02
2 17:46:57 3 17:49:44
Another way, using analytic functions.
It may look strange :)
select id, time,
case
when to_date(time, 'hh24:mi:ss') - to_date(lag_time, 'hh24:mi:ss') < to_date(lead_time, 'hh24:mi:ss') - to_date(time, 'hh24:mi:ss')
then lag_time
else lead_time
end closest_time
from (
select id, tbl,
LAG(time, 1, null) OVER (ORDER BY time) lag_time,
time,
LEAD(time, 1, null) OVER (ORDER BY time) lead_time
from
(
select id, time, 1 tbl from t1
union all
select id, time, 2 tbl from t2
)
)
where tbl = 2
To SQLFiddle... and beyond!
Try this query. It's a little lengthy; I will try to optimize it.
select * from t1
where id in (
select id1 from
(select id1,id2,
rank() over (partition by id2 order by diff) rnk
from
(select distinct t1.id id1,t2.id id2,
round(min(abs(to_date(t1.time,'HH24:MI:SS') - to_date(t2.time,'HH24:MI:SS'))),2) diff
from
t1,t2
group by t1.id,t2.id) )
where rnk = 1);
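To experiment with the cross join plus RANK idea outside Oracle, here is a rough port. It assumes SQLite 3.25 or later via Python's sqlite3 module, stores the times as text in a column I have called tm, and uses strftime('%s', ...) instead of TO_DATE to turn each time into a number of seconds. None of that is Oracle syntax; only the shape of the query carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, tm TEXT);
    CREATE TABLE t2 (id INTEGER, tm TEXT);
    INSERT INTO t1 VALUES
        (1, '18:12:02'), (2, '18:46:57'), (3, '17:49:44'),
        (4, '12:19:24'), (5, '11:00:01'), (6, '17:12:45');
    INSERT INTO t2 VALUES (1, '18:13:02'), (2, '17:46:57');
""")

# Pair every t1 row with every t2 row, rank the pairs for each t2 row by
# absolute time difference in seconds, and keep the closest (rank 1).
rows = conn.execute("""
    SELECT t2id, t2tm, t1id, t1tm
    FROM (SELECT t2.id AS t2id, t2.tm AS t2tm,
                 t1.id AS t1id, t1.tm AS t1tm,
                 RANK() OVER (
                     PARTITION BY t2.id
                     ORDER BY ABS(CAST(strftime('%s', t1.tm) AS INTEGER)
                                - CAST(strftime('%s', t2.tm) AS INTEGER))
                 ) AS rnk
          FROM t1
          CROSS JOIN t2) AS a
    WHERE rnk = 1
    ORDER BY t2id
""").fetchall()

print(rows)
# [(1, '18:13:02', 1, '18:12:02'), (2, '17:46:57', 3, '17:49:44')]
```

Because RANK is used rather than ROW_NUMBER, two T1 rows that are equally close to a T2 row would both be returned.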