I have a table that looks like this
ID Type Change_Date
1 t1 2015-10-08
1 t2 2016-01-03
1 t3 2016-03-07
2 t1 2017-12-13
2 t2 2018-02-01
It shows if a customer has changed account type and when. However, I'd like a query that can give me the following output
ID Type Change_Date
1 t1 2015-10
1 t1 2015-11
1 t1 2015-12
1 t2 2016-01
1 t2 2016-02
1 t3 2016-03
1 t3 2016-04
... ... ...
1 t3 2018-10
for each ID. The output shows what account type the customer had for each month until the current month. My problem is filling in the "empty" months. In some cases the interval between account changes can be more than a year.
I hope this makes sense.
Thanks in advance.
Based on Presto SQL (because your original question is about Presto/SQL)
Update on 2018-11-01: use lead() to simplify the SQL
Prepare data
Table mytable same as yours
id type update_date
1 t1 2015-10-08
1 t2 2016-01-03
1 t3 2016-03-07
2 t1 2017-12-13
2 t2 2018-02-01
Table t_month is a dictionary table that holds every month from 2015-01 to 2019-12. Dictionary tables like this are useful for filling gaps.
ym
2015-01
2015-02
2015-03
2015-04
2015-05
2015-06
2015-07
2015-08
2015-09
...
2019-12
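If you don't have such a month table yet, it can be generated instead of typed in by hand. Here is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for Presto; the helper name build_month_table and the date bounds are just for illustration. In Presto itself, I believe sequence() with an interval '1' month step combined with unnest would do the same job.

```python
import sqlite3

def build_month_table(conn, first="2015-01-01", last="2019-12-01"):
    """Create t_month(ym) with one 'YYYY-MM' row per month in [first, last].

    Hypothetical helper: both bounds are expected to be first-of-month ISO dates.
    """
    conn.execute("DROP TABLE IF EXISTS t_month")
    conn.execute("CREATE TABLE t_month (ym TEXT PRIMARY KEY)")
    conn.execute(
        """
        WITH RECURSIVE months(d) AS (
            SELECT date(?)
            UNION ALL
            -- keep adding one month until the last month has been produced
            SELECT date(d, '+1 month') FROM months WHERE d < date(?)
        )
        INSERT INTO t_month (ym)
        SELECT strftime('%Y-%m', d) FROM months
        """,
        (first, last),
    )

conn = sqlite3.connect(":memory:")
build_month_table(conn)
months = [row[0] for row in conn.execute("SELECT ym FROM t_month ORDER BY ym")]
```

The resulting list starts at 2015-01 and ends at 2019-12, matching the table shown above.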
Add lifespan for mytable
Normally you would manage this kind of data with an explicit lifespan (a start and an end date per row), so mytable would look like
id type start_date end_date
1 t1 2015-10-08 2016-01-03
1 t2 2016-01-03 2016-03-07
1 t3 2016-03-07 null
2 t1 2017-12-13 2018-02-01
2 t2 2018-02-01 null
But in this case you don't have one, so the next step is to create it with the lead() window function.
select
id,
type,
date_format(update_date, '%Y-%m') as start_month,
lead(
date_format(update_date, '%Y-%m'),
1, -- next one
date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
) over(partition by id order by update_date) as end_month
from mytable
Output
id type start_month end_month
1 t1 2015-10 2016-01
1 t2 2016-01 2016-03
1 t3 2016-03 2018-11
2 t1 2017-12 2018-02
2 t2 2018-02 2018-11
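To try the lead() step without a Presto cluster, here is a small sketch on SQLite via Python's sqlite3 (an assumption on my side, SQLite needs version 3.25+ for window functions). The function name lifespans is hypothetical, and the "next month" default is passed in as a parameter instead of being computed from current_date, so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, type TEXT, update_date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "t1", "2015-10-08"),
        (1, "t2", "2016-01-03"),
        (1, "t3", "2016-03-07"),
        (2, "t1", "2017-12-13"),
        (2, "t2", "2018-02-01"),
    ],
)

def lifespans(conn, open_end_month):
    """Return (id, type, start_month, end_month) rows.

    The latest row of each id has no successor, so lead() falls back to
    open_end_month (an exclusive upper bound, e.g. next month).
    """
    return conn.execute(
        """
        SELECT id,
               type,
               strftime('%Y-%m', update_date) AS start_month,
               lead(strftime('%Y-%m', update_date), 1, ?)
                   OVER (PARTITION BY id ORDER BY update_date) AS end_month
        FROM mytable
        ORDER BY id, update_date
        """,
        (open_end_month,),
    ).fetchall()

rows = lifespans(conn, "2018-11")
```

With "2018-11" as the fallback, rows contains the same five lifespans as the output above.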
Cross join id and month
It's simple
with id_month as (
select * from t_month
cross join (select distinct id from mytable)
)
select * from id_month
Output
ym id
2015-01 1
2015-02 1
2015-03 1
...
2019-12 1
2015-01 2
2015-02 2
2015-03 2
...
2019-12 2
Finally
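Now you can use a scalar subquery in the select clause. Here mytable2 is the lifespan query from the earlier step (it is defined as a CTE in the full SQL).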
select
id,
type,
ym
from (
select
t1.id,
t1.ym,
(select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
from id_month t1
)
where type is not null
-- order by id, ym
Full sql
with mytable2 as (
select
id,
type,
date_format(update_date, '%Y-%m') as start_month,
lead(
date_format(update_date, '%Y-%m'),
1, -- next one
date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
) over(partition by id order by update_date) as end_month
from mytable
)
, id_month as (
select * from t_month
cross join (select distinct id from mytable)
)
select
id,
type,
ym
from (
select
t1.id,
t1.ym,
(select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
from id_month t1
)
where type is not null
--order by id, ym
Output
id type ym
1 t1 2015-10
1 t1 2015-11
1 t1 2015-12
1 t2 2016-01
1 t2 2016-02
1 t3 2016-03
1 t3 2016-04
...
1 t3 2018-10
2 t1 2017-12
2 t1 2018-01
2 t2 2018-02
...
2 t2 2018-10
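For comparison, the same result can be produced by joining the lifespans directly to the months on a range condition, which skips both the cross join and the scalar subquery. This is a different formulation from the one above, sketched on SQLite via Python's sqlite3 as a stand-in for Presto; the month list is generated inline and the :open_end parameter (hypothetical name) replaces current_date + 1 month so the run is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, type TEXT, update_date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "t1", "2015-10-08"),
        (1, "t2", "2016-01-03"),
        (1, "t3", "2016-03-07"),
        (2, "t1", "2017-12-13"),
        (2, "t2", "2018-02-01"),
    ],
)

QUERY = """
WITH RECURSIVE months(d) AS (
    SELECT date('2015-01-01')
    UNION ALL
    SELECT date(d, '+1 month') FROM months WHERE d < date('2019-12-01')
),
t_month AS (
    SELECT strftime('%Y-%m', d) AS ym FROM months
),
mytable2 AS (
    SELECT id,
           type,
           strftime('%Y-%m', update_date) AS start_month,
           lead(strftime('%Y-%m', update_date), 1, :open_end)
               OVER (PARTITION BY id ORDER BY update_date) AS end_month
    FROM mytable
)
SELECT s.id, s.type, m.ym
FROM mytable2 AS s
JOIN t_month AS m
  ON m.ym >= s.start_month   -- lifespan start is inclusive
 AND m.ym <  s.end_month     -- lifespan end is exclusive
ORDER BY s.id, m.ym
"""

result = conn.execute(QUERY, {"open_end": "2018-11"}).fetchall()
```

Because 'YYYY-MM' strings sort chronologically, plain string comparison is enough for the range condition.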
Related
I have a table with messages and I need to find chats where there were two or more messages within a period of 10 seconds.
table
id message_id time
1 1 13:09:00
1 2 13:09:01
1 3 13:09:50
2 1 15:18:00
2 2 15:20:00
3 1 15:00:00
3 2 15:10:00
3 3 15:10:10
So the result looks like
id
1
3
I can't figure out how to group by a time period. Or maybe it can be done another way?
select id
from t
group by id, ?
having count(message_id) > 1
You can use a self-join:
with add_ymd(id, mid, dt) as (
select id, message_id, (date(now())||' '||time)::timestamp from messages
),
tm_counts as (
select t2.id, max(t2.n) + 1 tm from (
select t.id, t.mid, sum(case when extract(epoch from t.dt - t1.dt) <= 10 then 1 end) n
from add_ymd t join add_ymd t1 on t.id = t1.id and t.dt > t1.dt group by t.id, t.mid) t2
where t2.n is not null group by t2.id
)
select id from tm_counts where tm > 1
Output:
id
---
1
3
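A window function can replace the self-join here: if any two messages of a chat are within 10 seconds, then two consecutive ones are, so it is enough to compare each message with the previous one. A minimal sketch of that alternative, assuming SQLite via Python's sqlite3 (not the PostgreSQL used above) and time stored as 'HH:MM:SS' text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, message_id INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, 1, "13:09:00"),
        (1, 2, "13:09:01"),
        (1, 3, "13:09:50"),
        (2, 1, "15:18:00"),
        (2, 2, "15:20:00"),
        (3, 1, "15:00:00"),
        (3, 2, "15:10:00"),
        (3, 3, "15:10:10"),
    ],
)

QUERY = """
WITH gaps AS (
    SELECT id,
           -- seconds between this message and the previous one in the same chat;
           -- NULL for the first message of each chat
           CAST(strftime('%s', time) AS INTEGER)
             - lag(CAST(strftime('%s', time) AS INTEGER))
                   OVER (PARTITION BY id ORDER BY time) AS gap_seconds
    FROM gaps_source
)
SELECT DISTINCT id FROM gaps WHERE gap_seconds <= 10 ORDER BY id
""".replace("gaps_source", "messages")

chat_ids = [row[0] for row in conn.execute(QUERY)]
```

The NULL gap of each chat's first message fails the comparison, so it never counts as a match.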
Let's say I have the following two tables:
Table 1:
ID log_time
1 2013-10-12
1 2014-11-15
2 2013-12-21
2 2016-12-21
3 2015-09-21
3 2018-03-21
Table 2:
ID log_time
1 2011-10-12
1 2012-11-15
2 2012-12-21
2 2017-12-21
3 2014-09-21
3 2019-03-21
I want to get rows of Table 2 which are below min(log_time) of Table1 for each ID.
The result should be like this:
ID log_time
1 2011-10-12
1 2012-11-15
2 2012-12-21
3 2014-09-21
This is join and aggregation:
select t2.*
from table2 t2 join
(select t1.id, min(t1.log_time) as min_log_time
from table1 t1
group by t1.id
) t1
on t2.id = t1.id and t2.log_time < t1.min_log_time;
You can also express this as a correlated subquery:
select t2.*
from table2 t2
where t2.log_time < (select min(t1.log_time) from table1 t1 where t1.id = t2.id);
Note that both of these formulations will return no rows for ids missing from table1 (which is quite consistent with your question).
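Here is a quick way to check the correlated-subquery form against the sample data, sketched on SQLite via Python's sqlite3 (an assumption, the question does not name a database). The extra table2 row with id 4 is made up to illustrate the note about ids missing from table1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, log_time TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, log_time TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (1, "2013-10-12"), (1, "2014-11-15"),
        (2, "2013-12-21"), (2, "2016-12-21"),
        (3, "2015-09-21"), (3, "2018-03-21"),
    ],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [
        (1, "2011-10-12"), (1, "2012-11-15"),
        (2, "2012-12-21"), (2, "2017-12-21"),
        (3, "2014-09-21"), (3, "2019-03-21"),
        (4, "2010-01-01"),  # hypothetical row: id 4 does not exist in table1
    ],
)

QUERY = """
SELECT t2.id, t2.log_time
FROM table2 AS t2
WHERE t2.log_time < (SELECT min(t1.log_time)
                     FROM table1 AS t1
                     WHERE t1.id = t2.id)
ORDER BY t2.id, t2.log_time
"""

below_min = conn.execute(QUERY).fetchall()
```

For id 4 the subquery yields NULL, the comparison is not true, and the row is left out.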
I have the below table.
I want to exclude rows where the start_cycle date is later than the start_cycle date of the row where the 'source' column = END_DATE. So for this example, that means removing any rows where the start_cycle date is later than 2/11/2019.
END_DATE could be different for each ID
ID START_CYCLE END_CYCLE SOURCE
1 1/20/2019 2/1/2019 START
1 2/2/2019 2/2/2019 START_BRA
1 2/3/2019 2/5/2019 ASSGN
1 2/6/2019 2/10/2019 CUST_START
1 2/11/2019 2/12/2019 ASSGN
1 2/11/2019 12/31/2999 END_DATE
1 1/1/3000 2/12/2019 END_DATE_BRA
For this example, expected results would be: (Removing the last row)
ID START_CYCLE END_CYCLE SOURCE
1 1/20/2019 2/1/2019 START
1 2/2/2019 2/2/2019 START_BRA
1 2/3/2019 2/5/2019 ASSGN
1 2/6/2019 2/10/2019 CUST_START
1 2/11/2019 2/12/2019 ASSGN
1 2/11/2019 12/31/2999 END_DATE
You can do it without a join, assuming that there is only 1 row for each id with source = 'END_DATE':
select * from tablename t
where start_cycle <= (select start_cycle from tablename where id = t.id and source = 'END_DATE')
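A small sketch to try this on the sample rows, assuming SQLite via Python's sqlite3 and a hypothetical table name cycles. Note that the dates are stored here as ISO 'YYYY-MM-DD' text: if the real columns hold strings like 2/11/2019, they would need converting to a proper date type first, otherwise the <= comparison sorts them alphabetically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cycles (id INTEGER, start_cycle TEXT, end_cycle TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO cycles VALUES (?, ?, ?, ?)",
    [
        (1, "2019-01-20", "2019-02-01", "START"),
        (1, "2019-02-02", "2019-02-02", "START_BRA"),
        (1, "2019-02-03", "2019-02-05", "ASSGN"),
        (1, "2019-02-06", "2019-02-10", "CUST_START"),
        (1, "2019-02-11", "2019-02-12", "ASSGN"),
        (1, "2019-02-11", "2999-12-31", "END_DATE"),
        (1, "3000-01-01", "2019-02-12", "END_DATE_BRA"),
    ],
)

QUERY = """
SELECT t.id, t.start_cycle, t.end_cycle, t.source
FROM cycles AS t
WHERE t.start_cycle <= (SELECT e.start_cycle
                        FROM cycles AS e
                        WHERE e.id = t.id AND e.source = 'END_DATE')
ORDER BY t.id, t.start_cycle, t.end_cycle
"""

kept = conn.execute(QUERY).fetchall()
kept_sources = [row[3] for row in kept]
```

Only the END_DATE_BRA row, which starts on 3000-01-01, is dropped.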
You can do that with a CTE. First query the START_CYCLE of the row with SOURCE = 'END_DATE' for each ID, then join that result back to the table and keep only the rows that start on or before it:
WITH id_end_date AS
(
SELECT id, start_cycle
FROM table1
WHERE source = 'END_DATE'
)
SELECT t.*
FROM table1 t
INNER JOIN id_end_date
ON t.id = id_end_date.id
WHERE t.start_cycle <= id_end_date.start_cycle
;
Below is the query, assuming there are multiple IDs in the table.
select t1.* from <tableName> t1 inner join (select * from <tableName> where source='END_DATE') t2
on t1.id=t2.id and t1.start_cycle <= t2.start_cycle;
I have 2 tables. If table 1 has a date greater than the matching dates in table 2, only those records should be populated in the output.
Table 1:
ID Category Date
1 A 3/2/1990
1 A 3/5/2013
1 C 4/3/1979
2 D 4/3/1970
2 D 5/6/2016
3 E 8/8/2016
Table 2:
ID Category Date
1 A 3/2/1990
1 C 4/3/1979
1 C 4/3/1982
1 D 4/3/1982
2 D 5/6/2016
The expected Output is
ID Category Date
1 A 3/5/2013
3 E 8/8/2016
I tried the below query and its giving me incorrect results.
select a.id,a.category,a.Date from table1 a where
a.Date > (select Max(b.Date) from table2 b where a.id=b.id and a.category =b.category group by b.id,b.category)
SQL Fiddle Demo
WITH cte AS (
SELECT ID, Category, MAX(Date) as mdate
FROM Table2
GROUP BY ID, Category
)
SELECT T1.* --, T2.*
FROM Table1 as T1
LEFT JOIN cte as T2
ON T1.ID = T2.ID
AND T1.Category = T2.Category
WHERE T1.Date > T2.mdate
OR T2.mdate is NULL
SELECT T1.*
FROM Table1 AS T1 INNER JOIN
(SELECT ID, Category, MAX(Date) AS mdate
FROM Table2
GROUP BY ID, Category) AS T2
ON T1.ID = T2.ID
AND T1.Category = T2.Category
WHERE T1.Date > T2.mdate;
As per the required output, you need to use left outer join
SELECT T1.*
FROM table1 T1
LEFT OUTER JOIN (
SELECT ID
,category
,MAX(Date) mdate
FROM Table2
GROUP BY ID
,category
) T2 ON (
T1.ID = T2.ID
AND T1.category = T2.category
)
WHERE T1.date > nvl(T2.mdate, DATE '1900-01-01');
Filtering Table2:
SELECT ID, Category,MAX(Date) as Date
FROM Table2
GROUP BY ID,Category;
| ID | Category | Date |
|----|----------|-------------------------|
| 1 | A | March, 02 1990 00:00:00 |
| 1 | C | April, 03 1982 00:00:00 |
| 1 | D | April, 03 1982 00:00:00 |
| 2 | D | May, 06 2016 00:00:00 |
Now using this to create a left join with Table1:
SELECT t1.*
FROM Table1 t1 LEFT JOIN
(SELECT ID, Category,MAX(Date) as Date
FROM Table2
GROUP BY ID,Category) AS t2part
ON t1.ID = t2part.ID
AND t1.Category = t2part.Category
WHERE t1.Date > t2part.Date;
| ID | Category | Date |
|----|----------|-------------------------|
| 1 | A | March, 05 2013 00:00:00 |
Please note that the row with ID=3, Category=E is not returned: it has no match in the join on ID and Category, so t2part.Date is NULL and the WHERE comparison filters it out.
As a good practice, if the two tables are meant to be joined, their keys should be normalized and indexed so the join can use the indexes.
fiddle with your provided data and queries.
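The difference between the two WHERE conditions discussed in these answers is easy to see side by side. A minimal sketch, assuming SQLite via Python's sqlite3; the column is renamed to event_date (my choice, to avoid any clash with the date() function) and the dates are stored as ISO text so they compare correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, category TEXT, event_date TEXT);
    CREATE TABLE table2 (id INTEGER, category TEXT, event_date TEXT);
    INSERT INTO table1 VALUES
        (1, 'A', '1990-03-02'), (1, 'A', '2013-03-05'), (1, 'C', '1979-04-03'),
        (2, 'D', '1970-04-03'), (2, 'D', '2016-05-06'), (3, 'E', '2016-08-08');
    INSERT INTO table2 VALUES
        (1, 'A', '1990-03-02'), (1, 'C', '1979-04-03'), (1, 'C', '1982-04-03'),
        (1, 'D', '1982-04-03'), (2, 'D', '2016-05-06');
    """
)

BASE = """
SELECT t1.id, t1.category, t1.event_date
FROM table1 AS t1
LEFT JOIN (
    SELECT id, category, MAX(event_date) AS mdate
    FROM table2
    GROUP BY id, category
) AS t2
  ON t1.id = t2.id AND t1.category = t2.category
WHERE {condition}
ORDER BY t1.id, t1.event_date
"""

# Comparison only: unmatched rows have mdate NULL and are filtered out.
strict = conn.execute(BASE.format(condition="t1.event_date > t2.mdate")).fetchall()

# Comparison plus an explicit NULL check: unmatched rows are kept.
with_unmatched = conn.execute(
    BASE.format(condition="t1.event_date > t2.mdate OR t2.mdate IS NULL")
).fetchall()
```

Only the second form returns the ID=3, Category=E row that the expected output asks for.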
Hi, I have an issue handling some data in SQL and returning values by the nearest date. I have two tables:
Table 1
ID Content Date
--------------------------------------------
123 X 2013-11-18
123 ZE 2013-11-29
233 YX 2013-12-30
233 XX 2013-12-28
444 Z 2014-02-24
Table 2
ID Value Validation Date
--------------------------------------------
123 0.54 2013-11-11
123 0.42 2013-11-18
123 0.32 2013-11-27
233 1.2 2013-12-4
233 1.1 2013-12-28
233 1.0 2013-12-29
444 4 2014-02-11
444 3 2014-02-15
444 2 2014-02-23
The output that I want is something like:
ID Content Date Value Validation Date
------------------------------------------------------------------------
123 X 2013-11-18 0.42 2013-11-18
123 ZE 2013-11-29 0.32 2013-11-27
233 YX 2013-12-30 1.0 2013-12-29
233 XX 2013-12-28 1.1 2013-12-28
444 Z 2014-02-24 2 2014-02-23
So I would like to return the value whose validation date is nearest to the date (the validation date must never be later than the date). Can you please help me? The ID in tables 1 and 2 is not unique.
You can use the following query:
SELECT ID, Content, [Date], Value, [Validation Date]
FROM (
SELECT t1.ID, Content, [Date], Value, [Validation Date],
ROW_NUMBER() OVER (PARTITION BY t1.ID, Content
ORDER BY DATEDIFF(d, [Validation Date], [Date])) AS rn
FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.ID = t2.ID AND [Validation Date] <= [Date]
) t
WHERE t.rn = 1
ROW_NUMBER() is used to track the record with the smallest [Date] -[Validation Date] difference per (ID, Content) pair of values.
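The same idea can be tried outside SQL Server. Below is a minimal sketch on SQLite via Python's sqlite3 (an assumption, since DATEDIFF and bracketed names are T-SQL). The columns are renamed event_date and validation_date for convenience, and the ordering uses validation_date DESC, which picks the same row as the smallest date difference once later validation dates are filtered out by the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, content TEXT, event_date TEXT);
    CREATE TABLE table2 (id INTEGER, value REAL, validation_date TEXT);
    INSERT INTO table1 VALUES
        (123, 'X',  '2013-11-18'), (123, 'ZE', '2013-11-29'),
        (233, 'YX', '2013-12-30'), (233, 'XX', '2013-12-28'),
        (444, 'Z',  '2014-02-24');
    INSERT INTO table2 VALUES
        (123, 0.54, '2013-11-11'), (123, 0.42, '2013-11-18'), (123, 0.32, '2013-11-27'),
        (233, 1.2,  '2013-12-04'), (233, 1.1,  '2013-12-28'), (233, 1.0,  '2013-12-29'),
        (444, 4,    '2014-02-11'), (444, 3,    '2014-02-15'), (444, 2,    '2014-02-23');
    """
)

QUERY = """
SELECT id, content, event_date, value, validation_date
FROM (
    SELECT t1.id AS id,
           t1.content AS content,
           t1.event_date AS event_date,
           t2.value AS value,
           t2.validation_date AS validation_date,
           -- rank candidate validations per (id, content), latest first
           row_number() OVER (PARTITION BY t1.id, t1.content
                              ORDER BY t2.validation_date DESC) AS rn
    FROM table1 AS t1
    JOIN table2 AS t2
      ON t1.id = t2.id AND t2.validation_date <= t1.event_date
) AS ranked
WHERE rn = 1
ORDER BY id, event_date
"""

nearest = conn.execute(QUERY).fetchall()
```

One thing to keep in mind: because this is an inner join, a table1 row with no earlier validation date would not appear in the result at all.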
try this :
SELECT a.id,
a.content,
a.date,
b.value,
b.validationdate
FROM (select tt.id,
tt.content,
tt.date,
row_number() over(partition by tt.id order by tt.date desc) rn
from table1 tt) a
JOIN (select t.id,
t.value,
t.validationdate,
row_number() over(partition by t.id order by t.validationdate desc) rn
from table2 t) b
on a.id=b.id and a.rn=b.rn
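I think one way to do this is with correlated subqueries, one per column, because a scalar subquery can return only one value. Something like this:
SELECT a.id, a.content, a.date,
(SELECT TOP 1 b.value
FROM table2 b
WHERE b.id=a.id AND b.validate <= a.date
ORDER BY b.validate DESC) AS value,
(SELECT TOP 1 b.validate
FROM table2 b
WHERE b.id=a.id AND b.validate <= a.date
ORDER BY b.validate DESC) AS validate
FROM table1 a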
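To show how the correlated form behaves, here is a rough sketch on SQLite via Python's sqlite3 (an assumption: LIMIT 1 stands in for TOP 1, and the columns are renamed event_date and validation_date). The row with id 555 is made up to show what happens when no earlier validation exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, content TEXT, event_date TEXT);
    CREATE TABLE table2 (id INTEGER, value REAL, validation_date TEXT);
    INSERT INTO table1 VALUES
        (123, 'X',  '2013-11-18'), (123, 'ZE', '2013-11-29'),
        (233, 'YX', '2013-12-30'), (233, 'XX', '2013-12-28'),
        (444, 'Z',  '2014-02-24'),
        (555, 'Q',  '2014-03-01');  -- hypothetical row with no validations
    INSERT INTO table2 VALUES
        (123, 0.54, '2013-11-11'), (123, 0.42, '2013-11-18'), (123, 0.32, '2013-11-27'),
        (233, 1.2,  '2013-12-04'), (233, 1.1,  '2013-12-28'), (233, 1.0,  '2013-12-29'),
        (444, 4,    '2014-02-11'), (444, 3,    '2014-02-15'), (444, 2,    '2014-02-23');
    """
)

QUERY = """
SELECT a.id,
       a.content,
       a.event_date,
       -- latest validation on or before the event date: its value ...
       (SELECT b.value
        FROM table2 AS b
        WHERE b.id = a.id AND b.validation_date <= a.event_date
        ORDER BY b.validation_date DESC
        LIMIT 1) AS value,
       -- ... and its date (same filter and ordering, second column)
       (SELECT b.validation_date
        FROM table2 AS b
        WHERE b.id = a.id AND b.validation_date <= a.event_date
        ORDER BY b.validation_date DESC
        LIMIT 1) AS validation_date
FROM table1 AS a
ORDER BY a.id, a.event_date
"""

matched = conn.execute(QUERY).fetchall()
```

Unlike an inner join, this keeps every table1 row and fills the two looked-up columns with NULL when nothing qualifies.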
I think the best approach is to use outer apply:
select t1.id, t1.content, t1.date, t2.value, t2.validdate
from table1 t1 outer apply
(SELECT TOP 1 t2.value, t2.validdate
FROM table2 t2
WHERE t2.id = t1.id AND t2.validdate <= t1.date
ORDER BY t2.validdate DESC
) t2;