Join status by dates in bigquery - sql

I have Table A
| SnapshotDate | Invoice ID |
|--------------|------------|
| 2022-09-11 | 1111 |
| 2022-09-12 | 1111 |
| 2022-09-13 | 1111 |
| 2022-09-14 | 1111 |
| 2022-09-15 | 1111 |
| 2022-09-16 | 1111 |
| 2022-09-17 | 1111 |
| 2022-09-18 | 1111 |
| 2022-09-19 | 1111 |
| 2022-09-20 | 1111 |
| 2022-09-21 | 1111 |
| 2022-09-22 | 1111 |
| 2022-09-23 | 1111 |
| 2022-09-24 | 1111 |
| 2022-09-25 | 1111 |
Table B
| Date | Invoice ID | Status |
|------|------------|--------|
| 2022-09-11 | 1111 | draft |
| 2022-09-15 | 1111 | outstanding |
| 2022-09-20 | 1111 | pending |
| 2022-09-24 | 1111 | paid |
And I want to establish a join by Invoice ID and Dates, to have this table result
| SnapshotDate | Invoice ID | Status |
|--------------|------------|--------|
| 2022-09-11 | 1111 | draft |
| 2022-09-12 | 1111 | draft |
| 2022-09-13 | 1111 | draft |
| 2022-09-14 | 1111 | draft |
| 2022-09-15 | 1111 | outstanding |
| 2022-09-16 | 1111 | outstanding |
| 2022-09-17 | 1111 | outstanding |
| 2022-09-18 | 1111 | outstanding |
| 2022-09-19 | 1111 | outstanding |
| 2022-09-20 | 1111 | pending |
| 2022-09-21 | 1111 | pending |
| 2022-09-22 | 1111 | pending |
| 2022-09-23 | 1111 | pending |
| 2022-09-24 | 1111 | paid |
| 2022-09-25 | 1111 | paid |
Here's what I've tried:
SELECT a.SnapshotDate, a.invoiceid, b.status
FROM a
LEFT JOIN b ON a.invoiceid = b.invoiceid
AND a.SnapshotDate<=b.Date

Consider the approach below (assuming your date-related columns are actually of DATE data type):
select a.*, status
from TableA as a
join (
select *,
ifnull(-1 + lead(date) over(partition by invoiceId order by date), current_date()) lastDate
from TableB
) as b
on a.invoiceId = b.invoiceId
and snapshotDate between date and lastDate
If applied to the sample data in your question, the output matches the desired result.
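To check the idea outside BigQuery, here is a minimal sketch that runs the same interval join against an in-memory SQLite database from Python. SQLite is only a stand-in (an assumption of this sketch): `date(..., '-1 day')` replaces BigQuery's `-1 +` date arithmetic, `date('now')` replaces `current_date()`, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (snapshotDate TEXT, invoiceId INTEGER);
CREATE TABLE TableB (date TEXT, invoiceId INTEGER, status TEXT);
INSERT INTO TableB VALUES
  ('2022-09-11', 1111, 'draft'),
  ('2022-09-15', 1111, 'outstanding'),
  ('2022-09-20', 1111, 'pending'),
  ('2022-09-24', 1111, 'paid');
""")
# Snapshot dates 2022-09-11 .. 2022-09-25, as in the question.
conn.executemany(
    "INSERT INTO TableA VALUES (?, 1111)",
    [(f"2022-09-{day:02d}",) for day in range(11, 26)],
)

query = """
SELECT a.snapshotDate, a.invoiceId, b.status
FROM TableA AS a
JOIN (
  SELECT *,
         -- last day a status is valid: the day before the next status starts,
         -- or today for the most recent status
         COALESCE(date(LEAD(date) OVER (PARTITION BY invoiceId ORDER BY date), '-1 day'),
                  date('now')) AS lastDate
  FROM TableB
) AS b
  ON a.invoiceId = b.invoiceId
 AND a.snapshotDate BETWEEN b.date AND b.lastDate
ORDER BY a.snapshotDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each status row is turned into a closed date range first, so every snapshot date falls into exactly one range per invoice.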

Related

How can I capture two time periods for a single status?

I have 3 columns like below:
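| ID | Status | Date |
|----|--------|------|
| 001 | T | 2022-09-27 |
| 001 | T | 2022-09-26 |
| 001 | T | 2022-09-25 |
| 001 | T | 2022-09-24 |
| 001 | T | 2022-09-23 |
| 001 | T | 2022-09-22 |
| 001 | T | 2022-09-21 |
| 001 | R | 2022-09-20 |
| 001 | R | 2022-09-19 |
| 001 | R | 2022-09-18 |
| 001 | R | 2022-09-17 |
| 001 | R | 2022-09-16 |
| 001 | T | 2022-09-15 |
| 001 | T | 2022-09-14 |
| 001 | T | 2022-09-13 |
| 001 | T | 2022-09-12 |
| 001 | T | 2022-09-11 |
| 001 | T | 2022-09-10 |
| 001 | T | 2022-09-09 |
| 001 | T | 2022-09-08 |
| 001 | T | 2022-09-07 |
| 001 | T | 2022-09-06 |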
How can I generate an output like this in Snowflake?
| ID | Status | Start_date | End_date |
|----|--------|------------|----------|
| 001 | T | 2022-09-21 | 2022-09-27 |
| 001 | T | 2022-09-06 | 2022-09-15 |
This is a "gaps and islands" class of problem. The easiest way is to use the MATCH_RECOGNIZE clause:
SELECT *
FROM test
MATCH_RECOGNIZE (
PARTITION BY ID
ORDER BY Date
MEASURES
CLASSIFIER() AS Status
,MIN(Date) AS Start_date
,MAX(Date) AS End_date
PATTERN (T+)
DEFINE T AS Status = 'T'
) AS mr;
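For engines without MATCH_RECOGNIZE, a common alternative for gaps-and-islands problems is the difference of two ROW_NUMBER() values. This is a different technique from the one above, sketched here against an in-memory SQLite database from Python (SQLite 3.25+ for window functions); the table name `test` is taken from the query above, and the `grp` column is a name made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID TEXT, Status TEXT, Date TEXT)")
# Sample data from the question: T on 09-06..09-15, R on 09-16..09-20, T on 09-21..09-27.
data = (
    [("001", "T", f"2022-09-{d:02d}") for d in range(6, 16)]
    + [("001", "R", f"2022-09-{d:02d}") for d in range(16, 21)]
    + [("001", "T", f"2022-09-{d:02d}") for d in range(21, 28)]
)
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", data)

query = """
SELECT ID, Status, MIN(Date) AS Start_date, MAX(Date) AS End_date
FROM (
  SELECT *,
         -- within one unbroken run of the same status, both row numbers
         -- advance together, so their difference stays constant
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date)
       - ROW_NUMBER() OVER (PARTITION BY ID, Status ORDER BY Date) AS grp
  FROM test
) AS t
WHERE Status = 'T'
GROUP BY ID, Status, grp
ORDER BY Start_date DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Grouping by `grp` gives one row per island of consecutive `T` dates, with its first and last date.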

SQL Server: Two Level Sort (Order/Group By??)

Everyone:
I have the following code which produced the below table:
SELECT DISTINCT a.[Date]
,a.[ID]
,a.[Account]
,a.[First_Last]
FROM [Table] AS a
WHERE [First_Last] = 1 OR
[First_Last] = (
SELECT MAX([First_Last])
FROM [Table] AS b
WHERE a.[ID] = b.[ID] AND a.[Account] = b.[Account]
)
ORDER BY [ID], [Account], [Date]
Date ID Account First_Last
10/31/2018 1111 45 1
1/29/2021 1111 45 4
9/29/2017 1111 753 1
9/28/2018 1111 753 2
9/29/2017 2222 481 1
1/31/2018 2222 481 2
10/31/2017 2222 488 1
1/31/2018 2222 488 2
11/30/2017 2222 582 1
1/31/2019 2222 582 3
2/28/2017 2222 621 1
2/28/2018 2222 621 2
6/30/2017 2222 1007 1
6/29/2018 2222 1007 2
But I need it to be ordered this way:
Date ID Account First_Last
9/29/2017 1111 753 1
9/28/2018 1111 753 2
10/31/2018 1111 45 1
1/29/2021 1111 45 4
2/28/2017 2222 621 1
2/28/2018 2222 621 2
6/30/2017 2222 1007 1
6/29/2018 2222 1007 2
9/29/2017 2222 481 1
1/31/2018 2222 481 2
10/31/2017 2222 488 1
1/31/2018 2222 488 2
11/30/2017 2222 582 1
1/31/2019 2222 582 3
Notice that the table I need is not sorted by Account. It is sorted by Date for each ID-Account combination. For example, for ID = 1111, Account 753 comes before Account 45 because 753's first date is 9/29/2017 and 45's first date is 10/31/2018. Since I do not want Account to be sorted, I tried to remove Account from the ORDER BY, but that put the Account numbers in random lines because of Date instead of grouping them together.
What am I missing?
Thank you.
You can use a window function to find the "first date" for each ID and Account combination, and sort by that before sorting by Date:
order by ID,
min([Date]) over(partition by ID, Account),
[Date]
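A minimal sketch of that ordering, run against an in-memory SQLite database from Python rather than SQL Server (an assumption of this sketch, needing SQLite 3.25+). The dates are rewritten as ISO strings so they sort correctly as text, the table name `t` and the `first_date` column are made up for illustration, and the extra `Account` tie-breaker is my addition. The window value is computed in a derived table; as far as I know SQL Server rejects ORDER BY expressions that are not in the select list when SELECT DISTINCT is used, so this shape may also be the easier one to combine with the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Date TEXT, ID INTEGER, Account INTEGER, First_Last INTEGER)")
# A subset of the rows from the question, with ISO-formatted dates.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("2018-10-31", 1111, 45, 1),
    ("2021-01-29", 1111, 45, 4),
    ("2017-09-29", 1111, 753, 1),
    ("2018-09-28", 1111, 753, 2),
    ("2017-09-29", 2222, 481, 1),
    ("2018-01-31", 2222, 481, 2),
    ("2017-02-28", 2222, 621, 1),
    ("2018-02-28", 2222, 621, 2),
])

query = """
SELECT Date, ID, Account, First_Last
FROM (
  SELECT *,
         -- earliest date of each ID/Account group
         MIN(Date) OVER (PARTITION BY ID, Account) AS first_date
  FROM t
) AS s
ORDER BY ID,
         first_date,  -- groups ordered by their first date
         Account,     -- tie-breaker when two accounts share a first date
         Date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For ID 1111 this lists account 753 before account 45, because 753's first date is earlier.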

How to restrict the upper limit of rows while doing join in SQL?

I have two tables: balance and calendar.
Balance :
Account Date Balance
1111 01/01/2014 100
1111 02/01/2014 156
1111 03/01/2014 300
1111 04/01/2014 300
1111 07/01/2014 468
1112 02/01/2014 300
1112 03/01/2014 300
1112 06/01/2014 300
1112 07/01/2014 350
1112 08/01/2014 400
1112 09/01/2014 450
1113 01/01/2014 30
1113 02/01/2014 40
1113 03/01/2014 45
1113 06/01/2014 45
1113 07/01/2014 60
1113 08/01/2014 50
1113 09/01/2014 20
1113 10/01/2014 10
Calendar
date business_day_ind
01/01/2014 N
02/01/2014 Y
03/01/2014 Y
04/01/2014 N
05/01/2014 N
06/01/2014 Y
07/01/2014 Y
08/01/2014 Y
09/01/2014 Y
10/01/2014 Y
I need to do the following:
I need to fill in the missing days for every account, up to the last day for which that account has a value. For example, account 1111 has values only up to 07/01/2014, so its dates need to be filled only up to that day. But when I join with the calendar table (plain left join), I am not able to restrict the maximum day to the last day available for each account:
1111 01/01/2014 100 N
1111 02/01/2014 156 Y
1111 03/01/2014 300 Y
1111 04/01/2014 300 Y
1111 05/01/2014 N
1111 06/01/2014 N
1111 07/01/2014 468 Y
1111 08/01/2014 Y
1111 09/01/2014 Y
1111 10/01/2014 Y
1112 01/01/2014 N
1112 02/01/2014 300 Y
1112 03/01/2014 300 Y
1112 04/01/2014 N
1112 05/01/2014 N
1112 06/01/2014 300 Y
1112 07/01/2014 350 Y
1112 08/01/2014 400 Y
1112 09/01/2014 450 Y
1112 10/01/2014 Y
I need an efficient way (preferably not involving multiple steps) to restrict the dates to each account's latest available balance date (07/01/2014 in the case of 1111, 09/01/2014 in the case of 1112).
Desired output:
1111 01/01/2014 100 N
1111 02/01/2014 156 Y
1111 03/01/2014 300 Y
1111 04/01/2014 300 Y
1111 05/01/2014 N
1111 06/01/2014 N
1111 07/01/2014 468 Y
1112 01/01/2014 N
1112 02/01/2014 300 Y
1112 03/01/2014 300 Y
1112 04/01/2014 N
1112 05/01/2014 N
1112 06/01/2014 300 Y
1112 07/01/2014 350 Y
1112 08/01/2014 400 Y
1112 09/01/2014 450 Y
After filling the missing days, I am planning to impute the balance of previous business day to the missing days. I am planning to get previous business day for every date and do an update to missing rows by joining the original balance table with acct and previous business day as key.
Thanks.
I am using Greenplum database.
A possible way would be to put a second select in a subquery. For instance:
select ... from calendar a left outer join balance b on a.date = b.date
where a.date <= (select max(date) from balance c where b.Account = c.Account )
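One thing to watch with a filter written against the left-joined table: on calendar rows with no matching balance, `b.Account` is NULL, so a condition that depends on it can discard exactly the rows that need filling. Below is a minimal sketch of a variant that derives each account's last date first and joins the calendar to that. It runs on an in-memory SQLite database from Python instead of Greenplum, reads the sample dates as the first ten days of January 2014 (both assumptions of this sketch), and uses only accounts 1111 and 1112; the alias `m` and column `max_date` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE balance (account INTEGER, date TEXT, balance INTEGER);
CREATE TABLE calendar (date TEXT, business_day_ind TEXT);
INSERT INTO balance VALUES
  (1111, '2014-01-01', 100), (1111, '2014-01-02', 156), (1111, '2014-01-03', 300),
  (1111, '2014-01-04', 300), (1111, '2014-01-07', 468),
  (1112, '2014-01-02', 300), (1112, '2014-01-03', 300), (1112, '2014-01-06', 300),
  (1112, '2014-01-07', 350), (1112, '2014-01-08', 400), (1112, '2014-01-09', 450);
INSERT INTO calendar VALUES
  ('2014-01-01', 'N'), ('2014-01-02', 'Y'), ('2014-01-03', 'Y'), ('2014-01-04', 'N'),
  ('2014-01-05', 'N'), ('2014-01-06', 'Y'), ('2014-01-07', 'Y'), ('2014-01-08', 'Y'),
  ('2014-01-09', 'Y'), ('2014-01-10', 'Y');
""")

query = """
SELECT m.account, c.date, b.balance, c.business_day_ind
FROM (
  -- last date with a balance, per account
  SELECT account, MAX(date) AS max_date
  FROM balance
  GROUP BY account
) AS m
JOIN calendar AS c
  ON c.date <= m.max_date          -- calendar rows only up to that date
LEFT JOIN balance AS b
  ON b.account = m.account
 AND b.date = c.date               -- NULL balance on the missing days
ORDER BY m.account, c.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the upper bound comes from `m` rather than from the left-joined `b`, the days without a balance are kept with a NULL balance.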
I suppose that you have a third table, accounts:
select
accounts.account,
calendar.date,
balance.balance,
calendar.business_day_ind
from
accounts cross join lateral (
select *
from calendar
where calendar.date <= (
select max(date)
from balance
where balance.account = accounts.account)) as calendar left join
balance on (balance.account = accounts.account and balance.date = calendar.date)
order by
accounts.account, calendar.date;
About lateral joins
That was a fun challenge!
CREATE TABLE balance
(account int, balance_date timestamp, balance int)
DISTRIBUTED BY (account, balance_date);
INSERT INTO balance
values (1111,'01/01/2014', 100),
(1111, '02/01/2014', 156),
(1111, '03/01/2014', 300),
(1111, '04/01/2014', 300),
(1111, '07/01/2014', 468),
(1112, '02/01/2014', 300),
(1112, '03/01/2014', 300),
(1112, '06/01/2014', 300),
(1112, '07/01/2014', 350),
(1112, '08/01/2014', 400),
(1112, '09/01/2014', 450),
(1113, '01/01/2014', 30),
(1113, '02/01/2014', 40),
(1113, '03/01/2014', 45),
(1113, '06/01/2014', 45),
(1113, '07/01/2014', 60),
(1113, '08/01/2014', 50),
(1113, '09/01/2014', 20),
(1113, '10/01/2014', 10);
CREATE TABLE calendar
(calendar_date timestamp, business_day_ind boolean)
DISTRIBUTED BY (calendar_date);
INSERT INTO calendar
values ('01/01/2014', false),
('02/01/2014', true),
('03/01/2014', true),
('04/01/2014', false),
('05/01/2014', false),
('06/01/2014', true),
('07/01/2014', true),
('08/01/2014', true),
('09/01/2014', true),
('10/01/2014', true);
analyze balance;
analyze calendar;
And now the query.
select d.account, d.my_date, b.balance, c.business_day_ind
from (
select account, start_date + interval '1 month' * (generate_series(0, duration)) AS my_date
from (
select account, start_date, (date_part('year', duration) * 12 + date_part('month', duration))::int as duration
from (
select start_date, age(end_date, start_date) as duration, account
from (
select account, min(balance_date) as start_date, max(balance_date) as end_date
from balance
group by account
) as sub1
) as sub2
) sub3
) as d
left outer join balance b on d.account = b.account and d.my_date = b.balance_date
join calendar c on c.calendar_date = d.my_date
order by d.account, d.my_date;
Results:
account | my_date | balance | business_day_ind
---------+---------------------+---------+------------------
1111 | 2014-01-01 00:00:00 | 100 | f
1111 | 2014-02-01 00:00:00 | 156 | t
1111 | 2014-03-01 00:00:00 | 300 | t
1111 | 2014-04-01 00:00:00 | 300 | f
1111 | 2014-05-01 00:00:00 | | f
1111 | 2014-06-01 00:00:00 | | t
1111 | 2014-07-01 00:00:00 | 468 | t
1112 | 2014-02-01 00:00:00 | 300 | t
1112 | 2014-03-01 00:00:00 | 300 | t
1112 | 2014-04-01 00:00:00 | | f
1112 | 2014-05-01 00:00:00 | | f
1112 | 2014-06-01 00:00:00 | 300 | t
1112 | 2014-07-01 00:00:00 | 350 | t
1112 | 2014-08-01 00:00:00 | 400 | t
1112 | 2014-09-01 00:00:00 | 450 | t
1113 | 2014-01-01 00:00:00 | 30 | f
1113 | 2014-02-01 00:00:00 | 40 | t
1113 | 2014-03-01 00:00:00 | 45 | t
1113 | 2014-04-01 00:00:00 | | f
1113 | 2014-05-01 00:00:00 | | f
1113 | 2014-06-01 00:00:00 | 45 | t
1113 | 2014-07-01 00:00:00 | 60 | t
1113 | 2014-08-01 00:00:00 | 50 | t
1113 | 2014-09-01 00:00:00 | 20 | t
1113 | 2014-10-01 00:00:00 | 10 | t
(25 rows)
I had to get the min and max dates for each account and then use generate_series to generate the months between the two dates. The query would have been a bit cleaner if you wanted a record for each day, but I had to use another subquery to get the results at a monthly level.

SQL - Compare rows by id, date and amount

I need to SELECT a row in which issue_date = maturity_date of another row with same id, and same amount_usd.
I tried a self join, but I do not get the right result.
Here is a simplified version of my table:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2010-01-01 00:00:00.000 2015-12-01 00:00:00.000 5000
1 2010-01-01 00:00:00.000 2001-09-19 00:00:00.000 700
2 2014-04-09 00:00:00.000 2019-04-09 00:00:00.000 400
1 2015-12-01 00:00:00.000 2016-12-31 00:00:00.000 5000
5 2015-02-24 00:00:00.000 2015-02-24 00:00:00.000 8000
4 2012-11-29 00:00:00.000 2015-11-29 00:00:00.000 10000
3 2015-01-21 00:00:00.000 2018-01-21 00:00:00.000 17500
2 2015-02-02 00:00:00.000 2015-12-05 00:00:00.000 12000
1 2015-01-12 00:00:00.000 2018-01-12 00:00:00.000 18000
2 2015-12-05 00:00:00.000 2016-01-10 00:00:00.000 12000
Result should be:
ID ISSUE_DATE MATURITY_DATE AMOUNT_USD
1 2015-12-01 00:00:00.000 2016-12-31 00:00:00.000 5000
2 2015-12-05 00:00:00.000 2016-01-10 00:00:00.000 12000
Thanks in advance!
Do the following: http://sqlfiddle.com/#!6/c0a02/1
select a.id, a.issue_date, a.maturity_date, a.amount_usd
from tbl a
inner join tbl b
on a.id = b.id
and a.maturity_date = b.issue_date
-- added to prevent same maturity date and issue date
where a.maturity_date <> a.issue_date
Output:
| id | issue_date | maturity_date | amount_usd |
|----|----------------------------|----------------------------|------------|
| 1 | January, 01 2010 00:00:00 | December, 01 2015 00:00:00 | 5000 |
| 2 | February, 02 2015 00:00:00 | December, 05 2015 00:00:00 | 12000 |
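Note that the rows in this output are the earlier rows of each pair, while the question's expected result lists the later rows (the ones whose issue_date equals another row's maturity_date) and also asks for the same amount_usd. A minimal sketch of that variant, run on an in-memory SQLite database from Python as an assumption; `rowid` is SQLite's implicit row identifier, used here only to keep a row from matching itself because the sample table has no unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, issue_date TEXT, maturity_date TEXT, amount_usd INTEGER)"
)
# Sample rows from the question, with the time part dropped.
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [
    (1, "2010-01-01", "2015-12-01", 5000),
    (1, "2010-01-01", "2001-09-19", 700),
    (2, "2014-04-09", "2019-04-09", 400),
    (1, "2015-12-01", "2016-12-31", 5000),
    (5, "2015-02-24", "2015-02-24", 8000),
    (4, "2012-11-29", "2015-11-29", 10000),
    (3, "2015-01-21", "2018-01-21", 17500),
    (2, "2015-02-02", "2015-12-05", 12000),
    (1, "2015-01-12", "2018-01-12", 18000),
    (2, "2015-12-05", "2016-01-10", 12000),
])

query = """
SELECT b.id, b.issue_date, b.maturity_date, b.amount_usd
FROM tbl AS a
JOIN tbl AS b
  ON b.id = a.id
 AND b.amount_usd = a.amount_usd       -- same amount
 AND b.issue_date = a.maturity_date    -- b starts when a matures
WHERE a.rowid <> b.rowid               -- a row must not match itself
ORDER BY b.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The row for id 5, whose issue and maturity dates are equal, is excluded because its only match is itself.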

PostgreSQL for each row from one table join all rows from another table

In table A I have dates from 2014-01-01 to 2014-12-31
action_date
2014-01-01
2014-01-02
2014-01-03
...
2014-12-31
In table B I have some information like
id name activation_date deletion_date
1 nik 2013-01-01 2014-02-03
2 tom 2014-06-02 2014-06-30
3 lola 2013-12-30 2014-01-01
I want to join each row from table B to every row of table A where activation_date <= action_date <= deletion_date
e.g.
action_date id name activation_date deletion_date
2014-01-01 1 nik 2013-01-01 2014-02-03
2014-01-01 3 lola 2013-12-30 2014-01-01
2014-01-02 1 nik 2013-01-01 2014-02-03
2014-01-03 1 nik 2013-01-01 2014-02-03
[...]
2014-02-03 1 nik 2013-01-01 2014-02-03
2014-06-02 2 tom 2014-06-02 2014-06-30
2014-06-03 2 tom 2014-06-02 2014-06-30
[...]
2014-06-30 2 tom 2014-06-02 2014-06-30
I tried to use a left join without an ON clause, only with a WHERE condition. Unfortunately, it's not working.
You can use the between operator in your join condition:
SELECT a.action_date, b.*
FROM b
JOIN a ON a.action_date BETWEEN b.activation_date AND b.deletion_date
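A minimal sketch of this range join on the sample data, using an in-memory SQLite database from Python in place of PostgreSQL (an assumption of this sketch). Dates are stored as ISO strings so BETWEEN compares them correctly, and the column names follow the answer's query (`activation_date`, `deletion_date`).

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (action_date TEXT);
CREATE TABLE b (id INTEGER, name TEXT, activation_date TEXT, deletion_date TEXT);
INSERT INTO b VALUES
  (1, 'nik',  '2013-01-01', '2014-02-03'),
  (2, 'tom',  '2014-06-02', '2014-06-30'),
  (3, 'lola', '2013-12-30', '2014-01-01');
""")
# One row per day of 2014, as in table A.
start = date(2014, 1, 1)
conn.executemany(
    "INSERT INTO a VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(365)],
)

query = """
SELECT a.action_date, b.id, b.name, b.activation_date, b.deletion_date
FROM b
JOIN a ON a.action_date BETWEEN b.activation_date AND b.deletion_date
ORDER BY a.action_date, b.id
"""
rows = conn.execute(query).fetchall()
for row in rows[:4]:
    print(row)
```

With an inner join, dates that fall inside no B range do not appear at all; starting from `a` with a LEFT JOIN would keep them with NULLs in the B columns.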