Use a calendar table to generate a historical view of the data - SQL

One of my tables has a created_date (timestamp) column and a duration column for a project. I need to join it with another table that has a month_start_date column holding the first day of each month, plus other relevant information.
Table 1
id project_id created_date duration
1 12345 01/01/2015 10
2 12345 20/10/2015 11
3 12345 10/04/2016 13
4 12345 10/08/2016 15
Table 2
project_id month_start_date
12345 01/01/2015
12345 01/02/2015
12345 01/03/2015
12345 01/04/2015
...
12345 01/08/2016
Expected result
project_id month_start_date duration
12345 01/01/2015 10
12345 01/02/2015 10
...
12345 01/10/2015 11
12345 01/11/2015 11
...
12345 01/04/2016 13
12345 01/05/2016 13
12345 01/06/2016 13
...
12345 01/08/2016 15
I want to present the data listed in my second table historically. Basically, I want the query to return, for each month_start_date, the duration in effect at that time, so that each value repeats until the next month where dateadd(month,datediff(month,0,created_date),0) = month_start_date is met, and so forth.
This is my query:
select table2.project_name,
table2.month_start_date,
table1.duration,
table1.created_date
from table1 left outer join table2
on table1.project_id=table2.project_id
where dateadd(month,datediff(month,0,table1.created_date),0)<=table2.month_start_date
group by table2.project_name,table2.month_start_date,table1.duration,table1.created_date
order by table2.month_start_date asc
But I get repeated records:
Result I'm getting
project_id month_start_date duration
12345 01/01/2015 10
12345 01/02/2015 10
...
12345 01/10/2015 10
12345 01/10/2015 11
...
12345 01/04/2016 10
12345 01/04/2016 11
12345 01/04/2016 13
...
12345 01/08/2016 10
12345 01/08/2016 11
12345 01/08/2016 13
12345 01/08/2016 15
Can anyone help?
Thank you!

I'd use the CROSS APPLY (or OUTER APPLY) operator.
Here is one possible variant. For each row in your calendar table Table2 (that is, for each month), the correlated subquery inside the CROSS APPLY finds one row from Table1: the row with the same project_id and the latest created_date before month_start_date plus one month. If a month has no earlier row in Table1, CROSS APPLY drops that month; use OUTER APPLY instead to keep it with a NULL duration.
SELECT
Table2.project_id
,Table2.month_start_date
,Durations.duration
FROM
Table2
CROSS APPLY
(
SELECT TOP(1) Table1.duration
FROM Table1
WHERE
Table1.project_id = Table2.project_id
AND Table1.created_date < DATEADD(month, 1, Table2.month_start_date)
ORDER BY Table1.created_date DESC
) AS Durations
;
Make sure that Table1 has an index on (project_id, created_date) INCLUDE (duration); otherwise, performance will be poor.
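To try the same "latest row before the end of the month" lookup without SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no CROSS APPLY, so it assumes a correlated scalar subquery with ORDER BY ... LIMIT 1 as a stand-in; that behaves like OUTER APPLY (a month with no earlier row would get NULL). Dates are stored as ISO yyyy-mm-dd text so they compare correctly, the calendar rows come from a recursive CTE, and the index name is made up for illustration. SQLite has no INCLUDE clause, so duration is appended to the index key to make it covering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, project_id INTEGER, created_date TEXT, duration INTEGER);
INSERT INTO table1 VALUES
  (1, 12345, '2015-01-01', 10),
  (2, 12345, '2015-10-20', 11),
  (3, 12345, '2016-04-10', 13),
  (4, 12345, '2016-08-10', 15);

-- Hypothetical index name; duration is a trailing key column so the
-- lookup below can be answered from the index alone.
CREATE INDEX ix_table1_project_created ON table1 (project_id, created_date, duration);

-- Calendar table: one row per month from 2015-01-01 to 2016-08-01.
CREATE TABLE table2 (project_id INTEGER, month_start_date TEXT);
WITH RECURSIVE months(d) AS (
  SELECT '2015-01-01'
  UNION ALL
  SELECT date(d, '+1 month') FROM months WHERE d < '2016-08-01'
)
INSERT INTO table2 SELECT 12345, d FROM months;
""")

# For each month, take the duration of the latest table1 row created
# before the first day of the following month.
rows = conn.execute("""
SELECT t2.project_id,
       t2.month_start_date,
       (SELECT t1.duration
          FROM table1 AS t1
         WHERE t1.project_id = t2.project_id
           AND t1.created_date < date(t2.month_start_date, '+1 month')
         ORDER BY t1.created_date DESC
         LIMIT 1) AS duration
  FROM table2 AS t2
 ORDER BY t2.month_start_date
""").fetchall()

by_month = {month: duration for _, month, duration in rows}
for row in rows:
    print(row)
```

Because each month is matched to exactly one table1 row, the duplicates produced by the original <= join cannot occur.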

Related

Update SQL table date based on column in another table

I have a table like this:
ID start_date end_date
1 09/01/2022
1 09/04/2022
2 09/01/2022
I have another reference table like this:
ID date owner
1 09/01/2022 null
1 09/02/2022 null
1 09/03/2022 Joe
1 09/04/2022 null
1 09/05/2022 Jack
2 09/01/2022 null
2 09/02/2022 John
2 09/03/2022 John
2 09/04/2022 John
For every ID and start_date in the first table, I need to find the earliest row in the reference table that occurs after start_date and has a non-null owner. Then I need to write that row's date into end_date in the first table.
Below is the output that I want:
ID start_date end_date
1 09/01/2022 09/03/2022
1 09/04/2022 09/05/2022
2 09/01/2022 09/02/2022
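One way to do this is a correlated subquery inside the UPDATE that picks the earliest non-null-owner date after start_date. The sketch below runs it in SQLite through Python's sqlite3 module; the table names periods and reference_dates are hypothetical stand-ins for the two tables, and the dates are rewritten as ISO yyyy-mm-dd (assuming the original format is mm/dd/yyyy) so they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE periods (id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO periods (id, start_date) VALUES
  (1, '2022-09-01'), (1, '2022-09-04'), (2, '2022-09-01');

CREATE TABLE reference_dates (id INTEGER, date TEXT, owner TEXT);
INSERT INTO reference_dates VALUES
  (1, '2022-09-01', NULL), (1, '2022-09-02', NULL), (1, '2022-09-03', 'Joe'),
  (1, '2022-09-04', NULL), (1, '2022-09-05', 'Jack'),
  (2, '2022-09-01', NULL), (2, '2022-09-02', 'John'),
  (2, '2022-09-03', 'John'), (2, '2022-09-04', 'John');
""")

# end_date = first reference date strictly after start_date whose owner is set.
conn.execute("""
UPDATE periods
   SET end_date = (SELECT MIN(r.date)
                     FROM reference_dates AS r
                    WHERE r.id = periods.id
                      AND r.date > periods.start_date
                      AND r.owner IS NOT NULL)
""")

rows = conn.execute(
    "SELECT id, start_date, end_date FROM periods ORDER BY id, start_date"
).fetchall()
print(rows)
```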

How to get the last day of the month without LAST_DAY() or EOMONTH()?

I have a table t with:
DATE LOCATION PRODUCT_ID AMOUNT
2021-10-29 1 123 10
2021-10-30 1 123 9
2021-10-31 1 123 8
2021-10-29 1 456 100
2021-10-30 1 456 90
2021-10-31 1 456 80
2021-10-29 2 123 18
2021-10-30 2 123 17
2021-11-29 2 456 18
I need to find the AMOUNT on the last day of each month for each combination of LOCATION + PRODUCT_ID.
If a PRODUCT_ID has no entry for that day the AMOUNT is NULL.
So the result should look like:
DATE LOCATION PRODUCT_ID AMOUNT
2021-10-31 1 123 8
2021-10-31 1 456 80
2021-10-31 2 123 NULL
2021-11-30 2 456 NULL
Sadly EXASOL has no LAST_DAY() or EOMONTH() function. How can I solve this?
You can get the last day of the month by combining date_trunc with date_add: truncate to the first of the month, add one month, then subtract one day:
case
when t.date = date_add('day', -1, date_add('month', 1, date_trunc('month', t.date)))
then 'Y' else 'N' end as end_of_month
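The same truncate, add a month, subtract a day idea can be checked in SQLite, whose date() modifiers 'start of month', '+1 month' and '-1 day' play the roles of date_trunc and date_add here. This is a sketch in SQLite via Python's sqlite3, not Exasol syntax; last_day_of_month and is_end_of_month are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def last_day_of_month(d: str) -> str:
    # Truncate to the 1st, add one month, then step back one day.
    return conn.execute(
        "SELECT date(?, 'start of month', '+1 month', '-1 day')", (d,)
    ).fetchone()[0]


def is_end_of_month(d: str) -> str:
    # Same Y/N flag as the CASE expression above.
    return "Y" if d == last_day_of_month(d) else "N"


print(last_day_of_month("2021-11-29"))  # 2021-11-30
```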
That said, if you simply group your table by location and product, you will not get the NULLs shown in your output table for products without an entry on the last day of the month.
When you group your data, combinations that do not exist simply do not show up in the output. If you want to force NULLs to show up, you can create a new table that contains every combination of product, location, and hard-coded end-of-month date.
Then, left join your original table to this new table (with the new table on the left) by date, location, and product. This gives you the NULL values you expect.
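A minimal sketch of that scaffold-plus-LEFT JOIN approach, again in SQLite via Python's sqlite3 rather than Exasol. As a variation on hard-coding the dates, it assumes the scaffold can be derived from the data: one row per location, product, and month end that appears in t. The DATE column is renamed dt here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (dt TEXT, location INTEGER, product_id INTEGER, amount INTEGER);
INSERT INTO t VALUES
  ('2021-10-29', 1, 123, 10), ('2021-10-30', 1, 123, 9), ('2021-10-31', 1, 123, 8),
  ('2021-10-29', 1, 456, 100), ('2021-10-30', 1, 456, 90), ('2021-10-31', 1, 456, 80),
  ('2021-10-29', 2, 123, 18), ('2021-10-30', 2, 123, 17),
  ('2021-11-29', 2, 456, 18);
""")

rows = conn.execute("""
WITH scaffold AS (
  -- One row per location/product/month end seen in the data.
  SELECT DISTINCT date(dt, 'start of month', '+1 month', '-1 day') AS month_end,
         location, product_id
    FROM t
)
SELECT s.month_end, s.location, s.product_id, t.amount
  FROM scaffold AS s
  LEFT JOIN t
    ON t.dt = s.month_end
   AND t.location = s.location
   AND t.product_id = s.product_id
 ORDER BY s.location, s.product_id, s.month_end
""").fetchall()

for row in rows:
    print(row)  # amount is None where there is no month-end entry
```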

In PostgreSQL, how do I use joins with multiple conditions including >= and <=

I have table A and table B. Each row in table A represents every time a user sends a message. Each row in table B represents every time a user buys a gift.
Goal: for each time a user sends a message, calculate how many gifts they purchased within the 7 days before the timestamp of that message. Some users never send messages and some never purchase gifts. If a user in table A didn't purchase any gift within those 7 days, the count should be 0.
Table A:
user_id time
12345 2021-09-04 09:43:55
12345 2021-09-03 00:39:30
12345 2021-09-02 03:26:07
12345 2021-09-05 15:48:34
23456 2021-09-09 09:06:22
23456 2021-09-08 08:06:21
00001 2021-09-03 15:38:15
00002 2021-09-03 15:38:15
Table B:
user_id time
12345 2021-09-01 09:43:55
12345 2021-08-03 00:42:30
12345 2021-09-03 02:16:07
00003 2021-09-05 15:48:34
23456 2021-09-03 09:06:22
23456 2021-09-10 08:06:21
Expected output:
user_id time count
12345 2021-09-04 09:43:55 2
12345 2021-09-03 00:39:30 1
12345 2021-09-02 03:26:07 1
12345 2021-09-05 15:48:34 2
23456 2021-09-09 09:06:22 1
23456 2021-09-08 08:06:21 1
00001 2021-09-03 15:38:15 0
00002 2021-09-03 15:38:15 0
Query I tried:
SELECT A.user_id, A.time, coalesce(count(*), 0) as count
FROM A
LEFT JOIN B ON A.user_id = B.user_id AND B.time >= A.time - INTERVAL '7 days' AND B.time < A.time
GROUP BY 1,2
However, the counts returned don't match the expected result, and I'm not sure whether I'm writing the join and its conditions correctly.
You need to count a column from the side that can be NULL, i.e. from table B, to get 0 for messages with no matching gift. With a LEFT JOIN, an unmatched row of A still produces one output row (with NULLs in B's columns), so COUNT(*) returns 1 for it, whereas COUNT(b.column_from_b_table) ignores NULLs and returns 0. See the modified query below:
SELECT
A.user_id,
A.time,
coalesce(count(B.user_id), 0) as count
FROM A
LEFT JOIN B ON A.user_id = B.user_id AND
B.time >= A.time - INTERVAL '7 days' AND
B.time < A.time
GROUP BY 1,2;
user_id time count
1 2021-09-03T15:38:15.000Z 0
12345 2021-09-05T15:48:34.000Z 2
23456 2021-09-08T08:06:21.000Z 1
12345 2021-09-04T09:43:55.000Z 2
12345 2021-09-03T00:39:30.000Z 1
23456 2021-09-09T09:06:22.000Z 1
2 2021-09-03T15:38:15.000Z 0
12345 2021-09-02T03:26:07.000Z 1
Let me know if this works for you.
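The difference between COUNT(*) and COUNT(b.user_id) can be reproduced in SQLite through Python's sqlite3 module. This sketch assumes timestamps stored as 'YYYY-MM-DD HH:MM:SS' text, so datetime(a.time, '-7 days') stands in for PostgreSQL's A.time - INTERVAL '7 days'; user_id is kept as text so the leading zeros survive. The helper counts() is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (user_id TEXT, time TEXT);
INSERT INTO a VALUES
  ('12345', '2021-09-04 09:43:55'), ('12345', '2021-09-03 00:39:30'),
  ('12345', '2021-09-02 03:26:07'), ('12345', '2021-09-05 15:48:34'),
  ('23456', '2021-09-09 09:06:22'), ('23456', '2021-09-08 08:06:21'),
  ('00001', '2021-09-03 15:38:15'), ('00002', '2021-09-03 15:38:15');
CREATE TABLE b (user_id TEXT, time TEXT);
INSERT INTO b VALUES
  ('12345', '2021-09-01 09:43:55'), ('12345', '2021-08-03 00:42:30'),
  ('12345', '2021-09-03 02:16:07'), ('00003', '2021-09-05 15:48:34'),
  ('23456', '2021-09-03 09:06:22'), ('23456', '2021-09-10 08:06:21');
""")

QUERY = """
SELECT a.user_id, a.time, COUNT({expr}) AS cnt
  FROM a
  LEFT JOIN b
    ON a.user_id = b.user_id
   AND b.time >= datetime(a.time, '-7 days')
   AND b.time < a.time
 GROUP BY a.user_id, a.time
"""


def counts(count_expr):
    # Map (user_id, time) -> count for the given COUNT(...) argument.
    return {(u, t): n for u, t, n in conn.execute(QUERY.format(expr=count_expr))}


star = counts("*")           # unmatched messages wrongly count 1
fixed = counts("b.user_id")  # NULLs from the LEFT JOIN are ignored
print(star[("00001", "2021-09-03 15:38:15")], fixed[("00001", "2021-09-03 15:38:15")])  # 1 0
```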

Generate sequence based on the value in the previous row and current row

I have the table below with student information.
S_ID Group_ID Date Score
12345 1 1/1/2015 1
12345 1 2/1/2015 2
12345 1 3/1/2015 4
12345 1 4/1/2015 5
12345 1 9/1/2015 3
12345 1 10/1/2015 8
12345 2 1/1/2015 2
12345 2 2/1/2015 4
12345 2 3/1/2015 6
For a few students, I want to generate a new table with an added Sequence column, as shown below:
S_ID Group_ID Date Score Sequence
12345 1 1/1/2015 1 1
12345 1 2/1/2015 2 2
12345 1 3/1/2015 4 3
12345 1 4/1/2015 5 4
12345 1 9/1/2015 3 3
12345 1 10/1/2015 8 4
12345 2 1/1/2015 2 2
12345 2 2/1/2015 4 3
12345 2 3/1/2015 6 4
Rules:
1. Sequence should be generated for each combination of S_ID and Group_ID.
2. For the first record, the sequence number is the same as the Score.
3. From the second record onwards, it is 1 + the previous sequence number.
4. If the difference between the dates of the previous row and the current row is more than 100 days, the sequence number restarts (same as the Score for that record).
This is a large table, and I am looking for the most optimized SQL. Any help would be greatly appreciated.
The trick here is to find where the sequence numbers start over: for each new student and group, and whenever the gap from the previous date is too big. For the latter, you can use lag() to calculate a "new dates start" flag and then take a cumulative sum of it to get a grouping.
select t.*,
(first_value(score) over (partition by s_id, group_id, grp order by date) +
row_number() over (partition by s_id, group_id, grp order by date) - 1
) as sequence
from (select t.*,
sum(case when prev_date is null or prev_date < date - 100
then 1 else 0
end) over (partition by s_id, group_id order by date) as grp
from (select t.*,
lag(date) over (partition by s_id, group_id order by date) as prev_date
from t
) t
) t;
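The lag / running-sum / first_value pattern above can be tried in SQLite (3.25 or later, which added window functions) through Python's sqlite3 module. This sketch assumes ISO dates in a column renamed dt, and uses julianday() differences in place of date - 100, which is dialect-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (s_id INTEGER, group_id INTEGER, dt TEXT, score INTEGER);
INSERT INTO t VALUES
  (12345, 1, '2015-01-01', 1), (12345, 1, '2015-02-01', 2),
  (12345, 1, '2015-03-01', 4), (12345, 1, '2015-04-01', 5),
  (12345, 1, '2015-09-01', 3), (12345, 1, '2015-10-01', 8),
  (12345, 2, '2015-01-01', 2), (12345, 2, '2015-02-01', 4),
  (12345, 2, '2015-03-01', 6);
""")

rows = conn.execute("""
SELECT s_id, group_id, dt, score,
       -- restart value of the island plus position within it
       FIRST_VALUE(score) OVER (PARTITION BY s_id, group_id, grp ORDER BY dt)
         + ROW_NUMBER() OVER (PARTITION BY s_id, group_id, grp ORDER BY dt)
         - 1 AS sequence
  FROM (
    SELECT x.*,
           -- running count of restart flags = island number
           SUM(CASE WHEN prev_dt IS NULL
                      OR julianday(dt) - julianday(prev_dt) > 100
                    THEN 1 ELSE 0 END)
             OVER (PARTITION BY s_id, group_id ORDER BY dt) AS grp
      FROM (
        SELECT t.*,
               LAG(dt) OVER (PARTITION BY s_id, group_id ORDER BY dt) AS prev_dt
          FROM t
      ) AS x
  ) AS y
 ORDER BY s_id, group_id, dt
""").fetchall()

for row in rows:
    print(row)
```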

Update the list of dates to have the same day

I have this in my table
TempTable
Id Date
1 1-15-2010
2 2-14-2010
3 3-14-2010
4 4-15-2010
I would like to change every record so that they all have the same day, the 15th, like this:
TempTable
Id Date
1 1-15-2010
2 2-15-2010 <--change to 15
3 3-15-2010 <--change to 15
4 4-15-2010
What if I want the 30th?
The records should be:
TempTable
Id Date
1 1-30-2010
2 2-28-2010 <--change to 28 because feb has 28 days only
3 3-30-2010 <--change to 30
4 4-30-2010
Thanks
You can play some fun tricks with DATEADD/DATEDIFF:
create table T (
ID int not null,
DT date not null
)
insert into T (ID,DT)
select 1,'20100115' union all
select 2,'20100214' union all
select 3,'20100314' union all
select 4,'20100415'
SELECT ID,DATEADD(month,DATEDIFF(month,'20100101',DT),'20100115')
from T
SELECT ID,DATEADD(month,DATEDIFF(month,'20100101',DT),'20100130')
from T
Results:
ID
----------- -----------------------
1 2010-01-15 00:00:00.000
2 2010-02-15 00:00:00.000
3 2010-03-15 00:00:00.000
4 2010-04-15 00:00:00.000
ID
----------- -----------------------
1 2010-01-30 00:00:00.000
2 2010-02-28 00:00:00.000
3 2010-03-30 00:00:00.000
4 2010-04-30 00:00:00.000
Basically, you pass the same date part (here, month) to both DATEADD and DATEDIFF. DATEDIFF counts the months from the first date constant ('20100101') to your value, and DATEADD adds that many months to the second constant ('20100130'). The difference between the two constants is the "offset" that overwrites the portion of the date you're not keeping; when the target month is too short, as February is for day 30, DATEADD returns that month's last day. My usual example is removing the time portion from a datetime value:
SELECT DATEADD(day,DATEDIFF(day,'20010101',<date column>),'20010101')
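To make the DATEADD/DATEDIFF arithmetic concrete, here is a Python sketch that models the two T-SQL functions as assumed here: DATEDIFF(month, ...) counts month boundaries crossed, and DATEADD(month, ...) clamps to the last day of a shorter target month. The helper names month_diff, add_months and snap_to_day are hypothetical.

```python
import calendar
from datetime import date


def month_diff(start: date, end: date) -> int:
    # Models DATEDIFF(month, start, end): month boundaries crossed, day ignored.
    return (end.year - start.year) * 12 + (end.month - start.month)


def add_months(d: date, n: int) -> date:
    # Models DATEADD(month, n, d): if the target month is shorter,
    # fall back to its last day (Jan 30, 2010 + 1 month -> Feb 28, 2010).
    y, m0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    m = m0 + 1
    last = calendar.monthrange(y, m)[1]
    return date(y, m, min(d.day, last))


def snap_to_day(d: date, anchor_target: date, anchor_base: date = date(2010, 1, 1)) -> date:
    # DATEADD(month, DATEDIFF(month, anchor_base, d), anchor_target)
    return add_months(anchor_target, month_diff(anchor_base, d))


dates = [date(2010, 1, 15), date(2010, 2, 14), date(2010, 3, 14), date(2010, 4, 15)]
fifteenth = [snap_to_day(d, date(2010, 1, 15)) for d in dates]
thirtieth = [snap_to_day(d, date(2010, 1, 30)) for d in dates]
print([d.isoformat() for d in thirtieth])
# ['2010-01-30', '2010-02-28', '2010-03-30', '2010-04-30']
```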
You can also try something like
UPDATE TempTable
SET [Date] = DATEADD(dd,15-day([Date]), DATEDIFF(dd,0,[Date]))
We have a function that calculates the first day of a month, so I just adapted it to calculate the 15th instead.
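The UPDATE above moves each date by (15 - day) days. A Python sketch of that arithmetic (shift_to_day is a hypothetical helper; the DATEDIFF(dd,0,...) part that strips the time is omitted because Python date values have no time) also shows one difference from the month-based trick: shifting by days does not clamp, so asking for the 30th in February spills into March.

```python
from datetime import date, timedelta


def shift_to_day(d: date, day: int) -> date:
    # Models DATEADD(dd, day - DAY(d), d): move by the difference in days.
    return d + timedelta(days=day - d.day)


print(shift_to_day(date(2010, 2, 14), 15))  # 2010-02-15
print(shift_to_day(date(2010, 2, 14), 30))  # 2010-03-02, not clamped to Feb 28
```

For the 15th this is harmless, since every month has a 15th; for days 29 to 31 the DATEADD(month, ...) version is the safer choice.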