Cross-product of date ranges - sql

I have two tables containing date ranges that I want to cross-multiply so that I get all distinct sub-ranges, that is, all ranges whose boundaries come from one of the two tables.
Specifically I have a table with product prices and their validity dates as well as conversion factors with a validity date. I want, as a result, each instance of a specific price/conversion_factor combination and from when to when it was valid:
products:
product_id start_date end_date price_eur
1 2000-01-01 2000-12-31 100
1 2001-01-01 2002-12-31 150
conversion_factors:
start_date end_date dollar_to_eur
1970-01-01 2000-03-31 1.50
2000-04-01 2000-06-30 1.60
2000-07-01 2001-06-30 1.70
2001-07-01 2003-06-30 2.00
result:
product_id start_date end_date price_eur dollar_to_eur
1 2000-01-01 2000-03-31 100 1.50
1 2000-04-01 2000-06-30 100 1.60
1 2000-07-01 2000-12-31 100 1.70
1 2001-01-01 2001-06-30 150 1.70
1 2001-07-01 2002-12-31 150 2.00
So each time one of the tables hits a new date boundary, a new row should be returned. In the result, the first two rows reference the validity of the first product row, but split into two intervals by the conversion_factors table. Similarly, the third and fourth rows of the result come from the third conversion-factor row, but with different product rows.
Is there any way to do this with a clever join (in PostgreSQL) or do I need to use a PL/pgSQL function?

There are two parts to this: you ask for a smart join and you ask for displaying the correct result. This should answer both:
SELECT p.product_id
     , GREATEST(p.start_date, cf.start_date) AS start_date
     , LEAST(p.end_date, cf.end_date)        AS end_date
     , p.price_eur
     , cf.dollar_to_eur
FROM products AS p
JOIN conversion_factors AS cf
  ON p.start_date <= cf.end_date
 AND p.end_date   >= cf.start_date;
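If you prefer range types, the overlap test can be expressed with PostgreSQL's daterange and the && operator; this is just an equivalent sketch of the query above, assuming the same tables:
-- Same join, but using daterange to express the overlap condition.
SELECT p.product_id
     , GREATEST(p.start_date, cf.start_date) AS start_date
     , LEAST(p.end_date, cf.end_date)        AS end_date
     , p.price_eur
     , cf.dollar_to_eur
FROM products AS p
JOIN conversion_factors AS cf
  ON daterange(p.start_date, p.end_date, '[]')       -- '[]' makes both bounds inclusive
  && daterange(cf.start_date, cf.end_date, '[]');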

Related

Using WITH and UNION to compute number of flights and weather condition with two tables

Table A
date        flight  airport
2012-10-01  oneway  ATL, GA
2012-10-01  oneway  LAX, CA
2012-10-02  oneway  SAN, CA
2012-10-02  oneway  DTW, MI
2012-10-03  round   SFO, CA
2012-10-04  round   SFO, CA
2012-10-05  round   SFO, CA
Table B
date        temp  precip
2012-10-01  27    0.02
2012-10-02  35    0.00
2012-10-03  66    0.18
2012-10-04  57    0.00
2012-10-05  78    0.24
Table A has about 100k rows, whereas Table B has only about 60 rows.
I am trying to write a query that finds the total number of flights on cold days and on warm days, and that also tracks the number of cold and warm days.
A cold day is defined as one where temp from Table B is below (<) 40; otherwise it is a warm day.
In the real data there are 10 days in total that match between the two tables, so I need to account for that when aggregating. I tried to get the totals without using a CTE, but I keep getting wrong counts.
The expected outcome:
Days      Num_of_flight  Num_of_days
cold day  4              2
warm day  3              3
You need a LEFT join of TableB to TableA and aggregation on the result of a CASE expression that returns 'cold day' or 'warm day':
SELECT CASE WHEN b.temp < 40 THEN 'cold day' ELSE 'warm day' END AS Days,
       COUNT(*) AS Num_of_flight,
       COUNT(DISTINCT a.date) AS Num_of_days
FROM TableB b
LEFT JOIN TableA a ON a.date = b.date
GROUP BY Days;
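To try this out, a minimal setup reproducing the sample data could look like this (PostgreSQL syntax assumed, since the question does not name the database; table and column names as in the question):
-- Minimal reproduction of the sample data from the question.
CREATE TABLE TableA (date date, flight text, airport text);
CREATE TABLE TableB (date date, temp int, precip numeric);

INSERT INTO TableA VALUES
  ('2012-10-01', 'oneway', 'ATL, GA'),
  ('2012-10-01', 'oneway', 'LAX, CA'),
  ('2012-10-02', 'oneway', 'SAN, CA'),
  ('2012-10-02', 'oneway', 'DTW, MI'),
  ('2012-10-03', 'round',  'SFO, CA'),
  ('2012-10-04', 'round',  'SFO, CA'),
  ('2012-10-05', 'round',  'SFO, CA');

INSERT INTO TableB VALUES
  ('2012-10-01', 27, 0.02),
  ('2012-10-02', 35, 0.00),
  ('2012-10-03', 66, 0.18),
  ('2012-10-04', 57, 0.00),
  ('2012-10-05', 78, 0.24);
With these rows the query above returns 'cold day' with 4 flights over 2 days and 'warm day' with 3 flights over 3 days, matching the expected outcome.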

How to join two tables with a range of dates

I am using PostgreSQL, and I have these two tables, Sale and Royalty.
Sale
saleId  ItemId  price  createdAt
1       a       200    2022-08-17
2       b       400    2022-08-19
3       c       500    2022-09-04
Royalty
Id  rate  createdAt   deletedAt
1   0.25  2022-08-10  2022-08-20
2   0.15  2022-08-20  2022-09-01
3   0.20  2022-09-01  null
I want to join Sale and Royalty to produce the result below. The point is how to match each sale's createdAt against the royalty's validity period (createdAt to deletedAt) to pick the right rate.
Selected result
ItemId  rate*price      Sale.createdAt
a       50 (200*0.25)   2022-08-17
b       100 (400*0.25)  2022-08-19
c       100 (500*0.20)  2022-09-04
I don't want to hard-code a BETWEEN condition for every royalty row, since more rows could be added.
I'm considering maintaining a combined Sale-Royalty table to get rate*price easily,
but I wonder if there's a way to solve this with a join on this condition...
One approach is to utilize Postgres' daterange type with its <@ containment operator:
select
  s.*,
  r.rate,
  s.price * r.rate as value
from sale s
join royalty r on s.createdAt <@ daterange(r.createdAt, r.deletedAt);
Caveats:
If royalty date ranges overlap, this will multiply the returned rows (a sale with several valid royalty ranges will appear n times).
Replace the join with an outer (LEFT) join if you need sales even without royalties.
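As a sketch of that second caveat (same tables assumed), the outer-join variant would look like this; sales with no matching royalty period come back with NULL rate and value:
-- Keep all sales, even those without a matching royalty period.
select
  s.*,
  r.rate,
  s.price * r.rate as value
from sale s
left join royalty r on s.createdAt <@ daterange(r.createdAt, r.deletedAt);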

Calculating difference (or deltas) between current and previous row with clickhouse

Is there a way to SELECT (compute) the difference of a single column between consecutive rows? It would be awesome if there were a way to index rows during a query.
Let's say, something like the following query:
SELECT
    toStartOfDay(stamp) AS day,
    count(day) AS events,
    events[current] - events[previous] AS difference, -- how do I calculate this
    events[current] / events[previous] AS percent     -- and this
FROM records
GROUP BY day
ORDER BY day
I want to get the integer and percentage difference between the current row's 'events' column and the previous one for something similar to this:
day                  events  difference  percent
2022-01-06 00:00:00  197     NULL        NULL
2022-01-07 00:00:00  656     459         3.32
2022-01-08 00:00:00  15      -641        0.02
2022-01-09 00:00:00  7       -8          0.46
2022-01-10 00:00:00  137     130         19.5
My version of ClickHouse doesn't support window functions but, while reading about the LAG() function mentioned in the comments, I found neighbor(), which works perfectly for what I'm trying to do:
SELECT
    toStartOfDay(stamp) AS day,
    count(day) AS events,
    events - neighbor(events, -1) AS diff,
    events / neighbor(events, -1) AS perc
FROM records
GROUP BY day
ORDER BY day
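On ClickHouse versions that do support window functions (21.x and later), a rough equivalent can be written with lagInFrame; this is a sketch based on the documentation rather than part of the original answer, and the explicit ROWS frame is needed so the previous row is visible to the function:
-- Sketch for newer ClickHouse versions with window-function support.
SELECT
    toStartOfDay(stamp) AS day,
    count(day) AS events,
    events - lagInFrame(events) OVER (ORDER BY day ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS diff,
    events / lagInFrame(events) OVER (ORDER BY day ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS perc
FROM records
GROUP BY day
ORDER BY day
Like neighbor(), lagInFrame returns 0 rather than NULL when there is no previous row, so the first day's diff and perc differ slightly from the table shown above.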

Showing Two Fields With Different Timelines in the Same Date Structure

In the project I am currently working on at my company, I would like to show sales-related KPIs together with a Customer Score metric in SQL / Tableau / BigQuery.
The primary key is the order id in both tables. However, the order date and the date we measure the Customer Score may differ. For example, the sales information for an order placed in Feb 2020 will be aggregated in Feb 2020, but if the customer survey is made in March 2020, the Customer Score metric must be aggregated in March 2020. What I would like to achieve in the relational database is as follows:
Sales:
Order ID  Order Date (m/d/yyyy)  Sales ($)
1000      1/1/2021               1000
1001      2/1/2021               2000
1002      3/1/2021               1500
1003      4/1/2021               1700
1004      5/1/2021               1800
1005      6/1/2021               900
1006      7/1/2021               1600
1007      8/1/2021               1900
Customer Score Table:
Order ID  Customer Survey Date (m/d/yyyy)  Customer Score
1000      3/1/2021                         8
1001      3/1/2021                         7
1002      4/1/2021                         3
1003      6/1/2021                         6
1004      6/1/2021                         5
1005      7/1/2021                         3
1006      9/1/2021                         1
1007      8/1/2021                         7
Expected Output:
KPI                 Jan-21  Feb-21  Mar-21  Apr-21  May-21  June-21  July-21  Aug-21  Sep-21
Sales ($)           1000    2000    1500    1700    1800    900      1600     1900    -
AVG Customer Score  -       -       7.5     3       -       5.5      3        7       1
I couldn't find a way to do this, because order date and survey date may/may not be the same.
I think what you want to do is aggregate your results to the month level first, before joining, as opposed to joining on the ORDER_ID.
For example:
with order_month as (
  select date_trunc(order_date, MONTH) as KPI, sum(sales) as sales
  from `testing.sales`
  group by 1
),
customer_score_month as (
  select date_trunc(customer_survey_date, MONTH) as KPI, avg(customer_score) as avg_customer_score
  from `testing.customer_score`
  group by 1
)
select coalesce(order_month.KPI, customer_score_month.KPI) as KPI, sales, avg_customer_score
from order_month
full outer join customer_score_month
  on order_month.KPI = customer_score_month.KPI
order by 1 asc
Here, we aggregate the total sales for each month based on the order date, then we aggregate the average customer score for each month based on the date the score was submitted. Now we can join these two on the month value.
This results in a table like this:
KPI         sales  avg_customer_score
2021-01-01  1000   null
2021-02-01  2000   null
2021-03-01  1500   7.5
2021-04-01  1700   3.0
2021-05-01  1800   null
2021-06-01  900    5.5
2021-07-01  1600   3.0
2021-08-01  1900   7.0
2021-09-01  null   1.0
You can pivot the results of this table in Tableau, or leverage a CASE statement to pull out each month into its own column; I can elaborate more if that would be helpful.
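As a rough sketch of that CASE-statement pivot (not part of the original answer; it assumes the order_month and customer_score_month CTEs defined above are appended to the same WITH clause, and only the first three months are spelled out, the rest follow the same pattern):
-- Hypothetical pivot of the monthly result into one column per month.
with monthly as (
  select coalesce(order_month.KPI, customer_score_month.KPI) as month_start,
         sales,
         avg_customer_score
  from order_month
  full outer join customer_score_month
    on order_month.KPI = customer_score_month.KPI
)
select 'Sales($)' as KPI,
       sum(case when extract(month from month_start) = 1 then sales end) as Jan_21,
       sum(case when extract(month from month_start) = 2 then sales end) as Feb_21,
       sum(case when extract(month from month_start) = 3 then sales end) as Mar_21
from monthly
union all
select 'AVG Customer Score',
       avg(case when extract(month from month_start) = 1 then avg_customer_score end),
       avg(case when extract(month from month_start) = 2 then avg_customer_score end),
       avg(case when extract(month from month_start) = 3 then avg_customer_score end)
from monthly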

Last 3 months average next to current month value in hive

I have a table that holds the monthly sales values for each item. I need the last 3 months' average sales value next to the current month's sales for each item.
I need to perform this operation in Hive.
The sample input table looks like below
Item_ID Sales Month
A 4295 Dec-2018
A 245 Nov-2018
A 1337 Oct-2018
A 3290 Sep-2018
A 2000 Aug-2018
B 856 Dec-2018
B 1694 Nov-2018
B 4286 Oct-2018
B 2780 Sep-2018
B 3100 Aug-2018
The result table should look like this
Item_ID Sales_Current_Month Month Sales_Last_3_months_average
A 4295 Dec-2018 1624
A 245 Nov-2018 2209
B 856 Dec-2018 2920
B 1694 Nov-2018 3388.67
Assuming there are no missing months in the data, you can use the avg window function to do this:
select t.*,
       avg(sales) over (partition by item_id
                        order by month
                        rows between 3 preceding and 1 preceding) as avg_sales_prev_3_months
from tbl t
If the month column is in a format different from yyyyMM, use an appropriate conversion so the ordering works as expected.
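Since the sample data uses 'MMM-yyyy' strings like Dec-2018, one possible conversion (a sketch only; it assumes English month names and the same tbl table as above) is to derive a sortable key first and order the window by that:
-- Sketch: turn 'Dec-2018'-style strings into a sortable key before windowing.
select item_id,
       sales as sales_current_month,
       month,
       avg(sales) over (partition by item_id
                        order by month_key
                        rows between 3 preceding and 1 preceding) as sales_last_3_months_average
from (
  select t.*,
         unix_timestamp(month, 'MMM-yyyy') as month_key
  from tbl t
) x;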