How to join two tables on a range of dates - SQL

I am using PostgreSQL, and I have these two tables, Sale and Royalty.
Sale
saleId  ItemId  price  createdAt
1       a       200    2022-08-17
2       b       400    2022-08-19
3       c       500    2022-09-04
Royalty
Id  rate  createdAt   deletedAt
1   0.25  2022-08-10  2022-08-20
2   0.15  2022-08-20  2022-09-01
3   0.20  2022-09-01  null
I want to join Sale and Royalty to produce a result like the one below.
The point is to match each sale to the rate whose period (Royalty.createdAt to Royalty.deletedAt) contains Sale.createdAt.
Selected result
ItemId  rate*price      Sale.createdAt
a       50 (200*0.25)   2022-08-17
b       100 (400*0.25)  2022-08-19
c       100 (500*0.20)  2022-09-04
I don't want to write a BETWEEN for every royalty row, since more rows could be added.
I'm considering creating a Sale-Royalty table so that rate*price is easy to get,
but I wonder whether this can be solved with a join on this condition.

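One approach is to use PostgreSQL's daterange type with its <@ ("is contained by") operator:
select
    s.*,
    r.rate,
    s.price * r.rate as value
from sale s
-- daterange(lower, upper) defaults to '[)' bounds: deletedAt itself is excluded,
-- and a null deletedAt gives an open-ended range
join royalty r on s.createdAt <@ daterange(r.createdAt, r.deletedAt)
;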
Caveats:
- If royalty date ranges overlap, this multiplies the returned rows (a sale with several valid royalty ranges appears once per range).
- Use an outer (left) join instead if you need sales even when no royalty matches.
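
As a rough, self-contained illustration of the range match, here is a sketch in Python using the standard-library sqlite3 module. SQLite has no daterange type, so this sketch swaps in plain comparisons that mirror a half-open [createdAt, deletedAt) range, with a null deletedAt treated as open-ended. The in-memory tables and the `value` alias are assumptions built from the sample data in the question, not part of the original answer.

```python
import sqlite3

# In-memory database seeded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (saleId INTEGER, itemId TEXT, price INTEGER, createdAt TEXT);
    CREATE TABLE royalty (id INTEGER, rate REAL, createdAt TEXT, deletedAt TEXT);
    INSERT INTO sale VALUES
        (1, 'a', 200, '2022-08-17'),
        (2, 'b', 400, '2022-08-19'),
        (3, 'c', 500, '2022-09-04');
    INSERT INTO royalty VALUES
        (1, 0.25, '2022-08-10', '2022-08-20'),
        (2, 0.15, '2022-08-20', '2022-09-01'),
        (3, 0.20, '2022-09-01', NULL);
""")

# ISO-8601 date strings compare correctly as text, so >= and < are enough.
# The lower bound is inclusive, the upper bound exclusive, NULL means "still active".
rows = conn.execute("""
    SELECT s.itemId, s.price * r.rate AS value, s.createdAt
    FROM sale s
    JOIN royalty r
      ON s.createdAt >= r.createdAt
     AND (r.deletedAt IS NULL OR s.createdAt < r.deletedAt)
    ORDER BY s.saleId
""").fetchall()

for row in rows:
    print(row)
```

Because the upper bound is exclusive, a sale dated 2022-08-20 would match only the second royalty row, which is the same behaviour the default daterange bounds give in PostgreSQL.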

Related

Using WITH and UNION to compute number of flights and weather condition with two tables

Table A
date        flight  airport
2012-10-01  oneway  ATL, GA
2012-10-01  oneway  LAX, CA
2012-10-02  oneway  SAN, CA
2012-10-02  oneway  DTW, MI
2012-10-03  round   SFO, CA
2012-10-04  round   SFO, CA
2012-10-05  round   SFO, CA
Table B
date        temp  precip
2012-10-01  27    0.02
2012-10-02  35    0.00
2012-10-03  66    0.18
2012-10-04  57    0.00
2012-10-05  78    0.24
Table A has about 100k rows, whereas Table B has only about 60 rows.
I am trying to write a query that finds the total number of flights on cold days and on warm days, and also counts the number of cold and warm days.
A cold day is one where temp in Table B is below 40 (temp < 40); any other day is warm.
In the real data, only 10 days in total have matching dates, so I need to account for that when aggregating. I tried to get the total count without a CTE, but I keep getting wrong counts.
The expected outcome
Days      Num_of_flight  Num_of_days
cold day  4              2
warm day  3              3
You need a LEFT join of TableB to TableA and aggregation on the result of a CASE expression which returns 'cold day' or 'warm day':
SELECT CASE WHEN b.temp < 40 THEN 'cold day' ELSE 'warm day' END AS Days,
       COUNT(a.date) AS Num_of_flight,          -- COUNT(*) would count a day without flights as 1
       COUNT(DISTINCT a.date) AS Num_of_days    -- days with at least one flight; use b.date to count every day in TableB
FROM TableB b LEFT JOIN TableA a
ON a.date = b.date
GROUP BY Days;
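
To check the aggregation against the expected outcome, here is a minimal sketch in Python with the standard-library sqlite3 module, loaded with the sample rows from the question. It counts `a.date` rather than `*` so that a day without flights would contribute zero flights. The lowercase table names and the use of SQLite instead of the asker's database are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (date TEXT, flight TEXT, airport TEXT);
    CREATE TABLE table_b (date TEXT, temp INTEGER, precip REAL);
    INSERT INTO table_a VALUES
        ('2012-10-01', 'oneway', 'ATL, GA'),
        ('2012-10-01', 'oneway', 'LAX, CA'),
        ('2012-10-02', 'oneway', 'SAN, CA'),
        ('2012-10-02', 'oneway', 'DTW, MI'),
        ('2012-10-03', 'round',  'SFO, CA'),
        ('2012-10-04', 'round',  'SFO, CA'),
        ('2012-10-05', 'round',  'SFO, CA');
    INSERT INTO table_b VALUES
        ('2012-10-01', 27, 0.02),
        ('2012-10-02', 35, 0.00),
        ('2012-10-03', 66, 0.18),
        ('2012-10-04', 57, 0.00),
        ('2012-10-05', 78, 0.24);
""")

# One row per weather class: number of flights and number of distinct flight days.
rows = conn.execute("""
    SELECT CASE WHEN b.temp < 40 THEN 'cold day' ELSE 'warm day' END AS days,
           COUNT(a.date)          AS num_of_flight,
           COUNT(DISTINCT a.date) AS num_of_days
    FROM table_b b
    LEFT JOIN table_a a ON a.date = b.date
    GROUP BY days
    ORDER BY days
""").fetchall()

for row in rows:
    print(row)
```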

Computing ratio with two tables with multiple conditions

Table A
date        flight  airport
2012-10-01  oneway  ATL, GA
2012-10-01  oneway  LAX, CA
2012-10-01  oneway  SAN, CA
2012-10-01  oneway  DTW, MI
2012-10-02  round   SFO, CA
Table B
date        temp  precip
2012-10-01  67    0.02
2012-10-01  65    0.32
2012-10-01  86    0.18
2012-10-01  87    0.04
2012-10-02  78    0.24
The actual tables have more than 100k rows.
The expected outcome has two columns, temp and ratio.
For each temp, I am trying to get the ratio of rows with flight = 'oneway' among rows where airport contains "CA".
I first need to keep only the rows where the average precip is greater than 0.2, and cast the ratio to an integer.
I tried joining on date and grouping by temp having average precip < 0.2, but I keep getting the same wrong value for ratio.
How can I use a CTE or CASE WHEN to combine these two tables and compute the ratio?
The ratio should be (total count of rows where flight = 'oneway' for each temperature, after all filtering) / (total count of rows).
In the query below I join A and B records on matching Date, keeping only rows where A.airport ends with CA, and group by temperature. COUNT(*) gives the total number of such pairs, which is the divisor. The dividend is the number of those pairs that have a oneway flight. It's possible that I did not fully understand the question, in which case we may need to move the airport criterion from the join condition into the CASE WHEN.
SELECT B.temp,
       CAST(SUM(
           CASE
               WHEN A.flight = 'oneway'
               THEN 1
               ELSE 0
           END
       ) AS FLOAT) / COUNT(*) AS ratio
FROM A
JOIN B
ON A.`Date` = B.`Date` AND
   A.airport LIKE '%CA'   -- airport is a column of A, not B
GROUP BY B.temp
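
A small sketch of how this ratio behaves on the sample rows, written in Python with the standard-library sqlite3 module. The airport filter is applied to table A here, since that is where the airport column lives in the sample data. The table names `a` and `b`, the `ratio` alias, and the cast to REAL are assumptions for this illustration; the precip filter and the integer cast mentioned in the question are not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (date TEXT, flight TEXT, airport TEXT);
    CREATE TABLE b (date TEXT, temp INTEGER, precip REAL);
    INSERT INTO a VALUES
        ('2012-10-01', 'oneway', 'ATL, GA'),
        ('2012-10-01', 'oneway', 'LAX, CA'),
        ('2012-10-01', 'oneway', 'SAN, CA'),
        ('2012-10-01', 'oneway', 'DTW, MI'),
        ('2012-10-02', 'round',  'SFO, CA');
    INSERT INTO b VALUES
        ('2012-10-01', 67, 0.02),
        ('2012-10-01', 65, 0.32),
        ('2012-10-01', 86, 0.18),
        ('2012-10-01', 87, 0.04),
        ('2012-10-02', 78, 0.24);
""")

# Conditional aggregation: the SUM counts oneway pairs, COUNT(*) counts all pairs.
# Casting the numerator to REAL avoids integer division.
rows = conn.execute("""
    SELECT b.temp,
           CAST(SUM(CASE WHEN a.flight = 'oneway' THEN 1 ELSE 0 END) AS REAL)
               / COUNT(*) AS ratio
    FROM a
    JOIN b
      ON a.date = b.date
     AND a.airport LIKE '%CA'
    GROUP BY b.temp
    ORDER BY b.temp
""").fetchall()

ratio_by_temp = dict(rows)
print(ratio_by_temp)
```

On this sample every temperature recorded on 2012-10-01 pairs only with the two oneway California flights, while the single 2012-10-02 reading pairs only with a round-trip flight.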

Select the rows of the group based on 2 conditions but combine the unique categories of that group

I have a table like below
ID Date Category Cycles
--------------------------------------------
RYI19 6/12/2018 TEMPERATURE 1567 y
RYI19 6/13/2018 VOLUME 1620 n
RYI19 6/25/2018 AREA 1890 y
RYI19 6/28/2018 TEMPERATURE 1435 y
TYI23 5/10/2020 LENGTH 1567 Y
TYI23 6/12/2020 LENGTH 1678 Y
TYI23 6/13/2020 LENGTH 1689 n
Previously, my only condition was to select the first record of each group,
so I wrote this code:
select
ID, date
from
(select
ID, date,
row_number() over(partition by ID order by date) rn
from
table1) t1
where
rn = 1
Now I have 2 additional columns and 2 conditions: if records in a group are within 2 days of each other and their cycles differ by less than 100, the later record should not be considered. Ideally the cycles increase as the date increases, but if a later record has a smaller cycle count, only the 2-day date condition decides whether the record is selected. As for the category, all unique categories of the records that are dropped need to be combined into the record that is kept. If records have the same date, one of them needs to be picked.
ID Date Category Cycles
-------------------------------------------------
RYI19 6/12/2018 TEMPERATURE & VOLUME 1567
RYI19 6/25/2018 AREA 1890
RYI19 6/28/2018 TEMPERATURE 1435
TYI23 5/10/2020 LENGTH 1567
TYI23 6/12/2020 LENGTH 1678
I need to make sure each category appears only once in the field. Note that the last record does not have LENGTH in the category twice.
Edit:
Stating the rules clearly:
1) If the dates are within 2 days or the cycles are within 100 cycles, then remove the non-VOLUME record; but if both records are VOLUME or both are non-VOLUME, then display the earlier-dated record.
2) If the TEMPERATURE record is dated 10 days prior to the VOLUME record, then also keep only the VOLUME record; that is, flag the TEMPERATURE record to be removed/filtered.
3) If one of the dates is in December, then use a 30-day difference if the categories are different.
ID Date Category Cycles
RPI100 8/7/2020 Volume 4327
RPI100 8/18/2020 TEMPERATURE 4300
RDY234 6/1/2020 VOLUME 7014
RDY234 6/4/2020 TEMERATURE 7014
PDI23 8/3/2020 VOLUME 9799
PDI23 9/28/2020 TEMERATURE 12968
PDI23 10/6/2020 VOLUME 13398
F128 2/25/2020 TEMERATURE 9875
YU567 12/2/2020 VOLUME 7403
YU567 12/3/2020 VOLUME 7436
RTY78 8/17/2020 STATE 3198
TYI12 1/27/2020 VOLUME 6145
RPI145 12/16/2019 VOLUME 2110
RPI145 1/23/2020 TEMPERATURE 0
Something like this should do the trick (column names as in the table above):
df.groupby(['ID', 'Date', 'Cycles']).agg({"Category": " & ".join})
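
The groupby above only merges rows whose ID, date, and cycles are identical. As a hedged sketch of the 2-day rule from the first part of the question, the plain-Python function below folds a record into the previous kept record of the same ID when it falls within 2 days of it, and joins the unique categories. The function name `merge_close_records` and the `max_gap_days` parameter are hypothetical, and the cycles condition and the later "Edit" rules are deliberately left out.

```python
from datetime import datetime, timedelta

# Sample rows from the first table in the question: (ID, Date, Category, Cycles).
records = [
    ("RYI19", "6/12/2018", "TEMPERATURE", 1567),
    ("RYI19", "6/13/2018", "VOLUME", 1620),
    ("RYI19", "6/25/2018", "AREA", 1890),
    ("RYI19", "6/28/2018", "TEMPERATURE", 1435),
    ("TYI23", "5/10/2020", "LENGTH", 1567),
    ("TYI23", "6/12/2020", "LENGTH", 1678),
    ("TYI23", "6/13/2020", "LENGTH", 1689),
]


def merge_close_records(rows, max_gap_days=2):
    """Keep the first record of each run of close dates and combine unique categories."""
    parsed = sorted(
        (rec_id, datetime.strptime(date, "%m/%d/%Y"), category, cycles)
        for rec_id, date, category, cycles in rows
    )
    kept = []
    for rec_id, date, category, cycles in parsed:
        last = kept[-1] if kept else None
        is_close = (
            last is not None
            and last["id"] == rec_id
            and date - last["date"] <= timedelta(days=max_gap_days)
        )
        if is_close:
            # Dropped record: only its category survives, and only if it is new.
            if category not in last["categories"]:
                last["categories"].append(category)
        else:
            kept.append({"id": rec_id, "date": date,
                         "categories": [category], "cycles": cycles})
    return [
        (k["id"],
         f"{k['date'].month}/{k['date'].day}/{k['date'].year}",
         " & ".join(k["categories"]),
         k["cycles"])
        for k in kept
    ]


result = merge_close_records(records)
for row in result:
    print(row)
```

The gap is measured from the kept record, not from the most recently dropped one; whether that matches the intended rule is an assumption.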

Access the previous row in select

I have a scenario as below
--source data
departuredttm flight_source flight_destination available_seats
13-07-2016 04:00:00 A B 200
13-07-2016 08:00:00 A B 320
13-07-2016 08:20:00 A B 20
I have a lookup table which tells how many passengers in total there are for this source and destination whose flights are delayed and who need to be fitted into the available seats in the source data. The lookup table looks like this:
--lookup table for passenger_from_delayed_flights
flight_source flight_destination passengers
A B 500
Now I have to fit these 500 passengers into the available seats in the source data.
---output
DepartureDttm flight_source flight_destination AVAILABLE_SEATS PASSENGERS_TO_ADJUST PASSENGER_LEFT
13-07-2016 04:00:00 A B 200 500 300
13-07-2016 08:00:00 A B 320 300 20
13-07-2016 08:20:00 A B 20 20 0
Initially the number of passengers to adjust is 500 and we have 200 seats; next, 320 seats are available and we have to adjust 300 (500-200) passengers.
Please help
Thanks
Your expected result is probably wrong, the 2nd flight already has enough seats, so PASSENGER_LEFT should be -20 (or 0).
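This is a calculation based on a running total:
passengers - SUM(available_seats)
             OVER (ORDER BY departuredttm
                   ROWS UNBOUNDED PRECEDING) AS PASSENGER_LEFT,
-- reuses the alias above; where the database does not allow that,
-- repeat the expression or wrap it in a derived table
available_seats + PASSENGER_LEFT AS PASSENGERS_TO_ADJUST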
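
A runnable sketch of the running-total idea, in Python with the standard-library sqlite3 module (window functions need SQLite 3.25 or newer). The table names `flights` and `delayed_passengers`, the ISO-formatted timestamps, and the PARTITION BY on source and destination are assumptions added for this illustration. SQLite cannot reuse a column alias in the same SELECT list, so the window expression sits in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights (
        departuredttm TEXT, flight_source TEXT,
        flight_destination TEXT, available_seats INTEGER);
    CREATE TABLE delayed_passengers (
        flight_source TEXT, flight_destination TEXT, passengers INTEGER);
    INSERT INTO flights VALUES
        ('2016-07-13 04:00:00', 'A', 'B', 200),
        ('2016-07-13 08:00:00', 'A', 'B', 320),
        ('2016-07-13 08:20:00', 'A', 'B', 20);
    INSERT INTO delayed_passengers VALUES ('A', 'B', 500);
""")

# Inner query: passengers still unseated after each departure (running total of seats).
# Outer query: passengers that were waiting before this departure.
rows = conn.execute("""
    SELECT departuredttm,
           available_seats,
           available_seats + passenger_left AS passengers_to_adjust,
           passenger_left
    FROM (
        SELECT f.departuredttm,
               f.available_seats,
               d.passengers - SUM(f.available_seats) OVER (
                   PARTITION BY f.flight_source, f.flight_destination
                   ORDER BY f.departuredttm
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS passenger_left
        FROM flights f
        JOIN delayed_passengers d
          ON d.flight_source = f.flight_source
         AND d.flight_destination = f.flight_destination
    )
    ORDER BY departuredttm
""").fetchall()

for row in rows:
    print(row)
```

Negative values mean spare seats remain; wrapping the expressions in a clamp to zero would give the "(or 0)" variant mentioned above.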

Cross-product of date ranges

I have two tables containing date ranges that I want to cross multiply in a way to get all distinct ranges. That is, all ranges that have a boundary in one of the tables.
Specifically I have a table with product prices and their validity dates as well as conversion factors with a validity date. I want, as a result, each instance of a specific price/conversion_factor combination and from when to when it was valid:
products:
product_id start_date end_date price_eur
1 2000-01-01 2000-12-31 100
1 2001-01-01 2002-12-31 150
conversion_factors:
start_date end_date dollar_to_eur
1970-01-01 2000-03-31 1.50
2000-04-01 2000-06-30 1.60
2000-07-01 2001-06-30 1.70
2001-07-01 2003-06-30 2.00
result:
product_id start_date end_date price_eur dollar_to_eur
1 2000-01-01 2000-03-31 100 1.50
1 2000-04-01 2000-06-30 100 1.60
1 2000-07-01 2000-12-31 100 1.70
1 2001-01-01 2001-06-30 150 1.70
1 2001-07-01 2002-12-31 150 2.00
So each time one of the tables hits a new date, a new row should be returned. In the result, the first three rows fall within the validity of the first product row, but are split across three intervals of the conversion_factors table. Similarly, the third and fourth rows of the result come from the third conversion factor row, but with different product rows.
Is there any way to do this with a clever join (in PostgreSQL) or do I need to use a PL/pgSQL function?
There are two parts to this: you ask for a smart join, and you ask for displaying the correct result. This should solve both:
SELECT p.product_id
      ,Greatest(p.start_date, cf.start_date) AS start_date
      ,Least(p.end_date, cf.end_date) AS end_date
      ,p.price_eur
      ,cf.dollar_to_eur
FROM products AS p
JOIN conversion_factors AS cf
ON p.start_date <= cf.end_date AND p.end_date >= cf.start_date
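
To see the overlap join reproduce the expected result, here is a minimal sketch in Python with the standard-library sqlite3 module. SQLite has no GREATEST/LEAST, so the sketch swaps in its two-argument scalar MAX() and MIN(), which play the same role. The `range_start` and `range_end` aliases and the ORDER BY are assumptions added for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER, start_date TEXT, end_date TEXT, price_eur INTEGER);
    CREATE TABLE conversion_factors (
        start_date TEXT, end_date TEXT, dollar_to_eur REAL);
    INSERT INTO products VALUES
        (1, '2000-01-01', '2000-12-31', 100),
        (1, '2001-01-01', '2002-12-31', 150);
    INSERT INTO conversion_factors VALUES
        ('1970-01-01', '2000-03-31', 1.50),
        ('2000-04-01', '2000-06-30', 1.60),
        ('2000-07-01', '2001-06-30', 1.70),
        ('2001-07-01', '2003-06-30', 2.00);
""")

# Two closed ranges overlap when each one starts no later than the other ends.
# The overlapping part runs from the later start to the earlier end.
rows = conn.execute("""
    SELECT p.product_id,
           MAX(p.start_date, cf.start_date) AS range_start,
           MIN(p.end_date, cf.end_date)     AS range_end,
           p.price_eur,
           cf.dollar_to_eur
    FROM products AS p
    JOIN conversion_factors AS cf
      ON p.start_date <= cf.end_date
     AND p.end_date >= cf.start_date
    ORDER BY p.product_id, range_start
""").fetchall()

for row in rows:
    print(row)
```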