Table A
date        flight  airport
2012-10-01  oneway  ATL, GA
2012-10-01  oneway  LAX, CA
2012-10-01  oneway  SAN, CA
2012-10-01  oneway  DTW, MI
2012-10-02  round   SFO, CA
Table B
date        temp  precip
2012-10-01  67    0.02
2012-10-01  65    0.32
2012-10-01  86    0.18
2012-10-01  87    0.04
2012-10-02  78    0.24
The actual tables have more than 100k rows.
The expected outcome has two columns, temp and ratio.
For each temp, I am trying to get the ratio of rows with flight = 'oneway' where airport has "CA" in it.
I need to first filter rows where the average of precip is greater than 0.2, and cast ratio to integer.
I tried to join on date and group by temp having average precip < 0.2, but I am getting a fixed wrong value for ratio.
How can I use a CTE or CASE WHEN to merge these two tables and compute the ratio?
The ratio should be (total count of all rows where flight = 'oneway' per temperature after all filtering) / (total count of rows).
In the query below I join the records of A and B on matching dates, restricted to airports ending with CA, and group by temperature. COUNT(*) gives the total number of such pairs, which is the divisor. The dividend is the number of those pairs that have a oneway flight. It's possible that I did not fully understand the question, in which case we may need to move the airport criterion from the join condition into the CASE WHEN.
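SELECT B.temp,
       -- count only the oneway flights, then divide by all joined rows
       CAST(SUM(CASE WHEN A.flight = 'oneway' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*) AS ratio
FROM A
JOIN B
  ON A.`Date` = B.`Date`
 AND A.airport LIKE '%CA' -- airport is a column of A, not B
GROUP BY B.temp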
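The alternative mentioned above, with the airport criterion moved into the CASE WHEN, can be tried on the sample rows from the question. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for the real database, the function name is made up for illustration, and it does not apply the precip filter because the question states that condition in two different ways.

```python
import sqlite3


def oneway_ca_ratio_by_temp():
    """Ratio of (oneway AND airport ends with CA) rows to all joined rows, per temp."""
    con = sqlite3.connect(":memory:")
    # Sample rows copied from the question; the schema is an assumption.
    con.executescript("""
        CREATE TABLE A (date TEXT, flight TEXT, airport TEXT);
        CREATE TABLE B (date TEXT, temp INTEGER, precip REAL);
        INSERT INTO A VALUES
            ('2012-10-01', 'oneway', 'ATL, GA'),
            ('2012-10-01', 'oneway', 'LAX, CA'),
            ('2012-10-01', 'oneway', 'SAN, CA'),
            ('2012-10-01', 'oneway', 'DTW, MI'),
            ('2012-10-02', 'round',  'SFO, CA');
        INSERT INTO B VALUES
            ('2012-10-01', 67, 0.02),
            ('2012-10-01', 65, 0.32),
            ('2012-10-01', 86, 0.18),
            ('2012-10-01', 87, 0.04),
            ('2012-10-02', 78, 0.24);
    """)
    query = """
        SELECT B.temp,
               -- the airport test lives inside the CASE, so COUNT(*) still
               -- counts every joined row, not only the CA ones
               SUM(CASE WHEN A.flight = 'oneway' AND A.airport LIKE '%CA'
                        THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS ratio
        FROM A
        JOIN B ON A.date = B.date
        GROUP BY B.temp
        ORDER BY B.temp
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for temp, ratio in oneway_ca_ratio_by_temp():
    print(temp, ratio)
```

With the filter inside the CASE, the denominator is every joined row for that temp. With the filter in the join condition, the denominator shrinks to the CA rows only, so the two placements generally give different ratios.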
Table A
date        flight  airport
2012-10-01  oneway  ATL, GA
2012-10-01  oneway  LAX, CA
2012-10-02  oneway  SAN, CA
2012-10-02  oneway  DTW, MI
2012-10-03  round   SFO, CA
2012-10-04  round   SFO, CA
2012-10-05  round   SFO, CA
Table B
date        temp  precip
2012-10-01  27    0.02
2012-10-02  35    0.00
2012-10-03  66    0.18
2012-10-04  57    0.00
2012-10-05  78    0.24
Table A has about 100k rows, whereas Table B has only about 60 rows.
I am trying to write a query that finds the total number of flights on cold days and on warm days, as well as the number of days that are cold or warm.
A cold day is one where temp from Table B is below 40 (temp < 40); any other day is warm.
In the real data, a total of 10 days match on date, so I need to account for that when aggregating. I tried to get the total count without using a CTE, but I keep getting wrong counts.
The expected outcome:
Days      Num_of_flight  Num_of_days
cold day  4              2
warm day  3              3
You need a LEFT join of TableB to TableA and aggregation on the result of a CASE expression which returns 'cold' or 'warm':
SELECT CASE WHEN b.temp < 40 THEN 'cold day' ELSE 'warm day' END Days,
COUNT(*) Num_of_flight,
COUNT(DISTINCT a.date) Num_of_days
FROM TableB b LEFT JOIN TableA a
ON a.date = b.date
GROUP BY Days;
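The LEFT JOIN plus CASE aggregation can be checked against the sample rows with Python's built-in sqlite3 module. This is a sketch under assumptions, not the answer's exact query: the function name is invented, and it counts a.date for flights and b.date for days, so that a day in TableB without any flight would add zero flights but still count as a day.

```python
import sqlite3


def flights_by_day_type():
    """Return (Days, Num_of_flight, Num_of_days) rows for the sample data."""
    con = sqlite3.connect(":memory:")
    # Sample rows copied from the question; the schema is an assumption.
    con.executescript("""
        CREATE TABLE TableA (date TEXT, flight TEXT, airport TEXT);
        CREATE TABLE TableB (date TEXT, temp INTEGER, precip REAL);
        INSERT INTO TableA VALUES
            ('2012-10-01', 'oneway', 'ATL, GA'),
            ('2012-10-01', 'oneway', 'LAX, CA'),
            ('2012-10-02', 'oneway', 'SAN, CA'),
            ('2012-10-02', 'oneway', 'DTW, MI'),
            ('2012-10-03', 'round',  'SFO, CA'),
            ('2012-10-04', 'round',  'SFO, CA'),
            ('2012-10-05', 'round',  'SFO, CA');
        INSERT INTO TableB VALUES
            ('2012-10-01', 27, 0.02),
            ('2012-10-02', 35, 0.00),
            ('2012-10-03', 66, 0.18),
            ('2012-10-04', 57, 0.00),
            ('2012-10-05', 78, 0.24);
    """)
    query = """
        SELECT CASE WHEN b.temp < 40 THEN 'cold day' ELSE 'warm day' END AS Days,
               COUNT(a.date)          AS Num_of_flight,  -- NULLs from unmatched days are skipped
               COUNT(DISTINCT b.date) AS Num_of_days     -- every day in TableB counts once
        FROM TableB b
        LEFT JOIN TableA a ON a.date = b.date
        GROUP BY Days
        ORDER BY Days
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in flights_by_day_type():
    print(row)
```

On the sample data every day has at least one flight, so COUNT(*) and COUNT(a.date) agree; they only diverge once TableB contains a date with no matching row in TableA.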
I am using PostgreSQL, and I have these two tables, Sale and Royalty.
Sale
saleId  ItemId  price  createdAt
1       a       200    2022-08-17
2       b       400    2022-08-19
3       c       500    2022-09-04
Royalty
Id  rate  createdAt   deletedAt
1   0.25  2022-08-10  2022-08-20
2   0.15  2022-08-20  2022-09-01
3   0.20  2022-09-01  null
I want to join Sale and Royalty to produce a result like the one below.
The point is how to match the rate to Sale.createdAt by comparing it against each Royalty row's validity period.
Selected result
ItemId  rate*price     Sale.createdAt
a       50 (200*0.25)  2022-08-17
b       100 (400*0.25) 2022-08-19
c       100 (500*0.20) 2022-09-04
I don't want to write a BETWEEN for every royalty, since more rows could be added.
I'm considering creating a Sale-Royalty table to get rate*price easily,
but I wonder if there's a way to solve this using a join with this condition.
One approach is to utilize Postgres' daterange type with its <@ ("is contained by") operator:
select
    s.*,
    r.rate,
    s.price * r.rate as value
from sale s
join royalty r on s.createdAt <@ daterange(r.createdAt, r.deletedAt)
;
Caveats:
If royalty date ranges overlap, this will multiply the returned rows (a sale with several valid royalty ranges will appear n times).
Replace with an outer (left) join if you need sales even without royalties.
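For readers without a Postgres instance at hand, the same matching rule can be sketched with plain comparisons. A two-argument daterange includes its lower bound and excludes its upper bound, and a null upper bound means the range is open-ended, so the join condition below spells that out explicitly. The sketch runs on Python's built-in sqlite3 module with ISO date strings; the function name and schema are assumptions made for illustration.

```python
import sqlite3


def royalty_per_sale():
    """Return (ItemId, rate * price, createdAt) for each sale in the sample data."""
    con = sqlite3.connect(":memory:")
    # Sample rows copied from the question; dates are stored as ISO strings,
    # which compare correctly as text.
    con.executescript("""
        CREATE TABLE sale (saleId INTEGER, ItemId TEXT, price REAL, createdAt TEXT);
        CREATE TABLE royalty (Id INTEGER, rate REAL, createdAt TEXT, deletedAt TEXT);
        INSERT INTO sale VALUES
            (1, 'a', 200, '2022-08-17'),
            (2, 'b', 400, '2022-08-19'),
            (3, 'c', 500, '2022-09-04');
        INSERT INTO royalty VALUES
            (1, 0.25, '2022-08-10', '2022-08-20'),
            (2, 0.15, '2022-08-20', '2022-09-01'),
            (3, 0.20, '2022-09-01', NULL);
    """)
    query = """
        SELECT s.ItemId, s.price * r.rate AS value, s.createdAt
        FROM sale s
        JOIN royalty r
          ON s.createdAt >= r.createdAt                             -- lower bound inclusive
         AND (r.deletedAt IS NULL OR s.createdAt < r.deletedAt)     -- upper bound exclusive, NULL = open
        ORDER BY s.saleId
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in royalty_per_sale():
    print(row)
```

Because the upper bound is exclusive, a sale created exactly on 2022-08-20 would match only the 0.15 row, not both neighbouring periods.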
I have a table like below
ID      Date        Category      Cycles
--------------------------------------------
RYI19   6/12/2018   TEMPERATURE   1567     y
RYI19   6/13/2018   VOLUME        1620     n
RYI19   6/25/2018   AREA          1890     y
RYI19   6/28/2018   TEMPERATURE   1435     y
TYI23   5/10/2020   LENGTH        1567     Y
TYI23   6/12/2020   LENGTH        1678     Y
TYI23   6/13/2020   LENGTH        1689     n
Previously, my only condition was to select the first record from each group,
so I wrote this code:
select
    ID, date
from
    (select
        ID, date,
        row_number() over(partition by ID order by date) rn
    from
        table1) t1
where
    rn = 1
Now I have 2 additional columns and 2 conditions: if records in a group are within 2 days of each other and their cycles differ by less than 100, the later record should not be considered. Ideally the cycles increase as the date increases, but if the cycles value is smaller, then only the 2-day date condition should decide whether a record is selected. As for the category, it needs to combine all unique categories of the records that are not considered. If the dates are the same, one of the records needs to be picked.
ID      Date        Category               Cycles
-------------------------------------------------
RYI19   6/12/2018   TEMPERATURE & VOLUME   1567
RYI19   6/25/2018   AREA                   1890
RYI19   6/28/2018   TEMPERATURE            1435
TYI23   5/10/2020   LENGTH                 1567
TYI23   6/12/2020   LENGTH                 1678
I need to make sure each category appears only once in the field. Note that the last record does not have LENGTH in the category twice.
Edit:
Stating the rules more clearly:
1) If the dates are within 2 days or the cycles are within 100 cycles, then remove the non-VOLUME record; but if both records are VOLUME or both are non-VOLUME, then display the record with the earlier date.
2) If the TEMPERATURE record is 10 days prior to the VOLUME record, then also keep only the VOLUME record, that is, flag the TEMPERATURE record to be removed/filtered.
3) If one of the dates is in December, then consider a 30-day difference if the categories are different.
ID       Date         Category      Cycles
RPI100   8/7/2020     Volume        4327
RPI100   8/18/2020    TEMPERATURE   4300
RDY234   6/1/2020     VOLUME        7014
RDY234   6/4/2020     TEMERATURE    7014
PDI23    8/3/2020     VOLUME        9799
PDI23    9/28/2020    TEMERATURE    12968
PDI23    10/6/2020    VOLUME        13398
F128     2/25/2020    TEMERATURE    9875
YU567    12/2/2020    VOLUME        7403
YU567    12/3/2020    VOLUME        7436
RTY78    8/17/2020    STATE         3198
TYI12    1/27/2020    VOLUME        6145
RPI145   12/16/2019   VOLUME        2110
RPI145   1/23/2020    TEMPERATURE   0
Something like this should do the trick
df.groupby(['id', 'date', 'cycles']).agg({"Category": " & ".join})
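The groupby above only joins categories for rows that share the same id, date, and cycles. The first example in the question also merges records that are a day apart, which can be sketched in plain Python as below. This is a simplified illustration, not a full implementation of the rules from the edit: it applies only the 2-day date condition, it measures the gap from the record that was kept (an assumption, since the question does not say which record to compare against), and the function name is invented.

```python
from datetime import date


def merge_close_records(records, max_gap_days=2):
    """Merge records of the same ID that fall within max_gap_days of the kept record.

    records: iterable of (id, date, category, cycles) tuples.
    The kept record retains its own date and cycles; categories are joined
    with ' & ' and each category appears only once.
    """
    merged = []
    for rec_id, day, category, cycles in sorted(records, key=lambda r: (r[0], r[1])):
        last = merged[-1] if merged else None
        if last and last["id"] == rec_id and (day - last["date"]).days <= max_gap_days:
            # Record is dropped, but its category is folded into the kept one.
            if category not in last["categories"]:
                last["categories"].append(category)
        else:
            merged.append({"id": rec_id, "date": day,
                           "categories": [category], "cycles": cycles})
    return [(m["id"], m["date"], " & ".join(m["categories"]), m["cycles"])
            for m in merged]


# Sample rows copied from the first table in the question.
sample = [
    ("RYI19", date(2018, 6, 12), "TEMPERATURE", 1567),
    ("RYI19", date(2018, 6, 13), "VOLUME", 1620),
    ("RYI19", date(2018, 6, 25), "AREA", 1890),
    ("RYI19", date(2018, 6, 28), "TEMPERATURE", 1435),
    ("TYI23", date(2020, 5, 10), "LENGTH", 1567),
    ("TYI23", date(2020, 6, 12), "LENGTH", 1678),
    ("TYI23", date(2020, 6, 13), "LENGTH", 1689),
]

for row in merge_close_records(sample):
    print(row)
```

The VOLUME priority, the 10-day TEMPERATURE rule, and the December rule would each need an extra branch where the sketch currently just folds the category in.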
I have two tables containing date ranges that I want to cross multiply in a way to get all distinct ranges. That is, all ranges that have a boundary in one of the tables.
Specifically I have a table with product prices and their validity dates as well as conversion factors with a validity date. I want, as a result, each instance of a specific price/conversion_factor combination and from when to when it was valid:
products:
product_id start_date end_date price_eur
1 2000-01-01 2000-12-31 100
1 2001-01-01 2002-12-31 150
conversion_factors:
start_date end_date dollar_to_eur
1970-01-01 2000-03-31 1.50
2000-04-01 2000-06-30 1.60
2000-07-01 2001-06-30 1.70
2001-07-01 2003-06-30 2.00
result:
product_id start_date end_date price_eur dollar_to_eur
1 2000-01-01 2000-03-31 100 1.50
1 2000-04-01 2000-06-30 100 1.60
1 2000-07-01 2000-12-31 100 1.70
1 2001-01-01 2001-06-30 150 1.70
1 2001-07-01 2002-12-31 150 2.00
So each time one of the tables hits a new date, a new row should be returned. In the result, the first three rows reference the validity of the first product row, but split up across three intervals in the conversion_factors table. Similarly, the third and fourth rows of the result come from the third conversion factor row, but with different product rows.
Is there any way to do this with a clever join (in PostgreSQL) or do I need to use a PL/pgSQL function?
There are two parts to this: you ask for a smart join, and you ask for displaying the correct result. This should answer both:
SELECT p.product_id
      ,Greatest(p.start_date, cf.start_date) AS start_date -- later of the two starts
      ,Least(p.end_date, cf.end_date) AS end_date          -- earlier of the two ends
      ,p.price_eur
      ,cf.dollar_to_eur
FROM products AS p
JOIN conversion_factors AS cf
ON p.start_date <= cf.end_date AND p.end_date >= cf.start_date -- the two ranges overlap
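The overlap join can be checked against the sample data with Python's built-in sqlite3 module. SQLite has no GREATEST or LEAST, so this sketch swaps in its multi-argument max() and min() scalar functions, which play the same role here; the function name is made up, and dates are kept as ISO strings so that text comparison matches date order.

```python
import sqlite3


def split_price_ranges():
    """Return one row per overlapping (product range, conversion range) pair."""
    con = sqlite3.connect(":memory:")
    # Sample rows copied from the question; the schema is an assumption.
    con.executescript("""
        CREATE TABLE products (product_id INTEGER, start_date TEXT,
                               end_date TEXT, price_eur INTEGER);
        CREATE TABLE conversion_factors (start_date TEXT, end_date TEXT,
                                         dollar_to_eur REAL);
        INSERT INTO products VALUES
            (1, '2000-01-01', '2000-12-31', 100),
            (1, '2001-01-01', '2002-12-31', 150);
        INSERT INTO conversion_factors VALUES
            ('1970-01-01', '2000-03-31', 1.50),
            ('2000-04-01', '2000-06-30', 1.60),
            ('2000-07-01', '2001-06-30', 1.70),
            ('2001-07-01', '2003-06-30', 2.00);
    """)
    query = """
        SELECT p.product_id,
               max(p.start_date, cf.start_date) AS start_date,  -- stand-in for GREATEST
               min(p.end_date, cf.end_date)     AS end_date,    -- stand-in for LEAST
               p.price_eur,
               cf.dollar_to_eur
        FROM products AS p
        JOIN conversion_factors AS cf
          ON p.start_date <= cf.end_date
         AND p.end_date   >= cf.start_date
        ORDER BY 1, 2
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


for row in split_price_ranges():
    print(row)
```

Two closed ranges overlap exactly when each one starts no later than the other ends, which is what the ON clause tests; the overlap itself then runs from the later start to the earlier end.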
I need to find the difference in averages between patient weights at different visits (time points), but I'm struggling with finding the "paired" averages:
I have 1 table (PHYS) containing patient weights at different visits:
PATIENT VISIT WEIGHT
1 Baseline 200
1 1 Month 190
1 2 Month 170
2 Baseline 300
2 1 Month 290
2 2 Month 280
3 Baseline 250
3 1 Month 230
My problem is that I only want to find the difference for paired data. For example, when calculating the amount of weight loss between the 2 month and Baseline visits, I would want to find the difference between the (average 2 Month weight) and the (average Baseline weight FOR ONLY THOSE PATIENTS WITH A 2 MONTH WEIGHT). In this example, the result should be AVG(170,280) - AVG(200,300) = -25 (since only patient 1 and 2 have 2 Month weights).
Here is what I have, but it calculates the difference based on all weights:
SELECT VISIT,
       AVG(WEIGHT)
       -
       (SELECT
            AVG(WEIGHT)
        FROM PHYS
        WHERE VISIT = 'Baseline')
FROM PHYS
GROUP BY VISIT
My desired output would be (I know I need to add an ORDER BY):
VISIT CHANGE FROM BASELINE
Baseline 0
1 Month -13.3
2 Month -25
Thank you and sorry for such a newb question.
You can do this with a join to the same table but only for the 'Baseline'. Then, the aggregation only aggregates the values that match, so you should get different baseline averages for the three groups (because the populations are different):
select p.visit, avg(p.weight) as avg_weight, avg(pbl.weight) as avg_blweight,
(avg(p.weight) - avg(pbl.weight)) as change
from phys p join
phys pbl
on p.patient = pbl.patient and
pbl.visit = 'Baseline'
group by p.visit;
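The self-join can be run against the sample rows to confirm that it reproduces the desired output. This sketch uses Python's built-in sqlite3 module as a stand-in database; the function name and the change_from_baseline alias are chosen here for illustration and are not part of the answer.

```python
import sqlite3


def change_from_baseline():
    """Return {visit: paired average change from baseline} for the sample data."""
    con = sqlite3.connect(":memory:")
    # Sample rows copied from the question; the schema is an assumption.
    con.executescript("""
        CREATE TABLE phys (patient INTEGER, visit TEXT, weight REAL);
        INSERT INTO phys VALUES
            (1, 'Baseline', 200), (1, '1 Month', 190), (1, '2 Month', 170),
            (2, 'Baseline', 300), (2, '1 Month', 290), (2, '2 Month', 280),
            (3, 'Baseline', 250), (3, '1 Month', 230);
    """)
    query = """
        SELECT p.visit,
               AVG(p.weight) - AVG(pbl.weight) AS change_from_baseline
        FROM phys p
        JOIN phys pbl
          ON p.patient = pbl.patient
         AND pbl.visit = 'Baseline'   -- each row is paired with that patient's baseline
        GROUP BY p.visit
    """
    result = dict(con.execute(query).fetchall())
    con.close()
    return result


for visit, change in change_from_baseline().items():
    print(visit, round(change, 1))
```

Patient 3 has no 2 Month row, so the join produces no pair for that patient in the 2 Month group, and the baseline average for that group is taken over patients 1 and 2 only.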