Problem calculating in SELECT based on 2 subqueries on same table - sql

I have a table containing meter readings logged on a monthly basis, including information on sunshine hours and heating degree days.
Using this table I am trying to calculate the efficiency of our solar panels as kWh produced per sun hour, on a yearly basis. The goal is to identify a possible decrease in production over time (as the solar panels age).
I can use the meter readings to calculate yearly production (by subtracting last year's end reading from this year's end reading), and I can sum up the sun hours (monthly readings per year), but my (self-taught) skills do not enable me to get the combined calculation to work.
The result should be a listing of 'Season, efficiency' to use in a graphing tool.
First attempt:
SELECT
season,
solarpanel - LAG(solarpanel, 1, 17449) OVER( ORDER BY season) AS kwh
FROM
kv28c_meterreadings_readings
WHERE
reading = 12
UNION
SELECT
season, SUM(sun_hrs) AS hours
FROM
kv28c_meterreadings_readings
GROUP BY
season
ORDER BY
season
This returns a list:
season kwh
-----------------
2018/19 1891.0
2018/19 1925.0
2019/20 1802.2
2019/20 1770.0
Now I need to divide the second row by the first row.
Latest attempt:
SELECT
k.season,
(k.solarpanel - LAG(k.solarpanel, 1, 17449) OVER( ORDER BY k.season)) / SUM(h.sun_hrs) AS k.efficiency
FROM
kv28c_meterreadings_readings k
WHERE
k.reading = 12
JOIN
kv28c_meterreadings_readings h ON k.season = h.season
ORDER BY
k.season
which fails on syntax at WHERE.
What to do?

Without seeing your table structure, it's hard to know what the best answer would be but I've come up with two possibilities.
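No joins - Everything comes from the same table, so group by season and pull it all in one pass. The year-end reading is picked out with a conditional aggregate so that SUM(sun_hrs) still covers every month of the season.
SELECT season
, MAX(CASE WHEN reading = 12 THEN solarpanel END)
- LAG(MAX(CASE WHEN reading = 12 THEN solarpanel END), 1, 17449) OVER( ORDER BY season) as kwh
, SUM(sun_hrs) as hours
, (MAX(CASE WHEN reading = 12 THEN solarpanel END)
- LAG(MAX(CASE WHEN reading = 12 THEN solarpanel END), 1, 17449) OVER( ORDER BY season))
/ SUM(sun_hrs) as efficiency
FROM kv28c_meterreadings_readings
GROUP BY season
ORDER BY season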
CTE - a cte to separate kwh from hours.
WITH power AS
( SELECT season
, solarpanel - LAG(solarpanel, 1, 17449) OVER( ORDER BY season) as kwh
FROM kv28c_meterreadings_readings
WHERE reading = 12
)
, sun AS
( SELECT season
, SUM(sun_hrs) AS hours
FROM kv28c_meterreadings_readings
GROUP BY season
)
SELECT power.season
, power.kwh / sun.hours AS efficiency
FROM power
JOIN sun ON power.season = sun.season
ORDER BY power.season

To get your numbers, you can use two CTEs and join the results:
WITH CTE_Reading as(
SELECT
season,
solarpanel - LAG(solarpanel, 1, 17449) OVER( ORDER BY season) AS kwh
FROM
kv28c_meterreadings_readings
WHERE
reading = 12),
CTE_group as(
SELECT
season, SUM(sun_hrs) AS hours
FROM
kv28c_meterreadings_readings
GROUP BY
season)
SELECT c1.season, kwh/hours AS efficiency
FROM CTE_Reading c1 JOIN CTE_group c2 ON c1.season = c2.season
ORDER BY
c1.season
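The two-CTE approach can be tried end to end on a toy dataset. The following is only a sketch: it assumes Python's bundled SQLite (3.25 or newer, for window functions), and the table name `readings` and the sample numbers are hypothetical, not the asker's data.

```python
import sqlite3

# Hypothetical miniature of the readings table: two monthly rows per season
# instead of twelve, with reading = 12 marking the year-end meter value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (season TEXT, reading INTEGER, solarpanel REAL, sun_hrs REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("2018/19", 11, 19000, 900),
        ("2018/19", 12, 19349, 1000),
        ("2019/20", 11, 21000, 950),
        ("2019/20", 12, 21149, 1050),
    ],
)

query = """
WITH power AS (
    -- yearly production: this year-end reading minus the previous one
    SELECT season,
           solarpanel - LAG(solarpanel, 1, 17449) OVER (ORDER BY season) AS kwh
    FROM readings
    WHERE reading = 12
),
sun AS (
    -- sun hours summed over every month of the season
    SELECT season, SUM(sun_hrs) AS hours
    FROM readings
    GROUP BY season
)
SELECT power.season, ROUND(power.kwh / sun.hours, 3) AS efficiency
FROM power
JOIN sun ON power.season = sun.season
ORDER BY power.season
"""
efficiency = conn.execute(query).fetchall()
print(efficiency)
```

Keeping the `reading = 12` filter inside the `power` CTE is what lets the `sun` CTE sum all twelve months; applying the filter to both would reduce the hours to the December value only.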

Related

How to join partitioned table with another one

Sorry for the newbie question, but I'm really having trouble with the following issue:
Say, I have this code in place:
WITH active_pass AS (SELECT DATE_TRUNC(fr.day, MONTH) AS month, id,
CASE
WHEN SUM(fr.imps) > 100 THEN 1
WHEN SUM(fr.imps) < 100 THEN 0
END AS active_or_passive
FROM table1 AS fr
WHERE day between (CURRENT_DATE() - 730) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id
ORDER BY month desc),
# summing the score for each customer (sum for the whole year)
active_pass_assigned AS (SELECT id, month,
SUM(SUM(active_or_passive)) OVER (PARTITION BY id ORDER BY month rows BETWEEN 3 PRECEDING AND 1 PRECEDING) AS trailing_act
FROM active_pass AS a
GROUP BY month, id
ORDER BY MONTH desc)
It creates a trailing total over the previous 3 months to show in how many of those months the customer was active. However, I have no idea how to join this with the next table to get the sum of revenue that the client generated. What I tried is this:
SELECT c.id, DATE_TRUNC(day, MONTH) AS month, SUM(revenue) AS Rev, name
FROM table2 AS c
JOIN active_pass_assigned AS a
ON c.id = a.id
WHERE day between (CURRENT_DATE() - 365) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id, name
ORDER BY month DESC
However, it returns far higher values for revenue than the actual ones, and I have no idea why. Furthermore, could you please tell me how to join those two tables so that I only get the customer's revenue for the months in which their activity was equal to 3?
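One likely cause of the inflated revenue is join fan-out: joining on id alone pairs every revenue row with every monthly activity row of that customer. The sketch below illustrates this with SQLite from Python; the tables `revenue` and `activity` and their values are hypothetical stand-ins for table2 and active_pass_assigned, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- hypothetical stand-ins: one row per customer per month in each table
    CREATE TABLE revenue (id INTEGER, month TEXT, revenue REAL);
    CREATE TABLE activity (id INTEGER, month TEXT, trailing_act INTEGER);
    INSERT INTO revenue VALUES (1, '2021-01', 10), (1, '2021-02', 20);
    INSERT INTO activity VALUES (1, '2021-01', 2), (1, '2021-02', 3);
    """
)

# Joining on id only: each revenue row matches both activity rows,
# so the true total of 30 is counted twice.
fanned = conn.execute(
    "SELECT SUM(r.revenue) FROM revenue r JOIN activity a ON r.id = a.id"
).fetchone()[0]

# Joining on id AND month, then keeping only months with trailing_act = 3.
matched = conn.execute(
    """
    SELECT r.month, SUM(r.revenue)
    FROM revenue r
    JOIN activity a ON r.id = a.id AND r.month = a.month
    WHERE a.trailing_act = 3
    GROUP BY r.month
    """
).fetchall()

print(fanned, matched)
```

In the original BigQuery query the equivalent change would be to add the truncated month to the join condition, assuming both sides are truncated to the same month value.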

How to Calculate Full/Repeat Retention in BigQuery SQL

I am trying to calculate a "rolling retention" or "repeat retention" (Not sure what the appropriate name for this is), but a scenario where I only want to count the proportion of users who place an order every single month consecutively.
So if 10 users place an order in Jan 2020, and 5 of them come back in Feb, that would equal a 50% retention.
Now for March, I only want to consider the 5 users who ordered in February, still taking note of the total January cohort size.
So if 2 users from February come back in March, retention for March will be 2/10 = 20%. If a user from Jan who didn't return in Feb, places an order in March, they will not be included in the calculation for March, because they did not return in February.
Basically, this retention will progressively decrease to 0% and can never increase.
Here is what I have done so far:
WITH first_order AS (SELECT
customerEmail,
MIN(orderedat) as firstOrder,
FROM fact AS fact
GROUP BY 1 ),
cohort_data AS (SELECT
first_order.customerEmail,
orderedAt as order_month,
MIN(FORMAT_DATE("%y-%m (%b)", date(firstorder))) as cohort_month,
FROM first_order as first_order
LEFT JOIN fact as fact
ON first_order.customeremail = fact.customeremail
GROUP BY 1,2, FACT.orderedAt),
cohort_count AS (select cohort_month, count(distinct customeremail) AS total_cohort_count FROM cohort_data GROUP BY 1 )
SELECT
cd.cohort_month,
date_trunc(date(cd.order_month), month) as order_month,
total_cohort_count,
count(distinct cd.customeremail) as total_repeat
FROM cohort_data as cd
JOIN cohort_data as last_month
ON cd.customeremail= last_month.customeremail
and date(cd.order_month) = date_add(date(last_month.order_month), interval 1 month)
LEFT JOIN cohort_count AS cc
on cd.cohort_month = cc.cohort_month
GROUP BY 1,2,3
ORDER BY cohort_month, order_month ASC
Here is the result. I'm not sure where I went wrong, but the numbers are too small and the retention increases in some months, which shouldn't happen.
I did an INNER JOIN in the last query so I could compare the previous month to the current month, but it didn't work exactly how I wanted.
Sample Data:
I'd appreciate any help
I would start with one row per customer per month. Then, I would enumerate the customer/months and keep only those with no gaps . . . and aggregate:
with customer_months as (
select customer_email,
date_trunc(ordered_at, month) as yyyymm,
min(date_trunc(ordered_at, month)) over (partition by customer_email) as first_yyyymm
from cohort_data
group by 1, 2
)
select first_yyyymm, yyyymm, count(*)
from (select cm.*,
row_number() over (partition by customer_email order by yyyymm) as seqnum
from customer_months cm
) cm
where yyyymm = date_add(first_yyyymm, interval seqnum - 1 month)
group by 1, 2
order by 1, 2;
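The "no gaps" idea can be checked on a small example. This is a sketch in SQLite (via Python) rather than BigQuery, so the date functions differ; the table `orders` and its rows are hypothetical. A customer stays in the count only while their n-th active month is exactly n - 1 months after their first month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_email TEXT, ordered_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("a", "2020-01-05"), ("a", "2020-02-10"), ("a", "2020-03-15"),
        ("b", "2020-01-07"), ("b", "2020-03-20"),  # skips February
        ("c", "2020-01-09"), ("c", "2020-02-11"),
        ("d", "2020-01-20"),
    ],
)

query = """
WITH customer_months AS (
    -- one row per customer per month with at least one order
    SELECT DISTINCT customer_email,
           date(ordered_at, 'start of month') AS yyyymm
    FROM orders
),
numbered AS (
    SELECT customer_email, yyyymm,
           MIN(yyyymm) OVER (PARTITION BY customer_email) AS first_yyyymm,
           ROW_NUMBER() OVER (PARTITION BY customer_email ORDER BY yyyymm) AS seqnum
    FROM customer_months
)
SELECT first_yyyymm, yyyymm, COUNT(*) AS retained
FROM numbered
-- keep a month only if it continues an unbroken run from the first month
WHERE yyyymm = date(first_yyyymm, '+' || (seqnum - 1) || ' months')
GROUP BY first_yyyymm, yyyymm
ORDER BY first_yyyymm, yyyymm
"""
retention = conn.execute(query).fetchall()
print(retention)
```

Customer b orders again in March, but their second active month is not February, so they drop out of the March count, which matches the rule described in the question.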

Calculate average days between orders The last three records tsql

I am trying to take an average per customer, but you're not grouping by customer.
I would like to calculate the average number of days between order dates from a table called invoice. For each BusinessPartnerID, I want the average number of days between orders, based on the last three orders only.
I already have the average over all orders for each user, but I need it for the last three orders only.
The sample table is as below
;WITH temp (avg,invoiceid,carname,carid,fullname,mobail)
AS
(
SELECT AvgLag = AVG(Lag) , Lagged.idinvoice,
Lagged.carname ,
Lagged.carid ,Lagged.fullname,Lagged.mobail
FROM
(
SELECT
(car2.Name) as carname ,
(car2.id) as carid ,( busin.Name) as fullname, ( busin.Mobile) as mobail , INV.Id as idinvoice , Lag = CONVERT(int, DATEDIFF(DAY, LAG(Date,1)
OVER (PARTITION BY car2.Id ORDER BY Date ), Date))
FROM [dbo].[Invoice] AS INV
JOIN [dbo].[InvoiceItem] AS INITEM on INV.Id=INITEM.Invoiceid
JOIN [dbo].[BusinessPartner] as busin on busin.Id=INV.BuyerId and Type=5
JOIN [dbo].[Product] as pt on pt.Id=INITEM.ProductId and INITEM.ProductId is not null and pt.ProductTypeId=3
JOIN [dbo].[Car] as car2 on car2.id=INv.BusinessPartnerCarId
) AS Lagged
GROUP BY
Lagged.carname,
Lagged.carid,Lagged.fullname,Lagged.mobail, Lagged.idinvoice
-- order by Lagged.fullname
)
SELECT * FROM temp where avg is not null order by avg
I don't really see how your query relates to your question. Starting from a table called invoice that has columns businesspartnerid and date, here is how you would take the average of the day difference between the last 3 invoices of each business partner:
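select businesspartnerid,
avg(1.0 * diff_day) avg_diff_day
from (
-- lag() runs after the rn <= 3 filter; a window function cannot be nested inside avg()
select businesspartnerid,
datediff(
day,
lag(date) over(partition by businesspartnerid order by date),
date
) diff_day
from (
select i.*,
row_number() over(partition by businesspartnerid order by date desc) rn
from invoice i
) i
where rn <= 3
) d
group by businesspartnerid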
Note that 3 rows gives you 2 intervals only, that will be averaged.
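To see the "3 rows, 2 intervals" behaviour concretely, here is a sketch of the same pattern in SQLite, run from Python. It swaps T-SQL's datediff for a julianday() subtraction, and the column name `invoice_date` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (businesspartnerid INTEGER, invoice_date TEXT)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [
        (1, "2021-01-01"), (1, "2021-01-11"), (1, "2021-01-21"), (1, "2021-02-20"),
        (2, "2021-03-01"), (2, "2021-03-05"),
    ],
)

query = """
SELECT businesspartnerid, AVG(diff_day) AS avg_diff_day
FROM (
    -- gaps are computed only among the rows that survived the rn <= 3 filter
    SELECT businesspartnerid,
           julianday(invoice_date)
             - julianday(LAG(invoice_date) OVER (
                   PARTITION BY businesspartnerid ORDER BY invoice_date)) AS diff_day
    FROM (
        SELECT businesspartnerid, invoice_date,
               ROW_NUMBER() OVER (
                   PARTITION BY businesspartnerid ORDER BY invoice_date DESC) AS rn
        FROM invoice
    )
    WHERE rn <= 3
)
GROUP BY businesspartnerid
ORDER BY businesspartnerid
"""
avg_gaps = conn.execute(query).fetchall()
print(avg_gaps)
```

For partner 1 the oldest invoice is ignored, leaving gaps of 10 and 30 days; partner 2 has only two invoices, so a single 4-day gap is averaged.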

SQL - lag variable creation using window function

I have daily city-level data with some counts. I have to aggregate this data at the monthly level (keyed by the 1st day of each month) and then create lag variables based on the last week before the 1st day of the month.
I have used the following code to create a lag variable for the previous month (after aggregating the data at the monthly level, keyed by the 1st date of the month):
sum(count) over (partition by City order by month_date rows between 1 preceding and 1 preceding) as last_1_month_count
Is there a way to aggregate the data at the monthly level and create lag variables based on the last 7, 14, 21 and 28 days using a window function?
You can use this:
select
CITY
, month(Date)
, year(date)
, sum(count)
from table1
where date < DATEADD(day, -7, GETDATE()) -- cutoff: 7 days before today
group by
City
, month(Date)
, year(date)
I think you're looking for something like this. The first CTE summarizes city counts to the day, week, month and year. The second summarizes the counts to the week, month and year. To group sales by weeks starting from the 1st day, it uses the DAY function along with YEAR and MONTH. Since DAY returns an integer, groups of distinct weeks can be created by dividing by 7, i.e. DAY(day_dt)/7.
One way to get the prior week's sales would be to join the weekly summary CTE to itself where the week is offset by -1. Since the prior week might possibly have 0 sales, it seems safer to LEFT JOIN than to use LAG, in my opinion.
with
day_sales_cte(city, day_dt, yr, mo, wk, sum_count) as (
select city, day_dt, year(day_dt), month(day_dt), day(day_dt)/7, sum([count])
from city_level_data
group by city, day_dt, year(day_dt), month(day_dt), day(day_dt)/7),
wk_sales_cte(city, yr, mo, wk, sum_count) as (
select city, yr, mo, wk, sum(sum_count)
from day_sales_cte
group by city, yr, mo, wk)
select ws.*, ws2.sum_count prior_wk_sales
from wk_sales_cte ws
left join wk_sales_cte ws2 on ws.city=ws2.city
and ws.yr=ws2.yr
and ws.mo=ws2.mo
and ws2.wk=ws.wk-1;

I'm trying to calculate the difference between two weeks but I'm getting a weird peak when plotting the results ( SQL / BigQuery )

I have a daily table that contains the number of visitors per store, every day.
My tables columns are:
Date
Store
Number_of_Visitors
Views : number of views of the stores' ads.
I first aggregated my table into a weekly table so that I can calculate the variance between one week and the next.
Here is how I defined variance:
Variance = Number of Visitors in WEEK N+1 / Number of Visitors in WEEK N
I wrote the following query to do that (new table called: weekly)
SELECT
year_week,
min(date) as date,
Store,
SUM(Number_Of_Visitors) AS TOTAL_VISITORS
FROM (
SELECT
*,
CONCAT(cast((extract(YEAR from date)) as string), LPAD(cast((extract(WEEK from date)) as string), 2, '0') ) AS year_week
FROM `my-project`)
GROUP BY
year_week, Store
ORDER BY year_week
Then, in order to calculate the variance, I used the following query as well:
SELECT
base.*,
((base.TOTAL_VISITORS-lw.TOTAL_VISITORS)/lw.TOTAL_VISITORS) AS VAR_FF,
FROM
`weekly` base
JOIN (
SELECT
* EXCEPT (date),
DATE_ADD(DATE(TIMESTAMP(date)), INTERVAL 1 Week)AS n_date
FROM
`weekly` ) lw
ON
base.date = lw.n_date
AND base.Store= lw.Store
When I plot the variance (VAR_FF) in Data Studio, I get a plot that doesn't seem to make sense, with a high peak in the middle.
I am thinking your code should look like this:
SELECT date_trunc(date, week) as year_week,
Store,
SUM(Number_Of_Visitors) AS TOTAL_VISITORS,
(1 -
(LAG(SUM(Number_Of_Visitors)) OVER (PARTITION BY Store ORDER BY MIN(date)) /
SUM(Number_Of_Visitors)
)
) as VAR_FF,
FROM `my-project`
GROUP BY year_week, Store
ORDER BY year_week;
I'm not sure what your weird calculations for calculating the week are really doing. This is based on the previous week in the data.
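A LAG-based week-over-week change can be sketched on toy data as follows. This uses SQLite from Python rather than BigQuery, so the week label comes from strftime instead of date_trunc; the table `visits`, its columns and its numbers are hypothetical. It computes the change relative to the previous week, as in the asker's second query, and the first week of each store has no predecessor, so its value is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visit_date TEXT, store TEXT, visitors INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("2021-03-01", "A", 100), ("2021-03-03", "A", 100),  # first week
        ("2021-03-08", "A", 150), ("2021-03-10", "A", 150),  # following week
    ],
)

query = """
WITH weekly AS (
    -- %W numbers weeks starting on Monday
    SELECT strftime('%Y-%W', visit_date) AS year_week,
           store,
           SUM(visitors) AS total_visitors
    FROM visits
    GROUP BY year_week, store
)
SELECT store,
       total_visitors,
       (total_visitors
          - LAG(total_visitors) OVER (PARTITION BY store ORDER BY year_week)) * 1.0
         / LAG(total_visitors) OVER (PARTITION BY store ORDER BY year_week) AS var_ff
FROM weekly
ORDER BY store, year_week
"""
changes = conn.execute(query).fetchall()
print(changes)
```

Because LAG looks at the previous row per store, a week with no data at all is silently skipped, so the comparison is against the previous week present in the data rather than the previous calendar week.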