I am trying to calculate, for members who shop in January, what proportion shop again in February and what proportion shop again within 3 months. Ultimately I want to create a table similar to the image attached.
I have tried the code below. The first left join works, but when I add the second one to calculate within_3months, the error "FROM keyword not found where expected" is shown for the line reproduced on its own below. Can I left join twice, or must I write separate scripts for each column?
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
select
year_month_january21
, count(distinct A.members) as num_of_mems_shopped_january21
, count(distinct B.members)as retained_february21
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
, count(distinct C.members)/count(distinct A.members) *100 as within_3months
from
(select
members
, year_month as year_month_january21
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month = 202101
group by
members
, year_month) A
left join
(select
members
, year_month as year_month_february21
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month = 202102
group by
members
, year_month) B on A.members = B.members
left join
(select
members
, year_month as year_month_3months
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month between 202102 and 202104
group by
members
, year_month) C on A.members = C.members
group by
year_month_january21;
I have tried creating a separate time table and left joining to it. It does not work. Doing the calculations separately works, but I must do this for multiple time frames, so it will take a long time.
The error isn't coming from the added left join, it's from the as 1month_retention_rate part, because it's an illegal name.
You can see that more simply with:
select dummy as 1month_retention_rate
from dual;
ORA-00923: FROM keyword not found where expected
You could change the column alias so it follows the naming rules (specifically here, does not start with a digit), or if that specific name is actually required then you could make it a quoted identifier - generally not a good option, but sometimes OK in the final output of a query.
So in your code you would just change your new line
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
to something like
, count(distinct B.members)/count(distinct A.members) *100 as one_month_retention_rate
or with a quoted identifier
, count(distinct B.members)/count(distinct A.members) *100 as "1month_retention_rate"
With either change the query still errors in a fiddle, but now with ORA-00942 (table or view does not exist), because I don't have your tables; that is after also changing your obfuscated schema/table names to something legal.
There may be more efficient ways to perform the calculation, but that's a separate issue...
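The naming rule can be tried out without an Oracle instance. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption of convenience: SQLite also rejects an unquoted alias that starts with a digit, but its error text differs from ORA-00923. The helper name try_alias is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def try_alias(alias_sql):
    """Run SELECT 1 AS <alias_sql>; return the resulting column name,
    or None if the database rejects the statement."""
    try:
        cur = conn.execute(f"SELECT 1 AS {alias_sql}")
    except sqlite3.OperationalError:
        return None
    return cur.description[0][0]

unquoted = try_alias("1month_retention_rate")     # starts with a digit
renamed = try_alias("one_month_retention_rate")   # legal unquoted name
quoted = try_alias('"1month_retention_rate"')     # quoted identifier

print(unquoted, renamed, quoted)
```

Only the unquoted digit-first alias is rejected; the renamed and the quoted forms both come back as column names.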
As I understand it, you want to get:
the count of all members who visited in Jan;
the count of all members who visited in Jan and visited again in Feb;
the count of all members who visited in Jan and visited again in Feb, March or April.
If my understanding is correct, you could simplify your inner query by using IF instead of LEFT JOIN.
Take a look at the following query, assuming that the members table has an Id field:
SELECT
    mem_jan AS num_of_mems_shopped_january21,
    mem_feb AS retained_february21,
    mem_feb / mem_jan * 100 AS one_month_retention_rate,
    mem_3m / mem_jan * 100 AS within_3months
FROM (
    SELECT
        SUM(IF(mm_jan > 0, 1, 0)) AS mem_jan,
        SUM(IF(mm_jan > 0 AND mm_feb > 0, 1, 0)) AS mem_feb,
        SUM(IF(mm_jan > 0 AND mm_3m > 0, 1, 0)) AS mem_3m
    FROM (
        SELECT
            t.Id,
            SUM(IF(year_month = 202101, 1, 0)) AS mm_jan, /* visits by a member in Jan */
            SUM(IF(year_month = 202102, 1, 0)) AS mm_feb, /* visits by a member in Feb */
            SUM(IF(year_month BETWEEN 202102 AND 202104, 1, 0)) AS mm_3m /* visits by a member in the following 3 months */
        FROM table.members t
        JOIN table.date tm ON t.dt_key = tm.date_key
        WHERE year_month BETWEEN 202101 AND 202104
        GROUP BY t.Id
    ) AS t1
) AS t2
This is not a final running query, but it explains my idea. Depending on your database engine, you may need CASE WHEN ... THEN ... ELSE ... END instead of IF.
Don't use multiple joins; count the shops per member per month and then use conditional aggregation.
In Oracle, that would be:
SELECT 202101 AS year_month,
COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END)
AS members_shopped_202101,
COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END)
AS members_retained_202102,
COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END)
/ COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) * 100
AS one_month_retention_rate,
COUNT(CASE WHEN cnt_202101 > 0 AND (cnt_202102 > 0 OR cnt_202103 > 0 OR cnt_202104 > 0) THEN 1 END)
/ COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) * 100
AS within_3months
FROM (
SELECT members,
year_month
FROM members m
INNER JOIN date d
ON m.dt_key = d.date_key
)
PIVOT (
COUNT(*)
FOR year_month IN (
202101 AS cnt_202101,
202102 AS cnt_202102,
202103 AS cnt_202103,
202104 AS cnt_202104
)
);
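For anyone who wants to check the conditional-aggregation idea on a small data set, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module rather than on Oracle, and the visits table, its columns and the sample rows are all invented for the illustration. It uses plain CASE expressions instead of PIVOT or IF so that the same shape should carry over to most engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (member TEXT, year_month INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("a", 202101), ("a", 202102),                  # Jan, back in Feb
        ("b", 202101), ("b", 202104),                  # Jan, back in Apr only
        ("c", 202101),                                 # Jan, never back
        ("d", 202101), ("d", 202102), ("d", 202103),   # Jan, back in Feb and Mar
        ("e", 202102),                                 # did not shop in Jan
    ],
)

row = conn.execute(
    """
    SELECT SUM(jan) AS shopped_jan,
           SUM(CASE WHEN jan = 1 AND feb = 1 THEN 1 ELSE 0 END) AS retained_feb,
           100.0 * SUM(CASE WHEN jan = 1 AND feb = 1 THEN 1 ELSE 0 END)
                 / SUM(jan) AS one_month_retention_rate,
           100.0 * SUM(CASE WHEN jan = 1 AND within_3m = 1 THEN 1 ELSE 0 END)
                 / SUM(jan) AS within_3months
    FROM (
        -- one row per member with 0/1 flags for each period of interest
        SELECT member,
               MAX(CASE WHEN year_month = 202101 THEN 1 ELSE 0 END) AS jan,
               MAX(CASE WHEN year_month = 202102 THEN 1 ELSE 0 END) AS feb,
               MAX(CASE WHEN year_month BETWEEN 202102 AND 202104
                        THEN 1 ELSE 0 END) AS within_3m
        FROM visits
        GROUP BY member
    )
    """
).fetchone()

print(row)
```

Four members shop in January; two of them return in February and three return at some point between February and April, so the two rates come out as 50 and 75 percent.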
I am unable to pivot multiple columns in Snowflake and would appreciate it if someone could help me.
I basically have the table shown on the left of the attached screenshot and need to change it to the format on the right. I wonder if PIVOT can work in this case?
My current code:
select
CONCAT(RIGHT(TO_VARCHAR(YEAR(DATE)),2),'-Q',TO_VARCHAR(QUARTER(DATE)) ) closed_date,
IFNULL(sum(case when STAG='Closed' then REVENUE_AMOUNTS end),0) REVENUE AMER,
IFNULL(sum(case when STAG='Closed' then REVENUE_AMOUNTS end),0) REVENUE APAC,
IFNULL(sum(case when STAG='Closed' then REVENUE_AMOUNTS end),0) REVENUE EMEA
from REVENUE_TABLE
where 1=1
group by 1
order by 1 asc
So, assuming the SQL you have posted is more like this (with fake data included in a CTE):
WITH REVENUE_TABLE as (
SELECT * FROM VALUES
('Closed', 1, '2020-01-01'::date, 'amer'),
('Closed', 2, '2020-04-01'::date, 'apac'),
('Closed', 3, '2020-08-01'::date, 'emea'),
('Closed', 4, '2021-01-01'::date, 'emea')
v(stag, REVENUE_AMOUNTS, date, loc)
)
select
CONCAT(RIGHT(TO_VARCHAR(YEAR(DATE)),2),'-Q',TO_VARCHAR(QUARTER(DATE)) ) closed_date,
ZEROIFNULL(sum(IFF(loc='amer' AND STAG='Closed', REVENUE_AMOUNTS, null))) as REVENUE_AMER,
ZEROIFNULL(sum(IFF(loc='apac' AND STAG='Closed', REVENUE_AMOUNTS, null))) as REVENUE_APAC,
ZEROIFNULL(sum(IFF(loc='emea' AND STAG='Closed', REVENUE_AMOUNTS, null))) as REVENUE_EMEA
from REVENUE_TABLE
group by 1
order by 1 asc
I swapped your CASE for an IFF and added the loc test that decides which column each amount belongs in. I also swapped IFNULL(x, 0) for ZEROIFNULL(x); while longer, it makes the intent clearer.
which gives results that look like your existing output:
CLOSED_DATE  REVENUE_AMER  REVENUE_APAC  REVENUE_EMEA
20-Q1        1             0             0
20-Q2        0             2             0
20-Q3        0             0             3
21-Q1        0             0             4
So if that holds as "the way it is", then to get to "where you want to go" you need to find the distinct set of locations, and then join your results to that:
select l.loc,
ZEROIFNULL(sum(IFF(r.cd='20-Q1', r.REVENUE_AMOUNTS, null))) as "20-Q1",
ZEROIFNULL(sum(IFF(r.cd='20-Q2', r.REVENUE_AMOUNTS, null))) as "20-Q2",
ZEROIFNULL(sum(IFF(r.cd='20-Q3', r.REVENUE_AMOUNTS, null))) as "20-Q3",
ZEROIFNULL(sum(IFF(r.cd='21-Q1', r.REVENUE_AMOUNTS, null))) as "21-Q1"
from (
select distinct loc
FROM REVENUE_TABLE
) as l
left join (
select loc,
revenue_amounts,
CONCAT(RIGHT(TO_VARCHAR(YEAR(DATE)),2),'-Q',TO_VARCHAR(QUARTER(DATE)) ) cd
FROM REVENUE_TABLE
WHERE STAG='Closed'
) as r on l.loc = r.loc
group by 1
order by 1 asc;
gives:
LOC   20-Q1  20-Q2  20-Q3  21-Q1
amer  1      0      0      0
apac  0      2      0      0
emea  0      0      3      4
Now the downside of this pattern is you need to explicitly know the column names, but you have that problem in the PIVOT case as well. That could be worked around with Snowflake Scripting I believe.
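One way around having to know the column names up front is to generate the statement from the data in a client language. The sketch below does that in Python against SQLite, not Snowflake, so it is only an illustration of the idea: SQLite has no QUARTER(), IFF() or ZEROIFNULL(), and the label expression, CASE and COALESCE used here are substitutes. The sample rows mirror the fake data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revenue_table "
    "(stag TEXT, revenue_amounts INTEGER, date TEXT, loc TEXT)"
)
conn.executemany(
    "INSERT INTO revenue_table VALUES (?, ?, ?, ?)",
    [
        ("Closed", 1, "2020-01-01", "amer"),
        ("Closed", 2, "2020-04-01", "apac"),
        ("Closed", 3, "2020-08-01", "emea"),
        ("Closed", 4, "2021-01-01", "emea"),
    ],
)

# Derive a 'YY-Qn' label from an ISO date string (stand-in for QUARTER()).
QUARTER_LABEL = (
    "substr(date, 3, 2) || '-Q' || ((CAST(substr(date, 6, 2) AS INTEGER) + 2) / 3)"
)

# Step 1: discover the labels that will become output columns.
labels = [
    r[0]
    for r in conn.execute(
        f"SELECT DISTINCT {QUARTER_LABEL} AS cd FROM revenue_table "
        "WHERE stag = 'Closed' ORDER BY cd"
    )
]

# Step 2: build one conditional SUM per label. The labels are generated by
# the expression above (digits, '-' and 'Q' only); anything that came from
# free-form user data would need validating before being spliced into SQL.
columns = ",\n       ".join(
    f"COALESCE(SUM(CASE WHEN cd = '{label}' THEN revenue_amounts END), 0) AS \"{label}\""
    for label in labels
)
pivot_sql = f"""
SELECT loc,
       {columns}
FROM (SELECT loc, revenue_amounts, {QUARTER_LABEL} AS cd
      FROM revenue_table
      WHERE stag = 'Closed')
GROUP BY loc
ORDER BY loc
"""
rows = conn.execute(pivot_sql).fetchall()

print(labels)
print(rows)
```

The generated statement has the same shape as the hand-written one, so adding a new quarter to the data adds a column without editing the query.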
Matched sales are provided by the join; it's getting the unmatched sales that is eluding me.
CTE
With PriorSalesCTE
(
Item
Variant,
Sum(sales)
Date between 7/1/2020 and 7/5/2020
),
CurrentSalesCTE
(
Item
Variant,
Sum(sales)
Date between 7/1/2021 and 7/5/2021
)
Select
SUM(cs.Sales) 'MatchedSales'
FROM PriorSalesCTE ps join CurrentSalesCTE cs
ON cs.Item = ps.Item
And cs.Variant = ps.Variant
Now I need the empty spaces on both sides
I need the sales for items sold in 2020 but not sold in 2021 – Lost Sales
Conversely, sales for 2021 that did not sell in 2020 – New Sales.
I tried adding these as separate sections of the CTE, but the join doesn't give me what I need.
Any suggestions? Is the CTE simply preventing me from getting everything, and should I maybe add a UNION ALL query to get the unmatched values?
For your actual query, you could use a FULL JOIN, which also returns the unmatched rows from either side.
But I think there is another solution: you don't need to join separate queries for this; you can just use conditional aggregation:
WITH SalesByItem AS (
SELECT
t.Item,
t.Variant,
Sales2020 = SUM(CASE WHEN Date BETWEEN '20200701' and '20200705' THEN t.Sales END),
Sales2021 = SUM(CASE WHEN Date BETWEEN '20210701' and '20210705' THEN t.Sales END)
FROM YourTable t
WHERE (Date BETWEEN '20200701' and '20200705'
OR Date BETWEEN '20210701' and '20210705')
GROUP BY
t.Item,
t.Variant
)
SELECT
NewSales = SUM(CASE WHEN Sales2020 IS NULL THEN Sales2021 END),
MatchedSales = SUM(CASE WHEN Sales2020 IS NOT NULL AND Sales2021 IS NOT NULL THEN Sales2021 END),
LostSales = SUM(CASE WHEN Sales2021 IS NULL THEN Sales2020 END)
FROM SalesByItem s;
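To see the three buckets fall out of a single pass, here is a small runnable sketch of the same pattern. It uses SQLite via Python's sqlite3 module, so the `alias = expression` syntax is replaced with `expression AS alias`, and the sales table, its sale_date column and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (item TEXT, variant TEXT, sale_date TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("shirt", "red", "2020-07-02", 10),   # sold in both windows -> matched
        ("shirt", "red", "2021-07-03", 15),
        ("hat", "blue", "2020-07-01", 7),     # 2020 window only -> lost
        ("sock", "grey", "2021-07-04", 5),    # 2021 window only -> new
        ("sock", "grey", "2021-07-05", 3),
        ("hat", "blue", "2021-08-01", 99),    # outside both windows, filtered out
    ],
)

new_sales, matched_sales, lost_sales = conn.execute(
    """
    WITH sales_by_item AS (
        SELECT item, variant,
               SUM(CASE WHEN sale_date BETWEEN '2020-07-01' AND '2020-07-05'
                        THEN sales END) AS sales_2020,
               SUM(CASE WHEN sale_date BETWEEN '2021-07-01' AND '2021-07-05'
                        THEN sales END) AS sales_2021
        FROM sales
        WHERE sale_date BETWEEN '2020-07-01' AND '2020-07-05'
           OR sale_date BETWEEN '2021-07-01' AND '2021-07-05'
        GROUP BY item, variant
    )
    SELECT SUM(CASE WHEN sales_2020 IS NULL THEN sales_2021 END),
           SUM(CASE WHEN sales_2020 IS NOT NULL AND sales_2021 IS NOT NULL
                    THEN sales_2021 END),
           SUM(CASE WHEN sales_2021 IS NULL THEN sales_2020 END)
    FROM sales_by_item
    """
).fetchone()

print(new_sales, matched_sales, lost_sales)
```

The CASE expressions have no ELSE branch on purpose: a period with no rows sums to NULL, and that NULL is what the outer query tests to tell new, matched and lost items apart.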
I searched Stack Overflow and was not able to find an answer to my question (maybe it's there, but I did not see one).
Have the following query which lists the mileage used, fuel cost, and fuel quantity for multiple vehicles stored at a location in the MAIN table. Also have a sub-query to calculate the cost per mile - and in that subquery is a WHERE clause to not calculate unless the fuel_qty > 0 (cannot divide by zero, unless you are Chuck Norris - ha ha). Also need to display a zero for the fuel_qty (in line 3 of this query) if it is a zero value. Am getting an error with this query - saying that it is "not a single-group group function". Is there something which I am missing or not seeing?
Have tried adding cost_per_mile to the group by clause, but received an "invalid identifier" error. Then also added a group by clause to the subquery - but that also did not work.
select cost.mileage_useage
, cost.fuel_cost
, cost.fuel_qty
, (select (sum(cost1.mileage_usage / cost1.fuel_qty) * cost1.fuel_cost)
from cost cost1
where cost1.fuel_qty > 0) as cost_per_mile
from cost
inner join main on main.equip_no = cost.equip_no
where main.stored_loc = 4411
group by
cost.mileage_useage
, cost.fuel_cost
, cost.fuel_qty
Why doesn't this do what you want?
select c.mileage_useage, c.fuel_cost, c.fuel_qty,
(sum(c.mileage_usage) * c.fuel_cost /
nullif(c.fuel_qty, 0)
) as cost_per_mile
from cost c inner join
main m
on m.equip_no = c.equip_no
where m.stored_loc = 4411
group by c.mileage_useage, c.fuel_cost, c.fuel_qty
Believe I found an answer - thank you for all your help! This takes into consideration if the mileage useage = 0 or is a negative number. Also if the fuel quantity = 0 then that portion of the equation is not possible to divide by a zero value. It may look a little strange, but this works!
select cost.mileage_useage
, cost.fuel_cost
, cost.fuel_qty
, ( sum(((CASE WHEN cost.mileage_usage = 0 THEN 1
WHEN cost.mileage_usage < 0 THEN TO_NUMBER(NULL)
ELSE cost.mileage_usage END)
/ DECODE(cost.fuel_qty,0, 1, cost.fuel_qty))
* cost.fuel_cost )) as cost_per_mile
from cost
inner join main on main.equip_no = cost.equip_no
where main.stored_loc = 4411
group by cost.mileage_useage
, cost.fuel_cost
, cost.fuel_qty
You can further simplify it as following:
select cost.mileage_useage
, cost.fuel_cost
, cost.fuel_qty
, sum((CASE WHEN cost.mileage_usage = 0 THEN cost.fuel_cost
WHEN cost.mileage_usage > 0 THEN cost.mileage_usage * cost.fuel_cost END)
/ (case when cost.fuel_qty = 0 then 1 else cost.fuel_qty end)) as cost_per_mile
from cost
inner join main on main.equip_no = cost.equip_no
where main.stored_loc = 4411
group by cost.mileage_useage
, cost.fuel_cost
, cost.fuel_qty;
Cheers!!
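The two zero guards used in this thread do not give the same result, which may matter when choosing between them. The sketch below compares them side by side on SQLite via Python's sqlite3 module; the table layout and numbers are invented, and CASE stands in for Oracle's DECODE. Note that SQLite quietly returns NULL for a division by zero where Oracle raises an error, so this only illustrates the values each guard produces, not the error being avoided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cost (equip_no INTEGER, mileage REAL, fuel_cost REAL, fuel_qty REAL)"
)
conn.executemany(
    "INSERT INTO cost VALUES (?, ?, ?, ?)",
    [
        (1, 100.0, 50.0, 20.0),   # normal row
        (2, 80.0, 30.0, 0.0),     # zero fuel quantity
    ],
)

rows = conn.execute(
    """
    SELECT equip_no,
           -- NULLIF turns a zero divisor into NULL, so the result is NULL
           mileage * fuel_cost / NULLIF(fuel_qty, 0) AS with_nullif,
           -- substituting 1 for zero divides by 1 and returns a number
           mileage * fuel_cost /
               (CASE WHEN fuel_qty = 0 THEN 1 ELSE fuel_qty END) AS with_one
    FROM cost
    ORDER BY equip_no
    """
).fetchall()

print(rows)
```

For the zero-quantity row, NULLIF yields NULL (no value), while substituting 1 yields mileage times fuel cost; which of the two is appropriate depends on how the report is meant to read.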
I have 2 separate queries below which run correctly. I've now created a calculated column that provides a count of working days by YMs, and would like to bring this through to query 1 (the join would be query1.Period = query2.Yms).
Please see the queries and outputs below.
SELECT Client, ClientGroup, Type, Value, Period, PeriodName, PeriodNumber, ClientName
FROM metrics.dbo.vw_KPI_001_Invoice
select YMs,sum(case when IsWorkDay = 'X' then 1 else 0 end) from IESAONLINE.Dbo.DS_Dates
where Year > '2013'
group by YMs
Query 1
Client ClientGroup Type Value Period PeriodName PeriodNumber ClientName
0LG0 KarroFoods Stock 5691.68 201506 Week 06 2015 35 Karro Foods Scunthorpe
Query 2
YMs (No column name)
201401 23
Would the following work?
SELECT Client, ClientGroup, Type, Value, Period, PeriodName, PeriodNumber, ClientName, cnt
FROM metrics.dbo.vw_KPI_001_Invoice q1
INNER JOIN (select YMs,sum(case when IsWorkDay = 'X' then 1 else 0 end) as cnt from IESAONLINE.Dbo.DS_Dates
where Year > '2013'
group by YMs ) q2 ON q1.Period = q2.YMs
If a matching value isn't always available, you might consider changing the INNER JOIN to a LEFT OUTER JOIN, so that rows from the first query are kept even when there is no matching YMs row.
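The difference between the two join types is easy to see on a toy data set. This is a sketch on SQLite through Python's sqlite3 module, with cut-down invoice and dates tables that are stand-ins for the real view and DS_Dates table; the second invoice period deliberately has no rows in dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invoice (client TEXT, period TEXT, value REAL);
    CREATE TABLE dates (yms TEXT, is_work_day TEXT);
    INSERT INTO invoice VALUES ('0LG0', '201506', 5691.68),
                               ('0LG0', '201507', 100.0);
    INSERT INTO dates VALUES ('201506', 'X'), ('201506', 'X'), ('201506', '');
    """
)

# The aggregated working-day count, used as a derived table in both joins.
DERIVED = """
    (SELECT yms,
            SUM(CASE WHEN is_work_day = 'X' THEN 1 ELSE 0 END) AS cnt
     FROM dates
     GROUP BY yms) q2
"""

def run(join_kind):
    """Join invoice rows to the working-day counts with the given join type."""
    sql = f"""
        SELECT q1.client, q1.period, q2.cnt
        FROM invoice q1
        {join_kind} JOIN {DERIVED} ON q1.period = q2.yms
        ORDER BY q1.period
    """
    return conn.execute(sql).fetchall()

inner_rows = run("INNER")
left_rows = run("LEFT OUTER")

print(inner_rows)
print(left_rows)
```

The inner join drops the 201507 invoice because no dates exist for it, whereas the left outer join keeps it with a NULL count.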
I am currently working on aggregating the sum qty of "OUT" and "OUT+IN".
Current query is the following:
Select
a.Date
,a.DepartmentID
from
(Select
dris.Date
,dris.RentalItemKey
,dris.WarehouseKey
,ISNULL((Select TOP 1 dris.Date where OutQty=1 order by Date DESC),(Select ri.ReceiveDate from RentalItem ri where ri.RentalItemKey=dris.RentalItemKey)) as LastOutDate
,(Select d.DepartmentKey from Department d where d.Department=i.Department)as DepartmentID
, (CASE WHEN OutQty=1 OR (RepairQty=1 AND RentedQty=1) THEN 'IN' ELSE 'OUT' END) as Status
from DailyRentalItemStatus dris
inner join Inventory i on i.InventoryKey=dris.InventoryKey
where dris.Date='2014-08-02'
and i.ICode='3223700'
and i.Classification IN ('ITEM', 'ACCESSORY')
and i.AvailFor='RENT'
and i.AvailFrom='WAREHOUSE'
and dris.Warehouse='TORONTO')a
and I would like the result to be the following:
Date WarehouseID DepartmentID ICode Owned NotRedundant Out
2014-08-02 001T A00G 3223700 30 30 19
Here Owned is the count of items with either status ("OUT+IN"), Out is the count of items with status "OUT", and NotRedundant is the count of items whose last out date is within the 2 years before the date.
Help would be greatly appreciated.
I think this is close to what you're looking for. Your Not Redundant description is hard to understand: which dates are you comparing? The same trick used for OUT may work for that, though.
My query also assumes that there is always a department connected to the inventory table and that there's always a RentalItem.ReceiveDate.
;WITH LastOut as
(Select Max(Date) as LastOutDate, rentalItemKey
from DailyRentalItemStatus
WHERE OutQty=1
GROUP BY rentalItemKey
)
Select
dris.Date
,dris.WarehouseKey as WarehouseID
,d.DepartmentKey as DepartmentID
, i.Icode
--,ISNULL((Select TOP 1 dris.Date where OutQty=1 order by Date DESC),(Select ri.ReceiveDate from RentalItem ri where ri.RentalItemKey=dris.RentalItemKey)) as LastOutDate
, Count(1) as Owned
, Sum(CASE WHEN NOT (OutQty=1 OR (RepairQty=1 AND RentedQty=1)) THEN 1 ELSE 0 END) as OUT
, Sum(CASE WHEN DateAdd(yy, 2,dris.[date]) >= ISNULL(lastout.lastoutdate, ri.ReceiveDate) then 1 else 0 end) as NonRedundent
from DailyRentalItemStatus dris
inner join Inventory i on i.InventoryKey=dris.InventoryKey
INNER JOIN Department d ON d.Department=i.Department
INNER JOIN RentalItem ri ON ri.RentalItemKey=dris.RentalItemKey
LEFT OUTER JOIN LastOUT ON LastOut.rentalItemKey=dris.RentalItemKey
where dris.Date='2014-08-02'
and i.ICode='3223700'
and i.Classification IN ('ITEM', 'ACCESSORY')
and i.AvailFor='RENT'
and i.AvailFrom='WAREHOUSE'
and dris.Warehouse='TORONTO'
Group BY dris.Date, d.DepartmentKey, Dris.WarehouseKey , i.icode