Q) Write a query to return Territory and corresponding Sales Growth (compare growth between periods Q4-2019 vs Q3-2019) - sql

Q) Write a query to return Territory and corresponding Sales Growth (compare growth between periods Q4-2019 vs Q3-2019).
Tables given-
Cust_Sales: -Cust_id,product_sku,order_date,order_value,order_id,month
Cust_Territory: cust_id,territory_id,customer_city,customer_pincode
Use tables FCT_CUSTOMER_SALES (which has sales for each Customer) and MAP_CUSTOMER_TERRITORY (which provides Territory-to-Customer mapping) for this question.
Output format-
TERRITORY_ID | SALES_GROWTH
My solution-
Select ((q2.claims - q1.claims)/q1.claims * 100) AS SALES_GROWTH , c.territory_id
From
(select sum(s.order_value) from FCT_CUSTOMER_SALES s inner join MAP_CUSTOMER_TERRITORY c on s.customer_id=c.customer_id where s.order_datetime between 1/07/2019 and 30/09/2019 group by c.territory_id) as q1.claims,
(select sum(s.order_value) from FCT_CUSTOMER_SALES s inner join MAP_CUSTOMER_TERRITORY c on s.customer_id=c.customer_id where s.order_datetime between 1/10/2019 and 31/12/2019 group by c.territory_id) as q2.claims
Group by c.territory_id
My solution is showing up as incorrect I would request anyone who can help me out with the solution and let me know where my mistake is

One option uses conditional aggregation. The idea is to filter the table on the two quarters at once, then use case expressions within the sum() aggregate function to compute the sales of each of them:
select
c.territory_id,
( sum(case when s.order_date >= date '2020-01-01' then s.order_value end)
- sum(case when s.order_date < date '2020-01-01' then s.order_value end)
) / (sum(case when s.order_date < date '2020-01-01' then s.order_value end)) * 100.0 as sales_growth
from fct_customer_sales s
inner join map_customer_territory c on s.customer_id = c.customer_id
where s.order_datetime >= date '2020-01-07' and s.order_datetime < '2020-01-01'
group by c.territory_id
You did not tell which database you are using, while date features are highly vendor-dependent. This uses the standard DATE syntax to declare the literal dates - you might need to adapat that if your database does not support it.

Related

Can I left join twice to do multiple calculations?

I am trying to calculate if a member shops in January, what proportion shop again in February and what proportion shop again within 3 months. Ultimately to create a table similar to the image attached.
I have tried the below code. The first left join works, but when I add the second one to calculate within_3months the error: "FROM keyword not found where expected" is shown (for the separate line). Can I left join twice or must I do separate scripts for columns?
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
select
year_month_january21
, count(distinct A.members) as num_of_mems_shopped_january21
, count(distinct B.members)as retained_february21
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
, count(distinct C.members)/count(distinct A.members) *100 as within_3months
from
(select
members
, year_month as year_month_january21
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month = 202101
group by
members
, year_month) A
left join
(select
members
, year_month as year_month_february21
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month = 202102
group by
members
, year_month) B on A.members = B.members
left join
(select
members
, year_month as year_month_3months
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month between 202102 and 202104
group by
members
, year_month) C on A.members = C.members
group by
year_month_january21;
I have tried left creating a separate time table and joining to this. It does not work. Doing calculations separately works but I must do this for multiple time frames so will take a long time.
The error isn't coming from the added left join, it's from the as 1month_retention_rate part, because it's an illegal name.
You can see that more simply with:
select dummy as 1month_retention_rate
from dual;
ORA-00923: FROM keyword not found where expected
You could change the column alias so it follows the naming rules (specifically here, does not start with a digit), or if that specific name is actually required then you could make it a quoted identifier - generally not a good option, but sometimes OK in the final output of a query.
fiddle
So in your code you would just change your new line
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
to something like
, count(distinct B.members)/count(distinct A.members) *100 as one_month_retention_rate
or with a quoted identifier
, count(distinct B.members)/count(distinct A.members) *100 as "1month_retention_rate"
fiddle - which still errors but now with ORA-00942 as I don't have your tables, and that is after changing your obfuscated schema/table names to something legal too.
There may be more efficient ways to perform the calculation, but that's a separate issue...
I could understand that you want to get :
count of all members who visited in Jan.
count of all members who visited in Jan and visited again in Feb.
count of all members who visited in Jan and visited again in Feb, Mars and April.
If my understanding is true then you could simplify your inner query using IF instead of LEFT JOIN .
Take a look on the following query. Assuming that table members have an ID field :
SELECT
mem_jan AS num_of_mems_shopped_january21,
mem_feb AS retained_february21,
mem_feb / mem_jan * 100 as 1month_retention_rate
mem_3m / mem_jan * 100 as within_3months
FROM(
SELECT
SUM(IF(mm_jan>0,1,0) AS mem_jan,
SUM(IF(mm_jan>0 AND mm_feb>0,1,0) AS mem_feb,
SUM(IF(mm_jan>0 AND mm_count_3m>0,1,0) AS mem_3m
FROM
(
SELECT
t.Id,
SUM(IF(year_month = 202101, 1,0)) AS mm_jan, /*visit for a member in Jan*/
SUM(IF(year_month = 202102, 1,0)) AS mm_feb, /*visit for a member in Feb*/
SUM(IF(year_month between 202102 and 202104,1,0)) AS mem_3m/*visit for a member in 3 months*/
FROM
table.members t
join table.date tm on t.dt_key = tm.date_key
WHERE
year_month between 202101 and 202104
GROUP BY
t.Id
) AS t1
) AS t2
This is not a final running query but it can explain my idea. According to your engine you may use CASE or IF THEN ELSE
Don't use multiple joins, count the shops per member per month and then use conditional aggregation.
In Oracle, that would be:
SELECT 202101 AS year_month,
COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END)
AS members_shopped_202101,
COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END)
AS members_retained_202102,
COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END)
/ COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) * 100
AS one_month_retention_rate,
COUNT(CASE WHEN cnt_202101 > 0 AND (cnt_202102 > 0 OR cnt_202103 > 0 OR cnt_202104 > 0) THEN 1 END)
/ COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) * 100
AS within_3months
FROM (
SELECT members,
year_month
FROM members m
INNER JOIN date d
ON m.dt_key = d.date_key
)
PIVOT (
COUNT(*)
FOR year_month IN (
202101 AS cnt_202101,
202102 AS cnt_202102,
202103 AS cnt_202103,
202104 AS cnt_202104
)
);

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support it, but I really need to do something like the below query. Basically, I want to count the number of users for each day. But I want to only count the users that haven't completed an order within a 15 days window (relative to a specific day) and that have completed any order within a 30 days window (relative to a specific day). I already know that it is not possible to solve this problem using a regular subquery (it does not allow to change subquery values for each date). The "id" and the "state" attributes are related to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the below query in a way that the "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.completadas = 0 and comp_16.completadas > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.completadas > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completadas_15_days_before as comp_15 on comp_15.user = db.user
left join completadas_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want - difficult to test without sample data but should be a good enough starting point for you to then amend it to give you exactly what you want.
I've commented to the code to hopefully explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 amd -15 of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;

YTD for the below query

I want to add add the Year to date component to this code. I have tried some other ways but I am not getting what I would like to see. Can someone please help me revised this to include the YTD in addition to the Month to date that is already there?
SELECT
COST__DESC,
ST.AD_SRV_MTN AS MONTH_OF_AD,
COUNT(DISTINCT CM.CM_NBR) AS CMS,
MEM_MO AS MBR_MTH,
CMS/MBR_MTH*1000 AS CMS_PER_1000
FROM XTR.FT_CM AS CM
JOIN XTR.FT_ST AS ST ON ST.CM_NBR = CM.CM_NBR
JOIN XTR.DIM_MED_CST AS MC ON ST.CST_CK = MCC.CST_CK
JOIN XTR.DIM_AF AS AFF ON ST.PRO_CK = AFF.AFF_CK
JOIN XTR.DIM_ADJDCTN_STAT AS A_S ON ST.ADJDCTN_STAT_CK = A_S.ADJDCTN_STAT_CK
JOIN XTR.DIM_ADJ_OT AS OT ON ST.ADJ_CK = OT.ADJ_CK
LEFT JOIN
(SELECT
CALENDAR_YEAR_MONTH as YEAR_MO,
SUM(MBR.COUNT_NBR) as MEM_MO
FROM XTR.FT_MBR_MONTHS MBR
INNER JOIN DIM_MBR_C ON MBR.DB_MBR_CK = DIM_MBR_C.DB_MBR_CK
AND MBR.DATE_CK BETWEEN DIM_MBR_C.DB_eff_date_ck
AND DIM_MBR_C.DB_END_DATE_CK
INNER JOIN DIM_DATE DT ON ELI_DATE_CK = DT.DATE_CK
WHERE MBR.F_C_CK = 500058321 AND YEAR_MO >= 201701
GROUP BY 1) MM ON ST.AD_SRV_MTN = MM.YEAR_MO
WHERE ST.F_C_CK = 500058321 AND ST.ST_START_DATE_CK >= 20200101
AND ST.AD_SRV_MTN > 201912 AND MC.MED_DESC IN ('Er', 'IP')
AND ST.AD_SRV_MTN < ((EXTRACT (YEAR FROM CURRENT_DATE) *100) +
EXTRACT (MONTH FROM CURRENT_DATE))
GROUP BY 1,2,4
ORDER BY 1,2
Honestly I don't really get your SQL and what is counted, but: Your can play with dates quite easy in Teradata, as Dates are stored (and can be used) internally as INTEGER. Just keep in mind year 1900 as year 0 and format YYYYMMDD.
So e.g. 16-Apr-2020 is in Format YYYYMMDD 20200416 and if you take 1900 as 0 you'll end up with 1200416 which is the internal format. Just try SELECT CURRENT_DATE (INT); - So if you want compare YearNumers you just have to divide by 10000.
With this your can implement YTD as SUM (CASE WHEN CURRENT_DATE/10000 = <YourDateField>/10000 THEN <YourKPI> else 0 END) as YourKPI_YTD. Counting can be done by SUM...THEN 1 ELSE 0 END....

Summing Two Columns from Two Tables with Two Dates

I work for a CPG company and need to create a report that compares the previous month's delivered units to the next month's forecast. (Simply, our forecasting tool screws up occasionally and this will help identify when the forecast is off.)
My issue is my SQL query is summing forecast sales correctly, but the sum of total delivered is not respecting the dates I have in my WHERE clause -- it's summing total delivered for as far back as the query can reach.
Here is my query:
SELECT
DelUnits.Customer, DelUnits.ObsText01,
FinalFcst.SKU, FinalFcst.Customer,
SUM(DelUnits.Value) AS TotalDelivered,
SUM(FinalFcst.FinalFcst) AS ForecastSales
FROM
DelUnits
LEFT JOIN
FinalFcst ON DelUnits.Customer = FinalFcst.Customer
WHERE
(FinalFcst.DT >= '2018-01-01' and FinalFcst.DT <= '2018-01-31')
AND (DelUnits.Date >= '2017-12-01' and DelUnits.Date <= '2017-12-31')
AND DelUnits.ObsText01 = '10_LB'
AND FinalFcst.SKU = '10_LB'
GROUP BY
DelUnits.Customer, DelUnits.ObsText01, FinalFcst.SKU, FinalFcst.Customer
Again, the query seems to work correctly for the final forecast (summing the forecast between 1/1/18 - 1/31/18) but sums the entire delivery history for a customer. I don't understand why it won't sum the delivery history for just 12/1/17 - 12/31/17.
Thank you for your help!
Presumably, there is only one row for FinalFcst. So, either include it in the GROUP BY clause or use MAX() instead of SUM():
max(FinalFcst.FinalFcst) as ForecastSales
One way to achieve this is to calculate TotalDelivered and ForecastSales in 2 different queries and then join them together.
Try this:
SELECT DelUnits.customer,
DelUnits.obstext01,
FinalFcst.sku,
FinalFcst.customer,
totaldelivered,
forecastsales
FROM (SELECT customer,
obstext01,
Sum(value) AS TotalDelivered
FROM delunits
WHERE date >= '2017-12-01'
AND date <= '2017-12-31'
AND obstext01 = '10_LB'
GROUP BY customer,
obstext01) DelUnits
LEFT JOIN (SELECT customer,
sku,
Sum(finalfcst) AS ForecastSales
FROM finalfcst
WHERE dt >= '2018-01-01'
AND dt <= '2018-01-31'
AND sku = '10_LB'
GROUP BY customer,
sku) FinalFcst ON DelUnits.customer = FinalFcst.customer
You have a many to many relationship between the tables. Ultimately you need to SUM() one table before joining to the other to create a one to many relationship, or you end up duplicating records.
My favorite approach is a derived table:
SELECT C.Customer,
C.ObsText01,
FC.SKU,
C.TotalDelivered,
SUM(FC.FinalFcst) ForecastSales
FROM (SELECT SUM(Value) TotalDelivered, Customer, ObsText01
FROM DelUnits
WHERE Date >= '2017-12-01' AND Date <= '2017-12-31'
AND ObsText01 = '10_LB'
GROUP BY Customer) C
LEFT JOIN FinalFcst FC ON C.Customer = FC.Customer
AND FC.DT >= '2018-01-01'
AND FC.DT <= '2018-01-31'
AND FC.SKU = '10_LB'
GROUP BY C.Customer, C.ObsText01, FC.SKU, C.TotalDelivered
A couple things: Added your forecast table filters to the join predicate, since having those in the WHERE will create an INNER JOIN out of your LEFT JOIN. Also removed FC.Customer from the select and the group since it is redundant with C.Customer.
Maybe you could try to create a temp table to calculate the delivery history. I am not sure of the SQL Server verbiage, but something like this:
WITH DEL_HIST AS
(SELECT DelUnits.Customer,
DelUnits.ObsText01,
sum(DelUnits.Value) as TotalDelivered,
FROM DelUnits
Where(DelUnits.Date >= '2017-12-01' and DelUnits.Date <= '2017-12-31')
and DelUnits.ObsText01 = '10_LB'
Group By DelUnits.Customer, DelUnits.ObsText01)
SELECT
DEL_HIST.Customer,
DEL_HIST.ObsText01,
FinalFcst.SKU,
FinalFcst.Customer,
DEL_HIST.TotalDelivered,
sum(FinalFcst.FinalFcst) as ForecastSales
FROM DEL_HIST
left join FinalFcst ON DelUnits.Customer = FinalFcst.Customer
Where (FinalFcst.DT >= '2018-01-01' and FinalFcst.DT <= '2018-01-31')
and FinalFcst.SKU = '10_LB'
Group By DelUnits.Customer, DelUnits.ObsText01, FinalFcst.SKU, FinalFcst.Customer

how to return total sold p / year for several years in columnar format?

The SQL below give me the columns for Account, Name and the Total Sale for the year of 2015.But how, if possible, can I add another column for the previous year of 2014 ?
select a.AcctNo, b.Name, Sum(a.TotSold) as [ Total Sold ]
from Orders as A
Join Accounts as b on a.AcctNo = b.AcctNo
where (a.PurchaseDate between '1/1/2015' and '12/31/2015' )
Group by a.AcctNo, b.Name
You can use conditional aggregation:
select o.AcctNo, a.Name,
Sum(case when year(PurchaseDate) = 2015 then TotSold else 0 end) as tot_2015,
Sum(case when year(PurchaseDate) = 2014 then TotSold else 0 end) as tot_2014
from Orders o Join
Accounts a
on a.AcctNo = o.AcctNo
where PurchaseDate >= '2014-01-01'
PurchaseDate < '2016-01-01'
Group by o.AcctNo, a.Name ;
Notes:
Use ISO standard formats for date constants.
When using dates, try not to use between. It doesn't work as expected when there is a time component. The above inequalities work regardless of a time component.
Use table aliases that are abbreviations of the table name. That makes the query easier to follow.