In a project at my company, I would like to show sales-related KPIs together with a Customer Score metric, using SQL / Tableau / BigQuery.
The primary key is the order ID in both tables. However, the order date and the date on which we measure the Customer Score may differ. For example, the sales for an order released in Feb 2020 are aggregated in Feb 2020, but if the customer survey takes place in March 2020, the Customer Score must be aggregated in March 2020. What I would like to achieve in the relational database is as follows:
Sales:
Order ID  Order Date (m/d/yyyy)  Sales ($)
1000      1/1/2021               1000
1001      2/1/2021               2000
1002      3/1/2021               1500
1003      4/1/2021               1700
1004      5/1/2021               1800
1005      6/1/2021               900
1006      7/1/2021               1600
1007      8/1/2021               1900
Customer Score Table:
Order ID  Customer Survey Date (m/d/yyyy)  Customer Score
1000      3/1/2021                         8
1001      3/1/2021                         7
1002      4/1/2021                         3
1003      6/1/2021                         6
1004      6/1/2021                         5
1005      7/1/2021                         3
1006      9/1/2021                         1
1007      8/1/2021                         7
Expected Output:
KPI                 Jan-21  Feb-21  Mar-21  Apr-21  May-21  June-21  July-21  Aug-21  Sep-21
Sales($)            1000    2000    1500    1700    1800    900      1600     1900
AVG Customer Score                  7.5     3               5.5      3        7       1
I couldn't find a way to do this, because order date and survey date may/may not be the same.
I think what you want to do is aggregate each table to the month (KPI) first and then join on the month, rather than joining on ORDER_ID.
For example:
with order_month as (
select date_trunc(order_date, MONTH) as KPI, sum(sales) as sales
from `testing.sales`
group by 1
),
customer_score_month as (
select date_trunc(customer_survey_date, MONTH) as KPI, avg(customer_score) as avg_customer_score
from `testing.customer_score`
group by 1
)
select coalesce(order_month.KPI,customer_score_month.KPI) as KPI, sales, avg_customer_score
from order_month
full outer join customer_score_month
on order_month.KPI = customer_score_month.KPI
order by 1 asc
Here, we aggregate the total sales for each month based on the order date, then we aggregate the average customer score for each month based on the date the score was submitted. Now we can join these two on the month value.
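The same aggregate-then-join logic can be sketched outside of SQL. The Python below is only an illustration, not the BigQuery query itself: the sample rows are copied from the question, and the names `month_start`, `sales_by_month`, `avg_score_by_month` and `report` are made up for this sketch. Taking the union of the month keys plays the role of the full outer join.

```python
from collections import defaultdict
from datetime import date

# Sample rows copied from the question: (order_id, date, value).
sales = [
    (1000, date(2021, 1, 1), 1000), (1001, date(2021, 2, 1), 2000),
    (1002, date(2021, 3, 1), 1500), (1003, date(2021, 4, 1), 1700),
    (1004, date(2021, 5, 1), 1800), (1005, date(2021, 6, 1), 900),
    (1006, date(2021, 7, 1), 1600), (1007, date(2021, 8, 1), 1900),
]
scores = [
    (1000, date(2021, 3, 1), 8), (1001, date(2021, 3, 1), 7),
    (1002, date(2021, 4, 1), 3), (1003, date(2021, 6, 1), 6),
    (1004, date(2021, 6, 1), 5), (1005, date(2021, 7, 1), 3),
    (1006, date(2021, 9, 1), 1), (1007, date(2021, 8, 1), 7),
]


def month_start(d):
    """Hypothetical helper: first day of d's month, like date_trunc(d, MONTH)."""
    return d.replace(day=1)


# Step 1: total sales per order month.
sales_by_month = defaultdict(int)
for _, order_date, amount in sales:
    sales_by_month[month_start(order_date)] += amount

# Step 2: average score per survey month.
scores_per_month = defaultdict(list)
for _, survey_date, score in scores:
    scores_per_month[month_start(survey_date)].append(score)
avg_score_by_month = {m: sum(v) / len(v) for m, v in scores_per_month.items()}

# Step 3: "full outer join" on the month: keep every month seen on either side.
months = sorted(set(sales_by_month) | set(avg_score_by_month))
report = [(m, sales_by_month.get(m), avg_score_by_month.get(m)) for m in months]

for row in report:
    print(row)
```

Months that appear on only one side come back with `None` in the other column, which corresponds to the `null` cells in the query result.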
This results in a table like this:
KPI         sales  avg_customer_score
2021-01-01  1000   null
2021-02-01  2000   null
2021-03-01  1500   7.5
2021-04-01  1700   3.0
2021-05-01  1800   null
2021-06-01  900    5.5
2021-07-01  1600   3.0
2021-08-01  1900   7.0
2021-09-01  null   1.0
You can pivot the results of this table in Tableau, or use a CASE expression to pull each month into its own column. I can elaborate if that would be helpful.
I am looking to filter very large tables to the latest entry per user per month. I'm not sure if I found the best way to do this. I know I "should" trust the SQL engine (Snowflake), but there is a part of me that does not like the join on three columns.
Note that this is a very common operation on many big tables, and I want to use it in dbt views, which means it will get run all the time.
To illustrate, my data is of this form:
mytable
userId  loginDate   year  month  value
1       2021-01-04  2021  1      41.1
1       2021-01-06  2021  1      411.1
1       2021-01-25  2021  1      251.1
2       2021-01-05  2021  1      4369
2       2021-02-06  2021  2      32
2       2021-02-14  2021  2      731
3       2021-01-20  2021  1      258
3       2021-02-19  2021  2      4251
3       2021-03-15  2021  3      171
And I'm trying to use SQL to get the last value (by loginDate) for each month.
I'm currently doing a groupby & a join as follows:
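WITH latest_entry_by_month AS (
    -- Latest login per user and month; max() needs the matching GROUP BY
    SELECT "userId", "year", "month", max("loginDate") AS "loginDate"
    FROM mytable
    GROUP BY "userId", "year", "month"
)
SELECT * FROM mytable NATURAL JOIN latest_entry_by_month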
The above results in my desired output:
userId  loginDate   year  month  value
1       2021-01-25  2021  1      251.1
2       2021-01-05  2021  1      4369
2       2021-02-14  2021  2      731
3       2021-01-20  2021  1      258
3       2021-02-19  2021  2      4251
3       2021-03-15  2021  3      171
But I'm not sure if it's optimal.
Any guidance on how to do this faster? Note that I am not materializing the underlying data, so it is effectively un-clustered (I'm getting it from a vendor via the Snowflake marketplace).
Use QUALIFY with a window function (ROW_NUMBER):
SELECT *
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY userId, year, month
ORDER BY loginDate DESC) = 1
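To make the "keep row number 1 per partition" idea concrete, here is a small Python sketch of the same selection in a single pass, with no join. It only illustrates the logic and says nothing about how Snowflake executes the query. The rows are copied from the question, and `latest` and `result` are names invented for the sketch.

```python
from datetime import date

# Rows copied from the question: (userId, loginDate, value).
rows = [
    (1, date(2021, 1, 4), 41.1), (1, date(2021, 1, 6), 411.1),
    (1, date(2021, 1, 25), 251.1), (2, date(2021, 1, 5), 4369),
    (2, date(2021, 2, 6), 32), (2, date(2021, 2, 14), 731),
    (3, date(2021, 1, 20), 258), (3, date(2021, 2, 19), 4251),
    (3, date(2021, 3, 15), 171),
]

# One pass: remember the row with the greatest loginDate for each
# (userId, year, month) partition, like ROW_NUMBER() ... DESC = 1.
latest = {}
for user_id, login_date, value in rows:
    key = (user_id, login_date.year, login_date.month)
    if key not in latest or login_date > latest[key][1]:
        latest[key] = (user_id, login_date, value)

result = sorted(latest.values())
for row in result:
    print(row)
```

One difference from the join version is worth keeping in mind: if two rows share the same latest loginDate within a partition, this approach (like ROW_NUMBER) keeps exactly one of them, whereas the join returns both.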
I have a table in SQL Server with sales price data of items on different dates like this:
Item  Date        Price
1     2021-05-01  200
1     2021-06-11  210
1     2021-06-27  225
1     2021-08-01  250
2     2021-02-10  600
2     2021-04-21  650
2     2021-06-17  675
2     2021-07-23  700
I'm creating a table that specifies the start and end date of prices as below:
Item  DateStart   Price  DateEnd
1     2021-05-01  200    2021-06-10
1     2021-06-11  210    2021-06-26
1     2021-06-27  225    2021-07-31
1     2021-08-01  250    Today date
2     2021-02-10  600    2021-04-20
2     2021-04-21  650    2021-06-16
2     2021-06-17  675    2021-07-22
2     2021-07-23  700    Today date
As you can see, the end date is one day less than the next price change date. I also have a calendar table called "DimDates" with one row per day. I had hoped to use joins but it doesn't do what I thought it would do. Any suggestions on how to write the query? I'm using SQL Server 2016.
We can use LEAD() here along with DATEADD():
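WITH cte AS (
    -- LEAD() fetches the next price-change date for the same item. A row with no
    -- successor defaults to tomorrow, so subtracting one day yields today's date.
    SELECT *, DATEADD(day, -1, LEAD(Date, 1, DATEADD(day, 1, CAST(GETDATE() AS date)))
                                   OVER (PARTITION BY Item
                                         ORDER BY Date)) AS LastDate
    FROM yourTable
)
SELECT Item, Date AS DateStart, Price, LastDate AS DateEnd
FROM cte
ORDER BY Item, Date;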
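For readers who want to see what the window function computes, the following Python sketch walks each item's rows in date order and looks one row ahead, which is what LEAD() does. It is an illustration under assumptions: `price_ranges` is a hypothetical helper, and `today` is passed in as a fixed date (2021-09-01, chosen arbitrarily) so the output is reproducible.

```python
from datetime import date, timedelta
from itertools import groupby

# Rows copied from the question: (Item, Date, Price).
prices = [
    (1, date(2021, 5, 1), 200), (1, date(2021, 6, 11), 210),
    (1, date(2021, 6, 27), 225), (1, date(2021, 8, 1), 250),
    (2, date(2021, 2, 10), 600), (2, date(2021, 4, 21), 650),
    (2, date(2021, 6, 17), 675), (2, date(2021, 7, 23), 700),
]


def price_ranges(rows, today):
    """Return (item, start, price, end) with end = next start minus one day."""
    out = []
    ordered = sorted(rows)  # by item, then by date
    for item, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        for i, (_, start, price) in enumerate(group):
            if i + 1 < len(group):
                # "LEAD": the next row's start date, minus one day.
                end = group[i + 1][1] - timedelta(days=1)
            else:
                # No later price change: the price is still valid today.
                end = today
            out.append((item, start, price, end))
    return out


ranges = price_ranges(prices, today=date(2021, 9, 1))
for row in ranges:
    print(row)
```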
My question is about SQL. I am looking to find the total revenue of a parent account for the last 12 months.
The data will look something like this
revenue  name  month   year
10000    abc   201001  2010-01-12
10000    abc   201402  2014-02-14
2000     abc   201404  2014-04-12
3000     abc   201406  2014-06-30
30000    def   201301  2013-01-14
6000     def   201304  2013-04-12
9000     def   201407  2013-07-19
And the output should be something like this
revenue  name  month   year        Running Sum
10000    abc   201001  2010-01-12  10000
10000    abc   201402  2014-02-14  10000
2000     abc   201404  2014-04-12  12000
3000     abc   201406  2014-06-30  15000
30000    def   201301  2013-01-14  30000
6000     def   201304  2013-04-12  36000
9000     def   201407  2013-07-19  45000
I have tried using a window function like this, which expresses the logic that I need:
select revenue, name, date, month,
sum(revenue) over (partition by name order by month rows between '12 months' preceding AND CURRENT ROW )
from table
but the above command gives a syntax error
Redshift does not support intervals in the window frame specification.
So, convert to a number. A convenient one in this case is the number of months since some point in time:
select revenue, name, date, month,
sum(revenue) over (partition by name
order by datediff(month, '1900-01-01', month)
range between 12 preceding and current row
)
from table;
I will note that your logic adds up data from 13 months, not 12. I suspect you want between 11 preceding and current row.
You can use rows between if you have data for all months:
sum(revenue) over (partition by name
order by datediff(month, '1900-01-01', month)
rows between 12 preceding and current row
)
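The month-number trick can be checked by hand with a short Python sketch. This is an illustration, not Redshift code: it assumes the date column (labelled "year" in the question) is the one to window on, since that is the column consistent with the expected running sums, and it uses an 11-months-preceding window so that exactly 12 calendar months are covered. `month_index` and `trailing_sum` are hypothetical names.

```python
from datetime import date

# Rows copied from the question: (revenue, name, date from the "year" column).
revenue_rows = [
    (10000, "abc", date(2010, 1, 12)),
    (10000, "abc", date(2014, 2, 14)),
    (2000, "abc", date(2014, 4, 12)),
    (3000, "abc", date(2014, 6, 30)),
    (30000, "def", date(2013, 1, 14)),
    (6000, "def", date(2013, 4, 12)),
    (9000, "def", date(2013, 7, 19)),
]


def month_index(d):
    """Months since year 0: a plain integer, like datediff(month, <epoch>, d)."""
    return d.year * 12 + d.month


def trailing_sum(rows, months=12):
    """Sum revenue per name over the current month and the months-1 before it."""
    out = []
    for revenue, name, d in rows:
        current = month_index(d)
        total = sum(
            r for r, n, other in rows
            if n == name and current - (months - 1) <= month_index(other) <= current
        )
        out.append((revenue, name, d, total))
    return out


for row in trailing_sum(revenue_rows):
    print(row)
```

Because the window is defined on the month number rather than on a row count, gaps in the data (months with no rows) are handled correctly, which is the reason for preferring a range-style frame over a row-count frame when months can be missing.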
I am dealing with a sales order table (ORDER) that looks roughly like this (updated 2018/12/20 to be closer to my actual data set):
SOID SOLINEID INVOICEDATE SALESAMOUNT AC
5 1 2018-11-30 100.00 01
5 2 2018-12-05 50.00 02
4 1 2018-12-12 25.00 17
3 1 2017-12-31 75.00 03
3 2 2018-01-03 25.00 05
2 1 2017-11-25 100.00 17
2 2 2017-11-27 35.00 03
1 1 2017-11-20 15.00 08
1 2 2018-03-15 30.00 17
1 3 2018-04-03 200.00 05
I'm able to calculate the average sales by SOID and SOLINEID:
SELECT SUM(SALESAMOUNT) / COUNT(DISTINCT SOID) AS 'Total Sales per Order ($)',
SUM(SALESAMOUNT) / COUNT(SOLINEID) AS 'Total Sales per Line ($)'
FROM ORDER
This seems to provide a perfectly good answer, but I was then given an additional constraint, that this count be done by year and month. I thought I could simply add
GROUP BY YEAR(INVOICEDATE), MONTH(INVOICEDATE)
But this aggregates the SOID and then performs the COUNT(DISTINCT SOID). This becomes a problem with SOIDs that appear across multiple months, which is fairly common since we invoice upon shipment.
I want to get something like this:
Year Month Total Sales Per Order Total Sales Per Line
2018 11 0.00
The sore thumb sticking out is that I need some way of defining in which month and year an SOID will be aggregated if it spans across multiple ones; for that purpose, I'd use MAX(INVOICEDATE).
From there, however, I'm just not sure how to tackle this. WITH? A subquery? Something else? I would appreciate any help, even if it's just pointing in the right direction.
You should select YEAR() and MONTH() of INVOICEDATE and group by them:
SELECT YEAR(INVOICEDATE) year
, MONTH(INVOICEDATE) month
, SUM(SALESAMOUNT) / COUNT(DISTINCT SOID) AS 'Total Sales per Order ($)'
, SUM(SALESAMOUNT) / COUNT(SOLINEID) AS 'Total Sales per Line ($)'
FROM ORDER
GROUP BY YEAR(INVOICEDATE), MONTH(INVOICEDATE)
Here are the results, but the data sample does not have enough rows to show multiple months:
SELECT
mDateYYYY,
mDateMM,
SUM(SALESAMOUNT) / COUNT(DISTINCT t1.SOID) AS 'Total Sales per Order ($)',
SUM(SALESAMOUNT) / COUNT(SOLINEID) AS 'Total Sales per Line ($)'
FROM DCORDER as t1
left join
(Select
SOID
,Year(max(INVOICEDATE)) as mDateYYYY
,Month(max(INVOICEDATE)) as mDateMM
From DCOrder
Group By SOID
) as t2
On t1.SOID = t2.SOID
Group by mDateYYYY, mDateMM
mDateYYYY mDateMM Total Sales per Order ($) Total Sales per Line ($)
2018 12 87.50 58.33
With the updated 12/20 data, still assigning each order to the month of its MAX(INVOICEDATE) and additionally excluding AC = 17 (this newer query is not the one shown above), I get:
YYYY MM Total Sales per Order ($) Total Sales per Line ($)
2017 11 35.00 35.00
2018 1 100.00 50.00
2018 4 215.00 107.50
2018 12 150.00 75.00
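These figures can be reproduced with a short Python sketch of the same idea: drop the AC = 17 lines, assign every order to the year and month of its latest invoice date, then aggregate per month. The rows are copied from the question's updated table. Variable names such as `last_invoice` and `report` are made up for this illustration, and treating AC as a string code is an assumption.

```python
from collections import defaultdict
from datetime import date

# Rows copied from the question: (SOID, SOLINEID, INVOICEDATE, SALESAMOUNT, AC).
order_lines = [
    (5, 1, date(2018, 11, 30), 100.00, "01"),
    (5, 2, date(2018, 12, 5), 50.00, "02"),
    (4, 1, date(2018, 12, 12), 25.00, "17"),
    (3, 1, date(2017, 12, 31), 75.00, "03"),
    (3, 2, date(2018, 1, 3), 25.00, "05"),
    (2, 1, date(2017, 11, 25), 100.00, "17"),
    (2, 2, date(2017, 11, 27), 35.00, "03"),
    (1, 1, date(2017, 11, 20), 15.00, "08"),
    (1, 2, date(2018, 3, 15), 30.00, "17"),
    (1, 3, date(2018, 4, 3), 200.00, "05"),
]

# Exclude AC = 17, as in the result table above.
kept = [line for line in order_lines if line[4] != "17"]

# MAX(INVOICEDATE) per SOID decides which month the whole order belongs to.
last_invoice = {}
for soid, _, invoiced, _, _ in kept:
    if soid not in last_invoice or invoiced > last_invoice[soid]:
        last_invoice[soid] = invoiced

# Aggregate every line into its order's (year, month) bucket.
buckets = defaultdict(lambda: {"sales": 0.0, "orders": set(), "lines": 0})
for soid, _, _, amount, _ in kept:
    assigned = last_invoice[soid]
    bucket = buckets[(assigned.year, assigned.month)]
    bucket["sales"] += amount
    bucket["orders"].add(soid)
    bucket["lines"] += 1

# (year, month) -> (sales per order, sales per line)
report = {
    ym: (b["sales"] / len(b["orders"]), b["sales"] / b["lines"])
    for ym, b in sorted(buckets.items())
}
for ym, values in report.items():
    print(ym, values)
```

The key design point is that the month is decided once per order, before aggregating, so an order whose lines are invoiced in different months is still counted exactly once.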
I have a data table that has annual data points and quarterly data points. I want to subtract the quarterly data points from the corresponding prior annual entry, e.g. Annual 2014 - Q3 2014, using t-SQL. I have an id variable for each entry, plus a reconcile id variable that shows which quarterly entry corresponds to which annual entry. See below:
CurrentDate PreviousDate Value Entry Id Reconcile Id Annual/Quarterly
9/30/2012 9/30/2011 112 2 3 Annual
9/30/2013 9/30/2012 123 1 2 Annual
9/30/2014 9/30/2013 123.5 9 1 Annual
12/31/2013 9/30/2014 124 4 1 Quarterly
3/31/2014 12/31/2013 124.5 5 1 Quarterly
6/30/2014 3/31/2014 125 6 1 Quarterly
9/30/2014 6/30/2014 125.5 7 1 Quarterly
12/31/2014 9/30/2014 126 10 9 Quarterly
3/31/2015 12/31/2014 126.5 11 9 Quarterly
6/30/2015 3/31/2015 127 12 9 Quarterly
For example, Reconcile ID 9 for the quarterly entries corresponds to Entry ID 9, which is an annual entry.
I have code to just subtract the prior entry from the current entry, but I cannot figure out how to subtract quarterly entries from annual entries where the Entry ID and Reconcile ID are the same.
Here is the code I am using, which is resulting in the right calculation, but increasing the number of results by many rows. I have also tried this as an inner join. I only want the original 10 rows, plus a new difference column:
SELECT DISTINCT T1.[EntryID]
, [T1].[RECONCILEID]
, [T1].[CurrentDate]
, [T1].[Annual_Quarterly]
, [T1].[Value]
, [T1].[Value]-T2.[Value] AS Difference
FROM Table T1
LEFT JOIN Table T2 ON T2.EntryID = T1.RECONCILEID;
Your code should be fine; here are the results I'm getting:
EntryId Annual_Quarterly CurrentDate ReconcileId Value recVal diff
2 Annual 9/30/2012 3 112
1 Annual 9/30/2013 2 123 112 11
9 Annual 9/30/2014 1 123.5 123 0.5
4 Quarterly 12/31/2013 1 124 123 1
5 Quarterly 3/31/2014 1 124.5 123 1.5
6 Quarterly 6/30/2014 1 125 123 2
7 Quarterly 9/30/2014 1 125.5 123 2.5
10 Quarterly 12/31/2014 9 126 123.5 2.5
11 Quarterly 3/31/2015 9 126.5 123.5 3
12 Quarterly 6/30/2015 9 127 123.5 3.5
with your data and this SQL:
SELECT
tr.EntryId,
tr.Annual_Quarterly,
tr.CurrentDate,
tr.ReconcileId,
tr.Value,
te.Value AS recVal,
tr.[VALUE]-te.[VALUE] AS diff
FROM
t AS tr LEFT JOIN
t AS te ON
tr.ReconcileId = te.EntryId
ORDER BY
tr.Annual_Quarterly,
tr.CurrentDate;
Your question is a bit vague as far as how you're wanting to subtract these values, but this should give you some idea.
Select T1.*, T1.Value - Coalesce(T2.Value, 0) As Difference
From Table T1
Left Join Table T2 On T2.[Entry Id] = T1.[Reconcile Id]
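As a cross-check on the self-join, here is a small Python sketch of the same lookup. It is only an illustration with invented names (`entries`, `value_by_entry`, `with_diff`). It shows that the row count stays at 10 as long as each Entry Id occurs once. If Entry Id were duplicated in the real table, each match would repeat, which is one possible (unconfirmed) explanation for the extra rows.

```python
# Rows copied from the question:
# (CurrentDate, Value, Entry Id, Reconcile Id, Annual/Quarterly).
entries = [
    ("9/30/2012", 112, 2, 3, "Annual"),
    ("9/30/2013", 123, 1, 2, "Annual"),
    ("9/30/2014", 123.5, 9, 1, "Annual"),
    ("12/31/2013", 124, 4, 1, "Quarterly"),
    ("3/31/2014", 124.5, 5, 1, "Quarterly"),
    ("6/30/2014", 125, 6, 1, "Quarterly"),
    ("9/30/2014", 125.5, 7, 1, "Quarterly"),
    ("12/31/2014", 126, 10, 9, "Quarterly"),
    ("3/31/2015", 126.5, 11, 9, "Quarterly"),
    ("6/30/2015", 127, 12, 9, "Quarterly"),
]

# Right-hand side of the join: Entry Id -> Value (assumes Entry Id is unique).
value_by_entry = {entry_id: value for _, value, entry_id, _, _ in entries}

# Left join: every original row appears once; unmatched rows get None.
with_diff = []
for current_date, value, entry_id, reconcile_id, kind in entries:
    matched = value_by_entry.get(reconcile_id)
    diff = None if matched is None else value - matched
    with_diff.append((entry_id, kind, current_date, reconcile_id, value, matched, diff))

for row in with_diff:
    print(row)
```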