add missing month in sales - sql

I have a sales table with the values below.
TransactionDate,CustomerID,Quantity
2020-01-01,1234,5
2020-07-01,1234,9
2020-03-01,3241,8
2020-07-01,3241,4
As you can see, the first purchase was in Jan 2020 for CustomerID = 1234 and in Mar 2020 for CustomerID = 3241.
I want an output in which every missing month is filled in with a purchase quantity of 0.
That is, if a customer has no sales between Jan and July, the output should be as below.
TransactionDate,CustomerID,Quantity
2020-01-01,1234,5
2020-02-01,1234,0
2020-03-01,1234,0
2020-04-01,1234,0
2020-05-01,1234,0
2020-06-01,1234,0
2020-07-01,1234,9
2020-03-01,3241,8
2020-04-01,3241,0
2020-05-01,3241,0
2020-06-01,3241,0
2020-07-01,3241,4

You can use a recursive query to create the missing dates per customer.
with recursive dates (customerid, transactiondate, max_transactiondate) as
(
select customerid, min(transactiondate), max(transactiondate)
from sales
group by customerid
union all
select customerid, dateadd(month, 1, transactiondate), max_transactiondate
from dates
where transactiondate < max_transactiondate
)
select
d.customerid,
d.transactiondate,
coalesce(s.quantity, 0) as quantity
from dates d
left join sales s on s.customerid = d.customerid and s.transactiondate = d.transactiondate
order by d.customerid, d.transactiondate;
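To try the recursive approach without a database server, here is a minimal sketch using Python's standard-library sqlite3 module. It is an illustration under assumptions, not the answerer's exact code: SQLite's `date(x, '+1 month')` stands in for `dateadd(month, 1, x)`, dates are stored as ISO text, and the table is loaded with the four rows from the question.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sales (transactiondate text, customerid integer, quantity integer)"
)
conn.executemany(
    "insert into sales values (?, ?, ?)",
    [
        ("2020-01-01", 1234, 5),
        ("2020-07-01", 1234, 9),
        ("2020-03-01", 3241, 8),
        ("2020-07-01", 3241, 4),
    ],
)

# Anchor member: first and last month per customer.
# Recursive member: step one month forward until the last month is reached.
query = """
with recursive dates (customerid, transactiondate, max_transactiondate) as (
    select customerid, min(transactiondate), max(transactiondate)
    from sales
    group by customerid
    union all
    select customerid, date(transactiondate, '+1 month'), max_transactiondate
    from dates
    where transactiondate < max_transactiondate
)
select d.transactiondate, d.customerid, coalesce(s.quantity, 0) as quantity
from dates d
left join sales s
  on s.customerid = d.customerid and s.transactiondate = d.transactiondate
order by d.customerid, d.transactiondate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each customer gets one row per month between their own first and last purchase, which is what the question's expected output asks for.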

This is a convenient place to use a recursive CTE. Assuming all your dates are on the first of the month:
with cr as (
select customerid, min(transactiondate) as mindate, max(transactiondate) as maxdate
from t
group by customerid
union all
select customerid, dateadd(month, 1, mindate), maxdate
from cr
where mindate < maxdate
)
select cr.customerid, cr.mindate as transactiondate, coalesce(t.quantity, 0) as quantity
from cr left join
t
on cr.customerid = t.customerid and
cr.mindate = t.transactiondate;
Here is a db<>fiddle.
Note that if you have more than 100 months to fill in, then you will need option (maxrecursion 0).
Also, this can easily be adapted if the dates are not all on the first of the month. But you would need to explain what the result set should look like in that case.

[EDIT] Based on what others posted, I updated the code.
;with
min_date_cte(MinTransactionDate, MaxTransactionDate) as (
select min(TransactionDate), max(TransactionDate) from tsales),
unq_yrs_cte(year_int) as (
select distinct year(TransactionDate) from tsales),
unq_cust_cte(CustomerID) as (
select distinct CustomerID from tsales)
select datefromparts(uyc.year_int, v.month_int, 1) TransactionDate,
ucc.CustomerID,
isnull(t.Quantity, 0) Quantity
from min_date_cte mdc
cross join unq_yrs_cte uyc
cross join unq_cust_cte ucc
cross join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) v(month_int)
left join tsales t on datefromparts(uyc.year_int, v.month_int, 1)=t.TransactionDate
and ucc.CustomerID=t.CustomerId
where
datefromparts(uyc.year_int, v.month_int, 1)>=mdc.MinTransactionDate
and datefromparts(uyc.year_int, v.month_int, 1)<=mdc.MaxTransactionDate;
Results
TransactionDate CustomerID Quantity
2020-01-01 1234 5
2020-01-01 3241 0
2020-02-01 1234 0
2020-02-01 3241 0
2020-03-01 1234 0
2020-03-01 3241 8
2020-04-01 1234 0
2020-04-01 3241 0
2020-05-01 1234 0
2020-05-01 3241 0
2020-06-01 1234 0
2020-06-01 3241 0
2020-07-01 1234 9
2020-07-01 3241 4

You can make use of recursive query:
WITH cte1 as
(
select customerid, min([TransactionDate]) as Monthly_date, max([TransactionDate]) as end_date from calender_table
group by customerid
union all
select customerid, dateadd(month, 1, Monthly_date), end_date from cte1
where Monthly_date < end_date
)
select a.Monthly_date, a.customerid,coalesce(b.quantity, 0) from cte1 a left outer join calender_table b
on (a.Monthly_date = b.[TransactionDate] and a.customerid = b.customerid)
order by a.customerid, a.Monthly_date;

Related

In SQL, is there a way to show all dates even if the date doesn't have data points?

I have a transaction table t as follows in MS SQL Management Studio:
If I run the following SQL to summarise the transaction:
Select
Format(Transaction_Date, 'MMM-yyyy') as 'Year/Month'
,Customer
,Count(Customer) as SalesCount
From t
Group by Format(Transaction_Date, 'MMM-yyyy'), Customer
Order by Customer, Format(Transaction_Date, 'MMM-yyyy')
I'll get:
However I was asked to add all the months for the year 2019 and if there's no transaction in a certain month then return 0 for the SalesCount column:
I tried to create a month table with all the months in 2019 and left join it with the transaction table, but it still returns the same result, without the months that have no transactions.
Time table I created:
declare @StartDate date = '2019-01-01';
declare @EndDate date = '2020-01-01';
With cte as (
Select @StartDate AS myDate
Union All
Select Dateadd(Month,1,myDate)
From cte
Where Dateadd(Month,1,myDate) < @EndDate
)
,TimeTable as(
SELECT
year(myDate)
,Datename(Month,myDate)
,Format(myDate,'MMMM-yy') as 'Month-Year'
FROM cte
)
Select
tb.'Month-Year'
t.Format(Transaction_Date, 'MMM-yyyy') as Year/Month
,t.Customer
,t.Count(Customer) as SalesCount
From TimeTable tb
Left Join Transaction t on t.'Month/Year' = tb.'Month-Year'
Group by tb.'Month-Year', Format(Transaction_Date, 'MMM-yyyy'), Customer
Order by Customer, Format(Transaction_Date, 'MMM-yyyy')
Your help will be much appreciated!
You need to generate all the rows for the months. One method uses a recursive CTE. The rest is then a left join and aggregation.
Let me assume you are using SQL Server:
with months as (
select convert(date, '2019-01-01') as mon
union all
select dateadd(month, 1, mon)
from months
where mon < '2019-12-01'
)
select Format(m.mon, 'MMM-yyyy') as year_month,
c.Customer,
count(t.customer) as SalesCount
from months m cross join
(select distinct customer from t) c left join
t
on t.transaction_date >= m.mon and
t.transaction_date < dateadd(month, 1, mon) and
t.customer = c.customer
group by m.mon, c.customer
order by c.customer, m.mon;
Note the other changes to the query:
Year/Month is not a valid column alias.
This orders the rows chronologically. That is usually (always?) preferred over alphabetic sorting of months.
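The month-by-customer pattern above can be checked with a small sketch in Python's sqlite3 module. Several things here are assumptions for illustration: the sample rows are borrowed from the other answer in this thread, dates are stored as ISO text, `date(mon, '+1 month')` replaces `dateadd`, and `strftime('%Y-%m', ...)` replaces `Format(..., 'MMM-yyyy')`, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (transaction_date text, customer text)")
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("2019-01-03", "A"), ("2019-01-17", "A"), ("2019-06-03", "A"),
        ("2019-07-03", "A"), ("2019-06-03", "B"), ("2019-07-03", "B"),
    ],
)

# months: the first day of each month in 2019.
# The cross join builds every (month, customer) pair; the left join then
# attaches any transactions that fall inside that month.
query = """
with recursive months (mon) as (
    select '2019-01-01'
    union all
    select date(mon, '+1 month') from months where mon < '2019-12-01'
)
select strftime('%Y-%m', m.mon) as year_month,
       c.customer,
       count(t.customer) as sales_count
from months m
cross join (select distinct customer from t) c
left join t
  on t.transaction_date >= m.mon
 and t.transaction_date < date(m.mon, '+1 month')
 and t.customer = c.customer
group by m.mon, c.customer
order by c.customer, m.mon
"""
counts = {(ym, cust): n for ym, cust, n in conn.execute(query)}
print(counts[("2019-01", "A")], counts[("2019-02", "A")])
```

Counting `t.customer` rather than `*` is what makes empty months come out as 0, because the left join leaves that column NULL when nothing matches.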
You can achieve this with a table of month-year combinations cross joined to the distinct customer list.
declare @table table(trandate date,customer char(1))
insert into @table values
('2019-01-03','A'),
('2019-01-17','A'),
('2019-06-03','A'),
('2019-07-03','A'),
('2019-06-03','B'),
('2019-07-03','B');
;with monthYear AS
(
select * from
(
values
('Jan-19')
,('Feb-19')
,('Mar-19')
,('Apr-19')
,('May-19')
,('Jun-19')
,('Jul-19')
,('Aug-19')
,('Sep-19')
,('Oct-19')
,('Nov-19')
,('Dec-19')
) as t(mon)
)
SELECT my.mon,c.customer, isnull(t.salescount,0) as salescount FROM monthYear as my
CROSS JOIN (SELECT distinct customer from @table) as c
OUTER APPLY
(
select format(trandate,'MMM-yy') as MonthYear, count(customer) as salescount
from @table
where customer = c.customer
group by format(trandate,'MMM-yy')
having format(trandate,'MMM-yy') = my.mon
) as t
mon customer salescount
Jan-19 A 2
Feb-19 A 0
Mar-19 A 0
Apr-19 A 0
May-19 A 0
Jun-19 A 1
Jul-19 A 1
Aug-19 A 0
Sep-19 A 0
Oct-19 A 0
Nov-19 A 0
Dec-19 A 0
Jan-19 B 0
Feb-19 B 0
Mar-19 B 0
Apr-19 B 0
May-19 B 0
Jun-19 B 1
Jul-19 B 1
Aug-19 B 0
Sep-19 B 0
Oct-19 B 0
Nov-19 B 0
Dec-19 B 0

SQL query group by with null values is returning duplicates

I have the following query.
My #dates table has the following records:
month year saledate
9 2020 2020-09-01
10 2020 2020-10-01
11 2020 2020-11-01
with monthlysalesdata as(
select month(salesdate) as salemonth, year(salesdate) as saleyear,salesrepid, salespercentage
from salesrecords r
join #dates d on d.saledate = r.salesdate
group by salesrepid, salesdate),
averagefor3months as(
select 0 as salemonth, 0 as saleyear, salesrepid, salespercentage
from monthlysalesdata
group by salesrepid)
finallist as(
select * from monthlysalesdata
union
select * from averagefor3months
This query returns the following records, with a duplicate row in the averagefor3months result set whenever there is a null record in the first CTE (monthlysalesdata). How can I get the 3-month average as one record instead of duplicates?
salesrepid salemonth saleyear percentage
232 0 0 null -------------this is the duplicate record
232 0 0 90
232 9 2020 80
232 10 2020 null
232 11 2020 100
My first cte has this result:
salerepid month year percentage
---------------------------------------------
232 9 2020 80
232 10 2020 null
232 11 2020 100
My second cte has this result:
salerepid month year percentage
---------------------------------------------
232 0 0 null
232 0 0 90
How can I avoid the duplicate record in my second CTE?
I suspect that you want a summary row per sales rep based on some aggregation. Your question is not clear on what is needed for the aggregation, but something like this:
with ym as (
select r.salesrepid, d.year, d.month, sum(<something>) as whatever
from salesrecords r join
#dates d
on d.saledate = r.salesdate
group by r.salesrepid, d.year, d.month
)
select ym.*
from ym
union all
select salesrepid, null, null, avg(whatever)
from ym
group by salesrepid;
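A small sketch of the summary-row idea, using Python's sqlite3 module and the three monthly rows shown in the question. The table name `monthly` and column name `pct` are hypothetical stand-ins for the first CTE's output. The point it illustrates is that grouping only by `salesrepid` gives exactly one summary row per rep, and that `avg` skips NULL values rather than producing a separate NULL row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "monthly" and "pct" are illustrative names for the per-month result set.
conn.execute(
    "create table monthly (salesrepid integer, saleyear integer, "
    "salemonth integer, pct real)"
)
conn.executemany(
    "insert into monthly values (?, ?, ?, ?)",
    [(232, 2020, 9, 80), (232, 2020, 10, None), (232, 2020, 11, 100)],
)

# Detail rows first, then one aggregated row per sales rep.
query = """
select salesrepid, saleyear, salemonth, pct from monthly
union all
select salesrepid, null, null, avg(pct) from monthly group by salesrepid
"""
rows = conn.execute(query).fetchall()
summary = [r for r in rows if r[1] is None]
print(summary)
```

The duplicate in the question most likely comes from the percentage column taking part in the grouping, so NULL and 90 end up as separate groups; aggregating it instead avoids that.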
I changed the second CTE to select and group by from the table directly instead of from the previous CTE, and got my results. Thank you all for helping.
with ym as (
select r.salesrepid, d.year, d.month, sum(<something>) as whatever
from salesrecords r join
#dates d
on d.saledate = r.salesdate
group by r.salesrepid, d.year, d.month
),
threemonthsaverage as(
select r.salesrepid, r.year, r.month, sum(something) as whatever
from salesrecords as r
group by salesrepid)
select * from ym
union
select * from threemonthsaverage

SQL left join same column and table

I have customer order data and would like to analyze customer retention after price changes.
The order table is as follows:
customer_id order_number order_delivered_date
14156 R980193622 2/6/2020 14:51
1926396 R130222714 22/5/2020 11:02
1085123 R313065343 22/5/2020 14:50
699858 R693959049 8/6/2020 17:03
1609769 R195969327 3/6/2020 16:14
14156 R997103187 27/6/2020 14:01
1926396 R403942827 11/6/2020 14:42
1926396 R895013611 8/7/2020 17:04
So, I would like to pull the orders from the period before the new price. Assume the new price was implemented on 10/6/2020. I would like to left join these to the orders placed after the new price, on customer_id.
Before is the set of orders dated 10/5/2020 00:00:00 to 9/6/2020 23:59:59, while After is the set of orders dated 10/6/2020 00:00:00 to 9/7/2020 23:59:59.
The desired table:
Before After
14156 14156
1926396 1926396
1085123 Null
699858 Null
1609769 Null
If a customer_id appears side by side, it means the customer was retained. It should be simple, but I am stuck.
EDIT:
Here is some of the code I have tried.
First try:
select ol2.customer_id as before, ol.customer_id as after
from master.order_level ol,
left join master.order_level ol2
on ol2.customer_id = ol.customer_id
where order_delivered_date between '2020-05-10 00:00:00' and '2020-07-09 23:59:59' and country_id = 2
Second try:
SELECT ol.customer_id as before, ol2.customer_id as after
FROM master.order_level ol,master.order_level ol2
left join master.order_level
ON ol.customer_id = ol2.customer_id
WHERE ol.order_delivered_date between '2020-05-10 00:00:00' and '2020-06-09 23:59:59' and ol.country_id =2 and ol2.order_delivered_date between '2020-06-10 00:00:00' and '2020-07-09 23:59:59' and ol2.country_id =2
There is no need for a join; a simple group by with case expressions and aggregate functions is enough. I also made a fiddle showing it in action.
SELECT customer_id,
CASE
WHEN MIN(order_delivered_date) < '3-15-2019' THEN customer_id
ELSE NULL END customer_before,
CASE
WHEN MAX(order_delivered_date) >= '3-15-2019' THEN customer_id
ELSE NULL END customer_after
FROM my_table
GROUP BY customer_id
This query will give you results like this:
customer_id customer_before customer_after
4 4 (null)
1 1 1
3 3 (null)
2 2 2
-- Half-open date ranges matching the Before and After periods in the question
with before (customer_id) as
( select distinct customer_id from orders
where order_delivered_date >= '2020-05-10' and order_delivered_date < '2020-06-10'
),
after (customer_id) as
( select distinct customer_id from orders
where order_delivered_date >= '2020-06-10' and order_delivered_date < '2020-07-10'
)
select
before.customer_id,
after.customer_id
from before left outer join after on before.customer_id = after.customer_id
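The before/after left join can be checked against the question's own sample rows. This is a sketch in Python's sqlite3 module under a few assumptions: the table is called `orders`, timestamps are rewritten as ISO text so that string comparison orders them correctly, and the CTEs are named `before_period` and `after_period` (hypothetical names, chosen to avoid any keyword clash).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (customer_id integer, order_number text, "
    "order_delivered_date text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        (14156, "R980193622", "2020-06-02 14:51"),
        (1926396, "R130222714", "2020-05-22 11:02"),
        (1085123, "R313065343", "2020-05-22 14:50"),
        (699858, "R693959049", "2020-06-08 17:03"),
        (1609769, "R195969327", "2020-06-03 16:14"),
        (14156, "R997103187", "2020-06-27 14:01"),
        (1926396, "R403942827", "2020-06-11 14:42"),
        (1926396, "R895013611", "2020-07-08 17:04"),
    ],
)

# One distinct customer list per period, then a left join from before to after.
query = """
with before_period as (
    select distinct customer_id from orders
    where order_delivered_date >= '2020-05-10'
      and order_delivered_date < '2020-06-10'
),
after_period as (
    select distinct customer_id from orders
    where order_delivered_date >= '2020-06-10'
      and order_delivered_date < '2020-07-10'
)
select b.customer_id, a.customer_id
from before_period b
left join after_period a on a.customer_id = b.customer_id
"""
retained = dict(conn.execute(query).fetchall())
print(retained)
```

A customer that maps to `None` ordered in the Before window but not in the After window, which corresponds to the Null entries in the desired table.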
You can use union:
select customer_id as before, null as after
from #order
where order_delivered_date <'2020-06-10'
union
select null as before, customer_id as after
from #order
where order_delivered_date >='2020-06-10'
results

Aggregate a subtotal column based on two dates of that same row

Situation:
I have 5 columns
id
subtotal (price of item)
order_date (purchase date)
updated_at (if refunded or any other status change)
status
Objective:
I need the order date as column 1
I need to get the subtotal for each day regardless of the status as column 2
I need the subtotal amount for refunds for the third column.
Example:
If a purchase is made on May 1st and refunded on May 3rd, the output should look like this:
+-------+----------+--------+
| date | subtotal | refund |
+-------+----------+--------+
| 05-01 | 10.00 | 0.00 |
| 05-02 | 00.00 | 0.00 |
| 05-03 | 00.00 | 10.00 |
+-------+----------+--------+
while the source row looks like this:
+-----+----------+------------+------------+----------+
| id | subtotal | order_date | updated_at | status |
+-----+----------+------------+------------+----------+
| 123 | 10 | 2019-05-01 | 2019-05-03 | refunded |
+-----+----------+------------+------------+----------+
Query:
Currently what I have looks like this:
Note: because of a timezone discrepancy, the dates are shifted back by 8 hours.
;with cte as (
select id as orderid
, CAST(dateadd(hour,-8,order_date) as date) as order_date
, CAST(dateadd(hour,-8,updated_at) as date) as updated_at
, subtotal
, status
from orders
)
select
b.dates
, sum(a.subtotal_price) as subtotal
, -- not sure how to aggregate it to get the refunds
from Orders as o
inner join cte as a on orders.id=cte.orderid
inner join (select * from cte where status = ('refund')) as b on o.id=cte.orderid
where dates between '2019-05-01' and '2019-05-31'
group by dates
And do I need to join it twice? Hopefully not since my table is huge.
This looks like a job for a Calendar Table. Bit of a stab in the dark, but:
--Overly simplistic Calendar table
CREATE TABLE dbo.Calendar (CalendarDate date);
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1, N N2, N N3, N N4, N N5) --Many years of data
INSERT INTO dbo.Calendar
SELECT DATEADD(DAY, T.I, 0)
FROM Tally T;
GO
SELECT C.CalendarDate AS [date],
CASE C.CalendarDate WHEN V.order_date THEN subtotal ELSE 0 END AS subtotal,
CASE WHEN C.CalendarDate = V.updated_at AND V.[status] = 'refunded' THEN subtotal ELSE 0.00 END AS refund
FROM (VALUES(123,10.00,CONVERT(date,'20190501'),CONVERT(date,'20190503'),'refunded'))V(id,subtotal,order_date,updated_at,status)
JOIN dbo.Calendar C ON V.order_date <= C.CalendarDate AND V.updated_at >= C.CalendarDate;
GO
DROP TABLE dbo.Calendar;
Consider joining on a recursive CTE of sequential dates:
WITH dates AS (
SELECT CONVERT(datetime, '2019-01-01') AS rec_date
UNION ALL
SELECT DATEADD(d, 1, CONVERT(datetime, rec_date))
FROM dates
WHERE rec_date < '2019-12-31'
),
cte AS (
SELECT id AS orderid
, CAST(dateadd(hour,-8,order_date) AS date) as order_date
, CAST(dateadd(hour,-8,updated_at) AS date) as updated_at
, subtotal
, status
FROM orders
)
SELECT rec_date AS date,
CASE
WHEN c.order_date = d.rec_date THEN subtotal
ELSE 0
END AS subtotal,
CASE
WHEN c.updated_at = d.rec_date THEN subtotal
ELSE 0
END AS refund
FROM cte c
JOIN dates d ON d.rec_date BETWEEN c.order_date AND c.updated_at
WHERE c.status = 'refund'
option (maxrecursion 0)
GO
Rextester demo
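On the question of whether the table has to be joined twice: one possible variation, sketched below with Python's sqlite3 module, joins the orders to a generated date list once and uses conditional sums for the two columns. This is an assumption-laden illustration rather than either answerer's query: the date range is hard-coded to the three days of the example, the 8-hour shift is left out, the status value is taken to be 'refunded' as in the sample row, and dates are ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (id integer, subtotal real, order_date text, "
    "updated_at text, status text)"
)
conn.execute(
    "insert into orders values (123, 10.0, '2019-05-01', '2019-05-03', 'refunded')"
)

# dates: one row per day in the reporting window.
# A single left join matches an order to its purchase day or its refund day;
# the case expressions decide which column the amount lands in.
query = """
with recursive dates (d) as (
    select '2019-05-01'
    union all
    select date(d, '+1 day') from dates where d < '2019-05-03'
)
select dates.d,
       coalesce(sum(case when o.order_date = dates.d
                         then o.subtotal end), 0) as subtotal,
       coalesce(sum(case when o.updated_at = dates.d and o.status = 'refunded'
                         then o.subtotal end), 0) as refund
from dates
left join orders o
  on dates.d = o.order_date
  or (dates.d = o.updated_at and o.status = 'refunded')
group by dates.d
order by dates.d
"""
daily = conn.execute(query).fetchall()
for row in daily:
    print(row)
```

Days with no activity still appear because the date list drives the join, and `coalesce` turns the empty sums into 0.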

sql completion of a result where it miss some results in the sequence

I have a table with sales information
like this: |product | sales | date|
Most of the time the dates are consecutive from 201601 to 201652,
but sometimes there is a gap, e.g. no row for 201602 for productA.
How can I write an SQL query that returns a result for this gap, like this:
productA,4,201601
**productA,0,201602**
productA,5,201603
productA,8,201604
(...)
instead of :
productA,4,201601
productA,5,201603
productA,8,201604
(...)
Of course there will also be products B, C, ...
You do this by using cross join to get all the rows and then left join to pull in the values.
Assuming you have some data for each week:
select p.product, d.date, coalesce(s.sales, 0) as sales
from (select distinct product from sales) p cross join
(select distinct date from sales) d left join
sales s
on s.product = p.product and s.date = d.date;
If you have tables of products and dates, you can use those instead of the subqueries.
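A runnable sketch of the cross join plus left join pattern, using Python's sqlite3 module. The productA rows come from the question; the productB row is hypothetical and is there only so that week 201602 exists somewhere in the data, which is the assumption the query relies on. The date column is named `yearweek` here purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (product text, sales integer, yearweek integer)")
conn.executemany(
    "insert into sales values (?, ?, ?)",
    [
        ("productA", 4, 201601),
        ("productA", 5, 201603),
        ("productA", 8, 201604),
        ("productB", 7, 201602),  # hypothetical row supplying week 201602
    ],
)

# Every product paired with every week seen in the table, then the real
# sales figures attached where they exist.
query = """
select p.product, d.yearweek, coalesce(s.sales, 0) as sales
from (select distinct product from sales) p
cross join (select distinct yearweek from sales) d
left join sales s
  on s.product = p.product and s.yearweek = d.yearweek
order by p.product, d.yearweek
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

If no product has a row for a given week, that week will be missing for all products, which is why a separate dates table is the more robust source.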
Starting with Oracle 10g, you can use a partitioned outer join to produce the desired result:
-- sample of data
with sales(product, sales, dt) as(
select 'product A', 4, 201601 from dual union all
select 'product A', 5, 201603 from dual union all
select 'product A', 8, 201604 from dual
),
-- here we generate months for the year 2016
mnth(dt) as(
select 201600 + level
from dual
connect by level <= 12
)
-- actual query
select s.product
, nvl(s.sales, 0) as sales
, m.dt as date1
from sales s
partition by(s.product)
right join mnth m
on (m.dt = s.dt)
order by s.product, m.dt
Result:
PRODUCT SALES DATE1
--------- ---------- ----------
product A 4 201601
product A 0 201602
product A 5 201603
product A 8 201604
product A 0 201605
product A 0 201606
product A 0 201607
product A 0 201608
product A 0 201609
product A 0 201610
product A 0 201611
product A 0 201612
12 rows selected
Based on Gordon's response, I edited the query so that the dates do not depend on the sales table. The assumption here is that tab has at least 52 rows; if not, use an appropriate Oracle data-dictionary table instead.
-- lpad keeps single-digit weeks two characters wide (201601, not 20161)
select p.product, d.yearweek, coalesce(s.sales, 0) as sales
from (select distinct product from sales) p cross join
(select 2016 || lpad(rownum, 2, '0') as yearweek from tab where rownum<=52) d left join
sales s
on s.product = p.product and s.date = d.yearweek;