SQL: fill in the missing rows of a sequence in a result

I have a table with sales information
like this: |product | sales | date|
Most of the time the dates are consecutive from 201601 to 201652,
but sometimes there is a gap, e.g. no row for 201602 for productA.
How can I write an SQL query that returns a row for the gap, like this:
productA,4,201601
**productA,0,201602**
productA,5,201603
productA,8,201604
(...)
instead of :
productA,4,201601
productA,5,201603
productA,8,201604
(...)
Of course there will also be products B, C, ...

You do this by using a cross join to generate all the rows and then a left join to pull in the values.
This assumes every date (week) appears at least once somewhere in the table:
select p.product, d.date, coalesce(s.sales, 0) as sales
from (select distinct product from sales) p cross join
(select distinct date from sales) d left join
sales s
on s.product = p.product and s.date = d.date;
If you have tables of products and dates, you can use those instead of the subqueries.
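A minimal, runnable sketch of the cross join / left join idea, using Python's built-in sqlite3 module as a stand-in database (SQLite syntax, not Oracle). The table and the productB row are hypothetical sample data. Note that productA's 201602 gap is filled only because another product has a 201602 row, which is exactly the "every date appears somewhere" assumption.

```python
import sqlite3

# In-memory database with a hypothetical `sales` table shaped like the question's.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, sales INTEGER, date INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("productA", 4, 201601), ("productA", 5, 201603), ("productA", 8, 201604),
     ("productB", 7, 201602)],  # hypothetical row that supplies the 201602 date
)

# Every product x every date seen anywhere, then pull in the real values (0 if absent).
rows = conn.execute("""
    SELECT p.product, d.date, COALESCE(s.sales, 0) AS sales
    FROM (SELECT DISTINCT product FROM sales) p
    CROSS JOIN (SELECT DISTINCT date FROM sales) d
    LEFT JOIN sales s
      ON s.product = p.product AND s.date = d.date
    ORDER BY p.product, d.date
""").fetchall()

for row in rows:
    print(row)
```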

Starting with Oracle 10g you can use a partitioned outer join to produce the desired result:
-- sample of data
with sales(product, sales, dt) as(
select 'product A', 4, 201601 from dual union all
select 'product A', 5, 201603 from dual union all
select 'product A', 8, 201604 from dual
),
-- here we generate months for the year 2016
mnth(dt) as(
select 201600 + level
from dual
connect by level <= 12
)
-- actual query
select s.product
, nvl(s.sales, 0) as sales
, m.dt as date1
from sales s
partition by(s.product)
right join mnth m
on (m.dt = s.dt)
order by s.product, m.dt
Result:
PRODUCT SALES DATE1
--------- ---------- ----------
product A 4 201601
product A 0 201602
product A 5 201603
product A 8 201604
product A 0 201605
product A 0 201606
product A 0 201607
product A 0 201608
product A 0 201609
product A 0 201610
product A 0 201611
product A 0 201612
12 rows selected
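SQLite has no partitioned outer join, so the sketch below (Python's sqlite3, hypothetical in-memory table) reaches the same 12 rows another way: a recursive CTE stands in for CONNECT BY LEVEL to generate 201601..201612, and a cross join with the distinct products replaces PARTITION BY. This is a swapped-in technique, not the Oracle feature itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, sales INTEGER, dt INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("product A", 4, 201601), ("product A", 5, 201603), ("product A", 8, 201604)],
)

rows = conn.execute("""
    WITH RECURSIVE mnth(dt) AS (          -- periods 201601..201612
        SELECT 201601
        UNION ALL
        SELECT dt + 1 FROM mnth WHERE dt < 201612
    )
    SELECT p.product, COALESCE(s.sales, 0) AS sales, m.dt AS date1
    FROM (SELECT DISTINCT product FROM sales) p   -- plays the role of PARTITION BY
    CROSS JOIN mnth m
    LEFT JOIN sales s ON s.product = p.product AND s.dt = m.dt
    ORDER BY p.product, m.dt
""").fetchall()

for row in rows:
    print(row)
```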

Based on Gordon's response, I edited the query so the dates do not depend on the sales table. The assumption here is that tab has at least 52 rows; if not, use an appropriate data-dictionary view from Oracle.
select p.product, d.dt, coalesce(s.sales, 0) as sales
from (select distinct product from sales) p cross join
-- 201600 + rownum gives 201601..201652; 2016 || rownum would give '20161', '20162', ...
(select 201600 + rownum as dt from tab where rownum <= 52) d left join
sales s
on s.product = p.product and s.date = d.dt;
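Period keys such as 201601 are safer to build by arithmetic than by string concatenation: concatenating the year with a row number drops the leading zero. The sketch below shows this in SQLite through Python's sqlite3 (Oracle's implicit number-to-string conversion gives the same '20161' for 2016 || 1), and builds a 52-week list with a recursive CTE instead of relying on a dictionary view having enough rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# String concatenation loses the zero padding: 2016 || 1 -> '20161'.
concatenated = conn.execute("SELECT 2016 || 1").fetchone()[0]
# Arithmetic keeps the YYYYWW shape: 201600 + 1 -> 201601.
arithmetic = conn.execute("SELECT 201600 + 1").fetchone()[0]

# 52 week keys generated without any helper table.
weeks = [r[0] for r in conn.execute("""
    WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 52)
    SELECT 201600 + i FROM n
""")]

print(concatenated, arithmetic, weeks[0], weeks[-1])
```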


What to use in place of UNION in the query I wrote above, or a more optimized query without UNION and UNION ALL

I am counting the birthdays, sales and orders in all 12 months from the customers table in SQL Server like this.
In the customers table, birth_date, sale_date and order_date are columns.
select 1 as ranking,'Birthdays' as Type,[MONTH],TOTAL
from ( select DATENAME(month, birth_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, birth_date)
)x
union
select 2 as ranking,'sales' as Type,[MONTH],TOTAL
from ( select DATENAME(month, sale_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, sale_date)
)x
union
select 3 as ranking,'Orders' as Type,[MONTH],TOTAL
from ( select DATENAME(month, order_date) AS [MONTH],count(*) TOTAL
from customers
group by DATENAME(month, order_date)
)x
And the output looks like this (just dummy data):
ranking  Type       MONTH     TOTAL
-------  ---------  --------  -----
1        Birthdays  January   12
1        Birthdays  April     6
1        Birthdays  May       10
2        Sales      February  8
2        Sales      April     14
2        Sales      May       10
3        Orders     June      4
3        Orders     July      3
3        Orders     October   6
3        Orders     December  17
I want to get the counts for all three types without using UNION or UNION ALL, i.e. with a single query statement (or a more optimized version of this query).
Another approach is to create a CTE with all the available ranking values and CROSS APPLY it, as shown below.
WITH ranks(ranking) AS (
SELECT * FROM (VALUES (1), (2), (3)) v(r)
)
SELECT
r.ranking,
CASE WHEN r.ranking = 1 THEN 'Birthdays'
WHEN r.ranking = 2 THEN 'Sales'
WHEN r.ranking = 3 THEN 'Orders'
END AS Type,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END) AS MONTH,
COUNT(*) AS TOTAL
FROM customers c
CROSS APPLY ranks r
GROUP BY r.ranking,
DATENAME(month, CASE WHEN r.ranking = 1 THEN c.birth_date
WHEN r.ranking = 2 THEN c.sale_date
WHEN r.ranking = 3 THEN c.order_date
END)
ORDER BY r.ranking, MONTH
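The single-pass idea can be checked with a small sketch in SQLite via Python's sqlite3. SQLite has no DATENAME or CROSS APPLY, so strftime('%m', ...) (month number) and a plain CROSS JOIN to a VALUES CTE are substituted; the customers rows are hypothetical. The final check compares the result with the UNION ALL form from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (birth_date TEXT, sale_date TEXT, order_date TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [  # hypothetical rows
    ("1990-01-15", "2023-04-02", "2023-06-10"),
    ("1985-01-20", "2023-04-18", "2023-07-01"),
    ("1979-05-03", "2023-05-09", "2023-06-22"),
])

# One pass over customers: each row is repeated once per ranking (1, 2, 3)
# and the CASE picks the date column that belongs to that ranking.
single_pass = conn.execute("""
    WITH ranks(ranking) AS (VALUES (1), (2), (3))
    SELECT r.ranking,
           CASE r.ranking WHEN 1 THEN 'Birthdays' WHEN 2 THEN 'Sales' ELSE 'Orders' END AS type,
           strftime('%m', CASE r.ranking WHEN 1 THEN c.birth_date
                                         WHEN 2 THEN c.sale_date
                                         ELSE c.order_date END) AS month,
           COUNT(*) AS total
    FROM customers c
    CROSS JOIN ranks r
    GROUP BY r.ranking, month
    ORDER BY r.ranking, month
""").fetchall()

# The UNION ALL formulation, for comparison.
union_all = conn.execute("""
    SELECT 1, 'Birthdays', strftime('%m', birth_date) AS m, COUNT(*) FROM customers GROUP BY m
    UNION ALL
    SELECT 2, 'Sales', strftime('%m', sale_date) AS m, COUNT(*) FROM customers GROUP BY m
    UNION ALL
    SELECT 3, 'Orders', strftime('%m', order_date) AS m, COUNT(*) FROM customers GROUP BY m
    ORDER BY 1, 3
""").fetchall()

print(single_pass == union_all)
```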

BigQuery missing rows with SUM OVER PARTITION BY

TL;DR:
Given this table:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
How do I get a table where the missing date/product combination (2020-11-02, premium) is included with a fallback diff value of 0?
Ideally this should work for any number of products. A list of all products can be obtained like this:
SELECT ARRAY_AGG(DISTINCT product) FROM subscriptions
I want to be able to get the subscription count per day, either for all products or just for some of them.
The way I think this can be achieved easily is by preparing a table that looks like this:
|---------------------|------------------|------------------|
| date | product | total |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 100 |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | 50 |
|---------------------|------------------|------------------|
With this table, I can easily group by date and product or just by date and sum the total.
Before I get to the result table, I generate a table that holds, for each day and product, the change in subscriptions: how many new subscribers each product gained and how many are no longer subscribed.
This table looks like this:
|---------------------|------------------|------------------|
| date | product | diff |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 50 |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | -20 |
|---------------------|------------------|------------------|
This means that on November 1st the total count of premium subscribers increased by 50 and the total count of basic subscribers decreased by 20.
The problem is that this intermediate table is missing data points on days when a product had no changes; see the example below.
When I started, there was no product column and I only had the date and diff columns.
To get from the second table to the first, I used this query, which worked perfectly:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, 150 as diff
UNION ALL SELECT TIMESTAMP("2020-11-02"), -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), 60
)
SELECT
*,
SUM(diff) OVER (ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date
But when I add the product column and try to calculate the sum per day and product there are some data points missing.
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
SELECT
*,
SUM(diff) OVER (PARTITION BY product ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date
--
|---------------------|------------------|------------------|
| date | product | total |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | 100 |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 50 |
|---------------------|------------------|------------------|
| 2020-11-02 | basic | 90 |
|---------------------|------------------|------------------|
| 2020-11-03 | basic | 130 |
|---------------------|------------------|------------------|
| 2020-11-03 | premium | 70 |
|---------------------|------------------|------------------|
If I now show the total number of subscriptions per day, I would get:
150 -> 90 -> 200
But I would expect:
150 -> 140 -> 200
Same goes for the total number of premium subscriptions per day:
50 -> 0 -> 70
But I would expect:
50 -> 50 -> 70
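The 150 -> 90 -> 200 figure can be reproduced with a short sketch (Python's sqlite3, SQLite 3.25 or later for window functions; the rows are the question's sample data with date strings instead of TIMESTAMPs). Because 2020-11-02 has no premium row, premium's running total of 50 simply is not there to be added on that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (date TEXT, product TEXT, diff INTEGER)")
conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", [
    ("2020-11-01", "premium", 50), ("2020-11-01", "basic", 100),
    ("2020-11-02", "basic", -10),
    ("2020-11-03", "premium", 20), ("2020-11-03", "basic", 40),
])

# Running total per product, computed only over the rows that exist.
per_row = conn.execute("""
    SELECT date, product,
           SUM(diff) OVER (PARTITION BY product ORDER BY date) AS total_subscriptions
    FROM subscriptions
""").fetchall()

# Adding up those running totals per day: 2020-11-02 only sees 'basic'.
daily = {}
for date, _product, total in per_row:
    daily[date] = daily.get(date, 0) + total

print([daily[d] for d in sorted(daily)])  # [150, 90, 200]
```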
I believe the best option to fix this would be to add the missing date/product combinations.
How would I do this?
Try this: I create a list of the products, add a "Total" entry to that list, and join it with your table to get the data you need.
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
product_name as (
Select product from subscriptions group by 1
union all
Select "Total" as product
)
Select date
,product
,total_subscriptions
from (
Select a.date
,a.product
,diff
,SUM(diff) OVER (PARTITION BY a.product ORDER BY a.date) as total_subscriptions
from
(
Select date,a.product
from product_name A
join subscriptions B
on 1=1
where a.product !='Total'
group by 1,2
) A
left join subscriptions B
on A.product = B.product
and A.date = B.date
group by 1,2,3
) group by 1,2,3
union all
Select date
,product
,total_subscriptions
from
(
Select date,a.product
,diff
,SUM(diff) OVER (PARTITION BY a.product ORDER BY date) as total_subscriptions
from product_name A
join subscriptions B
on 1=1
where a.product ='Total'
group by 1,2,3
) group by 1,2,3
order by 1,2
If I follow you correctly, one approach is to generate a fixed list of dates for the period you want and cross join it with the list of products. This gives you all possible combinations. Then you can bring in the subscriptions table with a left join and finally perform the window sum:
select dt, p.product, sum(s.diff) over(partition by p.product order by dt) as total
from unnest(generate_timestamp_array(
timestamp('2020-11-01'),
timestamp('2020-11-03'),
interval 1 day)
) dt
cross join (
select 'basic' product
union all select 'premium'
) p
left join subscriptions s on s.product = p.product and s.date = dt
We can make the query more generic by generating the date range and the list of products dynamically:
select dt, p.product, sum(s.diff) over(partition by p.product order by dt) as total
from (select min(date) min_dt, max(date) max_dt from subscriptions) d0
cross join unnest(generate_timestamp_array(d0.min_dt, d0.max_dt, interval 1 day)) dt
cross join (select distinct product from subscriptions) p
left join subscriptions s on s.product = p.product and s.date = dt
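The same date-spine approach can be sketched in SQLite through Python's sqlite3. SQLite has no GENERATE_TIMESTAMP_ARRAY, so a recursive CTE walking from MIN(date) to MAX(date) one day at a time is substituted, and COALESCE(s.diff, 0) makes the missing combination contribute 0. With the question's sample data this gives the expected daily totals 150, 140 and 200.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (date TEXT, product TEXT, diff INTEGER)")
conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", [
    ("2020-11-01", "premium", 50), ("2020-11-01", "basic", 100),
    ("2020-11-02", "basic", -10),
    ("2020-11-03", "premium", 20), ("2020-11-03", "basic", 40),
])

rows = conn.execute("""
    WITH RECURSIVE days(dt) AS (              -- every day from first to last date
        SELECT MIN(date) FROM subscriptions
        UNION ALL
        SELECT date(dt, '+1 day') FROM days
        WHERE dt < (SELECT MAX(date) FROM subscriptions)
    ),
    products AS (SELECT DISTINCT product FROM subscriptions)
    SELECT d.dt, p.product,
           SUM(COALESCE(s.diff, 0)) OVER (PARTITION BY p.product ORDER BY d.dt) AS total
    FROM days d
    CROSS JOIN products p
    LEFT JOIN subscriptions s ON s.product = p.product AND s.date = d.dt
    ORDER BY d.dt, p.product
""").fetchall()

# Sum the per-product running totals for each day.
daily = {}
for dt, _product, total in rows:
    daily[dt] = daily.get(dt, 0) + total
daily_totals = [daily[d] for d in sorted(daily)]
print(daily_totals)  # [150, 140, 200]
```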
Use GENERATE_TIMESTAMP_ARRAY:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
dates AS (
SELECT *
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2020-11-01 00:00:00', '2020-11-03 00:00:00', INTERVAL 1 DAY)) as date
),
products AS (
SELECT DISTINCT product FROM subscriptions
)
SELECT dates.date, products.product, COALESCE(subscriptions.diff, 0) AS diff
FROM dates
CROSS JOIN products
LEFT JOIN subscriptions
ON subscriptions.date = dates.date AND subscriptions.product = products.product

Add missing months in sales

I have a sales table with the values below.
TransactionDate,CustomerID,Quantity
2020-01-01,1234,5
2020-07-01,1234,9
2020-03-01,3241,8
2020-07-01,3241,4
As you can see, the first purchase for CustomerID = 1234 was in Jan 2020 and for CustomerID = 3241 in Mar 2020.
I want output in which all the missing months are filled in with a quantity of 0,
i.e. if there is no sale between January and July, the output should be as below.
TransactionDate,CustomerID,Quantity
2020-01-01,1234,5
2020-02-01,1234,0
2020-03-01,1234,0
2020-04-01,1234,0
2020-05-01,1234,0
2020-06-01,1234,0
2020-07-01,1234,9
2020-03-01,3241,8
2020-04-01,3241,0
2020-05-01,3241,0
2020-06-01,3241,0
2020-07-01,3241,4
You can use a recursive query to create the missing dates per customer.
with recursive dates (customerid, transactiondate, max_transactiondate) as
(
select customerid, min(transactiondate), max(transactiondate)
from sales
group by customerid
union all
select customerid, dateadd(month, 1, transactiondate), max_transactiondate
from dates
where transactiondate < max_transactiondate
)
select
d.customerid,
d.transactiondate,
coalesce(s.quantity, 0) as quantity
from dates d
left join sales s on s.customerid = d.customerid and s.transactiondate = d.transactiondate
order by d.customerid, d.transactiondate;
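For a runnable check, here is the same per-customer recursive CTE in SQLite via Python's sqlite3, with date(x, '+1 month') standing in for dateadd(month, 1, x) and the question's four sample rows. Like the answers, it assumes every transaction date is the first of a month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (transactiondate TEXT, customerid INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2020-01-01", 1234, 5), ("2020-07-01", 1234, 9),
    ("2020-03-01", 3241, 8), ("2020-07-01", 3241, 4),
])

rows = conn.execute("""
    WITH RECURSIVE dates(customerid, transactiondate, max_transactiondate) AS (
        -- anchor: first and last month per customer
        SELECT customerid, MIN(transactiondate), MAX(transactiondate)
        FROM sales GROUP BY customerid
        UNION ALL
        -- step: add one month until the last month is reached
        SELECT customerid, date(transactiondate, '+1 month'), max_transactiondate
        FROM dates
        WHERE transactiondate < max_transactiondate
    )
    SELECT d.transactiondate, d.customerid, COALESCE(s.quantity, 0) AS quantity
    FROM dates d
    LEFT JOIN sales s
      ON s.customerid = d.customerid AND s.transactiondate = d.transactiondate
    ORDER BY d.customerid, d.transactiondate
""").fetchall()

for row in rows:
    print(row)
```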
This is a convenient place to use a recursive CTE. Assuming all your dates are on the first of the month:
with cr as (
select customerid, min(transactiondate) as mindate, max(transactiondate) as maxdate
from t
group by customerid
union all
select customerid, dateadd(month, 1, mindate), maxdate
from cr
where mindate < maxdate
)
select cr.customerid, cr.mindate as transactiondate, coalesce(t.quantity, 0) as quantity
from cr left join
t
on cr.customerid = t.customerid and
cr.mindate = t.transactiondate;
Note that if you have more than 100 months to fill in, then you will need option (maxrecursion 0).
Also, this can easily be adapted if the dates are not all on the first of the month. But you would need to explain what the result set should look like in that case.
[EDIT] Based on what others posted, I updated the code.
;with
min_date_cte(MinTransactionDate, MaxTransactionDate) as (
select min(TransactionDate), max(TransactionDate) from tsales),
unq_yrs_cte(year_int) as (
select distinct year(TransactionDate) from tsales),
unq_cust_cte(CustomerID) as (
select distinct CustomerID from tsales)
select datefromparts(uyc.year_int, v.month_int, 1) TransactionDate,
ucc.CustomerID,
isnull(t.Quantity, 0) Quantity
from min_date_cte mdc
cross join unq_yrs_cte uyc
cross join unq_cust_cte ucc
cross join (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) v(month_int)
left join tsales t on datefromparts(uyc.year_int, v.month_int, 1)=t.TransactionDate
and ucc.CustomerID=t.CustomerId
where
datefromparts(uyc.year_int, v.month_int, 1)>=mdc.MinTransactionDate
and datefromparts(uyc.year_int, v.month_int, 1)<=mdc.MaxTransactionDate;
Results
TransactionDate CustomerID Quantity
2020-01-01 1234 5
2020-01-01 3241 0
2020-02-01 1234 0
2020-02-01 3241 0
2020-03-01 1234 0
2020-03-01 3241 8
2020-04-01 1234 0
2020-04-01 3241 0
2020-05-01 1234 0
2020-05-01 3241 0
2020-06-01 1234 0
2020-06-01 3241 0
2020-07-01 1234 9
2020-07-01 3241 4
You can make use of recursive query:
WITH cte1 as
(
select customerid, min([TransactionDate]) as Monthly_date, max([TransactionDate]) as end_date from calender_table
group by customerid
union all
select customerid, dateadd(month, 1, Monthly_date), end_date from cte1
where Monthly_date < end_date
)
select a.Monthly_date, a.customerid,coalesce(b.quantity, 0) from cte1 a left outer join calender_table b
on (a.Monthly_date = b.[TransactionDate] and a.customerid = b.customerid)
order by a.customerid, a.Monthly_date;

Comparing data from two rows in a same sql table

I am trying to find the differences between two rows in the same table and am having trouble finding the right query. For example, I have:
Year Item Qty Amount
------------------------------
2014 Shoes 500 2500
2014 Ties 300 900
2014 Pants 200 4000
2015 Shoes 600 3000
2015 Ties 200 600
I am trying to find the increase (or decrease) from the previous year to this year. I will always have only two years to compare. The query result should look like this:
Items Qty Diff Amount Diff
------------------------------
Shoes 100 500
Ties (-100) (-300)
Pants Null Null
What should the query look like?
If you want to include everything, use a FULL OUTER JOIN; if you want only the items present in the earlier year, use a LEFT OUTER JOIN; if you want only the items present in both the earlier and the later year, use an INNER JOIN.
SELECT
T1.Item
, (T2.QTY-T1.QTY) AS [QTY Diff]
, (T2.Amount - T1.Amount) AS [Amount Diff]
FROM
<<Table>> T1
LEFT OUTER JOIN <<Table>> T2
ON T1.Item=T2.Item
AND T1.YEAR=(T2.YEAR-1);
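A sketch of the self-join in SQLite through Python's sqlite3, with a hypothetical items table holding the question's rows. The WHERE t1.year = 2014 filter is an addition of this sketch: without it, the 2015 rows also appear on the left side with NULL differences, which the desired output does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (year INTEGER, item TEXT, qty INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    (2014, "Shoes", 500, 2500), (2014, "Ties", 300, 900), (2014, "Pants", 200, 4000),
    (2015, "Shoes", 600, 3000), (2015, "Ties", 200, 600),
])

# Match each earlier-year row with the same item one year later (NULLs if missing).
rows = conn.execute("""
    SELECT t1.item, t2.qty - t1.qty AS qty_diff, t2.amount - t1.amount AS amount_diff
    FROM items t1
    LEFT OUTER JOIN items t2
      ON t1.item = t2.item AND t1.year = t2.year - 1
    WHERE t1.year = 2014
    ORDER BY t1.item
""").fetchall()

print(rows)
```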
1. Use LAG or LEAD
WITH tb(Year,Item,Qty,Amount) AS (
SELECT 2014,'Shoes',500,2500 UNION
SELECT 2014,'Ties',300,900 UNION
SELECT 2014,'Pants',200,4000 UNION
SELECT 2015,'Shoes',600,3000 UNION
SELECT 2015,'Ties',200,600
)
SELECT *,Qty-LAG(qty)OVER(PARTITION BY Item ORDER BY year) AS QtyDiff ,Amount-LAG(Amount)OVER(PARTITION BY Item ORDER BY year) AS AmountDiff
FROM tb
Year Item Qty Amount QtyDiff AmountDiff
----------- ----- ----------- ----------- ----------- -----------
2014 Pants 200 4000 NULL NULL
2014 Shoes 500 2500 NULL NULL
2015 Shoes 600 3000 100 500
2014 Ties 300 900 NULL NULL
2015 Ties 200 600 -100 -300
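The LAG version also runs in SQLite 3.25 or later. Here is a sketch via Python's sqlite3 with a hypothetical items table holding the question's data; an ORDER BY is added so the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (year INTEGER, item TEXT, qty INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    (2014, "Shoes", 500, 2500), (2014, "Ties", 300, 900), (2014, "Pants", 200, 4000),
    (2015, "Shoes", 600, 3000), (2015, "Ties", 200, 600),
])

# LAG looks back one row within each item, ordered by year (NULL for the first year).
rows = conn.execute("""
    SELECT year, item, qty, amount,
           qty - LAG(qty) OVER (PARTITION BY item ORDER BY year) AS qty_diff,
           amount - LAG(amount) OVER (PARTITION BY item ORDER BY year) AS amount_diff
    FROM items
    ORDER BY item, year
""").fetchall()

for row in rows:
    print(row)
```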
2. Cross or Outer Apply
WITH tb(Year,Item,Qty,Amount) AS (
SELECT 2014,'Shoes',500,2500 UNION
SELECT 2014,'Ties',300,900 UNION
SELECT 2014,'Pants',200,4000 UNION
SELECT 2015,'Shoes',600,3000 UNION
SELECT 2015,'Ties',200,600
)
SELECT t1.Year,t1.Item,t1.Qty- t2.qty AS DiffQty,t1.Amount-t2.Amount AS DiffAmount
FROM tb AS t1
OUTER APPLY (SELECT TOP 1 tt.qty,tt.Amount FROM tb AS tt WHERE tt.Year<t1.Year AND t1.Item=tt.Item ORDER BY tt.Year desc) AS t2
ORDER BY t1.Item,t1.Year
Using the lag function is the best approach to this.
SELECT [Year], [Item], [Qty], [Amount],
[Qty] - LAG([Qty]) OVER (PARTITION BY [Item] ORDER BY [Year]) [QtyDiff],
[Amount] - LAG([Amount]) OVER (PARTITION BY [Item] ORDER BY [Year]) [AmountDiff]
FROM [ItemTable] it
order BY [Year] DESC, [Item];
Hope this helps.
Here is the required query (MySQL syntax, with table standing in for your table name):
SET @YEAR1 = '2014';
SET @YEAR2 = '2015';
SELECT
Item,
if(count(*)>1,sum(if(Year=@YEAR2,Qty,-Qty)),NULL) as 'Qty Diff',
if(count(*)>1,sum(if(Year=@YEAR2,Amount,-Amount)),NULL) as 'Amount Diff'
FROM
`table`
WHERE
Year IN (@YEAR1,@YEAR2)
group by Item;
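The conditional-aggregation idea translates to portable SQL with CASE instead of MySQL's IF(). The sketch below runs it in SQLite through Python's sqlite3 on a hypothetical items table, passing the two years as named parameters (:year1 and :year2 are names chosen here) rather than session variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (year INTEGER, item TEXT, qty INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    (2014, "Shoes", 500, 2500), (2014, "Ties", 300, 900), (2014, "Pants", 200, 4000),
    (2015, "Shoes", 600, 3000), (2015, "Ties", 200, 600),
])

# Later year counts positive, earlier year negative; items with one row get NULL.
rows = conn.execute("""
    SELECT item,
           CASE WHEN COUNT(*) > 1
                THEN SUM(CASE WHEN year = :year2 THEN qty ELSE -qty END) END AS qty_diff,
           CASE WHEN COUNT(*) > 1
                THEN SUM(CASE WHEN year = :year2 THEN amount ELSE -amount END) END AS amount_diff
    FROM items
    WHERE year IN (:year1, :year2)
    GROUP BY item
    ORDER BY item
""", {"year1": 2014, "year2": 2015}).fetchall()

print(rows)
```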

SQL Sum by week from daily table

I have a table with sales for products.
The sales are per day, like this:
product date sales
1 '2013-11-01' 100
1 '2013-11-02' 423
1 '2013-11-03' 700
1 '2013-11-04' 233
2 '2013-11-01' 623
2 '2013-11-02' 451
2 '2013-11-03' 9000
I want a query that shows me the week-over-week sum of sales,
something like:
product week ending sales
1 '2013-11-01' 10000
1 '2013-11-08' 15000
2 '2013-11-01' 4900
2 '2013-11-08' 30000
I'm not sure how to get these weekly groups when summing up.
I'm using Teradata.
If you are using Teradata 14, you can leverage the DayNumber_Of_Week() function in the TD_SYSFNLIB database:
SELECT s.Product
, s.Date + (7-DayNumber_Of_Week(s.date)) AS WeekEndingDate /* Saturday */
, SUM(s.Sales) AS Sales
FROM sales AS S
GROUP BY 1,2;
The following should work in Teradata 13.10 as well, using Sys_Calendar:
SELECT s.Product
, s.DATE + (7-c.Day_Of_Week) AS WeekEndingDate /* Saturday */
, SUM(s.Sales) AS Sales
FROM sales AS S
INNER JOIN
Sys_Calendar.Calendar c
ON S.date = c.calendar_date
GROUP BY 1,2;
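A sketch of the week-ending idea in SQLite via Python's sqlite3, with a hypothetical sales table (the date column is named sale_date here). SQLite's date(x, 'weekday 6') moves a date forward to the next Saturday and leaves Saturdays unchanged, matching the Saturday week ending used above. The rows are the question's daily data; 2013-11-01 was a Friday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product INTEGER, sale_date TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2013-11-01", 100), (1, "2013-11-02", 423), (1, "2013-11-03", 700),
    (1, "2013-11-04", 233),
    (2, "2013-11-01", 623), (2, "2013-11-02", 451), (2, "2013-11-03", 9000),
])

# 'weekday 6' advances to the next Saturday (0 = Sunday ... 6 = Saturday).
rows = conn.execute("""
    SELECT product, date(sale_date, 'weekday 6') AS week_ending, SUM(sales) AS sales
    FROM sales
    GROUP BY product, week_ending
    ORDER BY product, week_ending
""").fetchall()

for row in rows:
    print(row)
```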
I know very little about Teradata, but I believe you can leverage the sys_calendar.calendar table, something like this:
SELECT s.Product, c.week_of_year, SUM(s.sales) AS Sales
FROM sales AS s
JOIN sys_calendar.calendar AS c
ON s.date = c.calendar_date
GROUP BY s.Product, c.week_of_year;
You'd need the year in there as well, so as not to group week 1 of 2013 together with week 1 of 2012.
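To illustrate why the year belongs in the grouping key, here is a sketch in SQLite via Python's sqlite3 that uses strftime('%W') (Monday-based week of year) in place of sys_calendar. The two sample rows are hypothetical dates that both fall in week 02 of their respective years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product INTEGER, sale_date TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2012-01-09", 10),  # a Monday in week '02' of 2012
    (1, "2013-01-14", 20),  # a Monday in week '02' of 2013
])

# Week number alone: both years collapse into one group.
week_only = conn.execute("""
    SELECT product, strftime('%W', sale_date) AS wk, SUM(sales)
    FROM sales GROUP BY product, wk
""").fetchall()

# Year plus week number: the two years stay separate.
year_and_week = conn.execute("""
    SELECT product, strftime('%Y', sale_date) AS yr, strftime('%W', sale_date) AS wk, SUM(sales)
    FROM sales GROUP BY product, yr, wk
    ORDER BY yr
""").fetchall()

print(week_only, year_and_week)
```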