Replace a missing value in SQL Server?

There is a table (SQL Server 2017) on sales of goods in stores, some records have no price.
+---------+-------------+---------+----------+-------+
| year_id | week_number | good_id | store_id | price |
+---------+-------------+---------+----------+-------+
| 2019 | 6 | 140629 | 2 | 199 |
+---------+-------------+---------+----------+-------+
| 2019 | 8 | 140629 | 2 | NULL |
+---------+-------------+---------+----------+-------+
| 2017 | 40 | 137233 | 9 | 278 |
+---------+-------------+---------+----------+-------+
| 2017 | 35 | 137233 | 9 | NULL |
+---------+-------------+---------+----------+-------+
| 2017 | 37 | 137233 | 9 | NULL |
+---------+-------------+---------+----------+-------+
We would like to fill in the missing values as follows: set the price to that of the same good (good_id) in the same store (store_id), taken from the sale closest in date to the row with the missing value. For example:
+---------+-------------+---------+----------+-------+
| year_id | week_number | good_id | store_id | price |
+---------+-------------+---------+----------+-------+
| 2019 | 6 | 140629 | 2 | 199 |
+---------+-------------+---------+----------+-------+
| 2019 | 8 | 140629 | 2 | 199 |
+---------+-------------+---------+----------+-------+
| 2017 | 40 | 137233 | 9 | 278 |
+---------+-------------+---------+----------+-------+
| 2017 | 35 | 137233 | 9 | 278 |
+---------+-------------+---------+----------+-------+
| 2017 | 37 | 137233 | 9 | 278 |
+---------+-------------+---------+----------+-------+
So far I have written something like this, but the query contains mutually exclusive conditions, so it does not affect any rows:
UPDATE dataset
SET price = p.price
FROM dataset AS p
WHERE good_id = p.good_id
AND store_id = p.store_id
AND price IS NULL
AND p.price IS NOT NULL;
GO

You can use apply. This works if all years have 52 weeks:
update d
    set price = d2.price
    from dataset d cross apply
         (select top (1) d2.*
          from dataset d2
          where d2.good_id = d.good_id and
                d2.store_id = d.store_id and
                d2.price is not null
          -- nearest week first; treats every year as 52 weeks long
          order by abs( (d2.year_id * 52 + d2.week_number) - (d.year_id * 52 + d.week_number) )
         ) d2
    where d.price is null;
The only issue is when the comparison crosses a year boundary and the previous year has 53 weeks. Depending on how you define years, you can convert the year/week combinations into dates and compare the dates directly.
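To illustrate the date-based comparison, here is a minimal sketch in Python rather than T-SQL. It assumes the weeks are ISO 8601 weeks, which the question does not state, and the helper names (week_start, week_distance, naive_distance) are made up for this example. Each year/week pair is mapped to the Monday of that week, so the distance is a plain date difference.

```python
from datetime import date


def week_start(year_id: int, week_number: int) -> date:
    """Monday of the given week (assumption: ISO 8601 week numbering)."""
    return date.fromisocalendar(year_id, week_number, 1)


def week_distance(a: tuple, b: tuple) -> int:
    """Absolute distance in whole weeks between two (year_id, week_number) pairs."""
    return abs((week_start(*a) - week_start(*b)).days) // 7


def naive_distance(a: tuple, b: tuple) -> int:
    """The year * 52 + week approximation used in the query above."""
    return abs((a[0] * 52 + a[1]) - (b[0] * 52 + b[1]))


# ISO year 2020 has 53 weeks, so the two formulas disagree at the boundary.
last_2020, first_2021 = (2020, 53), (2021, 1)
print(week_distance(last_2020, first_2021))   # date-based distance
print(naive_distance(last_2020, first_2021))  # 52-week approximation
```

For adjacent weeks across that boundary the date-based helper reports one week apart, while the 52-week formula treats them as the same week. Inside a single year both give the same answer.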

Related

Aggregate yearly spend of user based on billing cycle (SQL)

I want to track the yearly spend of users based on a particular month from which we start their cycle. This is to keep track of their yearly spend so that they don't exceed the allowed limits. I have the following two tables:
Spend (Contains 1 row per user per month) (I can modify the date column of this table to any date format as needed, if it helps):
+----+-----------+------+-------+-------+
| ID | Date | Year | Month | Spend |
+----+-----------+------+-------+-------+
| 11 | 01-Sep-19 | 2019 | 9 | 10 |
+----+-----------+------+-------+-------+
| 11 | 01-Oct-19 | 2019 | 10 | 23 |
+----+-----------+------+-------+-------+
| 11 | 01-Nov-19 | 2019 | 11 | 27 |
+----+-----------+------+-------+-------+
| 11 | 01-Dec-19 | 2019 | 12 | 14 |
+----+-----------+------+-------+-------+
| 11 | 01-Jan-20 | 2020 | 1 | 13 |
+----+-----------+------+-------+-------+
| 11 | 01-Feb-20 | 2020 | 2 | 33 |
+----+-----------+------+-------+-------+
| 11 | 01-Mar-20 | 2020 | 3 | 25 |
+----+-----------+------+-------+-------+
| 11 | 01-Apr-20 | 2020 | 4 | 17 |
+----+-----------+------+-------+-------+
| 11 | 01-May-20 | 2020 | 5 | 14 |
+----+-----------+------+-------+-------+
| 11 | 01-Jun-20 | 2020 | 6 | 10 |
+----+-----------+------+-------+-------+
| 11 | 01-Jul-20 | 2020 | 7 | 46 |
+----+-----------+------+-------+-------+
| 11 | 01-Aug-20 | 2020 | 8 | 53 |
+----+-----------+------+-------+-------+
| 11 | 01-Sep-20 | 2020 | 9 | 38 |
+----+-----------+------+-------+-------+
| 11 | 01-Oct-20 | 2020 | 10 | 22 |
+----+-----------+------+-------+-------+
| 11 | 01-Nov-20 | 2020 | 11 | 29 |
+----+-----------+------+-------+-------+
| 50 | 01-Jul-20 | 2020 | 7 | 56 |
+----+-----------+------+-------+-------+
| 50 | 01-Aug-20 | 2020 | 8 | 62 |
+----+-----------+------+-------+-------+
| 50 | 01-Sep-20 | 2020 | 9 | 77 |
+----+-----------+------+-------+-------+
| 50 | 01-Oct-20 | 2020 | 10 | 52 |
+----+-----------+------+-------+-------+
| 50 | 01-Nov-20 | 2020 | 11 | 45 |
+----+-----------+------+-------+-------+
Billing Cycle (contains the months between which we calculate their total spends):
+-----+------------+----------+
| ID | StartMonth | EndMonth |
+-----+------------+----------+
| 11 | 10 | 9 |
+-----+------------+----------+
| 50 | 9 | 8 |
+-----+------------+----------+
Sample Output:
+----+-------+------------+
| ID | Cycle | TotalSpend |
+----+-------+------------+
| 11 | 1 | 10 |
+----+-------+------------+
| 11 | 2 | 313 |
+----+-------+------------+
| 11 | 3 | 51 |
+----+-------+------------+
| 50 | 1 | 118 |
+----+-------+------------+
| 50 | 2 | 174 |
+----+-------+------------+
In the sample output, for ID = 11, cycle 1 indicates spend in Sep'19, cycle 2 indicates total spend from Oct'19 (Month 10) to Sep'20 (Month 9) and cycle 3 indicates total spend for the next 12 months from Oct'20 (till whichever month data is present).
I'm a beginner to SQL and I believe doing this might require the use of CTE/Subqueries. Would appreciate any help or guidance for this.
Since this seems to be an exercise of some sort, I'm not going to provide a full answer, but I'll give you hints on how this could be solved conceptually.
First, I think you should associate the entries with effective cycles (with a cycle number) for the required date range. This could be done with a recursive CTE. That is not the most efficient approach, but since we don't have the effective cycles with their numbers as a separate table, it can be a working solution nevertheless.
The result then just needs to be grouped by the ID and cycle number and the amounts summed up, and you're done.
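For readers who want to see the grouping step end to end, here is a rough sketch that runs against an in-memory SQLite database from Python. Note that it swaps the recursive CTE for plain arithmetic: a row belongs to the cycle that started in its own year if Month >= StartMonth, otherwise to the cycle that started the year before. The CTE names (tagged, first_cycle) and the cycle_year column are invented for this example, and the Date column is left out because it is not needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Spend (ID INTEGER, Year INTEGER, Month INTEGER, Spend INTEGER);
CREATE TABLE BillingCycle (ID INTEGER, StartMonth INTEGER, EndMonth INTEGER);
INSERT INTO BillingCycle VALUES (11, 10, 9), (50, 9, 8);
""")

# Sample rows from the question (ID, Year, Month, Spend).
spend_rows = [
    (11, 2019, 9, 10), (11, 2019, 10, 23), (11, 2019, 11, 27), (11, 2019, 12, 14),
    (11, 2020, 1, 13), (11, 2020, 2, 33), (11, 2020, 3, 25), (11, 2020, 4, 17),
    (11, 2020, 5, 14), (11, 2020, 6, 10), (11, 2020, 7, 46), (11, 2020, 8, 53),
    (11, 2020, 9, 38), (11, 2020, 10, 22), (11, 2020, 11, 29),
    (50, 2020, 7, 56), (50, 2020, 8, 62), (50, 2020, 9, 77),
    (50, 2020, 10, 52), (50, 2020, 11, 45),
]
conn.executemany("INSERT INTO Spend VALUES (?, ?, ?, ?)", spend_rows)

query = """
WITH tagged AS (
    -- cycle_year: calendar year in which the row's billing cycle started
    SELECT s.ID,
           CASE WHEN s.Month >= b.StartMonth THEN s.Year ELSE s.Year - 1 END AS cycle_year,
           s.Spend
    FROM Spend s
    JOIN BillingCycle b ON b.ID = s.ID
),
first_cycle AS (
    SELECT ID, MIN(cycle_year) AS first_year FROM tagged GROUP BY ID
)
SELECT t.ID,
       t.cycle_year - f.first_year + 1 AS Cycle,  -- number cycles from 1 per user
       SUM(t.Spend) AS TotalSpend
FROM tagged t
JOIN first_cycle f ON f.ID = t.ID
GROUP BY t.ID, t.cycle_year, f.first_year
ORDER BY t.ID, t.cycle_year
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

One design consequence: if a user has no rows at all in some cycle, the cycle numbers skip that cycle instead of being renumbered, which may or may not be what you want.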

Checking for Consecutive 12 Weeks of 0 Sales

I have a table with customer_number, week, and sales. I need to check if there were 12 consecutive weeks of no sales for each customer and create a flag of 0/1.
I can check the last 12 weeks or a certain time frame, but what's the best way to check for consecutive runs? Here is the code I have so far:
select * from weekly_sales
where customer_nbr in (123, 234)
and week < '2015-11-01'
and week > '2014-11-01'
order by customer_nbr, week
;
Here is a simplified version that only needs a week id and sales:
SELECT S1.weekid start_week, MAX(S2.weekid) end_week, SUM (S2.sales)
FROM Sales S1
JOIN Sales S2
ON S2.weekid BETWEEN S1.weekid and S1.weekid + 11
WHERE S1.weekid BETWEEN 1 and 25 -- your search range
GROUP BY S1.weekid
Let me know if that works for you.
OUTPUT
| start_week | end_week | |
|------------|----------|----|
| 1 | 12 | 12 |
| 2 | 13 | 8 |
| 3 | 14 | 3 |
| 4 | 15 | 2 |
| 5 | 16 | 0 | <-
| 6 | 17 | 0 | <- no sales for 12 weeks
| 7 | 18 | 0 | <-
| 8 | 19 | 4 |
| 9 | 20 | 9 |
| 10 | 21 | 11 |
| 11 | 22 | 15 |
| 12 | 23 | 71 |
| 13 | 24 | 78 |
| 14 | 25 | 86 |
| 15 | 25 | 86 | <- fewer than 12 weeks in range
| 16 | 25 | 86 | <- from this line down
| 17 | 25 | 86 |
| 18 | 25 | 86 |
| 19 | 25 | 86 |
| 20 | 25 | 82 |
| 21 | 25 | 77 |
| 22 | 25 | 75 |
| 23 | 25 | 71 |
| 24 | 25 | 15 |
| 25 | 25 | 8 |
Your final query should have
HAVING SUM (S2.sales) = 0
AND COUNT(*) = 12
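To get the 0/1 flag per customer, the same self-join can be wrapped in EXISTS. The sketch below runs it on an in-memory SQLite database from Python. It assumes a table weekly_sales(customer_nbr, week_id, sales) with an integer week_id, one row per customer per week, and non-negative sales; those column names and the test data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_sales (customer_nbr INTEGER, week_id INTEGER, sales INTEGER)"
)

rows = []
for week_id in range(1, 21):
    rows.append((123, week_id, 0 if 5 <= week_id <= 16 else 5))  # 12 zero weeks
    rows.append((234, week_id, 0 if 5 <= week_id <= 15 else 3))  # only 11 zero weeks
conn.executemany("INSERT INTO weekly_sales VALUES (?, ?, ?)", rows)

query = """
SELECT c.customer_nbr,
       CASE WHEN EXISTS (
           -- any 12-week window for this customer with zero total sales
           SELECT 1
           FROM weekly_sales s1
           JOIN weekly_sales s2
             ON s2.customer_nbr = s1.customer_nbr
            AND s2.week_id BETWEEN s1.week_id AND s1.week_id + 11
           WHERE s1.customer_nbr = c.customer_nbr
           GROUP BY s1.week_id
           HAVING SUM(s2.sales) = 0 AND COUNT(*) = 12
       ) THEN 1 ELSE 0 END AS no_sales_flag
FROM (SELECT DISTINCT customer_nbr FROM weekly_sales) c
ORDER BY c.customer_nbr
"""
flags = dict(conn.execute(query).fetchall())
print(flags)
```

The COUNT(*) = 12 condition is what rejects the short windows at the end of the range. If weeks with no sales are missing from the table rather than stored as 0, this check would need a calendar table to join against.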
You could also filter with BETWEEN on the week column, and use COUNT(column) rather than a sum, which may improve performance.
Then you only have to check whether the result is greater than 0.

Rolling total with no sub-select and no vendor specific extensions

What I'm trying to achieve: rolling total for quantity and amount for a given day, grouped by hour.
It's easy in most cases, but if you have some additional columns (dir and product in my case) and you don't want to group/filter on them, that's a problem.
I know there are extensions in Oracle and MSSQL specifically for that, and there's SELECT OVER PARTITION in Postgres.
At the moment I'm working on an app prototype, and it's backed by MySQL, and I have no idea what it will be using in production, so I'm trying to avoid vendor lock-in.
The entire table:
> SELECT id, dir, product, date, hour, quantity, amount FROM sales
ORDER BY date, hour;
+------+-----+---------+------------+------+----------+--------+
| id | dir | product | date | hour | quantity | amount |
+------+-----+---------+------------+------+----------+--------+
| 2230 | 65 | ABCDEDF | 2014-09-11 | 1 | 1 | 10 |
| 2231 | 64 | ABCDEDF | 2014-09-11 | 3 | 4 | 40 |
| 2232 | 64 | ABCDEDF | 2014-09-11 | 5 | 5 | 50 |
| 2235 | 64 | ZZ | 2014-09-11 | 7 | 6 | 60 |
| 2233 | 64 | ABCDEDF | 2014-09-11 | 7 | 6 | 60 |
| 2237 | 66 | ABCDEDF | 2014-09-11 | 7 | 6 | 60 |
| 2234 | 64 | ZZ | 2014-09-18 | 3 | 1 | 11 |
| 2236 | 66 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2227 | 64 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2228 | 64 | ABCDEDF | 2014-09-18 | 5 | 2 | 200 |
| 2229 | 64 | ABCDEDF | 2014-09-18 | 7 | 3 | 300 |
+------+-----+---------+------------+------+----------+--------+
For a given date:
> SELECT id, dir, product, date, hour, quantity, amount FROM sales
WHERE date = '2014-09-18'
ORDER BY hour;
+------+-----+---------+------------+------+----------+--------+
| id | dir | product | date | hour | quantity | amount |
+------+-----+---------+------------+------+----------+--------+
| 2227 | 64 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2236 | 66 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2234 | 64 | ZZ | 2014-09-18 | 3 | 1 | 11 |
| 2228 | 64 | ABCDEDF | 2014-09-18 | 5 | 2 | 200 |
| 2229 | 64 | ABCDEDF | 2014-09-18 | 7 | 3 | 300 |
+------+-----+---------+------------+------+----------+--------+
The results that I need, using sub-select:
> SELECT date, hour, SUM(quantity),
( SELECT SUM(quantity) FROM sales s2
WHERE s2.hour <= s1.hour AND s2.date = s1.date
) AS total
FROM sales s1
WHERE s1.date = '2014-09-18'
GROUP by date, hour;
+------------+------+---------------+-------+
| date | hour | sum(quantity) | total |
+------------+------+---------------+-------+
| 2014-09-18 | 3 | 3 | 3 |
| 2014-09-18 | 5 | 2 | 5 |
| 2014-09-18 | 7 | 3 | 8 |
+------------+------+---------------+-------+
My concerns for using sub-select:
Once there are around a million records in the table, the query may become too slow; I'm not sure the optimizer can help, even though the query has no HAVING clause.
If I had to filter on product or dir, I would have to repeat those conditions in both the main SELECT and the sub-SELECT (WHERE product = / WHERE dir =).
A sub-select returns only a single sum, while I need two of them (SUM(quantity) and SUM(amount)); selecting both fails with ERROR 1241 (21000): Operand should contain 1 column(s).
The closest result I was able to get using JOIN:
> SELECT DISTINCT(s1.hour) AS ih, s2.date, s2.hour, s2.quantity, s2.amount, s2.id
FROM sales s1
JOIN sales s2 ON s2.date = s1.date AND s2.hour <= s1.hour
WHERE s1.date = '2014-09-18'
ORDER by ih;
+----+------------+------+----------+--------+------+
| ih | date | hour | quantity | amount | id |
+----+------------+------+----------+--------+------+
| 3 | 2014-09-18 | 3 | 1 | 100 | 2236 |
| 3 | 2014-09-18 | 3 | 1 | 100 | 2227 |
| 3 | 2014-09-18 | 3 | 1 | 11 | 2234 |
| 5 | 2014-09-18 | 3 | 1 | 100 | 2236 |
| 5 | 2014-09-18 | 3 | 1 | 100 | 2227 |
| 5 | 2014-09-18 | 5 | 2 | 200 | 2228 |
| 5 | 2014-09-18 | 3 | 1 | 11 | 2234 |
| 7 | 2014-09-18 | 3 | 1 | 100 | 2236 |
| 7 | 2014-09-18 | 3 | 1 | 100 | 2227 |
| 7 | 2014-09-18 | 5 | 2 | 200 | 2228 |
| 7 | 2014-09-18 | 7 | 3 | 300 | 2229 |
| 7 | 2014-09-18 | 3 | 1 | 11 | 2234 |
+----+------------+------+----------+--------+------+
I could stop here, group those results by ih (hour), calculate the sums for quantity and amount, and be happy. But something keeps telling me that this is wrong.
If I remove DISTINCT, most rows are duplicated. Replacing JOIN with its variants doesn't help.
Once I remove s2.id from the statement, I get a complete mess, with meaningful rows disappearing or collapsing (e.g. ids 2236/2227 got collapsed):
> SELECT DISTINCT(s1.hour) AS ih, s2.date, s2.hour, s2.quantity, s2.amount
FROM sales s1
JOIN sales s2 ON s2.date = s1.date AND s2.hour <= s1.hour
WHERE s1.date = '2014-09-18'
ORDER by ih;
+----+------------+------+----------+--------+
| ih | date | hour | quantity | amount |
+----+------------+------+----------+--------+
| 3 | 2014-09-18 | 3 | 1 | 100 |
| 3 | 2014-09-18 | 3 | 1 | 11 |
| 5 | 2014-09-18 | 3 | 1 | 100 |
| 5 | 2014-09-18 | 5 | 2 | 200 |
| 5 | 2014-09-18 | 3 | 1 | 11 |
| 7 | 2014-09-18 | 3 | 1 | 100 |
| 7 | 2014-09-18 | 5 | 2 | 200 |
| 7 | 2014-09-18 | 7 | 3 | 300 |
| 7 | 2014-09-18 | 3 | 1 | 11 |
+----+------------+------+----------+--------+
Summing doesn't help; it only adds to the mess.
The first row (hour = 3) should have SUM(s2.quantity) equal to 3, but it has 9. What SUM(s1.quantity) shows is a complete mystery to me.
> SELECT DISTINCT(s1.hour) AS hour, sum(s1.quantity), s2.date, SUM(s2.quantity)
FROM sales s1 JOIN sales s2 ON s2.date = s1.date AND s2.hour <= s1.hour
WHERE s1.date = '2014-09-18'
GROUP BY hour;
+------+------------------+------------+------------------+
| hour | sum(s1.quantity) | date | sum(s2.quantity) |
+------+------------------+------------+------------------+
| 3 | 9 | 2014-09-18 | 9 |
| 5 | 8 | 2014-09-18 | 5 |
| 7 | 15 | 2014-09-18 | 8 |
+------+------------------+------------+------------------+
Bonus points/boss level:
I also need a column that will show total_reference, the same rolling total for the same periods for a different date (e.g. 2014-09-11).
If you want a cumulative sum in MySQL, the most efficient way is to use variables:
SELECT date, hour,
       (@q := @q + q) AS cumeq, (@a := @a + a) AS cumea
FROM (SELECT date, hour, SUM(quantity) AS q, SUM(amount) AS a
      FROM sales s
      WHERE s.date = '2014-09-18'
      GROUP BY date, hour
     ) dh CROSS JOIN
     (SELECT @q := 0, @a := 0) vars
ORDER BY date, hour;
If you are planning on working with databases such as Oracle, SQL Server, and Postgres, then you should develop on a database that is more similar in functionality and that supports the ANSI-standard window functions. The right way to do this is with window functions, but MySQL doesn't support those (they were only added in MySQL 8.0). Postgres, SQL Server, and Oracle all have free versions that you can use for development purposes.
Also, with proper indexing, you shouldn't have a problem with the subquery approach, even on large tables.
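As a vendor-neutral alternative, here is a small sketch that pre-aggregates per hour and then self-joins the aggregate. It runs on an in-memory SQLite database from Python with a few rows from the question. The CTE name hourly is invented for this example, and on a database without WITH the same aggregate would have to be written as a derived table, so this is not strictly free of sub-selects. Aggregating first is what avoids the double counting seen in the question, where hour 3 has three s1 rows, each joined to the same three s2 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER, dir INTEGER, product TEXT, date TEXT,"
    " hour INTEGER, quantity INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (2237, 66, "ABCDEDF", "2014-09-11", 7, 6, 60),
        (2234, 64, "ZZ", "2014-09-18", 3, 1, 11),
        (2236, 66, "ABCDEDF", "2014-09-18", 3, 1, 100),
        (2227, 64, "ABCDEDF", "2014-09-18", 3, 1, 100),
        (2228, 64, "ABCDEDF", "2014-09-18", 5, 2, 200),
        (2229, 64, "ABCDEDF", "2014-09-18", 7, 3, 300),
    ],
)

query = """
WITH hourly AS (
    -- one row per hour; filters on product or dir would go here, once
    SELECT date, hour, SUM(quantity) AS q, SUM(amount) AS a
    FROM sales
    WHERE date = ?
    GROUP BY date, hour
)
SELECT h1.date, h1.hour, h1.q,
       SUM(h2.q) AS total_quantity,  -- rolling totals up to and including h1.hour
       SUM(h2.a) AS total_amount
FROM hourly h1
JOIN hourly h2 ON h2.hour <= h1.hour
GROUP BY h1.date, h1.hour, h1.q
ORDER BY h1.hour
"""
rolling = conn.execute(query, ("2014-09-18",)).fetchall()
for row in rolling:
    print(row)
```

Because the join is between at most 24 hourly rows per day, its cost does not grow with the size of the sales table beyond the initial aggregation.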

How to calculate running total (month to date) in SQL Server 2008

I'm trying to calculate a month-to-date total using SQL Server 2008.
I'm trying to generate a month-to-date count at the level of activities and representatives. Here are the results I want to generate:
| REPRESENTATIVE_ID | MONTH | WEEK | TOTAL_WEEK_ACTIVITY_COUNT | MONTH_TO_DATE_ACTIVITIES_COUNT |
|-------------------|-------|------|---------------------------|--------------------------------|
| 40 | 7 | 7/08 | 1 | 1 |
| 40 | 8 | 8/09 | 1 | 1 |
| 40 | 8 | 8/10 | 1 | 2 |
| 41 | 7 | 7/08 | 2 | 2 |
| 41 | 8 | 8/08 | 4 | 4 |
| 41 | 8 | 8/09 | 3 | 7 |
| 41 | 8 | 8/10 | 1 | 8 |
From the following tables:
ACTIVITIES_FACT table
+-------------------+------+-----------+
| Representative_ID | Date | Activity |
+-------------------+------+-----------+
| 41 | 8/03 | Call |
| 41 | 8/04 | Call |
| 41 | 8/05 | Call |
+-------------------+------+-----------+
LU_TIME table
+-------+-----------------+--------+
| Month | Date | Week |
+-------+-----------------+--------+
| 8 | 8/01 | 8/08 |
| 8 | 8/02 | 8/08 |
| 8 | 8/03 | 8/08 |
| 8 | 8/04 | 8/08 |
| 8 | 8/05 | 8/08 |
+-------+-----------------+--------+
I'm not sure how to do this: I keep running into problems with multiple-counting or aggregations not being allowed in subqueries.
A running total is the summation of a sequence of numbers which is
updated each time a new number is added to the sequence, simply by
adding the value of the new number to the running total.
I think the asker wants a running total per month for each Representative_Id, so a simple GROUP BY week isn't enough. The Month_To_Date_Activities_Count should probably be updated at the end of every week.
This query gives a running total (month to end-of-week date) ordered by Representative_Id and Week:
SELECT a.Representative_ID, l.month, l.Week, Count(*) AS Total_Week_Activity_Count
,(SELECT count(*)
FROM ACTIVITIES_FACT a2
INNER JOIN LU_TIME l2 ON a2.Date = l2.Date
AND a.Representative_ID = a2.Representative_ID
WHERE l2.week <= l.week
AND l2.month = l.month) Month_To_Date_Activities_Count
FROM ACTIVITIES_FACT a
INNER JOIN LU_TIME l ON a.Date = l.Date
GROUP BY a.Representative_ID, l.Week, l.month
ORDER BY a.Representative_ID, l.Week
| REPRESENTATIVE_ID | MONTH | WEEK | TOTAL_WEEK_ACTIVITY_COUNT | MONTH_TO_DATE_ACTIVITIES_COUNT |
|-------------------|-------|------|---------------------------|--------------------------------|
| 40 | 7 | 7/08 | 1 | 1 |
| 40 | 8 | 8/09 | 1 | 1 |
| 40 | 8 | 8/10 | 1 | 2 |
| 41 | 7 | 7/08 | 2 | 2 |
| 41 | 8 | 8/08 | 4 | 4 |
| 41 | 8 | 8/09 | 3 | 7 |
| 41 | 8 | 8/10 | 1 | 8 |
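For anyone who wants to try the correlated-subquery approach without a SQL Server instance, here is a rough sketch on an in-memory SQLite database from Python. The table and column names (activity_date, week_end, month_no) and the sample rows are made up for this example. Weeks are stored as ISO date strings, an assumption made so that the <= comparison orders them correctly; strings like '8/08' would only compare correctly within single-digit months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities_fact (representative_id INTEGER, activity_date TEXT, activity TEXT);
CREATE TABLE lu_time (month_no INTEGER, activity_date TEXT, week_end TEXT);
INSERT INTO lu_time VALUES
    (8, '2013-08-03', '2013-08-08'), (8, '2013-08-04', '2013-08-08'),
    (8, '2013-08-05', '2013-08-08'), (8, '2013-08-09', '2013-08-15');
INSERT INTO activities_fact VALUES
    (41, '2013-08-03', 'Call'), (41, '2013-08-04', 'Call'),
    (41, '2013-08-05', 'Call'), (41, '2013-08-09', 'Call'),
    (40, '2013-08-09', 'Call');
""")

query = """
SELECT a.representative_id, l.month_no, l.week_end,
       COUNT(*) AS total_week_activity_count,
       (SELECT COUNT(*)                      -- same rep, same month, weeks so far
        FROM activities_fact a2
        JOIN lu_time l2 ON l2.activity_date = a2.activity_date
        WHERE a2.representative_id = a.representative_id
          AND l2.month_no = l.month_no
          AND l2.week_end <= l.week_end) AS month_to_date_activities_count
FROM activities_fact a
JOIN lu_time l ON l.activity_date = a.activity_date
GROUP BY a.representative_id, l.month_no, l.week_end
ORDER BY a.representative_id, l.week_end
"""
mtd_rows = conn.execute(query).fetchall()
for row in mtd_rows:
    print(row)
```

The subquery only references columns that appear in the outer GROUP BY, which is what keeps it from running into the "aggregation in subquery" errors mentioned in the question.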
As I understand your question:
SELECT af.Representative_ID
, lt.Week
, COUNT(af.Activity) AS Qnt
FROM ACTIVITIES_FACT af
INNER JOIN LU_TIME lt ON lt.Date = af.date
GROUP BY af.Representative_ID, lt.Week
Representative_ID Week Month_To_Date_Activities_Count
41 2013-08-01 00:00:00.000 1
41 2013-08-08 00:00:00.000 3
USE tempdb;
GO
IF OBJECT_ID('#ACTIVITIES_FACT','U') IS NOT NULL DROP TABLE #ACTIVITIES_FACT;
CREATE TABLE #ACTIVITIES_FACT
(
Representative_ID INT NOT NULL
,Date DATETIME NULL
, Activity VARCHAR(500) NULL
)
IF OBJECT_ID('#LU_TIME','U') IS NOT NULL DROP TABLE #LU_TIME;
CREATE TABLE #LU_TIME
(
Month INT
,Date DATETIME
,Week DATETIME
)
INSERT INTO #ACTIVITIES_FACT(Representative_ID,Date,Activity)
VALUES
(41,'7/31/2013','Chat')
,(41,'8/03/2013','Call')
,(41,'8/04/2013','Call')
,(41,'8/05/2013','Call')
INSERT INTO #LU_TIME(Month,Date,Week)
VALUES
(8,'7/31/2013','8/01/2013')
,(8,'8/01/2013','8/08/2013')
,(8,'8/02/2013','8/08/2013')
,(8,'8/03/2013','8/08/2013')
,(8,'8/04/2013','8/08/2013')
,(8,'8/05/2013','8/08/2013')
--Begin Query
SELECT AF.Representative_ID
,LU.Week
,COUNT(*) AS Month_To_Date_Activities_Count
FROM #ACTIVITIES_FACT AS AF
INNER JOIN #LU_TIME AS LU
ON AF.Date = LU.Date
Group By AF.Representative_ID
,LU.Week

SQL Query to display % Change

I have a database with sample data represented by Table 1 below. How do I write an SQL query to display them in either Table 2 or Table 3 format?
Table 1 Table 2
Date | Value Year | Week | Total Value | % Change
------------+------- ------+-----+--|---------------|----------
19/12/2011 | 60 2012 | 1 | 295 | 656.41%
20/12/2011 | 49 2012 | 0 | 39 | -80.98%
21/12/2011 | 42 2012 | 52 | 205 | -41.76%
22/12/2011 | 57 2011 | 51 | 352 |
23/12/2011 | 88
24/12/2011 | 18 Table 3
25/12/2011 | 38 Year | Week | SUM1 | Year | Week | SUM2 | % Change
26/12/2011 | 16 ------+--------+--------+--------+--------+--------+-----------
27/12/2011 | 66 2012 | 1 | 295 | 2012 | 0 | 39 | 656.41%
28/12/2011 | 21 2012 | 0 | 39 | 2011 | 52 | 205 | -80.98%
29/12/2011 | 79 2011 | 52 | 205 | 2011 | 51 | 352 | -41.76%
30/12/2011 | 7 2011 | 51 | 352 |
31/12/2011 | 16
01/01/2012 | 39
02/01/2012 | 17
03/01/2012 | 86
04/01/2012 | 55
05/01/2012 | 82
06/01/2012 | 0
07/01/2012 | 9
08/01/2012 | 46
My preference would be to run 1 query to aggregate Table 1 to the year/week level and then do the "% change" in another language, depending on your environment. However, if you truly needed a SQL-only solution, you could do something like this.
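-- Aggregate the daily values to year/week totals.
create table t1 as
select year(Date) as year, week(Date) as week, sum(Value) as totalvalue
from table1
group by year(Date), week(Date)
;
-- Join each week to the week before it. The sample output numbers the
-- first partial week of a year as week 0, so week 0 wraps to week 52 of
-- the previous year (this still assumes every year ends at week 52).
select a.year, a.week, a.totalvalue,
       (a.totalvalue-b.totalvalue)/b.totalvalue as pct_change
from (
      select year, week, totalvalue,
             case when week>0 then week-1 else 52 end as prevweek,
             case when week>0 then year else year-1 end as prevyear
      from t1
     ) a
left outer join t1 b
  on a.prevweek=b.week and a.prevyear=b.year
order by a.year desc, a.week desc
;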
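The first option mentioned above, aggregating in SQL and computing the change in another language, could look roughly like this in Python with an in-memory SQLite database. It relies on SQLite's strftime('%W'), which numbers weeks from 00 with Monday as the first day and happens to match the week numbers in the question. Dates are stored as ISO strings, and the variable names are made up for this example.

```python
import sqlite3
from datetime import date, timedelta

# Daily values from Table 1, starting 19/12/2011.
values = [60, 49, 42, 57, 88, 18, 38, 16, 66, 21, 79, 7, 16,
          39, 17, 86, 55, 82, 0, 9, 46]
start = date(2011, 12, 19)
daily = [((start + timedelta(days=i)).isoformat(), v) for i, v in enumerate(values)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (date TEXT, value INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", daily)

# Step 1: one query aggregates to the year/week level, oldest week first.
weekly = conn.execute("""
    SELECT CAST(strftime('%Y', date) AS INTEGER) AS year,
           CAST(strftime('%W', date) AS INTEGER) AS week,
           SUM(value) AS totalvalue
    FROM table1
    GROUP BY 1, 2
    ORDER BY 1, 2
""").fetchall()

# Step 2: the percent change against the previous row is computed outside SQL.
report = []
previous_total = None
for year, week, total in weekly:
    if previous_total:  # skips the first week and a zero denominator
        pct_change = round((total - previous_total) / previous_total * 100, 2)
    else:
        pct_change = None
    report.append((year, week, total, pct_change))
    previous_total = total

for row in reversed(report):  # newest week first, as in Table 2
    print(row)
```

Comparing each week with the previous row, rather than with week - 1, sidesteps the year-boundary arithmetic, at the cost of treating a missing week as if it did not exist.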