Checking for Consecutive 12 Weeks of 0 Sales - sql

I have a table with customer_number, week, and sales. I need to check if there were 12 consecutive weeks of no sales for each customer and create a flag of 0/1.
I can check the last 12 weeks or a certain time frame, but what's the best way to check for consecutive runs? Here is the code I have so far:
select * from weekly_sales
where customer_nbr in (123, 234)
and week < '2015-11-01'
and week > '2014-11-01'
order by customer_nbr, week
;

Here is a simplified version that needs only a week_id and sales:
SELECT S1.weekid AS start_week, MAX(S2.weekid) AS end_week, SUM(S2.sales) AS total_sales
FROM Sales S1
JOIN Sales S2
ON S2.weekid BETWEEN S1.weekid AND S1.weekid + 11
WHERE S1.weekid BETWEEN 1 AND 25 -- your search range
GROUP BY S1.weekid
Let me know if that works for you.
OUTPUT
| start_week | end_week | total_sales |
|------------|----------|-------------|
| 1 | 12 | 12 |
| 2 | 13 | 8 |
| 3 | 14 | 3 |
| 4 | 15 | 2 |
| 5 | 16 | 0 | <-
| 6 | 17 | 0 | <- no sales for 12 weeks
| 7 | 18 | 0 | <-
| 8 | 19 | 4 |
| 9 | 20 | 9 |
| 10 | 21 | 11 |
| 11 | 22 | 15 |
| 12 | 23 | 71 |
| 13 | 24 | 78 |
| 14 | 25 | 86 |
| 15 | 25 | 86 | <- window shorter than 12 weeks
| 16 | 25 | 86 | <- for this row and every row below
| 17 | 25 | 86 |
| 18 | 25 | 86 |
| 19 | 25 | 86 |
| 20 | 25 | 82 |
| 21 | 25 | 77 |
| 22 | 25 | 75 |
| 23 | 25 | 71 |
| 24 | 25 | 15 |
| 25 | 25 | 8 |
Your final query should have
HAVING SUM (S2.sales) = 0
AND COUNT(*) = 12
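To turn the idea above into the 0/1 flag per customer that the question asks for, the same self-join can be wrapped in an EXISTS. This is only a sketch run against SQLite from Python: the table layout (customer_nbr, weekid, sales), the sample data and the name no_sales_flag are assumptions, and it relies on one row per customer per week with non-negative sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_nbr INTEGER, weekid INTEGER, sales INTEGER)")

# Hypothetical data for weeks 1..25:
#   123 has twelve zero weeks in a row (5..16), 234 sells every week,
#   345 has only eleven zero weeks in a row (5..15).
rows = []
for week in range(1, 26):
    rows.append((123, week, 0 if 5 <= week <= 16 else 3))
    rows.append((234, week, 2))
    rows.append((345, week, 0 if 5 <= week <= 15 else 1))
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

QUERY = """
SELECT c.customer_nbr,
       CASE WHEN EXISTS (
           -- any 12-week window for this customer with no sales at all
           SELECT 1
           FROM sales s1
           JOIN sales s2
             ON s2.customer_nbr = s1.customer_nbr
            AND s2.weekid BETWEEN s1.weekid AND s1.weekid + 11
           WHERE s1.customer_nbr = c.customer_nbr
           GROUP BY s1.weekid
           HAVING SUM(s2.sales) = 0 AND COUNT(*) = 12
       ) THEN 1 ELSE 0 END AS no_sales_flag
FROM (SELECT DISTINCT customer_nbr FROM sales) c
ORDER BY c.customer_nbr
"""

flags = dict(conn.execute(QUERY).fetchall())
print(flags)
```

The COUNT(*) = 12 condition is what rejects the short windows at the end of the range, so a customer is flagged only when a full 12-week window sums to zero.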

You could filter with BETWEEN on the week column (between one week and another), and you can also use COUNT(column) to improve performance.
Then you only have to check whether the result is greater than 0.

Related

Integrate row information with previous rows SQL

I need to integrate row information with previous rows
| ID | no | number |
+----+----+--------+
| 1  | 40 | 10     |
| 2  | 32 | 12     |
| 3  | 40 | 15     |
| 4  | 45 | 23     |
| 5  | 32 | 15     |
| 6  | 12 | 14     |
| 7  | 40 | 20     |
| 8  | 32 | 18     |
| 9  | 45 | 27     |
| 10 | 12 | 16     |
Desired result :
| ID | no | number | last number |
+----+----+--------+-------------+
| 1  | 40 | 10     | 0           |
| 2  | 32 | 12     | 0           |
| 3  | 40 | 15     | 0           |
| 4  | 45 | 23     | 0           |
| 5  | 32 | 15     | 12          |
| 6  | 12 | 14     | 0           |
| 7  | 40 | 20     | 15          |
| 8  | 32 | 18     | 15          |
| 9  | 45 | 27     | 23          |
| 10 | 12 | 16     | 14          |
My best guess is that you are looking for a script like the one below. Note that under this logic, the row with ID = 3 should get 10 in the 'last number' column, not 0.
SELECT *,
    ISNULL(
        (
            -- number from the closest earlier row with the same "no"
            SELECT number
            FROM your_table C
            WHERE C.ID =
            (
                SELECT MAX(ID) FROM your_table B WHERE B.ID < A.ID AND B.no = A.No
            )
        )
    ,0) [last number] -- 0 when there is no earlier row
FROM your_table A
Output:
ID no number last number
1 40 10 0
2 32 12 0
3 40 15 10
4 45 23 0
5 32 15 12
6 12 14 0
7 40 20 15
8 32 18 15
9 45 27 23
10 12 16 14
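On engines with window functions, the same "previous value within the same no" lookup can probably be written with LAG instead of the nested subqueries. The sketch below runs it against SQLite (3.25 or newer is assumed for window function support) from Python, using the sample rows from the question; the alias last_number is an assumption, and "no" is quoted only because it is a keyword in some dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE your_table (ID INTEGER, "no" INTEGER, number INTEGER)')
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, 40, 10), (2, 32, 12), (3, 40, 15), (4, 45, 23), (5, 32, 15),
     (6, 12, 14), (7, 40, 20), (8, 32, 18), (9, 45, 27), (10, 12, 16)],
)

QUERY = """
SELECT ID, "no", number,
       -- previous number within the same "no", ordered by ID; 0 if none
       LAG(number, 1, 0) OVER (PARTITION BY "no" ORDER BY ID) AS last_number
FROM your_table
ORDER BY ID
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

As with the subquery version, the row with ID = 3 gets 10 here, because ID = 1 is an earlier row with the same no.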

Get the RMSE score while fetching data from the table directly. Write a query for that

I have a database table with many features. Each feature has its own actual and predicted value columns, and there are two more columns, Id_partner and Id_accounts. My main goal is to get the RMSE score for each feature, for each account within each partner. I have done this with a for loop in PySpark, but it takes a very long time to complete. Is there an efficient way to do this directly in a query while reading the data, so that I get the RMSE score for each account in each partner?
My table looks something like this:
Actual_Feature_1 = Act_F_1
Predicted_Feature_1 = Pred_F_1
Actual_Feature_2 = Act_F_2
Predicted_Feature_2 = Pred_F_2
Table 1:
ID_PARTNER | ID_ACCOUNT | Act_F_1 | Pred_F_1 | Act_F_2 | Pred_F_2 |
4 | 24 | 10 | 12 | 22 | 20 |
4 | 24 | 11 | 13 | 23 | 21 |
4 | 24 | 11 | 12 | 24 | 23 |
4 | 25 | 13 | 15 | 22 | 20 |
4 | 25 | 15 | 12 | 21 | 20 |
4 | 25 | 15 | 14 | 21 | 21 |
4 | 27 | 13 | 12 | 35 | 32 |
4 | 27 | 12 | 16 | 34 | 31 |
4 | 27 | 17 | 14 | 36 | 34 |
5 | 301 | 19 | 17 | 56 | 54 |
5 | 301 | 21 | 20 | 58 | 54 |
5 | 301 | 22 | 19 | 59 | 57 |
5 | 301 | 24 | 22 | 46 | 50 |
5 | 301 | 25 | 22 | 49 | 54 |
5 | 350 | 12 | 10 | 67 | 66 |
5 | 350 | 12 | 11 | 65 | 64 |
5 | 350 | 14 | 13 | 68 | 67 |
5 | 350 | 15 | 12 | 61 | 61 |
5 | 350 | 12 | 10 | 63 | 60 |
7 | 420 | 51 | 49 | 30 | 29 |
7 | 420 | 51 | 48 | 32 | 30 |
7 | 410 | 49 | 45 | 81 | 79 |
7 | 410 | 48 | 44 | 83 | 80 |
7 | 410 | 45 | 43 | 84 | 81 |
I need the RMSE score for each account in each partners in this format
Resulted Table :
ID_PARTNER | ID_ACCOUNT | FEATURE_1 | FEATURE_2 |
4 | 24 | rmse_score | rmse_score |
4 | 25 | rmse_score | rmse_score |
4 | 27 | rmse_score | rmse_score |
5 | 301 | rmse_score | rmse_score |
5 | 350 | rmse_score | rmse_score |
7 | 420 | rmse_score | rmse_score |
7 | 410 | rmse_score | rmse_score |
Note: both id_account and id_partner have to be considered together. In the table above id_account alone would be enough to compute the RMSE, but in general different partners can have the same account IDs.
I need an SQL query that produces the result table directly while reading the table from the database.
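Yes, you can calculate the root-mean-square error in SQL. Repeat the expression once per feature:
SELECT ID_PARTNER, ID_ACCOUNT
, SQRT(AVG( POWER(Act_F_1 - Pred_F_1 , 2) ) ) as feature_1_rmse
, SQRT(AVG( POWER(Act_F_2 - Pred_F_2 , 2) ) ) as feature_2_rmse
FROM ...
GROUP BY ID_PARTNER, ID_ACCOUNT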
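As a quick sanity check of the grouped RMSE, here is a small sketch that runs the query against SQLite from Python on the first six rows of the question's table. The table name predictions is an assumption. Because not every SQLite build ships SQRT and POWER, the sketch registers SQRT from Python and squares by multiplication; on a full database engine the query above should work unchanged.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register SQRT explicitly; some SQLite builds lack the math functions.
conn.create_function("SQRT", 1, math.sqrt)
conn.execute(
    "CREATE TABLE predictions (ID_PARTNER INTEGER, ID_ACCOUNT INTEGER, "
    "Act_F_1 INTEGER, Pred_F_1 INTEGER, Act_F_2 INTEGER, Pred_F_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?, ?, ?, ?)",
    [(4, 24, 10, 12, 22, 20), (4, 24, 11, 13, 23, 21), (4, 24, 11, 12, 24, 23),
     (4, 25, 13, 15, 22, 20), (4, 25, 15, 12, 21, 20), (4, 25, 15, 14, 21, 21)],
)

QUERY = """
SELECT ID_PARTNER, ID_ACCOUNT,
       SQRT(AVG((Act_F_1 - Pred_F_1) * (Act_F_1 - Pred_F_1))) AS feature_1_rmse,
       SQRT(AVG((Act_F_2 - Pred_F_2) * (Act_F_2 - Pred_F_2))) AS feature_2_rmse
FROM predictions
GROUP BY ID_PARTNER, ID_ACCOUNT
ORDER BY ID_PARTNER, ID_ACCOUNT
"""

rmse = {(p, a): (f1, f2) for p, a, f1, f2 in conn.execute(QUERY)}
for key, value in rmse.items():
    print(key, value)
```

For account 24 the squared errors of feature 1 are 4, 4 and 1, so the mean is 3 and the RMSE is sqrt(3), which is a convenient value to compare against.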

Dynamic variable T-SQL (using stored procedure)

I have a table-valued function with parameters:
SampleProcedure(@date,@par1,@par2,@par3)
The date variable is an INT, for example:
@date int = 20170102
What I would like to do is iterate through the following days until EOF or a specific, predefined date, so the @date variable should change once the previous iteration is done. The other parameters do not change.
What approach should I take? I was wondering whether I should use cursors, but I don't really understand them at the moment. I'd be thankful if anyone could explain them using this example (iterating through dates stored as ints).
EDIT :
More specific case :
I have got GetDailyUsageReal and GetDailyUsageForecast stored procedures.
GetDailyUsageReal(@date,@par1,@par2)
GetDailyUsageForecast(@date,@par1,@par2)
My input:
DECLARE @date int = 20170102,
@par1 INT = 4000,
@par2 INT = 1
;WITH CTE as (SELECT Hour, SUM(CAST(UsReal AS DECIMAL(19, 6))) / 1000000 as Real, Day
FROM GetDailyUsageReal(@date,@par1,@par2)
Group BY Hour,Day),
CTE2 as (SELECT Hour, SUM(CAST(UsForecast AS DECIMAL(19, 6))) / 1000000 as Forecast, Day
FROM GetDailyUsageForecast(@date,@par1,@par2)
Group BY Hour,Day)
SELECT cte.Hour, Real, cte2.Forecast , cte.Day
FROM CTE
JOIN CTE2 on cte.hour=cte2.hour AND cte.day=cte2.day
ORDER BY cte.hour
The output is :
+------+------+----------+----------+--+
| Hour | Real | Forecast | Day | |
+------+------+----------+----------+--+
| 1 | 10 | 12 | 20170102 | |
| 5 | 24 | 23 | 20170102 | |
| 7 | 24 | 22 | 20170102 | |
| 8 | 27 | 27 | 20170102 | |
| 9 | 26 | 21 | 20170102 | |
| 10 | 21 | 21 | 20170102 | |
| 11 | 11 | 12 | 20170102 | |
| 12 | 25 | 24 | 20170102 | |
| 13 | 17 | 18 | 20170102 | |
| 14 | 18 | 19 | 20170102 | |
| 15 | 26 | 25 | 20170102 | |
| 16 | 22 | 21 | 20170102 | |
| 17 | 23 | 23 | 20170102 | |
| 18 | 24 | 23 | 20170102 | |
| 19 | 19 | 18 | 20170102 | |
| 20 | 10 | 11 | 20170102 | |
| 21 | 11 | 13 | 20170102 | |
| 22 | 18 | 16 | 20170102 | |
| 23 | 19 | 17 | 20170102 | |
| 24 | 11 | 13 | 20170102 | |
+------+------+----------+----------+--+
What I want to get is basically output for the next days, let's say until 2019 (there's some data even for 2019 in my DB).
So what I need is to iterate over the date. I cannot change the data type of @date to DATE.
EDIT 2:
My expected output :
+------+------+----------+----------+--+
| Hour | Real | Forecast | Day | |
+------+------+----------+----------+--+
| 1 | 10 | 12 | 20170102 | |
| 5 | 24 | 23 | 20170102 | |
| 7 | 24 | 22 | 20170102 | |
| 8 | 27 | 27 | 20170102 | |
| 9 | 26 | 21 | 20170102 | |
| 10 | 21 | 21 | 20170102 | |
| 11 | 11 | 12 | 20170102 | |
| 12 | 25 | 24 | 20170102 | |
| 13 | 17 | 18 | 20170102 | |
| 14 | 18 | 19 | 20170102 | |
| 15 | 26 | 25 | 20170102 | |
| 16 | 22 | 21 | 20170102 | |
| 17 | 23 | 23 | 20170102 | |
| 18 | 24 | 23 | 20170102 | |
| 19 | 19 | 18 | 20170102 | |
| 20 | 10 | 11 | 20170102 | |
| 21 | 11 | 13 | 20170102 | |
| 22 | 18 | 16 | 20170102 | |
| 23 | 19 | 17 | 20170102 | |
| 24 | 11 | 13 | 20170102 | |
| 1 | 15 | 14 | 20170103 | |
| 5 | 18 | 11 | 20170103 | |
| 7 | 26 | 44 | 20170103 | |
| 8 | 21 | 33 | 20170103 | |
| 9 | 22 | 12 | 20170103 | |
| 10 | 21 | 21 | 20170103 | |
| 11 | 11 | 12 | 20170103 | |
| 12 | 15 | 12 | 20170103 | |
| 13 | 17 | 18 | 20170103 | |
| 14 | 18 | 19 | 20170103 | |
| 15 | 26 | 25 | 20170103 | |
| 16 | 22 | 21 | 20170103 | |
| 17 | 23 | 23 | 20170103 | |
| 18 | 24 | 23 | 20170103 | |
| 19 | 19 | 18 | 20170103 | |
| 20 | 10 | 11 | 20170103 | |
| 21 | 11 | 13 | 20170103 | |
| 22 | 18 | 16 | 20170103 | |
| 23 | 19 | 17 | 20170103 | |
| 24 | 11 | 13 | 20170103 | |
+------+------+----------+----------+--+
I just want the values for dates within a selected range, or from a selected day until the end of the data, i.e. the last row in the DB by day (so the last day could be, for example, 20210131). I want them in one result table, as shown above.
EDIT after changes:
Output :
+------+-----------+-----------+----------+
| Hour | Real | Forecast | Workdate |
+------+-----------+-----------+----------+
| 20 | 11.831587 | 15.140129 | 20170101 |
| 21 | 11.659364 | 15.003950 | 20170101 |
| 22 | 11.111199 | 14.736179 | 20170101 |
| 23 | 11.075579 | 14.812968 | 20170101 |
| NULL | NULL | NULL | NULL |
| 1 | 9.930323 | 12.856905 | 20170102 |
| 2 | 9.826946 | 12.741908 | 20170102 |
+------+-----------+-----------+----------+
@Pejczi, I have written this logic for you. You need a CTE to build all the dates you are interested in. Then join the table functions with an OUTER APPLY: this ensures that a valid date is passed to each function, which then returns the hour and forecast/real columns for each date.
Let me know how it goes:
DECLARE @StartDate DATE='20170101'
DECLARE @EndDate DATE='20180601'--current_timestamp
-- @par1 and @par2 are declared as in the question
;WITH Dates AS(
-- recursive CTE: one row per day from @StartDate to @EndDate
SELECT Workdate=@StartDate
UNION ALL
SELECT DateAdd(DAY,1,Workdate) FROM Dates WHERE Workdate<@EndDate
)
SELECT *
FROM
Dates D
OUTER APPLY
(
-- style 112 formats the date as yyyymmdd, matching the INT parameter
SELECT Hour, SUM(CAST(UsReal AS DECIMAL(19, 6))) / 1000000 as Real, Day as WorkDate
FROM GetDailyUsageReal(CONVERT(CHAR(8),D.Workdate,112),@par1,@par2)
GROUP BY
Hour,Day
)R
OUTER APPLY
(
SELECT Hour, SUM(CAST(UsForecast AS DECIMAL(19, 6))) / 1000000 as Forecast, Day as WorkDate
FROM GetDailyUsageForecast(CONVERT(CHAR(8),D.Workdate,112),@par1,@par2)
GROUP BY
Hour,Day
)F
ORDER BY
d.Workdate
option (maxrecursion 0);
@openshac's suggestion is valid. You should store the date as a DATE datatype; using StartDate/EndDate will make it easier to query. See if you can replace:
DECLARE @DATE DATE ='20170102'
with
DECLARE @StartDate DATE ='20170102'
DECLARE @EndDate DATE ='20180102'
This version expands on the Dates CTE and adds an hour column, so you can join the real/forecast table functions on the hour.
DECLARE @StartDate DATETIME='20170101'
DECLARE @EndDate DATETIME='20170201'--current_timestamp
-- @par1 and @par2 are declared as in the question
;WITH Dates AS(
-- recursive CTE: one row per hour; WorkHour runs from 1 to 24
SELECT Workdate=@StartDate,WorkHour=DATEPART(HOUR,@StartDate)+1
UNION ALL
SELECT DateAdd(HH,1,Workdate),DATEPART(HOUR,DateAdd(HH,1,Workdate))+1 FROM Dates WHERE Workdate<@EndDate
)
-- qualify the columns: R and F also expose a WorkDate column
SELECT Workdate=CAST(D.Workdate AS date),D.WorkHour,R.Real,F.Forecast
FROM
Dates D
OUTER APPLY
(
SELECT Hour, SUM(CAST(UsReal AS DECIMAL(19, 6))) / 1000000 as Real, Day as WorkDate
FROM GetDailyUsageReal(CONVERT(CHAR(8),D.Workdate,112),@par1,@par2) R
WHERE R.Hour=D.WorkHour
GROUP BY
Hour,Day
)R
OUTER APPLY
(
SELECT Hour, SUM(CAST(UsForecast AS DECIMAL(19, 6))) / 1000000 as Forecast, Day as WorkDate
FROM GetDailyUsageForecast(CONVERT(CHAR(8),D.Workdate,112),@par1,@par2) F
WHERE F.Hour=D.WorkHour
GROUP BY
Hour,Day
)F
option (maxrecursion 0);
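The reason the answers go through a real date type and convert back with style 112 is that an int like 20170131 cannot simply be incremented: 20170131 + 1 is not a valid day. The same idea, sketched outside the database in Python (the function name int_dates is hypothetical), may make the conversion round trip easier to see:

```python
from datetime import datetime, timedelta


def int_dates(start: int, end: int):
    """Yield every calendar day from start to end (inclusive) as a yyyymmdd int."""
    current = datetime.strptime(str(start), "%Y%m%d").date()
    last = datetime.strptime(str(end), "%Y%m%d").date()
    while current <= last:
        # Convert back to the int form that the functions expect.
        yield int(current.strftime("%Y%m%d"))
        current += timedelta(days=1)


days = list(int_dates(20170130, 20170202))
print(days)  # [20170130, 20170131, 20170201, 20170202]
```

Each generated value corresponds to what CONVERT(CHAR(8), D.Workdate, 112) passes to the table functions in the queries above.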

postgres, add row when a value is missing

Forgive what may be a silly question, but I'm not much of a database guru.
I have a table with three columns. Here's a sample:
stationtest | id_date | val_no3
------------+-------------+---------
27 | 1 |
27 | 2 | 7
27 | 25 |
27 | 50 | 8
27 | 75 | 9
27 | 100 | 10
30 | 1 |
30 | 14 | 7
30 | 25 |
30 | 65 | 8
30 | 75 | 9
30 | 100 | 10
I would like a new table that has one row for each missing id_date value, for each stationtest number,
like this one:
stationtest | id_date | val_no3
------------+-------------+---------
27 | 1 |
27 | 2 | 7
27 | 3 |
27 | 4 |
27 | 5 |
27 | 6 |
27 | (...) |
27 | 25 |
27 | 26 |
27 | 27 |
27 | (...) |
27 | 50 | 8
27 | (...) |
27 | 75 | 9
27 | (...) |
27 | 98 |
27 | 99 |
27 | 100 | 10
30 | 1 |
30 | 2 | 7
30 | 3 |
30 | 4 |
30 | 5 |
30 | 6 |
30 | (...) |
30 | 25 |
30 | 26 |
30 | 27 |
30 | (...) |
30 | 50 | 8
30 | 75 | 9
30 | (...) |
30 | 98 |
30 | 99 |
30 | 100 | 10
I have this query, but I don't know how to make it work for each stationtest:
insert into tabletest (id_date)
select i
from generate_series(1, (select max(id_date) from tabletest)) i
left join tabletest on tabletest.id_date = i
where tabletest.id_date is null;
Is it possible? Thank you for your help.
Try this:
DO $$
DECLARE
    st_test integer;
BEGIN
    -- for every station, insert only the id_date values it does not have yet
    FOR st_test IN SELECT DISTINCT stationtest FROM tabletest LOOP
        INSERT INTO tabletest (stationtest, id_date)
        SELECT st_test, g.i
        FROM generate_series((SELECT min(id_date) FROM tabletest),
                             (SELECT max(id_date) FROM tabletest)) AS g(i)
        WHERE NOT EXISTS (SELECT 1 FROM tabletest t
                          WHERE t.stationtest = st_test AND t.id_date = g.i);
    END LOOP;
END;
$$;
I don't have the data handy, but the general format should work.
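The loop can likely be avoided with a single set-based insert: cross join the distinct stations with the full id_date series and keep only the combinations that are missing. The sketch below checks that idea against SQLite from Python with the sample rows from the question. SQLite has no built-in generate_series by default, so a recursive CTE named series stands in for it here; in PostgreSQL, generate_series(1, max) would take its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabletest (stationtest INTEGER, id_date INTEGER, val_no3 INTEGER)")
conn.executemany(
    "INSERT INTO tabletest VALUES (?, ?, ?)",
    [(27, 1, None), (27, 2, 7), (27, 25, None), (27, 50, 8), (27, 75, 9), (27, 100, 10),
     (30, 1, None), (30, 14, 7), (30, 25, None), (30, 65, 8), (30, 75, 9), (30, 100, 10)],
)

conn.execute("""
WITH RECURSIVE series(i) AS (
    -- stand-in for generate_series(1, max(id_date))
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM series WHERE i < (SELECT MAX(id_date) FROM tabletest)
)
INSERT INTO tabletest (stationtest, id_date)
SELECT s.stationtest, series.i
FROM (SELECT DISTINCT stationtest FROM tabletest) s
CROSS JOIN series
LEFT JOIN tabletest t
       ON t.stationtest = s.stationtest AND t.id_date = series.i
WHERE t.id_date IS NULL  -- keep only the missing (station, id_date) pairs
""")

counts = dict(conn.execute(
    "SELECT stationtest, COUNT(*) FROM tabletest GROUP BY stationtest"
).fetchall())
kept = conn.execute(
    "SELECT val_no3 FROM tabletest WHERE stationtest = 30 AND id_date = 14"
).fetchone()[0]
print(counts, kept)
```

Existing rows are left untouched, so measured values such as val_no3 = 7 for station 30 at id_date 14 stay in place and the new rows get NULL.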

SQL Joining 2 Tables

I would like to merge two tables into one and also add a counter next to that. What I have now is:
SELECT [CUCY_DATA].*, [DIM].[Col1], [DIM].[Col2],
(SELECT COUNT([Cut Counter]) FROM [MSD]
WHERE [CUCY_DATA].[Cut Counter] = [MSD].[Cut Counter]
) AS [Nr Of Errors]
FROM [CUCY_DATA] FULL JOIN [DIM]
ON [CUCY_DATA].[Cut Counter] = [DIM].[Cut Counter]
This way the data is merged, but where the values don't match, NULLs are returned. For instance, I want this:
Table CUCY_DATA
|_Cut Counter_|_Data1_|_Data2_|
| 1 | 12 | 24 |
| 2 | 13 | 26 |
| 3 | 10 | 20 |
| 4 | 11 | 22 |
Table DIM
|_Cut Counter_|_Col1_|_Col2_|
| 1 | 25 | 40 |
| 3 | 50 | 45 |
And they need to be merged into:
|_Cut Counter_|_Data1_|_Data2_|_Col1_|_Col2_|
| 1 | 12 | 24 | 25 | 40 |
| 2 | 13 | 26 | 25 | 40 |
| 3 | 10 | 20 | 50 | 45 |
| 4 | 11 | 22 | 50 | 45 |
SO THIS IS WRONG:
|_Cut Counter_|_Data1_|_Data2_|_Col1__|_Col2__|
| 1 | 12 | 24 | 25 | 40 |
| 2 | 13 | 26 | NULL | NULL |
| 3 | 10 | 20 | 50 | 45 |
| 4 | 11 | 22 | NULL | NULL |
Kind regards, Bob
How are you getting the col1 and col2 values where there is no corresponding row in your DIM table? (Rows 2 and 4). Your "wrong" result is exactly correct, that's what the outer join does.
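If the intent is that each CUCY_DATA row should pick up the most recent DIM row at or before its Cut Counter (this is a guess based on the expected output, not something the question states), the join condition can look up the largest matching counter instead of requiring equality. A sketch against SQLite from Python, with the column names simplified to cut_counter, data1, data2, col1 and col2 as an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cucy_data (cut_counter INTEGER, data1 INTEGER, data2 INTEGER)")
conn.execute("CREATE TABLE dim (cut_counter INTEGER, col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO cucy_data VALUES (?, ?, ?)",
                 [(1, 12, 24), (2, 13, 26), (3, 10, 20), (4, 11, 22)])
conn.executemany("INSERT INTO dim VALUES (?, ?, ?)", [(1, 25, 40), (3, 50, 45)])

QUERY = """
SELECT c.cut_counter, c.data1, c.data2, d.col1, d.col2
FROM cucy_data c
LEFT JOIN dim d
       -- join to the latest DIM row whose counter is <= the current counter
       ON d.cut_counter = (SELECT MAX(d2.cut_counter)
                           FROM dim d2
                           WHERE d2.cut_counter <= c.cut_counter)
ORDER BY c.cut_counter
"""

merged = conn.execute(QUERY).fetchall()
for row in merged:
    print(row)
```

Rows that come before the first DIM entry would still get NULLs with this condition, since there is nothing earlier to carry forward.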