postgres, add row when a value is missing - sql

Forgive what may be a silly question, but I'm not much of a database guru.
I have a table with three columns. Here's a sample:
stationtest | id_date | val_no3
------------+-------------+---------
27 | 1 |
27 | 2 | 7
27 | 25 |
27 | 50 | 8
27 | 75 | 9
27 | 100 | 10
30 | 1 |
30 | 14 | 7
30 | 25 |
30 | 65 | 8
30 | 75 | 9
30 | 100 | 10
I would like a new table that has one row for every missing id_date value, for each stationtest number,
like this:
stationtest | id_date | val_no3
------------+-------------+---------
27 | 1 |
27 | 2 | 7
27 | 3 |
27 | 4 |
27 | 5 |
27 | 6 |
27 | (...) |
27 | 25 |
27 | 26 |
27 | 27 |
27 | (...) |
27 | 50 | 8
27 | (...) |
27 | 75 | 9
27 | (...) |
27 | 98 |
27 | 99 |
27 | 100 | 10
30 | 1 |
30 | 2 | 7
30 | 3 |
30 | 4 |
30 | 5 |
30 | 6 |
30 | (...) |
30 | 25 |
30 | 26 |
30 | 27 |
30 | (...) |
30 | 50 | 8
30 | 75 | 9
30 | (...) |
30 | 98 |
30 | 99 |
30 | 100 | 10
I have this query, but I don't know how to make it work for each stationtest:
insert into tabletest (id_date)
select i
from generate_series(1, (select max(id_date) from tabletest)) i
left join tabletest on tabletest.id_date = i
where tabletest.id_date is null;
Is it possible? Thank you for your help.

Try this:
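DO $$
DECLARE
    st_test integer;
BEGIN
    -- Loop over every station present in the table
    FOR st_test IN SELECT DISTINCT stationtest FROM tabletest LOOP
        -- Insert only the id_date values this station does not have yet
        INSERT INTO tabletest (stationtest, id_date)
        SELECT st_test, g.id_date
        FROM generate_series(
                 (SELECT min(id_date) FROM tabletest),
                 (SELECT max(id_date) FROM tabletest)) AS g(id_date)
        WHERE NOT EXISTS (
            SELECT 1 FROM tabletest t
            WHERE t.stationtest = st_test AND t.id_date = g.id_date);
    END LOOP;
END;
$$;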
I don't have the data handy, but the general format should work.
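The same gap filling can probably be done without a loop, as a single set-based insert: cross join the distinct stations with the full id_date range, then keep only the combinations that have no row yet. Below is a minimal sketch of that idea, run against an in-memory SQLite database from Python so it is self-contained. The recursive CTE named `series` is an assumption standing in for PostgreSQL's `generate_series`; on Postgres you would likely use `generate_series` directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tabletest (stationtest INTEGER, id_date INTEGER, val_no3 INTEGER)"
)
# Sample rows from the question
conn.executemany(
    "INSERT INTO tabletest VALUES (?, ?, ?)",
    [(27, 1, None), (27, 2, 7), (27, 25, None), (27, 50, 8), (27, 75, 9), (27, 100, 10),
     (30, 1, None), (30, 14, 7), (30, 25, None), (30, 65, 8), (30, 75, 9), (30, 100, 10)],
)

# "series" replaces generate_series(min, max); this is an adaptation for SQLite.
fill_gaps = """
WITH RECURSIVE series(id_date) AS (
    SELECT MIN(id_date) FROM tabletest
    UNION ALL
    SELECT id_date + 1 FROM series
    WHERE id_date < (SELECT MAX(id_date) FROM tabletest)
)
INSERT INTO tabletest (stationtest, id_date)
SELECT s.stationtest, series.id_date
FROM (SELECT DISTINCT stationtest FROM tabletest) AS s
CROSS JOIN series
LEFT JOIN tabletest AS t
       ON t.stationtest = s.stationtest AND t.id_date = series.id_date
WHERE t.id_date IS NULL          -- keep only (station, id_date) pairs with no row
"""
conn.execute(fill_gaps)

# Row count per station after filling the gaps
counts = dict(
    conn.execute("SELECT stationtest, COUNT(*) FROM tabletest GROUP BY stationtest")
)
print(counts)
```

Existing rows keep their val_no3 values; the inserted rows get NULL there.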

Related

Reset sum when condition is met in Oracle

My data is structured as follows:
Timestamp | Hour | Count
--------------------------
20190801 01 | 1 | 10
20190801 02 | 2 | 20
20190801 03 | 3 | 10
20190801 04 | 4 | 5
20190801 05 | 5 | 15
20190801 06 | 6 | 10
20190802 01 | 1 | 5
20190802 02 | 2 | 20
20190802 03 | 3 | 5
20190802 04 | 4 | 15
20190802 05 | 5 | 20
20190802 06 | 6 | 5
20190803 01 | 1 | 30
I'm trying to make an SQL query that will calculate a running SUM but resets when the hour is 3. The result should look like this:
Hour | Count | SUM
------------------
1 | 10 | 10
2 | 20 | 30
3 | 10 | 10 /* RESET */
4 | 5 | 15
5 | 15 | 30
6 | 10 | 40
1 | 5 | 45
2 | 20 | 65
3 | 5 | 5 /* RESET */
4 | 15 | 20
5 | 20 | 40
6 | 5 | 45
1 | 30 | 75
You could create subgroups using a conditional sum:
WITH cte AS (
SELECT t.*,SUM(CASE WHEN hour=3 THEN 1 ELSE 0 END) OVER(ORDER BY timestamp) grp
FROM t
)
SELECT cte.*, SUM(Count) OVER(PARTITION BY grp ORDER BY timestamp) AS total
FROM cte
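To see the conditional-sum grouping at work, here is a small sketch that runs essentially the same query on the sample data, using an in-memory SQLite database from Python. The column names `ts` and `cnt` are assumptions that replace Timestamp and Count to avoid keyword clashes; the logic is otherwise unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT, hour INTEGER, cnt INTEGER)")
# Sample rows from the question
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("20190801 01", 1, 10), ("20190801 02", 2, 20), ("20190801 03", 3, 10),
     ("20190801 04", 4, 5), ("20190801 05", 5, 15), ("20190801 06", 6, 10),
     ("20190802 01", 1, 5), ("20190802 02", 2, 20), ("20190802 03", 3, 5),
     ("20190802 04", 4, 15), ("20190802 05", 5, 20), ("20190802 06", 6, 5),
     ("20190803 01", 1, 30)],
)

query = """
WITH cte AS (
    -- grp increases by one every time hour = 3 is reached
    SELECT t.*,
           SUM(CASE WHEN hour = 3 THEN 1 ELSE 0 END) OVER (ORDER BY ts) AS grp
    FROM t
)
-- the running sum restarts inside each grp
SELECT hour, cnt, SUM(cnt) OVER (PARTITION BY grp ORDER BY ts) AS total
FROM cte
ORDER BY ts
"""
totals = [row[2] for row in conn.execute(query)]
print(totals)
# [10, 30, 10, 15, 30, 40, 45, 65, 5, 20, 40, 45, 75]
```

The totals match the SUM column requested in the question, with the reset landing on each hour 3 row.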

Integrate row information with previous rows SQL

I need to integrate row information with previous rows
| ID | no | number |
+----+----+--------+
| 1  | 40 | 10     |
| 2  | 32 | 12     |
| 3  | 40 | 15     |
| 4  | 45 | 23     |
| 5  | 32 | 15     |
| 6  | 12 | 14     |
| 7  | 40 | 20     |
| 8  | 32 | 18     |
| 9  | 45 | 27     |
| 10 | 12 | 16     |
Desired result :
| ID | no | number | last number |
+----+----+--------+-------------+
| 1  | 40 | 10     | 0           |
| 2  | 32 | 12     | 0           |
| 3  | 40 | 15     | 0           |
| 4  | 45 | 23     | 0           |
| 5  | 32 | 15     | 12          |
| 6  | 12 | 14     | 0           |
| 7  | 40 | 20     | 15          |
| 8  | 32 | 18     | 15          |
| 9  | 45 | 27     | 23          |
| 10 | 12 | 16     | 14          |
My best guess is that you are looking for a script like the one below. Note that by this logic, the row with id = 3 should get 10 in the 'last number' column, not 0.
SELECT *,
ISNULL
(
(
SELECT number
FROM your_table C
WHERE C.ID =
(
SELECT MAX(ID) FROM your_table B WHERE B.ID < A.ID AND B.no = A.No
)
)
,0) [last number]
FROM your_table A
Output:
ID no number last number
1 40 10 0
2 32 12 0
3 40 15 10
4 45 23 0
5 32 15 12
6 12 14 0
7 40 20 15
8 32 18 15
9 45 27 23
10 12 16 14
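On engines with window functions, the same "previous number for the same no" lookup can likely be written with LAG instead of nested subqueries. This is a sketch only, run on an in-memory SQLite database from Python; the table name `your_table` comes from the answer above, and `last_number` is an assumed column alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE your_table (id INTEGER, "no" INTEGER, number INTEGER)')
# Sample rows from the question
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, 40, 10), (2, 32, 12), (3, 40, 15), (4, 45, 23), (5, 32, 15),
     (6, 12, 14), (7, 40, 20), (8, 32, 18), (9, 45, 27), (10, 12, 16)],
)

query = """
SELECT id, "no", number,
       -- previous number within the same "no", or 0 when there is none
       LAG(number, 1, 0) OVER (PARTITION BY "no" ORDER BY id) AS last_number
FROM your_table
ORDER BY id
"""
last_numbers = [row[3] for row in conn.execute(query)]
print(last_numbers)
# [0, 0, 10, 0, 12, 0, 15, 15, 23, 14]
```

This reproduces the output listed above, including 10 for the row with id = 3.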

Get RMSE score while fetching data from the table directly. Write a query for that

I have a database table with many features; each feature has its own actual and predicted value columns, plus two more columns, Id_partner and Id_accounts. My main goal is to get the RMSE score for each feature, for each account within each partner. I have done this with a for loop in PySpark, but it takes a very long time to complete. Is there a more efficient way to do this directly in a query while reading the data, so that I get the RMSE score for each account in each partner?
My Table is something like this
Actual_Feature_1 = Act_F_1
Predicted_Feature_1 = Pred_F_1
Actual_Feature_2 = Act_F_2
Predicted_Feature_2 = Pred_F_2
Table 1:
ID_PARTNER | ID_ACCOUNT | Act_F_1 | Pred_F_1 | Act_F_2 | Pred_F_2 |
4 | 24 | 10 | 12 | 22 | 20 |
4 | 24 | 11 | 13 | 23 | 21 |
4 | 24 | 11 | 12 | 24 | 23 |
4 | 25 | 13 | 15 | 22 | 20 |
4 | 25 | 15 | 12 | 21 | 20 |
4 | 25 | 15 | 14 | 21 | 21 |
4 | 27 | 13 | 12 | 35 | 32 |
4 | 27 | 12 | 16 | 34 | 31 |
4 | 27 | 17 | 14 | 36 | 34 |
5 | 301 | 19 | 17 | 56 | 54 |
5 | 301 | 21 | 20 | 58 | 54 |
5 | 301 | 22 | 19 | 59 | 57 |
5 | 301 | 24 | 22 | 46 | 50 |
5 | 301 | 25 | 22 | 49 | 54 |
5 | 350 | 12 | 10 | 67 | 66 |
5 | 350 | 12 | 11 | 65 | 64 |
5 | 350 | 14 | 13 | 68 | 67 |
5 | 350 | 15 | 12 | 61 | 61 |
5 | 350 | 12 | 10 | 63 | 60 |
7 | 420 | 51 | 49 | 30 | 29 |
7 | 420 | 51 | 48 | 32 | 30 |
7 | 410 | 49 | 45 | 81 | 79 |
7 | 410 | 48 | 44 | 83 | 80 |
7 | 410 | 45 | 43 | 84 | 81 |
I need the RMSE score for each account in each partners in this format
Resulted Table :
ID_PARTNER | ID_ACCOUNT | FEATURE_1 | FEATURE_2 |
4 | 24 | rmse_score | rmse_score |
4 | 25 | rmse_score | rmse_score |
4 | 27 | rmse_score | rmse_score |
5 | 301 | rmse_score | rmse_score |
5 | 350 | rmse_score | rmse_score |
7 | 420 | rmse_score | rmse_score |
7 | 410 | rmse_score | rmse_score |
Note: both id_account and id_partner must be taken into account. In the table above, id_account alone would be enough to compute the RMSE, but different partners can have the same account IDs.
I need an SQL query that provides the resulted table directly while reading the table from the database.
Yes, you can calculate the root-mean-square-error in SQL.
SELECT ID_PARTNER, ID_ACCOUNT
, SQRT(Avg( POWER(Act_F_1 - Pred_F_1 , 2) ) ) as feature_1_rmse
FROM ...
GROUP BY ID_PARTNER, ID_ACCOUNT
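Extending that query to several features is mostly a matter of repeating the expression per feature. Here is a sketch on a few of the sample rows, using an in-memory SQLite database from Python. The table name `predictions` is hypothetical, and SQRT is registered from Python because not every SQLite build ships math functions; in Spark SQL or most databases SQRT should be built in.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register SQRT explicitly; an assumption needed only for this SQLite harness.
conn.create_function("SQRT", 1, math.sqrt)
conn.execute(
    "CREATE TABLE predictions (id_partner INTEGER, id_account INTEGER, "
    "act_f_1 INTEGER, pred_f_1 INTEGER, act_f_2 INTEGER, pred_f_2 INTEGER)"
)
# A subset of the sample rows from the question
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?, ?, ?, ?)",
    [(4, 24, 10, 12, 22, 20), (4, 24, 11, 13, 23, 21), (4, 24, 11, 12, 24, 23),
     (4, 25, 13, 15, 22, 20), (4, 25, 15, 12, 21, 20), (4, 25, 15, 14, 21, 21),
     (7, 420, 51, 49, 30, 29), (7, 420, 51, 48, 32, 30)],
)

query = """
SELECT id_partner, id_account,
       -- RMSE = square root of the mean squared error, one expression per feature
       SQRT(AVG((act_f_1 - pred_f_1) * (act_f_1 - pred_f_1))) AS feature_1,
       SQRT(AVG((act_f_2 - pred_f_2) * (act_f_2 - pred_f_2))) AS feature_2
FROM predictions
GROUP BY id_partner, id_account
ORDER BY id_partner, id_account
"""
rmse = {(p, a): (f1, f2) for p, a, f1, f2 in conn.execute(query)}
for key, scores in rmse.items():
    print(key, scores)
```

Grouping by both id_partner and id_account keeps accounts with the same number under different partners separate.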

Checking for Consecutive 12 Weeks of 0 Sales

I have a table with customer_number, week, and sales. I need to check if there were 12 consecutive weeks of no sales for each customer and create a flag of 0/1.
I can check the last 12 weeks or a certain time frame, but what's the best way to check for consecutive runs? Here is the code I have so far:
select * from weekly_sales
where customer_nbr in (123, 234)
and week < '2015-11-01'
and week > '2014-11-01'
order by customer_nbr, week
;
Here is a simplified version that only needs a week id and sales:
SELECT S1.weekid start_week, MAX(S2.weekid) end_week, SUM (S2.sales)
FROM Sales S1
JOIN Sales S2
ON S2.weekid BETWEEN S1.weekid and S1.weekid + 11
WHERE S1.weekid BETWEEN 1 and 25 -- your search range
GROUP BY S1.weekid
Let me know if that works for you.
OUTPUT
| start_week | end_week | |
|------------|----------|----|
| 1 | 12 | 12 |
| 2 | 13 | 8 |
| 3 | 14 | 3 |
| 4 | 15 | 2 |
| 5 | 16 | 0 | <-
| 6 | 17 | 0 | <- no sales for 12 week
| 7 | 18 | 0 | <-
| 8 | 19 | 4 |
| 9 | 20 | 9 |
| 10 | 21 | 11 |
| 11 | 22 | 15 |
| 12 | 23 | 71 |
| 13 | 24 | 78 |
| 14 | 25 | 86 |
| 15 | 25 | 86 | < - less than 12 week range
| 16 | 25 | 86 | < - below this line
| 17 | 25 | 86 |
| 18 | 25 | 86 |
| 19 | 25 | 86 |
| 20 | 25 | 82 |
| 21 | 25 | 77 |
| 22 | 25 | 75 |
| 23 | 25 | 71 |
| 24 | 25 | 15 |
| 25 | 25 | 8 |
Your final query should have
HAVING SUM (S2.sales) = 0
AND COUNT(*) = 12
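To turn the window check into a 0/1 flag per customer, the self-join can be extended with the customer number. The sketch below runs on an in-memory SQLite database from Python with made-up data. It assumes an integer `weekid` with exactly one row per customer per week and non-negative sales (so a sum of 0 means every week was 0); the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_sales (customer_nbr INTEGER, weekid INTEGER, sales INTEGER)"
)
# Hypothetical data: customer 123 has 12 zero weeks (5..16), customer 234 only 11 (5..15)
rows = [(123, w, 0 if 5 <= w <= 16 else 5) for w in range(1, 21)]
rows += [(234, w, 0 if 5 <= w <= 15 else 5) for w in range(1, 21)]
conn.executemany("INSERT INTO weekly_sales VALUES (?, ?, ?)", rows)

query = """
WITH zero_windows AS (
    -- every 12-week window, per customer, whose sales add up to 0
    SELECT s1.customer_nbr, s1.weekid AS start_week
    FROM weekly_sales AS s1
    JOIN weekly_sales AS s2
      ON s2.customer_nbr = s1.customer_nbr
     AND s2.weekid BETWEEN s1.weekid AND s1.weekid + 11
    GROUP BY s1.customer_nbr, s1.weekid
    HAVING SUM(s2.sales) = 0
       AND COUNT(*) = 12
)
SELECT c.customer_nbr,
       CASE WHEN EXISTS (SELECT 1 FROM zero_windows AS w
                         WHERE w.customer_nbr = c.customer_nbr)
            THEN 1 ELSE 0 END AS no_sales_flag
FROM (SELECT DISTINCT customer_nbr FROM weekly_sales) AS c
ORDER BY c.customer_nbr
"""
flags = dict(conn.execute(query))
print(flags)
# {123: 1, 234: 0}
```

If weeks can be missing from the table, the COUNT(*) = 12 condition rejects those windows, so gaps would need to be filled first.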
You could filter with BETWEEN on the week column, and also use COUNT(column) to improve performance.
Then you only have to check whether the result is greater than 0.

SQL Joining 2 Tables

I would like to merge two tables into one and also add a counter next to that. What I have now is:
SELECT [CUCY_DATA].*, [DIM].[Col1], [DIM].[Col2],
(SELECT COUNT([Cut Counter]) FROM [MSD]
WHERE [CUCY_DATA].[Cut Counter] = [MSD].[Cut Counter]
) AS [Nr Of Errors]
FROM [CUCY_DATA] FULL JOIN [DIM]
ON [CUCY_DATA].[Cut Counter] = [DIM].[Cut Counter]
This way the data is joined, but where the values don't match, NULLs are returned. For instance, I want this:
Table CUCY_DATA
|_Cut Counter_|_Data1_|_Data2_|
| 1 | 12 | 24 |
| 2 | 13 | 26 |
| 3 | 10 | 20 |
| 4 | 11 | 22 |
Table DIM
|_Cut Counter_|_Col1_|_Col2_|
| 1 | 25 | 40 |
| 3 | 50 | 45 |
And they need to be merged into:
|_Cut Counter_|_Data1_|_Data2_|_Col1_|_Col2_|
| 1 | 12 | 24 | 25 | 40 |
| 2 | 13 | 26 | 25 | 40 |
| 3 | 10 | 20 | 50 | 45 |
| 4 | 11 | 22 | 50 | 45 |
SO THIS IS WRONG:
|_Cut Counter_|_Data1_|_Data2_|_Col1__|_Col2__|
| 1 | 12 | 24 | 25 | 40 |
| 2 | 13 | 26 | NULL | NULL |
| 3 | 10 | 20 | 50 | 45 |
| 4 | 11 | 22 | NULL | NULL |
Kind regards, Bob
How are you getting the col1 and col2 values where there is no corresponding row in your DIM table? (Rows 2 and 4). Your "wrong" result is exactly correct, that's what the outer join does.
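If the intent is to carry forward the most recent DIM row (the one with the largest Cut Counter not exceeding the current one), which is only a guess based on the desired output, then joining on a correlated MAX would produce it. A sketch on an in-memory SQLite database from Python, with `cut_counter` and lowercase column names as assumed stand-ins for the bracketed originals, and without the error counter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cucy_data (cut_counter INTEGER, data1 INTEGER, data2 INTEGER)")
conn.execute("CREATE TABLE dim (cut_counter INTEGER, col1 INTEGER, col2 INTEGER)")
# Sample rows from the question
conn.executemany("INSERT INTO cucy_data VALUES (?, ?, ?)",
                 [(1, 12, 24), (2, 13, 26), (3, 10, 20), (4, 11, 22)])
conn.executemany("INSERT INTO dim VALUES (?, ?, ?)", [(1, 25, 40), (3, 50, 45)])

query = """
SELECT c.cut_counter, c.data1, c.data2, d.col1, d.col2
FROM cucy_data AS c
LEFT JOIN dim AS d
       -- match the latest dim row at or before this cut counter
       ON d.cut_counter = (SELECT MAX(d2.cut_counter)
                           FROM dim AS d2
                           WHERE d2.cut_counter <= c.cut_counter)
ORDER BY c.cut_counter
"""
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

Rows that come before the first DIM entry would still get NULLs, since there is nothing to carry forward.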