Get RMSE score while fetching data from the table directly. Write a query for that - sql

I have a table in the database with many features; each feature has its own actual and predicted value, and there are two more columns, Id_partner and Id_accounts. My main goal is to get the RMSE score for each feature, for each account, in each partner. I have done that with a for loop, but it takes a very long time to complete in PySpark. Is there an efficient way of doing this directly with a query while reading the data, so that I get the RMSE score for each account in each partner?
My Table is something like this
Actual_Feature_1 = Act_F_1
Predicted_Feature_1 = Pred_F_1
Actual_Feature_2 = Act_F_2
Predicted_Feature_2 = Pred_F_2
Table 1:
ID_PARTNER | ID_ACCOUNT | Act_F_1 | Pred_F_1 | Act_F_2 | Pred_F_2 |
4 | 24 | 10 | 12 | 22 | 20 |
4 | 24 | 11 | 13 | 23 | 21 |
4 | 24 | 11 | 12 | 24 | 23 |
4 | 25 | 13 | 15 | 22 | 20 |
4 | 25 | 15 | 12 | 21 | 20 |
4 | 25 | 15 | 14 | 21 | 21 |
4 | 27 | 13 | 12 | 35 | 32 |
4 | 27 | 12 | 16 | 34 | 31 |
4 | 27 | 17 | 14 | 36 | 34 |
5 | 301 | 19 | 17 | 56 | 54 |
5 | 301 | 21 | 20 | 58 | 54 |
5 | 301 | 22 | 19 | 59 | 57 |
5 | 301 | 24 | 22 | 46 | 50 |
5 | 301 | 25 | 22 | 49 | 54 |
5 | 350 | 12 | 10 | 67 | 66 |
5 | 350 | 12 | 11 | 65 | 64 |
5 | 350 | 14 | 13 | 68 | 67 |
5 | 350 | 15 | 12 | 61 | 61 |
5 | 350 | 12 | 10 | 63 | 60 |
7 | 420 | 51 | 49 | 30 | 29 |
7 | 420 | 51 | 48 | 32 | 30 |
7 | 410 | 49 | 45 | 81 | 79 |
7 | 410 | 48 | 44 | 83 | 80 |
7 | 410 | 45 | 43 | 84 | 81 |
I need the RMSE score for each account in each partner, in this format:
Resulted Table :
ID_PARTNER | ID_ACCOUNT | FEATURE_1 | FEATURE_2 |
4 | 24 | rmse_score | rmse_score |
4 | 25 | rmse_score | rmse_score |
4 | 27 | rmse_score | rmse_score |
5 | 301 | rmse_score | rmse_score |
5 | 350 | rmse_score | rmse_score |
7 | 420 | rmse_score | rmse_score |
7 | 410 | rmse_score | rmse_score |
Note: the grouping must consider both id_partner and id_account. Looking at the actual table above, it may seem that id_account alone is enough to compute the RMSE, but different partners can have the same account IDs as other partners.
I need an SQL query that produces the resulting table directly while reading the table from the database.

Yes, you can calculate the root-mean-square error in SQL:
SELECT ID_PARTNER, ID_ACCOUNT,
       SQRT(AVG(POWER(Act_F_1 - Pred_F_1, 2))) AS feature_1_rmse
FROM ...
GROUP BY ID_PARTNER, ID_ACCOUNT
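Extended to both feature columns and aliased to match the requested output, the full query might look like the sketch below (feature_table is a hypothetical name standing in for your actual table):
SELECT
    ID_PARTNER,
    ID_ACCOUNT,
    SQRT(AVG(POWER(Act_F_1 - Pred_F_1, 2))) AS FEATURE_1,
    SQRT(AVG(POWER(Act_F_2 - Pred_F_2, 2))) AS FEATURE_2
FROM feature_table  -- hypothetical table name
GROUP BY ID_PARTNER, ID_ACCOUNT
Because the grouping key is the (ID_PARTNER, ID_ACCOUNT) pair, accounts with the same ID under different partners are aggregated separately, which is exactly the requirement in the note above.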

Related

Select All rows WHERE ARRAY = "type"

I have a problem with a PostgreSQL query.
I have a table named "pokemons" with different columns :
id | name | pv | attaque | defense | attaque_spe | defense_spe | vitesse | type
-----+------------+-----+---------+---------+-------------+-------------+---------+------------------
1 | Bulbizarre | 45 | 49 | 49 | 65 | 65 | 45 | {Plante,Poison}
2 | Herbizarre | 60 | 62 | 63 | 80 | 80 | 60 | {Plante,Poison}
3 | Florizarre | 80 | 82 | 83 | 100 | 100 | 80 | {Plante,Poison}
4 | Salameche | 39 | 52 | 43 | 60 | 50 | 65 | {Feu}
5 | Reptincel | 58 | 64 | 58 | 80 | 65 | 80 | {Feu}
6 | Dracaufeu | 78 | 84 | 78 | 109 | 85 | 100 | {Feu,Vol}
7 | Carapuce | 44 | 48 | 65 | 50 | 64 | 43 | {Eau}
8 | Carabaffe | 59 | 63 | 80 | 65 | 80 | 58 | {Eau}
9 | Tortank | 79 | 83 | 100 | 85 | 105 | 78 | {Eau}
In this table, I want to select all the rows with a specific type.
Here is the kind of thing I tried:
SELECT * FROM pokemons WHERE 'feu'=ANY(type);
I tried many things with ALL and the like, but I can't get a single row.
Can you help me please?
Thanks.
Adding to the solution mentioned in the comment, you may also make a case-insensitive comparison like this:
SELECT * FROM pokemons WHERE 'feu' ilike ANY(type);
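The original query returns no rows because the comparison is case-sensitive: the array stores 'Feu' with a capital F, while the query searches for lowercase 'feu'. With the exact case, plain equality works as well:
SELECT * FROM pokemons WHERE 'Feu' = ANY(type);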

How to reshape a table having multiple records for the same id into a table with one record per id without losing information?

Basically, I want to transform this (Initial) into this (Final). In other words, I want to
"squash" the initial table so that it will have only one record per id
"dilate" the initial table so that I won't lose any information: create a different column for every possible combination of source and column from the initial table (create c1_A, c1_B, ...).
I can work with the initial table as a CSV in Python (maybe Pandas) and manually hardcode the mapping between the Initial and the Final table. However, I don't find this solution elegant at all, and I'm much more interested in a SQL / SAS solution. Is there any way of doing that?
Edit: I want to change
+----+--------+------+-----+------+
| ID | source | c1 | c2 | c3 |
+----+--------+------+-----+------+
| 1 | A | 432 | 56 | 1 |
| 1 | B | 53 | 3 | 73 |
| 1 | C | 7 | 342 | 83 |
| 1 | D | 543 | 43 | 73 |
| 2 | A | 8 | 882 | 39 |
| 2 | B | 5 | 54 | 46 |
| 2 | C | 8 | 3 | 2226 |
| 2 | D | 87 | 2 | 45 |
| 3 | A | 93 | 143 | 45 |
| 3 | B | 1023 | 72 | 8 |
| 3 | C | 3 | 3 | 704 |
| 4 | A | 2 | 5 | 0 |
| 4 | B | 78 | 888 | 2 |
| 4 | C | 87 | 23 | 34 |
| 4 | D | 112 | 7 | 712 |
+----+--------+------+-----+------+
into
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| ID | c1_A | c1_B | c1_C | c1_D | c2_A | c2_B | c2_C | c2_D | c3_A | c3_B | c3_C | c3_D |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| 1 | 432 | 53 | 7 | 543 | 56 | 3 | 342 | 43 | 1 | 73 | 83 | 73 |
| 2 | 8 | 5 | 8 | 87 | 882 | 54 | 3 | 2 | 39 | 46 | 2226 | 45 |
| 3 | 93 | 1023 | 3 | | 143 | 72 | 3 | | 45 | 8 | 704 | |
| 4 | 2 | 78 | 87 | 112 | 5 | 888 | 23 | 7 | 0 | 2 | 34 | 712 |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
Abandon hope ... ?
data have;
  input ID source $ c1 c2 c3;
  datalines;
1 A 432 56 1
1 B 53 3 73
1 C 7 342 83
1 D 543 43 73
2 A 8 882 39
2 B 5 54 46
2 C 8 3 2226
2 D 87 2 45
3 A 93 143 45
3 B 1023 72 8
3 C 3 3 704
4 A 2 5 0
4 B 78 888 2
4 C 87 23 34
4 D 112 7 712
;
* one to grow you oh data;
proc transpose data=have out=stage1;
by id source;
var c1-c3;
run;
* and one to shrink;
proc transpose data=stage1 out=want(drop=_name_) delim=_;
by id;
id _name_ source;
run;
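Since the question also asks about SQL: the same reshape can be written as conditional aggregation in most SQL engines. A sketch, assuming the source table is named initial (a hypothetical name):
SELECT
    ID,
    MAX(CASE WHEN source = 'A' THEN c1 END) AS c1_A,
    MAX(CASE WHEN source = 'B' THEN c1 END) AS c1_B,
    MAX(CASE WHEN source = 'C' THEN c1 END) AS c1_C,
    MAX(CASE WHEN source = 'D' THEN c1 END) AS c1_D,
    MAX(CASE WHEN source = 'A' THEN c2 END) AS c2_A,
    MAX(CASE WHEN source = 'B' THEN c2 END) AS c2_B,
    MAX(CASE WHEN source = 'C' THEN c2 END) AS c2_C,
    MAX(CASE WHEN source = 'D' THEN c2 END) AS c2_D,
    MAX(CASE WHEN source = 'A' THEN c3 END) AS c3_A,
    MAX(CASE WHEN source = 'B' THEN c3 END) AS c3_B,
    MAX(CASE WHEN source = 'C' THEN c3 END) AS c3_C,
    MAX(CASE WHEN source = 'D' THEN c3 END) AS c3_D
FROM initial  -- hypothetical table name
GROUP BY ID
ORDER BY ID;
Missing combinations (for example, ID 3 has no source D) simply come out as NULL, matching the blanks in the desired output.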

Pandas - Grouping Rows With Same Value in Dataframe

Here is the dataframe in question:
|City|District|Population| Code | ID |
| A | 4 | 2000 | 3 | 21 |
| A | 8 | 7000 | 3 | 21 |
| A | 38 | 3000 | 3 | 21 |
| A | 7 | 2000 | 3 | 21 |
| B | 34 | 3000 | 6 | 84 |
| B | 9 | 5000 | 6 | 84 |
| C | 4 | 9000 | 1 | 28 |
| C | 21 | 1000 | 1 | 28 |
| C | 32 | 5000 | 1 | 28 |
| C | 46 | 20 | 1 | 28 |
I want to regroup the population counts by city to have this kind of output:
|City|Population| Code | ID |
| A | 14000 | 3 | 21 |
| B | 8000 | 6 | 84 |
| C | 15020 | 1 | 28 |
You can group by 'City', 'Code' and 'ID', then take the sum of 'Population'. Passing as_index=False keeps the group keys as regular columns, so the result matches the desired table layout:
df = df.groupby(['City', 'Code', 'ID'], as_index=False)['Population'].sum()

Checking for Consecutive 12 Weeks of 0 Sales

I have a table with customer_number, week, and sales. I need to check if there were 12 consecutive weeks of no sales for each customer and create a flag of 0/1.
I can check the last 12 weeks or a certain time frame, but what's the best way to check for consecutive runs? Here is the code I have so far:
select * from weekly_sales
where customer_nbr in (123, 234)
and week < '2015-11-01'
and week > '2014-11-01'
order by customer_nbr, week
;
Here is a simplified version that only needs a week_id and sales:
SELECT S1.weekid start_week, MAX(S2.weekid) end_week, SUM(S2.sales) total_sales
FROM Sales S1
JOIN Sales S2
ON S2.weekid BETWEEN S1.weekid and S1.weekid + 11
WHERE S1.weekid BETWEEN 1 and 25 -- your search range
GROUP BY S1.weekid
Let me know if that works for you.
OUTPUT
| start_week | end_week | total_sales |
|------------|----------|-------------|
| 1 | 12 | 12 |
| 2 | 13 | 8 |
| 3 | 14 | 3 |
| 4 | 15 | 2 |
| 5 | 16 | 0 | <-
| 6 | 17 | 0 | <- no sales for 12 weeks
| 7 | 18 | 0 | <-
| 8 | 19 | 4 |
| 9 | 20 | 9 |
| 10 | 21 | 11 |
| 11 | 22 | 15 |
| 12 | 23 | 71 |
| 13 | 24 | 78 |
| 14 | 25 | 86 |
| 15 | 25 | 86 | <- less than 12 weeks in range
| 16 | 25 | 86 | <- from this line down
| 17 | 25 | 86 |
| 18 | 25 | 86 |
| 19 | 25 | 86 |
| 20 | 25 | 82 |
| 21 | 25 | 77 |
| 22 | 25 | 75 |
| 23 | 25 | 71 |
| 24 | 25 | 15 |
| 25 | 25 | 8 |
Your final query should have
HAVING SUM (S2.sales) = 0
AND COUNT(*) = 12
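Putting the pieces together, a sketch of the full query under the same simplified schema (contiguous week_id, single customer):
SELECT S1.weekid AS start_week,
       MAX(S2.weekid) AS end_week
FROM Sales S1
JOIN Sales S2
  ON S2.weekid BETWEEN S1.weekid AND S1.weekid + 11
WHERE S1.weekid BETWEEN 1 AND 25   -- your search range
GROUP BY S1.weekid
HAVING SUM(S2.sales) = 0
   AND COUNT(*) = 12               -- window must contain exactly 12 weeks
Each row returned marks the start of a 12-consecutive-week run with no sales. For the original multi-customer table, adding customer_nbr to the join condition and the GROUP BY yields the per-customer runs, from which the 0/1 flag can be derived.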
You could also use BETWEEN 'week' AND 'week' for the date range, and COUNT(column) to improve performance; then you only have to check whether the result is bigger than 0.

How to SUM from MySQL for every n record

I have a following result from query:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
+---------------+------+------+------+------+------+------+------+-------+
I would like to insert a SUM row before each change of order_main_id, so it would look like this result:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
| | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 |
+---------------+------+------+------+------+------+------+------+-------+
How can I make this possible?
You'll need to write a second query that makes use of GROUP BY order_main_id. Something like:
SELECT SUM(S41 + ...) FROM yourTable GROUP BY order_main_id
You can actually do this in one query, but with a union all (really two queries, but the result sets are combined to make one awesome result set):
select
    order_main_id,
    S36,
    S37,
    S38,
    S39,
    S40,
    S41,
    S42,
    -- coalesce keeps a NULL size column from nulling out the whole row total
    coalesce(S36, 0) + coalesce(S37, 0) + coalesce(S38, 0) + coalesce(S39, 0)
      + coalesce(S40, 0) + coalesce(S41, 0) + coalesce(S42, 0) as total,
    'Detail' as rowtype
from
    tblA
union all
select
    order_main_id,
    sum(S36),
    sum(S37),
    sum(S38),
    sum(S39),
    sum(S40),
    sum(S41),
    sum(S42),
    sum(coalesce(S36, 0) + coalesce(S37, 0) + coalesce(S38, 0) + coalesce(S39, 0)
        + coalesce(S40, 0) + coalesce(S41, 0) + coalesce(S42, 0)),
    'Summary' as rowtype
from
    tblA
group by
    order_main_id
order by
    order_main_id, rowtype
Remember that the order by affects the entirety of the union all, not just the last query. So, your resultset would look like this:
+---------------+------+------+------+------+------+------+------+-------+---------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total | rowtype |
+---------------+------+------+------+------+------+------+------+-------+---------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 | Detail |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 | Detail |
| 26 | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 | Summary |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 | Detail |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 | Detail |
| 35 | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 | Summary |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 | Summary |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 | Detail |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 | Detail |
| 39 | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 | Summary |
+---------------+------+------+------+------+------+------+------+-------+---------+
This way, you know what is and what isn't a detail or summary row, and the order_main_id that it's for. You could always (and probably should) hide this column in your presentation layer.
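As an alternative sketch, MySQL's WITH ROLLUP modifier can produce the subtotal rows in a single grouped query, provided you also group by a column that is unique per detail row (here a hypothetical per-row primary key named id):
SELECT
    order_main_id,
    SUM(S36) AS S36, SUM(S37) AS S37, SUM(S38) AS S38, SUM(S39) AS S39,
    SUM(S40) AS S40, SUM(S41) AS S41, SUM(S42) AS S42,
    SUM(COALESCE(S36, 0) + COALESCE(S37, 0) + COALESCE(S38, 0) + COALESCE(S39, 0)
        + COALESCE(S40, 0) + COALESCE(S41, 0) + COALESCE(S42, 0)) AS total
FROM tblA
GROUP BY order_main_id, id WITH ROLLUP  -- id is a hypothetical unique key
Grouping by the unique key keeps the detail rows intact (each group is one row), while WITH ROLLUP appends a subtotal row after each order_main_id group, plus one grand-total row at the very end that you can ignore or filter out.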
For things like these, I think you should use a reporting library (such as Crystal Reports); it'll save you a lot of trouble. Check JasperReports and similar projects on osalt.