How to reshape a table having multiple records for the same id into a table with one record per id without losing information? - sql

Basically, I want to transform this(Initial) into this(Final). In other words, I want to
"squash" the initial table so that it will have only one record per id
"dilate" the initial table so that I won't lose any information: create a different column for every possible combination of source and column from the initial table (create c1_A, c1_B, ...).
I can work with the initial table as a csv in Python (maybe Pandas) and manually hardcode the mapping between the Initial and the Final table. However, I don't find this solution elegant at all and I'm much more interested in a sql / sas solution. Is there any way of doing that?
Edit: I what to change
+----+--------+------+-----+------+
| ID | source | c1 | c2 | c3 |
+----+--------+------+-----+------+
| 1 | A | 432 | 56 | 1 |
| 1 | B | 53 | 3 | 73 |
| 1 | C | 7 | 342 | 83 |
| 1 | D | 543 | 43 | 73 |
| 2 | A | 8 | 882 | 39 |
| 2 | B | 5 | 54 | 46 |
| 2 | C | 8 | 3 | 2226 |
| 2 | D | 87 | 2 | 45 |
| 3 | A | 93 | 143 | 45 |
| 3 | B | 1023 | 72 | 8 |
| 3 | C | 3 | 3 | 704 |
| 4 | A | 2 | 5 | 0 |
| 4 | B | 78 | 888 | 2 |
| 4 | C | 87 | 23 | 34 |
| 4 | D | 112 | 7 | 712 |
+----+--------+------+-----+------+
into
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| ID | c1_A | c1_B | c1_C | c1_D | c2_A | c2_B | c2_C | c2_D | c3_A | c3_B | c3_C | c3_D |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| 1 | 432 | 53 | 7 | 543 | 56 | 3 | 342 | 43 | 1 | 73 | 83 | 73 |
| 2 | 8 | 5 | 8 | 87 | 882 | 54 | 3 | 2 | 39 | 46 | 2226 | 45 |
| 3 | 93 | 1023 | 3 | | 143 | 72 | 3 | | 45 | 8 | 704 | |
| 4 | 2 | 78 | 87 | 112 | 5 | 888 | 23 | 7 | 0 | 2 | 34 | 712 |
+----+------+------+------+------+------+------+------+------+------+------+------+------+

Abandon hope ... ?
data want;
input
ID source $ c1 c2 c3;datalines;
1 A 432 56 1
1 B 53 3 73
1 C 7 342 83
1 D 543 43 73
2 A 8 882 39
2 B 5 54 46
2 C 8 3 2226
2 D 87 2 45
3 A 93 143 45
3 B 1023 72 8
3 C 3 3 704
4 A 2 5 0
4 B 78 888 2
4 C 87 23 34
4 D 112 7 712
;
* one to grow you oh data;
proc transpose data=want out=stage1;
by id source;
var c1-c3;
run;
* and one to shrink;
proc transpose data=stage1 out=want(drop=_name_) delim=_;
by id;
id _name_ source;
run;

Related

Get RMSE score while fetching data from the Table directly.Write a query for that

I have a table in the Database which has many features each feature is having its own actual and predicted value in its and we have two more column which is Id_partner and Id_accounts.My main goal is to get the RMSE score for each feature for each accounts in each partners, I have done that with the for loop but it is taking hell lot of time to complete in PySpark is there an efficient way of doing that directly with the help of query while reading the data only so I get the RMSE score for each accounts in each partner.
My Table is something like this
Actual_Feature_1 = Act_F_1
Predicted_Feature_1 = Pred_F_1
Actual_Feature_1 = Act_F_2
Predicted_Feature_1 = Pred_F_2
Table 1:
ID_PARTNER | ID_ACCOUNT | Act_F_1 | Pred_F_1 | Act_F_2 | Pred_F_2 |
4 | 24 | 10 | 12 | 22 | 20 |
4 | 24 | 11 | 13 | 23 | 21 |
4 | 24 | 11 | 12 | 24 | 23 |
4 | 25 | 13 | 15 | 22 | 20 |
4 | 25 | 15 | 12 | 21 | 20 |
4 | 25 | 15 | 14 | 21 | 21 |
4 | 27 | 13 | 12 | 35 | 32 |
4 | 27 | 12 | 16 | 34 | 31 |
4 | 27 | 17 | 14 | 36 | 34 |
5 | 301 | 19 | 17 | 56 | 54 |
5 | 301 | 21 | 20 | 58 | 54 |
5 | 301 | 22 | 19 | 59 | 57 |
5 | 301 | 24 | 22 | 46 | 50 |
5 | 301 | 25 | 22 | 49 | 54 |
5 | 350 | 12 | 10 | 67 | 66 |
5 | 350 | 12 | 11 | 65 | 64 |
5 | 350 | 14 | 13 | 68 | 67 |
5 | 350 | 15 | 12 | 61 | 61 |
5 | 350 | 12 | 10 | 63 | 60 |
7 | 420 | 51 | 49 | 30 | 29 |
7 | 420 | 51 | 48 | 32 | 30 |
7 | 410 | 49 | 45 | 81 | 79 |
7 | 410 | 48 | 44 | 83 | 80 |
7 | 410 | 45 | 43 | 84 | 81 |
I need the RMSE score for each account in each partners in this format
Resulted Table :
ID_PARTNER | ID_ACCOUNT | FEATURE_1 | FEATURE_2 |
4 | 24 | rmse_score | rmse_score |
4 | 25 | rmse_score | rmse_score |
4 | 27 | rmse_score | rmse_score |
5 | 301 | rmse_score | rmse_score |
5 | 350 | rmse_score | rmse_score |
7 | 420 | rmse_score | rmse_score |
7 | 410 | rmse_score | rmse_score |
Note : For this we need to do consideration of both id_account and id_partner by seeing the above table i.e actual table we see that id_accounts can be just used for getting rmse but different id_partner can have the same accounts as other partner is having.
I need an SQL query that provides the resulted table directly while reading the table from the database.
Yes, you can calculate the root-mean-square-error in SQL.
SELECT ID_PARTNER, ID_ACCOUNT
, SQRT(Avg( POWER(Act_F_1 - Pred_F_1 , 2) ) ) as feature_1_rmse
FROM ...
GROUP BY ID_PARTNER, ID_ACCOUNT

delete duplicate rows from my table

i need to delete all duplicate rows in my table - but leave only one row
MyTbl
====
Code | ID | Place | Qty | User
========================================
1 | 22 | 44 | 34 | 333
2 | 22 | 44 | 34 | 333
3 | 22 | 55 | 34 | 333
4 | 22 | 44 | 34 | 666
5 | 33 | 77 | 12 | 999
6 | 44 | 11 | 87 | 333
7 | 33 | 77 | 12 | 999
i need to see this:
Code | ID | Place | Qty | User
=======================================
1 | 22 | 44 | 34 | 333
3 | 22 | 55 | 34 | 333
4 | 22 | 44 | 34 | 666
5 | 33 | 77 | 12 | 999
6 | 44 | 11 | 87 | 333
In most databases, the fastest way to do this is:
select distinct t.*
into saved
from mytbl;
delete from mytbl;
insert into mytbl
select *
from saved;
The above syntax should work in Access. Other databases would use truncate table instead of delete.
Try this,
WITH CTEMyTbl (A,duplicateRecCount)
AS
(
SELECT id,ROW_NUMBER() OVER(PARTITION by id,place,qty,us ORDER BY id)
AS duplicateRecCount
FROM MyTbl
)
DELETE FROM CTEMyTbl
WHERE duplicateRecCount > 1

Oracle SQL: getting all row maximum number from specific multiple criteria

I have the following table named foo:
ID | D1 | D2 | D3 |
---------------------
1 | 47 | 3 | 71 |
2 | 47 | 98 | 82 |
3 | 0 | 99 | 3 |
4 | 3 | 100 | 6 |
5 | 48 | 10 | 3 |
6 | 49 | 12 | 4 |
I want to run a select query and have the results show like this
ID | D1 | D2 | D3 | Result |
------------------------------
1 | 47 | 3 | 71 | D3 |
2 | 47 | 98 | 82 | D2 |
3 | 0 | 99 | 3 | D2 |
4 | 3 | 100 | 6 | D2 |
5 | 48 | 10 | 3 | D1 |
6 | 49 | 12 | 4 | D1 |
So, basically I want to get Maximum value between D1, D2, D3 column divided by id.
As You may seen , ID 1 have D3 in the Result column since Maximum value between
D1 : D2 : D3
That Means 4 : 3 : 71 , Max value is 71. Thats Why The Result show 'D3'
Is there a way to do this in a sql query ?
Thanks!
For Oracle please try this one
select foo.*, case when greatest(d1, d2, d3) = d1 then 'D1'
when greatest(d1, d2, d3) = d2 then 'D2'
when greatest(d1, d2, d3) = d3 then 'D3'
end result
from foo
Consider the following - a normalized approach...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL
,d INT NOT NULL
,val INT NOT NULL
,PRIMARY KEY(id,d)
);
INSERT INTO my_table VALUES
(1,1,47),
(2,1,47),
(3,1,0),
(4,1,3),
(5,1,48),
(6,1,49),
(1,2,3),
(2,2,98),
(3,2,99),
(4,2,100),
(5,2,10),
(6,2,12),
(1,3,71),
(2,3,82),
(3,3,3),
(4,3,6),
(5,3,3),
(6,3,4);
SELECT * FROM my_table;
+----+---+-----+
| id | d | val |
+----+---+-----+
| 1 | 1 | 47 |
| 1 | 2 | 3 |
| 1 | 3 | 71 |
| 2 | 1 | 47 |
| 2 | 2 | 98 |
| 2 | 3 | 82 |
| 3 | 1 | 0 |
| 3 | 2 | 99 |
| 3 | 3 | 3 |
| 4 | 1 | 3 |
| 4 | 2 | 100 |
| 4 | 3 | 6 |
| 5 | 1 | 48 |
| 5 | 2 | 10 |
| 5 | 3 | 3 |
| 6 | 1 | 49 |
| 6 | 2 | 12 |
| 6 | 3 | 4 |
+----+---+-----+
SELECT x.*
FROM my_table x
JOIN
( SELECT id,MAX(val) max_val FROM my_table GROUP BY id) y
ON y.id = x.id
AND y.max_val = x.val;
+----+---+-----+
| id | d | val |
+----+---+-----+
| 1 | 3 | 71 |
| 2 | 2 | 98 |
| 3 | 2 | 99 |
| 4 | 2 | 100 |
| 5 | 1 | 48 |
| 6 | 1 | 49 |
+----+---+-----+
(This is intended as a MySQL solution - I'm not familiar with ORACLE syntax, so apologies if this doesn't port)
Does this answer your comment?
SELECT x.* , y.max_val
FROM my_table x
JOIN
( SELECT id,MAX(val) max_val FROM my_table GROUP BY id) y
ON y.id = x.id ;
+----+---+-----+---------+
| id | d | val | max_val |
+----+---+-----+---------+
| 1 | 1 | 47 | 71 |
| 1 | 2 | 3 | 71 |
| 1 | 3 | 71 | 71 |
| 2 | 1 | 47 | 98 |
| 2 | 2 | 98 | 98 |
| 2 | 3 | 82 | 98 |
| 3 | 1 | 0 | 99 |
| 3 | 2 | 99 | 99 |
| 3 | 3 | 3 | 99 |
| 4 | 1 | 3 | 100 |
| 4 | 2 | 100 | 100 |
| 4 | 3 | 6 | 100 |
| 5 | 1 | 48 | 48 |
| 5 | 2 | 10 | 48 |
| 5 | 3 | 3 | 48 |
| 6 | 1 | 49 | 49 |
| 6 | 2 | 12 | 49 |
| 6 | 3 | 4 | 49 |
+----+---+-----+---------+

Selecting records from Foreign Key table where multiple items match the condition

I am getting data from a third party in form of below tables in MS Access 2000 file format
Papers and
PaperTags
Below is the sample data from these tables.
Papers table and sample data
+----+----------+
| ID | PaperID |
+----+----------+
| 1 | 658 |
| 2 | 659 |
| 3 | 660 |
| 4 | 661 |
| 5 | 662 |
| 6 | 663 |
| 7 | 664 |
+----+----------+
PaperTags table and sample data
+----+----------+----------------------------------------+
| ID | PaperID | TagID |
+----+----------+----------------------------------------+
| 1 | 663 | 3 |
| 2 | 663 | 15 --Y |
| 3 | 663 | 17 |
| 4 | 663 | 18 --Y |
| 5 | 664 | 14 |
| 62 | 658 | 9 |
| 63 | 658 | 14 |
| 64 | 658 | 17 |
| 65 | 659 | 15 --Y |
| 66 | 659 | 17 |
| 67 | 659 | 18 --Y |
| 68 | 660 | 17 |
| 69 | 660 | 18 --N as it has only 18 and not 15 |
| 70 | 661 | 10 |
| 71 | 661 | 17 |
| 72 | 661 | 18 --N as it has only 18 and not 15 |
| 73 | 662 | 18 --N as it has only 18 and not 15 |
| 74 | 662 | 14 |
| 75 | 662 | 17 |
| 76 | 662 | 18 --N as it has only 18 and not 15 |
+----+----------+----------------------------------------+
Now my end user will pass one or more TagIDs for example 15 and 18 my goal is to find all the PaperIDs which have all of these TagIDs. In these example I need to return 663 and 659
I have tried below query but if there is any glitch in data then it doesn't work. For example PaperID 662 appears twice in the table with the same TagID so count(PaperID) = 2 turns out to be true and it will end up in my result.
select Count(PaperID), PaperID from PaperTags
group by TagID, PaperID
having TagID = 15 or TagID = 18
and count(PaperID) = 2
The other query that I tried is
select * from Papers
where Papers.PaperID
in
(
select PaperTags.PaperID from PaperTags
where (PaperTags.Tagid = 15 or PaperTags.Tagid = 18)
and PaperTags.PaperID = Papers.PaperID
)
I have gone thru below article but as I am using MSAccess I can't use this approach.
Select records from a table where all other records with same foreign key have a certain value
I think there has to be better way to filter. Any help is much appreciated.
Something like this:
SELECT G.ContentID
FROM (
SELECT PT.ContentID, PT.TagID
FROM PaperTags AS PT
WHERE PT.TagID IN (15, 18)
GROUP BY PT.ContentID, PT.TagID
) AS G
GROUP BY G.ContentId
HAVING Count(*) = 2

How to SUM from MySQL for every n record

I have a following result from query:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
+---------------+------+------+------+------+------+------+------+-------+
I would like to insert a SUM before enter different order_main_id, it would be like this result:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
| | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 |
+---------------+------+------+------+------+------+------+------+-------+
How to make this possible ?
You'll need to write a second Query which makes use of GROUP BY order_main_id.
Something like:
SELECT sum(S41+...) FROM yourTable GROUP BY orderMainId
K
You can actually do this in one query, but with a union all (really two queries, but the result sets are combined to make one awesome result set):
select
order_main_id,
S36,
S37,
S38,
S39,
S40,
S41,
S42,
S36 + S37 + S38 + S39 + S40 + S41 + S42 as total,
'Detail' as rowtype
from
tblA
union all
select
order_main_id,
sum(S36),
sum(S37),
sum(S38),
sum(S39),
sum(S40),
sum(S41),
sum(S42),
sum(S36 + S37 + S38 + S39 + S40 + S41 + S42),
'Summary' as rowtype
from
tblA
group by
order_main_id
order by
order_main_id, RowType
Remember that the order by affects the entirety of the union all, not just the last query. So, your resultset would look like this:
+---------------+------+------+------+------+------+------+------+-------+---------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total | rowtype |
+---------------+------+------+------+------+------+------+------+-------+---------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 | Detail |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 | Detail |
| 26 | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 | Summary |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 | Detail |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 | Detail |
| 35 | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 | Summary |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 | Summary |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 | Detail |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 | Detail |
| 39 | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 | Summary |
+---------------+------+------+------+------+------+------+------+-------+---------+
This way, you know what is and what isn't a detail or summary row, and the order_main_id that it's for. You could always (and probably should) hide this column in your presentation layer.
For things like these I think you should use a reporting library(such as Crystal Reports), it'll save you a lot of trouble, check JasperReports and similar projects on osalt