SQL - Calculating cell-value based on other cell-values - sql

So I have run into a problem when working on some SQL coding.
I have a data table that looks somewhat like this:
ID TimeID IndicatorID Score
1 111 45 20
1 111 46 14
1 111 47 83
1 111 48 91
1 112 45 20
1 112 46 14
1 112 47 83
1 112 48 91
2 111 45 25
2 111 46 12
2 111 47 70
2 111 48 82
2 112 45 25
2 112 46 12
2 112 47 70
2 112 48 82
I want to add new rows for indicators 240 and 241, where the score for indicator 240 is (score for indicator 45) / (score for indicator 46) and the score for indicator 241 is (score for indicator 47) / (score for indicator 48). This has to be done for each TimeID within each ID.
The full table is huge as the number of IDs, TimeIDs for each ID, and IndicatorIDs for each TimeID is large.

Assuming your requirements are as stated and all the IndicatorID values are hard-coded, this can be done with some simple sub-queries and a straightforward INSERT statement:
insert into your_table
with yt as (
select * from your_table where IndicatorID in (45,46,47,48)
)
, yt45 as (select * from yt where IndicatorID = 45 )
, yt46 as (select * from yt where IndicatorID = 46 )
, yt47 as (select * from yt where IndicatorID = 47 )
, yt48 as (select * from yt where IndicatorID = 48 )
select yt45.id
, yt45.timeID
, 240 as IndicatorID
, yt45.score/yt46.score as score
from yt45
join yt46
on yt45.id = yt46.id
and yt45.timeID = yt46.timeID
union all
select yt47.id
, yt47.timeID
, 241 as IndicatorID
, yt47.score/yt48.score as score
from yt47
join yt48
on yt47.id = yt48.id
and yt47.timeID = yt48.timeID
/
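A minimal sketch of the same calculation, assuming an in-memory SQLite database driven from Python's sqlite3 module (both are stand-ins for illustration, not the asker's setup). Instead of one sub-query per indicator it uses conditional aggregation, a swapped-in technique that pivots the four indicators onto one row per (ID, TimeID) before dividing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (ID INT, TimeID INT, IndicatorID INT, Score REAL)"
)
# Sample rows from the question: the scores repeat for TimeID 111 and 112.
scores = {1: {45: 20, 46: 14, 47: 83, 48: 91},
          2: {45: 25, 46: 12, 47: 70, 48: 82}}
rows = [(id_, time_id, ind, score)
        for id_, by_ind in scores.items()
        for time_id in (111, 112)
        for ind, score in by_ind.items()]
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)

# Pivot the four source indicators per (ID, TimeID), then emit one new row
# for indicator 240 and one for indicator 241.
conn.execute("""
    WITH p AS (
        SELECT ID, TimeID,
               MAX(CASE WHEN IndicatorID = 45 THEN Score END) AS s45,
               MAX(CASE WHEN IndicatorID = 46 THEN Score END) AS s46,
               MAX(CASE WHEN IndicatorID = 47 THEN Score END) AS s47,
               MAX(CASE WHEN IndicatorID = 48 THEN Score END) AS s48
        FROM your_table
        WHERE IndicatorID IN (45, 46, 47, 48)
        GROUP BY ID, TimeID
    )
    INSERT INTO your_table (ID, TimeID, IndicatorID, Score)
    SELECT ID, TimeID, 240, s45 / s46 FROM p
    UNION ALL
    SELECT ID, TimeID, 241, s47 / s48 FROM p
""")

new_rows = conn.execute(
    "SELECT ID, TimeID, IndicatorID, Score FROM your_table "
    "WHERE IndicatorID IN (240, 241) ORDER BY ID, TimeID, IndicatorID"
).fetchall()
for row in new_rows:
    print(row)
```

If a denominator can be 0, it is probably worth guarding the division, for instance with NULLIF(s46, 0), since some databases raise an error on division by zero.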

This can be solved easily using Oracle's MODEL clause.
select id, timeid, indicatorid, score
from myt
model return updated rows
partition by (id, timeid)
dimension by (indicatorid)
measures(score)
rules(
score[240] = score[45]/score[46],
score[241] = score[47]/score[48]
);
Results:
| ID | TIMEID | INDICATORID | SCORE |
|----|--------|-------------|--------------------|
| 2 | 111 | 241 | 0.8536585365853658 |
| 2 | 111 | 240 | 2.0833333333333335 |
| 1 | 112 | 241 | 0.9120879120879121 |
| 1 | 112 | 240 | 1.4285714285714286 |
| 2 | 112 | 241 | 0.8536585365853658 |
| 2 | 112 | 240 | 2.0833333333333335 |
| 1 | 111 | 241 | 0.9120879120879121 |
| 1 | 111 | 240 | 1.4285714285714286 |
insert into myt
select id, timeid, indicatorid, score
from myt
model return updated rows
partition by (id, timeid)
dimension by (indicatorid)
measures(score)
rules(
score[240] = score[45]/score[46],
score[241] = score[47]/score[48]
);
Querying the table after the insert shows the new rows:
select id, timeid, indicatorid, score
from myt
Results:
| ID | TIMEID | INDICATORID | SCORE |
|----|--------|-------------|--------------------|
| 1 | 111 | 45 | 20 |
| 1 | 111 | 46 | 14 |
| 1 | 111 | 47 | 83 |
| 1 | 111 | 48 | 91 |
| 1 | 111 | 240 | 1.4285714285714286 |
| 1 | 111 | 241 | 0.9120879120879121 |
| 1 | 112 | 45 | 20 |
| 1 | 112 | 46 | 14 |
| 1 | 112 | 47 | 83 |
| 1 | 112 | 48 | 91 |
| 1 | 112 | 240 | 1.4285714285714286 |
| 1 | 112 | 241 | 0.9120879120879121 |
| 2 | 111 | 45 | 25 |
| 2 | 111 | 46 | 12 |
| 2 | 111 | 47 | 70 |
| 2 | 111 | 48 | 82 |
| 2 | 111 | 240 | 2.0833333333333335 |
| 2 | 111 | 241 | 0.8536585365853658 |
| 2 | 112 | 45 | 25 |
| 2 | 112 | 46 | 12 |
| 2 | 112 | 47 | 70 |
| 2 | 112 | 48 | 82 |
| 2 | 112 | 240 | 2.0833333333333335 |
| 2 | 112 | 241 | 0.8536585365853658 |

Related

How to reshape a table having multiple records for the same id into a table with one record per id without losing information?

Basically, I want to transform this(Initial) into this(Final). In other words, I want to
"squash" the initial table so that it will have only one record per id
"dilate" the initial table so that I won't lose any information: create a different column for every possible combination of source and column from the initial table (create c1_A, c1_B, ...).
I can work with the initial table as a csv in Python (maybe Pandas) and manually hardcode the mapping between the Initial and the Final table. However, I don't find this solution elegant at all and I'm much more interested in a sql / sas solution. Is there any way of doing that?
Edit: I want to change
+----+--------+------+-----+------+
| ID | source | c1 | c2 | c3 |
+----+--------+------+-----+------+
| 1 | A | 432 | 56 | 1 |
| 1 | B | 53 | 3 | 73 |
| 1 | C | 7 | 342 | 83 |
| 1 | D | 543 | 43 | 73 |
| 2 | A | 8 | 882 | 39 |
| 2 | B | 5 | 54 | 46 |
| 2 | C | 8 | 3 | 2226 |
| 2 | D | 87 | 2 | 45 |
| 3 | A | 93 | 143 | 45 |
| 3 | B | 1023 | 72 | 8 |
| 3 | C | 3 | 3 | 704 |
| 4 | A | 2 | 5 | 0 |
| 4 | B | 78 | 888 | 2 |
| 4 | C | 87 | 23 | 34 |
| 4 | D | 112 | 7 | 712 |
+----+--------+------+-----+------+
into
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| ID | c1_A | c1_B | c1_C | c1_D | c2_A | c2_B | c2_C | c2_D | c3_A | c3_B | c3_C | c3_D |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| 1 | 432 | 53 | 7 | 543 | 56 | 3 | 342 | 43 | 1 | 73 | 83 | 73 |
| 2 | 8 | 5 | 8 | 87 | 882 | 54 | 3 | 2 | 39 | 46 | 2226 | 45 |
| 3 | 93 | 1023 | 3 | | 143 | 72 | 3 | | 45 | 8 | 704 | |
| 4 | 2 | 78 | 87 | 112 | 5 | 888 | 23 | 7 | 0 | 2 | 34 | 712 |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
Abandon hope ... ?
data want;
input ID source $ c1 c2 c3;
datalines;
1 A 432 56 1
1 B 53 3 73
1 C 7 342 83
1 D 543 43 73
2 A 8 882 39
2 B 5 54 46
2 C 8 3 2226
2 D 87 2 45
3 A 93 143 45
3 B 1023 72 8
3 C 3 3 704
4 A 2 5 0
4 B 78 888 2
4 C 87 23 34
4 D 112 7 712
;
* one to grow you oh data;
proc transpose data=want out=stage1;
by id source;
var c1-c3;
run;
* and one to shrink;
proc transpose data=stage1 out=want(drop=_name_) delim=_;
by id;
id _name_ source;
run;
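For the SQL side of the question, here is a rough sketch using conditional aggregation, with the column list generated rather than hardcoded. It assumes an in-memory SQLite database driven from Python; the table name `initial` and the helper variables are made up for illustration. The MAX(CASE ...) pattern itself is plain SQL and should carry over to most databases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE initial (ID INT, source TEXT, c1 INT, c2 INT, c3 INT)")
conn.executemany("INSERT INTO initial VALUES (?, ?, ?, ?, ?)", [
    (1, "A", 432, 56, 1), (1, "B", 53, 3, 73), (1, "C", 7, 342, 83),
    (1, "D", 543, 43, 73), (2, "A", 8, 882, 39), (2, "B", 5, 54, 46),
    (2, "C", 8, 3, 2226), (2, "D", 87, 2, 45), (3, "A", 93, 143, 45),
    (3, "B", 1023, 72, 8), (3, "C", 3, 3, 704), (4, "A", 2, 5, 0),
    (4, "B", 78, 888, 2), (4, "C", 87, 23, 34), (4, "D", 112, 7, 712),
])

value_cols = ["c1", "c2", "c3"]
# Discover the sources from the data instead of hardcoding them.
sources = [r[0] for r in conn.execute(
    "SELECT DISTINCT source FROM initial ORDER BY source")]

# One MAX(CASE ...) per (column, source) pair. The values are interpolated
# into the SQL text, so this is only safe for trusted source values.
select_list = ",\n       ".join(
    f"MAX(CASE WHEN source = '{s}' THEN {c} END) AS {c}_{s}"
    for c in value_cols for s in sources
)
sql = f"SELECT ID,\n       {select_list}\nFROM initial\nGROUP BY ID\nORDER BY ID"

cursor = conn.execute(sql)
print([d[0] for d in cursor.description])  # ID, c1_A, c1_B, ...
final = cursor.fetchall()
for row in final:
    print(row)
```

Combinations that do not exist in the data, such as source D for ID 3, come back as NULL (None in Python), which matches the blanks in the desired output.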

Compare current row with all previous rows and a condition to consider

I am trying to compare each row's school_start date with the school_start dates of the previous rows for the same student. If the school_start dates differ, the school_end date of the previous rows should be checked to see whether it matches the school_start date of the row currently being checked.
If it does, I need to take the first typeofsub associated with that school_end date; if the previous and current school_start dates are the same, the first typeofsub for that school_start date can be used directly.
I have tried to work using PARTITION BY clause but I am stuck.
Table
student_subscriber | school_start | school_end | typeofsub
--------------------|--------------|------------|-----------
22 | 12/7/2016 | 12/8/2016 | 111
22 | 12/7/2016 | 12/12/2016 | 112
22 | 12/7/2016 | 12/10/2016 | 112
22 | 12/7/2016 | 12/20/2016 | 112
22 | 12/8/2016 | 12/10/2016 | 112
23 | 12/12/2016 | 12/13/2016 | 111
23 | 12/12/2016 | 12/12/2016 | 112
23 | 12/12/2016 | 12/14/2016 | 112
23 | 12/12/2016 | 12/20/2016 | 112
23 | 12/13/2016 | 12/20/2016 | 112
Table_Output
student_subscriber | school_start | school_end | typeofsub | First_Typeofsub
--------------------|--------------|------------|-----------|-----------------
22 | 12/7/2016 | 12/8/2016 | 111 | 111
22 | 12/7/2016 | 12/12/2016 | 112 | 111
22 | 12/7/2016 | 12/10/2016 | 112 | 111
22 | 12/7/2016 | 12/20/2016 | 112 | 111
22 | 12/8/2016 | 12/10/2016 | 112 | 111
23 | 12/12/2016 | 12/13/2016 | 112 | 112
23 | 12/12/2016 | 12/12/2016 | 113 | 112
23 | 12/12/2016 | 12/14/2016 | 113 | 112
23 | 12/12/2016 | 12/20/2016 | 113 | 112
23 | 12/13/2016 | 12/20/2016 | 113 | 112
I tried the following, but I think I am missing something in the second min clause.
SELECT
student_id,
type_of_sub,
school_start_dt,
school_end_dt,
case
when type_of_sub != min(type_of_sub) OVER (
PARTITION BY student_id, school_start_dt
ORDER by type_of_sub
ROWS BETWEEN 1 preceding and 1 preceding
)
then 1
else 0
end as TOB_DIFFERENT,
min(type_of_sub) OVER (
PARTITION BY student_id, school_start_dt
ROWS between unbounded preceding and unbounded following
) as FIRST_TOB
FROM students
If you want the first value, use first_value():
select s.*,
first_value(typeofsub) over (partition by student_subscriber order by school_start, school_end) as first_typeofsub
from students s;
Here is a db<>fiddle.
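A small sketch of the first_value() query, assuming an in-memory SQLite database (3.25 or later, for window function support) driven from Python. The dates are stored in ISO format here, which is an assumption: text such as 12/7/2016 would not sort chronologically, so real date columns or ISO strings are needed for the ORDER BY to behave.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_subscriber INT, school_start TEXT, "
    "school_end TEXT, typeofsub INT)"
)
# Input rows from the question, with dates rewritten as YYYY-MM-DD.
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", [
    (22, "2016-12-07", "2016-12-08", 111),
    (22, "2016-12-07", "2016-12-12", 112),
    (22, "2016-12-07", "2016-12-10", 112),
    (22, "2016-12-07", "2016-12-20", 112),
    (22, "2016-12-08", "2016-12-10", 112),
    (23, "2016-12-12", "2016-12-13", 111),
    (23, "2016-12-12", "2016-12-12", 112),
    (23, "2016-12-12", "2016-12-14", 112),
    (23, "2016-12-12", "2016-12-20", 112),
    (23, "2016-12-13", "2016-12-20", 112),
])

# For each student, take typeofsub from the row with the earliest
# (school_start, school_end) and repeat it on every row of that student.
result = conn.execute("""
    SELECT s.*,
           first_value(typeofsub) OVER (
               PARTITION BY student_subscriber
               ORDER BY school_start, school_end
           ) AS first_typeofsub
    FROM students s
    ORDER BY student_subscriber, school_start, school_end
""").fetchall()
for row in result:
    print(row)
```

With this ordering, student 23 gets 112 because the row ending on 2016-12-12 sorts before the row ending on 2016-12-13.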

CTE - recursive query doing too much

I have the current table of data...
| LoanRollupID | NewLoanID | PreviousLoanID |
|--------------|-----------|----------------|
| 11 | 76 | 44 |
| 12 | 80 | 75 |
| 13 | 83 | 82 |
| 14 | 84 | 83 |
| 15 | 86 | 85 |
| 16 | 87 | 54 |
| 17 | 88 | 87 |
| 18 | 90 | 48 |
| 19 | 91 | 34 |
| 20 | 93 | 41 |
| 21 | 94 | 76 |
| 22 | 95 | 90 |
| 23 | 96 | 94 |
| 24 | 100 | 92 |
| 25 | 101 | 99 |
| 26 | 102 | 98 |
| 27 | 103 | 101 |
| 28 | 104 | 81 |
| 29 | 105 | 80 |
| 30 | 107 | 52 |
| 31 | 110 | 108 |
| 1029 | 1105 | 103 |
| 1030 | 1106 | 104 |
| 1031 | 1108 | 1106 |
| 1032 | 1109 | 73 |
I'm trying to jump in at NewLoanID 1108 and see how it has evolved from previous loans, e.g. 1108 came from 1106, which came from 104, which came from 81, etc.
When I run this query:
WITH OldLoans (PreviousLoanID, NewLoanID, start)
AS
(
---- Anchor member definition
SELECT l.NewLoanID, l.PreviousLoanID, 0 as start
FROM dscs_public.LoanRollup l
Where NewLoanID = 1108
UNION ALL
-- Recursive member definition
SELECT l.NewLoanID, l.PreviousLoanID, start + 1
FROM dscs_public.LoanRollup l
INNER JOIN OldLoans AS o
ON o.NewLoanID = l.PreviousLoanID
)
---- Statement that executes the CTE
SELECT PreviousLoanID, NewLoanID, start
FROM OldLoans
It fails with this error:
The statement terminated. The maximum recursion 100 has been exhausted
before statement completion.
Can anyone spot my mistake please?
Thanks.
The aliases in the CTE definition are in the wrong order:
-- Instead of (PreviousLoanID, NewLoanID, start)
WITH OldLoans (NewLoanID, PreviousLoanID, start)
AS
(
---- Anchor member definition
SELECT l.NewLoanID, l.PreviousLoanID, 0 as start
FROM mytable l --LoanRollup l
Where NewLoanID = 1108
UNION ALL
-- Recursive member definition
SELECT l.NewLoanID, l.PreviousLoanID, start + 1
FROM mytable l --dscs_public.LoanRollup l
INNER JOIN OldLoans AS o
-- Instead of o.NewLoanID = l.PreviousLoanID
ON l.NewLoanID = o.PreviousLoanID
)
---- Statement that executes the CTE
SELECT PreviousLoanID, NewLoanID, start
FROM OldLoans
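The same thing holds for the ON clause in the recursive member definition: it has to match l.NewLoanID to o.PreviousLoanID so that each step moves to an older loan. With the original column order and join condition, the recursive member kept returning the row for loan 1108 itself, which is why the recursion limit was hit.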
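To check the corrected join direction, here is a minimal sketch assuming an in-memory SQLite database driven from Python, loaded with only a handful of the rows from the question. SQLite spells the construct WITH RECURSIVE, and the column name `depth` is an illustrative replacement for `start`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LoanRollup (LoanRollupID INT, NewLoanID INT, PreviousLoanID INT)"
)
# A subset of the rows from the question, enough to cover the 1108 chain.
conn.executemany("INSERT INTO LoanRollup VALUES (?, ?, ?)", [
    (28, 104, 81),
    (1029, 1105, 103),
    (1030, 1106, 104),
    (1031, 1108, 1106),
    (1032, 1109, 73),
])

chain = conn.execute("""
    WITH RECURSIVE OldLoans (NewLoanID, PreviousLoanID, depth) AS (
        -- Anchor: the loan we start from
        SELECT NewLoanID, PreviousLoanID, 0
        FROM LoanRollup
        WHERE NewLoanID = 1108
        UNION ALL
        -- Recursive step: find the row whose NewLoanID is our PreviousLoanID
        SELECT l.NewLoanID, l.PreviousLoanID, o.depth + 1
        FROM LoanRollup AS l
        JOIN OldLoans AS o ON l.NewLoanID = o.PreviousLoanID
    )
    SELECT NewLoanID, PreviousLoanID, depth
    FROM OldLoans
    ORDER BY depth
""").fetchall()
for row in chain:
    print(row)
```

The recursion stops on its own once a PreviousLoanID (81 here) no longer appears as a NewLoanID.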

delete duplicate rows from my table

I need to delete all duplicate rows in my table, leaving only one row from each group of duplicates.
MyTbl
====
Code | ID | Place | Qty | User
========================================
1 | 22 | 44 | 34 | 333
2 | 22 | 44 | 34 | 333
3 | 22 | 55 | 34 | 333
4 | 22 | 44 | 34 | 666
5 | 33 | 77 | 12 | 999
6 | 44 | 11 | 87 | 333
7 | 33 | 77 | 12 | 999
I need to see this:
Code | ID | Place | Qty | User
=======================================
1 | 22 | 44 | 34 | 333
3 | 22 | 55 | 34 | 333
4 | 22 | 44 | 34 | 666
5 | 33 | 77 | 12 | 999
6 | 44 | 11 | 87 | 333
In most databases, the fastest way to do this is:
select distinct t.*
into saved
from mytbl t;
delete from mytbl;
insert into mytbl
select *
from saved;
The above syntax should work in Access. Other databases would use truncate table instead of delete.
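Try this:
WITH CTEMyTbl (Code, duplicateRecCount)
AS
(
-- Number the rows within each group of duplicates; the lowest Code gets 1
SELECT Code, ROW_NUMBER() OVER (PARTITION BY ID, Place, Qty, [User] ORDER BY Code)
AS duplicateRecCount
FROM MyTbl
)
-- Keep row 1 of each group and delete the rest
DELETE FROM CTEMyTbl
WHERE duplicateRecCount > 1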
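A portable variant of the same idea, sketched against an in-memory SQLite database driven from Python (an assumption for illustration). Rather than deleting through a CTE, it keeps the lowest Code in each group of rows that share ID, Place, Qty and User, and deletes everything else:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "User" is quoted because it is a reserved word in several databases.
conn.execute(
    'CREATE TABLE MyTbl (Code INT, ID INT, Place INT, Qty INT, "User" INT)'
)
conn.executemany("INSERT INTO MyTbl VALUES (?, ?, ?, ?, ?)", [
    (1, 22, 44, 34, 333),
    (2, 22, 44, 34, 333),
    (3, 22, 55, 34, 333),
    (4, 22, 44, 34, 666),
    (5, 33, 77, 12, 999),
    (6, 44, 11, 87, 333),
    (7, 33, 77, 12, 999),
])

# Rows are duplicates when every column except Code matches.
# Keep the smallest Code per group and delete the others.
conn.execute("""
    DELETE FROM MyTbl
    WHERE Code NOT IN (
        SELECT MIN(Code)
        FROM MyTbl
        GROUP BY ID, Place, Qty, "User"
    )
""")

remaining = [r[0] for r in conn.execute("SELECT Code FROM MyTbl ORDER BY Code")]
print(remaining)
```

Codes 2 and 7 are removed, which leaves the five rows shown in the question's expected output.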

How to SUM from MySQL for every n record

I have a following result from query:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
+---------------+------+------+------+------+------+------+------+-------+
I would like to insert a SUM row after each group of rows that share the same order_main_id, so that the result looks like this:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
| | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 |
+---------------+------+------+------+------+------+------+------+-------+
How can I do this?
You'll need to write a second Query which makes use of GROUP BY order_main_id.
Something like:
SELECT sum(S41+...) FROM yourTable GROUP BY order_main_id
You can actually do this in one query, but with a union all (really two queries, but the result sets are combined to make one awesome result set):
select
order_main_id,
S36,
S37,
S38,
S39,
S40,
S41,
S42,
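-- coalesce is needed because NULL + n is NULL
coalesce(S36, 0) + coalesce(S37, 0) + coalesce(S38, 0) + coalesce(S39, 0) + coalesce(S40, 0) + coalesce(S41, 0) + coalesce(S42, 0) as total,
'Detail' as rowtype
from
tblA
union all
select
order_main_id,
sum(S36),
sum(S37),
sum(S38),
sum(S39),
sum(S40),
sum(S41),
sum(S42),
sum(coalesce(S36, 0) + coalesce(S37, 0) + coalesce(S38, 0) + coalesce(S39, 0) + coalesce(S40, 0) + coalesce(S41, 0) + coalesce(S42, 0)),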
'Summary' as rowtype
from
tblA
group by
order_main_id
order by
order_main_id, RowType
Remember that the order by affects the entirety of the union all, not just the last query. So, your resultset would look like this:
+---------------+------+------+------+------+------+------+------+-------+---------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total | rowtype |
+---------------+------+------+------+------+------+------+------+-------+---------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 | Detail |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 | Detail |
| 26 | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 | Summary |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 | Detail |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 | Detail |
| 35 | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 | Summary |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 | Summary |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 | Detail |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 | Detail |
| 39 | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 | Summary |
+---------------+------+------+------+------+------+------+------+-------+---------+
This way, you know what is and what isn't a detail or summary row, and the order_main_id that it's for. You could always (and probably should) hide this column in your presentation layer.
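Here is a runnable sketch of the detail-plus-summary approach, assuming an in-memory SQLite database driven from Python and only the rows for orders 26 and 38. The COALESCE calls are an addition on my part: adding a NULL size column to a number yields NULL, so they treat missing sizes as 0 when building the total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
sizes = ["S36", "S37", "S38", "S39", "S40", "S41", "S42"]
size_defs = ", ".join(s + " INT" for s in sizes)
conn.execute(f"CREATE TABLE tblA (order_main_id INT, {size_defs})")
conn.executemany("INSERT INTO tblA VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (26, 127, 247, 335, 333, 223, 111, 18),
    (26, 323, 606, 772, 765, 573, 312, 154),
    (38, 25, 35, 35, 35, 20, None, None),
    (38, 25, 35, 35, 35, 20, None, None),
])

detail_cols = ", ".join(sizes)
summary_cols = ", ".join("SUM(" + s + ")" for s in sizes)
# Row total that treats a missing size as 0.
total = " + ".join("COALESCE(" + s + ", 0)" for s in sizes)

report = conn.execute(f"""
    SELECT order_main_id, {detail_cols}, {total} AS total, 'Detail' AS rowtype
    FROM tblA
    UNION ALL
    SELECT order_main_id, {summary_cols}, SUM({total}), 'Summary'
    FROM tblA
    GROUP BY order_main_id
    ORDER BY order_main_id, rowtype
""").fetchall()
for row in report:
    print(row)
```

Because 'Detail' sorts before 'Summary', each summary row lands directly after the detail rows of its order. SUM ignores NULLs, so S41 and S42 stay NULL in the summary for order 38, as in the desired output.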
For things like this, I think you should use a reporting library (such as Crystal Reports); it will save you a lot of trouble. Check JasperReports and similar projects on osalt.