I am working on a report and need an SQL query that transforms the following:
DATABASE SCHEMA Table Name Rows
A 1 X 12
B 1 X 32
A 2 X 10
B 2 X 22
A 3 Y 14
B 3 Y 21
A 4 Z 33
B 4 Z 33
to something like this:
SCHEMA TABLE A - Rows B - Rows
1 X 12 32
2 X 10 22
3 Y 14 21
4 Z 33 33
There are multiple entries for the same table in both databases, which is why I can't figure it out. Can someone help me out with this?
Your sample data suggests conditional aggregation:
select SCHEMA, table_name,
sum(case when db_name = 'A' then rows else 0 end) as A_row,
sum(case when db_name = 'B' then rows else 0 end) as B_row,
. . .
from table t
group by SCHEMA, table_name;
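To check the conditional aggregation against the sample data, here is a minimal sketch that runs the same query shape in an in-memory SQLite database from Python. The table name `t` and the column names `db_name`, `schema_name`, `table_name` and `row_count` are assumptions chosen for the demo (the question does not give the real names), and SQLite stands in for whatever database the report actually uses.

```python
import sqlite3

# Sample rows from the question: (database, schema, table name, rows).
SAMPLE = [
    ("A", 1, "X", 12), ("B", 1, "X", 32),
    ("A", 2, "X", 10), ("B", 2, "X", 22),
    ("A", 3, "Y", 14), ("B", 3, "Y", 21),
    ("A", 4, "Z", 33), ("B", 4, "Z", 33),
]

def pivot_rows(rows):
    """Return one row per (schema, table) with the A and B counts side by side."""
    conn = sqlite3.connect(":memory:")
    # Column names are hypothetical; adjust them to the real table.
    conn.execute(
        "CREATE TABLE t (db_name TEXT, schema_name INTEGER, "
        "table_name TEXT, row_count INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT schema_name, table_name,
               SUM(CASE WHEN db_name = 'A' THEN row_count ELSE 0 END) AS a_rows,
               SUM(CASE WHEN db_name = 'B' THEN row_count ELSE 0 END) AS b_rows
        FROM t
        GROUP BY schema_name, table_name
        ORDER BY schema_name, table_name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in pivot_rows(SAMPLE):
        print(row)
```

Because each CASE expression contributes 0 for rows from the other database, a table that exists in only one database would show 0 in the other column rather than NULL.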
Related
I have a postgres table that looks like this:
A B
5 4
10 10
13 15
100 250
20 Null
Using SQL, I would like to check whether the value in column A is larger than the value in column B and if so, then add a 1 to the column True. If the value in column A is smaller or equal to the value in column B or if column B contains a [NULL] value, I would like to add a 1 to the column False, like so:
A B True False
5 4 1 0
10 10 0 1
13 15 0 1
100 25 1 0
20 [NULL] 0 1
What is the best way to achieve this?
You can use case logic:
select t.*,
(case when A > B then 1 else 0 end) as true_col,
(case when A > B then 0 else 1 end) as false_col
from t;
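The NULL case works because a comparison with NULL is neither true nor false, so it falls through to the ELSE branch in both expressions. A minimal sketch that confirms this with SQLite from Python is below; it assumes a table `t` with integer columns `a` and `b`, uses the values shown in the desired-output table, and uses SQLite only as a stand-in for Postgres.

```python
import sqlite3

# Values taken from the desired-output table in the question; None becomes NULL.
SAMPLE = [(5, 4), (10, 10), (13, 15), (100, 25), (20, None)]

def flag_rows(rows):
    """Return (a, b, true_col, false_col) for every input row, in input order."""
    conn = sqlite3.connect(":memory:")
    # The id column is only there to keep the output in insertion order.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)
    query = """
        SELECT a, b,
               CASE WHEN a > b THEN 1 ELSE 0 END AS true_col,
               CASE WHEN a > b THEN 0 ELSE 1 END AS false_col
        FROM t
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in flag_rows(SAMPLE):
        print(row)
```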
This post follows this one: SAS sum observations not in a group, by group
My minimal example there was a bit too minimal, sadly, and I wasn't able to apply it to my data.
Here is a complete example. What I have is:
data have;
input group1 group2 $ group3 $ value;
datalines;
1 A X 2
1 A X 4
1 A Y 1
1 A Y 3
1 B Z 2
1 B Z 1
1 C Y 1
1 C Y 6
1 C Z 7
2 A Z 3
2 A Z 9
2 A Y 2
2 B X 8
2 B X 5
2 B X 5
2 B Z 7
2 C Y 2
2 C X 1
;
run;
For each group, I want a new variable "sum" with the sum of all values in the column for the same subgroup (group1 and group2), except for the group (group3) the observation is in.
data want;
input group1 group2 $ group3 $ value sum;
datalines;
1 A X 2 8
1 A X 4 6
1 A Y 1 9
1 A Y 3 7
1 B Z 2 1
1 B Z 1 2
1 C Y 1 13
1 C Y 6 8
1 C Z 7 7
2 A Z 3 11
2 A Z 9 5
2 A Y 2 12
2 B X 8 17
2 B X 5 20
2 B X 5 20
2 B Z 7 18
2 C Y 2 1
2 C X 1 2
;
run;
My goal is to use either data steps or PROC SQL (I am doing this on around 30 million observations, and PROC MEANS and similar procedures seemed slower than those in previous similar computations).
My issue with the solutions provided in the linked post is that they use the total of the whole column, and I don't know how to change this to use the total within the subgroup.
Any ideas, please?
A SQL solution will join all data to an aggregating select:
proc sql;
create table want as
select have.group1, have.group2, have.group3, have.value
, aggregate.sum - have.value as sum
from
have
join
(select group1, group2, sum(value) as sum
from have
group by group1, group2
) aggregate
on
aggregate.group1 = have.group1
and aggregate.group2 = have.group2
;
quit;
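As a check on the join-to-aggregate query, the sketch below runs the same logic in SQLite from Python against the question's data. It is an illustration rather than SAS: the `id` column and the names `total` and `sum_others` are assumptions added for the demo. Note that the query subtracts each row's own value from its (group1, group2) total, which is what the `want` table in the question shows.

```python
import sqlite3

# The "have" data from the question: (group1, group2, group3, value).
HAVE = [
    (1, "A", "X", 2), (1, "A", "X", 4), (1, "A", "Y", 1), (1, "A", "Y", 3),
    (1, "B", "Z", 2), (1, "B", "Z", 1),
    (1, "C", "Y", 1), (1, "C", "Y", 6), (1, "C", "Z", 7),
    (2, "A", "Z", 3), (2, "A", "Z", 9), (2, "A", "Y", 2),
    (2, "B", "X", 8), (2, "B", "X", 5), (2, "B", "X", 5), (2, "B", "Z", 7),
    (2, "C", "Y", 2), (2, "C", "X", 1),
]

def sum_minus_own_value(rows):
    """For each row, return the (group1, group2) total minus the row's own value."""
    conn = sqlite3.connect(":memory:")
    # id is a hypothetical surrogate key used only to preserve input order.
    conn.execute(
        "CREATE TABLE have (id INTEGER PRIMARY KEY, group1 INTEGER, "
        "group2 TEXT, group3 TEXT, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO have (group1, group2, group3, value) VALUES (?, ?, ?, ?)", rows
    )
    query = """
        SELECT h.group1, h.group2, h.group3, h.value,
               agg.total - h.value AS sum_others
        FROM have AS h
        JOIN (SELECT group1, group2, SUM(value) AS total
              FROM have
              GROUP BY group1, group2) AS agg
          ON agg.group1 = h.group1 AND agg.group2 = h.group2
        ORDER BY h.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in sum_minus_own_value(HAVE):
        print(row)
```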
SQL can be slower than a hash solution, but more people understand SQL than understand a SAS DATA step that uses hashes (which can be faster than SQL).
data want2;
if 0 then set have; * prep pdv;
declare hash sums (suminc:'value');
sums.defineKey('group1', 'group2');
sums.defineDone();
do while (not hash_loaded);
set have end=hash_loaded;
sums.ref(); * adds value to internal sum of hash data record;
end;
do while (not last_have);
set have end=last_have;
sums.sum(sum:sum); * retrieve group sum.;
sum = sum - value; * subtract from group sum;
output;
end;
stop;
run;
The SAS documentation touches on SUMINC and has some examples.
The question does not address this concept:
For each row, compute the tier 2 sum that excludes the tier 3 group this row is in.
A hash-based solution would require tracking both the two-level and the three-level sums:
data want2;
if 0 then set have; * prep pdv;
declare hash T2 (suminc:'value'); * hash for two (T)iers;
T2.defineKey('group1', 'group2'); * one hash record per combination of group1, group2;
T2.defineDone();
declare hash T3 (suminc:'value'); * hash for three (T)iers;
T3.defineKey('group1', 'group2', 'group3'); * one hash record per combination of group1, group2, group3;
T3.defineDone();
do while (not hash_loaded);
set have end=hash_loaded;
T2.ref(); * adds value to internal sum of hash data record;
T3.ref();
end;
T2_cardinality = T2.num_items;
T3_cardinality = T3.num_items;
put 'NOTE: |T2| = ' T2_cardinality;
put 'NOTE: |T3| = ' T3_cardinality;
do while (not last_have);
set have end=last_have;
T2.sum(sum:t2_sum);
T3.sum(sum:t3_sum);
sum = t2_sum - t3_sum;
output;
end;
stop;
drop t2_: t3_:;
run;
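For readers who do not use SAS hashes, the same two-tier bookkeeping can be sketched in Python with two dictionaries that play the roles of the T2 and T3 hashes. This is an illustration of the logic only, not a performance claim, and the function name is made up for the example.

```python
from collections import defaultdict

# The "have" data from the question: (group1, group2, group3, value).
HAVE = [
    (1, "A", "X", 2), (1, "A", "X", 4), (1, "A", "Y", 1), (1, "A", "Y", 3),
    (1, "B", "Z", 2), (1, "B", "Z", 1),
    (1, "C", "Y", 1), (1, "C", "Y", 6), (1, "C", "Z", 7),
    (2, "A", "Z", 3), (2, "A", "Z", 9), (2, "A", "Y", 2),
    (2, "B", "X", 8), (2, "B", "X", 5), (2, "B", "X", 5), (2, "B", "Z", 7),
    (2, "C", "Y", 2), (2, "C", "X", 1),
]

def sum_excluding_own_group3(rows):
    """Append, to each row, the (group1, group2) total minus its group3 total."""
    t2 = defaultdict(int)  # one entry per (group1, group2)
    t3 = defaultdict(int)  # one entry per (group1, group2, group3)

    # Pass 1: accumulate both levels of sums, like the two .ref() calls.
    for g1, g2, g3, value in rows:
        t2[(g1, g2)] += value
        t3[(g1, g2, g3)] += value

    # Pass 2: look up both sums for every row and subtract.
    return [
        (g1, g2, g3, value, t2[(g1, g2)] - t3[(g1, g2, g3)])
        for g1, g2, g3, value in rows
    ]

if __name__ == "__main__":
    for row in sum_excluding_own_group3(HAVE):
        print(row)
```

On the sample data this gives 4 for the two `1 A X` rows (10 for the subgroup minus 6 for X), which differs from the `want` table in the question, where each row's own value is subtracted instead.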
I have a data set:
data have;
input group $ value;
datalines;
A 4
A 3
A 2
A 1
B 1
C 1
D 2
D 1
E 1
F 1
G 2
G 1
H 1
;
run;
The first variable is a group identifier, the second a value.
For each group, I want a new variable "sum" with the sum of all values in the column, except for the group the observation is in.
My issue is that I have to do this on nearly 30 million observations, so efficiency matters.
I found that using a data step was more efficient than using procs.
The final data set should look like:
data want;
input group $ value sum;
datalines;
A 4 11
A 3 11
A 2 11
A 1 11
B 1 20
C 1 20
D 2 18
D 1 18
E 1 20
F 1 20
G 2 18
G 1 18
H 1 20
;
run;
Any idea how to do this, please?
Edit: I don't know if this matters, but the example I gave is a simplified version of my issue. In the real case I have two other group variables, so taking the sum of the whole column and subtracting the group's sum is not a viable solution.
The requirement
sum of all values in the column, except for the group the observation is in
indicates two passes of the data must occur:
1.) Compute allsum and each group's group_sum. A hash can store each group's sum, computed via a specified suminc: variable and .ref() method invocation. A variable can accumulate allsum.
2.) Compute allsum - group_sum for each row of a group. The group_sum is retrieved from the hash and subtracted from allsum.
Example:
data want;
if 0 then set have; * prep pdv;
declare hash sums (suminc:'value');
sums.defineKey('group');
sums.defineDone();
do while (not hash_loaded);
set have end=hash_loaded;
sums.ref(); * adds value to internal sum of hash data record;
allsum + value;
end;
do while (not last_have);
set have end=last_have;
sums.sum(sum:sum); * retrieve groups sum. Do you hear the Dragnet theme too?;
sum = allsum - sum; * subtract from allsum;
output;
end;
stop;
run;
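The two passes can also be sketched outside SAS. The Python version below uses a dictionary in place of the hash object and a plain variable for allsum; the function name is made up for the example, and it is meant to show the logic rather than to suggest a faster implementation.

```python
from collections import defaultdict

# The "have" data from the question: (group, value).
HAVE = [
    ("A", 4), ("A", 3), ("A", 2), ("A", 1),
    ("B", 1), ("C", 1), ("D", 2), ("D", 1),
    ("E", 1), ("F", 1), ("G", 2), ("G", 1), ("H", 1),
]

def sum_outside_group(rows):
    """Append, to each row, the grand total minus the total of the row's group."""
    group_sum = defaultdict(int)  # plays the role of the SAS hash with suminc
    all_sum = 0

    # Pass 1: accumulate the grand total and the per-group totals.
    for group, value in rows:
        group_sum[group] += value
        all_sum += value

    # Pass 2: subtract each row's group total from the grand total.
    return [(group, value, all_sum - group_sum[group]) for group, value in rows]

if __name__ == "__main__":
    for row in sum_outside_group(HAVE):
        print(row)
```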
What is wrong with a straightforward approach? You need to make two passes no matter what you do.
Like this. I included extra variables so you can see how the values are derived.
proc sql ;
create table want as
select a.*,b.grand,sum(value) as total, b.grand - sum(value) as sum
from have a
, (select sum(value) as grand from have) b
group by a.group
;
quit;
Results:
Obs group value grand total sum
1 A 3 21 10 11
2 A 1 21 10 11
3 A 2 21 10 11
4 A 4 21 10 11
5 B 1 21 1 20
6 C 1 21 1 20
7 D 2 21 3 18
8 D 1 21 3 18
9 E 1 21 1 20
10 F 1 21 1 20
11 G 1 21 3 18
12 G 2 21 3 18
13 H 1 21 1 20
Note it does not matter what you have as your GROUP BY clause.
Do you really need to output all of the original observations? Why not just output the summary table?
proc sql ;
create table want as
select a.group, sum(value) as total, b.grand - sum(value) as sum
from have a
, (select sum(value) as grand from have) b
group by a.group
;
quit;
Results
Obs group total sum
1 A 10 11
2 B 1 20
3 C 1 20
4 D 3 18
5 E 1 20
6 F 1 20
7 G 3 18
8 H 1 20
I would break this out into two different segments:
1.) You could start by using PROC SQL to get the sums by the group
2.) Then use some IF/THEN statements to reassign the values by group
Below is a snapshot of my data:
Name Age Income Group
Asd 20 A
Asd 20 A
b 19 E
c 21 B
c 21 B
c 21 B
df 21 C
rd 24 D
I want to include a flag variable that is 1 for all but one of the rows in each set of duplicates and 0 for the remaining one, and also 0 for the rows that are not duplicates. Below is a snapshot of the desired final output:
Name Age Income Group Flag
Asd 20 A 1
Asd 20 A 0
b 19 E 0
c 21 B 1
c 21 B 1
c 21 B 0
df 21 C 0
rd 24 D 0
Can anyone help me create this Flag variable in an Oracle database?
You can do this using analytic functions and case:
select t.*,
(case when row_number() over (partition by name, age, income order by name) = 1
then 0 else 1
end) as GroupFlag
from table t;
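To see the row_number() flag on the sample rows, here is a minimal sketch using SQLite from Python (window functions need SQLite 3.25 or later). It is not Oracle, and it makes two assumptions not in the question: a surrogate `id` column that records insertion order, and a column called `income_group`. Ordering the window by `id` descending makes the last row of each duplicate set the one that gets 0, which matches the desired output shown in the question.

```python
import sqlite3

# Sample rows from the question: (name, age, income group).
SAMPLE = [
    ("Asd", 20, "A"), ("Asd", 20, "A"),
    ("b", 19, "E"),
    ("c", 21, "B"), ("c", 21, "B"), ("c", 21, "B"),
    ("df", 21, "C"),
    ("rd", 24, "D"),
]

def flag_duplicates(rows):
    """Return each row with a flag: 0 for one row per duplicate set, 1 for the rest."""
    conn = sqlite3.connect(":memory:")
    # id is a hypothetical surrogate key; it makes the ordering deterministic.
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, "
        "age INTEGER, income_group TEXT)"
    )
    conn.executemany(
        "INSERT INTO t (name, age, income_group) VALUES (?, ?, ?)", rows
    )
    query = """
        SELECT name, age, income_group,
               CASE WHEN ROW_NUMBER() OVER (
                        PARTITION BY name, age, income_group
                        ORDER BY id DESC) = 1
                    THEN 0 ELSE 1
               END AS flag
        FROM t
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in flag_duplicates(SAMPLE):
        print(row)
```

Rows that have no duplicate are the only member of their partition, so they always get row number 1 and therefore flag 0.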
RSC by comparing values from tab.col1 , tab.col2 ,tab.col3 ,tab.col4 to Result.INTR
Tab table has 1000s of rows
If any of the col1 to 4 has NULL then return 1
Col1 will hold values pertaining to RID = 10
Col2 will hold values pertaining to RID = 20
Col3 will hold values pertaining to RID = 30
Col4 will hold values pertaining to RID = 40
For eg:
if tab.col1 is 3 then 4
if tab.col2 is 'R' then 3
if tab.col3 is 1900 then the query should give 4
if 1945 then 3
if 1937 then 3 (lower bound is less than and upper bound is greater than equal to)
if tab.col4 is 6 then 5
and so on.....
Result table
RID INTR RSC
----- ----- -----
10 1 0
10 2 1
10 3 4
10 4 2
20 I 4
20 R 3
20 U 1
30 1900 5
30 1900-1937 4
30 1937-1967 3
30 1967 3
40 3-4 2
40 1-3 1
40 4 5
Check the CASE and DECODE functions of Oracle. Google them and look at some examples. You will be able to implement your requirement with them.
For example check this one http://www.club-oracle.com/forums/case-and-decode-two-powerfull-constructs-of-sql-t181/
You would do this like:
select (case when tab.col1 = 3 then 4
when tab.col2 = 'R' then 3
when tab.col3 = 1900 then 4
when tab.col3 in (1945, 1937) then 3
when tab.col4 = 6 then 5
. . .
Try something like:
select t1.RSC, t2.RSC, t3.RSC, t4.RSC
from your_table tab join Result t1 on t1.RID=10 and t1.INTR=tab.col1
join Result t2 on t2.RID=20 and t2.INTR=tab.col2
join Result t3 on t3.RID=30 and t3.INTR >= regexp_substr(tab.col3, '^\d*') and t3.INTR <= regexp_substr(tab.col3, '\d*$')
join Result t4 on t4.RID=40 and t4.INTR >= regexp_substr(tab.col4, '^\d*') and t4.INTR <= regexp_substr(tab.col4, '\d*$')