Segregate dataset based on certain matching variables

I have two datasets: one is the base dataset and the other is a subset of it. I want to create a dataset containing the records that are present in the base dataset but not in the subset. That is, if a combination of acct_num, test_id, tran_date and actual_amt is not present in the subset, the record should appear in the resulting dataset.
DATA base;
INPUT acct_num test_id tran_date:anydtdte. actual_amt final_amt final_amt_added ;
format tran_date date9.;
DATALINES;
55203610 2542 12-jan-20 30 45 45
16124130 8062 . 56 78 78
16124130 8062 14-dec-19 8 78 78
80479512 2062 19-mar-19 32 32 32
70321918 2062 20-dec-19 1 93 54
17312410 6712 . 45 90 90
17312410 6712 15-jun-18 0 90 90
74623123 2092 17-aug-18 34 87 87
24245321 2082 22-jan-17 22 56 67
;
run;
data subset;
input acct_num test_id tran_date:anydtdte. actual_amt final_amt final_amt_added ;
format tran_date date9.;
DATALINES;
55203610 2542 12-jan-20 30 45 45
16124130 8062 . 56 78 78
16124130 8062 14-dec-19 8 78 78
17312410 6712 . 45 90 90
74623123 2092 17-aug-18 34 87 87
24245321 2082 22-jan-17 22 56 67
;
run;
The data that I want:
80479512 2062 19-mar-19 32 32 32
70321918 2062 20-dec-19 1 93 54
17312410 6712 15-jun-18 0 90 90
I have tried using NOT IN in SQL, but it does not let me match on several variables in one condition.
Any help will be appreciated.

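This is a set difference (minus) problem; see the EXCEPT operator. Note that with select * the comparison uses every column, not only the four matching variables. That gives the desired result here because the rows in subset are identical to their counterparts in base.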
proc sql noprint;
create table want as
select * from base
except
select * from subset
;
quit;
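If you want to try the set difference outside SAS, here is a minimal sketch in Python using the standard-library sqlite3 module instead of PROC SQL. The in-memory database, the ISO date strings and the helper name load are assumptions made for illustration; the rows are the ones from the question, with None standing in for the missing dates.

```python
import sqlite3

COLS = "acct_num, test_id, tran_date, actual_amt, final_amt, final_amt_added"

# Rows from the question; dates rewritten as ISO strings (an assumption).
BASE = [
    (55203610, 2542, "2020-01-12", 30, 45, 45),
    (16124130, 8062, None, 56, 78, 78),
    (16124130, 8062, "2019-12-14", 8, 78, 78),
    (80479512, 2062, "2019-03-19", 32, 32, 32),
    (70321918, 2062, "2019-12-20", 1, 93, 54),
    (17312410, 6712, None, 45, 90, 90),
    (17312410, 6712, "2018-06-15", 0, 90, 90),
    (74623123, 2092, "2018-08-17", 34, 87, 87),
    (24245321, 2082, "2017-01-22", 22, 56, 67),
]
SUBSET = [BASE[i] for i in (0, 1, 2, 5, 7, 8)]


def load(conn, name, rows):
    """Hypothetical helper: create a table and fill it with the given rows."""
    conn.execute(f"CREATE TABLE {name} ({COLS})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, "base", BASE)
load(conn, "subset", SUBSET)

# EXCEPT compares whole rows: a base row is dropped only if every column matches.
want = sorted(conn.execute("SELECT * FROM base EXCEPT SELECT * FROM subset").fetchall())
for row in want:
    print(row)
```

SQLite treats NULLs as equal when comparing rows in a compound SELECT, so the two records with a missing date are removed as well.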

Make a list of all the combinations observed in subset, then merge the base file with that list and output the records that are in base only.
Note that it is important to remove duplicates from subset_combinations and to keep only the sorting variables; otherwise values from subset may overwrite the values from base.
proc sort data=base;
by acct_num test_id tran_date actual_amt;
proc sort data=subset out=subset_combinations (keep=acct_num test_id tran_date actual_amt) nodupkey;
by acct_num test_id tran_date actual_amt;
data want;
merge base (in=in1) subset_combinations (in=in2);
by acct_num test_id tran_date actual_amt;
if in1 & ^in2;
run;
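The same "in base only" selection can be sketched as a key-based anti-join, here in Python with sqlite3 rather than a SAS data step merge. Only the four matching variables are compared, as in the BY statement above. The ISO date strings are an assumption, and the IS comparison is SQLite's null-safe equality, used so that the missing tran_date values match each other the way SAS missing values do.

```python
import sqlite3

# Rows from the question; dates rewritten as ISO strings (an assumption).
BASE = [
    (55203610, 2542, "2020-01-12", 30, 45, 45),
    (16124130, 8062, None, 56, 78, 78),
    (16124130, 8062, "2019-12-14", 8, 78, 78),
    (80479512, 2062, "2019-03-19", 32, 32, 32),
    (70321918, 2062, "2019-12-20", 1, 93, 54),
    (17312410, 6712, None, 45, 90, 90),
    (17312410, 6712, "2018-06-15", 0, 90, 90),
    (74623123, 2092, "2018-08-17", 34, 87, 87),
    (24245321, 2082, "2017-01-22", 22, 56, 67),
]
# Keep only the four key columns of the subset rows, like subset_combinations.
SUBSET_KEYS = [BASE[i][:4] for i in (0, 1, 2, 5, 7, 8)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base (acct_num, test_id, tran_date, actual_amt, "
    "final_amt, final_amt_added)"
)
conn.executemany("INSERT INTO base VALUES (?, ?, ?, ?, ?, ?)", BASE)
conn.execute("CREATE TABLE subset_combinations (acct_num, test_id, tran_date, actual_amt)")
conn.executemany("INSERT INTO subset_combinations VALUES (?, ?, ?, ?)", SUBSET_KEYS)

# Keep a base row only when no subset row has the same four key values.
base_only = conn.execute(
    """
    SELECT b.* FROM base AS b
    WHERE NOT EXISTS (
        SELECT 1 FROM subset_combinations AS s
        WHERE s.acct_num = b.acct_num
          AND s.test_id = b.test_id
          AND s.tran_date IS b.tran_date
          AND s.actual_amt = b.actual_amt)
    ORDER BY b.rowid
    """
).fetchall()
for row in base_only:
    print(row)
```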

Related

How to rank students by total marks across different subjects in SQL

Here is our table
name math physics chemistry hindi english
pk 85 65 45 54 40
ashis 87 44 87 78 74
rohit 77 47 68 63 59
mayank 91 81 78 47 84
komal 47 51 73 61 55
We want the result to look like this (essentially summing the grades):
rank name total
1 mayank 381
2 ashis 370
3 rohit 314
4 pk 289
5 komal 287

SET @rank=0;
SELECT @rank:=@rank+1 AS rank,name,(math+physics+chemistry+hindi+english) AS total
FROM tablename ORDER BY total DESC;
This will produce your desired result:
rank | name | total
--------------------
1 | mayank | 381
2 | ashis | 370
For more details, take a look at MySQL ranking results.

Try this
SELECT @curRank := @curRank + 1 AS rank, name, (math + physics + chemistry + hindi + english) AS total FROM tablename, (SELECT @curRank := 0) r ORDER BY total DESC;
This sums all the subject columns, sorts the rows by total in descending order, and adds a rank.
By doing SELECT @curRank := 0 you can keep it all in one SQL statement without having to do a SET first.
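To check the totals and the ordering without a MySQL server, here is a small sketch in Python with sqlite3. It is not the user-variable technique itself: the database only sorts by total, and the rank counter is emulated in Python with enumerate. The table name tablename is taken from the answers above; everything else is an assumption for illustration.

```python
import sqlite3

# Marks table from the question.
MARKS = [
    ("pk", 85, 65, 45, 54, 40),
    ("ashis", 87, 44, 87, 78, 74),
    ("rohit", 77, 47, 68, 63, 59),
    ("mayank", 91, 81, 78, 47, 84),
    ("komal", 47, 51, 73, 61, 55),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (name, math, physics, chemistry, hindi, english)")
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?)", MARKS)

# Sum the five subjects per student and sort by the total, highest first.
rows = conn.execute(
    "SELECT name, math + physics + chemistry + hindi + english AS total "
    "FROM tablename ORDER BY total DESC"
).fetchall()

# Number the sorted rows starting at 1, like the incrementing counter.
ranked = [(rank, name, total) for rank, (name, total) in enumerate(rows, start=1)]
for entry in ranked:
    print(entry)
```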

Calculate SUM column-wise using a dynamic query in SQL Server

id varname 1area 2area 3area 4area
------------------------------------
1 abc 345 3.7 34 87
1 pqr 46 67 78 55
1 lmn 67 99 33 44
2 xyz 78 78 33 32
I need a query that calculates the SUM of each column.
Is it possible to get the column count using a while loop?

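You probably want a SUM() of each Narea column, grouped by your id column. Because the column names start with a digit, they have to be delimited with square brackets in SQL Server:
select id, sum([1area]),
sum([2area]), sum([3area]), sum([4area])
from tbl1
group by id;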
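As for the "dynamic" part of the question, one way to avoid hard-coding the column list is to read the column names from the catalog and build the SUM list from them. The sketch below does this in Python with sqlite3, not SQL Server: PRAGMA table_info is SQLite-specific, the double-quoted identifiers are SQLite/standard SQL syntax, and the "ends with area" filter and the sum_ aliases are assumptions based on the sample table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tbl1 (id INTEGER, varname TEXT, '
    '"1area" REAL, "2area" REAL, "3area" REAL, "4area" REAL)'
)
conn.executemany(
    "INSERT INTO tbl1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "abc", 345, 3.7, 34, 87),
        (1, "pqr", 46, 67, 78, 55),
        (1, "lmn", 67, 99, 33, 44),
        (2, "xyz", 78, 78, 33, 32),
    ],
)

# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
area_cols = [
    row[1] for row in conn.execute("PRAGMA table_info(tbl1)") if row[1].endswith("area")
]

# Build one SUM(...) per area column; names come from the schema, not user input.
sum_list = ", ".join(f'SUM("{col}") AS "sum_{col}"' for col in area_cols)
query = f"SELECT id, {sum_list} FROM tbl1 GROUP BY id ORDER BY id"

totals = conn.execute(query).fetchall()
print(len(area_cols), "area columns")
for row in totals:
    print(row)
```

No while loop is needed to count the columns: the length of area_cols is the column count.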

How do I merge by more than one variable using proc SQL in SAS

I have 2 datasets in SAS:
main_1
ID Rep Dose Response
1 2 34 567
1 1 45 756
2 1 35 456
3 1 56 345
main_2
ID Rep Hour Day
1 1 89 157
2 1 62 365
3 1 12 689
I can easily merge these 2 datasets first by ID and then by Rep (as one of the IDs has two observations) with the following code in SAS:
proc import out=main_1
datafile='/folders/myfolders/sasuser.v94/main_1.xls'
dbms=xls replace;
/*optional*/
sheet='Sheet1';
getnames=yes;
run;
proc import out=main_2
datafile='/folders/myfolders/sasuser.v94/main_2.xls'
dbms=xls replace;
/*optional*/
sheet='Sheet1';
getnames=yes;
run;
/*merge datasets based on common variable (ID then Rep)*/
/*first sort all datasets by target variables*/
proc sort data=main_1;
by ID Rep;
proc sort data=main_2;
by ID Rep;
run;
/*can now be merged*/
data main_merge;
merge main_1 main_2;
by ID Rep;
run;
this produces the following table:
ID Rep Dose Response Hour Day
1 1 45 756 89 157
1 2 34 567 . .
2 1 35 456 62 365
3 1 56 345 12 689
I currently have the following proc SQL alternative (I am learning, so sorry if it's terrible) but cannot seem to merge by more than 1 variable (i.e. ID and Rep):
proc sql;
create table merged_sql as
select L.*, R.*
from main_1 as L
LEFT JOIN main_2 as R
on L.ID = R.ID;
quit;
producing the following:
ID Rep Dose Response Hour Day
1 2 34 567 89 157
1 1 45 756 89 157
2 1 35 456 62 365
3 1 56 345 12 689
Any suggestions for proc SQL code that produces the same table as before? My current code adds the '89 157' to both ID=1 observations.
Many thanks.

You're almost there...
proc sql;
create table merged_sql as
select L.*,
R.HOUR,
R.DAY
from main_1 as L
LEFT JOIN main_2 as R
on L.ID = R.ID
and L.REP = R.REP;
quit;
The reason not to use R.* is to avoid a note or warning about having duplicate ID and REP fields.
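Here is a quick way to check the two-key join on the sample data, sketched in Python with sqlite3 rather than PROC SQL (the in-memory tables are an assumption; the data and the join condition are the ones above). The ORDER BY is only there to make the output order match the data step merge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_1 (ID, Rep, Dose, Response)")
conn.execute("CREATE TABLE main_2 (ID, Rep, Hour, Day)")
conn.executemany(
    "INSERT INTO main_1 VALUES (?, ?, ?, ?)",
    [(1, 2, 34, 567), (1, 1, 45, 756), (2, 1, 35, 456), (3, 1, 56, 345)],
)
conn.executemany(
    "INSERT INTO main_2 VALUES (?, ?, ?, ?)",
    [(1, 1, 89, 157), (2, 1, 62, 365), (3, 1, 12, 689)],
)

# Join on both keys so Hour and Day attach only to the matching Rep.
merged = conn.execute(
    """
    SELECT L.*, R.Hour, R.Day
    FROM main_1 AS L
    LEFT JOIN main_2 AS R
      ON L.ID = R.ID
     AND L.Rep = R.Rep
    ORDER BY L.ID, L.Rep
    """
).fetchall()
for row in merged:
    print(row)
```

The row with ID=1, Rep=2 has no partner in main_2, so its Hour and Day come back as NULL (None in Python), which corresponds to the missing values in the merged SAS table.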

Diabetes: Prediction using discrimination on PCA

I have a table:
col1 col2
2 20
2.5 25
2.67 30
2.99 40
I'm looking to get
varone = 2 x col2, vartwo = 2.5 x col2, varthree = 2.67 x col2, varfour = 2.99 x col2
i.e. extracting a specific value from a table
and then multiplying an entire column by that value (scalar x vector).
I tried transposing col1
col1a col1b col1c col1d col2
2     2.5   2.67  2.99  20
.     .     .     .     25
.     .     .     .     30
.     .     .     .     40
and then tried multiplying col1a x col2, but it didn't seem to work.

In SAS, you can just use proc sql with the multipliers written out:
proc sql;
select 2*col2 as varone, 2.5*col2 as vartwo, 2.67*col2 as varthree, 2.99*col2 as varfour
from atable;
quit;

Assuming you're using SAS and either PROC FACTOR or PROC PRINCOMP then you can use PROC SCORE.
Example straight from the documentation:
http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_score_sect017.htm
/* This data set contains only the first 12 observations */
/* from the full data set used in the chapter on PROC REG. */
data Fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;
proc factor data=Fitness outstat=FactOut
method=prin rotate=varimax score;
var Age Weight RunTime RunPulse RestPulse;
title 'Factor Scoring Example';
run;
proc print data=FactOut;
title2 'Data Set from PROC FACTOR';
run;
proc score data=Fitness score=FactOut out=FScore;
var Age Weight RunTime RunPulse RestPulse;
run;
proc print data=FScore;
title2 'Data Set from PROC SCORE';
run;

You can use an array to achieve this.
The program below is dynamic: it works for any number of observations.
****data we have****;
data have;
input col1 col2;
datalines;
2 20
2.5 25
2.67 30
2.99 40
;
run;
****Taking Count****;
****Creating macro "value" to store col1 data****;
proc sql ;
select count(*) into :cnt_rec from have;
select col1 into :value1 - :value&SysMaxLong from have;
quit;
data want(drop=i);
set have;
array NewColumn(&cnt_rec);
****processing the array and multiplying col2 data****;
do i = 1 to &cnt_rec;
NewColumn[i] = symget('value'||left(i)) * col2;
end;
run;
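To make clear what the array step computes, here is the same calculation sketched in plain Python (not SAS): every value of col1 is used as a scalar and multiplied by every value of col2, giving one new column per col1 value. The variable names are assumptions for illustration.

```python
# Sample data from the question.
col1 = [2, 2.5, 2.67, 2.99]
col2 = [20, 25, 30, 40]

# One output row per col2 value, one new column per col1 value,
# mirroring NewColumn[i] = value_i * col2 in the data step.
want = [[scalar * value for scalar in col1] for value in col2]

for value, new_columns in zip(col2, want):
    print(value, new_columns)
```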

SQL SERVER: data insert

Sorry guys! I just don't know what to do with this task.
The data is the same as in this question here, but what should I do when I insert new data into big_table.bench_id and want it to be visible in the BATCH_ID table as well?
I have tried to link the tables with keys, but big_table already has a primary key, so I don't know what to do. Any advice will be appreciated.
Big_table.bench_id:
**bench_id**
31
51
51
61
61
61
71
71
I have created another BATCH_ID table with two columns:
**distinct_bench** **number**
-----------------------------
31 1
51 2
61 3
71 2
So, for example, if I add a new code such as '111' to big_table.bench_id:
**bench_id**
31
51
51
61
61
61
71
71
111
then it should also appear in the other table:
**distinct_bench** **number**
-----------------------------
31 1
51 2
61 3
71 2
111 1

Do you really need another table? You can create a view to achieve that.
create table xxTemp (bench_id int) ;
insert into xxTemp (bench_id)
values (31)
,(51)
,(51)
,(61)
,(61)
,(61)
,(71)
,(71) ;
create view xxTempCount as
Select bench_id
, COUNT(1) number
From xxTemp
Group By bench_id ;
select *
from xxTempCount ;
insert into xxTemp (bench_id)
values (111) ;
select *
from xxTempCount ;
Elmer

Instead of creating a table for that purpose, you can create a view that returns the desired information. For example, try the following:
CREATE VIEW vwBigTable
AS
SELECT bench_id AS [**distinct_bench**], COUNT(*) AS [**number**]
FROM big_table
GROUP BY bench_id
And then:
SELECT * FROM vwBigTable
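To see that a view picks up new rows without any extra bookkeeping, here is a small sketch in Python with sqlite3 instead of SQL Server. The view name is taken from the answer above; the plain column aliases and the in-memory database are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (bench_id INTEGER)")
conn.executemany(
    "INSERT INTO big_table VALUES (?)",
    [(b,) for b in (31, 51, 51, 61, 61, 61, 71, 71)],
)

# The view stores the query, not the data, so it is re-evaluated on every read.
conn.execute(
    """
    CREATE VIEW vwBigTable AS
    SELECT bench_id AS distinct_bench, COUNT(*) AS number
    FROM big_table
    GROUP BY bench_id
    """
)

query = "SELECT * FROM vwBigTable ORDER BY distinct_bench"
before = conn.execute(query).fetchall()

# Insert a new code into the base table only.
conn.execute("INSERT INTO big_table VALUES (111)")
after = conn.execute(query).fetchall()

print(before)
print(after)
```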
