SAS Transpose and summarize - sql

I'm working on the following scenario in SAS.
Input 1
AccountNumber Loans
123 abc, def, ghi
456 jkl, mnopqr, stuv
789 w, xyz
Output 1
AccountNumbers Loans
123 abc
123 def
123 ghi
456 jkl
456 mnopqr
456 stuv
789 w
789 xyz
Input 2
AccountNumbers Loans
123 15-abc
123 15-def
123 15-ghi
456 99-jkl
456 99-mnopqr
456 99-stuv
789 77-w
789 77-xyz
Output 2
AccountNumber Loans
123 15-abc, 15-def, 15-ghi
456 99-jkl, 99-mnopqr, 99-stuv
789 77-w, 77-xyz
I managed to get Input 2 from Output 1; I just need Output 2 now.
I would really appreciate the help.
Thanks!

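Try this, replacing [Input 2] with the actual name of your Input 2 table. The table must already be sorted by accountnumbers for the BY statement to work.
data output2 (drop=loans);
    /* Read every row of one account, then write a single combined row */
    do until (last.accountnumbers);
        set [Input 2];
        by accountnumbers;
        /* $100 caps the combined string; increase it if values get truncated */
        length loans_combined $100;
        /* catx skips the empty initial value and joins with ', ' */
        loans_combined = catx(', ', loans_combined, loans);
    end;
run;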
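
To check the grouping logic outside SAS, here is a minimal Python sketch of the same idea: walk the rows in account order and join each account's loans with ', '. The function name `combine_loans` and the in-memory row list are illustrative assumptions, not part of the SAS answer.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from "Input 2". They are already sorted by account
# number, which mirrors the sorted-input requirement of the SAS BY statement.
rows = [
    ("123", "15-abc"), ("123", "15-def"), ("123", "15-ghi"),
    ("456", "99-jkl"), ("456", "99-mnopqr"), ("456", "99-stuv"),
    ("789", "77-w"), ("789", "77-xyz"),
]


def combine_loans(sorted_rows):
    """Return one (account, 'loan, loan, ...') pair per account."""
    return [
        (account, ", ".join(loan for _, loan in group))
        for account, group in groupby(sorted_rows, key=itemgetter(0))
    ]


for account, loans in combine_loans(rows):
    print(account, loans)
```

Like the DATA step, `groupby` only merges adjacent rows, so unsorted input would produce several partial rows per account.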

Related

How find duplicates of the same id but different date with SQL (Oracle)

I have a data table like this:
id line datedt
123 1 01/01/2021
123 2 01/01/2021
123 3 01/01/2021
777 1 13/04/2020
777 2 13/04/2020
123 1 12/04/2021
123 2 12/04/2021
888 1 01/07/2020
888 2 01/07/2020
452 1 05/01/2020
888 1 02/05/2021
888 2 02/05/2021
I'd like to obtain a result like the one below, i.e. the number of different dates for each id.
For example, 123 appears with 2 different dates, and so does 888, but 777 and 452 each have only one date.
id nb
123 2
777 1
888 2
452 1
How can I get that?
I hope it's clear :)
Thank you
select id, count(distinct datedt) as nb
from your_table -- placeholder: use your actual table name (TABLE itself is a reserved word)
group by id
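
As a quick way to try the query on the sample data, here is a small sketch that runs it in an in-memory SQLite database from Python. SQLite stands in for Oracle here, the table name `demo` is a placeholder, dates are stored as plain text, and the `ORDER BY` is added only to make the printed order predictable.

```python
import sqlite3

# Sample rows from the question: (id, line, datedt).
rows = [
    (123, 1, "01/01/2021"), (123, 2, "01/01/2021"), (123, 3, "01/01/2021"),
    (777, 1, "13/04/2020"), (777, 2, "13/04/2020"),
    (123, 1, "12/04/2021"), (123, 2, "12/04/2021"),
    (888, 1, "01/07/2020"), (888, 2, "01/07/2020"),
    (452, 1, "05/01/2020"),
    (888, 1, "02/05/2021"), (888, 2, "02/05/2021"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, line INTEGER, datedt TEXT)")
conn.executemany("INSERT INTO demo VALUES (?, ?, ?)", rows)

# One row per id, counting how many distinct dates that id appears with.
result = conn.execute(
    "SELECT id, COUNT(DISTINCT datedt) AS nb FROM demo GROUP BY id ORDER BY id"
).fetchall()
conn.close()

for id_, nb in result:
    print(id_, nb)
```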

How to count the distinct values across a column in pandas

I have a dataframe like:
Company Date Country
ABC 2017-09-17 USA
BCD 2017-09-16 USA
ABC 2017-09-17 USA
BCD 2017-09-16 USA
BCD 2017-09-16 USA
ABC 2017-09-19 USA
I want to get a resulting DataFrame like this:
Company No. of distinct days
ABC 2
BCD 1
How do I do it?
This should work:
df[['Company', 'Date']].drop_duplicates()['Company'].value_counts()
You can use the nunique method of the groupby object:
df.groupby('Company')['Date'].nunique()
Out:
Company
ABC 2
BCD 1
Name: Date, dtype: int64

Pig script to transpose on the basis of certain criteria

I have a file containing data in the following format:
abc 123 456
cde 45 32
efg 322 654
abc 445 856
cde 65 21
efg 147 384
abc 815 078
efg 843 286
and so on.
How can I transpose it into the following format using Pig:
abc 123 456 cde 45 32 efg 322 654
abc 445 856 cde 65 21 efg 147 384
abc 815 078 efg 843 286
Also, if cde is missing after abc, blank spaces should be inserted in its place, since this is a fixed-width file.
I tried grouping, but it didn't work for me.
One option is to write a custom loader. The simplest approach is to extend PigStorage and override its getNext() method so that it reads from the record reader three times instead of once and returns the three records combined into a single Tuple.
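
A fixed count of three reads would misalign the rows whenever a record such as cde is missing, so the loader also needs to decide where each output row starts. The Python sketch below only illustrates that row-building logic; it is not Pig or loader code. The key order `KEY_ORDER`, the width `RECORD_WIDTH`, and the function name `transpose` are assumptions made for the example.

```python
# Hypothetical layout: every record is padded to RECORD_WIDTH characters and
# each output row holds the keys in this fixed order.
KEY_ORDER = ["abc", "cde", "efg"]
RECORD_WIDTH = 11

sample = [
    "abc 123 456", "cde 45 32", "efg 322 654",
    "abc 445 856", "cde 65 21", "efg 147 384",
    "abc 815 078", "efg 843 286",
]


def transpose(lines):
    """Merge consecutive records into rows, blank-padding missing keys."""
    rows, current, last_pos = [], {}, -1
    for line in lines:
        key = line.split()[0]
        pos = KEY_ORDER.index(key)
        # A key that does not come later in KEY_ORDER starts a new row.
        if pos <= last_pos:
            rows.append(current)
            current = {}
        current[key] = line
        last_pos = pos
    if current:
        rows.append(current)
    # Missing keys become RECORD_WIDTH blanks so columns stay aligned.
    return [
        " ".join(row.get(k, "").ljust(RECORD_WIDTH) for k in KEY_ORDER)
        for row in rows
    ]


for row in transpose(sample):
    print(repr(row))
```

In a real loader the same decision (start a new tuple when the key order wraps around) would sit inside getNext(), with one record of lookahead.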

AWK Remove lines that have the same value in fields 1,...,n but differ in the value of the n+1 field except if they are unique

I have file1.tsv which looks like this:
1 ABC 10 XYZ Null Null
1 ABC 10 XYZ 1000 FFGG
1 ABC 10 XYZ 1001 FFHH
2 DEF 11 UVW Null Null
3 GHI 30 RST Null Null
3 GHI 30 RST 1002 JJKK
3 GHI 30 RST 1003 JJLL
I would like awk to print to file2.tsv the output:
1 ABC 10 XYZ 1000 FFGG
1 ABC 10 XYZ 1001 FFHH
2 DEF 11 UVW Null Null
3 GHI 30 RST 1002 JJKK
3 GHI 30 RST 1003 JJLL
That is, remove (do not print) line 1 and line 5, because they are not unique when considering ONLY the values of fields $1-$4 and because $5="Null" and $6="Null".
Thanks in advance.
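Don't print lines that contain ABC or GHI and also contain Null. Note that this hard-codes the two keys that are duplicated in the sample data.
awk '!(/ABC|GHI/ && /Null/)' file1.tsv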
1 ABC 10 XYZ 1000 FFGG
1 ABC 10 XYZ 1001 FFHH
2 DEF 11 UVW Null Null
3 GHI 30 RST 1002 JJKK
3 GHI 30 RST 1003 JJLL
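
The one-liner above names ABC and GHI explicitly, so it only fits this sample. A more general version of the rule in the question needs two passes: count how often each combination of fields 1-4 occurs, then drop a Null/Null row only when its combination occurs more than once. Here is a minimal Python sketch of that logic, assuming tab-separated input; the function name `drop_redundant_null_rows` is made up for the example.

```python
from collections import Counter

# Sample lines from file1.tsv, assumed to be tab-separated.
sample = [
    "1\tABC\t10\tXYZ\tNull\tNull",
    "1\tABC\t10\tXYZ\t1000\tFFGG",
    "1\tABC\t10\tXYZ\t1001\tFFHH",
    "2\tDEF\t11\tUVW\tNull\tNull",
    "3\tGHI\t30\tRST\tNull\tNull",
    "3\tGHI\t30\tRST\t1002\tJJKK",
    "3\tGHI\t30\tRST\t1003\tJJLL",
]


def drop_redundant_null_rows(lines, key_fields=4):
    """Drop Null/Null rows whose first key_fields fields occur more than once."""
    parsed = [line.split("\t") for line in lines]
    # First pass: count how often each key (fields 1-4) occurs.
    counts = Counter(tuple(f[:key_fields]) for f in parsed)
    # Second pass: keep a row unless it is a Null/Null row with a repeated key.
    return [
        "\t".join(f)
        for f in parsed
        if not (
            f[4] == "Null"
            and f[5] == "Null"
            and counts[tuple(f[:key_fields])] > 1
        )
    ]


for line in drop_redundant_null_rows(sample):
    print(line)
```

The DEF row survives because its key occurs only once, which is the "except if they are unique" condition from the title.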

SQL - How to return entire row sets where some rows match a given list

Let's say there is a table of medical records. Each visit has a unique ID but is made up of several rows corresponding to various codes/services rendered for the visit.
For example, there could be 3 rows with claimID "John" for each unique procedure code "123", "456", and "789"; 15 rows for "Jane" with codes; 6 rows for "David"...
ID Code
John 123
John 456
John 789
Jane 123
Jane 456
Jane 789
Jane 321
Jane 654
David 123
David 456
David 789
David 987
I have a list of 50 unique procedure codes and want to return the entire set of claim lines (i.e. all rows of "John") where any combination of these 50 codes has been billed together, but not a code with itself ("123" with "321", but not "123" with "123"). If "123" is in my list of 50 but "456" and "789" are not, the query should not return the set of "John" claims, since only one of my 50 codes is present. I hope this makes sense.
Positive Result Codes
123
321
987
The query should return all 5 Jane rows (123 and 321) and all 4 David rows (123 & 987).
ID Code
Jane 123
Jane 456
Jane 789
Jane 321
Jane 654
David 123
David 456
David 789
David 987
Try this code:
;WITH Visits AS (
    SELECT claimID, COUNT(DISTINCT Code) AS CNT
    FROM tbl_Visits
    WHERE Code IN (123, 321, 987) -- your list of target codes
    GROUP BY claimID
    HAVING COUNT(DISTINCT Code) > 1 -- at least two different target codes
)
SELECT *
FROM tbl_Visits
WHERE claimID IN (SELECT claimID FROM Visits);
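
To see the query behave as described on the sample rows, the sketch below runs it in an in-memory SQLite database from Python. SQLite is only a stand-in for the asker's database, and the table and column names (`tbl_Visits`, `claimID`, `Code`) are taken from the answer rather than from a known schema.

```python
import sqlite3

# Sample rows from the question: (claimID, Code).
rows = [
    ("John", 123), ("John", 456), ("John", 789),
    ("Jane", 123), ("Jane", 456), ("Jane", 789), ("Jane", 321), ("Jane", 654),
    ("David", 123), ("David", 456), ("David", 789), ("David", 987),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_Visits (claimID TEXT, Code INTEGER)")
conn.executemany("INSERT INTO tbl_Visits VALUES (?, ?)", rows)

# Find claims with at least two different target codes, then return
# every row of those claims.
query = """
WITH Visits AS (
    SELECT claimID
    FROM tbl_Visits
    WHERE Code IN (123, 321, 987)
    GROUP BY claimID
    HAVING COUNT(DISTINCT Code) > 1
)
SELECT claimID, Code
FROM tbl_Visits
WHERE claimID IN (SELECT claimID FROM Visits)
"""
result = conn.execute(query).fetchall()
conn.close()

for claim_id, code in result:
    print(claim_id, code)
```

John is excluded because only one of his codes (123) is in the target list, so his distinct count is 1.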