SAS proc summary types alternative in Pandas

I'm searching for a pandas alternative to the TYPES statement in SAS PROC SUMMARY.
Simply put, each line in TYPES is an individual groupby.
The output is one table containing the summaries for all requested types.
A loop over the different groupby inputs could be a solution, but is there a simpler way to do this in pandas?
proc summary data=inDS;
class A B C D E;
types
A * B * C
A * B * D
A * B * E
A * B
A
;
var sales;
output out=outDS sum= ;
run;
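One way to express the TYPES list in pandas is the loop the question mentions, wrapped once in a helper. This is a minimal sketch, not a built-in pandas feature. It assumes a DataFrame with class columns A to E and a numeric sales column. The helper name summary_types and the _type_ label column are hypothetical; SAS's own _TYPE_ variable is numeric, so _type_ only loosely mirrors it. Class columns that a given type does not use come out as NaN, much like the missing class values in the OUT= data set.

```python
import pandas as pd

# Each tuple corresponds to one line of the SAS TYPES statement.
TYPES = [
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("A", "B", "E"),
    ("A", "B"),
    ("A",),
]


def summary_types(df: pd.DataFrame, types, var: str = "sales") -> pd.DataFrame:
    """Run one groupby-sum per requested type and stack the results."""
    parts = []
    for cols in types:
        part = df.groupby(list(cols), as_index=False)[var].sum()
        part["_type_"] = "*".join(cols)  # hypothetical label, e.g. "A*B*C"
        parts.append(part)
    out = pd.concat(parts, ignore_index=True)
    # Class columns first (in order of first use), then the label and the sum.
    class_cols = list(dict.fromkeys(c for cols in types for c in cols))
    return out.reindex(columns=class_cols + ["_type_", var])


if __name__ == "__main__":
    in_ds = pd.DataFrame({
        "A": ["a1", "a1", "a2"],
        "B": ["b1", "b1", "b1"],
        "C": ["c1", "c2", "c1"],
        "D": ["d1", "d1", "d2"],
        "E": ["e1", "e1", "e1"],
        "sales": [10, 20, 5],
    })
    print(summary_types(in_ds, TYPES))
```

Keeping the type list as plain data means adding or removing a "line" of TYPES is a one-tuple change, and the result stays a single table as in outDS.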

Related

How to read a two-dimensional Excel parameter in GAMS?

I have a GAMS model and I want to read sets and parameters from Excel into GAMS, as shown below:
How can I read this parameter in GAMS?
Thanks
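For that table you need two indexes (i.e. sets): for example, set i for the column holding a, b and c, and set j for the row holding d, e and f. Try this:
* i = row labels (a, b, c), j = column labels (d, e, f)
Set i / a, b, c /;
Set j / d, e, f /;
Parameter d(i,j) "Data with rows a, b and c and columns d, e and f";
* Convert the Excel range to GDX: row labels in column C, column labels in row 1
$Call GDXXRW.exe i=C:\Input.xlsx o=C:\Input.gdx par=d rng=Sheet1!C1:F4 Rdim=1 Cdim=1
$GDXIN C:\Input.gdx
$LOAD d
$GDXIN
Display d;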

Find continuity of elements in Pig

How can I find how many consecutive rows each value of a field occupies, and its starting position?
The input looks like this:
A-1
B-2
B-3
B-4
C-5
C-6
The output I want is:
A,1,1
B,3,2
C,2,5
Thanks.
Assuming you do not have discontinuous data with respect to a value (that is, each value occupies a single run of rows), you can get the desired results by first grouping on value and then using COUNT and MIN to get continuous_counts and start_index respectively.
A = LOAD 'data' USING PigStorage('-') AS (value:chararray, index:int);
B = FOREACH (GROUP A BY value) GENERATE
    group AS value,
    COUNT(A) AS continuous_counts,
    MIN(A.index) AS start_index;
STORE B INTO 'output' USING PigStorage(',');
If your data can be discontinuous, the solution is no longer trivial in native Pig, and you might need to write a UDF for that purpose.
Group and count the number of rows per value to get continuous_counts, i.e.:
A,1
B,3
C,2
Get the first row (lowest index) for each value, i.e.:
A,1
B,2
C,5
Join the above two relations and get the desired output.
A = LOAD 'data.txt' USING PigStorage('-') AS (value:chararray, index:int);
B = GROUP A BY value;
-- run length per value
C = FOREACH B GENERATE group AS value, COUNT(A) AS continuous_counts;
-- lowest index per value: sort the inner bag A and keep one tuple
D = FOREACH B {
    ordered = ORDER A BY index;
    first = LIMIT ordered 1;
    GENERATE group AS value, FLATTEN(first.index) AS index;
}
E = JOIN C BY value, D BY value;
F = FOREACH E GENERATE C::value, C::continuous_counts, D::index;
DUMP F;
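The UDF route mentioned in the first answer needs run-length logic that treats each consecutive block of rows as its own run. The sketch below illustrates that logic in Python rather than Pig; it is not a drop-in Pig UDF. It assumes the rows arrive already sorted by index. itertools.groupby merges only adjacent equal keys, so a value that reappears after a gap starts a new run. The function name runs is hypothetical.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def runs(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Return (value, run_length, start_index) for each run of consecutive rows.

    Assumes rows are sorted by index. A value that reappears after a gap
    starts a new run instead of being merged with its earlier rows.
    """
    result = []
    for value, group in groupby(rows, key=lambda row: row[0]):
        indexes = [index for _, index in group]
        result.append((value, len(indexes), indexes[0]))
    return result


if __name__ == "__main__":
    data = [("A", 1), ("B", 2), ("B", 3), ("B", 4), ("C", 5), ("C", 6)]
    for value, count, start in runs(data):
        print(f"{value},{count},{start}")
```

On the sample input this reproduces the desired A,1,1 / B,3,2 / C,2,5. Unlike GROUP BY, it reports a repeated value such as A, B, A as three separate runs.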

Using Pig, how do I parse and compare a grouped item?

I have
A B
a, d
a, e
a, y
z, v
z, k
z, o
and so on.
Column B is of type chararray and contains key-value pairs separated by &.
For example - d = 'abc=1&c=1&p=success'
What I want to figure out:
Suppose:
d = 'abc=1&c=1&xyz=23423423'
e = 'xyz=1&it=ssd'
y = 'abc=1&c=1&p=success'
For every 'a', I want to find out whether it has values in column B that share the same value of abc and have c=1 and p=success. I also want to extract the values of abc and c from d and y.
For instance, take the example above:
d contains abc=1 and c=1
y contains abc=1 and p=success
So this satisfies what I am looking for, i.e. for a given 'a' I have the same value of abc, and c=1 and p=success.
I started by grouping my data:
grouped = GROUP data BY A;
which gives me
a, (a,d)(a,e)(a,y)
z, (z,v)(z,k)(z,o)
But after this I don't know how to compare the data within each group so that the above condition is satisfied.
Any help on this is appreciated.
Please let me know if I should clarify my question further.
Since you are only concerned with some of the fields in the query string (I assume that's what it is), you will want to split the data with a FOREACH and STRSPLIT. Flatten it so you have something that looks like this:
(a, b), where b is a single key/value pair from the query, e.g. abc=1
Filter out the key/value pairs you don't care about, join them back together, and then group by the combined key/value pairs. That will give you a list of every a with the same b, where b only contains abc=X, c=1 and p=success.
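The sketch below is a rough Python analogue (not Pig) of the split / filter / regroup idea, run on the sample strings from the question. It assumes the condition is this: within one value of A, the rows that share an abc value must include c=1 in at least one row and p=success in at least one row. The standard-library parse_qsl does the '&' and '=' splitting that STRSPLIT and FLATTEN would do in Pig. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple
from urllib.parse import parse_qsl

KEEP = {"abc", "c", "p"}  # the only keys the condition looks at


def wanted_pairs(query: str) -> Dict[str, str]:
    """Split 'k=v&k=v' and keep only the keys we care about."""
    return {key: value for key, value in parse_qsl(query) if key in KEEP}


def matching_groups(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return {A: abc values} where the rows sharing that abc value
    contain c=1 in some row and p=success in some row."""
    flags = defaultdict(lambda: defaultdict(set))  # A -> abc -> flags seen
    for a, query in rows:
        pairs = wanted_pairs(query)
        if "abc" not in pairs:
            continue  # e.g. 'xyz=1&it=ssd' cannot take part in a match
        seen = flags[a][pairs["abc"]]
        if pairs.get("c") == "1":
            seen.add("c=1")
        if pairs.get("p") == "success":
            seen.add("p=success")
    result = {}
    for a, by_abc in flags.items():
        hits = {abc for abc, seen in by_abc.items() if seen == {"c=1", "p=success"}}
        if hits:
            result[a] = hits
    return result


if __name__ == "__main__":
    rows = [
        ("a", "abc=1&c=1&xyz=23423423"),  # d
        ("a", "xyz=1&it=ssd"),            # e
        ("a", "abc=1&c=1&p=success"),     # y
    ]
    print(matching_groups(rows))
```

Grouping by (A, abc) plays the role of "group by the combined key/value pairs". The abc values in the result are the extracted values the question asks for.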

SAS: Changing multiple variable names

This is my current issue:
I have 53 variable headers in a SAS data set that need to be changed, for example:
Current_Week_0 TS | Current_Week_1 TS | Current_Week_2 TS -- etc.
I need to rename them so that Current_Week_# TS becomes Current_Week_#, dropping the TS.
Is there a way to automate this, for example with a loop like:
i = 0,53
Current_week_i TS = Current_Week_i ?
I just don't understand the proper syntax.
Edit:
I used the following code, but I get the following error:
Missing numeric suffix on a numbered variable list (TS-Current_Week_53)
DATA True_Start_8;
SET True_Start_7;
ARRAY oldnames (53) Current_Week_1 TS-Current_Week_53 TS;
ARRAY newnames (53) Current_Week_1-Current_Week_53;
DO i = 1 TO 53;
newnames(i) = oldnames(i) ;
END;
RUN;
Edit (for @Joe):
Here's what the data looks like before and after the "denorm"/transpose:
BEFORE
Product ID CurrentWeek Market TS
X 75av2kz Current_Week_0 Z 1
Y 7sav2kz Current_Week_0 Z 1
X 752v2kz Current_Week_1 Z 1
Y 255v2kz Current_Week_1 Z 1
AFTER
Product ID Market Current_Week_0_TS Current_Week_1_TS
X 75av2kz Z 1 0
Y 7sav2kz Z 1 1
X 752v2kz Z 1 1
Y 255v2kz Z 1 0
This isn't too hard. I assume these are variable labels.
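proc sql noprint;
  /* One relabel_nots(name, new label) call per qualifying variable.
     MEMNAME is stored in upper case in DICTIONARY.COLUMNS.
     Adjust the WHERE if the TS is in the names rather than the labels. */
  select cats('%relabel_nots(', name, ',', substr(label, 1, length(label)-3), ')')
    into :relabellist separated by ' '
    from dictionary.columns
    where libname='WORK' and memname='TRUE_START_7'
      and label like '% TS';
quit;
/* LABEL needs a quoted constant, so the trimmed label is passed in
   instead of being computed with VLABEL at run time.
   Assumes the labels contain no commas, quotes or macro triggers. */
%macro relabel_nots(name, newlabel);
  label &name.="&newlabel.";
%mend relabel_nots;
data want;
  set True_Start_7;
  &relabellist.;
run;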
Basically, the PROC SQL selects the names that qualify for relabelling and builds one large macro variable containing all of the relabel macro calls. The relabel_nots macro then generates the LABEL statement with the new label. You may need to change the WHERE logic in the PROC SQL depending on whether the TS appears in the variable names or only in the labels.
Another option is to do this in the transpose itself. Either your example data doesn't match the desired output, or there is some logic you haven't explained; the code below does the simple transpose. If there is a logical reason why the Current_Week_0/1 values in your output differ from those produced below, please explain it.
data have;
format currentWeek $20.;
input Product $ ID $ CurrentWeek $ Market $ TS;
datalines;
X 75av2kz Current_Week_0 Z 1
Y 7sav2kz Current_Week_0 Z 1
X 752v2kz Current_Week_1 Z 1
Y 255v2kz Current_Week_1 Z 1
;;;;
run;
proc sort data=have;
by market id product;
run;
proc transpose data=have out=want;
by market id product ;
id currentWeek;
var TS;
run;

SAS dynamic variable names from other variables

I am trying to create SAS variable names based on data contained within other variables. For example, I could start with
Obs Var1 Var2
1   abc  X
2   def  X
3   ghi  Y
4   jkl  X
and I would like to end up with
Obs Var1 Var2 X    Y
1   abc  X    abc
2   def  X    def
3   ghi  Y         ghi
4   jkl  X    jkl
I do have one way of doing this, but it requires somewhat ugly macros: first to create the variables needed (using a LENGTH statement), and then to create a whole series of numbered macro variables (one per observation) that are later referenced inside a DATA step loop. It works, but it is complicated, and I don't think it will scale well to the real data, which contain multiple variables to create per row and a few thousand rows.
I've also tried something with arrays: saving the variable names in a macro variable, using it to generate an ARRAY statement, and trying to keep track of which array index is needed for each new variable. That is also complicated.
What would really help would be something analogous to
vvaluex(var2)=var1
except vvaluex can't be on the left-hand side of an equals. Any thoughts or ideas?
PROC TRANSPOSE is a handy way to do the example in the question.
data have;
input Obs Var1 $ Var2 $;
datalines;
1 abc X
2 def X
3 ghi Y
4 jkl X
;;;;
run;
proc transpose data=have out=want;
by obs;
id var2;
var var1;
copy var1 var2;
run;
Another option is probably similar to what you've tried before, using arrays and VNAME:
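proc sql noprint;
  /* DISTINCT so each new variable name is declared only once */
  select distinct var2 into :var2list separated by ' ' from have;
quit;
data want;
  set have;
  array newvars $ &var2list;
  drop _t;
  do _t = 1 to dim(newvars);
    /* put Var1 into the array element whose name matches Var2 */
    if vname(newvars[_t]) = Var2 then do;
      newvars[_t] = var1;
      leave;
    end;
  end;
run;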
PROC TRANSPOSE should be faster and is probably more flexible, but this might work better for some purposes.
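For comparison with pandas (the library the first question on this page asks about), the same reshaping can be sketched with DataFrame.pivot and a join back onto the original rows. The target is one new column per distinct Var2 value, holding Var1. This is an analogue of the PROC TRANSPOSE answer, not SAS code. It assumes Obs uniquely identifies each row, and the helper name spread is hypothetical.

```python
import pandas as pd


def spread(df: pd.DataFrame, index: str = "Obs", key: str = "Var2", value: str = "Var1") -> pd.DataFrame:
    """Add one column per distinct value of `key`, filled with `value`.

    Roughly what BY obs / ID var2 / VAR var1 / COPY var1 var2 does in
    PROC TRANSPOSE; cells with no matching row are left missing (NaN).
    """
    # pivot requires each (index, key) pair to occur only once
    wide = df.pivot(index=index, columns=key, values=value)
    wide.columns.name = None
    return df.join(wide, on=index)


if __name__ == "__main__":
    have = pd.DataFrame({
        "Obs": [1, 2, 3, 4],
        "Var1": ["abc", "def", "ghi", "jkl"],
        "Var2": ["X", "X", "Y", "X"],
    })
    print(spread(have))
```

As with PROC TRANSPOSE, the set of new columns is driven entirely by the data, so no list of names has to be built ahead of time.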