I am new to SAS and SQL. I have a task to create several similar columns whose names differ only by a number.
For example: | DATE | NAME | A1 | A2 | A3 | B |
So I code in SAS like this
PROC SQL;
CREATE TABLE TEST AS
SELECT DATE, NAME,
DO i = 1 to 3
0 AS A&i.,
END
1 as B
FROM SOURCE;
QUIT;
When I run, I got this error
Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, AND, EQ,
EQT, GE, GET, GT, GTT, LE, LET, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
I appreciate any kind of help. Thank you.
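I think you should use macro code to generate the column names from a loop counter. A plain DO loop is a data step statement and cannot appear inside PROC SQL, but a macro %DO loop generates the SQL text before the query runs.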
For example, in your case:
%macro create_table();
%local i; /* keep the loop counter local to this macro */
PROC SQL;
CREATE TABLE TEST AS
SELECT DATE, NAME,
%DO i = 1 %to 3;
0 AS A&i.,
%END;
1 as B
FROM SOURCE;
QUIT;
%mend create_table;
%create_table();
Output:
+-------+------+----+----+----+---+
| date | name | A1 | A2 | A3 | B |
+-------+------+----+----+----+---+
In addition, there is another way to complete the task. Use a data step instead of proc sql:
data test(drop=i);
set source;
array a{3};
do i=1 to 3;
a{i} = 0;
end;
b=1;
run;
Data step array statement can be used to create variables (columns in SQL terminology) and initialize them. The retain statement will initialize a variable (once) to a value that is maintained from output row to row. This is different from b=1; which will perform the value assignment as each row is processed.
data want;
set source;
array a(3) (3*0); /* initialization syntax, 3*0 means 3 zero (0) values */
retain b 1;
run;
Variables created by the array statement are named with the array name suffixed with a sequential number starting from 1. The name list syntax can be used in the array statement to create variables with a different base name and sequence range, such as
array a(3) x7-x9 (3*0);
I do this to get a TABLE like below
PROC FREQ data=projet.matchs;
TABLES circuit/ NOCUM;
run;
Circuit Fréquence Pourcentage
ATP 127 50.00
WTA 127 50.00
I need exactly the same except that I want "Male" instead of "ATP" and "Female" instead of "WTA".
So I guess there is a renaming function, but I don't know how to use it.
Thanks for the help
Note those are not "row variable names". They are the actual (or formatted) values of your variable CIRCUIT.
Looks like you want to create a custom format to change how the values in your variable are displayed.
proc format ;
value $gender 'ATP'='Male' 'WTA'='Female';
run;
Then tell the proc to use that format for your variable.
PROC FREQ data=projet.matchs;
TABLES circuit/ NOCUM;
format circuit $gender. ;
run;
I'm trying to resolve a datastep variable in the in() function. I have a dataset that looks like the following:
|Run|Sample Level|Samples Tested|
| 1 | 1 | 1-5 |
| 1 | 2 | 1-5 |
...etc
| 1 | 5 | 1-5 |
---------------------------------
| 2 | 1 | 1-4 |
| 2 | 2 | 1-4 |
The samples tested vary by run. Normally the only sample levels in the dataset are the ones in the range provided by "Samples Tested". However occasionally this is not the case, and it can get messy. For example the last one I worked on looked like this:
|Run|Sample Level|Samples Tested|
| 1 | 1 |2-9, 12-35, 37-40|
In this case I'd want to drop all rows with sample levels that were not included in Samples Tested, which I did by manually adding the code:
Data Want;
set Have;
if sample_level not in (2:9, 12:35, 37:40) then delete;
run;
But what I want to do is have this done automatically by looking at the samples tested column. It's easy enough to turn a "-" into a ":", but where I'm stuck is getting the IN() function to recognize or resolve a variable. I would like code that looks like this: if sample_level not in(Samples_Tested) then delete; where samples_tested has been transformed to be something that the IN() function can handle. I'm also not opposed to using proc sql; if anyone has a solution that they think will work. I know you can do things like
Proc sql; Create table want as select * from HAVE where Sample_Level in (Select Samples_Tested from Have); Quit;
But the problem is that the samples tested varies by run and there could be 16 different runs. Hopefully I've explained the challenge clearly enough. Thanks for taking the time to read this and thanks in advance for your help!
Assuming the value of SAMPLES_TESTED is constant for each value of RUN, you could use it to generate the selection criteria. For example, you could use a data _null_ step to write a WHERE statement to a file and then %include that code into another data step.
filename code temp;
data _null_;
file code;
if eof then put ';';
set have end=eof;
by run;
if first.run;
if _n_=1 then put 'where ' @ ;
else put ' or ' @ ;
samples_tested=translate(samples_tested,':','-');
put '(' run= 'and sample_level in (' samples_tested '))';
run;
data want;
set have;
%include code;
run;
Note: IN is an operator and not a function.
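If generating code feels heavy, the list can also be parsed row by row in a single data step. The following is only a sketch, assuming that SAMPLES_TESTED always holds comma-separated items of the form "low-high" or a single number, and that levels are never negative (a leading minus sign would be mistaken for the range separator). The sample data and the helper variables _i, _range, _lo, _hi and _keep are hypothetical.

```sas
/* Hypothetical sample data mirroring the layout in the question */
data have;
  infile datalines dlm='|' truncover;
  length samples_tested $40;
  input run sample_level samples_tested;
  datalines;
1|1|2-9, 12-35, 37-40
1|5|2-9, 12-35, 37-40
1|10|2-9, 12-35, 37-40
1|36|2-9, 12-35, 37-40
2|3|1-4
2|7|1-4
;
run;

data want;
  set have;
  _keep = 0;
  /* walk through each comma-separated item of the list */
  do _i = 1 to countw(samples_tested, ',');
    _range = scan(samples_tested, _i, ',');
    /* first and last dash-separated pieces; identical for a single number */
    _lo = input(scan(_range, 1, '-'), ?? 32.);
    _hi = input(scan(_range, -1, '-'), ?? 32.);
    if not missing(_lo) and not missing(_hi)
       and _lo <= sample_level <= _hi then _keep = 1;
  end;
  if _keep; /* subsetting IF: drop rows outside every range */
  drop _i _range _lo _hi _keep;
run;
```

With this sample data only the rows with sample level 5 in run 1 and sample level 3 in run 2 should remain. Because each row is checked against its own SAMPLES_TESTED value, this does not require the value to be constant within a run.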
Good to see SAS code ;-)
That would work with one range:
select * from HAVE where level in (tested);
For multiple ranges I would use SUBSTRING_INDEX in MySQL, or a combination of SUBSTRING and INDEX, to find the next condition.
select * from HAVE where level in (tested1) or level in (tested2) or level in (tested3);
Here you would replace tested1, for example, with substr(tested, 1, index(tested, ',') - 1).
I used the following to generate sample:
create table have
(run int,
level int,
tested varchar(20));
INSERT INTO have (run, level, tested)
VALUES (1, 1, "3-5");
INSERT INTO have (run, level, tested)
VALUES (1, 3, "3-5, 12:35");
INSERT INTO have (run, level, tested)
VALUES (1, 20, "3-5, 12-35");
I have a character column which has dates (dd/mm/yyyy) in character format.
While applying filter (where clause), I need that these characters are recognized as dates in the where statement, without actually making any change to the existing column or without creating a new column.
How can I make this happen.
Any help would be deeply appreciated.
Thank you.
In proc sql, you can come close with like:
select (case when datecol like '__/__/____'
then . . .
else . . .
end)
This is only an approximation. _ is a wildcard that matches any character, not just numbers. On the other hand, this is standard SQL, so it will work in any database.
The SAS INPUT function with a ? informat modifier will convert a string (source value) to a result and not show an error if the source value is not conformant to the informat.
INPUT can be used in a WHERE statement or clause. Its result can also serve as an operand of the BETWEEN operator.
* some of these free form values are not valid date representations;
data have;
length freeform_date_string $10;
do x = 0 to 1e4-1;
freeform_date_string =
substr(put(x,z4.),1,2) || '/' ||
substr(put(x,z4.),3,2) || '/' ||
'2018'
;
output;
end;
run;
* where statement;
data want;
set have;
where input(freeform_date_string,? ddmmyy10.);
run;
* where clause;
proc sql;
create table want2 as
select * from have
where
input(freeform_date_string,? ddmmyy10.) is not null
;
* where clause with input used with between operator operands;
proc sql;
create table want3 as
select * from have
where
input(freeform_date_string,? ddmmyy10.)
between
'15-JAN-2018'D
and
'15-MAR-2018'D
;
quit;
It is not a great idea to store dates as character values. It can lead to many data accuracy issues, and you may not notice them for a long time: if someone enters an invalid character date, nothing flags it. It is always better to maintain a date as a date value rather than as a character value.
Filtering dates with LIKE becomes somewhat complex. You can try the code below, which uses the INPUT function in the WHERE clause:
data have;
input id datecolumn $10.;
datalines;
1 20/10/2018
1 25/10/2018
2 30/10/2018
2 01/11/2018
;
proc sql;
create table want as
select * from have
where input(datecolumn, ddmmyy10.) between '20Oct2018'd and '30Oct2018'd ;
The same filter written with LIKE looks like this:
proc sql;
create table want as
select * from have
/*include all dates which start with 2 */
where datecolumn like '2%' and datecolumn like '%10/2018'
or datecolumn = '30/10/2018';
Edit1:
It looks like you have a data quality issue; a sample dataset with one invalid date is shown below. Try this. Once again, storing dates as character values is not a good approach and can lead to many issues in the future.
data have;
input id datecolumn $10.;
datalines;
1 20/10/2018
1 25/10/2018
2 30/10/2018
2 01/11/2018
3 01/99/2018
;
proc sql;
create table want(drop=newdate) as
select *, case when input(datecolumn, ddmmyy10.) ne .
then input(datecolumn, ddmmyy10.)
else . end as newdate from have
where calculated newdate between '20Oct2018'd and '30Oct2018'd
;
Or you can put the CASE expression directly in the WHERE clause, without creating and dropping a new column, as shown below.
proc sql;
create table want as
select * from have
where
case when input(datecolumn, ddmmyy10.) ne .
then input(datecolumn, ddmmyy10.) between '20Oct2018'd and '30Oct2018'd
end;
I have a table that contains a zip code field (numeric type), and some of the zip codes contain only 4 digits. I need to pad the 4-digit zip codes with leading spaces.
I created a character field as follows:
proc sql;
create table myTable as
select * , put(Zip,5.) as ZipChar
from Mytable;
create table myTable as
select *, case when Zip<10000 then " "||ZipChar else ZipChar end as Zip_Fixed
from Mytable;
quit;
Now my difficulty is how to put Zip_Fixed in the position of the Zip column. Zip is a numeric type and Zip_Fixed is a character type. The replacement is necessary because the order of the columns must be kept. I'm all ears for any other creative solution.
Thanks,
Adi
I wrote a macro that reorders variables many months ago. It's probably not the shortest way of doing this, but it should solve your problem.
Assume you have a dataset and want to move move_me before v1
data temp;
input v1 v2 v3 v4 v5 move_me;
datalines;
1 2 3 4 5 0
1 2 3 4 5 0
1 2 3 4 5 0
;
run;
Run the %order macro below:
%macro order(dsn, var1, before_or_after, var2);
/* get list of variables in your dataset from dictionary.columns*/
proc sql;
create table vars as select
varnum, name
from dictionary.columns
where libname = "WORK" /* assumes &dsn. is a WORK dataset, as the final step does */
and memname = upcase("&dsn.");
quit;
/* assign the final position of the variable that you want to move*/
proc sql;
create table vars2 as select
a.*,
case when a.name = "&var1." then max(b.varnum) else . end as varnum_want
from vars as a
left join vars (where = (name = "&var2.")) as b
on a.varnum = b.varnum;
quit;
/* move the variable to that location*/
data vars3 (drop = varnum_want);
set vars2;
%if &before_or_after. = before %then %do;
if name = "&var1." then varnum = varnum_want - 0.5;
%end;
%else %if &before_or_after. = after %then %do;
if name = "&var1." then varnum = varnum_want + 0.5;
%end;
%else %do;
putlog "ERROR: Pick 'before' or 'after'";
%end;
proc sort; by varnum;
run;
/* select variables into a macro variable in correct order*/
proc sql noprint;
select name into: ordered_vars separated by " " from vars3 order by varnum;
quit;
/* reorder variables*/
data &dsn._reordered;
retain &ordered_vars.;
set &dsn.;
run;
%mend order;
And then you can use the syntax %order(temp, move_me, before, v1); to create a dataset called temp_reordered that has move_me slotted in before v1. In your case, it sounds like you would want to run %order(myTable, Zip_Fixed, before, [your 8th variable's name]) and then drop any extraneous variables to keep your variables ordered correctly.
Your use of the PUT() function will create a character field with leading spaces. Your second step will add another leading space.
Why not just use leading zeros instead? Then the values will look more like numbers and still sort properly.
put(zip,Z5.)
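As a small illustration of the Z5. suggestion, here is a sketch with made-up data; the dataset names zips and zips_char and the variable zip_fixed are hypothetical.

```sas
/* Hypothetical sample data: zip codes stored as numbers */
data zips;
  input zip;
  datalines;
2134
90210
601
;
run;

data zips_char;
  set zips;
  length zip_fixed $5;
  /* Z5. writes the number in 5 columns, padding with leading zeros */
  zip_fixed = put(zip, z5.);
run;
```

The three values become 02134, 90210 and 00601. Replacing z5. with 5. in the same call gives the space-padded version the question asks for.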
If the final goal is to create a text file with fixed width fields (as one of your other comments mentions) then you just use the format in the PUT statement you use to write the text file.
data _null_;
set mytable ;
file 'myfile.txt';
put ... zip 5. ... ;
run;
Zip codes are typically padded with zeros as Tom notes, not with spaces. They also can be three digits in a few cases (for example, Puerto Rico), so be aware of that.
Further, depending on your needs, formatting the column may be sufficient. It won't change the contents of the numeric column, but it will change how it is displayed.
proc datasets;
modify have;
format zip z5.;
quit;
Again, for some use cases this won't be helpful, but for others it may be superior to converting to character.
I have a macro variable which holds more than one value, like below.
%let age = 12,34,56;
%put &age;
Now I want to use the age as in my where parameter in proc sql.
proc sql;
select *
from family
where age in ("&age");
quit;
I have used %bquote,%quote,%str() and many more but not successful yet.
Try without any quotes at all (I assume age is a numeric variable).
And if age is a character variable, then you can rewrite the WHERE clause like this:
where input(age, 8.) in (&age)
When you write
"&age"
SAS creates single string with all values and commas inside, which is not what we need for IN operator.
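One way to see the difference is to let %put print the text that each version resolves to. This is only a sketch for inspecting the generated code; it does not query any table.

```sas
%let age = 12,34,56;

/* With double quotes: one string literal containing the commas */
%put where age in ("&age");
/* log: where age in ("12,34,56") */

/* Without quotes: a list of three numeric constants */
%put where age in (&age);
/* log: where age in (12,34,56) */
```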
Added.
This is my code that works:
%let age = 12,34,56;
%put &age;
data family;
input age @@;
datalines;
15 16 17 12 33 34 55 56
;
run;
proc sql;
select *
from family
where age in (&age);
quit;