SAS: Can't understand the use of '+1' in this code? - input

I was using the following code to read data in SAS. This is the code that I tried:
data libcards;
infile datalines;
input name $11. birthdate date9. issuedate mmddyy10.;
datalines;
A. Jones    1jan60    9-15-03
M. Rincon   05OCT1949 02-29-2000
Z. Grandage 18mar1988 10-10-2002
K. Kaminaka 29may2001 01-24-2003
;
run;
Needless to say the dates were not read in correctly, except the ones on the first row. Then I changed the format but it still didn't work. Then I looked up the solution, and this is the code that was given.
data libcards;
infile datalines;
input name $11. +1 birthdate date9. +1 issuedate mmddyy10.;
datalines;
A. Jones    1jan60    9-15-03
M. Rincon   05OCT1949 02-29-2000
Z. Grandage 18mar1988 10-10-2002
K. Kaminaka 29may2001 01-24-2003
;
run;
And this code works perfectly. I can see that the difference is the "+1" part, but I don't understand how it's working. The book that I am using has no explanation about it.
Can anyone tell me what's going on here? Thanks for your help.

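+n is a relative column pointer control: it moves the input pointer n columns to the right. After name $11. is read, the pointer sits at column 12, which is the blank that separates the fields. Without +1, date9. reads columns 12-20, so it picks up that leading blank and cuts off the last character of a full 9-character date, which is why only the first row, with its shorter values, was read correctly. +1 skips the blank so the informat starts on the first character of the value, and the same applies before issuedate. This SAS doc page may help with more details.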
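As a rough illustration of what the pointer is doing, the same records can be read with absolute column pointers (@n) instead of +1. This is only a sketch: it assumes the values sit in fixed columns (name in 1-11, birthdate starting in column 13, issuedate starting in column 23), and the dataset name libcards_at is made up for the example.

```sas
data libcards_at;
infile datalines;
/* name $11. reads columns 1-11 and leaves the pointer at column 12. */
/* @13 jumps to column 13, the same place +1 would move to.          */
/* date9. reads columns 13-21; @23 again matches the effect of +1.   */
input name $11. @13 birthdate date9. @23 issuedate mmddyy10.;
format birthdate issuedate date9.;
datalines;
A. Jones    1jan60    9-15-03
M. Rincon   05OCT1949 02-29-2000
Z. Grandage 18mar1988 10-10-2002
K. Kaminaka 29may2001 01-24-2003
;
run;
```

The +1 form is usually easier to maintain, because it does not need the absolute column numbers to be recounted when a field width changes.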

To read this file you can use a combination of formatted input (for the first field) and modified list input for the other two fields. The colon modifier, as in
variable :informat.
tells SAS to use list input with a specific informat: it skips leading blanks and reads up to the next delimiter instead of a fixed number of columns. You can also do the same by using an INFORMAT statement to associate the informat with the variable.
data libcards;
infile datalines;
input name $11. birthdate :date9. issuedate :mmddyy10.;
datalines;
A. Jones    1jan60    9-15-03
M. Rincon   05OCT1949 02-29-2000
Z. Grandage 18mar1988 10-10-2002
K. Kaminaka 29may2001 01-24-2003
;
run;
proc print;
run;
You may want to associate a FORMAT with the date variables.
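Without a format, PROC PRINT shows the dates as raw numbers (days since 1 January 1960). A minimal sketch of attaching formats in the same data step follows; the dataset name libcards_fmt is hypothetical, and the choice of date9. and mmddyy10. for display is just one option.

```sas
data libcards_fmt;
infile datalines;
input name $11. birthdate :date9. issuedate :mmddyy10.;
/* The informats above control how the text is read;        */
/* the formats below only control how the values are shown. */
format birthdate date9. issuedate mmddyy10.;
datalines;
A. Jones    1jan60    9-15-03
M. Rincon   05OCT1949 02-29-2000
Z. Grandage 18mar1988 10-10-2002
K. Kaminaka 29may2001 01-24-2003
;
run;
proc print data=libcards_fmt;
run;
```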

Related

Is there a way to input hundreds of variables into SAS without using each variable separately?

I have a set of data of gym membership starting with an ID, then 119 in-time columns and 119 out-time columns. The in-time and out-time columns are in the syntax of ##:##:## and I am trying to input the variables in the simplest way. Rather than writing [ID in1 $ in2 $ inX $ out1 $ out2 $ outX $], is there a way to easily input hundreds of columns in a simple line of code?
Just use variable lists. Let's assume your data file is comma delimited.
data want ;
infile 'myfile.csv' dsd truncover ;
input id (in1-in119 out1-out119) (:time8.) ;
format in1-in119 out1-out119 time8.;
run;
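To try the variable-list idea without the real file, here is a scaled-down sketch with three in-times and three out-times read from inline data. The dataset name gym_small and the sample values are invented for illustration; the real file would use in1-in119 and out1-out119 as above.

```sas
data gym_small;
infile datalines dsd truncover;
/* One informat list applies to every variable in the variable list. */
input id (in1-in3 out1-out3) (:time8.);
format in1-in3 out1-out3 time8.;
datalines;
101,08:00:00,12:30:00,18:05:00,09:10:00,13:45:00,19:00:00
102,07:15:00,,17:40:00,08:20:00,,18:55:00
;
run;
```

With DSD, two consecutive commas are treated as a missing value, so the second record leaves in2 and out2 missing.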
"proc import" can be an alternative solution.
It determines the variable types automatically by scanning the data.
The statement looks like the following:
proc import
datafile = "myfile.csv"
out = work.destination_table
dbms = csv replace
;
run;

Is there some way to tell SAS that for any obs ####1, ####2, or ####3 (where # = 1-9), I want them formatted #### Spring, #### Fall, and #### Winter?

So I have a 1000 observations for one variable that look like this:
19962
19943
19972
19951
19951
19912
The first four digits vary a bit, but the last digit is always 1, 2, or 3. Is there a way to only format the last digit, while not having to type out each iteration of the first four digits in a value statement?
That is, I want to avoid doing this:
proc format;
value varfmt
19911 = '1991 Spring'
19912 = '1991 Fall'
19913 = '1991 Winter'
19921 = '
19922 = '
[…]
19991 = '1999 Spring'
19992 = '1999 Fall'
19993 = '
;
run;
Instead, is there some way to tell SAS that for any ####1, ####2, or ####3, I want #### Spring, #### Fall, and #### Winter (which would be three lines under the value statement)?
Thanks in advance for any help.
Since the format only depends on the last digit, you don't need to list every full value in PROC FORMAT. Just extract the last digit, apply the format to it, and concatenate the result with the first four digits.
Creating the sample dataset
data test;
infile datalines;
input year;
datalines;
19962
19943
19972
19951
19951
19912
;
run;
Creating the formats
proc format;
value $varfmt
'1' = 'Spring'
'2' = 'Fall'
'3' = 'Winter'
;
run;
Here, the data step does the following:
1. Extracts the last digit
2. Applies the format created above to it
3. Extracts the first four digits of the number
4. Concatenates the output of 2 and 3
data final;
set test;
year_new = cat(substr(compress(year),1,4)," ",put(substr(compress(year),5,1),$varfmt.));
run;
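A possible variation on the same idea keeps everything numeric until the final concatenation, which avoids the implicit numeric-to-character conversion that compress(year) triggers. This is a sketch only: the numeric format name season and the dataset name final_num are made up here, and it assumes year is always a five-digit number.

```sas
proc format;
value season
1 = 'Spring'
2 = 'Fall'
3 = 'Winter'
;
run;
data final_num;
input year;
/* int(year/10) drops the last digit; mod(year, 10) isolates it. */
year_new = catx(' ', int(year/10), put(mod(year, 10), season.));
datalines;
19962
19943
19951
;
run;
```

For 19962 this gives int(19962/10) = 1996 and mod(19962, 10) = 2, so year_new becomes 1996 Fall.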
You also have the option of creating a format from a dataset, if you do want a format for the whole value. You will have to create all possible rows, but it's not particularly hard.
data forfmt;
fmtname='SEASONF';
length start $5 label $8;
do startyr = 1990 to 2015;
start=cats(startyr,'1');
label=catx(' ',startyr,'Spring');
output;
start=cats(startyr,'2');
label=catx(' ',startyr,'Fall');
output;
start=cats(startyr,'3');
label=catx(' ',startyr,'Winter');
output;
end;
run;
proc format cntlin=forfmt;
quit;

SAS specific observation format

I would like to create a new variable in SAS which takes the value 1 if an observation in the variable "TEXT" contains 8 numbers. The problem is, that TEXT is a character variable. Is it possible to make some kind of a format search in SAS?
I assume by '8 numbers' you actually mean 8 digits. For 8 separate numbers, that would be different.
So something like the code below might help.
The modifier 'kd' meaning KEEP DIGITS in COMPRESS function does the magic here:
data indata;
length TEXT $20;
input TEXT;
datalines;
a
123
12345678
A12345678
;
run;
data outdata;
set indata;
length TEXT_DIGITS $20 _8_DIGIT_INDICATOR 3;
TEXT_DIGITS = compress(TEXT, , 'kd');
if length(TEXT_DIGITS)=8 then _8_DIGIT_INDICATOR = 1;
run;
Adjust the logic as you need - e.g. if no other character in input value is allowed or something else.
Also functions like ANYDIGIT, NOTDIGIT might be useful.
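For the stricter case where no other characters are allowed, a sketch using NOTDIGIT could look like the following. The names strict_check and ONLY_8_DIGITS are hypothetical; the sample values are the same as above.

```sas
data strict_check;
length TEXT $20;
input TEXT;
/* trim() removes the trailing blanks that pad TEXT to 20 characters, */
/* since NOTDIGIT would otherwise count a blank as a non-digit.       */
/* NOTDIGIT returns 0 when no non-digit character is found.           */
ONLY_8_DIGITS = (length(TEXT) = 8 and notdigit(trim(TEXT)) = 0);
datalines;
a
123
12345678
A12345678
;
run;
```

Unlike the COMPRESS version, this does not flag A12345678, because the value contains a non-digit character and is nine characters long.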

SAS - Creating a Numbered Range List of character variables?

I can create a Numbered Range List of numeric type, but not character type.
My code is similar to this:
DATA TestDataset;
INPUT a1-a3 $;
DATALINES;
A B C
;
RUN;
This produces 3 variables - [a1], [a2] and [a3] as expected. However [a3] is character, but [a1] and [a2] are numeric. This leaves me with missing values as per the following table:
a1 a2 a3
. . C
The following code works, but obviously it does not scale nicely.
INPUT a1 $ a2 $ a3 $;
Am I missing something?
I believe you can use the hyphen notation on the length statement to get what you want. You really should use a length statement regardless; otherwise the length defaults to $8.
DATA TestDataset;
length a1-a3 $20;
INPUT a1-a3 ;
DATALINES;
A B C
;
RUN;
I came up with a macro solution:
%MACRO var_list_char (var_prefix, n);
%LOCAL i ;
%DO i = 1 %TO &n;
&var_prefix&i$
%END;
%MEND;
DATA TestDataset;
INPUT %var_list_char (a, 3);
DATALINES;
A B C
;
RUN;
I wish I could find a way to do this without macros - I will keep digging for a bit and will update this post if I find more. In the meantime, the above approach will definitely work.
UPDATE 1: @carolinajay65's solution above is the correct non-macro approach.
UPDATE 2: There is another way that I found.
DATA TestDataset;
INPUT (a1-a3) ($);
DATALINES;
A B C
;
RUN;
More documentation of the language features supporting this technique can be found here, in the section labeled "How to Group Variables and Informats".

Convert Numeric to Character 20

Hi, I am building a dataset, but the data I am merging is in different formats.
The variable I import from the Excel sheet is numeric 8, and the other 2 datasets I'm merging with are character 20, so I want to change the numeric 8 to char 20.
How can I change the variable acctnum to char 20? (I also want to keep this as its name, as I presume a new variable will be created.)
data WORK.T82APR;
set WORK.T82APR;
rename F1 = acctnum f2 = tariff;
run;
proc contents data=T82APR;
run;
While this thread is already dead, I thought I'd weigh in and explain why the 14-digit value was converted to E notation.
Unless otherwise specified, SAS formats numeric values with BEST12. As such, when a numeric value is longer than 12 characters (including any commas and periods), BEST12 chooses E notation as the best way to format the value.
The input function, in that case receives the formatted value put(acctnum, BEST12.). There would've been 2 ways around it.
Either use
input(put(acctnum, 14.), $20.);
Or, change the format of the variable using the format statement (directly in a data step or with proc datasets, as below). This has the added benefit that if you open the table in SAS, you will see the 14 digits and not the scientific-notation value.
proc datasets library=work nolist;
modify dsname;
format acctnum 14.;
run;
quit;
Vincent
Try this:
data WORK.T82APR ;
set WORK.T82APR;
acctnum = put(F1, 20. -L); /* F1 is numeric, so use a numeric format; -L left-aligns the result */
rename f2 = tariff;
run;
Ok, I didn't pay attention to your own rename statement, so I adjusted my answer to reflect that now.
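Putting the pieces together, a self-contained sketch of the conversion might look like this. The dataset names have and want and the sample value are invented for the example; it writes to a new dataset instead of overwriting the original.

```sas
data have;
F1 = 12345678901234;
f2 = 'A';
run;
data want;
set have;
length acctnum $20;
/* 20. is a numeric format; -L left-aligns the digits in the result. */
acctnum = put(F1, 20. -L);
drop F1;
rename f2 = tariff;
run;
```

Left-aligning matters when acctnum is later used as a merge key, since a right-aligned value with leading blanks would not match a left-aligned character value in the other datasets.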