Break the character values into multiple names in SAS - variables

Suppose I have a data set as follows
data mydat;
input samplename$ value;
datalines;
AA:77:D1 45
BB:08:D3 50
;
run;
I would like to separate the 'samplename' variable values into three parts, each stored in a new variable.
Expected outcome is

For your particular task you can use the following code:
data mydat;
length v1 v2 v3 $200;
infile cards dlm=": "; /* INFILE has to execute before the INPUT statement that reads the data */
input v1 v2 v3 value;
datalines;
AA:77:D1 45
BB:08:D3 50
;
run;
As Tom correctly suggested, a more traditional approach, and one that may be more flexible if the data changes, is to read samplename whole and split it with the SCAN function:
data mydat;
length samplename v1 v2 v3 $200;
input samplename$ value;
v1 = scan (samplename, 1, ':');
v2 = scan (samplename, 2, ':');
v3 = scan (samplename, 3, ':');
datalines;
AA:77:D1 45
BB:08:D3 50
;
run;
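If the number of parts is not known to be exactly three, the same SCAN idea can be written as a loop. This is only a sketch: the array name part, the variables part1-part3, the length of 20, and the output dataset name want are assumptions made for illustration, and it assumes mydat still contains samplename.

```sas
data want;
  set mydat;                              /* assumes mydat holds samplename */
  array part[3] $ 20 part1-part3;         /* hypothetical names and length */
  /* COUNTW counts the colon-delimited pieces; never go past the array size */
  do i = 1 to min(countw(samplename, ':'), dim(part));
    part[i] = scan(samplename, i, ':');
  end;
  drop i;
run;
```

Values with fewer than three pieces simply leave the remaining part variables blank.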

If the samplename values always have three parts, you can specify dlm=': ' on the INFILE statement so that both the colon and the space act as delimiters.
Example:
data want;
infile datalines dlm=': ';
input (var1-var3) ($) value;
datalines;
AA:77:D1 45
BB:08:D3 50
;
run;

Related

How do I assign a value to a new variable, using another dataset which contains one value in SAS

I have a dataframe
ID value1
1 12
2 345
3 342
I have a second dataframe
value2
3823
How do I get the following result?
ID value1 value2
1 12 3823
2 345 3823
3 342 3823
Any joins I have done have given me
ID value1 value2
1 12 .
2 345 .
3 342 .
. . 3823
No need for joins or helper variables:
data have;
do i = 1 to 3;
output;
end;
run;
data lookup;
j = 1;
run;
data want;
set have;
if _n_ = 1 then set lookup;
run;
Without the if _n_ = 1, the data step stops after one iteration when it tries to read a second row from the lookup dataset and finds that there are no rows remaining.
N.B. this requires that the have dataset doesn't already contain a variable with the same name as the variable(s) attached from the lookup dataset.
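Applied to the data from the question, the same pattern might look like the sketch below. The dataset names first, second and wanted are chosen here for illustration only.

```sas
data first;
  input ID value1;
  datalines;
1 12
2 345
3 342
;
run;

data second;
  value2 = 3823;
run;

data wanted;
  set first;
  /* Read the one-row dataset only once; variables read with SET are
     retained, so value2 keeps its value on every later row */
  if _n_ = 1 then set second;
run;
```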
By far the easiest way to do this is to use PROC SQL and define the join condition as 1=1, which is true for every pair of rows:
data first;
input ID value1 @@; /* @@ holds the line so several observations can be read from it */
cards;
1 12 2 345 3 342
run;
data second;
input value2 ;
cards;
3823
run;
proc sql;
create table wanted as
select * from first
left join second
on 1 =1
;quit;
Edit: As far as I know, there isn't a direct way to merge datasets row by row, but you can use the following trick.
Add a helper variable, help, to both datasets:
data second_trick;
set second;
help=1;
run;
data first_trick;
set first;
help=1;
run;
Then we just perform the merge by the static variable:
data wanted_trick;
merge first_trick(in=a) second_trick;
by help;
if a; /*Left join, just to be sure.*/
run;
Note that this only works if you want to add a single static value. Don't use it if your second dataset has more than one row.
For more on Merges and joins see: https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/249-30.pdf

How to import specific lines from dat file

I've got a .dat file with numbers that I need imported into a SAS dataset. However, there's plenty of information that I do not need, and I only want specific lines of data (e.g. every 6th line starting from line 1000, until I have 100 observations). I also require a unique identifier based on what is displayed on the first line.
So for example, the .dat file contains this:
DATANOTREQUIRED
DATANOTREQUIRED
DATANOTREQUIRED
UPDATE AAA_1111111_Q_BBBBBB_0_1_#
123.4,
123.5,
124.0,
124.1
DATANOTREQUIRED
DATANOTREQUIRED
DATANOTREQUIRED
UPDATE AAA_1111111__Q_BBBBBB_0_2_#
125.1,
126.0,
127.1,
130.0
What I want the eventual SAS dataset to look like is this
Identifier | Value
X.1. | 124.1
X.2. | 130.0
I'm using the infile in SAS and using input to point to line 1000 but I'm stuck and cannot get the SAS dataset I want. (Updated code based on contributors below)
data work.test;
infile '\\filepath\mydatasource.dat' dsd firstobs=1042 truncover;
input #8 ID :$40.
#4 Value1 :8.;
run;
but what I'm seeing now is that the header lines are appearing fine, but the first observation has a . and instead the first data value is appearing for the 2nd header line.
ID | Value1
UPDATE AAA_1111111_Q_BBBBBB_0_1_# | .
UPDATE AAA_1111111__Q_BBBBBB_0_2_# | 124.1
Here's an example assuming that you have the same number of rows between each header row:
data want;
if _n_ > 2 then stop; /*Stop after we've output 2 rows */
infile cards firstobs=6; /*Skip the first 5 lines in the file*/
input #1 @8 ID :$32. /* Line 1 of each block, from column 8, i.e. after "UPDATE " */
#5 myvar :8.;
cards;
UPDATE AAA_1111111_Q_BBBBBB_0_1_#
123.4,
123.5,
124.0,
124.1
UPDATE AAA_1111111__Q_BBBBBB_0_2_#
125.1,
126.0,
127.1,
130.0
UPDATE AAA_1111111_Q_BBBBBB_0_3_#
123.4,
123.5,
124.0,
124.1
UPDATE AAA_1111111__Q_BBBBBB_0_4_#
125.1,
126.0,
127.1,
130.0
;
run;
Use the FIRSTOBS= option to skip the beginning of the file.
If there are always 5 records per block you could just read them individually.
data want;
infile rawdata dsd firstobs=1000 truncover;
input id :$40. (4*value) (/) ;
run;
Or you could do something like this that should allow for a variable number of values per id and just keep the last one.
data want;
infile rawdata dsd firstobs=1000 end=eof;
input @ ; /* Load the line into _infile_ and hold it for the later INPUT */
length id $32 value 8 ;
retain id value;
if _infile_ =: 'UPDATE' then do;
if _n_ > 1 then output;
id = scan(_infile_,-1,' ');
end;
else input value;
if eof and _n_ > 1 then output;
run;

Creating ID variable using digits from two different variables on SAS

I'm trying to create a new variable on SAS. There is a column called "Statefip" and a column called "countyfip". I need a four digit ID number that combines these two columns.
For example:
How do I tell SAS to follow this format when creating this new variable?
This is easy to do using the put and input functions. The z3. format pads the value with leading zeros, || concatenates the two put results, and input then converts the combined string back to a numeric id.
data have;
input statefip countyfip;
datalines;
1 1
8 109
12 57
13 313
;
run;
data want;
set have;
id = input(put(statefip,2.) || put(countyfip,z3.),8.);
run;
proc print;
run;
Output:
Obs statefip countyfip id
1 1 1 1001
2 8 109 8109
3 12 57 12057
4 13 313 13313
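Two variations on the same idea, offered as a sketch rather than as part of the original answer: because the county code always occupies three digits, the numeric id can also be built with plain arithmetic, and if the leading zero of single-digit states matters, the id can be kept as a character value. The names want2, id_num and id_char are made up for this example.

```sas
data want2;
  set have;
  /* Numeric id: shift the state code left by three decimal places */
  id_num = statefip * 1000 + countyfip;
  /* Character id: z2. and z3. keep the leading zeros, e.g. 01001 */
  length id_char $5;
  id_char = put(statefip, z2.) || put(countyfip, z3.);
run;
```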

Is there some way to tell SAS that for any obs ####1, ####2, or ####3 (where # = 1-9), I want them formatted #### Spring, #### Fall, and #### Winter?

So I have 1000 observations for one variable that look like this:
19962
19943
19972
19951
19951
19912
The first four digits vary a bit, but the last digit is always 1, 2, or 3. Is there a way to only format the last digit, while not having to type out each iteration of the first four digits in a value statement?
That is, I want to avoid doing this:
proc format;
value varfmt
19911 = '1991 Spring'
19912 = '1991 Fall'
19913 = '1991 Winter'
19921 = '
19922 = '
[…]
19991 = '1999 Spring'
19992 = '1999 Fall'
19993 = '
;
run;
Instead, is there some way to tell SAS that for any ####1, ####2, or ####3, I want #### Spring, #### Fall, and #### Winter (which would be three lines under the value statement)?
Thanks in advance for any help.
Since the format only depends on the last digit, there is no need to list every full value in proc format. Just extract the last digit, apply the format to it, and concatenate the result with the first four digits.
Creating the sample dataset
data test;
infile datalines;
input year;
datalines;
19962
19943
19972
19951
19951
19912
;
run;
Creating the formats
proc format;
value $varfmt
'1' = 'Spring'
'2' = 'Fall'
'3' = 'Winter'
;
run;
The data step below does the following:
1. Extracts the last digit
2. Applies the format created above to it
3. Extracts the first four digits of the number
4. Concatenates the output of steps 2 and 3
data final;
set test;
year_new = cat(substr(compress(year),1,4)," ",put(substr(compress(year),5,1),$varfmt.));
run;
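Because year is numeric in the sample data, the same steps could also be done with arithmetic instead of substr, which avoids the automatic numeric-to-character conversion. The sketch below assumes a numeric format named seasonf and an output dataset final2; both names are invented for this example.

```sas
proc format;
  value seasonf
    1 = 'Spring'
    2 = 'Fall'
    3 = 'Winter'
  ;
run;

data final2;
  set test;
  length year_new $ 11;
  /* mod(year, 10) is the last digit; int(year / 10) is the four-digit year */
  year_new = catx(' ', int(year / 10), put(mod(year, 10), seasonf.));
run;
```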
You also have the option of creating a format from a dataset, if you do want a format for the whole value. You will have to create all possible rows, but it's not particularly hard.
data forfmt;
fmtname='SEASONF';
length start $5 label $8;
do startyr = 1990 to 2015;
start=cats(startyr,'1');
label=catx(' ',startyr,'Spring');
output;
start=cats(startyr,'2');
label=catx(' ',startyr,'Fall');
output;
start=cats(startyr,'3');
label=catx(' ',startyr,'Winter');
output;
end;
run;
proc format cntlin=forfmt;
quit;

Simplifying the variable input in SAS

I have 90 variables in the data, I want to do the following in SAS.
Here is my SAS code:
data test;
length id class sex $ 30;
input id $ 1 class $ 4-6 sex $ 8 survial $ 10;
cards;
1  3rd F Y
2  2nd F Y
3  2nd F N
4  1st M N
5  3rd F N
6  2nd M Y
;
run;
data items2;
set test;
length tid 8;
length item $8;
tid = _n_;
item = class;
output;
item = sex;
output;
item = survial;
output;
keep tid item;
run;
What if I have 90 variables to input the data like this? There should be a very long list. I want to simplify it.
You could use an ARRAY or alternately a PROC TRANSPOSE.
The following is untested, because you haven't provided an example of your input dataset.
DATA ITEMS;
SET REPLACE;
ARRAY VARS {*} VAR1-VAR90; /* Declared after SET so the array picks up the existing variables and their types */
DO I = LBOUND(VARS) TO HBOUND(VARS);
ITEM = VARS{I};
OUTPUT;
END;
RUN;
OR
PROC TRANSPOSE DATA = TEST OUT = WANT;
BY ID;
VAR CLASS -- SURVIAL;
RUN;
In the future it would be best if you could supply your input and desired output.
I don't seem to be able to add another comment to the above answer, as such I am adding one here.
You need to extend the VAR statement to include all variables that you want transposed.
CLASS -- SURVIAL means all variables from CLASS through SURVIAL, inclusive, in the order in which they are stored in the dataset.
Post your code and the error so that I can help you better.
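For the test dataset shown in the question, the array approach might look like the sketch below. It assumes that class, sex and survial are stored next to each other and are all character variables, as they are in test; with 90 variables only the two names at either end of the -- range would change.

```sas
data items2;
  set test;
  /* All variables from class through survial, in dataset order */
  array vars[*] class -- survial;
  length tid 8 item $ 8;
  tid = _n_;
  do i = 1 to dim(vars);
    item = vars[i];
    output;             /* one output row per variable */
  end;
  keep tid item;
run;
```

The array is declared before tid, item and i are created, so those new variables fall outside the class -- survial range.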