I would like to create a new variable in SAS that takes the value 1 if an observation of the variable "TEXT" contains 8 numbers. The problem is that TEXT is a character variable. Is it possible to do some kind of format search in SAS?
I assume by '8 numbers' you actually mean 8 digits. For 8 separate numbers, that would be different.
So something like the code below might help.
The modifier 'kd', meaning KEEP DIGITS, in the COMPRESS function does the magic here:
data indata;
length TEXT $20;
input TEXT;
datalines;
a
123
12345678
A12345678
;
run;
data outdata;
set indata;
length TEXT_DIGITS $20 _8_DIGIT_INDICATOR 3;
TEXT_DIGITS = compress(TEXT, , 'kd');
if length(TEXT_DIGITS)=8 then _8_DIGIT_INDICATOR = 1;
run;
Adjust the logic as needed, e.g. if no other characters are allowed in the input value.
Also, functions like ANYDIGIT and NOTDIGIT might be useful.
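As a minimal sketch of the stricter variant mentioned above, assuming the goal is to flag values that consist of exactly eight digits and nothing else, NOTDIGIT can be combined with a length check. The data set name strict and the variable name EXACT_8_DIGITS are made up for illustration.

```sas
data strict;
  set indata;
  /* NOTDIGIT returns the position of the first non-digit character, or 0 if there is none. */
  /* STRIP removes the padding blanks, which would otherwise count as non-digits.          */
  /* The comparison yields 1 when both conditions hold and 0 otherwise.                     */
  EXACT_8_DIGITS = (lengthn(strip(TEXT)) = 8 and notdigit(strip(TEXT)) = 0);
run;
```

With the sample data, only 12345678 would be flagged; A12345678 contains a letter and is nine characters long, so it would not.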
Related
I'm trying to convert a character variable to a numeric variable, but unfortunately I'm really struggling. Help would be appreciated!
I keep getting the following error: 'Invalid argument to function INPUT at line 3259 column 17'
Syntax:
Data want;
Set have;
Dosis_num = input(Dosis, best12.);
run;
I have also tried multiplying the variable by 1. This doesn't work either.
The variable looks like this:
Dosis
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
Want:
Dosis_num
155.0
201.0
2.1
0.8
123.8
12.0
3333.4
0.6
Thanks a lot!
The code will work with the data you show. So either the values in the character variable are not what you think or you are not using the right variable name for the variable.
The code is trying to use only the first 12 bytes of the character variable. Normally you don't need to restrict the number of characters you ask the INPUT() function to use. In fact, the INPUT() function does not care if the width of the informat is larger than the length of the string being read. So just use 32. as the informat, since 32 is the maximum width that the normal numeric informat can read. Note that BEST is the name of a FORMAT; if you use it as the name of an informat, it is just an alias for the normal numeric informat.
If the variable is longer than 12 characters, then perhaps it has leading spaces (note that ODS output does not display leading spaces properly). In that case, use the LEFT() function to remove them:
Dosis_num = input(left(Dosis), 32.);
The typical thing to do here is to find out what's actually in the character variable. There is likely something in there that is causing the issue.
Try this:
data have;
input #1 Dosis $8.;
datalines;
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
;;;;
run;
data check;
set have;
put dosis hex32.;
run;
What I get is this:
83 data check;
84 set have;
85 put dosis hex32.;
86 run;
3135352020202020
3230312020202020
322E312020202020
302E382020202020
3132332E38302020
31322E3020202020
333333332E342020
30302E3620202020
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.CHECK has 8 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
All those 2020202020 are spaces, which should be there (all strings are space-padded to full length). Period/Decimal Point is 2E, Digits are 3x where x is the digit (because the ASCII for 0 is 30, not because of any other reason). So for example for the last one, 00.6, 30 means zero, 30 means zero, 2E means period, and 36 means 6.
Check to make sure that you don't have any characters other than digits (3x), period (2E) and space (20).
The other thing to verify is that your values use . as the decimal separator and not , as on many European systems. You can just try the commaw. informat (comma12. is sufficient if 12 is plenty, and don't include anything after the period), as anything that 12. can read can also be read by commaw.. Note, though, that commaw. strips embedded commas rather than treating them as the decimal point; for values written with a decimal comma, the commaxw.d informat is the one that reverses the roles of comma and period.
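The character check described above can also be done without reading hex dumps. The following is only a sketch, assuming the data set have and the variable Dosis from the question and assuming valid values contain nothing but digits and a period (so a minus sign would be flagged too). The names suspect and bad_pos are made up for illustration.

```sas
data suspect;
  set have;
  /* Position of the first character that is neither a digit nor a period; 0 if none */
  bad_pos = verify(strip(Dosis), '0123456789.');
  /* The ?? modifier suppresses the invalid-data messages and keeps _ERROR_ from being set */
  Dosis_num = input(Dosis, ?? 32.);
  /* Keep only the rows that look wrong or fail to convert */
  if bad_pos > 0 or (missing(Dosis_num) and not missing(Dosis));
run;
```

If suspect comes back empty, the values themselves are probably fine and the variable name or the data set being read is the more likely culprit.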
I could convert the packed decimal amounts to numeric amounts, but I am unable to do the reverse.
data HAVE;
amount = '00000258Q';output;
amount = '000000000';output;
amount = '00002488M';output;
amount = '00002126P';output;
amount = '000007{ ';output;
run;
data WANT;
set HAVE;
amount_dollar = input(cats(amount),zdv10.);
run;
That is -
data HAVE;
amount_dollar = -2588;output;
amount_dollar = .;output;
amount_dollar = -24884;output;
amount_dollar = -21267;output;
amount_dollar = 70;output;
run;
Thanks for your help!
Your last value is shorter than the others, which is why you needed the cats() function (or trim() or strip()) to remove the trailing blanks from what you pass to the ZDV. informat. Your other values are actually only 9 characters long, not 10. Your all-zero value is going to be translated to missing by the ZDV. informat, but it will be converted to zero by the ZD. informat, since ZD. doesn't mind that the nibble with the sign is 0.
Use the ZD. format to generate zoned decimal strings, but note that it will add the leading zeros to the last value and sign nibble to the all zero value.
data test;
input original $9. ;
num=input(original,zd9.);
numv=input(original,zdv9.);
numt=input(trim(original),zd9.);
string=put(numt,zd9.);
same = string=original;
cards;
00000258Q
000000000
00002488M
00002126P
000007{
;
SAS didn't make a ZDV format, as it wouldn't make sense, but you still have the ZD format:
data want;
set have;
amount = put(amount_dollar,zd10.);
run;
If it matters, this is not precisely a packed decimal, but a zoned decimal (packed decimal is, unsurprisingly, PDw.d, among others).
Is there a way to override the default behavior of character length being set by the first value encountered and instead set all character data for a session to have the same fixed length?
Much of the data I work with daily is of a similar format/structure, such as a .csv or .txt. I find that using an infile statement with list input works well for importing this kind of data.
For instance, suppose I have a text file myData.txt.
myData.txt
string1 string2 num1 string3 num2
hello there 12 this 33
is some 45 sample 2
data for 8 you 12
I would then use code like this to bring it in.
%let dataDirectory = C:\path\to\file;
%let dataFile = myData.txt;
filename myFile "&dataDirectory.\&dataFile.";
data in_data;
infile myFile dsd dlm = '09'x firstobs = 2;
length
string1 $ 50.
string2 $ 50.
num1 8
string3 $ 50.
num2 8
;
input
string1 $
string2 $
num1
string3 $
num2
;
run;
filename myFile clear;
I find that it is important to have the length statement so that none of my data is truncated. Since the data sets are not particularly large, it makes sense to set all the character lengths to some fixed amount which will guarantee no truncation occurs. I find that the default numeric length is sufficient.
The problem with this approach is that any time a variable name needs to be changed etc, I need to make an alteration in both the length and input statements. This gets to be a nuisance, especially when there are 150 variables, and I'm hoping it is unnecessary.
List input seems appropriate to my needs. I could use column input, but then I'd have to fiddle around with defining column widths. I can't think of a way to make that a simple process when handling 150 columns. Being able to globally define all character lengths, as with the default 8 for numeric, would solve my problem. Is this possible? Or, maybe you have a better method for bringing in such data as myData.txt?
You could use a macro variable to store your default length. Then you can change it in one place.
You can use a variable list in your INPUT statement so that you don't need to worry about typing variable names more than once.
%let dataDirectory = C:\path\to\file;
%let dataFile = myData.txt;
%let defLength = $80 ;
data in_data;
infile "&dataDirectory/&dataFile" dsd dlm='09'x firstobs=2 truncover ;
length
string1 &defLength
string2 &defLength
num1 8
string3 &defLength
num2 8
;
input (_all_) (:) ;
run;
You can specify how many rows SAS should use to determine field attributes with the "guessingrows" option using proc import. That way proc import will take care of any number of new variables you may have.
proc import out=importeddata
datafile= "/examplepath/file.txt"
dbms=dlm replace;
delimiter='09'X;
getnames=YES;
guessingrows=5000;
run;
If you keep your length statement in the proper order, you can use a SAS variable list in the INPUT statement. You don't need the $ sign in the INPUT statement. If some variables need informats, use an INFORMAT statement to associate them.
data in_data;
infile myFile dsd dlm = '09'x firstobs = 2;
length
string1 $ 50.
string2 $ 50.
num1 8
string3 $ 50.
num2 8
;
input (string1--num2)(:);
run;
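The remark about the INFORMAT statement might look like the sketch below. It assumes the file had one extra tab-delimited column of ISO-style dates; the variable date1 is hypothetical and not part of myData.txt, and myFile is the fileref from the question.

```sas
data in_data;
  infile myFile dsd dlm='09'x firstobs=2 truncover;
  /* The order of the LENGTH statement defines the order used by the variable list */
  length string1 string2 $50 num1 8 string3 $50 num2 8 date1 8;
  /* date1 is a hypothetical column; attach its informat here, not in INPUT */
  informat date1 yymmdd10.;
  format   date1 yymmdd10.;
  input (string1--date1) (:);
run;
```

The point of the design is that the INPUT statement never has to change: adding or renaming a variable only touches the LENGTH (and, where needed, INFORMAT) statement.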
I've found two parameters defined like these:
&TM_PERIOD+4&/&TM_PERIOD(4)&
It's to pass data from a database to a form.
If the format of the data would be DDMMYYYY what are differences between those two parameters?
If TM_PERIOD is in the form DDMMYYYY, then
TM_PERIOD(4) equals DDMM
TM_PERIOD+4 equals YYYY
the (4) means 4 characters
the +4 means after the 4th character
TM_PERIOD+1(2) = DM
(2 characters after the first)
These are not bit operations. +n specifies a string offset and (n) specifies the length.
They can be used independently of each other as well, so you can use just +n or just (n).
So:
data: lv_text(20) type c.
lv_text = 'Hello'.
write: / lv_text+2(3).
would output 'llo', for example.
Hi, I am building a dataset, but the data I am merging is in different formats.
In the Excel sheet I import, the variable is numeric 8, and in the other 2 datasets I'm merging with it is character 20, so I want to change the numeric 8 to char 20.
How can I change the variable acctnum to char 20? (I also want to keep this as its name, as I presume a new variable will be created.)
data WORK.T82APR;
set WORK.T82APR;
rename F1 = acctnum f2 = tariff;
run;
proc contents data=T82APR;
run;
While this thread is already dead, I thought I'd weigh in and explain why the 14-digit conversion ended up in E notation.
Unless otherwise specified, numeric values in SAS are displayed with the BEST12. format. As such, when a numeric value is longer than 12 characters (including any commas and periods), BEST12. chooses E notation as the best way to format the value.
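A quick way to see this is to format the same value both ways. This is only an illustrative sketch; the 14-digit value and the variable names are made up.

```sas
data _null_;
  acctnum = 12345678901234;            /* made-up 14-digit account number */
  as_best12 = put(acctnum, best12.);   /* does not fit in 12 columns, so E notation is used */
  as_w14    = put(acctnum, 14.);       /* wide enough to show every digit */
  put as_best12= as_w14=;              /* writes both strings to the log */
run;
```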
In that case, the input function receives the formatted value put(acctnum, BEST12.). There would have been 2 ways around it.
Either use
input(put(acctnum, 14.), $20.);
Or change the format of the variable with a FORMAT statement, either directly in a data step or with proc datasets as in the following step. This has the added benefit that if you open the table in SAS, you will see the 14 digits and not the value in scientific notation.
proc datasets library=work nolist;
modify dsname;
format acctnum 14.;
run;
quit;
Vincent
Try this:
data WORK.T82APR ;
set WORK.T82APR;
acctnum = put(F1, 20. -L); /* F1 is numeric, so use a numeric format; -L left-aligns the result */
rename f2 = tariff;
run;
Ok, I didn't pay attention to your own rename statement, so I adjusted my answer to reflect that now.