I'm trying to convert a character variable to a numeric variable, but unfortunately I'm really struggling. Help would be appreciated!
I keep getting the following error: 'Invalid argument to function INPUT at line 3259 column 17'
Syntax:
Data want;
Set have;
Dosis_num = input(Dosis, best12.);
run;
I have also tried multiplying the variable by 1. This doesn't work either.
The variable looks like this:
Dosis
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
Want:
Dosis_num
155.0
201.0
2.1
0.8
123.8
12.0
3333.4
0.6
Thanks a lot!
The code will work with the data you show. So either the values in the character variable are not what you think or you are not using the right variable name for the variable.
With best12., the code only uses the first 12 bytes of the character variable. Normally you don't need to restrict the number of characters you ask the INPUT() function to read, because INPUT() does not care if the width of the informat is larger than the length of the string being read. So just use 32. as the informat, since 32 is the maximum width that the normal numeric informat can read. Note that BEST is the name of a FORMAT; if you use it as the name of an informat, it is just an alias for the normal numeric informat.
If the variable is longer than 12 bytes, there may be leading spaces in the values (note that ODS output does not display leading spaces properly). In that case, use the LEFT() function to remove them:
Dosis_num = input(left(Dosis), 32.);
The typical thing to do here is to find out what's actually in the character variable. There is likely something in there that is causing the issue.
Try this:
data have;
input #1 Dosis $8.;
datalines;
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
;;;;
run;
data check;
set have;
put dosis hex32.;
run;
What I get is this:
83 data check;
84 set have;
85 put dosis hex32.;
86 run;
3135352020202020
3230312020202020
322E312020202020
302E382020202020
3132332E38302020
31322E3020202020
333333332E342020
30302E3620202020
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.CHECK has 8 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
All those 2020202020 are spaces, which should be there (all strings are space-padded to full length). The period/decimal point is 2E, and digits are 3x where x is the digit (simply because the ASCII code for 0 is hex 30). So, for example, in the last one, 00.6, 30 means zero, 30 means zero, 2E means period, and 36 means 6.
Check that the values contain no characters other than digits (3x), period (2E), and space (20).
The other thing to verify is that your data uses . as the decimal separator and not , as many European systems do. You can safely try the commaw. informat (comma12. is sufficient if 12 characters is plenty, and don't include anything after the period), since anything that 12. can read can also be read by commaw.; note, however, that commaw. treats commas as thousands separators and strips them. If the comma is the decimal separator, use the commaxw.d or numxw.d informat instead.
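The same byte-level check can be sketched outside SAS. The Python snippet below is only an illustration of the idea (the helper name `hex_report` is hypothetical, not part of any library): it prints the hex dump of a value and lists every byte that is not a digit, a period, or a space.

```python
# Characters the numeric informat is expected to see:
# digits (hex 30-39), period (hex 2E), and space (hex 20).
ALLOWED = set(b"0123456789. ")


def hex_report(value):
    """Return (hex dump, list of unexpected bytes as two-digit hex strings)."""
    raw = value.encode("utf-8")
    dump = raw.hex().upper()
    unexpected = [format(b, "02X") for b in raw if b not in ALLOWED]
    return dump, unexpected


if __name__ == "__main__":
    # A clean value, a decimal comma, and a hidden tab character.
    for sample in ["00.6    ", "2,1     ", "12.0\t   "]:
        dump, unexpected = hex_report(sample)
        print(repr(sample), dump, unexpected)
```

Any non-empty list in the last column points at the character that would make the conversion fail, for example 2C for a comma or 09 for a tab.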
I have a question about the format of numbers REAL.
I have a column of this type, and after I enter an 8-digit number in this column, it doesn't let me save.
Example: 11406760
When I try a 7-digit number like 1140676, it lets me save the data.
Any idea why this happens?
If I read this MSDN page correctly, REAL is a synonym for FLOAT(24), which has a precision of 7 digits.
This means that while a column of this type does support values up to about 10^38, it only keeps about 7 most significant digits of that value. So for an 8 digit number, the final digit may not be stored correctly.
Do you really need a REAL (=floating point) value for this column (maybe check out decimal), or rather some integer type?
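To see the precision limit in action, here is a minimal Python sketch. It assumes REAL is stored as a 4-byte IEEE 754 single-precision value, which is what FLOAT(24) implies; the helper name `as_float32` is made up for this illustration.

```python
import struct


def as_float32(x):
    """Round-trip x through a 4-byte IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


if __name__ == "__main__":
    for n in (1140676, 11406760, 16777217, 123456789):
        stored = int(as_float32(n))
        note = "exact" if stored == n else "rounded"
        print(n, "->", stored, note)
```

Whole numbers up to 2^24 = 16777216 survive the round trip exactly; beyond that, the trailing digits start to change (123456789 comes back as 123456792). A decimal or integer column avoids the issue entirely.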
I would like to create a new variable in SAS which takes the value 1 if an observation in the variable "TEXT" contains 8 numbers. The problem is, that TEXT is a character variable. Is it possible to make some kind of a format search in SAS?
I assume by '8 numbers' you actually mean 8 digits. For 8 separate numbers, that would be different.
So something like the code below might help.
The modifier 'kd' meaning KEEP DIGITS in COMPRESS function does the magic here:
data indata;
length TEXT $20;
input TEXT;
datalines;
a
123
12345678
A12345678
;
run;
data outdata;
set indata;
length TEXT_DIGITS $20 _8_DIGIT_INDICATOR 3;
TEXT_DIGITS = compress(TEXT, , 'kd');
if length(TEXT_DIGITS)=8 then _8_DIGIT_INDICATOR = 1;
run;
Adjust the logic as you need - e.g. if no other character in input value is allowed or something else.
Also functions like ANYDIGIT, NOTDIGIT might be useful.
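For comparison, the same keep-digits-then-count logic can be sketched in Python. This is an illustration only: the function names are hypothetical, and unlike the SAS step above (which leaves the indicator missing), this version returns 0 when the test fails.

```python
DIGITS = "0123456789"


def digits_only(text):
    """Keep only the ASCII digits, like COMPRESS(TEXT, , 'kd')."""
    return "".join(ch for ch in text if ch in DIGITS)


def has_8_digits(text):
    """1 if the value contains exactly 8 digits in total, else 0."""
    return 1 if len(digits_only(text)) == 8 else 0


def is_exactly_8_digits(text):
    """Stricter variant: 1 only if the value is 8 digits and nothing else."""
    stripped = text.strip()
    return 1 if len(stripped) == 8 and digits_only(stripped) == stripped else 0


if __name__ == "__main__":
    for value in ["a", "123", "12345678", "A12345678"]:
        print(value, has_8_digits(value), is_exactly_8_digits(value))
```

The stricter variant corresponds to the case where no other character is allowed in the input value.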
I'm trying to emulate a function in SQL that a client has produced in Excel. In effect, they have a unique, 10-digit numeric value (stored as VARCHAR) as the primary key in one of their enterprise database systems. Within another database, they require a unique, 5-digit alphanumeric identifier. They want that 5-digit alphanumeric value to be a representation of the 10-digit number. So what they did in Excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together.
The EXCEL equation is:
=IF(VALUE(MID(A2,1,4))>0,DEC2HEX(VALUE(MID(A2,3,2)))&DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX(VALUE(MID(A2,9,2))),DEC2HEX(VALUE(MID(A2,5,2)))&DEC2HEX(VALUE(MID(A2,7,2)))&DEC2HEX((VALUE(MID(A2,9,2)))))
I need the SQL equivalent of this. Of course, should someone out there know a better way to accomplish their goal of "a 5-digit alphanumeric identifier" based off the 10-digit number, I'm all ears.
ADDED 8/2/2011
First of all, thank you to everyone for the replies. Nice to see folks willing to help and even enjoying it! Based on all the responses, I'm apt to tell my client that their intent is sound, only their method is off-kilter. I'd also like to recommend a solution. So the challenge remains, just modified slightly:
CHALLENGE: Within SQL, take a 10 digit, unique NUMERIC string and represent it ALPHANUMERICALLY in as few characters as possible. The resulting string must also be unique.
Note that the first 3-4 characters in the 10-digit string are likely to be zeros, and that they could be stripped to shorten the resulting alphanumeric string. Not required, but perhaps helpful.
This problem is inherently impossible. You have a 10 digit numeric value that you want to convert to a 5 digit alphanumeric value. Since there are 10 numeric characters, there are 10^10 = 10 000 000 000 unique values for your 10 digit number. Since there are 36 alphanumeric characters (26 letters + 10 digits), there are 36^5 = 60 466 176 unique values for your 5 digit identifier. You cannot map a set of 10 billion elements one-to-one into a set of around 60 million.
Now, let's take a closer look at what your client's code is doing:
"So what they did in excel was to split the 10-digit number into pairs, then convert each of those pairs into a hexadecimal value, then stitch them back together."
This isn't 100% accurate. The Excel code never uses the first 2 digits, but performs this operation on the remaining 8. There are two main problems with this algorithm which may not be intuitively obvious:
Two 10 digit numbers can map to the same 5 digit number. Consider the numbers 1000000117 and 1000001701. The last four digits of 1000000117 get mapped to 1 11, where the last four digits of 1000001701 get mapped to 11 1. This causes both to map to 00111.
The 5 digit number may not even end up being 5 digits! For example, 1000001616 gets mapped to 001010.
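Both problems are easy to reproduce. The following Python sketch is a transcription of the Excel formula from the question, written for illustration (the function name `excel_id` is made up); it assumes the key is a 10-character string of digits.

```python
def excel_id(key):
    """Emulate the Excel formula: hex-encode 2-digit pairs and concatenate."""

    def pair_hex(start):
        # start is 1-based, like Excel's MID(A2, start, 2).
        # DEC2HEX yields uppercase hex without padding.
        return format(int(key[start - 1:start + 1]), "X")

    if int(key[0:4]) > 0:
        starts = (3, 5, 7, 9)   # first branch: digits 3-10
    else:
        starts = (5, 7, 9)      # second branch: digits 5-10
    return "".join(pair_hex(s) for s in starts)


if __name__ == "__main__":
    for key in ("1000000117", "1000001701", "1000001616"):
        print(key, "->", excel_id(key))
```

The first two keys both produce 00111, and the third produces the six-character 001010, because the unpadded hex pairs lose their boundaries once they are concatenated.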
So, what is a possible solution? Well, if you don't care if that 5 digit number is unique or not, in MySQL you can use something like:
hex(<NUMERIC VALUE> % 0xFFFFF)
The log of 10^10 base 2 is 33.219280948874
> return math.log(10 ^ 10) / math.log(2)
33.219280948874
> = 2 ^ 33.21928
9999993422.9114
So, it takes 34 bits to represent this number. In hex this will take 34/4 = 8.5, rounded up to 9 characters, much more than 5.
> return math.log(10 ^ 10) / math.log(16)
8.3048202372184
The Excel formula is ignoring the first 2 (or 4) characters of the 10 character string.
You could try encoding in base 36 instead of 16. This will get you to 7 characters or fewer.
> return math.log(10 ^ 10) / math.log(36)
6.4254860446923
The popular base 64 encoding will get you to 6 characters
> return math.log(10 ^ 10) / math.log(64)
5.5365468248123
Even Ascii85 encoding won't get you down to 5.
> return math.log(10 ^ 10) / math.log(85)
5.1829075929158
You need base 100 to get to 5 characters
> return math.log(10 ^ 10) / math.log(100)
5
There aren't 100 printable ASCII characters, so this is not going to work, as zkhr explained as well, unless you're willing to go beyond ASCII.
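The character counts above, and a reversible base 36 encoding, can be sketched in Python as follows. This is an illustration under the assumption that the alphabet is the digits 0-9 plus the uppercase letters A-Z; the function names are hypothetical.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # 36 symbols: 0-9, A-Z


def min_chars(base, count=10 ** 10):
    """Smallest k such that base**k >= count (integer arithmetic only)."""
    k, capacity = 0, 1
    while capacity < count:
        capacity *= base
        k += 1
    return k


def to_base36(n):
    """Encode a non-negative integer using ALPHABET."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(ALPHABET[r])
    return "".join(reversed(out))


def from_base36(s):
    """Decode a base 36 string back to the integer."""
    return int(s, 36)


if __name__ == "__main__":
    for base in (16, 36, 64, 85, 100):
        print("base", base, "needs", min_chars(base), "characters")
    key = "0000123456"
    code = to_base36(int(key))
    print(key, "->", code, "->", str(from_base36(code)).zfill(10))
```

Because the encoding is just a change of base, it is unique and reversible, and keys with leading zeros come out shorter than 7 characters.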
I found your question interesting (although I don't claim to know the answer). I googled a bit out of interest and found this, which may help you: http://dpatrickcaldwell.blogspot.com/2009/05/converting-decimal-to-hexadecimal-with.html
Hi, I am building a dataset, but the data I am merging is in different formats.
In the Excel sheet I import, the variable is numeric 8, and in the other 2 datasets I'm merging with, it is character 20, so I want to change the numeric 8 to char 20.
How can I change the variable acctnum to char 20? (I also want to keep this as its name, as I presume a new variable will be created.)
data WORK.T82APR;
set WORK.T82APR;
rename F1 = acctnum f2 = tariff;
run;
proc contents data=T82APR;
run;
While this thread is already dead, I thought I'd weigh in and explain why the 14-digit value was converted to E notation.
Unless otherwise specified, numeric variables in SAS are formatted with BEST12. As such, when a numeric value needs more than 12 characters (including any commas and periods), BEST12 chooses E notation as the best way to format the value.
The input function, in that case, receives the formatted value put(acctnum, BEST12.). There are 2 ways around it.
Either use
input(put(acctnum, 14.), $20.);
Or, change the format of the variable using a format statement (directly in a data step, or with proc datasets as shown here). This has the added benefit that if you open the table in SAS, you will see the 14 digits and not the value in scientific notation.
proc datasets library=work nolist;
modify dsname;
format acctnum 14.;
run;
quit;
Vincent
Try this:
data WORK.T82APR ;
set WORK.T82APR;
acctnum = put(F1, 20. -L); /* F1 is numeric, so use a numeric format; -L left-aligns the result */
rename f2 = tariff;
run;
Ok, I didn't pay attention to your own rename statement, so I adjusted my answer to reflect that now.
I am writing a custom totaling method for a grid view. I am totaling fairly large numbers, so I'd like to use a decimal to hold the total. The problem is that I need to control the maximum length of the total. To solve this I started using float, but it doesn't seem to support large enough numbers: I get this in the totals column (1.551538E+07). So is there some formatting string I can use in .ToString() to guarantee that I never get more than X characters in the total field? Keep in mind I'm totaling integers and decimals.
If you're fine with all numbers displaying in scientific notation, you could go with "E[numberOfDecimalPlaces]" as your format string.
For example, if you want to cap your strings at, say, 12 characters, then, accounting for the one character for the decimal point and five characters needed to display the exponential part, you could do:
Function FormatDecimal(ByVal value As Decimal) As String
If value >= 0D Then
Return value.ToString("E5")
Else
' negative sign eats up another character '
Return value.ToString("E4")
End If
End Function
Here's a simple demo of this function:
Dim d(5) As Decimal
d(0) = 1.203D
d(1) = 0D
d(2) = 1231234789.432412341239873D
d(3) = 33.3218403820498320498320498234D
d(4) = -0.314453908342094D
d(5) = 000032131231285432940D
For Each value As Decimal in d
Console.WriteLine(FormatDecimal(value))
Next
Output:
1.20300E+000
0.00000E+000
1.23123E+009
3.33218E+001
-3.1445E-001
3.21312E+016
You could use Decimal.Round, but I don't understand the exact question. Are you saying that if the total adds up to 12345.67, you might only want to show 4 digits and would then show 2345? Or do you just mean that you want to remove the decimals?