SAS importing numeric column to scientific notation - variables

I'm importing a sas7bdat file in SAS Studio using PROC IMPORT, and one of the variables in the dataset is changing to scientific notation; e.g., 1234567891011121 shows up as 1.2345678E15.
I'm fairly new to SAS and not sure which function would help keep this column in its original 16-digit format instead of scientific notation. The column is numeric, and its length is displayed as 8. I have been through other similar posts but could not find a solution that works.

SAS stores all numbers as 64-bit binary floating point, so a length of 8 bytes is the right choice for this value. You cannot use more than 8 bytes, because 8 bytes already hold all 64 bits. And if you used fewer bytes, you would lose precision and could not store all 16 digits.
SAS uses FORMATs to control how values are printed as text. You can use the FORMAT statement to attach a format to a variable.
It looks like you are either using the BEST12. format with that variable, or you are letting SAS use its default way of displaying numbers, which in most cases will be to use the BEST12. format.
If you want the numbers to print with all 16 digits, just attach the 16. format to the variable instead.
Or you could use the COMMA21. format, and the numbers will print with thousands separators, which makes them easier for humans to read.
Example code for attaching a format to a variable in a DATA step:
data want;
  set have;
  /* show all 16 digits instead of the default BEST12. display */
  format mynumber 16.;
run;
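The key point is that only the display changes, not the stored value. That can be checked outside SAS with a minimal Python sketch. This is Python, not SAS. Python's format specifiers here only imitate the widths of the 16. and COMMA21. formats; they are not SAS formats.

```python
# The 16-digit value from the question.
value = 1234567891011121

# Store it the way SAS does: as a 64-bit IEEE 754 double.
stored = float(value)

# Integers below 2**53 fit exactly in a double, so nothing is lost.
assert value < 2**53
assert int(stored) == value

# Plain 16-digit rendering, comparable in width to the SAS 16. format.
plain = format(stored, "16.0f")

# Thousands separators, comparable to COMMA21.: 16 digits + 5 commas = 21 characters.
with_commas = format(value, ",")

print(plain)             # 1234567891011121
print(with_commas)       # 1,234,567,891,011,121
print(len(with_commas))  # 21
```

The 21-character result is why COMMA21. is wide enough for a 16-digit integer.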

Related

SAS VARTYPE Function: run it (or equivalent) against all Variables

I am a DB administrator with zero SAS experience. I work in government and have been tasked with ingesting SAS output from another team. The other team apparently has limited SAS experience and cannot answer the question "what is the data type of each SAS variable?" We have dozens of tables and thousands of variables to import. Is there a way to run the SAS function VARTYPE against all columns?
I haven't found what I need in the SAS docs, a Stack Overflow search, etc.
I am expecting code that I can hand to the other team, which they will run to produce the following (hard-coding only the "dataset"; no hard-coded table or variable names):
TableName  | VariableName | DataType              | DataLength and/or other attributes as needed
MyTable 1  | Column1      | char                  | 25
MyTable 1  | Col2         | numeric               | scale 10 precision 2
MyTable 2  | Col1         | (small? big? 32?) int | bytes? or something that tells me max range
...
MyTable102 | Column100    | date                  | yyyy-mm-dd
Update: here's what I used, based on the accepted answer. You would change:
library=SASHELP to library=YourLibrary, to choose the library whose tables are scraped.
out=yourDataset.sasSchemaDump: replace yourDataset with the destination library (libref) where a new table named sasSchemaDump will be created or populated. Rename sasSchemaDump to your desired table name.
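proc datasets library=SASHELP memtype=data;
  /* (read=green) supplies a READ= password; it only matters for password-protected data sets */
  contents data=_ALL_ (read=green) out=yourDataset.sasSchemaDump;
  title 'SAS Schema Dump';
run;
quit; /* PROC DATASETS is interactive; QUIT ends it */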
There is a dedicated SAS procedure for this: PROC CONTENTS.
proc contents data=sashelp.cars out=want; run;
It creates a SAS table named want that contains all the information needed.
FYI: TYPE 1 is numeric, TYPE 2 is character.
If all tables are in the same library, you can do the following to cycle through all the tables within the library:
proc contents data=sashelp._all_ out=want; run;
Run PROC CONTENTS on the dataset and you will have the information you need.
SAS has only two data TYPEs: fixed-length character strings and floating-point numbers. The LENGTH is the number of bytes stored in the dataset. For character variables, the length therefore determines how many characters the variable can store (assuming a single-byte encoding). Floating-point numbers require 8 bytes. You can store them with fewer bytes in the dataset if you can accept the resulting loss of precision. For example, if you know the values are integers, you might choose to store only 4 of the bytes.
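The following Python sketch illustrates what storing fewer bytes does. It assumes SAS's behaviour on IEEE platforms (Windows/UNIX): a numeric stored with LENGTH n keeps only the first n bytes of the 8-byte double (sign, exponent and the high mantissa bits), and the dropped bytes come back as zeros. The helper name `truncate_to_length` is hypothetical.

```python
import struct


def truncate_to_length(x: float, length: int) -> float:
    """Imitate a SAS numeric stored with LENGTH < 8 (assumed IEEE platform):
    keep the first `length` bytes of the big-endian double, zero the rest."""
    if not 3 <= length <= 8:
        raise ValueError("SAS numeric lengths on Windows/UNIX range from 3 to 8 bytes")
    raw = struct.pack(">d", float(x))
    kept = raw[:length] + b"\x00" * (8 - length)
    return struct.unpack(">d", kept)[0]


# 4 bytes = 1 sign bit + 11 exponent bits + 20 mantissa bits (+1 implicit bit),
# so integers are exact only up to 2**21 = 2,097,152.
print(truncate_to_length(2_097_152, 4))  # 2097152.0
print(truncate_to_length(2_097_153, 4))  # 2097152.0  (precision lost)
print(truncate_to_length(2_097_153, 8))  # 2097153.0  (full 8 bytes)
```

With 4 bytes, integer values up to about two million survive, but a 16-digit identifier like the one in the first question would not.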
You can sometimes learn more about a variable if its creator attached a permanent FORMAT to control how the variable is displayed. For example, SAS stores DATE values as the number of days since 1 January 1960. To make those numbers meaningful to humans, you need to attach a format such as DATE9. or YYMMDD10., so that the numbers print as strings a human would recognize as dates. Similarly, there are display formats for time-of-day values (number of seconds since midnight) and datetime values (number of seconds since the start of 1960). Also, if they attached a format that does not display decimal places, that might mean the values are intended to be integers.
And if they attached a LABEL to the variable that might explain more about the variable than you can learn from the name alone.
They could also attach user-defined formats to a variable. Those could be simple code/decode lookups, but they could also be more complex. A common complex one collapses a range (or multiple values and/or ranges) to a single decode. The definitions of user-defined formats are stored in a separate file called a catalog, specifically a format catalog. You can use PROC FORMAT with the FMTLIB or CNTLOUT= option to see the definitions of the user-defined formats.
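On the DBA's side of the import, the points above (TYPE 1 is numeric, TYPE 2 is character, dates are recognizable by their format) can be turned into a small type-mapping step. The Python sketch below makes several assumptions:

- Each row is shaped like the PROC CONTENTS OUT= dataset, using its TYPE, LENGTH and FORMAT columns.
- `suggest_sql_type`, the SQL type names and the short lists of date/time format names are illustrative choices, not a complete or official rule set.

It also converts raw SAS date and datetime numbers, which count from 1 January 1960.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH_DATE = date(1960, 1, 1)
SAS_EPOCH_DATETIME = datetime(1960, 1, 1)

# Illustrative subsets of SAS format names; real data may use others.
DATE_FORMATS = {"DATE", "YYMMDD", "MMDDYY", "DDMMYY", "E8601DA"}
DATETIME_FORMATS = {"DATETIME", "E8601DT"}
TIME_FORMATS = {"TIME", "HHMM"}


def sas_date_to_iso(days: int) -> str:
    """SAS date value (whole days since 1960-01-01) -> 'yyyy-mm-dd'."""
    return (SAS_EPOCH_DATE + timedelta(days=days)).isoformat()


def sas_datetime_to_iso(seconds: float) -> str:
    """SAS datetime value (seconds since 1960-01-01 00:00:00) -> ISO 8601."""
    return (SAS_EPOCH_DATETIME + timedelta(seconds=seconds)).isoformat()


def suggest_sql_type(sas_type: int, length: int, fmt: str = "") -> str:
    """Hypothetical mapping from PROC CONTENTS TYPE/LENGTH/FORMAT to a SQL type."""
    if sas_type == 2:                  # TYPE 2 = fixed-length character
        return f"VARCHAR({length})"
    # Accept either a bare name ("YYMMDD") or one with width ("YYMMDD10.").
    name = fmt.strip().upper().rstrip("0123456789.")
    if name in DATE_FORMATS:
        return "DATE"
    if name in DATETIME_FORMATS:
        return "DATETIME2"
    if name in TIME_FORMATS:
        return "TIME"
    return "FLOAT"                     # TYPE 1 = 8-byte double (or a truncated one)


print(sas_date_to_iso(0))                    # 1960-01-01
print(sas_date_to_iso(366))                  # 1961-01-01 (1960 was a leap year)
print(sas_datetime_to_iso(86_400))           # 1960-01-02T00:00:00
print(suggest_sql_type(2, 25))               # VARCHAR(25)
print(suggest_sql_type(1, 8, "YYMMDD10."))   # DATE
```

User-defined formats would still need to be inspected with PROC FORMAT, since their names say nothing about the underlying values.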

SSIS convert exponent number to real (DT_R4)

I have a flat CSV file in which some fields contain values like "1.8e-5" and "8.139717345049093e-39" (exponential, or scientific, notation). I need to store these values in a SQL real data type field (not float). But the smallest nonzero magnitude that real supports is about 1.18E-38.
I need a mechanism in SSIS to convert this string field to a real number: values of e-39 or smaller should be replaced with 0, and the rest should be stored as they are.
I tried setting the data type to DT_R4 in the flat file connection's field mapping, and that didn't help. I tried casting it to DT_R4 through a derived column, and that didn't help either. When I check in the Data Viewer, the value still has the unsupported exponent, and the insert into the SQL table fails.
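The conversion rule described in the question is to flush anything too small for real to 0 and keep the rest. The sketch below shows that rule in Python, not as an SSIS component; in SSIS the same logic would have to live in something like a Script Component. The thresholds are the IEEE 754 single-precision limits, which match the range SQL Server documents for real (about 1.18E-38 to 3.40E+38 in magnitude, plus 0). The function name is hypothetical.

```python
import math

# IEEE 754 single-precision limits, expressed as Python doubles.
REAL_MIN_NORMAL = 1.1754943508222875e-38
REAL_MAX = 3.4028234663852886e38


def to_real_or_zero(text: str) -> float:
    """Parse a numeric string and return a value that fits a SQL Server real:
    values too small in magnitude become 0.0; too-large values raise ValueError.
    (The database still rounds the result to single precision on insert.)"""
    value = float(text.strip())
    if math.isnan(value) or math.isinf(value):
        raise ValueError(f"not a finite number: {text!r}")
    magnitude = abs(value)
    if magnitude > REAL_MAX:
        raise ValueError(f"too large for real: {text!r}")
    if magnitude < REAL_MIN_NORMAL:
        return 0.0
    return value


for raw in "1.8e-5, 8.139717345049093e-39".split(","):
    print(raw.strip(), "->", to_real_or_zero(raw))
```

Raising on overflow, rather than silently clamping, is a design choice. Too-large values usually signal bad input rather than rounding noise.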

Convert SAS read-in file into SQL

I am trying to read a .DAT file into SQL. The agency data provider supplied read-in code in SAS here (https://www.health.ny.gov/statistics/sparcs/docs/ip_v2.sas). I would like to load this data into a secure SQL database and was wondering whether anyone could help me translate this SAS code into SQL. Here's the start:
OPTIONS NOCENTER NODATE FORMDLIM=' ' compress=yes pagesize=50;
%let yr=11;
/**** READ IN FILE ******* No Check for HexDec ****/
data IUM;
  infile eium truncover lrecl=2500 PAD ignoredoseof /*obs=10000*/ ;
  INPUT
    @0016 ordr $char3.
    @0001 RECDTL $char2500.
  ;
Further down, it specifies the position and length of the columns, but not the data type. Any SAS users out there feeling smart and generous?
What is there to translate? The @xxxx says which column to start in, followed by the variable name and the informat used to read it. SAS has only two data types: fixed-length character strings and floating-point numbers. Any informat that starts with a $ generates a character variable; all others generate a number. Some informats generate numbers that represent date, time or datetime (timestamp) values, such as DATE9., TIME8. or DATETIME20., and there are other informats for other text representations of dates (YYMMDD., MMDDYY., DDMMYY.).
SAS defines the order of the variables in the dataset based on when you first reference them in the code. So ordr will be defined in the dataset before RECDTL, even though the latter appears first in the text file (column 1 versus column 16).
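When translating to another system, each `@col name informat.` item reduces to a 1-based start column plus a width taken from the informat. A hypothetical Python sketch of that mapping for the two fields shown is below. The `LAYOUT` list and `parse_record` are illustrative, not part of the provider's code, and the numeric branch stands in for plain `w.` informats.

```python
# (variable name, 1-based start column from @nnnn, width from the informat, is_character)
# Listed in INPUT-statement order, which is also the order SAS creates the variables.
LAYOUT = [
    ("ordr", 16, 3, True),        # @0016 ordr $char3.
    ("RECDTL", 1, 2500, True),    # @0001 RECDTL $char2500.
]


def parse_record(line: str, layout=LAYOUT, lrecl: int = 2500) -> dict:
    """Slice one fixed-width record into fields.

    Padding to `lrecl` with blanks mimics the INFILE PAD option, so short lines
    still yield every field. $CHARw. keeps leading blanks, so character fields
    are not stripped. Numeric fields become floats, and all-blank numeric
    fields become None, standing in for SAS missing values.
    """
    padded = line.rstrip("\r\n").ljust(lrecl)
    record = {}
    for name, start, width, is_char in layout:
        raw = padded[start - 1:start - 1 + width]
        if is_char:
            record[name] = raw
        else:
            record[name] = float(raw) if raw.strip() else None
    return record


rec = parse_record("X" * 15 + "ABC" + "rest of record")
print(rec["ordr"])        # ABC
print(list(rec))          # ['ordr', 'RECDTL']
print(len(rec["RECDTL"])) # 2500
```

In SQL, the same layout typically becomes SUBSTRING(line, start, width) expressions over a staging table that holds each raw line.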

Float type storing values in format "2.46237846387469E+15"

I have a table ProductAmount with columns:
Id [BIGINT]
Amount [FLOAT]
Now when I pass a value from my form to the table, it gets stored in the format 2.46237846387469E+15, whereas the actual value was 2462378463874687. Any ideas why this value is being converted, and how to stop it?
It is not being converted. That is simply how a floating-point value is displayed: what you are seeing is scientific (exponential) notation.
I am guessing that you don't want to store the data that way. You can alter the column to use a fixed format representation:
alter table ProductAmount alter column Amount decimal(20, 0);
This assumes that you do not want any decimal places. You can read more about decimal formats in the documentation.
I would strongly discourage you from using float unless:
You have a real floating point number (say an expected value from a statistical calculation).
You have a wide range of values (say, 0.00000001 to 1,000,000,000,000,000).
You only need a fixed number of digits of precision over a wide range of magnitudes.
Floating point numbers are generally not needed for general-purpose and business applications.
The value gets stored in a binary format, because this is what you specified by requesting FLOAT as the data type for the column.
The value that you store in the field is represented exactly, because 64-bit FLOAT uses 52 bits to represent the mantissa*. Even though you see 2.46237846387469E+15 when selecting the value back, it's only the presentation that is slightly off: the actual value stored in the database matches the data that you inserted.
But I want to store 2462378463874687 as a value in my db.
You are already doing it. That is the exact value stored in the field. You just cannot see it, because the query tool in SQL Server Management Studio formats it using scientific notation. When you do any computations on the value, or read it back into a double field in your program, you will get back 2462378463874687.
If you would like to see the exact number in your SELECT query in SQL Server Management Studio, use CONVERT:
CONVERT (VARCHAR(50), float_field, 128) -- See note below
Note 1: 128 is a legacy style. It works with SQL Server 2008, which is one of the tags on your question, but in SQL Server 2016 and later you should use style 3 instead.
Note 2: Since the column is named Amount, chances are good that you want a different data type. Look into the decimal data types, which are a much better fit for representing monetary amounts.
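* 2462378463874687 is close to the limit for exact representation. A 64-bit FLOAT has 53 significant bits (52 stored mantissa bits plus an implicit leading 1), so every integer up to 2^53 = 9,007,199,254,740,992 is exact, and this value needs 52 of those bits.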
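The exactness and display claims above can be checked in Python, whose float is the same 64-bit IEEE 754 double as SQL Server's FLOAT (float(53)). The '.15g' format is an assumption: it is used to approximate the 15-significant-digit display seen in Management Studio, and Python writes the exponent in lower case.

```python
value = 2462378463874687

stored = float(value)  # 64-bit IEEE 754 double, like FLOAT(53)

# A double has 53 significant bits, so every integer up to 2**53 is exact.
print(value.bit_length())  # 52, so the value fits with one bit to spare
print(int(stored) == value)  # True: nothing was lost

# Showing only 15 significant digits produces the scientific form from the question.
print(format(stored, ".15g"))  # 2.46237846387469e+15

# 17 significant digits are always enough to round-trip a double.
print(format(stored, ".17g"))  # 2462378463874687

# Past 2**53, neighbouring integers collide.
print(float(2**53 + 1) == float(2**53))  # True
```

The 17-digit case is the same idea as CONVERT style 3, which always produces 17 digits for lossless conversion.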

Bigquery SUM(Float_Values) returns multiple decimal places and Scientific Notation

I am trying to calculate total sales at a store. I have the product price in a column called UNIT_PRICE. All the prices have 2 decimal places (for example, 34.54 or 19.99), and they are imported as type FLOAT in the schema (UNIT_PRICE:float).
When I run the query "SELECT CompanyName, SUM(Unit_Price) as sumValue ..." I get the following returned in the column, but only sometimes:
2.697829165015719E7
It should be something like: 26978291.65
As I am piping this into spreadsheets and then charting it, I need it to be of type float, or at least to look like a normal price.
I have tried the following but am still having issues:
Source: I tried converting the original data type to BigDecimal with only 2 decimal places in the source data, then exporting to CSV for import into BigQuery, but got the same result.
BigQuery: I tried converting to a string first, then to a float, and then applying SUM, but got the same result: "SELECT CompanyName, SUM(Float(String(Unit_Price))) as sumValue"
Any ideas on how to deal with this?
BigQuery uses default formatting for floating-point numbers, which means that, depending on the size of the number, it may use scientific notation (see the description of the %g format specifier).
We tried switching this, but it turns out it is hard to find a format that makes everyone happy. %f formatting always produces decimal notation, but it also pads to 6 decimal places and drops any digits beyond the sixth.
I've filed a bug to allow an arbitrary format string conversion function in BigQuery. It would let you run SELECT FORMAT_STRING("%08d", SUM(Unit_Price)) FROM ... in order to be able to control the exact format of the output.
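The %g versus %f trade-off described above is easy to reproduce with Python's printf-style formatting. These are C-style specifiers; BigQuery's own output may differ in detail, since the question shows a Java-style 2.697829165015719E7. The sketch also shows a common alternative for prices: summing exact decimal values with Python's decimal module instead of binary floats. The price list is made up for illustration.

```python
from decimal import Decimal

total = 26978291.65

print("%g" % total)    # 2.69783e+07      (%g: 6 significant digits, may use an exponent)
print("%f" % total)    # 26978291.650000  (%f: always 6 decimal places)
print("%.2f" % total)  # 26978291.65      (explicit 2-decimal price format)

# Summing binary floats accumulates rounding error ...
prices = [0.10] * 10  # hypothetical prices
print(sum(prices))    # 0.9999999999999999

# ... whereas decimal arithmetic on the original strings stays exact.
exact = sum(Decimal("0.10") for _ in range(10))
print(exact)          # 1.00
```

An explicit two-decimal format addresses the display problem, while decimal arithmetic addresses the stray digits that binary float sums can introduce.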
Do you see this in the BQ browser tool or only in your spreadsheet?
A BQ FLOAT is 8 bytes (a 64-bit double), so it can hold whole numbers exactly up to 2^53 (9,007,199,254,740,992), far more than a sales total needs.
I find that Excel sometimes converts values to the format you mentioned when it opens a flat (CSV) file. To verify that this is the case, open your CSV in Notepad (or another plain-text editor) before you try Excel.
If this is indeed the issue, you can configure the Excel connector to treat this field as a string instead of a number. Another option is to convert the number to a string by concatenating "" to it, so the spreadsheet automatically treats it as a string; afterwards you can convert it back to a number in the spreadsheet.