BigQuery SUM(Float_Values) returns multiple decimal places and scientific notation

I am trying to calculate Total Sales at a store. I have product Price in a column called UNIT_PRICE. All the prices have 2 decimal places example: 34.54 or 19.99 etc and they are imported as type:float in the schema. (UNIT_PRICE:float)
When I perform the select Query: "SELECT CompanyName, SUM(Unit_Price) as sumValue" etc I get the following returned in the column, but only "sometimes".
2.697829165015719E7
It should be something like: 26978291.65
As I am piping this out into spreadsheets and then charting it I need it to be in the type float or at least represent a normal price format.
I have tried the following but still having issues:
Source: Tried converting original data type to BigDecimal with only 2 decimal points in the source data and then exporting to the csv for import into bigquery but same result.
BigQuery: Tried converting to a string first, then to a float, and then SUM, but got the same result. "SELECT CompanyName, SUM(Float(String(Unit_Price))) as sumValue"
Any ideas on how to deal with this?
Thanks

BigQuery uses default formatting for floating point numbers, which means that, depending on the size of the number, it may use scientific notation. (See the printf %g format specifier.)
We tried switching this, but it turns out, it is hard to get a format that makes everyone happy. %f formatting always produces decimal format, but also pads decimals to a 6 digit precision, and drops decimals beyond a certain precision.
I've filed a bug to allow an arbitrary format string conversion function in BigQuery. It would let you run SELECT FORMAT_STRING("%08d", SUM(Unit_Price)) FROM ... in order to be able to control the exact format of the output.
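Until something like that exists, the formatting can be done client-side. Here is a minimal Python sketch (not BigQuery code) of the difference between %g and %f, using the value from the question. Python's % operator follows the same printf conventions; the variable names are only for illustration.

```python
# The value from the question, exactly as it was printed.
raw = "2.697829165015719E7"

# Parsing the scientific-notation text gives back the same float.
total = float(raw)

as_g = "%g" % total        # general format: switches to exponent form
as_f = "%f" % total        # fixed format: always 6 decimals
as_price = "%.2f" % total  # fixed format with an explicit 2-decimal precision

print(as_g)      # 2.69783e+07
print(as_f)      # 26978291.650157
print(as_price)  # 26978291.65
```

So the scientific notation is only a display choice; the number underneath is unchanged and can be reformatted wherever it is consumed.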

Do you see this in the BQ browser tool or only on your spreadsheet?
A BQ float is 8 bytes in size, so it can hold numbers >9,000,000,000,000...
I find that sometimes when Excel opens a flat file (csv) it converts the value to the format you mentioned. To verify whether this is the case, open your csv with Notepad (or another flat file editor) before you try with Excel.
If this is indeed the issue, you can configure the Excel connector to treat this field as a string instead of a number. Another option would be to convert it to a string and concat "" to the number. This way the spreadsheet will automatically treat it as a string. Afterwards you can convert it back to a number in the spreadsheet.
Thanks

Related

SAS importing numeric column to scientific notation

I'm importing a sas7bdat file in sas studio using proc import and one of the variables in the dataset is changing to scientific notation, e.g, 1234567891011121 is showing up as 1.2345678E15
I'm fairly new to SAS and not sure what function would help retain this particular column in its original 16 digit format instead of scientific notation. This column is of numeric data type and its length is being displayed as 8. I have been through other similar posts, but could not find a solution to work with.
SAS stores all numbers as 64-bit binary floating point, so using a length of 8 bytes to store the value is the right thing. You cannot use more bytes because it only takes 8 bytes to store all 64 bits. And if you used fewer bytes you would lose precision and could not store all 16 digits.
SAS uses FORMATs to control how values are printed as text. You can use the FORMAT statement to attach a format to a variable.
It looks like you are either using the BEST12. format with that variable, or you are letting SAS use its default way of displaying numbers, which in most cases will be to use the BEST12. format.
If you want the numbers to print with 16 decimal digits then just attach the 16. format to the variable instead.
Or you could use the COMMA21. format instead and the numbers will print with thousand separators so it will be easier for humans to read them.
Example code for attaching a format to variable in a data step.
data want;
set have;
format mynumber 16.;
run;
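If it helps to see why only the display changes, here is a rough Python sketch (not SAS). Python floats are also 64-bit binary doubles, and the printf-style format strings below merely stand in for SAS formats; they are my assumption for illustration, not what SAS runs internally.

```python
value = 1234567891011121  # the 16-digit number from the question

as_double = float(value)

# Integers below 2**53 fit exactly in a 64-bit double.
fits_exactly = value < 2**53 and int(as_double) == value

narrow = "%.8g" % as_double        # too few significant digits: exponent form
full = "%.0f" % as_double          # wide enough: all 16 digits are shown
with_commas = f"{as_double:,.0f}"  # thousands separators for readability

print(fits_exactly)  # True
print(narrow)        # 1.2345679e+15
print(full)          # 1234567891011121
print(with_commas)   # 1,234,567,891,011,121
```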

Kusto - format numbers with 1000 comma separator

I have numerical output from a Kusto/KQL query where I would like to format the output to have comma separations. I would also like to round to the nearest whole number. For instance, instead of 1000.2865 it would come out as 1,000. Is there any built-in KQL function to accomplish this? I checked the documentation but couldn't find it. I would hope to find something like this:
format_number(myNumberColumn, 0, "commaThousands")
Note: if it comes out as a string I'm fine with that, and I also realize the displayed output in Azure Data Explorer does visually format. But once I take the data outside of there I lose that formatting, like if I paste into Excel or use the query for a dashboard to show a key metric.
For rounding a number, you can use the round() function, the ceiling() function, the floor() function, or the toint() function.
Formatting numbers, e.g. adding separating commas, is best done by the client application you're using to present the result (e.g. you mentioned Excel, which certainly has this feature for formatting numeric values in cells).
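If the client happens to be a script rather than Excel, the formatting could look something like this Python sketch. This is not KQL, and format_thousands is a hypothetical helper name made up for the example.

```python
def format_thousands(value: float) -> str:
    """Round to the nearest whole number and add comma thousands separators."""
    return f"{round(value):,}"


print(format_thousands(1000.2865))  # 1,000
print(format_thousands(2500000.7))  # 2,500,001
```

The result is a string, which the question says is acceptable.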

Float type storing values in format "2.46237846387469E+15"

I have a table ProductAmount with columns
Id [BIGINT]
Amount [FLOAT]
now when I pass value from my form to table it gets stored in format 2.46237846387469E+15 whereas actual value was 2462378463874687. Any ideas why this value is being converted and how to stop this?
It is not being converted. That is what the floating point representation is. What you are seeing is the scientific/exponential format.
I am guessing that you don't want to store the data that way. You can alter the column to use a fixed format representation:
alter table ProductAmount alter column Amount decimal(20, 0);
This assumes that you do not want any decimal places. You can read more about decimal formats in the documentation.
I would strongly discourage you from using float unless:
You have a real floating point number (say an expected value from a statistical calculation).
You have a wide range of values (say, 0.00000001 to 1,000,000,000,000,000).
You only need a fixed number of digits of precision over a wide range of magnitudes.
Floating point numbers are generally not needed for general-purpose and business applications.
The value gets stored in a binary format, because this is what you specified by requesting FLOAT as the data type for the column.
The value that you store in the field is represented exactly, because 64-bit FLOAT uses 52 bits to represent the mantissa*. Even though you see 2.46237846387469E+15 when selecting the value back, it's only the presentation that is slightly off: the actual value stored in the database matches the data that you inserted.
But i want to store 2462378463874687 as a value in my db
You are already doing it. This is the exact value stored in the field. You just cannot see it, because querying tool of SQL Management Studio formats it using scientific notation. When you do any computations on the value, or read it back into a double field in your program, you will get back 2462378463874687.
If you would like to see the exact number in your select query in SQL Management Studio, use CONVERT:
CONVERT (VARCHAR(50), float_field, 128) -- See note below
Note 1: 128 is a deprecated format. It will work with SQL Server-2008, which is one of the tags of your question, but in versions of SQL Server 2016 and above you need to use 3 instead.
Note 2: Since the name of the column is Amount, good chances are that you are looking for a different data type. Look into decimal data types, which provide a much better fit for representing monetary amounts.
* 2462378463874687 is right on the border for exact representation, because it uses all 52 bits of mantissa.
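You can check this outside the database too. Below is a small Python sketch, assuming that a Python float (an IEEE 754 double) behaves like a 64-bit FLOAT column; it is an illustration, not SQL Server code.

```python
amount = 2462378463874687  # the value from the question

stored = float(amount)  # stand-in for what a 64-bit FLOAT holds

bits_needed = amount.bit_length()      # binary digits the integer needs
round_trips = int(stored) == amount    # nothing was lost in storage
shown = "%.15g" % stored               # 15 significant digits: exponent form
exact = "%.0f" % stored                # fixed format: every digit

print(bits_needed)  # 52
print(round_trips)  # True
print(shown)        # 2.46237846387469e+15
print(exact)        # 2462378463874687
```

The 15-digit exponent form matches what the query tool displays, while the stored value still round-trips to the original integer.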

Preserve leading zeros when importing Excel into SQL

My office uses Excel to prepare our data before importing it into a SQL database. However, we have been experiencing the following error.
When the data is imported from one computer it loses all of the leading zeros. However, when it is imported from a different computer it imports perfectly.
An example of the leading zeros is that our item numbers are required to be formatted as "001, 002, 003,... 010, 011, 012,... 100, 101, 102, etc."
1) The excel file is stored on a server so there is no difference in the file.
2) If the users swap workstations the result stays with the computer, and doesn't switch with the user.
3) The data is formatted as text. It has been formatted as text both from the Data Tab and from Format Cells.
Is there a setting within excel that is specific to the computer and not the spreadsheet which will affect exporting the data? Or is there a non-excel specific setting which will cause this?
It's best to avoid the 'TEXT' format option. Confusingly, it does not force the contents of a cell to be a text data type, and it wreaks havoc when a formula references a 'TEXT' format.
To add to the previous answer (with all of the caveats about if this is a good idea), you can use the TEXT worksheet function
=TEXT(A1,"000")
to guarantee an actual text string with leading zeros if needed.
Depending on number of leading zeroes that you require, you can select your data/column in Excel, go into Excel >> Format >> Custom >> type in however many zeroes you require into the Type field (i.e. 000000000 for a 9-digit number with leading zeroes), and it will automatically preface with the correct number of leading zeroes to make the numerical string the correct length (i.e. 4000 = 00004000).
Note, this only works with numerical data, not text, but depending on the scenario it may be more useful to retain your data in numerical format - the example you gave listed numerical data only, and often retaining the numerical format is a benefit for analysis.
Not sure what the benefit of padding data before inserting it into the database would be...(takes more space, slower searching, etc.). Sounds like you're formatting it for output (?), which might be more efficiently done elsewhere.
But anyway -- here are some ideas for your SELECT (sql) statement:
RIGHT(1000 + [excel field], 3)
or another one would be
REPLICATE('0', 3 - LEN([excel field])) + [excel field]
Something you can do to the Excel field itself (before import) is prefix it with a ' (apostrophe). Notice if you type 0007 into Excel, it will change it to 7, but if you type '0007, it will keep the leading zeros.
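If the padding ends up being done in a script between Excel and the database, the same two ideas look roughly like this in Python. This is only a sketch, and item_numbers is made-up sample data.

```python
item_numbers = [1, 2, 10, 12, 100]  # sample values, already stripped of zeros

# Pad on the left with zeros to a fixed width of 3.
padded = [str(n).zfill(3) for n in item_numbers]

# Same trick as RIGHT(1000 + x, 3): add 1000, keep the last 3 characters.
via_right = [str(1000 + n)[-3:] for n in item_numbers]

print(padded)     # ['001', '002', '010', '012', '100']
print(via_right)  # ['001', '002', '010', '012', '100']
```

Note that the add-1000 trick only works for values below 1000, while zfill leaves longer numbers untouched.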

Map a decimal column value in result set to column in csv with 2 decimal places only

In SSIS, I have a select query, and one of the columns returned is a decimal value. The statement is similar to
select (2*100.0)/19
I get
10.526315
Now, since I need this rounded to two decimal places, I used the format below.
select FORMAT((2*100.0)/19,'N2')
I get
10.53
Now, when I execute the SQL in the editor, I see the expected result. However, when the SSIS destination is a csv file, the csv has 12 decimal places
10.530000000000
What exactly needs to be corrected so that it shows just 2 decimal places?
10.53
You need to set the scale of your output, assuming you're using a flat file destination. This is under the Advanced tab.
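To illustrate what scale means here, a small Python sketch with the decimal module. This is not SSIS, and the assumption is that the destination column was left with a scale of 12, which would explain the padded zeros.

```python
from decimal import Decimal, ROUND_HALF_UP

ratio = Decimal(2) * Decimal("100.0") / Decimal(19)

# Scale 2: round to two decimal places.
two_places = ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Scale 12: the same value written out with twelve decimal places.
twelve_places = two_places.quantize(Decimal("0.000000000001"))

print(two_places)     # 10.53
print(twelve_places)  # 10.530000000000
```

The value is the same in both cases; only the scale decides how many decimals get written out.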