How to I convert bytes to float in big query? - google-bigquery

I am working with data stored as a very long base64 encoded string in Big Query and trying to substring out floating point numbers which are stored as IEE754 format. The best I've been able to achieve so far is a hex representation of the floating point number, is there any way to turn the bytes or hex value into a float?
I have been able to get a hex representation of the floating point number as follows,
to_hex(substring(from_base64(my_string),start_pos,length))

Related

How to find the simplest human-readable float string which would yield the same bytes when converted back to float?

For most numbers, we know there will be some precision error with any floating point value. For a 32-bit float, that works out the be roughly 6 significant digits which will be accurate before you can expect to start seeing incorrect values.
I'm trying to store a human-readable value which can be read in and recreate a bit-accurate recreation of the serialized value.
For example, the value 555.5555 is stored as 555.55548095703125; but when I serialize 555.55548095703125, I could theoretically serialize it as anything in the range (555.5554504395, 555.555511475) (exclusive) and still get the same byte pattern. (Actually, probably that's not the exact range, I just don't know that there's value in calculating it more accurately at the moment.)
What I'd like is to find the most human-readable string representation for the value -- which I imagine would be the fewest digits -- which will be deserialized as the same IEEE float.
This is exactly a problem which was initially solved in 1990 with an algorithm the creators called "Dragon": https://dl.acm.org/citation.cfm?id=93559
There is a more modern technique from last year which is notably faster called "Ryu" (Japanese for "dragon"): https://dl.acm.org/citation.cfm?id=3192369
The GitHub for the library is here: https://github.com/ulfjack/ryu
According to their readme:
Ryu generates the shortest decimal representation of a floating point
number that maintains round-trip safety. That is, a correct parser can
recover the exact original number. For example, consider the binary
64-bit floating point number 00111110100110011001100110011010. The
stored value is exactly 0.300000011920928955078125. However, this
floating point number is also the closest number to the decimal number
0.3, so that is what Ryu outputs.

Why when converting SQL Real to Numeric does the scale slightly increase?

I'm storing a value (0.15) as a Real datatype in a Quantity field in SQL.
Just playing around, when I cast as numeric, there are some very slight changes to scale.
I'm unsure why this occurs, and why these particular numbers?
select CAST(Quantity AS numeric(18,18)) -- Quantity being 0.15
returns
0.150000005960464480
Real and float are approximate numerics, not exact ones. If you need exact ones, use DECIMAL.
The benefit of the estimated ones is that they allow storing very large numbers using fewer storage bytes.
https://learn.microsoft.com/en-us/sql/t-sql/data-types/float-and-real-transact-sql?view=sql-server-2017
PS:Numeric and decimal are synonymous.
PS2: See Eric's Postpischil clarification comment below:
"Float and real represent a number as a significand multiplied by a power of two. decimal represents a number as a significand multiplied by a power of ten. Both means of representation are incapable of representing all real numbers, and both means of representation are subject to rounding errors. As I wrote, dividing 1 by 3 in a decimal format will have a rounding error"

Number Formatting and storing

I have Big numpy array with numbers like 1.01594734e+09
I just want this data as integers or rounded off till 5 decimals in case of 1.01594734e+03
or something like that
You need to choose what you want. I assume you want to make the array smaller.
If you want to convert array a into integers, then:
a_int = a.astype('int')
However, keep in mind that this does not save any storage space as int is an 8-octet (64-bit) integer, and float is an 8-octet float.
If you know you have integer data which is limited in size, you may specify the storage format to be something shorter:
a_int = a.astype('int32')
If you have pure integer data which fits into the destination type, there is no loss of precision in this conversion.
On the other hand - depending on your data - you may have equally good results by using 4-octet (32-bit) floats:
a_shortfloat = a.astype('float32')
This conversion causes some loss of precision depending on the data.
The second alternative you suggest is to round a number into a given number of decimals, there are two quite different possibilities.
Simple rounding to 5 decimals:
a_rounded = a.round(decimals=5)
This, however, does not save any storage space, the numbers are only rounded (and they are not accurate even after that due to the limitations of the floating point representation).
Another possibility is to use a fixed point notation:
a_fixedpoint = (a * 100000 + .5).astype('int32')
With this representation your example number 1.01594734e+03 will become 101 594 734. Whether or not this is a useful representation depends on the application. Sometimes fixed-point numbers are very useful, but if your numbers have a wide dynamic range (e.g. from 1e-5 to 1e5), then floating point numbers are the correct way of handling them.

Proper Data Type in SQL Server to store Scientific Notation value? (Ex. 10^3)

Say I have test results values for a lab procedure that come in as 103. What would be the best way to store this in SQL Server? I would think since this is numerical data it would be improper to just store it as string text and then program around calculating the data value from the string.
If you want to use your data in numeric calculations, it is probably best to represent your data using once of SQL servers native numeric data type. Since you show scientific notation, it is likely you will want to use either REAL or FLOAT.
Real is basically 7 decimal digits of precision and float has 15 digits of precision (at least this is how they are normally used). You can actually specify reduced precision for FLOAT, but in practice most people just use REAL in that case. REAL takes 4 bytes of storage, and FLOAT requires 8 bytes.
The other numeric types are for fixed decimal point arithmetic.
Numbers in scientific notation like this have three pieces of information:
The significand
The precision of the significand
The exponent of 10
Presuming we want to keep all this information as exact as possible, it may be best to store these in three non-floating point columns (floating-point values are inexact):
DECIMAL significand
INT precision (# of decimal places)
INT exponent
The downside to the approach of separating these parts out, of course, is that you'll have to put the values back together when doing calculations -- but by doing that you'll know the correct number of significant figures for the result. Storing these three parts will also take up 25 bytes per value (17 for the DECIMAL, and 4 each for the two INTs), which may be a concern if you're storing a very large quantity of values.
Update per explanatory comments:
Given that your goal is to store an exponent from 1-8, you really only need to store the exponent, since you know the mantissa is always 10. Therefore, if your value is always going to be a whole number, you can just use a single INT column; if it will have decimal places, you can use a FLOAT or REAL per Gary Walker, or use a DECIMAL to store a precise decimal to a specified number of places.
If you specify a DECIMAL, you can provide two arguments in the column type; the first is the total number of digits to be stored, while the second is the number of digits to the right of the decimal point. So if your values are going to be accurate to the tenths place, you might create a column of DECIMAL(2,1). SQL Server MSDN documentation: DECIMAL and NUMERIC types

Convert negative decimal to binary in T-SQL

I have tried to find information and how.
But it did not contain any information that would help.
With T-SQL, I want to convert negative decimal to binary
and convert it back.
Sample value: -9223372036854775543
I try in convert with Calculater this value to Bin result is ...
1000000000000000000000000000000000000000000000000000000100001001
and Convert back to Dec. It's OK.
How i can Convert like this with T-SQL(SQL2008) Script/Function ?
Long time to find information for how to.
Anyone who knows about this, please help.
There is no build in functionality.
for INT and BIGINT you can use CONVERT(VARCHAR(100),CAST(3 AS VARBINARY(100)),2) to get the hex representation as a string. then you can do a simple search replace as every hex digit represents exactly 4 binary digits. However, with values outside of the BIGINT range there is no standard as to how they are represented internally. You might get the right result or not and that behavior might even change between versions.
There is also no standard as to how negative numbers are represented. Most implementations of integers use the two's-complement representation. In that representation the top most bit indicates the sign of the number. How many bits you have is a metter of convention and fully dependent on your environment.
In mathematics -3 woud be -11 in binary and not 11111101.
To solve your problem you can either use a CLR function or you go through your number the old fashioned way:
Is it odd? -> output a 1
Is it even? -> output a 0
integer divide by 2
repeat until the value is 0
This will give you the digits in opposite order, so you have to flip the result. To get the two's-complement representation of a negative number n calculate 1-n, convert the result to binary using the above algorithm but with reversed digits (0 instead of 1 and vice versa). After flipping the digits into the right order prepend with enough 1s to fill your "box".