How to convert eight hexadecimal bytes (16 digits) into a decimal value in SQL Server?

We are attempting to import a SAS7BDAT file into a SQL Server database.
The only issue we're running into is that source decimal values are being read as float.
Using a command-line tool named dsread, we found an option whose documentation says:
Converting the IEEE floating-point numeric values in the SAS7BDAT file to their decimal representation may cause a loss of precision. To get a lossless representation of the data, use the /l flag:
sashelp> dsread /v /l prdsale
ACTUAL,PREDICT,COUNTRY,...
0x0000000000e88c40,0x0000000000908a40,CANADA...
0x0000000000388f40,0x0000000000907240,CANADA...
0x0000000000008340,0x0000000000708a40,CANADA...
0x0000000000108440,0x0000000000a88040,CANADA...
0x0000000000808440,0x0000000000308440,CANADA...
0x0000000000a08d40,0x0000000000607e40,CANADA...
...etc...
The numerics are output as eight hexadecimal bytes (16 digits) giving the internal floating-point representation, which can then be used to reconstruct the exact same value in the receiving software. Use /L to get the bytes in big-endian order
Running some tests, we can see that the decimal value -1.457263 is represented by 0xcbbbea01f350f7bf when we use the /L flag. What we haven't been able to figure out is how to convert that hexadecimal value into a SQL Server decimal value.
We've tried many variants, including:
select CONVERT(decimal, convert(varbinary,'0xcbbbea01f350f7bf',1))
but that results in:
Msg 8115, Level 16, State 6, Line 17
Arithmetic overflow error converting varbinary to data type numeric.

If 0xcbbbea01f350f7bf = -1.457263, then it looks like that's an IEEE double-precision floating-point number in big-endian byte order.
So reverse the bytes (or get the tool to export in little-endian order instead):
0xcbbbea01f350f7bf -> 0xbff750f301eabbcb
Then convert it to a float. You can use the CLR, or there's a T-SQL function you can try here:
Unpacking a binary string with TSQL
Then convert it to a decimal.
select convert(decimal(36,17), dbo.[fnBinaryFloat2Float]( 0xbff750f301eabbcb ), 3)
Which, you can see, has preserved a closer approximation to the floating-point value:
-1.45726299999999998
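For reference, the same round trip is easy to check outside SQL Server. A minimal Python sketch (assuming the hex string is exactly what dsread printed with /L):
import struct
from decimal import Decimal

raw = bytes.fromhex("cbbbea01f350f7bf")

# Reversing the bytes and reading them big-endian is the same as
# reading the original bytes as a little-endian IEEE 754 double:
value = struct.unpack("<d", raw)[0]
print(value)           # -1.457263

# The exact value stored in those 8 bytes, expanded as a decimal:
print(Decimal(value))  # -1.45726299999999998... (full exact expansion)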
Is there a way to reconstruct the "exact same value" with no differences?
Then leave the data as float(53), which is exactly the same data type as the source, and don't convert it to decimal at all. decimal and float each store a finite subset of the rational numbers, and many numbers can be exactly represented in either system. But some float values don't have an exact match in decimal, and vice versa.
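A quick way to see that mismatch (Python, purely for illustration):
from decimal import Decimal

# 0.5 is a power of two, so it is exact in both systems:
print(Decimal(0.5))  # 0.5

# 0.1 has no exact binary floating-point representation, so the
# nearest double is a slightly different number:
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625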

Related

GDAL version 3 and higher does not work with MapInfo and Decimal Fields

I'm having a problem trying to convert a MapInfo file from MID/MIF format to TAB format.
This problem occurs with GDAL version 3.0.4 and higher. On version 2.1.2, everything works without problems.
I use the following command:
ogr2ogr -f "MapInfo file" "test.tab" "test.mif"
The following errors occur:
ERROR 1: Cannot format 1234.1 as a 20.16 field
ERROR 3: Failed writing attributes for feature id 1 in test.tab
ERROR 1: Unable to write feature 1 from layer test.
ERROR 1: Terminating translation prematurely after failed
translation of layer test (use -skipfailures to skip errors)
Here is an example of a MapInfo file in MID/MIF format:
test.mif
test.mid
Can anyone explain what the reason for this error is?
I'm trying GDAL version 3.5, but I still get this error.
If I change the column type to Float, then everything works fine.
But I can't just change the format of the existing file.
Your value "1234.1" is to big.
From the documentation:
Decimal fields store single and double precision floating point values.
Width is the total number of characters allocated to the field, including the decimal point.
Precision controls the precision of the data and is the number of digits to the right of the decimal.
Your decimal definition "Decimal (20,16)" leaves only 3 digits for the integer part. Try a smaller value, e.g. 999.4, or change the decimal format to Decimal (20,15).
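The arithmetic the driver is enforcing can be sketched like this (a hypothetical Python helper, not part of GDAL's API):
# Width counts every character, including the decimal point, so a
# MapInfo Decimal(width, precision) field leaves this many digits
# for the integer part:
def max_integer_digits(width, precision):
    return width - precision - 1  # minus 1 for the decimal point

print(max_integer_digits(20, 16))  # 3 -> 1234.1 needs 4 digits, fails
print(max_integer_digits(20, 15))  # 4 -> 1234.1 fits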

Representing data types e.g. Chars, Strings, Integers etc

I am a .NET Developer and I do not believe I know enough about encoding. I have read this article: http://www.joelonsoftware.com/articles/Unicode.html.
Say I declare this string:
Dim TestString As String = "1"
I believe this will be represented as a Unicode character. Say I declare this integer:
Dim TestInt As Integer = 1
How is this represented? I assume that Unicode is not used, i.e. it is only used for Strings and Chars? Is that correct? Therefore I believe that on a 32-bit machine 1 would simply be represented as:
00000000 00000000 00000000 00000001
Do numeric data types have byte order marks: http://en.wikipedia.org/wiki/Byte_order_mark ?
All strings in .NET are UTF-16. From the language spec:
Visual Basic .NET defines the following primitive types:
...
The Char value type, which represents a single Unicode character and
maps to System.Char...
The String reference type, which
represents a sequence of Unicode characters and maps to System.String...
Why should an integral value type like an integer be represented with Unicode in computer memory? Unicode is (citing Wikipedia):
a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems.
So yes, it's only used for Strings and Chars.
Also note that an Integer will always be a 4-byte signed integer, no matter whether you use a 32-bit or a 64-bit machine.
Byte order marks are an entirely different topic. As already said in a comment, they're used in text files or streams.
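To make the difference concrete, here is the same "1" as an integer and as text (a Python sketch standing in for the .NET internals):
# The integer 1 is just four bytes; no text encoding is involved:
print((1).to_bytes(4, "big"))   # b'\x00\x00\x00\x01'

# The string "1" is a sequence of UTF-16 code units, as in .NET:
print("1".encode("utf-16-le"))  # b'1\x00', i.e. the code unit 0x0031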

Hexadecimal numbers vs. hexadecimal encoding (with base64 as well)

Encoding with hexadecimal numbers seems to be different from using hexadecimals to represent numbers. For example, the hex number 0x40 to me should be equal to 64, or BA_{64}, but when I put it through this hex to base64 converter, I get the output QA==, which to me is equal to some number times 64. Why is this?
Also, when I check the integer value of the hex string deadbeef I get 3735928559, but when I check it in other places I get: 222 173 190 239. Why is this?
Addendum: So I guess it is because it is easier to break the number into bit chunks than to treat it as a whole number when encoding? That is pretty confusing to me, but I guess I get it.
You may wish to read this:
http://en.wikipedia.org/wiki/Base64
In summary, base64 specifies a particular encoding, and the value it assigns to each letter is different from that letter's ASCII code.
For the second part, one source is treating the entire string as a single 32-bit integer, and the other is dividing it into bytes and giving the value of each byte.
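Both points are easy to verify, e.g. in Python:
import base64

# base64 re-encodes raw bytes; it never interprets them as a number.
# 0x40 is the single byte 64, and encoding that byte gives 'QA==':
print(base64.b64encode(bytes.fromhex("40")))  # b'QA=='

# 'deadbeef' read as one 32-bit integer vs. as four separate bytes:
print(int("deadbeef", 16))              # 3735928559
print(list(bytes.fromhex("deadbeef")))  # [222, 173, 190, 239]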

Fortran 90: How to correctly read an integer among other real

I have created a Fortran 90 code to filter and convert the text output of another program into CSV form. The file contains a table with columns of various types (character, real, integer). There is a column that generally contains decimal values (probability values). BUT, in some rows, where the value should be the decimal "1.000", the value is actually the integer "1".
I use the "F5.3" specifier to read this column, and I have the same format statement for every row of the table. So, when the code finds "1", it reads ".001", because it does not find a decimal point.
What ways could I use to correctly (and generally) read integers among other decimals?
Could I specify "unformatted" input only for a number of "spaces"?
The data edit descriptor Fw.d for floating-point input is normally used with a zero d (the d cannot be omitted). A nonzero d is used in the rare case when the floating-point data is stored as scaled integers, or when you do some unit conversion on the integer values.
You could try using list-directed input: use a * instead of a format specifier. This applies to the entire read, not selected items. Or you could read each line into a string and test its contents to decide how to read it. If the sub-string has a decimal point, use read (string(M:N), '(F5.3)') value. If it doesn't, use a different format, e.g. F5.0.
P.S. "Unformatted" means reading binary data without conversion: it is a direct copy of the data from the file to the data item. "List-directed" is the Fortran term for reading and converting data without using a format specification.
Well, here's something new to me: Fortran 90 allows a mix of comma and space delimiters for a simple list-directed read:
read(unit,*)v1,v2,v3,v4
with input
1.222 2 , 3.14 , 4
yields
1.222000 2.000000 3.140000 4.000000

Correct terminology for documentation [closed]

The documentation below is for a module which has now been "decommissioned",
and I'm writing its replacement.
Before I write the replacement I want to get my terms right.
I know the terms are wrong in the documentation - it was hacked together quickly
so I could instruct a colleague working on the hardware side of this project on how to use a program I'd made.
Full documentation can be found here for anyone who is interested (in so much as it has been written and added to our wiki); the website may only be available to certain IPs (it depends on your ISP - university internet connections are most likely to work), and the SVN repo is private.
So there are a lot of terms that are wrong,
such as:
deliminators
formatted string containing value expressions (might not be wrong, but it's hard to say)
What are the correct terms for these?
And what other mistakes have I made?
==== formatted string containing value expressions ====
Before I start on actual programs an explanation of:
"formatted string containing value expressions" and how to encode values in them.
The ''formatted string containing value expressions'' is at the core of doing low level transmission.
We know the decimal 65, hex 41, binary 0100 0001, and the ascii character 'A' all have the same binary representation, so to tell which we are using we have a series of deliminators - numbers preceded by:
# are decimal
$ are Hex
# are binary
No deliminator, then ascii.
Putting a sign indicator after the deliminator is optional. It is required if you want to send a negative number.
You may put muliple values in the same string.
eg: "a#21#1001111$-0F"
All values in a ''formatted string containing value expressions'' must be in the range -128 to 255 (inclusive) as they must fit in 8bytes (other values will cause an error). Negative numbers have the compliment of 2 representation for their binary form.
There are some problems with ascii - characters that can't be sent (in future versions this will be fixed by giving ascii a delineator and some more code to make that deliminator work, I think).
Characters that can't be sent:
* The delineator characters: $##
* Numbers written immediately after a value that could have contained those digits:
* 0,1,2,3,4,5,6,7,8,9 for decimal
* 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,A,B,C,D,E,F for hex
* 0,1 for binary
For a start, deliminator would probably be delimiter, although I notice your text has both delineator and deliminator in it - perhaps deliminator is a special delimiter/terminator combination :-)
However, a delimiter is usually used to separate fields and is usually present no matter what. What you have is an optional prefix which dictates the following field type. So I would probably call that a "prefix" or "type prefix" instead.
The "formatted string containing value expressions" I would just call a "value expression string" or "value string" to change it to a shorter form.
One other possible problem:
must be in the range -128 to 255 (inclusive) as they must fit in 8bytes
I think you mean 8 bits.
Try something like the following:
==== Value string encoding ====
The value string is at the core of the data used for low level
transmissions.
Within the value string the following prefixes are used:
# decimal
$ Hex
# binary
No prefix - ASCII.
An optional sign may be included after the prefix for negative numbers.
Negative numbers are represented using twos complement.
The value string may contain multiple values:
eg: "a#21#1001111$-0F"
All elements of the value string must represent an 8-bit value and must
be in the range -128 to 255.
When using ASCII representation, the following characters can't be sent:
* The prefix characters: $## (use a prefixed hex value.)
* Numbers written immediately after a value that could have
contained those digits:
* 0,1,2,3,4,5,6,7,8,9 for decimal
* 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,A,B,C,D,E,F for hex
* 0,1 for binary