How is input to the MySQL function md5 handled?

How is input to the MySQL function md5 handled? - sql

I'm having problems understanding how input to the md5 function in MySQL 4.1.22 is handled. Basically I'm not able to recreate the md5sum of a specific value combination for comparison. I guess it has something to do with the format of the input data.
I have set up a table with two columns of type double(direction and elevation) + a third for storing a md5 sum.
With a setup script i add data to the direction and elevation fields + create a checksum using the following syntax:
insert into polygons (
direction,
elevation,
md5sum
)
values (
(select radians(20.0)),
(select radians(30.0)),
( md5( (select radians(20.0)) + (select radians(20.0)) ) )
)
which ends up as: 0.349065850398866, 0.523598775598299, '0c2cd2c2a9fe40305c4e3bd991812df5'
Later I compare the stored md5sum with a newly calculated, the newly calculated is created using md5('0.349065850398866' + '0.523598775598299') and I get the following checksum:
'8c958bcf912664d6d27c50f1034bdf34'
If I modify the last decimal in the passed "string" from a 9 to a 8 0.523598775598298 i get the same checksum as previously stored, '0c2cd2c2a9fe40305c4e3bd991812df5'. Any value of the last decimal from 8 down to 0 gives the same checksum.
Using BINARY, (md5( (select BINARY(radians(20.0))) + (select BINARY(radians(20.0))) ) in the setup script creates the same checksum as the my original "runtime calculation"
Worth mentioning is that the original method works for all other rows I have (55)
I guess I'm using the function in a somewhat strange way, but I'm not sure of a better way, so in order to find a better way I feel I need to understand why the current is failing.

The two numbers you are adding are stored in binary form, but displayed in decimal form. There is no guarantee that you will get exactly the same number back if you give the decimal form back to the machine.
In this case, this causes the addition to give a slightly different result, which gives an entirely different MD5 sum:
mysql> select radians(20.0) + radians(30.0), '0.349065850398866' + '0.523598775598299';
+-------------------------------+-------------------------------------------+
| radians(20.0) + radians(30.0) | '0.349065850398866' + '0.523598775598299' |
+-------------------------------+-------------------------------------------+
| 0.87266462599716 | 0.87266462599717 |
+-------------------------------+-------------------------------------------+
1 row in set (0.00 sec)
If you want to consistently get the same result, you need to store the results of radians(20.0) and radians(30.0) in variables somewhere, never relying on their printed representations.

The output of radians(20.0) is computed with many more digits than are shown in the printed output. When the result is passed to the md5 function, the full non-truncated value is used, whereas the printed value only will show a limited number of digits. Thus, it is not the same value being passed into the md5 function in the two cases.

Related

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (ca. 0.7 Mio records) where an nvarchar field "MediaID" contains largely media IDs in proper hexadecimal notation (as they should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these Media IDs do contain non-hex characters - most probably because there was some typing errors by the people which have added them or through import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) because the conversion hits an error.
For my current purpose, such rows must be skipped/ignored as they are clearly wrong and cannot be used (there are no medias / data carriers in use with the current business system which can have non-hex character IDs).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg

After I did upgrade the compatibility level of the SQL instance to SQL2016 (it was below 2012 before) I could use try_convert with same syntax as the original convert function as donPablo has pointed out. With that the query could run fully through and every MediaID which is not a correct hex value gets nicely converted into a null value - really, really nice.
Exactly what I needed.
Unfortunately, the solution of ALICE... didn't work out for me as this was also (strangely) returning records which had the "+" character within them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
works also with SQL Instances with compatibility level < SQL 2012.
Great addition for people which still have to work with older SQL instances.

An option (although not ideal in terms of performance) is to check the characters in the MediaID through a case statement and regular expression
Hexadecimals cannot contain characters other than A-F and numbers between 0 and 9
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that can be used to evaluate MediaID first and checks if it is hexadecimal and then running the query for conversion

How to find float rounding errors in SQL server

I've narrowed down a data issue on a legacy SQL Server 2008 database.
The column is a 'float'. SSMS shows four of the records as '0.04445' but when i query for all records that match the first value, only 3 of the four are returned. The last record is somehow different, i suspect it is off by 0.0000000001 or something and the SMSS GUI is rounding it for display(?). Using the '<' operator has similar results ('where my_column < 0.04445' returns three of the four) This is causing some catastrophic calculation errors in the calling app.
I tried casting it to a decimal ('SELECT CAST(my_column as DECIMAL(38,20)) FROM...') but all four records just come back 0.044450000000000000000000000000
I suspect that there are many other similar errors in this same column, as the data has been entered in various ways over the years.
Is there any way to see this column in its full value/precision/mantissa, rather than the rounded value?
I can't change the schema or the app.
Update - using the 'Edit Top 200 Rows' feature, I can see that about three quarters of them are 0.044449999999999996 and the other quarter are ecxactly 0.04445. But I can't get it to display that level of accuracy in a regular query result

You can use CONVERT(VARBINARY(8), my_column) to the number in its original form. What you get should be 0x3FA6C226809D4952 or 0x3FA6C226809D4951. And what number that really is? 3FA6C226809D4951 is binary
0 01111111010 0110110000100010011010000000100111010100100101010001
0 => number is positive
01111111010 => 1018-1023 = -5 is exponent (so we get 2^-5)
1.0110110000100010011010000000100111010100100101010001 => 6405920109971793*2^-52
so the 0x3FA6C226809D4951 is exactly 6405920109971793*2^-57, which is 0.044449999999999996458388551445750636048614978790283203125
and 0x3FA6C226809D4952 is exactly 6405920109971794*2^-57, which is 0.04445000000000000339728245535297901369631290435791015625

So, your question is really about SSMS, not about your application or SQL Server itself, right? You want to see the actual float values in SSMS without rounding, right?
By design SSMS rounds float during display. For example, see this answer.
But, you can see the actual value that is stored in the column if you convert it to a string explicitly using CONVERT function.
float and real styles
For a float or real expression, style can have
one of the values shown in the following table. Other values are
processed as 0.
0 (default) A maximum of 6 digits. Use in scientific notation, when appropriate.
1 Always 8 digits. Always use in scientific notation.
2 Always 16 digits. Always use in scientific notation.
3 Always 17 digits. Use for lossless conversion.
With this style, every distinct float or real value is guaranteed to
convert to a distinct character string.
It looks like style 3 is just what you need:
convert(varchar(30), my_column, 3)
Here is my test:
DECLARE #v1 float = 0.044449999999999996e0;
DECLARE #v2 float = 0.044445e0;
SELECT #v1, #v2, convert(varchar(30), #v1, 3), convert(varchar(30), #v2, 3)
Result that I see in SSMS:
+------------------+------------------+-------------------------+-------------------------+
| (No column name) | (No column name) | (No column name) | (No column name) |
+------------------+------------------+-------------------------+-------------------------+
| 0.04445 | 0.044445 | 4.4449999999999996e-002 | 4.4444999999999998e-002 |
+------------------+------------------+-------------------------+-------------------------+

Why do I get different results depending on the function I use? (SQL Server)

I've been tasked with creating a report for my company. The report is generated from the results returned by the Stored Procedure spGenerateReport, which has multiple filters.
Inside the SP, this is how the filter is expected to work:
SELECT * FROM MyTable WHERE column1 IN (
'filters', 'for', 'this', 'report'
)
Entering the code above yields ~30000 rows in 9s. However, I want to be able to change my SP's filter by passing it a single argument (since I may use 1 or 2 or n filters), like so:
spGenerateReport 'Filters,for,this,report'
For this I have the User-Created Function fnSplitString (yes, I do know that there is a STRING_SPLIT function but I can't use it due to a lower compatibility level of my database) which splits a single string into a table, like so:
SELECT splitData FROM fnSplitString('Filters,for,this,report')
Returns:
splitData
------
Filters
for
this
report
Thus the final code in my SP is:
SELECT * FROM MyTable WHERE column1 IN (
SELECT * FROM fnSplitString('Filters,for,this,report')
)
However, this instead yields ~10000 rows in 60s. The time taken to complete this SP is weird but isn't too much of a problem, however nearly a quarter of my rows disappearing into the void certainly is. The results only have rows from the first couple filters (for example, 'Filters' and 'for'; if I change the order of the arguments (e.g.: fnSplitString('report,for,Filters,this')), I get a different number of rows, and only from filters 'report', 'for', 'Filters'! I don't understand why using the function returns different results than those obtained when using the literal strings. Is there some inside gimmick that I'm not aware of?
PS - I'm sorry in advance for being bad at explaining myself, and for any grammar mistakes

You should definitely be getting the same results with both techniques. Something is wrong.
You havent posted the fnSplitString code but I suspect fnSplitString is not outputting the last string in the list, or maybe the last string in the list is being truncated before it reaches fnSplitString so that no matches are found.
e.g. if the parameter going into your spGenerateReport stored procedure is varchar(20) then what will reach the function is 'Filters,for,this,rep' with the last bit truncated.
SSRS, for example, will truncate strings that are being passed into an SP instead of warning you with an error message

Additional 0 in varbinary insert in SSMS

I have a problem when I am trying to move a varbinary(max) field from one DB to another.
If I insert like this:
0xD0CF11E0A1B11AE10000000
It results the beginning with an additional '0':
0x0D0CF11E0A1B11AE10000000
And I cannot get rid of this. I've tried many tools, like SSMS export tool or BCP, but without any success. And it would be better fro me to solve it in a script anyway.
And don't have much kowledge about varbinary (a program generates it), my only goal is to copy it:)

0xD0CF11E0A1B11AE10000000
This value contains an odd number of characters. Varbinary stores bytes. Each byte is represented by exactly two hexadecimal characters. You're either missing a character, or your not storing bytes.
Here, SQL Server is guessing that the most significant digit is a zero, which would not change the numeric value of the string. For example:
select 0xD0C "value"
,cast(0xD0C as int) "as_integer"
,cast(0x0D0C as int) "leading_zero"
,cast(0xD0C0 as int) "trailing_zero"
value 3_char leading_zero trailing_zero
---------- --------- --------------- ----------------
0d0c 3340 3340 53440
Or:
select 1 "test"
where 0xD0C = 0x0D0C
test
-------
1
It's just a difference of SQL Server assuming that varbinary always represents bytes.

Update varbinary(MAX) field in SQLServer 2012 Lost Last 4 bits

Recently I would like to do some data patching, and try to update a column of type varbinary(MAX), the update value is like this:
0xFFD8F...6DC0676
However, after update query run successfully, the value becomes:
0x0FFD8...6DC067
It seems the last 4 bits are lost, or whole value right shifting a byte...
I tried deleting entire row and run an Insert Query, same things happen!
Can anyone tell me why is this happening & how can I solve it? Thanks!
I have tried several varying length of binary, for maximum
43658 characters (Each represents 4 bits, total around 21 KB), the update query runs normally. 1 more character will make the above "bug" appears...
PS1: For a shorter length varbinary as update value, everything is okay
PS2: I can post whole binary string out if it helps, but it is really long and I am not sure if it's suitable to post here
EDITED:
Thanks for any help!
As someone suggested, the value inserted maybe of odd number of 4-bits, so there is a 0 append in front of it. Here is my update information on the value:
The value is of 43677 characters long exluding "0x", which menas Yes, it is odd
It does explain why a '0' is inserted before, but does not explain why the last character disappears...
Then I do an experiment:
I insert a even length value, with me manually add a '0' before the original value,
Now the value to be updated is
0x0FFD8F...6DC0676
which is of 43678 characters long, excluding "0x"
The result is no luck, the updated value is still
0x0FFD8...6DC067

It seems that the binary constant 0xFFD8F...6DC0676 that you used for update contains odd number of hex digits. And the SqlServer added half-byte at the beginning of the pattern so that it represent whole number of bytes.
You can see the same effect running the following simple query:
select 0x1, 0x104
This will return 0x01 and 0x0104.
The truncation may be due to some limitaions in SSMS, that can be observed in the following experiment:
declare #b varbinary(max)
set #b = 0x123456789ABCDEF0
set #b = convert(varbinary(max), replicate(#b, 65536/datalength(#b)))
select datalength(#b) DataLength, #b Data
The results returned are 65536 and 0x123456789ABCDEF0...EF0123456789ABCD, however if in SSMS I copy Data column I'm getting pattern of 43677 characters length (this is without leading 0x), which is 21838.5 bytes effectively. So it seems you should not (if you do) rely on long binary data values obtained via copy/paste in SSMS.
The reliable alternative can be using intermediate variable:
declare #data varbinary(max)
select #data = DataXXX from Table_XXX where ID = XXX
update Table_YYY set DataYYY = #data where ID = YYY

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How is input to the MySQL function md5 handled? - sql

Related

Query to ignore rows which have non hex values within field

How to find float rounding errors in SQL server

Why do I get different results depending on the function I use? (SQL Server)

Additional 0 in varbinary insert in SSMS

Update varbinary(MAX) field in SQLServer 2012 Lost Last 4 bits

Categories

Resources