Update varbinary(MAX) field in SQLServer 2012 Lost Last 4 bits - sql

Recently I would like to do some data patching, and try to update a column of type varbinary(MAX), the update value is like this:
0xFFD8F...6DC0676
However, after update query run successfully, the value becomes:
0x0FFD8...6DC067
It seems the last 4 bits are lost, or whole value right shifting a byte...
I tried deleting entire row and run an Insert Query, same things happen!
Can anyone tell me why is this happening & how can I solve it? Thanks!
I have tried several varying length of binary, for maximum
43658 characters (Each represents 4 bits, total around 21 KB), the update query runs normally. 1 more character will make the above "bug" appears...
PS1: For a shorter length varbinary as update value, everything is okay
PS2: I can post whole binary string out if it helps, but it is really long and I am not sure if it's suitable to post here
EDITED:
Thanks for any help!
As someone suggested, the value inserted maybe of odd number of 4-bits, so there is a 0 append in front of it. Here is my update information on the value:
The value is of 43677 characters long exluding "0x", which menas Yes, it is odd
It does explain why a '0' is inserted before, but does not explain why the last character disappears...
Then I do an experiment:
I insert a even length value, with me manually add a '0' before the original value,
Now the value to be updated is
0x0FFD8F...6DC0676
which is of 43678 characters long, excluding "0x"
The result is no luck, the updated value is still
0x0FFD8...6DC067

It seems that the binary constant 0xFFD8F...6DC0676 that you used for update contains odd number of hex digits. And the SqlServer added half-byte at the beginning of the pattern so that it represent whole number of bytes.
You can see the same effect running the following simple query:
select 0x1, 0x104
This will return 0x01 and 0x0104.
The truncation may be due to some limitaions in SSMS, that can be observed in the following experiment:
declare #b varbinary(max)
set #b = 0x123456789ABCDEF0
set #b = convert(varbinary(max), replicate(#b, 65536/datalength(#b)))
select datalength(#b) DataLength, #b Data
The results returned are 65536 and 0x123456789ABCDEF0...EF0123456789ABCD, however if in SSMS I copy Data column I'm getting pattern of 43677 characters length (this is without leading 0x), which is 21838.5 bytes effectively. So it seems you should not (if you do) rely on long binary data values obtained via copy/paste in SSMS.
The reliable alternative can be using intermediate variable:
declare #data varbinary(max)
select #data = DataXXX from Table_XXX where ID = XXX
update Table_YYY set DataYYY = #data where ID = YYY

Related

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (ca. 0.7 Mio records) where an nvarchar field "MediaID" contains largely media IDs in proper hexadecimal notation (as they should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these Media IDs do contain non-hex characters - most probably because there was some typing errors by the people which have added them or through import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) because the conversion hits an error.
For my current purpose, such rows must be skipped/ignored as they are clearly wrong and cannot be used (there are no medias / data carriers in use with the current business system which can have non-hex character IDs).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I did upgrade the compatibility level of the SQL instance to SQL2016 (it was below 2012 before) I could use try_convert with same syntax as the original convert function as donPablo has pointed out. With that the query could run fully through and every MediaID which is not a correct hex value gets nicely converted into a null value - really, really nice.
Exactly what I needed.
Unfortunately, the solution of ALICE... didn't work out for me as this was also (strangely) returning records which had the "+" character within them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
works also with SQL Instances with compatibility level < SQL 2012.
Great addition for people which still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in the MediaID through a case statement and regular expression
Hexadecimals cannot contain characters other than A-F and numbers between 0 and 9
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that can be used to evaluate MediaID first and checks if it is hexadecimal and then running the query for conversion

Additional 0 in varbinary insert in SSMS

I have a problem when I am trying to move a varbinary(max) field from one DB to another.
If I insert like this:
0xD0CF11E0A1B11AE10000000
It results the beginning with an additional '0':
0x0D0CF11E0A1B11AE10000000
And I cannot get rid of this. I've tried many tools, like SSMS export tool or BCP, but without any success. And it would be better fro me to solve it in a script anyway.
And don't have much kowledge about varbinary (a program generates it), my only goal is to copy it:)
0xD0CF11E0A1B11AE10000000
This value contains an odd number of characters. Varbinary stores bytes. Each byte is represented by exactly two hexadecimal characters. You're either missing a character, or your not storing bytes.
Here, SQL Server is guessing that the most significant digit is a zero, which would not change the numeric value of the string. For example:
select 0xD0C "value"
,cast(0xD0C as int) "as_integer"
,cast(0x0D0C as int) "leading_zero"
,cast(0xD0C0 as int) "trailing_zero"
value 3_char leading_zero trailing_zero
---------- --------- --------------- ----------------
0d0c 3340 3340 53440
Or:
select 1 "test"
where 0xD0C = 0x0D0C
test
-------
1
It's just a difference of SQL Server assuming that varbinary always represents bytes.

Numeric Data Type - Storage

According to Microsoft Site a data with type Numeric(10,2) - 10 means precision should have 9 bytes.
But when I'm doing this:
DECLARE #var as numeric(10,0) = 2147483649
SELECT #var, DATALENGTH(#var)
DATALENGTH(#var) is returning 5 bytes instead of 10. Can someone explain me why?
The documentation specifies:
Maximum storage sizes vary, based on the precision.
The storage is not constant for a given precision. The actual storage depends on the value.
As a note, this has nothing to do with integerness. The following also returns 5:
declare #var numberic(11, 1) = 214483649.8
In actual fact, SQL Server seems to use the amount of storage needed for the value, not for the maximum value of the type. You can readily see this by changing the "10" to "20" and noting that the data length does not change.
EDIT:
You can see the dependence on the value if you run:
declare #a numeric(20, 1) = '123.1';
declare #b numeric(20, 1) = '1234567890123456789.0';
select datalength(#a), datalength(#b);
The two lengths are not the same.
The other answer, by #GordonLinoff is wrong, or at least misleading.
Numeric is not stored with a variable number of bytes, but with a fixed size for a specific precision.
Trying this on SQL Server 2017 gave the same results you got.
The documentation you linked to originally, for numeric, is correct about how many bytes it takes to store a numeric of varying precisions.
This storage requirement is based only on the precision of the numeric column. In other words, that's how many bytes of storage are used. It is not a maximum that depends on the value in that row.
All rows use the same number of bytes for that column.
The key to this variation is the documentation for DATALENGTH says this function
Returns the number of bytes used to represent any expression.
It appears that DATALENGTH goes not mean 'represent' as in 'represent' on disk, but rather 'represent' in memory.
The other documenation regarding numeric is talking about the on-disk storage of numeric.
This is probably because DATALENGTH is intended primarily for var* types or the other BLOB types.
So although a numeric(20,1) requires 13 bytes of storage, depending on the value, SQL Server can represent it in a smaller number of bytes when in memory, which is when DATALENGTH evaluates it.
As I pointed out in my other comment, although numeric has different sizes, it a fixed size data type, because for a specific column in a specific table, every values takes up the same amount of storage.
Roughly, a SQL Server row has 4 parts:
4 byte header
Fixed size data
Offsets into variable size data
Variable size data
Numerics & other fixed size types are stored in 2, var* are stored in 4, with lengths in 3.
This script displays the metadata for a table with some fixed & variable columns.
declare #a numeric(20, 1) = '123.1';
declare #b numeric(20, 1) = '1234567890123456789.0';
select datalength(#a) union select datalength(#b);
create table #numeric(num1 numeric(20,1), text1 varchar(10), char2 char(6));
insert into #numeric(num1, text1, char2) values ('123.1', 'hello', 'first'), ('1234567890123456789.0', 'there', '2nd');
select datalength(num1) from #numeric;
select
t.name as table_name,
c.name as column_name,
pc.partition_column_id,
pc.max_inrow_length,
pc.max_length,
pc.precision,
pc.scale,
pc.collation_name,
pc.leaf_offset
from tempdb.sys.tables as t
join tempdb.sys.partitions as p
on(t.object_id=p.object_id)
join tempdb.sys.system_internals_partition_columns as pc
on(pc.partition_id=p.partition_id)
join tempdb.sys.columns as c
on((c.object_id=p.object_id)and(c.column_id=pc.partition_column_id))
where (t.object_id=object_id('tempdb..#numeric'));
drop table #numeric;
Notice the leaf_offset column. This indicates the starting position of the value in the raw binary data.
The first column starts immediately after the 4 byte header.
The second fixed column starts 13 bytes later, as per the SQL documentation.
The varchar column has an offset of -1, indicating it is a variable length column & it's position in the byte array isn't fixed.
In this case it could be fixed since there's only 1 var column, but an alter table statement could add another column & shift things.
If you want to research further, the best source is a book called SQL Server Internals, by Kalen Delaney. She was part of the team that wrote SQL Server.

Datatype on output column not reflecting change to source item in SSIS

Take a sqlCommand in a source item:
SELECT SUBSTRING(MKGDiagnoses.ICD9Code, 2,7) AS ICD9Code
FROM ...
The column ICD9Code outputs a string with a length of 7 positions.
Now change to:
SELECT SUBSTRING(MKGDiagnoses.ICD9Code, 2,6) AS ICD9Code
FROM ...
So datatype of the column is changed to a string of 6 positions.
A change like this is never reflected in the datatype of the output columns, while the datatype of the external column did change to [DT_STR] with length 6.
Is this expected behaviour? Is this behaviour overwritable?
Expected? Sort of? I can't find an earlier answer from me on this but the basics are that the metadata gets set when the item is first created. This is an expensive operation so the designer tries to limit the number of times it must referesh the metadata and so changing the precision of a numeric or increasing/decreasing the length of a string generally isn't a "big enough" change to force a metadata refresh request.
If I changed a single column as you're demonstrating, my lazy hack would be to rename the column in my source query (and then name it back).
Original
SELECT SUBSTRING(MKGDiagnoses.ICD9Code, 2,7) AS ICD9Code
FROM ...
Temporary
SELECT SUBSTRING(MKGDiagnoses.ICD9Code, 2,6) AS ICD9CodeShort
FROM ...
Revert
SELECT SUBSTRING(MKGDiagnoses.ICD9Code, 2,6) AS ICD9Code
FROM ...
If you have lots of columns to fix, I usually make the query SELECT 1 AS foo and then re-open the editor and use my real source query.

How is input to the MySQL function md5 handled?

I'm having problems understanding how input to the md5 function in MySQL 4.1.22 is handled. Basically I'm not able to recreate the md5sum of a specific value combination for comparison. I guess it has something to do with the format of the input data.
I have set up a table with two columns of type double(direction and elevation) + a third for storing a md5 sum.
With a setup script i add data to the direction and elevation fields + create a checksum using the following syntax:
insert into polygons (
direction,
elevation,
md5sum
)
values (
(select radians(20.0)),
(select radians(30.0)),
( md5( (select radians(20.0)) + (select radians(20.0)) ) )
)
which ends up as: 0.349065850398866, 0.523598775598299, '0c2cd2c2a9fe40305c4e3bd991812df5'
Later I compare the stored md5sum with a newly calculated, the newly calculated is created using md5('0.349065850398866' + '0.523598775598299') and I get the following checksum:
'8c958bcf912664d6d27c50f1034bdf34'
If I modify the last decimal in the passed "string" from a 9 to a 8 0.523598775598298 i get the same checksum as previously stored, '0c2cd2c2a9fe40305c4e3bd991812df5'. Any value of the last decimal from 8 down to 0 gives the same checksum.
Using BINARY, (md5( (select BINARY(radians(20.0))) + (select BINARY(radians(20.0))) ) in the setup script creates the same checksum as the my original "runtime calculation"
Worth mentioning is that the original method works for all other rows I have (55)
I guess I'm using the function in a somewhat strange way, but I'm not sure of a better way, so in order to find a better way I feel I need to understand why the current is failing.
The two numbers you are adding are stored in binary form, but displayed in decimal form. There is no guarantee that you will get exactly the same number back if you give the decimal form back to the machine.
In this case, this causes the addition to give a slightly different result, which gives an entirely different MD5 sum:
mysql> select radians(20.0) + radians(30.0), '0.349065850398866' + '0.523598775598299';
+-------------------------------+-------------------------------------------+
| radians(20.0) + radians(30.0) | '0.349065850398866' + '0.523598775598299' |
+-------------------------------+-------------------------------------------+
| 0.87266462599716 | 0.87266462599717 |
+-------------------------------+-------------------------------------------+
1 row in set (0.00 sec)
If you want to consistently get the same result, you need to store the results of radians(20.0) and radians(30.0) in variables somewhere, never relying on their printed representations.
The output of radians(20.0) is computed with many more digits than are shown in the printed output. When the result is passed to the md5 function, the full non-truncated value is used, whereas the printed value only will show a limited number of digits. Thus, it is not the same value being passed into the md5 function in the two cases.