Best way to compress an XML text column using SQL?

Using Microsoft SQL Server 2019.
I have two columns, one text representing some xml, another varbinary(max) representing already compressed xml, that I need to compress.
Please assume I cannot change the source data, but conversions can be made as necessary in the code.
I'd like to compress the text column. Initially it works fine, but if I try to save it into a temp table to be used further along in the process, I get weird characters like ‹ or tŠÌK'À3û€Í‚;jw. The first temp table I make stores it just fine: I can select from that table and the compressed value displays correctly. But if I pull it from there into a second temp table or a variable, it turns into a mess.
I've tried converting to several different formats, converting later in the process, and bringing in the source data for the column at the very last stage. My end goal is to populate a variable that will be converted into JSON, and the value always ends up garbled there as well. I just need the compressed version of the columns to display properly when viewing the JSON variable I've made.
Any suggestions on how to tackle this?

Collation issue?
This smells like a collation issue. tempdb is its own database, with its own default collation and other settings.
In one database with default CollationA, you call COMPRESS(NvarcharData), which produces a VARBINARY value.
In another database (tempdb) with default CollationB, you call CONVERT(NVARCHAR(MAX), DECOMPRESS(CompressedData)). What happens under the hood is:
CompressedData is decompressed into a VARBINARY value representing NvarcharData in CollationA;
that VARBINARY value is then converted to NVARCHAR on the assumption that the bytes represent NVARCHAR data in CollationB, which is not true!
Try to be more explicit (collation, data type) in conversions between XML, VARBINARY and (N)VARCHAR.
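As a minimal sketch of what "being explicit" could look like: the variable names and sample value below are hypothetical, and the sketch assumes the source text can be treated as VARCHAR(MAX). COMPRESS produces GZIP data, which starts with the bytes 0x1F8B; in code page 1252 the byte 0x8B displays as ‹, so seeing that character suggests the compressed bytes were cast to text somewhere along the way. FOR JSON encodes VARBINARY values as Base64 strings, which may be enough to keep the JSON readable.

```sql
-- Hypothetical sample value standing in for the content of the text column.
DECLARE @xmlText VARCHAR(MAX) = '<root><item>1</item></root>';

-- COMPRESS returns VARBINARY(MAX) holding GZIP data; keep it in a binary
-- column or variable all the way through the process.
DECLARE @compressed VARBINARY(MAX) = COMPRESS(@xmlText);

-- Casting the compressed bytes to text reinterprets them as characters,
-- which is unreadable.
SELECT CAST(@compressed AS VARCHAR(MAX)) AS CompressedAsText;

-- Decompress into the same string type that was compressed (VARCHAR here).
SELECT CAST(DECOMPRESS(@compressed) AS VARCHAR(MAX)) AS RoundTripped;

-- FOR JSON emits VARBINARY values as Base64 strings.
DECLARE @json NVARCHAR(MAX) =
    (SELECT @compressed AS CompressedXml FOR JSON PATH);
SELECT @json AS JsonWithBase64;
```

Whoever consumes the JSON would then need to Base64-decode the value before decompressing it.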
Double compression?
I also noticed "representing already compressed xml, that I need to compress". If you are compressing twice, maybe you forgot to decompress twice?
Example?
You did not include an example, but here is a minimal example of converting between XML and compressed data that works for me.
BEGIN TRANSACTION
GO
CREATE TABLE dbo.XmlData_Base (
PrimaryKey INTEGER NOT NULL IDENTITY(1, 1),
XmlCompressed VARBINARY(MAX) NULL
);
GO
CREATE OR ALTER VIEW dbo.XmlData
WITH SCHEMABINDING
AS
SELECT
BASE.PrimaryKey,
CONVERT(XML, DECOMPRESS(BASE.XmlCompressed)) AS XmlData
FROM
dbo.XmlData_Base AS BASE;
GO
CREATE OR ALTER TRIGGER dbo.TR_XmlData_instead_I
ON dbo.XmlData
INSTEAD OF INSERT
AS
BEGIN
INSERT INTO dbo.XmlData_Base
(XmlCompressed)
SELECT
COMPRESS(CONVERT(VARBINARY(MAX), I.XmlData))
FROM
Inserted AS I;
END;
GO
CREATE OR ALTER TRIGGER dbo.TR_XmlData_instead_U
ON dbo.XmlData
INSTEAD OF UPDATE
AS
BEGIN
UPDATE BASE
SET
BASE.XmlCompressed = COMPRESS(CONVERT(VARBINARY(MAX), I.XmlData))
FROM
dbo.XmlData_Base AS BASE
JOIN Inserted AS I ON I.PrimaryKey = BASE.PrimaryKey;
END;
GO
INSERT INTO dbo.XmlData
(XmlData)
VALUES
(CONVERT(XML, N'<this><I>I call upon thee!</I></this>'));
SELECT
*
FROM
dbo.XmlData;
SELECT
PrimaryKey,
XmlCompressed,
CONVERT(XML, DECOMPRESS(XmlCompressed))
FROM
dbo.XmlData_Base;
UPDATE dbo.XmlData
SET
XmlData = CONVERT(XML, N'<that><I>I call upon thee!</I></that>');
SELECT
*
FROM
dbo.XmlData;
SELECT
PrimaryKey,
XmlCompressed,
CONVERT(XML, DECOMPRESS(XmlCompressed))
FROM
dbo.XmlData_Base;
GO
ROLLBACK TRANSACTION;

Related

Replace random varbinary data in middle of column (MSSQL)

I have an Inventory column with different data in each of many rows, and I need to replace varbinary data in the middle of the column.
Example of Inventory column:
For example this:
0x0500420000000000005000FFFFFFFFFF56730E64FFFFFFFFFFFFFFFFFFFFFFFF0400180000000000006000FFFFFFFFFF56730E72FFFFFFFFFFFFFFFFFFFFFFFF04001E0000000000007000FFFFFFFFFF56730E5EFFFFFFFFFFFFFFFFFFFFFFFF
Needs to be changed to:
0x0500420000000000005000FFFFFFFFFF56730E64FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF04001E0000000000007000FFFFFFFFFF56730E5EFFFFFFFFFFFFFFFFFFFFFFFF
I know how to change static data like here. But there are many rows, and the data in every row is different. Is it possible to change the data by position instead? The start position is 67 and the end position is 131.
You can use STUFF, or SUBSTRING and +, to rewrite the entire blob, or you can update it in place (see Updating Blobs), e.g.
drop table if exists #temp
go
create table #temp(id int, blob varbinary(max))
insert into #temp(id,blob) values (1,0x0500420000000000005000FFFFFFFFFF56730E64FFFFFFFFFFFFFFFFFFFFFFFF0400180000000000006000FFFFFFFFFF56730E72FFFFFFFFFFFFFFFFFFFFFFFF04001E0000000000007000FFFFFFFFFF56730E5EFFFFFFFFFFFFFFFFFFFFFFFF)
declare @newBytes varbinary(100) = 0xAAAAAAAAAAAA
--only for varbinary(max) but updates in-place (the offset is zero-based)
update #temp
set blob.write(@newBytes,10,datalength(@newBytes))
--for varbinary(max) or varbinary(n) replace the whole value (the start position is one-based)
update #temp
set blob = cast(STUFF(blob,30,datalength(@newBytes),@newBytes) as varbinary(max))
select * from #temp
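Applied to the question's data, the same idea might look like the sketch below. It assumes that positions 67 to 131 refer to characters in the displayed hex string (including the leading 0x), which would correspond to the second 32-byte record, i.e. bytes 33 to 64. The table and variable names are hypothetical.

```sql
drop table if exists #inventory
go
create table #inventory(id int, blob varbinary(max))
insert into #inventory(id,blob) values (1,0x0500420000000000005000FFFFFFFFFF56730E64FFFFFFFFFFFFFFFFFFFFFFFF0400180000000000006000FFFFFFFFFF56730E72FFFFFFFFFFFFFFFFFFFFFFFF04001E0000000000007000FFFFFFFFFF56730E5EFFFFFFFFFFFFFFFFFFFFFFFF)

-- 32 bytes of 0xFF; style 2 converts a hex string without the 0x prefix
declare @filler varbinary(32) = convert(varbinary(32), replicate('FF', 32), 2)

-- overwrite 32 bytes starting at byte 33 (one-based) in every row,
-- whatever those bytes currently contain
update #inventory
set blob = cast(STUFF(blob, 33, datalength(@filler), @filler) as varbinary(max))

select * from #inventory
```

Because the update addresses the bytes by position rather than by value, it does not matter that the data differs from row to row. The in-place variant would be blob.write(@filler, 32, 32), since .write uses a zero-based offset.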

Trigger to convert empty string to 'null' before it posts in SQL Server decimal column

I've got a front-end table that essentially matches our SQL Server database table t_myTable. The columns I'm having problems with are those with numeric data types in the database. They are set to allow null, but when the user deletes the numeric value on the front end and tries to send a blank value, it isn't posted to the database. I suspect this is because the value is sent back as an empty string "", which does not translate to null for the numeric data type.
Is there a trigger I can create to convert these empty strings into null on insert and update to the database? Or, perhaps a trigger would already happen too late in the process and I need to handle this on the front end or API portion instead?
We'll call my table t_myTable and the column myNumericColumn.
I could also be wrong and perhaps this 'empty string' issue is not the source of my problem. But I suspect that it is.
As @DaleBurrell noted, the proper place to handle data validation is in the application layer. You can wrap each of the potentially problematic values in a NULLIF function, which will convert the value to NULL if an empty string is passed to it.
The syntax would be along these lines:
SELECT
...
,NULLIF(ColumnName, '') AS ColumnName
select nullif(Column1, '') from tablename
SQL Server does not allow converting an empty string to a numeric data type, so a trigger is useless in this case, even an INSTEAD OF one: SQL Server checks the conversion before inserting.
SELECT CAST('' AS numeric(18,2)) -- Error converting data type varchar to numeric
CREATE TABLE tab1 (col1 numeric(18,2) NULL);
INSERT INTO tab1 (col1) VALUES(''); -- Error converting data type varchar to numeric
Since you didn't mention this error, the client is probably passing something other than ''. You can track the problem down with SQL Profiler: run it and see exactly which SQL statement is executed to insert the data into the table.
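A minimal sketch of the NULLIF approach, assuming the raw front-end value reaches the insert statement as a string. The temp table stands in for t_myTable, and the @rawValue parameter is hypothetical.

```sql
-- Hypothetical stand-in for t_myTable.
CREATE TABLE #t_myTable (id INT, myNumericColumn NUMERIC(18,2) NULL);

-- Hypothetical parameter carrying the raw front-end value as a string.
DECLARE @rawValue VARCHAR(50) = '';

-- NULLIF turns '' into NULL before the implicit conversion to numeric,
-- so the conversion error shown above is not raised.
INSERT INTO #t_myTable (id, myNumericColumn)
VALUES (1, NULLIF(@rawValue, ''));

SELECT id, myNumericColumn FROM #t_myTable;

DROP TABLE #t_myTable;
```

This only helps where the statement itself can be changed, for example in a stored procedure or in the API's parameterized query; it is not something a trigger can do after the fact.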

Storing Symbols like ϱπΩ÷√νƞµΔϒᵨλθ→%° in SQL Server XML

I ran these queries on my SQL Server:
select cast('<Answers>
<AnswerDescription> ϱπΩ÷√νƞµΔϒᵨλθ→%° </AnswerDescription>
</Answers>' as xml)
select ' ϱπΩ÷√νƞµΔϒᵨλθ→%°'
And got the following results
<Answers>
<AnswerDescription> ?pO÷v??µ??????%° </AnswerDescription>
</Answers>
and
" ?pO÷v??µ??????%°"
How can I make SQL Server store or display these values as they are sent from the application?
In SQL Server, string literals without an N prefix are treated as VARCHAR, so characters outside the code page of the current collation are lost.
Your example can be made to work by indicating that the strings should be treated as NVARCHAR by adding N before the opening single quote:
select cast(N'<Answers>
<AnswerDescription> ϱπΩ÷√νƞµΔϒᵨλθ→%° </AnswerDescription>
</Answers>' as xml)
select N' ϱπΩ÷√νƞµΔϒᵨλθ→%°'
If these strings are being incorrectly stored in the database, it is likely that they are being implicitly cast to VARCHAR at some point during insertion (e.g. INSERT). It's also possible that they are being stored correctly and are cast to VARCHAR on retrieval (e.g. SELECT).
If you add some code to the question showing how you're inserting data and the datatypes of the target tables, it should be possible to provide more detailed assistance.
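To check where the characters get lost, a small sketch like the following may help. The table variable and column names are hypothetical; it stores the same NVARCHAR literal in a VARCHAR and an NVARCHAR column.

```sql
-- Hypothetical table variable with one column of each string type.
DECLARE @demo TABLE (AsVarchar VARCHAR(50), AsNvarchar NVARCHAR(50));

-- The same Unicode literal goes into both columns; the VARCHAR column
-- forces an implicit conversion to the collation's code page.
INSERT INTO @demo (AsVarchar, AsNvarchar)
VALUES (N'πΩ√Δ', N'πΩ√Δ');

-- With a collation whose code page lacks these characters, AsVarchar
-- comes back with substitutes such as ?, while AsNvarchar is intact.
SELECT AsVarchar, AsNvarchar FROM @demo;
```

If the stored value is already damaged in the real table, the conversion happens on the way in (column type, parameter type, or a literal without N); if it is intact there, the loss happens on retrieval.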
I believe the problem is an incorrectly set character set;
change the character set to UTF-8.
I just tested it on my MySQL database: I changed the character set to utf8 with the utf8_bin collation using
ALTER TABLE `tab1` CHANGE `test` `test` VARCHAR( 255 ) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
and it worked without any problem.

How to get Sql Server XML variable into TEXT column

I need to update an XML document stored in a Microsoft SQL Server database, however the vendor of the product chose to store the XML in a TEXT column.
I've been able to extract the TEXT into an XML-type variable and perform the update I need on the xml within this variable, but when I try to UPDATE the column to push the change back to the database, I run into trouble.
Looking through the documentation it appears that it's not possible to simply CAST/CONVERT an XML type variable to insert it into a TEXT column, but I would think there is some way to extract the xml "string" from the XML-type variable and UPDATE the column using this value.
Any suggestions are appreciated, but I would like to keep the solution pure SQL that it can be run directly (no C# custom function, etc.); just to keep the impact on the database minimal.
(note: isn't it a bit absurd that you can't just CAST XML as TEXT? I'm just saying...)
Casting the XML as VARCHAR(MAX) works.
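declare @xml xml
-- table variable standing in for the vendor table with the TEXT column
declare @tblTest table (
Id int,
XMLColumn text
)
insert into @tblTest
(Id, XMLColumn)
values
(1, '<MyTest><TestNode>A</TestNode></MyTest>')
set @xml = '<MyTest><TestNode>A</TestNode><TestNode>B</TestNode></MyTest>'
-- XML cannot be cast to TEXT directly, but it can be cast to VARCHAR(MAX),
-- which is then implicitly converted to TEXT on assignment
update @tblTest
set XMLColumn = cast(@xml as varchar(max))
where Id = 1
select Id, XMLColumn from @tblTest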

TSQL - Case on Ntext (SQL 2005)

Stored Procedures in SQL 2005 - with field type NText
I'm writing a stored procedure to tidy up some data before importing it into Microsoft CRM.
So far everything works fine.
However, I need to write a CASE statement on an NText field. It needs to check this field against about 3 or 4 text values and set a new field (already in the destination table), which is also an NText field.
I am getting the error
"The data types ntext and varchar are incompatible in the equal to operator."
I have come across a few articles, but their solutions all seem very complex.
Thanks in advance for your help and advice.
I recommend, if at all possible, replacing the NTEXT type with NVARCHAR(MAX), since NTEXT is not a first class type and NVARCHAR is. This should be easy to do with an ALTER TABLE statement.
Most higher level code shouldn't care about the type change. Any procedural code that uses READTEXT, WRITETEXT, etc. to deal with the NTEXT columns can be simplified to just basic selects and updates.
If the type change is not possible you may have to wrap the comparisons and assignments with CAST() or CONVERT() operators, which is ugly.
NTEXT is deprecated in SQL Server 2005. You should use NVARCHAR(MAX) instead (NVARCHAR(MAX) can be used in CASE). Is it possible for you to change the type?
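A rough sketch of the type change, using a temp table with hypothetical names in place of the real import table. It assumes the column can be altered in place, which is worth trying on a copy of the data first.

```sql
-- Hypothetical stand-in for the real import table.
CREATE TABLE #CrmImport (SourceNotes NTEXT NULL, Category NVARCHAR(MAX) NULL);
INSERT INTO #CrmImport (SourceNotes) VALUES (N'test');
GO
-- Change the column type from NTEXT to NVARCHAR(MAX).
ALTER TABLE #CrmImport ALTER COLUMN SourceNotes NVARCHAR(MAX) NULL;
GO
-- After the change, CASE can compare the column directly, with no CAST.
UPDATE #CrmImport
SET Category = CASE SourceNotes WHEN N'test' THEN N'ok' ELSE N'NOK' END;

SELECT SourceNotes, Category FROM #CrmImport;
GO
DROP TABLE #CrmImport;
```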
This works as well:
CREATE TABLE #TEMP
(
MyDummy NTEXT
)
INSERT INTO #TEMP (MyDummy) VALUES (N'test')
-- cast the NTEXT column to NVARCHAR(MAX) so it can be compared in CASE
SELECT
CASE CAST(MyDummy AS NVARCHAR(MAX)) WHEN N'test' THEN 'ok' ELSE 'NOK' END MyTest
FROM #TEMP
DROP TABLE #TEMP