INSERT varchar from a UTF-8 text file into MSSQL table - sql

I am inserting a text file into the database with my following query:
DECLARE #json NVARCHAR(MAX)
SELECT #json = BulkColumn
FROM OPENROWSET(BULK 'c:\mydata.db', SINGLE_CLOB) AS [Insert]
INSERT INTO [neDB].[dbo].[tbl_api] (
number
,DESC
,inf
)
SELECT number
,DESC
,inf
FROM OPENJSON(CONCAT (
'['
,REPLACE(#json, CONCAT (
'}'
,CHAR(10)
,'{'
), '},{')
,']'
)) WITH (
number VARCHAR(200) '$.number'
,DESC VARCHAR(50) '$.desc'
,inf VARCHAR(150) '$.inf'
)
The file "mydata.db" is UTF-8 which contains ü,ä,ö, etc. which will be stored as "ü", "ö" ... in the table.
If I convert the file to ANSI, all looks fine, but I don't want to convert the file all the time. Is there a way to design the query to insert UTF-8 directly?

Try adding the parameter
CODEPAGE = '65001'
to the OPENROWSET call, which is the codepage for UTF-8 (docs).

Related

Bulk import of a huge XML into an SQL Cellt

I'm trying to import an XML file into a SQL cell to process it. My first idea is do an OPENROWSET to keep the XML and the just divide it with NODES. One of the XML its too huge to keep it on a CELL, so the OPENROWSET cut the XML, so It's impossible to work with it then. That is the code:
DECLARE #XMLwithOpenXML TABLE
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML,
LoadedDateTime DATETIME
)
INSERT INTO #XMLwithOpenXML(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn
,GETDATE()
FROM OPENROWSET(BULK 'C:\temp\PP015.xml', SINGLE_CLOB) AS x;
SELECT * FROM #XMLwithOpenXML
The second option is use the BCP to do the same, but I'm getting an error.
DECLARE #sql NVARCHAR(500)
SET #sql = 'bcp [ExternaDB].[dbo].[xmltab] IN "C:\temp\PP015.xml" -T -c'
EXEC xp_cmdshell #sql
select * from xmltab
I want to know if I'm on the correct way (How to work with an XML when is already in an SQL cell I know how to do it) and how I can BULK import the full XML into a cell without Length constraint.
What is the size of the XML file on the file system?
Please try the following solution. It is very similar to yours with three differences:
SINGLE_BLOB instead of SINGLE_CLOB
No need in CONVERT(XML, BulkColumn)
DEFAULT clause is used for the LoadedDateTime column
Additionally, you can use SSIS for the task. SSIS has a streaming XML Source Adapter with no XML file size limitation.
SQL
DECLARE #tbl TABLE(
ID INT IDENTITY PRIMARY KEY,
XmlData XML,
LoadedDateTime DATETIME DEFAULT (GETDATE())
);
INSERT INTO #tbl(XmlData)
SELECT BulkColumn
FROM OPENROWSET(BULK N'C:\temp\PP015.xml', SINGLE_BLOB) AS x;
SELECT * FROM #tbl;
Thanks for the help but I found the solution. SQL has configurate a maxium characters retrieved for XML data. To solve this issue just we have to reconfigure this parameter.
enter image description here

Line feed in OPENJSON

I got a problem with OPENJSON.
DECLARE #X AS VARCHAR(1000) = CONCAT('[{"KEY":1, "VALUE": "A', CHAR(10) ,'B"}]')
SELECT
*
FROM
OPENJSON(#X)
WITH (
[KEY] INT 'strict $.KEY',
[VALUE] VARCHAR(1000) 'strict $.VALUE'
)
This doesn't work because I used a line feed (char(10)), how can I fix this? Carriage return (char(13)) doesn't work.
Char(10) and char(13) are special characters in JSON and must be escaped. You can see this question here on how-to How to escape special characters in building a JSON string?
To fix your current issue you can do this
....
OPENJSON(replace(#X, char(10), '\n' ) )
....
You need to escape the CHAR(10) special character and you may use STRING_ESCAPE() function with 'json' as second parameter (currently the only possible option) to escape all special characters in the input text:
Statement:
DECLARE #X AS VARCHAR(1000) = CONCAT(
'[{"KEY":1, "VALUE": "A',
STRING_ESCAPE(CHAR(10), 'json'),
'B"}]'
)
SELECT *
FROM OPENJSON(#X) WITH (
[KEY] INT 'strict $.KEY',
[VALUE] VARCHAR(1000) 'strict $.VALUE'
)
Result:
KEY VALUE
1 A
B

Argument type varchar invalid

I have SQL code that I am running and am getting an error when I pass in certain information.
select * from OBX.BTOCUST
--where [CUSTID] like 'sci'
--order by BRANDING desc
where BRANDING not like '0x4767374ADABABBAB9B96865669F9D9DE4E3E3838182ACABAB9E9D9DE3E3E3848182231F20000000FFFFFFFFFFFF00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000021F90401000026002C000000000D023C000006FF409370482C1A8FC8A472C96C3A9FD0A8744AAD5AAFD8AC76CBED7ABFE0B0784C2E9BCFE8B47ACD6EBBDFF0B87C4EAFDBEFF8BC7ECFEFFBFF8081828384858687884A0D03238D8E8F9091928E1520178998999A9B9C61148F1403A2A3A4A5A68CA1A49F941A9DAEAFB0B19A0C8C8D031925B9BABBBCBDB9231BBD17138F0CB2C7C8C9CA710C8D1505BED1D225CD1DD21AAB13CBDBDCDDDE57C4230CD3BC171916161ABD1613E40CE1DFF1F2F3F18D13E3E4BF8FE8C41B161C8D2EE42BF18982317A08132AD4147060090C8E0658CB550092C38A231A2CDCC8B1239F108D1C426C84A1D7C8900E378CA0E0B1A5CB976C2A8C2839F051B40EAB1CEA83C9B3A7CFFF2D204708CC77C166340FC07496687444410005044C18C882202A9600010008D17A27C0D49F607F3698E910A92369D59432024184C001AB041670B5F28044042B0B489018428084803912B826207120AC619E1039A434EA6B98D212163212A94BC4C0DC2A042024C0AC570291057FE590982B40C1E1D32D19411B98E19187C7D32A6A1C2280F083219B8538BD6D820080DF261064FDCDF5B76500B98560B52A4438F32100F44230325D7780E4267E47BD9E1D000103DC8960259200387821049C561592E0008907067C0367FF60BD89F200A6064080BA7F32D50E7500092EB0F922DB647A91C01F797209C05F667A99A0575D119AA0800006E4151A01A54170FF406E0840A0C0874540306112092C306268128E2601612D3E901709D5C58580020E14A797027A6DE600027915D6A25E0140B89709260610819000B827416D47FA27E52B003A140981BC4C32823407F23523090E9826D501A60DA6557424BCB8978942F66642007E09915776605A4842543012E1805E975546A6092F5AD5D99E85E9B50001115438A78424B6E8C09E0920E0806F15B6A8E391242CD09B82424089E67353868A4995034536A02F66356201395D1281807B9D59FA1B8C685E566175708656A183B2562A44827D0E5161749AB61800117C6E4AC2541526F86B94161E60008F51264BE75E2F1E6B426DA6D576ACB5A2868B08A9039DE40896BA68C0FF18979219E156827D9180D5BCD716C127A2CAC5D9A2BDF2D2FB995E0B1A316CB3A3215BB0A5FB0EF9AC1119566A2D9ADBCADBA9C4DE222CEEC585905BEE95BC9C54413EAD6E8B5DA26806BC95AFC1193A44AE0B1BFC15C37A3980C4C0325BFC2C577A318B29C195D108F1CD27AF2971C4FC550C2EC648FFA1713EEF70ACCB2A23BC960BBABB842C008BD915E6DEA44278F6335F4EAEACAFB578DA859E116A5E96C06D7BEA4623D0C2161C6FCB31B73CF1ACD51E0C310243E73595D107272D381F4B0F148E96E71600113EBD58ED80009BBD4A5B82DA52889D54770D9128D77CEB45A2017B122666110F380900021124A724010EB47962DC0200505BB123C697FF267ABECE48AD5C48D669028542EC09009C51B16EF6EB83278F47E14C1F7E6E073249C2EE6CCAF9F6A4042F9B108100D581E72F1197CDBBA0015733C72186491080C0D559952880572B63F5329FEB3FC01CF9B10F215CFB42907FDB855CE19067BCA72D023C807B0BDA5F56E6953DE539300ECC2347AA1EA198622C0525066AD7318EF6C00E2A248225D8C0C77821204904C328C4186106A9270B0E7AF085F20061D4CA01354850E08228890C4D56980CE4C1F087DF90E144A6060A8C48628491615CE33408C4263A510C32DCC505A046810B18311234F1C00062C3C4277AF18B58882052B6A80F5BE4425D93D049C8C0C8C6363E2182E138E3238672920906FFE3224C24CE6F2E0707F56DE1387C74A320953682D5F402205ACA45334882C8460C11645DFCDA12B425866269EF3D5B60D920370908E6A9441D68DC870756010DE729658D924C1F25C310A50440005456D0242767B907E68DA01DB9A8E12330B01A8BA83192283B425C56F905BE854196B44CA61D4048114890314B8EB8E12F59582F2138094A034C509C52149D0799285366A350E8C4149711DDE9597C6AD2D00EA0A20384E69BAFCA5366DA79A87C614D99F8846021F3D1C846480D9A24392530A3E42967454C5B06F8D3C0C099B99CD5A55810B80B9C3C53CD834A8504A609D4B52230B7DE49F4766FD2573E47FA066696A09F235047C72C38CD227C0D70FF47AA58AF8E94CA08FD2C009A7A51685E2AAF8E12AB9A1582534EF5854C921A350D26BDE23E7911BDA408949A3CFDD6AE8646247FD5742F920480868236B9AC2C14A8D0DAD656437ACFA39A950C2625C82354AA0B5339F231A8AC144C27A62D80B9146536AD108F84A3AFA81A806660D58D7BC86ACFB31AB60C6955892356B58BA66E09AE03EDAAC22C6A82AD3DE7AAF57A17612B5AB1B6D9693A3FAB10AC365BD4C39A960B696D8D237658C64628913591A5EB646B2301DFC0295317E2AAC1EA8526058CF553B2E59156E03495D0EEEC4E632DED69977B85B1B0761AAAC5E02944A19611B0257EE27B13568627BFE6684673F993CAF7B4BB407A012002FC09FD0004BEE2BFE0581502EFFBCABCCACB9FF3668E7BD9192F73F75B0575A9301FEBBA86950AC9DF021F4626CF95C614CF350D0A28A6262330B084C13216690EA4288B8DC63F02C494097B98270DD1497457C2810258600314C0C06BAFD188063000040DF8C08767CC1167AC18BA8A7D4405FE3990823020C51FA88036684C647A8CE5960542073A86A293458280011F10C5073650E42ACB63911430648174820D7B984023231800950B60E53277E370CFDC3224ED610C306B200426B88499E7BC0C5D4EF7CE781605242AD08A215440232AA1B3A0930165E721EED087967111821C66390FFAD1908EB4A4274DE94A5BFAD298CEB4A637CDE94E7B5A994100003B'
When I uncomment the --where [CUSTID] like 'sci' and comment out branding the query runs and am able to see results. But when I run where branding I get an error:
Msg 8116, Level 16, State 1, Line 1
Argument data type varchar is invalid for argument 2 of like function.
also another thing is when i uncomment the order by BRANDING desc it gives me another error.
that error is
Msg 306, Level 16, State 2, Line 3 The text, ntext, and image data
types cannot be compared or sorted, except when using IS NULL or LIKE
operator.
What do I need to do to get the command to actually work?
This error message is what you get if BRANDING is of type IMAGE, which is incomparable (as in, literally, it cannot be compared in any way, not even to another IMAGE). To overcome the limitations of this type, SQL Server 2005 introduced the VARBINARY(MAX) type, which has the same purpose but isn't burdened with the special case handling that IMAGE requires (likewise, (N)VARCHAR(MAX) was introduced to replace (N)TEXT). IMAGE should not be used for new work; VARBINARY(MAX) is superior in all respects. If existing IMAGE columns can be changed to VARBINARY(MAX), do so.
If that isn't possible, the IMAGE can still be converted on the fly. In the query above:
select * from OBX.BTOCUST
where CONVERT(VARBINARY(MAX), BRANDING) <> 0x476737.....00003B
Here the 0x476737.... is a BINARY literal. To convert a hexstring to a binary, use CONVERT(VARBINARY(MAX), #string, 1) (with leading "0x") or CONVERT(VARBINARY(MAX), #string, 2) (without leading "0x").
Convert your image or text or ntext column data type to varbinary(max), varchar(max) or nvarchar(max).
The image, text, and ntext data types are deprecated and will be removed in a future version of SQL Server. They are very difficult and awkward to work with. The varchar(max) and nvarchar(max) have all the benefits of nearly unlimited string size, and none of the drawbacks of text or ntext. They also work with all the normal string functions you'd expect.
This is a quick and dirty way of doing it that I just put together, but keep in mind what you are asking of your server in the absence of fulltext. It ignores the data type mismatches as low level errors in your message output. You can check the message output for a rudimentary sort of progress on it as it is running.
Declare #SearchString nvarchar(50) = 'SEARCH STRING HERE'
Declare #TableList Table (TableName nvarchar(128))
Declare #Table nvarchar(128)
Declare #ColumnList Table (ColumnName nvarchar(128))
Declare #Column nvarchar(128)
Declare #Results Table (TableName nvarchar(128), ColumnName nvarchar(128), String nvarchar(max))
Declare #cmd nvarchar(max)
Insert Into #TableList
Select TABLE_NAME From INFORMATION_SCHEMA.Tables Where Table_Type = 'BASE TABLE'
While Exists (Select 1 From #TableList)
Begin
Set #Table = (Select Top 1 TableName From #TableList)
Print 'Searching '+#Table+'...'
Insert Into #ColumnList
Select Column_Name From INFORMATION_SCHEMA.Columns Where Table_Name = #Table
While Exists (Select 1 From #ColumnList)
Begin
Set #Column = (Select Top 1 ColumnName From #ColumnList)
Print 'Searching ' +#Table + '.' + #Column+'...'
Set #cmd = 'Select '''+#Table+''', '''+#Column+''', '+#Column+' From '+#Table+' Where '+#Column+' Like ''%'+#SearchString+'%'''
--Select #cmd
Insert Into #Results
Exec (#cmd)
Delete From #ColumnList Where ColumnName = #Column
End
Delete From #TableList Where TableName = #Table
End
Select * From #Results

Storing special character(e.g. &) in XML datatype

If I do
Declare #t table(Email xml)
Declare #email varchar(100) = 'xxx&xx#monop.com'
Insert into #t
select '<Emails> <Email>' + #email +'</Email></Emails>'
select * From #t
I will get expected error
Msg 9411, Level 16, State 1, Line 8
XML parsing: line 1, character 27, semicolon expected
One solution which I found almost everywhere(including SO) is to replace '&' with '& and it works
Insert into #t
select CAST('<Emails><Email>' + REPLACE(#email, '&', '&') + '</Email></Emails>' AS XML)
Output
<Emails><Email>xxx&xx#monop.com</Email></Emails>
However, I was trying with CData approach (just another way to approach the problem)
Declare #t table(Email xml)
Declare #email varchar(100) = 'xxx&xx#monop.com'
Insert into #t
Select CAST('<![CDATA[Emails> <Email>' + #email + '</Email> </Emails]]>' AS XML)
select * From #t
When I got the below output
Emails> <Email>xxx&xx#monop.com</Email> </Emails
What I am trying to achieve is to store the data as it is i.e. the desired output should be
<Emails><Email>xxx&xx#monop.com</Email></Emails>
Is it at all possible?
I know that the replace function will fail if any other special character that xml fails to understand will be passed as an input to it e.g. '<' i which case again we need to replace it...
Thanks
Tags are PCDATA, not CDATA, so don't put them in the CDATA section.
When you work with XML you should use XML-related features of SQL Server.
For example:
/* Create xml and add a variable to it */
DECLARE
#xml xml = '<Emails />',
#email varchar(100) = 'xxx&xx#monop.com';
SET #xml.modify ('insert (
element Email {sql:variable("#email")}
) into (/Emails)[1]');
SELECT #xml;
/* Output:
<Emails>
<Email>xxx&xx#monop.com</Email>
</Emails>
*/
/* Extract value from xml */
DECLARE #email_out varchar(200);
SET #email_out = #xml.value ('(/Emails/Email)[1]', 'varchar (200)');
SELECT #email_out; /* Returns xxx&xx#monop.com */
Good luck
Roman

Inserting XML documents into SQL Server 2008 database

I need help inserting xml files into SQL Server 2008.
I have the following SQL statement:
insert into dbo.articles(id, title, contents)
SELECT X.article.query('id').value('.', 'INT'),
X.article.query('article').value('.', 'VARCHAR(50)'),
X.article.query('/doc/text()').value('.', 'VARCHAR(MAX)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK 'E:\test\test_files\1000006.xml',
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('doc') AS X(article);
which basically shreds an XML doc into a columns. However, I want to be able to insert all the files in a folder, and not manually specify the file, as in this case E:\test\test_files\1000006.xml
Ok, first crack at answering a question in stackoverflow...
You have two issues:- firstly getting the filenames from the folder into a SQL table or table variable, and then reading the XML from each.
The first is easy, if you don't mind using xp_cmdshell
DECLARE #Folder VARCHAR(255) = 'C:\temp\*.xml'
DECLARE #Command VARCHAR(255)
DECLARE #FilesInAFolder TABLE (XMLFileName VARCHAR(500))
--
SET #Command = 'DIR ' + #Folder + ' /TC /b'
--
INSERT INTO #FilesInAFolder
EXEC MASTER..xp_cmdshell #Command
--
SELECT * FROM #FilesInAFolder
WHERE XMLFileName IS NOT NULL
The second part, converting the XML files to SQL rows is a little trickier because BULK INSERT won't take a parameter and you can't BULK INSERT into an XML table type. Here's code that works for ONE file...
DECLARE #x xml
DECLARE #Results TABLE (result xml)
DECLARE #xmlFileName NVARCHAR(300) = 'C:\temp\YourXMLFile.xml'
DECLARE #TempTable TABLE
(
ID INT,
Article NVARCHAR(50),
doctext NVARCHAR(MAX)
)
/* ---- HAVE TO USE DYNAMIC sql BECAUSE BULK INSERT WON'T TAKE A PARAMETER---------*/
DECLARE #sql NVARCHAR(4000) =
'SELECT * FROM OPENROWSET ( BULK ''' + #xmlFileName + ''', SINGLE_BLOB )AS xmlData'
/* ---- have to use a normal table variable because we can't directly bulk insert
into an XML type table variable ------------------------------------------*/
INSERT INTO #results EXEC(#SQL)
SELECT #x = result FROM #Results
/* ---- this is MUCH faster than using a cross-apply ------------------------------*/
INSERT INTO #TempTable(ID,Article,doctext)
SELECT
x.value('ID[1]', 'INT' ),
x.value('Article[1]', 'NVARCHAR(50)' ),
x.value('doctext[1]', 'NVARCHAR(MAX)' )
FROM #x.nodes(N'/doc') t(x)
SELECT * FROM #TempTable
Now the hard bit is putting these two together. I tried several ways to get this code into a function but you can't use dynamic SQL or EXEC in a function and you can't call an SP from a function and you can't put the code into two separate SPs because you can't have cascading EXEC statements i.e. you try and EXEC an SP with the above code in it that also has an EXEC in it, so... you have to either use a cursor to put the two code blocks above together i.e. cursor through the #FilesInAFolder passing each XMLFileName value into the second code block as variable #XMLFileName or you use SSIS or CLR.
Sorry I ran out of time to build a complete SP with a directory name as a parameter and a cursor but that is pretty straightforward. Phew!
Are you using a stored procedure? You can specify the file name as a parameter.
Something like...
CREATE PROCEDURE sp_XMLLoad
#FileName
AS SET NOCOUNT ON
SELECT X.article.query('id').value('.', 'INT'),
X.article.query('article').value('.', 'VARCHAR(50)'),
X.article.query('/doc/text()').value('.', 'VARCHAR(MAX)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK #FileName,
SINGLE_BLOB) AS T(x)
Not exactly like that ... you'll need to add quotes around the #Filename I bet. Maybe assemble it with quotes and then use that variable.
If you're using SSIS, you can then pump all the files from a directory to the stored procedure, or to the SSIS code used.
I think you can do it with a cursor and xp_cmdshell. I would not recommend to ever use xp_cmdshell though.
DECLARE #FilesInAFolder TABLE (FileNames VARCHAR(500))
DECLARE #File VARCHAR(500)
INSERT INTO #FilesInAFolder
EXEC MASTER..xp_cmdshell 'dir /b c:\'
DECLARE CU CURSOR FOR
SELECT 'c:\' + FileNames
FROM #FilesInAFolder
WHERE RIGHT(FileNames,4) = '.xml'
OPEN CU
FETCH NEXT FROM CU INTO #File
WHILE ##FETCH_STATUS = 0
BEGIN
INSERT INTO dbo.articles(id, title, contents)
SELECT X.article.query('id').value('.', 'INT'),
X.article.query('article').value('.', 'VARCHAR(50)'),
X.article.query('/doc/text()').value('.', 'VARCHAR(MAX)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK #File,
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('doc') AS X(article);
FETCH NEXT FROM CU INTO #File
END
CLOSE CU
DEALLOCATE CU