Bulk import of a huge XML into a SQL cell - sql

I'm trying to import an XML file into a SQL cell so I can process it. My first idea was to use OPENROWSET to load the XML and then split it with nodes(). One of the XML files is too large to fit in a cell, so OPENROWSET truncates it and the result is impossible to work with. This is the code:
DECLARE @XMLwithOpenXML TABLE
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML,
LoadedDateTime DATETIME
)
INSERT INTO @XMLwithOpenXML(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn
,GETDATE()
FROM OPENROWSET(BULK 'C:\temp\PP015.xml', SINGLE_CLOB) AS x;
SELECT * FROM @XMLwithOpenXML
The second option is to use BCP to do the same, but I'm getting an error.
DECLARE @sql NVARCHAR(500)
SET @sql = 'bcp [ExternaDB].[dbo].[xmltab] IN "C:\temp\PP015.xml" -T -c'
EXEC xp_cmdshell @sql
select * from xmltab
I want to know whether I'm on the right track (I already know how to work with XML once it is in a SQL cell) and how I can bulk import the full XML into a cell without a length constraint.

What is the size of the XML file on the file system?
Please try the following solution. It is very similar to yours with three differences:
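SINGLE_BLOB instead of SINGLE_CLOB (so the XML parser reads the encoding from the XML declaration instead of assuming the server code page)
No need for CONVERT(XML, BulkColumn)
A DEFAULT clause is used for the LoadedDateTime column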
Additionally, you can use SSIS for the task. SSIS has a streaming XML Source Adapter with no XML file size limitation.
DECLARE @tbl TABLE(
ID INT IDENTITY PRIMARY KEY,
XmlData XML,
LoadedDateTime DATETIME DEFAULT (GETDATE())
);
INSERT INTO @tbl(XmlData)
SELECT BulkColumn
FROM OPENROWSET(BULK N'C:\temp\PP015.xml', SINGLE_BLOB) AS x;
SELECT * FROM @tbl;

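Thanks for the help, but I found the solution. SSMS is configured with a maximum number of characters retrieved for XML data. To solve this issue we just have to raise that setting (Tools > Options > Query Results > SQL Server > Results to Grid > Maximum Characters Retrieved).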
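If the XML only looks cut off in the results grid, one way to confirm that the whole document was stored is to measure it on the server instead of reading it off the screen. The sketch below reuses the table variable and file path from the answer above; the column aliases StoredBytes and ElementCount are illustrative names, and the stored size is that of SQL Server's internal XML representation, so it will not match the file size exactly.

```sql
DECLARE @tbl TABLE (
    ID INT IDENTITY PRIMARY KEY,
    XmlData XML
);

INSERT INTO @tbl (XmlData)
SELECT BulkColumn
FROM OPENROWSET(BULK N'C:\temp\PP015.xml', SINGLE_BLOB) AS x;

-- Measure the stored document server-side; neither value is affected
-- by how many characters the client chooses to display.
SELECT ID,
       DATALENGTH(XmlData)               AS StoredBytes,
       XmlData.value('count(//*)', 'int') AS ElementCount
FROM @tbl;
```

A plausible element count suggests the import itself was complete and that any truncation happened only in the client display.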

Related

INSERT varchar from a UTF-8 text file into MSSQL table

I am inserting a text file into the database with the following query:
DECLARE @json NVARCHAR(MAX)
SELECT @json = BulkColumn
FROM OPENROWSET(BULK 'c:\mydata.db', SINGLE_CLOB) AS [Insert]
INSERT INTO [neDB].[dbo].[tbl_api] (
number
,[DESC] -- DESC is a reserved keyword, so it must be bracketed
,inf
)
SELECT number
,[DESC]
,inf
FROM OPENJSON(CONCAT (
'['
,REPLACE(@json, CONCAT (
'}'
,CHAR(10)
,'{'
), '},{')
,']'
)) WITH (
number VARCHAR(200) '$.number'
,[DESC] VARCHAR(50) '$.desc'
,inf VARCHAR(150) '$.inf'
)
The file "mydata.db" is UTF-8 and contains ü, ä, ö, etc., which end up stored in the table as garbled sequences such as "Ã¼" and "Ã¶".
If I convert the file to ANSI, everything looks fine, but I don't want to convert the file every time. Is there a way to write the query so that it inserts UTF-8 directly?
Try adding the parameter
CODEPAGE = '65001'
to the OPENROWSET call; 65001 is the code page for UTF-8.
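A minimal sketch of how the load step from the question might look with that option. It assumes SQL Server 2016 or later, where code page 65001 is supported for bulk operations; the alias src and the preview query are only illustrative, and the placement of the option relative to SINGLE_CLOB is worth checking against the OPENROWSET documentation for your version.

```sql
DECLARE @json NVARCHAR(MAX);

-- CODEPAGE = '65001' asks OPENROWSET to decode the file as UTF-8
-- instead of using the server's default ANSI code page.
SELECT @json = BulkColumn
FROM OPENROWSET(BULK 'c:\mydata.db', CODEPAGE = '65001', SINGLE_CLOB) AS src;

-- Quick look at the start of the text to see whether the umlauts survived.
SELECT LEFT(@json, 200) AS Preview;
```

The rest of the query (the OPENJSON call and the INSERT) can stay unchanged, since only the decoding of the file is affected.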

Want to get the email information from XML, but getting an error

CREATE TABLE XMLTABLE(id int IDENTITY PRIMARY KEY,XML_DATA XML,DATE DATETIME);
go
INSERT INTO XMLTABLE(XML_DATA,DATE)
SELECT CONVERT(XML,BULKCOLUMN)AS DATA,getdate()
FROM OPENROWSET(BULK 'c:\Demo.xml',SINGLE_BLOB)AS x
go
DECLARE @XML AS XML
DECLARE @OUPT AS INT
DECLARE @SQL NVARCHAR (MAX)
SELECT @XML= XML_DATA FROM XMLTABLE
EXEC sp_xml_preparedocument @OUPT OUTPUT,@XML,'<root xmlns:d="http://abc" xmlns:ns2="http://def" />'
SELECT EMAILR
FROM OPENXML(@OUPT,'d:ns2:FORM/ns2:Form1/ns2:Part/ns2:Part1/ns2:Ba')
WITH
(EMAILR [VARCHAR](100) 'ns2:EmailAddress')
EXEC sp_xml_removedocument @OUPT
go
Demo.xml contains:
<ns2:FORM xmlns="http://abc" xmlns:ns2="http://def">
<ns2:Form1>
<ns2:Part>
<ns2:Part1>
<ns2:Ba>
<ns2:EmailA>Hello@YAHOO.COM</ns2:EmailA> ...
Error:Msg 6603, Level 16, State 2, Line 6 XML parsing error: Expected
token 'eof' found ':'.
d:ns2-->:<--FORM/ns2:Form1/ns2:Part/ns2:Part1/ns2:Ba
The approach with the sp_xml_... procedures and FROM OPENXML is outdated!
You should instead use the current XML methods .nodes(), .value(), .query() and .modify().
Your XML example is neither complete nor valid, so I had to change it a bit to make it work. You'll probably have to adapt the XPath (at least Part1 is missing).
DECLARE @xml XML=
'<ns2:FORM xmlns="http://abc" xmlns:ns2="http://def">
<ns2:Form1>
<ns2:Part>
<ns2:Ba>
<ns2:EmailA>Hello@YAHOO.COM</ns2:EmailA>
</ns2:Ba>
</ns2:Part>
</ns2:Form1>
</ns2:FORM> ';
This is the safe way, with namespaces and the full path
WITH XMLNAMESPACES(DEFAULT 'http://abc'
,'http://def' AS ns2)
SELECT @xml.value('(/ns2:FORM/ns2:Form1/ns2:Part/ns2:Ba/ns2:EmailA)[1]','nvarchar(max)');
And this is the lazy approach
SELECT @xml.value('(//*:EmailA)[1]','nvarchar(max)')
You should, however, prefer the full approach. The more detail you give the engine, the better and faster the result...
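If the document holds more than one address, .nodes() can return one row per element instead of only the first match. This is a sketch under the same assumptions as the sample above: the second ns2:Ba element, its address and the alias names are made up for illustration.

```sql
DECLARE @xml XML = N'<ns2:FORM xmlns="http://abc" xmlns:ns2="http://def">
  <ns2:Form1>
    <ns2:Part>
      <ns2:Ba><ns2:EmailA>Hello@YAHOO.COM</ns2:EmailA></ns2:Ba>
      <ns2:Ba><ns2:EmailA>second@example.com</ns2:EmailA></ns2:Ba>
    </ns2:Part>
  </ns2:Form1>
</ns2:FORM>';

-- .nodes() yields one row per ns2:Ba element;
-- .value() then reads the address relative to that row.
WITH XMLNAMESPACES ('http://def' AS ns2)
SELECT ba.value('(ns2:EmailA/text())[1]', 'nvarchar(100)') AS Email
FROM @xml.nodes('/ns2:FORM/ns2:Form1/ns2:Part/ns2:Ba') AS t(ba);
```

Addressing the text() node explicitly is a common habit with .value(), as it spares the engine from concatenating nested content.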

How to download a webpage and parse in SQL

I am simply trying to download a webpage and store it in an accessible format in SQL Server 2012. I have resorted to using dynamic SQL, but perhaps there is a cleaner, easier way to do this. I have been able to successfully download the htm files to my local drive using the below code, but I am having difficulty working with the html itself. I am trying to convert the webpage to XML and parse from there, but I think I am not addressing the HTML to XML conversion properly.
I get the following error, "Parsing XML with internal subset DTDs not allowed. Use CONVERT with style option 2 to enable limited internal subset DTD support"
DECLARE @URL NVARCHAR(500);
DECLARE @Ticker NVARCHAR(10)
DECLARE @DynamicTickerNumber INT
SET @DynamicTickerNumber = 1
CREATE TABLE Parsed_HTML(
[Date] DATETIME
,[Ticker] VarChar (8)
,[NodeName] VarChar (50)
,[Value] NVARCHAR (50));
WHILE @DynamicTickerNumber <= 2
BEGIN
SET @Ticker = (SELECT [Ticker] FROM [Unique Tickers Yahoo] WHERE [Unique Tickers Yahoo].[Ticker Number]= @DynamicTickerNumber)
SET @URL ='http://finance.yahoo.com/q/ks?s=' + @Ticker + '+Key+Statistics'
DECLARE @cmd NVARCHAR(250);
DECLARE @tOutput TABLE(data NVARCHAR(100));
DECLARE @file NVARCHAR(MAX);
SET @file='D:\Ressources\Execution Model\Execution Model for SQL\DB Temp\quoteYahooHTML.htm'
SET @cmd ='powershell "(new-object System.Net.WebClient).DownloadFile('''+@URL+''','''+@file+''')"'
EXEC master.dbo.xp_cmdshell @cmd, no_output
CREATE TABLE XmlImportTest
(
xmlFileName VARCHAR(300),
xml_data xml
);
DECLARE @xmlFileName VARCHAR(300)
SELECT @xmlFileName = 'D:\Ressources\Execution Model\Execution Model for SQL\DB Temp\quoteYahooHTML.htm'
EXEC('
INSERT INTO XmlImportTest(xmlFileName, xml_data)
SELECT ''' + @xmlFileName + ''', xmlData
FROM
(
SELECT *
FROM OPENROWSET (BULK ''' + @xmlFileName + ''' , SINGLE_BLOB) AS XMLDATA
) AS FileImport (XMLDATA)
')
DECLARE @x XML;
DECLARE @string VARCHAR(MAX);
SET @x = (SELECT xml_data FROM XmlImportTest)
SET @string = CONVERT(VARCHAR(MAX), @x, 1);
INSERT INTO [Parsed_HTML] ([NodeName], [Value])
SELECT [NodeName], [Value] FROM dbo.XMLTable(@string)
--above references XMLTable Parsing function that works consistently
END
Unfortunately this needs to be run within the confines of SQL Server, and my understanding is that the HTML Agility Pack is not immediately compatible. I also notice that the intermediate table, XMLimportTest, never gets populated, so this is likely not a function of malformed HTML.
Short answer: don't.
SQL is very good for some things but for downloading and parsing HTML it's a terrible choice. In your example you're using PowerShell to download the file, why not parse the HTML in PowerShell too? Then you could write the parsed data into something like a CSV file and load that in using OPENROWSET.
Another option, still not using SQL but a bit more within SQL Server, might be to use a .NET stored procedure via SQL CLR.
As a few of the comments point out, if you could guarantee the HTML was well formed XML then you could use SQL XML functionality to parse it, but web pages are rarely well formed XML so this would be a risky choice.
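To check that guarantee before shredding anything, one option is to attempt the conversion and see whether it succeeds. The sketch below assumes SQL Server 2012 or later (for TRY_CONVERT) and uses two made-up documents; style 2 is the option that the error message in the question refers to.

```sql
-- Hypothetical inputs: one well-formed document with an internal DTD subset,
-- and one HTML-like fragment with an unclosed <br> tag.
DECLARE @wellFormed VARCHAR(MAX) =
    '<!DOCTYPE doc [<!ELEMENT doc (#PCDATA)>]><doc>ok</doc>';
DECLARE @htmlLike VARCHAR(MAX) =
    '<html><body><br><p>text</p></body></html>';

-- TRY_CONVERT returns NULL instead of raising an error when parsing fails;
-- style 2 enables limited internal DTD subset processing.
SELECT
    CASE WHEN TRY_CONVERT(XML, @wellFormed, 2) IS NULL
         THEN 'not parseable' ELSE 'parseable' END AS WellFormedDoc,
    CASE WHEN TRY_CONVERT(XML, @htmlLike, 2) IS NULL
         THEN 'not parseable' ELSE 'parseable' END AS HtmlLikeDoc;
```

A check like this only tells you whether a given page happens to parse; it does not make arbitrary HTML safe to treat as XML.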

Bulk Import XML into SQL Server

I was looking at these examples on Microsoft.com here:
http://support.microsoft.com/kb/316005
http://msdn.microsoft.com/en-us/library/aa225754%28v=sql.80%29.aspx
But one of the steps says that VBScript code has to be executed, and I wasn't able to find out where the VBScript should be run. Can it be executed in SQL Server itself?
The code from the site looks something like this:
Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBL.ConnectionString = "provider=SQLOLEDB.1;data source=MySQLServer;database=MyDatabase;uid=MyAccount;pwd=MyPassword"
objBL.ErrorLogFile = "c:\error.log"
objBL.Execute "c:\customermapping.xml", "c:\customers.xml"
Set objBL = Nothing
This looks like it could be executed in classic ASP or something similar, but I would prefer to keep it inside SQL Server. Does anyone know how to execute something like this entirely within SQL Server? Or does anyone have a better method for bulk importing XML into SQL Server?
SQL Server is capable of reading XML and inserting it as you need. Here is an example of an XML file and insertion pulled from here:
XML:
<Products>
<Product>
<SKU>1</SKU>
<Desc>Book</Desc>
</Product>
<Product>
<SKU>2</SKU>
<Desc>DVD</Desc>
</Product>
<Product>
<SKU>3</SKU>
<Desc>Video</Desc>
</Product>
</Products>
Insert statement that is parsing the XML:
INSERT INTO Products (sku, product_desc)
SELECT X.product.query('SKU').value('.', 'INT'),
X.product.query('Desc').value('.', 'VARCHAR(30)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK 'C:\Products.xml',
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('Products/Product') AS X(product);
I tried this and for 975 rows from a 1MB XML file, this took about 2.5 minutes to execute on a very fast PC.
I switched to using OPENXML in a multi-step process, and the process now takes less than a second.
CREATE TABLE XMLwithOpenXML
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML,
LoadedDateTime DATETIME
)
INSERT INTO XMLwithOpenXML(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE()
FROM OPENROWSET(BULK 'clients.xml', SINGLE_BLOB) AS x;
DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX)
SELECT @XML = XMLData FROM XMLwithOpenXML WHERE ID = '1' -- The row to process
EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML
INSERT INTO Clients
SELECT CustomerID, CustomerName
FROM OPENXML(@hDoc, 'Clients/Client')
WITH
(
CustomerID [varchar](50) 'ID',
CustomerName [varchar](100) 'Name'
)
EXEC sp_xml_removedocument @hDoc
GO
I got this from here:
http://www.mssqltips.com/sqlservertip/2899/importing-and-processing-data-from-xml-files-into-sql-server-tables/
Basically you load the XML into a table as a big blob of text, then you use OpenXml to process it.
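The XML data type methods can often be made faster as well, without going back to OPENXML. This is a sketch, assuming the slowdown came from parsing the file inside the query and from the .query(...).value('.') pattern (the timings above do not confirm either): load the document into a variable once, then shred it with .value() on text() nodes. The Products table and file path are those of the earlier example; the aliases are illustrative.

```sql
DECLARE @doc XML;

-- Parse the file once into a variable instead of inside the shredding query.
SELECT @doc = CAST(BulkColumn AS XML)
FROM OPENROWSET(BULK 'C:\Products.xml', SINGLE_BLOB) AS src;

-- Read each value directly from its text() node rather than via .query().
INSERT INTO Products (sku, product_desc)
SELECT p.value('(SKU/text())[1]', 'INT'),
       p.value('(Desc/text())[1]', 'VARCHAR(30)')
FROM @doc.nodes('/Products/Product') AS t(p);
```

Whether this closes the gap to OPENXML depends on the document and server, so it is worth timing both on your own data.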

Inserting XML documents into SQL Server 2008 database

I need help inserting xml files into SQL Server 2008.
I have the following SQL statement:
insert into dbo.articles(id, title, contents)
SELECT X.article.query('id').value('.', 'INT'),
X.article.query('article').value('.', 'VARCHAR(50)'),
X.article.query('/doc/text()').value('.', 'VARCHAR(MAX)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK 'E:\test\test_files\1000006.xml',
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('doc') AS X(article);
which basically shreds an XML doc into columns. However, I want to be able to insert all the files in a folder without manually specifying each file, as in this case with E:\test\test_files\1000006.xml
Ok, first crack at answering a question in stackoverflow...
You have two issues: first, getting the filenames from the folder into a SQL table or table variable, and then reading the XML from each.
The first is easy, if you don't mind using xp_cmdshell
DECLARE @Folder VARCHAR(255) = 'C:\temp\*.xml'
DECLARE @Command VARCHAR(255)
DECLARE @FilesInAFolder TABLE (XMLFileName VARCHAR(500))
--
SET @Command = 'DIR ' + @Folder + ' /TC /b'
--
INSERT INTO @FilesInAFolder
EXEC MASTER..xp_cmdshell @Command
--
SELECT * FROM @FilesInAFolder
WHERE XMLFileName IS NOT NULL
The second part, converting the XML files to SQL rows is a little trickier because BULK INSERT won't take a parameter and you can't BULK INSERT into an XML table type. Here's code that works for ONE file...
DECLARE @x xml
DECLARE @Results TABLE (result xml)
DECLARE @xmlFileName NVARCHAR(300) = 'C:\temp\YourXMLFile.xml'
DECLARE @TempTable TABLE
(
ID INT,
Article NVARCHAR(50),
doctext NVARCHAR(MAX)
)
/* ---- HAVE TO USE DYNAMIC sql BECAUSE BULK INSERT WON'T TAKE A PARAMETER---------*/
DECLARE @sql NVARCHAR(4000) =
'SELECT * FROM OPENROWSET ( BULK ''' + @xmlFileName + ''', SINGLE_BLOB )AS xmlData'
/* ---- have to use a normal table variable because we can't directly bulk insert
into an XML type table variable ------------------------------------------*/
INSERT INTO @Results EXEC(@sql)
SELECT @x = result FROM @Results
/* ---- this is MUCH faster than using a cross-apply ------------------------------*/
INSERT INTO @TempTable(ID,Article,doctext)
SELECT
x.value('ID[1]', 'INT' ),
x.value('Article[1]', 'NVARCHAR(50)' ),
x.value('doctext[1]', 'NVARCHAR(MAX)' )
FROM @x.nodes(N'/doc') t(x)
SELECT * FROM @TempTable
Now the hard bit is putting these two together. I tried several ways to get this code into a function, but you can't use dynamic SQL or EXEC in a function, and you can't call an SP from a function. You also can't split the code into two separate SPs, because INSERT ... EXEC statements can't be nested: you would be calling an SP that contains the INSERT ... EXEC above from inside another INSERT ... EXEC. So you either use a cursor to put the two code blocks together (i.e. cursor through @FilesInAFolder, passing each XMLFileName value into the second code block as the variable @xmlFileName), or you use SSIS or CLR.
Sorry, I ran out of time to build a complete SP with a directory name as a parameter and a cursor, but that is pretty straightforward. Phew!
Are you using a stored procedure? You can specify the file name as a parameter.
Something like...
CREATE PROCEDURE sp_XMLLoad
@FileName NVARCHAR(300)
AS SET NOCOUNT ON
SELECT X.article.query('id').value('.', 'INT'),
X.article.query('article').value('.', 'VARCHAR(50)'),
X.article.query('/doc/text()').value('.', 'VARCHAR(MAX)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK @FileName,
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('doc') AS X(article);
Not exactly like that: OPENROWSET(BULK ...) will not accept a variable for the file name, so you'll need to assemble the statement as a string with the file name in quotes and then execute that string.
If you're using SSIS, you can then pump all the files from a directory to the stored procedure, or to the SSIS code used.
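A minimal sketch of that idea, assuming the dbo.articles table and the XPath expressions from the question. The procedure name dbo.usp_LoadArticleXml and the alias src are hypothetical; the statement is built as a string because OPENROWSET(BULK ...) only accepts a literal file name.

```sql
-- Hypothetical procedure name; table and XPath come from the question.
CREATE PROCEDURE dbo.usp_LoadArticleXml
    @FileName NVARCHAR(300)
AS
BEGIN
    SET NOCOUNT ON;

    -- Splice the path into the statement text as a quoted literal;
    -- doubling any single quotes keeps the literal intact.
    DECLARE @sql NVARCHAR(MAX) = N'
        INSERT INTO dbo.articles (id, title, contents)
        SELECT X.article.query(''id'').value(''.'', ''INT''),
               X.article.query(''article'').value(''.'', ''VARCHAR(50)''),
               X.article.query(''/doc/text()'').value(''.'', ''VARCHAR(MAX)'')
        FROM (
            SELECT CAST(BulkColumn AS XML)
            FROM OPENROWSET(BULK ''' + REPLACE(@FileName, N'''', N'''''')
                 + N''', SINGLE_BLOB) AS src
        ) AS T(x)
        CROSS APPLY x.nodes(''doc'') AS X(article);';

    EXEC sys.sp_executesql @sql;
END;
```

It could then be called once per file, for example EXEC dbo.usp_LoadArticleXml @FileName = N'E:\test\test_files\1000006.xml'; from a cursor or an SSIS loop. Since the path ends up inside dynamic SQL, it should only ever come from a trusted source.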
I think you can do it with a cursor and xp_cmdshell. I would never recommend using xp_cmdshell, though.
DECLARE @FilesInAFolder TABLE (FileNames VARCHAR(500))
DECLARE @File VARCHAR(500)
DECLARE @sql NVARCHAR(MAX)
INSERT INTO @FilesInAFolder
EXEC MASTER..xp_cmdshell 'dir /b c:\'
DECLARE CU CURSOR FOR
SELECT 'c:\' + FileNames
FROM @FilesInAFolder
WHERE RIGHT(FileNames,4) = '.xml'
OPEN CU
FETCH NEXT FROM CU INTO @File
WHILE @@FETCH_STATUS = 0
BEGIN
-- OPENROWSET(BULK ...) only accepts a string literal for the file name,
-- so the statement is built dynamically for each file
SET @sql = N'INSERT INTO dbo.articles(id, title, contents)
SELECT X.article.query(''id'').value(''.'', ''INT''),
X.article.query(''article'').value(''.'', ''VARCHAR(50)''),
X.article.query(''/doc/text()'').value(''.'', ''VARCHAR(MAX)'')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK ''' + REPLACE(@File, '''', '''''') + N''',
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes(''doc'') AS X(article);'
EXEC sp_executesql @sql
FETCH NEXT FROM CU INTO @File
END
CLOSE CU
DEALLOCATE CU