Inserting XML documents into SQL Server 2008 database - sql

I need help inserting xml files into SQL Server 2008.
I have the following SQL statement:
insert into dbo.articles(id, title, contents)
SELECT X.article.query('id').value('.', 'INT'),
X.article.query('article').value('.', 'VARCHAR(50)'),
X.article.query('/doc/text()').value('.', 'VARCHAR(MAX)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK 'E:\test\test_files\1000006.xml',
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('doc') AS X(article);
which basically shreds an XML doc into a columns. However, I want to be able to insert all the files in a folder, and not manually specify the file, as in this case E:\test\test_files\1000006.xml

Ok, first crack at answering a question in stackoverflow...
You have two issues:- firstly getting the filenames from the folder into a SQL table or table variable, and then reading the XML from each.
The first is easy, if you don't mind using xp_cmdshell
DECLARE #Folder VARCHAR(255) = 'C:\temp\*.xml'
DECLARE #Command VARCHAR(255)
DECLARE #FilesInAFolder TABLE (XMLFileName VARCHAR(500))
--
SET #Command = 'DIR ' + #Folder + ' /TC /b'
--
INSERT INTO #FilesInAFolder
EXEC MASTER..xp_cmdshell #Command
--
SELECT * FROM #FilesInAFolder
WHERE XMLFileName IS NOT NULL
The second part, converting the XML files to SQL rows is a little trickier because BULK INSERT won't take a parameter and you can't BULK INSERT into an XML table type. Here's code that works for ONE file...
DECLARE #x xml
DECLARE #Results TABLE (result xml)
DECLARE #xmlFileName NVARCHAR(300) = 'C:\temp\YourXMLFile.xml'
DECLARE #TempTable TABLE
(
ID INT,
Article NVARCHAR(50),
doctext NVARCHAR(MAX)
)
/* ---- HAVE TO USE DYNAMIC sql BECAUSE BULK INSERT WON'T TAKE A PARAMETER---------*/
DECLARE #sql NVARCHAR(4000) =
'SELECT * FROM OPENROWSET ( BULK ''' + #xmlFileName + ''', SINGLE_BLOB )AS xmlData'
/* ---- have to use a normal table variable because we can't directly bulk insert
into an XML type table variable ------------------------------------------*/
INSERT INTO #results EXEC(#SQL)
SELECT #x = result FROM #Results
/* ---- this is MUCH faster than using a cross-apply ------------------------------*/
INSERT INTO #TempTable(ID,Article,doctext)
SELECT
x.value('ID[1]', 'INT' ),
x.value('Article[1]', 'NVARCHAR(50)' ),
x.value('doctext[1]', 'NVARCHAR(MAX)' )
FROM #x.nodes(N'/doc') t(x)
SELECT * FROM #TempTable
Now the hard bit is putting these two together. I tried several ways to get this code into a function but you can't use dynamic SQL or EXEC in a function and you can't call an SP from a function and you can't put the code into two separate SPs because you can't have cascading EXEC statements i.e. you try and EXEC an SP with the above code in it that also has an EXEC in it, so... you have to either use a cursor to put the two code blocks above together i.e. cursor through the #FilesInAFolder passing each XMLFileName value into the second code block as variable #XMLFileName or you use SSIS or CLR.
Sorry I ran out of time to build a complete SP with a directory name as a parameter and a cursor but that is pretty straightforward. Phew!

Are you using a stored procedure? You can specify the file name as a parameter.
Something like...
CREATE PROCEDURE sp_XMLLoad
#FileName
AS SET NOCOUNT ON
SELECT X.article.query('id').value('.', 'INT'),
X.article.query('article').value('.', 'VARCHAR(50)'),
X.article.query('/doc/text()').value('.', 'VARCHAR(MAX)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK #FileName,
SINGLE_BLOB) AS T(x)
Not exactly like that ... you'll need to add quotes around the #Filename I bet. Maybe assemble it with quotes and then use that variable.
If you're using SSIS, you can then pump all the files from a directory to the stored procedure, or to the SSIS code used.

I think you can do it with a cursor and xp_cmdshell. I would not recommend to ever use xp_cmdshell though.
DECLARE #FilesInAFolder TABLE (FileNames VARCHAR(500))
DECLARE #File VARCHAR(500)
INSERT INTO #FilesInAFolder
EXEC MASTER..xp_cmdshell 'dir /b c:\'
DECLARE CU CURSOR FOR
SELECT 'c:\' + FileNames
FROM #FilesInAFolder
WHERE RIGHT(FileNames,4) = '.xml'
OPEN CU
FETCH NEXT FROM CU INTO #File
WHILE ##FETCH_STATUS = 0
BEGIN
INSERT INTO dbo.articles(id, title, contents)
SELECT X.article.query('id').value('.', 'INT'),
X.article.query('article').value('.', 'VARCHAR(50)'),
X.article.query('/doc/text()').value('.', 'VARCHAR(MAX)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK #File,
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('doc') AS X(article);
FETCH NEXT FROM CU INTO #File
END
CLOSE CU
DEALLOCATE CU

Related

Could EXEC and Select Into work together?

Im running following statement to get a result from remote server
EXEC ( #var) AT [linkedservername]
Note :
#var holds select query to run on linked server.
Linkedserver is DB2 server.
I would like to save the result into a temptable.
Is it possible ?
Can we achieve like below?
EXEC ( #var) AT [linkedservername] into #t
-- Updating to provider more info.
I came up to this,
read query from file.
execute it in desired DB2 server using linked server.
now i need to save it in a temptable
DECLARE #FileContents VARCHAR(MAX)
SELECT #FileContents=BulkColumn
FROM OPENROWSET(BULK'E:\ADDRESS.txt',SINGLE_BLOB) x;
set #FileContents = Replace(#FileContents,'''''','''')
set #FileContents = #FileContents + ' FETCH FIRST 1 ROWS only'
EXEC ( #FileContents) AT [linkedservername]
I Need something like below,
select * from (
EXEC ( #FileContents) AT [linkedservername] )
Updated for new information.
The answer is no for the way you're attempting to do it.
If you know the structure already, create the temp table first. You might have to change your RPC and DTC settings to get the call to work.
CREATE TABLE #temp
(<Table definition for your results>)
DECLARE #var VARCHAR(100)= 'your command'
INSERT INTO #temp EXEC (#var) AT TRANSACCOUNT
SELECT * FROM #temp
If the structure is unknown you, you can use OPENROWSET or OPENQUERY to generate the table on the fly.
Openquery:
SELECT *
INTO #temp FROM
OPENQUERY(targetServer, 'your command')
Openrowset:
SELECT * INTO
#temp
FROM
OPENROWSET(
'SQLNCLI',
'Server=targetServer;Trusted_Connection=yes;',
'your command'
)
SELECT * FROM #temp
OPENQUERY and OPENROWSET require string literals, so if you need to dynamically set your base command, you'd need to also build your OPENROWSET call. The context of the call would also change the context of any temporary table, so you could use a temporary permanent table to store your results like this:
DECLARE #var VARCHAR(100)= 'your command'
DECLARE #command VARCHAR(MAX)
SELECT #command = CONCAT(
'SELECT * INTO temporaryTable
FROM
OPENROWSET(
''SQLNCLI'',
''Server=targetServer;Trusted_Connection=yes;'',
''', #var,'''
)')
EXEC (#command)
SELECT * FROM temporaryTable

How to download a webpage and parse in SQL

I am simply trying to download a webpage and store it in an accessible format in SQL Server 2012. I have resorted to using dynamic SQL, but perhaps there is a cleaner, easier way to do this. I have been able to successfully download the htm files to my local drive using the below code, but I am having difficulty working with the html itself. I am trying to convert the webpage to XML and parse from there, but I think I am not addressing the HTML to XML conversion properly.
I get the following error, "Parsing XML with internal subset DTDs not allowed. Use CONVERT with style option 2 to enable limited internal subset DTD support"
DECLARE #URL NVARCHAR(500);
DECLARE #Ticker NVARCHAR(10)
DECLARE #DynamicTickerNumber INT
SET #DynamicTickerNumber = 1
CREATE TABLE Parsed_HTML(
[Date] DATETIME
,[Ticker] VarChar (8)
,[NodeName] VarChar (50)
,[Value] NVARCHAR (50));
WHILE #DynamicTickerNumber <= 2
BEGIN
SET #Ticker = (SELECT [Ticker] FROM [Unique Tickers Yahoo] WHERE [Unique Tickers Yahoo].[Ticker Number]= #DynamicTickerNumber)
SET #URL ='http://finance.yahoo.com/q/ks?s=' + #Ticker + '+Key+Statistics'
DECLARE #cmd NVARCHAR(250);
DECLARE #tOutput TABLE(data NVARCHAR(100));
DECLARE #file NVARCHAR(MAX);
SET #file='D:\Ressources\Execution Model\Execution Model for SQL\DB Temp\quoteYahooHTML.htm'
SET #cmd ='powershell "(new-object System.Net.WebClient).DownloadFile('''+#URL+''','''+#file+''')"'
EXEC master.dbo.xp_cmdshell #cmd, no_output
CREATE TABLE XmlImportTest
(
xmlFileName VARCHAR(300),
xml_data xml
);
DECLARE #xmlFileName VARCHAR(300)
SELECT #xmlFileName = 'D:\Ressources\Execution Model\Execution Model for SQL\DB Temp\quoteYahooHTML.htm'
EXEC('
INSERT INTO XmlImportTest(xmlFileName, xml_data)
SELECT ''' + #xmlFileName + ''', xmlData
FROM
(
SELECT *
FROM OPENROWSET (BULK ''' + #xmlFileName + ''' , SINGLE_BLOB) AS XMLDATA
) AS FileImport (XMLDATA)
')
DECLARE #x XML;
DECLARE #string VARCHAR(MAX);
SET #x = (SELECT xml_data FROM XmlImportTest)
SET #string = CONVERT(VARCHAR(MAX), #x, 1);
INSERT INTO [Parsed_HTML] ([NodeName], [Value])
SELECT [NodeName], [Value] FROM dbo.XMLTable(#string)
--above references XMLTable Parsing function that works consistently
END
Unfortunately this needs to be run within the confines of SQL Server, and my understanding is that the HTML Agility Pack is not immediately compatible. I also notice that the intermediate table, XMLimportTest, never gets populated, so this is likely not a function of malformed HTML.
Short answer: don't.
SQL is very good for some things but for downloading and parsing HTML it's a terrible choice. In your example you're using PowerShell to download the file, why not parse the HTML in PowerShell too? Then you could write the parsed data into something like a CSV file and load that in using OPENROWSET.
Another option, still not using SQL but a bit more within SQL Server might be to use a .Net SP via SQL CLR.
As a few of the comments point out, if you could guarantee the HTML was well formed XML then you could use SQL XML functionality to parse it, but web pages are rarely well formed XML so this would be a risky choice.

How to check if a column exists in a csv file during sql server OpenRowset?

I got a very tricky question and not sure if there is a way to get it work.
Basically, I have user uploaded a csv file. The first row is header. I need to check if a column name does exists in this csv file. For example I want to know if that csv file has a StudentName column. How do I do that?
I tried to import the file to a temp table. Once I have it in a temp table, I can get everything I need.
select * into #temp
from openrowset('Microsoft.ACE.OLEDB.12.0'
, 'Text; Database='+#folder+'; HDR=NO; IMEX=1;'
,'select top 1 * from ' + #filename)
But that does not work because openrowset cannot have variable in it. In order to make it work, I have to put this whole thing into a #sql variable then exec it.
However, that would not work either because if I do exec (#sql) I cannot use #temp because #temp will not be available after exec(#sql).
Any other way to get this work?
I hope to have some function or stored procedure like this: IsColumnExists(#directory, #filename, #colName)
Thanks.
Great question. I've never had the need to do this, but I can see how useful it could be. I tested this on SQL 2012.
--We'll do this dynamically. Change these values for your scenario.
--If you want to make this into a stored proc, these next two variables would be the sp input params.
DECLARE #Folder NVARCHAR(256) = 'D:\' --Or other folder
DECLARE #FileName NVARCHAR(128) = 'Test.csv' --Or other file.
--Temp table. This will ultimately hold the names of the columns from the OPENROWSET query.
CREATE TABLE #ColNames (name SYSNAME)
--Build tsql string...
DECLARE #Tsql NVARCHAR(MAX) = '
SELECT *
INTO #Temp
FROM OPENROWSET (
''Microsoft.ACE.OLEDB.12.0'',
''Text;Database=' + #Folder + ';HDR=YES'',
''SELECT * FROM [' + #FileName + '] WHERE 1 = 2''
) dt;
SELECT name
FROM tempdb.sys.columns
WHERE object_id = object_id(''tempdb..#Temp'')
'
--...pass string to EXEC
INSERT INTO #ColNames
EXEC (#Tsql)
--Return a resultset of column names.
--Add additional logic to test for presence/absence of specific column.
SELECT *
FROM #ColNames
DROP TABLE #ColNames

Procedure to insert Xml Into Sql Server -- Must Declare Scalar Variable

I am using the below procedure to try and insert xml via the filepath into a xml column. I am getting an error must declare scalar variable for ForeignId. Is there a better way of doing what I am trying to do, or am I on the right path?
Here is the procedure
ALTER PROC [dbo].[InsertXml] #path nvarchar(100)
,#ForeignId uniqueidentifier
AS
BEGIN
SET NOCOUNT ON
DECLARE #SQL nvarchar(4000) =
'INSERT INTO XmlTable(XmlId
, ForeignId
, TestXml)
SELECT NEWID()
, #ForeignId
,* FROM OPENROWSET(
BULK ''' + #path + ''',
SINGLE_BLOB) AS x;'
EXECUTE(#SQL);
RETURN ##ERROR;
END
When you're executing the SQL statement using EXECUTE(SQL) it has no access to the #ForeignId value
One way to solve this is to use sp_excuteSQL and do this instead of EXECUTE(#SQL);
DECLARE #ParmDefinition nvarchar(500);
SET #ParmDefinition = N'#ForeignId uniqueidentifier';
EXECUTE sp_executesql #SQL, #ParmDefinition, #ForeignId ;
You could also just concatenate the #ForeignId to your sql string but I can't recall if there are issues with that when using a uniqueidentifier

How to BULK INSERT a file into a *temporary* table where the filename is a variable?

I have some code like this that I use to do a BULK INSERT of a data file into a table, where the data file and table name are variables:
DECLARE #sql AS NVARCHAR(1000)
SET #sql = 'BULK INSERT ' + #tableName + ' FROM ''' + #filename + ''' WITH (CODEPAGE=''ACP'', FIELDTERMINATOR=''|'')'
EXEC (#sql)
The works fine for standard tables, but now I need to do the same sort of thing to load data into a temporary table (for example, #MyTable). But when I try this, I get the error:
Invalid Object Name: #MyTable
I think the problem is due to the fact that the BULK INSERT statement is constructed on the fly and then executed using EXEC, and that #MyTable is not accessible in the context of the EXEC call.
The reason that I need to construct the BULK INSERT statement like this is that I need to insert the filename into the statement, and this seems to be the only way to do that. So, it seems that I can either have a variable filename, or use a temporary table, but not both.
Is there another way of achieving this - perhaps by using OPENROWSET(BULK...)?
UPDATE:
OK, so what I'm hearing is that BULK INSERT & temporary tables are not going to work for me. Thanks for the suggestions, but moving more of my code into the dynamic SQL part is not practical in my case.
Having tried OPENROWSET(BULK...), it seems that that suffers from the same problem, i.e. it cannot deal with a variable filename, and I'd need to construct the SQL statement dynamically as before (and thus not be able to access the temp table).
So, that leaves me with only one option which is to use a non-temp table and achieve process isolation in a different way (by ensuring that only one process can be using the tables at any one time - I can think of several ways to do that).
It's annoying. It would have been much more convenient to do it the way I originally intended. Just one of those things that should be trivial, but ends up eating a whole day of your time...
You could always construct the #temp table in dynamic SQL. For example, right now I guess you have been trying:
CREATE TABLE #tmp(a INT, b INT, c INT);
DECLARE #sql NVARCHAR(1000);
SET #sql = N'BULK INSERT #tmp ...' + #variables;
EXEC master.sys.sp_executesql #sql;
SELECT * FROM #tmp;
This makes it tougher to maintain (readability) but gets by the scoping issue:
DECLARE #sql NVARCHAR(MAX);
SET #sql = N'CREATE TABLE #tmp(a INT, b INT, c INT);
BULK INSERT #tmp ...' + #variables + ';
SELECT * FROM #tmp;';
EXEC master.sys.sp_executesql #sql;
EDIT 2011-01-12
In light of how my almost 2-year old answer was suddenly deemed incomplete and unacceptable, by someone whose answer was also incomplete, how about:
CREATE TABLE #outer(a INT, b INT, c INT);
DECLARE #sql NVARCHAR(MAX);
SET #sql = N'SET NOCOUNT ON;
CREATE TABLE #inner(a INT, b INT, c INT);
BULK INSERT #inner ...' + #variables + ';
SELECT * FROM #inner;';
INSERT #outer EXEC master.sys.sp_executesql #sql;
It is possible to do everything you want. Aaron's answer was not quite complete.
His approach is correct, up to creating the temporary table in the inner query. Then, you need to insert the results into a table in the outer query.
The following code snippet grabs the first line of a file and inserts it into the table #Lines:
declare #fieldsep char(1) = ',';
declare #recordsep char(1) = char(10);
declare #Lines table (
line varchar(8000)
);
declare #sql varchar(8000) = '
create table #tmp (
line varchar(8000)
);
bulk insert #tmp
from '''+#filename+'''
with (FirstRow = 1, FieldTerminator = '''+#fieldsep+''', RowTerminator = '''+#recordsep+''');
select * from #tmp';
insert into #Lines
exec(#sql);
select * from #lines
Sorry to dig up an old question but in case someone stumbles onto this thread and wants a quicker solution.
Bulk inserting a unknown width file with \n row terminators into a temp table that is created outside of the EXEC statement.
DECLARE #SQL VARCHAR(8000)
IF OBJECT_ID('TempDB..#BulkInsert') IS NOT NULL
BEGIN
DROP TABLE #BulkInsert
END
CREATE TABLE #BulkInsert
(
Line VARCHAR(MAX)
)
SET #SQL = 'BULK INSERT #BulkInser FROM ''##FILEPATH##'' WITH (ROWTERMINATOR = ''\n'')'
EXEC (#SQL)
SELECT * FROM #BulkInsert
Further support that dynamic SQL within an EXEC statement has access to temp tables outside of the EXEC statement. http://sqlfiddle.com/#!3/d41d8/19343
DECLARE #SQL VARCHAR(8000)
IF OBJECT_ID('TempDB..#BulkInsert') IS NOT NULL
BEGIN
DROP TABLE #BulkInsert
END
CREATE TABLE #BulkInsert
(
Line VARCHAR(MAX)
)
INSERT INTO #BulkInsert
(
Line
)
SELECT 1
UNION SELECT 2
UNION SELECT 3
SET #SQL = 'SELECT * FROM #BulkInsert'
EXEC (#SQL)
Further support, written for MSSQL2000 http://technet.microsoft.com/en-us/library/aa175921(v=sql.80).aspx
Example at the bottom of the link
DECLARE #cmd VARCHAR(1000), #ExecError INT
CREATE TABLE #ErrFile (ExecError INT)
SET #cmd = 'EXEC GetTableCount ' +
'''pubs.dbo.authors''' +
'INSERT #ErrFile VALUES(##ERROR)'
EXEC(#cmd)
SET #ExecError = (SELECT * FROM #ErrFile)
SELECT #ExecError AS '##ERROR'
http://msdn.microsoft.com/en-us/library/ms191503.aspx
i would advice to create table with unique name before bulk inserting.