Converting Complex XML to CSV - sql

I have some XML (complex to me) that I need to convert into CSV. I need absolutely every value added to the CSV for every submission. I have tried a few basic things, but I can't get past the deep nesting and the different structures of this file.
Could someone please help me with a PowerShell script that would do this? I have started, but I cannot get all of the data out; I only get "Canvas Results".
Submissions.xml is too large to post here (102 KB).
$d = ([xml](gc submissions.xml)).CANVASRESULTS | % {
    foreach ($i in $_.CANVASRESULTS) {
        $o = New-Object Object
        Add-Member -InputObject $o -MemberType NoteProperty -Name Submissions -Value $_.Submission
        Add-Member -InputObject $o -MemberType NoteProperty -Name Submission -Value $i
        $o
    }
}
$d | ConvertTo-Csv -NoTypeInformation -Delimiter ","

Anytime complex XML has deeply nested structures and you need to migrate it into a flat file format (i.e., txt, csv, xlsx, sql), consider using XSLT to simplify the XML first. For background, XSLT is a declarative, special-purpose programming language used to style, re-format, and re-structure XML/HTML and other SGML markup documents for various end-use purposes. (As an aside, SQL is also a declarative, special-purpose programming language.)
For most software to import XML into a flat file format of two dimensions (rows and columns), the XML must follow a structure of repeating elements (i.e., rows/records), each with one level of children for columns/fields:
<data>
  <row>
    <column1>value</column1>
    <column2>value</column2>
    <column3>value</column3>
    ...
  </row>
  <row>
    ...
  </row>
</data>
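As an aside, once XML is in this repeating-element shape, flattening it is straightforward in most languages. Here is a minimal sketch in Python using only the standard library (the element names mirror the illustration above and the values are hypothetical):

```python
# Parse a two-level "rows and columns" XML document: each repeating <row>
# element becomes one record, and its children become the fields.
import xml.etree.ElementTree as ET

xml_text = """
<data>
  <row><column1>a</column1><column2>b</column2></row>
  <row><column1>c</column1><column2>d</column2></row>
</data>
"""

root = ET.fromstring(xml_text)
# Iterating an Element yields its child Elements, so this visits each <row>.
rows = [{child.tag: child.text for child in row} for row in root]
print(rows)
```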
Nearly every programming language offers an XSLT processor, including PowerShell, Java, C#, Perl, PHP, Python, SAS, and even VBA with your everyday MS Excel. For your complex XML, below is an example XSLT stylesheet with its output. Do note that I manually create nodes based on values from the original XML:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="CanvasResult">
    <Data>
      <xsl:for-each select="//Responses">
        <Submission>
          <Fitter><xsl:value-of select="Response[contains(Label, 'Fitter Name')]/Value"/></Fitter>
          <Date><xsl:value-of select="Response[Label='Date']/Value"/></Date>
          <Time><xsl:value-of select="Response[Label='Time']/Value"/></Time>
          <Client><xsl:value-of select="Response[Label='Client']/Value"/></Client>
          <Machine><xsl:value-of select="Response[Label='Machine']/Value"/></Machine>
          <Hours><xsl:value-of select="Response[Label='Hours']/Value"/></Hours>
          <Signature><xsl:value-of select="Response[Label='Signature']/Value"/></Signature>
          <SubmissionDate><xsl:value-of select="Response[Label='Submission Date:']/Value"/></SubmissionDate>
          <SubmissionTime><xsl:value-of select="Response[Label='Submission Time:']/Value"/></SubmissionTime>
          <Customer><xsl:value-of select="Response[Label='Customer:']/Value"/></Customer>
          <PlantLocation><xsl:value-of select="Response[Label='Plant Location']/Value"/></PlantLocation>
          <PlantType><xsl:value-of select="Response[Label='Plant Type:']/Value"/></PlantType>
          <PlantID><xsl:value-of select="Response[Label='Plant ID:']/Value"/></PlantID>
          <PlantHours><xsl:value-of select="Response[Label='Plant Hours:']/Value"/></PlantHours>
          <RegoExpiryDate><xsl:value-of select="Response[Label='Rego Expiry Date:']/Value"/></RegoExpiryDate>
          <Comments><xsl:value-of select="Response[Label='Comments:']/Value"/></Comments>
        </Submission>
      </xsl:for-each>
    </Data>
  </xsl:template>
</xsl:stylesheet>
Output
<?xml version='1.0' encoding='UTF-8'?>
<Data>
  ...
  <Submission>
    <Fitter>Damian Stewart</Fitter>
    <Date/>
    <Time/>
    <Client/>
    <Machine/>
    <Hours/>
    <Signature/>
    <SubmissionDate>28/09/2015</SubmissionDate>
    <SubmissionTime>16:30</SubmissionTime>
    <Customer>Dicks Diesels</Customer>
    <PlantLocation/>
    <PlantType>Dozer</PlantType>
    <PlantID>DZ09</PlantID>
    <PlantHours>2213.6</PlantHours>
    <RegoExpiryDate>05/03/2016</RegoExpiryDate>
    <Comments>Moving tomorrow from Daracon BOP to KCE BOP S6A Dam
Cabbie to operate</Comments>
  </Submission>
  ...
</Data>
From there, you can import the two-dimensional XML into a usable rows/columns format. Below is the same import into an MS Access database and an MS Excel spreadsheet. You will notice gaps in the data due to XML content not populating the created nodes (handled in the XSLT). A simple SQL cleanup can render the final dataset.
Database Import
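If Access or Excel is not available, the flattened output can also be turned into CSV with a short Python script. This sketch hard-codes just two of the fields, and the second record is made up purely for illustration:

```python
# Convert the flattened <Data>/<Submission> XML into CSV text:
# each <Submission> element becomes one CSV row.
import csv
import io
import xml.etree.ElementTree as ET

flattened = """
<Data>
  <Submission><Fitter>Damian Stewart</Fitter><Customer>Dicks Diesels</Customer></Submission>
  <Submission><Fitter>Jane Doe</Fitter><Customer>Acme</Customer></Submission>
</Data>
"""

root = ET.fromstring(flattened)
# Empty elements like <Date/> have text None, so substitute "".
records = [{field.tag: (field.text or "") for field in sub} for sub in root]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Fitter", "Customer"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()
print(csv_text)
```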

Related

How can I transform an Excel file using DataWeave?

I have the following XML, which is inside an Excel (.xlsx) file. I want to put the word "test" in all the Country columns:
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><dimension ref="A1"/><sheetViews><sheetView workbookViewId="0" tabSelected="true"/></sheetViews><sheetFormatPr defaultRowHeight="15.0"/><sheetData>
<row r="1">
  <c r="A1" t="inlineStr"><is><t>FirstName</t></is></c>
  <c r="B1" t="inlineStr"><is><t>MiddleName</t></is></c>
  <c r="C1" t="inlineStr"><is><t>LastName</t></is></c>
  <c r="D1" t="inlineStr"><is><t>Street</t></is></c>
  <c r="F1" t="inlineStr"><is><t>State</t></is></c>
  <c r="G1" t="inlineStr"><is><t>PostalCode</t></is></c>
  <c r="H1" t="inlineStr"><is><t>Country</t></is></c>
  <c r="I1" t="inlineStr"><is><t>Birthdate</t></is></c>
  <c r="J1"</row>
<row r="2">
  <c r="A2" t="inlineStr"><is><t>Willy</t></is></c>
  <c r="B2" t="inlineStr"><is><t></t></is></c>
  <c r="C2" t="inlineStr"><is><t>Kelly</t></is></c>
  <c r="D2" t="inlineStr"><is><t>1234 TEST</t></is></c>
  <c r="F2" t="inlineStr"><is><t>TE</t></is></c>
  <c r="G2" t="inlineStr"><is><t>12345</t></is></c>
  <c r="H2" t="inlineStr"><is><t></t></is></c>
  <c r="I2" t="inlineStr"><is><t>1997-15-08T00:00:00</t></is></c>
  <c r="J2"
</row>
<row r="3">
Mule Runtime version 4.4.0 EE
DataWeave already supports reading and writing Excel (.xlsx) files natively, with some restrictions. You don't need to decompress the .xlsx file or find the right XML inside it. Just ensure the MIME type of the input file is application/xlsx. You can force it in most connectors if needed with the attribute outputMimeType="application/xlsx".
Then the DataWeave script to transform is something simple like:
%dw 2.0
output application/json
---
payload.mytabname map ($ update {
    case .Country -> "test"
})
Replace mytabname with the name of the tab or sheet that contains the table.
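For readers more comfortable outside DataWeave, here is a rough Python analogue of the same update, applied after the sheet has been parsed into a list of row dictionaries (the sample rows are made up):

```python
# Overwrite every row's Country value with "test", leaving other fields intact,
# mirroring the DataWeave `map ($ update { case .Country -> "test" })` pattern.
rows = [
    {"FirstName": "Willy", "Country": ""},
    {"FirstName": "Anna", "Country": "US"},
]
updated = [{**row, "Country": "test"} for row in rows]
print(updated)
```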

Need to Revise version of SQL Server generated XML

The following lines of code generate an XML file looking like the set below, which is almost acceptable to my client. I say ALMOST because the one change I need is to have <?xml version="1.0" encoding="UTF-8"?> at the top instead of the <?xml version="1.0"> that heads every XML file I generate using the EXEC xp_cmdshell command below. Can someone please tell me how this can be accomplished?
-- SQL CODE USED TO GENERATE XML FILE - Using XML Path
SET @FileString = @FileName + '.xml" -S ALSCG-JPATHIL\SQLEXPRESS -T -c -t,'
SET @SQLSTRING = 'bcp ";WITH XMLNAMESPACES (DEFAULT ''urn:CP-xml'') select A.TargetSystem AS ''Header/Target'' from [Header] A FOR XML PATH(''Qty'')" queryout "C:\Program Files\'
SET @SQLSTRING = @SQLSTRING + @FileString
EXEC xp_cmdshell @SQLSTRING
-- XML FILE CONTENTS GENERATED - Missing the Encoding Condition here
<?xml version="1.0">
<Qty xmlns="urn:CP-xml">
<Header>
<Target></Target>
</Header>
</Qty>
-- XML FILE CONTENTS DESIRED - Note only difference is the Encoding!
<?xml version="1.0" encoding="UTF-8"?>
<Qty xmlns="urn:CP-xml">
<Header>
<Target></Target>
</Header>
</Qty>
The very last response in this link has the only "solution" to this problem that I've been able to find:
Hi, (late response but might help someone in the future) VARBINARY did not
work for me, probably because my data source didn't comply. Here's
what worked for me at the SQL end:
1) Store your raw XML data as VARCHAR or TEXT (instead of NVARCHAR or
NTEXT) in a variable,
2) Read this variable into the XML data type using UTF-8 encoding.
Something like:
DECLARE @TempHTMLText VARCHAR(MAX)
SET @TempHTMLText = --your raw xml data
DECLARE @XMLDataText XML
SELECT @XMLDataText = '<?xml version="1.0" encoding="utf-8" ?>' + @TempHTMLText
I don't think you'd need to do exactly that; rather, use the following query:
DECLARE @Xml XML;
WITH XMLNAMESPACES(DEFAULT 'urn:CP-xml')
SELECT @Xml = (SELECT A.TargetSystem AS 'Header/Target' FROM [Header] A FOR XML PATH('Qty'));
SELECT CONCAT('<?xml version="1.0" encoding="utf-8" ?>', CAST(@Xml AS VARCHAR(MAX)));
Note that in my testing, I was unable to convert this data back to XML while preserving the encoding tag. That seems to be enforced explicitly by the XML data type.
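As a cross-check outside SQL Server, the same idea of emitting an explicit encoding declaration can be sketched with Python's standard library (the element names are borrowed from the desired output above):

```python
# Serialize a small document and request an explicit XML declaration with an
# encoding attribute; ElementTree emits it when encoding + xml_declaration
# are passed to tostring() (Python 3.8+).
import xml.etree.ElementTree as ET

qty = ET.Element("Qty", {"xmlns": "urn:CP-xml"})
header = ET.SubElement(qty, "Header")
ET.SubElement(header, "Target")

xml_bytes = ET.tostring(qty, encoding="utf-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))
```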

Read .csv table and print it as SQL query output

I am using a bastardised version of T-SQL to generate reports about information within database-driven CAD software (Solidworks Electrical). I am trying to generate a Table of Contents. Due to limitations within the software, I have to generate this table using SQL.
What I would like to do is create the Table of Contents in Excel, save it as a .csv, and have my SQL query read this file and spit it out as an output.
Example Table:
Sheet,System
1,Radios
2,Processors
3,Navigation
After some searching I've been unable to find a solution myself. My problems are:
1) Reading a .csv file stored on my hard drive
2) Turning this .csv file into a table (it can't be stored in the database; it's just temporary while we run the query)
3) Outputting the data in this table as the results of the query
I have tried the following to read my .csv table, but receive the error "Syntax error, permission violation, or other nonspecific error", so it's possible my software just won't allow me to read external files. (NB: my software uses ]] [[ instead of quotes.)
select
]]col1[[,
]]col2[[,
]]col3[[
from openrowset('MSDASQL'
,'Driver={Microsoft Access Text Driver (*.txt, *.csv)}'
,'select * from D:\SQL Queries\input.CSV')
Any assistance would be much appreciated! Thanks
This SQL works for me:
select *
from openrowset(bulk N'C:\Temp\source.csv', formatfile = N'C:\Temp\format.xml', firstrow = 2) SourceFile
Content of source.csv is this:
Sheet,System
1,Radios
2,Processors
3,Navigation
Content of format.xml is this:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="128" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
    <COLUMN SOURCE="2" NAME="Name" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>
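As a quick sanity check of what the format file maps (FIELD 1 to the ID column as an int, FIELD 2 to Name), the same source.csv content can be read with Python's csv module. This is only an illustration of the mapping, not part of the SQL solution:

```python
# Read the sample CSV, skip the header line (as firstrow=2 does in the
# OPENROWSET call), and coerce the first column to int.
import csv
import io

source = "Sheet,System\n1,Radios\n2,Processors\n3,Navigation\n"
reader = csv.reader(io.StringIO(source))
next(reader)  # skip the "Sheet,System" header row
table = [(int(sheet), system) for sheet, system in reader]
print(table)
```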

gawk printf missing characters

I'm trying to create a script in (g)AWK in which I'd like to put the following EXACT lines at the beginning of the output text file:
<?xml version="1.0" encoding="UTF-8"?>
<notes version="1">
<labels>
<label id="0" color="30DBFF">Custom Label 1</label>
<label id="1" color="30FF97">Custom Label 2</label>
<label id="2" color="E1FF80">Custom Label 3</label>
<label id="3" color="FF9B30">Custom Label 4</label>
<label id="4" color="FF304E">Custom Label 5</label>
<label id="5" color="FF30D7">Custom Label 6</label>
<label id="6" color="303EFF">Custom Label 7</label>
<label id="7" color="1985FF">Custom Label 8</label>
</labels>
and this one to the end:
</notes>
Here is my script so far:
BEGIN {printf("<?xml version="1.0" encoding="UTF-8"?>\n") > "notes.sasi89.xml"}
END {printf("</notes>") > "notes.sasi89.xml"}
My problem is that it's not printing the way I'd like, it gives me this in the output file:
<?xml version=1 encoding=-8?>
</notes>
Some characters and quotes are missing. I've tried studying the manuals, but they sound too complicated to me. I would appreciate it if someone would give me a hand or point me in the right direction.
Answer is Community Wiki to give what credit can be given where credit is due.
Primary problem and solution
As swstephe noted in a comment:
You need to escape your quotes:
printf("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
Anti-patterns
I regard your outline script as an anti-pattern (actually, two anti-patterns). You have:
BEGIN {printf("<?xml version="1.0" encoding="UTF-8"?>\n") > "notes.sasi89.xml"}
END {printf("</notes>") > "notes.sasi89.xml"}
The anti-patterns are:
You repeat the file name; you shouldn't. You would do better to use:
BEGIN {file = "notes.sasi89.xml"
printf("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n") > file}
END {printf("</notes>") > file}
You shouldn't be doing the I/O redirection in the awk script in the first place. You should let the shell do the I/O redirection.
awk '
BEGIN {printf("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")}
END {printf("</notes>")}
' > notes.sasi89.xml
There are times when I/O redirection in the script is appropriate, but that's when you need output to multiple files. When, as appears very probable here, you have just one output file, make the script write to standard output and have the shell do the I/O redirection. It is much more flexible; you can rename the file more easily, send the output to other programs via a pipe, etc., all of which is very much harder if you have the output file name embedded in the awk script.
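The underlying pitfall is language-independent: the inner quotes must survive into the output string. A minimal Python illustration (not part of the awk solution) shows the two equivalent quoting styles:

```python
# Escaped double quotes inside a double-quoted string, exactly as the
# corrected awk printf does...
header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
# ...versus using the other quote style so no escaping is needed at all.
same = '<?xml version="1.0" encoding="UTF-8"?>'
print(header)
```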

Error using SSIS Bids in Visual Studio 2012

I have to load several tables into SQL Server 2012 from SQL Server 2000. I heard BIDS could do this; I'm pretty new to it and wanted some help. I would really appreciate whatever help I get with it.
I have installed BIDS Helper already and used the code below, but it gives me errors stating:
Error 1187 Illegal syntax. Expecting valid start name character.
Error 1188 Character '#', hexadecimal value 0x23 is illegal in an XML name.
Error 1189 The character '@', hexadecimal value 0x40 is illegal at the beginning of an XML name.
<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ import namespace="System.IO" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<!--
<#
string connectionStringSource = @"Server=xxxxx;Initial Catalog=xxxx;Integrated Security=SSPI;Provider=sqloledb";
string connectionStringDestination = @"Server=xxxxxx;Initial Catalog=xxxxxxx;Integrated Security=SSPI;Provider=SQLNCLI11.1";
string SrcTableQuery = @"
SELECT
SCHEMA_NAME(t.schema_id) AS schemaName
, T.name AS tableName
FROM
sys.tables AS T
WHERE
T.is_ms_shipped = 0
AND T.name <> 'sysdiagrams';
";
DataTable dt = null;
dt = ExternalDataAccess.GetDataTable(connectionStringSource, SrcTableQuery);
#>
-->
<Connections>
<OleDbConnection
Name="SRC"
CreateInProject="false"
ConnectionString="<#=connectionStringSource#>"
RetainSameConnection="false">
</OleDbConnection>
<OleDbConnection
Name="DST"
CreateInProject="false"
ConnectionString="<#=connectionStringDestination#>"
RetainSameConnection="false">
</OleDbConnection>
</Connections>
<Packages>
<# foreach (DataRow dr in dt.Rows) { #>
<Package ConstraintMode="Linear"
Name="<#=dr[1].ToString()#>"
>
<Variables>
<Variable Name="SchemaName" DataType="String"><#=dr[0].ToString()#></Variable>
<Variable Name="TableName" DataType="String"><#=dr[1].ToString()#></Variable>
<Variable Name="QualifiedTableSchema"
DataType="String"
EvaluateAsExpression="true">"[" + @[User::SchemaName] + "].[" + @[User::TableName] + "]"</Variable>
</Variables>
<Tasks>
<Dataflow
Name="DFT"
>
<Transformations>
<OleDbSource
Name="OLE_SRC <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
ConnectionName="SRC"
>
<TableFromVariableInput VariableName="User.QualifiedTableSchema"/>
</OleDbSource>
<OleDbDestination
Name="OLE_DST <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
ConnectionName="DST"
KeepIdentity="true"
TableLock="true"
UseFastLoadIfAvailable="true"
KeepNulls="true"
>
<TableFromVariableOutput VariableName="User.QualifiedTableSchema" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<# } #>
</Packages>
</Biml>
This is the maddening thing about trying to do much BimlScript in Visual Studio. The editor "knows" it's doing XML markup, so all of the enhancements that make up BimlScript are "wrong", and it's going to highlight them, put angry red squigglies on them, and have you question whether you really have valid code here.
In my Error List, I see the same things you're seeing but this is one of the few times you can ignore Visual Studio's built in error checker.
Instead, the true test of whether the code is good is to right-click the .biml file(s) and select "Check Biml for errors".
You should get a dialogue like
If so, click Generate SSIS Packages and then get some tape to attach your mind back into your head as it's just been blown ;)
Operational note
Note that the supplied code is going to copy all of the data from the source to the target. But, you've also specified that this is going to be a monthly operation so you'd either want to add a truncate step via an Execute SQL Task, or factor in a Lookup Transformation (or two) to determine new versus existing data and change detection.