gawk printf missing characters - awk

I'm trying to create a script in (g)AWK in which I'd like to put the following EXACT lines at the beginning of the output text file:
<?xml version="1.0" encoding="UTF-8"?>
<notes version="1">
<labels>
<label id="0" color="30DBFF">Custom Label 1</label>
<label id="1" color="30FF97">Custom Label 2</label>
<label id="2" color="E1FF80">Custom Label 3</label>
<label id="3" color="FF9B30">Custom Label 4</label>
<label id="4" color="FF304E">Custom Label 5</label>
<label id="5" color="FF30D7">Custom Label 6</label>
<label id="6" color="303EFF">Custom Label 7</label>
<label id="7" color="1985FF">Custom Label 8</label>
</labels>
and this one to the end:
</notes>
Here is my script so far:
BEGIN {printf("<?xml version="1.0" encoding="UTF-8"?>\n") > "notes.sasi89.xml"}
END {printf("</notes>") > "notes.sasi89.xml"}
My problem is that it's not printing the way I'd like, it gives me this in the output file:
<?xml version=1 encoding=-8?>
</notes>
Some characters and quotes are missing. I've tried studying the manuals, but they sound too complicated to me. I would appreciate it if someone could give me a hand or point me in the right direction.

Answer is Community Wiki to give what credit can be given where credit is due.
Primary problem and solution
As swstephe noted in a comment:
You need to escape your quotes:
printf("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
Anti-patterns
I regard your outline script as an anti-pattern (actually, two anti-patterns). You have:
BEGIN {printf("<?xml version="1.0" encoding="UTF-8"?>\n") > "notes.sasi89.xml"}
END {printf("</notes>") > "notes.sasi89.xml"}
The anti-patterns are:
You repeat the file name; you shouldn't. You would do better to use:
BEGIN {file = "notes.sasi89.xml"
printf("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n") > file}
END {printf("</notes>") > file}
You shouldn't be doing the I/O redirection in the awk script in the first place. You should let the shell do the I/O redirection.
awk '
BEGIN {printf("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")}
END {printf("</notes>")}
' > notes.sasi89.xml
There are times when I/O redirection in the script is appropriate, but that's when you need output to multiple files. When, as appears very probable here, you have just one output file, make the script write to standard output and have the shell do the I/O redirection. It is much more flexible; you can rename the file more easily, and send the output to other programs via a pipe, etc, which is very much harder if you have the output file name embedded in the awk script.
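Putting the two points together, a minimal sketch might look like the following. It assumes a POSIX shell and any awk; the function name make_notes is made up for illustration, and only one of the label lines is shown because the rest follow the same pattern.

```shell
#!/bin/sh
# Sketch: write the XML header and footer to standard output and let the
# shell decide where the output goes. make_notes is a hypothetical name.
make_notes() {
    awk '
    BEGIN {
        # A literal double quote inside an awk string is escaped with a backslash.
        print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        print "<notes version=\"1\">"
        print "<labels>"
        print "<label id=\"0\" color=\"30DBFF\">Custom Label 1</label>"
        print "</labels>"
    }
    # Rules that process the input records would go here.
    END { print "</notes>" }
    ' "$@"
}

# Usage (the shell, not awk, names the output file):
#   make_notes input.txt > notes.sasi89.xml
```

Because the file name lives on the command line, the same function can feed a pipe or a different file without touching the awk code.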

Related

Read .csv table and print it as SQL query output

I am using a bastardised version of T-SQL to generate reports about information within a database driven CAD software (Solidworks Electrical). I am trying to generate a Table of Contents. Due to limitations within the software, I have to generate this table using SQL.
What I would like to do is create the Table of Contents in Excel, save it as a .csv, and have my SQL query read this file and spit it out as an output.
Example Table:
Sheet,System
1,Radios
2,Processors
3,Navigation
After some searching I've been unable to find a solution myself. My problems are:
1) Read a .csv file stored on my harddrive
2) Turn this .csv file into a table (it can't be stored in the database; it is just temporary while we run the query)
3) Output the data in this table as the results of the query
I have tried to use the following to read my .csv table, but I receive the error "Syntax error, permission violation, or other nonspecific error". So it's possible my software just won't allow me to read external files. (NB: my software uses ]] [[ instead of quotes.)
select
]]col1[[,
]]col2[[,
]]col3[[
from openrowset('MSDASQL'
,'Driver={Microsoft Access Text Driver (*.txt, *.csv)}'
,'select * from D:\SQL Queries\input.CSV')
Any assistance would be much appreciated! Thanks
This sql works for me:
select * from openrowset (bulk N'C:\Temp\source.csv', formatfile = N'C:\Temp\format.xml', firstrow=2) SourceFile
Content of source.csv is this:
Sheet,System
1,Radios
2,Processors
3,Navigation
Content of format.xml is this:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="128" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="Name" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>

Converting Complex XML to CSV

I have some (complex to me) XML that I need to convert into CSV. I need absolutely every value added to the CSV for every submission. I have tried a few basic things, but I can't get past the deep nesting and the different structures of this file.
Could someone please help me with a PowerShell script that would do this? I have made a start, but I cannot get all of the data out; I only get Canvas Results.
Submissions.xml is too large to post here (102 KB).
$d=([xml](gc submissions.xml)).CANVASRESULTS | % {
foreach ($i in $_.CANVASRESULTS) {
$o = New-Object Object
Add-Member -InputObject $o -MemberType NoteProperty -Name Submissions -Value $_.Submission
Add-Member -InputObject $o -MemberType NoteProperty -Name Submission -Value $i
$o
}
}
$d | ConvertTo-Csv -NoTypeInformation -Delimiter ","
Any time a complex XML file has deeply nested structures and you need to migrate it into a flat file format (e.g., txt, csv, xlsx, sql), consider using XSLT to simplify the XML first. For background, XSLT is a declarative, special-purpose programming language used to style, re-format, and re-structure XML/HTML and other SGML markup documents for various end-use purposes. As an aside, SQL is also a declarative, special-purpose programming language.
For most software to import XML into a flat, two-dimensional format of rows and columns, the XML file must consist of repeating elements (i.e., rows/records), each with one level of children for the columns/fields:
<data>
<row>
<column1>value</column1>
<column2>value</column2>
<column3>value</column3>
...
</row>
<row>
...
</data>
Nearly every programming language has access to an XSLT processor, including PowerShell, Java, C#, Perl, PHP, Python, SAS, and even VBA in your everyday MS Excel. For your complex XML, below is an example XSLT stylesheet, followed by its output. Note that I create the nodes manually, based on values from the original XML:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="CanvasResult">
<Data>
<xsl:for-each select="//Responses">
<Submission>
<Fitter><xsl:value-of select="Response[contains(Label, 'Fitter Name')]/Value"/></Fitter>
<Date><xsl:value-of select="Response[Label='Date']/Value"/></Date>
<Time><xsl:value-of select="Response[Label='Time']/Value"/></Time>
<Client><xsl:value-of select="Response[Label='Client']/Value"/></Client>
<Machine><xsl:value-of select="Response[Label='Machine']/Value"/></Machine>
<Hours><xsl:value-of select="Response[Label='Hours']/Value"/></Hours>
<Signature><xsl:value-of select="Response[Label='Signature']/Value"/></Signature>
<SubmissionDate><xsl:value-of select="Response[Label='Submission Date:']/Value"/></SubmissionDate>
<SubmissionTime><xsl:value-of select="Response[Label='Submission Time:']/Value"/></SubmissionTime>
<Customer><xsl:value-of select="Response[Label='Customer:']/Value"/></Customer>
<PlantLocation><xsl:value-of select="Response[Label='Plant Location']/Value"/></PlantLocation>
<PlantType><xsl:value-of select="Response[Label='Plant Type:']/Value"/></PlantType>
<PlantID><xsl:value-of select="Response[Label='Plant ID:']/Value"/></PlantID>
<PlantHours><xsl:value-of select="Response[Label='Plant Hours:']/Value"/></PlantHours>
<RegoExpiryDate><xsl:value-of select="Response[Label='Rego Expiry Date:']/Value"/></RegoExpiryDate>
<Comments><xsl:value-of select="Response[Label='Comments:']/Value"/></Comments>
</Submission>
</xsl:for-each>
</Data>
</xsl:template>
</xsl:stylesheet>
Output
<?xml version='1.0' encoding='UTF-8'?>
<Data>
...
<Submission>
<Fitter>Damian Stewart</Fitter>
<Date/>
<Time/>
<Client/>
<Machine/>
<Hours/>
<Signature/>
<SubmissionDate>28/09/2015</SubmissionDate>
<SubmissionTime>16:30</SubmissionTime>
<Customer>Dicks Diesels</Customer>
<PlantLocation/>
<PlantType>Dozer</PlantType>
<PlantID>DZ09</PlantID>
<PlantHours>2213.6</PlantHours>
<RegoExpiryDate>05/03/2016</RegoExpiryDate>
<Comments>Moving tomorrow from Daracon BOP to KCE BOP S6A Dam
Cabbie to operate</Comments>
</Submission>
...
</Data>
From there, you can import the two-dimensional XML into a usable rows/columns format. Below is the same import into an MS Access database and an MS Excel spreadsheet. You will notice gaps in the data where the XML content did not populate the created nodes (handled in the XSLT). A simple SQL cleanup can produce the final dataset.
Database Import

shell script to add to the

I need to append to an XML file by reading from a CSV file.
Input file pattern:
test.dat
account,bill,bill seg
12345,12445,121
14456,14467,903
14456,14467,903
I need to add each line to the XML file. Sample XML file:
<?xml version="1.0" encoding="UTF-8"?>
<BusinessConfiguration
xmlns="http://www.portal.com/schemas/BusinessConfig"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.portal.com/schemas/BusinessConfig BusinessConfiguration.xsd">
<!-- Sample input file for pin_bill_accts containing parameters for bill run management -->
<!-- Modify according to guidelines -->
<BillRunConfiguration>
<!-- List of DOMs for this bill run -->
<DOMList>
<DOM>---1</DOM>
</DOMList>
<BillSegmentList>
</BillSegmentList>
<!-- List of billing segments for this bill run -->
<BillingList>
</BillingList>
I need to add the first 2 records in the CSV under the tag, like:
<BillingList>
<Account>12345</Account>
<Billinfo>12445</Billinfo>
</BillingList>
<BillingList>
<Account>14456</Account>
<Billinfo>14467</Billinfo>
</BillingList>
I have the following code currently:
#!/bin/awk -f
NR==FNR{ a[NR]=$0; next; }
/<BillingList>/{
print;
gsub("'","",a[++i]);
n=split(a[i],arr,",");
if( n!= 3) { next }
print "\t<Account>",arr[1],"</Account>";
print "\t<Billinfo>",arr[2],"</Billinfo>";
next;
}1
but it only replaces the first record and then stops.
I am completely new to unix so please help me in resolving this.
this will give you the segment
$ awk -F, '                                # use comma as field separator
NR > 1 && NF {                             # skip first line and blank lines
    printf "<BillingList>"                 # print the following, all on one line
    printf "<Account>%s</Account>", $1     # insert first field
    printf "<Billinfo>%s</Billinfo>", $2   # insert second field
    print  "</BillingList>"                # close the segment and end the line
}' data.txt
you can add newlines at the right spots if you want.
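To go one step further and place the generated segments inside the existing XML, a two-file awk pass is one option. The sketch below makes several assumptions that are not in the question: the helper name merge_billing is made up, the opening and closing BillingList tags each sit on their own line as in the sample, and duplicate account/bill pairs in the CSV should produce only one block.

```shell
#!/bin/sh
# Sketch only: merge_billing is a hypothetical helper name.
# Usage: merge_billing test.dat input.xml > merged.xml
merge_billing() {
    awk -F, '
    # First file (CSV): build one block per distinct account/bill pair,
    # skipping the header line and blank lines.
    NR == FNR {
        if (FNR > 1 && NF >= 2 && !seen[$1 FS $2]++) {
            blocks = blocks "<BillingList>\n"
            blocks = blocks "<Account>" $1 "</Account>\n"
            blocks = blocks "<Billinfo>" $2 "</Billinfo>\n"
            blocks = blocks "</BillingList>\n"
        }
        next
    }
    # Second file (XML): replace the empty BillingList element with the blocks.
    /<BillingList>/           { printf "%s", blocks; skip = 1; next }
    /<\/BillingList>/ && skip { skip = 0; next }
    !skip                     { print }
    ' "$1" "$2"
}
```

The output goes to standard output, so the original XML file is left untouched until you are happy with the result.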

Error using SSIS Bids in Visual Studio 2012

I have to load several tables into SQL Server 2012 from SQL Server 2000. I heard BIDS could do this, but I'm pretty new to it and would like some help. I would really appreciate whatever help I can get with it.
I have already installed BIDS Helper and used the code below, but it gives me errors stating:
Error 1187 Illegal syntax. Expecting valid start name character.
Error 1188 Character '#', hexadecimal value 0x23 is illegal in an XML name.
Error 1189 The character '@', hexadecimal value 0x40 is illegal at the beginning of an XML name.
<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ import namespace="System.IO" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<!--
<#
string connectionStringSource = @"Server=xxxxx;Initial Catalog=xxxx;Integrated Security=SSPI;Provider=sqloledb";
string connectionStringDestination = @"Server=xxxxxx;Initial Catalog=xxxxxxx;Integrated Security=SSPI;Provider=SQLNCLI11.1";
string SrcTableQuery = @"
SELECT
SCHEMA_NAME(t.schema_id) AS schemaName
, T.name AS tableName
FROM
sys.tables AS T
WHERE
T.is_ms_shipped = 0
AND T.name <> 'sysdiagrams';
";
DataTable dt = null;
dt = ExternalDataAccess.GetDataTable(connectionStringSource, SrcTableQuery);
#>
-->
<Connections>
<OleDbConnection
Name="SRC"
CreateInProject="false"
ConnectionString="<#=connectionStringSource#>"
RetainSameConnection="false">
</OleDbConnection>
<OleDbConnection
Name="DST"
CreateInProject="false"
ConnectionString="<#=connectionStringDestination#>"
RetainSameConnection="false">
</OleDbConnection>
</Connections>
<Packages>
<# foreach (DataRow dr in dt.Rows) { #>
<Package ConstraintMode="Linear"
Name="<#=dr[1].ToString()#>"
>
<Variables>
<Variable Name="SchemaName" DataType="String"><#=dr[0].ToString()#></Variable>
<Variable Name="TableName" DataType="String"><#=dr[1].ToString()#></Variable>
<Variable Name="QualifiedTableSchema"
DataType="String"
EvaluateAsExpression="true">"[" + @[User::SchemaName] + "].[" + @[User::TableName] + "]"</Variable>
</Variables>
<Tasks>
<Dataflow
Name="DFT"
>
<Transformations>
<OleDbSource
Name="OLE_SRC <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
ConnectionName="SRC"
>
<TableFromVariableInput VariableName="User.QualifiedTableSchema"/>
</OleDbSource>
<OleDbDestination
Name="OLE_DST <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
ConnectionName="DST"
KeepIdentity="true"
TableLock="true"
UseFastLoadIfAvailable="true"
KeepNulls="true"
>
<TableFromVariableOutput VariableName="User.QualifiedTableSchema" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<# } #>
</Packages>
</Biml>
This is the maddening thing about trying to do much BimlScript in Visual Studio. The editor "knows" it's doing XML markup so all of the enhancements that make up BimlScript are "wrong" and so it's going to highlight them and put angry red squigglies and have you question whether you really have valid code here.
In my Error List, I see the same things you're seeing but this is one of the few times you can ignore Visual Studio's built in error checker.
Instead, the true test of whether the code is good is right click on the .biml file(s) and select "Check Biml for errors"
You should get a dialogue like
If so, click Generate SSIS Packages and then get some tape to attach your mind back into your head as it's just been blown ;)
Operational note
Note that the supplied code is going to copy all of the data from the source to the target. But, you've also specified that this is going to be a monthly operation so you'd either want to add a truncate step via an Execute SQL Task, or factor in a Lookup Transformation (or two) to determine new versus existing data and change detection.

Using GREP / RegEx to find and replace string

So, I'm trying to migrate a database from Textpattern CMS to something more generic. There are some textpattern-specific commands inside of articles that pull in images. I want to turn these into generic HTML image links. At the moment, they look like this in the sql file:
<txp:upm_image image_id="4" form="dose" />
I want to turn these into something more like this:
<img src="4.jpg" class="dose" />
I've had some luck with TextWrangler doing some regex stuff, but I'm stumped. Any ideas on how to find & replace all of these image paths?
EDIT:
For future reference, here's what I ended up doing in PHP to output it:
$body = $post['Body_html'];
$pattern = '/txp:upm_image image_id="([0-9]+)" form="([^"]*)"/i';
$replacement = 'img src="/images/$1.jpg" class="$2"';
$body = preg_replace($pattern, $replacement, $body);
// outputs <img src="/images/59.jpg" class="dose" />
I wouldn't use grep; it's sed you want
$ echo '<txp:upm_image image_id="4" form="dose" />' | sed -e 's/^.*image_id="\([[:digit:]]*\)".*form="\([[:alpha:]]*\)".*/<img src="\1.jpg" class="\2" \/>/'
<img src="4.jpg" class="dose" />
$
if your class has alphanumeric characters, use [[:alnum:]]
(works on macos darwin)
Not sure which tool you are using but try this regex solution: Search for this:
<txp:upm_image\s+image_id="(\d+)"\s+form="([^"]*)"\s*\/>
And replace with this:
<img src="$1.jpg" class="$2" />
Note that this only works for txp tags having the same form as your example. It will fail if there are txp tags having extra attributes, or if they are in a different order.
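For a whole dump file, the same search and replace can be sketched with sed -E (extended regular expressions, available in both GNU and BSD sed). The function name txp_to_img is made up for illustration. Unlike a pattern anchored with leading and trailing .*, this one leaves the rest of each line alone and converts every tag on the line; the limitation about extra or reordered attributes still applies.

```shell
#!/bin/sh
# Sketch: txp_to_img is a hypothetical name. It reads the named files
# (or standard input) and writes the converted text to standard output.
txp_to_img() {
    # "#" is used as the s command delimiter so "/>" needs no escaping.
    sed -E 's#<txp:upm_image[[:space:]]+image_id="([0-9]+)"[[:space:]]+form="([^"]*)"[[:space:]]*/>#<img src="\1.jpg" class="\2" />#g' "$@"
}

# Usage:
#   txp_to_img dump.sql > dump.converted.sql
```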