Using a variable as the replacement instead of a string in awk

I'm using this command in a bash script in order to replace the string "NOTHING_HERE" with "$EMAIL" if it finds the URL "$findURL".
The problem is that I don't know how to tell awk to use the value of the variable $EMAIL instead of the literal string "$EMAIL".
awk -v RS="</Row>" '/'$findURL'/{sub(/NOTHING_HERE/,"$EMAIL")}1' ORS="</Row>" /home/pi/testJMC/JustLinksJMC2.xml | sed '$d'
Any ideas?
Thanks!
Edit: to provide sample input:
<Row ss:AutoFitHeight="0">
<Cell ss:StyleID="s65" ss:HRef="http://www.mapeo-rse.info/promotor/fundaci%C3%B3n-ecolog%C3%AD-y-desarrollo-ecodes"><Data
ss:Type="String">Fundación Ecología y Desarrollo (ECODES)</Data></Cell>
<Cell><Data ss:Type="String">NOTHING_HERE</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell ss:StyleID="s65" ss:HRef="http://www.mapeo-rse.info/promotor/fundaci%C3%B3n-iberoamericana-para-la-gesti%C3%B3n-de-la-calidad-fundibeq"><Data
ss:Type="String">Fundación Iberoamericana para la Gestión de la Calidad (Fundibeq)</Data></Cell>
<Cell><Data ss:Type="String">NOTHING_HERE</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell ss:StyleID="s65" ss:HRef="http://www.mapeo-rse.info/promotor/fundaci%C3%B3n-interamericana-iaf"><Data
ss:Type="String">Fundación Interamericana (IAF)</Data></Cell>
<Cell><Data ss:Type="String">NOTHING_HERE</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell ss:StyleID="s65" ss:HRef="http://www.mapeo-rse.info/promotor/fundaci%C3%B3n-nuevo-periodismo-iberoamericano-fnpi"><Data
ss:Type="String">Fundación Nuevo Periodismo Iberoamericano (FNPI)</Data></Cell>
<Cell><Data ss:Type="String">NOTHING_HERE</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell ss:StyleID="s65" ss:HRef="http://www.mapeo-rse.info/promotor/fundaci%C3%B3n-para-el-desarrollo-sostenible-fundes"><Data
ss:Type="String">Fundación para el Desarrollo Sostenible (FUNDES)</Data></Cell>
<Cell><Data ss:Type="String">NOTHING_HERE</Data></Cell>
</Row>

You need to use awk's way of passing shell variables to awk, the -v name=value syntax:
awk -v RS="</Row>" -v u="$findURL" -v email="$EMAIL" '$~u{sub(/NOTHING_HERE/, email)}1' ORS="</Row>" /home/pi/testJMC/JustLinksJMC2.xml | sed '$d'

You can do it the same way you did for $findURL, by closing and reopening the single quotes:
awk -v RS="</Row>" '/'$findURL'/{sub(/NOTHING_HERE/,'"$EMAIL"')}1' ORS="</Row>" /home/pi/testJMC/JustLinksJMC2.xml | sed '$d'
This should work, but I couldn't test it as you hadn't provided an input snippet.
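To see why the quoting has to look like that, here is a rough sketch of what the shell actually hands to awk (the EMAIL value is made up):
EMAIL='someone@example.com'
# the shell splices the value between the single-quoted pieces, so awk receives
#   sub(/NOTHING_HERE/,"someone@example.com")
# without the inner double quotes awk would get an unquoted bareword and fail
echo '<Cell><Data ss:Type="String">NOTHING_HERE</Data></Cell>' |
awk '{ sub(/NOTHING_HERE/,"'"$EMAIL"'") } 1'
Note that this approach breaks if $EMAIL ever contains a double quote, backslash or ampersand, which is one reason the -v form in the other answer is generally safer.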

Related

Loop Through Collection of XML Records in SQL

I have a dataset that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<return>
<approved>
<callbackTable>
<tableRow max="100" value="10" selectedRow="true" maxRow="112.0">
<tableCell term="72" selectedCell="false" maxCell="73">
<number>21.7</number>
</tableCell>
<tableCell term="74" selectedCell="true" maxCell="75">
<number>21.7</number>
</tableCell>
</tableRow>
<tableRow max="200" value="15" selectedRow="false" maxRow="113.0">
<tableCell term="76" selectedCell="false" maxCell="77">
<number>14.5</number>
</tableCell>
<tableCell term="78" selectedCell="false" maxCell="79">
<number>22.5</number>
</tableCell>
</tableRow>
<tableRow max="300" value="20" selectedRow="false" maxRow="114.0">
<tableCell term="80" selectedCell="false" maxCell="81">
<number>14.6</number>
</tableCell>
<tableCell term="82" selectedCell="false" maxCell="83">
<number>15.7</number>
</tableCell>
</tableRow>
</callbackTable>
</approved>
</return>
</process>
I need to be able to loop through the callbackTable entries and add them to a table named Options.
Here is what I need the data to ultimately look like in the Options table.
Id  Max  Value  SelectedRow  MaxRow  Term  SelectedCell  MaxCell  Number
1   100  10     true         112.0   72    false         73       21.7
2   100  10     true         112.0   74    true          75       21.7
3   200  15     false        113.0   76    false         77       14.5
4   200  15     false        113.0   78    false         79       22.5
5   300  20     false        114.0   80    false         81       14.6
6   300  20     false        114.0   82    false         83       15.7
(Note that the Id column is an identity key and does not need to be populated)
The tricky part is that I don't know exactly how many rows or how many cells are in the callbackTable collection so I will need to loop through the results and insert based on the number of items in the collection.
I could really use some help as I'm not entirely sure where to start.
Thanks in advance!
If you can change the encoding in the XML processing instruction to utf-16 or omit it, try the set-based query below. Note the Id column of the target table is omitted from the column list so that SQL Server will assign the IDENTITY value.
DECLARE @xml xml = N'
<process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<return>
<approved>
<callbackTable>
<tableRow max="100" value="10" selectedRow="true" maxRow="112.0">
<tableCell term="72" selectedCell="false" maxCell="73">
<number>21.7</number>
</tableCell>
<tableCell term="74" selectedCell="true" maxCell="75">
<number>21.7</number>
</tableCell>
</tableRow>
<tableRow max="200" value="15" selectedRow="false" maxRow="113.0">
<tableCell term="76" selectedCell="false" maxCell="77">
<number>14.5</number>
</tableCell>
<tableCell term="78" selectedCell="false" maxCell="79">
<number>22.5</number>
</tableCell>
</tableRow>
<tableRow max="300" value="20" selectedRow="false" maxRow="114.0">
<tableCell term="80" selectedCell="false" maxCell="81">
<number>14.6</number>
</tableCell>
<tableCell term="82" selectedCell="false" maxCell="83">
<number>15.7</number>
</tableCell>
</tableRow>
</callbackTable>
</approved>
</return>
</process>';
INSERT INTO dbo.TargetTable([Max],[Value],[SelectedRow],[MaxRow],[Term],[SelectedCell],[MaxCell],[Number])
SELECT
tableRow.value('data(./@max)', 'varchar(10)')
,tableRow.value('data(./@value)', 'int')
,tableRow.value('data(./@selectedRow)', 'varchar(10)')
,tableRow.value('data(./@maxRow)', 'decimal(10,1)')
,tableCell.value('data(./@term)', 'int')
,tableCell.value('data(./@selectedCell)', 'varchar(10)')
,tableCell.value('data(./@maxCell)', 'int')
,tableCell.value('./number[1]', 'decimal(10,1)')
FROM @xml.nodes('//tableRow') AS tableRow(tableRow)
CROSS APPLY tableRow.nodes('./tableCell') AS tableCell(tableCell); -- relative path so each row is paired only with its own cells

Parse xml file in pandas

I have this XML file (it's called "LogReg.xml") and it contains some information about a logistic regression; I am interested in the names of the features and their coefficients, as I'll explain in more detail below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
<Header>
<Application name="JPMML-SkLearn" version="1.6.35"/>
<Timestamp>2022-02-15T09:44:54Z</Timestamp>
</Header>
<MiningBuildTask>
<Extension name="repr">PMMLPipeline(steps=[('classifier', LogisticRegression())])</Extension>
</MiningBuildTask>
<DataDictionary>
<DataField name="Target" optype="categorical" dataType="integer">
<Value value="0"/>
<Value value="1"/>
</DataField>
<DataField name="const" optype="continuous" dataType="double"/>
<DataField name="grade" optype="continuous" dataType="double"/>
<DataField name="emp_length" optype="continuous" dataType="double"/>
<DataField name="dti" optype="continuous" dataType="double"/>
<DataField name="Orig_FicoScore" optype="continuous" dataType="double"/>
<DataField name="inq_last_6mths" optype="continuous" dataType="double"/>
<DataField name="acc_open_past_24mths" optype="continuous" dataType="double"/>
<DataField name="mort_acc" optype="continuous" dataType="double"/>
<DataField name="mths_since_recent_bc" optype="continuous" dataType="double"/>
<DataField name="num_rev_tl_bal_gt_0" optype="continuous" dataType="double"/>
<DataField name="percent_bc_gt_75" optype="continuous" dataType="double"/>
</DataDictionary>
<RegressionModel functionName="classification" algorithmName="sklearn.linear_model._logistic.LogisticRegression" normalizationMethod="logit">
<MiningSchema>
<MiningField name="Target" usageType="target"/>
<MiningField name="const"/>
<MiningField name="grade"/>
<MiningField name="emp_length"/>
<MiningField name="dti"/>
<MiningField name="Orig_FicoScore"/>
<MiningField name="inq_last_6mths"/>
<MiningField name="acc_open_past_24mths"/>
<MiningField name="mort_acc"/>
<MiningField name="mths_since_recent_bc"/>
<MiningField name="num_rev_tl_bal_gt_0"/>
<MiningField name="percent_bc_gt_75"/>
</MiningSchema>
<Output>
<OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
<OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
</Output>
<RegressionTable intercept="0.8064694059338298" targetCategory="1">
<NumericPredictor name="const" coefficient="0.8013433785974717"/>
<NumericPredictor name="grade" coefficient="0.9010481046582982"/>
<NumericPredictor name="emp_length" coefficient="0.9460686056314133"/>
<NumericPredictor name="dti" coefficient="0.5117062988491518"/>
<NumericPredictor name="Orig_FicoScore" coefficient="0.07944303372859234"/>
<NumericPredictor name="inq_last_6mths" coefficient="0.20516234445402765"/>
<NumericPredictor name="acc_open_past_24mths" coefficient="0.4852503249658917"/>
<NumericPredictor name="mort_acc" coefficient="0.6673203078463711"/>
<NumericPredictor name="mths_since_recent_bc" coefficient="0.1962158305958366"/>
<NumericPredictor name="num_rev_tl_bal_gt_0" coefficient="0.12964661294856686"/>
<NumericPredictor name="percent_bc_gt_75" coefficient="0.04534570018290847"/>
</RegressionTable>
<RegressionTable intercept="0.0" targetCategory="0"/>
</RegressionModel>
</PMML>
I have parsed it using this code:
from lxml import objectify
import pandas as pd

path = 'LogReg.xml'
parsed = objectify.parse(open(path))
root = parsed.getroot()

data = []
for elt in root.RegressionModel.RegressionTable:
    el_data = {}
    for child in elt.getchildren():
        el_data[child.tag] = child.text
    data.append(el_data)

perf = pd.DataFrame(data)
I am interested in parsing this bit:
<RegressionTable intercept="0.8064694059338298" targetCategory="1">
<NumericPredictor name="const" coefficient="0.8013433785974717"/>
<NumericPredictor name="grade" coefficient="0.9010481046582982"/>
<NumericPredictor name="emp_length" coefficient="0.9460686056314133"/>
<NumericPredictor name="dti" coefficient="0.5117062988491518"/>
<NumericPredictor name="Orig_FicoScore" coefficient="0.07944303372859234"/>
<NumericPredictor name="inq_last_6mths" coefficient="0.20516234445402765"/>
<NumericPredictor name="acc_open_past_24mths" coefficient="0.4852503249658917"/>
<NumericPredictor name="mort_acc" coefficient="0.6673203078463711"/>
<NumericPredictor name="mths_since_recent_bc" coefficient="0.1962158305958366"/>
<NumericPredictor name="num_rev_tl_bal_gt_0" coefficient="0.12964661294856686"/>
<NumericPredictor name="percent_bc_gt_75" coefficient="0.04534570018290847"/>
</RegressionTable>
so that I can build the following dictionary:
myDict = {
"const : 0.8013433785974717,
"grade" : 0.9010481046582982,
"emp_length" : 0.9460686056314133,
"dti" : 0.5117062988491518,
"Orig_FicoScore" : 0.07944303372859234,
"inq_last_6mths" : 0.20516234445402765,
"acc_open_past_24mths" : 0.4852503249658917,
"mort_acc" : 0.6673203078463711,
"mths_since_recent_bc" : 0.1962158305958366,
"num_rev_tl_bal_gt_0" : 0.12964661294856686,
"percent_bc_gt_75" : 0.04534570018290847
}
Basically, in the dictionary the Key is the name of the feature and the value is the coefficient of the logistic regression.
Please can anyone help me with the code?
I'm not sure you need pandas for this, but you do need to handle the namespaces in your xml.
Try something along these lines:
myDict = {}

# register the namespace used throughout the PMML file
ns = {'xx': 'http://www.dmg.org/PMML-4_4'}

# you could collapse the next two into one line, but I believe it's clearer this way
rt = root.xpath('//xx:RegressionTable[.//xx:NumericPredictor]', namespaces=ns)[0]
nps = rt.xpath('./xx:NumericPredictor', namespaces=ns)

for np in nps:
    # cast to float so the values match the desired dictionary
    myDict[np.attrib['name']] = float(np.attrib['coefficient'])

myDict
The output should be your expected output.

MAC addresses found on multiple ports

I have processed an access log with:
grep -o -w -E '[[:alnum:]:]{17}.*[0-9]' testlog | awk '{print $1 " " $3}'
Which results in the following (obfuscated):
1. 01:03:96:51:9A:31 3:37
2. 01:03:96:51:9A:31 3:39
3. 00:E0:2B:00:00:01 3:39
4. 3C:A9:F4:1C:68:A4 3:37
5. 01:01:96:51:A6:5E 3:39
6. 01:01:96:51:A6:5E 3:39
How do I print all MACs that are found on multiple ports (in the example, rows 1-2 and 5-6)?
If you have this input:
cat file
01:03:96:51:9A:31 3:37
01:03:96:51:9A:31 3:39
00:E0:2B:00:00:01 3:39
3C:A9:F4:1C:68:A4 3:37
01:01:96:51:A6:5E 3:39
01:01:96:51:A6:5E 3:39
You can do
awk '!seen[$0]++' file
01:03:96:51:9A:31 3:37
01:03:96:51:9A:31 3:39
00:E0:2B:00:00:01 3:39
3C:A9:F4:1C:68:A4 3:37
01:01:96:51:A6:5E 3:39
or
awk '!seen[$0]++' file | sort -k2
01:03:96:51:9A:31 3:37
3C:A9:F4:1C:68:A4 3:37
00:E0:2B:00:00:01 3:39
01:01:96:51:A6:5E 3:39
01:03:96:51:9A:31 3:39
Do you need something like this?
awk '{ seen[$1]++; } END{ for(idx in seen){ if(seen[idx] != 1 ) print idx }}' file
Input file:
01:03:96:51:9A:31 3:37
01:03:96:51:9A:31 3:39
00:E0:2B:00:00:01 3:39
3C:A9:F4:1C:68:A4 3:37
01:01:96:51:A6:5E 3:39
01:01:96:51:A6:5E 3:39
Output:
$ awk '{ seen[$1]++; } END{ for(idx in seen){ if(seen[idx] != 1 ) print idx }}' file
01:01:96:51:A6:5E
01:03:96:51:9A:31
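If you literally want MACs that show up on more than one distinct port (which covers rows 1-2 but not rows 5-6, since those are the same port twice), one sketch, assuming the two-column file shown above, is to count each MAC/port pair only once:
awk '
    # remember each MAC/port combination the first time it appears
    !seen[$1, $2]++ { ports[$1]++ }
    # report MACs associated with more than one distinct port
    END { for (mac in ports) if (ports[mac] > 1) print mac }
' file
For the sample data this prints only 01:03:96:51:9A:31.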
You never need grep if you're using awk. You don't show what your original testlog looks like, so while this will probably produce the output you want, it may not be the best way to do it: it just reuses the logic from your grep, and there may well be a better way in pure awk:
awk '
match($0,/[[:alnum:]:]{17}.*[0-9]/) {
ip = substr($0,RSTART,RLENGTH)
print ip
sub(/ .*/,"",ip)
count[ip]++
}
END {
for (ip in count) {
if (count[ip] > 1) {
printf "IP %s occurs %d times\n", ip, count[ip] | "cat>&2"
}
}
}
' testlog

XSLT 2.0 grouping based on three fields

<passengergroup>
<passengerList>
<passDetails>
<route>LONDON</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>1A</SeatNo>
</seatDetails>
<customervalue>AB</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>LONDON</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>1B</SeatNo>
</seatDetails>
<customervalue>good</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>DELHI</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>2C</SeatNo>
</seatDetails>
<customervalue>BC</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>DELHI</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>2D</SeatNo>
</seatDetails>
<customervalue>okey</customervalue>
</passengerList>
</passengergroup>
<xsl:for-each select="passengergroup/passengerList">
<xsl:if test="customervalue='good'">
<xsl:value-of select="passDetails/route"/><xsl:text> </xsl:text>
<xsl:value-of select="customervalue"/><xsl:text> </xsl:text>
<xsl:value-of select="seatDetails/SeatNo"/>
</xsl:if>
</xsl:for-each>
<xsl:for-each select="passengergroup/passengerList">
<xsl:if test="customervalue='okey'">
<xsl:value-of select="passDetails/route"/><xsl:text> </xsl:text>
<xsl:value-of select="customervalue"/><xsl:text> </xsl:text>
<xsl:value-of select="seatDetails/SeatNo"/>
</xsl:if>
</xsl:for-each>
Output
It will produce output like this:
LONDON good 1A
LONDON good 1B
DELHI okey 2C
DELHI okey 2D
But I need the output like this:
LONDON good 1A 1B
DELHI okey 2C 2D
If 'LONDON good' is repeated many times, it has to be printed only once, but the seat numbers have to be repeated, like '1A 1B 1C 1D 1F 2G' and so on. I am using XSLT 2.0 and my output type is text. The point is that the repeated items should not be displayed more than once.
I have tried a lot but am not able to figure out a solution; please help me out.
IMO your input XML doesn't correspond to the desired output (e.g. there is only one LONDON passenger with customervalue = 'good'), but maybe I don't understand exactly what you need. The following XSLT should do the job.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<xsl:output method="text" />
<xsl:template match="/">
<xsl:apply-templates select="passengergroup" />
</xsl:template>
<xsl:template match="passengergroup">
<xsl:for-each-group select="passengerList" group-by="concat(passDetails/route, ' ', customervalue)">
<xsl:value-of select="current-grouping-key()" />
<xsl:text> </xsl:text>
<xsl:value-of select="current-group()/seatDetails/SeatNo" separator=" " />
<xsl:value-of select="'
'" />
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
For input
<?xml version="1.0" encoding="UTF-8"?>
<passengergroup>
<passengerList>
<passDetails>
<route>LONDON</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>1A</SeatNo>
</seatDetails>
<customervalue>good</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>LONDON</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>1B</SeatNo>
</seatDetails>
<customervalue>good</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>DELHI</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>2C</SeatNo>
</seatDetails>
<customervalue>BC</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>DELHI</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>2D</SeatNo>
</seatDetails>
<customervalue>okey</customervalue>
</passengerList>
</passengergroup>
it produces the following output:
LONDON good 1A 1B
DELHI BC 2C
DELHI okey 2D

Using VB.NET Regular Expressions to Remove Excel XML Conversion

I have the following lines showing up in files that have been converted to XML from an Excel worksheet:
<Worksheet ss:Name="Sheet1">
<Names>
<NamedRange ss:Name="Print_Area" ss:RefersTo="=Sheet1!R30C1:R8642C15"/>
</Names>
<Table ss:ExpandedColumnCount="14" ss:ExpandedRowCount="8655" x:FullColumns="1"
x:FullRows="1" ss:StyleID="s16">
<Column ss:Index="2" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="41.25"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="36"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="35.25"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="38.25" ss:Span="1"/>
<Column ss:Index="8" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="31.5"/>
<Column ss:Index="11" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="30"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="33.75"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="28.5"/>
<Row ss:StyleID="s18">
<Cell ss:StyleID="s17"><Data ss:Type="String">UNITED STATES</Data></Cell>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
</Row>
I am attempting to only remove the <Column .. /> lines. I "thought" I had a pretty good handle on Regular Expressions in VB.NET, but I cannot seem to match these lines. I have tried the following match strings:
'Using (RegexOptions.Multiline)
Private Const Column_MatchExpression As String = "^[\s]*<Column[\s\S]+$"
Private Const Column_MatchExpression As String = " <Column[\s\S]+$"
Private Const Column_MatchExpression As String = "^ <Column[\s\S]+$"
Private Const Column_MatchExpression As String = "^[\s]+<Column[\s\w\W]+$"
Any thoughts on the matter would be appreciated.
What about
"^\s*<Column.*/>\s*$"
?
\<Column[^>]*\>
Should work