I have file1 with records that I want to find in file2 and mask with #, writing the result to file3. Only the alphanumeric characters of each matching record should be replaced. With the code below I don't get the expected output. What am I doing wrong?
file_read=`cat file2`
while read line; do
var=`echo $line | tr '[a-zA-Z0-9]' '#'`
rep=`echo $file_read | awk "{gsub(/$line/,\"$var\"); print}"`
done < file1
echo file2 > file3
cat file1
2001009
#vanti Finserv Co.
2001009
Fund #1
11:11 - Capital
MS&CO(NY)
American Friends Org, Inc. 12X32
Domain-Name (LLC)
MS&CO(NY)
MS&CO(NY)
Ivy/Estate Rd
E*Trade wholesale
cat file2
<html>
<body>
<hr><br><>span class="table">Records</span><table>
<tr class="column">
<td>Rec1</td>
<td>Rec2</td>
<td>Rec3</td>
<td>Rec4</td>
<td>Rec5</td>
<td>Rec6</td>
<td>Rec7</td>
<td>Rec8</td>
</tr>
<tr class="data">
<td>#vanti Finserv Co.</td>
<td>11:11 - Capital</td>
<td>MS&CO(NY)</td>
<td>New York</td>
<td>CDX98XSD</td>
<td>E*Trade wholesale</td>
<td>Domain-Name (LLC)</td>
<td>Ivy/Estate Rd</td>
<td></td>
</tr>
<tr class="data">
<td>#vanti Finserv Co.</td>
<td></td>
<td>MS&CO(NY)</td>
<td>2</td>
<td>2</td>
<td>MS&CO(NY)</td>
<td>MS&CO(NY)</td>
<td>Ivy/Estate Rd</td>
</table>
</body>
</html>
expected output
cat file3
<html>
<body>
<hr><br><>span class="table">Records</span><table>
<tr class="column">
<td>Rec1</td>
<td>Rec2</td>
<td>Rec3</td>
<td>Rec4</td>
<td>Rec5</td>
<td>Rec6</td>
<td>Rec7</td>
<td>Rec8</td>
</tr>
<tr class="data">
<td>###### ####### ##.</td>
<td>##:## - #######</td>
<td>##&##(##)</td>
<td>New York</td>
<td>CDX98XSD</td>
<td>#*##### ########</td>
<td>######-#### (###)</td>
<td>###/###### ##</td>
<td></td>
</tr>
<tr class="data">
<td>###### ####### ##.</td>
<td></td>
<td>##&##(##)</td>
<td>2</td>
<td>2</td>
<td>##&##(##)</td>
<td>##&##(##)</td>
<td>###/###### ##</td>
</table>
</body>
</html>
Would you please try the following:
awk '
NR==FNR {s = $0; gsub("[[:alnum:]]", "#"); a[s] = $0; next}
{
if (match($0, ">[^<]+")) {
str = substr($0, RSTART+1, RLENGTH-1)
if (str in a) {
$0 = substr($0, 1, RSTART) a[str] substr($0, RSTART+RLENGTH)
}
}
}
1 ' file1 file2 > file3
It assumes the strings to be replaced are enclosed in tags, but it will work with the example shown.
You seem to be looking for something like
awk 'NR==FNR {
regex = $0;
gsub(/[][(){}|\\*+?.^$]/, "\\\\&", regex);
a[++n] = regex;
gsub(/[A-Za-z0-9]/, "#");
gsub(/&/, "\\\\&");
b[n] = $0;
next
}
{ for(i=1;i<=n;++i)
gsub(a[i], b[i])
} 1' file1 file2 >file3
In brief, we populate the array a with the phrases from file1 (escaped for use as regexes), and b with the corresponding replacement strings. The condition NR==FNR is true only while the first input file is being read; for file2 we fall through to the rest of the script, which simply replaces any match for a pattern in a with the corresponding string from b, and prints every line.
The code is complicated somewhat by the escaping of regex metacharacters in a and further by the fact that & in the replacement string needs to be escaped, too (& alone recalls the matched text).
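The two escaping steps can be seen in isolation on a single phrase. This is a minimal sketch, assuming a POSIX-style awk; the phrase variable and the one-line sample input are illustrative and not part of the answer above.

```shell
# Illustrative only: apply the same two escaping steps to one phrase.
phrase='MS&CO(NY)'
out=$(printf '%s\n' '<td>MS&CO(NY)</td>' |
  awk -v phrase="$phrase" '
    BEGIN {
      regex = phrase
      # Escape regex metacharacters so the phrase is matched literally.
      gsub(/[][(){}|\\*+?.^$]/, "\\\\&", regex)
      repl = phrase
      # Mask letters and digits, then escape & so gsub inserts a literal "&".
      gsub(/[A-Za-z0-9]/, "#", repl)
      gsub(/&/, "\\\\&", repl)
    }
    { gsub(regex, repl); print }
  ')
printf '%s\n' "$out"
```

Without the second gsub on repl, the bare & in the replacement would re-insert the matched text instead of a literal ampersand.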
Demo: https://ideone.com/YkAkAZ
You generally want to avoid while read loops in the shell; Awk is much faster and more idiomatic when you want to perform some transformation on all lines in a file.
As a further aside, please try http://shellcheck.net/ before asking for human assistance. Even after you fixed syntax errors pointed out in comments, your attempt contains common beginner errors such as broken quoting.
Related
Hello, I am trying to match a pattern in some HTML files using awk, but I don't seem to have any luck with it.
So for my pattern to match it should have the following
<tr>
<td>Failures</td>
<td>0</td>
</tr>
<tr>
<td>Warnings</td>
<td>4</td>
</tr>
<tr>
<td>Errors</td>
<td>0</td>
</tr>
<tr>
<td>Not Applicable</td>
<td>53</td>
</tr>
<tr>
<td>Manual Checks</td>
<td>9</td>
</tr>
Failures and Manual Checks should both be zero. In the file above, Failures is 0 and Manual Checks is 9, so it should not match; I need a match only when both are 0.
So I tried with and without escaping the newline, but awk is not returning any results.
find . -name "*.html" -exec awk '/td\>Failures\<\/td\>\\n.*\<td\>0/ {print FILENAME}' '{}' \;
I have also tried other combinations like the one below, but I can't figure out why awk is not going to the next line.
find . -name "*.html" -exec awk '/td\>Failures\<\/td\>\\n\[\^\\\<\]\+\<td\>0/ {print FILENAME}' '{}' \;
Can anyone please have a look and tell me what I am missing?
If your html files are well-formed xml, then xmlstarlet will work:
find . -name '*.html' \
-exec xmlstarlet sel -t \
--if '//tr[td[1] = "Failures" and td[2] = "0"]' \
--if '//tr[td[1] = "Manual Checks" and td[2] = "0"]' \
--inp-name --nl \
'{}' \;
In words: if there's a row where the first cell is Failures and the second cell is 0,
and if there's a row where the first cell is Manual Checks and the second cell is 0,
then print the input filename and a newline.
A more reliable solution is going to be based on a tool designed to parse html; having said that ...
One awk idea using a couple custom regex patterns:
$ cat regex.awk
BEGIN { RS="^$" # whole file treated as a single line of input
regex1="<td>Manual Checks</td>[[:space:]]+<td>0</td>"
regex2="<td>Failures</td>[[:space:]]+<td>0</td>"
}
$0 ~ regex1 && $0 ~ regex2 {print FILENAME}
NOTE: placing the code in a file (regex.awk) will make the follow-on find/awk quite a bit cleaner
Sample input:
$ cat f1.html
... snip ...
<td>Failures</td>
<td>0</td> # match
... snip ...
<td>Manual Checks</td>
<td>9</td> # not a match
... snip ...
$ cat f2.html
... snip ...
<td>Failures</td>
<td>0</td> # match
... snip ...
<td>Manual Checks</td>
<td>0</td> # match
... snip ...
NOTE: comments added for clarification; comments do not exist in the actual files
Adding this to a find call:
$ find . -name "f?.html" -exec awk -f regex.awk '{}' \;
./f2.html
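Setting RS="^$" relies on awks that treat a multi-character RS as a regex (gawk does). Where that is not available, the same two-regex test can be run on a buffer built line by line. A rough sketch under that assumption; the temporary directory and the cut-down sample files are made up for the demonstration.

```shell
dir=$(mktemp -d)
printf '%s\n' '<td>Failures</td>' '<td>0</td>' '<td>Manual Checks</td>' '<td>9</td>' > "$dir/f1.html"
printf '%s\n' '<td>Failures</td>' '<td>0</td>' '<td>Manual Checks</td>' '<td>0</td>' > "$dir/f2.html"

matches=$(for f in "$dir"/f1.html "$dir"/f2.html; do
  awk '
    { buf = buf $0 "\n" }                       # collect the whole file
    END {
      # Same two tests as regex1 and regex2, applied to the buffer.
      if (buf ~ /<td>Manual Checks<\/td>[ \t\n]+<td>0<\/td>/ &&
          buf ~ /<td>Failures<\/td>[ \t\n]+<td>0<\/td>/)
        print FILENAME
    }
  ' "$f"
done)
printf '%s\n' "$matches"
rm -rf "$dir"
```

Only the path of f2.html is printed, since f1.html has a Manual Checks value of 9.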
Using any awk in any shell on every Unix box:
$ cat tst.awk
gsub("^[[:space:]]*<td>|</td>[[:space:]]*$","") {
if ( ++cnt % 2 ) {
tag = $0
}
else {
f[tag] = $0+0
}
}
END {
if ( (f["Failures"] == 0) && (f["Manual Checks"] == 0) ) {
print FILENAME
}
}
$ awk -f tst.awk file
The above creates an array f[] that maps the tags (names) of the cells to their values so then in the END section you can do whatever test you like on whatever combination of them you like.
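As an example of testing a different combination in END, the sketch below reports files with no failures but at least one warning. The file name report.html, the Warnings condition, and the output format are invented for illustration; [ \t] is used instead of [[:space:]] only to stay friendly to older awks.

```shell
# Hypothetical sample file, built from rows shown in the question.
dir=$(mktemp -d)
cat > "$dir/report.html" <<'EOF'
<tr>
<td>Failures</td>
<td>0</td>
</tr>
<tr>
<td>Warnings</td>
<td>4</td>
</tr>
<tr>
<td>Manual Checks</td>
<td>9</td>
</tr>
EOF

result=$(awk '
  # Strip the <td> wrapper; gsub returns non-zero only on cell lines.
  gsub("^[ \t]*<td>|</td>[ \t]*$", "") {
    if (++cnt % 2) {
      tag = $0              # odd cells hold the name
    } else {
      f[tag] = $0 + 0       # even cells hold the value
    }
  }
  END {
    # Any combination can be tested here.
    if (f["Failures"] == 0 && f["Warnings"] > 0) {
      print "failures=" f["Failures"], "warnings=" f["Warnings"], "manual=" f["Manual Checks"]
    }
  }
' "$dir/report.html")
printf '%s\n' "$result"
rm -rf "$dir"
```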
I have a file with the following lines. I can filter on a specific word and display the lines above and below it. However, I also want to remove those lines from the original file and append them to a new file.
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>green</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
I can do this with grep -i green origfile -A1 -B1 >> newfile, but how can I remove those lines from the original file?
origfile:
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
newfile:
<tr>
<td>tree</td><td>apple</td><td>green</td>
</tr>
Is there a cleaner or quicker way to do it?
You could do it within a single awk, splitting the records between two files. This looks for the word green, writes the matching line together with the line before and the line after it to a new file, and removes those lines from the original file.
awk '
FNR==NR{
if($0~/green/){
words[FNR]
}
next
}
((FNR+1) in words) || (FNR in words) || ((FNR-1) in words){
print > "newfile"
next
}
1
' Input_file Input_file > temp && mv temp Input_file
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when first time Input_file is being read.
if($0~/green/){ ##Checking condition if line contains green string then do following.
words[FNR] ##Creating array of words with index of current line number.
}
next ##next will skip all further statements from here.
}
((FNR+1) in words) || (FNR in words) || ((FNR-1) in words){
##Checking condition if current line+1 OR current line OR current line-1 numbers are in words array then do following.
print > "newfile" ##Printing current line into newfile output file.
next ##next will skip all further statements from here.
}
1 ##Printing current line here.
' Input_file Input_file > temp && mv temp Input_file
##Mentioning Input_file(s) and doing inplace save into it.
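The two-pass approach can be tried on a scratch copy before touching real data. A sketch assuming a POSIX-style awk; the temporary directory and the pat and out variables are additions for the demonstration, not part of the answer above.

```shell
dir=$(mktemp -d)
cat > "$dir/origfile" <<'EOF'
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>green</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
EOF

awk -v pat='green' -v out="$dir/newfile" '
  FNR == NR { if ($0 ~ pat) hit[FNR]; next }   # pass 1: remember matching line numbers
  ((FNR + 1) in hit) || (FNR in hit) || ((FNR - 1) in hit) {
    print > out                                # pass 2: matching line and its neighbours
    next
  }
  1                                            # every other line stays in the original
' "$dir/origfile" "$dir/origfile" > "$dir/temp" && mv "$dir/temp" "$dir/origfile"

kept=$(cat "$dir/origfile")
moved=$(cat "$dir/newfile")
printf 'kept:\n%s\nmoved:\n%s\n' "$kept" "$moved"
rm -rf "$dir"
```

Because the selection is by line number, this relies on every row being exactly three lines long, as in the sample.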
$ cat tst.awk
$0 == "<tr>" { inRow=1; row=$0; next }
inRow {
row = row ORS $0
if ( $0 == "</tr>" ) {
inRow = 0
if ( index(row,"<td>green</td>") ) {
print row | "cat>&2"
next
}
else {
$0 = row
}
}
}
!inRow
$ awk -f tst.awk file >o1 2>o2
$ head o?
==> o1 <==
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
==> o2 <==
<tr>
<td>tree</td><td>apple</td><td>green</td>
</tr>
To modify the original file:
$ awk -f tst.awk file >o1 2>o2 && mv o1 file
$ cat file
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
<tr>
<td>tree</td><td>apple</td><td>red</td>
</tr>
Here is an ed solution.
#!/usr/bin/env bash
ed -s origfile.txt <<-EOF
/<td>green<\/td>/;?^<tr>?;/^<\/tr>/w newfile.txt
.;/^<\/tr>/d
w
q
EOF
Or use a separate ed script, named script.ed:
/<td>green<\/td>/;?^<tr>?;/^<\/tr>/w newfile.txt
.;/^<\/tr>/d
w
q
Then
ed -s origfile.txt < script.ed
I have data like below in a csv file
ServerName,Index,Status
10.xxx.xx.xx,1.5.1.1,2
10.xxx.xx.xx,1.5.1.2,3
I need to convert this data to HTML and also color the row if the value of "Status" is 3, 4, or 5.
Please help me with this.
I tried the following:
awk 'BEGIN{
FS=","
print "<HTML>""<TABLE border="1"><TH>JOB_NAME</TH><TH>RUN_DATE</TH><TH>STATUS</TH>"
}
{
printf "<TR>"
for(i=1;i<=NF;i++)
printf "<TD>%s</TD>", $i
print "</TR>"
}
END{
print "</TABLE></BODY></HTML>"
}
' 10.106.40.45_FinalData.csv > file.html
sed -i "s/2/<font color="green">2<\/font>/g;s/4/<font color="red">4<\/font>/g;s/5/<font color="red">5<\/font>/g;" file.html
In the latest code I tried, I need to check only the value of the Status column and color that cell.
$ cat tst.awk
BEGIN{
FS = ","
colors[3] = "red"
colors[4] = "green"
colors[5] = "blue"
print "<HTML><BODY>"
print "<TABLE border=\"1\">"
print "<TR><TH>JOB_NAME</TH><TH>RUN_DATE</TH><TH>STATUS</TH></TR>"
}
NR>1 {
printf "<TR>"
for (i=1; i<=NF; i++) {
if ( (i == NF) && ($i in colors) ) {
on = "<font color=\"" colors[$i] "\">"
off = "</font>"
}
else {
on = off = ""
}
printf "<TD>%s%s%s</TD>", on, $i, off
}
print "</TR>"
}
END {
print "</TABLE>"
print "</BODY></HTML>"
}
$ awk -f tst.awk file
<HTML><BODY>
<TABLE border="1">
<TR><TH>JOB_NAME</TH><TH>RUN_DATE</TH><TH>STATUS</TH></TR>
<TR><TD>10.xxx.xx.xx</TD><TD>1.5.1.1</TD><TD>2</TD></TR>
<TR><TD>10.xxx.xx.xx</TD><TD>1.5.1.2</TD><TD><font color="red">3</font></TD></TR>
</TABLE>
</BODY></HTML>
You don't actually say what the problem is, but I presume it's colorizing the numbers when they appear in the addresses also?
The best solution is probably to add a conditional into your awk script (untested):
if (i == 3 && $i == 2) {
print "<TD><font color=\"green\">2</font></TD>"
} else .....
Alternatively, your status field is the only thing in its cell, whereas the digits in the addresses are not, so you can adjust your pattern match:
's/>2</><font color="green">2<\/font></g;......'
I.e. match the surrounding brackets.
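To illustrate why anchoring on the surrounding brackets helps, here is a small sketch. The sample row is made up (its address deliberately contains a 4), and the colour simply follows the question's sed attempt.

```shell
row='<TR><TD>10.2.4.5</TD><TD>1.5.1.2</TD><TD>4</TD></TR>'
# ">4<" only matches a cell whose entire content is 4, so digits inside
# the address and index cells are left alone.
colored=$(printf '%s\n' "$row" |
  sed 's/>4</><font color="red">4<\/font></g')
printf '%s\n' "$colored"
```

Single quotes around the sed script keep the inner double quotes intact; inside a double-quoted script they would need to be escaped.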
You can also use jq for this task.
jq structures the CSV data instead of working only on a text basis. This makes it easy to remove empty rows or to colour only the 'Status' column.
#!/bin/bash
CSV='
ServerName,Index,Status
10.xxx.xx.xx,1.5.1.1,2
10.xxx.xx.xx,1.5.1.2,3
'
jq -srR '
def colorize($status):
if $status == "3" then "yellow"
elif $status == "4" then "orange"
elif $status == "5" then "red"
else "green"
end
| "<font color=\"\(.)\">\($status)</font>";
split("\n") # split lines
| map(select(length > 0)) # remove empty lines from CSV
| map(split(",")) # split each line
| .[1:] # drop first line with headers
| "<table>", # convert to HTML table
" <tr> <th>ServerName</th> <th>Index</th> <th>Status</th> </tr>",
(.[] | " <tr> <td>\(.[0])</td> <td>\(.[1])</td> <td>\(colorize(.[2]))</td> </tr>"),
"</table>"
' <<< "$CSV"
output:
<table>
<tr> <th>ServerName</th> <th>Index</th> <th>Status</th> </tr>
<tr> <td>10.xxx.xx.xx</td> <td>1.5.1.1</td> <td><font color="green">2</font></td> </tr>
<tr> <td>10.xxx.xx.xx</td> <td>1.5.1.2</td> <td><font color="yellow">3</font></td> </tr>
</table>
I have 2 files, file1 and file2.
# cat /tmp/file1
***** insert new text ****
# cat /tmp/file2
</table>
some text
</table>
<table name="test" description="test line">
some text
I want to insert the text from file1 into file2 but only before the following 2 lines:
</table>
<table name="test" description="test line">
So the end result is:
</table>
some text
***** insert new text ****
</table>
<table name="test" description="test line">
some text
Here is the awk statement/commands I am trying, but the problem is awk is inserting the new text for each match.
# f1="$(</tmp/file1)"
# awk -vf1="$f1" '/<\/table>/,/<table name="test" description="test line">/{print f1;print;next}1' /tmp/file2
***** insert new text ****
</table>
***** insert new text ****
some text
***** insert new text ****
</table>
***** insert new text ****
<table name="test" description="test line">
some text
How do I fix the awk statement to only insert the text from file1 before those specific 2 lines? Thanks in advance.
That works for me:
awk '{if(p=="</table>"&&$0=="<table name=\"test\" description=\"test line\">")
{system("cat file1");}if(NR>1){print p}; p=$0}END{if(NR)print p}' file2
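The if statement checks whether the current line is the <table ...> line and the previous line is </table>. If so, the contents of file1 are printed first. The previous line is then printed either way, so each line of file2 is emitted one step late and the last one is printed in END.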
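The same previous-line check can also be written without calling system(), by loading file1 first. This is a sketch, assuming a POSIX-style awk and a non-empty file1; the names ins, prev, and held and the temporary paths are chosen here for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' '***** insert new text ****' > "$dir/file1"
cat > "$dir/file2" <<'EOF'
</table>
some text
</table>
<table name="test" description="test line">
some text
EOF

merged=$(awk '
  NR == FNR { ins = ins $0 ORS; next }     # first file: collect the text to insert
  FNR > 1 {
    # Emit the insert only when the held line and the current line form the pair.
    if (prev == "</table>" && $0 == "<table name=\"test\" description=\"test line\">") {
      printf "%s", ins
    }
    print prev
  }
  { prev = $0; held = 1 }                  # hold the current line for one step
  END { if (held) print prev }             # flush the last held line
' "$dir/file1" "$dir/file2")
printf '%s\n' "$merged"
rm -rf "$dir"
```

Holding each line for one step is what allows the text to be placed before the </table> line rather than after it.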
I have an input file in following manner
<td> Name1 </td>
<td> <span class="test"><a href="url1">Link </a></span></td>
<td> Name2 </td>
<td> <span class="test"><a href="url2">Link </a></span></td>
I want an awk script to read this file and produce output in the following manner
url1 Name1
url2 Name2
Can anyone help me out in this trivial looking problem? Thanks.
Extracting one href per line is relatively simple, so long as the files conform to XHTML standards, there is at most one href on a line, and you don't care about enclosing tags, but perl is easier:
$ perl -ne 'print "$1\n" if /href="([^"]+)"/'
If you care about enclosing tags or they are not standard conformant, you cannot use regular expressions to parse HTML. It is impossible.
added: oops, you do care about context, forget about regexps and use a real HTML parser
Here is an awk script that does the job
awk '
/a href=\".*\"/ { sub( /^.*a href=\"/,"" ); sub(/\".*/,""); print $0, name }
{ name = $2 }
'
this might work:
awk 'BEGIN {i=1}
{line[i++]=$0}
END {
j=1;
while (j<i)
{print line[j+1] line[j]; j+=2}
}' yourfile|awk '{print substr($4,7,length($4)-6),$6}'
gawk '/^<td>/ {n = $2; getline; print gensub(/.*href="([^"]*).*/,"\\1",1), n}' infile
url1 Name1
url2 Name2
awk 'BEGIN{RS="></td>\n"; FS="> | </|\""}{print $7, $2}' infile
This treats every 2 lines as one record; a multi-character RS is handled as a regex by awks such as gawk and mawk.
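Where a multi-character RS is not available, the pairing can be done by line parity instead. This is a sketch assuming that name and link cells strictly alternate and that each link cell holds an a element with an href attribute; the exact markup of the sample input is a guess based on the answers above.

```shell
# Assumed input: name and link cells alternate, and each link cell holds an
# <a href="..."> element (the exact markup is a guess based on the answers).
input='<td> Name1 </td>
<td> <span class="test"><a href="url1">Link </a></span></td>
<td> Name2 </td>
<td> <span class="test"><a href="url2">Link </a></span></td>'

pairs=$(printf '%s\n' "$input" | awk '
  NR % 2 == 1 { name = $2; next }          # odd lines: remember the name
  match($0, /href="[^"]*"/) {              # even lines: pull out the href value
    # Skip the 6 characters of href=" and drop the closing quote.
    print substr($0, RSTART + 6, RLENGTH - 7), name
  }
')
printf '%s\n' "$pairs"
```

As with the other regex-based answers, this only holds while the markup stays this regular.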