Printing sections with awk

I am using a bash script to print particular sections of a file, delimited by the patterns beg_ere and end_ere, which match a title, subtitle, and keywords.
The user sets faml, asmb, and keys. When a match for beg_ere is encountered, the section gets printed.
For instance, consider
faml="DN"
subtitle="AMBIT"
keywords="resource"
Currently, once beg_ere matches, everything in the section is printed until end_ere is reached.
spc='[[:space:]]*'
ebl='\\[' ; ebr='\\]' # double-escaped so awk sees '\[' and '\]'
pn_ere='^[[:space:]]*([#;!]+|#c|//)[[:space:]]+'
## :- modifier: use a default pattern if the parameter is unset or empty (null).
nfaml=${faml:-"[[:graph:]]+"} # Use [[:graph:]]+ if FAML null ("" or '')
nasmb=${asmb:-"[[:graph:]]+"} # Use [[:graph:]]+ if ASMB null ("" or '')
nkeys=${keys:-".*"} # Use ".*" if KEYS null ("" or '')
beg_ere="${pn_ere}(${nfaml}) ${ebl}(${nasmb})${ebr}${spc}(${nkeys})$"
end_ere="${pn_ere}END OF ${nfaml} ${ebl}${nasmb}${ebr}${spc}$"
awk -v beg_ere="$beg_ere" -v pn_ere="$pn_ere" -v end_ere="$end_ere" \
'$0 ~ beg_ere {
title=gensub(beg_ere, "\\2", 1, $0);
subtitle=gensub(beg_ere, "\\3", 1, $0);
keywords=gensub(beg_ere, "\\4", 1, $0);
display=1;
next
}
$0 ~ end_ere { display=0 ; print "" }
display { sub(pn_ere, "") ; print }
' "$filename"
An example file would be
## DN [AMBIT] bash
## hodeuiihoedu
## AVAL:
## + ooeueocu
## END OF DN [AMBIT]
## NAVAID: Pattern Matching (Cogent)
## Cogent Convincing by virtue of clear and thorough presentation.
find ~/Opstk/bin/gungadin-1.0/ -name '*.rc'
-exec grep --color -hi -C 8 -e \"EDV\" -e \"GUN\" {} \+
## DN [AMBIT] bash,resource,rysnc
## hodeuiihoedu
## AVAL:
## + ooeueocu
## END OF DN [AMBIT]
## NAVAID: Pattern Matching (Cogent)
## Cogent Convincing by virtue of clear and thorough presentation.
I want to test for the keywords supplied in keys. Currently the match against beg_ere assumes the string in keys is matched exactly, but the user-supplied keywords could be in a different order.
I want to be able to specify keys="bash,rsync": if any of those keys match the begin line of a section, the corresponding section gets printed.
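One possible approach (a sketch, not from the original script; the helper match_keys and its arguments are hypothetical): split both comma-separated lists in awk and treat the section as matching if any user-supplied key appears in the file's keyword list, regardless of order:

```shell
#!/bin/sh
# Sketch: order-independent keyword matching. match_keys is a
# hypothetical helper; arg 1 is the user's keys, arg 2 is the
# comma-separated keyword field captured from the begin line.
match_keys() {
  echo "$2" | awk -v keys="$1" '
    BEGIN { n = split(keys, want, ",") }
    {
      m = split($0, have, ",")
      for (i = 1; i <= n; i++)
        for (j = 1; j <= m; j++)
          if (want[i] == have[j]) { print "match"; exit }
    }'
}
match_keys "bash,rsync" "bash,resource,rysnc"   # "bash" is present: prints match
match_keys "perl" "bash,resource,rysnc"         # no overlap: prints nothing
```

The same split-and-compare loop could be placed inside the $0 ~ beg_ere block, applied to the captured keywords group, so that display is only set when at least one key overlaps.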

Related

Insert text before the first non-comment-line before a pattern

To update a system configuration file on a Linux server, I needed to add a rule (a line of text) before another given rule (and, if possible, before the comments of that other rule).
With the following input file:
# Foo
# Bar
# Comment about rule on the next line
RULE_A
# Comment about rule on the next line
# Continuation of comment
RULE_B
I want to get the following output:
# Foo
# Bar
# Comment about rule on the next line
RULE_A
# ADDED COMMENT
# ADDED COMMENT CONTINUATION
ADDED_RULE
# Comment about rule on the next line
# Continuation of comment
RULE_B
I ended up with the following combination of:
sed : to convert my multi-line text-to-add into a single line with \n.
tac : to reverse the file.
awk : to work on the file.
A temporary file that replaces the original file (because awk has no "in-place" option).
CONF_FILEPATH="sample.conf"
# Create sample work file:
cat > "${CONF_FILEPATH}" <<EOT
# Foo
# Bar
# Comment about rule on the next line
RULE_1
# Comment about rule on the next line
# Continuation of comment
RULE_2
RULE_3_WITHOUT_COMMENT
RULE_4_WITHOUT_COMMENT
RULE_5_WITHOUT_COMMENT
# Comment about rule on the next line
RULE_6
# Comment about rule on the next line
# Continuation of comment
RULE_7
EOT
# Text (of new rule) to add:
TEXT_TO_ADD="# ADDED COMMENT
# ADDED COMMENT CONTINUATION
ADDED_RULE
"
# The rule before which we want to add our text:
BEFORE_RULE="RULE_7"
# Temporary file:
TMP_FILEPATH="$(mktemp)"
# Convert newlines to \n:
TEXT_TO_ADD_FOR_AWK="$(echo "${TEXT_TO_ADD}" | tac | sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g')"
# Process
awk 'BEGIN {
ADD_TO_LINE="";
}
{
if ($0 ~ "^'${BEFORE_RULE}'") {
# DEBUG: Got the "deny all" line
ADD_TO_LINE=NR+1 ;
print $0;
} else {
if (ADD_TO_LINE==NR) {
# DEBUG: Current line is the candidate
if ($0 ~ "#") {
ADD_TO_LINE=NR+1;
# DEBUG: It's a comment, won't add here; taking note to try on the next line
print $0;
} else {
# DEBUG: Not a comment: this is the place!
print "'${TEXT_TO_ADD_FOR_AWK}'";
ADD_TO_LINE="";
print $0;
}
} else {
print $0;
}
}
}' <(tac "${CONF_FILEPATH}") \
| tac > "${TMP_FILEPATH}"
# Overwrite:
cat "${TMP_FILEPATH}" > "${CONF_FILEPATH}"
# Cleaning up:
rm "${TMP_FILEPATH}"
I then get (look just before RULE_7):
# Foo
# Bar
# Comment about rule on the next line
RULE_1
# Comment about rule on the next line
# Continuation of comment
RULE_2
RULE_3_WITHOUT_COMMENT
RULE_4_WITHOUT_COMMENT
RULE_5_WITHOUT_COMMENT
# Comment about rule on the next line
RULE_6
# ADDED COMMENT
# ADDED COMMENT CONTINUATION
ADDED_RULE
# Comment about rule on the next line
# Continuation of comment
RULE_7
Which is OK, but I'm sure there is a cleaner/simpler way of doing that with awk.
Context: I am editing the /etc/security/access.conf to add an allow rule before the deny all rule.
Reading the file paragraph-wise makes things simpler:
awk -v text_to_add="$TEXT_TO_ADD" \
-v before_rule="$BEFORE_RULE" \
-v RS='' \
-v ORS='\n\n' \
'$0 ~ "\n" before_rule {print text_to_add} 1' file
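A quick way to see what paragraph mode does (a self-contained sketch with stand-in rules): with RS='' each blank-line-separated block is one record, so a rule travels together with its comments:

```shell
#!/bin/sh
# Demo: RS="" paragraph mode. Each blank-line-separated block is one
# record, so matching "\n" before_rule finds the rule plus its comments.
printf '%s\n' '# Foo' 'RULE_A' '' '# Bar' 'RULE_B' |
awk -v text_to_add='ADDED_RULE' \
    -v before_rule='RULE_B' \
    -v RS='' -v ORS='\n\n' \
    '$0 ~ "\n" before_rule {print text_to_add} 1'
```

Note that the added text is printed as its own record, so ORS leaves one blank line between it and the RULE_B paragraph.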
Get out of the habit of using ALLCAPS variable names; leave those as reserved by the shell. One day you'll write PATH=something and then wonder why your script is broken.
You never need sed when you're using awk:
text_to_add='# ADDED COMMENT
# ADDED COMMENT CONTINUATION
ADDED_RULE
'
before_rule='RULE_B'
awk -v rule="$before_rule" -v text="$text_to_add" '
/^#/ { cmt = cmt $0 ORS; next }
$0==rule { print text }
{ printf "%s%s\n", cmt, $0; cmt="" }
' file
# Foo
# Bar
# Comment about rule on the next line
RULE_A
# ADDED COMMENT
# ADDED COMMENT CONTINUATION
ADDED_RULE
# Comment about rule on the next line
# Continuation of comment
RULE_B
If you can have comments after the final non-comment line then just add END { printf "%s", cmt } to the end of the script.
Don't use all-caps variable names (see Correct Bash and shell script variable capitalization) and always quote shell variables (see https://mywiki.wooledge.org/Quotes). Copy/paste your original script into http://shellcheck.net and it'll tell you some of the issues.
Regarding ...because I don't have "in-place" option on awk from your question - GNU awk has -i inplace for that.
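For reference, a minimal sketch of that gawk feature (GNU awk 4.1 or later; demo.txt is a throwaway file):

```shell
#!/bin/sh
# Sketch: GNU awk's -i inplace rewrites the file directly, like sed -i.
# Requires gawk 4.1+; bail out quietly if gawk is not installed.
command -v gawk >/dev/null 2>&1 || exit 0
printf 'a\nb\n' > demo.txt
gawk -i inplace '{ print toupper($0) }' demo.txt
cat demo.txt   # now contains A and B
rm -f demo.txt
```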
ed, the standard editor, to the rescue! Because it looks at the entire file, not just a line at a time, it's able to move the current line cursor around forwards and backwards with ease:
ed -s input.txt <<EOF
/RULE_7/;?^[^#]?a
# ADDED COMMENT
# ADDED COMMENT CONTINUATION
ADDED_RULE
.
w
EOF
After this, input.txt looks like your desired result.
It first sets the current line to the first one containing RULE_7, then looks backwards for the first non-empty line above it that doesn't start with # (The line with RULE_6 in this case), and appends the desired text after that line. Then it writes the modified file back to disk.

How to inplace substitute the content between 2 tags with SED (bash)?

I want to inplace edit a file with sed (Oracle-Linux/Bash).
The content between 2 search-tags (in form of "#"-comments) should get commented out.
Example:
Some_Values
#NORMAL_LISTENER_START
LISTENER =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)
(KEY = LISTENER)
)
)
)
#NORMAL_LISTENER_END
Other_Values
Should result in:
Some_Values
#NORMAL_LISTENER_START
# LISTENER =
# (DESCRIPTION =
# (ADDRESS = (PROTOCOL = IPC)
# (KEY = LISTENER)
# )
# )
# )
#NORMAL_LISTENER_END
Other_Values
The following command already achieves it, but it also puts a comment+blank in front of the search-tags:
sed -i "/#NORMAL_LISTENER_START/,/#NORMAL_LISTENER_END/ s/^/# /" ${my_file}
Now my research told me to exclude those search-tags like:
sed -i '/#NORMAL_LISTENER_START/,/#NORMAL_LISTENER_END/{//!p;} s/^/# /' ${my_file}
But it won't work - with the following message as a result:
sed: -e expression #1, char 56: extra characters after command
I need those SearchTags to be as they are, because I need them afterwards again.
If ed is available/acceptable.
printf '%s\n' 'g/#NORMAL_LISTENER_START/+1;/#NORMAL_LISTENER_END/-1s/^/#/' ,p Q | ed -s file.txt
Change Q to w if you're satisfied with the output and in-place editing will occur.
Remove the ,p if you don't want to see the output.
This might work for you (GNU sed):
sed '/#NORMAL_LISTENER_START/,/#NORMAL_LISTENER_END/{//!s/^/# /}' file
Use a range delimited by two regexps, and insert "# " before the lines between them, excluding the lines that match the regexps themselves.
Alternative:
sed '/#NORMAL_LISTENER_START/,/#NORMAL_LISTENER_END/{s/^[^#]/# &/}' file
Or if you prefer:
sed '/#NORMAL_LISTENER_START/{:a;n;/#NORMAL_LISTENER_END/!s/^/# /;ta}' file
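The first of these variants can be checked in isolation (a sketch using stand-in tags): inside the range, the empty regex // repeats the most recently applied regex, so the ! excludes exactly the two tag lines:

```shell
#!/bin/sh
# Demo: comment out the lines between two tags, leaving the tags alone.
printf '%s\n' 'Some_Values' '#START' 'x = 1' 'y = 2' '#END' 'Other_Values' |
sed '/#START/,/#END/{//!s/^/# /}'
```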
With your shown samples, please try the following awk code. A simple explanation would be: look for the start string and set a flag, add # to the start of lines while the flag is set, and clear the flag once the end string is found so later lines are left untouched.
awk ' /Other_Values/{found=""} found{$0=$0!~/^#/?"#"$0:$0} /Some_Values/{found=1} 1' Input_file
The above prints the output on the terminal; once you are happy with the results, you could run the following to save the changes in place into Input_file.
awk ' /Other_Values/{found=""} found{$0=$0!~/^#/?"#"$0:$0} /Some_Values/{found=1} 1' Input_file > temp && mv temp Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/Other_Values/{ found="" } ##If line contains Other_Values then nullify found here.
found { $0=$0!~/^#/?"#"$0:$0 } ##If found is SET then check if line already has # then leave it as it is OR add # in starting to it.
/Some_Values/{ found=1 } ##If line contains Some_Values then set found here.
1 ##Printing current line here.
' Input_file ##Mentioning Input_file name here.

Combining awk commands

I currently have this piece of code:
read -p 'Enter the fruit you want to search: ' user_fruit
awk -F ":" -v re="$user_fruit" '$4 ~ re' $fruit_file
Which uses awk to find matches in $4 that match with the pattern provided by the user under the $user_fruit variable in the $fruit_file. However, I need to alter the awk command so that it only displays line matches when the word apple is also on the line.
Any help would be greatly appreciated!
You can extend the awk pattern using boolean operators:
read -p 'Enter the fruit you want to search: ' user_fruit
awk -F ":" -v re="$user_fruit" '/apple/ && $4 ~ re' "$fruit_file"
I.e. print records when the record matches /apple/ and the fourth field matches the regex.
In case you want to check for the presence of literal, fixed strings, you can use index instead of the regex search:
read -p 'Enter the fruit you want to search: ' user_fruit
awk -F ":" -v re="$user_fruit" 'index($0, "apple") && index($4, re)' file
Here,
index($0, "apple") - checks if the apple substring occurs anywhere on the line (if its index is not 0)
&& - AND condition
index($4, re) - checks if the re value occurs in Field 4 (if its index is not 0).
See an online demo:
s='one:two:three:2-plum+pear
apple:two:three:1-plum+pear'
user_fruit='plum+pear'
awk -F ":" -v re="$user_fruit" 'index($0, "apple") && index($4, re)' <<< "$s"
# => apple:two:three:1-plum+pear

Replace value in nth line ABOVE search string using awk/sed

I have a large firewall configuration file with sections like these distributed all over:
edit 78231
set srcintf "port1"
set dstintf "any"
set srcaddr "srcaddr"
set dstaddr "vip-dnat"
set service "service"
set schedule "always"
set logtraffic all
set logtraffic-start enable
set status enable
set action accept
next
I want to replace value "port1", which is 3 lines above search string "vip-dnat".
It seems the below solution is close but I don't seem to be able to invert the search to check above the matched string. Also it does not replace the value inside the file:
Replace nth line below the searched pattern in a file
I'm able to extract the exact value using the following awk command but simply cannot figure out how to replace it within the file (sub/gsub?):
awk -v N=3 -v pattern=".*vip-dnat.*" '{i=(1+(i%N));if (buffer[i]&& $0 ~ pattern) print buffer[i]; buffer[i]=$3;}' filename
"port1"
We could use a tac + awk combination here. I have created a variable occur holding the number of lines after which (once "vip-dnat" is found in the reversed file) the substitution should be performed.
tac Input_file |
awk -v occur="3" -v new_port="new_port_value" '
/\"vip-dnat\"/{
found=1
print
next
}
found && ++count==occur{
sub(/"port1"/,new_port)
found=""
}
1' |
tac
Explanation: Adding detailed explanation for above.
tac Input_file | ##Printing Input_file content in reverse order, sending output to awk command as an input.
awk -v occur="3" -v new_port="new_port_value" ' ##Starting awk program with 2 variables occur which has number of lines after we need to perform substitution and new_port what is new_port value we need to keep.
/\"vip-dnat\"/{ ##Checking if line has "vip-dnat" then do following.
found=1 ##Setting found to 1 here.
print ##Printing current line here.
next ##next will skip all statements from here.
}
found && ++count==occur{ ##Checking if found is SET and count value equals to occur.
sub(/"port1"/,new_port) ##Then substitute "port1" with new_port value here.
found="" ##Nullify found here.
}
1' | ##Mentioning 1 will print current line and will send output to tac here.
tac ##Again using tac will print output in actual order.
Use a Perl one-liner. In this example, it changes the line 3 lines above the matched string to set foo bar:
perl -0777 -pe 's{ (.*\n) (.*\n) ( (?:.*\n){2} .* vip-dnat ) }{${1} set foo bar\n${3}}xms' in_file
Prints:
edit 78231
set foo bar
set dstintf "any"
set srcaddr "srcaddr"
set dstaddr "vip-dnat"
set service "service"
set schedule "always"
set logtraffic all
set logtraffic-start enable
set status enable
set action accept
next
When you are satisfied with the replacement written into STDOUT, change perl to perl -i.bak to replace the file in-place.
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
-0777 : Slurp files whole.
(.*\n) : Any character, repeated 0 or more times, ending with a newline. Parentheses serve to capture the matched part into "match variables", numbered $1, $2, etc., from left to right according to the position of the opening parenthesis.
( (?:.*\n){2} .* vip-dnat ) : 2 lines followed by the line with the desired string vip-dnat. (?: ... ) represents non-capturing parentheses.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
The regex uses these modifiers:
/x : Ignore whitespace and comments, for readability.
/m : Allow multiline matches.
/s : Allow . to match a newline.
Whenever you have tag-value pairs in your data, it's best to first create an array of that mapping (tag2val[] below); then you can test and/or change and/or print the values in whatever order you like, just by using their names:
$ cat tst.awk
$1 == "edit" { editId=$2; next }
editId != "" {
if ($1 == "next") {
# Here is where you test and/or set the values of whatever tags
# you like by referencing their names.
if ( tag2val[ifTag] == ifVal ) {
tag2val[thenTag] = thenVal
}
print "edit", editId
for (tagNr=1; tagNr<=numTags; tagNr++) {
tag = tags[tagNr]
val = tag2val[tag]
print " set", tag, val
}
print $1
editId = ""
numTags = 0
delete tag2val
}
else {
tag = $2
sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")
sub(/[[:space:]]+$/,"")
val = $0
if ( !(tag in tag2val) ) {
tags[++numTags] = tag
}
tag2val[tag] = val
}
}
$ awk -v ifTag='dstaddr' -v ifVal='"vip-dnat"' -v thenTag='srcintf' -v thenVal='"foobar"' -f tst.awk file
edit 78231
set srcintf "foobar"
set dstintf "any"
set srcaddr "srcaddr"
set dstaddr "vip-dnat"
set service "service"
set schedule "always"
set logtraffic all
set logtraffic-start enable
set status enable
set action accept
next
Note that the above approach:
Will work even if/when either of the values you want to find appears in other contexts (e.g. associated with other tags or as substrings of other values), and doesn't rely on how many lines are between the lines you're interested in or what order they appear in within the record.
Will let you change that value of any tag based on the value of any other tag and is easily extended to do compound comparisons, assignments, etc. and/or print the values in a different order or print a subset of them or add new tags+values.

AWK: Convert columns to rows with condition (create list) [duplicate]

I have a tab-delimited file with three columns (excerpt):
AC147602.5_FG004 IPR000146 Fructose-1,6-bisphosphatase class 1/Sedoheputulose-1,7-bisphosphatase
AC147602.5_FG004 IPR023079 Sedoheptulose-1,7-bisphosphatase
AC148152.3_FG001 IPR002110 Ankyrin repeat
AC148152.3_FG001 IPR026961 PGG domain
and I'd like to get this using bash:
AC147602.5_FG004 IPR000146 Fructose-1,6-bisphosphatase class 1/Sedoheputulose-1,7-bisphosphatase IPR023079 Sedoheptulose-1,7-bisphosphatase
AC148152.3_FG001 IPR002110 Ankyrin repeat IPR026961 PGG domain
So if the ID in the first column is the same in several lines, it should produce one line for each ID, with all other parts of the lines joined. In this example it gives a two-row file.
give this one-liner a try:
awk -F'\t' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file
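Note that for (x in a) visits keys in arbitrary order. If the output should keep the input's first-seen order, one variant (a sketch, with sample data inlined) remembers the order in a second array:

```shell
#!/bin/sh
# Sketch: same grouping, but keys are printed in first-seen order.
printf 'k1\tA\nk2\tB\nk1\tC\n' |
awk -F'\t' -v OFS='\t' '
  !($1 in a) { order[++n] = $1 }      # remember first appearance
  { x = $1; $1 = ""; a[x] = a[x] $0 } # append rest of line, tab-prefixed
  END { for (i = 1; i <= n; i++) print order[i] a[order[i]] }'
```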
For whatever reason, the awk solution does not work for me in cygwin, so I used Perl instead. It joins fields with a tab character and separates lines with \n:
cat FILENAME | perl -e 'foreach $Line (<STDIN>) { @Cols=($Line=~/^\s*(\d+)\s*(.*?)\s*$/); push(@{$Link{$Cols[0]}}, $Cols[1]); } foreach $List (values %Link) { print join("\t", @{$List})."\n"; }'
Which approach to use will depend on file size (and awk's memory limits).
If the file is too big, sorting it first reduces the memory awk needs, since only one label has to be kept in memory at a time for printing.
A classical version, with post-print, using a modification of the whole line:
sort YourFile \
| awk '
Last==$1 { sub( /^[^[:blank:]]*[[:blank:]]+/, ""); C = C " " $0; next}
NR > 1 { print Last C }
{ Last = $1; sub( /^[^[:blank:]]*[[:blank:]]+/, ""); C = " " $0 }
END { print Last C}
'
Another version, using fields and pre-print, but less "human readable":
sort YourFile \
| awk '
Last != $1 { printf( "%s%s", (NR > 1 ? "\n" : ""), Last = $1)}
{ for( i = 2; i <= NF; i++) printf( " %s", $i)}
END { print ""}
'
A pure bash version. It has no additional dependencies, but requires bash 4.0 or above (2009) for associative array support.
All on one line:
{ declare -A merged; merged=(); while IFS=$'\t' read -r key value; do merged[$key]="${merged[$key]}"$'\t'"$value"; done; for key in "${!merged[@]}"; do echo "$key${merged[$key]}"; done; } < INPUT_FILE.tsv
Readable and commented equivalent:
{
# Define `merged` as an empty associative array.
declare -A merged
merged=()
# Read tab-separated lines. Any leftover fields also end up in `value`.
while IFS=$'\t' read -r key value
do
# Append to any value that's already there, separated by a tab.
merged[$key]="${merged[$key]}"$'\t'"$value"
done
# Loop over the input keys. Note that the order is arbitrary;
# pipe through `sort` if you want a predictable order.
for key in "${!merged[@]}"
do
# Each value is prefixed with a tab, so no need for a tab here.
echo "$key${merged[$key]}"
done
} < INPUT_FILE.tsv