sed or awk with variable and special characters - awk

I want to use awk or sed to replace one line in my file.
The file content is:
server.modules += ( "mod_redirect" )
$SERVER["socket"] == ":8080" {
$HTTP["host"] =~ "(.*)" {
url.redirect = ( "^/(.*)" => "https://someurl.com/unauthorised" )
}
}
I want to change the line containing url.redirect.
The new line is held in a variable and contains special characters. It will be something like this: url.redirect = ( "^/(.*)" => "http://newurl.com/newpath" )
So I used the following sed command:
sed "/url\.redirect =/s/.*/$newline/" 10-redirect.conf
But I got an error related to the special characters inside the newline variable.
newline is an argument of my shell function, so I cannot change it to add escape characters.
How can I use variables with special characters in sed or awk?

With GNU sed, use the c command, which replaces each matched line with the string provided. If the string starts with spaces, prefix it with \ to preserve them:
sed '/url\.redirect =/c\'"$newline"
However, the c command still interprets escape sequences, for example:
$ s=' rat\tdog\nwolf'
$ seq 3 | sed '2c\'"$s"
1
rat dog
wolf
3
To add the contents literally and robustly, use the r command:
echo "$newline" | sed -e '/url\.redirect =/ {r /dev/stdin' -e 'd}'
Here's the r command in action:
$ s=' rat\tdog\nwolf'
$ echo "$s" | sed -e '2 {r /dev/stdin' -e 'd}' <(seq 3)
1
rat\tdog\nwolf
3
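If the s command has to be kept, another option (not the approach used in the answer above) is to escape the replacement text first. The following is a minimal sketch, assuming a single-line value and / as the s delimiter; the function names escape_sed_replacement and replace_redirect_line are made up for illustration.

```shell
# Hypothetical helper: escape a single-line string so it can be used as the
# replacement part of a sed s/// command that uses / as its delimiter.
escape_sed_replacement() {
  # Put a backslash in front of every backslash, slash and ampersand.
  printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

# Hypothetical wrapper: read the config on stdin and replace the whole
# url.redirect line with the literal text given as the first argument.
replace_redirect_line() {
  escaped=$(escape_sed_replacement "$1")
  sed "/url\.redirect =/s/.*/$escaped/"
}
```

It could be called as replace_redirect_line "$newline" < 10-redirect.conf. A value that contains a newline would still break the s command, so the c, r or awk approaches remain the more robust choice.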

Could you please try the following? It keeps the same leading spaces in front of the new value that were present before.
newline='url.redirect = ( "^/(.*)" => "https://example.com/authorised" )'
awk -v line="$newline" '
/url\.redirect =/{
match($0,/^ +/)
print substr($0,RSTART,RLENGTH) line
next
}
1
' Input_file

You are getting an error from sed because it uses a regex, and your replacement string may contain metacharacters such as & or the / delimiter.
This awk would be safer to use because it takes a non-regex approach:
newline='url.redirect = ( "^/(.*)" => "https://example.com/authorised" )'
awk -i inplace -v line="$newline" 'index($0, "url.redirect =") {
sub(/[^[:blank:]].*/, "")
$0 = $0 line
} 1' file
server.modules += ( "mod_redirect" )
$SERVER["socket"] == ":8080" {
$HTTP["host"] =~ "(.*)" {
url.redirect = ( "^/(.*)" => "https://example.com/authorised" )
}
}
Note that passing the value through ENVIRON instead of -v hands it to awk verbatim, whereas -v interprets backslash escape sequences in the value:
export newline
awk -i inplace 'index($0, "url.redirect =") {
sub(/[^[:blank:]].*/, "")
$0 = $0 ENVIRON["newline"]
} 1' file
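The difference between the two ways of passing a value can be seen with a small sketch. The function names show_with_v and show_with_environ are hypothetical and exist only for this illustration.

```shell
# Print a value passed with -v: awk processes backslash escape
# sequences in the assignment, so \t turns into a tab.
show_with_v() {
  awk -v val="$1" 'BEGIN { print val }'
}

# Print a value passed through the environment: awk reads it from
# ENVIRON exactly as the shell provided it.
show_with_environ() {
  val="$1" awk 'BEGIN { print ENVIRON["val"] }'
}
```

Calling both with 'a\tb' prints a tab-separated a and b in the first case and the four characters a\tb in the second.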

Related

How to use variable with or operator | in awk

The following works:
awk '
NR==FNR { sub(/\.(png|txt|jpg|json)$/,""); a[$0]; next }
{ f=$0; sub(/\.(png|txt|jpg|json)$/,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt
Now I want to pass the file endings to be ignored in the comparison as a variable, but cannot get it to work. My attempt below just compares without ignoring any file endings. I tried with $ and without, with () and without, and escaping the |, but so far without success. What is the correct solution?
fileEndingsToIgnore="png|txt|jpg|json"
awk -v fileEndingsToIgnore="${fileEndingsToIgnore}" '
NR==FNR { sub(/\.(fileEndingsToIgnore)$/,""); a[$0]; next }
{ f=$0; sub(/\.(fileEndingsToIgnore)$/,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt
GNU AWK doesn't allow a variable inside a regular expression literal. You can use a string variable with ~, !~ and many of the string functions instead, but in that case you need to double the escaping, as explained in Using Dynamic Regexps. Consider the following example: say you want to output only the .txt and .json filenames, without extension, and you have file.txt with the following content:
file1.txt
file2.bmp
file3.json
then
awk 'BEGIN{s="\\.(txt|json)$"}sub(s,""){print}' file.txt
gives output
file1
file3
Observe \\ rather than \.
(tested in GNU Awk 5.0.1)
One workaround is to dynamically build the regex and store it in a variable, then use the variable in the sub() call.
Sample input file:
$ cat test.file
abc.txt
def.jpg
ghi.exe
jkl.dat
123.json
456.ini
789.pngX
000.png
111.dat
One awk idea:
fileEndingsToIgnore="png|txt|jpg|json"
awk -v fileEndingsToIgnore="${fileEndingsToIgnore}" '
BEGIN { regex="\\.(" fileEndingsToIgnore ")$" } # need to escape the escape char, ie, "\\"
{ out=$1
sub(regex,"",out)
printf "%s => %s\n",$0,out
}
' test.file
This generates:
abc.txt => abc
def.jpg => def
ghi.exe => ghi.exe
jkl.dat => jkl.dat
123.json => 123
456.ini => 456.ini
789.pngX => 789.pngX
000.png => 000
111.dat => 111.dat
Applying this to OP's current code:
fileEndingsToIgnore="png|txt|jpg|json"
awk -v fileEndingsToIgnore="${fileEndingsToIgnore}" '
BEGIN { regex="\\.(" fileEndingsToIgnore ")$" }
NR==FNR { sub(regex,""); a[$0]; next }
{ f=$0; sub(regex,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt
I think this should be generic enough:
"-v FS=..." is the list of file extensions to exclude, case sensitive:
mawk -v FS='mp[34]|txt|sh|awk' 'BEGIN { _^= FS = "[.]("FS")$"
split("",__) } FNR==NR ? __[$_] : NF<=($_ in __)' file file

How can I split a long text into array based on empty lines?

I have a text file with following content:
...
LogLevelMax=-1
Id=keyboard-setup.service

LogLevelMax=-1
Id= networkd-dispatcher.service

LogLevelMax=-1
Id=systemd-remote-fs.service

LogLevelMax=-1
Id=systemd-journal-flush.service

LogLevelMax=-1
Id=some-other.service
...
I want to save them into an associative array, with 'Id' as the key and 'LogLevelMax' as the value.
Between each "entity" there are exactly 2 new lines. Between LogLevelMax and Id there is exactly one new line.
First, I tried to replace the 2 consecutive new lines with the character '#':
cat file.txt | tr "\n\n" "#"
But it replaces all new lines with '#', not only the runs of exactly 2 new lines.
How can I do it in bash with sed, awk, regex or bash functions?
Thanks.
With awk:
parse.awk
BEGIN {
RS=""
FS=" *[\n=] *"
}
# Copy references into the h associative array
{ h[$4] = $2 }
# Print collected key/value pairs
END {
for (k in h)
print k " -> " h[k]
}
Run it e.g. like this:
awk -f parse.awk infile | column -t
Output:
networkd-dispatcher.service -> -1
keyboard-setup.service -> -1
systemd-remote-fs.service -> -1
systemd-journal-flush.service -> -1
some-other.service -> -1
With bash:
declare -A array
while IFS='=' read -r a b; do
if [[ "$a" == "Id" ]]; then
array+=(["$b"]="$c")
fi
c="$b"
done < file
And then:
$ for k in "${!array[@]}"; do printf '%s : %s\n' "$k" "${array[$k]}"; done
systemd-journal-flush.service : -1
keyboard-setup.service : -1
systemd-remote-fs.service : -1
networkd-dispatcher.service : -1
some-other.service : -1
I would use awk and Bash this way:
declare -A aarr
while read -r key val; do
aarr["$key"]="$val"
done < <(awk '{print $4, $2}' RS='\n\n' FS="[[:space:]]*[=\n][[:space:]]*" file)
Result:
$ declare -p aarr
declare -A aarr=([systemd-journal-flush.service]="-1" [keyboard-setup.service]="-1" [systemd-remote-fs.service]="-1" [networkd-dispatcher.service]="-1" [some-other.service]="-1" )
If it is possible that there are white spaces in the fields, you can do this instead:
while IFS=# read -r key val; do
aarr["$key"]="$val"
done < <(awk '{print $4 "#" $2}' RS= FS="[[:space:]]*[=\n][[:space:]]*" file)
Where # is a delimiter that is not in your fields.
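The answers above rely on the field positions $2 and $4. As a variation, here is a minimal sketch that uses awk's paragraph mode (RS set to the empty string) and looks the values up by key name, so the order of the two lines inside a block does not matter. The function name list_units and the tab-separated output format are assumptions made for this example.

```shell
# Hypothetical helper: print "Id<TAB>LogLevelMax" for every
# blank-line-separated block read from stdin.
list_units() {
  awk 'BEGIN { RS = ""; FS = "\n" }   # one block per record, one line per field
  {
    id = level = ""
    for (i = 1; i <= NF; i++) {
      eq  = index($i, "=")              # split each line at its first "="
      key = substr($i, 1, eq - 1)
      val = substr($i, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim blanks around the value
      if (key == "Id") id = val
      else if (key == "LogLevelMax") level = val
    }
    if (id != "") print id "\t" level
  }'
}
```

In bash the result could then be loaded with a loop such as while IFS=$'\t' read -r id level; do aarr["$id"]=$level; done < <(list_units < file), assuming aarr was declared with declare -A.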

Check for multi-line content in a file

I'm trying to check if a multi-line string exists in a file using common bash commands (grep, awk, ...).
I want to have a file with a few lines (plain lines, not patterns) that should exist in another file, and create a command (sequence) that checks whether they do. If grep could accept arbitrary multiline patterns, I'd do it with something similar to
grep "`cat contentfile`" targetfile
As with grep I'd like to be able to check the exit code from the command. I'm not really interested in the output. Actually no output would be preferred since then I don't have to pipe to /dev/null.
I've searched for hints, but can't come up with a search that gives any good hits. There's How can I search for a multiline pattern in a file?, but that is about pattern matching.
I've found pcre2grep, but need to use "standard" *nix tools.
Example:
contentfile:
line 3
line 4
line 5
targetfile:
line 1
line 2
line 3
line 4
line 5
line 6
This should match and return 0 since the sequence of lines in the content file is found (in the exact same order) in the target file.
EDIT: Sorry for not being clear about the "pattern" vs. "string" comparison and the "output" vs. "exit code" in the previous versions of this question.
You didn't say whether you wanted a regexp match or a string match, and we can't tell: you named your search file "patternfile", and a "pattern" could mean anything. At one point you imply you want a string match (check if a multi-line _string_ exists), but then you use grep and pcre2grep with no stated args for string rather than regexp matches.
In any case, these will do whatever it is you want using any awk (which includes POSIX standard awk, and you said you wanted to use standard UNIX tools) in any shell on every UNIX box:
For a regexp match:
$ cat tst.awk
NR==FNR { pat = pat $0 ORS; next }
{ tgt = tgt $0 ORS }
END {
while ( match(tgt,pat) ) {
printf "%s", substr(tgt,RSTART,RLENGTH)
tgt = substr(tgt,RSTART+RLENGTH)
}
}
$ awk -f tst.awk patternfile targetfile
line 3
line 4
line 5
For a string match:
$ cat tst.awk
NR==FNR { pat = pat $0 ORS; next }
{ tgt = tgt $0 ORS }
END {
lgth = length(pat)
while ( beg = index(tgt,pat) ) {
printf "%s", substr(tgt,beg,lgth)
tgt = substr(tgt,beg+lgth)
}
}
$ awk -f tst.awk patternfile targetfile
line 3
line 4
line 5
Having said that, with GNU awk you could do the following if you're OK with a regexp match and backslash interpretation of the patternfile contents (so \t is treated as a literal tab):
$ awk -v RS="$(cat patternfile)" 'RT!=""{print RT}' targetfile
line 3
line 4
line 5
or with GNU grep:
$ grep -zo "$(cat patternfile)" targetfile | tr '\0' '\n'
line 3
line 4
line 5
There are many other options depending on what kind of match you're really trying to do and which tools versions you have available.
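Since the question asks for an exit code and no output, the string-match variant can be reduced to a silent test. This is a minimal sketch under a few assumptions: the function name contains_block is made up, the content file is not empty, and both files fit in memory. An ORS is put in front of both strings so that the block can only match at the start of a line.

```shell
# Hypothetical helper: succeed (exit 0) if the lines of the first file occur
# as one contiguous block of whole lines in the second file, else exit 1.
contains_block() {
  awk '
    NR==FNR { pat = pat $0 ORS; next }    # first file: lines to look for
            { tgt = tgt $0 ORS }          # second file: text to search
    END     { exit !index(ORS tgt, ORS pat) }
  ' "$1" "$2"
}
```

It could be used as: if contains_block contentfile targetfile; then echo found; fi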
EDIT: Since the OP needs the outcome of the command as true or false (yes or no), I have edited the command accordingly (created and tested in GNU awk).
awk -v message="yes" 'FNR==NR{a[$0];next} ($0 in a){if((FNR-1)==prev){b[++k]=$0} else {delete b;k=""}} {prev=FNR}; END{if(length(b)>0){print message}}' patternfile targetfile
Could you please try the following? It was tested with the given samples and should print all consecutive lines from the pattern file if they appear in the same order in the target file (this code needs at least 2 consecutive lines).
awk '
FNR==NR{
a[$0]
next
}
($0 in a){
if((FNR-1)==prev){
b[++k]=$0
}
else{
delete b
k=""
}
}
{
prev=FNR
}
END{
for(j=1;j<=k;j++){
print b[j]
}
}' patternfile targetfile
Explanation: Adding explanation for above code here.
awk ' ##Starting awk program here.
FNR==NR{ ##FNR==NR will be TRUE when first Input_file is being read.
a[$0] ##Creating an array a with index $0.
next ##next will skip all further statements from here.
}
($0 in a){ ##Statements from here will be executed when 2nd Input_file is being read, checking if current line is present in array a.
if((FNR-1)==prev){ ##Checking condition if prev variable is equal to FNR-1 value then do following.
b[++k]=$0 ##Creating an array named b whose index is variable k whose value is increment by 1 each time it comes here.
}
else{ ##Mentioning else condition here.
delete b ##Deleting array b here.
k="" ##Nullifying k here.
}
}
{
prev=FNR ##Setting prev value as FNR value here.
}
END{ ##Starting END section of this awk program here.
for(j=1;j<=k;j++){ ##Starting a for loop here.
print b[j] ##Printing value of array b whose index is variable j here.
}
}' patternfile targetfile ##mentioning Input_file names here.
another solution in awk:
echo $(awk 'FNR==NR{ a[$0]; next}{ x=($0 in a)?x+1:0 }x==length(a){ print "OK" }' patternfile targetfile )
This returns "OK" if there is a match.
a one-liner:
$ if [ $(diff --left-column -y patternfile targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1 | wc -l) == $(cat patternfile | wc -l) ]; then echo "ok"; else echo "error"; fi
explanation:
first is to compare the two files using diff:
diff --left-column -y patternfile targetfile
> line 1
> line 2
line 3 (
line 4 (
line 5 (
> line 6
then filter to show only the interesting lines, which are the lines with the '(', plus 1 extra line before and after the match, to check whether the lines in patternfile match without a break.
diff --left-column -y patternfile targetfile | grep '(' -A1 -B1
> line 2
line 3 (
line 4 (
line 5 (
> line 6
Then leave out the first, and last line:
diff --left-column -y patternfile targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1
line 3 (
line 4 (
line 5 (
add some code to check if the number of lines match the number of lines in the patternfile:
if [ $(diff --left-column -y patternfile targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1 | grep '(' | wc -l) == $(cat patternfile | wc -l) ]; then echo "ok"; else echo "error"; fi
ok
to use this with a return-code, a script could be created like this:
#!/bin/bash
patternfile=$1
targetfile=$2
if [ $(diff --left-column -y $patternfile $targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1 | grep '(' | wc -l) == $(cat $patternfile | wc -l) ];
then
exit 0;
else
exit 1;
fi
The test (when above script is named comparepatterns):
$ comparepatterns patternfile targetfile
$ echo $?
0
The easiest way to do this is to use a sliding window. First you read the pattern file, followed by file to search.
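(FNR==NR) { a[FNR]=$0; n=FNR; next }   # store the pattern lines
{ b[FNR]=$0 }                          # store the current line of the search file
(FNR < n) { next }                     # window not full yet, nothing to compare
{ for(i=1; i<=n; ++i) if (a[i] != b[FNR-n+i]) { delete b[FNR-n+1]; next } }
{ print "match at", FNR-n+1; r=1 }
END { exit !r }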
which you call as
awk -f script.awk patternFile searchFile
Following up on a comment from Cyrus, who pointed to How to know if a text file is a subset of another, the following Python one-liner does the trick
python -c "content=open('content').read(); target=open('target').read(); exit(0 if content in target else 1);"
Unless you're talking about 10 GB+, here's an awk-based solution that's fast and clean :
mawk '{ exit NF==NR }' RS='^$' FS="${multiline_pattern}"
In the tests below, the pattern exists only in the file "${m2p}", which is part of the multi-file pipeline of the 1st test but not of the 2nd one.
This solution, for now, doesn't auto handle instances where regex meta-character escaping is needed. Alter it as you see fit.
Unless the pattern occurs far too often, it might even save time to do it all at once instead of having to check line-by-line, including saving lines along the way in some temp pattern space.
NR is always 1 there since RS is forced to the tail end of the input. NF is larger than 1 only when the pattern is found. Evaluating exit NF == NR therefore inverts the match, which fits the POSIX convention that exit code 0 means success.
% echo; ( time ( \
\
echo "\n\n multi-line-pattern :: \n\n " \
"-------------\n${multiline_pattern}\n" \
" -----------\n\n " \
"$( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m2p}" \
"${m3r}" "${m3supp}" "${m3t}" | pvE0 \
\
| mawk2 '{ exit NF == NR
}' RS='^$' \
FS="${multiline_pattern}" \
\
) exit code : ${?} " ) ) | ecp
in0: 3.10GiB 0:00:01 [2.89GiB/s] [2.89GiB/s] [ <=> ]
( echo ; ) 0.77s user 1.74s system 110% cpu 2.281 total
multi-line-pattern ::
-------------
77138=1159=M
77138=1196=M
77138=1251=M
77138=1252=M
77138=4951=M
77138=16740=M
77138=71501=M
-----------
exit code : 0
% echo; ( time ( \
\
echo "\n\n multi-line-pattern :: \n\n " \
"-------------\n${multiline_pattern}\n" \
" -----------\n\n " \
"$( nice gcat "${m2m}" "${m3m}" "${m3l}" \
"${m3r}" "${m3supp}" "${m3t}" | pvE0 \
\
| mawk2 '{ exit NF == NR
}' RS='^$' \
FS="${multiline_pattern}" \
\
) exit code : ${?} " ) ) | ecp
in0: 2.95GiB 0:00:01 [2.92GiB/s] [2.92GiB/s] [ <=> ]
( echo ; ) 0.64s user 1.65s system 110% cpu 2.074 total
multi-line-pattern ::
-------------
77138=1159=M
77138=1196=M
77138=1251=M
77138=1252=M
77138=4951=M
77138=16740=M
77138=71501=M
-----------
exit code : 1
If your pattern is the full file, then something like the following works: even when using the full file as a single gigantic 153 MB pattern, it finished in less than 2.4 secs against ~3 GB of input.
echo
( time ( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m3r}" "${m3supp}" "${m3t}" | pvE0 \
\
| mawk2 -v pattern_file="${m2p}" '
BEGIN {
RS = "^$"
getline FS < pattern_file
close(pattern_file)
} END {
exit NF == NR }' ; echo "\n\n exit code :: $?\n\n" ))|ecp;
du -csh "${m2p}" ;
( time ( nice gcat "${m2m}" "${m3m}" "${m3l}" \
"${m2p}" "${m3r}" "${m3supp}" "${m3t}" | pvE0 \
\
| mawk2 -v pattern_file="${m2p}" '
BEGIN {
RS = "^$"
getline FS < pattern_file
close(pattern_file)
} END {
exit NF == NR }' ; echo "\n\n exit code :: $?\n\n" ))|ecp;
in0: 2.95GiB 0:00:01 [2.58GiB/s] [2.58GiB/s] [ <=> ]
( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m3r}" "${m3supp}" "${m3t}" | pvE 0.)
0.82s user 1.71s system 111% cpu 2.260 total
exit code :: 1
153M /Users/************/m2map_main.txt
153M total
in0: 3.10GiB 0:00:01 [2.56GiB/s] [2.56GiB/s] [ <=> ]
( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m2p}" "${m3r}" "${m3supp}" "${m3t}")
0.83s user 1.79s system 112% cpu 2.339 total
exit code :: 0
I found a portable solution using the patch command. The idea is to create a diff/patch in the remove direction and check whether it could be applied to the source file. Sadly there is no dry-run option (in my old patch version), so we have to apply the patch and remove the temporary files.
The surrounding shell code is optimized for my ksh usage:
file_in_file() {
typeset -r vtmp=/tmp/${0%.sh}.$$.tmp
typeset -r vbasefile=$1
typeset -r vcheckfile=$2
typeset -ir vlines=$(wc -l < "$vcheckfile")
{ echo "1,${vlines}d0"; sed 's/^/< /' "$vcheckfile"; } |
patch -fns -F0 -o "$vtmp" "$vbasefile" >/dev/null 2>&1
typeset -ir vrc=$?
rm -f "$vtmp"*
return $vrc
}
Explanation:
set variables for local usage (on newer bash you should use declare instead)
count lines of input file
create a patch/diff file in-memory (the line with the curly brackets)
use patch with strict settings patch -F0
cleanup (including any reject files that were created: rm -f "$vtmp"*)
return RC of patch

Can't replace string to multi-lined string with sed

I'm trying to replace a fixed string ("replaceMe") in a text with multi-line text using sed.
My bash script is as follows:
content=$(awk '{print $5}' < data.txt | sort | uniq)
target=$(cat install.sh)
text=$(sed "s/replaceMe/$content/" <<< "$target")
echo "${text}"
If content contains only one line, the replacement works, but if it contains several lines I get:
sed:... unterminated `s' command
I read about "fetching" multi-line content, but I couldn't find anything about inserting a multi-line string.
You'll have more problems than that depending on the contents of data.txt since sed doesn't understand literal strings (see Is it possible to escape regex metacharacters reliably with sed). Just use awk which does:
text="$( awk -v old='replaceMe' '
NR==FNR {
if ( !seen[$5]++ ) {
new = (NR>1 ? new ORS : "") $5
}
next
}
s = index($0,old) { $0 = substr($0,1,s-1) new substr($0,s+length(old)) }
{ print }
' data.txt install.sh )"
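When the multi-line text is already in a shell variable, as content is in the question, the same index/substr idea can be applied directly. The following is only a sketch: the function name replace_first_literal is hypothetical, it replaces the first occurrence on each input line, and both strings are passed through the environment so that awk treats them literally.

```shell
# Hypothetical helper: read text on stdin and replace the first occurrence
# of the literal string $1 on each line with the literal string $2.
replace_first_literal() {
  old="$1" new="$2" awk '
    s = index($0, ENVIRON["old"]) {
      # Rebuild the line around the match; no regex or & handling involved.
      $0 = substr($0, 1, s - 1) ENVIRON["new"] substr($0, s + length(ENVIRON["old"]))
    }
    { print }
  '
}
```

With the variables from the question this could be called as text=$(replace_first_literal replaceMe "$content" < install.sh).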

Redirect input for gawk to a system command

Usually a gawk script processes each line of its stdin. Is it possible to instead specify a system command in the script and process each line of that command's output in the rest of the script?
For example consider the following simple interaction:
$ { echo "abc"; echo "def"; } | gawk '{print NR ":" $0; }'
1:abc
2:def
I would like to get the same output without using a pipe, specifying the echo commands as a system command instead.
I can of course use the pipe but that would force me to either use two different scripts or specify the gawk script inside the bash script and I am trying to avoid that.
UPDATE
The previous example is not quite representative of my use case; this is somewhat closer:
$ { echo "abc"; echo "def"; } | gawk '/d/ {print NR ":" $0; }'
2:def
UPDATE 2
A shell script parallel would be as follows. Without the exec line the script would read from stdin; with the exec line it would use the command on that line as input:
/tmp> cat t.sh
#!/bin/bash
exec 0< <(echo abc; echo def)
while read l; do
echo "line:" $l
done
/tmp> ./t.sh
line: abc
line: def
From all of your comments, it sounds like what you want is:
$ cat tst.awk
BEGIN {
if ( ("mktemp" | getline file) > 0 ) {
system("(echo abc; echo def) > " file)
ARGV[ARGC++] = file
}
close("mktemp")
}
{ print FILENAME, NR, $0 }
END {
if (file!="") {
system("rm -f \"" file "\"")
}
}
$ awk -f tst.awk
/tmp/tmp.ooAfgMNetB 1 abc
/tmp/tmp.ooAfgMNetB 2 def
but honestly, I wouldn't do it. You're munging what the shell is good at (creating/destroying files and processes) with what awk is good at (manipulating text).
I believe what you're looking for is getline:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ print line} }' <<< ''
abc
def
Adjusting the answer to your second example:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ counter++; if ( line ~ /d/){print counter":"line} } }' <<< ''
2:def
Let's break it down:
awk '{
cmd = "echo abc; echo def"
# the line below will create a line variable containing the output of cmd
while ( ( cmd | getline line) > 0){
# we need a counter because NR will not work for us
counter++;
# if the line contains the letter d
if ( line ~ /d/){
print counter":"line
}
}
}' <<< ''
2:def
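The empty here-string is only needed because the loop sits in a main rule. As a variation, the same loop can run in a BEGIN block, which needs no input at all. This is a minimal sketch; the function name numbered_matches is hypothetical, and the command is passed in with -v, so backslashes in it would be interpreted by awk.

```shell
# Hypothetical helper: run the command given as $1 from inside awk and
# print the numbered output lines that contain the letter d.
numbered_matches() {
  awk -v cmd="$1" 'BEGIN {
    while ((cmd | getline line) > 0) {
      n++                                # our own line counter
      if (line ~ /d/) print n ":" line
    }
    close(cmd)                           # release the pipe when done
  }'
}
```

For the example in the question, numbered_matches 'echo abc; echo def' prints 2:def.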