Awk processing of filenames containing backslash madness - awk

I spent a whole day trying to process some files with backslashes and spaces inside their names. No matter what I do awk (gawk) refuses to print backslashes:
echo "this/pathname/contains/spa ces/and/back\\slashes" | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk "{printf {}}"'
this/pathname/contains/spa ces/and/back\slashes
gawk: {printf this/pathname/contains/spa ces/and/back\slashes}
gawk: ^ syntax error
gawk: {printf this/pathname/contains/spa ces/and/back\slashes}
gawk: ^ backslash not last character on line
This didn't work since the backslash gets directly into the awk code.
echo "this/pathname/contains/spa ces/and/back\\slashes" | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk "{printf \"{}\"}"'
this/pathname/contains/spa ces/and/back\slashes
gawk: warning: escape sequence `\s' treated as plain `s'
this/pathname/contains/spa ces/and/backslashes
This worked, but awk eats the backslash. As you can see above, echo prints it but awk doesn't.
echo "this/pathname/contains/spa ces/and/back\\slashes" | ./escape.sh | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk "{printf \"{}\"}"'
this/pathname/contains/spa\ ces/and/back\slashes
gawk: warning: escape sequence `\ ' treated as plain ` '
gawk: warning: escape sequence `\s' treated as plain `s'
Next I tried escaping the filenames using escape.sh
#!/bin/bash
xargs -d'\n' -n1 -I{} bash -c 'echo $(printf "%q" "{}")'
Now there's a double backslash in there but awk still complains.
echo "this/pathname/contains/spa ces/and/back\\slashes" | ./escape.sh | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk -v VAR=$(printf "%q" "{}") "{printf VAR}"'
this/pathname/contains/spa\ ces/and/back\slashes
gawk: ces/and/back\\slashes
gawk: ^ syntax error
gawk: ces/and/back\\slashes
gawk: ^ unterminated regexp
Now awk said some nonsense about some unterminated regexp.
Any ideas? Thanks!

You are solving the wrong problem. Regardless of the tool, backslashes and spaces in filenames on Unix systems will always mean extra work. In my opinion, you should sanitize the filenames first, then process them.
Try:
sed "s/ /_/g;s/\\\\/-/g"
HTH Chris

The fix is to double every backslash that is fed into mawk as program text or through a -v variable, because awk interprets escape sequences in both places.
Like this:
# awk needs escaped backslashes
VAR=$(echo "$1" | sed -r 's:\\:\\\\:g')
mawk -v VAR="$VAR" -f "script.awk"
Therefore, if a filename containing backslashes is passed inside $1, this is how you obtain the expected result.
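An alternative that avoids escaping altogether is to hand the name to awk through the environment, since ENVIRON values are not subject to escape processing. The sketch below is not from the answer above; it reuses the pathname from the question, and the names FNAME, escaped, via_v and via_env are made up for illustration:

```shell
filename='this/pathname/contains/spa ces/and/back\slashes'

# Approach from the answer: double each backslash, then pass with -v.
# awk turns every "\\" in a -v value back into a single "\".
escaped=$(printf '%s\n' "$filename" | sed 's/\\/\\\\/g')
via_v=$(awk -v VAR="$escaped" 'BEGIN { print VAR }')

# Alternative: set FNAME (hypothetical name) for this one command only
# and read it from ENVIRON, which awk does not escape-process.
via_env=$(FNAME="$filename" awk 'BEGIN { print ENVIRON["FNAME"] }')

printf '%s\n%s\n' "$via_v" "$via_env"
```

Both lines should show the pathname with its single backslash intact.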

I don't understand why you're piping into xargs. Is that a requirement of your process? Can you do something like this:
filename='this/pathname/contains/spa ces/and/back\slashes'
awk -v "fname=$filename" 'BEGIN {print fname}'

Related

Running an awk command with $SHELL -c returns different results

I am trying to use awk to print the unique lines returned by a command. For simplicity, assume the command is ls -alh.
If I run the following command in my Z shell, awk shows all lines printed by ls -alh
ls -alh | awk '!seen[$0]++'
However, if I run the same command with $SHELL -c while escaping the ! with backslash, I only see the first line of the output printed.
$SHELL -c "ls -alh | awk '\!seen[$0]++'"
How can I ensure the latter command prints the exact same outputs as the former?
EDIT 1:
I initially thought the ! could be the issue. But changing the expression '!seen[$0]++' to 'seen[$0]++==0' has the same problem.
EDIT 2:
It looks like I should have escaped $ too. Since I do not know the reason behind it, I will not post an answer.
In the second form, $0 is being treated as a shell variable in the double-quoted string. The substitution creates an interestingly mangled awk command:
> print $SHELL -c "ls -alh | awk '\!seen[$0]++'"
/bin/zsh -c ls -alh | awk '!seen[-zsh]++'
The variable is not substituted in the first form since it is inside single quotes.
This answer discusses how single- and double-quoted strings are treated in bash and zsh:
Difference between single and double quotes in Bash
Escaping the $ so that $0 is passed to awk should work, but note that quoting in commands that are parsed multiple times can get really tricky.
> print $SHELL -c "ls -alh | awk '\!seen[\$0]++'"
/bin/zsh -c ls -alh | awk '!seen[$0]++'
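A sketch of two ways to keep $0 away from the outer shell. It uses sh -c and a three-line printf as stand-ins for $SHELL -c and ls -alh (both substitutions are assumptions made so the snippet is reproducible). It is written for a script, where history expansion of ! is not active, so the ! is left unescaped:

```shell
# Option 1: escape the dollar sign inside the double-quoted string,
# so the inner shell sees awk '!seen[$0]++'.
out_escaped=$(printf 'a\na\nb\n' | sh -c "awk '!seen[\$0]++'")

# Option 2: single-quote the inner script and pass the awk program as a
# positional parameter ($1), so neither shell expands the awk $0.
# The lone "sh" fills the inner shell's own $0.
out_param=$(printf 'a\na\nb\n' | sh -c 'awk "$1"' sh '!seen[$0]++')

printf '%s\n' "$out_escaped" "$out_param"
```

Option 2 tends to scale better when the command is parsed more than once, because the awk program never sits inside another quoted string.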

Print paragraph if it contains a string stored in a variable (blank lines separate paragraphs)

I am trying to isolate the header of a mail in the /var/spool/mail/mysuser file.
Print a paragraph if it contains AAA (blank lines separate paragraphs)
sed is working when searching with the string "AAA"
$ sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;' /var/spool/mail/mysuser
When using a variable it does not work:
$ MyVar="AAA"
$ sed -e '/./{H;$!d;}' -e 'x;/$MyVar/!d;' /var/spool/mail/mysuser
=> No output, as the single quotes prevent expansion of the variable
Trying with double quotes:
$ sed -e "/./{H;$!d;}" -e "x;/$MyVar/!d;" /var/spool/mail/mysuser
sed: -e expression #2, char 27: extra characters after command
Actually, the first search does not work with double quotes either:
$ sed -e "/./{H;$!d;}" -e "x;/AAA/!d;" /var/spool/mail/mysuser
sed -e "/./{H;$!d;}" -e "x;/AAA/date;" /var/spool/mail/mysuser
sed: -e expression #2, char 9: extra characters after command
I am also considering awk without success so far
Any advice?
This should be trivial with awk:
$ awk -v RS= '/AAA/' file
With a variable, a little more is needed:
$ awk -v RS= -v var='AAA' '$0~var'
Or, if it's defined elsewhere:
$ awk -v RS= -v var="$variable_holding_value" '$0~var'
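To see the paragraph mode at work without a real mailbox, here is a small sketch on made-up data (the two "messages" and the names mail_sample, pattern and found are invented for the example). With RS set to the empty string, awk treats each blank-line-separated block as one record, so $0 ~ var tests the whole paragraph:

```shell
# Two fake paragraphs separated by a blank line.
mail_sample='From: alice
Subject: AAA report

From: bob
Subject: other'

pattern='AAA'

# RS= switches awk to paragraph mode; a pattern with no action prints
# every record (paragraph) for which the expression is true.
found=$(printf '%s\n' "$mail_sample" | awk -v RS= -v var="$pattern" '$0 ~ var')
printf '%s\n' "$found"
```

Only the first paragraph is printed. If the variable may contain regex metacharacters, index($0, var) performs a literal substring test instead of a regex match.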
That is happening because of the single quotes. You need to go out of the single quotes to enable interpolation:
sed -e '/./{H;$!d;}' -e 'x;/'$MyVar'/!d;' /var/spool/mail/mysuser
or, better put the variable in double quotes:
sed -e '/./{H;$!d;}' -e 'x;/'"$MyVar"'/!d;' /var/spool/mail/mysuser
Thanks to karakfa.
It works with:
MyVar="AAA"
awk -v RS= -v X="$MyVar" '$0~X' file

How does gawk -e 'BEGIN {' -e 'print "hello" }' work?

Gawk 5.0.0 was released on April 12, 2019. Going through the announcement I found this:
Changes from 4.2.1 to 5.0.0
(...) 11. Namespaces have been implemented! See the manual. One consequence of this is that files included with -i, read with -f, and command line program segments must all be self-contained syntactic units. E.g., you can no longer do something like this:
gawk -e 'BEGIN {' -e 'print "hello" }'
I was curious about this behaviour that is no longer supported, but unfortunately my Gawk 4.1.3 only reported an error:
$ gawk -e 'BEGIN {' -e 'print "hello" }'
gawk: cmd. line:1: BEGIN {
gawk: cmd. line:1: ^ unexpected newline or end of string
From what I see in the manual of GAWK 4.2, the -e option was marked as problematic already:
GNU Awk User's Guide, on Options
-e program-text
--source program-text
Provide program source code in the program-text. This option allows you to mix source code in files with source code that you enter on the command line. This is particularly useful when you have library functions that you want to use from your command-line programs (see AWKPATH Variable).
Note that gawk treats each string as if it ended with a newline character (even if it doesn’t). This makes building the total program easier.
CAUTION: At the moment, there is no requirement that each program-text be a full syntactic unit. I.e., the following currently works:
$ gawk -e 'BEGIN { a = 5 ;' -e 'print a }'
-| 5
However, this could change in the future, so it’s not a good idea to rely upon this feature.
But, again, this fails in my console:
$ gawk -e 'BEGIN {a=5; ' -e 'print a }'
gawk: cmd. line:1: BEGIN {a=5;
gawk: cmd. line:1: ^ unexpected newline or end of string
So what is gawk -e 'BEGIN {' -e 'print "hello" }' doing exactly on Gawk < 5?
It's doing just what you'd expect - concatenating the parts to form gawk 'BEGIN {print "hello" }' and then executing it. You can actually see how gawk is combining the code segments by pretty-printing it:
$ gawk -o- -e 'BEGIN {' -e 'print "hello" }'
BEGIN {
	print "hello"
}
There is little point in writing that particular script in sections and concatenating them, but consider something like:
$ cat usea.awk
{ a++ }
$ echo foo | gawk -e 'BEGIN{a=5}' -f usea.awk -e 'END{print a}'
6
then you can see the intended functionality might be useful for mixing some command-line code with scripts stored in files to run:
$ gawk -o- -e 'BEGIN{a=5}' -f usea.awk -e 'END{print a}'
BEGIN {
	a = 5
}
{
	a++
}
END {
	print a
}
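Since gawk 5 requires every program segment to parse on its own, the same idea can be written with one complete rule per -e. A minimal sketch, assuming gawk is installed (the command -v guard and the result variable exist only so the snippet degrades gracefully where it is not):

```shell
if command -v gawk >/dev/null 2>&1; then
    # Each -e argument is a self-contained rule: BEGIN, a main rule, END.
    result=$(printf 'foo\n' |
        gawk -e 'BEGIN { a = 5 }' -e '{ a++ }' -e 'END { print a }')
else
    result='gawk not installed'
fi
printf '%s\n' "$result"
```

With one input line, a goes from 5 to 6, matching the usea.awk example above.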

Assign variable to cut -f field

I want to know how to pass a variable to cut -f, something like:
awk -v id=3 -v RS= -F '::' '($1==id) {print $3}' jenny | a=1 ;cut -d$'\n' -f$a
I want to use it in a loop where the field number given to -f takes the values 1...3.
Input
0::chkconfig --list autofs::
autofs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
1::grep "^PROMPT=" /etc/sysconfig/init::
PROMPT=yes
2::rpm -q prelink::
prelink-0.4.0-2.el5
3::if [ -z "$(grep -l "hard core" /etc/security/limits.conf /etc/security/limits.d/*)" ]; then echo "empty"; else echo -e "$(grep -l "hard core" /etc/security/limits.conf /etc/security/limits.d/*)"; fi::
/etc/security/limits.conf
/etc/security/limits.d/test
4::sysctl fs.suid_dumpable::
fs.suid_dumpable = 0
5::stat /etc/motd::
File: `/etc/motd'
Size: 17 Blocks: 16 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 10125343 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-04-09 07:56:19.000000000 +0500
Modify: 2019-03-30 19:22:13.000000000 +0500
Change: 2019-03-30 19:22:13.000000000 +0500
Expected Output
/etc/security/limits.conf
/etc/security/limits.d/test
I want these as field 1 and field 2, but currently everything ends up in $3. I tried splitting on newlines in awk, but that doesn't seem to work.
To get your desired output from the given input, try:
$ awk '/^$/{f=0} f{print} /3::/{f=1}' file
/etc/security/limits.conf
/etc/security/limits.d/test
To get only one output line as selected with a variable i:
$ awk -v i=1 '/3::/{n=NR+i} n==NR' file
/etc/security/limits.conf
$ awk -v i=2 '/3::/{n=NR+i} n==NR' file
/etc/security/limits.d/test
The awk variable i can, of course, be set to the value of a shell variable i:
$ i=2
$ awk -v i="$i" '/3::/{n=NR+i} n==NR' file
/etc/security/limits.d/test
The stanza can also be selected from a variable:
$ i=2
$ k=3
$ awk -v i="$i" -v k="$k" -F:: '$1==k{n=NR+i} n==NR' file
/etc/security/limits.d/test
How it works:
-v i="$i" -v k="$k"
These options set the awk variables i and k to the values of the shell variables $i and $k, respectively.
-F::
This sets the field separator to ::.
$1==k {n=NR+i}
If the first field of the current line equals the variable k, then set variable n to the current line number, NR, plus i.
n==NR
If the current line number, NR, is n, then print this line.
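The pieces explained above can be tried on a cut-down input. This is only a sketch: the sample text is shortened from the question's data, the "3::ls /etc/security::" header is made up to keep it brief, and the names sample and picked are invented:

```shell
sample='2::rpm -q prelink::
prelink-0.4.0-2.el5

3::ls /etc/security::
/etc/security/limits.conf
/etc/security/limits.d/test'

i=2   # how many lines below the stanza header to print
k=3   # id of the stanza to select

# When $1 equals k, remember the target line number n = NR + i;
# the bare pattern n==NR then prints exactly that line.
picked=$(printf '%s\n' "$sample" |
    awk -v i="$i" -v k="$k" -F:: '$1==k{n=NR+i} n==NR')
printf '%s\n' "$picked"
```

Here the header is on line 4, so n becomes 6 and the second path is printed.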
With sed:
$ id=3; sed -En "/^$id::/,/^$/{/^[[:blank:]]*\//p}" jenny
/etc/security/limits.conf
/etc/security/limits.d/test
Explanations:
Your shell will interpret the command and replace id by its value.
/^$id::/,/^$/{} the block {} is executed only for the range of lines from one that starts with the value of id followed by :: (/^$id::/) up to an empty line (/^$/).
/^[[:blank:]]*\//p for lines that start with zero or more blank characters (the POSIX [[:blank:]] class, i.e. space or tab) followed by /, print the line. This prints your two paths.
To specify a line:
$ id=3; line=1; sed -En "/^$id::/,/^$/{/^[[:blank:]]*\//p}" jenny | cut -d$'\n' -f"$line"
/etc/security/limits.conf
$ id=3; line=2; sed -En "/^$id::/,/^$/{/^[[:blank:]]*\//p}" jenny | cut -d$'\n' -f"$line"
/etc/security/limits.d/test
$ id=3; line=1; sed -En "/^$id::/,/^$/{/^[[:blank:]]*\//p}" jenny | sed -n "${line}p"
/etc/security/limits.conf
$ id=3; line=2; sed -En "/^$id::/,/^$/{/^[[:blank:]]*\//p}" jenny | sed -n "${line}p"
/etc/security/limits.d/test
Assuming you want to build onto your previous question rather than coming up with a completely different approach
$ awk -v id=3 -v lineNr=1 -v RS= -F '::' '$1==id{ split($3,lines,/\n/); print lines[lineNr+1] }' file
/etc/security/limits.conf
$ awk -v id=3 -v lineNr=2 -v RS= -F '::' '$1==id{ split($3,lines,/\n/); print lines[lineNr+1] }' file
/etc/security/limits.d/test

Linux Grep or Awk to find strings and store into array

I would like to print the strings matching the following pattern and store them in an array. Please help me; I need the output as follows:
test11
orcl
My attempt:
egrep -i ":Y|:N" /etc/oratab | cut -d":" -f1 | grep -v "\#" | grep -v "\*" | tr -d '\n' | sed 's/ /\n/g' | awk '{print $1}'
Output of the above command:
test11orcl
Contents of Oratab will be as follows,
[oracle@rhel6112 scripts]$ cat /etc/oratab
#
# This file is used by ORACLE utilities. It is created by root.sh
# and updated by the Database Configuration Assistant when creating
# a database.
# A colon, ':', is used as the field terminator. A new line terminates
# the entry. Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
# $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
test11:/u01/app/oracle/product/11.2.0/dbhome_1:N
orcl:/u01/app/oracle/product/10.2.0/db_1:N
End of cat output
From the above file I am trying to extract the string before the :/
As a start, try this:
$ cat input.txt
test11:/u01/app/oracle/product/11.2.0/dbhome_1:N
orcl:/u01/app/oracle/product/10.2.0/db_1:N
$ awk -F: '{print $1}' input.txt
test11
orcl
update
Using bash:
#!/bin/bash
ARRAY=()
while read -r line
do
    # Skip comment lines
    [[ "$line" = \#* ]] && continue
    data=$(awk -F: '{print $1}' <<< "$line")
    ARRAY+=("$data")
done < input.txt
for i in "${ARRAY[@]}"
do
    echo "$i"
done
In action:
$ ./db.sh
test11
orcl
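The comment filtering and the field extraction can also be done in a single awk call that reads the oratab format directly. A sketch, using an abbreviated copy of the file from the question held in a variable (oratab_sample and sids are made-up names):

```shell
oratab_sample='#
# Entries are of the form:
#   $ORACLE_SID:$ORACLE_HOME:<N|Y>:
#
test11:/u01/app/oracle/product/11.2.0/dbhome_1:N
orcl:/u01/app/oracle/product/10.2.0/db_1:N'

# Skip lines starting with "#", keep entries whose third
# colon-separated field is Y or N, and print the SID (field 1).
sids=$(printf '%s\n' "$oratab_sample" |
    awk -F: '!/^#/ && $3 ~ /^[YyNn]$/ { print $1 }')
printf '%s\n' "$sids"
```

In bash 4 or later, the same awk command can fill an array in one step with mapfile -t ARRAY < <(awk ... /etc/oratab).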
You could use sed also,
sed -r 's/^([^:]*):.*$/\1/g' file
Example:
$ cat cc
test11:/u01/app/oracle/product/11.2.0/dbhome_1:N
orcl:/u01/app/oracle/product/10.2.0/db_1:N
$ sed -r 's/^([^:]*):.*$/\1/g' cc
test11
orcl
OR
$ sed -nr 's/^(.*):\/.*$/\1/p' file
test11
orcl