How can I match a regex against dynamic input that may contain brackets? I am supplying the input via the bash command line. It comes from another program and sometimes contains brackets, and then my simple good old $0 ~ var construct fails.
Here is my input data:
hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone
Command-1: worked, without brackets around the var. Eg: sense
awk -v var='sense' '$0 ~ var {print "worked"}' input
worked
Command-2: worked, when I used . (dot) in place of brackets ( and ).
awk -v var='no .sense.' '$0 ~ var{print "worked"}' input
worked
Command-3: Here I need to supply input with brackets ( and ). Things go crazy and I get no results. awk silently failed by giving a false negative.
awk -v var='no (sense)' '$0 ~ var {print "worked"}' input
I have already tried $0 ~ var and match($0, var); both exhibit the same behavior. I have also tried the following, but it failed miserably. Because the input var is dynamic and comes from another program, I cannot escape it manually.
awk -v var='no \(sense\)' 'match($0,var){print "worked"}' input
awk: warning: escape sequence `\(' treated as plain `('
awk: warning: escape sequence `\)' treated as plain `)'
The question is: how can I supply an input variable that may contain brackets to awk so that awk can do a sane regex operation on it? Is it just impossible?
TLDR:
when working with the above sample input data, when var is no (sense), it should ONLY return which makes no (sense) to anyone
Better to ditch regex and use plain string search using index function:
awk -v var='no (sense)' 'index($0, var) {print "worked"; exit}' file
worked
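One caveat with a bare index() on the sample data: "piano (sense)" also contains the substring "no (sense)", so both of the last two lines match. A minimal sketch that keeps the literal search but requires the value to stand alone, assuming words are separated by single spaces (not tabs), pads both the line and the value with a space:

```shell
# Sample data from the question, held in a variable for the demo.
input='hello there
this is monk
and this is a random data
piano (sense) is cool
which makes no (sense) to anyone'

# index() searches for a literal substring, so ( and ) need no escaping.
# Padding with spaces stops "piano (sense)" from matching "no (sense)".
result=$(printf '%s\n' "$input" |
    awk -v var='no (sense)' 'index(" " $0 " ", " " var " ")')
printf '%s\n' "$result"
```

With the sample data this prints only the last line. Note that -v still processes backslash escapes in the value, which does not matter here because the value contains none.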
btw if you want to escape then use \\ to escape special characters like this:
awk -v var='(^|[[:blank:]])no \\(sense\\)([[:blank:]]|$)' '
$0 ~ var {print "worked"; exit}' file
However if you must use regex and you cannot pre-escape content of var then you can escape all special characters in the BEGIN block like this:
awk -v var='no (sense)' '
BEGIN {
gsub(/[^_[:alnum:] ]/, "\\\\&", var)
var = "(^|[[:blank:]])" var "([[:blank:]]|$)"
}
$0 ~ var {print "worked"; exit}
' file
worked
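A possible refinement of the BEGIN-block approach, offered as a sketch rather than a definitive fix: escaping every non-alphanumeric character can backfire in gawk, where sequences such as \< and \> are regex operators in their own right. The hypothetical helper ere_escape() below escapes only the ERE metacharacters instead; the test strings are made up for the demo.

```shell
# ere_escape() is a hypothetical helper name, not a built-in.
result=$(printf '%s\n' 'axb (1+1)' 'a.b (1+1)' | awk -v var='a.b (1+1)' '
function ere_escape(s) {
    # Prefix each ERE metacharacter with a backslash so it matches literally.
    gsub(/[][(){}.*+?^$|\\]/, "\\\\&", s)
    return s
}
BEGIN {
    # Same blank-or-edge anchoring as in the answer above.
    re = "(^|[[:blank:]])" ere_escape(var) "([[:blank:]]|$)"
}
$0 ~ re
')
printf '%s\n' "$result"
```

Only the second input line is printed, because the dot in var is now matched literally and no longer matches the x in "axb".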
As an alternative to escaping the characters that have special meaning in ERE, you can consider using a character class:
$ awk -v var='no [(]sense[)]' '$0 ~ var {print "worked"}' file
worked
IMO, [] could be easier to read than escapes in some cases.
INPUT
hello there
this is monk
and this is a random data
which makes no (sense) to anyone
CODE
{m,n,g}awk -v __='no (sense)' '
BEGIN {
gsub("[[-\140!-/\\]{-~:-@]",
"[&]", __)
gsub(/[\\^]/, "\\\\&",__)
OFS = "worked"
FS = "^.*[^[:alpha:]]?"(__)".*$" } NF*=!_<NF'
OUTPUT
worked
To give a sense of what those two gsub() calls do to ASCII:
anything from "!" to "~" that isn't alphanumeric gets
safely "caged" in square brackets,
regardless of whether it's considered metacharacter or not,
which differs among awk flavors.
=
[!] ["] [#] [$] [%] [&] ['] [(]
[)] [*] [+] [,] [-] [.] [/] 0
1 2 3 4 5 6 7 8
9 [:] [;] [<] [=] [>] [?] [@]
A B C D E F G H
I J K L M N O P
Q R S T U V W X
Y Z [[] [\\] []] [\^] [_] [`]
a b c d e f g h
i j k l m n o p
q r s t u v w x
y z [{] [|] [}] [~]
I was trying to mask a file with the commands 'tr' and 'awk', but it fails with the error fatal: cannot open pipe (Too many open pipes). FILE has approximately 1,000,000 records, which is quite a large number.
Below is the code I am trying :-
awk - F "|" - v OFS="|" '{ "echo \""$1"\" | tr \" 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\" \" QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq\"" | get line $1}1' FILE.CSV > test.CSV
It is showing error :-
awk: (FILENAME=- FNR=1019) fatal: cannot open pipe `echo ""TTP_123"" | tr "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" "QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq"' (Too many open pipes)
Please let me know what I am doing wrong here
Also note that any number of columns could be used for masking, at any positions. In this example I have taken column positions 1 and 2, but it could be 3 and 10, or 5, 7 and 25.
Thanks
AJ
First things first, you can't have a space between - and F or v.
I was going to suggest sed, but as you only want to translate the first column, that's not as easy.
Unfortunately, awk doesn't have built-in tr functionality, so you'd have to use the shell like you are and just close the pipe:
awk -F "|" -v OFS="|" '{
command="echo \"\\"$1"\\\" | tr \" 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\" \" QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq\""
command | getline $1
close(command)
}1' FILE.CSV > test.CSV
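To make the role of close() concrete, here is a cut-down sketch of the same pattern with made-up three-line input and a simpler tr mapping. Every distinct command string opens its own pipe, so without close() the open pipes accumulate until awk hits its limit. The sketch assumes the field contains no quotes or other shell metacharacters.

```shell
# Hypothetical sample input; tr a-z A-Z stands in for the real mapping.
result=$(printf '%s\n' 'abc|1' 'def|2' 'ghi|3' | awk -F'|' -v OFS='|' '{
    # Build the shell command for this record.
    cmd = "printf \"%s\\n\" \"" $1 "\" | tr a-z A-Z"
    # Read one line of the command output back into the first field.
    if ((cmd | getline out) > 0) $1 = out
    # Release the pipe before moving on to the next record.
    close(cmd)
} 1')
printf '%s\n' "$result"
```

This still spawns a shell and a tr process per record, so it remains slow for a million lines.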
However, I suggest using perl, which can do field splitting and character translation:
perl -F'\|' -lane '$F[0] =~ tr/0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq/; print join("|", @F)' FILE.CSV > test.CSV
Or, for a shorter command line, just put the program into a file, drop the e in -lane and use the file name instead of the '...' command.
You can do the mapping in awk instead of making a system call for each line, or perhaps simply use:
paste -d'|' <(cut -d'|' -f1 file | tr '0-9' 'a-z') <(cut -d'|' -f2- file)
replace the tr arguments with yours.
This does not answer your question, but you can implement tr as an awk function that would save having to spawn lots of external processes
$ cat tr.awk
function tr(str, from, to, s,i,c,idx) {
s = ""
for (i=1; i<=length(str); i++) {
c = substr(str, i, 1)
idx = index(from, c)
s = s (idx == 0 ? c : substr(to, idx, 1))
}
return s
}
{
print $1, tr($1,
" 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
" QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq")
}
Example:
$ printf "%s\n" hello wor-ld | awk -f tr.awk
hello KGCCN
wor-ld 3N8-CF
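Since the question mentions that the masked columns can vary, the same function can be applied to a list of column numbers. The following is only a sketch: the variable name cols, the sample rows, and the short mapping tables are assumptions for the demo, to be replaced with your own.

```shell
result=$(printf '%s\n' 'abc|keep|123' 'x-y|keep|7' |
    awk -F'|' -v OFS='|' -v cols='1,3' '
# Character-by-character translation, as in tr.awk above.
function tr(str, from, to,    s, i, c, idx) {
    s = ""
    for (i = 1; i <= length(str); i++) {
        c = substr(str, i, 1)
        idx = index(from, c)
        s = s (idx == 0 ? c : substr(to, idx, 1))
    }
    return s
}
# Split the comma-separated column list once.
BEGIN { n = split(cols, col, ",") }
{
    # Mask only the requested columns; all others pass through unchanged.
    for (k = 1; k <= n; k++)
        $(col[k]) = tr($(col[k]), "abcdefghijklmnopqrstuvwxyz0123456789", "ZYXWVUTSRQPONMLKJIHGFEDCBA9876543210")
    print
}')
printf '%s\n' "$result"
```

Passing cols='5,7,25' would mask those columns instead, with no change to the program.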
I was trying to get the total number of "??", " M", "A" and "D" from this:
?? this is a sentence
M this is another one
A more text here
D more and more text
I have this sample line of code, but it doesn't work:
awk -v pattern="\?\?" '{$1 == pattern} END{print " "FNR}'
$ awk '{ print $1 }' file | sort | uniq -c
1 ??
1 A
1 D
1 M
If for some reason you want an awk-only solution:
awk '{ ++cnt[$1] } END { for (i in cnt) print cnt[i], i }' file
but I think that's needlessly complicated compared to using the built-in unix tools that already do most of the work.
If you just want to count one particular value:
awk -v value='??' '$1 == value' file | wc -l
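If you would rather not pipe into wc, the count can be kept inside awk. A small sketch using the sample lines from the question; the n + 0 forces a numeric 0 when nothing matches:

```shell
count=$(printf '%s\n' \
    '?? this is a sentence' \
    ' M this is another one' \
    'A more text here' \
    'D more and more text' |
    awk -v value='??' '$1 == value { n++ } END { print n + 0 }')
printf '%s\n' "$count"
```

Because this is a string comparison, the question marks need no escaping at all.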
If you want to count only a subset of values, you can use a regex:
$ awk -v pattern='A|D|(\\?\\?)' '$1 ~ pattern { print $1 }' file | sort | uniq -c
1 ??
1 A
1 D
Here you do need to send a \ in order that the ?s are escaped within the regular expression. And because the \ is itself a special character within the string being passed to awk, you need to escape it first (hence the double backslash).
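If the doubled backslashes are a nuisance, one alternative is to pass the pattern through the environment: values read from ENVIRON are not subject to the escape-sequence processing that -v applies, so a single backslash is enough. A minimal sketch, assuming an environment variable named pattern (the name is arbitrary) and a pattern anchored to the whole first field:

```shell
matches=$(printf '%s\n' \
    '?? this is a sentence' \
    'A more text here' \
    'M this is another one' |
    pattern='^(A|\?\?)$' awk '
        # ENVIRON delivers the string exactly as the shell passed it.
        BEGIN { re = ENVIRON["pattern"] }
        $1 ~ re { print $1 }')
printf '%s\n' "$matches"
```

This prints ?? and A, one per line, and skips the M line.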
I have a command that gets the next ID of a table from a pool of sql files, now I am trying to put this command as an alias in ~/.bashrc using a shell function, but I did not figure out how to escape $ so it gets to awk and not replaced by bash, here's the code in .bashrc:
function nextval () {
grep 'INSERT INTO \""$1"\"' *.sql | \
awk '{print $6}' | \
cut -c 2- | \
awk -F "," '{print $1}' | \
sort -n | \
tail -n 1 | \
awk '{print $0+1}'
}
alias nextval=nextval
Usage: # nextval tablename
Escaping with \$, I get the error: awk: backslash not last character on line.
The $ is not inside double quotes, so why is bash replacing it?
Perhaps the part you really need to change is this
'INSERT INTO \""$1"\"'
to
"INSERT INTO \"$1\""
@konsolebox answered your question, but you could also write the function without so many tools and pipes, e.g.:
function nextval () {
awk -v tbl="$1" '
$0 ~ "INSERT INTO \"" tbl "\"" {
split( substr($6,2), a, /,/ )
val = ( ((val == "") || (a[1] > val)) ? a[1] : val)
}
END { print val+1 }
' *.sql
}
It's hard to tell if the above is 100% correct without any sample input or expected output to test it against but it should be close.
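On the quoting question itself, a short sketch shows what the shell hands to grep in each form of the pattern. The table name users is a hypothetical stand-in for the function argument.

```shell
set -- users   # simulate calling the function with "users" as $1

# Single quotes: nothing is expanded, so $1 and the backslashes stay literal.
single='INSERT INTO \""$1"\"'
# Double quotes: $1 is expanded and \" becomes a literal double quote.
double="INSERT INTO \"$1\""

printf '%s\n' "$single" "$double"
```

The first line printed is the unexpanded text INSERT INTO \""$1"\", the second is INSERT INTO "users". The $6 and $0 inside the single-quoted awk programs were never touched by bash; the part that went wrong was the grep pattern.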
I have a file (user.csv) like this:
ip,hostname,user,group,encryption,aduser,adattr
I want to print all columns, sorted by user.
I tried awk -F ":" '{print|"$3 sort -n"}' user.csv, but it doesn't work.
How about just sort.
sort -t, -nk3 user.csv
where
-t, - defines your delimiter as ,.
-n - gives you numerical sort. Added since you added it in your
attempt. If your user field is text only then you don't need it.
-k3 - defines the field (key). user is the third field.
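Because the file starts with a header line, plain sort will move the header in among the data. One way around that, sketched here with made-up rows and without -n since the user names are text, is to read the header with the shell first and sort only the rest:

```shell
# Hypothetical sample rows in the layout from the question.
csv='ip,hostname,user,group,encryption,aduser,adattr
192.168.0.2,server,zoe,admin,-,-,-
192.168.0.1,gw,adam,router,-,-,-'

sorted=$(printf '%s\n' "$csv" | {
    IFS= read -r header          # consume only the first line
    printf '%s\n' "$header"      # emit the header unchanged
    sort -t, -k3,3               # sort the remaining lines by field 3 only
})
printf '%s\n' "$sorted"
```

Writing the key as -k3,3 limits the comparison to the third field; a bare -k3 means "from field 3 to the end of the line".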
Use awk to put the user ID in front.
Sort
Use sed to remove the duplicate user ID, assuming user IDs do not contain any spaces.
awk -F, '{ print $3, $0 }' user.csv | sort | sed 's/^[^ ]* //'
Seeing as that the original question was on how to use awk and every single one of the first 7 answers use sort instead, and that this is the top hit on Google, here is how to use awk.
Sample net.csv file with headers:
ip,hostname,user,group,encryption,aduser,adattr
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.4,ws-04,user,user,-,-,-
And sort.awk:
#!/usr/bin/awk -f
# usage: ./sort.awk -v f=FIELD FILE
BEGIN {
FS=","
}
# each line
{
a[NR]=$0 ""
s[NR]=$f ""
}
END {
isort(s,a,NR);
for(i=1; i<=NR; i++) print a[i]
}
#insertion sort of A[1..n]
function isort(S, A, n, i, j) {
for( i=2; i<=n; i++) {
hs = S[j=i]
ha = A[j=i]
while (S[j-1] > hs) {
j--;
S[j+1] = S[j]
A[j+1] = A[j]
}
S[j] = hs
A[j] = ha
}
}
To use it:
awk -f sort.awk f=3 < net.csv # OR
chmod +x sort.awk
./sort.awk f=3 net.csv
You can choose a delimiter, in this case I chose a colon and printed the column number one, sorting by alphabetical order:
awk -F\: '{print $1|"sort -u"}' /etc/passwd
awk -F, '{ print $3, $0 }' user.csv | sort -nk1,1
and for reverse order
awk -F, '{ print $3, $0 }' user.csv | sort -nrk1,1
try this -
awk '{print $0|"sort -t',' -nk3 "}' user.csv
OR
sort -t',' -nk3 user.csv
awk -F "," '{print $0}' user.csv | sort -nk3 -t ','
This should work
To exclude the first line (header) from sorting, I split it out into two buffers.
df | awk 'BEGIN{header=""; body=""} { if(NR==1){header=$0}else{body=(body==""?$0:body"\n"$0)}} END{print header; print body|"sort -nk3"}'
With GNU awk:
awk -F ',' '{ a[$3]=$0 } END{ PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i] }' file
See 8.1.6 Using Predefined Array Scanning Orders with gawk for more sorting algorithms.
I'm running Linux (Ubuntu) with mawk:
tmp$ awk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan
random-funcs: srandom/random
regex-funcs: internal
compiled limits:
sprintf buffer 8192
maximum-integer 2147483647
mawk (and gawk) has an option to redirect the output of print to a command. From man awk chapter 9. Input and output:
The output of print and printf can be redirected to a file or command by appending > file, >> file or | command to the end of the print statement. Redirection opens file or command only once, subsequent redirections append to the already open stream.
Below you'll find a simplified example of how | can be used to pass the wanted records to an external program that does the hard work. This also nicely encapsulates everything in a single awk file and reduces command-line clutter:
tmp$ cat input.csv
alpha,num
D,4
B,2
A,1
E,5
F,10
C,3
tmp$ cat sort.awk
# print header line
/^alpha,num/ {
print
}
# all other lines are data lines that should be sorted
!/^alpha,num/ {
print | "sort --field-separator=, --key=2 --numeric-sort"
}
tmp$ awk -f sort.awk input.csv
alpha,num
A,1
B,2
C,3
D,4
E,5
F,10
See man sort for the details of the sort options:
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
-k, --key=KEYDEF
sort via a key; KEYDEF gives location and type
-n, --numeric-sort
compare according to string numerical value
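One thing to keep in mind with print | command: since the command is opened once and kept open, sort only produces its output when the pipe is closed, which normally happens when awk exits. If something has to be printed after the sorted rows, closing the pipe explicitly first should give the expected order. A small sketch with made-up rows and a hypothetical "rows:" footer:

```shell
result=$(printf '%s\n' 'D,4' 'B,2' 'F,10' 'A,1' | awk '
BEGIN { cmd = "sort -t, -k2,2n" }   # numeric sort on the second field
{ print | cmd }                     # every record goes to the same sort process
END {
    close(cmd)                      # wait for sort to finish and write its output
    print "rows: " NR               # printed only after the sorted rows
}')
printf '%s\n' "$result"
```

Keeping the command string in a variable ensures that the string passed to close() is identical to the one used in the print redirection.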