How to count spaces between columns - awk

How can I count the number of spaces (16) between S1, and // in the following line:
S1,                // name

One way:
awk -F '//' '{ n = gsub(/ /, "", $1); print n }'
echo 'S1,                // name' | awk -F '//' '{ n = gsub(/ /, "", $1); print n }'

If you really want awk then you can build on the following.
$ echo "S1,                // name" | awk '{x=gsub(/ /," ",$0); print x}'
gsub returns the number of replacements made. Obviously this regex will also find and count other spaces, but you get the point.
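Building on the gsub() return value: to count only the 16 spaces between the two markers, apply gsub() to that slice of the line instead of to $0. Below is a minimal sketch in POSIX awk. The shell function count_between and its START/STOP arguments are hypothetical names chosen for illustration; they are not part of the answer.

```shell
# Hypothetical helper: count the spaces between the first START marker and the
# next STOP marker on each input line (POSIX awk: index, substr, gsub).
count_between() {
    awk -v start="$1" -v stop="$2" '
    {
        s = index($0, start)
        if (!s) { print 0; next }              # START not on this line
        rest = substr($0, s + length(start))   # text after START
        e = index(rest, stop)
        if (e) rest = substr(rest, 1, e - 1)   # cut at STOP, if present
        print gsub(/ /, "", rest)              # gsub returns how many spaces it removed
    }'
}

echo 'S1,                // name' | count_between 'S1,' '//'
```

If STOP does not occur after START, every space after START is counted. A line without START prints 0.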
Or try something like this:
$ echo "S1,                // name" | awk -F'[,/]' '{ for (i=1;i<=NF;i++) print "$"i " is \""$i"\" of length, " length($i);}'
$1 is "S1" of length, 2
$2 is "                " of length, 16
$3 is "" of length, 0
$4 is " name" of length, 5

Count all spaces between S1, and // only with awk:
$ echo 'S1,                // name' | awk -F'[,/]' '{print length($2)}'
Or a method based on fedorqui's comment:
$ echo 'S1,                // name' | grep -Po '(?<=S1,) *(?=//)' | wc -L

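Pure bash
x='S1,                // name'
x=${x#S1,}      # strip the leading "S1,"
x=${x%%//*}     # strip the first "//" and everything after it
echo ${#x}      # characters left between the markers: 16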
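A pure-shell variant can count only the space characters (not every character) between S1, and //. Parameter expansion trims the string to the part between the markers, and a loop counts the spaces in it. This sketch uses POSIX sh, so it also runs in bash. spaces_between is a hypothetical function name, and the markers are hard-coded as an assumption.

```shell
# Hypothetical helper: spaces_between STRING counts the space characters
# between "S1," and the first "//" after it (POSIX sh parameter expansion).
spaces_between() {
    between=${1#*S1,}          # drop everything up to and including the first "S1,"
    between=${between%%//*}    # drop the first "//" and everything after it
    count=0
    while [ -n "$between" ]; do
        case $between in
            ' '*) count=$((count + 1)) ;;   # first character is a space
        esac
        between=${between#?}   # drop the first character
    done
    echo "$count"
}

spaces_between 'S1,                // name'
```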


How to print symbols instead of numbers using awk in bash

I have an input file.
If I use echo "$file" | awk -F';' '{a[$1]+= 1} END{for(i in a){printf "%-5s: %s\n", i, a[i]}}' | sort
My output will be
AD : 3
EL : 2
GR : 1
But I want to get something like this and I have no idea how to do it
AD : ###
EL : ##
GR : #
Could anyone help me?
A little function
awk -F';' '
    {a[$1] += 1}
    function repeat(char, num,    s) {
        s = sprintf("%*s", num, "")
        gsub(/ /, char, s)
        return s
    }
    END {
        for (i in a)
            printf "%-5s: %s\n", i, repeat("#", a[i])
    }
' file | sort
AD : ###
EL : ##
GR : #
yet another awk
$ awk -F';' '{a[$1]++}
function repeat(n,c)
{return (n<=0)?"":(c repeat(n-1,c))}
END {for(k in a) printf "%-5s: %s\n",k,repeat(a[k],"#")}' file | sort
AD : ###
EL : ##
GR : #
or with memoization
awk -F';' '{a[$1]++}
function repeat(n,c)
{return (n<=0)?"":(c memoize(n-1,c))}
function memoize(n,c)
{if(!(n in mem)) mem[n]=repeat(n,c); return mem[n]}
END {for(k in a) printf "%-5s: %s\n",k,memoize(a[k],"#")}' file | sort
At the cost of additional complexity, this should be much faster for large counts:
awk -F';' '{a[$1]++}
function repeat(n,c, _t_)
{if(n<=0) return "";
else if(n%2) return c memoize(n-1,c);
else {_t_=memoize(n/2,c); return _t_ _t_}}
function memoize(n,c)
{if(!(n in mem)) mem[n]=repeat(n,c); return mem[n]}
END {for(k in a) printf "%-5s: %s\n",k,memoize(a[k],"#")}' file | sort
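The doubling trick above can also be written without recursion or a cache. Walk the bits of n, appending the current block when a bit is set and doubling the block at each step. That needs only on the order of log2(n) concatenations. This iterative variant is swapped in for illustration and is not the answer's code; repeat_str is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: repeat_str STRING N prints STRING repeated N times.
# Iterative doubling: O(log N) concatenations, no recursion, no cache.
repeat_str() {
    awk -v c="$1" -v n="$2" '
        function repeat(s, k,    out) {
            out = ""
            while (k > 0) {
                if (k % 2) out = out s   # low bit set: append the current block
                s = s s                  # double the block for the next bit
                k = int(k / 2)
            }
            return out
        }
        BEGIN { print repeat(c, n) }'
}

repeat_str '#' 5
```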
echo "$file" | awk -F';' '{a[$1]+= 1} END{ for(i in a){ printf "%s : ",i; for (j=1;j<=a[i];j++) { printf "%s","#" } printf "\n" }}' | sort
Print the array index (the key), then loop from 1 to the value stored at that index (the count), printing one "#" per iteration.
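The same nested-loop idea can be packaged so that it reads the semicolon-separated records directly. This version uses only POSIX awk, with no %*s, gensub() or PROCINFO. Below is a minimal sketch. The function name hist and the sample records are made up for illustration, since the question's actual input file is not shown.

```shell
# Hypothetical helper: hist [FILE...] reads KEY;... records and prints "KEY  : ###".
hist() {
    awk -F';' '
        { count[$1]++ }                      # tally the first field
        END {
            for (key in count) {
                bar = ""
                for (j = 1; j <= count[key]; j++)
                    bar = bar "#"            # one "#" per occurrence
                printf "%-5s: %s\n", key, bar
            }
        }' "$@" | sort                       # for (key in count) order is unspecified
}

printf '%s\n' 'AD;1' 'EL;2' 'AD;3' 'GR;4' 'EL;5' 'AD;6' | hist
```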
Using GNU awk's array-traversal sorting avoids the need to pipe through sort (here the keys come out by count, largest first):
echo "$file" | awk -F';' '{a[$1]+= 1} END{ PROCINFO["sorted_in"]="@val_num_desc"; for(i in a){ printf "%s : ",i; for (j=1;j<=a[i];j++) { printf "%s","#" } printf "\n" }}'
This is similar to @glennjackman's answer, but it uses the * field-width modifier to build a string of spaces (b) and then converts the spaces to '#' with gsub() rather than calling repeat(). You can do:
awk -F';' '
    { a[$1] += 1 }
    END {
        for (i in a) {
            b = sprintf ("%*s",a[i]," ")
            gsub (/ /,"#",b)
            printf "%-5s: %s\n", i, b
        }
    }
' file | sort
Example Use/Output
$ awk -F';' '
> { a[$1] += 1 }
> END {
> for (i in a) {
> b = sprintf ("%*s",a[i]," ")
> gsub (/ /,"#",b)
> printf "%-5s: %s\n", i, b
> }
> }
> ' file | sort
AD : ###
EL : ##
GR : #
They do close to the same thing (but if I'd snapped to the use of repeat() earlier, I'd probably have gone that route :). Let me know if you have questions.
With GNU awk for gensub():
$ cut -d';' -f1 file | sort | uniq -c |
awk '{printf "%-5s: %s\n", $2, gensub(/ /,"#","g",sprintf("%*s",$1,""))}'
AD : ###
EL : ##
GR : #
or with any awk:
$ cut -d';' -f1 file | sort | uniq -c |
awk '{str=sprintf("%*s",$1,""); gsub(/ /,"#",str); printf "%-5s: %s\n", $2, str}'
AD : ###
EL : ##
GR : #
I want to propose a small change to the original code to get the desired result: simply append # instead of keeping a count. With file.txt holding the question's input:
awk 'BEGIN{FS=";"}{a[$1]=a[$1] "#"}END{for(i in a){printf "%-5s: %s\n", i, a[i]}}' file.txt
EL : ##
AD : ###
GR : #
For simplicity's sake I left out the echo and sort parts, as these remain unchanged.
(tested in gawk 4.2.1)

AWK: How to number auto-increment?

I have a file. The file content is:
I use this awk command:
awk -F"|" 'NR==1{print $1};FNR==2{print $2,$3}' testfile
and get the following result:
20210126000000000000000000002207 1220210126080109
I want the number to auto-increment, so I tried:
awk -F"|" 'NR==1{print $1+1};FNR==2{print $2+1,$3+1}' testfile
But I get this result instead:
20210126000000000944237587726336 1220210126080110
My question: I want the number to auto-increment; the result I hope for is:
How can I auto-increment it?
You may try this GNU awk command (-M turns on arbitrary-precision arithmetic):
awk -M 'BEGIN {FS=OFS="|"} NR == 1 {hdr = $1; next} NF>2 {print ++hdr; print $2, $3; print "-------------------"}' file
A more readable version:
awk -M 'BEGIN {
    FS = OFS = "|"
}
NR == 1 {
    hdr = $1
    next
}
NF > 2 {
    print ++hdr
    print $2, $3
    print "-------------------"
}' file
Here is a POSIX awk solution that doesn't need -M:
awk 'BEGIN {FS=OFS="|"} NR == 1 {hdr = $1; next} NF>2 {"echo " hdr " + 1 | bc" | getline hdr; print hdr; print $2, $3; print "-------------------"}' file
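A third route needs neither -M nor an external bc: keep the number as a string of digits and add 1 by hand, carrying from the rightmost digit. Below is a minimal sketch. It assumes the field holds only the digits 0-9 (no sign, spaces or decimal point); incr and incr_str are hypothetical names.

```shell
# Hypothetical helper: incr_str NUMBER prints NUMBER + 1 for digit strings of any length.
incr_str() {
    awk -v num="$1" '
        function incr(s,    i, d, out) {
            out = ""
            for (i = length(s); i >= 1; i--) {
                d = substr(s, i, 1) + 1          # add the carry to this digit
                if (d < 10)                      # no further carry: done
                    return substr(s, 1, i - 1) d out
                out = "0" out                    # 9 + 1: write 0 and keep carrying
            }
            return "1" out                       # carried past the leftmost digit
        }
        BEGIN { print incr(num) }'
}

incr_str 20210126000000000000000000002207
```

Inside the question's program, the function would be called as hdr = incr(hdr) in place of ++hdr.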
Anubhava has the best solution but for older versions of GNU awk that don't support -M (big numbers) you can try the following:
awk -F\| 'NR==1 { print $1;hed=$1;hed1=substr($1,(length($1)-1));next; } !/^$/ {print $2" "$3 } /^$/ { print "--------------------------------------------------";printf "%s%s\n",substr(hed,1,((length(hed))-(length(hed1)+1))),++hed1 }' testfile
awk -F\| 'NR==1 {                       # Set field delimiter to | and process the first line
    print $1;                           # Print the first field
    hed=$1;                             # Set the variable hed to the first field
    hed1=substr($1,(length($1)-1));     # Set a counter variable hed1 to the last two characters of hed ($1)
    next;                               # Move on to the next line
}
!/^$/ {
    print $2" "$3                       # Where there is no blank line, print the second field, a space and the third field
}
/^$/ {
    print "--------------------------------------------------";   # Where there is a blank line, print a separator
    printf "%s%s\n",substr(hed,1,((length(hed))-(length(hed1)+1))),++hed1   # Print the header part before the counter, followed by the incremented counter
}' testfile

Awk column with pattern array

Is it possible to do this, but use an actual array of strings where it says "array"?
awk -F "," '{ if ( $5!="array" ) { print $0; } }' file
I would like to use spaces in some of the strings in my array.
I would also like to be able to match partial matches, so "snow" in my array would match "snowman"
It should be case sensitive.
Example csv
1,african elephant,gd
A,African Elephant,33
8,indian elephant,3k
Example array
african elephant
Expected output
1,african elephant,gd
Cyrus posted this, which works well, but it doesn't allow spaces in the array strings and won't match partial matches.
echo "${array[@]}" | awk 'FNR==NR{len=split($0,a," "); next} {for(i=1;i<=len;i++) {if(a[i]==$2){next}} print}' FS=',' - file
The brief approach using a single regexp for all array contents:
$ array=('snow' 'dog' 'african elephant')
$ printf '%s\n' "${array[@]}" | awk -F, 'NR==FNR{r=r s $0; s="|"; next} $2~r' - example.csv
1,african elephant,gd
Or if you prefer string comparisons:
$ cat
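#!/usr/bin/env bash
array=('snow' 'dog' 'african elephant')
printf '%s\n' "${array[@]}" |
awk -F',' '
    NR==FNR { array[$0]; next }       # first input (stdin): one array element per line
    {
        for (val in array) {
            if ( index($2,val) ) {    # or $2 ~ val for a regexp match
                print
                next
            }
        }
    }
' - example.csv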
$ ./
1,african elephant,gd
This skips every line of the CSV file whose column 5 exactly equals an element of the array:
echo "${array[@]}" | awk 'FNR==NR{len=split($0,a," "); next} {for(i=1;i<=len;i++) {if(a[i]==$5){next}} print}' FS=',' - file
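echo joins the array elements with spaces, and split() then cuts them apart again, so an element such as african elephant can never match. Printing one element per line keeps each element intact. Below is a minimal sketch of the exact-match exclusion done that way; exclude_col and its COLUMN/FILE arguments are hypothetical.

```shell
# Hypothetical helper: exclude_col COLUMN FILE ELEMENT...
# Prints the CSV rows of FILE whose COLUMN is not exactly one of the ELEMENTs.
exclude_col() {
    col=$1 file=$2
    shift 2
    printf '%s\n' "$@" |                # one element per line, so spaces survive
    awk -F',' -v col="$col" '
        NR == FNR { skip[$0]; next }    # first input (stdin): the elements
        !($col in skip)                 # second input: print rows not in the list
    ' - "$file"
}

csv=$(mktemp)
printf '%s\n' '1,african elephant,gd' 'A,African Elephant,33' '8,indian elephant,3k' > "$csv"
exclude_col 2 "$csv" 'african elephant' 'snow'
rm -f "$csv"
```

The comparison is an exact, case-sensitive string match, so African Elephant is kept.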

Check for multi-line content in a file

I'm trying to check if a multi-line string exists in a file using common bash commands (grep, awk, ...).
I want to have a file with a few lines (plain lines, not patterns) that should exist in another file, and create a command (sequence) that checks whether they do. If grep could accept arbitrary multi-line patterns, I'd do it with something similar to
grep "`cat contentfile`" targetfile
As with grep I'd like to be able to check the exit code from the command. I'm not really interested in the output. Actually no output would be preferred since then I don't have to pipe to /dev/null.
I've searched for hints, but can't come up with a search that gives any good hits. There's How can I search for a multiline pattern in a file?, but that is about pattern matching.
I've found pcre2grep, but need to use "standard" *nix tools.
Content file:
line 3
line 4
line 5
Target file:
line 1
line 2
line 3
line 4
line 5
line 6
This should match and return 0 since the sequence of lines in the content file is found (in the exact same order) in the target file.
EDIT: Sorry for not being clear about the "pattern" vs. "string" comparison and the "output" vs. "exit code" in the previous versions of this question.
You didn't say whether you want a regexp match or a string match, and we can't tell. You named your search file "patternfile", and a "pattern" could mean anything. At one point you imply you want a string match (check if a multi-line _string_ exists), but then you use grep and pcre2grep with no stated arguments for string rather than regexp matches.
In any case, these will do whatever it is you want using any awk (which includes POSIX standard awk and you said you wanted to use standard UNIX tools) in any shell on every UNIX box:
For a regexp match:
$ cat tst.awk
NR==FNR { pat = pat $0 ORS; next }
{ tgt = tgt $0 ORS }
END {
    while ( match(tgt,pat) ) {
        printf "%s", substr(tgt,RSTART,RLENGTH)
        tgt = substr(tgt,RSTART+RLENGTH)
    }
}
$ awk -f tst.awk patternfile targetfile
line 3
line 4
line 5
For a string match:
$ cat tst.awk
NR==FNR { pat = pat $0 ORS; next }
{ tgt = tgt $0 ORS }
END {
    lgth = length(pat)
    while ( beg = index(tgt,pat) ) {
        printf "%s", substr(tgt,beg,lgth)
        tgt = substr(tgt,beg+lgth)
    }
}
$ awk -f tst.awk patternfile targetfile
line 3
line 4
line 5
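The question asks for an exit status rather than output, so the string-match approach can be turned into a quiet check. Prefixing both strings with ORS before calling index() also forces the match to start at a line boundary, so xline 3 does not count as line 3. Below is a minimal sketch, assuming a non-empty content file; contains_block is a hypothetical name.

```shell
# Hypothetical helper: contains_block CONTENTFILE TARGETFILE
# Exit status 0 if CONTENTFILE's lines appear consecutively, as whole lines,
# in TARGETFILE; 1 otherwise. Nothing is printed.
contains_block() {
    awk '
        NR == FNR { pat = pat $0 ORS; next }   # content file
        { tgt = tgt $0 ORS }                   # target file
        END { exit !(pat != "" && index(ORS tgt, ORS pat)) }
    ' "$1" "$2"
}

content=$(mktemp); target=$(mktemp)
printf 'line %s\n' 3 4 5 > "$content"
printf 'line %s\n' 1 2 3 4 5 6 > "$target"
if contains_block "$content" "$target"; then echo "found"; else echo "not found"; fi
rm -f "$content" "$target"
```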
Having said that, with GNU awk you could do the following if you're OK with a regexp match and backslash interpretation of the patternfile contents (so \t is treated as a literal tab):
$ awk -v RS="$(cat patternfile)" 'RT!=""{print RT}' targetfile
line 3
line 4
line 5
or with GNU grep:
$ grep -zo "$(cat patternfile)" targetfile | tr '\0' '\n'
line 3
line 4
line 5
There are many other options depending on what kind of match you're really trying to do and which tools versions you have available.
EDIT: Since the OP needs the outcome of the command as true or false (yes or no), I have edited the command accordingly (created and tested in GNU awk).
awk -v message="yes" 'FNR==NR{a[$0];next} ($0 in a){if((FNR-1)==prev){b[++k]=$0} else {delete b;k=""}} {prev=FNR}; END{if(length(b)>0){print message}}' patternfile targetfile
Could you please try the following? It was tested with the given samples and should print all consecutive lines from the pattern file if they appear in the same order in the target file (this code needs at least 2 consecutive lines).
awk '
FNR==NR{ a[$0]; next }
($0 in a){ if((FNR-1)==prev){ b[++k]=$0 } else { delete b; k="" } }
{ prev=FNR }
END{ for(j=1;j<=k;j++){ print b[j] } }
' patternfile targetfile
Explanation: Adding explanation for above code here.
awk '                   ##Starting awk program here.
FNR==NR{                ##FNR==NR will be TRUE when first Input_file is being read.
  a[$0]                 ##Creating an array a with index $0.
  next                  ##next will skip all further statements from here.
}
($0 in a){              ##Statements from here will be executed when 2nd Input_file is being read, checking if current line is present in array a.
  if((FNR-1)==prev){    ##Checking condition if prev variable is equal to FNR-1 value then do following.
    b[++k]=$0           ##Creating an array named b whose index is variable k, whose value is incremented by 1 each time it comes here.
  }
  else{                 ##Mentioning else condition here.
    delete b            ##Deleting array b here.
    k=""                ##Nullifying k here.
  }
}
{
  prev=FNR              ##Setting prev value as FNR value here.
}
END{                    ##Starting END section of this awk program here.
  for(j=1;j<=k;j++){    ##Starting a for loop here.
    print b[j]          ##Printing value of array b whose index is variable j here.
  }
}' patternfile targetfile   ##mentioning Input_file names here.
another solution in awk:
echo $(awk 'FNR==NR{ a[$0]; next}{ x=($0 in a)?x+1:0 }x==length(a){ print "OK" }' patternfile targetfile )
This returns "OK" if there is a match.
a one-liner:
$ if [ $(diff --left-column -y patternfile targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1 | wc -l) == $(cat patternfile | wc -l) ]; then echo "ok"; else echo "error"; fi
First, compare the two files using diff:
diff --left-column -y patternfile targetfile
> line 1
> line 2
line 3 (
line 4 (
line 5 (
> line 6
Then filter to show only the interesting lines: the lines containing '(', plus one extra line before and after each match. This checks that the lines from patternfile match without a break.
diff --left-column -y patternfile targetfile | grep '(' -A1 -B1
> line 2
line 3 (
line 4 (
line 5 (
> line 6
Then leave out the first and last lines:
diff --left-column -y patternfile targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1
line 3 (
line 4 (
line 5 (
Add some code to check whether the number of lines matches the number of lines in patternfile:
if [ $(diff --left-column -y patternfile targetfile | grep '(' -A1 -B1 | tail -n +2 | head -n -1 | grep '(' | wc -l) == $(cat patternfile | wc -l) ]; then echo "ok"; else echo "error"; fi
To use this with a return code, a bash script could be created like this:
patternfile=$1; targetfile=$2
if [ $(diff --left-column -y "$patternfile" "$targetfile" | grep '(' -A1 -B1 | tail -n +2 | head -n -1 | grep '(' | wc -l) == $(cat "$patternfile" | wc -l) ]; then
    exit 0
else
    exit 1
fi
The test (when the above script is named comparepatterns):
$ comparepatterns patternfile targetfile
$ echo $?
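The easiest way to do this is with a sliding window. First read the pattern file, then the file to search. Once at least n lines of the search file are buffered, compare the last n of them with the pattern:
(FNR==NR) { a[FNR]=$0; n=FNR; next }   # store the n pattern lines
{ b[FNR]=$0 }                           # buffer the current search-file line
(FNR < n) { next }                      # window not full yet
(FNR >= n) { for(i=1; i<=n;++i) if (a[i] != b[FNR-n+i]) { delete b[FNR-n+1]; next}}
{ print "match at", FNR-n+1}
{ r=1}
END{ exit !r}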
which you call as
awk -f script.awk patternFile searchFile
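If only the first match and the exit status matter, the window can stop at the first hit. It can also keep at most n search-file lines in memory by indexing the buffer modulo n. Below is a sketch of that variation, assuming a non-empty pattern file. The ring buffer and early exit are changes made for this sketch and are not part of the answer; first_match is a hypothetical name.

```shell
# Hypothetical helper: first_match PATTERNFILE SEARCHFILE
# Prints the line number where PATTERNFILE's lines first occur consecutively
# in SEARCHFILE and exits 0; exits 1 if they never do.
first_match() {
    awk '
        FNR == NR { a[FNR] = $0; n = FNR; next }      # pattern lines a[1..n]
        { b[FNR % n] = $0 }                            # ring buffer of the last n lines
        FNR >= n {
            for (i = 1; i <= n; i++)
                if (a[i] != b[(FNR - n + i) % n]) next # window differs
            print FNR - n + 1                          # first line of the match
            found = 1
            exit                                       # END still runs
        }
        END { exit !found }
    ' "$1" "$2"
}

pat=$(mktemp); txt=$(mktemp)
printf 'line %s\n' 3 4 5 > "$pat"
printf 'line %s\n' 1 2 3 4 5 6 > "$txt"
first_match "$pat" "$txt"; echo "exit status: $?"
rm -f "$pat" "$txt"
```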
Following up on a comment from Cyrus, who pointed to How to know if a text file is a subset of another, the following Python one-liner does the trick
python -c "content=open('content').read(); target=open('target').read(); exit(0 if content in target else 1);"
Unless you're talking about 10 GB+, here's an awk-based solution that's fast and clean:
mawk '{ exit NF==NR }' RS='^$' FS="${multiline_pattern}"
The pattern exists only in the file "${m2p}", which is included in the multi-file pipeline of the 1st test but not the 2nd one.
This solution, for now, doesn't auto handle instances where regex meta-character escaping is needed. Alter it as you see fit.
Unless the pattern occurs far too often, it might even save time to do it all at once instead of having to check line-by-line, including saving lines along the way in some temp pattern space.
NR is always 1 there, since RS='^$' only matches at the very end of the input, so the whole input becomes a single record. NF is larger than 1 only when the pattern is found. Evaluating exit NF == NR therefore inverts the match so that it follows POSIX exit-code conventions (0 when found, 1 when not).
% echo; ( time ( \
echo "\n\n multi-line-pattern :: \n\n " \
"-------------\n${multiline_pattern}\n" \
" -----------\n\n " \
"$( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m2p}" \
"${m3r}" "${m3supp}" "${m3t}" | pvE0 \
| mawk2 '{ exit NF == NR
}' RS='^$' \
FS="${multiline_pattern}" \
) exit code : ${?} " ) ) | ecp
in0: 3.10GiB 0:00:01 [2.89GiB/s] [2.89GiB/s] [ <=> ]
( echo ; ) 0.77s user 1.74s system 110% cpu 2.281 total
multi-line-pattern ::
exit code : 0
% echo; ( time ( \
echo "\n\n multi-line-pattern :: \n\n " \
"-------------\n${multiline_pattern}\n" \
" -----------\n\n " \
"$( nice gcat "${m2m}" "${m3m}" "${m3l}" \
"${m3r}" "${m3supp}" "${m3t}" | pvE0 \
| mawk2 '{ exit NF == NR
}' RS='^$' \
FS="${multiline_pattern}" \
) exit code : ${?} " ) ) | ecp
in0: 2.95GiB 0:00:01 [2.92GiB/s] [2.92GiB/s] [ <=> ]
( echo ; ) 0.64s user 1.65s system 110% cpu 2.074 total
multi-line-pattern ::
exit code : 1
If your pattern is an entire file, something like the following works. Even with the full file used as a single gigantic 153 MB pattern, it finished in under 2.4 seconds against ~3 GB of input.
( time ( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m3r}" "${m3supp}" "${m3t}" | pvE0 \
| mawk2 -v pattern_file="${m2p}" '
BEGIN {
RS = "^$"
getline FS < pattern_file
} END {
exit NF == NR }' ; echo "\n\n exit code :: $?\n\n" ))|ecp;
du -csh "${m2p}" ;
( time ( nice gcat "${m2m}" "${m3m}" "${m3l}" \
"${m2p}" "${m3r}" "${m3supp}" "${m3t}" | pvE0 \
| mawk2 -v pattern_file="${m2p}" '
BEGIN {
RS = "^$"
getline FS < pattern_file
} END {
exit NF == NR }' ; echo "\n\n exit code :: $?\n\n" ))|ecp;
in0: 2.95GiB 0:00:01 [2.58GiB/s] [2.58GiB/s] [ <=> ]
( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m3r}" "${m3supp}" "${m3t}" | pvE 0.)
0.82s user 1.71s system 111% cpu 2.260 total
exit code :: 1
153M /Users/************/m2map_main.txt
153M total
in0: 3.10GiB 0:00:01 [2.56GiB/s] [2.56GiB/s] [ <=> ]
( nice gcat "${m2m}" "${m3m}" "${m3l}" "${m2p}" "${m3r}" "${m3supp}" "${m3t}")
0.83s user 1.79s system 112% cpu 2.339 total
exit code :: 0
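RS='^$' relies on an awk that treats a multi-character RS as a regular expression (mawk and gawk do); POSIX leaves a multi-character RS unspecified. The sketch below applies the same NF trick with a single control character, \001, as RS. It assumes that character never appears in the text, so the whole file is still read as one record. The pattern is passed through ENVIRON to avoid -v escape processing. It is still a regular expression, so metacharacters need escaping. has_block is a hypothetical name.

```shell
# Hypothetical helper: has_block PATTERN FILE
# Exit 0 if the regular expression PATTERN (which may contain newlines) matches
# somewhere in FILE, 1 otherwise. Assumes FILE never contains the byte \001 and
# that PATTERN is not a single space (FS = " " means default field splitting).
has_block() {
    pat=$1 awk '
        BEGIN { RS = "\001"; FS = ENVIRON["pat"] }   # whole file = one record
        { found = (NF > 1) }                         # FS matched at least once
        END { exit !found }
    ' "$2"
}

f=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 > "$f"
has_block 'line 3
line 4' "$f"; echo "exit code: $?"
rm -f "$f"
```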
I found a portable solution using the patch command. The idea is to create a diff/patch in the remove direction and check whether it can be applied to the source file. Sadly, there is no dry-run option (in my old patch version), so we have to actually apply the patch and then remove the temporary files.
The surrounding shell code is written for ksh:
file_in_file() {
typeset -r vtmp=/tmp/${}.$$.tmp
typeset -r vbasefile=$1
typeset -r vcheckfile=$2
typeset -ir vlines=$(wc -l < "$vcheckfile")
{ echo "1,${vlines}d0"; sed 's/^/< /' "$vcheckfile"; } |
patch -fns -F0 -o "$vtmp" "$vbasefile" >/dev/null 2>&1
typeset -ir vrc=$?
rm -f "$vtmp"*
return $vrc
}
How it works:
1. Set variables for local use (in newer bash, use declare instead of typeset).
2. Count the lines of the file to check.
3. Create the patch/diff in memory (the line with the curly braces).
4. Run patch with strict settings (patch -F0, i.e. no fuzz).
5. Clean up, including any reject files patch may have created (rm -f "$vtmp"*).
6. Return the exit code of patch.

Extra newline coming from somewhere

Can someone explain what I'm doing wrong and how to do it better.
I have a file consisting of records with field separator "-" and record separator "\t" (tab). I want to put each record on a line, followed by the line number, separated by a tab. The input file is called foo.txt.
$ cat foo.txt
a-b-c e-f-g x-y-z
$ < foo.txt tr -cd "\t" | wc -c
$ wc foo.txt
1 3 18 foo.txt
My awk script is in the file foo.awk
BEGIN { RS = "\t" ; FS = "-" ; OFS = "\t" }
{ print $1 "-" $2 "-" $3, NR }
And here is what I get when I run it:
$ gawk -f foo.awk foo.txt
a-b-c 1
e-f-g 2
The last record is directly followed by a newline, a tab, and the last number. What is going on?
Well, I don't know your exact goal, but since you've built this with awk, you can simply add \n to FS. That drops the trailing \n without starting another process such as tr, sed or awk:
BEGIN { RS = "\t" ; FS = "-|\n" ; OFS = "\t" }
There is a newline character at the end of your data that is also output when printing $3.
In particular, it looks like this:
$1 = "x"
$2 = "y"
$3 = "z\n"
You can remove the trailing separator with tr before passing everything to awk:
tr -d '\n' < foo.txt | awk -f foo.awk
Alternatively, add \n to the list of field separators (as shown in Kent's answer), since a separator is never part of the fields it splits.
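To see the stray newline directly, print the length of $3 for every record. With FS = "-", the last record's $3 is "z\n" (length 2), while Kent's FS = "-|\n" leaves just "z". Below is a minimal sketch that feeds a copy of foo.txt's content through printf instead of reading the file; show_lengths is a hypothetical helper.

```shell
# Hypothetical helper: show_lengths FS feeds foo.txt's records (tab-separated,
# newline at the end) to awk with RS = "\t" and the given FS, and prints length($3).
show_lengths() {
    printf 'a-b-c\te-f-g\tx-y-z\n' |
    awk -v fs="$1" 'BEGIN { RS = "\t"; FS = fs } { print NR ": length($3) = " length($3) }'
}

show_lengths '-'       # last record: $3 is "z\n", so length 2
show_lengths '-|\n'    # newline is a separator too: $3 is "z", length 1
```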
awk 'BEGIN { RS = "\t"; FS = OFS = "-" } { sub(/\n/, ""); print $0 "\t" NR }' file
a-b-c 1
e-f-g 2
x-y-z 3
ORS = "\n" was not necessary.
And with GNU Awk or Mawk, you can just have RS = "[\t\n]+":
awk 'BEGIN { RS = "[\t\n]+"; FS = OFS = "-" } { print $0 "\t" NR }' file