Best Awk Commands - awk

I find AWK really useful. Here is a one-liner I put together to manipulate data.
ls | awk '{ print "awk " "'"'"'" " {print $1,$2,$3} " "'"'"'" " " $1 ".old_ext > " $1 ".new_ext" }' > file.csh
I used this AWK to make a script file that rewrites some files under a new extension, keeping only selected columns. Anyone know a better way to do this? What are your best AWK one-liners or clever manipulations?
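To see through the quoting: for an ls entry named foo (name hypothetical), the one-liner emits
awk ' {print $1,$2,$3} ' foo.old_ext > foo.new_ext
so file.csh ends up as one such awk invocation per file, each writing the first three columns to a copy with the new extension.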

The AWK book is full of great examples. They used to be collected for download from Kernighan's webpage (404s now).

You can find several nice one-liners here.

I use this to total the disk space used across all filesystems:
df -m | awk '{p+=$3}; END {print p}'
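The header line contributes nothing because its third field ("Used") evaluates to 0 in a numeric context; to skip it explicitly, a variant:
df -m | awk 'NR > 1 {p += $3} END {print p}'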

Many years ago I wrote a tail script in awk:
#!/usr/bin/awk -f
BEGIN {
    lines = 10
}
{
    # circular buffer: remember where the newest of the last `lines` lines sits
    high = NR % lines + 1
    a[high] = $0
}
END {
    # walk the buffer from oldest to newest
    for (i = 0; i < lines; i++) {
        n = (i + high) % lines + 1
        if (n in a) {
            print a[n]
        }
    }
}
It's silly, I know, but that's what awk does to you. It's just great fun to play with.
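As a usage sketch (script and file names hypothetical), a command-line variable assignment overrides the BEGIN default because it is processed before the file is read:
chmod +x tail.awk
./tail.awk lines=20 /var/log/syslog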

Henry Spencer wrote a fairly good implementation of nroff in awk, which he called "awf". He also claimed that if Larry Wall had known how powerful awk was, he wouldn't have needed to invent Perl.

Here's a couple of awk commands that I used to use regularly ... note that you can use $1, $2, etc. to pull out the column you want. So, for manipulating a bunch of files, here's a stupid command you could use instead of mv ...
ls -1 *.mp3 | awk '{printf("mv %s newDir/%s\n",$1,$1)}' | /bin/sh
Or if you're looking at a set of processes maybe ...
ps -ef | grep -v username | awk '{printf("kill -9 %s\n",$2)}' | /bin/sh
Pretty trivial, but you can see how that would get you quite a ways. =) Most of the stuff I used to do this way can be done with xargs, but hey, who needs them new fangled commands?
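For comparison, a minimal xargs equivalent of the mv example (same caveat as the ls | awk version: filenames without embedded whitespace):
ls -1 *.mp3 | xargs -I{} mv {} newDir/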

I use this script a lot for editing PATH and path-like environment variables.
Usage:
export PATH=$(clnpath /new/bin:/other/bin:$PATH /old/bin:/other/old/bin)
This command adds /new/bin and /other/bin in front of PATH, removes both /old/bin and /other/old/bin from PATH (if present; no error if absent), and removes duplicate directory entries from the path.
: "#(#)$Id: clnpath.sh,v 1.6 1999/06/08 23:34:07 jleffler Exp $"
#
# Print minimal version of $PATH, possibly removing some items
case $# in
0) chop=""; path=${PATH:?};;
1) chop=""; path=$1;;
2) chop=$2; path=$1;;
*) echo "Usage: `basename $0 .sh` [$PATH [remove:list]]" >&2
exit 1;;
esac
# Beware of the quotes in the assignment to chop!
echo "$path" |
${AWK:-awk} -F: '#
BEGIN { # Sort out which path components to omit
chop="'"$chop"'";
if (chop != "") nr = split(chop, remove); else nr = 0;
for (i = 1; i <= nr; i++)
omit[remove[i]] = 1;
}
{
for (i = 1; i <= NF; i++)
{
x=$i;
if (x == "") x = ".";
if (omit[x] == 0 && path[x]++ == 0)
{
output = output pad x;
pad = ":";
}
}
print output;
}'

Count memory used by httpd
ps -ylC httpd | awk '/[0-9]/ {SUM += $8} END {print SUM/1024}'
Or any other process, by replacing httpd. Dividing by 1024 gives the output in MB.

I managed to build a DOS tree command emulator for UNIX (find + awk):
find . -type d -print 2>/dev/null|awk '{for (i=1;i< NF;i++)printf("%"length($i)"s","|");gsub(/[^\/]*\//,"--",$0);print $NF}' FS='/'

Print lines between two patterns:
awk '/END/{flag=0}flag;/START/{flag=1}' inputFile
Detailed explanation: http://nixtip.wordpress.com/2010/10/12/print-lines-between-two-patterns-the-awk-way/
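A quick usage example (input invented) showing that the START and END lines themselves are excluded:
$ cat inputFile
one
START
two
three
END
four
$ awk '/END/{flag=0}flag;/START/{flag=1}' inputFile
two
three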

A couple of favorites, essentially unrelated to each other. Read as 2 different, unconnected suggestions.
Identifying Column Numbers Easily
For those who use awk frequently, as I do for log analysis at work, I often need to find out the column numbers for a file. So, if I am analyzing, say, Apache access files (some samples can be found here), I run the script below against the file:
NR == 1 {
    for (i = 1 ; i <= NF ; i++)
    {
        print i "\t" $i
    }
}
NR > 1 {
    exit
}
I typically call it "cn.awk", for 'c'olumn 'n'umbers. Creative, eh? Anyway, the output looks like:
1 64.242.88.10
2 -
3 -
4 [07/Mar/2004:16:05:49
5 -0800]
6 "GET
7 /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables
8 HTTP/1.1"
9 401
10 12846
Very easy to tell what's what. I usually alias this on my servers and have it everywhere.
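The alias is a one-liner; for instance (path hypothetical):
alias cn='awk -f ~/bin/cn.awk'
cn access.log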
Referencing Fields by Name
Now, suppose your file has a header row and you'd rather use those names instead of field numbers. This allows you to do so:
NR == 1 {
    for (i = 1 ; i <= NF ; i++)
    {
        field[$i] = i
    }
}
Now, suppose I have this header row...
metric,time,val,location,http_status,http_request
...and I'd like to sum the val column. Instead of referring to $3, I can refer to it by name (running awk with -F',' here, since this sample is comma-separated):
NR > 1 {
    SUM += $field["val"]
}
The main benefit is making the script much more readable.
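Putting the two pieces together, a minimal end-to-end sketch (data invented; note the -F',' and the next):
$ cat data.csv
metric,time,val,location,http_status,http_request
m1,1,10,us,200,/a
m2,2,32,eu,200,/b
$ awk -F',' '
NR == 1 { for (i = 1; i <= NF; i++) field[$i] = i; next }
        { SUM += $field["val"] }
END     { print SUM }
' data.csv
42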

Printing fields is one of the first things mentioned in most AWK tutorials.
awk '{print $1,$3}' file
Lesser known but equally useful is excluding fields, which is also possible:
awk '{$1=$3=""}1' file


AWK FPAT not working as expected for string parsing

I have to parse a very long string (from stdin). It is basically a .sql file from which I have to extract data and convert it into CSV. For this, I am using awk. In my case, a sample snippet (of two records) is as follows:
b="(abc@xyz.com,www.example.com,'field2,(2)'),(dfr@xyz.com,www.example.com,'field0'),"
echo $b|awk 'BEGIN {FPAT = "([^\\)]+)|('\''[^'\'']+'\'')"}{print $1}'
In my regex, I am saying: split on the ")" bracket, but if single quotes are found, then ignore all text until the closing quote. But my output is as follows:
(abc@xyz.com,www.example.com,'field2,(2
I am expecting this output:
(abc@xyz.com,www.example.com,'field2,(2)'
Where is the problem in my code? I have searched a lot and checked the awk manual, but without success.
My first answer below was wrong; there is an ERE for what you're trying to do:
$ echo "$b" | awk -v FPAT="[(]([^)]|'[^']*')*)" '{for (i=1; i<=NF; i++) print $i}'
(abc@xyz.com,www.example.com,'field2,(2)')
(dfr@xyz.com,www.example.com,'field0')
Original answer, left as a different approach:
You need a 2-pass approach: first replace all )s within quoted fields with something that can't already exist in the input (e.g. RS), then identify the (...) fields and turn the RSs back into )s before printing them:
$ echo "$b" |
awk -F"'" -v OFS= '
{
for (i=2; i<=NF; i+=2) {
gsub(/)/,RS,$i)
$i = FS $i FS
}
FPAT = "[(][^)]*)"
$0 = $0
for (i=1; i<=NF; i++) {
gsub(RS,")",$i)
print $i
}
FS = FS
}
'
(abc@xyz.com,www.example.com,'field2,(2)')
(dfr@xyz.com,www.example.com,'field0')
The above is gawk-only due to FPAT (or we could have used gawk's patsplit()); with other awks you'd use a while-match()-substr() loop:
$ echo "$b" |
awk -F"'" -v OFS= '
{
for (i=2; i<=NF; i+=2) {
gsub(/)/,RS,$i)
$i = FS $i FS
}
while ( match($0,/[(][^)]*)/) ) {
field = substr($0,RSTART,RLENGTH)
gsub(RS,")",field)
print field
$0 = substr($0,RSTART+RLENGTH)
}
}
'
(abc@xyz.com,www.example.com,'field2,(2)')
(dfr@xyz.com,www.example.com,'field0')
Written and tested in GNU awk with your shown samples. This could also be done simply by setting the field separator; try the following, where b is the shell variable holding the value shown:
echo "$b" | awk -F'\\),\\(' '{print $1}'
(abc@xyz.com,www.example.com,'field2,(2)'
Explanation: simply set the field separator of the awk program to \\),\\( and print the first field.
A similar regex approach to the one Ed suggested, but I usually prefer using RS and RT over FPAT:
b="(abc#xyz.com,www.example.com,'field2,(2)'),(dfr#xyz.com,www.example.com,'field0'),"
awk -v RS="[(]('[^']*'|[^)])*[)]" 'RT {print RT}' <<< "$b"
(abc@xyz.com,www.example.com,'field2,(2)')
(dfr@xyz.com,www.example.com,'field0')
If you wanna do it in close to one pass, maybe try this:
{mawk/mawk2/gawk} 'BEGIN { OFS = FS = "\047"; ORS = RS = "\n";
                           XFS = "\376\004\377";
                           XRS = "\051" ORS;
} ! /[\051]/ { print; next; }
  { for (x=1; x <= NF; x += 2) {
        gsub(/[\051][^\050]*/, XFS, $(x)); } }
  gsub(XFS, XRS) || 1'
I did it this way with 2 gsubs just in case it starts sending rows below with unintended consequences. \051 = ")", and \050 is the open one, "(".
I further enhanced it by telling it to instantly print and move on if no close brackets are found at all (so there is nothing to split).
It only loops over the odd-numbered fields once I split the line on the single quote, \047 (because the even-numbered ones are precisely the ones inside a pair of single quotes that you want to avoid chopping at).
As for XFS, just pick any combination of your choice using bytes that are almost impossible to encounter. If you want to play it safe, you can test whether XFS already exists in that row, and use some alternative combo. It is basically there to insert a delimiter into the middle of the row that won't run afoul of actual input data. It's not foolproof per se, but the likelihood of running into a combination of a UTF-16 byte-order mark and ASCII control characters is reasonably low.
(And if you do encounter XFS, you likely already have corrupted data to begin with, since a 300-series octal byte must be followed by 200-series ones to be valid UTF-8.)
This way, I don't need FPAT at all.
*Updated with " || 1" towards the end as a safety catch-all, though it shouldn't really be needed.

AWK: Convert columns to rows with condition (create list) [duplicate]

I have a tab-delimited file with three columns (excerpt):
AC147602.5_FG004 IPR000146 Fructose-1,6-bisphosphatase class 1/Sedoheputulose-1,7-bisphosphatase
AC147602.5_FG004 IPR023079 Sedoheptulose-1,7-bisphosphatase
AC148152.3_FG001 IPR002110 Ankyrin repeat
AC148152.3_FG001 IPR026961 PGG domain
and I'd like to get this using bash:
AC147602.5_FG004 IPR000146 Fructose-1,6-bisphosphatase class 1/Sedoheputulose-1,7-bisphosphatase IPR023079 Sedoheptulose-1,7-bisphosphatase
AC148152.3_FG001 IPR002110 Ankyrin repeat IPR026961 PGG domain
So if the ID in the first column is the same in several lines, it should produce one line per ID, with all the other parts of those lines joined. In this example it gives a two-row file.
give this one-liner a try:
awk -F'\t' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file
For whatever reason, the awk solution does not work for me in Cygwin, so I used Perl instead. It joins on a tab character and separates lines with \n:
cat FILENAME | perl -e 'foreach $Line (<STDIN>) { @Cols=($Line=~/^\s*(\d+)\s*(.*?)\s*$/); push(@{$Link{$Cols[0]}}, $Cols[1]); } foreach $List (values %Link) { print join("\t", @{$List})."\n"; }'
Whether this matters will depend on file size (and your awk's limitations): if the file is too big, this reduces what awk needs by sorting the file first, so only one label has to be kept in memory while printing.
A classical version with post-print, using a modification of the whole line:
sort YourFile \
| awk '
    $1 == Last { sub( /^[^[:blank:]]*[[:blank:]]+/, ""); C = C " " $0; next }
    NR > 1     { print Last C }
               { Last = $1; sub( /^[^[:blank:]]*[[:blank:]]+/, ""); C = " " $0 }
    END        { print Last C }
    '
Another version using fields and pre-print, but less "human readable":
sort YourFile \
| awk '
    Last != $1 { printf( "%s%s", (NR > 1 ? "\n" : ""), Last = $1) }
               { for( i = 2; i <= NF; i++) printf( " %s", $i) }
    END        { printf( "\n") }
    '
A pure bash version. It has no additional dependencies, but requires bash 4.0 or above (2009) for associative array support.
All on one line:
{ declare -A merged; merged=(); while IFS=$'\t' read -r key value; do merged[$key]="${merged[$key]}"$'\t'"$value"; done; for key in "${!merged[@]}"; do echo "$key${merged[$key]}"; done } < INPUT_FILE.tsv
Readable and commented equivalent:
{
# Define `merged` as an empty associative array.
declare -A merged
merged=()
# Read tab-separated lines. Any leftover fields also end up in `value`.
while IFS=$'\t' read -r key value
do
# Append to any value that's already there, separated by a tab.
merged[$key]="${merged[$key]}"$'\t'"$value"
done
# Loop over the input keys. Note that the order is arbitrary;
# pipe through `sort` if you want a predictable order.
for key in "${!merged[@]}"
do
# Each value is prefixed with a tab, so no need for a tab here.
echo "$key${merged[$key]}"
done
} < INPUT_FILE.tsv

Grep that tolerates mismatches to subset .fastq

I am working with bash on a Linux cluster. I am trying to extract reads from a .fastq file if they contain a match to a queried sequence. Below is an example .fastq file containing three reads.
$ cat example.fastq
@SRR1111111.1 1/1
CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEA<AAEEEEE<6
@SRR1111111.2 2/1
CTATANTATTCTATATTTATTCTAGATAAAAGCATTCTATATTTAGCATATGTCTAGCAAAAAAAA
+
AAAAA#EE6EEEEEEEEEEEEAAEEAEEEEEEEEEEEE/EAE/EAE/EA/EAEAAAE//EEAEAA6
@SRR1111111.3 3/1
CTATANTATTGAAATAATAATGTAGATAAAACTATTGAATAACAGCAACTTAAATTTTCAATAAGA
+
AAAAA#EE6EEEEEEEEEEEEAAEEAEEEEEEEEEEEE/EAE/EAE/EA/EAEAAAE//EEAEAA6
I would like to extract reads containing the sequence GAAATAATA. I can perform this extraction using grep as shown in the following command.
$ grep -F -B 1 -A 2 "GAAATAATA" example.fastq > MATCH.fastq
$ cat MATCH.fastq
@SRR1111111.1 1/1
CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEA<AAEEEEE<6
@SRR1111111.3 3/1
CTATANTATTGAAATAATAATGTAGATAAAACTATTGAATAACAGCAACTTAAATTTTCAATAAGA
+
AAAAA#EE6EEEEEEEEEEEEAAEEAEEEEEEEEEEEE/EAE/EAE/EA/EAEAAAE//EEAEAA6
However, this strategy does not tolerate any mismatches. For example, a read containing the sequence GAAATGATA will be ignored. I need this extraction to tolerate one mismatch at any position in the queried sequence. So my question is how can I achieve this? Is there a sequence alignment package available with similar functionality to grep? Are there any fastq subsetting packages available that perform this type of operation? One note is that speed is very important. Thanks for your guidance.
Here is a solution using agrep to get the record numbers of matches, and an awk that prints out those records with some context (to make up for the missing -A and -B in agrep):
$ agrep -1 -n "GAAATGATA" file |
awk -F: 'NR==FNR{for(i=($1-1);i<=($1+2);i++)a[i];next}FNR in a' - file
Output:
@SRR1111111.1 1/1
CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEA<AAEEEEE<6
@SRR1111111.3 3/1
CTATANTATTGAAATAATAATGTAGATAAAACTATTGAATAACAGCAACTTAAATTTTCAATAAGA
+
AAAAA#EE6EEEEEEEEEEEEAAEEAEEEEEEEEEEEE/EAE/EAE/EA/EAEAAAE//EEAEAA6
This should work, but I don't know whether the MATCH.fastq in your question is the expected output, or even whether your sample input contains any cases that a working solution needs to find, so I can't say for sure that it's actually working:
$ cat tst.awk
BEGIN {
    for (i=1; i<=length(seq); i++) {
        regexp = regexp sep substr(seq,1,i-1) "." substr(seq,i+1)
        sep = "|"
    }
}
{ rec = rec $0 ORS }
!(NR % 4) {
    if (rec ~ regexp) {
        printf "%s", rec
    }
    rec = ""
}
$ awk -v seq='GAAATAATA' -f tst.awk example.fastq
@SRR1111111.1 1/1
CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEA<AAEEEEE<6
@SRR1111111.3 3/1
CTATANTATTGAAATAATAATGTAGATAAAACTATTGAATAACAGCAACTTAAATTTTCAATAAGA
+
AAAAA#EE6EEEEEEEEEEEEAAEEAEEEEEEEEEEEE/EAE/EAE/EA/EAEAAAE//EEAEAA6
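To see what the BEGIN loop builds, here is the generated regexp for a shortened query (sketch; output traced by hand for seq='GAA'):
$ awk -v seq='GAA' 'BEGIN {
    for (i=1; i<=length(seq); i++) {
        regexp = regexp sep substr(seq,1,i-1) "." substr(seq,i+1)
        sep = "|"
    }
    print regexp
}'
.AA|G.A|GA.
Each alternative is the query with one position replaced by ".", which is what buys the one-mismatch tolerance.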
You might try a file of patterns -
$: cat GAAATAATA
.AAATAATA
G.AATAATA
GA.ATAATA
GAA.TAATA
GAAA.AATA
GAAAT.ATA
GAAATA.TA
GAAATAA.A
GAAATAAT.
then
grep -B 1 -A 2 -f GAAATAATA example.fastq > MATCH.fastq
but it will probably slow the process down a bit to add both full regex parsing AND an alternate pattern for each possible single change...
Responding to the question in the comments:
For a given value of $word, such as word=GAAATAATA,
awk '{
    for ( i=1; i<=length($0); i++ ) {
        # re-split the line into characters (a gawk extension), then wildcard position i
        split($0,tmp,""); tmp[i]=".";
        for ( n=1; n<=length($0); n++ ) { printf tmp[n]; }
        printf "\n";
    }
}' <<< "$word" > "$word"
This will create that specific file.
Hope that helps, but remember that this will be a lot slower, since you are now using regexes instead of just matching plain strings, AND you are introducing a whole series of alternate patterns to match...

AWK, exclude results from one file with regards to a second file

Using awk, I am able to get a list of URLs with a given error number:
awk '($9 ~ /404/)' /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn
Fine and dandy.
But we would like to refine it further by matching that result against a list of already-known 404 URLs.
Example:
awk '($9 ~ /404/)' /var/log/nginx/access.log | awk '{print $7} '| sort | uniq -c | sort -k 2 -r | awk '{print > "/mnt/tmp/404error.txt"}'
Today it yields:
1 /going-out/restaurants/the-current-restaurent.htm
1 /going-out/restaurants/mare.HTML
1 /going-out/report-content/?cid=5
1 /going-out/report-content/?cid=38550
1 /going-out/report-content/?cid=380
The day after:
1 /going-out/ru/%d0%bd%d0%be%d1%87%d0%bd%d0%b0%d1%8f-%d0%b6%d0%b8%d0%b7%d0%bd%d1%8c-%d0%bd%d0%b0-%d0%bf%d1%85%d1%83%d0%ba%d0%b5%d1%82%d0%b5/%d1%81%d0%be%d0%b2%d0%b5%d1%82%d1%8b-%d0%bb%d1%8e%d0%b1%d0%b8%d1%82%d0%b5%d0%bb%d1%8f%d0%bc-%d0%bd%d0%be%d1%87%d0%bd%d1%8b%d1%85-%d1%80%d0%b0%d0%b7%d0%b2%d0%bb%d0%b5%d1%87%d0%b5%d0%bd%d0%b8%d0%b9/
1 /going-out/restaurants/the-current-restaurent.htm
1 /going-out/restaurants/mare.HTML
1 /going-out/report-content/?cid=5
1 /going-out/report-content/?cid=38550
1 /going-out/report-content/?cid=380
1 /going-out/report-content/?cid=29968
1 /going-out/report-content/?cid=29823
The goal is to have only the new URLs.
At that point I am lost. I know I can read the first file into an array, and I presume I can do the same with the second file (in a second array); then maybe (not sure whether awk has the capability) simply cross-reference them and keep what does not match.
Any help will be fully appreciated.
So you have a file whose $9 field may match /404/. If so, you want to store the 7th field. Then, count how many of them appeared in total, but only those that do not already appear in a file you have.
I think all of this can be done with this (untested, because I have no sample input data):
awk 'FNR==NR {seen[$2]; next}
     $9 ~ /404/ {if (!($7 in seen)) a[$7]++}
     END {for (i in a) print a[i], i}' old_file log_file
This stores the 2nd column from the file with data into an array seen[]. Then, goes through the new file and stores the 7th column if it wasn't seen before. Finally, it prints the counters.
Since it looks like you have an old awk version that does not support the syntax index in array, you can use this workaround for it:
$9 ~ /404/ {for (i in seen) {if (i==$7) next} a[$7]++}
Note you must be using a veeery old version, since this was introduced in 1987:
A.1 Major Changes Between V7 and SVR3.1
The awk language evolved considerably between the release of Version 7
Unix (1978) and the new version that was first made generally
available in System V Release 3.1 (1987). This section summarizes the
changes, with cross-references to further details:
The expression ‘indx in array’ outside of for statements (see
Reference to Elements)
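A quick way to check whether your awk supports it (a minimal sketch):
$ awk 'BEGIN { seen["x"] = 1; if ("x" in seen) print "supported" }'
supported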
You can use grep -v --fixed-strings --file=FILEALL FILENEW (note the -v, so that already-known URLs are filtered out) or comm -23 FILENEW FILEALL for this. FILEALL is the file containing the URLs already found; FILENEW contains the pages found today. For comm, both files must be sorted.
http://www.gnu.org/software/gawk/manual/gawk.html#Other-Inherited-Files
http://linux.die.net/man/1/comm
I think comm is more efficient because it uses sorted files, but I did not test this.
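For the record, a minimal comm sketch (file names as above; the output name is invented):
sort FILEALL -o FILEALL
sort FILENEW -o FILENEW
comm -23 FILENEW FILEALL > new_urls.txt
comm -23 suppresses the lines unique to FILEALL and the lines common to both, leaving only the URLs that are new in FILENEW.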
I came up with the following:
awk 'BEGIN {
    while (getline < "/mnt/tmp/404error.txt") {
        A[$1] = $1;
    }
    while (getline < "/var/log/nginx/access.log") {
        if ($9 ~ /404/) {
            exist[$7] = $7;
            if ($7 in A) blah += 1; else new[$7];
        }
    }
    asort(exist);
    for (i in exist)
        print exist[i] > "/mnt/tmp/404error.txt"
    asorti(new);
    for (i in new)
        print new[i] > "/mnt/tmp/new404error.txt"
}' | mutt -s "subject" -a /mnt/tmp/new404error.txt -- whoever@mail.net, whatever@mail.net
That seems to give me what I want (almost), but I believe it is far too verbose; perhaps one of you geniuses can improve it.
Thanks

Unable to make a factorial function in AWK

The code
#!/usr/bin/awk
# Sed and AWK by O'Reilly (p.179)
# Example of what should happen: factorial 5 gives
# factorial
# Enter number: 3
# The factorial of 3 is 6
BEGIN {
    printf("Enter number: ")
}
$1 ~ /^[0-9]+$/ {
    # assign value of $1 to number & fact
    number = $1
    if (number == 0)
        fact = 1
    else
        fact = number
    for (x = 1; x < number; x++)
        fact *= x
    printf("The factorial of %d is %g\n", number, fact)
    exit
}
# if not a number, prompt again.
{ printf("\nInvalid entry. Enter a number: ") }
I run the command unsuccessfully by
./factorial.awk
I get
/usr/bin/awk: syntax error at source line 1
context is
>>> <<< ./factorial.awk
/usr/bin/awk: bailing out at source line 1
What does the error message mean?
I think that the problem is that you are writing a shell script and passing it to awk for execution. The following is a shell script, hence the #! /bin/sh, so it will be passed to the shell (Bourne-compatible in this case).
#! /bin/sh
awk 'BEGIN { printf("Hello world!\n"); exit }'
The she-bang (#!) line tells the kernel which interpreter to pass the script to for execution. In the shell-script case above, you have to pass the script to the awk interpreter yourself, so you call awk explicitly; this assumes that awk is in your path somewhere.
The following, however, is an awk script.
#! /usr/bin/awk -f
BEGIN {
printf("Hello world!\n");
exit
}
The she-bang invokes awk and passes the script as input. You don't need to explicitly invoke awk in this case and you don't have to quote the entire script since it is passed directly to awk.
Think of the she-bang as saying take what follows the she-bang, append the name of the file, and execute it. Wikipedia describes the usage pretty well including some common ways to solve the path to the interpreter problem.
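For example, with #!/usr/bin/awk -f as the first line, typing
./factorial.awk
makes the kernel run, in effect:
/usr/bin/awk -f ./factorial.awk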
Possibly a dumb answer, but in my terminal I would have to type in:
./factorial.awk
where the file is factorial.awk.
You could edit your PATH environment variable to include ., but ./ should work just fine, I think. (And adding . to $PATH could prove very dangerous in situations where you might run code you did not expect to.)
Does that work?
EDIT:
./factorial.awk
-bash: ./factorial.awk: /usr/bin/gawk: bad interpreter: No such file or directory
That says that it ran the file but could not find the program gawk.
Please type in 'which gawk' and then 'which awk'.
Is your first line supposed to be:
#!/usr/bin/awk
Also, just to amuse me, type in:
sudo apt-get install gawk
That will make sure you actually have gawk on your system.
EDIT2:
I took a look at your code and this is what I have now. I removed two quotes and a dash.
#!/usr/bin/gawk
# I think we do not need these (p.179) so I comment them out, since I do not know where else to put them.
# The same bug occurs also with them.
#fact = number
#for (x = number -1 ; x > 1; x--)
# fact *= x
# factorial: returns factorial of user-supplied number
BEGIN {
    printf("Enter number: ")
}
$1 ~ /^[0-9]+$/ {
    # assign value of $1 to number & fact
    number = $1
    if (number == 0)
        fact = 1
    else
        fact = number
    # loop to multiply fact*x until x = 1
    for (x = number - 1; x > 1; x--)
        fact *= x
    printf("The factorial of %d is %g\n", number, fact)
    # exit -- saves user from typing ^-d
    exit
}
# if not a number, prompt again.
{ printf("\nInvalid entry. Enter a number: ") }
Maybe it wasn't that complicated:
#!/usr/bin/awk ---------> #!/usr/bin/awk -f
Check whether there is a file /usr/bin/gawk; if not, use either the path of awk or the correct location for gawk.
Also, did you make the script executable?
And also, do you have the current directory in your PATH?
I got the script to work in Ubuntu and OS X by running
awk -f factorial.awk
It seems that you cannot run the script as follows, although the book says you can:
./factorial.awk
Here's a recursive version:
#!/usr/bin/awk -f
function f(x) {
    if (x <= 1) return 1
    return f(x-1) * x
}
BEGIN {
    printf("Enter number: ")
}
$1 ~ /^[0-9]+$/ {
    printf("The factorial of %d is %d\n", $1, f($1))
    exit
}
{ printf("\nInvalid entry. Enter a number: ") }
This question was the top hit on Google for the search phrase "awk factorial", so here's a simple way to print a factorial in awk:
$ awk 'BEGIN{x=1;for(i=2;i<=6;i++)x*=i;print x}'
720
As a shell function (the space after -v is required by the nawk that ships with macOS, but not by gawk):
$ fac(){ awk -v "n=$1" 'BEGIN{x=1;for(i=2;i<=n;i++)x*=i;print x}';}
$ fac 6
720
As an awk function for calculating k-combinations:
$ awk 'function f(x){r=1;for(i=2;i<=x;i++)r*=i;return r}BEGIN{n=5;k=3;print f(n)/(f(k)*f(n-k))}'
10