awk print without a file

How to print using awk without an input file.
script.sh
#!/bin/bash
for i in {2..10}; do
awk '{printf("%.2f %.2f\n", '$i', '$i'*(log('$i'/('$i'-1))))}'
done
bash script.sh
Desired output
2 value
3 value
4 value
and so on, where value is the quantity resulting from the computation.

A BEGIN block is needed if you are not providing any input to awk, either by file or by standard input. This block executes at the very start of awk's execution, even before the first input file is opened:
awk 'BEGIN{printf.....
From the man page:
Gawk executes AWK programs in the following order. First, all variable assignments specified via the -v option are performed. Next, gawk compiles the program into an internal form. Then, gawk executes the code in the BEGIN block(s) (if any), and then proceeds to read each file named in the ARGV array. If there are no files named on the command line, gawk reads the standard input.
So the general awk structure is:
awk 'BEGIN{get initialization data from this block}{execute the logic}' optional_input_file
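Applied to one iteration of the loop in the question, the shell value can be handed to awk with -v and the work done entirely in BEGIN (a sketch; i is the loop variable from the question's script):
awk -v i="$i" 'BEGIN{printf("%.2f %.2f\n", i, i*log(i/(i-1)))}'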

As PS. correctly pointed out, do use the BEGIN block to print stuff when you don't have a file to read from.
Furthermore, in your case you are looping in Bash and then calling awk on every iteration. Instead, loop directly in awk:
$ awk 'BEGIN {for (i=2;i<=10;i++) print i, i*log(i/(i-1))}'
2 1.38629
3 1.2164
4 1.15073
5 1.11572
6 1.09393
7 1.07905
8 1.06825
9 1.06005
10 1.05361
Note I started the loop at 2 because otherwise i=1 would mean log(1/(1-1)) = log(1/0) = log(inf).

I would suggest a different approach:
seq 2 10 | awk '{printf("%.2f %.2f\n", $1, $1*(log($1/($1-1))))}'
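For reference, since both fields go through %.2f here, the first few lines of output would be:
2.00 1.39
3.00 1.22
4.00 1.15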


Why does NR==FNR; {} behave differently when used as NR==FNR{ }?

Hoping someone can help explain the following awk output.
awk --version: GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
OS: Linux subsystem on Windows (WSL2); Windows 11 x64, kernel 5.10.102.1-microsoft-standard-WSL2
user experience: n00b
Important: in the two code snippets below, the only difference is the semicolon (;) after NR==FNR in sample # 2.
sample # 1
awk 'NR==FNR { print $0 }' lines_to_show.txt all_lines.txt
output # 1
2
3
4
5
7
sample # 2
awk 'NR==FNR; { print $0 }' lines_to_show.txt all_lines.txt
output # 2
2 # why is each value from lines_to_show.txt appearing twice?
2
3
3
4
4
5
5
7
7
line -01
line -02
line -03
line -04
line -05
line -06
line -07
line -08
line -09
line -10
Generate the text input files
lines_to_show.txt: echo -e "2\n3\n4\n5\n7" > lines_to_show.txt
all_lines.txt: echo -e "line\t-01\nline\t-02\nline\t-03\nline\t-04\nline\t-05\nline\t-06\nline\t-07\nline\t-08\nline\t-09\nline\t-10" > all_lines.txt
Request/Questions:
If you can, please explain how you know the answers to the questions below (experience, tutorial, video, etc.).
How does one read an awk program? I was under the impression that a semicolon (;) is only a statement terminator, just like in C. It should not have an impact on the execution of the program.
In output # 2, why are the values from the file lines_to_show.txt appearing twice? It seems like awk is printing values from the 1st file, lines_to_show.txt, across 10 lines, which is the number of records in the file all_lines.txt. Is this true? Why?
Why does output # 1 show only output from lines_to_show.txt? I thought awk would process each record in each file, so I expected to see 15 lines (10 + 5).
What have I tried so far?
going though https://www.linkedin.com/learning/awk-essential-training/using-awk-command-line-flags?autoSkip=true&autoplay=true&resume=false&u=61697657
modifying the code to see the difference and use that to 'understand' what is going on.
trying to work through the flow using pen and paper
going through https://www.baeldung.com/linux/awk-multiple-input-files
awk 'NR==FNR { print $0 }' lines_to_show.txt all_lines.txt
Here you have one pattern-action pair: if the total record number (NR) equals the record number within the current file (FNR), then print the whole line. That condition holds only while the first file is being read.
awk 'NR==FNR; { print $0 }' lines_to_show.txt all_lines.txt
Here you have two pattern-action pairs: as ; ends the first one right after the condition, the default action is assumed for it, which is {print $0}. In other words, it is equivalent to
awk 'NR==FNR{print $0}{ print $0}' lines_to_show.txt all_lines.txt
The first print $0 fires only while processing the 1st file; the 2nd print $0 fires unconditionally (no pattern given). So for lines_to_show.txt both prints are used, while for all_lines.txt only the 2nd is.
man awk is the best reference:
An awk program is composed of pairs of the form:
pattern { action }
Either the pattern or the action (including the enclosing brace characters) can be omitted. A missing pattern shall match any record of input, and a missing action shall be equivalent to:
{ print }
; terminates a pattern-action block. So you have two pattern/action blocks, both of whose action is to print the line.
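A quick reproduction with two throwaway files (f1 and f2 are arbitrary names here) makes the two blocks visible: lines of the first file match NR==FNR and are printed twice, lines of the second file only once:
$ printf 'a\nb\n' > f1; printf 'x\ny\n' > f2
$ awk 'NR==FNR; { print $0 }' f1 f2
a
a
b
b
x
y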

Line number of last occurrence of a pattern with awk

I am new to awk commands. I am trying to find a way to print the line number of the last occurrence of a pattern match.
I need to integrate that awk command into a Tcl script.
If someone has an answer, please let me know.
exec awk -v search=$var {$0~search{print NR; exit}} file_name
I am using this to print the line number of first occurrence.
I would harness GNU AWK for this task as follows. Let file.txt content be
12
15
120
150
1200
1500
then
awk '$0~"12"{n=NR}END{print n}' file.txt
output
5
Explanation: I am looking for the last line containing 12 somewhere; each time such a line is encountered I set the variable n to the current record number (NR), and when all lines are processed I print that value.
(tested in gawk 4.2.1)
Or, without awk:
set fh [open file_name]
set lines [split [read $fh] \n]
close $fh
set line_nums [lmap idx [lsearch -all -regexp $lines $var] {expr {$idx + 1}}]
set last_line_num [lindex $line_nums end]
With your shown samples and efforts, please try the following tac + awk code.
tac Input_file |
awk -v lines="$(wc -l < Input_file)" '/12/{print lines-FNR+1;exit}'
Explanation:
Using the tac command to print Input_file in reverse order, from bottom to top (basically to reach the very last match first and exit from the awk program quickly, explained below).
Sending tac's output to the awk program as input.
In the awk program, creating a variable named lines that holds the total number of lines in Input_file. In the main program, if a line contains 12, printing the value lines-FNR+1 and leaving via exit, so the whole Input_file need not be read.
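Applied to the file.txt sample from the earlier answer (6 lines), tac emits 1500, 1200, 150, 120, 15, 12; the first line containing 12 is 1200 at FNR=2, so the program prints 6-2+1:
$ tac file.txt | awk -v lines="$(wc -l < file.txt)" '/12/{print lines-FNR+1;exit}'
5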

awk difference between commands from file and from commandline

The following script
#! /bin/bash
B=5
#FILE INPUT
cat <<EOF > awk.in
BEGIN{b=$B;printf("B is %s\n", b)}
EOF
awk -f awk.in sometextfile.txt
#COMMANDLINE INPUT
awk 'BEGIN{b=$B;printf("B is %s\n", b)}' sometextfile.txt
produces the output
B is 5
B is
The commands I am issuing to awk are exactly the same, so why is the variable B interpreted correctly in the first case but not in the second?
Thanks!
In the line
awk 'BEGIN{b=$B;printf("B is %s\n", b)}' sometextfile.txt
The string literal 'BEGIN{b=$B;printf("B is %s\n", b)}' is single-quoted, so $B is not expanded by the shell and is instead treated as awk code. In awk, B is uninitialized, so $B evaluates to $0, which is empty in the BEGIN block.
In contrast, shell variables in here documents (as in your first example) are expanded, so awk.in ends up containing the value that $B had in the shell script. This, by the way, would have made writing awk code very painful as soon as you tried to use a field variable (named $1, $2, and so forth) or the full line (named $0), because you would have to manually resolve the ambiguity between awk fields and shell variables of the same name.
Use
awk -v b="$B" 'BEGIN{ printf("B is %s\n", b) }' sometextfile.txt
to make a shell variable known to awk code. Do not try to substitute it directly into awk code; it isn't necessary, you will hate writing awk code that way, and it leads to code injection problems, especially when B comes from an untrusted source.
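A quick sanity check, reusing the names from the question (no input file is needed since everything happens in BEGIN):
$ B=5
$ awk -v b="$B" 'BEGIN{ printf("B is %s\n", b) }'
B is 5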

Can I speed up AWK program using NR function

I am using awk to pull data out of a file that has 30M+ records. I know within a few 1000 records where the records I want are. I am curious whether I can cut down on the time awk takes to find the records by giving it a starting point, i.e. setting NR. For example, my record is more than 25 million lines in, so I could use the following:
awk 'BEGIN{NR=25000000}{rest of my script}' in
Would this make awk skip straight to the 25-millionth record and save the time of scanning each record before it?
For a better example: I am using this awk program in a loop in sh. I need the normal output of the awk script, but I would also like it to pass NR along when it finishes, so the next iteration can pick up from there when the loop comes back to this script.
awk -v n=$line -v r=$record 'BEGIN{a=1}$4==n{print $10;a=2}($4!=n&&a==2){(pass NR out to $record);exit}' in
Nope. Let's try it:
$ cat -n file
1 one
2 two
3 three
4 four
$ awk 'BEGIN {NR=2} {print NR, $0}' file
3 one
4 two
5 three
6 four
Are your records fixed length, or do you know the average line length? If so, you can use a language that lets you open a file and seek to a position. Otherwise, you have to read all those lines:
awk -v start=25000000 'NR < start {next} {your program here}' file
To maintain your position between runs of the script, I'd use a language like perl: at the end of the run use tell() to output the current position, say to a file; then at the start of the next run, use seek() to pick up where you left off. Add a check that the starting position is less than the current file size, in case the file was truncated.
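If you'd rather stay in the shell, the same idea can be sketched with a saved byte offset and tail -c; pos_file is a hypothetical state file, in is the question's input file, and resumption is at a byte (not record) boundary:
start=$(cat pos_file 2>/dev/null || echo 0)
size=$(wc -c < in)
[ "$start" -gt "$size" ] && start=0    # file was truncated: start over
tail -c +"$((start + 1))" in | awk '{your program here}'
echo "$size" > pos_file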
One way (using sed), if you know the line numbers:
for n in 3 5 8 9 ....
do
    sed -n "${n}p" file | awk command
done
or
sed -n "25000,30000p" file |awk command
Records generally have no fixed size, so awk has no way around scanning the first part of the file, even just to skip records.
Should you want to skip the first part of the input file and you (roughly) know the size to ignore, you can use dd to truncate the input. For example, assuming a record is 80 bytes wide (80 blocks of 25MB = 2 GB = 25,000,000 records x 80 bytes):
dd if=inputfile bs=25MB skip=80 | awk ...
Note the skip lands on a byte boundary, not necessarily a record boundary, so you may want to discard the first (possibly partial) line in awk.
Finally, you can keep awk from scanning the trailing records by calling exit from the awk script once you are past the interesting zone.
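Putting the skip and the early exit together (start and stop are hypothetical bounds around the "few 1000 records" window mentioned in the question):
awk -v start=25000000 -v stop=25005000 'NR < start {next} NR > stop {exit} {your program here}' file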

Multiple passes with awk and execution order

Two part question:
Part One:
First I have a sequence AATTCCGG which I want to change to TTAAGGCC. I used gsub to change A to T, C to G, G to C and T to A. Unfortunately, awk executes these substitutions sequentially, so I ended up with AAAACCCC. I got around this by using upper and lower case, then converting back to upper case values, but I would like to do this in a single step if possible.
example:
echo AATTCCGG | awk '{gsub("A","T",$1);gsub("T","A",$1);gsub("C","G",$1);gsub("G","C",$1);print $0}'
OUTPUT:
AAAACCCC
Part Two:
Is there a way to get awk to run to the end of a file for one set of instructions before starting a second set? I tried some of the following, but with no success
for the data set
1 A
2 B
3 C
4 D
5 E
I am using the following pipe to get the data I want (Just an example)
awk '{if ($1%2==0)print $1,"E";else print $0}' test | awk '{if ($1%2==0 && $2=="E") print $0}'
I am rerunning the program via a pipe; however, I have found it would be quicker if I didn't have to rerun it.
This can be efficiently solved with tr:
$ echo AATTCCGG | tr ATCG TAGC
TTAAGGCC
Regarding part two (this should be a different question, really): no, awk cannot rewind its input mid-run, so a pipe is the way to go.
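That said, when the input is a regular file (not a pipe), you can emulate two passes in a single awk invocation by listing the file twice and separating the passes with the NR==FNR idiom discussed in an earlier question; a sketch with placeholder pass bodies:
awk 'NR==FNR {first-pass logic; next} {second-pass logic}' test test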
For part two, try this command, which folds both stages of the pipe into one:
awk '{if ($1%2==0)print $1,"E"}' test
Here is a method I have found for the first part of the question using awk. It uses an array and a for loop. Note that FS="" (one field per character) is a GNU awk feature.
cat sub.awk
awk '
BEGIN{d["G"]="C";d["C"]="G";d["T"]="A";d["A"]="T";FS="";OFS=""}
{for(i=1;i<=NF;i++)
{if($i in d)
$i=d[$i]}
}
{print}'
Input/Output:
ATCG
TAGC