Something goes wrong when running my script - awk

What am I doing wrong, please? I have this code in the file
file.awk:
BEGIN { CONVFMT="%0.17f" }
/D Format/ { in_f_format=0; next }
/F Format/ { in_f_format=1; next }
in_f_format != 1 { next }
!($1 ~ /^[1-9]/) { next }
$1 == 11 { prt(180,3.141592653589); next }
$1 == 15 { prt(100,1); next }
$1 == 20 { prt(10,1); next }
$1 == 26 { next }
{ prt(1,1) }
function prt(mult, div) {
print trunc($5 * mult / div) ORS trunc($6 * mult / div)
}
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}
I write:
chmod +x file.awk
./file.awk
into the terminal and I get these errors:
./file.awk: line 1: BEGIN: command not found
./file.awk: line 2: /D: No such file or directory
./file.awk: line 2: next: command not found
./file.awk: line 3: /F: No such file or directory
./file.awk: line 3: next: command not found
./file.awk: line 4: in_f_format: command not found
./file.awk: line 5: syntax error near unexpected token `{'
./file.awk: line 5: `!($1 ~ /^[1-9]/) { next }'
Where is the mistake please?
EDIT
A similar script
BEGIN { CONVFMT="%0.17f" }
/D Format/ { in_f_format=0; next }
/F Format/ { in_f_format=1; next }
in_f_format != 1 { next }
!($1 ~ /^[1-9]/) { next }
$1 == 35 { print t($5), t($6) }
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}
Gives an error:
fatal: function `t' not defined
I would like to print, from this input,
Input-Output in F Format
No. Curve Input Param. Correction Output Param. Standard Deviation
26 0 56850.9056460000 -0.0017608883 56850.9038851117 0.0016647171
35 1 0.2277000000 0.0011369754 0.2288369754 0.0014780395
35 2 0.2294000000 0.0000417158 0.2294417158 0.0008601513
35 3 0.2277000000 0.0007425066 0.2284425066 0.0022555311
35 4 0.2298000000 -0.0000518690 0.2297481310 0.0010186846
35 5 0.2295000000 0.0000793572 0.2295793572 0.0014667137
35 6 0.2300000000 0.0000752449 0.2300752449 0.0006258864
35 7 0.2307000000 -0.0001442591 0.2305557409 0.0002837569
35 8 0.2275000000 0.0007358355 0.2282358355 0.0007609550
35 9 0.2292000000 0.0003447650 0.2295447650 0.0007148005
35 10 0.2302000000 -0.0001854710 0.2300145290 0.0006320668
35 11 0.2308000000 -0.0002064324 0.2305935676 0.0008911070
35 12 0.2299000000 -0.0000202967 0.2298797033 0.0002328860
35 13 0.2298000000 0.0000464629 0.2298464629 0.0011609539
35 14 0.2307000000 -0.0003654521 0.2303345479 0.0006827961
35 15 0.2294000000 0.0002157908 0.2296157908 0.0003253584
Input-Output in D Format
the numbers in $5 and $6 of the rows that start with 35.
EDIT 2
I edited the position of the function, like this:
BEGIN { CONVFMT="%0.17f" }
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}
/D Format/ { in_f_format=0; next }
/F Format/ { in_f_format=1; next }
in_f_format != 1 { next }
!($1 ~ /^[1-9]/) { next }
$1 == 35 { print t($5), t($6) }

UPDATE: After chatting with the OP (as I also mentioned in my comments), it turned out that the OP needed to change the function name in the call from t to the actual function name, trunc, and then it worked. I am adding this here so that everyone knows.
There could be 2 possible solutions.
1st: Call awk from a shell script and run that shell script.
cat script.ksh
awk 'BEGIN { CONVFMT="%0.17f" }
/D Format/ { in_f_format=0; next }
/F Format/ { in_f_format=1; next }
in_f_format != 1 { next }
!($1 ~ /^[1-9]/) { next }
$1 == 11 { prt(180,3.141592653589); next }
$1 == 15 { prt(100,1); next }
$1 == 20 { prt(10,1); next }
$1 == 26 { next }
{ prt(1,1) }
function prt(mult, div) {
print trunc($5 * mult / div) ORS trunc($6 * mult / div)
}
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}' Input_file
Give script.ksh execute permission and run it the same way you ran file.awk.
2nd: Run it as an awk script, like this:
awk -f awk_file Input_file
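For the second script in the question, the fix described in the update (calling trunc instead of the undefined t) could look roughly like the sketch below. The wrapper name print_row35 is hypothetical and only there so the awk program can be fed from standard input; the awk body is the asker's own logic with the function name corrected.

```shell
# Hypothetical wrapper: reads the report on stdin, prints truncated $5 and $6
# for rows whose first field is 35 inside the "F Format" section.
print_row35() {
  awk '
    # Keep at most 6 digits after the decimal point (plain string truncation).
    function trunc(n,    s) {
      s = index(n, ".")
      return (s ? substr(n, 1, s + 6) : n)
    }
    /D Format/       { in_f_format = 0; next }
    /F Format/       { in_f_format = 1; next }
    in_f_format != 1 { next }
    $1 !~ /^[1-9]/   { next }
    $1 == 35         { print trunc($5), trunc($6) }   # trunc, not t
  '
}
```

It would be used as print_row35 < Input_file.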

Without a shebang line, the shell thinks your script is a shell script. To make it executable as an awk script, you have to use the proper shebang line:
cat <<'EOF' > awkscript
#!/usr/bin/awk -f
BEGIN { print "Hello, world!" }
EOF
chmod +x awkscript
where /usr/bin/awk is the path to your awk executable (can be found with type awk). The important bit is the -f flag.
Now you can run it as a standalone script:
$ ./awkscript
Hello, world!

The canonical (not sure if it's the right word) way to use it is like this:
awk -f file.awk datafile
or the pipe way like this:
cat datafile | awk -f file.awk
Here file.awk, the same file you are trying to run, is the awk script file (the name can be anything).
And datafile is the file (or files) containing the data you want to process.
There is no need to chmod the awk file to use it this way.
Update:
But, as keith kindly mentioned in a comment, you can run it directly too.
Put this line:
#!/usr/bin/awk -f
at the beginning of your file.awk (assuming your awk executable is at that location).
After chmod +x file.awk, you can execute it like this:
./file.awk datafile

Related

Recursive file includes

I'm writing an awk script to include sourced files in a shell script recursively:
$ cat build.awk
function slurp(line) {
if ( line ~ /^\. / || line ~ /^source / ) {
split(line, words)
while ( (getline _line < words[2]) > 0 ) {
slurp(_line)
}
} else if ( NR != 1 && line ~ /^#!/ ) {
# Ignore shebang not in the first line
} else {
print line
}
}
{
slurp($0)
}
For example, with the following four shell scripts,
$ for i in a b c d; do cat $i.sh; echo; done
#!/bin/sh
echo this is a
. b.sh
#!/bin/sh
echo this is b
. c.sh
. d.sh
#!/bin/sh
echo this is c
#!/bin/sh
echo this is d
I expect that by running awk -f build.awk a.sh I get
#!/bin/sh
echo this is a
echo this is b
echo this is c
echo this is d
However, the actual result is
#!/bin/sh
echo this is a
echo this is b
echo this is c
d.sh is not included. How can I fix this? What is my mistake?
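Aha, that was because awk variables have no lexical scope! The words array was global, so each recursive call to slurp overwrote it. Declaring words as an extra function parameter makes it local to each call. I am still a beginner in awk. The following works: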
$ cat build.awk
function slurp(line, words) {
if ( line ~ /^\. / || line ~ /^source / ) {
split(line, words)
while ( (getline _line < words[2]) > 0 ) {
slurp(_line)
}
} else if ( NR != 1 && line ~ /^#!/ ) {
# Ignore shebang not in the first line
} else {
print line
}
}
{
slurp($0)
}
$ awk -f build.awk a.sh
#!/bin/sh
echo this is a
echo this is b
echo this is c
echo this is d
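The scoping rule can also be seen in isolation. In awk, any variable that is not a function parameter is global, and the usual convention is to list locals as extra parameters. The sketch below uses hypothetical function names (demo_scope, depth_global, depth_local) that are not part of the original script.

```shell
# Hypothetical demo: the same recursive function with a global and a local tmp.
demo_scope() {
  awk '
    # tmp is global here, so the recursive call overwrites the caller value.
    function depth_global(n) {
      tmp = n
      if (n > 1) depth_global(n - 1)
      return tmp
    }
    # tmp is an extra parameter here, so every call has its own copy.
    function depth_local(n,    tmp) {
      tmp = n
      if (n > 1) depth_local(n - 1)
      return tmp
    }
    BEGIN { print depth_global(3), depth_local(3) }
  '
}
```

Running demo_scope prints "1 3": the global version returns the value left behind by the innermost call, while the local version returns its own argument.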

Store each of the first 2 blocks of lines in arrays

I've been sorting this out in Google Sheets, but it's going to take a long time, so I decided to handle it with awk instead.
input.txt
Column 1
2
2
2
4
4
Column 2
562
564
119
215
12
Range
13455,13457
13161
11409
13285,13277-13269
11409
I've tried this command to rearrange the values:
awk '/Column 1/' RS= input.txt
(as referred in How can I set the grep after context to be "until the next blank line"?)
But it seems it only takes the one matched line.
It should be arranged by the respective lines.
Result:
562Value2#13455
562Value2#13457
564Value2#13161
119Value2#11409
215Value4#13285
215Value4#13277-13269
12Value4#11409
It should look something like that: for each comma-separated item in Range, the values from Column 1 and Column 2 are repeated.
For example:
Range :
13455,13457
Result :
562Value2#13455
562Value2#13457
idk what sorting has to do with it but it seems like this is what you're looking for:
$ cat tst.awk
BEGIN { FS=","; recNr=1; print "Result:" }
!NF { ++recNr; lineNr=0; next }
{ ++lineNr }
lineNr == 1 { next }
recNr == 1 { a[lineNr] = $0 }
recNr == 2 { b[lineNr] = $0 }
recNr == 3 {
for (i=1; i<=NF; i++) {
print b[lineNr] "Value" a[lineNr] "#" $i
}
}
$ awk -f tst.awk input.txt
Result:
562Value2#13455
562Value2#13457
564Value2#13161
119Value2#11409
215Value4#13285
215Value4#13277-13269
12Value4#11409

How to swap lines with awk in only a single pass and with limited memory use?

In a previous post there is an answer by user2138595. Though beautiful, it has the problem that the input file must be read twice.
I want to write a GNU awk script that reads the input only once. This is what I have:
cat swap_line.awk
BEGIN {
if(init > end){
exit 1;
}
flag = 1;
memory_init = "";
memory = ""
}
{
if (NR != init && NR != end){
if(flag==1){
print $0;
}else{
memory = memory""$0"\n";
}
}else if(end == init){
print $0;
}else if(NR == init){
flag = 0;
memory_init = $0;
}else{
#NR == end
print $0;
printf("%s",memory);
print memory_init;
flag = 1;
}
}
END {
#if end is greater than the number of lines of the file
if(flag == 0){
printf("%s",memory);
print memory_init;
}
}
The script works well:
cat input
1
2
3
4
5
awk -v init=2 -v end=4 -f swap_line.awk input
1
4
3
2
5
awk -v init=2 -v end=2 -f swap_line.awk input
1
2
3
4
5
awk -v init=2 -v end=8 -f swap_line.awk input
1
3
4
5
2
QUESTION
How could I write this script in a better way? I do not like using the memory variable, since it can cause problems with large files. For example, if the input file has 10 million lines and I want to swap line 1 and line 10 million, I store 9,999,998 lines in the memory variable.
@JoseRicardoBustosM. It is impossible to do it in one pass in awk without saving the lines from the init line to the one before the end line in memory. Just think about the impossibility of getting a line N lines ahead of what you've already read to miraculously show up in place of the current line. The best solution for this is definitely a simple 2-pass approach of saving the lines in the first pass and using them in the 2nd. I am including all solutions that involve grep-ing in advance or using a getline loop in the "2"-pass approach bucket.
FWIW here's the way I'd really do it (this IS a 2-pass approach):
$ cat swap_line.awk
BEGIN { ARGV[ARGC]=ARGV[ARGC-1]; ARGC++ }
NR==FNR { if (NR==end) tl=$0; next }
FNR==init { hd=$0; $0=tl; nr=NR-FNR; if (nr<end) next }
FNR==end { $0=hd }
FNR==nr { if (nr<end) $0 = $0 ORS hd }
{ print }
$ awk -v init=2 -v end=4 -f swap_line.awk input
1
4
3
2
5
$ awk -v init=2 -v end=2 -f swap_line.awk input
1
2
3
4
5
$ awk -v init=2 -v end=8 -f swap_line.awk input
1
3
4
5
2
Note that if you didn't have that very specific requirement for how to handle an "end" that's past the end of the file then the solution would simply be:
$ cat swap_line.awk
BEGIN { ARGV[ARGC]=ARGV[ARGC-1]; ARGC++ }
NR==FNR { if (NR==end) tl=$0; next }
FNR==init { hd=$0; $0=tl }
FNR==end { $0=hd }
{ print }
and if you really want something to think about (again, just for the sunny day cases):
$ cat swap_line.awk
NR==init { hd=$0; while ((getline<FILENAME)>0 && ++c<end); }
NR==end { $0=hd }
{ print }
$ awk -v init=2 -v end=4 -f swap_line.awk input
1
4
3
2
5
I would still consider that last one as a "2"-pass approach and I wouldn't do it if I didn't fully understand all the caveats listed at http://awk.info/?tip/getline.
I think you are working too hard. This makes no attempt to deal with extreme cases (eg, if end is greater than the number of lines, the initial line will not be printed, but that can easily be handled in an END block), because I think handling the edge cases obscures the idea. Namely, print until you reach the line you want swapped out, then store data in a file, then print the line to swap, the stored data, and the initial line, and then print the rest of the file:
$ cat swap.sh
#!/bin/sh
trap 'rm -f $T1' 0
T1=$(mktemp)
awk '
NR<init { print; next; }
NR==init { f = $0; next; }
NR<end { print > t1; next; }
NR==end { print; system("cat "t1); print f; next; }
1
' init=${1?} end=${2?} t1=$T1
$ yes | sed 10q | nl -ba | ./swap.sh 4 8
1 y
2 y
3 y
8 y
5 y
6 y
7 y
4 y
9 y
10 y
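The END-block handling mentioned above could be sketched as follows. This is only one possible way to do it, not the answerer's code: the names swap_lines_tmp, held, and dump_held are hypothetical, the held lines are read back with getline instead of system("cat"), and the sketch assumes init <= end.

```shell
# Hypothetical sketch: usage is  swap_lines_tmp INIT END < input
# Lines between INIT and END are parked in a temporary file, not in memory.
swap_lines_tmp() {
  t=$(mktemp) || return 1
  awk -v init="$1" -v end="$2" -v t1="$t" '
    # Print the parked lines back from the temporary file.
    function dump_held(    line) {
      close(t1)
      while ((getline line < t1) > 0) print line
      close(t1)
    }
    init == end { print; next }                  # nothing to swap
    NR <  init  { print; next }
    NR == init  { f = $0; held = 1; next }       # remember the first line
    NR <  end   { print > t1; next }             # park the lines in between
    NR == end   { print; dump_held(); print f; held = 0; next }
    { print }
    # end was past the last line: emit the parked lines, then the held line
    END { if (held) { dump_held(); print f } }
  '
  rm -f "$t"
}
```

With seq 5 as input, swap_lines_tmp 2 4 gives 1 4 3 2 5, and swap_lines_tmp 2 8 gives 1 3 4 5 2, matching the behaviour of the script in the question.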
I agree that 2 passes are required. The first pass can be done with tools that are designed specifically for the task:
# $init and $end have been defined
endline=$( tail -n "+$end" file | head -n 1 )
awk -v init="$init" -v end="$end" -v endline="$endline" '
NR == init {saved = $0; $0 = endline}
NR == end {$0 = saved}
{print}
' file
Hide the details away in a function:
swap_lines () {
awk -v init="$1" \
-v end="$2" \
-v endline="$(tail -n "+$2" "$3" | head -n 1)" \
'
NR == init {saved = $0; $0 = endline}
NR == end {$0 = saved}
1
' "$3"
}
seq 5 > file
swap_lines 2 4 file
1
4
3
2
5

Sum up from line "A" to line "B" from a big file using awk

aNumber|bNumber|startDate|timeZone|duration|currencyType|cost|
22677512549|778|2014-07-02 10:16:35.000|NULL|NULL|localCurrency|0.00|
22675557361|76457227|2014-07-02 10:16:38.000|NULL|NULL|localCurrency|10.00|
22677521277|778|2014-07-02 10:16:42.000|NULL|NULL|localCurrency|0.00|
22676099496|77250331|2014-07-02 10:16:42.000|NULL|NULL|localCurrency|1.00|
22667222160|22667262389|2014-07-02 10:16:43.000|NULL|NULL|localCurrency|10.00|
22665799922|70110055|2014-07-02 10:16:45.000|NULL|NULL|localCurrency|20.00|
22676239633|433|2014-07-02 10:16:48.000|NULL|NULL|localCurrency|0.00|
22677277255|76919167|2014-07-02 10:16:51.000|NULL|NULL|localCurrency|1.00|
This is the input (a sample of millions of lines) that I have in a CSV file.
I want to sum up the duration by date.
My concern is that I only want to sum up the first 1000000 lines.
The awk program I'm using is:
test.awk
BEGIN { FS = "|" }
NR>1 && NR<=1000000
FNR == 1{ next }
{
sub(/ .*/,"",$3)
key=sprintf("%10s",$3)
duration[key] += $5 } END {
printf "%-10s %16s,"dAccused","Duration"
for (i in duration) {
printf "%-4s %16.2f i,duration[i]
}}
I run my script as
$ awk -f test.awk 'file'
My condition NR>1 && NR<=1000000 doesn't seem to be taken into account.
Any suggestions, please?
You're looking for this:
BEGIN { FS = "|" }
1 < NR && NR <= 1000000 {
sub(/ .*/, "", $3)
key = sprintf("%10s",$3)
duration[key] += $5
}
END {
printf "%-10s %16s\n", "dAccused", "Duration"
for (i in duration) {
printf "%-4s %16.2f\n", i, duration[i]
}
}
A lot of errors become obvious with proper indentation.
The reason you saw 1,000,000 lines was due to this:
NR>1 && NR<=1000000
That is a condition with no action block. The default action is to print the current record if the condition is true. That's why you see a lot of awk one-liners end with the number 1.
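The difference between a bare condition and a condition with an action block can be seen in a small sketch. The helper names skip_first and count_rest are hypothetical and exist only for illustration.

```shell
# Pattern with no action: every record that matches is printed (default action).
skip_first() { awk 'NR > 1'; }

# Same pattern with an explicit action: nothing is printed per record,
# only the count at the end.
count_rest() { awk 'NR > 1 { n++ } END { print n + 0 }'; }
```

For the three input lines h, a, b, skip_first prints a and b, while count_rest prints only 2.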
You didn't post any expected output and your duration field is always NULL so it's still not clear what you really want output, but this is probably the right approach:
$ cat tst.awk
BEGIN { FS = "|" }
NR==1 { for (i=1;i<NF;i++) f[$i] = i; next }
{
sub(/ .*/,"",$(f["startDate"]))
sum[$(f["startDate"])] += $(f["duration"])
}
NR==1000000 { exit }
END { for (date in sum) print date, sum[date] }
$ awk -f tst.awk file
2014-07-02 0
Instead of discarding your header line, it uses it to create an array f[] that maps the field names to their positions in each line, so instead of having to hard-code that duration is field 5 (or whatever) you just reference it as $(f["duration"]).
Any time your input file has a header line, don't discard it - use it so your script is not coupled to the order of fields in your input file.
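To illustrate why the header map decouples the script from the column order, here is a small sketch under assumptions: the wrapper name sum_by_date and the two-column sample data with numeric durations are made up for the example and are not the asker's data.

```shell
# Hypothetical wrapper: sums the "duration" column per date of "startDate",
# locating both columns by the names found in the header line.
sum_by_date() {
  awk '
    BEGIN { FS = "|" }
    NR == 1 { for (i = 1; i <= NF; i++) f[$i] = i; next }   # name -> position
    {
      d = $(f["startDate"])
      sub(/ .*/, "", d)                 # keep only the date part
      sum[d] += $(f["duration"])
    }
    END { for (d in sum) print d, sum[d] }
  '
}
```

Feeding it startDate|duration| followed by two rows for 2014-07-02 with durations 5 and 7 prints "2014-07-02 12", and so does the same data with the two columns swapped in both the header and the rows.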

awk Merge two files based on common field and print similarities and differences

I have two files I would like to merge into a third, but I need to see both where they share a common field and where they differ. Since there are minor differences in other fields, I cannot use a diff tool, and I thought this could be done with awk.
File 1:
aWonderfulMachine 1 mlqsjflk
AnotherWonderfulMachine 2 mlksjf
YetAnother WonderfulMachine 3 sdg
TrashWeWon'tBuy 4 jhfgjh
MoreTrash 5 qsfqf
MiscelleneousStuff 6 qfsdf
MoreMiscelleneousStuff 7 qsfwsf
File2:
aWonderfulMachine 22 dfhdhg
aWonderfulMachine 23 dfhh
aWonderfulMachine 24 qdgfqf
AnotherWonderfulMachine 25 qsfsq
AnotherWonderfulMachine 26 qfwdsf
MoreDifferentStuff 27 qsfsdf
StrangeStuffBought 28 qsfsdf
Desired output:
aWonderfulMachine 1 mlqsjflk aWonderfulMachine 22 dfhdhg
aWonderfulMachine 23 dfhdhg
aWonderfulMachine 24 dfhh
AnotherWonderfulMachine 2 mlksjf AnotherWonderfulMachine 25 qfwdsf
AnotherWonderfulMachine 26 qfwdsf
File1
YetAnother WonderfulMachine 3 sdg
TrashWeWon'tBuy 4 jhfgjh
MoreTrash 5 qsfqf
MiscelleneousStuff 6 qfsdf
MoreMiscelleneousStuff 7 qsfwsf
File2
MoreDifferentStuff 27 qsfsdf
StrangeStuffBought 28 qsfsdf
I have tried a few awk scripts here and there, but they are either based on two fields only, and I don't know how to modify the output, or they delete the duplicates based on two fields only, etc. (I am new to this and awk syntax is tough).
Thank you much in advance for your help.
You can come very close using these three commands:
join <(sort file1) <(sort file2)
join -v 1 <(sort file1) <(sort file2)
join -v 2 <(sort file1) <(sort file2)
This assumes a shell, such as Bash, that supports process substitution (<()). If you're using a shell that doesn't, the files would need to be pre-sorted.
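For a shell without process substitution, the pre-sorting could be done with temporary files, roughly as in the sketch below. The function name merge_report and the File1/File2 labels are assumptions made for the example; only sort, join, and mktemp are used.

```shell
# Hypothetical sketch: usage is  merge_report file1 file2
merge_report() {
  s1=$(mktemp) && s2=$(mktemp) || return 1
  sort "$1" > "$s1"
  sort "$2" > "$s2"
  join "$s1" "$s2"          # lines whose first field appears in both files
  echo File1
  join -v 1 "$s1" "$s2"     # lines found only in the first file
  echo File2
  join -v 2 "$s1" "$s2"     # lines found only in the second file
  rm -f "$s1" "$s2"
}
```

Note that join matches on the first whitespace-separated field by default, so a line such as "YetAnother WonderfulMachine 3 sdg" is keyed on "YetAnother" only.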
To do this in AWK:
#!/usr/bin/awk -f
BEGIN { FS="\t"; flag=1; file1=ARGV[1]; file2=ARGV[2] }
FNR == NR { lines1[$1] = $0; count1[$1]++; next } # process the first file
{ # process the second file and do output
lines2[$1] = $0;
count2[$1]++;
if ($1 != prev) { flag = 1 };
if (count1[$1]) {
if (flag) printf "%s ", lines1[$1];
else printf "\t\t\t\t\t"
flag = 0;
printf "\t%s\n", $0
}
prev = $1
}
END { # output lines that are unique to one file or the other
print "File 1: " file1
for (i in lines1) if (! (i in lines2)) print lines1[i]
print "File 2: " file2
for (i in lines2) if (! (i in lines1)) print lines2[i]
}
To run it:
$ ./script.awk file1 file2
The lines won't be output in the same order that they appear in the input files. The second input file (file2) needs to be sorted since the script assumes that similar lines are adjacent. You will probably want to adjust the tabs or other spacing in the script. I haven't done much in that regard.
One way to do it (albeit with hardcoded file names):
BEGIN {
FS="\t";
readfile(ARGV[1], s1);
readfile(ARGV[2], s2);
ARGV[1] = ARGV[2] = "/dev/null"
}
END{
for (k in s1) {
if ( s2[k] ) printpair(k,s1,s2);
}
print "file1:"
for (k in s1) {
if ( !s2[k] ) print s1[k];
}
print "file2:"
for (k in s2) {
if ( !s1[k] ) print s2[k];
}
}
function readfile(fname, sary) {
while ( (getline < fname) > 0 ) {
key = $1;
if (sary[key]) {
sary[key] = sary[key] "\n" $0;
} else {
sary[key] = $0;
};
}
close(fname);
}
function printpair(key, s1, s2) {
n1 = split(s1[key],l1,"\n");
n2 = split(s2[key],l2,"\n");
for (i=1; i<=max(n1,n2); i++){
if (i==1) {
b = l1[1];
gsub("."," ",b);
}
if (i<=n1) { f1 = l1[i] } else { f1 = b };
if (i<=n2) { f2 = l2[i] } else { f2 = b };
printf("%s\t%s\n",f1,f2);
}
}
function max(x,y){ z = x; if (y>x) z = y; return z; }
Not particularly elegant, but it handles many-to-many cases.