Recursive file includes - awk

I'm writing an awk script to recursively inline the files sourced by a shell script:
$ cat build.awk
function slurp(line) {
    if ( line ~ /^\. / || line ~ /^source / ) {
        split(line, words)
        while ( (getline _line < words[2]) > 0 ) {
            slurp(_line)
        }
    } else if ( NR != 1 && line ~ /^#!/ ) {
        # Ignore shebang not in the first line
    } else {
        print line
    }
}
{
    slurp($0)
}
For example, with the following four shell scripts,
$ for i in a b c d; do cat $i.sh; echo; done
#!/bin/sh
echo this is a
. b.sh

#!/bin/sh
echo this is b
. c.sh
. d.sh

#!/bin/sh
echo this is c

#!/bin/sh
echo this is d
I expect that by running awk -f build.awk a.sh I get
#!/bin/sh
echo this is a
echo this is b
echo this is c
echo this is d
However, the actual result is
#!/bin/sh
echo this is a
echo this is b
echo this is c
d.sh is not included. How can I fix this? What is my mistake?

Aha, that was because in awk all variables are global unless they are declared as extra function parameters; ordinary variables have no lexical scope! I am still a noob at awk. The following works:
$ cat build.awk
function slurp(line, words) {
    if ( line ~ /^\. / || line ~ /^source / ) {
        split(line, words)
        while ( (getline _line < words[2]) > 0 ) {
            slurp(_line)
        }
    } else if ( NR != 1 && line ~ /^#!/ ) {
        # Ignore shebang not in the first line
    } else {
        print line
    }
}
{
    slurp($0)
}
$ awk -f build.awk a.sh
#!/bin/sh
echo this is a
echo this is b
echo this is c
echo this is d
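To see the scoping rule in isolation, here is a minimal illustration (separate from the script above): in awk, a variable assigned inside a function body is global unless it is listed as an extra parameter, so a recursive call clobbers the caller's copy.
$ awk 'function f(n) { w = n; if (n == 1) f(2); print "n=" n " sees w=" w }
BEGIN { f(1) }'
n=2 sees w=2
n=1 sees w=2
Declaring it as function f(n, w) makes w local to each call, and the last line prints w=1 instead. That is exactly what the extra words parameter does for slurp: without it, the recursive slurp(_line) call overwrites the caller's words array, so the loop that was reading b.sh ends up calling getline on c.sh (already at end of file), stops early, and never sees the . d.sh line.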

Related

Something is wrong by compilation

What am I doing wrong, please? I have this code in the file file.awk:
BEGIN { CONVFMT="%0.17f" }
/D Format/ { in_f_format=0; next }
/F Format/ { in_f_format=1; next }
in_f_format != 1 { next }
!($1 ~ /^[1-9]/) { next }
$1 == 11 { prt(180,3.141592653589); next }
$1 == 15 { prt(100,1); next }
$1 == 20 { prt(10,1); next }
$1 == 26 { next }
{ prt(1,1) }
function prt(mult, div) {
print trunc($5 * mult / div) ORS trunc($6 * mult / div)
}
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}
I run:
chmod +x file.awk
./file.awk
in the terminal and I get these errors:
./file.awk: line 1: BEGIN: command not found
./file.awk: line 2: /D: No such file or directory
./file.awk: line 2: next: command not found
./file.awk: line 3: /F: No such file or directory
./file.awk: line 3: next: command not found
./file.awk: line 4: in_f_format: command not found
./file.awk: line 5: syntax error near unexpected token `{'
./file.awk: line 5: `!($1 ~ /^[1-9]/) { next }'
Where is the mistake, please?
EDIT
A similar script:
BEGIN { CONVFMT="%0.17f" }
/D Format/ { in_f_format=0; next }
/F Format/ { in_f_format=1; next }
in_f_format != 1 { next }
!($1 ~ /^[1-9]/) { next }
$1 == 35 { print t($5), t($6) }
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}
Gives an error:
fatal: function `t' not defined
From the input below I would like to print
Input-Output in F Format
No. Curve Input Param. Correction Output Param. Standard Deviation
26 0 56850.9056460000 -0.0017608883 56850.9038851117 0.0016647171
35 1 0.2277000000 0.0011369754 0.2288369754 0.0014780395
35 2 0.2294000000 0.0000417158 0.2294417158 0.0008601513
35 3 0.2277000000 0.0007425066 0.2284425066 0.0022555311
35 4 0.2298000000 -0.0000518690 0.2297481310 0.0010186846
35 5 0.2295000000 0.0000793572 0.2295793572 0.0014667137
35 6 0.2300000000 0.0000752449 0.2300752449 0.0006258864
35 7 0.2307000000 -0.0001442591 0.2305557409 0.0002837569
35 8 0.2275000000 0.0007358355 0.2282358355 0.0007609550
35 9 0.2292000000 0.0003447650 0.2295447650 0.0007148005
35 10 0.2302000000 -0.0001854710 0.2300145290 0.0006320668
35 11 0.2308000000 -0.0002064324 0.2305935676 0.0008911070
35 12 0.2299000000 -0.0000202967 0.2298797033 0.0002328860
35 13 0.2298000000 0.0000464629 0.2298464629 0.0011609539
35 14 0.2307000000 -0.0003654521 0.2303345479 0.0006827961
35 15 0.2294000000 0.0002157908 0.2296157908 0.0003253584
Input-Output in D Format
the numbers that are in $5 and $6 in the rows starting with 35.
EDIT 2
I edited the position of the trunc function, like this:
BEGIN { CONVFMT="%0.17f" }
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}
/D Format/ { in_f_format=0; next }
/F Format/ { in_f_format=1; next }
in_f_format != 1 { next }
!($1 ~ /^[1-9]/) { next }
$1 == 35 { print t($5), t($6) }
UPDATE: After chatting with the OP (which I mentioned in my comments too), the fix was to change the function call from t to the actual function name, trunc, and it then worked. I thought I'd update here so everyone knows.
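So the last rule of the posted script simply becomes (restating that fix for reference):
$1 == 35 { print trunc($5), trunc($6) }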
There are 2 possible solutions.
1st: Put the awk program inside a shell script and run it as a shell script.
cat script.ksh
awk 'BEGIN { CONVFMT="%0.17f" }
/D Format/ { in_f_format=0; next }
/F Format/ { in_f_format=1; next }
in_f_format != 1 { next }
!($1 ~ /^[1-9]/) { next }
$1 == 11 { prt(180,3.141592653589); next }
$1 == 15 { prt(100,1); next }
$1 == 20 { prt(10,1); next }
$1 == 26 { next }
{ prt(1,1) }
function prt(mult, div) {
print trunc($5 * mult / div) ORS trunc($6 * mult / div)
}
function trunc(n, s) {
s=index(n,".")
return (s ? substr(n,1,s+6) : n)
}' Input_file
Give script.ksh execute permissions and run it the way you are currently running your script.
2nd: Run it as an awk script like this:
awk -f awk_file Input_file
Without a shebang line, the shell thinks your script is a shell script. To make it executable as an awk script, you have to use the proper shebang line:
cat <<'EOF' > awkscript
#!/usr/bin/awk -f
BEGIN { print "Hello, world!" }
EOF
chmod +x awkscript
where /usr/bin/awk is the path to your awk executable (can be found with type awk). The important bit is the -f flag.
Now you can run it as a standalone script:
$ ./awkscript
Hello, world!
The canonical (not sure if it's the right word) way to use it is like this:
awk -f file.awk datafile
or the pipe way like this:
cat datafile | awk -f file.awk
The file.awk, which is the same as what you are trying, is the awk script file (the name can be anything).
And datafile is the file (or files) containing the data you want to deal with.
There is no need to chmod the awk file(s) to use it this way.
Update:
But as keith kindly mentioned in a comment, you can do it like this too.
Put this line:
#!/usr/bin/awk -f
at the beginning of your file.awk (given that your awk executable is at that location).
After chmod +x file.awk, you can execute it like this:
./file.awk datafile

How to merge lines using awk command so that there should be specific fields in a line

I want to merge some rows in a file so that each line contains 22 fields separated by ~.
The input file looks like this:
200269~7414~0027001~VALTD~OM3500~963~~~~716~423~2523~Y~UN~~2423~223~~~~A~200423
2269~744~2701~VALD~3500~93~~~~76~423~223~Y~
UN~~243~223~~~~A~200123
209~7414~7001~VALD~OM30~963~~~
~76~23~2523~Y~UN~~223~223~~~~A~123
and so on.
The first line looks fine. The 2nd and 3rd lines need to be merged so that they become a single line with 22 fields. The 4th, 5th and 6th lines should be merged, and so on.
Expected output:
200269~7414~0027001~VALTD~OM3500~963~~~~716~423~2523~Y~UN~~2423~223~~~~A~200423
2269~744~2701~VALD~3500~93~~~~76~423~223~Y~UN~~243~223~~~~A~200123
209~7414~7001~VALD~OM30~963~~~~76~23~2523~Y~UN~~223~223~~~~A~123
The file has 10 GB of data, but the code I wrote (using a while loop) is taking too much time to execute. How can I solve this problem using an awk/sed command?
Code Used:
IFS=$'\n'
set -f
while read line
do
    count_tild=`echo $line | grep -o '~' | wc -l`
    if [ $count_tild == 21 ]
    then
        echo $line
    else
        checkLine
    fi
done < file.txt
function checkLine
{
    current_line=$line
    read line1
    next_line=$line1
    new_line=`echo "$current_line$next_line"`
    count_tild_mod=`echo $new_line | grep -o '~' | wc -l`
    if [ $count_tild_mod == 21 ]
    then
        echo "$new_line"
    else
        line=$new_line
        checkLine
    fi
}
Using only the shell for this is slow, error-prone, and frustrating. Try Awk instead.
awk -F '~' 'NF==1 { next }  # Hack; see below
NF<22 {
    for(i=1; i<=NF; i++) f[++a]=$i }
a==22 {
    for(i=1; i<=a; ++i) printf "%s%s", f[i], (i==22 ? "\n" : "~")
    a=0 }
NF==22
END {
    if(a) for(i=1; i<=a; i++) printf "%s%s", f[i], (i==a ? "\n" : "~") }' file.txt >file.new
This assumes that consecutive lines with too few fields will always add up to exactly 22 when you merge them. You might want to check this assumption (or perhaps accept this answer and ask a new question with more and better details). Or maybe just add something like
a>22 {
    print FILENAME ":" FNR ": Too many fields " a >"/dev/stderr"
    exit 1 }
The NF==1 block is a hack to bypass the weirdness of the completely empty line 5 in your sample.
Your attempt contained multiple errors and inefficiencies; for a start, try http://shellcheck.net/ to diagnose many of them.
$ cat tst.awk
BEGIN { FS="~" }
{
    sub(/^[0-9]+\./,"")
    gsub(/[[:space:]]+/,"")
    $0 = prev $0
    if ( NF == 22 ) {
        print ++cnt "." $0
        prev = ""
    }
    else {
        prev = $0
    }
}
$ awk -f tst.awk file
1.200269~7414~0027001~VALTD~OM3500~963~~~~716~423~2523~Y~UN~~2423~223~~~~A~200423
2.2269~744~2701~VALD~3500~93~~~~76~423~223~Y~UN~~243~223~~~~A~200123
3.209~7414~7001~VALD~OM30~963~~~~76~23~2523~Y~UN~~223~223~~~~A~123
The assumption above is that you never have more than 22 fields on 1 line nor do you exceed 22 in any concatenation of the contiguous lines that are each less than 22 fields, just like you show in your sample input.
You can try this awk
awk '
BEGIN {
    FS=OFS="~"
}
{
    while(NF<22) {
        if(NF==0)
            break
        a=$0
        getline
        $0=a$0
    }
    if(NF!=0)
        print
}
' infile
or this sed
sed -E '
:A
s/((.*~){21})([^~]*)/\1\3/
tB
N
bA
:B
s/\n//g
' infile
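Whichever variant you use, a quick sanity check of the result is cheap (my suggestion, not part of the answers above): count the fields of every output line and flag anything that is not exactly 22.
awk -F '~' 'NF != 22 { print FILENAME ":" FNR ": " NF " fields"; bad=1 } END { exit bad }' file.new
It exits non-zero if any merged line came out with the wrong number of fields.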

Redirect input for gawk to a system command

Usually a gawk script processes each line of its stdin. Is it possible to instead specify a system command in the script, and then process each line of that command's output in the rest of the script?
For example consider the following simple interaction:
$ { echo "abc"; echo "def"; } | gawk '{print NR ":" $0; }'
1:abc
2:def
I would like to get the same output without using a pipe, by instead specifying the echo commands as a system command.
I can of course use the pipe, but that would force me to either use two different scripts or embed the gawk script inside a bash script, and I am trying to avoid that.
UPDATE
The previous example is not quite representative of my use case; this is somewhat closer:
$ { echo "abc"; echo "def"; } | gawk '/d/ {print NR ":" $0; }'
2:def
UPDATE 2
A shell script parallel would be as follows. Without the exec line the script reads from stdin; with the exec it uses the output of the command on that line as input:
/tmp> cat t.sh
#!/bin/bash
exec 0< <(echo abc; echo def)
while read l; do
    echo "line:" $l
done
/tmp> ./t.sh
line: abc
line: def
From all of your comments, it sounds like what you want is:
$ cat tst.awk
BEGIN {
    if ( ("mktemp" | getline file) > 0 ) {
        system("(echo abc; echo def) > " file)
        ARGV[ARGC++] = file
    }
    close("mktemp")
}
{ print FILENAME, NR, $0 }
END {
    if (file!="") {
        system("rm -f \"" file "\"")
    }
}
$ awk -f tst.awk
/tmp/tmp.ooAfgMNetB 1 abc
/tmp/tmp.ooAfgMNetB 2 def
but honestly, I wouldn't do it. You're munging what the shell is good at (creating/destroying files and processes) with what awk is good at (manipulating text).
I believe what you're looking for is getline:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ print line} }' <<< ''
abc
def
Adjusting the answer to your second example:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ counter++; if ( line ~ /d/){print counter":"line} } }' <<< ''
2:def
Let's break it down:
awk '{
    cmd = "echo abc; echo def"
    # the loop below reads the output of cmd one line at a time into the variable line
    while ( ( cmd | getline line) > 0){
        # we need a counter because NR will not work for us
        counter++;
        # if the line contains the letter d
        if ( line ~ /d/){
            print counter":"line
        }
    }
}' <<< ''
2:def
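One detail worth keeping in mind (my addition, not from the answer above): a command read with cmd | getline stays open, so if the script ever needs to run the same command string again it has to close() it first. A minimal sketch:
awk 'BEGIN {
    cmd = "echo abc; echo def"
    while ((cmd | getline line) > 0) print "pass 1: " line
    close(cmd)    # without this, the second loop would read nothing
    while ((cmd | getline line) > 0) print "pass 2: " line
}'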

How to run a .awk file?

I am converting a CSV file into a table format, and I wrote an AWK script and saved it as my.awk. Here is my script:
#AWK for test
awk -F , '
BEGIN {
    aa = 0;
}
{
    hdng = "fname,lname,salary,city";
    l1 = length($1);
    l13 = length($13);
    if ((l1 > 2) && (l13 == 0)) {
        fname = substr($1, 2, 1);
        l1 = length($3) - 4;
        lname = substr($3, l1, 4);
        processor = substr($1, 2);
        #printf("%s,%s,%s,%s\n", fname, lname, salary, $0);
    }
    if ($0 ~ ",,,,")
        aa++
    else if ($0 ~ ",fname")
        printf("%s\n", hdng);
    else if ((l1 > 2) && (l13 == 0)) {
        a++;
    }
    else {
        perf = $11;
        if (perf ~ /^[0-9\.\" ]+$/)
            type = "num"
        else
            type = "char";
        if (type == "num")
            printf("Mr%s,%s,%s,%s,,N,N,,\n", $0, fname, lname, city);
    }
}
END {
} ' < life.csv > life_out.csv*
How can I run this script on a Unix server? I tried to run this my.awk file by using this command:
awk -f my.awk life.csv
The file you give is a shell script, not an awk program. So, try sh my.awk.
If you want to use awk -f my.awk life.csv > life_out.csv, then remove the awk -F , ' line and the last line from the file, and add FS="," in the BEGIN block.
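In other words, after that change the top of my.awk would start something like this (a sketch; only the wrapper changes, the body between the braces stays as it was):
BEGIN {
    FS = ","
    aa = 0;
}
# ... rest of the original main block and END block, without the closing ' < life.csv > life_out.csv line
and you then run it with awk -f my.awk life.csv > life_out.csv.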
If you put #!/bin/awk -f on the first line of your AWK script it is easier. Plus editors like Vim and ... will recognize the file as an AWK script and you can colorize. :)
#!/bin/awk -f
BEGIN {} # Begin section
{} # Loop section
END{} # End section
Change the file to be executable by running:
chmod ugo+x ./awk-script
and you can then call your AWK script like this:
$ echo "something" | ./awk-script
Put the part from BEGIN ... END{} inside a file and name it something like my.awk.
And then execute it like below:
awk -f my.awk life.csv >output.txt
Also, I see the field separator is ,. You can set that in the BEGIN block of the .awk file with FS=",".

How to "do something" for each input text files

Say that I read in the following information stored in three different text files (there can be many more):
File 1
1 2 rt 45
2 3 er 44
File 2
rf r 4 5
3 er 4 t
er t yu 4
File 3
er tyu 3er 3r
der 4r 5e
edr rty tyu 4r
edr 5t yt5 45
When I read in this information, I want the data from each file to go into its own array and be printed separately; for now it is all printed together.
Right now I have this script, which prints out all the information at the same time:
{
    TESTd[NR-1] = $2; g++
}
END {
    for (i = 0 ; i <= g-1; i ++ ) {
        print " [\"" TESTd[i] "\"]"
    }
    print " _____"
}
But is there a way to read in multiple files and do this for every text file?
Like instead of getting this output when doing awk -f test.awk 1.txt 2.txt 3.txt
["2"]
["3"]
["r"]
["er"]
["t"]
["tyu"]
["4r"]
["rty"]
["5t"]
_____
I get this output
["2"]
["3"]
_____
["r"]
["er"]
["t"]
_____
["tyu"]
["4r"]
["rty"]
["5t"]
_____
And reading in each file one at a time is preferably not an option here, since I will have something like 30 text files.
EDIT
I want to do this in awk if possible because I'm going to do something like this:
{
    PRINTONCE[NR-1] = $2; g++
    PRINTONEATTIME[NR-1] = $3
}
END {
    # Do this for all arguments once
    for (i = 0 ; i <= g-1; i ++ ) {
        print " [\"" PRINTONCE[i] "\"] \n"
    }
    print " _____"
    # Do this for loop for every .txt file that is read in as an argument
    # for(j=0;j<args.length;j++){
    for (i = 0 ; i <= g-1; i ++ ) {
        print " [\"" PRINTONEATTIME[i] "\"] \n"
    }
    print " _____"
}
From what I understand, you have an awk script that works, and you want to run it on many files with a newline (or _____) between their outputs so you can tell which output is from which file.
Try this bash script:
dir=~/*.txt  # all txt files in ~ (home) directory
for f in $dir
do
    echo "File is $f"
    awk 'BEGIN{print "Hello"}' $f  # your awk code will take $f file as input
    echo "------------------"; echo;
done
Also, if you do not want to do this to all files you can write the for loop as for f in 1.txt 2.txt 3.txt.
If you don't want to do it in awk directly, you can call it like this in bash or zsh, for example:
for fic in test*.txt; awk -f test.awk $fic
It's quite simple to do it directly in awk:
# define a function to print out the array
function dump(array, n) {
    for (i = 0 ; i <= n-1; i ++ ) {
        print " [\"" array[i] "\"]"
    }
    print " _____"
}
# dump and reset when starting a new file
FNR==1 && NR!=1 {
    dump(TESTd, g)
    delete TESTd
    g = 0
}
# add data to the array
{
    TESTd[FNR-1] = $2; g++
}
# dump at the end
END {
    dump(TESTd, g)
}
N.B. using delete TESTd is a non-standard gawk feature, but the question is tagged as gawk so I assumed it's OK to use it.
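If portability beyond gawk ever matters, the usual portable way to empty the array (my note, not part of the original answer) is:
split("", TESTd)   # clears the array in any POSIX awk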
Alternatively you could use one or more of ARGIND, ARGV, ARGC or FILENAME to distinguish the different files.
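For example, a sketch of the FILENAME/ARGV idea (my illustration, assuming plain file names on the command line, i.e. no var=value arguments, and the same $2 collection and output format as above):
# collect the values per input file, then print them file by file
{ vals[FILENAME] = vals[FILENAME] " [\"" $2 "\"]\n" }
END {
    for (i = 1; i < ARGC; i++) {
        printf "%s", vals[ARGV[i]]
        print " _____"
    }
}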
Or, as suggested by https://stackoverflow.com/a/10691259/981959, with gawk 4 you can use an ENDFILE block instead of END in your original:
{
    TESTd[FNR-1] = $2; g++
}
ENDFILE {
    for (i = 0 ; i <= g-1; i ++ ) {
        print " [\"" TESTd[i] "\"]"
    }
    print " _____"
    delete TESTd
    g = 0
}
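Run it the same way as before; note that BEGINFILE/ENDFILE need gawk 4.0 or newer:
gawk -f test.awk 1.txt 2.txt 3.txt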
Write a bash shell script (or a basic shell script). Put the lines below into test.sh, then call /bin/sh test.sh or /bin/bash test.sh and see which one works:
for f in *.txt
do
    echo "File is $f"
    awk -F '\t' 'blah blah' $f >> output.txt
done
Or write a bash shell script to call your awk script:
for f in *.txt
do
    echo "File is $f"
    /bin/sh yourscript.sh "$f"
done