AWK - Max Variables?

Why does the first statement work but not the second? I'm trying to add an additional two (one shown) variables to do another comparison, but the second instance errors out.
1st Instance
awk 'f1=substr($1,0,9), f2=substr($3,0,9){if(f1==f2)print $1,$2,$3,$4}' file
2nd Instance
awk 'f1=substr($1,0,9), f2=substr($3,0,9), f3=substr($1,5,3){if(f1==f2)print $1,$2,$3,$4}' file
awk: cmd. line:1: f1=substr($1,0,9), f2=substr($3,0,9), f3=substr($1,5,3){if(f1==f2)print $1,$2,$3,$4,16}
awk: cmd. line:1: ^ syntax error
File
TULSHDRJ02 ae0.0 KSCYBBRJ01 ae1.0
MTC3BBRJ02 ae4.0 KSCYBBRJ01 ae6.0
KSCYBBRJ01 ae2.0 KSCYBBRJ02 ae2.0
MTC1BBRJ02 ae4.0 KSCYBBRJ02 ae6.0
Output
KSCYBBRJ01 ae2.0 KSCYBBRJ02 ae2.0

$ awk 'substr($1,1,9)==substr($3,1,9){print $1,$2,$3,$4}' file
Since you're printing every field, you can drop the action part; the default action prints the whole line:
$ awk 'substr($1,1,9)==substr($3,1,9)' file
Or, to avoid repeating the substr() call (DRY):
$ awk 'function s(v) {return substr(v,1,9)}
s($1)==s($3)' file

The general program structure of an awk program is as follows:
condition { action [; action [ ; ... ]] }
Multiple actions are separated by ; or newline.
Both the condition and the block of actions are optional. When you omit the condition
{ action [; action [ ; ... ]] }
... actions will be always executed. If you omit the actions:
condition
... the default action is print.
Multiple of those blocks can be put in a row:
cond1 { action1 } cond2 {action2} ...
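Note: a newline can always be used as a delimiter (for multiline programs).
In your first command, the comma between the two assignments makes them a range pattern (start, end). A range pattern takes exactly two expressions, which is why adding a third one after another comma is a syntax error.
I guess you wanted: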
awk '{f1=substr($1,0,9);f2=substr($3,0,9)} f1==f2{print $1,$2,$3,$4}'
... or in multiline form:
awk '# Runs on every line
{
f1=substr($1,0,9)
f2=substr($3,0,9)
}
# Runs only if condition is met
f1==f2 {
print $1,$2,$3,$4
}'
But not quite!
It should be
awk '{f1=substr($1,1,9);f2=substr($3,1,9)} f1==f2{print $1,$2,$3,$4}'
instead of
awk '{f1=substr($1,0,9);f2=substr($3,0,9)} f1==f2{print $1,$2,$3,$4}'
Note that string positions and field numbers in awk start at 1, not 0 ($0 is the whole record), and arrays filled by split() also start at index 1.
See also karakfa's answer, which shows how the command can be simplified.
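As a minimal sketch of the structure described above, the following runs the corrected program on the sample data from the question with the third variable added. The comparison of f3 against "BBR" is a hypothetical example of the "other comparison", since the question does not say what f3 should be compared with.

```shell
# Sample data copied from the question.
out=$(printf '%s\n' \
    'TULSHDRJ02 ae0.0 KSCYBBRJ01 ae1.0' \
    'MTC3BBRJ02 ae4.0 KSCYBBRJ01 ae6.0' \
    'KSCYBBRJ01 ae2.0 KSCYBBRJ02 ae2.0' \
    'MTC1BBRJ02 ae4.0 KSCYBBRJ02 ae6.0' |
awk '
# No condition: runs on every line and sets all three variables.
{
    f1 = substr($1, 1, 9)
    f2 = substr($3, 1, 9)
    f3 = substr($1, 5, 3)
}
# Condition + action: "BBR" is a hypothetical value for the extra comparison.
f1 == f2 && f3 == "BBR" {
    print $1, $2, $3, $4
}')
printf '%s\n' "$out"
```

Any number of variables can be set in the first block; only the pattern position is limited to a single expression or a two-part range.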

Related

How can I store the length of a line into a var within an awk script?

I have this simple awk script with which I attempt to check the number of characters in the first line.
If the first line has more or fewer than 10 characters, I want to store the number
of characters in a variable.
Somehow the first print statement works, but storing that result in a variable doesn't.
Please help.
I tried removing the dollar sign, "thelength=(length($0))",
and removing the parentheses, "thelength=length($0)", but it doesn't print anything...
Thanks!
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=$(length($0))
print "The length of the first line is: ",$thelength;
exit 1;
}
}
END { print "STOP" }' $1

Two issues, both from mixing ksh and awk syntax:
$(...) is not command substitution inside awk; there, $ is the field operator, so $(length($0)) means "the field whose number is length($0)". Use thelength=length($0)
awk variables are referenced without a leading $ ($thelength would again be a field reference); use print ... ,thelength
So your code becomes:
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=length($0)
print "The length of the first line is: ",thelength;
exit 1;
}
}
END { print "STOP" }' $1
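A small sketch of the difference between a variable and a field reference, using a made-up input line (the names n and the sample words are only for illustration):

```shell
out=$(echo 'alpha beta gamma' | awk '{
    n = 2                 # plain awk variable, no $ needed
    print "n is " n       # the value of the variable
    print "$n is " $n     # field number n, that is $2
}')
printf '%s\n' "$out"
```

The second print shows the word "beta" because $n selects the field whose number is stored in n.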

awk: extract data from a column by name rather than position

I have a text file that is comma delimited. The first line is a list of field names, and subsequent lines contain data. I'll get new versions of the file, and I want to extract all the values from a particular column by name rather than by column number. (I.e. the column I want may be in different positions in different versions of the file.)
For example, here are two files:
foo,bar,interesting,junk
1,2,gold,ramjet
2,25,diamonds,superfluous
and
foo,bar,baz,interesting,junk,morejunk
5,3,smurf,platinum,garbage,scrap
6,2.5,mushroom,sodium,liverwurst,eew
I'd like a single script that will go through multiple files, extracting the minerals in the "interesting" column. :-)
What I've got so far is something that works on ONE file, but I know that awk is more elegant than this. How do I clean this up and make it work on multiple files at once?
BEGIN {
FS=",";
}
NR == 1 {
for(i=1; i<=NF; i++) {
if($i=="interesting") {
col=i;
}
}
}
NR > 1 {
print $col;
}

You're pretty darn close already. Just use FNR instead of NR, for "File NR".
#!/usr/bin/awk -f
BEGIN { FS="," }
FNR==1 {
for (col=1;col<=NF;col++)
if ($col=="interesting")
next
}
{ print $col }
Or if you like:
#!/usr/bin/awk -f
BEGIN { FS="," }
FNR==1 { for (col=1;$col!="interesting";col++); next }
{ print $col }
Or if you prefer one-liners:
$ awk -F, -v txt="interesting" 'FNR==1{for(c=1;$c!=txt;c++);next} {print $c}' file1 file2
Of course, be careful that you actually have the specified column, or you may find yourself in an endless loop. You can probably figure out the extra condition that saves you from that risk.
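One possible form of that extra condition, as a sketch: bound the loop by NF and only print when the column was found. The first sample file is from the question; the second file, which lacks the column, is a hypothetical test case.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'foo,bar,interesting,junk' '1,2,gold,ramjet' '2,25,diamonds,superfluous' > "$tmp/file1"
# Hypothetical second file with no "interesting" column.
printf '%s\n' 'foo,bar,junk' '5,3,garbage' > "$tmp/file2"

out=$(awk -F, -v txt="interesting" '
FNR == 1 {
    col = 0                          # forget the previous file
    for (c = 1; c <= NF; c++)        # bounded by NF, so it always terminates
        if ($c == txt) col = c
    next
}
col { print $col }                   # skipped when the column is missing
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Files without the column are silently skipped here; printing a warning instead would be an equally reasonable choice.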
Note that in awk, you only need to terminate commands with semicolons if they are followed by another command. Thus, you would do this:
command1; command2
But you can drop the semicolon if you separate commands with newlines:
command1
command2

Do it this way:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { for (i=1;i<=NF;i++) f[$i]=i; next }
{ print $(f["interesting"]) }
$ awk -f tst.awk file1 file2
gold
diamonds
platinum
sodium
Creating a name->value array is always the best approach when it's applicable. It keeps every part of the code simple and decoupled from the rest of the code, and it sets you up for doing other things like changing the order of the fields when you output the results, e.g.:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { for (i=1;i<=NF;i++) f[$i]=i; next }
{ print $(f["junk"]), $(f["interesting"]), $(f["bar"]) }
$ awk -f tst.awk file1 file2
ramjet,gold,2
superfluous,diamonds,25
garbage,platinum,3
liverwurst,sodium,2.5

awk: ^ backslash not last character on line

Can anybody help resolve this issue? I'm not sure why it is causing a problem.
ssh root@host1 "tail -f /data1/logs/logger.log | awk '{ if(\$0 ~ /^Mar|^Apr/) { printf(\"\\n%s\",\$0) } if(\$0 \!~ /^Mar|^Apr/) { printf(\"%s\", \$0);} };' "
root@host1's password:
awk: { if($0 ~ /^Mar|^Apr/) { printf("\n%s",$0) } if($0 \!~ /^Mar|^Apr/) { printf("%s", $0);} };
awk: ^ backslash not last character on line

Rather than testing over ssh, you can replicate the behaviour using eval. I made a test file (called month):
Mar line1
line2 line2
Apr line3
You have (at least) three options:
First option
Your two conditions are mutually exclusive, so you can sidestep the issue of escaping a ! entirely by using two blocks with next in the first block:
eval "awk '/^Mar|^Apr/ { printf(\"\\n%s\",\$0); next } { printf(\"%s\", \$0) }' month"
If the condition is true, the first block is taken and next skips the rest. Note that I have removed the unnecessary $0 ~ from the condition. The match is performed against the whole line by default.
Second option
You could actually just do this:
eval "awk '/^Mar|^Apr/ { \$0 = \"\\n\"\$0 } { printf(\"%s\", \$0) }' month"
If the line matches, precede it with a newline.
In all cases (no condition before the { }), print the line.
Third option
If you wrap the overall command in single quotes, you don't need to do anything fancy with the !:
eval 'awk "{ if(/^Mar|^Apr/) { printf(\"\\n%s\",\$0) } if(!/^Mar|^Apr/) { printf(\"%s\", \$0)} }" month'
I recommend one of the other two solutions; I just thought it was worth showing that you can use ! within the command if you need to.
Output for all three cases:
Mar line1line2 line2
Apr line3

Here is a cleaned up version of your awk
awk '/^Mar|^Apr/ {printf "\n%s",$0;next} {printf "%s",$0}'
This tests whether $0 (the default, so there is no need to write it) starts with Mar or Apr.
If yes, it runs printf "\n%s",$0;next.
The next makes the final block run only if $0 does not start with Mar or Apr;
in that case it runs printf "%s",$0
Or just:
awk '/^Mar|^Apr/ {$0="\n"$0} {printf "%s",$0}'
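To try this without ssh or a log file, here is a sketch that feeds the three test lines from the earlier answer through the short version. The END block is an addition of mine so the output ends with a newline; it is not part of the original command.

```shell
out=$(printf '%s\n' 'Mar line1' 'line2 line2' 'Apr line3' |
    awk '/^Mar|^Apr/ { $0 = "\n" $0 } { printf "%s", $0 } END { print "" }')
printf '%s\n' "$out"
```

Note that the output begins with an empty line, because the first matching line also gets a newline in front of it.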

How to append lines to a new file with AWK

I am trying to append lines to some new files with awk in this way:
#!/usr/bin/awk -f
BEGIN {
FS = "[ \t|]"; }
{
print $5 "\t" $13 "\t" $14 >> "./bed/" $5 ".bed";
}
END {
}
New file is created with filename derived from a field of awk input file (5th field). I am unable to execute this script since it fails with
awk: ./blast2bed.awk:6: (FILENAME=blastout000 FNR=1) fatal: can't redirect to `./bed/AY517392.1.bed' (No such file or directory)
Any hints?
Thanks

The directory bed has to exist, so create it first with mkdir bed, either before you run your script or in the BEGIN block. You should also add parentheses around the output file name:
print $5"\t"$13"\t"$14 >> ("./bed/"$5".bed")
Notes: You don't need to end lines with ; if you have a single statement per line, and the BEGIN and END blocks are optional.
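A sketch of the "create it in the BEGIN block" variant, using awk's system() function to call mkdir -p. The input line and the temporary directory are made up for the example, and the close() call is an extra precaution of mine for inputs that produce many different output files.

```shell
tmp=$(mktemp -d)
# Hypothetical input line with 14 space-separated fields.
printf '%s\n' 'q 2 3 4 AY517392.1 6 7 8 9 10 11 12 100 200' |
awk -v dir="$tmp/bed" '
BEGIN {
    FS = "[ \t|]"
    system("mkdir -p \"" dir "\"")    # create the output directory up front
}
{
    out = dir "/" $5 ".bed"           # build the file name once
    print $5 "\t" $13 "\t" $14 >> out
    close(out)                        # avoid running out of open files
}'
result=$(cat "$tmp/bed/AY517392.1.bed")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the redirection uses >>, reopening the file after close() appends to it rather than truncating it.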

awk: non-terminated string

I'm trying to run the command below, and it's giving me the error shown. Thoughts on how to fix it? I would rather have this be a one-line command than a script.
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } ' |
awk -F\" ' { print "url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/'{ print $1 }'\?schema\=1\.3\.0\&form\=json\&pretty\=true\&token\=582EVTY78-03iBkTAf0JAhwOBx\&account\=room_event\"" } '
awk: non-terminated string url = "ht... at source line 1
context is
>>> <<<
awk: giving up
source line number 2
The pipeline below outputs a single column of IDs:
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } '
156512145
898545774
454658748
898432413
I'm looking to get the IDs above into a string like so:
" url = "string...'ID'string"

Take a look at what you have in the last awk:
awk -F\"
' #single start here
{ print " #double starts for print, no ends
url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/
' #single ends here???
{ print $1 }'..... #single again??? ...
(rest codes)
Do you really want to print the literal text { print $1 }? I don't think so. Why are you nesting print?

Most of the elements of your pipe can be expressed right inside awk.
I can't tell exactly what you want to do with the last awk script, but here are some points:
Your "grep" is really just looking for a string of text, not a regexp.
You can save time and simplify things if you use awk's index() function instead of a RE.
Output formats are almost always best handled using printf().
Since you haven't provided your input data, I can't test this code, so you'll need to adapt it if it doesn't work. But here goes:
awk -F/ '
BEGIN {
string="id\": \"http://room.event.assist.com/event/room/event/";
fmt="url = http://example.com/event/room/event/%s?schema=whatever\n";
}
count == 1217 { nextfile; }
index($0, string) {
split($7, a, "\"");
printf(fmt, a[1]);  # split() numbers array elements from 1
count++;
}' failed_events.txt
If you like, you can use awk's -v option to pass in the string variable from a shell script calling this awk script. Or if this is a stand-alone awk script (using #! shebang), you could refer to command line options with ARGV.
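As a sketch of the -v approach, the search string and the match limit can be passed in from the shell. The input lines below are an assumption about what failed_events.txt looks like (the question does not show it), and the names needle and max are made up for the example; the query string from the question is omitted for brevity.

```shell
# Hypothetical input lines; the real failed_events.txt is not shown in the question.
out=$(printf '%s\n' \
    '"id": "http://room.event.assist.com/event/room/event/156512145",' \
    '"name": "some other line",' \
    '"id": "http://room.event.assist.com/event/room/event/898545774",' |
awk -F/ -v max=2 -v needle='id": "http://room.event.assist.com/event/room/event/' '
count == max { exit }                 # stop after max matches, like head -n
index($0, needle) {
    split($7, a, "\"")                # a[1] is the ID before the closing quote
    printf "url = \"http://room.event.assist.com/event/room/event/%s\"\n", a[1]
    count++
}')
printf '%s\n' "$out"
```

Keeping the whole awk program inside one pair of single quotes and writing inner double quotes as \" avoids the nested-quote problem from the original command.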
