awk help for getting fields

awk help for getting fields - awk

I have field like
a_b_c_d
I want as the output a b c_d, this query wont work for this
awk -F"_" <file> | '{print $1,$2,$3}'
Since it will only print a b c

Try
awk -F"_" -f <file> '{ print $1" "$2" "$3"_"$4 }'
In other words,
$echo a_b_c_d | awk -F"_" '{ print $1" "$2" "$3"_"$4 }'
a b c_d
The code in the brackets means
print the first match
print space
print the second match
print space
...

Related

awk multiple row and printing results

I would like to print some specific parts of a results with awk, after multiple pattern selection.
What I have is (filetest):
A : 1
B : 2
I expect to have:
1 - B : 2
So, only the result of the first row, then the whole second row.
The dash was added by me.
I have this:
awk -F': ' '$1 ~ /A|B/ { printf "%s", $2 "-" }' filetest
Result:
1 -2 -
And I cannot get the full second row, without failing in showing just the result of the first one
awk -F': ' '$1 ~ /A|B/ { printf "%s", $2 "$1" }' filetest
Result:
1 - A 2 - B
Is there any way to print in the same line, exactly the column/row that I need with awk?
In my case R1C2 - R2C1: R2C2?
Thanks!

This will do what you are expecting:
awk -F: '/^A/{printf "%s -", $2}/^B/{print}' filetest

$ awk -F: 'NR%2 {printf "%s - ", $2; next}1' filetest
1 - B : 2

You can try this
awk -F: 'NR%2==1{a=$2; } NR%2==0{print a " - " $0}' file
output
1 - B : 2

I'd probably go with #jas's answer as it's clear, simple, and not coupled to your data values but just to show an alternative approach:
$ awk '{printf "%s", (NR%2 ? $3 " - " : $0 ORS)}' file
1 - B : 2

tried on gnu awk
awk -F':' 'NR==1{s=$2;next}{FS="";s=s" - "$0;print s}' filetest

What is the meaning of $0 = $0 in Awk?

While going through a piece of code I saw the below command:
grep "r" temp | awk '{FS=","; $0=$0} { print $1,$3}'
temp file contain the pattern like:
1. r,1,5
2. r,4,5
3. ...
I could not understand what does the statement $0=$0 mean in awk command.
Can anyone explain what does it mean?

When you do $1=$1 (or any other assignment to a field) it causes record recompilation where $0 is rebuilt with every FS replaced with OFS but it does not change NF (unless there was no $1 previously and then NF would change from 0 to 1) or reevaluate the record in any other way.
When you do $0=$0 it causes field splitting where NF, $1, $2, etc. are repopulated based on the current value of FS but it does not change the FSs to OFSs or modify $0 in any other way.
Look:
$ echo 'a-b-c' |
awk -F'-+' -v OFS='-' '
function p() { printf "%d) %d: $0=%s, $2=%s\n", ++c,NF,$0,$2 }
{ p(); $2=""; p(); $1=$1; p(); $0=$0; p(); $1=$1; p() }
'
1) 3: $0=a-b-c, $2=b
2) 3: $0=a--c, $2=
3) 3: $0=a--c, $2=
4) 2: $0=a--c, $2=c
5) 2: $0=a-c, $2=c
Note in the above that even though setting $2 to null resulted in 2 consecutive -s and the FS of -+ means that 2 -s are a single separator, they are not treated as such until $0=$0 causes the record to be re-split into fields as shown in output step 4.
The code you have:
awk '{FS=","; $0=$0}'
is using $0=$0 as a cludge to work around the fact that it's not setting FS until AFTER the first record has been read and split into fields:
$ printf 'a,b\nc,d\n' | awk '{print NF, $1}'
1 a,b
1 c,d
$ printf 'a,b\nc,d\n' | awk '{FS=","; print NF, $1}'
1 a,b
2 c
$ printf 'a,b\nc,d\n' | awk '{FS=","; $0=$0; print NF, $1}'
2 a
2 c
The correct solution, of course, is instead to simply set FS BEFORE The first record is read:
$ printf 'a,b\nc,d\n' | awk -F, '{print NF, $1}'
2 a
2 c
To be clear - assigning any value to $0 causes field splitting, it does not cause record recompilation while assigning any value to any field ($1, etc.) causes record recompilation but not field splitting:
$ echo 'a-b-c' | awk -F'-+' -v OFS='#' '{$2=$2}1'
a#b#c
$ echo 'a-b-c' | awk -F'-+' -v OFS='#' '{$0=$0}1'
a-b-c

$0 = $0 is used most often to rebuild the field separation evaluation of a modified entry. Ex: adding a field will change $NF after $0 = $0 where it stay as original (at entry of the line).
in this case, it change every line the field separator by , and (see #EdMorton comment below for strike) reparse the line with current FS info where a awk -F ',' { print $1 "," $3 }' is a lot better coding for the same idea, taking the field separator at begining for all lines (in this case, could be different if separator is modified during process depernding by example of previous line content)
ex:
echo "foo;bar" | awk '{print NF}{FS=";"; print NF}{$0=$0;print NF}'
1
1
2
based on #EdMorton comment and related post (What is the meaning of $0 = $0 in Awk)
echo "a-b-c" |\
awk ' BEGIN{ FS="-+"; OFS="-"}
function p(Ref) { printf "%12s) NF=%d $0=%s, $2=%s\n", Ref,NF,$0,$2 }
{
p("Org")
$2="-"; p( "S2=-")
$1=$1 ; p( "$1=$1")
$2=$2 ; p( "$2=$2")
$0=$0 ; p( "$0=$0")
$2=$2 ; p( "$2=$2")
$3=$3 ; p( "$3=$3")
$1=$1 ; p( "$1=$1")
} '
Org) NF=3 $0=a-b-c, $2=b
S2=-) NF=3 $0=a---c, $2=-
$1=$1) NF=3 $0=a---c, $2=-
$2=$2) NF=3 $0=a---c, $2=-
$0=$0) NF=2 $0=a---c, $2=c
$2=$2) NF=2 $0=a-c, $2=c
$3=$3) NF=3 $0=a-c-, $2=c
$1=$1) NF=3 $0=a-c-, $2=c

$0=$0 is for re-evaluate the fields
For example
akshay#db-3325:~$ cat <<EOF | awk '/:/{FS=":"}/\|/{FS="|"}{print $2}'
1:2
2|3
EOF
# Same with $0=$0, it will force awk to have the $0 reevaluated
akshay#db-3325:~$ cat <<EOF | awk '/:/{FS=":"}/\|/{FS="|"}{$0=$0;print $2}'
1:2
2|3
EOF
2
3
# NF - gives you the total number of fields in a record
akshay#db-3325:~$ cat <<EOF | awk '/:/{FS=":"}/\|/{FS="|"}{print NF}'
1:2
2|3
EOF
1
1
# When we Force to re-evaluate the fields, we get correct 2 fields
akshay#db-3325:~$ cat <<EOF | awk '/:/{FS=":"}/\|/{FS="|"}{$0=$0; print NF}'
1:2
2|3
EOF
2
2

>>> echo 'a-b-c' | awk -F'-+' -v OFS='#' '{$2=$2}1'
>>> a#b#c
This can be slightly simplified to
mawk 'BEGIN { FS="[-]+"; OFS = "#"; } ($2=$2)'
Rationale being the boolean test that comes afterwards will evaluate to true upon the assignment, so that itself is sufficient to re-gen the fields in OFS and print it.

Print default value if index is not in awk array

$ cat file1 #It contains ID:Name
5:John
4:Michel
$ cat file2 #It contains ID
5
4
3
I want to Replace the IDs in file2 with Names from file1, output required
John
Michel
NO MATCH FOUND
I need to expand the below code to reult NO MATCH FOUND text.
awk -F":" 'NR==FNR {a[$1]=$2;next} {print a[$1]}' file1 file2
My current result:
John
Michel
<< empty line
Thanks,

You can use a ternary operator for this: print ($1 in a)?a[$1]:"NO MATCH FOUND". That is, if $1 is in the array, print it; otherwise, print the text "NO MATCH FOUND".
All together:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} {print ($1 in a)?a[$1]:"NO MATCH FOUND"}' f1 f2
John
Michel
NO MATCH FOUND

You can test whether the index occurs in the array:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} $1 in a {print a[$1]; next} {print "NOT FOUND"}' file1 file2
John
Michel
NOT FOUND

if file2 has only digit (no space at the end)
awk -F ':' '$1 in A {print A[$1];next}{if($2~/^$/) print "NOT FOUND";else A[$1]=$2}' file1
if not
awk -F '[:[:blank:]]' '$1 in A {print A[$1];next}{if($2~/^$/) print "NOT FOUND";else A[$1]=$2}' file1 file2

How to split fields on a double pipeline

I want to use awk to split fields on a double pipeline ||. Here is my code:
Here is the code that I'm using.
BEGIN {
FS="/|/|"
}
{
print $2
print $1
}

You need to use backslashes, not forward slashes to escape the pipe characters. They also need to be double-escaped:
$ awk 'BEGIN{FS="\\|\\|"}{print $2; print $1}' <<< "a||b"
b
a
The reason that they need to be double-escaped is that they are effectively parsed twice. The first backslashes are lost in the conversion from a string to a regex pattern and the second ones are needed so that the | is not interpreted as a regex OR.

Some other variations:
awk -F"\\\|\\\|" '{print $2; print $1}' <<< "a||b"
b
a
awk -F'\\|\\|' '{print $2; print $1}' <<< "a||b"
b
a
awk -F"[|][|]" '{print $2; print $1}' <<< "a||b"
b
a
awk -F'[|][|]' '{print $2; print $1}' <<< "a||b"
b
a

Tab separated values in awk

How do I select the first column from the TAB separated string?
# echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F'\t' '{print $1}'
The above will return the entire line and not just "LOAD_SETTLED" as expected.
Update:
I need to change the third column in the tab separated values.
The following does not work.
echo $line | awk 'BEGIN { -v var="$mycol_new" FS = "[ \t]+" } ; { print $1 $2 var $4 $5 $6 $7 $8 $9 }' >> /pdump/temp.txt
This however works as expected if the separator is comma instead of tab.
echo $line | awk -v var="$mycol_new" -F'\t' '{print $1 "," $2 "," var "," $4 "," $5 "," $6 "," $7 "," $8 "," $9 "}' >> /pdump/temp.txt

You need to set the OFS variable (output field separator) to be a tab:
echo "$line" |
awk -v var="$mycol_new" -F'\t' 'BEGIN {OFS = FS} {$3 = var; print}'
(make sure you quote the $line variable in the echo statement)

Make sure they're really tabs! In bash, you can insert a tab using C-v TAB
$ echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F$'\t' '{print $1}'
LOAD_SETTLED

Use:
awk -v FS='\t' -v OFS='\t' ...
Example from one of my scripts.
I use the FS and OFS variables to manipulate BIND zone files, which are tab delimited:
awk -v FS='\t' -v OFS='\t' \
-v record_type=$record_type \
-v hostname=$hostname \
-v ip_address=$ip_address '
$1==hostname && $3==record_type {$4=ip_address}
{print}
' $zone_file > $temp
This is a clean and easy to read way to do this.

You can set the Field Separator:
... | awk 'BEGIN {FS="\t"}; {print $1}'
Excellent read:
https://docs.freebsd.org/info/gawk/gawk.info.Field_Separators.html

echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -v var="test" 'BEGIN { FS = "[ \t]+" } ; { print $1 "\t" var "\t" $3 }'

If your fields are separated by tabs - this works for me in Linux.
awk -F'\t' '{print $1}' < tab_delimited_file.txt
I use this to process data generated by mysql, which generates tab-separated output in batch mode.
From awk man page:
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS prede‐
fined variable).

1st column only
— awk NF=1 FS='\t'
LOAD_SETTLED
First 3 columns
— awk NF=3 FS='\t' OFS='\t'
LOAD_SETTLED LOAD_INIT 2011-01-13
Except first 2 columns
— {g,n}awk NF=NF OFS= FS='^([^\t]+\t){2}'
— {m}awk NF=NF OFS= FS='^[^\t]+\t[^\t]+\t'
2011-01-13 03:50:01
Last column only
— awk '($!NF=$NF)^_' FS='\t', or
— awk NF=NF OFS= FS='^.*\t'
03:50:01

Should this not work?
echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk '{print $1}'

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

awk help for getting fields - awk

I have field like a_b_c_d I want as the output a b c_d, this query wont work for this awk -F"_" <file> | '{print $1,$2,$3}' Since it will only print a b c

Try awk -F"_" -f <file> '{ print $1" "$2" "$3"_"$4 }' In other words, $echo a_b_c_d | awk -F"_" '{ print $1" "$2" "$3"_"$4 }' a b c_d The code in the brackets means print the first match print space print the second match print space ...

Related

awk multiple row and printing results

What is the meaning of $0 = $0 in Awk?

Print default value if index is not in awk array

How to split fields on a double pipeline

Tab separated values in awk

Categories

Resources