Simple Pattern match with a field and a variable does not seem to work in GAWK/AWK


I am trying to extract all lines where a field matches a pattern which is defined as a variable.
I tried the following
head input.dat |
awk -F '|' -v CODE="39905|19043" '{print $13; if($13~CODE){print "Matched"} else {print "Nomatch"} }'
I am printing the value of the field before attempting the pattern match. (This way I don't have to show the entire line, which contains many fields.)
This is the output I got:
PLAN_ID
Nomatch
39905
Nomatch
39905
Nomatch
39883
Nomatch
19043
Nomatch
2215
Nomatch
19043
Nomatch
9149
Nomatch
42718
Nomatch
24
Nomatch
I expected to see at least 3 instances of Matched in the output. What am I doing wrong?
Edit by @Fravadona:
xxd input.dat | head -n 6
00000000: fffe 4d00 4f00 4e00 5400 4800 5f00 4900 ..M.O.N.T.H._.I.
00000010: 4400 7c00 5300 5600 4300 5f00 4400 5400 D.|.S.V.C._.D.T.
00000020: 7c00 5000 4100 5400 4900 4500 4e00 5400 |.P.A.T.I.E.N.T.
00000030: 5f00 4900 4400 7c00 5000 4100 5400 5f00 _.I.D.|.P.A.T._.
00000040: 5a00 4900 5000 3300 7c00 4300 4c00 4100 Z.I.P.3.|.C.L.A.
00000050: 4900 4d00 5f00 4900 4400 7c00 5300 5600 I.M._.I.D.|.S.V.
It turns out that the input file uses the UTF-16LE encoding (as shown by the hexdump of its content): every character is followed by a NUL byte, so the field never matches the plain ASCII pattern. The solution therefore seems to be to convert the input file from UTF-16LE to UTF-8 before running AWK. Thanks.

I found out (thanks to all who suggested looking at the hexdump of the input file) that the file used UTF-16LE encoding. Once I converted the input file using iconv, the AWK script worked as expected.
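The conversion can be done on the fly in the same pipeline. The following is only a minimal sketch: the sample data, the two-column layout, and the anchored pattern `^(39905|19043)$` are assumptions made for illustration (the real input.dat has the plan ID in field 13). Anchoring is optional, but without it the pattern would also match longer values that merely contain one of the codes.

```shell
# Build a small UTF-16LE sample (hypothetical data standing in for input.dat).
tmp=$(mktemp)
printf 'ID|PLAN_ID\n1|39905\n2|39883\n3|19043\n' | iconv -f UTF-8 -t UTF-16LE > "$tmp"

# Convert to UTF-8 first, then match field 2 against the anchored pattern.
result=$(iconv -f UTF-16LE -t UTF-8 "$tmp" |
  awk -F '|' -v CODE='^(39905|19043)$' '{ print $2, ($2 ~ CODE ? "Matched" : "Nomatch") }')
printf '%s\n' "$result"
rm -f "$tmp"
```

If the real file starts with a byte order mark, as the hexdump above suggests, the converted first line may begin with an invisible U+FEFF character; that only affects the header line here.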

Related

conditional awk statement to create a new field with additive value

Question
How would I use awk to create a new field that holds $2 plus a constant value?
I am planning to cycle through a list of values, but I wouldn't mind using a one-liner for each command.
PseudoCode
awk '$1 == Bob {$4 = $2 + 400}' file
Sample Data
Philip 13 2
Bob 152 8
Bob 4561 2
Bob 234 36
Bob 98 12
Rey 147 152
Rey 15 1547
Expected Output
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547

Just quote Bob. Also, you want to add to the third field, not the second:
$ awk '$1=="Bob" {$4=$3+400}1' file | column -t
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547

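Here, check whether $1 equals Bob and, if so, rebuild the record ($0) by appending FS followed by $2 + 400. Addition binds tighter than concatenation in awk, so what gets appended is FS and then the sum; FS serves as the separator between the third field and the new fourth field. The trailing 1 is an always-true pattern with no action, so awk takes the default action, which is to print. Note that this adds 400 to the second field, as in the question's pseudocode, whereas the expected output in the question adds it to the third field.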
awk '$1=="Bob"{$0=$0 FS $2 + 400}1' file
Philip 13 2
Bob 152 8 552
Bob 4561 2 4961
Bob 234 36 634
Bob 98 12 498
Rey 147 152
Rey 15 1547
Or, if you want to keep the name (Bob) in a variable:
awk -vname="Bob" '$1==name{$0=$0 FS $2 + 400}1' file
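As a small sketch of the same idea that reproduces the question's expected output, the variant below adds to the third field and parenthesizes the sum so the precedence is explicit. The variable name `inc` and the three-line sample are assumptions made for illustration.

```shell
# Append FS and ($3 + inc) to lines whose first field equals the given name.
result=$(printf 'Philip 13 2\nBob 152 8\nRey 147 152\n' |
  awk -v name="Bob" -v inc=400 '$1 == name { $0 = $0 FS ($3 + inc) } 1')
printf '%s\n' "$result"
```

Passing both values with -v keeps the awk program itself unchanged when cycling through a list of names or increments.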

1st solution: Could you please try the following too. It uses awk's built-in variable NF: $NF denotes the value of the last column of the current line, and assigning to $(NF+1) creates an additional column. The new column gets 400 + $NF when the first field is the string Bob, and an empty value otherwise.
awk '{$(NF+1)=$1=="Bob"?400+$NF:""} 1' OFS="\t" Input_file
2nd solution: If we don't want to create a new field and simply want to print the values according to the condition, try the following; I believe it should be faster.
awk 'BEGIN{OFS="\t"}{$1=$1;print $0,$1=="Bob"?400+$NF:""}' Input_file
Output will be as follows.
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Explanation of the above code:
awk ' ## Start the awk program.
{
$(NF+1)=$1=="Bob"?400+$NF:"" ## Create a new last field whose value depends on the condition check.
## If the 1st field is the string Bob, the new field is 400 plus the last field; otherwise it is empty.
}
1 ## awk works as pattern then action. 1 is an always-true pattern with no action, so the current line is printed by default.
' OFS="\t" Input_file ## Set OFS (the output field separator) to TAB and name the input file.
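One side effect of the ternary form is worth checking: it assigns $(NF+1) on every line, so non-matching lines also gain an (empty) field and therefore a trailing tab. The sketch below counts tab-separated fields to make that visible and compares it with a conditional form that only touches matching lines. The sample data and the variable names are assumptions made for illustration.

```shell
sample='Philip 13 2
Bob 152 8
Rey 147 152'

# Ternary form from the answer: every line gains a field, empty for non-matches.
always=$(printf '%s\n' "$sample" |
  awk -v OFS='\t' '{ $(NF+1) = ($1 == "Bob" ? 400 + $NF : "") } 1' |
  awk -F '\t' '{ print NF }' | tr '\n' ' ')

# Conditional form: only matching lines gain a field; $1 = $1 re-joins with OFS.
onlybob=$(printf '%s\n' "$sample" |
  awk -v OFS='\t' '{ $1 = $1 } $1 == "Bob" { sum = 400 + $NF; $(NF+1) = sum } 1' |
  awk -F '\t' '{ print NF }' | tr '\n' ' ')

printf 'ternary: %s\nconditional: %s\n' "$always" "$onlybob"
```

Whether the extra empty column matters depends on what consumes the output; for display through column -t it is invisible.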

Awk value greater than 40

Can someone please help me. I'm trying to get values greater than 40, but when it's at 100 it doesn't get it.
[root@localhost home]# df -Pk --block-size=1M
Filesystem 1048576-blocks Used Available Capacity Mounted on
/dev/mapper/rhel-root 22510 13135 9375 59% /
devtmpfs 905 0 905 0% /dev
tmpfs 920 1 920 1% /dev/shm
tmpfs 920 9 911 1% /run
tmpfs 920 0 920 0% /sys/fs/cgroup
/dev/sda1 1014 178 837 18% /boot
Linux_DB2 240879 96794 144086 41% /media/sf_Linux_DB2
tmpfs 184 1 184 1% /run/user/42
tmpfs 184 1 184 1% /run/user/0
/dev/sr0 56 56 0 100% /run/media/root/VBox_GAs_5.2.20
[root@localhost home]# df -Pk --block-size=1M | awk '$5 > 40'
Filesystem 1048576-blocks Used Available Capacity Mounted on
/dev/mapper/rhel-root 22510 13135 9375 59% /
Linux_DB2 240879 96794 144086 41% /media/sf_Linux_DB2
The /dev/sr0 56 56 0 100% /run/media/root/VBox_GAs_5.2.20 doesn't come out.

Could you please try following once.
df -hP | awk '$5+0>40'
Explanation: The 5th field of the df output is a string such as 100%, that is, digits followed by a percent sign. Compared with 40 as-is, it is treated as a string, and "100%" sorts before "40" because "1" comes before "4". Adding zero ($5+0) forces awk to convert the field to a number, using its leading digits, so the comparison becomes numeric and gives the right output. The -P option of df is also important here: it keeps each filesystem's entry on a single line, which makes the fields easy for awk to process.
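The difference between the two comparisons can be reproduced without df. This is a minimal sketch on made-up two-column data (mount point and capacity); the sample values are assumptions chosen to mirror the output in the question.

```shell
sample=$(printf '/ 59%%\n/boot 18%%\n/cdrom 100%%\n')

# String comparison: "100%" sorts before "40", so the full filesystem is missed.
as_string=$(printf '%s\n' "$sample" | awk '$2 > 40')

# Numeric comparison: $2 + 0 converts the leading digits to a number.
as_number=$(printf '%s\n' "$sample" | awk '$2 + 0 > 40')

printf 'string compare:\n%s\nnumeric compare:\n%s\n' "$as_string" "$as_number"
```

With real df output the header line also has to be considered: its Capacity column converts to 0, so the numeric form drops it unless it is printed explicitly, for example with NR == 1 || $5 + 0 > 40.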

How to append the count of numbers in each line of text using awk?

I have several very large text files and would like to prepend the count of numbers, followed by a space, to each line. Could anyone kindly suggest how to do it efficiently using Awk?
Input:
10 109 120 200 1148 1210 1500 5201
9 139 1239 1439 6551
199 5693 5695
Desired Output:
8 10 109 120 200 1148 1210 1500 5201
5 9 139 1239 1439 6551
3 199 5693 5695

You can use
awk '{print NF,$0}' input.txt
It prints the number of fields in the current line (NF), then the current line itself ($0). The comma in print inserts the output field separator (OFS), which is a single space by default.

This will work for you:
awk '{$0=NF FS $0}7' file
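Both answers give the same result on space-separated input. The sketch below runs them side by side on two made-up lines; in the second form the trailing pattern is written as 1, which behaves like the 7 above since any non-zero constant is a true pattern that triggers the default print.

```shell
input='10 109 120
9 139'

# print with a comma joins NF and $0 using OFS (a space by default).
with_print=$(printf '%s\n' "$input" | awk '{ print NF, $0 }')

# Rebuilding $0 joins NF and the line using FS, then the true pattern prints it.
with_assign=$(printf '%s\n' "$input" | awk '{ $0 = NF FS $0 } 1')

printf '%s\n---\n%s\n' "$with_print" "$with_assign"
```

The two forms differ only when FS and OFS differ, for example with -F ',' and the default OFS.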

Awk - how to print the number?

I have a test file:
0000 850 1300 Pump 4112 893 2400 Installing sleeve 5910 890 2202 Installing tool
Testing crankcase and Protecting oil seal Installing crankshaft
carburetor for leaks (starter side) 5910 890 2208 Installing tool, 8
0000 855 8106 Sealing plate 4112 893 2401 Press sleeve Installing hookless
Sealing exhaust port Installing oil seal snap rings in piston
0000 855 9200 Nipple (clutch side) 5910 890 2301 Screwdriver, T20
Testing carburetor for 4118 890 6400 Setting gauge Separating handle
leaks Setting air gap moldings
0000 890 1701 Testing tool kit between ignition 5910 890 2400 Screwdriver, T27x150
0000 893 2600 Clamping strap module and flywheel For all IS screw
I want to print only:
0000 850 1300
4112 893 2400
5910 890 2202
5910 890 2208
0000 855 8106
.
.
.
Thank you for your help.
EDIT:
The numbers in the file are in different places. The numbers are randomly placed in the input file. Each number is the format:
xxxx xxx xxxx
EDIT-1:
I tried two ways, but it does not work on mawk:
pic@pic:~/Pulpit$ mawk --traditional -f script.awk infile
mawk: not an option: --traditional
pic@pic:~/Pulpit$ mawk -f script.awk infile
pic@pic:~/Pulpit$

One way with grep (if your version supports the -P flag):
grep -oP "[0-9]{4} [0-9]{3} [0-9]{4}" file.txt
Output:
0000 850 1300
4112 893 2400
5910 890 2202
5910 890 2208
0000 855 8106
4112 893 2401
0000 855 9200
5910 890 2301
4118 890 6400
0000 890 1701
5910 890 2400
0000 893 2600
HTH
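If the -P flag is not available, the pattern above uses nothing Perl-specific, so an extended regular expression should work as well. This is a sketch under the assumption that your grep supports -o (GNU and BSD grep do, but it is not required by POSIX); the sample line is taken from the question.

```shell
line='0000 850 1300 Pump 4112 893 2400 Installing sleeve 5910 890 2202 Installing tool'

# -E enables {n} repetition counts; -o prints each match on its own line.
codes=$(printf '%s\n' "$line" | grep -oE '[0-9]{4} [0-9]{3} [0-9]{4}')
printf '%s\n' "$codes"
```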

This is shorter and looks for the specific pattern:
mawk '
BEGIN {
    d = "[0-9]"
    # Dynamic regex: four digits, a space, three digits, a space, four digits.
    re = d d d d " " d d d " " d d d d
}
{
    rest = $0
    # Print every match on the line, consuming the line as we go.
    while (match(rest, re)) {
        print substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
    }
}' inputfile

One way using awk:
Assuming infile has the content provided in your question:
Content of script.awk:
{
    ## Traverse all words of the line but the last two, looking for three
    ## consecutive number fields.
    i = 1
    while ( i <= NF - 2 ) {
        ## Set current word position in line.
        j = i
        ## Advance while the current word is all digits, saving it to print later.
        ## [0-9] is used instead of [[:digit:]] for portability to older awks.
        while ( j <= NF && $j ~ /^[0-9]+$/ ) {
            value[j] = $j
            ++j
        }
        ## If exactly three consecutive number fields were found, print them in
        ## order and move past them.
        if ( i + 3 == j ) {
            for ( k = i; k < j; ++k ) {
                printf "%s ", value[k]
            }
            printf "%s", ORS
            i += 3
        }
        else {
            ## Failed the search, go to next field and try again.
            ++i
        }
        ## Empty the array where the numbers are saved.
        # delete value <--- Commented for compatibility with older versions.
        for ( key in value ) {
            delete value[key]
        }
    }
}
Run it like:
awk -f script.awk infile
With following output:
0000 850 1300
4112 893 2400
5910 890 2202
5910 890 2208
0000 855 8106
4112 893 2401
0000 855 9200
5910 890 2301
4118 890 6400
0000 890 1701
5910 890 2400
0000 893 2600

Linux red-hat5.4 + awk file manipulation

How can I match a PARAM word (PARAM=name) in file.txt and print the lines between
NAMESx and NAMESy with awk, in the following way:
If PARAM matches in file.txt, awk should print only the lines between the closest NAMES strings that surround the PARAM name.
Remark 1: PARAM can be any name, such as Pitter, Bob, etc.
Remark 2: awk will get PARAM=(any name).
Remark 3: we do not know how many spaces there are between # and NAMES.
more file.txt
# NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
Tom 32
Jennifer 85
Svetlana 25
# NAMES4
Examples (based on the file.txt contents above):
If PARAM=Pitter, awk should print these lines to the out.txt file:
Pitter 23
Bob 75
If PARAM=Josef, awk should print these lines to the out.txt file:
Donald 54
Josef 85
Patrick 21
If PARAM=Jennifer, awk should print these lines to the out.txt file:
Tom 32
Jennifer 85
Svetlana 25

Using awk's RS (record separator) would be helpful in this case. See the test below:
testing with example
kent$ cat file
# NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
Tom 32
Jennifer 85
Svetlana 25
# NAMES4
kent$ awk -vPARAM="Pitter" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Pitter 23
Bob 75
kent$ awk -vPARAM="Josef" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Donald 54
Josef 85
Patrick 21
kent$ awk -vPARAM="Jennifer" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Tom 32
Jennifer 85
Svetlana 25
Note: there are some empty lines in the output, because the newlines around each NAMES header remain part of the record. However, it would be easy to remove them from the output.
Update:
If you have spaces between # and NAMES, you can try:
awk -vPARAM="Pitter" 'BEGIN{RS="# *NAMES."} {if($0~PARAM)print}' file
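A regular expression in RS is an extension (gawk and mawk support it, but POSIX awk only guarantees a single-character RS). As an alternative sketch that avoids it, the lines of each block can be buffered and printed only if the name was seen. The variable names `buf` and `hit`, the temporary file, and the shortened sample are assumptions made for illustration; the name is compared exactly against the first column rather than matched as a regex anywhere in the block.

```shell
file=$(mktemp)
cat > "$file" <<'EOF'
#   NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
Tom 32
# NAMES4
EOF

result=$(awk -v PARAM="Josef" '
  /^#[ ]*NAMES/ {                      # header line: print or discard the finished block
    if (hit) printf "%s", buf
    buf = ""; hit = 0
    next
  }
  { buf = buf $0 ORS }                 # collect the lines of the current block
  $1 == PARAM { hit = 1 }              # exact match on the name column
  END { if (hit) printf "%s", buf }    # the last block may lack a closing header
' "$file")
printf '%s\n' "$result"
rm -f "$file"
```

This form also produces no empty lines around the block, since header lines are skipped rather than used as separators.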
