Awk - how to print the number? - awk

I have a test file:
0000 850 1300 Pump 4112 893 2400 Installing sleeve 5910 890 2202 Installing tool
Testing crankcase and Protecting oil seal Installing crankshaft
carburetor for leaks (starter side) 5910 890 2208 Installing tool, 8
0000 855 8106 Sealing plate 4112 893 2401 Press sleeve Installing hookless
Sealing exhaust port Installing oil seal snap rings in piston
0000 855 9200 Nipple (clutch side) 5910 890 2301 Screwdriver, T20
Testing carburetor for 4118 890 6400 Setting gauge Separating handle
leaks Setting air gap moldings
0000 890 1701 Testing tool kit between ignition 5910 890 2400 Screwdriver, T27x150
0000 893 2600 Clamping strap module and flywheel For all IS screw
I want to print only:
0000 850 1300
4112 893 2400
5910 890 2202
5910 890 2208
0000 855 8106
.
.
.
Thank you for your help.
EDIT:
The numbers can appear anywhere in a line; their position in the input file varies. Each number has the format:
xxxx xxx xxxx
EDIT-1:
I tried two ways, but it does not work on mawk:
pic@pic:~/Pulpit$ mawk --traditional -f script.awk infile
mawk: not an option: --traditional
pic@pic:~/Pulpit$ mawk -f script.awk infile
pic#pic:~/Pulpit$

One way with grep (if your version supports the -P flag):
grep -oP "[0-9]{4} [0-9]{3} [0-9]{4}" file.txt
Output:
0000 850 1300
4112 893 2400
5910 890 2202
5910 890 2208
0000 855 8106
4112 893 2401
0000 855 9200
5910 890 2301
4118 890 6400
0000 890 1701
5910 890 2400
0000 893 2600
HTH
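If `-P` is not available, the same pattern is plain extended-regex syntax, so `-E` should work as well. A minimal sketch (the function name `extract_part_numbers` is made up for illustration; `-o` is a common extension supported by GNU and BSD grep rather than a POSIX option):

```shell
# Hypothetical helper: print every "xxxx xxx xxxx" number found in the given
# file (or on standard input), one per line.
extract_part_numbers() {
    grep -oE '[0-9]{4} [0-9]{3} [0-9]{4}' "$@"
}

# Example on two lines of the sample input:
printf '%s\n' \
    '0000 850 1300 Pump 4112 893 2400 Installing sleeve 5910 890 2202 Installing tool' \
    'carburetor for leaks (starter side) 5910 890 2208 Installing tool, 8' |
    extract_part_numbers
```

Like the original command, this would also pick up the tail of a longer digit run, so it assumes the numbers are always delimited by non-digits.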

This is shorter and looks for the specific pattern:
mawk '
BEGIN {
    d = "[0-9]"
    # xxxx xxx xxxx, spelled out because some mawk versions lack {n} repetition
    re = d d d d " " d d d " " d d d d
}
{
    rest = $0
    # Print each match, then keep searching in the text that follows it.
    while (match(rest, re)) {
        print substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
    }
}' inputfile

One way using awk:
Assuming infile has the content provided in your question:
Content of script.awk:
{
    ## Traverse all words of the line but the last two, looking for three
    ## consecutive number fields.
    i = 1
    while ( i <= NF - 2 ) {
        ## Set current word position in line.
        j = i
        ## Advance while the current word is all digits. [0-9] is used rather
        ## than [[:digit:]] because some older mawk versions do not support
        ## POSIX character classes.
        while ( j <= NF && $j ~ /^[0-9]+$/ ) {
            ++j
        }
        ## If exactly three consecutive number fields were found, print them
        ## in order and skip past them.
        if ( i + 3 == j ) {
            print $i, $(i + 1), $(i + 2)
            i += 3
        }
        else {
            ## Failed the search, go to next field and try again.
            ++i
        }
    }
}
Run it like:
awk -f script.awk infile
With following output:
0000 850 1300
4112 893 2400
5910 890 2202
5910 890 2208
0000 855 8106
4112 893 2401
0000 855 9200
5910 890 2301
4118 890 6400
0000 890 1701
5910 890 2400
0000 893 2600

Related

Simple Pattern match with a field and a variable does not seem to work in GAWK/AWK

I am trying to extract all lines where a field matches a pattern which is defined as a variable.
I tried the following
head input.dat |
awk -F '|' -v CODE="39905|19043" '{print $13; if($13~CODE){print "Matched"} else {print "Nomatch"} }'
I am printing the value of the field before attempting the pattern match. (This way I don't have to show the entire line, which contains many fields.)
This is the output I got.
PLAN_ID
Nomatch
39905
Nomatch
39905
Nomatch
39883
Nomatch
19043
Nomatch
2215
Nomatch
19043
Nomatch
9149
Nomatch
42718
Nomatch
24
Nomatch
I expected to see at least 3 instances of Matched in the output. What am I doing wrong?
Edit by @Fravadona:
xxd input.dat | head -n 6
00000000: fffe 4d00 4f00 4e00 5400 4800 5f00 4900 ..M.O.N.T.H._.I.
00000010: 4400 7c00 5300 5600 4300 5f00 4400 5400 D.|.S.V.C._.D.T.
00000020: 7c00 5000 4100 5400 4900 4500 4e00 5400 |.P.A.T.I.E.N.T.
00000030: 5f00 4900 4400 7c00 5000 4100 5400 5f00 _.I.D.|.P.A.T._.
00000040: 5a00 4900 5000 3300 7c00 4300 4c00 4100 Z.I.P.3.|.C.L.A.
00000050: 4900 4d00 5f00 4900 4400 7c00 5300 5600 I.M._.I.D.|.S.V.
It turns out that the input file uses the UTF-16LE encoding (as shown by the hexdump of its content). Thus, the solution seems to be to convert the input file from UTF-16LE to UTF-8 before running AWK. Thanks.
I found out (thanks to everyone who suggested looking at a hexdump of the input file) that the file was encoded as UTF-16LE. Once I converted it with iconv, the AWK script worked as expected.
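The conversion step could look roughly like this. It is only a sketch, assuming `iconv` is installed and that the file starts with a byte-order mark (as the `fffe` at the start of the hexdump suggests), so that `-f UTF-16` can detect the byte order and drop the mark. The function name `flag_codes` is invented here, and the regex is anchored (unlike the one in the question) so that a value such as 139905 would not count as a match.

```shell
# Hypothetical helper: decode a UTF-16 file (with BOM) to UTF-8, then print
# field 13 of each pipe-separated row, followed by Matched or Nomatch.
flag_codes() {
    iconv -f UTF-16 -t UTF-8 "$1" |
        awk -F '|' -v CODE='^(39905|19043)$' '
            { print $13; print ($13 ~ CODE ? "Matched" : "Nomatch") }'
}

# Usage: flag_codes input.dat | head
```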

awk getting good distribution of random integer values between 2 inputs

How can I get a good distribution of random integer values between two inputs using awk?
I'm trying the following:
$ awk -v min=200 -v max=500 ' BEGIN { srand();for(i=0;i<10;i++) print int(min+rand()*100*(max/min)) } '
407
406
360
334
264
365
303
417
249
225
$
Is there a better way in awk?
Sorry to inform you that your code is not even correct. Try with min=max=10: the formula becomes int(10+rand()*100), which prints values from 10 to 109 instead of only 10.
Something like this will work. You can verify the uniformity as well.
$ awk -v min=200 -v max=210 ' BEGIN{srand();
for(i=0;i<10000;i++) a[int(min+rand()*(max-min))]++;
for(k in a) print k,a[k]}' | sort
200 1045
201 966
202 1014
203 1016
204 985
205 1010
206 988
207 1027
208 986
209 963
Note also that min is inclusive but max is not.
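If max should be included as well, widening the range by one is a common adjustment. A sketch (the function name `rand_between` is hypothetical; `srand()` seeds from the clock, so two runs within the same second will typically repeat the same sequence):

```shell
# Hypothetical helper: print COUNT random integers in [MIN, MAX], both ends
# inclusive.  Usage: rand_between MIN MAX COUNT
rand_between() {
    awk -v min="$1" -v max="$2" -v n="$3" 'BEGIN {
        srand()
        for (i = 0; i < n; i++)
            # rand() is in [0, 1), so the product is in [0, max-min+1)
            # and int() yields an offset of 0 .. max-min.
            print min + int(rand() * (max - min + 1))
    }'
}

rand_between 200 210 5
```

With min equal to max, this version always prints that single value, which is the edge case the original formula fails.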

output matching column from multiple input in awk

Assume I have two input files and only want certain rows from each: the "A" rows from inputA.txt and the "B" rows from inputB.txt.
==> inputA.txt <==
A 10214027 6369158
A 10214028 6369263
A 10214029 6369321
A 10214030 6369713
A 10214031 6370146
A 10214032 6370553
A 10214033 6370917
A 10214034 6371322
A 10214035 6371735
A 10214036 6372136
So I only want the data with A's
==> inputB.txt <==
B 50015214 5116941
B 50015215 5116767
B 50015216 5116577
B 50015217 5116409
B 50015218 5116221
B 50015219 5116044
B 50015220 5115845
B 50015221 5115676
B 50015222 5115512
B 50015223 5115326
Same goes here, only want B's
I've built the following script, but it ends up doubled because of the multiple inputs.
#!/bin/awk -f
BEGIN{
printf "Column 1\tColumn 2\tColumn 3"
}
/^A/{
c=substr($2,1,4)
d=substr($2,5,3)
e=substr($3,1,4)
f=substr($3,5,3)
}
{
printf "%4.1f %4.1f %4.1f %4.1f\n",c,d,e,f > "outputA.txt"
}
/^B/{
c=substr($2,1,4)
d=substr($2,5,3)
e=substr($3,1,4)
f=substr($3,5,3)
}
{
printf "%4.1f %4.1f %4.1f %4.1f\n",c,d,e,f > "outputB.txt"
}
Let me know your thoughts on this.
Expected output
==> outputA.txt <==
Column 1 Column 2 Column 3 Column 4
1021 4027 6369 158
1021 4028 6369 263
1021 4029 6369 321
1021 4030 6369 713
1021 4031 6370 146
1021 4032 6370 553
1021 4033 6370 917
1021 4034 6371 322
1021 4035 6371 735
1021 4036 6372 136
==> outputB.txt <==
Column 1 Column 2 Column 3 Column 4
5001 5214 5116 941
5001 5215 5116 767
5001 5216 5116 577
5001 5217 5116 409
5001 5218 5116 221
5001 5219 5116 044
5001 5220 5115 845
5001 5221 5115 676
5001 5222 5115 512
5001 5223 5115 326
With GNU awk and FIELDWIDTHS:
awk 'BEGIN{FIELDWIDTHS="1 1 4 4 1 4 3"}
{out="output" $1 ".txt"}
FNR==1{print "Column 1 Column 2 Column 3 Column 4" >out}
{print $3,$4,$6,$7 >out}' inputA.txt inputB.txt
FIELDWIDTHS splits each row into seven fixed-width columns. out holds the name of the output file, built from the first column. On the first row of each input file, the header is printed to that file; then, for every row, four of the columns are printed to it.
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
Could you please try following.
awk '
FNR==1{
sub(/[a-z]+/,"",FILENAME)
file="output"FILENAME".txt"
print "Column 1 Column 2 Column 3 Column 4" > (file)
}
{
print substr($0,3,4),substr($0,7,4),substr($0,12,4),substr($0,16,3) > (file)
}
' inputA inputB
Explanation:
awk ' ##Starting awk program here.
FNR==1{ ##Checking condition if FNR==1, line number is 1 then do following.
sub(/[a-z]+/,"",FILENAME) ##Removing the first run of lowercase letters from the file name (inputA becomes A).
file="output"FILENAME".txt" ##Creating variable file whose value is string output FILENAME and .txt
print "Column 1 Column 2 Column 3 Column 4" > (file) ##Printing headers to output file.
}
{
print substr($0,3,4),substr($0,7,4),substr($0,12,4),substr($0,16,3) > (file) ##Printing substrings values as per OP need to output files.
}
' inputA inputB ##Mentioning multiple Input_file names here.
You don't need substr here. Empty out the first field, insert a space after every four digits, force awk to reparse fields and then print:
awk '$1=="A"{
$1=""
gsub(/[0-9]{4}/,"& ")
$1=$1
print
}' inputA.txt
Its output:
1021 4027 6369 158
1021 4028 6369 263
1021 4029 6369 321
1021 4030 6369 713
1021 4031 6370 146
1021 4032 6370 553
1021 4033 6370 917
1021 4034 6371 322
1021 4035 6371 735
1021 4036 6372 136
Obviously this works with only one input, but I believe you can adapt it to multiple files by referring to the other answers.
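One possible way to extend the same idea to several inputs is to pick the output file from the label in the first field. In the sketch below, the four digits are spelled out instead of using `{4}`, since some awk versions (older mawk, for instance) do not support intervals. The function name `split_by_label` is an assumption for illustration, and the output names follow the outputA.txt / outputB.txt pattern from the question.

```shell
# Hypothetical helper: route each row to output<label>.txt in the current
# directory, where <label> is the first field (A, B, ...).
split_by_label() {
    awk '{
        out = "output" $1 ".txt"
        if (!(out in seen)) {               # first row for this label: write header
            seen[out] = 1
            print "Column 1 Column 2 Column 3 Column 4" > (out)
        }
        $1 = ""                             # drop the label
        gsub(/[0-9][0-9][0-9][0-9]/, "& ")  # add a space after every four digits
        $1 = $1                             # rebuild the line with single spaces
        print > (out)
    }' "$@"
}

# Usage: split_by_label inputA.txt inputB.txt
```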
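Just keep it simple. gsub() inserts a space after every run of a space plus four characters, and ^_ raises its return value to the power of the unset variable _ (that is, 0), so the pattern always evaluates to 1 and every line is printed: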
${...input_data...} |
{m,g,n}awk 'gsub(" ....", "& ")^_'
A 1021 4027 6369 158
A 1021 4028 6369 263
A 1021 4029 6369 321
A 1021 4030 6369 713
A 1021 4031 6370 146
A 1021 4032 6370 553
A 1021 4033 6370 917
A 1021 4034 6371 322
A 1021 4035 6371 735
A 1021 4036 6372 136
B 5001 5214 5116 941
B 5001 5215 5116 767
B 5001 5216 5116 577
B 5001 5217 5116 409
B 5001 5218 5116 221
B 5001 5219 5116 044
B 5001 5220 5115 845
B 5001 5221 5115 676
B 5001 5222 5115 512
B 5001 5223 5115 326

How to append the count of numbers in each line of text using awk?

I have several very large text files and would like to prepend to each line the count of numbers on it, followed by a space. Could anyone kindly suggest how to do this efficiently using Awk?
Input:
10 109 120 200 1148 1210 1500 5201
9 139 1239 1439 6551
199 5693 5695
Desired Output:
8 10 109 120 200 1148 1210 1500 5201
5 9 139 1239 1439 6551
3 199 5693 5695
You can use
awk '{print NF,$0}' input.txt
It prints NF, the number of fields on the current line, followed by the output field separator OFS (a space by default) and then the current line itself, $0.
This will also work for you:
awk '{$0=NF FS $0}7' file
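Both one-liners rely on NF, the number of whitespace-separated fields awk finds on the current line. In the second one, the trailing 7 is just a true (non-zero) pattern with no action, so awk falls back to its default action of printing the line. A quick check against the sample input (the function name `prepend_count` is hypothetical):

```shell
# Hypothetical helper: prefix every line with its field count and a space.
prepend_count() {
    awk '{ print NF, $0 }' "$@"
}

printf '%s\n' \
    '10 109 120 200 1148 1210 1500 5201' \
    '9 139 1239 1439 6551' \
    '199 5693 5695' |
    prepend_count
# 8 10 109 120 200 1148 1210 1500 5201
# 5 9 139 1239 1439 6551
# 3 199 5693 5695
```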

How to number the lines according to a field with awk?

I wonder whether there is a way using awk to number the lines according to a field. For example,
Input
2334 332
2334 546
2334 675
7890 222
7890 134
234 45
.
.
.
Based on the 1st field, I would have the following output
Output
1 2334 332
1 2334 546
1 2334 675
2 7890 222
2 7890 134
3 234 45
.
.
.
I would be grateful for your help.
Cheers,
T
Here's how:
awk '!a[$1]++{c++}{print c, $0}' file
1 2334 332
1 2334 546
1 2334 675
2 7890 222
2 7890 134
3 234 45
Alternatively, if equal keys are always on adjacent lines:
awk 'last != $1 { line = line + 1 } { last = $1; print line, $0 }' file
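The approaches above agree as long as equal keys sit on adjacent lines. If a key can reappear later in the file, the array-based one-liner keeps printing the current counter, while the one comparing against the previous key starts a new number. If each key should instead keep the number it was first given, one possible variation is to remember the number per key (a sketch; `number_by_key` is a made-up name):

```shell
# Hypothetical helper: number lines by their first field, reusing the number
# a key received the first time it was seen.
number_by_key() {
    awk '!($1 in id) { id[$1] = ++n }   # new key: assign the next number
         { print id[$1], $0 }' "$@"
}

printf '%s\n' '2334 332' '7890 222' '2334 675' '234 45' | number_by_key
# 1 2334 332
# 2 7890 222
# 1 2334 675
# 3 234 45
```

This keeps one array entry per distinct key, so memory use grows with the number of distinct keys rather than with the number of lines.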