AWK - how to selectively modify txt file - awk

I would like to print only the 2nd field of each record where that field matches a regex:
awk '$2 ~ /regex1/'
BUT only for the records that lie between regex2 and regex3:
awk '/regex2/,/regex3/'
Records that are not between regex2 and regex3 should be printed normally (all fields).
Any ideas how to put this together?
quick sample of input and output:
input
parrot milana 3 ukraine
dog husky 1 poland
cat husky 5 france
elephant malamut 5 belgium
bird husky 5 turkey
output:
parrot milana 3 ukraine
dog husky 1 poland
husky
elephant malamut 5 belgium
bird husky 5 turkey
Show the entire input, except:
Between /dog/ and /elephant/ (these two records themselves are shown unchanged), show only the 2nd field, and only where it matches the regex /husky/.
I hope this is useful...

This:
awk '/regex2/,/regex3/'
is shorthand for
awk '/regex2/{f=1} f; /regex3/{f=0}'
The shorthand version IMHO should NEVER be used, as its brevity isn't worth the difficulty it introduces when you try to build on it with other criteria, e.g. not printing the start line and/or not printing the end line and/or introducing other REs to match within the range, as you're doing now.
Given that, you're starting with this script:
awk '/dog/{f=1} f; /elephant/{f=0}'
and you want to only print the lines where you find "husky" so it's the simple, obvious tweak:
awk '/dog/{f=1} f && /husky/; /elephant/{f=0}'
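As a rough sketch of why the flag form is easier to build on: the variants below only move the test of f relative to the two assignments. The shell variable names (input, both, no_start, no_end, neither) are made up for this illustration, and the data is the sample from the question.

```shell
# Sample input from the question, space-separated.
input='parrot milana 3 ukraine
dog husky 1 poland
cat husky 5 france
elephant malamut 5 belgium
bird husky 5 turkey'

# Print the range including both boundary lines.
both=$(printf '%s\n' "$input" | awk '/dog/{f=1} f; /elephant/{f=0}')

# Skip the start line: test f before setting it.
no_start=$(printf '%s\n' "$input" | awk 'f; /dog/{f=1} /elephant/{f=0}')

# Skip the end line: clear f before testing it.
no_end=$(printf '%s\n' "$input" | awk '/dog/{f=1} /elephant/{f=0} f')

# Skip both boundary lines.
neither=$(printf '%s\n' "$input" | awk '/elephant/{f=0} f; /dog/{f=1}')

printf '%s\n' "--- both" "$both" "--- no_start" "$no_start" \
              "--- no_end" "$no_end" "--- neither" "$neither"
```

None of these variants can be expressed by editing /dog/,/elephant/ alone, which is the point being made about the shorthand.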
EDIT: in response to changed requirements, and using a tab-separated file:
$ cat file
parrot milana 3 ukraine
dog husky 1 poland
cat husky 5 france
elephant malamut 5 belgium
bird husky 5 turkey
$ awk '
BEGIN{ FS=OFS="\t" }
/elephant/ {f=0}
{
if (f) {
if ($2 == "husky") {
print "", $2
}
}
else {
print
}
}
/dog/ {f=1}
' file
parrot milana 3 ukraine
dog husky 1 poland
husky
elephant malamut 5 belgium
bird husky 5 turkey
You can write it more briefly:
$ awk '
BEGIN{ FS=OFS="\t" }
/elephant/ {f=0}
f && /husky/ { print "", $2 }
!f
/dog/ {f=1}
' file
parrot milana 3 ukraine
dog husky 1 poland
husky
elephant malamut 5 belgium
bird husky 5 turkey
but I think the if-else syntax is clearest and easiest to modify for newcomers to awk. If you want different output formatting, look up "printf" in the manual.
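As one possible illustration of the printf suggestion, the sketch below right-aligns the matched field in a 10-character column instead of prefixing it with an empty field. The width of 10 and the use of space-separated input are assumptions chosen for the example, not something the question specifies.

```shell
# Sample input from the question, space-separated (an assumption; the answer
# above uses a tab-separated file).
input='parrot milana 3 ukraine
dog husky 1 poland
cat husky 5 france
elephant malamut 5 belgium
bird husky 5 turkey'

result=$(printf '%s\n' "$input" | awk '
/elephant/ { f = 0 }                             # leave the range before printing
f && $2 == "husky" { printf "%10s\n", $2 }       # inside: field 2 only, right-aligned
!f                                               # outside (and boundaries): whole record
/dog/ { f = 1 }                                  # enter the range after printing
')

printf '%s\n' "$result"
```

Changing the format string (for example to "%-10s|\n") is then the only edit needed to alter the layout.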

infile:
$ cat input
parrot milana 3 ukraine
dog husky 1 poland
cat husky 5 france
elephant malamut 5 belgium
bird husky 5 turkey
command:
$ awk '/dog/{m=1} $2 ~ /husky/ && m{print $2} !m{print} /elephant/{m=0}' input
parrot milana 3 ukraine
husky
husky
bird husky 5 turkey

There are some ambiguities with your question, but this should do it:
awk '/regex2/ {inside=1}
/regex3/ {inside=0}
$2 ~ /regex1/ && inside {print $2}
!inside {print}' input_file

Related

Using AWK for best match replace

I have two files:
operators.txt # includes Country_code and Country_name
49 Germany
43 Austria
32 Belgium
33 France
traffic.txt # MSISDN and VLR_address (includes Country_code prefix)
123456789 491234567
123456788 432569874
123456787 333256987
123456789 431238523
I need to replace the VLR_address in traffic.txt file with Country_name from the first file.
The following awk command does that:
awk 'NR==FNR{a[$1]=$2;next} {print $1,a[$2]}' <(cat operators.txt) <(cat traffic.txt|awk '{print $1,substr($2,1,2)}')
123456789 Germany
123456788 Austria
123456787 France
123456789 Austria
But how can this be done when the operators file is:
49 Germany
43 Austria
32 Belgium
33 France
355 Albania
1246 Barbados
1 USA
where the country code is not a fixed length and, in some cases, the best (longest) match must apply, e.g.
124612345 shall be Barbados
122018523 shall be USA
The sample input/output you provided isn't adequate to test with as it doesn't include the cases you later described as problematic but if we modify it to include a representation of those later statements:
$ head operators.txt traffic.txt
==> operators.txt <==
49 Germany
43 Austria
32 Belgium
33 France
1 USA
355 Albania
1246 Barbados
==> traffic.txt <==
123456789 491234567
123456788 432569874
123456787 333256987
123456789 431238523
foo 124612345
bar 122018523
then this may be what you want:
$ cat tst.sh
#!/usr/bin/env bash
awk '
NR==FNR {
keys[++numKeys] = $1
map[$1] = $2
next
}
{
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
if ( index($2,key) == 1 ) {
$2 = map[key]
break
}
}
print
}
' <(sort -k1,1rn operators.txt) traffic.txt
$ ./tst.sh
123456789 Germany
123456788 Austria
123456787 France
123456789 Austria
foo Barbados
bar USA
You obviously need to try a substring of the correct length.
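awk 'NR==FNR{a[$2]=$1;next}
{ best = ""; bestlen = 0
# for-in order is unspecified, so keep the longest matching code
for (name in a) {
p = a[name]; l = length(p)
if (l > bestlen && index($2, p) == 1) { best = name; bestlen = l } }
if (bestlen) $2 = best }1' operators.txt traffic.txt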
Notice how Awk itself is perfectly capable of reading files without the help of cat. You also nearly never need to pipe one Awk script into another; just refactor to put all the logic in one script.
I swapped the value of the key and the value in the NR==FNR block but that is more of a stylistic change.
And, as always, the final 1 is a shorthand idiom for printing all lines.
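Because for (key in array) visits keys in an unspecified order, another way to read "try a substring of the correct length" is to test progressively shorter leading substrings of the address against the map, longest first. The following is only a sketch under that reading; the temporary directory and the names map, maxlen and code are assumptions made for the example.

```shell
# Hypothetical scratch directory holding the sample files from the question.
dir=$(mktemp -d)
printf '%s\n' '49 Germany' '43 Austria' '32 Belgium' '33 France' \
              '355 Albania' '1246 Barbados' '1 USA' > "$dir/operators.txt"
printf '%s\n' '123456789 491234567' '123456788 432569874' '123456787 333256987' \
              '123456789 431238523' 'foo 124612345' 'bar 122018523' > "$dir/traffic.txt"

result=$(awk '
NR == FNR {                      # first file: code -> country name
    map[$1] = $2
    if (length($1) > maxlen) maxlen = length($1)
    next
}
{
    # Try the longest possible prefix first, then shorter ones.
    for (len = maxlen; len >= 1; len--) {
        code = substr($2, 1, len)
        if (code in map) { $2 = map[code]; break }
    }
    print
}' "$dir/operators.txt" "$dir/traffic.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

Each traffic line costs at most maxlen lookups, regardless of how many operators there are.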
Perhaps as an optimization, pull the prefixes into a regular expression so that you can simply match on them all in one go, instead of looping over them.
awk 'NR==FNR{a[$1]=$2; regex = regex "|" $1; next}
FNR == 1 { regex = "^(" substr(regex, 2) ")" } # trim first "|"
match($2, regex) { $2 = a[substr($2, 1, RLENGTH)] } 1' operators.txt traffic.txt
The use of match() to pull out the length of the matched substring is arguably a complication; I wish Awk would provide this information for a normal regex match without the use of a separate dedicated function.
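For readers unfamiliar with match(), here is a small sketch of what it reports: it sets RSTART to the 1-based position of the match and RLENGTH to its length, which is what the script above uses to cut the prefix out of $2. The sample strings and variable names are chosen for illustration only.

```shell
# Where does the run of digits start, and how long is it?
digits=$(printf '%s\n' 'foo 124612345' |
    awk '{ if (match($0, /[0-9]+/)) print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }')

# Anchored alternation of two country codes, as in the combined-regex approach.
prefix=$(printf '%s\n' '124612345' |
    awk '{ if (match($0, /^(1246|1)/)) print RSTART, RLENGTH, substr($0, 1, RLENGTH) }')

printf '%s\n' "$digits" "$prefix"
```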

Adding a character vertically on a column

Input:
dog
fish
elephant
...
Output:
dog     |
fish    |
elephant|
...     |
I want to add a "|" on the 9th character of every row
You should first space-pad the lines to the maximum line width (8 characters in your example), so that the "|" lands in column 9.
Then you can append "|" after the 8th character with
sed 's/./&|/8' <padded.txt >output.txt
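A sketch of the whole pad-then-sed pipeline, assuming the four sample lines from the question and a fixed width of 8. Note that the numeric flag on sed's s command acts on the Nth match only, so with lines padded to exactly 8 characters the substitution that puts "|" in column 9 is s/./&|/8; there is no 9th character to match.

```shell
# Sample lines from the question ("..." is literal text here).
input=$(printf '%s\n' dog fish elephant ...)

# Step 1: pad every line with spaces to 8 characters.
padded=$(printf '%s\n' "$input" | awk '{ printf "%-8s\n", $0 }')

# Step 2: append "|" after the 8th character of each line.
result=$(printf '%s\n' "$padded" | sed 's/./&|/8')

printf '%s\n' "$result"
```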
Hard-coding the output field width:
$ awk '{printf "%-*s|\n",8,$0}' file
dog     |
fish    |
elephant|
...     |
or specifying the output field width as an argument:
$ awk -v wid=8 '{printf "%-*s|\n",wid,$0}' file
dog     |
fish    |
elephant|
...     |
or dynamically determining the output field width from the input field widths:
$ awk 'NR==FNR{lgth=length($0); wid=(lgth > wid ? lgth : wid); next} {printf "%-*s|\n",wid,$0}' file file
dog     |
fish    |
elephant|
...     |
If you need to further process the records, it might be a good idea to actually make the $0 9 chars wide:
$ awk '{$0=$0 sprintf("%" 9-length() "s","|")}1' file
Output:
dog     |
fish    |
elephant|
...     |

Sed replace nth column of multiple tsv files without header

I have multiple TSV files, and I want to add an 'XX' prefix to the second column only (on every line except the header) and save the result back to the same file.
Input:
$ ls
file1.tsv file2.tsv file3.tsv
$ head -n 4 file1.tsv
a b c
James England 25
Brian France 41
Maria France 18
Output wanted:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
I tried this, but the result is not kept in the file, and a simple redirection won't work:
# this works, but doesn't save the changes
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed "s|^|X${i}_|"
i=$((i+1))
done
# adding '-i' option to sed: this throws an error but would be perfect (sed no input files error)
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' $f | sed -i "s|^|T${i}_|"
i=$((i+1))
done
Some help would be appreciated.
The second column is particularly easy because you simply replace the first occurrence of the separator.
for file in *.tsv; do
sed -i '2,$s/\t/\tX1_/' "$file"
done
If your sed doesn't recognize the escape \t, use a literal tab (in many shells, you type it with Ctrl-V Tab). On *BSD (and hence macOS) you need -i '' instead of -i.
AWK solution (-i inplace requires GNU awk 4.1 or later):
awk -i inplace 'BEGIN { FS=OFS="\t" } NR!=1 { $2 = "X1_" $2 } 1' file1.tsv
Input:
a b c
James England 25
Brian France 41
Maria France 18
Output:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
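The question's loop increments a counter per file, so presumably file2.tsv should get X2_ and so on, whereas the commands above hard-code X1_. A portable sketch of that, assuming a temporary-file-and-rename approach instead of an in-place option; the scratch directory and the contents of file2.tsv are made up for the example.

```shell
# Hypothetical scratch directory with two small tab-separated files.
dir=$(mktemp -d)
printf 'a\tb\tc\nJames\tEngland\t25\nBrian\tFrance\t41\n' > "$dir/file1.tsv"
printf 'a\tb\tc\nMaria\tFrance\t18\n' > "$dir/file2.tsv"

i=1
for f in "$dir"/*.tsv; do
    # FNR > 1 skips the header; the prefix is passed in with -v.
    awk -v pre="X${i}_" 'BEGIN { FS = OFS = "\t" } FNR > 1 { $2 = pre $2 } 1' "$f" \
        > "$f.tmp" && mv "$f.tmp" "$f"
    i=$((i + 1))
done

cat "$dir"/*.tsv
```

Writing to "$f.tmp" and renaming only on success avoids truncating the original when awk fails.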

Awk print different character on separate line

I have been searching around and could not find the answer; I wonder if anyone can help here.
Suppose I have a file containing the following:
File1:
name Joe
day Wednesday
lunch was fish
name John
dinner pie
day tuesday
lunch was noodles
name Mary
day Friday
lunch was fish pie
I want to print only each person's name and what they had for lunch.
I suppose I can do
cat file1 | grep -iE 'name|lunch'
but what if I want to use awk to get just the name and the food, like the output below?
Joe
fish
John
noodles
Mary
fish pie
I know awk can print fields, and this probably requires awk. Is it possible for awk to, say, print $2 on one line and $3 on another?
Can I also output it in this format:
Person food
Joe fish
John noodles
Mary fish pie
Thanks
You can for example say:
$ awk '/name/ {print $2} /lunch/ {$1=$2=""; print}' file
Joe
fish
John
noodles
Mary
fish pie
Or remove the lunch was text:
awk '/name/ {print $2} /lunch/ {gsub("lunch was ",""); print}' file
To make the output in two columns:
$ awk -v OFS="\t" '/name/ {name=$2} /lunch/ {gsub("lunch was ",""); print name, $0}' a
Joe fish
John noodles
Mary fish pie
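The question also asks for a "Person food" header line, which the two-column command above does not print. A minimal sketch that adds it with a BEGIN rule, assuming tab-separated output and anchoring the patterns to the start of the line; the shell variable names are illustrative.

```shell
# Sample data from the question.
data='name Joe
day Wednesday
lunch was fish
name John
dinner pie
day tuesday
lunch was noodles
name Mary
day Friday
lunch was fish pie'

result=$(printf '%s\n' "$data" | awk -v OFS='\t' '
BEGIN         { print "Person", "food" }          # header line
/^name /      { name = $2 }                       # remember the current person
/^lunch was / { sub(/^lunch was /, ""); print name, $0 }
')

printf '%s\n' "$result"
```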
awk
With awk you can do it in one shot:
awk -v RS="" '{n=$2;sub(/.*lunch was\s*/,"");print n,$0}' file
Note that this one-liner relies on a fixed input format: the data must be stored in blocks separated by blank lines (RS="" puts awk in paragraph mode), and the "lunch was" line must be the last line of each block. The \s shorthand is a GNU awk extension.
test with your example:
kent$ awk -v RS="" '{n=$2;sub(/.*lunch was\s*/,"");print n,$0}' file
Joe fish
John noodles
Mary fish pie
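If the "lunch was" line cannot be relied on to come last in each block, one possible variation is to keep paragraph mode but split each block into lines and pick out the two lines of interest. This is only a sketch; it assumes blank lines between blocks and the exact prefixes "name " and "lunch was ".

```shell
# Sample data with blank lines between blocks (required for RS="").
data='name Joe
day Wednesday
lunch was fish

name John
lunch was noodles
dinner pie
day tuesday

name Mary
day Friday
lunch was fish pie'

result=$(printf '%s\n' "$data" | awk '
BEGIN { RS = ""; FS = "\n" }        # one block per record, one line per field
{
    name = food = ""
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^name /)           name = substr($i, 6)    # text after "name "
        else if ($i ~ /^lunch was /) food = substr($i, 11)   # text after "lunch was "
    }
    print name, food
}')

printf '%s\n' "$result"
```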
grep & sed
You can also do it in two steps: grep the values out, then merge each pair of lines:
grep -Po 'name\s*\K.*|lunch was\s*\K.*' file|sed 'N;s/\n/ /'
with your input file, it outputs:
kent$ grep -Po 'name\s*\K.*|lunch was\s*\K.*' file|sed 'N;s/\n/ /'
Joe fish
John noodles
Mary fish pie

AWK associative array

Suppose I have 2 files
File-1 map.txt
1 tony
2 sean
3 jerry
4 ada
File-2 relation.txt
tony sean
jerry ada
ada sean
Expected-Output result.txt
1 2
3 4
4 2
My code was:
awk 'FNR==NR{map[$1]=$2;next;} {$1=map[$1]; $2=map[$2]; print $0}' map.txt relation.txt > output.txt
But I got the left column only:
1
3
4
It seems that something is wrong near $2=map[$2].
Any help would be much appreciated.
You've got the mapping creation the wrong way around, it needs to be:
map[$2] = $1
Your current script maps numbers to names whereas what you seem to be after is a map from names to numbers.
The following transcript shows the corrected script:
pax> cat m.txt
1 tony
2 sean
3 jerry
4 ada
pax> cat r.txt
tony sean
jerry ada
ada sean
pax> awk 'FNR==NR{map[$2]=$1;next;}{$1=map[$1];$2=map[$2];print $0}' m.txt r.txt
1 2
3 4
4 2
Using awk.
awk 'FNR==NR{map[$2]=$1;next;}{print map[$1], map[$2]}' m.txt r.txt
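One thing to watch with either version: looking up a name that is not in map yields an empty string (and creates an empty entry), which is how a column can silently vanish. A hedged sketch that uses the in operator to make missing names visible; the "?" placeholder, the unmapped name "bob" and the scratch directory are assumptions for the example.

```shell
# Hypothetical scratch directory; "bob" has no entry in map.txt.
dir=$(mktemp -d)
printf '1 tony\n2 sean\n3 jerry\n4 ada\n' > "$dir/map.txt"
printf 'tony sean\njerry bob\n' > "$dir/relation.txt"

result=$(awk '
FNR == NR { map[$2] = $1; next }          # name -> number
{
    # "in" tests membership without creating an empty entry.
    a = ($1 in map) ? map[$1] : "?"
    b = ($2 in map) ? map[$2] : "?"
    print a, b
}' "$dir/map.txt" "$dir/relation.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```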