Value of an assignment in awk, ternary operator parsing in awk

I'm new to awk and playing around with it. While trying to use the ternary operator, I wanted at some point to execute two operations when the condition is true, and as I couldn't find the syntax to do so I tried to smuggle one of the two operations inside the condition to take advantage of lazy evaluation.
I have an input file as follows:
file.csv
A,B,C,D
1,,,
2,,,
3,,,
4,,,
5,,,
6,,,
7,,,
8,,,
And I'd like, for the sake of the exercise, to assign B and C to 0 if A is less than 5, and B to 1 if A is 5 or more.
I guess the ternary operator is a terrible way to do this but this is not my point.
The question is: why does the following line output that? How does awk parse this expression?
awk '(FNR!=1){$1<5 && $3=0 ? $2=0 : $2=1}{print $0}' FS=, OFS=, file.csv
Output:
1,1,1,
2,1,1,
3,1,1,
4,1,1,
5,,,
6,,,
7,,,
8,,,
I was expecting the $3=0 expression to be executed and evaluated to true, and to be skipped when the first part of the condition ($1<5) is false.
Expected result:
1,0,0,
2,0,0,
3,0,0,
4,0,0,
5,1,,
6,1,,
7,1,,
8,1,,
Extra question: can I actually use the ternary operator and have several instructions in it executed depending on the condition's value? Is it only bad practice, or actually impossible?

1st solution: You could use code like this, written and tested with your shown samples and attempts. It uses ternary operators to check whether the value of the 1st field is less than 5 and, based on that, sets the values of the 2nd and 3rd fields.
awk '
BEGIN { FS=OFS="," }
FNR==1{
  print
  next
}
{
  $2=($1<5?0:1)
  $3=($1<5?0:$3)
}
1
' Input_file
2nd solution (generic approach): If you have to pass N fields to be checked to the program, it is better to create a function and do the checks and assignments there, again using ternary operators for the computation.
Where:
threshold is an awk variable set to 5, the value against which you want to compare the 1st field.
fieldCompare is another awk variable; it contains 1 in this case since we want to compare the 1st field's value against the threshold.
checkValue is a function to which the field numbers to be set (e.g. 2 and 3 in this case) are passed as a comma-separated list, so they can all be handled in a single call.
awk -v threshold="5" -v fieldCompare="1" '
function checkValue(fields,   num,arr,i,fieldNum){   # extra parameters keep these variables local to the function
  num=split(fields,arr,",")
  for(i=1;i<=num;i++){
    fieldNum = arr[i]
    $fieldNum = ($fieldCompare<threshold?0:$fieldNum)
  }
}
BEGIN { FS=OFS="," }
FNR==1{
  print
  next
}
checkValue("2,3")
1
' Input_file

If I look at the expected outcome, the 2nd field should be one.
Set fields 2 and 3 to zero if field 1 is smaller than five; otherwise set field 2 to one.
The 1 at the end (}1) always evaluates to true and will print the whole line.
awk 'BEGIN{FS=OFS=","}(FNR!=1){($1 < 5) ? $2=$3=0 : $2=1}1' file.csv
Output
A,B,C,D
1,0,0,
2,0,0,
3,0,0,
4,0,0,
5,1,,
6,1,,
7,1,,
8,1,,

If you want to write cryptic code, this is one way to do it. You don't even need the ternary operator.
$ awk 'BEGIN {FS=OFS=","}
NR>1 {$2=$1>=5 || $3=0 }1' file
A,B,C,D
1,0,0,
2,0,0,
3,0,0,
4,0,0,
5,1,,
6,1,,
7,1,,
8,1,,

I was expecting the $3=0 expression to be executed and evaluated to true
The result of an assignment is the value assigned. Zero is false.
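A minimal sketch that demonstrates this (x and y are just illustrative variables):
awk 'BEGIN {
  if ((x = 0)) print "x = 0 is true"; else print "x = 0 evaluates to " x " (false)"
  if ((y = 1)) print "y = 1 evaluates to " y " (true)"
}'
which prints:
x = 0 evaluates to 0 (false)
y = 1 evaluates to 1 (true)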
... and to be skipped when the first part of the condition ($1<5) is false.
Since && has a higher precedence than ?:, and ?: has a higher precedence than =, awk is doing this:
$1<5 && ($3 = (0 ? $2=0 : $2=1))
When $1 < 5, if 0 is true (it is not) then assign $3 the result of $2 = 0, else assign $3 the result of $2 = 1.
When $1 >= 5, do nothing.
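(For the record, the "several instructions per branch" logic from the extra question is simpler with a plain if/else than with a ternary; a rough, untested sketch, which also prints the header line unchanged:)
awk 'BEGIN{FS=OFS=","} FNR!=1 { if ($1 < 5) { $2 = 0; $3 = 0 } else $2 = 1 } { print }' file.csv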

tested and confirmed working on mawk-1, mawk-2, gawk, and nawk:
the only difference being the order of precedence in the 3rd section
{g,n}awk 'BEGIN { _+=_^=FS=OFS="," } NR<_ || ($_=_^_<+$!_) || $(_--)=!++_ ""'
or
mawk 'BEGIN { _+=_^=FS=OFS="," } NR<_ || ($_=_^_<+$!_) || $++_ = !--_ ""'
A,B,C,D
1,0,0,
2,0,0,
3,0,0,
4,0,0,
5,1,,
6,1,,
7,1,,
8,1,,
concatenating an empty string ("") at the tail ensures the line is printed even when the assigned value is zero

Related

Replace case of 2nd column of dataset within awk?

I'm trying the command
awk 'BEGIN{FS=","}NR>1{tolower(substr($2,2))} {print $0}' emp.txt
on the data below, but it is not working
M_ID,M_NAME,DEPT_ID,START_DATE,END_DATE,Salary
M001,Richa,D001,27-Jan-07,27-Feb-07,150000
M002,Nitin,D002,16-Feb-07,16-May-07,40000
M003,AJIT,D003,8-Mar-07,8-Sep-07,70000
M004,SHARVARI,D004,28-Mar-07,28-Mar-08,120000
M005,ADITYA,D002,27-Apr-07,27-Jul-07,40000
M006,Rohan,D004,12-Apr-07,12-Apr-08,130000
M007,Usha,D003,17-Apr-07,17-Oct-07,70000
M008,Anjali,D002,2-Apr-07,2-Jul-07,40000
M009,Yash,D006,11-Apr-07,11-Jul-07,85000
M010,Nalini,D007,15-Apr-07,15-Oct-07,9999
Expected output
M_ID,M_NAME,DEPT_ID,START_DATE,END_DATE,Salary
M001,Richa,D001,27-Jan-07,27-Feb-07,150000
M002,Nitin,D002,16-Feb-07,16-May-07,40000
M003,Ajit,D003,8-Mar-07,8-Sep-07,70000
M004,Sharvari,D004,28-Mar-07,28-Mar-08,120000
M005,Aditya,D002,27-Apr-07,27-Jul-07,40000
M006,Rohan,D004,12-Apr-07,12-Apr-08,130000
M007,Usha,D003,17-Apr-07,17-Oct-07,70000
M008,Anjali,D002,2-Apr-07,2-Jul-07,40000
M009,Yash,D006,11-Apr-07,11-Jul-07,85000
M010,Nalini,D007,15-Apr-07,15-Oct-07,9999
With your shown samples, in GNU awk please try the following code. It uses GNU awk's match() function with the regex (^[^,]*,.)([^,]*)(.*), which creates 3 capturing groups and stores their values in an array named arr (whose indexes are 1, 2, 3 and so on, depending on the number of capturing groups). If the match succeeds, the array elements are printed, applying the tolower() function to the 2nd element of arr to get the expected output.
awk '
FNR==1{
  print
  next
}
match($0,/(^[^,]*,.)([^,]*)(.*)/,arr){
  print arr[1] tolower(arr[2]) arr[3]
}
' Input_file
You need to assign the result of tolower() to something; it doesn't operate in place. And in this case, you need to concatenate it with the first character of the field and assign that back to the field.
$2 = substr($2, 1, 1) tolower(substr($2, 2));
To get comma separators in the output file, you need to set OFS. So you need:
BEGIN {OFS=FS=","}
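Putting those two pieces together, a complete command might look like this (an untested sketch, using the emp.txt file from the question):
awk 'BEGIN{OFS=FS=","} NR>1{$2 = substr($2, 1, 1) tolower(substr($2, 2))} {print}' emp.txt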
mawk, gawk, or nawk:
awk 'BEGIN { _+=_^=FS=OFS="," } NR<_ || $_ = substr( toupper($(+_)=\
tolower($_)), --_,_) substr($++_,_)'
M_ID,M_name,DEPT_ID,START_DATE,END_DATE,Salary
M001,Richa,D001,27-Jan-07,27-Feb-07,150000
M002,Nitin,D002,16-Feb-07,16-May-07,40000
M003,Ajit,D003,8-Mar-07,8-Sep-07,70000
M004,Sharvari,D004,28-Mar-07,28-Mar-08,120000
M005,Aditya,D002,27-Apr-07,27-Jul-07,40000
M006,Rohan,D004,12-Apr-07,12-Apr-08,130000
M007,Usha,D003,17-Apr-07,17-Oct-07,70000
M008,Anjali,D002,2-Apr-07,2-Jul-07,40000
M009,Yash,D006,11-Apr-07,11-Jul-07,85000
M010,Nalini,D007,15-Apr-07,15-Oct-07,9999

Conditional Performance of Matching Expression in Gawk

I'm using gawk to match database entries (simple text file, "fields" separated with ::, one line = one record). I have up to 8 variables I want to match, but the variables are based on user input and don't necessarily exist / may be empty. My logical operator is "AND" (&&). I only want to perform a match for a particular variable if the variable exists, so that an empty variable does not return a "false" for the entire search.
For example, my variables are "date" and "reps". I've tried:
{ if ( date ) { $2 ~ date } && if ( reps ) { $3 ~ reps }}
and I've also tried:
{ if ( date ) { $2 ~ date; && if ( reps ) $3 ~ reps }}
but the "&&" gives a syntax error (there may be other problems, too, of course).
How do I (1) perform a conditional match and (2) how do I string several of those together?
__
Follow up: from the answers received so far (thank you!) I can tell I didn't state my logical requirements clearly. What I'm trying to achieve on a field basis is: if the variable exists and matches, select the record; but if the variable does not exist, ignore it as a test condition. What I don't want is that when the variable does not exist, it still gets used as a test condition and results in the record not getting selected. (Also, I'm not concerned about the variable existing and not matching.) For an entire record, I want to use all existing variables on a cumulative basis.
You might try something like
awk '
# start by accepting all records
{ filter = 1 }
# then, individually, see if any conditions fail
filter && date && $2 !~ date { filter = 0 }
filter && reps && $3 !~ reps { filter = 0 }
# ...
# then print the record if it has passed all the conditions
filter
'
Add in your own mechanism to pass the variables into awk, for example with -v as sketched below.
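For example (a rough sketch, assuming the field separator is :: and the records live in a hypothetical file records.txt):
awk -F'::' -v date="$DATE" -v reps="$REPS" '
  { filter = 1 }                              # start by accepting the record
  filter && date && $2 !~ date { filter = 0 } # date given but does not match
  filter && reps && $3 !~ reps { filter = 0 } # reps given but does not match
  filter                                      # print records that passed every test
' records.txt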
You can string them together into one giant condition, but readability suffers
if ((!date || $2 ~ date) && (!reps || $3 ~ reps) && ...)
awk -F'yourfs' -v date="$DATE" -v reps="$REPS" '
BEGIN{ if(!date || !reps) exit }
$1 ~ date && $2 ~ reps
' file
How do I(...)perform a conditional match
Take a look at the so-called ternary operator, condition ? value_if_true : value_if_false. Consider the following example: let's say you have file.txt as follows
0 abc
1 abc
0 def
1 def
and the 1st column determines whether the check is to be made, and the check is whether the 2nd column is abc, then you can do
awk '$1?$2=="abc":1' file.txt
which will give
0 abc
1 abc
0 def
(tested in GNU Awk 5.0.1)

AWK script- Not showing data

I'm trying to create a variable to sum columns 26 to 30 and 32.
So far I have this code, which prints the header and the output format like I want, but no data is being shown.
#! /usr/bin/awk -f
BEGIN { FS="," }
NR>1 {
TotalPositiveStats= ($26+$27+$28+$29+$30+$32)
}
{printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%.2f %,%s,%s,%.2f %,%s,%s,%.2f %,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s, %s\n",
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats
}
NR==1 {
print "EndYear,Rk,G,Date,Years,Days,Age,Tm,HOme,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats" }#header
Input data:
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1,3,2,5,5,2,1,3,4,21,19.4
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1,2,2,4,5,3,1,6,5,25,14.7
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.75,3,2,5,5,1,1,2,4,17,13.2
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3
Output expected:
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5,35
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1,3,2,5,5,2,1,3,4,21,19.4,34
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9,54
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1,2,2,4,5,3,1,6,5,25,14.7,38
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.75,3,2,5,5,1,1,2,4,17,13.2,29
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9,36
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3,51
This script will be called like gawk -f script.awk <filename>.
Currently, when calling it, this is the output (it seems to be calculating the variable but the rest of the fields are empty)
awk is well suited to summing columns:
awk 'NR>1{$(NF+1)=$26+$27+$28+$29+$30+$32}1' FS=, OFS=, input-file > tmp
mv tmp input-file
That doesn't add a field in the header line, so you might want something like:
awk '{$(NF+1) = NR>1 ? ($26+$27+$28+$29+$30+$32) : "TotalPositiveStats"}1' FS=, OFS=,
An explanation of the issues with the current printf output is covered in the 2nd half of this answer (below).
It appears OP's objective is to reformat three of the current fields while also adding a new field at the end of each line. (NOTE: certain aspects of OP's code are not reflected in the expected output, so I'm not 100% sure what OP is looking to generate; regardless, OP should be able to tweak the provided code to generate the desired result.)
Using sprintf() to reformat the three fields we can rewrite OP's current code as:
awk '
BEGIN { FS=OFS="," }
NR==1 { print $0, "TotalPositiveStats"; next }
{ TotalPositiveStats = ($26+$27+$28+$29+$30+$32)
$17 = sprintf("%.3f",$17) # FG_PCT
if ($20 != "") $20 = sprintf("%.3f",$20) # 3P_PCT
$23 = sprintf("%.3f",$23) # FT_PCT
print $0, TotalPositiveStats
}
' raw.dat
NOTE: while OP's printf shows a format of %.2f % for the 3 fields of interest ($17, $20, $23), the expected output shows that the fields are not actually being reformatted (eg, $17 remains %.3f, $20 is an empty string, $23 remains %.2f); I've opted to leave $20 blank when it is empty and otherwise reformat all 3 fields as %.3f; OP can modify the sprintf() calls as needed
This generates:
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5,40
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1.000,3,2,5,5,2,1,3,4,21,19.4,37
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9,57
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1.000,2,2,4,5,3,1,6,5,25,14.7,44
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.750,3,2,5,5,1,1,2,4,17,13.2,31
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9,41
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.750,4,4,8,5,3,2,5,2,33,29.3,56
NOTE: in OP's expected output it appears the last/new field (TotalPositiveStats) does not contain the value from $30 hence the mismatch between the expected results and this answer; again, OP can modify the assignment statement for TotalPositiveStats to include/exclude fields as needed
Regarding the issues with the current printf ...
{printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%.2f %,%s,%s,%.2f %,%s,%s,%.2f %,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s, %s\n",
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,TotalPositiveStats}
... is referencing (awk) variables that have not been defined (eg, EndYear, Rk, G). [NOTE: one exception is the very last variable in the list - TotalPositiveStats - which has in fact been defined earlier in the script.]
The default value for undefined variables is the empty string ("") or zero (0), depending on how the awk code is referencing the variable, eg:
printf "%s", EndYear => EndYear is treated as a string and the printed result is an empty string; with an output field delimiter of a comma (,) this empty strings shows up as 2 commas next to each other (,,)
printf "%.2f %", FG_PCT => FG_PCT is treated as a numeric (because of the %f format) and the printed result is 0.00 %
Where it gets a little interesting is when the (undefined) reference starts with a numeral (eg, 3P): awk parses this as the number 3 concatenated with the (undefined, empty) variable P, so the whole reference effectively becomes the number 3, eg:
printf "%s", 3P => 3P is processed as 3 and the printed result is 3
This should explain the 5 static values (0.00 %, 3, 3, 3.00 % and 0.00 %) printed in all output lines as well as the 'missing' values between the rest of the commas (eg, ,,,,).
Obviously the last value in the line is an actual number, ie, the value of the awk variable TotalPositiveStats.
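A small standalone demonstration of those defaults (a sketch, not part of OP's script):
awk 'BEGIN { printf "[%s] [%.2f] [%s]\n", EndYear, FG_PCT, 3P }'
prints [] [0.00] [3]: EndYear yields an empty string, FG_PCT a numeric zero, and 3P the number 3 concatenated with the empty variable P.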

awk to get result of multiple lines in one sentence with if statement

I am new to awk and I was wondering if I could get one single result for an if operation on awk.
Example:
cat example.txt:
0
0
0
0
0
awk '{ if ($1==0) print "all zeros"; else print "there is 1"}'
result:
all zeros
all zeros
all zeros
all zeros
all zeros
I would like to have only one all zeros as the answer, or a TRUE. Is this a case where I should use an awk function to return something? Thanks
Try your code this way; written and tested with the shown samples.
awk '$0==0{count++} END{if(count==FNR){print "TRUE"}}' Input_file
OR
awk '$0==0{count++} END{if(count==FNR){print "All lines are zeroes"}}' Input_file
OR, to print a message when some non-zero line(s) are found:
awk '$0==0{count++} END{if(count==FNR){print "TRUE"} else{print "There is non-zero line(s) found."}}' Input_file
Explanation: adding a detailed explanation of the above.
awk ' ##Starting awk program from here.
$0==0{ ##Checking condition if current line is zero then do following.
count++ ##Increasing count with 1 here.
}
END{ ##Starting END block of this program from here.
if(count==FNR){ ##Checking condition if count is equal to number of total lines of file.
print "TRUE" ##If above condition is TRUE then print TRUE here.
}
}
' Input_file ##Mentioning Input_file name here.
Here is an alternative using gnu-awk:
awk -v RS='^(0\r?\n)+$' '{print (NF ? "there is 1" : "all zeros")}' file
all zeros
I would do it the following way using GNU AWK. Let file.txt content be
0
0
0
0
0
then
awk '{nonzero = nonzero || $1!="0"}END{print nonzero?"has not zero":"all zeros"}' file.txt
output
all zeros
Explanation: I am using nonzero to store whether a non-zero value has already been seen (value 1) or not (value 0). A variable which is not set in awk has the value 0 in an arithmetic context, so I do not need to declare nonzero=0 in a BEGIN section. I harness ||, which is logical or and might be described as follows:
if you did not see a non-zero earlier and do not see one now, that means there is no non-zero element so far (0 || 0 == 0)
if you did not see a non-zero earlier and do see one now, that means there is a non-zero element so far (0 || 1 == 1)
if you did see a non-zero earlier and do not see one now, that means there is a non-zero element so far (1 || 0 == 1)
if you did see a non-zero earlier and do see one now, that means there is a non-zero element so far (1 || 1 == 1)
After processing all lines, in the END section I print either has not zero or all zeros depending on nonzero's value, harnessing the ternary operator.
(tested in gawk 4.2.1)
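A tiny illustration of the "unset variable is 0 in arithmetic context" behaviour mentioned above (a sketch):
awk 'BEGIN { print "numeric value:", unset + 0; print "as a condition:", (unset ? "true" : "false") }'
which prints:
numeric value: 0
as a condition: false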
Another:
$ awk '$0{exit v=1}END{printf "All %szeroes\n",(v?"not ":"")}' file
Output with sample data:
All zeroes
Alternative output:
All not zeroes
Explained:
$ awk '
$0 { # if record evaluates to non-zero
exit v=1 # jump to END with "parameter" 1
} # why continue once non-zero seen
END {
printf "All %szeroes\n",(v?"not ":"") # if "parameter" v was set, output "not"
}' file
The condition to examine $0 could of course be something more specific (like $0=="0") but it's sufficient for this purpose. exit v=1 sets var v to value 1 but it also exits the program once a non-zero value of $0 has been found and jumps to END where the value of v is examined. The program finally exits with exit code 1. If that is not acceptable, you need to exit from END explicitly with exit 0.
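For example, a variant that always exits with status 0 might look like this (a sketch):
awk '$0{exit v=1} END{printf "All %szeroes\n",(v?"not ":""); exit 0}' file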

awk: first, split a line into separate lines; second, use those new lines as a new input

Let's say I have this line:
foo|bar|foobar
I want to split it at | and then use those 3 new lines as the input for the further proceedings (let's say replace bar with xxx).
Sure, I can pipe two awk instances, like this:
echo "foo|bar|foobar" | awk '{gsub(/\|/, "\n"); print}' | awk '/bar/ {gsub(/bar/, "xxx"); print}'
But how I can achieve this in one script? First, do one operation on some input, and then treat the result as the new input for the second operation?
I tried something like this:
echo "foo|bar|foobar" | awk -v c=0 '{
{
gsub(/\|/, "\n");
sprintf("%s", $0);
}
{
if ($0 ~ /bar/) {
c+=1;
gsub(/bar/, "xxx");
print c;
print
}
}
}'
Which results in this:
1
foo
xxx
fooxxx
And thanks to the counter c, it's absolutely obvious that the subsequent if doesn't treat the multi-line input it receives as several new records, but instead just as one multi-line record.
Thus, my question is: how to tell awk to treat this new multi-line record it receives as many single-line records?
The desired output in this very example should be something like this if I'm correct:
1
xxx
2
fooxxx
But this is just an example, the question is more about the mechanics of such a transition.
I would suggest an alternative approach using split(), where you just split the elements based on the delimiter into an array and iterate over its fields, instead of working on a single multi-line string.
echo "foo|bar|foobar" |\
awk '{
count = 0
n = split($0, arr, "|")
for ( i = 1; i <= n; i++ )
{
if ( arr[i] ~ /bar/ )
{
count += sub(/bar/, "xxx", arr[i])
print count
print arr[i]
}
}
}'
Also, you don't need an explicit increment of the count variable: sub() returns the number of substitutions made on the source string, so you can just add that to the existing value of count.
As one more level of optimization, you can get rid of the ~ match in the if condition and directly use the sub() function there
if ( sub(/bar/, "xxx", arr[i]) )
{
count++
print count
print arr[i]
}
If you set the record separator (RS) to the pipe character, you almost get the desired effect, e.g.:
echo 'foo|bar|foobar' | awk -v RS='|' 1
Output:
foo
bar
foobar
(an empty line)
Except that a new-line character becomes part of the last record, so there is an extra line at the end of the output. You can work around this by either including a new-line in the RS variable, making it less portable, or by avoiding sending new-lines to awk.
For example using the less portable way:
echo 'foo|bar|foobar' | awk -v RS='\\||\n' '{ sub(/bar/, "baz") } 1'
Output:
foo
baz
foobaz
Note that the empty record at the end is ignored.
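The other workaround mentioned above, avoiding sending a newline to awk at all, might look like this (a sketch using printf instead of echo):
printf '%s' 'foo|bar|foobar' | awk -v RS='|' '{ sub(/bar/, "baz") } 1'
Output:
foo
baz
foobaz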
With GNU awk:
$ awk -v RS='[|\n]' 'gsub(/bar/,"xxx"){print ++c ORS $i}' file
1
xxx
2
fooxxx
With any awk:
$ awk -F'|' '{c=0; for (i=1;i<=NF;i++) if ( gsub(/bar/,"xxx",$i) ) print ++c ORS $i }' file
1
xxx
2
fooxxx