How to match newlines in awk?

Input file text.txt:
foo()
{
}
buz()
{
}
Awk script.awk:
BEGIN {
RS = "\n\n+";
FS = "\n";
}
/[a-z]+\(\)\n/ {print "FUNCTION: " $1;}
{print "NOT FOUND: " $0;}
Running script:
awk -f script.awk text.txt
gives:
NOT FOUND: foo()
{
}
NOT FOUND: buz()
{
}
But I expected both functions to match, newlines included. How can I do this?

Since you're already using "\n" as the FS, you can just do matching against $1:
awk -v RS='\n\n+' -v FS='\n' '
$1 ~ /^[a-z]+\(\)$/ {print "FUNCTION: " $1; next}
{print "NOT FOUND: " $0}
' text.txt
This worked with gawk:
FUNCTION: foo()
FUNCTION: buz()

You can try this:
BEGIN {
RS = "";
FS = "\n";
}
/[a-z]+\(\)/ {print "FUNCTION: " $1;}
!/[a-z]+\(\)/ {print "NOT FOUND: " $0;}
If you want to verify that there is nothing after the () you can do this:
$1~/[a-z]+\(\)$/ {print "FUNCTION: " $1;}
I don't know why the newline isn't matched. Maybe someone can explain it.
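A likely explanation (an assumption, since the question doesn't say which awk produced the NOT FOUND output): a strictly POSIX awk honors only the first character of a multi-character RS, so RS = "\n\n+" degrades to plain "\n" and each record is a single line that can never contain \n. An awk with regex RS support (gawk, mawk) keeps the newlines inside the record, and then the original pattern does match:

```shell
# Requires an awk with regex RS (gawk, mawk); with a strictly POSIX
# awk each record would be one line and the pattern could never match.
printf 'foo()\n{\n}\n\nbuz()\n{\n}\n' |
awk 'BEGIN { RS = "\n\n+" }
     /[a-z]+\(\)\n/ { print "FUNCTION: " $1 }'
```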

This might work for you (GNU awk):
awk '{if(/^[a-z]+\(\)\n/)print "FUNCTION:"$1; else print "NOT FOUND: "$0}' RS="" file

Not sure what output you expect exactly.
To make awk apply OFS you need to introduce a dummy assignment like $1=$1. Without it, awk never rebuilds $0, so print emits the record unchanged.
So if you expect result like this:
foo() { }
buz() { }
Just type:
awk '{$1=$1; print} ' RS='\n\n' FS='\n' OFS=" "
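A self-contained run of that idea, feeding the sample blocks on stdin (the original command named no file) and assuming an awk with regex RS:

```shell
# No trailing newline on the sample, so the last record carries no
# dangling empty field; $1=$1 rebuilds $0 with OFS between fields.
printf 'foo()\n{\n}\n\nbuz()\n{\n}' |
awk '{ $1 = $1; print }' RS='\n\n' FS='\n' OFS=' '
```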

Related

Append next line to current line until pattern matched in awk

Input file data:
"1","123","hh
KKK,111,ll
Jk"
"2","124","jj"
Output data:
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
I tried the awk script below, but it still doesn't produce the desired output:
BEGIN{
FS="\",\"";
record_lock_flag=0;
total_feilds=3;
tmp_field_count=0;
tmp_rec_buff="";
lines=0;
}
{
if(NR>0)
{
if( record_lock_flag == 0 && NF == total_feilds && substr($NF,length($NF)-1,length($NF)) ~ /^"/ )
{
print $0;
}
else
{
tmp_rec_buff=tmp_rec_buff$0 ;
tmp_field_count=tmp_field_count+NF ;
if ( $0 != "")
{ lines++ ;}
rec_lock_flag=1 ;
if(tmp_field_count==exp_fields+lines-1){
print tmp_rec_buff;
record_lock_flag=0;
tmp_field_count=0;
tmp_rec_buff="";
lines=0;
}
}
}
}
END{
}
Using any awk in any shell on every Unix box:
$ awk 'BEGIN{RS=ORS="\""} !(NR%2){gsub(/\n/," ")} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
See also What's the most robust way to efficiently parse CSV using awk?.
Using gnu-awk we can break records on the text "\n" (closing quote, newline, opening quote), remove the remaining newlines from each record, and finally append the matched separator back via RT as the ORS (assuming there are no blank fields with opening and closing quotes on separate lines):
awk -v RS='"\n("|$)' '{gsub(/\n/, " "); ORS=RT} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
Another version using gnu-awk, if you already know the number of fields in each record, as shown in your question:
awk -v n=3 -v FPAT='"[^"]*"' 'p {$0 = p " " $0; p=""}
NF < n {p = $0; next} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
With your shown samples only, you could try the following awk code, written and tested with GNU awk.
awk -v RS="" -v FS="\n" '
{
for(i=1;i<=NF;i++){
sum+=gsub(/"/,"&",$i)
val=(val?val OFS:"")$i
if(sum%2==0){
print val
sum=0
val=""
}
}
}
' Input_file
Explanation: a detailed breakdown of the above.
awk -v RS="" -v FS="\n" ' ##Starting awk program from here, setting RS as NULL and field separator as new line.
{
for(i=1;i<=NF;i++){ ##Traversing through all fields here.
sum+=gsub(/"/,"&",$i) ##Globally substituting " with itself and adding the match count to the sum variable.
val=(val?val OFS:"")$i ##Appending the current field to val, OFS-separated.
if(sum%2==0){ ##If sum is even, the quotes are balanced, so do the following.
print val ##Printing val here.
sum=0 ##Setting sum to 0 here.
val="" ##Nullifying val here.
}
}
}
' Input_file ##Mentioning Input_file name here.
With awk setting ORS:
awk '{ORS = (!/"$/) ? " " : "\n"} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
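The ORS trick can be checked inline with the sample data (stdin instead of a file):

```shell
# Lines that do not end in a closing quote get a space as ORS, so
# they are glued to the next line; complete lines keep the newline.
printf '"1","123","hh\nKKK,111,ll\nJk"\n"2","124","jj"\n' |
awk '{ ORS = (!/"$/) ? " " : "\n" } 1'
```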

add special characters in a text using awk

I have a file which is :
line1
line2
line3
What I am trying to have is
"line1"{
"line1"
}
I am trying to do this using awk, but I don't know how to handle the special characters. For now I have this:
awk '{ "$0" {"$0"} }'
$ awk '{$0="\""$0"\""; print $0 "{\n" $0 "\n}"}' file
"line1"{
"line1"
}
"line2"{
"line2"
}
"line3"{
"line3"
}
$ awk -v q='"' '{ql = q $0 q; print ql "{" ORS ql ORS "}" }' ip.txt
"line1"{
"line1"
}
"line2"{
"line2"
}
"line3"{
"line3"
}
-v q='"' saves the double-quote character in a variable named q, which makes it easier to insert double quotes than messing with escapes
ql = q $0 q adds double quotes around the input record
ql "{" ORS ql ORS "}" builds the required output; ORS is the output record separator, a newline by default
space between the concatenated parameters is ignored; use " }" to get a space before } in the output
as a comparison with sed
$ sed 's/.*/"&"{\n"&"\n}/' file
"line1"{
"line1"
}
"line2"{
"line2"
}
"line3"{
"line3"
}
also another awk
$ awk -v OFS="\n" -v q='"' '{v=q $0 q; print v "{", v, "}" }' file
You can simply use printf:
awk -v fmt='"%s"{\n"%s"\n}\n' '{printf fmt, $0,$0 }' file
Test with your data:
kent$ awk -v fmt='"%s"{\n"%s"\n}\n' '{printf fmt, $0,$0 }' f
"line1"{
"line1"
}
"line2"{
"line2"
}
"line3"{
"line3"
}

awk: gsub /pattern1/, but not /pattern1pattern2/

In my work I have to solve a simple problem: change pattern1 to newpattern, but only when it is not followed by pattern2 or pattern3:
"pattern1 pattern1pattern2 pattern1pattern3 pattern1pattern4" → "newpattern pattern1pattern2 pattern1pattern3 newpatternpattern4"
Here is my solution, but I don't like it and I suppose there should be a more elegant and easy way to do that?
$ echo 'pattern1 pattern1pattern2 pattern1pattern3 pattern1pattern4' | awk '
{gsub(/pattern1pattern2/, "###", $0)
gsub(/pattern1pattern3/, "%%%", $0)
gsub(/pattern1/, "newpattern", $0)
gsub(/###/, "pattern1pattern2", $0)
gsub(/%%%/, "pattern1pattern3", $0)
print}'
newpattern pattern1pattern2 pattern1pattern3 newpatternpattern4
So, the sample input file:
pattern1 pattern1pattern2 aaa_pattern1pattern3 pattern1pattern4 pattern1pattern2pattern1
The sample output file should be:
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern
This is trivial in perl, using a negative lookahead:
perl -pe 's/pattern1(?!pattern[23])/newpattern/g' file
Substitute all matches of pattern1 that are not followed by pattern2 or pattern3.
If for some reason you need to do it in awk, then here's one way you could go about it:
{
out = ""
replacement = "newpattern"
while (match($0, /pattern1/)) {
if (substr($0, RSTART + RLENGTH) ~ /^pattern[23]/) {
out = out substr($0, 1, RSTART + RLENGTH - 1)
}
else {
out = out substr($0, 1, RSTART - 1) replacement
}
$0 = substr($0, RSTART + RLENGTH)
}
print out $0
}
Consume the input while pattern1 matches, building the string out and inserting the replacement whenever the part after a match isn't pattern2 or pattern3. Once there are no more matches, print the string built so far, followed by whatever is left of the input.
With GNU awk for the 4th arg to split():
$ cat tst.awk
{
split($0,flds,/pattern1(pattern2|pattern3)/,seps)
for (i=1; i in flds; i++) {
printf "%s%s", gensub(/pattern1/,"newpattern","g",flds[i]), seps[i]
}
print ""
}
$ awk -f tst.awk file
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern
With other awks you can do the same with a while(match()) loop:
$ cat tst.awk
{
while ( match($0,/pattern1(pattern2|pattern3)/) ) {
tgt = substr($0,1,RSTART-1)
gsub(/pattern1/,"newpattern",tgt)
printf "%s%s", tgt, substr($0,RSTART,RLENGTH)
$0 = substr($0,RSTART+RLENGTH)
}
gsub(/pattern1/,"newpattern",$0)
print
}
$ awk -f tst.awk file
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern
but obviously the gawk solution is simpler and more concise so, as always, get gawk!
awk solution. Nice question. Basically it's doing 2 gensubs:
$ cat tst.awk
{ for (i=1; i<=NF; i++){
s=gensub(/pattern1/, "newpattern", "g", $i);
t=gensub(/(newpattern)(pattern(2|3))/, "pattern1\\2", "g", s);
$i=t
}
}1
Testing:
echo "pattern1 pattern1pattern2 aaa_pattern1pattern3 pattern1pattern4 pattern1pattern2pattern1" | awk -f tst.awk
newpattern pattern1pattern2 aaa_pattern1pattern3 newpatternpattern4 pattern1pattern2newpattern
However, this will fail whenever you already have something like newpatternpattern2 in your input. But that's not what OP suggests with his input examples, I guess.

Using a field separator of the two-character string "\n" in awk

My input file has a plain-text representation of the newline character in it separating the fields:
First line\nSecond line\nThird line
I would expect the following to replace that text \n with a newline:
$ awk 'BEGIN { FS = "\\n"; OFS = "\n" } { print $1 }' test.txt
First line\nSecond line\nThird line
But it doesn't (gawk 4.0.1 / OpenBSD nawk 20110810).
I'm allowed to separate on just the \:
$ awk 'BEGIN { FS = "\\"; OFS = "\n" } { print $1, $2 }' test.txt
First line
nSecond line
I can also use a character class in gawk:
$ awk 'BEGIN { FS = "[[:punct:]]n"; OFS = "\n" } { $1 = $1; print $0 }' test.txt
First line
Second line
Third line
But I feel like I should be able to specify the exact separator.
A field separator is a regexp, and when it is given as a dynamic (string) regexp you need to escape twice: once for the string, once for the regexp engine:
$ awk 'BEGIN { FS = "\\\\n"; OFS = "\n" } { print $1 }' file
First line
See the man page for details.
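The two escaping levels can be traced step by step: the string literal "\\\\n" becomes the three-character regexp \\n after string processing, and the regexp engine reads that as a literal backslash followed by n. A sketch on stdin, extended with $1=$1 (as in the character-class example above) so every piece prints on its own line:

```shell
# FS as a string is processed twice:
# string "\\\\n" -> regexp \\n -> matches the two characters \n
printf '%s\n' 'First line\nSecond line\nThird line' |
awk 'BEGIN { FS = "\\\\n"; OFS = "\n" } { $1 = $1; print }'
```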
Here sed might be a better tool for the task (note that \n in the replacement text is a GNU sed extension):
sed 's/\\n/\n/g'

How to translate a column value in the file using awk with tr command in unix

Details:
Input file : file.txt
P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2
Expected output:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
If I try it with a single value, it gives the proper result:
/tmp>echo "P123456789"|tr "0-9" "5-9"|tr "A-Z" "X-Z"
Z678999999
But if I do it with an awk command, it gives no result, only errors:
/tmp>$ awk 'BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }' /tmp/file.txt >/tmp/file.txt.tmp
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
Can anyone help please?
Just do what you wanted, without changing your logic. The awk line:
awk -F, -v OFS="," '{ "echo \""$1"\"|tr \"0-9\" \"5-9\"|tr \"A-Z\" \"X-Z\"" |getline $1}7'
with your data:
kent$ echo "P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2"|awk -F, -v OFS="," '{ "echo \""$1"\"|tr \"0-9\" \"5-9\"|tr \"A-Z\" \"X-Z\"" |getline $1}7'
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
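One caveat with the getline pipe above: every input line builds a different command string, and awk keeps each such pipe open until close() is called, so large inputs can run out of file descriptors. A sketch with an explicit close() (stdin instead of the file; the logic is otherwise the same):

```shell
printf 'P123456789,COLUMN2\nP123456790,COLUMN2\n' |
awk -F, -v OFS=, '{
  # build the shell pipeline for this line, read its output into $1,
  # then close the pipe so its file descriptor is released
  cmd = "echo \"" $1 "\" | tr \"0-9\" \"5-9\" | tr \"A-Z\" \"X-Z\""
  cmd | getline $1
  close(cmd)
} 1'
```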
$ cat tst.awk
function tr(old,new,str, oldA,newA,strA,i,j) {
split(old,oldA,"")
split(new,newA,"")
split(str,strA,"")
str = ""
for (i=1;i in strA;i++) {
for (j=1;(j in oldA) && !sub(oldA[j],newA[j],strA[i]);j++)
;
str = str strA[i]
}
return str
}
BEGIN { FS=OFS="," }
{ print tr("P012345678","Z567899999",$1), $2 }
$ awk -f tst.awk file
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
Unfortunately, AWK does not have a built in translation function. You could write one like Ed Morton has done, but I would reach for (and highly recommend) a more powerful tool. Perl, for example, can process fields using the autosplit (-a) command switch:
-a turns on autosplit mode when used with a -n or -p. An implicit split command to the @F array is done as the first thing inside the
implicit while loop produced by the -n or -p.
You can type perldoc perlrun for more details.
Here's my solution:
perl -F, -lane '$F[0] =~ tr/0-9/5-9/; $F[0] =~ tr/A-Z/X-Z/; print join (",", @F)' file.txt
Results:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2