What does this awk script mean? `awk -F "|" '{!a[$1]++}{printf RS $1}{print FS $2}' input.txt` - awk

I need to understand what the code below does in Unix, and I need help moving forward.
`awk -F "|" '{!a[$1]++}{printf RS $1}{print FS $2}' input.txt`
My sample input file looks like this:
1|Balaji 1|Kumar 3|India 3|China 3|Australia 1|Dinesh
I need output like this:
1|Balaji|Kumar|Dinesh 3|India|China|Australia

I won't explain the awk line in your question, because it doesn't make much sense:
it creates the array a[] but never uses it
it misuses RS and FS
Try this one-liner:
awk -F'[| ]' '{for(i=1;i<=NF;i++)if(i%2)a[$i]=a[$i]?a[$i]"|"$(i+1):$(i+1)}
END{for(x in a) printf x"|"a[x]" ";print ""}' file
with your example:
kent$ echo "1|Balaji 1|Kumar 3|India 3|China 3|Australia 1|Dinesh"|awk -F'[| ]' '{for(i=1;i<=NF;i++)if(i%2)a[$i]=a[$i]?a[$i]"|"$(i+1):$(i+1)}END{for(x in a) printf x"|"a[x]" ";print ""}'
1|Balaji|Kumar|Dinesh 3|India|China|Australia
Note that there will be a trailing space; it could be removed in the END loop.

Surprisingly, it can be simplified. I am not sure why !a[$1]++ is written in there; as an action it has no effect on the output:
awk -F "|" '{printf RS $1}{print FS $2}' input.txt
For each record it first prints the record separator (a newline) followed by $1, the first field. It then prints the field separator ("|"), the second field $2, and a newline (because that statement is print; with printf no newline would be added).
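To make the role of `!a[$1]++` concrete, here is a small sketch on hypothetical input with one record per line (the three-line `data` value is an assumption, not the asker's file). As an action the expression changes nothing visible; as a pattern it prints only the first line seen for each key.

```shell
# Hypothetical two-column input, one record per line.
data='1|Balaji
1|Kumar
3|India'

# As in the question: each record prints as RS $1, then FS $2 and a newline,
# so every output line is preceded by an empty line.
out_all=$(printf '%s\n' "$data" | awk -F'|' '{printf "%s%s", RS, $1}{print FS $2}')

# Used as a pattern, !a[$1]++ is true only the first time a key appears.
out_first=$(printf '%s\n' "$data" | awk -F'|' '!a[$1]++')

printf '%s\n---\n%s\n' "$out_all" "$out_first"
```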
Based on your comment, the following should work:
awk '{
for(i=1;i<=NF;i++){split($i,a,"|");
b[a[1]]=(a[1] in b)?b[a[1]]"|"a[2]:a[2]
}
for(j in b)printf "%s|%s ",j,b[j];
print"";}' your_file

Changing the record separator makes this data easy to read. It has only one small bug that I do not see how to solve: it prints the output on two lines.
awk -F\| '{a[$1]=a[$1]?a[$1]"|"$2:$2} END{for(i in a) printf i"|"a[i]" "}' RS=" " file
1|Balaji|Kumar|Dinesh
3|India|China|Australia
New version with correct output, thanks to Birei:
awk -F\| '{sub(/\n/,x, $0); a[$1]=a[$1]?a[$1]"|"$2:$2} END{for(i in a) printf i"|"a[i]" "}' RS=" "
1|Balaji|Kumar|Dinesh 3|India|China|Australia

Related

How to use awk to find the line starting with a variable

I know 2 things about awk:
1.
PAT='aGeneName'
awk -v var="$PAT" '$3 ~ var {print $0}' file.txt # will print the line where 3rd field includes the variable $PAT
2.
awk '$3 ~ /^aGeneName/' file.txt # will print the line where 3rd field starts with string "aGeneName"
But what I want is the combination of these two: I want to print the line where the 3rd field starts with the variable $PAT, something like
PAT='aGeneName'
awk -v var="$PAT" '$3 ~ /^var/ {print $0}' file.txt # but this is wrong, since variable can't be put into //
One way is like this:
PAT='aGeneName'
awk -v var="$PAT" '$3 ~ "^" var {print $0}' file.txt
The {print $0} can be omitted here; it is implied.
Another way, when the pattern var is a plain string with no regex characters in it:
PAT='aGeneName'
awk -v var="$PAT" 'index($3, var)==1' file.txt
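The difference between the two approaches shows up as soon as the variable contains a regex metacharacter. A small sketch with hypothetical data (the two input lines and the value `a.c` are made up for illustration):

```shell
# Hypothetical data: the third field of the second line has "b" where the first has a dot.
data='x y a.cGene
x y abcGene'
PAT='a.c'

# Regex match: the dot matches any character, so both lines are printed.
out_regex=$(printf '%s\n' "$data" | awk -v var="$PAT" '$3 ~ "^" var')

# Literal match: only the line whose third field really starts with "a.c".
out_literal=$(printf '%s\n' "$data" | awk -v var="$PAT" 'index($3, var) == 1')

printf 'regex:\n%s\nliteral:\n%s\n' "$out_regex" "$out_literal"
```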

Awk variables concatenated with symbols

I am writing a simple bash loop where I use awk to grab lines from a file. The pattern is iterated over in the loop. I can get the program to work fine until I try to add symbols to the variable used in awk for the search pattern
WORKING PROGRAM (first search term is "cat")
list="cat dog"
for k in $list
do
vark="$k"
awk '/'$vark'/{print RS $0}' RS=\> FILE1 > FILE2
done
But when I try to add the symbols "|", ">", and "_" to the variable, the loop breaks. I have tried multiple separators for the symbols but cannot seem to integrate them into the variable correctly.
BROKEN PROGRAM (first search term is "|cat>_")
list="cat dog"
for k in $list
do
vark="$k"
varkk="|$k>_"
awk '/'$varkk'/{print RS $0}' RS=\> FILE1 > FILE2
done
Thank you so much for your help!
The correct way to pass a shell/bash variable to awk is with the -v option, like -v varname="$shell_var"
From
awk '/'$varkk'/{print RS $0}' RS=\> FILE1 > FILE2
To
awk -v regexp="$varkk" -v RS='>' '$0 ~ regexp{print RS $0}' FILE1 > FILE2
You can also do this, using awk itself
awk '
FNR==NR{ arr[$0]; next }
{ for(i in arr)if($0 ~ i){ print RS $0; next} }
' pattern_file RS='>' FILE1 > FILE2
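One more thing to watch for: `|` is the alternation operator in a regex, so a search term that starts with it is not treated as plain text. If the term is meant literally, index() may be the safer test. Note also that with RS set to ">" a record never contains ">" itself, so a term that includes ">" cannot match inside a record. The sketch below therefore uses a hypothetical term `|cat_` and made-up FASTA-like input; both are assumptions, not the asker's data.

```shell
# Hypothetical FASTA-like input; records are separated by ">".
needle='|cat_'
out=$(printf '>sp|cat_1\nACGT\n>sp|dog_1\nTTGA\n' |
awk -v s="$needle" -v RS='>' '
# index() returns 0 when s is absent, so only matching records are printed
index($0, s) { printf "%s%s", RS, $0 }')
printf '%s\n' "$out"
```

Each record already ends with its own newline here, which is why the action uses printf rather than print.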

awk, print all columns and add new column with substr

I have this table
USI,Name,2D-3D
RO0001,Patate,2D
RO0002,Haricot,3D
RO0003,Banane,2D
RO0004,Pomme,2D
RO0005,Poire,2D
and I want this
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire
I managed to build the string "RO_2D_Patate" with awk:
awk -F "," '{print substr($1,1,2)"_"substr($3,1,2)"_"$2}' Test4.txt
But I want to print all my columns ($0) before it, as in my second table.
I have tried everything, but I am still a novice!
Any ideas?
awk -F, '{print $0 (NR>1 ? FS substr($1,1,2)"_"$3"_"$2 : "")}' Test4.txt
$ awk -F, -v OFS=, 'NR>1{$4=substr($1,1,2)"_"$3"_"$2}1' Test4.txt
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire
awk -F, 'NR>1{print $0,substr($1,1,2)"_"$NF"_"$2}/USI/' OFS=, file
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire

awk first line not working removing columns

I'm trying to remove columns beyond number 26 from all lines of a file, using this code:
awk '{ FS = ";" ; for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}'
It works well on every line except the first, where it shows 2 more fields (and cuts the last one in two).
Is there anything wrong in my code?
Thanks a lot
This is because you set FS on every line, which only takes effect from the next record on. It should be set in a BEGIN{} block (or outside as a parameter, as other answers correctly suggest):
awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file
In fact, to accomplish your goal it is easier to use cut:
cut -d';' -f-26 file
       ^    ^^^
       |    all fields up to the 26th
       delimiter
Example with 4 cols
sample file:
$ cat a
1col1;col2;col3;col4;col5;col6
2col1;col2;col3;col4;col5;col6
3col1;col2;col3;col4;col5;col6
previous code:
$ awk '{FS=";"; for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
2col1;col2;col3;col4
3col1;col2;col3;col4
new code:
$ awk 'BEGIN{FS=";"} {for(i=1;i<NF;i++) if (i<4) printf $i FS}{print $4}' a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
with cut:
$ cut -d';' -f-4 a
1col1;col2;col3;col4
2col1;col2;col3;col4
3col1;col2;col3;col4
You can try this awk:
awk -F';' 'NF>26{NF=26}1' OFS=';' yourfile
@fedorqui is right.
But you can also set the field separator with -F:
awk -F";" '{for(i=1;i<NF;i++) if (i<26) printf $i FS}{print $26}' file

awk to convert "{(linefeed)" to just "{"

I would like to use awk to convert "{(linefeed)" to just "{".
I tried this, without success:
awk '{gsub("{\n", "{")} input >output;
Any sensible, descriptive solutions?
Use GNU awk for multi-char RS to let you read the whole file at once:
awk -v RS='^$' -v ORS= '{gsub(/{\n/, "{")} 1' input >output
Your problem is that, unlike in perl, the record separator is not part of the record.
If the last character on the line is an open brace, print without a newline; otherwise print with a newline.
awk '/{$/ {printf "%s", $0; next} 1' file
or,
awk '{printf "%s%s", $0, /{$/ ? "" : ORS}' file
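A quick check of the first variant on a hypothetical three-line input (the sample text is made up). The brace is written as `[{]` in this sketch so that it is taken literally by any awk, which is an extra precaution and not part of the answer above.

```shell
# Lines ending in "{" are printed without a newline, so the next line joins them.
out=$(printf 'if (x) {\n  y;\n}\n' | awk '/[{]$/ {printf "%s", $0; next} 1')
printf '%s\n' "$out"
```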