Join lines into one line using awk - awk

I have a file with the following records
ABC
BCD
CDE
EFG
I would like to convert this into
'ABC','BCD','CDE','EFG'
I attempted to attack this problem using Awk in the following way:
awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}'
but I obtain not what I expected:
ABC,BCD,CDE,EFG
Are there any suggestions on how we can achieve this?

Could you please try following.
awk -v s1="'" 'BEGIN{OFS=","} {val=val?val OFS s1 $0 s1:s1 $0 s1} END{print val}' Input_file
Output will be as follows.
'ABC','BCD','CDE','EFG'

With GNU awk for multi-char RS:
$ awk -v RS='\n$' -F'\n' -v OFS="','" -v q="'" '{$1=$1; print q $0 q}' file
'ABC','BCD','CDE','EFG'

There are many ways of achieving this:
with pipes:
sed "s/.*/'&'/" <file> | paste -sd,
awk '{print '"'"'$0'"'"'}' <file> | paste -sd,
remark: we do not make use of tr here as this would lead to an extra , at the end.
reading the full file into memory:
sed ':a;N;$!ba;s/\n/'"','"'/g;s/.*/'"'&'"'/g' <file> #POSIX
sed -z 's/^\|\n$/'"'"'/g;s/\n/'"','"'/g;' <file> #GNU
and the solution of #EdMorton
without reading the full file into memory:
awk '{printf (NR>1?",":"")"\047"$0"\047"}' <file>
and some random other attempts:
awk '(NR-1){s=s","}{s=s"\047"$0"\047"}END{print s}' <file>
awk 'BEGIN{printf s="\047";ORS=s","s}(NR>1){print t}{t=$0}END{ORS=s;print t} <file>
So what is going on with the OP's attempts?
Writing down the OP's awk line, we have
/START/{if (x)print x;x="";next}
{x=(!x)?$0:x","$0;}
END{print x;}
What does this do? Let us analyze step by step:
/START/{if (x)print x;x="";next}:: This reads If the current record/line contains the string START, then do
if (x) print x:: if x is not an empty string, print the value of x
x="" set x to be an empty string
next:: skip to the next record/line
In this code block, the OP probably assumed that /START/ means do this at the beginning of all things. In awk, this is however written as BEGIN and since in the beginning, all variables are empty strings or zero, the if statement is not executed by default. This block could be replaced by:
BEGIN{x=""}
But again, this is not needed and thus one can remove it:
{x=(!x)?$0:x","$0;}:: concatenate the string with the correct delimiter. This is good, especially due to the usage of the ternary operator. Sadly the delimiter is set to , and not ',' which in awk is best written as \047,\047. So the line could read:
{x=(!x)?$0:x"\047,\047"$0;}
This line, can be written shorter if you realize that x could be an empty string. For an empty string, x=$0 is equivalent to x=x $0 and all you want to do is add a separator which all or not could be an empty string. So you can write this as
{x= x ((!x)?"":"\047,\047") $0}
or inverting the logic to get rid of some more characters:
{x=x(x?"\047,\047":"")$0}
one could even write
{x=x(x?"\047,\047":x)$0}
but this is not optimal as it needs to read what is the memory of x again. However, this form can be used to finally optimize it to (per #EdMorton's comment)
{x=(x?x"\047,\047":"")$0}
This is better as it removes an extra concatenation operator.
END{print x}:: Here the OP prints the result. This, however, will miss the final single-quotes at the beginning and end of the string, so they could be added
END{print "\047" x "\047"}
So the corrected version of the OP's code would read:
awk '{x=(x?x"\047,\047":"")$0}END{print "\047" x "\047"}'

awk may be better
awk '{printf fmt,$1}' fmt="'%s'\n" file | paste -sd, -
'ABC','BCD','CDE','EFG'

Related

What does this Awk expression mean

I am working with bash script that has this command in it.
awk -F ‘‘ ‘/abc/{print $3}’|xargs
What is the meaning of this command?? Assume input is provided to awk.
The quick answer is it'll do different things depending on the version of awk you're running and how many fields of output the awk script produces.
I assume you meant to write:
awk -F '' '/abc/{print $3}'|xargs
not the syntactically invalid (due to "smart quotes"):
awk -F ‘’’/abc/{print $3}’|xargs
-F '' is undefined behavior per POSIX so what it will do depends on the version of awk you're running. In some awks it'll split the current line into 1 character per field. in others it'll be ignored and the line will be split into fields at every sequence of white space. In other awks still it could do anything else.
/abc/ looks for a string matching the regexp abc on the current line and if found invokes the subsequent action, in this case {print $3}.
However it's split into fields, print $3 will print the 3rd such field.
xargs as used will just print chunks of the multi-line input it's getting all on 1 line so you could get 1 line of all-fields output if you don't have many fields being output or several lines of multi-field output if you do.
I suspect the intent of that code was to do what this code actually will do in any awk alone:
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
e.g.:
$ printf 'foo\nxabc\nyzabc\nbar\n' |
awk '/abc/{printf "%s%s", sep, substr($0,3,1); sep=OFS} END{print ""}'
b a

Issue with field separator in AWK script

Having a very large file where two lines shown below and having two fields name and revision having colon delimiter. I need to print only the second column.
sam:7.[0:6]
Ram:8.[6:6]_rev[2:4] h_ack[2:6]
vincent:58
I tried this code:
#!/bin/bash
awk -F: '{print $2}'
7.[0
8.[6
58
Output should be:
7.[0:6]
8.[6:6]_rev[2:4] h_ack[2:6]
58
What went wrong in my code.
The problem in your awk expression is that you are splitting on all :.
Instead, you want to split only on the first : from the start.
$ awk -F'^[^:]+:' '{print $2}' file
The regex pattern matches the start of the string ^, any character other than a :, and finally a :.
If you specify field separator as :, it's normal behavior of awk to output this, ex:
7.[0, because you need the other columns after $2.
cut here, better suits the requirement:
cut -d: -f2- file
Could you please try following.
awk '
match($0,/:.*/){
print substr($0,RSTART+1,RLENGTH-1)
}
' Input_file

How can i merge every block of 3 lines together while ignoring lower numbers of consecutive lines?

I have a text file like the following, contains blocks of text, blocks are in multiples of 3 lines or just 1 line:
AAAAAAAAAAAAA
BBBBBBBBBBBBB
CCCCCCCCCCCCC
DDDDDDDDDDDDD
EEEEEEEEEEEEE
FFFFFFFFFFFFF
GGGGGGGGGGGGG
HHHHHHHHHHHHH
IIIIIIIIIIIII
JJJJJJJJJJJJJ
KKKKKKKKKKKKK
LLLLLLLLLLLLL
MMMMMMMMMMMMM
NNNNNNNNNNNNN
OOOOOOOOOOOOO
PPPPPPPPPPPPP
QQQQQQQQQQQQQ
RRRRRRRRRRRRR
SSSSSSSSSSSSS
TTTTTTTTTTTTT
UUUUUUUUUUUUU
VVVVVVVVVVVVV
WWWWWWWWWWWWW
XXXXXXXXXXXXX
YYYYYYYYYYYYY
ZZZZZZZZZZZZZ
1111111111111
I would like to merge every block of 3 consecutive lines together, starting with the first in the block. I want to ignore lines that are in less then a block of 3 consecutive lines.
Characters and lengths of lines are always different. ( i have made the lines the same size in the example so it doesn't look too ugly).
So the output would be
AAAAAAAAAAAAA BBBBBBBBBBBBB CCCCCCCCCCCCC
DDDDDDDDDDDDD EEEEEEEEEEEEE FFFFFFFFFFFFF
GGGGGGGGGGGGG
HHHHHHHHHHHHH IIIIIIIIIIIII JJJJJJJJJJJJJ
KKKKKKKKKKKKK
LLLLLLLLLLLLL MMMMMMMMMMMMM NNNNNNNNNNNNN
OOOOOOOOOOOOO PPPPPPPPPPPPP QQQQQQQQQQQQQ
RRRRRRRRRRRRR SSSSSSSSSSSSS TTTTTTTTTTTTT
UUUUUUUUUUUUU
VVVVVVVVVVVVV WWWWWWWWWWWWW XXXXXXXXXXXXX
YYYYYYYYYYYYY ZZZZZZZZZZZZZ 1111111111111
I have tried to use
xargs -n3
However im not sure how to ignore singular lines
How can i acheive this?
With GNU awk for gensub():
$ awk -v RS= -v ORS='\n\n' '{$1=$1; print gensub(/(([^ ]+ ){2}[^ ]+) /,"\\1\n","g")}' file
AAAAAAAAAAAAA BBBBBBBBBBBBB CCCCCCCCCCCCC
DDDDDDDDDDDDD EEEEEEEEEEEEE FFFFFFFFFFFFF
GGGGGGGGGGGGG
HHHHHHHHHHHHH IIIIIIIIIIIII JJJJJJJJJJJJJ
KKKKKKKKKKKKK
LLLLLLLLLLLLL MMMMMMMMMMMMM NNNNNNNNNNNNN
OOOOOOOOOOOOO PPPPPPPPPPPPP QQQQQQQQQQQQQ
RRRRRRRRRRRRR SSSSSSSSSSSSS TTTTTTTTTTTTT
UUUUUUUUUUUUU
VVVVVVVVVVVVV WWWWWWWWWWWWW XXXXXXXXXXXXX
YYYYYYYYYYYYY ZZZZZZZZZZZZZ 1111111111111
In awk:
$ awk -v FS="\n" -v RS="" '{for(i=1;i<=NF;i+=3)print $i,$(i+1),$(i+2);print ""}' file
Output:
AAAAAAAAAAAAA BBBBBBBBBBBBB CCCCCCCCCCCCC
DDDDDDDDDDDDD EEEEEEEEEEEEE FFFFFFFFFFFFF
GGGGGGGGGGGGG
HHHHHHHHHHHHH IIIIIIIIIIIII JJJJJJJJJJJJJ
...
Update Version that won't leave trailing space:
$ awk -v FS="\n" -v RS="" '{for(i=1;i<=NF;i++)printf "%s%s",$i,(i%3==0||i==NF?ORS:OFS);print ""}' file
Please see discussion on some features in the comments. Thanks to the commentators for the constructive feedback.
Here is a different which will always work:
awk '(NF==0){print rec ORS; rec="";c=0; next}
{rec = rec (c ? (c%3==0 ? ORS : OFS) : "") $0; c++ }
END {print rec}' file
This might work for you (GNU sed):
sed '/\S/{N;/\n\s*$/b;N;//b;s/\n/ /g}' file
If the current line is not empty, append the next line.
If the appended line is not empty, append the next line.
If that line is also not empty, replace the newlines by spaces.
In all other cases print the line(s) as is.
An alternative, that is more programmatic:
sed ':a;N;s/\n/&/2;Ta;/^\s*$/M{P;D};s/\n/ /g' file

AWK get specificic pattern

I have lines like this:
Volume.Free_IBM_LUN59_28D: 2072083693568
I would like to get only IBM_LUN59_28D from this line using awk.
Thanks
You can use sub to do substitutions on each input line, as per the following transcript:
pax> echo 'Volume.Free_IBM_LUN59_28D: 2072083693568' | awk '
...> {
...> sub (".*Free_", "");
...> sub (":.*", "");
...> print
...> }'
IBM_LUN59_28D
That command crosses multiple lines for readability but, if you're operating on a file and not too concerned about readability, you can just use the compressed version:
awk '{sub(".*Free_","");sub(":.*","");print}' inputFile
If you're amenable to non-awk solutions, you could also use sed:
sed -e 's/.*Free_//' -e 's/:.*//' inputFile
Note that both those solutions rely on your (somewhat sparse) test data. If your definition of "like" includes preceding textual segments other than Free_ or subsequent characters other than :, some more work may be needed.
For example, if you wanted the string between the first _ and the first :, you could use:
awk '{sub("[^_]*_","");sub(":.*","");print}'
With sed:
sed 's/[^_]*_\(.*\):.*/\1/'
Search for sequence of non _ characters followed by _ (this will match Volume.Free_), then another sequence of characters (this will match IBM_LUN59_28D, we group this for future use), followed by : and any char sequence. Substitute with the saved pattern (\1). That's it.
Sample:
$ echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | sed 's/[^_]*_\(.*\):.*/\1/'
IBM_LUN59_28D
Here is one awk
awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
Eks:
echo "Volume.Free_IBM_LUN59_28D: 2072083693568" | awk -F"Free_" 'NF>1{split($2,a,":");print a[1]}'
IBM_LUN59_28D
It divides the line by Free_.
If line then have more than one field NF>1 then:
Split second field bye : and print first part a[1]
With awk:
echo "$val" | awk -F: '{print $1}' | awk -F. '{print $2}' | awk '{print substr($0,6)}'
where the given string is in $val.

Awk and dollar sign in record separator

I'm trying to parse a file that has, for whatever reason, the string "&($)" as record separator and "(#)$" as a field separator. I couldn't get awk to parse the file by specifying these as RS and FS in the BEGIN block. I'm using gnu awk 3.1.7 and it complains saying that there's a syntax error but couldn't find how to escape the dollar sign (assuming that's what it doesn't like).
$ awk 'BEGIN{FS="(#)$" RS="&($)"} {} END{print NR}' some-file.txt
awk: BEGIN{FS="(#)$" RS="&($)"} {} END{print NR}
awk: ^ syntax error
Appreciate help from the experts.
Thanks,
K
you need escape those chars with special meaning in regex.
kent$ cat f
foo(#)$bar(#)$blah&($)foo2(#)$bar2(#)$blah2
kent$ awk 'BEGIN{FS="\\(#\\)\\$";RS="&\\(\\$\\)"}{print NR,NF}' f
1 3
2 3
As you are defining two values in the BEGIN{} block, you are missing a semi colon to separate them:
awk 'BEGIN{FS="(#)$"; RS="&($)"} {} END{print NR}' file
^
You can also do
awk 'BEGIN{FS="(#)$"} {} END{print NR}' RS="&($)" file
Regarding the use of these separators, note as well what Kent is commenting in his answer: you need to escape them.
$ cat a
hello(#)$this(#)$is one record&($)and this another one
$ awk 'BEGIN{FS="\\(\\#\\)\\$"} {print $1, NR, NF}' RS="\\&\\(\\$\\)" a
hello 1 3
and this another one
2 1