Merge several lines into one until a special line is found - awk

I have a text file in the following format. A separator line (*****) appears after every few lines, as shown below:
aa
bb
cc
*****
dd
ee
*****
ff
ggg
hh
ii
*****
I'm expecting output like this:
aa,bb,cc
dd,ee
ff,ggg,hh,ii
I think awk or sed can help me, but I cannot figure out exactly how.
How do I merge those lines into one?
Thank you.

This awk one-liner should work for your input:
awk -v RS='[*]+' -v OFS="," '$1=$1' file
test with your data:
kent$ cat f
aa
bb
cc
*****
dd
ee
*****
ff
ggg
hh
ii
*****
kent$ awk -v RS='[*]+' -v OFS="," '$1=$1' f
aa,bb,cc
dd,ee
ff,ggg,hh,ii
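The one-liner above depends on a regular-expression record separator, which gawk and mawk accept but which POSIX leaves unspecified when RS is longer than one character. As a hedged alternative, here is a minimal sketch that collects lines in a buffer and flushes it at each separator line. The function name merge_blocks and the variable buf are illustrative only and do not come from the answer.

```shell
# merge_blocks: join lines with commas until a line made only of asterisks.
merge_blocks() {
  awk '
    # Separator line: print what has been collected and start over.
    /^[*]+$/ { if (buf != "") print buf; buf = ""; next }
    # Ordinary line: append it to the buffer, comma-separated.
    { buf = (buf == "" ? $0 : buf "," $0) }
    # Flush a trailing block that has no closing separator.
    END { if (buf != "") print buf }
  ' "$@"
}

printf '%s\n' aa bb cc '*****' dd ee '*****' ff ggg hh ii '*****' | merge_blocks
```

Unlike the RS version, this also prints a final block when the file does not end with a separator.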

An alternative to the nice awk solution, if your sed supports `\n' in the replacement:
$ tr '\n' ',' <file | sed 's/,\**,/\n/g'

How about something like this:
cat in.txt | tr ' ' ',' | tr '\n' ' ' | sed 's/\*\*\*\*\*/\n/' | sed 's/^ //'
This will result in:
aa,bb,cc dd,ee ff,ggg,hh,ii
ff,ggg,hh,ii dd,ee aa,bb,cc
for these input lines:
aa bb cc
dd ee
ff ggg hh ii
*****
ff ggg hh ii
dd ee
aa bb cc

This might work for you (GNU sed):
sed -r ':a;N;/\n\*+$/!s/\n/,/;ta;P;d' file
Append the next line to the current one and replace the joining newline with a comma, repeating until a line consisting only of *'s is encountered. Then print the first line and delete the pattern space.
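To make the loop easier to follow, the same script can be laid out one command per line with comments. This is only a sketch of the command above, written with -E instead of -r and run on a shortened sample; the variable name merged is illustrative.

```shell
merged=$(printf '%s\n' aa bb cc '*****' dd ee '*****' | sed -E '
# Label marking the top of the loop.
:a
# Append the next input line to the pattern space.
N
# Unless the appended line is a separator, turn the newline into a comma.
/\n\*+$/!s/\n/,/
# If a substitution was made, jump back to the label.
ta
# Separator reached: print up to the first newline, then start a new cycle.
P
d
')
printf '%s\n' "$merged"
```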

sed in action
sed -nr '/^[*]+ */!{H};/^[*]+ */{s/.*//g;x;s/\n/,/g;s/^,//g;p}' my_file
output
aa,bb,cc
dd,ee
ff,ggg,hh,ii
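The command above is given without explanation, so here is a commented sketch of how it appears to work: ordinary lines are collected in the hold space, and each separator line triggers a swap, a join and a print. One deliberate difference, which is an assumption on my part: the separator regex is anchored with $ so that only lines made up entirely of asterisks (plus optional trailing spaces) count as separators. The variable name joined is illustrative.

```shell
joined=$(printf '%s\n' aa bb cc '*****' dd ee '*****' | sed -nE '
# Ordinary line: append it to the hold space.
/^[*]+ *$/!H
# Separator line: empty it, swap in the collected lines, join them and print.
/^[*]+ *$/{
  s/.*//
  x
  s/\n/,/g
  s/^,//
  p
}
')
printf '%s\n' "$joined"
```

The s/^,// step is needed because H always inserts a newline before the appended line, so the collected text starts with one.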

Related

Replace word in line by combining after grep

I need to replace a word after performing a grep and getting the last line of the result.
Here my example file:
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 NONE
What I need is to select all lines containing 'aaa', get the last one in the result and replace NONE.
I tried
cat <file> | grep "aaa" | tail -n 1 | sed -i 's/NONE/ts8/g'
but it doesn't work.
Any suggestion to do that?
Thanks
With a tac + awk solution, please try the following:
tac Input_file | awk '/aaa/ && ++count==1{sub(/NONE/,"ts8")} 1' | tac
Once you are happy with the above command, try the following to save the result back into Input_file:
tac Input_file | awk '/aaa/ && ++count==1{sub(/NONE/,"ts8")} 1' | tac > temp && mv temp Input_file
Explanation: tac first prints Input_file in reverse order and sends that to awk, which replaces NONE with ts8 in the first line containing aaa (which is actually the last such line in the file). All other lines are printed unchanged, and the output is sent through tac again to restore the original order of Input_file.
For doing this in a single command, this should work in any version of awk (file{,} is bash brace expansion for file file, so the file is read twice):
awk 'FNR==NR {if ($1=="aaa") n=FNR; next} FNR == n {$3="ts8"} 1' file{,}
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 ts8
To save the output in the same file, use:
awk 'FNR==NR {if ($1=="aaa") n=FNR; next}
FNR == n {$3="ts8"} 1' file{,} > file.out && mv file.out file
Or, using GNU sed, you may use:
sed -i -Ez 's/(.*\naaa[[:blank:]]+[^[:blank:]]+[[:blank:]]+)NONE/\1ts8/' file
cat file
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 ts8
If you want to get the last line that matches aaa at the start, you can go through all lines, and in the END block, print the last occurrence and replace NONE with ts8 using awk:
awk '$1=="aaa"{last=$0}END{sub(/NONE/,"ts8",last);print last}' file
In parts:
awk '
$1=="aaa" { # If the first field is aaa
last=$0 # Set variable last to the whole line (overwrite on each match)
}
END { # Run once at the end
sub(/NONE/,"ts8",last) # Replace NONE with ts8 in the last variable
print last
}
' file
Output
aaa ts7 ts8
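The END-block command above prints only the modified last line. If the whole file is wanted with just that line changed, one possible single-pass sketch is to keep every line in an array and patch the last match at the end. This assumes the file fits comfortably in memory; the function name replace_in_last_match and the array name line are hypothetical.

```shell
# replace_in_last_match: print every line, with NONE replaced by ts8
# in the last line whose first field is aaa.
replace_in_last_match() {
  awk '
    # Remember every line by its line number.
    { line[NR] = $0 }
    # Track the number of the most recent matching line.
    $1 == "aaa" { last = NR }
    END {
      if (last) sub(/NONE/, "ts8", line[last])
      for (i = 1; i <= NR; i++) print line[i]
    }
  ' "$@"
}

printf '%s\n' 'aaa ts1 ts2' 'bbb ts3 ts4' 'aaa ts5 ts6' 'aaa ts7 NONE' | replace_in_last_match
```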

awk, merge two data sets based on column value

I need to combine two data sets stored in variables. The merge must be conditional on the value of the first column of "$x" matching the third column of "$y".
-->echo "$x"
12 hey
23 hello
34 hi
-->echo "$y"
aa bb 12
bb cc 55
ff gg 34
ss ww 23
With the following command, I managed to store the first column of $x in a[] and check it against the third column of $y, but I am not getting what I expect. Can someone please help here?
awk 'NR==FNR{a[$1]=$1;next} $3 in a{print $0,a[$1]}' <(echo "$x") <(echo "$y")
aa bb 12
ff gg 34
ss ww 23
Expected result:
aa bb 12 hey
ff gg 34 hi
ss ww 23 hello
Your answer is almost right:
awk 'NR==FNR{a[$1]=$2;next} ($3 in a){print $0,a[$3]}' <(echo "$x") <(echo "$y")
Note the a[$1]=$2 and the print $0,a[$3].
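The two-file idiom in that command can be sketched with comments as follows. To keep the example runnable in a plain POSIX shell, it reads the question's data from temporary files instead of process substitution; the names dir, merged and label are illustrative only.

```shell
dir=$(mktemp -d)
printf '%s\n' '12 hey' '23 hello' '34 hi' > "$dir/x.txt"
printf '%s\n' 'aa bb 12' 'bb cc 55' 'ff gg 34' 'ss ww 23' > "$dir/y.txt"

merged=$(awk '
  # First file (NR equals FNR only here): map the key in column 1 to its label.
  NR == FNR { label[$1] = $2; next }
  # Second file: when column 3 is a known key, print the line plus the label.
  ($3 in label) { print $0, label[$3] }
' "$dir/x.txt" "$dir/y.txt")

printf '%s\n' "$merged"
rm -rf "$dir"
```

Rows of the second file whose key has no match, such as bb cc 55, are simply skipped.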
join -1 1 -2 3 <(sort -k 1b,1 a.txt) <(sort -k 3b,3 b.txt) |awk '{print $3, $4, $1, $2 }'
This might be a solution if your input is in two text files, a.txt and b.txt, using join on the two number columns.
It does not keep the original order, though; you may have to sort again if that is important.

How to filter empty line with a 'cut' command?

I have a tab delimited file with a few fields:
f1 f2 f3
a b c
a c
d e
f g a
I want to extract the 3rd column with a 'cut' command:
cut -f3 t
This works. However, how can I filter out the empty lines in the output? As can be seen, the 2nd and 3rd lines are empty after they are extracted.
To remove empty output:
$ cut -f3 file | grep .
f3
c
a
Or:
$ awk -F'\t' '$3 {print $3}' file
f3
c
a
To replace the missing output with a filler:
$ awk -F'\t' '{if ($3) print $3; else print "FILL"}' file
f3
c
FILL
FILL
a
Or, for people who like the more compact ternary statement:
$ awk -F'\t' '{print ($3?$3:"FILL")}' file
f3
c
FILL
FILL
a
Example with multiple words in field 3
$ cat file2
f1 f2 f3
f g a b c d
$ cut -f3 file2 | grep .
f3
a b c d
$ awk -F'\t' '$3 {print $3}' file2
f3
a b c d

how to append a line with sed/awk after specific text

I would like to transform this input file using sed or awk:
input
1 AA
3 BB
5 CC
output
1 AA
3 BB
3 GG
5 CC
The closest syntax I found on this site is sed -i '/^BB:/ s/$/ GG/' file, but it produces 3 BB GG. What I need is similar to a vi yank, paste and regex replace.
Can this be done with sed or awk? Thanks
Rand
With GNU sed:
sed -r 's/^([^ ]*) BB$/&\n\1 GG/' file
Output:
1 AA
3 BB
3 GG
5 CC
This might work for you (GNU sed):
sed '/BB/p;s//GG/' file
If the line contains the required string, print it, then substitute the other string for it (the empty pattern in s//GG/ reuses the last regex, BB).
awk is a fine choice for this:
awk '{print $0} $2=="BB"{print $1,"GG"}' yourfile.txt
{print $0} prints every line. Then, if the second field in the line is equal to "BB", it also prints the first field in the line (the number) followed by the text "GG".
Example in use:
>printf '1 AA\n3 BB\n4 RR\n' | awk '{print $0} $2=="BB"{print $1,"GG"}'
1 AA
3 BB
3 GG
4 RR
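If the matched value and the inserted value need to vary, the same awk program can take them as variables. The following is a minimal sketch under that assumption; the function name append_after and the awk variables key and new are hypothetical and not part of the answer.

```shell
# append_after KEY NEW [FILE...]: after each line whose second field is KEY,
# print an extra line with the same first field and NEW as the second field.
append_after() {
  key=$1
  new=$2
  shift 2
  awk -v key="$key" -v new="$new" '{ print } $2 == key { print $1, new }' "$@"
}

printf '1 AA\n3 BB\n5 CC\n' | append_after BB GG
```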
In awk:
$ awk '1; /BB/ && $2="GG"' input
1 AA
3 BB
3 GG
5 CC
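1 prints the record. If the record just printed contains BB, the assignment $2="GG" replaces its second field with GG; since the assignment evaluates to a non-empty string, the modified record is printed again.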

Reading a file from line 4 to the end

I want to read a file from line 4 to the very end. Is there any way to do this with awk or something similar?
This sed command will do:
sed -n '4,$p' file.txt
Or using awk:
awk 'NR>=4' file.txt
Or using tail:
tail -n +4 file.txt
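As a quick check that these approaches agree, here is a small sketch that wraps the awk form in a function taking the starting line as a parameter and compares it with sed and with tail -n +4 (the portable spelling of the tail form). The function name from_line and the variable names are illustrative; seq is assumed to be available.

```shell
# from_line N [FILE...]: print input starting at line N.
from_line() {
  start=$1
  shift
  awk -v start="$start" 'NR >= start' "$@"
}

via_awk=$(seq 101 110 | from_line 4)
via_sed=$(seq 101 110 | sed -n '4,$p')
via_tail=$(seq 101 110 | tail -n +4)
printf '%s\n' "$via_awk"
```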
awk 'NR >= 4 {print $0}'
For example
$> seq 101 110 | awk 'NR >= 4 {print $0}'
104
105
106
107
108
109
110
tail -n +4 filename will serve your purpose.
more on tail
Here's a method (it can depend on the type of shell you use; bash should work):
tmpvar=`cat a_file | wc -l `; tail -n $((tmpvar-3)) a_file
Here's another method that should work in more shells:
cat -n a_file | awk '$1>=4 {sub(/^[ \t]*[0-9]+\t/, ""); print}'