Append lines to a previous line

Append lines to a previous line - awk

I am trying to append all lines that begin with > to the previous line that did not begin with >
cat tmp
ATAAACGGAAAAACACTACTTTAGCTTACGGGATCCGGT
>Aa_816
>Aa_817
>Aa_818
CCAAACGGAAAAACACTACTTGAGCTTACGGGATCCGGT
>Aa_940
>Aa_941
CTAAAAGGAAAAACACTACTTTAGCTTTTGGGATCCGGT
What I want is this:
ATAAACGGAAAAACACTACTTTAGCTTACGGGATCCGGT >Aa_816 >Aa_817 >Aa_818
CCAAACGGAAAAACACTACTTGAGCTTACGGGATCCGGT >Aa_940 >Aa_941
CTAAAAGGAAAAACACTACTTTAGCTTTTGGGATCCGGT
This almost gets me there:
cat tmp |awk '!/>/ {sub(/\\$/,""); getline t; print $0 t; next}; 1'

With awk:
awk '!/^>/{printf "%s%s", (NR==1)?"":RS,$0;next}{printf "%s", FS $0}END{print ""}' file

Using awk:
awk '!/>/{printf "%s", ((NR==1) ? $0 : RS $0); next} {printf "%s", FS $0}' file
If you don't mind an extra empty line at the start of the output, here is a shorter one:
awk '{printf "%s", (/>/ ? FS $0 : RS $0)}' file

I think all you need is a little sed:
sed ':a; N; $!ba; s/\n>/ >/g' file
Results:
ATAAACGGAAAAACACTACTTTAGCTTACGGGATCCGGT >Aa_816 >Aa_817 >Aa_818
CCAAACGGAAAAACACTACTTGAGCTTACGGGATCCGGT >Aa_940 >Aa_941
CTAAAAGGAAAAACACTACTTTAGCTTTTGGGATCCGGT

awk '/^[^>]/ { if (length(old) > 0) print old; old = $0 }
/^>/ { old = old " " $0 }
END { if (length(old) > 0) print old }' file

Related

AWK: How to number auto-increment?

I have a file. The file content is:
20210126000880000003|3|33.00|20210126|15:30
1|20210126000000000000000000002207|1220210126080109|1000|100000000000000319|100058110000000325|402041000012|402041000012|PT07|621067000000123645|收款方户名|2021-01-26|2021-01-26|10.00|TN|NCS|12|875466
2|20210126000000000000000000002208|1220210126080110|1000|100000000000000319|100058110000000325|402041000012|402041000012|PT06|621067000000123645|收款方户名|2021-01-26|2021-01-26|20.00|TN|NCS|12|875466
3|20210126000000000000000000002209|1220210126080111|1000|100000000000000319|100058110000000325|402041000012|402041000012|PT08|621067000000123645|收款方户名|2021-01-26|2021-01-26|3.00|TN|NCS|12|875466
I use this awk command:
awk -F"|" 'NR==1{print $1};FNR==2{print $2,$3}' testfile
I get the following result:
20210126000880000003
20210126000000000000000000002207 1220210126080109
I want the numbers to auto-increment, so I tried:
awk -F"|" 'NR==1{print $1+1};FNR==2{print $2+1,$3+1}' testfile
But I get the following result:
20210126000880001024
20210126000000000944237587726336 1220210126080110
My question: I want the numbers to auto-increment. I hope to get this result:
20210126000880000003
20210126000000000000000000002207|1220210126080109
-------------------------------------------------
20210126000880000004
20210126000000000000000000002208|1220210126080110
--------------------------------------------------
20210126000880000005
20210126000000000000000000002209|1220210126080111
How can I auto-increment them?
Thanks!

You may try this GNU awk command (-M enables arbitrary-precision arithmetic):
awk -M 'BEGIN {FS=OFS="|"} NR == 1 {hdr = $1; next} NF>2 {print ++hdr; print $2, $3; print "-------------------"}' file
20210126000880000004
20210126000000000000000000002207|1220210126080109
-------------------
20210126000880000005
20210126000000000000000000002208|1220210126080110
-------------------
20210126000880000006
20210126000000000000000000002209|1220210126080111
-------------------
A more readable version:
awk -M 'BEGIN {
FS=OFS="|"
}
NR == 1 {
hdr = $1
next
}
NF > 2 {
print ++hdr
print $2, $3
print "-------------------"
}' file
Here is a POSIX awk solution that doesn't need -M:
awk 'BEGIN {FS=OFS="|"} NR == 1 {hdr = $1; next} NF>2 {"echo " hdr " + 1 | bc" | getline hdr; print hdr; print $2, $3; print "-------------------"}' file
20210126000880000004
20210126000000000000000000002207|1220210126080109
-------------------
20210126000880000005
20210126000000000000000000002208|1220210126080110
-------------------
20210126000880000006
20210126000000000000000000002209|1220210126080111
-------------------
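For context, awk normally stores numbers as double-precision floating point, which cannot represent every 20-digit integer exactly; that is why $1+1 in the question produced a wrong value. If neither -M nor bc is available, one possible workaround is to increment the number as a string, digit by digit. The following is only a sketch: inc() is a hypothetical helper, not a built-in, and the input is a shortened copy of the sample data.

```shell
#!/bin/sh
# Sketch: increment a long decimal string without numeric conversion.
# inc() is a hypothetical helper written for this example.
out=$(printf '%s\n' \
  '20210126000880000003|3|33.00|20210126|15:30' \
  '1|20210126000000000000000000002207|1220210126080109|1000' \
  '2|20210126000000000000000000002208|1220210126080110|1000' |
awk '
function inc(s,    i, d) {
    # Walk from the last digit to the first, carrying while digits are 9.
    for (i = length(s); i > 0; i--) {
        d = substr(s, i, 1)
        if (d != "9")
            return substr(s, 1, i - 1) (d + 1) substr(s, i + 1)
        s = substr(s, 1, i - 1) "0" substr(s, i + 1)   # 9 rolls over to 0
    }
    return "1" s   # every digit was 9
}
BEGIN   { FS = OFS = "|" }
NR == 1 { hdr = $1; next }
NF > 2  { hdr = inc(hdr); print hdr; print $2, $3; print "-------------------" }
')
printf '%s\n' "$out"
```

As with the commands above, the first header printed is the incremented one (…0004), because the increment happens before the print.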

Anubhava has the best solution but for older versions of GNU awk that don't support -M (big numbers) you can try the following:
awk -F\| 'NR==1 { print $1;hed=$1;hed1=substr($1,(length($1)-1));next; } !/^$/ {print $2" "$3 } /^$/ { print "--------------------------------------------------";printf "%s%s\n",substr(hed,1,((length(hed))-(length(hed1)+1))),++hed1 }' testfile
Explanation:
awk -F\| 'NR==1 { # Set field delimiter to | and process the first line
print $1; # Print the first field
hed=$1; # Set the variable hed to the first field
hed1=substr($1,(length($1)-1)); # Set a counter variable hed1 to the last two characters of hed ($1)
next;
}
!/^$/ {
print $2" "$3 # Where there is no blank line, print the second field, a space and the third field
}
/^$/ {
print "--------------------------------------------------"; # Where there is a blank line, process
printf "%s%s\n",substr(hed,1,((length(hed))-(length(hed1)+1))),++hed1 # print the header extract before the counter, followed by the incremented counter
}' testfile

Merge lines based on first column without delimiter

I need to merge all the lines that have the same value on the first column.
The input file is the following:
34600000031|(1|1|0|1|1|20190114180000|20191027185959)
34600000031|(2|2|0|2|2|20190114180000|20191027185959)
34600000031|(3|3|0|3|3|20190114180000|20191027185959)
34600000031|(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)
34600000015|(2|2|100|2|9|20190114180000|20191027185959)
34600000015|(3|3|100|3|10|20190114180000|20191027185959)
34600000015|(4|4|100|4|11|20190114180000|20191027185959)
I was able to partially achieve it using the following:
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}{sub(p,x); s=s $0} END{print s}' INPUT
The output is the following:
34600000031|(1|1|0|1|1|20190114180000|20191027185959)|(2|2|0|2|2|20190114180000|20191027185959)|(3|3|0|3|3|20190114180000|20191027185959)|(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)|(2|2|100|2|9|20190114180000|20191027185959)|(3|3|100|3|10|20190114180000|20191027185959)|(4|4|100|4|11|20190114180000|20191027185959)
What I need (and I cannot find out how to get) is the following:
34600000031|(1|1|0|1|1|20190114180000|20191027185959)(2|2|0|2|2|20190114180000|20191027185959)(3|3|0|3|3|20190114180000|20191027185959)(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)(2|2|100|2|9|20190114180000|20191027185959)(3|3|100|3|10|20190114180000|20191027185959)(4|4|100|4|11|20190114180000|20191027185959)
I could do a sed after the initial awk but I don't believe that this is the proper way to do it.

You need to substitute the separator in the values too. Your fixed awk command would look like this:
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}{sub(p "\\|",x); s=s $0} END{print s}'
But it's also good to anchor the match at the beginning of the string:
awk -F'|' '$1!=p{if(p)print s; p=$1; s=$0; next}{sub("^" p "\\|",x); s=s $0} END{print s}'
I would do it somewhat simpler, which uses more memory (as it stores everything in an array) but doesn't need the file to be sorted:
awk -F'|' '{ k=$1; sub("^" $1 "\\|", ""); a[k] = a[k] $0 } END{ for (i in a) print i "|" a[i] }'
For each line, remember the first field, replace the first field and the | that follows it with nothing, then append the rest to an array element indexed by the first field. At the end, print each element in the array as key, separator and value. Note that for (i in a) visits the keys in an unspecified order.
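If the output should follow the order in which keys first appear, the array approach can be extended with a second array that records that order. This is a sketch under assumptions: the names order, n and rest are made up for the example, the input is shortened and deliberately unsorted, and substr() is used instead of sub() so the key is never treated as a regular expression.

```shell
#!/bin/sh
# Sketch: merge lines by first field, keeping first-seen key order.
out=$(printf '%s\n' \
  '34600000031|(1|1|0)' \
  '34600000015|(1|1|100)' \
  '34600000031|(2|2|0)' \
  '34600000015|(2|2|100)' |
awk -F'|' '
{
    k = $1
    rest = substr($0, length(k) + 2)   # everything after the first "|"
    if (!(k in a)) order[++n] = k      # remember the order keys first appear in
    a[k] = a[k] rest
}
END { for (i = 1; i <= n; i++) print order[i] "|" a[order[i]] }
')
printf '%s\n' "$out"
```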

$ awk -F'|' '
{
curr = $1
sub(/^[^|]+\|/,"")
printf "%s%s", (curr==prev ? "" : ors curr FS), $0
ors = ORS
prev = curr
}
END { print "" }
' file
34600000031|(1|1|0|1|1|20190114180000|20191027185959)(2|2|0|2|2|20190114180000|20191027185959)(3|3|0|3|3|20190114180000|20191027185959)(4|4|0|4|4|20190114180000|20191027185959)
34600000015|(1|1|100|1|8|20190114180000|20191027185959)(2|2|100|2|9|20190114180000|20191027185959)(3|3|100|3|10|20190114180000|20191027185959)(4|4|100|4|11|20190114180000|20191027185959)

printing information of two files according specific field

I have two files. When the first field of a line exists in both files, I need to print the matching lines from both files, as in the example below.
file 1
20;"aaaaaa";99292929
24;"fsfdfa";42933294
30;"fsdsff";23832299
38;"fjsdjl";62673777
file 2
13;"fsdffsdfs";2272777
20;"ffuiiii";23728877
30;"wdwfsdh";8882817
40;"sfjslll";82371111
Expected result:
file1;20;"aaaaaa";99292929;file2;20;"ffuiiii";23728877
file1;30;"fsdsff";23832299;file2;30;"wdwfsdh";8882817
I tried with:
awk 'FNR==NR{a[$1]=$1;next} $1 in a' file2 file1 > newfile
Logically it's OK, but I can't get it to show the fields that I want.

awk will help:
awk -F ';' 'NR==FNR{rec[$1]=FILENAME FS $0}
NR>FNR{
if($1 in rec){
print rec[$1] FS FILENAME FS $0
}
}' file{1..2}
should do.

$ cat tst.awk
BEGIN { FS=OFS=";" }
{ $0 = FILENAME FS $0 }
NR==FNR { a[$2] = $0; next }
$2 in a { print a[$2], $0 }
$ awk -f tst.awk file1 file2
file1;20;"aaaaaa";99292929;file2;20;"ffuiiii";23728877
file1;30;"fsdsff";23832299;file2;30;"wdwfsdh";8882817

match two files with awk and output selected fields

I want to compare two files delimited with ; and, for lines with the same field1, output field2 of file1 followed by field2 and field1 of file2.
File1:
16003-Z/VG043;204352
16003/C3;100947
16003/C3;172973
16003/PAB4L;62245
16003;100530
16003;101691
16003;144786
File2:
16003-Z/VG043;568E;0540575;2.59
16003/C3;568E;0000340;2.53
16003/PAB4L;568H;0606738;9.74
16003;568E;0000339;0.71
16003TN9/C3;568E;0042261;3.29
Desired output:
204352;568E;16003-Z/VG043
100947;568E;16003/C3
172973;568E;16003/C3
62245;568H;16003/PAB4L
100530;568E;16003
101691;568E;16003
144786;568E;16003
My try:
awk -F\, '{FS=";"} NR==FNR {a[$1]; next} ($1) in a{ print a[$2]";"$2";"$3}' File1 File2 > Output
The above is not working, probably because awk is still obscure to me.
What is driving the output, and which file's fields do $1, $2, etc. refer to?
I intended a[$2] to be field2 of File1, but it is not.
What I get is:
;204352;16003-Z/VG043
;100947;16003/C3
;172973;16003/C3
;62245;16003/PAB4L
;100530;16003
;101691;16003
;144786;16003
thanks for helping

This might be what you are after:
awk -F";" '(NR==FNR) { a[$1] = ($1 in a ? a[$1] FS : "") $2; next }
($1 in a) { n = split(a[$1],b); for(i=1; i<=n; i++) print b[i] FS $2 FS $1 }' file1 file2
This outputs:
204352;568E;16003-Z/VG043
100947;568E;16003/C3
172973;568E;16003/C3
62245;568H;16003/PAB4L
100530;568E;16003
101691;568E;16003
144786;568E;16003
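To see which file $1 and $2 refer to at each point, it may help to read a commented variant of the two-file idiom. The sketch below is not the command above; it stores each file1 value under a (key, counter) index instead of a joined string, and the array names n and val, the temporary directory, and the shortened input are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: annotated two-file lookup on shortened sample data.
dir=$(mktemp -d)
printf '%s\n' '16003/C3;100947' '16003/C3;172973' '16003;100530' > "$dir/file1"
printf '%s\n' '16003/C3;568E;0000340;2.53' '16003;568E;0000339;0.71' \
  '16003TN9/C3;568E;0042261;3.29' > "$dir/file2"
out=$(awk -F';' '
# NR counts lines over all files, FNR restarts for each file,
# so NR == FNR is true only while the first file is being read.
NR == FNR {
    n[$1]++                  # how many file1 lines share this key
    val[$1, n[$1]] = $2      # field2 of file1, kept in input order
    next                     # do not run the rule below on file1 lines
}
# From here on, $1 and $2 are fields of the current file2 line.
$1 in n {
    for (i = 1; i <= n[$1]; i++)
        print val[$1, i] FS $2 FS $1
}' "$dir/file1" "$dir/file2")
printf '%s\n' "$out"
rm -rf "$dir"
```

The file2 line is what drives the output: each one that has a key seen in file1 prints once per stored file1 value.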

This approach first reads file_1.txt into an associative array, table, so that values can be associated with ids across files. Then, looping over the second file, file_2.txt, I print the values in table that match the id field of that file, along with the current value:
BEGIN {
FS=OFS=";"
while ((getline < first) > 0)
table[$1] = $2 FS table[$1]
}
$1 in table {
len = split(table[$1], parts)
for (i=1; i<len; i++)
print parts[i], $2, $1
}
$ awk -v first=file_1.txt -f script.awk file_2.txt
204352;568E;16003-Z/VG043
172973;568E;16003/C3
100947;568E;16003/C3
62245;568H;16003/PAB4L
144786;568E;16003
101691;568E;16003
100530;568E;16003

print unique lines based on field

I would like to print unique lines based on the first field, keeping the first occurrence of each line and removing the later duplicates.
Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc
Desired Output:
10,15-10-2014,abc
20,12-10-2014,bcd
40,06-10-2014,ghi
I have tried the command below, but it is incomplete:
awk 'BEGIN { FS = OFS = "," } { !seen[$1]++ } END { for ( i in seen) print $0}' Input.csv
Looking for your suggestions ...

You put your test for "seen" in the action part of the script instead of the condition part. Change it to:
awk -F, '!seen[$1]++' Input.csv
Yes, that's the whole script:
$ cat Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc
$
$ awk -F, '!seen[$1]++' Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
40,06-10-2014,ghi
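The one-liner works because a condition with no action prints the line when the condition is true, and seen[$1]++ yields the old count (0 on first sight) before incrementing. A spelled-out equivalent, given only as a sketch on the same sample data, might look like this:

```shell
#!/bin/sh
# Sketch: long-hand form of  awk -F, '!seen[$1]++'
out=$(printf '%s\n' \
  '10,15-10-2014,abc' \
  '20,12-10-2014,bcd' \
  '10,09-10-2014,def' \
  '40,06-10-2014,ghi' \
  '10,15-10-2014,abc' |
awk -F, '
{
    if (seen[$1] == 0)   # key not seen before: this is its first occurrence
        print $0
    seen[$1]++           # count the key so later lines with it are skipped
}')
printf '%s\n' "$out"
```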

This should give you what you want:
awk -F, '{ if (!($1 in a)) a[$1] = $0; } END '{ for (i in a) print a[i]}' input.csv

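There is a typo in the syntax above (a stray quote after END). Corrected:
awk -F, '{ if (!($1 in a)) a[$1] = $0; } END { for (i in a) print a[i]}' input.csv
Note that for (i in a) visits the keys in an unspecified order, so the output order may differ from the input order.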
