Print lines between two patterns along with the header - awk

I have the file like this below:
Name: DB1
========================================================
Primary :
f3
f6
f7
f9
f0
Secondary :
internal input
internal output
internal Loaded
internal output
internal Loaded
Name: DB2
========================================================
Primary :
s2
m5
m7
m8
m9
Secondary :
External output
External Revoke
External Reuse
External input
But I need the output below; that is, I need to extract the lines between Primary and Secondary, along with the names:
Name: DB1
========================================================
f3
f6
f7
f9
f0
Name: DB2
========================================================
s2
m5
m7
m8
m9
I tried this:
$ awk '/Primary :/{flag=1; next} /Undriven :/{flag=0} flag' file
f3
f6
f7
f9
f0
s2
m5
m7
m8
m9
I am not getting the names. Can anyone please help me with this?

It looks like you're pretty close, except that (a) you're never explicitly matching the Name: line, and (b) you're matching the word "Undriven" which doesn't appear in your sample data.
I would probably do something like this:
awk '
/^Name:/
/^====/
/^Primary :/{flag=1; next}
/^Secondary :/{flag=0}
flag
' file
Which produces as output:
Name: DB1
========================================================
f3
f6
f7
f9
f0
Name: DB2
========================================================
s2
m5
m7
m8
m9
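To sanity-check it end to end, here is the same script run as a self-contained snippet against a trimmed-down copy of the sample input (two data lines per section; the file name sample.txt is just for the demo):

```shell
# Trimmed-down copy of the question's input, two data lines per section.
cat > sample.txt <<'EOF'
Name: DB1
========================================================
Primary :
f3
f6
Secondary :
internal input
internal output
Name: DB2
========================================================
Primary :
s2
m5
Secondary :
External output
External Revoke
EOF

# Four independent rules: print "Name:" and "====" lines unconditionally,
# turn the flag on after "Primary :", off at "Secondary :", and print
# any line while the flag is set.
out=$(awk '
/^Name:/
/^====/
/^Primary :/{flag=1; next}
/^Secondary :/{flag=0}
flag
' sample.txt)
printf '%s\n' "$out"
```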

If this isn't all you need, then edit your question to provide more truly representative sample input/output that this doesn't work for. Alternatively, if the blocks in your file are separated by blank lines, awk's paragraph mode (empty RS) treats each block as one record:
$ awk -v RS= -F'\n' '{for (i=1;i<=8;i++) print $i; print ""}' file
Name: DB1
========================================================
Primary :
f3
f6
f7
f9
f0
Name: DB2
========================================================
Primary :
s2
m5
m7
m8
m9
or:
$ awk -v RS= -v FS='\n' '{print $1 ORS $2; for (i=4;i<=8;i++) print $i; print ""}' file
Name: DB1
========================================================
f3
f6
f7
f9
f0
Name: DB2
========================================================
s2
m5
m7
m8
m9


How can I adjust a text file in VIM to have two columns instead of three while not splitting paired data?

Hello there and thank you for reading this!
I have a very large text file that looks something like this:
a1 b1 a2
b2 a3 b3
a4 b4 a5
b5 a6 b6
I would like my text file to look like this:
a1 b1
a2 b2
a3 b3
a4 b4
a5 b5
a6 b6
In reality, these values are paired lon/lat coordinates. If it is useful the values look like:
1.591336e+02 4.978998e+01 1.591162e+02
4.977995e+01 1.590988e+02 4.976991e+01
1.590815e+02 4.975988e+01 1.590641e+02
4.974984e+01 1.590468e+02 4.973980e+01
I have been learning in vim, but I do have access to other tools if this is easier done elsewhere. Should I be looking for a sed or awk command that will assess the amount of spaces in a given row? I appreciate any and all advice, and if I can offer any more information, I would be glad to!
I have searched for other folks who have had this question, but I don't know how to apply some of the solutions I've seen for similar problems to this - and I'm afraid of messing up this very large file. I am expecting the answer to be something using sed or awk, but I don't know how to be successful with these commands with what I've found so far. I'm rather new to coding and this site, so if I missed this question already being asked, I apologize!
All the best!
EDIT: I used
sed 's/\s/\n/2;P;D' file.txt > newfile.txt
to turn my file into:
a1 b1
a2^M
b2 a3
b3^M
a4 b4
a5^M
b5 a6
b6^M
I then used:
dos2unix newfile.txt
to get rid of the ^M within the data. I haven't made it to the structure, but I am one step closer.
$ tr ' ' '\n' <input_file|paste -d" " - -
a1 b1
a2 b2
a3 b3
$ sed 's/ /\n/2; P; D' <(tr '\n' ' ' <input_file)
a1 b1
a2 b2
a3 b3
$ tr '\n' ' ' <input_file|xargs -d" " printf '%s %s\n'
a1 b1
a2 b2
a3 b3
An approach using awk
% awk '{for (i=1; i<=NF; i++) {x++; printf "%s%s", $i, (x%2==0 ? "\n" : " ")}}' file
a1 b1
a2 b2
a3 b3
a4 b4
a5 b5
a6 b6
Data
% cat file
a1 b1 a2
b2 a3 b3
a4 b4 a5
b5 a6 b6
Using GNU sed
$ sed -Ez 's/([^ \n]*)[ \n]([^ \n]* ?)\n?/\1 \2\n/g' input_file
a1 b1
a2 b2
a3 b3
a4 b4
a5 b5
a6 b6
Using cut:
$ cut -d' ' -f1,2 test
a1 b1
b2 a3
a4 b4
b5 a6
Note that this only keeps the first two space-separated fields of each line, so it does not produce the desired re-pairing of the values.
In vim you can
:%j
to join all the lines, then
:s/\([^ ]\+ [^ ]\+\) /\1\r/g
to turn every 2nd space into a newline.
With perl
perl -0777 -lpe 's/(\S+)\s+(\S+)\s+/$1 $2\n/g' file
That reads the whole file into memory, so it depends on what "very large" is. Is it smaller than the amount of memory you have?
FWIW, here is a bit of a meta-answer.
Vim lets you filter all or some of the lines in your buffer via an external program with :help :!. It is very handy because, while Vim can do a lot, there are plenty of use cases for which external tools do a better job.
So… if you already have the file opened in Vim, you should be able to apply the provided answers with little effort:
:%!tr ' ' '\n' | paste -d" " - -
:%!tr '\n' ' ' | sed 's/ /\n/2; P; D'
:%!perl -0777 -lpe 's/(\S+)\s+(\S+)\s+/$1 $2\n/g'
etc.
Of note:
The command after the [range]! takes the lines covered by [range] as standard input which makes constructs like <filename unnecessary in this context.
Vim expands % to the current filename and # to the alternate filename if they exist, so they usually need to be escaped, as in:
:%!tr '\n' ' ' | xargs printf '\%s \%s\n'
There's a lot to learn from this thread. Good luck.
This might work for you (GNU sed):
sed '$!N;y/\n/ /;s/ /\n/2;P;D' file
Append the following line if not the last.
Translate all newlines to spaces.
Replace the second space by a newline.
Print the first line, delete the first line and repeat.
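A quick self-contained check of that cycle against the sample data (GNU sed assumed, for the \n in the replacement):

```shell
# N joins pairs of lines, y flattens the newline, s breaks after the
# second word, P prints the first pair, and D loops on the remainder.
out=$(printf 'a1 b1 a2\nb2 a3 b3\na4 b4 a5\nb5 a6 b6\n' |
  sed '$!N;y/\n/ /;s/ /\n/2;P;D')
printf '%s\n' "$out"
```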

how to extract two strings from each line

I have a file of the following content:
product a1 version a2 owner a3
owner b1 version b2 product b3 size b4
....
I am interested in extracting product and version from each line using a shell script, and write them in 2 columns with product first and version second. So the output should be:
a1 a2
b3 b2
...
I used "while read line", but it is extremely slow. I tried to use awk, but couldn't figure out how to do it. Any help is appreciated.
The following will do what you want:
$ nl dat
1 product a1 version a2 owner a3
2 owner b1 version b2 product b3 size b4
$ awk 'NF { delete row
            for (i = 1; i <= NF; i += 2) {
                row[$i] = $(i+1)
            }
            print row["product"], row["version"]
          }' dat
a1 a2
b3 b2
This builds an associative array from the name-value pairs in your data file by position, and then retrieves the values by name. The NF in the pattern ensures blank lines are ignored. If product or version are otherwise missing, they'll likewise be missing in the output.
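The same idea is easy to verify in isolation, feeding the two sample lines on standard input:

```shell
# Alternating fields are name/value pairs: store row[name]=value,
# then look up the two names of interest for each line.
out=$(printf 'product a1 version a2 owner a3\nowner b1 version b2 product b3 size b4\n' |
  awk 'NF { delete row
            for (i = 1; i <= NF; i += 2) row[$i] = $(i+1)
            print row["product"], row["version"] }')
printf '%s\n' "$out"
```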
A different perl approach:
perl -lane 'my %h = @F; print "$h{product} $h{version}"' input.txt
Uses auto-split mode to put each word of each line in an array, turns that into a hash/associative array, and prints out the keys you're interested in.
Here is a perl to do that:
perl -lnE '
$x=$1 if /(?=product\h+(\H+))/;
$y=$1 if /(?=version\h+(\H+))/;
say "$x $y" if $x && $y;
$x=$y="";' file
Or, same method with GNU awk:
gawk '/product/ && /version/{
match($0,/product[ \t]+([^ \t]+)/,f1)
match($0,/version[ \t]+([^ \t]+)/,f2)
print f1[1],f2[1]
}' file
With the example, either prints:
a1 a2
b3 b2
The advantage here is only complete lines are printed where both targets are found.
With awk:
$ awk '{for(i=1;i<NF;i++){
if($i=="version")v=$(i+1)
if($i=="product")p=$(i+1)}}
{print p,v}' data.txt
a1 a2
b3 b2
If you have lines without a version or product number, and you want to skip them:
awk '{ok=0}
{for(i=1;i<NF;i++){
if($i=="version"){v=$(i+1);ok++}
if($i=="product"){p=$(i+1);ok++}}}
ok==2{print p,v}' data.txt
Thank you guys for the quick and excellent replies. I ended up using the awk version as it is most convenient for me to insert into an existing shell script. But I learned a lot from other scripts too.

Need to substitute \x0d\x0a to \x2c\x0d\x0a in a file

I need to substitute \x0d\x0a to \x2c\x0d\x0a in a file
The following does not do anything:
awk '{if NR> 1 {gsub(/\x0D\x0A/,"\x2C\x0D\x0A"); print}}' test.csv > testfixed.csv
$ xxd test.csv
00000e0: 350d 0a45 4941 2d39 3330 2c44 6169 6c79 5..EIA-930,Daily
00000f0: 2c4e 5949 532c 2c55 5443 302c 3030 3132 ,NYIS,,UTC0,0012
You are trying to make a substitution of the hex string \x0D\x0A which is nothing more than CRLF or \r\n.
Since awk by default splits its records on the <newline> character (which is LF), you never have to match the <newline> character \n (or \x0a) yourself. All you need to do is substitute \r with ,\r (0x2c is the hex value of ,). So this should do the trick:
awk '(NR>1){sub("\r$",",\r"); print}' file
So why was your script failing?
As mentioned before, awk works in records, and the default record separator is the <newline> character. This means that the <newline> character, also written as \n and having hexadecimal value \x0a, is never part of the record $0. Also, the print statement automatically adds the output record separator ORS after the record. By default this is again the <newline> character, so you did not have to try to substitute that. All you had to do was:
awk 'NR > 1 {sub(/\x0D$/,"\x2C\x0D"); print}' test.csv > testfixed.csv
So is it possible to substitute by means of hexadecimal values?
Yes, clearly it is!
echo -n "Hello World" | awk 'sub(/\x57\x6f\x72\x6c\x64/,"\x43\x6f\x77")'
But how can I change <newline>?
You can just redefine the output record separator ORS:
awk -v ORS="whatever" '1'
Also, using GNU awk, you can follow glenn jackman's solution.
Very much related: Why does my tool output overwrite itself and how do I fix it?
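As a small illustration of redefining ORS (my own example, not from the original answer), converting LF line endings to CRLF:

```shell
# Every record is re-emitted with ORS appended; setting ORS to CRLF
# therefore rewrites Unix line endings as DOS ones.
out=$(printf 'a\nb\n' | awk -v ORS='\r\n' '1')
expected=$(printf 'a\r\nb\r\n')
[ "$out" = "$expected" ] && echo 'CRLF endings produced'
```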
The newline \n or \x0A will not appear in each record because by default it is the record separator.
I would do this: define the input and output record separators to be \r\n and then for line number > 1, append a comma to the record:
$ printf "a\r\nb\r\nc\r\n" >| file
$ hexdump -C file
00000000 61 0d 0a 62 0d 0a 63 0d 0a |a..b..c..|
00000009
$ awk 'BEGIN {RS = ORS = "\r\n"} NR > 1 {$0 = $0 ","} 1' file | hexdump -C
00000000 61 0d 0a 62 2c 0d 0a 63 2c 0d 0a |a..b,..c,..|
0000000b

Awk: concatenating split field element from one file to another based on common field

I have two tab-delimited files, f1 and f2, that look like this:
f1:
id1 r1
id2 r2
id3 r3
...
idN rN
f2:
f1 g1 x1;id1
f2 g2 x2;id2
f4 g4 x2;id4
...
fM gM xm;idM
where N and M may be different. I'm looking to create an associative array of f1 and concatenate the second column of f1 to the end of f2 such that the output is:
f1 g1 x1;id1=r1
f2 g2 x2;id2=r2
...
As a test, I've run this:
awk 'BEGIN{FS=OFS="\t"} NR==FNR{id[$1]=$1; r[$1]=$2; next} {split($3,a,";"); if (a[2] in id) {print "found"} else {print "not found"}}' f1 f2
which gives output:
found
found
not found
...
However, running the following command:
awk 'BEGIN{FS=OFS="\t"} NR==FNR{id[$1]=$1; r[$1]=$2; next} {split($3,a,";"); if (a[2] in id) {$3=$3"="r[$1]; print} else {print "not found"}}' f1 f2
gives the output:
f1 g1 x1;id1=
f2 g2 x2;id2=
not found
...
My question is: how do I access the value associated with the key such that I can append it to the 3rd column of f2?
join is the tool for joining files, especially if they are already sorted by the key.
$ join -14 <(sed 's/;/; /' file2) file1 |
awk '{print $2,$3,$4$1 "=" $5}'
f1 g1 x1;id1=r1
f2 g2 x2;id2=r2
However, your output format is not standard, so you need awk for that purpose. I guess in that case the whole script could be done in awk as well.
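For reference, the all-awk route hinted at above amounts to a small fix to the asker's second command: index the lookup array with a[2] (the id split out of field 3) rather than $1. A sketch, with the tab-delimited sample written to temporary files f1 and f2:

```shell
# f1 maps ids to r-values; f2's third field ends in ";idN".
printf 'id1\tr1\nid2\tr2\nid3\tr3\n' > f1
printf 'f1\tg1\tx1;id1\nf2\tg2\tx2;id2\nf4\tg4\tx2;id4\n' > f2

# First pass (NR==FNR) stores r[id]; second pass splits field 3 on ";"
# and, when the id is known, appends "=value" to that field.
out=$(awk 'BEGIN{FS=OFS="\t"}
           NR==FNR {r[$1]=$2; next}
           {split($3,a,";"); if (a[2] in r) $3=$3"="r[a[2]]; print}' f1 f2)
printf '%s\n' "$out"
```

Lines whose id has no match in f1 (id4 here) pass through unchanged rather than being dropped.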

Deleting columns from a file with awk or from command line on linux

How can I delete some columns from a tab separated fields file with awk?
c1 c2 c3 ..... c60
For example, delete columns between 3 and 29 .
This is what the cut command is for:
cut -f1,2,30- inputfile
The default is tab. You can change that with the -d switch.
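Scaled down to five columns so the effect is easy to see (the field-list syntax is identical for the 1,2,30- case):

```shell
# Keep fields 1 and 2, then everything from field 4 onward;
# tab is cut's default delimiter, so no -d is needed here.
out=$(printf 'c1\tc2\tc3\tc4\tc5\n' | cut -f1,2,4-)
printf '%s\n' "$out"
```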
You can loop over all columns and filter out the ones you don't want:
awk '{for (i=1; i<=NF; i++) if (i<3 || i>29) printf "%s ", $i; print ""}' input.txt
where the NF gives you the total number of fields in a record.
For each column that meets the condition we print the column followed by a space " ".
EDIT: updated after remark from johnny:
awk 'BEGIN{FS="\t"} {for (i=1; i<NF; i++) if (i<3 || i>5) printf "%s%s", $i, FS; print $NF}' input.txt
this is improved in 2 ways:
keeps the original separators
does not append a separator at the end
awk '{for(z=3;z<=15;z++)$z="";$0=$0;$1=$1}1'
Input
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21
Output
c1 c2 c16 c17 c18 c19 c20 c21
Perl 'splice' solution which does not add leading or trailing whitespace:
perl -lane 'splice @F,2,27; print join " ",@F' file
Produces output:
c1 c2 c30 c31