How to print certain lines from sections of a file separated by a blank line with sed - awk

I have been trying to come up with a sed command that will pull certain lines from blocks of text separated by a blank line in a file. The blocks of text are as below.
# cat test_file.txt
line 1
line 2
line 3
line 4
line 5

line 1
line 2
line 3
line 4
line 5

line 1
line 2
line 3
line 4
line 5
I am trying to pull out lines 2 and 4 from each block, so the output will be like below.
line 2
line 4

line 2
line 4

line 2
line 4
I came up with a way to do it for the first block of text using sed:
# sed -n -e 2p -e 4p test_file.txt
line 2
line 4
But I haven't been able to find a way to make it continue for each block of text until the end of the file. Any pointers would be greatly appreciated.

awk's paragraph mode exists specifically to handle blank-line-separated records/blocks of text like the ones you're dealing with:
$ awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"} {print $2, $4}' file
line 2
line 4

line 2
line 4

line 2
line 4
Reference the POSIX standard:
If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input
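As a small illustration of the quoted rule, the sketch below feeds awk two blocks that are separated by several blank lines and surrounded by leading and trailing blank lines. The function name count_paragraphs and the sample text are made up for this example; the point is only that paragraph mode yields exactly two records, with no empty ones.

```shell
# Hypothetical helper: report each paragraph-mode record and its line count.
count_paragraphs() {
  awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode, one field per line
       { printf "record %d has %d lines, starting with \"%s\"\n", NR, NF, $1 }' "$@"
}

# Leading, repeated, and trailing blank lines produce no empty records.
printf '\n\na\nb\n\n\n\nc\nd\ne\n\n' | count_paragraphs
```

With this input the helper reports record 1 with 2 lines and record 2 with 3 lines.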
If you need to not have a blank line printed after the final record:
$ awk 'BEGIN{RS=""; FS=OFS="\n"} NR>1{print prev ORS} {prev=$2 OFS $4} END{print prev}' file
line 2
line 4

line 2
line 4

line 2
line 4
or if you don't want to use paragraph mode for some reason then:
$ awk 'BEGIN{tgts[2]; tgts[4]} !NF{print ""; lineNr=0; next} ++lineNr in tgts' file
line 2
line 4

line 2
line 4

line 2
line 4
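If the wanted line numbers change often, the non-paragraph-mode approach can be parameterised. This is only a sketch: the function name pick_lines and the comma-separated argument format are assumptions made for this example, not part of the answer above.

```shell
# Hypothetical helper: print the lines of each block whose numbers are listed
# in the first argument (comma-separated), keeping blank lines as separators.
# Reads the text from standard input.
pick_lines() {
  awk -v targets="$1" '
    BEGIN { n = split(targets, t, ","); for (i = 1; i <= n; i++) want[t[i]] }
    !NF   { print ""; lineNr = 0; next }   # blank line: restart the per-block count
    (++lineNr) in want                     # print only the requested lines
  '
}

printf 'line 1\nline 2\nline 3\nline 4\nline 5\n\nline 1\nline 2\nline 3\nline 4\nline 5\n' |
  pick_lines 2,4
```

Calling it as pick_lines 2,4 < file should then behave like the tgts[2]; tgts[4] version.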

I'd use awk for this, e.g:
awk '(!NF&&m=NR)||NR-m==2||NR-m==4' file

This might work for you (GNU sed):
sed -n '/\S/{n;p;n;n;p;:a;n;//ba;p}' file
Set the -n option for explicit printing. Print the second and fourth lines then throw away any non-blank lines and print the first blank one. Repeat.


From linux command line, how can I remove \n from a particular line to merge two lines together? [closed]

Using the command line, how can I transform something like:
1 first line
2 second line
3 third line
4 fourth line
extra bit
5 fifth line
6 sixth line
into, say:
1 first line
2 second line
3 third line
4 fourth line; extra bit
5 fifth line
6 sixth line
The rule for merging is: remove the newline before any line that does not start with a number, so that the line is joined to the previous one.
I have seen answers to similar questions using the command-line tools awk, sed, and tr.
awk '/^[0-9]/{ printf "%s%s", (NR == 1 ? "" : "\n"), $0; next}
{printf "; %s", $0} END { printf "\n"}' input
I'm not really sure what you want to do when the first line does not begin with a digit, and I'm making the assumption that starting with a digit is the characteristic you are looking for to combine lines. Modify as needed.
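One possible way to settle the open case, shown purely as a sketch: treat a first line that does not begin with a digit as a line of its own, since there is nothing before it to merge into. The function name merge_continuations is hypothetical, and that handling of the first line is an assumption, not something the question specifies.

```shell
# Hypothetical helper: append lines that do not start with a digit to the
# previous line, separated by "; ". Assumption: a non-digit FIRST line is kept
# as a line of its own.
merge_continuations() {
  awk '
    NR == 1 || /^[0-9]/ { printf "%s%s", (NR == 1 ? "" : "\n"), $0; next }
    { printf "; %s", $0 }
    END { if (NR) printf "\n" }   # finish the last line; print nothing for empty input
  ' "$@"
}

printf '1 first line\n2 second line\n3 third line\n4 fourth line\nextra bit\n5 fifth line\n6 sixth line\n' |
  merge_continuations
```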
With GNU sed:
sed "4{N;s/\n/; /}" file
With GNU awk:
awk -v line=4 'NR==line{x=$0; getline; $0=x "; " $0}1' file
Output:
1 first line
2 second line
3 third line
4 fourth line; extra bit
5 fifth line
6 sixth line
Could you please try following.
Written and tested it in
https://ideone.com/xqk4si
awk -v line_num="5" '
FNR==(line_num-1){
val=$0
next
}
val{
$0=val"; "$0
val=""
}
1
' Input_file
Explanation: the awk variable line_num holds the number of the line that the OP wants merged into the line before it. In the main program, the first condition checks whether the current line is the one just before line_num; if so, the line is saved in the variable val and next skips all further statements for that line. The second condition checks whether val is set; if so, the current line is rebuilt as the saved line, a semicolon, and the current line, and val is cleared. The final 1 is the awk idiom for printing the current line.
On second thought, it might be better to merge all lines that do not start with a number, rather than specifying by number each line to be merged.
Easy to do with ed:
printf "%s\n" '2,$g/^[^0-9]/-1s/$/; /\' '.,+1j' w | ed -s input.txt
Translated from ed's rather cryptic commands: For each line that does not start with a digit (Skipping the first line because it has no previous one to merge with), add ; to the end of the previous line, and then join those two lines. Finally save the changed file.
Example:
$ cat input.txt
1 first line
2 second line
extra stuff
3 third line
4 fourth line
extra bit
5 fifth line
6 sixth line
$ printf "%s\n" '2,$g/^[^0-9]/-1s/$/; /\' '.,+1j' w | ed -s input.txt
$ cat input.txt
1 first line
2 second line; extra stuff
3 third line
4 fourth line; extra bit
5 fifth line
6 sixth line
With GNU sed, to join any number of lines not starting with a digit:
sed -E ':a;N;s/\n([^0-9])/; \1/;ta;P;D;' file

Removing duplicate blank lines with awk

For one of my class problems, I have to delete duplicate blank lines in a file. For example, I have an input file that looks like this:
Sample Line 1
Sample line 2
Sample line 3
and the output would then turn all multiple blank lines into a singular one, so the output file would look like this:
Sample Line 1
Sample line 2
Sample line 3
I've been able to complete this with a sed command, but the problem insists that I use awk in order to obtain this output.
The closest I've gotten has been with awk '!x[$0]++', but that simply deletes pretty much every blank line. I feel like I'm missing something basic.
Thanks for any help!
$ awk 'NF{c=1} (c++)<3' file
Sample Line 1
Sample line 2
Sample line 3
or if you don't mind an extra blank line at the end:
$ awk -v RS= -v ORS='\n\n' '1' file
Sample Line 1
Sample line 2
Sample line 3
Could you please try following.
awk '!NF{found++} found>1 && !NF{next} NF{found=""} 1' Input_file
Output will be as follows.
Sample Line 1
Sample line 2
Sample line 3
This also works if the file has duplicate blank lines at the beginning or end.
awk '
NF==0{
if (! blank) {print;blank=1}
next
}
{blank=0;print}
' file
It relies on the fact that NF is zero for every blank or empty line when awk uses its default field separator.
For example, if file is:
Sample Line 1
Sample line 2
Sample line 3
Sample line 4
it becomes
Sample Line 1
Sample line 2
Sample line 3
Sample line 4
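The NF behaviour that this answer depends on is easy to check directly. The sketch below (the name show_nf and the sample text are made up) prints NF for a normal line, an empty line, a line containing only spaces and a tab, and another normal line.

```shell
# Hypothetical helper: print the field count of every input line.
show_nf() {
  awk '{ print NF }' "$@"
}

# Empty and whitespace-only lines both report 0 fields.
printf 'two words\n\n   \t \nlast\n' | show_nf
```

This prints 2, 0, 0 and 1, so a test on NF treats whitespace-only lines as blank too.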
awk 'NF || p; { p = NF }' p=1 file
For modifying multiple files at once:
gawk -i inplace 'BEGINFILE { p = 1 } NF || p; { p = NF }' file ...
Exclude p=1 or set initial value of p to 0 to also remove starting blank lines.
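To see what the initial value of p changes, here is a small sketch. The wrapper name squeeze_blank is hypothetical, and it passes p with -v instead of as a p=1 operand, which has the same effect for this script.

```shell
# Hypothetical wrapper: squeeze runs of blank lines into one, reading stdin.
# $1 is the initial value of p: 1 keeps one leading blank line, 0 drops them.
squeeze_blank() {
  awk -v p="${1:-1}" 'NF || p; { p = NF }'
}

printf '\n\na\n\n\nb\n' | squeeze_blank 1   # output starts with one blank line
printf '\n\na\n\n\nb\n' | squeeze_blank 0   # output starts with "a"
```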
1
2
3 Sample Line 1
4
5
6
7 Sample line 2
8
9
10
11
12 Sample line 3
13 Sample Line 4
14
15
mawk 'ORS = "\n\n"' RS=
1 Sample Line 1
2
3 Sample line 2
4
5 Sample line 3
6 Sample Line 4
7

Add text to specific blocks defined by some special characters in sed

I have hundreds of books in text format, which will be converted to epub and pdf with pandoc. Each text file contains plain text and poems. Aligning poems is a repeated task. Every second line of each poem needs to be indented. I need to add some special character at every other line of each poem, say, ==.
My question is:
here are some text
poem line 1
poem line 2
poem line 3
poem line 4
here are some text
poem line 1
poem line 2
I need output
here are some text
poem line 1
==poem line 2
here are some text
poem line 1
==poem line 2
poem line 3
==poem line 4
My idea is:
If we define poem blocks by some special character like
~
poem line 1
poem line 2
~~
~
poem line 1
poem line 2
poem line 3
poem line 4
~~
sed would find this ~, add == to the third line of the block (the second poem line) and to every second line after that, and stop at ~~.
The output should look like this:
~
poem line 1
== poem line 2
~~
~
poem line 1
== poem line 2
poem line 3
== poem line 4
~~
Is it possible to do with sed or awk or any other scripts?
http://xensoft.com/use-sed-to-insert-text-every-n-lines-characters/
sed '/^$/b;n;/^$/b;s/^/--/' input
/^$/b: if the line is empty print it and start again with the next line.
n: print current line and get the next one.
s/^/--/: add special chars to the line.
Output:
here are some text
poem line 1
--poem line 2
poem line 3
--poem line 4
here are some text
poem line 1
--poem line 2
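The same per-block alternation can be sketched in awk with a counter that restarts at every empty line. This is an equivalent written for illustration, not part of the sed answer; the function name mark_even_lines and its optional marker argument are assumptions.

```shell
# Hypothetical helper: prefix every second line of each block with a marker
# (default "--"), where blocks are separated by empty lines. Reads stdin.
mark_even_lines() {
  awk -v mark="${1:---}" '
    /^$/ { n = 0; print; next }                      # empty line: a new block starts
    { printf "%s%s\n", (++n % 2 ? "" : mark), $0 }   # mark lines 2, 4, 6, ...
  '
}

printf 'here are some text\n\npoem line 1\npoem line 2\npoem line 3\npoem line 4\n' |
  mark_even_lines
```

Like the sed command, it depends on an empty line between the plain text and each poem.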
You can use delimiters as you suggested:
here are some text
#+
poem line 1
poem line 2
poem line 3
poem line 4
#-
here are some text
#+
poem line 1
poem line 2
poem line 3
#-
With this command:
sed '/#+/!b;:l;n;/#-/b;n;/#-/b;s/^/--/;bl;' input
You get:
here are some text
#+
poem line 1
--poem line 2
poem line 3
--poem line 4
#-
here are some text
#+
poem line 1
--poem line 2
poem line 3
#-
This might work for you (GNU sed):
sed '/^~\s*$/{:a;n;/^~~\s*$/b;n;//b;s/^/== /;ba}' file
Insert == before every second line of each poem, where poems are delimited by ~ and ~~.
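For comparison, the same delimiter-driven logic can be sketched in awk with an explicit in-poem flag. The function name mark_poems and the variable names are made up for this example; the delimiters are the ~ and ~~ lines from the question.

```shell
# Hypothetical helper: inside blocks opened by "~" and closed by "~~",
# prefix every second line with "== ".
mark_poems() {
  awk '
    /^~~[ \t]*$/ { inpoem = 0; print; next }          # closing delimiter
    /^~[ \t]*$/  { inpoem = 1; n = 0; print; next }   # opening delimiter
    inpoem && ++n % 2 == 0 { $0 = "== " $0 }          # even-numbered poem lines
    { print }
  ' "$@"
}

printf 'text\n~\npoem line 1\npoem line 2\npoem line 3\npoem line 4\n~~\nmore text\n' |
  mark_poems
```

Lines outside a ~ ... ~~ block pass through unchanged because the counter only advances while the flag is set.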
sed is for doing s/old/new on individual strings, that is all. This is a completely inappropriate task for sed, absolutely trivial for awk and exactly the type of task awk was created to perform, and you don't need to add additional ~ delimiters to your text to just get the output you posted from the first block of input you posted:
$ awk -v RS= -F'\n' '{for (i=1; i<=NF; i++) print (i%2?"":"==") $i; print ""}' file
here are some text
poem line 1
==poem line 2
poem line 3
==poem line 4
here are some text
poem line 1
==poem line 2
The above will work using any awk in any shell on every UNIX box.

awk to copy and move of file last line to previous line above

In the awk below I am trying to move only the last line up by one, so that it swaps places with the line above it. The problem is that my input file varies in length (it is not always 4 lines as in the example below), so I cannot hard-code i=3 every time, and I cannot seem to fix it. Thank you :).
file
this is line 1
this is line 2
this is line 3
this is line 4
desired output
this is line 1
this is line 2
this is line 4
this is line 3
awk (seems like the last line is being moved, but to i=2)
awk '
{lines[NR]=$0}
END{
print lines[1], lines[NR];
for (i=3; i<NR; i++) {print lines[i]}
}
' OFS=$'\n' file
this is line 1
this is line 2
this is line 4
this is line 3
$ seq 4 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
4
3
$ seq 7 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
3
4
5
7
6
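The one-liner above keeps only the last two lines in memory. A commented sketch of the same idea follows, with an added guard for inputs shorter than two lines; the guard and the function name swap_last_two are assumptions added for this example.

```shell
# Hypothetical helper: print the input with its last two lines swapped.
swap_last_two() {
  awk '
    NR > 2 { print p2 }              # two newer lines exist, so p2 can be printed
    { p2 = p1; p1 = $0 }             # slide the two-line window forward
    END {
      if (NR >= 2) print p1 ORS p2   # last line first, then the one before it
      else if (NR == 1) print p1     # a single line has nothing to swap with
    }
  ' "$@"
}

printf '1\n2\n3\n4\n' | swap_last_two
```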
try following awk once:
awk '{a[FNR]=$0} END{for(i=1;i<=FNR-2;i++){print a[i]};print a[FNR] ORS a[FNR-1]}' Input_file
Explanation: the array a is indexed by FNR (the current line number) and stores each line as it is read. In the END section, a for loop runs from i=1 to i<=FNR-2; it stops at FNR-2 because only the last two lines need to be swapped. Once those lines are printed, a[FNR] (the last line) is printed, followed by ORS (a newline) and a[FNR-1].
Second solution: count the number of lines in Input_file and pass that count to awk as a variable.
awk -v lines="$(wc -l < Input_file)" 'FNR==(lines-1){val=$0;next} FNR==lines{print $0 ORS val;next} 1' Input_file
You nearly had it. You just have to change the order.
awk '
{lines[NR]=$0}
END{
for (i=1; i<NR-1; i++) {print lines[i]}
print lines[NR];
print lines[NR-1];
}
' OFS=$'\n' file
I'd reverse the file, swap the first two lines, then re-reverse the file
tac file | awk 'NR==1 {getline line2; print line2} 1' | tac

print whole variable contents if the number of lines are greater than N

How to print all lines if certain condition matches.
Example:
echo "$ip"
this is a sample line
another line
one more
last one
If this file has more than 3 lines then print the whole variable.
I tried:
echo $ip| awk 'NR==4'
last one
echo $ip|awk 'NR>3{print}'
last one
echo $ip|awk 'NR==12{} {print}'
this is a sample line
another line
one more
last one
echo $ip| awk 'END{x=NR} x>4{print}'
Need to achieve this:
If this file has more than 3 lines then print the whole file. I can do this using wc and bash but need a one liner.
The right way to do this (no echo, no pipe, no loops, etc.):
$ awk -v ip="$ip" 'BEGIN{if (gsub(RS,"&",ip)>2) print ip}'
this is a sample line
another line
one more
last one
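The command above works by counting newlines: gsub returns the number of replacements it made, and replacing each newline with & (itself) leaves the text unchanged. A sketch with the threshold as a parameter follows. The function name print_if_longer is hypothetical, and passing the text through ENVIRON instead of -v is a swapped-in choice, because -v interprets backslash escape sequences in the value.

```shell
# Hypothetical helper: print the text in $1 only if it has more than $2 lines.
# Assumption: the text has no trailing newline, as is the case for a shell
# variable filled by command substitution.
print_if_longer() {
  text="$1" awk -v n="$2" 'BEGIN {
    s = ENVIRON["text"]
    # gsub returns the number of newlines it replaced; lines = newlines + 1
    if (gsub(/\n/, "&", s) + 1 > n) print s
  }'
}

ip=$(printf 'this is a sample line\nanother line\none more\nlast one')
print_if_longer "$ip" 3
```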
You can use Awk as follows,
echo "$ip" | awk '{a[$0]; next}END{ if (NR>3) { for(i in a) print i }}'
one more
another line
this is a sample line
last one
you can also make the value 3 configurable from an awk variable,
echo "$ip" | awk -v count=3 '{a[$0]; next}END{ if (NR>count) { for(i in a) print i }}'
The idea is to store each line in the array with {a[$0]; next} as it is processed; by the time the END clause is reached, NR holds the line count of the string or file. The lines are then printed only if the condition matches, i.e. the number of lines is greater than 3 or whatever configurable value is used.
And always remember to double-quote variables in bash to avoid word splitting by the shell.
Using James Brown's useful comment below to preserve the order of lines, do
echo "$ip" | awk -v count=3 '{a[NR]=$0; next}END{if(NR>count)for(i=1;i<=NR;i++)print a[i]}'
this is a sample line
another line
one more
last one
Another in awk. First test files:
$ cat 3
1
2
3
$ cat 4
1
2
3
4
Code:
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 3 # look ma, no lines
[this line left intentionally blank. no wait!]
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 4
1
2
3
4
Explained:
NR<4 { # for the first 3 records
b=b (NR==1?"":ORS) $0 # buffer them to b with ORS delimiter
next # proceed to next record
}
b { # if buffer has records, ie. NR>=4
print b # output buffer
b="" # and reset it
}1 # print all records after that
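The buffering approach explained above can be sketched with the threshold as a parameter. The function name print_if_more_than and the variable n are assumptions for this example, and the flush is triggered by NR == n + 1 rather than by testing whether the buffer is non-empty.

```shell
# Hypothetical helper: pass stdin through only if it has more than n lines
# (n defaults to 3); shorter inputs produce no output at all.
print_if_more_than() {
  awk -v n="${1:-3}" '
    NR <= n     { b = b (NR == 1 ? "" : ORS) $0; next }   # hold back the first n lines
    NR == n + 1 { print b }                               # line n+1 arrived: flush the buffer
    { print }                                             # stream every later line
  '
}

printf '1\n2\n3\n4\n' | print_if_more_than 3
```

Only the first n lines are ever held in memory, so this also suits long inputs.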