awk - print between patterns - two lines after the first - awk

I have a file that looks like this:
y
z
pattern1
line
1
1
1
pattern2
x
k
What I want to do is print the content between the two patterns, with the following restrictions:
Avoid printing the patterns
Skip the next line after the first pattern
This means that my output file should look like this
1
1
1
So far I am able to print between patterns, ignoring them by using
awk '/pattern1/{flag=1;next}/pattern2/{flag=0}flag' file
Any idea on how to do it?

Try this:
awk '/pattern1/{i=1;next}/pattern2/{i=0}{if(i==1){i++;next}}i' File

$ awk '/pattern1/,/pattern2/{i++} /pattern2/{i=0} i>2' file
1
1
1
Between patterns increment i, after 2 records start printing (i>2) and reset i at the end marker.

You can record the line number at which pattern1 matched and print only lines beyond the one that follows it:
awk '/pattern1/{s=NR+1;p=1;next}/pattern2/{p=0}p&&NR>s' file
The next can be omitted if no line matches both pattern1 and pattern2.
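The answers above share one idea: a flag that says "inside the block" plus something that swallows the first line after the start marker. Here is a minimal sketch of that idea with a configurable number of skipped lines. The variable names skip, inside and seen are made up for illustration, and the sample input is rebuilt with printf, with the end marker spelled pattern2.

```shell
# Rebuild the sample input from the question.
sample=$(printf '%s\n' y z pattern1 line 1 1 1 pattern2 x k)

# skip   = how many lines after the start marker to drop (assumed name)
# inside = 1 while between the two markers
# seen   = number of lines seen since the start marker
between=$(printf '%s\n' "$sample" | awk -v skip=1 '
    /pattern1/ { inside = 1; seen = 0; next }   # start marker: never printed
    /pattern2/ { inside = 0 }                   # end marker: flag off before the print rule
    inside && ++seen > skip                     # print once the skipped lines are past
')

printf '%s\n' "$between"
```

Setting skip=0 gives the plain "between the patterns" behaviour from the question; a larger value drops more leading lines.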

Related

Delete a line with a pattern ONLY when the previous line has another specific pattern

I am trying to delete all lines with a specific pattern (PATTERN 2) only when the previous line has another specific pattern (PATTERN 1).
The code looks like this:
PATTERN 1
PATTERN 2 <- This line should be deleted
NNN
PATTERN 2
PATTERN 1
PATTERN 2 <- This line should be deleted
blabla
PATTERN 1
blabla
PATTERN 2
PATTERN 1
PATTERN 2 <- This line should be deleted
PATTERN 2 should be deleted ONLY when the previous line is PATTERN 1
I know how to delete all lines with PATTERN 2 : sed '/PATTERN 2/d'
and I can delete all lines that follow PATTERN 1: sed '/PATTERN 1/{n;N;d}'
However, I don't know how to apply both requirements in a single AWK or SED command.
How can this be done with AWK? Thank you in advance,
Assuming by "PATTERN" you mean "partial regexp match" since that's what you use in the sed scripts in your question:
$ awk '!(/PATTERN 2/ && prev~/PATTERN 1/); {prev=$0}' file
PATTERN 1
NNN
PATTERN 2
PATTERN 1
blabla
PATTERN 1
blabla
PATTERN 2
PATTERN 1
Mac_3.2.57$cat input | awk '{if(lastline!="PATTERN 1"||$0!="PATTERN 2"){print}};{lastline=$0}'
PATTERN 1
NNN
PATTERN 2
PATTERN 1
blabla
PATTERN 1
blabla
PATTERN 2
PATTERN 1
Mac_3.2.57$cat input
PATTERN 1
PATTERN 2
NNN
PATTERN 2
PATTERN 1
PATTERN 2
blabla
PATTERN 1
blabla
PATTERN 2
PATTERN 1
PATTERN 2
Mac_3.2.57$
With your shown samples/attempts, please try following awk code.
awk '/PATTERN 1/{found=1;print;next} found && /PATTERN 2/{found="";next} {found=""} 1' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/PATTERN 1/{ ##Checking if line contains PATTERN 1 then do following.
found=1 ##Setting found to 1 here.
print ##printing current line here.
next ##next will skip all further statements from here.
}
found && /PATTERN 2/{ ##Checking condition if found is NOT NULL AND PATTERN 2 is found.
found="" ##Nullifying found here.
next ##next will skip all further statements from here.
}
{found=""} ##Any other line resets found, so only a PATTERN 2 directly after PATTERN 1 is dropped.
1 ##printing current line here.
' Input_file ##Mentioning Input_file name here.
Another variation is to check whether the current line matches PATTERN 2 and the previous line matches PATTERN 1.
If so, skip the current line without printing it; otherwise print it.
awk '{if(/PATTERN 2/&&last~/PATTERN 1/){last=$0;next}last=$0}1' file
See an awk demo.
In a more readable format:
awk '
{
if (/PATTERN 2/ && last ~ /PATTERN 1/) { # If both patterns match
last = $0 # Save the last line, but don't print
next # Go on to the next record
}
last = $0 # Save the last line
}1 # Print the line
' file
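For comparison, here is a small sketch of the "remember the previous line" approach with both patterns passed in as awk variables. The names first and second are assumptions for illustration, and the input is the question's sample without the arrow annotations.

```shell
# Sample input from the question, without the "<- This line should be deleted" notes.
input=$(printf '%s\n' 'PATTERN 1' 'PATTERN 2' 'NNN' 'PATTERN 2' \
    'PATTERN 1' 'PATTERN 2' 'blabla' 'PATTERN 1' 'blabla' \
    'PATTERN 2' 'PATTERN 1' 'PATTERN 2')

# first/second are assumed variable names holding the two regexes.
filtered=$(printf '%s\n' "$input" | awk -v first='PATTERN 1' -v second='PATTERN 2' '
    !($0 ~ second && prev ~ first)   # print unless this line directly follows a "first" line
    { prev = $0 }                    # remember the line for the next record
')

printf '%s\n' "$filtered"
```

Because prev is updated on every record, a PATTERN 2 that is separated from PATTERN 1 by any other line is kept.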

From linux command line, how can I remove \n from a particular line to merge two lines together? [closed]

Using the command line, how can I transform something like:
1 first line
2 second line
3 third line
4 fourth line
extra bit
5 fifth line
6 sixth line
into, say:
1 first line
2 second line
3 third line
4 fourth line; extra bit
5 fifth line
6 sixth line
The merge condition is: any line that does not start with a number should be joined to the previous line by removing the newline before it.
I have seen answers to similar questions using the command-line tools awk, sed, and tr.
awk '/^[0-9]/{ printf "%s%s", (NR == 1 ? "" : "\n"), $0; next}
{printf "; %s", $0} END { printf "\n"}' input
I'm not really sure what you want to do when the first line does not begin with a digit, and I'm making the assumption that starting with a digit is the characteristic you are looking for to combine lines. Modify as needed.
With GNU sed:
sed "4{N;s/\n/; /}" file
With GNU awk:
awk -v line=4 'NR==line{x=$0; getline; $0=x "; " $0}1' file
Output:
1 first line
2 second line
3 third line
4 fourth line; extra bit
5 fifth line
6 sixth line
Could you please try following.
Written and tested it in
https://ideone.com/xqk4si
awk -v line_num="5" '
FNR==(line_num-1){
val=$0
next
}
val{
$0=val"; "$0
val=""
}
1
' Input_file
Explanation: the awk variable line_num holds the number of the line that the OP wants to merge into the previous one. When the current line number is one less than line_num, the line is saved in val and next skips the remaining rules, so it is not printed yet. On the following line val is set, so the saved line, a semicolon, and the current line are joined into $0, and val is cleared. The final 1 prints every line that reaches it.
On second thought, it might be better to merge all lines that do not start with a number, rather than specifying by number each line to be merged.
Easy to do with ed:
printf "%s\n" '2,$g/^[^0-9]/-1s/$/; /\' '.,+1j' w | ed -s input.txt
Translated from ed's rather cryptic commands: For each line that does not start with a digit (Skipping the first line because it has no previous one to merge with), add ; to the end of the previous line, and then join those two lines. Finally save the changed file.
Example:
$ cat input.txt
1 first line
2 second line
extra stuff
3 third line
4 fourth line
extra bit
5 fifth line
6 sixth line
$ printf "%s\n" '2,$g/^[^0-9]/-1s/$/; /\' '.,+1j' w | ed -s input.txt
$ cat input.txt
1 first line
2 second line; extra stuff
3 third line
4 fourth line; extra bit
5 fifth line
6 sixth line
With GNU sed, to join any number of lines not starting with a digit:
sed -E ':a;N;s/\n([^0-9])/; \1/;ta;P;D;' file
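The same "join every continuation line" idea can also be sketched in plain awk by buffering the current logical line. This is only an illustration: the names buf, have and sep are assumptions, and the input is a made-up variant of the example with two consecutive continuation lines.

```shell
input=$(printf '%s\n' '1 first line' '2 second line' 'extra stuff' \
    '3 third line' '4 fourth line' 'extra bit' 'more bits' '5 fifth line')

merged=$(printf '%s\n' "$input" | awk -v sep='; ' '
    /^[0-9]/ {                       # a numbered line starts a new logical line
        if (have) print buf          # flush the previous logical line first
        buf = $0; have = 1
        next
    }
    { buf = (have ? buf sep : "") $0; have = 1 }   # continuation: append to the buffer
    END { if (have) print buf }      # flush whatever is left
')

printf '%s\n' "$merged"
```

A file whose first line does not start with a digit is handled too: that line simply becomes the start of the first buffered line.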

print whole variable contents if the number of lines are greater than N

How to print all lines if certain condition matches.
Example:
echo "$ip"
this is a sample line
another line
one more
last one
If this file has more than 3 lines then print the whole variable.
I tried:
echo $ip| awk 'NR==4'
last one
echo $ip|awk 'NR>3{print}'
last one
echo $ip|awk 'NR==12{} {print}'
this is a sample line
another line
one more
last one
echo $ip| awk 'END{x=NR} x>4{print}'
Need to achieve this:
If this file has more than 3 lines then print the whole file. I can do this using wc and bash but need a one liner.
The right way to do this (no echo, no pipe, no loops, etc.):
$ awk -v ip="$ip" 'BEGIN{if (gsub(RS,"&",ip)>2) print ip}'
this is a sample line
another line
one more
last one
You can use Awk as follows,
echo "$ip" | awk '{a[$0]; next}END{ if (NR>3) { for(i in a) print i }}'
one more
another line
this is a sample line
last one
you can also make the value 3 configurable from an awk variable,
echo "$ip" | awk -v count=3 '{a[$0]; next}END{ if (NR>count) { for(i in a) print i }}'
The idea is to store each line in the array via {a[$0]; next} as it is processed. By the time the END block runs, NR holds the number of lines in the string or file. The lines are printed only if that count exceeds 3, or whatever value is configured. Note that for (i in a) visits the keys in an unspecified order and that identical lines collapse into a single key.
Always double-quote variables in bash so that the shell does not word-split them.
Using James Brown's useful comment below to preserve the order of lines, do
echo "$ip" | awk -v count=3 '{a[NR]=$0; next}END{if(NR>count)for(i=1;i<=NR;i++)print a[i]}'
this is a sample line
another line
one more
last one
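If an array is not needed, the lines can also be collected in a single string, which keeps both order and duplicates. This is a minimal sketch; min and buf are assumed names, and the threshold is passed with -v as above.

```shell
ip=$(printf '%s\n' 'this is a sample line' 'another line' 'one more' 'last one')

# min is an assumed variable name for the threshold.
shown=$(printf '%s\n' "$ip" | awk -v min=3 '
    { buf = buf $0 ORS }                     # append every line, in order, duplicates included
    END { if (NR > min) printf "%s", buf }   # emit the buffer only past the threshold
')

# Same program with a higher threshold: nothing is printed for four lines.
hidden=$(printf '%s\n' "$ip" | awk -v min=4 '
    { buf = buf $0 ORS }
    END { if (NR > min) printf "%s", buf }
')

printf '%s\n' "$shown"
```

Like the array versions, this holds the whole input in memory, which is fine for a shell variable but worth keeping in mind for large files.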
Another in awk. First test files:
$ cat 3
1
2
3
$ cat 4
1
2
3
4
Code:
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 3 # look ma, no lines
[this line left intentionally blank. no wait!]
$ awk 'NR<4{b=b (NR==1?"":ORS)$0;next} b{print b;b=""}1' 4
1
2
3
4
Explained:
NR<4 { # for the first 3 records
b=b (NR==1?"":ORS) $0 # buffer them to b with ORS delimiter
next # proceed to next record
}
b { # if buffer has records, ie. NR>=4
print b # output buffer
b="" # and reset it
}1 # print all records after that

Word Count using AWK

I have a file like the one below:
this is a sample file
this file will be used for testing
I want to count the words using AWK.
the expected output is
this 2
is 1
a 1
sample 1
file 2
will 1
be 1
used 1
for 1
I have written the AWK command below, but I am getting some errors:
cat anyfile.txt|awk -F" "'{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}'
It works fine for me:
awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile
used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1
PS: you do not need to set -F" ", since the default field separator already splits on any run of blanks.
PS2: do not use cat with programs that can read files themselves, like awk.
You can pipe the output to sort to order it by count:
awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile | sort -k 2 -n
a 1
be 1
for 1
is 1
sample 1
testing 1
used 1
will 1
file 2
this 2
Instead of looping over the fields of each line and saving each word in the array ({for(i=1;i<=NF;i++) a[$i]++}), use gawk, which supports a multi-character (regex) RS (record separator), so that every word becomes its own record. This is slightly faster:
gawk '{a[$0]++} END{for (k in a) print k,a[k]}' RS='[[:space:]]+' file
Output:
used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1
In the gawk command above, the record separator is defined as the character class [[:space:]]+, that is, one or more whitespace characters, including newlines.
Here is Perl code which provides similar sorted output to Jotne's awk solution:
perl -ne 'for (split /\s+/, $_){ $w{$_}++ }; END{ for $key (sort keys %w) { print "$key $w{$key}\n"}}' testfile
$_ is the current line, which is split based on whitespace /\s+/
Each word is then put into $_
The %w hash stores the number of occurrences of each word
After the entire file is processed, the END{} block is run
The keys of the %w hash are sorted alphabetically
Each word $key and number of occurrences $w{$key} is printed
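To get the most frequent words first, the awk tally can be combined with a two-key sort. This is a sketch on the two-line sample; the array name seen is arbitrary and LC_ALL=C is set only to make the alphabetical tie-break predictable.

```shell
text=$(printf '%s\n' 'this is a sample file' 'this file will be used for testing')

counts=$(printf '%s\n' "$text" |
    awk '{ for (i = 1; i <= NF; i++) seen[$i]++ }   # tally every field
         END { for (w in seen) print w, seen[w] }' |
    LC_ALL=C sort -k2,2nr -k1,1)                    # count descending, then word ascending

printf '%s\n' "$counts"
```

The first key sorts numerically in reverse on the count column, and the second key breaks ties by word.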

Awk - use particular line again to match with patterns

Suppose I have a file:
1Alorem
2ipsuml
3oremip
4sumZAl
5oremip
6sumlor
7emZips
I want to split text from lines containing A to lines containing Z match with range:
/A/,/Z/ {
print > "rangeX.txt"
}
I want this particular input to give me 2 files:
1Alorem
2ipsuml
3oremip
4sumZAl
and
4sumZAl
5oremip
6sumlor
7emZips
The problem is that line 4 is taken only once and is matched as the end of the range, but the 2nd range never starts because there is no A in the other lines.
Is there a way to try to match line 4 again against all patterns or tell awk that it has to start new range?
Thanks
As Arne pointed out, the second section will not be caught by the current pattern. Here is an alternative without the range.
awk 'p==0 {p=($0~/A/); filenr++} p==1 {print > ("range" filenr ".txt"); p=($0!~/Z/); if(!p && $0~/A/){filenr++; p=1; print > ("range" filenr ".txt")}}' test.txt
It also handles more than two sections
All you need to do is save the last line of the first range to a variable and then reprint that variable, along with the following range, for the second file.
In other words, since you're just looping through each line, define an empty variable in your BEGIN and then update it each time through. You'll have the variable saved as the last line when your range ends. Write out that line to the next file before you begin again.
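The description above can be sketched with a small state machine instead of a saved variable: the line that closes one range is simply printed again into the next file when it also contains A. The names inside, out and n, the temporary directory, and the range1.txt/range2.txt file names are assumptions for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 1Alorem 2ipsuml 3oremip 4sumZAl 5oremip 6sumlor 7emZips > "$workdir/test.txt"

(
    cd "$workdir" || exit 1
    awk '
    inside {
        print > out                          # line belongs to the open range
        if (/Z/) { close(out); inside = 0 }  # Z ends the range
    }
    !inside && /A/ {                         # A opens a range, also on the line that just closed one
        out = "range" (++n) ".txt"
        print > out
        inside = 1
    }
    ' test.txt
)

cat "$workdir/range1.txt"
```

Unlike the /A/,/Z/ range pattern, a line that opens a range here is never treated as its own end, so a Z on the opening line does not close the new file.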
There is no way to rematch a record, but writing a variant of the pattern is an option. Here the second range pattern matches from a line containing A and Z to a line containing Z but not A:
awk '/A/,/Z/ {print 1, $0} (/A/ && /Z/),(/Z/ && !/A/) {print 2, $0}'
prints:
1 1Alorem
1 2ipsuml
1 3oremip
1 4sumZAl
2 4sumZAl
2 5oremip
2 6sumlor
2 7emZips
As your sample is a bit synthetic I don't know if that solution fits your real problem.