How to remove all lines after a line containing some string? - awk

I need to remove all lines after the first line that contains the word "fox".
For example, for the following input file "in.txt":
The quick
brown fox jumps
over
the
lazy dog
the second fox
is hunting
The result will be:
The quick
brown fox jumps
I prefer a script made in awk or sed but any other command line tools are good, like perl or php or python etc.
I am using gnuwin32 tools in Windows, and the solution I could find was this one:
grep -m1 -n fox in.txt | cut -f1 -d":" > var.txt
set /p MyNumber=<var.txt
head -%MyNumber% in.txt > out.txt
However, I am looking for a solution that is shorter and that is portable (this one contains Windows specific command set /p).
Similar questions:
How to delete all the lines after the last occurence of pattern?
How to delete lines before a match perserving it?
Remove all lines before a match with sed
How to delete the lines starting from the 1st line till line before encountering the pattern '[ERROR] -17-12-2015' using sed?
How to delete all lines before the first and after the last occurrence of a string?

awk '{print} /fox/{exit}' file
With GNU sed:
sed '0,/fox/!d' file
or
sed -n '0,/fox/p' file
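For a variant that does not rely on GNU extensions, here is a minimal sketch, assuming only a POSIX sed: the q command prints the current line and then quits, so nothing after the first match is output. The helper name keep_through_first_fox is hypothetical.

```shell
# Hypothetical helper: copy stdin up to and including the first line matching "fox".
keep_through_first_fox() {
    # q prints the current pattern space (the matching line) and stops reading input.
    sed '/fox/q'
}

# Sample input taken from the question.
result=$(printf '%s\n' 'The quick' 'brown fox jumps' 'over' 'the' 'lazy dog' \
    'the second fox' 'is hunting' | keep_through_first_fox)
printf '%s\n' "$result"
```

On a file this would simply be sed '/fox/q' in.txt > out.txt.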

Related

Delete everything before first pattern match with sed/awk

Let's say I have a line looking like this:
/Users/random/354765478/Tests/StoreTests/Base64Tests.swift
In this example, I would like to get the result:
Tests/StoreTests/Base64Tests.swift
How can I get everything from the first pattern match (either Sources or Tests) onward, using sed or awk?
I am using sed 's/^.*\(Tests.*\).*$/\1/' right now, but it's failing:
echo '/Users/random/354765478/Tests/StoreTests/Base64Tests.swift' | sed 's/^.*\(Tests\)/\1/'
Tests.swift
Here's another example using Sources (which seems to work):
echo '/Users/random/741672469/Sources/Store/StoreDataSource.swift' | sed 's/^.*\(Sources\)/\1/'
Sources/Store/StoreDataSource.swift
I would like to keep everything from the first, not the last, Sources or Tests match onward.
Any help would be appreciated!
How can I get everything from the first pattern match (either Sources or Tests) onward?
Easier to use a grep -o here:
grep -Eo '(Sources|Tests)/.*' file
Tests/StoreTests/Base64Tests.swift
Sources/Store/StoreDataSource.swift
# where input file is
cat file
/Users/random/354765478/Tests/StoreTests/Base64Tests.swift
/Users/random/741672469/Sources/Store/StoreDataSource.swift
Breakdown:
The regex pattern (Sources|Tests)/.* matches any text that starts with Sources/ or Tests/ and runs to the end of the line.
-E: enables extended regex mode
-o: prints only matched text instead of full line
Alternatively you may use this awk as well:
awk 'match($0, /(Sources|Tests)\/.*/) {
print substr($0, RSTART)
}' file
Tests/StoreTests/Base64Tests.swift
Sources/Store/StoreDataSource.swift
Or this sed:
sed -E 's~.*/((Sources|Tests)/.*)~\1~' file
Tests/StoreTests/Base64Tests.swift
Sources/Store/StoreDataSource.swift
With the samples shown, try the following GNU grep command. It finds the first /Sources or /Tests and prints from Sources or Tests through the end of the line.
grep -oP '^.*?\/\K(Sources|Tests)\/.*' Input_file
Using sed
$ sed -E 's~([^/]*/)+((Tests|Sources).*)~\2~' input_file
Tests/StoreTests/Base64Tests.swift
I would like to keep everything from the first, not the last, Sources or Tests match onward.
First, understand why this happens. You are using
sed 's/^.*\(Tests.*\).*$/\1/'
Note that * is greedy, i.e. it matches as much as possible, so the leading .* always runs up to the last Tests. A non-greedy quantifier would stop at the first Tests, but sed does not support one. If you are on Linux, there is a good chance you have perl, which does. Let the content of file.txt be
/Users/random/354765478/Tests/StoreTests/Base64Tests.swift
then
perl -p -e 's/^.*?(Tests.*)$/$1/' file.txt
gives output
Tests/StoreTests/Base64Tests.swift
Explanation: -p -e makes perl behave like sed (apply the expression to each line and print the result). Changes to the regular expression: parentheses no longer need escaping, the first .* (greedy) becomes .*? (non-greedy), and the last .* is dropped as superfluous, since the capturing group already extends to the end of the line.
(tested in perl 5, version 30, subversion 0)
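Where perl is not available, the same effect can be sketched in awk, assuming any POSIX awk. match() always reports the leftmost match, so taking substr() from RSTART keeps everything from the first Sources/ or Tests/ component, which is what the non-greedy .*? achieves in the perl command. The function name from_first_marker is made up for illustration.

```shell
# Hypothetical helper: print each input line starting at its first "Sources/" or "Tests/".
from_first_marker() {
    # match() sets RSTART to the position of the leftmost match (0 if none),
    # so lines without either marker are skipped.
    awk 'match($0, /(Sources|Tests)\//) { print substr($0, RSTART) }'
}

first=$(printf '%s\n' '/Users/random/354765478/Tests/StoreTests/Base64Tests.swift' | from_first_marker)
second=$(printf '%s\n' '/Users/random/741672469/Sources/Store/StoreDataSource.swift' | from_first_marker)
printf '%s\n' "$first" "$second"
```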

Awk to print a single next word following a pattern match

This question is a variation on the theme of printing something after a pattern.
There will be input lines with words. Some lines will match a pattern, where the pattern is one or more words separated by spaces. The pattern might have a leading or trailing space, which must be respected. I need to print the word immediately following the match.
Example input
The quick brown fox jumps over the lazy dog
Pattern : "brown fox "
Desired output : jumps
The pattern will only occur once in the line. There will always be a word following the pattern. There will be lines without the pattern.
awk or sed would be nice.
Cheers.
EDIT :
I failed to ask the question properly. There will be one or more spaces between the pattern and the next word. This breaks Andre's proposal.
% echo -e "The quick brown fox jumps over the lazy dog\n" | awk -F 'brown fox ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }'
jumps
% echo -e "The quick brown fox  jumps over the lazy dog\n" | awk -F 'brown fox ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }'
This works, given that the desired word is followed by a space:
$ echo -e "The quick brown fox jumps over the lazy dog\n" > file
$ awk -F 'brown fox ' 'NF>1{ sub(/ .*/,"",$NF); print $NF }' file
jumps
Edit:
If there are more spaces, use this:
$ awk -F 'brown fox' 'NF>1{ sub(/^ */,"",$NF);
sub(/ .*/,"",$NF); print $NF }' file
Disclaimer: this solution assumes that printing an empty line is acceptable when the pattern is not found (the question says there will be lines without the pattern). If that does not hold, ignore this answer entirely.
I would use AWK for this in the following way. Let the content of file.txt be
The quick brown fox jumps over the lazy dog
No animals in this line
The quick brown fox jumps over the lazy dog
then
awk 'BEGIN{FS="brown fox *"}{sub(/ .*/,"",$2);print $2}' file.txt
output
jumps

jumps
Explanation: I set the field separator FS to "brown fox" followed by any number of spaces. Whatever comes after it lands in the 2nd field. I remove from the 2nd field everything from the first space onward, then print that field. If there is no match, the 2nd field is empty and these actions produce an empty line.
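If the empty lines for non-matching input are unwanted, a variant sketch prints only the lines that contain the pattern. This is an assumption about the desired behaviour; it combines the NF>1 guard from the earlier answer with this field separator. The function name is hypothetical.

```shell
# Hypothetical helper: print the word after "brown fox", skipping lines without it.
next_word_after_brown_fox() {
    # NF > 1 is true only when the separator occurred, i.e. the line matched.
    awk -F 'brown fox *' 'NF > 1 { sub(/ .*/, "", $2); print $2 }'
}

words=$(printf '%s\n' \
    'The quick brown fox jumps over the lazy dog' \
    'No animals in this line' \
    'The quick brown fox    jumps over the lazy dog' | next_word_after_brown_fox)
printf '%s\n' "$words"
```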
With GNU grep:
$ grep -oP '(?<=brown fox )(\w+)' file
jumps
If you have more than 1 space after the match:
$ echo 'The quick brown fox jumps over the lazy dog' | grep -oP '(?<=\bbrown fox\b)\s+\K(\w+)'
jumps
Perl, with the same regex:
$ perl -lnE 'print $1 if /(?<=\bbrown fox )(\w+)/' file
Or, if you have multiple spaces:
$ perl -lnE 'print $1 if /(?<=brown fox)\s+(\w+)/' file
(As stated in comments, both the GNU grep and Perl regex could be \bbrown\h+fox\h+\K\w+ which has the advantage of supporting multiple spaces between brown and fox)
With awk, you can split on the string and split the result (this works as-is for multi spaces):
pat='brown fox'
awk -v pat="$pat" 'index($0, pat){
split($0,arr, pat)
split(arr[2], arr2)
print arr2[1]}' file
With GNU awk, you might also use a capture group with function match.
\ybrown\s+fox\s+(\w+)
\y A word boundary
brown\s+ Match brown and 1+ whitespace chars
fox\s+ Match fox and 1+ whitespace chars
(\w+) Capture 1+ word chars in group 1
In awk, get the group 1 value using arr[1]
Example
echo "The quick brown fox jumps over the lazy dog" |
awk 'match($0,/\ybrown\s+fox\s+(\w+)/, arr) {print arr[1]}'
Output
jumps
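The three-argument match() is a gawk extension. A rough portable equivalent, assuming only POSIX awk features (so no \y, \s or \w, and therefore no word-boundary check), uses RSTART and RLENGTH to find where the match ends and then takes the first whitespace-separated word of the remainder. The name word_after is hypothetical.

```shell
# Hypothetical helper: print the word that follows "brown fox" (spaces or tabs in between).
word_after() {
    awk 'match($0, /brown[ \t]+fox[ \t]+/) {
        rest = substr($0, RSTART + RLENGTH)   # text right after the matched pattern
        split(rest, words, " ")               # default whitespace splitting
        print words[1]
    }'
}

word=$(printf '%s\n' 'The quick brown   fox    jumps over the lazy dog' | word_after)
printf '%s\n' "$word"
```

Unlike (\w+), this returns the whole whitespace-delimited token, so trailing punctuation would be included.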

Get the line number of the last line with non-blank characters

I have a file which has the following content:
10 tiny toes
tree
this is that tree
5 funny 0
There are spaces at the end of the file. I want to get the line number of the last line of the file that contains characters. How do I do that in sed?
This is easily done with awk,
awk 'NF{c=FNR}END{print c}' file
With sed it is trickier. You can use the = command, but it writes the line number to standard output rather than into the pattern space, so you cannot manipulate it. If you want to use sed, you'll have to pipe its output to another sed or to tail:
sed -n '/^[[:blank:]]*$/!=' file | tail -1
You can use the following pseudo-code:
Replace all spaces by empty string
Remove all <beginning_of_line><end_of_line> (the lines, only containing spaces, will be removed like this)
Count the number of remaining lines in your file
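The steps above can be sketched as a short pipeline. Note the assumption: counting the surviving lines gives the number of the last non-blank line only when blank lines occur solely at the end of the file, as in the sample. The function name is hypothetical.

```shell
# Hypothetical helper: count the lines on stdin that contain a non-blank character.
count_nonblank_lines() {
    sed 's/[[:blank:]]//g' |   # step 1: strip spaces and tabs
        sed '/^$/d' |          # step 2: drop lines that are now empty
        wc -l | tr -d ' '      # step 3: count what is left (tr strips padding some wc versions add)
}

n=$(printf '10 tiny toes\ntree\nthis is that tree\n5 funny 0\n   \n\n' | count_nonblank_lines)
printf '%s\n' "$n"
```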
It's tough to work with line numbers in sed. The = command writes the line number straight to standard output, not into the pattern space, so sed itself cannot filter on it. You could use an external tool to generate line numbers and do something like:
nl -s ' ' -n ln -ba input | sed -n 's/^\(......\)...*/\1/p' | sed -n '$p'
but if you're going to do that you might as well just use awk.
This might work for you (GNU sed):
sed -n '/\S/=' file | sed -n '$p'
For all lines that contain a non white space character, print a line number. Pipe this output to second invocation of sed and print only the last line.
Alternative:
grep -n '\S' file | sed -n '$s/:.*//p'

If pattern matched delete newline character in that line

Let's say the pattern is the string "Love".
input
This is some text
Love this or that
He is running like a rabbit
output
This is some text
Love this or thatHe is running like a rabbit
I've noticed that sed is very unpleasant for deleting newline characters, any idea?
You can use this:
sed '/^Love/{N;s/\n//;}' love.txt
details:
/^Love/ identifies the line to treat, if you like you can use /[Ll]ove/ instead
N adds the next line to the pattern space. After this command the pattern space contains Love this or that\nHe is running like a rabbit
s/\n// removes the newline character
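One edge case, offered as a hedged refinement rather than part of the answer: when the matching line is the last line of the input, there is no next line for N to append. GNU sed prints the pattern space anyway, but POSIX allows sed to exit without printing it, so some implementations would drop that line. Guarding N with $! avoids this. The function name is hypothetical.

```shell
# Hypothetical helper: join each line starting with "Love" to the line that follows it.
join_after_love() {
    # $!N appends the next line only when the current line is not the last one.
    sed '/^Love/{$!N;s/\n//;}'
}

joined=$(printf '%s\n' 'This is some text' 'Love this or that' \
    'He is running like a rabbit' | join_after_love)
printf '%s\n' "$joined"
```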
Perl:
$ perl -pe 's/^(Love[^\n]*)\n/$1/' file.txt
This is some text
Love this or thatHe is running like a rabbit
Or, if the intent is solely focused on the \n you can chomp based on a pattern:
$ perl -pe 'chomp if /^Love/' file.txt
This is some text
Love this or thatHe is running like a rabbit
$ awk '/Love/{printf "%s ",$0;next} 1' file
This is some text
Love this or that He is running like a rabbit
Explanation:
/Love/{printf "%s ",$0;next}
For lines that contain Love, the line is printed, via printf, without a newline. awk then starts over on the next line.
1
For lines that don't include Love, they are printed normally (with a newline). The 1 command is awk's cryptic shorthand for print normally.
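The sample output in the question joins the two lines without a space. A small variation on the same idea, assuming that exact output is wanted, drops the space from the printf format. The function name is hypothetical.

```shell
# Hypothetical helper: print lines containing "Love" without a trailing newline or space.
join_no_space() {
    awk '/Love/ { printf "%s", $0; next } 1'
}

out=$(printf '%s\n' 'This is some text' 'Love this or that' \
    'He is running like a rabbit' | join_no_space)
printf '%s\n' "$out"
```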
Through Perl,
$ perl -pe 's/^Love.*\K\n//' file
This is some text
Love this or thatHe is running like a rabbit
\K keeps everything matched before it out of the match, so only the newline is replaced.
OR
$ perl -pe '/^Love/ && s/\n//' file
This is some text
Love this or thatHe is running like a rabbit
If a line starts with the string Love, then it removes the newline character from that line.
Here is another awk variation:
awk '{ORS=(/Love/?FS:RS)}1' file
This is some text
Love this or that He is running like a rabbit
This changes ORS based on the pattern.
Here is another awk variation:
awk '{printf "%s%s",$0,(/Love/?FS:RS)}' file
This is some text
Love this or that He is running like a rabbit
If the line contains Love, FS is used as the terminator; otherwise RS is used.
This should work too, but prefer the first one:
awk '{printf "%s"(/Love/?FS:RS),$0}' file

Extract data from ASCII file with grep/AWK

I have a long ASCII log-file from a simulation and need to extract some data from it.
The lines I want have the structure:
Main step= 1 a= 0.00E+00 b=-6.85E-08 c= 4.58E-08
The phrase "Main step" is only used in the lines I want. This is easy to grep for, but I also want to include the next line following the line above, which has the structure:
Fine step= 1 t=-1.31854E+01
Note that "Fine step" is used other places in the log-file.
My question boils down to this: How can I extract the lines containing a keyword/phrase (here "Main step") and also make sure that I get the next following line using grep or AWK or some other standard Linux program?
You can use sed
sed -n '/Main step/,/./p' inputFile
This prints only the lines in a range that starts at a line containing Main step and ends at the next line matching . (any character), which is normally the very next line. Effectively, every Main step line and the line following it are printed.
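Because the range only closes on a line containing at least one character, an empty line right after Main step would keep the range open. A sketch that prints exactly the matching line and the one after it, whatever its content, uses n instead. The function name and the extra Fine step input lines are made up for illustration.

```shell
# Hypothetical helper: print each "Main step" line plus the single line after it.
main_and_next() {
    # p prints the matching line; n loads the next input line; the second p prints it.
    sed -n '/Main step/{p;n;p;}'
}

pairs=$(printf '%s\n' \
    'Fine step= 0 t= 0.00000E+00' \
    'Main step= 1 a= 0.00E+00 b=-6.85E-08 c= 4.58E-08' \
    'Fine step= 1 t=-1.31854E+01' \
    'Fine step= 2 t=-1.20000E+01' | main_and_next)
printf '%s\n' "$pairs"
```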
Since the question is tagged awk, here is one that uses awk's getline function:
awk '/Main step/{print; getline; print}' file
It would print the Main step line and also the next line.
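A slightly more defensive sketch of the same approach checks getline's return value, so that a Main step line at the very end of the file is not printed twice (plain getline leaves $0 unchanged at end of input). The function name and the variable name nextline are arbitrary.

```shell
# Hypothetical helper: print each "Main step" line and, if one exists, the next line.
main_and_next_awk() {
    # getline returns 1 when a line was read, 0 at end of input.
    awk '/Main step/ { print; if ((getline nextline) > 0) print nextline }'
}

normal=$(printf '%s\n' 'Main step= 1 a= 0.00E+00' 'Fine step= 1 t=-1.31854E+01' \
    'Fine step= 2 t=-1.20000E+01' | main_and_next_awk)
at_end=$(printf '%s\n' 'Fine step= 9 t= 1.0' 'Main step= 2 a= 0.00E+00' | main_and_next_awk)
printf '%s\n' "$normal" "$at_end"
```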
Because you tagged "grep", and since this is the most obvious solution to me:
grep -A1 'Main step' file
...although this will add "--" between matches. So to get the same output as the awk and sed answer:
grep -A1 'Main step' file | grep -v '^--$'