What is a more efficient way to get the first match from a reverse file search using some combination of awk grep and sed

I am working on an operating system with limited utilities. Utilities like tail, head, and tac are not available. sed, awk, and grep are available, but grep does not have the -m option for stopping after the first match. See the list of available options here.
My goal is to search a potentially large log.txt file (around 100 MB) from the end, in reverse, for a line containing a string, and print that line. The catch is that the operation has to be fast: no more than 3-4 seconds.
I tried using sed to reverse the contents of the file into another file and then using awk and grep in a loop to search chunks of 10,000 lines, but the sed reverse was far too slow for anything beyond a few MB.
Something I tried:
self.sed_line_search = 10001
self.sed_cmd = "sed -e :a -e '$q;N;"+str(self.sed_line_search)+",$D;ba'"
self.awk_cmd = "awk '/Version/{print}'"
self.Command = self.sed_cmd + " " + LOGFILE_PATH + " | " + self.awk_cmd + "\n"
tries, max_tries = 1,5
while tries < max_tries:
    ret = execute(self.Command)
    if not ret:
        self.sed_line_search += 10000
        self.sed_cmd = "sed -e :a -e '$q;N;"+str(self.sed_line_search)+",$D;ba'"
        self.Command = self.sed_cmd + " " + LOGFILE_PATH + " | " + self.awk_cmd + "\n"
    tries += 1
Without knowing how to stop at the first match without the grep -m 1 option, this partly achieves that goal by only looking at a few thousand lines at a time. But it does not search in reverse.

Not sure if this is what you want. It searches for all lines containing test and prints them in reverse.
cat file
dfsdf
test1
fsdfsdf
fdg
sfdgs
fdgsdf
gsfdg
sfdte
test2
dgsfdgsdf
fdgsfdg
sdfgs
df
test3
sfdgsfdg
awk '/test/ {a[++x]=$0} END {for (i=x;i>=1;i--) print a[i]}' file
test3
test2
test1

This might work for you (GNU sed):
sed -n '/regexp/h;$!b;x;p' file
Copy the line that matches regexp to the hold space and at the end of the file print the hold space.
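The same "remember only the latest match" idea can be sketched in awk, which the question says is available. This is only a sketch: the helper name last_match is made up for illustration, and it still reads the whole file once from the top, so whether it fits the 3-4 second budget on a ~100 MB log would have to be measured on the target system.

```shell
# last_match REGEXP [FILE...]   (hypothetical helper name)
# Prints the last line matching REGEXP, i.e. the first match when
# searching from the end, without tail, tac, or grep -m.
last_match() {
    re=$1
    shift
    awk -v re="$re" '
        $0 ~ re { last = $0; found = 1 }   # remember only the newest match
        END     { if (found) print last }  # print it once input is exhausted
    ' "$@"
}

# Small demonstration on inline data instead of the real log file.
printf 'Version 1\nnoise\nVersion 2\nnoise\n' | last_match 'Version'
```

On the real log it would be called as last_match 'Version' log.txt. Only one line is held in memory at a time, unlike the array-based awk answer above.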

IMHO the fastest you could do would be:
grep 'regexp' file | sed -n '$p'

Related

How do I decrement all array indexes in a text file?

Background
I have a text file that looks like the following:
$SomeText.element_[1]="MoreText[3]";\r"
$SomeText.element_[2]="MoreText[6]";\r"
$SomeText.element_[3]="MoreText[2]";\r"
$SomeText.element_[4]="MoreText[1]";\r"
$SomeText.element_[5]="MoreText[5]";\r"
This goes on for over a thousand lines. I want to do the following:
$SomeText.element_[0]="MoreText[3]";\r"
$SomeText.element_[1]="MoreText[6]";\r"
$SomeText.element_[2]="MoreText[2]";\r"
$SomeText.element_[3]="MoreText[1]";\r"
$SomeText.element_[4]="MoreText[5]";\r"
Each line of text in the file should have the left most index reduced by one, with the rest of the text unchanged.
Attempted Solutions
So far I have tried the following...but the issue for me is I do not know how to feed it back into the file properly:
Attempt 1
I tried a double cutting technique:
cat file.txt | cut -d '[' -f2 | cut -d ']' -f1 | xargs -I {} expr {} - 1
This properly outputs all of the indices reduced by one to the command line.
Attempt 2
I tried using awk with a mix of sed, but this caused my machine to hang:
awk -F'[' '{printf("%d\n", $2-1)}' file.txt | xargs -I {} sed -i 's/\[\d+\]/{}/g' file.txt
Question
How do I properly decrement all of the array indexes in the file by one and write the decremented indexes back into the right location in the text file?

A Perl one-liner makes this easy, overwriting the input file:
perl -pi -e 's/(\d+)/$1-1/e' your-file-name-here
(assuming the first number on each line is the index)

With plain awk you could try the following, written and tested with the shown samples. It renumbers the first index sequentially, starting from 0:
awk '
match($0,/\[[^]]*/){
  print substr($0,1,RSTART) count++ substr($0,RSTART+RLENGTH)
}
' Input_file
Or, if the indexes between [..] can appear in any order, subtract 1 from each one instead:
awk '
match($0,/\[[^]]*/){
  print substr($0,1,RSTART) substr($0,RSTART+1,RLENGTH-1)-1 substr($0,RSTART+RLENGTH)
}
' Input_file
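The question also asks how to get the result back into the file. Outside GNU awk there is no in-place option, so one common approach is to write to a scratch file and move it over the original. The sketch below assumes that approach; the helper name decrement_first_index and the scratch-file naming are made up for illustration.

```shell
# decrement_first_index FILE   (hypothetical helper name)
# Rewrites FILE with the first [number] on each line reduced by one.
decrement_first_index() {
    file=$1
    tmp="$file.tmp.$$"    # assumed scratch name next to the original
    awk '
        match($0, /\[[0-9]+\]/) {
            # RSTART points at "[", so the digits span RLENGTH-2 characters
            n = substr($0, RSTART + 1, RLENGTH - 2)
            $0 = substr($0, 1, RSTART) (n - 1) substr($0, RSTART + RLENGTH - 1)
        }
        { print }    # lines without an index pass through unchanged
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

Called as decrement_first_index file.txt, it only replaces the original if awk succeeded, because mv runs after the && check.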

With GNU sed and bash:
sed -E "s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e" file
Or, if it is possible that the lines contain a ' character:
sed -E "
/\[[0-9]+]/{
s/'/'\\\''/g
s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e
}" file

How to print lines only after specific line (pattern) in awk? [duplicate]

Question: I'd like to print a single line directly following a line that contains a matching pattern.
My version of sed will not take the following syntax (it bombs out on +1p) which would seem like a simple solution:
sed -n '/ABC/,+1p' infile
I assume awk would be better to do multiline processing, but I am not sure how to do it.

Never use the word "pattern" in this context as it is ambiguous. Always use "string" or "regexp" (or in shell "globbing pattern"), whichever it is you really mean. See How do I find the text that matches a pattern? for more about that.
The specific answer you want is:
awk 'f{print;f=0} /regexp/{f=1}' file
or specializing the more general solution of the Nth record after a regexp (idiom "c" below):
awk 'c&&!--c; /regexp/{c=1}' file
The following idioms describe how to select a range of records given a specific regexp to match:
a) Print all records from some regexp:
awk '/regexp/{f=1}f' file
b) Print all records after some regexp:
awk 'f;/regexp/{f=1}' file
c) Print the Nth record after some regexp:
awk 'c&&!--c;/regexp/{c=N}' file
d) Print every record except the Nth record after some regexp:
awk 'c&&!--c{next}/regexp/{c=N}1' file
e) Print the N records after some regexp:
awk 'c&&c--;/regexp/{c=N}' file
f) Print every record except the N records after some regexp:
awk 'c&&c--{next}/regexp/{c=N}1' file
g) Print the N records from some regexp:
awk '/regexp/{c=N}c&&c--' file
I changed the variable name from "f" for "found" to "c" for "count" where appropriate as that's more expressive of what the variable actually IS.
f is short for found. It's a boolean flag that I'm setting to 1 (true) when I find a string matching the regular expression regexp in the input (/regexp/{f=1}). In the other place you see f on its own in each script, it's being tested as a condition, and when true it causes awk to execute its default action of printing the current record. So input records only get output after we see regexp and set f to 1/true.
c && c-- { foo } means "if c is non-zero then execute foo, and decrement c afterwards", because c-- is a post-decrement: the condition sees the value c had before the decrement. So if c starts at 3, foo is executed and c drops to 2; on the next input line foo is executed again and c drops to 1; on the next input line foo is executed a third time and c drops to 0; after that c is 0, a false condition, so foo is no longer executed and c is no longer decremented. We do c && c-- instead of just testing for c-- > 0 so we can't run into a case with a huge input file where c hits zero and continues getting decremented so often it wraps around and becomes positive again.
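To see the difference between the counting idioms, here is a small sketch with N=2 on ten generated lines. The function names and the sample data are made up for the demonstration; the awk programs are idioms e and c from the list above with /line3/ standing in for regexp.

```shell
# Ten numbered lines; "line3" plays the role of the regexp match.
sample() { printf 'line%d\n' 1 2 3 4 5 6 7 8 9 10; }

# Idiom e with N=2: print the 2 records after the match.
two_after() { awk 'c&&c--; /line3/{c=2}'; }

# Idiom c with N=2: print only the 2nd record after the match.
second_after() { awk 'c&&!--c; /line3/{c=2}'; }

sample | two_after      # prints line4 and line5
sample | second_after   # prints line5
```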

It's the line after that match that you're interested in, right? In sed, that could be accomplished like so:
sed -n '/ABC/{n;p}' infile
Alternatively, grep's -A option might be what you're looking for.
-A NUM, Print NUM lines of trailing context after matching lines.
For example, given the following input file:
foo
bar
baz
bash
bongo
You could use the following:
$ grep -A 1 "bar" file
bar
baz
$ sed -n '/bar/{n;p}' file
baz

I needed to print ALL lines after the pattern (OK Ed, REGEX), so I settled on this one:
sed -n '/pattern/,$p' # prints all lines after (and including) the pattern
But since I wanted to print all the lines AFTER the pattern (excluding the matching line itself):
sed -n '/pattern/,$p' | tail -n+2 # all lines after first occurrence of pattern
I suppose in your case you can add a head -1 at the end:
sed -n '/pattern/,$p' | tail -n+2 | head -1 # prints line after pattern
And I really should include tlwhitec's comment in this answer (since their sed-only approach is more elegant than my suggestions):
sed '0,/pattern/d'
The above script deletes every line starting with the first and stopping with (and including) the first line that matches the pattern. All lines after that are printed. Note that the 0,/pattern/ address form is a GNU sed extension.

awk Version:
awk '/regexp/ { getline; print $0; }' filetosearch

If the pattern matches, append the next line to the pattern space, delete everything up to and including the embedded newline, then quit; the side effect of quitting is to print what is left.
sed '/pattern/ { N; s/.*\n//; q }; d'

Actually sed -n '/pattern/{n;p}' filename will fail if the pattern matches consecutive lines:
$ seq 15 |sed -n '/1/{n;p}'
2
11
13
15
The expected answers should be:
2
11
12
13
14
15
My solution is:
$ sed -n -r 'x;/_/{x;p;x};x;/pattern/!s/.*//;/pattern/s/.*/_/;h' filename
For example:
$ seq 15 |sed -n -r 'x;/_/{x;p;x};x;/1/!s/.*//;/1/s/.*/_/;h'
2
11
12
13
14
15
Explanation:
x;: at the beginning of each input line, use the x command to exchange the contents of the pattern space and the hold space.
/_/{x;p;x};: if the pattern space, which now holds what was in the hold space, contains _ (a marker indicating that the previous line matched the pattern), then use x to bring the current line back into the pattern space, p to print it, and x to swap back again.
x: restore the original contents of the pattern space and the hold space.
/pattern/!s/.*//: if the current line does NOT match the pattern, which means we should NOT print the following line, use s/.*// to delete everything in the pattern space.
/pattern/s/.*/_/: if the current line matches the pattern, which means we should print the following line, set the marker by using s/.*/_/ to replace the whole pattern space with _ (the second command tests for it on the next cycle).
h: overwrite the hold space with the contents of the pattern space; the hold space then contains _ if the current line matched the pattern, or is empty if it did not.
The two substitution commands must stay in this order: after s/.*/_/ the pattern space no longer matches /pattern/, so if /pattern/!s/.*// ran afterwards it would erase the marker.
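For comparison, the same "print the line after every match, even when matches are consecutive" behaviour can be sketched in awk with a single flag. The helper name line_after_each is hypothetical; the flag is updated only after the print decision has been made for the current line.

```shell
# line_after_each REGEXP   (hypothetical helper name), reads stdin.
# Prints every line whose immediate predecessor matched REGEXP.
line_after_each() {
    awk -v re="$1" '
        prev_matched { print }            # previous line matched: print this one
        { prev_matched = ($0 ~ re) }      # then record whether this line matches
    '
}

seq 15 | line_after_each 1
```

With the seq 15 input this prints 2, 11, 12, 13, 14 and 15, which is the expected list shown above.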

This might work for you (GNU sed):
sed -n ':a;/regexp/{n;h;p;x;ba}' file
Use sed's grep-like option -n, and if the current line contains the required regexp, replace the current line with the next, copy that line to the hold space (HS), print the line, swap the pattern space (PS) for the HS and repeat.

Piping some greps can do it (it runs in POSIX shell and under BusyBox):
cat my-file | grep -A1 my-regexp | grep -v -- '--' | grep -v my-regexp
-v will show non-matching lines
-- is printed by grep to separate each match, so we skip that too

If you just want the next line after a pattern, this sed command will work
sed -n -e '/pattern/{n;p;}'
-n suppresses output (quiet mode);
-e denotes a sed command (not required in this case);
/pattern/ is a regex search for lines containing the literal combination of the characters pattern (use /^pattern$/ for a line consisting only of “pattern”);
n replaces the pattern space with the next line;
p prints;
For example:
seq 10 | sed -n -e '/5/{n;p;}'
Note that the above command will print a single line after every line containing pattern. If you just want the first one use sed -n -e '/pattern/{n;p;q;}'. This is also more efficient as the whole file is not read.
This sed-only command will print all lines after your pattern.
sed -n '/pattern/,${/pattern/!p;}'
Formatted as a sed script this would be:
/pattern/,${
/pattern/!p
}
Here’s a short example:
seq 10 | sed -n '/5/,${/5/!p;}'
/pattern/,$ will select all the lines from pattern to the end of the file.
{} groups the next set of commands (c-like block command)
/pattern/!p; prints lines that don’t match pattern. Note that the ; is required in early versions of sed and in some non-GNU versions. This turns the instruction into an exclusive range; sed ranges are normally inclusive for both the start and the end of the range.
To exclude the end of the range you could do something like this:
sed -n '/pattern/,/endpattern/{/pattern/!{/endpattern/d;p;}}'
/pattern/,/endpattern/{
/pattern/!{
/endpattern/d
p
}
}
/endpattern/d deletes the matching line from the “pattern space” and the script restarts from the top with the next line, skipping the p command for that line.
Another pithy example:
seq 10 | sed -n '/5/,/8/{/5/!{/8/d;p}}'
If you have GNU sed you can add the debug switch:
seq 5 | sed -n --debug '/2/,/4/{/2/!{/4/d;p}}'
Output:
SED PROGRAM:
/2/,/4/ {
/2/! {
/4/ d
p
}
}
INPUT: 'STDIN' line 1
PATTERN: 1
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 2
PATTERN: 2
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 3
PATTERN: 3
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
COMMAND: p
3
COMMAND: }
COMMAND: }
END-OF-CYCLE:
INPUT: 'STDIN' line 4
PATTERN: 4
COMMAND: /2/,/4/ {
COMMAND: /2/! {
COMMAND: /4/ d
END-OF-CYCLE:
INPUT: 'STDIN' line 5
PATTERN: 5
COMMAND: /2/,/4/ {
COMMAND: }
END-OF-CYCLE:

How to grab the content of command "who" except 'pts/0'? [duplicate]

I have a file with three columns. I would like to delete the 3rd column(in-place editing). How can I do this with awk or sed?
123 abc 22.3
453 abg 56.7
1236 hjg 2.3
Desired output
123 abc
453 abg
1236 hjg

try this short thing:
awk '!($3="")' file

With GNU awk for inplace editing, \s/\S, and gensub() to delete
1) the FIRST field:
awk -i inplace '{sub(/^\S+\s*/,"")}1' file
or
awk -i inplace '{$0=gensub(/^\S+\s*/,"",1)}1' file
2) the LAST field:
awk -i inplace '{sub(/\s*\S+$/,"")}1' file
or
awk -i inplace '{$0=gensub(/\s*\S+$/,"",1)}1' file
3) the Nth field where N=3:
awk -i inplace '{$0=gensub(/\s*\S+/,"",3)}1' file
Without GNU awk you need a match()+substr() combo or multiple sub()s + vars to remove a middle field. See also Print all but the first three columns.
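A rough sketch of the match()+substr() combo for awks without gensub() or \s/\S follows. It assumes fields are separated by spaces or tabs; the helper name drop_field is made up, and in-place editing would still need a temporary file.

```shell
# drop_field N   (hypothetical helper name), reads stdin.
# Removes whitespace-separated field N together with the blanks before it,
# leaving the spacing of the remaining fields untouched.
drop_field() {
    awk -v n="$1" '{
        rest = $0; out = ""
        for (i = 1; i <= NF; i++) {
            match(rest, /^[ \t]*[^ \t]+/)      # leading blanks plus one field
            piece = substr(rest, 1, RLENGTH)
            rest  = substr(rest, RLENGTH + 1)
            if (i != n) out = out piece
        }
        print out rest                         # rest holds any trailing blanks
    }'
}

printf '123 abc 22.3\n453  abg   56.7\n' | drop_field 3
```

Removing field 1 this way leaves the blanks that preceded field 2 at the start of the line, so a leading-blank trim may be wanted in that case.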

This might work for you (GNU sed):
sed -i -r 's/\S+//3' file
If you want to delete the white space before the 3rd field:
sed -i -r 's/(\s+)?\S+//3' file

It seems you could simply go with
awk '{print $1 " " $2}' file
This prints the two first fields of each line in your input file, separated with a space.

Try using cut... it's fast and easy.
First, you have repeated spaces; you can squeeze those down to a single space between columns, if that's what you want, with tr -s ' '
If each column already has just one delimiter between it and the next, you can use cut -d ' ' -f-2 to print fields (columns) <= 2.
For example, if your data is in a file input.txt you can do one of the following:
cat input.txt | tr -s ' ' | cut -d ' ' -f-2
Or, if you reason better about this problem in terms of removing the 3rd column, you can write the following (--complement is a GNU cut option):
cat input.txt | tr -s ' ' | cut -d ' ' --complement -f3
cut is pretty powerful; you can also extract ranges of bytes or characters, in addition to columns.
Excerpt from the man page on the syntax for specifying the list range:
Each LIST is made up of one range, or many ranges separated by commas.
Selected input is written in the same order that it is read, and is
written exactly once. Each range is one of:
N N'th byte, character or field, counted from 1
N- from N'th byte, character or field, to end of line
N-M from N'th to M'th (included) byte, character or field
-M from first to M'th (included) byte, character or field
So you could also have asked for columns 1 and 2 specifically with:
cat input.txt | tr -s ' ' | cut -d ' ' -f1,2

Try this :
awk '$3="";1' file.txt > new_file && mv new_file file.txt
or
awk '{$3="";print}' file.txt > new_file && mv new_file file.txt

Try
awk '{$3=""; print $0}'

If you're open to a Perl solution...
perl -ane 'print "$F[0] $F[1]\n"' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the following perl code

How to extract the final word of a sentence

For a given text file I'd like to extract the final word in every sentence to a space-delimited text file. It would be acceptable to have a few errors for words like Mr. and Dr., so I don't need to try to achieve that level of precision.
I was thinking I could do this with Sed and Awk, but it's been too long since I've worked with them and I don't remember where to begin. Help?
(Output example: For the previous two paragraphs, I'd like to see this):
file Mr Dr precision begin Help

Using this regex:
([[:alpha:]]+)[.!?]
Grep can do this:
$ echo "$txt" | grep -o -E '([[:alpha:]]+)[.!?]'
file.
Mr.
Dr.
precision.
begin.
Help?
Then if you want only the words, a second time through:
$ echo "$txt" | grep -o -E '([[:alpha:]]+)[.!?]' | grep -o -E '[[:alpha:]]+'
file
Mr
Dr
precision
begin
Help
In awk, same regex:
$ echo "$txt" | awk '/[[:alpha:]]+[.!?]/{for(i=1;i<=NF;i++) if($i~/[[:alpha:]]+[.!?]/) print $i}'
Perl, same regex, allows capture groups and maybe a little more direct syntax:
$ echo "$txt" | perl -ne 'print "$1 " while /([[:alpha:]]+)[.!?]/g'
file Mr Dr precision begin Help
And with Perl, it is easier to refine the regex to be more discriminating about the words captured:
echo "$txt" | perl -ne 'print "$1 " while /([[:alpha:]]+)(?=[.!?](?:(?:\s+[[:upper:]])|(?:\s*\z)))/g'
file precision begin Help

gawk:
$ gawk -v ORS=' ' -v RS='[.?!]' '{print $NF}' w.txt
file Mr Dr precision begin Help
(Note that plain awk does not support assigning a regular expression to RS.)
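Where a regular-expression RS is not available, one possible workaround is to let tr do the sentence splitting and keep awk for picking the last word. This is only a sketch: the helper name last_words is hypothetical, and it shares the gawk version's blind spot for abbreviations such as Mr. and Dr.

```shell
# last_words   (hypothetical helper name), reads stdin.
# Joins the input into one line, turns each sentence terminator into a
# newline, then prints the last field of every non-empty chunk.
last_words() {
    tr '\n' ' ' |
    tr '.?!' '\n\n\n' |
    awk 'NF { printf "%s%s", sep, $NF; sep = " " } END { print "" }'
}

printf 'I use awk. Do you? Yes!\n' | last_words   # prints: awk you Yes
```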

This might work for you (GNU sed):
sed -r 's/^[^.?!]*\b(\w+)[.?!]/\1\n/;/\n/!d;P;D' file
This prints one word per line; pipe the output through paste to get a single line:
sed -r 's/^[^.?!]*\b(\w+)[.?!]/\1\n/;/\n/!d;P;D' file | paste -sd' '
For another solution just using sed:
sed -r 'H;$!d;x;s/\n//g;s/\b(\w+)[.?!]/\n\1\n/g;/\n/!d;s/[^\n]*\n([^\n]*)\n/ \1/g;s/.//' file

Easy in Perl:
perl -ne 'print "$1 " while /(\w+)[.!?]/g'
-n reads the input line by line.
\w matches a "word character".
\w+ matches one or more word characters.
[.!?] matches any of the sentence-end markers.
/g stands for "globally" - it remembers where the last match occurred and tries to match after it.

Shell script: How to split line?

Here's my scenario:
My input file looks like:
/tmp/abc.txt
/tmp/cde.txt
/tmp/xyz/123.txt
and I'd like to obtain the following output in 2 files:
first file
/tmp/
/tmp/
/tmp/xyz/
second file
abc.txt
cde.txt
123.txt
thanks a lot

Here is all in one single awk
awk -F\/ -vOFS=\/ '{print $NF > "file2";$NF="";print > "file1"}' input
cat file1
/tmp/
/tmp/
/tmp/xyz/
cat file2
abc.txt
cde.txt
123.txt
Here we set input and output separator to /
Then print last field $NF to file2
Set the last field to nothing, then print the rest to file1

I realize you already have an answer, but you might be interested in the following two commands:
basename
dirname
If they're available on your system, you'll be able to get what you want just piping through these:
cat input | xargs -l dirname > file1
cat input | xargs -l basename > file2
Enjoy!
Edit: Fixed per quantdev's comment. Good catch!

Through grep,
grep -o '.*/' file > file1.txt
grep -o '[^/]*$' file > file2.txt
.*/ matches all the characters from the start up to the last / symbol.
[^/]*$ matches zero or more characters other than /. $ asserts that we are at the end of a line.
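Since the question is about a shell script, the same split can also be sketched with plain POSIX parameter expansion, with no external tools in the loop. The helper name split_paths is made up, and the sketch assumes every input line contains at least one /.

```shell
# split_paths INPUT DIRS_OUT NAMES_OUT   (hypothetical helper name)
split_paths() {
    : > "$2"    # start with empty output files
    : > "$3"
    while IFS= read -r path; do
        printf '%s\n' "${path%/*}/" >> "$2"   # everything up to the last slash
        printf '%s\n' "${path##*/}" >> "$3"   # everything after the last slash
    done < "$1"
}
```

It would be called as split_paths input file1 file2. For large inputs the awk or grep answers are likely faster, because a shell read loop processes one line at a time.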

The awk solution is probably the best, but here is a pure sed solution :
#n sed script to get base and file paths
h
s/.*\/\(.*.txt\)/\1/
w file1
g
s/\(.*\)\/.*.txt/\1/
w file2
Note how we hold the buffer with h, and how we use the write (w) command to produce the output files. There are many other ways to do it with sed, but I like this one for using multiple different commands.
To use it:
$ sed -f sed_script testfile

Here is another one-liner that uses tee (it needs a shell with process substitution, such as bash):
cat f1.txt | tee >(xargs -n 1 dirname >> f2.txt) >(xargs -n 1 basename >> f3.txt) >/dev/null
