How to match a pattern after the first 32 characters using grep?

I was trying to filter lines with pattern 04:26. I expected the command,
cat file1.txt | grep -E '04:26'
to filter the lines which contain 04:26 after timestamps. Instead, I got the second line also.
file1.txt
2022-12-23T04:26:47.748412+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPREQUEST(eth0) 192.168.42.17 04:c8:07:23:04:26
2022-12-23T04:26:47.749307+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPACK(eth0) 192.168.42.17 04:c8:07:23:34:13
How can I exclude the first 32 characters (the timestamp) from matching?

You may use this grep:
grep -E '^.{32,}04:26' file
2022-12-23T04:26:47.748412+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPREQUEST(eth0) 192.168.42.17 04:c8:07:23:04:26
Breakdown:
^: Start
.{32,}: Match 32 or more characters
04:26: Match 04:26
Alternatively you can use this grep as well:
grep ' .*04:26' file
This works because the timestamp you want to ignore is the text before the first space in each line.
An awk solution:
awk '$NF ~ /04:26/' file
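As a quick check of the commands above, the following sketch recreates the two sample lines from the question in a temporary file and counts how many lines each command matches. The variable names (tmp, naive, skip32, after_space, last_field) are illustrative only and not part of the original answer.

```shell
# Recreate the two sample lines from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
2022-12-23T04:26:47.748412+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPREQUEST(eth0) 192.168.42.17 04:c8:07:23:04:26
2022-12-23T04:26:47.749307+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPACK(eth0) 192.168.42.17 04:c8:07:23:34:13
EOF

# Count matching lines for each approach.
naive=$(grep -c '04:26' "$tmp")              # also matches inside the timestamp
skip32=$(grep -cE '^.{32,}04:26' "$tmp")     # requires 32+ characters before 04:26
after_space=$(grep -c ' .*04:26' "$tmp")     # requires a space before 04:26
last_field=$(awk '$NF ~ /04:26/ {n++} END {print n+0}' "$tmp")

echo "naive=$naive skip32=$skip32 after_space=$after_space last_field=$last_field"
rm -f "$tmp"
```

Only the naive pattern matches both lines; the other three match just the DHCPREQUEST line.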

With your shown samples, please try the following awk code. In short: set the field separator to the first 32 characters of the line, then in the main program check whether the 2nd field matches anything up to a : followed by 04:26; if it does, print the line.
awk -F'^.{32}' '$2~/^.*:04:26/' Input_file
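Interval expressions such as .{32} are not honored by every awk (some older mawk and gawk builds need extra options or lack them), so a variant that avoids them may be useful. The sketch below is an alternative, not the answer's method: it tests the pattern against substr($0, 33), the text from character 33 onward. The variables line1, line2 and matched are illustrative.

```shell
# The two sample lines from the question.
line1='2022-12-23T04:26:47.748412+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPREQUEST(eth0) 192.168.42.17 04:c8:07:23:04:26'
line2='2022-12-23T04:26:47.749307+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPACK(eth0) 192.168.42.17 04:c8:07:23:34:13'

# Match 04:26 only in the part of the line after the first 32 characters,
# but still print the whole line (awk's default action).
matched=$(printf '%s\n' "$line1" "$line2" | awk 'substr($0, 33) ~ /04:26/')

printf '%s\n' "$matched"
```

Unlike a cut-based pipeline, this keeps the timestamp in the printed line.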

With awk, check that there are 2 or more fields, since the value that you don't want to match is in the first field:
awk 'NF > 1 && $NF ~ /04:26/' file
Or with awk, check that the line has more than 37 characters and match 04:26 in the last field:
awk 'length($0) > 37 && index($NF, "04:26")' file
Or with grep, match 32 or more characters and then match 04:26:
grep -E '^.{32,}04:26' file
Output
2022-12-23T04:26:47.748412+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPREQUEST(eth0) 192.168.42.17 04:c8:07:23:04:26

There are many simple ways to do this while avoiding edge cases. The cleanest is to describe programmatically what you are actually searching for, here a MAC address. awk is the most robust tool for this, but you can also do it with grep pipelines:
grep for MAC-address:
$ grep -E '([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}' file
$ awk '/([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}/' file
grep for MAC-address that ends with 04:26:
$ grep -E '([[:xdigit:]]{2}:){4}04:26' file
$ awk '/([[:xdigit:]]{2}:){4}04:26/' file
grep for MAC-address in last field that ends with 04:26:
$ grep -E '([[:xdigit:]]{2}:){4}04:26[[:blank:]]*$' file
$ awk '$NF~/([[:xdigit:]]{2}:){4}04:26/' file
grep for MAC-address that contains 04:26:
$ grep -oE '([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}' file | grep '04:26' | grep -Ff - file
$ awk 'match($0,/([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}/) && substr($0,RSTART,RLENGTH)~/04:26/' file
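To see what the MAC-address pattern actually picks out, here is a small sketch that extracts the addresses from the question's two sample lines and counts how many contain 04:26. It assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). The variables line1, line2, macs and hits are illustrative.

```shell
# The two sample lines from the question.
line1='2022-12-23T04:26:47.748412+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPREQUEST(eth0) 192.168.42.17 04:c8:07:23:04:26'
line2='2022-12-23T04:26:47.749307+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPACK(eth0) 192.168.42.17 04:c8:07:23:34:13'

# Extract only the MAC addresses: six pairs of hex digits separated by colons.
macs=$(printf '%s\n' "$line1" "$line2" | grep -oE '([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}')
printf '%s\n' "$macs"

# Count the extracted addresses that contain 04:26.
hits=$(printf '%s\n' "$macs" | grep -c '04:26')
echo "addresses containing 04:26: $hits"
```

The timestamp never matches because it has only three colon-separated two-digit groups.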

How to mask first 32 letters
You might use cut to get the 33rd and following characters of each line. Let the content of file1.txt be
2022-12-23T04:26:47.748412+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPREQUEST(eth0) 192.168.42.17 04:c8:07:23:04:26
2022-12-23T04:26:47.749307+00:00 raspberrypi dnsmasq-dhcp[698]: DHCPACK(eth0) 192.168.42.17 04:c8:07:23:34:13
then
cut --characters=33- file1.txt
gives output
raspberrypi dnsmasq-dhcp[698]: DHCPREQUEST(eth0) 192.168.42.17 04:c8:07:23:04:26
raspberrypi dnsmasq-dhcp[698]: DHCPACK(eth0) 192.168.42.17 04:c8:07:23:34:13
which could then be combined with your code as follows
cut --characters=33- file1.txt | grep -E '04:26'
which results in the output
raspberrypi dnsmasq-dhcp[698]: DHCPREQUEST(eth0) 192.168.42.17 04:c8:07:23:04:26
Explanation: --characters= is used to select certain characters from each line; 33- means the 33rd character and everything after it.
(tested in GNU grep 3.4)

Related

Caret regexp produces no output in mawk

I am trying to print all files in /usr/bin/ where the filename starts with a v. This works,
ls -lA /usr/bin/ | awk '{print $9}' | grep ^v
Surprisingly, this returns no output,
ls -lA /usr/bin/ | awk '/^v/ {print $9}'
I don't understand the difference. I am running Ubuntu 21.10 with awk -W version saying that it is on 1.3.4 20200120.
Edit: I understand that awk may not be the best way to accomplish what I am wanting to do here. But, this is an exercise in learning awk by testing my understanding via comparing it to the real output.
The difference between the two pipelines is that the first outputs the 9th column and then checks whether that starts with a v, while the second checks whether the whole line starts with a v. Change the second to:
$ ls -lA /usr/bin/ | awk '$9 ~ /^v/ {print $9}'
When writing:
/pattern/ { ... }
it's the same as writing
$0 ~ /pattern/ { ... }
but in your case you want to compare the 9th column, so write that instead.
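The difference can be reproduced without touching /usr/bin. The sketch below feeds awk a made-up three-line listing in ls -l format (the file names, sizes and dates are hypothetical) and compares the whole-line pattern with the 9th-field comparison.

```shell
# A hypothetical listing in "ls -l" format; every line starts with the
# permission string, so no line starts with "v".
listing='-rwxr-xr-x 1 root root 1234 Jan  1 00:00 vi
-rwxr-xr-x 1 root root 5678 Jan  1 00:00 awk
lrwxrwxrwx 1 root root    2 Jan  1 00:00 view'

# /^v/ is shorthand for $0 ~ /^v/: it tests the whole line.
whole_line=$(printf '%s\n' "$listing" | awk '/^v/ {print $9}')

# $9 ~ /^v/ tests only the 9th field, the file name.
ninth_field=$(printf '%s\n' "$listing" | awk '$9 ~ /^v/ {print $9}')

echo "whole line: [$whole_line]"
printf 'ninth field:\n%s\n' "$ninth_field"
```

The whole-line test prints nothing, while the field test prints vi and view.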
But you really don't want to create a pipeline for this, and what would happen if your files contain a space?
You can consider using find or globs instead:
$ printf '%s\n' /usr/bin/v*
/usr/bin/vi
/usr/bin/view
...
or
$ find /usr/bin -name 'v*' -print
/usr/bin/vi
/usr/bin/view
...

AWK print between two characters

When I try this command:
/usr/bin/curl -s sketch*.zip "https://www.sketch.com/downloads/mac/" |\
grep 'download.sketchapp.com/sketch-' | awk 'NR==1{print $3}'
The output is:
content="0;URL='https://download.sketchapp.com/sketch-68.2-102594.zip
what I am looking to get is:
68.2
Any help would be appreciated.
It seems you want to extract the number after your pattern, only for the first matching row. You can use one grep command:
... | grep -oPm1 '(?<=download\.sketchapp\.com/sketch-)[^-]+'
or, since what you want is in the 3rd field of the first matching row, you can use one awk command (split the field into an array using the hyphen as separator and print the middle element):
... | awk '/download\.sketchapp\.com\/sketch-/ {split($3,a,"-"); print a[2]; exit}'
Using sed:
/usr/bin/curl -s sketch*.zip "https://www.sketch.com/downloads/mac/" | \
sed -n 's!.*download.sketchapp.com/sketch-\([^-]*\).*!\1!p;' | \
head -1
head is to get rid of multiple matches. sed command extracts non-hyphen characters after download.sketchapp.com/sketch-.
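The extraction can be tried offline against the one line of output shown in the question, without calling curl. In this sketch the variable sample holds that line verbatim; version_sed and version_awk are illustrative names, and the awk variant splits the whole sample on hyphens, which only works because this sample contains no other hyphen.

```shell
# The output line shown in the question, stored verbatim.
sample="content=\"0;URL='https://download.sketchapp.com/sketch-68.2-102594.zip"

# sed: capture the non-hyphen characters that follow "sketch-".
version_sed=$(printf '%s\n' "$sample" |
  sed -n 's!.*download\.sketchapp\.com/sketch-\([^-]*\).*!\1!p')

# awk: split on hyphens and print the middle element.
version_awk=$(printf '%s\n' "$sample" | awk '{split($0, a, "-"); print a[2]}')

echo "sed: $version_sed  awk: $version_awk"
```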

How can I print only lines that are immediately preceded by an empty line in a file using sed?

I have a text file with the following structure:
bla1
bla2

bla3
bla4

bla5
So you can see that some lines of text are preceded by an empty line.
I understand that sed has the concept of two buffers, a pattern space buffer and a hold space buffer, so I'm guessing these need to come in to play here, but I'm unclear how to specify them to accomplish what I need.
In my contrived example above, I'd expect to see the following lines outputted:
bla3
bla5
sed is for doing s/old/new on individual lines, that is all. Any time you start talking about buffers or doing anything related to multi-lines comparisons you're using the wrong tool.
You could do this with awk:
$ awk -v RS= -F'\n' 'NR>1{print $1}' file
bla3
bla5
but it would fail to print the first non-empty line if the first line(s) in the file were empty. So this may be what you want if lines consisting only of space characters should be considered empty (p is initialized to 1 so that the very first line of the file is not printed when nothing precedes it):
$ awk -v p=1 'NF && !p{print} {p=NF}' file
bla3
bla5
and this otherwise:
$ awk -v p=x '($0!="") && (p==""){print} {p=$0}' file
bla3
bla5
bla3
bla5
All of the above will work even if there are multiple empty lines preceding any given non-empty line.
To see the difference between the 3 approaches (which you won't see given the sample input in the question):
PS1> printf '\nfoo\n \nbar\n\netc\n' | cat -E
$
foo$
$
bar$
$
etc$
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk -v RS= -F'\n' 'NR>1{print $1}'
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk 'NF && !p{print} {p=NF}'
foo
bar
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk '($0!="") && (p==""){print} {p=$0}'
foo
etc
You can use the hold buffer easily to print the line before the blank like this:
sed -n -e '/^$/{x; p;}' -e h input
But I don't see an easy way to use it for your use case. For your case, instead of using the hold buffer, you could do:
sed -n -e '/^$/ba' -e d -e :a -e n -e p input
But I would do this with awk.
awk 'NR!=1{print $1}' RS= FS=\\n input-file
awk 'p;{p=/^$/}' file
The command above does the following for each line:
if p is 1, print the line;
if the line is empty, set p to 1, otherwise set it to 0.
If lines consisting of one or more spaces are also considered empty:
awk 'p;{p=!NF}' file
to print non-empty lines each coming right after an empty line, you can use this:
awk 'p*!(p=/^$/)' file
if p is 1 and this line is not empty (1*!(0) = 1*1 = 1), print this line;
otherwise (1*!(1) = 1*0 = 0, 0*anything = 0), don't print anything.
note that this one may not work with all awks, a portable version of this would look like:
awk 'p*(/./);{p=/^$/}' file
if lines consisting of one or more spaces are also considered empty:
awk 'p*NF;{p=!NF}' file
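The flag-based idea can be checked against the question's sample. The sketch below assumes the blank lines sit directly before bla3 and bla5, as the expected output implies, and spells the condition out with && instead of multiplication; tmp and result are illustrative names.

```shell
# Recreate the sample, assuming empty lines directly before bla3 and bla5.
tmp=$(mktemp)
printf 'bla1\nbla2\n\nbla3\nbla4\n\nbla5\n' > "$tmp"

# p remembers whether the previous line was blank (no fields).
# Print the current line only if the previous one was blank and this one is not.
result=$(awk 'p && NF; {p = !NF}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because p starts out unset (false), the first line of the file is never printed.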
If sed/awk is not mandatory, you can do it with grep:
grep -A 1 '^$' input.txt | grep -v -E '^$|^--$'
You can use sed to match a range of lines and do sub-matches inside the matches, like so:
# - use the "-n" option to omit printing of lines
# - match lines between a blank line (/^$/) and a non-blank one (/^./),
# then print only the line that contains at least a character,
# i.e, the non-blank line.
sed -ne '
/^$/,/^./ {
/^./{ p; }
}' input.txt
tested by gnu sed, your data in 'a':
$ sed -nE '/^$/{N;s/\n(.+)/\1/p}' a
bla3
bla5
To edit the file in place, add the -i option before -n.

Removing blank lines

I have a csv file in which every other line is blank. I have tried everything, nothing removes the lines. What should make it easier is that the digits 44 appear in each valid line. Things I have tried:
grep -ir 44 file.csv
sed '/^$/d' <file.csv
cat -A file.csv
sed 's/^ *//; s/ *$//; /^$/d' <file.csv
egrep -v "^$" file.csv
awk 'NF' file.csv
grep '\S' file.csv
sed 's/^ *//; s/ *$//; /^$/d; /^\s*$/d' <file.csv
cat file.csv | tr -s \n
Decided I was imagining the blank lines, but import into Google Sheets and there they are still! Starting to question my sanity! Can anyone help?
sed -n -i '/44/p' file
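-n suppresses automatic printing
-i edits in place (overwrites the same file)
/44/p prints lines that contain '44'
If '44' is not present in every valid line, delete the blank lines instead:
sed -i '/^\s*$/d' file
\s matches whitespace, ^ is the start of the line, $ is the end of the line, and d deletes the line.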
Use the -i option to replace the original file with the edited one.
sed -i '/^[ \t]*$/d' file.csv
Alternatively, output to another file and rename it, which is exactly what -i does.
sed '/^[[:blank:]]*$/d' file.csv > file.csv.out && mv file.csv.out file.csv
Given:
$ cat bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
You can remove blank lines with Perl:
$ perl -lne 'print unless /^\s*$/' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
awk:
$ awk 'NF>0' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
sed + tr:
$ cat bl.txt | tr '\t' ' ' | sed '/^ *$/d'
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
Just sed:
$ sed '/^[[:space:]]*$/d' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
Aside from the fact that your commands do not show that you capture their output in a new file to be used in place of the original, there's nothing wrong with them, EXCEPT that:
cat file.csv | tr -s \n
should be:
cat file.csv | tr -s '\n' # more efficient alternative: tr -s '\n' < file.csv
Otherwise, the shell eats the \ and all that tr sees is n.
Note, however, that the above eliminates only truly empty lines, whereas some of your other commands also eliminate blank lines (empty or all-whitespace).
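The quoting difference is easy to see on a tiny input. In this sketch (the input strings and the variable names quoted and unquoted are made up for illustration), the quoted form squeezes repeated newlines, while the unquoted form squeezes repeated letters n and leaves the empty line alone.

```shell
# Quoted: tr receives \n and squeezes runs of newlines into one.
quoted=$(printf 'a\n\n\nb\n' | tr -s '\n')

# Unquoted: the shell strips the backslash, so tr receives the letter n
# and squeezes "nn" in "banner" instead; the empty line survives.
unquoted=$(printf 'banner\n\nx\n' | tr -s \n)

printf 'quoted:\n%s\n' "$quoted"
printf 'unquoted:\n%s\n' "$unquoted"
```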
Also, the -i (for case-insensitive matching) in grep -ir 44 file.csv is pointless, and while using -r (for recursive searches) will not change the fact that only file.csv is searched, it will prepend the filename followed by : to each matching line.
If you have indeed captured the output in a new file and that file truly still has blank lines, the cat -A (cat -et on BSD-like platforms) you already mention in your question should show you if any unusual characters are present in the file, in the form of ^<char> sequences, such as ^M for \r chars.
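If cat -A does show ^M at the end of the lines, one possible clean-up is sketched below. This is an assumption about the cause, not a diagnosis of your file: the sample data is invented, and cr, crlf_lines and cleaned are illustrative names. It counts the lines that contain a carriage return, then strips the carriage returns before dropping blank lines.

```shell
# A literal carriage return, for use as a grep pattern.
cr=$(printf '\r')

# Hypothetical CSV data with CRLF line endings and a "blank" line
# that actually contains a lone carriage return.
crlf_lines=$(printf 'a,44\r\n\r\nb,44\r\n' | grep -c "$cr")

# Strip the carriage returns first, then drop lines with no fields.
cleaned=$(printf 'a,44\r\n\r\nb,44\r\n' | tr -d '\r' | awk 'NF')

echo "lines containing a carriage return: $crlf_lines"
printf '%s\n' "$cleaned"
```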
If you like awk, this should do:
awk '/44/' file
It will only print lines that contain 44

How to grep from within specified character range in line and then print entire line

I have a file with multiple rows, each containing 3400 characters. I want to grep for something within a specified character range; say I want to grep for "pavan" between characters 14 and 25 of the line.
To do this I can simply do the following:
cat filename | cut -c 14-25 | grep pavan
but this does not print the complete line.
I tried to use an awk command, but it does not work since the lines have more than 3000 characters.
I want to print the complete line as well, so that I can perform further operations on it.
awk -v pattern="pavan" 'match( substr($0, 14, 12), pattern )' file
Will print the matching lines.
A more complicated way of doing the same thing:
awk -v patt="pavan" -v start=14 -v end=25 '
match($0,patt) && start <= RSTART && RSTART <= end-RLENGTH
' file
-- stricken due to valid commentary from Ed Morton.
Some bit of arithmetic and you could use grep:
grep -E '^.{13}.{0,7}pavan' filename
This would match lines containing pavan between the specified character range.
It essentially matches 13 arbitrary characters at the beginning of a line. Then looks for pavan that can be preceded by 0 to 7 arbitrary characters.
This is not very elegant, but does work!
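To avoid redoing the arithmetic for every range, the bounds can be passed in as variables. The sketch below is one possible generalization, not part of the answer above: s, e and pat are hypothetical awk variable names, and the four test lines are invented so that pavan starts at columns 14, 1, 22 and 21 respectively.

```shell
# Build filler strings of known length.
x5=xxxxx
x13="${x5}${x5}xxx"
x20="${x5}${x5}${x5}${x5}"

# Four test lines: pavan starts at column 14 (A), 1 (B), 22 (C), 21 (D).
input=$(printf '%s\n' \
  "${x13}pavan-A" \
  "pavan${x20}-B" \
  "${x20}xpavan-C" \
  "${x20}pavan-D")

# Print whole lines where pat lies entirely within columns s..e.
in_range=$(printf '%s\n' "$input" |
  awk -v s=14 -v e=25 -v pat=pavan 'index(substr($0, s, e - s + 1), pat)')

# The grep from above, for comparison.
with_grep=$(printf '%s\n' "$input" | grep -E '^.{13}.{0,7}pavan')

printf '%s\n' "$in_range"
```

Both commands print lines A and D only; index() treats pat as a fixed string rather than a regular expression.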
Start off with what you had, but remove the unnecessary cat:
cut -c 14-25 file
now get awk to find the string you want and print the line number:
cut -c 14-25 file | awk '/pavan/{print NR}'
Now you have a list of all the line numbers that you want. You can either process them in a while loop, like this:
cut -c 14-25 file | awk '/pavan/{print NR}' | while read -r line; do
echo "$line"
sed -n "${line}p" file
done
or put them in an array
lines=($(cut -c 14-25 file | awk '/pavan/{print NR}'))
echo "${lines[@]}"