grep/awk - how to filter out a certain keyword - awk

I have the following line of text, where i want to filter out the N from (KEY_N) etc. Keep in mind that the N is not constant, it can be anything, like (KEY_J), (KEY_K), (KEY_L), (KEY_I), (KEY_SPACE) and so on..
Event: time 1442439135.995248, type 1 (EV_KEY), code 49 (KEY_N), value 0

Update:
I hope that I got the question properly, if not then please let me know.
Having GNU grep you can use this:
grep -oP '.*\(\K[^)]+' file
An alternative on non GNU systems might be to use sed:
sed 's/.*(\([^)]\{1,\}\)).*/\1/' file

Related

Issues converting a small Hex value to a Binary value

I am trying to take the contents of a file that has a Hex number and convert that number to Binary and output to a file.
This is what I am trying but not getting the binary value:
xxd -r -p Hex.txt > Binary.txt
The contents of Hex.txt is: ff
I have also tried FF and 0xFF, but would like to just use ff since the device I am pulling the info from has it in that format.
Instead of 11111111 which it should be, I get a y with 2 dots above it.
If I change it to ee, I get an i with 2 dots. It seems to be reading it just fine but according to what I have read on the xxd -r -p command, it is not outputing it in the correct format.
The other ways I have found to convert Hex to Binary have either also not worked or is a pretty big Bash script that seems unnecessary to do what I thought would be a simple task.
This also gives me the y with 2 dots.
$ for i in $(cat Hex.txt) ; do printf "\x$i" ; done > Binary.txt
For some reason almost every solution I find gives me this format instead of a human readable Binary value with 1s and 0s.
Any help is appreciated. I am planning on using this in a script to pull the Relay values from Digital Loggers devices using curl and giving Home Assistant a readable file to record the Relay State. Digital Loggers curl cmd gives the state of all 8 relays at once using Hex instead of being able to pull the status of a specific relay.
If "file.txt" contains:
fe
0a
and you run this:
perl -ane 'printf("%08b\n",hex($_))' file.txt
You'll get this:
11111110
00001010
If you use it a lot, you might want to make a bash function of it in your login profile along these lines - being extremely respectful of spaces and semi-colons that might look unnecessary:
bin(){ perl -ane 'printf("%08b\n",hex($_))' $1 ; }
Then you'll be able to do:
bin file.txt
If you dislike Perl for some reason, you can achieve something similar without it as follows:
tr '[:lower:]' '[:upper:]' < file.txt |
while read h ; do
echo "obase=2; ibase=16; $h" | bc
done

awk/sed - generate an error if 2nd address of range is missing

We are currently using sed to filter output of regression runs. Sometimes we have a filter that looks like this:
/copyright/,/end copyright/d
If that end copyright is ever missing, the rest of the file is deleted. I'm wondering if there's some way to generate an error for this? awk would also be okay to use. I don't really want to add code that reads the file line by line and issues an error if it hits EOF.
here's a string
copyright
2016 jan 15
end copyright
date 2016 jan 5 time 15:36
last one
I'd like to get an error if end copyright is missing. The real filter also would replace the date line with DATE, so it's more that just ripping out the copyright.
You can persuade sed to generate an error if you reach end of input (i.e. see address $) between your start and end, but it won't be a very helpful message:
/copyright/,/end copyright/{
$s//\1/ # here
d
}
This will error if end copyright is missing or on the last line, with an exit status of 1 and the helpful message:
sed: -e expression #1, char 0: invalid reference \1 on `s' command's RHS
If you're using this in a makefile, you might want to echo a helpful message first, or (better) to wrap this in something that catches the error and produces a more useful one.
I tested this with GNU sed; though if you are using GNU sed, you could more easily use its useful extension:
q [EXIT-CODE]
This command only accepts a single address.
Exit 'sed' without processing any more commands or input. Note
that the current pattern space is printed if auto-print is not
disabled with the -n options. The ability to return an exit code
from the 'sed' script is a GNU 'sed' extension.
Q [EXIT-CODE]
This command only accepts a single address.
This command is the same as 'q', but will not print the contents of
pattern space. Like 'q', it provides the ability to return an exit
code to the caller.
So you could simply write
/copyright/,/end copyright/{
$Q 42
d
}
Never use range expressions /start/,/end/ as they make trivial code very slightly briefer but require a complete rewrite or duplicate conditions when you have the tiniest requirements change. Always use a flag instead. Note that since sed doesn't support variables, it doesn't support flag variables, and so you shouldn't be using sed you should be using awk instead.
In this case your original code would be:
awk '/copyright/{f=1} !f; /end copyright/{f=0}' file
And your modified code would be:
awk '/copyright/{f=1} !f; /end copyright/{f=0} END{if (f) print "Missing end copyright"}' file
The above is obviously untested since you didn't provide any sample input/output we could test a potential solution against.
With sed you can build a loop:
sed -e '/copyright/{:a;/end copyright/d;N;ba;};' file
:a defines the label "a"
/copyright end/d deletes the pattern space, only when "end copyright" matches
N appends the next line to the pattern space
ba jumps to the label "a"
Note that d ends the loop.
In this way you can avoid to delete the text until the end.
If you don't want the text to be displayed at all and prefer an error message when a "copyright" block stays unclosed, you obviously need to wait the end of the file. You can do it with sed too storing all the lines in the buffer space until the end:
sed -n -e '/copyright/{:a;/end copyright/d;${c\ERROR MESSAGE
;};N;ba;};H;${g;p};' file
H appends the current line to the buffer space
g put the content of the buffer space to the pattern space
The file content is only displayed once the last line reached with ${g;p} otherwise when the closing "end copyright" is missing, the current line is changed in the error message with ${c\ERROR MESSAGE\n;} inside the loop.
This way you can test what returns sed before redirecting it to whatever you want.

Extract multiple data between multiple, same tags

Further to the solution in Extract data between two tags, of which extract one set of CIDR, now we just come across something more complex.
The result returned is as follow:
<tr><td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'>80.245.225.0/24, 80.245.226.0/23, 80.245.228.0/22, 80.245.232.0/22, 80.245.236.0/23, 80.245.238.0/24</span></td></tr>
It has 6 CIDR:
80.245.225.0/24, 80.245.226.0/23, 80.245.228.0/22, 80.245.232.0/22, 80.245.236.0/23, 80.245.238.0/24
In fact, for other queries we don't know how many CIDRs would be returned.
What solution in bash should we use? Expand the sed string in the linked question? Or something completely different?
Can anyone help?
maybe this could help you;
grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}</a>/[0-9]{1,2}' yourFile | sed 's/<\/a>//'
Eg;
user#host:/tmp$ grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}</a>/[0-9]{1,2}' test | sed 's/<\/a>//'
80.245.225.0/24
80.245.226.0/23
80.245.228.0/22
80.245.232.0/22
80.245.236.0/23
80.245.238.0/24

extracting subtext between two characters using grep

I have text file which has information like follows
#Mp_chzt_1
asdjhsadhasdhdbjashdjaudashdjashdasdhasdhasdh
asdasdkasjdkaskdskadkasdkasdkjaskldasdklasdas
ahsjdasdfdfsdhghrtuztiuiuzozuoiouiouiouiouiou
asjkjieqjeroiweoriksfjksjksjkf
+
!!!#!!!!!!!!++??????????????~~~~~~~~~~~~~
BBBBBBBBBBBBMMMMMM!!!!!++LLLLLL******
#Mp_btrea_1
uokjjkzghqawsdasduihdlöklöaklöskdlkaökgzgzggz
asdasduzuqwtzeqweuvixcvdjfiisduiifuzwpqüqwoeü
kjkjiuijwiqquzwuziziqz
+
**********||||||||||||#########++++?????????
MMMMMMMMMUUUU***+++~~~~~~~~~~~~~~~~~~~~~~~~~~
#Mp_trwe_3
jhtrqhkjiqkjkqwjelasjjljiewkjkljkldjflsjljki8u
immhgwqtzopirpjgbsdkfjieipwippieoroeirkvsdjjfk
jkahdjhjhfuhjkwekksjakjeiuwiurweiurioweuroweod
poplrtm,ernmjhazqweqwjidiipfiopdifosidpfppsdif
mnasnbdhgqweqweipoipoxkajksdökalsklsaksldkasöd
asdas
+
!!!!!!!!!!!!!!!!!!#####???????????????????
I would like extract the region only between #Mp_* and + that comes right below the text and export it to txt file like following
#Mp_chzt_1
asdjhsadhasdhdbjashdjaudashdjashdasdhasdhasdh
asdasdkasjdkaskdskadkasdkasdkjaskldasdklasdas
ahsjdasdfdfsdhghrtuztiuiuzozuoiouiouiouiouiou
asjkjieqjeroiweoriksfjksjksjkf
#Mp_btrea_1
uokjjkzghqawsdasduihdlöklöaklöskdlkaökgzgzggz
asdasduzuqwtzeqweuvixcvdjfiisduiifuzwpqüqwoeü
kjkjiuijwiqquzwuziziqz
#Mp_trwe_3
jhtrqhkjiqkjkqwjelasjjljiewkjkljkldjflsjljki8u
immhgwqtzopirpjgbsdkfjieipwippieoroeirkvsdjjfk
jkahdjhjhfuhjkwekksjakjeiuwiurweiurioweuroweod
poplrtm,ernmjhazqweqwjidiipfiopdifosidpfppsdif
mnasnbdhgqweqweipoipoxkajksdökalsklsaksldkasöd
asdas
When I used the following code
grep -o -P '(?<=#MP.*).*(?=+)' query.txt > output.txt
It gave me "grep: nothing to repeat".
Could anyone guide where my mistake is and how to rectify it.
Thanks in advance.
Better use awk for this:
awk '/^#/{f=1} /^+/ {f=0} f' file > output.txt
Or, if you have leading spaces, match them with \s*:
awk '/^\s*#/{f=1} /^\s*\+/ {f=0} f' file > output.txt
This uses a flag f to decide whether the line should be printed or not.
When it sees a line starting with #, it activates it.
When it sees a line starting with +, it deactivates it.
Then, it evaluates the flag and prints if it is True.
With your given input it returns:
#Mp_chzt_1
asdjhsadhasdhdbjashdjaudashdjashdasdhasdhasdh
asdasdkasjdkaskdskadkasdkasdkjaskldasdklasdas
ahsjdasdfdfsdhghrtuztiuiuzozuoiouiouiouiouiou
asjkjieqjeroiweoriksfjksjksjkf
#Mp_btrea_1
uokjjkzghqawsdasduihdlöklöaklöskdlkaökgzgzggz
asdasduzuqwtzeqweuvixcvdjfiisduiifuzwpqüqwoeü
kjkjiuijwiqquzwuziziqz
#Mp_trwe_3
jhtrqhkjiqkjkqwjelasjjljiewkjkljkldjflsjljki8u
immhgwqtzopirpjgbsdkfjieipwippieoroeirkvsdjjfk
jkahdjhjhfuhjkwekksjakjeiuwiurweiurioweuroweod
poplrtm,ernmjhazqweqwjidiipfiopdifosidpfppsdif
mnasnbdhgqweqweipoipoxkajksdökalsklsaksldkasöd
asdas

awk getline skipping to last line -- possible newline character issue

I'm using
while( (getline line < "filename") > 0 )
within my BEGIN statement, but this while loop only seems to read the last line of the file instead of each line. I think it may be a newline character problem, but really I don't know. Any ideas?
I'm trying to read the data in from a file other than the main input file.
The same syntax actually works for one file, but not another, and the only difference I see is that the one for which it DOES work has "^M" at the end of each line when I look at it in Vim, and the one for which it DOESN'T work doesn't have ^M. But this seems like an odd problem to be having on my (UNIX based) Mac.
I wish I understood what was going with getline a lot better than I do.
You would have to specify RS to something more vague.
Here is a ugly hack to get things working
RS="[\x0d\x0a\x0d]"
Now, this may require some explanation.
Diffrent systems use difrent ways to handle change of line.
Read http://en.wikipedia.org/wiki/Carriage_return and http://en.wikipedia.org/wiki/Newline if you are
interested in it.
Normally awk hadles this gracefully, but it appears that in your enviroment, some files are being naughty.
0x0d or 0x0a or 0x0d 0x0a (CR+LF) should be there, but not mixed.
So lets try a example of a mixed data stream
$ echo -e "foo\x0d\x0abar\x0d\x0adoe\x0arar\x0azoe\x0dqwe\x0dtry" |awk 'BEGIN{while((getline r )>0){print "r=["r"]";}}'
Result:
r=[foo]
r=[bar]
r=[doe]
r=[rar]
try]oe
We can see that the last lines are lost.
Now using the ugly hack to RS
$ echo -e "foo\x0d\x0abar\x0d\x0adoe\x0arar\x0azoe\x0dqwe\x0dtry" |awk 'BEGIN{RS="[\x0d\x0a\x0d]";while((getline r )>0){print "r=["r"]";}}'
Result:
r=[foo]
r=[bar]
r=[doe]
r=[rar]
r=[zoe]
r=[qwe]
r=[try]
We can see every line is obtained, reguardless of the 0x0d 0x0a junk :-)
Maybe you should preprocess your input file with for example dos2unix (http://sourceforge.net/projects/dos2unix/) utility?