Linux grep with Perl regex: greedy match does not work as expected (regex lookarounds)

My Perl version on the Linux server is:
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
Here is my test:
echo "mac:abcdefg1234" | grep -Po "(?<=mac:).*(?=\d+)"
The result is abcdefg123.
But the greedy match does not do what I want; the result I want is abcdefg.
How can I get the content between "mac:" and the digits (as much of it as is allowed)?

(?<=mac:)[^\d]*(?=\d+) matches the content between them.
[^\d]* matches any non-digit characters, with length >= 0. Typing a ^ after [ negates the character class, so it matches any character that is not listed inside the brackets. Note that it also matches (invisible) line-break characters.
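For reference, here is a minimal run of the corrected pattern on the sample input from the question (this assumes a grep with PCRE support, i.e. GNU grep's -P option):
echo "mac:abcdefg1234" | grep -Po "(?<=mac:)[^\d]*(?=\d+)"
abcdefg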


How to put multiple charsets at one position in hashcat?

I want to search for all letters and some special characters at the same time:
?ludhHs?ludhHs?ludhHs?ludhHs?ludhHs?ludhHs?ludhHs?ludhHs?ludhHs
Something like this for a 9-character password with all of those characters. Does this work?
It sounds like you want to use the built-in charset ?a, which is a shortcut for ?l?u?d?s.
Try out this command:
hashcat -a 3 -m 0 your.hash ?a?a?a?a?a?a?a?a?a
Don't forget to swap out the mode and hash file for whatever you are cracking.
If you really were trying to use a custom charset with ?l and the characters udhHs, that's mostly redundant since ?l is already all the lowercase letters, but here's an example for that custom charset:
hashcat -a 3 -1 ?ludhHs -m 0 your.hash ?1?1?1?1?1?1?1?1?1
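If you want to sanity-check what a mask or custom charset expands to before launching a real attack, hashcat can print the candidates instead of cracking via its --stdout mode (shown here with a short three-position mask so the output stays manageable; the charset is the one from the question):
hashcat --stdout -a 3 -1 ?ludhHs ?1?1?1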

removing unconventional field separators (^#^#^#) in a text file [duplicate]

I have a text file containing unwanted null characters (ASCII NUL, \0). When I try to view it in vi I see ^# symbols, interleaved in normal text. How can I:
1. Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.
2. Remove the null characters? Running strings on the file cleaned it up, but I'm just wondering if this is the best way?
I’d use tr:
tr < file-with-nulls -d '\000' > file-without-nulls
In case you are wondering whether input redirection in the middle of the command arguments works: it does. Most shells recognize and handle I/O redirection (<, >, …) anywhere on the command line.
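For comparison, the more conventional ordering of the redirections behaves identically:
tr -d '\000' < file-with-nulls > file-without-nulls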
Use the following sed command for removing the null characters in a file.
sed -i 's/\x0//g' null.txt
This solution edits the file in place, which matters if the file is still being used. Passing -i'ext' creates a backup of the original file with the 'ext' suffix added.
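As a concrete example of that backup behaviour with GNU sed (the .bak suffix is just an example name), the following keeps the original as null.txt.bak:
sed -i.bak 's/\x0//g' null.txt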
A large number of unwanted NUL characters, say one every other byte, indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.
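A minimal sketch of that conversion (the file names are placeholders, and this assumes the input really is UTF-16 with a byte-order mark; specify UTF-16LE or UTF-16BE explicitly otherwise):
iconv -f UTF-16 -t UTF-8 file-utf16.txt > file-utf8.txt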
I discovered the following, which prints out which lines, if any, have null characters:
perl -ne '/\000/ and print;' file-with-nulls
Also, an octal dump can tell you if there are nulls:
od file-with-nulls | grep ' 000'
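If you have GNU grep, its Perl-regex mode can also locate the NULs directly, which is close to what the question originally tried (-P is a GNU extension; -a forces grep to treat the file as text rather than reporting "Binary file matches"):
grep -Pa '\x00' file-with-nulls
Adding -c prints only the number of matching lines instead of the lines themselves.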
If the lines in the file end with \r\n\000, what works is to delete the \n and \000 and then replace each \r with \n:
tr -d '\n\000' <infile | tr '\r' '\n' >outfile
Here is an example of how to remove NUL characters using ex (in place):
ex -s +"%s/\%x00//g" -cwq nulls.txt
and for multiple files:
ex -s +'bufdo!%s/\%x00//g' -cxa *.txt
For recursion, you may use the globbing pattern **/*.txt (if it is supported by your shell).
This is useful for scripting, since sed's -i parameter is a non-standard BSD extension.
See also: How to check if the file is a binary file and read all the files which are not?
I used:
recode UTF-16..UTF-8 <filename>
to get rid of the zero bytes in the file.
I faced the same issue in Python when reading such a file with:
import codecs as cd
f=cd.open(filePath,'r','ISO-8859-1')
I solved the problem by changing the encoding to utf-16:
f=cd.open(filePath,'r','utf-16')
Remove a trailing null character at the end of a PDF file using PHP. This is independent of the OS.
This script uses PHP to remove a trailing NULL value at the end of a binary file, solving a crashing issue that was triggered by the NULL value. You can edit this script to remove all NULL characters, but seeing it done once will help you understand how this works.
Backstory
We were receiving PDF's from a 3rd party that we needed to upload to our system using a PDF library. In the files being sent to us, there was a null value that was sometimes being appended to the PDF file. When our system processed these files, files that had the trailing NULL value caused the system to crash.
Originally we were using sed, but sed behaves differently on Macs and Linux machines. We needed a platform-independent method to remove the trailing null value, and PHP was the best option. Also, it was a PHP application, so it made sense :)
This script performs the following operation:
Take the binary file and convert it to hex (binary data does not explode cleanly on newlines or carriage returns), explode the string using the CRLF sequence (0d0a) as the delimiter, pop the last element of the array if its value is a null byte (00), implode the array again with 0d0a as the glue, convert back to binary, and process the file.
//In this case we are getting the file as a string from another application.
//We use this line to read a sample bad file.
$fd = file_get_contents($filename);
//Convert the string into hex (and trim any stray whitespace from the hex string)
$bin2hex = trim(bin2hex($fd));
//Create an array using the CRLF sequence (0d0a) as the delimiter
$bin2hex_ex = explode('0d0a', $bin2hex);
//Look at the last element; if it is equal to 00 (a null byte) we pop it off
$end = end($bin2hex_ex);
if($end === '00') {
array_pop($bin2hex_ex);
}
//Implode the array back together using the CRLF sequence as the glue
$bin2hex = implode('0d0a', $bin2hex_ex);
//The new string no longer has the null character at the end of the file
//(note that hex2bin() requires PHP 5.4 or later)
$fd = hex2bin($bin2hex);

SHA256 generation different for file and content of this file

I use online SHA256 converters to calculate a hash for a given file. There, I have seen an effect I don't understand.
For testing purposes, I wanted to calculate the hash for a very simple file. I named it "test.txt", and its only content is the string "abc", followed by a new line (I just pressed enter).
Now, when I put "abc" and newline into a SHA256 generator, I get the hash
edeaaff3f1774ad2888673770c6d64097e391bc362d7d6fb34982ddf0efd18cb
But when I put the complete file into the same generator, I get the hash
552bab6864c7a7b69a502ed1854b9245c0e1a30f008aaa0b281da62585fdb025
Where does the difference come from? I used this generator (in fact, I tried several ones, and they always yield the same result):
https://emn178.github.io/online-tools/sha256_checksum.html
Note that this difference does not arise without newlines. If the file just contains the string "abc", the hash is
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
for the file as well as just for the content.
As noted in my comment, the difference is caused by how newline characters are represented on different operating systems:
On UNIX and UNIX-like systems, newlines are represented by a line feed character (\n).
On DOS and Windows systems, newlines are represented by a carriage return followed by a line feed character (\r\n).
Compare the following two commands and their output, corresponding to the SHA256 values in your question:
echo -en "abc\n" | sha256sum
edeaaff3f1774ad2888673770c6d64097e391bc362d7d6fb34982ddf0efd18cb
echo -en "abc\r\n" | sha256sum
552bab6864c7a7b69a502ed1854b9245c0e1a30f008aaa0b281da62585fdb025
The issue you are having could come from how the newline character is encoded.
On Windows a newline is represented as \r\n, and on Linux as \n.
These two have different decimal values (\r is 13 and \n is 10).
You can find more info here:
https://en.wikipedia.org/wiki/Newline
https://en.wikipedia.org/wiki/List_of_Unicode_characters
I faced the same issue, and inspecting the data in hex helped me understand the actual behavior (see the example below).
Canonicalization of the data needs to be performed before the SHA calculation, which eliminates such issues. Canonicalization needs to be performed both on the generation side and on the verification side.
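To illustrate that hex inspection, a hex dump of the test file shows exactly which line ending it contains (xxd is assumed to be available; od -c works as well, and test.txt is the file from the question):
xxd test.txt
A file saved with a Unix newline contains the bytes 61 62 63 0a ("abc" plus LF), while one saved with a Windows newline contains 61 62 63 0d 0a ("abc" plus CR LF).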

Character-set of SSH keys (safe delimiter for using sed with public keys)

I am using sed to replace a placeholder in a script with my public SSH key. The character / is definitely present in some SSH keys; how can I find out which character I can use as the delimiter for sed?
I am looking for either the set of all characters that can be part of the string generated by ssh-keygen, or the set of characters that are guaranteed not to appear in it.
The public key in OpenSSH format is base64-encoded, as mentioned for example in the manual page for sshd. Therefore you can use any character that is not in the base64 alphabet. The / is in it, but | for example can be used safely (though the comment section at the end of the key can contain anything).
For information, from the info sed, section 3.5:
The '/' characters may be uniformly replaced by any other single character within any given 's' command.
The '/' character (or whatever other character is used in its stead) can
appear in the REGEXP or REPLACEMENT only if it is preceded by a '\'
character.
So you can choose any suitable character that doesn't appear in your input data.
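For instance, here is a minimal sketch that assumes the placeholder in your script is the literal string __PUBLIC_KEY__ and the key lives in ~/.ssh/id_ed25519.pub (both names are made up for the example); | is used as the delimiter because it cannot occur in the base64 part:
sed "s|__PUBLIC_KEY__|$(cat ~/.ssh/id_ed25519.pub)|" script.tpl > script.sh
The characters that are special in sed replacement text (& and \) are not in the base64 alphabet either, although the trailing comment of a key could in principle contain them.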

Match the start of the file or a newline (Ragel)

I'm using ragel with C as the host language.
I can recognise a newline simply with '\n', but I need to recognise the start of the file as an alternative.
In other implementations of regex this could be given by \A or $, but $ is reserved for other purposes, '\A' maps to something else (alarm?) and \A gives a parser error.
I don't think there's an escape sequence for that. However, you can detect it by checking if Ragel's ts variable equals 0.
In text formats you have three choices of line ending:
Classic Mac OS up to version 9 used \r (as did Commodore machines, the Apple II, and Microware OS-9)
Unices and the newer Mac OS X use \n (as do BeOS, AmigaOS, MorphOS, RISC OS, Multics)
Windows uses \r\n (as do DOS, OS/2, Symbian, DEC RT-11)
In Ragel, the end of line can be defined as:
endline = ( "\r" | "\n" )+ #{ increase_line_number; };
and the start of a line is then the beginning of (any - endline).