GREP REGEX for EnCase - Date conversion - regex-negation

How do I convert February 2, 2002 at 10 to GREP REGEX for EnCase Forensics?
Thanks

I don't know EnCase Forensics, and given the astounding number of answers here, I am not alone...
This is probably not even a programming question; it is more of a Super User one: "how to use a program".
But if it uses traditional regexes (regular expressions), as in Perl (not pearl), just enter the string as given: it contains none of the special characters used in regexes, so it will behave as a plain text search.

It depends on the format. For month/day/year formatting, I would recommend 0?2/0?2/(20)?02. This makes the leading zeros optional and allows a 2-digit or 4-digit year.
A general month/day/year date regexp in EnCase could be [01]?#/##/(##)?##. However, there will be many dates in a different format, using abbreviations for months or formatted as year/month/day, etc.
(NB: EnCase uses '#' for [0-9], whereas Perl-style regex engines use \d.)
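As a rough way to try the general month/day/year pattern outside EnCase, it can be translated to a Perl-style regex by swapping '#' for \d. This is only a sketch: encase_to_python is a hypothetical helper, and it assumes '#' is the only EnCase-specific token in the pattern (it does not handle other EnCase syntax).

```python
import re


def encase_to_python(pattern: str) -> str:
    """Translate an EnCase GREP pattern to a Python regex.

    Assumption: '#' (any digit) is the only EnCase-specific token used.
    """
    return pattern.replace("#", r"\d")


# The general month/day/year pattern from the answer above.
general_date = re.compile(encase_to_python("[01]?#/##/(##)?##"))

samples = ["02/02/2002", "2/02/02", "February 2, 2002"]
matches = [s for s in samples if general_date.fullmatch(s)]
print(matches)
```

The spelled-out date is not matched, which is the point made above about dates in other formats.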
Jon

Related

How to extract digits from field using regex

I am using Firebird 2.5 and I have a field (called identifier) with mixed letters, numbers and special characters. I would like to use regex to extract only the numbers in a new column. I have tried something like below, but it is not working.
Any idea how I can achieve this using regex, without using stored procedures or EXECUTE BLOCK?
SELECT ORDER_ID,
ORDER_DATE,
SUBSTRING(IDENTIFIER FROM 1 TO 10) SIMILAR TO '^[0-9]{10}$' --- DESIRED EXTRACTION COLUMN
FROM ORDERS
Example of data:
IDENTIFIER           DESIRED OUTPUT
ANDRE 02869567995    02869567995
02869567995 MARIA    02869567995
028.695.67.995       02869567995
028695679-95         02869567995
You cannot do this in Firebird 2.5, at least not without help from a UDF, or a (selectable) stored procedure. I'm not aware of third-party UDFs providing regular expressions, so you might have to write this yourself.
In Firebird 3.0, you could also use a UDR or stored function to achieve this. Unfortunately, using the regular expression functionality available in Firebird alone will not be enough to solve this.
NOTE: The rest of this answer assumes you only want to extract the digits when the first 10 characters of the string are all digits. With the updated question, this assumption is no longer valid.
That said, if your need is exactly as shown in your original question, that is, to extract the first 10 characters from a string only if they are all digits, then you could use:
case
when IDENTIFIER similar to '[[:DIGIT:]]{10}%'
then substring(IDENTIFIER from 1 for 10)
end
(as an aside, the positional SUBSTRING syntax is from <start> for <length>, not from <start> to <end>)
In Firebird 3.0 and higher, you can use SUBSTRING(... SIMILAR ...) with a SQL regular expression pattern. Assuming you want to extract 10 digits from the start of a string, you can do:
substring(IDENTIFIER similar '#"[[:DIGIT:]]{10}#"%' escape '#')
The #" delimits the pattern to extract (where # is a custom escape character as specified in the ESCAPE clause). The remainder of the pattern must match the rest of the string, hence the use of % here (in other cases, you may need to specify a pattern before the first #" as well).
See this dbfiddle for an example.
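To see what the "first 10 characters are digits" patterns accept, here is a rough Python analogue. It is not Firebird code; first_ten_digits is a hypothetical helper, and the regex ([0-9]{10}).* is my approximation of '#"[[:DIGIT:]]{10}#"%', where the group plays the role of the #"..."# delimiters.

```python
import re


def first_ten_digits(identifier: str):
    """Return the first 10 characters if they are all digits, else None.

    Approximates substring(IDENTIFIER similar '#"[[:DIGIT:]]{10}#"%' escape '#'):
    the whole string must match, and only the captured part is returned.
    """
    m = re.fullmatch(r"([0-9]{10}).*", identifier, flags=re.DOTALL)
    return m.group(1) if m else None


rows = ["ANDRE 02869567995", "02869567995 MARIA", "028.695.67.995", "028695679-95"]
for row in rows:
    print(repr(row), "->", first_ten_digits(row))
```

Only the second sample row yields a value, and it is cut to 10 digits, which shows why this approach does not produce the desired output for the updated question.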
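Extracting digits scattered throughout the string, as in the updated question, is not possible in any version of Firebird using only its built-in regular expression functionality.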

select month in date with substring and cast in postgresql

I've been racking my brain over this exercise: the professor wants us to select only the month from a date column in PostgreSQL, using only substring and cast.
I've tried several ways, but none of them worked. This is the latest statement that I have:
select substring(cast(fac.fecha as varchar(10)), '_____[0,9]___' ) from facturas fac
This statement returns this:
[image: sentence result]
[0,9] is regex for "a single 0, 9, or comma".
You probably want [0-9]{2}, i.e. regex for "2 digits".
On the other hand, all implementations of substr() that I know of (note the different spelling; maybe you have a more powerful engine that supports regexes...) take indexes into the string as parameters, with nothing to do with regexes or even the single-character wildcard _.
So I recommend studying the documentation of your substring().
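As a quick illustration of the two ideas (by position versus by pattern), here is a Python sketch. It assumes the date is rendered as text in the ISO layout YYYY-MM-DD, which is what casting a date to varchar(10) gives under PostgreSQL's default DateStyle; the sample date is made up. PostgreSQL's substring does have a positional form, substring(string from start for count), which corresponds to the slice below.

```python
import re
from datetime import date

fecha = date(2021, 7, 15)      # hypothetical sample value
as_text = fecha.isoformat()    # '2021-07-15' (assumed ISO layout)

# By position: characters 6 and 7 (1-based), i.e. "from 6 for 2" in SQL terms.
month_by_position = as_text[5:7]

# By pattern: two digits sitting between the two hyphens.
month_by_regex = re.search(r"-([0-9]{2})-", as_text).group(1)

print(month_by_position, month_by_regex)
```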

What are pros and cons of using special characters in SQL identifiers?

Should I avoid special characters like "é á ç" in SQL table names and column names?
What are the pros and cons of using special characters?
As you can guess, there are pros and cons. This is more or less a subjective question.
SQL (unlike most programming languages) allows you to use special characters, whitespace, punctuation, or reserved words in your table or column identifiers.
It's pretty nice that people have the choice to use appropriate characters for their native language.
Especially in cases where a word changes its meaning significantly when spelled with the closest ASCII characters: e.g. año vs. ano.
But the downside is that if you do this, you have to use "delimited identifiers" every time you reference the table with special characters. In standard SQL, delimited identifiers use double-quotes.
SELECT * FROM "SELECT"
This is actually okay! If you want to use an SQL reserved word as a table name, you can do it. But it might cause some confusion for some readers of the code.
Likewise if you use special non-ASCII characters, it might make it hard for English-speaking programmers to maintain the code, because they are not familiar with the key sequence to type those special characters. Or they might forget that they have to delimit the table names.
SELECT * FROM "año"
Then there are non-standard delimited identifiers. Microsoft SQL Server uses square brackets by default:
SELECT * FROM [año]
And MySQL uses back-ticks by default:
SELECT * FROM `año`
Though both can use the standard double-quotes as identifier delimiters if you enable certain options, you can't always rely on that, and if the option gets disabled, your code will stop working. So users of Microsoft SQL Server and MySQL are kind of stuck using the non-standard delimiters, unfortunately.
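The delimiter styles above can be tried out with a small Python sketch. It uses SQLite (my choice for a self-contained demo, not a database mentioned above), which happens to accept the standard double-quotes as well as the bracket and back-tick styles; other databases will differ, so treat this only as an illustration of delimited identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Delimited identifiers: a non-ASCII name and a reserved word as table names.
conn.execute('CREATE TABLE "año" (id INTEGER)')
conn.execute('CREATE TABLE "SELECT" (id INTEGER)')
conn.execute('INSERT INTO "año" VALUES (1)')
conn.execute('INSERT INTO "SELECT" VALUES (2)')

standard = conn.execute('SELECT id FROM "año"').fetchall()
brackets = conn.execute("SELECT id FROM [año]").fetchall()
backticks = conn.execute("SELECT id FROM `año`").fetchall()
reserved = conn.execute('SELECT id FROM "SELECT"').fetchall()

# Forgetting the delimiters around a reserved word is a syntax error.
try:
    conn.execute("SELECT id FROM SELECT")
    undelimited_failed = False
except sqlite3.OperationalError:
    undelimited_failed = True

print(standard, brackets, backticks, reserved, undelimited_failed)
```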
Maintaining the code is simpler in some ways if you can stick with ASCII characters. But there are legitimate reasons to want to use special characters too.

Redshift SQL - Extract numbers from string

In Amazon Redshift tables, I have a string column from which I need to extract only the numbers. For this I currently use
translate(stringfield, '0123456789'||stringfield, '0123456789')
I was trying out the REPLACE function, but it's not going to be elegant.
Any thoughts on converting the string into ASCII first and then doing some operation to extract only the numbers? Or any other alternatives?
It is hard here, as Redshift does not support functions and is missing a lot of traditional functions.
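For readers puzzled by the translate() call above, here is a rough Python model of why it keeps only digits. sql_translate and digits_only are hypothetical helpers; the model assumes that characters in the "from" list with no counterpart in the "to" list are deleted, and that the first occurrence wins when a character is listed twice.

```python
def sql_translate(expr: str, from_chars: str, to_chars: str) -> str:
    """Simplified model of SQL TRANSLATE (assumptions noted above)."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch in mapping:
            continue  # first occurrence wins (assumption)
        # Characters beyond the end of to_chars are mapped to nothing.
        mapping[ch] = to_chars[i] if i < len(to_chars) else ""
    return "".join(mapping.get(ch, ch) for ch in expr)


def digits_only(s: str) -> str:
    # Digits map to themselves; every other character of s has no
    # counterpart in the 10-character "to" list, so it is dropped.
    return sql_translate(s, "0123456789" + s, "0123456789")


print(digits_only("somestring-051-a92"))
```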
Edit:
Trying out the below, but it only returns 051-a92, whereas I need 05192 as output. I am thinking of substring etc., but I only have regexp_substr available right now. How do I get rid of any characters in between?
select REGEXP_SUBSTR('somestring-051-a92', '[0-9]+..[0-9]+', 1)
This might be late, but I was solving the same problem and finally came up with this:
select REGEXP_replace('somestring-051-a92', '[a-z/-]', '')
Alternatively, you can now create a Python UDF.
Typically your inputs will conform to some sort of pattern that can be used to do the parsing using SUBSTRING() with CHARINDEX() { aka STRPOS(), POSITION() }.
E.g. find the first hyphen and the second hyphen and take the data between them.
If not (and assuming your character range is limited to ASCII) then your best bet would be to nest 26+ REPLACE() functions to remove all of the standard alpha characters (and any punctuation as well).
If you have multibyte characters in your data though then this is a non-starter.
A better method is to remove all the non-numeric characters:
select REGEXP_replace('somestring-051-a92', '[^0-9]', '')
You can specify "any non-digit", which includes non-printable characters, symbols, letters, etc.
e.g., regexp_replace('brws--A*1','[\D]')
returns
"1"
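The behaviour of the patterns in this thread can be compared with Python's re module. This is only an approximation, since Redshift's regex dialect is not identical to Python's, but these simple character classes behave the same way in both as far as I know.

```python
import re

s = "somestring-051-a92"

# The question's pattern: digits, any two characters, digits.
# It returns the separators too, because '..' matches '-a'.
partial = re.search(r"[0-9]+..[0-9]+", s).group(0)

# Removing only lowercase letters, '/' and '-' works for this input...
narrow = re.sub(r"[a-z/-]", "", s)
# ...but leaves anything outside that class, e.g. uppercase or '*'.
narrow_other = re.sub(r"[a-z/-]", "", "brws--A*1")

# Removing every non-digit is input-independent.
negated = re.sub(r"[^0-9]", "", "brws--A*1")
shorthand = re.sub(r"\D", "", "brws--A*1")

print(partial, narrow, narrow_other, negated, shorthand)
```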

How to convert foreign characters to English characters in SQL Query?

I have to create a SQL function that converts special characters and international characters (French, Chinese...) to English.
Is there any special function in SQL that I can use?
Thanks for your help.
If you are after English names for the characters, that is an achievable goal, as they all have published names as part of the Unicode standard.
See for example:
http://www.unicode.org/ucd/
http://www.unicode.org/Public/UNIDATA/
Your task then is simply to turn the list of Unicode characters into a table with 100,000 or so rows. Unfortunately, the names you get will be things like ARABIC LIGATURE LAM WITH MEEM MEDIAL FORM.
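To get a feel for what such a lookup table would contain, Python's standard unicodedata module exposes the published character names. This is just an illustration of the names, not a SQL solution; the three sample characters (é, ç and 中) are my own picks, written as escapes.

```python
import unicodedata

# Sample characters: e with acute, c with cedilla, and a CJK ideograph.
samples = ["\u00e9", "\u00e7", "\u4e2d"]

names = {ch: unicodedata.name(ch) for ch in samples}
for ch, name in names.items():
    print(ch, "->", name)
```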
On the other hand, if you want to actually translate the meaning, you need to be looking at machine translation software. Both Microsoft and Google have well-known cloud translation offerings, and there are several other well-regarded products too.
I think the short answer is you can't unless you narrow your requirements a lot. It seems you want to take a text sample, A, and convert it into romanized text B.
There are a few problems to tackle:
Languages are typically not romanized on a single character basis. The correct pronunciation of a character is often dependent on the characters and words around it, and can even have special rules for just one word (learning English can be tough because it is filled with these, having borrowed words from many languages without normalizing the spelling).
Even if you code rules for every language you want to support you still have homographs, words that are spelled using exactly the same characters, but that have different pronunciations (and thus romanization) depending on what was meant - for example "sow" meaning a pig, or "sow" (where the w is silent) meaning to plant seeds.
And then you get into the problem of what language you are romanizing: Characters and even words are not unique to one language, but the actual meaning and romanization can vary. The fact that many languages include loan words from those languages they share characters with complicates any attempt to automatically determine which language you are trying to romanize.
Given all these difficulties, what it is you actually want to achieve (what problem are you solving)?
You mention French among the languages you want to "convert" into English - yet French (with its accented characters) is already written in the roman alphabet. Even everyday words used in English occasionally make use of accented characters, though these are rare enough that the meaning and pronunciation is understood even if they are omitted (ex. résumé).
Is your problem really that you can't store Unicode/extended ASCII? There are numerous ways to correct or work around that.
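If the goal turns out to be only dropping accents from text that is already in the Latin alphabet (the résumé case above), one common approach is Unicode decomposition followed by removal of the combining marks. The sketch below is in Python rather than SQL, strip_accents is a hypothetical helper, and it deliberately does nothing for scripts such as Chinese, which need real romanization.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("r\u00e9sum\u00e9"))  # accented Latin text
print(strip_accents("\u4e2d"))            # a CJK character is left unchanged
```

Note that this is lossy: año becomes ano, which is exactly the kind of meaning change mentioned earlier on this page.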