SQLite regular expressions (regex): get exact word by number - sql

I have a string like the following in one column of a SQLite table:
1a 2B 3c 354 AfS 151 31s2fef 1fs31 3F1e2s 84f64e 45fs
It is space separated, with any number of characters from 0-9, a-z, A-Z per word. There might be punctuation, I'm not sure, but it is definitely space separated.
I'm trying to write a regular expression so I can query the database by word position. Basically, if I wanted to get the 6th "word" in the example, I'd be looking for:
151
So I tried to make a regular expression that says: if the Nth word = 151, return that row.
Here's what I've got so far.
SELECT * FROM table1 WHERE column1 REGEXP '^((?:\S+\s+){1}){6}'
That unfortunately gives me the first through sixth words, but I really want to pinpoint the 6th word like the example above.
Also, I was thinking that to save room in the database I could get rid of the white space. I'd just need to know how to count a specific number of characters into the string, which I couldn't figure out either.
Thanks for the help, never written a regular expression before.

If all you need is a match, there is no need for the non-capturing group. Just match a non-space + space group 5 times and follow with a 151.
^(\S+\s+){5}151
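For anyone trying this from a script, here is a minimal sketch in Python, assuming the standard sqlite3 module. The SQLite library defines the REGEXP operator but does not implement it by default, so the sketch registers one. The helper name rows_with_nth_word and the extra sample rows are made up for illustration, and the trailing (?:\s|$) is an added assumption so that a word like 1515 does not count as 151.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def rows_with_nth_word(conn, n, word):
    # Skip n-1 "word + whitespace" groups, then require the exact word.
    pattern = r"^(?:\S+\s+){%d}%s(?:\s|$)" % (n - 1, re.escape(word))
    cur = conn.execute(
        "SELECT column1 FROM table1 WHERE column1 REGEXP ?", (pattern,)
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE table1 (column1 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [
        ("1a 2B 3c 354 AfS 151 31s2fef 1fs31 3F1e2s 84f64e 45fs",),
        ("1a 2B 3c 354 AfS 1515 31s2fef",),  # 6th word only starts with 151
        ("151 2B 3c 354 AfS 999",),          # 151 is the 1st word, not the 6th
    ],
)
matches = rows_with_nth_word(conn, 6, "151")
```

Only the first sample row should come back, since it is the only one whose 6th word is exactly 151.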

Use the following regex:
^([^\s]+\s){5}(.*?)(\s|$)
The first group skips five words, so the second capture group holds the 6th word, 151. To get the Nth word, change {5} to N-1.
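To check which word a capture group really lands on, a small sketch in Python's re module can help. The function name nth_word is hypothetical, and the pattern uses (\S+) for the captured word instead of (.*?)(\s|$), which is a simplification on my part rather than the answer's exact regex.

```python
import re


def nth_word(text, n):
    # Skip n-1 "word + whitespace" groups from the start, then capture one word.
    match = re.match(r"(?:\S+\s+){%d}(\S+)" % (n - 1), text)
    return match.group(1) if match else None


sample = "1a 2B 3c 354 AfS 151 31s2fef 1fs31 3F1e2s 84f64e 45fs"
sixth = nth_word(sample, 6)
```

Asking for a position past the last word returns None rather than raising.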

Related

Impala Regex - find specific sequence of characters, with delimiters between them, some are not letters, digits or underscore

I am new to regex and need to search a string field in Impala for multiple matches of this exact sequence of characters: ~FC* followed by 11 more * that may or may not have letters/digits between them (the asterisks are basically delimiters in this string field). The 12th * (counting the one in ~FC* as #1) should be immediately followed by Y~.
Since the asterisks are not letters or digits, I am unsure how to search for these delimiters properly.
This is my SQL so far:
select
regexp_extract(col_name, '(~FC\\*).*(\\*Y~)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1
data returned:
pattern_found
--------------
~FC*
(~FC\\*) in Impala SQL returns ~FC*, which is great (got it from my other question).
I've been trying (~FC\\*).*(\\*Y~), which obviously isn't counting the number of asterisks, but it is also not picking up the Y.
This is a test string, it has 2 occurrences:
N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
The results should be these 2, which share an overlapping ~ between them, but I will settle for at least the first being found if both cannot be.
~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~
~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
Figured out a solution, but happy to learn of a better way to accomplish this.
This is what worked in Impala SQL; it needed parentheses and double-escaped backslashes for all the asterisks:
(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)
Full SQL:
select
regexp_extract(col_name, '(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1
and here is the RegexDemo without the additional syntax needed for Impala SQL
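The same idea can be written more compactly with a repetition count. Here is a rough sketch using Python's re module, not Impala's engine, so treat the {11} shorthand and especially the lookahead as things to verify before carrying them over to regexp_extract. The names SEGMENT and find_segments are made up for this example.

```python
import re

# One "~FC*" opener, then 11 more "field*" groups, then "Y~".
SEGMENT = r"~FC\*(?:[^*]*\*){11}Y~"


def find_segments(text):
    # The lookahead consumes nothing, so a "~" that closes one segment
    # can also open the next one.
    return re.findall(r"(?=(%s))" % SEGMENT, text)


test_string = (
    "N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
    "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~"
)
segments = find_segments(test_string)
```

On the test string from the question this finds both segments, including the shared ~ in each.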

How can I return Special Characters in SQL?

I've been searching and I couldn't find exactly what I need.
I have SQL Server 2008.
I have some strings in a table with special characters such as "!,;:-()" and I'm trying to write a script that returns these characters, BUT ONLY THOSE CHARACTERS.
For example, I have "Borges, Ricardo" and I need to return only the ","
Another example: "Calle 13 Nº 34, Mercedes Bs As ( 6600)", where I only need the "º,()"
I don't want to get the letters or numbers.
I wrote this simple script:
SELECT name FROM table WHERE name LIKE '%[^A-Za-z0-9 ]%'
So I can get all the rows where there is a special character like (,;.:-()&%$... but I need to RETURN ONLY the special characters. No letters, no numbers, only the special characters in the row.
Thank you very much for your help!
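Outside the database, the character class from the LIKE pattern above can be reused directly. This is only a sketch of the idea in Python, not a SQL Server 2008 solution; the function name special_characters is hypothetical.

```python
import re


def special_characters(text):
    # Keep every character that is not an ASCII letter, a digit, or a space.
    return "".join(re.findall(r"[^A-Za-z0-9 ]", text))


first = special_characters("Borges, Ricardo")
second = special_characters("Calle 13 Nº 34, Mercedes Bs As ( 6600)")
```

With the two examples from the question this gives "," and "º,()" respectively.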

Find phone numbers with unexpected characters using SQL in Oracle?

I need to find rows where the phone number field contains unexpected characters.
Most of the values in this field look like:
123456-7890
This is expected. However, we are also seeing character values in this field such as * and #.
I want to find all rows where these unexpected character values exist.
Expected:
Numbers are expected
Hyphen with numbers is expected (hyphen alone is not)
NULL is expected
Empty is expected
Tried this:
WHERE phone_num is not like ' %[0-9,-,' ' ]%
Still getting rows where phone has numbers.
You can edit the regex at https://regexr.com/3c53v to match your needs.
I am going to use the example regex from there for this purpose:
select * from Table1
Where NOT REGEXP_LIKE(PhoneNumberColumn, '^[+]*[(]{0,1}[0-9]{1,4}[)]{0,1}[-\s\./0-9]*$')
You can use translate()
...
WHERE translate(Phone_Number,'a1234567890-', 'a') is NOT NULL
This will strip out all valid characters leaving behind the invalid ones. If all the characters are valid, the result would be NULL. This does not validate the format, for that you'd need to use REGEXP_LIKE or something similar.
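The "strip the valid characters and see what is left" idea can be sketched in Python as well. This only mirrors the logic; the names VALID, leftover_characters and has_unexpected are assumptions for the example, and Python returns an empty string where Oracle would give NULL.

```python
VALID = "1234567890-"
# Translation table that deletes every valid character.
_DELETE_VALID = str.maketrans("", "", VALID)


def leftover_characters(phone):
    # Whatever survives the deletion is, by definition, unexpected.
    if phone is None:
        return None
    return phone.translate(_DELETE_VALID)


def has_unexpected(phone):
    # None and "" both count as "nothing unexpected".
    return bool(leftover_characters(phone))
```

As with the SQL version, this checks the character set only, not the format, so a lone hyphen passes.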
You can use regexp_like().
...
WHERE regexp_like(phone_num, '[^ 0123456789-]|^-|-$')
[^ 0123456789-] matches any character that is not a space, a digit, or a hyphen. ^- matches a hyphen at the beginning and -$ one at the end of the string. The pipes are "ors", i.e. a|b matches if pattern a matches or if pattern b matches.
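To see how the three alternatives behave on the cases listed in the question, here is a quick check with Python's re module. It is a sketch for experimenting with the pattern, not Oracle code; the names UNEXPECTED and is_unexpected are made up.

```python
import re

# Same pattern as above: a bad character, a leading hyphen, or a trailing hyphen.
UNEXPECTED = re.compile(r"[^ 0123456789-]|^-|-$")


def is_unexpected(phone):
    # NULL (None) and empty values are expected, so they are never flagged.
    return phone is not None and UNEXPECTED.search(phone) is not None
```

A lone hyphen is flagged because it is both a leading and a trailing hyphen.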
Oracle has REGEXP_LIKE for regex compares:
WHERE REGEXP_LIKE(phone_num,'[^0-9''\-]')
If you're unfamiliar with regular expressions, there are plenty of good sites to help you build them. I like this one

LIKE Wild Card Search in Teradata

I have text in a column like RAPP 01, RAPP 02 up to RAPP 45, and RAPP 99. I included all these values manually in the IN list of my WHERE clause, but it slows the query as the data set is huge. I tried WHERE SUBSTR(REMARK_TXT,1,7) LIKE 'RIPA [01-45,99]' and it did not return any data. Can you please help?
Thanks!
You could use REGEXP functionality here:
WHERE REGEXP_SIMILAR(REMARK_TXT, '^RAPP [0-9]{2}$') = 1;
That regex matches a string that starts with RAPP, followed by a space, then 2 digits, and then the end of the string.
Updating to deal with the two number ranges (01-45) and (99). This isn't the best thing to do with regex, but it's still possible:
WHERE REGEXP_SIMILAR(REMARK_TXT, '^RAPP ([0-4][0-9]|99)$') = 1;
This matches a string that starts with RAPP and then ends in either a two-digit number whose first digit is 0 through 4, or the number 99. Note that [0-4][0-9] accepts 00 through 49, which is slightly wider than the 01-45 range in the question.
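If the range really has to be 01 through 45 plus 99, the alternation can be tightened. Below is a sketch of such a pattern, tested with Python's re module rather than Teradata; whether REGEXP_SIMILAR accepts it unchanged is an assumption you would need to check. RAPP_CODE and is_wanted are made-up names.

```python
import re

# 01-09, 10-39, 40-45, or 99.
RAPP_CODE = re.compile(r"^RAPP (0[1-9]|[1-3][0-9]|4[0-5]|99)$")


def is_wanted(remark):
    return RAPP_CODE.match(remark) is not None
```

Splitting the range by tens digit is the usual way to express a numeric range in a regex, which is also why it gets awkward quickly.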
You could use the following:
where column like ('RAPP %')
This returns anything beginning with the string RAPP and a space. Notice the '%' sign; this is your wildcard.
Be careful with LIKE, and especially NOT LIKE: putting the wildcard at the beginning of the pattern causes much bigger performance issues.

How can I extract a substring from a character column without using SUBSTR()?

I have a question regarding the data below.
You can clearly see that each EMP_IDENTIFIER is connected with an EMP_ID.
I need to pull only the identifier, which is 10 characters, and insert it into another column.
How would I do that?
I did it the traditional way, using INSTR and SUBSTR.
I just want to know whether there is any other way to do it without using INSTR and SUBSTR.
EMP_ID (VARCHAR2)  EMP_IDENTIFIER (VARCHAR2)
62049 62049-2162400111
6394 6394-1368000222
64473 64473-1814702333
61598 61598-0876000444
57452 57452-0336503555
5842 5842-0000070666
75778 75778-0955501777
76021 76021-0546004888
76274 76274-0000454999
73910 73910-0574500122
I am using Oracle 11g.
If you want the second part of the identifier and it is always 10 characters:
select t.*, substr(emp_identifier, -10) as secondpart
from t;
Here is one way:
REGEXP_SUBSTR (EMP_IDENTIFIER, '-(.{10})',1,1,null,1)
That will give the 1st 10 character string that follows a dash ("-") in your string. Thanks to mathguy for the improvement.
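The pattern itself is easy to try out away from the database. A minimal sketch with Python's re module follows; the function name second_part is hypothetical and this is not Oracle code.

```python
import re


def second_part(emp_identifier):
    # Capture the 10 characters that immediately follow the first dash
    # that has at least 10 characters after it.
    match = re.search(r"-(.{10})", emp_identifier)
    return match.group(1) if match else None


example = second_part("62049-2162400111")
```

Values without a dash, or with fewer than 10 characters after it, give None.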
Beyond that, you'll have to provide more details on the exact logic for picking out the identifier you want.
Since apparently this is for learning purposes... let's say the assignment was more complicated. Let's say you had a longer input string, and it had several groups separated by -, and the groups could include letters and digits. You know there are at least two groups that are "digits only" and you need to grab the second such "purely numeric" group. Then something like this will work (and there will not be an instr/substr solution):
select regexp_substr(input_str, '(-|^)(\d+)(-|$)', 1, 2, null, 2) from ....
This searches the input string for one or more digits (\d means any digit, + means one or more occurrences) between a - or the beginning of the string (^ means beginning of the string; (a|b) means match a OR b) and a - or the end of the string ($ means end of the string). It starts searching at the first character (the third argument of the function is 1); it looks for the second occurrence (the fourth argument, 2); it doesn't apply any special matching such as ignoring case (the argument null); and when the match is found, it returns the fragment of the match included in the second set of parentheses (the last argument, 2). The second fragment is the \d+, the sequence of digits, without the leading and/or trailing dash.
This approach is meant to cover your example too; it's just overkill. In something like AS23302-ATX-20032-33900293-CWV20-3499-RA the intent is to return the second numeric group, 33900293. One caveat: the pattern consumes the dash after each match, so when two numeric groups are adjacent (as 20032 and 33900293 are here, and as the two parts of your identifiers are), the search for the next occurrence resumes after that dash and may skip the neighbouring group. Verify the result on your own data.
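How a delimiter-consuming pattern treats adjacent numeric groups can be checked with a short experiment. The sketch below uses Python's re module, so it shows Python's behaviour only; it says nothing definite about Oracle, and the lookahead in the second function is a Python feature that a given SQL engine may not offer. Both function names are made up.

```python
import re


def numeric_groups_consuming(text):
    # The trailing dash is part of the match, so the next search
    # starts after it and cannot reuse it as a leading dash.
    return re.findall(r"(?:-|^)(\d+)(?:-|$)", text)


def numeric_groups_lookahead(text):
    # The trailing dash is only peeked at, so it stays available
    # as the leading dash of the next group.
    return re.findall(r"(?:-|^)(\d+)(?=-|$)", text)


sample = "AS23302-ATX-20032-33900293-CWV20-3499-RA"
consuming = numeric_groups_consuming(sample)
lookahead = numeric_groups_lookahead(sample)
```

In Python the consuming version skips 33900293 because the dash before it was already used up by 20032, while the lookahead version finds all three purely numeric groups.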