Finding first letter in a varchar2 Oracle sql - sql

I am trying to automate the data input of a adress data base however some of the housenumber have a letter added to the end of them. Because my database keeps the letter and the number separate i need to somehow split these 2 up. I know that i need a substring for this. My question is however how do i get the location of the first letter.
for example 17b
would give me 3
where 259a
would give me 4

You should consider using REGEXP_INSTR or REGEXP_SUBSTR as the standard INSTR and SUBSTR functions won't have the kind of flexibility that you need.
In your case, you would use something like REGEXP_INSTR(column_name, '[^0-9 ]').

Use regexp_replace for both numbers or letters
select regexp_replace('259a', '[A-Za-z]') from dual;
select regexp_replace('259a', '[0-9]') from dual;
FIDDLE

Related

Regex to split apart text. Special case for parentheses with spaces in them

I am trying to split a field by delimiter in LookML. This field either follows the format of:
Managers (AE)
Managers (AE - MM)
I was able to split to first case using this
sql: case
when rlike (${user_role_name}, '^.*[\\(\\)].*$') then split_part(${user_role_name}, ' ', -1)
However, I haven't been able to get the 2nd case to do the same. It's in a case statement so I am going to add another when statement, but am not able to figure out the regex for parentheses that contains spaces.
Thanks in advance for the help!
By "split" the string, I think you mean you want to extract the part in parentheses, right?
I would do this using a regex substring method. You didn't mention what warehouse you're using, and the syntax will vary a little, but on snowflake that would look like:
regexp_substr(${user_role_name}, '\\([^)]*\\)')
So, for example, with the inputs you gave:
select regexp_substr('Managers (AE)', '\\([^)]*\\)')
union all
select regexp_substr('Managers (AE - MM)', '\\([^)]*\\)')
result
(AE)
(AE - MM)

Substring of a specific occurence

I have a column as varchar2 datatype, the data in it is in format:
100323.3819823.222
100.323123.443422
1001010100.233888
LOL12333.DDD33.44
I need to remove the whole part after the first occurrence of '.'
In the end it should look like this:
100323
100
1001010100
LOL12333
I cant seem to find the exact substring expression due to the fact that there is not any fix length of the first part.
One way is to use REGEXP_SUBSTR:
SELECT REGEXP_SUBSTR(column_name,'^[^.]*') FROM table
The other way is to combine SUBSTR with INSTR, which is a bit faster, but will result in NULL if the data doesn't contain a dot, so you'll have to add a switch if needed:
SELECT SUBSTR(column_name, 1, INSTR(column_name,'.') - 1) FROM table
For oracle you can try this:
select substr (i,1,Instr(i,'.',i)-1) from Table name.

How to match and replace sections of a string in SQL

I'm pulling a list of popular sites from my database, but I want to combine results that are from the same domain. I've been able to do this partially by using :
REGEXP_REPLACE(site, '%|^www([123])?\.|^m\.|^mobile\.|^desktop\.')) as site
so that "www.facebook.com" and "facebook.com" or "m.facebook.com"
- all of which appear in the database - are treated as the same when I do a select distinct.
However, I want to take this a step further by writing an expression that looks at each string between periods. If a match is found consecutively in three or more strings between periods, then I want to treat those as the same. I simply can't predict every possible string that could come before "facebook.com", or any other site.
So for example:
"my.careerone.com.au" and
"careerone.com.au" match in three places.
Or "yahoo.realestate.com.au" and "rs.realestate.com.au" match in three places.
Any ideas on how to achieve this?
#David code will work in Vertica as well but not so well performance wise maybe.
You can use Vertica's own internal functions such as TRIM & REGEXP_REPLACE.
After borrowing #David Faber reg exp i endend-up with this.
select TRIM(LEADING '.' from REGEXP_REPLACE(col_name,'^.*((\.[^.]+){3})$', '\1')) AS fixed_dn from table_name;
I don't have Vertica available so I tested this in Oracle SQL (which does have REGEXP_REPLACE() that is similar to Vertica's). Not sure what the CTE syntax would be in Vertica but you'll be querying against a table anyway:
WITH d1 AS (
SELECT 'my.careerone.com.au' AS domain_nm FROM dual
UNION ALL
SELECT 'careerone.com.au' FROM dual
UNION ALL
SELECT 'yahoo.realestate.com.au' FROM dual
UNION ALL
SELECT 'rs.realestate.com.au' FROM dual
)
SELECT domain_nm, TRIM('.' FROM REGEXP_REPLACE(domain_nm, '^.*((\.[^.]+){3})$', '\1')) AS domain_nm_fix
FROM d1;
What REGEXP_REPLACE() does here is trim the highest level subdomains from the domain name, if it exists and if there are more than 3 levels. If there are only three levels then nothing will be replaced as the regex won't match -- that is why the leading . character then has to be trimmed. So, for example, careerone.com.au will be unaltered, while my.careerone.com.au will be changed to .careerone.com.au by the REGEXP_REPLACE(), from which the leading . then has to be trimmed.

Finding first and second word in a string in SQL Developer

How can I find the first word and second word in a string separated by unknown number of spaces in SQL Developer? I need to run a query to get the expected result.
String:
Hello Monkey this is me
Different sentences have different number of spaces between the first and second word and I need a generic query to get the result.
Expected Result:
Hello
Monkey
I have managed to find the first word using substr and instr. However, I do not know how to find the second word due to the unknown number of spaces between the first and second word.
select substr((select ltrim(sentence) from table1),1,
(select (instr((select ltrim(sentence) from table1),' ',1,1)-1)
from table1))
from table1
Since you seem to want them as separate result rows, you could use a simple common table expression to duplicate the rows, once with the full row, then with the first word removed. Then all you have to do is get the first word from each;
WITH cte AS (
SELECT value FROM table1
UNION ALL
SELECT SUBSTR(TRIM(value), INSTR(TRIM(value), ' ')) FROM table1
)
SELECT SUBSTR(TRIM(value), 1, INSTR(TRIM(value), ' ') -1) word
FROM cte
Note that this very simple example assumes that there is a second word, if there isn't, NULL will be returned for both words.
An SQLfiddle to test with.
While Joachim Isaksson's answer is a robust and fast approach, you can also consider splitting the string and selecting from the resulting pieces set. This is just meant as hint for another approach, if your requirements alter (e.g. more than two string pieces).
You could split finally by the regex /[ ]+/, and so getting the words between the blanks.
Find more about splitting here: How do I split a string so I can access item x?
This will strongly depend on the SQL dialect you are using.
Try this with REGEXP_SUBSTR:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),
REGEXP_SUBSTR(REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),'\w+$'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+$')
FROM table1;
result:
1 2 3 4 5
Hello Monkey Monkey this this is_me
Learn more about REGEXP_SUBSTR reference to Using Regular Expressions With Oracle Database
Test use SqlFiddle: http://sqlfiddle.com/#!4/8e9ef/9
If you only want to get the first and the second word, use REGEXP_INSTR to get second word start position :
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+') AS FIRST,
REGEXP_SUBSTR(sentence,'\w+\s',REGEXP_INSTR(sentence,'\w+\s+')+length(REGEXP_SUBSTR(sentence,'\w+\s+'))) AS SECOND
FROM table1;

How to extract group from regular expression in Oracle?

I got this query and want to extract the value between the brackets.
select de_desc, regexp_substr(de_desc, '\[(.+)\]', 1)
from DATABASE
where col_name like '[%]';
It however gives me the value with the brackets such as "[TEST]". I just want "TEST". How do I modify the query to get it?
The third parameter of the REGEXP_SUBSTR function indicates the position in the target string (de_desc in your example) where you want to start searching. Assuming a match is found in the given portion of the string, it doesn't affect what is returned.
In Oracle 11g, there is a sixth parameter to the function, that I think is what you are trying to use, which indicates the capture group that you want returned. An example of proper use would be:
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]', 1,1,NULL,1) from dual;
Where the last parameter 1 indicate the number of the capture group you want returned. Here is a link to the documentation that describes the parameter.
10g does not appear to have this option, but in your case you can achieve the same result with:
select substr( match, 2, length(match)-2 ) from (
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]') match FROM dual
);
since you know that a match will have exactly one excess character at the beginning and end. (Alternatively, you could use RTRIM and LTRIM to remove brackets from both ends of the result.)
You need to do a replace and use a regex pattern that matches the whole string.
select regexp_replace(de_desc, '.*\[(.+)\].*', '\1') from DATABASE;