Oracle RegExp For Counting Words Occuring After a Character - sql

I want to identify the number of words occurring after a comma, in a full name field in Oracle database table.
The name field contains format of "LAST, FIRST MIDDLE"
Some names may have up to 4 to 5 names, such as "DOE, JOHN A B"
For example, if the Name field = 'WILLIAMS JR, HANK' it would output 1 (for 1 word occurring after the comma.
If the Name field = 'DOE, JOHN A B' i want it to output 3.
I would like to use a regexp_count function to determine this count.
I am using the following code to identify how many words exist in the field and would like to modify it to include this functionality:
REGEXP_COUNT(REPLACE(fieldname, ',',', '), '[^]+')
It would likely have to remove the replace function in order to find the comma, but this was the best I could do so far.
Help is much appreciated!

How about the following:
REGEXP_COUNT( fieldname, "\\w", INSTR(fieldname, ",")+1)

I have updated the code as follows, which appears to be working as desired:
REGEXP_COUNT(fieldname, '[^ ]+', (INSTR(fieldname, ',')+1))

Related

Comma inside like query fails to return any result

Using Oracle db,
Select name from name_table where name like 'abc%';
returns one row with value "abc, cd" but when I do a select query with a comma before % in my like query, it fails to return any value.
Select name from name_table where name like 'abc,%';
returns no row. How can I handle a comma before % in the like query?
Example:
Database has "Sam, Smith" in the name column when the like has "Sam%" it returns one row, when i do "Sam,%" it doesn't return any row
NOT AN ANSWER but posting it as one since I can't format in a comment.
Look at this and use DUMP() on your own machine... see if this helps.
SQL> select dump('Smith, Stan') from dual;
DUMP('SMITH,STAN')
-----------------------------------------------------
Typ=96 Len=11: 83,109,105,116,104,44,32,83,116,97,110
If you count, the string is 11 characters (including the comma and the space). The comma is character 44, and the space is character 32. If you look at YOUR string and you don't see 44 where the comma should be, you will know that's the problem. You could then let us know what you see there (just for that character, I understand posting "Leno, Jay" would be a violation of privacy).
Also, make sure you don't have any extra characters (perhaps non-printable ones!) right before the comma. Just compare the two strings you are using as inputs and see where the differences may be.

SQL - need help in parsing text of a field

I have a select query and it fetches a field with complex data. I need to parse that data in specified format. please help with your expertise:
selected string = complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I
expected output - PB|I
Please help me in writing a sql regular expression to accomplish this output.
The first step in figuring out the regular expression is to be able to describe it plain language. Based on what we know (and as others have said, more info is really needed) from your post, some assumptions have to be made.
I'd take a stab at it by describing it like this, which is based on the sample data you provided: I want the sets of one or more characters that follow the equal signs but not including the following space or end of the line. The output should be these sets of characters, separated by a pipe, in the order they are encountered in the string when reading from left to right. My assumptions are based on your test data: only 2 equal signs exist in the string and the last data element is not followed by a space but by the end of the line. A regular expression can be built using that info, but you also need to consider other facts which would change the regex.
Could there be more than 2 equal signs?
Could there be an empty data element after the equal sign?
Could the data set after the equal sign contain one or more spaces?
All these affect how the regex needs to be designed. All that said, and based on the data provided and the assumptions as stated, next I would build a regex that describes the string (really translating from the plain language to the regex language), grouping around the data sets we want to preserve, then replace the string with those data sets separated by a pipe.
SQL> with tbl(str) as (
2 select 'complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I' from dual
3 )
4 select regexp_replace(str, '^.*=([^ ]+).*=([^ ]+)$', '\1|\2') result from tbl;
RESU
----
PB|I
The match regex explained:
^ Match the beginning of the line
. followed by any character
* followed by 0 or more 'any characters' (refers to the previous character class)
= followed by an equal sign
( start remembered group 1
[^ ]+ which is a set of one or more characters that are not a space
) end remembered group one
.*= followed by any number of any characters but ending in an equal sign
([^ ]+) followed by the second remembered group of non-space characters
$ followed by the end of the line
The replace string explained:
\1 The first remembered group
| a pipe character
\2 the second remember group
Keep in mind this answer is for your exact sample data as shown, and may not work in all cases. You need to analyse the data you will be working with. At any rate, these steps should get you started on breaking down the problem when faced with a challenging regex. The important thing is to consider all types of data and patterns (or NULLs) that could be present and allow for all cases in the regex so you return accurate data.
Edit: Check this out, it parses all the values right after the equal signs and allows for nulls:
SQL> with tbl(str) as (
2 select 'a=zz|complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I - testing|test1=|test2=test2 - testing' from dual
3 )
4 select regexp_substr(str, '=([^ |]*)( |||$)', 1, level, null, 1) output, level
5 from tbl
6 connect by level <= regexp_count(str, '=')
7 ORDER BY level;
OUTPUT LEVEL
-------------------- ----------
zz 1
PB 2
I 3
4
test2 5
SQL>

Finding first and second word in a string in SQL Developer

How can I find the first word and second word in a string separated by unknown number of spaces in SQL Developer? I need to run a query to get the expected result.
String:
Hello Monkey this is me
Different sentences have different number of spaces between the first and second word and I need a generic query to get the result.
Expected Result:
Hello
Monkey
I have managed to find the first word using substr and instr. However, I do not know how to find the second word due to the unknown number of spaces between the first and second word.
select substr((select ltrim(sentence) from table1),1,
(select (instr((select ltrim(sentence) from table1),' ',1,1)-1)
from table1))
from table1
Since you seem to want them as separate result rows, you could use a simple common table expression to duplicate the rows, once with the full row, then with the first word removed. Then all you have to do is get the first word from each;
WITH cte AS (
SELECT value FROM table1
UNION ALL
SELECT SUBSTR(TRIM(value), INSTR(TRIM(value), ' ')) FROM table1
)
SELECT SUBSTR(TRIM(value), 1, INSTR(TRIM(value), ' ') -1) word
FROM cte
Note that this very simple example assumes that there is a second word, if there isn't, NULL will be returned for both words.
An SQLfiddle to test with.
While Joachim Isaksson's answer is a robust and fast approach, you can also consider splitting the string and selecting from the resulting pieces set. This is just meant as hint for another approach, if your requirements alter (e.g. more than two string pieces).
You could split finally by the regex /[ ]+/, and so getting the words between the blanks.
Find more about splitting here: How do I split a string so I can access item x?
This will strongly depend on the SQL dialect you are using.
Try this with REGEXP_SUBSTR:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),
REGEXP_SUBSTR(REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),'\w+$'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+$')
FROM table1;
result:
1 2 3 4 5
Hello Monkey Monkey this this is_me
Learn more about REGEXP_SUBSTR reference to Using Regular Expressions With Oracle Database
Test use SqlFiddle: http://sqlfiddle.com/#!4/8e9ef/9
If you only want to get the first and the second word, use REGEXP_INSTR to get second word start position :
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+') AS FIRST,
REGEXP_SUBSTR(sentence,'\w+\s',REGEXP_INSTR(sentence,'\w+\s+')+length(REGEXP_SUBSTR(sentence,'\w+\s+'))) AS SECOND
FROM table1;

Select rows that has mixed charcters in a single value e.g. 'Joh?n' in name column

In an oracle table:
1- a value in a VARCHAR column contains characters that are not letters.
Consider a scenarion where a name in 'last_name' column contains regular characters (A - Z, a - z) as well as characters that are not english letters (e.g. '.', '-', ' ','_', '>' or similar).
The challenge is to select the rows that has names in 'last_name' as '.John' or 'John.' or '-John' or 'Joh-n'
2- Is it possible to have non-date values in a Date defined column? If yes, how can such records be selected in an oracle query?
Thanks!
I believe this will do the trick:
SELECT * FROM mytable WHERE REGEXP_LIKE(last_name, '[^A-Za-z]');
As for your 2nd question, I am unsure. I would be glad if someone else could add on to what I have to answer your 2nd question. I have found this website thought that might be of help. http://infolab.stanford.edu/~ullman/fcdb/oracle/or-time.html
It explains the DATE format.
If I properly understand your goal, you need to select rows with last_name column containing the name 'John', but it may also have additional characters before, after, or even inside the name. In that case, this should be helpful:
select * from tab where regexp_replace(last_name, '[^A-Za-z]+', '') = 'John'

Matching exactly 2 characters in string - SQL

How can i query a column with Names of people to get only the names those contain exactly 2 “a” ?
I am familiar with % symbol that's used with LIKE but that finds all names even with 1 a , when i write %a , but i need to find only those have exactly 2 characters.
Please explain - Thanks in advance
Table Name: "People"
Column Names: "Names, Age, Gender"
Assuming you're asking for two a characters search for a string with two a's but not with three.
select *
from people
where names like '%a%a%'
and name not like '%a%a%a%'
Use '_a'. '_' is a single character wildcard where '%' matches 0 or more characters.
If you need more advanced matches, use regular expressions, using REGEXP_LIKE. See Using Regular Expressions With Oracle Database.
And of course you can use other tricks as well. For instance, you can compare the length of the string with the length of the same string but with 'a's removed from it. If the difference is 2 then the string contained two 'a's. But as you can see things get ugly real soon, since length returns 'null' when a string is empty, so you have to make an exception for that, if you want to check for names that are exactly 'aa'.
select * from People
where
length(Names) - 2 = nvl(length(replace(Names, 'a', '')), 0)
Another solution is to replace everything that is not an a with nothing and check if the resulting String is exactly two characters long:
select names
from people
where length(regexp_replace(names, '[^a]', '')) = 2;
This can also be extended to deal with uppercase As:
select names
from people
where length(regexp_replace(names, '[^aA]', '')) = 2;
SQLFiddle example: http://sqlfiddle.com/#!4/09bc6
select * from People where names like '__'; also ll work