How to select a particular word between non-alphabet characters? - sql

I have below values in a column.
TOM-TOM
TOMMY
TOM 12
123_TOM
SITTOM
TOM TIM
TOM,TIN
TOP TOM TON
TOMA
ATOM
How to select only these rows:
TOM-TOM
TOM 12
123_TOM
TOM TIM
TOM,TIN
TOP TOM TON
but not the below rows
SITTOM
TOMMY
TOMA
ATOM
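A row should be shown when TOM appears as a whole word, that is, when the character immediately before it and the character immediately after it are both non-alphabetic (or TOM sits at the start or end of the value).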

You can achieve this with a straightforward query using REGEXP_LIKE:
select t.col
from t
where regexp_like(t.col, '(^|[^a-zA-Z])TOM([^a-zA-Z]|$)')
Here's a breakdown of the regular expression used:
(^|[^a-zA-Z]) - start of a line or a non-alphabetic character;
TOM - the literal text TOM;
([^a-zA-Z]|$) - a non-alphabetic character or end of a line.
If you want to take non-English letters into account, you can use [:alpha:] instead of a-zA-Z:
select t.col
from t
where regexp_like(t.col, '(^|[^[:alpha:]])TOM([^[:alpha:]]|$)')
Try this on SQLFiddle: http://sqlfiddle.com/#!4/439b6/1
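To check the pattern without a database, here is a minimal sketch in Python. It assumes Python's `re` module treats this particular pattern the same way REGEXP_LIKE does; the names `ROWS` and `select_rows` are only illustrative and are not part of the answer above.

```python
import re

# Same pattern as in the REGEXP_LIKE call: TOM with a non-letter
# (or the start/end of the string) on each side.
PATTERN = re.compile(r"(^|[^a-zA-Z])TOM([^a-zA-Z]|$)")

# Sample values copied from the question.
ROWS = [
    "TOM-TOM", "TOMMY", "TOM 12", "123_TOM", "SITTOM",
    "TOM TIM", "TOM,TIN", "TOP TOM TON", "TOMA", "ATOM",
]


def select_rows(rows):
    """Return the rows in which TOM appears as a standalone word."""
    return [row for row in rows if PATTERN.search(row)]


if __name__ == "__main__":
    for row in select_rows(ROWS):
        print(row)
```

The rows that come back are the six the question asks for; TOMMY, SITTOM, TOMA and ATOM are filtered out.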

Related

Is there a wildcard search solution that can allow me to search for a given string but allow 2 characters to be wrong/missing/blank in Snowflake?

I'm very new to the concept of regular expressions and am looking for a wildcard search solution that allows 2 or fewer characters of the string to be wrong/missing/blank, in Snowflake.
For example, if I have a table's column of basketball players' names such as 'lebron james', 'carmelo anthony', 'kobe bryant', below are the results I would like to have matched from another table (consumers' search queries) for 'lebron james':
'lebrn james' (missing 'o')
'lebronjames' (missing a space between fn and ln)
'lebrn jme' (missing 'o' and 'a')
'lebron james' (exact match)
Would anyone be so kind to provide some guidance?
EDITDISTANCE is what you are asking for:
with input(str) as (
select * from values
('lebrn james'), ('lebronjames'), ('lebrn jme')
), targets(str) as (
select * from values
('lebron james'), ('carmelo anthony'), ('kobe bryant')
)
select i.str, t.str, editdistance(i.str, t.str)
from input i
cross join targets t;
gives:
STR | STR_2 | EDITDISTANCE(I.STR, T.STR)
lebrn james | lebron james | 1
lebrn james | carmelo anthony | 14
lebrn james | kobe bryant | 10
lebronjames | lebron james | 1
lebronjames | carmelo anthony | 13
lebronjames | kobe bryant | 10
lebrn jme | lebron james | 3
lebrn jme | carmelo anthony | 13
lebrn jme | kobe bryant | 9
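To make the number that EDITDISTANCE reports more concrete, here is a small Python sketch of the classic Levenshtein distance plus a threshold filter. It is an illustration only, not Snowflake's implementation; `edit_distance`, `close_matches` and `TARGETS` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


TARGETS = ["lebron james", "carmelo anthony", "kobe bryant"]


def close_matches(query, targets=TARGETS, max_distance=2):
    """Return the targets within max_distance edits of the query."""
    return [t for t in targets if edit_distance(query, t) <= max_distance]


if __name__ == "__main__":
    for query in ("lebrn james", "lebronjames", "lebrn jme"):
        print(query, close_matches(query))
```

Note that 'lebrn jme' is three edits away from 'lebron james' (it is missing 'o', 'a' and 's'), so a threshold of 2 excludes it; in the query above the equivalent filter would be a condition such as editdistance(i.str, t.str) <= 2, raised to 3 if that case should match.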

Inverse of this regex expression

I have a list:
50 - David Herd (1961-1968)
49 - Teddy Sheringham (1997-2001)
48 - George Wall (1906-1915)
47 - Stan Pearson (1935-1954)
46 - Harry Gregg (1957-1966)
45 - Paddy Crerand (1963-1971)
44 - Jaap Stam (1998-2001)
43 - Paul Ince (1989-1995)
42 - Dwight Yorke (1998-2002)
I want to select all characters EXCEPT the first and last name with the space in between in order to delete them and leave just the first name, space and last name.
So far I can select the first name, space and last name with:
([a-zA-Z]+\s[a-zA-Z]+)
But I am unsure of how to 'invert' this expression. Any pointers would be much appreciated.
If regex replacement is an option for you, you could try the following in regex mode:
Find: \d+ - (\w+(?: \w+)+) \(\d{4}-\d{4}\)
Replace: $1
Another option is to match the surrounding data and capture the first name, space and last name in a group.
In the replacement, use the capture group.
^.*?\b([a-zA-Z]+\s[a-zA-Z]+)\b.*
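Both find-and-replace patterns can be tried quickly in Python. This is a sketch under the assumption that the editor's regex flavour treats these patterns like Python's `re` module; the constant and function names are illustrative, and Python writes the replacement as `\1` where many editors use `$1`.

```python
import re

# First answer: match the whole "NN - Name (YYYY-YYYY)" entry, capture the name.
FULL_LINE = re.compile(r"\d+ - (\w+(?: \w+)+) \(\d{4}-\d{4}\)")

# Second answer: match everything, capture the first "word space word" run.
SURROUNDED = re.compile(r"^.*?\b([a-zA-Z]+\s[a-zA-Z]+)\b.*")


def keep_name(line, pattern=FULL_LINE):
    """Replace the matched text with capture group 1, i.e. the name."""
    return pattern.sub(r"\1", line)


if __name__ == "__main__":
    for line in ("50 - David Herd (1961-1968)",
                 "49 - Teddy Sheringham (1997-2001)"):
        print(keep_name(line), "|", keep_name(line, SURROUNDED))
```

The first pattern accepts names of two or more words, while the second captures exactly two words, so they can differ on longer names.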

How to remove the last 3 words from a string in PL/SQL? [duplicate]

I have strings like these:
Jack & Bauer Limited Company Bristol
Streetfood Limited Company München
Brouse with High Jack UnlimiteD Company London
What I want to have is just the company names like:
Jack & Bauer
Streetfood
Brouse with High Jack
So in every case, I have to delete the last 3 words, because the names can consist of many words.
I know I have to use regexp, but I don't know how.
While you can use regular expressions to do this, you don't have to. This task can be accomplished using a combination of INSTR and SUBSTR, by locating the third space counted from the end of the string and keeping everything before it:
SELECT SUBSTR(FIELD1, 1, INSTR(FIELD1, ' ', -1, 3)-1) AS NAME
FROM TABLE1
Best of luck.
Here is one method:
select regexp_replace(str, '( [^ ]+){3}$', '')
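The two approaches can be compared side by side in Python. This is a rough sketch, not PL/SQL: `rsplit` stands in for the INSTR/SUBSTR idea of cutting at the third space from the end, and `re.sub` mirrors the REGEXP_REPLACE pattern. Both function names are hypothetical, and returning the text unchanged when it has fewer than three spaces is an assumption of this sketch.

```python
import re


def drop_last_three_words_split(text):
    """Cut at the third space from the right (the INSTR/SUBSTR idea)."""
    parts = text.rsplit(" ", 3)
    # With fewer than three spaces there is nothing to cut; keep the text.
    return parts[0] if len(parts) == 4 else text


def drop_last_three_words_regex(text):
    """Remove three trailing ' word' groups (the REGEXP_REPLACE idea)."""
    return re.sub(r"( [^ ]+){3}$", "", text)


if __name__ == "__main__":
    samples = [
        "Jack & Bauer Limited Company Bristol",
        "Streetfood Limited Company München",
        "Brouse with High Jack UnlimiteD Company London",
    ]
    for s in samples:
        print(drop_last_three_words_split(s), "|", drop_last_three_words_regex(s))
```

Both versions assume the words are separated by single spaces, as in the sample strings.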

Full text search on multiple columns sql server

I have the following table
Id Author
1 Alexander Mccall Smith
2 Ernest Hemingway
3 Giacomo Leopardi
4 Henry David Thoreau
5 Mary Higgins Clark
6 Rabindranath Tagore
7 Thomas Pynchon
8 Zora Neale Hurston
9 William S. Burroughs
10 Virginia Woolf
11 William tell
I want to search for an author by typing the first few characters of the first and last name.
eg: Search Text: Will tel
Then the search result show the following result
William tell
eg: Search Text: will Burrou
Then the search result show the following result
William S. Burroughs
eg: Search Text: Will
Then the search result show the following result
William S. Burroughs
William tell
What is an efficient way to achieve this in SQL Server?
As you mentioned, this can be achieved using Full-Text Search. You have to create a full-text catalog and then a full-text index on the table and column(s). You stated 'Columns' in the title, but I only see one table column in your example, so I will write the queries using that.
-- example 1 searching on Will and Tel
SELECT Id, Author
FROM Authors
WHERE CONTAINS(Author, '"Will*" AND "tel*"')
-- example 2 searching on Will and Burrou
SELECT Id, Author
FROM Authors
WHERE CONTAINS(Author, '"will*" AND "Burrou*"')
-- example 3 searching on Will
SELECT Id, Author
FROM Authors
WHERE CONTAINS(Author, '"will*"')
For further reference, see the documentation for the CONTAINS predicate, which searches for precise or fuzzy matches, and the article Query with Full-Text Search.
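If the search text comes from user input, the prefix-term condition has to be assembled from it. Below is a minimal Python sketch of that step; `build_contains_condition` is a hypothetical helper, and stripping double quotes from the terms is an assumption made here to keep the generated condition well-formed. The resulting string is meant to be passed to CONTAINS as a bound parameter rather than concatenated into the SQL text.

```python
def build_contains_condition(search_text):
    """Turn 'Will tel' into '"Will*" AND "tel*"' (one prefix term per word)."""
    # Split on whitespace and drop double quotes so each term can be quoted safely.
    terms = [word.replace('"', "") for word in search_text.split()]
    terms = [term for term in terms if term]  # skip words that were only quotes
    return " AND ".join(f'"{term}*"' for term in terms)


if __name__ == "__main__":
    for text in ("Will tel", "will Burrou", "Will"):
        print(build_contains_condition(text))
```

An empty search text yields an empty string, which the caller should handle before running the query.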
Less efficient than @Igor's answer as the table size grows, but you can also use the LIKE operator.
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
-- example 1 searching on Will and Tel
SELECT Id, Author
FROM Authors
WHERE Author Like('Will%Tel%')
-- example 2 searching on Will and Burrou
SELECT Id, Author
FROM Authors
WHERE Author Like('Will%Burrou%')
-- example 3 searching on Will
SELECT Id, Author
FROM Authors
WHERE Author Like('Will%')
Cons: It is slower than the CONTAINS predicate. You need to include the % sign after every keyword you want to search for.
Pros: It can be faster than CONTAINS on small tables (fewer than about 1,000 rows).

Oracle 10G regexp for Name

I am trying to write a regexp_replace to create a "Friendly" name for some employees. They are currently stored as FIRST <POSSIBLE MIDDLE INITIAL> LAST <POSSIBLE SUFFIX> <MULTIPLE WHITESPACE> SITE_ID
For example,
JOHN SMITH ABC
JOHN Q SMITH ABC
JOHN Q SMITH III ABC
I am trying to write a regex so that I will end up with:
Smith, John
Smith, John Q
Smith III, John Q
The ABC "Site ID" doesn't need to be included in my output.
This is what I tried with little success:
regexp_replace(
employee_name,
'^(\S+)\s(\S+)\s(\S+)',
'\3, \1 \2'
)
Also, I am using Oracle 10G. Any help would be greatly appreciated!
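If your names don't show the problems ruakh points out, i.e., there are no single-letter first names or surnames and no Hispanic names, you can try this regexp. Each optional group (middle initial, suffix) carries its own leading whitespace, so the pieces can be joined directly in the replacement, and the site ID must be preceded by at least two whitespace characters:
^(\S+)(\s\S)?\s(\S+)(\s\S+)?\s\s+\S+$
The replacement should be:
\3\4, \1\2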
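A variant of this pattern that keeps the separating whitespace inside the optional groups, `^(\S+)(\s\S)?\s(\S+)(\s\S+)?\s\s+\S+$`, can be checked in Python as sketched below. The assumptions are that Python's `re` behaves like Oracle's REGEXP_REPLACE for this pattern and that the site ID is preceded by at least two whitespace characters; `friendly_name` is a hypothetical name, and the casing of the input is left unchanged.

```python
import re

# Groups: 1 = first name, 2 = optional " Q", 3 = last name, 4 = optional " III".
NAME_PATTERN = re.compile(r"^(\S+)(\s\S)?\s(\S+)(\s\S+)?\s\s+\S+$")


def friendly_name(employee_name):
    """Reorder 'FIRST [M] LAST [SUFFIX]   SITE' into 'LAST[ SUFFIX], FIRST[ M]'."""
    # Optional groups that did not take part in the match become empty strings.
    return NAME_PATTERN.sub(r"\3\4, \1\2", employee_name)


if __name__ == "__main__":
    for name in ("JOHN SMITH   ABC",
                 "JOHN Q SMITH   ABC",
                 "JOHN Q SMITH III   ABC"):
        print(friendly_name(name))
```

Names that do not fit the pattern are returned unchanged, which makes unexpected formats easy to spot in the output.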