how to find char in column value

I have two tables:
a table with all country codes, like KZ, US, RU
a table tranzactions with the terminal location, like
(Starbucks 1500 Broadway *Near Times Square US)
(CoffeBoom KZ Mendekulova district *Near Dostyk plaza)
I want to select
country code number, code str, location terminal name
like
398 | KZ | CoffeBoom KZ Mendekulova district *Near Dostyk plaza
840 | US | Starbucks 1500 Broadway *Near Times Square US
The match must not fire when the code characters only appear inside another word. For example, in 'Gucci Moscow Redkzsuzin district RU' the characters 'KZ' and 'UZ' occur inside 'Redkzsuzin', but I want to select only 'RU'.

You can try building a regular expression that incorporates the column code_str. The query below does that: it builds an expression that looks for the beginning of the string or a space, followed by the country code, followed by a space or the end of the string, and returns the matching rows. However, expect both false positives and false negatives, because you are searching free-form text. Any occurrence matching that pattern will be returned even if it is NOT actually a valid code, and valid codes can be missed as well. For example, it will not find the row:
982,'US', 'Starbucks 618 Miracle Mile, Chicago, IL, USA'
You may need to work out a better definition of what you are searching for.
with tranzactions (country_code_number , code_str , location_terminal_name) as
(select 398,'KZ', 'CoffeBoom KZ Mendekulova district *Near Dostyk plaza' from dual union all
select 840,'US', 'Starbucks 1500 Broadway *Near Times Square US' from dual union all
select 982,'US', 'Starbucks 618 Miracle Mile, Chicago, IL, USA' from dual
)
select * from tranzactions
where regexp_like(location_terminal_name, '(^| )' || code_str || '( |$)' );
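
For readers who want to try the matching rule outside the database, here is a minimal Python sketch of the same pattern (start-or-space, the code, space-or-end). The COUNTRY_CODES list and the match_codes name are made up for illustration, and the numeric values for RU and UZ are sample data not taken from the question.

```python
import re

# Hypothetical sample data: (numeric code, alpha code).
COUNTRY_CODES = [(398, "KZ"), (840, "US"), (643, "RU"), (860, "UZ")]


def match_codes(location, codes=COUNTRY_CODES):
    """Return (number, code, location) for each code found as a standalone token."""
    hits = []
    for number, code in codes:
        # Same shape as the SQL pattern: '(^| )' || code_str || '( |$)'
        pattern = r"(^| )" + re.escape(code) + r"( |$)"
        if re.search(pattern, location):
            hits.append((number, code, location))
    return hits


if __name__ == "__main__":
    for text in (
        "CoffeBoom KZ Mendekulova district *Near Dostyk plaza",
        "Gucci Moscow Redkzsuzin district RU",
        "Starbucks 618 Miracle Mile, Chicago, IL, USA",
    ):
        print(match_codes(text))
```

The last sample returns no hit because 'US' is followed by 'A', which is the false negative described above.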

Related

How to get the differences between two rows and the name of the field where the difference is, in BigQuery?

I have a table in BigQuery like this:
Name | Phone Number | Address
John | 123456778564 | 1 Penny Lane
John | 873452987424 | 1 Penny Lane
Mary | 845704562848 | 87 5th Avenue
Mary | 845704562848 | 54 Lincoln Rd.
Amy | 342847327234 | 4 Ocean Drive Avenue
Amy | 347907387469 | 98 Truman Rd.
I want to get a table with the differences between two consecutive rows and the name of the field where the difference occurs, like this:
Name | Field | Before | After
John | Phone Number | 123456778564 | 873452987424
Mary | Address | 87 5th Avenue | 54 Lincoln Rd.
Amy | Phone Number | 342847327234 | 347907387469
Amy | Address | 4 Ocean Drive Avenue | 98 Truman Rd.
How can I do this ? I've looked on other posts but couldn't find something that corresponds to my need.
Thank you

Consider below BigQuery'ish solution
select Name, ['Phone Number', 'Address'][offset(offset)] Field,
prev_field as Before, field as After
from (
select timestamp, Name, offset, field,
lag(field) over (partition by Name, offset order by timestamp) as prev_field
from yourtable,
unnest([`Phone Number`, Address]) field with offset
)
where prev_field != field
Applied to the sample data in your question, this returns the changed fields you listed. Note that it assumes the table has a timestamp column that defines the row order.
As you can see, no matter how many columns you need to compare, it is still just one query, with no unions and such.
You just need to enumerate your columns in two places:
['Phone Number', 'Address'][offset(offset)] Field
and
unnest([`Phone Number`, Address]) field with offset
Note: you can further refactor above using scripting's execute immediate to compose such lists within the query on the fly (check my other answers - I frequently use such technique in them)

One method is to just use lag() and union all:
select name, 'phone', prev_phone as before, phone as after
from (select name, phone,
lag(phone) over (partition by name order by timestamp) as prev_phone
from t
) t
where prev_phone <> phone
union all
select name, 'address', prev_address as before, address as after
from (select name, address,
lag(address) over (partition by name order by timestamp) as prev_address
from t
) t
where prev_address <> address
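
To see what the lag() comparison does row by row, here is a small Python sketch of the same idea: remember the previous row per name and report every field whose value changed. The timestamp values 1 and 2 are invented ordering keys (the sample table does not show one), and field_changes is a hypothetical helper name.

```python
# (timestamp, name, phone, address); the timestamp is an assumed ordering column.
ROWS = [
    (1, "John", "123456778564", "1 Penny Lane"),
    (2, "John", "873452987424", "1 Penny Lane"),
    (1, "Mary", "845704562848", "87 5th Avenue"),
    (2, "Mary", "845704562848", "54 Lincoln Rd."),
    (1, "Amy", "342847327234", "4 Ocean Drive Avenue"),
    (2, "Amy", "347907387469", "98 Truman Rd."),
]
FIELDS = ["Phone Number", "Address"]


def field_changes(rows, fields=FIELDS):
    """Return (name, field, before, after) for each value that differs from the previous row."""
    previous = {}  # name -> field values of the last row seen for that name
    changes = []
    # Sort by name, then timestamp, like "partition by name order by timestamp".
    for _ts, name, *values in sorted(rows, key=lambda r: (r[1], r[0])):
        if name in previous:
            for field, before, after in zip(fields, previous[name], values):
                if before != after:
                    changes.append((name, field, before, after))
        previous[name] = values
    return changes


if __name__ == "__main__":
    for change in field_changes(ROWS):
        print(change)
```

Adding another column to compare only means extending each row tuple and the FIELDS list, which mirrors the "enumerate your columns" point of the unnest-based query.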

Find string match to Oracle table using regex

I have an Oracle stored procedure on an Oracle 12c database that receives a company_name input. From that company_name, I need to find and flag Federal institutions. To accomplish that, I have a table (TBL_FED_KEY) with one column (KEY_1) of keywords. The table contains nearly 50 values like:
ARMY
FEDERAL
AIR FORCE
VETERANS
HOMELAND SECURITY
INDIAN HOSPITAL
WILL ROGERS
To give you an idea of the company_name string that could be passed through to the procedure, here are examples:
US Army - Munson Health Center
Federal Bureau of Prisons,BOP/DOJ-
Hickam Air Force Base Pharmacy
Minnesota Veterans Home Pharmacy
P.H.S. Indian Hospital
Will Rogers Health Center
What Oracle SQL can be used to match the incoming company_name against TBL_FED_KEY.KEY_1? I've tried multiple variations of REGEXP_INSTR but I can't seem to get anything to work 100%. Is REGEXP_INSTR even the best tool to accomplish this?
Thanks!

You could just use like (company_name is the procedure's input):
select f.*
from TBL_FED_KEY f
where lower(company_name) like '%' || lower(f.KEY_1) || '%'

It seems you want a case-insensitive match between those strings. So, use the REGEXP_LIKE() function with the case-insensitive ('i') option:
SELECT *
FROM TBL_FED_KEY
WHERE REGEXP_LIKE(company_name,key_1,'i')

I am not sure what the procedure is supposed to do after it "flags" the company as federal vs. not. I would instead write it as a function as shown below (but you can easily reuse most of the code in a procedure, if needed).
Then I illustrate how the function can be used directly in SQL. You can also use it in PL/SQL if needed, but in most cases you don't. Note - the same idea can be implemented exclusively in SQL, resulting in faster execution, since you don't need PL/SQL at all. Important - even in plain SQL, this should be implemented via a semi join, as I demonstrated, for faster execution.
Setup:
create table tbl_fed_key (key_1 varchar2(200));
insert into tbl_fed_key
select 'ARMY' from dual union all
select 'FEDERAL' from dual union all
select 'AIR FORCE' from dual union all
select 'VETERANS' from dual union all
select 'HOMELAND SECURITY' from dual union all
select 'INDIAN HOSPITAL' from dual union all
select 'WILL ROGERS' from dual
;
commit;
Function code:
create or replace function is_federal_institution(company_name varchar2)
return varchar
deterministic
as
is_fed varchar2(1);
begin
select case when exists ( select key_1
from tbl_fed_key
where instr(upper(company_name), upper(key_1)) > 0
)
then 'Y' else 'N' end
into is_fed
from dual;
return is_fed;
end;
/
SQL test:
with
inputs (str) as (
select 'Joe and Bob Army Supply Store' from dual union all
select 'Mary Poppins Indian Hospital' from dual union all
select 'Bridge Association of NYC' from dual union all
select 'Will Rogers Garden' from dual union all
select 'First Federal Bank NA' from dual
)
select str, is_federal_institution(str) as is_federal
from inputs
;
STR                            IS_FEDERAL
------------------------------ ----------
Joe and Bob Army Supply Store  Y
Mary Poppins Indian Hospital   Y
Bridge Association of NYC      N
Will Rogers Garden             Y
First Federal Bank NA          Y
As you can see, I threw in a few false positives - to illustrate the important fact that this "technological" solution is only partial. A human will still need to review the individual hits, if accuracy is important.
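
The same keyword check can be sketched in Python to experiment with the keyword list before loading it into the table. This is only an illustration: the FED_KEYWORDS list copies the sample keywords from the question, and the function reuses the name from the PL/SQL version purely for readability.

```python
# Sample keywords from the question; the real table holds nearly 50 values.
FED_KEYWORDS = [
    "ARMY",
    "FEDERAL",
    "AIR FORCE",
    "VETERANS",
    "HOMELAND SECURITY",
    "INDIAN HOSPITAL",
    "WILL ROGERS",
]


def is_federal_institution(company_name, keywords=FED_KEYWORDS):
    """Return 'Y' if any keyword occurs in the name (case-insensitive), else 'N'."""
    name = company_name.upper()
    # Equivalent of: instr(upper(company_name), upper(key_1)) > 0
    return "Y" if any(key.upper() in name for key in keywords) else "N"


if __name__ == "__main__":
    for name in (
        "US Army - Munson Health Center",
        "Bridge Association of NYC",
        "First Federal Bank NA",
    ):
        print(name, is_federal_institution(name))
```

As with the SQL version, 'First Federal Bank NA' is flagged, so the hits still need a human review.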

REGEXP end with "letter" that might have spaces after that letter

I have a REGEXP expression that needs to accept a name that begins with a specific letter, has anything in between, and ends with a specific letter. The value comes from the database and may have spaces after the ending letter.
When I run my expression it doesn't match on the ending letter, because the name I am searching for has trailing spaces in the database:
WHERE REGEXP_LIKE (cname, UPPER('^[&p_name_beginning](.*?)[&p_name_ending$]'));
Output:
JIE DONG has bought 2 car(s) and has spent $151200
JAMES BARREDO has bought 1 car(s) and has spent $300145
JUAN MENDIOLA has bought 1 car(s) and has spent $75610.89
JASON HADDAD has bought 1 car(s) and has spent $157000
JOSE ANDRADE has bought 1 car(s) and has spent $151046
JORDAN PENNEY has bought 1 car(s) and has spent $85201.92
JUAN RODAS has bought 1 car(s) and has spent $105000

You will get better help if you specify what you are trying to do, with sample before and after data, and show what you have tried. I suspect you are trying to select a row where the first and last letters of the name match parameters you have been given. If you update the tag to show what database you are using, you will get a more targeted answer, but here's an Oracle solution that returns the 3rd record, which may help if my assumption is correct. It will give you a hint at any rate.
with tbl(str) as (
select 'JIE DONG has bought 2 car(s) and has spent $151200' from dual union all
select 'JAMES BARREDO has bought 1 car(s) and has spent $300145' from dual union all
select 'JUAN MENDIOLA has bought 1 car(s) and has spent $75610.89' from dual union all
select 'JASON HADDAD has bought 1 car(s) and has spent $157000' from dual union all
select 'JOSE ANDRADE has bought 1 car(s) and has spent $151046' from dual union all
select 'JORDAN PENNEY has bought 1 car(s) and has spent $85201.92' from dual union all
select 'JUAN RODAS has bought 1 car(s) and has spent $105000' from dual
)
select str
from tbl
where regexp_like(str, '^j\S+ \S+a .*$', 'i');
The regex reads as follows:
^ Anchor to the start of the line
j Match a 'j' (first letter of name)
\S+ Followed by one or more characters that are not spaces
<space> Then a space character
\S+ Followed by one or more characters that are not spaces
a Then the ending letter of the name
<space> Followed by a space character
.* Followed by zero or more of any characters
$ The end of the line
The 'i' means case-insensitive.
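
Here is a rough Python version of the same regex with the two letters passed in as parameters, in case it helps to test the pattern quickly. The function names are invented for this sketch, and the second function is only a guess at the original need (a name column with trailing spaces); it is not taken from the answer.

```python
import re

ROWS = [
    "JIE DONG has bought 2 car(s) and has spent $151200",
    "JAMES BARREDO has bought 1 car(s) and has spent $300145",
    "JUAN MENDIOLA has bought 1 car(s) and has spent $75610.89",
    "JASON HADDAD has bought 1 car(s) and has spent $157000",
    "JOSE ANDRADE has bought 1 car(s) and has spent $151046",
    "JORDAN PENNEY has bought 1 car(s) and has spent $85201.92",
    "JUAN RODAS has bought 1 car(s) and has spent $105000",
]


def rows_matching(rows, first, last):
    """Rows whose first word starts with `first` and whose second word ends with `last`."""
    pattern = re.compile(
        r"^" + re.escape(first) + r"\S+ \S+" + re.escape(last) + r" .*$",
        re.IGNORECASE,
    )
    return [row for row in rows if pattern.search(row)]


def name_matches(cname, first, last):
    """Assumed variant for a bare name column: allow trailing spaces after the last letter."""
    pattern = r"^" + re.escape(first) + r".*" + re.escape(last) + r"\s*$"
    return re.search(pattern, cname, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(rows_matching(ROWS, "j", "a"))
    print(name_matches("JUAN MENDIOLA   ", "j", "a"))
```

The key point for the trailing-space case is that the end anchor goes outside the character match, after an optional run of whitespace.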

regexp_replace string

I'm using regexp_replace to standardize mailing addresses and I've encountered a situation I'm having trouble with.
Consider the following two addresses and what their result should be:
115 1/2 East 6th St -> 115 1/2 E 6th St
818 East St -> 818 East St
In the second address, "East" is the actual name of the street, not a directional indicator.
For my query, I've attempted
SELECT
regexp_replace(address, 'East[^ St]', 'E ')
but this fails to convert the first address to its proper format.
How can I write my regexp_replace such that the word East is converted to an 'E' in the first address, but leaves the word intact in the second address?

Your current pattern matches the literal text East followed by any single character that isn't space, S, or t. I'm assuming you probably meant to use a negative lookahead to make sure that "East" doesn't come before " St", but sadly Oracle doesn't support negative lookaheads. Instead, you'll need to make the REGEXP_REPLACE conditional:
CASE
WHEN address LIKE '%East%' AND address NOT LIKE '%East St%'
THEN REGEXP_REPLACE(address, your_pattern, your_replacement)
ELSE address
END

This answers your question with REGEXP_REPLACE(). It looks for the string ' East' (the leading space avoids catching the case where 'East' is the end of another word) followed by a space, one or more characters, another space and the string 'St', which is remembered in a group. If found, it is replaced with ' E' followed by the second remembered group (the space, followed by the one or more characters, followed by the space and 'St'). This is needed because those characters are 'consumed' by the regex engine as it moves left to right analyzing the string, so you need to put them back. Note I added a bunch of different test formats (always test for the unexpected too!):
SQL> with tbl(address) as (
select '115 1/2 East 6th St' from dual union
select '115 1/2 NorthEast 6th St' from dual union
select '115 1/2 East 146th St' from dual union
select '115 1/2 East North 1st St' from dual union
select '818 East Ave' from dual union
select '818 Woodward' from dual union
select '818 East St' from dual
)
select regexp_replace(address, '( East)( .+ St)', ' E\2') new_addr
from tbl;
NEW_ADDR
------------------------------------------------------------------------
115 1/2 E 146th St
115 1/2 E 6th St
115 1/2 E North 1st St
115 1/2 NorthEast 6th St
818 East Ave
818 East St
818 Woodward
7 rows selected.
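
The group-and-put-back trick is not specific to Oracle. As a quick sanity check, here is the same pattern and replacement in Python, where abbreviate_east is just an illustrative name.

```python
import re


def abbreviate_east(address):
    """Replace ' East' with ' E' only when it is followed by ' <something> St'."""
    # Group 2 (' <something> St') is consumed by the match, so it is written back.
    return re.sub(r"( East)( .+ St)", r" E\2", address)


if __name__ == "__main__":
    for addr in (
        "115 1/2 East 6th St",
        "115 1/2 NorthEast 6th St",
        "818 East St",
        "818 East Ave",
    ):
        print(abbreviate_east(addr))
```

'818 East St' is left alone because ' .+ St' needs at least one character between the two spaces, so 'East' directly followed by ' St' never matches.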

Fuzzy text searching in Oracle

I have a large Oracle DB table which contains street names for a whole country, which has 600000+ rows. In my application, I take an address string as input and want to check whether specific substrings of this address string matches one or many of the street names in the table, such that I can label that address substring as the name of a street.
Clearly, this should be a fuzzy text matching problem, there is only a small chance that the substring I query has an exact match with the street names in DB table. So there should be some kind of fuzzy text matching approach. I am trying to read the Oracle documentation at http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm in which CONTAINS and CATSEARCH search operators are explained. But these seem to be used for more complex tasks like searching a match for the given string in documents. I just want to do that for a column of a table.
What do you suggest me in this case, does Oracle have support for such kind of fuzzy text matching queries?

UTL_MATCH contains methods for matching strings and comparing their similarity. The edit distance, also known as the Levenshtein Distance, might be a good place to start. Since one string is a substring it may help to compare the edit distance relative to the size of the strings.
--Addresses that are most similar to each substring.
select substring, address, edit_ratio
from
(
--Rank edit ratios.
select substring, address, edit_ratio
,dense_rank() over (partition by substring order by edit_ratio desc) edit_ratio_rank
from
(
--Calculate edit ratio - edit distance relative to string sizes.
select
substring,
address,
(length(address) - UTL_MATCH.EDIT_DISTANCE(substring, address))/length(substring) edit_ratio
from
(
--Fake addresses (from http://names.igopaygo.com/street/north_american_address)
select '526 Burning Hill Big Beaver District of Columbia 20041' address from dual union all
select '5206 Hidden Rise Whitebead Michigan 48426' address from dual union all
select '2714 Noble Drive Milk River Michigan 48770' address from dual union all
select '8325 Grand Wagon Private Sleeping Buffalo Arkansas 72265' address from dual union all
select '968 Iron Corner Wacker Arkansas 72793' address from dual
) addresses
cross join
(
--Address substrings.
select 'Michigan' substring from dual union all
select 'Not-So-Hidden Rise' substring from dual union all
select '123 Fake Street' substring from dual
)
order by substring, edit_ratio desc
)
)
where edit_ratio_rank = 1
order by substring, address;
These results are not great but hopefully this is at least a good starting point. It should work with any language. But you'll still probably want to combine this with some language- or locale- specific comparison rules.
SUBSTRING          ADDRESS                                                EDIT_RATIO
------------------ ------------------------------------------------------ ----------
123 Fake Street    526 Burning Hill Big Beaver District of Columbia 20041 0.5333
Michigan           2714 Noble Drive Milk River Michigan 48770             1
Michigan           5206 Hidden Rise Whitebead Michigan 48426              1
Not-So-Hidden Rise 5206 Hidden Rise Whitebead Michigan 48426              0.5
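
To make the edit ratio less of a black box, here is a plain Python sketch of the Levenshtein distance and of the ratio formula used in the query. It is a textbook dynamic-programming version written for illustration, not Oracle's UTL_MATCH implementation, so treat it as an approximation of what the package computes.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep when equal)
                )
            )
        previous = current
    return previous[-1]


def edit_ratio(substring, address):
    """Same formula as the query: (length(address) - edit distance) / length(substring)."""
    return (len(address) - edit_distance(substring, address)) / len(substring)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_ratio("Michigan", "5206 Hidden Rise Whitebead Michigan 48426"))
```

When the substring occurs verbatim inside the address, the distance equals the length difference, so the ratio comes out as exactly 1, which is why both Michigan addresses tie in the results.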

You could make use of the SOUNDEX function available in Oracle databases. SOUNDEX computes a short phonetic code for a text string (a letter followed by three digits). This can be used to find strings which sound similar and thus reduce the number of string comparisons.
Edited:
If SOUNDEX is not suitable for your local language, you can ask Google for a phonetic signature or phonetic matching function which performs better. This function has to be evaluated once per new table entry and once for every query. Therefore, it does not need to reside in Oracle.
Example: A Turkish SOUNDEX is promoted here.
To increase the matching quality, the street name spelling should be unified in a first step. This could be done by applying a set of rules:
Simplified example rules:
Convert all characters to lowercase
Remove "str." at the end of a name
Remove "drv." at the end of a name
Remove "place" at the end of a name
Remove "ave." at the end of a name
Sort names with multiple words alphabetically
Drop auxiliary words like "of", "and", "the", ...
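
The simplified rules above could be sketched like this in Python. The suffix and auxiliary-word lists only contain the examples from the list, and normalize_street is a hypothetical helper name; a real rule set would depend on the language and country.

```python
# Example rule data taken from the list above; extend for a real language or locale.
SUFFIXES = {"str.", "drv.", "place", "ave."}
AUX_WORDS = {"of", "and", "the"}


def normalize_street(name):
    """Apply the simplified rules: lowercase, drop suffix, drop auxiliary words, sort words."""
    words = name.lower().split()
    # Remove a known suffix only when it is the last word of the name.
    if words and words[-1] in SUFFIXES:
        words = words[:-1]
    words = [word for word in words if word not in AUX_WORDS]
    return " ".join(sorted(words))


if __name__ == "__main__":
    print(normalize_street("Avenue of the Americas"))
    print(normalize_street("Americas Avenue"))
    print(normalize_street("The Strand Ave."))
```

Storing the normalized form next to the original street name means the rules run once per table row and once per query string, as described for the phonetic signature.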
