Find the accent data in table records - sql

In a table, I have a column that contains a few records with accented characters. I want a query to find the records with accented characters.
If we have records like the ones below:
2ème édition
Natália
sravanth
the query should pick these records:
2ème édition
Natália

You can use the REGEXP_LIKE function along with a list of all the accented characters you're interested in:
with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where regexp_like(data,'[àèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ]');
DATA
--------------
2ème édition
Natália

The ASCIISTR function would be another way to find accented characters:
ASCIISTR takes as its argument a string, or an expression that
resolves to a string, in any character set and returns an ASCII
version of the string in the database character set. Non-ASCII
characters are converted to the form \xxxx, where xxxx represents a
UTF-16 code unit.
So you can do something like
SELECT my_field FROM my_table
WHERE NOT my_field = ASCIISTR(my_field)
Or to re-use the demo from the accepted answer:
with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where data != asciistr(data)
which would output the 2 rows with accents.
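To see why comparing a value with its ASCIISTR form flags accented rows, here is a small Python sketch that imitates the behaviour quoted above. The function name asciistr_like is hypothetical, it only models the "non-ASCII becomes \xxxx" rule from the quoted documentation, and it is not Oracle's implementation.

```python
def asciistr_like(text):
    """Rough imitation of the quoted ASCIISTR rule (an assumption, not Oracle code):
    ASCII characters pass through, anything else becomes \\XXXX per UTF-16 code unit."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


rows = ["2ème édition", "Natália", "sravanth"]
# A row is flagged when the converted form differs from the original.
flagged = [row for row in rows if row != asciistr_like(row)]
print(flagged)
```

Only strings that contain at least one non-ASCII character change under the conversion, which is what the != comparison relies on.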

with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where REGEXP_like(ASCIISTR(data), '\\[[:xdigit:]]{4}');
DATA
--------------
2ème édition
Natália

This is way harder than it seems on the surface, because there is more than one way to create an accent. What I do is keep a mirror column, which I call clean, and scrub out all the accents on load.
See this question I asked some time ago: normalized string
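One reading of "more than one way to create an accent" is Unicode's composed versus decomposed forms; that interpretation is an assumption here, not something the answer spells out. The Python sketch below (helper names strip_accents and has_accent are hypothetical) shows the two encodings of é and one way a "clean" value could be derived on load.

```python
import unicodedata

composed = "\u00e9"     # 'é' stored as a single precomposed code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT


def strip_accents(text):
    """Decompose to NFD, then drop the combining marks."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in nfd if not unicodedata.combining(ch))


def has_accent(text):
    """True if the text contains any combining mark once decomposed."""
    nfd = unicodedata.normalize("NFD", text)
    return any(unicodedata.combining(ch) for ch in nfd)


print(composed == decomposed)                                # the raw strings differ
print(unicodedata.normalize("NFC", decomposed) == composed)  # equal after normalizing
print(strip_accents("2ème édition"), strip_accents("Natália"))
```

Characters such as ø, ß, or æ have no decomposition, so this approach does not treat them as accented, unlike an explicit character list.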

Related

Get remaining of string (right) after x number of specific character - Snowflake

I am trying to get the remaining string (from right) after x number of a specific character... ex:
D-ERT-ESTTE
D-EST-AER-EJEL
D-E-AD
I would like to get all string data after the second '-'
Results Expected:
ESTTE
AER-EJEL
AD
I have tried modifying
substring(SKU,1,regexp_instr(SKU,'-',1,2)-1)
but this only gives me everything to the left of the second '-'. I need everything to the right of it.
Update: Looks like maybe the below works:
substr(SKU,regexp_instr(SKU,'-',1,2)+1)
Try this:
select fld1, SPLIT_PART(fld1,'-',3), substr(fld1,regexp_instr(fld1,'-',1,2)+1), regexp_instr(fld1,'-',1,2) from (
select 'D-ERT-ESTTE' fld1 from dual union all
select 'D-EST-AER-EJEL' from dual union all
select 'D-E-ADF' from dual );
I like @hkandpal's solution, which first finds the index of the second '-' and then takes the substring after it.
As a regex-only alternative, this extracts the first group matched after two '-' characters have been seen. The regex is [^-]*-[^-]*-(.*):
select fld1, regexp_substr(fld1, '[^-]*-[^-]*-(.*)', 1, 1, 'c', 1)
from (
select 'D-ERT-ESTTE' fld1 union all
select 'D-EST-AER-EJEL' union all
select 'D-E-ADF'
);
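The same pattern can be tried outside the database. The Python sketch below applies [^-]*-[^-]*-(.*) to the sample values from the question and compares it with taking only the third '-'-separated part; third_part is a rough, hypothetical stand-in for a SPLIT_PART(..., 3) style call, not Snowflake's implementation.

```python
import re

PATTERN = re.compile(r"[^-]*-[^-]*-(.*)")


def after_second_dash(sku):
    """Return everything after the second '-', or None if there are fewer than two."""
    match = PATTERN.match(sku)
    return match.group(1) if match else None


def third_part(sku):
    """Return only the third '-'-separated piece (empty string if missing)."""
    parts = sku.split("-")
    return parts[2] if len(parts) > 2 else ""


for sku in ["D-ERT-ESTTE", "D-EST-AER-EJEL", "D-E-AD"]:
    print(sku, after_second_dash(sku), third_part(sku))
```

The two approaches agree until the remainder itself contains a '-': for D-EST-AER-EJEL the capture group keeps AER-EJEL, while the third part alone is AER.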

retrieve a specific data from a table after a symbol in oracle

Table DATA
----------------------------
Name
ABC:000
DEF:0
ABD:000
FFF:00
GGG:000
I need only those names which contain exactly 3 characters after the colon.
If the column is stored as a fixed-length char() and the values vary in length, the trailing blank padding becomes part of the value being matched, so use trim():
where trim(name) like '%:___'
with
table_name ( name ) as (
select 'ABC:000' from dual union all
select 'DEF:0' from dual union all
select 'ABD:000' from dual union all
select 'FFF:00' from dual union all
select 'GGG:000' from dual
)
-- End of SIMULATED inputs (not part of the SQL query).
-- Solution begins BELOW THIS LINE. Use your actual table and column names.
select name
from table_name
where name like '%:___'
;
NAME
-------
ABC:000
ABD:000
GGG:000
Explanation: like is a comparison operator for strings. % stands for any sequence of characters, of any length (including length zero). : stands for itself. Underscore stands for exactly one character - ANY character. The comparison string is one % sign, one colon, and three underscores.
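The pattern '%:___' corresponds to the regular expression .*:.{3} matched against the whole string. The Python sketch below uses that translation (the helper name like_colon_three is hypothetical) to check the sample names, and to show what blank padding from a fixed-length column would do to the match.

```python
import re


def like_colon_three(name):
    """Emulate name LIKE '%:___': anything, a colon, then exactly three characters."""
    return re.fullmatch(r".*:.{3}", name, flags=re.DOTALL) is not None


names = ["ABC:000", "DEF:0", "ABD:000", "FFF:00", "GGG:000"]
print([n for n in names if like_colon_three(n)])

# Blank-padded values, as a fixed-length column padded to width 7 would hold them.
padded = "DEF:0  "
print(like_colon_three(padded))          # the padding counts as characters
print(like_colon_three(padded.strip()))  # trimmed first, the value no longer matches
```

With padding, DEF:0 becomes a false positive because its two trailing blanks fill the remaining underscores, which is the case trim() guards against.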

Using REGEXP_SUBSTR with Strings Qualifier

Getting Examples from similar Stack Overflow threads,
Remove all characters after a specific character in PL/SQL
and
How to Select a substring in Oracle SQL up to a specific character?
I would want to retrieve only the first characters before the occurrence of a string.
Example:
STRING_EXAMPLE
TREE_OF_APPLES
The resulting data set should show only STRING_EXAM and TREE_OF_AP, because PLE is my delimiter.
Whenever I use the REGEXP_SUBSTR below, it returns only STRING_, because REGEXP_SUBSTR treats PLE as separate expressions (P, L and E), not as a single expression (PLE).
SELECT REGEXP_SUBSTR('STRING_EXAMPLE','[^PLE]+',1,1) from dual;
How can I do this without using numerous INSTRs and SUBSTRs?
Thank you.
The problem with your query is that [^PLE] matches any single character other than P, L, or E. You are looking for the consecutive sequence PLE. So, use
select REGEXP_SUBSTR(colname,'(.+)PLE',1,1,null,1)
from tablename
This returns the substring up to the last occurrence of PLE in the string.
If the string contains multiple instances of PLE and only the substring up to the first occurrence needs to be extracted, use
select REGEXP_SUBSTR(colname,'(.+?)PLE',1,1,null,1)
from tablename
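The difference between the character class, the greedy group, and the lazy group can be checked quickly in Python, whose re module treats these three patterns the same way for this purpose. The value SAMPLE_APPLES is an invented example (not from the question) chosen because it contains PLE twice.

```python
import re


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# [^PLE]+ is a character class: it stops at the first P, L, or E.
print(re.search(r"[^PLE]+", "STRING_EXAMPLE").group(0))

# Greedy (.+) runs up to the LAST occurrence of PLE.
print(first_group(r"(.+)PLE", "SAMPLE_APPLES"))

# Lazy (.+?) stops at the FIRST occurrence of PLE.
print(first_group(r"(.+?)PLE", "SAMPLE_APPLES"))
```

For the two strings in the question, which contain PLE only once, the greedy and lazy forms give the same result.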
Why use regular expressions for this?
select substr(colname, 1, instr(colname, 'PLE')-1) from...
would be more efficient.
with
inputs( colname ) as (
select 'FIRST_EXAMPLE' from dual union all
select 'IMPLEMENTATION' from dual union all
select 'PARIS' from dual union all
select 'PLEONASM' from dual
)
select colname, substr(colname, 1, instr(colname, 'PLE')-1) as result
from inputs
;
COLNAME        RESULT
-------------- ----------
FIRST_EXAMPLE  FIRST_EXAM
IMPLEMENTATION IM
PARIS
PLEONASM

REPEAT function equivalent in Oracle

I would like to know how to achieve the same functionality as REPEAT() in SQL*Plus. For example consider this problem: display the character '*' as many times as the value specified by an integer attribute specified for each entry in a given table.
Nitpicking: SQL*Plus doesn't have any feature for that. The database server (Oracle) provides the ability to execute SQL and has such a function:
You are looking for rpad()
select rpad('*', 10, '*')
from dual;
will output
**********
More details can be found in the manual: https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions159.htm#SQLRF06103
For single characters, the accepted answer works fine.
However, if the string to repeat has multiple characters, you need to use RPAD along with the LENGTH function, like this:
WITH t (str) AS
(
SELECT 'a'
FROM DUAL
UNION ALL SELECT 'abc'
FROM DUAL
UNION ALL SELECT '123'
FROM DUAL
UNION ALL SELECT '#+-'
FROM DUAL
)
SELECT RPAD(str, 5*LENGTH(str), str) repeated_5_times
FROM t;
Output:
REPEATED_5_TIMES
---------------
aaaaa
abcabcabcabcabc
123123123123123
#+-#+-#+-#+-#+-
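Why RPAD(str, 5*LENGTH(str), str) behaves like a repeat can be seen from a small model of right-padding. The Python function rpad below is a hypothetical imitation written for illustration, assuming the pad string is cycled until the target length is reached; it does not reproduce Oracle's handling of NULLs, zero lengths, or display width.

```python
def rpad(text, length, pad=" "):
    """Pad text on the right with pad (cycled) up to length; truncate if already longer."""
    if length <= len(text):
        return text[:length]
    needed = length - len(text)
    repeats = -(-needed // len(pad))  # ceiling division
    return text + (pad * repeats)[:needed]


def repeat_via_rpad(text, times):
    """Repeat text by padding it with itself to times * len(text) characters."""
    return rpad(text, times * len(text), text)


print(rpad("*", 10, "*"))
for value in ["a", "abc", "123", "#+-"]:
    print(repeat_via_rpad(value, 5))
```

Because the target length is an exact multiple of the string's length, the padding always ends on a whole copy of the string.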