PLSQL - order by string with REGEX - sql

I'm trying to sort the result set of a query where the column is VARCHAR2.
I've tried using just:
ORDER BY
UPPER(SERVER_NAME) ASC
But I get inconsistent results, for example:
120157
777555
AKO
a20064
Elilikes
kagan
1200165_DAVID
As you can see, 1200165_DAVID appears last. I also tried using a regular expression, like so:
ORDER BY
(CASE WHEN REGEXP_LIKE(UPPER(SERVER_NAME), '^[0-9]+$') THEN 1 ELSE 2 END) ASC,
UPPER(SERVER_NAME) ASC
But I get the same results. I would like to get the following ordering, if possible:
120157
1200165_DAVID
777555
a20064
AKO
Elilikes
kagan
Please advise.

Three things.
First: Why do you want 1200165_DAVID to appear AFTER 120157? It should appear before it, if you order alphabetically.
Second: Running your query on your test data, I get the correct result. So I am inclined to believe either your query is different from what you reported, or there is some other error somewhere.
Third: You may have who-knows-what characters in your data. Select str and dump(str) side by side (or whatever the name of your expression is; I like to use str in my test data) to see what characters are in each string. Look especially at those that seem to be sorted "out of order".
with
inputs ( str ) as (
select '120157' from dual union all
select '777555' from dual union all
select 'AKO' from dual union all
select 'a20064' from dual union all
select 'Elilikes' from dual union all
select 'kagan' from dual union all
select '1200165_DAVID' from dual
)
select str from inputs
order by upper(str);
STR
-------------
1200165_DAVID
120157
777555
a20064
AKO
Elilikes
kagan
7 rows selected.
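As a rough cross-check outside the database, the same comparison can be modelled in Python, whose default string ordering is by code point, much like a binary sort on ASCII data. This is only a sketch: the zero-width space in the second list is a hypothetical example of an invisible character, not something known to be in the asker's data, and code_points is an illustrative stand-in for DUMP().

```python
# Model of ORDER BY UPPER(str) under a binary (code point) sort.
names = ["120157", "777555", "AKO", "a20064", "Elilikes", "kagan", "1200165_DAVID"]

clean_order = sorted(names, key=str.upper)
print(clean_order)
# ['1200165_DAVID', '120157', '777555', 'a20064', 'AKO', 'Elilikes', 'kagan']

# Hypothetical dirty data: an invisible zero-width space (U+200B) in front.
dirty = names[:-1] + ["\u200b1200165_DAVID"]
dirty_order = sorted(dirty, key=str.upper)


def code_points(s):
    """Rough analogue of DUMP(): list the code point of every character."""
    return [hex(ord(ch)) for ch in s]


# The row with the hidden character now sorts last, although it looks identical.
print(code_points(dirty_order[-1])[:2])  # ['0x200b', '0x31']
```

With clean data the value sorts first; a single unseen leading character is enough to push it to the end.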

This is too long for a comment.
Your data would appear to not be all characters that you recognize. In particular, the first character is suspicious.
I would suggest that you run a query like this:
select ASCII(SUBSTR(server_name, 1, 1)) as first_char_ascii,
'|' || SUBSTR(server_name, 1, 1) || '|' as first_char,
COUNT(*), min(server_name), max(server_name)
from t
group by SUBSTR(server_name, 1, 1)
order by count(*) asc;
Then you will see what characters are actually at the beginning of the string. My guess is you will find at least one interesting character. You will then need to modify the data (or the query) to handle that.
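The same grouping idea can be sketched in Python if the values are already on the client side. The sample list is hypothetical (its first entry starts with a no-break space, U+00A0), and ord() returns the Unicode code point, which may differ from what ASCII() reports for non-ASCII characters depending on the database character set.

```python
from collections import Counter

# Hypothetical sample; the first value starts with a no-break space (U+00A0).
server_names = ["\u00a01200165_DAVID", "120157", "121212", "777555", "AKO", "a20064"]

# Count how many values start with each character.
first_chars = Counter(name[0] for name in server_names)

# Rarest first characters first, like ORDER BY COUNT(*) ASC.
report = sorted(
    ((ord(ch), "|" + ch + "|", n) for ch, n in first_chars.items()),
    key=lambda row: row[2],
)

for code, shown, count in report:
    print(code, shown, count)
```

Rare first characters with unexpected codes (here 160) are the ones worth inspecting.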

Related

How to print the sequence to nth length? [duplicate]

I would like to know how to achieve the same functionality as REPEAT() in SQL*Plus. For example consider this problem: display the character '*' as many times as the value specified by an integer attribute specified for each entry in a given table.
Nitpicking: SQL*Plus doesn't have any feature for that. The database server (Oracle) provides the ability to execute SQL and has such a function:
You are looking for rpad()
select rpad('*', 10, '*')
from dual;
will output
**********
More details can be found in the manual: https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions159.htm#SQLRF06103
For single characters, the accepted answer works fine.
However, if you have multiple characters in a given string, you need to use RPAD along with the LENGTH function, like this:
WITH t (str) AS
(
SELECT 'a'
FROM DUAL
UNION ALL SELECT 'abc'
FROM DUAL
UNION ALL SELECT '123'
FROM DUAL
UNION ALL SELECT '#+-'
FROM DUAL
)
SELECT RPAD(str, 5*LENGTH(str), str) repeated_5_times
FROM t;
Output:
REPEATED_5_TIMES
---------------
aaaaa
abcabcabcabcabc
123123123123123
#+-#+-#+-#+-#+-
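To see why multiplying by LENGTH(str) is needed, here is a small Python model of the padding logic. The function names are illustrative only, and edge cases such as NULL inputs or non-positive lengths are deliberately not modelled.

```python
def rpad(text, length, pad=" "):
    """Sketch of RPAD: append repetitions of pad, then cut to `length` characters.

    Assumes length >= 0 and a non-empty pad.
    """
    return (text + pad * length)[:length]


def repeat(text, times):
    """REPEAT-like helper using the RPAD(str, n * LENGTH(str), str) trick."""
    return rpad(text, times * len(text), text)


print(rpad("*", 10, "*"))   # **********
print(repeat("abc", 5))     # abcabcabcabcabc
print(rpad("abc", 5, "abc"))  # abcab  (target length too short for 5 repeats)
```

The last call shows what happens without the multiplication: the target length counts characters, not repetitions.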

Get remaining of string (right) after x number of specific character - Snowflake

I am trying to get the remaining string (from right) after x number of a specific character... ex:
D-ERT-ESTTE
D-EST-AER-EJEL
D-E-AD
I would like to get all string data after the second '-'
Results Expected:
ESTTE
AER-EJEL
AD
I have tried modifying substring(SKU,1,regexp_instr(SKU,'-',1,2)-1)
, however this is only giving me everything to the left of the second '-'. I need everything to the right of it, though.
Update: Looks like maybe the below works:
substr(SKU,regexp_instr(SKU,'-',1,2)+1)
Try this:
select fld1, SPLIT_PART(fld1,'-',3), substr(fld1,regexp_instr(fld1,'-',1,2)+1), regexp_instr(fld1,'-',1,2) from (
select 'D-ERT-ESTTE' fld1 from dual union all
select 'D-EST-AER-EJEL' from dual union all
select 'D-E-ADF' from dual );
I like @hkandpal's solution, which first finds the index of the second '-' and then takes the substring after it.
Here is a regex-only alternative that extracts the first capture group, i.e. everything after the first two '-' characters. The regex is [^-]*-[^-]*-(.*):
select fld1, regexp_substr(fld1, '[^-]*-[^-]*-(.*)', 1, 1, 'c', 1)
from (
select 'D-ERT-ESTTE' fld1 union all
select 'D-EST-AER-EJEL' union all
select 'D-E-ADF'
);
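The difference between the three expressions above can be sketched in Python. The helper names are hypothetical, and third_part reflects how SPLIT_PART(fld1, '-', 3) is understood here (third field only), which is why it differs for values with more than two dashes.

```python
import re

skus = ["D-ERT-ESTTE", "D-EST-AER-EJEL", "D-E-AD"]


def after_second_dash(s):
    """Everything after the second '-', or None if there are fewer than two."""
    parts = s.split("-", 2)  # split at most twice
    return parts[2] if len(parts) == 3 else None


def after_second_dash_re(s):
    """Same result using the capture-group regex from the answer above."""
    m = re.match(r"[^-]*-[^-]*-(.*)", s)
    return m.group(1) if m else None


def third_part(s):
    """Only the third '-'-separated field."""
    parts = s.split("-")
    return parts[2] if len(parts) >= 3 else ""


print([after_second_dash(s) for s in skus])  # ['ESTTE', 'AER-EJEL', 'AD']
print([third_part(s) for s in skus])         # ['ESTTE', 'AER', 'AD']
```

Only the substring and regex approaches keep the trailing "-EJEL" that the question asks for.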

Using REGEXP_SUBSTR with Strings Qualifier

Getting Examples from similar Stack Overflow threads,
Remove all characters after a specific character in PL/SQL
and
How to Select a substring in Oracle SQL up to a specific character?
I would want to retrieve only the first characters before the occurrence of a string.
Example:
STRING_EXAMPLE
TREE_OF_APPLES
The resulting data set should show only STRING_EXAM and TREE_OF_AP, because PLE is my delimiter.
Whenever I use the REGEXP_SUBSTR below, it returns only STRING_ because REGEXP_SUBSTR treats PLE as separate characters (P, L and E), not as a single expression (PLE).
SELECT REGEXP_SUBSTR('STRING_EXAMPLE','[^PLE]+',1,1) from dual;
How can I do this without using numerous INSTRs and SUBSTRs?
Thank you.
The problem with your query is that [^PLE] matches any character other than P, L or E. You are looking for the consecutive sequence PLE. So, use
select REGEXP_SUBSTR(colname,'(.+)PLE',1,1,null,1)
from tablename
This returns the substring up to the last occurrence of PLE in the string.
If the string contains multiple instances of PLE and only the substring up to the first occurrence needs to be extracted, use
select REGEXP_SUBSTR(colname,'(.+?)PLE',1,1,null,1)
from tablename
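The greedy versus non-greedy behaviour is easy to try out in Python, whose regex engine treats (.+) and (.+?) the same way for this purpose. This is an illustration rather than the Oracle implementation; the helper names and the sample value containing PLE twice are made up.

```python
import re


def before_last(s, delim="PLE"):
    """Greedy (.+): text up to the LAST occurrence of delim."""
    m = re.search("(.+)" + re.escape(delim), s)
    return m.group(1) if m else None


def before_first(s, delim="PLE"):
    """Non-greedy (.+?): text up to the FIRST occurrence of delim."""
    m = re.search("(.+?)" + re.escape(delim), s)
    return m.group(1) if m else None


sample = "APPLE_PLEASE"  # hypothetical value containing PLE twice

print(before_last(sample))             # APPLE_
print(before_first(sample))            # AP
print(before_first("TREE_OF_APPLES"))  # TREE_OF_AP

# The bracket expression from the question stops at the first P, L or E:
print(re.search("[^PLE]+", "STRING_EXAMPLE").group(0))  # STRING_
```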
Why use regular expressions for this?
select substr(colname, 1, instr(colname, 'PLE')-1) from...
would be more efficient.
with
inputs( colname ) as (
select 'FIRST_EXAMPLE' from dual union all
select 'IMPLEMENTATION' from dual union all
select 'PARIS' from dual union all
select 'PLEONASM' from dual
)
select colname, substr(colname, 1, instr(colname, 'PLE')-1) as result
from inputs
;
COLNAME        RESULT
-------------- ----------
FIRST_EXAMPLE  FIRST_EXAM
IMPLEMENTATION IM
PARIS
PLEONASM

Querying substrings against a list of values

I'm reading from a dataset that I unfortunately don't have the access to modify. It has concatenated strings of values, and I want to select records for which any of those substrings (as split by a given character) matches any of the values in a specific list. I'll be passing the queries in via Python, so it won't be compared against a static list.
For example, the table looks like:
CrappyColumn
-----------
1;2
4
1
2;1
1;3
2
And I might want to return anything that has 2 or 4 in it. So, my result should be:
1;2
4
2
2;1
I have played with regexp_substr and gotten something that actually works; however, it just runs indefinitely (as much as 10 minutes before I give up) when I run it on the full dataset (which only includes about three thousand records with values that are often a couple hundred characters long). I need something that works in a reasonable amount of time for repeated execution.
I realize that--even with a variable comparison list--I could just write my Python code to parse the list and construct multiple LIKE statements, but that seems inefficient, and I assume that there is a better way.
And here's what I've done that takes too long:
SELECT DISTINCT CrappyColumn
FROM
(SELECT DISTINCT CrappyColumn, regexp_substr(CrappyColumn, '[^;]+', 1, LEVEL) as UGH
FROM CrappyTable
CONNECT BY regexp_substr(CrappyColumn, '[^;]+', 1, LEVEL) IS NOT NULL)
WHERE UGH IN ('2', '4')
Is there a better, faster, cleaner way to accomplish this?
EDIT - RESOLUTION:
Thanks to vkp's help, here is what I implemented:
regexp_like(SITE_ID, '^(2|4)(:)|(:)(2|4)(:)|(:)(2|4)$|^(2|4)$')
I modified it for my final product, so that it can handle strings of more than one character--by changing [2|4] to (2|4). This works in cases of searching for numbers that aren't single-digit.
You can use like:
select t.*
from crappytable t
where ';' || crappycolumn || ';' like '%;2;%' or
';' || crappycolumn || ';' like '%;4;%';
You seem to know that storing lists of values in a single column is a bad idea, so I'll spare the harangue ;)
EDIT:
If you don't like like, you can use regexp_like() like this:
where regexp_like(';' || crappycolumn || ';', ';2;|;4;')
A simpler method would be to use regexp_like to check if the list has 2 or 4 in it.
select *
from tablename
where regexp_like(crappycolumn,'^[2|4][^0-9]|[^0-9][2|4][^0-9]|[^0-9][2|4]$|^[2|4]$')
^[2|4][^0-9] - Starts with 2 or 4 not followed by a digit.
[^0-9][2|4][^0-9] - 2 or 4 not succeeded or preceded by a digit.
[^0-9][2|4]$ - Ends with 2 or 4 not preceded by a digit.
^[2|4]$ - 2 or 4 is the only character in the string.
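One caveat, which is also why the resolution in the question switched to (2|4): inside square brackets the | is an ordinary character, so [2|4] is a set of three characters rather than an alternation. The Python sketch below shows the difference; Python's handling of bracket sets agrees with POSIX bracket expressions on this point, but the patterns here are illustrative and were not run against Oracle.

```python
import re

char_class = re.compile(r"^[2|4]$")   # set of three characters: '2', '|', '4'
alternation = re.compile(r"^(2|4)$")  # either the string '2' or the string '4'

print(bool(char_class.search("|")))   # True  (the pipe itself matches)
print(bool(alternation.search("|")))  # False

# Multi-character values only work with alternation.
print(bool(re.search(r"^(22|4)$", "22")))  # True
print(bool(re.search(r"^[22|4]$", "22")))  # False (a set matches one character)
```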
Another form of regexp_like(). This regex looks for 2 or 4 only when preceded by the beginning of the line or a semi-colon and when followed by a semi-colon or the end of the line:
SQL> with crappy_tbl(crappy_col) as (
select '1;2' from dual union
select '4' from dual union
select '1' from dual union
select '2;1' from dual union
select '1;3' from dual union
select '2' from dual union
select '22;;44;' from dual
)
select crappy_col
from crappy_tbl
where regexp_like(crappy_col, '(^|;)(2|4)(;|$)');
CRAPPY_
-------
1;2
2
2;1
4
SQL>
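Since the question mentions driving the query from Python with a variable list, here is a minimal sketch of building that pattern from a list and checking it locally against the sample rows. build_list_pattern is a hypothetical helper; it assumes the separator is not a regex metacharacter, and re.escape follows Python's escaping rules, which may not match Oracle's for unusual values.

```python
import re


def build_list_pattern(values, sep=";"):
    """Build '(^|;)(v1|v2|...)(;|$)' for a delimited-list column.

    Assumes `sep` is a plain character with no special regex meaning.
    """
    alternatives = "|".join(re.escape(str(v)) for v in values)
    return "(^|" + sep + ")(" + alternatives + ")(" + sep + "|$)"


pattern = build_list_pattern(["2", "4"])
print(pattern)  # (^|;)(2|4)(;|$)

rows = ["1;2", "4", "1", "2;1", "1;3", "2", "22;;44;"]
matches = [row for row in rows if re.search(pattern, row)]
print(matches)  # ['1;2', '4', '2;1', '2']
```

The resulting string could then be supplied to regexp_like() as a bind variable instead of being concatenated into the SQL text.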

Find the accent data in table records

In a table, I have a column that contains a few records with accented characters. I want a query to find the records with accented characters.
If we have records like the ones below:
2ème édition
Natália
sravanth
the query should pick these records:
2ème édition
Natália
You can use the REGEXP_LIKE function along with a list of all the accented characters you're interested in:
with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where regexp_like(data,'[àèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ]');
DATA
--------------
2ème édition
Natália
The ASCIISTR function would be another way to find accented characters:
ASCIISTR takes as its argument a string, or an expression that
resolves to a string, in any character set and returns an ASCII
version of the string in the database character set. Non-ASCII
characters are converted to the form \xxxx, where xxxx represents a
UTF-16 code unit.
So you can do something like
SELECT my_field FROM my_table
WHERE NOT my_field = ASCIISTR(my_field)
Or to re-use the demo from the accepted answer:
with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where data != asciistr(data)
which would output the 2 rows with accents.
with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where REGEXP_like(ASCIISTR(data), '\\[[:xdigit:]]{4}');
DATA
--------------
2ème édition
Natália
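To make the \xxxx idea concrete, here is a rough Python model of what ASCIISTR is described as doing in the quote above. It is a sketch, not Oracle's implementation: the function name mirrors the SQL one for readability, and the handling of the backslash itself (escaped as \005C) is an assumption that should be checked against the documentation.

```python
import re


def asciistr(text):
    """Rough model: non-ASCII characters become \\XXXX, one per UTF-16 code unit."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\005C")  # assumption: the backslash is escaped as well
        elif ord(ch) < 128:
            out.append(ch)
        else:
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


rows = ["2\u00e8me \u00e9dition", "Nat\u00e1lia", "sravanth"]

print(asciistr(rows[0]))  # 2\00E8me \00E9dition

# Equivalent of: where data != asciistr(data)
changed = [r for r in rows if r != asciistr(r)]

# Equivalent of: where regexp_like(asciistr(data), '\\[[:xdigit:]]{4}')
escaped = [r for r in rows if re.search(r"\\[0-9A-Fa-f]{4}", asciistr(r))]
```

Both filters select the same two rows for this sample.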
This is way harder than it seems on the surface, as there is more than one way to create an accent. What I do is keep a mirror column I call clean and scrub out all the accents on load.
See this question I asked some time ago: normalized string
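The "more than one way to create an accent" point can be illustrated with Unicode normalization in Python. This is a sketch of one possible scrubbing step for such a clean column, not the poster's actual load process; strip_accents is a hypothetical helper, and letters that have no decomposition (for example ø, ß, æ) pass through unchanged.

```python
import unicodedata

composed = "Nat\u00e1lia"     # 'á' as a single code point (U+00E1)
decomposed = "Nata\u0301lia"  # 'a' followed by a combining acute accent (U+0301)

# They render the same but compare as different strings.
print(composed == decomposed)  # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True


def strip_accents(text):
    """Decompose to NFD, then drop the combining marks."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in nfd if not unicodedata.combining(ch))


print(strip_accents(composed))    # Natalia
print(strip_accents(decomposed))  # Natalia
```

A character-list regex that only contains the precomposed á would miss the decomposed form, which is one reason to normalize on load.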