SQL, ORACLE - trim right string (remove all parameters from URL) - sql

I query a column with URLs. Those URLs are from different origins and have different formats. Some of them have parameters. I wish to query this column and right trim the URLs from the first parameter symbol.
Example URLs:
URLs
http://www.domain1.com/path/page?parameters1&parameters2
https://www.domain2.com/path/page?parameters1&parameters2/somemorestufftoscrape
domain3.com/path/page?parameters1&parameters2
http://www.domain4.com/path/page&parameters1?parameters2
https://www.domain5.com/path/noparametershere.html
domain6.com/path/page=?parameters1&parameters2
I want to trim everything to the right of the first ?, & or = (the characters that introduce parameters in my case).
Desired Output:
TrimmedURLs
http://www.domain1.com/path/page
https://www.domain2.com/path/page
domain3.com/path/page
http://www.domain4.com/path/page
https://www.domain5.com/path
domain6.com/path/page
I've tried to use RTRIM as follows:
select
URLs,
rtrim(URLs, '?=&') as TrimmedURLs
from
MyTable;
The query returns but URLs column is equal to TrimmedURLs (am I doing something wrong?).
I've tried to use regexp_substr, but in the cases where there are multiple parameter characters it trims from the last one and not the first one (see first note in page).
What is the query for the desired result?
Why does RTRIM not work for me?
Server is Oracle 11g
URLs Type is VARCHAR2(1024)
Thanks!

REGEXP_SUBSTR() sounds like the thing to use here:
with sample_data as (select 'http://www.domain1.com/path/page?parameters1&parameters2' url from dual union all
select 'https://www.domain2.com/path/page?parameters1&parameters2/somemorestufftoscrape' url from dual union all
select 'domain3.com/path/page?parameters1&parameters2' url from dual union all
select 'http://www.domain4.com/path/page&parameters1?parameters2' url from dual union all
select 'https://www.domain5.com/path/noparametershere.html' url from dual union all
select 'domain6.com/path/page=?parameters1&parameters2' url from dual)
select url,
regexp_substr(url, '[^?&=]+', 1, 1) main_url
from sample_data;
URL                                                                             MAIN_URL
------------------------------------------------------------------------------- ------------------------------------------------------------
http://www.domain1.com/path/page?parameters1&parameters2                        http://www.domain1.com/path/page
https://www.domain2.com/path/page?parameters1&parameters2/somemorestufftoscrape https://www.domain2.com/path/page
domain3.com/path/page?parameters1&parameters2                                   domain3.com/path/page
http://www.domain4.com/path/page&parameters1?parameters2                        http://www.domain4.com/path/page
https://www.domain5.com/path/noparametershere.html                              https://www.domain5.com/path/noparametershere.html
domain6.com/path/page=?parameters1&parameters2                                  domain6.com/path/page
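As for why RTRIM appeared to do nothing: RTRIM only removes characters from the given set while they sit at the very end of the string, and it stops at the first character that is not in the set. A URL that ends in a letter or digit therefore comes back unchanged. A minimal sketch with made-up literals (not taken from the question's table):

```sql
-- The literals below are illustrative only.
select rtrim('page?a=1&b=2', '?=&') as unchanged, -- ends in '2', so nothing is removed
       rtrim('page?=&', '?=&')      as stripped   -- trailing ?, =, & are removed: 'page'
from dual;
```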

If you don't like regexp, you can also use a combination of the substr and instr functions:
with sample_data as (select 'http://www.domain1.com/path/page?parameters1&parameters2' url from dual union all
select 'https://www.domain2.com/path/page?parameters1&parameters2/somemorestufftoscrape' url from dual union all
select 'domain3.com/path/page?parameters1&parameters2' url from dual union all
select 'http://www.domain4.com/path/page&parameters1?parameters2' url from dual union all
select 'https://www.domain5.com/path/noparametershere.html' url from dual union all
select 'domain6.com/path/page=?parameters1&parameters2' url from dual)
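select
url,
-- cut before the first ?, & or =; a character that is absent counts as "past the end"
substr(url, 1, least(
nvl(nullif(instr(url, '?'), 0), length(url) + 1),
nvl(nullif(instr(url, '&'), 0), length(url) + 1),
nvl(nullif(instr(url, '='), 0), length(url) + 1)) - 1) main_url
from
sample_data;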
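A third option, offered here only as a sketch and not taken from either answer, is to delete everything from the first parameter character onward with REGEXP_REPLACE. Omitting the replacement argument removes the matched text, and a URL with no match is returned as it is. The two sample rows are copied from the question.

```sql
with sample_data as (
  select 'http://www.domain4.com/path/page&parameters1?parameters2' url from dual union all
  select 'https://www.domain5.com/path/noparametershere.html' url from dual
)
select url,
       regexp_replace(url, '[?&=].*') as main_url -- drop from the first ?, & or = to the end
from sample_data;
```

This assumes the URLs contain no line breaks, since '.' does not match a newline by default.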

Related

How to print the sequence to nth length? [duplicate]

I would like to know how to achieve the same functionality as REPEAT() in SQL*Plus. For example consider this problem: display the character '*' as many times as the value specified by an integer attribute specified for each entry in a given table.
Nitpicking: SQL*Plus doesn't have any feature for that. The database server (Oracle) is what executes the SQL, and it has such a function.
You are looking for rpad():
select rpad('*', 10, '*')
from dual;
will output
**********
More details can be found in the manual: https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions159.htm#SQLRF06103
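To drive the length from a column, as the question asks, the same call can take the column as its second argument. A minimal sketch, assuming a hypothetical data set with columns named label and n (both names are made up for illustration):

```sql
-- "counts", "label" and "n" are hypothetical names, not from the question.
with counts(label, n) as (
  select 'a', 3 from dual union all
  select 'b', 5 from dual
)
select label,
       rpad('*', n, '*') as stars -- '*' padded with '*' up to n characters
from counts;
```

This should give *** for a and ***** for b. As far as I know, a value of n that is zero or negative yields NULL rather than an empty string.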
For single characters, the accepted answer works fine.
However, if you have multiple characters in a given string, you need to use RPAD along with the LENGTH function, like this:
WITH t (str) AS
(
SELECT 'a'
FROM DUAL
UNION ALL SELECT 'abc'
FROM DUAL
UNION ALL SELECT '123'
FROM DUAL
UNION ALL SELECT '#+-'
FROM DUAL
)
SELECT RPAD(str, 5*LENGTH(str), str) repeated_5_times
FROM t;
Output:
REPEATED_5_TIMES
---------------
aaaaa
abcabcabcabcabc
123123123123123
#+-#+-#+-#+-#+-

How to extract value between 2 slashes

I have a string like "1490/2334/5166400411000434" from which I need to extract the value after the second slash. I tried the logic below:
select REGEXP_SUBSTR('1490/2334/5166400411000434','[^/]+',1,3) from dual;
It works fine. But when I don't have a value between the first and second slash, it returns blank.
For example my string is "1490//5166400411000434" and am trying
select REGEXP_SUBSTR('1490//5166400411000434','[^/]+',1,3) from dual;
It returns blank. Please suggest what I am missing.
If I understand well, you may need
regexp_substr(t, '(([^/]*/){2})([^/]*)', 1, 1, 'i', 3)
This handles the first 2 parts like 'xxx/' and then checks for a sequence of non / characters; the parameter 3 is used to get the 3rd matching subexpression, which is what you want.
For example:
with test(t) as (
select '1490/2334/5166400411000434' from dual union all
select '1490//5166400411000434' from dual union all
select '1490//5166400411000434/ramesh/3344' from dual
)
select t, regexp_substr(t, '(([^/]*/){2})([^/]*)', 1, 1, 'i', 3) as substr
from test
gives:
T                                  SUBSTR
---------------------------------- ----------------------------------
1490/2334/5166400411000434         5166400411000434
1490//5166400411000434             5166400411000434
1490//5166400411000434/ramesh/3344 5166400411000434
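The original '[^/]+' pattern fails because it needs at least one non-slash character per match, so an empty field is skipped and the third occurrence never exists. A commonly used alternative, shown here as a sketch and not as part of the answer above, treats each field as "anything, lazily, up to a slash or the end of the string" and returns the first subexpression of the requested occurrence:

```sql
select regexp_substr('1490//5166400411000434', '(.*?)(/|$)', 1, 3, null, 1) as third_field,  -- 5166400411000434
       regexp_substr('1490//5166400411000434', '(.*?)(/|$)', 1, 2, null, 1) as second_field  -- empty field comes back as NULL
from dual;
```

The subexpression argument (the last one) needs Oracle 11g or later.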
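You can REVERSE() your string, take the value before the first slash, and then reverse again to obtain the desired output. Note that this returns the last segment, so it only fits when the value you want is at the end of the string.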
select reverse(regexp_substr(reverse('1490//5166400411000434'), '[^/]+', 1, 1)) from dual;
It can also be done with the basic SUBSTR and INSTR functions:
select reverse(SUBSTR(reverse('1490//5166400411000434'), 0, INSTR(reverse('1490//5166400411000434'), '/')-1)) from dual;
Use the other options of REGEXP_SUBSTR to match a pattern:
select REGEXP_SUBSTR('1490//5166400411000434','(/\d*)/(\d+)',1,1,'x',2) from dual
This looks for a pattern of two slashes with optional digits between them, searching from position 1 for the first occurrence. The 'x' option ignores whitespace in the pattern, and the final argument 2 returns the second subexpression, that is, the digits matched by the second pair of parentheses:
... pattern,1,1,'x',subexp2)

simpler way to parse using regex

I have an input in the form of 'ABCD 3/1'.
I need to parse the digit before '/'. Also, if the input does not match this pattern, the original string itself should be returned.
I am using the query below, which works, but I believe there is a way to do this in a single regex. Any hints appreciated.
select nvl(REGEXP_substr(REGEXP_substr('ABCD 3/1', '\d\/'), '\d'), 'ABCD 3/1') from dual;
What about this? I believe it meets your requirements. Add more test cases as you see fit to the with clause.
SQL> with tbl(str) as (
select 'ABCD 3/1' from dual union
select 'ABCD 332/1' from dual union
select 'ABCD A/1' from dual union
select 'ABCD EFS' from dual
)
select regexp_replace(str, '.*\s(\d)/\d.*', '\1') digit_before_slash
from tbl;
DIGIT_BEFORE_SLASH
-----------------------------------------------------------------------------
3
ABCD 332/1
ABCD A/1
ABCD EFS
SQL>
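The reason a single REGEXP_REPLACE can stand in for the NVL wrapper is that it returns the source string unchanged when the pattern does not match, whereas REGEXP_SUBSTR returns NULL in that case. A small sketch using one of the non-matching test strings above:

```sql
select regexp_replace('ABCD EFS', '.*\s(\d)/\d.*', '\1') as via_replace, -- no match: 'ABCD EFS' comes back as is
       regexp_substr('ABCD EFS', '\d/')                   as via_substr   -- no match: NULL
from dual;
```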
You can try with REGEXP_REPLACE by mapping all your input string and picking only the part you want; for example, given this:
SQL> select regexp_replace('ABCD 3/1', '([A-Z]*)( )(\d)(\/)(\d)', '1:\1, 2:\2, 3:\3, 4:\4, 5:\5') from dual ;
REGEXP_REPLACE('ABCD3/1','
--------------------------
1:ABCD, 2: , 3:3, 4:/, 5:1
You can use '\3' to get only the third captured group:
SQL> select regexp_replace('ABCD 3/1', '([A-Z]*)( )(\d)(\/)(\d)', '\3') from dual ;
R
-
3

Find the accent data in table records

In a table, I have a column that contains a few records with accented characters. I want a query to find the records with accented characters.
If we have records like the ones below:
2ème édition
Natália
sravanth
query should pick these records:
2ème édition
Natália
You can use the REGEXP_LIKE function along with a list of all the accented characters you're interested in:
with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where regexp_like(data,'[àèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ]');
DATA
--------------
2ème édition
Natália
The ASCIISTR function would be another way to find accented characters. From the documentation:
ASCIISTR takes as its argument a string, or an expression that
resolves to a string, in any character set and returns an ASCII
version of the string in the database character set. Non-ASCII
characters are converted to the form \xxxx, where xxxx represents a
UTF-16 code unit.
So you can do something like
SELECT my_field FROM my_table
WHERE NOT my_field = ASCIISTR(my_field)
Or to re-use the demo from the accepted answer:
with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where data != asciistr(data)
which would output the 2 rows with accents.
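To see what the comparison is working with, it can help to look at the escaped form directly. A small sketch, assuming the client character set (NLS_LANG) lets the accented literal reach the server intact:

```sql
select asciistr('Natália')  as escaped,   -- 'Nat\00E1lia' (á is U+00E1)
       asciistr('sravanth') as untouched  -- plain ASCII is returned as is
from dual;
```

Keep in mind that this test flags any non-ASCII character, not only accented letters.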
with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where REGEXP_like(ASCIISTR(data), '\\[[:xdigit:]]{4}');
DATA
--------------
2ème édition
Natália
Way harder than it seems on the surface as there is more than one way to create an accent. What I do is have a mirror column I call clean and scrub out all the accents on load.
See this question I asked some time ago: normalized string