Padding inside of a string in SQL - sql

I just started learning SQL and here is my problem.
I have a column that contains acronyms like "GP2", "MU1", "FR10", ... and I want to add '0's to the acronyms that don't have enough characters.
For example, I want acronyms like "FR10", "GP48", ... to stay as they are, but acronyms like "MU3" must be converted into "MU03" so that they are the same length as the others.
I have heard about LPAD and RPAD, but they only add the padding character at the left or the right end of the string.
Thanks !

Is the minimum length 3, as in your examples, and should the padding always go in the 3rd position? If so, use a case expression with concat, such as this:
with my_data as (
  select 'GP2' as col1 union all
  select 'MU1' union all
  select 'FR10'
)
select col1,
  case
    when length(col1) = 3 then concat(left(col1, 2), '0', right(col1, 1))
    else col1
  end padded_col1
from my_data;
col1 | padded_col1
-----+------------
GP2  | GP02
MU1  | MU01
FR10 | FR10

A regexp_replace():
with tests(example) as (values
  ('ab02'),('ab1'),('A'),('1'),('A1'),('123'),('ABC'),('abc0'),('a123'),('abcd0123'),('1a'),('a1a'),('1a1') )
select example,
  regexp_replace(
    example,
    '^(\D{0,4})(\d{0,4})$',
    '\1' || repeat('0',4-length(example)) || '\2' )
from tests;
example | regexp_replace
----------+----------------
ab02 | ab02
ab1 | ab01
A | A000
1 | 0001
A1 | A001
123 | 0123
ABC | ABC0
abc0 | abc0
a123 | a123
abcd0123 | abcd0123 --caught, repeat('0',-4) is same as repeat('0',0), so nothing
1a | 1a --doesn't start with non-digits
a1a | a1a --doesn't end with digits
1a1 | 1a1 --doesn't start with non-digits
\D catches non-digits at the start of the string (^).
\d catches digits at the end of the string ($).
{0,4} specifies that it's looking for 0 to 4 occurrences of each.
The backreferences \1 and \2 refer to the two consecutive groups enclosed in parentheses ().
repeat() fills the space between them with zeros, up to a total length of 4.
It's good to consider additional test cases.
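For anyone who wants to try more cases without a database at hand, here is a rough Python mirror of the regexp_replace() call above. It is only a sketch: pad_middle and its width parameter are names I made up for illustration, and Python's re module is not guaranteed to behave exactly like your database's regex engine (for example, \d also matches non-ASCII digits in Python).

```python
import re

# Same idea as the SQL pattern: up to 4 non-digits, then up to 4 digits.
PATTERN = re.compile(r"(\D{0,4})(\d{0,4})")


def pad_middle(value, width=4):
    """Insert zeros between the letter part and the digit part.

    pad_middle and width are hypothetical names, not part of any library.
    Values that do not fit the pattern are returned unchanged, which is
    what regexp_replace() does when nothing matches.
    """
    match = PATTERN.fullmatch(value)
    if match is None:
        return value
    # A negative count yields no zeros, like repeat('0', -4) in the SQL.
    zeros = "0" * max(0, width - len(value))
    return match.group(1) + zeros + match.group(2)


if __name__ == "__main__":
    for example in ["ab02", "ab1", "A", "1", "ABC", "abcd0123", "1a", "a1a"]:
        print(example, "|", pad_middle(example))
```

Values such as '1a' or 'a1a' fall through untouched, so it is easy to spot which inputs need a different rule.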

Thank you all for your responses. I think I did something similar to Isolated's answer.
Here is what I've done ("acronym" is the name of the column and "destination" is the name of the table):
SELECT CONCAT(LEFT(acronym, 2), LPAD(RIGHT(acronym, LENGTH(acronym) - 2), 2, '0')) AS acronym
FROM destination
ORDER BY acronym;
Thanks !
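A note for later readers: the query above keeps the first two characters and left-pads the rest to two characters. The Python sketch below mirrors that logic so the edge cases are easier to see. The helper names lpad and pad_acronym are mine, and the truncation of longer inputs reflects how LPAD is documented in MySQL and PostgreSQL; treat that as an assumption if you are on another database.

```python
def lpad(text, length, fill):
    """Mimic SQL LPAD: pad on the left, and cut longer input down to `length`.

    `fill` is assumed to be a single character here.
    """
    if len(text) >= length:
        return text[:length]
    return text.rjust(length, fill)


def pad_acronym(acronym):
    """CONCAT(LEFT(a, 2), LPAD(RIGHT(a, LENGTH(a) - 2), 2, '0')) for len(a) >= 2."""
    prefix, suffix = acronym[:2], acronym[2:]
    return prefix + lpad(suffix, 2, "0")


if __name__ == "__main__":
    for value in ["GP2", "MU3", "FR10", "AB123"]:
        print(value, "->", pad_acronym(value))
```

The last sample shows the catch: a numeric part longer than two characters loses its trailing digits, so the approach fits only if the data never has more than two digits.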

Related

How to select string data and exclude data contain zero

I have a table named ElectronicAddress like below, and a string type column Phone.
Id Name Phone
--------------------------
1 Adele 23432434
2 Diana 0000
3 Whale 0000000
4 Sion 936
5 Aria wwqq
6 Dave 665332
7 Daisy dai567
I want to select the rows whose Phone does not consist only of zeros, does not consist only of letters, and has more than 5 characters.
The result I'm trying to get:
Id Name Phone
--------------------------
1 Adele 23432434
6 Dave 665332
7 Daisy dai567
I have already tried this:
select * from ElectronicAddress where Phone not like '[[:alpha:] -]' and LENGTH(TRIM(Phone)) >5
but I'm having a hard time excluding the values that contain only zeros.
Apply 3 conditions in the WHERE clause:
select * from ElectronicAddress
where
regexp_like("Phone", '[[:digit:]]')
and length("Phone") > 5
and replace("Phone", '0', '') is not null
Oracle treats empty strings as nulls, which is why the last condition is written as is not null instead of a comparison with ''.
Results:
> Id | Name | Phone
> -: | :---- | :-------
> 1 | Adele | 23432434
> 6 | Dave | 665332
> 7 | Daisy | Dai567
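The three conditions can also be checked against the sample data outside the database. This is only a Python sketch of the same logic, not Oracle code: keep_phone is a made-up name, and the comparison with an empty string stands in for Oracle's is not null check.

```python
import re

# Sample rows copied from the question: (Id, Name, Phone).
rows = [
    (1, "Adele", "23432434"),
    (2, "Diana", "0000"),
    (3, "Whale", "0000000"),
    (4, "Sion", "936"),
    (5, "Aria", "wwqq"),
    (6, "Dave", "665332"),
    (7, "Daisy", "dai567"),
]


def keep_phone(phone):
    """Apply the three WHERE conditions to one value."""
    has_digit = re.search(r"[0-9]", phone) is not None  # not letters only
    long_enough = len(phone) > 5                         # more than 5 characters
    not_only_zeros = phone.replace("0", "") != ""        # something is left without the zeros
    return has_digit and long_enough and not_only_zeros


kept_ids = [row_id for row_id, _name, phone in rows if keep_phone(phone)]

if __name__ == "__main__":
    print(kept_ids)
```

Row 3 is the interesting one: it passes the length and digit checks and is dropped only by the zero check.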
SELECT *
FROM Yourtable
WHERE length (phone) > 5
and REPLACE(Phone, '0', '') <> ''
and LENGTH(TRIM(TRANSLATE(Phone, ' +-.0123456789', ' '))) = 0
By "exclude 0" you seem to mean "exclude strings that only consist of zeros" rather than exclude '0' (as I originally interpreted it). That piece is harder to incorporate into a single regular expression, so separate logic can be used.
I think you want only digits:
where regexp_like(phone, '^[0-9]{5,}$') and replace(phone, '0', '') is not null
Or:
where regexp_like(phone, '^[^a-zA-Z]{5,}$') and replace(phone, '0', '') is not null
Or:
where regexp_like(phone, '^[^[:alpha:]]{5,}$') and replace(phone, '0', '') is not null
You can also do this in one regular expression. I'm not sure if there is a cleaner method. The following is brute force, looking for a non-zero digit in any of the first five positions:
where regexp_like(phone, '^[1-9][0-9]{4,}$|^[0-9]{1}[1-9][0-9]{3,}$|^[0-9]{2}[1-9][0-9]{2,}$|^[0-9]{3}[1-9][0-9]{1,}$|^[0-9]{4}[1-9][0-9]*$')
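One thing worth checking before choosing between the variants: as far as I can tell, the single brute-force pattern is stricter than the two-condition version, because it only accepts a non-zero digit within the first five positions. The Python sketch below compares the two on a few values. The function names are mine, and Python's re is used only as a stand-in for Oracle's regexp_like.

```python
import re

# The brute-force pattern from above, split over several lines for readability.
BRUTE_FORCE = re.compile(
    r"^[1-9][0-9]{4,}$"
    r"|^[0-9]{1}[1-9][0-9]{3,}$"
    r"|^[0-9]{2}[1-9][0-9]{2,}$"
    r"|^[0-9]{3}[1-9][0-9]{1,}$"
    r"|^[0-9]{4}[1-9][0-9]*$"
)


def two_conditions(phone):
    """Digits only, at least 5 of them, and not made up of zeros alone."""
    only_digits = re.fullmatch(r"[0-9]{5,}", phone) is not None
    return only_digits and phone.replace("0", "") != ""


def one_regex(phone):
    """The single-pattern variant."""
    return BRUTE_FORCE.search(phone) is not None


if __name__ == "__main__":
    for phone in ["23432434", "0000000", "665332", "dai567", "000001"]:
        print(phone, two_conditions(phone), one_regex(phone))
```

A value like '000001' passes the two-condition check but not the single pattern, so the two are interchangeable only if such values cannot occur. Both also accept exactly five digits, while the question asks for more than five characters, so {5,} may need to become {6,}.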

How remove symbols from the sentence in Oracle?

In Oracle database I have such table.
| TREE | ORG_NAME |
|---------------------------------|----------|
| \Google earth\Nest global\ATAP | ATAP |
| \Google earth\Nest\Beemoney\ | Beemoney |
| \Google\\\BeeKey\ | |
| | York |
I am trying to make sql query which would return such result.
| ORGANIZATION |
|-----------------------------------|
| Google earth > Nest global > ATAP |
| Google earth > Nest Beemoney |
| Google > BeeKey |
| York |
As you can see, I want to:
1) Remove the \ symbol at the beginning and end of the sentence.
2) Replace the \ symbol inside the sentence with the > symbol.
3) Replace the \\\ symbol inside the sentence with the > symbol.
4) If the TREE column is empty, take the value from the ORG_NAME column.
Here is how I started. This SQL query solves parts 2, 3 and 4. How do I solve part 1? I think I need to use REGEXP_REPLACE, right? How do I do it correctly? Is there a more elegant way to redesign the SQL query? As you can see, I go over the same table several times.
SELECT
  COALESCE (TREE, ORG_NAME) as ORGANIZATION
FROM (
  SELECT
    REPLACE(TREE, '\', '>') AS TREE,
    ORG_NAME
  FROM (
    SELECT
      REPLACE(TREE, '\\\', '>') AS TREE,
      ORG_NAME
    FROM
      ORG
  )
)
This could be a way with a regexp_replace and a trim to remove the characters from the beginning and the end of the string:
select nvl(regexp_replace( trim('\' from tree), '\\+', ' > '), org_name)
from yourTable
Here is a working solution which uses two calls to regexp_replace:
select
regexp_replace(
regexp_replace('\Google\\\BeeKey\', '^\\?(.*?)\\?$', '\1'), '\\+', ' > ')
from dual;
Google > BeeKey
The inner call to regexp_replace strips off a leading or trailing path separator, if present. The outer call replaces each run of internal path separators (\) with a single > separator.
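To see all four requirements working together, here is a small Python sketch of the trim + regexp_replace + nvl combination from the first answer. format_tree is a hypothetical name, and None plays the role of an Oracle NULL (which also covers empty strings in Oracle).

```python
import re


def format_tree(tree, org_name):
    """Build the ORGANIZATION value for one row.

    1) strip backslashes from both ends,
    2) and 3) turn every run of backslashes into ' > ',
    4) fall back to org_name when nothing is left.
    """
    trimmed = (tree or "").strip("\\")
    result = re.sub(r"\\+", " > ", trimmed)
    return result or org_name


if __name__ == "__main__":
    samples = [
        ("\\Google earth\\Nest global\\ATAP", "ATAP"),
        ("\\Google earth\\Nest\\Beemoney\\", "Beemoney"),
        ("\\Google\\\\\\BeeKey\\", None),
        (None, "York"),
    ]
    for tree, org_name in samples:
        print(format_tree(tree, org_name))
```

Matching a run of backslashes with \\+ is what makes requirement 3 fall out of requirement 2, so no separate pass for \\\ is needed.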

regex to convert alphanumeric and special characters in a string to * in oracle

I have a requirement to convert all the characters in my string to *. My string can also contain special characters as well.
For Example:
abc_d$ should be converted to ******.
Can anybody help me with a regex like this in Oracle?
Thanks
Use REGEXP_REPLACE and replace any single character (.) with *.
SELECT
REGEXP_REPLACE (col, '.', '*')
FROM yourTable
Instead of regex you could also use
select rpad('*', length('abc_d$ s'),'*') from dual
-- start from '*' and pad it with more '*' until it reaches the length of the input
Docs: rpad(string, length, pad_string)
In databases that provide repeat(string, count), such as MySQL or PostgreSQL, repeating '*' should work as well (not tested); Oracle has no built-in REPEAT function.
Regex or rpad makes no difference to the execution plan; both queries get the same one:
n-th try of rpad:
Plan Hash Value : 1388734953
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 | 00:00:01 |
| 1 | FAST DUAL | | 1 | | 2 | 00:00:01 |
-----------------------------------------------------------------
n-th try of regexp_replace:
Plan Hash Value : 1388734953
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 | 00:00:01 |
| 1 | FAST DUAL | | 1 | | 2 | 00:00:01 |
-----------------------------------------------------------------
So it does not matter which one you use.
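The two masking approaches give the same result for ordinary text but can differ on line breaks. The Python sketch below shows the idea with made-up function names. The link to Oracle is an assumption on my part: as I understand it, '.' in REGEXP_REPLACE does not match a newline unless the 'n' match parameter is passed, which is also Python's default behaviour.

```python
import re


def mask_with_regex(text):
    """Replace every character matched by '.' with '*' (newlines are not matched)."""
    return re.sub(r".", "*", text)


def mask_with_padding(text):
    """Build a string of '*' with the same length, like rpad('*', length(s), '*')."""
    return "*" * len(text)


if __name__ == "__main__":
    print(mask_with_regex("abc_d$"))
    print(mask_with_padding("abc_d$"))
    # With an embedded line break the regex version keeps the break.
    print(repr(mask_with_regex("a\nb")))
    print(repr(mask_with_padding("a\nb")))
```

If the column can contain line breaks, decide first whether they should be masked too, and pick the approach accordingly.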
THIS IS NOT AN ANSWER
As suggested by Tom Biegeleisen’s brother Tim, I ran a test to compare a solution based on regular expressions to one using just standard string functions. (Specifically, Tim's answer with regular expressions vs. Patrick Artner's solution using just LENGTH and RPAD.)
Details of the test are shown below.
CONCLUSION: On a table with 5 million rows, each consisting of one string of length 30 (in a single column), the regular expression query runs in 21 seconds. The query using LENGTH and RPAD runs in one second. Both solutions read all the data from the table; the only difference is the function used in the SELECT clause. As noted already, both queries have the same execution plan, AND the same estimated cost - because the cost does not take into account differences in function calculation time.
Setup:
create table tbl ( str varchar2(30) );
insert into tbl
  select a.str
  from ( select dbms_random.string('p', 30) as str
         from dual
         connect by level <= 100
       ) a
       cross join
       ( select level
         from dual
         connect by level <= 50000
       ) b
;
commit;
Note that there are only 100 distinct values, and each is repeated 50,000 times for a total of 5 million values. We know the values are repeated; Oracle doesn't know that. It will really do "the same thing" 5 million times, it won't just do it 100 times and then simply copy the results; it's not that smart. This is something that would be known only by seeing the actual stored data, it's not known to Oracle beforehand, so it can't "prepare" for such shortcuts.
Queries:
The two queries - note that I didn't want to send 5 million rows to screen, nor did I want to populate another table with the "masked" values (and muddy the waters with the time it takes to INSERT the results into another table); rather, I compute all the new strings and take the MAX. Again, in this test all "new" strings are equal to each other - they are all strings of 30 asterisks - but there is no way for Oracle to know that. It really has to compute all 5 million new strings and take the max over them all.
select max(new_str)
from ( select regexp_replace(str, '.', '*' ) as new_str
       from tbl
     )
;
select max(new_str)
from ( select rpad('*', length(str), '*') as new_str
       from tbl
     )
;
Try this:
SELECT
REGEXP_REPLACE('B^%2',
'*([A-Z]|[a-z]|[0-9]|[ ]|([^A-Z]|[^a-z]|[^0-9]|[^ ]))', '*') "REGEXP_REPLACE"
FROM DUAL;
I have included white space too.
select name,lpad(regexp_replace(name,name,'*'),length(name),'*')
from customer;

Split specific chain of digits from a string

There is this table (called data) below:
row comments
1 Fortune favors https://something.aaa.org/show_screen.cgi?id=548545 the 23 bold
2 No man 87485 is id# 548522 an island 65654.
3 125 Better id NEWLINE #546654 late than 5875565 never.
4 555 Better id546654 late than 565 never
I used the query below:
select row, substring(substring(comments::text, '((id|ID) [0-9]+)'), '[0-9]+') as id
from data
where comments::text ~* 'id [0-9]+';
This query ignored rows 1 to 3 and only processed row 4:
row id
4 546654
Does anyone know how to properly extract the ID number? Note that the ID contains up to 9 digits.
Use regexp_replace():
SELECT c.rownr
, regexp_replace (c.comments, e'.*[Ii][Dd][^0-9]*([0-9]+).*', '\1' ) AS the_id
, c.comments AS comments
FROM comments c
;
.* matches the initial garbage
[Ii][Dd] matches the id string, case-insensitively
[^0-9]* consumes all non-numeric characters
([0-9]+) matches the numeric string that you want
.* matches any trailing characters
'\1' (in the 3rd argument) says that you want the text matched inside the first ()
Results:
rownr | the_id | comments
-------+--------+--------------------------------------------------------------------------------
1 | 548545 | Fortune favors https://something.aaa.org/show_screen.cgi?id=548545 the 23 bold
2 | 548522 | No man 87485 is id# 548522 an island 65654.
3 | 546654 | 125 Better id NEWLINE #546654 late than 5875565 never.
4 | 546654 | 555 Better id546654 late than 565 never
(4 rows)
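The same pattern can be tried in Python to check it against other comments before running it on the table. This is a sketch only: extract_id is a hypothetical name and Python's re engine stands in for PostgreSQL's. As with regexp_replace(), a comment without a match comes back unchanged, so a WHERE filter is still needed if such rows exist.

```python
import re

# Same pattern as in the query: garbage, 'id' in any case, non-digits, the digits.
ID_PATTERN = re.compile(r".*[Ii][Dd][^0-9]*([0-9]+).*")


def extract_id(comment):
    """Return the digits that follow the last 'id' in the comment."""
    return ID_PATTERN.sub(r"\1", comment)


comments = [
    "Fortune favors https://something.aaa.org/show_screen.cgi?id=548545 the 23 bold",
    "No man 87485 is id# 548522 an island 65654.",
    "125 Better id NEWLINE #546654 late than 5875565 never.",
    "555 Better id546654 late than 565 never",
]

if __name__ == "__main__":
    for comment in comments:
        print(extract_id(comment))
```

Because the leading .* is greedy, the last "id" in a comment wins, and any word containing "id" (for example "video 123") would also be picked up. A stricter pattern may be needed for real data.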

SELECT only Unique values from Multiple Columns in SQL

I have to concatenate around 35 Columns in a table into a single string. The data within a column can be repetitive with different case, as per the below.
COL_1
apple | ORANGE | APPLE | Orange
COL_2
GRAPE | grape | Grape
The data in each column is pipe separated and I am trying to concatenate each column by separating with '|'. I expect the final output to be "apple | orange | grape" (All in lower case is fine)
But currently I am getting
apple | ORANGE | APPLE | Orange | GRAPE | grape | Grape
My current SQL is
SELECT COL_1 || '|' || COL_2 from TABLE_X;
Can someone explain how to extract the unique values from each column? This would reduce my string length drastically. My current SQL is exceeding Oracle's 4000 character limit.
I tried doing this
WITH test AS
( SELECT 'Test | test | Test' str FROM dual
)
SELECT *
FROM
(SELECT DISTINCT(LOWER(regexp_substr (str, '[^ | ]+', 1, rownum))) split
FROM test
CONNECT BY level <= LENGTH (regexp_replace (str, '[^ | ]+')) + 1
)
WHERE SPLIT IS NOT NULL;
This query produces only 'test'.
Somehow it's producing unique values after splitting the string separated by ' | ' in a column. But doing this for 35+ columns in a single SQL query would be cumbersome. Could someone suggest a better approach?
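To pin down the expected result independently of any SQL, here is a small Python sketch of the behaviour described above: split every column on '|', trim and lower-case each piece, and keep the first occurrence of each value. It is not an Oracle solution, and unique_tokens is a made-up name; it only states the target output that a SQL approach would have to reproduce.

```python
def unique_tokens(*columns):
    """Merge pipe-separated columns into one list of distinct lower-case values."""
    seen = {}  # dicts keep insertion order, so first occurrences stay in place
    for column in columns:
        for token in (column or "").split("|"):
            token = token.strip().lower()
            if token:
                seen.setdefault(token, None)
    return " | ".join(seen)


if __name__ == "__main__":
    col_1 = "apple | ORANGE | APPLE | Orange"
    col_2 = "GRAPE | grape | Grape"
    print(unique_tokens(col_1, col_2))
```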