How to trim out letter in the column - sql

I am looking for an effective way to trim the values in a name column. For example, the f_name column contains Jenny, Johnny, Doe, Ken, Smith.
I want to trim these names so that each consists of only its first 2 letters: Je, Jo, Do, Ke, Sm as the output for the new column.
But the names do not all have the same number of letters; for example, Johnny has 6 letters and John has 4.
Is there an effective way to trim names of uneven length without checking the length of every f_name and writing a separate condition for each case, like the one below?
CASE WHEN LENGTH(f_name) > 4 THEN LTRIM(f_name, 2)

For Oracle use substr():
with data (f_name) as (
select 'Jenny' from dual union all
select 'Johnny' from dual union all
select 'Doe' from dual union all
select 'Ken' from dual union all
select 'Smith' from dual
)
select substr(f_name, 1, 2)
from data
Returns:
SUBSTR(F_NAME,1,2)
------------------
Je
Jo
Do
Ke
Sm
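
As a quick check that no length test is needed, here is a minimal sketch that runs the same substr() call against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Oracle here, the table name people is hypothetical, and the one-letter name 'J' is an extra row added to show the short-string case.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the Oracle table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (f_name TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO people (f_name) VALUES (?)",
    [("Jenny",), ("Johnny",), ("Doe",), ("Ken",), ("Smith",), ("J",)],
)

# substr(value, start, length): start is 1-based, as in Oracle's SUBSTR.
# A value shorter than the requested length is returned unchanged.
rows = conn.execute(
    "SELECT f_name, substr(f_name, 1, 2) FROM people ORDER BY rowid"
).fetchall()
short_names = [short for _, short in rows]
print(short_names)
```

The same call handles every name length, so a CASE on LENGTH(f_name) should not be necessary.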

Use SUBSTR:
CASE WHEN LENGTH(f_name) > 4 THEN SUBSTR(f_name, 1, 2) END

If you want to get the least acronym across all names, you could write something like:
with s as (select level as lvl from dual connect by level < (select max(length(f_name)) from your_table))
select f_name,
max(sub_f_name) keep (dense_rank first order by cnt, lvl desc) as least_acronym
from (
select t.f_name
, substr(t.f_name, -lvl) as sub_f_name
, s.lvl
, count(*) over (partition by substr(t.f_name, -lvl)) as cnt
from your_table t
, s)
group by f_name
NB: this is just an idea and has not been tested.

Related

SQL remove unwanted special characters from a string

Hi, I am new to SQL and am writing a CASE statement for a column of grade values.
The values have a length of 3, like A02, B04, A10, A09, D03. The first character is a letter and the next 2 are digits.
If a user enters 'A02, I want to change it to A02. Basically, remove any special characters if they are present.
CASE
WHEN Grade like '[^0-9A-z]%' THEN ''
else Grade end as Grade
So far I have this, but I am not sure how to use regex to remove the character rather than only search for it.
Unless you really want to use a CASE for the fun of it, in Oracle I'd do it like this, which removes punctuation characters and spaces when you select it. Note that this does not verify the format, so a grade of Z1234 would still be returned.
WITH tbl(ID, grade) AS (
SELECT 1, 'A01' FROM dual UNION ALL
SELECT 1, '''B02' FROM dual UNION ALL
SELECT 2, '$ C01&' FROM dual
)
SELECT ID, grade, REGEXP_REPLACE(grade, '([[:punct:]]| )') AS grade_scrubbed
from tbl;
        ID GRADE     GRADE_SCRUBBED
---------- --------- --------------
         1 A01       A01
         1 'B02      B02
         2 $ C01&    C01
3 rows selected.
HOWEVER, that said, since you seem to want to verify the format and use regex, you could do it this way although it's a little fugly. See comments.
WITH tbl(ID, grade) AS (
-- Test data. Include every crazy combo you'd never expect to see,
-- because you WILL see it, it's just a matter of time :-)
SELECT 1, 'A01' FROM dual UNION ALL
SELECT 1, '''B02' FROM dual UNION ALL
SELECT 2, '$ C01&' FROM dual UNION ALL
SELECT 3, 'DDD' FROM dual UNION ALL
SELECT 4, 'A'||CHR(10)||'DEF' FROM dual UNION ALL
SELECT 5, 'Z1234' FROM dual UNION ALL
SELECT 6, NULL FROM dual
)
SELECT ID, grade,
CASE
-- Correct format of A99.
WHEN REGEXP_LIKE(grade, '^[A-Z]\d{2}$')
THEN grade
-- if not A99, see if stripping out punctuation and spaces make it match A99.
-- If so, return with punctuation and spaces stripped out.
WHEN NOT REGEXP_LIKE(grade, '^[A-Z]\d{2}$')
AND REGEXP_LIKE(REGEXP_REPLACE(grade, '([[:punct:]]| )'), '^[A-Z]\d{2}$')
THEN REGEXP_REPLACE(grade, '([[:punct:]]| )')
-- if not A99, and stripping out punctuation and spaces didn't make it match A99,
-- then the grade is in the wrong format.
WHEN NOT REGEXP_LIKE(grade, '^[A-Z]\d{2}$')
AND NOT REGEXP_LIKE(REGEXP_REPLACE(grade, '([[:punct:]]| )'), '^[A-Z]\d{2}$')
THEN 'Invalid grade format'
-- Something fell through all cases we tested for. Always expect the unexpected!
ELSE 'No case matched!'
END AS grade_scrubbed
from tbl;
        ID GRADE                GRADE_SCRUBBED
---------- -------------------- --------------------
         1 A01                  A01
         1 'B02                 B02
         2 $ C01&               C01
         3 DDD                  Invalid grade format
         4 A
DEF                             Invalid grade format
         5 Z1234                Invalid grade format
         6                      No case matched!
7 rows selected.
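
The scrub-then-validate logic can also be tried outside the database. The following is a rough Python sketch, not the Oracle query itself: the function name scrub_grade is hypothetical, string.punctuation is assumed to be a close enough match for [[:punct:]] on ASCII input, and None stands in for both the 'Invalid grade format' and NULL cases.

```python
import re
import string

# ASCII punctuation plus the space character, mirroring '([[:punct:]]| )'.
_STRIP = re.compile("[" + re.escape(string.punctuation) + " ]")
# One uppercase letter followed by exactly two digits (the A99 format).
_VALID = re.compile(r"[A-Z][0-9]{2}")


def scrub_grade(grade):
    """Return the cleaned grade, or None if it cannot be made to look like A99."""
    if grade is None:
        return None
    cleaned = _STRIP.sub("", grade)
    # fullmatch requires the whole cleaned string to fit the pattern.
    return cleaned if _VALID.fullmatch(cleaned) else None


for sample in ["A01", "'B02", "$ C01&", "DDD", "Z1234"]:
    print(repr(sample), "->", scrub_grade(sample))
```

Stripping first and validating once keeps the branching smaller than the three-way CASE, at the cost of not distinguishing "already valid" from "valid after cleaning".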

Extract city from the address column

What happens if ship 3 and ship 4 are null, but ship2 is not null? That should be the city and state.
Here is sample data in the picture.
I prefer the old-fashioned SUBSTR + INSTR combination. Compared with Gordon's and Barbaros' suggestions it seems somewhat better, because their queries also return strings that don't contain a comma at all, while the OP says
extract city from 1 letter until 1 comma
Here's a comparison:
with tab (addr) as
(
select 'RALEIGH, NC 27604-3229' from dual union all
select 'SUITE A' from dual union all
select 'COEUR D ALENE, ID 83815-8652' from dual union all
select '*O/S CITY LIMITS*' from dual
)
select addr,
substr(addr, 1, instr(addr, ',') - 1) littlefoot,
--
regexp_substr(addr, '[^,]+', 1, 1) gordon,
regexp_substr(addr,'[^,]+') barbaros
from tab;
ADDR                         LITTLEFOOT      GORDON               BARBAROS
---------------------------- --------------- -------------------- --------------------
RALEIGH, NC 27604-3229       RALEIGH         RALEIGH              RALEIGH
SUITE A                                      SUITE A              SUITE A
COEUR D ALENE, ID 83815-8652 COEUR D ALENE   COEUR D ALENE        COEUR D ALENE
*O/S CITY LIMITS*                            *O/S CITY LIMITS*    *O/S CITY LIMITS*
If you want the part before the first comma, you can use regexp_substr():
select regexp_substr(addr, '[^,]+', 1, 1)
Just use regexp_substr with [^,]+ pattern as below
select regexp_substr(address,'[^,]+') as city
from tab;
Or alternatively, with the sample data in an auxiliary table expression:
with tab as
(
select 'RALEIGH, NC 27604-3229' as str from dual union all
select 'SALINAS, CA 93901' from dual union all
select 'DEPEW, NY 14043-2603' from dual
)
select regexp_substr(str,'[^,]+') as city
from tab;
If you don't want to use regexp, you can just use:
select substr(city,1,(instr(city,',')-1))
from mytable;
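
To see how the substr + instr approach behaves on rows without a comma, here is a small sketch run against SQLite through Python's sqlite3 module. SQLite is an assumption used only to make the example runnable, and the explicit CASE is added so that the no-comma result is NULL regardless of how a given engine treats a negative length.

```python
import sqlite3

# In-memory SQLite database standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (city TEXT)")
conn.executemany(
    "INSERT INTO mytable (city) VALUES (?)",
    [
        ("RALEIGH, NC 27604-3229",),
        ("SUITE A",),
        ("COEUR D ALENE, ID 83815-8652",),
    ],
)

# instr() returns 0 when there is no comma; the CASE then yields NULL
# instead of relying on substr() being called with a length of -1.
query = """
    SELECT CASE
             WHEN instr(city, ',') > 0
             THEN substr(city, 1, instr(city, ',') - 1)
           END AS city_name
    FROM mytable
    ORDER BY rowid
"""
city_names = [row[0] for row in conn.execute(query)]
print(city_names)
```

Rows such as 'SUITE A' come back as NULL (None in Python), which makes them easy to filter out or to replace with a fallback column.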

In SQL sort by Alphabets first then by Numbers

In an H2 database, when I apply ORDER BY to a varchar column, numbers come first and then letters. I need the letters to come first and then the numbers.
I have tried
ORDER BY IF(name RLIKE '^[a-z]', 1, 2), name
but I get an error saying that IF is not available in H2.
My column data looks like this:
A
1-A
3
M
2-B
5
B-2
It should come out like this:
A
B-2
M
1-A
2-B
3
5
Try this:
SELECT MYCOLUMN FROM MYTABLE ORDER BY REGEXP_REPLACE (MYCOLUMN,'(*)(\d)(*)','}\2') , MYCOLUMN
One option is to prefix the values that start with a digit with a high character code in the ORDER BY clause:
WITH tab
AS (SELECT 'A' col FROM DUAL
UNION ALL
SELECT '1-A' FROM DUAL
UNION ALL
SELECT '3' FROM DUAL
UNION ALL
SELECT 'M' FROM DUAL
UNION ALL
SELECT '2-B' FROM DUAL
UNION ALL
SELECT '5' FROM DUAL
UNION ALL
SELECT 'B-2' FROM DUAL)
SELECT col
FROM tab
ORDER BY CASE WHEN SUBSTR (col, 1, 1) < CHR (58) THEN CHR (177) || col ELSE col END;
I used CHR(58) because the ASCII codes of the digits end at 57, and CHR(177) because its code is higher than that of any ASCII letter, so the prefixed values sort after the letters.
FYR : ASCII table
Given the example dataset, I'm not sure if you need further logic than this- so I'll refrain from making further assumptions:
DECLARE @temp TABLE (myval char(3))
INSERT INTO @temp VALUES
('A'), ('1-A'), ('3'), ('M'), ('2-B'), ('5'), ('B-2')
SELECT myval
FROM @temp
ORDER BY CASE WHEN LEFT(myval, 1) LIKE '[a-Z]'
THEN 1
ELSE 2
END
,LEFT(myval, 1)
Gives output:
myval
A
B-2
M
1-A
2-B
3
5
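
Since the question is about H2, where IF() is not available, a plain CASE expression in the ORDER BY may be the most portable variant. The sketch below runs it against SQLite through Python's sqlite3 module; SQLite is only an assumed stand-in, and whether H2 orders the values the same way depends on its collation settings.

```python
import sqlite3

# In-memory SQLite database standing in for the H2 table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycolumn TEXT)")
conn.executemany(
    "INSERT INTO mytable (mycolumn) VALUES (?)",
    [(v,) for v in ["A", "1-A", "3", "M", "2-B", "5", "B-2"]],
)

# First sort key: 1 for values starting with a non-digit, 2 for a digit.
# Second sort key: the value itself, to order within each group.
query = """
    SELECT mycolumn
    FROM mytable
    ORDER BY CASE
               WHEN substr(mycolumn, 1, 1) BETWEEN '0' AND '9' THEN 2
               ELSE 1
             END,
             mycolumn
"""
ordered = [row[0] for row in conn.execute(query)]
print(ordered)
```

Only standard CASE, BETWEEN and a substring function are used, so the ORDER BY should translate to most engines with little change.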

substring, after last occurrence of character?

I need help with this problem:
I have a column named phone_number, and I want to query it to get the string to the right of the last occurrence of '.' for all kinds of numbers, in one single SQL query.
example #:
515.123.1277
011.44.1345.629268
I need to get 1277 and 629268 respectively.
I have this so far:
select phone_number,
case when length(phone_number) <= 12
then
substr(phone_number,-4)
else
substr (phone_number, -6) end
from employees;
This works for this example, but I want it for all kinds of # formats.
Would be great to get some input.
Thanks
It should be as easy as this regex:
SELECT phone_number, REGEXP_SUBSTR(phone_number, '[^.]*$')
FROM employees;
With the end anchor $, it returns everything after the final . character, that is, the trailing run of characters that are not a dot. If the last character is a dot, it returns NULL.
Search for a pattern consisting of the period, [.], followed by digits, \d+, followed by the end of the string, $.
Capture the digits as a subexpression by placing the pattern, \d+, in parentheses (see below). It is referenced with the subexpr parameter, 1 (the last parameter).
Here is the solution:
WITH t AS
( SELECT '414.352.3100' p_number FROM dual
UNION ALL
SELECT '515.123.1277' FROM dual
UNION ALL
SELECT '011.44.1345.629268' FROM dual
)
SELECT regexp_substr(t.p_number, '[.](\d+)$', 1, 1, NULL, 1) end_num FROM t;
END_NUM
========================================================================
3100
1277
629268
You can do something like this in oracle:
select regexp_substr(num,'[^\.]+',1,regexp_count(num,'\.')+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );
Before 11gR2 you can use regexp_replace instead of regexp_count:
select regexp_substr(num,'[^\.]+',1,length(regexp_replace (num , '[^\.]+'))+1) last_number from
(select '515.123.1277' num from dual union all
select '011.44.1345.629268' from dual );
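
The '[^.]*$' pattern from the first answer is easy to sanity-check outside the database. This is a small Python sketch using the re module; the function name last_segment is hypothetical, and note one difference from Oracle: where REGEXP_SUBSTR returns NULL for a trailing dot, Python returns an empty string.

```python
import re


def last_segment(phone_number):
    """Return the text after the final '.', or the whole string if there is none."""
    # '[^.]*$' matches the longest run of non-dot characters that ends the string.
    # The pattern can match an empty run, so search() always finds a match.
    return re.search(r"[^.]*$", phone_number).group()


for number in ["515.123.1277", "011.44.1345.629268", "5551234", "555."]:
    print(number, "->", repr(last_segment(number)))
```

Because the pattern does not depend on how many dots or digits precede the last segment, it covers formats that the length-based CASE in the question would miss.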

Check if string variations exists in another string

I need to check if a partial name matches full name. For example:
Partial_Name | Full_Name
--------------------------------------
John,Smith | Smith William John
Eglid,Timothy | Timothy M Eglid
I have no clue how to approach this type of matching.
Another thing is that the first name and last name may come in a different order, which makes it harder.
I could do something like this, but it only works if the names are in the same order and match 100%:
decode(LOWER(REGEXP_REPLACE(Partial_Name,'[^a-zA-Z'']','')), LOWER(REGEXP_REPLACE(Full_Name,'[^a-zA-Z'']','')), 'Same', 'Different')
You could use this pattern on the text provided; it works in most regex engines:
([^ ,]+),([^ ,]+)(?=.*\b\1\b)(?=.*\b\2\b)
WITH
/*
tab AS
(
SELECT 'Smith William John' Full_Name, 'John,Smith' Partial_Name FROM dual
UNION ALL SELECT 'Timothy M Eglid', 'Eglid,timothy' FROM dual
UNION ALL SELECT 'Tim M Egli', 'Egli,Tim,M2' FROM dual
UNION ALL SELECT 'Timot M Eg', 'Eg' FROM dual
),
*/
tmp AS (
SELECT Full_Name, Partial_Name,
trim(CASE WHEN instr(Partial_Name, ',') = 0 THEN Partial_Name
ELSE regexp_substr(Partial_Name, '[^,]+', 1, lvl+1)
END) token
FROM tab t CROSS JOIN (SELECT lvl FROM (SELECT LEVEL-1 lvl FROM dual
CONNECT BY LEVEL <= (SELECT MAX(LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')))+1 FROM tab)))
WHERE LENGTH(Partial_Name) - LENGTH(REPLACE(Partial_Name, ',')) >= lvl
)
SELECT Full_Name, Partial_Name
FROM tmp
GROUP BY Full_Name, Partial_Name
HAVING count(DISTINCT token)
= count(DISTINCT CASE WHEN REGEXP_LIKE(Full_Name, token, 'i')
THEN token ELSE NULL END);
In tmp, each partial_name is split into tokens (separated by commas).
The final query retrieves only those rows whose full_name matches all of the corresponding tokens.
This query works with any number of commas in partial_name. If there can be only zero or one comma, the query is much simpler:
SELECT * FROM tab
WHERE instr(Partial_Name, ',') > 0
AND REGEXP_LIKE(full_name, substr(Partial_Name, 1, instr(Partial_Name, ',')-1), 'ix')
AND REGEXP_LIKE(full_name, substr(Partial_Name,instr(Partial_Name, ',')+1), 'ix')
OR instr(Partial_Name, ',') = 0
AND REGEXP_LIKE(full_name, Partial_Name, 'ix');
This is what I ended up doing. I'm not sure if it is the best approach.
I split the partial name by comma and check whether the first name and the last name are each present in the full name. If both are present, it is a match.
CASE
WHEN
instr(trim(lower(Full_Name)),
trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 1)))) > 0
AND
instr(trim(lower(Full_Name)),
trim(lower(REGEXP_SUBSTR(Partial_Name, '[^,]+', 1, 2)))) > 0
THEN 'Y'
ELSE 'N'
END AS MATCHING_NAMES
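
For comparison, the same token idea can be sketched in Python. This is a variation rather than a translation of the SQL above: the function name names_match is hypothetical, and it matches whole words, whereas INSTR and REGEXP_LIKE as used in the answers also accept a token that is only part of a word (for example 'Eg' inside 'Eglid').

```python
import re


def names_match(partial_name, full_name):
    """True if every comma-separated token of partial_name is a word in full_name."""
    # Split the partial name on commas, trimming blanks and ignoring case.
    tokens = [t.strip().lower() for t in partial_name.split(",") if t.strip()]
    # Words of the full name: runs of letters and apostrophes, lowercased.
    words = set(re.findall(r"[a-z']+", full_name.lower()))
    # An empty partial name never matches; otherwise all tokens must be present.
    return bool(tokens) and all(token in words for token in tokens)


print(names_match("John,Smith", "Smith William John"))
print(names_match("Eglid,Timothy", "Timothy M Eglid"))
print(names_match("Egli,Tim,M2", "Tim M Egli"))
```

Word-level matching avoids false positives from short tokens, and the order of the tokens does not matter, which addresses the swapped first and last names from the question.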