How to delete all non-numerical letters in db2 - sql

I have some data in DATA column (varchar) that looks like this:
Nowshak 7,485 m
Maja e Korabit (Golem Korab) 2,764 m
Tahat 3,003 m
Morro de Moco 2,620 m
Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)
Mount Kosciuszko 2,229 m
Grossglockner 3,798 m
What I want is this:
7485
2764
3003
2620
6960
2229
3798
Is there a way in IBM DB2 version 9.5 to remove/delete all those non-numeric letters by doing something like this:
SELECT replace(DATA, --somekind of regular expression--, '') FROM TABLE_A
or any other ways?
This question follows from this question.

As suggested in the other question, the TRANSLATE function might help. For example, try this:
select translate('Nowshak 7,485 m','','Nowshakm,') from sysibm.sysdummy1;
Returns:
7 485
Probably with a little tweaking you can get it to how you want it...in the third argument of the function you just need to specify the entire alphabet. Kind of ugly but it will work.

One easy way to accomplish that is to use the TRANSLATE(value, replacewith, replacelist) function. It replaces all of the characters in a list (third parameter) with the value in the second parameter.
You can leverage that to essentially erase all of the non-numeric charaters out of the character string, including the spaces.
Just make the list in the third parameter contain all of the possible characters you might see that you don't want. Translate those to an empty space, and you end up with just the characters you want, essentially erasing the undesired characters.
Note: I included all of the common symbols (non-alpha numeric) for the benefit of others who may have character values of a larger variety than your example.
Select
TRANSLATE(UCASE(CHAR_COLUMN),'',
'ABCDEFGHIJKLMNOPQRSTUVWXYZ!##$%^&*()-=+/\{}[];:.,<>? ')
FROM TABLE_A
More simply: For your particular set of values, since there is a much smaller set of possible characters you could trim the replace list down to this:
Select
TRANSLATE(UCASE(CHAR_COLUMN),'','ABCDEFGHIJKLMNOPQRSTUVWXYZ(), ')
FROM TABLE_A
NOTE: The "UCASE" on the CHAR_COLUMN is not necessary, but it was a nice enhancement to simplify this expression by eliminating the need to include all of the lower case alpha characters.
TRANSLATE(CHAR_COLUMN,'',
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!##$%^&*()-=+/\{}[];:.,<>? ')

As many of the answers above your best approach is to use the TRANSLATE function. However this approach is a different as you can white list the characters you want instead of black list the characters you don't want. We can do this by using the TRANSLATE function twice. We'll use the inner translate to generate a list of characters to remove for the parameter of the outer translate.
select TRANSLATE(dirty,'',TRANSLATE(dirty,'','1234567890',''),'') as clean
from (Values 'Nowshak 7,485 m'
,'Maja e Korabit (Golem Korab) 2,764 m'
,'Tahat 3,003 m','Morro de Moco 2,620 m'
,'Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)'
,'Mount Kosciuszko 2,229 m','Grossglockner 3,798 m'
) as temp(dirty)

Just taking #Darryls99 and turning it into a UDF
CREATE OR REPLACE FUNCTION REMOVE_ALLBUT(in_string VARCHAR(32000), characters_to_remote VARCHAR(32000))
RETURNS VARCHAR(32000)
LANGUAGE SQL CONTAINS SQL DETERMINISTIC NO EXTERNAL ACTION
RETURN
TRANSLATE(in_string,'',TRANSLATE(in_string,'',characters_to_remote,''),'')
;
use like this
select DB_REMOVE_ALLBUT(s,'1234567890')
from (values 'Nowshak 7,485 m'
,'Maja e Korabit (Golem Korab) 2,764 m'
,'Tahat 3,003 m','Morro de Moco 2,620 m'
,'Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)'
,'Mount Kosciuszko 2,229 m'
,'Grossglockner 3,798 m'
) t(s);
which returns
1
----
7485
2764
3003
2620
6960
2229
3798

Dirty string can be like this: 'qwerty12453lala<<>777*9'
We need to get cleared string and keep only digits.
We could remove any excess characters with TRANSLATE function,
but there is one problem: too long and ugly value of 3-th parameter.
Something like this:
VALUES
(
TRANSLATE( UPPER('qwerty12453lala<<>777*9'), '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ!##$%^&*()-=+/\{}[];:.,<>? ')
)
So, this is not very convenient.
My idea is - use TRANSLATE functuion 2 times ( one time inside another one):
Calculate 3-th parameter as a particular list of replaced symbols
Use TRANSLATE function second time to replace excess symbols by using this calculated parameter
Let me show you here in code:
VALUES
(
REPLACE --Remove spaces from result
(
TRANSLATE
(
UPPER( 'qwerty12453lala<<>777*9')
, ' '
, TRANSLATE( UPPER( 'qwerty12453lala<<>777*9') , ' ' , '0123456789')-- This is calculation of 3-th param, it contains only NOT digital characters, like 'QWERTYLALA<<>*'
)
, ' '
, ''
)
)
Result must be like this: 124537779
In case SELECT statement, it would be like this:
SELECT REPLACE
(
TRANSLATE( UPPER(T.DIRTY_FIELD), ' ', TRANSLATE(UPPER(T.DIRTY_FIELD), '', '1234567890' ) )
, ' '
, ''
)
FROM SOMETABLE T

the proper combination of all
select replace(translate(dirty,' ','ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!##$%^&*()-=+/{}[];:.,<>?' ), ' ','') as clean

Is there a way in IBM DB2 version 9.5
to remove/delete all those non-numeric
letters by doing something like this:
SELECT replace(DATA, --somekind of
regular expression--, '') FROM TABLE_A
or any other ways?
No. You will have to create a User Defined Function or implement it in your host application's language.

The statement below will remove non-alphanumeric characters from any 'string-value' and prevents the SQLSTATE message 42815 when a zero length string-value is passed.
SELECT REPLACE(TRANSLATE(string-value || '|',
'||||||||||||||||||||||||||||||||',
'`¬!"£$%^&*()_-+={[}]:;#~#,<>.?/'''),'|','')
FROM SYSIBM.SYSDUMMY1;

Related

SQL - How to remove a space character from a string

I have a table where the max length of a column (varchar) is 12, someone has loaded some value with a space, so rather than 'SPACE' it's 'SPACE '
I want to remove the space using a script, I was positive RTRIM or REPLACE(myValue, ' ', '') would work but LEN(myValue) shows there is still and extra character?
As mentioned by a couple folks, it may not be a space. Grab a copy of ngrams8k and you use it to identify the issue. For example, here we have the text, " SPACE" with a preceding space and trailing CHAR(160) (HTML BR tag). CHAR(160) looks like a space in SSMS but isn't "trimable". For example consider this query:
DECLARE #string VARCHAR(100) = ' SPACE'+CHAR(160);
SELECT '"'+#string+'"'
Using ngrams8k you could do this:
DECLARE #string VARCHAR(100) = ' SPACE'+CHAR(160);
SELECT
ng.position,
ng.token,
asciival = ASCII(ng.token)
FROM dbo.ngrams8k(#string,1) AS ng;
Returns:
position token asciival
---------- ------- -----------
1 32
2 S 83
3 P 80
4 A 65
5 C 67
6 E 69
7   160
As you can see, the first character (position 1) is CHAR(32), that's a space. The last character (postion 7) is not a space.
Knowing that CHAR(160) is the issue you could fix it like so:
SET #string = REPLACE(LTRIM(#string),CHAR(160),'')
If you are using SQL Server 2017+ you can also use TRIM which does a whole lot more than just LTRIM-and-RTRIM-ing. For example, this will remove
leading and trailing tabs, spaces, carriage returns, line returns and HTML BR tags.
SET #string = SELECT TRIM(CHAR(32)+CHAR(9)+CHAR(10)+CHAR(13)+CHAR(160) FROM #string)
Odds are it is some other non-printing character, carriage return is a big one when going from between *nix and other OS. One way to tell is to use the DUMP function. So you could start with a query like:
SELECT dump(column_name)
FROM your_table
WHERE column_name LIKE 'SPACE%'
That should help you find the offending character, however, that doesn't fix your problem. Instead, I would use something like REGEXP_REPLACE:
SELECT REGEXP_REPLACE(column_name, '[^A-z]')
FROM your_table
That should take care of any non-printing characters. You may need to play with the regular expression if you expect numbers or symbols in your string. You could switch to a character class like:
SELECT REGEXP_REPLACE(column_name, '[:cntrl:]')
FROM your_table

Retrieve Second to Last Word in PostgreSQL

I am using PostgreSQL 9.5.1
I have an address field where I am trying to extract the street type (AVE, RD, ST, etc). Some of them are formatted like this: 5th AVE N or PEE DEE RD N
I have seen a few methods in PostgreSQL to count segments from the left based on spaces i.e. split_part(name, ' ', 3), but I can't seem to find any built-in functions or regular expression examples where I can count the characters from the right.
My idea for moving forward is something along these lines:
select case when regexp_replace(name, '^.* ', '') = 'N'
then *grab the second to last group of string values*
end as type;
Leaving aside the issue of robustness of this approach when applied to address data, you can extract the penultimate space-delimited substring in a string like this:
with a as (
select string_to_array('5th AVE N', ' ') as addr
)
select
addr[array_length(addr, 1)-1] as street
from
a;

SQL- Remove Trailing space and add it to the beginning of the Name

Was working on SQL-EX.ru exercises.
There is one question for DML that I could not do, but I cannot proceed to the next one, until this one is done.
the question itself: All the trailing spaces in the name column of the Battles table remove and add them at the beginning of the name.
My code:
Update Battles
set name=concat(' ',(LTRIM(RTRIM(name))))
The system does not let it go through, I understand that I am using ' ' for the concat, whereas I need to use the stuff that got trimmed. And I have no idea how...
Any help would be very much appreciated
Try Something Like:-
set name = lpad(trim(name), length(trim(name))+4, ' ')
Here use TRIM to remove space from both side. use LPAD to add something on left side with n (4) chars
I'm not familiar with SQL-EX.ru, but if it's Oracle compatible and you can use regular expressions (or you are at that point in the training) here's a way. Maybe it'll give you an idea at least. The first part is just setup and uses a WITH clause to create a table (like a temp table in memory, actually called a Common Table Expression or CTE) called battles containing a name column with 2 rows. Each name column datum has a different number of spaces at the end. Next select from that column using a regular expression that uses 2 "remembered" groups surrounded by parentheses, the first containing the string up to until but not including the first space, the second containing 0 or more space characters anchored to the end of the line. Replace that with the 2nd group (the spaces) first, followed by the first group (the first part of the string). This is surrounded by square brackets just to prove in the output the same spaces were moved to the front of the string.
SQL> with battles(name) as (
select 'test2 ' from dual union
select 'test1 ' from dual
)
select '[' || regexp_replace(name, '(.*?)([ ]*)$', '\2\1') || ']' fixed
from battles;
FIXED
----------------------------------------------------------------------------
[ test1]
[ test2]
SQL>
I hope this solution can be applied to your problem or at least give you some ideas.
Try this:
set name = case when len(name) > len(rtrim(name))
then replicate(' ', len(name) - len(rtrim(name))) + rtrim(name)
else name
end
update battles
set name = case when (len(name+'a')-1) > len(rtrim(name))
then
replicate(' ',
(len(name+'a')-1) - len(rtrim(name))) + rtrim(name)
else name
end
Len() doesn't count trailing spaces. So using (len(name+'a')-1).
Simplest answer:
UPDATE Battles
SET name = SPACE(DATALENGTH(name)-DATALENGTH(RTRIM(name))) + RTRIM(name)
But only works because name is VARCHAR.
More generic is to do:
UPDATE Battles
SET name = SPACE(len(name+'x')-1-len(RTRIM(name))) + RTRIM(name)
simple example below ... enjoy :)
update battles set name =
Space( DATALENGTH(name) - DATALENGTH(rtrim(name))) + rtrim(name)
where date in ( select date from battles)

Select statement with column contains '%'

I want to select names from a table where the 'name' column contains '%' anywhere in the value. For example, I want to retrieve the name 'Approval for 20 % discount for parts'.
SELECT NAME FROM TABLE WHERE NAME ... ?
You can use like with escape. The default is a backslash in some databases (but not in Oracle), so:
select name
from table
where name like '%\%%' ESCAPE '\'
This is standard, and works in most databases. The Oracle documentation is here.
Of course, you could also use instr():
where instr(name, '%') > 0
One way to do it is using replace with an empty string and checking to see if the difference in length of the original string and modified string is > 0.
select name
from table
where length(name) - length(replace(name,'%','')) > 0
Make life easy on yourselves and just use REGEXP_LIKE( )!
SQL> with tbl(name) as (
select 'ABC' from dual
union
select 'E%FS' from dual
)
select name
from tbl
where regexp_like(name, '%');
NAME
----
E%FS
SQL>
I read the documentation mentioned by Gordon. The relevent sentence is:
An underscore (_) in the pattern matches exactly one character (as opposed to one byte in a multibyte character set) in the value
Here was my test:
select c
from (
select 'a%be' c
from dual) d
where c like '_%'
The value a%be was returned.
While the suggestions of using instr() or length in the other two answers will lead to the correct answer, they will do so slowly. Filtering on function results simply take longer than filtering on fields.

Oracle SQL - Parsing a name string and converting it to first initial & last name

Does anyone know how to turn this string: "Smith, John R"
Into this string: "jsmith" ?
I need to lowercase everything with lower()
Find where the comma is and track it's integer location value
Get the first character after that comma and put it in front of the string
Then get the entire last name and stick it after the first initial.
Sidenote - instr() function is not compatible with my version
Thanks for any help!
Start by writing your own INSTR function - call it my_instr for example. It will start at char 1 and loop until it finds a ','.
Then use as you would INSTR.
The best way to do this is using Oracle Regular Expressions feature, like this:
SELECT LOWER(regexp_replace('Smith, John R',
'(.+)(, )([A-Z])(.+)',
'\3\1', 1, 1))
FROM DUAL;
That says, 1) when you find the pattern of any set of characters, followed by ", ", followed by an uppercase character, followed by any remaining characters, take the third element (initial of first name) and append the last name. Then make everything lowercase.
Your side note: "instr() function is not compatible with my version" doesn't make sense to me, as that function's been around for ages. Check your version, because Regular Expressions was only added to Oracle in version 9i.
Thanks for the points.
-- Stew
instr() is not compatible with your version of what? Oracle? Are you using version 4 or something?
There is no need to create your own function, and quite frankly, it seems a waste of time when this can be done fairly easily with sql functions that already exist. Care must be taken to account for sloppy data entry.
Here is another way to accomplish your stated goal:
with name_list as
(select ' Parisi, Kenneth R' name from dual)
select name
-- There may be a space after the comma. This will strip an arbitrary
-- amount of whitespace from the first name, so we can easily extract
-- the first initial.
, substr(trim(substr(name, instr(name, ',') + 1)), 1, 1) AS first_init
-- a simple substring function, from the first character until the
-- last character before the comma.
, substr(trim(name), 1, instr(trim(name), ',') - 1) AS last_name
-- put together what we have done above to create the output field
, lower(substr(trim(substr(name, instr(name, ',') + 1)), 1, 1)) ||
lower(substr(trim(name), 1, instr(trim(name), ',') - 1)) AS init_plus_last
from name_list;
HTH,
Gabe
I have a hard time believing you don’t have access to a proper instr() but if that’s the case, implement your own version.
Assuming you have that straightened out:
select
substr(
lower( 'Smith, John R' )
, instr( 'Smith, John R', ',' ) + 2
, 1
) || -- first_initial
substr(
lower( 'Smith, John R' )
, 1
, instr( 'Smith, John R', ',' ) - 1
) -- last_name
from dual;
Also, be careful about your assumption that all names will be in that format. Watch out for something other than a single space after the comma, last names having data like “Parisi, Jr.”, etc.