How to extract numerical data from SQL result - sql

Suppose there is a table "A" with 2 columns - ID (INT), DATA (VARCHAR(100)).
Executing "SELECT DATA FROM A" results in a table looks like:
DATA
---------------------
Nowshak 7,485 m
Maja e Korabit (Golem Korab) 2,764 m
Tahat 3,003 m
Morro de Moco 2,620 m
Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)
Mount Kosciuszko 2,229 m
Grossglockner 3,798 m
// the DATA continues...
---------------------
How can I extract only the numerical data using some kind of string processing function in the SELECT SQL query so that the result from a modified SELECT would look like this:
DATA (in INTEGER - not varchar)
---------------------
7485
2764
3003
2620
6960
2229
3798
// the DATA in INTEGER continues...
---------------------
By the way, it would be best if this could be done in a single SQL statement. (I am using IBM DB2 version 9.5)
Thanks :)

I know this thread is old, and the OP doesn't need the answer, but I had to figure this out with a few hints from this and other threads. They all seem to be missing the exact answer.
The easy way to do this is to TRANSLATE all unneeded characters to a single character, then REPLACE that single character with an empty string.
DATA = 'Nowshak 7,485 m'
# removes all characters, leaving only numbers
REPLACE(TRANSLATE(TRIM(DATA), '_____________________________________________________________________________________________', ' abcdefghijklmnopqrstuvwzyaABCDEFGHIJKLMNOPQRSTUVWXYZ`~!##$%^&*()-_=+\|[]{};:",.<>/?'), '_', '')
=> '7485'
To break down the TRANSLATE command:
TRANSLATE( FIELD or String, <to characters>, <from characters> )
e.g.
DATA = 'Sample by John'
TRANSLATE(DATA, 'XYZ', 'abc')
=> a becomes X, b becomes Y, c becomes Z
=> 'SXmple Yy John'
** Note: I can't speak to performance or version compatibility. I'm on a 9.x version of DB2, and new to the technology. Hope this helps someone.

In Oracle:
SELECT TO_NUMBER(REGEXP_REPLACE(data, '[^0-9]', ''))
FROM a
In PostgreSQL:
SELECT CAST(REGEXP_REPLACE(data, '[^0-9]', '', 'g') AS INTEGER)
FROM a
In MS SQL Server and DB2, you'll need to create UDF's for regular expressions and query like this.
See links for more details.

Doing a quick search on line for DB2 the best inbuilt function I can find is Translate It lets you specify a list of characters you want to change to other characters. It's not ideal, but you can specify every character that you want to strip out, that is, every non numeric character available...
(Yes, that's a long list, a very long list, which is why I say it's not ideal)
TRANSLATE('data', 'abc...XYZ,./\<>?|[and so on]', ' ')
Alternatively you need to create a user defined function to search for the number. There are a few alternatives for that.
Check each character one by one and keep it only if it's a numeric.
If you know what precedes the number and what follows the number, you can search for those and keep what is in between...

To elaborate on Dems's suggeston, the approach I've used is a scalar user-defined function (UDF) that accepts an alphanumeric string and recursively iterates through the string (one byte per iteration) and suppresses the non-numeric characters from the output. The recursive expression will generate a row per iteration, but only the final row is kept and returned to the calling application.

Related

Extracting string between two characters in sql oracle database

I need to extract a string that will located between two characters, with always the same pattern
sample string:
A CRN_MOB_H_001 a--> <AVLB>
What is in bold AVLB is what I want to extract, the whole string will always have the same pattern, and everything that is before the < is irrelevant to me.
The string will always have the same pattern:
Some string with possible special characters such as <>, although very unlikely so, it can be ignored if too complicated
a space
then -->
a space
and then the part that is interesting <XXXXXXX>
The XXXXXXX representing the part I want to extract
thank you for your time.
I have tried several things, could not get anywhere I wanted.
Please try this REGEXP_SUBSTR(), which selects what is in the angled brackets when they occur at the end of the string.
Note the WITH clause just sets up test data and is a good way to supply data for people to help you here.
WITH tbl(str) AS (
SELECT 'A CRN_MOB_H_001 a--> <AVLB>' FROM dual
)
SELECT REGEXP_SUBSTR(str, '.*<(.*)>$', 1, 1, NULL, 1) DATA
FROM tbl;
DATA
----
AVLB
1 row selected.

Query to retrieve only columns which have last name starting with 'K'

]
The name column has both first and last name in one column.
Look at the 'rep' column. How do I filter only those rep names where the last name is starting with 'K'?
The way that table is defined won't allow you to do that query reliably--particularly if you have international names in the table. Not all names come with the given name first and the family name second. In Asia, for example, it is quite common to write names in the opposite order.
That said, assuming you have all western names, it is possible to get the information you need--but your indexes won't be able to help you. It will be slower than if your data had been broken out properly.
SELECT rep,
RTRIM(LEFT(LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep))), CHARINDEX(' ', LTRIM(RIGHT(rep, LEN(rep) - CHARINDEX(' ', rep)))) - 1)) as family_name
WHERE family_name LIKE 'K%'
So what's going on in that query is some string manipulation. The dialect up there is SQL Server, so you'll have to refer to your vendor's string manipulation function. This picks the second word, and assumes the family name is the second word.
LEFT(str, num) takes the number of characters calculated from the left of the string
RIGHT(str, num) takes the number of characters calculated from the right of the string
CHARINDEX(char, str) finds the first index of a character
So you are getting the RIGHT side of the string where the count is the length of the string minus the first instance of a space character. Then we are getting the LEFT side of the remaining string the same way. Essentially if you had a name with 3 parts, this will always pick the second one.
You could probably do this with SUBSTRING(str, start, end), but you do need to calculate where that is precisely, using only the string itself.
Hopefully you can see where there are all kinds of edge cases where this could fail:
There are a couple records with a middle name
The family name is recorded first
Some records have a title (Mr., Lord, Dr.)
It would be better if you could separate the name into different columns and then the query would be trivial--and you have the benefit of your indexes as well.
Your other option is to create a stored procedure, and do the calculations a bit more precisely and in a way that is easier to read.
Assuming that the name is <firstname> <lastname> you can use:
where rep like '% K%'

How to determine if a variable has a special character using dual table on sql?

I'm currently trying to check if a string has a special character (value that is not 0 to 9 A to Z a to z), but the inhouse language that I'm currently using has a very limited function to do it (possible but it will take a lot of lines). but I am able to do a query on sql. Now I would like to ask if it is possible to query using the dual table on sql, My plan is to pass the string to variable and this variable will be use on my sql command. Thanks in advance.
Here is what you can use
SELECT REGEXP_INSTR('Test!ing','[^[:alnum:]]') FROM dual;
This will return a number other than 0 whenever your string has anything other than letters or numbers.
You can use TRANSLATE to remove all okay characters from the string. You get back a string containing only undesired characters - or an empty string when there are none.
select translate(
'AbcDefg1234%99.26éXYZ', -- your string
'.ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
'.') from dual;
returns: %.é

Matching exactly 2 characters in string - SQL

How can i query a column with Names of people to get only the names those contain exactly 2 “a” ?
I am familiar with % symbol that's used with LIKE but that finds all names even with 1 a , when i write %a , but i need to find only those have exactly 2 characters.
Please explain - Thanks in advance
Table Name: "People"
Column Names: "Names, Age, Gender"
Assuming you're asking for two a characters search for a string with two a's but not with three.
select *
from people
where names like '%a%a%'
and name not like '%a%a%a%'
Use '_a'. '_' is a single character wildcard where '%' matches 0 or more characters.
If you need more advanced matches, use regular expressions, using REGEXP_LIKE. See Using Regular Expressions With Oracle Database.
And of course you can use other tricks as well. For instance, you can compare the length of the string with the length of the same string but with 'a's removed from it. If the difference is 2 then the string contained two 'a's. But as you can see things get ugly real soon, since length returns 'null' when a string is empty, so you have to make an exception for that, if you want to check for names that are exactly 'aa'.
select * from People
where
length(Names) - 2 = nvl(length(replace(Names, 'a', '')), 0)
Another solution is to replace everything that is not an a with nothing and check if the resulting String is exactly two characters long:
select names
from people
where length(regexp_replace(names, '[^a]', '')) = 2;
This can also be extended to deal with uppercase As:
select names
from people
where length(regexp_replace(names, '[^aA]', '')) = 2;
SQLFiddle example: http://sqlfiddle.com/#!4/09bc6
select * from People where names like '__'; also ll work

SQL String contains ONLY

I have a table with a field that denotes whether the data in that row is valid or not. This field contains a string of undetermined length. I need a query that will only pull out rows where all the characters in this field are N. Some possible examples of this field.
NNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNN
NNNNNEEEENNNNNNNNNNNN
NNNNNOOOOOEEEENNNNNNNNNNNN
Any suggestions on a postcard please.
Many thanks
This should do the trick:
SELECT Field
FROM YourTable
WHERE Field NOT LIKE '%[^N]%' AND Field <> ''
What it's doing is a wildcard search, broken down:
The LIKE will find records where the field contains characters other than N in the field. So, we apply a NOT to that as we're only interested in records that do not contain characters other than N. Plus a condition to filter out blank values.
SELECT *
FROM mytable
WHERE field NOT LIKE '%[^N]%'
I don't know which SQL dialect you are using. For example Oracle has several functions you may use. With oracle you could use condition like :
WHERE LTRIM(field, 'N') = ''
The idea is to trim out all N's and see if the result is empty string. If you don't have LTRIM, check if you have some kind of TRANSLATE or REPLACE function to do the same thing.
Another way to do it could be to pick length of your field and then construct comparator value by padding empty string with N. Perhaps something like:
WHERE field = RPAD('', field, 'N)
Oracle pads that empty string with N's and picks number of pad characters from length of the second argument. Perhaps this works too:
WHERE field = RPAD('', LENGTH(field), 'N)
I haven't tested those, but hopefully that give you some ideas how to solve your problem. I guess that many of these solutions have bad performance if you have lot of rows and you don't have other WHERE conditions to select proper index.