SQL remove characters from string and leave number only - sql

I have a string (nvarchar) from database data and I would like to reduce it to digits only. I searched on Google for a solution but didn't find anything. I found something similar here on StackOverflow, but every answer removed characters only from the left side; if there is any character on the right side or between the digits, it won't work.
A solution I found that does not work:
select substring(XX,
PatIndex('%[0-9]%', XX),
len(XX))
For example, I have the text '4710000 text', and this substring returns the same text I put into it, which is again '4710000 text'. Is there any other way to do that, without creating functions or using IFs, BEGINs, or variables (@text etc.)?

Try this, it seems to work like a charm. I wish I could take credit but it's from this post. If it works for you please give him the upvote.
The 'with' is just a CTE that sets up test data.
with tbl(str) as (
select '4710000 text'
)
SELECT
(SELECT CAST(CAST((
SELECT SUBSTRING(str, Number, 1)
FROM master..spt_values
WHERE Type='p' AND Number <= LEN(str) AND
SUBSTRING(str, Number, 1) LIKE '[0-9]' FOR XML Path(''))
AS xml) AS varchar(MAX)))
FROM
tbl
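To check what the query above should return without a SQL Server instance at hand, the same position-by-position filter can be sketched in Python. This only illustrates the logic and is not T-SQL; the function name digits_only is made up for this example.

```python
def digits_only(text: str) -> str:
    """Keep only the characters 0-9, in their original order.

    Mirrors the query: visit each position (the Number column), test the
    single character at that position against [0-9], and concatenate the
    survivors (the FOR XML PATH('') step).
    """
    return "".join(ch for ch in text if ch in "0123456789")


print(digits_only("4710000 text"))  # 4710000
print(digits_only("ab12cd34"))      # 1234
```

Digits on the right side or between letters are kept as well, which is the case the question says the PATINDEX attempt could not handle.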

If you are using a fully supported version of SQL Server (TRANSLATE was added in SQL Server 2017), you can use translate like so:
select Replace(Translate('4710000 text', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', Replicate('*', 26)), '*', '');
If you have additional non-numeric characters, add them to the string and amend 26 accordingly.
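The translate-then-replace idea can be tried out in Python as a rough sketch. The helper name strip_listed is hypothetical, and because Python's translate is case-sensitive the sketch lists both upper and lower case letters; whether the SQL Server call needs that depends on the collation, which is not covered here.

```python
import string


def strip_listed(text: str, unwanted: str) -> str:
    """Map every character in `unwanted` to '*', then delete the '*'s."""
    # Translate(text, unwanted, Replicate('*', len(unwanted)))
    table = str.maketrans(unwanted, "*" * len(unwanted))
    # Replace(..., '*', '')
    return text.translate(table).replace("*", "")


letters = string.ascii_letters  # a-z and A-Z, 52 characters

print(repr(strip_listed("4710000 text", letters)))        # '4710000 '
print(repr(strip_listed("4710000 text", letters + " ")))  # '4710000'
```

Two things show up in the sketch: the space survives unless it is added to the list of unwanted characters, and any literal '*' already present in the input is removed too, since '*' is used as the placeholder.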

Related

How to grab certain value found in a string column?

I have a column that contains several different values. This is not in JSON format. It is a string that is separated into different sections. I need to grab everything that is found under ID only.
In the examples below, I only want to grab the word: "syntax" and "village"
select value.id
from TBL_A
The above does not work since this is not a json.
Does anyone know how to grab the full word that is found under the "id" section in that string column?
Even though it's a string, since it's in properly formatted JSON you can convert the string to a JSON variant like this:
select parse_json(VALUE);
You can then access its properties using standard colon and dot notations:
select parse_json(VALUE):id::string
I would go with Greg's option and treat it as JSON, because it sure looks like JSON. But if you know that in some situations it is definitely not JSON, you could use SPLIT_TO_TABLE and TRIM, provided you know that a comma never appears inside any of the strings:
SELECT t.value,
TRIM(t.value,'{}') as trim_value,
s.value as s_value,
split_part(s_value, ':', 1) as token,
split_part(s_value, ':', 2) as val
FROM table t
,LATERAL SPLIT_TO_TABLE(TRIM(t.value,'{}'), ',') s
This can be compacted and filtered with QUALIFY to get just the rows you want:
SELECT
SPLIT_PART(s.value, ':', 2) AS val
FROM table t
,LATERAL SPLIT_TO_TABLE(TRIM(t.value,'{}'), ',') s
QUALIFY SPLIT_PART(s.value, ':', 1) = 'id'
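The trim, split, and pick steps of that query can be sketched in Python to see what each stage produces. The sample strings are invented for illustration (the question's example rows are not shown here), and value_for_key is a hypothetical helper, not a Snowflake function.

```python
from typing import Optional


def value_for_key(raw: str, key: str) -> Optional[str]:
    """Return the text after `key:` in a string like '{id:syntax,type:noun}'."""
    body = raw.strip("{}")             # TRIM(t.value, '{}')
    for piece in body.split(","):      # SPLIT_TO_TABLE(..., ',')
        parts = piece.split(":")
        token = parts[0]                          # SPLIT_PART(piece, ':', 1)
        val = parts[1] if len(parts) > 1 else ""  # SPLIT_PART(piece, ':', 2)
        if token == key:               # the filter on token = 'id'
            return val
    return None


# Made-up sample rows; real data may quote keys and values differently.
print(value_for_key("{id:syntax,type:noun}", "id"))   # syntax
print(value_for_key("{type:place,id:village}", "id")) # village
```

As in the SQL, this only works while no comma or colon appears inside a value; if the keys are quoted in the real data, the quotes become part of the token and the comparison has to account for them.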

Sort a VARCHAR column in SQL Server that contains numbers?

I have a column in which data has letters with numbers.
For example:
1 name
2 names ....
100 names
When I sort this data, it is not sorted correctly. How can I fix this? I wrote a query, but it doesn't sort correctly:
select name_subagent
from Subagent
order by
case IsNumeric(name_subagent)
when 1 then Replicate('0', 100 - Len(name_subagent)) + name_subagent
else name_subagent
end
This should work:
select name_subagent
from Subagent
order by CAST(LEFT(name_subagent, PATINDEX('%[^0-9]%', name_subagent + 'a') - 1) as int)
This expression finds the first non-digit character within the string and assumes everything before that position is a number. Appending 'a' guarantees a match even when the value consists of digits only.
You will need to adapt this statement to your needs as apparently your data is not in Latin characters.
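The sort key built by that ORDER BY can be checked with a small Python sketch. The name leading_number is made up here, and the empty-prefix case returns 0 on the assumption that it should sort first.

```python
import re


def leading_number(name: str) -> int:
    """Integer value of the digits at the start of `name` (0 if none)."""
    padded = name + "a"  # sentinel, same trick as name_subagent + 'a'
    # 1-based position of the first non-digit, like PATINDEX('%[^0-9]%', ...)
    first_non_digit = re.search(r"[^0-9]", padded).start() + 1
    prefix = padded[: first_non_digit - 1]  # LEFT(..., position - 1)
    return int(prefix) if prefix else 0


names = ["100 names", "2 names", "1 name"]
print(sorted(names))                      # ['1 name', '100 names', '2 names']
print(sorted(names, key=leading_number))  # ['1 name', '2 names', '100 names']
```

The first line shows the plain string ordering the question complains about; the second shows the numeric ordering the CAST produces.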
With a bit of tweaking you should be able to achieve exactly what you're looking for:
select
name_subagent
from
Subagent
order by
CAST(SUBSTRING(name_subagent,0,PATINDEX('%[A-Z]%',name_subagent)) as numeric)
Note the '%[A-Z]%' expression: it only looks for the first occurrence of a letter within the string.
I'm not considering special characters such as '!', '#' and so on. This is the bit you might want to play around with and adapt to your needs.

Select everything to the right of a specific character

Given this data:
Home: (708) 296-2112
I want everything to the right of the : character.
This is what I have so far, but I'm getting no results:
right(phone1, locate(':', phone1 + ':')-1) phone
If I use left instead of right, I get just "HOME" - just for testing purposes. I know I'm close, but I'm missing something.
You can use SUBSTRING (it might be SUBSTR, depending on your version) instead:
SELECT SUBSTRING(phone1, LOCATE(':', phone1) + 1, LENGTH(phone1))
FROM yourtable
Here's a way to do it without hard-coding in Home:, so you can also use Office: or Mobile: or Fax:, or any other word followed by a colon.
This uses ADS's scripting ability to use a variable and the built-in System.iota single row table (similar to Oracle's dual). You can just use the last line, replacing test with the name of your column and system.iota with the name of your table.
declare test string;
set test = 'Home: (708) 296-2112';
select substring(test, position(':' in test) + 1, length(test)) from system.iota;
You were on the right track, but your algebra is off. You want to take the full length of the string offset by the position of the colon, minus one:
right(phone1, length(phone1) - locate(':', phone1) - 1)
You can use the RIGHT function as follows:
RIGHT(phone1, LEN(phone1)-CHARINDEX(':', phone1))
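The length arithmetic in the two RIGHT variants is easy to get off by one, so here is a Python sketch that reproduces it on the sample value. The helpers locate and right are stand-ins written for this example, imitating the 1-based SQL functions.

```python
def locate(needle: str, haystack: str) -> int:
    """1-based position of needle, 0 if absent (like LOCATE / CHARINDEX)."""
    return haystack.find(needle) + 1


def right(text: str, count: int) -> str:
    """Last `count` characters of text; empty string when count <= 0."""
    return text[max(len(text) - count, 0):] if count > 0 else ""


phone1 = "Home: (708) 296-2112"
colon = locate(":", phone1)  # 5

print(repr(right(phone1, len(phone1) - colon - 1)))  # '(708) 296-2112'
print(repr(right(phone1, len(phone1) - colon)))      # ' (708) 296-2112'
```

Subtracting one extra character drops the space after the colon; without it the result keeps that leading space and needs a trim.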

SQL Server pull out only data after = OR only the numerics

It seems that a regular expression would be ideal, yet some team members are not fond of regex...
Problem: Data in a column (from a mainframe flat file import) comes in two different forms:
BreakID = 83823737237
OR
MFR BreakID=482883
Thus, the differences are an optional space before the digits, a varying length of the alphabetic text before the equals sign, and a varying length of the number itself.
Seems I have a few approaches,
1. Everything after the = sign , and trim ?
2. regex , get only the numerics?
So I found this code, in which I assume PATINDEX is the standard way of doing regex in T-SQL? What is "string" in this?
SELECT SUBSTRING(string, PATINDEX('%[0-9]%', string), PATINDEX('%[0-9][^0-9]%', string + 't') - PATINDEX('%[0-9]%',
string) + 1) AS Number
How would this be solved with best practices?
Slightly different answer than scsimon. I usually go this route when I have to grab the vals at the end of a string. You reverse the string and grab position of the first instance of your key value ('=' in this case). Get that position with charindex, and then grab the RIGHT() chars using that charindex value.
DECLARE @val1 VARCHAR(100) = 'BreakID = 83823737237'
DECLARE @val2 VARCHAR(100) = 'MFR BreakID=482883'
SELECT
LTRIM(RTRIM(RIGHT(@val1, CHARINDEX('=', REVERSE(@val1), 0)-1)))
,LTRIM(RTRIM(RIGHT(@val2, CHARINDEX('=', REVERSE(@val2), 0)-1)))
This solution will play nice if you have weird cases, like if you have a company called SQL=Cool in your data and it needs an ID:
'SQL=CoolID = 12345'
and you wanted to still get 12345.
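The reverse-and-count trick can be traced in Python as a sketch. The function name after_last_equals is invented for this example, and returning an empty string when there is no '=' is a choice made here, not something the T-SQL above does.

```python
def after_last_equals(text: str) -> str:
    """Text after the last '=' in `text`, with surrounding spaces removed."""
    # CHARINDEX('=', REVERSE(text)): 1-based position in the reversed string,
    # which equals the distance of the last '=' from the end, plus one.
    pos_in_reversed = text[::-1].find("=") + 1
    count = pos_in_reversed - 1          # characters to the right of that '='
    if count <= 0:                       # no '=' at all, or '=' is the last char
        return ""
    return text[len(text) - count:].strip()  # LTRIM(RTRIM(RIGHT(text, count)))


print(after_last_equals("BreakID = 83823737237"))  # 83823737237
print(after_last_equals("MFR BreakID=482883"))     # 482883
print(after_last_equals("SQL=CoolID = 12345"))     # 12345
```

Searching from the end is what makes the 'SQL=CoolID = 12345' case work: the first '=' in the reversed string is the last one in the original.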
Seems like a good use case for substring and replace with charindex
We take the substring starting with the first character after the = and running for up to 99 characters (or however many you want to allow). We use replace to get rid of the leading space, if there is one.
select replace(substring(stringColumn,charindex('=',stringColumn) + 1,99),' ','')
That solution is good and versatile, although it sounds like your string will always have an = so you could write something more specific around that if you want to.
That solution finds the start location of the first number string:
PATINDEX('%[0-9]%', string)
And finds the position of the last digit of that number string, that is, the first digit that is followed by a non-digit (a 't' is appended to the string so that a value ending in a digit still matches; without it the length calculation would go negative and throw an error):
PATINDEX('%[0-9][^0-9]%', string + 't')
And finally it subtracts the start position of the number from the end position to find the length of the number string, and pulls that length out with substring:
SELECT SUBSTRING(string, PATINDEX('%[0-9]%', string), PATINDEX('%[0-9][^0-9]%', string + 't') - PATINDEX('%[0-9]%',
string) + 1) AS Number
Here "string" is a placeholder that should be replaced with your column name. Also, the easiest way to test stuff like this in tsql is to use a variable:
DECLARE @string varchar(100) = 'foo bar la la la 83823737237'
SELECT SUBSTRING(@string, PATINDEX('%[0-9]%', @string), PATINDEX('%[0-9][^0-9]%', @string + 't') - PATINDEX('%[0-9]%',
@string) + 1) AS Number
Output:
83823737237
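The start/end arithmetic of that expression can be followed step by step in a Python sketch, using re.search in place of PATINDEX. The name first_number is made up for this example, and returning an empty string when there are no digits is an assumption of the sketch.

```python
import re


def first_number(text: str) -> str:
    """First run of digits in `text`, computed the way the SUBSTRING call does."""
    first_digit = re.search(r"[0-9]", text)
    if first_digit is None:
        return ""
    start = first_digit.start() + 1  # PATINDEX('%[0-9]%', text), 1-based
    # PATINDEX('%[0-9][^0-9]%', text + 't'): position of the first digit that
    # is followed by a non-digit, i.e. the last digit of the first run.
    end = re.search(r"[0-9][^0-9]", text + "t").start() + 1
    length = end - start + 1
    return text[start - 1 : start - 1 + length]  # SUBSTRING(text, start, length)


print(first_number("foo bar la la la 83823737237"))  # 83823737237
print(first_number("MFR BreakID=482883"))            # 482883
print(first_number("abc12def345"))                   # 12
```

The last call shows a limit of the approach: only the first run of digits is returned, so any later digits in the string are ignored.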
Kaizen: go for the simple solution, not the perfect one:
SELECT substring(c, charindex('=', c) + 1, 999)
I'm assuming the column you're putting this in is some kind of number. SQL Server doesn't care about leading spaces when casting to a number.
If it's going in a string column, then wrap it in an ltrim().
Now to your questions
1 .. trim
Sure, as above
2 regex...
Not implemented in sqlserver unless you use CLR
PATINDEX ...
It's like regex, but it's a very limited subset that only does searching, only returns one string index, doesn't capture, and has limited/no character classes. It's more like DOS/VB6 wildcards or LIKE than regex.
...best practice?
Look at it simply: you're getting the part of a string after an =, not landing on the moon. The best solution to minor optimisations like these is the one that requires the least mental effort from the next human who takes over your job to get up to speed with it (it'll still be in use in 20 years) :)

Using LIKE to Get Substrings

How can I use LIKE to extract substrings of a VARCHAR column?
ie. I have the following records:
1: "#D.Test1 some text"
2: "some other text #D.Test3"
3: "text #D.Test1 text"
4: "text #D.Test14 text"
Now I want to build a list of unique values matching a pattern.
SELECT DISTINCT DoSomethingToExpr(expr) AS output FROM tbl WHERE expr LIKE '%#%'
What do I replace DoSomethingToExpr(expr) with in order to extract these variable-length matches? I can write a more sophisticated pattern to match the full values, but where can I use that? I don't see any straightforward way to make the actual substring function work well here with each case. My desired output would be something like:
1: #D.Test1
2: #D.Test3
3: #D.Test14
I'm using both Oracle and MS-Access, so a solution that can be adapted to both is preferable.
It's not pretty, but here's a solution using basic string functions:
SELECT SUBSTRING(
expr,
CHARINDEX( '#', expr ),
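ISNULL(
NULLIF(
CHARINDEX( ' ', expr, CHARINDEX( '#', expr ) ),
0 ),
LEN( expr ) + 1 ) -- no space found: pretend one sits just past the end
- CHARINDEX( '#', expr ) ) -- length up to, but not including, that space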
AS output
FROM tbl
WHERE expr LIKE '%#%'
Of course, this has some downsides. It expects the #D.Test… string to be followed by either a space or the end of the string. If it can be followed by any other character, you'd have to tweak this. However, if you want to do anything more complicated than this, you might be better off doing it in your application code rather than in SQL.
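The rule described above (start at '#', stop at the next space or at the end of the string) can be sketched in Python against the four sample records from the question. The name extract_tag is hypothetical, and the sketch implements the stated rule rather than translating the SQL line by line.

```python
from typing import Optional


def extract_tag(expr: str) -> Optional[str]:
    """Substring from the first '#' up to the next space or end of string."""
    start = expr.find("#")
    if start == -1:                  # row would be filtered out by LIKE '%#%'
        return None
    space = expr.find(" ", start)    # first space at or after the '#'
    end = space if space != -1 else len(expr)
    return expr[start:end]


records = [
    "#D.Test1 some text",
    "some other text #D.Test3",
    "text #D.Test1 text",
    "text #D.Test14 text",
]

# dict.fromkeys keeps the first occurrence of each value, like DISTINCT
# but in first-seen order.
tags = list(dict.fromkeys(extract_tag(r) for r in records))
print(tags)  # ['#D.Test1', '#D.Test3', '#D.Test14']
```

If the token can be followed by punctuation instead of a space, the stop condition is the part that has to change.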
Unfortunately, LIKE is not a regex engine. With LIKE you can only test whether a text matches a pattern using wildcards; you cannot extract the matching substring in a simple way with these wildcards.
I suggest splitting the logic between the query and the front-end. The query would only test for the existence of "#", for instance, and the front-end would perform more sophisticated processing on the returned records in VBA or whatever language you are using.