Extract REGEXP Substring SQL Server

I want to extract a specific numeric value from a column in SQL Server.
Column: name
Four rows for this example:
1/2 Product1 some_description 1200 GR more data... (UND)
1/4 Product2 some_description 2400G more data (KG more data)
Product3 some_description 2200 GRS more data...
1/4 Product4 some_description 1800GR more data UND...
I want only the integer value. The query should return:
1200
2400
2200
1800
The patterns are:
[0-9]{4} G
[0-9]{4}GR
[0-9]{4} GRS
How can I use these regular expressions in a SQL query to parse the value out of the column? This is what I have so far:
SELECT SUBSTRING(name, PATINDEX('%[0-9]%', name), 4) AS peso
FROM myTable
This extracts some values, but not correctly. I think I could apply LEFT with a length up to the integer value, but I don't know how to work it out.

You are close. In SQL Server, the simplest method, if the value always sits at the start of the string, is:
select left(name, 4) as peso
from mytable t;
That is, the first four characters are taken as the value. Note that this does not hold for the sample rows above, where the number appears in the middle of the string.
If the leading characters may not all be digits, use patindex() to keep only the leading run of digits:
select left(name, patindex('%[^0-9]%', name + ' ') - 1) as peso
from myTable t;

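CONVERT(INT,
    REPLACE(
        SUBSTRING(
            -- search and cut the same cleaned string so the position lines up
            -- (assumes no other digit precedes the weight once the fractions are removed)
            REPLACE(REPLACE(nome, '1/2', ''), '1/4', ''),
            PATINDEX('%[0-9]%', REPLACE(REPLACE(nome, '1/2', ''), '1/4', '')),
            4),
        'G', '')
) as pesoTotal
This resolved the question, thanks.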

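It should be something like SUBSTRING(name, PATINDEX('%[0-9][0-9][0-9][0-9][G ][GR ]%', name), 4), provided the data always follows the same pattern. The last character class also allows a space so that values such as '2400G ' match. You will still get some false positives ('1400 Rhinos', '3216GGG' and the like).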
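The patterns listed in the question can be prototyped outside the database before translating them to PATINDEX. Here is a minimal Python sketch, assuming the weight is always four digits followed by an optional space and a G. The name extract_weight is made up for illustration, and Python's re module is used only because PATINDEX has no quantifiers such as {4}:

```python
import re

# The four sample rows from the question.
rows = [
    "1/2 Product1 some_description 1200 GR more data... (UND)",
    "1/4 Product2 some_description 2400G more data (KG more data)",
    "Product3 some_description 2200 GRS more data...",
    "1/4 Product4 some_description 1800GR more data UND...",
]

# Four digits, an optional space, then 'G' (covers G, GR and GRS).
WEIGHT = re.compile(r"(\d{4}) ?G")

def extract_weight(text):
    """Return the weight as an int, or None when no pattern matches."""
    match = WEIGHT.search(text)
    return int(match.group(1)) if match else None

weights = [extract_weight(row) for row in rows]
print(weights)
```

Like the PATINDEX version, this would also accept strings such as '3216GGG', so it is only as strict as the data requires.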

Related

Formatting Phone Number to US Format (###) ###-####

I am trying to reformat around 1000 phone numbers in a SQL Server database to the US format (###) ###-####.
Currently the phone numbers are formatted in all sorts of ways, ranging from ##########, ###-###-####, one is ###)-###-####. There is also one with only six digits.
As a first step I've been attempting to isolate the digits in each of these rows, but the query below just returns the values as they already were.
select SUBSTRING(phone, PATINDEX('%[0-9]%', phone), LEN(phone)) from people
How could I best go about writing a query which would format them all as (###) ###-####?
expected output:
(555) 222-3333
(555) 444-3030
(555) 092-0920
(555) 444-4444

One suggestion has already been made, and its way of isolating the digits in a string uses a while loop, so I want to post an alternative that doesn't use any looping. Instead it uses a tally (numbers) table. There are lots of ways to build one. I like to use a view, which is lightning fast and has zero reads.
Here is my version of a tally table.
create View [dbo].[cteTally] as
WITH
    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) dt(n)), -- 10 rows
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 10^2 = 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 10^4 = 10,000 rows max
    cteTally(N) AS
    (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    )
select N from cteTally
Next we need a table-valued function that removes the characters that are not digits, using our tally table. This is also very fast because we are using the tally table instead of looping.
create function GetOnlyNumbers
(
    @SearchVal varchar(8000)
) returns table as return
    with MyValues as
    (
        select substring(@SearchVal, N, 1) as number
            , t.N
        from cteTally t
        where N <= len(@SearchVal)
            and substring(@SearchVal, N, 1) like '[0-9]'
    )
    select distinct NumValue = STUFF((select number + ''
                from MyValues mv2
                order by mv2.N
                for xml path('')), 1, 0, '')
    from MyValues mv
Now that we have all the legwork done we can focus on the task at hand. Since you didn't provide any sample data I just made up some stuff. I am not really sure if this is representative of your data or not but this works on the sample data I created.
if OBJECT_ID('tempdb..#Something') is not null
drop table #Something
create table #Something(SomeVal varchar(100))
insert #Something values
('Maybe you have other stuff in here. 5552223333 additional characters can cause grief')
, ('321-654-9878')
, ('123)-333-4444')
, ('1234567')
select replace(format(try_convert(bigint, n.NumValue), '(###) ###-####'), '() ', '')
, n.NumValue
from #Something s
cross apply dbo.GetOnlyNumbers(s.SomeVal) n
The output for the formatted data looks like this:
(555) 222-3333
(321) 654-9878
(123) 333-4444
123-4567
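The same two steps (keep only the digits, then apply a mask that depends on the digit count) can be checked quickly outside SQL Server. This is a rough Python sketch, not part of the answer above. The function name format_us_phone is hypothetical, and returning None for unexpected lengths is an assumption about how such rows should be handled:

```python
import re

def format_us_phone(raw):
    """Strip non-digits, then format 10-digit and 7-digit numbers."""
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    if len(digits) == 7:
        return f"{digits[:3]}-{digits[3:]}"
    # Any other length (for example the six-digit row) needs a manual decision.
    return None

samples = [
    "Maybe you have other stuff in here. 5552223333 additional characters can cause grief",
    "321-654-9878",
    "123)-333-4444",
    "1234567",
]
for value in samples:
    print(format_us_phone(value))
```

For the four sample values this produces the same strings as the output listed above.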

If this reformatting is something that is going to be used repeatedly, then creating a UDF as suggested by @GSerg would be the way to go.
If this is just a one-time clean-up you could give this a try.
First, strip every digit with a series of nested REPLACE() calls, so that only the non-numeric characters present in the data remain.
DECLARE @PhoneNumbers TABLE (
    Number varchar (20))
INSERT INTO @PhoneNumbers VALUES ('(888-239/1239')
INSERT INTO @PhoneNumbers VALUES ('222.1234')
SELECT
    REPLACE(
     REPLACE(
      REPLACE(
       REPLACE(
        REPLACE(
         REPLACE(
          REPLACE(
           REPLACE(
            REPLACE(
             REPLACE(Number, '0', '')
            , '1', '')
           , '2', '')
          , '3', '')
         , '4', '')
        , '5', '')
       , '6', '')
      , '7', '')
     , '8', '')
    , '9', '')
FROM @PhoneNumbers
Then take the non-numeric characters that result, remove each one with its own nested REPLACE(), and format what is left. You will have to deal with each length individually: if a number has only 7 digits and you want 10, what should the 3 missing digits be? The query below handles the 10-digit phone numbers.
SELECT FORMAT(x.NumbersOnly, '(###) ###-####')
FROM
(
    SELECT
        CONVERT(BIGINT,
            REPLACE(
             REPLACE(
              REPLACE(
               REPLACE(Number, '(', '')
              , '-', '')
             , '/', '')
            , '.', '')
        ) AS NumbersOnly
    FROM @PhoneNumbers
) x
WHERE LEN(x.NumbersOnly) = 10

Remove dot(.) from amount value using sqlite query

I want to remove dots (.) before decimal places in my amount value. Below is my database table
Query
SELECT sum( REPLACE( REPLACE( amt, ',', '' ) , ' ', '' ) ) FROM amt_demo
when I run the above query I get the following output.
Output
But I want the total sum of the values like:
1333.00
100000.50
100000.00
123456789
123456789
123456789
---------------
370571700.50
Any idea how can I solve this?

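Use the CEIL math function:
SELECT CEIL(sum( REPLACE( REPLACE( amt, ',', '' ) , ' ', '' ) )) FROM amt_demo
Note that CEIL rounds the total up to the next whole number, so a sum such as 370571700.50 would come back as 370571701.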

Removing one word in a string (or between two white spaces)

I have this:
Dr. LeBron Jordan
John Bon Jovi
I would like this:
Dr. Jordan
John Jovi
How do I go about it? I think it needs regexp_replace.
Thanks for looking.
Any help is much appreciated.

Here's a way using regexp_replace, as you mentioned, with several forms of a name for testing. It is more powerful than nested SUBSTR() and INSTR() calls, but you need to get your head around regular expressions, which give you far more matching power for complex patterns once you learn them:
with tbl as (
    select 'Dr. LeBron Jordan' data from dual
    union
    select 'John Bon Jovi' data from dual
    union
    select 'Yogi Bear' data from dual
    union
    select 'Madonna' data from dual
    union
    select 'Mr. Henry Cabot Henhouse' data from dual )
select regexp_replace(data, '^([^ ]*) .* ([^ ]*)$', '\1 \2') corrected_string from tbl;
CORRECTED_STRING
----------------
Dr. Jordan
John Jovi
Madonna
Mr. Henhouse
Yogi Bear
The regex can be read as:
^       At the start of the string (anchor the pattern to the start)
(       Start remembered group 1
[^ ]*   Zero or more characters that are not a space
)       End remembered group 1
space   Followed by a literal space
.       Followed by any character
*       Repeated zero or more times (applies to the preceding any-character)
space   Followed by another literal space
(       Start remembered group 2
[^ ]*   Zero or more characters that are not a space
)       End remembered group 2
$       At the end of the line (anchor the pattern to the end)
The '\1 \2' then means: return remembered group 1, followed by a space, followed by remembered group 2.
If the pattern cannot be found, the original string is returned. This can be seen by surrounding the returned groups with square brackets and running again:
...
select regexp_replace(data, '^([^ ]*) .* ([^ ]*)$', '[\1] [\2]')
corrected_string from tbl;
CORRECTED_STRING
[Dr.] [Jordan]
[John] [Jovi]
Madonna
[Mr.] [Henhouse]
Yogi Bear
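For anyone who wants to experiment with the pattern without an Oracle instance, the same expression behaves the same way for these inputs in Python's re module. This is only a sketch for testing the regex. The function name drop_middle_words is made up for illustration:

```python
import re

# Same pattern as the regexp_replace call: first word, anything, last word.
PATTERN = re.compile(r"^([^ ]*) .* ([^ ]*)$")

def drop_middle_words(name):
    # Keep groups 1 and 2; a string with fewer than two spaces does not
    # match and is returned unchanged.
    return PATTERN.sub(r"\1 \2", name)

names = [
    "Dr. LeBron Jordan",
    "John Bon Jovi",
    "Yogi Bear",
    "Madonna",
    "Mr. Henry Cabot Henhouse",
]
for name in names:
    print(drop_middle_words(name))
```

Because .* is greedy, every word between the first and the last is removed, which is why 'Mr. Henry Cabot Henhouse' collapses to 'Mr. Henhouse'.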

If the string has only two words, it is returned unchanged ("Lebron Jordan" returns "Lebron Jordan").
If it has three words, the middle word is removed ("Dr. LeBron Jordan" returns "Dr. Jordan").
DECLARE @firstSpace int = 0
DECLARE @secondSpace int = 0
DECLARE @string nvarchar(50) = 'Dr. Lebron Jordan'
SELECT @string = LTRIM(RTRIM(@string))
SELECT @firstSpace = CHARINDEX(' ', @string, 0)
SELECT @secondSpace = CHARINDEX(' ', @string, @firstSpace + 1)
IF @secondSpace = 0
BEGIN
    SELECT @string
END
ELSE
BEGIN
    SELECT SUBSTRING(@string, 0, @firstSpace) + SUBSTRING(@string, @secondSpace, (LEN(@string) - @secondSpace) + 1)
END

Try the single statement below in SQL Server:
declare @fullname varchar(200)
select @fullname = 'John Bon Jovi'
select substring(@fullname, 1, charindex(' ', @fullname, 1))
     + substring(@fullname, charindex(' ', @fullname, charindex(' ', @fullname, 1) + 1) + 1, len(@fullname) - charindex(' ', @fullname, charindex(' ', @fullname, 1)))
Try the statement below in Oracle:
select substr(name, 1, INSTR(name, ' ', 1, 1))
    || substr(name, INSTR(name, ' ', 1, 2) + 1, length(name) - INSTR(name, ' ', 1, 2)) from temp
I tried same example, please refer fiddle link : http://sqlfiddle.com/#!4/74986/31

Formatting a number as a monetary value including separators

I need some help with a sql transformation. This part of query that I have been provided with:
'$' + replace(cast((CAST(p.Price1 AS decimal(10,2)) * cast(isnull(p.Multiplier,1) as decimal(10,2))) as varchar), '.0000', '')
Basically, it ends up being a varchar that looks like this: $26980
I need to insert a comma at the thousand and million mark (if applicable). So in this instance, $26,980
What's the easiest way to do that without having to rewrite the whole thing?

Do it on the client side. Having said that, this example should show you the way.
with p(price1, multiplier) as (select 1234.5, 10)
select '$' + replace(cast((CAST(p.Price1 AS decimal(10,2)) * cast(isnull(p.Multiplier,1) as decimal(10,2))) as varchar), '.0000', ''),
'$' + parsename(convert(varchar,cast(p.price1*isnull(p.Multiplier,1) as money),1),2)
from p
The key is in the last expression
'$' + parsename(convert(varchar,cast(p.price1*isnull(p.Multiplier,1) as money),1),2)
Note: if p.price1 has a higher precision than decimal(10,2), you may have to cast it in this expression as well to produce a faithful translation, since the original CAST(p.Price1 AS decimal(10,2)) performs rounding.
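If the formatting is moved to the client side, as recommended at the top of this answer, it becomes a one-liner in most languages. Here is a hedged sketch in Python, assuming the client receives the raw price and multiplier. The function name format_price is hypothetical, and a missing multiplier is treated as 1 to mirror ISNULL(p.Multiplier, 1):

```python
from decimal import Decimal

def format_price(price, multiplier=None):
    """Return the total as '$' plus a whole number with thousands separators."""
    factor = Decimal(1) if multiplier is None else Decimal(str(multiplier))
    total = Decimal(str(price)) * factor
    # ',' requests thousands separators; '.0f' drops the decimal places.
    return "$" + format(total, ",.0f")

print(format_price(26980))
print(format_price("1234.5", 10))
```

One difference to keep in mind: the format call rounds to a whole number, while the parsename() expression simply keeps the part before the decimal point.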

If you really must do it in TSQL you can use CONVERT(), but this sort of thing really doesn't belong in the database:
declare @m money = 12345678
-- with decimal places
select '$' + convert(varchar, @m, 1)
-- without decimal places
select '$' + replace(convert(varchar, @m, 1), '.00', '')

You could turn this into a function; as written it only handles input of up to 50 characters.
DECLARE @input VARCHAR(50)
SELECT @input = '123123123.00'
SELECT @input = CASE WHEN CHARINDEX('.', @input) > offset + 1
                     THEN STUFF(@input, CHARINDEX('.', @input) - offset, 0, ',')
                     ELSE @input END
FROM (SELECT 3 offset UNION SELECT 7 UNION SELECT 11 UNION SELECT 15 UNION SELECT 19 UNION SELECT 23 UNION SELECT 27) b
PRINT @input
Each offset is 4 larger than the previous one (three digits plus one comma), because it assumes the commas for the earlier positions have already been inserted.

Interesting sql split with two chars

I need to write a function that splits a string on two characters.
Example: "Hyderabad Hyd,Test"
I need to split the string above on space (" ") and comma (",") and store the result in a table.
The output should be:
Hyderabad
Hyd,Test
Hyd
Test
CREATE function dbo.SplitString
(
    @str nvarchar(4000),
    @separator char(1)
)
returns table
AS
return (
    with tokens(p, a, b) AS (
        select
            1,
            1,
            charindex(@separator, @str)
        union all
        select
            p + 1,
            b + 1,
            charindex(@separator, @str, b + 1)
        from tokens
        where b > 0
    )
    select
        p-1 SNO,
        substring(
            @str,
            a,
            case when b > 0 then b-a ELSE 4000 end)
        AS word
    from tokens
)
Please help.
Thanks in advance.

For the results you showed, you don't need a new split function. Just a normal one that takes a list and a separator.
SELECT
    second_split.*
FROM
    dbo.fn_split(@myList, ' ') AS first_split
CROSS APPLY
(
    SELECT first_split.item
    UNION
    SELECT item FROM dbo.fn_split(first_split.item, ',')
)
AS second_split
The first_split will be Hyderabad and Hyd,Test.
The second split will be...
- Hyderabad UNION Hyderabad which is just Hyderabad
- Plus Hyd,Test UNION Hyd and Test
Giving...
Hyderabad
Hyd,Test
Hyd
Test
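The two-level split described above can be traced step by step in a few lines of Python. This is only an illustration of the logic, not a replacement for the SQL. The function name split_twice is made up, and the per-chunk de-duplication stands in for what UNION does inside the CROSS APPLY:

```python
def split_twice(text):
    """Split on spaces, then add the comma-separated parts of each chunk."""
    result = []
    for chunk in text.split(" "):
        pieces = []
        # The chunk itself, followed by its comma-separated parts.
        for item in [chunk] + chunk.split(","):
            # Like UNION, drop duplicates within this chunk
            # ('Hyderabad' would otherwise appear twice).
            if item not in pieces:
                pieces.append(item)
        result.extend(pieces)
    return result

print(split_twice("Hyderabad Hyd,Test"))
```

Unlike this sketch, the SQL query gives no guarantee about row order unless an ORDER BY is added.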

There are several ways to do this. You might want to explore creating a SQL CLR as it may be faster and easier to do the splits you are looking for.
http://msdn.microsoft.com/en-us/library/ms254498(v=vs.100).aspx
Here's a blog post that may help as well.
http://dataeducation.com/faster-more-scalable-sqlclr-string-splitting/
