Using SQL Server 2016, I have a need to scrub white space a certain way and implement INITCAP.
The whitespace scrubber is simple. I'm having trouble getting the INITCAP replacement working properly.
The accepted answer to Initcap equivalent in mssql is wrong, as noted in the first comment.
My data contains values that have multiple spaces in a row and special characters (&, %, etc.).
stuff(): In SQL Server 2016, string_split does not have an option to provide an ordinal value and does not guarantee that the results are returned in any specific order. So, I need to write code to ensure values are returned from string_split in the correct order.
convert(xml,...): Decodes most of the XML-encoded values.
convert(varchar(max),...): ...because the XML data type can't be used with SELECT DISTINCT.
SQL Fiddle
with T as (
select *
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
),
scrubbed as (
select T.s as InitialValue
, CASE
WHEN LEN(RTRIM(T.s)) > 0
THEN LTRIM(RTRIM(T.s))
END as s
from T
)
select distinct s.InitialValue
, stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
),
1,
1,
''
) as INITCAP_xml
, convert(
varchar(max),
convert(
xml,
stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH('')
),
1,
1,
''
)
)
) as INITCAP_decoded
from scrubbed s
You can see in the output that using FOR XML causes some of the characters to be encoded (like [space] = &#x20; and & = &amp;). By converting to the XML data type, some of those characters are decoded. But some characters (like &) remain encoded.
InitialValue                       INITCAP_attempt1                   INITCAP_xml                        INITCAP_decoded
------------------------------------------------------------------------------------------------------------------------------------------
Minesota Mining and Manufacturing  Minesota Mining And Manufacturing  Minesota Mining And Manufacturing  Minesota Mining And Manufacturing
Minesota Mining & Manufacturing    Minesota Mining & Manufacturing    Minesota Mining & Manufacturing    Minesota Mining & Manufacturing
tillamook                          Tillamook                          Tillamook                          Tillamook
MUTUAL OF OMAHA                    Mutual Of Omaha                    Mutual Of Omaha                    Mutual Of Omaha
                                   null                               null                               null
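The encoding round trip itself can be illustrated outside SQL Server. Below is a minimal Python sketch, not T-SQL: the function names are hypothetical, and the standard library's escape/unescape only stand in for what FOR XML does when it serializes text and what reading the value back as plain text undoes.

```python
from xml.sax.saxutils import escape, unescape


def concat_as_xml_text(words):
    """Join the words with spaces and XML-escape the result,
    roughly what serializing text through XML does to '&', '<' and '>'."""
    return escape(" ".join(words))


def read_back_as_text(xml_text):
    """Decode the XML entities again, as extracting the text value would."""
    return unescape(xml_text)


encoded = concat_as_xml_text(["Minesota", "Mining", "&", "Manufacturing"])
decoded = read_back_as_text(encoded)
print(encoded)
print(decoded)
```

The point of the sketch is only that the ampersand is escaped on the way in and has to be explicitly decoded on the way out; a one-off replace of a single entity does not generalize.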
REPLACE(s, '&amp;', '&') doesn't seem like a reasonable option because I don't know what other values I'll run into over time. Is there a good, general way to handle characters that will be encoded by FOR XML?
Within a view (so, without using user defined functions or stored procedures), is there a better way to implement INITCAP in SQL Server?
If you are interested in a scalar-valued function (SVF), here is a scaled-down version that allows customization and handles edge cases. For example, rather than Phd you would get PhD, and names such as MacDonald and O'Neil come out correctly.
This is a dramatically scaled-down version; my real rules and exceptions live in a generic mapping table.
Example
select *
,[dbo].[svf-Str-Proper] (S)
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
Results
s (No column name)
Minesota Mining and Manufacturing Minesota Mining And Manufacturing
Minesota Mining & Manufacturing Minesota Mining & Manufacturing
tillamook Tillamook
MUTUAL OF OMAHA Mutual Of Omaha
The Function if Interested
CREATE FUNCTION [dbo].[svf-Str-Proper] (@S varchar(max))
Returns varchar(max)
As
Begin
-- Lower-case the input, collapse runs of spaces to one, and pad both ends with a space
Set @S = ' '+ltrim(rtrim(replace(replace(replace(lower(@S),' ','†‡'),'‡†',''),'†‡',' ')))+' '
;with cte1 as (Select * From (Values(' '),('-'),('/'),('\'),('['),('{'),('('),('.'),(','),('&'),(' Mc'),(' Mac'),(' O''') ) A(P))
,cte2 as (Select * From (Values('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M')
,('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')
,('LLC'),('PhD'),('MD'),('DDS'),('II'),('III'),('IV')
) A(S))
,cte3 as (Select F = Lower(A.P+B.S),T = A.P+B.S From cte1 A Cross Join cte2 B
Union All
Select F = Lower(B.S+A.P),T = B.S+A.P From cte1 A Cross Join cte2 B where A.P in ('&')
)
-- Apply every lower-case -> proper-case replacement pair in turn
Select @S = replace(@S,F,T) From cte3
Return rtrim(ltrim(@S))
End
-- Syntax : Select [dbo].[svf-Str-Proper]('john cappelletti')
-- Select [dbo].[svf-Str-Proper]('james e. o''neil')
-- Select [dbo].[svf-Str-Proper]('CAPPELLETTI II,john old macdonald iv phd,dds llc b&o railroad bank-one at&t BD&I Bank-Five dr. Langdon,dds')
Please try the following solution.
It is using SQL Server XML, XQuery, and its FLWOR expression.
Notable points:
cast as xs:token? is taking care of the whitespaces, i.e:
All invisible TAB, Carriage Return, and Line Feed characters will be
replaced with spaces.
Then leading and trailing spaces are removed from the value.
Further, contiguous occurrences of more than one space will be replaced with a single space.
FLWOR expression is taking care of a proper case.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (tokens VARCHAR(MAX));
INSERT @tbl (tokens) VALUES
('mineSota Mining and MaNufacturing'),
('Minesota Mining & Manufacturing '),
(' tillamook'),
('MUTUAL OF OMAHA'),
(' ');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.*, scrubbed
, result = c.query('
for $x in /root/r/text()
return concat(upper-case(substring($x,1,1)),lower-case(substring($x,2,1000)))
').value('text()[1]', 'VARCHAR(MAX)')
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<r><![CDATA[' + tokens + ' ' + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)')) AS t1(scrubbed)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(scrubbed, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t2(c);
Output
tokens                             scrubbed                           result
-------------------------------------------------------------------------------------------------------
mineSota Mining and MaNufacturing  mineSota Mining and MaNufacturing  Minesota Mining And Manufacturing
Minesota Mining & Manufacturing    Minesota Mining & Manufacturing    Minesota Mining & Manufacturing
tillamook                          tillamook                          Tillamook
MUTUAL OF OMAHA                    MUTUAL OF OMAHA                    Mutual Of Omaha
                                                                      NULL
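For readers who want to check the whitespace and casing rules described in the notable points without a SQL Server instance, here is a rough Python sketch of the same logic. It is an approximation, not the XQuery implementation: the function names are hypothetical, and mapping blank input to None is an assumption made to mirror the NULL in the last output row.

```python
import re


def xs_token(value):
    """Approximate xs:token normalization: tabs, CR and LF become spaces,
    leading/trailing spaces are removed, and runs of spaces collapse to one."""
    value = re.sub(r"[\t\r\n]", " ", value)
    value = value.strip(" ")
    return re.sub(r" {2,}", " ", value)


def proper_case(value):
    """Upper-case the first character of every space-separated word and
    lower-case the rest. Blank input yields None (assumption of this sketch)."""
    scrubbed = xs_token(value)
    if not scrubbed:
        return None
    return " ".join(w[:1].upper() + w[1:].lower() for w in scrubbed.split(" "))


print(proper_case("mineSota   Mining and\tMaNufacturing "))
```

Because words are only split on spaces, characters such as & pass through untouched, which is the behavior the question asks for.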
You've made the classic SQL Server XML mistake: you cannot just use FOR XML PATH('') on its own.
You have to use the convoluted FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') construct to get properly decoded characters.
Here's your fixed version:
with T as (
select *
from (
values ('Minesota Mining and Manufacturing')
, ('Minesota Mining & Manufacturing ')
, (' tillamook')
, ('MUTUAL OF OMAHA')
, (' ')
) q(s)
),
scrubbed as (
select T.s as InitialValue
, CASE
WHEN LEN(RTRIM(T.s)) > 0
THEN LTRIM(RTRIM(T.s))
END as s
from T
)
select distinct s.InitialValue
, stuff(
(
SELECT ' ' + t2.word
from (
select str.value
, upper(substring(str.value, 1, 1)) +
case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
, charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
from string_split(s.s, ' ') str
) t2
order by t2.idx
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'),
1,
1,
''
) as INITCAP
from scrubbed s
Related
I have a STUFF expression that concatenates multiple records, and I put a line break after every second record. It works fine with this query:
STUFF((
SELECT CASE WHEN ROW_NUMBER() OVER (order by new_name) % 2 = 1 THEN CHAR(10) ELSE ',' END + new_name
FROM new_subcatagories
FOR XML PATH('')), 1, 1, '')
and the result is
Auditory,Kinesthetic vestibular
Multitasking,Planning & organization
Proprioception,Tactile
Vestibular tactile,Visual
But now I want to do the same with another column, on which I need a DISTINCT, and I can't get it to work. My query is:
STUFF((
SELECT distinct (CASE WHEN ROW_NUMBER() OVER (order by new_maincatgoriesname) % 2 = 1 THEN CHAR(10) ELSE ',' END
+ new_maincatgoriesname)
FOR XML PATH('')), 1, 1, '')
and the result comes out in various unexpected ways, for example
Executive Function
Sensory Discrimination
Sensory modulation ,Multitasking,Sensory Discrimination,Sensory modulation
or in other unexpected ways, whereas I want the result to be
Executive Function,Sensory Discrimination
Sensory modulation,Multitasking
If someone can help me, it will be really appreciated.
DISTINCT applies to the entire row so having an extra column populated with unneeded data (such as ROW_NUMBER()) would give invalid results.
To fix it you need to add another query nesting level.
DECLARE @Blah TABLE( new_maincatgoriesname VARCHAR( 200 ))
INSERT INTO @Blah
VALUES( 'Executive Function' ), ( 'Sensory Discrimination' ), ( 'Multitasking' ),
( 'Sensory Discrimination' ), ( 'Executive Function' ), ( 'Sensory modulation' )
SELECT
STUFF( CAST((
-- Step 2: manipulate result of Step 1
SELECT (CASE WHEN ROW_NUMBER() OVER (order by new_maincatgoriesname) % 2 = 1 THEN CHAR(10) ELSE ',' END + new_maincatgoriesname )
FROM
-- Step 1: Get distinct values
( SELECT DISTINCT new_maincatgoriesname
FROM @Blah ) AS MainQuery
FOR XML PATH('') ) AS VARCHAR( 2000 )), 1, 1, '' )
Output:
Executive Function,Multitasking
Sensory Discrimination,Sensory modulation
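The order of operations (deduplicate first, number the rows second) is the whole fix, and it can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name; it assumes a plain ordinal sort, which happens to agree with the SQL ordering for this sample data.

```python
def two_per_line(names):
    """Deduplicate, sort, then prefix odd-numbered rows with a newline and
    even-numbered rows with a comma; finally drop the leading character,
    as STUFF(..., 1, 1, '') does."""
    distinct = sorted(set(names))  # Step 1: distinct values only
    pieces = []
    for row_number, name in enumerate(distinct, start=1):  # Step 2: number them
        prefix = "\n" if row_number % 2 == 1 else ","
        pieces.append(prefix + name)
    return "".join(pieces)[1:]


sample = ["Executive Function", "Sensory Discrimination", "Multitasking",
          "Sensory Discrimination", "Executive Function", "Sensory modulation"]
print(two_per_line(sample))
```

If the numbering were done before the deduplication, every duplicate would carry a different row number and therefore a different prefix, so none of them would be removed.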
My requirement is as follows: I need to extract the data of all companies in the table whose name is similar to the input company name (the first 3 words of the input name must occur in the name; they may be in the middle).
My query works fine when the name has more than 3 words, but it fails for names with 3 words or fewer.
I don't understand how to incorporate the conditions in the WHERE clause.
My query is as below
select regno,name from ereg
where
(name like '%' +(
SELECT SUBSTRING(name, 0, CHARINDEX(' ', name, CHARINDEX(' ', name, CHARINDEX(' ', name, 0)+1)+1)) matchingwrd
FROM ereg where regno='C2113-UPD01')+'%')
script is as below
CREATE TABLE ereg(
regnoINT, name VARCHAR(50)
);
INSERT INTO ereg (regno,name)
values
('C2113-UPD01','future company Ltd'),
('C2223-UPD01','MY future company Ltd Corp'),
('C2113-UPD01','Prime Private Furnishings housing Ltd '),
('C26903-UPD01','My Prime Private Furnishings Service '),
For example, it works fine for regno='C2113-UPD01' and gives the output -->>'C26903-UPD01','My Prime Private Furnishings Service
but if the input is 'C2113-UPD01', my query fails and is not able to fetch the 'C2223-UPD01' company data
Your table creation script, the insert script and the data all seem not to have been taken from a working version. I had to clean everything up.
What I did to get the 3 words is append another space to the end of the name:
Your query still gave me trouble, but here is how I did it
;With cted as
(
Select regno, name,
SUBSTRING(name + ' ', 0, CHARINDEX(' ', name + ' ', CHARINDEX(' ', name + ' ', CHARINDEX(' ', name + ' ', 0)+1)+1)) as ThreeWords
from ereg
)
Select c1.regno, c1.name, c2.regno, c2.name
from cted c1
inner join cted c2 on c2.name like '%' + c1.ThreeWords + '%' and c1.regno <> c2.regno
Where c1.regno='C2213-UPD01' -- or c1.regno='C2113-UPD01'
Here is the fiddle
I want to perform a multiple-word search on a particular column. The words of the given search string may be in a different order. For example, I want to search for the book name "Harry Potter Dream world" in the books table using the LIKE operator and a regular expression.
I know that, using multiple LIKE operators, we can do this with the query below
SELECT *
FROM TABLE_1
WHERE bookname LIKE 'Harry Potter' OR bookname LIKE 'Heaven world'
In this case, I want to do this in a single query. I also tried the FREETEXT option, but that is not useful when I use a self-join. Kindly suggest any other alternatives to solve this.
Also, can you show how to use a regular expression to search for multiple words in SQL Server? I tried multiple options, but none of them worked for me.
How about this one...
DECLARE @phrase nvarchar(max) = 'Harry Potter Dream world'
;WITH words AS (
SELECT word = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM (
SELECT x =
CONVERT(XML, '<i>'
+ REPLACE(@phrase, ' ', '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
)
SELECT *
FROM
TABLE_1
CROSS APPLY (
SELECT found = 1
FROM words
WHERE bookname like '%' + word + '%') search
Searching with LIKE could lead to a great many hits, especially if the search string contains words such as "the" or "a"...
The following code first splits your search string into its words, then splits each book's name into its words and checks for full-word hits
DECLARE @tbl TABLE(ID INT, BookName VARCHAR(100));
INSERT INTO @tbl VALUES
(1,'Harry Potter')
,(2,'Dream world')
,(3,'A Midsumme Night''s Dream')
,(4,'Some other Book') --will not be found
,(5,'World of Warcraft');
DECLARE @phrase nvarchar(max) = 'Harry Potter o Dream world'
;WITH words AS (
SELECT word = z.i.value('.', 'nvarchar(max)')
FROM (SELECT CAST('<i>' + REPLACE(@phrase, ' ', '</i><i>') + '</i>' AS XML)) AS x(y)
CROSS APPLY x.y.nodes('/i') AS z(i)
)
SELECT *
FROM @tbl AS tbl
WHERE EXISTS
(
SELECT 1
FROM
(
SELECT z.i.value('.', 'nvarchar(max)')
FROM (SELECT CAST('<i>' + REPLACE(tbl.BookName, ' ', '</i><i>') + '</i>' AS XML)) AS x(y)
CROSS APPLY x.y.nodes('/i') AS z(i)
) AS checkWords(word)
WHERE EXISTS(SELECT 1 FROM words WHERE words.word=checkWords.word)
)
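The full-word matching idea can be sketched in Python as a set intersection. This is an illustration only: the function names are hypothetical, the comparison is lower-cased on the assumption that the SQL Server collation in use is case-insensitive, and empty fragments from repeated spaces are dropped.

```python
def split_words(text):
    """Split on single spaces, drop empty fragments, and lower-case the words."""
    return {w.lower() for w in text.split(" ") if w}


def full_word_hits(phrase, titles):
    """Return the titles that share at least one whole word with the phrase."""
    wanted = split_words(phrase)
    return [t for t in titles if wanted & split_words(t)]


titles = ["Harry Potter", "Dream world", "A Midsumme Night's Dream",
          "Some other Book", "World of Warcraft"]
print(full_word_hits("Harry Potter o Dream world", titles))
```

Note that the stray "o" in the phrase matches nothing here, whereas a LIKE '%o%' search would hit almost every title.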
I want to zero-pad each number between the "." characters to 3 digits. For example:
1.1.5.2 -> 001.001.005.002
1.2 -> 001.002
4.0 -> 004.000
4.3 -> 004.003
4.10 -> 004.010
This is my query:
SELECT ItemNo
FROM EstAsmTemp
This is fairly easy once you understand all the steps:
Split the string into the individual data points.
Convert the parsed values into the format you want.
Shove the new values back into a delimited list.
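The three steps can be sketched in Python before looking at the T-SQL. This is a minimal illustration with a hypothetical function name; unlike RIGHT('000' + value, 3), zfill never truncates a segment that is already longer than three digits.

```python
def pad_segments(value, width=3, sep="."):
    """Split on the separator, left-pad each piece with zeros, and rejoin."""
    parts = value.split(sep)                   # 1. split into data points
    padded = [p.zfill(width) for p in parts]   # 2. convert each piece
    return sep.join(padded)                    # 3. rebuild the delimited list


for item in ["1.1.5.2", "1.2", "4.0", "4.3", "4.10"]:
    print(item, "->", pad_segments(item))
```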
Ideally you shouldn't store data with multiple datapoints in a single intersection like this but sometimes you just have no choice.
I am using the string splitter from Jeff Moden and the community at Sql Server Central which can be found here. http://www.sqlservercentral.com/articles/Tally+Table/72993/. There are plenty of other decent string splitters out there. Here are some excellent examples of other options. http://sqlperformance.com/2012/07/t-sql-queries/split-strings.
Make sure you understand this code before you use it in your production system because it will be you that gets the phone call at 3am asking for it to be fixed.
with something(SomeValue) as
(
select '1.1.5.2' union all
select '1.2' union all
select '4.0' union all
select '4.3' union all
select '4.10'
)
, parsedValues as
(
select SomeValue
, right('000' + CAST(x.Item as varchar(3)), 3) as NewValue
, x.ItemNumber as SortOrder
from something s
cross apply dbo.DelimitedSplit8K(SomeValue, '.') x
)
select SomeValue
, STUFF((Select '.' + NewValue
from parsedValues pv2
where pv2.SomeValue = pv.SomeValue
order by pv2.SortOrder
FOR XML PATH('')), 1, 1, '') as Details
from parsedValues pv
group by pv.SomeValue
I decided to change it in the presentation layer, per Zohar Peled's comment.
You did not mention how many '.' separators a value can have. I assume the maximum is 4 parts (PARSENAME handles at most four), and the solution is below.
SELECT STUFF(ISNULL('.' + RIGHT('000' + PARSENAME(STRVALUE,4),3),'') + ISNULL('.' + RIGHT('000' + PARSENAME(STRVALUE,3),3) ,'') + ISNULL('.' + RIGHT('000' + PARSENAME(STRVALUE,2),3) ,'') + ISNULL('.' + RIGHT('000' + PARSENAME(STRVALUE,1),3),''),1,1,'')
FROM (VALUES('1.1.5.2'), ('1.2'), ('4.0'),('4.3'), ('4.10')) A (STRVALUE)
I have a table holding IDs in one column and a string in the second column like below.
COLUMN01 COLUMN02
----------------------------------------------------------------------------------
1 abc"11444,12,13"efg"14,15"hij"16,17,18,19"opqr
2 ahsdhg"21,22,23"ghshds"24,25"fgh"26,27,28,28"shgshsg
3 xvd"3142,32,33"hty"34,35"okli"36,37,38,39"adfd
Now I want to have the following result
COLUMN01 COLUMN02
-----------------------------------------------------------
1 11444,12,13,14,15,16,17,18,19
2 21,22,23,24,25,26,27,28,28
3 3142,32,33,34,35,36,37,38,39
How can I do that?
Thanks so much
Here is one way (maybe not the best, but it seems to work). I am NOT a SQL guru...
First, create this SQL Function. It came from: Extract numbers from a text in SQL Server
create function [dbo].[GetNumbersFromText](@String varchar(2000))
returns table as return
(
with C as
(
select cast(substring(S.Value, S1.Pos, S2.L) as int) as Number,
stuff(s.Value, 1, S1.Pos + S2.L, '') as Value
from (select @String+' ') as S(Value)
cross apply (select patindex('%[0-9]%', S.Value)) as S1(Pos)
cross apply (select patindex('%[^0-9]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
union all
select cast(substring(S.Value, S1.Pos, S2.L) as int),
stuff(S.Value, 1, S1.Pos + S2.L, '')
from C as S
cross apply (select patindex('%[0-9]%', S.Value)) as S1(Pos)
cross apply (select patindex('%[^0-9]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
where patindex('%[0-9]%', S.Value) > 0
)
select Number
from C
)
Then, you can do something like this to get the results you were asking for. Note that I broke the query up into 3 parts for clarity. And, obviously, you don't need to declare the table variable and insert data into it.
DECLARE @tbl
TABLE (
COLUMN01 int,
COLUMN02 varchar(max)
)
INSERT INTO @tbl VALUES (1, 'abc"11444,12,13"efg"14,15"hij"16,17,18,19"opqr')
INSERT INTO @tbl VALUES (2, 'ahsdhg"21,22,23"ghshds"24,25"fgh"26,27,28,28"shgshsg')
INSERT INTO @tbl VALUES (3, 'xvd"3142,32,33"hty"34,35"okli"36,37,38,39"adfd')
SELECT COLUMN01, SUBSTRING(COLUMN02, 2, LEN(COLUMN02) - 1) as COLUMN02 FROM
(
SELECT COLUMN01, REPLACE(COLUMN02, ' ', '') as COLUMN02 FROM
(
-- cast the int to varchar so that ',' + number concatenates instead of attempting an int conversion
SELECT COLUMN01, (select ',' + cast(number as varchar(10)) as 'data()' from dbo.GetNumbersFromText(Column02) for xml path('')) as COLUMN02 FROM @tbl
) t
) tt
GO
output:
COLUMN01 COLUMN02
1 11444,12,13,14,15,16,17,18,19
2 21,22,23,24,25,26,27,28,28
3 3142,32,33,34,35,36,37,38,39
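For comparison, the same extraction is short in a language with regular expressions. The Python sketch below is only an illustration with a hypothetical function name; converting each match through int mirrors the cast in the SQL function, so leading zeros would be dropped.

```python
import re


def numbers_csv(text):
    """Find every run of digits and join the numbers with commas."""
    return ",".join(str(int(m)) for m in re.findall(r"[0-9]+", text))


rows = {
    1: 'abc"11444,12,13"efg"14,15"hij"16,17,18,19"opqr',
    2: 'ahsdhg"21,22,23"ghshds"24,25"fgh"26,27,28,28"shgshsg',
    3: 'xvd"3142,32,33"hty"34,35"okli"36,37,38,39"adfd',
}
for key, value in rows.items():
    print(key, numbers_csv(value))
```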
I know you want to do it using SQL. But I once had nearly the same problem, and reading the data into a string in PHP or another language and then parsing it there is one way to do it. For example, you can use this kind of code after receiving the data into a string.
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}
For more information, you might want to look at the post I took the function from: Remove all special characters from a string
As I said, this is an easy way to do it. I hope this helps.