How to split names into first/middle/last if there are people who typed last/first/middle? Order known - sql

I am trying to split names into first, middle, and last, based on an indicated order. I am at a loss on how to do this, and any help would be much appreciated. I am using SQL Server 2008 for work.
I attached an example dataset and the ideal dataset I would like to create.
ID ORDER NAME
1 first, middle, last Bruce, Batman, Wayne
2 middle, last, first Superman, Kent, Clark
3 last, first, middle Prince, Diana, Wonderwoman
INTO:
ID ORDER NAME
1 first Bruce
1 middle Batman
1 last Wayne
2 middle Superman
2 last Kent
2 first Clark
3 last Prince
3 first Diana
3 middle Wonderwoman

SQL Server does not have very good string processing functions. You can do this using a recursive CTE, though:
with cte as (
select id,
convert(varchar(max), left(ord, charindex(',', ord) - 1)) as ord,
convert(varchar(max), left(name, charindex(',', name) - 1)) as name,
convert(varchar(max), stuff(ord, 1, charindex(',', ord) + 1, '')) as ord_rest,
convert(varchar(max), stuff(name, 1, charindex(',', name) + 1, '')) as name_rest,
1 as lev
from t
union all
select id,
convert(varchar(max), left(ord_rest, charindex(',', ord_rest + ',') - 1)) as ord,
convert(varchar(max), left(name_rest, charindex(',', name_rest + ',') - 1)) as name,
convert(varchar(max), stuff(ord_rest, 1, charindex(',', ord_rest + ',') + 1, '')) as ord_rest,
convert(varchar(max), stuff(name_rest, 1, charindex(',', name_rest + ',') + 1, '')) as name_rest,
lev + 1
from cte
where ord_rest <> '' and lev < 10
)
select id, ord, name
from cte
order by id, lev
Here is a db<>fiddle.

With the help of a parse/split function that returns the sequence, this becomes a small matter using a CROSS APPLY
Example
Select A.ID
,B.*
From YourTable A
Cross Apply (
Select [Order] = B1.RetVal
,[Name] = B2.RetVal
From [dbo].[tvf-Str-Parse]([ORDER],',') B1
Join [dbo].[tvf-Str-Parse]([NAME] ,',') B2 on B1.RetSeq=B2.RetSeq
) B
Returns
ID Order Name
1 first Bruce
1 middle Batman
1 last Wayne
2 middle Superman
2 last Kent
2 first Clark
3 last Prince
3 first Diana
3 middle Wonderwoman
The Function if Interested
CREATE FUNCTION [dbo].[tvf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
Select RetSeq = row_number() over (order by 1/0)
,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
From ( values (cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.'))) as A(x)
Cross Apply x.nodes('x') AS B(i)
);

I found the other answers a bit hard to follow. They're neat tricks for sure, but I think anyone coming to maintain them might be like "whaaaat?". Here I work out the indexes of the commas in the first CTE (the first comma's index goes in o1/n1, the second comma's in o2/n2), cut the string up in the second CTE (substring from 1 to the first comma, substring between the first and second commas, substring after the second comma), and then use a couple of UNION ALLs to turn the results from 7 columns into 3:
WITH idxs AS
(
SELECT
id,
[order],
name,
CHARINDEX(',', [order]) as o1,
CHARINDEX(',', [order], CHARINDEX(',', [order]) + 1) as o2,
CHARINDEX(',', name) as n1,
CHARINDEX(',', name, CHARINDEX(',', name) + 1) as n2
FROM
t
),
cuts as (
SELECT
id,
SUBSTRING([order], 1, o1-1) as ord1,
SUBSTRING([order], o1+1, o2-o1-1) as ord2,
SUBSTRING([order], o2+1, 4000) as ord3,
SUBSTRING(name, 1, n1-1) as nam1,
SUBSTRING(name, n1+1, n2-n1-1) as nam2,
SUBSTRING(name, n2+1, 4000) as nam3
FROM
idxs
)
SELECT id, ord1 as [order], nam1 as name FROM cuts
UNION ALL
SELECT id, ord2, nam2 FROM cuts
UNION ALL
SELECT id, ord3, nam3 FROM cuts
Note that if your data sometimes has spaces after the commas and sometimes does not, you'll benefit from wrapping the output columns in LTRIM/RTRIM.
If the spaces are always there after a comma, you could instead adjust the substring indexes to cut the spaces out: any start index that is x+1 would become x+2, and the length expressions would subtract 2 instead of 1 (for example, o2-o1-2).
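The LTRIM/RTRIM variant can be sketched as follows. This is a minimal illustration, not part of the original answer: the inline sample rows are assumptions (row 2 deliberately has no spaces after its commas), and CROSS APPLY (VALUES ...) is swapped in for the three UNION ALL branches so that a piece number is available for ordering. Both constructs are available from SQL Server 2008.

```sql
-- Sample rows are assumptions; row 2 has no spaces after the commas.
WITH t AS (
    SELECT *
    FROM (VALUES (1, 'first, middle, last', 'Bruce, Batman, Wayne'),
                 (2, 'middle,last,first',   'Superman,Kent,Clark')) v(id, [order], name)
),
idxs AS (
    -- Positions of the first and second comma in each column.
    SELECT id, [order], name,
           CHARINDEX(',', [order]) AS o1,
           CHARINDEX(',', [order], CHARINDEX(',', [order]) + 1) AS o2,
           CHARINDEX(',', name) AS n1,
           CHARINDEX(',', name, CHARINDEX(',', name) + 1) AS n2
    FROM t
)
SELECT i.id,
       LTRIM(RTRIM(p.ord)) AS [order],  -- trimming copes with optional spaces
       LTRIM(RTRIM(p.nam)) AS name
FROM idxs i
CROSS APPLY (VALUES
    (1, SUBSTRING(i.[order], 1, i.o1 - 1),               SUBSTRING(i.name, 1, i.n1 - 1)),
    (2, SUBSTRING(i.[order], i.o1 + 1, i.o2 - i.o1 - 1), SUBSTRING(i.name, i.n1 + 1, i.n2 - i.n1 - 1)),
    (3, SUBSTRING(i.[order], i.o2 + 1, 4000),            SUBSTRING(i.name, i.n2 + 1, 4000))
) p(piece, ord, nam)
ORDER BY i.id, p.piece;
```

Like the original, this assumes every row contains exactly two commas in each column; a row with fewer would produce a negative SUBSTRING length and fail.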

Related

Extracting numbers separately from string column

I have a table stat that is VARCHAR type and would like to grab all the numbers within it individually. For example, if a specific record in the column has the value 12 to 24 MONTHS Fl then I would like to grab 12 and 24 separately. I've seen other posts where the numbers end up grouped together and would be 1224 in this case, but how could I do the aforementioned separating of the numbers? Given that I do not know the number of digits in each of the numbers, I was wondering how best to do this. Thanks
For an example like 12 to 24 months APY1.8semi the output would need to be 12,24 and 1.8, but to be clear there are only whole numbers and there aren't any . characters in the column.
I shamelessly copied this answer from another post but made a small modification to preserve your spaces. This one is essentially replacing letters with the # symbol, then replacing the # symbol.
select id, REPLACE(TRANSLATE([comments], 'abcdefghijklmnopqrstuvwxyz+()- ,#+', '##################################'), '#', ' ')
from my_data
id    (No column name)
1     12 24
2     12 24 1.8
fiddle
Or if you would prefer results as a tall table, then you could apply the string_split function.
select id, value from (
select id, ca.value
from my_data
cross apply string_split (REPLACE(TRANSLATE([comments], 'abcdefghijklmnopqrstuvwxyz+()- ,#+', '##################################'), '#', ','), ',')ca
)z
where value <> ''
id    value
1     12
1     24
2     12
2     24
2     1.8
fiddle
First we create this function.
create function [dbo].[GetNumbersFromText](@String varchar(2000))
returns table as return
(
with C as
(
select cast(substring(S.Value, S1.Pos, S2.L) as decimal(10,2)) as Number,
stuff(s.Value, 1, S1.Pos + S2.L, '') as Value
from (select @String+' ') as S(Value)
cross apply (select patindex('%[0-9.]%', S.Value)) as S1(Pos)
cross apply (select patindex('%[^0-9.]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
union all
select cast(substring(S.Value, S1.Pos, S2.L) as decimal(10,2)),
stuff(S.Value, 1, S1.Pos + S2.L, '')
from C as S
cross apply (select patindex('%[0-9.]%', S.Value)) as S1(Pos)
cross apply (select patindex('%[^0-9.]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
where patindex('%[0-9.]%', S.Value) > 0
)
select number
from C
)
Then we use it in our query.
select string_agg(number, ', ') as result from GetNumbersFromText('12 to 24 months APY1.8semi')
result
12.00, 24.00, 1.80
Fiddle

SQL Server : to get last 4 character in first column and get the first letter of the words in 2nd column but ignore non alphabets

Is it possible to write SQL that meets this requirement? I am trying to derive the value of a new column from two existing columns, ID and Description.
For ID, simply retrieve the last 4 characters.
For Description, I would like to get the first letter of each word, ignoring numbers and symbols.
SQL Server has lousy string processing capabilities. Even string_split() doesn't preserve the order of the words that it finds.
One approach to this uses a recursive CTE to split the strings and accumulate the initials:
with t as (
select v.*
from (values (2004120, 'soccer field 2010'), (2004121, 'ruby field')) v(id, description)
),
cte as (
select id, description, convert(varchar(max), left(description, charindex(' ', description + ' '))) as word,
convert(varchar(max), stuff(description, 1, charindex(' ', description + ' ') , '')) as rest,
1 as lev,
(case when description like '[a-zA-Z]%' then convert(varchar(max), left(description, 1)) else '' end) as inits
from t
union all
select id, description, convert(varchar(max), left(rest, charindex(' ', rest + ' '))) as word,
convert(varchar(max), stuff(rest, 1, charindex(' ', rest + ' ') , '')) as rest,
lev + 1,
(case when rest like '[a-zA-Z]%' then convert(varchar(max), inits + left(rest, 1)) else inits end) as inits
from cte
where rest > ''
)
select id, description, inits + right(id, 4)
from (select cte.*, max(lev) over (partition by id) as max_lev
from cte
) cte
where lev = max_lev;
Here is a db<>fiddle.
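The ordering limitation applies to the two-argument form of STRING_SPLIT. On SQL Server 2022 or Azure SQL Database, an optional third argument (enable_ordinal) adds an ordinal column, so the same result can be sketched without recursion. This is an alternative under that version assumption, not the approach used above; it also relies on STRING_AGG (SQL Server 2017 and later), and the inline sample rows are copied from the CTE above.

```sql
-- Requires SQL Server 2022+ / Azure SQL Database for the enable_ordinal argument.
SELECT t.id,
       t.description,
       ISNULL((SELECT STRING_AGG(LEFT(s.value, 1), '') WITHIN GROUP (ORDER BY s.ordinal)
               FROM STRING_SPLIT(t.description, ' ', 1) AS s  -- 1 = return the ordinal column
               WHERE s.value LIKE '[a-zA-Z]%'),               -- keep only words starting with a letter
              '')
       + RIGHT(t.id, 4) AS new_value
FROM (VALUES (2004120, 'soccer field 2010'),
             (2004121, 'ruby field')) AS t(id, description);
```

For the two sample rows this should produce sf4120 and rf4121. The ISNULL guards against descriptions with no alphabetic words, where STRING_AGG would return NULL.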
To get the last 4 numbers of the ID you could use:
SELECT Id%10000 as New_Id from Tablename;
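To get the first letter of a word you could use the following (call the result String2):
LEFT(Description,1)
This is equivalent to using SUBSTRING(Description,1,1)
On its own this returns only the first letter of the whole description; to get the first letter of each word, the description first has to be split into words.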
To concatenate both of them you could use the CONCAT function:
SELECT CONCAT(String2,New_Id)
See more on the CONCAT function here

How To Split Separate Strings in 2 Different Columns in SQL Server

I have 2 columns of pipe delimited data that I need to break out into rows but the columns must stay together. Here's what my data looks like:
Plan Name: ABC|DEF|GHI|JKL
Plan Type: HMO|POS|HMO|PPO
I need to end up with 4 rows that look like this:
1 - ABC HMO
2 - DEF POS
3 - GHI HMO
4 - JKL PPO
I know how to separate each column individually using the STUFF function but how do I keep the first value from column 1 with the first value from column 2, etc? Don't know where to start. Appreciate any help!
p.s. - I am not on SQL Server 2016 so can't use STRING_SPLIT
One method is a recursive CTE:
with t as (
select *
from (values ('ABC|DEF|GHI|JKL', 'HMO|POS|HMO|PPO')) v(plannames, plantypes)
),
cte as (
select convert(varchar(max), left(plannames, charindex('|', plannames + '|') - 1)) as planname,
convert(varchar(max), left(plantypes, charindex('|', plantypes + '|') - 1)) as plantype,
convert(varchar(max), stuff(plannames, 1, charindex('|', plannames + '|'), '')) as planname_rest,
convert(varchar(max), stuff(plantypes, 1, charindex('|', plantypes + '|'), '')) as plantype_rest,
1 as lev
from t
union all
select convert(varchar(max), left(planname_rest, charindex('|', planname_rest + '|') - 1)) as planname,
convert(varchar(max), left(plantype_rest, charindex('|', plantype_rest + '|') - 1)) as plantype,
convert(varchar(max), stuff(planname_rest, 1, charindex('|', planname_rest + '|'), '')) as planname_rest,
convert(varchar(max), stuff(plantype_rest, 1, charindex('|', plantype_rest + '|'), '')) as plantype_rest,
lev + 1
from cte
where planname_rest <> ''
)
select *
from cte;
Here is a db<>fiddle.
Using delimitedsplit8k_lead you could do:
SELECT CONVERT(varchar(3), PN.ItemNumber) + ' - ' + PN.item + ' ' + PT.item
FROM YourTable YT
CROSS APPLY dbo.delimitedsplit8k_lead(YT.PlanName,'|') PN
CROSS APPLY dbo.delimitedsplit8k_lead(YT.PlanType,'|') PT
WHERE PN.ItemNumber = PT.ItemNumber;
This assumes PlanName and PlanType have the same number of elements.
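One way to check that assumption before splitting is to compare the number of delimiters in the two columns. The sketch below is an illustration only: an inline sample stands in for YourTable, and its second row is deliberately mismatched. DATALENGTH is used rather than LEN because LEN ignores trailing spaces.

```sql
-- Inline sample stands in for YourTable; the second row is deliberately mismatched.
SELECT v.PlanName, v.PlanType
FROM (VALUES ('ABC|DEF|GHI|JKL', 'HMO|POS|HMO|PPO'),
             ('ABC|DEF',         'HMO|POS|PPO')) v(PlanName, PlanType)
-- Delimiter count = length before minus length after removing the delimiter.
WHERE DATALENGTH(v.PlanName) - DATALENGTH(REPLACE(v.PlanName, '|', ''))
   <> DATALENGTH(v.PlanType) - DATALENGTH(REPLACE(v.PlanType, '|', ''));
```

Only the mismatched row is returned, so an empty result means every row has the same number of elements in both columns.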

How to get middle portion from Sql server table data?

I am trying to get First name from employee table, in employee table full_name is like this: Dow, Mike P.
I tried to get the first name using the syntax below, but the result includes the middle initial. How can I remove the middle initial from the first name, if there is one? Not every name contains a middle initial.
-- query--
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
len(Employee_First_Name)) AS FirstName
---> remove middle initial from right side from employee
-- result
Full_name      Firstname
Dow,Mike P.    Mike P.
--few example for Full_name data---
smith,joe j. --->joe (need result as)
smith,alan ---->alan (need result as)
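Instead of passing len as the length, use charindex again to find the first space after the comma. The third argument of charindex is the position to start searching from (not an occurrence number), so start just past the comma; the length is then the distance between the comma and that space. This assumes there is no space directly after the comma, as in smith,joe j.
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
CHARINDEX(' ', Employee_First_Name, CHARINDEX(',', Employee_First_Name) + 1) - CHARINDEX(',', Employee_First_Name) - 1) AS FirstName
One thing to note: the second charindex returns 0 if there is no space after the comma, which makes the length negative and raises an error. In that case, you would want to use something like the following:
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
IIF(CHARINDEX(' ', Employee_First_Name, CHARINDEX(',', Employee_First_Name) + 1) = 0, LEN(Employee_First_Name), CHARINDEX(' ', Employee_First_Name, CHARINDEX(',', Employee_First_Name) + 1) - CHARINDEX(',', Employee_First_Name) - 1)) AS FirstName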
This removes the portion before the comma.. then uses that string and removes everything after space.
WITH cte AS (
SELECT *
FROM (VALUES('smith,joe j.'),('smith,alan'),('joe smith')) t(fullname)
)
SELECT
SUBSTRING(
LTRIM(SUBSTRING(fullname,CHARINDEX(',',fullname) + 1,LEN(fullname))),
0,
COALESCE(NULLIF(CHARINDEX(' ',LTRIM(SUBSTRING(fullname,CHARINDEX(',',fullname) + 1,LEN(fullname)))),0),LEN(fullname)))
FROM cte
output
------
joe
alan
joe
To be honest, this is most easily expressed using multiple levels of logic. One way is using outer apply:
select ttt.firstname
from t outer apply
(select substring(t.full_name, charindex(', ', t.full_name) + 2, len(t.full_name)) as firstmi
) tt outer apply
(select (case when tt.firstmi like '% %'
then left(tt.firstmi, charindex(' ', tt.firstmi) - 1)
else tt.firstmi
end) as firstname
) as ttt
If you want to put this all in one complicated statement, I would suggest a computed column:
alter table t
add firstname as (stuff((case when full_name like '%, % %.'
then left(full_name,
charindex(' ', full_name, charindex(', ', full_name) + 2) - 1
)
else full_name
end),
1,
charindex(', ', full_name) + 1,
''))
If format of this full_name field is the same for all rows, you may utilize power of SQL FTS word breaker for this task:
SELECT N'Dow, Mike P.' AS full_name INTO #t
SELECT display_term FROM #t
CROSS APPLY sys.dm_fts_parser(N'"' + full_name + N'"', 1033, NULL, 1) p
WHERE occurrence = 2
DROP TABLE #t

How to split two words and number between two number?

My table Data looks like
Sno Componet Subcomponent IRNo
1 1 C1 to C100 001
2 1 C101 to C200 002
3 1 C201 to C300 003
4 1 C301,C400 004
5 1 C401,C500 005
If the user enters C50 in a textbox, the query should return the first row, because C50 falls between C1 and C100.
Likewise, if the user enters C340, it should return the row with Sno 4,
because C340 falls between C301 and C400.
How can I write the query for this in sql server?
This is a terrible design and should be replaced with a better one if possible.
If redesigning is not possible, then this answer by Eduard Uta is a good one, but it has one drawback compared to my suggested solution:
It assumes that the Subcomponent will always contain exactly one letter followed by a number, and that the range specified in the table has the same letter on both sides. A range like AB1 to AC100 might be possible (at least I don't think there's a way to prevent it using pure T-SQL).
This is the only reason I present my solution as well. Eduard already got my upvote.
DECLARE @Var varchar(50) = 'C50'
-- also try 'AB150' and 'C332'
;WITH CTE AS (
SELECT Sno, Comp, SubComp,
LEFT(FromValue, PATINDEX('%[0-9]%', FromValue)-1) As FromLetter,
CAST(RIGHT(FromValue, LEN(FromValue) - (PATINDEX('%[0-9]%', FromValue)-1)) as int) As FromNumber,
LEFT(ToValue, PATINDEX('%[0-9]%', ToValue)-1) As ToLetter,
CAST(RIGHT(ToValue, LEN(ToValue) - (PATINDEX('%[0-9]%', ToValue)-1)) as int) As ToNumber
FROM
(
SELECT Sno, Comp, SubComp,
LEFT(SubComp,
CASE WHEN CHARINDEX(' to ', SubComp) > 0 THEN
CHARINDEX(' to ', SubComp)-1
WHEN CHARINDEX(',', SubComp) > 0 THEN
CHARINDEX(',', SubComp)-1
END
) FromValue,
RIGHT(SubComp,
CASE WHEN CHARINDEX(' to ', SubComp) > 0 THEN
LEN(SubComp) - (CHARINDEX(' to ', SubComp) + 3)
WHEN CHARINDEX(',', SubComp) > 0 THEN
-- everything after the comma, even when the two sides differ in length
LEN(SubComp) - CHARINDEX(',', SubComp)
END
) ToValue
FROM T
) InnerQuery
)
SELECT Sno, Comp, SubComp
FROM CTE
WHERE LEFT(@Var, PATINDEX('%[0-9]%', @Var)-1) BETWEEN FromLetter AND ToLetter
AND CAST(RIGHT(@Var, LEN(@Var) - (PATINDEX('%[0-9]%', @Var)-1)) as int) BETWEEN FromNumber And ToNumber
sqlfiddle here
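As for the redesign suggested at the top of this answer, one possible shape is to store the prefix and the numeric bounds in separate columns, so the lookup needs no string parsing on the table side. This is only a sketch: the table variable @Ranges and its column names are hypothetical, and the rows are derived from the sample data in the question.

```sql
-- Hypothetical redesigned table: one row per range, prefix and bounds in their own columns.
DECLARE @Ranges TABLE (Sno int, Prefix varchar(10), FromNumber int, ToNumber int, IRNo char(3));
INSERT INTO @Ranges (Sno, Prefix, FromNumber, ToNumber, IRNo)
VALUES (1, 'C',   1, 100, '001'),
       (2, 'C', 101, 200, '002'),
       (3, 'C', 201, 300, '003'),
       (4, 'C', 301, 400, '004'),
       (5, 'C', 401, 500, '005');

DECLARE @Var varchar(50) = 'C340';
DECLARE @DigitPos int = PATINDEX('%[0-9]%', @Var);  -- position of the first digit in the input

SELECT Sno, IRNo
FROM @Ranges
WHERE Prefix = LEFT(@Var, @DigitPos - 1)
  AND CAST(SUBSTRING(@Var, @DigitPos, LEN(@Var)) AS int) BETWEEN FromNumber AND ToNumber;
```

With the input C340 this should return the row with Sno 4. Only the user's input is parsed; the stored ranges can be indexed and compared as plain integers.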
No comments about the design. One solution for your question is using a CTE to sanitize the range boundaries and get them to a format that you can work with like so:
DECLARE @inputVal varchar(100) = 'C340'
-- sanitize input:
SELECT @inputVal = RIGHT(@inputVal, (LEN(@inputVal)-1))
;WITH cte (Sno,
SubcomponentStart,
SubcomponentEnd,
IRNo
)
AS
(
SELECT
Sno,
CASE WHEN Subcomponent LIKE '%to%'
THEN REPLACE(SUBSTRING(Subcomponent, 2, CHARINDEX('to', Subcomponent)), 'to','')
ELSE REPLACE(SUBSTRING(Subcomponent, 2,CHARINDEX(',', Subcomponent)), ',','')
END as SubcomponentStart,
CASE WHEN Subcomponent LIKE '%to%'
THEN REPLACE(SUBSTRING(Subcomponent, CHARINDEX('to', Subcomponent)+4, LEN(Subcomponent)), 'to', '')
ELSE REPLACE(SUBSTRING(Subcomponent, CHARINDEX(',', Subcomponent)+3, LEN(Subcomponent)), ',', '')
END as SubcomponentEnd,
IRNo
from test
)
SELECT t.*
FROM test t
INNER JOIN cte c
ON t.Sno = c.Sno
WHERE CAST(@inputVal as int) BETWEEN CAST(c.SubcomponentStart as INT) AND CAST(c.SubcomponentEnd as INT)
SQL Fiddle / tested here: http://sqlfiddle.com/#!6/1b9f0/19
For example, suppose you receive the user's entry in the variable @UserEntry and its value is 'C5'.
-- Start From Here --
set @UserEntry = substring(@UserEntry,2,len(@UserEntry)-1)
select * from <tablename> where convert(int,@UserEntry)>=convert(int,SUBSTRING(Subcomponent,2,charindex('to',Subcomponent,1)-2)) and convert(int,@UserEntry)<=convert(int,(SUBSTRING(Subcomponent,charindex('c',Subcomponent,2)+1,len(Subcomponent)-charindex('c',Subcomponent,3))))