Removing spaces in complex names - sql

I am trying to insert complex names like Juan Carlos but I want to remove all the spaces except the one between Juan and Carlos. Lets use # as space to see spaces better.
When inserting I have tried RTRIM(LTRIM(#Name)) however It seems not to work, I tried to insert ###Jua#Car### but when I select the field with DATALENGTH([Name]) I get the lenght of 14.
As I see that string I can count 13 characters, not 14.
1. What is the character I cannot count?
2. How can I end up getting Juan#Carlos removing all the spaces if LTRIM and RTRIM does not work?
Update with more info:
The column datatype is nvarchar(100)
I just tried REPLACE([Name], ' ','') and the lenght i get is 12

You can trim non-alphanumeric characters using a somewhat complicated method:
select t2.name2
from t outer apply
(select (case when name like '%[a-zA-Z0-9]%'
then stuff(t.name, 1, patindex(t.name, '%[a-zA-Z0-9]%'), '')
else ''
end) as name1
) t1 outer apply
(select (case when name1 like '%[a-zA-Z0-9]%'
then left(t1.name1,
len(t1.name1) - patindex(reverse(t.name), '%[a-zA-Z0-9.]%')
)
else ''
end) as name2
) t2

Try to use in following, select separately FirstName, LastName and concat them with space:
DECLARE #FullName VARCHAR(MAX) = ' Juan Carlos '
SELECT LEN(SUBSTRING(LTRIM(RTRIM(#FullName)), 1, CHARINDEX(' ', LTRIM(RTRIM(#FullName))) - 1) + ' ' +
REVERSE(SUBSTRING(REVERSE(LTRIM(RTRIM(#FullName))), 1,
CHARINDEX(' ', REVERSE(LTRIM(RTRIM(#FullName)))) - 1) )) AS [Len]
It returning len = 11

Related

Split name into multiple parts in SELECT statement

I cannot seem to find an existing post on splitting a string into the parts I require. I have a database field in SQL Server that contains the "LastName FirstName MI" (no commas just spaces delimiting each part of a person's name). I have the following SQL to get the FirstName and Last, but cannot figure out how to get the Middle Initial or Middle Name.
Ex. Doe John B
SELECT
RTRIM(LEFT([PATIENT_NAME], CHARINDEX(' ', [PATIENT_NAME]))) AS LastName,
SUBSTRING([PATIENT_NAME], CHARINDEX(' ', [PATIENT_NAME]) + 1, LEN([PATIENT_NAME])) AS FirstName
FROM
Clients
Results in:
FirstName = John B
LastName = Doe
How to just return the first name without the middle initial and get the 'B' as middle name from this string in this SELECT statement?
You can either take the right 1 character, or reverse the string the take the first char.
SELECT RIGHT(LTRIM(RTRIM([Patient_Name])), 1) AS Middle_Initial
SELECT LEFT(REVERSE(LTRIM(RTRIM([Patient_Name]))), 1) AS Middle_Initial
As for removing MI from your firstname string, I would either find the length of the string and take the left N-2 chars or I would charindex the space and then take that many chars. To put it all together:
DECLARE #name VARCHAR(100) = 'Smith David M '
--Clean the string of leading/trailing whitespace
SELECT LTRIM(RTRIM(#name)) AS name_cleaned
--Find the first space to parse out the last name
SELECT CHARINDEX(' ', #name) AS first_space
--Select all chars before the first space
SELECT LEFT(LTRIM(RTRIM(#name)), CHARINDEX(' ', #name)-1) AS last_name
--Find the next space, use the starting location as the previous space and add 1
SELECT CHARINDEX(' ', #name, 7) AS second_space
--Select all chars between the spaces
SELECT SUBSTRING(#name, CHARINDEX(' ', #name)+1, CHARINDEX(' ', #name, 7) - CHARINDEX(' ', #name)) AS first_name
--Select the right most char for middle initial
SELECT RIGHT(LTRIM(RTRIM(#name)), 1) AS middle
You can REPLACE the space characters with period characters (.) and use PARSENAME().
Note that this would work for all 3 parts of the name, not just the middle initial.
When using the CHARINDEX on the last name, you'll use it as the length of the substring. Then, on the FirstName, use it again as start position on the substring. Now, the trick on the Middle, on the CHARINDEX, you have to include the start position which will be the LEN minus the LastName CHARINDEX. this would gives you the second space which is the position you want to start with for taking the Middle Name.
See the example below :
DECLARE #tb TABLE (PATIENT_NAME varchar(250));
INSERT INTO #tb VALUES
('Doe John B')
DECLARE
#LastName INT
, #Middle INT
SELECT
#LastName = CHARINDEX(' ', PATIENT_NAME)
, #Middle = CHARINDEX(' ', PATIENT_NAME, LEN(PATIENT_NAME) - CHARINDEX(' ', PATIENT_NAME))
FROM #tb
SELECT
SUBSTRING(PATIENT_NAME, 1, #LastName) LastName
, SUBSTRING(PATIENT_NAME, #LastName, LEN(PATIENT_NAME) - #LastName) FirstName
, SUBSTRING(PATIENT_NAME, #Middle, LEN(PATIENT_NAME) - #Middle + 1 ) Middle
FROM #tb
I have declared some variables to make things much readable, but you can do it without them.
Surely, LEFT and RIGHT are the easier approaches on taking the lastname and Middle Name. Along with using some helper functions such as REVERSE and TRIM, but I would prefer PARSENAME as a simpler and cleaner approach.
Here is an example :
SELECT
PARSENAME(REPLACE(PATIENT_NAME,' ','.'),3) LastName
, PARSENAME(REPLACE(PATIENT_NAME,' ','.'),2) FirstName
, PARSENAME(REPLACE(PATIENT_NAME,' ','.'),1) Middle
Since the number of elements you must extract from your string is fixed(3) you can use XML based split:
DECLARE #clients TABLE (PATIENT_NAME nvarchar(max));
INSERT INTO #clients VALUES
(' Doe John B ')
,(' Doe Jane C ')
,(' Doe Jill ')
;WITH Splitted
AS (
SELECT PATIENT_NAME as ORIGINAL_PATIENT_NAME
,REPLACE(REPLACE(REPLACE(ltrim(rtrim(PATIENT_NAME)),' ','<>'),'><',''),'<>',' ') as PATIENT_NAME
,CAST('<x>' + REPLACE(REPLACE(REPLACE(REPLACE(ltrim(rtrim(PATIENT_NAME)),' ','<>'),'><',''),'<>',' '), ' ', '</x><x>') + '</x>' AS XML) AS Parts
FROM #clients
)
SELECT
ORIGINAL_PATIENT_NAME
,PATIENT_NAME
,Parts.value(N'/x[1]', 'nvarchar(max)') AS LAST_NAME
,Parts.value(N'/x[2]', 'nvarchar(max)') AS FIRST_NAME
,Parts.value(N'/x[3]', 'nvarchar(max)') AS MIDDLE_NAME
FROM Splitted
Results:
As you can see it works even with random-spaced names.

How to get middle portion from Sql server table data?

I am trying to get First name from employee table, in employee table full_name is like this: Dow, Mike P.
I tried with to get first name using below syntax but it comes with Middle initial - how to remove middle initial from first name if any. because not all name contain middle initial value.
-- query--
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
len(Employee_First_Name)) AS FirstName
---> remove middle initial from right side from employee
-- result
Full_name Firstname Dow,Mike P. Mike P.
--few example for Full_name data---
smith,joe j. --->joe (need result as)
smith,alan ---->alan (need result as)
Instead of specifying the len you need to use charindex again, but specify that you want the second occurrence of a space.
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
CHARINDEX(' ', Employee_First_Name, 2)) AS FirstName
One thing to note, the second charindex can return 0 if there is no second occurence. In that case, you would want to use something like the following:
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
IIF(CHARINDEX(' ', Employee_First_Name, 2) = 0, Len(Employee_First_name), CHARINDEX(' ', Employee_First_Name, 2))) AS FirstName
This removes the portion before the comma.. then uses that string and removes everything after space.
WITH cte AS (
SELECT *
FROM (VALUES('smith,joe j.'),('smith,alan'),('joe smith')) t(fullname)
)
SELECT
SUBSTRING(
LTRIM(SUBSTRING(fullname,CHARINDEX(',',fullname) + 1,LEN(fullname))),
0,
COALESCE(NULLIF(CHARINDEX(' ',LTRIM(SUBSTRING(fullname,CHARINDEX(',',fullname) + 1,LEN(fullname)))),0),LEN(fullname)))
FROM cte
output
------
joe
alan
joe
To be honest, this is most easily expressed using multiple levels of logic. One way is using outer apply:
select ttt.firstname
from t outer apply
(select substring(t.full_name, charindex(', ', t.full_name) + 2, len(t.full_name) as firstmi
) tt outer apply
(select (case when tt.firstmi like '% %'
then left(tt.firstmi, charindex(' ', tt.firstmi)
else tt.firstmi
end) as firstname
) as ttt
If you want to put this all in one complicated statement, I would suggest a computed column:
alter table t
add firstname as (stuff((case when full_name like '%, % %.',
then left(full_name,
charindex(' ', full_name, charindex(', ', full_name) + 2)
)
else full_name
end),
1,
charindex(', ', full_name) + 2,
'')
If format of this full_name field is the same for all rows, you may utilize power of SQL FTS word breaker for this task:
SELECT N'Dow, Mike P.' AS full_name INTO #t
SELECT display_term FROM #t
CROSS APPLY sys.dm_fts_parser(N'"' + full_name + N'"', 1033, NULL, 1) p
WHERE occurrence = 2
DROP TABLE #t

SQl string split (parse) on space where some field values have no space

I've been trying to split a string value in my query to return the first part of a two part postcode in a new column. Some values have only the first part, and some have both. After a bit of searching I found this code:
SELECT
TblPLSSU.PLSSUPostcode
,SUBSTRING(TblPLSSU.PLSSUPostcode, 1, CHARINDEX(' ', TblPLSSU.PLSSUPostcode)) AS PCode
FROM
TblPostcodes
This happily splits the values that have two parts to the postcode but it seems to be ignoring the single part post codes.
For example, values for TblPLSSU.PLSSUPostcode might be:
EH1 1AB
EH2
I want to return the values
EH1
EH2
But with the code above I am only getting EH1
Thanks
Eils
Use case to pick up post codes as-is when they don't have a space separated value.
SELECT
PLSSUPostcode
,case when CHARINDEX(' ', PLSSUPostcode) > 0
then SUBSTRING(PLSSUPostcode, 1, CHARINDEX(' ', PLSSUPostcode))
else PLSSUPostcode end AS PCode
FROM
TblPostcodes
Use case as well:
SELECT TblPLSSU.PLSSUPostcode,
(CASE WHEN TblPLSSU.PLSSUPostcode LIKE '% %'
THEN SUBSTRING(TblPLSSU.PLSSUPostcode, 1, CHARINDEX(' ', TblPLSSU.PLSSUPostcode))
END) AS PCode
FROM . . .
A little trick with STUFF function. This works because of CHARINDEX returns index of the first occurrence. So you are just adding space to strings and replacing all symbols from first occurrence till the end with empty string:
DECLARE #t TABLE(v VARCHAR(20))
INSERT INTO #t VALUES('EH1 1AB'), ('EH2')
SELECT STUFF(v + ' ', CHARINDEX(' ', v + ' '), LEN(v), '')
FROM #t
Another version:
SELECT SUBSTRING(v + ' ', 1, CHARINDEX(' ', v + ' ') - 1)
FROM #t
Output:
EH1
EH2

deleting second comma in data

Ok so I have a table called PEOPLE that has a name column. In the name column is a name, but its totally a mess. For some reason its not listed such as last, first middle. It's sitting like last,first,middle and last first (and middle if there) are separated by a comma.. two commas if the person has a middle name.
example:
smith,steve
smith,steve,j
smith,ryan,tom
I'd like the second comma taken away (for parsing reason ) spaces put after existing first comma so the above would come out looking like:
smith, steve
smith, steve j
smith, ryan tom
Ultimately I'd like to be able to parse the names into first, middle, and last name fields, but that's for another post :_0. I appreciate any help.
thank you.
Drop table T1;
Create table T1(Name varchar(100));
Insert T1 Values
('smith,steve'),
('smith,steve,j'),
('smith,ryan,tom');
UPDATE T1
SET Name=
CASE CHARINDEX(',',name, CHARINDEX(',',name)+1) WHEN
0 THEN Name
ELSE
LEFT(name,CHARINDEX(',',name, CHARINDEX(',',name)+1)-1)+' ' +
RIGHT(name,LEN(Name)-CHARINDEX(',',name, CHARINDEX(',',name)+1))
END
Select * from T1
This seems to work. Not the most concise but avoids cursors.
DECLARE #people TABLE (name varchar(50))
INSERT INTO #people
SELECT 'smith,steve'
UNION
SELECT 'smith,steve,j'
UNION
SELECT 'smith,ryan,tom'
UNION
SELECT 'commaless'
SELECT name,
CASE
WHEN CHARINDEX(',',name) > 0 THEN
CASE
WHEN CHARINDEX(',',name,CHARINDEX(',',name) + 1) > 0 THEN
STUFF(STUFF(name, CHARINDEX(',',name,CHARINDEX(',',name) + 1), 1, ' '),CHARINDEX(',',name),1,', ')
ELSE
STUFF(name,CHARINDEX(',',name),1,', ')
END
ELSE name
END AS name2
FROM #people
Using a table function to split apart the names with a delimiter and for XML Path to stitch them back together, we can get what you're looking for! Hope this helps!
Declare #People table(FullName varchar(200))
Insert Into #People Values ('smith,steve')
Insert Into #People Values ('smith,steve,j')
Insert Into #People Values ('smith,ryan,tom')
Insert Into #People Values ('smith,john,joseph Jr')
Select p.*,stuff(fn.FullName,1,2,'') as ModifiedFullName
From #People p
Cross Apply (
select
Case When np.posID<=2 Then ', ' Else ' ' End+np.Val
From #People n
Cross Apply Custom.SplitValues(n.FullName,',') np
Where n.FullName=p.FullName
For XML Path('')
) fn(FullName)
Output:
ModifiedFullName
smith, steve
smith, steve j
smith, ryan tom
smith, john joseph Jr
SplitValues table function definition:
/*
This Function takes a delimited list of values and returns a table containing
each individual value and its position.
*/
CREATE FUNCTION [Custom].[SplitValues]
(
#List varchar(max)
, #Delimiter varchar(1)
)
RETURNS
#ValuesTable table
(
posID int
,val varchar(1000)
)
AS
BEGIN
WITH Cte AS
(
SELECT CAST('<v>' + REPLACE(#List, #Delimiter, '</v><v>') + '</v>' AS XML) AS val
)
INSERT #ValuesTable (posID,val)
SELECT row_number() over(Order By x) as posID, RTRIM(LTRIM(Split.x.value('.', 'VARCHAR(1000)'))) AS val
FROM Cte
CROSS APPLY val.nodes('/v') Split(x)
RETURN
END
GO
String manipulation in SQLServer, outside of writing your own User Defined Function, is limited but you can use the PARSENAME function for your purposes here. It takes a string, splits it on the period character, and returns the segment you specify.
Try this:
DECLARE #name VARCHAR(100) = 'smith,ryan,tom'
SELECT REVERSE(PARSENAME(REPLACE(REVERSE(#name), ',', '.'), 1)) + ', ' +
REVERSE(PARSENAME(REPLACE(REVERSE(#name), ',', '.'), 2)) +
COALESCE(' ' + REVERSE(PARSENAME(REPLACE(REVERSE(#name), ',', '.'), 3)), '')
Result: smith, ryan tom
If you set #name to 'smith,steve' instead, you'll get:
Result: smith, steve
Segment 1 actually gives you the last segment, segment 2 the second to last etc. Hence I've used REVERSE to get the order you want. In the case of 'steve,smith', segment 3 will be null, hence the COALESCE to add an empty string if that is the case. The REPLACE of course changes the commas to periods so that the split will work.
Note that this is a bit of a hack. PARSENAME will not work if there are more than four parts and this will fail if the name happens to contain a period. However if your data conforms to these limitations, hopefully it provides you with a solution.
Caveat: it sounds like your data may be inconsistently formatted. In that case, applying any automated treatment to it is going to be risky. However, you could try:
UPDATE people SET name = REPLACE(name, ',', ' ')
UPDATE people SET name = LEFT(name, CHARINDEX(' ', name)-1)+ ', '
+ RIGHT(name, LEN(name) - CHARINDEX(' ', name)
That'll work for the three examples you give. What it will do to the rest of your set is another question.
Here's an example with CHARINDEX() and SUBSTRING
WITH yourTable
AS
(
SELECT names
FROM
(
VALUES ('smith,steve'),('smith,steve,j'),('smith,ryan,tom')
) A(names)
)
SELECT names AS old,
CASE
WHEN comma > 0
THEN SUBSTRING(spaced_names,0,comma + 1) --before the comma
+ SUBSTRING(spaced_names,comma + 2,1000) --after the comma
ELSE spaced_names
END AS new
FROM yourTable
CROSS APPLY(SELECT CHARINDEX(',',names,CHARINDEX(',',names) + 1),REPLACE(names,',',', ')) AS CA(comma,spaced_names)

SQL Server 2008: How to find trailing spaces

How can I find all column values in a column which have trailing spaces? For leading spaces it would simply be
select col from table where substring(col,1,1) = ' ';
You can find trailing spaces with LIKE:
SELECT col FROM tbl WHERE col LIKE '% '
SQL Server 2005:
select col from tbl where right(col, 1) = ' '
As a demo:
select
case when right('said Fred', 1) = ' ' then 1 else 0 end as NoTrail,
case when right('said Fred ', 1) = ' ' then 1 else 0 end as WithTrail
returns
NoTrail WithTrail
0 1
This is what worked for me:
select * from table_name where column_name not like RTRIM(column_name)
This will give you all the records that have trailing spaces.
If you want to get the records that have either leading or trailing spaces then you could use this:
select * from table_name where column_name not like LTRIM(RTRIM(column_name))
A very simple method is to use the LEN function.
LEN will trim trailing spaces but not preceeding spaces, so if your LEN() is different from your LEN(REVERSE()) you'll get all rows with trailing spaces:
select col from table where LEN(col) <> LEN(REVERSE(col));
this can also be used to figure out how many spaces you have for more advanced logic.
SELECT * FROM tbl WHERE LEN(col) != DATALENGTH(col)
Should work also.
There's a few different ways to do this...
My favorite option, assuming your intention is to remove any leading and / or trailing spaces, is to execute the following, which will dynamically create the T-SQL to UPDATE all columns with an unwanted space to their trimmed value:
SELECT
'UPDATE [<DatabaseName>].[dbo].['+TABLE_NAME+']
SET ['+COLUMN_NAME+']=LTRIM(RTRIM(['+COLUMN_NAME+']))
WHERE ['+COLUMN_NAME+']=LTRIM(RTRIM(['+COLUMN_NAME+']));'+CHAR(13)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME LIKE '<TableName>%'
AND DATA_TYPE!='date'
ORDER BY TABLE_NAME,COLUMN_NAME
If you really need to identify them though, try one of these queries:
SELECT *
FROM [database].[schema].[table]
WHERE [col1]!=LTRIM(RTRIM([col1]))
More dynamic SQL:
SELECT 'SELECT ''['+TABLE_NAME+'].['+COLUMN_NAME+']'',*
FROM [<your database name>].[dbo].['+TABLE_NAME+']
WHERE ['+COLUMN_NAME+'] LIKE ''% ''
OR ['+COLUMN_NAME+'] LIKE '' %'';
GO
'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME LIKE '<filter table name as desired>%'
AND DATA_TYPE!='date'
Here's an alternative to find records with leading or trailing whitespace, including tabs etc:
SELECT * FROM tbl WHERE NOT TRIM(col) = col
Try this:
UPDATE Battles
SET name = CASE WHEN (LEN(name+'a')-1)>LEN(RTRIM(name))
THEN REPLICATE(' ', (LEN(name+'a')-1)- LEN(RTRIM(name)))+RTRIM(name)
ELSE name
END
Spaces are ignored in SQL Server so for me even the leading space was not working .
select col from table where substring(col,1,1) = ' '
wont work if there is only one space (' ') or blank ('')
so I devised following:
select * from [table] where substring(REPLACE(col, ' ', '#'),1,1) = '#'
Here is another alternative for trailing spaces.
DECLARE #VALUE VARCHAR(50) = NULL
DECLARE #VALUE VARCHAR(50) = ' '
IF ((#VALUE IS NOT NULL) AND (LTRIM(RTRIM(#VALUE)) != ''))
BEGIN
SELECT 'TRUE'
END
ELSE
BEGIN
SELECT 'FALSE'
END
I have found the accepted answer a little bit slower:
SELECT col FROM tbl WHERE col LIKE '% ';
against this technique:
SELECT col FROM tbl WHERE ASCII(RIGHT([value], 1)) = 32;
The idea is to get the last char, but compare its ASCII code with the ASCII code of space instead only with ' ' (space). If we use only ' ' space, an empty string will yield true:
DECLARE #EmptyString NVARCHAR(12) = '';
SELECT IIF(RIGHT(#EmptyString, 1) = ' ', 1, 0); -- this returns 1
The above is because of the Microsoft's implementation of string comparisons.
So, how fast exactly?
You can try the following code:
CREATE TABLE #DataSource
(
[RowID] INT PRIMARY KEY IDENTITY(1,1)
,[value] NVARCHAR(1024)
);
INSERT INTO #DataSource ([value])
SELECT TOP (1000000) 'text ' + CAST(ROW_NUMBER() OVER(ORDER BY t1.number) AS VARCHAR(12))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
UPDATE #DataSource
SET [value] = [value] + ' '
WHERE [RowID] = 100000;
SELECT *
FROM #DataSource
WHERE ASCII(RIGHT([value], 1)) = 32;
SELECT *
FROM #DataSource
WHERE [value] LIKE '% ';
On my machine there is around 1 second difference:
I have test it on table with 600k rows, but larger size, and the difference was above 8 seconds. So, how fast exactly will depend on your real case data.
Another way to achieve this by using CHARINDEX and REVERSE like below:
select col1 from table1
WHERE charindex(' ', reverse(col1)) = 1
See example Here
Simple use below query to get values having any number of spaces in begining or at end of values in column.
select * from table_name where column_name like ' %' or column_name like '% ';
We can try underscore to find the entries which are blanks,though not an accurate solution like using '% %' or ' ', But I could find entries which are blanks.
select col_name from table where col_name like '_'