I'm running a series of SQL queries to find data that needs cleaning up. One of them I want to do is look for:
2 or more uppercase letters in a row
starting with a lowercase letter
space then a lowercase letter
For example my name should be "John Doe". I would want it to find "JOhn Doe" or "JOHN DOE" or "John doe", but I would not want it to find "John Doe" since that is formatted correctly.
I am using SQL Server 2008.
The key is to use a case-sensitive collation, i.e. Latin1_General_BIN*. You can then use a query with a LIKE expression like the following (SQL Fiddle demo):
select *
from foo
where name like '%[A-Z][A-Z]%' collate Latin1_General_BIN --two uppercase in a row
or name like '% [a-z]%' collate Latin1_General_BIN --space then lowercase
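*As per "How do I perform a case-sensitive search using LIKE?", ranges like [A-Z] are not case sensitive under the Latin1_General_CS_AS collation. This is less a "bug" than a consequence of the range following the collation's sort order, in which uppercase and lowercase letters are interleaved. The solution is to use Latin1_General_BIN, where ranges follow code point order.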
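The question also asks for names that start with a lowercase letter, which the query above does not check. A minimal sketch that adds that condition, assuming a hypothetical table variable @foo with illustrative sample rows:

```sql
-- Hypothetical sample data; the table variable and its rows are illustrative only.
DECLARE @foo TABLE (name varchar(50));
INSERT INTO @foo (name)
VALUES ('John Doe'), ('JOhn Doe'), ('JOHN DOE'), ('John doe'), ('john Doe');

SELECT name
FROM @foo
WHERE name LIKE '%[A-Z][A-Z]%' COLLATE Latin1_General_BIN  -- two uppercase letters in a row
   OR name LIKE '[a-z]%'       COLLATE Latin1_General_BIN  -- starts with a lowercase letter
   OR name LIKE '% [a-z]%'     COLLATE Latin1_General_BIN; -- space followed by a lowercase letter
```

With this sample data, every row except 'John Doe' should be returned.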
First, I think you should make a function that returns a proper name (sounds like you need one anyway). See here under the heading "Proper Casing a Persons Name". Then find the ones that don't match.
SELECT Id, Name, dbo.ProperCase(Name)
FROM MyTable
WHERE Name <> dbo.ProperCase(Name) collate Latin1_General_BIN
This will help you clean up the data and tweak the function to what you need.
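The linked function is not reproduced here. As a rough sketch of what such a function might look like, the hypothetical dbo.ProperCaseSketch below upper-cases the first letter of each space-separated word and lower-cases the rest. The name and the 200-character limit are assumptions, and it does not handle names such as McDonald or Mary-Anne.

```sql
-- Hypothetical helper, not the function from the linked article.
CREATE FUNCTION dbo.ProperCaseSketch (@input varchar(200))
RETURNS varchar(200)
AS
BEGIN
    IF @input IS NULL RETURN NULL;

    DECLARE @result varchar(200) = '';
    DECLARE @i int = 1;
    DECLARE @ch char(1);
    DECLARE @startOfWord bit = 1;  -- the first character always starts a word

    WHILE @i <= LEN(@input)
    BEGIN
        SET @ch = SUBSTRING(@input, @i, 1);
        -- Upper-case the first letter of a word, lower-case everything else.
        SET @result = @result
            + CASE WHEN @startOfWord = 1 THEN UPPER(@ch) ELSE LOWER(@ch) END;
        -- The character after a space starts a new word.
        SET @startOfWord = CASE WHEN @ch = ' ' THEN 1 ELSE 0 END;
        SET @i = @i + 1;
    END;

    RETURN @result;
END;
```

For example, SELECT dbo.ProperCaseSketch('jOHN dOE') should return 'John Doe'.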
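You can use a regular expression. I'm not a SQL Server whiz, but you want to use a RegexMatch function. SQL Server has no built-in one, so this assumes a user-defined (typically CLR) function named dbo.RegexMatch is installed. Something like this: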
select columnName
from tableName
where dbo.RegexMatch( columnName,
N'[A-Z]\W[A-Z]' ) = 1
If your goal is to update your column to capitalize the first character of each word (in your case firstName and lastName), you can use the following query.
Create a sample table with data
Declare @t table (Id int IDENTITY(1,1),Name varchar(50))
insert into @t (name)values ('john doe'),('lohn foe'),('tohnytty noe'),('gohnsdf fgedsfsdf')
Update query
UPDATE @t
SET name = UPPER(LEFT(SUBSTRING(Name, 1, CHARINDEX(' ', Name) - 1), 1)) + RIGHT(SUBSTRING(Name, 1, CHARINDEX(' ', Name) - 1), LEN(SUBSTRING(Name, 1, CHARINDEX(' ', Name) - 1)) - 1) +
' ' +
UPPER(LEFT(SUBSTRING(Name, CHARINDEX(' ', Name) + 1, 8000), 1)) + RIGHT(SUBSTRING(Name, CHARINDEX(' ', Name) + 1, 8000), LEN(SUBSTRING(Name, CHARINDEX(' ', Name) + 1, 8000)) - 1)
FROM @t
Output
SELECT * FROM @t
Id Name
1 John Doe
2 Lohn Foe
3 Tohnytty Noe
4 Gohnsdf Fgedsfsdf
I do it this way:
;WITH yourTable AS(
SELECT 'John Doe' As name
UNION ALL SELECT 'JOhn Doe'
UNION ALL SELECT 'JOHN DOE'
UNION ALL SELECT 'John doe'
UNION ALL SELECT 'John DoE'
UNION ALL SELECT 'john Doe'
UNION ALL SELECT 'jOhn dOe'
UNION ALL SELECT 'jOHN dOE'
UNION ALL SELECT 'john doe'
)
SELECT name
FROM (
SELECT name,
LOWER(PARSENAME(REPLACE(name, ' ', '.'), 1)) part2,
LOWER(PARSENAME(REPLACE(name, ' ', '.'), 2)) part1
FROM yourTable) t
WHERE name COLLATE Latin1_General_BIN = UPPER(LEFT(part1,1)) + RIGHT(part1, LEN(part1) -1) +
' ' + UPPER(LEFT(part2,1)) + RIGHT(part2, LEN(part2) -1)
Note:
This only works for two-part names; for names with more parts it would need to be extended.
Related
I am looking to return all names with more than one space in a single field.
For example 'John Paul Smith'. I am using SQL Server Management Studio 2005.
For example, I have a patients table with forename and surname columns.
I want to return all forenames that contain a value such as 'John Paul Smith' in one field.
The query given seems to work on the surname field but not the forename. I know for certain that the forename column has this type of data, but it is returning no results.
Oracle:
SELECT MyField
from MyTable
where REGEXP_INSTR (MyField, ' ', 1, 2, 0, 'i') > 0
SQL server:
SELECT MyField
from MyTable
where CHARINDEX(' ', MyField, charindex(' ',MyField)+1) > 0
MySQL
select MyField
from MyTable
where length(SUBSTRING_INDEX(MyField, ' ', 2)) < length(MyField)
Here are two solutions that in my opinion are easier to read/understand than JohnHC's.
It can't get any simpler. Use wildcards to search for (at least) two spaces.
SELECT * FROM your_table WHERE your_column LIKE '% % %';
Check the length after replacing the spaces
SELECT * FROM your_table WHERE LEN(your_column) - LEN(REPLACE(your_column, ' ', '')) >= 2;
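One caveat with the length-based query: LEN ignores trailing spaces, so a trailing space is not counted. A sketch of a variant that uses DATALENGTH instead, assuming a varchar column (DATALENGTH counts bytes, so an nvarchar column would need the difference divided by 2) and default ANSI_PADDING settings. The table variable and its rows are hypothetical:

```sql
-- Hypothetical sample data; the last row ends with a trailing space.
DECLARE @names TABLE (forename varchar(50));
INSERT INTO @names (forename)
VALUES ('John'), ('John Paul'), ('John Paul Smith'), ('John Paul ');

SELECT forename,
       -- Bytes removed by stripping spaces = number of spaces (varchar only).
       DATALENGTH(forename) - DATALENGTH(REPLACE(forename, ' ', '')) AS space_count
FROM @names
WHERE DATALENGTH(forename) - DATALENGTH(REPLACE(forename, ' ', '')) >= 2;
```

Whether a value with a trailing space should count as a match depends on the data; if not, wrapping the column in RTRIM before counting would exclude it.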
Assuming I have a table full of names.
firstname.lastname in a single cell.
How can I separate these into "Firstname Lastname", with uppercase for the first letters? Using T-SQL.
Sample:
mike.mikeson -> Mike Mikeson
katy.lumberjack -> Katy Lumberjack
One of those times we can use the ParseName function for our benefit ;-)
SELECT original_value
, forename
, surname
, Upper(SubString(forename, 1, 1)) + Lower(Substring(forename, 2, 8000)) AS formatted_forename
, Upper(SubString(surname , 1, 1)) + Lower(Substring(surname , 2, 8000)) AS formatted_surname
FROM (
SELECT name AS original_value
, ParseName(name, 2) AS forename
, ParseName(name, 1) AS surname
FROM (
VALUES ('mike.mikeson')
, ('katy.lumberjack')
) AS users (name)
) AS step1
The query below will answer your question as is but, as comments have pointed out, you may also need to take into account names that have more than one uppercase letter in either part, such as Mary-Anne McDonald, or those that simply don't conform to your convention.
declare @a table (Name nvarchar(50))
insert into @a values
('fred.bloggs')
,('john.doe')
,('alan.smith')
select Name
,upper(left(Name,1))
+ substring(Name,2,charindex('.',Name,1)-2)
+ ' '
+ upper(substring(Name,charindex('.',Name,1)+1,1))
+ right(Name,len(Name) - charindex('.',Name,1)-1)
as FormattedName
from @a
You can try using concat and substring for this as below
declare @name varchar(50) = 'firstname.lastname'
select case when charindex('.',@name) > 0 then concat(upper(left(@name,1)), substring(@name,2,charindex('.',@name)-2), ' ', upper(substring(@name,charindex('.',@name)+1,1)), substring(@name,charindex('.',@name)+2, len(@name)))
else concat(upper(left(@name,1)), substring(@name,2,len(@name))) end
I am trying to get the first name from an employee table, where full_name looks like this: Dow, Mike P.
I tried to get the first name using the syntax below, but it comes back with the middle initial. How can I remove the middle initial from the first name, if there is one? Not all names contain a middle initial.
-- query--
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
len(Employee_First_Name)) AS FirstName
-- remove middle initial from right side
from employee
-- result
Full_name      Firstname
Dow,Mike P.    Mike P.
--few example for Full_name data---
smith,joe j. --->joe (need result as)
smith,alan ---->alan (need result as)
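Instead of passing len as the length, use charindex again to find the first space after the comma, and take only the characters between the comma and that space (this assumes no space directly after the comma, as in 'smith,joe j.').
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
CHARINDEX(' ', Employee_First_Name, CHARINDEX(',', Employee_First_Name)) - CHARINDEX(',', Employee_First_Name) - 1) AS FirstName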
One thing to note: the second charindex returns 0 if there is no space after the comma, i.e. no middle initial. In that case, you would want to use something like the following (IIF requires SQL Server 2012 or later):
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
IIF(CHARINDEX(' ', Employee_First_Name, CHARINDEX(',', Employee_First_Name)) = 0, LEN(Employee_First_Name), CHARINDEX(' ', Employee_First_Name, CHARINDEX(',', Employee_First_Name)) - CHARINDEX(',', Employee_First_Name) - 1)) AS FirstName
This removes the portion before the comma, then takes that string and removes everything after the first space.
WITH cte AS (
SELECT *
FROM (VALUES('smith,joe j.'),('smith,alan'),('joe smith')) t(fullname)
)
SELECT
SUBSTRING(
LTRIM(SUBSTRING(fullname,CHARINDEX(',',fullname) + 1,LEN(fullname))),
0,
COALESCE(NULLIF(CHARINDEX(' ',LTRIM(SUBSTRING(fullname,CHARINDEX(',',fullname) + 1,LEN(fullname)))),0),LEN(fullname)))
FROM cte
output
------
joe
alan
joe
To be honest, this is most easily expressed using multiple levels of logic. One way is using outer apply:
select ttt.firstname
from t outer apply
(select substring(t.full_name, charindex(', ', t.full_name) + 2, len(t.full_name)) as firstmi
) tt outer apply
(select (case when tt.firstmi like '% %'
then left(tt.firstmi, charindex(' ', tt.firstmi) - 1)
else tt.firstmi
end) as firstname
) as ttt
If you want to put this all in one complicated statement, I would suggest a computed column:
alter table t
add firstname as (stuff((case when full_name like '%, % %.'
then left(full_name,
charindex(' ', full_name, charindex(', ', full_name) + 2) - 1
)
else full_name
end),
1,
charindex(', ', full_name) + 1,
''))
If the format of the full_name field is the same for all rows, you can use the SQL Server full-text search (FTS) word breaker for this task:
SELECT N'Dow, Mike P.' AS full_name INTO #t
SELECT display_term FROM #t
CROSS APPLY sys.dm_fts_parser(N'"' + full_name + N'"', 1033, NULL, 1) p
WHERE occurrence = 2
DROP TABLE #t
If I have a column in which strings vary in length but they ALL have a slash \ within,
how can I SELECT to have one column display everything BEFORE the \ and another column displaying everything AFTER the \?
name column1 column2
DB5697\DEV DB5697 DEV
I have seen CHARINDEX and REVERSE on MSDN but haven't been able to put together a solution.
How can I best split a varchar/string column value into 2 columns in a result set in T-SQL?
What about using the PARSENAME function in a tricky way?
USE tempdb;
GO
CREATE TABLE #names
(
id int NOT NULL PRIMARY KEY CLUSTERED
, name varchar(30) NOT NULL
);
GO
INSERT INTO #names (id, name)
VALUES
(1, 'DB5697\DEV'),
(2, 'DB5800\STG'),
(3, 'DB5900\PRD');
GO
SELECT
name
, PARSENAME(REPLACE(name, '\', '.'), 2) AS [Server]
, PARSENAME(REPLACE(name, '\', '.'), 1) AS [Instance]
FROM
#names;
GO
DROP TABLE #names;
GO
The PARSENAME function accepts 2 parameters and returns the requested part of a fully qualified object name. The second parameter selects which part to return.
Value 2 is for SCHEMA and 1 is for OBJECT.
So, the REPLACE function replaces the "\" character with "." to turn your SERVERNAME\INSTANCE values into a SCHEMA.OBJECT format. PARSENAME then treats the string as a two-part object name.
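As a small illustration of the part numbering described above, the following sketch uses literal strings only (the sample values are illustrative). It also shows one limit worth knowing: PARSENAME returns NULL when the string has more than four dot-separated parts, so values that already contain dots may not split as expected.

```sql
-- PARSENAME counts parts from the right: 1 = object, 2 = schema, 3 = database, 4 = server.
SELECT PARSENAME('srv.db.sch.obj', 1)                AS part1,           -- obj
       PARSENAME('srv.db.sch.obj', 4)                AS part4,           -- srv
       PARSENAME(REPLACE('DB5697\DEV', '\', '.'), 2) AS before_slash,    -- DB5697
       PARSENAME(REPLACE('DB5697\DEV', '\', '.'), 1) AS after_slash,     -- DEV
       PARSENAME('a.b.c.d.e', 1)                     AS too_many_parts;  -- NULL: more than four parts
```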
How about the following (SQL Fiddle):
SELECT m.name,
LEFT(m.name, CHARINDEX('\', m.name) - 1) AS column1,
RIGHT(m.name, LEN(m.name) - CHARINDEX('\', m.name)) AS column2
FROM MyTable m
How to handle strings with no \ in them (SQL Fiddle):
SELECT m.name,
CASE WHEN CHARINDEX('\', m.name) = 0 THEN ''
ELSE LEFT(m.name, CHARINDEX('\', m.name) - 1) END AS column1,
CASE WHEN CHARINDEX('\', m.name) = 0 THEN ''
ELSE RIGHT(m.name, LEN(m.name) - CHARINDEX('\', m.name)) END AS column2
FROM MyTable m;
You can use CHARINDEX to find the position of the splitter ('/') and use SUBSTRING to split the string.
However, care has to be taken to handle records without a splitter, or the query will raise an error.
Also, when the splitter is missing, a decision has to be made as to which column the data should be mapped to. Here I am mapping the data to FirstName and assigning NULL to LastName.
DECLARE @TableBuyer TABLE (ID INT, FullName VARCHAR(100))
INSERT INTO @TableBuyer
SELECT '1','Bryan/Greenberg' UNION ALL
SELECT '2','Channing/Tatum' UNION ALL
SELECT '3','Paul/William' UNION ALL
SELECT '4','EricBana' UNION ALL
SELECT '5','James/Lafferty' UNION ALL
SELECT '6','Wentworth/Miller'
SELECT
CASE
WHEN CHARINDEX('/', FullName) > 0 THEN SUBSTRING(FullName, 1, CHARINDEX('/', FullName) - 1)
ELSE FullName
END AS FirstName
,
CASE
WHEN CHARINDEX('/', FullName) > 0 THEN SUBSTRING(FullName, CHARINDEX('/', FullName) + 1, LEN(FullName))
ELSE NULL
END AS LastName
FROM @TableBuyer;
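The same guard against a missing splitter can also be written more compactly with NULLIF, which turns the "not found" result 0 into NULL so that LEFT and SUBSTRING return NULL instead of raising an invalid-length error. A minimal sketch, assuming a hypothetical table variable @Buyer with two illustrative rows:

```sql
-- Hypothetical sample rows; 'EricBana' has no '/' splitter.
DECLARE @Buyer TABLE (FullName varchar(100));
INSERT INTO @Buyer (FullName)
VALUES ('Bryan/Greenberg'), ('EricBana');

SELECT FullName,
       -- No splitter: LEFT(...) is NULL, so fall back to the whole value.
       ISNULL(LEFT(FullName, NULLIF(CHARINDEX('/', FullName), 0) - 1), FullName) AS FirstName,
       -- No splitter: the start position is NULL, so LastName is NULL.
       SUBSTRING(FullName, NULLIF(CHARINDEX('/', FullName), 0) + 1, LEN(FullName)) AS LastName
FROM @Buyer;
```

This keeps the same mapping choice as above: a value without a splitter goes to FirstName and LastName is NULL.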
DECLARE @TableBuyer TABLE (ID INT, FullName VARCHAR(100))
INSERT INTO @TableBuyer
SELECT '1','Bryan/Greenberg' UNION ALL
SELECT '2','Channing/Tatum' UNION ALL
SELECT '3','Paul/William' UNION ALL
SELECT '4','EricBana' UNION ALL
SELECT '5','James/Lafferty' UNION ALL
SELECT '6','Wentworth/Miller'
select left(FullName, len(FullName)-CHARINDEX('/', REVERSE(FullName))) as firstname,
substring(FullName, len(FullName)-CHARINDEX('/', REVERSE(FullName))+ 2, len(FullName)) as lastname
from @TableBuyer
OR
select left(FullName, len(FullName)-CHARINDEX('/', REVERSE(FullName))) as firstname,
RIGHT(FullName, len(FullName)-CHARINDEX('/', FullName)) as lastname
from @TableBuyer
There is no "simple" method. Something like this should work:
select left(col, charindex('\', col) - 1) as column1,
right(col, charindex('\', reverse(col)) - 1) as column2
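In T-SQL string literals the backslash is not an escape character, so '\' should work as written. You would only need to double it ('\\') if the query is embedded in a client language that treats backslashes as escapes.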
I have a column in a table (the column name is “from”) that looks like this:
blabla@hotmail.com
frank ocean real frankocean@mail.com
ari@gold.com
frits west f.west@mail.com
I want to select the email addresses only. How do I do this? With a substring?
I can find a domain, but I want to have the complete mail addresses, like this:
blabla@hotmail.com
frankocean@mail.com
ari@gold.com
f.west@mail.com
Thanks!
Reverse the string and look for a space before the address: try this
CREATE TABLE #Addresses (EmailAddress VARCHAR(100))
INSERT INTO #Addresses (EmailAddress)
SELECT 'blabla@hotmail.com'
UNION
SELECT 'frank ocean real frankocean@mail.com'
UNION
SELECT 'ari@gold.com'
UNION
SELECT 'frits west f.west@mail.com'
SELECT LTRIM(RTRIM(RIGHT(EmailAddress, CHARINDEX(' ', REVERSE(' ' + EmailAddress),CHARINDEX('@', REVERSE(' '+emailAddress)))))) FROM #Addresses
EDIT: if you have any strings that contain the name after the address, you can use the following to strip out the address:
CREATE TABLE #Addresses (EmailAddress VARCHAR(100))
INSERT INTO #Addresses (EmailAddress)
SELECT 'blabla@hotmail.com'
UNION
SELECT 'frank ocean real frankocean@mail.com'
UNION
SELECT 'ari@gold.com'
UNION
SELECT 'frits west f.west@mail.com'
UNION
SELECT 'me@me.com my name'
-- tail = the address plus anything after it; LEFT then cuts tail at its first space
SELECT LEFT(t.tail, CHARINDEX(' ', t.tail + ' ') - 1) AS Email
FROM #Addresses
CROSS APPLY (SELECT LTRIM(RIGHT(EmailAddress, CHARINDEX(' ', REVERSE(' ' + EmailAddress), CHARINDEX('@', REVERSE(' ' + EmailAddress))))) AS tail) AS t
DROP TABLE #Addresses
EDIT 2: forgot to add trimming functions
EDIT 3: final code using the OP's column name (table name not posted):
SELECT LEFT(t.tail, CHARINDEX(' ', t.tail + ' ') - 1) AS Email
FROM -- whatever your table is named
CROSS APPLY (SELECT LTRIM(RIGHT([From], CHARINDEX(' ', REVERSE(' ' + [From]), CHARINDEX('@', REVERSE(' ' + [From]))))) AS tail) AS t
If the output is always in the format that you gave, then hopefully this will work:
SELECT RIGHT([From], CHARINDEX(' ', REVERSE(' ' + [From])) - 1) AS [Result]
FROM YourTable
This will only work if there is a space before the part that you want (the actual email address). I use this for a similar purpose on some dodgy legacy customer data.
This is for SQL Server, I don't know if it will work for any other RDBMS.