Trying to remove spaces between initials in SQL - sql

I have a column that contains a person's name, and I need to extract it to pass to another system, but I need to remove the spaces only from between the initials.
For example, I might have
Mr A B Bloggs and I want Mr AB Bloggs, or
Mrs A B C Bloggs and I want Mrs ABC Bloggs.
As there are millions of records in the table, I won't know how many initials there are, or indeed whether there are any. All I know is that the prefix (Mr, Mrs, etc.) will be more than one character long, and so will the surname. I've tried using TRIM, REPLACE and CHARINDEX, but obviously not in the right combination. Any help would be appreciated.

Unfortunately, SQL Server does not support regex. You have two options:
Use .NET in CLR to perform the transformation. This link explains how to implement regex in SQL Server using CLR: https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/.
The other option is to use a cursor to iterate through all the records and transform each entry. This may be slow for a large table. For example, you could write a function that returns the locations of spaces surrounded by single letters and then removes them. The trick is not to remove any of them until you have recorded all of them, and then to remove them from right to left so that the recorded locations do not shift.
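As a rough illustration of the "record first, remove afterwards" idea, here is a minimal T-SQL sketch for a single value. It is not the CLR approach from the link, and it marks each space with a placeholder instead of storing positions. The variable names and the '|' placeholder are assumptions for illustration; the placeholder must be a character that never occurs in the data.

```sql
-- Sketch only: '|' is an assumed placeholder that never appears in the names.
DECLARE @name varchar(100) = 'Mrs A B C Bloggs';
DECLARE @s varchar(102) = ' ' + @name + ' ';   -- pad so every word has a boundary on both sides
DECLARE @p int = PATINDEX('%[ |][A-Z] [A-Z] %', @s);

-- Mark (do not delete yet) each space that sits between two single-letter words.
-- A previously marked space still counts as a boundary, so runs of initials chain correctly.
WHILE @p > 0
BEGIN
    SET @s = STUFF(@s, @p + 2, 1, '|');
    SET @p = PATINDEX('%[ |][A-Z] [A-Z] %', @s);
END;

-- Remove all marked spaces in one pass, then drop the padding.
SELECT LTRIM(RTRIM(REPLACE(@s, '|', ''))) AS cleaned;   -- Mrs ABC Bloggs
```

Under the common case-insensitive collations, [A-Z] also matches lower-case letters. To run this against a table, the loop would have to be wrapped in a scalar function, which is likely to be slow over millions of rows.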

Try this:
-- prefix + ' ' + initials with their spaces removed + ' ' + surname
-- Note: a name with no initials comes out with two spaces between prefix and surname.
declare @test varchar(100)='Mrs A B C Bloggs'
select (substring (@test,0,charindex(' ',@test)))+' '+
replace(replace(replace(substring(@test,len((substring (@test,0,charindex(' ',@test))))+1,len(@test)),
(substring (@test,0,charindex(' ',@test))),''),reverse((substring (reverse(@test),0,charindex(' ',reverse(@test))))),''),' ','')
+' '+reverse((substring (reverse(@test),0,charindex(' ',reverse(@test)))))

Related

SQL Server LIKE caret (^) for NOT does not work as expected

I was reading the article at mssqltips and wanted to try the caret in regex. I understand regex pretty well and use it often, although not much in SQL Server queries.
For the following list of names, I had thought that 1) select * from people where name like '%[^m]%'; would return those names that do not contain 'm'. But it doesn't work like that. I know I can do 2) select * from people where name not like '%m%'; to get the result I want, but I'm just baffled why 1) doesn't work as expected.
Amy
Jasper
Jim
Kathleen
Marco
Mike
Mitchell
I am using SQL Server 2017, but here is a fiddle:
sql fiddle
'%[^m]%' would be true for any string containing a character that is not m. An expanded version would be '%[Any character not m]%'. Since all of those strings contain a character other than m, they are valid results.
If you had a string like mmm, where name like '%[^m]%' would not return that row.
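To make the difference concrete, here is a small sketch that evaluates both predicates side by side. The inline VALUES list stands in for the people table and adds the mmm example; the column aliases are made up for illustration.

```sql
SELECT name,
       -- 1 when the name contains at least one character that is not m
       CASE WHEN name LIKE '%[^m]%' THEN 1 ELSE 0 END AS has_a_non_m_char,
       -- 1 when the name contains no m at all
       CASE WHEN name NOT LIKE '%m%' THEN 1 ELSE 0 END AS contains_no_m
FROM (VALUES ('Amy'), ('Jasper'), ('Jim'), ('mmm')) AS people(name);
-- Amy:    1, 0
-- Jasper: 1, 1
-- Jim:    1, 0
-- mmm:    0, 0
```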

Remove nonspecific character and a space in SQL string

I need to clean up some records in an SQL table. Somehow the middle initial was placed in front of some last names in the last name field.
The table looks like this:
EmplyID  last_name_field
123      A Smith
456      Jones
789      J Gillum
Not all of the records have a middle initial and space in front of them. I can't use TRIM to take away the first two characters of every record because it would mess up the ones whose last names were correctly imported. Is there a way to remove the first character and space for only the records that have the middle initial?
Thanks in advance
Very simple solution: just remove everything before the space. I was on the right track with TRIM; instead, use RIGHT and LEN.
update TABLE_NAME
Set COLUMN_NAME = RIGHT(COLUMN_NAME, LEN(COLUMN_NAME) - CHARINDEX(' ', COLUMN_NAME))
where CHARINDEX(' ', COLUMN_NAME) > 0
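The update above strips everything up to the first space, so it would also shorten a surname that legitimately contains a space. A narrower variant, sketched here under the assumption that a misplaced initial is always exactly one letter followed by one space, only touches rows matching that shape. The table variable and the 'Van Halen' row are hypothetical test data.

```sql
DECLARE @employees TABLE (EmplyID int, last_name_field varchar(50));
INSERT INTO @employees VALUES
    (123, 'A Smith'), (456, 'Jones'), (789, 'J Gillum'),
    (999, 'Van Halen');   -- hypothetical two-word surname that must survive

-- Remove the first two characters only where they are "letter + space".
UPDATE @employees
SET last_name_field = STUFF(last_name_field, 1, 2, '')
WHERE last_name_field LIKE '[A-Z] %';

SELECT EmplyID, last_name_field FROM @employees;
-- 123 Smith, 456 Jones, 789 Gillum, 999 Van Halen
```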

SQL Query to Split a string, format each result, then build back into a single row

I've been working on a rather convoluted process to format some data for a work project. We received a data extract and need to export it for import during a migration, but some of the data won't import due to case sensitivity (user logons with sentence case for example).
In an ideal world, I could demand the data be sanitised and formatted before it's provided for me to build the import, but I can't, so I'm stuck where I have to format it myself.
Plan:
Take string result
Split string result by pipe delimitation
Format each split result ( ) into lower case (where applicable)
Put all split results back into one string using FOR XML PATH
Example of problem:
Field 'Assigned To' can contain a pipe-delimited string of users and/or user groups, e.g.
John Smith (jsmith)|College Group of Arts|Bob Jones (BJones)
Now as you can see above, John Smith (jsmith) looks fine, as does College Group of Arts, however Bob Jones has had his logon sentence cased, so I need to use a LOWER command, chained with SUBSTRING and CHARINDEX to convert the logon to lower. Standalone, this approach works fine, but the problem I'm having is where I'm using a function found here on Stack Overflow (slightly manipulated to account for pipe delimitation) T-SQL split string.
When I retrieve the table results of the split string, I can't apply CHARINDEX against any characters in the result string, and I can't work out why.
Scenario:
The raw data extract, untouched, returns the below when queried;
|College of Science Administrators|Bob Jones (BJones)|
I then apply the below query, which calls the function queried above;
declare @assignedto nvarchar(max) = (select assigned_to from project where project_id = 1234)
SELECT SUBSTRING(Name,CHARINDEX(Name,'('),255)
FROM dbo.splitstring(@assignedto)
I then get the below results;
College of Science Administrators
Bob Jones (BJones)
What I'd expect to see is;
College of Science Administrators
(BJones)
I could then apply my LOWER logic to change it to lower case.
If that worked, my thought process was then to take those results and pass them back into a single string using FOR XML PATH.
So I guess technically, there are 2 questions here;
Why won't my function let me manipulate the results with CHARINDEX?
And is this the best way to do what I'm trying to achieve overall?
I would strongly suggest you take that splitstring function you found and throw it away. It is horribly inefficient and doesn't even take the delimiter as a parameter. There are so many better splitter options available. One such example is the DelimitedSplit8K_LEAD which can be found here.
I noticed you also have your delimiters at the beginning and the end so you have to eliminate those but not a big deal. Here is how I would go about parsing this string. I am using a variable for your string here with the value you said is in your table.
declare @Something varchar(100) = '|College of Science Administrators|Bob Jones (BJones)|'
select MyOutput = case when charindex('(', x.Item) > 1 then substring(x.Item, charindex('(', x.Item), len(x.Item)) else x.Item end
from dbo.DelimitedSplit8K_LEAD(@Something, '|') x
where x.Item > ''
For question #1, you simply need to swap the parameters in CHARINDEX: the expression to find comes first, then the string to search:
CHARINDEX('(', Name)
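For the overall plan (split, lower-case the logon, rebuild), here is one possible sketch. It swaps in the built-in STRING_SPLIT and STRING_AGG functions rather than the custom splitter and FOR XML PATH discussed above, so it assumes SQL Server 2017 or later. Neither function guarantees the order of the pieces in this form, so treat it as an outline and not a drop-in solution.

```sql
DECLARE @assignedto nvarchar(max) =
    N'|College of Science Administrators|Bob Jones (BJones)|';

SELECT STRING_AGG(
           CASE WHEN CHARINDEX('(', s.value) > 0
                -- keep the display name as-is, lower-case from '(' onwards
                THEN LEFT(s.value, CHARINDEX('(', s.value) - 1)
                     + LOWER(SUBSTRING(s.value, CHARINDEX('(', s.value), LEN(s.value)))
                ELSE s.value
           END, '|') AS formatted
FROM STRING_SPLIT(@assignedto, '|') AS s
WHERE s.value <> '';   -- skip the empty pieces produced by the leading and trailing pipes
```

Note that the leading and trailing pipes are dropped by the filter; they would need to be added back if the target system expects them.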

how do I retrieve data from a sql table with huge number of inputs for a single column

I have a Company table in SQL Server and I would like to retrieve the data for a particular list of companies. The list is very large, around 200 company names. I am trying to use the T-SQL IN clause, but a few of the companies have special characters in their names, such as O'Brien, so the query throws an error.
SELECT *
FROM COMPANY
WHERE COMPANYNAME IN
('Archer Daniels Midland'
'Shell Trading (US) Company - Financial'
'Redwood Fund, LLC'
'Bunge Global Agribusiness - Matt Thibodeaux'
'PTG, LLC'
'Morgan Stanley Capital Group'
'Vitol Inc.'..
.....
....
.....)
Above is the script that is not working for obvious reasons, is there any way I can input those company names from an excel file and retrieve the data?
The easiest way would be to make a table and join it:
CREATE TABLE dbo.IncludedCompanies (CompanyName varchar(1000))
INSERT INTO dbo.IncludedCompanies
VALUES
('Archer Daniels Midland'),
('PTG, LLC')
...
SELECT *
FROM Company C
JOIN IncludedCompanies IC
ON C.CompanyName = IC.CompanyName
I do not think that mysql knows how to handle excel format, but you can fix your query.
Check how complicated names are stored in the database (check whether they have escape characters in them or anything else).
Replace all ' with \' in your query and it will take care of the ' characters
mysql> select now() as 'O\'Brian'; returns
O'Brian
2014-03-17 15:06:39
So I'm guessing you have an Excel sheet with a column containing these names, and you want to use it in your WHERE clause. In addition, some of the values have special characters in them, which need to be escaped.
The first thing to do is escape the ' characters. You do this in Excel with a search and replace of all occurrences of ' with '' (the escaped version in SQL Server; it is \' in MySQL). Then create a new column on each side of your companies column, and in the first row enter a ' on the left-hand side and ', on the right. Then use the copy cell functionality (the little square in the bottom right of the cell when you select it) to copy the cells on the left and right to all the rows, as far as the company list goes (just grab the square and pull it downwards).
Then, take your list, now containing three columns and x rows and paste it into your favorite text editor. It should look something like this:
' Company#1 ',
' Company with special '' char ',
[...]
' Last company ',
Now, you will have some whitespace to get rid of. Use search and replace to replace two space characters with nothing, and repeat (or copy the whitespace between the first ' and the start of the text and replace that with nothing).
Now, you should have a list of:
'Company#1',
'Company with special '' char',
[...]
'Last company',
Remove the last comma, and you'll have a valid list of parameters to your in-clause (or a (temporary) table if you want to keep your query a bit cleaner.)
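As a quick check that the doubled quote behaves as described in SQL Server, here is a small self-contained sketch. The inline VALUES list stands in for the Company table, and 'O''Brien Energy' is a made-up company name used only to show the escaping.

```sql
SELECT c.CompanyName
FROM (VALUES ('Archer Daniels Midland'),
             ('O''Brien Energy'),      -- hypothetical name; '' is one literal quote
             ('PTG, LLC'),
             ('Vitol Inc.')) AS c(CompanyName)
WHERE c.CompanyName IN ('PTG, LLC',
                        'O''Brien Energy');
-- returns the rows O'Brien Energy and PTG, LLC
```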

Repetitions in field in Firebird without regex

I'm trying to craft a query which rejects a row when some field consists entirely of the same character, i.e. I want to select people named Smith but not people named aaaaaa or bbbb.
I can't use regexes, as Firebird's SIMILAR TO doesn't have backreferences.
How would you do it?
Meh, this is not what I wanted, but this will do. It works on aaaaaa, but wouldn't on abbbbbb.
SELECT *
FROM PEOPLE
WHERE replace(upper(NAME), substring(upper(NAME) FROM 1 FOR 1), '') = ''
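As written, the predicate above matches the rows made of a single repeated character. If the goal is to reject those rows and keep names like Smith, one option is to invert the comparison. This is only a sketch built on the same expression and assumes the same PEOPLE table and NAME column:

```sql
SELECT *
FROM PEOPLE
-- keep a row only if something is left after removing every copy of its first character
WHERE replace(upper(NAME), substring(upper(NAME) FROM 1 FOR 1), '') <> ''
```

Rows where NAME is NULL are also excluded, because the comparison then evaluates to unknown.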