How to replace underscore with a blank space with a regular expression in SQL - sql

I'm trying to insert post codes into my database but getting rid of the underscores.
I have a table called FeedDataSetMapping that is used to map the fields before they get inserted:
INSERT INTO FeedDataSetMapping (
[source_field]
,[database_field]
,[template_id]
,[conversion_id]
,[order_id]
,[values_group]
,[direct_value]
,[value_regex]
,[condition_regex]
,[split_separator]
,[enclosing_character]
,[cumulative_field]
,[cumulative_format])
VALUES
('manufacturerId','manufacturer_Id',#template_id,0,0,null,null,null,null,null,null,null,null),
('dealership','leasing_broker_name',#template_id,0,0,null,null,null,null,null,null,null,null),
('manufacturersDealerId','supplier_ref',#template_id,0,0,null,null,19,null,null,null,null,null),
('address1','address1',#template_id,0,0,null,null,null,null,null,null,null,null),
('address2','address2',#template_id,0,0,null,null,null,null,null,null,null,null),
('postcode','post_code',#template_id,0,0,null,null,null,null,null,null,null,null),
('telephone','telephone',#template_id,0,0,null,null,null,null,null,null,null,null),
('fax','fax_number',#template_id,0,0,null,null,null,null,null,null,null,null),
('email','email',#template_id,0,0,null,null,null,null,null,null,null,null),
('website','web_address',#template_id,0,0,null,null,null,null,null,null,null,null),
('NewCarSales','service_mask',#template_id,0,0,null,1,null,'^(?!(?i:^0$|^n$|^no$|^f$|^false$|^$))',null,null,1,null),
('UsedCarSales','service_mask',#template_id,0,0,null,2,null,'^(?!(?i:^0$|^n$|^no$|^f$|^false$|^$))',null,null,1,null),
('Servicing','service_mask',#template_id,0,0,null,8,null,'^(?!(?i:^0$|^n$|^no$|^f$|^false$|^$))',null,null,1,null),
('Repairs','service_mask',#template_id,0,0,null,16,null,'^(?!(?i:^0$|^n$|^no$|^f$|^false$|^$))',null,null,1,null),
('Longitude','longitude',#template_id,0,0,null,null,null,null,null,null,null,null),
('Latitude','latitude',#template_id,0,0,null,null,null,null,null,null,null,null)
This already contains some condition regex that in case that this field contains some text it converts it to true or false respectively.
What I need is a condition_regex that gets rid of these underscores and replaces it with a blank space i.e: 'GDB_A45' to 'GDB A45'. I don't know much about regex so any idea would be greatly appreciated. Thanks in advance!

SQL Server does not have much of regular expression support, but in this case I don't think you need it. You can do a simple replace:
UPDATE mytable
SET mycolumn = REPLACE(mycolumn, '_', ' ')
WHERE mycolumn LIKE '%[_]%'
To do this while updating you can use INSERT ... SELECT instead of INSERT ... VALUES:
INSERT INTO mytable (mycolumn)
SELECT REPLACE('my data 1', '_', ' ') UNION
SELECT REPLACE('my data 2', '_', ' ') UNION
SELECT REPLACE('my_data_3', '_', ' ') UNION
...
There will be some maximum number of unions you can do, so you should split your inserts into batches with this method.
Or, you could define a trigger on the target table that will do the job for you:
CREATE TRIGGER mytrigger ON mytable
AFTER INSERT AS
BEGIN
UPDATE mytable
SET mytable.mycolumn = REPLACE(i.mycolumn, '_', ' ')
FROM mytable
INNER JOIN inserted i
ON i.id = mytable.id
AND i.mycolumn LIKE '%[_]%'
END
... where it is assumed your table has a primary key named id.

Well after been thinking a while I got to the conclusion that would be easier if I replace the underscore from the scraped data during the scraping (in the c# code) before I generate the XML file. That would avoid me a lot of headaches. Anyway thank you for your help guys ;)

Related

Inserting space between the string (SQL Server)

I have a table called Manufacturers which contains a column Name with a data type of Varchar(100) that has 5761 entries such as XYZ(XYZ Corp).
I wish to insert a space between the Z and the (.
I have seen the STUFF and LEFT commands but can't seem to figure out if they apply to my scenario.
You can try the REPLACE (Transact-SQL)
function as shown below.
update <yourTableName>
set <yourColumName> = replace(<yourColumName>, '(', ' ( ')
where <put the conditions here>
Here is an implementation to you.
create table test (Name varchar(20))
insert into test values ('XYZ(XYZ Corp)')
--selecting before update
select * from test
--updating the record
update test
set Name = replace(Name, '(', ' (')
where name like '%(%' --Here you can add the conditions as you want to restrict the number of rows to be updated based on the available data or the patterns.
--selecting after update
select * from test
Live Demo
I would recommend:
update Manufacturers
set name = replace(name, '(', ' (')
where name like '%[^ ](%';
This will only update rows where there is not already a space before the open paren.
Note: If you have multiple open parens, it will update all of them. If that is an issue, ask a new question with appropriate sample data.

SQL server trimming string

I have a query, I want to Trim a column and get the only Right side values:
Here in the pic the result is like SubAccountCode and SubaccountName but user has entered code and name in SubaccountName. I want to trim the codes from Subaccountname and update the table. T
The query I have tried is mentioned below, but I think it's not so working:
Select Substring(Subaccountname,8,20)as Name from #temp
There are several ways to do this.
When looking at your example data the easiest way would be using an replace() in the update statement.
Syntax:
REPLACE ( string_expression , string_pattern , string_replacement )
Example:
UPDATE table_name
SET column2 = replace([column2], [column1], '')
What this does it updates the column2 with the value from column2 where the value from column1 is replaced with '' nothing. In your example this does leave you with an unwanted space in front. You could trim this or try as followed:
UPDATE Test
SET [SubAccountName] = replace([SubAccountName], [SubAccountCode] + ' ', '')
If the SubAccountCode could be different from the code in the SubAccountName and you only want to remove the first 8 characters (if you are sure it is always the first 8) you can use:
UPDATE YourTable SET SubAccountName = RIGHT(SubAccountName, LEN(SubAccountName) - 7)
Example script:
create table test (
SubAccountName varchar(100),
SubAccountCode varchar(100)
)
insert into test (SubAccountCode, SubAccountName) VALUES
(1234567, '1234567 AUBC' ),
(1234467, '1234467 AUBC' ),
(1235567, '1235567 AUBC' )
select * from test -- Check that the data is like your example.
UPDATE Test SET SubAccountName = RIGHT(SubAccountName, LEN(SubAccountName) - 8)
select * from test -- Check that the result is like your wanted result.
drop table test -- Cleanup the test table.
This code will work for you.
UPDATE TableName SET SubaccountName = REPLACE(LTRIM(RTRIM(SubaccountName)),LTRIM(RTRIM(SubAccountCode)),'')

How to apply trim function inside this query [duplicate]

Below is simple sql query to select records using in condition.
--like this I have 6000 usernames
select * from tblUsers where Username in ('abc ','xyz ',' pqr ',' mnop ' );
I know there are LTrim & Rtrim in sql to remove the leading trailing spaces form left & right respectively.
I want to remove the spaces from left & right in all the usernames that I am supplying to the select query.
Note:-
I want to trim the values that I am passing in the in clause.(I don't want to pass LTrim & RTrim to each value passed).
There are no trailing space in the records but value that I am passing in the clause is copied from excel & then pasted in Visual Studio. Then using ALT key I put '(single quote) at the left & right sides of the string. Due to this some strings has spaces in the right side trailing.
How to use the trim function in the select query?
I am using MS SQL Server 2012
If I understand your question correctly you are pasting from Excel into an IN clause in an adhoc query as below.
The trailing spaces don't matter. It will still match the string foo without any trailing spaces.
But you need to ensure that there are no leading spaces.
As the source of the data is Excel why not just do it all there?
You can use formula
= CONCATENATE("'",TRIM(SUBSTITUTE(A1,"'","''")),"',")
Then copy the result (from column B in the screenshot above) and just need to trim off the extra comma from the final entry.
You can do like this:
select * from tblUsers where LTRIM(RTRIM(Username)) in (ltrim(rtrim('abc')),ltrim(rtrim('xyz')),ltrim(rtrim('pqr')),ltrim(rtrim('mnop')));
However, if you have permission to update the database. Please remove all the spaces in your Username field. It is really not good to do the query like this.
One way to tackle your problem and still be able to benefit from an index on username is to use a persisted computed column:
Setup
-- drop table dbo.tblUsers
create table dbo.tblUsers
(
UserId INT NOT NULL IDENTITY(1, 1) CONSTRAINT PK_UserTest PRIMARY KEY,
Username NVARCHAR(64) NOT NULL,
UsernameTrimmed AS LTRIM(RTRIM(Username)) PERSISTED
)
GO
-- other columns may be included here with INCLUDE (col1, col2)
CREATE INDEX IDX_UserTest ON dbo.tblUsers (UsernameTrimmed)
GO
insert into dbo.tblUsers (Username) VALUES ('abc '),('xyz '),(' pqr '), (' mnop '), ('abc'), (' useradmin '), ('etc'), (' other user ')
GO
-- some mock data to obtain a large number of records
insert into dbo.tblUsers (Username)
select top 20000 SUBSTRING(text, 1, 64) from sys.messages
GO
Test
-- this will use the index (index seek)
select * from tblUsers where UsernameTrimmed in (LTRIM(RTRIM('abc')), LTRIM(RTRIM(' useradmin ')));
This allows for faster retrievals at the expense of extra space.
In order to get rid of query construction (and the ugliness of many LTRIMs and RTRIMs), you can push searched users in a table that looks like tblUsers.
create table dbo.searchedUsers
(
Username NVARCHAR(64) NOT NULL,
UsernameTrimmed AS LTRIM(RTRIM(Username)) PERSISTED
)
GO
Push raw values into dbo.searchedUsers.Username column and the query should look like this:
select U.*
from tblUsers AS U
join dbo.searchedUsers AS S ON S.UsernameTrimmed = U.UsernameTrimmed
The big picture
It is way better to properly trim your data in the service layer of your application (C#) so that future clients of your table may rely on decent information. So, trimming should be performed both when inserting information into tblUsers and when searching for users (IN values)
select *
from tblUsers
where RTRIM(LTRIM(Username)) in ('abc','xyz','pqr','mnop');
Answer: SELECT * FROM tblUsers WHERE LTRIM(RTRIM(Username)) in ('abc','xyz','pqr','mnop');
However, please note that if you have functions in your WHERE clause it defeats the purpose of having an indexes on that column and will use a
scan than a seek.
I would propose you clean your data before inserting into tblUsers
I think you can try this:
Just replace the table2 with you table name form where you are getting the username
select * from tblUsers where Username in ((select distinct
STUFF((SELECT distinct ', ' + RTRIM(LTRIM(t1.Username))
from table2 t1
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,2,'') UserName
from table2 t) );
I'd do it in two step:
1) populate a temp table with all your strings with blanks
2) do a select with a subselect
create table a (a char(1))
insert into a values('a')
insert into a values('b')
insert into a values('c')
insert into a values('d')
create table #b (atmp char(5))
insert into #b values ('a ')
insert into #b values (' b')
insert into #b values (' c ')
select * from a where a in (select ltrim(rtrim(atmp)) from #b)

How to replace duplicate words in a column with just one word in SQL Server

I have a few million strings that relate to file paths in my database;
due to a third party program these paths have become nested like below:
C:\files\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\unique_bit_here\
I want update the entries so that thirdparty\thirdparty\etc becomes \thirdparty.
I have tried this code:
UPDATE table
SET Field = REPLACE(Field, 'tables\thirdparty\%thirdparty\%\', 'tables\thirdparty\')
WHILE EXISTS (SELECT * FROM table WHERE Field LIKE '%\thirdparty\thirdparty\%')
BEGIN
UPDATE table SET Field = REPLACE(Field, '\thirdparty\thirdparty\', '\thirdparty\')
END
So do you want something like this?
SELECT SUBSTRING('tables\thirdparty\%thirdparty\%\',0,CHARINDEX('\','tables\thirdparty\%thirdparty\%\',0)) + '\thirdparty\'
OR
UPDATE table
SET Field = REPLACE(Field, Field, (SELECT SUBSTRING(Field,0,CHARINDEX('\',Field,0)) + '\thirdparty\'))
You can avoid using a loop with the following technique:
Update TABLE
SET Field = left(Field,charindex('*',replace(Field, 'thirdparty\', '*'))-1)+'thirdparty\'+right(Field,charindex('*',reverse(replace(Field, 'thirdparty\', '*')))-1)
Its too late, but I just guess if I want replace a single word in repeating multiple time same word as he want. This will replace all with append a single time '\thirdparty'...
Check this.
Declare #table table(Field varchar(max))
insert into #table values('C:\files\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\unique_bit_here\')
,('C:\files\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\unique_bit_here\1')
,('C:\files\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\unique_bit_here\2')
,('C:\files\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\unique_bit_here\3')
,('C:\files\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\thirdparty\unique_bit_here\4')
UPDATE #table
SET Field = SUBSTRING (Field, 1, CHARINDEX('\thirdparty', Field ) ) + 'thirdparty\'
--replace (Field , 'thirdparty\' ,'')
+ reverse( SUBSTRING ( REVERSE(Field), 1, CHARINDEX(reverse('\thirdparty'), REVERSE(Field) )-2 ) )
--REPLACE(Field, 'tables\thirdparty\%thirdparty\%\', 'tables\thirdparty\')
select * from #table

String manipulation SQL

I have a row of strings that are in the following format:
'Order was assigned to lastname,firsname'
I need to cut this string down into just the last and first name but it is always a different name for each record.
The 'Order was assigned to' part is always the same.......
Thanks
I am using SQL Server. It is multiple records with different names in each record.
In your specific case you can use something like:
SELECT SUBSTRING(str, 23) FROM table
However, this is not very scalable, should the format of your strings ever change.
If you are using an Oracle database, you would want to use SUBSTR instead.
Edit:
For databases where the third parameter is not optional, you could use SUBSTRING(str, 23, LEN(str))
Somebody would have to test to see if this is better or worse than subtraction, as in Martin Smith's solution but gives you the same result in the end.
In addition to the SUBSTRING methods, you could also use a REPLACE function. I don't know which would have better performance over millions of rows, although I suspect that it would be the SUBSTRING - especially if you were working with CHAR instead of VARCHAR.
SELECT REPLACE(my_column, 'Order was assigned to ', '')
For SQL Server
WITH testData AS
(
SELECT 'Order was assigned to lastname,firsname' as Col1 UNION ALL
SELECT 'Order was assigned to Bloggs, Jo' as Col1
)
SELECT SUBSTRING(Col1,23,LEN(Col1)-22) AS Name
from testData
Returns
Name
---------------------------------------
lastname,firsname
Bloggs, Jo
on MS SQL Server:
declare #str varchar(100) = 'Order was assigned to lastname,firsname'
declare #strLen1 int = DATALENGTH('Order was assigned to ')
declare #strLen2 int = len(#str)
select #strlen1, #strLen2, substring(#str,#strLen1,#strLen2),
RIGHT(#str, #strlen2-#strlen1)
I would require that a colon or some other delimiter be between the message and the name.
Then you could just search for the index of that character and know that anything after it was the data you need...
Example with format changing over time:
CREATE TABLE #Temp (OrderInfo NVARCHAR(MAX))
INSERT INTO #Temp VALUES ('Order was assigned to :Smith,Mary')
INSERT INTO #Temp VALUES ('Order was assigned to :Holmes,Larry')
INSERT INTO #Temp VALUES ('New Format over time :LootAt,Me')
SELECT SUBSTRING(OrderInfo, CHARINDEX(':',OrderInfo)+1, LEN(OrderInfo))
FROM #Temp
DROP TABLE #Temp