Multiple replacements in string in single Update Statement in SQL server 2005 - sql

I've a table 'tblRandomString' with following data:
ID ItemValue
1 *Test"
2 ?Test*
I've another table 'tblSearchCharReplacement' with following data
Original Replacement
* `star`
? `quest`
" `quot`
; `semi`
Now, I want to make a replacement in the ItemValues using these replacement.
I tried this:
Update T1
SET ItemValue = select REPLACE(ItemValue,[Original],[Replacement])
FROM dbo.tblRandomString T1
JOIN
dbo.tblSpecialCharReplacement T2
ON T2.Original IN ('"',';','*','?')
But it doesnt help me because only one replacement is done per update.
One solution is I've to use as a CTE to perform multiple replacements if they exist.
Is there a simpler way?

Sample data:
declare #RandomString table (ID int not null,ItemValue varchar(500) not null)
insert into #RandomString(ID,ItemValue) values
(1,'*Test"'),
(2,'?Test*')
declare #SearchCharReplacement table (Original varchar(500) not null,Replacement varchar(500) not null)
insert into #SearchCharReplacement(Original,Replacement) values
('*','`star`'),
('?','`quest`'),
('"','`quot`'),
(';','`semi`')
And the UPDATE:
;With Replacements as (
select
ID,ItemValue,0 as RepCount
from
#RandomString
union all
select
ID,SUBSTRING(REPLACE(ItemValue,Original,Replacement),1,500),rs.RepCount+1
from
Replacements rs
inner join
#SearchCharReplacement scr
on
CHARINDEX(scr.Original,rs.ItemValue) > 0
), FinalReplacements as (
select
ID,ItemValue,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RepCount desc) as rn
from
Replacements
)
update rs
set ItemValue = fr.ItemValue
from
#RandomString rs
inner join
FinalReplacements fr
on
rs.ID = fr.ID and
rn = 1
Which produces:
select * from #RandomString
ID ItemValue
----------- -----------------------
1 `star`Test`quot`
2 `quest`Test`star`
What this does is it starts with the unaltered texts (the top select in Replacements), then it attempts to apply any valid replacements (the second select in Replacements). What it will do is to continue applying this second select, based on any results it produces, until no new rows are produced. This is called a Recursive Common Table Expression (CTE).
We then use a second CTE (a non-recursive one this time) FinalReplacements to number all of the rows produced by the first CTE, assigning lower row numbers to rows which were produced last. Logically, these are the rows which were the result of applying the last applicable transform, and so will no longer contain any of the original characters to be replaced. So we can use the row number 1 to perform the update back against the original table.
This query does do more work than strictly necessary - for small numbers of rows of replacement characters, it's not likely to be too inefficient. We could clear it up by defining a single order in which to apply the replacements.

Will skipping the join table and nesting REPLACE functions work?
Or do you need to actually get the data from the other table?
-- perform 4 replaces in a single update statement
UPDATE T1
SET ItemValue = REPLACE(
REPLACE(
REPLACE(
REPLACE(
ItemValue,'*','star')
ItemValue,'?','quest')
ItemValue,'"','quot')
ItemValue,';','semi')
Note: I'm not sure if you need to escape any of the characters you're replacing

Related

Loop through table and update a specific column

I have the following table:
Id
Category
1
some thing
2
value
This table contains a lot of rows and what I'm trying to do is to update all the Category values to change every first letter to caps. For example, some thing should be Some Thing.
At the moment this is what I have:
UPDATE MyTable
SET Category = (SELECT UPPER(LEFT(Category,1))+LOWER(SUBSTRING(Category,2,LEN(Category))) FROM MyTable WHERE Id = 1)
WHERE Id = 1;
But there are two problems, the first one is trying to change the Category Value to upper, because only works ok for 1 len words (hello=> Hello, hello world => Hello world) and the second one is that I'll need to run this query X times following the Where Id = X logic. So my question is how can I update X rows? I was thinking in a cursor but I don't have too much experience with it.
Here is a fiddle to play with.
You can split the words apart, apply the capitalization, then munge the words back together. No, you shouldn't be worrying about subqueries and Id because you should always approach updating a set of rows as a set-based operation and not one row at a time.
;WITH cte AS
(
SELECT Id, NewCat = STRING_AGG(CONCAT(
UPPER(LEFT(value,1)),
SUBSTRING(value,2,57)), ' ')
WITHIN GROUP (ORDER BY CHARINDEX(value, Category))
FROM
(
SELECT t.Id, t.Category, s.value
FROM dbo.MyTable AS t
CROSS APPLY STRING_SPLIT(Category, ' ') AS s
) AS x GROUP BY Id
)
UPDATE t
SET t.Category = cte.NewCat
FROM dbo.MyTable AS t
INNER JOIN cte ON t.Id = cte.Id;
This assumes your category doesn't have non-consecutive duplicates within it; for example, bora frickin bora would get messed up (meanwhile bora bora fickin would be fine). It also assumes a case insensitive collation (which could be catered to if necessary).
In Azure SQL Database you can use the new enable_ordinal argument to STRING_SPLIT() but, for now, you'll have to rely on hacks like CHARINDEX().
Updated db<>fiddle (thank you for the head start!)

Random selection in SQL Server

I have two sets and for each value in the first set I want to apply a number of random values from the second. The approach I have chosen uses a select from the first with a cross apply from the second. A simplified MWE is as follows:
DROP TABLE IF EXISTS #S;
CREATE TABLE #S (c CHAR(1));
INSERT INTO #S VALUES ('A'), ('B');
DROP TABLE IF EXISTS #T;
WITH idGen(id) AS (
SELECT 1
UNION ALL
SELECT id + 1 FROM idGen WHERE id < 1000
)
SELECT id INTO #T FROM idGen OPTION(MAXRECURSION 0);
DROP TABLE IF EXISTS #R;
SELECT c, id INTO #R FROM #S
CROSS APPLY (
SELECT id, ROW_NUMBER() OVER (
/*
-- this gives 100% overlap
PARTITION BY c
ORDER BY RAND(CHECKSUM(NEWID()))
*/
-- this gives the expected ~10% overlap
ORDER BY RAND(CHECKSUM(NEWID()) + CHECKSUM(c))
) AS R
FROM #T
) t
WHERE t.R <= 100;
SELECT COUNT(*) AS PercentOverlap -- ~10%
FROM #R rA JOIN #R rB
ON rB.id = rA.id AND rB.c = 'B'
WHERE rA.c = 'A';
While this solution works, I am wondering why changing to the (commented) partitioning method does not? Also, are there any caveats using this solution, seeing as it feels sort of dirty to add two checksums?
In the actual problem there is also a count in the first set containing the number of random values to select from the second set, which replaces the static 100 in the example above. However, using the fixed 100 made it easy to verify the expected overlap.
RAND() function is a run-time constant in SQL Server. It means that usually it is evaluated once for the query. When you pass a value to RAND this value serves as a starting seed.
You need to examine execution plan and you'll see where optimiser puts evaluation of the functions. It the case which doesn't produce expected result most likely optimiser has optimised it too aggressively and moved all "randomness" outside the loop.
Also, there is no point wrapping NEWID() into CHECKSUM() and into RAND().
Simple NEWID() is enough. Or, even better, a function that is designed to produce a random number, such as CRYPT_GEN_RANDOM()
Either version of your query looks a bit strange. I'd write it like this:
SELECT c, id INTO #R
FROM #S
CROSS APPLY
(
SELECT TOP(100) -- or #S.SomeField instead of 100
id
FROM #T
ORDER BY CRYPT_GEN_RANDOM(4) -- generate 4 random bytes, usually it is enough
) AS t
;
This gives 100 random rows from #T for each row from #S.
Actually, the query above is not good. Optimiser again sees that inner query (inside the CROSS APPLY) doesn't depend on outer query and optimises it away.
End result is that random rows are selected only once.
We need something to make optimiser run the inner query for each row from #S.
One way would be something like this:
SELECT c, id INTO #R
FROM #S
CROSS APPLY
(
SELECT TOP(100) -- or #S.SomeField instead of 100
id
FROM #T
ORDER BY CRYPT_GEN_RANDOM(4) + CHECKSUM(c)
) AS t
;
Something in the inner query to reference the row from the outer query. If you put TOP(#S.SomeField) instead of constant TOP(100), then + CHECKSUM(c) is not needed.
This is the plan for the first variant. You can see that #T is scanned once (1000 rows are read).
This is the plan for the second variant. You can see that #T is scanned twice (2000 rows are read).

compare some lists in where condition sql

I have some question in Sqlserver2012. I have a table that contains a filed that save who System Used from this information and separated by ',', I want to set into parameter the name of Systems and query the related rows:
declare #System nvarchar(50)
set #System ='BPM,SEM'
SELECT *
FROM dbo.tblMeasureCatalog t1
where ( ( select Upper(value) from dbo.split(t1.System,','))
= any( select Upper(value) from dbo.split(#System,',')))
dbo.split is a function to return systems in separated rows
Forgetting for a second that storing delimited lists in a relational database is abhorrent, you can do it using a combination of INTERSECT and EXISTS, for example:
DECLARE #System NVARCHAR(50) = 'BPM,SEM';
DECLARE #tblMeasureCatalog TABLE (System VARCHAR(MAX));
INSERT #tblMeasureCatalog VALUES ('BPM,XXX'), ('BPM,SEM'), ('XXX,SEM'), ('XXX,YYY');
SELECT mc.System
FROM #tblMeasureCatalog AS mc
WHERE EXISTS
( SELECT Value
FROM dbo.Split(mc.System, ',')
INTERSECT
SELECT Value
FROM dbo.Split(#System, ',')
);
Returns
System
---------
BPM,XXX
BPM,SEM
XXX,SEM
EDIT
Based on your question stating "Any" I assumed that you wanted rows where the terms matched any of those provided, based on your comment I now assume you want records where the terms match all. This is a fairly similar approach but you need to use NOT EXISTS and EXCEPT instead:
Now all is still quite ambiguous, for example if you search for "BMP,SEM" should it return a record that is "BPM,SEM,YYY", it does contain all of the searched terms, but it does contain additional terms too. So the approach you need depends on your requirements:
DECLARE #System NVARCHAR(50) = 'BPM,SEM,XXX';
DECLARE #tblMeasureCatalog TABLE (System VARCHAR(MAX));
INSERT #tblMeasureCatalog
VALUES
('BPM,XXX'), ('BPM,SEM'), ('XXX,SEM'), ('XXX,YYY'),
('SEM,BPM'), ('SEM,BPM,XXX'), ('SEM,BPM,XXX,YYY');
-- METHOD 1 - CONTAINS ALL SEARCHED TERMS BUT CAN CONTAIN ADDITIONAL TERMS
SELECT mc.System
FROM #tblMeasureCatalog AS mc
WHERE NOT EXISTS
(
SELECT Value
FROM dbo.Split(#System, ',')
EXCEPT
SELECT Value
FROM dbo.Split(mc.System, ',')
);
-- METHOD 2 - ONLY CONTAINS ITEMS WITHIN THE SEARCHED TERMS, BUT NOT
-- NECESSARILY ALL OF THEM
SELECT mc.System
FROM #tblMeasureCatalog AS mc
WHERE NOT EXISTS
( SELECT Value
FROM dbo.Split(mc.System, ',')
EXCEPT
SELECT Value
FROM dbo.Split(#System, ',')
);
-- METHOD 3 - CONTAINS ALL ITEMS IN THE SEARCHED TERMS, AND NO ADDITIONAL ITEMS
SELECT mc.System
FROM #tblMeasureCatalog AS mc
WHERE NOT EXISTS
( SELECT Value
FROM dbo.Split(#System, ',')
EXCEPT
SELECT Value
FROM dbo.Split(mc.System, ',')
)
AND LEN(mc.System) = LEN(#System);
You have a problem with your data structure because you are storing lists of things in a comma-delimited list. SQL has a great data structure for storing lists. It goes by the name "table". You should have a junction table with one row per "measure catalog" and "system".
Sometimes, you are stuck with other people's really bad design decisions. One solution is to use split(). Here is one method:
select mc.*
from dbo.tblMeasureCatalog mc
where exists (select 1
from dbo.split(t1.System, ',') t1s join
dbo.split(#System, ',') ss
on upper(t1s.value) = upper(ss.value)
);
you can try this :
declare #System nvarchar(50)
set #System ='BPM,SEM'
SELECT * from dbo.tblMeasureCatalog t1 inner join dbo.Split (#System ,',') B on t1.it=B.items

SQL IN() operator with condition inside

I've got table with few numbers inside (or even empty): #states table (value int)
And I need to make SELECT from another table with WHERE clause by definite column.
This column's values must match one of #states numbers or if #states is empty then accept all values (like there is no WHERE condition for this column).
So I tried something like this:
select *
from dbo.tbl_docs docs
where
docs.doc_state in(iif(exists(select 1 from #states), (select value from #states), docs.doc_state))
Unfortunately iif() can't return subquery resulting dataset. I tried different variations with iif() and CASE but it wasn't successful. How to make this condition?
select *
from dbo.tbl_docs docs
where
(
(select count(*) from #states) > 0
AND
docs.doc_state in(select value from #states)
)
OR
(
(select count(*) from #states)=0
AND 1=1
)
Wouldn't a left join do?
declare #statesCount int;
select #statesCount = count(1) from #states;
select
docs.*
from dbo.tbl_docs docs
left join #states s on docs.doc_state = s.value
where s.value is not null or #statesCount = 0;
In general, whenever your query contains sub-queries, you should stop for five minutes, and think hard about whether you really need a sub-query at all.
And if you've got a server capable of doing that, in many cases it might be better to preprocess the input parameters first, or perhaps use constructs such as MS SQL's with.
select *
from dbo.tbl_docs docs
where exists (select 1 from #states where value = doc_state)
or not exists (select 1 from #state)

Solution to avoid non-sargable argument in where clause

In the code_list CTE in this query I have a row constructor that will eventually take any number of arguments. The column icd in the patient_codes CTE is a five digit identifier that is most descriptive that the three digit codes that the row constructor has. The table icd_patient has a 100 million rows so for performance's sake, I would like to filer the rows on this table before I do any further work. I have
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select distinct icd,pat_id,id
from icd_patient
where icd in (select icd from code_list)
)
select distinct pat_id from patient_codes
The problem is, however, is that in the icd_patient table all of the icd columns are five digit and more descriptive. If I look at the execution plan of this query it's pretty streamlined. If I do
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select substring(icd,1,3) as icd,pat_id
from icd_patient2
where substring(icd,1,3) in (select * from code_list)
)
select * from patient_codes
this if course has a large performance impact because of the substring expression in the where clause. Does something akin to like in exist so I can take advantage of my indexes?
Index on icd_patient
CREATE NONCLUSTERED INDEX [ix_icd_patient] ON [dbo].[icd_patient2]
(
[pat_id] ASC
)
INCLUDE ( [id],
This much simpler query should be better than (or, at worst, the same as) your existing query.
select pat_id
FROM dbo.icd_patient
where icd LIKE '707%'
OR icd LIKE '250%'
GROUP BY pat_id;
Note that sargability only matters if there is actually an index on this column.
An alternative (since OR can sometimes give the optimizer fits):
SELECT pat_id FROM
(
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '707%'
UNION ALL
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '250%'
) AS x
GROUP BY pat_id;
To make this extensible beyond a handful of OR conditions, I would use a table-valued parameter (TVP).
CREATE TYPE dbo.StringPatterns AS TABLE(s VARCHAR(3) PRIMARY KEY);
Then your stored procedure could say:
CREATE PROCEDURE dbo.whatever
#sp dbo.StringPatterns READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT p.pat_id
FROM dbo.icd_patient AS p
INNER JOIN #sp AS sp
ON p.pat_id LIKE sp.s + '%'
GROUP BY p.pat_id;
END
Then you can pass in your set of three-character substrings from a DataTable or other collection in C#. From T-SQL just as an example:
DECLARE #p dbo.StringPatterns;
INSERT #p VALUES('707'),('250');
EXEC dbo.whatever #sp = #p;
Something like like in does not exist. The following is sargable:
select *
from icd_patient
where icd like '70700%' or
icd like '25002%'
Because like with a constant initial substring is a special case for SQL Server. This does not work when the strings on the right are variables.
One solution is to create an indexed view on the icd_patient table with an index on the first five characters of the icd code.
Using "IN" makes that part of a command non-sargable on both sides. End of discussion.
Saying he fixes it using substring, completely changes what it would return while it remains non sarged.
Any "fix" should exactly match results. The actual fix is to join the cte so the five characters match or put three characters in the cte and match that in a join or put 4 characters in the cte where the fourth is "%" and join matching by using LIKE
Using a "like" that starts with "%" increases the complexity of the search, but it would still use the index to find the value because parsing the index should use less reading by only getting the full table row when a search is successful.