I have problem in optimizing an SQL query to do some data cleansing.
In fact, I have a table which is a sort of referential of a multiple special characters and word. Let's call it ABNORMAL(ID,PATTERN)
I have also another table INDIVIDUALS containing a column (NAME) which I want to clean by removing from it all characters that exist in the table ABNORMAL.
Currently, I have tried to use update statements, but I'm not sure if there is a better way to do this.
Approach one
Use a while loop to build a replace containing all characters from ABNORMALS by a blank '' and do one update using the built-in REPLACE
DECLARE #REPLACE_EXPRESSION nvarchar(max) ='REPLACE(NAME,'''','''')'
DECLARE #i int = 1
DECLARE #nbr int = (SELECT COUNT(*) FROM ABNORMAL)
-- CURRENT_CHARAC
DECLARE #CURRENT_CHARAC nvarchar(max)
-- NEW REPLACE EXPRESSION TO IMBRICATE INTO THE REPLACE EXPRESSION VARIABLE
DECLARE #CURR_REP NVARCHAR(max)
-- STRING TO BUILD AN SQL QUERY CONTAINING THE REPLACE EXPRESSION
DECLARE #UPDATE_QUERY nvarchar(max)
WHILE #i < #nbr
BEGIN
SELECT #CURRENT_CHARAC=PATTERN FROM CLEANSING_STG_PRISM_FRA_REF_UNSIGNIFICANT_VALUES WHERE ID_PATTERN=#i ;
SET #REPLACE_EXPRESSION = REPLACE(#REPLACE_EXPRESSION ,'NAME','REPLACE(NAME,'+''''+#CURRENT_CHARAC+''''+','''')')
set #i=#i+1
END
SET #UPDATE_QUERY = 'UPDATE INDIVIDUAL SET NAME ='+ #REPLACE_EXPRESSION
EXEC sp_executesql #UPDATE_QUERY
Approach two
Use a while loop to select every character in abnormal and do an update using replace containing the characters to remove:
DECLARE #i int = 1
DECLARE #nbr int = (SELECT COUNT(*) FROM ABNORMAL)
-- CURRENT_CHARAC
DECLARE #CURRENT_CHARAC nvarchar(max)
-- STRING TO BUILD AN SQL QUERY CONTAINING THE REPLACE EXPRESSION
DECLARE #UPDATE_QUERY nvarchar(max)
WHILE #i < #nbr
BEGIN
SELECT #CURRENT_CHARAC=PATTERN FROM CLEANSING_STG_PRISM_FRA_REF_UNSIGNIFICANT_VALUES WHERE ID_PATTERN=#i ;
UPDATE INDIVIDUAL
SET NAME = REPLACE(NAME,#CURRENT_CHARAC,'')
SET #i=#i+1
END
I already tested both approaches for 2 millions records, and I found that the first approach is faster than the second. I would know if you have already done something similar and new (better) ideas to try.
If you are using SQL Server 2017 you could use TRANSLATE and avoid dynamic SQL:
SELECT i.*
, REPLACE(TRANSLATE(i.NAME, f, REPLICATE('!', s.l)), '!', '') AS cleansed
FROM INDIVIDUALS i
OUTER APPLY (SELECT STRING_AGG(PATTERN, '') AS f
,LEN(STRING_AGG(PATTERN,'')) AS l
FROM ABNORMAL) AS s
DBFiddle Demo
Anyway 1st approach is better becasue you do one UPDATE, with second approach you remove characters one character at time (so you will have multiple UPDATE).
I would also track transaction log growth with both approaches.
If there's not too many characters that to be cleaned, then this trick might work.
Basically, you build 1 big update statement with a replace for each value in the table with the characters to be removed.
Example code:
Test data (using temp tables)
create table #ABNORMAL_CHARACTERS (id int identity(1,1), chr varchar(30));
insert into #ABNORMAL_CHARACTERS (chr) values ('!'),('&'),('#');
create table #INDIVIDUAL (id int identity(1,1), name varchar(30));
insert into #INDIVIDUAL (name) values ('test 1 &'),('test !'),('test 3');
Code:
declare #FieldName varchar(30) = 'name';
declare #Replaces varchar(max) = #FieldName;
declare #UpdateSQL varchar(max);
select #Replaces = concat('replace('+#Replaces+', ', ''''+chr+''','''')') from #ABNORMAL_CHARACTERS order by id;
set #UpdateSQL = 'update #INDIVIDUAL
set name = '+#Replaces + '
where exists (select 1 from #ABNORMAL_CHARACTERS where charindex(chr,name)>0)';
exec (#UpdateSQL);
select * from #INDIVIDUAL;
A test here on rextester
And if you would have a UDF that can do a regex replace.
For example here
Then the #Replaces variable could be simplified with only 1 RegexReplace function and a pattern.
Related
My table has column names m1,m2,m3...,m12.
I'm using iterator to select them and insert them one by one in another table.
In this iterator I'm trying to generate filed names with:
'['+concat('m',cast(#P_MONTH as nvarchar))+']'
where #P_MONTH is incrementing in each loop.
so for #P_MONTH = 1 this suppose to give [m1] which works fine.
But when I run query I get:
Conversion failed when converting the nvarchar value '[m1]' to data
type int.
And if I put simply [m1] in that select it works ok.
How to concat filed name so it can be actually interpreted as filed name from certain table?
EDIT
Here is full query:
DECLARE #SQLString nvarchar(500),
#P_YEAR int,
#P_MONTH int = 1
set #P_YEAR = 2018
WHILE #P_MONTH < 13
BEGIN
SET #SQLString =
'INSERT INTO [dbo].[MASTER_TABLE]
(sector,serial,
date, number, source)'+
'SELECT ' + '[SECTOR],[DEPARTMENT]' +
QUOTENAME(cast(CONVERT(datetime,CONVERT(VARCHAR(4),#P_YEAR)+RIGHT('0'+CONVERT(VARCHAR(2),#P_MONTH),2)+'01',5) as nvarchar))+
QUOTENAME ('M',cast(#P_MONTH as nvarchar)) +
'EMPLOYED' +
'FROM [dbo].[STATS]'+
'where YEAR= #P_YEAR'
EXECUTE sp_executesql #SQLString
SET #P_MONTH = #P_MONTH + 1
END
It's still not working. It executes successfully but it does nothing.
Good day,
Let's create a simple table for the sake of the explanation
DROP TABLE IF EXISTS T
GO
CREATE TABLE T(a1 INT)
GO
INSERT T(a1) VALUES (1),(2)
GO
SELECT a1 FROM T
GO
When we are using a query like bellow, the server parse the text as a value and not as a column name
DECLARE #String NVARCHAR(10)
SELECT #String = '1'
--
SELECT '['+concat('a',cast(#String as nvarchar))+']'
FROM T
GO
This mean that the result will be 2 rows with no name for the column and the value will be "[a1]"
Moreover, the above query uses the brackets as part of the string.
One simple solution is to use the function QUOTENAME in order to add brackets around a name.
Another issue in this approach is the optional risk of SQL Injection. QUOTENAME might not be perfect solution but can help in this as well.
If we need to use entities name dynamically like in this case the column name then for most cases using dynamic query is the best solution. This mean to use the Stored Procedure sp_executesql as bellow
DECLARE #String INT
SELECT #String = 1
DECLARE #SQLString nvarchar(500);
SET #SQLString =
'SELECT ' + QUOTENAME(concat('a',cast(#String as nvarchar))) + ' FROM T'
EXECUTE sp_executesql #SQLString
GO
I'm asking this question for SQL Server 2008 R2
I'd like to know if there is a way to create multiple functions in a single batch statement.
I've made the following code as an example; suppose I want to take a character string and rearrange its letters in alphabetical order. So, 'Hello' would become 'eHllo'
CREATE FUNCTION char_split (#string varchar(max))
RETURNS #characters TABLE
(
chars varchar(2)
)
AS
BEGIN
DECLARE #length int,
#K int
SET #length = len(#string)
SET #K = 1
WHILE #K < #length+1
BEGIN
INSERT INTO #characters
SELECT SUBSTRING(#string,#K,1)
SET #K = #K+1
END
RETURN
END
CREATE FUNCTION rearrange (#string varchar(max))
RETURNS varchar(max)
AS
BEGIN
DECLARE #SplitData TABLE (
chars varchar(2)
)
INSERT INTO #SplitData SELECT * FROM char_split(#string)
DECLARE #Output varchar(max)
SELECT #Output = coalesce(#Output,' ') + cast(chars as varchar(10))
from #SplitData
order by chars asc
RETURN #Output
END
declare #string varchar(max)
set #string = 'Hello'
select dbo.rearrange(#string)
When I try running this code, I get this error:
'CREATE FUNCTION' must be the first statement in a query batch.
I tried enclosing each function in a BEGIN END block, but no luck. Any advice?
Just use a GO statement between the definition of the UDFs
Not doable. SImple like that.
YOu can make it is one statement using a GO between them.
But as the GO is a batch delimiter.... this means you send multiple batches, which is explicitly NOT Wanted in your question.
So, no - it is not possible to do that in one batch as the error clearly indicates.
This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
SQL Multiple Parameter Values
SQL Server (2008) Pass ArrayList or String to SP for IN()
I would like to SELECT some rows from a table that have certain values which are not known at the time a stored procedure is written. For example, searching for books of a particular type or types in a library database:
SELECT * FROM Books WHERE Type IN (_expr_);
Where I want _expr_ to be ('Humor', 'Thriller') one run, and maybe ('Education') the next, depending on the user's choices. How can I vary the expression at run-time?
Unfortunately, I still have a lot to learn about SQL in general and am not sure if I'm even asking a question that makes sense. I would appreciate any guidance!
This is trickier than you might think in SQL Server 2005 (2008 has table valued parameters which makes it easier)
See http://www.sommarskog.se/arrays-in-sql-2005.html for a review of the methods.
I feel like I've answered this question before...
anyway, I've long used the following user defined split function:
Usage: dbo.Split("#ParamName", ",") where the 2nd parameter is the separator.
You can then join this onto a table, as it returns a table value function with the elementID and Element.
CREATE FUNCTION [dbo].[Split]
(
#vcDelimitedString varchar(max),
#vcDelimiter varchar(100)
)
RETURNS #tblArray TABLE
(
ElementID smallint IDENTITY(1,1), --Array index
Element varchar(1000) --Array element contents
)
AS
BEGIN
DECLARE #siIndex smallint, #siStart smallint, #siDelSize smallint
SET #siDelSize = LEN(#vcDelimiter)
--loop through source string and add elements to destination table array
WHILE LEN(#vcDelimitedString) > 0
BEGIN
SET #siIndex = CHARINDEX(#vcDelimiter, #vcDelimitedString)
IF #siIndex = 0
BEGIN
INSERT INTO #tblArray VALUES(#vcDelimitedString)
BREAK
END
ELSE
BEGIN
INSERT INTO #tblArray VALUES(SUBSTRING(#vcDelimitedString, 1,#siIndex - 1))
SET #siStart = #siIndex + #siDelSize
SET #vcDelimitedString = SUBSTRING(#vcDelimitedString, #siStart , LEN(#vcDelimitedString) - #siStart + 1)
END
END
RETURN
END
another approach, is to build a sql string and use execute to execute it. The string is of "INSERT...SELECT form" and inserts the results into a temporary table. Then you select from the temp.
declare #sql varchar(1000)
set #sql = 'INSERT INTO sometemptable SELECT * FROM Books WHERE Type IN ('
set #sql = #sql + {code that builds a syntactically correct list}
set #sql = #sql + ')'
execute #s_sql
select * from sometemptable
What you do here for sql server 2005 and prior is put the user parameters in a table, and then select from the table:
select columns
from books
where type in
(
select choices
from userchoices
where sessionkey= #sessionkey and userid= #userid
)
I am having a small problem with the IN SQL statement. I was just wondering if anyone could help me?
#Ids = "1,2,3,4,5"
SELECT * FROM Nav WHERE CONVERT(VARCHAR,NavigationID) IN (CONVERT(VARCHAR,#Ids))
This is coming back with the error below, I am sure this is pretty simple!
Conversion failed when converting the varchar value '1,' to data type int.
The SQL IN clause does not accept a single variable to represent a list of values -- no database does, without using dynamic SQL. Otherwise, you could use a Table Valued Function (SQL Server 2000+) to pull the values out of the list & return them as a table that you can join against.
Dynamic SQL example:
EXEC('SELECT *
FROM Nav
WHERE NavigationID IN ('+ #Ids +')')
I recommend reading The curse and blessings of dynamic SQL before using dynamic SQL on SQL Server.
Jason:
First create a function like this
Create FUNCTION [dbo].[ftDelimitedAsTable](#dlm char, #string varchar(8000))
RETURNS
--------------------------------------------------------------------------*/
/*------------------------------------------------------------------------
declare #dlm char, #string varchar(1000)
set #dlm=','; set #string='t1,t2,t3';
-- tHIS FUNCION RETUNRS IN THE ASCENDING ORDER
-- 19TH Apr 06
------------------------------------------------------------------------*/
--declare
#table_var TABLE
(id int identity(1,1),
r varchar(1000)
)
AS
BEGIN
declare #n int,#i int
set #n=dbo.fnCountChars(#dlm,#string)+1
SET #I =1
while #I <= #N
begin
insert #table_var
select dbo.fsDelimitedString(#dlm,#string,#i)
set #I= #I+1
end
if #n =1 insert #TABLE_VAR VALUES(#STRING)
delete from #table_var where r=''
return
END
And then
set quoted_identifier off
declare #ids varchar(max)
select #Ids = "1,2,3,4,5"
declare #nav table ( navigationid int identity(1,1),theother bigint)
insert #nav(theother) select 10 union select 11 union select 15
SELECT * FROM #Nav WHERE CONVERT(VARCHAR,NavigationID) IN (select id from dbo.ftDelimitedAsTable(',',#Ids))
select * from dbo.ftDelimitedAsTable(',',#Ids)
What you're doing is not possible with the SQL IN statement. You cannot pass a string to it and expect that string to be parsed. IN is for specific, hard-coded values.
There are two ways to do what you want to do here.
One is to create a 'dynamic sql' query and execute it, after substituting in your IN list.
DECLARE #query varchar(max);
SET #query = 'SELECT * FROM Nav WHERE CONVERT(VARCHAR,NavigationID) IN (' + #Ids + ')'
exec (#query)
This can have performance impacts and other complications. Generally I'd try to avoid it.
The other method is to use a User Defined Function (UDF) to split the string into its component parts and then query against that.
There's a post detailing how to create that function here
Once the function exists, it's trivial to join onto it
SELECT * FROM Nav
CROSS APPLY dbo.StringSplit(#Ids) a
WHERE a.s = CONVERT(varchar, Nav.NavigationId)
NB- the 'a.s' field reference is based on the linked function, which stores the split value in a column named 's'. This may differ based on the implementation of your string split function
This is nice because it uses a set based approach to the query rather than an IN subquery, but a CROSS JOIN may be a little complex for the moment, so if you want to maintain the IN syntax then the following should work:
SELECT * FROM Nav
WHERE Nav.NavigationId IN
(SELECT CONVERT(int, a.s) AS Value
FROM dbo.StringSplit(#Ids) a
I have a SQL stored procedure of the form
SELECT [fields] FROM [table] WHERE #whereSql
I want to pass the procedure an argument (#whereSql) which specifies the entire WHERE clause, but the following error is returned:
An expression of non-boolean type specified in a context where a condition is expected
Can this be done?
The short answer is that you can't do it like this -- SQL Server looks at the contents of a variable as a VALUE. It doesn't dynamically build up the string to execute (which is why this is the correct way to avoid SQL injection attacks).
You should make every effort to avoid a dynamic WHERE as you're trying to do, largely for this reason, but also for the sake of efficiency. Instead, try to build up the WHERE clause so that it short-circuits pieces with lots of ORs, depending on the situation.
If there's no way around it, you can still build a string of your own assembled from the pieces of the command, and then EXEC it.
So you could do this:
DECLARE #mywhere VARCHAR(500)
DECLARE #mystmt VARCHAR(1000)
SET #mywhere = ' WHERE MfgPartNumber LIKE ''a%'' '
SELECT #mystmt = 'SELECT TOP 100 * FROM Products.Product AS p ' + #mywhere + ';'
EXEC( #mystmt )
But I recommend instead that you do this:
SELECT TOP 100 *
FROM Products.Product AS p
WHERE
( MfgPartNumber LIKE 'a%' AND ModeMfrPartNumStartsWith=1)
OR ( CategoryID = 123 AND ModeCategory=1 )
I believe this can be done using Dynamic SQL. See below:
CREATE PROCEDURE [dbo].[myProc]
#whereSql nvarchar(256)
AS
EXEC('SELECT [fields] FROM [table] WHERE ' + #whereSql)
GO
That said, you should do some serious research on dynamic SQL before you actually use it.
Here are a few links that I came across after a quick search:
http://www.sommarskog.se/dynamic_sql.html
http://msdn.microsoft.com/en-us/library/aa224806%28SQL.80%29.aspx
http://www.itjungle.com/fhg/fhg100505-story02.html
Make sure you read this fully
www.sommarskog.se/dynamic_sql.html
Dynamic SQL listed in some of the Answers is definitely a solution. However, if Dynamic SQL needs to be avoided, one of the solutions that I prefer is to make use of table variables (or temp tables) to store the parameter value that is used for comparison in WHERE clause.
Here is an example Stored Procedure implementation.
CREATE PROCEDURE [dbo].[myStoredProc]
#parameter1 varchar(50)
AS
declare #myTempTableVar Table(param1 varchar(50))
insert into #myTempTableVar values(#parameter1)
select * from MyTable where MyColumn in (select param1 from #myTempTableVar)
GO
In case you want to pass in multiple values, then the comma separated values can be stored as rows in the table variable and used in the same way for comparison.
CREATE PROCEDURE [dbo].[myStoredProc]
#parameter1 varchar(50)
AS
--Code Block to Convert Comma Seperated Parameter into Values of a Temporary Table Variable
declare #myTempTableVar Table(param1 varchar(50))
declare #index int =0, #tempString varchar(10)
if charindex(',',#parameter1) > 0
begin
set #index = charindex(',',#parameter1)
while #index > 0
begin
set #tempString = SubString(#parameter1,1,#index-1)
insert into #myTempTableVar values (#tempString)
set #parameter1 = SubString(#parameter1,#index+1,len(#parameter1)-#index)
set #index = charindex(',',#parameter1)
end
set #tempString = #parameter1
insert into #myTempTableVar values (#tempString)
end
else
insert into #myTempTableVar values (#parameter1)
select * from MyTable where MyColumn in (select param1 from #myTempTableVar)
GO
http://sqlmag.com/t-sql/passing-multivalued-variables-stored-procedure
try this it works!!
CHARINDEX (',' + ColumnName + ',', ',' +
REPLACE(#Parameter, ' ', '') + ',') > 0
execute syntax set #Parameter= 'nc1,nc2'