Checking for a string within a string, and not in another string using SQL

I am trying to build a short SQL script that will check if @NewProductAdded is somewhere in @NewTotalProducts. And also if @NewProductAdded is NOT in @OldTotalProducts. Please have a look at the setup below (The real data is in tables, not variables, but a simple example is all I need):
declare @NewProductAdded as varchar(max)
declare @NewTotalProducts as varchar(max)
declare @OldTotalProducts as varchar(max)
set @NewProductAdded = 'ProductB'
set @NewTotalProducts = 'ProductAProductBProductC'
set @OldTotalProducts = 'ProductAProductC'
SELECT CustomerID FROM Products WHERE NewProductAdded ...
I want to make sure that 'ProductB' is contained somewhere within @NewTotalProducts, and is NOT contained anywhere within @OldTotalProducts. Product names vary vastly with thousands of combinations, and there is no way to really separate them from each other in a string. I am sure there is a simple solution or function for this, I just don't know it yet.

The specific answer to your question is the like operator (or charindex() if you are using SQL Server or Sybase):
where @NewTotalProducts like '%'+@NewProductAdded+'%' and
@OldTotalProducts not like '%'+@NewProductAdded+'%'
First comment. If you have to use lists stored in strings, at least use delimiters:
where ','+@NewTotalProducts+',' like '%,'+@NewProductAdded+',%' and
','+@OldTotalProducts+',' not like '%,'+@NewProductAdded+',%'
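The charindex() version mentioned above works the same way; a minimal sketch, assuming the same variables and comma-delimited lists:
where charindex(',' + @NewProductAdded + ',', ',' + @NewTotalProducts + ',') > 0 and
      charindex(',' + @NewProductAdded + ',', ',' + @OldTotalProducts + ',') = 0
charindex() returns 0 when the substring is not found, so the second condition is the "not in the old list" check.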
Second comment. Don't store lists in strings. Instead, use a temporary table or table variable:
declare @NewTotalProducts table (name varchar(255));
insert into @NewTotalProducts(name)
select 'ProductA' union all
select 'ProductB' . . .
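Once the lists are rows, the original check becomes a plain set test. A minimal sketch, assuming an @OldTotalProducts table variable populated the same way:
where exists (select 1 from @NewTotalProducts where name = @NewProductAdded)
  and not exists (select 1 from @OldTotalProducts where name = @NewProductAdded)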
Note: throughout this answer I have used SQL Server syntax. The code appears to be SQL Server.

Related

Why is it not possible to set a column's alias dynamically?

I have a stored procedure and I want to pass a column alias as a parameter, something like:
SELECT u.userLoginName AS @columnName
FROM -- some JOINs
WHERE -- some conditions
where @columnName can be one of two options and it is set before the SELECT statement according to some condition.
I already know that it can be done only by dynamic SQL, but I don't understand why?
I know that the Order of execution of a Query is: FROM and JOINs -> WHERE -> GROUP BY and only then SELECT.
So if at this point I already have the result set, i.e., the final table, why can't I just rename the column name as @columnName? What happens in the background that I am missing?
This may answer your question.
A SQL result set is conceptually just like a table: it has well defined rows and columns and no ordering unless created with an explicit order by.
A SQL query is processed in two phases: it is compiled and optimized, then it is run. (Happily some databases are now starting to provide dynamic optimization as well, but the queries still go through the compilation phase.)
All information about the result set needs to be known during the compilation phase -- and that includes the resulting column names and column types. Dynamic names would prevent this from happening. They would only be known during the execution phase.
Note that this applies to parameters used as identifiers as well. Parameters are substituted at the beginning of the execution phase.
This is not a limitation of any particular database. It applies to all of them. I suspect that some more modern databases are implemented in a way that would allow for more dynamic naming, but I don't know of any databases that actually implement it except through dynamic SQL.
It is possible, but you have to use a dynamic query.
Let's assume we have the following table:
Create table #TBL ([Months] VARCHAR(3), Value INT)
INSERT INTO #TBL values
('Jan',20),('Feb',12),('Jan',15),('Mar',25),
('Feb',18),('Jan',9),('Mar',10),('Jan',19)
GO
And I want to set the column names dynamically using variables. I can use the code below:
DECLARE @M VARCHAR(10)='Months',
@T VARCHAR(10)='Total'
-- Dynamic query to get the column name
DECLARE @qry VARCHAR(MAX)
SET @qry = 'SELECT [Months] AS '+@M+',
sum(Value) as '+@T+'
FROM #TBL
group by [Months]
DROP TABLE #TBL'
EXEC (@qry)
Note that the query itself has to be dynamic.
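If the column names come from outside input, a slightly safer variant of the same idea is to wrap them in QUOTENAME() and run the string with sp_executesql. A sketch reusing @M, @T and the #TBL temp table from above (@qry2 is just a new variable name for the sketch):
DECLARE @qry2 NVARCHAR(MAX) =
    N'SELECT [Months] AS ' + QUOTENAME(@M) + N', SUM(Value) AS ' + QUOTENAME(@T) + N'
      FROM #TBL
      GROUP BY [Months]';
EXEC sp_executesql @qry2;
The query still has to be dynamic; QUOTENAME only makes the injected identifiers safe.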

How to get a list of IDs from a parameter which sometimes includes the IDs already, but sometimes includes another SQL query

I have developed a SQL query in SSMS-2017 like this:
DECLARE @property NVARCHAR(MAX) = @p;
SET @property = REPLACE(@property, '''', '');
DECLARE @propList TABLE (hproperty NUMERIC(18, 0));
IF CHARINDEX('SELECT', @property) > 0 OR CHARINDEX('select', @property) > 0
BEGIN
INSERT INTO @propList
EXECUTE sp_executesql @property;
END;
ELSE
BEGIN
DECLARE @x TABLE (val NUMERIC(18, 0));
INSERT INTO @x
SELECT CONVERT(NUMERIC(18, 0), strval)
FROM dbo.StringSplit(@property, ',');
INSERT INTO @propList
SELECT val
FROM @x;
END;
SELECT ...columns...
FROM ...tables and joins...
WHERE ...filters...
AND HMY IN (SELECT hproperty FROM @propList)
The issue is, it is possible that the value of the parameter @p can be a list of IDs (Example: 1,2,3,4) or a direct select query (Example: Select ID from mytable where code='A123').
The code is working well as shown above. However it causes a problem in our system (as we use Yardi7-Voyager), and we need to leave only the select statement as a query. To manage it, I was planning to create a function and use it in the where clause like:
WHERE HMY IN (SELECT myFunction(@p))
However I could not manage it, as I see I cannot execute a dynamic query in an SQL function. So I am stuck. Any idea at this point to handle this issue will be much appreciated.
Others have pointed out that the best fix for this would be a design change, and I agree with them. However, I'd also like to treat your question as academic and answer it in case any future readers ever have the same question in a use case where a design change wouldn't be possible/desirable.
I can think of two ways you might be able to do what you're attempting in a single select, as long as there are no other restrictions on what you can do that you haven't mentioned yet. To keep this brief, I'm just going to give you pseudo-code that can be adapted to your situation as well as those of future readers:
OPENQUERY (or OPENROWSET)
You can incorporate your code above into a stored procedure instead of a function, since stored procedures DO allow dynamic SQL, unlike functions. Then the SELECT query in your app would be a SELECT from OPENQUERY(Execute Your Stored Procedure).
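A sketch of that idea, assuming a loopback linked server (here called LOCALSERVER) has been configured and a stored procedure dbo.GetPropList wraps the code above -- both names are placeholders:
SELECT t.HMY, t.SomeColumn              -- placeholder columns
FROM dbo.SomeTable AS t                 -- placeholder table/joins
WHERE t.HMY IN (SELECT q.hproperty
                FROM OPENQUERY(LOCALSERVER, 'EXEC dbo.GetPropList ''1,2,3,4''') AS q);
Keep in mind that OPENQUERY only accepts a literal query string (no variables), and the procedure has to return a single result set with a fixed shape so the metadata can be determined.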
UNION ALL possibilities.
I'm about 99% sure no one would ever want to use this, but I'm mentioning it to be as academically complete as I know how to be.
The second possibility would only work if there is a limited, known number of possible queries that might be supported by your application. For instance, you can only get your Properties from either TableA, filtered by Column1, or from TableB, filtered by Column2 and/or Column3.
There could be more possibilities than these, but it has to be a limited, known quantity, and the more possibilities, the more complex and lengthy the code will get.
But if that's the case, you can simply SELECT from a UNION ALL of every possible scenario, and make it so that only one of the SELECTs in the UNION ALL will return results.
For example:
SELECT ... FROM TableA WHERE Column1=fnGetValue(@p, 'Column1')
AND CHARINDEX('SELECT', @p) > 0
AND CHARINDEX('TableA', @p) > 0
AND CHARINDEX('Column1', @p) > 0
AND (Whatever other filters are needed to uniquely identify this case)
UNION ALL
SELECT
...
Note that fnGetValue() isn't a built-in function. You'd have to write it. It would parse the string in @p, find the location of 'Column1=', and return whatever value comes after it.
At the end of your UNION ALL, you'd need to add a last UNION ALL to a query that will handle the case where the user passed a comma-separated string instead of a query, but that's easy, because all the steps in your code where you populated table variables are unnecessary. You can simply do the final query like this:
WHERE NOT CHARINDEX('SELECT', @p) > 0
AND HMY IN (SELECT strval FROM dbo.StringSplit(@p, ','))
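Put together, that last branch of the UNION ALL might look something like this (a sketch with placeholder table and column names, reusing the dbo.StringSplit helper from the question):
UNION ALL
SELECT t.HMY, t.SomeColumn              -- placeholder columns
FROM dbo.SomeTable AS t                 -- placeholder table/joins
WHERE NOT CHARINDEX('SELECT', @p) > 0
  AND t.HMY IN (SELECT s.strval FROM dbo.StringSplit(@p, ',') AS s)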
I'm pretty sure this possibility is way more work than it's worth, but it is an example of how, in general, dynamic SQL can be replaced with regular SQL that simply covers every possible option you wanted the dynamic SQL to be able to handle.

SQL Server 2008 R2: Using a WHILE loop inside a function

I read an answer that said you don't want to use WHILE loops in SQL Server. I don't understand that generalization. I'm fairly new to SQL so I might not understand the explanation yet. I also read that you don't really want to use cursors unless you must. The search results I've found are too specific to the problem presented and I couldn't glean useful technique from them, so I present this to you.
What I'm trying to do is take the values in a client file and shorten them where necessary. There are a couple of things that need to be achieved here. I can't simply hack the field values provided. My company has standard abbreviations that are to be used. I have put these in a table, Abbreviations. The table has the LongName and the ShortName. I don't want to simply abbreviate every LongName in the row. I only want to apply the update as long as the field length is too long. This is why I need the WHILE loop.
My thought process was thus:
CREATE FUNCTION [dbo].[ScrubAbbrev]
(@Field nvarchar(25),@Abbrev nvarchar(255))
RETURNS varchar(255)
AS
BEGIN
DECLARE @max int = (select MAX(stepid) from Abbreviations)
DECLARE @StepID int = (select min(stepid) from Abbreviations)
DECLARE @find varchar(150)=(select Longname from Abbreviations where Stepid=@stepid)
DECLARE @replace varchar(150)=(select ShortName from Abbreviations where Stepid=@stepid)
DECLARE @size int = (select max_input_length from FieldDefinitions where FieldName = 'title')
DECLARE @isDone int = (select COUNT(*) from SizeTest where LEN(Title)>(@size))
WHILE @StepID<=@max or @isDone = 0 and LEN(@Abbrev)>(@size) and @Abbrev is not null
BEGIN
RETURN
REPLACE(@Abbrev,@find,@replace)
SET @StepID=@StepID+1
SET @find =(select Longname from Abbreviations where Stepid=@stepid)
SET @replace =(select ShortName from Abbreviations where Stepid=@stepid)
SET @isDone = (select COUNT(*) from SizeTest where LEN(Title)>(@size))
END
END
Obviously the RETURN should go at the end, but I need to reset my variables to the next @stepID, @find, and @replace.
Is this one of those times where I'd have to use a cursor (which I've never yet written)?
Generally, you don't want to use cursors or while loops in SQL because they only process a single row at a time, and thus perform very poorly. SQL is designed and optimized to process (potentially very large) sets of data, not individual values.
You could factor out the while loop by doing something like this:
UPDATE t
SET t.targetColumn = a.ShortName
FROM targetTable t
INNER JOIN Abbreviations a
ON t.targetColumn = a.LongName
WHERE LEN(t.targetColumn) > @maxLength
This is generalized and you will need to tweak it to fit your specific data model, but here's what's going on:
For every row in "targetTable", set the value of "targetColumn" (what you want to abbreviate) to the relevant abbreviation (found in Abbreviations.ShortName) iff: the current value has a standardized abbreviation (the inner join) and the current value is longer than desired (the where condition).
You'll need to add an integer parameter or local variable, @maxLength, to indicate what constitutes "too long". This query processes the target table all at once, updating the value in the target column for every eligible row, while a function will only find the abbreviation for a single item (the intersection of one row and one column) at a time.
Note that this won't do anything if the value is too long but doesn't have a standard abbreviation. Your existing code has this same limitation, so I assume this is desired behavior.
I also recommend making this a stored procedure rather than a function. Functions on SQL Server are treated as black boxes and can seriously harm performance, because the optimizer generally doesn't have a good idea of what they're doing.
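A sketch of that stored procedure, using the table and column names mentioned in the question (SizeTest.Title, Abbreviations, FieldDefinitions); the procedure name and parameter are placeholders to adapt:
CREATE PROCEDURE dbo.ScrubAbbreviations    -- hypothetical name
    @maxLength INT
AS
BEGIN
    SET NOCOUNT ON;
    -- set-based replacement: one pass over every row that is still too long
    UPDATE t
    SET t.Title = a.ShortName
    FROM dbo.SizeTest AS t
    INNER JOIN dbo.Abbreviations AS a
        ON t.Title = a.LongName
    WHERE LEN(t.Title) > @maxLength;
END
-- usage (separate batch), reading the limit from FieldDefinitions as in the question:
-- DECLARE @len INT = (SELECT max_input_length FROM FieldDefinitions WHERE FieldName = 'title');
-- EXEC dbo.ScrubAbbreviations @maxLength = @len;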

Selecting everything in a table... with a where statement

I have an interesting situation where I'm trying to select everything in a sql server table but I only have to access the table through an old company API instead of SQL. This API asks for a table name, a field name, and a value. It then plugs it in rather straightforward in this way:
select * from [TABLE_NAME_VAR] where [FIELD_NAME_VAR] = 'VALUE_VAR';
I'm not able to change the = sign to != or anything else, only those vars. I know this sounds awful, but I cannot change the API without going through a lot of hoops and it's all I have to work with.
There are multiple columns in this table that are all numbers, all strings, and set to not null. Is there a value I can pass this API function that would return everything in the table? Perhaps a constant or special value that means it's a number, it's not a number, it's a string, *, it's not null, etc? Any ideas?
No, this isn't possible if the API is constructed correctly.
If this is some home grown thing it may not be, however. You could try entering YourTable]-- as the value for TABLE_NAME_VAR such that when plugged into the query it ends up as
select * from [YourTable]--] where [FIELD_NAME_VAR] = 'VALUE_VAR';
If the ] is either rejected or properly escaped (by doubling it up) this won't work however.
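For reference, QUOTENAME() is the usual way an API would do that escaping; a quick sketch of why the trick then fails:
SELECT QUOTENAME('YourTable]--');   -- returns [YourTable]]--] : the ] is doubled, so nothing breaks out of the identifier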
You might try to pass this VALUE_VAR
1' or ''='
If it's used as-is and executed as Dynamic SQL it should result in
SELECT * FROM tab WHERE fieldname = '1' or ''=''
Here is a simple example; hope it might help:
declare @a varchar(max)
set @a=' ''1'' or 1=1 '
declare @b varchar(max)
set @b=('select * from [TABLE_NAME_VAR] where [FIELD_NAME_VAR]='+@a)
exec(@b)
If your API allows a column name instead of a constant:
select * from [TABLE_NAME_VAR] where [FIELD_NAME_VAR] = [FIELD_NAME_VAR] ;

Matching sub string in a column

First I apologize for the poor formatting here.
Second I should say up front that changing the table schema is not an option.
So I have a table defined as follows:
Pin varchar
OfferCode varchar
Pin will contain data such as:
abc,
abc123
OfferCode will contain data such as:
123
123~124~125
I need a query to check for a count of a Pin/OfferCode combination and when I say OfferCode, I mean an individual item delimited by the tilde.
For example if there is one row that looks like abc, 123 and another that looks like abc,123~124, and I search for a count of Pin=abc,OfferCode=123 I want to get a count = 2.
Obviously I can do a similar query to this:
SELECT count(1) from MyTable (nolock) where OfferCode like '%' + @OfferCode + '%' and Pin = @Pin
using like here is very expensive and I'm hoping there may be a more efficient way.
I'm also looking into using a split string solution. I have a table-valued function SplitString(string, delim) that will return table OutParam, but I'm not quite sure how to apply this to a table column vs a string. Would this even be worthwhile pursuing? It seems like it would be much more expensive, but I'm unable to get a working solution to compare to the like solution.
Your like/% solution is open to a bug if you had offer codes other than 3 digits (if there was offer code 123 and 1234, searching for like '%123%' would return both, which is wrong). You can use your string function this way:
SELECT Pin, count(1)
FROM MyTable (nolock)
CROSS APPLY SplitString(OfferCode,'~') OutParam
WHERE OutParam.Value = @OfferCode and Pin = @Pin
GROUP BY Pin
If you have a relatively small table you can probably get away with this. If you are working with a large number of rows or encountering performance problems, it would be more effective to normalize it as RedFilter suggested.
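SplitString here is a user-defined function rather than anything built in (SQL Server 2016+ has STRING_SPLIT). If you need one, a minimal sketch of an inline table-valued splitter returning the Value column used above could be (the XML trick assumes the input contains no XML-special characters such as & or <):
CREATE FUNCTION dbo.SplitString (@input VARCHAR(8000), @delim CHAR(1))
RETURNS TABLE
AS
RETURN
(
    -- turn 'a~b~c' into '<x>a</x><x>b</x><x>c</x>' and shred the nodes
    SELECT n.x.value('.', 'VARCHAR(8000)') AS Value
    FROM (SELECT CAST('<x>' + REPLACE(@input, @delim, '</x><x>') + '</x>' AS XML) AS doc) AS d
    CROSS APPLY d.doc.nodes('/x') AS n(x)
);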
using like here is very expensive and I'm hoping there may be a more efficient way
The efficient way is to normalize the schema and put each OfferCode in its own row.
Then your query is more like (although you may need to use an intersection table depending on your schema):
select count(*)
from MyTable
where OfferCode = @OfferCode
and Pin = @Pin
Here is one way to use like for this problem, which is standard for getting exact matches when searching delimited strings while avoiding the '%123%' matches '123' and '1234' problem:
-- Create some test data
declare @table table (
Pin varchar(10) not null
, OfferCode varchar(100) not null
)
insert into @table select 'abc', '123'
insert into @table select 'abc', '123~124'
-- Mock some proc params
declare @Pin varchar(10) = 'abc'
declare @OfferCode varchar(10) = '123'
-- Run the actual query
select count(*) as Matches
from @table
where Pin = @Pin
-- Append delimiters to find exact matches
and '~' + OfferCode + '~' like '%~' + @OfferCode + '~%'
As you can see, we're adding the delimiters to both the searched string and the search string in order to find matches, thus avoiding the bugs mentioned by other answers.
I highly doubt that a string splitting function will yield better performance over like, but it may be worth a test or two using some of the more recently suggested methods. If you still have unacceptable performance, you have a few options:
Updated:
Try an index on OfferCode (or on a computed persisted column of '~' + OfferCode + '~'). Contrary to the myth that SQL Server won't use an index with like and wildcards, this might actually help.
Check out full text search.
Create a normalized version of this table using a string splitter (see the sketch after this list). Use this table to run your counts. Update this table according to some schedule or event (trigger, etc.).
If you have some standard search terms, pre-calculate the counts for these and store them on some regular basis.
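A sketch of that normalized approach, reusing the dbo.SplitString idea from above (the new table name is a placeholder):
-- one row per Pin / individual offer code
SELECT t.Pin, s.Value AS OfferCode
INTO dbo.PinOfferCode                      -- hypothetical normalized table
FROM MyTable AS t
CROSS APPLY dbo.SplitString(t.OfferCode, '~') AS s;
-- the count then becomes a plain equality match that can use an index
SELECT COUNT(*) FROM dbo.PinOfferCode WHERE Pin = @Pin AND OfferCode = @OfferCode;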
Actually, the LIKE condition is going to have much less cost than doing any sort of string manipulation and comparison.
http://www.simple-talk.com/sql/performance/the-seven-sins-against-tsql-performance/