SQL Precedence Matching - sql

I'm trying to do precedence matching on a table within a stored procedure. The requirements are a bit tricky to explain, but hopefully this will make sense. Let's say we have a table called books, with id, author, title, date, and pages fields.
We also have a stored procedure that will match a query with ONE row in the table.
Here is the proc's signature:
create procedure match
#pAuthor varchar(100)
,#pTitle varchar(100)
,#pDate varchar(100)
,#pPages varchar(100)
as
...
The precedence rules are as follows:
First, try and match on all 4 parameters. If we find a match return.
Next try to match using any 3 parameters. The 1st parameter has the highest precedence here and the 4th the lowest. If we find any matches return the match.
Next we check if any two parameters match and finally if any one matches (still following the parameter order's precedence rules).
I have implemented this case-by-case. Eg:
select #lvId = id
from books
where
author = #pAuthor
,title = #pTitle
,date = #pDate
,pages = #pPages
if ##rowCount = 1 begin
select #lvId
return
end
select #lvId = id
from books
where
author = #pAuthor
,title = #pTitle
,date = #pDate
if ##rowCount = 1 begin
select #lvId
return
end
....
However, for each new column in the table, the number of individual checks grows by an order of 2. I would really like to generalize this to X number of columns; however, I'm having trouble coming up with a scheme.
Thanks for the read, and I can provide any additional information needed.
Added:
Dave and Others, I tried implementing your code and it is choking on the first Order by Clause, where we add all the counts. Its giving me an invalid column name error. When I comment out the total count, and order by just the individual aliases, the proc compiles fine.
Anyone have any ideas?
This is in Microsoft Sql Server 2005

I believe that the answers your working on are the simplest by far. But I also believe that in SQL server, they will always be full table scans. (IN Oracle you could use Bitmap indexes if the table didn't undergo a lot of simultaneous DML)
A more complex solution but a much more performant one would be to build your own index. Not a SQL Server index, but your own.
Create a table (Hash-index) with 3 columns (lookup-hash, rank, Rowid)
Say you have 3 columns to search on. A, B, C
For every row added to Books you'll insert 7 rows into hash_index either via a trigger or CRUD proc.
First you'll
insert into hash_index
SELECT HASH(A & B & C), 7 , ROWID
FROM Books
Where & is the concatenation operator and HASH is a function
then you'll insert hashes for A & B, A & C and B & C.
You now have some flexibility you can give them all the same rank or if A & B are a superior match to B & C you can give them a higher rank.
And then insert Hashes for A by itself and B and C with the same choice of rank... all the same number or all different... you can even say that a match on A is higher choice than a match on B & C. This solution give you a lot of flexibility.
Of course, this will add a lot of INSERT overhead, but if DML on Books is low or performance is not relevant you're fine.
Now when you go to search you'll create a function that returns a table of HASHes for your #A, #B and #C. you'll have a small table of 7 values that you'll join to the lookup-hash in the hash-index table. This will give you every possible match and possibly some false matches (that's just the nature of hashes). You'll take that result, order desc on the rank column. Then take the first rowid back to the book table and make sure that all of the values of #A #B #C are actually in that row. On the off chance it's not and you've be handed a false positive you'll need to check the next rowid.
Each of these operation in this "roll your own" are all very fast.
Hashing your 3 values into a small 7 row table variable = very fast.
joining them on an index in your Hash_index table = very fast index lookups
Loop over result set will result in 1 or maybe 2 or 3 table access by rowid = very fast
Of course, all of these together could be slower than an FTS... But an FTS will continue to get slower and slower. There will be a size which the FTS is slower than this. You'll have to play with it.

I don't have time to write out the query, but I think this idea would work.
For your predicate, use "author = #pAuthor OR title = #ptitle ...", so you get all candidate rows.
Use CASE expressions or whatever you like to create virtual columns in the result set, like:
SELECT CASE WHEN author = #pAuthor THEN 1 ELSE 0 END author_match,
...
Then add this order by and get the first row returned:
ORDER BY (author_match+title_match+date_match+page_match) DESC,
author_match DESC,
title_match DESC,
date_match DESC
page_match DESC
You still need to extend it for each new column, but only a little bit.

You don't explain what should happen if more than one result matches any given set of parameters that is reached, so you will need to change this to account for those business rules. Right now I've set it to return books that match on later parameters ahead of those that don't. For example, a match on author, title, and pages would come before one that just matches on author and title.
Your RDBMS may have a different way of handling "TOP", so you may need to adjust for that as well.
SELECT TOP 1
author,
title,
date,
pages
FROM
Books
WHERE
author = #author OR
title = #title OR
date = #date OR
pages = #pages OR
ORDER BY
CASE WHEN author = #author THEN 1 ELSE 0 END +
CASE WHEN title = #title THEN 1 ELSE 0 END +
CASE WHEN date = #date THEN 1 ELSE 0 END +
CASE WHEN pages = #pages THEN 1 ELSE 0 END DESC,
CASE WHEN author = #author THEN 8 ELSE 0 END +
CASE WHEN title = #title THEN 4 ELSE 0 END +
CASE WHEN date = #date THEN 2 ELSE 0 END +
CASE WHEN pages = #pages THEN 1 ELSE 0 END DESC

select id,
CASE WHEN #pPages = pages
THEN 1 ELSE 0
END
+ Case WHEN #pAuthor=author
THEN 1 ELSE 0
END AS
/* + Do this for each attribute. If each of your
attributes are just as important as the other
for example matching author is jsut as a good as matching title then
leave the values alone, if different matches are more
important then change the values */ as MatchRank
from books
where author = #pAuthor OR
title = #pTitle OR
date = #pDate
ORDER BY MatchRank DESC
Edited
When I run this query (modified only to fit one of my own tables) it works fine in SQL2005.
I'd recommend a where clause but you will want to play around with this to see performance impacts. You will need to use an OR clause otherwise you will loose potential matches

In regards to the Order By clause failing to compile:
As recursive said(in a comment), alias' may not be within expressions which are used in Order By clauses. to get around this I used a subquery which returned the rows, then ordered by in the outer query. In this way I am able to use the alias' in the order by clause. A little slower but a lot cleaner.

Okay, let me restate my understanding of your question: You want a stored procedure that can take a variable number of parameters and pass back the top row that matches the parameters in the weighted order of preference passed on SQL Server 2005.
Ideally, it will use WHERE clauses to prevent full tables scans plus take advantage of indices and will "short circuit" the search - you don't want to search all possible combinations if one can be found early. Perhaps we can also allow other comparators than = such as >= for dates, LIKE for strings, etc.
One possible way is to pass the parameters as XML like in this article and use .Net stored procedures but let's keep it plain vanilla T-SQL for now.
This looks to me like a binary search on the parameters: Search all parameters, then drop the last one, then drop the second last one but include the last one, etc.
Let's pass the parameters as a delimited string since stored procedures don't allow for arrays to be passed as parameters. This will allow us to get a variable number of parameters in to our stored procedure without requiring a stored procedure for each variation of parameters.
In order to allow any sort of comparison, we'll pass the entire WHERE clause list, like so: title like '%something%'
Passing multiple parameters means delimiting them in a string. We'll use the tilde ~ character to delimit the parameters, like this: author = 'Chris Latta'~title like '%something%'~pages >= 100
Then it is simply a matter of doing a binary weighted search for the first row that meets our ordered list of parameters (hopefully the stored procedure with comments is self-explanatory but if not, let me know). Note that you are always guaranteed a result (assuming your table has at least one row) as the last search is parameterless.
Here is the stored procedure code:
CREATE PROCEDURE FirstMatch
#SearchParams VARCHAR(2000)
AS
BEGIN
DECLARE #SQLstmt NVARCHAR(2000)
DECLARE #WhereClause NVARCHAR(2000)
DECLARE #OrderByClause NVARCHAR(500)
DECLARE #NumParams INT
DECLARE #Pos INT
DECLARE #BinarySearch INT
DECLARE #Rows INT
-- Create a temporary table to store our parameters
CREATE TABLE #params
(
BitMask int, -- Uniquely identifying bit mask
FieldName VARCHAR(100), -- The field name for use in the ORDER BY clause
WhereClause VARCHAR(100) -- The bit to use in the WHERE clause
)
-- Temporary table identical to our result set (the books table) so intermediate results arent output
CREATE TABLE #junk
(
id INT,
author VARCHAR(50),
title VARCHAR(50),
printed DATETIME,
pages INT
)
-- Ill use tilde ~ as the delimiter that separates parameters
SET #SearchParams = LTRIM(RTRIM(#SearchParams))+ '~'
SET #Pos = CHARINDEX('~', #SearchParams, 1)
SET #NumParams = 0
-- Populate the #params table with the delimited parameters passed
IF REPLACE(#SearchParams, '~', '') <> ''
BEGIN
WHILE #Pos > 0
BEGIN
SET #NumParams = #NumParams + 1
SET #WhereClause = LTRIM(RTRIM(LEFT(#SearchParams, #Pos - 1)))
IF #WhereClause <> ''
BEGIN
-- This assumes your field names dont have spaces and that you leave a space between the field name and the comparator
INSERT INTO #params (BitMask, FieldName, WhereClause) VALUES (POWER(2, #NumParams - 1), LTRIM(RTRIM(LEFT(#WhereClause, CHARINDEX(' ', #WhereClause, 1) - 1))), #WhereClause)
END
SET #SearchParams = RIGHT(#SearchParams, LEN(#SearchParams) - #Pos)
SET #Pos = CHARINDEX('~', #SearchParams, 1)
END
END
-- Set the binary search to search from all parameters down to one in order of preference
SET #BinarySearch = POWER(2, #NumParams)
SET #Rows = 0
WHILE (#BinarySearch > 0) AND (#Rows = 0)
BEGIN
SET #BinarySearch = #BinarySearch - 1
SET #WhereClause = ' WHERE '
SET #OrderByClause = ' ORDER BY '
SELECT #OrderByClause = #OrderByClause + FieldName + ', ' FROM #params WHERE (#BinarySearch & BitMask) = BitMask ORDER BY BitMask
SET #OrderByClause = LEFT(#OrderByClause, LEN(#OrderByClause) - 1) -- Remove the trailing comma
SELECT #WhereClause = #WhereClause + WhereClause + ' AND ' FROM #params WHERE (#BinarySearch & BitMask) = BitMask ORDER BY BitMask
SET #WhereClause = LEFT(#WhereClause, LEN(#WhereClause) - 4) -- Remove the trailing AND
IF #BinarySearch = 0
BEGIN
-- If nothing found so far, return the top row in the order of the parameters fields
SET #WhereClause = ''
-- Use the full order sequence of fields to return the results
SET #OrderByClause = ' ORDER BY '
SELECT #OrderByClause = #OrderByClause + FieldName + ', ' FROM #params ORDER BY BitMask
SET #OrderByClause = LEFT(#OrderByClause, LEN(#OrderByClause) - 1) -- Remove the trailing comma
END
-- Find out if there are any results for this search
SET #SQLstmt = 'SELECT TOP 1 id, author, title, printed, pages INTO #junk FROM books' + #WhereClause + #OrderByClause
Exec (#SQLstmt)
SET #Rows = ##RowCount
END
-- Stop the result set being eaten by the junk table
SET #SQLstmt = REPLACE(#SQLstmt, 'INTO #junk ', '')
-- Uncomment the next line to see the SQL you are producing
--PRINT #SQLstmt
-- This gives the result set
Exec (#SQLstmt)
END
This stored procedure is called like so:
FirstMatch 'author = ''Chris Latta''~pages > 100~title like ''%something%'''
There you have it - a fully expandable, optimised search for the top result in weighted order of preference. This was an interesting problem and shows just what you can pull off with native T-SQL.
A couple of small issues with this:
it relies on the caller to know that they must leave a space after the field name for the parameter to work properly
you can't have field names with spaces in them - fixable with some effort
it assumes that the relevant sort order is always ascending
the next programmer that has to look at this procedure will think you're insane :)

Try this:
ALTER PROCEDURE match
#pAuthor varchar(100)
,#pTitle varchar(100)
,#pDate varchar(100)
,#pPages varchar(100)
-- exec match 'a title', 'b author', '1/1/2007', 15
AS
SELECT id,
CASE WHEN author = #pAuthor THEN 1 ELSE 0 END
+ CASE WHEN title = #pTitle THEN 1 ELSE 0 END
+ CASE WHEN bookdate = #pDate THEN 1 ELSE 0 END
+ CASE WHEN pages = #pPages THEN 1 ELSE 0 END AS matches,
CASE WHEN author = #pAuthor THEN 4 ELSE 0 END
+ CASE WHEN title = #pTitle THEN 3 ELSE 0 END
+ CASE WHEN bookdate = #pDate THEN 2 ELSE 0 END
+ CASE WHEN pages = #pPages THEN 1 ELSE 0 END AS score
FROM books
WHERE author = #pAuthor
OR title = #pTitle
OR bookdate = #PDate
OR pages = #pPages
ORDER BY matches DESC, score DESC
However, this of course causes a table scan. You can avoid that by making it a union of a CTE and 4 WHERE clauses, one for each property - there will be duplicates, but you can just take the TOP 1 anyway.
EDIT: Added the WHERE ... OR clause. I'd feel more comfortable if it were
SELECT ... FROM books WHERE author = #pAuthor
UNION
SELECT ... FROM books WHERE title = #pTitle
UNION
...

Related

How to replace all special characters in string

I have a table with the following columns:
dbo.SomeInfo
- Id
- Name
- InfoCode
Now I need to update the above table's InfoCode as
Update dbo.SomeInfo
Set InfoCode= REPLACE(Replace(RTRIM(LOWER(Name)),' ','-'),':','')
This replaces all spaces with - & lowercase the name
When I do check the InfoCode, I see there are Names with some special characters like
Cathe Friedrich''s Low Impact
coffeyfit-cardio-box-&-burn
Jillian Michaels: Cardio
Then I am manually writing the update sql against this as
Update dbo.SomeInfo
SET InfoCode= 'cathe-friedrichs-low-impact'
where Name ='Cathe Friedrich''s Low Impact '
Now, this solution is not realistic for me. I checked the following links related to Regex & others around it.
UPDATE and REPLACE part of a string
https://www.codeproject.com/Questions/456246/replace-special-characters-in-sql
But none of them is hitting the requirement.
What I need is if there is any character other [a-z0-9] replace it - & also there should not be continuous -- in InfoCode
The above Update sql has set some values of InfoCode as the-dancer's-workout®----starter-package
Some Names have value as
Sleek Technique™
The Dancer's-workout®
How can I write Update sql that could handle all such special characters?
Using NGrams8K you could split the string into characters and then rather than replacing every non-acceptable character, retain only certain ones:
SELECT (SELECT '' + CASE WHEN N.token COLLATE Latin1_General_BIN LIKE '[A-z0-9]'THEN token ELSE '-' END
FROM dbo.NGrams8k(V.S,1) N
ORDER BY position
FOR XML PATH(''))
FROM (VALUES('Sleek Technique™'),('The Dancer''s-workout®'))V(S);
I use COLLATE here as on my default collation in my instance the '™' is ignored, therefore I use a binary collation. You may want to use COLLATE to switch the string back to its original collation outside of the subquery.
This approach is fully inlinable:
First we need a mock-up table with some test data:
DECLARe #SomeInfo TABLE (Id INT IDENTITY, InfoCode VARCHAR(100));
INSERT INTO #SomeInfo (InfoCode) VALUES
('Cathe Friedrich''s Low Impact')
,('coffeyfit-cardio-box-&-burn')
,('Jillian Michaels: Cardio')
,('Sleek Technique™')
,('The Dancer''s-workout®');
--This is the query
WITH cte AS
(
SELECT 1 AS position
,si.Id
,LOWER(si.InfoCode) AS SourceText
,SUBSTRING(LOWER(si.InfoCode),1,1) AS OneChar
FROM #SomeInfo si
UNION ALL
SELECT cte.position +1
,cte.Id
,cte.SourceText
,SUBSTRING(LOWER(cte.SourceText),cte.position+1,1) AS OneChar
FROM cte
WHERE position < DATALENGTH(SourceText)
)
,Cleaned AS
(
SELECT cte.Id
,(
SELECT CASE WHEN ASCII(cte2.OneChar) BETWEEN 65 AND 90 --A-Z
OR ASCII(cte2.OneChar) BETWEEN 97 AND 122--a-z
OR ASCII(cte2.OneChar) BETWEEN 48 AND 57 --0-9
--You can easily add more ranges
THEN cte2.OneChar ELSE '-'
--You can easily nest another CASE to deal with special characters like the single quote in your examples...
END
FROM cte AS cte2
WHERE cte2.Id=cte.Id
ORDER BY cte2.position
FOR XML PATH('')
) AS normalised
FROM cte
GROUP BY cte.Id
)
,NoDoubleHyphens AS
(
SELECT REPLACE(REPLACE(REPLACE(normalised,'-','<>'),'><',''),'<>','-') AS normalised2
FROM Cleaned
)
SELECT CASE WHEN RIGHT(normalised2,1)='-' THEN SUBSTRING(normalised2,1,LEN(normalised2)-1) ELSE normalised2 END AS FinalResult
FROM NoDoubleHyphens;
The first CTE will recursively (well, rather iteratively) travers down the string, character by character and a return a very slim set with one row per character.
The second CTE will then GROUP the Ids. This allows for a correlated sub-query, where the actual check is performed using ASCII-ranges. FOR XML PATH('') is used to re-concatenate the string. With SQL-Server 2017+ I'd suggest to use STRING_AGG() instead.
The third CTE will use a well known trick to get rid of multiple occurances of a character. Take any two characters which will never occur in your string, I use < and >. A string like a--b---c will come back as a<><>b<><><>c. After replacing >< with nothing we get a<>b<>c. Well, that's it...
The final SELECT will cut away a trailing hyphen. If needed you can add similar logic to get rid of a leading hyphen. With v2017+ There was TRIM('-') to make this easier...
The result
cathe-friedrich-s-low-impact
coffeyfit-cardio-box-burn
jillian-michaels-cardio
sleek-technique
the-dancer-s-workout
You can create a User-Defined-Function for something like that.
Then use the UDF in the update.
CREATE FUNCTION [dbo].LowerDashString (#str varchar(255))
RETURNS varchar(255)
AS
BEGIN
DECLARE #result varchar(255);
DECLARE #chr varchar(1);
DECLARE #pos int;
SET #result = '';
SET #pos = 1;
-- lowercase the input and remove the single-quotes
SET #str = REPLACE(LOWER(#str),'''','');
-- loop through the characters
-- while replacing anything that's not a letter to a dash
WHILE #pos <= LEN(#str)
BEGIN
SET #chr = SUBSTRING(#str, #pos, 1)
IF #chr LIKE '[a-z]' SET #result += #chr;
ELSE SET #result += '-';
SET #pos += 1;
END;
-- SET #result = TRIM('-' FROM #result); -- SqlServer 2017 and beyond
-- multiple dashes to one dash
WHILE #result LIKE '%--%' SET #result = REPLACE(#result,'--','-');
RETURN #result;
END;
GO
Example snippet using the function:
-- using a table variable for demonstration purposes
declare #SomeInfo table (Id int primary key identity(1,1) not null, InfoCode varchar(100) not null);
-- sample data
insert into #SomeInfo (InfoCode) values
('Cathe Friedrich''s Low Impact'),
('coffeyfit-cardio-box-&-burn'),
('Jillian Michaels: Cardio'),
('Sleek Technique™'),
('The Dancer''s-workout®');
update #SomeInfo
set InfoCode = dbo.LowerDashString(InfoCode)
where (InfoCode LIKE '%[^A-Z-]%' OR InfoCode != LOWER(InfoCode));
select *
from #SomeInfo;
Result:
Id InfoCode
-- -----------------------------
1 cathe-friedrichs-low-impact
2 coffeyfit-cardio-box-burn
3 jillian-michaels-cardio
4 sleek-technique-
5 the-dancers-workout-

How to process SQL string char-by-char to build a match weight?

The problem: I need to display fields for user entry on a form, dynamic to some lookup criteria.
My current solution: I've created a SQL table with some field entry criteria, based on a relatively simple matching criteria. The match criteria basically is such that Lookup Value starts with Match Code, and the most precise match is found by doing a LEN comparison.
select
f.[IS_REQUIRED]
, f.[MASK]
, f.[MAX_LENGTH]
, f.[MIN_LENGTH]
, f.[RESOURCE_KEY]
, f.[SEQUENCE]
from [dbo].[MY_RECORD] r with(nolock)
inner join [dbo].[ENTRY_FORMAT] f with(nolock)
on r.[LOOKUP_VALUE] like f.[MATCH_CODE]
-- Logic to filter by single, most-precise record match.
cross apply (
select f1.[SEQUENCE]
from [dbo].[ENTRY_FORMAT] f1 with(nolock)
where f.[SEQUENCE] = f1.[SEQUENCE]
and s.[MATCH_CODE] like f1.[MATCH_CODE]
group by f1.[SEQUENCE]
having len(f.[MATCH_CODE]) = max(len(f1.[MATCH_CODE]))
) tFilter
where r.[ID] = #RecordId
Current issues with this is that the most precise match has to be calculated each and every call, against each and every match. Additionally, I'm only currently able to support the % in the MATCH_CODE. (e.g., '%' is the default for all LOOKUP_VALUE, while an entry of '12%' would be the more precise match for a LOOKUP_VALUE of '12345', and MATCH_CODE of '12345' should obviously me the most precise match.) However, I would like to add support for [4-7], etc. wildcards. Going just off of LEN, this would definitely be wrong, because '[4-7]' adds a lot to the length, but, for example '12345' is still the desired match over '123[4-7]'
My desired update: To add a MATCH_WEIGHT column to ENTRY_FORMAT, which I can update via a trigger on insert/update. For my initial implementation, I'm just looking for something that can go through MATCH_CODE, character by character, increasing MATCH_WEIGHT, but treating [..] as just a single character when doing so. Is there a good mechanism (UDF - either SQL or CLR? CURSOR?) for iterating through characters of a varchar field to calculate a value in this way? Something like increasing MATCH_WEIGHT by two per non-wildcard, and perhaps by one on a wildcard; with details to be further thought out and worked out...
The goal being to use a query more like:
select
f.[IS_REQUIRED]
, f.[MASK]
, f.[MAX_LENGTH]
, f.[MIN_LENGTH]
, f.[RESOURCE_KEY]
, f.[SEQUENCE]
from [dbo].[MY_RECORD] r with(nolock)
-- Logic to filter by single, most-precise record match.
cross apply (
select top 1
f1.[MATCH_CODE]
, f1.[SEQUENCE]
from [dbo].[ENTRY_FORMAT] f1 with(nolock)
where r.[LOOKUP_VALUE] like f1.[MATCH_CODE]
group by f1.[SEQUENCE]
order by f1.[MATCH_WEIGHT] desc
) tFilter
inner join [dbo].[ENTRY_FORMAT] f with(nolock)
on f.[MATCH_CODE] = tFilter.[MATCH_CODE]
and f.[SEQUENCE] = tFilter.[SEQUENCE]
where r.[ID] = #RecordId
Note: I realize this is a relatively fragile setup. The ENTRY_FORMAT records are only entered by developers, who are aware of the restrictions, so for now assume that valid data is entered, and which does not cause match collisions.
With some help, I've come up with one implementation (answer below), but am still unsure as to my total design, so welcoming better answers or any criticism.
From Steve's answer on another question, I've used much of the body to create a function to accomplish support for the [..] wildcard at the end of a match code.
CREATE FUNCTION CalculateMatchWeight
(
-- Add the parameters for the function here
#MatchCode varchar(100)
)
RETURNS smallint
AS
BEGIN
-- Declare the return variable here
DECLARE #Result smallint = 0;
-- Add the T-SQL statements to compute the return value here
DECLARE #Pos int = 1, #N0 int = ascii('0'), #N9 int = ascii('9'), #AA int = ascii('A'), #AZ int = ascii('Z'), #Wild int = ascii('%'), #Range int = ascii('[');
DECLARE #Asc int;
DECLARE #WorkingString varchar(100) = upper(#MatchCode)
WHILE #Pos <= LEN(#WorkingString)
BEGIN
SET #Asc = ascii(substring(#WorkingString, #Pos, 1));
If ((#Asc between #N0 and #N9) or (#Asc between #AA and #AZ))
SET #Result = #Result + 2;
ELSE
BEGIN
-- Check wildcard matching, update value according to match strength, and stop calculating further.
-- TODO: In the future we may wish to have match codes with wildcards not just at the end; try to figure out a mechanism to calculating that case.
IF (#Asc = #Range)
BEGIN
SET #Result = #Result + 2;
SET #Pos = 100;
END
IF (#Asc = #Wild)
BEGIN
SET #Result = #Result + 1;
SET #Pos = 100;
END
END
SET #Pos = #Pos + 1
END
-- Return the result of the function
RETURN #Result
END
I've checked that this can generate desired output for the current cases that I'm trying to cover:
SELECT [dbo].[CalculateMatchWeight] ('12345'); -- Most precise (10)
SELECT [dbo].[CalculateMatchWeight] ('123[4-5]'); -- Middle (8)
SELECT [dbo].[CalculateMatchWeight] ('123%'); -- Least (7)
Now I can call this function in a trigger on INSERT/UPDATE to update the MATCH_WEIGHT:
CREATE TRIGGER TRG_ENTRY_FORMAT_CalcMatchWeight
ON ENTRY_FORMAT
AFTER INSERT,UPDATE
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Insert statements for trigger here
DECLARE #NewMatchWeight smallint = (select dbo.CalculateMatchWeight(inserted.MATCH_CODE) from inserted),
#CurrentMatchWeight smallint = (select inserted.MATCH_WEIGHT from inserted);
IF (#CurrentMatchWeight <> #NewMatchWeight)
BEGIN
UPDATE ENTRY_FORMAT
SET MATCH_WEIGHT = #NewMatchWeight
FROM inserted
WHERE ENTRY_FORMAT.[MATCH_CODE] = inserted.[MATCH_CODE]
AND ENTRY_FORMAT.[SEQUENCE] = inserted.[SEQUENCE]
END
END

Dynamic column - how to improve performance

OK, second attempt at the question (first is How to build virtual columns?)
Apologies in advance if this kind of question isn't suitable fo StackOverflow. Feel free to take it down if needed.
The basic question is "what's the best way to have a column whose content is built dynamically".
The code revolves around four tables.
First three (equipment, accessory, association) can be seen as two colums each, an ID and a name.
The goal is to replace the association name with a name built dynamically based on the name of the association components.
The fourth table describes the associations. The association should be seen as a tree, and each "branch" of the tree is represented as a line in this table. Columns are:
branchID (primary key)
association ID (int)
parent node kind (association = 1, equipement = 2, accessory = 3) (int)
parent node ID (ID in one of the three other tables) (int)
kid node kind
kid node ID
I do have something that works, using a view and a function (the function code follows). However, performance isn't satisfactory.
I see three improvement path:
minor adjustments through primary keys and indexes (code is significantly faster if there is NO primary key on the 4th table - I haven't been able to explain that)
fully reviewing the design behind the 4th table (I'm open to ideas)
replacing the custom function below by... something else! But what could that be?
Sorry for the French names... I chose not to edit the code before posting, assuming that copy/paste errors are worse than translation
Type = kind
Enfant = kid
Jumelage = association
Numero = name (oops...)
liens = branches
Thanks.
USE [testDB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[testjmrFN]
(
#JumelageID int
)
RETURNS varchar(max)
AS
BEGIN
DECLARE #Result varchar(max)
DECLARE #TypeParent int
DECLARE #ParentID int
DECLARE #TypeEnfant int
DECLARE #EnfantID int
DECLARE #NumeroEquipement varchar(max)
DECLARE #NumeroAccessoire varchar(max)
SET #Result = ''
DECLARE liens CURSOR LOCAL FOR
SELECT l.TypeParent, l.ParentID, l.TypeEnfant, l.EnfantID, e.Numero, a.Numero
FROM ges_Jumelages_Liens l
LEFT JOIN ges_Equipements e ON l.EnfantID = e.EquipementID
LEFT JOIN ges_Accessoires a ON l.EnfantID = a.AccessoireID
WHERE l.JumelageID = #JumelageID
ORDER BY LienID
OPEN liens
FETCH NEXT FROM liens INTO #TypeParent, #ParentID, #TypeEnfant, #EnfantID, #NumeroEquipement, #NumeroAccessoire
WHILE ##FETCH_STATUS = 0
BEGIN
IF #TypeParent = 1 AND #TypeEnfant = 2
BEGIN
IF #Result <> ''
BEGIN
SET #Result = #Result + '§'
END
SET #Result = #Result + IsNull(#NumeroEquipement,'')
END
IF #TypeParent = 2 AND #TypeEnfant = 3
BEGIN
IF #Result <> ''
BEGIN
SET #Result = #Result + '~'
END
SET #Result = #Result + IsNull(#NumeroAccessoire,'')
END
FETCH NEXT FROM liens INTO #TypeParent, #ParentID, #TypeEnfant, #EnfantID, #NumeroEquipement, #NumeroAccessoire
END
CLOSE liens
DEALLOCATE liens
RETURN #Result
END
This gives you the list that you need. If you want to concatonate the values into a long string with delimiters based on the cte's Delimiter field, see here: Concatenate many rows into a single text string?
use master;
go
with cte (TypeParent,ParentID,TypeEnfant,EnfantID,Numero,Delimiter)
as
( select l.TypeParent
, l.ParentID
, l.TypeEnfant
, l.EnfantID
, e.Numero
, '§' as Delimiter
from dbo.ges_Jumelages_Liens as l
join dbo.ges_Equipements as e
on l.EnfantID = e.EquipmentID
where l.TypeParent = 1
and l.TypeEnfant = 2
union all
select l.TypeParent
, l.ParentID
, l.TypeEnfant
, l.EnfantID
, a.Numero
,'~' as Delimiter
from dbo.ges_Jumelages_Liens as l
join dbo.ges_Accessoires as a
on l.EnfantID = a.EquipmentID
where l.TypeParent = 2
and l.TypeEnfant = 3
)
select *
from cte
If you need further help, please clarify your question.

sql server cursor

I want to copy data from one table (rawdata, all columns are VARCHAR) to another table (formatted with corresponding column format).
For copying data from the rawdata table into formatted table, I'm using cursor in order to identify which row is affected. I need to log that particular row in an error log table, skip it, and continue copying remaining rows.
It takes more time to copying. Is there any other way to achieve this?
this is my query
DECLARE #EntityId Varchar(16) ,
#PerfId Varchar(16),
#BaseId Varchar(16) ,
#UpdateStatus Varchar(16)
DECLARE CursorSample CURSOR FOR
SELECT EntityId, PerfId, BaseId, #UpdateStatus
FROM RawdataTable
--Returns 204,000 rows
OPEN CursorSample
FETCH NEXT FROM CursorSample INTO #EntityId,#PerfId,#BaseId,#UpdateStatus
WHILE ##FETCH_STATUS = 0
BEGIN
BEGIN TRY
--try insertting row in formatted table
Insert into FormattedTable
(EntityId,PerfId,BaseId,UpdateStatus)
Values
(Convert(int,#EntityId),
Convert(int,#PerfId),
Convert(int,#BaseId),
Convert(int,#UpdateStatus))
END TRY
BEGIN CATCH
--capture Error EntityId in errorlog table
Insert into ERROR_LOG
(TableError_Message,Error_Procedure,Error_Log_Time)
Values
(Error_Message()+#EntityId,’xxx’, GETDATE())
END CATCH
FETCH NEXT FROM outerCursor INTO #EntityId, #BaseId
END
CLOSE CursorSample
DEALLOCATE CursorSampler –cleanup CursorSample
You should just be able to use a INSERT INTO statement to put the records directly into the formatted table. INSERT INTO will perform much better than using a cursor.
INSERT INTO FormattedTable
SELECT
CONVERT(int, EntityId),
CONVERT(int, PerfId),
CONVERT(int, BaseId),
CONVERT(int, UpdateStatus)
FROM RawdataTable
WHERE
IsNumeric(EntityId) = 1
AND IsNumeric(PerfId) = 1
AND IsNumeric(BaseId) = 1
AND IsNumeric(UpdateStatus) = 1
Note that IsNumeric can sometimes return 1 for values that will then fail on CONVERT. For example, IsNumeric('$e0') will return 1, so you may need to create a more robust user defined function for determining if a string is a number, depending on your data.
Also, if you need a log of all records that could not be moved into the formatted table, just modify the WHERE clause:
INSERT INTO ErrorLog
SELECT
EntityId,
PerfId,
BaseId,
UpdateStatus
FROM RawdataTable
WHERE
NOT (IsNumeric(EntityId) = 1
AND IsNumeric(PerfId) = 1
AND IsNumeric(BaseId) = 1
AND IsNumeric(UpdateStatus) = 1)
EDIT
Rather than using IsNumeric directly, it may be better to create a custom UDF that will tell you if a string can be converted to an int. This function worked for me (albeit with limited testing):
CREATE FUNCTION IsInt(#value VARCHAR(50))
RETURNS bit
AS
BEGIN
DECLARE #number AS INT
DECLARE #numeric AS NUMERIC(18,2)
SET #number = 0
IF IsNumeric(#value) = 1
BEGIN
SET #numeric = CONVERT(NUMERIC(18,2), #value)
IF #numeric BETWEEN -2147483648 AND 2147483647
SET #number = CONVERT(INT, #numeric)
END
RETURN #number
END
GO
The updated SQL for the insert into the formatted table would then look like this:
INSERT INTO FormattedTable
SELECT
CONVERT(int, CONVERT(NUMERIC(18,2), EntityId)),
CONVERT(int, CONVERT(NUMERIC(18,2), PerfId)),
CONVERT(int, CONVERT(NUMERIC(18,2), BaseId)),
CONVERT(int, CONVERT(NUMERIC(18,2), UpdateStatus))
FROM RawdataTable
WHERE
dbo.IsInt(EntityId) = 1
AND dbo.IsInt(PerfId) = 1
AND dbo.IsInt(BaseId) = 1
AND dbo.IsInt(UpdateStatus) = 1
There may be a little weirdness around handling NULLs (my function will return 0 if NULL is passed in, even though an INT can certainly be null), but that can be adjusted depending on what is supposed to happen with NULL values in the RawdataTable.
You can put a WHERE clause in your cursor definition so that only valid records are selected in the first place. You might need to create a function to determine validity, but it should be faster than looping over them.
Actually, you might want to create a temp table of the invalid records, so that you can log the errors, then define the cursor only on the rows that are not in the temp table.
Insert into will work much more better than Cursor.
As Cursor work solely in Memory of your PC and slows down the optimization of SQL Server. We should avoid using Cursors but (of course) there are situations where usage of Cursor cannot be avoided.

creating SQL command to return match or else everything else

i have three checkboxs in my application. If the user ticks a combination of the boxes i want to return matches for the boxes ticked and in the case where a box is not checked i just want to return everything . Can i do this with single SQL command?
I recommend doing the following in the WHERE clause;
...
AND (#OnlyNotApproved = 0 OR ApprovedDate IS NULL)
It is not one SQL command, but works very well for me. Basically the first part checks if the switch is set (checkbox selected). The second is the filter given the checkbox is selected. Here you can do whatever you would normally do.
You can build a SQL statement with a dynamic where clause:
string query = "SELECT * FROM TheTable WHERE 1=1 ";
if (checkBlackOnly.Checked)
query += "AND Color = 'Black' ";
if (checkWhiteOnly.Checked)
query += "AND Color = 'White' ";
Or you can create a stored procedure with variables to do this:
CREATE PROCEDURE dbo.GetList
#CheckBlackOnly bit
, #CheckWhiteOnly bit
AS
SELECT *
FROM TheTable
WHERE
(#CheckBlackOnly = 0 or (#CheckBlackOnly = 1 AND Color = 'Black'))
AND (#CheckWhiteOnly = 0 or (#CheckWhiteOnly = 1 AND Color = 'White'))
....
sure. example below assumes SQL Server but you get the gist.
You could do it pretty easily using some Dynamic SQL
Lets say you were passing your checkboxes to a sproc as bit values.
DECLARE bit #cb1
DECLARE bit #cb2
DECLARE bit #cb3
DECLARE nvarchar(max) #whereClause
IF(#cb1 = 1)
SET #whereClause = #whereClause + ' AND col1 = ' + #cb1
IF(#cb2 = 1)
SET #whereClause = #whereClause + ' AND col2 = ' + #cb2
IF(#cb3 = 1)
SET #whereClause = #whereClause + ' AND col3 = ' + #cb3
DECLARE nvarchar(max) #sql
SET #sql = 'SELECT * FROM Table WHERE 1 = 1' + #whereClause
exec (#sql)
Sure you can.
If you compose your SQL SELECT statement in the code, then you just have to generate:
in case nothing or all is selected (check it using your language), you just issue non-filter version:
SELECT ... FROM ...
in case some checkboxes are checked, you create add a WHERE clause to it:
SELECT ... FROM ... WHERE MyTypeID IN (3, 5, 7)
This is single SQL command, but it is different depending on the selection, of course.
Now, if you would like to use one stored procedure to do the job, then the implementation would depend on the database engine since what you need is to be able to pass multiple parameters. I would discourage using a procedure with just plain 3 parameters, because when you add another check-box, you will have to change the SQL procedure as well.
SELECT *
FROM table
WHERE value IN
(
SELECT option
FROM checked_options
UNION ALL
SELECT option
FROM all_options
WHERE NOT EXISTS (
SELECT 1
FROM checked_options
)
)
The inner subquery will return either the list of the checked options, or all possible options if the list is empty.
For MySQL, it will be better to use this:
SELECT *
FROM t_data
WHERE EXISTS (
SELECT 1
FROM t_checked
WHERE session = 2
)
AND opt IN
(
SELECT opt
FROM t_checked
WHERE session = 2
)
UNION ALL
SELECT *
FROM t_data
WHERE NOT EXISTS (
SELECT 1
FROM t_checked
WHERE session = 2
)
MySQL will notice IMPOSSIBLE WHERE on either of the SELECT's, and will execute only the appropriate one.
See this entry in my blog for performance detail:
Selecting options
If you pass a null into the appropriate values, then it will compare that specific column against itself. If you pass a value, it will compare the column against the value
CREATE PROCEDURE MyCommand
(
#Check1 BIT = NULL,
#Check2 BIT = NULL,
#Check3 BIT = NULL
)
AS
SELECT *
FROM Table
WHERE Column1 = ISNULL(#Check1, Column1)
AND Column2 = ISNULL(#Check2, Column2)
AND Column3 = ISNULL(#Check3, Column3)
The question did not specify a DB product or programming language. However it can be done with ANSI SQL in a cross-product manner.
Assuming a programming language that uses $var$ for variable insertion on strings.
On the server you get all selected values in a list, so if the first two boxes are selected you would have a GET/POST variable like
http://url?colors=black,white
so you build a query like this (pseudocode)
colors = POST['colors'];
colors_list = replace(colors, ',', "','"); // separate colors with single-quotes
sql = "WHERE ('$colors$' == '') OR (color IN ('$colors_list$'));";
and your DB will see:
WHERE ('black,white' == '') OR (color IN ('black','white')); -- some selections
WHERE ('' == '') OR (color IN ('')); -- nothing selected (matches all rows)
Which is a valid SQL query. The first condition matches any row when nothing is selected, otherwise the right side of the OR statement will match any row that is one of the colors. This query scales to an unlimited number of options without modification. The brackets around each clause are optional as well but I use them for clarity.
Naturally you will need to protect the string from SQL injection using parameters or escaping as you see fit. Otherwise a malicious value for colors will allow your DB to be attacked.