Dynamic column - how to improve performance - sql

OK, second attempt at the question (first is How to build virtual columns?)
Apologies in advance if this kind of question isn't suitable for Stack Overflow. Feel free to take it down if needed.
The basic question is "what's the best way to have a column whose content is built dynamically".
The code revolves around four tables.
The first three (equipment, accessory, association) can be seen as two columns each, an ID and a name.
The goal is to replace the association name with a name built dynamically based on the name of the association components.
The fourth table describes the associations. The association should be seen as a tree, and each "branch" of the tree is represented as a line in this table. Columns are:
branchID (primary key)
association ID (int)
parent node kind (association = 1, equipment = 2, accessory = 3) (int)
parent node ID (ID in one of the three other tables) (int)
kid node kind
kid node ID
I do have something that works, using a view and a function (the function code follows). However, performance isn't satisfactory.
I see three improvement paths:
minor adjustments through primary keys and indexes (code is significantly faster if there is NO primary key on the 4th table - I haven't been able to explain that)
fully reviewing the design behind the 4th table (I'm open to ideas)
replacing the custom function below by... something else! But what could that be?
Sorry for the French names... I chose not to edit the code before posting, assuming that copy/paste errors are worse than untranslated names.
Type = kind
Enfant = kid
Jumelage = association
Numero = name (oops...)
liens = branches
Thanks.
USE [testDB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[testjmrFN]
(
@JumelageID int
)
RETURNS varchar(max)
AS
BEGIN
DECLARE @Result varchar(max)
DECLARE @TypeParent int
DECLARE @ParentID int
DECLARE @TypeEnfant int
DECLARE @EnfantID int
DECLARE @NumeroEquipement varchar(max)
DECLARE @NumeroAccessoire varchar(max)
SET @Result = ''
DECLARE liens CURSOR LOCAL FOR
SELECT l.TypeParent, l.ParentID, l.TypeEnfant, l.EnfantID, e.Numero, a.Numero
FROM ges_Jumelages_Liens l
LEFT JOIN ges_Equipements e ON l.EnfantID = e.EquipementID
LEFT JOIN ges_Accessoires a ON l.EnfantID = a.AccessoireID
WHERE l.JumelageID = @JumelageID
ORDER BY LienID
OPEN liens
FETCH NEXT FROM liens INTO @TypeParent, @ParentID, @TypeEnfant, @EnfantID, @NumeroEquipement, @NumeroAccessoire
WHILE @@FETCH_STATUS = 0
BEGIN
IF @TypeParent = 1 AND @TypeEnfant = 2
BEGIN
IF @Result <> ''
BEGIN
SET @Result = @Result + '§'
END
SET @Result = @Result + IsNull(@NumeroEquipement,'')
END
IF @TypeParent = 2 AND @TypeEnfant = 3
BEGIN
IF @Result <> ''
BEGIN
SET @Result = @Result + '~'
END
SET @Result = @Result + IsNull(@NumeroAccessoire,'')
END
FETCH NEXT FROM liens INTO @TypeParent, @ParentID, @TypeEnfant, @EnfantID, @NumeroEquipement, @NumeroAccessoire
END
CLOSE liens
DEALLOCATE liens
RETURN @Result
END

This gives you the list that you need. If you want to concatenate the values into one long string, using the CTE's Delimiter field as the separator, see here: Concatenate many rows into a single text string?
use testDB;
go
with cte (TypeParent,ParentID,TypeEnfant,EnfantID,Numero,Delimiter)
as
( select l.TypeParent
, l.ParentID
, l.TypeEnfant
, l.EnfantID
, e.Numero
, '§' as Delimiter
from dbo.ges_Jumelages_Liens as l
join dbo.ges_Equipements as e
on l.EnfantID = e.EquipementID
where l.TypeParent = 1
and l.TypeEnfant = 2
union all
select l.TypeParent
, l.ParentID
, l.TypeEnfant
, l.EnfantID
, a.Numero
,'~' as Delimiter
from dbo.ges_Jumelages_Liens as l
join dbo.ges_Accessoires as a
on l.EnfantID = a.AccessoireID
where l.TypeParent = 2
and l.TypeEnfant = 3
)
select *
from cte
If you need further help, please clarify your question.
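To try the set-based approach end to end, here is a small sketch that runs the answer's CTE in an in-memory SQLite database from Python. SQLite standing in for SQL Server is an assumption; the CTE itself is plain SQL that SQL Server should also accept. The CTE is extended with JumelageID and LienID so each association's branches can be read back in order, and the '§'/'~' joining rule from testjmrFN is then applied in Python. The sample rows are invented.

```python
import sqlite3
from itertools import groupby

# Minimal stand-ins for the three tables the CTE touches.
SCHEMA = """
CREATE TABLE ges_Equipements (EquipementID INTEGER PRIMARY KEY, Numero TEXT);
CREATE TABLE ges_Accessoires (AccessoireID INTEGER PRIMARY KEY, Numero TEXT);
CREATE TABLE ges_Jumelages_Liens (
    LienID INTEGER PRIMARY KEY, JumelageID INTEGER,
    TypeParent INTEGER, ParentID INTEGER, TypeEnfant INTEGER, EnfantID INTEGER
);
"""

# The answer's CTE, plus JumelageID and LienID so branches can be grouped and ordered.
QUERY = """
WITH cte (JumelageID, LienID, Numero, Delimiter) AS (
    SELECT l.JumelageID, l.LienID, e.Numero, '§'
    FROM ges_Jumelages_Liens AS l
    JOIN ges_Equipements AS e ON l.EnfantID = e.EquipementID
    WHERE l.TypeParent = 1 AND l.TypeEnfant = 2
    UNION ALL
    SELECT l.JumelageID, l.LienID, a.Numero, '~'
    FROM ges_Jumelages_Liens AS l
    JOIN ges_Accessoires AS a ON l.EnfantID = a.AccessoireID
    WHERE l.TypeParent = 2 AND l.TypeEnfant = 3
)
SELECT JumelageID, Numero, Delimiter FROM cte ORDER BY JumelageID, LienID
"""


def association_names(conn):
    """Return {JumelageID: name}, using testjmrFN's rule: a delimiter only between parts."""
    names = {}
    for jumelage_id, rows in groupby(conn.execute(QUERY), key=lambda r: r[0]):
        result = ""
        for _, numero, delimiter in rows:
            if result:
                result += delimiter
            result += numero or ""
        names[jumelage_id] = result
    return names


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO ges_Equipements VALUES (?, ?)", [(10, "EQ-A"), (11, "EQ-B")])
conn.executemany("INSERT INTO ges_Accessoires VALUES (?, ?)", [(20, "ACC-1")])
conn.executemany(
    "INSERT INTO ges_Jumelages_Liens VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 1, 100, 2, 10),  # association 100 -> equipment 10
        (2, 100, 2, 10, 3, 20),   # equipment 10 -> accessory 20
        (3, 100, 1, 100, 2, 11),  # association 100 -> equipment 11
    ],
)
print(association_names(conn))  # {100: 'EQ-A~ACC-1§EQ-B'}
```

Doing the final concatenation inside SQL Server is what the linked question covers; the point here is only that one set-based query can replace a cursor run per association.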

Related

How to process SQL string char-by-char to build a match weight?

The problem: I need to display fields for user entry on a form, dynamic to some lookup criteria.
My current solution: I've created a SQL table with some field entry criteria, based on a relatively simple matching criteria. The match criteria basically is such that Lookup Value starts with Match Code, and the most precise match is found by doing a LEN comparison.
select
f.[IS_REQUIRED]
, f.[MASK]
, f.[MAX_LENGTH]
, f.[MIN_LENGTH]
, f.[RESOURCE_KEY]
, f.[SEQUENCE]
from [dbo].[MY_RECORD] r with(nolock)
inner join [dbo].[ENTRY_FORMAT] f with(nolock)
on r.[LOOKUP_VALUE] like f.[MATCH_CODE]
-- Logic to filter by single, most-precise record match.
cross apply (
select f1.[SEQUENCE]
from [dbo].[ENTRY_FORMAT] f1 with(nolock)
where f.[SEQUENCE] = f1.[SEQUENCE]
and r.[LOOKUP_VALUE] like f1.[MATCH_CODE]
group by f1.[SEQUENCE]
having len(f.[MATCH_CODE]) = max(len(f1.[MATCH_CODE]))
) tFilter
where r.[ID] = @RecordId
The current issues with this are that the most precise match has to be calculated on each and every call, against each and every match. Additionally, I'm currently only able to support % in the MATCH_CODE. (E.g., '%' is the default for all LOOKUP_VALUEs, an entry of '12%' would be a more precise match for a LOOKUP_VALUE of '12345', and a MATCH_CODE of '12345' should obviously be the most precise match.) However, I would like to add support for wildcards such as [4-7]. Going just off LEN would definitely be wrong here, because '[4-7]' adds a lot to the length, yet '12345' is still the desired match over '123[4-7]'.
My desired update: To add a MATCH_WEIGHT column to ENTRY_FORMAT, which I can update via a trigger on insert/update. For my initial implementation, I'm just looking for something that can go through MATCH_CODE, character by character, increasing MATCH_WEIGHT, but treating [..] as just a single character when doing so. Is there a good mechanism (UDF - either SQL or CLR? CURSOR?) for iterating through characters of a varchar field to calculate a value in this way? Something like increasing MATCH_WEIGHT by two per non-wildcard, and perhaps by one on a wildcard; with details to be further thought out and worked out...
The goal being to use a query more like:
select
f.[IS_REQUIRED]
, f.[MASK]
, f.[MAX_LENGTH]
, f.[MIN_LENGTH]
, f.[RESOURCE_KEY]
, f.[SEQUENCE]
from [dbo].[MY_RECORD] r with(nolock)
-- Logic to filter by single, most-precise record match.
cross apply (
select top 1
f1.[MATCH_CODE]
, f1.[SEQUENCE]
from [dbo].[ENTRY_FORMAT] f1 with(nolock)
where r.[LOOKUP_VALUE] like f1.[MATCH_CODE]
group by f1.[SEQUENCE]
order by f1.[MATCH_WEIGHT] desc
) tFilter
inner join [dbo].[ENTRY_FORMAT] f with(nolock)
on f.[MATCH_CODE] = tFilter.[MATCH_CODE]
and f.[SEQUENCE] = tFilter.[SEQUENCE]
where r.[ID] = @RecordId
Note: I realize this is a relatively fragile setup. The ENTRY_FORMAT records are only entered by developers, who are aware of the restrictions, so for now assume that valid data is entered, and which does not cause match collisions.
With some help, I've come up with one implementation (answer below), but am still unsure as to my total design, so welcoming better answers or any criticism.
From Steve's answer on another question, I've used much of the body to create a function to accomplish support for the [..] wildcard at the end of a match code.
CREATE FUNCTION CalculateMatchWeight
(
-- Add the parameters for the function here
@MatchCode varchar(100)
)
RETURNS smallint
AS
BEGIN
-- Declare the return variable here
DECLARE @Result smallint = 0;
-- Add the T-SQL statements to compute the return value here
DECLARE @Pos int = 1, @N0 int = ascii('0'), @N9 int = ascii('9'), @AA int = ascii('A'), @AZ int = ascii('Z'), @Wild int = ascii('%'), @Range int = ascii('[');
DECLARE @Asc int;
DECLARE @WorkingString varchar(100) = upper(@MatchCode)
WHILE @Pos <= LEN(@WorkingString)
BEGIN
SET @Asc = ascii(substring(@WorkingString, @Pos, 1));
If ((@Asc between @N0 and @N9) or (@Asc between @AA and @AZ))
SET @Result = @Result + 2;
ELSE
BEGIN
-- Check wildcard matching, update value according to match strength, and stop calculating further.
-- TODO: In the future we may wish to have match codes with wildcards not just at the end; try to figure out a mechanism to calculating that case.
IF (@Asc = @Range)
BEGIN
SET @Result = @Result + 2;
SET @Pos = 100;
END
IF (@Asc = @Wild)
BEGIN
SET @Result = @Result + 1;
SET @Pos = 100;
END
END
SET @Pos = @Pos + 1
END
-- Return the result of the function
RETURN @Result
END
I've checked that this can generate desired output for the current cases that I'm trying to cover:
SELECT [dbo].[CalculateMatchWeight] ('12345'); -- Most precise (10)
SELECT [dbo].[CalculateMatchWeight] ('123[4-5]'); -- Middle (8)
SELECT [dbo].[CalculateMatchWeight] ('123%'); -- Least (7)
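CalculateMatchWeight stops at the first wildcard (see its TODO). One way to score wildcards anywhere in the pattern is to tokenize MATCH_CODE first, so that a [..] class counts as a single position. The sketch below is in Python purely to show the tokenizing logic. The default weights (2 per literal, 2 per bracket class, 1 per %) copy the T-SQL function's choices; the weight for _ and the names tokenize_like/match_weight are assumptions for illustration. Unlike the T-SQL version, punctuation literals such as '-' also earn the literal weight here.

```python
def tokenize_like(pattern):
    """Split a T-SQL LIKE pattern into (kind, text) tokens: literal, class, single, multi."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            close = pattern.find("]", i + 1)
            if close != -1:
                # A whole bracket class such as '[4-7]' is one position.
                tokens.append(("class", pattern[i:close + 1]))
                i = close + 1
                continue
        if ch == "%":
            tokens.append(("multi", ch))
        elif ch == "_":
            tokens.append(("single", ch))
        else:
            tokens.append(("literal", ch))  # includes an unmatched '['
        i += 1
    return tokens


def match_weight(pattern, literal=2, char_class=2, single=1, multi=1):
    """Sum one weight per token; a higher total means a more precise pattern."""
    weights = {"literal": literal, "class": char_class, "single": single, "multi": multi}
    return sum(weights[kind] for kind, _ in tokenize_like(pattern))


print(match_weight("12345"))     # 10
print(match_weight("123[4-5]"))  # 8
print(match_weight("123%"))      # 7
print(match_weight("1[2-3]45"))  # 8, a class in the middle is scored too
```

With these defaults '1234[5-6]' and '12345' both score 10; passing char_class=1 ranks the fully literal code first. A T-SQL port would follow the same loop, using CHARINDEX(']', @MatchCode, @Pos) to jump over a bracket class.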
Now I can call this function in a trigger on INSERT/UPDATE to update the MATCH_WEIGHT:
CREATE TRIGGER TRG_ENTRY_FORMAT_CalcMatchWeight
ON ENTRY_FORMAT
AFTER INSERT,UPDATE
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Set-based, so multi-row INSERT/UPDATE statements work. The ISNULL check also
-- covers newly inserted rows whose MATCH_WEIGHT is still NULL (NULL <> x is never true),
-- and skipping rows whose weight is already correct avoids needless re-updates.
UPDATE ef
SET MATCH_WEIGHT = dbo.CalculateMatchWeight(i.[MATCH_CODE])
FROM ENTRY_FORMAT ef
INNER JOIN inserted i
ON ef.[MATCH_CODE] = i.[MATCH_CODE]
AND ef.[SEQUENCE] = i.[SEQUENCE]
WHERE ISNULL(ef.MATCH_WEIGHT, -1) <> dbo.CalculateMatchWeight(i.[MATCH_CODE])
END

SQL Server WHILE

I am trying to generate unique card numbers with the following function. I put my query inside a WHILE loop to prevent duplicate card numbers, but I am still getting duplicates.
Can anyone help me?
Create FUNCTION GetCardNumber ()
RETURNS varchar(20)
AS
BEGIN
Declare @NewID varchar(20);
Declare @NewID1 varchar(36) ;
Declare @Counter int = 0;
While(1=1)
Begin
Set @NewID1 = (SELECT [MyNewId] FROM Get_NewID);
Set @NewID = '2662464' + '823' + '001' +right(YEAR(GETUTCDATE()),2) +(left(convert(varchar,ABS(CAST(CAST(@NewID1 AS VARBINARY(5)) AS bigint))),5));
Set @Counter = (Select count(*) from ContactTBL where ContactMembershipID = @NewID);
If @Counter = 0
BEGIN
BREAK;
END
End
return @NewID
END
Go
Update: I am getting MyNewID from a view:
CREATE VIEW Get_NewID
AS
SELECT NEWID() AS MyNewID
GO
Many thanks in advance.
Won't this just return the same value every time you run it? I can't see anywhere where you're incrementing anything, or getting any kind of value that would give you unique values each time. You need to do something that changes the value each time, for example using the current exact date and time.
You're returning varchar(20) in line 2. To get your 'unique' NewId, you're doing this:
Set @NewId = (13 digit constant value) + (last 2 digits of current year) +
left(
convert(varchar,
ABS(CAST
(CAST(#NewID1 AS VARBINARY(5)) AS bigint)
)
)
,5)
which leaves you only 5 characters of uniqueness! This is almost certainly the issue. An easy fix may be to increase the number of characters you return on line 2, e.g. RETURNS varchar(30)
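To put "5 characters of uniqueness" in numbers: five decimal digits allow at most 100,000 suffixes, and by the birthday problem two generated suffixes are likely to coincide long before that many cards exist. A quick Python estimate, assuming uniformly distributed suffixes (a non-uniform distribution, such as the leading digits of a converted number, would only make collisions more likely). The function name is illustrative.

```python
def collision_probability(n, space=100_000):
    """Chance that at least two of n uniformly random values drawn from `space` are equal."""
    p_all_distinct = 1.0
    for k in range(n):
        # The k-th value must avoid the k values already drawn.
        p_all_distinct *= (space - k) / space
    return 1.0 - p_all_distinct


for cards in (100, 400, 1000):
    print(cards, round(collision_probability(cards), 3))
```

Under these assumptions, two of just 400 generated suffixes already have better-than-even odds of being equal.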
What you're doing is unnecessarily complicated, and I think there is an element of overprotecting against potential duplicate values. This line is very suspect:
Set @NewID = '2662464' + '823' + '001' +right(YEAR(GETUTCDATE()),2) +(left(convert(varchar,ABS(CAST(CAST(@NewID1 AS VARBINARY(5)) AS bigint))),5));
The maximum for bigint is 2^63-1, so casting your 5-byte VARBINARY to a bigint could result in an overflow, which may also cause an issue.
I'm not sure exactly what you're trying to achieve, but you need to simplify things and make sure you have more scope for unique values!
Set @NewID1 = (SELECT [MyNewId] FROM Get_NewID);
always returns the same result (if nothing else changes)
Set @NewID = '2662464' + '823' + '001' +right(YEAR(GETUTCDATE()),2) +(left(convert(varchar,ABS(CAST(CAST(@NewID1 AS VARBINARY(5)) AS bigint))),5));
As a result, @NewID will also be the same.

**Occasional** Arithmetic overflow error converting expression to data type int

I'm running an update script to obfuscate data and occasionally get the arithmetic overflow error in the title. The table being updated has 260k records, yet the update script has to be run several times before the error appears. Although it's rare, I can't rely on the code until it's fixed, and it's a pain to debug.
Looking at other similar questions, this is often resolved by changing the data type, e.g. from INT to BIGINT, either in the table or in a calculation. However, I can't see where this could be required. I've reduced the script to the one below, as I've managed to pinpoint the error to the update of one column.
A function is being called by the update, and I've included it below. Because the error is random, I suspect the use of the NEWID function could be causing it, but I haven't been able to re-create the error when just running that part of the function multiple times. NEWID can't be used inside functions, so it's being called from a view, also included below.
Update script:
UPDATE dbo.Addresses
SET HouseNumber = CASE WHEN LEN(HouseNumber) > 0
THEN dbo.fn_GenerateRandomString (LEN(HouseNumber), 1, 1, 1)
ELSE HouseNumber
END
NEWID view and random string function:
CREATE VIEW dbo.vw_GetNewID
AS
SELECT NEWID() AS New_ID
GO
CREATE FUNCTION dbo.fn_GenerateRandomString (
@stringLength int,
@upperCaseBit bit,
@lowerCaseBit bit,
@numberBit bit
)
RETURNS nvarchar(100)
AS
BEGIN
-- Sanitise string length values.
IF ISNULL(@stringLength, -1) < 0
SET @stringLength = 0
-- Generate a random string from the specified character sets.
DECLARE @string nvarchar(100) = ''
SELECT
@string += c2
FROM
(
SELECT TOP (@stringLength) c2 FROM (
SELECT c1 FROM
(
VALUES ('A'),('B'),('C')
) AS T1(c1)
WHERE @upperCaseBit = 1
UNION ALL
SELECT c1 FROM
(
VALUES ('a'),('b'),('c')
) AS T1(c1)
WHERE @lowerCaseBit = 1
UNION ALL
SELECT c1 FROM
(
VALUES ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9')
) AS T1(c1)
WHERE @numberBit = 1
)
AS T2(c2)
ORDER BY (SELECT ABS(CHECKSUM(New_ID)) from vw_GetNewID)
) AS T2
RETURN @string
END
Addresses table (for testing):
CREATE TABLE dbo.Addresses(HouseNumber nchar(32) NULL)
INSERT Addresses(HouseNumber)
VALUES ('DSjkmf jkghjsh35hjk h2jkhj3h jhf'),
('SDjfksj3548 ksjk'),
(NULL),
(''),
('2a'),
('1234567890'),
('An2b')
Note: only 7k of the rows in the addresses table have a value entered i.e. LEN(HouseNumber) > 0.
An arithmetic overflow in what is otherwise string-based code is confounding. But there is one thing that could be causing the arithmetic overflow. That is your ORDER BY clause:
ORDER BY (SELECT ABS(CHECKSUM(New_ID)) from vw_GetNewID)
CHECKSUM() returns an integer, whose range is -2,147,483,648 to 2,147,483,647. Note the absolute value of the smallest number is 2,147,483,648, and that is just outside the range. You can verify that SELECT ABS(CAST('-2147483648' as int)) generates the arithmetic overflow error.
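The boundary case is easy to demonstrate. Python integers never overflow, so this sketch spells out the 32-bit int range check that SQL Server applies to ABS() on an int; it is an emulation for illustration, not SQL Server itself, and sql_int_abs is a hypothetical name.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def sql_int_abs(value):
    """Emulate ABS() on a 32-bit int: the result must also fit in an int."""
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError("input is not a 32-bit int")
    result = -value if value < 0 else value
    if result > INT_MAX:  # only possible for INT_MIN, since |INT_MIN| = INT_MAX + 1
        raise OverflowError("Arithmetic overflow error converting expression to data type int.")
    return result


print(sql_int_abs(-2_147_483_647))  # 2147483647
try:
    sql_int_abs(INT_MIN)
except OverflowError as exc:
    print(exc)
```

In T-SQL the equivalent sidestep would be ABS(CAST(CHECKSUM(New_ID) AS bigint)), although, as this answer goes on to suggest, ordering by New_ID directly avoids the arithmetic entirely.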
You don't need the checksum(). Alas, you do need the view because this logic is in a function and NEWID() is side-effecting. But, you can use:
ORDER BY (SELECT New_ID from vw_GetNewID)
I suspect that the reason you are seeing this every million or so rows rather than every 4 billion rows or so is because the ORDER BY value is being evaluated multiple times for each row as part of the sorting process. Eventually, it is going to hit the lower limit.
EDIT:
If you care about efficiency, it is probably faster to do this using string operations rather than tables. I might suggest this version of the function:
CREATE VIEW vw_rand AS SELECT rand() as rand;
GO
CREATE FUNCTION dbo.fn_GenerateRandomString (
@stringLength int,
@upperCaseBit bit,
@lowerCaseBit bit,
@numberBit bit
)
RETURNS nvarchar(100)
AS
BEGIN
DECLARE @string NVARCHAR(255) = '';
-- Sanitise string length values.
IF ISNULL(@stringLength, -1) < 0
SET @stringLength = 0;
DECLARE @lets VARCHAR(255) = '';
IF (@upperCaseBit = 1) SET @lets = @lets + 'ABC';
IF (@lowerCaseBit = 1) SET @lets = @lets + 'abc';
IF (@numberBit = 1) SET @lets = @lets + '0123456789';
DECLARE @len int = len(@lets);
WHILE @stringLength > 0 BEGIN
SELECT @string += SUBSTRING(@lets, 1 + CAST(rand * @len as INT), 1)
FROM vw_rand;
SET @stringLength = @stringLength - 1;
END;
RETURN @string
END;
As a note: rand() is documented as being exclusive of the end of its range, so you don't have to worry about it returning exactly 1.
Also, this version is subtly different from your version because it can pull the same letter more than once (and as a consequence can also handle longer strings). I think this is actually a benefit.
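The same algorithm, sketched in Python to make the "with replacement" point concrete: each position independently picks from the enabled character sets, so the output is not capped at the 16 distinct characters (3 + 3 + 10) that the TOP-based version can return. The function name and the rng parameter (added for reproducible testing) are assumptions, not part of the T-SQL.

```python
import random


def generate_random_string(string_length, upper_case=True, lower_case=True, numbers=True, rng=None):
    """Build a string by drawing each character independently (with replacement)."""
    if rng is None:
        rng = random.Random()
    # Same sanitising as the T-SQL: NULL or negative lengths become 0.
    if string_length is None or string_length < 0:
        string_length = 0
    letters = ""
    if upper_case:
        letters += "ABC"
    if lower_case:
        letters += "abc"
    if numbers:
        letters += "0123456789"
    if not letters:
        return ""
    # rng.random() is in [0.0, 1.0), so the index is always < len(letters),
    # just as 1 + CAST(rand * @len AS INT) stays within 1..@len.
    return "".join(letters[int(rng.random() * len(letters))] for _ in range(string_length))


print(generate_random_string(32, rng=random.Random(0)))
```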

Selecting only numbers from a string

We have a program that can pull information from a database that we use for shipping. The way it works is it uses an ODBC driver to pull from our database, so that when we type in "order number 5" into the shipping program it will also pull the matching address, phone number, etc.
The problem is that the database stores order numbers as plain numbers, but the program we use for inventory management (which holds the database) prints our labels with the order number in the format TK123456. I need to figure out how to make SQL interpret the entered order number as just the numbers, basically cutting the TK off the start.
SELECT RXFILL.RXFILL_ID, RXMAIN.RX_NUMBER, PATIENT.FIRSTNAME, PATIENT.LASTNAME,
SHIPADDRESS1, SHIPADDRESS2, SHIPCITY, SHIPSTATE, SHIPZIP, EMAIL
FROM RXFILL
LEFT JOIN RXMAIN ON RXFILL.RXMAIN_ID = RXMAIN.RXMAIN_ID
LEFT JOIN PATIENT ON RXMAIN.PATIENT_ID = PATIENT.PATIENT_ID
WHERE RXFILL_ID=$ORDERNUMBER
If I am understanding it correctly the $ORDERNUMBER is what needs to be adjusted to not include letters. However the program does specify the final line must be in the format WHERE [field name]=$ORDERNUMBER.
How can this be done?
If you only want this to be solved in SQL and not in the calling application, and you know that the first two characters of $ORDERNUMBER will always be 'TK', then you can easily solve it by taking a substring of $ORDERNUMBER starting at the third character, i.e.
WHERE RXFILL_ID=SUBSTRING($ORDERNUMBER, 3, LEN($ORDERNUMBER))
That syntax might not be exact, since you haven't divulged your DBMS type and each DBMS implements SUBSTRING in whatever way they want.
If you share more info about the calling application which sets $ORDERNUMBER, I'm sure it would be better to make the change there.
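A runnable illustration of the server-side fix, using Python's sqlite3 module, where SUBSTR(x, 3) returns everything from the third character on. The RXFILL table here is a trimmed, hypothetical stand-in, and the ? placeholder replaces the program's $ORDERNUMBER substitution, whose exact quoting behavior isn't known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RXFILL (RXFILL_ID INTEGER PRIMARY KEY, SHIPCITY TEXT)")
conn.execute("INSERT INTO RXFILL VALUES (123456, 'Springfield')")


def find_fill(order_number):
    # Drop the two-character 'TK' prefix, then compare the rest as an integer.
    return conn.execute(
        "SELECT RXFILL_ID, SHIPCITY FROM RXFILL "
        "WHERE RXFILL_ID = CAST(SUBSTR(?, 3) AS INTEGER)",
        (order_number,),
    ).fetchone()


print(find_fill("TK123456"))  # (123456, 'Springfield')
```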
I have written a function for this.
CREATE Function SelectNumbersFromString(@str varchar(max))
Returns varchar(max) as
BEGIN
Declare @cchk char(5);
Declare @len int ;
Declare @aschr int;
SET @len = ( SELECT len(@str) );
Declare @count int
SET @count = 1
DECLARE @ans varchar(max)
SET @ans = ''
While @count <= @len
BEGIN
SET @cchk = ( select Substring(@str,@count,1) );
SET @aschr = ( select ASCII(@cchk) );
-- ASCII 48-57 are the digits '0' through '9'
IF @aschr in ( 48,49,50,51,52,53,54,55,56,57 )
BEGIN
SET @ans = @ans + CHAR(@aschr)
END
SET @count = @count + 1;
END
RETURN @ans;
END
TESTED
SELECT dbo.SelectNumbersFromString('abc3deef5ff6') will return 356
From http://wfjanjua.blogspot.com/2012/07/add-numbers-from-stringvarchar-in-tsql.html
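For comparison, the same filter in Python (the function name simply mirrors the T-SQL one). Note that it keeps '0' (ASCII 48); the T-SQL loop only does so if its ASCII list covers 48 to 57, since 58 is ':'.

```python
def select_numbers_from_string(text):
    """Keep only the ASCII digits '0'..'9' (codes 48-57), preserving their order."""
    return "".join(ch for ch in text if "0" <= ch <= "9")


print(select_numbers_from_string("abc3deef5ff6"))  # 356
print(select_numbers_from_string("TK102030"))      # 102030
```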

SQL Precedence Matching

I'm trying to do precedence matching on a table within a stored procedure. The requirements are a bit tricky to explain, but hopefully this will make sense. Let's say we have a table called books, with id, author, title, date, and pages fields.
We also have a stored procedure that will match a query with ONE row in the table.
Here is the proc's signature:
create procedure match
@pAuthor varchar(100)
,@pTitle varchar(100)
,@pDate varchar(100)
,@pPages varchar(100)
as
...
The precedence rules are as follows:
First, try and match on all 4 parameters. If we find a match return.
Next try to match using any 3 parameters. The 1st parameter has the highest precedence here and the 4th the lowest. If we find any matches return the match.
Next we check if any two parameters match and finally if any one matches (still following the parameter order's precedence rules).
I have implemented this case by case, e.g.:
select @lvId = id
from books
where
author = @pAuthor
and title = @pTitle
and date = @pDate
and pages = @pPages
if @@ROWCOUNT = 1 begin
select @lvId
return
end
select @lvId = id
from books
where
author = @pAuthor
and title = @pTitle
and date = @pDate
if @@ROWCOUNT = 1 begin
select @lvId
return
end
....
However, for each new column in the table, the number of individual checks roughly doubles. I would really like to generalize this to X columns, but I'm having trouble coming up with a scheme.
Thanks for the read, and I can provide any additional information needed.
Added:
Dave and others, I tried implementing your code and it chokes on the first ORDER BY clause, where we add all the counts. It's giving me an invalid column name error. When I comment out the total count and order by just the individual aliases, the proc compiles fine.
Anyone have any ideas?
This is in Microsoft SQL Server 2005.
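Rather than hand-writing each check, the check order can be generated. A Python sketch, assuming the precedence described above (more matching parameters first, then earlier parameters first); each tuple would become one `WHERE ... AND ...` attempt, and n parameters yield 2^n - 1 of them. The function name is illustrative.

```python
from itertools import combinations


def precedence_order(params):
    """Every non-empty subset of params: most parameters first, ties go to earlier params."""
    order = []
    for size in range(len(params), 0, -1):
        # combinations() yields subsets in lexicographic order of position,
        # so subsets containing earlier parameters come first.
        order.extend(combinations(params, size))
    return order


checks = precedence_order(["author", "title", "date", "pages"])
print(len(checks))  # 15
for subset in checks[:5]:
    print(" AND ".join(f"{p} = @p{p.capitalize()}" for p in subset))
```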
I believe that the answers you're working on are by far the simplest. But I also believe that in SQL Server, they will always be full table scans. (In Oracle you could use bitmap indexes if the table didn't undergo a lot of simultaneous DML.)
A more complex solution but a much more performant one would be to build your own index. Not a SQL Server index, but your own.
Create a table (hash_index) with 3 columns (lookup-hash, rank, rowid)
Say you have 3 columns to search on. A, B, C
For every row added to Books you'll insert 7 rows into hash_index either via a trigger or CRUD proc.
First you'll
insert into hash_index
SELECT HASH(A & B & C), 7 , ROWID
FROM Books
Where & is the concatenation operator and HASH is a function
then you'll insert hashes for A & B, A & C and B & C.
You now have some flexibility you can give them all the same rank or if A & B are a superior match to B & C you can give them a higher rank.
And then insert hashes for A by itself, B by itself and C by itself, with the same choice of rank... all the same number or all different... You can even say that a match on A is a better choice than a match on B & C. This solution gives you a lot of flexibility.
Of course, this will add a lot of INSERT overhead, but if DML on Books is low or performance is not relevant you're fine.
Now, when you go to search, you'll create a function that returns a table of hashes for your @A, @B and @C. You'll have a small table of 7 values that you'll join to the lookup-hash column in the hash_index table. This will give you every possible match and possibly some false matches (that's just the nature of hashes). You'll take that result and order it descending on the rank column. Then take the first rowid back to the Books table and make sure that the values of @A, @B and @C it was matched on are actually in that row. On the off chance they're not and you've been handed a false positive, you'll need to check the next rowid.
Each of the operations in this "roll your own" approach is very fast:
Hashing your 3 values into a small 7-row table variable = very fast
Joining them on an index in your hash_index table = very fast index lookups
Looping over the result set will result in 1, or maybe 2 or 3, table accesses by rowid = very fast
Of course, all of these together could be slower than an FTS (full table scan)... But an FTS will keep getting slower as the table grows, so there will be a size at which the FTS is slower than this. You'll have to play with it.
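A Python sketch of the roll-your-own hash index for three columns A, B and C. It is a model of the idea, not SQL Server code: a dict stands in for the hash_index table, 32 bits of SHA-1 stand in for HASH, and the rank weights (A=4, B=2, C=1, so A & B & C gets 7 as in the answer and a match on A alone outranks B & C) are one assumed choice among those the answer allows. All function names are hypothetical.

```python
import hashlib
from itertools import combinations

COLUMNS = ("A", "B", "C")
WEIGHTS = {"A": 4, "B": 2, "C": 1}  # assumed ranks, see lead-in


def subsets():
    """The 7 column combinations: (A, B, C), (A, B), (A, C), (B, C), (A,), (B,), (C,)."""
    for size in range(len(COLUMNS), 0, -1):
        yield from combinations(COLUMNS, size)


def combo_hash(cols, values):
    """Stand-in for HASH(A & B ...): 32 bits of SHA-1 over column names plus values."""
    # Column names keep HASH(A & B) apart from HASH(A & C) when the values coincide;
    # the \x1f separator keeps ('ab', 'c') apart from ('a', 'bc').
    key = "|".join(cols) + "#" + "\x1f".join(values)
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


def build_index(books):
    """books: {rowid: {'A': ..., 'B': ..., 'C': ...}} -> {hash: [(rank, rowid, cols), ...]}."""
    index = {}
    for rowid, row in books.items():
        for cols in subsets():  # 7 index rows per book, as the trigger or CRUD proc would insert
            h = combo_hash(cols, [row[c] for c in cols])
            index.setdefault(h, []).append((sum(WEIGHTS[c] for c in cols), rowid, cols))
    return index


def lookup(index, books, query):
    """Join the 7 query hashes to the index, best rank first, verifying each hit."""
    hits = []
    for cols in subsets():
        hits.extend(index.get(combo_hash(cols, [query[c] for c in cols]), []))
    for rank, rowid, cols in sorted(hits, key=lambda hit: hit[0], reverse=True):
        # Go back to the "table" to rule out hash false positives.
        if all(books[rowid][c] == query[c] for c in cols):
            return rowid
    return None


books = {
    1: {"A": "x", "B": "y", "C": "z"},
    2: {"A": "x", "B": "q", "C": "r"},
    3: {"A": "m", "B": "y", "C": "z"},
}
index = build_index(books)
print(lookup(index, books, {"A": "m", "B": "y", "C": "n"}))  # 3 (matches on A & B)
```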
I don't have time to write out the query, but I think this idea would work.
For your predicate, use "author = @pAuthor OR title = @pTitle ...", so you get all candidate rows.
Use CASE expressions or whatever you like to create virtual columns in the result set, like:
SELECT CASE WHEN author = @pAuthor THEN 1 ELSE 0 END author_match,
...
Then add this order by and get the first row returned:
ORDER BY (author_match+title_match+date_match+page_match) DESC,
author_match DESC,
title_match DESC,
date_match DESC,
page_match DESC
You still need to extend it for each new column, but only a little bit.
You don't explain what should happen if more than one result matches any given set of parameters that is reached, so you will need to change this to account for those business rules. Right now I've set it to return books that match on later parameters ahead of those that don't. For example, a match on author, title, and pages would come before one that just matches on author and title.
Your RDBMS may have a different way of handling "TOP", so you may need to adjust for that as well.
SELECT TOP 1
author,
title,
date,
pages
FROM
Books
WHERE
author = @author OR
title = @title OR
date = @date OR
pages = @pages
ORDER BY
CASE WHEN author = @author THEN 1 ELSE 0 END +
CASE WHEN title = @title THEN 1 ELSE 0 END +
CASE WHEN date = @date THEN 1 ELSE 0 END +
CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC,
CASE WHEN author = @author THEN 8 ELSE 0 END +
CASE WHEN title = @title THEN 4 ELSE 0 END +
CASE WHEN date = @date THEN 2 ELSE 0 END +
CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC
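A runnable version of this single-query ranking, generalized to any list of columns, using Python's sqlite3 module as a stand-in for SQL Server (SQLite spells TOP 1 as LIMIT 1 and uses :name parameters instead of @name). The books rows are made-up sample data, the date column is called printed to avoid the date keyword, and the helper names are hypothetical. Column names are pasted into the SQL text, so they must come from trusted code, never from user input.

```python
import sqlite3


def best_match_sql(table, columns):
    """One ranked query: most matching columns first, then earlier columns first."""
    n = len(columns)
    any_match = " OR ".join(f"{c} = :{c}" for c in columns)
    match_count = " + ".join(f"CASE WHEN {c} = :{c} THEN 1 ELSE 0 END" for c in columns)
    # Weights 2^(n-1), ..., 2, 1: the first column outweighs all later columns combined.
    precedence = " + ".join(
        f"CASE WHEN {c} = :{c} THEN {2 ** (n - 1 - i)} ELSE 0 END"
        for i, c in enumerate(columns)
    )
    return (
        f"SELECT id FROM {table} WHERE {any_match} "
        f"ORDER BY ({match_count}) DESC, ({precedence}) DESC LIMIT 1"
    )


def best_match(conn, params):
    """params: {column: value} in precedence order; returns the best id or None."""
    row = conn.execute(best_match_sql("books", list(params)), params).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, author TEXT, title TEXT, printed TEXT, pages INTEGER)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Latta", "SQL Tricks", "2007-01-01", 100),
        (2, "Latta", "SQL Tricks", "2008-01-01", 250),
        (3, "Smith", "SQL Tricks", "2007-01-01", 100),
    ],
)
# Rows 1 and 2 both match 3 columns; row 1 wins because printed outranks pages.
print(best_match(conn, {"author": "Latta", "title": "SQL Tricks", "printed": "2007-01-01", "pages": 250}))  # 1
```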
select id,
CASE WHEN @pPages = pages
THEN 1 ELSE 0
END
+ Case WHEN @pAuthor=author
THEN 1 ELSE 0
END
/* + Do this for each attribute. If each of your
attributes is just as important as the others,
for example matching author is just as good as matching title, then
leave the values alone; if some matches are more
important, then change the values */ as MatchRank
from books
where author = @pAuthor OR
title = @pTitle OR
date = @pDate
ORDER BY MatchRank DESC
Edited
When I run this query (modified only to fit one of my own tables) it works fine in SQL2005.
I'd recommend a WHERE clause, but you will want to play around with it to see the performance impact. You will need to use OR in it, otherwise you will lose potential matches.
In regards to the Order By clause failing to compile:
In regards to the ORDER BY clause failing to compile:
As recursive said (in a comment), aliases can't be used inside expressions in an ORDER BY clause. To get around this, I used a subquery that returned the rows and then did the ORDER BY in the outer query. This way I am able to use the aliases in the ORDER BY clause. A little slower, but a lot cleaner.
Okay, let me restate my understanding of your question: You want a stored procedure that can take a variable number of parameters and pass back the top row that matches the parameters in the weighted order of preference passed on SQL Server 2005.
Ideally, it will use WHERE clauses to prevent full table scans, take advantage of indexes, and "short circuit" the search: you don't want to search all possible combinations if a match can be found early. Perhaps we can also allow comparators other than =, such as >= for dates, LIKE for strings, etc.
One possible way is to pass the parameters as XML like in this article and use .Net stored procedures but let's keep it plain vanilla T-SQL for now.
This looks to me like a binary search on the parameters: Search all parameters, then drop the last one, then drop the second last one but include the last one, etc.
Let's pass the parameters as a delimited string since stored procedures don't allow for arrays to be passed as parameters. This will allow us to get a variable number of parameters in to our stored procedure without requiring a stored procedure for each variation of parameters.
In order to allow any sort of comparison, we'll pass the entire WHERE clause list, like so: title like '%something%'
Passing multiple parameters means delimiting them in a string. We'll use the tilde ~ character to delimit the parameters, like this: author = 'Chris Latta'~title like '%something%'~pages >= 100
Then it is simply a matter of doing a binary weighted search for the first row that meets our ordered list of parameters (hopefully the stored procedure with comments is self-explanatory but if not, let me know). Note that you are always guaranteed a result (assuming your table has at least one row) as the last search is parameterless.
Here is the stored procedure code:
CREATE PROCEDURE FirstMatch
@SearchParams VARCHAR(2000)
AS
BEGIN
DECLARE @SQLstmt NVARCHAR(2000)
DECLARE @WhereClause NVARCHAR(2000)
DECLARE @OrderByClause NVARCHAR(500)
DECLARE @NumParams INT
DECLARE @Pos INT
DECLARE @BinarySearch INT
DECLARE @Rows INT
-- Create a temporary table to store our parameters
CREATE TABLE #params
(
BitMask int, -- Uniquely identifying bit mask
FieldName VARCHAR(100), -- The field name for use in the ORDER BY clause
WhereClause VARCHAR(100) -- The bit to use in the WHERE clause
)
-- Temporary table identical to our result set (the books table) so intermediate results aren't output
CREATE TABLE #junk
(
id INT,
author VARCHAR(50),
title VARCHAR(50),
printed DATETIME,
pages INT
)
-- I'll use tilde ~ as the delimiter that separates parameters
SET @SearchParams = LTRIM(RTRIM(@SearchParams))+ '~'
SET @Pos = CHARINDEX('~', @SearchParams, 1)
SET @NumParams = 0
-- Populate the #params table with the delimited parameters passed
IF REPLACE(@SearchParams, '~', '') <> ''
BEGIN
WHILE @Pos > 0
BEGIN
SET @NumParams = @NumParams + 1
SET @WhereClause = LTRIM(RTRIM(LEFT(@SearchParams, @Pos - 1)))
IF @WhereClause <> ''
BEGIN
-- This assumes your field names don't have spaces and that you leave a space between the field name and the comparator
INSERT INTO #params (BitMask, FieldName, WhereClause) VALUES (POWER(2, @NumParams - 1), LTRIM(RTRIM(LEFT(@WhereClause, CHARINDEX(' ', @WhereClause, 1) - 1))), @WhereClause)
END
SET @SearchParams = RIGHT(@SearchParams, LEN(@SearchParams) - @Pos)
SET @Pos = CHARINDEX('~', @SearchParams, 1)
END
END
-- Set the binary search to search from all parameters down to one in order of preference
SET @BinarySearch = POWER(2, @NumParams)
SET @Rows = 0
WHILE (@BinarySearch > 0) AND (@Rows = 0)
BEGIN
SET @BinarySearch = @BinarySearch - 1
SET @WhereClause = ' WHERE '
SET @OrderByClause = ' ORDER BY '
SELECT @OrderByClause = @OrderByClause + FieldName + ', ' FROM #params WHERE (@BinarySearch & BitMask) = BitMask ORDER BY BitMask
SET @OrderByClause = LEFT(@OrderByClause, LEN(@OrderByClause) - 1) -- Remove the trailing comma
SELECT @WhereClause = @WhereClause + WhereClause + ' AND ' FROM #params WHERE (@BinarySearch & BitMask) = BitMask ORDER BY BitMask
SET @WhereClause = LEFT(@WhereClause, LEN(@WhereClause) - 4) -- Remove the trailing AND
IF @BinarySearch = 0
BEGIN
-- If nothing found so far, return the top row in the order of the parameters' fields
SET @WhereClause = ''
-- Use the full order sequence of fields to return the results
SET @OrderByClause = ' ORDER BY '
SELECT @OrderByClause = @OrderByClause + FieldName + ', ' FROM #params ORDER BY BitMask
SET @OrderByClause = LEFT(@OrderByClause, LEN(@OrderByClause) - 1) -- Remove the trailing comma
END
-- Find out if there are any results for this search
SET @SQLstmt = 'SELECT TOP 1 id, author, title, printed, pages INTO #junk FROM books' + @WhereClause + @OrderByClause
Exec (@SQLstmt)
SET @Rows = @@ROWCOUNT
END
-- Stop the result set being eaten by the junk table
SET @SQLstmt = REPLACE(@SQLstmt, 'INTO #junk ', '')
-- Uncomment the next line to see the SQL you are producing
--PRINT @SQLstmt
-- This gives the result set
Exec (@SQLstmt)
END
This stored procedure is called like so:
FirstMatch 'author = ''Chris Latta''~pages > 100~title like ''%something%'''
There you have it - a fully expandable, optimised search for the top result in weighted order of preference. This was an interesting problem and shows just what you can pull off with native T-SQL.
A couple of small issues with this:
it relies on the caller to know that they must leave a space after the field name for the parameter to work properly
you can't have field names with spaces in them - fixable with some effort
it assumes that the relevant sort order is always ascending
the next programmer that has to look at this procedure will think you're insane :)
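The search order the procedure walks through can be listed with a few lines of Python, mirroring its bit assignment (parameter i owns bit 2^(i-1)) and its countdown from 2^n - 1 to 0, where 0 means no WHERE clause. This is a model of the loop for checking the order, not a replacement for the procedure, and the function name is hypothetical.

```python
def first_match_searches(params):
    """Yield the WHERE-clause bodies FirstMatch tries, in the order it tries them.

    As in the procedure, parameter i (1-based) owns bit 2**(i-1), and the
    search mask counts down from 2**n - 1 to 0 (0 = no WHERE clause at all).
    """
    n = len(params)
    for mask in range(2 ** n - 1, -1, -1):
        chosen = [p for i, p in enumerate(params) if mask & (1 << i)]
        yield " AND ".join(chosen)


for clause in first_match_searches(["author = 'Chris Latta'", "pages > 100", "title like '%something%'"]):
    print(repr(clause))
```

Listing the masks shows that, with this assignment, the first parameter owns the lowest bit and is therefore the first one dropped; giving the first parameter the highest bit instead would match the "drop the last one" description.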
Try this:
ALTER PROCEDURE match
@pAuthor varchar(100)
,@pTitle varchar(100)
,@pDate varchar(100)
,@pPages varchar(100)
-- exec match 'a title', 'b author', '1/1/2007', 15
AS
SELECT id,
CASE WHEN author = @pAuthor THEN 1 ELSE 0 END
+ CASE WHEN title = @pTitle THEN 1 ELSE 0 END
+ CASE WHEN bookdate = @pDate THEN 1 ELSE 0 END
+ CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS matches,
CASE WHEN author = @pAuthor THEN 4 ELSE 0 END
+ CASE WHEN title = @pTitle THEN 3 ELSE 0 END
+ CASE WHEN bookdate = @pDate THEN 2 ELSE 0 END
+ CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS score
FROM books
WHERE author = @pAuthor
OR title = @pTitle
OR bookdate = @pDate
OR pages = @pPages
ORDER BY matches DESC, score DESC
However, this of course causes a table scan. You can avoid that by making it a union of a CTE and 4 WHERE clauses, one for each property - there will be duplicates, but you can just take the TOP 1 anyway.
EDIT: Added the WHERE ... OR clause. I'd feel more comfortable if it were
SELECT ... FROM books WHERE author = @pAuthor
UNION
SELECT ... FROM books WHERE title = @pTitle
UNION
...