The scenario...
I'm developing a web site in ASP.net and Visual Basic that displays product listings (hereafter referred to as "Posts") generated by the users. In this site, I have a search box that allows the user to find his post more easily. The search box is designed to allow the user to input specific keywords which will hopefully provide a more customized search. For example:
In the example above, the user has specified that he would like to search for books with an author matching the name "John Smith" and also with the tags "Crime" and "Suspense".
Steps:
My SearchBoxManager class retrieves these key names (Author, Tags) and the key values (John Smith, Crime, Suspense).
I run a SQL query using those parameters looking for exact matching.
I extract each word from the key values so I can search them separately.
This is where I am having issues. I run a SQL query using those parameters looking for non-exact matching (e.g., '%John%', '%Smith%'). This step is to create a system of results based on relevancy. This is my first relevancy algorithm so I may be going about it the wrong way.
The problem...
In order for the third step to work properly, I would like to place each set of separated words from the key values into a table-valued parameter, and then use that table-valued parameter in the SqlCommandText surrounded by the wildcard '%'.
In other words, because the number of words in each key value will probably change each time, I need to place them in a table-valued parameter of some kind. I also think I could string them together somehow and just use them directly in the query string, but I stopped mid-way because it was getting a little messy.
The question...
Can I use wildcards with table-valued parameters, and if so how?
Also, if any of you have developed a relevancy ranking algorithm for local search boxes or know of one, I would be ever-grateful if you could point me in the right direction.
Thank you in advance.
so some sample data
create table authors ( id int, name varchar(200) )
insert into authors values (1, 'John Smith')
insert into authors values (2, 'Jack Jones')
insert into authors values (3, 'Charles Johnston')
if you want to work with a table var we can make one and populate with a couple of search words
declare @t table(word varchar(10) )
insert into @t select 'smith' union select 'jones'
now we can select using wildcards
select authors.* from authors, @t words
where authors.name like '%'+words.word+'%'
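The same join can be turned into a rough relevancy score by counting how many search words each row matches. Below is a minimal sketch, using Python's sqlite3 module as a stand-in for SQL Server (SQLite concatenates with || instead of +, and a temp table plays the role of the table variable or table-valued parameter). The rank_authors helper is hypothetical, and any % or _ inside a search word would still act as a wildcard.

```python
import sqlite3

def rank_authors(conn, words):
    """Return (name, hits) rows ordered by how many search words each name contains."""
    # The temp table stands in for a table variable / table-valued parameter.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS words (word TEXT)")
    conn.execute("DELETE FROM words")
    conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
    return conn.execute(
        """
        SELECT a.name, COUNT(*) AS hits
        FROM authors AS a
        JOIN words AS w ON a.name LIKE '%' || w.word || '%'
        GROUP BY a.id, a.name
        ORDER BY hits DESC, a.name
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(1, "John Smith"), (2, "Jack Jones"), (3, "Charles Johnston")],
)

results = rank_authors(conn, ["john", "smith"])
for name, hits in results:
    print(name, hits)
```

Rows matching more words sort first, so an exact "John Smith" outranks a partial hit such as "Charles Johnston".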
Alright.. this is both an interesting solution, and a question of "is there a better way to do this".
I have a database table of addresses, broken into various fields (name, phone, suite, address, etc..)
we want to be able to do a very loose search against the address records, with multiple parameters.
so if someone searches "123 Mac", that's an easy enough match against an address.. but if they search for "David lincoln", where the name is "David something" and the address is "123 lincoln park", that's a much trickier search.
I came across another post where someone used a cross apply for search parameters, which I thought was just nifty, so I fashioned something similar.
I take the user search string, and break it into values (split) on the spaces, and insert that into a temp table in memory.
On the search table, I've created a somewhat constrained view, with a "searchText" column where I've literally mashed all the conceivable search fields into one big text field. I then created an index on this view. (had to force the use of the view/index, which does perform substantially better than the engine attempting to build a plan against the underlying tables)
And finally the query:
create table #searchValues (SString varchar(100) null)
insert into #searchValues (sstring) select '%'+[value]+'%' from dbo.Split(ltrim(rtrim(replace(@searchstring,'%',' '))),' ')
select top 50 addressID, ROW_NUMBER() OVER (ORDER BY a.[3mUsage] desc, searchText) AS RowNumber
from vwAddressSearch a WITH (NOEXPAND)
cross apply #searchValues s
where searchText like s.SString
group by addressID, [3musage], searchText
having count(*) = (select count(*) from #searchValues)
So this works well enough... (I take the output of this query and re-join it back to the main table to pull all the relevant values)
Also note this is an AND logic, not an OR, which is why I'm having to group and compare counts.
But the somewhat fugly part.. that indexed view is still around 900k rows, and the more search terms there are, the more cross apply and text searching there ends up being.. the performance is .. okay.. not great, not horrible.
it actually seems to perform slightly better than a manual select where... searchtext like '%one%' and searchtext like '%two%'.. etc.
Anyway, the question here, the group by and count compare, it works.. but it seems a little ugly to me. is there a better way to do this?
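For comparison, the "match every term" requirement can also be written as a NOT EXISTS test (keep a row only if no search term fails to match it). The sketch below runs both forms side by side in Python's sqlite3 as a stand-in for SQL Server. The addresses table and its sample rows are made up for illustration, and it assumes searchText is never NULL and that at least one search term is supplied, since the two forms differ in those edge cases. Whether it performs better on a 900k-row indexed view is something only testing would show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for vwAddressSearch.
conn.execute("CREATE TABLE addresses (addressID INTEGER PRIMARY KEY, searchText TEXT)")
conn.executemany("INSERT INTO addresses VALUES (?, ?)", [
    (1, "David Jones 123 Lincoln Park"),
    (2, "David Smith 9 Main Street"),
    (3, "Mary Lincoln 44 Oak Avenue"),
])
conn.execute("CREATE TEMP TABLE searchValues (SString TEXT)")
conn.executemany("INSERT INTO searchValues VALUES (?)", [("%david%",), ("%lincoln%",)])

# Form 1: join, group, and compare the match count to the number of terms.
group_count = conn.execute("""
    SELECT a.addressID
    FROM addresses AS a
    JOIN searchValues AS s ON a.searchText LIKE s.SString
    GROUP BY a.addressID
    HAVING COUNT(*) = (SELECT COUNT(*) FROM searchValues)
    ORDER BY a.addressID
""").fetchall()

# Form 2: keep a row only if there is no term that it fails to match.
not_exists = conn.execute("""
    SELECT a.addressID
    FROM addresses AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM searchValues AS s
        WHERE a.searchText NOT LIKE s.SString
    )
    ORDER BY a.addressID
""").fetchall()

print(group_count, not_exists)
```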
I'm looking for advice on how to tackle the issue of different spelling for the same name.
I have a SQL Server database with company names, and there are some companies that are the same but the spelling is different.
For example:
Building Supplies pty
Buidings Supplies pty
Building Supplied l/d
The problem is that there are no clear consistencies in the variation. Sometimes it's an extra 's', other times it's an extra space.
Unfortunately I don't have a lookup list, so I can't use Fuzzy LookUp. I need to create the clean list.
Is there a method that people use to deal with this problem?
p.s I tried searching for this problem but can't seem to find a similar thread
Thanks
You can use SOUNDEX() and DIFFERENCE() for this purpose.
DECLARE @SampleData TABLE(ID INT, BLD VARCHAR(50), SUP VARCHAR(50))
INSERT INTO @SampleData
SELECT 1, 'Building','Supplies'
UNION
SELECT 2, 'Buidings','Supplies'
UNION
SELECT 3, 'Biulding','Supplied'
UNION
SELECT 4, 'Road','Contractor'
UNION
SELECT 5, 'Raod','Consractor'
UNION
SELECT 6, 'Highway','Supplies'
SELECT *, DIFFERENCE('Building', BLD) AS DIF
FROM @SampleData
WHERE DIFFERENCE('Building', BLD) >= 3
Result
ID  BLD       SUP       DIF
1   Building  Supplies  4
2   Buidings  Supplies  3
3   Biulding  Supplied  4
If this serves your purpose you can write an update query to update selected record accordingly.
Aside from the SOUNDEX() DIFFERENCE() option (which is a very good one btw!) you could look into SSIS more.
Provided your data is in English and not exclusively names of people, there is a lot you can do with these components:
Term extraction
Term lookup
Fuzzy grouping
Fuzzy lookup
The main flow would be a tiered structure where you try to find duplicates in increasingly less certain ways. Instead of applying the changes automatically, you send all the names and keys you would need to apply them to a staging area where they can be reviewed and, if needed, applied.
If you go about it really smartly, you can use the reviewed data as a repository for making the package "learn". For example, "iu" is hardly ever valid in English, so if that is found and changing it to "ui" makes a valid English word, you might want to start applying those changes automatically at some point.
One other thing to consider is keeping a list of all validated names and using it to check for duplicates of those names and to prevent unnecessary recursion/load on checking the source data.
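To make the "send candidates to a staging area for review" idea concrete, here is a small sketch in Python. It swaps in the standard library's difflib similarity ratio rather than SOUNDEX/DIFFERENCE or the SSIS fuzzy components, so it illustrates the workflow and not those tools. The candidate_duplicates helper, the 0.8 threshold, and the "Road Contractor" row are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def candidate_duplicates(names, threshold=0.8):
    """Return (name_a, name_b, score) for pairs similar enough to deserve a review."""
    pairs = []
    for a, b in combinations(names, 2):
        # Compare case-insensitively; ratio() is 1.0 for identical strings.
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    # Most similar pairs first, so reviewers see the likeliest duplicates on top.
    return sorted(pairs, key=lambda p: p[2], reverse=True)

names = [
    "Building Supplies pty",
    "Buidings Supplies pty",
    "Building Supplied l/d",
    "Road Contractor",
]

review_queue = candidate_duplicates(names)
for a, b, score in review_queue:
    print(f"{score:.2f}  {a!r} <-> {b!r}")
```

Nothing is changed automatically here; the output is only a queue of pairs for a person to confirm or reject.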
I have a database which can be modified by our users through an interface. For one field (companyID) they should have the ability to place an asterisk in the string as a wildcard character.
For example, they can put in G378* to stand for any companyID starting with G378.
Now on my client program I'm providing a "full" companyID as a parameter:
SELECT * FROM table WHERE companyID = '" + myCompanyID + "'
But I have to check for the wildcard, is there anything I can add to my query to check for this. I'm not sure how to explain it but it's kinda backwards from what I'm used to. Can I modify the value I provide (the full companyID) to match the wildcard value from in the query itself??
I hope this made sense.
Thanks!
EDIT: The user is not using SELECT. The user is only using INSERT or UPDATE and THEY are the ones placing the * in the field. My program is using SELECT and I only have the full companyID (no asterisk).
This is a classic SQL Injection target! You should be glad that you found it now.
Back to your problem, when users enter '*', replace it with '%', and use LIKE instead of = in your query.
For example, when end-users enter "US*123", run this query:
SELECT * FROM table WHERE companyID LIKE @companyIdTemplate
Set the @companyIdTemplate parameter to "US%123", and run the query.
I used .NET's @ in the example, but query parameters are denoted in ways specific to your hosting language. For example, they become ? in Java. Check any DB programming tutorial on use of parameterized queries to find out how it's done in your system.
EDIT : If you would like to perform an insert based on a wildcard that specifies records in another table, you can do an insert-from-select, like this:
INSERT INTO CompanyNotes (CompanyId, Note)
SELECT c.companyId, @NoteText
FROM Company c
WHERE c.companyId LIKE 'G378%'
This will insert a record with the value of the @NoteText parameter into the CompanyNotes table for each company with an ID matching "G378%".
in TSQL I would use replace and like. ie:
select * from table where companyid like replace(mycompanyid,'*','%');
This is somewhat implementation dependent, and you did not mention which type of SQL you are dealing with. However, in MS SQL Server, wildcards include % (for any number of characters) and _ (for a single character). Wildcards are only evaluated as wildcards when used with "like", not with an = comparison. But you can pass in a parameter that includes a wildcard and have it evaluated as a wildcard as long as you are using "like".
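For the case described in the question's edit, where the asterisk lives in the stored column and the program only holds the full companyID, the LIKE can be turned around so the column supplies the pattern. A minimal sketch under those assumptions, using Python's sqlite3 as a stand-in database; the company_rules table, its rows, and the rules_for helper are hypothetical. Note that any % or _ already stored in the column would also behave as a wildcard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: users store patterns such as 'G378*' in companyID.
conn.execute("CREATE TABLE company_rules (companyID TEXT, note TEXT)")
conn.executemany("INSERT INTO company_rules VALUES (?, ?)", [
    ("G378*", "prefix rule"),
    ("G3781234", "exact rule"),
    ("US*123", "infix rule"),
])

def rules_for(conn, full_company_id):
    """Find stored rows whose (possibly wildcarded) companyID matches the full ID."""
    return conn.execute(
        # The full ID is a bound parameter; the stored value becomes the LIKE pattern.
        "SELECT companyID, note FROM company_rules "
        "WHERE ? LIKE REPLACE(companyID, '*', '%') "
        "ORDER BY companyID",
        (full_company_id,),
    ).fetchall()

matches = rules_for(conn, "G3781234")
print(matches)
```

Binding the full ID as a parameter also avoids the string concatenation shown in the question.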
I am trying to generate a query where I want to select columns(text) matching multiple values.
eg: I have two columns, id and description. suppose my first row contains description column with value
Google is website and an awesome search engine
, second row with description column value
Amazon website is an awesome eCommerce store
I have created a query
Select * from table_name where description REGEXP 'Website \| Search'
It returns both rows, i.e. both Google and Amazon, but I want to return only Google, because I want only those rows that contain both words, "website" and "search". The number of words to be matched is also not fixed; the query I am creating is for a search drop-down.
All the words that are passed should be present in the column; the order of the words in the column is not important. If there are better options than regex, please do point them out.
Editing: the number of words that are passed are dynamic and not known, the user may pass additional words to be matched against the column. I would be using the query within a stored Procedure
I really don't think the regex solution is going to be good for you from a performance point of view. I think you should be looking at FULLTEXT searches.
Specifically you need to create a full text index with something like this in the table definition:
create table testTable
(
Id int auto_increment not null primary key,
TextCol varchar(500),
fulltext(TextCol)
);
Then your query gets easier:
select * from testTable where Match(TextCol) against ('web')
AND Match(TextCol) against ('server')
Strongly suggest you read the MySQL docs regarding FULLTEXT matching and there are lots of little tricks and features that will be useful in this task (including more efficient ways to run the query above)
Edit: Perhaps Boolean mode will help you to an easy solution like this:
Match(TextCol) against ('+web +server' in boolean mode)
All you have to do is build the against string, so I think this can be done in an SP without dynamic SQL
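Building that string from a variable number of words is a small job for the calling code. A minimal sketch in Python, assuming the words arrive as a list from the search drop-down; build_boolean_query is a hypothetical helper, and it strips the characters that boolean mode treats as operators so user input cannot change the meaning of the search.

```python
import re

def build_boolean_query(words):
    """Prefix every word with '+' so boolean mode requires all of them."""
    required = []
    # Re-split on whitespace so an entry like "search engine" becomes two terms.
    for word in " ".join(words).split():
        # Drop characters that boolean mode interprets as operators.
        token = re.sub(r'[+\-<>()~*"@]', "", word)
        if token:
            required.append("+" + token)
    return " ".join(required)

query = build_boolean_query(["web", "server"])
print(query)
```

The resulting string would then be passed as a single bound parameter to the AGAINST clause rather than concatenated into the SQL text.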
I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as dynamic WHERE clauses. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. Makes testing convenient. Note, when you create your sql queries in code, don't concatenate in user input. Use your @variables!
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
Name = COALESCE(@name, Name) AND
Nickname = COALESCE(@nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for @name and null for @nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
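COALESCE turns the null into an identity comparison, which is true for every row where the column is not NULL, so it doesn't otherwise affect the where clause. Rows where the column itself is NULL are filtered out, because NULL = NULL does not evaluate to true.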
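The concern about NULL values raised earlier is easy to check. The sketch below runs the COALESCE pattern in Python's sqlite3 (a stand-in for SQL Server, with :name-style parameters instead of @name) next to an "IS NULL OR" variant that keeps rows whose column is NULL. The sample rows and the search helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Name TEXT, Nickname TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?)", [
    ("brian", "bri"),
    ("brian", None),   # a row with a NULL nickname
    ("sarah", "sal"),
])

def search(conn, sql, name=None, nick=None):
    """Run an optional-filter query; None means 'do not filter on this field'."""
    return conn.execute(sql, {"name": name, "nick": nick}).fetchall()

COALESCE_SQL = """
    SELECT Name, Nickname FROM Users
    WHERE Name = COALESCE(:name, Name) AND Nickname = COALESCE(:nick, Nickname)
    ORDER BY Name, Nickname
"""

# Variant that skips the comparison entirely when the parameter is NULL.
NULL_SAFE_SQL = """
    SELECT Name, Nickname FROM Users
    WHERE (:name IS NULL OR Name = :name) AND (:nick IS NULL OR Nickname = :nick)
    ORDER BY Name, Nickname
"""

coalesce_rows = search(conn, COALESCE_SQL, name="brian")
null_safe_rows = search(conn, NULL_SAFE_SQL, name="brian")
print(coalesce_rows)
print(null_safe_rows)
```

With the COALESCE form, the brian row whose nickname is NULL drops out; the second form returns both brian rows.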
Search and normalization can be at odds with each other. So probably the first thing would be to get some kind of "view" that shows all the fields that can be searched as a single row, with a single key getting you the resume. Then you can throw something like Lucene in front of that to give you a full-text index of those rows. The way that works is, you ask it for "x" in this view and it returns the key to you. It's a great solution and comes recommended by Joel himself on the podcast within the first 2 months, IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example, let's imagine a resume that will be composed of several fields:
Name,
Address,
Education (this could be a table on its own) or
Work experience (this could grow to its own table where each row represents a previous job)
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINS.
Instead you could change your frame of reference and think of the whole resume as what it is: a single document, and you just want to search said document.
This is what tools like Sphinx Search do. They create a FULL TEXT index of your 'document', and then you can query Sphinx and it will tell you where in the database that record was found.
Really good search results.
Don't worry about these tools not being part of your RDBMS; it will save you a lot of headaches to use the appropriate model ("documents") instead of the incorrect one ("tables") for this application.
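To show what the "document" model means in practice, here is a toy inverted index in Python. It is only an illustration of the idea, not how Sphinx or Lucene are implemented or called; the resume texts, keys, and the build_index and search helpers are all made up. Each resume's fields are flattened into one string, and a query returns the keys of the resumes containing every query word.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, text in documents.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(key)
    return index

def search(index, query):
    """Return the keys of documents containing every word in the query."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Each value is one resume with name, address, education and work history flattened.
resumes = {
    101: "Jane Doe 12 Elm Street BSc Computer Science Database developer at Acme",
    102: "John Roe 5 Oak Road MSc Physics Research assistant",
}

index = build_index(resumes)
print(search(index, "database developer"))
print(search(index, "oak physics"))
```

The returned keys are then used to load the full records from the relational tables.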