Dynamic multiple LIKE AND query search against a collection of text fields - SQL

Alright... this is both an interesting solution and a question of "is there a better way to do this?"
I have a database table of addresses, broken into various fields (name, phone, suite, address, etc.).
We want to be able to do a very loose search against the address records, with multiple parameters.
So if someone searches "123 Mac", that's an easy enough match against an address. But if they search for "David lincoln", where the name is "David something" and the address is "123 lincoln park", that's a much trickier search.
I came across another post where someone used a CROSS APPLY for search parameters, which I thought was just nifty, so I fashioned something similar.
I take the user's search string, break it into values (split) on the spaces, and insert those into a temp table in memory.
On the search table, I've created a somewhat constrained view with a "searchText" column, where I've literally mashed all the conceivable search fields into one big text field. I then created an index on this view. (I had to force the use of the view/index, which performs substantially better than the engine attempting to build a plan against the underlying tables.)
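For context, a minimal sketch of what such an indexed view might look like, assuming SQL Server; the underlying table and its columns are illustrative, not the poster's actual schema:
create view dbo.vwAddressSearch
with schemabinding
as
select addressID,
       [Usage],
       isnull(name, '') + ' ' + isnull(phone, '') + ' '
           + isnull(suite, '') + ' ' + isnull(address, '') as searchText
from dbo.Addresses
go

-- Indexed views require schemabinding plus a unique clustered index.
create unique clustered index IX_vwAddressSearch
    on dbo.vwAddressSearch (addressID)
go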
And finally the query:
create table #searchValues (SString varchar(100) null)

insert into #searchValues (SString)
select '%' + [value] + '%'
from dbo.Split(ltrim(rtrim(replace(@searchstring, '%', ' '))), ' ')

select top 50 addressID,
       ROW_NUMBER() OVER (ORDER BY a.[Usage] desc, searchText) AS RowNumber
from vwAddressSearch a WITH (NOEXPAND)
cross apply #searchValues s
where searchText like s.SString
group by addressID, [Usage], searchText
having count(*) = (select count(*) from #searchValues)
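dbo.Split isn't shown above. A minimal sketch, assuming SQL Server 2016+ where STRING_SPLIT is available (earlier versions need a hand-rolled splitter):
create function dbo.Split (@s varchar(8000), @sep char(1))
returns table
as
return (
    select [value]
    from string_split(@s, @sep)
    where [value] <> ''  -- drop empty tokens produced by repeated spaces
)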
So this works well enough. (I take the output of this query and re-join it back to the main table to pull all the relevant values.)
Also note this is AND logic, not OR, which is why I'm having to group and compare counts.
But the somewhat fugly part: that indexed view is still around 900k rows, and the more search terms there are, the more cross-applying and text searching there ends up being. The performance is okay; not great, not horrible.
It actually seems to perform slightly better than a manual select where... searchText like '%one%' and searchText like '%two%', etc.
Anyway, the question: the group by and count compare works, but it seems a little ugly to me. Is there a better way to do this?
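One commonly suggested alternative to the GROUP BY / COUNT comparison is classic relational division via NOT EXISTS. A hedged sketch against the same view and temp table:
select top 50 a.addressID
from vwAddressSearch a with (noexpand)
where not exists (select 1
                  from #searchValues s
                  where a.searchText not like s.SString)  -- no pattern fails to match
order by a.[Usage] desc, a.searchText
This reads as "there is no search pattern this row fails to match", which expresses the AND semantics without grouping; whether it performs better depends on the plan the engine chooses.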

Regexp search SQL query fields

I have a repository of SQL queries, and I want to understand which queries use certain tables or fields.
Let's say I want to find which queries use the email field; how can I write that?
Example SQL query:
select
    users.email as email_user
    ,users.email as email_user_too
    ,email as email_user_too_2,
    email as email_user_too_3,
    back_email as wrong_email -- wrong field
from users
So, to state the problem more accurately: you are sorting through a list of SQL queries [as text], and you need to find the queries that use certain fields, using SQL and RegEx (regular expressions) in PostgreSQL. (Please tag the question so that Stack Overflow indexes it correctly; more importantly, readers get more context about the question.)
PostgreSQL has regular expression support out of the box, so we can skip exploring other ways to do this. (If you are reading this as a Microsoft SQL Server person, I strongly suggest you have a read of the brilliant article on Microsoft's website on defining a table-valued UDF.)
The simplest way I could think of to approach your problem is to throw away what we don't want from the query text first, and then filter what's left.
This way, after throwing away the stuff you don't need, you are left with a set of "tokens" that you can easily filter. I'm putting "token" in quotes since we are not really parsing the SQL language, but if we did, that would be the first step: to extract tokens. (:
Take this query for example:
With Queries (
    Id
    , QueryText
) As (
    values (1, 'select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users')
)
Select QueryText
    , found
From (
    Select Id
        , QueryText
        , regexp_split_to_table(QueryText, '(--[\s\w]+|select|from|as|where|[ \s\n,])') As found
    From Queries
) As Result
Where found != ''
    And found = 'back_email'
I have mocked the concept of a "query repository" with a WITH clause, for ease of writing the example.
I have also selected a few words/characters to split QueryText on, like select, where, etc. We don't need these in our 'found' set.
And in the end, as you can see above, I simply treated found as what's left and filtered it by the field name you are looking for (assuming that you know the field you are looking for).
You could improve upon the RegEx I wrote, or change the method as you wish. But I think the general concept addresses what you need to achieve. One problem I can see with my solution right off the bat is that you can search for anything, not just names of selected fields; which begs the question, why use RegEx and not LIKE statements? But again, as I mentioned, you can improve upon the RegEx and address the specific requirements you may have. Using LIKE might limit you in that direction. (In other words, only you know what's good for you. I can't say from here.)
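As one hedged improvement along those lines, PostgreSQL's \m and \M word-boundary escapes can match the field name directly, without tokenizing first (assuming whole-word matches are all you need):
Select Id, QueryText
From Queries
Where QueryText ~* '\mback_email\M'  -- ~* is a case-insensitive regex match; \m/\M anchor word boundaries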
You can play with the query online here: db-fiddle query and use https://regex101.com/ for testing your RegEx.
Disclaimer: I'm not a PostgreSQL developer. There must be other, perhaps better, ways of doing this. (:

SQL SELECT... WHERE with REPLACE - worried that it's inefficient

In SQL Server 2005, I have a Product search that looks like:
select ProductID, Name, Email
from Product
where Name = @Name
I've been asked to ignore a couple "special" characters in Product.Name, so that a search for "Potatoes" returns "Po-ta-toes" as well as "Potatoes". My first thought is to just do this:
select ProductID, Name, Email
from Product
where REPLACE(Name, '-', '') = @Name
...but on second thought, I wonder if I'm killing performance by running a function on EVERY candidate result. Does SQL Server have some optimization magic that helps it do this kind of thing quickly? Can you think of anything easier I might try, given the requirements I have?
More standards-based: you could add a new column, e.g. searchable_name, precalculate the results of the REPLACE (and any other tweaks, e.g. SOUNDEX) on INSERT/UPDATE, store them in the new column, and then search against that column (a sketch follows this list).
Less standards-based: many RDBMSs let you create an INDEX using a function; this is often called a functional index. Your situation seems fairly well suited to such a feature.
Most powerful/flexible: use a dedicated search tool such as Lucene. It might seem overkill for this situation, but such tools were designed for searching, and most offer sophisticated stemming algorithms that would almost certainly solve this problem.
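A minimal sketch of the precalculated-column option, assuming SQL Server (2005+); the column and index names are illustrative:
alter table Product
    add SearchableName as replace(Name, '-', '') persisted  -- computed once per row, not once per search

create index IX_Product_SearchableName
    on Product (SearchableName)

select ProductID, Name, Email
from Product
where SearchableName = @Name
A computed column saves you from maintaining the value in every INSERT/UPDATE path by hand.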
You will likely get better performance if you are willing to force the first character to match, like this:
select ProductID, Name, Email
from Product
where REPLACE(Name, '-', '') = @Name
  and Name like Left(@Name, 1) + '%'
If the Name column is indexed, you will likely get an index seek instead of a scan. The downside is that you will not return rows where the value is "-po-ta-to-es", because the first character does not match.
Can you add a field to your Product table with a searchable version of the product name, with the special characters already removed? Then you do the REPLACE only once per record and can run efficient searches against the new field.

Sort with one option forced to top of list

I have a PHP application that displays a list of options to a user. The list is generated from a simple query against SQL 2000. What I would like to do is have a specific option at the top of the list, and then have the remaining options sorted alphabetically.
For example, here's the options if sorted alphabetically:
Calgary
Edmonton
Halifax
Montreal
Toronto
What I would like the list to be is more like this:
**Montreal**
Calgary
Edmonton
Halifax
Toronto
Is there a way that I can do this using a single query? Or am I stuck running the query twice and appending the results?
SELECT name
FROM locations
ORDER BY CASE
             WHEN name = 'Montreal' THEN 0
             ELSE 1
         END, name
SELECT name FROM options ORDER BY name <> "Montreal", name;
Note: This works with MySQL, not SQL 2000 like the OP requested.
create table Places (
    Name varchar(30),
    Priority bit
)

select Name
from Places
order by Priority desc, Name
I had a similar problem on a website I built, full of case reports. I wanted the case reports where the victim's name is known to sort to the top, because they are more compelling; conversely, I wanted all the John Doe cases at the bottom. Since this also involved people's names, I had the firstname/lastname sorting problem as well. I didn't want to split it into two name fields, because some cases aren't people at all.
My solution:
I have a "Name" field, which is what is displayed. I also have a "NameSorted" field that is used in all queries but never displayed. My input UI takes care of converting a "LAST, FIRST" entry in the sorting field into the display version automatically.
Finally, to "rig" the sorting, I simply put appropriate characters at the beginning of the sort field. Since I want stuff to come out at the end, I put "zzz" at the beginning; to sort at the top, you could put "!" at the beginning. Again, your editing UI can take care of this for you.
Yes, I admit it's a bit cheesy, but it works. One advantage for me: I have more complex queries with joins in different places to generate pages versus RSS, etc., and I don't have to keep remembering a complex expression to get the sorting right; it's always just a sort by the "NameSorted" field.
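A hedged illustration of that prefix trick, with table and column names assumed:
SELECT Name
FROM CaseReports
ORDER BY NameSorted
-- NameSorted values like '!Smith, Jane' pin a row to the top,
-- while 'zzzDoe, John' sinks it to the bottom.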
I ended up with this:
SELECT locations.name
FROM locations
LEFT JOIN (VALUES ('Toronto', 1), ('Montreal', 2)) AS city (name, rank)
    ON locations.name = city.name
ORDER BY COALESCE(city.rank, 2147483647), locations.name;  -- COALESCE keeps unranked names after the pinned ones, whatever the engine's NULL ordering
Which may be overkill for this example but can be extended for more complex needs.

How do I perform a simple one-statement SQL search across tables?

Suppose that two tables exist: users and groups.
How does one provide "simple search" in which a user enters text and results contain both users and groups whose names contain the text?
The result of the search must distinguish between the two types.
The trick is to combine a UNION with a literal string to determine the type of 'object' returned. In most (?) cases, UNION ALL will be more efficient, and should be used unless duplicates are required in the sub-queries. The following pattern should suffice:
SELECT "group" type, name
FROM groups
WHERE name LIKE "%$text%"
UNION ALL
SELECT "user" type, name
FROM users
WHERE name LIKE "%$text%"
NOTE: I've added the answer myself, because I came across this problem yesterday, couldn't find a good solution, and used this method. If someone has a better approach, please feel free to add it.
If you use "UNION ALL" then the db doesn't try to remove duplicates - you won't have duplicates between the two queries anyway (since the first column is different), so UNION ALL will be faster.
(I assume that you don't have duplicates inside each query that you want to remove)
Using LIKE will cause a number of problems, as it requires a table scan whenever the LIKE pattern starts with a %. This forces SQL to check every single row and work its way, byte by byte, through the string you are comparing against. While this may be fine when you start, it quickly causes scaling issues.
A better way to handle this is to use Full Text Search. While this is a more complex option, it will give you better results on very large databases. You can then use a working version of the example Bobby Jack gave you to UNION ALL your two result sets together and display the results.
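A hedged sketch of the full-text route, assuming MySQL (5.6+ for InnoDB full-text indexes); the index names are illustrative, and $text should of course be bound as a parameter rather than interpolated:
ALTER TABLE groups ADD FULLTEXT INDEX ft_groups_name (name);
ALTER TABLE users ADD FULLTEXT INDEX ft_users_name (name);

-- Note: MATCH matches whole words, not arbitrary substrings like LIKE '%...%'
SELECT 'group' AS type, name FROM groups WHERE MATCH(name) AGAINST ('$text')
UNION ALL
SELECT 'user' AS type, name FROM users WHERE MATCH(name) AGAINST ('$text');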
I would suggest another addition
SELECT "group" type, name
FROM groups
WHERE UPPER(name) LIKE UPPER("%$text%")
UNION ALL
SELECT "user" type, name
FROM users
WHERE UPPER(name) LIKE UPPER("%$text%")
You could convert $text to upper case first, or just do it in the query. This way you get a case-insensitive search.

Need Pattern for dynamic search of multiple sql tables

I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual SQL query gets created dynamically, depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help thinking there must be a more elegant solution, since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the SQL in code. It seems godawful. If I really try to support the requested ability to query any combination of any field in any table, this is going to be one MASSIVE set of if statements. *shiver*
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as a dynamic WHERE clause. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. It makes testing convenient. Note: when you create your SQL queries in code, don't concatenate user input into them. Use your @variables!
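A minimal sketch of that code-generated approach, assuming SQL Server 2008+ and sp_executesql; the table, columns, and parameters are illustrative:
declare @sql nvarchar(max) = N'select Name, Nickname from Users where 1 = 1'

-- Append a predicate only for the criteria the caller actually supplied.
if @name is not null set @sql += N' and Name = @name'
if @nick is not null set @sql += N' and Nickname = @nick'

-- Values travel as parameters; nothing is concatenated into the SQL text.
exec sp_executesql @sql,
    N'@name nvarchar(20), @nick nvarchar(10)',
    @name = @name, @nick = @nick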
The only alternative is to use the COALESCE function. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE Name = COALESCE(@name, Name)
  AND Nickname = COALESCE(@nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in 'brian' for @name and null for @nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE Name = 'brian'
  AND Nickname = Nickname
The COALESCE turns the null parameter into an identity comparison (Nickname = Nickname), which is true for every non-NULL value, so it doesn't restrict the WHERE clause. One caveat: NULL = NULL evaluates to unknown, so rows where the column itself is NULL are still filtered out, which matters if your data is full of NULLs.
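If the columns themselves can contain NULLs, as the asker notes, a hedged NULL-safe variant replaces the COALESCE trick with explicit OR checks:
SELECT Name, Nickname
FROM Users
WHERE (@name IS NULL OR Name = @name)       -- skip the criterion entirely when not supplied
  AND (@nick IS NULL OR Nickname = @nick)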
Search and normalization can be at odds with each other. So probably the first thing to do is get some kind of "view" that shows all the searchable fields as a single row, with a single key that gets you the resume. Then you can throw something like Lucene in front of that to give you a full-text index of those rows. The way that works: you ask it for "x" in this view, and it returns the key. It's a great solution, and it was recommended by Joel himself on the podcast within the first two months, IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example, let's imagine a resume composed of several fields:
Name,
Address,
Education (this could be a table of its own), or
Work experience (this could grow into its own table where each row represents a previous job).
So searching for a word across all those fields with WHERE rapidly becomes a very long query with several JOINs.
Instead, you could change your frame of reference and think of the whole resume as what it is: a single document, and you just want to search that document.
This is what tools like Sphinx Search do. They create a FULL TEXT index of your 'document', and then you can query Sphinx and it will tell you where in the database that record was found.
Really good search results.
Don't worry about these tools not being part of your RDBMS; it will save you a lot of headaches to use the appropriate model ("documents") instead of the incorrect one ("tables") for this application.