Performing an ASP.NET database search with StartsWith keyword - sql

I have a question. I am developing a business directory site whose main function is its search engine.
At present I have a basic search SQL query that searches the database using the LIKE keyword.
My client has asked me to change the search function so that, instead of LIKE, it behaves more like a "StartsWith" match.
To clarify this need here is an example.
If somebody types "plu" for plumbers in the textbox it currently returns, e.g.,
CENTRE STATE PLUMBING & ROOFING
PLUMBING UNLIMITED
The client only wants "PLUMBING UNLIMITED" returned, because it starts with "plu", and not "CENTRE STATE PLUMBING & ROOFING", which merely contains "plu".
I know this is an unusual and maybe silly request, but does anyone have example SQL code to point me in the right direction?
Any help would be greatly appreciated, thanks...

How about this:
SELECT * FROM MyTable WHERE MyColumn LIKE 'PLU%'
Please note that the % sign is only on the right side of the string, so only values that start with PLU match.
(Example in MS SQL.)

Instead of:
select * from professions where name like '%plu%'
use a WHERE clause without the leading %:
select * from professions where name like 'plu%'
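To see the difference end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; in ASP.NET you would pass the same pattern through a SqlParameter). The professions table comes from the answer above; escape_like and starts_with_search are hypothetical helper names. The sketch also escapes % and _ in the user's input so they match literally, using the standard ESCAPE clause, which SQL Server supports as well.

```python
import sqlite3

def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE's wildcards so user input is matched literally."""
    return (term.replace(escape, escape * 2)
                .replace("%", escape + "%")
                .replace("_", escape + "_"))

def starts_with_search(conn: sqlite3.Connection, term: str) -> list:
    # Wildcard only on the right: 'plu%' means "starts with plu".
    pattern = escape_like(term) + "%"
    # The pattern is bound as a parameter, never concatenated into the SQL text.
    cur = conn.execute(
        "SELECT name FROM professions WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    )
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE professions (name TEXT)")
conn.executemany(
    "INSERT INTO professions VALUES (?)",
    [("CENTRE STATE PLUMBING & ROOFING",), ("PLUMBING UNLIMITED",),
     ("100% PLUMBING",), ("1000 PIPES",)],
)

print(starts_with_search(conn, "plu"))   # ['PLUMBING UNLIMITED']
print(starts_with_search(conn, "100%"))  # ['100% PLUMBING']
```

SQLite's LIKE is case-insensitive for ASCII letters by default; in SQL Server, case sensitivity depends on the column's collation.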

LIKE won't give you the performance you need for a really effective search. Look into something like Lucene or your database engine's full-text search equivalent.

Related

Regexp search SQL query fields

I have a repository of SQL queries and I want to understand which queries use certain tables or fields.
Let's say I want to find which queries use the email field. How can I write that?
Example SQL query:
select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users
So, to state the problem more precisely: you are sorting through a list of SQL queries (as text), and you need to find the queries that use certain fields, using SQL and regular expressions in PostgreSQL. (Please tag the question so that Stack Overflow indexes it correctly and, more importantly, so readers have more context.)
PostgreSQL supports regular expressions out of the box, so we can skip exploring other ways to do this. (If you are reading this as a Microsoft SQL Server person, I strongly suggest reading this excellent article on Microsoft's website about defining a table-valued UDF (user-defined function).)
The simplest approach I can think of is to throw away what we don't want from the query text first, and then filter what's left.
After throwing away the parts you don't need, you are left with a set of "tokens" that you can easily filter. I put token in quotes because we are not really parsing the SQL language; but if we were, extracting tokens would be the first step. (:
Take this query for example:
With Queries (
Id
, QueryText
) As (
values (1, 'select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users')
)
Select QueryText
, found
From (
Select Id
, QueryText
, regexp_split_to_table (QueryText, '(--[^\n]*|\mselect\M|\mfrom\M|\mas\M|\mwhere\M|[\s,])', 'i') As found
From Queries
) As Result
Where found != ''
And found = 'back_email'
I have simulated the "query repository" with a WITH clause (a CTE) so the example is self-contained.
I have also picked a few keywords and characters to split QueryText on, such as select, from, as, where, whitespace, commas and -- comments. We don't need these in our found set.
Finally, as you can see above, I simply treat found as what's left and filter it by the field name you are looking for (assuming you know which field that is).
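The same split-then-filter idea can also be tried outside the database. The following is a rough sketch in Python rather than PostgreSQL, assuming the query texts are available as strings; tokens and uses_field are hypothetical helper names. It matches keywords only as whole words and counts a table-qualified name such as users.email as a use of email.

```python
import re

# Throw away line comments, a few keywords (as whole words only),
# whitespace and commas; whatever survives is treated as a "token".
_SEPARATORS = re.compile(r"--[^\n]*|\b(?:select|from|as|where)\b|[\s,]+",
                         re.IGNORECASE)

def tokens(query_text: str) -> list:
    return [t for t in _SEPARATORS.split(query_text) if t]

def uses_field(query_text: str, field: str) -> bool:
    # Count a bare name (email) or a table-qualified one (users.email).
    return any(t == field or t.endswith("." + field) for t in tokens(query_text))

query = """select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users"""

print(tokens(query))
print(uses_field(query, "email"))                           # True
print(uses_field("select back_email from users", "email"))  # False
```

As with the SQL version, this is not a real SQL parser: table names and aliases survive as tokens too, so a search for users would also match.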
You could improve on my regex, or change the method as you wish. But I think the general concept addresses what you need to achieve. One problem I can see with my solution right off the bat is that you can search for anything, not just the names of selected fields, which raises the question: why use regex rather than LIKE? But again, you can refine the regex to address specific requirements you may have, whereas LIKE might limit you in that direction. (In other words, only you know what's best for your case; I can't judge that from here.)
You can play with the query online on db-fiddle, and use https://regex101.com/ to test your regex.
Disclaimer: I'm not a PostgreSQL developer. There may be other, perhaps better, ways of doing this. (:

Is there an alternation for wildcards using LIKE in T-SQL?

My query looks for datasets containing a particular label, let's say:
SELECT * FROM Authors
WHERE Title LIKE @pattern
where @pattern is defined by the user. So %abc% would match abcd, 0abc, etc. Sometimes there are labels like
Xabc-ONE
blaYabc-TWO-sometext
Zabc-THREE
blubXabc-FOUR
and I'm looking for labels containing abc and ONE or TWO, something like %abc%(ONE|TWO)%. Is it possible?
You can add support for regular expressions to SQL Server via a CLR function, as shown in this answer, but it may not be possible for you in your environment. Check with your friendly sysadmin!
Maybe I don't understand your question correctly, but why not simply
SELECT * FROM Authors
WHERE Title LIKE '%abc%ONE%' OR Title LIKE '%abc%TWO%'
?
LIKE is just a search with wildcards, nothing more, so there's actually no other way of doing what you want with LIKE. If you need more, have a look at regular expressions, but be aware that they're slower than LIKE and, in your case, not necessary.
UPDATE:
From the comments ("don't want to care for what the user wants to match"), simply do it like this:
SELECT * FROM Authors
WHERE Title LIKE CONCAT('%', @userInput, '%ONE%') OR Title LIKE CONCAT('%', @userInput, '%TWO%')
(CONCAT requires SQL Server 2012 or later.)
Or am I still misunderstanding you?
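If the list of alternatives comes from the user, the OR'ed LIKE conditions can be generated rather than hand-written. A minimal sketch in Python with sqlite3 standing in for SQL Server (an assumption); the Authors table and Title column come from the question, and search_labels is a hypothetical helper. Only the placeholders are built dynamically; the values themselves are passed as parameters.

```python
import sqlite3

def search_labels(conn, stem, alternatives):
    """Emulate %stem%(ALT1|ALT2|...)% with one LIKE per alternative, OR'ed together.
    stem and alternatives are not escaped, so any % or _ in them act as wildcards."""
    if not alternatives:
        return []
    where = " OR ".join("Title LIKE ?" for _ in alternatives)
    params = [f"%{stem}%{alt}%" for alt in alternatives]
    cur = conn.execute(f"SELECT Title FROM Authors WHERE {where}", params)
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Authors (Title TEXT)")
conn.executemany("INSERT INTO Authors VALUES (?)",
                 [("Xabc-ONE",), ("blaYabc-TWO-sometext",),
                  ("Zabc-THREE",), ("blubXabc-FOUR",)])

print(sorted(search_labels(conn, "abc", ["ONE", "TWO"])))
# ['Xabc-ONE', 'blaYabc-TWO-sometext']
```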
If you are using SQL Server, you can enable the full-text search engine and use the CONTAINS predicate with NEAR to find abc near ONE.
In the query it would look like this (pseudo-code):
Select something from table where CONTAINS(column_name, 'abc NEAR ONE')
http://msdn.microsoft.com/en-us/library/ms142568.aspx

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and get back related records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
This copes fine with returning the correct set of records.
How would we cope, however, if a user misspells dinosaurs? E.g.:
Dinosores
(Poor sore dino.) How can we search while allowing for spelling errors? We could associate common misspellings we see in searches with the correct spelling, and then search on the original terms plus the corrected term, but that is time-consuming to maintain.
Is there any way to do this programmatically?
Edit
It appears SOUNDEX could help, but can anyone give me an example using SOUNDEX where entering the search term:
Dinosores wrocks
returns records, instead of doing:
WHERE articleBody LIKE '%Dinosores%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526).
You can also use the DIFFERENCE function (see the same link as for SOUNDEX), which compares the SOUNDEX values of two strings on a scale from 0 (least similar) to 4 (most similar).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
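For intuition about why both spellings produce D526, here is a sketch of the classic American Soundex algorithm in Python. SQL Server's SOUNDEX follows the same general scheme, but implementations differ in small details (for example, how H and W between consonants are handled), so treat this as an illustration rather than a reimplementation. soundex, difference and matches_any are hypothetical names; difference here is a simplified position-by-position comparison of the two codes, not necessarily SQL Server's exact DIFFERENCE logic. matches_any shows one way to handle the multi-word "Dinosores wrocks" case: a record matches if any search word sounds like any word in it.

```python
import re

# Consonant groups and their Soundex digits; vowels, H, W and Y get no digit.
_CODES = {ch: digit
          for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6")]
          for ch in letters}

def soundex(word: str) -> str:
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()           # keep the first letter as-is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:     # skip repeats of the same digit
            code += digit
            if len(code) == 4:
                break
        if ch not in "hw":              # H and W do not separate equal digits
            prev = digit
    return (code + "000")[:4]           # pad to letter + 3 digits

def difference(a: str, b: str) -> int:
    """0 (least similar) to 4 (most similar): matching positions in the codes."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))

def matches_any(text: str, search: str) -> bool:
    """True if any search word sounds like any word in text."""
    text_codes = {soundex(w) for w in re.findall(r"[A-Za-z]+", text)}
    return any(soundex(w) in text_codes for w in re.findall(r"[A-Za-z]+", search))

print(soundex("Dinosaurs"), soundex("Dinosores"))   # D526 D526
print(difference("Dinosaurs", "Dinosores"))          # 4
print(matches_any("Hello my name is Tom I like dinosaurs to talk about SQL.",
                  "Dinosores wrocks"))               # True
```

In a database you would typically store precomputed codes per word rather than computing them for every row at query time.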
Edit:
After hunting around a bit for a multi-word option, it seems that this isn't all that easy. I would refer you to the link in the Fuzzy Logic answer provided by @Neil Knight (+1 to that, for me!).
This Stack Overflow thread also details possible sources of fuzzy-logic implementations in T-SQL. One respondent also suggested full-text indexing as something you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer: there is nothing built into most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that sound alike, and so corrects for phonetic misspellings; because it ignores vowels (and Y), "Dinosars" (missing the U) and "Dinosayrs" still code to D526. But if the user truly "fat-fingered" a consonant or the first letter, e.g. "Dinpsaurs" (D512) or "Sinosaurs" (S526), SoundEx would not return a match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
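The dictionary-lookup step described above can be sketched with the optimal string alignment variant of edit distance, which counts letter substitutions, insertions, deletions and adjacent transpositions. This is a minimal illustration in Python, assuming the list of known good terms is already in memory; did_you_mean and its max_distance threshold are hypothetical choices, and a real implementation would need some kind of index to avoid comparing against every term.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    a, b = a.lower(), b.lower()
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def did_you_mean(word, dictionary, max_distance=2):
    """Return the closest known term, or None if nothing is close enough."""
    best, best_dist = None, max_distance + 1
    for term in dictionary:
        dist = osa_distance(word, term)
        if dist < best_dist:
            best, best_dist = term, dist
    return best

terms = ["dinosaurs", "sql", "talk"]
print(did_you_mean("Dinosars", terms))   # dinosaurs
print(did_you_mean("Dinosayrs", terms))  # dinosaurs
print(did_you_mean("xyz", terms))        # None
```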
You can also try SUBSTRING() to compare only the first three or so characters of each name. Below is an example of how that can be achieved (substr and || as in Oracle, PostgreSQL or SQLite; in SQL Server use SUBSTRING and +):
SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3) = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;
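Here is that query run against SQLite (which supports substr and ||) from Python, as a quick sketch; the table contents are made up purely for illustration. Comparing only the first three letters of each name catches typos later in a name, but not typos within the first three letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Fname TEXT, Lname TEXT)")
conn.execute("CREATE TABLE Table2 (Fname TEXT, Lname TEXT)")
# Illustrative rows only: 'Jonathon Smithe' is a misspelled 'Jonathan Smith'.
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [("Jonathon", "Smithe"), ("Mary", "Jones")])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                 [("Jonathan", "Smith")])

# Both sides reduce to a 6-character key: 'JonSmi' = 'JonSmi'.
rows = conn.execute("""
    SELECT Table1.Fname, Table1.Lname
    FROM Table1, Table2
    WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3)
        = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
    ORDER BY Table1.Fname
""").fetchall()
print(rows)  # [('Jonathon', 'Smithe')]
```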

What is the best way to do a wildcard search in sql server 2005?

So I have a stored procedure that accepts a product code like 1234567890. I want to offer a wildcard search option for those products (e.g. 123456*) that returns all the products that match. What is the best way to do this?
I have in the past used something like this:
SELECT @product_code = REPLACE(@product_code, '*', '%')
and then done a LIKE search on the product_code field, but I feel like it can be improved.
What you're doing already is about the best you can do.
One optimization you might try is to ensure there's an index on the columns you're allowing this on. With a leading wildcard (e.g. '%3456%') SQL Server still has to scan, but it can scan the narrower index rather than the full table; with only a trailing wildcard (e.g. '123456%') it can seek on the index.
As always, checking the query plan before and after any changes is a great idea.
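One refinement to the REPLACE approach in the question is to escape LIKE's own wildcards before turning * into %, so that a literal % or _ typed by the user is not treated as a wildcard. A sketch in Python with sqlite3 standing in for SQL Server (an assumption); to_like_pattern and find_products are hypothetical names. In T-SQL the query would use LIKE @pattern ESCAPE '\'. SQL Server also treats [ as a metacharacter in LIKE, which this sketch does not handle.

```python
import sqlite3

def to_like_pattern(user_input: str, escape: str = "\\") -> str:
    """Turn a '*' wildcard search into a LIKE pattern, escaping %, _ and the
    escape character itself so they match literally."""
    escaped = (user_input.replace(escape, escape * 2)
                         .replace("%", escape + "%")
                         .replace("_", escape + "_"))
    return escaped.replace("*", "%")

def find_products(conn, user_input):
    cur = conn.execute(
        "SELECT product_code FROM products WHERE product_code LIKE ? ESCAPE '\\' "
        "ORDER BY product_code",
        (to_like_pattern(user_input),),
    )
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_code TEXT)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [("1234567890",), ("1234560001",), ("1299999999",), ("12_4567890",)])

print(to_like_pattern("123456*"))       # 123456%
print(find_products(conn, "123456*"))   # ['1234560001', '1234567890']
print(find_products(conn, "12_*"))      # ['12_4567890']
```

Without the escaping step, "12_*" would become 12_% and match all four codes, because _ matches any single character.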
A couple of random ideas
It depends, but you might like to consider:
Always look for a substring by default. e.g. if the user enters "1234", you search for:
WHERE product LIKE '%1234%'
Allow users full control, i.e. simply take their input and pass it to the LIKE clause. This means that they can come up with their own custom searches. This will only be useful if your users are interested in learning the wildcard syntax.
WHERE product LIKE @input

Best way to implement a stored procedure with full text search

I would like to run a search with the SQL Server full-text engine where, given the following user input:
"Hollywood square"
I want the results to have both Hollywood and square[s] in them.
I can create a method on the web server (C#, ASP.NET) to dynamically produce a sql statement like this:
SELECT TITLE
FROM MOVIES
WHERE CONTAINS(TITLE,'"hollywood*"')
AND CONTAINS(TITLE, '"square*"')
Easy enough. HOWEVER, I would like this in a stored procedure for added speed benefit and security for adding parameters.
Can I have my cake and eat it too?
I agree with the above: look into using AND inside the search condition:
SELECT TITLE
FROM MOVIES
WHERE CONTAINS(TITLE,'"hollywood*" AND "square*"')
However, you shouldn't have to split the input sentence in the query; you can use a variable:
SELECT TITLE
FROM MOVIES
WHERE CONTAINS(TITLE, @parameter)
By the way:
CONTAINS searches for the exact term(s) you specify.
FREETEXT searches for any term in the phrase.
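To keep the stored procedure parameterized, one option is to build the whole full-text condition on the web server and pass it as a single parameter, so the procedure body stays WHERE CONTAINS(TITLE, @parameter). This is sketched in Python for illustration (the original application is C#/ASP.NET); build_contains_condition is a hypothetical helper. It wraps each word as a prefix term and joins the words with AND, as in the answer above, and removes double quotes so a user cannot break out of a quoted term.

```python
def build_contains_condition(user_input: str) -> str:
    """'Hollywood square' -> '"Hollywood*" AND "square*"' for CONTAINS()."""
    # Drop double quotes so each word stays inside its own quoted prefix term.
    words = [w.replace('"', "") for w in user_input.split()]
    return " AND ".join(f'"{w}*"' for w in words if w)

print(build_contains_condition("Hollywood square"))
# "Hollywood*" AND "square*"
```

SQL Server rejects an empty full-text predicate, so the caller should skip the search when the result is an empty string.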
The last time I had to do this (with SQL Server 2005) I ended up moving the whole search functionality over to Lucene (the Java version, though Lucene.Net now exists, I believe). I had high hopes for full-text search, but this specific problem annoyed me so much that I gave up.
Have you tried using the AND logical operator in your string? I pass in a raw string to my sproc and stuff 'AND' between the words.
http://msdn.microsoft.com/en-us/library/ms187787.aspx