Is there a way to use the LIKE operator from an entity framework query? - sql

OK, I want to use the LIKE keyword from an Entity Framework query for a rather unorthodox reason - I want to match strings more precisely than when using the equals operator.
Because the equals operator automatically pads the string to be matched with spaces such that col = 'foo ' will actually return a row where col equals 'foo' OR 'foo ', I want to force trailing whitespaces to be taken into account, and the LIKE operator actually does that.
I know that you can coerce Entity Framework into using the LIKE operator using .StartsWith, .EndsWith, and .Contains in a query. However, as might be expected, this causes EF to prefix, suffix, and surround the queried text with wildcard % characters. Is there a way I can actually get Entity Framework to directly use the LIKE operator in SQL to match a string in a query of mine, without adding wildcard characters? Ideally it would look like this:
string usernameToMatch = "admin ";
if (context.Users.Where(usr => usr.Username.Like(usernameToMatch)).Any()) {
// An account with username 'admin ' ACTUALLY exists
}
else {
// An account with username 'admin' may exist, but 'admin ' doesn't
}
I can't find a way to do this directly; right now, the best I can think of is this hack:
context.Users.Where(usr =>
usr.Username.StartsWith(usernameToMatch) &&
usr.Username.EndsWith(usernameToMatch) &&
usr.Username == usernameToMatch
)
Is there a better way? By the way I don't want to use PATINDEX because it looks like a SQL Server-specific thing, not portable between databases.

There isn't a way to get EF to use LIKE in its query, However you could write a stored procedure that finds users using LIKE with an input parameter and use EF to hit your stored procedure.
Your particular situation however seems to be more of a data integrity issue though. You shouldn't be allowing users to register usernames that start or end with a space (username.Trim()) for pretty much this reason. Once you do that then this particular issue goes away entirely.
Also, allowing 'rough' matches on authentication details is beyond insecure. Don't do it.

Well there doesn't seem to be a way to get EF to use the LIKE operator without padding it at the beginning or end with wildcard characters, as I mentioned in my question, so I ended up using this combination which, while a bit ugly, has the same effect as a LIKE without any wildcards:
context.Users.Where(usr =>
usr.Username.StartsWith(usernameToMatch) &&
usr.Username.EndsWith(usernameToMatch) &&
usr.Username == usernameToMatch
)
So, if the value is LIKE '[usernameToMatch]%' and it's LIKE '%[usernameToMatch]' and it = '[usernameToMatch]' then it matches exactly.

Related

How to search in lucene where there is only a single token for a field

I am creating an index where the documents are only a single term.
I am indexing domain names, so the field "domain" would look like:
example.com
thisiscool.com
justtesting.org
cnn.com
I am creating my search terms etc. programatically, and because all my document field is just a single term, it appears as though my searches won't work as they are since there is only a single term and if I add multiple terms in a boolean query it will never find anything.
How should I be searching given I have only a single term? I want to make this as efficient as possible.
Query term = new TermQuery("domain", "this")
Query term2 = new TermQuery("domain", "cool")
// add to boolean query
bq.add(term, Occur.MUST)
bq.add(term2, Occur.MUST)
indexSearcher.search(bq, 100)
I was expecting to get "thisiscool.com" back, but I get 0 hits. My guess is because lucene can't break things down into tokens, so it will never find any document that has both tokens "this" and "cool".
How should I be searching given this scenerio?
Apply a wildcard to your search clause.
Query term = new TermQuery("domain", "this*");
Query term2 = new TermQuery("domain", "cool*"); // *cool* won't work sadly
However, that might not work because the logic is going to result in a query like this, where the domain has to begin with "this" as well as "cool"
bq.add(term, Occur.MUST)
bq.add(term2, Occur.MUST)
=> +domain:this* +domain:cool*
Query term = new TermQuery("domain", "this*cool*");
=> +domain:this*cool* // probably gets hits
If you're using newer versions then you can use regular expressions in queries:
http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/util/automaton/RegExp.html
The above example isn't actually how you should do this. I tested it out, and it doesn't even really work. What you'll want to do is build specialized queries, such as PrefixQuery, WildcardQuery, or RegexpQuery.
Additionally, if you're not using QueryParser or something that takes an Analyzer, queries have to match exactly to what's in your index. If domain is a TextField it might have been lowercased or had something else happen to it, so you'll need to know that too.
I'd just use regex.
RegExp r = new RegExp("this.*cool");
Query q = new RegexpQuery(new Term("domain", r.toString()));
It can be slow, but if you don't prefix with any char it should be perfectly fine. I'm also not entirely sure how to ignore case with this, but that might be default.

How to _properly_ escape LIKE queries in Rails? [duplicate]

I want to match a url field against a url prefix (which may contain percent signs), e.g. .where("url LIKE ?", "#{some_url}%").
What's the most Rails way?
From Rails version 4.2.x there is an active record method called sanitize_sql_like. So, you can do in your model a search scope like:
scope :search, -> search { where('"accounts"."name" LIKE ?', "#{sanitize_sql_like(search)}%") }
and call the scope like:
Account.search('Test_%')
The resulting escaped sql string is:
SELECT "accounts".* FROM "accounts" WHERE ("accounts"."name" LIKE 'Test\_\%%')
Read more here: http://edgeapi.rubyonrails.org/classes/ActiveRecord/Sanitization/ClassMethods.html
If I understand correctly, you're worried about "%" appearing inside some_url and rightly so; you should also be worried about embedded underscores ("_") too, they're the LIKE version of "." in a regex. I don't think there is any Rails-specific way of doing this so you're left with gsub:
.where('url like ?', some_url.gsub('%', '\\\\\%').gsub('_', '\\\\\_') + '%')
There's no need for string interpolation here either. You need to double the backslashes to escape their meaning from the database's string parser so that the LIKE parser will see simple "\%" and know to ignore the escaped percent sign.
You should check your logs to make sure the two backslashes get through. I'm getting confusing results from checking things in irb, using five (!) gets the right output but I don't see the sense in it; if anyone does see the sense in five of them, an explanatory comment would be appreciated.
UPDATE: Jason King has kindly offered a simplification for the nightmare of escaped escape characters. This lets you specify a temporary escape character so you can do things like this:
.where("url LIKE ? ESCAPE '!'", some_url.gsub(/[!%_]/) { |x| '!' + x })
I've also switched to the block form of gsub to make it a bit less nasty.
This is standard SQL92 syntax, so will work in any DB that supports that, including PostgreSQL, MySQL and SQLite.
Embedding one language inside another is always a bit of a nightmarish kludge and there's not that much you can do about it. There will always be ugly little bits that you just have to grin and bear.
https://gist.github.com/3656283
With this code,
Item.where(Item.arel_table[:name].matches("%sample!%code%"))
correctly escapes % between "sample" and "code", and matches "AAAsample%codeBBB" but does not for "AAAsampleBBBcodeCCC" on MySQL, PostgreSQL and SQLite3 at least.
Post.where('url like ?', "%#{some_url + '%'}%)

Negating a string to be passed to mssql server

I have a text field which acts a filter field. It checks for equals, contains and starts with. My problem is, with out changing any of my code, can I check for the 'does not contain', 'does not start with' and so on by just using the string i'm passing with something like '!' operator or "<>"?
for example:
I want to get all the records that do not have 'a' in them, so can I pass the string as "!a" or "<>a" or something so that I can get the required records? (I know these two don't work cause I tried.)
You have to use the keyword NOT, e.g.
SELECT * FROM Table WHERE Foo NOT LIKE '%bar%'
Please refer http://www.sqlservercentral.com/Forums/Topic1213442-338-1.aspx
Can solve with case in where clause

How to make Lucene match all words in query?

I am using Lucene to allow a user to search for words in a large number of documents. Lucene seems to default to returning all documents containing any of the words entered.
Is it possible to change this behaviour? I know that '+' can be use to force a term to be included but I would like to make that the default action.
Ideally I would like functionality similar to Google's: '-' to exclude words and "abc xyz" to group words.
Just to clarify
I also thought of inserting '+' into all spaces in the query. I just wanted to avoid detecting grouped terms (brackets, quotes etc) and potentially breaking the query. Is there another approach?
This looks similar to the Lucene Sentence Search question. If you're interested, this is how I answered that question:
String defaultField = ...;
Analyzer analyzer = ...;
QueryParser queryParser = new QueryParser(defaultField, analyzer);
queryParser.setDefaultOperator(QueryParser.Operator.AND);
Query query = queryParser.parse("Searching is fun");
Like Adam said, there's no need to do anything to the query string. QueryParser's setDefaultOperator does exactly what you're asking for.
Why not just preparse the user search input and adjust it to fit your criteria using the Lucene query syntax before passing it on to Lucene. Alternatively, you could just create some help documentation on how to use the standard syntax to create a specific query and let the user decide how the query should be performed.
Lucene has a extensive query language as described here that describes everything you want except for + being the default but that's something you can simple handle by replacing spaces with +. So the only thing you need to do is define the format you want people to enter their search queries in (I would strongly advise to adhere to the default Lucene syntax) and then you can write the transformations from your own syntax to the Lucene syntax.
The behavior is hard-coded in method addClause(List, int, int, Query) of class org.apache.lucene.queryParser.QueryParser, so the only way to change the behavior (other than the workarounds above) is to change that method. The end of the method looks like this:
if (required && !prohibited)
clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST));
else if (!required && !prohibited)
clauses.addElement(new BooleanClause(q, BooleanClause.Occur.SHOULD));
else if (!required && prohibited)
clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST_NOT));
else
throw new RuntimeException("Clause cannot be both required and prohibited");
Changing "SHOULD" to "MUST" should make clauses (e.g. words) required by default.

T-SQL: checking for email format

I have this scenario where I need data integrity in the physical database. For example, I have a variable of #email_address VARCHAR(200) and I want to check if the value of #email_address is of email format. Anyone has any idea how to check format in T-SQL?
Many thanks!
I tested the following query with many different wrong and valid email addresses. It should do the job.
IF (
CHARINDEX(' ',LTRIM(RTRIM(#email_address))) = 0
AND LEFT(LTRIM(#email_address),1) <> '#'
AND RIGHT(RTRIM(#email_address),1) <> '.'
AND CHARINDEX('.',#email_address ,CHARINDEX('#',#email_address)) - CHARINDEX('#',#email_address ) > 1
AND LEN(LTRIM(RTRIM(#email_address ))) - LEN(REPLACE(LTRIM(RTRIM(#email_address)),'#','')) = 1
AND CHARINDEX('.',REVERSE(LTRIM(RTRIM(#email_address)))) >= 3
AND (CHARINDEX('.#',#email_address ) = 0 AND CHARINDEX('..',#email_address ) = 0)
)
print 'valid email address'
ELSE
print 'not valid'
It checks these conditions:
No embedded spaces
'#' can't be the first character of an email address
'.' can't be the last character of an email address
There must be a '.' somewhere after '#'
the '#' sign is allowed
Domain name should end with at least 2 character extension
can't have patterns like '.#' and '..'
AFAIK there is no good way to do this.
The email format standard is so complex parsers have been known to run to thousands of lines of code, but even if you were to use a simpler form which would fail some obscure but valid addresses you'd have to do it without regular expressions which are not natively supported by T-SQL (again, I'm not 100% on that), leaving you with a simple fallback of somethign like:
LIKE '%_#_%_.__%'
..or similar.
My feeling is generally that you shouln't be doing this at the last possible moment though (as you insert into a DB) you should be doing it at the first opportunity and/or a common gateway (the controller which actually makes the SQL insert request), where incidentally you would have the advantage of regex, and possibly even a library which does the "real" validation for you.
If you use SQL 2005 or 2008 you might want to look at writing CLR stored proceudues and use the .NET regex engine like this. If you're using SQL 2000 or earlier you can use the VBScript scripting engine's regular expression like ths. You could also use an extended stored procedure like this
There is no easy way to do it in T-SQL, I am afraid. To validate all the varieties of email address allowed byRFC 2822 you will need to use a regular expression.
More info here.
You will need to define your scope, if you want to simplify it.