Are these two statements equivalent? (from NerdDinner tutorial) - sql

Basically I want to know if the two statements below are ultimately exactly the same. The NerdDinner tutorial states that IQueryable<> objects won't query the database until we attempt to access/iterate over the data or call ToList on it. So aside from returning the same exact items do the two statements below also PERFORM the same as far as querying the database is concerned? If I had a million records, would one be better than the other?
I have the following statement:
return from party in entities.Parties
where party.PartyDate > DateTime.Now
orderby party.PartyDate
select party;
is that the same as:
return entities.Parties.Where(p => p.PartyDate > DateTime.Now);

They will perform exactly the same, yes - aside from the ordering. With a very slight change, they will compile into exactly the same code:
// Extension method syntax
return from party in entities.Parties
where party.PartyDate > DateTime.Now
orderby party.PartyDate
select party;
// Query expression
return entities.Parties
.Where(party => party.PartyDate > DateTime.Now)
.OrderBy(party => party.PartyDate);
Note that as well as adding the OrderBy, I've also changed the name of the lambda expression parameter name to match that in the query expression.
Effectively the compiler transforms the first block into the second block before applying all the normal compilation steps. You can think of query expression support as being a bit like a preprocessor step.
I wrote about this in more detail in my Edulinq blog series, in Part 41: How Query Expressions Work.

YES. They are both the same, only the first one orders the results whereas the second one does no ordering.

The first statement converted to lambda form would look like this:
return entities.Parties
.Where(p => p.PartyDate > DateTime.Now)
.OrderBy(p => p.PartyDate)
.Select(p => p);
As far as performance is concerned, the query comprehension syntax is converted to the lambda syntax by the compiler; it's simply syntactic sugar. They perform equally because they're compiled to the same expression.

Yes, except the top one orders. Performance is pretty much on a par except for that.
First one is LinQ syntax, second one is a Lambda expression

Related

How to create where statement based on result of multiset

So, i would like to filter my query by exact match in result of multiset. Any ideas how to do it in JOOQ?
Example:
val result = dsl.select(
PLANT_PROTECTION_REGISTRATION.ID,
PLANT_PROTECTION_REGISTRATION.REGISTRATION_NUMBER,
PLANT_PROTECTION_REGISTRATION.PLANT_PROTECTION_ID,
multiset(
select(
PLANT_PROTECTION_APPLICATION.ORGANISM_ID,
PLANT_PROTECTION_APPLICATION.ORGANISM_TEXT
).from(PLANT_PROTECTION_APPLICATION)
.where(PLANT_PROTECTION_APPLICATION.REGISTRATION_ID.eq(PLANT_PROTECTION_REGISTRATION.ID))
).`as`("organisms")
).from(PLANT_PROTECTION_REGISTRATION)
// here i would like to filter my result only for records that their organisms contain specific
// organism id
.where("organisms.organism_id".contains(organismId))
I've explained the following answer more in depth in this blog post
About the MULTISET value constructor
The MULTISET value constructor operator is so powerful, we'd like to use it everywhere :) But the way it works is that it creates a correlated subquery, which produces a nested data structure, which is hard to further process in the same SQL statement. It's not impossible. You could create a derived table and then unnest the MULTISET again, but that would probably be quite unwieldy. I've shown an example using native PostgreSQL in that blog post
Alternative using MULTISET_AGG
If you're not nesting things much more deeply, how about using the lesser known and lesser hyped MULTISET_AGG alternative, instead? In your particular case, you could do:
// Using aliases to make things a bit more readable
val ppa = PLANT_PROTECTION_APPLICATION.as("ppa");
// Also, implicit join helps keep things more simple
val ppr = ppa.plantProtectionRegistration().as("ppr");
dsl.select(
ppr.ID,
ppr.REGISTRATION_NUMBER,
ppr.PLANT_PROTECTION_ID,
multisetAgg(ppa.ORGANISM_ID, ppa.ORGANISM_TEXT).`as`("organisms"))
.from(ppa)
.groupBy(
ppr.ID,
ppr.REGISTRATION_NUMBER,
ppr.PLANT_PROTECTION_ID)
// Retain only those groups which contain the desired ORGANISM_ID
.having(
boolOr(trueCondition())
.filterWhere(ppa.ORGANISM_ID.eq(organismId)))
.fetch()

NHibernate QueryOver condition on several properties

I have this armor table, that has three fields to identify individual designs: make, model and version.
I have to implement a search feature for our software, that lets a user search armors according to various criteria, among which their design.
Now, the users' idea of a design is a single string that contains make, model and version concatenated, so the entry is that single string. Let's say they want to look up the specifications for make FH, model TT, version 27, they'll think of (and type) "FHTT27".
We use an IQueryOver object, upon which we add successive conditions according to the criteria. For the design, our code is
z_quoQuery = z_quoQuery.And(armor => armor.make + armor.model + armor.version == z_strDesign);
Which raises an InvalidOperationException, "variable 'armor' of type 'IArmor' referenced from scope '', but it is not defined".
This is described as a bug here: https://github.com/mbdavid/LiteDB/issues/637
After a lot of trial and error, it seems that syntaxs that don't use the armor variable first raise that exception.
Obviously, I have to go another route at least for now, but after searching for some time I can't seem to find how. I thought of using something like
z_quoQuery = z_quoQuery.And(armor => armor.make == z_strDesign.SubString(0, 2).
And(armor => armor.model == z_strDesign.SubString(2, 2).
And(armor => armor.version == z_strDesign.SubString(4, 2);
Unfortunately, the fields are liable to have variable lengths. For instance, another set of values for make, model, and version might be, respectively, "NGI", "928", and "RX", that the code above would parse wrong. So I can't bypass the difficulty that way. Nor can I use a RegEx.
Neither can I make up a property in my Armor class that would concatenate all three properties, since it cannot be converted to SQL by NHibernate.
Has someone an idea of how to do it?
Maybe I should use an explicit SQL condition here, but how would it mix with other conditions?
It seems you can use Projections.Concat to solve your issue:
z_quoQuery = z_quoQuery.And(armor => Projections.Concat(armor.make, armor.model, armor.version) == z_strDesign);

Is there a way to use the LIKE operator from an entity framework query?

OK, I want to use the LIKE keyword from an Entity Framework query for a rather unorthodox reason - I want to match strings more precisely than when using the equals operator.
Because the equals operator automatically pads the string to be matched with spaces such that col = 'foo ' will actually return a row where col equals 'foo' OR 'foo ', I want to force trailing whitespaces to be taken into account, and the LIKE operator actually does that.
I know that you can coerce Entity Framework into using the LIKE operator using .StartsWith, .EndsWith, and .Contains in a query. However, as might be expected, this causes EF to prefix, suffix, and surround the queried text with wildcard % characters. Is there a way I can actually get Entity Framework to directly use the LIKE operator in SQL to match a string in a query of mine, without adding wildcard characters? Ideally it would look like this:
string usernameToMatch = "admin ";
if (context.Users.Where(usr => usr.Username.Like(usernameToMatch)).Any()) {
// An account with username 'admin ' ACTUALLY exists
}
else {
// An account with username 'admin' may exist, but 'admin ' doesn't
}
I can't find a way to do this directly; right now, the best I can think of is this hack:
context.Users.Where(usr =>
usr.Username.StartsWith(usernameToMatch) &&
usr.Username.EndsWith(usernameToMatch) &&
usr.Username == usernameToMatch
)
Is there a better way? By the way I don't want to use PATINDEX because it looks like a SQL Server-specific thing, not portable between databases.
There isn't a way to get EF to use LIKE in its query, However you could write a stored procedure that finds users using LIKE with an input parameter and use EF to hit your stored procedure.
Your particular situation however seems to be more of a data integrity issue though. You shouldn't be allowing users to register usernames that start or end with a space (username.Trim()) for pretty much this reason. Once you do that then this particular issue goes away entirely.
Also, allowing 'rough' matches on authentication details is beyond insecure. Don't do it.
Well there doesn't seem to be a way to get EF to use the LIKE operator without padding it at the beginning or end with wildcard characters, as I mentioned in my question, so I ended up using this combination which, while a bit ugly, has the same effect as a LIKE without any wildcards:
context.Users.Where(usr =>
usr.Username.StartsWith(usernameToMatch) &&
usr.Username.EndsWith(usernameToMatch) &&
usr.Username == usernameToMatch
)
So, if the value is LIKE '[usernameToMatch]%' and it's LIKE '%[usernameToMatch]' and it = '[usernameToMatch]' then it matches exactly.

LINQ to SQL indirect variable

First, may I apologise if this question is too basic for this forum. I am very new to this and am struggling with a number of basics - but am persevering!
I have a problem in that I want to create a LINQ to SQL query with a WHERE clause with an indirect reference to one of the columns in my database. For example, if I had some code that looked something like this:
var PLMatches = from PLMat in db1.PLAccountHeaders
where PLMat.CompanyAlphaId.Equals(CoId)
&& dbField.Equals(Limit)
select PLMat;
such that dbField would be a variable containing the name of the database field. So, if the value of dbField was"PLMat.ItemCode" it would happily go away and return all instances of records with an ItemCode equal to the value of Limit and if value of dbField was"PLMat.ItemName" it would happily go away and return all instances of records with an ItemName equal to the value of Limit and so on.
I would really appreciate some help on this both to answer a very specific problem and I am sure it will enhance my basic understanding.
Many thanks
i dont think that what you are asking for is possible in LINQ, you can however use the Dynamic Linq Library.
here's the link
You download the zip, unzip it and then just reference it in your solution.
you can then do something like this:
// the "normal" LINQ way:
var query = from x in ctx.People
where x.city == "Boston" && x.age > 18
orderby x.ID
select x;
// the Dynamic Linq way:
var query = database.People
.Where("city = Boston AND age > 18")
.OrderBy("id")
.Select("New(Person as Name, Age)");
Another way (besides dynamic linq) is composing your query conditionally. Like so:
var PLMatches = from PLMat in db1.PLAccountHeaders
where PLMat.CompanyAlphaId.Equals(CoId)
select PLMat;
if (dbField == "ItemCode")
PLMatches = PLMatches.Where(m => m.ItemCode == Limit);
else if (dbField == "ItemName")
PLMatches = PLMatches.Where(m => m.ItemName == Limit);
else if (dbField == ...
The code looks a bit repetitive, but I always prefer to do it this way if the number of possible conditions is not too large. The advantages are, first, that the code integrity check is at compile time and not at runtime (as with dynamic linq) and, second, it is very clear what the code is about. (The second advantage also goes for dynamic linq, but less so for a third alternative: building expressions in code).
You can compose linq queries this way because of deferred execution. PLMatches = PLMatches.Where... only changes the query, but nothing is executed yet.

In Django, what is the most efficient way to check for an empty query set?

I've heard suggestions to use the following:
if qs.exists():
...
if qs.count():
...
try:
qs[0]
except IndexError:
...
Copied from comment below: "I'm looking for a statement like "In MySQL and PostgreSQL count() is faster for short queries, exists() is faster for long queries, and use QuerySet[0] when it's likely that you're going to need the first element and you want to check that it exists. However, when count() is faster it's only marginally faster so it's advisable to always use exists() when choosing between the two."
query.exists() is the most efficient way.
Especially on postgres count() can be very expensive, sometimes more expensive then a normal select query.
exists() runs a query with no select_related, field selections or sorting and only fetches a single record. This is much faster then counting the entire query with table joins and sorting.
qs[0] would still includes select_related, field selections and sorting; so it would be more expensive.
The Django source code is here (django/db/models/sql/query.py RawQuery.has_results):
https://github.com/django/django/blob/60e52a047e55bc4cd5a93a8bd4d07baed27e9a22/django/db/models/sql/query.py#L499
def has_results(self, using):
q = self.clone()
if not q.distinct:
q.clear_select_clause()
q.clear_ordering(True)
q.set_limits(high=1)
compiler = q.get_compiler(using=using)
return compiler.has_results()
Another gotcha that got me the other day is invoking a QuerySet in an if statement. That executes and returns the whole query !
If the variable query_set may be None (unset argument to your function) then use:
if query_set is None:
#
not:
if query_set:
# you just hit the database
exists() is generally faster than count(), though not always (see test below). count() can be used to check for both existence and length.
Only use qs[0]if you actually need the object. It's significantly slower if you're just testing for existence.
On Amazon SimpleDB, 400,000 rows:
bare qs: 325.00 usec/pass
qs.exists(): 144.46 usec/pass
qs.count() 144.33 usec/pass
qs[0]: 324.98 usec/pass
On MySQL, 57 rows:
bare qs: 1.07 usec/pass
qs.exists(): 1.21 usec/pass
qs.count(): 1.16 usec/pass
qs[0]: 1.27 usec/pass
I used a random query for each pass to reduce the risk of db-level caching. Test code:
import timeit
base = """
import random
from plum.bacon.models import Session
ip_addr = str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))+'.'+str(random.randint(0,256))
try:
session = Session.objects.filter(ip=ip_addr)%s
if session:
pass
except:
pass
"""
query_variatons = [
base % "",
base % ".exists()",
base % ".count()",
base % "[0]"
]
for s in query_variatons:
t = timeit.Timer(stmt=s)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100000)
It depends on use context.
According to documentation:
Use QuerySet.count()
...if you only want the count, rather than doing len(queryset).
Use QuerySet.exists()
...if you only want to find out if at least one result exists, rather than if queryset.
But:
Don't overuse count() and exists()
If you are going to need other data from the QuerySet, just evaluate it.
So, I think that QuerySet.exists() is the most recommended way if you just want to check for an empty QuerySet. On the other hand, if you want to use results later, it's better to evaluate it.
I also think that your third option is the most expensive, because you need to retrieve all records just to check if any exists.
#Sam Odio's solution was a decent starting point but there's a few flaws in the methodology, namely:
The random IP address could end up matching 0 or very few results
An exception would skew the results, so we should aim to avoid handling exceptions
So instead of filtering something that might match, I decided to exclude something that definitely won't match, hopefully still avoiding the DB cache, but also ensuring the same number of rows.
I only tested against a local MySQL database, with the dataset:
>>> Session.objects.all().count()
40219
Timing code:
import timeit
base = """
import random
import string
from django.contrib.sessions.models import Session
never_match = ''.join(random.choice(string.ascii_uppercase) for _ in range(10))
sessions = Session.objects.exclude(session_key=never_match){}
if sessions:
pass
"""
s = base.format('count')
query_variations = [
"",
".exists()",
".count()",
"[0]",
]
for variation in query_variations:
t = timeit.Timer(stmt=base.format(variation))
print "{} => {:02f} usec/pass".format(variation.ljust(10), 1000000 * t.timeit(number=100)/100000)
outputs:
=> 1390.177710 usec/pass
.exists() => 2.479579 usec/pass
.count() => 22.426991 usec/pass
[0] => 2.437079 usec/pass
So you can see that count() is roughly 9 times slower than exists() for this dataset.
[0] is also fast, but it needs exception handling.
I would imagine that the first method is the most efficient way (you could easily implement it in terms of the second method, so perhaps they are almost identical). The last one requires actually getting a whole object from the database, so it is almost certainly the most expensive.
But, like all of these questions, the only way to know for your particular database, schema and dataset is to test it yourself.
I was also in this trouble. Yes exists() is faster for most cases but it depends a lot on the type of queryset you are trying to do. For example for a simple query like:
my_objects = MyObject.objets.all() you would use my_objects.exists(). But if you were to do a query like: MyObject.objects.filter(some_attr='anything').exclude(something='what').distinct('key').values() probably you need to test which one fits better (exists(), count(), len(my_objects)). Remember the DB engine is the one who will perform the query, and to get a good result in performance, it depends a lot on the data structure and how the query is formed. One thing you can do is, audit the queries and test them on your own against the DB engine and compare your results you will be surprised by how naive sometimes django is, try QueryCountMiddleware to see all the queries executed, and you will see what i am talking about.