Is there a direct LINQ syntax for finding the members of set A that are absent from set B? In SQL I would write this
SELECT A.* FROM A LEFT JOIN B ON A.ID = B.ID WHERE B.ID IS NULL
See the MSDN documentation on the Except operator.
var results = from itemA in A
              where !B.Any(itemB => itemB.Id == itemA.Id)
              select itemA;
I believe your LINQ would be something like the following.
var items = A.Except(
    from itemA in A
    from itemB in B
    where itemA.ID == itemB.ID
    select itemA);
Update
As indicated by Maslow in the comments, this may well not be the most performant query. As with any code, it is important to carry out some level of profiling to remove bottlenecks and inefficient algorithms. In this case, chaowman's answer provides a better performing result.
The reasons can be seen with a little examination of the queries. In the example I provided, there are at least two passes over the A collection: one to combine the A and B lists, and another to perform the Except operation. In chaowman's answer (reproduced below), the A collection is only iterated once.
// chaowman's solution only iterates A once and partially iterates B
var results = from itemA in A
              where !B.Any(itemB => itemB.Id == itemA.Id)
              select itemA;
Also, in my answer, the B collection is iterated in its entirety for every item in A, whereas in chaowman's answer it is only iterated up to the point at which a match is found.
As you can see, even before looking at the SQL generated, you can spot potential performance issues just from the query itself. Thanks again to Maslow for highlighting this.
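For reference, the !B.Any(...) pattern corresponds to the NOT EXISTS form of the anti-join in SQL. A sketch against the same A and B tables from the question:

-- Rows of A with no matching ID in B; same result set as the
-- LEFT JOIN ... WHERE B.ID IS NULL form in the question.
SELECT A.*
FROM A
WHERE NOT EXISTS (
    SELECT 1
    FROM B
    WHERE B.ID = A.ID
);

Which of the two forms performs better depends on the engine and the available indexes, so it is worth checking the generated SQL and its plan rather than assuming.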
Related
I have the following piece of code which runs quickly (<1s):
SELECT
    [Policy].[Value] AS [PolicyId]
   ,[Person].[Value] AS [PersonId]
   ,[Person].[Index] AS [PersonIndex]
FROM
    [dbo].[View] AS [Policy]
    INNER JOIN [dbo].[ViewPerson] AS [Person] WITH(INDEX([Index]))
        ON ([Policy].[CollectionId] = [Person].[CollectionId]
            AND [Person].[Name] = 'PersonId' AND [Policy].[Name] = 'PolicyId')
WHERE
    [Policy].[CollectionId] = 10003
    -- AND [Policy].[Value] = [Person].[Value]
The query as shown returns 2 rows from my database. When I include the last line (currently commented out) to apply a stronger filter, it returns only 1 row, but takes much longer to run (~20s).
Is there a method to reduce the time this query takes to run when a filter is applied to it? Ideally I'd like it to run at the same speed as the original.
As noted in the comments, forcing the engine to use a specific index is, in most cases, not the best idea. The engine is pretty good at finding the best plan and will usually do better if you let it choose its own route.
Secondly, as already mentioned, the execution plan is the best place to start. Since we cannot see any details, the following is pure guesswork:
If I understand this correctly, your query uses CollectionId to filter down to one given id (just a few Policy rows). For those rows, the JOIN on a VIEW (we have no idea what is behind it!) tries to link person rows.
That filter should work against a very reduced set.
Your observations suggest that the second line of the WHERE clause is dealing with a much larger set. I suspect the filter for CollectionId = 10003 is applied after the other filter; the execution plan will show the details.
What you can do:
Take away the index hint
Try to move the second line of the WHERE clause (with AND) into the ON clause of the JOIN.
Something along these lines:
SELECT
    [Policy].[Value] AS [PolicyId]
   ,[Person].[Value] AS [PersonId]
   ,[Person].[Index] AS [PersonIndex]
FROM
    [dbo].[View] AS [Policy]
    INNER JOIN [dbo].[ViewPerson] AS [Person]
        ON ([Policy].[CollectionId] = [Person].[CollectionId]
            AND [Person].[Name] = 'PersonId'
            AND [Policy].[Name] = 'PolicyId'
            AND [Policy].[Value] = [Person].[Value])
WHERE
    [Policy].[CollectionId] = 10003;
I have a local Access database, and in it a query which takes values from a form to populate a drop-down menu. The weird (to me) thing is that with most options this query is quick (blink of an eye), but with a few options it's very slow (>10 seconds).
What the query does is as follows: it populates a drop-down menu to record animals seen at a specific sighting, but only those animals which have not yet been recorded at that specific sighting (to avoid duplicate entries).
SELECT DISTINCT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblSightings INNER JOIN (tblAnimals INNER JOIN tblAnimalsatSighting ON tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID) ON tblSightings.SightingID = tblAnimalsatSighting.SightingID
WHERE (((tblAnimals.Species)=[form]![Species]) AND ((tblAnimals.CurrentGroup)=[form]![AnimalGroup2]) AND ((tblAnimals.[Dead?])=False) AND ((Exists (select tblAnimalsatSighting.AnimalID FROM tblAnimalsatSighting WHERE tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID AND tblAnimalsatSighting.SightingID = [form]![SightingID]))=False));
It performs well for all groups of two of the four possible species; for one species it performs well for four of the five groups but not for the last group; and for the last species it performs very slowly for both groups. Does anybody have an idea what could cause this kind of behavior? Is it a problem with the query? Or duplicate entries in the tables? I don't think it's duplicates in the tables; I've checked, and there are some, but they appear both for groups where there are problems and for groups where there aren't. Could I rewrite the query so it performs faster?
As noted in our comments above, you confirmed that the extra joins were not really needed and were in fact going to limit the results to animals that had already had a sighting. Those joins would also likely contribute to a slowdown.
I know that Access probably added most of the parentheses automatically, but I've removed them and converted the subquery to a NOT EXISTS form that's a lot more readable.
SELECT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblAnimals
WHERE
    tblAnimals.Species = [form]![Species]
    AND tblAnimals.CurrentGroup = [form]![AnimalGroup2]
    AND tblAnimals.[Dead?] = False
    AND NOT EXISTS (
        SELECT tblAnimalsatSighting.AnimalID
        FROM tblAnimalsatSighting
        WHERE
            tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID
            AND tblAnimalsatSighting.SightingID = [form]![SightingID]
    );
I am working on a massive join at work and have very limited resources: because of the environment, I cannot add indexes and the like, and I am restricted in what I can do in the query itself (i.e. I can only select data; no variables or table creation allowed). I have read somewhere that a subquery will automatically index its result. Is this true? Also, each of my three major join tables has ~140K rows, and I have to join 2 extra tables to ensure the filtering is correct. The query is listed below; I currently have the criteria on the JOIN clauses. Another question: if I move my criteria to a WHERE clause, either inside or outside of the subquery, will it benefit?
SELECT *
FROM (SELECT NULL AS A1,
DFS_ROHEADER.TECHID,
DFS_ROHEADER.RONUMBER,
DFS_ROHEADER.CUSTOMERNUMBER,
DFS_CUSTOMER.BNAME,
DFS_ROHEADER.UNITNUMBER,
DFS_ROHEADER.MILEAGE,
DFS_ROHEADER.OPENEDDATE,
DFS_ROHEADER.CLOSEDDATE,
DFS_ROHEADER.STATUS,
DFS_ROHEADER.PONUMBER,
DFS_TECH.REGION,
DFS_TECH.RSM,
DFS_ROPART.PARTID,
CONVERT(NVARCHAR(max), DFS_RODETAIL.STORY) AS STORY
FROM DFS_ROHEADER
LEFT JOIN DFS_CUSTOMER
ON DFS_ROHEADER.CUSTOMERNUMBER = DFS_CUSTOMER.CUST_NO
LEFT JOIN DFS_TECH
ON DFS_ROHEADER.TECHID = DFS_TECH.TECHID
INNER JOIN DFS_RODETAIL
ON DFS_ROHEADER.RONUMBER = DFS_RODETAIL.RONUMBER
INNER JOIN DFS_ROPART
ON DFS_RODETAIL.RONUMBER = DFS_ROPART.RONUMBER
AND DFS_RODETAIL.LINENUMBER = DFS_ROPART.LINENUMBER
AND DFS_ROHEADER.RONUMBER LIKE '%$FF_RONumber%'
AND DFS_ROHEADER.UNITNUMBER LIKE '%$FF_UnitNumber%'
AND DFS_ROHEADER.PONUMBER LIKE '%$FF_PONumber%'
AND ( DFS_CUSTOMER.BNAME LIKE '%$FF_Customer%'
OR DFS_CUSTOMER.BNAME IS NULL )
AND DFS_ROHEADER.TECHID LIKE '%$FF_TechID%'
AND DFS_ROHEADER.CLOSEDDATE BETWEEN
FF_ClosedBegin AND FF_ClosedEnd
AND ( DFS_TECH.REGION LIKE '%$FilterRegion%'
OR DFS_TECH.REGION IS NULL )
AND ( DFS_TECH.RSM LIKE '%$FF_RSM%'
OR DFS_TECH.RSM IS NULL )
AND DFS_RODETAIL.STORY LIKE '%$FF_Story%'
AND DFS_ROPART.PARTID LIKE '%$FF_PartID%'
WHERE DFS_ROHEADER.DELETED_BY < 0
AND DFS_RODETAIL.DELETED_BY < 0
AND DFS_ROPART.DELETED_BY < 0) T
ORDER BY T.RONUMBER
This query works; however, it can take forever to run and can time out. I have other queries that also run in the environment, and I will take whatever suggestions you can give me and apply them to those as well. I am using SQL Server 2000. Thanks for the help.
EDIT:
Execution Plan:
https://dl.dropboxusercontent.com/u/99733863/ExecutionPlan.sqlplan
UPDATE:
I have come to the conclusion the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments I have to fill grids with limited flexibility and I believe that these grids fill by first filling a temporary grid with the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help.
My 2 cents here: in general, LIKE is not very well optimized. In your case you are also using LIKE with a leading wildcard ('%value%'), which means the query optimizer has to scan the entire index instead of seeking into it. At a minimum I would see if there is a way to avoid this pattern.
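To illustrate the point, here is a sketch only, reusing the RONUMBER filter from the query above; whether the prefix-only form is usable depends on your data, so treat it as an assumption:

-- Leading wildcard: not sargable, so the optimizer has to scan every row.
SELECT RONUMBER
FROM DFS_ROHEADER
WHERE RONUMBER LIKE '%$FF_RONumber%';

-- Prefix-only wildcard: sargable, so an index on RONUMBER could be used for a seek.
-- Only a valid substitute if the filter value always matches from the start of the value.
SELECT RONUMBER
FROM DFS_ROHEADER
WHERE RONUMBER LIKE '$FF_RONumber%';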
I've been playing around with LINQ to SQL and I just have a couple of simple questions:
When do I need Select on the end of my query?
When can I omit the Select?
Here are my example queries:
Dim pageRoute = From r In db.PageRoutes Where r.PageId = pageId Order By r.Id Descending
Dim dp = From r In db.DownloadPageOnlineOnlies Where r.PageId = pageId Order By r.Weight Descending, r.Id Ascending
Dim download = (From r In db.Downloads Where r.Id = id).First
Are any of them technically wrong?
Could they be improved with a Select or something else?
In a nutshell, I don't understand when I would need either:
Select r
Select r.AColumnINeed, r.BColumnINeed (does this improve performance?)
Thanks.
P.S. I like to write my LINQ queries on one line unless they are really big.
The Select portion of the LINQ statement is optional only when you want to get the full object out of the collection that matches your Where clause. If you want individual value(s) from an object in the collection being queried, then you need to use a Select clause.
I personally always put the Select r clause on the end just out of habit, but I have come across a few issues with other people's code when I had Option Strict on and they did not when they wrote the LINQ. Leaving the Select clause off can yield late-binding errors if you decide to turn Option Strict on in the future for whatever reason.
So, in short, you don't need the Select clause, but you are only helping yourself later down the line if you decide to turn Option Strict on. It also makes your code much more readable, in my opinion.
Consider a table with 20 columns. Take a query WITH a Select of 2 columns and one WITHOUT. The execution plans of the two can be different, and there is much less data to transfer from the database server in the former case.
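Roughly speaking, the difference shows up in the SQL sent to the server. This is only a sketch with hypothetical column names (borrowing AColumnINeed and BColumnINeed from the question), not the exact statements LINQ to SQL would emit:

-- Without a Select projection, every mapped column is requested:
SELECT [Id], [PageId], [Weight], [AColumnINeed], [BColumnINeed]   -- ...and every other mapped column
FROM [PageRoutes]
WHERE [PageId] = @pageId
ORDER BY [Id] DESC;

-- With Select r.AColumnINeed, r.BColumnINeed, only those columns travel over the wire:
SELECT [AColumnINeed], [BColumnINeed]
FROM [PageRoutes]
WHERE [PageId] = @pageId
ORDER BY [Id] DESC;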
It is good practice to use Select when you want a more precise result. If you only need a, or a and b, from your query, using Select means only those values are returned, which saves memory because the query knows exactly which values you are going to use.
Sorry if I'm not wording this too well, but let me try to explain what I'm doing. I have a main object of class A that has multiple objects of classes B, C, D, and E.
such that:
Class ObjectA
{
    ObjectB[] myObjectBs;
    ObjectC[] myObjectCs;
    ObjectD[] myObjectDs;
    ObjectE[] myObjectEs;
}
where the A-to-B mapping is one-to-many, and likewise for C, D, and E. That is, every B, C, D, and E object is associated with exactly one A object.
I'm storing the data for all these objects in a database, with Table A holding all the data for the instances of Class A, etc.
Now, when getting the data for this at run time on the fly, I'm running 5 different queries for each object.
(very simplified pseudocode)
objectA = sql("select * from tableA where id=#id#");
objectA.setObjectBs(sql("select * from tableB where a_id=#id#"));
objectA.setObjectCs(sql("select * from tableC where a_id=#id#"));
objectA.setObjectDs(sql("select * from tableD where a_id=#id#"));
objectA.setObjectEs(sql("select * from tableE where a_id=#id#"));
if that makes sense.
Now, I'm wondering, is this the most efficient way of doing it? I feel like there should be a way to get all this info in one query, but doing something like "select * from a,b,c,d,e where a.id = #id# and b.a_id = #id# and c.a_id = #id# and d.a_id = #id# and e.a_id = #id#" will give a result set with all the columns of A, B, C, D, and E for each row, and there will be many more rows than I actually need.
If there were only one array of objects (say, just ObjectBs), it could be done with a simple join and then handled by my database framework. If the relationships were A (one) to B (many) and B (one) to C (many), it could be done with two joins and would work. But for A (one) to B (many) and A (one) to C (many), etc., I can't think of a good way to do joins or return this data without getting far too many rows: with joins, if A has 10 Bs and 10 Cs, it'll return 100 rows rather than 20.
So, is the way I'm currently doing it, with 5 different selects, the most efficient (which it seems like its not), or is there a better way of doing it?
Also, if I were to grab a large set of these at once (say, 5000 ObjectAs and all the associated Bs, Cs, Ds, and Es), would there be a way to do it without running a ton of consecutive queries one after the other?
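One common way to handle the bulk case is to keep one query per table but filter every child table by the same set of parent ids, then group the child rows by a_id in application code. A sketch in SQL, reusing the table names from the pseudocode above (the literal id list just stands in for however the batch of ObjectAs is selected):

-- Five queries total, regardless of how many ObjectAs are loaded,
-- and no row multiplication from joining B, C, D and E together.
select * from tableA where id in (1, 2, 3);
select * from tableB where a_id in (1, 2, 3) order by a_id;
select * from tableC where a_id in (1, 2, 3) order by a_id;
select * from tableD where a_id in (1, 2, 3) order by a_id;
select * from tableE where a_id in (1, 2, 3) order by a_id;

With 10 Bs and 10 Cs per A, this returns 20 child rows per A instead of the 100 a single five-way join would produce.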
You can try iBatis using N+1 Select Lists
http://ibatis.apache.org/docs/dotnet/datamapper/ch03s05.html
Hth.
There is a huge performance issue with N+1 selects (check https://github.com/balak1986/prime-cache-support-in-ibatis2/blob/master/README). So please don't use it unless there is no other way of achieving this.
Luckily, iBatis has a groupBy property that was created exactly to map data for this kind of complex object.
Check the example from http://www.cforcoding.com/2009/06/ibatis-tutorial-aggregation-with.html