Access query speed differs - sql

I have a local access database and in it a query which takes values from a form to populate a drop down menu. The weird (to me) thing is that with most options this query is quick (blink of an eye), but with a few options it's very slow (>10 seconds).
What the query is does is a follows: It populates a dropdown menu to record animals seen at a specific sighting, but only those animals which have not been recorded at that specific sighting yet (to avoid duplicate entries).
SELECT DISTINCT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblSightings INNER JOIN (tblAnimals INNER JOIN tblAnimalsatSighting ON tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID) ON tblSightings.SightingID = tblAnimalsatSighting.SightingID
WHERE (((tblAnimals.Species)=[form]![Species]) AND ((tblAnimals.CurrentGroup)=[form]![AnimalGroup2]) AND ((tblAnimals.[Dead?])=False) AND ((Exists (select tblAnimalsatSighting.AnimalID FROM tblAnimalsatSighting WHERE tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID AND tblAnimalsatSighting.SightingID = [form]![SightingID]))=False));
It performs well for all groups of 2 of the 4 possible species, for 1 species it performs well for 4 of the 5 groups, but not for the last group, and for the last species it performs very slowly for both groups. Anybody an idea what can be the cause of this kind of behavior? Is it problems with the query? Or duplicate entries in the tables which can cause this? I don't think it's duplicates in the tables, I've checked that, and there are some, but they appear both for groups where there are problems and where there aren't. Could I re-write the query so it performs faster?

As noted in our comments above, you confirmed that the extra joins were not really need and were in fact going to limit the results to animal that had already had a sighting. Those joins would also likely contribute to a slowdown.
I know that Access probably added most of the parentheses automatically but I've removed them and converted the subquery to a not exists form that's a lot more readable.
SELECT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblAnimals
WHERE
tblAnimals.Species = [form]![Species]
AND tblAnimals.CurrentGroup = [form]![AnimalGroup2]
AND tblAnimals.[Dead?] = False
AND NOT EXISTS (
SELECT tblAnimalsatSighting.AnimalID
FROM tblAnimalsatSighting
WHERE
tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID
AND tblAnimalsatSighting.SightingID = [form]![SightingID]
);

Related

How to prevent a filter from causing a query to run much slower

I have the following piece of code which runs quickly (<1s):
SELECT
[Policy].[Value] AS [PolicyId]
,[Person].[Value] AS [PersonId]
,[Person].[Index] AS [PersonIndex]
FROM
[dbo].[View] AS [Policy]
INNER JOIN [dbo].[ViewPerson] AS [Person] WITH(INDEX([Index])) ON ([Policy].[CollectionId] = [Person].[CollectionId]
AND [Person].[Name] = 'PersonId' AND [Policy].[Name] = 'PolicyId')
WHERE
[Policy].[CollectionId] = 10003
-- AND [Policy].[Value] = [Person].[Value]
This will return 2 rows from my database. When I comment out the last line to apply a stronger filter it returns only 1 row from my database, but will take much longer to run (~20s).
Is there a method to reduce the time this query takes to run when a filter is applied to it? Ideally I'd like it to run at the same speed as the original.
You were told in comments, that forcing the engine to use a special index is - in most cases - not the best idea. The engine is pretty good in finding the best plan and it will work best if you let it go its own route.
Secondly you were told already, that the execution plan is the best place to start. As we do not see any details, the following is pure guessing:
If I get this correctly, your query will use CollectionId to filter for one given id (just very few Policy rows). For these rows, the JOIN on a VIEW (we have no idea, what is behind here!) tries to link person rows.
The filter should work against a very reduced set.
Your observations let me assume, that the second line in WHERE is dealing with a much larger set. I'm pretty sure, that the filter for CollectionId=10003 pulls after the other filter... The execution plan will show the details...
What you can do:
Take away the index hint
Try to add the second line in the WHERE with AND to the ON-clause of the JOIN.
Something along this:
SELECT
[Policy].[Value] AS [PolicyId]
,[Person].[Value] AS [PersonId]
,[Person].[Index] AS [PersonIndex]
FROM
[dbo].[View] AS [Policy]
INNER JOIN [dbo].[ViewPerson] AS [Person] ON ([Policy].[CollectionId] = [Person].[CollectionId]
AND [Person].[Name] = 'PersonId'
AND [Policy].[Name] = 'PolicyId'
AND [Policy].[Value] = [Person].[Value])
WHERE
[Policy].[CollectionId] = 10003;

Conditional Distinct

Sorry for the kinda longish introduction, but I want to make clear what I'm trying to do. If someone is able to come up with a more appropriate title, please feel free to edit.
I wrote an SNMP Collector that queries every switch in our data center once per hour and checks which ports are online, and stores the results in a MS SQL 2k12 DB. The motivation was that often admins don't report a discontinued server or some other device and we are running out of switch ports.
The DB schema looks like this (simplified screenshot):
The Interfaces table is a child table to the Crawls (Crawl = Run of the SNMP collector) table as the number of interfaces is not constant for every switch but changes between Crawls, as line cards are inserted or removed.
Now I want to write a query that returns every Interface on every Switch that ALWAYS had an ifOperStatus value of 2 and NEVER had an ifOperStatus of 1.
I wrote a query that has three nested sub-queries, is ugly to read and slow as hell. There sure has to be an easier way.
My approach was to filter the ports that NEVER changed by using
HAVING (COUNT(DISTINCT dbo.Interfaces.ifOperStatus) = 1)
and than inner-joining against a list of ports that had an ifOperStatus of 2 during the last crawl. Ugly, as I said.
So, a sample output from the DB would look like this:
And I'm looking for a query that returns rows 5-7 because ifOperStatus never changed, but DOES NOT return rows 3-4 because ifOperStatus flapped.
How about
HAVING (MIN(dbo.Interfaces.ifOperStatus) = 2 AND MAX(dbo.Interfaces.ifOperStatus) = 2)
MIN and MAX don't require SQL Server to maintain a set of all values seen so far, just the highest/lowest. This may also avoid the need to join "against a list of ports that had an ifOperStatus of 2 during the last crawl".
select
s.Hostname,
s.sysDescr,
i.ifOperStatus,
i.ifAllias,
i.ifIndex,
i.ifDescr
from
interfaces i
join crawl c on c.id = i.crawlId
join switches s on s.id = c.switchId
where
i.ifOperStatus = 2
and not exists
(
select 'x'
from
interfaces ii
join crawl cc on cc.id = ii.crawlId
join switches ss on ss.id = cc.switchId
where
s.id = ss.id
and ii.ifOperStatus = 1
)

Multiple Joins + Lots of Data Optimization

I am working on a massive join at work and have very limited resources in terms of being able to add indexes and such as well as what I can do in the query itself due to the environment (i.e. I can only select data, no variables or table creations allowed). I have read somewhere that a subquery will automatically index the result, is this true? Also for my major join tables (3) each has ~140K rows. I have to join 2 extra tables to ensure filtering is correct. I have the query listed below which I currently have criteria on the JOIN clause. Another question is if I move my criteria to a where clause either in or out of the subquery will it benefit?
SELECT *
FROM (SELECT NULL AS A1,
DFS_ROHEADER.TECHID,
DFS_ROHEADER.RONUMBER,
DFS_ROHEADER.CUSTOMERNUMBER,
DFS_CUSTOMER.BNAME,
DFS_ROHEADER.UNITNUMBER,
DFS_ROHEADER.MILEAGE,
DFS_ROHEADER.OPENEDDATE,
DFS_ROHEADER.CLOSEDDATE,
DFS_ROHEADER.STATUS,
DFS_ROHEADER.PONUMBER,
DFS_TECH.REGION,
DFS_TECH.RSM,
DFS_ROPART.PARTID,
CONVERT(NVARCHAR(max), DFS_RODETAIL.STORY) AS STORY
FROM DFS_ROHEADER
LEFT JOIN DFS_CUSTOMER
ON DFS_ROHEADER.CUSTOMERNUMBER = DFS_CUSTOMER.CUST_NO
LEFT JOIN DFS_TECH
ON DFS_ROHEADER.TECHID = DFS_TECH.TECHID
INNER JOIN DFS_RODETAIL
ON DFS_ROHEADER.RONUMBER = DFS_RODETAIL.RONUMBER
INNER JOIN DFS_ROPART
ON DFS_RODETAIL.RONUMBER = DFS_ROPART.RONUMBER
AND DFS_RODETAIL.LINENUMBER = DFS_ROPART.LINENUMBER
AND DFS_ROHEADER.RONUMBER LIKE '%$FF_RONumber%'
AND DFS_ROHEADER.UNITNUMBER LIKE '%$FF_UnitNumber%'
AND DFS_ROHEADER.PONUMBER LIKE '%$FF_PONumber%'
AND ( DFS_CUSTOMER.BNAME LIKE '%$FF_Customer%'
OR DFS_CUSTOMER.BNAME IS NULL )
AND DFS_ROHEADER.TECHID LIKE '%$FF_TechID%'
AND DFS_ROHEADER.CLOSEDDATE BETWEEN
FF_ClosedBegin AND FF_ClosedEnd
AND ( DFS_TECH.REGION LIKE '%$FilterRegion%'
OR DFS_TECH.REGION IS NULL )
AND ( DFS_TECH.RSM LIKE '%$FF_RSM%'
OR DFS_TECH.RSM IS NULL )
AND DFS_RODETAIL.STORY LIKE '%$FF_Story%'
AND DFS_ROPART.PARTID LIKE '%$FF_PartID%'
WHERE DFS_ROHEADER.DELETED_BY < 0
AND DFS_RODETAIL.DELETED_BY < 0
AND DFS_ROPART.DELETED_BY < 0) T
ORDER BY T.RONUMBER
This query works; however, it can take forever to run, and can timeout. I have other queries that also run in the environment and I will take whatever you can give me in terms of suggestions and apply it to those. I am using SQLServer 2000, Thanks for the help.
EDIT:
Execution Plan:
https://dl.dropboxusercontent.com/u/99733863/ExecutionPlan.sqlplan
UPDATE:
I have come to the conclusion the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments I have to fill grids with limited flexibility and I believe that these grids fill by first filling a temporary grid with the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help.
I have come to the conclusion the environment I'm working in is the cause of the problem. My query works as intended and is not slow at all (1 sec. for 18,000 rows). As stated in the comments I have to fill grids with limited flexibility and I believe that these grids fill by first filling a temporary grid with the SQL statement and then copying row by row into the desired grid. There is a good chance that this is the cause of my issues. Thanks for the help everyone.
My 2 cents here.. In general LIKE is not very well optimized. In your case you also seem to be using LIKE with '%value%'. In that case the query optimizer has to scan the entire index. At a minimum I would see if there is a way to avoid using this.

Filtering simultaneously on count of related objects and on count of related objects that satisfy a condition in Django

So I have models amounting to this (very simplified, obviously):
class Mystery(models.Model):
name = models.CharField(max_length=100)
class Character(models.Model):
mystery = models.ForeignKey(Mystery, related_name="characters")
required = models.BooleanField(default=True)
Basically, in each mystery there are a number of characters, which can be essential to the story or not. The minimum number of actors that can stage a mystery is the number of required characters for that mystery; the maximum number is the number of characters total for the mystery.
Now I'm trying to query for mysteries that can be played by some given number of actors. It seemed straightforward enough using the way Django's filtering and annotation features function; after all, both of these queries work fine:
# Returns mystery objects with at least x characters in all
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(max_actors__gte=x)
# Returns mystery objects with no more than x required characters
Mystery.objects.filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x)
However, when I try to combine the two...
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x, max_actors__gte=x)
...it doesn't work. Both min_actors and max_actors come out containing the maximum number of actors. The relevant parts of the actual query being run look like this:
SELECT `mysteries_mystery`.`id`,
`mysteries_mystery`.`name`,
COUNT(DISTINCT `mysteries_character`.`id`) AS `max_actors`,
COUNT(DISTINCT `mysteries_character`.`id`) AS `min_actors`
FROM `mysteries_mystery`
LEFT OUTER JOIN `mysteries_character` ON (`mysteries_mystery`.`id` = `mysteries_character`.`mystery_id`)
INNER JOIN `mysteries_character` T5 ON (`mysteries_mystery`.`id` = T5.`mystery_id`)
WHERE T5.`required` = True
GROUP BY `mysteries_mystery`.`id`, `mysteries_mystery`.`name`
...which makes it clear that while Django is creating a second join on the character table just fine (the second copy of the table being aliased to T5), that table isn't actually being used anywhere and both of the counts are being selected from the non-aliased version, which obviously yields the same result both times.
Even when I try to use an extra clause to select from T5, I get told there is no such table as T5, even as examining the output query shows that it's still aliasing the second character table to T5. Another attempt to do this with extra clauses went like this:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).extra(select={'min_actors': "SELECT COUNT(*) FROM mysteries_character WHERE required = True AND mystery_id = mysteries_mystery.id"}).extra(where=["`min_actors` <= %s", "`max_actors` >= %s"], params=[x, x])
But that didn't work because I can't use a calculated field in the WHERE clause, at least on MySQL. If only I could use HAVING, but alas, Django's .extra() does not and will never allow you to set HAVING parameters.
Is there any way to get Django's ORM to do what I want?
How about combining your Count()s:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True),min_actors=Count('characters', distinct=True)).filter(characters__required=True).filter(min_actors__lte=x, max_actors__gte=x)
This seems to work for me but I didn't test it with your exact models.
It's been a couple of weeks with no suggested solutions, so here's how I ended up going about it, for anyone else who might be looking for an answer:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(max_actors__gte=x, id__in=Mystery.objects.filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x).values('id'))
In other words, filter on the first count and on IDs that match those in an explicit subquery that filters on the second count. Kind of clunky, but it works well enough for my purposes.

Rails SQL query optimization

I have a featured section in my website that contains featured posts of three types: normal, big and small. Currently I am fetching the three types in three separate queries like so:
#featured_big_first = Post.visible.where(pinged: 1).where('overlay_type =?', :big).limit(1)
#featured_big_first = Post.visible.where(pinged: 1).where('overlay_type =?', :small).limit(1)
#featured_big_first = Post.visible.where(pinged: 1).where('overlay_type =?', :normal).limit(5)
Basically I am looking for a query that will combine those three in to one and fetch 1 big, 1 small, 5 normal posts.
I'd be surprised if you don't want an order. As you have it, it is supposed to find a random small, random large, and 5 random normal.
Yes, you can use a UNION. However, you will have to do an execute SQL. Look at the log for the SQL for each of your three queries, and do an execute SQL of a string which is each of the three queries with UNION in between. It might work, or it might have problems with the limit.
It is possible in SQL by joining the table to itself, doing a group by on one of the aliases for the table, a where when the other aliased table is <= the group by table, and adding a having clause where count of the <= table is under the limit.
So, if you had a simple query of the posts table (without the visible and pinged conditions) and wanted the records with the latest created_at date, then the normal query would be:
SELECT posts1.*
FROM posts posts1, posts posts2
WHERE posts2.created_at >= posts1.create_at
AND posts1.overlay_type = 'normal'
AND posts2.overlay_type = 'normal'
GROUP BY posts1.id
HAVING count(posts2.id) <= 5
Take this SQL, and add your conditions for visible and pinged, remembering to use the condition for both posts1 and posts2.
Then write the big and small versions and UNION it all together.
I'd stick with the three database calls.
I don't think this is possible but you can use scope which is more rails way to write a code
Also it may just typo but you are reassigning the #featured_big_first so it will contain the data of the last query only
in post.rb
scope :overlay_type_record lamda{|type| joins(:visible).where(["visible.pinged=1 AND visible.overlay_type =", type])}
and in controller
#featured_big_first = Post.overlay_type_record(:big).limit(1)
#featured_small_first = Post.overlay_type_record(:small).limit(1)
#featured_normal_first = Post.overlay_type_record(:normal).limit(5)