Fastest way to count from a subquery - sql

I have the following query to return a list of current employees and the number of 'corrections' they have. This is working correctly but is very slow.
I was previously not using a subquery, instead opting for a count (from...) as an aggregate subselect but I have read that a subquery should be much faster. Changing to the code to the below did improve performance but not anywhere near what I was expecting.
SELECT DISTINCT
tblStaff.StaffID, CorrectionsOut.Count AS CorrectionsAssigned
FROM tblStaff
LEFT JOIN tblMeetings ON tblMeetings.StaffID = tblStaff.StaffID
JOIN tblTasks ON tblTasks.TaskID = tblMeetings.TaskID
--Get Corrections Issued
LEFT JOIN(
SELECT
COUNT(DISTINCT tblMeetings.TaskID) AS Count, tblMeetings.StaffID
FROM tblRegister
JOIN tblMeetings ON tblRegister.MeetingID = tblMeetings.MeetingID
WHERE tblRegister.FDescription IS NOT NULL
AND tblRegister.CorrectionOutDate IS NULL
GROUP BY tblMeetings.StaffID
) AS CorrectionsOut ON CorrectionsOut.StaffID = tblStaff.StaffID
WHERE tblStaff.CurrentEmployee = 1
I need an open vendor solution as we are transitioning from SQL Server to Postgres. Note this is a simplified example of the query where there are quite few counts. My current query time without the counts is less than half a second, but with the counts, is approx 20 seconds, if it runs at all without locking or otherwise failing.

I would get rid of the joins that you are not using which probably makes the SELECT DISTINCT unnecessary as well:
SELECT s.StaffID, co.Count AS CorrectionsAssigned
FROM tblStaff s LEFT JOIN
(SELECT COUNT(DISTINCT m.TaskID) AS Count, m.StaffID
FROM tblRegister r
tblMeetings m
ON r.MeetingID = m.MeetingID
WHERE r.FDescription IS NOT NULL AND
r.CorrectionOutDate IS NULL
GROUP BY m.StaffID
) co
ON co.StaffID = s.StaffID
WHERE s.CurrentEmployee = 1;
Getting rid of the SELECT DISTINCT and the duplicate rows added by the tasks should help performance.
For additional benefit, you would want to be sure you have indexes on the JOIN keys, and perhaps on the filtering criteria.

Related

Select statement for 1 table returns new rows then the table actually have

Update: the issue was in saving results into a different table. Apologies, this question should be deleted.
I got this query:
SELECT DISTINCT
SubscriberKey,
'True' as Email_Opens
FROM LN_Journey_21
WHERE SubscriberKey in(
SELECT
LN.SubscriberKey
FROM
_Job J
join _Open O on J.JobID = O.JobID
join LN_Journey_21 LN on LN.SubscriberKey = O.SubscriberKey
WHERE
J.EmailName LIKE 'IQOS_LN%'
and j.CreatedDate >= '2021-05-10'
)
SubscriberKey is a PK in LN_Journey_21.
The results have more rows than LN_Journey_21 had before running the query, how is that?
The query should be (most importantly you don't need DISTINCT anywhere):
SELECT SubscriberKey,
'True' as Email_Opens
FROM dbo.LN_Journey_21 AS LN
WHERE EXISTS
(
SELECT 1
FROM dbo._Job AS J
INNER JOIN dbo._Open AS O
ON J.JobID = O.JobID
WHERE O.SubscriberKey = LN.SubscriberKey
AND J.EmailName LIKE 'IQOS_LN%'
AND J.CreatedDate >= '20210510'
);
Extra rows could be explained by:
different query than what's posted in your question
COUNT query is more complex than just SELECT COUNT(*) FROM dbo.table;
data has actually changed between when you ran the COUNT query and when you ran this query
using NOLOCK (perhaps you've hidden it from us, or it's used on your COUNT query, or both)
you are relying on the status bar in SSMS, which shows total rows for the batch by default, and you other queries that return those additional 500 rows
Like the comments suggest, it would be great if you could show a scenario (e.g. on db<>fiddle where COUNT produces fewer rows than this query. With the information we have so far, it's not possible, except for situations like those I mentioned above (that list may not be exhaustive, but probably the most common).

SELECT FROM inner query slowdown

We have two very similar queries, one takes 22 seconds the other takes 6 seconds. Both use an inner select, have the exact same outer columns and outer joins. The only difference is the inner select that the outer query is using to join in on.
The inner query when run alone executes in 100ms or less in both cases and returns the EXACT SAME data.
Both queries as a whole have a lot of room for improvement, but this particular oddity is really puzzling to us and we just want to understand why. To me it would seem the inner query should be executed once in 100ms then the outer stuff happens. I have a feeling the inner select may be executed multiple times.
Query that takes 6 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
ORDER BY projectItemsID ASC
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
Query that takes 22 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
For every row in your projectItems table, in the second function, you search two columns instead of one. If projectItemsID isn't the primary key or if it isn't indexed, it takes longer to parse an extra column.'
If you look at the sizes of the tables and the number of rows each query returns, you can calculate how many comparisons need to be made for each of the queries.
I believe that you're right that the inner query is being run for every single row that is being left joined with categories.
I can't find a proper source on it right now, but you can easily test this by doing something like this and comparing the run times. Here, we can at least be sure that the inner query is only running one time. (sorry if any syntax is incorrect, but you'll get the general idea):
DECLARE #innerQuery TABLE ( [all inner query columns here] )
INSERT INTO #innerQuery
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539
SELECT {whole bunch of field names}
FROM #innerQuery as IQ
LEFT JOIN categories
ON IQ.fk_category = categories.categoryID
...{more joins}

SQL Performance: SELECT DISTINCT versus GROUP BY

I have been trying to improve query times for an existing Oracle database-driven application that has been running a little sluggish. The application executes several large queries, such as the one below, which can take over an hour to run. Replacing the DISTINCT with a GROUP BY clause in the query below shrank execution time from 100 minutes to 10 seconds. My understanding was that SELECT DISTINCT and GROUP BY operated in pretty much the same way. Why such a huge disparity between execution times? What is the difference in how the query is executed on the back-end? Is there ever a situation where SELECT DISTINCT runs faster?
Note: In the following query, WHERE TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A' represents just one of a number of ways that results can be filtered. This example was provided to show the reasoning for joining all of the tables that do not have columns included in the SELECT and would result in about a tenth of all available data
SQL using DISTINCT:
SELECT DISTINCT
ITEMS.ITEM_ID,
ITEMS.ITEM_CODE,
ITEMS.ITEMTYPE,
ITEM_TRANSACTIONS.STATUS,
(SELECT COUNT(PKID)
FROM ITEM_PARENTS
WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID
) AS CHILD_COUNT
FROM
ITEMS
INNER JOIN ITEM_TRANSACTIONS
ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
AND ITEM_TRANSACTIONS.FLAG = 1
LEFT OUTER JOIN ITEM_METADATA
ON ITEMS.ITEM_ID = ITEM_METADATA.ITEM_ID
LEFT OUTER JOIN JOB_INVENTORY
ON ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID
LEFT OUTER JOIN JOB_TASK_INVENTORY
ON JOB_INVENTORY.JOB_ITEM_ID = JOB_TASK_INVENTORY.JOB_ITEM_ID
LEFT OUTER JOIN JOB_TASKS
ON JOB_TASK_INVENTORY.TASKID = JOB_TASKS.TASKID
LEFT OUTER JOIN JOBS
ON JOB_TASKS.JOB_ID = JOBS.JOB_ID
LEFT OUTER JOIN TASK_INVENTORY_STEP
ON JOB_INVENTORY.JOB_ITEM_ID = TASK_INVENTORY_STEP.JOB_ITEM_ID
LEFT OUTER JOIN TASK_STEP_INFORMATION
ON TASK_INVENTORY_STEP.JOB_ITEM_ID = TASK_STEP_INFORMATION.JOB_ITEM_ID
WHERE
TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
ORDER BY
ITEMS.ITEM_CODE
SQL using GROUP BY:
SELECT
ITEMS.ITEM_ID,
ITEMS.ITEM_CODE,
ITEMS.ITEMTYPE,
ITEM_TRANSACTIONS.STATUS,
(SELECT COUNT(PKID)
FROM ITEM_PARENTS
WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID
) AS CHILD_COUNT
FROM
ITEMS
INNER JOIN ITEM_TRANSACTIONS
ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
AND ITEM_TRANSACTIONS.FLAG = 1
LEFT OUTER JOIN ITEM_METADATA
ON ITEMS.ITEM_ID = ITEM_METADATA.ITEM_ID
LEFT OUTER JOIN JOB_INVENTORY
ON ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID
LEFT OUTER JOIN JOB_TASK_INVENTORY
ON JOB_INVENTORY.JOB_ITEM_ID = JOB_TASK_INVENTORY.JOB_ITEM_ID
LEFT OUTER JOIN JOB_TASKS
ON JOB_TASK_INVENTORY.TASKID = JOB_TASKS.TASKID
LEFT OUTER JOIN JOBS
ON JOB_TASKS.JOB_ID = JOBS.JOB_ID
LEFT OUTER JOIN TASK_INVENTORY_STEP
ON JOB_INVENTORY.JOB_ITEM_ID = TASK_INVENTORY_STEP.JOB_ITEM_ID
LEFT OUTER JOIN TASK_STEP_INFORMATION
ON TASK_INVENTORY_STEP.JOB_ITEM_ID = TASK_STEP_INFORMATION.JOB_ITEM_ID
WHERE
TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
GROUP BY
ITEMS.ITEM_ID,
ITEMS.ITEM_CODE,
ITEMS.ITEMTYPE,
ITEM_TRANSACTIONS.STATUS
ORDER BY
ITEMS.ITEM_CODE
Here is the Oracle query plan for the query using DISTINCT:
Here is the Oracle query plan for the query using GROUP BY:
The performance difference is probably due to the execution of the subquery in the SELECT clause. I am guessing that it is re-executing this query for every row before the distinct. For the group by, it would execute once after the group by.
Try replacing it with a join, instead:
select . . .,
parentcnt
from . . . left outer join
(SELECT PARENT_ITEM_ID, COUNT(PKID) as parentcnt
FROM ITEM_PARENTS
) p
on items.item_id = p.parent_item_id
I'm fairly sure that GROUP BY and DISTINCT have roughly the same execution plan.
The difference here since we have to guess (since we don't have the explain plans) is IMO that the inline subquery gets executed AFTER the GROUP BY but BEFORE the DISTINCT.
So if your query returns 1M rows and gets aggregated to 1k rows:
The GROUP BY query would have run the subquery 1000 times,
Whereas the DISTINCT query would have run the subquery 1000000 times.
The tkprof explain plan would help demonstrate this hypothesis.
While we're discussing this, I think it's important to note that the way the query is written is misleading both to the reader and to the optimizer: you obviously want to find all rows from item/item_transactions that have a TASK_INVENTORY_STEP.STEP_TYPE with a value of "TYPE A".
IMO your query would have a better plan and would be more easily readable if written like this:
SELECT ITEMS.ITEM_ID,
ITEMS.ITEM_CODE,
ITEMS.ITEMTYPE,
ITEM_TRANSACTIONS.STATUS,
(SELECT COUNT(PKID)
FROM ITEM_PARENTS
WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID) AS CHILD_COUNT
FROM ITEMS
JOIN ITEM_TRANSACTIONS
ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
AND ITEM_TRANSACTIONS.FLAG = 1
WHERE EXISTS (SELECT NULL
FROM JOB_INVENTORY
JOIN TASK_INVENTORY_STEP
ON JOB_INVENTORY.JOB_ITEM_ID=TASK_INVENTORY_STEP.JOB_ITEM_ID
WHERE TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
AND ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID)
In many cases, a DISTINCT can be a sign that the query is not written properly (because a good query shouldn't return duplicates).
Note also that 4 tables are not used in your original select.
The first thing that should be noted is the use of Distinct indicates a code smell, aka anti-pattern. It generally means that there is a missing join or an extra join that is generating duplicate data. Looking at your query above, I am guessing that the reason why group by is faster (without seeing the query), is that the location of the group by reduces the number of records that end up being returned. Whereas distinct is blowing out the result set and doing row by row comparisons.
Update to approach
Sorry, I should have been more clear. Records are generated when
users perform certain tasks in the system, so there is no schedule. A
user could generate a single record in a day or hundreds per-hour. The
important things is that each time a user runs a search, up-to-date
records must be returned, which makes me doubtful that a materialized
view would work here, especially if the query populating it would take
long to run.
I do believe this is the exact reason to use a materialized view. So the process would work this way. You take the long running query as the piece that builds out your materialized view, since we know the user only cares about "new" data after they perform some arbitrary task in the system. So what you want to do is query against this base materialized view, which can be refreshed constantly on the back-end, the persistence strategy involved should not choke out the materialized view (persisting a few hundred records at a time won't crush anything). What this will allow is Oracle to grab a read lock (note we don't care how many sources read our data, we only care about writers). In the worst case a user will have "stale" data for microseconds, so unless this is a financial trading system on Wall Street or a system for a nuclear reactor, these "blips" should go unnoticed by even the most eagle eyed users.
Code example of how to do this:
create materialized view dept_mv FOR UPDATE as select * from dept;
Now the key to this is as long as you don' t invoke refresh you won't lose any of the persisted data. It will be up to you to determine when you want to "base line" your materialized view again (midnight perhaps?)
You should use GROUP BY to apply aggregate operators to each group and DISTINCT if you only need to remove duplicates.
I think the performance is the same.
In your case i think you should use GROUP BY.

Is there a way to get different results for the same SQL query if the data stays the same?

I get a different result set for this query intermittently when I run it...sometimes it gives 1363, sometimes 1365 and sometimes 1366 results. The data doesn't change. What could be causing this and is there a way to prevent it? Query looks something like this:
SELECT *
FROM
(
SELECT
RC.UserGroupId,
RC.UserGroup,
RC.ClientId AS CLID,
CASE WHEN T1.MultipleClients = 1 THEN RC.Salutation1 ELSE RC.DisplayName1 END AS szDisplayName,
T1.MultipleClients,
RC.IsPrimaryRecord,
RC.RecordTypeId,
RC.ClientTypeId,
RC.ClientType,
RC.IsDeleted,
RC.IsCompany,
RC.KnownAs,
RC.Salutation1,
RC.FirstName,
RC.Surname,
Relationship,
C.DisplayName Client,
RC.DisplayName RelatedClient,
E.Email,
RC.DisplayName + ' is the ' + R.Relationship + ' of ' + C.DisplayName Description,
ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id
FROM
SSDS.Client.ClientExtended C
INNER JOIN
SSDS.Client.ClientRelationship R WITH (NOLOCK)ON C.ClientId = R.ClientID
INNER JOIN
SSDS.Client.ClientExtended RC WITH (NOLOCK)ON R.RelatedClientId = RC.ClientId
LEFT OUTER JOIN
SSDS.Client.Email E WITH (NOLOCK)ON RC.ClientId = E.ClientId
LEFT OUTER JOIN
SSDS.Client.UserDefinedData UD WITH (NOLOCK)ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId
INNER JOIN
(
SELECT
E.Email,
CASE WHEN (COUNT(DISTINCT RC.DisplayName) > 1) THEN 1 ELSE 0 END AS MultipleClients
FROM
SSDS.Client.ClientExtended C
INNER JOIN
SSDS.Client.ClientRelationship R WITH (NOLOCK)ON C.ClientId = R.ClientID
INNER JOIN
SSDS.Client.ClientExtended RC WITH (NOLOCK)ON R.RelatedClientId = RC.ClientId
LEFT OUTER JOIN
SSDS.Client.Email E WITH (NOLOCK)ON RC.ClientId = E.ClientId
LEFT OUTER JOIN
SSDS.Client.UserDefinedData UD WITH (NOLOCK)ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId
WHERE
Relationship IN ('z-Group Principle', 'z-Group Member ')
AND E.Email IS NOT NULL
GROUP BY E.Email
) T1 ON E.Email = T1.Email
WHERE
Relationship IN ('z-Group Principle', 'z-Group Member ')
AND E.Email IS NOT NULL
) T
WHERE
sequence_id = 1
AND T.UserGroupId IN (Select * from iCentral.dbo.GetSubUserGroups('471b9cbd-2312-4a8a-bb20-35ea53d30340',0))
AND T.IsDeleted = 0
AND T.RecordTypeId = 1
AND T.ClientTypeId IN
(
'1', --Client
'-1652203805' --NTU
)
AND T.CLID NOT IN
(
SELECT DISTINCT
UDDF.CLID
FROM
SLacsis_SLM.dbo.T_UserDef UD WITH (NOLOCK)
INNER JOIN
SLacsis_SLM.dbo.T_UserDefData UDDF WITH (NOLOCK)
ON UD.UserDef_ID = UDDF.UserDef_ID
INNER JOIN
SLacsis_SLM.dbo.T_Client CLL WITH (NOLOCK)
ON CLL.CLID = UDDF.CLID AND CLL.UserGroup_CLID = UD.UserID
WHERE
UD.UserDef_ID in
(
'F68F31CE-525B-4455-9D50-6DA77C66FEE5',
'A7CECB03-866C-4F1F-9E1A-CEB09474FE47'
)
AND UDDF.Data = 'NO'
)
ORDER BY T.Surname
EDIT:
I have removed all NOLOCK's (including the ones in views and UDFs) and I'm still having the same issue. I get the same results every time for the nested select (T) and if I put the result set of T into a temp table in the beginning of the query and join onto the temp table instead of the nested select then the final result set is the same every time I run the query.
EDIT2:
I have been doing some more reading on ROW_NUMBER()...I'm partitioning by email (of which there are duplicates) and ordering by Relationship (where there are only 1 of 2 relationships). Could this cause the query to be non-deterministic and would there be a way to fix that?
EDIT3:
Here are the actual execution plans if anyone is interested http://www.mediafire.com/?qo5gkh5dftxf0ml. Is it possible to see that it is running as read committed from the execution plan? I've compared the files using WinMerge and the only differences seem to be the counts (ActualRows="").
EDIT4:
This works:
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY B.Email ORDER BY Relationship DESC) AS sequence_id
FROM
(
SELECT DISTINCT
RC.UserGroupId,
...
) B...
EDIT5:
When running the same ROW_NUMBER() query (T in the original question, just selecting RC.DisplayName and ROW_NUMBER) twice in a row I get different rank for some people:
Does anyone have a good explanation/example of why or how ROW_NUMBER() over a result set that contains duplicates can rank differently each time it is run and ultimately change the number of results?
EDIT6:
Ok I think this makes sense to me now. This occurs when people 2 people have the same email address (e.g a husband and wife pair) and relationship. I guess in this case their ROW_NUMBER() ranking is arbitrary and can change every time it is run.
Your use of NOLOCK all over means you are doing dirty reads and will see uncommitted data, data that will be rolled back, transient and inconsistent data etc
Take these off, try again, report back pleas
Edit: some options with NOLOCKS removed
Data is really changing
Some parameter or filter is changing (eg GETDATE)
Some float comparisons running on different cores each time
See this on dba.se https://dba.stackexchange.com/q/4810/630
Embedded NOLOCKs in udfs or views (eg iCentral.dbo.GetSubUserGroups)
...
As I said yesterday in the comments the row numbering for rows with duplicate E.Email, Relationship values will be arbitrary.
To make it deterministic you would need to do PARTITION BY B.Email ORDER BY Relationship DESC, SomeUniqueColumn . Interesting that it changes between runs though using the same execution plan. I assume this is a consequence of the hash join.
I think your problem is the first row over the partition is not deterministic. I suspect that Email and Relationship is not unique.
ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id
Later you examine the first row of the partition.
WHERE T.sequence_id = 1
AND T.UserGroupId ...
If that first row is arbitrary then you are going to get an arbitrary where comparison. You need to add to the ORDER BY to include a complete unique key. If there is no unique key then you need to make one or live with arbitrary results. Even on a table with a clustered PK the select row order is not guaranteed unless the entire PK is in the sort clause.
This probably has to do with ordering. You have a sequence_id defined as a row_number ordered by Relationship. You'll always get a sensible order by relationship, but other than that your row_number will be random. So you can get different rows with sequence_id 1 each time. That in turn will affect your where clause, and you can get different numbers of results. To fix this to get a consistent result, add another field to your row_number's order by. Use a primary key to be certain of consistent results.
There's a recent KB that addresses problems with ROW_NUMBER() ... see FIX: You receive an incorrect result when you run a query that uses the row_number function in SQL Server 2008 for the details.
However this KB indicates that it's a problem when parallelism is invoked for execution, and looking at your execution plans I can't see this kicking in. But the fact that MS have found a problem with it in one situation makes me a little bit wary - i.e., could the same issue occur for a sufficiently complicated query (and your execution plan does look sufficiently large).
So it may be worth checking your patch levels of SQL Server 2008.
U Must only use
Order by
without prtition by.
ROW_NUMBER() OVER (ORDER BY Relationship DESC) AS sequence_id

optimize SQL query

What more can I do to optimize this query?
SELECT * FROM
(SELECT `item`.itemID, COUNT(`votes`.itemID) AS `votes`,
`item`.title, `item`.itemTypeID, `item`.
submitDate, `item`.deleted, `item`.ItemCat,
`item`.counter, `item`.userID, `users`.name,
TIMESTAMPDIFF(minute,`submitDate`,NOW()) AS 'timeMin' ,
`myItems`.userID as userIDFav, `myItems`.deleted as myDeleted
FROM (votes `votes` RIGHT OUTER JOIN item `item`
ON (`votes`.itemID = `item`.itemID))
INNER JOIN
users `users`
ON (`users`.userID = `item`.userID)
LEFT OUTER JOIN
myItems `myItems`
ON (`myItems`.itemID = `item`.itemID)
WHERE (`item`.deleted = 0)
GROUP BY `item`.itemID,
`votes`.itemID,
`item`.title,
`item`.itemTypeID,
`item`.submitDate,
`item`.deleted,
`item`.ItemCat,
`item`.counter,
`item`.userID,
`users`.name,
`myItems`.deleted,
`myItems`.userID
ORDER BY `item`.itemID DESC) as myTable
where myTable.userIDFav = 3 or myTable.userIDFav is null
limit 0, 20
I'm using MySQL
Thanks
What does the analyzer say for this query? Without knowledge about how many rows there are in the table you cant tell any optimization. So run the analyzer and you'll see what parts costs what.
Of course, as #theomega said, look at the execution plan.
But I'd also suggest to try and "clean up" your statement. (I don't know which one is faster - that depends on your table sizes.) Usually, I'd try to start with a clean statement and start optimizing from there. But typically, a clean statement makes it easier for the optimizer to come up with a good execution plan.
So here are some observations about your statement that might make things slow:
a couple of outer joins (makes it hard for the optimzer to figure out an index to use)
a group by
a lot of columns to group by
As far as I understand your SQL, this statement should do most of what yours is doing:
SELECT `item`.itemID, `item`.title, `item`.itemTypeID, `item`.
submitDate, `item`.deleted, `item`.ItemCat,
`item`.counter, `item`.userID, `users`.name,
TIMESTAMPDIFF(minute,`submitDate`,NOW()) AS 'timeMin'
FROM (item `item` INNER JOIN users `users`
ON (`users`.userID = `item`.userID)
WHERE
Of course, this misses the info from the tables you outer joined, I'd suggest to try to add the required columns via a subselect:
SELECT `item`.itemID,
(SELECT count (itemID)
FROM votes v
WHERE v.itemID = 'item'.itemID) as 'votes', <etc.>
This way, you can get rid of one outer join and the group by. The outer join is replaced by the subselect, so there is a trade-off which may be bad for the "cleaner" statement.
Depending on the cardinality between item and myItems, you can do the same or you'd have to stick with the outer join (but no need to reintroduce the group by).
Hope this helps.
Some quick semi-random thoughts:
Are your itemID and userID columns indexed?
What happens if you add "EXPLAIN " to the start of the query and run it? Does it use indexes? Are they sensible?
DO you need to run the whole inner query and filter on it, or could you put move the where myTable.userIDFav = 3 or myTable.userIDFav is null part into the inner query?
You do seem to have too many fields in the Group By list, since one of them is itemID, I suspect that you could use an inner SELECT to preform the grouping and an outer SELECT to return the set of fields desired.
Can't you add the where clause myTable.userIDFav = 3 or myTable.userIDFav is null to WHERE (item.deleted = 0)?
Regards
Lieven
Look at the way your query is built. You join a lot of stuff, then limit the output to 20 rows. You should have the outer join on items and myitems, since your conditions only apply to these two tables, limit the output to the first 20 rows, then join and aggregate. Here you are performing a lot of work that is going to be discarded.