Avoiding a problematic nested query in MySQL - sql

I have this SQL query which, due to my own lack of knowledge and MySQL's problems handling nested queries, is really slow to process. The query is:
SELECT DISTINCT PrintJobs.UserName
FROM PrintJobs
LEFT JOIN Printers
ON PrintJobs.PrinterName = Printers.PrinterName
WHERE Printers.PrinterGroup
IN (
SELECT DISTINCT Printers.PrinterGroup
FROM PrintJobs
LEFT JOIN Printers
ON PrintJobs.PrinterName = Printers.PrinterName
WHERE PrintJobs.UserName='<username/>'
);
I would like to avoid splitting this into two queries and inserting the values of the subquery into the main query programmatically.

This is probably not exactly what you are looking for, but I will contribute my 2 cents. First off, you should show us your schema and exactly what you are trying to accomplish with that query. From the looks of it, however, you are not using numeric IDs in the tables and are instead joining on varchar fields, which is not a good idea performance-wise. Also, I am not sure why you are doing:
(select PrinterName, UserName
from PrintJobs) AS Table1
instead of just joining on PrintJobs? Similar stuff for this one:
(select
PrinterName,
PrinterGroup
from Printers) as Table1
Maybe I am just not seeing it right. I would recommend that you simplify the query as much as possible and try it. Also tell us exactly what you are hoping to accomplish with the query and give us some schema to work with.
Removed the bad query from the answer.

This query is pretty messed up. I am not sure if this will handle everything you need, but simplifying it like this removes all the nested queries and is way faster. You can also use the EXPLAIN command to see how MySQL will execute your query.
SELECT DISTINCT PrintJobs.UserName
FROM PrintJobs
LEFT JOIN Printers ON PrintJobs.PrinterName = Printers.PrinterName
AND Printers.Username = '<username/>'
;
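For comparison, here is a minimal sketch of one way to express the original nested query as plain joins, so that the "users who printed to any printer in the same group as this user" logic is kept. It is not taken from either answer. It runs on SQLite through Python's sqlite3 module purely so it can be executed anywhere; the sample rows, the index names and the table aliases (mine, my_printers, group_printers, others) are assumptions made up for illustration, and the speed on a real MySQL server would have to be checked with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Printers  (PrinterName TEXT PRIMARY KEY, PrinterGroup TEXT);
CREATE TABLE PrintJobs (UserName TEXT, PrinterName TEXT);
-- Hypothetical indexes on the columns used for filtering and joining.
CREATE INDEX idx_jobs_user    ON PrintJobs (UserName);
CREATE INDEX idx_jobs_printer ON PrintJobs (PrinterName);
CREATE INDEX idx_printers_grp ON Printers (PrinterGroup);
-- Made-up sample data: p1 and p2 share group A, p3 is in group B.
INSERT INTO Printers  VALUES ('p1', 'A'), ('p2', 'A'), ('p3', 'B');
INSERT INTO PrintJobs VALUES ('alice', 'p1'), ('bob', 'p2'), ('carol', 'p3');
""")

# The query from the question, unchanged apart from the bound parameter.
nested = """
SELECT DISTINCT PrintJobs.UserName
FROM PrintJobs
LEFT JOIN Printers ON PrintJobs.PrinterName = Printers.PrinterName
WHERE Printers.PrinterGroup IN (
    SELECT DISTINCT Printers.PrinterGroup
    FROM PrintJobs
    LEFT JOIN Printers ON PrintJobs.PrinterName = Printers.PrinterName
    WHERE PrintJobs.UserName = ?)
"""

# Join-only form: my jobs -> my printers -> printers in the same group -> their jobs.
joined = """
SELECT DISTINCT others.UserName
FROM PrintJobs AS mine
JOIN Printers  AS my_printers    ON my_printers.PrinterName = mine.PrinterName
JOIN Printers  AS group_printers ON group_printers.PrinterGroup = my_printers.PrinterGroup
JOIN PrintJobs AS others         ON others.PrinterName = group_printers.PrinterName
WHERE mine.UserName = ?
"""

def users_sharing_group(sql, user):
    """Run one of the two queries and return the user names, sorted."""
    return sorted(row[0] for row in conn.execute(sql, (user,)))

nested_result = users_sharing_group(nested, "alice")
joined_result = users_sharing_group(joined, "alice")
print(nested_result, joined_result)
```

Both forms return the same users on this sample data; whether the join form is actually faster depends on the server version and the indexes available.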

Related

SQL query efficiency of subquery in select vs inner join

I have a query with the following structure:
SELECT
Id,
(SELECT COUNT(1) AS [A1]
FROM [dbo].Table2 AS [Extent4]
WHERE (Table1.Id = [Extent4].Id2)) AS [C1]
FROM TPO_User
This query structure is usually used by LINQ as opposed to the following structure:
SELECT Id
FROM Table1
LEFT OUTER JOIN
(SELECT COUNT(1) AS [A1], [Extent4].Id2
FROM [dbo].Table2 AS [Extent4]
GROUP BY [Extent4].Id2) AS [C1] ON C1.Id2 = Table1.Id
When I compare them, the second query has a shorter duration. Could someone explain the exact difference in execution of such a query?
And is it worth it to ever have a subquery in your select statement instead of an inner join?
I would expect both queries to have similar performance characteristics. When doing performance comparisons, you have to be sure you do them correctly. For instance, running two queries in a row is not a good comparison, because the table data has already been loaded into memory by the first.
To really compare the queries, you need a quiescent server and cold caches. That said, the execution plan can be a big help in understanding what is happening.
I would expect the correlated subquery to have good performance with the right indexes. For your example, you want an index on Table2(Id2).
Which has better performance in general? Well, it is simple to devise scenarios where the correlated subquery is better. For instance, if TPO_User has 1 row and Table2 has 1,000,000 rows, then the correlated subquery will be better under almost any circumstances.
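As a rough illustration of the two shapes being compared, the sketch below runs both against a tiny in-memory SQLite database through Python's sqlite3 module and checks that they produce the same counts. The sample rows and the index name idx_table2_id2 are assumptions made up for the example; SQLite stands in for SQL Server here, so this says nothing about timing, only about the results and about where the suggested index on Table2(Id2) goes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Id INTEGER PRIMARY KEY);
CREATE TABLE Table2 (Id2 INTEGER);
-- The index recommended above for the correlated subquery.
CREATE INDEX idx_table2_id2 ON Table2 (Id2);
-- Made-up data: Id 1 has two matches, Id 2 has one, Id 3 has none.
INSERT INTO Table1 VALUES (1), (2), (3);
INSERT INTO Table2 VALUES (1), (1), (2);
""")

# Shape 1: correlated subquery in the SELECT list.
correlated = conn.execute("""
SELECT Id,
       (SELECT COUNT(1) FROM Table2 AS e WHERE e.Id2 = Table1.Id) AS C1
FROM Table1
ORDER BY Id
""").fetchall()

# Shape 2: join to a pre-aggregated derived table.
# COALESCE is needed because unmatched rows come back as NULL, not 0.
grouped = conn.execute("""
SELECT Table1.Id, COALESCE(c.A1, 0) AS C1
FROM Table1
LEFT OUTER JOIN (SELECT Id2, COUNT(1) AS A1
                 FROM Table2
                 GROUP BY Id2) AS c
       ON c.Id2 = Table1.Id
ORDER BY Table1.Id
""").fetchall()

print(correlated)
print(grouped)
```

Note the one semantic difference: without the COALESCE, the join form reports NULL for rows with no match, while the correlated COUNT reports 0.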
In my understanding:
the FROM clause is the definition of the target.
the SELECT clause is the projection (line-by-line) definition.
So the FROM clause loads the data you need into memory, and after that the projection is applied to each row of your result.
So if you run a query (or call a function...) in the SELECT clause, you are saying that you want this sub-job done for each row of your projection. Seems quite heavy ;)
A little source on the execution order of an SQL query: https://www.periscopedata.com/blog/sql-query-order-of-operations
Hope this helps (and please do not hesitate to correct me if I am wrong).
(And if I remember correctly, SQL Server now has an automatic query optimization feature. I think it would make this correction by itself, wouldn't it?)

What's the best way to amalgamate the 2 queries below

I wrote the query below as part of a larger query to create a table. As I'm new to SQL, this was done in a very step-by-step manner so that I could easily understand the individual steps in the query and what each part was doing.
However, I've now been tasked to make the below 2 parts of the query more efficient by joining them together, and this is where I'm struggling.
I feel like I should be creating a single table rather than 2 and that the single table should contain all of the columns/values that I require. However, I am not at all sure of the syntax required to make this happen or the order in which I need to re-write the query.
Is anyone able to offer any advice?
Many thanks
sys_type as (select nvl(dw_start_date,sysdate) date_updated, id, descr
from scd2.scd2_table_a
inner join year_month_period
on 1=1
WHERE batch_end_date BETWEEN dw_start_date and NVL(dw_end_date,sysdate)),
sys_type_2 as (select -1 as sys_typ_id,
'Unassigned' as sys_typ_desc,
sysdate as date_updated
from dual
union
select id as sys_typ_id, descr as sys_typ_desc, date_updated
from sys_type),
Assuming you are using an Oracle database, the queries above seem fine. I don't think you can make them more efficient just by 'joining' them (joining defined very loosely here). Is there a performance issue?
I think you can get better results by tuning your first inline query, 'sys_type'.
You have a Cartesian product there. Do you need that? Why not move the condition from the WHERE clause into the join condition?
Basically
sys_type as (select nvl(dw_start_date,sysdate) date_updated, id, descr
from scd2.scd2_table_a
inner join year_month_period
on (batch_end_date BETWEEN dw_start_date and NVL(dw_end_date,sysdate)))
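To check that moving the condition does not change the result, here is a small sketch that runs both forms side by side. It uses SQLite through Python's sqlite3 module rather than Oracle, so COALESCE and date('now') stand in for NVL and sysdate, the schema prefix scd2. is dropped, and the column types and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical column types and rows; only the column names come from the question.
CREATE TABLE scd2_table_a (id INTEGER, descr TEXT, dw_start_date TEXT, dw_end_date TEXT);
CREATE TABLE year_month_period (batch_end_date TEXT);
INSERT INTO scd2_table_a VALUES
    (1, 'old',     '2020-01-01', '2020-12-31'),
    (1, 'current', '2021-01-01', NULL),
    (2, 'expired', '2019-01-01', '2019-06-30');
INSERT INTO year_month_period VALUES ('2021-03-31');
""")

# Original shape: join on 1=1 (a Cartesian product), then filter in WHERE.
cross_then_filter = conn.execute("""
SELECT COALESCE(dw_start_date, date('now')) AS date_updated, id, descr
FROM scd2_table_a
INNER JOIN year_month_period
        ON 1 = 1
WHERE batch_end_date BETWEEN dw_start_date AND COALESCE(dw_end_date, date('now'))
ORDER BY id, descr
""").fetchall()

# Suggested shape: the same condition used directly as the join condition.
join_on_condition = conn.execute("""
SELECT COALESCE(dw_start_date, date('now')) AS date_updated, id, descr
FROM scd2_table_a
INNER JOIN year_month_period
        ON batch_end_date BETWEEN dw_start_date AND COALESCE(dw_end_date, date('now'))
ORDER BY id, descr
""").fetchall()

print(cross_then_filter)
print(join_on_condition)
```

For an inner join the two forms are logically equivalent, and many optimizers will plan them identically, so any real gain has to be confirmed from the execution plan.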

Creating view, SQL query performance

I am trying to create a view, but a select statement against this view takes more than 15 seconds. How can I make it faster? My query for the view is below.
create view Summary as
select distinct A.Process_date,A.SN,A.New,A.Processing,
COUNT(case when B.type='Sold' and A.status='Processing' then 1 end) as Sold,
COUNT(case when B.type='Repaired' and A.status='Processing' then 1 end) as Repaired,
COUNT(case when B.type='Returned' and A.status='Processing' then 1 end) as Returned
from
(select distinct M.Process_date,M.SN,max(P.enter_date) as enter_date,M.status,
COUNT(case when M.status='New' then 1 end) as New,
COUNT(case when M.status='Processing' and P.cn is null then 1 end) as Processing
from DB1.dbo.Item_details M
left outer join DB2.dbo.track_data P on M.SN=P.SN
group by M.Process_date,M.SN,M.status) A
left outer join DB2.dbo.track_data B on A.SN=B.SN
where A.enter_date=B.enter_date or A.enter_date is null
group by A.Process_date,A.New,A.Processing,A.SN
After creating this view, my select query is:
select distinct process_date,sum(New),sum(Processing),sum(sold),sum(repaired),sum(returned) from Summary where month(process_date)=03 and year(process_date)=2011
Please suggest what changes I should make so the query performs faster.
Thank you
ARB
It is hard to give advice without seeing the actual data and the structure of the tables. I would rewrite the query keeping these principles in mind:
Use an inner join instead of an outer join if possible.
Get rid of the CASE expressions inside the COUNT function. Build the query so that the conditions go in the WHERE clause, not in COUNT.
Try not to use aggregated values in GROUP BY. Currently you group by the aggregated values New and Processing. Group by existing table columns if possible.
If the query gets too complicated, break it into smaller queries and combine the results in the final query. Writing a stored procedure may help in this case.
I hope this helps.
For tuning a database query, I would add a few items to what @Davyd has already listed:
Look at the tables and the indexing on those tables. Adding the right indexes and avoiding the wrong ones always speeds up the query.
Is there anything in the WHERE condition that is not part of any index? At times we put an index on a column and then use a cast or convert on that column in the query, so the underlying index is not effective. You may consider creating the index on the cast/convert of the column.
Look at normal form conformity or over-normalisation.
Good luck.
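The point about functions on an indexed column applies to the month(process_date)=03 and year(process_date)=2011 filter in the question. The sketch below shows the general idea on SQLite through Python's sqlite3 module, where strftime stands in for SQL Server's MONTH and YEAR; the table contents and the index name idx_process_date are made up for the example, and the real plans on SQL Server would need to be checked there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item_details (SN INTEGER, Process_date TEXT);
-- Hypothetical index on the date column used in the filter.
CREATE INDEX idx_process_date ON Item_details (Process_date);
INSERT INTO Item_details VALUES
    (1, '2011-02-28'), (2, '2011-03-01'), (3, '2011-03-31'), (4, '2011-04-01');
""")

# Filter that wraps the column in functions, as in the question.
wrapped = """
SELECT SN FROM Item_details
WHERE strftime('%m', Process_date) = '03'
  AND strftime('%Y', Process_date) = '2011'
"""

# Same month expressed as a half-open range on the bare column.
ranged = """
SELECT SN FROM Item_details
WHERE Process_date >= '2011-03-01' AND Process_date < '2011-04-01'
"""

def plan(sql):
    """Return SQLite's query plan for a statement as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

wrapped_rows = sorted(conn.execute(wrapped).fetchall())
ranged_rows = sorted(conn.execute(ranged).fetchall())
wrapped_plan = plan(wrapped)
ranged_plan = plan(ranged)

print(wrapped_plan)
print(ranged_plan)
```

Both filters select the same rows, but only the range form leaves the column bare so the index on it can be used for the lookup.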
If you are using PostgreSQL, I suggest you use a tool like "http://explain.depesz.com/" to see more clearly which part of your query is slow. Depending on what you get, you could either optimize your indexes or rewrite part of your query. If you are using another database, I'm sure a similar tool exists.
If none of these ideas help, the final solution would be to create a "materialized query". There is plenty of information on the web about this.
Good luck.

SQL: Is a query like this OK or is there a more efficient way of doing it, like using a join?

I often find myself wanting to write an SQL query like the following:
SELECT body
FROM node_revisions
where vid = (SELECT vid
FROM node
WHERE nid = 4);
I know that there are joins and stuff you could do, but they seem to make things more complicated. Are joins a better way to do it? Is it more efficient? Easier to understand?
Joins tend to be more efficient since databases are written with set operations in mind (and joins are set operations).
However, performance will vary from database to database, how the tables are structured, the amount of data in them and how much will be returned by the query.
If the amount of data is small, I would use a subquery like yours rather than a join.
Here is what a join would look like:
SELECT body
FROM node_revisions nr
INNER JOIN node n
ON nr.vid = n.vid
WHERE n.nid = 4
I would not use the query you posted, as there is a chance of more than one node record with nid = 4, which would cause it to fail.
I would use:
SELECT body
FROM node_revisions
WHERE vid IN (SELECT vid
FROM node
WHERE nid = 4);
Is this more readable or understandable? In this case, it's a matter of personal preference.
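A small sketch of the multi-row case, run on SQLite through Python's sqlite3 module with made-up rows in which two node records share nid = 4. One caveat about the stand-in: SQLite does not raise an error for a scalar subquery that returns several rows (it quietly uses the first one), whereas MySQL and SQL Server reject it, so the "=" form here demonstrates a lost row rather than a failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (nid INTEGER, vid INTEGER);
CREATE TABLE node_revisions (vid INTEGER, body TEXT);
-- Made-up data: nid 4 deliberately appears twice.
INSERT INTO node VALUES (4, 10), (4, 11), (5, 12);
INSERT INTO node_revisions VALUES (10, 'first'), (11, 'second'), (12, 'other');
""")

# IN accepts any number of rows from the subquery.
in_rows = sorted(conn.execute("""
SELECT body
FROM node_revisions
WHERE vid IN (SELECT vid FROM node WHERE nid = 4)
""").fetchall())

# "=" expects exactly one value; SQLite silently keeps only the first row.
eq_rows = conn.execute("""
SELECT body
FROM node_revisions
WHERE vid = (SELECT vid FROM node WHERE nid = 4)
""").fetchall()

print(in_rows)       # both revisions for nid 4
print(len(eq_rows))  # only one revision survives on SQLite
```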
I think joins are easier to understand and can be more efficient. Your case is pretty simple, so it is probably a toss-up. Here is how I would write it:
SELECT body
FROM node_revisions
inner join node
on (node_revisions.vid = node.vid)
WHERE node.nid = 4
The answer to any performance-related question in databases is "it depends", and we're short on details in the OP. Knowing no specifics about your situation, these are general rules of thumb:
Joins are better and easier to understand
If for some reason you need multiple column keys (fishy), you can continue to use a join and simply tack on another expression to the join condition.
If in the future you really do need to join auxiliary data, the join framework is already there.
It makes it more clear exactly what you're joining on and where indexes should be implemented.
Use of joins makes you better at joins and better at thinking about joins.
Joins are clear about what tables are in play
Written queries have nothing to do with efficiency*
The queries you write and what actually gets run have little to do with one another. There are many ways to write a query but only so few ways to fetch the data, and it's up to the query engine to decide. This relates mostly to indexes. It's very possible to write four queries that look totally different but internally do the same thing.
(* It's possible to write a horrible query that is inefficient but it takes a special kind of crazy to do that.)
select
body
from node_revisions nr
join node n
on n.vid = nr.vid
where n.nid = 4
A join is interesting:
select body
from node_revisions nr
join node n on nr.vid = n.vid
where n.nid = 4
But you can also express a join without a join [!]:
select body
from node_revisions nr, node n
where n.nid = 4 and nr.vid = n.vid
Interestingly enough, SQL Server gives slightly different query plans for the two queries: where the join has a clustered index scan, the "join without a join" has a clustered index seek in its place, which indicates it's better, at least in this case!
select
body
from node_revisions A
where exists (select 'x'
from Node B
Where A.Vid = B.Vid and B.NID=4)
I don't see anything wrong with what you wrote, and a good optimizer may even change it to a join if it sees fit.
SELECT body
FROM node_revisions
WHERE vid =
(
SELECT vid
FROM node
WHERE nid = 4
)
This query is logically equivalent to a join if and only if nid is a PRIMARY KEY or is covered by a UNIQUE constraint.
Otherwise, the queries are not equivalent: a join will always succeed, while the subquery will fail if there is more than 1 row in node with nid = 4.
If nid is a PRIMARY KEY, then the JOIN and the subquery will have same performance.
In case of a join, node will be made leading
In case of a subquery, the subquery will be executed once and transformed into a const on parsing stage.
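The equivalence claim can be sanity-checked with a quick sketch. It runs on SQLite through Python's sqlite3 module as a convenient stand-in, with nid declared UNIQUE and a few made-up rows, and compares the scalar subquery with the join and EXISTS forms from this thread. It only confirms the results match, not the MySQL execution details described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- nid is UNIQUE here, which is the precondition for the equivalence.
CREATE TABLE node (nid INTEGER UNIQUE, vid INTEGER);
CREATE TABLE node_revisions (vid INTEGER, body TEXT);
INSERT INTO node VALUES (4, 10), (5, 12);
INSERT INTO node_revisions VALUES (10, 'first'), (11, 'orphan'), (12, 'other');
""")

queries = {
    "scalar subquery": """
        SELECT body FROM node_revisions
        WHERE vid = (SELECT vid FROM node WHERE nid = 4)""",
    "inner join": """
        SELECT body FROM node_revisions nr
        JOIN node n ON n.vid = nr.vid
        WHERE n.nid = 4""",
    "exists": """
        SELECT body FROM node_revisions a
        WHERE EXISTS (SELECT 'x' FROM node b
                      WHERE a.vid = b.vid AND b.nid = 4)""",
}

# Run every form and collect its rows under the form's name.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```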
The latest MySQL 6.x code will automatically convert that IN expression into an INNER JOIN using a semi-join subquery optimization, making the 2 statements largely equivalent:
http://forge.mysql.com/worklog/task.php?id=3740
But actually writing it out is pretty simple to do, because INNER JOIN is the default join type, and doing this wouldn't rely on the server optimizing it away (which it might decide not to do for some reason, and which wouldn't necessarily be portable). All things being equal, why not go with:
select body from node_revisions r, node n where r.vid = n.vid and n.nid = 4

Aggregating two selects with a group by in SQL is really slow

I am currently working with a query in MSSQL that looks like:
SELECT
...
FROM
(SELECT
...
)T1
JOIN
(SELECT
...
)T2
GROUP BY
...
The inner selects are relatively fast, but the outer select aggregates the inner selects and takes an incredibly long time to execute, often timing out. Removing the group by makes it run somewhat faster and changing the join to a LEFT OUTER JOIN speeds things up a bit as well.
Why would doing a group by on a select which aggregates two inner selects cause the query to run so slow? Why does an INNER JOIN run slower than a LEFT OUTER JOIN? What can I do to troubleshoot this further?
EDIT: What makes this even more perplexing is that the two inner queries are date limited, and the overall query only runs slow when the date range runs from the start of July to any other day in July, but if the date range runs from any time before July 1 to today, it runs fine.
Without some more detail of your query it's impossible to offer any hints as to what may speed it up. A possible guess is that the two inner queries are blocking access to any indexes that might have been used to perform the join, resulting in large scans, but there are probably many other possible reasons.
To see where the time is spent in the query, check the execution plan; there is a detailed explanation here:
http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx
The basic rundown: run the query, display the execution plan, then look for any large percentages; they are what is slowing your query down.
Try rewriting your query without the nested SELECTs, which are rarely necessary. When using nested SELECTs - except for trivial cases - the inner SELECT resultsets are not indexed, which makes joining them to anything slow.
As Tetraneutron said, post details of your query -- we may help you rewrite it in a straight-through way.
Have you given a join predicate, i.e. JOIN TableB ON TableA.ColA = TableB.ColB? If you don't give a predicate, SQL Server may be forced to use nested loops, so if you have a lot of rows in that range it would explain the slowdown.
Have a look at the plan in the SQL studio if you have MS Sql Server to play with.
After your t2 statement add a join condition on t1.joinfield = t2.joinfield
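To see how much the join condition matters for the amount of work the outer GROUP BY receives, here is a minimal sketch using SQLite through Python's sqlite3 module. The tables t1 and t2, the column joinfield and the row counts are placeholders invented for illustration, since the real inner selects were elided in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (joinfield INTEGER, amount INTEGER);
CREATE TABLE t2 (joinfield INTEGER, label TEXT);
""")
# 100 made-up rows per table, with matching joinfield values.
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(i, i * 10) for i in range(100)])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(i, f"row{i}") for i in range(100)])

# No join condition: every t1 row is paired with every t2 row.
rows_without_predicate = conn.execute("""
SELECT COUNT(*)
FROM (SELECT * FROM t1) AS a
CROSS JOIN (SELECT * FROM t2) AS b
""").fetchone()[0]

# With the join condition, only matching rows reach the outer query.
rows_with_predicate = conn.execute("""
SELECT COUNT(*)
FROM (SELECT * FROM t1) AS a
JOIN (SELECT * FROM t2) AS b
  ON a.joinfield = b.joinfield
""").fetchone()[0]

print(rows_without_predicate, rows_with_predicate)
```

With 100 rows on each side the unconstrained join hands 10,000 rows to the aggregation, while the constrained one hands over 100, and the gap grows quadratically with table size.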
The issue was with fragmented data. After the data was defragmented the query started running within reasonable time constraints.
A JOIN without a predicate is a Cartesian product: every row from one input is paired with every row from the other. It is slow because the inner queries each read their own table quickly, but once they hit the join the result becomes a Cartesian product, which is far more expensive to process. This happens at the outer select statement.
Have a look at INNER JOINs as Tetraneutron recommended.