select s.mname,count(distinct s.actorlist) as act
from p s
group by s.mname
having count(distinct s.actorlist) =
(select max(t.act)
from (
select s1.mname,count(distinct s1.actorlist) as act
from p s1
group by s1.mname
) as t );
The error occurring is: ORA-00907: missing right parenthesis. I am not able to execute the query.
Can someone please help me find the error?
Alas, Oracle does not permit the as keyword for a table alias:
select s.mname,count(distinct s.actorlist) as act
from p s
group by s.mname
having count(distinct s.actorlist) =
(select max(t.act)
from (select s1.mname, count(distinct s1.actorlist) as act
from p s1
group by s1.mname
) t
);
Do note, however: I would recommend using window functions for this purpose, rather than nested subqueries.
That would look like:
select s.*
from (select s.mname, count(distinct s.actorlist) as act,
max(count(distinct s.actorlist)) over () as max_act
from p s
group by s.mname
) s
where act = max_act;
The subquery approach can be simplified as follows:
select mname, count(distinct actorlist) as act
from p
group by mname
having count(distinct actorlist) = ( select max( count(distinct actorlist) )
from p
group by mname
);
Notice how the subquery (in the HAVING condition) works: it groups the rows in table p by mname, then it counts the distinct actorlist values within each group, and then it takes the MAX over all groups (over all mnames).
I don't see the benefit of using aliases here, either for tables or for columns. I also see no point in aliasing a table called p to s - even if you need to fully qualify column names, p.actorlist is perfectly fine. By NOT using aliases, you make it clear to a future developer that the subquery is self-contained (it is not correlated to anything in the outer query - it simply computes a number, in a self-contained manner, and it returns that number to the HAVING clause).
In this case, I am not convinced that analytic functions are needed. (Of course, the GROUP BY solution no longer has different levels of nested subqueries, so the reason to consider analytic functions is no longer there.)
You could avoid reading the base table twice by using DENSE_RANK LAST, but I would only look at that if performance is poor; otherwise, this solution looks pretty clean, easy to explain to anyone, and easy to maintain.
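For reference, that single-pass KEEP (DENSE_RANK LAST) idea would look something like the sketch below (untested here; note that on a tie for the top count it returns only one mname, unlike the HAVING version):
-- single pass over p: rank the mname groups by their distinct-actor count
-- and keep the mname from the last (highest) group
select max(mname) keep (dense_rank last order by count(distinct actorlist)) as mname,
       max(count(distinct actorlist)) as act
from p
group by mname;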
I have a table Users and a table Tasks. Tasks are ordered by importance and are assigned to a user's task list. Tasks have a status: ready or not ready. Now, I want to list all users with their most important task that is also ready.
The interesting requirement is that the tasks for each user first need to be filtered and sorted, and then the most important one selected. This is what I came up with:
SELECT Users.name,
(SELECT *
FROM (SELECT Tasks.description
FROM Tasks
WHERE Tasks.taskListCode = Users.taskListCode AND Tasks.isReady
ORDER BY Tasks.importance DESC)
WHERE rownum = 1
) AS nextTask
FROM Users
However, this results in the error
ORA-00904: "Users"."taskListCode": invalid identifier
I think the reason is that Oracle does not support correlated subqueries with more than one level of depth. However, I need two levels so that I can do the WHERE rownum = 1.
I also tried it without a correlated subquery:
SELECT Users.name, Task.description
FROM Users
LEFT JOIN Tasks nextTask ON
nextTask.taskListCode = Users.taskListCode AND
nextTask.importance = MAX(
SELECT tasks.importance
FROM tasks
WHERE tasks.isReady
GROUP BY tasks.id
)
This results in the error
ORA-00934: group function is not allowed here
How would I solve the problem?
One work-around for this uses keep:
SELECT u.name,
(SELECT MAX(t.description) KEEP (DENSE_RANK FIRST ORDER BY T.importance DESC)
FROM Tasks t
WHERE t.taskListCode = u.taskListCode AND t.isReady
) as nextTask
FROM Users u;
Please try with an analytic function:
with tp as (select t.*, row_number() over (partition by taskListCode order by importance desc) r
            from tasks t
            where isReady = 1 /* or 'Y', or whatever the positive value is here */)
select u.name, tp.description
from users u left outer join tp on (u.taskListCode = tp.taskListCode and tp.r = 1);
Here is a solution that uses aggregation rather than analytic functions. You may want to run this against the analytic functions solution to see which is faster; in many cases aggregate queries are (slightly) faster, but it depends on your data, on index usage, etc.
This solution is similar to what Gordon tried to do. I don't know why he wrote it using a correlated subquery instead of a straight join (and I don't know if it will work - I've never seen the FIRST/LAST function used with correlated subqueries like that).
It may not work exactly right if there can be NULLs in the importance column; in that case you will need to add nulls first after t.importance and before the closing parenthesis. Note: the max(t.description) is needed, because there may be ties by "importance" (two tasks with the same, highest importance for a given user), and in that case one task must be chosen. If the ordering by importance is strict (no ties), the MAX() does nothing, since it selects the MAX over a set of exactly one value, but the compiler doesn't know that beforehand, so it still needs the MAX().
select u.name,
       max(t.description) keep (dense_rank last order by t.importance) as descr
from users u left outer join tasks t
     on u.tasklistcode = t.tasklistcode and t.isready = 'Y'
group by u.name;
This query:
SELECT spin_id, COUNT(*) over(partition by spin_id) notes, note
FROM
(
SELECT spin_id, idfa, note, amount, balance, machine
FROM islot.ledger2, islothd.ledger2
WHERE machine = 'SlotMachineG2.SlotMachine41' AND ds >= '2014-11-20'
) a
LEFT OUTER JOIN EACH views.internal_devices b
ON a.idfa = b.ios_idfa
WHERE b.ios_idfa is null
ORDER BY notes ASC;
... reliably fails with:
Resources exceeded during query execution. The query contained a GROUP
BY operator, consider using GROUP EACH BY instead.
... but this query, somewhat obviously, doesn't contain a GROUP BY. Normally we'd just promote any JOIN/GROUP clauses to their equivalent EACH, but it's not clear to me where to apply this.
Any suggestions?
This could happen if one of the tables you are querying over is a Table View that itself is defined as a "GROUP BY" query. Given that the table you are joining to is in a dataset called "views", I suspect this is the case.
COUNT(*) OVER() is a GROUP BY operation. I would try to break the query down into subqueries and do the aggregation at a lower level.
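For what it's worth, one possible decomposition along those lines (legacy BigQuery SQL, reusing the table and column names from the question; untested, so treat it purely as a sketch) would be to compute the per-spin_id counts in a GROUP EACH BY subquery and join them back, instead of COUNT(*) OVER (PARTITION BY spin_id):
SELECT f.spin_id, c.notes, f.note
FROM (
  -- the filtered, anti-joined rows, as in the original query
  SELECT a.spin_id AS spin_id, a.note AS note
  FROM (
    SELECT spin_id, idfa, note
    FROM islot.ledger2, islothd.ledger2
    WHERE machine = 'SlotMachineG2.SlotMachine41' AND ds >= '2014-11-20'
  ) a
  LEFT OUTER JOIN EACH views.internal_devices b ON a.idfa = b.ios_idfa
  WHERE b.ios_idfa IS NULL
) f
JOIN EACH (
  -- the same rows, pre-aggregated per spin_id
  SELECT a.spin_id AS spin_id, COUNT(*) AS notes
  FROM (
    SELECT spin_id, idfa
    FROM islot.ledger2, islothd.ledger2
    WHERE machine = 'SlotMachineG2.SlotMachine41' AND ds >= '2014-11-20'
  ) a
  LEFT OUTER JOIN EACH views.internal_devices b ON a.idfa = b.ios_idfa
  WHERE b.ios_idfa IS NULL
  GROUP EACH BY spin_id
) c ON f.spin_id = c.spin_id
ORDER BY c.notes ASC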
I've inherited a SQL Server-based application, and it has a stored procedure that contains the following, but it hits a timeout. I believe I've isolated the issue to the SELECT MAX() part, but I can't figure out how to use alternatives, such as ROW_NUMBER() OVER (PARTITION BY ...).
Anyone got any ideas?
Here's the "offending" code:
SELECT BData.*, B.*
FROM BData
INNER JOIN
(
SELECT MAX( BData.StatusTime ) AS MaxDate, BData.BID
FROM BData
GROUP BY BData.BID
) qryMaxDates
ON ( BData.BID = qryMaxDates.BID ) AND ( BData.StatusTime = qryMaxDates.MaxDate )
INNER JOIN BItems B ON B.InternalID = qryMaxDates.BID
WHERE B.ICID = 2
ORDER BY BData.StatusTime DESC;
Thanks in advance.
SQL performance problems are seldom addressed by rewriting the query. The compiler already knows how to rewrite it anyway. The problem is always indexing. To get MAX(StatusTime) ... GROUP BY BID efficiently, you need an index on BData(BID, StatusTime). For an efficient seek of WHERE B.ICID = 2, you need an index on BItems.ICID.
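For example (the index names are just placeholders; adjust to your naming conventions):
CREATE INDEX IX_BData_BID_StatusTime ON dbo.BData (BID, StatusTime);  -- supports MAX(StatusTime) ... GROUP BY BID
CREATE INDEX IX_BItems_ICID ON dbo.BItems (ICID);                     -- supports the seek on B.ICID = 2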
The query could also, probably, be expressed as a correlated APPLY, because it seems that is what's really desired:
SELECT D.*, B.*
FROM BItems B
CROSS APPLY
(
SELECT TOP(1) *
FROM BData
WHERE B.InternalID = BData.BID
ORDER BY StatusTime DESC
) AS D
WHERE B.ICID = 2
ORDER BY D.StatusTime DESC;
This is not semantically the same query as the OP's: the OP's query would return multiple rows on a StatusTime collision. My guess, though, is that this is what is desired ('the most recent BData for this BItem').
Consider creating the following index:
CREATE INDEX LatestTime ON dbo.BData(BID, StatusTime DESC);
This will support a query with a CTE such as:
;WITH x AS
(
SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY BID ORDER BY StatusTime DESC)
FROM dbo.BData
)
SELECT * FROM x
INNER JOIN dbo.BItems AS bi
ON x.BID = bi.InternalID
WHERE x.rn = 1 AND bi.ICID = 2
ORDER BY x.StatusTime DESC;
Whether the query still gets efficiencies from any indexes on BItems is another issue, but this should at least make the aggregate a simpler operation (though it will still require a lookup to get the rest of the columns).
Another idea would be to stop using SELECT * from both tables and only select the columns you actually need. If you really need all of the columns from both tables (this is rare, especially with a join), then you'll want to have covering indexes on both sides to prevent lookups.
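As a sketch, a covering index on the BData side could look like the following; the INCLUDE list here is purely illustrative, and it should contain whichever columns your SELECT actually needs:
CREATE INDEX IX_BData_Covering
    ON dbo.BData (BID, StatusTime DESC)
    INCLUDE (SomeColumn1, SomeColumn2);  -- placeholder column names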
I also suggest calling any identifier the same thing throughout the model. Why is the ID that links these tables called BID in one table and InternalID in another?
Also please always reference tables using their schema.
Bad habits to kick: avoiding the schema prefix
This may be a late response, but I recently ran into the same performance issue, where a simple query involving max() was taking more than 1 hour to execute.
After looking at the execution plan, it seems that in order to perform the max() function, every record meeting the where clause condition will be fetched. In your case, every record in your table will need to be fetched before the max() function is performed. Also, indexing BData.StatusTime will not speed up the query. Indexing is useful for looking up a particular record, but it will not help with performing comparisons.
In my case, I didn't have the group by, so all I did was use an ORDER BY ... DESC clause and SELECT TOP 1. The query went from over 1 hour down to under 5 minutes. Perhaps you can do what Gordon Linoff suggested and use PARTITION BY. Hopefully your query speeds up.
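For illustration, applied to a single-group case like mine, the pattern is roughly this (column and table names borrowed from the question):
SELECT TOP 1 StatusTime
FROM BData
-- WHERE <the same filter conditions as before>
ORDER BY StatusTime DESC;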
Cheers!
The following is the version of your query using row_number():
SELECT bd.*, b.*
FROM (select bd.*, row_number() over (partition by bid order by statustime desc) as seqnum
from BData bd
) bd INNER JOIN
BItems b
ON b.InternalID = bd.BID and bd.seqnum = 1
WHERE B.ICID = 2
ORDER BY bd.StatusTime DESC;
If this is not faster, then it would be useful to see the query plans for your query and this query to figure out how to optimize them.
Depends entirely on what kind of data you have there. One alternative that may be faster is using CROSS APPLY instead of the MAX subquery. But more than likely it won't yield any faster results.
The best option would probably be to add an index on BID, with INCLUDE containing the StatusTime, and, if possible, filtering that by InternalIDs matching BItems.ICID = 2.
[UNSOLVED] But I've moved on!
Thanks to everyone who provided answers / suggestions. Unfortunately I couldn't get any further with this, so I have given up trying for now.
It looks like the best solution is to re-write the application to UPDATE the latest data into a different table; that way it's a really quick and simple SELECT to get the latest readings.
Thanks again for the suggestions.
Can anyone give me a good example of a subquery using TSQL 2008?
Maximilian Mayer believes that, based on the MS documentation he references, my assertion that there is a difference between a subquery and a subselect is incorrect. Frankly, I'd consider MSDN's "Subquery Fundamentals" a better choice. Quote:
You are making distinctions between terms that actually mean the same.
O RLY?
A subQUERY...
IE:
WHERE id IN (SELECT n.id FROM TABLE n)
OR id = (SELECT MAX(m.id) FROM TABLE m)
OR EXISTS(SELECT 1/0 FROM TABLE) --won't return a math error for division by zero
...affects the WHERE or HAVING clauses -- the filtering of data -- for a SELECT, INSERT, UPDATE or DELETE statement. The value from a subquery is never directly visible in the SELECT clause.
A subSELECT...
IE:
SELECT t.column,
(SELECT x.col FROM TABLE x) AS col2
FROM TABLE t
...does not affect the filtering of data in the main query, and the value is exposed directly in the SELECT clause. But it's only one value - you can't return two or more columns into a single column in the outer query.
A subselect is a consistent means of performing a LEFT JOIN in ANSI-89 join syntax - if there is no supporting row, the column will be null. Additionally, a non-correlated subselect will return the same value for every row of the main query.
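To make that concrete, here is a small sketch with hypothetical Customers and Phones tables (assuming at most one phone row per customer); the two forms return the same result:
-- Subselect form: phone is NULL when there is no matching row in Phones
SELECT c.name,
       (SELECT p.phone_number
        FROM Phones p
        WHERE p.customer_id = c.id) AS phone
FROM Customers c

-- Equivalent explicit LEFT JOIN form
SELECT c.name,
       p.phone_number AS phone
FROM Customers c
LEFT JOIN Phones p ON p.customer_id = c.id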
Correlation
If a subquery or subselect is correlated, that query runs once for every record of the main query returned -- which doesn't scale well as the number of rows in the result set increases.
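Using the same hypothetical tables, the difference looks like this:
-- Non-correlated: evaluated once; every row gets the same value
SELECT c.name,
       (SELECT COUNT(*) FROM Phones) AS total_phones
FROM Customers c

-- Correlated: references c.id, so it is evaluated once per Customers row
SELECT c.name,
       (SELECT COUNT(*) FROM Phones p WHERE p.customer_id = c.id) AS phone_count
FROM Customers c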
Derived Table/Inline View
IE:
SELECT x.*,
y.max_date,
y.num
FROM TABLE x
JOIN (SELECT t.id,
t.num,
MAX(t.date) AS max_date
FROM TABLE t
GROUP BY t.id, t.num) y ON y.id = x.id
...is a JOIN to a derived table (AKA inline view).
"Inline view" is a better term, because that is all that happens when you reference a non-materialized view -- a view is just a prepared SQL statement. There's no performance or efficiency difference if you create a view with a query like the one in the example, and reference the view name in place of the SELECT statement within the brackets of the JOIN. The example has the same information as a correlated subquery, but the performance benefit of using a join and none of the subquery detriments. And you can return more than one column, because it is a view/derived table.
Conclusion
It should be obvious why I and others make distinctions. The concept of relying on the word "subquery" to categorize any SELECT statement that isn't the main clause is fatally flawed, because it's also a specific case under a categorization of the same word (IE: subquery-subselect, subquery-subquery, subquery-join...). Now think of helping someone who says "I've got a problem with a subquery..."
Maximilian Mayer's idea of "official" documentation was written by technical writers, who often have no experience in the subject and are only summarizing what they've been told by knowledgeable people who have simplified things. Ultimately, it's just text on a page or screen -- like what you're reading now -- and the decision is up to you whether the details I've laid out make sense to you.
For variety's sake, here's one in the where clause:
select
a.firstname,
a.lastname
from
employee a
where
a.companyid in (
select top 10
c.companyid
from
company c
where
c.num_employees > 1000
order by
c.num_employees desc
)
...returns all employees in the top ten companies with over 1000 employees.
SELECT
*,
(SELECT TOP 1 SomeColumn FROM dbo.SomeOtherTable)
FROM
dbo.MyTable
SELECT a.*, b.*
FROM TableA AS a
INNER JOIN
(
SELECT *
FROM TableB
) as b
ON a.id = b.id
That's a normal subquery, running once for the whole result set.
On the other hand
SELECT a.*, (SELECT b.somecolumn FROM TableB AS b WHERE b.id = a.id)
FROM TableA AS a
is a correlated subquery, running once for every row in the result set.
I have an SQL question, related to this and this question (but different). Basically I want to know how I can avoid a nested query.
Let's say I have a huge table of jobs (jobs) executed by a company in their history. These jobs are characterized by year, month, location and the code belonging to the tool used for the job. Additionally, I have a table of tools (tools), translating tool codes to tool descriptions and further data about the tool. Now they want a website where they can select year, month, location and tool using dropdown boxes, after which the matching jobs will be displayed. I want to fill the last dropdown with only the relevant tools matching the prior selection of year, month and location, so I wrote the following nested query:
SELECT c.tool_code, t.tool_description
FROM (
SELECT DISTINCT j.tool_code
FROM jobs AS j
WHERE j.year = ....
AND j.month = ....
AND j.location = ....
) AS c
LEFT JOIN tools as t
ON c.tool_code = t.tool_code
ORDER BY c.tool_code ASC
I resorted to this nested query because it was much faster than performing a JOIN on the complete database and selecting from that. It got my query time down a lot. But as I have recently read that MySQL nested queries should be avoided at all costs, I am wondering whether this approach is wrong. Should I rewrite my query differently? And how?
No, you shouldn't, your query is fine.
Just create an index on jobs (year, month, location, tool_code) and tools (tool_code) so that the INDEX FOR GROUP-BY can be used.
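For example (the index names are only suggestions):
-- lets MySQL resolve the DISTINCT tool_code for a given (year, month, location)
-- straight from the index (a loose index scan, "Using index for group-by")
CREATE INDEX idx_jobs_ymlt ON jobs (year, month, location, tool_code);

-- lookup index for the join; skip it if tool_code is already the primary key of tools
CREATE INDEX idx_tools_code ON tools (tool_code);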
The article you provided describes subquery predicates (IN (SELECT ...)), not nested queries (SELECT FROM (SELECT ...)).
Even with the subqueries, the article is wrong: while MySQL is not able to optimize all subqueries, it deals with IN (SELECT …) predicates just fine.
I don't know why the author chose to put DISTINCT here:
SELECT id, name, price
FROM widgets
WHERE id IN
(
SELECT DISTINCT widgetId
FROM widgetOrders
)
and why they think this will help to improve performance; but given that widgetId is indexed, MySQL will just transform this query:
SELECT id, name, price
FROM widgets
WHERE id IN
(
SELECT widgetId
FROM widgetOrders
)
into an index_subquery
Essentially, this is just like an EXISTS clause: the inner subquery will be executed once per widgets row, with this additional predicate added:
SELECT NULL
FROM widgetOrders
WHERE widgetId = widgets.id
and stop on the first match in widgetOrders.
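Spelled out, that is roughly equivalent to:
SELECT id, name, price
FROM widgets
WHERE EXISTS
(
  SELECT 1
  FROM widgetOrders
  WHERE widgetId = widgets.id
)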
This query:
SELECT DISTINCT w.id,w.name,w.price
FROM widgets w
INNER JOIN
widgetOrders o
ON w.id = o.widgetId
will have to use a temporary table to get rid of the duplicates and will be much slower.
You could avoid the subquery by using GROUP BY, but if the subquery performs better, keep it.
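A sketch of that GROUP BY variant, keeping the placeholders from your query:
SELECT j.tool_code, t.tool_description
FROM jobs AS j
LEFT JOIN tools AS t
ON t.tool_code = j.tool_code
WHERE j.year = ....
AND j.month = ....
AND j.location = ....
GROUP BY j.tool_code, t.tool_description
ORDER BY j.tool_code ASC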
Why do you use a LEFT JOIN instead of a JOIN to join tools?