Updating with a subquery on the same table

I was trying to achieve this query (query 1):
UPDATE temp_svn1 t set closedate = (select max(date) from temp_svn1 p where p.id = t.id)
Apparently MySQL does not allow an UPDATE that selects from the table being updated in a subquery. So I came up with this query using inner joins, but it is too slow. How can I write a better query, or otherwise achieve the logic of query 1?
UPDATE temp_svn1 AS out
INNER JOIN (
    select id, close
    from temp_svn1 T
    inner join (select id as cat, max(date) as close from temp_svn1 group by cat) as in
    where T.id = in.cat
    group by id
) as result ON out.id = result.id
SET out.closedate = result.close

Because of MySQL peculiarities, the only way you're going to get this to be faster is to split it into two statements: first fetch the max, then use it in an update. This is a known hack and, AFAIK, there's no "neater" way of doing it in a single statement.

Since your subquery is just returning a single value, you could do the query in two stages. Select the max(date) into a server-side variable, then re-use that variable in the outer query. Of course, this breaks things up into two queries and it'll no longer be atomic. But with appropriate transactions/locks, that becomes moot.
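For illustration, a minimal sketch of that two-stage approach in MySQL (table and column names come from the question; like the answer above, it assumes a single overall maximum is wanted, since the per-id case still needs the join approach from the question):
-- compute the max once, then reuse it; the transaction keeps the pair consistent
START TRANSACTION;
SELECT MAX(`date`) INTO @max_date FROM temp_svn1;
UPDATE temp_svn1 SET closedate = @max_date;
COMMIT;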

This works:
UPDATE temp_svn1
set closedate = (select max(date) from temp_svn1 p where p.id = temp_svn1.id)


Can all correlated subqueries be rewritten using joins in PostgreSQL?

So say I have a table
Dress(Dress_id, Colour, Value) and another table
On_Loan(Dress_id, Start_date, End_date)
and I have this query, which I am not sure is a correlated subquery or not:
SELECT D.COLOUR, D.VALUE FROM DRESS D WHERE D.DRESS_ID = (SELECT ON_LOAN.DRESS_ID FROM ON_LOAN WHERE ON_LOAN.DRESS_ID = D.DRESS_ID);
Basically I want to return the colour and values of the dresses that are on loan. I have also queried it like this:
SELECT COLOUR, VALUE FROM DRESS INNER JOIN ON_LOAN ON DRESS.DRESS_ID = ON_LOAN.DRESS_ID;
Both of which give the same output, so I am wondering: can all correlated subqueries be turned into equivalent queries with joins?
Thanks.
Many correlated subqueries can be represented as JOINs. In fact, correlated subqueries are a type of join in the general sense.
But if you are asking if a "simple" JOIN operator query is always available, the answer is "no". A pretty simple example is:
select t.*,
(select tt.x
from tt
where tt.y = t.y
order by tt.z
limit 1
) as z
from t;
Of course, this can be expressed without using a correlated subquery. But it requires more than a single JOIN operation.
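For illustration, here is a hedged sketch of such a rewrite (it assumes tt.z is unique within each tt.y group; ties would need a further tiebreaker):
-- find the smallest z per y, then join back to pick up the matching x
select t.*, tt.x as z
from t
left join (select y, min(z) as min_z from tt group by y) m
    on m.y = t.y
left join tt
    on tt.y = m.y and tt.z = m.min_z;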

Access SQL: Update list with Max() and Min() values not possible

I've a list of dates: list_of_dates.
I want to find the max and min values of each number with this code (#1).
It works as it should, and gives me the table MinMax.
Now I want to update another list (list_of_things) with these newly acquired values (#2).
However, it is not possible.
I assume it's due to DISTINCT and the fact that I always get two rows per number, each with the min and max values. Therefore an update is not possible.
Unfortunately I don't know any other way.
#1
SELECT a.number, b.MaxDateTime, c.MinDateTime
FROM (list_of_dates AS a
INNER JOIN (
SELECT a.number, MAX(a.dat) AS MaxDateTime
FROM list_of_dates AS a
GROUP BY a.number) AS b
ON a.number = b.number)
INNER JOIN (SELECT a.number, MIN(a.dat) AS MinDateTime
FROM list_of_dates AS a
GROUP BY a.number) AS c
ON a.number = c.number;
#2
UPDATE list_of_things AS a
LEFT JOIN MinMax AS b
ON a.number = b.number
SET a.latest = b.MaxDateTime, a.earliest = b.MinDateTime
No part of an MS Access update query can contain aggregation, else the resulting recordset becomes 'not updateable'.
In your case, the use of the min & max aggregate functions within the MinMax subquery cause the final update query to become not updateable.
Whilst it is not always advisable to store aggregated data (generating output from the transactional data with queries is usually preferable), if you really need to do this, here are two possible methods:
1. Using a Temporary Table to store the Aggregated Result
Run a select into query such as the following:
select
t.number,
max(t.dat) as maxdatetime,
min(t.dat) as mindatetime
into
temptable
from
list_of_dates t
group by
t.number
This generates a temporary table called temptable. Then run the following update query, which sources data from this temporary table:
update
list_of_things t1 inner join temptable t2
on t1.number = t2.number
set
t1.latest = t2.maxdatetime,
t1.earliest = t2.mindatetime
2. Use Domain Aggregate Functions
Since domain aggregate functions (dcount, dsum, dmin, dmax etc.) are evaluated separately from the evaluation of the query, they do not break the updateable nature of a query.
As such, you might consider using a query such as:
update
list_of_things t1
set
t1.latest = dmax("dat","list_of_dates","number = " & t1.number),
t1.earliest = dmin("dat","list_of_dates","number = " & t1.number)
It's a shot in the dark, but try adding DISTINCTROW, as per "SQL Update woes in MS Access - Operation must use an updateable query".
Also try using an inner join. If you need to, you can run an update to a null value first for all the records in the query to simulate the effect of the outer join.
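For reference, a hypothetical sketch combining that DISTINCTROW hint with the temp-table source from method 1 above (whether it helps depends on your exact setup):
UPDATE DISTINCTROW list_of_things AS t1
INNER JOIN temptable AS t2
ON t1.number = t2.number
SET t1.latest = t2.maxdatetime, t1.earliest = t2.mindatetime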

Fastest way to count from a subquery

I have the following query to return a list of current employees and the number of 'corrections' they have. This is working correctly but is very slow.
I was previously not using a subquery, instead opting for a count (from ...) aggregate subselect, but I have read that a subquery should be much faster. Changing the code to the below did improve performance, but not anywhere near what I was expecting.
SELECT DISTINCT
tblStaff.StaffID, CorrectionsOut.Count AS CorrectionsAssigned
FROM tblStaff
LEFT JOIN tblMeetings ON tblMeetings.StaffID = tblStaff.StaffID
JOIN tblTasks ON tblTasks.TaskID = tblMeetings.TaskID
--Get Corrections Issued
LEFT JOIN(
SELECT
COUNT(DISTINCT tblMeetings.TaskID) AS Count, tblMeetings.StaffID
FROM tblRegister
JOIN tblMeetings ON tblRegister.MeetingID = tblMeetings.MeetingID
WHERE tblRegister.FDescription IS NOT NULL
AND tblRegister.CorrectionOutDate IS NULL
GROUP BY tblMeetings.StaffID
) AS CorrectionsOut ON CorrectionsOut.StaffID = tblStaff.StaffID
WHERE tblStaff.CurrentEmployee = 1
I need a vendor-neutral solution as we are transitioning from SQL Server to Postgres. Note this is a simplified example of the query; the real one has quite a few counts. My current query time without the counts is less than half a second; with the counts it is approx 20 seconds, if it runs at all without locking or otherwise failing.
I would get rid of the joins that you are not using, which probably makes the SELECT DISTINCT unnecessary as well:
SELECT s.StaffID, co.Count AS CorrectionsAssigned
FROM tblStaff s LEFT JOIN
(SELECT COUNT(DISTINCT m.TaskID) AS Count, m.StaffID
FROM tblRegister r JOIN
tblMeetings m
ON r.MeetingID = m.MeetingID
WHERE r.FDescription IS NOT NULL AND
r.CorrectionOutDate IS NULL
GROUP BY m.StaffID
) co
ON co.StaffID = s.StaffID
WHERE s.CurrentEmployee = 1;
Getting rid of the SELECT DISTINCT and the duplicate rows added by the tasks should help performance.
For additional benefit, you would want to be sure you have indexes on the JOIN keys, and perhaps on the filtering criteria.
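For example, something along these lines might help (index names and column choices are assumptions based on the query above; verify against the actual plans before adding them):
CREATE INDEX idx_register_meeting ON tblRegister (MeetingID);
CREATE INDEX idx_meetings_meeting_staff ON tblMeetings (MeetingID, StaffID);
CREATE INDEX idx_staff_current ON tblStaff (CurrentEmployee);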

Optimize WHERE clause in query

I have the following query:
SELECT table2.serialcode,
p.name,
date,
power,
behind,
direction,
length,
centerlongitude,
centerlatitude,
currentlongitude,
currentlatitude
FROM table1 as table2
JOIN pivots p ON p.serialcode = table2.serial
WHERE table2.serialcode = '49257'
and date = (select max(a.date) from table1 a where a.serialcode ='49257');
It seems it is re-running the select max subquery for each joined row, which takes a lot of time. Is there a way to optimize it? Any help will be appreciated.
Sub selects that end up being evaluated "per row of the main query" can cause tremendous performance problems once you try to scale to larger number of rows.
Sub selects can almost always be eliminated with a data model tweak.
Here's one approach: add a new is_latest column to the table to track whether the row holds the max value (and for ties, use other fields like a created timestamp or the row ID). Set it to 1 if true, else 0.
Then you can add where is_latest = 1 to your query and this will radically improve performance.
You can schedule the update to happen or add a trigger etc. if you need an automated way of keeping is_latest up to date.
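As a rough sketch of such an update for a single serialcode (is_latest is the proposed new column, not an existing one; the extra derived table keeps the statement legal in MySQL as well):
-- reset the flag, then mark the row(s) holding the max date
UPDATE table1 SET is_latest = 0 WHERE serialcode = '49257';
UPDATE table1 SET is_latest = 1
WHERE serialcode = '49257'
  AND date = (SELECT max_date
              FROM (SELECT MAX(date) AS max_date
                    FROM table1
                    WHERE serialcode = '49257') AS m);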
Other approaches involve 2 tables - one where you keep only the latest record and another table where you keep the history.
Another option is to compute the max once into a variable first (SQL Server syntax):
declare @maxDate datetime;
select @maxDate = max(a.date) from table1 a where a.serialcode ='49257';
SELECT table2.serialcode,
p.name,
date,
power,
behind,
direction,
length,
centerlongitude,
centerlatitude,
currentlongitude,
currentlatitude
FROM table1 as table2
JOIN pivots p ON p.serialcode = table2.serial
WHERE table2.serialcode = '49257'
and date = @maxDate;
You can optimize this query using indexes. Here are some that should help: table1(serialcode, serial, date), table1(serialcode, date), and pivots(serialcode).
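Written out, those suggestions look like this (the index names are arbitrary):
CREATE INDEX idx_table1_code_serial_date ON table1 (serialcode, serial, date);
CREATE INDEX idx_table1_code_date ON table1 (serialcode, date);
CREATE INDEX idx_pivots_code ON pivots (serialcode);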
Note: I find it very strange that you have columns called serial and serialcode in the same table, and the join is on serial.
Since you haven't mentioned which DB you are using, I would answer if it was for Oracle.
You can use WITH clause to take out the subquery and make it perform just once.
WITH d AS (
SELECT max(a.date) max_date from TABLE1 a WHERE a.serialcode ='49257'
)
SELECT table2.serialcode,
p.name,
date,
power,
behind,
direction,
length,
centerlongitude,
centerlatitude,
currentlongitude,
currentlatitude
FROM table1 as table2
JOIN pivots p ON p.serialcode = table2.serial
JOIN d on (table2.date = d.max_date)
WHERE table2.serialcode = '49257'
Please note that you haven't qualified the date column, so I just assumed it belonged to table1 and not pivots. You can change it. A piece of advice on the same note: always qualify your columns using the table.column format.

optimize SQL query

What more can I do to optimize this query?
SELECT * FROM
(SELECT `item`.itemID, COUNT(`votes`.itemID) AS `votes`,
        `item`.title, `item`.itemTypeID, `item`.submitDate,
        `item`.deleted, `item`.ItemCat, `item`.counter,
        `item`.userID, `users`.name,
        TIMESTAMPDIFF(minute, `submitDate`, NOW()) AS 'timeMin',
        `myItems`.userID as userIDFav, `myItems`.deleted as myDeleted
FROM (votes `votes` RIGHT OUTER JOIN item `item`
ON (`votes`.itemID = `item`.itemID))
INNER JOIN
users `users`
ON (`users`.userID = `item`.userID)
LEFT OUTER JOIN
myItems `myItems`
ON (`myItems`.itemID = `item`.itemID)
WHERE (`item`.deleted = 0)
GROUP BY `item`.itemID,
`votes`.itemID,
`item`.title,
`item`.itemTypeID,
`item`.submitDate,
`item`.deleted,
`item`.ItemCat,
`item`.counter,
`item`.userID,
`users`.name,
`myItems`.deleted,
`myItems`.userID
ORDER BY `item`.itemID DESC) as myTable
where myTable.userIDFav = 3 or myTable.userIDFav is null
limit 0, 20
I'm using MySQL
Thanks
What does the analyzer say for this query? Without knowing how many rows there are in the tables, you can't tell which optimizations will help. So run the analyzer and you'll see what each part costs.
Of course, as @theomega said, look at the execution plan.
But I'd also suggest to try and "clean up" your statement. (I don't know which one is faster - that depends on your table sizes.) Usually, I'd try to start with a clean statement and start optimizing from there. But typically, a clean statement makes it easier for the optimizer to come up with a good execution plan.
So here are some observations about your statement that might make things slow:
a couple of outer joins (makes it hard for the optimizer to figure out an index to use)
a group by
a lot of columns to group by
As far as I understand your SQL, this statement should do most of what yours is doing:
SELECT `item`.itemID, `item`.title, `item`.itemTypeID,
       `item`.submitDate, `item`.deleted, `item`.ItemCat,
       `item`.counter, `item`.userID, `users`.name,
       TIMESTAMPDIFF(minute, `submitDate`, NOW()) AS 'timeMin'
FROM item `item` INNER JOIN users `users`
ON (`users`.userID = `item`.userID)
WHERE (`item`.deleted = 0)
Of course, this misses the info from the tables you outer joined, I'd suggest to try to add the required columns via a subselect:
SELECT `item`.itemID,
       (SELECT count(itemID)
        FROM votes v
        WHERE v.itemID = `item`.itemID) as `votes`, <etc.>
This way, you can get rid of one outer join and the group by. The outer join is replaced by the subselect, so there is a trade-off which may be bad for the "cleaner" statement.
Depending on the cardinality between item and myItems, you can do the same or you'd have to stick with the outer join (but no need to reintroduce the group by).
Hope this helps.
Some quick semi-random thoughts:
Are your itemID and userID columns indexed?
What happens if you add "EXPLAIN " to the start of the query and run it? Does it use indexes? Are they sensible?
Do you need to run the whole inner query and filter on it, or could you move the where myTable.userIDFav = 3 or myTable.userIDFav is null part into the inner query?
You do seem to have too many fields in the GROUP BY list; since one of them is itemID, I suspect you could use an inner SELECT to perform the grouping and an outer SELECT to return the set of fields desired.
Can't you add the where clause myTable.userIDFav = 3 or myTable.userIDFav is null to WHERE (item.deleted = 0)?
Regards
Lieven
Look at the way your query is built: you join a lot of stuff, then limit the output to 20 rows. Since your conditions only apply to item and myItems, you should outer join those two tables first, limit the output to the first 20 rows, and only then join and aggregate the rest. As written, you are performing a lot of work that is then discarded.
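Something along these lines, as a sketch rather than a drop-in replacement (the column list is trimmed for illustration, and it assumes at most one myItems row per item for user 3):
-- pick the 20 item IDs first, then join/aggregate only those
SELECT i.itemID, i.title, u.name,
       COUNT(v.itemID) AS votes,
       TIMESTAMPDIFF(minute, i.submitDate, NOW()) AS timeMin
FROM (
    SELECT `item`.itemID
    FROM item `item`
    LEFT OUTER JOIN myItems `myItems` ON `myItems`.itemID = `item`.itemID
    WHERE `item`.deleted = 0
      AND (`myItems`.userID = 3 OR `myItems`.userID IS NULL)
    ORDER BY `item`.itemID DESC
    LIMIT 0, 20
) AS page
JOIN item i ON i.itemID = page.itemID
JOIN users u ON u.userID = i.userID
LEFT OUTER JOIN votes v ON v.itemID = i.itemID
GROUP BY i.itemID, i.title, u.name, i.submitDate
ORDER BY i.itemID DESC;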