I'm trying to delete data from a table that has about 12 million records. I want to delete it in batches, but DELETE doesn't accept a LIMIT clause in my database, and I'm a bit stumped on how to get around that.
The query without LIMIT is:
DELETE FROM roster_validationtaskerror
USING roster_validationtaskerror AS rvte
LEFT JOIN roster_validationtask AS rvt ON rvt.id = rvte.parent_task_id
LEFT JOIN roster_validation AS rv ON rv.id = rvt.validation_id
WHERE rv.id = 10
How can I add a LIMIT to this query?
I've been trying to put the LIMIT inside a subselect and then join against its result. I'm quite new to SQL, so I haven't been able to figure out how to get this to work.
Try this:
WITH r AS (
SELECT rvte.id
FROM roster_validationtaskerror AS rvte
JOIN roster_validationtask AS rvt ON rvt.id = rvte.parent_task_id
JOIN roster_validation AS rv ON rv.id = rvt.validation_id
WHERE rv.id = 10
-- perhaps order by is needed if you want to delete not randomly selected rows
LIMIT 10
)
DELETE FROM roster_validationtaskerror
USING r
WHERE r.id = roster_validationtaskerror.id
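To clear all matching rows, the limited delete has to be repeated until it removes nothing. Here is a minimal sketch of that loop. It is an illustration under assumptions, not the asker's setup: it uses Python's sqlite3 module as a stand-in database, the sample rows and the `delete_in_batches` helper are made up, and it uses an `id IN (subquery with LIMIT)` form instead of the CTE with `USING` shown above.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE roster_validation (id INTEGER PRIMARY KEY);
CREATE TABLE roster_validationtask (id INTEGER PRIMARY KEY, validation_id INTEGER);
CREATE TABLE roster_validationtaskerror (id INTEGER PRIMARY KEY, parent_task_id INTEGER);
INSERT INTO roster_validation (id) VALUES (10), (11);
INSERT INTO roster_validationtask (id, validation_id) VALUES (1, 10), (2, 11);
""")
# Hypothetical sample data: 25 error rows under validation 10, 5 under validation 11.
conn.executemany(
    "INSERT INTO roster_validationtaskerror (parent_task_id) VALUES (?)",
    [(1,)] * 25 + [(2,)] * 5,
)
conn.commit()

# One batch: pick at most `batch_size` matching ids, then delete exactly those rows.
BATCH_SQL = """
DELETE FROM roster_validationtaskerror
WHERE id IN (
    SELECT rvte.id
    FROM roster_validationtaskerror AS rvte
    JOIN roster_validationtask AS rvt ON rvt.id = rvte.parent_task_id
    JOIN roster_validation AS rv ON rv.id = rvt.validation_id
    WHERE rv.id = ?
    ORDER BY rvte.id
    LIMIT ?
)
"""

def delete_in_batches(conn, validation_id, batch_size):
    """Repeat the limited delete until a batch removes no rows."""
    batches = []
    while True:
        cur = conn.execute(BATCH_SQL, (validation_id, batch_size))
        conn.commit()  # commit per batch so each transaction stays small
        if cur.rowcount == 0:
            break
        batches.append(cur.rowcount)
    return batches

batch_sizes = delete_in_batches(conn, 10, 10)
remaining = conn.execute(
    "SELECT COUNT(*) FROM roster_validationtaskerror"
).fetchone()[0]
print(batch_sizes, remaining)
```

Committing after every batch is the point of batching: each transaction touches only a bounded number of rows. On a 12-million-row table the batch size would be much larger than 10; the right value depends on the server and is not something this sketch can tell you.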
I'm trying to delete rows from a table where there are matching records in other tables.
I've tried different variations of this, but this one returns:
SQL Error [42601]: [SQL0199] Keyword INNER not expected. Valid tokens: USE SKIP WAIT WITH FETCH LIMIT ORDER WHERE OFFSET.
It's basically a cross-library (cross-database) delete, but I can't get DB2 to play along. The SELECT works just fine if I replace the DELETE with SELECT *:
DELETE a
FROM INHOUSE.ANDREWCAT a
INNER JOIN ERPLIB.SRBPRG b ON
a.PSPRDC = b.PGPRDC
INNER JOIN ERPLIB.SRBRSD c
ON
b.PGIRGP = c.RDSRTY
AND c.RDTOFI = a.EPNUM AND c.RDSRTY = c.RDWHAT
AND a.EPNUM = 'REM104'
DB2 does not support the syntax you are using: its DELETE statement does not accept a JOIN.
Instead:
DELETE INHOUSE.ANDREWCAT a
WHERE EXISTS (SELECT 1
FROM ERPLIB.SRBPRG b JOIN
ERPLIB.SRBRSD c
ON b.PGIRGP = c.RDSRTY
WHERE a.PSPRDC = b.PGPRDC AND
c.RDTOFI = a.EPNUM AND
c.RDSRTY = c.RDWHAT AND
a.EPNUM = 'REM104'
);
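A way to check a rewrite like this before running it for real is to count the rows the EXISTS predicate matches and compare that with the number of rows the DELETE reports. The sketch below does that under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than DB2, the library prefixes (INHOUSE, ERPLIB) are dropped, the sample rows are invented, and the table name is used in place of the alias `a`.

```python
import sqlite3

# SQLite stand-in for DB2 (assumption); schema prefixes omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ANDREWCAT (PSPRDC TEXT, EPNUM TEXT);
CREATE TABLE SRBPRG (PGPRDC TEXT, PGIRGP TEXT);
CREATE TABLE SRBRSD (RDSRTY TEXT, RDTOFI TEXT, RDWHAT TEXT);
-- Hypothetical sample rows: only ('P1', 'REM104') satisfies every condition.
INSERT INTO ANDREWCAT VALUES ('P1', 'REM104'), ('P2', 'REM104'), ('P1', 'OTHER');
INSERT INTO SRBPRG VALUES ('P1', 'G1'), ('P2', 'G2');
INSERT INTO SRBRSD VALUES ('G1', 'REM104', 'G1'), ('G2', 'REM104', 'X');
""")

# The same correlated predicate as in the answer, shared by SELECT and DELETE.
PREDICATE = """
EXISTS (SELECT 1
        FROM SRBPRG b
        JOIN SRBRSD c ON b.PGIRGP = c.RDSRTY
        WHERE ANDREWCAT.PSPRDC = b.PGPRDC
          AND c.RDTOFI = ANDREWCAT.EPNUM
          AND c.RDSRTY = c.RDWHAT
          AND ANDREWCAT.EPNUM = 'REM104')
"""

# Preview: how many rows would the delete hit?
matching_before = conn.execute(
    "SELECT COUNT(*) FROM ANDREWCAT WHERE " + PREDICATE
).fetchone()[0]

# Delete with the identical predicate and compare the counts.
deleted = conn.execute("DELETE FROM ANDREWCAT WHERE " + PREDICATE).rowcount
conn.commit()

remaining = conn.execute(
    "SELECT PSPRDC, EPNUM FROM ANDREWCAT ORDER BY PSPRDC, EPNUM"
).fetchall()
print(matching_before, deleted, remaining)
```

Because EXISTS only asks whether at least one match exists, each row of the target table is deleted at most once, even when the joined tables contain several matching rows.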
We have two very similar queries, one takes 22 seconds the other takes 6 seconds. Both use an inner select, have the exact same outer columns and outer joins. The only difference is the inner select that the outer query is using to join in on.
The inner query when run alone executes in 100ms or less in both cases and returns the EXACT SAME data.
Both queries as a whole have a lot of room for improvement, but this particular oddity is really puzzling to us and we just want to understand why. To me it would seem the inner query should be executed once in 100ms then the outer stuff happens. I have a feeling the inner select may be executed multiple times.
Query that takes 6 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
ORDER BY projectItemsID ASC
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
Query that takes 22 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
For every row in your projectItems table, the second query checks two columns instead of one. If projectItemsID isn't the primary key or isn't indexed, evaluating that extra column takes longer.
If you look at the sizes of the tables and the number of rows each query returns, you can calculate how many comparisons need to be made for each of the queries.
I believe that you're right that the inner query is being run for every single row that is being left joined with categories.
I can't find a proper source on it right now, but you can easily test this by doing something like this and comparing the run times. Here, we can at least be sure that the inner query is only running one time. (sorry if any syntax is incorrect, but you'll get the general idea):
-- Table variable that holds the inner query's result; it is filled exactly once
DECLARE @innerQuery TABLE ( [all inner query columns here] )
INSERT INTO @innerQuery
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539
-- The outer query now reads from the table variable instead of the subquery
SELECT {whole bunch of field names}
FROM @innerQuery AS IQ
LEFT JOIN categories
ON IQ.fk_category = categories.categoryID
...{more joins}
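Before comparing run times, it is worth confirming that the materialized version returns the same rows as the derived-table version. The sketch below shows only that equivalence, under assumptions: it uses SQLite through Python's sqlite3 module with a temp table in place of a SQL Server table variable, and the `name` column and sample rows are hypothetical. It says nothing about how SQL Server plans either query.

```python
import sqlite3

# SQLite stand-in (assumption); a TEMP table plays the role of the table variable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (categoryID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projectItems (
    projectItemsID INTEGER PRIMARY KEY,
    isActive INTEGER,
    fk_category INTEGER
);
-- Hypothetical sample rows.
INSERT INTO categories VALUES (1, 'hardware');
INSERT INTO projectItems VALUES (6539, 1, 1), (6540, 1, NULL), (6541, 0, 1);
""")

# Original shape: the inner select is a derived table inside the outer query.
derived_rows = conn.execute("""
SELECT p.projectItemsID, c.name
FROM (SELECT * FROM projectItems
      WHERE isActive = 1 AND projectItemsID = 6539) AS p
LEFT JOIN categories AS c ON p.fk_category = c.categoryID
""").fetchall()

# Materialized shape: run the inner select once and store its result.
conn.execute("""
CREATE TEMP TABLE innerQuery AS
SELECT * FROM projectItems
WHERE isActive = 1 AND projectItemsID = 6539
""")
materialized_rows = conn.execute("""
SELECT iq.projectItemsID, c.name
FROM innerQuery AS iq
LEFT JOIN categories AS c ON iq.fk_category = c.categoryID
""").fetchall()

print(derived_rows, materialized_rows)
```

If the two result sets match on the real data as well, any difference in run time between the two shapes comes from how the outer query is executed, not from what the inner query returns.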
I have two tables, and I want to update one from the other.
I have written two queries to show the data. Here they are:
select PrjTermsID,InstNo,InstDesc,BlockID from ProjectPaymentTerms where BlockID=1
select PlotBookingID,InstNo,InstDesc,PrjTermsID from PlotPaymentTerms where PlotBookingID in
( select PlotBookingID from PlotBooking where PlotID in ( select PlotID from PlotMaster where AppartmentBlock=1))
See the image for the results.
The first table has 13 records (InstNo 1 to 13), and in the second table each PlotBookingID has 13 records (each plot has all 13 payment terms). Now I want to update the second table, PlotPaymentTerms, with the autogenerated ID (PrjTermsID) from the first table. If I try an inner join, it returns more rows than expected. How can I write an update query for the second table?
update ppt
set ppt.PrjTermsID = pp.PrjTermsID
from PlotPaymentTerms ppt
inner join ProjectPaymentTerms pp on ppt.InstNo = pp.InstNo and pp.BlockID = 1
inner join PlotBooking pb on ppt.PlotBookingID = pb.PlotBookingID
inner join PlotMaster pm on pb.PlotID = pm.PlotID
where pm.AppartmentBlock = 1
Please check this.
I don't think this is the right way to do it, but it works for me:
update ppt set ppt.PrjTermsID = pp.PrjTermsID from PlotPaymentTerms ppt
inner join ProjectPaymentTerms pp on ppt.InstNo = pp.InstNo and pp.BlockID=1
and ppt.PlotBookingID in(select PlotBookingID from PlotBooking where PlotID in ( select PlotID from PlotMaster where AppartmentBlock=1))
Thanks to @Mukund
I want to update a table with the sum of a specific value taken from 3 different tables. I wrote the query below for this, but it takes too much time. What is the most efficient query for this purpose?
UPDATE dbo.dumpfile_doroud
SET dumpfile_doroud.sms_count_on_net = (SELECT sms_count_on_net
FROM dbo.dumpfile139201
WHERE
dbo.dumpfile_doroud.msisdn = dbo.dumpfile139201.msisdn)
+ (SELECT sms_count_on_net
FROM dbo.dumpfile139202
WHERE
dbo.dumpfile_doroud.msisdn = dbo.dumpfile139202.msisdn)
+ (SELECT sms_count_on_net
FROM dbo.dumpfile139203
WHERE
dbo.dumpfile_doroud.msisdn = dbo.dumpfile139203.msisdn)
P.S.: dumpfile_doroud is a small table, but the other three tables are really big.
Try this:
UPDATE t1
SET t1.sms_count_on_net=isnull(t2.sms_count_on_net,0) +
isnull(t3.sms_count_on_net,0) +
isnull(t4.sms_count_on_net,0)
FROM dbo.dumpfile_doroud t1
LEFT JOIN dbo.dumpfile139201 t2
ON t2.msisdn = t1.msisdn
LEFT JOIN dumpfile139202 t3
ON t3.msisdn = t1.msisdn
LEFT JOIN dumpfile139203 t4
ON t4.msisdn = t1.msisdn
I don't think the query itself can be made much faster, so try adding indexes. You can create a nonclustered index on the msisdn column of each table. Syntax:
CREATE NONCLUSTERED INDEX IX_doroud_dumpfile139201
ON dbo.dumpfile139201(msisdn);
You can also open SQL Server Management Studio and turn on Display Estimated Execution Plan; it sometimes gives good advice on which indexes to create.
Create a subquery to calculate the totals, then join the table to it:
UPDATE o
SET o.sms_count_on_net = n.sms_count_on_net
FROM
dbo.dumpfile_doroud o
JOIN
(SELECT
d.msisdn, sms_count_on_net = (d1.sms_count_on_net+d2.sms_count_on_net+d3.sms_count_on_net)
FROM
dbo.dumpfile_doroud d
LEFT JOIN dbo.dumpfile139201 d1 ON d1.msisdn = d.msisdn
LEFT JOIN dbo.dumpfile139202 d2 ON d2.msisdn = d.msisdn
LEFT JOIN dbo.dumpfile139203 d3 ON d3.msisdn = d.msisdn) n
ON o.msisdn = n.msisdn
Note that if the value is missing from any of those tables, the total will be NULL. That may or may not be what you want.
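The NULL behaviour is easy to see on a small example. The sketch below is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, the sample rows are made up, and it uses the standard `COALESCE` where the other answer uses `isnull`.

```python
import sqlite3

# SQLite stand-in (assumption) with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dumpfile_doroud (msisdn TEXT PRIMARY KEY, sms_count_on_net INTEGER);
CREATE TABLE dumpfile139201 (msisdn TEXT, sms_count_on_net INTEGER);
CREATE TABLE dumpfile139202 (msisdn TEXT, sms_count_on_net INTEGER);
CREATE TABLE dumpfile139203 (msisdn TEXT, sms_count_on_net INTEGER);
INSERT INTO dumpfile_doroud (msisdn) VALUES ('A'), ('B');
INSERT INTO dumpfile139201 VALUES ('A', 1), ('B', 4);
INSERT INTO dumpfile139202 VALUES ('A', 2);          -- 'B' is missing here
INSERT INTO dumpfile139203 VALUES ('A', 3), ('B', 5);
""")

# plain_sum: any NULL operand makes the whole sum NULL.
# coalesced_sum: a missing value counts as 0.
rows = conn.execute("""
SELECT d.msisdn,
       d1.sms_count_on_net + d2.sms_count_on_net + d3.sms_count_on_net
           AS plain_sum,
       COALESCE(d1.sms_count_on_net, 0)
         + COALESCE(d2.sms_count_on_net, 0)
         + COALESCE(d3.sms_count_on_net, 0) AS coalesced_sum
FROM dumpfile_doroud d
LEFT JOIN dumpfile139201 d1 ON d1.msisdn = d.msisdn
LEFT JOIN dumpfile139202 d2 ON d2.msisdn = d.msisdn
LEFT JOIN dumpfile139203 d3 ON d3.msisdn = d.msisdn
ORDER BY d.msisdn
""").fetchall()
print(rows)
```

Subscriber 'B' has no row in the second table, so the plain sum comes back as NULL (None in Python) while the coalesced sum is 9. Which of the two you want depends on whether a missing month should count as zero or mark the total as unknown.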
I'm replacing a subquery with a self join to improve the performance of my query.
The old subquery was like this:
(SELECT fage2.agecat
FROM people AS fage2
WHERE fage2.aacode = people.aacode
AND fage2.persno = 2) AS RAge2,
The new self join is like this:
(SELECT [People].[AgeCat]
FROM [People]
INNER JOIN [People] AS p2
ON [People].[aacode] = [P2].[aacode]
WHERE [P2].[PERSNO] = 2 ) AS RAge2,
but returns a No Current Record error message.
The goal is to find the record that has the same aacode but a PERSNO of 2, and to return the AgeCat of that record in a column called RAge2.
This is only part of a larger query, which is explained in full in "Convert a SQL subquery into a join when looking at another record in the same table Access 2010".
Hmm, it looks like the query you want to optimize is part of a bigger query. Posting the entire query in the question would help in understanding your problem.
Also, from what I can see, you would be showing RAge2 for both rows with the same aacode, not only for the one that has PERSNO = 2 as you state in the goal. Pasting your entire query would help clarify that too.
I was trying to understand your query, so I created a fake query for your original one:
SELECT
(SELECT FAge2.AgeCat
FROM People AS FAge2
WHERE FAge2.aacode = People.aacode
AND FAge2.PERSNO = 2) AS RAge2,
People.PersonId
FROM People
To get the same results you need a left join, not an inner join: a subquery in the select list never excludes rows from the outer table, while an inner join does. The resulting join query would look something like this:
SELECT
FAge2.AgeCat AS RAge2,
People.PersonID
FROM People
LEFT JOIN People AS FAge2 ON (FAge2.aacode = People.aacode AND FAge2.PERSNO = 2)
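The difference between the three forms shows up as soon as one aacode has no PERSNO = 2 row. The sketch below is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than Access, the sample people and age categories are invented, and it assumes at most one PERSNO = 2 row per aacode (otherwise the subquery form and the join form stop being equivalent).

```python
import sqlite3

# SQLite stand-in for Access (assumption) with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (
    PersonID INTEGER PRIMARY KEY,
    aacode TEXT,
    PERSNO INTEGER,
    AgeCat TEXT
);
-- Household H1 has a person number 2; household H2 does not.
INSERT INTO People VALUES
    (1, 'H1', 1, '45-54'),
    (2, 'H1', 2, '35-44'),
    (3, 'H2', 1, '65+');
""")

# Correlated subquery in the select list: every outer row is kept.
subquery_rows = conn.execute("""
SELECT People.PersonID,
       (SELECT FAge2.AgeCat
        FROM People AS FAge2
        WHERE FAge2.aacode = People.aacode
          AND FAge2.PERSNO = 2) AS RAge2
FROM People
ORDER BY People.PersonID
""").fetchall()

# Left self join: same rows, NULL where no PERSNO = 2 record exists.
left_join_rows = conn.execute("""
SELECT People.PersonID, FAge2.AgeCat AS RAge2
FROM People
LEFT JOIN People AS FAge2
       ON FAge2.aacode = People.aacode AND FAge2.PERSNO = 2
ORDER BY People.PersonID
""").fetchall()

# Inner self join: rows without a PERSNO = 2 partner disappear.
inner_join_rows = conn.execute("""
SELECT People.PersonID, FAge2.AgeCat AS RAge2
FROM People
INNER JOIN People AS FAge2
        ON FAge2.aacode = People.aacode AND FAge2.PERSNO = 2
ORDER BY People.PersonID
""").fetchall()

print(subquery_rows, left_join_rows, inner_join_rows)
```

Person 3 lives in a household with no PERSNO = 2 record: the subquery and the left join both keep that row with an empty RAge2, and the inner join drops it.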
Please use:
(SELECT [People].[AgeCat] FROM [People] INNER JOIN [People] AS P2 ON ([People].[aacode] = [P2].[aacode] AND [P2].[PERSNO] = 2)) AS RAge2