I am using a cursor in my stored procedure. It runs against a database with a huge amount of data, and for every item in the cursor I do an update operation. This is taking a huge amount of time to complete, almost 25 minutes. :( Is there any way I can reduce the time this consumes?
When you need to do a more complex operation on each row than a simple update allows, you can try:
Write a User Defined Function and use that in the update (probably still slow)
Put data in a temporary table and use that in an UPDATE ... FROM:
Did you know about the UPDATE ... FROM syntax? It is quite powerful when things get more complex:
UPDATE
    MyTable
SET
    Col1 = CASE WHEN b.Foo = 'Bar' THEN LOWER(b.Baz) ELSE '' END,
    Col2 = ISNULL(c.Bling, 0) * 100 / Col3
FROM
    MyTable
    INNER JOIN MySecondTable AS b ON b.Id = MyTable.SecondId
    LEFT JOIN ##MyTempTable AS c ON c.Id = b.ThirdId
WHERE
    MyTable.Col3 > 0
    AND b.Foo IS NOT NULL
    AND MyTable.TheDate > GETDATE() - 10
The example is completely made-up and may not make much sense, but you get the picture of how to do a more complex update without having to use a cursor. Of course, a temp table would not necessarily be required for it to work. :-)
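To illustrate the first option above, a scalar UDF can wrap the per-row logic so the update itself stays set-based. A sketch with made-up names (and note the answer's caveat: scalar UDFs still run once per row, so this is often slow too):

```sql
-- Hypothetical scalar UDF wrapping the per-row logic (names are made up).
CREATE FUNCTION dbo.CleanFoo (@Foo VARCHAR(100), @Baz VARCHAR(100))
RETURNS VARCHAR(100)
AS
BEGIN
    RETURN CASE WHEN @Foo = 'Bar' THEN LOWER(@Baz) ELSE '' END;
END;
GO

-- One set-based UPDATE instead of a cursor loop:
UPDATE MyTable
SET Col1 = dbo.CleanFoo(b.Foo, b.Baz)
FROM MyTable
INNER JOIN MySecondTable AS b ON b.Id = MyTable.SecondId;
```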
The quick answer is: don't use a cursor. The most efficient way to update lots of records is a single update statement. There are very few cases where you have to use a cursor rather than an update statement; you just have to get clever about how you write the update statement.
If you posted a snapshot of your SQL you might get some help to achieve what you're after.
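For illustration (table and column names are invented), the kind of rewrite this means is replacing a row-by-row loop with one statement:

```sql
-- Instead of a cursor that fetches each Id and runs a separate
-- UPDATE per row:
--   DECLARE c CURSOR FOR SELECT Id FROM Orders WHERE Status = 'OPEN';
--   ... FETCH; UPDATE Orders SET Status = 'EXPIRED' WHERE Id = @Id; ...

-- Do the whole thing in one set-based statement:
UPDATE Orders
SET Status = 'EXPIRED'
WHERE Status = 'OPEN'
  AND OrderDate < DATEADD(DAY, -30, GETDATE());
```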
I would avoid using a cursor, and work with views or materialized views if possible. Cursors are something Microsoft doesn't optimize much in SQL Server, because most of the time you should be using a general set-based statement (SELECT, INSERT, UPDATE, DELETE) rather than a cursor.
If you cannot achieve the same end result with views or subqueries, you may want to use a temp table or look at improving the data model.
You don't provide much specific information, so all that I can do is give some general tips.
Are you updating the same data that the cursor is operating over?
What type of cursor? Forward-only? Static? Keyset? Dynamic?
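If a cursor really is unavoidable, the declaration matters; a LOCAL FAST_FORWARD (forward-only, read-only) cursor is usually the cheapest option in SQL Server. A sketch with an invented table name:

```sql
-- FAST_FORWARD = forward-only + read-only, typically the cheapest
-- cursor type in SQL Server (table and column names invented).
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT Id FROM MyTable WHERE NeedsWork = 1;
```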
The UPDATE ... FROM syntax mentioned above is the preferred method. You can also use a subquery, such as the following.
UPDATE t1
SET t1.col1 = (SELECT top 1 col FROM other_table WHERE t1_id = t1.ID AND ...)
WHERE ...
Sometimes this is the only way to do it, as each column update may depend on different criteria (or a different table), and there may be a "best case" that you want to preserve by using the ORDER BY clause.
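For example (names invented), a correlated subquery with TOP 1 and ORDER BY preserves that "best case":

```sql
-- Pick the most recent matching price per product (invented names);
-- the ORDER BY inside the subquery selects the "best" row.
UPDATE p
SET p.CurrentPrice = (
    SELECT TOP 1 ph.Price
    FROM PriceHistory AS ph
    WHERE ph.ProductId = p.Id
    ORDER BY ph.EffectiveDate DESC
)
FROM Products AS p
WHERE p.CurrentPrice IS NULL;
```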
Can you post more information about the type of update you are doing?
Cursors can be very useful in the right context (I use plenty of them), but if you have a choice between a cursor and a set-based operation, set-based is almost always the way to go.
But if you don't have a choice, you don't have a choice. Can't tell without more detail.
I have the following query that took too much time to be executed.
How to optimize it?
Update Fact_IU_Lead
set
Fact_IU_Lead.Latitude_Point_Vente = Adr.Latitude,
Fact_IU_Lead.Longitude_Point_Vente = Adr.Longitude
FROM Dim_IU_PointVente
INNER JOIN
Data_I_Adresse AS Adr ON Dim_IU_PointVente.Code_Point_Vente = Adr.Code_Point_Vente
INNER JOIN
Fact_IU_Lead ON Dim_IU_PointVente.Code_Point_Vente = Fact_IU_Lead.Code_Point_Vente
WHERE
Latitude_Point_Vente is null
or Longitude_Point_Vente is null and Adr.[Error]=0
A couple of things I would look at here to help.
How many records are in each table? If it's millions, you may need to cycle through them in batches.
Are the columns you're joining or filtering on indexed on each table? If not, add indexes! That typically makes a huge speed difference at relatively little cost.
Are the columns you're joining on stored as text instead of a geo-spatial type? I've had much better performance out of geo-spatial data types in this scenario. Just make sure your SRIDs are the same across tables.
Are the columns you're updating indexed, or is the table being updated heavy with indexes? Tons of indexes on a large table can be great for looking things up, but they kill update/insert speeds.
Take a look at those first.
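As a sketch of the indexing point (index names and INCLUDE columns are my guesses from the query):

```sql
-- Index the join/filter columns (index names are made up).
CREATE NONCLUSTERED INDEX IX_DimPointVente_Code
    ON Dim_IU_PointVente (Code_Point_Vente);

CREATE NONCLUSTERED INDEX IX_DataAdresse_Code
    ON Data_I_Adresse (Code_Point_Vente)
    INCLUDE (Latitude, Longitude, [Error]);
```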
I've done a bit of light cleanup of your code with regard to aliases.
Also, take a look at the WHERE clause and choose one of the two options below.
When you mix ANDs and ORs, the best thing you can ever do is add parentheses.
At a minimum, you'll have zero question about what you meant when you wrote it.
At most, you'll know that SQL is executing your logic correctly.
UPDATE fl --Update the alias, since Fact_IU_Lead is aliased in the FROM
SET
    Latitude_Point_Vente = adr.Latitude --Note the table prefix is removed
    , Longitude_Point_Vente = adr.Longitude --Note the table prefix is removed
FROM Dim_IU_PointVente AS pv --Added alias
INNER JOIN
    Data_I_Adresse AS adr ON pv.Code_Point_Vente = adr.Code_Point_Vente --carried alias
INNER JOIN
    Fact_IU_Lead AS fl ON pv.Code_Point_Vente = fl.Code_Point_Vente --added/carried alias
WHERE
    (fl.Latitude_Point_Vente IS NULL OR fl.Longitude_Point_Vente IS NULL) AND adr.[Error] = 0 --option one for WHERE change
    --fl.Latitude_Point_Vente IS NULL OR (fl.Longitude_Point_Vente IS NULL AND adr.[Error] = 0) --option two for WHERE change
Joins are usually expensive. The best approach in your case may be to put the update into a stored procedure, split the work into selects, and use a transaction to keep everything consistent (if needed).
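If the volume really is in the millions, one common pattern (my suggestion, not part of the answer above) is to batch the update so each transaction stays small; here the IS NULL checks naturally exclude rows already processed, assuming the source coordinates are non-NULL:

```sql
-- Batch the update in chunks of 10,000 rows; updated rows get
-- non-NULL coordinates and drop out of the WHERE on the next pass.
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) fl
    SET Latitude_Point_Vente  = adr.Latitude,
        Longitude_Point_Vente = adr.Longitude
    FROM Fact_IU_Lead AS fl
    INNER JOIN Dim_IU_PointVente AS pv
        ON pv.Code_Point_Vente = fl.Code_Point_Vente
    INNER JOIN Data_I_Adresse AS adr
        ON pv.Code_Point_Vente = adr.Code_Point_Vente
    WHERE (fl.Latitude_Point_Vente IS NULL OR fl.Longitude_Point_Vente IS NULL)
      AND adr.[Error] = 0;

    IF @@ROWCOUNT = 0 BREAK; -- nothing left to update
END
```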
Hope this answer points you in the right direction :)
I'd like to update just one piece of data in a large table.
What would be the most efficient way to do this?
SELECT * from TABLE WHERE status='N'
UPDATE TABLE set status='Y' where status='N'
I assume the table is very, very large.
In that case, you could create a temporary or permanent filtered index on the table:
CREATE NONCLUSTERED INDEX Temp_Table_Status
ON dbname.dbo.Table(Status)
WHERE Status='N'
GO
Otherwise your query is correct:
UPDATE TABLE set status='Y' where status='N'
"Most efficient" is much like "most beautiful". It has no absolute meaning. How do you measure "efficient". IMO, by far the most efficient mechanism is to use a single update query. Your query should actually be written to avoid pointless updates:
update table set col = 'Y' where col <> 'Y';
The WHERE clause is what makes it "most efficient". Note that you might need to account for NULL values in the WHERE clause; know your data. Some might argue for batching the updates in order to save space. If you do this on a regular basis, you should generally have sufficient space in the database and log to do it without attempting to pointlessly manage bits of disk space.
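For instance, if the column is nullable, the predicate needs to account for that, since `col <> 'Y'` alone skips NULL rows:

```sql
-- NULL-safe version: NULL <> 'Y' evaluates to UNKNOWN, not TRUE,
-- so NULL rows would otherwise be silently excluded.
UPDATE MyTable
SET col = 'Y'
WHERE col <> 'Y' OR col IS NULL;
```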
Consider the following
declare @flag bit = 1

select *
from a
left join b
    on a.id = b.id
    and @flag = 0
Ideally, I might break this into two stored procedures: one for when @flag is true and another for when it's false. However, in my case there is a massive query that I'm just hoping to make a couple of quick optimizations to where I can find some low-hanging fruit.
The idea is obviously that I don't want SQL wasting time joining to table b under certain conditions, but that's not how SQL works. Are there other ways to make this kind of optimization, or is breaking it up into multiple procs, with and without the join, the only way to accomplish this?
I've tested your method on the AdventureWorks database and it works fine. Looking at the IO, SQL performs 0 reads of the table.
The key thing I would do is add OPTION (RECOMPILE) to the end of your query, because SQL may reuse an inappropriate Plan in the Cache.
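Applied to the query above, that would look like:

```sql
-- OPTION (RECOMPILE) makes SQL Server compile a fresh plan using the
-- current value of @flag, instead of reusing a cached plan that may
-- have been built for the other value.
declare @flag bit = 1;

select *
from a
left join b
    on a.id = b.id
    and @flag = 0
option (recompile);
```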
I'm trying to optimise a select (actually a cursor in PL/SQL code) that includes a PL/SQL function, e.g.:
select * from mytable t,mytable2 t2...
where t.thing = 'XXX'
... lots more joins and sql predicate on various columns
and myplsqlfunction(t.val) = 'X'
The myplsqlfunction() is very expensive, but is only applicable to a manageably small subset of the other conditions.
The problem is that Oracle appears to be evaluating myplsqlfunction() on more data than is ideal.
My evidence for this is that if I recast the above as either
select * from (
select * from mytable t,mytable2 t2...
where t.thing = 'XXX'
... lots more joins and sql predicate on various columns
) where myplsqlfunction(t.val) = 'X'
or pl/sql as:
begin
for t in ( select * from mytable t,mytable2 t2...
where t.thing = 'XXX'
... lots more joins and sql predicate on various columns ) loop
if myplsqlfunction(t.val) = 'X' then
-- process the desired subset
end if;
end loop;
end;
performance is an order of magnitude better.
I am resigned to restructuring the offending code to use either of the two idioms above, but I would be delighted if there were any simpler way to get the Oracle optimizer to do this for me.
You could specify a bunch of hints to force a particular plan. But that would almost assuredly be more of a pain than restructuring the code.
What you really want to do is associate non-default statistics with the function. If you tell Oracle that the function is less selective than the optimizer is guessing, or (more likely) if you provide high values for the CPU or I/O cost of the function, you'll cause the optimizer to call the function as few times as possible. The oracle-developer.net article walks through how to pick reasonably correct values for the cost (and, going a step beyond that, how to make those statistics change over time as the cost of the function call changes). You can probably fix your immediate problem by setting crazy-high costs, but it's worth the effort of setting accurate values so you're giving the optimizer the most accurate information possible. Setting costs way too high or way too low tends to cause some set of queries to do something stupid.
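A sketch of associating statistics with the function (all the numbers are invented placeholders; pick values that reflect the real cost):

```sql
-- Tell the optimizer the function is expensive so it evaluates the
-- cheap predicates first and calls the function as late as possible.
-- COST is (cpu, io, network); SELECTIVITY is the expected percentage
-- of rows that pass. The numbers below are invented placeholders.
ASSOCIATE STATISTICS WITH FUNCTIONS myplsqlfunction
    DEFAULT COST (100000, 10, 0)
    DEFAULT SELECTIVITY 1;
```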
You can use a WITH clause to first evaluate all your join conditions and get a manageable subset of data, then apply the PL/SQL function to that subset. It all depends on the volume, but you can still try this. Let me know if you hit any issues.
You can use a CTE like:
WITH X AS
( select /*+ MATERIALIZE */ * from mytable t, mytable2 t2 ...
  where t.thing = 'XXX'
  ... lots more joins and sql predicate on various columns
)
SELECT * FROM X
WHERE myplsqlfunction(X.val) = 'X';
Note the MATERIALIZE hint. CTEs can be either inlined or materialized (into the TEMP tablespace).
Another option would be to use NO_PUSH_PRED hint. This is generally better solution (avoids materializing of the subquery), but it requires some tweaking.
PS: you should not call other SQL from myplsqlfunction. That SQL might see data added after your query started, and you might get surprising results.
You can also declare your function as RESULT_CACHE, to make Oracle remember return values from the function; this is applicable if the set of possible parameter values is reasonably small.
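A sketch of what that declaration looks like (the function body here is an invented placeholder):

```sql
-- RESULT_CACHE caches return values keyed by the arguments, so
-- repeated calls with the same input are not re-executed.
-- The body below is an invented placeholder.
CREATE OR REPLACE FUNCTION myplsqlfunction (p_val IN VARCHAR2)
    RETURN VARCHAR2
    RESULT_CACHE
IS
BEGIN
    RETURN CASE WHEN p_val LIKE 'A%' THEN 'X' ELSE 'Y' END;
END;
/
```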
Probably the best solution is to associate the stats, as Justin describes.
What I am trying to do, using java, is:
access a database
read a record from table "Target_stats"
if the field "threat_level" = 0, doAction1
if the field "threat_level" > 0, get additional fields from another table "Attacker_stats" and doAction2
read the next record
Now I have everything I need except a well-thought-out SQL statement that will let me go through the database only once; if that doesn't work, I suspect I will need two separate SQL statements and a second pass through the database. I don't have a clear understanding of CASE statements, so I will just provide pseudocode using an if statement.
SELECT A.1, A.2, A.3
if(A.3 > 0){
SELECT A.1, A.2, A.3, B.1, B.3
FROM A
JOIN B
ON A.1 = B.1
}
FROM A
Can anyone shed any light on my situation?
EDIT: Thank you both for your time and effort. I understand both of your comments, and I believe I am headed in the right direction; however, I'm still having some trouble. I didn't know about SQLFiddle before, so I have now gone ahead and made a sample DB to demonstrate my purpose. Here is the link: http://sqlfiddle.com/#!3/ea108/1 What I want to do here is select target_stats.server_id, target_stats.target, target_stats.threat_level where interval_id=3, and if threat_level>0 I also want to retrieve attack_stats.attacker, attack_stats.sig_name where interval_id=3. Again, thank you for your time and effort; it is very useful to me.
EDIT: After some tinkering around, I figured it out. Thank you so much for your help.
As @Ocelot20 said, SQL is not procedural code; it is based on set operations, not per-row operations. One immediate consequence of this is that the SELECT in your pseudo-example is wrong, as it relies on rows in the same result set having different column lists.
That said, you can get pretty close to your pseudo-code example, if you can tolerate NULL values where the join is not possible.
Here's an example that (to me, anyway) seems close to what you are driving at:
select *
from A
left outer join B
on A.a = B.d and A.a > 2
You can see it in action in this SQLFiddle, which should show you what sort of output to expect.
Note that what this is actually saying is something like this:
Fetch all the records from table A, and also fetch any records from
table B that have their d column equal to the a column in table
A, provided the value of A.a is greater than 2.
(The condition was picked for convenience. In my rather contrived example, moving the conditional column does not affect the output, as can be seen here.)
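Applied to the tables from the question (a sketch; I'm assuming interval_id is the join key, since the exact schema is only in the SQLFiddle), the same left-join idea might look like:

```sql
-- One pass over the data: the attacker columns come back NULL
-- whenever threat_level = 0, so the caller can branch on that.
-- Join key (interval_id) is an assumption from the question text.
SELECT t.server_id, t.target, t.threat_level,
       a.attacker, a.sig_name
FROM target_stats AS t
LEFT OUTER JOIN attack_stats AS a
    ON a.interval_id = t.interval_id
   AND t.threat_level > 0
WHERE t.interval_id = 3;
```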