In the documentation here, the following code example is given for using a cursor:
execute block
returns (
relation char(31),
sysflag int)
as
declare cur cursor for
(select rdb$relation_name, rdb$system_flag from rdb$relations);
begin
open cur;
while (1=1) do
begin
fetch cur into relation, sysflag;
if (row_count = 0) then leave;
suspend;
end
close cur;
end
But this can also be done as follows:
execute block
returns (
relation char(31),
sysflag int)
as
begin
for select rdb$relation_name, rdb$system_flag
from rdb$relations
into relation, sysflag
do begin
suspend;
end
end
So why would I want to use one? Ultimately the above example doesn't even need execute block, as it's just a simple select statement. So I suppose the example is just too simple to showcase the benefit.
The documentation you link to (and its newer 2.5 counterpart) already includes most of the reasons why you would (or would not) use a cursor (emphasis mine):
If the cursor is needed only to walk the result set, it is nearly always easier and less error-prone to use a FOR SELECT statement with the AS CURSOR clause. Declared cursors must be explicitly opened, used to fetch data and closed. The context variable ROW_COUNT has to be checked after each fetch and, if its value is zero, the loop has to be terminated. A FOR SELECT statement checks it automatically.
Nevertheless, declared cursors provide a high level of control over sequential events and allow several cursors to be managed in parallel.
So in short, you should usually use FOR SELECT, except when you need access to multiple cursors at the same time, or need more complicated logic than a simple loop. Declared cursors also make it possible to reuse the same cursor definition in multiple parts of your code (although that might indicate you need to break up your code into multiple stored procedures).
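To make the "several cursors in parallel" case concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than Firebird PSQL, and the tables a_items and b_items are hypothetical; only the pattern matters. Two explicit cursors are stepped by hand to merge two ordered result sets, advancing whichever one currently holds the smaller key, which a single FOR SELECT loop cannot express.

```python
import sqlite3


def merge_ordered(conn):
    """Merge two ordered result sets by stepping two explicit cursors in parallel."""
    cur_a = conn.execute("SELECT name, qty FROM a_items ORDER BY name")  # open cursor A
    cur_b = conn.execute("SELECT name, qty FROM b_items ORDER BY name")  # open cursor B
    merged = []
    row_a = cur_a.fetchone()  # fetchone() returns None once a cursor is exhausted
    row_b = cur_b.fetchone()
    while row_a is not None or row_b is not None:
        # Take from A when B is exhausted or A's key sorts first; otherwise take from B.
        if row_b is None or (row_a is not None and row_a[0] <= row_b[0]):
            merged.append(row_a)
            row_a = cur_a.fetchone()
        else:
            merged.append(row_b)
            row_b = cur_b.fetchone()
    cur_a.close()  # explicit close, as with a declared cursor
    cur_b.close()
    return merged


# Usage with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_items (name TEXT, qty INTEGER)")
conn.execute("CREATE TABLE b_items (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO a_items VALUES (?, ?)", [("apple", 1), ("cherry", 3)])
conn.executemany("INSERT INTO b_items VALUES (?, ?)", [("banana", 2), ("date", 4)])
merged = merge_ordered(conn)
print(merged)  # [('apple', 1), ('banana', 2), ('cherry', 3), ('date', 4)]
```

Here fetchone() returning None plays the role of the ROW_COUNT = 0 check in the PSQL version, and each cursor is closed explicitly, just as a declared cursor must be.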
Presence of a tool does not mean that it should be used for everything.
As an aside, a FOR SELECT is also a cursor, except you don't have explicit control over it (it hides most of the ugliness ;)).
Another situation where you might use a cursor is when you need to update the rows you retrieved, and locating them again (working out the exact WHERE clause) could be an issue. In this case, you can declare the cursor with a FOR UPDATE clause and update (or delete) the current row using the WHERE CURRENT OF clause.
I seem to approach thinking about sql the wrong way. I am always writing things that do not work.
For example, I need a variable. So I think:
DECLARE @CNT AS INT
SET @CNT = COUNT(DISTINCT database.schema.table.column)
Why doesn't this work...? I am using a fully qualified reference here, so the value I want should be clear.
DECLARE @CNT AS INT
SET @CNT = (SELECT COUNT(DISTINCT [column]) FROM database.schema.[table])
This works... but why do I have to use select?
Does everything have to be prefaced with one of the DDL or DML statements?
Secondly:
I can't debug line by line because a sql statement is treated all as one step. The only way I can debug is if I select the innermost sub-query and run that, then include next outer sub query and run that, and so on and so forth.
Is there a locals window?
I've heard about set-based thinking rather than iterative thinking. I guess I am still iterative even for functional languages... the iteration just goes from the innermost parentheses to the outermost, applied to the whole set. But even here I run into trouble because I don't know which value in the set causes the error.
Sorry if this seems scatterbrained... I guess that just kind of reflects how I feel about it. I don't know how to architect a big stored procedure from lots of little components. In VBA I can just call another subroutine and make sure the variables I need are global.
tldr: Need the conceptual grounding / knowing what actually happens when I type something and hit F5
On question #1: you need SELECT because that's how SQL works. You've given it a name, but haven't told it what to do with that name (select it, update it, delete it?). Just saying the column name is not grammatically correct.
On #2: yes, SQL is declarative. You're not telling it how to do the work, you're telling it what to return. The optimizer picks whatever evaluation order is most efficient at that particular moment, so your innermost sub-query is not necessarily the first thing to run; it is often run last, or folded into the outer query.
Yes, you have to use SELECT in order to fetch that data first and then assign it to the variable. You can also do it like this:
DECLARE @CNT AS INT
SELECT @CNT = COUNT(DISTINCT [column]) FROM database.schema.[table]
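The same rule holds outside T-SQL: a value only comes out of a table through a query that says where the rows come from. A minimal sketch, using Python's sqlite3 purely as a stand-in for SQL Server, with a hypothetical people table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Oslo"), ("Bob", "Oslo"), ("Cid", "Rome")],
)

# A bare column reference has no value on its own; it only means something
# inside a query with a FROM clause. The scalar result is then fetched and
# assigned, which is what SELECT @CNT = ... does in T-SQL.
cnt = conn.execute("SELECT COUNT(DISTINCT city) FROM people").fetchone()[0]
print(cnt)  # 2
```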
I'm a beginner at PL/SQL, and while studying the course I came across CURSOR
and I want to know why we should use it, and when?
thank you very much
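When you do a SELECT and it returns more than one row, you can't save the rows in a single variable, so you'll have to use a CURSOR. If you are familiar with programming, a CURSOR is something like an iterator over the result set: you step through the rows one at a time instead of holding them all in one variable.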
So if you do a SELECT and save the results in a variable like in the code:
SELECT id INTO v_id FROM table;
and if more than one row is returned, you can't save the rows in the variable v_id, and a TOO_MANY_ROWS exception will be raised.
Reference: http://www.oracle.com/technetwork/issue-archive/2013/13-mar/o23plsql-1906474.html
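To make the single-row rule concrete, here is a minimal sketch in Python's sqlite3 (not PL/SQL) that imitates SELECT ... INTO: zero rows or more than one row is an error. The exception classes NoDataFound and TooManyRows are hypothetical stand-ins named after Oracle's NO_DATA_FOUND and TOO_MANY_ROWS; they are defined here, not imported from any library, and the items table is illustrative.

```python
import sqlite3


class NoDataFound(Exception):
    """Hypothetical stand-in for Oracle's NO_DATA_FOUND."""


class TooManyRows(Exception):
    """Hypothetical stand-in for Oracle's TOO_MANY_ROWS."""


def select_into(conn, sql, params=()):
    """Return the single row produced by `sql`, mimicking SELECT ... INTO."""
    cur = conn.execute(sql, params)
    try:
        rows = cur.fetchmany(2)  # two rows are enough to detect "too many"
    finally:
        cur.close()
    if not rows:
        raise NoDataFound(sql)
    if len(rows) > 1:
        raise TooManyRows(sql)
    return rows[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(1,), (2,)])

print(select_into(conn, "SELECT id FROM items WHERE id = ?", (1,)))  # (1,)
try:
    select_into(conn, "SELECT id FROM items")  # two rows match
except TooManyRows:
    print("TOO_MANY_ROWS")
```

Handling more than one row means switching from this "exactly one row" fetch to a cursor loop.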
Also, if you've seen Oracle's FOR ... IN (SELECT ...) ... LOOP ... END LOOP statement, that's using an implicit cursor.
The reason to use the explicit cursor method is that you can do more things with the cursor, such as BULK COLLECT which can greatly improve your processing performance in many, but not all, situations. That greater control (beyond just doing BULK COLLECT) is helpful as you develop more-elaborate processes.
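BULK COLLECT itself is PL/SQL-only (inside a loop it is typically written FETCH cur BULK COLLECT INTO coll LIMIT n), but the shape of the idea, pulling rows from an open cursor in batches instead of one at a time, can be sketched with the DB-API's fetchmany() in Python's sqlite3. The numbers table and the batch size of 3 are illustrative assumptions, and the sketch shows only the structure, not Oracle's performance characteristics.

```python
import sqlite3


def process_in_batches(conn, batch_size=3):
    """Walk an explicit cursor in fixed-size batches; return batch sizes and a running sum."""
    cur = conn.execute("SELECT n FROM numbers ORDER BY n")
    batch_sizes = []
    total = 0
    while True:
        batch = cur.fetchmany(batch_size)  # analogous to BULK COLLECT ... LIMIT batch_size
        if not batch:                      # an empty batch means the cursor is exhausted
            break
        batch_sizes.append(len(batch))
        total += sum(value for (value,) in batch)  # per-batch work would go here
    cur.close()
    return batch_sizes, total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, 8)])
sizes, total = process_in_batches(conn)
print(sizes, total)  # [3, 3, 1] 28
```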
Good luck on your journey into Oracle. I've been using it for 14 years and am a big fan.
I'm working on updating a stored procedure that currently selects up to n rows; if the number of rows returned = n, it does a select count without the limit, and then returns the original select and the total row count.
Kinda like:
SELECT TOP (@rowsToReturn)
A.data1,
A.data2
FROM
mytable A
SET @maxRows = @@ROWCOUNT
IF @rowsToReturn = @maxRows -- compare the saved value: the SET above resets @@ROWCOUNT to 1
BEGIN
SET @maxRows = (SELECT COUNT(1) FROM mytable)
END
I want to reduce this to a single select statement. Based on this question, COUNT(*) OVER() allows this, but it puts the count on every single row instead of in an output parameter. Maybe something like FOUND_ROWS() in MySQL, such as a @@TOTALROWCOUNT or similar.
As a side note, since the actual select has an ORDER BY, the database will already need to traverse the entire set (to make sure that it gets the correct first n ordered records), so the database should already have this count somewhere.
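For reference, the logic in the question (fetch at most n rows, and only run the extra COUNT when the page comes back full) can be sketched in Python with sqlite3 standing in for SQL Server, LIMIT standing in for TOP, and an assumed ORDER BY data1. The page size is taken from the fetched rows themselves rather than from a row-count variable.

```python
import sqlite3


def page_with_total(conn, rows_to_return):
    """Return (rows, total): up to rows_to_return rows plus the unlimited row count."""
    rows = conn.execute(
        "SELECT data1, data2 FROM mytable ORDER BY data1 LIMIT ?",
        (rows_to_return,),
    ).fetchall()
    total = len(rows)
    # Only a full page can hide further rows, so only then pay for the COUNT.
    if total == rows_to_return:
        total = conn.execute("SELECT COUNT(1) FROM mytable").fetchone()[0]
    return rows, total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (data1 INTEGER, data2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(i, f"row{i}") for i in range(10)]
)
rows, total = page_with_total(conn, 3)
print(len(rows), total)  # 3 10
```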
As @MartinSmith mentioned in a comment on this question, there is no direct (i.e. pure T-SQL) way of getting the total number of rows that would be returned while at the same time limiting it. In the past I have done the method of:
dump the query to a temp table to grab @@ROWCOUNT (the total set)
use ROW_NUMBER() OVER (ORDER BY ...) AS [ResultID] on the ordered results of the main query
SELECT TOP (n) * FROM #Temp ORDER BY [ResultID] or something similar
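The three steps can be sketched in Python's sqlite3, used here only as a stand-in for SQL Server: the full ordered result goes into a temporary table with a ROW_NUMBER() column, the INSERT's row count plays the part of @@ROWCOUNT, and a second query takes the first n rows. This assumes SQLite 3.25 or later (for window functions); mytable, its columns, and the temp table name results are illustrative.

```python
import sqlite3


def top_n_via_temp_table(conn, n):
    """Temp-table method: materialize the full result, count it, then take the first n."""
    conn.execute("DROP TABLE IF EXISTS temp.results")
    conn.execute(
        "CREATE TEMP TABLE results (result_id INTEGER, data1 INTEGER, data2 TEXT)"
    )
    # Steps 1 and 2: dump the full ordered result, numbering rows with ROW_NUMBER().
    cur = conn.execute(
        "INSERT INTO temp.results (result_id, data1, data2) "
        "SELECT ROW_NUMBER() OVER (ORDER BY data1), data1, data2 FROM mytable"
    )
    total = cur.rowcount  # rows inserted = size of the full result (cf. @@ROWCOUNT)
    # Step 3: read the first n rows back out in ResultID order.
    rows = conn.execute(
        "SELECT data1, data2 FROM temp.results ORDER BY result_id LIMIT ?", (n,)
    ).fetchall()
    return rows, total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (data1 INTEGER, data2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(i, f"row{i}") for i in range(10)]
)
rows, total = top_n_via_temp_table(conn, 3)
print(rows, total)  # [(0, 'row0'), (1, 'row1'), (2, 'row2')] 10
```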
Of course, the downside here is that you have the disk I/O cost of getting those records into the temp table. Put [tempdb] on SSD? :)
I have also experienced the "run COUNT(*) with the same rest of the query first, then run the regular SELECT" method (as advocated by @Blam), and it is not a "free" re-run of the query:
It is a full re-run in many cases. The issue is that when doing COUNT(*) (hence not returning any fields), the optimizer only needs to worry about indexes in terms of the JOIN, WHERE, GROUP BY, ORDER BY clauses. But when you want some actual data back, that could change the execution plan quite a bit, especially if the indexes used to get the COUNT(*) are not "covering" for the fields in the SELECT list.
The other issue is that even if the indexes are all the same and hence all of the data pages are still in cache, that just saves you from the physical reads. But you still have the logical reads.
I'm not saying this method doesn't work, but I think the method in the Question that only does the COUNT(*) conditionally is far less stressful on the system.
The method advocated by @Gordon is actually functionally very similar to the temp table method I described above: it dumps the full result set to [tempdb] (the INSERTED table is in [tempdb]) to get the full @@ROWCOUNT and then it gets a subset. On the downside, the INSTEAD OF TRIGGER method is:
a lot more work to set up (as in 10x - 20x more): you need a real table to represent each distinct result set, you need a trigger, the trigger needs to either be built dynamically, or get the number of rows to return from some config table, or I suppose it could get it from CONTEXT_INFO() or a temp table. Still, the whole process is quite a few steps and convoluted.
very inefficient: first it does the same amount of work dumping the full result set to a table (i.e. into the INSERTED table--which lives in [tempdb]) but then it does an additional step of selecting the desired subset of records (not really a problem as this should still be in the buffer pool) to go back into the real table. What's worse is that second step is actually double I/O as the operation is also represented in the transaction log for the database where that real table exists. But wait, there's more: what about the next run of the query? You need to clear out this real table. Whether via DELETE or TRUNCATE TABLE, it is another operation that shows up (the amount of representation based on which of those two operations is used) in the transaction log, plus is additional time spent on the additional operation. AND, let's not forget about the step that selects the subset out of INSERTED into the real table: it doesn't have the opportunity to use an index since you can't index the INSERTED and DELETED tables. Not that you always would want to add an index to the temp table, but sometimes it helps (depending on the situation) and you at least have that choice.
overly complicated: what happens when two processes need to run the query at the same time? If they are sharing the same real table to dump into and then select out of for the final output, then there needs to be another column added to distinguish between the SPIDs. It could be @@SPID. Or it could be a GUID created before the initial INSERT into the real table is called (so that it can be passed to the INSTEAD OF trigger via CONTEXT_INFO() or a temp table). Whatever the value is, it would then be used to do the DELETE operation once the final output has been selected. And if not obvious, this part influences a performance issue brought up in the prior bullet: TRUNCATE TABLE cannot be used as it clears the entire table, leaving DELETE FROM dbo.RealTable WHERE ProcessID = @WhateverID; as the only option.
Now, to be fair, it is possible to do the final SELECT from within the trigger itself. This would reduce some of the inefficiency as the data never makes it into the real table and then also never needs to be deleted. It also reduces the over-complication as there should be no need to separate the data by SPID. However, this is a very time-limited solution as the ability to return results from within a trigger is going bye-bye in the next release of SQL Server, so sayeth the MSDN page for the disallow results from triggers Server Configuration Option:
This feature will be removed in the next version of Microsoft SQL Server. Do not use this feature in new development work, and modify applications that currently use this feature as soon as possible. We recommend that you set this value to 1.
The only actual way to do:
the query one time
get a subset of rows
and still get the total row count of the full result set
is to use .Net. If the procs are being called from app code, please see "EDIT 2" at the bottom. If you want to be able to randomly run various stored procedures via ad hoc queries, then it would have to be a SQLCLR stored procedure so that it could be generic and work for any query as stored procedures can return dynamic result sets and functions cannot. The proc would need at least 3 parameters:
@QueryToExec NVARCHAR(MAX)
@RowsToReturn INT
@TotalRows INT OUTPUT
The idea is to use "Context Connection = true;" to make use of the internal / in-process connection. You then do these basic steps:
call ExecuteReader() on the SqlCommand to get a SqlDataReader
before you read any rows, do a GetSchemaTable()
from the SchemaTable you get the result set field names and datatypes
from the result set structure you construct a SqlDataRecord
with that SqlDataRecord you call SqlContext.Pipe.SendResultsStart(_DataRecord)
now you start calling Reader.Read()
for each row you call:
Reader.GetValues()
DataRecord.SetValues()
SqlContext.Pipe.SendResultsRow(_DataRecord)
RowCounter++
Rather than doing the typical "while (Reader.Read())", you instead include the @RowsToReturn param, checking it before reading so that row n + 1 is not consumed here and then missed by the count: while (RowCounter < RowsToReturn.Value && Reader.Read())
After that while loop, call SqlContext.Pipe.SendResultsEnd() to close the result set (the one that you are sending, not the one you are reading)
then do a second while loop that cycles through the rest of the result, but never gets any of the fields:
while (Reader.Read())
{
RowCounter++;
}
then just set TotalRows = RowCounter; which will pass back the number of rows for the full result set, even though you only returned the top n rows of it :)
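Stripped of the SQLCLR plumbing, the core of these steps is two loops over one forward-only reader: forward the first n rows, then keep reading without touching the fields to count the rest. Here is a sketch in Python, with a sqlite3 cursor standing in for the SqlDataReader and a plain list standing in for SqlContext.Pipe; the function name send_top_n_count_all is hypothetical. The limit is tested before each read, so the row that would be n + 1 is left for the second loop to count.

```python
import sqlite3


def send_top_n_count_all(cur, rows_to_return):
    """Return (sent_rows, total_rows) using a single forward-only pass over `cur`."""
    sent = []
    row_counter = 0
    # Loop 1: test the limit *before* reading, so no row is fetched and then dropped.
    while row_counter < rows_to_return:
        row = cur.fetchone()
        if row is None:          # the result set had fewer than n rows
            break
        sent.append(row)         # stand-in for SqlContext.Pipe.SendResultsRow(...)
        row_counter += 1
    # Loop 2: keep reading without using the fields, just counting.
    while cur.fetchone() is not None:
        row_counter += 1
    return sent, row_counter


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (data1 INTEGER, data2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(i, f"row{i}") for i in range(10)]
)
cur = conn.execute("SELECT data1, data2 FROM mytable ORDER BY data1")
sent, total = send_top_n_count_all(cur, 3)
print(sent, total)  # [(0, 'row0'), (1, 'row1'), (2, 'row2')] 10
```

Had the first loop been written as "read, then check the counter", the (n + 1)th row would be fetched, discarded, and left out of the total.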
Not sure how this performs against the temp table method, the dual call method, or even @M.Ali's method (which I have also tried and kinda like, but the question was specific to not sending the value as a column), but it should be fine and does accomplish the task as requested.
EDIT:
Even better! Another option (a variation on the above C# suggestion) is to use the @@ROWCOUNT from the T-SQL stored procedure, sent as an OUTPUT parameter, rather than cycling through the rest of the rows in the SqlDataReader. So the stored procedure would be similar to:
CREATE PROCEDURE SchemaName.ProcName
(
@Param1 INT,
@Param2 VARCHAR(05),
@RowCount INT = -1 OUTPUT -- default so it doesn't have to be passed in
)
AS
SET NOCOUNT ON;
{any ol' query}
SET @RowCount = @@ROWCOUNT;
Then, in the app code, create a new SqlParameter with Direction = Output for "@RowCount". The steps above stay the same, except the last two (the second while loop and setting TotalRows = RowCounter), which change to:
Instead of the 2nd while loop, just call Reader.Close()
Instead of using the RowCounter variable, set TotalRows = (int)RowCountOutputParam.Value;
I have tried this and it does work. But so far I have not had time to test the performance against the other methods.
EDIT 2:
If the T-SQL stored procs are being called from the app layer (i.e. no need for ad hoc execution) then this is actually a much simpler variation of the above C# methods. In this case you don't need to worry about the SqlDataRecord or the SqlContext.Pipe methods. Assuming you already have a SqlDataReader set up to pull back the results, you just need to:
Make sure the T-SQL stored proc has a @RowCount INT = -1 OUTPUT parameter
Make sure to SET @RowCount = @@ROWCOUNT; immediately after the query
Register the OUTPUT param as a SqlParameter having Direction = Output
Use a loop similar to: while (RowCounter < RowsToReturn && Reader.Read()) so that you stop retrieving results once you have pulled back the desired amount.
Remember to not limit the result in the stored proc (i.e. no TOP (n))
At that point, just like what was mentioned in the first "EDIT" above, just close the SqlDataReader and grab the .Value of the OUTPUT param :).
How about this....
DECLARE @N INT = 10
;WITH CTE AS
(
SELECT
A.data1,
A.data2
FROM mytable A
)
SELECT TOP (@N) *, (SELECT COUNT(*) FROM CTE) AS Total_Rows
FROM CTE
The last column will be populated with the total number of rows it would have returned without the TOP Clause.
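For comparison, the same query shape runs on SQLite, used here only as a convenient stand-in for SQL Server, with LIMIT in place of TOP and an assumed ORDER BY data1. Because the CTE is referenced twice, the engine may evaluate it twice, which is the cost trade-off the other answers discuss.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (data1 INTEGER, data2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(i, f"row{i}") for i in range(10)]
)

n = 3
rows = conn.execute(
    """
    WITH cte AS (
        SELECT data1, data2 FROM mytable
    )
    SELECT data1, data2, (SELECT COUNT(*) FROM cte) AS total_rows
    FROM cte
    ORDER BY data1
    LIMIT ?
    """,
    (n,),
).fetchall()
print(rows)  # [(0, 'row0', 10), (1, 'row1', 10), (2, 'row2', 10)]
```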
The issue with your requirement is that you are expecting a SINGLE select statement to return both a table and a scalar value, which is not possible.
A single select statement will return either a table or a scalar value. Or you can have two separate selects, one returning a scalar value and the other returning a table. The choice is yours :)
Just because you think T-SQL should have a row count because of a sort does not mean it does. And if it does, it is not currently sharing it with the outside world.
What you are missing is this is very efficient
select count(*)
from ...
where ...
select top x
from ...
where ...
order by ...
With the count(*), unless the query is just plain ugly, those indexes should be in memory.
It has to perform a count to sort based on what?
Did you actually evaluate any query plans?
If T-SQL has to count the rows as part of the sort, then explain the following.
Why is the count(*) 100% of the cost when the second had to do a count anyway?
Just where in that second query plan is there a free opportunity to count?
Why are those query plans so different if they both need to count?
I think there is an arcane way to do what you want. It involves triggers and non-temporary tables. And, I should mention, although I have implemented each piece (for different purposes), I have never put them together for this purpose.
The idea starts with this Stack Overflow question. According to this source, @@ROWCOUNT counts the number of attempted inserts, even when they don't really happen. Now, I must admit that a perusal of available documentation doesn't seem to touch on this topic, so this may or may not be "correct" behavior. This method is relying on this "problem".
So, you could do what you want by:
Creating a new table for the output -- but not a table variable or a temporary table.
Creating an "instead of" trigger that prevents more than @maxRows from going into the table.
Select the query results into the table.
Read @@ROWCOUNT after the select.
Note that you can create the table and trigger using dynamic SQL. You could also create it once, and have the trigger read the @maxRows value from some sort of parameter table. As mentioned before, this needs to be a real table that supports triggers.
I know a set-based solution is ideal and generally superior to a cursor. So please save your time and mine by refraining from answers like "don't use a cursor, use set-based ops". I am asking this because my Googling has not turned up any answers, and the knowledge probably comes from experience:
1) FETCH NEXT FROM vs FETCH FROM: When I open a cursor (fast_forward/static), is there a difference between using 'fetch next from' and 'fetch next' inside the while loop, in performance, order of records accessed, etc.?
2) ROW_NUMBER + SELECT/WHILE vs STATIC CURSOR: As I understand it, a static cursor creates a temp table with the selected data and iterates over that temp table. So, is there any reason to use select row_number() ..., ... from ... into ... and iterate over it with an index variable and select * from #tmp table where RowNumber = @IndexVar?
3) FAST_FORWARD - can it break down? If I have a fast_forward local cursor, and inside the cursor loop insert/update operations are performed on tables the cursor selects from, are there any issues (possible cycles, etc.)?
4) PLAN FORCING Is there a way to force a fast_forward cursor to use static/dynamic plan?
Thank you very much for your answers
PS: For those of you really curious, yes, the problem could be rewritten into a set-based approach, but due to some decisions from higher-up, new rows created in the primary table have to be created/inserted using a stored procedure.
is there a difference between using 'fetch next from' and 'fetch next' inside the while loop
No - NEXT is the default if no option is specified
So, is there any reason to use select row_number() ..., ... from ... into ... and iterate over it with a index variable and select * from #tmp table where RowNumber = #IndexVar?
I would presume that a static cursor has optimizations that would make it faster than creating a temp table yourself and searching for a specific row in each iteration, but I'd have to try it.
if i have a fast_forward local cursor, and inside this cursor insert/update operations are performed on tables the cursor selects from, are there any issues? (possible cycles etc?)
I'm not sure what you mean by "cycles" - if the underlying data changes the cursor will change as well (unless you declared it as STATIC)
Is there a way to force a fast_forward cursor to use static/dynamic plan?
I've never tried it, but you could try using OPTION( USE PLAN ...) in your SELECT when you define the cursor. I can't think of a reason why it wouldn't work.
I am getting mad about this and Google is clueless
Google just reports what people put on the internet... Why are you mad? What problem are you trying to solve?
1) NEXT is the default fetch option, so FETCH = FETCH NEXT.
2) Static/Dynamic refers to whether or not changes made to the underlying data are reflected in the cursor's result set. With a static cursor, if you insert new rows into the tables it reads from, the cursor will not loop through them; it works on whatever results existed when the cursor was opened, and that's it. I believe your understanding is correct.
3) I'm not sure.
4) I'm not sure what you mean here; you can specify STATIC or DYNAMIC when you declare your cursor.
I want to know the difference between these two statements. Is one 'better' than the other?
DECLARE
myvar varchar2(50);
BEGIN
SELECT fieldone into myvar FROM tbl_one WHERE id = 1;
END;
AND
DECLARE
myvar varchar2(50);
CURSOR L1 IS
SELECT fieldone FROM tbl_one WHERE id = 1;
BEGIN
OPEN L1;
FETCH L1 INTO myvar;
CLOSE L1;
END;
The first will raise an exception if there are no rows returned or if more than one row is returned. If you don't handle the exception, that gets thrown back to the calling routine or client software. This is known as an implicit cursor.
The second would fail silently. If no rows are returned, then myvar will have a null value (though it's preferable to assume it is undefined). If more than one row would be returned, then only the value from the first row is stored. Without an ORDER BY, which value is 'first' is undefined. This is known as an explicit cursor.
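The explicit-cursor behaviour described above (take the first row if there is one, otherwise leave the variable empty) looks like this with a sqlite3 cursor in Python. This is an analogy, not PL/SQL: the helper fetch_first and the sample rows are hypothetical, and None plays the role of NULL.

```python
import sqlite3


def fetch_first(conn, sql, params=()):
    """OPEN, FETCH once, CLOSE: no error for zero rows or for many rows."""
    cur = conn.execute(sql, params)   # OPEN
    try:
        row = cur.fetchone()          # FETCH: the first row, or None if there is none
    finally:
        cur.close()                   # CLOSE
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_one (id INTEGER, fieldone TEXT)")
conn.executemany(
    "INSERT INTO tbl_one VALUES (?, ?)", [(1, "a"), (1, "b"), (2, "c")]
)

print(fetch_first(conn, "SELECT fieldone FROM tbl_one WHERE id = ?", (3,)))  # None
# Two rows match id = 1; only an ORDER BY makes "first" well defined.
print(fetch_first(
    conn, "SELECT fieldone FROM tbl_one WHERE id = ? ORDER BY fieldone", (1,)
))  # a
```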
So the question is really, what do YOU want to happen in the event of a no data found or too many rows situation. If you are certain that will never happen, or don't know how to handle it, then go with option 1.
If you do expect a no data found situation only, then go with the implicit cursor but add an exception handler.
If you expect multiple rows, then use either the implicit cursor with an exception handler, or a SELECT ... BULK COLLECT INTO or a cursor FOR loop if you actually need to process the multiple rows.
If you are going to select multiple fields, it can be useful to define an explicit cursor and use a %ROWTYPE declaration (e.g. rec L1%ROWTYPE) to declare a record holding all the necessary fields.
From a performance point of view, there's no difference.
From a maintainability point of view, some people like their SELECT 'in-line' with their code (so prefer the implicit cursor). I prefer mine 'out of the way', especially if there is a big column list, so I like explicit cursors.
I don't know what question you're asking, but here goes.
Should you use PL/SQL like this?
declare
myvar varchar2(50);
begin
select fieldone
into myvar
from tbl_one;
end;
/
Well, you can if and only if you know that the select statement can return exactly one row; alternatively, you need error handling for the TOO_MANY_ROWS and NO_DATA_FOUND exceptions which would be raised otherwise.
When using explicit cursors (i.e., the CURSOR keyword), there are several operations against it which control its behavior.
declare
myvar varchar2(50);
CURSOR L1 IS
SELECT fieldone FROM tbl_one ;
begin
OPEN L1;
FETCH L1 into myvar;
CLOSE L1;
end;
/
CURSOR L1... is the cursor's declaration. It's nothing more than binding the static SQL statement, and all the PL/SQL engine does is check that the SQL is syntactically and contextually valid - are there missing clauses? Can this user SELECT from this table?
OPEN L1 opens the cursor, establishing the exact point in the history of the system which the results will reflect. Any subsequent FETCHes against that cursor will reflect the data as of that precise point.
FETCH L1... actually returns the first/next row of that result set, whatever it is, into the variables you've specified. It could be a record declared, or it could be list of variables.
CLOSE L1... frees any resources your cursor has open; for example, insert/update/delete operations that affect the records generate undo that your user session has a declared read interest in, so that undo can't be freed or reused until you've closed your cursor.
Generally, the less code you write, the more robust your solution is. This is why we don't favor assembler level languages anymore.
Joe Celko has made this argument against cursors eloquently in one of his books:
Cursor is a way to transform query result set into a stream that can be processed in a host 3GL language. Cursors are not compatible between different RDBMS vendors, and generally work slower than declarative SQL queries. Then, why bother? Mainly because of ignorance of database fundamentals and old habits. Here is detailed analogy:
ALLOCATE = turn tape recorder power on, assign channel
DECLARE CURSOR FOR ... = mount the tape and declare file
OPEN = open the file
FETCH INTO = read in the program records one by one, move the head
CLOSE = close file, dismount tape from the recorder
DEALLOCATE = free tape recorder channel, power off
This quote reads especially funny in the Russian translation of his "SQL Programming Style", where "tape recorder" comes out as "audio tape deck". And the suggestion that somebody still operates a tape deck in a world overrun by iPods, iPhones and the like is hilarious.
Then, Joe goes through an anecdotal case of a newbie programmer who set up three cursors working together to iterate through master-detail data, pick out the matching records, and update them. Eventually, 250 lines of code were thrown away in favor of a single SQL UPDATE statement.
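The point is easy to see in miniature. The sketch below uses Python's sqlite3 with hypothetical master/detail tables orders and order_lines; it applies the same change twice, once by walking a cursor over the master rows and updating each detail set, once with a single set-based UPDATE, and checks that both databases end up identical. The cursor reads orders while the updates touch order_lines, so the open cursor never sees its own writes.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, flagged INTEGER)")
    conn.execute("CREATE TABLE order_lines (order_id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
    conn.executemany(
        "INSERT INTO order_lines VALUES (?, 'open')", [(1,), (1,), (2,), (3,)]
    )
    return conn


def ship_with_cursor(conn):
    """Procedural style: walk the flagged master rows, update each detail set."""
    master = conn.execute("SELECT id FROM orders WHERE flagged = 1")
    for (order_id,) in master:
        conn.execute(
            "UPDATE order_lines SET status = 'shipped' WHERE order_id = ?",
            (order_id,),
        )


def ship_set_based(conn):
    """Declarative style: one statement describes the whole change."""
    conn.execute(
        "UPDATE order_lines SET status = 'shipped' "
        "WHERE order_id IN (SELECT id FROM orders WHERE flagged = 1)"
    )


def snapshot(conn):
    return conn.execute(
        "SELECT order_id, status FROM order_lines ORDER BY order_id, rowid"
    ).fetchall()


a, b = make_db(), make_db()
ship_with_cursor(a)
ship_set_based(b)
print(snapshot(a) == snapshot(b))  # True
```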
Personally, I'd go for the first version whenever possible. Keep it simple: one statement instead of five, and more readable.
In Oracle (unlike SQL Server) all SELECTs are cursors - all declaring a cursor does is get you a handle that you can then use to manipulate it. The execution plan will be identical in both of your cases.