I am writing a PL/SQL function that processes table rows individually. I pass it a key. What is the fastest way to check whether that row has already been processed and, if so, ignore it? It may sound stupid, but please assume that it always tries to process all the rows in the table (mainly because it does other things too).
One solution I had was to create a flag column on that table (the fastest I can think of); another was to insert a record into another table and check whether the row is already in that table (probably slower).
Assuming you need to be using a PL/SQL function, you should only pass into it the rowset that it needs to handle. That means using plain SQL to select the rows from the table you need and pass that to the function. In any case though, you should look very carefully at what you're doing whenever you end up having to use a cursor in a database environment, because that's not really what databases are optimized for.
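For the flag-column idea from the question, a minimal sketch might look like this (the PROCESSED column, MY_TABLE, MY_FUNCTION and the ID column are illustrative names, not from the original post):

DECLARE
  l_result NUMBER;
BEGIN
  FOR r IN (SELECT t.*, t.ROWID AS rid
              FROM my_table t
             WHERE t.processed = 'N')   -- only rows not yet handled
  LOOP
    l_result := my_function(r.id);      -- your existing per-row work
    UPDATE my_table
       SET processed = 'Y'
     WHERE ROWID = r.rid;               -- mark the row as done
  END LOOP;
  COMMIT;
END;
/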
Thought this would be a good place to ask for some "brainstorming." Apologies if it's a little broad/off subject.
I was wondering if anyone here had any ideas on how to approach the following problem:
First assume that I have a select statement stored somewhere as an object (this can be the tree form of the query). For example (for simplicity):
SELECT A, B FROM table_A WHERE A > 10;
It's easy to determine that the statement below would change the result of the query above:
INSERT INTO table_A (A,B) VALUES (12,15);
But, given any possible Insert/Update/Whatever statement, as well as any possible starting Select (we know the Selects and can analyze them all day), I'd like to determine whether it would affect the result of the Select statement.
It's fine to assume that there won't be any "outside" queries, and that we know about all the queries being sent to the DB. It is also assumed we know the DB schema.
No, this isn't for homework. Just a brain teaser I've been thinking about and started to get stuck on (obviously, SQL can get very complicated.)
Based on the reply to the comment, I'd say that without additional criteria, this ranges between very hard and impossible.
Very hard (leastways, it would be for me) because you'd have to write something to parse and interpret your SQL statements into a workable frame of reference for your goals. Doable, but would it be worth the effort?
Impossible because some queries transcend phrases like "Byzantinely complex". (Think nested queries, correlated subqueries, views, common table expressions, triggers, outer joins, and who knows what all.) Without setting criteria such as "no subqueries, no views or triggers, no more than X joins" and so forth, the problem becomes open-ended enough to warrant an "NP-complete" answer.
My first thought would be to put a trigger on table_A, so that if any of the columns you're affecting (col A in this case) changes so that it meets (or no longer meets) the condition (> 10 here), the trigger records that an "affecting" change has taken place.
E.g. have another little table to record a "last update timestamp", which the trigger could pop a getdate() into when it detects such a change.
Then, you could check that table to see if the timestamp has changed since the last time you ran the select query - if it has, then you know you need to re-run it, if it hasn't, then you know the results would be the same.
The table could hold many such timestamps (one per row, perhaps with the table/trigger name as a key value in another column) to service many such triggers.
Advantage? Being done in a trigger on the table means there's no risk of missing a change that could affect the select statement.
Disadvantage? I guess depending on how your select statements come into existence, you might have an undesirable/unmanageable overhead in creating the trigger(s).
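A minimal T-SQL sketch of such a trigger, assuming the A > 10 example from above (the QueryInvalidation table and the query name are illustrative):

CREATE TRIGGER trg_tableA_invalidate ON table_A
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    -- Record a change only if a row entered or left the A > 10 result set
    IF EXISTS (SELECT 1 FROM inserted WHERE A > 10)
       OR EXISTS (SELECT 1 FROM deleted WHERE A > 10)
    BEGIN
        UPDATE dbo.QueryInvalidation
        SET LastChange = GETDATE()
        WHERE QueryName = 'select_A_gt_10';
    END
END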
I have 2 tables: a temporary table with raw data, in which rows may repeat (appear more than once), and a target table with actual data, in which every row is unique.
I'm transferring rows using a cursor. Inside the cursor I use a MERGE statement. How can I print to the console, using DBMS_OUTPUT.PUT_LINE, which rows are updated and which are deleted?
According to the official documentation there is no such feature for this statement.
Is there any workaround?
I don't understand why you would want to do this. The output of DBMS_OUTPUT requires someone to be there to look at it. Not only that, it requires someone to look through all of the output; otherwise it's pointless. If there are more than, say, 20 rows, no one will be bothered to do so. If no one looks through all the output to verify it, but you need to actually log it, then you are actively harming yourself by doing it this way.
If you really need to log which rows are updated or deleted there are a couple of options; both involve performance hits though.
You could switch to a BULK COLLECT, which enables you to create a cursor holding the ROWIDs of the temporary table. You BULK COLLECT a JOIN of your two tables into this, update or delete from the target table based on ROWID according to your business logic, then update the temporary table with a flag of some kind to indicate the operation performed (sketched after these options).
You create a trigger on your target table which logs what's happening to another table.
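A minimal sketch of the first option (tmp_raw, target, their columns and the flag column are all illustrative names):

DECLARE
  TYPE t_rid_tab IS TABLE OF ROWID;
  TYPE t_id_tab  IS TABLE OF tmp_raw.id%TYPE;
  l_rids t_rid_tab;
  l_ids  t_id_tab;
BEGIN
  -- Collect the ROWIDs of temp rows that match an existing target row
  SELECT t.ROWID, t.id
    BULK COLLECT INTO l_rids, l_ids
    FROM tmp_raw t
    JOIN target g ON g.id = t.id;

  -- Apply your update logic set-wise against the target table
  FORALL i IN 1 .. l_ids.COUNT
    UPDATE target g
       SET g.val = (SELECT MAX(val) FROM tmp_raw WHERE id = l_ids(i))
     WHERE g.id = l_ids(i);

  -- Flag the source rows so you know what was done to each
  FORALL i IN 1 .. l_rids.COUNT
    UPDATE tmp_raw SET action_flag = 'U' WHERE ROWID = l_rids(i);
END;
/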
In reality unless it is important that the number of updates / deletes is known then you should not do anything. Create your MERGE statement in a manner that ensures that it errors if anything goes wrong and use the error logging clause to log any errors that you receive. Those are more likely to be the things you should be paying attention to.
Previous posters already said that this approach is suspicious, both because of the cursor/loop and the output log for review.
On SQL Server, there is an OUTPUT clause in the MERGE statement that allows you to insert a row into another table with the $action taken (insert, update, delete) and any columns you want from the inserted or deleted/overwritten data. This lets you summarize exactly as you asked.
The equivalent Oracle RETURNING clause may not work for MERGE but does for UPDATE and DELETE.
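For illustration, a SQL Server MERGE with OUTPUT might look like this (the table and column names, including the merge_log logging table, are hypothetical):

MERGE target AS t
USING (SELECT DISTINCT id, val FROM tmp_raw) AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.val = s.val
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, ISNULL(inserted.id, deleted.id)
INTO merge_log (action_taken, id);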
From the MSDN docs for create function:
User-defined functions cannot be used to perform actions that modify the database state.
My question is simply - why?
Yes, a UDF that modifies data may have potentially unwanted side-effects.
Yes, there is overhead involved if a UDF is called thousands of times.
But that is the whole point of design and testing - to ensure that such issues are ironed out before deployment. So why do DB vendors insist on imposing these artificial limitations on developers? What is the point of a language construct that can essentially only be used as a wrapper for select statements?
The reason for this question is as follows: I am writing a function to return a GUID for a certain unique integer ID. If a GUID is already allocated for that ID I simply return it; otherwise I want to generate a new GUID, store that into a table, and return the newly-generated GUID. (Yes, this sounds long-winded and possibly crazy, but when you're sending data to another dev company who believes their design was handed down by God and cannot be improved upon, it's easier just to smile and nod and do what they ask).
I know that I can use a stored procedure with an output parameter to achieve the same result, but then I have to declare a new variable just to hold the result of the sproc. Not only that, I then have to convert my simple select into a while loop that inserts into a temporary table, and call the sproc for every iteration of that loop.
It's usually best to think of the available tools as a spectrum, from Views, through UDFs, out to Stored Procedures. At the one end (Views) you have a lot of restrictions, but this means the optimizer can actually "see through" the code and make intelligent choices. At the other end (Stored Procedures), you've got lots of flexibility, but because you have such freedom, you lose some abilities (e.g. because you can return multiple result sets from a stored proc, you lose the ability to "compose" it as part of a larger query).
UDFs sit in a middle ground - you can do more than you can in a view (multiple statements, for example), but you don't have as much flexibility as a stored proc. By giving up this freedom, UDFs allow their outputs to be composed as part of a larger query. By not having side effects, you guarantee that, for example, it doesn't matter in which order the UDF is applied to rows. If you could have side effects, the optimizer might have to give an ordering guarantee.
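To make the composability point concrete, here's a hedged illustration (the function and table names are hypothetical): an inline table-valued function can be folded into a larger query much like a view, which a multi-result-set stored procedure never could be:

SELECT o.OrderId, t.Total
FROM Orders o
CROSS APPLY dbo.fnOrderTotal(o.OrderId) AS t;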
I understand your issue, I think, but taking this from your comment:
I want to do something like select my_udf(my_variable) from my_table, where my_udf either selects or creates the value it returns
So you want a select that (potentially) modifies data. Can you look at that sentence on its own and tell me it reads perfectly OK? I certainly can't.
Reading your description of what you actually need to do:
I am writing a function to return a GUID for a certain unique integer ID. If a GUID is already allocated for that ID I simply return it; otherwise I want to generate a new GUID, store that into a table, and return the newly-generated GUID.
I know that I can use a stored procedure with an output parameter to achieve the same result, but then I have to declare a new variable just to hold the result of the sproc. Not only that, I then have to convert my simple select into a while loop that inserts into a temporary table, and call the sproc for every iteration of that loop.
from that last sentence it sounds like you have to process many rows at once, so how about a single INSERT that inserts the GUIDs for those IDs that don't already have them, followed by a single SELECT that returns all the GUIDs that (now) exist?
Sometimes if you cannot implement the solution you came up with, it may be an indication that your solution is not optimal.
Using a statement like this
INSERT INTO IntGuids(IntValue, GuidValue)
SELECT MyIntValues.IntValue, NEWID()
FROM MyIntValues
LEFT OUTER JOIN IntGuids ON MyIntValues.IntValue = IntGuids.IntValue
WHERE IntGuids.IntValue IS NULL
creates all the GUIDs you need in one statement. No need to SELECT+INSERT for every single value.
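The matching second step, using the same table names as above, is the single SELECT the previous answer described, which now finds a GUID for every ID:

SELECT m.IntValue, g.GuidValue
FROM MyIntValues m
INNER JOIN IntGuids g ON g.IntValue = m.IntValue;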
I have a table (table_a) that, upon insert, needs to retrieve the next available id from the available_id field in another table (table_b) to use as the primary key in table_a, and then increment the available_id field in table_b by 1. While doing this via stored procedures is easy, I need to be able to have this occur on any insert into the table.
I know I need to use triggers, but I am unsure how to code this. Any advice?
Basically this is my dilemma:
I need to ensure 2 different tables have unique IDs throughout. What would be the best way to do this without using GUIDs? (Some of this code cannot be controlled on our end and requires ints as IDs.)
My advice is DON'T! Use an identity field instead.
In the first place, inserts can have multiple records, so a trigger to do this properly would have to account for that, making it rather tricky to write. It would have to be an INSTEAD OF trigger, which is also tricky, as you wouldn't have one of the required values (I assume your ID field is required) in the initial insert. In the second place, two inserts going on at the same time could try to pick the same number, or one could lock the second connection for a good bit of time if you are doing a large import of data in one connection.
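If cross-table uniqueness is the only requirement, one common workaround (my illustration, not part of the original answer) is to interleave the identity ranges with the seed and increment arguments:

-- table_a takes the odd IDs, table_b the even ones
CREATE TABLE table_a (id int IDENTITY(1, 2) PRIMARY KEY, payload varchar(50));
CREATE TABLE table_b (id int IDENTITY(2, 2) PRIMARY KEY, payload varchar(50));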
You could use an Oracle-style sequence, described here, calling it either via a trigger or from your application (providing the resulting value to your insert routine):
http://www.sqlteam.com/article/custom-auto-generated-sequences-with-sql-server
He mentions these issues to consider:
• What if two processes attempt to add a row to the table at the exact same time? Can you ensure that the same value is not generated for both processes?
• There can be overhead querying the existing data each time you'd like to insert new data.
• Unless this is implemented as a trigger, this means that all inserts to your data must always go through the same stored procedure that calculates these sequences. This means that bulk imports, or moving data from production to testing and so on, might not be possible or might be very inefficient.
• If it is implemented as a trigger, will it work for a set-based multi-row INSERT statement? If so, how efficient will it be? This function wouldn't work if called for each row in a single set-based INSERT -- each NextCustomerNumber() returned would be the same value.
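A minimal sketch of the pattern the article describes (the names are mine, not the article's; the single-statement UPDATE is what avoids the two-process race in the first bullet):

CREATE TABLE dbo.Sequences (Name varchar(50) PRIMARY KEY, NextValue int NOT NULL);
GO
CREATE PROCEDURE dbo.GetNextSequenceValue
    @Name varchar(50), @Value int OUTPUT
AS
    -- All SET clauses read the pre-update row, so @Value receives the old
    -- NextValue while the column is atomically incremented
    UPDATE dbo.Sequences
    SET @Value = NextValue, NextValue = NextValue + 1
    WHERE Name = @Name;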
Inside of a stored procedure, I populate a table of items (#Items). Just basic information about them. However for each item, I need to make sure I can sell them, and in order to do that I need to perform a whole lot of validation. In order to keep the stored procedure somewhat maintainable, I moved the logic into another stored procedure.
What would be the best way to call the stored procedure for each item in the temp table?
The way I have it now, I apply an identity column and then just do a while loop, executing the stored procedure for each row and inserting the validation result into a temporary table (#Validation).
However now that logic has changed, and in between the creation of #Items and the execution of the loop, some records are deleted which screws up the while loop, since the Identity no longer equals the counter.
I could handle that by dropping the identity column and reapplying it before the while loop, but I was just wondering if there was a better way. Is there a way to get a specific row at an index if I apply an order by clause?
I know I could use a cursor, but those are a pain in the ass to me. Also, performance is somewhat of a concern: would a fast-forward, read-only cursor be a better option than a while loop? The number of rows in the #Items table isn't that large, maybe 50 at most, but the stored procedure is going to be called quite frequently.
1. Turn your validation stored procedure into a user-defined function that accepts an item ID, or the data columns needed to validate an item record.
2. Create the temp table.
3. Insert all your items.
4. Write a delete query for the temp table that calls your new UDF in the WHERE clause.
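For instance, with a hypothetical scalar UDF fnIsSellable that returns 1 for sellable items (the function and column names are illustrative):

DELETE FROM #Items
WHERE dbo.fnIsSellable(ItemId) = 0;  -- drop everything that fails validation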
I agree that if you can do it set-based then do it that way. Perhaps put the validation into a user-defined function instead of a sproc to enable that, which may pave the way for you to do it set-based.
e.g.
SELECT * FROM SomeTable WHERE dbo.fnIsValid(dataitem1, dataitem2....) = 1
However, I know this is may not be possible depending on your exact scenario, so...
Correction edit based on now understanding the IDENTITY/loop issue:
You can use ROW_NUMBER() in SQL 2005 to get the next row. It doesn't matter whether there are gaps in the IDENTITY field, as this assigns a row number to each record, ordered by whatever you tell it:
-- Gets next record (assumes @Counter is your loop variable)
SELECT * FROM
(
SELECT ROW_NUMBER() OVER(ORDER BY IDField ASC) AS RowNo, *
FROM #temptable
) s
WHERE s.RowNo = @Counter
Does this kind of business logic really have to be in the database?
I don't know much about your scenario, but maybe it would be best to move that decision you're trying to model with SPs into the application?
So you might try to use a function instead of a stored procedure for that logic, and include the result of this function as a column in your temporary table. Would that work for you? Or, if you need the data in real time every time you use it later, then a function returning 0/1 values, included in the select list, could be a good bet anyway.
Is it possible to rewrite your stored procedure logic as a query, i.e. using a set-based approach? You should try that first.