How to discard / ignore one of a stored procedure's many return values - sql

I have a stored procedure that returns 2 values.
In another procedure, I call this (edit: NOT selectable) procedure but only need one of the two returned values.
Is there a way to discard the other value? I'm wondering what is a good practice, and hoping for a small performance gain.
Here is how I call the procedure without error:
CREATE or ALTER procedure my_proc1
as
declare variable v_out1 integer default null;
declare variable v_out2 varchar(10) default null;
begin
execute procedure my_proc2('my_param')
returning_values :v_out1, :v_out2;
end;
That is the only way I found to call this procedure without getting a -607 error 'unsuccessful metadata update request depth exceeded. (Recursive definition?)' whenever I use only one variable v_out1.
So my actual question is: can I avoid creating a v_out2 variable for nothing, as I will never use it (that value is only used in other procedures which also call my_proc2)?
Edit: the stored procedure my_proc2 is actually not selectable. But I made it selectable after all.

Because your stored procedure is selectable, you should call it by SELECT statement, ie
select out1, out2 from my_proc2('my_param')
and in that case you can indeed omit some of the return value(s). However, I wouldn't expect noticeable performance gain as the logic inside the SP which calculates the omitted field is still executed.
If your procedure is not selectable, then creating a wrapper SP is the only way, but again, it woulnd't give any performance gain as the code which does the hard work inside the original SP is still executed.

The answer is made to use text formatting while demonstrating "race conditions" in the multithreading programming (which SQL is) when [ab]using out-of-transaction objects (SQL sequences aka Firebird Generators).
So, the "use case".
Initial condition: table is empty, generator=0.
You start two concurrent transactions, A and B. For ease of imagining you may think those transactions were started from concurrent connections made by two persons working with your program on two networked computers. Though actually it does not matter much, if you open them transactions from one same connection - the scenario would not change a bit. Just for the ease of imagining.
The Tx.A issues UPDATE-OR-INSERT which inserts new row into the table. Doing so it up-ticks the generator. The transaction is not committed yet. Database condition: the table has one invisible (non-committed) row with auto_id=1, the generator = 1.
The Tx.B issues UPDATE-OR-INSERT too which inserts yet another row into the table. Doing so it also up-ticks the generator. The transaction maybe commits now, or maybe later, irrelevant. Database condition: the table has two rows (one or both are invisible (non-committed)) with auto_id=1 and auto_id=2, the generator = 2.
The Tx.A meets some error, throws the exception, DOWNTICKS the generator and rolls back. Database condition: the table has one row with auto_id=2 the generator = 1.
If Tx.B was not committed before, it is committed now. (this "if" just to demonstrate that it does not matter when other transactions would be committed, earlier or later, it only matters that Tx.A downticks the generator after any other transaction upticked it)
So, the final database condition: the table has one committed=visible row with auto_id=2 and the generator = 1.
Any next attempt to add yet one more row would try to up the generator 1+1=2 and then fail to insert new row with PK violation, then it would down the generator to 1 to recreate the faulty condition outlined above.
Your database stuck and without direct intervention by DB Administrator can not have data added further.
The very idea of rolling back the generator is defeating all intentions generators were created for and all expectations about generators behavior that the database and connection libraries and other programmers have.
You just placed a trap on the highway. It is only a matter of time until someone will be caught into it.
Even if you would continue guarding this hack by other hacks for now - wasting a lot of time and attention to do that scrupulously and pervasively - still one unlucky day in the future there would be another programmer, or even you would forget this gory details - and you would start using the generator in standard intended way - and would run into the trap.
Generators were not made to be backtracked during normal work.
existence of primary key is checked in the procedure before doing anything
Yep, that is the first reaction when multithreading programmer meets his first race condition. Let's just add more prior checks.
First few checks indeed can decrease probability of a clash, but it never can alleviate it completely. And the more use your program would see, the more transactions would get opened by more and more concurrent and active users - it is only a matter of time until this somewhat lowered probability would turn out still too much.
Think about it, SQL is about transactions, yet they had to invent and introduce explicitly out-of-transactions device Generator/Sequence is. If there was reliable solution without them - it would be just used instead of creating that so non-SQLish transaction boundary breaking tool.
When you say your SP "checks for PK violation" it is exactly the same as if you would drop the generator altogether and instead just issue "good old"
:new_id = ( select max(auto_id)+1 from MyTable );
By your description you actually do something like that, but in some indirect way. Something like
while exists( select * from MyTable where auto_id = gen_id(MyGen, +1))
do ;
:new_id = gen_id(MyGen, 0);
You may feel, that because you mentioned generators, you somehow overcame the cross-transaction invisibility problem. But you did not, because the very check "if PK was already taken" is done against in-transaction table.
That changes nothing, your two transactions Tx.A and Tx.B would not see each other's records, because they both did not committed yet. Now it only takes some unlucky Tx.C that would fail and downtick the generator to them collide on the same ID.
Or not, you do not even need Tx.C and downticking at all!
Here we bump into the multithreading idea about "atomic operations".
Let's look at it again.
while exists( select * from MyTable where auto_id = gen_id(MyGen, +1))
do ;
:new_id = gen_id(MyGen, 0);
In a single-threaded application that code is okay: you just keep running the generator up until the free slot, then you just query the value without changing it. "What could possibly go wrong?" But in multithreaded environment it is rooks waiting to be stepped over. Example:
Initial condition, table has 100 rows (auto_id goes from 1 to 100), the generator = 100.
Tx.A starts adding the row, upticks the generator in the while loop and exits the loop. It does not yet pass to the second line where local variable gets assigned. Not yet. The generator = 101, rows not added yet.
Tx.B starts adding the row, upticks the generator in the while loop and exits the loop. The generator = 102, rows not added yet.
Tx.A goes to the second line and reads gen_id(MyGen,0) into a variable for new row. While it was 101 out of the loop, it is 102 now!
Tx.B goes to the second line and reads gen_id(MyGen,0) and gets 102 too.
Tx.A and Tx.B both try to insert new row with auto_id=102
RACE CONDITIONS - both Tx.A and Tx.B try to commit their work. One of them succeeds, another fails. Which one? It is not predictable. A lucky one commits, an unlucky one fails.
The failed transaction downticks the generator.
Final condition: the table has 101 rows, the auto_id consistently goes from 1 to 100 and then skips to 102. The generator = 101, which his less than MAX(auto_id)
Now you might want to add more hacks, I mean more prior checks before actually inserting rows and committing. It will make mistakes yet less probable, right? Wrong. The more checks you do - the slower gets the code. The slower gets the code - the greater gets probability, that while one thread runs throw all them checks there happens another thread that interferes and alters the situation that was checked a moment ago.
The fundamental issue with multithreading is that any check is SEPARATE action. And between those actions the situation MAY change. Your procedure may check whatever it wants BEFORE actually inserting the row. It would not warrant much. Because when you finally gets at the row inserting statement, all the checks you did in the PAST are a matter of past. And the situation is potentially already altered. And warrants your checks were giving in the PAST only belong to that past, not to the moment at hands.
And even if you no more look for warranting sure thing, still adding every new check you can not even be sure if doing so you just decreased or increased probability of failure. Because multithreading is a bitch, it is flowing chaotically out of your control.
So, remember the KISS principle. Until proven otherwise - you most probably do not need SP2 at all, you only need one single UPDATE-OR-INSERT statement.
PS. There was a pretty fun game in my school days, it was called Pascal Robots. There are also C Robots I heard and probably implementation for other many languages. With Pascal Robots though came a number of already coded robots, demonstrating different strategies and approaches. Some of them were really thought out in very intrinsic details. And there was one robot which program was PRIMITIVE. It only had two loops: if you do not see an enemy - keep turning your radar around, if you do see an enemy - keep running to it and shooting at it. That was all. What could this idiot do against sophisticated robots having creative attack and defense strategies, flanking maneuvers, optimal distance to maintain by back and forth movements, escape tricks and more? Those sophisticated robots employed very extensive checks and very thought through hacks to be triggered by those checks. So... ...so that primitive idiot was second or maybe third best robot in the shipped set. there was only one or two smarties who could outwit it. With ALL the other robots this lean-and-fast idiot finished them before they could run through all their checks and hacks thrice. That is what multithreading does to programming. It was astonishing to watch those battles, which went so against out single-threaded intuition.

Related

Sequence number field with concurrent writes

Imagine a rail track, and my goal is to store every railcar that exists on that track. Each railcar has a position. Say there are 100 railcars on the track, so each railcar object would have a TrackPosition from 1-100.
That is essentially what we are doing right now, with a Track having child Railcars, and each Railcar has an integer TrackPosition.
When a new railcar is added, we simply take the # of cars in the track + 1 to save the position of the new car.
We are running into issues in a few different areas:
We would like to add cars concurrently using AWS Lambda. This presents a problem as two functions could hit the line of logic that calculates "total cars on track + 1" at the same time. When they go to save, both cars would have the same position. Locking that bit of code is not possible within AWS Lambda (as far as I can tell from what I've read). We've resolved this for the time being by setting the Lambdas to fire synchronously (concurrency set to 1), obviously not ideal for performance.
We would like to add a car into the middle of a track. This would involve taking any car with a greater position and incrementing them all. Not difficult to write some code to do this, but..
I'm wondering if I'm missing something fundamental within SQL that can achieve what I am after in a less error-prone way. The way I'm doing it seems naive. I've looked into Sequences, but I'm not sure if they would solve my concurrency issue.
Any insight would be greatly appreciated. We are using Entity Framework Core 2 with SQL Server.

Negamax: what to do with "partial" results after canceling a search?

I'm implementing negamax with alpha/beta transposition table based on the pseudo code here, with roughly this algorithm:
NegaMax():
1. Transposition Table lookup
2. Loop through moves
2a. **Bail if I'm out of time**
2b. Make move, call -NegaMax, undo move
2c. Update bestvalue, alpha/beta but if appropriate
3. Transposition table store/update
4. Return bestvalue
I'm also using iterative deepening, calling NegaMax with progressively higher depths.
My question is: when I determine I've run out of time (2a. in the beginning of move loop) what is the right thing to do? Do I bail immediately (not updating the transposition table) or do I just break the loop (saving whatever partial work I've done)?
Currently, I return null at that point, signifying that the search was canceled before "completing" that node (whether by trying every move or the alpha/beta cut). The null gets propagated up and up the stack, and each node on the way up bails by return, so step 3 never runs.
Essentially, I only store values in the TT if the node "completed". The scenario I keep seeing with the iterative deepening:
I get through depths 1-5 really quick, so the TT has a depth = 5, type = Exact entry.
The depth = 6 search is taking a long time, so I bail.
I ultimately return the best move in the transposition table, which is the move I found during the depth = 5 search. The problem is, if I start a new depth = 6 search, it feels like I'm starting it from scratch. However, if I save whatever partial results I found, I worry that I'll have corrupted my TT, potentially by overwriting the completed depth = 5 entry with an incomplete depth = 6 entry.
If the search wasn't completed, the score is inaccurate and should likely not be added to the TT. If you have a best move from the previous ply and it is still best and the score hasn't dropped significantly, you might play that.
On the other hand, if at depth 6 you discover that the opponent has a mate in 3 (oops!) or could win your queen, you might have to spend even more time to try to resolve that.
That would leave you with less time for the remaining moves (if any...), but it might be better to be slightly short on time than to get mated with plenty of time remaining. :-)

Write to the same row at the same time without locking?

What I need to do is to write to the same row from two different sources (procedures/methods/services).
The first call that comes in creates the row, and the next one just updates it.
This needs to happen without any locking taking place. And if possible I would like to be able to call either source just once (not repeatedly by dealing with locking errors)
Here is kinda what I have now in a third procedure that the others call and just inserts a row (only inserts into the xyz) or returns true if there is a row.
That way it´s just fast and unlikely that both calls arrive at the same time.
IF EXISTS(SELECT * FROM [dbo].[Wait] WHERE xyx= #xyz)
BEGIN
-- The row exists because the other datasource
-- has allready inserted a row with the same xyz
-- UPDATE THE ROW WITH DATA COMING IN
END
ELSE
BEGIN
-- No row with value xyz exists so we INSERT it with
-- the extra data.
END
I know it does´t guaranty no lock. But in my case it´s actually unlikely that both arrive at the same time and even if they would it´s user controlled so they will get an error and will just try again. BUT I wan´t to solve this.
I have been seeing Row Versioning popping up but I´m not sure if that helps or how I should use it.
Have a look at Michael J Swarts' article Mythbusting: Concurrent Update/Insert Solutions. This will show you all possible do's and don'ts. Including the fact that merge actually doesn't do a great job in solving concurrency issues.

Memory efficiency in If statements

I'm thinking more about how much system memory my programs will use nowadays. I'm currently doing A level Computing at college and I know that in most programs the difference will be negligible but I'm wondering if the following actually makes any difference, in any language.
Say I wanted to output "True" or "False" depending on whether a condition is true. Personally, I prefer to do something like this:
Dim result As String
If condition Then
Result = "True"
Else
Result = "False"
EndIf
Console.WriteLine(result)
However, I'm wondering if the following would consume less memory, etc.:
If condition Then
Console.WriteLine("True")
Else
Console.WriteLine("False")
EndIf
Obviously this is a very much simplified example and in most of my cases there is much more to be outputted, and I realise that in most commercial programs these kind of statements are rare, but hopefully you get the principle.
I'm focusing on VB.NET here because that is the language used for the course, but really I would be interested to know how this differs in different programming languages.
The main issue making if's fast or slow is predictability.
Modern CPU's (anything after 2000) use a mechanism called branch prediction.
Read the above link first, then read on below...
Which is faster?
The if statement constitutes a branch, because the CPU needs to decide whether to follow or skip the if part.
If it guesses the branch correctly the jump will execute in 0 or 1 cycle (1 nanosecond on a 1Ghz computer).
If it does not guess the branch correctly the jump will take 50 cycles (give or take) (1/200th of a microsecord).
Therefore to even feel these differences as a human, you'd need to execute the if statement many millions of times.
The two statements above are likely to execute in exactly the same amount of time, because:
assigning a value to a variable takes negligible time; on average less than a single cpu cycle on a multiscalar CPU*.
calling a function with a constant parameter requires the use of an invisible temporary variable; so in all likelihood code A compiles to almost the exact same object code as code B.
*) All current CPU's are multiscalar.
Which consumes less memory
As stated above, both versions need to put the boolean into a variable.
Version A uses an explicit one, declared by you; version B uses an implicit one declared by the compiler.
However version A is guaranteed to only have one call to the function WriteLine.
Whilst version B may (or may not) have two calls to the function WriteLine.
If the optimizer in the compiler is good, code B will be transformed into code A, if it's not it will remain with the redundant calls.
How bad is the waste
The call takes about 10 bytes for the assignment of the string (Unicode 2 bytes per char).
But so does the other version, so that's the same.
That leaves 5 bytes for a call. Plus maybe a few extra bytes to set up a stackframe.
So lets say due to your totally horrible coding you have now wasted 10 bytes.
Not much to worry about.
From a maintainability point of view
Computer code is written for humans, not machines.
So from that point of view code A is clearly superior.
Imagine not choosing between 2 options -true or false- but 20.
You only call the function once.
If you decide to change the WriteLine for another function you only have to change it in one place, not two or 20.
How to speed this up?
With 2 values it's pretty much impossible, but if you had 20 values you could use a lookup table.
Obviously that optimization is not worth it unless code gets executed many times.
If you need to know the precise amount of memory the instructions are going to take, you can use ildasm on your code, and see for yourself. However, the amount of memory consumed by your code is much less relevant today, when the memory is so cheap and abundant, and compilers are smart enough to see common patterns and reduce the amount of code that they generate.
A much greater concern is readability of your code: if a complex chain of conditions always leads to printing a conditionally set result, your first code block expresses this idea in a cleaner way than the second one does. Everything else being equal, you should prefer whatever form of code that you find the most readable, and let the compiler worry about optimization.
P.S. It goes without saying that Console.WriteLine(condition) would produce the same result, but that is of course not the point of your question.

Can I avoid locking rows for unnecessary amounts of time [in Django]?

Take the following code for example (ignore the lack of usefulness in its functionality, as it's just a simple example to include the things I need):
#transaction.commit_on_success
def test_for_success(username)
person = Person.objects.select_for_update().get(username=username)
response = urllib2.urlopen(URL_TO_SOME_SLOW_API, some_data)
if '<YES>' in response.read():
person.successes += 1
person.save()
My question pertaining the example has to do with when the queries hit the database. Clearly the first query will lock the Person row, and then I'm calling a slow API, which could take 3 seconds to respond, causing the row to be locked for 3 seconds. Am I understanding this correctly, and in the case of slow API hits happening in my transaction, if I move the location of my queries so that a SELECT FOR UPDATE doesn't happen until after all the slow API requests, will this have the seemingly obvious effect of not locking my rows for seconds at a time (the case for select_for_update in my application is unavoidable)? Or, am I misunderstanding, and somehow none of the SQL actually hits the database until the end of the transaction?
Your assumptions about your code are correct. If you look at the select_for_update() docs, this action does lock those rows in the database until they are unlocked. This would in effect lock out for the duration of your urllib request.
If you were to move the database call into the conditional after the request, you are correct again that the database would be locked for a much shorter amount of time (though if that is called alot will still have some clients who block on the call due to contention).