How to nest stored procedures that use one stream in Snowflake?

I have the following architecture:
Main stored procedure main_sproc
Nested stored procedure nested_sproc
The task is to process data from a stream in Snowflake, but we have to do it using a nested approach.
Thus this would be the code:
create or replace procedure nested_sproc()
returns varchar
language sql
as
begin
    begin transaction;
    insert into table1
    select * from stream; -- it will be more complex than this and multiple statements
    commit;
end;
create or replace procedure main_sproc()
returns varchar
language sql
as
begin
    begin transaction;
    insert into table1
    select * from stream; -- it will be more complex than this and multiple statements
    call nested_sproc();
    commit;
end;
When I run call main_sproc() I notice that the first statement goes through, but when it reaches call nested_sproc() the transaction is blocked. I guess it's because both procedures contain a begin transaction and a commit, but without them I get an error stating that I need to define the scope of the transaction. I need to deploy this final procedure on a task that runs on a schedule, without merging the queries from the two procedures, and while still being able to process the current data inside the stream.
Is there a way of achieving this?

As per the comments by @NickW, you can't have two processes reading the same stream simultaneously, as running a DML command alters that stream - so your outer SP is selecting from the stream and will lock it until the outer SP's transaction completes. You'll need another approach to achieve your end result, e.g. in the outer SP write the stream to a table and use that table in the inner SP. If you use a temporary table, it will be available while either SP is running (as they are part of the same session) and will get dropped automatically when the session ends - as long as you don't need this data outside of the 2 SPs, this may be more convenient as you have less maintenance.
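For example, a minimal sketch of the temporary-table approach (untested; my_stream and the table names are placeholders, and note that DDL in Snowflake commits any open transaction, so the temporary table is created before begin transaction):
create or replace procedure main_sproc()
returns varchar
language sql
as
begin
    -- DDL implicitly commits in Snowflake, so create the snapshot table first
    create or replace temporary table stream_snapshot like table1;
    begin transaction;
    -- consume the stream exactly once, inside the transaction
    insert into stream_snapshot select * from my_stream;
    insert into table1 select * from stream_snapshot;
    -- the nested SP reads stream_snapshot instead of the stream, so nothing blocks
    call nested_sproc();
    commit;
    return 'done';
end;
nested_sproc would then select from stream_snapshot rather than the stream, and would not open its own transaction.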

Related

Oracle SQL script parallel execution

For my job I need to prepare two tables (CTAS) and then do some joins between them. For this I created a script (run in SQL Developer) which creates these two tables sequentially, one after another. Since these two tables are not related, I'd like to start creating them in parallel. Is it possible in an SQL script to start two table creations (or two other scripts) in parallel and then proceed when both have finished?
Here's one option.
I wouldn't really CTAS - I'd rather create both tables in advance, and then insert rows into them. Why? Because this approach uses stored procedures which - in order to perform DDL (which is CTAS) - require dynamic SQL. Not that it is impossible to do that; on the contrary, but it is way simpler NOT to use it.
I'd create yet another table (let's call it table_done) which contains only one row with two columns: table_1 and table_2 whose values can be 0 (meaning: data for that table is not ready) or 1 (data ready).
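A minimal sketch of that control table, assuming both flags start at 0:
create table table_done (table_1 number(1), table_2 number(1));
insert into table_done (table_1, table_2) values (0, 0);
commit;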
Furthermore, I'd create two stored procedures which look the same; the only difference is that each of them inserts rows into its own table:
create procedure p_insert_1 as
begin
    -- remove old data
    execute immediate 'truncate table table_1';
    -- mark table_1 data as not ready, and commit so p_main's session can see the flag
    update table_done set table_1 = 0;
    commit;
    -- prepare new data
    insert into table_1 (...) select ...;
    -- mark table_1 data as ready (update the single flag row rather than inserting a new one)
    update table_done set table_1 = 1;
    commit;
end;
The 3rd, "main" procedure, is the one you'd run manually. What would it do? Create two one-time database jobs that run immediately, each of them starting its own p_insert procedure so that they run in parallel. That procedure would then (in a loop) check whether both columns in table_done are set to 1 and - if so - continue execution.
create procedure p_main is
    l_job_1 number;
    l_job_2 number;
    --
    l_t1_done number;
    l_t2_done number;
begin
    dbms_job.submit(l_job_1, 'begin p_insert_1; end;');
    dbms_job.submit(l_job_2, 'begin p_insert_2; end;');
    -- dbms_job requires a commit before the submitted jobs actually start
    commit;
    loop
        select table_1, table_2
          into l_t1_done, l_t2_done
          from table_done;
        if l_t1_done = 1 and l_t2_done = 1 then
            -- exit the loop
            exit;
        else
            -- tables aren't ready yet; wait 60 seconds and try again
            dbms_lock.sleep(60);
        end if;
    end loop;
    -- process data prepared in table_1 and table_2
end;
That's just a simplified idea; I didn't test it myself, so I apologize if there are any errors. Also:
instead of dbms_job, you could choose to use the more advanced dbms_scheduler (see the sketch after this list)
if you're on 18c (or later), use dbms_session.sleep instead of dbms_lock.sleep
and so forth
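For example, the dbms_scheduler variant of submitting a one-time job might look roughly like this (a sketch; the job name is made up, and unlike dbms_job.submit, create_job does not need a separate commit because it commits implicitly):
dbms_scheduler.create_job(
    job_name   => 'job_insert_1',
    job_type   => 'PLSQL_BLOCK',
    job_action => 'begin p_insert_1; end;',
    start_date => systimestamp,
    enabled    => true,
    auto_drop  => true);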
Use SQL parallelism instead of process concurrency. While the words parallelism and concurrency are colloquially interchangeable, in Oracle they have different meanings. Parallelism implies that the SQL engine handles all the coordination of breaking work into little pieces, running those pieces at the same time, and then re-assembling the results at the end. Concurrency implies that the user will create multiple sessions and handle the coordination manually.
For simply creating two tables, parallelism will probably be simpler and faster than concurrency. For parallelism, you may only need to create the table in parallel. (And you probably want to reset the parallelism back to none at the end.)
CREATE TABLE TABLE1 PARALLEL 2 AS SELECT ...;
ALTER TABLE TABLE1 NOPARALLEL;
The PARALLEL 2 option instructs Oracle to run two server processes at the same time while the SQL statement is running. You can easily increase that number, but don't go too high or you'll be stealing too many resources from other sessions.
DBMS_SCHEDULER and other concurrency mechanisms are powerful and useful, but I recommend avoiding them if possible. Running and monitoring scheduler jobs will likely be much more complicated than the preceding code. (Although you may still need to occasionally monitor the parallel SQL statement using a tool like OEM SQL Monitor Reports to ensure that the server is actually using the requested parallelism.)

Postgres: Save rows from temp table before rollback

I have a main procedure (p_proc_a) in which I create a temp table for logging (tmp_log). In the main procedure I call some other procedures (p_proc_b, p_proc_c). In each of these procedures I insert data into table tmp_log.
How do I save rows from tmp_log into a physical table (log) in case of exception before rollback?
create procedure p_proc_a()
language plpgsql
as $body$
declare
    v_message_text text;
begin
    create temp table tmp_log (log_message text) on commit drop;
    call p_proc_b();
    call p_proc_c();
    insert into log (log_message)
    select log_message from tmp_log;
exception
    when others then
        get stacked diagnostics
            v_message_text = message_text;
        insert into log (log_message)
        values (v_message_text);
end;
$body$;
What is a workaround to save logs into a table and roll back the changes from p_proc_b and p_proc_c?
That is not possible in PostgreSQL.
The typical workaround is to use dblink to connect to the database itself and write the logs via dblink.
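A minimal sketch of that workaround, assuming the dblink extension is installed and your authentication setup allows a loopback connection (the connection string will vary):
-- inside the EXCEPTION handler, before the handler's implicit rollback:
PERFORM dblink_exec(
    'dbname=' || current_database(),
    format('insert into log (log_message) values (%L)', v_message_text));
Because the insert runs over a separate connection with its own transaction, it survives the rollback of the calling transaction.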
I found three solutions to store data within a transaction (in my case, for debugging purposes) and still be able to see that data after rolling the transaction back.
I have a scenario where I use the following block, so it may not apply to your case:
DO $$
BEGIN
...
ROLLBACK;
END;
$$;
The first two solutions were suggested to me in the Postgres Slack; the third one I tried and found after talking with them - a way that had worked in another DB.
Solutions
1 - Using DBLink
I don't remember exactly how it was done, but you import some libraries and then connect to another DB - which may even be this same DB - and use that connection, which is not affected by the transaction.
2 - Using COPY command
Using something like:
COPY (SELECT ...) TO PROGRAM 'psql -c "COPY xyz FROM stdin"'
BTW I never used it; it seems to require superuser (SU) permission in Unix, and god knows how it is used or how it outputs data.
3 - Using Sub-Transactions
In this approach you use a sub-transaction (I'm not sure that's the correct term - it should probably be called an autonomous transaction) to commit the result you want to keep.
In my case the command looks like this:
I used a temp table, but it seems (I'm not sure) to work with a regular table as well:
CREATE TEMP TABLE IF NOT EXISTS myZone AS
SELECT * FROM public."Zone"
LIMIT 0;

DO $$
BEGIN
    INSERT INTO public."Zone" (...) VALUES (...);
    BEGIN
        INSERT INTO myZone
        SELECT * FROM public."Zone";
        COMMIT;
    END;
    ROLLBACK;
END; $$;

SELECT * FROM myZone;
DROP TABLE myZone;
Don't ask what the purpose of doing this is - I'm creating a test scenario, and I wanted to track what I had done so far. Since this block does not support SELECT as DQL, I had to do something else, and I wanted a clean report, not raised errors.
According to www.techtarget.com:
Autonomous transactions allow a single transaction to be subdivided into multiple commit/rollback transactions, each of which will be tracked for auditing purposes. When an autonomous transaction is called, the original transaction (calling transaction) is temporarily suspended.
(This text was indexed by Google and existed on 2022-10-11; the website itself would not open due to an e-mail validation issue.)
Also, this term seems to come from Oracle, to which this article relates.
EDITED:
Removing solution 3, as it won't work.
Postgres 11 claims to support autonomous transactions, but it's not what we might expect...
For this functionality Postgres introduced the SAVEPOINT:
SAVEPOINT <name of savepoint>;
<... code ...>
RELEASE SAVEPOINT <name of savepoint>;  -- or: ROLLBACK TO SAVEPOINT <name of savepoint>;
Now the issue is:
If you use nested BEGIN blocks, a COMMIT inside the nested code commits everything, and the ROLLBACK in the outer block does nothing (it does not roll back anything that happened before the inner COMMIT).
If you use SAVEPOINT, it only rolls back part of the code; and even if you RELEASE it, the ROLLBACK in the outer block rolls back the savepoint's changes too.
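For illustration, the savepoint behavior described above, with t as a placeholder table (a savepoint only limits what a rollback undoes; nothing is persisted until the outer transaction commits):
BEGIN;
INSERT INTO t VALUES (1);
SAVEPOINT sp1;
INSERT INTO t VALUES (2);
ROLLBACK TO SAVEPOINT sp1; -- undoes only the second insert
ROLLBACK;                  -- undoes everything, savepoint or not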

Looping through a CURSOR in PL/PGSQL without locking the table

I have a simple PL/pgSQL block in Postgres 9.5 that loops over records in a table and conditionally updates some of them.
Here's a simplified example:
DO $$
DECLARE
    -- Define a cursor to loop through records
    my_cool_cursor CURSOR FOR
        SELECT
            u.id    AS user_id,
            u.name  AS user_name,
            u.email AS user_email
        FROM users u;
BEGIN
    -- the loop variable is declared automatically; avoid naming it "record",
    -- which is the plpgsql type name
    FOR rec IN my_cool_cursor LOOP
        -- Simplified example:
        -- If user's first name is 'Anjali', set email to NULL
        IF rec.user_name = 'Anjali' THEN
            UPDATE users SET email = NULL WHERE id = rec.user_id;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
I'd like to execute this block directly against my database (from my app, via the console, etc...). I do not want to create a FUNCTION() or stored procedure to do this operation.
The Issue
The issue is that the CURSOR and LOOP create a table-level lock on my users table, since everything between the outer BEGIN...END runs in a transaction. This blocks any other pending queries against it. If users is sufficiently large, this locks it up for several seconds or even minutes.
What I tried
I tried to COMMIT after each UPDATE so that it clears the transaction and the lock periodically. I was surprised to see this error message:
ERROR: cannot begin/end transactions in PL/pgSQL
HINT: Use a BEGIN block with an EXCEPTION clause instead.
I'm not quite sure how this is done. Is it asking me to raise an EXCEPTION to force a COMMIT? I tried reading the documentation on Trapping Errors but it only mentions ROLLBACK, so I don't see any way to COMMIT.
How do I periodically COMMIT a transaction inside the LOOP above?
More generally, is my approach even correct? Is there a better way to loop through records without locking up the table?
1.
You cannot COMMIT within a PostgreSQL function or DO command at all (plpgsql or any other PL). The error message you reported is to the point (as far as Postgres 9.5 is concerned):
ERROR: cannot begin/end transactions in PL/pgSQL
A procedure could do that in Postgres 11 or later. See:
PostgreSQL cannot begin/end transactions in PL/pgSQL
In PostgreSQL, what is the difference between a “Stored Procedure” and other types of functions?
There are limited workarounds to achieve "autonomous transactions" in older versions:
How do I do large non-blocking updates in PostgreSQL?
Does Postgres support nested or autonomous transactions?
Do stored procedures run in database transaction in Postgres?
But you do not need any of this for the presented case.
2.
Use a simple UPDATE instead:
UPDATE users
SET    email = NULL
WHERE  name = 'Anjali'
AND    email IS DISTINCT FROM NULL; -- optional improvement
It only locks the rows that are actually updated (with corner-case exceptions). And since this is much faster than a CURSOR over the whole table, the locks are also very brief.
The added AND email IS DISTINCT FROM NULL avoids empty updates. Related:
Update a column of a table with a column of another table in PostgreSQL
How do I (or can I) SELECT DISTINCT on multiple columns?
It's rare that explicit cursors are useful in plpgsql functions.
If you want to avoid locking rows for a long time, you could also define a cursor WITH HOLD, for example using the DECLARE SQL statement.
Such cursors can be used across transaction boundaries, so you could COMMIT after a certain number of updates. The price you are paying is that the cursor has to be materialized on the database server.
Since you cannot use transaction statements in functions, you will either have to use a procedure or commit in your application code.
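A minimal sketch of the WITH HOLD approach in plain SQL, outside plpgsql (column names follow the question):
BEGIN;
DECLARE my_cool_cursor CURSOR WITH HOLD FOR
    SELECT id, name FROM users;
COMMIT; -- the cursor is materialized and survives the commit
FETCH 1000 FROM my_cool_cursor;
-- ... run UPDATEs for the fetched rows, COMMIT, then FETCH again ...
CLOSE my_cool_cursor; -- WITH HOLD cursors must be closed explicitly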

Stored procedure with multiple 'INSERT INTO Table_Variable EXECUTE stored_procedure' statements [duplicate]

I have three stored procedures Sp1, Sp2 and Sp3.
The first one (Sp1) will execute the second one (Sp2) and save returned data into #tempTB1 and the second one will execute the third one (Sp3) and save data into #tempTB2.
If I execute Sp2 it works and returns all my data from Sp3, but the problem is in Sp1: when I execute it, it displays this error:
INSERT EXEC statement cannot be nested
I tried to change the place of execute Sp2 and it display me another error:
Cannot use the ROLLBACK statement
within an INSERT-EXEC statement.
This is a common issue when attempting to 'bubble up' data through a chain of stored procedures. A restriction in SQL Server is that you can only have one INSERT-EXEC active at a time. I recommend looking at How to Share Data Between Stored Procedures, which is a very thorough article on patterns to work around this type of problem.
For example, a workaround could be to turn Sp3 into a table-valued function, as sketched below.
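For instance, if Sp3 only returns a result set, it could be rewritten as an inline table-valued function - a hypothetical sketch, reusing the sample data from an answer further down:
CREATE FUNCTION dbo.Fn3 ()
RETURNS TABLE
AS
RETURN
(
    SELECT 1 AS ID, 'Data1' AS Data
    UNION ALL
    SELECT 2, 'Data2'
);
GO
-- Sp2 can then fill its temp table without INSERT-EXEC:
INSERT INTO #tempTB2 SELECT ID, Data FROM dbo.Fn3();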
This is the only "simple" way to do this in SQL Server without some giant convoluted function or an executed SQL string call, both of which are terrible solutions:
create a temp table
openrowset your stored procedure data into it
EXAMPLE:
INSERT INTO #YOUR_TEMP_TABLE
SELECT * FROM OPENROWSET ('SQLOLEDB','Server=(local);TRUSTED_CONNECTION=YES;','set fmtonly off EXEC [ServerName].dbo.[StoredProcedureName] 1,2,3')
Note: You MUST use 'set fmtonly off', and you CANNOT add dynamic SQL inside the OPENROWSET call, either for the string containing your stored procedure parameters or for the table name. That's why you have to use a temp table rather than a table variable, which would have been better, as table variables outperform temp tables in most cases.
OK, encouraged by jimhark, here is an example of the old single hash table approach:
CREATE PROCEDURE SP3 as
BEGIN
    SELECT 1, 'Data1'
    UNION ALL
    SELECT 2, 'Data2'
END
go
CREATE PROCEDURE SP2 as
BEGIN
    if exists (select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#tmp1'))
        INSERT INTO #tmp1
        EXEC SP3
    else
        EXEC SP3
END
go
CREATE PROCEDURE SP1 as
BEGIN
    EXEC SP2
END
GO
/*
--I want some data back from SP3
-- Just run the SP1
EXEC SP1
*/
/*
--I want some data back from SP3 into a table to do something useful
--Try run this - get an error - can't nest Execs
if exists (select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#tmp1'))
DROP TABLE #tmp1
CREATE TABLE #tmp1 (ID INT, Data VARCHAR(20))
INSERT INTO #tmp1
EXEC SP1
*/
/*
--I want some data back from SP3 into a table to do something useful
--However, if we run this single hash temp table it is in scope anyway so
--no need for the exec insert
if exists (select * from tempdb.dbo.sysobjects o where o.xtype in ('U') and o.id = object_id(N'tempdb..#tmp1'))
DROP TABLE #tmp1
CREATE TABLE #tmp1 (ID INT, Data VARCHAR(20))
EXEC SP1
SELECT * FROM #tmp1
*/
My workaround for this problem has always been to use the principle that single-hash temp tables are in scope for any called procs. So I have an option switch in the proc parameters (default set to off). If this is switched on, the called proc inserts its results into the temp table created in the calling proc. I think in the past I took it a step further and put some code in the called proc to check whether the single-hash table exists in scope: if it does, insert the results; otherwise return the result set. This seems to work well - it's the best way of passing large data sets between procs.
This trick works for me.
You don't have this problem on a remote server, because on a remote server the last insert command waits for the result of the previous command to execute. That's not the case on the same server.
Take advantage of that situation for a workaround.
If you have the right permission to create a linked server, do it.
Create the same server as a linked server:
in SSMS, log into your server
go to "Server Objects"
right-click on "Linked Servers", then "New Linked Server"
in the dialog, give your linked server any name, e.g.: THISSERVER
server type is "Other data source"
Provider: Microsoft OLE DB Provider for SQL Server
Data source: your IP; it can also be just a dot (.), because it's localhost
go to the "Security" tab and choose the 3rd option, "Be made using the login's current security context"
you can edit the server options (3rd tab) if you want
press OK; your linked server is created
now your SQL command in SP1 is:
insert into #myTempTable
exec THISSERVER.MY_DATABASE_NAME.MY_SCHEMA.SP2
Believe me, it works even if you have a dynamic insert in SP2.
I found that a workaround is to convert one of the procs into a table-valued function. I realize that is not always possible, and it introduces its own limitations. However, I have always been able to find at least one of the procedures that is a good candidate for this. I like this solution because it doesn't introduce any "hacks" to the solution.
I encountered this issue when trying to import the results of a stored proc into a temp table while that stored proc itself inserted into a temp table as part of its own operation. The issue is that SQL Server does not allow the same process to write to two different temp tables at the same time.
The accepted OPENROWSET answer works fine, but I needed to avoid using any Dynamic SQL or an external OLE provider in my process, so I went a different route.
One easy workaround I found was to change the temporary table in my stored procedure to a table variable. It works exactly the same as it did with a temp table, but no longer conflicts with my other temp table insert.
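In other words, inside the called procedure (a sketch with made-up names):
-- before: internal scratch work in a temp table
-- CREATE TABLE #scratch (ID INT, Data VARCHAR(20));
-- INSERT INTO #scratch SELECT ID, Data FROM dbo.SomeTable;
-- after: the same scratch work in a table variable
DECLARE @scratch TABLE (ID INT, Data VARCHAR(20));
INSERT INTO @scratch SELECT ID, Data FROM dbo.SomeTable;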
Just to head off the comment I know that a few of you are about to write, warning me off Table Variables as performance killers... All I can say to you is that in 2020 it pays dividends not to be afraid of Table Variables. If this was 2008 and my Database was hosted on a server with 16GB RAM and running off 5400RPM HDDs, I might agree with you. But it's 2020 and I have an SSD array as my primary storage and hundreds of gigs of RAM. I could load my entire company's database to a table variable and still have plenty of RAM to spare.
Table Variables are back on the menu!
I recommend reading this entire article. Below is the most relevant section, which addresses your question:
Rollback and Error Handling is Difficult
In my articles on Error and Transaction Handling in SQL Server, I suggest that you should always have an error handler like
BEGIN CATCH
    IF @@trancount > 0 ROLLBACK TRANSACTION
    EXEC error_handler_sp
    RETURN 55555
END CATCH
The idea is that even if you do not start a transaction in the procedure, you should always include a ROLLBACK, because if you were not able to fulfil your contract, the transaction is not valid.
Unfortunately, this does not work well with INSERT-EXEC. If the called procedure executes a ROLLBACK statement, this happens:
Msg 3915, Level 16, State 0, Procedure SalesByStore, Line 9 Cannot use the ROLLBACK statement within an INSERT-EXEC statement.
The execution of the stored procedure is aborted. If there is no CATCH handler anywhere, the entire batch is aborted, and the transaction is rolled back. If the INSERT-EXEC is inside TRY-CATCH, that CATCH handler will fire, but the transaction is doomed, that is, you must roll it back. The net effect is that the rollback is achieved as requested, but the original error message that triggered the rollback is lost. That may seem like a small thing, but it makes troubleshooting much more difficult, because when you see this error, all you know is that something went wrong, but you don't know what.
I had the same issue and a concern over duplicate code in two or more sprocs. I ended up adding an additional "mode" parameter. This allowed common code to exist inside one sproc, with the mode directing the flow and result set of the sproc.
What about just storing the output in a static table? Like:
-- SubProcedure: subProcedureName
---------------------------------
-- Save the value
DELETE lastValue_subProcedureName
INSERT INTO lastValue_subProcedureName (Value)
SELECT @Value
-- Return the value
SELECT @Value
-- Procedure
--------------------------------------------
-- get last value of subProcedureName
SELECT Value FROM lastValue_subProcedureName
It's not ideal, but it's so simple and you don't need to rewrite everything.
UPDATE:
The previous solution does not work well with parallel queries (async and multi-user access), therefore I am now using temp tables.
-- A local temporary table created in a stored procedure is dropped automatically when the stored procedure is finished.
-- The table can be referenced by any nested stored procedures executed by the stored procedure that created the table.
-- The table cannot be referenced by the process that called the stored procedure that created the table.
IF OBJECT_ID('tempdb..#lastValue_spGetData') IS NULL
CREATE TABLE #lastValue_spGetData (Value INT)
-- trigger stored procedure with special silent parameter
EXEC dbo.spGetData 1 --silent mode parameter
Content of the nested spGetData stored procedure:
-- Save the output if temporary table exists.
IF OBJECT_ID('tempdb..#lastValue_spGetData') IS NOT NULL
BEGIN
DELETE #lastValue_spGetData
INSERT INTO #lastValue_spGetData(Value)
SELECT Col1 FROM dbo.Table1
END
-- stored procedure return
IF @silentMode = 0
SELECT Col1 FROM dbo.Table1
Declare an output cursor variable in the inner sp:
@c CURSOR VARYING OUTPUT
Then declare a cursor c for the select you want to return.
Then open the cursor.
Then set the reference:
DECLARE c CURSOR LOCAL FAST_FORWARD READ_ONLY FOR
SELECT ...
OPEN c
SET @c = c
DO NOT close or deallocate it.
Now call the inner sp from the outer one, supplying a cursor parameter like:
exec sp_abc a, b, c, @cOUT OUTPUT
Once the inner sp executes, your @cOUT is ready to fetch. Loop, then close and deallocate.
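Put together, the pattern looks roughly like this (an untested sketch with made-up names):
CREATE PROCEDURE dbo.InnerProc (@c CURSOR VARYING OUTPUT)
AS
BEGIN
    DECLARE crs CURSOR LOCAL FAST_FORWARD READ_ONLY FOR
        SELECT ID, Data FROM dbo.SomeTable;
    OPEN crs;
    SET @c = crs; -- hand the open cursor back to the caller
END
GO
DECLARE @cOUT CURSOR;
EXEC dbo.InnerProc @c = @cOUT OUTPUT;
DECLARE @id INT, @data VARCHAR(20);
FETCH NEXT FROM @cOUT INTO @id, @data;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- process @id and @data here ...
    FETCH NEXT FROM @cOUT INTO @id, @data;
END
CLOSE @cOUT;
DEALLOCATE @cOUT;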
If you are able to use other associated technologies such as C#, I suggest using the built in SQL command with Transaction parameter.
var sqlCommand = new SqlCommand(commandText, null, transaction);
I've created a simple Console App that demonstrates this ability which can be found here:
https://github.com/hecked12/SQL-Transaction-Using-C-Sharp
In short, C# allows you to overcome this limitation: you can inspect the output of each stored procedure and use that output however you like, for example feeding it to another stored procedure. If the output is OK, you can commit the transaction; otherwise, you can revert the changes with a rollback.
On SQL Server 2008 R2, I had a mismatch in table columns that caused the Rollback error. It went away when I fixed my sqlcmd table variable, populated by the insert-exec statement, to match what the stored proc returned; it was missing org_code. In a Windows cmd file, this loads the result of the stored procedure and selects it:
set SQLTXT= declare @resets as table (org_id nvarchar(9), org_code char(4), ^
tin char(9), old_strt_dt char(10), strt_dt char(10)); ^
insert @resets exec rsp_reset; ^
select * from @resets;
sqlcmd -U user -P pass -d database -S server -Q "%SQLTXT%" -o "OrgReport.txt"

How to suppress record sets returned by SELECT statements in a Stored Procedure

I'm writing a stored procedure which checks for the existence of various tables in various databases, as well as the permissions that the user executing the stored procedure has on those tables. The stored procedure itself resides within a user database (i.e. it's not in the Master db).
To perform my checks, my stored procedure contains lots of SELECT statements. Each of those obviously returns a record set. What I would like is to somehow suppress these record sets so that they are not returned by the stored procedure, and instead return my own, single record set which is just a collection of messages relating to each check the stored procedure performs.
I think the obvious answer is to use a table-valued function instead, but I've not been able to recreate my tests successfully in a Function as they appear in the stored procedure. For starters, I'm having to use temporary tables (not possible in a function) and dynamic SQL (not very compatible with table parameters).
I think I've basically got two choices:
Rewrite my stored procedure as a function and figure out how to do the checks a different way.
Continue using my stored procedure and use an OUTPUT parameter to return my result messages, probably as a delimited string, and in the associated ASP.NET application just ignore all the record sets the stored procedure returns .
Neither of these solutions is very satisfactory. Before I spend any more time pursuing either one, is there a way to discard the record sets produced by the SELECT statements in a stored procedure and explicitly define what record I want it to return?
Hmm, I can only speculate here...
Are you using something like
SELECT ...;
IF @@rowcount > 0
BEGIN
    ...
END;
?
Then you can rewrite it using something like
IF EXISTS (SELECT ...)
BEGIN
...
END;
or
DECLARE @variable integer;
SELECT @variable = count(*) ...;
IF @variable > 0
BEGIN
    ...
END;
In general, point the results of your queries at a target (variable, table, expression, ...) and they won't be output.
Then just execute the query for your desired result at the end.
In my opinion, there is almost no reason for stored procedures to produce record sets. That is what stored functions are for. On occasion it is needed, because of the use of dynamic SQL or other stored procedures, but not as a general practice. Much, much too often, I see stored procedures being used where stored functions or views would be more appropriate.
What should you do? Every SELECT statement in the stored procedure should be doing one of the following:
Setting (local) variables.
Saving the results in a temporary table or table variable.
The logic for the stored procedure should be working on the local variables. The results should be returned using OUTPUT parameters.
If you need to return rows in a tabular format, you can do that using tables explicitly (such as a global temporary table or real table). Or, you can have one SELECT at the end that does return a single result set. However, if you need this and can phrase the stored procedure as a function, that is better in my opinion.
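A minimal sketch of that pattern with hypothetical names: each check writes into a table variable, and the procedure ends with the single record set you want callers to see.
CREATE PROCEDURE dbo.CheckObjects
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @messages TABLE (msg NVARCHAR(400));
    DECLARE @cnt INT;
    -- each check targets a variable instead of returning a record set
    SELECT @cnt = COUNT(*) FROM sys.tables WHERE name = N'SomeTable';
    IF @cnt = 0
        INSERT INTO @messages (msg) VALUES (N'SomeTable is missing');
    -- the one record set the procedure actually returns
    SELECT msg FROM @messages;
END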