Run multiple queries concurrently in SAS

I have 36 completely independent queries that I need to run on a regular basis. They would go much faster if they could run 3 at a time (the database cancels our queries if we try to run more than 3 at once) instead of each one waiting for the previous to finish.
I would like to do something like this
/* Some prep code here*/
/* Launch batch 1 containing queries 1-12*/
/* Immediately launch batch 2 (13-24) without waiting for 1-12 to finish*/
/* Immediately launch batch 3 (25-36)*/
/* Wait until all 3 batches are done and run some conclusion code*/
Or, if possible, just give it all 36 queries together and have it run several at a time, making sure no more than 3 are running at any given moment, and whenever one finishes, start the next one from the stack.
Is this possible to do using SAS?
Thanks

I'm assuming you have a SAS server and that you're launching the queries from your local machine.
(If you don't and you work locally, it's not a problem: you can rsubmit to a spawner running on your local machine.)
Even with Base SAS it's possible to launch 3 queries at the same time by having three connections in a single program.
I'm assuming here that you don't want to share WORK libraries and that the queries are completely independent.
options autosignon=yes;
options sascmd="!sascmd";
* some random data;
data prova1;
do i=1 to 20000000;
x=rand('UNIFORM');
output;
end;
run;
data prova2;
do i=1 to 20000000;
y=rand('UNIFORM');
output;
end;
run;
*open connection to the server ;
options comamid=tcp;
filename rlink "D:\SAS\SASFoundation\9.2\connect\saslink\tcpwin.scr";
%LET host1=nbsimbol59;
%LET host2=nbsimbol59;
signon remote=host1 script=rlink;
signon remote=host2 script=rlink;
rsubmit process=host1 wait=no inheritlib=(work=cwork);
proc sort data=cwork.prova1 out=cwork.r1;
by x;
run;
proc sort data=cwork.r1 out=cwork.r1a;
by i;
run;
endrsubmit;
rsubmit process=host2 wait=no inheritlib=(work=cwork);
proc sort data=cwork.prova2 out=cwork.r2;
by y;
run;
proc sort data=cwork.r2 out=cwork.r2a;
by i;
run;
endrsubmit;
/* Wait for both tasks to complete. */
waitfor _ALL_ host1 host2;
data r9;
merge r1a (in=a) r2a (in=b);
by i;
if a and b;
run;
signoff host1;
signoff host2;
The only problem with this sample code is that it waits for both tasks to end; at the moment I can't think of a way to have it launch another query as soon as one ends, but I believe there may be a way around it.
For now, with this code you can easily launch 3 queries at a time, then when they finish launch 3 more, and so on.
For your other request I'll think about it :)

On some platforms (Windows and UNIX for sure), if the configuration allows your SAS session to interact with the OS, the SYSTASK statement gives you the ability to execute, list, or terminate asynchronous tasks. Combined with the WAITFOR statement, you can do something like this:
systask command "sas prog1.sas" taskname=sas1;
systask command "sas prog2.sas" taskname=sas2;
systask command "sas prog3.sas" taskname=sas3;
waitfor _all_ sas1 sas2 sas3; /* suspend current session until the three jobs are finished */
See documentation on SYSTASK and WAITFOR statements (for the Windows Platform).
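If each query lives in its own program file, the same idea extends to all 36 queries in batches of three. A rough sketch, assuming the programs are saved as prog1.sas through prog36.sas (those file names are placeholders):

```sas
/* Run 36 independent programs, 3 at a time: start a batch of three, */
/* wait for all of them to finish, then start the next batch.        */
%macro run_in_threes;
  %local i;
  %do i = 1 %to 36 %by 3;
    systask command "sas prog&i..sas"         taskname=t&i;
    systask command "sas prog%eval(&i+1).sas" taskname=t%eval(&i+1);
    systask command "sas prog%eval(&i+2).sas" taskname=t%eval(&i+2);
    /* suspend until this batch of 3 is done before launching more */
    waitfor _all_ t&i t%eval(&i+1) t%eval(&i+2);
  %end;
%mend run_in_threes;
%run_in_threes
```

Note this is batch-style scheduling: the next three wait for the slowest of the previous three, rather than refilling a slot as soon as each job finishes.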

SAS Grid Computing? http://support.sas.com/rnd/scalability/grid/index.html


Running Things in Parallel with SAS Enterprise Guide

Good afternoon, Stack Overflow gods and goddesses. I have a question about running code in SAS Enterprise Guide 7.1 in parallel.
Currently, I have 5 small PROC SQLs running in a project. The code runs fine, but it executes in series (i.e., one at a time), and even though I have each piece broken out into an individual section, I can't seem to get all 5 to run at once. For example, take the code below:
PROC SQL;
  connect to Oracle (user = "&Oracle_ID." password = "&Oracle_PW." path = "&Oracle_Path.");
  Create table place.base_balance_data as
  select * from connection to Oracle (
    Select
      DEBR.Acct_Ref_Id
      ,case when DEBR.Acct_Typ_Cd = '2' and DEBR.Settle_Dt_Bal_Amt > 0
            then sum(settle_dt_bal_amt)
            else sum(0)
       end as Typ_2_Settle_Dt_Bal_Amt
      ,case when DEBR.Acct_Typ_Cd = '5' and DEBR.Settle_Dt_Bal_Amt > 0
            then sum(settle_dt_bal_amt)
            else sum(0)
       end as Typ_5_Settle_Dt_Bal_Amt
      ,case when DEBR.Acct_Typ_Cd = '1' and DEBR.Settle_Dt_Bal_Amt < 0
            then sum(settle_dt_bal_amt)
            else sum(0)
       end as Typ_1_Settle_Dt_Bal_Amt
      ,case when DEBR.Acct_Typ_Cd = '1' and DEBR.Settle_Dt_Bal_Amt < 0
            then sum(Csh_Free_Cr_Amt)
            else sum(0)
       end as Csh_Free_Cr_Amt
      ,case when DEBR.Acct_Typ_Cd = '1' and DEBR.Settle_Dt_Bal_Amt < 0
            then coalesce(DEBR.Cr_Avbl_Amt, 0)
       end as Credit_Aval_Amt
    From Cool.DataStuff DEBR
    Where DEBR.Date_ID = &lm_bus_dID.
    Group by DEBR.Acct_Ref_Id, DEBR.Acct_Typ_Cd, DEBR.Cr_Avbl_Amt, DEBR.Settle_Dt_Bal_Amt
    Order by DEBR.Acct_Ref_ID asc offset 0 rows
  );
  Disconnect from Oracle;
Currently, the EG project looks like this:
I'm trying desperately to get all 5 of those pieces on the right to run at the same time, but alas, every time I try to do that, I get errors involving the passing of macro variables and not being able to connect to multiple sessions.
Has anyone had any luck doing this before? Could you maybe tell me what I'm missing here?
Thanks!
If you have SAS/CONNECT installed, you can break everything out into rsubmit blocks that will all run in parallel. You can ensure that each session gets the required macro vars using %syslput.
When running things in rsubmit blocks, think of each block as its own unique, independent worker session that's running code - almost like a thread. This worker lives in its own world and only knows what is in its session. The main session which kicked off the worker sessions is free to do whatever it pleases while the workers perform their specific tasks.
Below is an example of how to set this up.
Setup code
This will handle signing on to a new SAS session automatically and make all code run asynchronously.
options autosignon=yes
sascmd='!sascmd'
connectwait=no
;
Create Macro Variables and pass them to your worker sessions
This will do two things:
Start a new asynchronous session
Send your macro variables from your main session to your worker session
..
<code to create macro vars>;
/* Send macro variables over to a new remote session */
%syslput mymacrovar=&mymacrovar / remote=worker1;
...
If you want to, you can use %syslput _USER_ / remote=worker1 to send all user-made macro variables to a new session.
Enclose all worker code in rsubmit blocks
libname workmain "%sysfunc(getoption(work))";
rsubmit remote=worker1 inheritlib=(<my libraries here> workmain);
<code here>;
endrsubmit;
Note the libname statement, workmain. rsubmit cannot inherit the work library of the main session. This is by design, since each of these sessions has its own work library whose name cannot be overwritten. You can get around it by creating a new library that points to your main session's work library.
Wait for everything to finish
Finally, you can add one last piece of code to wait for everything to finish up - or, you're free to have the main thread run more independent light tasks.
waitfor _ALL_;
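Putting those pieces together for one worker, a minimal sketch (the session name worker1, the table names, and the macro variable value below are placeholders, not taken from the original project):

```sas
options autosignon=yes
        sascmd='!sascmd'
        connectwait=no;

/* hypothetical macro variable created in the main session */
%let lm_bus_dID = 20180131;

/* expose the main session's WORK library under a name workers can inherit */
libname workmain "%sysfunc(getoption(work))";

/* copy the macro variable into the worker session, then run the query there */
%syslput lm_bus_dID=&lm_bus_dID / remote=worker1;
rsubmit remote=worker1 inheritlib=(workmain);
  proc sql;
    create table workmain.balance1 as
      select * from workmain.source1
      where date_id = &lm_bus_dID;
  quit;
endrsubmit;

/* ... repeat the %syslput/rsubmit pattern for worker2 through worker5 ... */

waitfor _all_ worker1; /* list every worker session here */
```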

AS400 embedded SQL cursor handling

I believe I may have uncovered a production bug, which causes intermittent problems. Basically, I am trying to understand what the AS/400 does when dealing with embedded SQL and cursors. I am thinking that the cursor does not get closed in some cases, causing the next invocation to fail since the cursor is still open.
Here is a snapshot of the code:
begsr checkfile;
  exec sql
    declare T1 cursor for
      select * from FILE1
      where field1 = :var1;
  exec sql
    open T1;
  exec sql
    fetch next from T1 into :vrDS;
  dow SQLCOD = *zeros;
    if a=b;
      eval found = 'Y';
      leavesr;
    endif;
  enddo;
  exec sql
    close T1;
endsr;
My concern is in the leavesr line. If the condition is met, it leaves the subroutine, which skips the close of the cursor T1. In the job log, there are informational messages like "Cursor T1 already open or allocated.". I assume this means that it didn't do anything, or maybe even fetched from the previous cursor? I'm also wondering if the declare statement gets executed every time, or if it just skips that part of the code after the first execution of it. I think I need to put a close T1 statement before the open statement, but I wanted to get a second opinion since it's a production issue that is nearly impossible to recreate due to security key differences between test & production.
Thanks!
A DECLARE CURSOR is actually a compile-time statement.
It's never executed at run time.
:var1 is passed to the DB when the OPEN is done at run time. Here's an answer with a detailed example Using cursor for multiple search conditions
Yes, as the code is shown, the cursor would be left open, and possibly read from rather than opening a fresh cursor during the next run. It depends on what the CLOSQLCSR option is at compile time (or as set using EXEC SQL SET OPTION).
You should be checking SQLSTATE/SQLCODE after the OPEN and the FETCH
Another common practice is to do a CLOSE before the OPEN, again be sure to check SQLSTATE/SQLCODE and ignore the CURSOR NOT OPEN on the close.
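In this program that guard might look like the following (a sketch; SQLCODE -501 / SQLSTATE '24501' is the "cursor not open" condition, which is expected on the first pass and ignored here):

```rpgle
// close any cursor left open by a previous invocation;
// a -501 (cursor not open) here is the state we want, so ignore it
exec sql
  close T1;

exec sql
  open T1;
if SQLCOD < 0;
  // error handling for a failed open goes here
endif;

exec sql
  fetch next from T1 into :vrDS;
if SQLCOD < 0;
  // error handling for a failed fetch goes here
endif;
```

Since DECLARE is compile-time only, there is no need to worry about it "re-executing" before each open.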
What Charles said, and also I believe you may in some cases even create an infinite loop with this code! Maybe you aren't giving the whole code, but if the fetch is successful (SQLCOD = 0) and a <> b, then you are stuck in a loop, since nothing inside the DOW loop fetches again or changes SQLCOD.
I like to put the fetch in a sub procedure that returns *On if a record is successfully read. Then you can do something like this:
dcl-proc MyProc;
  dcl-pi *n;
    ... parameters ...
  end-pi;

  C1_OpenCursor(parameters);
  dow C1_FetchCursor(record);
    ... do something with the record ...
  enddo;
  C1_CloseCursor();
end-proc;

// ------------------------
// SQL Open the cursor
dcl-proc C1_OpenCursor;
  dcl-pi *n;
    ... parameters ...
  end-pi;

  exec sql
    declare C1 cursor for ...
  exec sql
    open C1;
  if SQLCOD < 0;
    ... error processing ...
  endif;
end-proc;

// ------------------------
// SQL Read the cursor
dcl-proc C1_FetchCursor;
  dcl-pi *n Ind;
    ... parameters ...
  end-pi;

  exec sql
    fetch C1 into ...
  if SQLCOD = 100;
    return *Off;
  elseif SQLCOD < 0;
    ... error handling ...
    return *Off;
  endif;
  return *On;
end-proc;

// ------------------------
// SQL Close the cursor
dcl-proc C1_CloseCursor;
  exec sql close C1;
end-proc;
This lets you keep all of your database code in one place, and just call it from your program. Database procedures just access the database and report errors in some way. Your program logic can remain uncluttered by sometimes wordy database and error handling code.
One other thing, I don't check for errors on cursor close because the only error (other than syntax errors) that can be returned here is that the cursor is not open. I don't care about that because that's what I want anyway.

Can some records be skipped during parallel processing?

I am using parallel processing.
CALL FUNCTION 'ZABC' STARTING NEW TASK taskname
  DESTINATION IN GROUP srv_grp PERFORMING come_back ON END OF TASK
  EXPORTING
    ...
  EXCEPTIONS
    ...
  .
I am calling this FM inside a loop. Sometimes my records are skipped and I am not getting the desired output: sometimes 2000 records are processed and sometimes 1000; the numbers vary. What can be the problem? Can you provide some cases where records can be skipped in parallel processing?
See the example code SAP provides: https://help.sap.com/viewer/753088fc00704d0a80e7fbd6803c8adb/7.4.16/en-US/4892082ffeb35ed2e10000000a42189d.html
Look at how they handle the resource exceptions. A resource exception means that you tried to start a new aRFC but there were no more processes available. Your program has to handle these cases; without handling them, the entries will be skipped. The normal handling is to just wait a little while for some of the active processes to finish, as in the example program:
WAIT UNTIL rcv_jobs >= snd_jobs
UP TO 5 SECONDS.
gv_semaphore = 0.
" your internal table size
DESCRIBE TABLE lt_itab LINES lv_lines.

LOOP AT lt_itab INTO ls_itab.
  " parallel proc. -->>
  CALL FUNCTION 'ZABC' STARTING NEW TASK taskname
    DESTINATION IN GROUP srv_grp PERFORMING come_back ON END OF TASK
    EXPORTING
      ...
    EXCEPTIONS
      ...
    .
  " <<--
ENDLOOP.

" wait until all parallel processes have terminated
WAIT UNTIL gv_semaphore = lv_lines.
NOTE: You should check the total parallel process count. There should be an upper limit on the number of open threads.
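The retry logic from that example program looks roughly like this (a sketch using the names from the snippets above; the exception numbering follows the usual aRFC pattern and is an assumption, not copied from the original program):

```abap
" try to start one aRFC for the current record, retrying while no
" dialog work process is free
DO.
  CALL FUNCTION 'ZABC'
    STARTING NEW TASK taskname
    DESTINATION IN GROUP srv_grp
    PERFORMING come_back ON END OF TASK
    EXCEPTIONS
      communication_failure = 1
      system_failure        = 2
      resource_failure      = 3
      OTHERS                = 4.
  CASE sy-subrc.
    WHEN 0.
      snd_jobs = snd_jobs + 1.  " started successfully: count it, move on
      EXIT.
    WHEN 3.
      " no free process: wait for running tasks to return, then retry
      WAIT UNTIL rcv_jobs >= snd_jobs UP TO 5 SECONDS.
    WHEN OTHERS.
      " communication/system failure: log it, otherwise the record is lost
      EXIT.
  ENDCASE.
ENDDO.
```

If the WHEN 3 branch is missing, a resource_failure simply falls through the loop and that record is never processed, which matches the symptom described.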
Thanks.

Testing XQuery and Marklogic transactions

We have some business requirements that call for versioning, and we chose to use MarkLogic library services for that. We have an issue testing our code with XRay when using transactions.
Our test is as follows:
declare option xdmp:transaction-mode "update";

declare function should-save-with-version-when-releasing() {
  let $uri := '/some-document-uri.xml'
  let $document := fn:doc($uri)
  let $pre-release-version := c:get-latest-version($uri)
  let $post-release-version := c:get-latest-version($uri)
  let $result := mut:release($document) (: this should version up :)
  return (assert:not-empty($pre-release-version),
          assert:not-empty($result),
          assert:not-equal($pre-release-version, $post-release-version),
          xdmp:rollback())
};
The test will pass no matter what, and as it turns out ML rollback demolishes all the variables.
How do we test it using transactions?
Any help greatly appreciated,
im
With MarkLogic an entire XQuery update normally acts like a single transaction. When mut:release adds an update to the transaction's stack, the rest of the query will not see that update until after it commits. From the point of view of the query, this normally happens after the entire query finishes, and is not visible to the query.
The docs have something useful to add about what http://docs.marklogic.com/xdmp:rollback does:
When a transaction is rolled back, the current statement immediately
terminates, updates made by any statement in the transaction are
discarded, and the transaction terminates.
So it isn't that the variables are demolished: it's that your program is over.
I think http://docs.marklogic.com/guide/app-dev/transactions#id_15746 has an example that is fairly close to your use-case: "Example: Multi-statement Transactions and Same-statement Isolation". It demonstrates how to xdmp:eval or xdmp:invoke to update a document and see the results within the same query.
Test it to see that it works, then replace the xdmp:commit with an xdmp:rollback. For me the example still works. Start replacing the rest of the logic with your unit test logic, and you should be on your way.

Oracle - Strange intermittent error with data/state being lost between procedure calls

I have a bug I've been trying to track down for a few weeks now, and have pretty much isolated exactly what's going wrong. However, why this happens is beyond me.
I have a very large Oracle package (around 4,100 lines of code) and I'm calling several procedures. However, data seems to be getting lost between procedure calls.
The data that is being lost is:
dpmethodstate varchar_state_local_type;
First, I call this procedure:
PROCEDURE delivery_plan_set_state (
messages OUT ReferenceCursor,
state IN varchar_state_local_type
) AS
BEGIN
logMessage('state COUNT is: ' || state.COUNT);
dpmethodstate := state;
FOR I IN 1..dpmethodstate.COUNT LOOP
logMessage(dpmethodstate(I));
END LOOP;
logMessage('delivery_plan_set_state end - dpmethodstate count is now ' || dpmethodstate.COUNT);
OPEN messages FOR SELECT * FROM TABLE(messageQueue);
messageQueue := NULL;
END delivery_plan_set_state;
I pass in state, which is a valid array of a single string. I can verify in the logs that dpmethodstate has a COUNT of 1 when the procedure ends.
Next, I call the execute_filter procedure which looks like this:
PROCEDURE execute_filter (
--Whole bunch of OUT parameters
) AS
--About 50 different local variables being set here
BEGIN
SELECT TO_CHAR(SYSTIMESTAMP, 'HH24:MI:SS.ff') INTO TIMING FROM DUAL;
logMessage('[' || TIMING || '] execute_filter begin');
logMessage('a) dpmethodstate Count is: ' || dpmethodstate.COUNT);
--Rest of procedure
However, this time dpmethodstate.COUNT is 0. The value I set from delivery_plan_set_state has vanished!
When I look at my logs, it looks something like this:
proposed
delivery_plan_set_state end - dpmethodstate count is now 1
[21:39:48.719017] execute_filter begin
a) dpmethodstate Count is: 0
As you can see, dpmethodstate got lost between procedure calls. There's a few things to note:
Nothing else in this package is capable of setting the value for dpmethodstate besides delivery_plan_set_state. And I can see nothing else has called it.
My client side code is written in C#, and not much happens between the two procedure calls.
This happens perhaps 1 out of every 100 times, so it's very difficult to track down or debug.
First off, what's the best way to debug a problem like this? Is there any more logging I can do? Also, how does Oracle maintain state between procedure calls and is there anything that can intermittently reset this state? Any pointers would be much appreciated!
Is dpmethodstate a package global variable? I'm assuming it is, but I don't see that explicitly mentioned.
Since package global variables have session scope, are you certain that the two procedure calls are always using the same physical database connection and that nothing is using this connection in the interim? If you're using some sort of connection pool where you get a connection from the pool before each call and return the connection to the pool after the first call, it wouldn't be terribly unusual in a development environment (or low usage environment) to get the same connection 99% of the time for the second call but get a different session 1% of the time.
Can you log the SID and SERIAL# of the session where you are setting the value and where you are retrieving the value?
SELECT sid, serial#
FROM v$session
WHERE sid = sys_context( 'USERENV', 'SID' );
If those are different, you wouldn't expect the value to persist.
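Using the question's own logMessage routine, a sketch of what could be added at the top of both procedures (the message text is arbitrary, and querying v$session from PL/SQL requires a direct SELECT grant on the view):

```sql
-- nested block at the start of delivery_plan_set_state and execute_filter
DECLARE
  v_sid    v$session.sid%TYPE;
  v_serial v$session.serial#%TYPE;
BEGIN
  SELECT sid, serial#
    INTO v_sid, v_serial
    FROM v$session
   WHERE sid = SYS_CONTEXT('USERENV', 'SID');
  logMessage('running in session ' || v_sid || ',' || v_serial);
END;
```

If the logged (sid, serial#) pairs differ between the two calls on a failing run, the connection pool handed the second call a different session and the package state was never there to lose.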
Beyond that, there are other ways to clear session state but they require someone to take explicit action. Calling DBMS_SESSION.RESET_PACKAGE or DBMS_SESSION.MODIFY_PACKAGE_STATE(DBMS_SESSION.REINITIALIZE) will clear out any session state set in your session. Compiling the package will do the same thing but that should throw an error warning you that your session state was discarded when you try to read it.