Running Things in Parallel with SAS Enterprise Guide

Good afternoon, Stack Overflow gods and goddesses. I have a question about running code in SAS Enterprise Guide 7.1 in parallel.
Currently, I have 5 small PROC SQLs running in a project. The code runs fine, but it executes in series (i.e., one at a time), and even though I have each piece broken out into an individual section, I can't seem to get all 5 to run at once. For example, take the code below:
PROC SQL;
    connect to Oracle (user = "&Oracle_ID." password = "&Oracle_PW." path = "&Oracle_Path.");
    Create table place.base_balance_data as select * from connection to Oracle (
        Select
            DEBR.Acct_Ref_Id
            ,case when DEBR.Acct_Typ_Cd = '2' and DEBR.Settle_Dt_Bal_Amt > 0
                  then sum(settle_dt_bal_amt)
                  else sum(0)
             end as Typ_2_Settle_Dt_Bal_Amt
            ,case when DEBR.Acct_Typ_Cd = '5' and DEBR.Settle_Dt_Bal_Amt > 0
                  then sum(settle_dt_bal_amt)
                  else sum(0)
             end as Typ_5_Settle_Dt_Bal_Amt
            ,case when DEBR.Acct_Typ_Cd = '1' and DEBR.Settle_Dt_Bal_Amt < 0
                  then sum(settle_dt_bal_amt)
                  else sum(0)
             end as Typ_1_Settle_Dt_Bal_Amt
            ,case when DEBR.Acct_Typ_Cd = '1' and DEBR.Settle_Dt_Bal_Amt < 0
                  then sum(Csh_Free_Cr_Amt)
                  else sum(0)
             end as Csh_Free_Cr_Amt
            ,case when DEBR.Acct_Typ_Cd = '1' and DEBR.Settle_Dt_Bal_Amt < 0
                  then coalesce(DEBR.Cr_Avbl_Amt,0)
             end as Credit_Aval_Amt
        From Cool.DataStuff DEBR
        Where DEBR.Date_ID = &lm_bus_dID.
        Group by DEBR.Acct_Ref_Id, DEBR.Acct_Typ_Cd, DEBR.Cr_Avbl_Amt, DEBR.Settle_Dt_Bal_Amt
        Order by DEBR.Acct_Ref_ID asc offset 0 rows
    );
    Disconnect from Oracle;
Quit;
Currently, the EG project looks like this:
I'm trying desperately to get all 5 of those pieces on the right to run at the same time, but alas, every time I try to do that, I get errors involving the passing of macro variables and not being able to connect to multiple sessions.
Has anyone had any luck doing this before? Could you maybe tell me what I'm missing here?
Thanks!

If you have SAS/CONNECT installed, you can break everything out into rsubmit blocks that will all run in parallel. You can ensure that each session gets the required macro vars using %syslput.
When running things in rsubmit blocks, think of each block as its own unique, independent worker session that's running code - almost like a thread. This worker lives in its own world and only knows what is in its session. The main session which kicked off the worker sessions is free to do whatever it pleases while the workers perform their specific tasks.
Below is an example of how to set this up.
Setup code
This will handle signing on new worker sessions automatically and make all rsubmitted code run asynchronously.
options autosignon=yes
sascmd='!sascmd'
connectwait=no
;
Create Macro Variables and pass them to your worker sessions
This will do two things:
Start a new asynchronous session
Send your macro variables from your main session to your worker session
..
<code to create macro vars>;
/* Send macro variables over to a new remote session */
%syslput mymacrovar / remote=worker1;
...
If you want to, you can use %syslput _USER_ / remote=worker1 to send all user-made macro variables to a new session.
Enclose all worker code in rsubmit blocks
libname workmain "%sysfunc(getoption(work))";
rsubmit remote=worker1 inheritlib=(<my libraries here> workmain);
<code here>;
endrsubmit;
Note the libname statement creating workmain: rsubmit cannot inherit the main session's work library directly. This is by design, since each of these sessions has its own work library whose name cannot be overridden. You can get around it by creating a new library that points to your main session's work library.
Wait for everything to finish
Finally, you can add one last piece of code to wait for everything to finish up - or, you're free to have the main thread run more independent light tasks.
waitfor _ALL_;
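Putting those pieces together for your case, a minimal sketch for two of the five queries might look like the following. This is untested; the worker names are invented, the place library and Oracle macro variables come from your question, and the query bodies are elided:

```sas
options autosignon=yes sascmd='!sascmd' connectwait=no;

/* Send all user-defined macro vars (Oracle credentials, date) to each worker */
%syslput _USER_ / remote=worker1;
%syslput _USER_ / remote=worker2;

/* Expose the main session's work library to the workers */
libname workmain "%sysfunc(getoption(work))";

rsubmit remote=worker1 inheritlib=(place workmain);
    proc sql;
        connect to oracle (user="&Oracle_ID." password="&Oracle_PW." path="&Oracle_Path.");
        create table place.base_balance_data as
            select * from connection to oracle ( /* query 1 here */ );
        disconnect from oracle;
    quit;
endrsubmit;

rsubmit remote=worker2 inheritlib=(place workmain);
    /* query 2 here, same pattern */
endrsubmit;

/* The main session is free to do other work here; block only when the
   results are actually needed */
waitfor _ALL_;
signoff _ALL_;
```

Each additional query gets its own worker name, %syslput, and rsubmit block following the same pattern.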

Related

Static Hangfire RecurringJob methods in LINQPad are not behaving

I have a script in LINQPad that looks like this:
var serverMode = EnvironmentType.EWPROD;
var jobToSchedule = JobType.ABC;
var hangfireCs = GetConnectionString(serverMode);
JobStorage.Current = new SqlServerStorage(hangfireCs);
Action<string, string, XElement> createOrReplaceJob =
(jobName, cronExpression, inputPackage) =>
{
RecurringJob.RemoveIfExists(jobName);
RecurringJob.AddOrUpdate(
jobName,
() => new BTR.Evolution.Hangfire.Schedulers.JobInvoker().Invoke(
jobName,
inputPackage,
null,
JobCancellationToken.Null),
cronExpression, TimeZoneInfo.Local);
};
// pseudo code to prepare inputPackage for client ABC...
createOrReplaceJob("ABC.CustomReport.SurveyResults", "0 2 * * *", inputPackage);
JobStorage.Current.GetConnection().GetRecurringJobs().Where( j => j.Id.StartsWith( jobToSchedule.ToString() ) ).Dump( "Scheduled Jobs" );
I have to schedule in both QA and PROD. To do that, I toggle the serverMode variable and run the script once for EWPROD and once for EWQA. This all worked fine until recently; unfortunately, I don't know exactly when it changed, because I don't always have to run in both environments.
I did purchase/install LINQPad 7 two days ago to look at some C# 10 features and I'm not sure if that affected it.
But here is the problem/flow:
Run it for EWQA and everything works.
Run it for EWPROD and the script (Hangfire components) seem to run in a mix of QA and PROD.
When I'm running it the 'second time' in EWPROD I've confirmed:
The hangfireCs (connection string) is right (pointing to PROD) and it is assigned to JobStorage.Current
The query at the end of the script, JobStorage.Current.GetConnection().GetRecurringJobs() uses the right connection.
The RecurringJob.* methods inside the createOrReplaceJob Action use the connection from the previous run (i.e. EWQA). If I monitor my QA Hangfire db, I see the job removed and added.
Temporary workaround:
Run it for EWQA and everything works.
Restart LINQPad or use 'Cancel and Reset All Queries' method
Run it for EWPROD and now everything works.
So I'm at a loss as to where the issue might lie. I feel like my upgrade/install of LINQPad 7 might be causing problems, but I'm not sure if there is a different way to make the RecurringJob.* static methods use the 'updated' connection string.
Any ideas on why the restart or reset is now needed?
LINQPad - 5.44.02
Hangfire.Core - 1.7.17
Hangfire.SqlServer - 1.7.17
This is caused by your script (or a library that you call) caching something statically, and not cleaning up between executions.
Either clear/dispose the cached objects when you're done (e.g., JobStorage.Current?) or tell LINQPad not to re-use the process between executions, by adding Util.NewProcess = true; to your script.

getgroup() is very slow

I am using the function getgroup() to read all of the groups of a user in the active directory.
I'm not sure if I'm doing something wrong, but it is very, very slow: each time execution arrives at this point, it takes several seconds. I'm also accessing the rest of Active Directory using the built-in functions of AccountManagement, and those execute instantly.
Here's the code:
For y As Integer = 0 To AccountCount - 1
Dim UserGroupArray As PrincipalSearchResult(Of Principal) = UserResult(y).GetGroups()
UserInfoGroup(y) = New String(UserGroupArray.Count - 1) {}
For i As Integer = 0 To UserGroupArray.Count - 1
UserInfoGroup(y)(i) = UserGroupArray(i).ToString()
Next
Next
Later on...:
AccountChecker_Listview.Groups.Add(New ListViewGroup(Items(y, 0), HorizontalAlignment.Left))
For i As Integer = 0 To UserInfoGroup(y).Count - 1
AccountChecker_Listview.Items.Add(UserInfoGroup(y)(i)).Group = AccountChecker_Listview.Groups(y)
Next
Item(,) contains the normal Active Directory data that I display; Item(y, 0) contains the username.
y is the number of user accounts in AD. I also have some other code for the other information in this loop but it's not the issue here.
Does anyone know how to make this go faster, or is there another solution?
I'd recommend trying to find out where the time is spent. One option is to use a profiler, either the one built into Visual Studio or a third-party profiler like Redgate's Ants Profiler or the Yourkit .Net Profiler.
Another is to trace the time taken using the System.Diagnostics.Stopwatch class and use the results to guide your optimization efforts. For example, time the function that retrieves data from Active Directory and, separately, the code that populates the view, to narrow down where the bottleneck is.
If the bottleneck is in the Active Directory lookup, you may want to consider running the operation asynchronously so that the window is not blocked and populates as new data is retrieved. If it's in the ListView, you may want to consider inserting the data in a batch operation.

Oracle - Strange intermittent error with data/state being lost between procedure calls

I have a bug I've been trying to track down for a few weeks now, and have pretty much isolated exactly what's going wrong. However, why this happens is beyond me.
I have a very large Oracle package (around 4,100 lines of code) and I'm calling several procedures. However, data seems to be getting lost between procedure calls.
The data that is being lost is:
dpmethodstate varchar_state_local_type;
First, I call this procedure:
PROCEDURE delivery_plan_set_state (
messages OUT ReferenceCursor,
state IN varchar_state_local_type
) AS
BEGIN
logMessage('state COUNT is: ' || state.COUNT);
dpmethodstate := state;
FOR I IN 1..dpmethodstate.COUNT LOOP
logMessage(dpmethodstate(I));
END LOOP;
logMessage('delivery_plan_set_state end - dpmethodstate count is now ' || dpmethodstate.COUNT);
OPEN messages FOR SELECT * FROM TABLE(messageQueue);
messageQueue := NULL;
END delivery_plan_set_state;
I pass in state, which is a valid array of a single string. I can verify in the logs that dpmethodstate has a COUNT of 1 when the procedure ends.
Next, I call the execute_filter procedure which looks like this:
PROCEDURE execute_filter (
--Whole bunch of OUT parameters
) AS
--About 50 different local variables being set here
BEGIN
SELECT TO_CHAR(SYSTIMESTAMP, 'HH24:MI:SS.ff') INTO TIMING FROM DUAL;
logMessage('[' || TIMING || '] execute_filter begin');
logMessage('a) dpmethodstate Count is: ' || dpmethodstate.COUNT);
--Rest of procedure
However, this time dpmethodstate.COUNT is 0. The value I set from delivery_plan_set_state has vanished!
When I look at my logs, it looks something like this:
proposed
delivery_plan_set_state end - dpmethodstate count is now 1
[21:39:48.719017] execute_filter begin
a) dpmethodstate Count is: 0
As you can see, dpmethodstate got lost between procedure calls. There are a few things to note:
Nothing else in this package is capable of setting the value for dpmethodstate besides delivery_plan_set_state. And I can see nothing else has called it.
My client side code is written in C#, and not much happens between the two procedure calls.
This happens perhaps 1 out of every 100 times, so it's very difficult to track down or debug.
First off, what's the best way to debug a problem like this? Is there any more logging I can do? Also, how does Oracle maintain state between procedure calls and is there anything that can intermittently reset this state? Any pointers would be much appreciated!
Is dpmethodstate a package global variable? I'm assuming it is, but I don't see that explicitly mentioned.
Since package global variables have session scope, are you certain that the two procedure calls are always using the same physical database connection and that nothing is using this connection in the interim? If you're using some sort of connection pool where you get a connection from the pool before each call and return the connection to the pool after the first call, it wouldn't be terribly unusual in a development environment (or low usage environment) to get the same connection 99% of the time for the second call but get a different session 1% of the time.
Can you log the SID and SERIAL# of the session where you are setting the value and where you are retrieving the value?
SELECT sid, serial#
FROM v$session
WHERE sid = sys_context( 'USERENV', 'SID' );
If those are different, you wouldn't expect the value to persist.
Beyond that, there are other ways to clear session state but they require someone to take explicit action. Calling DBMS_SESSION.RESET_PACKAGE or DBMS_SESSION.MODIFY_PACKAGE_STATE(DBMS_SESSION.REINITIALIZE) will clear out any session state set in your session. Compiling the package will do the same thing but that should throw an error warning you that your session state was discarded when you try to read it.

Run multiple queries concurrently in SAS

I have 36 completely independent queries I need to run on a regular bases which would go much faster if they could run 3 at a time (the database cancels our queries if we try and do more than 3 at a time) instead of each one waiting for the previous to finish.
I would like to do something like this
/* Some prep code here*/
/* Launch batch 1 containing queries 1-12*/
/* Immediately launch batch 2 (13-24) without waiting for 1-12 to finish*/
/* Immediately launch batch 3 (25-36)*/
/* Wait until all 3 batches are done and run some conclusion code*/
Or, if possible, just give it the 36 queries all together and have it run multiple at a time making sure to not have more than 3 running at any given time and any time one finishes, just add the next one from the stack.
Is this possible to do using SAS?
Thanks
I'm assuming you have a SAS server and that you're launching the queries from your local machine.
(If you don't, and you work locally, it's not a problem: you can rsubmit to a spawner running on your local machine.)
Even with Base SAS it's possible to launch 3 queries at the same time by opening three connections in a single program.
I'm assuming here that the queries are completely independent and that the remote sessions don't need to share work libraries with each other.
option autosignon=yes;
option sascmd="!sascmd";
* some random data;
data prova1;
do i=1 to 20000000;
x=rand('UNIFORM');
output;
end;
run;
data prova2;
do i=1 to 20000000;
y=rand('UNIFORM');
output;
end;
run;
*open connection to the server ;
options comamid=tcp;
filename rlink "D:\SAS\SASFoundation\9.2\connect\saslink\tcpwin.scr";
%LET host1=nbsimbol59;
%LET host2=nbsimbol59;
signon remote=host1 script=rlink;
signon remote=host2 script=rlink;
rsubmit process=host1 wait=no inheritlib=(work=cwork);
proc sort data=cwork.prova1 out=cwork.r1;
by x;
run;
proc sort data=cwork.r1 out=cwork.r1a;
by i;
run;
endrsubmit;
rsubmit process=host2 wait=no inheritlib=(work=cwork);
proc sort data=cwork.prova2 out=cwork.r2;
by y;
run;
proc sort data=cwork.r2 out=cwork.r2a;
by i;
run;
endrsubmit;
/* Wait for both tasks to complete. */
waitfor _ALL_ host1 host2;
data r9;
merge r1a (in=a) r2a (in=b);
by i;
if a and b;
run;
signoff host1;
signoff host2;
The only problem with this sample code is that it waits for both tasks to end; off the top of my head I don't see a way to have it launch another query as soon as one ends, but I believe there may be a way around it.
For now, with this code you can easily launch 3 queries at a time, then, when they finish, 3 more, and so on.
I'll think about your other request :)
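One way to drive the "3 at a time, then 3 more" pattern over all 36 queries is to wrap it in a macro loop. This is an untested sketch; the task names and the per-query files (query1.sas through query36.sas, each assumed to hold one independent query) are invented for illustration:

```sas
options autosignon=yes sascmd='!sascmd' connectwait=no;

%macro run_all(total=36, slots=3);
    %local wave slot q;
    %do wave = 1 %to %eval(&total / &slots);
        %do slot = 1 %to &slots;
            %let q = %eval((&wave - 1) * &slots + &slot);
            /* autosignon starts session task&slot on first use;
               later waves reuse the same signed-on sessions */
            rsubmit task&slot wait=no inheritlib=(work=cwork);
                %include "query&q..sas"; /* one independent query per file */
            endrsubmit;
        %end;
        /* wait for the whole wave before launching the next 3 */
        waitfor _all_;
    %end;
    signoff _all_;
%mend run_all;

%run_all()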
On some platforms (Windows and UNIX for sure), if the configuration allows your SAS session to interact with the OS, the SYSTASK statement gives you the ability to execute, list, or terminate asynchronous tasks. Combined with the WAITFOR statement, you can do something like this:
systask command "sas prog1.sas" taskname=sas1;
systask command "sas prog2.sas" taskname=sas2;
systask command "sas prog3.sas" taskname=sas3;
waitfor _all_ sas1 sas2 sas3; /* suspend current session until the three jobs are finished */
See documentation on SYSTASK and WAITFOR statements (for the Windows Platform).
Another option is SAS Grid Computing: http://support.sas.com/rnd/scalability/grid/index.html

SQL Server Agent 2005 job runs but no output

Essentially I have a job which runs in BIDS and as a stand-alone package, but when it runs under the SQL Server Agent it doesn't complete properly (no error messages, though).
The job steps are:
1) Delete all rows from table;
2) Use a For Each loop to fill up the table from Excel spreadsheets;
3) Clean up table.
I've tried this MS page (steps 1 & 2), but didn't see any need to start changing the server-side security.
Also SQLServerCentral.com for this page, no resolution.
How can I get error logging or a fix?
Note I've reposted this from Server Fault as it's one of those questions that's not pure admin or programming.
I have logged in as the proxy account I'm running this under, and the job runs stand-alone but complains that the Excel tables are empty?
Here's how I managed tracking "returned state" from an SSIS package called via a SQL Agent job. If we're lucky, some of this may apply to your system.
Job calls a stored procedure
Procedure builds a DTEXEC call (with a dozen or more parameters)
Procedure calls xp_cmdshell, with the call as a parameter (#Command)
SSIS package runs
"local" SSIS variable is initialized to 1
If an error is raised, SSIS "flow" passes to a step that sets that local variable to 0
In a final step, use Expressions to set SSIS property "ForceExecutionResult" to that local variable (1 = Success, 0 = Failure)
Full form of the SSIS call stores the returned value like so:
EXECUTE @ReturnValue = master.dbo.xp_cmdshell @Command
...and then it gets messy, as you can get a host of values returned from SSIS. I logged actions and activity in a DB table while going through the SSIS steps and consult that to try to work things out (which is where @Description below comes from). Here's the relevant code and comments:
-- Evaluate the DTEXEC return code
SET @Message = case
  when @ReturnValue = 1 and @Description <> 'SSIS Package' then 'SSIS Package execution was stopped or interrupted before it completed'
  when @ReturnValue in (0,1) then '' -- Package success or failure is logged within the package
  when @ReturnValue = 3 then 'DTEXEC exit code 3, package interrupted'
  when @ReturnValue in (4,5,6) then 'DTEXEC exit code ' + cast(@ReturnValue as varchar(10)) + ', package could not be run'
  else 'DTEXEC exit code ' + isnull(cast(@ReturnValue as varchar(10)), '<NULL>') + ' is an unknown and unanticipated value'
end
-- Oddball case: if the cmd.exe process is killed, the return value is 1, but the process will continue anyway
-- and could finish 100% successfully... and @ReturnValue will equal 1. If you can figure out how,
-- write a check for this in here.
That last references the "what if, while SSIS is running, some admin joker kills the CMD session (from, say, taskmanager) because the process is running too long" situation. We've never had it happen--that I know of--but they were uber-paranoid when I was writing this so I had to look into it...
Why not use the logging built into SSIS? We send our logs to a database table and then parse them out to another table in a more user-friendly format, and can see every step of every package that was run. And every error.
I did fix this eventually, thanks for the suggestions.
Basically I logged into Windows with the proxy user account I was running and started to see errors like:
"The For each file enumerator is empty"
I copied the project files across and started testing; it turned out that I'd still left a file path (N:/) in the properties of the For Each Loop box, although I'd changed the connection properties. It's easier once you've got error conditions to work with. I also had to recreate the variable mapping.
No wonder people just recreate the whole package.
Now fixed and working!