Is BigQuery procedure execution completely synchronous? - google-bigquery

I'm looking at the possibility of moving some of our simple model builds to use stored procedures. One of the sticking points for me is that it appears that procedures are entirely synchronous (even though they spawn child jobs, which would be done asynchronously). Is this accurate? There's no way to execute a SQL statement without waiting for its query job to complete?
https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting-concepts

Statements within a multi-statement query do not execute concurrently, if that's what you're asking. They execute sequentially, even without control flow. That's true whether you write the statements directly in the script or invoke a stored procedure that contains them.
You can issue multiple queries to run concurrently, but that's accomplished by creating multiple query requests/jobs, not via a single invocation with multiple statements within it.
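If it helps, here's a rough sketch of the multiple-jobs approach using the Python client library (the query texts are just placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# client.query() inserts a query job and returns without waiting for it
# to finish, so these jobs run concurrently on the BigQuery side.
jobs = [client.query(sql) for sql in ("SELECT 1 AS x", "SELECT 2 AS x")]

# Block only when you actually need the results.
for job in jobs:
    rows = job.result()  # waits for this job to complete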

Related

Mule batch processing vs foreach vs splitter-aggregator

In Mule, I have quite a lot of records to process, where processing includes some calculations, going back and forth to the database, etc. We can process collections of records with these options:
Batch processing
ForEach
Splitter-Aggregator
So what are the main differences between them? When should we prefer one over the others?
The Mule batch processing option does not seem to have batch-job-scope variable definitions, for example. Or, what if I want to benefit from multithreading to speed up the overall task? Or, which is better if I want to modify the payload during processing?
When you write "quite many" I assume it's too much for main memory, this rules out spliter/aggregator because it has to collect all records to return them as a list.
I assume you have your records in a stream or iterator, otherwise you probably have a memory problem...
So when to use for-each and when to use batch?
For Each
The simplest solution, but it has some drawbacks:
It is single threaded (so may be too slow for your use case)
It is "fire and forget": You can't collect anything within the loop, e.g. a record count
There is no support for handling "broken" records
Within the loop, you can have several steps (message processors) to process your records (e.g. for the mentioned database lookup).
May be a drawback, may be an advantage: The loop is synchronous. (If you want to process asynchronously, wrap it in an async scope.)
Batch
A little more stuff to do / to understand, but more features:
When called from a flow, always asynchronous (this may be a drawback).
Can be standalone (e.g. with a poll inside for starting)
When the data generated in the loading phase is too big, it is automatically offloaded to disk.
Multithreading for free (number of threads configurable)
Handling for "broken records": Batch steps may be executed for good/broken records only.
You get statistics at the end (number of records, number of successful records, etc.)
So it looks like you'd better use batch.
For Splitter and Aggregator, you are responsible for writing the splitting logic and then joining the pieces back together at the end of processing. It is useful when you want to process records asynchronously on different servers. It is less reliable compared to the other options, but parallel processing is possible.
Foreach is more reliable, but it processes records iteratively using a single thread (synchronously), so parallel processing is not possible. Each record creates a single message by default.
Batch processing is designed to process millions of records in a fast and reliable way. By default, 16 threads will process your records.
Please go through the links below for more details.
https://docs.mulesoft.com/mule-user-guide/v/3.8/splitter-flow-control-reference
https://docs.mulesoft.com/mule-user-guide/v/3.8/foreach
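To make the difference concrete outside of Mule, here is the general shape of for-each versus splitter/aggregator, sketched in plain Python rather than Mule configuration (Mule's batch module adds chunking, disk offload, and statistics on top of the threaded version):

from concurrent.futures import ThreadPoolExecutor

def process(record):
    # stand-in for the real per-record work (calculations, database lookups)
    return record * 2

records = range(1000)

# for-each style: one thread, strictly sequential
sequential_results = [process(r) for r in records]

# splitter/aggregator style: split the collection, process the pieces
# concurrently, then join the results at the end (note that everything
# is collected in memory, as the earlier answer points out)
with ThreadPoolExecutor(max_workers=8) as pool:
    aggregated_results = list(pool.map(process, records))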
I have been using the approach of passing records to the stored procedure in an array.
You can call the stored procedure inside a for-each loop, setting the loop's batch size accordingly to avoid round trips. I have used this approach and performance is good. You may have to create another table to log results, and have that logic in the stored procedure as well.
Below is the link with all the details:
https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro
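The linked article does this with Java and Oracle array types; the same round-trip-saving idea looks like this sketched in Python against PostgreSQL, where the driver maps a Python list to an SQL array (process_batch is a hypothetical server-side procedure that also logs its results):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # connection details are placeholders
cur = conn.cursor()

records = list(range(10_000))
BATCH_SIZE = 500  # tune: bigger batches mean fewer round trips

for start in range(0, len(records), BATCH_SIZE):
    chunk = records[start:start + BATCH_SIZE]
    # one round trip per batch; psycopg2 adapts the Python list to an int[]
    cur.execute("SELECT process_batch(%s)", (chunk,))
conn.commit()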

Executing Postgres statement or procedure without waiting for return

I'd like to invoke a cleanup procedure after an INSERT is done, and the cleanup takes a long time. My SQL client has very little tolerance for timeouts, so I'd like to return as soon as the command is issued.
I've tried a trigger: the INSERT fires the stored procedure that does the cleanup. The problem is that the INSERT then has to wait for the cleanup to finish in order to return.
I've tried calling a 'proxy' function which then calls the actual cleanup function, but it seems that the function call stack in Postgres is no different than in other languages. The client still waits for the cleanup.
What's the best practice in this scenario? Thanks!
You will have to use an external program to do this: PostgreSQL itself will not run things in the background. You may be able to handle this with pgAgent, or by creating your own program that runs constantly. You will need to send some kind of notification to that program, for which you can use e.g. a table with a trigger on the insert, or LISTEN/NOTIFY.
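For the LISTEN/NOTIFY route, the external worker can be as small as this Python sketch using psycopg2 (the channel and cleanup function names are invented, and the INSERT trigger is assumed to call pg_notify('cleanup_needed', ...)):

import select
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
conn.autocommit = True  # notifications are delivered outside transactions
cur = conn.cursor()
cur.execute("LISTEN cleanup_needed;")

while True:
    # wait up to 5 seconds for the server to push a notification
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        conn.notifies.pop(0)
        cur.execute("SELECT do_cleanup();")  # hypothetical cleanup function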

How to run an unknown number of stored procs in parallel

I am looking for something like this. Although the syntax is wrong, it demonstrates the principle:
foreach (int i in myIntArray)
{
    execute mystoredProc i;  // this should kick off the proc and move on to the next one without waiting for a return value
}
These stored procs are called from a Windows application. I am a little skeptical of creating numerous threads at the application end. I would rather do the threading at the SQL server end. I am open to using SSIS.
You can't do what you're asking for directly.
What you can do is fire up n threads; each thread opens its own connection, and each connection runs its own SQL query. Each thread then waits for its query to return. You can't do this in just one thread.
This also means that you can't do it natively within T-SQL.
You could write a CLR routine that launches multiple threads and repeats the above process, so that your T-SQL calls the CLR code and the CLR handles the concurrency.
But the standard practice for this really is to have multiple client side threads.
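For completeness, the client-side pattern is only a few lines in most languages. Here it is sketched in Python with pyodbc (the connection string is a placeholder; note one connection per thread):

from concurrent.futures import ThreadPoolExecutor
import pyodbc

CONN_STR = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"
my_int_array = [1, 2, 3, 4, 5]

def run_proc(i):
    # one connection per thread: a single connection cannot run two batches at once
    conn = pyodbc.connect(CONN_STR, autocommit=True)
    try:
        conn.cursor().execute("{CALL mystoredProc (?)}", i)
    finally:
        conn.close()

with ThreadPoolExecutor(max_workers=len(my_int_array)) as pool:
    # map blocks until every proc has returned; skip the wait entirely
    # if you truly want fire-and-forget
    list(pool.map(run_proc, my_int_array))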

Can Sql Server 2008 Stored Procedures (or Triggers) manually parallel or background some logic?

If I have a stored procedure or a trigger in SQL Server 2008, can it do some SQL calculations 'in another non-blocking thread', i.e. something in the background?
Also, can two SQL code blocks be run in parallel? Or two stored procs?
For example, imagine we are given the job of calculating the score for each Stack Overflow user after a user does some 'action' (and please leave the 'do that elsewhere/service/batch/overnight/etc.' suggestions aside).
So we have a trigger on the Post table: when a new post is INSERTED, the trigger fires and, as part of its logic, calculates the user's latest score. Instead of waiting for the stored proc to finish and blocking the current SQL execution, can we ask it to calculate the score in the background or in parallel?
cheers!
SQL Server does not have parallel or deferred execution: each block of running code in a connection is serial, one line after the other.
To decouple processing, you usually have to use SQL Server Agent jobs or Service Broker. These start executing in a new connection, new session, etc.
This makes sense:
What if you want to rollback your changes? What does the background thread do and how does it know?
What data does it use? New, Old, lock wait, snapshot?
What if it gets ahead of the main thread and uses stale data?
No, but you could write the request to a queue. Service Broker, a SQL Server component, provides support for this kind of thing. It's probably the best option available for asynchronous processing.
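If Service Broker feels heavyweight, the minimal version of "write the request to a queue" is a table the trigger inserts into, drained by an external worker. A rough Python/pyodbc sketch, where the table, column, and proc names are all invented:

import time
import pyodbc

CONN_STR = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"
conn = pyodbc.connect(CONN_STR, autocommit=True)
cur = conn.cursor()

while True:
    # atomically claim one queued request; READPAST skips rows that
    # other workers have already locked
    row = cur.execute(
        "DELETE TOP (1) FROM dbo.ScoreQueue WITH (ROWLOCK, READPAST) "
        "OUTPUT deleted.UserId"
    ).fetchone()
    if row is None:
        time.sleep(1)  # queue is empty; back off
        continue
    cur.execute("EXEC dbo.CalcUserScore @UserId = ?", row.UserId)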

Start stored procedures sequentially or in parallel

We have a stored procedure that runs nightly that in turn kicks off a number of other procedures. Some of those procedures could logically be run in parallel with some of the others.
How can I indicate to SQL Server whether a procedure should be run in parallel or serially, i.e. kicked off asynchronously or blocking?
What would be the implications of running them in parallel, keeping in mind that I've already determined that the processes won't be competing for table access or locks, just total disk I/O and memory? For the most part they don't even use the same tables.
Does it matter if some of those procedures are the same procedure, just with different parameters?
If I start a pair or procedures asynchronously, is there a good system in SQL Server to then wait for both of them to finish, or do I need to have each of them set a flag somewhere and check and poll the flag periodically using WAITFOR DELAY?
At the moment we're still on SQL Server 2000.
As a side note, this matters because the main procedure is kicked off in response to the completion of a data dump into the server from a mainframe system. The mainframe dump takes all but about 2 hours each night, and we have no control over it. As a result, we're constantly trying to find ways to reduce processing times.
I had to research this recently, so found this old question that was begging for a more complete answer. Just to be totally explicit: TSQL does not (by itself) have the ability to launch other TSQL operations asynchronously.
That doesn't mean you don't still have a lot of options (some of them mentioned in other answers):
Custom application: Write a simple custom app in the language of your choice, using asynchronous methods. Call a SQL stored proc on each application thread.
SQL Agent jobs: Create multiple SQL jobs, and start them asynchronously from your proc using sp_start_job. You can check whether they have finished yet using the undocumented function xp_sqlagent_enum_jobs as described in this excellent article by Gregory A. Larsen. (Or have the jobs themselves update your own JOB_PROGRESS table as Chris suggests.) You would literally have to create a separate job for each parallel process you anticipate running, even if they are running the same stored proc with different parameters. (There's a client-side sketch of this launch-and-poll pattern at the end of this answer.)
OLE Automation: Use sp_oacreate and sp_oamethod to launch a new process calling the other stored proc as described in this article, also by Gregory A. Larsen.
DTS Package: Create a DTS or SSIS package with a simple branching task flow. DTS will launch tasks in individual spids.
Service Broker: If you are on SQL2005+, look into using Service Broker
CLR Parallel Execution: Use the CLR commands Parallel_AddSql and Parallel_Execute as described in this article by Alan Kaplan (SQL2005+ only).
Scheduled Windows Tasks: Listed for completeness, but I'm not a fan of this option.
I don't have much experience with Service Broker or CLR, so I can't comment on those options. If it were me, I'd probably use multiple Jobs in simpler scenarios, and a DTS/SSIS package in more complex scenarios.
One final comment: SQL already attempts to parallelize individual operations whenever it can*. This means that running 2 tasks at the same time instead of after each other is no guarantee that it will finish sooner. Test carefully to see whether it actually improves anything or not.
We had a developer that created a DTS package to run 8 tasks at the same time. Unfortunately, it was only a 4-CPU server :)
*Assuming default settings. This can be modified by altering the server's Maximum Degree of Parallelism or Affinity Mask, or by using the MAXDOP query hint.
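To make option 2 concrete, here is a rough client-side sketch in Python/pyodbc that starts two Agent jobs and then polls a home-grown progress table like the one mentioned above. The connection string, job names, table, and final proc are all invented:

import time
import pyodbc

CONN_STR = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"
conn = pyodbc.connect(CONN_STR, autocommit=True)
cur = conn.cursor()

# sp_start_job returns as soon as the job is started, so these two
# jobs run in parallel
for job in ("nightly_calc_a", "nightly_calc_b"):
    cur.execute("EXEC msdb.dbo.sp_start_job @job_name = ?", job)

# each job's final step marks its own row in JOB_PROGRESS as finished
while cur.execute(
    "SELECT COUNT(*) FROM dbo.JOB_PROGRESS WHERE finished = 0"
).fetchone()[0] > 0:
    time.sleep(30)

cur.execute("EXEC dbo.FinalSummaryProc")  # the step that needed both jobs done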
Create a couple of SQL Server agent jobs where each one runs a particular proc.
Then from within your master proc kick off the jobs.
The only way of waiting that I can think of is if you have a status table that each proc updates when it's finished.
Then yet another job could poll that table for total completion and kick off a final proc. Alternatively, you could have a trigger on this table.
The memory implications are completely up to your environment..
UPDATE:
If you have access to the task system, then you could take the same approach: just have Windows execute multiple tasks, each responsible for one proc. Then use a trigger on the status table to kick off something when all of the tasks have completed.
UPDATE2:
Also, if you're willing to create a new app, you could house all of the logic in a single exe...
You do need to move your overnight sprocs to jobs. SQL Server job control will let you do all of the scheduling you are asking for.
You might want to look into using DTS (which can be run from the SQL Agent as a job). It will allow you pretty fine control over which stored procedures need to wait for others to finish and what can run in parallel. You can also run the DTS package as an EXE from your own scheduling software if needed.
NOTE: You will need to create multiple copies of your connection objects to allow calls to run in parallel. Two calls using the same connection object will still block each other even if you don't explicitly put in a dependency.