Schedule a sequence of DML statements in BigQuery

I've created a BigQuery table and need to schedule a series of DML statements on it (inserts and a merge). I am trying to replicate the Oracle PL/SQL functionality where you can group DML statements into a single procedure that can be scheduled.
So the goal is (i) to group a series of DML statements into a script and (ii) to schedule the script for execution. Thank you in advance for any help.

Scripting is now supported in scheduled queries. However, a scripted query, when scheduled, does not support setting a destination table (at the time of writing), so you still need to use DDL/DML statements to change existing tables, for example:
-- maxDate is assumed to be a variable declared earlier in the script
CREATE OR REPLACE TABLE destinationTable AS
SELECT *
FROM sourceTable
WHERE date >= maxDate;
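For the INSERT-plus-MERGE case in the question, the body of the scheduled query could be a multi-statement script along these lines. This is a sketch: the dataset, tables and columns (mydataset.target, mydataset.staging, mydataset.audit_log, id, value, updated_at, run_ts, note) are hypothetical. BEGIN TRANSACTION / COMMIT TRANSACTION rely on BigQuery's multi-statement transaction support, so the two statements are committed together, and if the script fails before the COMMIT the changes should be rolled back.

```sql
-- Sketch only: the mydataset.* tables and their columns are hypothetical.
-- DECLARE statements must come first in a BigQuery script.
DECLARE v_run_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP();

BEGIN TRANSACTION;

-- Step 1: a plain INSERT (here: log that this run happened)
INSERT INTO mydataset.audit_log (run_ts, note)
VALUES (v_run_ts, 'scheduled load');

-- Step 2: upsert the staged rows into the target table
MERGE mydataset.target AS t
USING mydataset.staging AS s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET value = s.value, updated_at = v_run_ts
WHEN NOT MATCHED THEN
  INSERT (id, value, updated_at) VALUES (s.id, s.value, v_run_ts);

-- Both statements become visible together
COMMIT TRANSACTION;
```

The same statements could instead be placed in a BigQuery stored procedure (CREATE PROCEDURE), with the scheduled query reduced to a single CALL statement, which is closer to the Oracle pattern described in the question.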

Related

Oracle SQL script parallel execution

For my job I need to prepare two tables (CTAS) and then do some joins between them. For this I created a script (run in SQL Developer) which creates these two tables sequentially, one after another. Since the two tables are not related, I'd like to create them in parallel. Is it possible in a SQL script to start two table creations (or two other scripts) in parallel and then proceed when both have finished?
Here's one option.
I wouldn't really use CTAS; I'd rather create both tables in advance and then insert rows into them. Why? Because this approach uses stored procedures, and performing DDL (which CTAS is) inside a stored procedure requires dynamic SQL. That is perfectly possible, but it is much simpler NOT to use it.
I'd create yet another table (let's call it table_done) containing a single row with two columns, table_1 and table_2, whose values can be 0 (data for that table is not ready) or 1 (data ready).
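The status table only needs to be created and seeded once. A minimal sketch, using the column names from the description above; the number(1) types and defaults are assumptions:

```sql
-- One-row status table: 0 = data not ready, 1 = data ready
create table table_done (
  table_1 number(1) default 0 not null,
  table_2 number(1) default 0 not null
);

-- Seed the single row that the procedures will update
insert into table_done (table_1, table_2) values (0, 0);
commit;
```

Because the procedures only ever update this row, it has to exist before they run.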
Furthermore, I'd create two stored procedures which look the same; the only difference is that each of them inserts rows into its own table:
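create procedure p_insert_1 as
begin
  -- remove old data (TRUNCATE is DDL, so PL/SQL needs dynamic SQL for it)
  execute immediate 'truncate table table_1';
  -- table_1 data not ready; commit at once so the row lock on table_done
  -- does not block p_insert_2 while this procedure is loading data
  update table_done set table_1 = 0;
  commit;
  -- prepare new data
  insert into table_1 (...) select ...;
  -- table_1 data ready (table_done has a single row, so update it)
  update table_done set table_1 = 1;
  commit;
end;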
The third, "main" procedure is the one you'd run manually. What would it do? It creates two one-time database jobs that run immediately, each starting its own p_insert procedure, so that they run in parallel (note that jobs submitted with dbms_job only start once the submitting session commits). The main procedure then checks in a loop whether both columns in table_done are set to 1 and, if so, continues execution.
create procedure p_main is
  l_job_1   number;
  l_job_2   number;
  --
  l_t1_done number;
  l_t2_done number;
begin
  -- reset the flags so stale values from a previous run are not picked up
  update table_done set table_1 = 0, table_2 = 0;
  dbms_job.submit(l_job_1, 'begin p_insert_1; end;');
  dbms_job.submit(l_job_2, 'begin p_insert_2; end;');
  -- dbms_job jobs only start after the submitting transaction commits
  commit;
  loop
    select table_1, table_2
      into l_t1_done, l_t2_done
      from table_done;
    if l_t1_done = 1 and l_t2_done = 1 then
      -- both tables are ready; exit the loop
      exit;
    else
      -- tables aren't ready yet; wait 60 seconds and try again
      dbms_lock.sleep(60);
    end if;
  end loop;
  -- process data prepared in table_1 and table_2
end;
That's just a simplified idea; I didn't test it myself, so apologies for any errors. Also: instead of dbms_job you could use the more advanced dbms_scheduler; if you're on 18c or later, use dbms_session.sleep instead of dbms_lock.sleep; and so forth.
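A sketch of the main procedure rewritten with dbms_scheduler and dbms_session.sleep (18c or later). The job names LOAD_TABLE_1 and LOAD_TABLE_2 are made up, and the owner is assumed to have the CREATE JOB privilege. dbms_scheduler.create_job commits implicitly, and a job created enabled with no repeat_interval runs once, straight away, and is then dropped (auto_drop defaults to true).

```sql
create or replace procedure p_main is
  l_t1_done number;
  l_t2_done number;
begin
  -- reset the flags before starting the jobs
  update table_done set table_1 = 0, table_2 = 0;
  commit;

  -- one-off jobs: enabled immediately, no repeat_interval, auto-dropped when done
  dbms_scheduler.create_job(
    job_name   => 'LOAD_TABLE_1',
    job_type   => 'PLSQL_BLOCK',
    job_action => 'begin p_insert_1; end;',
    enabled    => true);
  dbms_scheduler.create_job(
    job_name   => 'LOAD_TABLE_2',
    job_type   => 'PLSQL_BLOCK',
    job_action => 'begin p_insert_2; end;',
    enabled    => true);

  -- wait until both jobs have flagged their table as ready
  loop
    select table_1, table_2
      into l_t1_done, l_t2_done
      from table_done;
    exit when l_t1_done = 1 and l_t2_done = 1;
    dbms_session.sleep(60);
  end loop;

  -- process data prepared in table_1 and table_2
end;
/
```

As written, a job that fails never sets its flag, so the loop would wait forever; in real use you would want a timeout, or a check of USER_SCHEDULER_JOB_RUN_DETAILS for failed runs.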
Use SQL parallelism instead of process concurrency. While the words parallelism and concurrency are colloquially interchangeable, in Oracle they have different meanings. Parallelism implies that the SQL engine handles all the coordination of breaking work into little pieces, running those pieces at the same time, and then re-assembling the results at the end. Concurrency implies that the user will create multiple sessions and handle the coordination manually.
For simply creating two tables, parallelism will probably be simpler and faster than concurrency. You may only need to create each table in parallel. (And you probably want to reset the table's parallelism back to none afterwards, so that later queries against it don't run in parallel by default.)
CREATE TABLE TABLE1 PARALLEL 2 AS SELECT ...;
ALTER TABLE TABLE1 NOPARALLEL;
The PARALLEL 2 option requests a degree of parallelism of 2, so Oracle splits the work across parallel execution server processes while the SQL statement is running. You can easily increase that number, but don't go too high or you'll steal too many resources from other sessions.
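Applied to the question, the script itself stays sequential, but each statement uses parallel servers. A sketch, assuming hypothetical source tables src_a and src_b; the statement-level PARALLEL hint on the query part may be redundant with the PARALLEL clause, but it makes the intent explicit:

```sql
-- src_a and src_b are hypothetical source tables
CREATE TABLE table1 PARALLEL 2 AS
SELECT /*+ PARALLEL(2) */ * FROM src_a;
ALTER TABLE table1 NOPARALLEL;

CREATE TABLE table2 PARALLEL 2 AS
SELECT /*+ PARALLEL(2) */ * FROM src_b;
ALTER TABLE table2 NOPARALLEL;

-- confirm both tables are back to serial (DEGREE should be 1)
SELECT table_name, TRIM(degree) AS degree
FROM   user_tables
WHERE  table_name IN ('TABLE1', 'TABLE2');
```

The joins between table1 and table2 then follow as in the original script.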
DBMS_SCHEDULER and other concurrency mechanisms are powerful and useful, but I recommend avoiding them if possible. Running and monitoring scheduler jobs will likely be much more complicated than the preceding code. (Although you may still need to occasionally monitor the parallel SQL statement using a tool like OEM SQL Monitor Reports to ensure that the server is actually using the requested parallelism.)
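A lighter-weight check than a SQL Monitor report is to query V$PQ_SESSTAT in the same session immediately after the parallel statement. A sketch, assuming you have been granted SELECT on the V$ views:

```sql
-- Run in the same session, right after the parallel CREATE TABLE ... AS SELECT.
-- LAST_QUERY refers to the most recent statement; SESSION_TOTAL to the whole session.
SELECT statistic, last_query, session_total
FROM   v$pq_sesstat
WHERE  statistic IN ('DDL Parallelized', 'Server Threads');
```

A LAST_QUERY value of 1 for 'DDL Parallelized' suggests the CTAS did run in parallel, and 'Server Threads' indicates how many parallel servers it used.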

Atomicity of a job execution in SQL Server

I would like to find the proper documentation to confirm my understanding of a SQL Server job I recently wrote. My fear is that the data could be inconsistent for a few milliseconds (between the start and the end of the job execution).
Let's say the job is set up to run every 30 minutes. It has only one step, with the following SQL statements:
DELETE FROM myTable
INSERT INTO myTable
SELECT *
FROM myTableTemp
Could it happen that a SELECT query is executed exactly between the DELETE statement and the INSERT statement and thus returns empty results?
And what if I had created two steps in my job, one for the DELETE query and another for the INSERT INTO? Is atomicity guaranteed by SQL Server across several steps of one job?
Thanks for your help on this one
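No, there is no automatic atomicity for jobs, whether they consist of multiple statements or multiple steps: by default each statement commits on its own unless you wrap the statements in an explicit transaction.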
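Use this (in a single job step):
SET XACT_ABORT ON;  -- roll back the whole transaction if any statement fails
BEGIN TRANSACTION;
DELETE FROM myTable;
INSERT INTO myTable
SELECT *
FROM myTableTemp;
-- ... anything else you need to be atomic
COMMIT TRANSACTION;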
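A slightly more defensive variant of the same idea, sketched for SQL Server 2012 or later (THROW was added in 2012). The CATCH block rolls back any open transaction and re-raises the error, so the SQL Server Agent job step is reported as failed. As for readers: under the default READ COMMITTED isolation level, a concurrent SELECT on myTable waits for this transaction's locks instead of seeing the empty table; with READ_COMMITTED_SNAPSHOT enabled it reads the last committed rows instead. Only readers using NOLOCK or READ UNCOMMITTED could observe the intermediate state.

```sql
SET XACT_ABORT ON;  -- make run-time errors roll back the whole transaction

BEGIN TRY
    BEGIN TRANSACTION;

    DELETE FROM myTable;

    INSERT INTO myTable
    SELECT *
    FROM myTableTemp;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- undo the DELETE if anything failed after it
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    THROW;  -- re-raise so the job step is marked as failed
END CATCH;
```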

SQL Server - SKIP DML statements from a given SQL Script file

I have a huge SQL Server script with a mix of DDL and DML operations, and there is a requirement to create a clean database structure (with no data). Is it possible to skip the DML statements through some parameter or some other way?
Thanks in Advance.
Arun
Out of the box: no. But you can wrap the DML statements in your script with something like:
if ($(RUNDML) = 1)
begin
--your dml here
end
Where RUNDML is a sqlcmd variable. You'd invoke your script with differing values of RUNDML based on whether or not you wanted data in the database being built.
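A sketch of what such a script could look like, using a hypothetical dbo.Country table. Because an IF ... BEGIN ... END block cannot span a GO, each batch that contains DML needs its own guard. The script would be run with sqlcmd (or in SSMS with SQLCMD mode enabled), for example sqlcmd -S yourServer -d yourDb -i build.sql -v RUNDML=0 for a structure-only build; if RUNDML is not supplied at all, sqlcmd reports that the scripting variable is not defined.

```sql
-- build.sql (hypothetical): run with -v RUNDML=1 for data, -v RUNDML=0 for structure only
CREATE TABLE dbo.Country (
    CountryCode char(2)       NOT NULL PRIMARY KEY,
    Name        nvarchar(100) NOT NULL
);
GO

-- An IF ... BEGIN ... END block cannot span a GO,
-- so every batch that contains DML needs its own guard.
IF ($(RUNDML) = 1)
BEGIN
    INSERT INTO dbo.Country (CountryCode, Name)
    VALUES ('DE', N'Germany'),
           ('FR', N'France');
END
GO
```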
Alternatively, separate the DML out into another script (or scripts) so you can choose whether to run the data portion of the build or not.

Do I have to write the "GO" word in order to execute an SQL server statement?

I have little to no experience with T-SQL and SQL Server. In MySQL, when I want to execute a statement, I simply write:
Select * from users
...and then hit ENTER.
However, in many SQL Server tutorials I see the word GO immediately after each statement. Do I have to write it? For example:
Select * from users; GO
Or I can simply write:
Select * from users; <enter key pressed...>
In SQL Server tools such as SSMS and sqlcmd, GO separates query batches. It's optional in most situations.
You do need a GO after altering a table when a later statement in the same script uses the new column, like:
alter table MyTable add MyColumn int
go
select MyColumn from MyTable
Without the GO, SQL Server compiles the whole batch before running it and complains that MyColumn doesn't exist. See MSDN:
SQL Server utilities interpret GO as a signal that they should send the current batch of Transact-SQL statements to an instance of SQL Server. The current batch of statements is composed of all statements entered since the last GO, or since the start of the ad hoc session or script if this is the first GO.
GO separates batches, as Andomar wrote.
Some SQL statements (e.g. CREATE SCHEMA) need to be the first or only statements within a batch. For example, MSDN states
The CREATE PROCEDURE statement cannot be combined with other Transact-SQL statements in a single batch.
Local variables are also limited to a batch, and therefore are not accessible after a GO.
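Both rules show up in a short script. dbo.usp_GetUsers and dbo.users are hypothetical names, and the last SELECT fails on purpose:

```sql
-- CREATE PROCEDURE must start its batch; everything up to the next GO
-- becomes the procedure body.
CREATE PROCEDURE dbo.usp_GetUsers
AS
    SELECT * FROM dbo.users;
GO

DECLARE @n int = 1;
SELECT @n AS n;   -- works: @n is declared in this batch
GO

SELECT @n AS n;   -- fails: Must declare the scalar variable "@n".
```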
GO is optional; there is no need to write it in your SQL statements.
You don't have to. What GO does is tell the client tool (for example SSMS or sqlcmd) to send all statements written since the previous GO to SQL Server as one batch; it is not a T-SQL statement itself.
As the other answerers said before me, you don't really NEED GO.
The main case where you have to use it is when you create an object that must be alone in its batch, such as a view, and then want to select from it in the same script.
For example:
create view MyView as select * from MyTable
go
select * from MyView
Without GO, SQL Server won't execute this, because CREATE VIEW must be the only statement in its batch, so the SELECT cannot be sent along with it.
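Since GO is only understood by client tools, it can't be sent from application code or used inside a stored procedure. A common workaround in those cases is to run the statement that must be alone in its batch through dynamic SQL, which executes as a separate batch. A sketch, reusing the MyView/MyTable names from the example:

```sql
-- The CREATE VIEW runs as its own batch inside sp_executesql,
-- so no GO is needed between it and the SELECT.
EXEC sp_executesql N'create view MyView as select * from MyTable';

select * from MyView;
```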

how to create a scheduled process in sql server

In MS SQL Server 2008, how would I go about creating a scheduled process that, on a weekly basis, takes the sum of a float column for each user in a user column, finds which user has the greatest sum, and stores that number along with that user in a separate table?
Create a SQL Server scheduled job that executes a stored procedure or raw SQL.
Based on your description, the query could look like this:
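-- Placeholder names: TargetTable, SourceTable and ValueColumn stand in for your own
-- (the original "table" and "column" are reserved words and would not parse)
insert into TargetTable (username, sumofcolumn)
select top 1 username, sum(ValueColumn)
from SourceTable
group by username
order by sum(ValueColumn) desc;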
Personally I prefer to write a service which performs actions periodically, since I have better control of when the actions are to be executed, and everything is in a single place.
If you want to solve your problem with database means only, just create a stored procedure implementing your logic, and call that stored procedure from a scheduled job.
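A sketch of both parts for SQL Server 2008: the query wrapped in a stored procedure, and a SQL Server Agent job, created with the msdb procedures, that runs it weekly. The database, procedure, job, step and schedule names, as well as the placeholder table and column names, are hypothetical; the same job can also be created in SSMS under SQL Server Agent > Jobs.

```sql
USE MyDatabase;  -- hypothetical database name
GO

CREATE PROCEDURE dbo.usp_StoreWeeklyTopUser
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO TargetTable (username, sumofcolumn)
    SELECT TOP (1) username, SUM(ValueColumn)
    FROM SourceTable
    GROUP BY username
    ORDER BY SUM(ValueColumn) DESC;
END
GO

USE msdb;
GO

EXEC dbo.sp_add_job
    @job_name = N'Weekly top user';

EXEC dbo.sp_add_jobstep
    @job_name      = N'Weekly top user',
    @step_name     = N'Store top user',
    @subsystem     = N'TSQL',
    @database_name = N'MyDatabase',
    @command       = N'EXEC dbo.usp_StoreWeeklyTopUser;';

EXEC dbo.sp_add_jobschedule
    @job_name               = N'Weekly top user',
    @name                   = N'Sundays 01:00',
    @freq_type              = 8,       -- weekly
    @freq_interval          = 1,       -- on Sunday
    @freq_recurrence_factor = 1,       -- every week
    @active_start_time      = 010000;  -- 01:00:00 (HHMMSS)

EXEC dbo.sp_add_jobserver
    @job_name = N'Weekly top user';    -- target the local server
```

If several users tie for the highest sum, TOP (1) keeps an arbitrary one of them; TOP (1) WITH TIES would store all of them instead.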