Does a U-SQL script execute in sequence? - azure-data-lake

I need to do an incremental load and am using the structure below.
Do the statements execute in sequence, i.e. is TRUNCATE guaranteed never to run before the first two statements, which fetch the data?
@newData = EXTRACT ... (FROM FILE STREAM);
@existingData = SELECT * FROM dbo.TableA; // this is an ADLA table
@allData = SELECT * FROM @newData UNION ALL SELECT * FROM @existingData;
TRUNCATE TABLE dbo.TableA;
INSERT INTO dbo.TableA SELECT * FROM @allData;

To be very clear: U-SQL scripts are not executed statement by statement. Instead, the compiler groups the DDL/DML/OUTPUT statements in order, and the query expressions are just subtrees of the inserts and outputs. Before that, it binds names to their data during compilation, so your SELECT from TableA is bound to the data (a kind of lightweight snapshot). Even if the TRUNCATE is executed before the SELECT, you should still be able to read the data from TableA (note that permission changes may affect that).
Also, if your script fails during the execution phase, you should have an atomic execution. That means if your INSERT fails, the TRUNCATE should be undone at the end.
Having said that, why don't you use INSERT incrementally and use ALTER TABLE REBUILD periodically instead of doing the above pattern that reads the full table on every insertion?
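A rough sketch of that incremental pattern, split into two separate scripts. The file path, the column list and the CSV extractor are hypothetical placeholders (the original EXTRACT was elided) and would have to match the real schema of dbo.TableA:

```sql
// Script 1: run on every load. Appends only the new rows,
// so the existing table contents are never re-read or rewritten.
@newData =
    EXTRACT Id int,          // hypothetical columns
            Name string
    FROM "/input/new_rows.csv"   // hypothetical path
    USING Extractors.Csv();

INSERT INTO dbo.TableA
SELECT * FROM @newData;

// Script 2: run periodically as maintenance. Each INSERT adds
// another physical fragment to the table; REBUILD compacts them.
ALTER TABLE dbo.TableA REBUILD;
```

How often to rebuild depends on how many inserts accumulate between runs; that schedule is left open here.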

Related

Why use MERGE statements that only do 1 operation in SQL?

In our code base I see a lot of SQL MERGE statements that only perform an UPDATE or only perform an INSERT or only perform a DELETE. I am having a hard time understanding why developers don't just write an insert or an update or a delete command.
What are the pros and cons of using merge statements that only execute one command?
From what I understand, merges are more problematic: for example, if a table's hash distribution column changes, or if a row gets updated more than once, the statement throws an exception (possibly a logic problem at that point, but an exception nonetheless).
Example:
MERGE INTO dbo.my_dltd_recs AS dst
USING (SELECT *
FROM dbo.my_recs
WHERE dltd_ind = 'Y') src
ON (dst.my_skey = src.my_skey)
WHEN NOT MATCHED
-- only operation is INSERT, WHY USE MERGE
THEN INSERT (my_skey,
colA,
colB)
VALUES
( src.my_skey,
src.colA,
src.colB);

DB2 SQL statement - is it possible to A) declare a temporary table B) populate it with data then C) run a select statement against it?

I have read-only access to a DB2 database and I want to create an "in flight/on the fly" or temporary table that exists only within the SQL, populate it with values, and then compare the results against an existing table.
So far I am trying to validate the premise. The following query compiles, but the select statement returns nothing.
Can anyone tell me what I am doing wrong, or advise whether what I am attempting is possible (or whether there is a better way of doing it)?
Thanks
Justin
--Create a table that only exists within the query
DECLARE GLOBAL TEMPORARY TABLE SESSION.TEMPEVENT (EVENT_TYPE INTEGER);
--Insert a value into the temporary table
INSERT INTO SESSION.TEMPEVENT (EVENT_TYPE) VALUES ('1');
--Select all values from the temporary table
SELECT * FROM SESSION.TEMPEVENT;
--Drop the table so the query can be run again
DROP TABLE SESSION.TEMPEVENT;
If you look at the syntax diagram of the DECLARE GLOBAL TEMPORARY TABLE statement, you may note the following block:
     .-ON COMMIT DELETE ROWS---.
--●--+-------------------------+--●----------------------------
     '-ON COMMIT PRESERVE ROWS-'
This means that ON COMMIT DELETE ROWS is the default behavior. If you issue your statements with autocommit mode turned on, a commit is issued implicitly after each statement, which deletes all the rows in your DGTT.
If you want DB2 not to delete the rows in the DGTT upon commit, you have to explicitly specify the ON COMMIT PRESERVE ROWS clause in the DGTT declaration.
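Applied to the script from the question, the change might look like the following sketch. Only the ON COMMIT PRESERVE ROWS clause is new; the integer literal replaces the quoted '1' purely to match the column type:

```sql
--Create a session-scoped table whose rows survive each (auto)commit
DECLARE GLOBAL TEMPORARY TABLE SESSION.TEMPEVENT (EVENT_TYPE INTEGER)
    ON COMMIT PRESERVE ROWS;
--Insert a value into the temporary table
INSERT INTO SESSION.TEMPEVENT (EVENT_TYPE) VALUES (1);
--The row is still there, even with autocommit turned on
SELECT * FROM SESSION.TEMPEVENT;
--Drop the table so the query can be run again
DROP TABLE SESSION.TEMPEVENT;
```

Alternatively, turning autocommit off in the client and committing only after the SELECT should give the same result with the default ON COMMIT DELETE ROWS.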

Insert into select, target data unaffected

We have a simple query
INSERT INTO table2
SELECT *
FROM table1
WHERE condition;
I read somewhere that for the INSERT INTO SELECT statement, the following holds:
The existing records in the target table are unaffected
What does it mean?
INSERT is a SQL operation that adds new rows to your table without affecting the rows already there. This is in contrast to an UPDATE, which can change multiple existing rows in your table if you use the wrong WHERE clause.
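A small illustration of "unaffected", using hypothetical columns and sample rows (the original question does not show the table definitions):

```sql
-- Hypothetical schemas for illustration only
CREATE TABLE table1 (id INTEGER, name VARCHAR(20));
CREATE TABLE table2 (id INTEGER, name VARCHAR(20));

INSERT INTO table2 VALUES (1, 'existing');            -- already in the target
INSERT INTO table1 VALUES (2, 'new'), (3, 'skipped'); -- source rows

-- Copy only the source rows that satisfy the condition
INSERT INTO table2
SELECT *
FROM table1
WHERE id = 2;

SELECT * FROM table2 ORDER BY id;
-- (1, 'existing')  <- untouched by the INSERT
-- (2, 'new')       <- appended
```

The row with id 1 is neither modified nor removed; the statement only appends rows.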

Select into tables in SQL. How are they stored?

When I run a script in PostgreSQL I usually do the following from psql:
my_database> \i my_script.sql
Where in my_script.sql I may have code like the following:
select a.run_uid, s.object_uid into temp_table from dt.table_run_group as a
inner join dt.table_segment as s on a.group_uid = s.object_uid;
In this particular case, I am only interested in creating temp_table with the results of the query.
Are these results in disk on the server? In memory? Is the table stored permanently?
Temporary tables are stored in RAM until the available memory is used up, at which time they spill onto disk. The relevant setting here is temp_buffers.
Either way, they live for the duration of a session and are dropped at the end automatically.
You can also drop them at the end of a transaction automatically (ON COMMIT DROP) or manually any time.
Temporary tables are only visible to the same user in the same session. Other sessions cannot access them, and their names cannot conflict with them either.
Always use CREATE TABLE tbl AS .... The alternative form SELECT ... INTO tbl is discouraged since it conflicts with the INTO clause in plpgsql.
Your query could look like:
CREATE TEMP TABLE tbl AS
SELECT a.run_uid, s.object_uid
FROM dt.table_run_group a
JOIN dt.table_segment s ON a.group_uid = s.object_uid;
SELECT INTO table ... is the same as CREATE TABLE table AS ..., which creates a normal, permanent table.
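For the ON COMMIT DROP variant mentioned above, a minimal sketch reusing the query from the question could look like this (tbl is the same illustrative name used earlier):

```sql
BEGIN;

-- Temporary table that exists only until the transaction ends
CREATE TEMP TABLE tbl ON COMMIT DROP AS
SELECT a.run_uid, s.object_uid
FROM   dt.table_run_group a
JOIN   dt.table_segment s ON a.group_uid = s.object_uid;

-- ... work with tbl inside the transaction ...
SELECT count(*) FROM tbl;

COMMIT;  -- tbl is dropped automatically here
```

Note that this only makes sense inside an explicit transaction; with autocommit, the table would be gone as soon as the CREATE statement finishes.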

Reuse a complex query result in other queries without redoing the complex query

I have a complex query in PostgreSQL and I want to use the result of it in other operations like UPDATEs and DELETEs, something like:
<COMPLEX QUERY>;
UPDATE WHERE <COMPLEX QUERY RESULT> = ?;
DELETE WHERE <COMPLEX QUERY RESULT> = ?;
UPDATE WHERE <COMPLEX QUERY RESULT> = ?;
I don't want to run the complex query once for each operation. One way to avoid this is to store the result in a table, use it in the WHERE clauses and JOINs, and drop the temporary table when finished.
I want to know if there is another way that does not store the results in the database but still uses the results held in memory.
I already use loops for this, but I think one set-based operation per step will be faster than doing the operations row by row.
You can loop through the query results like @phatfingers demonstrates (probably with a generic record variable or scalar variables instead of a rowtype, if the result type of the query doesn't match any existing rowtype). This is a good idea for few resulting rows or when sequential processing is necessary.
For big result sets your original approach will perform faster by an order of magnitude. It is much cheaper to do a mass INSERT / UPDATE / DELETE with one SQL command than to write / delete incrementally, one row at a time.
A temporary table is the right thing for reusing such results. It gets dropped automatically at the end of the session. You only have to delete explicitly if you want to get rid of it right away or at the end of a transaction. I quote the manual here:
Temporary tables are automatically dropped at the end of a session, or
optionally at the end of the current transaction.
For big temporary tables it might be a good idea to run ANALYZE after they are populated.
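The temporary-table approach could be sketched as follows. All table and column names here (orders, order_log, stale) are hypothetical, and the simple SELECT stands in for the complex query:

```sql
-- Hypothetical sample data
CREATE TEMP TABLE orders (id int, status text);
CREATE TEMP TABLE order_log (order_id int, note text);
INSERT INTO orders VALUES (1, 'stale'), (2, 'open'), (3, 'stale');
INSERT INTO order_log VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- Run the (stand-in) complex query once and keep its result
CREATE TEMP TABLE stale AS
SELECT id FROM orders WHERE status = 'stale';

ANALYZE stale;  -- give the planner statistics for the new table

-- Reuse the stored result in several set-based statements
UPDATE orders o
SET    status = 'archived'
FROM   stale s
WHERE  o.id = s.id;

DELETE FROM order_log l
USING  stale s
WHERE  l.order_id = s.id;

DROP TABLE stale;  -- optional; it would vanish at session end anyway
```

Each UPDATE or DELETE touches all matching rows in one command, which is the point of materializing the result instead of looping.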
Writeable CTE
Here is a demo for what Pavel added in his comment:
CREATE TEMP TABLE t1(id serial, txt text);
INSERT INTO t1(txt)
VALUES ('foo'), ('bar'), ('baz'), ('bax');
CREATE TEMP TABLE t2(id serial, txt text);
INSERT INTO t2(txt)
VALUES ('foo2'),('bar2'),('baz2');
CREATE TEMP TABLE t3 (id serial, txt text);
WITH x AS (
UPDATE t1
SET txt = txt || '2'
WHERE txt ~~ 'ba%'
RETURNING txt
)
, y AS (
DELETE FROM t2
USING x
WHERE t2.txt = x.txt
RETURNING *
)
INSERT INTO t3
SELECT *
FROM y
RETURNING *;
Read more in the chapter Data-Modifying Statements in WITH in the manual.
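-- Row-by-row alternative: loop over the query result in PL/pgSQL
DECLARE
    r foo%rowtype;  -- or a generic record variable if no row type matches
BEGIN
    FOR r IN [COMPLEX QUERY]
    LOOP
        -- process r
    END LOOP;
    RETURN;
END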