Spark SQL execute multiple commands as an ACID unit

Is there a way to force multiple SQL commands to execute as a unit from PySpark?
I'll have:
Create table B;
Drop table A;
Rename table B as table A;
I would like them to execute as a unit. If something goes wrong, for example B is not created properly, I would not want to drop A.

Use nested try/catch blocks so that you control what happens after each step. But if the rename fails after A has already been dropped, what then? In that sense this does not give you ACID guarantees.
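A minimal sketch of the nested error handling, in Python since the question is about PySpark. The function name `swap_table`, the backup name `A_old`, and the choice to rename A aside instead of dropping it straight away are all assumptions made for illustration, not part of the original answer. It is compensation logic rather than a transaction: a crash between steps still leaves the tables in an intermediate state.

```python
def swap_table(spark, build_sql, live="A", staging="B"):
    """Replace `live` with a freshly built `staging` table, undoing on failure.

    `spark` is any object exposing .sql(str), e.g. a SparkSession.
    Table names are interpolated directly, so pass trusted names only.
    """
    backup = f"{live}_old"  # hypothetical name for the parked original
    try:
        spark.sql(build_sql)  # step 1: create B; A is untouched so far
    except Exception:
        spark.sql(f"DROP TABLE IF EXISTS {staging}")  # clean up a partial B
        raise
    # Step 2: park A under a backup name instead of dropping it.
    spark.sql(f"ALTER TABLE {live} RENAME TO {backup}")
    try:
        spark.sql(f"ALTER TABLE {staging} RENAME TO {live}")  # step 3
    except Exception:
        # The rename failed: put the original A back before re-raising.
        spark.sql(f"ALTER TABLE {backup} RENAME TO {live}")
        raise
    spark.sql(f"DROP TABLE {backup}")  # only now is the old data discarded
```

Called as `swap_table(spark, "CREATE TABLE B AS SELECT ...")`, this drops the old A only after B has taken its place, which is one way to answer the "if the rename fails, what then?" problem.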

Related

Why doesn't a PL/SQL block create a temporary table?

I would like to create and populate temporary table with data to process it inside loop statement like this:
DECLARE
cnt NUMBER;
BEGIN
SELECT COUNT(tname) INTO cnt from tab where tname = 'MY_TEMP';
IF (cnt > 0) THEN
EXECUTE IMMEDIATE 'DROP TABLE MY_TEMP';
END IF;
EXECUTE IMMEDIATE 'CREATE GLOBAL TEMPORARY TABLE MY_TEMP (G NVARCHAR2(128), F NVARCHAR2(128), V NVARCHAR2(128)) ON COMMIT DELETE ROWS';
INSERT INTO MY_TEMP VALUES (N'G-value1', N'F-value1', N'V-value1');
INSERT INTO MY_TEMP VALUES (N'G-value2', N'F-value2', N'V-value2');
...
FOR record IN (SELECT G,F,V FROM MY_TEMP)
LOOP
... Do something sophisticated with record.G, record.F, record.V
END LOOP;
COMMIT;
END;
When I run this script in PL/SQL Developer, it reports on the very first INSERT that the table or view MY_TEMP doesn't exist, even though my EXECUTE IMMEDIATE 'CREATE GLOBAL TEMPORARY TABLE ... ' statement seems to execute without errors. I checked, and there is no MY_TEMP table in the table list after the script runs.
When I run EXECUTE IMMEDIATE 'CREATE GLOBAL TEMPORARY TABLE ... ' alone, it runs fine and the MY_TEMP table really is created. After that, the whole script runs fine.
How do I use this script without manually precreating MY_TEMP table ?

How do I use this script without manually precreating MY_TEMP table ?
You can't. Unless of course you run everything after the creation of the temporary table using EXECUTE IMMEDIATE. But I cannot for a second recommend that approach.
The point is not that your script fails to run, but that it fails to compile. Oracle won't start running your block if it can't compile it first. At the point Oracle tries to compile your PL/SQL block, the table doesn't exist. You have a compilation error, not a runtime error.
I suspect that you are more familiar with temporary tables in SQL Server and are trying to use temporary tables in Oracle in the same way. If this is the case, then you will need to know that there are differences between temporary tables in Oracle and in SQL Server.
Firstly, there's no such thing as a local temporary table (i.e. a table visible to only one connected user) in Oracle. Oracle does have global temporary tables, but only the data in a global temporary table is temporary. The table itself is permanent, so once it has been created it will only be dropped if you explicitly drop it. Compare this with SQL Server temporary tables, which are dropped once all relevant users disconnect.
I really don't think you need to be creating the temporary table in your block. It should be sufficient to create it once beforehand.

Why do you want to drop and create the temp table? Simply create it once and use it.

The only way around your problem is to turn all the INSERT INTO temp_table statements into EXECUTE IMMEDIATE calls. That way you bypass the table check at compile time.
But in my opinion this approach is not good at all. There are some questions that have to be answered first:
1) Why is the temp table created and dropped every time?
A GTT gives you the option to keep or remove the data at the end of an Oracle session.
2) Is this script a one-time job? If yes, then create the GTT once, and the rest of the script will work fine.

The problem is not with your first insert. It is with the compilation of your block: the table does not exist, but you are referencing it. Try creating it beforehand, so that it is already there in the form it will have once the block finishes. The code is then likely to compile, because the table it references exists when you run it.
However, you will then get into trouble with the drop: your code holds a share lock on the table, so you are not allowed to drop it.
You either have to make your selects dynamic, or make sure the table is created and dropped outside the execution of your block.

Creating a temporary table in Oracle is not best practice; use PIVOT instead.

How do I replace a table in Postgres?

Basically I want to do this:
begin;
lock table a;
alter table a rename to b;
alter table a1 rename to a;
drop table b;
commit;
i.e. gain control and replace my old table while no one has access to it.

Simpler:
BEGIN;
DROP TABLE a;
ALTER TABLE a1 RENAME TO a;
COMMIT;
DROP TABLE acquires an ACCESS EXCLUSIVE lock on the table anyway. An explicit LOCK command is no better. And renaming a dead guy is just a waste of time.
You may want to write-lock the old table while preparing the new, to prevent writes in between. Then you'd issue a lock like this earlier in the process:
LOCK TABLE a IN SHARE MODE;
What happens to concurrent transactions trying to access the table? It's not that simple, read this:
Best way to populate a new column in a large table?
Explains why you may have seen error messages like this:
ERROR: could not open relation with OID 123456
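The all-or-nothing behaviour of the transaction above can be sketched in Python. As an assumption made to keep the example self-contained, it uses SQLite through the standard `sqlite3` module as a stand-in for Postgres: both support transactional DDL, but SQLite's locking is different, so nothing here demonstrates ACCESS EXCLUSIVE or SHARE locks. The helper name `replace_table` is hypothetical.

```python
import sqlite3

def replace_table(conn, old, new):
    """Drop `old` and rename `new` to `old` inside a single transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE {old}")
        conn.execute(f"ALTER TABLE {new} RENAME TO {old}")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # the dropped table comes back
        raise

# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE a (v TEXT)")
conn.execute("INSERT INTO a VALUES ('old')")
conn.execute("CREATE TABLE a1 (v TEXT)")
conn.execute("INSERT INTO a1 VALUES ('new')")

replace_table(conn, "a", "a1")
rows_after_swap = conn.execute("SELECT v FROM a").fetchall()

# A failing swap: the source table does not exist, so the DROP is rolled back.
try:
    replace_table(conn, "a", "missing")
except sqlite3.Error:
    pass
rows_after_failed_swap = conn.execute("SELECT v FROM a").fetchall()
```

After the failed second call, table a is still present with the rows from the successful swap, because the DROP TABLE was undone together with the failed rename.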

Create an SQL backup, make the changes you need directly in the backup.sql file, and restore the database. I used this trick when I added INHERIT to a group of tables (Postgres) and had to remove the inherited fields from the subtable.

I would use answer #13, but I agree that it will not carry over the constraints, and the DROP TABLE might fail. So:
line up the relevant constraints first (for example from pg_dump --schema-only)
drop the constraints
do the swap per answer #13
apply the constraints (SQL snippets from the schema dump)

Oracle : Create table in another schema and grant select and insert on it from the same schema

I have two schemas.
1.Schema A
2.Schema B
I need to do following.
I want to create some tables in schema B (the same as some tables in A).
Then move the data from A to B.
Now I want to do ALL of this from schema A. I have written a package which, when executed in A, will create all the tables in B and then create synonyms for them in A. It will then just select data from its own tables and insert into B's tables.
VERY IMPORTANT: this entire thing has to happen in one go. Executing a single BEGIN block should do the entire job.
Problem: the synonyms do not work, because schema A does not have any privileges on the tables it created in B.
So is there a way to create the tables (from A, in B) with all the privileges granted at creation time?
Or can the schema be switched in PL/SQL during execution, so that the privileges can be granted from B to A? (I am sure this cannot be done, but nothing is impossible, they say, so I am asking.)
Please help me, guys! All suggestions are welcome!
The main objective of this job is to do it in one go and from only one schema.

You can define a procedure that runs with definer rights instead of invoker (caller) rights:
CREATE OR REPLACE PROCEDURE definer_test AUTHID DEFINER IS
BEGIN
...
END definer_test;
You would define such a procedure in schema B, which does the job and call it from schema A.

Re-runnable SQL Server Scripts

What are the best practices for ensuring that your SQL can be run repeatedly without receiving errors on subsequent runs?
e.g.
checking that tables don't already exist before creating them
checking that columns don't already exist before creating or renaming
transactions with rollback on error
If you drop tables that exist before creating them anew, drop their dependencies first too, and don't forget to recreate them after
Using CREATE OR ALTER PROCEDURE instead of CREATE PROCEDURE or ALTER PROCEDURE if your flavor of SQL supports it
Maintain an internal versioning scheme, so the same SQL just doesn't get run twice in the first place. This way you always know where you're at by looking at the version number.
Export the existing data to INSERT statements and completely recreate the entire DB from scratch.
dropping tables before creating them (not the safest thing ever, but will work in a pinch if you know what you're doing)
edit:
I was looking for something like this:
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[foo]')
AND OBJECTPROPERTY(object_id, N'IsUserTable') = 1 )
DROP TABLE foo
Do others use statements like this or something better?
edit:
I like Jhonny's suggestion:
IF OBJECT_ID('table_name') IS NOT NULL DROP TABLE table_name
I do this for adding columns:
IF NOT EXISTS ( SELECT *
FROM SYSCOLUMNS sc
WHERE EXISTS ( SELECT id
FROM [dbo].[sysobjects]
WHERE NAME LIKE 'TableName'
AND sc.id = id )
AND sc.name = 'ColumnName' )
ALTER TABLE [dbo].[TableName] ADD [ColumnName]

To make things easier I configure management studio to script objects as rerunnable
Tools
Options
SQL Server Object Explorer
Scripting
Object scripting options
Include IF Not Exists Clause True

I think the most important practice in ensuring that your scripts are re-runnable is to....run them against a test database multiple times after any changes to the script. The errors you encounter should shape your practices.
EDIT
In response to your edit on syntax, in general I think it is best to avoid the system tables in favor of the system views e.g.
if exists(Select 1 from information_schema.tables where table_name = 'sometable')
drop table sometable
go
if exists(Select 1 from information_schema.routines where
specific_name = 'someproc')
drop procedure someproc

To add to your list:
If you drop tables that exist before creating them anew, drop their dependencies first too, and don't forget to recreate them after
Using CREATE OR ALTER PROCEDURE instead of CREATE PROCEDURE or ALTER PROCEDURE if your flavor of SQL supports it
But ultimately, I would go with one of the following:
Maintain an internal versioning scheme, so the same SQL just doesn't get run twice in the first place. This way you always know where you're at by looking at the version number.
Export the existing data to INSERT statements and completely recreate the entire DB from scratch.
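The internal versioning scheme can be sketched as follows. This is an illustration under stated assumptions, not a SQL Server implementation: it uses SQLite through Python's standard `sqlite3` module so that it runs as-is, and the `schema_version` table, the `MIGRATIONS` dictionary and the `migrate` function are hypothetical names.

```python
import sqlite3

# Hypothetical migrations, keyed by version number.
MIGRATIONS = {
    1: "CREATE TABLE foo (id INTEGER PRIMARY KEY)",
    2: "ALTER TABLE foo ADD COLUMN name TEXT",
}

def migrate(conn):
    """Apply every migration newer than the recorded version.

    Returns the list of version numbers applied by this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]
    applied = []
    for version in sorted(MIGRATIONS):
        if version <= current:
            continue  # already applied on an earlier run
        conn.execute(MIGRATIONS[version])
        conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
        applied.append(version)
    conn.commit()
    return applied

conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies versions 1 and 2
second_run = migrate(conn)  # nothing left to apply
```

Because each statement is guarded by the recorded version number, the individual statements do not need their own existence checks; the second call is a no-op.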

I recently found a check for existence that I didn't know about, and I like it because it's shorter:
IF OBJECT_ID('table_name') IS NOT NULL DROP TABLE table_name
Before, I used to use:
IF EXISTS (SELECT * FROM information_schema.tables WHERE table_name = 'table_name')
DROP TABLE table_name
I found that one useful because it's a little more portable (MySQL, Postgres, etc.), taking into account the differences, of course.

For maintaining schemas, look at a migration tool. I think LiquiBase would work for SQL Server.

You'll also need to check for foreign keys on any tables that you may be dropping/recreating. Also, consider any data changes that you might make - delete rows before trying to insert a second time, etc.
You also might want to put in code to check for data before deleting tables as a safeguard so that you don't drop tables that are already being used.

For a SQL batch statement, you can issue:
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[foo]')
AND OBJECTPROPERTY(object_id, N'IsUserTable') = 1 )
DROP TABLE foo
GO 10 -- run the batch 10 times
This is just a FYI, I just ran it 10 times
Beginning execution loop
Batch execution completed 10 times.

The "IF OBJECT_ID('table_name', 'U') IS NOT NULL" syntax is good, it can also be used for procedures:
IF OBJECT_ID('procname', 'P') IS NOT NULL
...
... and triggers, views, etc. It is probably good practice to specify the type (U for table, P for procedure, etc.; I don't remember the exact letters for all types) in case your naming standards allow procedures and tables to have similar names.
Furthermore, it might be a good idea to create your own procedures that change tables, with error handling suited to your environment. For example:
prcTableDrop, proc for dropping a table
prcTableColumnAdd, proc for adding a column to a table
prcTableColumnRename, you get the idea
prcTableIndexCreate
Such procs make creating repeatable change scripts (in the same or another db) much easier.
/B
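The helper-procedure idea can be sketched in Python. This is a loose illustration under assumptions rather than T-SQL: SQLite (via the standard `sqlite3` module) stands in for SQL Server so that the code runs as-is, and the function names only echo the prcTableDrop and prcTableColumnAdd examples above.

```python
import sqlite3

def table_exists(conn, table):
    """Return True if a table with this name exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None

def table_drop(conn, table):
    """Drop `table` if it exists; return True if something was dropped."""
    if not table_exists(conn, table):
        return False
    conn.execute(f"DROP TABLE {table}")  # trusted identifiers only
    return True

def table_column_add(conn, table, column, decl):
    """Add `column` to `table` unless it is already there; return True if added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True
```

A change script built from such helpers can be run any number of times: each call reports whether it changed anything instead of failing on the second run.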

I've described a few checks in my post "DDL 'IF not Exists' conditions to make SQL scripts re-runnable".

Just adding this for future searchers (including myself), such scripts are called idempotent (the noun being idempotency)

Creating table in mysql

Is it possible to create more than one table at a time using a single CREATE TABLE statement?

For MySQL, you can use multi-query to execute multiple SQL statements in a single call. You'd issue two CREATE TABLE statements separated by a semicolon.
But each CREATE TABLE statement individually can create only one table. The syntax supported by MySQL does not allow multiple tables to be created simultaneously.
@bsdfish suggests using transactions, but DDL statements like CREATE TABLE cause implicit transaction commits. There's no way to execute multiple CREATE TABLE statements in a single transaction in MySQL.
I'm also curious why you would need to create two tables simultaneously. The only idea I could come up with is if the two tables have cyclical dependencies, i.e. they reference each other with foreign keys. The solution to that is to create the first table without that foreign key, then create the second table, then add the foreign key to the first table with ALTER TABLE ADD CONSTRAINT. Dropping either table requires a similar process in reverse.

Not with MS SQL Server. Not sure about mysql.
Can you give more info on why you'd want to do this? Perhaps there's an alternative approach.

I don't know, but I don't think you can do that. Why do you want to do this?

Not in standard SQL using just the 'CREATE TABLE' statement. However, you can write multiple statements inside a CREATE SCHEMA statement, and some of those statements can be CREATE TABLE statements. Next question - does your DBMS support CREATE SCHEMA? And does it have any untoward side-effects?
Judging from the MySQL manual pages, it does support CREATE SCHEMA as a synonym for CREATE DATABASE. That would be an example of one of the 'untoward side-effects' I was referring to.
(Did you know that standard SQL does not provide a 'CREATE DATABASE' statement?)

I don't think it's possible to create more than one table with a 'CREATE TABLE' command. Everything really depends on what you want to do. If you want the creation to be atomic, transactions are probably the way to go. If you create all your tables inside a transaction, it will act as a single create statement from the perspective of anything going on outside the transaction.
