Hi, I am running an ETL job from Python.
I have a simple SQL file that I run from Python, like this:
truncate table foo_stg;
insert into foo_stg
(
select blah,blah .... from tables
);
truncate table foo;
insert into foo
(
select * from foo_stg
);
This query sometimes takes a lock on the table and does not release it.
As a result, other processes get queued behind it.
For now I check which table holds the lock and kill the process that caused it.
What changes can I make to my code to mitigate this?
Thanks in advance!
The TRUNCATE is probably breaking your transaction logic. Recommend doing all truncates upfront. I'd also recommend adding some processing logic to ensure that each instance of the ETL process either: A) has exclusive access to the staging tables or B) uses a separate set of staging tables.
TRUNCATE in Redshift (and many other DBs) does an implicit COMMIT.
…be aware that TRUNCATE commits the transaction in which it is run.
Redshift tries to make this clear by returning the following INFO message to confirm success: TRUNCATE TABLE and COMMIT TRANSACTION. However, this INFO message may not be displayed by the SQL client tool. Run the SQL in psql to see it.
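A minimal Python sketch of the "truncates first, then load in one transaction" ordering suggested above. It uses the standard-library sqlite3 module purely as a stand-in so the example runs anywhere; against Redshift you would pass in a connection from whichever DB-API driver you already use. The function name run_etl and its parameters are hypothetical, the src table is an assumption standing in for the elided select, and because SQLite has no TRUNCATE the demo uses DELETE FROM in its place.

```python
import sqlite3


def run_etl(conn, truncate_statements, load_statements):
    """Run all truncates first, then every load in a single transaction.

    Works with any DB-API connection. If a statement fails, the open
    transaction is rolled back instead of being left hanging.
    """
    cur = conn.cursor()
    try:
        # Step 1: truncates up front. On Redshift each TRUNCATE commits
        # implicitly; the explicit commit keeps that boundary visible.
        for stmt in truncate_statements:
            cur.execute(stmt)
        conn.commit()

        # Step 2: all loads together, committed once at the end.
        for stmt in load_statements:
            cur.execute(stmt)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


# Demo with SQLite standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src (id INTEGER, val TEXT);
    CREATE TABLE foo_stg (id INTEGER, val TEXT);
    CREATE TABLE foo (id INTEGER, val TEXT);
    INSERT INTO src VALUES (1, 'a'), (2, 'b');
    INSERT INTO foo VALUES (99, 'stale');
    """
)
run_etl(
    conn,
    truncate_statements=["DELETE FROM foo_stg", "DELETE FROM foo"],
    load_statements=[
        "INSERT INTO foo_stg SELECT id, val FROM src",
        "INSERT INTO foo SELECT * FROM foo_stg",
    ],
)
rows = conn.execute("SELECT id, val FROM foo ORDER BY id").fetchall()
conn.close()
```

The point of the try/except/finally is that a failed statement ends with a rollback rather than an open transaction, which is one common way a script ends up holding a lock it never releases.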
In my case, I created a table for the first time and tried to load it from the stage table using insert into a table from select c1,c2,c3 from stage; I am running this from a Python script.
The table gets locked and the data is not loaded. Interestingly, when I run the same insert SQL from the editor, it loads, and after that my Python script loads the same table without any locks. The lock only happens the first time. I am not sure what the issue is.
We use a DB2 database. Some data warehouse tables are TRUNCATEd and reloaded every day. We run into deadlock issues when another process is running an INSERT statement against that same table.
Scenario
TRUNCATE is executed on a table.
At the same time another process INSERTs some data into the same table. (The process is based on a trigger and can start at any time.)
Is there a workaround?
What we have thought of so far is to prioritize the truncate and then go through with the insert. Is there any way to implement this? Any help would be appreciated.
You should request a table lock before you execute the truncate.
If you do this you can't get a deadlock -- the table lock won't be granted before the insert finishes and once you have the lock another insert can't occur.
Update from comment:
You can use the LOCK TABLE command. The details depend on your situation, but you should be able to get away with SHARE mode. This will allow reads but not inserts (this is the issue you are having, I believe).
It is possible this won't fix your problem. That probably means your insert statement is too complicated -- maybe it is reading from a bunch of other tables or from a federated table. If this is the case, re-architect your solution to include a staging table (first insert into the staging table, slowly, then insert into the target table from the staging table).
I would like to know what happens if a Hive SELECT and an INSERT OVERWRITE run at the same time. Please help me understand what the Hive query will return in the scenarios below.
Run the query first, while the query is running, INSERT OVERWRITE the same table.
Run the INSERT OVERWRITE first, while overwriting, pull the data from the same table with SELECT.
Are we going to get the old data, new data, mixed data, nothing, or unpredictable data?
I am using MapR 4.0.1, Hive 0.13.
Best regards,
Ryan
Read Hive Locking:
For a non-partitioned table, the lock modes are pretty intuitive. When the table is being read, a S lock is acquired, whereas an X lock is acquired for all other operations (insert into the table, alter table of any kind etc.)
So SELECT and INSERT acquire incompatible locks so they can never run in parallel. One will acquire the lock first and the other will wait.
For partitioned tables things are a bit more complex, as the locks acquired are hierarchical (S on table, S/X on partition). Read the link.
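To make the non-partitioned case concrete, here is a small Python illustration of the shared/exclusive rule quoted above. It is only a model of the compatibility rule as stated in this answer, not Hive's lock manager; the names COMPATIBLE, LOCK_FOR_OPERATION and can_run_together are made up for the example.

```python
# Lock modes as described in the answer: S (shared) for reads,
# X (exclusive) for all other operations.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

# Which lock each operation from the question would request.
LOCK_FOR_OPERATION = {"SELECT": "S", "INSERT OVERWRITE": "X"}


def can_run_together(first_op, second_op):
    """Return True if second_op can get its lock while first_op holds its own."""
    held = LOCK_FOR_OPERATION[first_op]
    requested = LOCK_FOR_OPERATION[second_op]
    return COMPATIBLE[(held, requested)]


print(can_run_together("SELECT", "SELECT"))            # True
print(can_run_together("SELECT", "INSERT OVERWRITE"))  # False
print(can_run_together("INSERT OVERWRITE", "SELECT"))  # False
```

Under this rule, in both scenarios from the question the second statement waits for the first to finish, so the SELECT sees either the old data or the new data, not a mix.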
I want to insert data from database1 into database2 when something changes. Like a trigger but the thing is I am not allowed to create a trigger in database1.
I basically need to insert certain data into a table from a database into another database as they happen.
Any suggestions would be great!
Thanx
You can write a program to do that. There are many other ways, but I think this is the most straightforward one.
For example, if you work with SQL Server, you can write C# code that connects to the two databases, checks for data in the first one, and then sends it to the second.
For example:
You can create a trigger on the tables in the first DB that you want to watch, then create a web service that checks whether the trigger has fired; it will get the data from the first database and send it to the second. This is easier to enhance, and you can change the code to do whatever you want without making any changes on the database side.
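The "program that connects to both databases" idea can be sketched without any trigger by polling. The sketch below is in Python rather than C#, and uses two in-memory SQLite databases as stand-ins for database1 and database2. The table names events and events_copy, the payload column, and the function sync_new_rows are all hypothetical, and it assumes the source table has an ever-increasing id to use as a checkpoint.

```python
import sqlite3


def sync_new_rows(source, target, last_seen_id):
    """Copy rows with id > last_seen_id from source to target.

    Returns the highest id copied so far, to be passed to the next call.
    """
    rows = source.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    if rows:
        target.executemany(
            "INSERT INTO events_copy (id, payload) VALUES (?, ?)", rows
        )
        target.commit()
        last_seen_id = rows[-1][0]
    return last_seen_id


# Demo: two in-memory SQLite databases stand in for database1 and database2.
database1 = sqlite3.connect(":memory:")
database2 = sqlite3.connect(":memory:")
database1.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
database2.execute("CREATE TABLE events_copy (id INTEGER PRIMARY KEY, payload TEXT)")

database1.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])
database1.commit()
checkpoint = sync_new_rows(database1, database2, last_seen_id=0)

# A new row arrives later; the next poll picks up only that row.
database1.execute("INSERT INTO events (payload) VALUES ('c')")
database1.commit()
checkpoint = sync_new_rows(database1, database2, checkpoint)

copied = database2.execute(
    "SELECT id, payload FROM events_copy ORDER BY id"
).fetchall()
```

In a real program you would call sync_new_rows in a loop with a sleep between polls. Note that an id checkpoint like this only notices new rows; updates and deletes in the source would need a different change marker.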
MS SQL Server Replication
I think Transactional Replication is suitable for you.
Transactional replication is asynchronous but close to real time
What about the MERGE statement?
MERGE applies changes from the source table to the target table when something changes.
As you cannot use a trigger, this cannot be a real-time process, but with a process scheduler like SQL Agent you can run it every 10 seconds, depending on your table size.
https://msdn.microsoft.com/en-us/library/bb510625.aspx
You can also do it when your source comes from multiple tables by using a CTE, like this:
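WITH TableSources AS
(
    SELECT id, Column1, Column2 FROM table1
    UNION
    SELECT id, Column1, Column2 FROM table2
)
MERGE INTO TargetTable
USING TableSources ON TargetTable.id = TableSources.id
-- Row exists only in the source: insert it
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Column1, Column2) VALUES (TableSources.Column1, TableSources.Column2)
-- Row exists only in the target: delete it
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
-- Row exists in both: refresh the target columns
WHEN MATCHED THEN
    UPDATE SET Column1 = TableSources.Column1, Column2 = TableSources.Column2
;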
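If the three branches are hard to keep apart, this plain-Python model may help. It is only an illustration of the semantics on dictionaries keyed by id (a simplification; the SQL above leaves id out of the INSERT), not how SQL Server executes MERGE. The function name merge and the sample data are invented for the example.

```python
def merge(target, source):
    """Apply the three MERGE branches to `target` in place.

    Both arguments map id -> (Column1, Column2).
    Returns the number of inserted, updated and deleted rows.
    """
    inserted = updated = deleted = 0

    # WHEN NOT MATCHED BY SOURCE: row exists only in the target -> delete.
    for row_id in list(target):
        if row_id not in source:
            del target[row_id]
            deleted += 1

    for row_id, values in source.items():
        if row_id in target:
            # WHEN MATCHED: overwrite the target columns.
            target[row_id] = values
            updated += 1
        else:
            # WHEN NOT MATCHED BY TARGET: row exists only in the source -> insert.
            target[row_id] = values
            inserted += 1

    return inserted, updated, deleted


target_table = {1: ("old", "x"), 2: ("gone", "y")}
table_sources = {1: ("new", "x"), 3: ("added", "z")}
counts = merge(target_table, table_sources)
```

After the call, target_table holds exactly the rows of table_sources: id 1 was updated, id 3 inserted and id 2 deleted.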
I needed to load 100,000 rows of data from an Excel file into a temporary table that I created using "on commit preserve rows". But somehow the most efficient methods did not seem to populate the temporary table. Is this due to session issues?
I used Toad to Import Table Data and it showed that x amount of records are imported. But when I select from the temp table, it was empty. Then I generated a bunch of insert scripts and saved them in a notepad.sql and called it from toad editor using #/script/location/notepad.sql and hit F5. It ran and showed how many records were inserted. Again the temp table was somehow still empty. So, I decided to run a random insert script manually in the editor and it showed up in the temp table. I believe the methods that didn't work are not considered to be the same session?
I haven't tried SQLLDR, but I am assuming it will not work, judging from the methods I tried. Can someone confirm? I can't access SQLLDR, so I can't check myself.
Is there any way to get this to work? I can't run the insert scripts manually. That would be time consuming, and Toad can't take that many scripts at the same time.
Oracle temp tables created with ON COMMIT PRESERVE ROWS are session-specific, so the data put into them is only visible within a single session, and for the duration of that session. Toad may be creating a separate session for each window and thus data which is populated from one window/session isn't visible from another window/session. The fact that you can run an insert script and then select the data back suggests this may be the case if both operations were done from the same window. I expect you'd see the same behavior if you used SQL*Loader to load the tables because the load would run in one session and the data would be discarded when the session terminated. Best of luck.
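The session scoping is easy to see in a small Python experiment. This uses SQLite temp tables as a stand-in, which is an assumption and not identical to Oracle: in SQLite the temp table itself is private to the connection, whereas in Oracle the global temporary table's definition is shared and a second session would simply see zero rows. The helper count_rows and the table name tmp_load are made up for the example.

```python
import os
import sqlite3
import tempfile


def count_rows(conn, table):
    """Return the row count, or None if the table is not visible to this session."""
    try:
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    except sqlite3.OperationalError:
        return None


# Two connections to the same database file act as two sessions.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
session_a = sqlite3.connect(db_path)
session_b = sqlite3.connect(db_path)

# Load the temporary table from session A only.
session_a.execute("CREATE TEMP TABLE tmp_load (id INTEGER)")
session_a.executemany("INSERT INTO tmp_load VALUES (?)", [(1,), (2,), (3,)])
session_a.commit()

rows_in_a = count_rows(session_a, "tmp_load")  # the session that loaded the data
rows_in_b = count_rows(session_b, "tmp_load")  # a different session

session_a.close()
session_b.close()
```

Session A sees its three rows even after the commit, while session B sees nothing. The practical consequence is the same as described above: the load and the queries that read it have to go through the same session.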
Is there a way in Oracle to create a table that only exists while the database is running and is only stored in memory? So if the database is restarted I will have to recreate the table?
Edit:
I want the data to persist across sessions. The reason being that the data is expensive to recreate but is also highly sensitive.
Using a temporary table would probably help performance compared to what happens today, but it's still not a great solution.
You can create a 100% ephemeral table, called a TEMPORARY table, that is usable for the duration of a session (typically shorter than the database's run time). The entire purpose of a table in memory is to make it faster to read from. You will have to re-populate the table for each session, as the table will be forgotten (both structure and data) once the session completes.
Not exactly, no.
Oracle has the concept of a "global temporary table". With a global temporary table, you create the table once, as with any other table. The table definition will persist permanently, as with any other table.
The contents of the table, however, will not be permanent. Depending on how you define it, the contents will persist for either the life of the session (on commit preserve rows) or the life of the transaction (on commit delete rows).
See the documentation for all the details:
http://docs.oracle.com/cd/E11882_01/server.112/e25494/tables003.htm#ADMIN11633
Hope that helps.
You can use Oracle's trigger mechanism to invoke a stored procedure when the database starts up or shuts down.
That way you could have the startup trigger create the table, and the shutdown trigger drop it.
You'd probably also want the startup trigger to handle cases where the table exists and truncate it just in case the server stopped suddenly and the shutdown trigger wasn't called.
Oracle trigger documentation
Using Oracle's Global Temporary Tables, you can create a table in memory and have it delete the data at the end of the transaction, or the end of the session.
If I understand correctly, you have some data that needs to be processed when the database is brought online and left available only as long as the database is online. The only use-case I can think of that would require this is if you're encrypting some data and you want to ensure that the unencrypted data is never written to disk.
If this is actually your use-case, I would recommend forgetting about trying to create your own solution for this and, instead, make use of Oracle's encrypted tablespaces or Transparent Data Encryption.