Hive: multiple inserts on the same destination table at the same time

I am running a Java process with 10 different threads. These threads simultaneously insert records at certain intervals into a managed Hive table.
My question is: if all 10 threads send insert queries to the managed Hive table at the same time, will there be any query timeout? If so, what is the solution?

Related

Mitigate Redshift Locks?

Hi, I am running ETL via Python.
I have a simple SQL file that I run from Python, like:
truncate table foo_stg;
insert into foo_stg
(
select blah,blah .... from tables
);
truncate table foo;
insert into foo
(
select * from foo_stg
);
This query sometimes takes a lock on the table which it does not release.
Because of this, other processes get queued.
For now I check which table has the lock and kill the process that caused it.
I want to know what changes I can make in my code to mitigate such issues.
Thanks in advance!
The TRUNCATE is probably breaking your transaction logic. Recommend doing all truncates upfront. I'd also recommend adding some processing logic to ensure that each instance of the ETL process either: A) has exclusive access to the staging tables or B) uses a separate set of staging tables.
TRUNCATE in Redshift (and many other DBs) does an implicit COMMIT.
…be aware that TRUNCATE commits the transaction in which it is run.
Redshift tries to make this clear by returning the following INFO message to confirm success: TRUNCATE TABLE and COMMIT TRANSACTION. However, this INFO message may not be displayed by the SQL client tool. Run the SQL in psql to see it.
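A minimal sketch of the "truncates upfront" reordering, reusing the tables from the question (the select placeholder is the question's own):
-- Do all implicit-commit TRUNCATEs first, before opening the transaction
truncate table foo_stg;
truncate table foo;
-- Then run both loads as one atomic transaction
begin;
insert into foo_stg
(
select blah,blah .... from tables
);
insert into foo
(
select * from foo_stg
);
commit;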
In my case, I created a table the first time and tried to load it from the stage table using insert into a table from select c1,c2,c3 from stage; I am running this using a Python script.
The table locks and does not load the data. Another interesting scenario: when I run the same insert SQL from the editor, it loads, and after that my Python script loads the same table without any locks. But the table lock happens only the first time. Not sure what the issue is.
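For reference, the check-and-kill step described in the question typically looks something like this in Redshift (a sketch; the PID value is hypothetical):
-- Find which tables are locked and which session holds each lock
select l.table_id, t."table" as table_name, l.lock_owner_pid, l.lock_status
from stv_locks l
join svv_table_info t on t.table_id = l.table_id;
-- Kill the blocking session (12345 is a hypothetical PID from the query above)
select pg_terminate_backend(12345);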

Check table and run stored procedure

I am working in SQL Server and here is the scenario.
Below is the workflow:
We get data from upstream every 10-15 secs. Upstream inserts data directly into our tables - this is our Staging table.
We wrote an AFTER INSERT trigger on this staging table which moves data from Staging to the Master table. The Master table also has an AFTER INSERT trigger, which takes 23 secs to complete.
We are not sure when the final batch of data will be submitted by upstream.
Problem area where a solution is needed:
Upstream pushes data into our staging table every 10 secs, and once the trigger on Staging starts, it waits until the trigger on the Master table has completed. During that 24-sec execution window the staging table is locked, so any data received from upstream in that time is dropped due to the locks.
If there is any potential workaround, please let us know.
As Dan Guzman suggested, you could just get rid of the staging table and process everything using a stored procedure.
A better, though trickier, way would be to use a message queue instead of a staging table, with an activation procedure that deals with rows based on your business logic.
This could be a better-suited and less blocking approach for streaming data.
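For illustration, a minimal Service Broker sketch of that idea - all object names are hypothetical, it assumes Service Broker is enabled on the database, and dbo.ProcessRows stands in for an already-created procedure containing your business logic:
CREATE MESSAGE TYPE RowMessage VALIDATION = NONE;
CREATE CONTRACT RowContract (RowMessage SENT BY INITIATOR);
-- The queue replaces the staging table; the activation proc drains it as rows arrive
CREATE QUEUE dbo.StagingQueue
    WITH ACTIVATION (
        STATUS = ON,
        PROCEDURE_NAME = dbo.ProcessRows,
        MAX_QUEUE_READERS = 1,
        EXECUTE AS OWNER
    );
CREATE SERVICE StagingService ON QUEUE dbo.StagingQueue (RowContract);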

Performance tuning a SQL Server table

How do I performance-tune a SQL Server table to speed up inserts?
For example, in an Employee table I have 150,000 records. When I try to insert a few more records (around 20k), it takes 10-15 minutes.
Performance tuning using wait stats is a good approach in your case. Below are a few steps I would take:
Step 1:
Run the insert query.
Step 2:
Open another session and run the query below:
select * from sys.dm_exec_requests
The status and wait_type columns should give you enough info on what your next steps are.
For example:
If the status is blocked (normally inserts won't be blocked), check the blocking query and see why it is blocked.
The above is just an example; there is more info online for any wait type you might encounter.
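A sketch of narrowing that DMV output down to just the blocked requests (the columns are standard; the filter is the only addition):
-- Show only requests that are waiting on another session
select session_id, status, command, wait_type, wait_time, blocking_session_id
from sys.dm_exec_requests
where blocking_session_id <> 0;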
My suggestion for speeding up inserts is to do a bulk insert into a temporary table and then a single insert into the final table (a sketch follows below).
This assumes that the source of the new records is an external file.
In any case, your question leaves lots of important information unexplained:
How are you doing the inserts now? It should not be taking 10-15 minutes to insert 20k records.
How many indexes are on the table?
How large is each record?
What triggers are on the table?
What other operations are taking place on the server?
What is the source of the records being inserted?
Do you have indexed views that use the table?
There are many reasons why inserts could be slow.
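A minimal sketch of the temp-table pattern suggested above, assuming the new records come from a CSV file (the file path and column list are hypothetical):
-- Load the external file into a temp table first
CREATE TABLE #EmployeeStage (EmployeeID INT, Name VARCHAR(100), HireDate DATE);
BULK INSERT #EmployeeStage
FROM 'C:\data\new_employees.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);
-- Then do one set-based insert into the final table
INSERT INTO Employee (EmployeeID, Name, HireDate)
SELECT EmployeeID, Name, HireDate
FROM #EmployeeStage;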
Some ideas in addition to checking for locks:
Disable any on-insert triggers if they exist and incorporate their logic into your insert. Also disable any indexes on the table and re-enable them after the bulk insert.
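A sketch of that disable/re-enable sequence (the trigger and index names are hypothetical):
-- Disable the insert trigger and a nonclustered index before the bulk load
DISABLE TRIGGER trgEmployeeInsert ON Employee;
ALTER INDEX IX_Employee_Name ON Employee DISABLE;
-- ... run the bulk insert here ...
-- Rebuild the index and re-enable the trigger afterwards
ALTER INDEX IX_Employee_Name ON Employee REBUILD;
ENABLE TRIGGER trgEmployeeInsert ON Employee;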

Data flow insert lock

I have an issue with my data flow task locking. This task compares a couple of tables from the same server, and the result is inserted into one of the tables being compared. The table being inserted into is compared via a NOT EXISTS clause.
When performing a fast load, the task freezes without errors; when doing a regular insert, the task gives a deadlock error.
I have 2 other tasks that perform the same action on the same table and they work fine, but the amount of information being inserted is a lot smaller. I am not running these tasks in parallel.
I am considering using the NOLOCK hint to get around this, because this is the only task that writes to a certain table partition. However, I am only coming to this conclusion because I cannot figure out anything else, aside from using a temp table or a hashed anti join.
You probably have a so-called deadlock situation. Your Data Flow Task (DFT) has two separate connection instances to the same table. The first connection instance runs the SELECT and places a shared lock on the table; the second runs the INSERT and places a page or table lock.
A few words on the possible cause: an SSIS DFT reads table rows and processes them in batches. When the number of rows is small, the read completes within a single batch, and the shared lock is gone by the time the insert takes place. When the number of rows is substantial, SSIS splits the rows into several batches and processes them sequentially. This allows steps following the DFT Data Source to run before the Data Source completes reading.
The design - reading and writing the same table in the same Data Flow - is not good because of this possible locking issue. Ways to work around it:
Move all the DFT logic inside a single INSERT statement and get rid of the DFT. This might not be possible.
Split the DFT: move the data into an intermediate table, and then move it to the target table with a following DFT or SQL Command. An additional table is needed.
Set Read Committed Snapshot Isolation (RCSI) on the DB and use Read Committed on the SELECT. Applicable to MS SQL DB only.
The most universal way is the second, with an additional table. The third is for MS SQL only.
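For illustration, a sketch of the first and third options (table, column, and database names are hypothetical):
-- Option 1: collapse the compare-and-insert into a single statement
INSERT INTO dbo.Target (KeyCol, Payload)
SELECT s.KeyCol, s.Payload
FROM dbo.Source AS s
WHERE NOT EXISTS
(
    SELECT 1 FROM dbo.Target AS t WHERE t.KeyCol = s.KeyCol
);
-- Option 3: enable RCSI on the database (needs exclusive access while it runs)
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;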

MSSQL Automatic Merge Database

I have a PC that has an MSSQL database with 800 variables being populated every second. I need that database to merge/back up to a second database on another server PC at least every 10 minutes. Additionally, the first database needs to be wiped clean once per week in order to save local drive space, so that only 1 week's worth of data is stored on that first database at any given time; meanwhile, the second database keeps everything intact and never gets cleared, only being added to by the merges that occur every 10 minutes.
To my knowledge, this means I cannot rely on database mirroring, since the first one will be wiped every week. So from what I have gathered, this means I have to have scheduled merges going on every 10 minutes.
I will readily admit I know next to nothing about SQL. So my two questions are:
How do I set up scheduled merges to occur from one database to another in 10 minute frequencies?
How do I set a database to be scheduled/scripted so that it gets cleared every week?
(Note: both databases are running on MS SQL Server 2012 Standard.)
Assuming you can create a linked server on server A that connects to server B (Here's a guide)
Then create a trigger on your table, for example table1:
CREATE TRIGGER trigger1
ON table1
AFTER INSERT
AS
-- Forward each newly inserted row to the copy of table1 on server B
INSERT INTO ServerB.databaseB.dbo.table1
SELECT *
FROM inserted;
More on triggers here.
For part 2, you can schedule a job to truncate the table on whatever schedule you would like. How to create a scheduled job.
The trigger only fires on inserts, so deleting rows on server A does nothing to the table on server B.
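For part 2, a sketch of the weekly wipe as a SQL Server Agent job (the job, schedule, database, and table names are hypothetical):
USE msdb;
-- Create the job with a single T-SQL step that truncates the table
EXEC dbo.sp_add_job @job_name = N'WeeklyWipe';
EXEC dbo.sp_add_jobstep @job_name = N'WeeklyWipe', @step_name = N'Truncate table1',
    @subsystem = N'TSQL', @command = N'TRUNCATE TABLE table1;', @database_name = N'databaseA';
-- Weekly schedule: every Sunday at 01:00
EXEC dbo.sp_add_schedule @schedule_name = N'EverySunday',
    @freq_type = 8, @freq_interval = 1, @freq_recurrence_factor = 1,
    @active_start_time = 010000;
EXEC dbo.sp_attach_schedule @job_name = N'WeeklyWipe', @schedule_name = N'EverySunday';
EXEC dbo.sp_add_jobserver @job_name = N'WeeklyWipe';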
How is the purging/deleting of the data happening - via a stored proc? If so, you could also try transactional replication and replicate the execution of that particular stored proc, but make the proc a dummy on the subscriber, so that when the proc is replicated and executed on the subscriber, nothing gets deleted/purged.