SQL Server Table Lock during bulk insert - sql

Below is the sample query; consider tables A, B and C:
INSERT INTO Target (Col1,Col2,Col3,Col4) ----------------Statement#1
Select A.Col1,B.Col2,A.Col3,C.Col4 ----------------Statement#2
FROM A WITH(NOLOCK) INNER JOIN B WITH(NOLOCK)
ON A.Id = B.ID
LEFT JOIN C WITH(NOLOCK)
ON C.Id = B.ID
Where A.Id = 11
At which stage will the lock (an exclusive lock?) be applied on the table, and how is SQL Server going to execute the query?
1. The result is fetched from tables A, B and C based on the joins and the WHERE clause.
2. Once the result is ready, the data is inserted into the target table and, at the same time, the lock is applied on the table.
So the table is locked only while the actual data is written to the pages, but not during the SELECT, even though it is an INSERT INTO ... SELECT?

Those two steps are the logical steps of query execution. What SQL Server actually does at the physical level is another story. For this statement:
INSERT INTO Target (Col1,Col2,Col3,Col4) ----------------Statement#1
Select A.Col1,B.Col2,A.Col3,C.Col4 ----------------Statement#2
FROM A WITH(NOLOCK) INNER JOIN B WITH(NOLOCK)
ON A.Id = B.ID
LEFT JOIN C WITH(NOLOCK)
ON C.Id = B.ID
Where A.Id = 11
for every output record (see the SELECT clause) it takes an X lock on a RID or a KEY within the target table (RID for a heap, KEY for a clustered index) and it inserts that record. These steps are repeated for every output record. So it does not read all records from the source tables first and only then start inserting records into the target table. Because of the NOLOCK table hint on the source tables it will take only Sch-S (schema stability) locks on those tables.
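A minimal sketch of how you could observe this yourself, using the table names from the question and the sys.dm_tran_locks DMV (run everything in one session; the filter below uses @@SPID):
BEGIN TRANSACTION;
INSERT INTO Target (Col1, Col2, Col3, Col4)
SELECT A.Col1, B.Col2, A.Col3, C.Col4
FROM A WITH(NOLOCK)
INNER JOIN B WITH(NOLOCK) ON A.Id = B.ID
LEFT JOIN C WITH(NOLOCK) ON C.Id = B.ID
WHERE A.Id = 11;
-- While the transaction is still open, list the locks held by this session.
-- Expect IX on Target plus X locks on RID/KEY resources; the Sch-S locks on
-- A, B and C are held only while the statement runs, so they may already be gone.
SELECT resource_type, resource_associated_entity_id, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;
ROLLBACK;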
If you want to take an X lock on target table then you could use
INSERT INTO Target WITH(TABLOCKX) (Col1,Col2,Col3,Col4)
SELECT ...
If you want minimally logged inserts then please read this article.
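For reference, a minimal sketch of the usual TABLOCK prerequisite for minimally logged inserts (this assumes Target is a heap or an empty table, and that the database is in the SIMPLE or BULK_LOGGED recovery model):
-- TABLOCK allows SQL Server to use minimal logging for the insert under the conditions above.
INSERT INTO Target WITH(TABLOCK) (Col1, Col2, Col3, Col4)
SELECT A.Col1, B.Col2, A.Col3, C.Col4
FROM A WITH(NOLOCK)
INNER JOIN B WITH(NOLOCK) ON A.Id = B.ID
LEFT JOIN C WITH(NOLOCK) ON C.Id = B.ID
WHERE A.Id = 11;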

Did you specify any "Table Lock" hint? If you want row-level locking, set "Table Lock" to off.
Or check this, it will help you:
http://technet.microsoft.com/en-us/library/ms180876(v=sql.105).aspx
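If the "Table Lock" option refers to a bulk import (which is what the linked article covers), here is a hedged sketch of both variants; the file path and format options below are hypothetical:
-- With TABLOCK a bulk update (BU) table lock is taken for the duration of the load.
BULK INSERT Target
FROM 'C:\data\target_rows.csv'
WITH (TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
-- Without TABLOCK the load uses ordinary row-level locks instead.
BULK INSERT Target
FROM 'C:\data\target_rows.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');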

Related

Left excluding join with BigQuery

I have two tables (A and B) having identical structures. Table B is basically a subset of Table A. I want to retrieve all the records from Table A that are not present in Table B.
For this, I am considering Left Excluding Join (reference). Here is the query I am executing:
select a.id, a.category from a
left join b
on a.id = b.id
where b.id is null;
As per BigQuery's estimate, the query will process 44.9 GiB. However, the query is taking unusually long to complete. Am I missing out on anything important?
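For comparison, the same anti-join can also be written with NOT EXISTS in BigQuery standard SQL; this is only a sketch using the column names from the question and is not guaranteed to scan less data:
select a.id, a.category
from a
where not exists (
  select 1 from b where b.id = a.id
);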

SQL Inner Join w/ Unique Vals

Questions similar to this one about using DISTINCT values in an INNER JOIN have been asked a few times, but I don't see my (simple) use case.
Problem Description:
I have two tables Table A and Table B. They can be joined via a variable ID. Each ID may appear on multiple rows in both Table A and Table B.
I would like to INNER JOIN Table A and Table B on the distinct values of ID that appear in Table B, and select all rows of Table A whose Table A.ID matches some condition in Table B.
What I want:
I want to make sure I get only one copy of each row of Table A with a Table A.ID matching a Table B.ID which satisfies [some condition].
What I would like to do:
SELECT A.* FROM TableA A
INNER JOIN (
SELECT DISTINCT ID FROM TableB WHERE [some condition]
) B ON A.ID = B.ID
Additionally:
As a further (really dumb) constraint, I can't say anything about the SQL dialect in use, since I'm executing the SQL query through Stata's odbc load command on a database I have no information about beyond the variable names and the fact that "it does accept SQL queries" (this is the extent of the information I have).
If you want all rows in a that match an id in b, then use exists:
select a.*
from a
where exists (select 1 from b where b.id = a.id);
Trying to use join just complicates matters, because it both filters and generates duplicates.
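A tiny illustration of that point, under the hypothetical assumption that a given id appears three times in b:
-- Join: the matching row of a comes back three times, once per matching row in b.
select a.*
from a
inner join b on b.id = a.id;
-- EXISTS: the same row of a comes back exactly once, however many matches b has.
select a.*
from a
where exists (select 1 from b where b.id = a.id);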

Optimizing sql update syntax rather than the Server

There are two tables; one is getting updated based on the second table. The SQL is working, but it is taking too much time, I think because of the number of records. See this fiddle. The actual master table contains 1,500,000 records and child contains 700,000 records, and the following SQL kept executing for 4 hours, hence it was terminated.
UPDATE master m SET m.amnt = (SELECT amnt FROM child c WHERE c.seqn = m.seqn)
WHERE m.seqn IN (SELECT seqn FROM child);
The execution plan of this SQL is (the red one is master, the other is child):
seqn is the primary key. No doubt it all depends upon the server's performance and the statistics of the indexes. However, it bothers me that master is not being accessed by the index and child is being read twice. It is possible that the SQL is already optimal and Oracle just decided to go this way; however, I tried to optimize the SQL as
UPDATE (
SELECT m.seqn m_seqn,c.seqn c_seqn, c.amnt c_amnt, m.amnt m_amnt
FROM master m INNER JOIN child c ON m.seqn = c.seqn)
SET m_amnt = c_amnt
which resulted in following error
ORA-01779: cannot modify a column which maps to a non key-preserved
table : UPDATE ( SELECT m.seqn m_seqn,c.seqn c_seqn, c.amnt c_amnt, m.amnt m_amnt
FROM master m INNER JOIN child c ON m.seqn = c.seqn) SET m_amnt = c_amnt
Is there any way I can optimize the SQL other than updating stats and tuning up the server?
EDIT: The solution by @Sebas will not work if the column to be JOINED ON is not the PK.
check this one out:
UPDATE
(
SELECT m.amnt AS tochange, c.amnt AS newvalue
FROM child c
JOIN master m ON c.seqn = m.seqn
) t
SET t.tochange = t.newvalue;
SELECT * FROM master;
fiddle: http://www.sqlfiddle.com/#!4/c6b73/2
you just missed the PK in the fiddle.
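If seqn is not declared as a primary or unique key on child (the case the EDIT points out), a MERGE is a common alternative because it does not require a key-preserved join view; this is only a sketch, and it still assumes the data in child has at most one row per seqn (otherwise Oracle raises ORA-30926):
MERGE INTO master m
USING child c
ON (m.seqn = c.seqn)
WHEN MATCHED THEN
UPDATE SET m.amnt = c.amnt;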

SQL Performance: Subselects in joins or direct joins?

I have got a question on the performance of the tables below.
Table A -- has only 5 customer IDs (5 rows, 1 column)
Table B -- is the master base for all customers and their information (1 million rows and 500 columns)
Query 1:-
Select A.*,
B.Age
from A
left join B
on A.Customer_id = B.Customer_id;
Query 2:-
Select a.*,
B.Age
from A
left join
(select Customer_id,age from B) C
on A.Customer_id = C.Customer_id;
The main question of performance here is because of the presence of 500 columns in Table B.
I feel the 2nd Query is better as SQL won't have to create a temporary table during the join containing all columns from table B.
Please let me know if this is wrong.
I feel the 2nd Query is better as SQL won't have to create a temporary table during the join containing all columns from table B.
You can tell whether Oracle does create a temporary table during the execution or not from the explain plan. You should also consider whether the Oracle kernel developers would not have got round such an obvious performance problem if it existed.
As it happens, there will be no temporary table, and there is nothing wrong with your first query. There is almost never a need to manipulate the query for performance reasons -- write queries that are the best encapsulation of the logic you require.
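A minimal sketch of how to check this yourself in Oracle; DBMS_XPLAN shows the chosen plan, and a temporary segment would appear in it if one were used:
EXPLAIN PLAN FOR
SELECT A.*, B.Age
FROM A
LEFT JOIN B ON A.Customer_id = B.Customer_id;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);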
CREATE INDEX index_name ON table_b (customer_id)
then use
Select a.*,
B.Age
from A
left join (select Customer_id,
age
from B) C
on A.Customer_id = C.Customer_id;
500 columns is rather extensive.
Maybe you can create an index like:
CREATE INDEX index_name
ON table_b (customer_id,
age
);
A subquery in the SELECT list is faster than using a join (no matter if a direct join or a sub-select):
select
a.*,
(select b.age
from b
where b.customer_id = a.customer_id)
from a
note:
it behaves like an outer join (age is returned as NULL when a.customer_id has no match in b)
the subquery must return at most one row from b per row from a.
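If b could hold more than one row per customer_id, the scalar subquery fails with ORA-01427 ("single-row subquery returns more than one row"); a hedged sketch of one way to force a single value, assuming that taking the maximum age is acceptable:
select
a.*,
(select max(b.age)
from b
where b.customer_id = a.customer_id) as age
from a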

Delete Query using Inner joins on more than two tables

I want to delete records from a table using inner joins on more than two tables. Say I have tables A, B, C, D with A's PK shared by all the other tables. How do I write a delete query to delete records from table D using inner joins on tables B and A, since the conditions are fetched from those two tables? I need this query from a DB2 perspective. I am not using an IN clause or EXISTS because of their limitations.
From your description, I take the schema as:
A(pk_A, col1, col2, ...)
B(pk_B, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
C(pk_c, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
D(pk_d, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
As you say, DB2 will allow only 1000 rows to be deleted if an IN clause is used. I don't know about DB2, but Oracle allows only 1000 manually listed values inside an IN clause; there is no such limit on subquery results, in Oracle at least. EXISTS should not be a problem, as any database, including Oracle and DB2, checks only for the existence of rows, be it one or a million.
There are three scenarios on deleting data from table D:
You want to delete data from table D in which fk_A (naturally) refers to a record in table A using column A.pk_A:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
WHERE a.pk_A = d.fk_A
);
You want to delete data from table D in which fk_A refers to a record in table A, and that record in table A is also referred to by column B.fk_A. We do not want to delete the data from D that is in A but not in B. We can write:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
INNER JOIN b ON a.pk_A = b.fk_A
WHERE a.pk_A = d.fk_A
);
The third scenario is when we have to delete data in table D that refers to a record in table A, and that record in A is also referred to by columns B.fk_A and C.fk_A. We want to delete only the data from table D which is common to all four tables - A, B, C and D. We can write:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
INNER JOIN b ON a.pk_A = b.fk_A
INNER JOIN c ON a.pk_A = c.fk_A
WHERE a.pk_A = d.fk_A
);
Depending upon your requirement you can incorporate one of these queries.
Note that the "=" operator would return an error if the subquery retrieves more than one row. Also, I don't know whether DB2 supports the ANY or ALL keywords, hence I used the simple but powerful EXISTS keyword, which performs faster than IN, ANY and ALL.
Also, you can observe here that the subqueries inside the EXISTS clause use "SELECT 1", not "SELECT a.pk" or some other column. This is because EXISTS, in any database, looks only for the existence of rows, not for any particular values inside the columns.
Based on 'Using SQL to delete rows from a table using INNER JOIN to another table'
The key is that you specify the name of the table to be deleted from
as the SELECT. So, the JOIN and WHERE do the selection and limiting,
while the DELETE does the deleting. You're not limited to just one
table, though. If you have a many-to-many relationship (for instance,
Magazines and Subscribers, joined by a Subscription) and you're
removing a Subscriber, you need to remove any potential records from
the join model as well.
DELETE subscribers
FROM subscribers INNER JOIN subscriptions
ON subscribers.id = subscriptions.subscriber_id
INNER JOIN magazines
ON subscriptions.magazine_id = magazines.id
WHERE subscribers.name='Wes';
delete from D
where fk_A in (select d.fk_A from D d
               join A a on a.pk_A = d.fk_A
               join B b on b.fk_A = a.pk_A)
this should work