I have two large tables, with 60 million and 10 million records respectively. I want to join the two tables, but the process runs for 3 hours and then fails with this error message:
The transaction log for database is full due to 'ACTIVE_TRANSACTION'
Autogrowth is unlimited, the database recovery model is set to SIMPLE, and the log drive is 50 GB. I am using SQL Server 2008 R2.
The SQL query I am using is:
Select * into betdaq.[dbo].temp3 from
(Select XXXXX, XXXXX, XXXXX, XXXXX, XXXXX
 from XXX.[dbo].temp1 inner join XXX.[dbo].temp2
 on temp1.Date = temp2.[Date] and temp1.cloth = temp2.Cloth and temp1.Time = temp2.Time) a
A single statement is a single transaction, and that transaction does not commit until the statement completes. So you are filling up the transaction log.
You are going to need to loop and insert roughly 100,000 rows at a time.
Start with the following just to test the first 100,000; then you will need to wrap it in a loop (a batching sketch follows the query below).
create table betdaq.[dbo].temp3 ...
insert into betdaq.[dbo].temp3 (a,b,c,d,e)
Select top 100000 with ties XXXXX, XXXXX, XXXXX, XXXXX, XXXXX
from XXX.[dbo].temp1
join XXX.[dbo].temp2
on temp1.Date = temp2.[Date]
and temp1.Time = temp2.Time
and temp1.cloth = temp2.Cloth
order by temp1.Date, temp1.Time
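A minimal sketch of such a loop, using hypothetical column names a through e and assuming column a uniquely identifies a source row (so rows already copied can be skipped):
declare @rows int = 1;
while @rows > 0
begin
    insert into betdaq.[dbo].temp3 (a, b, c, d, e)
    select top (100000) t1.a, t1.b, t1.c, t1.d, t2.e
    from XXX.[dbo].temp1 t1
    join XXX.[dbo].temp2 t2
        on t1.Date = t2.[Date]
        and t1.Time = t2.Time
        and t1.cloth = t2.Cloth
    where not exists (select 1 from betdaq.[dbo].temp3 t3 where t3.a = t1.a) -- skip rows copied by earlier batches
    order by t1.Date, t1.Time;

    set @rows = @@ROWCOUNT;
    checkpoint; -- in SIMPLE recovery this lets the log space from the committed batch be reused
end
Each batch commits on its own, so the log only ever has to hold one batch's worth of changes.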
And why copy it at all? That is a LOT of data. Could you use a view or a CTE instead?
If the join columns are indexed, a view will be very efficient. A sketch follows.
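A sketch of the view alternative, with the same hypothetical column names as above:
use betdaq;
go
create view dbo.v_temp3
as
select t1.a, t1.b, t1.c, t1.d, t2.e
from XXX.[dbo].temp1 t1
join XXX.[dbo].temp2 t2
    on t1.Date = t2.[Date]
    and t1.Time = t2.Time
    and t1.cloth = t2.Cloth;
go
Querying the view joins the base tables on demand, so nothing is copied and the transaction log is barely touched.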
The transaction log can fill up even though the database uses the SIMPLE recovery model, and even though SELECT INTO is a minimally logged operation; the log can also fill up because of other transactions running in parallel. I would use the queries below to check transaction-log space usage while the query is running:
select * from sys.dm_db_log_space_usage
select * from sys.dm_tran_database_transactions
select * from sys.dm_tran_active_transactions
select * from sys.dm_tran_current_transaction
Further, the script linked below can be used to check the SQL text as well:
https://gallery.technet.microsoft.com/scriptcenter/Transaction-Log-Usage-By-e62ba57d
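A minimal sketch of such a query, joining the transaction DMVs to the request DMVs to show log usage per session along with the running statement (all DMV and column names as documented):
select st.session_id,
       dt.database_transaction_log_bytes_used,
       txt.text as sql_text
from sys.dm_tran_database_transactions dt
join sys.dm_tran_session_transactions st
    on st.transaction_id = dt.transaction_id
left join sys.dm_exec_requests r
    on r.session_id = st.session_id
outer apply sys.dm_exec_sql_text(r.sql_handle) txt
where dt.database_id = DB_ID() -- current database only
order by dt.database_transaction_log_bytes_used desc;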
I have two SQL Servers: one on-premises and one in Azure.
When I run this T-SQL on the on-premises server only:
Select top 100 * from Orders
the result is very fast, as usual. On Azure only:
Select top 100 * from Orders_2
the same: fast.
Here is the problem. No matter whether I use a linked server or OPENDATASOURCE:
Select top 100 * from Orders a LEFT OUTER JOIN
[AZUREDB].DB01.dbo.Orders_2 a2 ON a2.ID = a.ID
or
Select top 100 * from Orders a LEFT OUTER JOIN
OPENDATASOURCE('SQLOLEDB','Data Source=AzureDB;User ID=XXX;Password=XXX').DB01.dbo.Orders_2 a2 ON a2.ID = a.ID
it takes a very long time, about 15 minutes. What is happening, and how can I fix it?
Because it has to pull the whole table from the remote server into the local server and only then apply the TOP 100. If you look at the query plan, which you haven't shown us, you will see that is what is happening.
Instead, filter the remote server's data before joining:
Select
*
from Orders a
LEFT OUTER JOIN (
SELECT TOP (100)
*
FROM [AZUREDB].DB01.dbo.Orders_2
) a2 ON a2.ID = a.ID
Whether that returns the desired results, I don't know, as you haven't shown what you want.
The other alternative is to ensure the remote table has an index (probably clustered) on ID. Then your query can hopefully pass just 100 rows from your own server to the remote one to join them up.
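A related option is OPENQUERY, which executes its pass-through query entirely on the remote server, so only the rows it returns travel over the wire. A sketch (assuming the linked server's default catalog is DB01; the TOP (100) inside the string is illustrative and changes which 100 remote rows you get):
SELECT a.*, a2.ID
FROM Orders a
LEFT OUTER JOIN OPENQUERY([AZUREDB],
    'SELECT TOP (100) ID FROM dbo.Orders_2 ORDER BY ID') a2
    ON a2.ID = a.ID;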
I have a query that needs to update 2 million records, but there was no space left on the disk, so the query is suspended right now. I have since freed up some space, but the query is still suspended. How can I change its status to RUNNABLE? Is there any way to tell SQL Server that it now has enough space and can run the query?
"After that, I free up some space, but the query is still suspended. Is there any way to tell SQL Server that you have enough space right now, and you can run the query?"
SQL Server will change the query's status from SUSPENDED to RUNNABLE automatically; it is not something you manage.
Your job here is to check why the query is suspended. The DMV query below can help:
select session_id,blocking_session_id,wait_resource,wait_time,
last_wait_type from sys.dm_exec_requests
where session_id=<< your session id>>
There are many reasons why a query gets suspended; some of them include locking/blocking, rollback, and waiting on data from disk.
Check the status via the DMV above, see what the reason is, and troubleshoot accordingly.
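A quick way to see what a session is currently waiting on is sys.dm_os_waiting_tasks (a sketch; substitute the suspended session's id):
select session_id,
       wait_type,
       wait_duration_ms,
       blocking_session_id,
       resource_description
from sys.dm_os_waiting_tasks
where session_id = << your session id >>;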
Below is a small sample which can help you understand what SUSPENDED means:
create table t1
(
id int
)
insert into t1
select row_number() over (order by (select null))
from sys.objects c
cross join sys.objects c1
Now, in one tab of SSMS, run the query below:
begin tran
update t1
set id=id+1
Open another tab and run the query below:
select * from t1
Now open a third tab and run the query below:
select session_id,blocking_session_id,wait_resource,wait_time,
last_wait_type,status from sys.dm_exec_requests
where session_id=<< your session id of select >>
Or run the query below:
select session_id,blocking_session_id,wait_resource,wait_time,
last_wait_type,status from sys.dm_exec_requests
where blocking_session_id>0
You can see the status is SUSPENDED due to blocking; once you clear the blocking (by committing the transaction), you will see SQL Server automatically resume the suspended query.
I have a large database (1.7 TB) and a maintenance job that rebuilds/reorganizes indexes. This job is scheduled at 11:00 PM.
This morning, I was checking the queries running on the server and noticed that the index job was still running (more than 10 hours), because another T-SQL query had been running on the server for more than 22 hours and had locked the table whose indexes the job was trying to rebuild. It looked like it would never finish, so I had to kill the blocking session (169) to let the index job keep running. My question is: how can I avoid locking the tables the index job is working on? I know that rebuilding an index locks the table because it is an offline operation, but should I do some optimizing on the T-SQL query that was running for more than 22 hours? This query is run often by our ERP application during the day.
The query is;
SELECT T1.ACCOUNTNUM,T1.AMOUNTCUR,T1.AMOUNTMST,T1.DUEDATE,T1.RECID,T1.RECVERSION,T1.REFRECID,T1.TRANSDATE,T1.RECVERSION,T2.INVOICE
,T2.TRANSTYPE,T2.TRANSDATE,T2.AMOUNTCUR,T2.ACCOUNTNUM,T2.VOUCHER,T2.COLLECTIONLETTERCODE,T2.SETTLEAMOUNTCUR,T2.CURRENCYCODE,
T2.CUSTBILLINGCLASSIFICATION,T2.RECVERSION,T2.RECID,T3.ACCOUNTNUM,T3.PARTY,T3.CURRENCY,T3.RECID,T3.RECVERSION
FROM **CUSTTRANSOPEN** T1
CROSS JOIN CUSTTRANS T2
CROSS JOIN CUSTTABLE T3
WHERE (((T1.PARTITION=#P1) AND (T1.DATAAREAID=#P2)) AND (T1.DUEDATE<#P3)) AND (((T2.PARTITION=#P4) AND
(T2.DATAAREAID=#P5)) AND (((((((T2.TRANSTYPE<=#P6) OR (T2.TRANSTYPE=#P7)) OR ((T2.TRANSTYPE=#P8) OR (T2.TRANSTYPE=#P9)))
OR (((T2.TRANSTYPE=#P10) OR (T2.TRANSTYPE=#P11)) OR (T2.TRANSTYPE=#P12))) AND (T2.AMOUNTCUR>=#P13))
AND (T1.ACCOUNTNUM=T2.ACCOUNTNUM)) AND (T1.REFRECID=T2.RECID))) AND (((T3.PARTITION=#P14) AND (T3.DATAAREAID=#P15))
AND (T2.ACCOUNTNUM=T3.ACCOUNTNUM)) ORDER BY T1.DUEDATE OPTION(FAST 5)
** The locked table is: CUSTTRANSOPEN
I mean, for example, should I put a WITH (NOLOCK) hint in the query?
How do you run large queries at the same time as the index job?
** I have Standard Edition SQL Server, so online rebuilds (ONLINE = ON) are not possible.
You have two problems:
- a large query, which might be tuned
- ALTER INDEX running at the same time
Tuning the query:
You may use NOLOCK only if you do not care about the correctness of the result.
Your query is written as CROSS JOINs with all the join predicates pushed into the WHERE clause. Logically the filters turn it back into inner joins, but the intent is hard to follow, and a mistake in those predicates would multiply the rows of all three tables. So determine what exactly you want. Here is a simplified version of the query; verify that it produces the same logic:
SELECT T1.ACCOUNTNUM, T1.AMOUNTCUR, T1.AMOUNTMST, T1.DUEDATE, T1.RECID
    , T1.RECVERSION, T1.REFRECID, T1.TRANSDATE, T1.RECVERSION, T2.INVOICE
    , T2.TRANSTYPE, T2.TRANSDATE, T2.AMOUNTCUR, T2.ACCOUNTNUM, T2.VOUCHER
    , T2.COLLECTIONLETTERCODE, T2.SETTLEAMOUNTCUR, T2.CURRENCYCODE, T2.CUSTBILLINGCLASSIFICATION, T2.RECVERSION
    , T2.RECID, T3.ACCOUNTNUM, T3.PARTY, T3.CURRENCY, T3.RECID, T3.RECVERSION
FROM CUSTTRANSOPEN AS T1
INNER JOIN CUSTTRANS AS T2
    ON T1.ACCOUNTNUM = T2.ACCOUNTNUM AND T1.REFRECID = T2.RECID
    AND T2.PARTITION = #P4 AND T2.DATAAREAID = #P5 AND T2.AMOUNTCUR >= #P13
    AND (T2.TRANSTYPE <= #P6 OR T2.TRANSTYPE IN (#P7, #P8, #P9, #P10, #P11, #P12))
INNER JOIN CUSTTABLE AS T3
    ON T2.ACCOUNTNUM = T3.ACCOUNTNUM AND T3.PARTITION = #P14 AND T3.DATAAREAID = #P15
WHERE T1.PARTITION = #P1 AND T1.DATAAREAID = #P2 AND T1.DUEDATE < #P3
ORDER BY T1.DUEDATE
OPTION (FAST 5);
Look at the execution plan.
Check whether the plan is better if you exclude OPTION (FAST 5).
See if you can improve the query by indexing.
As for the maintenance job: alter indexes on a table-by-table basis, excluding your CUSTTRANSOPEN table, and alter its indexes once the long query has finished, as sketched below.
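A minimal sketch of that approach (the table list is illustrative; a real job would iterate over every table except the excluded one):
-- rebuild the other tables' indexes on the usual schedule
ALTER INDEX ALL ON dbo.CUSTTRANS REBUILD;
ALTER INDEX ALL ON dbo.CUSTTABLE REBUILD;
-- run this separately, once the long-running query has finished
ALTER INDEX ALL ON dbo.CUSTTRANSOPEN REBUILD;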
SQL Server 2008 on Windows Server 2008.
Please compare the following queries:
1.
select count(*)
from Trades t
inner join UserAccount ua on ua.AccID = t.AccID
2.
select count(*)
from Trades t
inner join (
select *
from UserAccount ua
where ua.UserID = 1126
) as theua on theua.AccID = t.AccID
3.
select count(*)
from Trades t
inner join UserAccount ua on ua.AccID = t.AccID
where ua.UserID=1126
Trades has millions of rows, and UserAccount is quite a small table. AccID can contain duplicates.
Execution results (row counts):
234734792
8806144
8806144
I expected No. 2 to be at least as fast as No. 1, but it is actually much slower, even slower than No. 3.
Time consumption:
2 secs
10 secs
8 secs
Could someone explain the reason? And is it possible to make it faster when I need a filter like UserID=1126?
1. is the fastest since it has the fewest WHERE conditions (the UserID filter is missing).
2. is the slowest because it has an inner SELECT, which has to execute for each join. (By the way: never do this.)
3. is slower than No. 1 because of the extra WHERE condition on UserID. This is the query you want to use. (You could also swap the WHERE for an AND directly after the JOIN ... ON.)
Do you have foreign keys set up?
Also make sure you have the appropriate indexes (i.e. on AccID and UserID); see the sketch below.
From SSMS, run the query with the execution plan on, and it will show you potential inefficiencies in the query and indexes you should create.
In the execution plan, look out for things like table scans; what you want to see are seeks.
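If those indexes are missing, something like the following would give the optimizer a seek path for query No. 3 (the index names are illustrative):
-- supports the UserID filter and covers the AccID join column
CREATE NONCLUSTERED INDEX IX_UserAccount_UserID
    ON dbo.UserAccount (UserID)
    INCLUDE (AccID);
-- supports the join from Trades
CREATE NONCLUSTERED INDEX IX_Trades_AccID
    ON dbo.Trades (AccID);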
I have a table that I will populate with values from an expensive calculation (XQuery against an immutable XML column). To speed up deployment to production, I have precalculated the values on a test server and saved them to a file with BCP.
My script is as follows:
-- Lots of other work, including modifying OtherTable
CREATE TABLE FOO (...)
GO
BULK INSERT FOO
FROM 'C:\foo.dat';
GO
-- rerun from here after the break
INSERT INTO FOO
(ID, TotalQuantity)
SELECT
e.ID,
SUM(e.Quantity) as TotalQuantity
FROM (select
o.ID,
h.n.value('TotalQuantity[1]/.', 'int') as TotalQuantity
FROM dbo.OtherTable o
CROSS APPLY o.XmlColumn.nodes('(item/.../salesorder/)') h(n)
WHERE o.ID NOT IN (SELECT DISTINCT ID FROM FOO)
) as E
GROUP BY e.ID
When I run the script in Management Studio, the first two statements complete within seconds, but the last statement takes 4 hours to complete. Since no rows were added to OtherTable after foo.dat was computed, Management Studio reports (0 row(s) affected).
If I cancel the execution after a couple of minutes, select just the last query, and run it separately, it completes within 5 seconds.
Notable facts:
OtherTable contains 200k rows, and the data in XmlColumn is pretty large; total table size is ~3 GB.
The FOO table gets 1.3M rows.
What could possibly make the difference?
Management Studio has implicit transactions turned off, so as far as I understand, each statement runs in its own transaction.
Update:
If I first select and run the script up to -- rerun from here after the break, and then select and run just the last query, it is still slow until I cancel execution and try again. This at least rules out any effect of running "together" with the previous code in the script, and boils it down to the same query being slow on first execution and fast on the second (all other conditions being the same).
Probably different execution plans. See Slow in the Application, Fast in SSMS? Understanding Performance Mysteries.
Could it be related to the statistics being completely wrong on the newly created FOO table? If SQL Server automatically updates the statistics when it first runs the query, the second run would get its execution plan from up-to-date statistics.
What if you check the statistics right after the bulk insert (with the STATS_DATE function) and then check them again after cancelling the long-running query? Did the stats get updated, even though the query was cancelled?
In that case, an UPDATE STATISTICS on FOO right after the bulk insert could help.
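A sketch of that check (STATS_DATE and sys.stats are the documented ways to read the last-updated time; UPDATE STATISTICS without options uses the default sampling):
-- when were FOO's statistics last updated?
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.FOO');
-- refresh them right after the bulk insert
UPDATE STATISTICS dbo.FOO;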
Not sure exactly why it helped, but I rewrote the last query to use a left outer join instead, and suddenly the execution time dropped to 15 milliseconds.
INSERT INTO FOO
(ID, TotalQuantity)
SELECT
e.ID,
SUM(e.Quantity) as TotalQuantity
FROM (select
o.ID,
h.n.value('TotalQuantity[1]/.', 'int') as TotalQuantity
FROM dbo.OtherTable o
LEFT OUTER JOIN FOO f ON o.ID = f.ID
CROSS APPLY o.XmlColumn.nodes('(item/.../salesorder/)') h(n)
WHERE f.ID IS NULL
) as E
GROUP BY e.ID