How do you run large queries at the same time as the index job?

I have a large database (1.7 TB) and a maintenance job that rebuilds or reorganizes indexes. The job is scheduled for 11:00 PM.
This morning I was checking the queries running on the server and noticed that the index job was still running (for more than 10 hours). Another T-SQL query had been running on the server for more than 22 hours and was locking the table whose indexes the job was trying to rebuild. It looked like it would never finish, so I had to kill the blocking session (169) to let the index job continue. My question is: how can I avoid locking the tables the index job is working on? I know that rebuilding an index locks the table because it is an offline operation, but should I optimize the T-SQL query that ran for more than 22 hours? Our ERP application runs this query often during the day.
The query is:
SELECT T1.ACCOUNTNUM,T1.AMOUNTCUR,T1.AMOUNTMST,T1.DUEDATE,T1.RECID,T1.RECVERSION,T1.REFRECID,T1.TRANSDATE,T1.RECVERSION,T2.INVOICE
,T2.TRANSTYPE,T2.TRANSDATE,T2.AMOUNTCUR,T2.ACCOUNTNUM,T2.VOUCHER,T2.COLLECTIONLETTERCODE,T2.SETTLEAMOUNTCUR,T2.CURRENCYCODE,
T2.CUSTBILLINGCLASSIFICATION,T2.RECVERSION,T2.RECID,T3.ACCOUNTNUM,T3.PARTY,T3.CURRENCY,T3.RECID,T3.RECVERSION
FROM CUSTTRANSOPEN T1
CROSS JOIN CUSTTRANS T2
CROSS JOIN CUSTTABLE T3
WHERE (((T1.PARTITION=@P1) AND (T1.DATAAREAID=@P2)) AND (T1.DUEDATE<@P3)) AND (((T2.PARTITION=@P4) AND
(T2.DATAAREAID=@P5)) AND (((((((T2.TRANSTYPE<=@P6) OR (T2.TRANSTYPE=@P7)) OR ((T2.TRANSTYPE=@P8) OR (T2.TRANSTYPE=@P9)))
OR (((T2.TRANSTYPE=@P10) OR (T2.TRANSTYPE=@P11)) OR (T2.TRANSTYPE=@P12))) AND (T2.AMOUNTCUR>=@P13))
AND (T1.ACCOUNTNUM=T2.ACCOUNTNUM)) AND (T1.REFRECID=T2.RECID))) AND (((T3.PARTITION=@P14) AND (T3.DATAAREAID=@P15))
AND (T2.ACCOUNTNUM=T3.ACCOUNTNUM)) ORDER BY T1.DUEDATE OPTION(FAST 5)
The locked table is CUSTTRANSOPEN.
For example, should I add a WITH (NOLOCK) hint to the query?
In short: how do you run large queries at the same time as the index job?
Note: I have the Standard Edition of SQL Server, so online index rebuilds are not possible.

You have two problems:
- A large query, which might be tuned
- An ALTER INDEX running at the same time
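For the second problem, it helps to see exactly which session is holding up the index job (session 169 in the question) before deciding what to kill. A sketch using the standard execution DMVs; the choice of columns is just one reasonable selection:

```sql
-- List requests that are currently blocked, together with the session blocking them.
SELECT r.session_id,
       r.blocking_session_id,          -- session holding the lock this request waits for
       r.wait_type,
       r.wait_time / 1000 AS wait_seconds,
       r.command,                      -- e.g. ALTER INDEX for the maintenance job
       t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

The blocker's own statement can be looked up the same way by filtering on its session_id.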
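Tuning the query:
Use NOLOCK only if you do not care about the accuracy of the result: it can read uncommitted, missing, or duplicated rows. It would not unblock the index job anyway, because even a NOLOCK query takes a schema-stability (Sch-S) lock, and an offline index rebuild needs a schema-modification (Sch-M) lock, which is incompatible with it.
Your query uses CROSS JOINs, which on their own multiply the rows of all three tables; if the optimizer actually had to build that product, a 20-hour runtime would be no surprise. That may not be the intention, so determine exactly what you want. Here is a simplified version using explicit INNER JOINs. Verify that it produces the same logic: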
SELECT T1.ACCOUNTNUM, T1.AMOUNTCUR, T1.AMOUNTMST, T1.DUEDATE, T1.RECID
, T1.RECVERSION, T1.REFRECID, T1.TRANSDATE, T1.RECVERSION, T2.INVOICE
, T2.TRANSTYPE, T2.TRANSDATE, T2.AMOUNTCUR, T2.ACCOUNTNUM, T2.VOUCHER
, T2.COLLECTIONLETTERCODE, T2.SETTLEAMOUNTCUR, T2.CURRENCYCODE, T2.CUSTBILLINGCLASSIFICATION, T2.RECVERSION
, T2.RECID, T3.ACCOUNTNUM, T3.PARTY, T3.CURRENCY, T3.RECID, T3.RECVERSION
FROM CUSTTRANSOPEN AS T1
INNER JOIN CUSTTRANS AS T2 ON T1.ACCOUNTNUM=T2.ACCOUNTNUM AND T1.REFRECID=T2.RECID AND
T2.PARTITION=@P4 AND T2.DATAAREAID=@P5 AND T2.AMOUNTCUR>=@P13 AND
(T2.TRANSTYPE<=@P6 OR T2.TRANSTYPE IN (@P7, @P8, @P9, @P10, @P11, @P12))
INNER JOIN CUSTTABLE AS T3 ON T2.ACCOUNTNUM=T3.ACCOUNTNUM AND T3.PARTITION=@P14 AND T3.DATAAREAID=@P15
WHERE T1.PARTITION=@P1 AND T1.DATAAREAID=@P2 AND T1.DUEDATE<@P3
ORDER BY T1.DUEDATE
OPTION (FAST 5);
- Look at the execution plan.
- Check whether the plan is better without OPTION (FAST 5).
- See whether the query can be improved by indexing.
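As a starting point for the indexing suggestion, an index on the T1 filter and sort columns might be worth testing. This is a sketch under assumptions: the index name is hypothetical, the dbo schema is assumed, and the key order (PARTITION, DATAAREAID, DUEDATE) is derived only from this one query's WHERE and ORDER BY. If the ERP application manages its own schema, the index may have to be defined through the application rather than directly in SQL Server, and on Standard Edition the CREATE INDEX itself runs offline.

```sql
-- Hypothetical index supporting the T1 filter (PARTITION, DATAAREAID, DUEDATE < @P3)
-- and ORDER BY T1.DUEDATE; the INCLUDE list covers the other T1 columns the query reads.
CREATE NONCLUSTERED INDEX IX_CUSTTRANSOPEN_DueDate_Sketch
    ON dbo.CUSTTRANSOPEN (PARTITION, DATAAREAID, DUEDATE)
    INCLUDE (ACCOUNTNUM, REFRECID, AMOUNTCUR, AMOUNTMST, TRANSDATE, RECID, RECVERSION);

-- Then compare logical reads and CPU/elapsed time before and after the index,
-- and with and without OPTION (FAST 5), in the Messages tab.
SET STATISTICS IO, TIME ON;
```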
For the index job: alter the indexes one table at a time, excluding the CUSTTRANSOPEN table, and alter its indexes once the query has finished.
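The one-table-at-a-time approach can be sketched as a cursor over the user tables. Two additions are assumptions rather than part of the answer: SET LOCK_TIMEOUT, so a blocked rebuild gives up after a while instead of waiting all night, and the 10-minute value, which is arbitrary.

```sql
-- Hypothetical sketch: rebuild indexes table by table, skipping CUSTTRANSOPEN.
-- Wait at most 10 minutes (600000 ms, an arbitrary choice) for any lock.
SET LOCK_TIMEOUT 600000;

DECLARE @table nvarchar(300), @sql nvarchar(max);

DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    WHERE t.is_ms_shipped = 0
      AND t.name <> N'CUSTTRANSOPEN';   -- handled separately, after the ERP query finishes

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @table;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'ALTER INDEX ALL ON ' + @table + N' REBUILD;';
    BEGIN TRY
        EXEC sys.sp_executesql @sql;
    END TRY
    BEGIN CATCH
        -- A lock timeout (error 1222) lands here; report it and move on to the next table.
        PRINT N'Skipped ' + @table + N': ' + ERROR_MESSAGE();
    END CATCH;

    FETCH NEXT FROM table_cursor INTO @table;
END;

CLOSE table_cursor;
DEALLOCATE table_cursor;

-- Later, once the long-running query on CUSTTRANSOPEN has finished:
-- ALTER INDEX ALL ON dbo.CUSTTRANSOPEN REBUILD;
```

ALTER INDEX ... REORGANIZE is documented as always online, even on Standard Edition, so it may be an option for CUSTTRANSOPEN during the day, at the cost of doing less than a rebuild (for example, it does not update statistics).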

Related

Small vs Large and Large vs Small SQL joins

I was just tidying up some SQL when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is: does the order of the joins make any performance difference? For the query above I would have started with
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other two tables to that one.
Join order in SQL Server 2008 R2 does unquestionably affect query performance, particularly in queries with a large number of joins and WHERE clauses applied to multiple tables.
Although the optimiser can change the join order, it doesn't try every possible order. It stops when it finds what it considers a workable plan, because optimisation itself uses precious resources.
We have seen queries that were performing like dogs (over a minute of execution time) come down to sub-second performance just by changing the order of the join expressions. Note, however, that these were queries with 12 to 20 joins and WHERE clauses on several of the tables.
The trick is to order the joins so that the query optimiser can figure out what makes sense. You can use FORCE ORDER, but that can be too rigid. Try to start your join order with the tables whose WHERE clauses will reduce the data the most.
No, the join order can be changed during optimization.
The only caveat is OPTION (FORCE ORDER), which forces the joins to happen in exactly the order you specify.
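Applied to the query from the question, the hint is a single OPTION clause at the end. This sketch reorders the FROM clause to start from Reporting_JourneyMaster90, as the asker proposed, and omits the original NOLOCK hints for brevity; whether the forced order is actually faster has to be checked against the plan.

```sql
-- Same query, written to start from Reporting_JourneyMaster90.
-- FORCE ORDER makes the optimizer keep the join order exactly as written.
SELECT jm.IMEI, jm.MaxSpeedKM, jm.MaxAccel, jm.MaxDeccel,
       jm.JourneyMaxLeft, jm.JourneyMaxRight, jm.DistanceKM,
       jm.IdleTimeSeconds, jm.WebUserJourneyId,
       jm.lifetime_odo_metres, jm.[Descriptor]
FROM dbo.Reporting_JourneyMaster90 AS jm
INNER JOIN dbo.Reporting_WebUsers AS wu ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE wu.isActive = 1
  AND j.JourneyDuration > 2
  AND j.JourneyDuration < 1000
  AND j.JourneyDistance > 0
OPTION (FORCE ORDER);
```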
I have a clear example of an inner join's order affecting performance. It is a simple join between two tables. One has 50+ million records, the other has 2,000. If I select from the smaller table and join the larger one, it takes 5+ minutes.
If I select from the larger table and join the smaller one, it takes 2 minutes 30 seconds.
This is with SQL Server 2012.
To me this is counterintuitive, since I am using the largest dataset for the initial query.
Usually not. I'm not 100% sure this applies verbatim to SQL Server, but in Postgres the query planner reserves the right to reorder the inner joins as it sees fit. The exception is when you reach a threshold beyond which it's too expensive to investigate changing their order.
JOIN order doesn't matter; the query engine will reorganize the joins based on index statistics and other factors.
To test this:
- Enable Include Actual Execution Plan and run the first query.
- Change the JOIN order and run the query again.
- Compare the execution plans.
They should be identical, because the query engine reorders the joins according to other factors.
As commented on another answer, you could use OPTION (FORCE ORDER) to get exactly the order you want, but it may not be the most efficient one.
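Besides comparing the graphical plans, the two orders can be compared numerically in the Messages tab with SET STATISTICS. A sketch reusing two tables from the question above; treating Reporting_WebUsers as the smaller table is an assumption. If the optimizer reorders the joins, both runs should report roughly the same logical reads (elapsed time can still differ because of caching).

```sql
SET STATISTICS IO, TIME ON;

-- Order 1: (presumably) smaller table first
SELECT COUNT(*)
FROM dbo.Reporting_WebUsers AS wu
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm ON wu.WebUsersId = jm.WebUsersId
WHERE wu.isActive = 1;

-- Order 2: larger table first
SELECT COUNT(*)
FROM dbo.Reporting_JourneyMaster90 AS jm
INNER JOIN dbo.Reporting_WebUsers AS wu ON wu.WebUsersId = jm.WebUsersId
WHERE wu.isActive = 1;

SET STATISTICS IO, TIME OFF;
```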
As a general rule of thumb, order the joins with the table with the fewest records first and the one with the most records last: in some DBMS engines the order can make a difference, and it also matters if the FORCE ORDER hint is used to help limit the results.
Wrong. In SQL Server 2005 it definitely matters, since you are limiting the dataset from the beginning of the FROM clause. If you start with 2,000 records instead of 2 million, your query is faster.

Why is an inner join with a filtered table slower than one without filtering?

SQL Server 2008 on Windows 2008
Please compare the following queries:
1.
select count(*)
from Trades t
inner join UserAccount ua on ua.AccID = t.AccID
2.
select count(*)
from Trades t
inner join (
select *
from UserAccount ua
where ua.UserID = 1126
) as theua on theua.AccID = t.AccID
3.
select count(*)
from Trades t
inner join UserAccount ua on ua.AccID = t.AccID
where ua.UserID=1126
Trades has millions of rows and UserAccount is quite a small table. AccID can be duplicated.
Results:
Query 1: count 234734792, 2 secs
Query 2: count 8806144, 10 secs
Query 3: count 8806144, 8 secs
I expected No. 2 to be at least as fast as No. 1, but it is actually much slower, even slower than No. 3.
Could someone explain the reason? And is it possible to make it faster when I need a filter like UserID=1126?
#1 is the fastest since it has the fewest WHERE conditions (no UserID filter).
#2 is the slowest because it has an inner select which has to execute for each join (by the way: never do this).
#3 is slower than #1 because of the extra WHERE condition (UserID). This is the query you want to use.
(You could also replace the WHERE with an AND directly after the join's ON condition.)
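The ON-clause variant of query 3 looks like this. For an INNER JOIN, putting the filter in ON or in WHERE gives the same result; the choice is mostly about readability.

```sql
-- Query 3 with the UserID filter moved from WHERE into the join condition.
SELECT COUNT(*)
FROM Trades AS t
INNER JOIN UserAccount AS ua
    ON ua.AccID = t.AccID
   AND ua.UserID = 1126;
```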
Do you have foreign keys set up?
Also make sure you have the appropriate indexes (i.e. on AccID and UserID).
From SSMS, run the query with the execution plan turned on; it will show you potential inefficiencies in the query and indexes you should create.
In the execution plan, look out for things like table scans. What you want to see are seeks.
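The index suggestion could look roughly like this. The index names are hypothetical, and the column order assumes UserID is the filter and AccID the join column, as in the question.

```sql
-- Lets the UserID = 1126 filter seek and then return the matching AccID values.
CREATE NONCLUSTERED INDEX IX_UserAccount_UserID_AccID
    ON UserAccount (UserID, AccID);

-- Lets the join probe Trades by AccID instead of scanning it.
CREATE NONCLUSTERED INDEX IX_Trades_AccID
    ON Trades (AccID);
```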

Most optimal order (of joins) for left join

I have three tables: Table1 (1,020,690 records), Table2 (289,425 records) and Table3 (83,692 records). I have something like this:
SELECT * FROM Table1 T1 /* OK fine select * is bad when not all columns are needed, this is just an example*/
LEFT JOIN Table2 T2 ON T1.id=T2.id
LEFT JOIN Table3 T3 ON T1.id=T3.id
and a query like this
SELECT * FROM Table1 T1
LEFT JOIN Table3 T3 ON T1.id=T3.id
LEFT JOIN Table2 T2 ON T1.id=T2.id
The query plan shows me that it uses 2 Merge Join for both the joins. For the first query, the first merge is with T1 and T2 and then with T3. For the second query, the first merge is with T1 and T3 and then with T2.
Both queries take about the same time (approx. 40 seconds), though sometimes Query 1 takes a couple of seconds longer.
So my question is: does the join order matter?
The join order for a simple query like this should not matter. If there's a way to reorder the joins to improve performance, that's the job of the query optimizer.
In theory, you shouldn't worry about it -- that's the point of SQL. Trying to outthink the query optimizer is generally not going to give better results. Especially in MS SQL Server, which has a very good query optimizer.
I wouldn't expect this query to take 40 seconds. You might not have the right indexes defined. You should use tools like SQL Server Profiler or SQL Server Database Engine Tuning Advisor to see if it can recommend any new indexes.
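A quick first check is whether the join column is indexed at all, since a merge join needs both inputs sorted on id and SQL Server has to add a Sort when no index provides that order. A sketch, assuming the tables live in the dbo schema:

```sql
-- List the existing indexes (and their key columns) on the joined tables.
EXEC sp_helpindex N'dbo.Table1';
EXEC sp_helpindex N'dbo.Table2';
EXEC sp_helpindex N'dbo.Table3';
```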
The query optimizer will use a combination of the constraints, indexes, and statistics collected on the table to build an execution plan. In most cases this works well. However, I do occasionally encounter scenarios where the execution plan is chosen poorly. Often, tweaking the query can effectively coerce the optimizer into choosing a better plan. I can offer no general rules for doing this, though. When all else fails, you could resort to the FORCE ORDER query hint.
And yes, the join order can have a significant impact on the execution time of your query. The idea is that joining the tables that yield the smallest results first will make the next join compute more quickly. Edit: It is important to note, however, that in the absence of FORCE ORDER, and with all other things being equal, the order you specify in the query may have no correlation with the way the optimizer builds the execution plan.
In general, SQL Server is smart enough to pick out the best way to join and it will not only use the order you wrote in the query. That said, I find it easier to understand a complex query if all the inner joins are first and then the left joins.

Aggregating two selects with a group by in SQL is really slow

I am currently working with a query in MSSQL that looks like:
SELECT
...
FROM
(SELECT
...
)T1
JOIN
(SELECT
...
)T2
GROUP BY
...
The inner selects are relatively fast, but the outer select aggregates the inner selects and takes an incredibly long time to execute, often timing out. Removing the group by makes it run somewhat faster and changing the join to a LEFT OUTER JOIN speeds things up a bit as well.
Why would doing a group by on a select which aggregates two inner selects cause the query to run so slow? Why does an INNER JOIN run slower than a LEFT OUTER JOIN? What can I do to troubleshoot this further?
EDIT: What makes this even more perplexing is that the two inner queries are date-limited, and the overall query only runs slowly when the date range runs from the start of July to any other day in July; if the range runs from any date before July 1 to today, it runs fine.
Without more detail of your query it's impossible to offer any hints as to what may speed it up. A possible guess is that the two inner queries are blocking access to any indexes that might have been used to perform the join, resulting in large scans, but there are probably many other possible reasons.
To see where the time goes in the query, check the execution plan; there is a detailed explanation here:
http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx
The basic rundown: run the query, display the execution plan, then look for operators with large cost percentages; they are what is slowing your query down.
Try rewriting your query without the nested SELECTs, which are rarely necessary. When using nested SELECTs - except for trivial cases - the inner SELECT resultsets are not indexed, which makes joining them to anything slow.
As Tetraneutron said, post details of your query -- we may help you rewrite it in a straight-through way.
Have you given a join predicate? I.e. JOIN tableB ON tableA.ColA = tableB.ColB. If you don't give a predicate, SQL Server may be forced to use nested loops, so if you have a lot of rows in that range it would explain the slowdown.
Have a look at the plan in SQL Server Management Studio if you have MS SQL Server to play with.
After your T2 subquery, add a join condition such as ON T1.joinfield = T2.joinfield.
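A minimal shape of that fix, with hypothetical tables (Orders, Payments), columns, and date values standing in for the parts of the question elided as "...":

```sql
-- Hypothetical date range; the question's real filters are not shown.
DECLARE @StartDate date = '2009-07-01', @EndDate date = '2009-07-15';

SELECT T1.CustomerId,
       SUM(T1.OrderTotal)   AS OrderTotal,
       SUM(T2.PaymentTotal) AS PaymentTotal
FROM (SELECT CustomerId, SUM(Amount) AS OrderTotal
      FROM Orders
      WHERE OrderDate >= @StartDate AND OrderDate < @EndDate
      GROUP BY CustomerId) AS T1
JOIN (SELECT CustomerId, SUM(Amount) AS PaymentTotal
      FROM Payments
      WHERE PaymentDate >= @StartDate AND PaymentDate < @EndDate
      GROUP BY CustomerId) AS T2
    ON T1.CustomerId = T2.CustomerId   -- the join condition the answers suggest adding
GROUP BY T1.CustomerId;
```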
The issue was with fragmented data. After the data was defragmented the query started running within reasonable time constraints.
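Fragmentation like this can be checked with the index physical-stats DMV. A sketch; the 30% cut-off is a commonly quoted rebuild threshold, not a hard rule.

```sql
-- Fragmentation of every index in the current database (LIMITED is the cheapest scan mode).
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 30
ORDER BY ps.avg_fragmentation_in_percent DESC;
```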
JOIN without a join condition = Cartesian product: every row from one input is combined with every row from the other. It is slow because the inner queries each query a separate table, but once they hit the join, it becomes a Cartesian product, which is much harder to process. This happens in the outer SELECT statement.
Have a look at INNER JOINs as Tetraneutron recommended.

Is this join hint dangerous?

A coworker asked me to look at indexing on some tables because his query was running for a very long time: over an hour.
select count(1)
from databaseA.dbo.table1
inner join databaseA.dbo.table2 on (table1.[key] = table2.[key])
inner join databaseB.dbo.table3 on (table1.[key] = table3.[key])
Note the different databases. This was being run from DatabaseB.
Tables 1 and 2 had over 2 million records. Table3 had a dozen records or so.
I looked at the query plan and the optimizer decided to do nested-loop index seeks into tables 1 and 2 with Table3 as the driving table!
My first assumption was that statistics were seriously messed up on Tables1 & 2 but before updating statistics I tried adding a join hint thusly:
select count(1)
from databaseA.dbo.table1
inner HASH join databaseA.dbo.table2 on (table1.[key] = table2.[key])
inner join databaseB.dbo.table3 on (table1.[key] = table3.[key])
Results returned in 15 seconds.
Since I was short on time, I passed the results back to him but I'm worried that this might result in problems down the road.
Should I revisit the statistics issue and resolve the problem that way? Could the bad query plan have resulted from the join being from a separate databases?
Can anyone offer me some ideas based on your experience?
I would suspect the statistics first.
As you are no doubt aware, Join hints should be avoided in 99% of cases and used only when you have proof that they are absolutely required.
Check statistics and indexing on the tables first. Hints like this can cause problems: if the data in the tables changes, the optimizer will be unable to choose a more efficient plan, since you have forced it to always use a hash join.
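Checking the statistics first could look like this, using the anonymized database and table names from the question and assuming the dbo schema: see when the statistics were last updated, refresh them with a full scan, and then rerun the original query without the HASH hint.

```sql
USE databaseA;

-- When were the statistics on table1 and table2 last updated?
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id IN (OBJECT_ID(N'dbo.table1'), OBJECT_ID(N'dbo.table2'));

-- Refresh them by reading every row rather than a sample.
UPDATE STATISTICS dbo.table1 WITH FULLSCAN;
UPDATE STATISTICS dbo.table2 WITH FULLSCAN;
```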
Wouldn't a nested loop be the most appropriate? Take the 12 records from Table 3, match them to the 12 records in Table 1, then match those to 12 records in Table 2.
Otherwise, your hash join would enforce ordering as well - meaning you'd hash 1 million records from Table 1 and Table 2, then join to the 12 records in Table 3.
I'd look at statistics for both the plans - and I'd suspect the loop join is actually more efficient, but was blocked or your hash join was taking advantage of cached data.
But - yeah - in general, join hints are a last resort.
A slow-running query involving linked servers might have to do with collation.
See here for some background: http://blogs.msdn.com/psssql/archive/2008/02/14/how-it-works-linked-servers-and-collation-compatibility.aspx
The hash join hint forces the sort order, so that explains the performance gain.
Here's how to set the options:
EXEC master.dbo.sp_serveroption
@server=N'databaseA',
@optname=N'collation compatible',
@optvalue=N'true'
EXEC master.dbo.sp_serveroption
@server=N'databaseA',
@optname=N'use remote collation',
@optvalue=N'false'