I have a massive query that has been working just fine; however, due to the number of records now in my database, the stored procedure is taking longer and longer to complete.
I had a hard enough time getting the query to work in the first place, and I'm not confident enough to either A) simplify the query or B) break it into smaller queries/stored procedures.
Can any expert help me out?
SELECT
r.resourceFirstName,
r.resourceLastName,
a.eventDateTime,
CONVERT(char(1), a.eventType) as eventType,
CONVERT(varchar(5), a.reasonCode) as reasonCode,
r.extension,
GETDATE() AS ciscoDate into #temp_Agent
FROM
CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a
ON r.resourceID = a.agentID
INNER JOIN (
SELECT
p.resourceFirstName,
p.resourceLastName,
MAX(e.eventDateTime) MaxeventDateTime
FROM
CCX1.db_cra.dbo.Resource p
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail e
ON p.resourceID = e.agentID
where
e.eventDateTime > (GETDATE() - 1)
GROUP BY
p.resourceFirstName,
p.resourceLastName
) d
ON r.resourceFirstName = d.resourceFirstName
AND r.resourceLastName = d.resourceLastName
AND a.eventDateTime = d.MaxeventDateTime
AND r.active = 1
where
a.eventDateTime >= (GETDATE() - 7)
ORDER BY
r.resourceLastName,
r.resourceFirstName ASC
I can't give a definitive answer with only the query to go on. But...
Consider putting an index on "eventDateTime".
It appears you are joining against a set of records from within the last day, which makes the 7-day filter in the outer query irrelevant. I don't have the ability to test, but maybe your query can be reduced to this? (pseudo code below)
Also consider different solutions. Maybe partition a table based on the datetime. Maybe have a separate database for reporting using star schemas or cube design.
What is being done with the temporary table #temp_Agent?
declare @max datetime = (select max(e.eventDateTime)
                         from CCX1.db_cra.dbo.AgentStateDetail e
                         inner join CCX1.db_cra.dbo.Resource r on r.resourceID = e.agentID
                         where r.active = 1
                           and e.eventDateTime > getdate() - 1);
if (@max is null)
    return; -- no records today
SELECT r.resourceFirstName,
r.resourceLastName,
a.eventDateTime,
CONVERT(char(1), a.eventType) as eventType,
CONVERT(varchar(5), a.reasonCode) as reasonCode,
r.extension,
GETDATE() AS ciscoDate
into #temp_Agent
FROM CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a ON r.resourceID = a.agentID
where r.active = 1
and a.eventDateTime = @max;
Without the full definition of your tables it is hard to troubleshoot why the query is so slow, but here are a couple of tips that could help you improve the performance of the query:
Instead of using a temporary table such as "#temp_Agent", it is preferable to create a local variable of type "table". You can achieve exactly the same result, but you can drastically increase the performance, because:
A local variable of type "table" can be created with primary keys and indexes, which improves how SQL finds the information.
A local variable can be clustered, which also improves performance in certain scenarios, because the information is accessed directly from disk.
A temporary table requires SQL to resolve at runtime the column types that should be used to store the information returned by the query.
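As a rough illustration of what that looks like for #temp_Agent (the column types here are guesses, not taken from the real schema, so adjust them as needed):

DECLARE @Agent TABLE (
    rowID             int IDENTITY(1,1) PRIMARY KEY,
    resourceFirstName varchar(50),
    resourceLastName  varchar(50),
    eventDateTime     datetime,
    eventType         char(1),
    reasonCode        varchar(5),
    extension         varchar(10),
    ciscoDate         datetime
);

-- populate it with INSERT ... SELECT instead of SELECT ... INTO #temp_Agent
INSERT INTO @Agent (resourceFirstName, resourceLastName, eventDateTime,
                    eventType, reasonCode, extension, ciscoDate)
SELECT r.resourceFirstName, r.resourceLastName, a.eventDateTime,
       CONVERT(char(1), a.eventType), CONVERT(varchar(5), a.reasonCode),
       r.extension, GETDATE()
FROM CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a ON r.resourceID = a.agentID
WHERE r.active = 1
  AND a.eventDateTime >= GETDATE() - 7;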
If you need to store information in temporary tables, variables, etc., avoid storing unnecessary information in them. For example, if you only need two ID columns later in your process, then avoid including extra columns that you can retrieve later.
If there is information that you need to retrieve from multiple sources, you should consider using a view, which can also be indexed to improve the retrieval of the information.
Avoid unnecessary sorting, grouping, conversions, and joins on strings. These particular operations can drastically degrade the performance of a query.
As an extra tip, you can make use of the SQL Server tools designed to help you improve your database and objects:
Check the execution plan of your query (menu Query -> Include Actual Execution Plan, or Ctrl + M)
Run the Database Engine Tuning Advisor to analyze a trace file (see SQL Server Profiler) and add extra indexes to improve database performance
Check with SQL Server Profiler whether your queries are generating deadlocks on the tables you are using to get the information. It is good practice to use hints in your queries to avoid lock problems and other behaviors that in certain scenarios you want to avoid.
See the attached links to better understand what I mean:
Understanding Execution Plan
Usage of Hints
Tuning Options available in SQL Server
I hope the information helps.
Assuming this is SQL Server, try:
WITH CTE AS
(SELECT r.resourceFirstName,
r.resourceLastName,
a.eventDateTime,
CONVERT(char(1), a.eventType) as eventType,
CONVERT(varchar(5), a.reasonCode) as reasonCode,
r.extension,
GETDATE() AS ciscoDate,
RANK() OVER (PARTITION BY r.resourceFirstName, r.resourceLastName
ORDER BY a.eventDateTime DESC) RN
FROM CCX1.db_cra.dbo.Resource r
INNER JOIN CCX1.db_cra.dbo.AgentStateDetail a
ON r.resourceID = a.agentID AND a.eventDateTime >= (GETDATE() - 1)
where r.active = 1)
SELECT resourceFirstName, resourceLastName, eventDateTime, eventType, reasonCode, extension, ciscoDate
into #temp_Agent
FROM CTE
WHERE RN=1
ORDER BY resourceLastName, resourceFirstName ASC
I am trying to calculate a year-to-date total.
I tried this script; it returns the correct values but it's too slow:
select
T.Delivery_month,
T.[Delivery_Year],
Sales_Organization,
SUM(QTY) as Month_Total,
COALESCE(
(
select SUM(s2.QTY)
FROM stg.Fact_DC_AGG s2
where
s2.Sales_Organization = T.Sales_Organization
and s2.[Delivery_Year]=T.[Delivery_Year]
AND s2.Delivery_month<= T.Delivery_month
),0) as YTD_Total
from stg.Fact_DC_AGG T
group by
T.Delivery_month,
T.[Delivery_Year],
Sales_Organization
ORDER BY
Sales_Organization,T.[Delivery_Year],
T.Delivery_month
I modified it in order to optimize it, but it returns wrong values with duplicates:
select
T.Delivery_month,
T.[Delivery_Year],
Sales_Organization,
SUM(QTY) as Month_Total,
COALESCE(SUM(s2.QTY), 0) as YTD_Total
from stg.Fact_DC_AGG T
INNER JOIN stg.Fact_DC_AGG s2
ON
s2.Sales_Organization = T.Sales_Organization
and s2.[Delivery_Year]=T.[Delivery_Year]
AND s2.Delivery_month<= T.Delivery_month
group by
T.Delivery_month,
T.[Delivery_Year],
Sales_Organization
ORDER BY
Sales_Organization,T.[Delivery_Year],
T.Delivery_month
How can I optimize the first query, or correct the second script?
To optimize the performance of your query, first remove the correlated sub-query and use a join instead. You also need to generate an actual execution plan for your query and identify the required missing indexes.
NOTE: Missing indexes can drag down your SQL Server performance, so be sure to review your actual query execution plans and identify the right indexes to add.
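As a rough, untested sketch of that approach, using the table and column names from the question: pre-aggregate the monthly totals once, then join that aggregate to itself to build the running total, which avoids the duplicate rows caused by joining the raw fact table to itself.

;WITH Monthly AS
(
    SELECT Sales_Organization, [Delivery_Year], Delivery_month, SUM(QTY) AS Month_Total
    FROM stg.Fact_DC_AGG
    GROUP BY Sales_Organization, [Delivery_Year], Delivery_month
)
SELECT m.Delivery_month,
       m.[Delivery_Year],
       m.Sales_Organization,
       m.Month_Total,
       SUM(m2.Month_Total) AS YTD_Total   -- running total up to and including the current month
FROM Monthly m
INNER JOIN Monthly m2
        ON m2.Sales_Organization = m.Sales_Organization
       AND m2.[Delivery_Year] = m.[Delivery_Year]
       AND m2.Delivery_month <= m.Delivery_month
GROUP BY m.Delivery_month, m.[Delivery_Year], m.Sales_Organization, m.Month_Total
ORDER BY m.Sales_Organization, m.[Delivery_Year], m.Delivery_month;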
For more information, please visit the following links.
1) SQL Server Basic Performance Tuning Tips and Tricks
2) Create Missing Index From the Actual Execution Plan
I was just tidying up some SQL when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is: does the order of the joins make any performance difference? For the above query I would have done
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other 2 tables to that one
Join order in SQL2008R2 server does unquestionably affect query performance, particularly in queries where there are a large number of table joins with where clauses applied against multiple tables.
Although the join order is changed during optimisation, the optimiser doesn't try all possible join orders. It stops when it finds what it considers a workable plan, as the very act of optimisation uses precious resources.
We have seen queries that were performing like dogs (1min + execution time) come down to sub second performance just by changing the order of the join expressions. Please note however that these are queries with 12 to 20 joins and where clauses on several of the tables.
The trick is to set your order to help the query optimiser figure out what makes sense. You can use FORCE ORDER but that can be too rigid. Try to make sure that your join order starts with the tables where the WHERE clauses will reduce the data the most.
No, the JOIN order is changed during optimization.
The only caveat is the Option FORCE ORDER which will force joins to happen in the exact order you have them specified.
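For illustration, here is a trimmed-down version of the query from the question with that hint applied (only the hint itself is the point of this sketch):

SELECT jm.IMEI, jm.MaxSpeedKM, jm.DistanceKM
FROM dbo.Reporting_WebUsers AS wu
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE wu.isActive = 1
  AND j.JourneyDistance > 0
OPTION (FORCE ORDER);  -- joins are evaluated in the order written instead of being reordered by the optimiser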
I have a clear example of an inner join affecting performance. It is a simple join between two tables. One has 50+ million records, the other has 2,000. If I select from the smaller table and join the larger, it takes 5+ minutes.
If I select from the larger table and join the smaller it takes 2 min 30 seconds.
This is with SQL Server 2012.
To me this is counter intuitive since I am using the largest dataset for the initial query.
Usually not. I'm not 100% sure this applies verbatim to SQL Server, but in Postgres the query planner reserves the right to reorder the inner joins as it sees fit. The exception is when you reach a threshold beyond which it's too expensive to investigate changing their order.
JOIN order doesn't matter; the query engine will reorganize the order based on index statistics and other factors.
To test, do the following:
select Include Actual Execution Plan and run the first query
change the JOIN order and run the query again
compare the execution plans
They should be identical as the query engine will reorganize them according to other factors.
As noted in another answer, you could use OPTION (FORCE ORDER) to force exactly the order you want, but it may not be the most efficient one.
As a general rule of thumb, the JOIN order should put the table with the fewest records first and the table with the most records last, since in some DBMS engines the order can make a difference, especially when the FORCE ORDER hint is used to help limit the results.
Wrong. In SQL Server 2005 it definitely matters, since you are limiting the dataset from the beginning of the FROM clause. If you start with 2,000 records instead of 2 million, your query is faster.
I want to do an offset, like: records 0 to 10000, then 10000 to 20000, and so on. How do I modify this query to add an offset? Also, how can I improve this query's performance?
SELECT
CASE
WHEN c.DataHoraUltimaAtualizacaoILR >= e.DataHoraUltimaAtualizacaoILR AND c.DataHoraUltimaAtualizacaoILR >= t.DataHoraUltimaAtualizacaoILR THEN c.DataHoraUltimaAtualizacaoILR
WHEN e.DataHoraUltimaAtualizacaoILR >= c.DataHoraUltimaAtualizacaoILR AND e.DataHoraUltimaAtualizacaoILR >= t.DataHoraUltimaAtualizacaoILR THEN e.DataHoraUltimaAtualizacaoILR
WHEN t.DataHoraUltimaAtualizacaoILR >= c.DataHoraUltimaAtualizacaoILR AND t.DataHoraUltimaAtualizacaoILR >= e.DataHoraUltimaAtualizacaoILR THEN t.DataHoraUltimaAtualizacaoILR
ELSE c.DataHoraUltimaAtualizacaoILR
END AS 'updated_at',
p.Email,
c.ID_Cliente,
p.Nome,
p.DataHoraCadastro,
p.Sexo,
p.EstadoCivil,
p.DataNascimento,
getdate() as [today],
datediff (yy,p.DataNascimento,getdate()) as 'Idade',
datepart(month,p.DataNascimento) as 'MesAniversario',
e.Bairro,
e.Cidade,
e.UF,
c.CodLoja as codloja_cadastro,
t.DDD,
t.Numero
FROM
PessoaFisica p
LEFT JOIN
Cliente c ON (c.ID_Pessoa = p.ID_PessoaFisica)
LEFT JOIN
Loja l ON (CAST(l.CodLoja AS integer) = CAST(c.CodLoja AS integer))
LEFT JOIN
PessoaEndereco pe ON (pe.ID_Pessoa = p.ID_PessoaFisica)
LEFT JOIN
Endereco e ON (e.ID_Endereco = pe.ID_Endereco)
LEFT JOIN
PessoaTelefone pt ON (pt.ID_Pessoa = p.ID_PessoaFisica)
LEFT JOIN
Telefone t ON (t.ID_Telefone = pt.ID_Telefone)
WHERE
p.Email IS NOT NULL
AND p.Email <> ''
--and p.Email = 'aline.salles@uol.com.br'
GROUP BY
p.Email, c.ID_Cliente, p.Nome, p.EstadoCivil, p.DataHoraCadastro,
c.CodLoja, p.Sexo, e.Bairro, p.DataNascimento, e.Cidade, e.UF,
t.DDD, t.Numero, c.DataHoraUltimaAtualizacaoILR, e.DataHoraUltimaAtualizacaoILR,
t.DataHoraUltimaAtualizacaoILR
ORDER BY
updated_at DESC
Overall Process
If you have access to a more modern SQL Server version, then you could set up a process to copy the raw data to a new database on a daily basis. This might initially be an exact copy of the source database, just for staging the data. Then build a transformation process, using stored procedures or perhaps SSIS for high performance. That process would transform your data into your desired end state, and load it into the final database.
The copy process could be replication, but if your staging database is SQL Server 2005 or above, then you could also build a simple SSIS job to perform the copy. Run that job as a scheduled task (SQL Agent) on a daily basis. You could combine the two - load data, then transform - but if using SSIS, then I recommend keeping these as separate SSIS packages, which will help with debugging problems. In the scheduled task you could run the two packages back-to-back.
Performance
You'll need good indexing on the table, but indexing alone is not sufficient. Casting CodLoja as an integer will prevent you from using indexes on that field. If you need to store those as strings for some other reason, then consider adding calculated columns,
ALTER TABLE xyz Add CodLojaAsInt as (CAST(CodLoja as int))
Then place an index on that new computed column. The problem is that any function call in an ON or WHERE clause will cause SQL Server to scan the entire table and convert every single row, instead of seeking into an index.
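For example, assuming the computed column has been added to both Loja and Cliente as above (the index names here are made up):

CREATE INDEX IX_Loja_CodLojaAsInt ON Loja (CodLojaAsInt);
CREATE INDEX IX_Cliente_CodLojaAsInt ON Cliente (CodLojaAsInt);

-- the join can then compare the indexed columns directly, with no CAST in the ON clause:
-- LEFT JOIN Loja l ON (l.CodLojaAsInt = c.CodLojaAsInt)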
After searching and looking over my problem again, @sfuqua helped me with this solution. Basically I'll create some more organized tables in my local DB, pull the abstract/ugly data from the remote DB, and process it locally into new tables.
I'm going to use Elasticsearch to speed up the indexing and queries.
It sounds like you're trying to emulate MySQL's SELECT ... LIMIT X,Y feature. SQL Server doesn't have that. In SQL Server 2005+, you can use ROW_NUMBER() in a subquery. Since you're on 2000, however, you're going to have to do it one of the hard ways.
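For reference, a minimal sketch of the ROW_NUMBER() approach on 2005+, using the same placeholder names (Table, SortColumn, @StartRow, @PageSize) as the snippet below:

SELECT *
FROM (SELECT t.*,
             ROW_NUMBER() OVER (ORDER BY SortColumn) AS rn
      FROM [Table] t) AS numbered
WHERE rn > @StartRow
  AND rn <= @StartRow + @PageSize
ORDER BY SortColumn;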
The way I've always done it is like this:
SELECT ... FROM Table WHERE PK IN
(SELECT TOP (@PageSize) PK FROM Table WHERE PK NOT IN
(SELECT TOP (@StartRow) PK FROM Table ORDER BY SortColumn)
ORDER BY SortColumn)
ORDER BY SortColumn
I recommend also trying a rewrite with EXISTS instead of IN and seeing which works better. You'll have to use EXISTS if you have compound primary keys.
That code and the other solutions are here.
I am hitting a brick wall with something I'm trying to do.
I'm trying to perform a complex query and return the results to a vbscript (vbs) record set.
In order to speed up the query I create temporary tables and then use those tables in the main query (this gives a speed boost of around 1200% over just using sub-queries).
The problem is, the calling code seems to ignore the main query, only 'seeing' the result of the very first command (i.e. it returns a 'records affected' figure).
For example, given a query like this..
delete from temp
select * into temp from sometable where somefield = somefilter
select sum(someotherfield) from yetanothertable where account in (select * from temp)
The calling code only seems to 'see' the returned result of 'delete from temp'; I can't access the data that the third command is returning.
(Obviously the SQL query above is pseudo/fake; the real query is large and its content is not relevant to the question being asked. I need to solve this problem, as without being able to use a temporary table the query goes from taking 3 seconds to 6 minutes!)
Edit: I know I could get around this by making multiple calls to ADODB.Connection's Execute (one call to empty the temp tables, one to create them again, and finally one to get the data), but I'd rather find a more elegant way of doing it.
Edit 2: Below is the actual SQL code I've ended up with. I'm just adding it for the curiosity of people who have replied. It doesn't use NOCOUNT, as I'd already settled on a solution which works for me. It is also probably badly written; it evolved over time from something more basic. I could probably improve it myself, but as it works and returns data extremely quickly I have stuck with it (for now).
Here's the SQL.
Here's the code where it's called. My chosen solution is to run the first query into a third temp table, then run a SELECT * on that table from the code, and then a DELETE FROM, from the code...
I make no claims about being a 'good' SQL scripter (self-taught via necessity mostly), and the database is not very well designed (a mix of old and new tables; the old tables are not relational and contain numerical and date values stored as strings).
Here is the original (slow) query...
select
name,
program_name,
sum(handle) + sum(refund) as [Total Sales],
sum(refund) as Refunds,
sum(handle) as [Net Sales],
sum(credit - refund) as Payout,
cast(sum(comm) as money) as commission
from
(select accountnumber,program_name,
cast(credit_amount as money) as credit,cast(refund_amt as money) as refund,handle, handle * (
(select commission from amtotecommissions
where _date = a._date
and pool_type = (case when a.pool_type in ('WP','WS','PS','WPS') then 'WN' else a.pool_type end)
and program_name = a.program_name) / 100) as comm
from amtoteaccountactivity a where _date = '#yy/#mm/#dd' and transaction_type = 'Bet'
and accountnumber not in ('5067788','5096272') /*just to speed the query up a bit. I know these accounts aren't included*/
) a,
ews_db.dbo.amtotetrack t
where (a.accountnumber in (select accountno from ews_db.dbo.get_all_customers where country = 'US')
or a.accountnumber in ('5122483','5092147'))
and t.our_code = a.program_name collate database_default
and t.tracktype = 2
group by name,program_name
I suspect that with the right SQL and indexes you should be able to get equal performance with a single SELECT, however there isn't enough information in the original question to be able to give guidance on that.
I think you'll be best off doing this as a stored procedure and calling that.
CREATE PROCEDURE get_Count
    @somefilter int
AS
    delete from temp;
    select * into temp from sometable where somefield = @somefilter;
    select sum(someotherfield) from yetanothertable
    where account in (select * from temp);
However, rewriting the query to avoid using IN the way you are, in favour of a JOIN, will probably fix the performance issue. Use the execution plan to see what's going on and optimise from there. For example, the following
select sum(transactions.value) from transactions
inner join user on transactions.user=user.id where user.name='Some User'
is much quicker than
select sum(transactions.value) from transactions
where user in (SELECT id from user where user.name='Some User')
because the amount of rows scanned in the second example will be the entire table, whereas in the first the indexes can be used.
Rev1
Taking the slow SQL posted, it appears that there are full table scans going on where the SQL states WHERE .. IN, e.g.
where (a.accountnumber in (select accountno from ews_db.dbo.get_all_customers))
The above will pull in lots of records which may not be required. This, together with the other nested table selects, is not allowing the optimiser to pull in only the records that match, as it would when using JOIN at the outer level.
When building these type of complex queries I generally start with the inner detail, because we need to have the inner detail so we can perform joins and aggregate operations.
What I mean by this is if you have a typical DB with customers that have orders that create transactions that contain items then I would start with the items and pull in the rest of the detail with joins.
By way of example only I suggest building the query more like the following:
select name,
       program_name,
       SUM(handle) + SUM(refund) AS [Total Sales],
       SUM(refund) AS Refunds,
       SUM(handle) AS [Net Sales],
       SUM(credit - refund) AS Payout,
       CAST(SUM(comm) AS money) AS commission
FROM ews_db.dbo.get_all_customers AS cu
INNER JOIN amtoteaccountactivity AS a ON a.accountnumber = cu.accountno
INNER JOIN ews_db.dbo.amtotetrack AS track ON track.our_code = a.program_name
INNER JOIN amtotecommissions AS co ON co.program_name = a.program_name
WHERE cu.country = 'US'
  AND track.tracktype = 2
  AND a.transaction_type = 'Bet'
  AND a._date = '#yy/#mm/#dd'
  AND co._date = a._date
  AND co.pool_type = (case when a.pool_type in ('WP','WS','PS','WPS') then 'WN' else a.pool_type end)
GROUP BY name, program_name, co.commission
NOTE: The above is not functional and is for illustration purposes. I'd need to have the database online to build the real query. I'm hoping you'll get the general idea and build from there.
My top tip for complex queries that don't work is simply to completely start again throwing away what you've already got. Sometimes I will do this three or four times when building a really tricky query.
Always build these queries gradually starting from the most detail and working outwards. Inspect the results at each stage because it helps visualise what the data are.
If you could come to a common data structure for all the selects you could UNION ALL them together with perhaps selecting a constant in each union so you know where the data was coming from - kinda like
select '1',col1,col2,'' from table1
UNION ALL
select '2',col1,col2,col3 from table2
I just solved my original problem (that I came up against again today on a different query) in a slightly hacky way...
Conn.Execute(split(query,";")(0))
set rs = Conn.Execute(split(query,";")(1))
Works perfectly!
Edit: I just noticed that the first comment on my original question also provided a quick fix (SET NOCOUNT ON). I forgot about that. Well, there is this and that. I had tried to get the query working without the temporary table, but I couldn't get anywhere near the same performance as with it.
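For completeness, here is a rough sketch of that quick fix applied to the pseudo-query from the original question (the table and column names are just the placeholders from above): with SET NOCOUNT ON at the top of the batch, the DELETE and INSERT stop emitting 'rows affected' result sets, so a single Execute call returns the final SELECT directly.

SET NOCOUNT ON;  -- suppress the 'rows affected' messages from the earlier statements

DELETE FROM temp;

INSERT INTO temp
SELECT * FROM sometable WHERE somefield = @somefilter;

SELECT SUM(someotherfield) FROM yetanothertable
WHERE account IN (SELECT account FROM temp);  -- assuming temp holds the account values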