I have a very large table with around 50 million rows and 15 columns. Whenever I read, I always need all the columns, so I can't split them up. I have a clustered index on the table with 4 key columns, and I always read data using those keys.
But the performance is still slow. My queries are fairly simple, like this:
select
CountryId, RetailerID, FY,
sum(col1), sum(col2),.....sum(col15)
from mytable a
join product p on a.productid = p.id
join ......
join .....
join ......
join .....
Where .......
group by CountryId, RetailerID, FY
I'm not using any IN operators, subqueries, or inline functions here, which I know would obviously make it slow. I've looked at partitioning but I'm not sure about it. Can I get some performance improvement by partitioning?
Or is there anything else I can do?
I'm using SQL Server 2012 Enterprise Edition
Please help!
Thanks guys, I have solved this using aggregated tables. I run a job overnight to aggregate the data, which has limited the number of rows, and the reports are running fine now.
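In case it helps anyone, here is a minimal sketch of the approach (the aggregate table name and the sum columns are illustrative, not my exact schema):
-- summary table keyed the same way the reports group
CREATE TABLE mytable_agg (
    CountryId  int NOT NULL,
    RetailerID int NOT NULL,
    FY         int NOT NULL,
    col1_sum   decimal(18, 2) NOT NULL,
    col2_sum   decimal(18, 2) NOT NULL, -- ...one column per summed measure
    CONSTRAINT PK_mytable_agg PRIMARY KEY (CountryId, RetailerID, FY)
);
-- overnight job: rebuild the aggregates from the base table
TRUNCATE TABLE mytable_agg;
INSERT INTO mytable_agg (CountryId, RetailerID, FY, col1_sum, col2_sum)
SELECT CountryId, RetailerID, FY, SUM(col1), SUM(col2)
FROM mytable
GROUP BY CountryId, RetailerID, FY;
The reports then select from mytable_agg instead of the 50-million-row table.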
I have a serious performance issue when I execute a SQL statement that involves 3 tables, as follows:
TableA<----TableB---->TableC
In particular, these tables are in a data warehouse; the table in the middle is a dimension table, while the others are fact tables. TableA has about 9 million records, while TableC has about 3 million. The dimension table (TableB) has only 74 records.
The syntax of the query is very simple, as you can see, where TableA is called _PG, TableB is called _MDT, and TableC is called _FM:
SELECT _MDT.codiceMandato AS Customer,
       SUM(_FM.Totale) AS Revenue,
       SUM(_PG.ErogatoTotale) AS Paid
FROM _PG
INNER JOIN _MDT ON _PG.idMandato = _MDT.idMandato
INNER JOIN _FM ON _FM.idMandato = _MDT.idMandato
GROUP BY _MDT.codiceMandato
Actually, I have never seen this query finish :-(
_PG has a non-clustered index on idMandato, and so does _FM.
The _MDT table has a clustered index on idMandato,
and the execution plan is the following
As you can see, the bottleneck is the Stream Aggregate (33% of the cost) and the Merge Join (66% of the cost). In particular, the Stream Aggregate shows about 400 billion estimated rows!!
I don't know the reason, and I don't know how to proceed to solve this issue.
I'm using SQL Server 2016 SP1 installed on a virtual server running Windows Server 2012 Standard, with 4 CPU cores, 32 GB of RAM, and 1.5 TB on a dedicated volume made up of SAS disks with an SSD cache.
I hope somebody can help me understand.
Thanks in advance
The most likely cause is that you are getting a Cartesian product along two dimensions. Because both fact tables join to the dimension only on idMandato, every _PG row for a given mandato is matched with every _FM row for that same mandato, which multiplies the rows unnecessarily and inflates both sums. The solution is to aggregate before doing the join.
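As a rough sanity check on the plan's estimate, assuming rows are spread evenly across the 74 mandati: _PG contributes about 9,000,000 / 74 ≈ 120,000 rows per mandato and _FM about 3,000,000 / 74 ≈ 40,000, so the join yields roughly 120,000 × 40,000 ≈ 4.8 billion rows per mandato, and 74 × 4.8 billion ≈ 360 billion rows overall, the same order of magnitude as the ~400 billion estimated rows under the Stream Aggregate.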
You haven't provided sample data, but this is the idea:
SELECT m.codiceMandato AS Customer, f.Revenue, p.Paid
FROM _MDT m
INNER JOIN (SELECT idMandato, SUM(ErogatoTotale) AS Paid
            FROM _PG
            GROUP BY idMandato
           ) p ON p.idMandato = m.idMandato
INNER JOIN (SELECT idMandato, SUM(Totale) AS Revenue
            FROM _FM
            GROUP BY idMandato
           ) f ON f.idMandato = m.idMandato;
I'm not 100% sure this will fix the problem, because your data structure is not clear.
You can try doing a subquery between TableA and TableC without aggregation, and then joining this subquery with TableB and applying the GROUP BY:
SELECT _MDT.codiceMandato, SUM(A.Totale) AS Revenue, SUM(A.ErogatoTotale) AS Paid
FROM ( SELECT _PG.idMandato, _FM.Totale, _PG.ErogatoTotale
       FROM _PG
       INNER JOIN _FM ON _FM.idMandato = _PG.idMandato ) A
INNER JOIN _MDT ON A.idMandato = _MDT.idMandato
GROUP BY _MDT.codiceMandato
I need help improving the performance of a SQL query. I have tblThings and tblTexts. tblThings has around 10 TextID columns. My query left joins to tblTexts 10 times, which is quite slow.
Here is a simplified version of my query:
SELECT th.ThingID, th.DescTextID, th.ColourTextID, tDesc.Text AS [Desc], tCol.Text AS Colour
FROM tblThings th
LEFT JOIN tblTexts tDesc ON tDesc.TextID = th.DescTextID
LEFT JOIN tblTexts tCol ON tCol.TextID = th.ColourTextID
--and so on around 10 times
I would appreciate the help.
There is nothing wrong with the query. As you must look up the text in a text table (probably for internationalization reasons), you are forced to write the query this way.
There should be an index on tblTexts(TextID) of course.
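If it doesn't already exist, something along these lines would do (the index name is illustrative, and if TextID is already the clustered primary key of tblTexts this is unnecessary):
CREATE NONCLUSTERED INDEX IX_tblTexts_TextID
    ON tblTexts (TextID)
    INCLUDE ([Text]); -- covering: each of the ~10 lookups is answered from the index alone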
I was just tidying up some SQL when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is: does the order of the joins make any performance difference? For the above query I would have started with
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other 2 tables to that one
Join order in SQL Server 2008 R2 does unquestionably affect query performance, particularly in queries with a large number of table joins and WHERE clauses applied against multiple tables.
Although the join order is changed during optimisation, the optimiser doesn't try all possible join orders. It stops when it finds what it considers a workable solution, as the very act of optimisation uses precious resources.
We have seen queries that were performing like dogs (1 minute+ execution time) come down to sub-second performance just by changing the order of the join expressions. Note, however, that these are queries with 12 to 20 joins and WHERE clauses on several of the tables.
The trick is to set your order to help the query optimiser figure out what makes sense. You can use FORCE ORDER, but that can be too rigid. Try to make sure that your join order starts with the tables whose WHERE clauses will reduce the data the most.
No, the JOIN order is changed during optimization.
The only caveat is OPTION (FORCE ORDER), which will force the joins to happen in the exact order you have them specified.
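For example, applied to the query from the question (column list shortened), the hint goes at the end of the statement and the joins then execute exactly as written: wu, then jm, then j.
SELECT jm.IMEI, jm.MaxSpeedKM, jm.DistanceKM
FROM dbo.Reporting_WebUsers AS wu
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE wu.isActive = 1
OPTION (FORCE ORDER); -- disable join reordering for this statement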
I have a clear example of join order affecting performance. It is a simple join between two tables. One has 50+ million records, the other has 2,000. If I select from the smaller table and join the larger, it takes 5+ minutes.
If I select from the larger table and join the smaller, it takes 2 minutes 30 seconds.
This is with SQL Server 2012.
To me this is counterintuitive, since I am using the largest dataset for the initial query.
Usually not. I'm not 100% sure this applies verbatim to SQL Server, but in Postgres the query planner reserves the right to reorder inner joins as it sees fit. The exception is when you exceed a threshold beyond which it is too expensive to investigate reordering them.
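For what it's worth, in Postgres those thresholds are ordinary settings you can inspect (the comments show the defaults):
SHOW join_collapse_limit; -- above this many joined items, explicit JOINs keep their written order (default 8)
SHOW geqo_threshold;      -- from this many FROM items, the non-exhaustive genetic optimizer takes over (default 12)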
JOIN order doesn't matter; the query engine will reorganize the order based on index statistics and other factors.
To test this, do the following:
select Include Actual Execution Plan and run the first query
change the JOIN order and run the query again
compare the execution plans
They should be identical, as the query engine will reorganize them according to those other factors; see the sketch below.
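For instance, with the query from the question, the two variants to compare would be (column lists shortened):
-- variant 1: as written, starting from the users table
SELECT jm.IMEI
FROM dbo.Reporting_WebUsers AS wu
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j ON jm.WebUserJourneyId = j.WebUserJourneyId
-- variant 2: starting from the journey master table, as the asker suggested
SELECT jm.IMEI
FROM dbo.Reporting_JourneyMaster90 AS jm
INNER JOIN dbo.Reporting_WebUsers AS wu ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j ON jm.WebUserJourneyId = j.WebUserJourneyId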
As mentioned in another answer, you could use OPTION (FORCE ORDER) to use exactly the order you want, but it may not be the most efficient one.
As a general rule of thumb, the JOIN order should put the table with the fewest records first and the one with the most records last, as in some DBMS engines the order can make a difference, particularly if FORCE ORDER is used to help limit the results.
Wrong. In SQL Server 2005 it definitely matters, since you are limiting the dataset from the beginning of the FROM clause. If you start with 2,000 records instead of 2 million, it makes your query faster.
I have the following Query in SQL Server 2005:
SELECT
PRODUCT_ID
FROM
PRODUCTS P
WHERE
SUPPLIER_ORGANIZATION_ID = 13225
AND ACTIVE_FLAG = 'Y'
AND CONTAINS(*, 'FORMSOF(Inflectional, "%clip%") ')
What's interesting is that using this generates a Hash Match, whereas if I use a different SUPPLIER_ORGANIZATION_ID (an older supplier), it uses a Merge Join. Obviously the Hash Match is much slower than the Merge Join. What I don't get is why there is a difference, and what's needed to make it run faster?
FYI, there are about 5 million records in the PRODUCTS table. For the selected supplier organization id (13225), there are about 25,000 products.
Thanks in advance.
I'd try using the OPTIMIZE FOR Query Hint to force it one way or the other.
SELECT
    PRODUCT_ID
FROM
    PRODUCTS P
WHERE
    SUPPLIER_ORGANIZATION_ID = @Supplier_Organisation_Id
    AND ACTIVE_FLAG = 'Y'
    AND CONTAINS(*, @Keywords) -- e.g. @Keywords = N'FORMSOF(Inflectional, "clip")'
OPTION (OPTIMIZE FOR (@Supplier_Organisation_Id = 1000))
One other thing: your statistics might be out of date. The tipping point for automatic updates is often not low enough, meaning that the query plan chosen may not be ideal for your data. I'd suggest updating the statistics on your PRODUCTS table, and perhaps creating a job to do this on a regular basis if this turns out to be part of the problem.
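A one-off update would look like this (WITH FULLSCAN is optional; it is slower than the default sample but builds the most accurate histogram):
UPDATE STATISTICS PRODUCTS WITH FULLSCAN;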