I am trying to speed up my query's execution time. What is wrong with my query, and what is a better way to optimize it?
The TransactionEntry table has 2 million records.
The Transaction table has 5 billion records.
Here is my query. If I remove the TotalPrice column, I get results in 10 seconds.
--Total Quantity
SELECT
items.ItemLookupCode,sum(transactionsEntry.Quantity) Quantity,sum(transactionsEntry.Quantity*transactionsEntry.Price) TotalPrice
into
##temp_TotalPrice
FROM
(
SELECT
TransactionNumber,StoreID,Time
FROM
[HQMatajer].[dbo].[Transaction]
WHERE
Time>=CONVERT(datetime,'2015-01-01 00:00:00',102) and Time<=CONVERT(datetime,'2015-12-31 23:59:59',102)
) transactions
left join [HQMatajer].[dbo].[TransactionEntry] transactionsEntry
ON transactionsEntry.TransactionNumber=transactions.TransactionNumber and transactionsEntry.StoreID=transactions.StoreID
Left join [HQMatajer].[dbo].[Item] items
ON transactionsEntry.ItemID=items.ID
Group By items.ItemLookupCode
Order by items.ItemLookupCode
If I execute the query above, it produces the result in 22 seconds, which is too long.
When I execute the subquery alone (shown below), it takes 11 seconds.
(
SELECT
TransactionNumber,StoreID,Time
FROM
[HQMatajer].[dbo].[Transaction]
WHERE
Time>=CONVERT(datetime,'2015-01-01 00:00:00',102) and Time<=CONVERT(datetime,'2015-12-31 23:59:59',102)
) transactions
I have created one index on the TransactionEntry table:
`TransactionNumber,StoreID,ItemID,Quantity,Price`
One index on the Transaction table:
`Time,TransactionNumber,StoreID`
One index on the Item table:
`ID`
Execution Plan
The clustered index on TransactionEntry accounts for 59% of the cost. Its column name is AutoID.
I am assuming this is for SQL Server 2005 or later. On SQL Server 2000, you can use a temp table with a proper index instead of the CTE.
Also, since you are getting the values from [HQMatajer].[dbo].[TransactionEntry] and [HQMatajer].[dbo].[Item], why is a LEFT JOIN used?
Avoid subqueries. I have reframed the query; please check it and let me know whether performance improved.
;WITH transactions
AS
(
SELECT
TransactionNumber,StoreID,Time
FROM
[HQMatajer].[dbo].[Transaction]
WHERE
Time>=CONVERT(datetime,'2015-01-01 00:00:00',102) and Time<=CONVERT(datetime,'2015-12-31 23:59:59',102)
)
SELECT
items.ItemLookupCode,sum(transactionsEntry.Quantity) Quantity,sum(transactionsEntry.Quantity*transactionsEntry.Price) TotalPrice
into
##temp_TotalPrice
FROM [HQMatajer].[dbo].[Item] items INNER JOIN [HQMatajer].[dbo].[TransactionEntry] transactionsEntry
ON transactionsEntry.ItemID=items.ID
WHERE EXISTS (SELECT 1 FROM transactions WHERE transactionsEntry.TransactionNumber=transactions.TransactionNumber and transactionsEntry.StoreID=transactions.StoreID)
Group By items.ItemLookupCode
Order by items.ItemLookupCode
This is your query, simplified and formatted a bit (the subquery makes no difference):
select i.ItemLookupCode,
sum(te.Quantity) as quantity,
sum(te.Quantity * te.Price) as TotalPrice
into ##temp_TotalPrice
from [HQMatajer].[dbo].[Transaction] t left join
[HQMatajer].[dbo].[TransactionEntry] te
on te.TransactionNumber = t.TransactionNumber and
te.StoreID = t.StoreID left join
[HQMatajer].[dbo].[Item] i
on te.ItemID = i.ID
where t.Time >= '2015-01-01' and
t.Time < '2016-01-01'
group by i.ItemLookupCode
order by i.ItemLookupCode;
For this query, you want indexes on Transaction(Time, TransactionNumber, StoreId), TransactionEntry(TransactionNumber, StoreId, ItemId, Quantity, Price), and Item(Id, ItemLookupCode).
Even with the right indexes, this is processing a lot of data, so I would be surprised if this reduced the time to a few seconds.
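One detail in the rewrite above is worth spelling out: it uses a half-open date range (`>= '2015-01-01'` and `< '2016-01-01'`) instead of an upper bound of `23:59:59`. Here is a minimal sketch of why that matters, using Python's built-in sqlite3 as a stand-in for SQL Server. The `txn` table and its rows are made up for illustration, and SQLite compares these timestamps as text, so treat it as a demonstration of the idea rather than of SQL Server's datetime handling.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (TransactionNumber INTEGER, Time TEXT)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?)",
    [
        (1, "2015-01-01 00:00:00.000"),
        (2, "2015-12-31 23:59:59.000"),
        (3, "2015-12-31 23:59:59.500"),  # inside the last second of the year
        (4, "2016-01-01 00:00:00.000"),  # first instant of the next year
    ],
)

# Closed range ending at 23:59:59, as in the original question.
closed = [r[0] for r in conn.execute(
    "SELECT TransactionNumber FROM txn "
    "WHERE Time >= '2015-01-01 00:00:00.000' "
    "AND Time <= '2015-12-31 23:59:59.000' "
    "ORDER BY TransactionNumber")]

# Half-open range: start inclusive, end exclusive.
half_open = [r[0] for r in conn.execute(
    "SELECT TransactionNumber FROM txn "
    "WHERE Time >= '2015-01-01' AND Time < '2016-01-01' "
    "ORDER BY TransactionNumber")]

print(closed)     # transaction 3 is missing
print(half_open)  # transaction 3 is included, transaction 4 is not
```

The closed range silently drops the row stamped half a second before midnight, while the half-open range keeps it and still excludes the first row of 2016.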
This query takes too much time because the entry is inserted into the temporary table three times, which increases the time. If we insert the records into another table and then query it, or make it a CTE, the cost decreases.
I am writing a SQL query that gets all the transaction types from one table and counts the frequency of each transaction type from the other table.
My query is this:
with CTE as
(
select a.trxType,a.created,b.transaction_key,b.description,a.mode
FROM transaction_data AS a with (nolock)
RIGHT JOIN transaction_types b with (nolock) ON b.transaction_key = a.trxType
)
SELECT COUNT (trxType) AS Frequency, description as trxType,mode
from CTE where created >='2017-04-11' and created <= '2018-04-13'
group by trxType ,description,mode
The transaction_types table contains all the types of transactions only and transaction_data contains the transactions which have occurred.
The problem I am facing is that even though it's the RIGHT join, it does not select all the records from the transaction_types table.
I need to select all the transactions from the transaction_types table and show the number of counts for each transaction, even if it's 0.
Please help.
LEFT JOIN is so much easier to follow.
I think you want:
select tt.transaction_key, tt.description, t.mode, count(t.trxType)
from transaction_types tt left join
transaction_data t
on tt.transaction_key = t.trxType and
t.created >= '2017-04-11' and t.created <= '2018-04-13'
group by tt.transaction_key, tt.description, t.mode;
Notes:
Use reasonable table aliases! a and b mean nothing. t and tt are abbreviations of the table name, so they are easier to follow.
t.mode will be NULL for non-matching rows.
The condition on dates needs to be in the ON clause. Otherwise, the outer join is turned into an inner join.
LEFT JOIN is easier to follow (at least for people whose native language reads left-to-right) because it means "keep all the rows in the table you have already read".
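To see the ON-versus-WHERE point in action, here is a small sketch using Python's built-in sqlite3 as a stand-in for SQL Server. The sample rows are invented, and the `mode` column is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_types (transaction_key INTEGER, description TEXT);
CREATE TABLE transaction_data (trxType INTEGER, created TEXT);
INSERT INTO transaction_types VALUES (1, 'deposit'), (2, 'withdrawal'), (3, 'transfer');
INSERT INTO transaction_data VALUES
    (1, '2017-05-01'),
    (1, '2017-06-01'),
    (2, '2016-01-01');  -- outside the date range; type 3 has no rows at all
""")

# Date filter in WHERE: rows with NULL t.created are discarded after the join,
# so the outer join behaves like an inner join.
where_rows = conn.execute("""
    SELECT tt.description, COUNT(t.trxType)
    FROM transaction_types tt
    LEFT JOIN transaction_data t ON tt.transaction_key = t.trxType
    WHERE t.created >= '2017-04-11' AND t.created <= '2018-04-13'
    GROUP BY tt.transaction_key, tt.description
    ORDER BY tt.transaction_key
""").fetchall()

# Date filter in ON: unmatched types survive with a count of 0.
on_rows = conn.execute("""
    SELECT tt.description, COUNT(t.trxType)
    FROM transaction_types tt
    LEFT JOIN transaction_data t
        ON tt.transaction_key = t.trxType
       AND t.created >= '2017-04-11' AND t.created <= '2018-04-13'
    GROUP BY tt.transaction_key, tt.description
    ORDER BY tt.transaction_key
""").fetchall()

print(where_rows)
print(on_rows)
```

With the filter in WHERE only `deposit` comes back; with the filter in ON all three types are listed, and `withdrawal` and `transfer` show a count of 0.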
Here is my SQL command :
SELECT
ts.CHECK_NUMBER,
ts.CUSTOMER_NAME,
ts.COMPANY_NAME,
( SELECT COUNT(*)
FROM TRANSACTION_ORDER too
WHERE too.CHECK_NUMBER = ts.CHECK_NUMBER
) as NB_OF_ORDERS
FROM
TRANSACTION_SUMMARY ts
ORDER BY
ts.BUSINESS_DATE
It takes very long to render the data. We are talking about a minimum of 3,000 transactions, and for each one we have to count the orders.
Is there any better solution?
It is taking too long because a subquery in the SELECT list can be executed once for each row returned by the outer query. If your outer query returns 50,000 rows, this inner query may be executed 50,000 times, which is a performance killer.
You should try something like this:
SELECT
ts.CHECK_NUMBER
,ts.CUSTOMER_NAME
,ts.COMPANY_NAME
,ISNULL(O.Total, 0) AS NB_OF_ORDERS
FROM TRANSACTION_SUMMARY ts
LEFT JOIN --<-- use INNER JOIN if you only want records with some orders
( SELECT CHECK_NUMBER, COUNT(*) AS Total
FROM TRANSACTION_ORDER
GROUP BY CHECK_NUMBER
) as O
ON ts.CHECK_NUMBER = O.CHECK_NUMBER
ORDER BY ts.BUSINESS_DATE
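If you want to convince yourself that the rewrite returns the same counts, here is a quick check using Python's built-in sqlite3 as a stand-in. The rows are made up, only two of the summary columns are included, and SQLite's `COALESCE` plays the role of T-SQL's `ISNULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TRANSACTION_SUMMARY (CHECK_NUMBER INTEGER, CUSTOMER_NAME TEXT);
CREATE TABLE TRANSACTION_ORDER (CHECK_NUMBER INTEGER, ITEM TEXT);
INSERT INTO TRANSACTION_SUMMARY VALUES (100, 'Ann'), (101, 'Bob'), (102, 'Cy');
INSERT INTO TRANSACTION_ORDER VALUES (100, 'tea'), (100, 'cake'), (101, 'coffee');
""")

# Original shape: one correlated COUNT(*) per summary row.
correlated = conn.execute("""
    SELECT ts.CHECK_NUMBER,
           (SELECT COUNT(*) FROM TRANSACTION_ORDER o
             WHERE o.CHECK_NUMBER = ts.CHECK_NUMBER) AS NB_OF_ORDERS
    FROM TRANSACTION_SUMMARY ts
    ORDER BY ts.CHECK_NUMBER
""").fetchall()

# Rewritten shape: aggregate the orders once, then outer join to the summary.
joined = conn.execute("""
    SELECT ts.CHECK_NUMBER, COALESCE(o.Total, 0) AS NB_OF_ORDERS
    FROM TRANSACTION_SUMMARY ts
    LEFT JOIN (SELECT CHECK_NUMBER, COUNT(*) AS Total
               FROM TRANSACTION_ORDER
               GROUP BY CHECK_NUMBER) o
        ON ts.CHECK_NUMBER = o.CHECK_NUMBER
    ORDER BY ts.CHECK_NUMBER
""").fetchall()

print(correlated)
print(joined)
```

Both queries list check 102 with zero orders, which is what the LEFT JOIN plus the null replacement is there for.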
I have inherited a stored procedure which performs joins across eight tables, some of which contain hundreds of thousands of rows, then selects the top ten entries from the result of that join.
I have enough information at the start of the procedure to select those ten rows from a single table, and then perform those joins on those ten rows, rather than on hundreds of thousands of intermediate rows.
How do I select those top ten rows and then only do joins on those ten rows, instead of performing joins on all of the thousands of rows in the table?
You could try:
SELECT * FROM
(SELECT TOP 10 * FROM your_table
ORDER BY your_condition) p
INNER JOIN second_table t
ON p.field = t.field
The optimizer may not be able to perform the top 10 first if you have inner joins, since it can't be sure that the inner joins won't exclude rows later on. It would be a bug if it selected 10 rows from the main table, and then only returned 7 rows at the end because of a join. Using Marco's rewrite may gain you performance for this reason since you're expressly stating that it's safe to limit the rows before the joins.
If your query is sufficiently complicated, the query plan optimizer may run out of time finding a good plan. It's only given a few hundred milliseconds, and with even a few joins there are probably thousands of different ways it can execute the query (different join orders, etc). If this is the case, you'll benefit from storing the first 10 rows in a temp table first, and then using that later like this:
select top 10 *
into #MainResults
from MyTable
order by your_condition;
select *
from #MainResults r
join othertable t
on t.whatever = r.whatever;
I've seen cases where this second approach has made a HUGE difference.
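Here is the same two-step pattern as a runnable sketch, using Python's built-in sqlite3 as a stand-in for SQL Server. The columns `score`, `main_id` and `note` are hypothetical, and SQLite spells `SELECT TOP 10 ... INTO #MainResults` as `CREATE TEMP TABLE ... AS SELECT ... LIMIT 10`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, score INTEGER)")
conn.execute("CREATE TABLE othertable (main_id INTEGER, note TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 101)])
conn.executemany("INSERT INTO othertable VALUES (?, ?)",
                 [(i, f"note {i}") for i in range(1, 101)])

# Step 1: materialize only the ten rows of interest.
conn.execute("""
    CREATE TEMP TABLE MainResults AS
    SELECT * FROM MyTable ORDER BY score DESC LIMIT 10
""")

# Step 2: join the other table against those ten rows only.
rows = conn.execute("""
    SELECT r.id, r.score, t.note
    FROM MainResults r
    JOIN othertable t ON t.main_id = r.id
    ORDER BY r.score DESC
""").fetchall()

print(len(rows))
print(rows[0], rows[-1])
```

The join in step 2 never sees more than ten driving rows, whatever the size of `MyTable`.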
You can also use a CTE to define the top X and then use it.
For example, this data.se query limits the result to the top 40 tags:
with top40 as (
select top 40 t.id, t.tagname
from tags t, posttags pt
where pt.tagid = t.id
group by t.tagname, t.id
order by count(pt.postid) desc
),
myanswers as(
select p.parentid, p.score
from posts p
where
p.owneruserid = ##UserID## and
p.communityowneddate is null
)
select t40.tagname as 'Tag', sum(p1.score) as 'Score',
case when sum(p1.score) >= 15 then ':-)' else ':-(' end as 'Status'
from top40 t40, myanswers p1, posttags pt1
where
pt1.postid = p1.parentid and
pt1.tagid = t40.id
group by t40.tagname
order by sum(p1.score) desc
Frequently I encounter a situation like this, where I need to join a big table to a certain transformation of a table.
I have made an example with a big table and a smaller prices table.
Enter the table CarPrices, which has prices per car brand/model with starting and ending dates. I want to join all sold cars to the sales price in the CarPrices table, on the criterion SaleDate BETWEEN PriceStartingDate and PriceEndingDate, but if there is no price for the period, I want to join to the newest price that is found.
I can accomplish it like this but it is terribly slow:
WITH CarPricesTransformation AS (
SELECT CarBrand, CarModel, PriceStartingDate,
CASE WHEN row_number() OVER (PARTITION BY CarBrand, CarModel
ORDER BY PriceStartingDate DESC) = 1
THEN NULL ELSE PriceEndingDate END PriceEndingDate,
Price
FROM CarPrices
)
SELECT SUM(Price)
FROM LargeCarDataBase C
INNER JOIN CarPricesTransformation P
ON C.CarBrand = P.CarBrand
AND C.CarModel = P.CarModel
AND C.SaleDate >= P.PriceStartingDate
AND (C.SaleDate <= P.PriceEndingDate OR P.PriceEndingDate IS NULL)
A reliable way to do it quicker is to forget about making a VIEW and create a stored procedure instead, where I first prepare the smaller prices table as a temporary table with the correct clustered index, and then make the join to that. This is much faster. But I would like to stick with a view.
Any thoughts...?
You can't make a "smaller prices table" since the price depends on the sale date. Also, why the CTE in the first place?
Select
Sum(Coalesce(ActivePrice.Price, LatestPrice.Price))
From
LargeCarDataBase As Sales
Left Outer Join CarPrices As ActivePrice
On Sales.CarBrand = ActivePrice.CarBrand
And Sales.CarModel = ActivePrice.CarModel
And ((Sales.SaleDate >= ActivePrice.PriceStartingDate)
And ((Sales.SaleDate <= ActivePrice.PriceEndingDate)
Or (ActivePrice.PriceEndingDate Is Null)))
Left Outer Join CarPrices As LatestPrice
On Sales.CarBrand = LatestPrice.CarBrand
And Sales.CarModel = LatestPrice.CarModel
And LatestPrice.PriceEndingDate Is Null
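Here is a small sketch of how the COALESCE fallback behaves, using Python's built-in sqlite3 as a stand-in for SQL Server. The rows are invented, and it assumes the newest price row for each brand/model is the one with a NULL PriceEndingDate, which is what the second join relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CarPrices (CarBrand TEXT, CarModel TEXT,
                        PriceStartingDate TEXT, PriceEndingDate TEXT, Price INTEGER);
CREATE TABLE LargeCarDataBase (CarBrand TEXT, CarModel TEXT, SaleDate TEXT);
INSERT INTO CarPrices VALUES
    ('VW', 'Golf', '2020-01-01', '2020-12-31', 100),
    ('VW', 'Golf', '2022-01-01', NULL,         120);  -- assumed newest price
INSERT INTO LargeCarDataBase VALUES
    ('VW', 'Golf', '2020-06-01'),  -- inside the first price period
    ('VW', 'Golf', '2021-06-01'),  -- in the gap: no active price
    ('VW', 'Golf', '2022-03-01');  -- inside the open-ended period
""")

per_sale = conn.execute("""
    SELECT Sales.SaleDate, COALESCE(ActivePrice.Price, LatestPrice.Price)
    FROM LargeCarDataBase AS Sales
    LEFT OUTER JOIN CarPrices AS ActivePrice
        ON Sales.CarBrand = ActivePrice.CarBrand
       AND Sales.CarModel = ActivePrice.CarModel
       AND Sales.SaleDate >= ActivePrice.PriceStartingDate
       AND (Sales.SaleDate <= ActivePrice.PriceEndingDate
            OR ActivePrice.PriceEndingDate IS NULL)
    LEFT OUTER JOIN CarPrices AS LatestPrice
        ON Sales.CarBrand = LatestPrice.CarBrand
       AND Sales.CarModel = LatestPrice.CarModel
       AND LatestPrice.PriceEndingDate IS NULL
    ORDER BY Sales.SaleDate
""").fetchall()

total = sum(price for _, price in per_sale)
print(per_sale)
print(total)
```

The sale that falls in the gap between the two price periods picks up the newest price through the second join, and the other two sales use the price that was active on their date.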
Have you tried Indexed Views?
The results of an indexed view are automatically persisted to disk, so you can retrieve them very quickly.
CREATE VIEW [dbo].[SuperFastCarPrices] WITH SCHEMABINDING AS
SELECT C.CarBrand,
C.CarModel,
C.SaleDate,
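SUM(P.Price) AS Price,
COUNT_BIG(*) AS RowCnt -- required when an indexed view uses GROUP BY
FROM dbo.CarPrices P -- SCHEMABINDING requires two-part names
INNER JOIN dbo.LargeCarDataBase C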
ON C.CarBrand = P.CarBrand
AND C.CarModel = P.CarModel
AND C.SaleDate >= P.PriceStartingDate
AND (P.PriceEndingDate IS NULL OR C.SaleDate <= P.PriceEndingDate)
GROUP BY C.CarBrand, C.CarModel, C.SaleDate
GO
CREATE UNIQUE CLUSTERED INDEX [IDX_SuperFastCarPrices]
ON [dbo].[SuperFastCarPrices](CarBrand, CarModel, SaleDate)
You can then select directly from this view, which will return records at the same speed as selecting from a table.
There is the downside that indexed views slow down changes to the underlying tables. If you are worried about the cost of inserting records into the table LargeCarDataBase after this view has been created, you can create an index on columns CarBrand, CarModel and SaleDate which should speed up insertion and update on this table.
For more on Indexed Views see Microsoft's Article.
I'm fetching data for my grid like this
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
I also need the total count for the pagination.
There are two options.
1- Do another fetch
SELECT count(*) FROM dbo.Orders
2- Put the count statement in the query
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
(SELECT count(*) FROM dbo.Orders) as Count
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
Which way should I go?
Of the two methods you've put forward, the first (separate query) is better. The second method means the count will appear in every row returned, which is unnecessary. Also, if the query returns 20 rows, the select count(*) may be executed 20 times (if I remember right; this could depend on which database engine you're using).
Additionally, depending on how much traffic you're envisaging and how big the table is likely to get, you can improve upon this by caching the result of select count(*) somewhere, and then refreshing it upon insertions / deletions to the table.
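One possible way to keep such a cached count fresh is a pair of triggers. This is only a sketch in Python's built-in sqlite3, not SQL Server syntax, and the `OrderCount` table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER);

-- Hypothetical one-row table that caches COUNT(*) of Orders.
CREATE TABLE OrderCount (n INTEGER NOT NULL);
INSERT INTO OrderCount VALUES (0);

-- Refresh the cached value on every insert and delete (fires once per row).
CREATE TRIGGER orders_ins AFTER INSERT ON Orders
BEGIN
    UPDATE OrderCount SET n = n + 1;
END;
CREATE TRIGGER orders_del AFTER DELETE ON Orders
BEGIN
    UPDATE OrderCount SET n = n - 1;
END;
""")

conn.executemany("INSERT INTO Orders (CustomerID) VALUES (?)",
                 [(c,) for c in (7, 7, 8, 9, 9)])
conn.execute("DELETE FROM Orders WHERE CustomerID = 9")

cached = conn.execute("SELECT n FROM OrderCount").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(cached, actual)
```

Reading the count for the pager then touches a single row instead of scanning `Orders`, at the price of a little extra work on each insert and delete.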
If this is for SQL Server 2005 or higher, one of the best ways to get pagination is to use a Common Table Expression.
CREATE PROC MyPaginatedDataProc
@pageNumber INT -- assumed to be zero-based
AS
WITH OrdersCTE (CustomerID, OrderTime, ProductID, Quantity, RowNumber)
AS
(
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
ROW_NUMBER() OVER (ORDER BY OrderItems.OrderID) AS RowNumber
FROM
dbo.Orders INNER JOIN dbo.OrderItems ON Orders.ID = OrderItems.OrderID
)
SELECT
CustomerID,
OrderTime,
ProductID,
Quantity
FROM
OrdersCTE
WHERE
RowNumber BETWEEN (@pageNumber * 10) + 1 AND ((@pageNumber + 1) * 10) -- ROW_NUMBER() starts at 1
Otherwise for getting the total row count, I'd use a separate query like Mailslut said.
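The page arithmetic is easy to get wrong by one. Because ROW_NUMBER() starts at 1, a zero-based page number p with page size n covers rows p*n + 1 through (p + 1)*n. Here is a sketch that checks this, using Python's built-in sqlite3 as a stand-in for SQL Server (it needs SQLite 3.25 or newer for window functions). The helper name `fetch_page` and the sample data are made up.

```python
import sqlite3

PAGE_SIZE = 10

def fetch_page(conn, page_number):
    """Return one page of joined rows; page_number is zero-based."""
    first = page_number * PAGE_SIZE + 1   # ROW_NUMBER() starts at 1
    last = (page_number + 1) * PAGE_SIZE
    return conn.execute("""
        WITH OrdersCTE AS (
            SELECT o.CustomerID, oi.ProductID,
                   ROW_NUMBER() OVER (ORDER BY oi.OrderID, oi.ProductID) AS RowNumber
            FROM Orders o
            INNER JOIN OrderItems oi ON o.ID = oi.OrderID
        )
        SELECT CustomerID, ProductID, RowNumber
        FROM OrdersCTE
        WHERE RowNumber BETWEEN ? AND ?
        ORDER BY RowNumber
    """, (first, last)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.execute("CREATE TABLE OrderItems (OrderID INTEGER, ProductID INTEGER)")
# 25 orders with one item each, so there are 25 joined rows in total.
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(i, 500 + i) for i in range(1, 26)])
conn.executemany("INSERT INTO OrderItems VALUES (?, ?)", [(i, 1000 + i) for i in range(1, 26)])

print([row[2] for row in fetch_page(conn, 0)])  # row numbers on the first page
print(len(fetch_page(conn, 2)))                 # the last page is only partly filled
```

Page 0 returns row numbers 1 through 10, page 2 returns the remaining five rows, and any later page comes back empty.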
If you are using Oracle, you can use COUNT(*) OVER ( ) CNT. This one is more efficient, as it takes a single table scan.
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
COUNT(*) OVER ( ) CNT
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
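For what it's worth, an empty OVER () also works outside Oracle. Here is a sketch using Python's built-in sqlite3 (window functions need SQLite 3.25 or newer); the sample rows and the `TotalRows` alias are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE OrderItems (OrderID INTEGER, ProductID TEXT);
INSERT INTO Orders VALUES (1, 7), (2, 8);
INSERT INTO OrderItems VALUES (1, 'A'), (1, 'B'), (2, 'C');
""")

# COUNT(*) OVER () attaches the size of the whole result set to every row.
rows = conn.execute("""
    SELECT o.CustomerID, oi.ProductID, COUNT(*) OVER () AS TotalRows
    FROM Orders o
    INNER JOIN OrderItems oi ON o.ID = oi.OrderID
    ORDER BY oi.ProductID
""").fetchall()

print(rows)
```

Note that the count is the number of joined rows (3 here), not the number of rows in `Orders` (2), so it is the figure a pager over this result set needs.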
As @Mailslut suggests, you should probably use two queries. However, you should probably add a WHERE clause to the query that fetches the data, so you only fetch the data that you actually need to show (unless you are caching it).
If more than one thread is accessing the database at a time, you will also need to somehow make sure that the count is kept in sync with the database.
I would consider something different, because what you are trying to do is not very simple, but quite necessary. Have you considered using the SQL Server row_number function? This way you will know how many records there are by looking at the max row_number returned, but also in the order you want.
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
ROW_NUMBER() OVER(ORDER BY Orders.CustomerId) rn
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID