SQL simple nested select query is getting blocked - sql

In the system I'm using the below query in order to get a portion of my table's data in order to implement pagination. The table contains around 100 records only right now but it will grow into 1 million+ records later on.
SELECT
Id AS ActualMapping_Id,
BudgetPhase AS ActualMapping_BudgetPhase,
FromBH AS ActualMapping_FromBH,
ToBH AS ActualMapping_ToBH,
FromBI AS ActualMapping_FromBI,
ToBI AS ActualMapping_ToBI,
FromSI1 AS ActualMapping_FromSI1,
ToSI1 AS ActualMapping_ToSI2,
FromSI2 AS ActualMapping_FromSI2,
ToSI2 AS ActualMapping_ToSI2,
DataType AS ActualMapping_DataType,
Status AS ActualMapping_Status,
MappingType AS ActualMapping_MappingType,
LastMappedBy AS ActualMapping_LastMappedBy,
LastMappedDate AS ActualMapping_LastMappedDate
FROM
(
SELECT
Id,
BudgetPhase,
FromBH,
ToBH,
FromBI,
ToBI,
FromSI1,
ToSI1,
FromSI2,
ToSI2,
DataType,
Status,
MappingType,
LastMappedBy,
LastMappedDate,
ROW_NUMBER() OVER (ORDER BY FromBH) AS RowNumber
FROM
ActualMapping
WHERE
DataType = 'Cost' and
Status = 'Active' and
MappingType = 'Static' and
BudgetPhase like '%some_text%' and
ToBH like '%some_text%' and
ToBI like '%some_text%' and
ToSI1 like '%some_text%' and
ToSI2 like '%some_text%' and
FromBH like '%some_text%' and
FROMBI like '%some_text%' and
FROMSI1 like '%some_text%' and
FROMSI2 like '%$some_text%'
) AS NumberedTable
WHERE
RowNumber BETWEEN 1 AND 50
The above query which works pretty fine (At least on ~100 records) but after a while it starts getting blocked. I can't understand the possible reason for it getting blocked but when that happens I won't be able to do a simple select query from the table unless I kill all those blocked queries which are piled up.
So my questions are:
Why does the query gets blocked after a while?
Is this a good/ok approach to implement pagination on sql side for a large data set? (possibly more than 1 million+ records)

Related

Query takes too long to run, how to optimize it?

The query structure: Helper-select in "with" clause - selects most recent entry using 'top 1 transaction_date'. Then does many joins. It takes too much time to run - what am I doing wrong?
CREATE VIEW [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView] AS
WITH TempTBLFactIvnItmDaily AS (
SELECT TOP 20
ITEM_NUMBER AS [InventoryItemNumber]
,CAST(FORMAT(TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey]
,BRANCH_PLANT_FHK AS [BranchPlantKey]
,BRANCH_PLANT_CODE AS [BranchPlantCode]
,CAST(QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand]
,TRANSACTION_DATE AS [Date]
,WAREHOUSE_LOCATION_FHK AS [WarehouseLocationKey]
,WAREHOUSE_LOCATION_CODE AS [WarehouseLocationCode]
,WAREHOUSE_LOT_NUMBER_CODE AS [WarehouseLotNumber]
,WAREHOUSE_LOT_NUMBER_FHK AS [WarehouseLotNumberKey]
,UNIT_OF_MEASURE AS [UnitOfMeasureName]
,UNIT_OF_MEASURE_PHK AS [UnitOfMeasureKey]
FROM dbo.RS_INV_ITEM_ON_HAND
-- below is where clause, choose only most recent entry
WHERE TRANSACTION_DATE = (SELECT TOP 1 TRANSACTION_DATE FROM dbo.RS_INV_ITEM_ON_HAND ORDER BY TRANSACTION_DATE DESC)
)
SELECT [InventoryItemNumber],
[DateKey],
[Date],
[BranchPlantCode] AS [BP],
[WarehouseLocationCode] AS [Location],
[QuantityOnHand],
[UnitOfMeasureName] AS [UoM],
CASE [WarehouseLotNumber]
WHEN 'Not Assigned' THEN NULL
ELSE [WarehouseLotNumber]
END
AS [Lot]
FROM TempTBLFactIvnItmDaily iioh
JOIN DWH.DimBranchPlant bp ON iioh.BranchPlantKey = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc ON iioh.WarehouseLocationKey = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimWarehouseLotNumber wlot ON iioh.WarehouseLotNumberKey = wlot.WarehouseLotNumber_PHK
JOIN DWH.DimUnitOfMeasure uom ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QuantityOnHand > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
GO
There are a lot of things that does not seems good. First of all, your base query must be a lot simpler. Something like this:
SELECT iioh.ITEM_NUMBER AS [InventoryItemNumber],
CAST(FORMAT(iioh.TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey],
iioh.TRANSACTION_DATE AS [Date],
iioh.BRANCH_PLANT_CODE AS [BP],
iioh.WAREHOUSE_LOCATION_CODE AS [Location],
CAST(iioh.QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand],
iioh.UNIT_OF_MEASURE AS [UoM],
NULLIF(iioh.WAREHOUSE_LOT_NUMBER_CODE, 'Not Assigned') AS [Lot]
FROM dbo.RS_INV_ITEM_ON_HAND iioh
JOIN DWH.DimBranchPlant bp
ON iioh.BranchPlantKey = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc
ON iioh.WarehouseLocationKey = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimUnitOfMeasure uom
ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QuantityOnHand > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
AND iioh.TRANSACTION_DATE = #TRANSACTION_DATE
For example, you are joining the DWH.DimWarehouseLotNumber but you are not extracting columns - do you really need it? Also, there are other columns which are not returned by the view - why to query them?
From, there you are first filtering by date and then y other fields, so your first TOP 20 records may be filtered by the next conditions - is this a behavior you want?
Also, do you really want this cast?
ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
It's better to use CONVERT, not FORMAT in performance aspect. Also, why not saving/materializing the TRANSACTION_DATE as INT (for example using a persisted computed column or just on CRUD) instead of calculating this value on each read?
Filtering by location code using LIKE clause can heart the performance, too. Why not adding a new column WareHouseLocationCodeType and set a same value for all locations satisfying this condition:
(wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
Then you can filter by this column in the view since this is very important for you. Also, you can create filter index on this column to increase the performance, more.
Also, you may want to create a inline-function instead a view and pass the date as parameter:
CREATE OR ALTER FUNCTION [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView]
(
#TRANSACTION_DATE datetime
)
RETURNS TABLE
AS
RETURN
(
SELECT iioh.ITEM_NUMBER AS [InventoryItemNumber],
CAST(FORMAT(iioh.TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey],
iioh.TRANSACTION_DATE AS [Date],
iioh.BRANCH_PLANT_CODE AS [BP],
iioh.WAREHOUSE_LOCATION_CODE AS [Location],
CAST(iioh.QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand],
iioh.UNIT_OF_MEASURE AS [UoM],
NULLIF(iioh.WAREHOUSE_LOT_NUMBER_CODE, 'Not Assigned') AS [Lot]
,iioh.TRANSACTION_DATE
FROM dbo.RS_INV_ITEM_ON_HAND iioh
JOIN DWH.DimBranchPlant bp
ON iioh.BranchPlantKey = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc
ON iioh.WarehouseLocationKey = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimUnitOfMeasure uom
ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QuantityOnHand > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
AND iioh.TRANSACTION_DATE = #TRANSACTION_DATE
)
Then call it like this:
SELECT TOP 20 *
FROM [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView] ('2020-12-04')
ORDER BY #TRANSACTION_DATE DESC
The query optimization is science today. If you want to find bottlenecks in your query you can follow some of these steps:
As the first step, enable statistics with these commands:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
Once you execute these commands in some query windows in the same window execute your query. When your query is executed switch to the Messages tab and you will see a lot of useful information like TIME execution, parse and compile-time and maybe the most interesting I/O reads.
As the second step, try to understand which table has a lot of reads, for example if you are expecting 10 rows from the query, but in some tables you have 10k or 100k logical reads something is wrong. That means for the 10 rows query execution from one table reads 10k pages. Obviously you are missing some index on this table, try to find which index you need.
If you are having some static values in where clause like the following one, then think about Filtered Index:
bp.BRANCH_PLANT_CODE = '96100' AND iioh.QuantityOnHand > 0
Not always, but in some cases conversion can break your indexes if you are casting them or using some other function in where clause like the following one, even you have an index on this column query optimizer will not use it in query execution:
CAST(iioh.UnitOfMeasureKey AS VARCHAR(100))
The last one, if you have OR logical operator in your query try to execute one by one part of your OR logical operator separately see to performance. This logical operator can really kill your query, and this is one example:
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
Once, you determine here that you don't have any issues you can go more further.

SQL Server Query Slow with Smaller Database

I have 2 tables:
asset - with id_asset, name, ticker (60k rows)
quote_close - with id_asset, refdate, quote_close (22MM rows)
I want to make a filter in name and ticker and return:
id_asset
name
ticker
min(refdate) of the id_asset
max(refdate) of the id_asset
quote_close on max(refdate) of the id_asset
I wrote this query:
WITH tableAssetFiltered AS
(
SELECT
id_asset, ticker, name
FROM
asset
WHERE
ticker LIKE ('%VALE%') AND name LIKE ('%PUT%')
)
SELECT
ast.id_asset, ast.ticker, ast.name,
xx.quote_close as LastQuote, xx.MinDate,
xx.refdate as LastDate
FROM
tableAssetFiltered ast
LEFT JOIN
(SELECT
qc.id_asset, qc.refdate, qc.quote_close, tm.MinDate
FROM
quote_close qc
INNER JOIN
(SELECT
t.id_asset, max(t.refdate) as MaxDate, min(t.refdate) as MinDate
FROM
(SELECT
qc.id_asset, qc.refdate, qc.quote_close
FROM
quote_close qc
WHERE
qc.id_asset IN (SELECT id_asset
FROM tableAssetFiltered)
) t
GROUP BY
t.id_asset) tm ON qc.id_asset = tm.id_asset
AND qc.refdate = tm.MaxDate
) xx ON xx.id_asset = ast.id_asset
ORDER BY
ast.ticker
The results with different filter in name and ticker are:
With ticker like ('%VALE%') AND name like ('%PUT%') it took 00:02:28 and returns 491 rows
With name like ('%PUT%') it took 00:00:02 and returns 16697 rows
With ticker like ('%VALE%') it took 00:00:02 and returns 1102 rows
With no likes it took 00:00:03 and returns 51847 rows
What I can't understand is that the query
SELECT id_asset,ticker, name
FROM Viper.dbo.asset
WHERE ticker like ('%VALE%') AND name like ('%PUT%')
took 00:00:00 to run.
Why does a smaller table took more time to run? Any solution to make it faster?
The slowness could be caused by many things, Hardware, network, caching, etc.
To make the query faster,
1. Make sure that there is an index on ticker.
2. Run update statistics on the table.
3. Try to find a way to remove the '%' at the beginning of the string.
This is okay: 'VALE%'
This will slow down your query: '%VALE'

SQL Server query issues

We have a SQL Server 2012 database with our live project including multiple tables and records, Basically We are facing a problem with SQL queries on a table where two same SQL queries. The first SQL query is taking less execution time and second SQL query is getting tremendously slow to execute.
I don't know why it is happening someone could help me to solve this problem?
Our two queries are given below....
First query (taking so much time to execute):
SELECT * FROM (SELECT TOP 10 TrackerResponse.EventName,TrackerResponse.ReceiveTime,ISNull(TrackerResponse.InputStatus,0) AS InputStatus,
TrackerResponse.Latitude,TrackerResponse.Longitude,TrackerResponse.Speed,
TrackerResponse.TrackerID,TrackerResponse.OdoMeter,TrackerResponse.Direction,
UserCar.CarNo FROM TrackerResponse
INNER JOIN UserCar ON (UserCar.TrackerID = TrackerResponse.TrackerID)
WHERE (TrackerResponse.EventName IS NOT NULL AND TrackerResponse.EventName<>'')
AND TrackerResponse.TrackerID = 112 Order By ID DESC) AS Events)
Second query (taking less execution time):
SELECT * FROM (SELECT TOP 10 TrackerResponse.EventName,TrackerResponse.ReceiveTime,ISNull(TrackerResponse.InputStatus,0) AS InputStatus,
TrackerResponse.Latitude,TrackerResponse.Longitude,TrackerResponse.Speed,
TrackerResponse.TrackerID,TrackerResponse.OdoMeter,TrackerResponse.Direction,
UserCar.CarNo FROM TrackerResponse
INNER JOIN UserCar ON (UserCar.TrackerID = TrackerResponse.TrackerID)
WHERE (TrackerResponse.EventName IS NOT NULL AND TrackerResponse.EventName<>'')
AND TrackerResponse.TrackerID = 56 Order By ID DESC) AS Events
I see Both queries are similar with an exception of tracker id..so my best guess would be ,this query might be a victim of Parameter sniffing..
Try using Option(Recompile) like below to see if this is the cause and follow article referred in above link for resolution
SELECT * FROM (SELECT TOP 10 TrackerResponse.EventName,TrackerResponse.ReceiveTime,ISNull(TrackerResponse.InputStatus,0) AS InputStatus,
TrackerResponse.Latitude,TrackerResponse.Longitude,TrackerResponse.Speed,
TrackerResponse.TrackerID,TrackerResponse.OdoMeter,TrackerResponse.Direction,
UserCar.CarNo FROM TrackerResponse
INNER JOIN UserCar ON (UserCar.TrackerID = TrackerResponse.TrackerID)
WHERE (TrackerResponse.EventName IS NOT NULL AND TrackerResponse.EventName<>'')
AND TrackerResponse.TrackerID = 56 Order By ID DESC
option(recompile)
AS Events)

SQL Query - combine 2 rows into 1 row

I have the following query below (view) in SQL Server. The query produces a result set that is needed to populate a grid. However, a new requirement has come up where the users would like to see data on one row in our app. The tblTasks table can produce 1 or 2 rows. The issue becomes when they're is two rows that have the same job_number but different fldProjectContextId (1 or 31). I need to get the MechApprovalOut and ElecApprovalOut columns on one row instead of two.
I've tried restructuring the query using CTE and over partition and haven't been able to get the necessary results I need.
SELECT TOP (100) PERCENT
CAST(dbo.Job_Control.job_number AS int) AS Job_Number,
dbo.tblTasks.fldSalesOrder, dbo.tblTaskCategories.fldTaskCategoryName,
dbo.Job_Control.Dwg_Sent, dbo.Job_Control.Approval_done,
dbo.Job_Control.fldElecDwgSent, dbo.Job_Control.fldElecApprovalDone,
CASE WHEN DATEDIFF(day, dbo.Job_Control.Dwg_Sent, GETDATE()) > 14
AND dbo.Job_Control.Approval_done IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 1
THEN 1 ELSE 0
END AS MechApprovalOut,
CASE WHEN DATEDIFF(day, dbo.Job_Control.fldElecDwgSent, GETDATE()) > 14
AND dbo.Job_Control.fldElecApprovalDone IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 31
THEN 1 ELSE 0
END AS ElecApprovalOut,
dbo.tblProjectContext.fldProjectContextName,
dbo.tblProjectContext.fldProjectContextId, dbo.Job_Control.Drawing_Info,
dbo.Job_Control.fldElectricalAppDwg
FROM dbo.tblTaskCategories
INNER JOIN dbo.tblTasks
ON dbo.tblTaskCategories.fldTaskCategoryId = dbo.tblTasks.fldTaskCategoryId
INNER JOIN dbo.Job_Control
ON dbo.tblTasks.fldSalesOrder = dbo.Job_Control.job_number
INNER JOIN dbo.tblProjectContext
ON dbo.tblTaskCategories.fldProjectContextId = dbo.tblProjectContext.fldProjectContextId
WHERE (dbo.tblTaskCategories.fldTaskCategoryName = N'Approval'
OR dbo.tblTaskCategories.fldTaskCategoryName = N'Re-Approval')
AND (CASE WHEN DATEDIFF(day, dbo.Job_Control.Dwg_Sent, GETDATE()) > 14
AND dbo.Job_Control.Approval_done IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 1
THEN 1 ELSE 0
END = 1)
OR (dbo.tblTaskCategories.fldTaskCategoryName = N'Approval'
OR dbo.tblTaskCategories.fldTaskCategoryName = N'Re-Approval')
AND (CASE WHEN DATEDIFF(day, dbo.Job_Control.fldElecDwgSent, GETDATE()) > 14
AND dbo.Job_Control.fldElecApprovalDone IS NULL
AND dbo.tblProjectContext.fldProjectContextID = 31
THEN 1 ELSE 0
END = 1)
ORDER BY dbo.Job_Control.job_number, dbo.tblTaskCategories.fldProjectContextId
The above query gives me the following result set:
I've created a work around via code (which I don't like but it works for now) where i've used code to populate a "temp" table the way i need it to display the data, that is, one record if duplicate job numbers to get the MechApprovalOut and ElecApprovalOut columns on one row (see first record in following screen shot).
Example:
With the desired result set and one row per job_number, this is how the form looks with the data and how I am using the result set.
Any help restructuring my query to combine duplicate rows with the same job number where MechApprovalOut and ElecApproval out columns are on one row is greatly appreciated! I'd much prefer to use a view on SQL then code in the app to populate a temp table.
Thanks,
Jimmy
What I would do is LEFT JOIN the main table to itself at the beginning of the query, matching on Job Number and Sales Order, such that the left side of the join is only looking at Approval task categories and the right side of the join is only looking at Re-Approval task categories. Then I would make extensive use of the COALESCE() function to select data from the correct side of the join for use later on and in the select clause. This may also be the piece you were missing to make a CTE work.
There is probably also a solution that uses a ranking/windowing function (maybe not RANK itself, but something that category) along with the PARTITION BY clause. However, as those are fairly new to Sql Server I haven't used them enough personally to be comfortable writing an example solution for you without direct access to the data to play with, and it would still take me a little more time to get right than I can devote to this right now. Maybe this paragraph will motivate someone else to do that work.

How to optimize this low-performance MySQL query?

I’m currently using the following query for jsPerf. In the likely case you don’t know jsPerf — there are two tables: pages containing the test cases / revisions, and tests containing the code snippets for the tests inside the test cases.
There are currently 937 records in pages and 3817 records in tests.
As you can see, it takes quite a while to load the “Browse jsPerf” page where this query is used.
The query takes about 7 seconds to execute:
SELECT
id AS pID,
slug AS url,
revision,
title,
published,
updated,
(
SELECT COUNT(*)
FROM pages
WHERE slug = url
AND visible = "y"
) AS revisionCount,
(
SELECT COUNT(*)
FROM tests
WHERE pageID = pID
) AS testCount
FROM pages
WHERE updated IN (
SELECT MAX(updated)
FROM pages
WHERE visible = "y"
GROUP BY slug
)
AND visible = "y"
ORDER BY updated DESC
I’ve added indexes on all fields that appear in WHERE clauses. Should I add more?
How can this query be optimized?
P.S. I know I could implement a caching system in PHP — I probably will, so please don’t tell me :) I’d just really like to find out how this query could be improved, too.
Use:
SELECT x.id AS pID,
x.slug AS url,
x.revision,
x.title,
x.published,
x.updated,
y.revisionCount,
COALESCE(z.testCount, 0) AS testCount
FROM pages x
JOIN (SELECT p.slug,
MAX(p.updated) AS max_updated,
COUNT(*) AS revisionCount
FROM pages p
WHERE p.visible = 'y'
GROUP BY p.slug) y ON y.slug = x.slug
AND y.max_updated = x.updated
LEFT JOIN (SELECT t.pageid,
COUNT(*) AS testCount
FROM tests t
GROUP BY t.pageid) z ON z.pageid = x.id
ORDER BY updated DESC
You want to learn how to use EXPLAIN. This will execute the sql statement, and show you which indexes are being used, and what row scans are being performed. The goal is to reduce the number of row scans (ie, the database searching row by row for values).
You may want to try the subqueries one at a time to see which one is slowest.
This query:
SELECT MAX(updated)
FROM pages
WHERE visible = "y"
GROUP BY slug
Makes it sort the result by slug. This is probably slow.