ROW_NUMBER count returns invalid result - sql

I am quite new to SQL Server and trying to improve. I wrote a statement that fetches customers that are cancelled (where customerCancel is true).
Normally, when I count the total number of cancelled customers, I get 1050.
What I want is to show the first 100 users, but when I run the query below I only get 38. When I raise the RowNumber upper bound manually, the number of results increases, but it never matches the actual result. I will use this query for pagination.
My query:
SELECT
    COUNT(*) OVER() TotalRowCount,
    ID, customerNo, customerName, customerSurname, customerTitle, customerUnitList, customerTotalList
FROM
    (SELECT
        ROW_NUMBER() OVER(ORDER BY m.ID) RowNumber,
        COUNT(*) OVER() TotalRowCount,
        m.ID, m.customerNo, m.customerName, m.customerSurname, m.customerTitle,
        (SELECT COUNT(f.ID)
         FROM Invoices f
         WHERE f.Paid = 0
           AND f.custumerCancel = 0
           AND f.customerID = m.ID) AS customerUnitList,
        COALESCE((SELECT SUM(f.Total) AS InvoiceNo
                  FROM Invoices f
                  WHERE f.Paid = 0
                    AND f.custumerCancel = 0
                    AND f.customerID = m.ID), 0) AS customerTotalList
     FROM
        Customers m) flist
WHERE
    customerTotalList > 0
    AND RowNumber BETWEEN 1 AND 100
I tried several ways to fix it, with no luck.

Try this query:
SELECT *
FROM (SELECT COUNT(*) OVER() TotalRowCount,
             ROW_NUMBER() OVER(ORDER BY id) RowNumber,
             id,
             customerno,
             customername,
             customersurname,
             customertitle,
             customerunitlist,
             customertotallist
      FROM (SELECT m.id,
                   m.customerno,
                   m.customername,
                   m.customersurname,
                   m.customertitle,
                   (SELECT COUNT(f.id)
                    FROM invoices f
                    WHERE f.paid = 0
                      AND f.custumercancel = 0
                      AND f.customerid = m.id) AS customerUnitList,
                   ISNULL((SELECT SUM(f.total) AS InvoiceNo
                           FROM invoices f
                           WHERE f.paid = 0
                             AND f.custumercancel = 0
                             AND f.customerid = m.id), 0) AS customerTotalList
            FROM customers m) flist
      WHERE customertotallist > 0) x
WHERE rownumber BETWEEN 1 AND 100
You should apply the row-number filter for paging only after applying all your other filters; otherwise rows that have already been numbered are removed afterwards, leaving gaps in the page.
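A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25+ supports ROW_NUMBER()). The tables and data are hypothetical and simplified: 300 customers, of which only every third has an unpaid invoice. Numbering first and filtering afterwards keeps only the matching rows among the first 100 numbers; filtering first returns a full page.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the asker's Customers/Invoices tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (ID INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE Invoices (ID INTEGER PRIMARY KEY, customerID INTEGER, Total REAL, Paid INTEGER);
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(1, 301)])
# Only every third customer (3, 6, ..., 300) has an unpaid invoice.
conn.executemany("INSERT INTO Invoices (customerID, Total, Paid) VALUES (?, ?, 0)",
                 [(i, 10.0) for i in range(3, 301, 3)])

TOTALS = """
SELECT m.ID,
       COALESCE((SELECT SUM(f.Total) FROM Invoices f
                 WHERE f.Paid = 0 AND f.customerID = m.ID), 0) AS customerTotalList
FROM Customers m
"""

# Question's shape: number every customer, then filter -> a short page.
NUMBER_THEN_FILTER = f"""
SELECT ID FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RowNumber, t.*
    FROM ({TOTALS}) AS t
) AS x
WHERE customerTotalList > 0 AND RowNumber BETWEEN 1 AND 100
ORDER BY RowNumber
"""

# Answer's shape: filter first, then number only the surviving rows.
FILTER_THEN_NUMBER = f"""
SELECT ID FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RowNumber, t.*
    FROM ({TOTALS}) AS t
    WHERE customerTotalList > 0
) AS x
WHERE RowNumber BETWEEN 1 AND 100
ORDER BY RowNumber
"""

short_page = [r[0] for r in conn.execute(NUMBER_THEN_FILTER)]
full_page = [r[0] for r in conn.execute(FILTER_THEN_NUMBER)]
print(len(short_page), len(full_page))  # 33 100
```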

Putting the WHERE clause outside the query that computes ROW_NUMBER() means that some of the numbered rows can be filtered out afterwards. I bet that if you comment out the second-to-last line (customerTotalList > 0), you will always get 100 rows.
If you just want 100 rows, you can use:
SELECT TOP 100 ... ORDER BY RowNumber ASC

Which version of SQL Server do you have? This approach was the usual one before SQL Server 2012; from 2012 onward you have OFFSET ... FETCH for pagination scenarios.
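A minimal sketch of server-side paging, again with Python's sqlite3 as a stand-in engine and hypothetical data. SQLite has no OFFSET ... FETCH syntax, so it uses LIMIT/OFFSET; the SQL Server 2012+ form is shown in a comment. COUNT(*) OVER () is evaluated before the LIMIT, so each page also carries the total number of matching rows, which is what TotalRowCount in the question is for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, total REAL)")
# 250 hypothetical customers; only even IDs have an outstanding total.
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(i, 10.0 if i % 2 == 0 else 0.0) for i in range(1, 251)])

def fetch_page(page_no, page_size):
    """Return (total_matching_rows, ids_on_page) for a 1-based page number.

    An out-of-range page returns (0, []) because no rows come back to carry the count.
    """
    # SQLite spells paging as LIMIT/OFFSET; the SQL Server 2012+ equivalent is
    #   ORDER BY ID OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY
    rows = conn.execute(
        """
        SELECT COUNT(*) OVER () AS TotalRowCount, ID
        FROM Customers
        WHERE total > 0          -- filtering happens before counting and paging
        ORDER BY ID
        LIMIT ? OFFSET ?
        """,
        (page_size, (page_no - 1) * page_size),
    ).fetchall()
    total = rows[0][0] if rows else 0
    return total, [r[1] for r in rows]

total, ids = fetch_page(2, 50)
print(total, ids[0], ids[-1], len(ids))  # 125 102 200 50
```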

Related

CTE self-join slows down execution

I am using the following query in a stored procedure (SP).
DECLARE @DateFrom datetime = '01/01/1753',
@DateTo datetime = '12/31/9999'
BEGIN
WITH tmpTethers
AS
(
SELECT TL.str_systemid AS SystemCode,
ISNULL(ml.name, ml.location) AS [System],
TL.dte_created AS [Date],
TL.str_LengthId AS TetherRegId,
0 AS LengthCut,
ISNULL(TL.dbl_newlength, 0) AS LengthAdded,
CAST(0 AS FLOAT) AS RemainingLength,
1 AS Mode,
UT.description AS UOM
FROM OP_TetherLength AS TL
INNER JOIN master_location AS ML ON ML.location = TL.str_systemid
LEFT JOIN udc_type AS UT ON TL.lng_lengthuom = UT.udc
WHERE (TL.dte_dateadded BETWEEN @DateFrom AND @DateTo)
UNION ALL
SELECT RR.systemcode AS SystemCode,
ISNULL(ML.name, ML.location) AS [System],
RR.datecreated AS [Date],
RR.oms_repairid AS TetherRegId,
ISNULL(RR.cutlength, 0) AS LengthCut,
0 AS LengthAdded,
0 AS RemainingLength,
0 AS Mode,
UT.description AS UOM
FROM Repair_Registration AS RR
INNER JOIN master_location AS ML ON RR.systemcode = ml.location
LEFT JOIN udc_type AS UT ON RR.cutlength_uomid = UT.udc
WHERE --RR.cut_umbilical_tether = 0 AND
RR.cutbackrequired = 1 AND
(RR.datecreated BETWEEN @DateFrom AND @DateTo)
),
tmpOrderedTethers
AS
(
SELECT TOP 1000
SystemCode,
[System],
[Date],
TetherRegId,
LengthCut,
LengthAdded,
RemainingLength,
Mode,
UOM,
ROW_NUMBER() OVER(PARTITION BY SystemCode ORDER BY [Date] ) AS RowNumber
FROM tmpTethers
ORDER BY SystemCode
),
tmpFinalTethers
AS
(
SELECT SystemCode,
[System],
[Date],
TetherRegId,
LengthCut,
LengthAdded,
CASE
WHEN Mode = 1 THEN LengthAdded
ELSE 0 - LengthCut
END AS RemainingLength,
Mode,
UOM,
RowNumber
FROM tmpOrderedTethers
WHERE RowNumber = 1
UNION ALL
SELECT tmpOT.SystemCode,
tmpOT.[System],
tmpOT.[Date],
tmpOT.TetherRegId,
tmpOT.LengthCut,
tmpOT.LengthAdded,
CASE
WHEN tmpOT.Mode = 1 THEN /*tmpFT.RemainingLength +*/ tmpOT.LengthAdded
ELSE tmpFT.RemainingLength - tmpOT.LengthCut
END AS RemainingLength,
CASE
WHEN tmpOT.Mode = 1 OR tmpFT.Mode = 1 THEN 1
ELSE 0
END AS Mode,
tmpOT.UOM,
tmpOT.RowNumber
FROM tmpOrderedTethers AS tmpOT
INNER JOIN tmpFinalTethers AS tmpFT ON tmpFT.SystemCode = tmpOT.SystemCode AND
tmpFT.RowNumber = tmpOT.RowNumber - 1
),
---- FT - Previous
---- OT - Current
SELECT SystemCode,
[System],
[Date],
TetherRegId,
LengthCut,
LengthAdded,
RemainingLength,
UOM,
RowNumber
,ROW_NUMBER() OVER(PARTITION BY SystemCode ORDER BY [Date] desc) AS SortNumber
FROM tmpGetFinalTethers
ORDER BY SystemCode, SortNumber
OPTION (MAXRECURSION 1000)
END
In the query above, when I comment out the following part, the execution time drops and the data comes back fast:
SELECT tmpOT.SystemCode,
tmpOT.[System],
tmpOT.[Date],
tmpOT.TetherRegId,
tmpOT.LengthCut,
tmpOT.LengthAdded,
CASE
WHEN tmpOT.Mode = 1 THEN /*tmpFT.RemainingLength +*/ tmpOT.LengthAdded
ELSE tmpFT.RemainingLength - tmpOT.LengthCut
END AS RemainingLength,
CASE
WHEN tmpOT.Mode = 1 OR tmpFT.Mode = 1 THEN 1
ELSE 0
END AS Mode,
tmpOT.UOM,
tmpOT.RowNumber
FROM tmpOrderedTethers AS tmpOT
INNER JOIN tmpFinalTethers AS tmpFT ON tmpFT.SystemCode = tmpOT.SystemCode AND
tmpFT.RowNumber = tmpOT.RowNumber - 1
Please let me know how I can refine this.
It seems like you have row-by-row processing in your [tmpFinalTethers] and [tmpGetFinalTethers] CTEs.
Each row returned by [tmpFinalTethers] is based on [tmpOrderedTethers], and [tmpOrderedTethers]'s data is based on [tmpTethers]. Therefore the logic contained in [tmpOrderedTethers] and [tmpTethers] will be executed n times, where n is the number of rows returned by [tmpFinalTethers].
The reason is that CTEs are not materialized objects. They are not stored in memory or on disk, so they are re-executed each time you reference them outside of their declaration.
Loading the result set of [tmpOrderedTethers] into a temp table may help if you really need row-by-row processing for your task and have no other options.
Also, it seems like your [tmpFinalTethers] and [tmpGetFinalTethers] have the same logic inside. I am not sure what the purpose of that is. Maybe you can do the final select from [tmpFinalTethers] and get rid of [tmpGetFinalTethers].
Edited:
Try something like this:
;WITH tmpTethers AS (...),
tmpOrderedTethers AS (...)
SELECT * INTO #tmpOrderedTethers FROM tmpOrderedTethers
;WITH tmpFinalTethers AS (
SELECT ... FROM #tmpOrderedTethers WHERE ...
UNION ALL
SELECT ... FROM #tmpOrderedTethers tmpOT INNER JOIN ...
)
SELECT ... FROM tmpFinalTethers
Edited 2:
Since you have OPTION (MAXRECURSION 1000), I suppose you always get no more than 1000 rows. For that number of rows, your solution with a recursive CTE combined with a temp table will probably work. At least it should be better than a cursor, which consumes extra resources on top of the row-by-row processing. But if you need to process, say, 10,000 rows, then row-by-row processing is definitely not an appropriate solution and you should find another one.
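One set-based option (swapped in here, not taken from the answer) is a windowed running total, which SQL Server 2012+ supports as SUM() OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The sketch below uses Python's sqlite3 as a stand-in engine with hypothetical, simplified rows, and also shows the answer's temp-table step. It relies on two assumptions taken from the asker's tmpTethers: added rows have LengthCut = 0 and cut rows have LengthAdded = 0. Because the `tmpFT.RemainingLength +` term is commented out, an added row simply resets the remaining length, so each Mode = 1 row starts a new segment and the remaining length is a running sum within that segment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-in for the rows produced by tmpTethers.
-- Mode = 1: length added (LengthCut = 0); Mode = 0: a cut (LengthAdded = 0).
CREATE TABLE tethers (SystemCode TEXT, Date TEXT, LengthCut REAL,
                      LengthAdded REAL, Mode INTEGER);
INSERT INTO tethers VALUES
  ('A', '2020-01-01',  0, 100, 1),
  ('A', '2020-02-01', 10,   0, 0),
  ('A', '2020-03-01',  5,   0, 0),
  ('A', '2020-04-01',  0,  80, 1),
  ('A', '2020-05-01', 20,   0, 0),
  ('B', '2020-01-15',  7,   0, 0),
  ('B', '2020-02-15',  0,  50, 1),
  ('B', '2020-03-15',  4,   0, 0);

-- Materialize the numbered rows once (#tmpOrderedTethers in the T-SQL answer).
CREATE TEMP TABLE tmpOrderedTethers AS
SELECT *, ROW_NUMBER() OVER (PARTITION BY SystemCode ORDER BY Date) AS RowNumber
FROM tethers;
""")

# Row-by-row: the asker's recursive CTE, reading from the temp table.
# (SQLite writes WITH RECURSIVE; T-SQL just uses WITH.)
RECURSIVE = """
WITH RECURSIVE tmpFinalTethers AS (
    SELECT SystemCode, RowNumber, Mode,
           CASE WHEN Mode = 1 THEN LengthAdded ELSE 0 - LengthCut END AS RemainingLength
    FROM tmpOrderedTethers
    WHERE RowNumber = 1
    UNION ALL
    SELECT ot.SystemCode, ot.RowNumber,
           CASE WHEN ot.Mode = 1 OR ft.Mode = 1 THEN 1 ELSE 0 END,
           CASE WHEN ot.Mode = 1 THEN ot.LengthAdded
                ELSE ft.RemainingLength - ot.LengthCut END
    FROM tmpOrderedTethers AS ot
    INNER JOIN tmpFinalTethers AS ft
        ON ft.SystemCode = ot.SystemCode AND ft.RowNumber = ot.RowNumber - 1
)
SELECT SystemCode, RowNumber, RemainingLength
FROM tmpFinalTethers
ORDER BY SystemCode, RowNumber
"""

# Set-based: a running count of Mode = 1 rows gives a segment id; the remaining
# length is the running total of (LengthAdded - LengthCut) within each segment.
WINDOWED = """
WITH seg AS (
    SELECT SystemCode, RowNumber, LengthAdded - LengthCut AS delta,
           SUM(Mode) OVER (PARTITION BY SystemCode ORDER BY RowNumber
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SegmentId
    FROM tmpOrderedTethers
)
SELECT SystemCode, RowNumber,
       SUM(delta) OVER (PARTITION BY SystemCode, SegmentId ORDER BY RowNumber
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RemainingLength
FROM seg
ORDER BY SystemCode, RowNumber
"""

recursive_rows = conn.execute(RECURSIVE).fetchall()
windowed_rows = conn.execute(WINDOWED).fetchall()
print(recursive_rows == windowed_rows)  # True
```

The windowed version needs no recursion and no MAXRECURSION limit, so it should scale to far more rows than the recursive CTE, though the real gain on SQL Server depends on the plan it chooses.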

SQL - ROW_NUMBER that is used in a multi-condition LEFT JOIN

Two tables store different properties for each product: CTI_ROUTING_VIEW and ORD_MACH_OPS.
Both are organized by SPEC_NO > MACH_SEQ_NO, but the format of the sequence number is different in each table, so it can't be used for a JOIN. ORD_MACH_OPS has MACH_NO and PASS_NO, meaning that if a product goes through the same machine twice, the row with the higher sequence number will be PASS_NO 2, 3, etc. CTI_ROUTING_VIEW does not offer PASS_NO, but I can produce it with:
SELECT TOP (1000) [SPEC_NO]
,[SPEC_PART_NO]
,[MACH_NO]
,[MACH_SEQ_NO]
,[BLANK_WID]
,[BLANK_LEN]
,[NO_OUT_WID]
,[NO_OUT_LEN]
,[SU_MINUTES]
,[RUN_SPEED]
,[NO_COLORS]
,[PRINTDIEID]
,[CUTDIEID]
,ROW_NUMBER() OVER (PARTITION BY MACH_NO ORDER BY MACH_SEQ_NO) as PASS_NO
FROM [CREATIVE].[dbo].[CTI_ROUTING_VIEW]
I would think that I could use this artificial PASS_NO as a JOIN condition, but I can't seem to get it to come through. This is my first time using ROW_NUMBER() so I'm just wondering if I'm doing something wrong in the JOIN syntax.
SELECT rOrd.[SPEC_NO]
,rOrd.[MACH_SEQ_NO]
,rOrd.[WAS_REROUTED]
,rOrd.[NO_OUT]
,rOrd.[PART_COMP_FLG]
,rOrd.[SCHED_START]
,rOrd.[SCHED_STOP]
,rOrd.[MACH_REROUTE_FLG]
,rOrd.[MACH_DESCR]
,rOrd.REPLACED_MACH_NO
,rOrd.MACH_NO
,rOrd.PASS_NO
,rWip.MAX_TRX_DATETIME
,ISNULL(rWip.NET_FG_SUM*rOrd.NO_OUT,0) as NET_FG_SUM
,CASE
WHEN rCti.BLANK_WID IS NULL then 'N//A'
ELSE CONCAT(rCti.BLANK_WID, ' X ', rCti.BLANK_LEN)
END AS SIZE
,ISNULL(rCti.PRINTDIEID,'N//A') as PRINTDIEID
,ISNULL(rCti.CUTDIEID, 'N//A') as CUTDIEID
,rStyle.DESCR as STYLE
,ISNULL(rCti.NO_COLORS, 0) as NO_COLORS
,CAST(CONCAT(rOrd.ORDER_NO,'-',rOrd.ORDER_PART_NO) as varchar) as ORD_MACH_KEY
FROM [CREATIVE].[dbo].[ORD_MACH_OPS] as rOrd
LEFT JOIN (SELECT DISTINCT
[SPEC_NO]
,[SPEC_PART_NO]
,[MACH_NO]
,MACH_SEQ_NO
,[BLANK_WID]
,[BLANK_LEN]
,[NO_COLORS]
,[PRINTDIEID]
,[CUTDIEID]
,ROW_NUMBER() OVER (PARTITION BY MACH_NO ORDER BY MACH_SEQ_NO) as PASS_NO
FROM [CREATIVE].[dbo].[CTI_ROUTING_VIEW]) as rCti
ON rCti.SPEC_NO = rOrd.SPEC_NO
and rCti.MACH_NO =
CASE
WHEN rOrd.REPLACED_MACH_NO is null then rOrd.MACH_NO
ELSE rOrd.REPLACED_MACH_NO
END
and rCti.PASS_NO = rOrd.PASS_NO
LEFT JOIN INVENTORY_ITEM_TAB as rTab
ON rTab.SPEC_NO = rOrd.SPEC_NO
LEFT JOIN STYLE_DESCRIPTION as rStyle
ON rStyle.DESCR_CD = rTab.STYLE_CD
LEFT JOIN (
SELECT
JOB_NUMBER
,FORM_NO
,TRX_ORIG_MACH_NO
,PASS_NO
,SUM(GROSS_FG_QTY-WASTE_QTY) as NET_FG_SUM
,MAX(TRX_DATETIME) as MAX_TRX_DATETIME
FROM WIP_MACH_OPS
WHERE GROSS_FG_QTY <> 0
GROUP BY JOB_NUMBER, FORM_NO, TRX_ORIG_MACH_NO, PASS_NO) as rWip
ON rWip.JOB_NUMBER = rOrd.ORDER_NO
and rWip.FORM_NO = rOrd.ORDER_PART_NO
and rWip.TRX_ORIG_MACH_NO = rOrd.MACH_NO
and rWip.PASS_NO = rOrd.PASS_NO
WHERE rOrd.SCHED_START > DATEADD(DAY, -20, GETDATE())
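I fixed it by adding SPEC_NO as a second partition column, so that pass numbers restart for every spec instead of running across all specs that use the same machine:
ROW_NUMBER() OVER (PARTITION BY SPEC_NO, MACH_NO ORDER BY MACH_SEQ_NO) as PASS_NO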
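To see why the extra partition column matters, here is a small sketch with Python's sqlite3 and hypothetical routing rows (only the three columns involved). With PARTITION BY MACH_NO alone, pass numbers run across every spec that uses the machine, so they rarely line up with the per-spec PASS_NO in ORD_MACH_OPS; adding SPEC_NO restarts the count for each spec.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CTI_ROUTING_VIEW (SPEC_NO TEXT, MACH_NO TEXT, MACH_SEQ_NO INTEGER);
-- Hypothetical routings: spec S1 visits machine M1 twice, spec S2 visits it once.
INSERT INTO CTI_ROUTING_VIEW VALUES
  ('S1', 'M1', 10), ('S1', 'M2', 20), ('S1', 'M1', 30), ('S2', 'M1', 5);
""")

def pass_numbers(partition_cols):
    # partition_cols is a hard-coded column list, never user input.
    sql = f"""
        SELECT SPEC_NO, MACH_NO, MACH_SEQ_NO,
               ROW_NUMBER() OVER (PARTITION BY {partition_cols}
                                  ORDER BY MACH_SEQ_NO) AS PASS_NO
        FROM CTI_ROUTING_VIEW
        ORDER BY SPEC_NO, MACH_SEQ_NO
    """
    return conn.execute(sql).fetchall()

for row in pass_numbers("MACH_NO"):
    print(row)  # S1's two M1 passes get 2 and 3, because S2's M1 row is counted first
for row in pass_numbers("SPEC_NO, MACH_NO"):
    print(row)  # S1's two M1 passes get 1 and 2: numbering restarts per spec
```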

Compute SUM of SUM in SQL

In SQL, when I compute the SUM of a column that is itself a SUM based on a grouping, the total value is not correct. Here I want to compute the sum of Mand, which is SUM(Qty), but the final result isn't correct.
Select Sum(Mand) from (Select TrackingFactor1 As Number ,SUM(Qty) As Mand from(
Select t.TrackingFactor1,SUM(
CASE
WHEN Direction = 1 THEN t.MajorUnitQuantity
WHEN Direction = 2 THEN -t.MajorUnitQuantity
ELSE 0
END
) AS Qty,
ROW_NUMBER() OVER(ORDER BY i.Date, i.VoucherCreationDate, v.InventoryVoucherID, i.InventoryVoucherItemID) AS
RowNumber
from LGS3.InventoryVoucher AS v
INNER JOIN LGS3.InventoryVoucherItem i ON v.InventoryVoucherID = i.InventoryVoucherRef
LEFT OUTER JOIN LGS3.InventoryVoucherItemTrackingFactor t ON i.InventoryVoucherItemID = t.InventoryVoucherItemRef
GROUP BY
v.InventoryVoucherID,
v.Number,
i.Date,
i.VoucherCreationDate,
i.InventoryVoucherItemID,
i.CounterpartEntityText,
t.TrackingFactor1 ,
v.FiscalYearRef
)As A
group by TrackingFactor1
Having SUM(Qty)>0) As AA

ROW_NUMBER() Query Plan SORT Optimization

The query below accesses the Votes table, which contains over 30 million rows. The result set is then filtered with WHERE n = 1. In the query plan, the SORT operation for the ROW_NUMBER() window function accounts for 95% of the query's cost, and the query takes over 6 minutes to complete.
I already have an index on same_voter, eid, country that includes vid, nid, sid, vote, time_stamp and new to cover the WHERE clause.
Is the most efficient fix to add an index on vid, nid, sid, new DESC, time_stamp DESC, or is there an alternative to ROW_NUMBER() that achieves the same results more efficiently?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
ROW_NUMBER() OVER (
PARTITION BY v.vid, v.nid, v.sid ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
AND v.eid <= @EId
AND v.eid > (@EId - 5)
AND v.country = @Country
One possible alternative to using ROW_NUMBER():
SELECT
V.vid,
V.nid,
V.sid,
V.vote,
V.time_stamp,
V.new,
V.eid
FROM
dbo.Votes V
LEFT OUTER JOIN dbo.Votes V2 ON
V2.vid = V.vid AND
V2.nid = V.nid AND
V2.sid = V.sid AND
V2.same_voter <> 1 AND
V2.eid <= @EId AND
V2.eid > (@EId - 5) AND
V2.country = @Country AND
(V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
V.same_voter <> 1 AND
V.eid <= @EId AND
V.eid > (@EId - 5) AND
V.country = @Country AND
V2.vid IS NULL
The query basically says: get all rows matching your criteria, then join to any other rows that match the same criteria but would be ranked higher within the partition based on the new and time_stamp columns. If no such row is found, this row is ranked highest and is the one you want; in that case V2.vid will be NULL. I'm assuming that vid can otherwise never be NULL. If it's a nullable column in your table, you'll need to adjust the last line of the query.
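A small sketch, using Python's sqlite3 and hypothetical Votes rows, that checks the anti-join against the ROW_NUMBER() version. The same_voter, eid and country filters are left out for brevity; in the real query they appear both in the WHERE clause and in the join, as in the answer. One behavioral difference is worth knowing: if two rows in a partition tie on both new and time_stamp, the anti-join returns both of them, while ROW_NUMBER() keeps exactly one, chosen arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Votes (vid INTEGER, nid INTEGER, sid INTEGER,
                    new INTEGER, time_stamp TEXT, vote INTEGER);
-- Hypothetical rows for two (vid, nid, sid) partitions.
INSERT INTO Votes VALUES
  (1, 1, 1, 0, '2020-01-01', 10),
  (1, 1, 1, 1, '2020-01-02', 11),   -- highest new, then latest time_stamp: wins
  (1, 1, 1, 1, '2020-01-01', 12),
  (2, 1, 1, 0, '2020-03-01', 20),   -- latest time_stamp: wins
  (2, 1, 1, 0, '2020-02-01', 21);
""")

ROW_NUMBER_SQL = """
SELECT vote FROM (
    SELECT v.*, ROW_NUMBER() OVER (
               PARTITION BY vid, nid, sid ORDER BY new DESC, time_stamp DESC) AS n
    FROM Votes v
) AS ranked
WHERE n = 1
"""

# Keep a row only if no other row in its partition ranks higher.
ANTI_JOIN_SQL = """
SELECT V.vote
FROM Votes V
LEFT OUTER JOIN Votes V2
    ON V2.vid = V.vid AND V2.nid = V.nid AND V2.sid = V.sid
   AND (V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE V2.vid IS NULL
"""

def winners(sql):
    return sorted(r[0] for r in conn.execute(sql))

print(winners(ROW_NUMBER_SQL), winners(ANTI_JOIN_SQL))  # [11, 20] [11, 20]
```

Which version is faster on 30 million rows depends on the indexes and the plan SQL Server picks, so comparing both execution plans on the real data is the reliable test.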

Fastest way to check if the most recent result for a patient has a certain value

Mssql < 2005
I have a complex database with lots of tables, but for now only the patient table and the measurements table matter.
What I need is the number of patients whose most recent value of 'code' matches a certain value. Also, datemeasurement has to be after '2012-04-01'. I have solved this in two different ways:
SELECT
COUNT(P.patid)
FROM T_Patients P
WHERE P.patid IN (SELECT patid
FROM T_Measurements M WHERE (M.code ='xxxx' AND result= 'xx')
AND datemeasurement =
(SELECT MAX(datemeasurement) FROM T_Measurements
WHERE datemeasurement > '2012-01-04' AND patid = M.patid
GROUP BY patid)
GROUP by patid)
AND:
SELECT
COUNT(P.patid)
FROM T_Patient P
WHERE 1 = (SELECT TOP 1 case when result = 'xx' then 1 else 0 end
FROM T_Measurements M
WHERE (M.code ='xxxx') AND datemeasurement > '2012-01-04' AND patid = P.patid
ORDER by datemeasurement DESC
)
This works, but it makes the query incredibly slow because the subquery is correlated with the outer table, so it has to be evaluated for every patient. The query takes 10 seconds without the most-recent check and 3 minutes with it.
I'm pretty sure this can be done a lot more efficiently, so please enlighten me :).
I tried using HAVING datemeasurement = MAX(datemeasurement), but that keeps throwing errors.
My approach would be to write a query that gets each patient's latest result since 01-04-2012, and then filter that for your codes and results. Something like:
select
count(1)
from
T_Measurements M
inner join (
SELECT PATID, MAX(datemeasurement) as lastMeasuredDate from
T_Measurements M
where datemeasurement > '01-04-2012'
group by patID
) lastMeasurements
on lastMeasurements.lastmeasuredDate = M.datemeasurement
and lastMeasurements.PatID = M.PatID
where
M.Code = 'Xxxx' and M.result = 'XX'
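A minimal sketch of this MAX-and-join approach in Python's sqlite3, with hypothetical rows. Dates are stored as ISO 'YYYY-MM-DD' text so SQLite compares them correctly as strings; in SQL Server, an unambiguous literal such as '20120401' avoids the day/month ambiguity of '01-04-2012'. The query counts matching rows, so a patient with two matching measurements on the same latest date would be counted twice; COUNT(DISTINCT M.patid) avoids that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_Measurements (patid INTEGER, code TEXT, result TEXT, datemeasurement TEXT);
-- Hypothetical rows; ISO date strings compare correctly as text.
INSERT INTO T_Measurements VALUES
  (1, 'xxxx', 'xx', '2012-05-01'),   -- latest row matches: counted
  (1, 'xxxx', 'yy', '2012-04-10'),
  (2, 'xxxx', 'xx', '2012-04-15'),   -- matched earlier, but the latest row does not
  (2, 'xxxx', 'yy', '2012-06-01'),
  (3, 'xxxx', 'xx', '2011-12-01');   -- before the cut-off: ignored
""")

# Latest date per patient, joined back to get that row's code and result.
LATEST_BY_MAX = """
SELECT COUNT(1)
FROM T_Measurements M
INNER JOIN (
    SELECT patid, MAX(datemeasurement) AS lastMeasuredDate
    FROM T_Measurements
    WHERE datemeasurement > :since
    GROUP BY patid
) AS lastMeasurements
    ON lastMeasurements.lastMeasuredDate = M.datemeasurement
   AND lastMeasurements.patid = M.patid
WHERE M.code = :code AND M.result = :result
"""

params = {"since": "2012-04-01", "code": "xxxx", "result": "xx"}
count = conn.execute(LATEST_BY_MAX, params).fetchone()[0]
print(count)  # 1
```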
The fastest way may be to use row_number():
SELECT COUNT(m.patid)
from (select m.*,
ROW_NUMBER() over (partition by patid order by datemeasurement desc) as seqnum
FROM T_Measurements m
where datemeasurement > '2012-01-04'
) m
where seqnum = 1 and code = 'XXX' and result = 'xx'
ROW_NUMBER() numbers each patient's measurements from newest to oldest, so the most recent one gets 1. The outer query then just keeps the rows with seqnum = 1 that have the wanted code and result.
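A sketch of the ROW_NUMBER() version with Python's sqlite3 and hypothetical rows. It also makes one detail visible: the answer filters on code after ranking, so "most recent" means the most recent measurement of any code. Moving the code filter into the subquery (closer to the asker's second query, which filtered M.code inside the TOP 1 subquery) ranks only that code's measurements. Patient 4 is counted only in the second variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_Measurements (patid INTEGER, code TEXT, result TEXT, datemeasurement TEXT);
INSERT INTO T_Measurements VALUES
  (1, 'xxxx', 'xx', '2012-05-01'),
  (1, 'xxxx', 'yy', '2012-04-10'),
  (2, 'xxxx', 'xx', '2012-04-15'),
  (2, 'xxxx', 'yy', '2012-06-01'),
  (4, 'xxxx', 'xx', '2012-05-01'),
  (4, 'zzzz', 'ab', '2012-07-01');   -- newest row overall has a different code
""")

def count_latest(code_inside):
    # code_inside=False: rank all of a patient's measurements, then check
    # code/result (as in the answer). code_inside=True: rank only rows with
    # the wanted code. The interpolated filter is a fixed string, not user input.
    inner_filter = "AND code = :code" if code_inside else ""
    sql = f"""
        SELECT COUNT(ranked.patid)
        FROM (
            SELECT m.*,
                   ROW_NUMBER() OVER (PARTITION BY patid
                                      ORDER BY datemeasurement DESC) AS seqnum
            FROM T_Measurements m
            WHERE datemeasurement > :since {inner_filter}
        ) AS ranked
        WHERE seqnum = 1 AND code = :code AND result = :result
    """
    params = {"since": "2012-04-01", "code": "xxxx", "result": "xx"}
    return conn.execute(sql, params).fetchone()[0]

print(count_latest(False), count_latest(True))  # 1 2
```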