optimizing row number query

optimizing row number query - sql

I am using sql server 2008 r2 and had below query
select * from
(
select d.ID as ID,
....
....
ROW_NUMBER() OVER
(
ORDER BY #some field
) AS RowNum
from
/*some table*/
LEFT join
(select Device_ID,
Level,
ROW_NUMBER() over (partition by Device_ID order by id desc) as rn
from #sometable as de WITH (NOLOCK)
where #some condition
) t
where t.rn = 1)tmp on ID=tmp.Device_ID **/* sort operation 1*/**
/*some more joins */
WHERE /*some condition*/
) as DbD
where RowNum BETWEEN #SkipRowsLocal and (#SkipRowsLocal + #TakeRowsLocal - 1)
order by RowNum
I am trying to implement pagination kind of query from sample
http://blog.sqlauthority.com/2013/04/14/sql-server-tricks-for-row-offset-and-paging-in-various-versions-of-sql-server/
but looks like it's executing very slow, when I looked into query plan sort operation is consuming almost 50% of query time and i guess its the 1st sort operation which I marked as 1, basically in temp table t I want to retrieve latest value and in outer row number I wanted to fetch say only 40 records.
It's basically like sorting 10K rows and then taking 40 out of it, is there is any way we can improve this query?

instead of a left join, try an OUTER APPLY, to get the TOP 1 of the device you are interested in - think of it as 'I get a record from the first table, then for that record, I go and get the TOP 1 for that device ID in the other table, but remembering that in the second table, you need to search by the data from the first table, rather than a join
select * from
(
select d.ID as ID,
....
....
ROW_NUMBER() OVER
(
ORDER BY #some field
) AS RowNum
from
/*some table*/
OUTER APPLY
(select TOP 1 Device_ID,
Level
from #sometable as de WITH (NOLOCK)
where #some condition and Device_ID = [/*some table*/].id ORDER BY id DESC
) t
)tmp
/*some more joins */
WHERE /*some condition*/
) as DbD
where RowNum BETWEEN #SkipRowsLocal and (#SkipRowsLocal + #TakeRowsLocal - 1)
order by RowNum
something on those lines, you might want to build just the relevant parts of the query and compare them, it would seem to avoid sorting and with an index might well be quick

Related

Foreach/per-item iteration in SQL

I'm new to SQL and I think I must just be missing something, but I can't find any resources on how to do the following:
I have a table with three relevant columns: id, creation_date, latest_id. latest_id refers to the id of another entry (a newer revision).
For each entry, I would like to find the min creation date of all entries with latest_id = this.id. How do I perform this type of iteration in SQL / reference the value of the current row in an iteration?

select
t.id, min(t2.creation_date) as min_creation_date
from
mytable t
left join
mytable t2 on t2.latest_id = t.id
group by
t.id

You could solve this with a loop, but it's not anywhere close the best strategy. Instead, try this:
SELECT tf.id, tf.Creation_Date
FROM
(
SELECT t0.id, t1.Creation_Date,
row_number() over (partition by t0.id order by t1.creation_date) rn
FROM [MyTable] t0 -- table prime
INNER JOIN [MyTable] t1 ON t1.latest_id = t0.id -- table 1
) tf -- table final
WHERE tf.rn = 1
This connects the id to the latest_id by joining the table to itself. Then it uses a windowing function to help identify the smallest Creation_Date for each match.

Listing multiple columns in a single row in SQL

(select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,ROW_NUMBER() OVER(PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID ) AS SEQNUM
from AC_POS_TRANSACTION_TRK aptt WHERE [RESULT] ='Success'
GROUP BY ID, EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE )
Hello,
On above query, I want to get rows of transaction id's which has seqnum=1 and seqnum=2
But if that transaction id has no second row (seqnum=2), I dont want to get any row for that transaction id.
Thanks!!
Something like this

Not 100% sure if this is correct without you table definition, but my understanding is that you want to EXCLUDE records if that record has an entry with seqnum=2 -- you can't use a where clause alone because that would still return seqnum = 1.
You can use an exists /not exists or in/not in clause like this
(select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,ROW_NUMBER() OVER(PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID ) AS SEQNUM
from AC_POS_TRANSACTION_TRK aptt WHERE [RESULT] ='Success'
and not exists ( select 1 from AC_POS_TRANSACTION_TRK a where a.id = aptt.id
and a.seqnum = 2)
GROUP BY ID, EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE )
basically what this does is it excludes records if a record exists as specified in the NOT EXISTS query.

One option you can try is to add a count of rows per group using the same partioning critera and then filter accordingly. Not entirely sure about your query without seeing it in context and with sample data - there's no aggregation so why use group by?
However can you try something along these lines
select * from (
select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,
Row_Number() over(partition by EXTERNAL_TRANSACTION_ID order by ID) as SEQNUM,
Count(*) over(partition by EXTERNAL_TRANSACTION_ID) Qty
from AC_POS_TRANSACTION_TRK
where [RESULT] ='Success'
)x
where SEQNUM in (1,2) and Qty>1

This should do the job.
With Qry As (
-- Your original query goes here
),
Select Qry.*
From Qry
Where Exists (
Select *
From Qry Qry1
Where Qry1.EXTERNAL_TRANSACTION_ID = Qry.EXTERNAL_TRANSACTION_ID
And Qry1.SEQNUM = 1
)
And Exists (
Select *
From Qry Qry2
Where Qry2.EXTERNAL_TRANSACTION_ID = Qry.EXTERNAL_TRANSACTION_ID
And Qry2.SEQNUM = 2
)
BTW, your original query looks problematic to me, specifically I think that instead of a GROUP BY columns those columns should be in the PARTITION BY clause of the OVER statement, but without knowing more about the table structures and what you're trying to achieve, I could not say for sure.

Performance for multiple partitioned subqueries

I have a database with one main table and multiple history/log tables that stores the evolution over time of properties of some rows of main table. These properties are not stored on the main table itself, but must be queried from the relevant history/log table. All these tables are big (on the order of gigabytes).
I want to dump the whole main table and join the last entry of all the history/log tables.
Currently I do it via subqueries as follows:
WITH
foo AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY date DESC) AS rownumber,
...
FROM table1),
bar AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY date DESC) AS rownumber,
...
FROM table2)
SELECT
...
FROM maintable mt
JOIN foo foo ON foo.itemid = mt.itemid AND foo.rownumber = 1
JOIN bar bar ON foo.itemid = mt.itemid AND bar.rownumber = 1
WHERE ...
The problem is that this is very slow. Is there a faster solution to this problem?
I am only allowed to perform read-only queries on this database: I can not make any changes to it.

In actual Oracle versions it's usually better to use laterals/CROSS APPLY, because CBO (oracle cost-based optimizer) can transform them (DCL - lateral view decorrelation transformation) and use optimal join method depending on your circumstances/conditions (table statistics, cardinality, etc).
So it would be something like this:
SELECT
...
FROM maintable mt
CROSS APPLY (
SELECT *
FROM table1
WHERE table1.itemid = mt.itemid
ORDER BY date DESC
fetch first 1 row only
)
CROSS APPLY (
SELECT *
FROM table2
WHERE table2.itemid = mt.itemid
ORDER BY date DESC
fetch first 1 row only
)
WHERE ...
PS. You haven't specified your oracle version, so my answer is for Oracle 12+

Select all but last row in Oracle SQL

I want to pull all rows except the last one in Oracle SQL
My database is like this
Prikey - Auto_increment
common - varchar
miles - int
So I want to sum all rows except the last row ordered by primary key grouped by common. That means for each distinct common, the miles will be summed (except for the last one)

Note: the question was changed after this answer was posted. The first two queries work for the original question. The last query (in the addendum) works for the updated question.
This should do the trick, though it will be a bit slow for larger tables:
SELECT prikey, authnum FROM myTable
WHERE prikey <> (SELECT MAX(prikey) FROM myTable)
ORDER BY prikey
This query is longer but for a large table it should faster. I'll leave it to you to decide:
SELECT * FROM (
SELECT
prikey,
authnum,
ROW_NUMBER() OVER (ORDER BY prikey DESC) AS RowRank
FROM myTable)
WHERE RowRank <> 1
ORDER BY prikey
Addendum There was an update to the question; here's the updated answer.
SELECT
common,
SUM(miles)
FROM (
SELECT
common,
miles,
ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
FROM myTable
)
WHERE RowRank <> 1
GROUP BY common

Looks like I am a little too late but here is my contribution, similar to Ed Gibbs' first solution but instead of calculating the max id for each value in the table and then comparing I get it once using an inline view.
SELECT d1.prikey,
d1.authnum
FROM myTable d1,
(SELECT MAX(prikey) prikey myTable FROM myTable) d2
WHERE d1.prikey != d2.prikey
At least I think this is more efficient if you want to go without the use of Analytics.

query to retrieve all the records in the table except first row and last row
select * from table_name
where primary_id_column not in
(
select top 1 * from table_name order by primary_id_column asc
)
and
primary_id_column not in
(
select top 1 * from table_name order by primary_id_column desc
)

MSSQL 2008 SP pagination and count number of total records

In my SP I have the following:
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging
The result is that I get the error: invalid object name 'Paging'.
Can I query again the Paging table? I don't want to include the count for all results as a new column ... I would prefer to return as another data set. Is that possible?
Thanks, Radu

After more research I fond another way of doing this:
with Paging(RowNo, ID, Name, TotalOccurrences) AS
(
ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
select RowNo, ID, Name, TotalOccurrences, (select COUNT(*) from Paging) as TotalResults from Paging where RowNo between (#PageNumber - 1 )* #PageSize + 1 and #PageNumber * #PageSize;
I think that this has better performance than calling two times the query.

You can't do that because the CTE you are defining will only be available to the FIRST query that appears after it's been defined. So when you run the COUNT(*) query, the CTE is no longer available to reference. That's just a limitation of CTEs.
So to do the COUNT as a separate step, you'd need to not use the CTE and instead use the full query to COUNT on.
Or, you could wrap the CTE up in an inline table valued function and use that instead, to save repeating the main query, something like this:
CREATE FUNCTION dbo.ufnExample()
RETURNS TABLE
AS
RETURN
(
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging
)
SELECT * FROM dbo.ufnExample() x WHERE RowNo BETWEEN 1 AND 50
SELECT COUNT(*) FROM dbo.ufnExample() x

Please be aware that Radu D's solution's query plan shows double hits to those tables. It is doing two executions under the covers. However, this still may be the best way as I haven't found a truly scalable 1-query design.
A less scalable 1-query design is to dump a completed ordered list into a #tablevariable , SELECT ##ROWCOUNT to get the full count, and select from #tablevariable where row number between X and Y. This works well for <10000 rows, but with results in the millions of rows, populating that #tablevariable gets expensive.
A hybrid approach is to populate this temp/variable up to 10000 rows. If not all 10000 rows are filled up, you're set. If 10000 rows are filled up, you'll need to rerun the search to get the full count. This works well if most of your queries return well under 10000 rows. The 10000 limit is a rough approximation, you can play around with this threshold for your case.

Write "AS" after the CTE table name Paging as below:
with Paging AS (RowNo, ID, Name, TotalOccurrences) as
(
ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

optimizing row number query - sql

Related

Foreach/per-item iteration in SQL

Listing multiple columns in a single row in SQL

Performance for multiple partitioned subqueries

Select all but last row in Oracle SQL

MSSQL 2008 SP pagination and count number of total records

Categories

Resources