Total row count with OFFSET .. FETCH issue - sql

Goal is trivial: get total row count and some data page.
When I use OFFSET...FETCH approach to implement paging with total row counting I running into following issue: when we pass some big page number (e.g. we have only 100 rows, but requested 15th with 10 records per page) COUNT(*) OVER() statement has never called, because result set is empty. So, we can not get right total row count in this case.
Is there way to get right total row count using OFFSET ... FETCH approach even when big page number passed?
FYI, OFFSET ... FETCH approach is that:
SELECT
...
Total = COUNT(*) OVER()
FROM Table1
ORDER BY Col1
OFFSET (#PageNum-1) * #PageSize ROWS
FETCH NEXT #PageSize ROWS ONLY;

I think the answer is "no". You are appending the total row count onto each row being returned. The query is returning no rows, so there is no place to put the total.
By the way, I do imagine that the total is being calculated. But without any rows, you never see it.
EDIT:
The only work-around I can think of is at the application layer. If no rows are returned, then run:
SELECT Total = COUNT(*) OVER()
FROM Table1;
You could actually run this first to get the total. The downside is when the table really isn't a table but a view that is expensive to run.

Related

SQL: getting rows where some 'x' column value is a maximal one [duplicate]

This question already has answers here:
How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?
(22 answers)
Closed 11 months ago.
I am trying to get data from my database.
Query upon sub-query upon another sub-query - and as the intermediate result I get looks like this:
item
quantity
pen
34
pencil
42
notebook
42
eraser
12
I need to build another query upon this result set to get the rows where item_quantity has it's maximal value (42 in the example above). The rows with pencils and notebooks. However, I have found out that the task is a bit trickier than I expected it to be.
SELECT * FROM sub_query_result HAVING quantity = MAX(quantity)
always returns an empty result set
SELECT * FROM sub_query_result HAVING quantity = 42
is pointless since I need to know the exact max quantity in advance
SELECT * FROM sub_query_result WHERE quantity = MAX(quantity)
simply works not ("Invalid use of group function")
I can see solutions that work but that I do not like -- due to extra actions I need to take on my back-end code that executes this sql request, or due to their inefficiency:
I can create a temporary table, get max. quantity from it and place to a variable. Then I can use this variable inside the query to that temporary table and get the data I need.
I can do
SELECT * FROM query_result HAVING quantity = (SELECT MAX(quantity) FROM
<Query upon sub-query upon another sub-query that shall return query_result>)
but that way I request the very same data twice! which in general is not a good approach.
So... Anything I missed? Any simple and elegant solutions that can solve my problem?
Order by quantity descending to get the max values first. Only select the first row, i.e. the one having the max values. ANSI SQL version:
SELECT * FROM query_result
ORDER BY quantity DESC
FETCH FIRST 1 ROW WITH TIES
WITH TIES means you can get several rows, if there are several rows having the same maximum quantity.

SQL Multiple Conditions in max statement not working

I am attempting to filter my table and get the item that sold for the most amount of money. In order to do this I am using "AuctionOpen" to determine whether or not the auction is still open. The auction cannot be open and have the item been sold (later I will use this for the most expensive item available).
I am able to use the AND operator to compare AuctionOpen by using the following:
select s.*
from auctionsite.dbo.Auction s
where s.HighestBid = (select max(s2.HighestBid) from auctionsite.dbo.Auction
s2) and s.AuctionOpen = 0;
When I set this equal to zero I get results, but when I set it equal to 1, it only returns the column titles even though there are values set to 1 in the table.
Results when compared to 0:
Results when compared to 1:
Clearly, the highest bid is on a record where AuctionOpen <> 1.
I recommend using order by and fetch (or the equivalent in your database):
select s.*
from auctionsite.dbo.Auction s
where s.AuctionOpen = 0
order by s.HIghestBid desc
fetch first 1 row only
In SQL Server, use either select top (1) or offset 0 rows fetch first 1 row only.
I think you should try the Count aggregate function
here, try this:
**Select count(Item_name) As
[Item with the highest money]
from table_name
Group by Item_name DSEC;**
You can check my page hereSQL/MySQL tips for some SQL/MySQL lessons

Get filtered row count using dm_db_partition_stats

I'm using paging in my app but I've noticed that paging has gone very slow and the line below is the culprit:
SELECT COUNT (*) FROM MyTable
On my table, which only has 9 million rows, it takes 43 seconds to return the row count. I read in another article which states that to return the row count for 1.4 billion rows, it takes over 5 minutes. This obviously cannot be used with paging as it is far too slow and the only reason I need the row count is to calculate the number of available pages.
After a bit of research I found out that I get the row count pretty much instantly (and accurately) using the following:
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('MyTable')
AND (index_id=0 or index_id=1)
But the above returns me the count for the entire table which is fine if no filters are applied but how do I handle this if I need to apply filters such as a date range and/or a status?
For example, what is the row count for MyTable when the DateTime field is between 2013-04-05 and 2013-04-06 and status='warning'?
Thanks.
UPDATE-1
In case I wasn't clear, I require the total number of rows available so that I can determine the number of pages required that will match my query when using 'paging' feature. For example, if a page returns 20 records and my total number of records matching my query is 235, I know I'll need to display 12 buttons below my grid.
01 - (row 1 to 20) - 20 rows displayed in grid.
02 - (row 21 to 40) - 20 rows displayed in grid.
...
11 - (row 200 to 220) - 20 rows displayed in grid.
12 - (row 221 to 235) - 15 rows displayed in grid.
There will be additional logic added to handle a large amount of pages but that's a UI issue, so this is out of scope for this topic.
My problem with using "Select count(*) from MyTable" is that it is taking 40+ seconds on 9 million records (thought it isn't anymore and I need to find out why!) but using this method I was able to add the same filter as my query to determine the query. For example,
SELECT COUNT(*) FROM [MyTable]
WHERE [DateTime] BETWEEN '2018-04-05' AND '2018-04-06' AND
[Status] = 'Warning'
Once I determine the page count, I would then run the same query but include the fields instead of count(*), the CurrentPageNo and PageSize in order to filter my results by page number using the row ids and navigate to a specific pages if needed.
SELECT RowId, DateTime, Status, Message FROM [MyTable]
WHERE [DateTime] BETWEEN '2018-04-05' AND '2018-04-06' AND
[Status] = 'Warning' AND
RowId BETWEEN (CurrentPageNo * PageSize) AND ((CurrentPageNo + 1) * PageSize)
Now, if I use the other mentioned method to get the row count i.e.
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('MyTable')
AND (index_id=0 or index_id=1)
It returns the count instantly but how do I filter this so that I can include the same filters as if I was using the SELECT COUNT(*) method, so I could end up with something like:
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('MyTable') AND
(index_id=0 or index_id=1) AND
([DateTime] BETWEEN '2018-04-05' AND '2018-04-06') AND
([Status] = 'Warning')
The above clearing won't work as I'm querying the dm_db_partition_stats but I would like to know if I can somehow perform a join or something similar to provide me with the total number of rows instantly but it needs to be filtered rather than apply to the entire table.
Thanks.
Have you ever asked for directions to alpha centauri? No? Well the answer is, you can't get there from here.
Adding indexes, re-orgs/re-builds, updating stats will only get you so far. You should consider changing your approach.
sp_spaceused will return the record count typically instantly; You may be able to use this, however depending (which you've not quite given us enough information) on what you are using the count for might not be adequate.
I am not sure if you are trying to use this count as a means to short circuit a larger operation or how you are using the count in your application. When you start to highlight 1.4 billion records and you're looking for a window in said set, it sounds like you might be a candidate for partitioned tables.
This allows you assign several smaller tables, typically separated by date, years / months, that act as a single table. When you give the date range on 1.4+ Billion records, SQL can meet performance expectations. This does depend on SQL Edition, but there is also view partitioning as well.
Kimberly Tripp has a blog and some videos out there, and Kendra Little also has some good content on how they are used and how to set them up. This would be a design change. It is a bit complex and not something you would want implement on a whim.
Here is a link to Kimberly's Blog: https://www.sqlskills.com/blogs/kimberly/sqlskills-sql101-partitioning/
Dev banter:
Also, I hear you blaming SQL, are you using entity framework by chance?

SQL COUNT - greater than some number without having to get the exact count?

There's a thread at https://github.com/amatsuda/kaminari/issues/545 talking about a problem with a Ruby pagination gem when it encounters large tables.
When the number of records is large, the pagination will display something like:
[1][2][3][4][5][6][7][8][9][10][...][end]
This can incur performance penalties when the number of records is huge, because getting an exact count of, say, 50M+ records will take time. However, all that's needed to know in this case is that the count is greater than the number of pages to show * number of records per page.
Is there a faster SQL operation than getting the exact COUNT, which would merely assert that the COUNT is greater than some value x?
You could try with
SQL Server:
SELECT COUNT(*) FROM (SELECT TOP 1000 * FROM MyTable) X
MySQL:
SELECT COUNT(*) FROM (SELECT * FROM MyTable LIMIT 1000) X
With a little luck, the SQL Server/MySQL will optimize this query. Clearly instead of 1000 you should put the maximum number of pages you want * the number of rows per page.

Select Average of Top 25% of Values in SQL

I'm currently writing a stored procedure for my client to populate some tables that will be used to generate SSRS reports later on. Some of the data is based on specific stock formulas that are run on each of their clients' quarterly data (sent to them by their clients). The other part of the data is generated by comparing those results against those from other, similar sized clients. One of the things that they want tracked in their reports is the average of the top 25% of formula results for that particular comparison group.
To give a better picture of it, imagine the following fields that I have in a temp table:
FormulaID int
Value decimal (18,6)
I want to do the following: Given a specific FormulaID return the average of the top 25% of Value.
I know how to take an average in SQL, but I don't know how to do it against only the top 25% of a specific group.
How would I write this query?
I guess you can do something like this...
SELECT AVG(Q.ColA) Avg25Prec
FROM (
SELECT TOP 25 Percent ColA
FROM Table_Name
ORDER BY SomeCOlumn
) Q
Here's what I did, given the table shown above:
select AVG(t.Value)
from (select top 25 percent Value
from #TempGroupTable
where FormulaID = #PassedInFormulaID
order by Value desc) as t
The desc must be there, because the percent command will not actually do comparisons. It will just simply grab the first x number of records, with x being equal to 25% of the count of records it's querying. Therefore, the order by Value desc line then will grab the top 25% records which have the highest Value, and then sends that info to be averaged.
As a side note to all of this, this also means that if you wanted to grab the bottom 25% instead, or if your formula results are like a golf score (i.e. lowest is the best), all you would need to do is remove the desc part and you would be good to go.