database paging design - sql

I'm fetching data for my grid like this
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
I also need the total count for the pagination.
There are two options.
1- Do another fetch
SELECT count(*) FROM dbo.Orders
2- Put the count statement in the query
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
(SELECT count(*) FROM dbo.Orders) as Count
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
Which way should I go?

Of the two methods you've put forward, the first (a separate query) is better. The second repeats the count in every row returned, which is unnecessary. Also, if the query returns 20 rows, the SELECT count(*) may be executed 20 times (if I remember right; this depends on the database engine you're using, and many optimizers evaluate an uncorrelated subquery only once).
Additionally, depending on how much traffic you expect and how big the table is likely to get, you can improve on this by caching the result of SELECT count(*) somewhere and refreshing it on insertions into and deletions from the table.
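One way to picture the cached count is a one-row counter table kept current by triggers. The sketch below uses Python's built-in sqlite3 module purely as a runnable stand-in; the OrdersCount table and the trigger names are hypothetical and not part of the original schema, and a real SQL Server setup would use its own trigger syntax.

```python
import sqlite3


def make_db():
    """Create an in-memory Orders table plus a one-row cache of its row count."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER);
        -- Hypothetical cache table: holds the current number of Orders rows.
        CREATE TABLE OrdersCount (Total INTEGER NOT NULL);
        INSERT INTO OrdersCount (Total) VALUES (0);
        -- Refresh the cached value on every insertion and deletion.
        CREATE TRIGGER Orders_ai AFTER INSERT ON Orders
        BEGIN UPDATE OrdersCount SET Total = Total + 1; END;
        CREATE TRIGGER Orders_ad AFTER DELETE ON Orders
        BEGIN UPDATE OrdersCount SET Total = Total - 1; END;
    """)
    return conn


def cached_count(conn):
    """Read the cached total instead of running SELECT count(*) on Orders."""
    return conn.execute("SELECT Total FROM OrdersCount").fetchone()[0]
```

The pagination code then calls cached_count(conn) instead of counting the table on every page request. The trade-off is a small extra write on each insert and delete.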

If this is for SQL Server 2005 or higher, one of the best ways to get pagination is to use a Common Table Expression.
CREATE PROC MyPaginatedDataProc
@pageNumber INT -- zero-based page index, 10 rows per page
AS
WITH OrdersCTE (CustomerID, OrderTime, ProductID, Quantity, RowNumber)
AS
(
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
ROW_NUMBER() OVER (ORDER BY OrderItems.OrderID) AS RowNumber
FROM
dbo.Orders INNER JOIN dbo.OrderItems ON Orders.ID = OrderItems.OrderID
)
SELECT
CustomerID,
OrderTime,
ProductID,
Quantity
FROM
OrdersCTE
WHERE
-- ROW_NUMBER() starts at 1, so page 0 covers rows 1 to 10
RowNumber BETWEEN (@pageNumber * 10) + 1 AND ((@pageNumber + 1) * 10)
Otherwise for getting the total row count, I'd use a separate query like Mailslut said.
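To check the page arithmetic, here is a small sketch of the same ROW_NUMBER() approach run through Python's sqlite3 module (assuming a bundled SQLite of 3.25 or later, which is when window functions were added). The demo data, the fetch_page helper, and the extra ProductID tie-breaker in the ORDER BY are assumptions made for the illustration.

```python
import sqlite3

PAGE_SIZE = 10

PAGE_SQL = """
WITH OrdersCTE AS (
    SELECT o.CustomerID, o.OrderTime, i.ProductID, i.Quantity,
           -- ProductID is added as a tie-breaker so the numbering is repeatable
           ROW_NUMBER() OVER (ORDER BY i.OrderID, i.ProductID) AS RowNumber
    FROM Orders o
    INNER JOIN OrderItems i ON o.ID = i.OrderID
)
SELECT CustomerID, OrderTime, ProductID, Quantity
FROM OrdersCTE
WHERE RowNumber BETWEEN ? AND ?
ORDER BY RowNumber
"""


def fetch_page(conn, page_number):
    """Return the rows of a zero-based page; ROW_NUMBER() itself starts at 1."""
    first = page_number * PAGE_SIZE + 1
    last = (page_number + 1) * PAGE_SIZE
    return conn.execute(PAGE_SQL, (first, last)).fetchall()


def demo_db():
    """Build one order with 25 items, i.e. two full pages and one partial page."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderTime TEXT);
        CREATE TABLE OrderItems (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
    """)
    conn.execute("INSERT INTO Orders VALUES (1, 100, '2015-01-01')")
    conn.executemany("INSERT INTO OrderItems VALUES (1, ?, 1)",
                     [(p,) for p in range(1, 26)])
    return conn
```

With this data, fetch_page(conn, 0) returns ten rows and fetch_page(conn, 2) returns the last five.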

If you are using Oracle, you can use COUNT(*) OVER () AS CNT. This was more efficient, as it takes a single table scan.
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
COUNT(*) OVER () AS CNT
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
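A quick way to see what the windowed count returns is to run it on a tiny data set. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later assumed) rather than Oracle; the sample rows and the TotalRows alias are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderTime TEXT);
    CREATE TABLE OrderItems (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
    INSERT INTO Orders VALUES (1, 100, '2015-01-01'), (2, 200, '2015-01-02');
    INSERT INTO OrderItems VALUES (1, 10, 2), (1, 11, 1), (2, 10, 5);
""")

rows = conn.execute("""
    SELECT o.CustomerID, i.ProductID, i.Quantity,
           COUNT(*) OVER () AS TotalRows   -- counts every row of the joined result
    FROM Orders o
    INNER JOIN OrderItems i ON o.ID = i.OrderID
    ORDER BY i.OrderID, i.ProductID
    LIMIT 2                                -- applied after the window is computed
""").fetchall()

for row in rows:
    print(row)
```

Every returned row carries the same total, and the total is taken before the page limit is applied. Note that it counts rows of the join (3 here), not rows of Orders (2), so it is not interchangeable with SELECT count(*) FROM dbo.Orders.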

As @Mailslut suggests, you should probably use two queries. However, you should probably add a WHERE clause to the query that fetches the data, so you only fetch the data that you actually need to show (unless you are caching it).
If more than one thread is accessing the database at a time, you will also need to make sure that the count is kept in sync with the database.

I would consider something different, because what you are trying to do is not very simple, but quite necessary. Have you considered using the SQL Server ROW_NUMBER function? This way you know how many records there are by looking at the highest row number returned, and the rows also come back numbered in the order you want.
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
ROW_NUMBER() OVER(ORDER BY Orders.CustomerId) rn
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID

Related

SQL queries with slight modification giving different results

I am learning SQL and doing some exercises with analytic functions. I have the following query to find the ship_name and order_value of the highest-value order. These are my tables:
orders(id, ship_name, city_of_origination)
order_details(id, order_id, unit_price, quantity)
To solve this problem, I wrote the following query:
select o.ship_name, od.quantity*od.unit_price, first_value(od.quantity*od.unit_price) over (order by od.quantity*od.unit_price desc)
from orders o
inner join order_details od
on o.order_id = od.order_id
limit 1
Here is the sample output after removing limit from the above query:
Changing the problem statement slightly, I only want the ship_name. So I wrote this query:
select tmp.ship_name
from (select o.ship_name as ship_name, first_value(od.quantity*od.unit_price) over (order by od.quantity*od.unit_price desc) fv
from orders o
inner join order_details od
on o.order_id = od.order_id
limit 1
) tmp;
To my surprise, the result changed. Here is the result of the above query without limit:
At the same time, if I execute the following query:
select tmp.ship_name, tmp.fv
from (select o.ship_name as ship_name, first_value(od.quantity*od.unit_price) over (order by od.quantity*od.unit_price desc) fv
from orders o
inner join order_details od
on o.order_id = od.order_id
limit 1
) tmp;
I get the same result (and the expected one) as that of the first query. My question is: why do the above queries return different results?
LIMIT without ORDER BY returns an arbitrary row. It might not even return the same row when the same query is executed again.
So, use ORDER BY to control which row is returned.
In Postgres, without ORDER BY the rows come back in whatever order they are read, which for a plain sequential scan is typically the physical order on disk (the order of the hidden ctid column). That is roughly last-inserted/last-updated order, but it is not guaranteed and can change with the query plan. LIMIT does not change that order; the rows still come out in the order they are read from disk.
LIMIT 1 only shows you the first row encountered. To change the ordering behavior, you should use ORDER BY.
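As a minimal sketch of the advice above, the query below picks the highest-value order with an explicit ORDER BY instead of relying on read order. It runs on SQLite through Python's sqlite3 module; the sample rows are invented, the join uses orders.id as listed in the table definitions, and the o.id tie-breaker is an assumption added so that ties cannot make the result vary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, ship_name TEXT);
    CREATE TABLE order_details (id INTEGER PRIMARY KEY, order_id INTEGER,
                                unit_price REAL, quantity INTEGER);
    INSERT INTO orders VALUES (1, 'Alpha'), (2, 'Bravo'), (3, 'Charlie');
    INSERT INTO order_details VALUES (1, 1, 10.0, 2), (2, 2, 50.0, 3), (3, 3, 5.0, 4);
""")

top = conn.execute("""
    SELECT o.ship_name, od.quantity * od.unit_price AS order_value
    FROM orders o
    INNER JOIN order_details od ON o.id = od.order_id
    ORDER BY order_value DESC, o.id   -- o.id breaks ties so the result is repeatable
    LIMIT 1
""").fetchone()

print(top)
```

Because the ordering is stated in the query itself, wrapping it in a subquery or changing the select list no longer affects which row LIMIT 1 keeps.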

Can we use order by in subquery? If not why sometime could use top(n) order by?

I'm a beginner trying to learn more about SQL.
My question is: can we use ORDER BY in a subquery? I found some articles saying that we cannot.
But on the other hand, I saw examples using TOP (n) with ORDER BY in a subquery:
select c.CustomerId,
c.OrderId
from CustomerOrder c
inner join (
select top 2
with TIES CustomerId,
COUNT(distinct OrderId) as Count
from CustomerOrder
group by CustomerId
order by Count desc
) b on c.CustomerId = b.CustomerId
So now I'm a bit confused.
Could anyone advise?
Thank you very much.
Yes, you are right: in SQL Server we cannot use ORDER BY on its own in an inner query, because the inner query acts as a table, and a table has no order of its own; sorting is applied when it is queried. The exception is when TOP (or OFFSET, or FOR XML) is also specified, because then ORDER BY decides which rows are returned.
In your query the inner query selects some records using TOP 2. Even though these are only the top 2 records, they form a table with 2 records, which is enough for it to be recognized as a table and joined with another table.
The query can also be written as:
SELECT * FROM
(
SELECT c.CustomerId, c.OrderId, DENSE_RANK() OVER(ORDER BY b.count DESC) AS RANK
FROM CustomerOrder c
INNER JOIN
(SELECT CustomerId, COUNT(distinct OrderId) as Count
FROM CustomerOrder GROUP BY CustomerId) b
ON c.CustomerId = b.CustomerId
) a
WHERE RANK IN (1,2);
Hope I have answered your question.
Yes, we can use an ORDER BY clause in a subquery. For example, I have a table named product (screenshot of the table: http://prntscr.com/f15j3z). Try this query on your side and get back to me if you have any doubts.
select p1.* from product as p1 where product_id = (select p2.product_id from product as p2 order by product_id limit 0,1)
Yes, we can use ORDER BY in a subquery, but on its own it is pointless.
It is better to use it in the outer query. There is no use in ordering the result of a subquery, because the result of the inner query becomes the input for the outer query, which is not bound by the order of the subquery's result.
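The answers above fit together once you separate two jobs: ORDER BY inside a subquery is only meaningful when it is paired with a row limit, and the final row order belongs to the outer query. The sketch below shows this on SQLite through Python's sqlite3 module, with LIMIT standing in for SQL Server's TOP (SQLite has no WITH TIES); the sample rows and the OrderCount alias are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerOrder (CustomerId INTEGER, OrderId INTEGER);
    INSERT INTO CustomerOrder VALUES
        (1, 10), (1, 11), (1, 12), (2, 20), (2, 21), (3, 30);
""")

rows = conn.execute("""
    SELECT c.CustomerId, c.OrderId
    FROM CustomerOrder c
    INNER JOIN (
        SELECT CustomerId, COUNT(DISTINCT OrderId) AS OrderCount
        FROM CustomerOrder
        GROUP BY CustomerId
        ORDER BY OrderCount DESC   -- decides WHICH rows the limit keeps
        LIMIT 2
    ) b ON c.CustomerId = b.CustomerId
    ORDER BY c.CustomerId, c.OrderId   -- final ordering belongs to the outer query
""").fetchall()

print(rows)
```

Customer 3, who has a single order, is filtered out by the inner ORDER BY plus LIMIT; the order in which the surviving rows appear is set only by the outer ORDER BY.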

How to do query optimization in SQL Server?

I am trying to speed up my query's execution time. What is wrong with my query, and what is a better way to optimize it?
TransactionEntry has 2 million records
Transaction table has 5 billion records
Here is my query. If I remove the TotalPrice column, I get results in 10 seconds
--Total Quantity
SELECT
items.ItemLookupCode,sum(transactionsEntry.Quantity) Quantity,sum(transactionsEntry.Quantity*transactionsEntry.Price) TotalPrice
into
##temp_TotalPrice
FROM
(
SELECT
TransactionNumber,StoreID,Time
FROM
[HQMatajer].[dbo].[Transaction]
WHERE
Time>=CONVERT(datetime,'2015-01-01 00:00:00',102) and Time<=CONVERT(datetime,'2015-12-31 23:59:59',102)
) transactions
left join [HQMatajer].[dbo].[TransactionEntry] transactionsEntry
ON transactionsEntry.TransactionNumber=transactions.TransactionNumber and transactionsEntry.StoreID=transactions.StoreID
Left join [HQMatajer].[dbo].[Item] items
ON transactionsEntry.ItemID=items.ID
Group By items.ItemLookupCode
Order by items.ItemLookupCode
If I execute the query above, it produces the result in 22 seconds, which is too long.
When I execute the subquery alone (below), it takes 11 seconds
(
SELECT
TransactionNumber,StoreID,Time
FROM
[HQMatajer].[dbo].[Transaction]
WHERE
Time>=CONVERT(datetime,'2015-01-01 00:00:00',102) and Time<=CONVERT(datetime,'2015-12-31 23:59:59',102)
) transactions
I have created one index on the TransactionEntry table:
TransactionNumber,StoreID,ItemID,Quantity,Price
One index on the Transaction table:
`Time,TransactionNumber,StoreID`
One index on the Item table:
`ID`
Execution Plan
The clustered index of TransactionEntry accounts for 59% of the cost. Its column name is AutoID
This assumes SQL Server 2005 or above. For SQL Server 2000, you can use a temp table with a proper index instead of the CTE.
Also, since you are getting the values from [HQMatajer].[dbo].[TransactionEntry] and [HQMatajer].[dbo].[Item], why is a left join used?
Avoid subqueries. I have reframed the query. Please check and let me know whether it improves the performance.
;WITH transactions
AS
(
SELECT
TransactionNumber,StoreID,Time
FROM
[HQMatajer].[dbo].[Transaction]
WHERE
Time>=CONVERT(datetime,'2015-01-01 00:00:00',102) and Time<=CONVERT(datetime,'2015-12-31 23:59:59',102)
)
SELECT
items.ItemLookupCode,sum(transactionsEntry.Quantity) Quantity,sum(transactionsEntry.Quantity*transactionsEntry.Price) TotalPrice
into
##temp_TotalPrice
FROM [HQMatajer].[dbo].[Item] items INNER JOIN [HQMatajer].[dbo].[TransactionEntry] transactionsEntry
ON transactionsEntry.ItemID=items.ID
WHERE EXISTS (SELECT 1 FROM transactions WHERE transactionsEntry.TransactionNumber=transactions.TransactionNumber and transactionsEntry.StoreID=transactions.StoreID)
Group By items.ItemLookupCode
Order by items.ItemLookupCode
This is your query, simplified and formatted a bit (the subquery makes no difference):
select i.ItemLookupCode,
sum(te.Quantity) as quantity,
sum(te.Quantity * te.Price) as TotalPrice
into ##temp_TotalPrice
from [HQMatajer].[dbo].[Transaction] t left join
[HQMatajer].[dbo].[TransactionEntry] te
on te.TransactionNumber = t.TransactionNumber and
te.StoreID = t.StoreID left join
[HQMatajer].[dbo].[Item] i
on te.ItemID = i.ID
where t.Time >= '2015-01-01' and
t.Time < '2016-01-01'
group by i.ItemLookupCode
order by i.ItemLookupCode;
For this query, you want indexes on Transaction(Time, TransactionNumber, StoreId), TransactionEntry(TransactionNumber, StoreId, ItemId, Quantity, Price), and Item(Id, ItemLookupCode).
Even with the right indexes, this is processing a lot of data, so I would be surprised if this reduced the time to a few seconds.
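One detail in the simplified query is the half-open date range (Time >= '2015-01-01' and Time < '2016-01-01') in place of the original upper bound of 23:59:59. The sketch below illustrates the difference on SQLite through Python's sqlite3 module, where timestamps are stored as ISO-8601 text and compared as strings; that storage choice, the Tx table, and the sample rows are assumptions for the demo, but a datetime column with sub-second precision behaves the same way at the boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tx (TransactionNumber INTEGER, Time TEXT);
    INSERT INTO Tx VALUES
        (1, '2015-06-15 12:00:00.000'),
        (2, '2015-12-31 23:59:59.500'),   -- inside 2015, but after 23:59:59
        (3, '2016-01-01 00:00:00.000');   -- first instant of 2016
""")

# Closed range ending at 23:59:59, as in the original query.
closed = conn.execute("""
    SELECT COUNT(*) FROM Tx
    WHERE Time >= '2015-01-01 00:00:00' AND Time <= '2015-12-31 23:59:59'
""").fetchone()[0]

# Half-open range: everything from the start of 2015 up to, not including, 2016.
half_open = conn.execute("""
    SELECT COUNT(*) FROM Tx
    WHERE Time >= '2015-01-01' AND Time < '2016-01-01'
""").fetchone()[0]

print(closed, half_open)
```

The closed range misses the row stamped in the last half second of the year, while the half-open range picks it up and still excludes the first instant of 2016.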
This query takes too much time because the entry is inserted into the temporary table three times, which increases the time. If we insert the records into another table and then query it, or make it a CTE, the cost decreases.

Distinct on multi-columns in sql

I have this query in sql
select cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
I want the rows to be distinct by pageId, so that in the end no pageId appears more than once (no duplicates).
Any ideas?
Thanks
Baaroz
Going by what you're expecting in the output and your comment that says "...if there rows in output that contain same pageid only one will be shown...," it sounds like you're trying to get the top record for each page ID. This can be achieved with ROW_NUMBER() and PARTITION BY:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID) rowNumber,
c.id,
c.pageId,
c.quantity,
c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
) a
WHERE a.rowNumber = 1
You can also use ROW_NUMBER() OVER(PARTITION BY ... along with TOP 1 WITH TIES, but it runs a little slower (despite being WAY cleaner):
SELECT TOP 1 WITH TIES c.id, c.pageId, c.quantity, c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
ORDER BY ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID)
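Here is a small runnable sketch of the first ROW_NUMBER() / PARTITION BY variant, using SQLite (3.25 or later assumed) through Python's sqlite3 module. The sample data is invented, and the inner ORDER BY uses c.id instead of c.pageId: ordering a partition by its own partition key leaves the choice of row arbitrary, so the lowest id is picked here as an assumed definition of the "top" row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, userId INTEGER);
    CREATE TABLE cartlines (id INTEGER PRIMARY KEY, orderId INTEGER,
                            pageId INTEGER, quantity INTEGER, price REAL);
    INSERT INTO orders VALUES (1, 5), (2, 5), (3, 9);
    INSERT INTO cartlines VALUES
        (1, 1, 100, 1, 9.5), (2, 1, 100, 2, 9.5),
        (3, 2, 200, 1, 4.0), (4, 3, 100, 7, 9.5);
""")

rows = conn.execute("""
    SELECT id, pageId, quantity, price
    FROM (
        SELECT c.id, c.pageId, c.quantity, c.price,
               -- number the rows of each pageId, lowest cartline id first
               ROW_NUMBER() OVER (PARTITION BY c.pageId ORDER BY c.id) AS rowNumber
        FROM orders o
        INNER JOIN cartlines c ON c.orderId = o.id
        WHERE o.userId = 5
    ) a
    WHERE rowNumber = 1
    ORDER BY pageId
""").fetchall()

print(rows)
```

User 5 has two cart lines for page 100 and one for page 200; the result keeps exactly one line per page.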
If you wish to remove rows with all columns duplicated this is solved by simply adding a distinct in your query.
select distinct cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If, however, this makes no difference, it means the other columns have different values, so the combinations of column values create distinct (unique) rows.
As Michael Berkowski stated in comments:
DISTINCT - does operate over all columns in the SELECT list, so we
need to understand your special case better.
In the case that simply adding distinct does not cover you, you need to also remove the columns that are different from row to row, or use aggregate functions to get aggregate values per cartlines.
Example - total quantity per distinct pageId:
select cartlines.pageId, sum(cartlines.quantity) as totalQuantity
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
group by cartlines.pageId
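The aggregate route can be tried out with a few rows. This is a sketch on SQLite through Python's sqlite3 module, with invented sample data and a hypothetical totalQuantity alias; it groups by pageId so that each page appears once with its summed quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, userId INTEGER);
    CREATE TABLE cartlines (id INTEGER PRIMARY KEY, orderId INTEGER,
                            pageId INTEGER, quantity INTEGER, price REAL);
    INSERT INTO orders VALUES (1, 5), (2, 5), (3, 9);
    INSERT INTO cartlines VALUES
        (1, 1, 100, 1, 9.5), (2, 1, 100, 2, 9.5),
        (3, 2, 200, 1, 4.0), (4, 3, 100, 7, 9.5);
""")

totals = conn.execute("""
    SELECT c.pageId, SUM(c.quantity) AS totalQuantity
    FROM orders o
    INNER JOIN cartlines c ON c.orderId = o.id
    WHERE o.userId = 5
    GROUP BY c.pageId        -- one output row per pageId
    ORDER BY c.pageId
""").fetchall()

print(totals)
```

Columns that differ from row to row, such as cartlines.id, have to be left out of the select list or wrapped in an aggregate; otherwise they split the groups again.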
If this is still not what you wish, you need to give us data and specify better what it is you want.

Efficiency of joining subqueries in SQL Server

I have a customers and orders table in SQL Server 2008 R2. Both have indexes on the customer id (called id). I need to return details about all customers in the customers table and information from the orders table, such as details of the first order.
I currently left join my customers table on a subquery of the orders table, with the subquery returning the information I need about the orders. For example:
SELECT c.id
,c.country
,First_orders.product
,First_orders.order_id
FROM customers c
LEFT JOIN SELECT( id,
product
FROM (SELECT id
,product
,order_id
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY Order_Date asc) as order_No
FROM orders) orders
WHERE Order_no = 1) First_Orders
ON c.id = First_orders.id
I'm quite new to SQL and want to understand if I'm doing this efficiently. I end up left joining quite a few subqueries like this onto the customers table in one select query and it can take tens of minutes to run.
So am I doing this efficiently or can it be improved? For example, I'm not sure if my index on id in the orders table is of any use and maybe I could speed up the query by creating a temporary table of what is in the subquery first and creating a unique index on id in the temporary table so SQL Server knows id is now a unique column and then joining my customers table to this temporary table? I typically have one or two million rows in the customers and orders tables.
Many thanks in advance!
You can remove one of your subqueries to make it a little more efficient:
SELECT c.id
,c.country
,First_orders.product
,First_orders.order_id
FROM customers c
LEFT JOIN (SELECT id
,product
,order_id
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY Order_Date asc) as order_No
FROM orders) First_Orders
ON c.id = First_orders.id AND First_Orders.order_No = 1
In your query above, be careful where you place your parentheses, as I don't think it will work as written. Also, you're returning order_id in your results but not including it in your nested subquery.
For someone who is just learning SQL, your query looks pretty good.
The index on customers may or may not be used for the query -- you would need to look at the execution plan. An index on orders(id, order_date) could be used quite effectively for the row_number function.
One comment is on the naming of fields. The field orders.id should not be the customer id. That should be something like orders.Customer_Id. Keeping the naming system consistent across tables will help you in the future.
Try this... it's easy to understand
;WITH cte
AS (
SELECT id
,product
,order_id
,ROW_NUMBER() OVER (
PARTITION BY id ORDER BY Order_Date ASC
) AS order_No
FROM orders
)
SELECT c.id
,c.country
,c1.Product
,c1.order_id
FROM customers c
-- LEFT JOIN keeps customers that have no orders, as the question asks for all customers
LEFT JOIN cte c1 ON c.id = c1.id
AND c1.order_No = 1
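Since the question asks for all customers, the join type matters for customers who have never ordered. The sketch below runs the CTE approach on SQLite (3.25 or later assumed) through Python's sqlite3 module with both join types. The sample rows, the ix_orders_id_date index name, and the {join} placeholder are assumptions for the demo; the index mirrors the orders(id, order_date) suggestion made earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, id INTEGER,
                         product TEXT, Order_Date TEXT);
    -- Hypothetical index matching the PARTITION BY / ORDER BY columns.
    CREATE INDEX ix_orders_id_date ON orders (id, Order_Date);
    INSERT INTO customers VALUES (1, 'UK'), (2, 'FR'), (3, 'DE');
    INSERT INTO orders VALUES
        (10, 1, 'pen',  '2014-03-01'),
        (11, 1, 'book', '2014-01-15'),
        (12, 2, 'lamp', '2014-02-01');
""")

FIRST_ORDER_SQL = """
    WITH cte AS (
        SELECT id, product, order_id,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY Order_Date ASC) AS order_No
        FROM orders
    )
    SELECT c.id, c.country, c1.product, c1.order_id
    FROM customers c
    {join} JOIN cte c1 ON c.id = c1.id AND c1.order_No = 1
    ORDER BY c.id
"""

# Same query, only the join type differs.
left_rows = conn.execute(FIRST_ORDER_SQL.format(join="LEFT")).fetchall()
inner_rows = conn.execute(FIRST_ORDER_SQL.format(join="INNER")).fetchall()

print(left_rows)
print(inner_rows)
```

Customer 3 has no orders: the LEFT JOIN version returns that customer with NULL order columns, while the INNER JOIN version drops the row. Keeping the order_No = 1 test in the ON clause is what lets the LEFT JOIN preserve unmatched customers.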