How do I make this SQL code faster? - sql

Everyone wants me to be more specific. I am attempting to do pagination with ASP Classic and an MS Access database. This is the query I am using to get the items for page 2. There are 25 items per page, and when the query returns larger data sets (around 500+ rows) it takes 20+ seconds to execute. Yes, I have indexed sku for faster queries. Any suggestions?
SELECT TOP 25 *
FROM catalog
WHERE sku LIKE '1W%'
AND sku NOT IN (SELECT TOP 25 sku
FROM catalog
WHERE sku LIKE '1W%' ORDER BY price DESC ) ORDER BY price DESC

TOP without ORDER BY looks useless, or at least strange. I guess you meant to use this subquery:
( SELECT TOP 25 sku
FROM catalog
WHERE sku LIKE '1W%'
ORDER BY sku
)
Add an index on sku if you don't have one.
A possible rewriting of the query, for Access:
SELECT *
FROM catalog
WHERE sku LIKE '1W%'
AND sku >= ( SELECT MAX(sku)
FROM ( SELECT TOP 26 sku
FROM catalog
WHERE sku LIKE '1W%'
ORDER BY sku
)
)
If you are using SQL-Server, you can use window functions for this type of query.
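The boundary-value idea in the Access rewrite above can be tried end to end in a small script. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Access (so LIMIT replaces TOP), and the function name page_by_sku, the sample data, and the row-count guard are illustrative additions, not part of the original answer.

```python
import sqlite3


def page_by_sku(conn, prefix, page, page_size=25):
    """Return the skus on a 1-based page, ordered by sku."""
    pattern = prefix + "%"
    offset = (page - 1) * page_size
    # The largest sku among the first offset+1 matches is the first sku of the page.
    count, boundary = conn.execute(
        """
        SELECT COUNT(*), MAX(sku)
        FROM (SELECT sku FROM catalog WHERE sku LIKE ? ORDER BY sku LIMIT ?)
        """,
        (pattern, offset + 1),
    ).fetchone()
    if count <= offset:  # fewer matches than the page start, so the page is empty
        return []
    rows = conn.execute(
        "SELECT sku FROM catalog WHERE sku LIKE ? AND sku >= ? ORDER BY sku LIMIT ?",
        (pattern, boundary, page_size),
    )
    return [sku for (sku,) in rows]


# Hypothetical sample data: 60 matching skus plus a few that do not match.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (sku TEXT PRIMARY KEY, price INTEGER)")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?)",
    [(f"1W{i:03d}", i) for i in range(60)] + [(f"2X{i:03d}", i) for i in range(5)],
)

page2 = page_by_sku(conn, "1W", 2)
print(page2[0], page2[-1], len(page2))
```

Because the page is located by comparing sku values, an index on sku can serve both the boundary lookup and the final range scan. The approach only works when the sort column is unique, which is assumed here.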

Some pointers:
You can simulate a SELECT BOTTOM (n) by using TOP (n) and reversing the ORDER BY
You can use nested SELECTs (the inner query acts like a temporary table)
So, the final result of the "paging" query is (replace 50 with 75, 100, 125, ... for subsequent pages):
SELECT TOP 25 *
FROM
(
SELECT TOP 50 *
FROM catalog
WHERE sku LIKE '1W%'
ORDER BY price desc
)
TEMP
ORDER BY price asc;
You mentioned that you've indexed your data, but just to be completely clear: for optimal performance, you should ensure the table is adequately indexed for this query. In this case, I would recommend indexing AT LEAST the two columns involved in the query:
CREATE INDEX IX_CATALOG ON CATALOG (SKU, PRICE);
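The nested TOP / reversed ORDER BY trick can be checked on a toy table. The sketch below is an assumption-laden illustration rather than the answer's exact query: it runs on SQLite through Python's sqlite3 module (LIMIT stands in for TOP), adds sku as a tie-breaker so that equal prices sort deterministically, and wraps the result in a third SELECT to put the page back into descending price order. The name fetch_page and the sample rows are hypothetical.

```python
import sqlite3


def fetch_page(conn, prefix, page, page_size=25):
    """Return (sku, price) rows for a 1-based page, most expensive first."""
    sql = """
        SELECT sku, price FROM (
            SELECT sku, price FROM (
                -- keep everything up to and including the requested page
                SELECT sku, price FROM catalog
                WHERE sku LIKE ?
                ORDER BY price DESC, sku DESC
                LIMIT ?
            )
            -- reverse the order so the requested page comes first
            ORDER BY price ASC, sku ASC
            LIMIT ?
        )
        ORDER BY price DESC, sku DESC
    """
    return conn.execute(sql, (prefix + "%", page * page_size, page_size)).fetchall()


# Hypothetical sample data: 60 rows with distinct prices 0..59.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (sku TEXT PRIMARY KEY, price INTEGER)")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?)", [(f"1W{i:03d}", i) for i in range(60)]
)
conn.execute("CREATE INDEX ix_catalog ON catalog (sku, price)")

page2 = fetch_page(conn, "1W", 2)
print(len(page2), page2[0], page2[-1])
```

One caveat with this pattern: when the last page is only partly filled, the middle query still takes a full page from the bottom, so it overlaps the previous page. The caller would need the total row count to shrink the final page.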

What you're trying to do is select all the rows from a table that meet certain criteria, other than the first twenty-five. Unfortunately, different database management systems have their own syntax for doing this kind of thing.
There is a good survey of the different syntaxes on the Wikipedia page for the SQL select statement.
To give an example, in MySQL you can use the LIMIT clause of the SELECT statement to specify how many rows to return and the offset:
SELECT *
FROM catalog
WHERE sku LIKE '1W%'
ORDER BY id
LIMIT 25, 9999999999
which skips the first 25 rows and returns every row after them (up to 9999999999 rows).

Create an index on the column sku (if valid make it unique). How many rows are there in the table?

SELECT sku
FROM catalog
WHERE sku LIKE '1W%'
ORDER BY __SOME COLUMN __
LIMIT 10000 OFFSET 25
This returns all* the rows in the result that come after row 25 (OFFSET 25).
*LIMIT 10000 constrains the result to at most 10000 tuples (rows).
To ensure the OFFSET is not applied to an arbitrary ordering, you need to order by some column.

Related

Optional parameters in SQL query

I am new to SQL and I am kind of lost. I have a table that contains products, various fields like productname, category etc.
I want to have a query where I can say something like: select all products in some category that have a specific word in their productname. The complicating factor is that I only want to return a specific range of that subset. So I also want to say return me the 100 to 120 products that fall in that specification.
I googled and found this query:
WITH OrderedRecords AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY PRODUCTNUMMER) AS "RowNumber"
FROM (
SELECT *
FROM SHOP.dbo.PRODUCT
WHERE CATEGORY = 'ARDUINO'
and PRODUCTNAME LIKE '%yellow%'
) AS Filtered
)
SELECT * FROM OrderedRecords WHERE RowNumber BETWEEN 100 and 120
Go
The query works to an extent; however, it assigns the row number before filtering, so I won't get enough records, and I don't know how to handle the case where no parameters are given. Ideally I want to be able to omit the category and search word and have it just list all products.
I have no idea how to achieve this though and any help is appreciated!
This builds on what esiprogrammer showed in his answer about returning only rows in a certain range using paging.
Your second question was:
Ideally I want to be able to not give a category and search word and it will just list all products.
You can either have two queries/stored procedures, one for the case where you do lookup with specific parameters, another for lookup without parameters.
Or, if you insist on keeping one query/stored procedure for all cases, there are two options:
Build a Dynamic SQL statement that only has the filters that are present; execute it using EXECUTE (@sql) or EXECUTE sp_executesql @sql
Build a Catch-All Query
Example for option 2:
-- if no category is given, it will be NULL
DECLARE @search_category VARCHAR(128);
-- if no name is given, it will be NULL
DECLARE @search_name VARCHAR(128);
SELECT *
FROM SHOP.dbo.PRODUCT
WHERE (@search_category IS NULL OR CATEGORY=@search_category) AND
(@search_name IS NULL OR PRODUCTNAAM LIKE '%'+@search_name+'%')
ORDER BY PRODUCTNUMMER
OFFSET 100 ROWS
FETCH NEXT 20 ROWS ONLY
OPTION(RECOMPILE); -- generate a new plan on each execution that is optimized for that execution’s set of parameters
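The catch-all pattern can be exercised without SQL Server. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module: named parameters bound to None play the role of the NULL variables, `||` replaces `+` for string concatenation, and LIMIT/OFFSET replaces OFFSET ... FETCH. The function name search_products, the lower-case column names, and the sample rows are hypothetical.

```python
import sqlite3


def search_products(conn, category=None, name=None, offset=0, limit=20):
    """Return product numbers matching the optional filters, one page at a time."""
    sql = """
        SELECT productnummer
        FROM product
        WHERE (:category IS NULL OR category = :category)
          AND (:name IS NULL OR productname LIKE '%' || :name || '%')
        ORDER BY productnummer
        LIMIT :limit OFFSET :offset
    """
    params = {"category": category, "name": name, "limit": limit, "offset": offset}
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (productnummer INTEGER PRIMARY KEY, category TEXT, productname TEXT)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "ARDUINO", "yellow led"),
        (2, "ARDUINO", "red led"),
        (3, "SENSOR", "yellow probe"),
        (4, "SENSOR", "blue probe"),
    ],
)

print(search_products(conn))                                     # no filters
print(search_products(conn, category="ARDUINO", name="yellow"))  # both filters
```

A filter that is left as None drops out of the WHERE clause, so one statement covers all four combinations of the two parameters.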
If you just need to paginate your query and return a specific range of results, you can simply use the OFFSET FETCH clause.
That way there is no need to filter result items by RowNumber. I think this solution is easier:
SELECT *
FROM SHOP.dbo.PRODUCT
WHERE CATEGORY = 'ARDUINO' AND PRODUCTNAAM LIKE '%yellow%'
ORDER BY PRODUCTNUMMER
OFFSET 100 ROWS -- start row
FETCH NEXT 20 ROWS ONLY -- page size
Find out more: Pagination with OFFSET / FETCH
What do you mean by "it assigns the row number before filtering"? Category and ProductName are part of the subquery, so if the product table has 10k records and only 1k meet your criteria, the CTE yields 1k rows and RowNumber BETWEEN 100 and 120 works. Test it out: remove the WHERE clauses from both SELECT statements and you'll get a row number for every row of the products table. Then add back the category and productname filter, and RowNumber is computed over the filtered rows ordered by ProductNumber. When you then add back BETWEEN 100 and 120, you have the right solution based on what you described.
WITH OrderedRecords AS
(
SELECT ROW_NUMBER() OVER (ORDER BY PRODUCTNUMMER) AS "RowNumber"
, *
FROM SHOP.dbo.PRODUCT
WHERE CATEGORY = 'ARDUINO'
and PRODUCTNAAM LIKE '%yellow%'
)
SELECT *
FROM OrderedRecords
WHERE RowNumber
BETWEEN 100 and 120
Go
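The claim that row numbers are assigned after the WHERE filter can be verified on a toy table. This sketch assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module; the helper name numbered_slice and the sample data are hypothetical.

```python
import sqlite3

NUMBERED = """
    WITH ordered AS (
        SELECT productnummer,
               ROW_NUMBER() OVER (ORDER BY productnummer) AS row_no
        FROM product
        WHERE category = ?
    )
    SELECT productnummer, row_no
    FROM ordered
    WHERE row_no BETWEEN ? AND ?
    ORDER BY row_no
"""


def numbered_slice(conn, category, first, last):
    """Return (productnummer, row_no) pairs for the filtered, numbered rows."""
    return conn.execute(NUMBERED, (category, first, last)).fetchall()


# Hypothetical sample data: even product numbers are ARDUINO, odd ones SENSOR.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (productnummer INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(i, "ARDUINO" if i % 2 == 0 else "SENSOR") for i in range(1, 11)],
)

all_rows = numbered_slice(conn, "ARDUINO", 1, 100)
middle = numbered_slice(conn, "ARDUINO", 2, 3)
print(all_rows)
print(middle)
```

The row numbers for the filtered rows run without gaps, which is what makes a BETWEEN range on them usable as a page.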

Pagination in two tables in SQL Server

I have two tables,
Orders - this is small, typically up to 50 thousand records
OrdersArchive - this one is normal, about 80 million records
The situation is as follows:
an order might have one of these values for status:
'created'
'processing'
'finished'
The finished orders from Orders are periodically moved to OrdersArchive.
In other words, Orders might contain orders with status created, processing or finished. OrdersArchive contains only orders with a status of finished.
The result has to be sorted in this order 'created', 'processing', 'finished'
I need a query over these two tables that supports pagination.
What is the best way to do it? (as fast as possible)
The pagination might be of any type.
I mean like:
the classical pagination with PageNumber and CountOfRowsPerPage.
'lazy' pagination with a count of orders after a specific Order.
I would use the union SQL operator for this. See the w3schools page for details.
With union you can do either union or union all. The first checks for duplicates while the second just combines the results. It sounds like you shouldn't have duplicates across these two tables, so for performance you can use union all and skip the duplicate check.
You also need to make sure that both queries have the same number of columns with similar types.
e.g.
select orderno, status from Orders
union all
select orderno, status from OrdersArchive
order by status, orderno
Pagination
That query gives you the combined resultset for both tables. Now to add pagination I would use a CTE with row numbers like this:
with x as (
select orderno as num, status as stat from Orders
union all
select archiveorderno as num, archivestatus as stat from OrdersArchive
), numbered as (
select row_number() over(order by stat, num) as rownum, num, stat from x
)
select rownum, num, stat from numbered
where rownum between 1 and 20
Alternative
If you find using union is too slow then you could look at changing the way your search works. If you always sort the same way and it's always records from Orders followed by records from OrdersArchive then you could query the tables separately. Start by paging through Orders and then when you run out of records continue paging through OrdersArchive. This would be much faster than the union but you would have to keep the query simple and always sort on status. The union allows much more complex searches.
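The "page through Orders first, then continue in OrdersArchive" alternative might look roughly like the sketch below. It is an illustration under assumptions, not a tuned SQL Server solution: it runs on SQLite via Python's sqlite3 module, uses offset/limit paging, and ranks the status values with a CASE expression because sorting the strings alphabetically would put 'finished' before 'processing'. It also assumes every row in Orders sorts before every row in OrdersArchive, as the paragraph above does. The names fetch_page and STATUS_RANK and the sample rows are hypothetical.

```python
import sqlite3

# Explicit sort rank: created, processing, finished.
STATUS_RANK = "CASE status WHEN 'created' THEN 0 WHEN 'processing' THEN 1 ELSE 2 END"


def fetch_page(conn, offset, limit):
    """Return order numbers for one page, reading Orders before OrdersArchive."""
    rows = conn.execute(
        f"SELECT orderno FROM Orders ORDER BY {STATUS_RANK}, orderno LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    if len(rows) < limit:
        # Orders ran out: fill the rest of the page from the archive.
        (n_orders,) = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()
        archive_offset = max(0, offset - n_orders)
        rows += conn.execute(
            "SELECT orderno FROM OrdersArchive ORDER BY orderno LIMIT ? OFFSET ?",
            (limit - len(rows), archive_offset),
        ).fetchall()
    return [orderno for (orderno,) in rows]


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderno INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE OrdersArchive (orderno INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "finished"), (2, "created"), (3, "processing"), (4, "created")],
)
conn.executemany(
    "INSERT INTO OrdersArchive VALUES (?, ?)",
    [(10, "finished"), (11, "finished"), (12, "finished")],
)

pages = [fetch_page(conn, offset, 3) for offset in (0, 3, 6, 9)]
print(pages)
```

Only the page that straddles the two tables touches both of them; every other page reads from a single table, which is the point of this alternative.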
Using OFFSET and FETCH NEXT in SQL Server can provide a paging solution. Rough sample code is:
DECLARE @PageNumber INT = 2;
DECLARE @PageSize INT = 100000;
SELECT [ID]
FROM [Table]
ORDER BY [ID]
OFFSET @PageSize * (@PageNumber - 1) ROWS
FETCH NEXT @PageSize ROWS ONLY
Obviously, substitute your own tables, filters and ordering, and probably put this in a stored procedure with PageNumber and PageSize as input parameters.
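The page arithmetic above (offset = page size * (page number - 1)) can be wrapped in a small helper. This is a sketch only, assuming SQLite via Python's sqlite3 module, where LIMIT ? OFFSET ? stands in for OFFSET ... FETCH NEXT; the table name items and the function name fetch_page are hypothetical.

```python
import sqlite3


def fetch_page(conn, page_number, page_size):
    """Return the ids on a 1-based page, ordered by id."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    offset = page_size * (page_number - 1)
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?", (page_size, offset)
    )
    return [row[0] for row in rows]


# Hypothetical sample data: ids 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 11)])

print(fetch_page(conn, 2, 3))
print(fetch_page(conn, 4, 3))
```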

How to structure these SQL queries so they are perfect SQL?

I have to structure these queries so they are perfect SQL. The queries need to be for a SQL Server database, I have a database StoresDB, a table items_table.
I need to retrieve the
total number of items within this table
The number of items where the price is greater than or equal to £10 - the column name is amount
The list of items in the computer category - column name ='comp_id' - sorted by decreasing amount.
For the above requests I have attempted the below:
SELECT COUNT(*) FROM items_table
Select * from items_table where amount >= 10
Select * from items_table where comp_id = ’electronics’ desc
I am very new to SQL and not sure if I have attempted this correctly.
It may be good to know a few things when writing this sort of query:
a) SELECT COUNT(*) FROM items_table
This query is written correctly.
b) SELECT COUNT(*) FROM items_table WHERE amount >= 10
The query is OK, but consider creating indexes that cover the WHERE clause; in this case, a non-clustered index on the amount column is a good idea.
c) SELECT * FROM items_table WHERE comp_id = 'electronics' ORDER BY amount DESC
The issue with this last query is that it returns every column via SELECT *, which is considered bad practice in production; list only the columns you really need in the SELECT list. You can also create a non-clustered index on the comp_id column, with the columns from the SELECT list as included columns.
a) Looks correct.
b) You are being asked for a count but are querying a list.
SELECT COUNT(*) FROM items_table WHERE amount >= 10
c) This one looks good but you are missing an ORDER BY clause.
SELECT * FROM items_table WHERE comp_id='electronics' ORDER BY amount DESC

SQL Server 2008 Paged Row Retrieval and Large Tables

I'm using SQL Server 2008 and the following query to implement paged data retrieval from our JSF application. In the code below I am retrieving 25 rows at a time, sorted by the default sort column in DESC order.
SELECT * FROM
(
SELECT TOP 25 * FROM
(
SELECT TOP 25 ...... WHERE CONDITIONS
--ORDER BY ... DESC
)AS INNERQUERY ORDER BY INNERQUERY.... ASC
)
AS OUTERQUERY
ORDER BY OUTERQUERY.... DESC
It works, but with one obvious flaw. If a user requests the last page and there are over 10 million records in the table, then the second TOP query has to retrieve the 10 million records first, and only then does the first TOP query pick out the top 25, which looks like:
SELECT * FROM
(
SELECT TOP 25 * FROM
(
SELECT TOP 10000000 ...... WHERE CONDITIONS
--ORDER BY ... DESC
)AS INNERQUERY ORDER BY INNERQUERY.... ASC
)
AS OUTERQUERY
ORDER BY OUTERQUERY.... DESC
I looked into replacing the above with ROW_NUMBER() OVER(....), but seemingly I had the same issue: the inner statement has to get the entire result, and only then can you filter where ROW_NUMBER is between x and y.
Can you please point out my mistakes in the above approach and hints on how it can be optimized?
I'm currently using the following to code to retrieve subset of rows:
WITH PAGED_QRY AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY Y) AS ROW_NO
FROM TABLE WHERE ....
)
SELECT * FROM PAGED_QRY WHERE ROW_NO BETWEEN @CURRENT_INDEX and @ROWS_TO_RETRIEVE
ORDER BY ROW_NO
where @current_index and @rows_to_retrieve (i.e. 1 and 50) are your paging variables. It's cleaner and easier to read.
I've also tried using SET ROWCOUNT @ROWS_TO_RETRIEVE, but it doesn't seem to make much difference.
By using the above query, carefully studying its execution plan, and modifying or creating indexes and statistics, I've reached results that are sufficiently satisfactory, which is why I'm marking this as the answer. The original goal of retrieving only the required rows in the inner query does not seem to be possible yet; if you do find a way, please let me know.
We can improve the above query a bit more.
If I assume that @current_index is the current page number, then we can rewrite the above query as:
WITH PAGED_QRY AS (
SELECT TOP (@current_index * @rows_to_retrieve) *, ROW_NUMBER()
OVER(ORDER BY Y) AS ROW_NO
FROM TABLE WHERE ....
ORDER BY Y -- makes TOP keep the first rows in the paging order
)
SELECT TOP (@rows_to_retrieve) * FROM PAGED_QRY
ORDER BY ROW_NO DESC
In this case, the inner query does not return the whole record set. Suppose the page index is 3 and the page size is 50; then it selects only 150 rows (even if the table contains hundreds, thousands, or millions of rows), and we can skip the WHERE clause on ROW_NO as well.

total number of rows of a query

I have a very large query that is supposed to return only the top 10 results:
select top 10 ProductId from .....
The problem is that I also want the total number of results that match the criteria without that 'top 10', but at the same time it's considered unacceptable to return all rows (we are talking about roughly 100 thousand results).
Is there a way to get the total number of rows affected by the previous query, either in it or afterwards, without running it again?
PS: please no temp tables of 100 000 rows :))
dump the count in a variable and return that
declare @count int
select @count = count(*) from ..... --same where clause as your query
--now you add that to your query..of course it will be the same for every row..
select top 10 ProductId, @count as TotalCount from .....
Assuming that you're using an ORDER BY clause already (to properly define which the "TOP 10" results are), then you could add a call of ROW_NUMBER also, with the opposite sort order, and pick the highest value returned.
E.g., the following:
select top 10 *,ROW_NUMBER() OVER (order by id desc) from sysobjects order by ID
Has a final column with values 2001, 2000, 1999, etc, descending. And the following:
select COUNT(*) from sysobjects
Confirms that there are 2001 rows in sysobjects.
I suppose you could hack it with a union select
select top 10 ... from ... where ...
union
select count(*) from ... where ...
To get away with this type of hack, you need to add fake columns to the count query so that it returns the same number of columns as the main query. For example:
select top 10 id, first_name from people
union
select count(*), '' as first_name from people
I don't recommend using this solution. Using two separate queries is how it should be done.
Generally speaking no - reasoning is as follows:
If(!) the query planner can make use of TOP 10 to return only 10 rows, then the RDBMS will not even know the exact number of rows that satisfy the full criteria; it just gets the TOP 10.
Therefore, when you want to find out the count of all rows satisfying the criteria, you are not running the query a second time, but for the first time.
Having said that, proper indexes might make both queries execute pretty fast.
Edit
MySQL has SQL_CALC_FOUND_ROWS which returns the number of rows that query would return if there was no LIMIT applied - googling for an equivalent in MS SQL points to analytical SQL and CTE variant, see this forum (even though not sure that either would qualify as running it only once, but feel free to check - and let us know).
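One shape the analytic variant mentioned above might take is a COUNT(*) OVER () column next to the limited rows. The sketch below is an assumption, not a confirmed equivalent of SQL_CALC_FOUND_ROWS: it runs on SQLite 3.25 or later via Python's sqlite3 module (LIMIT standing in for TOP), and the products table, its columns, and the price filter are hypothetical. The engine still has to visit every matching row to compute the count, so this saves a round trip rather than the counting work.

```python
import sqlite3

# Hypothetical sample data: 100 products priced 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, price INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(i, i) for i in range(1, 101)])

# The window function is evaluated over all matching rows before LIMIT is applied,
# so every returned row carries the total number of matches.
rows = conn.execute(
    """
    SELECT product_id, COUNT(*) OVER () AS total_count
    FROM products
    WHERE price > ?
    ORDER BY product_id
    LIMIT 10
    """,
    (50,),
).fetchall()

top_ids = [product_id for product_id, _ in rows]
total = rows[0][1] if rows else 0
print(top_ids, total)
```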