Sql query to get a non-contiguous subset of results - sql

I'm writing a web application that should show very large results on a search query.
Say some queries will return 10.000 items.
I'd like to show those to users paginated; no problem so far: each page will be the result of a query with an appropriate LIMIT statement.
But I'd like to show clues about results in each page of the paginated query: some data from the first item and some from the last.
This mean that, for example, with a result of 10.000 items and a page size of 50 items, if the user asked for the first page I will need:
the first 50 items (the page requested by the user)
item 51 and 100 (the first and last of the second page)
item 101 and 151
etc
For efficiency reasons I want to avoid one query per row.
[edit] I also would prefer not downloading 10.000 results if I only need 50 + 10000/50*2 = 400
The question is: is there a single query I can issue to the RDBMS (mysql, by the way, but I'd prefer a cross-db solution) that will return only the data I need?
I can't use server side cursor, because not all dbs support it and I want my app to be database-agnostic.

Just for fun, here is the MSSQL version of it.
declare #pageSize as int; set #pageSize = 10;
declare #pageIndex as int; set #pageIndex = 0; /* first page */
WITH x AS
(
select
ROW_NUMBER() OVER (ORDER BY (created) ASC) AS RowNumber,
*
from table
)
SELECT * FROM x
WHERE
((RowNumber <= (#pageIndex+1)*#pageSize) AND (RowNumber >= #pageIndex*#PageSize+1))
OR
RowNumber % #pageSize = 1
OR
RowNumber % #pageSize = #pageSize-1
Note, that an ORDER BY is provided in the over clause.
Also note, that if you have gazillion rows, your result set will have millions. You need to maximize the result rows for practical reasons.
I have no idea how this could be a solved in generic SQL. (My bet: no way. Even simple pageing cannot be solved without DB-specific operators.)

UPDATE: I completely misread the initial question. You can do this using UNION and the LIMIT clause in MySQL, although it might be what you meant by "one query per row". The syntax would be like:
select FOO from BAZ limit 50
union
select FOO from BAZ limit 50, 1
union
select FOO from BAZ limit 99, 1
union
select FOO from BAZ limit 100, 1
union
select FOO from BAZ limit 149, 1
and so on and so forth. Since you're using UNION, you'll only need one roundtrip to the database. I'm not sure how MySQL will treat the various SELECT statements, though. It should be able to recognize that they are essentially the same query and use a cached query plan, but I don't work with MySQL enough to know if that's a reasonable expectation for its optimizer.
Obviously, to build this query in a general fashion, you'll first need to run a count query so you can calculate what your offsets will be.
This is definitely not a tractable problem for standard SQL, since the paging logic requires nonstandard features.

Related

CosmosDB: How does SELECT TOP work?

I have a 450GB database... with millions of records.
Here is an example query:
SELECT TOP 1 * FROM c WHERE c.type='water';
To speed up our queries, I thought about just taking the first one but we have noticed that the query still takes quite a while, despite the very first record in the database matching our constraints.
So, my question is, how does the SELECT TOP 1 really work? Does it:
A) Select ALL records and then return just the first (top) one where
type='water'
B) Return the first record which is encountered where type='water'
Try this line, noting the offset limit:
SELECT * FROM c WHERE c.type='water' OFFSET 0 LIMIT 1
For more information about the offset limit:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-offset-limit
Assuming you aren't sorting your results (which you query isn't) then TOP 1 will return the first result as soon as it finds one. This should then end the query.
Cosmos db Explorer doesn't work with the TOP Command, It's an existing issue. It works fine in SDK Call.
Check some Top command usage below
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery

Vague count in sql select statements

I guess this has been asked in the site before but I can't find it.
I've seen in some sites that there is a vague count over the results of a search. For example, here in stackoverflow, when you search a question, it says +5000 results (sometimes), in gmail, when you search by keywords, it says "hundreds" and in google it says aprox X results. Is this just a way to show the user an easy-to-understand-a-huge-number? or this is actually a fast way to count results that can be used in a database [I'm learning Oracle at the moment 10g version]? something like "hey, if you get more than 1k results, just stop and tell me there are more than 1k".
Thanks
PS. I'm new to databases.
Usually this is just a nice way to display a number.
I don't believe there is a way to do what you are asking for in SQL - count does not have an option for counting up until some number.
I also would not assume this is coming from SQL in either gmail, or stackoverflow.
Most search engines will return a total number of matches to a search, and then let you page through results.
As for making an exact number more human readable, here is an example from Rails:
http://api.rubyonrails.org/classes/ActionView/Helpers/NumberHelper.html#method-i-number_to_human
With Oracle, you can always resort to analytical functions in order to calculate the exact number of rows about to be returned. This is an example of such a query:
SELECT inner.*, MAX(ROWNUM) OVER(PARTITION BY 1) as TOTAL_ROWS
FROM (
[... your own, sorted search query ...]
) inner
This will give you the total number of rows for your specific subquery. When you want to apply paging as well, you can further wrap these SQL parts as such:
SELECT outer.* FROM (
SELECT * FROM (
SELECT inner.*,ROWNUM as RNUM, MAX(ROWNUM) OVER(PARTITION BY 1) as TOTAL_ROWS
FROM (
[... your own, sorted search query ...]
) inner
)
WHERE ROWNUM < :max_row
) outer
WHERE outer.RNUM > :min_row
Replace min_row and max_row by meaningful values. But beware that calculating the exact number of rows can be expensive when you're not filtering using UNIQUE SCAN or relatively narrow RANGE SCAN operations on indexes. Read more about this here: Speed of paged queries in Oracle
As others have said, you can always have an absolute upper limit, such as 5000 to your query using a ROWNUM <= 5000 filter and then just indicate that there are more than 5000+ results. Note that Oracle can be very good at optimising queries when you apply ROWNUM filtering. Find some info on that subject here:
http://www.dba-oracle.com/t_sql_tuning_rownum_equals_one.htm
Vague count is a buffer which will be displayed promptly. If user wants to see more results then he can request more.
It's a performance facility, after displaying the results the sites like google keep searching for more results.
I don't know how fast this will run, but you can try:
SELECT NULL FROM your_tables WHERE your_condition AND ROWNUM <= 1001
If count of rows in result will equals to 1001 then total count of records will > 1000.
this question gives some pretty good information
When you do an SQL query you can set a
LIMIT 0, 100
for example and you will only get the first hundred answers. so you can then print to your viewer that there are 100+ answers to their request.
For google I couldn't say if they really know there is more than 27'000'000'000 answer to a request but I believe they really do know. There are some standard request that have results stored and where the update is done in the background.

How can I get a specific chunk of results?

Is it possible to retrieve a specific range of results? I know how to do TOP x but the result I will retrieve is WAY too big and will time out. I was hoping to be able to pick say the first 10,000 results then the next 10,000 and so on. Is this possible?
WITH Q AS (
SELECT ROW_NUMBER() OVER (ORDER BY ...some column) AS N, ...other columns
FROM ...some table
) SELECT * FROM Q WHERE N BETWEEN 1 AND 10000;
Read more about ROW_NUMBER() here: http://msdn.microsoft.com/en-us/library/ms186734.aspx
Practically all SQL DB implementations have a way of specifying the starting row to return, as well as the number of rows.
For example, in both mysql and postgres it looks like:
SELECT ...
ORDER BY something -- not required, but highly recommended
LIMIT 100 -- only get 100 rows
OFFSET 500; -- start at row 500
Note that normally you would include an ORDER BY to make sure your chunks are consistent
MS SQL Server (being a "pretend" DB) don't support OFFSET directly, but it can be coded using ROW_NUMBER() - see this SO post for more detail.

Top N in View or Crystal Reports?

I am wondering if it's possible to use a view to get the top 5 lines from a table.
I am finding that Crystal reports doesn't seem to have anything built in to do this, or I'd do it there.
When I query the view Select * from qryTranHistory, it returns the first 5 items, but if I try to select a specific type Select * from qryTranHistory Where tID = 45 it returns nothing, since there are no tID=45 in the top 5 normally.
Is it possible to do this?
Can it be accomplished in a sub report in Crystal Reports?
It is easy to limit a report to the top 5 records. In the menu, just choose
Report --> Selection Formulas... --> Group
In the formula, enter "RecordNumber <= 5" and you are done.
You don't need to have a group field nor summary field to do the group filter. You don't need a sort order, but using top N records without a sort order doesn't usually make much sense. It might not be efficient as OMG Ponies suggested, but for small number of records it is OK.
You can reference a sproc from Crystal Reports. In the sproc, use a conditional on the parameter.
ALTER PROCEDURE dbo.Get_TOP5
(
#tID INT = NULL
)
AS
IF #tID IS NULL
BEGIN
SELECT TOP 5
FIELD1,
FIELD2
FROM qryTranHistory
END
ELSE
BEGIN
SELECT
FIELD1,
FIELD2
FROM qryTranHistory
WHERE tID =#tID
END
A simple setting can limit the records to top 5!! Here it is, if you're using .Net 1.1 (similar arrangement of options in higher frameworks too!).
Right click on the report layout > Reports > Top N/Sort Group Expert > Choose Top N in the Dropdown that asks for the type of filtering/ sorting you wish to do > Set the Value of top N (5 in your case) > Uncheck the option that includes other records.
Your report will be filtered for only the top 5 records from the Dataset.
There's another way how it could be done and that is through the Record selection formula where you limit the No. of records, as suggested by John Price in this thread.
Cheers!
Can you put the TOP in your SELECT statement instead of in the view?
SELECT TOP 5
col1,
col2,
...
FROM
qryTranHistory
WHERE
tid = 45
If your table has more then 5 rows I hope this query:
SELECT * FROM qryTranHistory
Returns more then 5 rows because you never mentioned TOP 5.
Your question doesn't make a lot of sense as I am not sure waht you are after.
You mentioned if you ran your query with WHERE tID=45, it returns nothing, what exactly do you want it to return ?
Read up on TOP in BOL:
SELECT TOP 10 Recs FROM Records WHERE...
By the way you do not want to do this in the report / a form interface, you want to do this in your db layer.
You can do Top N processing in Crystal Reports, but it's a little obscure - you have to use the group sort expert (and in order to use that, you need to have groups and summary fields inserted into the groups.)
Doing the Top N processing in the query should be more efficient, where possible.
If you have a small recordset, you can create a running total that counts the change of rows (field1), then in Section Expert in Details, tell it to supress RTotal0 (your running total variable) to > 5

LIMIT in FoxPro

I am attempting to pull ALOT of data from a fox pro database, work with it and insert it into a mysql db. It is too much to do all at once so want to do it in batches of say 10 000 records. What is the equivalent to LIMIT 5, 10 in Fox Pro SQL, would like a select statement like
select name, address from people limit 5, 10;
ie only get 10 results back, starting at the 5th. Have looked around online and they only make mention of top which is obviously not of much use.
Take a look at the RecNo() function.
FoxPro does not have direct support for a LIMIT clause. It does have "TOP nn" but that only provides the "top-most records" within a given percentage, and even that has a limitation of 32k records returned (maximum).
You might be better off dumping the data as a CSV, or if that isn't practical (due to size issues), writing a small FoxPro script that auto-generates a series of BEGIN-INSERT(x10000)-COMMIT statements that dump to a series of text files. Of course, you would need a FoxPro development environment for this, so this may not apply to your situation...
Visual FoxPro does not support LIMIT directly.
I used the following query to get over the limitation:
SELECT TOP 100 * from PEOPLE WHERE RECNO() > 1000 ORDER BY ID;
where 100 is the limit and 1000 is the offset.
It is very easy to get around LIMIT clause using TOP clause ; if you want to extract from record _start to record _finish from a file named _test, you can do :
[VFP]
** assuming _start <= _finish, if not you get a top clause error
*
_finish = MIN(RECCOUNT('_test'),_finish)
*
SELECT * FROM (SELECT TOP (_finish - _start + 1) * FROM (SELECT TOP _finish *, RECNO() AS _tempo FROM _test ORDER BY _tempo) xx ORDER BY _tempo DESC) yy ORDER BY _tempo
**
[/VFP]
I had to convert a Foxpro database to Mysql a few years ago. What I did to solve this was add an auto-incrementing id column to the Foxpro table and use that as the row reference.
So then you could do something like.
select name, address from people where id >= 5 and id <= 10;
The Foxpro sql documentation does not show anything similar to limit.
Here, adapt this to your tables. Took me like 2 mins, i do this waaaay too often.
N1 - group by whatever, and make sure you got a max(id), you can use recno() to make one, sorted correctly
N2 - Joins N1 where the ID = Max Id of N1, display the field you want from N2
Then if you want to join to other tables, put that all in brackets and give it an alias and include it in a join.
Select N1.reference, N1.OrderNoteCount, N2.notes_desc LastNote
FROM
(select reference, count(reference) OrderNoteCount, Max(notes_key) MaxNoteId
from custnote
where reference != ''
Group by reference
) N1
JOIN
(
select reference, count(reference) OrderNoteCount, notes_key, notes_desc
from custnote
where reference != ''
Group by reference, notes_key, notes_desc
) N2 ON N1.MaxNoteId = N2.notes_key
To expand on Eyvind's answer I would create a program to uses the RecNo() function to pull records within a given range, say 10,000 records.
You could then programmatically cycle through the large table in chucks of 10,000 records at a time and preform your data load into you MySQL database.
By using the RecNO() function you can be certain not to insert rows more than once, and be able to restart at a know point in the data load process. That by it's self can be very handy in the event you need to stop and restart the load process.
Depending on the number of the returned rows and if you are using .NET Framework you can offset/limit the gotten DataTable on the following way:
dataTable = dataTable.AsEnumerable().Skip(offset).Take(limit).CopyToDataTable();
Remember to add the Assembly System.Data.DataSetExtensions.