How can I get a specific chunk of results? - sql

Is it possible to retrieve a specific range of results? I know how to do TOP x, but the result set I will retrieve is WAY too big and will time out. I was hoping to be able to pick, say, the first 10,000 results, then the next 10,000, and so on. Is this possible?

WITH Q AS (
    SELECT ROW_NUMBER() OVER (ORDER BY ...some column) AS N,
           ...other columns
    FROM ...some table
)
SELECT * FROM Q WHERE N BETWEEN 1 AND 10000;
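To get the next chunk, rerun the same statement with the window shifted:
WITH Q AS (
    ...same CTE as above...
)
SELECT * FROM Q WHERE N BETWEEN 10001 AND 20000;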
Read more about ROW_NUMBER() here: http://msdn.microsoft.com/en-us/library/ms186734.aspx

Practically all SQL DB implementations have a way of specifying the starting row to return, as well as the number of rows.
For example, in both MySQL and PostgreSQL it looks like:
SELECT ...
ORDER BY something -- not required, but highly recommended
LIMIT 100 -- only get 100 rows
OFFSET 500; -- start at row 500
Note that you would normally include an ORDER BY to make sure your chunks are consistent between queries.
MS SQL Server (being a "pretend" DB) doesn't support OFFSET directly, but it can be emulated using ROW_NUMBER() - see this SO post for more detail.
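For what it's worth, newer SQL Server versions (2012 and later) do support this directly with OFFSET ... FETCH; a minimal sketch, assuming a sortable column:
SELECT ...
FROM ...some table
ORDER BY some_column       -- required; OFFSET/FETCH only work with ORDER BY
OFFSET 500 ROWS            -- skip the first 500 rows
FETCH NEXT 100 ROWS ONLY;  -- then return the next 100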

Related

SQL: Reduce resultset to X rows?

I have the following MYSQL table:
measuredata:
- ID (bigint)
- timestamp
- entityid
- value (double)
The table contains >1 billion entries. I want to be able to visualize any time-window. The time window can range from "one day" to "many years". There are measurement values roughly every minute in the DB.
So the number of entries for a time-window can differ quite a lot, say from a few hundred to several thousand or millions.
Those values are meant to be visualized in a graphical chart on a webpage.
If the chart is, let's say, 800px wide, it does not make sense to fetch thousands of rows from the database if the time-window is quite big. I cannot show more than 800 values on this chart anyhow.
So, is there a way to reduce the resultset directly on the DB side?
I know "average" and "sum" etc. as aggregate functions. But how can I, for example, aggregate 100k rows from a big time-window down to, let's say, 800 final rows?
Just getting those 100k rows and letting the chart do the magic is not the preferred option. Transfer size is one reason why this is not an option.
Isn't there something on the DB side I can use?
Something like avg() to shrink X rows to Y averaged rows?
Or some simple magic to just skip every nth row to shrink X to Y?
update:
Although I'm using MySQL right now, I'm not tied to it. If PostgreSQL, for instance, provides a feature that could solve the issue, I'm willing to switch DBs.
update2:
I maybe found a possible solution: https://mike.depalatis.net/blog/postgres-time-series-database.html
See section "Data aggregation".
The key is not to use a Unix timestamp but a date, truncate it with date_trunc(), average the values, and group by the truncated date. This could work for me, but it would require a rework of my table structure. Hmm... maybe there's more... still researching...
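A minimal sketch of the aggregation that article describes, assuming a PostgreSQL table with a timestamptz column named ts (the column names here are illustrative):
SELECT date_trunc('day', ts) AS bucket,
       avg(value) AS avg_value
FROM measuredata
WHERE entityid = 38
  AND ts >= '2019-01-25'
GROUP BY bucket
ORDER BY bucket;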
update3:
Inspired by update 2, I came up with this query:
SELECT (`timestamp` - (`timestamp` % 86400)) AS aggtimestamp, `entity`, `value`
FROM `measuredata`
WHERE `entity` = 38 AND `timestamp` > UNIX_TIMESTAMP('2019-01-25')
GROUP BY aggtimestamp
It works, but my DB/index/structure doesn't seem really optimized for this: the query for the last year took ~75 sec (slow test machine) and finally returned only one value per day. This can be combined with avg(value), but that further increases the query time (~82 sec). I will see if it's possible to optimize this further. But I now have an idea of how "downsampling" data works, especially aggregation in combination with "group by".
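A composite index covering the filter columns is usually the first thing to try for this kind of query, so the engine can seek straight to the entity and time range rather than scan; a sketch, assuming MySQL (the index name is illustrative):
CREATE INDEX idx_entity_ts ON measuredata (`entity`, `timestamp`);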
There is probably no efficient way to do this. But, if you want, you can break the rows into equal-sized groups and then fetch, say, the first row from each group. Here is one method:
select md.*
from (select md.*,
             row_number() over (partition by tile order by timestamp) as seqnum
      from (select md.*, ntile(800) over (order by timestamp) as tile
            from measuredata md
            where . . .  -- your filtering conditions here
           ) md
     ) md
where seqnum = 1;
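An alternative that matches the "skip every nth row" idea from the question: number the rows and keep every nth one. A sketch, assuming a DB with window functions (MySQL 8+, PostgreSQL); the step of 125 is illustrative (100k rows / 800 points):
select t.*
from (select md.*,
             row_number() over (order by timestamp) as rn
      from measuredata md
      where . . .  -- your filtering conditions here
     ) t
where rn % 125 = 1;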

SQL Query Limit for DB2 AS/400 Version 4

I know the version is way too old (yea version 4!), but I have no choice.
How can I limit my query to, for example, only 100 rows on DB2 for AS/400?
FETCH FIRST n ROWS ONLY
and
ROW_NUMBER()
don't work.
Any ideas or workaround?
Here is a sample SQL query (does not work):
SELECT POLNOP FROM ZICACPTF.POLHDR FETCH FIRST 10 ROWS ONLY
It says
[SQL0199] Keyword FETCH not expected. Valid tokens: FOR WITH ORDER UNION OPTIMIZE.
There is no DBMS support for this operation; check the Version 4 DB2 UDB for AS/400 SQL Reference: no LIMIT, TOP, FIRST, ... reserved words.
You can try to limit rows via a WHERE clause, e.g. WHERE sequence BETWEEN 100 AND 200, but that is an unrealistic scenario.
The first workaround is via a cursor:
DECLARE ITERROWS INTEGER;
...
SET ITERROWS = 0;
WHILE (SUBSTR(SQLSTATE,1,2) = '00' AND ITERROWS < 100) DO
    ...
    SET ITERROWS = ITERROWS + 1;
END WHILE;
The second one is in your middleware language.
I hope someone posts a clever workaround, but, in my opinion, there isn't one.
Solution only for > V4R4
Using FETCH FIRST [n] ROWS ONLY:
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 10 ROWS ONLY;
Reference: publib.boulder.ibm.com
The difference I can see between your query and this example is that the example uses an ORDER BY clause. Do you have the possibility to add an ORDER BY? It should do the trick. Referencing: https://stackoverflow.com/a/16858430/1581725
To get ranges, or also only the first 10 rows, you'd have to use ROW_NUMBER() (since V5R4):
SELECT *
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY {{table field}}) AS ROWNUM, *
    FROM {{yourtable}}
) AS {{yourcursor}}
WHERE
    {{yourcursor}}.ROWNUM > 0 AND
    {{yourcursor}}.ROWNUM <= 10
Reference: blog.zanclus.com
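Applied to the query from the question (and assuming a V5R4+ system, with POLNOP as a sensible ordering column), that template would look something like this:
SELECT *
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY POLNOP) AS ROWNUM, P.*
    FROM ZICACPTF.POLHDR P
) AS C
WHERE C.ROWNUM > 0 AND C.ROWNUM <= 10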

Vague count in sql select statements

I guess this has been asked on the site before, but I can't find it.
I've seen on some sites that there is a vague count over the results of a search. For example, here on Stack Overflow, when you search for a question, it sometimes says "5000+ results"; in Gmail, when you search by keywords, it says "hundreds"; and Google says "approx. X results". Is this just a way to show the user an easy-to-understand huge number, or is it actually a fast way to count results that can be used in a database [I'm learning Oracle at the moment, 10g version]? Something like "hey, if you get more than 1k results, just stop and tell me there are more than 1k".
Thanks
PS. I'm new to databases.
Usually this is just a nice way to display a number.
I don't believe there is a way to do what you are asking for in SQL - count does not have an option for counting up until some number.
I also would not assume this is coming from SQL in either Gmail or Stack Overflow.
Most search engines will return a total number of matches to a search, and then let you page through results.
As for making an exact number more human readable, here is an example from Rails:
http://api.rubyonrails.org/classes/ActionView/Helpers/NumberHelper.html#method-i-number_to_human
With Oracle, you can always resort to analytic functions in order to calculate the exact number of rows about to be returned. This is an example of such a query:
SELECT inner_q.*, MAX(ROWNUM) OVER (PARTITION BY 1) AS TOTAL_ROWS
FROM (
  [... your own, sorted search query ...]
) inner_q  -- "inner" itself clashes with the JOIN keyword, hence the alias
This will give you the total number of rows for your specific subquery. When you want to apply paging as well, you can further wrap these SQL parts as such:
SELECT outer_q.* FROM (
  SELECT * FROM (
    SELECT inner_q.*, ROWNUM AS RNUM, MAX(ROWNUM) OVER (PARTITION BY 1) AS TOTAL_ROWS
    FROM (
      [... your own, sorted search query ...]
    ) inner_q
  )
  WHERE ROWNUM < :max_row
) outer_q
WHERE outer_q.RNUM > :min_row
Replace min_row and max_row by meaningful values. But beware that calculating the exact number of rows can be expensive when you're not filtering using UNIQUE SCAN or relatively narrow RANGE SCAN operations on indexes. Read more about this here: Speed of paged queries in Oracle
As others have said, you can always apply an absolute upper limit, such as 5000, to your query using a ROWNUM <= 5000 filter and then just indicate that there are 5000+ results. Note that Oracle can be very good at optimising queries when you apply ROWNUM filtering. Find some info on that subject here:
http://www.dba-oracle.com/t_sql_tuning_rownum_equals_one.htm
A vague count is a buffer that can be displayed promptly. If the user wants to see more results, they can request more.
It's a performance facility: after displaying the first results, sites like Google keep searching for more results.
I don't know how fast this will run, but you can try:
SELECT NULL FROM your_tables WHERE your_condition AND ROWNUM <= 1001
If the count of rows in the result equals 1001, then the total count of records is > 1000.
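To turn that into a single capped count, the same idea can be wrapped in a COUNT(*), using the same placeholder names as above:
SELECT COUNT(*) AS capped_count
FROM (SELECT NULL
      FROM your_tables
      WHERE your_condition
        AND ROWNUM <= 1001);
If capped_count comes back as 1001, show "1000+ results" instead of an exact number.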
This question gives some pretty good information.
When you do an SQL query you can set, for example,
LIMIT 0, 100
and you will only get the first hundred answers. So you can then tell your viewer that there are 100+ answers to their request.
As for Google, I couldn't say whether they really know there are more than 27'000'000'000 answers to a request, but I believe they do. Some standard requests have their results stored, with the update done in the background.

Sql query to get a non-contiguous subset of results

I'm writing a web application that should show very large result sets for a search query.
Say some queries will return 10,000 items.
I'd like to show those to users paginated; no problem so far: each page will be the result of a query with an appropriate LIMIT clause.
But I'd like to show clues about the results on each page of the paginated query: some data from the first item and some from the last.
This means that, for example, with a result of 10,000 items and a page size of 50 items, if the user asked for the first page I will need:
the first 50 items (the page requested by the user)
items 51 and 100 (the first and last of the second page)
items 101 and 150 (the first and last of the third page)
etc.
For efficiency reasons I want to avoid one query per row.
[edit] I also would prefer not to download 10,000 results if I only need 50 + 10000/50*2 = 450
The question is: is there a single query I can issue to the RDBMS (mysql, by the way, but I'd prefer a cross-db solution) that will return only the data I need?
I can't use server-side cursors, because not all DBs support them and I want my app to be database-agnostic.
Just for fun, here is the MSSQL version of it.
declare @pageSize as int; set @pageSize = 10;
declare @pageIndex as int; set @pageIndex = 0; /* first page */
WITH x AS
(
    select
        ROW_NUMBER() OVER (ORDER BY created ASC) AS RowNumber,
        *
    from yourTable
)
SELECT * FROM x
WHERE
    ((RowNumber <= (@pageIndex + 1) * @pageSize) AND (RowNumber >= @pageIndex * @pageSize + 1))
    OR RowNumber % @pageSize = 1  -- first row of each page
    OR RowNumber % @pageSize = 0  -- last row of each page
Note that an ORDER BY is provided in the OVER clause.
Also note that if you have a gazillion rows, your result set will have millions. You need to cap the result rows for practical reasons.
I have no idea how this could be solved in generic SQL. (My bet: no way. Even simple paging cannot be solved without DB-specific operators.)
UPDATE: I completely misread the initial question. You can do this using UNION and the LIMIT clause in MySQL, although it might be what you meant by "one query per row". The syntax would be like:
(select FOO from BAZ limit 50)
union all
(select FOO from BAZ limit 50, 1)
union all
(select FOO from BAZ limit 99, 1)
union all
(select FOO from BAZ limit 100, 1)
union all
(select FOO from BAZ limit 149, 1)
and so on and so forth. (The parentheses are required in MySQL to apply LIMIT to an individual SELECT inside a UNION, and UNION ALL avoids silently dropping duplicate values.) Since you're using UNION, you'll only need one round trip to the database. I'm not sure how MySQL will treat the various SELECT statements, though. It should be able to recognize that they are essentially the same query and use a cached query plan, but I don't work with MySQL enough to know whether that's a reasonable expectation of its optimizer.
Obviously, to build this query in a general fashion, you'll first need to run a count query so you can calculate what your offsets will be.
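For completeness, a sketch of that preliminary count, reusing the hypothetical BAZ table and a page size of 50:
select count(*) as total_rows,
       ceil(count(*) / 50) as page_count  -- the first row of page k (0-based) is then at offset 50*k
from BAZ;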
This is definitely not a tractable problem for standard SQL, since the paging logic requires nonstandard features.

LIMIT in FoxPro

I am attempting to pull a LOT of data from a FoxPro database, work with it, and insert it into a MySQL DB. It is too much to do all at once, so I want to do it in batches of, say, 10,000 records. What is the equivalent of LIMIT 5, 10 in FoxPro SQL? I would like a select statement like
select name, address from people limit 5, 10;
i.e. only get 10 results back, starting at the 5th. I have looked around online and found only mentions of TOP, which is obviously not of much use.
Take a look at the RecNo() function.
FoxPro does not have direct support for a LIMIT clause. It does have TOP nn, but that only provides the top-most records within a given sort order (or percentage), and even that has a limitation of 32K records returned (maximum).
You might be better off dumping the data as a CSV, or if that isn't practical (due to size issues), writing a small FoxPro script that auto-generates a series of BEGIN-INSERT(x10000)-COMMIT statements that dump to a series of text files. Of course, you would need a FoxPro development environment for this, so this may not apply to your situation...
Visual FoxPro does not support LIMIT directly.
I used the following query to get over the limitation:
SELECT TOP 100 * from PEOPLE WHERE RECNO() > 1000 ORDER BY ID;
where 100 is the limit and 1000 is the offset.
It is very easy to get around the missing LIMIT clause using the TOP clause; if you want to extract record _start to record _finish from a table named _test, you can do:
* assuming _start <= _finish; if not, you get a TOP clause error
_finish = MIN(RECCOUNT('_test'), _finish)
SELECT * FROM (SELECT TOP (_finish - _start + 1) * FROM (SELECT TOP _finish *, RECNO() AS _tempo FROM _test ORDER BY _tempo) xx ORDER BY _tempo DESC) yy ORDER BY _tempo
I had to convert a FoxPro database to MySQL a few years ago. What I did to solve this was add an auto-incrementing id column to the FoxPro table and use that as the row reference.
So then you can do something like:
select name, address from people where id >= 5 and id <= 10;
The FoxPro SQL documentation does not show anything similar to LIMIT.
Here, adapt this to your tables. Took me like 2 mins, I do this waaaay too often.
N1 - group by whatever, and make sure you get a max(id); you can use recno() to make one, sorted correctly.
N2 - joins N1 where the ID = max ID of N1; display the fields you want from N2.
Then, if you want to join to other tables, put all that in brackets, give it an alias, and include it in a join.
SELECT N1.reference, N1.OrderNoteCount, N2.notes_desc AS LastNote
FROM (
    SELECT reference, COUNT(reference) AS OrderNoteCount, MAX(notes_key) AS MaxNoteId
    FROM custnote
    WHERE reference != ''
    GROUP BY reference
) N1
JOIN (
    SELECT reference, COUNT(reference) AS OrderNoteCount, notes_key, notes_desc
    FROM custnote
    WHERE reference != ''
    GROUP BY reference, notes_key, notes_desc
) N2 ON N1.MaxNoteId = N2.notes_key
To expand on Eyvind's answer, I would create a program that uses the RecNo() function to pull records within a given range, say 10,000 records at a time.
You could then programmatically cycle through the large table in chunks of 10,000 records and perform your data load into your MySQL database.
By using the RecNo() function you can be certain not to insert rows more than once, and you are able to restart at a known point in the data load process. That by itself can be very handy in the event you need to stop and restart the load process.
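A sketch of one such chunk, reusing the people table from the question (the batch boundaries are illustrative, and this assumes record numbers stay stable, i.e. the table is not PACKed mid-run):
* second batch of 10,000 records
SELECT name, address FROM people WHERE RECNO() > 10000 AND RECNO() <= 20000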
Depending on the number of returned rows, and if you are using the .NET Framework, you can offset/limit the resulting DataTable in the following way:
dataTable = dataTable.AsEnumerable().Skip(offset).Take(limit).CopyToDataTable();
Remember to add a reference to the System.Data.DataSetExtensions assembly.