Is there a way to do paging using only ANSI SQL? - sql

I know:
Firebird: FIRST and SKIP;
MySQL: LIMIT;
SQL Server: ROW_NUMBER();
Does anyone know an ANSI SQL way to perform result paging?

See the Limit—with offset section on this page: http://troels.arvin.dk/db/rdbms/
BTW, Firebird has also supported the ROWS clause since version 2.0

No official way, no.*
Generally you'll want an abstracted-out function in your database access layer that copes with it for you: give it a hint that you're on MySQL or PostgreSQL and it can add a 'LIMIT' clause to your query, use ROWNUM over a subquery for Oracle, and so on. If it doesn't know how to do any of those, fall back to fetching everything and returning only a slice of the full list.
*: eta: there is now, in ANSI SQL:2003. But it's not globally supported, it often performs badly, and it's a bit of a pain because you have to move/copy your ORDER BY into a new place in the statement, which makes it harder to wrap automatically:
SELECT * FROM (
    SELECT thiscol, thatcol, ROW_NUMBER() OVER (ORDER BY mtime DESC, id) AS rownumber
    FROM mytable
) numbered
WHERE rownumber BETWEEN 10 AND 20 -- care, 1-based index
ORDER BY rownumber;
There is also the "FETCH FIRST n ROWS ONLY" suffix in SQL:2008 (and DB2, where it originated). But like the TOP prefix in SQL Server, and the similar syntax in Informix, you can't specify a start point, so you still have to fetch and throw away some rows.
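For example, a minimal sketch of that suffix, reusing the table and columns from the query above:
SELECT thiscol, thatcol
FROM mytable
ORDER BY mtime DESC, id
FETCH FIRST 10 ROWS ONLY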

Nowadays there is a standard, though not necessarily an ANSI standard (people gave many answers; I think this is the least verbose one):
SELECT * FROM t1
WHERE ID > :lastId
ORDER BY ID
FETCH FIRST 3 ROWS ONLY
It's not supported by all databases, though. Below is a list of the databases that support it:
MariaDB: Supported since 5.1 (usually, limit/offset is used)
MySQL: Supported since 3.19.3 (usually, limit/offset is used)
PostgreSQL: Supported since PostgreSQL 8.4 (usually, limit/offset is used)
SQLite: Supported since version 2.1.0
Db2 LUW: Supported since version 7
Oracle: Supported since version 12c (traditionally, subselects with the ROWNUM pseudocolumn are used; see the sketch after this list)
Microsoft SQL Server: Supported since 2012 (traditionally, top-N is used)
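For comparison, a rough sketch of that traditional Oracle idiom, equivalent to the FETCH FIRST query above (ROWNUM is applied to the already-ordered subquery):
SELECT *
FROM (SELECT * FROM t1 WHERE ID > :lastId ORDER BY ID)
WHERE ROWNUM <= 3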
You can of course use the OFFSET style instead, although you may run into performance issues:
SELECT * FROM t1
ORDER BY ID
OFFSET 0 ROWS
FETCH FIRST 3 ROWS ONLY
Support for this variant differs:
MariaDB: Supported since 5.1
MySQL: Supported since 4.0.6
PostgreSQL: Supported since PostgreSQL 6.5
SQLite: Supported since version 2.1.0
Db2 LUW: Supported since version 11.1
Oracle: Supported since version 12c
Microsoft SQL Server: Supported since 2012

Yes: ANSI SQL feature E121-10, combined with feature F861, gives you:
ORDER BY column OFFSET n1 ROWS FETCH NEXT n2 ROWS ONLY;
Like:
SELECT Name, Address FROM Employees ORDER BY Salary OFFSET 2 ROWS
FETCH NEXT 2 ROWS ONLY;
Examples:
postgres:
https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=e25bb5235ccce77c4f950574037ef379
oracle:
https://dbfiddle.uk/?rdbms=oracle_21&fiddle=07d54808407b9dbd2ad209f2d0fe7ed7
sqlserver:
https://dbfiddle.uk/?rdbms=sqlserver_2019l&fiddle=e25bb5235ccce77c4f950574037ef379
db2:
https://dbfiddle.uk/?rdbms=db2_11.1&fiddle=e25bb5235ccce77c4f950574037ef379
YugabyteDB:
https://dbfiddle.uk/?rdbms=yugabytedb_2.8&fiddle=e25bb5235ccce77c4f950574037ef379
Unfortunately, MySQL does not support this syntax; you need something like:
ORDER BY column LIMIT n2 OFFSET n1
(note that LIMIT takes the row count and OFFSET the starting point, the reverse order of the standard syntax above)
But MariaDB does:
https://dbfiddle.uk/?rdbms=mariadb_10.6&fiddle=e25bb5235ccce77c4f950574037ef379

I know I'm very, very late to this question, but it's still one of the top results for this issue.
However, one response missing from this question is that, I believe, the "correct" ANSI SQL method for paging, at least if you want maximum portability, is not to use LIMIT/OFFSET/FIRST etc. at all, but to instead do something like:
SELECT *
FROM MyTable
WHERE ColumnA > ?
ORDER BY ColumnA ASC
Where ? is a parameter using a library that supports them (such as PDO in PHP).
The idea here is simple: when fetching the first page, we pass a parameter that will match every possible row, e.g. if ColumnA is text, we would pass an empty string (''). We then read in as many results as we want, and then release the rest. This may mean some extra rows are fetched behind the scenes, but our priority here is compatibility.
In order to fetch the next page, we take the value of ColumnA from the last row in our results and pass it in as the parameter; this way we will only fetch values that appear after it. To run the same query in the other direction, just swap > for < and ASC for DESC.
There are some important things to note about this approach:
Since we're using a condition, your DBMS is free to use an index to optimise the request, which can actually be faster than some "proper" pagination methods, as you eliminate rows rather than advancing past them.
This form of paging is more tightly anchored than row number based methods. When using row number offsets, if you offset into the table, but new rows are added that sort earlier than the current page, then it will cause results to be shifted into later pages. For example, if your current page's last row is mango but since fetching it rows are added for apple and carrot, then mango may now appear on the next page as well, as it has been shifted in the sort order. By using a condition of ColumnA > 'mango' this can't happen. This can be very useful in cases where you are sorting by a DATETIME with frequent updates occurring.
This trick can be made to work in both directions, by reversing the sort order as mentioned when going backwards (flip > to < and ASC to DESC) and passing in the value of ColumnA from the first row of each page of results, rather than the last. Note that if values were added to your table, it may mean that your first page may be shorter, but this is a fairly minor issue.
To be sure you're on the last (or first) page, you should fetch N + 1 rows, where N is the number of rows you want per page, this way you can detect whether there are more rows to fetch.
This method works best if you have a single column with only unique values, but it is still possible to use in more complex cases, so long as you can expand your ORDER BY clause (and WHERE condition) to include enough columns that every row is unique; see the sketch below.
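A sketch of that multi-column form, assuming a unique ID column is available to break ties (read N + 1 rows and discard the extras client-side, as described above):
SELECT ID, ColumnA
FROM MyTable
WHERE ColumnA > ?
   OR (ColumnA = ? AND ID > ?)
ORDER BY ColumnA ASC, ID ASC
The first two parameters take ColumnA from the last row of the previous page, and the third takes its ID.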
So it's not without a few catches, but it's by far the most compatible method as every SQL database will support it.

Insert your results into a storage table, ordered how you'd like to display them, but with a new IDENTITY column.
Now SELECT from that table just the range of IDs you're interested in.
(Be sure to clean out the table when you're done)
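A rough T-SQL sketch of this approach (table and column names invented; note that INSERT ... SELECT with ORDER BY controls the order in which IDENTITY values are assigned):
CREATE TABLE #PagedResults (
    RowId INT IDENTITY(1, 1),
    Name VARCHAR(100),
    Address VARCHAR(200)
);

INSERT INTO #PagedResults (Name, Address)
SELECT Name, Address
FROM Employees
ORDER BY Name; -- identity values follow this order

SELECT Name, Address
FROM #PagedResults
WHERE RowId BETWEEN 11 AND 20; -- just the range of IDs you're interested in

DROP TABLE #PagedResults; -- clean out the table when you're done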
Or do it on the client, as anything to do with presentation should not normally be done on the SQL Server (in my opinion)

Example using TOP (strictly T-SQL rather than ANSI SQL), with offset=41, fetchsize=10; the ORDER BY clauses are needed to make the result deterministic:
SELECT TOP(10) *
FROM table1
WHERE table1.ID NOT IN (SELECT TOP(40) t2.ID FROM table1 t2 ORDER BY t2.ID)
ORDER BY table1.ID

For paging we need a RowNo column to filter on; it should be computed over a field like id, using two variables such as #PageNo and #PageRows. So I use this query (note: the correlated COUNT over ti.id <= t.id yields a 1-based row number, assuming id is unique):
SELECT *
FROM (
    SELECT t.*, (SELECT COUNT(1)
                 FROM aTable ti
                 WHERE ti.id <= t.id) AS RowNo
    FROM aTable t) tr
WHERE
    tr.RowNo >= (#PageNo - 1) * #PageRows + 1
    AND
    tr.RowNo <= #PageNo * #PageRows
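Where window functions are available, an equivalent sketch (same variables assumed) avoids the correlated subquery, which otherwise re-scans the table for every row:
SELECT *
FROM (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY t.id) AS RowNo
    FROM aTable t) tr
WHERE
    tr.RowNo >= (#PageNo - 1) * #PageRows + 1
    AND
    tr.RowNo <= #PageNo * #PageRows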

BTW, Troels, PostgreSQL supports LIMIT/OFFSET

Related

Duplicate columns in Oracle query using row limiting clause

Since Oracle 12c, we can finally use the SQL standard row limiting clause like this:
SELECT * FROM t FETCH FIRST 10 ROWS ONLY
Now, in Oracle 12.1, there was a limitation that is quite annoying when joining tables: it's not possible to have two columns of the same name in the SELECT clause when using the row limiting clause. E.g. this raises ORA-00918 in Oracle 12.1:
SELECT t.id, u.id FROM t, u FETCH FIRST 10 ROWS ONLY
This restriction is documented in the manual for all of versions 12.1, 12.2, and 18.0.
The workaround is obviously to alias the columns
SELECT t.id AS t_id, u.id AS u_id FROM t, u FETCH FIRST 10 ROWS ONLY
Or to resort to "classic" pagination using ROWNUM or window functions.
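For instance, a window-function sketch of that workaround (the inner aliases make the projected columns unambiguous):
SELECT *
FROM (
    SELECT t.id AS t_id, u.id AS u_id,
           ROW_NUMBER() OVER (ORDER BY t.id) AS rn
    FROM t, u
)
WHERE rn <= 10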
Curiously, though, the original query with ambiguous ID columns runs just fine from Oracle 12.2 onwards. Is this a documentation bug, or an undocumented feature?
It seems that in this case, when you use the row limiting clause, Oracle internally calls the ROW_NUMBER() function, using the column name in the OVER clause, like ROW_NUMBER() OVER (ORDER BY ID). Because of this you get the ORA-00918 error.
I noticed that you have an implicit join. It would be interesting to see if the problem goes away when joining explicitly. I'm wondering if behind the scenes Oracle is doing a join based on id=id and not using the table aliases you assigned them.
That would also explain the column aliases fixing the issue. Try explicitly joining; that could force Oracle to use the table aliases and resolve the ambiguity it thinks it sees.

What is the SQL standard way to limit the number of return values?

I tried to find the standard way to limit the number of return values of a select query, but I cannot find it in the BNF. Every DBMS seems to define its own way. Is there a standard way? And if not, why is it not worth standardizing?
It is standardized.
The SQL standard defines the following syntax:
select *
from some_table
order by id
fetch first 42 rows only;
Alternatively, to start at a row other than the first one:
select *
from some_table
order by id
offset 42 rows
fetch first 42 rows only;
This was introduced in SQL:2008
However not every DBMS supports the standard for this. Actually no DBMS fully supports everything that is defined in the standard. Some ignore the standard more than others.
According to Wikipedia, the following DBMS support this:
PostgreSQL (8.4)
Oracle 12c
IBM DB2
SQL Server 2012
HSQLDB 2.0
H2
CA DATACOM/DB 11
Sybase SQL Anywhere
EffiProz

How do I generate a (dummy) column of arbitrary length with MonetDB?

I would like to run the equivalent of PostgreSQL's
SELECT * FROM GENERATE_SERIES(1, 10000000)
I've read this:
http://blog.jooq.org/2013/11/19/how-to-create-a-range-from-1-to-10-in-sql/
But most suggestions there don't really take an arbitrary length - the query depends on the length in ways other than just replacing a number. Also, some suggestions do not apply to MonetDB. So, what's my best course of action (if any)?
Notes:
- I'm using a version from February 2013. Answers about more recent features are also welcome, but are not exactly what I'm looking for.
- Assume the existing tables don't have enough lines; and do not assume that, say, a Cartesian product of the longest table with itself is sufficient (or alternatively, maybe that's too costly to perform).
Try with:
SELECT value
FROM sys.generate_series(initial_value, end_value, offset);
I have to report that the function is quite unstable on the Jul2015 release, as it causes the server process to crash. Hope you have better luck.
If you want to generate an arbitrary numeric value you can use:
SELECT rand();
Forgive me; I've never worked with MonetDB before. But the documentation leads me to believe you can solve this with the ROW_NUMBER function and a pre-populated table like SYS.COLUMNS.
SELECT ROW_NUMBER() OVER () AS rownum
FROM SYS.COLUMNS;
This falls into jooq.org's category of just taking random records from a “large enough” table.
PostgreSQL's generate_series function is elegant, but non-standard. It's absent in other mainstream engines like SQL Server, Oracle, and MySQL. Your version of MonetDB doesn't have it either.
MonetDB does have the ROW_NUMBER function, a close equivalent in standard SQL. It assigns a sequential integer to rows in a result set. It will output the correct values, but it needs some rows in your database already. A chicken and egg problem!
SYS.COLUMNS is a system metadata table that contains one row for every column in your database. Most "empty" relational databases still have hundreds of system columns that appear in tables like these.
If the first query produces more rows than you need, you can push it into a subquery and filter the intermediate result.
SELECT rownum
FROM (
SELECT ROW_NUMBER() OVER () AS rownum
FROM SYS.COLUMNS
) AS tally
WHERE rownum >= 1 AND rownum <= 10;
But what if you need to generate more rows than you have in SYS.COLUMNS? Unfortunately, the shape of the query does depend on how many rows you want to generate.
A common workaround in the Microsoft SQL Server community would be to join SYS.COLUMNS to itself. This will produce an intermediate table containing the square of the number of rows in the table. In practice, it's probably more rows than you'll ever need.
With a self-join, the solution looks like this:
SELECT rownum
FROM (
SELECT ROW_NUMBER() OVER () AS rownum
FROM SYS.COLUMNS AS a, SYS.COLUMNS AS b
) AS tally
WHERE rownum >= 1 AND rownum <= 100000;
Hopefully these queries are also relevant in MonetDB world!

Is there a UniData SQL equivalent to the UniQuery SAMPLE keyword?

I'm using UniData 6. Is there a UniData SQL equivalent to the UniQuery SAMPLE keyword?
Using UniQuery, I've always been able to do:
SELECT CUST BY NAME SAMPLE 1
and it would give me the record with the first alphabetical name.
In UniData SQL, I'd like to be able to do something like:
SELECT NAME FROM CUST ORDER BY NAME SAMPLE 1;
...or, as in other SQL databases...
SELECT TOP 1 NAME FROM CUST ORDER BY NAME;
and get just the name of the customer who's listed first alphabetically. Is there a keyword like this?
Unfortunately, no, there does not appear to be a UniSQL equivalent to the UniQuery SAMPLE keyword. UniSQL consists of a subset of the ANSI SQL-92 standard, with some extensions to support multivalue. However, ANSI SQL-92 does not contain a standard for limiting the result set returned by a query, which is why various DBMSs have different syntax for doing so.
ANSI SQL-2008 added the FETCH FIRST clause, which is the standard way of limiting the number of rows returned by a query. It would require a pretty significant update to bring UniSQL up to recent standards, since it is now 20+ years behind, and there doesn't seem to be enough demand in the user community to undertake that effort.
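For reference, the SQL:2008 form of the query from the question would be:
SELECT NAME FROM CUST ORDER BY NAME FETCH FIRST 1 ROW ONLY;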
Depending on your file's schema, you may be able to apply a workaround. If you are using an auto-incrementing key, you could use a syntax such as:
SELECT foo
FROM bar
WHERE #ID <= 10
The above query would apply a de facto limit to the number of rows returned.
SELECT will usually only apply to record IDs. If you want to list out attributes, try LIST: for instance, LIST INVENTORY PROD_NAME PRICE QTY SAMPLE will return the first 10 product names, prices, and quantities.

Is there efficient SQL to query a portion of a large table

The typical way of selecting data is:
select * from my_table
But what if the table contains 10 million records and you only want records 300,010 to 300,020?
Is there a way to create a SQL statement on Microsoft SQL that only gets 10 records at once?
E.g.
select * from my_table from records 300,010 to 300,020
This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.
SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking a table with an identity field for the primary key, you can just say:
SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020
You should also know that selecting * is considered poor practice in many circles; they want you to specify the exact column list.
Try looking at info about pagination. Here's a short summary of it for SQL Server.
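For completeness, a sketch of the OFFSET/FETCH syntax such summaries describe, available since SQL Server 2012 (the offset is zero-based, so this returns rows 300,010 onwards):
SELECT *
FROM my_table
ORDER BY ID
OFFSET 300009 ROWS
FETCH NEXT 10 ROWS ONLY;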
Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be
SELECT [columns] FROM table LIMIT 10 OFFSET 300010;
On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.
Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.
This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.
The rules of your partitition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID for example.
In order to select from a particular partition you would use a query similar to the following.
SELECT <Column Name1>, <Column Name2>, ...
FROM <Table Name>
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>
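A hypothetical instantiation (partition function pfOrderDateRange and table Orders invented for illustration):
SELECT OrderId, OrderDate
FROM Orders
WHERE $PARTITION.pfOrderDateRange(OrderDate) = 3;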
Take a look at the following white paper for more detailed information on partitioning in SQL Server 2005.
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope this helps however please feel free to pose further questions.
Cheers, John
I use wrapper queries to select the core query and then just isolate the ROW numbers that I wish to take from the query - this allows the SQL server to do all the heavy lifting inside the CORE query and just pass out the small amount of the table that I have requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.
NOTE: The order clause is specified OUTSIDE the core query [sql_order_clause]
w1 and w2 are TEMPORARY tables created by the SQL server as the wrapper tables.
SELECT
    w1.*
FROM (
    SELECT w2.*,
           ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
    FROM (
        -- CORE QUERY START
        SELECT [columns]
        FROM [table_name]
        WHERE [sql_string]
        -- CORE QUERY END
    ) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]
This method has hugely optimized my database systems. It works very well.
IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query as fetching unnecessary data in these CORE queries can cost you serious overhead
Use TOP to select only a limited amount of rows, like:
SELECT TOP 10 * FROM my_table WHERE ID >= 300010
Add an ORDER BY if you want the results in a particular order.
To be efficient there has to be an index on the ID column.