Emulate MySQL LIMIT clause in Microsoft SQL Server 2000 - sql

When I worked on the Zend Framework's database component, we tried to abstract the functionality of the LIMIT clause supported by MySQL, PostgreSQL, and SQLite. That is, creating a query could be done this way:
$select = $db->select();
$select->from('mytable');
$select->order('somecolumn');
$select->limit(10, 20);
When the database supports LIMIT, this produces an SQL query like the following:
SELECT * FROM mytable ORDER BY somecolumn LIMIT 10, 20
This was more complex for brands of database that don't support LIMIT (that clause is not part of the standard SQL language, by the way). If you can generate row numbers, you can make the whole query a derived table and use BETWEEN in the outer query. This was the solution for Oracle and IBM DB2. Microsoft SQL Server 2005 has a similar row-number function, so one can write the query this way:
SELECT z2.*
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS zend_db_rownum, z1.*
    FROM ( ...original SQL query... ) z1
) z2
WHERE z2.zend_db_rownum BETWEEN #offset+1 AND #offset+#count;
However, Microsoft SQL Server 2000 doesn't have the ROW_NUMBER() function.
So my question is, can you come up with a way to emulate the LIMIT functionality in Microsoft SQL Server 2000, solely using SQL? Without using cursors or T-SQL or a stored procedure. It has to support both arguments for LIMIT, both count and offset. Solutions using a temporary table are also not acceptable.
Edit:
The most common solution for MS SQL Server 2000 seems to be like the one below, for example to get rows 51 through 75:
SELECT TOP 25 *
FROM (
    SELECT TOP 75 *
    FROM table
    ORDER BY field ASC
) a
ORDER BY field DESC;
However, this doesn't work if the total result set is, say, 60 rows. The inner query returns all 60 rows, because those are within the top 75. Then the outer query returns rows 36-60, which doesn't fit the desired "page" of rows 51-75. Basically, this solution works unless you need the last "page" of a result set whose size doesn't happen to be a multiple of the page size.
Edit:
Another solution works better, but only if you can assume the result set includes a column that is unique:
SELECT TOP n *
FROM tablename
WHERE key NOT IN (
    SELECT TOP x key
    FROM tablename
    ORDER BY key
)
ORDER BY key;
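For example, with a page size of 25 and an offset of 50 (rows 51-75):
SELECT TOP 25 *
FROM tablename
WHERE key NOT IN (
    SELECT TOP 50 key
    FROM tablename
    ORDER BY key
)
ORDER BY key;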
Conclusion:
No general-purpose solution seems to exist for emulating LIMIT in MS SQL Server 2000. A good solution exists if you can use the ROW_NUMBER() function in MS SQL Server 2005.

Here is another solution, which works only in SQL Server 2005 and newer because it uses the EXCEPT operator. But I share it anyway.
If you want to get records 51 - 75, write:
SELECT * FROM (
    SELECT TOP 75 COL1, COL2
    FROM MYTABLE ORDER BY COL3
) AS foo
EXCEPT
SELECT * FROM (
    SELECT TOP 50 COL1, COL2
    FROM MYTABLE ORDER BY COL3
) AS bar

SELECT TOP n *
FROM tablename
WHERE key NOT IN (
    SELECT TOP x key
    FROM tablename
    ORDER BY key DESC
);

When you need LIMIT only, MS SQL Server has the equivalent TOP keyword, so that case is clear.
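For example, these two queries are equivalent:
-- MySQL / PostgreSQL / SQLite
SELECT * FROM mytable ORDER BY somecolumn LIMIT 10;
-- Microsoft SQL Server
SELECT TOP 10 * FROM mytable ORDER BY somecolumn;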
When you need LIMIT with OFFSET, you can try hacks like those described previously, but they all add some overhead, e.g. for ordering one way and then the other, or the expensive NOT IN operation.
I think all those cascades are unnecessary. The cleanest solution, in my opinion, would be to just use TOP without an offset on the SQL side, and then seek to the required starting record with the appropriate client method, like mssql_data_seek in PHP. While this isn't a pure SQL solution, I think it is the best one because it doesn't add any overhead (the skipped-over records will not be transferred over the network when you seek past them, if that is what worries you).

I would try to implement this in my ORM, as it is pretty simple there. If it really needs to be in SQL Server, then I would look at the code generated by LINQ to SQL for the following LINQ to SQL statement and go from there. The MSFT engineer who implemented that code was part of the SQL team for many years and knew what he was doing.
var result = myDataContext.mytable.Skip(pageIndex * pageSize).Take(pageSize);

Related

SQL Select, different than the last 10 records

I have a table called "dutyroster". I want to make a random selection from this table's "names" column, but I want the selection to be different from the last 10 records, so that the same guy is not given a second duty in 10 days. Is that possible?
Create a temporary table with only one column called oldnames, which will have no records initially. For each selection, execute a query like
select names from dutyroster where dutyroster.names not in (select oldnames from temporarytable) limit 10
and when execution is done, add the result set to the temporary table.
The other answer already here is addressing the portion of the question on how to avoid duplicating selections.
To accomplish the random part of the selection, leverage newid() directly within your select statement. I've made this sqlfiddle as an example.
SELECT TOP 10
newid() AS [RandomSortColumn],
*
FROM
dutyroster
ORDER BY
[RandomSortColumn] ASC
Keep executing the query, and you'll keep getting different results. Use the technique in the other answer for avoiding doubling a guy up.
The basic idea is to use a subquery to exclude the users from the last ten days, then sort the rest randomly:
select dr.*
from dutyroster dr
where dr.name not in (select dr2.name
from dutyroster dr2
where dr2.datetimecol >= date_sub(curdate(), interval 10 day)
)
order by rand()
limit 1;
Different databases may have different syntax for limit, rand(), and for the date/time functions. The above gives the structure of the query, but the functions may differ.
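For SQL Server specifically, a rough translation might look like this (a sketch; it assumes the same datetimecol column as above):
-- SQL Server version: NEWID() for random order, DATEADD for the date window
SELECT TOP 1 dr.*
FROM dutyroster dr
WHERE dr.name NOT IN (SELECT dr2.name
                      FROM dutyroster dr2
                      WHERE dr2.datetimecol >= DATEADD(day, -10, GETDATE()))
ORDER BY NEWID();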
If you have a large amount of data and performance is a concern, there are other (more complicated) ways to take a random sample.
You could use the TOP clause for SQL Server,
and for MySQL you could use the LIMIT clause.
Maybe this would help...
SELECT TOP number|percent column_name(s)
FROM table_name;
Source: http://www.w3schools.com/sql/sql_top.asp

How do I generate a (dummy) column of arbitrary length with MonetDB?

I would like to run the equivalent of PostgreSQL's
SELECT * FROM GENERATE_SERIES(1, 10000000)
I've read this:
http://blog.jooq.org/2013/11/19/how-to-create-a-range-from-1-to-10-in-sql/
But most suggestions there don't really take an arbitrary length - the structure of the query depends on the length, beyond just replacing a number. Also, some suggestions do not apply to MonetDB. So, what's my best course of action (if any)?
Notes:
- I'm using a version from February 2013. Answers about more recent features are also welcome, but are not exactly what I'm looking for.
- Assume the existing tables don't have enough lines; and do not assume that, say, a Cartesian product of the longest table with itself is sufficient (or alternatively, maybe that's too costly to perform).
Try with:
SELECT value
FROM sys.generate_series(initial_value, end_value, offset);
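For the query in the question, that would be something like the following (a sketch; whether the end value is inclusive may vary by MonetDB version, so check the bounds):
-- generate 1 through roughly 10000000 with step 1
-- note: if the upper bound is exclusive, use 10000001 instead
SELECT value
FROM sys.generate_series(1, 10000000, 1);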
I have to report that the function is quite unstable on Jul2015 release as is causing the server process to crash. Hope you have better luck.
If you want to generate an arbitrary numeric value, you can use:
SELECT rand();
Forgive me; I've never worked with MonetDB before. But the documentation leads me to believe you can solve this with the ROW_NUMBER function and a pre-populated table like SYS.COLUMNS.
SELECT ROW_NUMBER() OVER () AS rownum
FROM SYS.COLUMNS;
This falls into jooq.org's category of just taking random records from a “large enough” table.
PostgreSQL's generate_series function is elegant, but non-standard. It's absent in other mainstream engines like SQL Server, Oracle, and MySQL. Your version of MonetDB doesn't have it either.
MonetDB does have the ROW_NUMBER function, a close equivalent in standard SQL. It assigns a sequential integer to rows in a result set. It will output the correct values, but it needs some rows in your database already. A chicken and egg problem!
SYS.COLUMNS is a system metadata table that contains one row for every column in your database. Most "empty" relational databases still have hundreds of system columns that appear in tables like these.
If the first query produces more rows than you need, you can push it into a subquery and filter the intermediate result.
SELECT rownum
FROM (
SELECT ROW_NUMBER() OVER () AS rownum
FROM SYS.COLUMNS
) AS tally
WHERE rownum >= 1 AND rownum <= 10;
But what if you need to generate more rows than you have in SYS.COLUMNS? Unfortunately, the shape of the query does depend on how many rows you want to generate.
A common workaround in the Microsoft SQL Server community would be to join SYS.COLUMNS to itself. This will produce an intermediate table containing the square of the number of rows in the table. In practice, it's probably more rows than you'll ever need.
With a self-join, the solution looks like this:
SELECT rownum
FROM (
SELECT ROW_NUMBER() OVER () AS rownum
FROM SYS.COLUMNS AS a, SYS.COLUMNS AS b
) AS tally
WHERE rownum >= 1 AND rownum <= 100000;
Hopefully these queries are also relevant in MonetDB world!

Select random sampling from SQL Server quickly

I have a huge table of > 10 million rows. I need to efficiently grab a random sampling of 5000 from it. I have some constraints that reduce the total rows I am looking for to about 9 million.
I tried using order by NEWID(), but that query will take too long as it has to do a table scan of all rows.
Is there a faster way to do this?
If you can use a pseudo-random sampling and you're on SQL Server 2005/2008, then take a look at TABLESAMPLE. For instance, an example from SQL Server 2008 / AdventureWorks 2008 which works based on rows:
USE AdventureWorks2008;
GO
SELECT FirstName, LastName
FROM Person.Person
TABLESAMPLE (100 ROWS)
WHERE EmailPromotion = 2;
The catch is that TABLESAMPLE isn't exactly random, as it generates a given number of rows from each physical page. You may not get back exactly 5000 rows unless you limit with TOP as well. If you're on SQL Server 2000, you're going to have to either generate a temporary table that matches the primary key, or do it using a method based on NEWID().
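For example, to cap the result at an exact row count, one option (a sketch) is to oversample and trim with TOP:
-- oversample, then trim to exactly 5000 rows
SELECT TOP (5000) *
FROM Person.Person TABLESAMPLE (10000 ROWS);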
Have you looked into using the TABLESAMPLE clause?
For example:
select *
from HumanResources.Department tablesample (5 percent)
SQL Server 2000 solution, according to Microsoft (instead of the slow NEWID() on larger tables):
SELECT * FROM Table1
WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) AS int)) % 100) < 10
The SQL Server team at Microsoft realized that not being able to take random samples of rows easily was a common problem in SQL Server 2000; so, the team addressed the problem in SQL Server 2005 by introducing the TABLESAMPLE clause. This clause selects a subset of rows by choosing random data pages and returning all of the rows on those pages. However, for those of us who still have products that run on SQL Server 2000 and need backward-compatibility, or who need truly row-level randomness, the BINARY_CHECKSUM query is a very effective workaround.
Explanation can be found here:
http://msdn.microsoft.com/en-us/library/cc441928.aspx
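To get an exact sample size with this technique, one option (a sketch, not from the article) is to sample a slightly larger fraction and cap it with TOP:
-- ~0.1% of 9 million rows yields ~9000 candidates; TOP trims to exactly 5000
SELECT TOP 5000 *
FROM Table1
WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) AS int)) % 1000) < 1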
Yeah, tablesample is your friend (note that it's not random in the statistical sense of the word):
Tablesample at msdn

Is there efficient SQL to query a portion of a large table

The typical way of selecting data is:
select * from my_table
But what if the table contains 10 million records and you only want records 300,010 to 300,020?
Is there a way to create a SQL statement on Microsoft SQL that only gets 10 records at once?
E.g.
select * from my_table from records 300,010 to 300,020
This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.
SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking about a table with an identity field for the primary key, you can just say:
SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020
You should also know that selecting * is considered poor practice in many circles. They want you to specify the exact column list.
Try looking at info about pagination. Here's a short summary of it for SQL Server.
Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be
SELECT [columns] FROM table LIMIT 10 OFFSET 300010;
On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.
Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.
This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.
The rules of your partitition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID for example.
In order to select from a particular partition you would use a query similar to the following.
SELECT <Column Name1>, <Column Name2>, ...
FROM <Table Name>
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>
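For instance (a hypothetical example; Orders and pf_OrderDateRange are made-up names):
-- Orders is assumed to be partitioned on OrderDate via pf_OrderDateRange
SELECT OrderID, OrderDate
FROM Orders
WHERE $PARTITION.pf_OrderDateRange(OrderDate) = 3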
Take a look at the following white paper for more detailed information on partitioning in SQL Server 2005.
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope this helps however please feel free to pose further questions.
Cheers, John
I use wrapper queries to select the core query and then just isolate the ROW numbers that I wish to take from the query - this allows the SQL Server to do all the heavy lifting inside the CORE query and just pass out the small amount of the table that I have requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.
NOTE: The order clause is specified OUTSIDE the core query [sql_order_clause]
w1 and w2 are the aliases of the derived tables (the wrappers) that SQL Server builds around the core query.
SELECT w1.*
FROM (
    SELECT w2.*,
           ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
    FROM (
        <!--- CORE QUERY START --->
        SELECT [columns]
        FROM [table_name]
        WHERE [sql_string]
        <!--- CORE QUERY END --->
    ) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]
This method has hugely optimized my database systems. It works very well.
IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query, as fetching unnecessary data in these CORE queries can cost you serious overhead.
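A filled-in instance of the template might look like this (Orders, OrderDate, and Status are hypothetical names; this pages rows 11-20):
SELECT w1.*
FROM (
    SELECT w2.*,
           ROW_NUMBER() OVER (ORDER BY OrderDate DESC) AS ROW
    FROM (
        SELECT OrderID, CustomerID, OrderDate
        FROM Orders
        WHERE Status = 'shipped'
    ) AS w2
) AS w1
WHERE ROW BETWEEN 11 AND 20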
Use TOP to select only a limited amount of rows, like:
SELECT TOP 10 * FROM my_table WHERE ID >= 300010
Add an ORDER BY if you want the results in a particular order.
To be efficient there has to be an index on the ID column.
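A minimal sketch of such an index (assuming ID isn't already the primary key / clustered index):
-- lets the WHERE ID >= 300010 filter seek instead of scanning
CREATE INDEX IX_my_table_ID ON my_table (ID);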

Is there a method for paging using ANSI SQL only?

I know:
Firebird: FIRST and SKIP;
MySQL: LIMIT;
SQL Server: ROW_NUMBER();
Does someone knows a SQL ANSI way to perform result paging?
See Limit—with offset section on this page: http://troels.arvin.dk/db/rdbms/
BTW, Firebird also supports ROWS clause since version 2.0
No official way, no.*
Generally you'll want to have an abstracted-out function in your database access layer that will cope with it for you; give it a hint that you're on MySQL or PostgreSQL and it can add a 'LIMIT' clause to your query, or rownum over a subquery for Oracle and so on. If it doesn't know it can do any of those, fall back to fetching the lot and returning only a slice of the full list.
*: eta: there is now, in ANSI SQL:2003. But it's not globally supported, it often performs badly, and it's a bit of a pain because you have to move/copy your ORDER BY into a new place in the statement, which makes it harder to wrap automatically:
SELECT * FROM (
    SELECT thiscol, thatcol, ROW_NUMBER() OVER (ORDER BY mtime DESC, id) AS rownumber
    FROM mytable
) AS tmp
WHERE rownumber BETWEEN 10 AND 20 -- care, 1-based index
ORDER BY rownumber;
There is also the "FETCH FIRST n ROWS ONLY" suffix in SQL:2008 (and DB2, where it originated). But like the TOP prefix in SQL Server, and the similar syntax in Informix, you can't specify a start point, so you still have to fetch and throw away some rows.
Nowadays there is a standard way, not necessarily an ANSI standard (people gave many answers; I think this is the least verbose one):
SELECT * FROM t1
WHERE ID > :lastId
ORDER BY ID
FETCH FIRST 3 ROWS ONLY
It's not supported by all databases, though; below is a list of all databases that support it:
MariaDB: Supported since 5.1 (usually, limit/offset is used)
MySQL: Supported since 3.19.3 (usually, limit/offset is used)
PostgreSQL: Supported since PostgreSQL 8.4 (usually, limit/offset is used)
SQLite: Supported since version 2.1.0
Db2 LUW: Supported since version 7
Oracle: Supported since version 12c (uses subselects with the row_num function)
Microsoft SQL Server: Supported since 2012 (traditionally, top-N is used)
You can use the offset style of course, although you could have performance issues
SELECT * FROM t1
ORDER BY ID
OFFSET 0 ROWS
FETCH FIRST 3 ROWS ONLY
It has different support:
MariaDB: Supported since 5.1
MySQL: Supported since 4.0.6
PostgreSQL: Supported since PostgreSQL 6.5
SQLite: Supported since version 2.1.0
Db2 LUW: Supported since version 11.1
Oracle: Supported since version 12c
Microsoft SQL Server: Supported since 2012
Yes (SQL ANSI 2003): with feature E121-10 combined with feature F861, you have:
ORDER BY column OFFSET n1 ROWS FETCH NEXT n2 ROWS ONLY;
Like:
SELECT Name, Address FROM Employees ORDER BY Salary OFFSET 2 ROWS
FETCH NEXT 2 ROWS ONLY;
Examples:
postgres:
https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=e25bb5235ccce77c4f950574037ef379
oracle:
https://dbfiddle.uk/?rdbms=oracle_21&fiddle=07d54808407b9dbd2ad209f2d0fe7ed7
sqlserver:
https://dbfiddle.uk/?rdbms=sqlserver_2019l&fiddle=e25bb5235ccce77c4f950574037ef379
db2:
https://dbfiddle.uk/?rdbms=db2_11.1&fiddle=e25bb5235ccce77c4f950574037ef379
YugabyteDB:
https://dbfiddle.uk/?rdbms=yugabytedb_2.8&fiddle=e25bb5235ccce77c4f950574037ef379
Unfortunately, MySQL does not support this syntax; you need something like:
ORDER BY column LIMIT n1 OFFSET n2
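Applied to the earlier Employees example, the MySQL form would be:
SELECT Name, Address FROM Employees ORDER BY Salary LIMIT 2 OFFSET 2;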
But MariaDB does:
https://dbfiddle.uk/?rdbms=mariadb_10.6&fiddle=e25bb5235ccce77c4f950574037ef379
I know I'm very, very late to this question, but it's still one of the top results for this issue.
However, one response missing from this question is what I believe is the "correct" ANSI SQL method for paging, at least if you want maximum portability: not to use LIMIT/OFFSET/FIRST etc. at all, but to instead do something like:
SELECT *
FROM MyTable
WHERE ColumnA > ?
ORDER BY ColumnA ASC
Where ? is a parameter using a library that supports them (such as PDO in PHP).
The idea here is simple: when fetching the first page, we pass a parameter that will match every possible row, e.g. if ColumnA is text, we would pass an empty string (''). We then read in as many results as we want, and then discard the rest. This may mean some extra rows are fetched behind the scenes, but our priority here is compatibility.
In order to fetch the next page, we take the value of ColumnA from the last row in our results, and pass it in as the parameter, this way we will only fetch values that appear after it. To run the same query in the other direction, just swap > for < and ASC for DESC.
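For example, if the last row of the current page had ColumnA = 'mango', the next page would be fetched with:
SELECT *
FROM MyTable
WHERE ColumnA > 'mango'
ORDER BY ColumnA ASC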
There are some important caveats of this approach:
Since we're using a condition, your DBMS is free to use an index to optimise the request, which can actually be faster than some "proper" pagination methods, as you eliminate rows rather than advancing past them.
This form of paging is more tightly anchored than row number based methods. When using row number offsets, if you offset into the table, but new rows are added that sort earlier than the current page, then it will cause results to be shifted into later pages. For example, if your current page's last row is mango but since fetching it rows are added for apple and carrot, then mango may now appear on the next page as well, as it has been shifted in the sort order. By using a condition of ColumnA > 'mango' this can't happen. This can be very useful in cases where you are sorting by a DATETIME with frequent updates occurring.
This trick can be made to work in both directions, by reversing the sort order as mentioned when going backwards (flip > to < and ASC to DESC) and passing in the value of ColumnA from the first row of each page of results, rather than the last. Note that if values were added to your table, it may mean that your first page may be shorter, but this is a fairly minor issue.
To be sure you're on the last (or first) page, you should fetch N + 1 rows, where N is the number of rows you want per page, this way you can detect whether there are more rows to fetch.
This method works best if you have a single column with only unique values, but it is still possible to use in more complex cases, so long as you can expand your ORDER BY clause (and WHERE condition) to include enough columns that every row is unique, as sketched below.
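For instance, if ColumnA alone isn't unique, a tie-breaker column (here a hypothetical Id) can be added to both clauses:
-- keyset paging with a tie-breaker: rows after ('mango', 42)
SELECT *
FROM MyTable
WHERE ColumnA > 'mango'
   OR (ColumnA = 'mango' AND Id > 42)
ORDER BY ColumnA ASC, Id ASC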
So it's not without a few catches, but it's by far the most compatible method as every SQL database will support it.
Insert your results into a storage table, ordered how you'd like to display them, but with a new IDENTITY column.
Now SELECT from that table just the range of IDs you're interested in.
(Be sure to clean out the table when you're done)
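In T-SQL that might look something like the following sketch (#staging, Employees, and the columns are hypothetical; note also that identity assignment order during SELECT ... INTO is not strictly guaranteed):
-- stage the ordered results with a generated row number
SELECT IDENTITY(int, 1, 1) AS RowNo, Name, Salary
INTO #staging
FROM Employees
ORDER BY Salary;
-- pull just the page you want
SELECT Name, Salary FROM #staging WHERE RowNo BETWEEN 41 AND 50;
DROP TABLE #staging;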
Or do it on the client, as anything to do with presentation should not normally be done on the SQL Server (in my opinion)
ANSI SQL example:
offset=41, fetchsize=10
SELECT TOP(10) *
FROM table1
WHERE table1.ID NOT IN (SELECT TOP(40) table1.ID FROM table1 ORDER BY table1.ID)
ORDER BY table1.ID
For paging, we need a RowNo column to filter on - computed over a field like id - with two variables like #PageNo and #PageRows. So I use this query:
SELECT *
FROM (
    SELECT *, (SELECT COUNT(1)
               FROM aTable ti
               WHERE ti.id <= t.id) AS RowNo
    FROM aTable t) tr
WHERE
    tr.RowNo >= (#PageNo - 1) * #PageRows + 1
    AND
    tr.RowNo <= #PageNo * #PageRows
BTW, Troels, PostgreSQL supports LIMIT/OFFSET.