I am new to SQL and I have a large table with several hundred rows, and I need to view all of its rows. Is there a command in SQL that acts like the less command in Linux and lets me step through the output of a SELECT statement one screen height at a time? The pseudo-code for what I'm after would be, for example:
SELECT * from table less
What you're looking for is called "paging" or "pagination".
In MySQL and PostgreSQL it's LIMIT n OFFSET m: https://www.postgresql.org/docs/8.3/static/queries-limit.html
In SQL Server it's OFFSET m FETCH NEXT n: https://technet.microsoft.com/en-us/library/gg699618(v=sql.110).aspx
This QA has a more thorough answer: How universal is the LIMIT statement in SQL?
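To make the paging idea concrete, here is a minimal sketch that steps through a result set one "screen" at a time with LIMIT ... OFFSET .... It assumes SQLite (through Python's built-in sqlite3 module) only because that is easy to run anywhere; the table items, its columns, and the helper iter_pages are made up for the illustration.

```python
import sqlite3


def iter_pages(conn, page_size):
    """Yield successive pages of rows using LIMIT ... OFFSET ...."""
    offset = 0
    while True:
        rows = conn.execute(
            # ORDER BY makes the page boundaries deterministic.
            "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += page_size


# Hypothetical demo data: 250 rows in an in-memory database.
paging_conn = sqlite3.connect(":memory:")
paging_conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
paging_conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 251)],
)

pages = list(iter_pages(paging_conn, page_size=100))
```

Each iteration issues one query, so a client could print a page, wait for a keypress, and then ask for the next one, much like less does.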
TOP N is not part of ANSI SQL, but SQL Server and MS Access support it:
SELECT TOP 10 * from table
Certain other dialects of SQL like SQLite support LIMIT
In DB2, you need to use SELECT * from TABLE FETCH FIRST 10 rows only
I have a set of around 50k numbers in my table. I want to split or fetch a certain percentage of rows. I'm using DB2, so I can't use the top clause like in MS SQL Server.
Earlier I was using limit as I needed to select n rows.
Select subs_msisdn
from DB2.table
Limit 10
But now I need to select n percent of rows instead of n rows. How can I do that?
If you don't need the exact percentage of the rows to be returned, and if your Db2 version supports it, you can use the TABLESAMPLE clause:
Select subs_msisdn from DB2.table tablesample system(10)
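If an exact percentage is needed instead, one possible approach is to count the rows first and then ask for that many. The sketch below assumes SQLite through Python's sqlite3 module rather than Db2, and the table name subscribers and the helper first_percent are invented for the example; on Db2 the second query would presumably use FETCH FIRST n ROWS ONLY in place of LIMIT.

```python
import sqlite3


def first_percent(conn, percent):
    """Return the first `percent` percent of rows, ordered by the key."""
    (total,) = conn.execute("SELECT COUNT(*) FROM subscribers").fetchone()
    n = total * percent // 100  # whole number of rows, rounded down
    return conn.execute(
        "SELECT subs_msisdn FROM subscribers ORDER BY subs_msisdn LIMIT ?",
        (n,),
    ).fetchall()


# Hypothetical demo data: 1000 subscriber numbers.
pct_conn = sqlite3.connect(":memory:")
pct_conn.execute("CREATE TABLE subscribers (subs_msisdn INTEGER PRIMARY KEY)")
pct_conn.executemany(
    "INSERT INTO subscribers VALUES (?)", [(i,) for i in range(1, 1001)]
)

ten_percent = first_percent(pct_conn, 10)
```

Note that this returns the first 10 percent in key order, not a random 10 percent; it only shows how to turn a percentage into a row count.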
A quick question. Suppose I have the following two queries:
SELECT TOP 2 * FROM Persons;
and
SELECT * FROM Persons limit 2;
I want to know the difference between the execution of the above two queries.
Basically, I want to know when I should use the LIMIT keyword and when it is appropriate to use the TOP keyword.
Also, how does the database return results for each of the two queries?
If you are using SQL Server, use TOP.
If you are using MySQL or PostgreSQL, use LIMIT.
AFAIK there is no product that currently supports both. Here's one list of current implementations and here's another (covers more products but in less detail)
As stated in my comment on Martin Smith's answer above, there are products that support both LIMIT and TOP (as you can see here). The difference is that TOP only selects the first n records, whereas LIMIT allows you to define an offset to retrieve a specific range of records:
SELECT * FROM ... LIMIT 5 OFFSET 10
This statement skips the first 10 records and then selects the next 5, which isn't possible with TOP alone.
The example I posted is only checked against the DBS I linked above. I didn't check a SQL standard, because of a lack of time.
TOP and LIMIT both work on Amazon Redshift.
LIMIT works on MySQL and PostgreSQL, TOP works on SQL Server, and ROWNUM works on Oracle.
There is no difference. The TOP and LIMIT keywords function identically, and will return the same thing.
The DISTINCT command and the TOP command can't work together.
The DISTINCT command and the LIMIT command do work together.
So if you are using DISTINCT you must use LIMIT.
The difference between TOP and LIMIT is that TOP only works with a single table, whereas LIMIT can work with joins as well.
One big mistake: LIMIT is slow because the SELECT produces the full result set and the database server then returns only the limited rows. Use TOP when possible.
I have a huge table of more than 10 million rows. I need to efficiently grab a random sample of 5000 rows from it. I have some constraints that reduce the total number of rows I am looking at to about 9 million.
I tried using order by NEWID(), but that query will take too long as it has to do a table scan of all rows.
Is there a faster way to do this?
If you can use a pseudo-random sampling and you're on SQL Server 2005/2008, then take a look at TABLESAMPLE. For instance, an example from SQL Server 2008 / AdventureWorks 2008 which works based on rows:
USE AdventureWorks2008;
GO
SELECT FirstName, LastName
FROM Person.Person
TABLESAMPLE (100 ROWS)
WHERE EmailPromotion = 2;
The catch is that TABLESAMPLE isn't exactly random, because it samples whole physical pages rather than individual rows. You may not get back exactly 5000 rows unless you also limit with TOP. If you're on SQL Server 2000, you'll have to either generate a temporary table of matching primary keys or use a method based on NEWID().
Have you looked into using the TABLESAMPLE clause?
For example:
select *
from HumanResources.Department tablesample (5 percent)
SQL Server 2000 solution, according to Microsoft (instead of the slow NEWID() on larger tables):
SELECT * FROM Table1
WHERE (ABS(CAST(
(BINARY_CHECKSUM(*) *
RAND()) as int)) % 100) < 10
The SQL Server team at Microsoft realized that not being able to take
random samples of rows easily was a common problem in SQL Server 2000;
so, the team addressed the problem in SQL Server 2005 by introducing
the TABLESAMPLE clause. This clause selects a subset of rows by
choosing random data pages and returning all of the rows on those
pages. However, for those of us who still have products that run on
SQL Server 2000 and need backward-compatibility, or who need truly
row-level randomness, the BINARY_CHECKSUM query is a very effective
workaround.
Explanation can be found here:
http://msdn.microsoft.com/en-us/library/cc441928.aspx
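The row-level idea behind that query (draw a number per row and keep the row if it falls under a threshold) can be sketched outside SQL Server too. The example below is only an analogue: it assumes SQLite through Python's sqlite3 module and uses SQLite's random() function in place of BINARY_CHECKSUM(*) * RAND(). The table table1 and the helper sample_rows are hypothetical.

```python
import sqlite3


def sample_rows(conn, percent):
    """Keep each row independently with a probability of about percent/100."""
    sql = """
        SELECT id FROM table1
        -- random() yields a signed 64-bit integer per row; fold it into 0..99
        WHERE ((random() % 100) + 100) % 100 < ?
    """
    return [row[0] for row in conn.execute(sql, (percent,))]


# Hypothetical demo data: 10,000 rows.
sample_conn = sqlite3.connect(":memory:")
sample_conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")
sample_conn.executemany(
    "INSERT INTO table1 VALUES (?)", [(i,) for i in range(1, 10001)]
)

sampled_ids = sample_rows(sample_conn, 10)
```

As with the original query, the number of rows returned varies from run to run and the whole table is still scanned, so this trades exactness for simplicity.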
Yeah, tablesample is your friend (note that it's not random in the statistical sense of the word):
Tablesample at msdn
The typical way of selecting data is:
select * from my_table
But what if the table contains 10 million records and you only want records 300,010 to 300,020?
Is there a way to create a SQL statement on Microsoft SQL that only gets 10 records at once?
E.g.
select * from my_table from records 300,010 to 300,020
This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.
SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking a table with an identity field for the primary key, you can just say:
SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020
You should also know that selecting * is considered poor practice in many circles. They want you to specify the exact column list.
Try looking at info about pagination. Here's a short summary of it for SQL Server.
Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be
SELECT [columns] FROM table LIMIT 10 OFFSET 300010;
On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.
Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.
This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
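Asking for records by key range instead of by offset is often called keyset (or "seek") pagination. A minimal sketch, assuming SQLite through Python's sqlite3 module and a table with an integer primary key; the payload column and the helper fetch_after are made up for the example:

```python
import sqlite3


def fetch_after(conn, last_id, page_size):
    """Fetch the next page of rows whose id is greater than last_id."""
    return conn.execute(
        # The primary-key index lets the database seek straight to last_id
        # instead of scanning and discarding the skipped rows.
        "SELECT id, payload FROM my_table WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()


# Hypothetical demo data: 1000 rows.
key_conn = sqlite3.connect(":memory:")
key_conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT)")
key_conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 1001)],
)

page = fetch_after(key_conn, last_id=300, page_size=10)
next_page = fetch_after(key_conn, last_id=page[-1][0], page_size=10)
```

The client remembers the last key it saw and passes it back for the next page, so the cost of each query does not grow with how deep into the table you are.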
When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.
The rules of your partition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID, for example.
In order to select from a particular partition you would use a query similar to the following.
SELECT <Column Name1>…/*
FROM <Table Name>
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>
Take a look at the following white paper for more detailed information on partitioning in SQL Server 2005.
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope this helps however please feel free to pose further questions.
Cheers, John
I use wrapper queries to select the core query and then just isolate the row numbers that I wish to take from the query. This allows SQL Server to do all the heavy lifting inside the core query and pass out only the small part of the table that I have requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.
NOTE: The order clause is specified OUTSIDE the core query [sql_order_clause]
w1 and w2 are aliases for the temporary (derived) tables that SQL Server creates for the wrapper queries.
SELECT
w1.*
FROM(
SELECT w2.*,
ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
FROM (
-- CORE QUERY START
SELECT [columns]
FROM [table_name]
WHERE [sql_string]
-- CORE QUERY END
) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]
This method has hugely optimized my database systems. It works very well.
IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query, as fetching unnecessary data in these core queries can cost you serious overhead.
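For anyone who wants to try the wrapper pattern without a SQL Server instance, here is a rough sketch of the same shape. It assumes SQLite 3.25 or newer (for window functions) through Python's sqlite3 module; the people table, its columns, and the helper rows_between are invented for the illustration.

```python
import sqlite3


def rows_between(conn, start_row, end_row):
    """Return rows whose ROW_NUMBER() falls in [start_row, end_row]."""
    sql = """
        SELECT w1.id, w1.name
        FROM (
            SELECT w2.*, ROW_NUMBER() OVER (ORDER BY w2.name) AS row_num
            FROM (
                -- core query: only the columns that are needed
                SELECT id, name FROM people WHERE active = 1
            ) AS w2
        ) AS w1
        WHERE w1.row_num BETWEEN ? AND ?
        ORDER BY w1.row_num
    """
    return conn.execute(sql, (start_row, end_row)).fetchall()


# Hypothetical demo data: 20 people, the odd-numbered ones are active.
wrap_conn = sqlite3.connect(":memory:")
wrap_conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
)
wrap_conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(i, f"name-{i:02d}", i % 2) for i in range(1, 21)],
)

third_to_fifth = rows_between(wrap_conn, 3, 5)
```

The ordering lives in the OVER clause of the wrapper rather than in the core query, which matches the note above about keeping the order clause outside the core query.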
Use TOP to select only a limited number of rows, like:
SELECT TOP 10 * FROM my_table WHERE ID >= 300010
Add an ORDER BY if you want the results in a particular order.
To be efficient there has to be an index on the ID column.
When I worked on the Zend Framework's database component, we tried to abstract the functionality of the LIMIT clause supported by MySQL, PostgreSQL, and SQLite. That is, creating a query could be done this way:
$select = $db->select();
$select->from('mytable');
$select->order('somecolumn');
$select->limit(10, 20);
When the database supports LIMIT, this produces an SQL query like the following:
SELECT * FROM mytable ORDER BY somecolumn LIMIT 10 OFFSET 20
This was more complex for brands of database that don't support LIMIT (that clause is not part of the standard SQL language, by the way). If you can generate row numbers, make the whole query a derived table, and in the outer query use BETWEEN. This was the solution for Oracle and IBM DB2. Microsoft SQL Server 2005 has a similar row-number function, so one can write the query this way:
SELECT z2.*
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY id) AS zend_db_rownum, z1.*
FROM ( ...original SQL query... ) z1
) z2
WHERE z2.zend_db_rownum BETWEEN #offset+1 AND #offset+#count;
However, Microsoft SQL Server 2000 doesn't have the ROW_NUMBER() function.
So my question is, can you come up with a way to emulate the LIMIT functionality in Microsoft SQL Server 2000, solely using SQL? Without using cursors or T-SQL or a stored procedure. It has to support both arguments for LIMIT, both count and offset. Solutions using a temporary table are also not acceptable.
Edit:
The most common solution for MS SQL Server 2000 seems to be like the one below, for example to get rows 50 through 75:
SELECT TOP 25 *
FROM (
SELECT TOP 75 *
FROM table
ORDER BY field ASC
) a
ORDER BY field DESC;
However, this doesn't work if the total result set is, say, 60 rows. The inner query returns 60 rows because that's all there is in the top 75. Then the outer query returns rows 36-60, which doesn't fit in the desired "page" of 50-75. Basically, this solution works unless you need the last "page" of a result set whose size doesn't happen to be a multiple of the page size.
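The last-page problem is easy to reproduce. The sketch below assumes SQLite through Python's sqlite3 module and uses LIMIT as a stand-in for TOP, since the nesting is what matters here; the table t and the helpers make_conn and double_top_page are hypothetical.

```python
import sqlite3


def make_conn(row_count):
    """Create an in-memory table t with field = 1..row_count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (field INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO t VALUES (?)", [(i,) for i in range(1, row_count + 1)]
    )
    return conn


def double_top_page(conn, first, last):
    """Take `last` rows ascending, then the final (last - first) of those
    by reversing the sort order, as the nested TOP query does."""
    sql = """
        SELECT field FROM (
            SELECT field FROM t ORDER BY field ASC LIMIT ?
        ) AS a
        ORDER BY field DESC LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (last, last - first))]


enough_rows = double_top_page(make_conn(100), 50, 75)  # rows 75 down to 51
too_few_rows = double_top_page(make_conn(60), 50, 75)  # rows 60 down to 36
```

With 100 rows the query returns the intended page (in descending order), but with only 60 rows it returns rows 36 through 60, most of which belong to earlier pages.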
Edit:
Another solution works better, but only if you can assume the result set includes a column that is unique:
SELECT TOP n *
FROM tablename
WHERE key NOT IN (
SELECT TOP x key
FROM tablename
ORDER BY key
)
ORDER BY key;
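Here is a small sketch of the NOT IN variant showing that it copes with a short last page. As before it assumes SQLite through Python's sqlite3 module, with LIMIT standing in for TOP; the column is called id here, and the helper page_not_in is made up for the example.

```python
import sqlite3


def page_not_in(conn, offset, count):
    """Skip the first `offset` keys with NOT IN, then take `count` rows."""
    sql = """
        SELECT id FROM tablename
        WHERE id NOT IN (
            SELECT id FROM tablename ORDER BY id LIMIT ?
        )
        ORDER BY id LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (offset, count))]


# Hypothetical demo data: 60 rows, so the page after row 50 is a short one.
notin_conn = sqlite3.connect(":memory:")
notin_conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY)")
notin_conn.executemany(
    "INSERT INTO tablename VALUES (?)", [(i,) for i in range(1, 61)]
)

middle_page = page_not_in(notin_conn, 25, 25)  # rows 26..50
last_page = page_not_in(notin_conn, 50, 25)    # only rows 51..60 exist
```

The short last page comes back with just the remaining rows, at the cost of the subquery having to enumerate every skipped key.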
Conclusion:
No general-purpose solution seems to exist for emulating LIMIT in MS SQL Server 2000. A good solution exists if you can use the ROW_NUMBER() function in MS SQL Server 2005.
Here is another solution, which only works in SQL Server 2005 and newer because it uses the EXCEPT operator. But I'll share it anyway.
If you want to get the records 50 - 75 write:
select * from (
SELECT top 75 COL1, COL2
FROM MYTABLE order by COL3
) as foo
except
select * from (
SELECT top 50 COL1, COL2
FROM MYTABLE order by COL3
) as bar
SELECT TOP n *
FROM tablename
WHERE key NOT IN (
SELECT TOP x key
FROM tablename
ORDER BY key DESC
);
When you need LIMIT only, MS SQL has the equivalent TOP keyword, so that is clear.
When you need LIMIT with OFFSET, you can try some hacks like those previously described, but they all add some overhead, e.g. for ordering one way and then the other, or for the expensive NOT IN operation.
I think all those cascades are not needed. The cleanest solution, in my opinion, would be to just use TOP without an offset on the SQL side and then seek to the required starting record with the appropriate client method, like mssql_data_seek in PHP. While this isn't a pure SQL solution, I think it is the best one because it doesn't add any overhead (the skipped-over records will not be transferred over the network when you seek past them, if that is what worries you).
I would try to implement this in my ORM, as it is pretty simple there. If it really needs to be in SQL Server, then I would look at the code generated by LINQ to SQL for the following LINQ to SQL statement and go from there. The MSFT engineer who implemented that code was part of the SQL team for many years and knew what he was doing.
var result = myDataContext.mytable.Skip(pageIndex * pageSize).Take(pageSize);