I have a set of around 50k numbers in my table. I want to split or fetch a certain percentage of rows. I'm using DB2, so I can't use the TOP clause like in MS SQL Server.
Earlier I was using limit as I needed to select n rows.
Select subs_msisdn
from DB2.table
Limit 10
But now I need to select n percent of rows instead of n rows. How can I do that?
If you don't need the exact percentage of rows to be returned, and if your Db2 version supports it, you can use the TABLESAMPLE clause:
Select subs_msisdn from DB2.table tablesample system(10)
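If you need an exact percentage rather than a page-level sample, one possible workaround (a sketch only, reusing the question's DB2.table and subs_msisdn names) is to number the rows with OLAP functions and keep the first n percent:

SELECT subs_msisdn
FROM (
    SELECT subs_msisdn,
           ROW_NUMBER() OVER () AS rn,       -- arbitrary row numbering
           COUNT(*) OVER ()     AS total_rows
    FROM DB2.table
) t
WHERE rn <= total_rows * 0.10                -- keep roughly the first 10 percent

Without an ORDER BY inside ROW_NUMBER() the "first" 10 percent is arbitrary, which is usually acceptable for sampling.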
Related
I am new to SQL and I have a large table with several hundred rows, and I need to view all of them. Is there a command in SQL that acts like the less command in Linux, letting me step one screen height at a time through the output of a select statement? The pseudo-code for what I'm after would be, for example:
SELECT * from table less
What you're looking for is called "paging" or "pagination".
In MySQL and Postgres it's LIMIT n OFFSET m: https://www.postgresql.org/docs/8.3/static/queries-limit.html
In SQL Server it's OFFSET m FETCH NEXT n: https://technet.microsoft.com/en-us/library/gg699618(v=sql.110).aspx
This Q&A has a more thorough answer: How universal is the LIMIT statement in SQL?
SQL Server's T-SQL supports TOP N:
SELECT TOP 10 * from table
Certain other dialects of SQL, like SQLite, support LIMIT.
In DB2, you need to use SELECT * FROM table FETCH FIRST 10 ROWS ONLY; this FETCH FIRST form is also the ANSI-standard syntax.
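For concreteness, here is a minimal paging sketch in SQL Server 2012+ syntax (the table t and its id column are hypothetical stand-ins):

-- Page through a hypothetical table t, 50 rows per "screen", ordered by id
DECLARE @page INT = 0;                  -- zero-based page number
DECLARE @page_size INT = 50;            -- rows per page
DECLARE @offset INT = @page * @page_size;

SELECT *
FROM t
ORDER BY id
OFFSET @offset ROWS
FETCH NEXT @page_size ROWS ONLY;

Increment @page and re-run to step through the result one page at a time, much like paging with less.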
Is there any practical limit of number of rows a select statement can fetch from a database- any database?
Assume I am running the query SELECT * FROM TableName and that table has more than 1,200,000 rows.
How many rows can it fetch? Is there a limit?
1,200,000 rows is not at all a big number; I have worked with far bigger result sets. As long as your memory can fit the result, you should have no problems.
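If the full result does not fit in client memory, the usual workaround is to pull it in batches rather than one giant SELECT *. A hedged sketch using keyset paging (big_table and its indexed id column are hypothetical):

-- Fetch the next batch of rows after the last id seen so far
SELECT *
FROM big_table
WHERE id > ?                    -- last id from the previous batch, 0 to start
ORDER BY id
FETCH FIRST 10000 ROWS ONLY;

Repeat with the last id returned until no more rows come back.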
Is there a better way than a SELECT COUNT(*) statement to count the number of records in a table?
Sometimes we have to count billions of records in temporary tables that were imported using a bcp query.
Using count(*) or count(some_column) is the fastest way to get a table's exact record count.
If you don't need to filter, the following query works well:
SELECT SUM(rows) FROM sys.partitions WHERE object_id = OBJECT_ID('MY_TABLE') AND index_id IN (0, 1)
That checks the number of rows SQL Server has recorded for that object. It can't return any other data along with the count, and there is no way to include a GROUP BY or WHERE.
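An equivalent metadata lookup (a sketch; this assumes SQL Server 2005+, where the DMV exists) uses sys.dm_db_partition_stats, which exposes a row_count column:

SELECT SUM(row_count) AS approx_rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('MY_TABLE')
  AND index_id IN (0, 1)    -- heap or clustered index only

Like the sys.partitions query, this reads catalog metadata instead of scanning the table, so it is fast but only approximately up to date.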
I have a table that contains 300 million rows, and a clustered index on the [DataDate] column.
How do I select the last 10 rows of this table (I want to find the most recent date in the table)?
Database: Microsoft SQL Server 2008 R2.
Update
The answers below work perfectly - but only if there is a clustered index on [DataDate]. The table is, after all, 300 million rows, and a naive query would end up taking hours to execute rather than seconds. The query plan is using the clustered index on [DataDate] to get results within a few tens of milliseconds.
TOP
SELECT TOP(10) [DataDate] FROM YourTable ORDER BY [DataDate] DESC
TOP (Transact-SQL) specifies that only the first set of rows will be returned from the query result. The set of rows can be either a number or a percent of the rows. The TOP expression can be used in SELECT, INSERT, UPDATE, MERGE, and DELETE statements.
SELECT TOP(10) *
FROM MyTable
ORDER BY DataDate DESC
Do a reverse sort using ORDER BY and use TOP.
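If all you actually need is the single most recent date rather than the last 10 rows, a MAX over the clustered key should also resolve to a cheap index seek (a sketch, reusing the YourTable/[DataDate] names):

SELECT MAX([DataDate]) AS MostRecentDate
FROM YourTable;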
I have a huge table of more than 10 million rows. I need to efficiently grab a random sampling of 5000 rows from it. I have some constraints that reduce the total rows I am looking at to about 9 million.
I tried using ORDER BY NEWID(), but that query takes too long as it has to do a table scan of all rows.
Is there a faster way to do this?
If you can use a pseudo-random sampling and you're on SQL Server 2005/2008, then take a look at TABLESAMPLE. For instance, here is an example against SQL Server 2008 / AdventureWorks 2008 that samples by row count:
USE AdventureWorks2008;
GO
SELECT FirstName, LastName
FROM Person.Person
TABLESAMPLE (100 ROWS)
WHERE EmailPromotion = 2;
The catch is that TABLESAMPLE isn't exactly random: it picks whole data pages and returns every row on the chosen pages. You may not get back exactly 5000 rows unless you limit with TOP as well. If you're on SQL Server 2000, you're going to have to either generate a temporary table of matching primary keys or use a NEWID()-based method.
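One way to get exactly 5000 rows back (a sketch only; dbo.BigTable is a stand-in for your table) is to oversample with TABLESAMPLE and then trim with TOP, shuffling only the sampled rows rather than all 9 million:

SELECT TOP (5000) *
FROM dbo.BigTable TABLESAMPLE (20000 ROWS)
ORDER BY NEWID();   -- NEWID() over ~20k sampled rows is cheap compared to the full table

Any WHERE constraints are applied after the sampling, so oversample generously enough that at least 5000 matching rows remain.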
Have you looked into using the TABLESAMPLE clause?
For example:
select *
from HumanResources.Department tablesample (5 percent)
SQL Server 2000 solution, according to Microsoft (instead of the slow NEWID() on larger tables):
SELECT * FROM Table1
WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) AS int)) % 100) < 10
The SQL Server team at Microsoft realized that not being able to take random samples of rows easily was a common problem in SQL Server 2000; so, the team addressed the problem in SQL Server 2005 by introducing the TABLESAMPLE clause. This clause selects a subset of rows by choosing random data pages and returning all of the rows on those pages. However, for those of us who still have products that run on SQL Server 2000 and need backward-compatibility, or who need truly row-level randomness, the BINARY_CHECKSUM query is a very effective workaround.
Explanation can be found here:
http://msdn.microsoft.com/en-us/library/cc441928.aspx
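Applied to the numbers in the question, 5,000 rows out of roughly 9 million is about 0.056 percent, so the modulus and threshold can be scaled accordingly (a sketch; Table1 stands in for the real table):

SELECT *
FROM Table1
WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) AS int)) % 100000) < 56  -- ~0.056% of rows

This returns approximately, not exactly, 5,000 rows; wrap it in a SELECT TOP (5000) and raise the threshold a little if you need an exact count.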
Yeah, tablesample is your friend (note that it's not random in the statistical sense of the word):
Tablesample at msdn