Query every nth row in PostgreSQL

I have a simple table in PostgreSQL, say:
id | fname
----------
abc | bert
def | jaap
ghi | kees
jkl | jan
etc | piet
...etc...
It has a string primary key, id.
My table has millions of rows.
I want to get a list of every 10,000th (give or take) row.
Basically:
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownum
FROM mytable
) as t
WHERE ((t.rownum - 1) % 10000) = 0;
But that seems to be very slow. Is there an efficient alternative?
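To make the question's query concrete, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module rather than in PostgreSQL. The 25-row `mytable` and its `id00`…`id24` ids are hypothetical stand-ins; the sketch only shows which rows the modulo filter keeps, not how fast it is.

```python
import sqlite3

# Hypothetical stand-in for mytable: 25 rows with string ids "id00".."id24".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT PRIMARY KEY, fname TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(f"id{i:02d}", f"name{i}") for i in range(25)],
)

def every_nth(n):
    # Same shape as the question's query: number the rows by id,
    # then keep rows 1, n+1, 2n+1, ... via the modulo test.
    sql = """
        SELECT id
        FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownum
            FROM mytable
        ) AS t
        WHERE ((t.rownum - 1) % ?) = 0
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (n,))]

print(every_nth(10))  # ['id00', 'id10', 'id20']
```

With `(rownum - 1) % n = 0` the first row is always included; a plain `rownum % n = 0` would instead start at the nth row.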

You could try the NTILE() function, which divides the ordered rows into a given number of roughly equal buckets:
WITH CTE(ID, FNAME) AS
(
SELECT 'ABC','BERT'
UNION ALL
SELECT 'DEF','JAAP'
UNION ALL
SELECT 'GHI','KEES'
UNION ALL
SELECT 'JKL','JAN'
UNION ALL
SELECT 'ETC','PIET'
)
SELECT C.ID, C.FNAME,
NTILE(3) OVER (ORDER BY C.ID ASC) AS XCOL
FROM CTE AS C;
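NTILE on its own only labels each row with a bucket number; to get one sample row per bucket you still have to reduce each bucket, for example to its smallest id. The sketch below runs in SQLite via Python's sqlite3 (standing in for PostgreSQL), puts the answer's five rows into a hypothetical `people` table, and assumes that taking `MIN(id)` per tile is how the buckets would be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id TEXT PRIMARY KEY, fname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("ABC", "BERT"), ("DEF", "JAAP"), ("GHI", "KEES"), ("JKL", "JAN"), ("ETC", "PIET")],
)

# Label each row with its bucket (1..3), as in the answer.
tiles = conn.execute(
    "SELECT id, NTILE(3) OVER (ORDER BY id) AS xcol FROM people ORDER BY id"
).fetchall()
print(tiles)  # [('ABC', 1), ('DEF', 1), ('ETC', 2), ('GHI', 2), ('JKL', 3)]

# Reduce each bucket to its first id: one sample row per bucket.
firsts = conn.execute(
    """
    SELECT MIN(id) AS id, xcol
    FROM (SELECT id, NTILE(3) OVER (ORDER BY id) AS xcol FROM people)
    GROUP BY xcol
    ORDER BY xcol
    """
).fetchall()
print(firsts)  # [('ABC', 1), ('ETC', 2), ('JKL', 3)]
```

For roughly every 10,000th row you would pass about COUNT(*) / 10000 as the tile count. NTILE still has to number every row in id order, so don't assume it is faster than the ROW_NUMBER() filter.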

I'm afraid this may be the best possible solution. I ran your query below in SQL Server on a table with almost 65 million rows and got the result in 18 seconds. Because id is the primary key column, a clustered index is already there to speed up the process. If you run index maintenance regularly, this is probably the best you can ask for.
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownum
FROM mytable
) as t
WHERE ((t.rownum - 1) % 10000) = 0;
Please let me know the exact row count and your execution time, and run it after reindexing.

Related

Select nth row using where condition in oracle database [duplicate]

I'm interested in learning some (ideally) database agnostic ways of selecting the nth row from a database table. It would also be interesting to see how this can be achieved using the native functionality of the following databases:
SQL Server
MySQL
PostgreSQL
SQLite
Oracle
I am currently doing something like the following in SQL Server 2005, but I'd be interested in seeing others' more agnostic approaches:
WITH Ordered AS (
SELECT ROW_NUMBER() OVER (ORDER BY OrderID) AS RowNumber, OrderID, OrderDate
FROM Orders)
SELECT *
FROM Ordered
WHERE RowNumber = 1000000
Credit for the above SQL: Firoz Ansari's Weblog
Update: See Troels Arvin's answer regarding the SQL standard. Troels, have you got any links we can cite?
There are ways of doing this in optional parts of the standard, but a lot of databases support their own way of doing it.
A really good site that talks about this and other things is http://troels.arvin.dk/db/rdbms/#select-limit.
Basically, PostgreSQL and MySQL support the non-standard:
SELECT...
LIMIT y OFFSET x
Oracle, DB2 and MSSQL support the standard windowing functions:
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
columns
FROM tablename
) AS foo
WHERE rownumber <= n
(which I just copied from the site linked above since I never use those DBs)
Update: As of PostgreSQL 8.4 the standard windowing functions are supported, so expect the second example to work for PostgreSQL as well.
Update: SQLite added window functions support in version 3.25.0 on 2018-09-15 so both forms also work in SQLite.
PostgreSQL supports windowing functions as defined by the SQL standard, but they're awkward, so most people use (the non-standard) LIMIT / OFFSET:
SELECT
*
FROM
mytable
ORDER BY
somefield
LIMIT 1 OFFSET 20;
This example selects the 21st row. OFFSET 20 is telling Postgres to skip the first 20 records. If you don't specify an ORDER BY clause, there's no guarantee which record you will get back, which is rarely useful.
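A quick way to check the offset arithmetic is to run both LIMIT spellings against SQLite from Python's sqlite3 module (SQLite accepts the PostgreSQL-style `LIMIT 1 OFFSET 20` as well as the MySQL-style `LIMIT 20, 1`, where the first number is the offset). The table and its 50 values are hypothetical, inserted in descending order so that only ORDER BY decides which row comes 21st.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (somefield INTEGER)")
# Insert 50..1 so the physical insertion order differs from the sort order.
conn.executemany("INSERT INTO mytable VALUES (?)", [(v,) for v in reversed(range(1, 51))])

std = conn.execute(
    "SELECT somefield FROM mytable ORDER BY somefield LIMIT 1 OFFSET 20"
).fetchone()
mysql_style = conn.execute(
    "SELECT somefield FROM mytable ORDER BY somefield LIMIT 20, 1"  # offset, count
).fetchone()
print(std, mysql_style)  # (21,) (21,)
```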
I'm not sure about any of the rest, but I know SQLite and MySQL don't have any "default" row ordering. In those two dialects, at least, the following snippet grabs the 15th entry from the_table, sorting by the date/time it was added:
SELECT *
FROM the_table
ORDER BY added DESC
LIMIT 14, 1 -- skip 14 rows, then return 1 (the 15th)
(of course, you'd need to have an added DATETIME field, and set it to the date/time that entry was added...)
SQL Server 2005 and above has this feature built in: the ROW_NUMBER() function. It is excellent for web pages with << Prev and Next >> style browsing:
Syntax:
SELECT
*
FROM
(
SELECT
ROW_NUMBER () OVER (ORDER BY MyColumnToOrderBy) AS RowNum,
*
FROM
Table_1
) sub
WHERE
RowNum = 23
I suspect this is wildly inefficient but is quite a simple approach, which worked on a small dataset that I tried it on.
select top 1 field
from table
where field in (select top 5 field from table order by field asc)
order by field desc
This gets the 5th item (assuming the field values are unique); change the second TOP number to get a different nth item.
It's SQL Server only (I think), but it should work on older versions that don't support ROW_NUMBER().
Verify it on SQL Server:
Select top 10 * From emp
EXCEPT
Select top 9 * From emp
This gives you the 10th row of the emp table. Without an ORDER BY, though, SQL Server does not guarantee which rows TOP returns.
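The same "first n rows minus first n-1 rows" idea can be tried in SQLite through Python's sqlite3 module, with LIMIT standing in for TOP. Two things differ from the SQL Server snippet: the ORDER BY and LIMIT sit inside derived tables (SQLite only allows ORDER BY at the end of a compound SELECT), and both halves share an explicit order. Because EXCEPT also removes duplicate rows, the trick assumes the selected rows are distinct, which holds here because the hypothetical `emp` table's primary key is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(i, f"emp{i}") for i in range(1, 16)])

def nth_by_except(n):
    # First n rows minus first n-1 rows leaves only the nth row.
    sql = """
        SELECT * FROM (SELECT * FROM emp ORDER BY id LIMIT ?)
        EXCEPT
        SELECT * FROM (SELECT * FROM emp ORDER BY id LIMIT ?)
    """
    return conn.execute(sql, (n, n - 1)).fetchall()

print(nth_by_except(10))  # [(10, 'emp10')]
```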
Contrary to what some of the answers claim, the SQL standard is not silent regarding this subject.
Since SQL:2003, you have been able to use "window functions" to skip rows and limit result sets.
And in SQL:2008, a slightly simpler approach was added, using
OFFSET skip ROWS
FETCH FIRST n ROWS ONLY
Personally, I don't think that SQL:2008's addition was really needed, so if I were ISO, I would have kept it out of an already rather large standard.
One small change to LIMIT n,1: use n-1 instead of n, because the offset is zero-based.
select *
from thetable
limit n-1, 1
SQL SERVER
Select the nth record from the top:
SELECT * FROM (
SELECT
ID, NAME, ROW_NUMBER() OVER(ORDER BY ID) AS ROW
FROM TABLE
) AS TMP
WHERE ROW = n
Select the nth record from the bottom:
SELECT * FROM (
SELECT
ID, NAME, ROW_NUMBER() OVER(ORDER BY ID DESC) AS ROW
FROM TABLE
) AS TMP
WHERE ROW = n
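The two SQL Server queries differ only in the sort direction inside OVER(). A minimal sketch of the same pattern in SQLite via Python's sqlite3 module; the `items` table and its ten rows are hypothetical, and the alias is `rn` rather than ROW. The direction is interpolated from a fixed pair of keywords, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
# ids 1..10 with names 'a'..'j'
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, chr(ord("a") + i - 1)) for i in range(1, 11)])

def nth(n, from_bottom=False):
    direction = "DESC" if from_bottom else "ASC"
    # Only the sort direction differs between "from top" and "from bottom".
    sql = f"""
        SELECT id, name FROM (
            SELECT id, name, ROW_NUMBER() OVER (ORDER BY id {direction}) AS rn
            FROM items
        ) WHERE rn = ?
    """
    return conn.execute(sql, (n,)).fetchone()

print(nth(3))                    # (3, 'c')
print(nth(2, from_bottom=True))  # (9, 'i')
```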
When we used to work in MSSQL 2000, we did what we called the "triple-flip":
DECLARE @InnerPageSize int
DECLARE @OuterPageSize int
DECLARE @Count int
SELECT @Count = COUNT(<column>) FROM <TABLE>
SET @InnerPageSize = @PageNum * @PageSize
SET @OuterPageSize = @Count - ((@PageNum - 1) * @PageSize)
IF (@OuterPageSize < 0)
SET @OuterPageSize = 0
ELSE IF (@OuterPageSize > @PageSize)
SET @OuterPageSize = @PageSize
DECLARE @sql NVARCHAR(4000) -- 4000 is the maximum explicit length for NVARCHAR
SET @sql = 'SELECT * FROM
(
SELECT TOP ' + CAST(@OuterPageSize AS nvarchar(5)) + ' * FROM
(
SELECT TOP ' + CAST(@InnerPageSize AS nvarchar(5)) + ' * FROM <TABLE> ORDER BY <column> ASC
) AS t1 ORDER BY <column> DESC
) AS t2 ORDER BY <column> ASC'
PRINT @sql
EXECUTE sp_executesql @sql
It wasn't elegant, and it wasn't fast, but it worked.
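The triple-flip's page arithmetic is the part that is easy to get wrong, especially on the last, partial page. This sketch reproduces it in SQLite via Python's sqlite3 module: LIMIT stands in for TOP, the count and page sizes are computed in Python instead of T-SQL variables, and the ten-row table `t` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in range(1, 11)])  # 10 rows

def triple_flip(page_num, page_size):
    count = conn.execute("SELECT COUNT(col) FROM t").fetchone()[0]
    inner = page_num * page_size
    # Clamp the outer size so the last (partial) page does not pull in
    # rows from the previous page, as the IF/ELSE IF in the T-SQL does.
    outer = min(max(count - (page_num - 1) * page_size, 0), page_size)
    sql = """
        SELECT col FROM (
            SELECT col FROM (
                SELECT col FROM t ORDER BY col ASC LIMIT ?   -- flip 1: first pages
            ) AS t1 ORDER BY col DESC LIMIT ?                -- flip 2: tail of those
        ) AS t2 ORDER BY col ASC                             -- flip 3: restore order
    """
    return [r[0] for r in conn.execute(sql, (inner, outer))]

print(triple_flip(2, 4))  # [5, 6, 7, 8]
print(triple_flip(3, 4))  # [9, 10]
```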
In Oracle 12c, you can use the OFFSET ... FETCH ... ROWS clause with ORDER BY.
For example, to get the 3rd record from top:
SELECT *
FROM sometable
ORDER BY column_name
OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY;
Here is a quick solution:
SELECT * FROM table ORDER BY `id` DESC LIMIT N, 1
Here you get the last row with N=0, the second last with N=1, the fourth last with N=3, and so on.
This is a very common interview question, and this is a very simple answer to it.
Further, if you want to sort an amount, ID or some other value numerically, you can use the CAST function in MySQL:
SELECT DISTINCT (`amount`)
FROM cart
ORDER BY CAST( `amount` AS SIGNED ) DESC
LIMIT 4 , 1
With N = 4 you get the fifth-highest distinct amount from the cart table. Substitute your own column and table names.
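The "count from the end" trick is just a descending sort plus a zero-based offset. A minimal sketch in SQLite via Python's sqlite3 module (SQLite accepts MySQL's `LIMIT offset, count` form); the `orders` table with ids 1 to 7 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 8)])  # ids 1..7

def nth_from_last(n):
    # N is a zero-based offset into the descending order:
    # N = 0 is the last row, N = 1 the second last, and so on.
    row = conn.execute(
        "SELECT id FROM orders ORDER BY id DESC LIMIT ?, 1", (n,)
    ).fetchone()
    return row[0] if row else None  # None when N runs past the first row

print(nth_from_last(0), nth_from_last(1), nth_from_last(3))  # 7 6 4
```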
Add this to your query:
LIMIT n,1
That limits the result to one row, starting at offset n. The offset is zero-based, so this returns row n+1.
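Oracle (a plain WHERE ROWNUM = x only matches when x = 1, because ROWNUM is assigned as rows pass the filter, so the row number has to be captured in a subquery first):
select foo from (select foo, rownum as rn from (select foo from bar order by foo))
where rn = x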
For example, if you want to select every 10th row in MSSQL, you can use:
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY ColumnName1 ASC) AS rownumber, ColumnName1, ColumnName2
FROM TableName
) AS foo
WHERE rownumber % 10 = 0
This takes the modulus of the row number; change 10 to any interval you want.
For SQL Server, a generic way to go by row number is as follows:
SET ROWCOUNT @row --@row = the maximum number of rows to return
For example:
set rowcount 20 --stop after 20 rows
select meat, cheese from dbo.sandwich --returns at most 20 rows
set rowcount 0 --sets rowcount back to all rows
This returns the first 20 rows, so the 20th row is the last one in the result; SET ROWCOUNT does not skip the earlier rows. Be sure to set rowcount back to 0 afterward.
Here's a generic version of a sproc I recently wrote for Oracle that allows for dynamic paging/sorting - HTH
-- p_LowerBound = first row # in the returned set; if second page of 10 rows,
-- this would be 11 (-1 for unbounded/not set)
-- p_UpperBound = last row # in the returned set; if second page of 10 rows,
-- this would be 20 (-1 for unbounded/not set)
OPEN o_Cursor FOR
SELECT * FROM (
SELECT
Column1,
Column2,
rownum AS rn
FROM
(
SELECT
tbl.Column1,
tbl.column2
FROM MyTable tbl
WHERE
tbl.Column1 = p_PKParam OR
p_PKParam = -1 -- -1 means no key filter
ORDER BY
DECODE(p_sortOrder, 'A', DECODE(p_sortColumn, 1, Column1, 'X'),'X'),
DECODE(p_sortOrder, 'D', DECODE(p_sortColumn, 1, Column1, 'X'),'X') DESC,
DECODE(p_sortOrder, 'A', DECODE(p_sortColumn, 2, Column2, sysdate),sysdate),
DECODE(p_sortOrder, 'D', DECODE(p_sortColumn, 2, Column2, sysdate),sysdate) DESC
))
WHERE
(rn >= p_lowerBound OR p_lowerBound = -1) AND
(rn <= p_upperBound OR p_upperBound = -1);
But really, aren't all these just parlor tricks standing in for good database design in the first place? The few times I needed functionality like this, it was for a simple one-off query to make a quick report. For any real work, using tricks like these is inviting trouble. If selecting a particular row is needed, then just have a column with a sequential value and be done with it.
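The sproc's row-range filter, where -1 means "bound not set", can be tried out in SQLite via Python's sqlite3 module. This sketch swaps Oracle's ROWNUM-over-an-ordered-subquery for ROW_NUMBER(), drops the dynamic DECODE sorting, and uses a hypothetical 25-row `mytable`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (column1 INTEGER, column2 TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 26)])

def page(lower, upper):
    # Mirrors the sproc's WHERE clause: a bound of -1 means "not set".
    sql = """
        SELECT column1 FROM (
            SELECT column1, ROW_NUMBER() OVER (ORDER BY column1) AS rn
            FROM mytable
        )
        WHERE (rn >= :lo OR :lo = -1)
          AND (rn <= :hi OR :hi = -1)
        ORDER BY rn
    """
    return [r[0] for r in conn.execute(sql, {"lo": lower, "hi": upper})]

print(page(11, 20))  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
print(page(21, -1))  # [21, 22, 23, 24, 25]
```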
Nothing fancy, no special functions, in case you use Caché like I do...
SELECT TOP 1 * FROM (
SELECT TOP n * FROM <table>
ORDER BY ID Desc
)
ORDER BY ID ASC
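This assumes you have an ID column or a datestamp column you can trust. Note that it returns the nth row counting from the highest ID; swap Desc and ASC to count from the lowest.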
For SQL Server, the following returns the first row from the given table.
declare @rowNumber int = 1;
select TOP(@rowNumber) * from [dbo].[someTable]
EXCEPT
select TOP(@rowNumber - 1) * from [dbo].[someTable];
You can loop through the values with something like this:
declare @constVar int = 10; --hypothetical starting row number
declare @rowNumber int;
WHILE @constVar > 0
BEGIN
set @rowNumber = @constVar;
select TOP(@rowNumber) * from [dbo].[someTable]
EXCEPT
select TOP(@rowNumber - 1) * from [dbo].[someTable];
SET @constVar = @constVar - 1;
END;
LIMIT n,1 doesn't work in MS SQL Server. I think it's just about the only major database that doesn't support that syntax. To be fair, it isn't part of the SQL standard, although it is so widely supported that it should be. In everything except SQL Server, LIMIT works great. For SQL Server, I haven't been able to find an elegant solution.
In Sybase SQL Anywhere:
SELECT TOP 1 START AT n * from table ORDER BY whatever
Don't forget the ORDER BY or it's meaningless.
T-SQL - Selecting N'th RecordNumber from a Table
select * from
(select row_number() over (order by Rand() desc) as Rno,* from TableName) T where T.Rno = RecordNumber
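Where RecordNumber is the record number to select
and TableName is to be replaced with your table name.
Note that in SQL Server RAND() is evaluated once per query, so ORDER BY Rand() does not give a defined (or random) order; order by a key column if you need a specific row.
For example, to select the 5th record from a table Employee, your query would be: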
select * from
(select row_number() over (order by Rand() desc) as Rno,* from Employee) T where T.Rno = 5
SELECT
top 1 *
FROM
table_name
WHERE
column_name IN (
SELECT
top N column_name
FROM
table_name
ORDER BY
column_name
)
ORDER BY
column_name DESC
I've written the query above to find the Nth row.
For example:
SELECT
top 1 *
FROM
Employee
WHERE
emp_id IN (
SELECT
top 7 emp_id
FROM
Employee
ORDER BY
emp_id
)
ORDER BY
emp_id DESC
I'm a bit late to the party here but I have done this without the need for windowing or using
WHERE x IN (...)
SELECT TOP 1
--select the value needed from t1
[col2]
FROM
(
SELECT TOP 2 --the Nth row, alter this to taste
UE2.[col1],
UE2.[col2],
UE2.[date],
UE2.[time],
UE2.[UID]
FROM
[table1] AS UE2
WHERE
UE2.[col1] = ID --this is a subquery
AND
UE2.[col2] IS NOT NULL
ORDER BY
UE2.[date] DESC, UE2.[time] DESC --sorting by date and time newest first
) AS t1
ORDER BY t1.[date] ASC, t1.[time] ASC --this reverses the order of the sort in t1
It seems to work fairly fast, although to be fair I only have around 500 rows of data.
This works in MySQL, where _rowid refers to the table's integer primary key:
SELECT * FROM emp a
WHERE n = (
SELECT COUNT( _rowid)
FROM emp b
WHERE a._rowid >= b._rowid
);
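The correlated COUNT works because a row's 1-based position equals the number of rows whose key is less than or equal to its own. A minimal sketch in SQLite via Python's sqlite3 module, where an INTEGER PRIMARY KEY column (`id` in this hypothetical `emp` table, with gaps in its values) plays the role of _rowid. It runs one COUNT per candidate row, so it gets expensive on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(10, "a"), (20, "b"), (35, "c"), (40, "d")])

def nth_by_count(n):
    # For each row a, count the rows b with b.id <= a.id; that count is
    # a's position in id order, so keep the row where it equals n.
    sql = """
        SELECT * FROM emp a
        WHERE ? = (SELECT COUNT(*) FROM emp b WHERE a.id >= b.id)
    """
    return conn.execute(sql, (n,)).fetchall()

print(nth_by_count(3))  # [(35, 'c')]
```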
unbelievable that you can find a SQL engine executing this one ...
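DECLARE @n int = (ABS(CHECKSUM(NEWID())) % 100) + 1; --pick the random row number once (NEWID() in WHERE would be re-evaluated per row); assumes at least 100 rows
WITH sentence AS
(SELECT
stuff,
row = ROW_NUMBER() OVER (ORDER BY Id)
FROM
SentenceType
)
SELECT
sen.stuff
FROM sentence sen
WHERE sen.row = @n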
select * from
(select * from ordered order by order_id limit 100) x order by
x.order_id desc limit 1;
This first selects the top 100 rows in ascending order, then takes the last of those by re-sorting them in descending order with LIMIT 1. However, this is an expensive statement, as it reads the data twice.
It seems to me that, to be efficient, you need to 1) generate a random number between 0 and one less than the number of database records, and 2) be able to select the row at that position. Unfortunately, different databases have different random number generators and different ways to select a row at a position in a result set - usually you specify how many rows to skip and how many rows you want, but it's done differently for different databases. Here is something that works for me in SQLite:
select *
from Words
limit abs(random()) % (select count(*) from Words), 1;
It does depend on being able to use a subquery in the LIMIT clause (which in SQLite is LIMIT <recs to skip>,<recs to take>). Selecting the number of records in a table should be particularly efficient, being part of the database's metadata, but that depends on the database's implementation. Also, I don't know whether the query actually builds the result set before retrieving the Nth record, but I would hope it doesn't need to. Note that I'm not specifying an "order by" clause. It might be better to "order by" something like the primary key, which will have an index; getting the Nth record from an index might be faster if the database can't get the Nth record from the table itself without building the result set.
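Here is the SQLite approach run through Python's sqlite3 module, with the random offset computed from a subquery inside LIMIT. The `words` table and its four rows are hypothetical, and the ORDER BY id is the optional addition the paragraph suggests, giving the offset a well-defined meaning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)")
conn.executemany("INSERT INTO words (word) VALUES (?)", [("alpha",), ("beta",), ("gamma",), ("delta",)])

def random_word():
    # Skip a random number of rows in [0, count - 1], then take one.
    sql = """
        SELECT word FROM words
        ORDER BY id
        LIMIT abs(random()) % (SELECT COUNT(*) FROM words), 1
    """
    return conn.execute(sql).fetchone()[0]

print(random_word())  # one of 'alpha', 'beta', 'gamma', 'delta'
```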
The most suitable answer I have seen here for SQL Server:
WITH myTableWithRows AS (
SELECT (ROW_NUMBER() OVER (ORDER BY myTable.SomeField)) as row,*
FROM myTable)
SELECT * FROM myTableWithRows WHERE row = 3

query optimization (nested subqueries)

I'm trying to simplify the subqueries below to improve a SELECT statement. I have a table with 3 basic columns: ID, GRADE and AGE. I want to select all records whose GRADE is the same as the GRADE of the row with the maximum ID.
Does anyone have a better way than creating nested subqueries? All suggestions are welcome.
Note: my apologies for the table formatting.
ID | GRADE | AGE
----------------
10 | A | 30
12 | B | 45
13 | A | 15
09 | B | 14
20 | A | 12
SELECT
*
FROM
TABLE
WHERE
GRADE = (
SELECT
grade
FROM
TABLE
WHERE
id = (SELECT MAX(id) FROM TABLE)
);
You could use a CTE to make the query easier to read:
WITH cte AS
(
SELECT GRADE,
ROW_NUMBER() OVER (ORDER BY ID DESC) RowNum
FROM yourTable
)
SELECT *
FROM yourTable
WHERE GRADE = (SELECT t.GRADE FROM cte t WHERE t.RowNum = 1)
However, I don't have a problem with your original approach because the subqueries are not correlated to anything. What I mean by this is that
SELECT MAX(id) FROM yourTable
should effectively only be executed once, and afterwards sort of be treated as a constant. Similarly, the query
SELECT grade FROM TABLE WHERE id = (max from above query)
should also be executed only once. This assumes that the query optimizer is smart enough to figure this out, which it probably is.
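A quick sanity check is to run the original nested subqueries and a ROW_NUMBER() version on the question's sample rows. This sketch uses SQLite via Python's sqlite3 module, with a hypothetical `grades` table in place of TABLE. Note that the window has no PARTITION BY: partitioning by ID would give every row RowNum = 1, and the scalar subquery would then return more than one grade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (id INTEGER, grade TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [(10, "A", 30), (12, "B", 45), (13, "A", 15), (9, "B", 14), (20, "A", 12)],
)

nested = """
    SELECT id FROM grades
    WHERE grade = (SELECT grade FROM grades WHERE id = (SELECT MAX(id) FROM grades))
    ORDER BY id
"""
# Exactly one row (the highest id) gets rownum = 1.
windowed = """
    WITH cte AS (
        SELECT grade, ROW_NUMBER() OVER (ORDER BY id DESC) AS rownum FROM grades
    )
    SELECT id FROM grades
    WHERE grade = (SELECT grade FROM cte WHERE rownum = 1)
    ORDER BY id
"""
a = [r[0] for r in conn.execute(nested)]
b = [r[0] for r in conn.execute(windowed)]
print(a, b)  # [10, 13, 20] [10, 13, 20]
```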
You can do the following (not much simpler though):
SELECT
*
FROM
TABLE
WHERE
GRADE IN (
SELECT
first_value (GRADE) over (ORDER BY id DESC)
FROM
TABLE
)

Permuting values in SQL

Let's say I have a table with two columns:
id | value
----------
1 | 101
2 | 356
3 | 28
I need to randomly permute the value column so that each id is randomly assigned a new value from the existing set {101,356,28}. How could I do this in Oracle SQL?
It may sound odd but this is a real problem, just with more columns.
You can do this by using row_number() with a random number generator and then joining back to the original rows:
with cte as (
select id, value,
row_number() over (order by id) as i,
row_number() over (order by dbms_random.random) as rand_i
from table t
)
select cte.id, cte1.value
from cte join
cte cte1
on cte.i = cte1.rand_i;
This guarantees a permutation (i.e. no original row has its value used twice).
EDIT:
By the way, if the original ids are sequential from 1 and have no gaps, you could just do:
select row_number() over (order by dbms_random.random) as id, value
from table t;
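The first answer's join can be tried out in SQLite through Python's sqlite3 module. This sketch swaps dbms_random.random for SQLite's random() and writes the two numberings as separate derived tables instead of a self-joined CTE, so the random numbering is computed in one place; table `t` holds the question's three rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 101), (2, 356), (3, 28)])

def permute():
    # a numbers the rows in id order (i); b numbers the same rows in a random
    # order (rand_i). Joining i to rand_i gives each id the value of a
    # randomly chosen row, and every value is used exactly once.
    sql = """
        SELECT a.id, b.value
        FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS i FROM t) AS a
        JOIN (SELECT value, ROW_NUMBER() OVER (ORDER BY random()) AS rand_i FROM t) AS b
          ON a.i = b.rand_i
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()

print(permute())  # e.g. [(1, 28), (2, 101), (3, 356)]
```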
One option: select * from x_table where id = trunc(dbms_random.value() * 3) + 1; (Here 3 is the number of rows in your random data table, and I am assuming that id is incremental and unique.)
I'll think of other options.
I'm not sure whether this is the right task for a SQL database. Maybe you should implement something like a factoradic permutation in PL/SQL and then return the rows through a pipelined function using PIPE ROW.
Ordering by dbms_random.random might be slow for large data sets.

How can I get a random cartesian product in PostgreSQL?

I have two tables, custassets and tags. To generate some test data I'd like to do an INSERT INTO a many-to-many table with a SELECT that gets random rows from each (so that a random primary key from one table is paired with a random primary key from the second). To my surprise this isn't as easy as I first thought, so I'm persisting with this to teach myself.
Here's my first attempt. I select 10 custassets and 3 tags, but both are the same in each case. I'd be fine with the first table being fixed, but I'd like to randomise the tags assigned.
SELECT
custassets_rand.id custassets_id,
tags_rand.id tags_rand_id
FROM
(
SELECT id FROM custassets WHERE defunct = false ORDER BY RANDOM() LIMIT 10
) AS custassets_rand
,
(
SELECT id FROM tags WHERE defunct = false ORDER BY RANDOM() LIMIT 3
) AS tags_rand
This produces:
custassets_id | tags_rand_id
---------------+--------------
9849 | 3322 }
9849 | 4871 } this pattern of tag PKs is repeated
9849 | 5188 }
12145 | 3322
12145 | 4871
12145 | 5188
17837 | 3322
17837 | 4871
17837 | 5188
....
I then tried the following approach: doing the second RANDOM() call in the SELECT column list. However this one was worse, as it chooses a single tag PK and sticks with it.
SELECT
custassets_rand.id custassets_id,
(SELECT id FROM tags WHERE defunct = false ORDER BY RANDOM() LIMIT 1) tags_rand_id
FROM
(
SELECT id FROM custassets WHERE defunct = false ORDER BY RANDOM() LIMIT 30
) AS custassets_rand
Result:
custassets_id | tags_rand_id
---------------+--------------
16694 | 1537
14204 | 1537
23823 | 1537
34799 | 1537
36388 | 1537
....
This would be easy in a scripting language, and I'm sure it can be done quite easily with a stored procedure or temporary table. But can I do it with just an INSERT INTO ... SELECT?
I did think of choosing integer primary keys using a random function, but unfortunately the primary keys for both tables have gaps in the increment sequences (and so an empty row might be chosen in each table). That would have been fine otherwise!
Note that what you are looking for is not a Cartesian product, which would produce n*m rows; rather a random 1:1 association, which produces GREATEST(n,m) rows.
To produce truly random combinations, it's enough to randomize rn for the bigger set:
SELECT c_id, t_id
FROM (
SELECT id AS c_id, row_number() OVER (ORDER BY random()) AS rn
FROM custassets
) x
JOIN (SELECT id AS t_id, row_number() OVER () AS rn FROM tags) y USING (rn);
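The random 1:1 association can be tried out in SQLite via Python's sqlite3 module (the answer targets PostgreSQL; SQLite has random(), ROW_NUMBER() and JOIN ... USING as well). The ids below are hypothetical, with gaps as in the question. Because this is an inner join on rn, it keeps only as many pairs as the smaller table has rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE custassets (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO custassets VALUES (?)", [(i,) for i in (9849, 12145, 17837, 23823, 34799)])
conn.executemany("INSERT INTO tags VALUES (?)", [(i,) for i in (1537, 3322, 4871)])

# Randomize the numbering of the bigger set; number the smaller set arbitrarily.
sql = """
    SELECT c_id, t_id
    FROM (
        SELECT id AS c_id, ROW_NUMBER() OVER (ORDER BY random()) AS rn
        FROM custassets
    ) AS x
    JOIN (SELECT id AS t_id, ROW_NUMBER() OVER () AS rn FROM tags) AS y USING (rn)
"""
pairs = conn.execute(sql).fetchall()
print(pairs)  # e.g. [(17837, 1537), (9849, 3322), (34799, 4871)]
```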
If arbitrary combinations are good enough, this is faster (especially for big tables):
SELECT c_id, t_id
FROM (SELECT id AS c_id, row_number() OVER () AS rn FROM custassets) x
JOIN (SELECT id AS t_id, row_number() OVER () AS rn FROM tags) y USING (rn);
If the number of rows in the two tables does not match and you do not want to lose rows from the bigger table, use the modulo operator % to join rows from the smaller table multiple times:
SELECT c_id, t_id
FROM (
SELECT id AS c_id, row_number() OVER () AS rn
FROM custassets -- table with fewer rows
) x
JOIN (
SELECT id AS t_id, (row_number() OVER () % small.ct) + 1 AS rn
FROM tags
, (SELECT count(*) AS ct FROM custassets) AS small
) y USING (rn);
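The modulo arithmetic can be checked with a short Python model of the query above (the ids and the function name are hypothetical). Rows of the smaller table get numbers 1..ct; rows of the bigger table get (row_number % ct) + 1, which also always falls in 1..ct, so every row of the bigger table finds a partner and the smaller table's rows are reused cyclically.

```python
def modulo_join(smaller, bigger):
    """Model of the % join: keep every row of the bigger table and
    reuse rows of the smaller table cyclically."""
    ct = len(smaller)  # like (SELECT count(*) ...) AS small
    x = {rn: c_id for rn, c_id in enumerate(smaller, start=1)}
    return [(x[(rn % ct) + 1], t_id) for rn, t_id in enumerate(bigger, start=1)]

custassets = [101, 102, 103]   # hypothetical: table with fewer rows
tags = [1, 2, 3, 4, 5, 6, 7]   # hypothetical: table with more rows
for c_id, t_id in modulo_join(custassets, tags):
    print(c_id, t_id)
```

Because of the + 1, the cycle starts at the second row of the smaller table (102, 103, 101, 102, ...); that offset is harmless, since every row is still used.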
Window functions were added with PostgreSQL 8.4.
WITH a_ttl AS (
SELECT count(*) AS ttl FROM custassets c),
b_ttl AS (
SELECT count(*) AS ttl FROM tags),
rows AS (
SELECT gs.*
FROM generate_series(1,
(SELECT max(ttl) AS ttl FROM
(SELECT ttl FROM a_ttl UNION SELECT ttl FROM b_ttl) AS m))
AS gs(row)),
tab_a_rand AS (
SELECT id AS custassets_id, row_number() OVER (order by random()) as row
FROM custassets),
tab_b_rand AS (
SELECT id, row_number() OVER (order by random()) as row
FROM tags)
SELECT a.custassets_id, b.id
FROM rows r
JOIN a_ttl ON 1=1 JOIN b_ttl ON 1=1
LEFT JOIN tab_a_rand a ON a.row = (r.row % a_ttl.ttl)+1
LEFT JOIN tab_b_rand b ON b.row = (r.row % b_ttl.ttl)+1
ORDER BY 1,2;
You can test this query on SQL Fiddle.
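The generate_series query above wraps both sides with modulo, so the result has max(n, m) rows and the shorter side repeats. Here is a minimal Python model of that logic (the ids and the function name are hypothetical):

```python
import random

def wrap_both(a_ids, b_ids, rng):
    """Model of the generate_series query: shuffle both sides, then for
    r in 1..max(n, m) take row (r % n) + 1 and row (r % m) + 1."""
    a = list(a_ids)
    rng.shuffle(a)  # like row_number() OVER (order by random())
    b = list(b_ids)
    rng.shuffle(b)
    n, m = len(a), len(b)
    # row numbers are 1-based, lists 0-based: row (r % n) + 1 is a[r % n]
    return [(a[r % n], b[r % m]) for r in range(1, max(n, m) + 1)]

custassets = [101, 102, 103]   # hypothetical ids
tags = [1, 2, 3, 4, 5, 6, 7]   # hypothetical ids
result = wrap_both(custassets, tags, random.Random(7))
print(result)
```

Since (r % ttl) + 1 always points at an existing row number, the LEFT JOINs in the SQL never produce NULLs.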
Here is a different approach that picks a single random combination from two tables, assuming two tables a and b, both with primary key id. The tables needn't be the same size, and the row from b is chosen independently of the row from a, which might not matter much for test data.
SELECT * FROM a, b
WHERE a.id = (
SELECT id
FROM a
OFFSET (
SELECT floor(random() * (SELECT count(*) FROM a))  -- floor keeps OFFSET in 0 .. count-1
)
LIMIT 1)
AND b.id = (
SELECT id
FROM b
OFFSET (
SELECT floor(random() * (SELECT count(*) FROM b))
)
LIMIT 1);
Tested with two tables, one with 7000 rows and one with 100k rows: the result came back immediately. For more than one result, you have to call the query repeatedly; increasing the LIMIT and changing x.id = to x.id IN would produce (aA, aB, bA, bB) result patterns.
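One detail worth checking in this approach is the OFFSET value. PostgreSQL converts a double precision OFFSET to bigint by rounding to the nearest integer, so random() * count can occasionally become count itself; the subquery then returns no row and the whole query returns nothing. Truncating with floor() keeps the offset in 0 .. count-1. A small Python model of the two conversions (function names are hypothetical):

```python
import math

def offset_rounded(u, n):
    """OFFSET random() * count: the float is rounded to an integer."""
    return round(u * n)

def offset_floored(u, n):
    """OFFSET floor(random() * count): always in 0 .. n - 1."""
    return math.floor(u * n)

n = 7000        # table size from the test described above
u = 0.99995     # a possible value of random(), just below 1
print(offset_rounded(u, n), offset_floored(u, n))
```

With a uniform random() in [0, 1), the rounded version overshoots in about 1 out of 2n calls, which is rare enough to go unnoticed in a quick test.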
It bugs me that after all these years of relational databases, there don't seem to be very good cross-database ways of doing things like this. The MSDN article http://msdn.microsoft.com/en-us/library/cc441928.aspx seems to have some interesting ideas, but of course that's not PostgreSQL. And even then, their solution requires a single pass, when I'd think it ought to be possible without the scan.
I can imagine a few ways that might work without a pass (in selection), but it would involve creating another table that maps your table's primary keys to random numbers (or to linear sequences that you later randomly select, which in some ways may actually be better), and of course, that may have issues as well.
I realize this is probably not a useful comment; I just felt I needed to rant a bit.
If you just want to get a random set of rows from each side, use a pseudo-random number generator. I would use something like:
select *
from (select a.*, row_number() over (order by NULL) as rownum -- NULL may not work, "(SELECT NULL)" works in MSSQL
from a
) a cross join
(select b.*, row_number() over (order by NULL) as rownum
from b
) b
where a.rownum <= 30 and b.rownum <= 30
This is doing a Cartesian product, which returns 900 rows assuming a and b each have at least 30 rows.
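The row count follows directly from the cross join: every one of the first 30 rows of a is paired with every one of the first 30 rows of b. A tiny Python illustration (the table contents are hypothetical):

```python
from itertools import product

a = list(range(1, 101))   # hypothetical: table a with 100 rows
b = list(range(1, 51))    # hypothetical: table b with 50 rows

# where a.rownum <= 30 and b.rownum <= 30
a30 = [row for rownum, row in enumerate(a, start=1) if rownum <= 30]
b30 = [row for rownum, row in enumerate(b, start=1) if rownum <= 30]

pairs = list(product(a30, b30))   # cross join
print(len(pairs))                 # 900
```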
However, I interpreted your question as getting random combinations. Once again, I'd go for the pseudo-random approach.
select *
from (select a.*, row_number() over (order by NULL) as rownum -- NULL may not work, "(SELECT NULL)" works in MSSQL
from a
) a cross join
(select b.*, row_number() over (order by NULL) as rownum
from b
) b
where mod(a.rownum*107 + b.rownum*257 + 17, 101) < <some value>  -- threshold from 0 to 101 sets the fraction of pairs kept
This lets you get combinations among arbitrary rows.
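The filter is a fixed linear hash of both row numbers reduced modulo the prime 101, so it is deterministic: the same pairs come back on every run. Here is a Python sketch of the condition (function names are hypothetical). Since 257 mod 101 = 55 and 55 shares no factor with 101, for a fixed a.rownum any 101 consecutive b.rownum values hit every residue exactly once, so a threshold t keeps about t/101 of the pairs.

```python
def keep(a_rownum, b_rownum, threshold):
    """The WHERE condition: hash of both row numbers, mod 101, below a threshold."""
    return (a_rownum * 107 + b_rownum * 257 + 17) % 101 < threshold

def selected_pairs(n_a, n_b, threshold):
    """All (a.rownum, b.rownum) pairs that pass the filter."""
    return [(i, j)
            for i in range(1, n_a + 1)
            for j in range(1, n_b + 1)
            if keep(i, j, threshold)]

print(len(selected_pairs(1, 101, 10)))    # 10: one full cycle of residues
print(len(selected_pairs(30, 30, 0)))     # 0: nothing passes
print(len(selected_pairs(30, 30, 101)))   # 900: everything passes
```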
Just a plain Cartesian product JOIN ... ON random() appears to work reasonably well. Couldn't be simpler...
-- Cartesian product
-- EXPLAIN ANALYZE
INSERT INTO dirgraph(point_from,point_to,costs)
SELECT p1.the_point , p2.the_point, (1000*random() ) +1
FROM allpoints p1
JOIN allpoints p2 ON random() < 0.002
;
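The ON random() < 0.002 condition is evaluated for each candidate pair, so each ordered pair of points is kept independently with probability 0.002, and the expected row count is n * n * 0.002. A minimal Python model (point ids and the function name are hypothetical) that also mirrors the (1000*random()) + 1 cost column:

```python
import random

def random_edges(points, p, rng):
    """Model of JOIN allpoints p2 ON random() < p: every ordered pair,
    including a point paired with itself, is kept with probability p."""
    return [(p1, p2, 1000 * rng.random() + 1)   # cost in [1, 1001)
            for p1 in points
            for p2 in points
            if rng.random() < p]

points = list(range(1000))                       # hypothetical point ids
edges = random_edges(points, 0.002, random.Random(0))
print(len(edges))   # expected count: 1000 * 1000 * 0.002 = 2000
```

Note that the query, like this model, can pair a point with itself; adding AND p1.the_point <> p2.the_point to the ON clause would exclude such self-loops if they are unwanted.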