How to "pick" random records with T-SQL


This is a simple question that is actually hard to answer, because the "picking" has a special meaning.
I need to give each person three random picks, numbered 1, 2, and 3. What makes it hard is that the people and the picks come from different tables, and there is no logical join between them.
The closest I can get is:
SELECT TOP 15 database_id, create_date, RowNo, cs.name FROM sys.databases
CROSS apply (
SELECT top 3 Row_number()OVER(ORDER BY (SELECT NULL)) AS RowNo,*
FROM (SELECT top 3 name from sys.all_views ORDER BY NEWID()) T
) cs
I know the above is not about people and picks, but it is working SQL that anyone can test without first creating Person and Picks tables. And it illustrates the problem I'm facing:
the above SQL gives each person the same picks, whereas I need to give different people different picks.
How can I do that? Thanks.

Adding a correlated condition inside the CROSS APPLY will solve your problem:
SELECT TOP 15 database_id,
create_date,
RowNo,
cs.NAME
FROM sys.databases d
CROSS apply (SELECT TOP 3 Row_number() OVER(ORDER BY (SELECT NULL)) AS RowNo, *
FROM (SELECT TOP 3 NAME
FROM sys.all_views v
WHERE d.NAME = d.NAME --Here
ORDER BY Newid()) T) cs
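Note the aliases in the WHERE clause: the left-hand and right-hand sides both refer to the same column of the outer table (d.NAME). The condition is always true for a non-NULL name; its only purpose is to make the subquery correlated, so that it is executed for each row of sys.databases instead of once for the whole query.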

Modifying your own query slightly will do the job. Check this:
SELECT TOP 15 database_id, create_date, RowNo, cs.name FROM sys.databases
CROSS apply (
SELECT top 3 Row_number()OVER(ORDER BY NEWID()) AS RowNo,*
FROM (SELECT top 3 name from sys.all_views ORDER BY NEWID()) T
) cs
The only change I made is replacing the NULL in the ROW_NUMBER() ORDER BY with NEWID().

The RAND() function could be used to randomly pick a record ID within the range of record IDs found in the source table.
In a newer SQL Server version I think the following may work, but I can't test it right now, and I am not sure older versions support ORDER BY RAND(). Try it and let me know whether it works on your SQL Server 2008. Later this week I can get you something that works on SQL Server 2000 and up; I had to do this years ago.
SELECT TOP 3 * FROM table_name ORDER BY RAND()
Note that in T-SQL, RAND() without a seed is evaluated only once per query, so every row gets the same value and ORDER BY RAND() does not actually shuffle the rows; ORDER BY NEWID() is the usual per-row alternative.
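A quick check of the RAND() caveat, sketched for SQL Server 2005 or later: RAND() without a seed behaves as a runtime constant, so it returns the same value on every row of a query, while NEWID() produces a new value per row. The temp table #RandDemo is hypothetical scaffolding, and sys.all_objects is used only as a convenient source of rows.

```sql
-- Hypothetical scratch table for comparing RAND() and NEWID() per row.
IF OBJECT_ID('tempdb..#RandDemo') IS NOT NULL DROP TABLE #RandDemo;

SELECT TOP (5)
       RAND()  AS rand_value,   -- evaluated once: identical on all 5 rows
       NEWID() AS newid_value   -- evaluated per row: different on every row
INTO   #RandDemo
FROM   sys.all_objects;

SELECT rand_value, newid_value FROM #RandDemo;

-- So a per-row random sort key has to come from NEWID(), not RAND():
SELECT TOP (3) name
FROM   sys.all_objects
ORDER  BY NEWID();
```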

Related

How to join in SQL-SERVER

I am learning SQL Server and have written the following query:
WITH T AS
(
SELECT ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num, *
FROM test.db as d
INNER JOIN test.dbs as ds
ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT *
FROM T
WHERE row_num <=10;
I found that the only way to limit the number of rows is with ROW_NUMBER().
However, when I run the join I get this error:
org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [8156] [S0001]: The column 'DIALOG_ID' was specified multiple times for 'T'.

The problem: In the WITH, you do SELECT * which gets all columns from both tables db and dbs. Both have a column DIALOG_ID, so a column by that name ends up twice in the result set of the WITH.
Up to this point everything is allowed, but it is not good practice: why carry the same data twice?
Things go wrong when SQL Server has to determine what SELECT * FROM T means: it expands SELECT * to the actual columns of T, but it finds a duplicate column name, and then it refuses to continue.
The fix (and also highly recommended in general): be specific about the columns that you want to output. If T has no duplicate columns, then SELECT * FROM T will succeed.
Note that the even-more-pure variant is to also be specific about what columns you select from T. By doing that it becomes clear at a glance what the SELECT produces, instead of having to guess or investigate when you look at the query later on (or when someone else does).
The updated code would look like this (fill in your column names as we don't know them):
WITH T AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num,
d.DIALOG_ID, d.SOME_OTHER_COL,
ds.DS_ID, ds.SOME_OTHER_COL_2
FROM test.db AS d
INNER JOIN test.dbs AS ds ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT row_num, DIALOG_ID, SOME_OTHER_COL, DS_ID, SOME_OTHER_COL_2
FROM T
WHERE row_num <= 10;
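The question assumes ROW_NUMBER() is the only way to limit rows, but in SQL Server a TOP clause with an ORDER BY does the same job without a CTE. A self-contained sketch, assuming SQL Server 2005 or later; #db and #dbs are hypothetical temp tables standing in for test.db and test.dbs, and the non-key column names are placeholders.

```sql
-- Hypothetical stand-ins for test.db and test.dbs.
IF OBJECT_ID('tempdb..#db')  IS NOT NULL DROP TABLE #db;
IF OBJECT_ID('tempdb..#dbs') IS NOT NULL DROP TABLE #dbs;
CREATE TABLE #db  (DIALOG_ID INT PRIMARY KEY, SOME_OTHER_COL VARCHAR(20));
CREATE TABLE #dbs (DS_ID INT PRIMARY KEY, DIALOG_ID INT, SOME_OTHER_COL_2 VARCHAR(20));

-- 20 dialogs, each with one matching detail row.
INSERT INTO #db (DIALOG_ID, SOME_OTHER_COL)
SELECT n, 'dialog'
FROM (SELECT TOP (20) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects ORDER BY n) AS x;

INSERT INTO #dbs (DS_ID, DIALOG_ID, SOME_OTHER_COL_2)
SELECT DIALOG_ID + 100, DIALOG_ID, 'detail'
FROM #db;

-- TOP (10) with ORDER BY limits the rows directly; listing the columns
-- explicitly means DIALOG_ID appears only once in the output.
SELECT TOP (10)
       d.DIALOG_ID, d.SOME_OTHER_COL, ds.DS_ID, ds.SOME_OTHER_COL_2
FROM   #db AS d
INNER JOIN #dbs AS ds ON d.DIALOG_ID = ds.DIALOG_ID
ORDER  BY d.DIALOG_ID;
```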

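Select only the columns of one table (d.*) so that DIALOG_ID appears only once in T. If you also need columns from ds, add them explicitly and leave out ds.DIALOG_ID:
WITH T AS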
(
SELECT ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num, d.*
FROM test.db as d
INNER JOIN test.dbs as ds
ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT *
FROM T
WHERE row_num <=10;

get ROW NUMBER of random records

For a simple query like
SELECT top 3 MyId FROM MyTable ORDER BY NEWID()
how can I add row numbers so that they become 1, 2, and 3?
UPDATE:
I thought I could simplify my question as above, but it turns out to be more complicated. So here is a fuller version: I need to give each person three random picks from MyTable, numbered 1, 2, and 3, and there is no logical join between Person and the picks.
SELECT * FROM Person
LEFT JOIN (
SELECT top 3 MyId FROM MyTable ORDER BY NEWID()
) D ON 1=1
The problems with the above SQL are:
1) Obviously, the pick numbers 1, 2, and 3 still need to be added.
2) Less obviously, it gives each person the same picks, whereas I need to give different people different picks.
Here is working SQL to test it:
SELECT TOP 15 database_id, create_date, cs.name FROM sys.databases
CROSS apply (
SELECT top 3 Row_number()OVER(ORDER BY (SELECT NULL)) AS RowNo,*
FROM (SELECT top 3 name from sys.all_views ORDER BY NEWID()) T
) cs
So, please help.
NOTE: This is NOT about MySQL but T-SQL; their syntax differs, so the solution differs as well.

Add ROW_NUMBER() to the outer query. Try this:
SELECT Row_number()OVER(ORDER BY (SELECT NULL)),*
FROM (SELECT TOP 3 MyId
FROM MyTable
ORDER BY Newid()) a
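Logically, TOP is applied after the SELECT list is evaluated. If you generate the row number in the same query, the numbers are assigned across the whole table first and the 3 random rows are picked afterwards, so they will generally not be 1, 2, and 3. That is why the row number should not be generated in the original (inner) query.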
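A self-contained sketch of the outer-ROW_NUMBER() approach, assuming SQL Server 2005 or later; #MyTable and #Picks are hypothetical temp tables standing in for MyTable and the result. The inner query chooses the three random rows first, and only then does the outer query number them, so the numbers are always 1, 2, and 3.

```sql
-- Hypothetical stand-in for MyTable: 100 distinct ids.
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;
IF OBJECT_ID('tempdb..#Picks')   IS NOT NULL DROP TABLE #Picks;
CREATE TABLE #MyTable (MyId INT PRIMARY KEY);
INSERT INTO #MyTable (MyId)
SELECT n
FROM (SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS x;

-- Inner query: pick 3 random rows. Outer query: number those 3 rows.
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS PickNo, a.MyId
INTO   #Picks
FROM  (SELECT TOP (3) MyId FROM #MyTable ORDER BY NEWID()) AS a;

SELECT PickNo, MyId FROM #Picks ORDER BY PickNo;
```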
Update
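It can be achieved with CROSS APPLY. In the WHERE clause inside the CROSS APPLY, replace the column name with a valid column from the Person table; the always-true comparison makes the subquery correlated, so it is evaluated again for each person: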
SELECT *
FROM Person p
CROSS apply (SELECT Row_number()OVER(ORDER BY (SELECT NULL)) rn,*
FROM (SELECT TOP 3 MyId
FROM MyTable
WHERE p.some_col = p.some_col -- Replace it with some column from person table
ORDER BY Newid())a) cs
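A self-contained sketch of the per-person version, assuming SQL Server 2005 or later. #Person, #MyTable and #Picks are hypothetical temp tables, and PersonId stands in for "some column from person table". The always-true predicate p.PersonId = p.PersonId is there only to correlate the inner query with the outer row; that SQL Server then reruns the random TOP (3) for each person is observed optimizer behavior rather than a documented guarantee, so check that different people really get different picks on your version.

```sql
-- Hypothetical stand-ins for Person and MyTable.
IF OBJECT_ID('tempdb..#Person')  IS NOT NULL DROP TABLE #Person;
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL DROP TABLE #MyTable;
IF OBJECT_ID('tempdb..#Picks')   IS NOT NULL DROP TABLE #Picks;
CREATE TABLE #Person  (PersonId INT PRIMARY KEY, Name VARCHAR(20));
CREATE TABLE #MyTable (MyId INT PRIMARY KEY);

INSERT INTO #Person (PersonId, Name)
SELECT n, 'person ' + CAST(n AS VARCHAR(10))
FROM (SELECT TOP (5) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS x;

INSERT INTO #MyTable (MyId)
SELECT n
FROM (SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS x;

-- p.PersonId = p.PersonId is always true; it only references the outer row
-- so that the random TOP (3) is correlated with each person.
SELECT p.PersonId, cs.PickNo, cs.MyId
INTO   #Picks
FROM   #Person AS p
CROSS APPLY (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS PickNo, a.MyId
    FROM  (SELECT TOP (3) m.MyId
           FROM   #MyTable AS m
           WHERE  p.PersonId = p.PersonId
           ORDER  BY NEWID()) AS a
) AS cs;

SELECT PersonId, PickNo, MyId FROM #Picks ORDER BY PersonId, PickNo;
```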

MSSQL 2008 SP pagination and count number of total records

In my SP I have the following:
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging
The result is that I get the error: invalid object name 'Paging'.
Can I query Paging again? I don't want to include the total count as a new column on every row; I would prefer to return it as a separate result set. Is that possible?
Thanks, Radu

After more research I found another way of doing this:
with Paging(RowNo, ID, Name, TotalOccurrences) AS
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
select RowNo, ID, Name, TotalOccurrences, (select COUNT(*) from Paging) as TotalResults
from Paging
where RowNo between (@PageNumber - 1) * @PageSize + 1 and @PageNumber * @PageSize;
I think this performs better than running the query twice.
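A single-statement variant of the same idea, sketched for SQL Server 2005 or later: COUNT(*) OVER () attaches the total number of rows to every row before the paging filter is applied. #Videos and #Page are hypothetical stand-ins (the real join is elided in the question), and @PageNumber/@PageSize are assumed parameter names. Whether this beats the subquery version depends on the plan, so compare the two on real data.

```sql
-- Hypothetical stand-in for the Videos join: 120 rows with distinct counts.
IF OBJECT_ID('tempdb..#Videos') IS NOT NULL DROP TABLE #Videos;
IF OBJECT_ID('tempdb..#Page')   IS NOT NULL DROP TABLE #Page;
CREATE TABLE #Videos (ID INT PRIMARY KEY, Name VARCHAR(20), TotalOccurrences INT);
INSERT INTO #Videos (ID, Name, TotalOccurrences)
SELECT n, 'video', n * 3
FROM (SELECT TOP (120) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects ORDER BY n) AS x;

DECLARE @PageNumber INT, @PageSize INT;
SET @PageNumber = 2;
SET @PageSize = 50;

WITH Paging AS (
    SELECT ROW_NUMBER() OVER (ORDER BY TotalOccurrences DESC) AS RowNo,
           ID, Name, TotalOccurrences,
           COUNT(*) OVER () AS TotalResults   -- all matching rows, before paging
    FROM #Videos
)
SELECT RowNo, ID, Name, TotalOccurrences, TotalResults
INTO   #Page
FROM   Paging
WHERE  RowNo BETWEEN (@PageNumber - 1) * @PageSize + 1 AND @PageNumber * @PageSize;

SELECT * FROM #Page ORDER BY RowNo;
```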

You can't do that, because a CTE is only in scope for the single statement that immediately follows its definition. So when you run the COUNT(*) query, the CTE is no longer available to reference. That's just a limitation of CTEs.
So to do the COUNT as a separate step, you'd need to drop the CTE and run the COUNT over the full query instead.
Or, you could wrap the CTE in an inline table-valued function and use that instead, to avoid repeating the main query, something like this:
CREATE FUNCTION dbo.ufnExample()
RETURNS TABLE
AS
RETURN
(
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging
)
GO
-- CREATE FUNCTION must be the only statement in its batch, hence the GO above.
SELECT * FROM dbo.ufnExample() x WHERE RowNo BETWEEN 1 AND 50
SELECT COUNT(*) FROM dbo.ufnExample() x

Please be aware that the query plan for Radu D's solution shows each table being hit twice: the query is executed twice under the covers. Still, this may be the best way, as I haven't found a truly scalable single-query design.
A less scalable single-query design is to dump the complete ordered list into a @tablevariable, SELECT @@ROWCOUNT to get the full count, and select from @tablevariable where the row number is between X and Y. This works well for fewer than 10000 rows, but with results in the millions of rows, populating that @tablevariable gets expensive.
A hybrid approach is to populate the temp table or table variable with up to 10000 rows. If fewer than 10000 rows are filled, you're set. If all 10000 rows are filled, you'll need to rerun the search to get the full count. This works well if most of your queries return well under 10000 rows. The 10000 limit is a rough approximation; experiment with the threshold for your case.
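A minimal sketch of the "dump into a table, read @@ROWCOUNT" approach, returning the page and the total as two separate result sets, as the question asked. It assumes SQL Server 2005 or later and uses a temp table #Ordered (a table variable works the same way here); #Videos and the parameter names are hypothetical. As noted above, filling the table gets expensive when the full result runs into millions of rows.

```sql
-- Hypothetical stand-in for the Videos join: 120 rows with distinct counts.
IF OBJECT_ID('tempdb..#Videos')  IS NOT NULL DROP TABLE #Videos;
IF OBJECT_ID('tempdb..#Ordered') IS NOT NULL DROP TABLE #Ordered;
CREATE TABLE #Videos (ID INT PRIMARY KEY, Name VARCHAR(20), TotalOccurrences INT);
INSERT INTO #Videos (ID, Name, TotalOccurrences)
SELECT n, 'video', n * 3
FROM (SELECT TOP (120) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects ORDER BY n) AS x;

CREATE TABLE #Ordered (RowNo INT PRIMARY KEY, ID INT, Name VARCHAR(20), TotalOccurrences INT);

DECLARE @PageNumber INT, @PageSize INT, @TotalResults INT;
SET @PageNumber = 1;
SET @PageSize = 50;

-- Run the expensive query once, keeping each row's position.
INSERT INTO #Ordered (RowNo, ID, Name, TotalOccurrences)
SELECT ROW_NUMBER() OVER (ORDER BY TotalOccurrences DESC), ID, Name, TotalOccurrences
FROM #Videos;

SET @TotalResults = @@ROWCOUNT;   -- rows just inserted = total matches

-- Result set 1: the requested page.
SELECT RowNo, ID, Name, TotalOccurrences
FROM #Ordered
WHERE RowNo BETWEEN (@PageNumber - 1) * @PageSize + 1 AND @PageNumber * @PageSize
ORDER BY RowNo;

-- Result set 2: the total count.
SELECT @TotalResults AS TotalResults;
```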

Write "AS" after the CTE table name Paging as below:
with Paging AS (RowNo, ID, Name, TotalOccurrences) as
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging

Problem using ROW_NUMBER() to get records randomly (SQL Server 2005)

I want to get 1000 random records from a table, so I use:
SELECT top 1000
mycol1
, mycol2
, ROW_NUMBER() OVER (ORDER BY NEWID()) rn
FROM mytable
However, I don't want to see rn in my resultset, so I do:
SELECT mycol1
, mycol2
FROM (
SELECT top 1000
mycol1
, mycol2
, ROW_NUMBER() OVER (ORDER BY NEWID()) rn
FROM mytable
) a
When I do this, the results are no longer random. They come back as if I had written TOP 1000 without the ROW_NUMBER() randomization.
When I change the query to
SELECT mycol1
, mycol2
, rn
FROM (
SELECT top 1000
mycol1
, mycol2
, ROW_NUMBER() OVER (ORDER BY NEWID()) rn
FROM mytable
) a
they are random again.
I guess SQL Server does some kind of optimization, saying "hey, this guy doesn't need the rn column anyway, so just ignore it". But in this case that results in unexpected behavior. Is there any way to avoid this?
PS: I use the ROW_NUMBER() trick because mytable has 10 million rows and
SELECT top 10000 *
FROM mytable
ORDER BY NEWID()
runs forever, whereas with ROW_NUMBER() it takes only up to 30 seconds.

You could also try using the rn column in a trivial WHERE clause such as
WHERE rn > 0 in your outer query, which might force the optimizer to carry the rn column through.
Also, I think your overall query is going to be an issue if you want to randomly sample all of your millions of records. It will only grab the "first off disk" block of records, which, while not guaranteed to be the same, will more often than not be the same 10000.
I would suggest creating a set of 10,000 random numbers between MIN(PrimaryKey) and MAX(PrimaryKey) and then doing a WHERE PrimaryKey IN (...) or similar.
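A sketch of that suggestion, assuming an integer primary key and SQL Server 2005 or later; #mytable, #Sample and the id column are hypothetical, and it joins to the random keys instead of building a literal IN list. ABS(CHECKSUM(NEWID()) % range) is a common way to get a per-row pseudo-random integer. Because of gaps in the key range and repeated random numbers, it returns fewer rows than the number of candidates, so generate extra candidates if you need an exact count.

```sql
-- Hypothetical stand-in for mytable: even ids only, so half the key range is gaps.
IF OBJECT_ID('tempdb..#mytable') IS NOT NULL DROP TABLE #mytable;
IF OBJECT_ID('tempdb..#Sample')  IS NOT NULL DROP TABLE #Sample;
CREATE TABLE #mytable (id INT PRIMARY KEY, mycol1 INT);
INSERT INTO #mytable (id, mycol1)
SELECT n * 2, n
FROM (SELECT TOP (5000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b) AS x;

DECLARE @lo INT, @hi INT;
SELECT @lo = MIN(id), @hi = MAX(id) FROM #mytable;

-- 1000 random candidate keys in [@lo, @hi]; taking the modulo before ABS
-- avoids the overflow ABS would raise on CHECKSUM's minimum int value.
SELECT DISTINCT t.id, t.mycol1
INTO   #Sample
FROM  (SELECT TOP (1000) @lo + ABS(CHECKSUM(NEWID()) % (@hi - @lo + 1)) AS k
       FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b) AS r
INNER JOIN #mytable AS t ON t.id = r.k;

SELECT id, mycol1 FROM #Sample;
```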

Add something like WHERE rn IS NOT NULL to the outer query so that rn is included in the query plan and not optimized away.
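Another way to keep rn in play, sketched under the same assumptions (SQL Server 2005 or later; #mytable and #Random are hypothetical): give the inner TOP its own ORDER BY rn. The random order is then part of what TOP is defined to return, so the optimizer cannot discard it even though the outer query never selects rn. This still sorts on a random key, much like ORDER BY NEWID(), so it is not necessarily faster; time it on the real 10 million rows.

```sql
-- Hypothetical stand-in for mytable with 5000 rows.
IF OBJECT_ID('tempdb..#mytable') IS NOT NULL DROP TABLE #mytable;
IF OBJECT_ID('tempdb..#Random')  IS NOT NULL DROP TABLE #Random;
CREATE TABLE #mytable (mycol1 INT PRIMARY KEY, mycol2 INT);
INSERT INTO #mytable (mycol1, mycol2)
SELECT n, n * 10
FROM (SELECT TOP (5000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b) AS x;

-- ORDER BY rn ties TOP (1000) to the random numbering, so rn cannot be
-- optimized away even though the outer SELECT does not return it.
SELECT mycol1, mycol2
INTO   #Random
FROM (
    SELECT TOP (1000)
           mycol1, mycol2,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
    FROM   #mytable
    ORDER  BY rn
) AS a;

SELECT mycol1, mycol2 FROM #Random;
```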

I was struggling with this same problem. I solved it with CROSS APPLY and TOP. Keeping in mind that CROSS APPLY pulls my outer table into scope for the derived table, I knew there had to be a way to do this.
The following code results in 3(*) random related products being added based on the manufacturer.
INSERT INTO ProductGroup (
ParentId,
ChildId
)
SELECT DISTINCT
P.ProductId,
CandidateInner.ChildId
FROM ProductRelated PR
JOIN Product P
ON PR.ChildId = P.ProductId
CROSS APPLY
(
SELECT DISTINCT TOP 3
NewId() AS RandId,
Product.ManufacturerId,
ProductRelated.ChildId
FROM ProductRelated
JOIN Product
ON Product.ProductId = ProductRelated.ChildId
WHERE ManufacturerId IS NOT NULL
AND Product.ManufacturerId = P.ManufacturerId
ORDER BY NewId()
) CandidateInner
LEFT JOIN (
SELECT DISTINCT TOP 100 PERCENT
ParentId,
COUNT(DISTINCT ChildId) AS Ct
FROM ProductGroup
GROUP BY ParentId
HAVING COUNT(DISTINCT ChildId) >= 3
) AlreadyGrouped
ON P.ProductId = AlreadyGrouped.ParentId
WHERE P.ProductId <> CandidateInner.ChildId
AND AlreadyGrouped.ParentId IS NULL
ORDER BY P.ProductId
*Note that this will insert fewer than 3 in the following 2 cases:
1) Where there are < 3 products related by manufacturer
2) (Problematic) Where the random top 3 includes the product itself.
(1) above is unavoidable.
I handled (2) by running the script twice and then deleting duplicates. This is still not 100% reliable, but statistically it is more than sufficient for my requirement. The script runs nightly, but I still like the speed of keeping the <> check outside the CROSS APPLY. Moving that check inside the APPLY would eliminate case (2), but it causes scans of the derived tables produced by the manufacturer join, which is painfully slow compared with the near-instant run I get with proper indexes.

Selecting Nth Record in an SQL Query

I have an SQL query that I'm running, but I only want to select a specific row. For example, let's say my query was:
Select * from Comments
Let's say this returns 10 rows; I only want to select the 8th record returned by this query. I know I can do:
Select Top 5 * from Comments
to get the top 5 records of that query, but I only want to select a certain record. Is there anything I can put into this query to do that (similar to TOP)?
Thanks
jack

This is a classic interview question.
In MS SQL 2005+ you can use the ROW_NUMBER() function and filter on the generated row number (RowNumber = n) in an outer query:
USE AdventureWorks;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS 'RowNumber'
FROM Sales.SalesOrderHeader
)
SELECT *
FROM OrderedOrders
WHERE RowNumber = 5;
In SQL2000 you could do something like
SELECT TOP 1 *
FROM [tblApplications]
WHERE [ApplicationID] IN
(
SELECT TOP 5 [ApplicationID]
FROM [dbo].[tblApplications]
ORDER BY [ApplicationID] DESC
)
ORDER BY [ApplicationID] ASC -- smallest of the top 5, i.e. the 5th highest
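The ROW_NUMBER() pattern above, made self-contained for the Comments example in the question. #Comments, its columns and #Nth are hypothetical, SQL Server 2005 or later is assumed, and @n holds the wanted position. The ORDER BY inside OVER (...) is what defines which row counts as the 8th; without a defined order the position is meaningless.

```sql
-- Hypothetical Comments table with 10 rows posted one minute apart.
IF OBJECT_ID('tempdb..#Comments') IS NOT NULL DROP TABLE #Comments;
IF OBJECT_ID('tempdb..#Nth')      IS NOT NULL DROP TABLE #Nth;
CREATE TABLE #Comments (CommentId INT PRIMARY KEY, PostedOn DATETIME, Body VARCHAR(50));
INSERT INTO #Comments (CommentId, PostedOn, Body)
SELECT n, DATEADD(MINUTE, CAST(n AS INT), '20240101'), 'comment ' + CAST(n AS VARCHAR(10))
FROM (SELECT TOP (10) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects ORDER BY n) AS x;

DECLARE @n INT;
SET @n = 8;

-- Number the rows in a defined order, then filter on that number.
WITH Numbered AS (
    SELECT CommentId, PostedOn, Body,
           ROW_NUMBER() OVER (ORDER BY PostedOn) AS RowNumber
    FROM #Comments
)
SELECT CommentId, PostedOn, Body
INTO   #Nth
FROM   Numbered
WHERE  RowNumber = @n;

SELECT CommentId, PostedOn, Body FROM #Nth;
```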

How about
SELECT TOP 1 * FROM
(SELECT TOP 8 * FROM Comments ORDER BY foo ASC) AS t
ORDER BY foo DESC
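One behavior of the nested-TOP approach worth knowing, shown in a small sketch (#Comments, foo, #Nested and #ByRowNumber are hypothetical; SQL Server 2005 or later for the ROW_NUMBER() part): if the table has fewer than 8 rows, the inner TOP 8 returns all of them and the outer query returns the last one instead of nothing. A ROW_NUMBER() filter returns no row in that case.

```sql
-- Hypothetical table with only 5 rows (foo = 1..5).
IF OBJECT_ID('tempdb..#Comments')    IS NOT NULL DROP TABLE #Comments;
IF OBJECT_ID('tempdb..#Nested')      IS NOT NULL DROP TABLE #Nested;
IF OBJECT_ID('tempdb..#ByRowNumber') IS NOT NULL DROP TABLE #ByRowNumber;
CREATE TABLE #Comments (foo INT PRIMARY KEY);
INSERT INTO #Comments (foo)
SELECT n
FROM (SELECT TOP (5) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects ORDER BY n) AS x;

-- Nested TOP: returns foo = 5, although there is no 8th row.
SELECT TOP 1 foo
INTO   #Nested
FROM  (SELECT TOP 8 foo FROM #Comments ORDER BY foo ASC) AS t
ORDER  BY foo DESC;

-- ROW_NUMBER(): returns no row when there is no 8th row.
SELECT foo
INTO   #ByRowNumber
FROM  (SELECT foo, ROW_NUMBER() OVER (ORDER BY foo) AS rn FROM #Comments) AS t
WHERE  rn = 8;

SELECT foo FROM #Nested;
SELECT foo FROM #ByRowNumber;
```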

First, you should say which RDBMS you're using.
Second, you should give careful thought to what it is you're trying to accomplish. Relational Databases are set-based. In general, the order of elements in a set does not matter. You'll want to ask why it matters in this case, then see if there's a better way to embed the concept of order into the query itself.
For instance, in SQL Server 2005 (and other RDBMS), you can use the ROW_NUMBER function to assign a sequential number to each row returned, based on the criteria you specify. You could then select rows based on the row number. Example from Books Online:
USE AdventureWorks;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS 'RowNumber'
FROM Sales.SalesOrderHeader
)
SELECT *
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;

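SELECT * FROM comments WHERE ...conditions... LIMIT 1 OFFSET 7
OFFSET counts the rows to skip, so OFFSET 7 returns the 8th row (add an ORDER BY so the position is well defined). This is MySQL syntax; in T-SQL, SQL Server 2012 and later use ORDER BY ... OFFSET 7 ROWS FETCH NEXT 1 ROWS ONLY instead.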
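For reference, the T-SQL equivalent on SQL Server 2012 and later is the OFFSET ... FETCH clause. A sketch with #Comments as a hypothetical table: OFFSET counts the rows to skip, so the 8th row needs OFFSET 7 ROWS, and an ORDER BY is mandatory.

```sql
-- Hypothetical Comments table with CommentId 1..10.
IF OBJECT_ID('tempdb..#Comments') IS NOT NULL DROP TABLE #Comments;
CREATE TABLE #Comments (CommentId INT PRIMARY KEY);
INSERT INTO #Comments (CommentId)
SELECT n
FROM (SELECT TOP (10) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects ORDER BY n) AS x;

-- Skip 7 rows, return the next 1: the 8th row in CommentId order.
-- OFFSET/FETCH requires an ORDER BY and SQL Server 2012 or later.
SELECT CommentId
FROM   #Comments
ORDER  BY CommentId
OFFSET 7 ROWS FETCH NEXT 1 ROWS ONLY;
```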

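For SQL Server 2005:
select t.subject, t.date
from (select ROW_NUMBER() OVER (ORDER BY c.subject, c.date) as rn, c.subject, c.date
from comments c) as t
where t.rn = 8
A window function such as RANK() or ROW_NUMBER() can't be referenced in the WHERE clause of the query that computes it, so it has to be computed in a derived table first. ROW_NUMBER() is used here instead of RANK() because RANK() gives tied rows the same value, so "rank = 8" can match several rows or none.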

Well, in T-SQL (the dialect for SQL Server) you can do the following:
SELECT TOP 1 *
FROM (SELECT TOP 8 *
      FROM [Table]
      ORDER BY SortField) AS t
ORDER BY SortField DESC
This way you get the 8th record.

I have read the question and your comments saying you want the next 3 blog comments, etc.
How are your tables structured?
Assuming each comment has a blog post ID and a comment ID that is generated in ascending order for each blog post, you could do a SELECT based on the current ID.
For example, if blogpostId = 101, you get the top 3 comments ordered by comment ID. Now, say you want the next 3 comments: you could SELECT the comments whose commentId is between the last comment ID shown and that comment ID minus 3.
But all of that depends on how your tables are defined.
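A sketch of fetching the "next 3" comments under stated assumptions: comments are shown newest first, CommentId increases over time, and #Comments, BlogPostId, @LastShownId and #NextPage are hypothetical names (SQL Server 2005 or later). Seeking past the last ID shown with TOP (3) ... ORDER BY, rather than computing an ID range, keeps each page at 3 rows even when there are gaps in the IDs.

```sql
-- Hypothetical comments for blog post 101, with gaps: ids 5, 10, ..., 50.
IF OBJECT_ID('tempdb..#Comments') IS NOT NULL DROP TABLE #Comments;
IF OBJECT_ID('tempdb..#NextPage') IS NOT NULL DROP TABLE #NextPage;
CREATE TABLE #Comments (CommentId INT PRIMARY KEY, BlogPostId INT);
INSERT INTO #Comments (CommentId, BlogPostId)
SELECT n * 5, 101
FROM (SELECT TOP (10) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects ORDER BY n) AS x;

DECLARE @BlogPostId INT, @LastShownId INT;
SET @BlogPostId = 101;
SET @LastShownId = 40;   -- oldest comment on the page already shown (50, 45, 40)

-- Next 3 older comments: seek past the last id shown.
SELECT TOP (3) CommentId
INTO   #NextPage
FROM   #Comments
WHERE  BlogPostId = @BlogPostId
  AND  CommentId < @LastShownId
ORDER  BY CommentId DESC;

SELECT CommentId FROM #NextPage ORDER BY CommentId DESC;
```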

In SQL 2000, where you do not have the ROW_NUMBER() function, you could use a workaround like this:
SELECT CommentsTableFieldList, IDENTITY(INT, 1,1) as seqNo
INTO #SeqComments
FROM Comments
SELECT * FROM #SeqComments
WHERE seqNo = 8

select top 1 *
from TableName
where ColumnName1 in
(
select top 8 ColumnName1 -- n = 8
from TableName
order by ColumnName1 desc
)
order by ColumnName1 asc -- smallest of the top n, i.e. the nth highest

From the MySQL SELECT reference, use the LIMIT clause:
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
Note: this is for MySQL, other SQL engines may have a different keyword.

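SELECT * FROM tablename LIMIT 7, 1;
This is MySQL syntax: the first number is a zero-based offset, so LIMIT n-1, 1 returns the nth row (here the 8th).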

Try this. Let's assume we want to select the 5th row of the WC_Video table:
Select * from (Select Row_Number() over (Order by Uploadedon) as 'rownumber', * from Wc_Video) as Temp where rownumber = 5
