Select rows in random order and then reverse it - sql-server-2005

I need to select rows in random order and return a query which holds the rows in both regular order and in reverse order. This is done to simulate a fantasy draft for a basketball game I'm working on.
For example, I need a result set like the following:
team1 1
team2 2
team6 3
team9 4
team9 5
team6 6
team2 7
team1 8
As you can see, the first four teams are in random order and the following four are the same teams in reverse order.
Hope I managed to explain the problem, if not - please comment and I'll explain further.

You have to "cache" the results of the random ORDER BY.
In this code, if you referred to the CTE directly in the UNION, it would be evaluated twice and you'd get 2 different orders. A CTE is just a macro, which is why the rows go into a temp table first.
;WITH cList AS
(
SELECT team, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
FROM teams
)
SELECT * INTO #tempresults FROM cList WHERE rn <= @rn --or however many
SELECT team, rn FROM #tempresults
UNION ALL
SELECT team, (2 * @rn) + 1 - rn FROM #tempresults --mirrored pick numbers: @rn + 1 .. 2 * @rn
ORDER BY rn
Duplicating rows is easy with a dummy cross join (like this), but this case also requires ordering and row numbering over the intermediate results. I don't think it can be done in a single SQL statement.
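The key idea in this answer is that the random order is computed once and then read twice. As a rough illustration outside SQL, here is a minimal Python sketch. The team list and the `snake_draft` helper are hypothetical names made up for this example and are not part of the answer.

```python
import random

def snake_draft(teams, k, rng=random):
    """Return (team, pick_number) pairs: k random teams, then the same k reversed."""
    # Shuffle once and keep the result. Re-shuffling for the second half
    # would give a different order, much like evaluating the CTE twice.
    first_round = rng.sample(teams, k)
    picks = first_round + first_round[::-1]
    return [(team, n) for n, team in enumerate(picks, start=1)]

teams = ["team1", "team2", "team6", "team9"]  # hypothetical sample data
draft = snake_draft(teams, 4)
for team, pick in draft:
    print(team, pick)
```

The second half is built from the stored first half, so picks 5 to 8 always mirror picks 4 to 1.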

You can use a query like this:
select top(10) teamname, NewId() as Random
from teams
order by Random
This will return ten random teams from your table. You can then reverse the list in your application code.

Related

Getting random N rows by SQL query which will be proportional to the total number of rows in different sections

I have a table that persists a lot of questions, each question belongs to a section:
Id Question SectionId
1 What is ... 3
2 Who is... 3
3 When is... 2
4 Why is... 1
5 How is... 3
There are about 1000 questions and around 50 sections. My query is simple: I select a given number of questions from specific sections, for example
SELECT TOP 10 [Id], [Question] FROM [Questions]
WHERE [SectionId] IN (1,2)
ORDER BY NEWID()
This is simple and working fine, except that sometimes I get 5 questions out of the requested 10 from a section that has only 6 questions, and 2 from a section that has 100 questions, and 3 from a section that has 20 questions.
How can I make the result "proportional" to the number of questions in each section? For example, if I request 10 questions, I should get more questions from the sections that have more questions, and fewer from the sections that have fewer.
The only way I can think of currently is to make multiple queries: first one to get the number of questions in each section, then do some math to decide how many questions to take from each section, and then a few more queries to fetch that many questions. This sounds intensive and I hope there's a more practical way.
Note: An SQL query, or EF Linq query would work.
For a stratified sample, do an nth sample on the ordering. This is a little tricky, but this should work:
SELECT TOP (10) q.*
FROM (SELECT q.*,
ROW_NUMBER() OVER (ORDER BY SectionId, NEWID()) as seqnum,
COUNT(*) OVER () as cnt -- total row count; a running count here would make cnt / 10 zero for the first rows
FROM [Questions] q
WHERE [SectionId] IN (1, 2)
) q
ORDER BY seqnum % (cnt / 10);
There may be some boundary conditions on this logic, but as the number of questions grows and the sample is large enough, it should do what you want.
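To make the "nth sample" idea concrete, here is a small Python sketch of the same steps: order by section, shuffle within each section, then keep every step-th row. The `nth_sample` helper and the sample data are hypothetical, and the handling of the boundary cases (too few rows, too many matches) is my own assumption rather than something the answer specifies.

```python
import random

def nth_sample(rows, n, rng=random):
    """rows: list of (question_id, section_id). Returns about n rows spread across sections."""
    # Order by section, with a random order inside each section.
    ordered = sorted(rows, key=lambda r: (r[1], rng.random()))
    step = len(ordered) // n
    if step == 0:
        # Boundary case: fewer rows than requested, so return everything.
        return ordered
    # seqnum is 1-based; keep every step-th row of the ordered list.
    picked = [r for seqnum, r in enumerate(ordered, start=1) if seqnum % step == 0]
    # If more than n rows match, thin them out at random rather than by position.
    return picked if len(picked) <= n else rng.sample(picked, n)

# 100 questions in section 1 and 20 in section 2 (hypothetical data)
rows = [(i, 1) for i in range(100)] + [(100 + i, 2) for i in range(20)]
sample = nth_sample(rows, 10)
print(sum(1 for _, s in sample if s == 1), sum(1 for _, s in sample if s == 2))
```

With 120 rows and 10 requested, the step is 12, so 8 picks land in the large section and 2 in the small one, whatever the shuffle produces.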
I can't think of a way to do this in a single step, unless you know in advance the number of sections and the proportions of each.
If these values have to be calculated at query time, you will need to run a query to get the sections and proportions and use that to build a Dynamic SQL query.
Use a GROUP BY query to get the SectionIDs and the number of questions in each Section, filtered by the Sections you want to include.
Iterate through that result to build a dynamic UNION ALL query that gets a TOP n (calculate n based on the percentage of the Section's Count / Total Count) of questions for each Section (one query per section), so that you end up dynamically building something that looks something like this:
SELECT ID, Question FROM (
SELECT TOP 5 ID, Question --because SectionID 1 is 50% of the questions
FROM Questions
WHERE SectionID=1
ORDER BY NEWID()) s1 --TOP ... ORDER BY must sit in a derived table inside a UNION
UNION ALL
SELECT ID, Question FROM (
SELECT TOP 3 ID, Question --because SectionID 2 is 30% of the questions
FROM Questions
WHERE SectionID=2
ORDER BY NEWID()) s2
UNION ALL
SELECT ID, Question FROM (
SELECT TOP 2 ID, Question --because SectionID 3 is 20% of the questions
FROM Questions
WHERE SectionID=3
ORDER BY NEWID()) s3
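The "calculate n" step can be sketched in a few lines of Python. This is only one possible way to do the rounding (largest remainder), chosen so that the per-section counts always add up to the requested total. The `allocate` helper is a hypothetical name; the input is the result of the GROUP BY query as a dictionary.

```python
def allocate(counts, total):
    """counts: {section_id: number_of_questions}. Returns {section_id: rows_to_take}."""
    grand_total = sum(counts.values())
    exact = {s: total * c / grand_total for s, c in counts.items()}
    quota = {s: int(x) for s, x in exact.items()}  # round down first
    leftover = total - sum(quota.values())
    # Hand the remaining rows to the sections with the largest fractional parts.
    by_remainder = sorted(counts, key=lambda s: exact[s] - quota[s], reverse=True)
    for s in by_remainder[:leftover]:
        quota[s] += 1
    return quota

quotas = allocate({1: 50, 2: 30, 3: 20}, 10)
print(quotas)
```

Each resulting value would become the `TOP n` of one branch in the dynamic query. Note that a very small section can end up with a quota of 0 under this rounding.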
Another approach you could think about is to create an artificial ranking column that is factored by the relative density of the section.
What I mean, for example (super simplifying it) is suppose Section 1 was 75% of the questions, and Section 2 was 25%.
You'd use ROW_NUMBER(), partitioned by SectionID, ordered by NEWID() and factored so that:
Section 1 would have values like 1,2,3,5,6,7, etc (3 out of every 4 cardinal values)
Section 2 would have values like 1, 5, 9, 10 etc (1 out of every 4)
Then Order your query result by this artificial column.
This is untested in the absence of sample data, however, something like this might work:
WITH CTE AS(
SELECT ID,
Question,
SectionID,
ROW_NUMBER() OVER (PARTITION BY SectionID ORDER BY NEWID()) AS RN, --random position within each section
(COUNT(ID) OVER (PARTITION BY SectionID) / (COUNT(ID) OVER () *1.0)) *10 AS Perc
FROM YourTable
)
SELECT TOP 10
ID,
Question,
SectionID
FROM CTE
WHERE RN <= CEILING(Perc)
ORDER BY RN ASC;
Another alternative: return, for example, 20% of the rows in each section.
DECLARE @percentage numeric(10,2)
SET @percentage = 0.20 --20% of total question for section
SELECT [SectionID],[ID],[Question]
FROM ( SELECT
[ID],
[Question],
[SectionID],
ROW_NUMBER() OVER(PARTITION BY SectionID ORDER BY NEWID()) [idx],
COUNT(1) OVER(PARTITION BY SectionID) * @percentage AS [Proportional]
FROM [Questions]) tbl
WHERE
(tbl.[SectionID] = 1 AND tbl.[idx] <= [Proportional])
OR (tbl.[SectionID] = 2 AND tbl.[idx] <= [Proportional])
OR (tbl.[SectionID] = 3 AND tbl.[idx] <= [Proportional])
You can use the NTILE(100) function with an OVER clause partitioned by section to take the same percentage of rows from every section. So this query:
SELECT TOP 10 [Id], [Question] FROM [Questions]
WHERE [SectionId] IN (1,2)
ORDER BY NEWID()
should be
declare @limit int = 10;
;with data as (
SELECT NTILE(100) over (partition by sectionid ORDER BY NEWID() ) as Centile, [Id], [Question]
FROM [Questions]
WHERE [SectionId] IN (1,2)
)
select * from data where centile <= @limit
https://learn.microsoft.com/en-us/sql/t-sql/functions/ntile-transact-sql
You could always select 10% of the records in the chosen sections with this:
SELECT TOP ( select CAST(( COUNT(*) * 0.1 ) AS INT )
FROM QUESTION WHERE SECTIONID IN ( 1,2)) * FROM QUESTION
WHERE [SectionId] IN (1,2)
ORDER BY NEWID()

How to write a sql microsoft access query that picks 20 random records out of 100 but filter based on record categories?

I need a sql query that will randomly pick 20 records from a table that contains about 100 records. Each record has an associated category that goes from 1 to 15. I want the records that are picked to be completely random. However, I can't have 3 records from the same category being picked.
It seems to me that I can randomly pick 20 records, eliminate the records whose category appears 3 or more times, and then pick again. But all this implies having more than one query, and I don't know how to pass the results of one query to another, and then to another, in a Microsoft Access query. The query results are supposed to serve as the control source for a form. What do I do so that just one query gives me the results, which can then be used as the control source for the form?
I tried the following and the problem is that the questions from the same category are grouped together which is not what I want. Here's a sample of what I am trying.
(SELECT TOP 3 MCQuestionsT.QuestionID, MCQuestionsT.QuestionText, MCQuestionsT.CategoryID
FROM MCQuestionsT
WHERE (((MCQuestionsT.CourseCode)="2323") AND MCQuestionsT.CategoryID = 1)
ORDER BY Rnd(MCQuestionsT.QuestionID))
UNION ALL
(SELECT TOP 3 MCQuestionsT.QuestionID, MCQuestionsT.QuestionText, MCQuestionsT.CategoryID
FROM MCQuestionsT
WHERE (((MCQuestionsT.CourseCode)="2323") AND MCQuestionsT.CategoryID = 2)
ORDER BY Rnd(MCQuestionsT.QuestionID))
UNION ALL
(SELECT TOP 3 MCQuestionsT.QuestionID, MCQuestionsT.QuestionText, MCQuestionsT.CategoryID
FROM MCQuestionsT
WHERE (((MCQuestionsT.CourseCode)="2323") AND MCQuestionsT.CategoryID = 3)
ORDER BY Rnd(MCQuestionsT.QuestionID))
-- example using sys.all_objects that returns three random objects of each type
SELECT type_desc, name
FROM (
SELECT type_desc, name, Id = ROW_NUMBER() OVER (PARTITION BY type_desc ORDER BY NEWID())
FROM sys.all_objects
) Q
WHERE Id < 4
-- example using your table
SELECT QuestionID, QuestionText, CategoryID
FROM (
SELECT QuestionID, QuestionText, CategoryID, Id = ROW_NUMBER() OVER (PARTITION BY CategoryID ORDER BY NEWID())
FROM dbo.MCQuestionsT
WHERE CourseCode = '2323'
) Q
WHERE Id < 4

postgresql partition large dataset and random select 3 from a category

I'm a novice when it comes to database programming, or programming in general.
But I hope you can help me.
I have a database with circa 9 million rows, and there is a column named category. Some rows belong to the same category: there may be 10 in one category, while others may only have 1 or 2.
I would like to make a new table where three RANDOM rows are selected from each category, basically filtering out or excluding the rest.
If a category has fewer than 3 rows, then only the available rows will be selected.
I've looked around on the forum and a friend told me I needed to do this with PARTITION. I've tried the following but this doesn't do what I want. Any help is much appreciated.
Create table test as
SELECT *
FROM (
SELECT *, row_number() OVER (PARTITION BY category ORDER BY random()) as rn
FROM data2016 ) sub
WHERE rn = 1;
Probably this, since rn <= 3 keeps up to three rows per category instead of only one:
SELECT *
FROM (
SELECT *, row_number() OVER (PARTITION BY category ORDER BY random()) as rn
FROM data2016
) sub
WHERE rn <= 3;
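To see what the partitioned row number does, here is the same logic as a minimal Python sketch on toy data. The `sample_per_category` helper and the rows are hypothetical and only meant to mirror the three steps of the query.

```python
import random
from collections import defaultdict

def sample_per_category(rows, k=3, rng=random):
    """rows: list of (row_id, category). Keep at most k random rows per category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)   # PARTITION BY category
    kept = []
    for members in groups.values():
        rng.shuffle(members)         # ORDER BY random()
        kept.extend(members[:k])     # WHERE rn <= k
    return kept

rows = [(1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "b"), (6, "c"), (7, "c")]
result = sample_per_category(rows)
print(sorted(result))
```

Categories with fewer than three rows simply contribute all of their rows, which matches the requirement in the question.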

Unique Top 5 Random Query

Let's say I have an app that determines the winners in a prize drawing. All entries are entered into a table indicating their employeeID. Each employee can enter the drawing multiple times. I select from the table and order by NEWID() to get a random sort. I assume the more entries (database records) an employee has, the better the chance he will end up in the top 5 of my query each time I run it. So far so good. However, because each employee has multiple records, there is a good chance he will come up multiple times in the top 5. I need the ability to return 5 unique records from the randomly sorted results.
How do I get 5 unique rows while still ensuring those with multiple drawing entries get a heavier weighting in the selection?
My base query:
SELECT TOP 5 employeeID
FROM events
TABLESAMPLE(1000 ROWS)
ORDER BY CHECKSUM(NEWID());
Kinda what I am trying to do:
SELECT TOP 5 *
FROM events
WHERE employeeID IN (SELECT employeeID
FROM events
TABLESAMPLE(1000 ROWS)
ORDER BY CHECKSUM(NEWID())
)
ORDER BY CHECKSUM(NEWID())
But of course I cannot do an order by in the subquery.
Any solution must take 2 things into account:
If an employee enters multiple tickets, his chance of winning increases relative to the others.
Everyone can only win once.
Here's my approach:
;WITH
tmp1 AS
(
SELECT EmployeeID,
ROW_NUMBER() OVER (ORDER BY NEWID()) AS SortOrder
FROM Events
),
tmp2 AS
(
SELECT EmployeeID,
MIN(SortOrder) AS WinOrder
FROM tmp1
GROUP BY EmployeeID
)
SELECT TOP 5 *
FROM tmp2
ORDER BY WinOrder
The SQL Fiddle gives employees 1 & 5 higher chances to win, but they will only win once each, regardless of how many times they enter.
Here's a fairly simple way to get what you're after:
select top 5 EmployeeID
from
(
select EmployeeID, row_number() over (order by newid()) DrawOrder
from Events
) wins
group by EmployeeID
order by min(DrawOrder)
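Both answers rely on the same mechanism: shuffle every ticket, then rank each employee by the position of his first ticket. A minimal Python sketch of that mechanism follows. The `draw_winners` helper and the entry list are hypothetical.

```python
import random

def draw_winners(entries, k=5, rng=random):
    """entries: one employee id per ticket. Returns up to k distinct winners."""
    shuffled = list(entries)
    rng.shuffle(shuffled)                # random draw order over all tickets
    winners = []
    for employee_id in shuffled:         # walk the tickets in draw order
        if employee_id not in winners:   # only an employee's first ticket counts
            winners.append(employee_id)
        if len(winners) == k:
            break
    return winners

entries = [1, 1, 1, 2, 3, 4, 5, 5, 5, 6, 7]  # employees 1 and 5 hold three tickets each
winners = draw_winners(entries)
print(winners)
```

An employee with more tickets is more likely to appear early in the shuffled list, but can only occupy one of the winning slots.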

MSSQL 2008 SP pagination and count number of total records

In my SP I have the following:
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging
The result is that I get the error: invalid object name 'Paging'.
Can I query again the Paging table? I don't want to include the count for all results as a new column ... I would prefer to return as another data set. Is that possible?
Thanks, Radu
After more research I found another way of doing this:
with Paging(RowNo, ID, Name, TotalOccurrences) AS
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
select RowNo, ID, Name, TotalOccurrences, (select COUNT(*) from Paging) as TotalResults from Paging where RowNo between (@PageNumber - 1) * @PageSize + 1 and @PageNumber * @PageSize;
I think that this has better performance than running the query twice.
You can't do that because the CTE you are defining will only be available to the FIRST query that appears after it's been defined. So when you run the COUNT(*) query, the CTE is no longer available to reference. That's just a limitation of CTEs.
So to do the COUNT as a separate step, you'd need to not use the CTE and instead use the full query to COUNT on.
Or, you could wrap the CTE up in an inline table valued function and use that instead, to save repeating the main query, something like this:
CREATE FUNCTION dbo.ufnExample()
RETURNS TABLE
AS
RETURN
(
with Paging(RowNo, ID, Name, TotalOccurrences) as
(
SELECT ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging
)
GO
SELECT * FROM dbo.ufnExample() x WHERE RowNo BETWEEN 1 AND 50
SELECT COUNT(*) FROM dbo.ufnExample() x
Please be aware that Radu D's solution's query plan shows double hits to those tables. It is doing two executions under the covers. However, this still may be the best way as I haven't found a truly scalable 1-query design.
A less scalable 1-query design is to dump a completed ordered list into a @tablevariable, SELECT @@ROWCOUNT to get the full count, and select from @tablevariable where row number between X and Y. This works well for <10000 rows, but with results in the millions of rows, populating that @tablevariable gets expensive.
A hybrid approach is to populate this temp/variable up to 10000 rows. If not all 10000 rows are filled up, you're set. If 10000 rows are filled up, you'll need to rerun the search to get the full count. This works well if most of your queries return well under 10000 rows. The 10000 limit is a rough approximation, you can play around with this threshold for your case.
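The "materialize once, then read both the page and the total" design can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the `videos` table, its columns, the `fetch_page` helper and the sample data are all hypothetical, and SQLite stands in for SQL Server, so a temp table plays the role of the table variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, name TEXT, total_occurrences INTEGER)")
conn.executemany(
    "INSERT INTO videos (name, total_occurrences) VALUES (?, ?)",
    [(f"video{i}", i) for i in range(1, 121)],  # hypothetical sample rows
)

def fetch_page(conn, page_number, page_size):
    """Materialize the ordered result once, then read the page and the total from it."""
    conn.execute("DROP TABLE IF EXISTS paging")
    # row_no is an alias for SQLite's rowid, so it is filled 1, 2, 3, ... in insert order.
    conn.execute("CREATE TEMP TABLE paging (row_no INTEGER PRIMARY KEY, id INTEGER, name TEXT)")
    conn.execute(
        "INSERT INTO paging (id, name) SELECT id, name FROM videos ORDER BY total_occurrences DESC"
    )
    # Counting the materialized rows is cheap compared to re-running the search.
    total = conn.execute("SELECT COUNT(*) FROM paging").fetchone()[0]
    first = (page_number - 1) * page_size + 1
    last = page_number * page_size
    page = conn.execute(
        "SELECT row_no, id, name FROM paging WHERE row_no BETWEEN ? AND ? ORDER BY row_no",
        (first, last),
    ).fetchall()
    return page, total

page, total = fetch_page(conn, page_number=2, page_size=50)
print(total, page[0])
```

The expensive ordered query runs once per call. As the answer notes, the cost of filling the intermediate table grows with the result size, so this fits small result sets best.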
Write "AS" after the CTE table name Paging as below:
with Paging AS (RowNo, ID, Name, TotalOccurrences) as
(
ROW_NUMBER() over (order by TotalOccurrences desc) as RowNo, V.ID, V.Name, R.TotalOccurrences FROM dbo.Videos V INNER JOIN ....
)
SELECT * FROM Paging WHERE RowNo BETWEEN 1 and 50
SELECT COUNT(*) FROM Paging