I'm using the following statement (shortened here as an example) to get results from my Microsoft SQL Server 2012 Express database:
SELECT id, name, city
FROM tblContact
ORDER BY RAND(xxx)
I'm injecting a seed stored in the session for the xxx part so that the results are consistently random for a given session (so the user doesn't see duplicates when paging through results).
PROBLEM: No matter what the seed is, the results get returned in the same order
I have also tried this:
SELECT id, name, city, RAND(xxx) AS OrderValue
FROM tblContact
ORDER BY OrderValue
Both give the same (unexpected) result - am I using this incorrectly?
The value of RAND(seed) will be the same for the entire query. You may want to use the ID column to generate the random value on a row-by-row basis:
SELECT id, name, city, RAND(xxx + id) AS OrderValue
FROM tblContact ORDER BY OrderValue
However, I've developed functionality in the past where I needed a random order for different sessions, but the same order within a session. At that time I used HASHBYTES() and it worked very well:
SELECT id, name, city, HASHBYTES('md5',cast(xxx+id as varchar)) AS OrderValue
FROM tblContact ORDER BY OrderValue
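For the paging requirement in the original question, here is a minimal sketch of combining this ordering with OFFSET/FETCH (available in SQL Server 2012); the variables @seed, @page and @pageSize are illustrative names, not part of the original query:
DECLARE @seed int = 12345;      -- per-session seed, illustrative value
DECLARE @page int = 0;          -- zero-based page number
DECLARE @pageSize int = 20;
DECLARE @offset int = @page * @pageSize;

SELECT id, name, city
FROM tblContact
ORDER BY HASHBYTES('md5', cast(@seed + id as varchar))
OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY;
As long as @seed stays the same within the session, each page is drawn from the same overall ordering, so rows are not repeated across pages.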
In SQL Server, Rand() is calculated once for the query. To get a random order, use ORDER BY NEWID().
Often, the newid() function is used for this purpose:
SELECT id, name, city
FROM tblContact
ORDER BY newid();
I have heard that rand(checksum(newid())) actually has better properties as a random number generator:
SELECT id, name, city
FROM tblContact
ORDER BY rand(checksum(newid()));
If you want consistent results from one query to the next, then use #dimt's solution with id or a function of id.
Related
I have a table containing two columns, for example. The first column has unique values and the second column has duplicates. Is there any way for me to select only the first value from the first column for each value of the second column?
For example, the results should be Apple, Tire, and Fork only, since they are the first rows for each value of the second column (Category):
Details   Category
-------   ----------
Apple     Fruits
Banana    Fruits
Tire      Car
Engine    Car
Fork      Silverware
Spoon     Silverware
Knife     Silverware
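For reference, a sketch of this sample data as SQL (the table name SpecificResults matches the queries in the answers below; the column types are assumptions):
CREATE TABLE SpecificResults (Details varchar(50), Category varchar(50));
INSERT INTO SpecificResults (Details, Category) VALUES
('Apple', 'Fruits'),
('Banana', 'Fruits'),
('Tire', 'Car'),
('Engine', 'Car'),
('Fork', 'Silverware'),
('Spoon', 'Silverware'),
('Knife', 'Silverware');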
Usually we can use windowing functions like ROW_NUMBER() to simplify these types of queries; however, your requested record set does not have a natural sort order that would produce the output you are expecting.
The following is a simple solution that uses ROW_NUMBER(); however, it will not return the result you requested:
SELECT Category, Details
FROM
(
SELECT Category, Details, row_number() over (partition by category order by details) as rn
FROM SpecificResults
) as numberedRecords
WHERE rn = 1;
Results:
Category     Details
----------   -------
Car          Engine
Fruits       Apple
Silverware   Fork
You requested an output of: Apple, Tire, and Fork
The next query might produce the expected output because we do not specify the sort; however, this also makes the output non-deterministic, that is, we cannot guarantee it: due to database internals, the result might change over time or even between immediately repeated queries.
There are many discussions on non-deterministic queries in SQL; have a read through this thread on SO: The order of a SQL Select statement without Order By clause
SELECT Category, details.Details
FROM SpecificResults byCategory
CROSS APPLY (
SELECT TOP 1 Details
FROM SpecificResults lookup
WHERE lookup.Category = byCategory.Category
--ORDER BY Details
) as details
GROUP BY Category, details.Details;
Results in:
Category     Details
----------   -------
Car          Tire
Fruits       Apple
Silverware   Fork
I have set up a SQL Fiddle for you to explore this further: http://sqlfiddle.com/#!18/68530/12
Real World Solution
In the real world, your dataset will have a primary key, and in many cases that key value is incrementally assigned; if not, there may be other columns that can be used to determine a sort order that matches your expected results.
Assuming that your dataset has an integer column called Id and that column is an Identity column, then a simple change to the original query using ROW_NUMBER() will achieve the desired result:
SELECT Category, Details
FROM
(
SELECT Category, Details, row_number() over (partition by category order by Id) as rn
FROM OrderedResults
) as numberedRecords
WHERE rn = 1;
I have updated the SQL Fiddle with this variation: http://sqlfiddle.com/#!18/3f7bd/2
If there is a Created date or some other Timestamp or DateTime based column in your recordset, then you could consider those as candidates for your ORDER BY clause.
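For example, a sketch of the same ROW_NUMBER() query ordered by a hypothetical CreatedDate column (this column is an assumption, not part of the original data):
SELECT Category, Details
FROM
(
SELECT Category, Details, row_number() over (partition by category order by CreatedDate) as rn
FROM OrderedResults
) as numberedRecords
WHERE rn = 1;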
SQL tables represent unordered sets. There is no "first" value unless a column specifies the ordering. If you have such a column, then you can use row_number():
select t.*
from (select t.*,
row_number() over (partition by category order by <ordering col>) as seqnum
from t
) t
where seqnum = 1;
If you don't have such a column, then you simply cannot ask such a question in a relational database. The data doesn't support the question.
If I understand it correctly, try this -
select category, details
from (
select *, row_number() over (partition by category order by details) as rn
from tablename
) t
where rn = 1
I am trying to figure out how to use partition by properly, and I'm looking for a brief explanation of the following results. (I apologize for including the test data without proper SQL code.)
Example 1: Counts the IDs (e.g. shareholders) for each company and adds it to the original data frame (as "newvar").
select ID, company,
count(ID) over(partition by company) as newvar
from testdata;
Example 2: When I now add order by shares, count() somehow seems to turn into rank(), so that the output is merely a ranking variable.
select ID, company,
count(ID) over(partition by company order by shares) as newvar
from testdata;
I thought order by just orders the data, but it seems to have an impact on "newvar".
Is there a simple explanation to this?
Many thanks in advance!
.csv file that contains testdata:
ID;company;shares
1;a;10
2;a;20
3;a;70
1;b;50
4;b;10
5;b;10
6;b;30
2;c;80
3;c;10
7;c;10
1;d;20
2;d;30
3;d;25
6;d;10
7;d;15
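A sketch of this test data loaded into a table named testdata, matching the queries above (the column types are assumptions):
CREATE TABLE testdata (ID int, company varchar(10), shares int);
INSERT INTO testdata (ID, company, shares) VALUES
(1, 'a', 10), (2, 'a', 20), (3, 'a', 70),
(1, 'b', 50), (4, 'b', 10), (5, 'b', 10), (6, 'b', 30),
(2, 'c', 80), (3, 'c', 10), (7, 'c', 10),
(1, 'd', 20), (2, 'd', 30), (3, 'd', 25), (6, 'd', 10), (7, 'd', 15);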
count() with an order by does a cumulative count. It is going to turn the value into either rank() or row_number(), depending on ties in the shares value and how the database handles a missing window frame (rows between versus range between).
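To see the difference, here is a small sketch (assuming the test data above is in a table named testdata) that compares the default RANGE frame with an explicit ROWS frame:
select ID, company, shares,
count(ID) over (partition by company order by shares) as range_count,
count(ID) over (partition by company order by shares
rows between unbounded preceding and current row) as rows_count
from testdata
order by company, shares;
For company b, for instance, the two rows with shares = 10 both get range_count = 2 (ties are peers under the default RANGE frame), while rows_count gives them 1 and 2.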
If you want to just order the data, then the order by should be after the from clause:
select ID, company,
count(ID) over(partition by company) as newvar
from testdata
order by shares;
I need the second column of the table retrieved from a query to contain a running count of the rows, so row one would have a 1, row two would have a 2, and so on. I am not very proficient with SQL, so I am sorry if this is a simple task.
A basic example of what I am doing would be is:
SELECT [Name], [I_NEED_ROW_COUNT_HERE],[Age],[Gender]
FROM [customer]
The row count must be the second column and will act as an ID for each row. It must be the second column, as the text file this generates will be sent to the state and they require a specific format.
Thanks for any help.
With your edit, I see that you want a row ID (normally called a row number rather than a "count"), which is best gathered from a unique ID in the database (person_id or some other unique field). If that isn't possible, you can make one for this report with ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID DESC) AS ID in your select statement.
select Name, ROW_NUMBER() OVER (ORDER BY Name DESC) AS ID,
Age, Gender
from customer
This function adds a field called ID to the output (see my tip at the bottom about aliases). Since this isn't in the database, it needs a method to determine how it will increment: after the over keyword it orders by Name in descending order.
Information on Counting follows (won't be unique by row):
If each customer has multiple entries, but the selected fields are the same for that user, and you are counting that user's records (summed into one result record per user), then you would write:
select Name, count(*), Age, Gender
from customer
group by name, age, gender
This will count (see MSDN) all the user's records as grouped by the name, age and gender (if they match, it's a single record).
However, if you are counting all records so that your whole report has the grand total on every line, then you want:
select Name, (select count(*) from customer) as "count", Age, Gender
from customer
TIP: If you're using something like SSMS to write a query, dragging in columns will put brackets around the columns. This is only necessary if you have spaces in column names, but a DBA will tend to avoid that like the plague. Also, if you need a column header to be something specific, you can use the as keyword like in my first example.
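A tiny illustration of both points, using a hypothetical column [First Name] (not in the question's customer table) that genuinely needs the brackets, plus an alias to control the output header:
SELECT [First Name] AS FirstName,   -- brackets required because of the space; the alias renames the output column
       Age, Gender
FROM customer;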
W3Schools has a good tutorial on count()
The COUNT(column_name) function returns
the number of values (NULL values will not be counted) of the
specified column:
SELECT COUNT(column_name) FROM table_name;
The COUNT(*) function returns the number of records in a table:
SELECT COUNT(*) FROM table_name;
The COUNT(DISTINCT column_name) function returns the number of
distinct values of the specified column:
SELECT COUNT(DISTINCT column_name) FROM table_name;
COUNT(DISTINCT) works with ORACLE and Microsoft SQL Server, but
not with Microsoft Access.
It's odd to repeat the same number in every row but it sounds like this is what you're asking for. And note that this might not work in your flavor of SQL. MS Access?
SELECT [Name], (select count(*) from [customer]), [Age], [Gender]
FROM [customer]
Given a table USER (name, city, age), what's the best way to get the user details of the oldest user per city?
I have seen the following example SQL used in Oracle, which I think works:
select name, city, age
from USER, (select city as maxCity, max(age) as maxAge
from USER
group by city)
where city=maxCity and age=maxAge
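The same idea written with an explicit join and table aliases (a sketch; it should behave identically to the version above):
select u.name, u.city, u.age
from USER u
join (
select city, max(age) as maxAge
from USER
group by city
) m on m.city = u.city and u.age = m.maxAge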
So in essence: use a nested query to select the grouping key and aggregate for it, then use it as another table in the main query and join with the grouping key and the aggregate value for each key.
Is this the standard SQL way of doing it? Is it any quicker than using a temporary table, or is it in fact using a temporary table internally anyway?
What you are using will work, although it displays all users who share the max age.
You can do this in a slightly more readable way using the row_number() ranking function:
select name, city, age
from (
select
name
, city
, age
, row_number() over (partition by city order by age desc) as rn
from USER
) sub
where rn = 1
This will also select at most one user per city.
Most database systems will use a temporary table to store the inner query. So I don't think a temporary table would speed it up. But database performance is notoriously hard to predict from a distance :)
I have 3 columns of data in SQL Server 2005:
LASTNAME
FIRSTNAME
CITY
I want to randomly re-order these 3 columns (and munge the data) so that the data is no longer meaningful. Is there an easy way to do this? I don't want to change any data, I just want to re-order the index randomly.
When you say "re-order" these columns, do you mean that you want some of the last names to end up in the first name column? Or do you mean that you want some of the last names to get associated with a different first name and city?
I suspect you mean the latter, in which case you might find a programmatic solution easier (as opposed to a straight SQL solution). Sticking with SQL, you can do something like:
UPDATE the_table
SET lastname = (SELECT lastname FROM the_table ORDER BY RAND())
Depending on what DBMS you're using, this may work for only one line, may make all the last names the same, or may require some variation of syntax to work at all, but the basic approach is about right. Certainly some trials on a copy of the table are warranted before trying it on the real thing.
Of course, to get the first names and cities to also be randomly reordered, you could apply a similar query to either of those columns. (Applying it to all three doesn't make much sense, but wouldn't hurt either.)
Since you don't want to change your original data, you could do this in a temporary table populated with all rows.
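For example, a rough sketch of copying the data first (SQL Server syntax; the table name the_table comes from the query above):
SELECT LASTNAME, FIRSTNAME, CITY
INTO #scrambled
FROM the_table;
-- then run the shuffling UPDATEs against #scrambled instead of the original table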
Finally, if you just need a single random value from each column, you could do it in place without making a copy of the data, with three separate queries: one to pick a random first name, one a random last name, and the last a random city.
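A sketch of those three single-value queries, assuming SQL Server's TOP and NEWID() (the_table is again the illustrative table name):
SELECT TOP 1 FIRSTNAME FROM the_table ORDER BY NEWID();
SELECT TOP 1 LASTNAME FROM the_table ORDER BY NEWID();
SELECT TOP 1 CITY FROM the_table ORDER BY NEWID();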
I suggest using NEWID() with CHECKSUM() for randomization:
SELECT LASTNAME, FIRSTNAME, CITY FROM table ORDER BY CHECKSUM(NEWID())
In SQL Server 2005+ you could prepare a ranked rowset containing the three target columns and three additional computed columns filled with random rankings (one for each of the three target columns). Then the ranked rowset would be joined with itself three times using the ranking columns, and finally each of the three target columns would be pulled from their own instance of the ranked rowset. Here's an illustration:
WITH sampledata (FirstName, LastName, CityName) AS (
SELECT 'John', 'Doe', 'Chicago' UNION ALL
SELECT 'James', 'Foe', 'Austin' UNION ALL
SELECT 'Django', 'Fan', 'Portland'
),
ranked AS (
SELECT
*,
FirstNameRank = ROW_NUMBER() OVER (ORDER BY NEWID()),
LastNameRank = ROW_NUMBER() OVER (ORDER BY NEWID()),
CityNameRank = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM sampledata
)
SELECT
fnr.FirstName,
lnr.LastName,
cnr.CityName
FROM ranked fnr
INNER JOIN ranked lnr ON fnr.FirstNameRank = lnr.LastNameRank
INNER JOIN ranked cnr ON fnr.FirstNameRank = cnr.CityNameRank
This is the result:
FirstName LastName CityName
--------- -------- --------
James Fan Chicago
John Doe Portland
Django Foe Austin
select *, rand() from table order by rand();
I understand some versions of SQL have a rand() that doesn't change for each line. Check for yours. Works on MySQL.