MySQL - How do I order the results randomly inside a column?

I need to retrieve rows from a table (e.g. 'orders') ordered randomly by a column (let's say 'user'). That is, I need all orders from the same user to stay together (one after the other), with the users themselves in random order.

I'm going to assume you have a second table called "users" that has all the users in it. If not, you could still do this by adding another SELECT DISTINCT subquery on orders, but that would be much messier:
SELECT orders.*
FROM orders
INNER JOIN (SELECT userid, RAND() as random FROM users) tmp
ON orders.userid = tmp.userid
ORDER BY tmp.random, tmp.userid
You'll want to order by the random number AND the user id so if two user ids get the same random number their orders won't be all jumbled together.
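Here is a minimal sketch of that idea that can be run locally. It is an illustration under assumptions, not the exact MySQL query: it uses Python's sqlite3 module as a stand-in, so SQLite's random() replaces RAND(), the table contents are made up, and the per-user random keys are stored in a temporary table (named tmp here) so that each user is guaranteed to keep a single value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (userid INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, userid INTEGER);
    INSERT INTO users  VALUES (1), (2), (3), (4);
    INSERT INTO orders (userid) VALUES (1), (2), (1), (3), (2), (4), (3), (1);
    -- One random key per user, materialised so every user keeps one value.
    CREATE TEMP TABLE tmp AS SELECT userid, random() AS rnd FROM users;
""")

# Sort by the per-user random key, then by userid to break any ties.
rows = conn.execute("""
    SELECT orders.id, orders.userid
    FROM orders
    INNER JOIN tmp ON orders.userid = tmp.userid
    ORDER BY tmp.rnd, tmp.userid
""").fetchall()

user_sequence = [userid for _, userid in rows]
print(user_sequence)  # users appear in a random order, each user's orders adjacent
```

The user order changes from run to run, but the orders of any one user always come out next to each other.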

How random does it have to be? I can think of a few possible answers.
If the "random" sequence should be repeatable, you can sort by a hash of the user ID, using MD5 or a custom hash you create yourself, e.g. ORDER BY MD5(user), secondary_sort_column.
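A rough sketch of the repeatable variant, again using Python's sqlite3 as a stand-in for MySQL. SQLite has no built-in MD5(), so the sketch registers a small md5 function of its own (an assumption of this example; in MySQL the built-in MD5() would be used directly). The table and column names are made up.

```python
import hashlib
import sqlite3


def md5_hex(value):
    """Return the hex MD5 digest of the value's string form."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("md5", 1, md5_hex)  # stand-in for MySQL's MD5()
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, userid INTEGER);
    INSERT INTO orders (userid) VALUES (1), (2), (1), (3), (2), (3), (1);
""")


def hashed_order():
    # Same hash for the same user, so a user's orders stay together;
    # id is the secondary sort column inside each user.
    return conn.execute(
        "SELECT id, userid FROM orders ORDER BY md5(userid), id"
    ).fetchall()


first_run = hashed_order()
second_run = hashed_order()
print(first_run == second_run)  # the shuffled-looking order is stable
```

The result looks shuffled with respect to the user IDs, but it is the same on every run because the hash of a given ID never changes.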

order by reverse(user) ?

Related

Best approach to occurrences of ids on a table and all elements in another table

Well, the query I need is simple, and maybe it is covered in another question, but there is a performance concern in what I need, so:
I have a users table with 10,000 rows; the table contains id, email and more data.
In another table called orders I have many more rows, maybe 150,000.
Each order has the id of the user who made it, and also a status. The status can be a number from 0 to 9 (or null).
My final requirement is to get every user with the id, email, some other column, and the number of orders with status 3 or 7. It doesn't matter whether it's 3 or 7; I just need the count.
But I need to do this query in a low-impact (performant) way.
What is the best approach?
I need to run this in Redash with Postgres 10.
This sounds like a join and group by:
select u.*, count(*)
from users u join
orders o
on o.user_id = u.user_id
where o.status in (3, 7)
group by u.user_id;
Postgres is usually pretty good about optimizing these queries -- and the above assumes that users(user_id) is the primary key -- so this should work pretty well.
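One thing worth checking: with an inner join and the filter in WHERE, users who have no orders with status 3 or 7 are not returned at all. If "every user" is meant to include those users with a count of 0, a LEFT JOIN with the status filter moved into the ON clause is one way to get that. The sketch below runs against SQLite through Python's sqlite3 rather than Postgres 10, and the column names (user_id, order_id, email) and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         user_id INTEGER, status INTEGER);
    INSERT INTO users VALUES (1, 'a@example.com'),
                             (2, 'b@example.com'),
                             (3, 'c@example.com');
    INSERT INTO orders (user_id, status) VALUES
        (1, 3), (1, 7), (1, 5), (2, 3), (2, NULL), (3, 0);
""")

# COUNT(o.order_id) ignores the NULLs produced for users with no match,
# so those users get a count of 0 instead of disappearing.
counts = conn.execute("""
    SELECT u.user_id, u.email, COUNT(o.order_id) AS matching_orders
    FROM users u
    LEFT JOIN orders o
           ON o.user_id = u.user_id
          AND o.status IN (3, 7)
    GROUP BY u.user_id, u.email
    ORDER BY u.user_id
""").fetchall()

print(counts)
```

User 3 has only a status-0 order and still shows up, with a count of 0.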

SQL JOIN to select MAX value among multiple user attempts returns two values when both attempts have the same value

Good morning, everyone!
I have a pretty simple SELECT/JOIN statement that gets some imported data from a placement test and returns the highest-scored attempt a user made, the best score. Users can take this test multiple times, so we just use the best attempt. What if a user makes multiple attempts (say, takes it twice) and receives the SAME score both times?
My current query ends up returning BOTH of those records: they're both equal to the MAX(), so the join matches both. There are no primary keys set up on this yet. The query below is the one I hope to use in an INSERT statement for another table, once I get only a SINGLE best attempt per user (StudentID), and set that StudentID as the key. So you see my problem...
I've tried a few DISTINCT or TOP statements in my query but either I'm putting them into the wrong part of the query or they still return two records for a user who had identically scored attempts. Any suggestions?
SELECT p.*
FROM
(SELECT
StudentID, MAX(PlacementResults) AS PlacementResults
FROM AleksMathResults
GROUP BY StudentID)
AS mx
JOIN AleksMathResults p ON mx.StudentID = p.StudentID AND mx.PlacementResults = p.PlacementResults
ORDER BY
StudentID
Sounds like you want row_number():
SELECT amr.*
FROM (SELECT amr.*,
ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY PlacementResults DESC) as seqnum
FROM AleksMathResults amr
) amr
WHERE seqnum = 1;
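To see the tie handling in action, here is a small sketch run through Python's sqlite3 (this assumes a SQLite build of 3.25 or newer, which is when window functions arrived; the original question appears to target SQL Server). The AttemptNo column is hypothetical: it is added only so that ties are broken the same way every time, since ROW_NUMBER() otherwise picks one of the tied rows arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AleksMathResults (
        StudentID INTEGER, AttemptNo INTEGER, PlacementResults INTEGER);
    INSERT INTO AleksMathResults VALUES
        (1, 1, 70), (1, 2, 85),   -- student 1: second attempt is best
        (2, 1, 90), (2, 2, 90),   -- student 2: two identical best scores
        (3, 1, 60);
""")

# seqnum = 1 is the best attempt per student; AttemptNo breaks ties.
best = conn.execute("""
    SELECT StudentID, AttemptNo, PlacementResults
    FROM (SELECT amr.*,
                 ROW_NUMBER() OVER (PARTITION BY StudentID
                                    ORDER BY PlacementResults DESC, AttemptNo
                                   ) AS seqnum
          FROM AleksMathResults amr) ranked
    WHERE seqnum = 1
    ORDER BY StudentID
""").fetchall()

print(best)
```

Student 2 has two attempts with 90 but appears only once in the result, which is what makes StudentID usable as a key afterwards.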

Randomly choose which group values to include in GROUP BY

I have a fairly large table, and I'd like to do a group by over the entire table across users, but only return data for 10% of the users, sampled randomly. I know how to sample rows uniformly, but is there a 1-step way to randomly decide which users to include in a group by on the user field?
Maybe you could order the results of the select randomly, like this (TOP and NEWID() are SQL Server syntax):
SELECT TOP x *
FROM table_name
ORDER BY newid()
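Note that ordering the rows randomly samples rows, not users. One possible way to sample at the user level is to pick the random users first and only then group. The sketch below is an assumption-laden illustration using Python's sqlite3: the events table, its columns and the sampled_users temp table are all made up, and the sampling is done in a separate step for clarity rather than in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(user, amount) for user in range(1, 21) for amount in (10, 20, 30)],
)

# Decide how many users make up 10% (at least one).
total_users = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM events"
).fetchone()[0]
sample_size = max(1, total_users // 10)

# Step 1: pick the random users once and keep them in a temp table.
conn.execute(
    """
    CREATE TEMP TABLE sampled_users AS
    SELECT user_id
    FROM (SELECT DISTINCT user_id FROM events)
    ORDER BY random()
    LIMIT ?
    """,
    (sample_size,),
)

# Step 2: group only the rows that belong to the sampled users.
sampled = conn.execute("""
    SELECT e.user_id, COUNT(*) AS n, SUM(e.amount) AS total
    FROM events e
    JOIN sampled_users s ON s.user_id = e.user_id
    GROUP BY e.user_id
""").fetchall()

print(sampled)
```

Every sampled user comes back with all of their rows aggregated, so the per-user figures are the same as they would be in a full GROUP BY.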

How can I order by a specific order?

It would be something like:
SELECT * FROM users ORDER BY id ORDER("abc","ghk","pqr"...);
In my order clause there might be 1000 records and all are dynamic.
A quick google search gave me below result:
SELECT * FROM users ORDER BY case id
when "abc" then 1
when "ghk" then 2
when "pqr" then 3 end;
As I said all my order clause values are dynamic. So is there any suggestion for me?
Your example isn't entirely clear, as it appears that a simple ORDER BY would suffice to order your ids alphabetically. However, it appears you are trying to create a dynamic ordering scheme that may not be alphabetical. In that case, my recommendation would be to use a lookup table for the values that you will be ordering by. This serves two purposes: first, it allows you to easily reorder the items without altering each entry in the users table, and second, it avoids (or at least reduces) problems with typos and other issues that can occur with "magic strings."
This would look something like:
Lookup Table:
CREATE TABLE LookupValues (
Id CHAR(3) PRIMARY KEY,
SortOrder INT -- ORDER is a reserved word, so the column gets a different name
);
Query:
SELECT
u.*
FROM
users u
INNER JOIN
LookupValues l
ON
u.Id = l.Id
ORDER BY
l.SortOrder
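Since the ordering values are dynamic, the lookup table can be filled from whatever list the application has at hand. A minimal sketch using Python's sqlite3, where the desired_order list, the SortOrder column name and the sample users are all assumptions made for illustration:

```python
import sqlite3

# The dynamic ordering, e.g. built by the application at runtime.
desired_order = ["pqr", "abc", "ghk"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (Id TEXT PRIMARY KEY, name TEXT);
    INSERT INTO users VALUES ('abc', 'Ann'), ('ghk', 'Gus'), ('pqr', 'Pat');
    CREATE TABLE LookupValues (Id TEXT PRIMARY KEY, SortOrder INTEGER);
""")

# Store each id together with its position in the desired ordering.
conn.executemany(
    "INSERT INTO LookupValues VALUES (?, ?)",
    [(value, position) for position, value in enumerate(desired_order, start=1)],
)

ordered_ids = [
    row[0]
    for row in conn.execute("""
        SELECT u.Id
        FROM users u
        INNER JOIN LookupValues l ON u.Id = l.Id
        ORDER BY l.SortOrder
    """)
]

print(ordered_ids)  # ['pqr', 'abc', 'ghk']
```

Changing the order later only means rewriting the rows of the lookup table; the query itself stays the same.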

Does a GROUP BY on a UNIQUE key calculate all the groups before applying the LIMIT clause?

If I GROUP BY on a unique key, and apply a LIMIT clause to the query, will all the groups be calculated before the limit is applied?
If I have a hundred records in the table (each with a unique key), will I have 100 records in the temporary table created (for the GROUP BY) before the LIMIT is applied?
A case study why I need this:
Take Stack Overflow for example.
Each query you run to show a list of questions, also shows the user who asked this question, and the number of badges he has.
So, while each question has exactly one user, a user has many badges.
The only way to do it in one query (and not one on questions and another one on users and then combine results), is to group the query by the primary key (question_id) and join+group_concat to the user_badges table.
The same goes for the questions TAGS.
Code example:
Table questions:
question_id (int)(pk) | question_body (varchar)
Table tag_question:
question_id (int) | tag_id (int)
SELECT:
SELECT questions.question_id,
questions.question_body,
GROUP_CONCAT(tag_id,' ') AS 'tags-ids'
FROM
questions
JOIN
tag_question
ON
questions.question_id=tag_question.question_id
GROUP BY
questions.question_id
LIMIT 15
Yes. Logically, the clauses of the query are evaluated in this order:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT
LIMIT is the last thing applied, so your grouping will be just fine.
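A quick way to convince yourself is to run a grouped query with a LIMIT and check that the groups that come back are complete. The sketch below uses Python's sqlite3 as a stand-in for MySQL, with made-up rows in the questions and tag_question tables from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (question_id INTEGER PRIMARY KEY, question_body TEXT);
    CREATE TABLE tag_question (question_id INTEGER, tag_id INTEGER);
    INSERT INTO questions VALUES (1, 'q1'), (2, 'q2'), (3, 'q3');
    INSERT INTO tag_question VALUES
        (1, 10), (1, 11), (1, 12), (2, 10), (3, 11), (3, 12);
""")

# LIMIT 2 cuts the number of groups, not the rows inside a group:
# question 1 still reports all three of its tags.
limited = conn.execute("""
    SELECT q.question_id, COUNT(tq.tag_id) AS tag_count
    FROM questions q
    JOIN tag_question tq ON q.question_id = tq.question_id
    GROUP BY q.question_id
    ORDER BY q.question_id
    LIMIT 2
""").fetchall()

print(limited)  # [(1, 3), (2, 1)]
```

The joined rows are grouped first, and only then is the list of groups trimmed to two.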
Now, looking at your rephrased question: you don't have just one row per group, but many. In the case of Stack Overflow, you'll have just one user per row, but many badges, i.e.
(uid, badge_id, etc.)
(1, 2, ...)
(1, 3, ...)
(1, 12, ...)
all those would be grouped together.
To avoid a full table scan, all you need are indexes. Besides that, if you need to SUM over every row, for example, you cannot avoid a full scan.
EDIT:
You'll need something like this (look at the WHERE clause):
SELECT
q1.question_id,
q1.question_body,
GROUP_CONCAT(tq.tag_id,' ') AS 'tags_ids'
FROM
questions q1
JOIN tag_question tq
ON q1.question_id = tq.question_id
WHERE
q1.question_id IN (
SELECT
tq2.question_id
FROM
tag_question tq2
JOIN tag t
ON tq2.tag_id = t.tag_id
WHERE
t.name = 'the-misterious-tag'
)
GROUP BY
q1.question_id
LIMIT 15
LIMIT does get applied after GROUP BY.
Whether the temporary table is created depends on how your indexes are built.
If you have an index on the grouping field and don't order by the aggregate results, then an INDEX SCAN FOR GROUP BY is applied, and each aggregate is counted on the fly.
That means that if you don't select an aggregate due to the LIMIT, it won't ever be calculated.
But if you order by an aggregate, then, of course, all of them need to be calculated before they can be sorted.
That's why they are calculated first and then the filesort is applied.
Update:
As for your query, see what EXPLAIN EXTENDED says for it.
Most probably, question_id is a PRIMARY KEY for your table, and most probably, it will be used in a scan.
That means no filesort will be applied and the join itself will never be evaluated past the 15th row.
To make sure, rewrite your query as follows:
SELECT question_id,
question_body,
(
SELECT GROUP_CONCAT(tag_id, ' ')
FROM tag_question t
WHERE t.question_id = q.question_id
)
FROM questions q
ORDER BY
question_id
LIMIT 15
First, it is more readable,
Second, it is more efficient, and
Third, it will return even untagged questions (which your current query doesn't).
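The third point is easy to check with a small sketch. This one uses Python's sqlite3 as a stand-in for MySQL, with made-up rows. One difference to keep in mind: in SQLite the second argument of group_concat is the separator, while MySQL's GROUP_CONCAT(tag_id, ' ') concatenates its arguments and takes the separator through the SEPARATOR keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (question_id INTEGER PRIMARY KEY, question_body TEXT);
    CREATE TABLE tag_question (question_id INTEGER, tag_id INTEGER);
    INSERT INTO questions VALUES
        (1, 'tagged twice'), (2, 'untagged'), (3, 'tagged once');
    INSERT INTO tag_question VALUES (1, 10), (1, 11), (3, 12);
""")

# Correlated subquery: one row per question, tagged or not.
with_subquery = conn.execute("""
    SELECT question_id,
           question_body,
           (SELECT group_concat(tag_id, ' ')
            FROM tag_question t
            WHERE t.question_id = q.question_id) AS tag_ids
    FROM questions q
    ORDER BY question_id
    LIMIT 15
""").fetchall()

# Inner join plus GROUP BY: the untagged question has no join partner.
with_join = conn.execute("""
    SELECT q.question_id
    FROM questions q
    JOIN tag_question t ON t.question_id = q.question_id
    GROUP BY q.question_id
    ORDER BY q.question_id
    LIMIT 15
""").fetchall()

print(with_subquery)
print(with_join)
```

The subquery version returns all three questions, with NULL (None in Python) as the tag list of the untagged one, while the join version returns only questions 1 and 3.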
If the field you're grouping on is indexed, it shouldn't do a full table scan.