Given the table snippet:
id | name | age
I am trying to form a query that will return 10 people within a certain age range. However, if there are not enough people in that range, I want to extend the range until I can find 10 people.
For instance, if I only find 5 people in a range of 30-40, I would find 5 others in a 25-45 range.
In addition, I would like the query to be able to use ORDER BY RAND() or similar, so that I get different results each time.
Is this going beyond what MySQL can handle? Will I have to put some of this logic in the application instead?
UPDATED for performance:
My original solution worked but required a table scan. Am's solution is a good one and doesn't require a table scan, but its hard-coded ranges won't work when the only matches are far outliers. It also requires de-duping records. Combining both solutions gets you the best of both worlds, provided you have an index on age. (If you don't have an index on age, every solution will require a table scan.)
The combined solution first picks only the rows which might qualify (the desired range, plus the 10 nearest rows over and the 10 nearest rows under that range), and then uses my original logic to rank the results. Caveat: I don't have enough sample data to verify that MySQL's optimizer is smart enough to avoid a table scan here. It might not be able to weave those three UNIONed subqueries together without a scan.
[just updated again to fix 2 embarrassing SQL typos: DESC where DESC shouldn't have been!]
SELECT * FROM
(
SELECT id, name, age,
CASE WHEN age BETWEEN 25 and 35 THEN RAND() ELSE ABS (age-30) END as distance
FROM
(
SELECT * FROM (SELECT * FROM Person WHERE age > 35 ORDER BY age LIMIT 10) u1
UNION
SELECT * FROM (SELECT * FROM Person WHERE age < 25 ORDER BY age DESC LIMIT 10) u2
UNION
SELECT * FROM (SELECT * FROM Person WHERE age BETWEEN 25 and 35) u3
) p2
ORDER BY distance
LIMIT 10
) p ORDER BY RAND() ;
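A minimal sketch of the combined query, run against SQLite from Python rather than MySQL, so it checks the logic only and says nothing about MySQL's query plan. The helper name pick_people, the sample ages, and the registered RAND function are assumptions made for illustration (SQLite has no built-in RAND()). Note that the rows above the range are taken in ascending age order and the rows below it in descending order, so both sides contribute their nearest outliers.

```python
import random
import sqlite3


def pick_people(conn, lo, hi, n):
    """Return up to n rows, preferring ages in [lo, hi], then the nearest outliers."""
    sql = """
    SELECT id, name, age FROM (
        SELECT id, name, age,
               CASE WHEN age BETWEEN :lo AND :hi THEN RAND()
                    ELSE ABS(age - :center) END AS distance
        FROM (
            SELECT * FROM (SELECT * FROM Person WHERE age > :hi ORDER BY age LIMIT :n)
            UNION
            SELECT * FROM (SELECT * FROM Person WHERE age < :lo ORDER BY age DESC LIMIT :n)
            UNION
            SELECT * FROM (SELECT * FROM Person WHERE age BETWEEN :lo AND :hi)
        ) AS p2
        ORDER BY distance
        LIMIT :n
    ) AS p
    ORDER BY RAND()
    """
    params = {"lo": lo, "hi": hi, "center": (lo + hi) / 2, "n": n}
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
# Assumption: emulate MySQL's RAND() with a Python function returning [0, 1).
conn.create_function("RAND", 0, random.random)
conn.execute(
    "CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_person_age ON Person (age)")
# Hypothetical sample data: only two people fall inside the 25-35 range.
ages = [10, 22, 24, 26, 30, 36, 38, 70]
conn.executemany(
    "INSERT INTO Person (name, age) VALUES (?, ?)", [(f"p{a}", a) for a in ages]
)

rows = pick_people(conn, 25, 35, 4)
picked_ages = sorted(row[2] for row in rows)
print(picked_ages)
```

With this data, the two in-range people are always chosen and the remaining two slots go to ages 24 and 36, the closest rows on either side.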
Original Solution:
I'd approach it this way:
1. First, compute how far each record is from the center of the desired age range, and order the results by that distance. For all results inside the range, treat the distance as a random number between zero and one. This ensures that records inside the range will be selected in a random order, while records outside the range, if needed, will be selected in order closest to the desired range.
2. Trim the number of records in that distance-ordered resultset to 10 records.
3. Randomize the order of the resulting records.
Like this:
CREATE TABLE Person (id int AUTO_INCREMENT PRIMARY KEY, name varchar(50) NOT NULL, age int NOT NULL);
INSERT INTO Person (name, age) VALUES ('Joe Smith', 26);
INSERT INTO Person (name, age) VALUES ('Frank Johnson', 32);
INSERT INTO Person (name, age) VALUES ('Sue Jones', 24);
INSERT INTO Person (name, age) VALUES ('Ella Frederick', 44);
SELECT * FROM
(
SELECT id, name, age,
CASE WHEN age BETWEEN 25 and 35 THEN RAND() ELSE ABS (age-30) END as distance
FROM Person
ORDER BY distance
LIMIT 10
) p ORDER BY RAND() ;
Note that I'm assuming that, if there are not enough records inside the range, the records you want to append are the ones closest to that range. If this assumption is incorrect, please add more details to the question.
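To see why the distance trick puts in-range rows first, here is a small pure-Python sketch of the same three steps, using the four sample people from the INSERT statements. The function name distance and the limit of 3 are assumptions chosen for illustration. It relies on ages being integers: any out-of-range age is then at least 1 away from the center, while in-range rows get a random value below 1.

```python
import random


def distance(age, lo=25, hi=35):
    """Random value in [0, 1) inside the range, distance from the center outside it."""
    if lo <= age <= hi:
        return random.random()
    return abs(age - (lo + hi) / 2)


people = [
    ("Joe Smith", 26),
    ("Frank Johnson", 32),
    ("Sue Jones", 24),
    ("Ella Frederick", 44),
]

# Step 1: order by distance (in-range rows first, in random order).
ranked = sorted(people, key=lambda person: distance(person[1]))
# Step 2: trim to the wanted number of rows (3 here instead of 10).
chosen = ranked[:3]
# Step 3: randomize the order of the rows that were kept.
random.shuffle(chosen)

chosen_names = {name for name, _ in chosen}
print(chosen_names)
```

Sue Jones (24) is picked ahead of Ella Frederick (44) because she is closer to the range, even though both are outside it.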
Re: performance, this requires a scan through the table, so it won't be fast. I'm working on a scan-less solution now...
I would do something like this:
select * from (
SELECT * FROM (select * from ppl_table where age>30 and age<40 order by rand() limit 10) as Momo1
union
SELECT * FROM (select * from ppl_table where age>25 and age<45 order by rand() limit 20) as Momo2
) as FinalMomo
limit 10
Basically, this selects 10 users from the first group and then more from the second group.
If the first group doesn't add up to 10, there will be more from the second group.
The reason we are selecting 20 from the second group is that UNION will remove the duplicated rows, and you want to have at least 10 users in the final result.
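The UNION version depends on the narrow-range rows coming out first, which SQL does not guarantee without an ORDER BY. A variant that makes the priority explicit (a different technique from the UNION, not the answer's own query) is to select the wide range once and sort by a tier. The sketch below runs on SQLite through Python, so RANDOM() stands in for MySQL's RAND(), and the sample ages are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ppl_table (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
# Hypothetical data: 5 people in the narrow range, 8 more in the wide range, 2 outside.
ages = [31, 32, 33, 34, 35, 26, 27, 28, 29, 41, 42, 43, 44, 10, 60]
conn.executemany(
    "INSERT INTO ppl_table (name, age) VALUES (?, ?)", [(f"p{a}", a) for a in ages]
)

rows = conn.execute(
    """
    SELECT id, name, age
    FROM ppl_table
    WHERE age > 25 AND age < 45
    ORDER BY CASE WHEN age > 30 AND age < 40 THEN 0 ELSE 1 END,  -- narrow range first
             RANDOM()                                            -- random within each tier
    LIMIT 10
    """
).fetchall()

result_ages = [row[2] for row in rows]
print(result_ages)
```

Because each row appears only once, no de-duplication is needed and the inner limit of 20 goes away.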
Edit
I added the AS aliases to the inner SELECTs and wrapped each of them in its own subquery, since MySQL doesn't like ORDER BY combined with UNION.
Related
I am exploring SQL with the W3Schools page, and I have a requirement where I need to limit the query to a certain number of rows while also including a default row within that limit.
Here I want a default row where the customer name is Alfreds, and then the remaining 29 rows to complete the query, regardless of their names.
I looked at other SO questions, but they are too complicated to understand and use different syntax.
What you are looking for is a specific order clause.
Try this
SELECT * FROM Customers order by (case when CustomerName in ('Alfreds Futterkiste') then 0 else CustomerId end) limit 30 ;
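A quick way to check the ordering trick is to run it on a small table. The sketch below uses SQLite from Python with a made-up Customers table of 40 rows, and it deliberately gives the default customer id 17 (an assumption for illustration) to show that the row still sorts first. The trick relies on every CustomerId being greater than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL)"
)
# Hypothetical data: 40 customers, with the default one placed at id 17.
customers = [
    (i, "Alfreds Futterkiste" if i == 17 else f"Customer {i}") for i in range(1, 41)
]
conn.executemany("INSERT INTO Customers VALUES (?, ?)", customers)

rows = conn.execute(
    """
    SELECT * FROM Customers
    ORDER BY (CASE WHEN CustomerName IN ('Alfreds Futterkiste') THEN 0 ELSE CustomerId END)
    LIMIT 30
    """
).fetchall()

print(rows[0])    # the default row sorts first because its key is 0
print(len(rows))  # the default row plus 29 others
```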
If you're going to have a default row in SQL you should really have that row in the table with a known primary key, and then UNION it onto your query:
-- default row, which is always included as long as the table has a row with PK 1
SELECT *
FROM Customers
WHERE CustomerId = 1
UNION ALL
-- other rows, a variable number of them
SELECT *
FROM Customers
WHERE CustomerId <> 1 AND ...
LIMIT 30
A LIMIT placed this way applies to the result of the whole UNION.
If you ever want to union together sets that are each limited separately, you might want to look at a form like this:
(... LIMIT 2)
UNION ALL
(... LIMIT 28)
Use UNION to combine the two queries.
(SELECT *
FROM Customers
WHERE CustomerName != 'Alfreds Futterkiste'
LIMIT 29)
UNION
SELECT *
FROM Customers
WHERE CustomerName = 'Alfreds Futterkiste'
I have a table with 4 columns
USER_ID: numeric
EVENT_DATE: date
VERSION: date
SCORE: decimal
I have a clustered index on (USER_ID, EVENT_DATE, VERSION). These three values together are unique.
I need to get the maximum EVENT_DATE for a set of USER_IDs (~1000 different ids) where the SCORE is larger than a specific value, considering only those entries with a specific VERSION.
SELECT M.*
FROM (VALUES
( 5237 ),
………1000 more
( 27054 ) ) C (USER_ID)
CROSS APPLY
(SELECT TOP 1 C.USER_ID, M.EVENT_DATE, M.SCORE
FROM MY_HUGE_TABLE M
WHERE C.USER_ID = M.USER_ID
AND M.VERSION = 'xxxx-xx-xx'
AND M.SCORE > 2 --Comment M.SCORE > 2
ORDER BY M.EVENT_DATE DESC) M
When I execute the query, the runtime is poor, I suppose because of a missing index on the SCORE column.
If I remove the filter on “M.SCORE > 2”, I get my results ten times faster; however, the latest scores may be less than 2, so I need the filter.
Could anyone please give me a hint on how to set up an index that would improve the performance of this query?
Thank you very much in advance
For your query, the optimal index would be on (User_ID, Version, Event_Date desc, Score).
Unfortunately, your clustered index doesn't match. Only the first and third columns match, but they need to match in order. So, only the User_ID can help but that probably doesn't do much to filter the data.
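As a rough illustration of that index, the sketch below creates it on a small SQLite table from Python and runs the per-user "latest qualifying row" lookup. SQLite is only a stand-in here: the index name, the sample rows, and the helper latest_events are assumptions, and nothing in this sketch measures what the index does to the SQL Server plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE MY_HUGE_TABLE (
        USER_ID    INTEGER NOT NULL,
        EVENT_DATE TEXT    NOT NULL,
        VERSION    TEXT    NOT NULL,
        SCORE      REAL    NOT NULL,
        PRIMARY KEY (USER_ID, EVENT_DATE, VERSION)
    )
    """
)
# Equality columns first, then the sort column, then the filtered column.
conn.execute(
    """
    CREATE INDEX IX_USER_VERSION_DATE_SCORE
    ON MY_HUGE_TABLE (USER_ID, VERSION, EVENT_DATE DESC, SCORE)
    """
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO MY_HUGE_TABLE VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", "v1", 5.0),
        (1, "2020-02-01", "v1", 1.0),  # newer, but score too low
        (1, "2020-03-01", "v2", 9.0),  # newer, but wrong version
        (2, "2020-01-15", "v1", 3.0),
        (3, "2020-01-20", "v1", 0.5),  # never qualifies
    ],
)


def latest_events(conn, user_ids, version, min_score):
    """For each user, return the latest (EVENT_DATE, SCORE) above min_score, if any."""
    sql = """
        SELECT EVENT_DATE, SCORE
        FROM MY_HUGE_TABLE
        WHERE USER_ID = ? AND VERSION = ? AND SCORE > ?
        ORDER BY EVENT_DATE DESC
        LIMIT 1
    """
    found = {}
    for user_id in user_ids:
        row = conn.execute(sql, (user_id, version, min_score)).fetchone()
        if row is not None:
            found[user_id] = row
    return found


latest = latest_events(conn, [1, 2, 3], "v1", 2)
print(latest)
```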
I failed to find a solution to my problem and would love your help.
~~ Post has been edited to have only one question ~~-
Group by one query while selecting multiple columns.
In MySQL you can simply group by whatever you want and it will still select all the columns. For example, say I want to select the newest 100 transactions, grouped by Email (only the last transaction for each email).
In MySQL I would do this:
SELECT * FROM db.transactionlog
group by Email
order by TransactionLogId desc
LIMIT 100;
In SQL Server that's not possible. A bit of googling suggested wrapping each column I want in an aggregate as a hack, but couldn't that cause a mix of values (mixing columns between the grouped rows)?
For example:
SELECT TOP(100)
Email,
MAX(ResultCode) as 'ResultCode',
MAX(Amount) as 'Amount',
MAX(TransactionLogId) as 'TransactionLogId'
FROM [db].[dbo].[transactionlog]
group by Email
order by TransactionLogId desc
TransactionLogId is the primary key, which is an identity column; I order by it to get the last inserted row.
I just want to know whether the ResultCode and Amount I get from such a query will be those of the last inserted row, rather than the highest values among the grouped rows.
~Edit~
Sample data -
row1:
Email : test#email.com
ResultCode : 100
Amount : 27
TransactionLogId : 1
row2:
Email: test#email.com
ResultCode:50
Amount: 10
TransactionLogId: 2
Using the sample data above, my goal is to get the row details of
TransactionLogId = 2.
But what actually happens is that I get mixed values from the two rows: I do get TransactionLogId = 2, but with the ResultCode and Amount of the first row.
How do I avoid that?
Thanks.
You should first find out which is the latest transaction log by each email, then join back against the same table to retrieve the full record:
;WITH MaxTransactionByEmail AS
(
SELECT
Email,
MAX(TransactionLogId) as LatestTransactionLogId
FROM
[db].[dbo].[transactionlog]
group by
Email
)
SELECT
T.*
FROM
[db].[dbo].[transactionlog] AS T
INNER JOIN MaxTransactionByEmail AS M ON T.TransactionLogId = M.LatestTransactionLogId
You are currently getting mixed results because each aggregate function, such as MAX(), considers all rows that correspond to a particular value of Email independently of the other columns. So the MAX() value for the Amount column between values 10 and 27 is 27, even if that row's transaction log id is lower.
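The difference is easy to reproduce with the two sample rows from the question. The sketch below uses SQLite from Python instead of SQL Server (an assumption made so it runs anywhere), with the schema reduced to the four columns shown in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactionlog (
        TransactionLogId INTEGER PRIMARY KEY,
        Email TEXT, ResultCode INTEGER, Amount INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO transactionlog VALUES (?, ?, ?, ?)",
    [(1, "test#email.com", 100, 27), (2, "test#email.com", 50, 10)],
)

# One MAX() per column: each column is maximized on its own, so rows get mixed.
mixed = conn.execute(
    """
    SELECT Email, MAX(ResultCode), MAX(Amount), MAX(TransactionLogId)
    FROM transactionlog
    GROUP BY Email
    """
).fetchall()

# Find the latest id per email first, then join back to fetch that whole row.
joined = conn.execute(
    """
    SELECT T.Email, T.ResultCode, T.Amount, T.TransactionLogId
    FROM transactionlog AS T
    INNER JOIN (
        SELECT Email, MAX(TransactionLogId) AS LatestTransactionLogId
        FROM transactionlog
        GROUP BY Email
    ) AS M ON T.TransactionLogId = M.LatestTransactionLogId
    """
).fetchall()

print(mixed)   # [('test#email.com', 100, 27, 2)]  values from both rows
print(joined)  # [('test#email.com', 50, 10, 2)]   the actual latest row
```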
Another solution is using a ROW_NUMBER() window function to get a row-ranking by each Email, then just picking the first row:
;WITH TransactionsRanking AS
(
SELECT
T.*,
MostRecentTransactionLogRanking = ROW_NUMBER() OVER (
PARTITION BY
T.Email -- Start a different ranking for each different value of Email
ORDER BY
T.TransactionLogId DESC) -- Order the rows by the TransactionLogID descending
FROM
[db].[dbo].[transactionlog] AS T
)
SELECT
T.*
FROM
TransactionsRanking AS T
WHERE
T.MostRecentTransactionLogRanking = 1
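The ROW_NUMBER() approach can be tried the same way. This sketch assumes an SQLite build with window-function support (3.25 or newer) and uses made-up rows for two email addresses. It also adds the "newest 100" ordering from the question on top of the ranking filter, written with LIMIT because SQLite has no TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactionlog (
        TransactionLogId INTEGER PRIMARY KEY,
        Email TEXT, ResultCode INTEGER, Amount INTEGER
    )
    """
)
# Hypothetical rows: two transactions for each of two email addresses.
conn.executemany(
    "INSERT INTO transactionlog VALUES (?, ?, ?, ?)",
    [
        (1, "a#email.com", 100, 27),
        (2, "a#email.com", 50, 10),
        (3, "b#email.com", 7, 99),
        (4, "b#email.com", 8, 1),
    ],
)

latest = conn.execute(
    """
    SELECT TransactionLogId, Email, ResultCode, Amount
    FROM (
        SELECT T.*,
               ROW_NUMBER() OVER (
                   PARTITION BY T.Email             -- restart the ranking per email
                   ORDER BY T.TransactionLogId DESC -- newest row gets rank 1
               ) AS rn
        FROM transactionlog AS T
    )
    WHERE rn = 1
    ORDER BY TransactionLogId DESC
    LIMIT 100
    """
).fetchall()

print(latest)
```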
I have two tables which are joined by an ID...
table 1
- Assessment ID
- Module ID
- Assessment Weighting
table 2
- ID
- AssessmentID
- ModuleID
- UserID
- MarkFrom100
An assessment can have many students taking the assessment.
For example
A module has two assessments, one worth 60% and the other worth 40%. in table 2, I want to take the weighting value from table 1 and multiply it against the mark from 100.
SELECT * FROM Assessment, ModuleAssessmentUser WHERE
INNER JOIN moduleassementuser.assessmentID on Assessment.assessmentID
MULTIPLY AssessmentWeighting BY MarkFrom100 AS finalmark
UserID = 1
I know this is way off, but I really don't know how else to go about it.
My SQL knowledge is limited, so any help is appreciated!
You can use SUM in a subquery to total the marks for each group, then join that subquery to the weights table and multiply each total by its weight.
Subquery:
SELECT ModuleID, AssessmentID, UserID, SUM(MarkFrom100) AS Total
FROM Table_2
GROUP BY ModuleID, AssessmentID, UserID
Then use this subquery as a table in the main query:
SELECT T1.Assessment_ID, T1.ModuleID, Q1.UserID, (Q1.Total * T1.Assessment_Weighting) AS FinalMark
FROM (SELECT ModuleID, AssessmentID, UserID, SUM(MarkFrom100) AS Total
FROM Table_2
GROUP BY ModuleID, AssessmentID, UserID) AS Q1
INNER JOIN Table_1 AS T1 ON T1.Assessment_ID = Q1.AssessmentID
-- WHERE T1.ModuleID = 2 -- a particular module ID
;
Note that the WHERE clause is commented out. Leave it that way if you want all the data; uncomment it to restrict the result to one module.
NOTE:
I don't have your database, so it may need some tweaks, but the main idea is there.
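For a concrete check of the arithmetic, here is a minimal sketch on SQLite from Python. It assumes the table names from the question (Assessment and ModuleAssessmentUser), column names without spaces, and a weighting stored as a whole percentage; all of these are assumptions, as is the sample data. It goes one step further than multiplying mark by weight and also sums the weighted marks per user and module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Assessment (
        AssessmentID INTEGER PRIMARY KEY,
        ModuleID INTEGER,
        AssessmentWeighting INTEGER  -- assumed to be a percentage, e.g. 60
    )
    """
)
conn.execute(
    """
    CREATE TABLE ModuleAssessmentUser (
        ID INTEGER PRIMARY KEY,
        AssessmentID INTEGER, ModuleID INTEGER, UserID INTEGER, MarkFrom100 INTEGER
    )
    """
)
# Module 10 has two assessments worth 60% and 40%.
conn.executemany("INSERT INTO Assessment VALUES (?, ?, ?)", [(1, 10, 60), (2, 10, 40)])
conn.executemany(
    "INSERT INTO ModuleAssessmentUser VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 10, 1, 70), (2, 2, 10, 1, 50), (3, 1, 10, 2, 90), (4, 2, 10, 2, 80)],
)

final_marks = conn.execute(
    """
    SELECT M.UserID, M.ModuleID,
           SUM(A.AssessmentWeighting * M.MarkFrom100) / 100.0 AS FinalMark
    FROM ModuleAssessmentUser AS M
    INNER JOIN Assessment AS A ON A.AssessmentID = M.AssessmentID
    WHERE M.UserID = ?
    GROUP BY M.UserID, M.ModuleID
    """,
    (1,),
).fetchall()

print(final_marks)  # user 1: (60 * 70 + 40 * 50) / 100 = 62.0
```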
For example, I have table with columns:
playerName TEXT,
score INTEGER
And I have 10,000 rows in this table.
I can select, e.g., the top 100 players with a simple SQLite query:
SELECT playerName FROM table ORDER BY score DESC LIMIT 100
Now my question: how can I get the position in the ranking of a player X who is not in the top 100?
I could do this by selecting all rows and then looping to find the position of player X, but I don't think that would perform well.
Is there a simpler way to do this in SQLite and MySQL?
You can count how many players have a score at least as large as player X's; because that count includes X, it is X's position:
SELECT COUNT(*)
FROM MyTable
WHERE score >= (SELECT score
FROM MyTable
WHERE playerName = 'X');
(If you want to know this for all players, a single query would be more efficient.)
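Both variants can be checked on a tiny table. The sketch below uses SQLite from Python with five made-up players; the correlated subquery used for the all-players version is one possible way to write that single query, not necessarily the one the answer had in mind. With the >= comparison, tied players share the lowest position of their group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (playerName TEXT, score INTEGER)")
# Hypothetical scores; B and C are tied.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("A", 50), ("B", 40), ("C", 40), ("X", 30), ("D", 10)],
)

# Position of one player: count everyone whose score is at least as large.
position = conn.execute(
    """
    SELECT COUNT(*)
    FROM MyTable
    WHERE score >= (SELECT score FROM MyTable WHERE playerName = ?)
    """,
    ("X",),
).fetchone()[0]

# Positions of all players in a single query.
positions = dict(
    conn.execute(
        """
        SELECT p.playerName,
               (SELECT COUNT(*) FROM MyTable AS o WHERE o.score >= p.score) AS position
        FROM MyTable AS p
        """
    ).fetchall()
)

print(position)   # 4
print(positions)  # {'A': 1, 'B': 3, 'C': 3, 'X': 4, 'D': 5}
```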