Randomize Return Results - Access - sql

I need to match up an employee with a task in a small Microsoft Access DB I built. Essentially, I have a list of 45 potential tasks, and I have 25 employees. What I need is:
Each employee to have at LEAST one task
No employee to have more than TWO
Be able to randomize the results every time I run the query (so the same people don't get consistently the same tasks)
My table structure is:
Employees - w/ fields: ID, Name
Tasks - w/ fields: ID, Location, Task Group, Task
I know this is a dumb question, but I truly am struggling. I have searched through SO and Google for help but have been unsuccessful.
I don't have a way to link together employees to tasks since each employee is capable of every task, so I was going to:
1. SELECT * from Employees
2. SELECT * from Tasks
3. Union
4. COUNT(Name) <= 2
But I don't know how to randomize those results so that folks are randomly matched up, with each person at least once and nobody more than twice.
Any help or guidance is appreciated. Thank you.

Consider a cross join with an aggregate query that randomizes the choice set. Currently, at 45 X 25 this yields a cartesian product of 1,125 records which is manageable.
Select query (save as a query object, assumes Tasks has autonumber field)
SELECT cj.[Emp_Name], Max(cj.ID) As M_ID, Max(cj.Task) As M_Task
FROM
(SELECT e.[Emp_Name], t.ID, t.Task
FROM Employees e,
Tasks t) cj
GROUP BY cj.[Emp_Name], Rnd(cj.ID)
ORDER BY cj.[Emp_Name], Rnd(cj.ID)
However, the challenge here is this above query randomizes the order of all 45 tasks per each of the 25 employees whereas you need the top two tasks per employee. Unfortunately, MS Access does not have a row id like other DBMS to use to select top 2 per employee. And we cannot use a correlated subquery on Task ID per Employee since this will always return the highest two task IDs by their value and not random top two IDs.
Therefore to do so in Access, you will need a temp table regularly cleaned out prior to each allocation of employee tasks and use autonumber for selection via correlated subquery.
Create table (run once, autonumber field required)
CREATE TABLE CrossJoinRandomPicks (
ID AUTOINCREMENT PRIMARY KEY,
Emp_Name TEXT(255),
M_ID LONG,
M_Task TEXT(255)
)
Delete query (run regularly)
DELETE FROM CrossJoinRandomPicks;
Append query (run regularly)
INSERT INTO CrossJoinRandomPicks ([Emp_Name], [M_ID], [M_Task])
SELECT [Emp_Name], [M_ID], [M_Task]
FROM mySavedCrossJoinQuery;
Final query (selects top two random tasks for each employee)
SELECT c.name, c.M_Letter
FROM CrossJoinRandomPicks c
WHERE
(SELECT Count(*) FROM CrossJoinRandomPicks sub
WHERE sub.name = c.name
AND sub.ID <= c.ID) <= 2;

Related

SQL JOIN to select MAX value among multiple user attempts returns two values when both attempts have the same value

Good morning, everyone!
I have a pretty simple SELECT/JOIN statement that gets some imported data from a placement test and returns the highest scored attempt a user made, the best score. Users can take this test multiple times, so we just use the best attempt. What if a user makes multiple attempts (say, takes it twice,) and receives the SAME score both times?
My current query ends up returning BOTH of those records, as they're both equal, so MAX() returns both. There are no primary keys setup on this yet--the query I'm using below is the one I hope to add into an INSERT statement for another table, once I only get a SINGLE best attempt per User (StudentID), and set that StudentID as the key. So you see my problem...
I've tried a few DISTINCT or TOP statements in my query but either I'm putting them into the wrong part of the query or they still return two records for a user who had identically scored attempts. Any suggestions?
SELECT p.*
FROM
(SELECT
StudentID, MAX(PlacementResults) AS PlacementResults
FROM AleksMathResults
GROUP BY StudentID)
AS mx
JOIN AleksMathResults p ON mx.StudentID = p.StudentID AND mx.PlacementResults = p.PlacementResults
ORDER BY
StudentID
Sounds like you want row_number():
SELECT amr.*
FROM (SELECT amr.*
ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY PlacementResults DESC) as seqnum
FROM AleksMathResults amr
) amr
WHERE seqnum = 1;

Validate that only one value exists

I have a table with two relevant columns. I'll call them EID and MID. They are not unique.
In theory, if the data is set up correctly, there will be many records for each EID and every one of those records should have the same MID.
There are situations where someone may manually update data incorrectly and I need to be able to quickly identify if there is a second MID for any EID.
Ideally, I'd have a query that returns how many MIDs for each EID, but only showing results where there is more than 1 MID. Below is what I'd like the results to look like.
EID Count of Distinct MID values
200345 2
304334 3
I've tried several different forms of queries, but I can't seem to figure out how to reach this result. We're on SQL Server.
You can use the following using COUNT with DISTINCT and HAVING:
SELECT EID, COUNT(DISTINCT MID)
FROM table_name
GROUP BY EID
HAVING COUNT(DISTINCT MID) > 1
demo on dbfiddle.uk

Which Oracle query is faster

I am trying to display employee properties using C# WPF view.
I have data in '2' different oracle tables in my database:
Those tables structure at high-level is...
Employee table (EMP) - columns:
ID, Name, Organisation
Employee properties table (EMPPR) - columns
ID, PropertyName, PropertyValue
The user will input 'List of Employee Name' and I need to display Employee properties using data in those '2' tables.
Each employee has properties from 40-80 i.e. 40-80 rows per employee in EMPPR table. In this case, which approach is more efficient?
Approach #1 - single query data retrieval:
SELECT Pr.PropertyName, Pr.PropertyValue
FROM EMP Emp, EMPPR Pr
WHERE Emp.ID = Pr.ID
AND Emp.Name IN (<List of Names entered>)
Approach #2 - get IDs list using one query and Get properties using that ID in the second query
Query #1:
SELECT ID
FROM EMP
WHERE Name IN (<List of Names entered>)
Query #2:
SELECT PropertyName, PropertyValue
FROM EMPPR
WHERE ID IN (<List of IDs got from Query#1>)
I need to retrieve ~10K employee details at once where each employee has 40-80 properties.
Which approach is good?
Which query is faster?
The first one, which uses a single query to fetch your results.
Why? much of the elapsed time handling queries, especially ones with modestly sized rows like yours, is consumed going back and forth from the client to the database server.
Plus, the construct WHERE something IN (val, val, val, val ... ... val) can throw an error when you have too many values. So the first query is more robust.
Pro tip: Come on into the 21st century and use the new JOIN syntax.
SELECT Pr.PropertyName, Pr.PropertyValue
FROM EMP Emp
JOIN EMPPR Pr ON Emp.ID = Pr.ID
WHERE Emp.Name IN (<List of Names Inputted>)
Use first approach of join between two tables which is far better than using where clause two times.

Select Last Updated Row with condition

I'm working on building a workload tracking system, I have a table that currently has listed all the tasks to be completed (each with a unique ID), but also has all the updates with a datestamp so that I can track how long it took for the status to be updated.
My dilemma is that for a form I want to query only the latest update, currently the select query shows both the original task and the updated task separately.
In words, I guess what I need to do is to select only a task given that the ID is the last one with that same task number (which is different than the ID, there will be duplicates when it is updated)
So if I have:
ID Task Date
1 A 4/30/13
2 B 5/2/13
3 A 5/3/13
That the table only shows:
ID Task Date
3 A 5/3/13
2 B 5/2/13
How can I do this? I think I'm missing something simple...
There are multiple ways to approach this query, even in Access. Here is a way using in with a subquery:
select t.*
from t
where t.id in (select MAX(id) as maxid
from t
group by task
)
order by task
The subquery finds the maximum ids for all the tasks. It then returns the rows from the original table that match those ids.

Fetch data with single and fast SQL query

I have the following data:
ExamEntry Student_ID Grade
11 1 80
12 2 70
13 3 20
14 3 68
15 4 75
I want to find all the students that passed an exam. In this case, if there are few exams
that one student attended to, I need to find the last result.
So, in this case I'd get that all students passed.
Can I find it with one fast query? I do it this way:
Find the list of entries by
select max(ExamEntry) from data group by Student_ID
Find the results:
select ExamEntry from data where ExamEntry in ( ).
But this is VERY slow - I get around 1000 entries, and this 2 step process takes 10 seconds.
Is there a better way?
Thanks.
If your query is very slow at with 1000 records in your table, there is something wrong.
For a modern Database system a table containing, 1000 entries is considered very very small.
Most likely, you did not provid a (primary) key for your table?
Assuming that a student would pass if at least on of the grades is above the minimum needed, the appropriate query would be:
SELECT
Student_ID
, MAX(Grade) AS maxGrade
FROM table_name
GROUP BY Student_ID
HAVING maxGrade > MINIMUM_GRADE_NEEDED
If you really need the latest grade to be above the minimum:
SELECT
Student_ID
, Grade
FROM table_name
WHERE ExamEntry IN (
SELECT
MAX(ExamEntry)
FROM table_name
GROUP BY Student_ID
)
HAVING Grade > MINIMUM_GRADE_NEEDED
SELECT student_id, MAX(ExamEntry)
FROM data
WHERE Grade > :threshold
GROUP BY student_id
Like this?
I'll make some assumptions that you have a student table and test table and the table you are showing us is the test_result table... (if you don't have a similar structure, you should revisit your schema)
select s.id, s.name, t.name, max(r.score)
from student s
left outer join test_result r on r.student_id = s.id
left outer join test t on r.test_id = t.id
group by s.id, s.name, t.name
All the fields with id in it should be indexed.
If you really only have a single test (type) in your domain... then the query would be
select s.id, s.name, max(r.score)
from student s
left outer join test_result r on r.student_id = s.id
group by s.id, s.name
I've used the hints given here, and here the query I found that runs almost 3 orders faster than my first one (.03 sec instead of 10 sec):
SELECT ExamEntry, Student_ID, Grade from data,
( SELECT max(ExamEntry) as ExId GROUP BY Student_ID) as newdata
WHERE `data`.`ExamEntry`=`newdata`.`ExId` AND Grade > 60;
Thanks All!
As mentioned, indexing is a powerful tool for speeding up queries. The order of the index, however, is fundamentally important.
An index in order of (ExamEntry) then (Student_ID) then (Grade) would be next to useless for finding exams where the student passed.
An index in the opposite order would fit perfectly, if all you wanted was to find what exams had been passed. This would enable the query engine to quickly identify rows for exams that have been passed, and just process those.
In MS SQL Server this can be done with...
CREATE INDEX [IX_results] ON [dbo].[results]
(
[Grade],
[Student_ID],
[ExamEntry]
)
ON [PRIMARY]
(I recommend reading more about indexs to see what other options there are, such as ClusterdIndexes, etc, etc)
With that index, the following query would be able to ignore the 'failed' exams very quickly, and just display the students who ever passed the exam...
(This assumes that if you ever get over 60, you're counted as a pass, even if you subsequently take the exam again and get 27.)
SELECT
Student_ID
FROM
[results]
WHERE
Grade >= 60
GROUP BY
Student_ID
Should you definitely need the most recent value, then you need to change the order of the index back to something like...
CREATE INDEX [IX_results] ON [dbo].[results]
(
[Student_ID],
[ExamEntry],
[Grade]
)
ON [PRIMARY]
This is because the first thing we are interested in is the most recent ExamEntry for any given student. Which can be achieved using the following query...
SELECT
*
FROM
[results]
WHERE
[results].ExamEntry = (
SELECT
MAX([student_results].ExamEntry)
FROM
[results] AS [student_results]
WHERE
[student_results].Student_ID = [results].student_id
)
AND [results].Grade > 60
Having a sub query like this can appear slow, especially since it appears to be executed for every row in [results].
This, however, is not the case...
- Both main and sub query reference the same table
- The query engine scans through the Index for every unique Student_ID
- The sub query is executed, for that Student_ID
- The query engine is already in that part of the index
- So a new Index Lookup is not needed
EDIT:
A comment was made that at 1000 records indexs are not relevant. It should be noted that the question states that there are 1000 records Returned, not that the table contains 1000 records. For a basic query to take as long as stated, I'd wager there are many more than 1000 records in the table. Maybe this can be clarified?
EDIT:
I have just investigated 3 queries, with 999 records in each (3 exam results for each of 333 students)
Method 1: WHERE a.ExamEntry = (SELECT MAX(b.ExamEntry) FROM results [a] WHERE a.Student_ID = b.student_id)
Method 2: WHERE a.ExamEntry IN (SELECT MAX(ExamEntry) FROM resuls GROUP BY Student_ID)
Method 3: USING an INNER JOIN instead of the IN clause
The following times were found:
Method QueryCost(No Index) QueryCost(WithIndex)
1 23% 9%
2 38% 46%
3 38% 46%
So, Query 1 is faster regardless of indexes, but indexes also definitely make method 1 substantially faster.
The reason for this is that indexes allow lookups, where otherwise you need a scan. The difference between a linear law and a square law.
Thanks for the answers!!
I think that Dems is probably closest to what I need, but I will elaborate a bit on the issue.
Only the latest grade counts. If the student had passed first time, attended again and failed, he failed in total. He/She could've attended 3 or 4 exams, but still only the last one counts.
I use MySQL server. The problem I experience in both Linux and Windows installations.
My data set is around 2K entries now and grows with the speed of ~ 1K per new exam.
The query for specific exam also returns ~ 1K entries, when ~ 1K would be the number of students attended (received by SELECT DISTINCT STUDENT_ID from results;), then almost all have passed and some have failed.
I perform the following query in my code:
SELECT ExamEntry, Student_ID from exams WHERE ExamEntry in ( SELECT MAX(ExamEntry) from exams GROUP BY Student_ID). As subquery returns about ~1K entries, it appears that main query scans them in loop, making all the query run for a very long time and with 50% server load (100% on Windows).
I feel that there is a better way :-), just can't find it yet.
select examentry,student_id,grade
from data
where examentry in
(select max(examentry)
from data
where grade > 60
group by student_id)
don't use
where grade > 60
but
where grade between 60 and 100
that should go faster