SELECT TOP 1 is returning multiple records - sql

I shall link my database down below.
I have a query called 'TestMonday1' that returns the student with the fewest 'NoOfFrees' and inserts the result into the Lesson table. Running the query should help explain what I mean. The problem I'm having is that my SQL code has 'SELECT TOP 1', yet if two students have the same number of frees, the query returns both of these records. With this being a timetable planner, it should only ever return one result. I shall also put the code below.
Many thanks
Code:
INSERT INTO Lesson ( StudentID, LessonStart, LessonEnd, DayOfWeek )
SELECT TOP 1 Availability.StudentID, Availability.StartTime,
Availability.EndTime, Availability.DayOfWeek
FROM Availability
WHERE
Availability.StartTime='16:00:00' AND
Availability.EndTime='18:00:00' AND
Availability.DayOfWeek='Monday' AND
LessonTaken IS NULL
ORDER BY
Availability.NoOfFrees;

This happens because Access's TOP returns every record that ties on the ORDER BY fields (all the records returned have the same values in the fields used in ORDER BY).
You can add another field to ORDER BY to make sure there are no ties. StudentID looks like a good candidate (though I don't know your schema; replace it with something else if that suits better):
ORDER BY
Availability.NoOfFrees, Availability.StudentID;
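A minimal sketch of the difference, run through Python's sqlite3 module rather than Access. SQLite has no TOP and does not return ties, so the Access behaviour is emulated here with a MIN() subquery; the sample rows and the reduced two-column table are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: students 1 and 2 tie on NoOfFrees.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Availability (StudentID INTEGER, NoOfFrees INTEGER)")
conn.executemany(
    "INSERT INTO Availability VALUES (?, ?)",
    [(2, 1), (1, 1), (3, 4)],
)

# Emulation of Access's TOP 1 with ORDER BY NoOfFrees:
# every row tied for the smallest NoOfFrees is kept.
with_ties = conn.execute(
    "SELECT StudentID FROM Availability "
    "WHERE NoOfFrees = (SELECT MIN(NoOfFrees) FROM Availability) "
    "ORDER BY StudentID"
).fetchall()

# With StudentID as a tie-breaker the ordering is total,
# so exactly one row can come first.
single = conn.execute(
    "SELECT StudentID FROM Availability "
    "ORDER BY NoOfFrees, StudentID LIMIT 1"
).fetchall()

print(with_ties)  # two rows: the tied students
print(single)     # one row
```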

Related

SQL JOIN to select MAX value among multiple user attempts returns two values when both attempts have the same value

Good morning, everyone!
I have a pretty simple SELECT/JOIN statement that gets some imported data from a placement test and returns the highest scored attempt a user made, the best score. Users can take this test multiple times, so we just use the best attempt. What if a user makes multiple attempts (say, takes it twice,) and receives the SAME score both times?
My current query ends up returning BOTH of those records, as they're both equal, so MAX() returns both. There are no primary keys set up on this yet. The query I'm using below is the one I hope to add into an INSERT statement for another table, once I only get a SINGLE best attempt per user (StudentID), and set that StudentID as the key. So you see my problem...
I've tried a few DISTINCT or TOP statements in my query but either I'm putting them into the wrong part of the query or they still return two records for a user who had identically scored attempts. Any suggestions?
SELECT p.*
FROM
(SELECT
StudentID, MAX(PlacementResults) AS PlacementResults
FROM AleksMathResults
GROUP BY StudentID)
AS mx
JOIN AleksMathResults p ON mx.StudentID = p.StudentID AND mx.PlacementResults = p.PlacementResults
ORDER BY
StudentID
Sounds like you want row_number():
SELECT amr.*
FROM (SELECT amr.*,
ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY PlacementResults DESC) as seqnum
FROM AleksMathResults amr
) amr
WHERE seqnum = 1;
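As a rough check of the row_number() approach, here is a small sketch using Python's sqlite3 module (it needs an SQLite build with window functions, 3.25 or later). The AttemptID column and the sample scores are hypothetical; student 10 has two attempts with the same score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AttemptID is a hypothetical column used only to tell attempts apart.
conn.execute(
    "CREATE TABLE AleksMathResults "
    "(AttemptID INTEGER, StudentID INTEGER, PlacementResults INTEGER)"
)
conn.executemany(
    "INSERT INTO AleksMathResults VALUES (?, ?, ?)",
    [(1, 10, 70), (2, 10, 70), (3, 11, 55), (4, 11, 80)],
)

# Number each student's attempts from best to worst and keep only the first.
best = conn.execute(
    """
    SELECT StudentID, PlacementResults
    FROM (SELECT amr.*,
                 ROW_NUMBER() OVER (PARTITION BY StudentID
                                    ORDER BY PlacementResults DESC) AS seqnum
          FROM AleksMathResults amr) AS ranked
    WHERE seqnum = 1
    ORDER BY StudentID
    """
).fetchall()

print(best)  # one row per student, even when the top score is tied
```

Which of two equally scored attempts gets seqnum = 1 is arbitrary here; adding a second ORDER BY column inside OVER() would make that choice deterministic.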

Validate that only one value exists

I have a table with two relevant columns. I'll call them EID and MID. They are not unique.
In theory, if the data is set up correctly, there will be many records for each EID and every one of those records should have the same MID.
There are situations where someone may manually update data incorrectly and I need to be able to quickly identify if there is a second MID for any EID.
Ideally, I'd have a query that returns how many MIDs for each EID, but only showing results where there is more than 1 MID. Below is what I'd like the results to look like.
EID Count of Distinct MID values
200345 2
304334 3
I've tried several different forms of queries, but I can't seem to figure out how to reach this result. We're on SQL Server.
You can use the following using COUNT with DISTINCT and HAVING:
SELECT EID, COUNT(DISTINCT MID)
FROM table_name
GROUP BY EID
HAVING COUNT(DISTINCT MID) > 1
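A small sketch of the COUNT(DISTINCT ...) with HAVING pattern, run against SQLite through Python rather than SQL Server. The table name assignments and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "assignments" is a hypothetical stand-in for the real table.
conn.execute("CREATE TABLE assignments (EID INTEGER, MID INTEGER)")
conn.executemany(
    "INSERT INTO assignments VALUES (?, ?)",
    [
        (100001, 9), (100001, 9),                  # consistent: one MID
        (200345, 1), (200345, 1), (200345, 2),     # two distinct MIDs
        (304334, 5), (304334, 6), (304334, 7),     # three distinct MIDs
    ],
)

# Keep only the EIDs that have more than one distinct MID.
conflicts = conn.execute(
    """
    SELECT EID, COUNT(DISTINCT MID) AS distinct_mids
    FROM assignments
    GROUP BY EID
    HAVING COUNT(DISTINCT MID) > 1
    ORDER BY EID
    """
).fetchall()

print(conflicts)
```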

Optimizing my stored procedure - is this the right way to do it?

SELECT TOP 1 @CurrentStudentID = StudentID
FROM Courses WITH (NOLOCK)
WHERE Courses.CourseID = @CourseID
ORDER BY StudentID
-- Loop through all the students and find if he/she is registered for more than one course.
WHILE (@@ROWCOUNT > 0 AND @CurrentStudentID IS NOT NULL)
BEGIN
    -- Select all other courses student is currently registered in.
    IF @@ROWCOUNT > 0
    BEGIN
        -- return required information
    END
    ELSE
    BEGIN
        -- Perform some operations
    END
    -- Select the next registered student
    SELECT TOP 1 @CurrentStudentID = StudentID
    FROM Courses WITH (NOLOCK)
    WHERE Courses.CourseID = @CourseID AND
        Courses.StudentID > @CurrentStudentID
    ORDER BY StudentID
END
Can someone help with my logic here? I wrote a stored procedure to find out if a student of a course is currently taking other courses from the same school.
I'm particularly worried about the two SELECT queries and the performance of a while loop if the number of students is huge. I feel the way I am doing it feels very contrived. I'm sure there are better ways to do this.
I've done SQL profiling on this stored procedure and its duration can range from 0 to 60 ms for a single call. I don't understand why the same stored procedure's execution time is so random and inconsistent.
Appreciate any help. I only have 1 year plus of SQL Server 2008 experience.
Thanks in advance.
As I mentioned, SQL is a set-based language. In other words, it works on whole sets of rows at once, which allows for efficient comparisons between groups of data. Procedural languages such as C++ or Java do not work on data sets this way; they typically process data row by row, the way a cursor does.
High level as this definition is, the point is to think of your data like Excel sheets. You have predefined columns such as CourseID and StudentID. Some of the other columns depend on those values (each CourseID maps 1:1 to a Course_Name), and some information is repeated (one CourseID can have multiple students).
True normalization includes removing interdependent columns, but let's not worry about that right now. The main focus is on what makes sense for the business. Your table has identifying columns for its courses and students, so you do not need a cursor as long as those columns do not hold conflicting interdependent values.
SELECT StudentID, COUNT(COURSEID) AS CLASS_NUM
FROM COURSES
GROUP BY StudentID
HAVING COUNT(COURSEID) > 1
The GROUP BY returns one row per distinct combination of values in the columns listed, collapsing the other rows and allowing aggregate functions like COUNT(). (Note: COUNT(column) does not count NULLs. Use COUNT(*), or wrap the column in ISNULL, if rows with NULLs should be counted.)
You have not yet limited the list, and yet you achieve the same results. After SQL flattens the rows, you can use a HAVING clause to further limit the result sets from the GROUP BY if needed.
Way faster than a cursor, definitely. :)
Now, if your table includes students in different semesters and years, you might consider adding this to the GROUP BY, so that you have sets in your GROUP BY (StudentID and Year)
Also, recall that the SELECT list is logically evaluated AFTER the GROUP BY and HAVING clauses, so any column listed in the SELECT must either appear in the GROUP BY or be wrapped in an aggregate function.
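To make the GROUP BY / HAVING behaviour concrete, here is a small sketch using Python's sqlite3 module in place of SQL Server. The sample rows are assumptions; student 3 has one row with a NULL CourseID to show the difference between COUNT(CourseID) and COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (StudentID INTEGER, CourseID INTEGER)")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?)",
    [(1, 101), (1, 102), (2, 101), (3, 101), (3, None)],
)

# One row per student: COUNT(CourseID) skips the NULL, COUNT(*) does not.
per_student = conn.execute(
    """
    SELECT StudentID, COUNT(CourseID) AS class_num, COUNT(*) AS row_num
    FROM Courses
    GROUP BY StudentID
    ORDER BY StudentID
    """
).fetchall()

# HAVING filters the grouped rows: only students in more than one course remain.
multi_course = conn.execute(
    """
    SELECT StudentID, COUNT(CourseID) AS class_num
    FROM Courses
    GROUP BY StudentID
    HAVING COUNT(CourseID) > 1
    """
).fetchall()

print(per_student)
print(multi_course)
```

The whole table is handled in one statement, with no loop over individual students.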

Find row number in a sort based on row id, then find its neighbours

Say that I have some SELECT statement:
SELECT id, name FROM people
ORDER BY name ASC;
I have a few million rows in the people table and the ORDER BY clause can be much more complex than what I have shown here (possibly operating on a dozen columns).
I retrieve only a small subset of the rows (say rows 1..11) in order to display them in the UI. Now, I would like to solve following problems:
1. Find the number of a row with a given id.
2. Display the 5 items before and the 5 items after a row with a given id.
Problem 2 is easy to solve once I have solved problem 1, as I can then use something like this if I know that the item I was looking for has row number 1000 in the sorted result set (this is the Firebird SQL dialect):
SELECT id, name FROM people
ORDER BY name ASC
ROWS 995 TO 1005;
I also know that I can find the rank of a row by counting all of the rows which come before the one I am looking for, but this can lead to very long WHERE clauses with tons of OR and AND in the condition. And I have to do this repeatedly. With my test data, this takes hundreds of milliseconds, even when using properly indexed columns, which is way too slow.
Is there some means of achieving this by using some SQL:2003 features (such as row_number supported in Firebird 3.0)? I am by no way an SQL guru and I need some pointers here. Could I create a cached view where the result would include a rank/dense rank/row index?
Firebird appears to support window functions (called analytic functions in Oracle). So you can do the following:
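To find the "row" number of a row with a given id, number all the rows in a subquery first and filter afterwards (a WHERE in the same query level is applied before row_number(), so it would always give 1):
select rownum
from (select id, row_number() over (partition by NULL order by name, id) as rownum
      from t
     ) t
where id = <id>
This assumes the ids are unique.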
To solve the second problem:
select t.*
from (select id, row_number() over (partition by NULL order by name, id) as rownum
      from t
     ) t join
     (select rownum
      from (select id, row_number() over (partition by NULL order by name, id) as rownum
            from t
           ) x
      where id = <id>
     ) tid
     on t.rownum between tid.rownum - 5 and tid.rownum + 5
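The two steps can be tried out with a sketch like the following, using Python's sqlite3 module (SQLite 3.25 or later for window functions) instead of Firebird. The 20 sample rows and the use of a CTE named numbered are assumptions made to keep the example short; names are generated so that sorting by name reverses the id order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical data: id 20 gets the alphabetically first name, id 1 the last.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(i, f"n{20 - i:02d}") for i in range(1, 21)],
)

target_id = 10

# Number every row first, then look up the target's position.
position = conn.execute(
    """
    WITH numbered AS (
        SELECT id, ROW_NUMBER() OVER (ORDER BY name, id) AS rownum
        FROM people
    )
    SELECT rownum FROM numbered WHERE id = ?
    """,
    (target_id,),
).fetchone()[0]

# Join the numbered rows to the target's row to get 5 neighbours on each side.
neighbours = [
    row[0]
    for row in conn.execute(
        """
        WITH numbered AS (
            SELECT id, ROW_NUMBER() OVER (ORDER BY name, id) AS rownum
            FROM people
        )
        SELECT t.id
        FROM numbered t
        JOIN numbered tid
          ON tid.id = ?
         AND t.rownum BETWEEN tid.rownum - 5 AND tid.rownum + 5
        ORDER BY t.rownum
        """,
        (target_id,),
    )
]

print(position)
print(neighbours)
```

Note that this still numbers the whole table on every call, so it illustrates the logic rather than solving the performance concern for millions of rows.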
I might suggest something else, though, if you can modify the table structure. Most databases offer the ability to add an auto-increment column when a row is inserted. If your records are never deleted, this can serve as your counter, simplifying your queries.

Filtering Database Results to Top n Records for Each Value in a Lookup Column

Let's say I have two tables in my database.
TABLE:Categories
ID|CategoryName
01|CategoryA
02|CategoryB
03|CategoryC
and a table that references the Categories and also has a column storing some random number.
TABLE:CategoriesAndNumbers
CategoryType|Number
CategoryA|24
CategoryA|22
CategoryC|105
.....(20,000 records)
CategoryB|3
Now, how do I filter out this data? So, I want to know what the 3 smallest numbers are out of each category and delete the rest. The end result would be like this:
TABLE:CategoriesAndNumbers
CategoryType|Number
CategoryA|2
CategoryA|5
CategoryA|18
CategoryB|3
CategoryB|500
CategoryB|1601
CategoryC|1
CategoryC|4
CategoryC|62
Right now, I can get the smallest numbers between all the categories, but I would like each category to be compared individually.
EDIT: I'm using Access and here's my code so far
SELECT TOP 10 cdt1.sourceCounty, cdt1.destCounty, cdt1.distMiles
FROM countyDistanceTable as cdt1, countyTable
WHERE cdt1.sourceCounty = countyTable.countyID
ORDER BY cdt1.sourceCounty, cdt1.distMiles, cdt1.destCounty
EDIT2: Thanks to Remou, here would be the working query that solved my problem. Thank you!
DELETE
FROM CategoriesAndNumbers a
WHERE a.Number NOT IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number])
You could use something like:
SELECT a.CategoryType, a.Number
FROM CategoriesAndNumbers a
WHERE a.Number IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number])
ORDER BY a.CategoryType
The difficulty with this is that Jet/ACE Top selects duplicate values where they exist, so you will not necessarily get three values, but more, if there are ties. The problem can often be solved with a key field, if one exists :
WHERE a.Number IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number], [KeyField])
However, I do not think it will help in this instance, because the outer table will include ties.
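The effect of ties on the outer query can be seen in a small sketch run with Python's sqlite3 module. SQLite's LIMIT 3 stands in for Jet/ACE's Top 3 and, unlike Top, never returns extra tied rows from the subquery itself; the sample data is made up, with CategoryC holding the value 62 twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CategoriesAndNumbers (CategoryType TEXT, Number INTEGER)")
conn.executemany(
    "INSERT INTO CategoriesAndNumbers VALUES (?, ?)",
    [
        ("CategoryA", 24), ("CategoryA", 22), ("CategoryA", 2),
        ("CategoryA", 5), ("CategoryA", 18),
        ("CategoryB", 3), ("CategoryB", 500), ("CategoryB", 1601),
        ("CategoryB", 9999),
        ("CategoryC", 1), ("CategoryC", 4), ("CategoryC", 62),
        ("CategoryC", 62), ("CategoryC", 105),
    ],
)

# For each row, keep it only if its Number is among the 3 smallest
# numbers of its own category (correlated subquery).
smallest = conn.execute(
    """
    SELECT a.CategoryType, a.Number
    FROM CategoriesAndNumbers a
    WHERE a.Number IN (
        SELECT b.Number
        FROM CategoriesAndNumbers b
        WHERE b.CategoryType = a.CategoryType
        ORDER BY b.Number
        LIMIT 3)
    ORDER BY a.CategoryType, a.Number
    """
).fetchall()

for row in smallest:
    print(row)
```

CategoryA and CategoryB come back with three rows each, but CategoryC comes back with four: the subquery yields 1, 4 and 62, and both outer rows with 62 match the IN list.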
Order it by number and take 3, find out what the biggest of those numbers is, and then remove the rows where Number is greater than that number.
I imagine it would need to be two separate queries, as your business tier would hold the biggest number out of the 3 results and dynamically build the query to delete the rest.