Need help understanding what is going on in this subquery


I'm new to SQL and while I understand the basics, I'm trying to improve my skills by practicing on LeetCode. I came across the below problem and I'm trying to understand what is going on in the solution as I can't wrap my head around it:
Table: Scores
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| score | decimal |
+-------------+---------+
Problem: Write an SQL query to rank the scores. The ranking should be calculated according to the following rules:
The scores should be ranked from the highest to the lowest.
If there is a tie between two scores, both should have the same ranking.
After a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no holes between ranks.
Return the result table ordered by score in descending order.
Solution:
SELECT
Score,
(SELECT count(distinct Score) FROM Scores WHERE Score >= s.Score) Rank
FROM Scores s
ORDER BY Score desc
What's going on in the sub-query? Could someone explain what it's doing by breaking it down? Any resources I could reference to better understand?

The best way is to test it with different values.
In plain English: for each row R1 in the Scores table, the subquery counts the distinct scores that are greater than or equal to R1.score. That is the number of distinct higher scores plus one, which is R1's rank with no gaps after ties.
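One way to test it with different values is a small throwaway script. The sketch below uses Python's built-in sqlite3 module rather than the LeetCode environment; the sample scores are made up, and the alias is spelled rnk only to avoid any clash with RANK as a keyword in some engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO Scores (id, score) VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)],  # made-up data
)

# For every outer row s, the subquery scans Scores again and counts the
# distinct scores that are >= s.score.
rows = conn.execute("""
    SELECT s.score,
           (SELECT COUNT(DISTINCT score) FROM Scores WHERE score >= s.score) AS rnk
    FROM Scores s
    ORDER BY s.score DESC
""").fetchall()

for score, rnk in rows:
    print(score, rnk)
```

Both 4.0 rows get rank 1 because only one distinct score is >= 4.0, and 3.85 gets rank 2 because two distinct scores (4.0 and 3.85) are >= 3.85.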

I believe this is what you want; if not, I'm sorry:
SELECT
s.Score --field [Score] of each row from table Scores
, (
SELECT
count(distinct Score) --count only the unique (distinct) values that pass the filter below; Score here is the inner table's column, not s.Score
FROM
Scores
WHERE
Score >= s.Score --keep only rows whose [Score] is greater than or equal to [Score] of the current row outside the subquery
) Rank --alias for the column produced by the subquery
FROM
Scores s --query all rows in table Scores
ORDER BY
Score desc --sort rows by score from highest to lowest

Related

First name should randomly match with other FIRST name

Every first name should be randomly matched with another first name, and when I run the query again each name should be matched with a different name than it was the first time.
For example I have 6 records in one table ...
First name column looks like:
JHON
LEE
SAM
HARRY
JIM
KRUK
So I want result like
First name1 First name2
Jhon. Harry
LEE. KRUK
HARRY SAM

The simplest solution is to first randomly sort the records, then calculate the grouping and a sequence number within the group and then finally select out the groups as rows.
You can follow along with the logic in this fiddle: https://dbfiddle.uk/9JlK59w4
DECLARE @Sorted TABLE
(
    Id INT PRIMARY KEY,
    FirstName varchar(30),
    RowNum INT IDENTITY(1,1)
);
-- Insert in random order so that RowNum records the shuffled position
INSERT INTO @Sorted (Id, FirstName)
SELECT Id, FirstName
FROM People
ORDER BY NEWID();
WITH Pairs as
(
    SELECT *
    , (RowNum+1)/2 as PairNum
    , RowNum % 2 as Ordinal
    FROM @Sorted
)
SELECT
    Person1.FirstName as [First name1], Person2.FirstName as [First name2]
FROM Pairs Person1
LEFT JOIN Pairs Person2 ON Person1.PairNum = Person2.PairNum AND Person2.Ordinal = 1
WHERE Person1.Ordinal = 0
ORDER BY Person1.PairNum
ORDER BY NEWID() is used here to randomly sort the records. Note that it is non-deterministic and will return a new value with each execution. It's not very efficient, but it is suitable for our requirement.
You can't easily use CTEs for producing lists of randomly sorted records because the result of a CTE is not cached. Each reference to the CTE in the subsequent logic can re-evaluate the expression. Run this fiddle a few times and watch how it often allocates the names incorrectly: https://dbfiddle.uk/rpPdkkAG
Due to the volatility of NEWID(), this example stores the results in a table variable. For a very large list of records a temporary table might be more efficient.
PairNum uses simple divide-by-n logic to assign a group number to each run of n rows (n = 2 here).
It is necessary to add 1 to the RowNum because integer division rounds down; see this in action in the fiddle.
Ordinal uses the modulo of RowNum and is a value we can use to differentiate between Person 1 and Person 2 in the pair. This helps us keep the rest of the logic deterministic.
In the final SELECT we select first from the Pairs that have an Ordinal of 0, then we join on the Pairs that have an Ordinal of 1 matching by the PairNum
You can see in the fiddle I added a solution using groups of 3 to show how this can be easily extended to larger groupings.
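The PairNum and Ordinal arithmetic can also be checked outside the database. This is a minimal Python sketch of the same idea, not a translation of the fiddle: pair_names is a hypothetical helper, the shuffle stands in for ORDER BY NEWID(), and, unlike the query's filter on Ordinal = 0, an odd leftover name is kept and paired with None.

```python
import random

def pair_names(names, rng=random):
    """Shuffle names, then pair neighbours using the PairNum/Ordinal arithmetic."""
    shuffled = list(names)
    rng.shuffle(shuffled)                  # plays the role of ORDER BY NEWID()
    groups = {}
    for row_num, name in enumerate(shuffled, start=1):  # RowNum starts at 1, like IDENTITY(1,1)
        pair_num = (row_num + 1) // 2      # rows 1,2 -> 1; rows 3,4 -> 2; rows 5,6 -> 3
        ordinal = row_num % 2              # 1 for odd rows, 0 for even rows
        groups.setdefault(pair_num, {})[ordinal] = name
    # One tuple per pair: (odd-row name, even-row name or None)
    return [(g.get(1), g.get(0)) for _, g in sorted(groups.items())]

names = ["JHON", "LEE", "SAM", "HARRY", "JIM", "KRUK"]
pairs = pair_names(names, random.Random(42))  # seeded only to make the run repeatable
for first, second in pairs:
    print(first, second)
```

Without the + 1, rows 1 and 2 would get group numbers 0 and 1 under integer division, so the first group would contain a single row.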

SQL Server calculate average scores from 6 possible columns with Null and Not Null values

I have a table and want to get the average score for each student.
To be more specific, scoremonth1 takes priority over scoremonth2, 3, 4, 5 and 6 (1>2>3>4>5>6), and no more than 3 monthly scores from the table should be included.
For instance, the average score for Tom should be (80+90)/2 since there are only 2 scores available. As for Marry, the average score should be (90+70+80)/3 since those are the three monthly scores with the highest priority. And again, for Anna, the average score should be (90+100+70)/3.
In my case, there would be over 100 students. Other than listing all the possible cases (CASE WHEN scoremonth1 is not null, scoremonth2 is NULL, etc.) to calculate the average, what other method could do the calculation dynamically?
I know there is a SQL function coalesce to return the first not null value, but how could I get the second and third not null values? And is there a way to track which monthlyscores are added up? I really appreciate your help!

Stu mentioned your underlying issue. To normalize your data without changing table design you can use cross apply...
select student, sum(score)
from table
cross apply (
values(1,scoremonth1),(2,scoremonth2),(3,scoremonth3)) as scores(month,score)
group by student
I strongly suggest you redesign so you don't have to manage this query when adding months by creating a new table called studentScores.
create table studentscores
(
student varchar(200)
,scoremonth int
,score decimal(5,2)
)
And then populate it like this...
insert into studentScores(student,scoremonth,score)
select ca.ca1, ca.ca2, ca.ca3 --only the three unpivoted columns, not every column of the source table
from table
cross apply
(values
(student,1,scoremonth1)
,(student,2,scoremonth2)
,(student,3,scoremonth3)
,(student,4,scoremonth4)
,(student,5,scoremonth5)
,(student,6,scoremonth6)
) ca(ca1,ca2,ca3)
where ca3 is not null
And finally, use it like this...
select ss.student, sum(score), count(*) NumOfScores, sum(score)/Count(*) avg
from table
join studentscores ss on ss.student=table.student
where ss.scoremonth between 1 and 3
group by ss.student
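If the requirement is the first three non-null months per student rather than months 1 to 3, one possible approach is to number the non-null scores per student and keep the first three. The sketch below is an assumption-laden illustration, not part of the answer above: it runs on SQLite through Python's sqlite3 module (SQLite 3.25 or later for window functions), uses UNION ALL in place of CROSS APPLY, and the table name StudentWide and the sample rows are invented to match the averages described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = ", ".join(f"scoremonth{m} REAL" for m in range(1, 7))
conn.execute(f"CREATE TABLE StudentWide (student TEXT, {cols})")  # hypothetical table
conn.executemany(
    "INSERT INTO StudentWide VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Tom",   80,   90,  None, None, None, None),  # only two scores exist
        ("Marry", None, 90,  70,   None, 80,   60),    # months 2, 3, 5 are used
        ("Anna",  90,   100, 70,   85,   None, None),  # months 1, 2, 3 are used
    ],
)

# Unpivot the six columns into (student, month, score) rows.
unpivot = " UNION ALL ".join(
    f"SELECT student, {m} AS month, scoremonth{m} AS score FROM StudentWide"
    for m in range(1, 7)
)

query = f"""
WITH unpivoted AS ({unpivot}),
ranked AS (
    SELECT student, month, score,
           ROW_NUMBER() OVER (PARTITION BY student ORDER BY month) AS rn
    FROM unpivoted
    WHERE score IS NOT NULL          -- number only the months that have a score
)
SELECT student, COUNT(*) AS used, AVG(score) AS avg_score
FROM ranked
WHERE rn <= 3                        -- keep at most the first three scores
GROUP BY student
ORDER BY student
"""
averages = conn.execute(query).fetchall()
for student, used, avg_score in averages:
    print(student, used, round(avg_score, 2))
```

Selecting month alongside score from the ranked rows (where rn <= 3) would show which monthly scores were added up for each student.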

Limiting output of rows based on count of values in another table?

As a base example, I have a query that effectively produces a table with a list of values (ID numbers), each of which is attached to a specific category. As a simplified example, it would produce something like this (but at a much larger scale):
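+-------+------------+
| IDS   | Categories |
+-------+------------+
| 12345 | type 1     |
| 12456 | type 6     |
| 77689 | type 3     |
| 32456 | type 4     |
| 67431 | type 2     |
| 13356 | type 2     |
| ..... | .....      |
+-------+------------+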
Using this table, I want to populate another table that gives me a list of ID numbers, with a limit placed on how many of each category are in that list, cross referenced against a sort of range based chart. For instance, if there are 5-15 IDS of type 1 in my first table, I want the new table with the column of IDS to have 3 type 1 IDS in it, if there are 15-30 type 1 IDS in the first table, I want to have 6 type 1 IDS in the new table.
This sort of range based limit would apply to each category, and the IDS would all populate the same column in the new table. The order, or specific IDS that end up in the final table don't matter, as long as the correct number of IDS end up as a part of that final list of ID numbers. This is being used to provide a semi-random sampling of ID numbers based on categories for a sort of QA related process.
If parts of this are unclear I can do my best to explain more. My initial thought was using a variable for a limit clause, but that isn't possible. I have been trying to sort out how to do this with a case statement, but I'm not making any headway there. I feel like I am at a paper-thin wall I just can't break through.

You can use two window functions:
COUNT to keep track of the amount of ids for each category
ROW_NUMBER to uniquely identify each id within each category
Once you have collected this information, it's sufficient to keep all the rows that satisfy either of the following conditions. The two count ranges must not overlap; otherwise a category with 15 or fewer rows would also satisfy the second condition and return 6 rows:
count of rows less than or equal to 15 >> row number less than or equal to 3
count of rows greater than 15 and less than or equal to 30 >> row number less than or equal to 6
WITH cte AS (
    SELECT IDS,
           Categories,
           ROW_NUMBER() OVER(PARTITION BY Categories ORDER BY IDS) AS rn,
           COUNT(IDS) OVER(PARTITION BY Categories) AS cnt
    FROM tab
)
SELECT *
FROM cte
WHERE (cnt <= 15 AND rn <= 3)
   OR (cnt > 15 AND cnt <= 30 AND rn <= 6)
Note: If you have concerns regarding a specific ordering, you need to fix the ORDER BY clause inside the ROW_NUMBER window function.
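To see the two window functions working together, here is a small sketch run on SQLite through Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions). The generated IDs are made up, and the count thresholds are treated as non-overlapping bands: up to 15 rows gives 3 IDs, and more than 15 up to 30 rows gives 6 IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (IDS INTEGER PRIMARY KEY, Categories TEXT)")
# Made-up data: 10 rows of "type 1" and 20 rows of "type 2".
data = [(1000 + i, "type 1") for i in range(10)] + \
       [(2000 + i, "type 2") for i in range(20)]
conn.executemany("INSERT INTO tab VALUES (?, ?)", data)

sample = conn.execute("""
    WITH cte AS (
        SELECT IDS, Categories,
               ROW_NUMBER() OVER (PARTITION BY Categories ORDER BY IDS) AS rn,
               COUNT(IDS)   OVER (PARTITION BY Categories)              AS cnt
        FROM tab
    )
    SELECT IDS, Categories
    FROM cte
    WHERE (cnt <= 15 AND rn <= 3)
       OR (cnt > 15 AND cnt <= 30 AND rn <= 6)
    ORDER BY Categories, IDS
""").fetchall()

# Tally how many IDs were kept per category.
per_category = {}
for _, category in sample:
    per_category[category] = per_category.get(category, 0) + 1
print(per_category)
```

Categories with more than 30 rows match neither band here and would return nothing, so a real chart would need one condition per range.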

SQL Server : index for finding latest value which is greater than a passed value

I have a table with 4 columns
USER_ID: numeric
EVENT_DATE: date
VERSION: date
SCORE: decimal
I have a clustered index on (USER_ID, EVENT_DATE, VERSION). These three values together are unique.
I need to get the maximum EventDate for a set of UserIds (~1000 different ids) where the Score is larger than a specific value and only consider those entries with a specific Version.
SELECT M.*
FROM (VALUES
( 5237 ),
………1000 more
( 27054 ) ) C (USER_ID)
CROSS APPLY
(SELECT TOP 1 C.USER_ID, M.EVENT_DATE, M.SCORE
FROM MY_HUGE_TABLE M
WHERE C.USER_ID = M.USER_ID
AND M.VERSION = 'xxxx-xx-xx'
AND M.SCORE > 2 --Comment M.SCORE > 2
ORDER BY M.EVENT_DATE DESC) M
When I execute the query, the runtime is poor, which I suppose is due to a missing index on the score column.
If I remove the filter “M.SCORE > 2”, I get my results ten times faster, but then the latest scores returned may be less than “2”.
Could anyone please give me a hint on how to set up an index that would improve my query performance?
Thank you very much in advance

For your query, the optimal index would be on (USER_ID, VERSION, EVENT_DATE desc, SCORE).
Unfortunately, your clustered index on (USER_ID, EVENT_DATE, VERSION) doesn't match. It starts with the same column, USER_ID, but then has EVENT_DATE before VERSION, and the columns need to match in order. So, only USER_ID can help, but that probably doesn't do much to filter the data.

SQL - Position in statistics

For example, I have table with columns:
playerName TEXT,
score INTEGER
And I have 10,000 rows in this table.
Now I can select e.g. top 100 players by using simple SQLite query:
SELECT playerName FROM table ORDER BY score DESC LIMIT 100
And now I have a question: how can I get the position in the statistics of player X, who is not in the top 100?
I can do this by selecting all rows and then finding the position of player X in a loop, but I don't think that performs well.
Is there a simpler way to do this in SQLite and MySQL?

You can count how many players have a score at least as high as player X's; that count is X's position:
SELECT COUNT(*)
FROM MyTable
WHERE score >= (SELECT score
FROM MyTable
WHERE playerName = 'X');
(If you want to know this for all players, a single query would be more efficient.)
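As a quick check of the counting query, the sketch below runs it on SQLite through Python's sqlite3 module. The five players and their scores are made up, and position_of is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (playerName TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("A", 50), ("B", 40), ("X", 30), ("C", 20), ("D", 10)],  # made-up data
)

def position_of(player):
    """Count the players whose score is at least the given player's score."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM MyTable
        WHERE score >= (SELECT score FROM MyTable WHERE playerName = ?)
        """,
        (player,),
    ).fetchone()
    return row[0]

print(position_of("X"))
```

Because the comparison is >=, players tied with X are counted too, so X is reported at the last position among its ties. For a name that is not in the table the inner query yields NULL and the count is 0.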
