SQL - Position in statistics

For example, I have a table with these columns:
playerName TEXT,
score INTEGER
The table has 10,000 rows.
I can select, say, the top 100 players with a simple SQLite query:
SELECT playerName FROM MyTable ORDER BY score DESC LIMIT 100
Now my question: how can I get the position in the statistics of player X, who is not in the top 100?
I could select all rows and then find player X's position in a loop, but I doubt that performs well.
Is there a simpler way to do this in SQLite and MySQL?

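You can count how many players have a score at least as high as player X's. Because of the >=, the count includes X, so it is X's 1-based position (with ties, the lowest position in the tied group):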
SELECT COUNT(*)
FROM MyTable
WHERE score >= (SELECT score
FROM MyTable
WHERE playerName = 'X');
(If you want to know this for all players, a single query would be more efficient.)
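For anyone who wants to try this, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the helper names position_of and all_positions are made up for illustration; the second helper shows one possible form of the "single query for all players" idea, using a correlated count per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (playerName TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("A", 50), ("B", 40), ("X", 30), ("C", 30), ("D", 10)],  # made-up sample data
)

def position_of(name):
    # Count the rows whose score is at least as high as the player's score.
    # The player's own row is included, so the count is a 1-based position.
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM MyTable
        WHERE score >= (SELECT score FROM MyTable WHERE playerName = ?)
        """,
        (name,),
    ).fetchone()
    return row[0]

def all_positions():
    # One statement for every player: a correlated count per row.
    return dict(conn.execute(
        """
        SELECT t.playerName,
               (SELECT COUNT(*) FROM MyTable o WHERE o.score >= t.score)
        FROM MyTable t
        """
    ).fetchall())

print(position_of("X"))
print(all_positions())
```

Note that an unknown player name yields 0, because the inner SELECT returns NULL and no row satisfies a comparison with NULL.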

Related

Optimize query to get rows with highest, filtered count in another table

I'm trying to create the most optimal query where the database would return the names of readers who often borrow sci-fi books. That's what I'm trying to optimize:
SELECT reader.name,
COUNT (CASE WHEN book.status_id = 1 AND book.category_id = 2 THEN 1 END)
FROM reader
JOIN book ON book.reader_id = reader.id
GROUP BY reader.name
ORDER BY COUNT (CASE WHEN book.status_id = 1 AND book.category_id = 2 THEN 1 END) DESC
LIMIT 10;
How can I improve my query, other than by using INNER JOIN or by increasing memory consumption?
This is my ERD diagram:

You could try adding your criteria to the join condition and then just use the total count. Whether it helps really depends on how much data you have, among other things.
SELECT reader.name,
COUNT(1) AS COUNTER
FROM reader
JOIN book ON book.reader_id = reader.id
AND book.status_id = 1
AND book.category_id = 2
GROUP BY reader.name
ORDER BY COUNTER DESC
LIMIT 10;

Assuming at least 10 readers that pass the criteria (like another answer also silently assumes), else you get fewer than 10 result rows.
Start with the filter. Aggregate & limit before joining to the second table. Much cheaper:
SELECT b.reader_id, r.surname, r.name, b.ct
FROM (
SELECT reader_id, count(*) AS ct
FROM book
WHERE status_id = 1
AND category_id = 2
GROUP BY reader_id
ORDER BY ct DESC, reader_id -- tiebreaker
LIMIT 10
) b
JOIN reader r ON r.id = b.reader_id
ORDER BY b.ct DESC, b.reader_id; -- tiebreaker
A multicolumn index on (status_id, category_id) would help a lot. Or an index on just one of the two columns, if either predicate is very selective. If the performance of this particular query is your paramount objective, create this partial index:
CREATE INDEX book_reader_special_idx ON book (reader_id)
WHERE status_id = 1 AND category_id = 2;
Typically, you'd vary the query, and then this last index is too specialized.
Additional points:
Group by reader_id, which is the primary key (I assume) and guaranteed to be unique, as opposed to reader.name! Your original is likely to return wrong results, since name looks like just the first name, judging by your ERD.
It's also typically substantially faster to group by an integer instead of varchar(25) (two times). But that's secondary, correctness comes first.
Also output surname and reader_id to disambiguate identical names. (Even name & surname together are not reliably unique.)
count(*) does exactly the same as count(1) and is a bit faster.
Add a tiebreaker to the ORDER BY clause to get a stable sort order and deterministic results. (Otherwise, the result can differ from run to run when there are ties on the count.)
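To see the aggregate-then-join pattern run end to end, here is a small sketch using Python's sqlite3 module. The table layout is an assumption reconstructed from the queries in this thread (the ERD is not shown), the sample rows are made up, and top_readers is a hypothetical helper. Two readers share the first name "Ann" on purpose, to show why grouping by reader_id rather than by name matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema, reconstructed from the queries in this thread.
    CREATE TABLE reader (id INTEGER PRIMARY KEY, name TEXT, surname TEXT);
    CREATE TABLE book (
        id          INTEGER PRIMARY KEY,
        reader_id   INTEGER REFERENCES reader(id),
        status_id   INTEGER,
        category_id INTEGER
    );
    -- Partial index covering only the rows this query cares about.
    CREATE INDEX book_reader_special_idx ON book (reader_id)
        WHERE status_id = 1 AND category_id = 2;
""")
conn.executemany(
    "INSERT INTO reader VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Ann", "Kim"), (3, "Bob", "Ray")],
)
conn.executemany(
    "INSERT INTO book (reader_id, status_id, category_id) VALUES (?, ?, ?)",
    [(1, 1, 2), (1, 1, 2),             # reader 1: two matching books
     (2, 1, 2), (2, 1, 2), (2, 1, 2),  # reader 2: three matching books
     (3, 1, 2), (3, 2, 2), (3, 1, 5)], # reader 3: only one matches the filter
)

def top_readers(limit):
    # Filter, aggregate and limit on book first, then join the few
    # surviving rows to reader.
    return conn.execute(
        """
        SELECT b.reader_id, r.surname, r.name, b.ct
        FROM (
            SELECT reader_id, count(*) AS ct
            FROM book
            WHERE status_id = 1 AND category_id = 2
            GROUP BY reader_id
            ORDER BY ct DESC, reader_id
            LIMIT ?
        ) b
        JOIN reader r ON r.id = b.reader_id
        ORDER BY b.ct DESC, b.reader_id
        """,
        (limit,),
    ).fetchall()

print(top_readers(2))
```

Grouping by name instead would have merged the two Anns into a single row with a count of 5.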

Need help understanding what is going on in this subquery

I'm new to SQL and while I understand the basics, I'm trying to improve my skills by practicing on LeetCode. I came across the below problem and I'm trying to understand what is going on in the solution as I can't wrap my head around it:
Table: Scores
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| score | decimal |
+-------------+---------+
Problem: Write an SQL query to rank the scores. The ranking should be calculated according to the following rules:
The scores should be ranked from the highest to the lowest.
If there is a tie between two scores, both should have the same ranking.
After a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no holes between ranks.
Return the result table ordered by score in descending order.
Solution:
SELECT
Score,
(SELECT count(distinct Score) FROM Scores WHERE Score >= s.Score) Rank
FROM Scores s
ORDER BY Score desc
What's going on in the sub-query? Could someone explain what it's doing by breaking it down? Any resources I could reference to better understand?

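The best way is to test it with different values.
In plain English: for each row R1 in the Scores table, the subquery counts the distinct scores that are greater than or equal to R1.score. That count includes R1's own score, so it equals the number of distinct higher scores plus one, which is exactly the rank without gaps.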

I believe this is what you want, if not, I'm sorry:
SELECT
s.Score -- the score of the current row of the outer query
, (
SELECT
count(distinct Score) -- count the unique (distinct) scores that pass the filter below
FROM
Scores
WHERE
Score >= s.Score -- keep only rows whose score is greater than or equal to the current outer row's score
) Rank -- alias for the column produced by the subquery
FROM
Scores s -- read every row of table Scores
ORDER BY
Score desc -- sort rows from the highest to the lowest score
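If it helps to watch the subquery work on concrete values, here is a minimal sketch using Python's sqlite3 module. The sample scores are made up, and the alias score_rank is my own choice to avoid any clash with RANK in engines where that word is reserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO Scores (score) VALUES (?)",
    [(3.5,), (3.65,), (4.0,), (3.85,), (4.0,), (3.65,)],  # made-up sample data
)

# For each outer row s, the inner query counts the distinct scores that are
# greater than or equal to s.score. The highest score sees only itself (1),
# the next distinct score sees two values (2), and so on, so ties share a
# rank and there are no gaps.
ranked = conn.execute(
    """
    SELECT s.score,
           (SELECT COUNT(DISTINCT i.score)
            FROM Scores i
            WHERE i.score >= s.score) AS score_rank
    FROM Scores s
    ORDER BY s.score DESC
    """
).fetchall()

for score, score_rank in ranked:
    print(score, score_rank)
```

Engines that support window functions offer DENSE_RANK() for the same numbering; the correlated subquery is the variant that works without them.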

SQL Server : index for finding latest value which is greater than a passed value

I have a table with 4 columns
USER_ID: numeric
EVENT_DATE: date
VERSION: date
SCORE: decimal
I have a clustered index on (USER_ID, EVENT_DATE, VERSION). These three values together are unique.
I need to get the maximum EventDate for a set of UserIds (~1000 different ids) where the Score is larger than a specific value and only consider those entries with a specific Version.
SELECT M.*
FROM (VALUES
( 5237 ),
………1000 more
( 27054 ) ) C (USER_ID)
CROSS APPLY
(SELECT TOP 1 C.USER_ID, M.EVENT_DATE, M.SCORE
FROM MY_HUGE_TABLE M
WHERE C.USER_ID = M.USER_ID
AND M.VERSION = 'xxxx-xx-xx'
AND M.SCORE > 2 --Comment M.SCORE > 2
ORDER BY M.EVENT_DATE DESC) M
When I execute the query, the runtime is poor, which I suppose is due to a missing index on the SCORE column.
If I remove the filter "M.SCORE > 2", I get my results ten times faster, but then the latest scores returned may be less than 2.
Could anyone please give me a hint on how to set up an index that would improve the performance of this query?
Thank you very much in advance.

For your query, the optimal index would be on (USER_ID, VERSION, EVENT_DATE DESC, SCORE).
Unfortunately, your clustered index doesn't match. Only the first and third columns match, but they need to match in order. So only USER_ID can help, and that probably doesn't do much to filter the data.
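As a rough illustration of that column order, here is a sketch in SQLite via Python's sqlite3 module. This is an analogue, not SQL Server: the index name is hypothetical, the sample rows are made up, and the CROSS APPLY over the id list is replaced by calling a hypothetical helper once per user id. The equality columns (user_id, version) come first, then the sort column, then the range-filtered score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_huge_table (
        user_id    INTEGER,
        event_date TEXT,
        version    TEXT,
        score      REAL,
        PRIMARY KEY (user_id, event_date, version)
    );
    -- Hypothetical index name; equality columns first, then the ORDER BY
    -- column, then the column used in the range filter.
    CREATE INDEX ix_user_version_date_score
        ON my_huge_table (user_id, version, event_date DESC, score);
""")
conn.executemany(
    "INSERT INTO my_huge_table VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", "v1", 5.0),
        (1, "2020-02-01", "v1", 3.0),
        (1, "2020-03-01", "v1", 1.0),  # latest row, but score is too low
        (1, "2020-04-01", "v2", 9.0),  # different version
        (2, "2020-01-01", "v1", 2.0),  # score is not greater than 2
    ],
)

def latest_event(user_id, version, min_score):
    # Latest event for one user and version whose score exceeds min_score.
    return conn.execute(
        """
        SELECT event_date, score
        FROM my_huge_table
        WHERE user_id = ? AND version = ? AND score > ?
        ORDER BY event_date DESC
        LIMIT 1
        """,
        (user_id, version, min_score),
    ).fetchone()

for uid in (1, 2):
    print(uid, latest_event(uid, "v1", 2))
```

Whether the optimizer actually uses such an index depends on the engine and the data, so it is worth checking the execution plan on the real table.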

how can I calculate the sum of my top n records in crystal report?

I'm using Report tab -> Group Sort Expert -> Top N to get the top N records, but the report footer shows the sum of the value for all records.
I want only the sum of the value for the top N records.
In the image below I have selected the top 3 records, but it gives the sum of all records.

The Group Sort Expert (and the Record Sort Expert too) acts on your final result after the total summary has been calculated. It cannot filter out rows, in the same way that an ORDER BY clause in SQL cannot affect a SELECT's count (that is a job for the WHERE clause). As a result, your summary will always be computed over all rows of your detail section and, of course, over all your group sums.
If you have a specific way in mind to exclude rows so that the appropriate sum appears, you can use the Select Expert of Crystal Reports to remove those rows.
Alternatively (and I believe this is the best way), I would do all the necessary calculations in the SQL command and send only the top 3 group sums to the report (then you can get what you want with a simple total summary of these 3 records).
Something like this:
CREATE TABLE #TEMP
(
DEP_NAME varchar(50),
MINVAL int,
RMAVAL int,
NETVAL int
)
INSERT INTO #TEMP
SELECT TOP 3
T.DEP_NAME ,T.MINVAL,T.RMAVAL,T.NETVAL
FROM
(SELECT DEP_NAME AS DEP_NAME,SUM(MINVAL) AS MINVAL,SUM(RMAVAL) AS
RMAVAL,SUM(NETVAL) AS NETVAL
FROM YOURTABLE
GROUP BY DEP_NAME) AS T
ORDER BY MINVAL DESC
SELECT * FROM #TEMP
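To check the idea outside of Crystal Reports, here is a small sketch using Python's sqlite3 module. It is an SQLite analogue of the T-SQL above (LIMIT instead of TOP, no temp table), reduced to one value column; the sample rows and the alias total_minval are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YOURTABLE (DEP_NAME TEXT, MINVAL INTEGER)")
conn.executemany(
    "INSERT INTO YOURTABLE VALUES (?, ?)",
    [("A", 10), ("A", 5), ("B", 20), ("C", 1), ("D", 7), ("D", 2)],  # made-up data
)

# Group sums first, then keep only the three largest groups.
TOP3_SQL = """
    SELECT DEP_NAME, SUM(MINVAL) AS total_minval
    FROM YOURTABLE
    GROUP BY DEP_NAME
    ORDER BY total_minval DESC
    LIMIT 3
"""

top3 = conn.execute(TOP3_SQL).fetchall()

# Summing over the limited result gives the "sum of the top N" figure,
# which differs from the grand total over all rows.
top3_total = conn.execute(
    f"SELECT SUM(total_minval) FROM ({TOP3_SQL})"
).fetchone()[0]
grand_total = conn.execute("SELECT SUM(MINVAL) FROM YOURTABLE").fetchone()[0]

print(top3, top3_total, grand_total)
```

The report then only ever sees the three rows, so a plain summary in the footer adds up just those.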

MySQL querying with a dynamic range?

Given the table snippet:
id | name | age
I am trying to form a query that will return 10 people within a certain age range. However, if there are not enough people in that range, I want to extend the range until I can find 10 people.
For instance, if I only find 5 people in a range of 30-40, I would find 5 others in a 25-45 range.
In addition, I would like the query to be able use order by RAND() or similar, in order to be able to get different results each time.
Is this going beyond what MySQL can handle? Will I have to put some of this logic in the application instead?

UPDATED for performance:
My original solution worked but required a table scan. Am's solution is a good one and doesn't require a table scan, but its hard-coded ranges won't work when the only matches are far outliers. Plus it requires de-duping records. But combining both solutions can get you the best of both worlds, provided you have an index on age. (If you don't have an index on age, then all solutions will require a table scan.)
The combined solution first picks only the rows which might qualify (the desired range, plus the 10 rows over and 10 rows under that range), and then uses my original logic to rank the results. Caveat: I don't have enough sample data present to verify that MySQL's optimizer is indeed smart enough to avoid a table scan here-- MySQL might not be smart enough to weave those three UNIONs together without a scan.
[just updated again to fix 2 embarrassing SQL typos: DESC where DESC shouldn't have been!]
SELECT * FROM
(
SELECT id, name, age,
CASE WHEN age BETWEEN 25 and 35 THEN RAND() ELSE ABS(age - 30) END as distance
FROM
(
SELECT * FROM (SELECT * FROM Person WHERE age > 35 ORDER BY age LIMIT 10) u1 -- the 10 rows just above the range
UNION
SELECT * FROM (SELECT * FROM Person WHERE age < 25 ORDER BY age DESC LIMIT 10) u2 -- the 10 rows just below the range
UNION
SELECT * FROM (SELECT * FROM Person WHERE age BETWEEN 25 and 35) u3 -- everyone inside the range
) p2
ORDER BY distance
LIMIT 10
) p ORDER BY RAND() ;
Original Solution:
I'd approach it this way:
first, compute how far each record is from the center of the desired age range, and order the results by that distance. For all results inside the range, treat the distance as a random number between zero and one. This ensures that records inside the range will be selected in a random order, while records outside the range, if needed, will be selected in order closest to the desired range.
trim the number of records in that distance-ordered resultset to 10 records
randomize order of the resulting records
Like this:
CREATE TABLE Person (id int AUTO_INCREMENT PRIMARY KEY, name varchar(50) NOT NULL, age int NOT NULL);
INSERT INTO Person (name, age) VALUES ("Joe Smith", 26);
INSERT INTO Person (name, age) VALUES ("Frank Johnson", 32);
INSERT INTO Person (name, age) VALUES ("Sue Jones", 24);
INSERT INTO Person (name, age) VALUES ("Ella Frederick", 44);
SELECT * FROM
(
SELECT id, name, age,
CASE WHEN age BETWEEN 25 and 35 THEN RAND() ELSE ABS (age-30) END as distance
FROM Person
ORDER BY distance
LIMIT 10
) p ORDER BY RAND() ;
Note that I'm assuming that, if there are not enough records inside the range, the records you want to append are the ones closest to that range. If this assumption is incorrect, please add more details to the question.
re: performance, this requires a scan through the table, so won't be fast-- I'm working on a scan-less solution now...
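For a quick way to experiment with the distance-ordering idea, here is a sketch using Python's sqlite3 module. SQLite has no RAND(), so the sketch registers Python's random.random under that name; that registration, the sample ages and the helper name pick are all assumptions made for illustration. It uses the simple full-scan form of the query, ordering by ascending distance.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no RAND(); expose Python's random.random() under that name.
conn.create_function("RAND", 0, random.random)
conn.execute(
    "CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER NOT NULL)"
)
ages = [26, 28, 30, 33, 35, 24, 36, 23, 37, 22, 39, 50, 60, 10]  # made-up data
conn.executemany(
    "INSERT INTO Person (name, age) VALUES (?, ?)",
    [(f"p{a}", a) for a in ages],
)

def pick(low, high, n=10):
    center = (low + high) / 2
    # Rows inside the range get a random distance below 1, so they always
    # sort ahead of outsiders, whose distance is how far they sit from the
    # center of the range. The outer ORDER BY RAND() shuffles the winners.
    return conn.execute(
        """
        SELECT id, name, age FROM (
            SELECT id, name, age,
                   CASE WHEN age BETWEEN :low AND :high THEN RAND()
                        ELSE ABS(age - :center) END AS distance
            FROM Person
            ORDER BY distance
            LIMIT :n
        ) p
        ORDER BY RAND()
        """,
        {"low": low, "high": high, "center": center, "n": n},
    ).fetchall()

people = pick(25, 35)
print(sorted(age for _, _, age in people))
```

With this data only five people fall inside 25 to 35, so the remaining five slots go to the nearest outsiders (24, 36, 23, 37, 22), while the order of the returned rows changes from run to run.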

I would do something like this:
select * from (
SELECT * FROM (select * from ppl_table where age>30 and age<40 order by rand() limit 10) as Momo1
union
SELECT * FROM (select * from ppl_table where age>25 and age<40 order by rand() limit 20) as Momo2
) as FinalMomo
limit 10
Basically, this selects 10 users from the first group and then more from the second group.
If the first group doesn't add up to 10, the rest will come from the second group.
The reason we are selecting 20 from the second group is that UNION removes duplicate rows, and you want at least 10 users in the final result.
Edit
I added the AS aliases to the inner SELECTs and wrapped each one in its own derived table, since MySQL doesn't like ORDER BY with UNION.
