What order is used by First() function? - sql

Why do the following two queries return identical results?
SELECT FIRST(score) FROM (SELECT score FROM scores ORDER BY score ASC)
SELECT FIRST(score) FROM (SELECT score FROM scores ORDER BY score DESC)
It's confusing, considering that I explicitly specify the order in the subqueries.

The order of the results in the subquery is irrelevant, unless you use TOP within the subquery, which you don't here. Most SQL variants won't allow this syntax -- using an ORDER BY in a subquery throws an error in SQL Server, for example.
Your top-level query has no ORDER BY, thus the concepts of FIRST or TOP 1 are undefined in the context of that query.
In the reference docs, Microsoft states (emphasis mine):
Because records are usually returned in no particular order (unless
the query includes an ORDER BY clause), the records returned by these
functions will be arbitrary.

To answer the question directly:
Access ignores the ORDER BY clause in most subqueries. I believe (but can't prove) this is due to bugs/limitations in the query optimiser, although it's not documented anywhere (that I could find). I've tested a lot of SQL in Access 2007 and Access 2016 to come to this conclusion.
To make the examples work as expected:
Add TOP 100 PERCENT to the subqueries:
SELECT FIRST(score) FROM (SELECT TOP 100 PERCENT score FROM scores ORDER BY score ASC)
SELECT FIRST(score) FROM (SELECT TOP 100 PERCENT score FROM scores ORDER BY score DESC)
When to use First/Last instead of Max/Min:
A good example of when you'd want to use this approach instead of the simpler Min and Max aggregate functions is when there's another field that you want from the same record. For example, if the underlying scores table also held the names of players and the rounds of the game, you could get the name and score of the best and worst player in each round like this:
SELECT
round, FIRST(name) AS best, FIRST(score) AS highscore, LAST(name) AS worst, LAST(score) AS lowscore
FROM
(SELECT TOP 100 PERCENT * FROM scores ORDER BY score DESC)
GROUP BY
round

Your statements are perfect functional equivalents of
SELECT Min(Score) FROM Scores and
SELECT Max(Score) FROM Scores.
If you really want to retrieve the first and last score, you will need an AutoNumber or a DateTime field to indicate the input order. You could then query:
SELECT First(Score), Last(Score) FROM Scores ORDER BY MySortKey
If you persist with your question, the correct syntax would be
SELECT FIRST(score) FROM (SELECT score FROM scores) ORDER BY score ASC,
or, simplified,
SELECT FIRST(score) FROM scores ORDER BY score ASC

Related

How to select the row with the lowest value - oracle

I have a table where I save authors and songs, with other columns. The same song can appear multiple times, and it obviously always comes from the same author. I would like to select the author that has the least songs, including the repeated ones, aka the one that is listened to the least.
The final table should show only one author name.
Clearly, one step is to find the count for every author. This can be done with an elementary aggregate query. Then, if you order by that count, you can just select the first row, which solves your problem. One approach is to use ROWNUM in an outer query. This is a very elementary approach, quite efficient, and it works in all versions of Oracle (it doesn't use any advanced features).
select author
from (
select author
from your_table
group by author
order by count(*)
)
where rownum = 1
;
Note that in the subquery we don't need to select the count (since we don't need it in the output). We can still use it in order by in the subquery, which is all we need it for.
The only tricky part here is to remember that you need to order the rows in the subquery, and then apply the ROWNUM filter in the outer query. This is because ORDER BY is the very last thing that is processed in any query - it comes after ROWNUM is assigned to rows in the output. So, moving the WHERE clause into the subquery (and doing everything in a single query, instead of a subquery and an outer query) does not work.
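For illustration (a sketch using the same your_table as above), the single-query version would look like this; it returns the author of whatever row happens to be fetched first, not the least-listened author, because the ROWNUM filter is applied before the grouping and the ordering:
select author
from your_table
where rownum = 1
group by author
order by count(*)
;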
You can use analytic functions as follows. The inner query counts every row (including repeats) per author; the outer row_number then keeps a single row for the author with the smallest count:
Select author from
(Select t.*,
Row_number() over (order by cnt_author) as rn
From
(Select t.*,
Count(*) over (partition by author) as cnt_author
From your_table t) t ) t
Where rn = 1

Calculating SQL Server ROW_NUMBER() OVER() for a derived table

In some other databases (e.g. DB2, or Oracle with ROWNUM), I can omit the ORDER BY clause in a ranking function's OVER() clause. For instance:
ROW_NUMBER() OVER()
This is particularly useful when used with ordered derived tables, such as:
SELECT t.*, ROW_NUMBER() OVER()
FROM (
SELECT ...
ORDER BY
) t
How can this be emulated in SQL Server? I've found people using this trick, but that's wrong, as it will behave non-deterministically with respect to the order from the derived table:
-- This order here ---------------------vvvvvvvv
SELECT t.*, ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM (
SELECT TOP 100 PERCENT ...
-- vvvvv ----redefines this order here
ORDER BY
) t
A concrete example (as can be seen on SQLFiddle):
SELECT v, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM (
SELECT TOP 100 PERCENT 1 UNION ALL
SELECT TOP 100 PERCENT 2 UNION ALL
SELECT TOP 100 PERCENT 3 UNION ALL
SELECT TOP 100 PERCENT 4
-- This descending order is not maintained in the outer query
ORDER BY 1 DESC
) t(v)
Also, I cannot reuse any expression from the derived table to reproduce the ORDER BY clause in my case, as the derived table might not be available as it may be provided by some external logic.
So how can I do it? Can I do it at all?
The Row_Number() OVER (ORDER BY (SELECT 1)) trick should NOT be seen as a way to avoid changing the order of underlying data. It is only a means to avoid causing the server to perform an additional and unneeded sort (it may still perform the sort but it's going to cost the minimum amount possible when compared to sorting by a column).
All queries in SQL server ABSOLUTELY MUST have an ORDER BY clause in the outermost query for the results to be reliably ordered in a guaranteed way.
The concept of "retaining original order" does not exist in relational databases. Tables and queries must always be considered unordered until and unless an ORDER BY clause is specified in the outermost query.
You could try the same unordered query 100,000 times and always receive it with the same ordering, and thus come to believe you can rely on said ordering. But that would be a mistake, because one day, something will change and it will not have the order you expect. One example is when a database is upgraded to a new version of SQL Server--this has caused many a query to change its ordering. But it doesn't have to be that big a change. Something as little as adding or removing an index can cause differences. And more: Installing a service pack. Partitioning a table. Creating an indexed view that includes the table in question. Reaching some tipping point where a scan is chosen instead of a seek. And so on.
Do not rely on results to be ordered unless you have said "Server, ORDER BY".
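For the concrete example above, one reliable approach is to state the intended order explicitly, both in the OVER() clause and in the outermost query (this assumes the ordering expression is available to you, which the question notes may not always be the case):
SELECT v, ROW_NUMBER() OVER (ORDER BY v DESC) AS RN
FROM (
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4
) t(v)
ORDER BY v DESC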

SQL random aggregate

Say I have a simple table with 3 fields: 'place', 'user' and 'bytes'. Let's say, that under some filter, I want to group by 'place', and for each 'place', to sum all the bytes for that place, and randomly select a user for that place (uniformly from all the users that fit the 'where' filter and the relevant 'place'). If there was a "select randomly from" aggregate function, I would do:
SELECT place, SUM(bytes), SELECT_AT_RANDOM(user) WHERE .... GROUP BY place;
...but I couldn't find such an aggregate function. Am I missing something? What could be a good way to achieve this?
If your RDBMS supports analytical functions:
WITH T
AS (SELECT place,
Sum(bytes) OVER (PARTITION BY place) AS Sum_bytes,
user,
Row_number() OVER (PARTITION BY place ORDER BY random_function()) AS RN
FROM YourTable
WHERE .... )
SELECT place,
Sum_bytes,
user
FROM T
WHERE RN = 1;
For SQL Server, Crypt_gen_random(4) or NEWID() would be examples of something that could be substituted in for random_function().
I think your question is DBMS specific. If your DBMS is MySql, you can use a solution like this:
SELECT place_rand.place, SUM(place_rand.bytes), place_rand.user as random_user
FROM
(SELECT place, bytes, user
FROM place
WHERE ...
ORDER BY rand()) place_rand
GROUP BY
place_rand.place;
The subquery orders records in random order. The outer query groups by place, sums bytes, and returns the first (random) user, since user is neither in an aggregate function nor in the GROUP BY clause.
With a custom aggregate function, you could write expressions as simple as:
SELECT place, SUM(bytes), SELECT_AT_RANDOM(user) WHERE .... GROUP BY place;
SELECT_AT_RANDOM would be the custom aggregate function.
Here is precisely an implementation in PostgreSQL.
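The linked code isn't reproduced here, but as a rough sketch (function and aggregate names assumed), such a custom aggregate in PostgreSQL could collect each group's values into an array and pick one element at random in the final step:
-- Final function: return a random element of the accumulated array.
CREATE FUNCTION pick_random_final(anyarray) RETURNS anyelement AS $$
SELECT ($1)[floor(random() * array_length($1, 1))::int + 1];
$$ LANGUAGE sql;
CREATE AGGREGATE select_at_random(anyelement) (
SFUNC = array_append,  -- append each input value to the state array
STYPE = anyarray,
FINALFUNC = pick_random_final
);
-- Usage: SELECT place, SUM(bytes), select_at_random("user") FROM ... GROUP BY place;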
I would do a bit of a variation on Martin's solution:
select place, sum(bytes), max(case when seqnum = 1 then user end) as random_user
from (select place, bytes,
row_number() over (partition by place order by newid()) as seqnum
from t
) t
group by place
(Where newid() is just one way to get a random number, depending on the database.)
For some reason, I prefer this approach, because it still has the aggregation function in the outer query. If you are summarizing a bunch of fields, then this seems cleaner to me.
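For instance, a sketch of the same pattern with a couple of extra aggregates alongside the random user (column names assumed):
select place,
sum(bytes) as total_bytes,
count(*) as num_rows,
max(case when seqnum = 1 then user end) as random_user
from (select place, bytes, user,
row_number() over (partition by place order by newid()) as seqnum
from t
) t
group by place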

Using a SELECT statement within a WHERE clause

SELECT * FROM ScoresTable WHERE Score =
(SELECT MAX(Score) FROM ScoresTable AS st WHERE st.Date = ScoresTable.Date)
Is there a name to describe using a SELECT statement within a WHERE clause? Is this good/bad practice?
Would this be a better alternative?
SELECT ScoresTable.*
FROM ScoresTable INNER JOIN
(SELECT Date, MAX(Score) AS MaxScore
FROM ScoresTable GROUP BY Date) SubQuery
ON ScoresTable.Date = SubQuery.Date
AND ScoresTable.Score = SubQuery.MaxScore
It is far less elegant, but appears to run more quickly than my previous version. I dislike it because it is not displayed very clearly in the GUI (and it needs to be understood by SQL beginners). I could split it into two separate queries, but then things begin to get cluttered...
N.B. I need more than just Date and Score (e.g. name)
It's called a correlated subquery. It has its uses.
It's not bad practice at all. They are usually referred to as a SUBQUERY, SUBSELECT or NESTED QUERY.
It's a relatively expensive operation, but it's quite common to encounter a lot of subqueries when dealing with databases, since it's the only way to perform certain kinds of operations on data.
There's a much better way to achieve your desired result, using SQL Server's analytic (or windowing) functions.
SELECT DISTINCT Date, MAX(Score) OVER(PARTITION BY Date) FROM ScoresTable
If you need more than just the date and max score combinations, you can use ranking functions, eg:
SELECT *
FROM ScoresTable t
JOIN (
SELECT
ScoreId,
ROW_NUMBER() OVER (PARTITION BY Date ORDER BY Score DESC) AS [Rank]
FROM ScoresTable
) window ON window.ScoreId = t.ScoreId AND window.[Rank] = 1
You may want to use RANK() instead of ROW_NUMBER() if you want multiple records to be returned if they both share the same MAX(Score).
The principle of subqueries is not at all bad, but I don't think that you should use it in your example. If I understand correctly you want to get the maximum score for each date. In this case you should use a GROUP BY.
This is a correlated sub-query.
(It is a "nested" query - though this is a very non-technical term)
The inner query takes values from the outer-query (WHERE st.Date = ScoresTable.Date) thus it is evaluated once for each row in the outer query.
There is also a non-correlated form in which the inner query is independent and as such is only executed once.
e.g.
SELECT * FROM ScoresTable WHERE Score =
(SELECT MAX(Score) FROM Scores)
There is nothing wrong with using subqueries, except where they are not needed :)
Your statement may be rewritable as an aggregate function depending on what columns you require in your select statement.
SELECT Max(score), Date FROM ScoresTable
Group By Date
In your scenario, why not use GROUP BY and a HAVING clause instead of joining the table to itself? You may also use other useful functions.
Subquery is the name.
At times it's required, but good/bad depends on how it's applied.

How can I optimize this query?

I've got a bit of a nasty query with several subselects that are really slowing it down. I'm already caching the query, but the results of it changes often and the query results are meant to be shown on a high traffic page.
SELECT user_id, user_id AS uid, (SELECT correct_words
FROM score
WHERE user_id = `uid`
ORDER BY correct_words DESC, incorrect_words ASC
LIMIT 0, 1) AS correct_words,
(SELECT incorrect_words
FROM score
WHERE user_id = `uid`
ORDER BY correct_words DESC, incorrect_words ASC
LIMIT 0, 1) AS incorrect_words
FROM score
WHERE user_id > 0
AND DATE(date_tested) = DATE(NOW())
GROUP BY user_id
ORDER BY correct_words DESC,incorrect_words ASC
LIMIT 0,7
The goal of the query is to pick out the top score for users for that day, but only show the highest scoring instance of that user instead of all of their scores (So, for instance, if one user actually had 4 of the top 10 scores for that day, I only want to show that user's top score and remove the rest)
Try as I might, I've yet to replicate the results of this query any other way. Right now its average run time is about 2 seconds, but I'm afraid that might increase greatly as the table gets bigger.
Any thoughts?
try this:
The subquery basically returns the resultset of all the scores in the right order, and the outer query greps out the first occurrence per user. When grouping in MySQL, columns that are not grouped on return the equivalent of FIRST(column): the value from the first occurrence.
SELECT user_id, correct_words, incorrect_words
FROM
( SELECT user_id, correct_words, incorrect_words
FROM score
WHERE user_id>0
AND DATE(date_tested)=DATE(NOW())
ORDER BY correct_words DESC,incorrect_words ASC
) AS t
GROUP BY user_id
LIMIT 0,7
The subqueries for correct_words and incorrect_words could be really killing your performance. In the worst case, MySQL has to execute those queries for each row it considers (not each row that it returns!). Rather than using scalar subqueries, consider rewriting your query to use JOIN-variants as appropriate.
Additionally, filtering by DATE(date_tested)=DATE(NOW()) may be preventing MySQL from using an index. I don't believe any of the production versions of MySQL allow function-based indices.
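If that applies here, a sargable rewrite of that filter (a sketch in MySQL syntax) compares the column against a date range instead of wrapping it in a function, so an index on date_tested can be used:
WHERE user_id > 0
AND date_tested >= CURDATE()
AND date_tested < CURDATE() + INTERVAL 1 DAY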
Make sure you have indices on all the columns you filter and order by. MySQL can make use of multi-column indices if the columns filtered or ordered by match your query, e.g. CREATE INDEX score_correct_incorrect_idx ON score ( correct_words DESC, incorrect_words ASC ); would be a candidate index, though MySQL may choose not to use it depending on the execution plan it creates and its estimates of table sizes.