SQL ORDER BY in SQL table returning function - sql

I have a simple function that gets two fields from the database. I'm trying to order the results, but I cannot use ORDER BY in the RETURN clause.
It tells me
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
Is it possible to use ORDER BY in the RETURN statement? I would like to avoid using ORDER BY when executing the function.
CREATE FUNCTION goalsGames1 () RETURNS TABLE
AS RETURN(
SELECT MAX(goals_scored) goals,
no_games
FROM Player
GROUP BY no_games
ORDER BY no_games DESC )

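One trick to suppress this error is to use TOP, as mentioned in the error message. Note that this only makes the function compile; it does not guarantee the order of the rows the function returns: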
CREATE FUNCTION goalsGames1 () RETURNS TABLE
AS RETURN(
SELECT Top 100 Percent MAX(goals_scored) goals,
no_games
FROM Player
GROUP BY no_games
ORDER BY no_games DESC )

I would like to avoid using order by when executing the function.
If you are using the function and want the results in a particular order, then you need to use ORDER BY.
This is quite clearly stated in the documentation:
The ORDER BY clause does not guarantee ordered results when these constructs are queried, unless ORDER BY is also specified in the query itself.

Use ORDER BY when you select from the function, not when you create it.
So use it here: select * from goalsGames1() order by col
The error message itself lists the places where ORDER BY is invalid.

You cannot use ORDER BY inside a function. The idea is to order the result set returned by the function.
select *
from dbo.goalsGames1()
order by no_games
Even if you could order inside the function, there is no guarantee that this ordering would be preserved when the result set is returned. The executing query (select * from functionname) has to be responsible for setting the order, not the function or view.
Whoever receives the rows is the only one that can order them, so in this case the select * from goalsGames1() is the receiver, and this query has to order the results.
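The point about the receiver owning the order can be tried out with a small sketch. It uses Python's built-in sqlite3 rather than SQL Server, so a view stands in for the inline table-valued function (SQLite has no such functions), and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; the view plays the role of goalsGames1()
# and deliberately has no ORDER BY of its own.
conn.executescript("""
    CREATE TABLE Player (goals_scored INTEGER, no_games INTEGER);
    INSERT INTO Player VALUES (3, 10), (7, 10), (2, 25), (5, 25), (1, 4);
    CREATE VIEW goalsGames1 AS
        SELECT MAX(goals_scored) AS goals, no_games
        FROM Player
        GROUP BY no_games;
""")

# The caller decides the order of the rows it receives.
rows = conn.execute(
    "SELECT goals, no_games FROM goalsGames1 ORDER BY no_games DESC"
).fetchall()
print(rows)
```

The same shape applies to the original function: leave the ORDER BY out of the definition and put it on the SELECT that calls it.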

Related

Adding ORDER BY gives error "must appear in the group by clause or be used in an aggregate function"

I want to add an ORDER BY on car_max_price to my query, but I don't want to add it to the GROUP BY.
How to fix the must appear in the group by clause or be used in an aggregate function error? Any idea of what's wrong?
subquery = (
session.query(
garage.id,
garage.name,
func.max(func.coalesce(car.max_price, 0)).label("car_max_price"),
func.jsonb_build_object(
text("'type'"),
car.type,
text("'color'"),
car.color,
text("'max_price'"),
func.max(car.max_price),
).label("some_cars"),
)
.group_by(
garage,
car.type,
)
.subquery("subquery")
)
query = (
session.query(
func.jsonb_build_object(
text("'id'"),
subquery.c.id,
text("'name'"),
subquery.c.name,
text("'some_cars'"),
func.jsonb_agg(
func.distinct(subquery.c.some_cars),
).label("some_cars"),
)
.select_from(subquery)
.group_by(
subquery.c.id,
subquery.c.name,
)
.order_by(
subquery.c.car_max_price
)
)
return query
When aggregating, you cannot order by an un-aggregated column. There may be any number of different values in the same aggregated group of rows for it. So you have to define what to use exactly (by way of another aggregate function).
Order by min(subquery.c.car_max_price) or max(subquery.c.car_max_price) or avg(subquery.c.car_max_price) or whatever you actually need.
There is one exception to this rule: if the PRIMARY KEY of a table is listed in the GROUP BY clause, that covers all columns of that table. See:
PostgreSQL - GROUP BY clause
But while operating on a derived table (subquery) that exception cannot apply. A derived table cannot have a PK constraint.
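A minimal sketch of ordering by an aggregate of the derived table's column, written against Python's built-in sqlite3 with plain SQL instead of SQLAlchemy and PostgreSQL. The table layout and rows are assumptions made up for the example. SQLite is more lenient than PostgreSQL about bare columns in aggregate queries, so this only demonstrates the fixed form, not the original error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the garage/car tables.
conn.executescript("""
    CREATE TABLE car (garage_id INTEGER, type TEXT, max_price INTEGER);
    INSERT INTO car VALUES
        (1, 'suv', 300), (1, 'suv', 250), (1, 'van', 100),
        (2, 'suv', 200), (2, 'van', 150);
""")

rows = conn.execute("""
    SELECT sub.garage_id, COUNT(*) AS n_types
    FROM (
        -- inner level: one row per garage and car type
        SELECT garage_id, type, MAX(max_price) AS car_max_price
        FROM car
        GROUP BY garage_id, type
    ) AS sub
    GROUP BY sub.garage_id
    -- car_max_price is not in the GROUP BY, so it is aggregated here
    ORDER BY MAX(sub.car_max_price)
""").fetchall()
print(rows)
```

In the SQLAlchemy query above, the equivalent change would be along the lines of .order_by(func.max(subquery.c.car_max_price)).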

Can't use ORDER BY in a derived table

I am trying to select the last 20 rows of my SQL Database, but I get this error:
[Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
My query is:
SELECT TOP 20 * FROM (SELECT * FROM TBArticles ORDER BY id_art DESC)
I think it's because I am using ORDER BY in this second expression... but what can I do to select the 20 last rows fixing this error?
You don't need a subquery for this:
SELECT TOP 20 *
FROM TBArticles
ORDER BY id_art DESC
The documentation is quite clear on the use of ORDER BY in subqueries:
The ORDER BY clause is not valid in views, inline functions, derived tables, and subqueries, unless either the TOP or OFFSET and FETCH clauses are also specified. When ORDER BY is used in these objects, the clause is used only to determine the rows returned by the TOP clause or OFFSET and FETCH clauses. The ORDER BY clause does not guarantee ordered results when these constructs are queried, unless ORDER BY is also specified in the query itself.
Gordon's answer is probably the most direct way to handle your requirement. However, if you wanted to use a query along the same lines as the pattern you were already using, you could use ROW_NUMBER here:
SELECT *
FROM
(
SELECT *, ROW_NUMBER() OVER (ORDER BY id_art DESC) rn
FROM TBArticles
) t
WHERE rn <= 20;
By computing a row number in the derived table, the ordering "sticks" in the same way your original query was expecting.
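Both approaches can be compared in a small sketch using Python's built-in sqlite3. SQLite has no TOP, so LIMIT is used in its place, and the 50 sample rows are invented. The outer ORDER BY rn is added so that the output order itself is also guaranteed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBArticles (id_art INTEGER PRIMARY KEY, title TEXT)")
# Hypothetical sample data: 50 articles with ids 1..50.
conn.executemany(
    "INSERT INTO TBArticles VALUES (?, ?)",
    [(i, f"article {i}") for i in range(1, 51)],
)

# Direct form: ORDER BY plus a row limit on the outermost query.
direct = conn.execute(
    "SELECT id_art FROM TBArticles ORDER BY id_art DESC LIMIT 20"
).fetchall()

# Derived-table form: number the rows, then filter on the number.
numbered = conn.execute("""
    SELECT id_art
    FROM (
        SELECT id_art, ROW_NUMBER() OVER (ORDER BY id_art DESC) AS rn
        FROM TBArticles
    ) AS t
    WHERE rn <= 20
    ORDER BY rn
""").fetchall()

print(direct == numbered)
```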

How to call a function using the results of a query ordered

I'm trying to call a function on each of the values that fit my query ordered by date. The reason being that the (black box) function is internally aggregating values into a string and I need the aggregated values to be in the timestamp order.
The things that I have tried are below: (function returns a boolean value and does blackbox things that I do not know and cannot modify)
-- This doesn't work
SELECT
bool_and (
function(MT.id)
)
FROM my_table MT
WHERE ...conditions...
ORDER BY MT.timestamp_value, MT.id
and got the error column "mt.timestamp_value" must appear in the GROUP BY clause or be used in an aggregate function. If I remove the ORDER BY, as below, it works:
-- This works!
SELECT
bool_and (
function(MT.id)
)
FROM my_table MT
WHERE ...conditions...
I also tried removing the function and only selected MT.id and it worked, but with the function it doesn't. So I tried using the GROUP BY clause.
Doing that, I tried:
-- This also doesn't work
SELECT
bool_and(
function(MT.id)
)
FROM my_table MT
WHERE ...conditions...
GROUP BY MT.id, MT.timestamp_value
ORDER BY MT.timestamp_value, MT.id
but this gives the error more than one row returned by a subquery used as an expression. MT.id is the primary key btw. It also works without the function and just SELECT MT.id
Ideally, a fix to either one of the code bits above would be nice or otherwise something that fulfills the following:
-- Not real postgresql code but something I want it to do
SELECT FUNCTION(id)
FOR EACH id in (MY SELECT STATEMENT HERE ORDERED)
In response to @a_horse_with_no_name:
This code falls under a section of another query that looks like the below:
SELECT Function2()
WHERE true = (
(my_snippet)
AND (...)
AND (...)
...
)
The error is clear. The snippet is used as a single-value expression, but with the GROUP BY it returns more than one row (one per group). Adjust the subquery so that it only returns one row.
Issue resolution:
I discovered the reason that everything was failing was because of the AND in
WHERE true = (
(my_snippet)
AND (...)
AND (...)
...
)
What happened was that using GROUP BY and using ORDER BY caused the value returned by my snippet to be multiple rows of true.
Without the GROUP BY and the ORDER BY, it only returned a single row of true.
So my solution was to wrap the code into another bool_and and use that.
SELECT
bool_and(vals)
FROM(
SELECT
bool_and(
function(MT.id)
) as vals
FROM my_table MT
WHERE ...conditions...
GROUP BY MT.id, MT.timestamp_value
ORDER BY MT.timestamp_value, MT.id
) t
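The row counts behind this fix can be checked with a rough sketch in Python's built-in sqlite3. SQLite has no bool_and, so MIN over a 0/1 column stands in for it, and the hypothetical ok column stands in for the result of function(MT.id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; "ok" replaces the black-box function's boolean result.
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, timestamp_value TEXT, ok INTEGER);
    INSERT INTO my_table VALUES
        (1, '2020-01-03', 1), (2, '2020-01-01', 1), (3, '2020-01-02', 1);
""")

# No GROUP BY: the aggregate collapses everything into a single row.
ungrouped = conn.execute("SELECT MIN(ok) FROM my_table").fetchall()

# GROUP BY on the primary key: one row per id, too many for a scalar subquery.
grouped = conn.execute("""
    SELECT MIN(ok) AS vals FROM my_table
    GROUP BY id, timestamp_value
    ORDER BY timestamp_value, id
""").fetchall()

# Wrapping in a second aggregate brings it back to a single row.
wrapped = conn.execute("""
    SELECT MIN(vals) FROM (
        SELECT MIN(ok) AS vals FROM my_table
        GROUP BY id, timestamp_value
    ) AS t
""").fetchall()

print(len(ungrouped), len(grouped), len(wrapped))
```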
Since I have to guess the reason, and the way that you are trying to accomplish your stated goal:
"return value of the function is aggregated into a string"
And since:
You are using: bool_and and therefore the return value of the function must be boolean, and
The only aggregation I can see is the bool_and aggregation into either true or false, and
You mention that the function is a black box to you,
I would presume that instead of:
"return value of the function is aggregated into a string"
You meant to say: function is aggregating (input/transformed input) values into a string,
And you need this aggregating to be in a certain order.
Further I assume that you own the my_table and can create indexes on it.
So, if you need the function being used in the context:
bool_and ( function(MT.id) )
to process (and therefore aggregate into string) MT.id inputs (or their transformed values) in a certain order, you can physically cluster your my_table table on an index in that order.
To accomplish that in postgresql, you need to (instead of using the group by and order by):
create that_index, in the order you need for the aggregation, for your my_table table, and then
run: CLUSTER my_table USING that_index to physically rewrite the table in that order, which makes it likely that the bool_and ( function(MT.id) ) aggregation processes rows in that order. Be aware that CLUSTER is a one-time operation (later inserts and updates are not kept in index order) and that PostgreSQL does not guarantee any row order without an ORDER BY, so this is not a hard guarantee.
(see CLUSTER for more info)

What is the execution order of the PARTITION BY clause compared to other SQL clauses?

I cannot find any source mentioning execution order for Partition By window functions in SQL.
Is it in the same order as Group By?
For example table like:
Select *, row_number() over (Partition by Name)
from NPtable
Where Name = 'Peter'
I understand if Where gets executed first, it will only look at Name = 'Peter', then execute window function that just aggregates this particular person instead of entire table aggregation, which is much more efficient.
But when the query is:
Select top 1 *, row_number() over (Partition by Name order by Date)
from NPtable
Where Date > '2018-01-02 00:00:00'
Doesn't the window function need to be executed against the entire table first, with the Date > condition applied afterwards? Otherwise, wouldn't the result be wrong?
Window functions are executed/calculated at the same stage as SELECT, stage 5 in your table. In other words, window functions are applied to all rows that are "visible" in the SELECT stage.
In your second example
Select top 1 *,
row_number() over (Partition by Name order by Date)
from NPtable
Where Date > '2018-01-02 00:00:00'
WHERE is logically applied before Partition by Name of the row_number() function.
Note, that this is logical order of processing the query, not necessarily how the engine physically processes the data.
If query optimiser decides that it is cheaper to scan the whole table and later discard dates according to the WHERE filter, it can do it. But, any kind of these transformations must be performed in such a way that the final result is consistent with the order of the logical steps outlined in the table you showed.
It is part of the SELECT phase of the query execution. There are different types of SELECT clauses, based on the query.
SELECT FOR
SELECT GROUP BY
SELECT ORDER BY
SELECT OVER
SELECT INTO
SELECT HAVING
PARTITION BY comes in the SELECT OVER clause. Here, a window of the result set is generated out of the result set generated in the previous stages: FROM, WHERE, GROUP BY etc.
The OVER clause defines a window or user-specified set of rows within
a query result set. A window function then computes a value for each
row in the window. You can use the OVER clause with functions to
compute aggregated values such as moving averages, cumulative
aggregates, running totals, or a top N per group results.
OVER ( [ PARTITION BY value_expression ] [ order_by_clause ] )
Arguments
PARTITION BY Divides the query result set into partitions. The window
function is applied to each partition separately and computation
restarts for each partition.
value_expression Specifies the column by which the rowset is
partitioned. value_expression can only refer to columns made available
by the FROM clause. value_expression cannot refer to expressions or
aliases in the select list. value_expression can be a column
expression, scalar subquery, scalar function, or user-defined
variable.
order_by_clause Defines the logical order of the rows within each
partition of the result set. That is, it specifies the logical order
in which the window function calculation is performed.
order_by_expression Specifies a column or expression on which to sort.
order_by_expression can only refer to columns made available by the
FROM clause. An integer cannot be specified to represent a column name
or alias.
You can read more about it in the SELECT - OVER clause documentation.
row_number() (and other window functions) are allowed in two clauses:
SELECT
ORDER BY
The function is parsed along with the rest of the clause. After all, it is a function present in the clause. In both cases, the WHERE clause would be -- logically -- applied first, so the results would be after filtering.
Do note that this is a logical parsing of the query. The actual execution may have little to do with the structure of the query.
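The logical order described in these answers can be observed in a small sketch with Python's built-in sqlite3 (window functions need SQLite 3.25 or later). The sample rows are invented, and SQLite is only standing in for SQL Server here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for NPtable.
conn.executescript("""
    CREATE TABLE NPtable (Name TEXT, Date TEXT);
    INSERT INTO NPtable VALUES
        ('Peter', '2018-01-01 00:00:00'),
        ('Peter', '2018-01-03 00:00:00'),
        ('Peter', '2018-01-05 00:00:00'),
        ('Anna',  '2018-01-02 12:00:00');
""")

rows = conn.execute("""
    SELECT Name, Date,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date) AS rn
    FROM NPtable
    WHERE Date > '2018-01-02 00:00:00'
    ORDER BY Name, Date
""").fetchall()

for row in rows:
    print(row)
```

Peter's row for 2018-01-03 is numbered 1, not 2: the row from 2018-01-01 was filtered out by WHERE before the window function numbered anything.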

Does ROW_NUMBER queried data require to fetch the entire result set?

I'm trying to write a query that doesn't use OFFSET (because, as I have just learnt, OFFSET fetches all the skipped rows, which causes performance overhead) and uses the ROW_NUMBER window function instead. For instance:
SELECT id
FROM(
SELECT id, ROW_NUMBER() over (order by id) rn
FROM users) sq
WHERE rn > 1000
Does it require all rows to be fetched, as it would with OFFSET 1000? I mean, does it make sense to use such a query instead of
SELECT id
FROM users
OFFSET 1000
? Would I get a performance improvement on a large amount of data?
Check out the window function docs. Window functions operate on the result set, after the fetch:
Window functions are permitted only in the SELECT list and the ORDER
BY clause of the query. They are forbidden elsewhere, such as in GROUP
BY, HAVING and WHERE clauses. This is because they logically execute
after the processing of those clauses. Also, window functions execute
after regular aggregate functions. This means it is valid to include
an aggregate function call in the arguments of a window function, but
not vice versa.
Does it make sense to use the row_number() query? Well, it produces the same result set. However, the query basically has to assign row_number() to all the rows in order to find the ones that meet the requirement.
The second query, however, is lacking an order by. When using offset, you should have an order by:
SELECT id
FROM users u
ORDER BY id
OFFSET 1000
I would imagine that this is more efficient than using row_number(), but actual timings would demonstrate that.
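That the two forms return the same rows can be checked with a minimal sketch in Python's built-in sqlite3. SQLite requires a LIMIT before OFFSET, so LIMIT -1 (meaning no limit) is used, and the 1010 sample rows are invented. It says nothing about relative speed in PostgreSQL, which would need real timings or EXPLAIN ANALYZE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
# Hypothetical sample data: ids 1..1010.
conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in range(1, 1011)])

# Every row gets a row number before the filter on rn can be applied.
by_row_number = conn.execute("""
    SELECT id
    FROM (
        SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM users
    ) AS sq
    WHERE rn > 1000
    ORDER BY id
""").fetchall()

# OFFSET with an explicit ORDER BY skips the same 1000 rows.
by_offset = conn.execute(
    "SELECT id FROM users ORDER BY id LIMIT -1 OFFSET 1000"
).fetchall()

print(by_row_number == by_offset)
```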