How to call a function on the results of an ordered query - sql

I'm trying to call a function on each of the values that fit my query ordered by date. The reason being that the (black box) function is internally aggregating values into a string and I need the aggregated values to be in the timestamp order.
The things that I have tried are below: (function returns a boolean value and does blackbox things that I do not know and cannot modify)
-- This doesn't work
SELECT
bool_and (
function(MT.id)
)
FROM my_table MT
WHERE ...conditions...
ORDER BY MT.timestamp_value, MT.id
This gives the error: column "mt.timestamp_value" must appear in the GROUP BY clause or be used in an aggregate function. If I remove the ORDER BY, as below, it works:
-- This works!
SELECT
bool_and (
function(MT.id)
)
FROM my_table MT
WHERE ...conditions...
I also tried removing the function and selecting only MT.id, and that worked; with the function it doesn't. So I tried adding a GROUP BY clause:
-- This also doesn't work
SELECT
bool_and(
function(MT.id)
)
FROM my_table MT
WHERE ...conditions...
GROUP BY MT.id, MT.timestamp_value
ORDER BY MT.timestamp_value, MT.id
but this gives the error: more than one row returned by a subquery used as an expression. MT.id is the primary key, by the way. This version also works without the function, with just SELECT MT.id.
Ideally, a fix to either one of the code bits above would be nice or otherwise something that fulfills the following:
-- Not real postgresql code but something I want it to do
SELECT FUNCTION(id)
FOR EACH id in (MY SELECT STATEMENT HERE ORDERED)
In response to @a_horse_with_no_name:
This code falls under a section of another query that looks like the below:
SELECT Function2()
WHERE true = (
(my_snippet)
AND (...)
AND (...)
...
)

The error is clear. The subquery used as an expression returns more than one row: with GROUP BY MT.id, MT.timestamp_value, bool_and is computed once per group, so you get one row per group, while a subquery used as an expression may return only one row. Adjust the subquery so that it returns a single row.

Issue resolution:
I discovered the reason that everything was failing was because of the AND in
WHERE true = (
(my_snippet)
AND (...)
AND (...)
...
)
What happened was that using GROUP BY and using ORDER BY caused the value returned by my snippet to be multiple rows of true.
Without the GROUP BY and the ORDER BY, it only returned a single row of true.
So my solution was to wrap the code into another bool_and and use that.
SELECT
bool_and(vals)
FROM(
SELECT
bool_and(
function(MT.id)
) as vals
FROM my_table MT
WHERE ...conditions...
GROUP BY MT.id, MT.timestamp_value
ORDER BY MT.timestamp_value, MT.id
) t

Since I have to guess the reason, and the way that you are trying to accomplish your stated goal:
"return value of the function is aggregated into a string"
And since:
You are using: bool_and and therefore the return value of the function must be boolean, and
The only aggregation I can see is the bool_and aggregation into either true or false, and
You mention that the function is a black box to you,
I would presume that instead of:
"return value of the function is aggregated into a string"
You meant to say: function is aggregating (input/transformed input) values into a string,
And you need this aggregating to be in a certain order.
Further I assume that you own the my_table and can create indexes on it.
So, if you need the function being used in the context:
bool_and ( function(MT.id) )
to process (and therefore aggregate into string) MT.id inputs (or their transformed values) in a certain order, you need to create a clustered index in that order for your my_table table.
To accomplish that in postgresql, you need to (instead of using the group by and order by):
create that_index, in the order you need for the aggregation, for your my_table table, and then
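run: CLUSTER my_table USING that_index to physically rewrite the table in that order, which makes it likely that a sequential scan feeds the bool_and ( function(MT.id) ) aggregation in that order. Note that CLUSTER is a one-time operation: rows inserted or updated afterwards are not kept in index order, and PostgreSQL does not guarantee any row order without an ORDER BY.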
(see CLUSTER for more info)
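If the call order has to be guaranteed rather than merely likely, one option is to drive the calls from the client: read the ids with an explicit ORDER BY and invoke the function once per id, in that order. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module with an in-memory database as a stand-in for PostgreSQL, and black_box, call_log and the sample rows are hypothetical names and data, not anything from the original setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, timestamp_value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "2024-01-03"), (2, "2024-01-01"), (3, "2024-01-02")],
)

call_log = []  # records the order in which the stand-in function is invoked


def black_box(row_id):
    # Hypothetical stand-in for the real black-box function.
    call_log.append(row_id)
    return 1  # SQLite has no boolean type; 1 plays the role of true


conn.create_function("black_box", 1, black_box)

# Step 1: fetch the ids in the required order.
ordered_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM my_table ORDER BY timestamp_value, id")
]

# Step 2: call the function once per id, in exactly that order.
results = [
    conn.execute("SELECT black_box(?)", (row_id,)).fetchone()[0]
    for row_id in ordered_ids
]

# Step 3: combine the per-row results the way bool_and would.
all_true = all(results)
```

This trades one round trip per row for a call order that does not depend on how the planner scans the table.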

Related

Adding ORDER BY gives error "must appear in the group by clause or be used in an aggregate function"

I want to add an ORDER_BY on car_max_price in my query but don't want it in the GROUP_BY.
How to fix the must appear in the group by clause or be used in an aggregate function error? Any idea of what's wrong?
subquery = (
session.query(
garage.id,
garage.name,
func.max(func.coalesce(car.max_price, 0)).label("car_max_price"),
func.jsonb_build_object(
text("'type'"),
car.type,
text("'color'"),
car.color,
text("'max_price'"),
func.max(car.max_price),
).label("some_cars"),
)
.group_by(
garage,
car.type,
)
.subquery("subquery")
)
query = (
session.query(
func.jsonb_build_object(
text("'id'"),
subquery.c.id,
text("'name'"),
subquery.c.name,
text("'some_cars'"),
func.jsonb_agg(
func.distinct(subquery.c.some_cars),
).label("some_cars"),
)
)
.select_from(subquery)
.group_by(
subquery.c.id,
subquery.c.name,
)
.order_by(
subquery.c.car_max_price
)
)
return query
When aggregating, you cannot order by an un-aggregated column. There may be any number of different values in the same aggregated group of rows for it. So you have to define what to use exactly (by way of another aggregate function).
Order by min(subquery.c.car_max_price) or max(subquery.c.car_max_price) or avg(subquery.c.car_max_price) or whatever you actually need.
There is one exception to this rule: if the PRIMARY KEY of a table is listed in the GROUP BY clause, that covers all columns of that table. See:
PostgreSQL - GROUP BY clause
But while operating on a derived table (subquery) that exception cannot apply. A derived table cannot have a PK constraint.
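A minimal sketch of ordering by an aggregate over a derived table, using Python's sqlite3 module as a stand-in. The car table, its rows and the top_price alias are made up for illustration. Note that SQLite is more permissive than PostgreSQL about bare columns in grouped queries, so this only shows the fixed form, not the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE car (garage_id INTEGER, type TEXT, max_price INTEGER);
    INSERT INTO car VALUES
        (1, 'suv', 300), (1, 'van', 100),
        (2, 'suv', 200), (2, 'van', 50);
    """
)

# The derived table groups per (garage, type); the outer query groups per
# garage, so car_max_price must be wrapped in an aggregate before it can be
# used in ORDER BY.
rows = conn.execute(
    """
    SELECT sub.garage_id, MAX(sub.car_max_price) AS top_price
    FROM (
        SELECT garage_id, type, MAX(COALESCE(max_price, 0)) AS car_max_price
        FROM car
        GROUP BY garage_id, type
    ) AS sub
    GROUP BY sub.garage_id
    ORDER BY MAX(sub.car_max_price)
    """
).fetchall()
```

In the SQLAlchemy query from the question, the equivalent change would be ordering by func.max(subquery.c.car_max_price) instead of the bare column.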

Using calculation with an aliased column in ORDER BY

As we all know, the ORDER BY clause is processed after the SELECT clause, so a column alias in the SELECT clause can be used.
However, I find that I can’t use the aliased column in a calculation in the ORDER BY clause.
WITH data AS(
SELECT *
FROM (VALUES
('apple'),
('banana'),
('cherry'),
('date')
) AS x(item)
)
SELECT item AS s
FROM data
-- ORDER BY s; -- OK
-- ORDER BY item + ''; -- OK
ORDER BY s + ''; -- Fails
I know there are alternative ways of doing this particular query, and I know that this is a trivial calculation, but I’m interested in why the column alias doesn’t work when in a calculation.
I have tested in PostgreSQL, MariaDB, SQLite and Oracle, and it works as expected. SQL Server appears to be the odd one out.
The documentation clearly states that:
The column names referenced in the ORDER BY clause must correspond to
either a column or column alias in the select list or to a column
defined in a table specified in the FROM clause without any
ambiguities. If the ORDER BY clause references a column alias from
the select list, the column alias must be used standalone, and not as
a part of some expression in ORDER BY clause:
Technically speaking, your query should work, since the ORDER BY clause is logically evaluated after the SELECT clause and should have access to all expressions declared in the SELECT clause. But without access to the SQL specs, I cannot say whether this is a limitation of SQL Server or whether the other RDBMSs implement it as a bonus feature.
Anyway, you can use CROSS APPLY as a trick.... it is part of FROM clause so the expressions should be available in all subsequent clauses:
SELECT item
FROM t
CROSS APPLY (SELECT item + '') AS CA(item_for_sort)
ORDER BY item_for_sort
It is simply due to the way expressions are evaluated. A more illustrative example:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
)
SELECT item AS s
FROM data
ORDER BY CASE WHEN 1 = 1 THEN s END;
This returns the same Invalid column name error. The CASE expression (and the concatenation of s + '' in the simpler case) is evaluated before the alias in the select list is resolved.
One workaround for your simpler case is to append the empty string in the select list:
SELECT
item + '' AS s
...
ORDER BY s;
There are more complex ways, like using a derived table or CTE:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
),
step2 AS
(
SELECT item AS s FROM data
)
SELECT s FROM step2 ORDER BY s+'';
This is just the way that SQL Server works, and I think you could say "well SQL Server is bad because of this" but SQL Server could also say "what the heck is this use case?" :-)

Select columns from second subquery if first returns NULL

I have two queries that I'm running separately from a PHP script. The first is checking if an identifier (group) has a timestamp in a table.
SELECT
group, MAX(timestamp) AS timestamp, value
FROM table_schema.sample_table
GROUP BY group, value
If there is no timestamp, then it runs this second query that retrieves the minimum timestamp from a separate table:
SELECT
group, MIN(timestamp) as timestamp, value AS value
FROM table_schema.src_table
GROUP BY group, value
And goes on from there.
What I would like to do, for the sake of conciseness, is to have a single query that runs the first statement, but that defaults to the second if NULL. I've tried with coalesce() and CASE statements, but they require subqueries to return single columns (which I hadn't run into being an issue yet). I then decided I should try a JOIN on the table with the aggregate timestamp to get the whole row, but then quickly realized I can't vary the table being joined (not to my knowledge). I opted to try joining both results and getting the max, something like this:
Edit: I am so tired, this should be a UNION, not a JOIN
sorry for any possible confusion :(
SELECT smpl.group, smpl.value, MAX(smpl.timestamp) AS timestamp
FROM table_schema.sample_table as smpl
INNER JOIN
(SELECT src.group, src.value, MIN(src.timestamp) AS timestamp
FROM source_table src
GROUP BY src.group, src.value) AS history
ON
smpl.group = history.group
GROUP BY smpl.group, smpl.value
I don't have a SELECT MAX() on this because it's really slow as is, most likely because my SQL is a bit rusty.
If anyone knows a better approach, I'd appreciate it!
Please try this:
select mx.group,
(case when mx.timestamp is null then mn.timestamp else mx.timestamp end) as timestamp,
(case when mx.timestamp is null then mn.value else mx.value end) as value
from
(
SELECT
group, MAX(timestamp) AS timestamp, value
FROM table_schema.sample_table
GROUP BY group, value
) mx
left join
(
SELECT
group, MIN(timestamp) as timestamp, value AS value
FROM table_schema.src_table
GROUP BY group, value
) mn
on mx.group = mn.group
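The same fallback idea can be sketched with COALESCE instead of CASE. This is a simplified illustration run through Python's sqlite3 module: the columns are renamed to grp and ts because group and timestamp are reserved words in many engines, the value column is left out to keep the join one row per group, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample_table (grp TEXT, ts TEXT, value INTEGER);
    CREATE TABLE src_table (grp TEXT, ts TEXT, value INTEGER);
    INSERT INTO sample_table VALUES ('a', '2024-05-01', 10), ('b', NULL, 20);
    INSERT INTO src_table VALUES
        ('a', '2023-01-01', 1), ('b', '2023-02-01', 2), ('b', '2023-03-01', 3);
    """
)

# Group 'b' has no timestamp in sample_table, so MAX(ts) is NULL there and
# COALESCE falls back to the minimum timestamp from src_table.
rows = conn.execute(
    """
    SELECT mx.grp, COALESCE(mx.ts, mn.ts) AS ts
    FROM (SELECT grp, MAX(ts) AS ts FROM sample_table GROUP BY grp) AS mx
    LEFT JOIN (SELECT grp, MIN(ts) AS ts FROM src_table GROUP BY grp) AS mn
        ON mx.grp = mn.grp
    ORDER BY mx.grp
    """
).fetchall()
```

Because the query starts from sample_table, a group that has no row there at all still produces no output; only groups whose timestamp is NULL fall back.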

Elegant approach to fetch the first value from each group without using outer query

I am trying to fetch the first value from each group in my data, but I would rather not use an outer query or a WITH clause. Is there an elegant way to write this without the outer query?
I currently use an outer query to fetch the first value from each group. Is there a function like MAX or MIN that returns the first value, so that I don't have to write the outer query?
select *
from (
select subject_id,hadm_id,
rank() OVER (PARTITION BY subject_id ORDER BY row_id) AS BG_CG_number
from labevents
where itemid in ('50809','50931','51529')
AND valuenum > 110
and hadm_id is not null
) t1
where t1.bg_cg_number = 1
Please find the screenshot below for current and expected output
There is nothing wrong with the derived table (aka sub-query).
Postgres' proprietary distinct on () will achieve the same and is usually faster than using a window function (that's not because of the derived table, but because of the window function):
Quote from the manual
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
So your query can be rewritten to:
select distinct on (subject_id) subject_id, hadm_id
from labevents
where itemid in ('50809','50931','51529')
AND valuenum > 110
and hadm_id is not null
order by subject_id, row_id;

SQL ORDER BY in SQL table returning function

I have a simple function that gets two fields from the database. I'm trying to order the results, but I cannot use ORDER BY in the RETURN clause.
It tells me
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
Is it possible to use ORDER BY in the RETURN statement? I would like to avoid using ORDER BY when executing the function.
CREATE FUNCTION goalsGames1 () RETURNS TABLE
AS RETURN(
SELECT MAX(goals_scored) goals,
no_games
FROM Player
GROUP BY no_games
ORDER BY no_games DESC )
One trick to skip this error is using top as it is mentioned in the error message:
CREATE FUNCTION goalsGames1 () RETURNS TABLE
AS RETURN(
SELECT Top 100 Percent MAX(goals_scored) goals,
no_games
FROM Player
GROUP BY no_games
ORDER BY no_games DESC )
I would like to avoid using order by when executing the function.
If you are using the function and want the results in a particular order, then you need to use ORDER BY.
This is quite clearly stated in the documentation:
The ORDER BY clause does not guarantee ordered results when a SELECT query is executed, unless ORDER BY is also specified in the query.
Use ORDER BY when you select from the function, not when you create it.
So here, use: select * from goalsGames1() order by col
The error message already tells you where ORDER BY is invalid.
You cannot order by inside a function, the idea is to order the resultset returned by the function.
select *
from dbo.goalsGames1()
order by no_games
Even if you could order by inside the function, there is no guarantee that this ordering would be preserved when the resultset is returned. The executing query (select * from functionname) has to be responsible for setting the order, not the function or view.
Whoever receives the rows is the only one that can order them, so in this case, the select * from goalsGames1() is the receiver, and this query has to order the results.
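As a rough illustration of "the receiver orders the rows", here is a sketch using Python's sqlite3 module, with a view standing in for the SQL Server inline table-valued function (SQLite has no table-valued functions of that kind). The Player rows are made up; the point is only that the ORDER BY sits in the query that consumes the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Player (goals_scored INTEGER, no_games INTEGER);
    INSERT INTO Player VALUES (3, 10), (5, 10), (2, 20), (7, 5);

    -- Stand-in for the function body: no ORDER BY in the definition.
    CREATE VIEW goalsGames1 AS
        SELECT MAX(goals_scored) AS goals, no_games
        FROM Player
        GROUP BY no_games;
    """
)

# The consuming query is the one that decides the order.
rows = conn.execute(
    "SELECT goals, no_games FROM goalsGames1 ORDER BY no_games DESC"
).fetchall()
```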