I don't usually write SQL and have run into this problem while using CASE expressions. This is a simplified version of the function that still gets the same error:
CREATE OR REPLACE FUNCTION retrieve_test(
_period interval
)
returns table(
profit double precision,
bid double precision,
ask double precision
) as $$
begin
raise notice 'Value: %', _period;
return query
SELECT
(CASE WHEN _period IS NOT NULL THEN AVG(o.profit) ELSE o.profit END)::double precision,
o.bid, o.ask
FROM opportunities o
GROUP by
case WHEN _period is NULL then 1 end,
2,3;
END;
$$ LANGUAGE PLPGSQL;
I get the following error:
SQL Error [42803]: ERROR: column "o.profit" must appear in the GROUP BY clause or be used in an aggregate function
Where: PL/pgSQL function retrieve_test(interval) line 4 at RETURN QUERY
When I run any of the following queries:
select * from retrieve_test(null);
--or
select * from retrieve_test('1 minute'::interval);
I'm not sure if this is the correct structure for this type of query. What am I missing?
Running:
postgres:14.2 docker image
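The error tells you everything: every column in the SELECT list that is not inside an aggregate function must appear in the GROUP BY clause. PostgreSQL checks this when the query is parsed, not when it runs, so it does not matter that the ELSE branch is never reached for a non-null _period. In your query, o.profit appears outside AVG() in the ELSE branch, but the GROUP BY list only contains a constant CASE expression and output columns 2 and 3 (bid and ask).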
From the PostgreSQL documentation:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
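A minimal sketch of one way around the error, assuming the intent is: return the raw rows when _period is NULL, otherwise the average profit per (bid, ask). Each case becomes its own query, so each one is valid on its own. The demo table and its rows are hypothetical, and, as in the question, _period is only used as a NULL flag.

```sql
-- Hypothetical demo table and data (the real "opportunities" table is not shown).
CREATE TABLE opportunities (
  profit double precision,
  bid    double precision,
  ask    double precision
);
INSERT INTO opportunities VALUES (1, 10, 11), (3, 10, 11), (5, 20, 21);

CREATE OR REPLACE FUNCTION retrieve_test(_period interval)
  RETURNS TABLE (profit double precision, bid double precision, ask double precision)
  LANGUAGE plpgsql AS
$$
BEGIN
  IF _period IS NULL THEN
    -- No period: plain rows, nothing is aggregated, so no GROUP BY is needed.
    RETURN QUERY
    SELECT o.profit, o.bid, o.ask
    FROM   opportunities o;
  ELSE
    -- Period given: every non-aggregated column is listed in GROUP BY.
    RETURN QUERY
    SELECT avg(o.profit)::double precision, o.bid, o.ask
    FROM   opportunities o
    GROUP  BY o.bid, o.ask;
  END IF;
END
$$;

SELECT * FROM retrieve_test(NULL);                 -- 3 rows
SELECT * FROM retrieve_test('1 minute'::interval); -- 2 rows, profit averaged
```

Column references are table-qualified (o.bid) because the RETURNS TABLE columns profit, bid and ask are also PL/pgSQL variables inside the function, and an unqualified bid would be ambiguous.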
Related
Working with PostgreSQL 9.6.3. I am new to functions in databases.
Let's say there are multiple tables of item numbers. Each one has the item number, the item cost and several other columns which are factored into the "additional cost". I would like to put the calculation into a function so I can call it for any of these tables.
So instead of:
SELECT
itemnumber,
itemname,
base,
CASE
WHEN labor < 100 AND overhead < .20 THEN
WHEN .....
WHEN .....
WHEN .....
.....
END AS add_cost,
gpm
FROM items1;
I can just do:
SELECT
itemnumber,
itemname,
base,
calc_add_cost(),
gpm
FROM items1;
If I want to be able to use it on any of the item tables, I guess I would need a table_name parameter for the function, since hard-coding the table name into the function would be undesirable, to say the least.
calc_add_cost(items1)
However, is there a simpler way such that when I call calc_add_cost() it will just use the table name from the FROM clause?
SELECT ....., calc_add_cost(items1) FROM items1
That just seems redundant.
I did come across a few topics with titles that sounded like they addressed what I was hoping to accomplish, but upon reviewing them it looked like they were a different issue.
You can even emulate a "computed field" or "generated column" like you had in mind. Basics here:
Store common query as column?
Simple demo for one table:
CREATE OR REPLACE FUNCTION add_cost(items1) -- function name = default col name
RETURNS numeric AS
$func$
SELECT
CASE
WHEN $1.labor < 100 AND $1.overhead < .20 THEN numeric '1'
-- WHEN .....
-- WHEN .....
-- WHEN .....
ELSE numeric '0' -- ?
END;
$func$
LANGUAGE sql IMMUTABLE;
Call:
SELECT *, t.add_cost FROM items1 t;
Note the table-qualification in t.add_cost. I only demonstrate this syntax variant since you have been asking for it. My advice is to use the less confusing standard syntax:
SELECT *, add_cost(t) AS add_cost FROM items1 t; -- column alias is also optional
However, SQL is a strictly typed language. If you define a particular row type as input parameter, the function is bound to this particular row type. Passing various whole-table types is more sophisticated, but still possible with a polymorphic input type.
CREATE OR REPLACE FUNCTION add_cost(ANYELEMENT) -- function name = default col name
RETURNS numeric AS
$func$
SELECT
CASE
WHEN $1.labor < 100 AND $1.overhead < .20 THEN numeric '1'
-- WHEN .....
-- WHEN .....
-- WHEN .....
ELSE numeric '0' -- ?
END;
$func$
LANGUAGE sql IMMUTABLE;
Same call for any table that has the columns labor and overhead with matching data type.
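To illustrate the "same call for any table" point, here is a small sketch with two hypothetical tables, items_a and items_b, that share only the labor and overhead columns. The function is the polymorphic add_cost from above, condensed, with the same placeholder CASE logic.

```sql
-- Hypothetical tables with different layouts; both have labor and overhead.
CREATE TABLE items_a (itemnumber int, labor numeric, overhead numeric);
CREATE TABLE items_b (itemnumber int, itemname text, labor numeric, overhead numeric);
INSERT INTO items_a VALUES (1, 50, .10), (2, 150, .10);
INSERT INTO items_b VALUES (10, 'widget', 80, .15), (11, 'gadget', 80, .30);

-- Polymorphic function from above: the row type is resolved at call time.
CREATE OR REPLACE FUNCTION add_cost(anyelement)
  RETURNS numeric
  LANGUAGE sql IMMUTABLE AS
$func$
SELECT CASE WHEN $1.labor < 100 AND $1.overhead < .20 THEN numeric '1'
            ELSE numeric '0' END;
$func$;

-- Same call, two different row types.
SELECT t.itemnumber, add_cost(t) AS add_cost FROM items_a t;
SELECT t.itemnumber, t.itemname, add_cost(t) AS add_cost FROM items_b t;
```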
Also see the related simple case passing simple values here:
How to put part of a SELECT statement into a Postgres function
For even more complex requirements - like also returning various row types - see:
Refactor a PL/pgSQL function to return the output of various SELECT queries
I want to add a new column to a table to record the number of attributes whose values are null for each tuple (row). How can I use SQL to get the number?
For example, if a tuple looks like this:
Name | Age | Sex
-----+-----+-----
Blice| 100 | null
I want to update the tuple like this:
Name | Age | Sex | nNULL
-----+-----+-----+--------
Blice| 100 | null| 1
Also, because I'm writing a PL/pgSQL function and the table name is obtained from an argument, I don't know the schema of the table beforehand. That means I need to update the table with the input table name. Does anyone know how to do this?
Possible without spelling out columns. Unpivot columns to rows and count.
The aggregate function count(<expression>) only counts non-null values, while count(*) counts all rows. The shortest and fastest way to count NULL values for more than a few columns is count(*) - count(col) ...
Works for any table with any number of columns of any data types.
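A tiny illustration of the count(*) - count(col) idea on a hand-written single-column VALUES list (values taken from the question's example row):

```sql
-- count(*) counts every row; count(v) skips NULLs; the difference is the NULL count.
SELECT count(*)            AS all_rows,
       count(v)            AS non_null,
       count(*) - count(v) AS ct_nulls
FROM  (VALUES ('Blice'), ('100'), (NULL)) x(v);
-- all_rows = 3, non_null = 2, ct_nulls = 1
```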
In Postgres 9.3+ with built-in JSON functions:
SELECT *, (SELECT count(*) - count(v)
FROM json_each_text(row_to_json(t)) x(k,v)) AS ct_nulls
FROM tbl t;
What is x(k,v)?
json_each_text() returns a set of rows with two columns. The default column names are key and value, as documented in the manual. I provided table and column aliases so we don't have to rely on default names. The second column is named v.
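To see what the aliases refer to, this sketch unpivots a single hand-written row (the question's example) with json_each_text(): k holds the column name and v the value as text, with SQL NULL for the null column.

```sql
-- One literal row; json_each_text() turns its columns into (k, v) rows.
SELECT x.k, x.v
FROM  (SELECT text 'Blice' AS name, 100 AS age, NULL::text AS sex) t
    , json_each_text(row_to_json(t)) x(k, v);
-- k    | v
-- name | Blice
-- age  | 100
-- sex  | NULL
```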
Or, in any Postgres version since at least 8.3 with the additional module hstore installed, even shorter and a bit faster:
SELECT *, (SELECT count(*) - count(v) FROM svals(hstore(t)) v) AS ct_nulls
FROM tbl t;
This simpler version only returns a set of single values. I only provide a simple alias v, which is automatically taken to be table and column alias.
Best way to install hstore on multiple schemas in a Postgres database?
Since the additional column is functionally dependent, I would consider not persisting it in the table at all. Rather, compute it on the fly as demonstrated above, or create a tiny function with a polymorphic input type for the purpose:
CREATE OR REPLACE FUNCTION f_ct_nulls(_row anyelement)
RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (count(*) - count(v))::int FROM svals(hstore(_row)) v';
(PARALLEL SAFE only for Postgres 9.6 or later.)
Then:
SELECT *, f_ct_nulls(t) AS ct_nulls
FROM tbl t;
You could wrap this into a VIEW ...
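A minimal VIEW sketch, using the built-in JSON variant from above instead of f_ct_nulls so that hstore is not required. The table person_demo and its rows are hypothetical, mirroring the question's example.

```sql
-- Hypothetical table with the question's columns.
CREATE TABLE person_demo (name text, age int, sex text);
INSERT INTO person_demo VALUES ('Blice', 100, NULL), ('Glizz', NULL, NULL), (NULL, NULL, NULL);

-- The null count is computed on every read, so it never goes stale.
CREATE VIEW person_demo_nulls AS
SELECT t.*,
       (SELECT count(*) - count(x.v)
        FROM   json_each_text(row_to_json(t)) x(k, v)) AS ct_nulls
FROM   person_demo t;

SELECT * FROM person_demo_nulls;  -- ct_nulls: 1, 2, 3
```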
This should also answer your second question:
... the table name is obtained from argument, I don't know the schema of a table beforehand. That means I need to update the table with the input table name.
In Postgres, you can express this as:
select t.*,
((name is null)::int +
(age is null)::int +
(sex is null)::int
) as numnulls
from tbl t;
To implement this on an unknown table, you will need dynamic SQL and a list of columns (say, from information_schema.columns).
Function to add column automatically
This is an audited version of what @winged panther posted, per request.
The function adds a column with given name to any existing table that the calling role has the necessary privileges for:
CREATE OR REPLACE FUNCTION f_add_null_count(_tbl regclass, _newcol text)
RETURNS void AS
$func$
BEGIN
-- add new col
EXECUTE format('ALTER TABLE %s ADD COLUMN %I smallint', _tbl, _newcol);
-- update new col with dynamic count of nulls
EXECUTE (
SELECT format('UPDATE %s SET %I = (', _tbl, _newcol) -- regclass used as text
|| string_agg(quote_ident(attname), ' IS NULL)::int + (')
|| ' IS NULL)::int'
FROM pg_catalog.pg_attribute
WHERE attnum > 0
AND NOT attisdropped
AND attrelid = _tbl -- regclass used as OID
AND attname <> _newcol -- no escaping here, it's the *text*!
);
END
$func$ LANGUAGE plpgsql;
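When debugging dynamic SQL it can help to look at the generated statement before executing it. This sketch runs the same string-building SELECT as the function above against a hypothetical table and returns the UPDATE text instead of executing it. The column name nnull is made up, and ORDER BY attnum is my addition, only to make the column order in the output predictable.

```sql
-- Hypothetical table; nothing is altered or updated here.
CREATE TABLE null_count_demo (name text, age int, sex text);

-- Same pattern as inside f_add_null_count, but returning the statement text.
SELECT format('UPDATE %s SET %I = (', 'null_count_demo'::regclass, 'nnull')
    || string_agg(quote_ident(attname), ' IS NULL)::int + (' ORDER BY attnum)
    || ' IS NULL)::int' AS stmt
FROM   pg_catalog.pg_attribute
WHERE  attrelid = 'null_count_demo'::regclass
AND    attnum > 0
AND    NOT attisdropped;
-- UPDATE null_count_demo SET nnull = (name IS NULL)::int + (age IS NULL)::int + (sex IS NULL)::int
```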
How to treat identifiers properly
Sanitize identifiers with cast to regclass, format() with %I or quote_ident().
I am using all three techniques in the example; each happens to be the best choice where it is used. More here:
Table name as a PostgreSQL function parameter
Other points
I am basing my query on pg_catalog.pg_attribute, but that's an optional decision with pros and cons. It makes my query simpler and faster because I can use the OID of the table. Related:
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
You have to exclude the newly added column from the count, or the count will be off by one.
Using data type smallint for the count, since there cannot be more than 1600 columns in a table.
I don't use a variable but execute the result of the SELECT statement directly. Assignments are comparatively expensive in plpgsql. Not a big deal, though. Also a matter of taste and style.
I make it a habit to prepend parameters and variables with an underscore (_tbl) to rule out ambiguity between variables and column names.
I just created a function to meet the OP's requirement, based on Gordon Linoff's answer, with the following table and data:
Table det:
CREATE TABLE det (
name text,
age integer,
sex text
);
Data:
insert into det (name,age,sex) values
('Blice',100,NULL),
('Glizz',NULL,NULL),
(NULL,NULL,NULL);
Function:
create or replace function fn_alter_nulls(tbl text, new_col text) returns void as
$$
declare vals text;
begin
-- build "(col1 is null)::int + (col2 is null)::int + ..." from the table's current columns
-- (information_schema only shows the current database, so no catalog filter is needed)
select string_agg(format('(%I is null)::int', column_name), ' + ') into vals
from information_schema.columns
where table_schema = 'public' and table_name = tbl;
-- add the new column (it is not in vals, because vals was built first)
execute format('alter table %I add column %I int', tbl, new_col);
-- fill the new column in the table that was passed in, not a hard-coded one
execute format('update %I set %I = (%s)', tbl, new_col, vals);
end;
$$
language plpgsql;
Function call:
select fn_alter_nulls('det', 'nnulls');
Since the null count is derived data and simple/cheap to determine at query time, why not create a view:
create view MyTableWithNullCount as
select
*,
case when nullableColumn1 is null then 1 else 0 end +
case when nullableColumn2 is null then 1 else 0 end +
...
case when nullableColumnn is null then 1 else 0 end as nNull
from myTable
And just use the view instead.
This has the upside of not having to write triggers/code to maintain a physical null count column, which will be a bigger headache than this approach.
I tried to pass the result of a SQL query to a function, but I got a syntax error.
contacts=> SELECT count(*) FROM update_name(contact_ids := select array(select id from contact where name is NULL));
ERROR: syntax error at or near "select"
LINE 1: SELECT count(*) FROM update_name(contact_ids := select array...
The subselect returns BIGINTs, and the function accepts an array of BIGINTs. I verified that running the subselect and turning the result into an array of BIGINTs works.
Switching to positional notation did not make a difference. Using an Array constructor did not change anything, either.
Following an intuition, I wrapped the argument in parens:
SELECT count(*) FROM update_name(contact_ids := (select array(select id from contact where name is NULL)));
And that worked. I don't get why. The docs on expressions state that arguments in a function call are expressions. Function calls and Array constructors are expressions, so at least using the Array constructor should have worked.
Why do I need the parens? Where does the necessity come from, i.e. how could I have known?
The expression form you are using is called a Scalar Subquery. The manual says:
A scalar subquery is an ordinary SELECT query in parentheses that
returns exactly one row with one column ... The SELECT query is executed and
the single returned value is used in the surrounding value expression.
Your subquery returns a single value (which happens to be an array, prepared from the result of another subquery).
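As a basic rule of thumb, subqueries are always in parentheses. A bare SELECT is a statement, not a value expression, which is why the parser complains at the leading select. Note that ARRAY(SELECT ...) is itself an expression, so contact_ids := array(select id from contact where name is NULL) also works without the outer SELECT.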
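To see both valid forms side by side, here is a sketch with a hypothetical stand-in for the asker's function (update_name_demo just returns the ids it receives) and a hypothetical contact_demo table. The first call wraps the SELECT in parentheses to make it a scalar subquery; the second drops the outer SELECT, because ARRAY(subquery) is already an expression.

```sql
-- Hypothetical stand-in for update_name: returns one row per id passed in.
CREATE FUNCTION update_name_demo(contact_ids bigint[])
  RETURNS SETOF bigint
  LANGUAGE sql AS
$$ SELECT unnest(contact_ids) $$;

CREATE TABLE contact_demo (id bigint PRIMARY KEY, name text);
INSERT INTO contact_demo VALUES (1, 'Ann'), (2, NULL), (3, NULL);

-- Scalar subquery: the SELECT is wrapped in parentheses.
SELECT count(*)
FROM   update_name_demo(contact_ids := (SELECT array(SELECT id FROM contact_demo WHERE name IS NULL)));

-- Array constructor on its own: no outer SELECT needed.
SELECT count(*)
FROM   update_name_demo(contact_ids := ARRAY(SELECT id FROM contact_demo WHERE name IS NULL));
```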
I would like the aggregates of an empty result set to be 0. I have tried the following:
SELECT SUM(COALESCE(capacity, 0))
FROM objects
WHERE null IS NOT NULL;
Result:
sum
-----
(1 row)
Subquestion: wouldn't the above work in Oracle, using SUM(NVL(capacity, 0))?
From the documentation page about aggregate functions:
It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect. The coalesce function may be used to substitute zero for null when necessary.
So, if you want to guarantee a value returned, apply COALESCE to the result of SUM, not to its argument:
SELECT COALESCE(SUM(capacity), 0) …
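A quick sketch of the difference, using a hypothetical objects_demo table with one NULL capacity:

```sql
CREATE TABLE objects_demo (capacity int);
INSERT INTO objects_demo VALUES (1), (2), (NULL), (3);

SELECT sum(capacity)              FROM objects_demo WHERE false;  -- NULL: no rows
SELECT coalesce(sum(capacity), 0) FROM objects_demo WHERE false;  -- 0
SELECT sum(capacity)              FROM objects_demo;              -- 6: the NULL is ignored
```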
As for the Oracle 'subquestion', well, I couldn't find any mention of NULLs on the official doc page (the one for 10.2, in particular), but two other sources are unambiguous:
Oracle SQL Functions:
SUM([DISTINCT] n) Sum of values of n, ignoring NULLs
sum aggregate function [Oracle SQL]:
…if a sum() is created over some numbers, nulls are disregarded, as the following example shows…
That is, you needn't apply NVL to capacity. (But, like with COALESCE in PostgreSQL, you might want to apply it to SUM.)
The thing is, the aggregate always returns a row, even if no rows were aggregated (as is the case in your query). You summed an expression over no rows. Hence the null value you're getting.
Try this instead:
select coalesce(sum(capacity),0)
from objects
where false;
Just do this:
SELECT COALESCE( SUM(capacity), 0)
FROM objects
WHERE null IS NOT NULL;
By the way, COALESCE inside SUM is redundant: even if capacity is NULL for some rows, it won't make the sum null.
To wit:
create table objects
(
capacity int null
);
insert into objects(capacity) values (1),(2),(NULL),(3);
select sum(capacity) from objects;
That will return a value of 6, not null.
And a COALESCE inside an aggregate function can be a performance killer too: the engine cannot just rip through all the rows, it has to evaluate COALESCE on each row's value. I've seen a rather OCD query where every aggregate had a COALESCE inside; I think the original developer was doing Cargo Cult Programming, and the query was painfully slow. After I removed the COALESCE inside SUM, the query became fast.
Although this post is very old, I would like to share what I use in such cases:
SELECT NVL(SUM(NVL(capacity, 0)),0)
FROM objects
WHERE false;
Here the outer NVL covers the case where there are no rows in the result set. The inner NVL is used for null column values: consider (1 + null), which results in null. So the inner NVL is also necessary; otherwise, set a default value of 0 on the column.
I have a PostgreSQL function which returns a composite type defined as (location TEXT, id INT). When I run "SELECT myfunc()", my output is a single column of type text, formatted as:
("locationdata", myid)
This is pretty awful. Is there a way to select my composite so that I get 2 columns back - a TEXT column, and an INT column?
Use:
SELECT *
FROM myfunc()
You can read more about the functionality in this article.
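A small sketch with a hypothetical composite type loc_id and function myfunc(), mirroring the question, to show the difference between calling the function in the SELECT list and in the FROM clause:

```sql
-- Hypothetical composite type and function, shaped like the question's.
CREATE TYPE loc_id AS (location text, id int);

CREATE FUNCTION myfunc()
  RETURNS loc_id
  LANGUAGE sql AS
$$ SELECT 'locationdata'::text, 42 $$;

SELECT myfunc();              -- one column holding the whole value: (locationdata,42)
SELECT * FROM myfunc();       -- two columns: location | id
SELECT location, id FROM myfunc();  -- pick columns by name
```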
Answer has already been accepted, but I thought I'd throw this in:
It may help to think about the type of the data and where those types fit into an overall query. SQL queries can return essentially three types:
A single scalar value
A list of values
A table of values
(Of course, a list is just a one-column table, and a scalar is just a one-value list.)
When you look at the types, you see that an SQL SELECT query has the following template:
SELECT scalar(s)
FROM table
WHERE boolean-scalar
If your function or subquery is returning a table, it belongs in the FROM clause. If it returns a list, it could go in the FROM clause or it could be used with the IN operator as part of the WHERE clause. If it returns a scalar, it can go in the SELECT clause, the FROM clause, or in a boolean predicate in the WHERE clause.
That's an incomplete view of SELECT queries, but I've found it helps to figure out where my subqueries should go.
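A sketch of the three placements described above, using a hypothetical nums_demo table and the built-in generate_series():

```sql
CREATE TABLE nums_demo (n int);
INSERT INTO nums_demo VALUES (1), (2), (3), (4);

-- Table: a set-returning function in the FROM clause.
SELECT * FROM generate_series(1, 3) AS g(n);

-- List: a one-column subquery used with IN in the WHERE clause.
SELECT n FROM nums_demo WHERE n IN (SELECT generate_series(2, 3));  -- 2, 3

-- Scalar: scalar subqueries in the SELECT list and in a WHERE predicate.
SELECT n, (SELECT max(n) FROM nums_demo) AS max_n
FROM   nums_demo
WHERE  n > (SELECT avg(n) FROM nums_demo);  -- n = 3 and 4, max_n = 4
```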