Null value with sum aggregate? - sql

A query performs a sum aggregate over a single column of a table with 10 tuples. If exactly one of the tuples has a NULL value in that column, which of the following will happen?
1) The query will return NULL.
2) The query will return the sum of the remaining 9 values.
3) The query will throw an exception.
Would this be 3?

Aggregate functions ignore null values. That's how their behaviour is defined.
So the answer to your question is: 2)
You can easily test that yourself:
create table test_null(value integer);
insert into test_null
values (1),(1),(1),(1),(1),(1),(1),(1),(1),(null);
select sum(value)
from test_null;
Returns
sum
---
9
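If you don't have a PostgreSQL instance handy, the same test can be sketched with Python's built-in sqlite3 module, assuming SQLite as a stand-in engine (its sum() also skips NULL inputs). The function name sum_with_one_null is just for illustration.

```python
import sqlite3


def sum_with_one_null():
    """Recreate test_null in an in-memory SQLite database and sum the column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test_null(value integer)")
    # Nine 1s and one NULL, mirroring the PostgreSQL insert above.
    conn.executemany(
        "insert into test_null(value) values (?)",
        [(1,)] * 9 + [(None,)],
    )
    (total,) = conn.execute("select sum(value) from test_null").fetchone()
    conn.close()
    return total


print(sum_with_one_null())  # 9
```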
The "ignoring" part is more obvious when you use the avg() aggregate function. For the test data above, the result is 1, not 0.9 as one might think. That's because aggregates ignore rows with null values, and therefore the average is computed as 9/9.
select avg(value)
from test_null;
is equivalent to:
select avg(value)
from test_null
WHERE value IS NOT NULL;
Online example: http://rextester.com/QQREJS70393
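A minimal sketch of that equivalence, again assuming SQLite via Python's sqlite3 as the engine: avg() over all rows, avg() with an explicit IS NOT NULL filter, and a naive sum divided by count(*), which counts the NULL row and produces the 0.9 one might have expected. The function name avg_comparison is hypothetical.

```python
import sqlite3


def avg_comparison():
    """Compare avg() with and without a NULL filter, and a naive sum/count(*)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test_null(value integer)")
    conn.executemany(
        "insert into test_null(value) values (?)",
        [(1,)] * 9 + [(None,)],
    )
    avg_all = conn.execute("select avg(value) from test_null").fetchone()[0]
    avg_filtered = conn.execute(
        "select avg(value) from test_null where value is not null"
    ).fetchone()[0]
    # count(*) includes the NULL row, so this divides 9 by 10.
    naive = conn.execute(
        "select sum(value) * 1.0 / count(*) from test_null"
    ).fetchone()[0]
    conn.close()
    return avg_all, avg_filtered, naive


print(avg_comparison())  # (1.0, 1.0, 0.9)
```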

Related

PostgreSQL Case expression in Group By block has no effect

I don't usually write SQL and have run into this problem while using CASE expressions. This is a simplified version of the function that still produces the same error:
CREATE OR REPLACE FUNCTION retrieve_test(
_period interval
)
returns table(
profit double precision,
bid double precision,
ask double precision
) as $$
begin
raise notice 'Value: %', _period;
return query
SELECT
(CASE WHEN _period IS NOT NULL THEN AVG(o.profit) ELSE o.profit END)::double precision,
o.bid, o.ask
FROM opportunities o
GROUP by
case WHEN _period is NULL then 1 end,
2,3;
END;
$$ LANGUAGE PLPGSQL;
I get the following error:
SQL Error [42803]: ERROR: column "o.profit" must appear in the GROUP BY clause or be used in an aggregate function
Where: PL/pgSQL function retrieve_test(interval) line 4 at RETURN QUERY
When I run any of the following queries:
select * from retrieve_test(null);
--or
select * from retrieve_test('1 minute'::interval);
I'm not sure if this is the correct structure for this type of query. What am I missing?
Running:
postgres:14.2 docker image
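The error tells you everything: every column you list in the SELECT that is not inside an aggregate function must also appear in the GROUP BY clause. In your query, o.profit appears unaggregated in the ELSE branch of the CASE, and PostgreSQL checks this when the query is parsed, regardless of the runtime value of _period.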
From the PostgreSQL documentation:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
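One way to satisfy that rule is to choose between two complete queries, each valid on its own, instead of mixing grouped and ungrouped expressions in one statement (in PL/pgSQL this would be an IF/ELSE around two RETURN QUERY statements). The sketch below shows only the query shapes, using Python's sqlite3 as a stand-in; the table contents and the name retrieve_test_sketch are hypothetical, and note that SQLite is lenient about bare columns in grouped queries, so it would not reproduce the PostgreSQL error itself.

```python
import sqlite3


def retrieve_test_sketch(conn, period):
    """Return (profit, bid, ask); average profit per (bid, ask) only when period is set."""
    if period is None:
        # No aggregate at all, so no GROUP BY is needed.
        sql = "select profit, bid, ask from opportunities order by bid, ask, profit"
    else:
        # Every selected column is either aggregated or listed in GROUP BY.
        sql = (
            "select avg(profit), bid, ask from opportunities "
            "group by bid, ask order by bid, ask"
        )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table opportunities(profit real, bid real, ask real)")
conn.executemany(
    "insert into opportunities values (?, ?, ?)",
    [(1.0, 10.0, 11.0), (3.0, 10.0, 11.0), (5.0, 20.0, 21.0)],
)
print(retrieve_test_sketch(conn, None))
print(retrieve_test_sketch(conn, "1 minute"))
```

As in the original function, the period value only selects the query shape here; it does not filter any rows.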

Incorrect Count when counting rows from AWS S3 bucket using Snowflake

Requirement: Get the exact count of rows in the file including NULL
Issue: COUNT ignores NULL values
Expectation: How to get a count of rows including Null values
SELECT count($1)
FROM @public.bckt/east/
(file_format=> csv,pattern=>'.*/2020/08/23/abc.csv');
Here the first column in the file has some NULL values. If there are 10 rows in the file, including rows with NULL in the first field, I would expect a count of 10. But I get 7, because 3 of them are NULL.
A little tip that I've used is to use the metadata information that Snowflake provides on SELECT from a staged file. For example,
SELECT
count(metadata$FILE_ROW_NUMBER),
max(metadata$FILE_ROW_NUMBER)
FROM @public.bckt/east/
(file_format=> csv,pattern=>'.*/2020/08/23/abc.csv');
This gives you the count and the max of the FILE_ROW_NUMBER metadata column for your file. For a single file, the two should always be equal, and neither will ever be NULL. Use either one and you'll get what you're looking for.
https://docs.snowflake.com/en/user-guide/querying-metadata.html#metadata-columns
Please try count(*) on the stage.
COUNT(*) returns the total number of rows, whereas COUNT(column_name) returns the number of rows with a non-NULL value in that column.
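A Snowflake stage can't be queried from a local script, so this sketch assumes an ordinary SQLite table (hypothetically named staged) standing in for the file: ten rows, three of which have NULL in the first column. It shows the 10 versus 7 difference between COUNT(*) and COUNT(c1).

```python
import sqlite3


def count_rows_and_values():
    """Count all rows vs. non-NULL values in the first column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table staged(c1 text, c2 text)")
    # Seven rows with a value in c1, three with NULL in c1.
    rows = [(f"v{i}", "x") for i in range(7)] + [(None, "x")] * 3
    conn.executemany("insert into staged values (?, ?)", rows)
    count_star, count_c1 = conn.execute(
        "select count(*), count(c1) from staged"
    ).fetchone()
    conn.close()
    return count_star, count_c1


print(count_rows_and_values())  # (10, 7)
```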
Finally, I figured it out:
SELECT count(*)
FROM (
SELECT $1
FROM @public.bckt/east/
(file_format=> csv,pattern=>'.*/2020/08/23/abc.csv')
);

Proc Sql case confusion

Within SAS
I have a proc-sql step that I'm using to create macro variables to do some list processing.
I have run into a confusing situation where using a CASE expression rather than a WHERE clause results in the first row of the resulting data set being a null string ('').
There are no null strings contained in either field in either table.
These are two sample SQL steps with all of the macro business removed for simplicity:
create table test as
select distinct
case
when brand in (select distinct core_brand from new_tv.core_noncore_brands) then brand
end as brand1
from new_tv.new_tv2
;
create table test2 as
select distinct brand
from new_tv.new_tv2
where brand in (select distinct core_brand from new_tv.core_noncore_brands)
;
Using the first piece of code, the result is a table with multiple rows, the first being an empty string.
The second piece of code works as expected.
Any reason for this?
So the difference is that without a WHERE clause you aren't limiting what you are selecting, i.e. every row is considered. The CASE expression can bucket items by criteria, but you don't lose rows just because your buckets don't catch everything; rows that fall into no bucket get NULL. WHERE limits the rows being returned.
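The difference can be reproduced outside SAS. This sketch assumes SQLite (via Python's sqlite3) as the engine, drops the new_tv. library prefix, and uses made-up brand data: A and B are core brands, C is not. The CASE query keeps C's row as NULL, and since NULL sorts first in ascending order here, it shows up as the first row, much like the missing value in the SAS output.

```python
import sqlite3


def case_vs_where():
    """Run the CASE version and the WHERE version against the same hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table new_tv2(brand text)")
    conn.execute("create table core_noncore_brands(core_brand text)")
    conn.executemany("insert into new_tv2 values (?)", [("A",), ("B",), ("C",), ("A",)])
    conn.executemany("insert into core_noncore_brands values (?)", [("A",), ("B",)])
    in_core = "brand in (select distinct core_brand from core_noncore_brands)"
    # CASE keeps every input row; brands that don't match become NULL.
    case_rows = conn.execute(
        f"select distinct case when {in_core} then brand end as brand1 "
        "from new_tv2 order by brand1"
    ).fetchall()
    # WHERE removes non-matching rows before the SELECT list is evaluated.
    where_rows = conn.execute(
        f"select distinct brand from new_tv2 where {in_core} order by brand"
    ).fetchall()
    conn.close()
    return case_rows, where_rows


print(case_vs_where())  # ([(None,), ('A',), ('B',)], [('A',), ('B',)])
```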
Yes, the first has no then clause in the case statement. I'm surprised that it even parses. It wouldn't in many SQL dialects.
Presumably you mean:
create table test as
select distinct
case
when brand in (select distinct core_brand from new_tv.core_noncore_brands)
then brand
end as brand1
from new_tv.new_tv2
;
The reason you are getting the NULL is that the CASE expression returns NULL for the non-matching brands. You would need to add:
where brand1 is not NULL
to prevent this (using either a subquery or making brand1 a calculated field).
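A sketch of the subquery variant, assuming SQLite via Python's sqlite3 and the same hypothetical brand data (A and B core, C not). In SAS PROC SQL itself, the alias can instead be referenced in the same query as calculated brand1.

```python
import sqlite3


def core_brands_via_subquery():
    """Filter the CASE result on its alias by wrapping it in a subquery."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table new_tv2(brand text)")
    conn.execute("create table core_noncore_brands(core_brand text)")
    conn.executemany("insert into new_tv2 values (?)", [("A",), ("B",), ("C",), ("A",)])
    conn.executemany("insert into core_noncore_brands values (?)", [("A",), ("B",)])
    rows = conn.execute(
        "select brand1 from ("
        "  select distinct case"
        "    when brand in (select core_brand from core_noncore_brands) then brand"
        "  end as brand1"
        "  from new_tv2"
        ") as t "
        # The outer WHERE can see brand1 because it is a column of the subquery.
        "where brand1 is not null order by brand1"
    ).fetchall()
    conn.close()
    return rows


print(core_brands_via_subquery())  # [('A',), ('B',)]
```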
Your first query is not correct: there is no 'then' statement in the 'case' clause.
create table test as
select distinct
case
when brand in (select distinct core_brand from new_tv.core_noncore_brands)
then value
end as brand1
from new_tv.new_tv2
;
You probably get the NULL value because there is no default (ELSE) value in the 'case' clause, so for every value that doesn't meet the condition it returns NULL. That is the difference between the 'case' clause and the 'WHERE ... IN' filter: the first returns all the rows, with NULL for the values that do not meet the condition, whereas the second returns only the rows that meet the condition.

aggregate of an empty result set

I would like the aggregates of an empty result set to be 0. I have tried the following:
SELECT SUM(COALESCE(capacity, 0))
FROM objects
WHERE null IS NOT NULL;
Result:
sum
-----
(1 row)
Subquestion: wouldn't the above work in Oracle, using SUM(NVL(capacity, 0))?
From the documentation page about aggregate functions:
It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect. The coalesce function may be used to substitute zero for null when necessary.
So, if you want to guarantee a value returned, apply COALESCE to the result of SUM, not to its argument:
SELECT COALESCE(SUM(capacity), 0) …
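To see where the COALESCE has to go, here is a sketch assuming SQLite (via Python's sqlite3) as the engine and hypothetical data in objects. With the question's always-false predicate, each query still returns exactly one row; only the outer COALESCE turns that row's NULL into 0.

```python
import sqlite3


def empty_set_sums():
    """Compare SUM, SUM(COALESCE(...)) and COALESCE(SUM(...)) over zero rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table objects(capacity integer)")
    conn.executemany("insert into objects values (?)", [(1,), (2,), (None,), (3,)])
    empty = "from objects where null is not null"  # selects no rows
    plain = conn.execute(f"select sum(capacity) {empty}").fetchall()
    inner = conn.execute(f"select sum(coalesce(capacity, 0)) {empty}").fetchall()
    outer = conn.execute(f"select coalesce(sum(capacity), 0) {empty}").fetchall()
    conn.close()
    return plain, inner, outer


print(empty_set_sums())  # ([(None,)], [(None,)], [(0,)])
```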
As for the Oracle 'subquestion': I couldn't find any mention of NULLs on the official doc page (the one for 10.2, in particular), but two other sources are unambiguous:
Oracle SQL Functions:
SUM([DISTINCT] n) Sum of values of n, ignoring NULLs
sum aggregate function [Oracle SQL]:
…if a sum() is created over some numbers, nulls are disregarded, as the following example shows…
That is, you needn't apply NVL to capacity. (But, like with COALESCE in PostgreSQL, you might want to apply it to SUM.)
The thing is, the aggregate always returns a row, even if no rows were aggregated (as is the case in your query). You summed an expression over no rows. Hence the null value you're getting.
Try this instead:
select coalesce(sum(capacity),0)
from objects
where false;
Just do this:
SELECT COALESCE( SUM(capacity), 0)
FROM objects
WHERE null IS NOT NULL;
By the way, COALESCE inside SUM is redundant: even if some capacity values are NULL, they won't make the sum NULL.
To wit:
create table objects
(
capacity int null
);
insert into objects(capacity) values (1),(2),(NULL),(3);
select sum(capacity) from objects;
That will return a value of 6, not null.
And a COALESCE inside an aggregate function is a performance killer too, as your RDBMS engine cannot just rip through all the rows; it has to evaluate the COALESCE for each row's value. I've seen an overly obsessive query where every aggregate had a COALESCE inside; I think the original developer had a case of Cargo Cult Programming, and the query was very slow. I removed the COALESCE inside SUM, and the query became fast.
Although this post is very old, I would like to share what I use in such cases:
SELECT NVL(SUM(NVL(capacity, 0)),0)
FROM objects
WHERE false;
Here the outer NVL handles the case where there are no rows in the result set. The inner NVL is used for NULL column values: consider (1 + null), which results in null. So the inner NVL is also necessary; alternatively, set a default value of 0 on the column.

How to write sqlite select statement for NULL records

I have a column which contains null values in some of the rows.
I want to do sum of the column values by writing a select statement in sqlite.
How do I write the statement so that it treats null values as 0?
My current sqlite statement: select sum(amount) from table1
gives an error because it returns null.
Please help.
According to the SQLite documentation:
The sum() and total() aggregate functions return sum of all non-NULL values in the group. If there are no non-NULL input rows then sum() returns NULL but total() returns 0.0. NULL is not normally a helpful result for the sum of no rows but the
So I guess you'd better use total() instead of sum() if you expect the result to be non-null.
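A quick check of that documented behaviour with Python's built-in sqlite3 module, using a hypothetical table1 whose amount column starts out all NULL. Note that total() always returns a floating-point value, even when every input is an integer.

```python
import sqlite3


def sum_vs_total():
    """Compare sum() and total() with only NULL inputs, then with one real value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table table1(amount integer)")
    conn.executemany("insert into table1 values (?)", [(None,), (None,)])
    all_null = conn.execute("select sum(amount), total(amount) from table1").fetchone()
    conn.execute("insert into table1 values (5)")
    mixed = conn.execute("select sum(amount), total(amount) from table1").fetchone()
    conn.close()
    return all_null, mixed


print(sum_vs_total())  # ((None, 0.0), (5, 5.0))
```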
You can use the ifnull(x,y) or coalesce(x,y,z,...) function. Each of them returns the first non-null value from its argument list, from left to right. ifnull has exactly two parameters, whereas coalesce has at least two.
select sum(ifnull(amount, 0)) from table1
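Where the ifnull() goes matters when the table has no rows at all. This sketch, assuming SQLite via Python's sqlite3 and a hypothetical table1, runs both placements on an empty table and then on one containing 2, NULL and 3: with rows present they agree, but only the outer form returns 0 for the empty table.

```python
import sqlite3


def ifnull_placement():
    """Compare ifnull() inside vs. outside sum() on an empty and a filled table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table table1(amount integer)")
    q_inner = "select sum(ifnull(amount, 0)) from table1"
    q_outer = "select ifnull(sum(amount), 0) from table1"
    empty = (conn.execute(q_inner).fetchone()[0], conn.execute(q_outer).fetchone()[0])
    conn.executemany("insert into table1 values (?)", [(2,), (None,), (3,)])
    filled = (conn.execute(q_inner).fetchone()[0], conn.execute(q_outer).fetchone()[0])
    conn.close()
    return empty, filled


print(ifnull_placement())  # ((None, 0), (5, 5))
```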