How to conditionally aggregate a column based on another - google-bigquery

A row of my data looks like this:
[someId, someBool, someInt]
I'm looking for a way to aggregate someInt (specifically, to put the values in an array).
I use a GROUP BY clause to group by the someId field, and I can then aggregate all the someInt values using ARRAY_AGG, but I only want to include rows where someBool=TRUE. What is the right way to approach this?
PS: It might be relevant to note that I have several booleans like someBool and would like to output to a different array for each one.

You can use ARRAY_AGG with IGNORE NULLS, e.g.:
ARRAY_AGG(IF(someBool IS NOT TRUE, NULL, someInt) IGNORE NULLS)
This will only aggregate the someInt values for which someBool is true. If you have multiple boolean columns that you want to use in the condition, you can AND them together, use a CASE WHEN ..., or use any other kind of condition that produces NULL in order to exclude a value.
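As a rough illustration of the "produce NULL to exclude a value" idea with one output per boolean, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for BigQuery. SQLite has no ARRAY_AGG, so group_concat (which also skips NULLs) plays that role here. The table name t, the second boolean otherBool, and the sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (someId INTEGER, someBool INTEGER, otherBool INTEGER, someInt INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 10), (1, 0, 1, 20), (1, 1, 1, 30), (2, 0, 1, 40)],
)

# CASE without ELSE yields NULL when the flag is false, and
# group_concat skips NULLs, so each output only sees matching rows.
rows = conn.execute(
    """
    SELECT someId,
           group_concat(CASE WHEN someBool  THEN someInt END) AS whenSomeBool,
           group_concat(CASE WHEN otherBool THEN someInt END) AS whenOtherBool
    FROM t
    GROUP BY someId
    ORDER BY someId
    """
).fetchall()


def to_sorted_ints(csv):
    """Turn a comma-separated string (or None) into a sorted list of ints."""
    return sorted(int(x) for x in csv.split(",")) if csv else []


result = {sid: (to_sorted_ints(a), to_sorted_ints(b)) for sid, a, b in rows}
print(result)
```

In BigQuery itself the same shape would presumably be one ARRAY_AGG(IF(...) IGNORE NULLS) expression per boolean column.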

Related

SQL or function

In SQL it's common to take the SUM of all values in a column:
SELECT SUM(column_name)
FROM table_name
WHERE condition;
Can the same thing be done, but with the values "OR'd" together instead of summed? This would only work if the attribute was boolean, of course.
MAX(booleancolumn) will return the values OR'd together (since TRUE > FALSE).
Note that this is not quite equivalent to OR if NULL values are involved: MAX ignores NULLs, whereas FALSE OR NULL evaluates to NULL.
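To make the MAX-as-OR idea and the NULL caveat concrete, here is a small sketch using Python's sqlite3 module. SQLite stores booleans as 0 and 1, which is an assumption of this example; the table flags and its contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags (grp TEXT, flag INTEGER)")
conn.executemany(
    "INSERT INTO flags VALUES (?, ?)",
    [("a", 0), ("a", 1), ("b", 0), ("b", 0), ("c", 0), ("c", None)],
)

# MAX over 0/1 values behaves like OR within each group.
or_by_group = dict(
    conn.execute("SELECT grp, MAX(flag) FROM flags GROUP BY grp").fetchall()
)

# A scalar OR with NULL behaves differently from MAX, which skips NULLs.
false_or_null = conn.execute("SELECT 0 OR NULL").fetchone()[0]

print(or_by_group, false_or_null)
```

Group c shows the difference: MAX returns 0 because the NULL is skipped, while the scalar expression 0 OR NULL comes back as NULL (None in Python).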
There are two challenges here:
Not every SQL database has a true boolean type.
This isn't part of any of the built-in aggregate functions.
However, some databases will allow you to define your own functions, including aggregate functions. So it is at least possible.

A constant expression was encountered in the ORDER BY list, position 1

I tried to concatenate two strings using an SQL query. Below is my code, which is not working.
SELECT TOP 100 CONCAT('James ','Stephen') AS [Column1]
FROM [dbo].[ORDERS]
Group BY ()
ORDER BY CONCAT('James ','Stephen') ASC
If I use [Column1] instead of CONCAT('James ','Stephen') in the ORDER BY clause, it seems to work.
SELECT TOP 100 CONCAT('James ','Stephen') AS [Column1]
FROM [dbo].[ORDERS]
Group by ()
ORDER BY [Column1] ASC
Can anyone explain why the first query did not work?
The ORDER BY clause is meant to be used with columns from the underlying tables. You cannot order by constants.
This is explained in the documentation:
Specifies a column or expression on which to sort the query result
set. A sort column can be specified as a name or column alias, or a
nonnegative integer representing the position of the column in the
select list.
ORDER BY documentation
Note that you can use an expression, such as ORDER BY CONCAT(field1, field2), but it makes no sense to try to sort by a hard-coded string, which would obviously be the same for every record.
You can get around this by referencing the alias, but this is not very useful.
From documentation
A sort column can be specified as a name or column alias, or a nonnegative integer representing the position of the column in the select list.
Multiple sort columns can be specified. Column names must be unique. The sequence of the sort columns in the ORDER BY clause defines the organization of the sorted result set. That is, the result set is sorted by the first column and then that ordered list is sorted by the second column, and so on.
However, in the first query you specified neither a column name, nor a column alias, nor the position of a column in the select list.

How to exclude null values from Count() function in reporting services 2005?

I am creating a report that has a pareto-like graph and a table of Order Types and how many units of each order type there are. The subset returned from the stored procedure that I am using includes a field called WorkItemId, and if that value is null that means that item isn't to be counted. How should I count Order Types in the report without including the values that have the null WorkItemId? Right now I am using the expression:
Count(Fields!OrderType.Value)
to count each unit for a specific order type.
Thanks!
EDIT: WorkItemId is what cannot be null to be counted, not Order Type
Null values in WorkItemId are needed in other reports, so I can't just simply filter them in SQL.
You can use something like
Sum(IIF(IsNothing(Fields!WorkItemId.Value),0,1))
Use a WHERE clause in your SQL query:
where WorkItemId is not null
Hope it helps.
Figured out my own question...Again
Since the row in my table is already being grouped by order type I just needed to put
Fields!WorkItemId.Value
into the Count function rather than OrderType, since Count() automatically discards nulls.
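The same two ideas (counting a nullable field directly, or summing a 0/1 flag) can be tried out in plain SQL. The sketch below uses Python's sqlite3 module rather than Reporting Services, so it only illustrates the counting logic, not the report expression syntax; the table work_items and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_items (OrderType TEXT, WorkItemId INTEGER)")
conn.executemany(
    "INSERT INTO work_items VALUES (?, ?)",
    [("Rush", 1), ("Rush", None), ("Rush", 3), ("Standard", None), ("Standard", 5)],
)

# COUNT(column) skips NULLs; the SUM over a 0/1 flag gives the same number.
counts = conn.execute(
    """
    SELECT OrderType,
           COUNT(*)          AS all_rows,
           COUNT(WorkItemId) AS counted,
           SUM(CASE WHEN WorkItemId IS NULL THEN 0 ELSE 1 END) AS summed
    FROM work_items
    GROUP BY OrderType
    ORDER BY OrderType
    """
).fetchall()
print(counts)
```

Because the NULL rows stay in the data set, other reports that need them are unaffected.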

(sql) how can I use count() method when data type is text?

select count(category) from list where category like 'action'
Above is the query I want to run. However, when I run it, I get a
data type error.
Is there an alternative to count()? Or how can I use count()
when the data type is text?
You can't apply the COUNT() function to the text, ntext, or image data types.
Why can't you use:
select count(*) from list where category like 'action'
Do you have some NULLs?
If you don't have to exclude NULL values, the above query should already work.
To answer the question in the title for future googlers, you can use
SELECT COUNT(CASE
WHEN some_text_column IS NOT NULL THEN 1
END)
FROM some_table
In your case though (as pointed out in the comments by @hvd), the WHERE category LIKE 'action' predicate ensures that any NULL values will be excluded anyway, so it is safe to replace it with COUNT(*).
Moreover, you almost certainly should not be using the text datatype for this. It is a deprecated datatype that was intended for holding LOB (Large Object) data over 8000 bytes.
Strings like "action" definitely do not fit this description!
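A quick way to convince yourself that the LIKE predicate already drops NULL rows is to run all three counts side by side. This is only a sketch in Python's sqlite3 module, not SQL Server, so the text-datatype error itself is not reproduced; the table name titles and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (category TEXT)")
conn.executemany(
    "INSERT INTO titles VALUES (?)",
    [("action",), ("action",), (None,), ("drama",)],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# NULL LIKE 'action' is not true, so NULL rows never pass the filter.
count_star = scalar("SELECT COUNT(*) FROM titles WHERE category LIKE 'action'")
count_col = scalar("SELECT COUNT(category) FROM titles WHERE category LIKE 'action'")

# Without a filter, the CASE form counts only the non-NULL rows.
count_not_null = scalar(
    "SELECT COUNT(CASE WHEN category IS NOT NULL THEN 1 END) FROM titles"
)
print(count_star, count_col, count_not_null)
```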
count() always returns an integer type.
Read the result with an integer accessor, e.g. getInt(1).

aggregate of an empty result set

I would like the aggregates of an empty result set to be 0. I have tried the following:
SELECT SUM(COALESCE(capacity, 0))
FROM objects
WHERE null IS NOT NULL;
Result:
sum
-----
(1 row)
Subquestion: wouldn't the above work in Oracle, using SUM(NVL(capacity, 0))?
From the documentation page about aggregate functions:
It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not zero as one might expect. The coalesce function may be used to substitute zero for null when necessary.
So, if you want to guarantee a value returned, apply COALESCE to the result of SUM, not to its argument:
SELECT COALESCE(SUM(capacity), 0) …
As for the Oracle 'subquestion', well, I couldn't find any mention of NULLs on the official doc page (the one for 10.2, in particular), but two other sources are unambiguous:
Oracle SQL Functions:
SUM([DISTINCT] n) Sum of values of n, ignoring NULLs
sum aggregate function [Oracle SQL]:
…if a sum() is created over some numbers, nulls are disregarded, as the following example shows…
That is, you needn't apply NVL to capacity. (But, like with COALESCE in PostgreSQL, you might want to apply it to SUM.)
The thing is, the aggregate always returns a row, even if no rows were aggregated (as is the case in your query). You summed an expression over no rows. Hence the null value you're getting.
Try this instead:
select coalesce(sum(capacity),0)
from objects
where false;
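The difference between putting COALESCE inside and outside the aggregate is easy to check on an empty selection. Here is a minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; the sample rows are invented, and the always-false WHERE clause is the one from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (capacity INTEGER)")
conn.executemany("INSERT INTO objects VALUES (?)", [(1,), (2,), (3,)])


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# The WHERE clause is never true, so every aggregate sees zero rows.
empty = "FROM objects WHERE NULL IS NOT NULL"

inner = scalar(f"SELECT SUM(COALESCE(capacity, 0)) {empty}")  # COALESCE never runs
outer = scalar(f"SELECT COALESCE(SUM(capacity), 0) {empty}")  # replaces the NULL sum
count = scalar(f"SELECT COUNT(capacity) {empty}")             # COUNT is the exception
print(inner, outer, count)
```

The inner form still yields NULL (None in Python) because there is no row for COALESCE to be applied to, while the outer form and COUNT both yield 0.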
Just do this:
SELECT COALESCE( SUM(capacity), 0)
FROM objects
WHERE null IS NOT NULL;
By the way, COALESCE inside SUM is redundant: even if capacity is NULL in some rows, it won't make the sum NULL.
To wit:
create table objects
(
capacity int null
);
insert into objects(capacity) values (1),(2),(NULL),(3);
select sum(capacity) from objects;
That will return a value of 6, not null.
A COALESCE inside an aggregate function can also hurt performance, because the RDBMS engine cannot just rip through all the rows; it has to check each row's value for NULL. I once saw a query in which every aggregate had a COALESCE inside it, apparently a case of cargo cult programming, and it was very slow. After I removed the COALESCE from inside SUM, the query became fast.
Although this post is very old, I would like to add what I use in such cases:
SELECT NVL(SUM(NVL(capacity, 0)),0)
FROM objects
WHERE false;
Here the outer NVL covers the case where there are no rows in the result set. The inner NVL guards against NULL column values. SUM itself already skips NULLs, so the inner NVL only matters when the summed value is an expression such as (1 + NULL), which evaluates to NULL. Alternatively, set a default value of 0 on the column.
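To see where an inner NULL guard actually changes the result, the sketch below compares a plain column sum with a sum over an expression. It uses Python's sqlite3 module with COALESCE in place of Oracle's NVL (an assumption), and the extra column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (capacity INTEGER, extra INTEGER)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?)",
    [(1, 10), (2, None), (None, 30), (3, 10)],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Plain column: SUM skips the NULL row, so the inner guard changes nothing.
plain = scalar("SELECT SUM(capacity) FROM objects")
guarded = scalar("SELECT SUM(COALESCE(capacity, 0)) FROM objects")

# Expression: capacity + extra is NULL if either side is NULL, so those
# rows drop out of the sum unless each operand is guarded.
expr_plain = scalar("SELECT SUM(capacity + extra) FROM objects")
expr_guarded = scalar(
    "SELECT SUM(COALESCE(capacity, 0) + COALESCE(extra, 0)) FROM objects"
)
print(plain, guarded, expr_plain, expr_guarded)
```

For the single-column case the two forms agree, so only the outer guard is needed to turn an empty result into 0.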