Correctly Migrate Postgres least() Behavior to BigQuery - sql

I'm attempting to migrate postgres scripts over to bigquery, with the end goal of both scripts returning the exact same tables (schema and values).
I'm running into an issue when trying to replicate the behavior of least() in postgres in my bigquery selects.
In postgres, if any parameters of the least() call are null, they are skipped and the least non-null value is returned. In bigquery, however, if any of the parameters of the least() call are null, the function automatically returns null.
I'm looking for an elegant solution to replicate the postgres least() behavior in bigquery. My current—clunky—solution is below:
Postgres (returns -1):
SELECT LEAST(1, 0, -1, null)
BigQuery (returns null):
SELECT LEAST(1, 0, -1, null)
Postgres (returns -1):
SELECT LEAST(COALESCE(1, 0, -1, null),
COALESCE(0, 1, -1, null),
COALESCE(-1, 0, 1, null),
COALESCE(null, 0, -1, 1))
BigQuery (returns -1):
SELECT LEAST(COALESCE(1, 0, -1, null),
COALESCE(0, 1, -1, null),
COALESCE(-1, 0, 1, null),
COALESCE(null, 0, -1, 1))
This works but is a less-than-ideal solution.
In the original postgres script I need to migrate, there is nested logic like least(w, x, least(y, z)) so that fix gets exponentially more unreadable as the number of values/complexity grows. That same issue applies when you try to do this as a massive CASE block.
If anyone has an obvious fix that I'm missing or a more elegant way to mirror the postgres behavior in bigquery, it is much appreciated!

There is a simple workaround in BigQuery Standard SQL: create your own function (let's say myLeast) that takes an array and returns MIN over its elements.
Because MIN skips NULLs, it works standalone as well as in nested scenarios.
#standardSQL
CREATE TEMP FUNCTION myLeast(x ARRAY<INT64>) AS
((SELECT MIN(y) FROM UNNEST(x) AS y));
SELECT
LEAST(1, 0, -1, NULL) AS least_standard,
LEAST(COALESCE(1, 0, -1, NULL),
COALESCE(0, 1, -1, NULL),
COALESCE(-1, 0, 1, NULL),
COALESCE(NULL, 0, -1, 1)) AS least_less_than_ideal,
myLeast([1, 0, -1, NULL]) AS least_workaround,
myLeast([1, 0, -1, NULL, myLeast([2, 0, -2, NULL])]) AS least_with_nested
Output is
least_standard  least_less_than_ideal  least_workaround  least_with_nested
null            -1                     -1                -2
The first two columns are from your question; the third and fourth are the standalone and nested workarounds.
Hope you can apply this approach to your specific case
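The function above is declared for ARRAY<INT64> only. As a minimal sketch, assuming BigQuery's templated SQL UDF parameters (ANY TYPE) are available in your project, the same idea can be written once for any comparable element type, together with a GREATEST counterpart. The names myLeastAny and myGreatestAny are hypothetical, not part of the answer above.

```sql
-- Hypothetical helper names; ANY TYPE makes the UDF templated,
-- so the same body serves INT64, FLOAT64, DATE, and so on.
CREATE TEMP FUNCTION myLeastAny(x ANY TYPE) AS
((SELECT MIN(y) FROM UNNEST(x) AS y));   -- MIN skips NULL elements

CREATE TEMP FUNCTION myGreatestAny(x ANY TYPE) AS
((SELECT MAX(y) FROM UNNEST(x) AS y));   -- MAX skips NULL elements

SELECT
  myLeastAny([1, 0, -1, NULL])     AS least_int,
  myLeastAny([1.5, NULL, 0.25])    AS least_float,
  myGreatestAny([1, 0, -1, NULL])  AS greatest_int;
```

As in PostgreSQL, the result is NULL only when every element of the array is NULL, because MIN and MAX over zero non-NULL values return NULL.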

Both Oracle and Vertica behave the same as BigQuery, following the general rule of SQL functions: if one of the arguments is NULL, the result is NULL. PostgreSQL makes an exception to that rule, explicitly stating in its documentation:
The result will be NULL only if all the expressions evaluate to NULL.
Note that GREATEST and LEAST are not in the SQL standard, but are a
common extension. Some other databases make them return NULL if any
argument is NULL, rather than only when all are NULL.
I would open a feature request in the BigQuery issue tracker to add an IGNORE NULLS parameter to LEAST and GREATEST for PostgreSQL-compatible behavior. Even though IGNORE NULLS normally applies only to aggregate functions, LEAST and GREATEST are similar to aggregate functions.

Without a function:
select
(select min(col) from unnest([a,b,c,d,e]) col) least,
(select max(col) from unnest([a,b,c,d,e]) col) greatest,
*
from
(
select 1 a, 2 b, 3 c, null d, 5 e
union all
select null a, null b, null c, null d, null e
) tbl

Maybe something like this could work?
WITH tbl AS(
SELECT 1 AS a, 2 AS b
UNION ALL SELECT NULL, 2
UNION ALL SELECT 1, NULL
UNION ALL SELECT NULL, NULL
)
SELECT
tbl.*
, COALESCE( LEAST(a, b), a , b)
FROM tbl
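One caveat worth spelling out: the COALESCE(LEAST(a, b), a, b) pattern is exact for two arguments, but it does not extend to three or more by simply appending columns. A small sketch in BigQuery Standard SQL, with made-up values, shows the difference:

```sql
WITH tbl AS (
  SELECT 3 AS a, 1 AS b, CAST(NULL AS INT64) AS c
)
SELECT
  -- LEAST(3, 1, NULL) is NULL, so COALESCE falls back to the first
  -- non-NULL column (a = 3), which is not the minimum.
  COALESCE(LEAST(a, b, c), a, b, c) AS naive_three_args,
  -- MIN over the unnested array skips the NULL and returns 1.
  (SELECT MIN(v) FROM UNNEST([a, b, c]) AS v) AS null_skipping_min
FROM tbl
```

For more than two arguments, the pattern has to be nested pairwise, or replaced by the UNNEST approach.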

How about this? :) "The Postgres library" :)
DECLARE input STRING DEFAULT (
WITH t AS
(
SELECT 1 a, 0 b, -1 c, null d
UNION ALL
SELECT 0, 1, -1, null
UNION ALL
SELECT -1, 0, 1, null
UNION ALL
SELECT null, 0, -1, 1
)
SELECT '['||STRING_AGG("'"||TO_JSON_STRING(t)||"'")||']' FROM t
)
;
EXECUTE IMMEDIATE '''
SELECT * FROM EXTERNAL_QUERY("project.location.connection",
\'\'\'
SELECT
GREATEST (
(t::json->>'a')::INT,
(t::json->>'b')::INT,
(t::json->>'c')::INT,
(t::json->>'d')::INT
)
FROM
UNNEST (ARRAY '''||input||") AS t ''')"
Of course it could be improved with a dynamically generated "body" for GREATEST, but I hope nobody will use it.
It's too sad to live in a world with no IGNORE NULLS for GREATEST and LEAST... and where you cannot pass a string variable directly into EXTERNAL_QUERY 😔 😔 😔
Questions to the community:
Does anyone know the limitations of this "method"?
How long a string can EXECUTE IMMEDIATE execute?
How long a string can be passed as an argument to EXTERNAL_QUERY?

Related

Concise test for zero or NULL for multiple columns

I would like to test multiple columns for positive values and return the results as a binary string. A single test yields 0 if the column is either 0 or NULL. Say we have three conditions A, B, and C, each returning either 0 or 1. A result of 101 means that A and C hold positive numbers while B is either zero or NULL.
This gets what I want:
SELECT
format(
iif(coalesce([A], 0) > 0, 100, 0)
+ iif(coalesce([B], 0) > 0, 10, 0)
+ iif(coalesce([C], 0) > 0, 1, 0)
, '000'
)
Is there a more concise way to achieve the goal, perhaps avoiding COALESCE?
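I'd just go for CONCAT, if I am honest; it's far more performant than FORMAT. You can also remove the COALESCE, as it isn't needed: NULL > 0 does not evaluate to true, so IIF returns its false branch (0) for NULL as well.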
SELECT CONCAT(IIF(A>0,1,0),IIF(B>0,1,0),IIF(C>0,1,0))
FROM (VALUES(1,NULL,2),
(-1,12,18),
(-1,0,1),
(NULL,NULL,-4))V(A,B,C);
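The reason COALESCE can be dropped is SQL's three-valued logic: a comparison with NULL is UNKNOWN rather than TRUE, and IIF takes its false branch for anything that is not TRUE. A minimal T-SQL sketch (the column aliases are illustrative only):

```sql
SELECT
    IIF(CAST(NULL AS int) > 0, 1, 0) AS null_input      -- 0: comparison is UNKNOWN
  , IIF(0 > 0, 1, 0)                 AS zero_input      -- 0: comparison is FALSE
  , IIF(5 > 0, 1, 0)                 AS positive_input; -- 1: comparison is TRUE
```

NULL and zero therefore collapse into the same digit without any extra handling.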

Access query to SQL Server view

I am trying to convert an Access query to a SQL Server view. I have 3 boolean columns in the Access query MedTypeHealth, MedTypeSocial and MedTypeEducation.
In Access I use this formula:
MedType: IIf([MedTypeHealth],"H"," ") & IIf([MedTypeSocial],"S"," ") & IIf([MedTypeEducation],"E"," ")
If all 3 flags are set, this returns 'HSE', and if only the Health flag is set, I get 'H'.
I have the same columns in the SQL Server view.
How can I get the equivalent result in a SQL Server view? What T-SQL functions and code should I use?
SQL Server has no boolean datatype, so you would probably use a small integer value, or a bit.
Then, I would recommend concat_ws() and conditional expressions: it happily ignores null values, which simplifies the case expressions:
concat_ws('',
case when MedTypeHealth = 1 then 'H' end,
case when MedTypeSocial = 1 then 'S' end,
case when MedTypeEducation = 1 then 'E' end
) as MedType
You can use the same IIF function, with a few changes:
Replace double quotes with single quotes to indicate strings in SQL Server.
Replace & with + to concatenate strings in SQL Server.
State the comparison explicitly, like [MedTypeHealth]=1, in SQL Server.
SQL Server does not have a boolean type. You can use bit (which is 0 or 1), but you cannot apply SUM() to it. If you need SUM(), either use another numeric type (tinyint, smallint, etc.) or do an explicit cast: cast(MedTypeSocial as smallint)
Here is an example you can use:
CREATE TABLE [MyTestTable1] (
[MedTypeHealth] bit
, [MedTypeSocial] bit
, [MedTypeEducation] bit
)
INSERT INTO [MyTestTable1]
VALUES
( 1, 0, 1)
, ( 0, 0, 1)
, ( 1, 1, 1)
, ( 1, 0, 0)
, ( 0, 0, 0)
SELECT
*
, IIf([MedTypeHealth]=1,'H',' ') + IIf([MedTypeSocial]=1,'S',' ') + IIf([MedTypeEducation]=1,'E',' ') WithSpace
, IIf([MedTypeHealth]=1,'H','') + IIf([MedTypeSocial]=1,'S','') + IIf([MedTypeEducation]=1,'E','') WithoutSpace
FROM
[MyTestTable1]
---- drop when done. uncomment first
-- DROP TABLE [MyTestTable1]
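Since the question asks for a view, the expression above could be wrapped as follows. This is only a sketch: the view name MedTypeView is hypothetical, and it reads from the test table created in the example.

```sql
-- Hypothetical view name; CREATE VIEW must be the first statement in its batch.
CREATE VIEW [MedTypeView] AS
SELECT
    [MedTypeHealth]
  , [MedTypeSocial]
  , [MedTypeEducation]
    -- Mirrors the Access formula, padding unset flags with a space.
  , IIF([MedTypeHealth] = 1, 'H', ' ')
    + IIF([MedTypeSocial] = 1, 'S', ' ')
    + IIF([MedTypeEducation] = 1, 'E', ' ') AS MedType
FROM [MyTestTable1];
```

If the flag columns can be NULL, the IIF comparisons treat NULL like 0, so such rows get a space in that position.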

SQL SUM to ignore NULL value

I have a table TEST_TABLE as follows:
Name  x_col  y_col
=======================
Jay   NULL   2
This is a simplistic representation of a much larger issue but will suffice.
When I do the following query I get NULL returned
SELECT SUM(x_col + y_col) FROM TEST_TABLE WHERE Name='Jay'
I want it to be 2. I thought the SUM() method ignores NULL values. How can I ignore values that are null in this query? Or actually in general, as this is a problem for a lot of my algorithms.
You get NULL because NULL + 2 returns NULL. The SUM() has only one row, and if the + expression is NULL, then the SUM() returns NULL.
If you want NULL to be treated as 0, then use COALESCE():
SELECT SUM(COALESCE(x_col, 0) + COALESCE(y_col, 0))
FROM TEST_TABLE
WHERE Name = 'Jay';
One final note: if the WHERE clause filters out all rows, the result will still be NULL. To get 0, you need an additional COALESCE():
SELECT COALESCE(SUM(COALESCE(x_col, 0) + COALESCE(y_col, 0)), 0)
FROM TEST_TABLE
WHERE Name = 'Jayden';
Use COALESCE to replace NULL with 0.
SELECT sum(coalesce(x_col, 0) + coalesce(y_col, 0)) FROM TEST_TABLE WHERE Name='Jay'
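To make the distinction concrete: SUM() does skip NULL inputs, but here the NULL is produced by the + inside the argument, before SUM() ever sees it. The sketch below uses two made-up rows in an inline VALUES list (this form should run on PostgreSQL and SQL Server; other engines may need a different way to build the test rows).

```sql
SELECT
    SUM(x_col + y_col)                           AS naive_sum      -- 7: the (NULL, 2) row is dropped entirely
  , SUM(COALESCE(x_col, 0) + COALESCE(y_col, 0)) AS coalesced_sum  -- 9: NULL counted as 0
  , COALESCE(SUM(x_col), 0)
    + COALESCE(SUM(y_col), 0)                    AS sum_of_sums    -- 9: each SUM skips its own NULLs
FROM (VALUES ('Jay', NULL, 2),
             ('Jay', 3,    4)) AS t(name, x_col, y_col);
```

With more than one row, the naive form does not return NULL but silently loses the 2 from the first row, which can be harder to notice than a NULL result.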

Db2 locate_in_string equivalent in PostgreSQL

While migrating from Db2 to PostgreSQL, I found some views using Db2's locate_in_string() function, which returns the position of a specified instance of a given substring.
For example:
LOCATE_IN_STRING('aaabaabbaaaab','b',1,3); -- returns 8, for the 3rd instance of 'b'
LOCATE_IN_STRING('aaabaabbaaaab','b',1,1); -- returns 4, for the 1st instance of 'b'
Unfortunately, PostgreSQL's position() function gives me only the position of the first instance.
I didn't find anything similar in PostgreSQL.
Is there any alternative or workaround (maybe regex)?
There may be a different method; this one is rather brute force.
It splits the string on the pattern you are looking for, then adds up the lengths of the pieces:
select v.*,
(select coalesce(sum(length(el)), 0) + count(*) * length(v.splitter)
from unnest( (regexp_split_to_array(v.val, v.splitter))[1:v.n] ) el
) as pos
from (values ('aaabaabbaaaab', 3, 'b'), ('aaabaabbaaaab', 1, 'b')
) v(val, n, splitter);
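The same splitting idea could be packaged as a reusable PostgreSQL function. The sketch below is an assumption-laden variant, not a drop-in replacement for Db2's function: the name locate_nth is hypothetical, the search always starts at position 1, matches are non-overlapping, and it uses string_to_array so the search string is taken literally rather than as a regular expression. It returns 0 when the string contains fewer than n occurrences.

```sql
-- Hypothetical helper; assumes a non-empty needle and n >= 1.
CREATE FUNCTION locate_nth(val text, needle text, n int) RETURNS int
LANGUAGE sql IMMUTABLE AS $$
  SELECT CASE
           -- k occurrences split the string into k + 1 pieces
           WHEN cardinality(parts) > n
           THEN (SELECT coalesce(sum(length(p)), 0)::int
                 FROM unnest(parts[1:n]) AS p)   -- text before the nth match, minus earlier matches
                + (n - 1) * length(needle)       -- the earlier matches themselves
                + 1                              -- 1-based start of the nth match
           ELSE 0
         END
  FROM (SELECT string_to_array(val, needle) AS parts) s
$$;

SELECT locate_nth('aaabaabbaaaab', 'b', 3);  -- 8
SELECT locate_nth('aaabaabbaaaab', 'b', 1);  -- 4
SELECT locate_nth('aaabaabbaaaab', 'b', 5);  -- 0, only four occurrences exist
```

Adding (n - 1) * length(needle) + 1 instead of n * length(needle) keeps the result pointing at the start of the match when the search string is longer than one character.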

SQL minus 2 columns - with null values

I have this table (made from a SQL query):
Row 1 Row 2
2 1
3 NULL
And I want to subtract the 2 columns, so I just select like this:
Select Row1 - Row2
From table
But then I get this result:
1
NULL
instead of:
1
3
How can I get the last result?
Please try:
SELECT ISNULL([Row 1], 0) - ISNULL([Row 2], 0) from YourTable
For more Information visit ISNULL
You got this because any mathematical operation with NULL produces NULL, so each NULL has to be read as 0 before the operation.
With ISNULL(), that becomes:
SELECT ISNULL([Row 1], 0) - ISNULL([Row 2], 0) from YourTable
The MySQL equivalent of ISNULL is IFNULL
If expr1 is not NULL, IFNULL() returns expr1; otherwise it returns
expr2.
Maybe also look at SQL NULL Functions
In MySQL, ISNULL() only checks whether a value is NULL:
If expr is NULL, ISNULL() returns 1, otherwise it returns 0.
In SQL, anything minus NULL is always NULL, so you need to convert NULL to zero:
SELECT ISNULL(ROW1,0)-ISNULL(ROW2,0) FROM YOUR_TABLE
Select Row1 - COALESCE(Row2,0)
From table
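The answers above differ in which operand they protect. As a small sketch with made-up rows (the third row, with a NULL minuend, is not in the question; the inline VALUES form should run on PostgreSQL and SQL Server):

```sql
SELECT
    row1
  , row2
  , row1 - row2                           AS plain           -- 1, NULL, NULL
  , row1 - COALESCE(row2, 0)              AS right_side_only -- 1, 3, NULL
  , COALESCE(row1, 0) - COALESCE(row2, 0) AS both_sides      -- 1, 3, -4
FROM (VALUES (2,    1),
             (3,    NULL),
             (NULL, 4)) AS t(row1, row2);
```

Wrapping only the second column is enough for the data shown in the question, but the result is still NULL whenever the first column is NULL.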