Any CONCAT() variation that tolerates NULL values? - sql

CONCAT() returns NULL when any value is NULL. I have to use IFNULL() to
wrap all fields passed to CONCAT(). Is there a CONCAT() variation that just
ignores NULL?
For example:
#standardSQL
WITH data AS (
SELECT 'a' a, 'b' b, CAST(null AS STRING) nu
)
SELECT CONCAT(a, b, nu) concatenated, ARRAY_TO_STRING([a,b,nu], ',') w_array_to_string
FROM `data`
--->
null

A quick jam session on the interesting theme in the question.
There is a potentially unlimited number of real-life use cases here.
Below are a few variations:
#standardSQL
WITH data AS (
SELECT 'a' a, 'b' b, 'c' c UNION ALL
SELECT 'y', 'x', NULL UNION ALL
SELECT 'v', NULL, 'w'
)
SELECT
*,
CONCAT(a, b, c) by_concat,
ARRAY_TO_STRING([a,b,c], '') by_array_to_string,
ARRAY_TO_STRING([a,b,c], '', '?') with_null_placeholder,
ARRAY_TO_STRING(
(SELECT ARRAY_AGG(col ORDER BY col DESC)
FROM UNNEST([a,b,c]) AS col
WHERE NOT col IS NULL)
, '') with_order
FROM `data`
The output is:
a  b     c     by_concat  by_array_to_string  with_null_placeholder  with_order
-  ----  ----  ---------  ------------------  ---------------------  ----------
y  x     null  null       yx                  yx?                    yx
a  b     c     abc        abc                 abc                    cba
v  null  w     null       vw                  v?w                    wv
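To make the NULL handling in the variations above easier to follow, here is a minimal Python sketch that models the semantics only. It is not BigQuery code: the function names concat, array_to_string and with_order are hypothetical helpers written for this illustration, and None stands in for NULL.

```python
from typing import Optional, Sequence


def concat(*values: Optional[str]) -> Optional[str]:
    """Model of CONCAT(): the result is NULL (None) if any argument is NULL."""
    if any(v is None for v in values):
        return None
    return "".join(values)


def array_to_string(values: Sequence[Optional[str]], delimiter: str,
                    null_text: Optional[str] = None) -> str:
    """Model of ARRAY_TO_STRING(): NULLs are skipped unless null_text is given."""
    if null_text is None:
        kept = [v for v in values if v is not None]
    else:
        kept = [null_text if v is None else v for v in values]
    return delimiter.join(kept)


def with_order(values: Sequence[Optional[str]]) -> str:
    """Model of the ARRAY_AGG(col ORDER BY col DESC) variation:
    drop NULLs, sort descending, then join without a delimiter."""
    return "".join(sorted((v for v in values if v is not None), reverse=True))


# Same three rows as the sample data in the query above.
rows = [("a", "b", "c"), ("y", "x", None), ("v", None, "w")]
for row in rows:
    print(row, concat(*row), array_to_string(row, ""),
          array_to_string(row, "", "?"), with_order(row))
```

The sort here uses Python's default string ordering, which agrees with the query's ORDER BY for these single-letter ASCII values but is only an approximation in general.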

Use ARRAY_TO_STRING([col1, col2, ...]) instead:
#standardSQL
WITH data AS (
SELECT 'a' a, 'b' b, CAST(null AS STRING) nu
)
SELECT ARRAY_TO_STRING([a,b,nu], '') w_array_to_string
FROM `data`
--->
ab

Related

How to check in SQL if multi columnar set is in the table (without string concatenation)

Let's assume I've 3 columns in a table with values like this:
table_1:
A | B | C
-----------------------
'xx' | '' | 'y'
'x' | 'y' | 'x'
'x' | 'x' | 'y'
'x' | 'yy' | ''
'x' | '' | 'yy'
'x' | 'y' | 'y'
I have a result set (the result of an SQL SELECT statement), and I want to find which of its rows exist in the table above:
[
('x', 'x', 'y')
('x', 'y', 'y')
]
If I compared the results of simple string concatenation, e.g. SELECT concat(A, B, C) FROM table_1, this result set would match 5 of the 6 rows in the table above instead of the correct 2.
I could solve this problem by comparing the results of a more complex string concatenation such as: SELECT concat('A=', A, '_', 'B=', B, '_', 'C=', C )
BUT:
I don't want to use any hardcoded special separator in a string concatenation like _ or =
  because any character might be in the data
  e.g.: somewhere in column B there might be this value: xx_C=yy
  it's not a clean solution
I don't want to use string concatenation at all, because it's an ugly solution
  it makes the "distance" between the attributes disappear
  not general enough
  maybe I've columns with different datatypes I don't want to convert to a STRING based column
Question:
Is it possible to solve this problem somehow without using string concatenation?
Is there a simple solution for this multi-column value checking problem?
I want to solve this in BigQuery, but I'm interested in a general solution for every relational database/data warehouse.
Thank you!
CREATE TABLE test.table_1 (
A STRING,
B STRING,
C STRING
) AS
SELECT * FROM (
SELECT 'xx', '', 'y'
UNION ALL
SELECT 'x', 'y', 'x'
UNION ALL
SELECT 'x', 'x', 'y'
UNION ALL
SELECT 'x', 'yy', ''
UNION ALL
SELECT 'x', '', 'yy'
UNION ALL
SELECT 'x', 'y', 'y'
)
SELECT A, B, C
FROM test.table_1
WHERE (A, B, C) IN ( -- I need this functionality
SELECT 'x', 'x', 'y'
UNION ALL
SELECT 'x', 'y', 'y'
);
Below is the most generic way I can think of (BigQuery Standard SQL):
#standardSQL
SELECT *
FROM `project.test.table1` t
WHERE t IN (
SELECT t
FROM `project.test.table2` t
)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.test.table1` AS (
SELECT 'xx' a, '' b, 'y' c UNION ALL
SELECT 'x', 'y', 'x' UNION ALL
SELECT 'x', 'x', 'y' UNION ALL
SELECT 'x', 'yy', '' UNION ALL
SELECT 'x', '', 'yy' UNION ALL
SELECT 'x', 'y', 'y'
), `project.test.table2` AS (
SELECT 'x' a, 'x' b, 'y' c UNION ALL
SELECT 'x', 'y', 'y'
)
SELECT *
FROM `project.test.table1` t
WHERE t IN (
SELECT t
FROM `project.test.table2` t
)
with output
Row a b c
1 x x y
2 x y y
Use join:
SELECT t1.*
FROM test.table_1 t1 JOIN
(SELECT 'x' as a, 'x' as b, 'y' as c
UNION ALL
SELECT 'x', 'y', 'y'
) t2
USING (a, b, c);
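For readers who want to try the tuple comparison outside BigQuery, the sketch below runs the question's sample data through SQLite using Python's sqlite3 module. This is an illustration under assumptions: SQLite is not one of the engines discussed above, and its row-value syntax (a, b, c) IN (subquery) requires SQLite 3.15 or later. The variable names by_concat and by_row_value are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [("xx", "", "y"), ("x", "y", "x"), ("x", "x", "y"),
     ("x", "yy", ""), ("x", "", "yy"), ("x", "y", "y")],
)

# Concatenation erases the column boundaries, so unrelated rows collide.
by_concat = conn.execute(
    "SELECT COUNT(*) FROM table_1 WHERE a || b || c IN ('xxy', 'xyy')"
).fetchone()[0]

# A row-value comparison keeps each column separate.
by_row_value = conn.execute(
    """
    SELECT a, b, c
    FROM table_1
    WHERE (a, b, c) IN (
        SELECT 'x', 'x', 'y'
        UNION ALL
        SELECT 'x', 'y', 'y'
    )
    ORDER BY a, b, c
    """
).fetchall()

print(by_concat)      # number of rows matched by the concatenation approach
print(by_row_value)   # rows matched column by column
```

The concatenation count reproduces the 5-of-6 false matches described in the question, while the row-value query returns only the two intended rows.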

Showing NULL on purpose when a NULL joined value is present in SQL

I have a table with some input values and a table with lookup values like below:
select input.value, coalesce(mapping.value, input.value) result from (
select 'a' union all select 'c'
) input (value) left join (
select 'a', 'z' union all select 'b', 'y'
) mapping (lookupkey, value) on input.value = mapping.lookupkey
which gives:
value | result
--------------
a | z
c | c
i.e. I want to show the original values as well as the mapped value but if there is none then show the original value as the result.
The above works well so far with coalesce to determine if there is a mapped value or not. But now if I allow NULL as a valid mapped value, I want to see NULL as the result and not the original value, since it does find the mapped value, only that the mapped value is NULL. The same code above failed to achieve this:
select input.value, coalesce(mapping.value, input.value) result from (
select 'a' union all select 'c'
) input (value) left join (
select 'a', 'z' union all select 'b', 'y' union all select 'c', null
) mapping (lookupkey, value) on input.value = mapping.lookupkey
which gives the same output as above, but what I want is:
value | result
--------------
a | z
c | NULL
Is there an alternative to coalesce that can achieve what I want?
I think you just want a case expression e.g.
select input.[value]
, coalesce(mapping.[value], input.[value]) result
, case when mapping.lookupkey is not null then mapping.[value] else input.[value] end new_result
from (
select 'a'
union all
select 'c'
) input ([value])
left join (
select 'a', 'z'
union all
select 'b', 'y'
union all
select 'c', null
) mapping (lookupkey, [value]) on input.[value] = mapping.lookupkey
Returns:
value  result  new_result
a      z       z
c      c       NULL
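The difference between the two expressions can be reproduced with a small sketch in SQLite via Python's sqlite3 module. This is an assumption-laden illustration, not the original SQL Server query: the derived tables are rewritten as CTEs named inp and mapping because SQLite does not accept column alias lists on derived tables, and the input 'd' is an extra row added here to exercise the unmapped case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    WITH inp(value) AS (VALUES ('a'), ('c'), ('d')),
         mapping(lookupkey, value) AS (
             VALUES ('a', 'z'), ('b', 'y'), ('c', NULL)
         )
    SELECT inp.value,
           -- COALESCE cannot tell "no match" apart from "matched a NULL".
           COALESCE(mapping.value, inp.value) AS result,
           -- Testing the join key distinguishes the two cases.
           CASE WHEN mapping.lookupkey IS NOT NULL
                THEN mapping.value
                ELSE inp.value
           END AS new_result
    FROM inp
    LEFT JOIN mapping ON inp.value = mapping.lookupkey
    ORDER BY inp.value
    """
).fetchall()

for row in rows:
    print(row)
```

The CASE expression works because the join key is only NULL when the LEFT JOIN found no matching row, assuming lookupkey itself is never NULL in the mapping table.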

One of these two apparently equivalent queries is not working

I am trying to understand how PIVOT works.
These two queries with PIVOT seem equivalent.
The only difference is that I write
tablename.column1, ...........column2 instead of tablename.*
You can find the queries here:
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=a5c3aacdaebe599bb050295caf3512b6
with
a as
(
select
a1.column_value a, a2.column_value b , cos(a1.r) c
from
(select column_value, rownum r from table(sys.odcinumberlist(1,2,3,4,5))) a1 ,
(select column_value, rownum r from table(sys.odcivarchar2list('a','b','a','b','a'))) a2
where
a1.r = a2.r)
select a.a,a.b,a.c from a --a.a,a.b
PIVOT
(
count(a.a)--,sum(a.c)
FOR b IN ('a', 'b')
)
ORA-00904: "A"."C": invalid identifier
with
a as
(
select
a1.column_value a, a2.column_value b , cos(a1.r) c
from
(select column_value, rownum r from table(sys.odcinumberlist(1,2,3,4,5))) a1 ,
(select column_value, rownum r from table(sys.odcivarchar2list('a','b','a','b','a'))) a2
where
a1.r = a2.r)
select * from a
PIVOT
(
COUNT(a.a)--,sum(a.c)
FOR b IN ('a', 'b')
)
The second query returns the intended result.
When you do a PIVOT, Oracle will name the resulting columns just like their original values.
You can see this behavior in your select * query, which works:
with
a as
(
select
a1.column_value a, a2.column_value b , cos(a1.r) c
from
(select column_value, rownum r from table(sys.odcinumberlist(1,2,3,4,5))) a1 ,
(select column_value, rownum r from table(sys.odcivarchar2list('a','b','a','b','a'))) a2
where
a1.r = a2.r)
select * from a
PIVOT
(
COUNT(a.a)--,sum(a.c)
FOR b IN ('a', 'b')
)
result is
C                                            'a'  'b'
-.65364362086361191463916818309775038145     0    1
.5403023058681397174009366074429766037354    1    0
-.98999249660044545727157279473126130238     1    0
.2836621854632262644666391715135573083265    1    0
-.41614683654714238699756822950076218977     0    1
Your columns headings have been turned by Oracle into the exact values you've got in the IN clause, including the surrounding quotes.
So, to refer to them in your SELECT clause, you should use double quotes like this:
select "'a'","'b'", c from a
PIVOT
(
count(a.a)--,sum(a.c)
FOR b IN ('a', 'b')
)
An alternative is to alias your values directly in the IN clause
select val_a, val_b, c from a --a.a,a.b
PIVOT
(
count(a.a)--,sum(a.c)
FOR b IN ('a' val_a, 'b' val_b )
)
VAL_A VAL_B C
0 1 -.65364362086361191463916818309775038145
1 0 .5403023058681397174009366074429766037354
1 0 -.98999249660044545727157279473126130238
1 0 .2836621854632262644666391715135573083265
0 1 -.41614683654714238699756822950076218977
And finally, you had another mistake in your initial approach:
select a.a,a.b,a.c from a --a.a,a.b
PIVOT
(
count(a.a)--,sum(a.c)
FOR b IN ('a', 'b')
)
In this query, a refers to your initial CTE a. When you write a.a,a.b,a.c, Oracle doesn't know what you are referencing because of the PIVOT that comes afterwards.
You should properly alias the PIVOT result if you want to refer to it in the SELECT clause:
select pa."'a'",pa."'b'", pa.c from a
PIVOT
(
count(a.a)--,sum(a.c)
FOR b IN ('a', 'b')
) pa
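One way to see why a.a and a.b are no longer available after the PIVOT is to write the same result as conditional aggregation. The sketch below does this in SQLite through Python's sqlite3 module, since SQLite has no PIVOT clause. It is an approximation under assumptions: the table name src is hypothetical, and the GROUP BY c mirrors the implicit grouping a PIVOT performs on the columns it does not mention.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (a INTEGER, b TEXT, c REAL)")
# Same shape as the CTE in the question: a = 1..5, b alternates, c = cos(a).
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [(n, letter, math.cos(n)) for n, letter in zip(range(1, 6), "ababa")],
)

pivoted = conn.execute(
    """
    SELECT c,
           -- column a is consumed by the aggregate
           COUNT(CASE WHEN b = 'a' THEN a END) AS val_a,
           -- column b is consumed by the FOR ... IN list
           COUNT(CASE WHEN b = 'b' THEN a END) AS val_b
    FROM src
    GROUP BY c
    ORDER BY c
    """
).fetchall()

for row in pivoted:
    print(row)
```

Only c and the two new count columns survive, which matches the shape of the pivoted output shown above.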

How to select columns of data in BigQuery that has all NULL values

How can I select the columns of a BigQuery table that contain only NULL values?
A B C
NULL 1 NULL
NULL NULL NULL
NULL 2 NULL
NULL 3 NULL
I want to retrieve columns A and C. Can you please help?
Expanding on my comment on Mikhail's answer, this is what I had in mind. It doesn't require generating a query string, which could be quite long if you have a large number of columns. It compares the count of null values for each column name to the total number of rows in the table to decide if the column should be included in the result.
#standardSQL
WITH `project.dataset.table` AS (
SELECT NULL A, 1 B, NULL C UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT NULL, 2, NULL UNION ALL
SELECT NULL, 3, NULL
)
SELECT null_column
FROM `project.dataset.table` AS t,
UNNEST(REGEXP_EXTRACT_ALL(
TO_JSON_STRING(t),
r'\"([a-zA-Z0-9\_]+)\":null')
) AS null_column
GROUP BY null_column
HAVING COUNT(*) = (SELECT COUNT(*) FROM `project.dataset.table`);
Below is for BigQuery StandardSQL
Simple option:
#standardSQL
WITH `project.dataset.table` AS (
SELECT NULL A, 1 B, NULL C UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT NULL, 2, NULL UNION ALL
SELECT NULL, 3, NULL
)
SELECT COUNT(A) A, COUNT(B) B, COUNT(C) C
FROM `project.dataset.table`
It returns the result below, where 0 (zero) indicates that the respective column contains only NULLs:
A B C
0 3 0
If this is "not enough", below is a more "sophisticated" version:
#standardSQL
WITH `project.dataset.table` AS (
SELECT NULL A, 1 B, NULL C UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT NULL, 2, NULL UNION ALL
SELECT NULL, 3, NULL
)
SELECT SPLIT(y, ':')[OFFSET(0)] column
FROM (
SELECT REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', '') x
FROM (
SELECT COUNT(A) A, COUNT(B) B, COUNT(C) C
FROM `project.dataset.table`
) t
), UNNEST(SPLIT(x)) y
WHERE CAST(SPLIT(y, ':')[OFFSET(1)] AS INT64) = 0
It returns the result below, listing only the columns that contain only NULLs:
column
A
C
Note: for your real table, just remove the WITH block and replace project.dataset.table with your real table reference.
Also, of course, use the real column names.
My table has around 700 columns..
Below is an example of how you can easily generate the above query for any number of columns.
1. Just run the query below
2. Copy the result - this is the generated query
3. Paste the generated query into the new UI and run it
4. Enjoy (I hope you will) the result :o)
Of course, as usual, replace project.dataset.table with your real table reference
#standardSQL
SELECT
CONCAT('''
SELECT SPLIT(y, ':')[OFFSET(0)] column
FROM (
SELECT REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', '') x
FROM (
SELECT ''', y,
'''
FROM `project.dataset.table`
) t
), UNNEST(SPLIT(x)) y
WHERE CAST(SPLIT(y, ':')[OFFSET(1)] AS INT64) = 0
'''
)
FROM (
SELECT
STRING_AGG(CONCAT('COUNT(', x, ') ', x), ', ') y
FROM (
SELECT REGEXP_EXTRACT_ALL(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}]', ''), r'"([\w_]+)":') x
FROM `project.dataset.table` t
LIMIT 1
), UNNEST(x) x
)
Note: please pay attention to query cost - both the "generation query" and the final query itself will do a full scan.
You can generate the column list much more cheaply from the table schema in any client of your choice.
To test / play with it, you can use the same dummy data as for the initial queries in my answer.
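As a rough illustration of generating the column list from the schema in a client, here is a sketch in Python against SQLite (sqlite3 from the standard library), not against BigQuery. The function name all_null_columns and the table name t are hypothetical, and the table name is interpolated into the SQL text, so it is assumed to come from a trusted source.

```python
import sqlite3


def all_null_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return the names of the columns of `table` that contain only NULLs."""
    # Read the column names from the schema instead of scanning the data.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    # COUNT(col) counts only non-NULL values, so 0 means "all NULL".
    counts_sql = ", ".join(f'COUNT("{col}")' for col in columns)
    counts = conn.execute(f'SELECT {counts_sql} FROM "{table}"').fetchone()
    return [col for col, n in zip(columns, counts) if n == 0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER, C INTEGER)")
# Same dummy data as in the queries above.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(None, 1, None), (None, None, None), (None, 2, None), (None, 3, None)],
)

print(all_null_columns(conn, "t"))
```

The same idea should carry over to other clients: fetch the column names from the table metadata, build one COUNT per column, and keep the names whose count is zero.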

Transforming an SQL table using Unpivot

If I have a table with the following structure
ID  NAME  A1  A2  A3  B1  B2  B3
X   Y     0   1   2   3   4   5
How do I transform it to be the following:
ID  NAME  LETTER  1  2  3
X   Y     A       0  1  2
X   Y     B       3  4  5
SELECT id, name, 'A' letter, a1 1, a2 2, a3 3 from example_table
UNION
SELECT id, name, 'B' letter, b1 1, b2 2, b3 3 from example_table
I'm not sure that 1, 2, 3 will work as column names; that may be RDBMS-dependent.
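The UNION approach can be tried quickly in SQLite through Python's sqlite3 module, as sketched below. This is only one engine: in SQLite the numeric column names work when written as quoted identifiers ("1", "2", "3"), which may not hold elsewhere. UNION ALL is used here instead of UNION on the assumption that duplicate removal is not needed, since the letter column already makes the two halves distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_table "
    "(id TEXT, name TEXT, a1 INT, a2 INT, a3 INT, b1 INT, b2 INT, b3 INT)"
)
conn.execute("INSERT INTO example_table VALUES ('X', 'Y', 0, 1, 2, 3, 4, 5)")

cursor = conn.execute(
    """
    SELECT id, name, 'A' AS letter, a1 AS "1", a2 AS "2", a3 AS "3"
    FROM example_table
    UNION ALL
    SELECT id, name, 'B', b1, b2, b3
    FROM example_table
    ORDER BY id, name, letter
    """
)
# Column names of the result come from the first SELECT of the compound query.
headers = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(headers)
for row in rows:
    print(row)
```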
While I don't consider this a feasible answer, it may give you something to think about, as it produces the end result in SQL Server.
SELECT
upvt.id
, upvt.name
, left(label,1) AS letter
, CASE
WHEN left(label,1) = 'A' THEN MIN(value)
WHEN left(label,1) = 'B' THEN MAX(value)
END AS [1]
, CASE
WHEN left(label,1) = 'A' THEN MIN(value1)
WHEN left(label,1) = 'B' THEN MAX(value1)
END AS [2]
, CASE
WHEN left(label,1) = 'A' THEN MIN(value2)
WHEN left(label,1) = 'B' THEN MAX(value2)
END AS [3]
FROM example_table
UNPIVOT (
value
FOR label IN (a1,b1)
) upvt
INNER JOIN (
SELECT
id
, name
, left(label1,1) AS letter
, value1
FROM example_table
UNPIVOT (
value1
FOR label1 IN (a2,b2)
) upvt1 ) as b
ON upvt.id = b.id
AND upvt.name = b.name
INNER JOIN (
SELECT
id
, name
, left(label2,1) AS letter
, value2
FROM example_table
UNPIVOT (
value2
FOR label2 IN (a3,b3)
) upvt2 ) c
ON upvt.id = c.id
AND upvt.name = c.name
group by upvt.id, upvt.name, upvt.label
Not exactly what I would call "production ready".