I have a sample with 6 possible phone numbers and I need to create an array or json that has them all, excluding duplicates and NULLs.
My sample is something like this:
WITH material as(
SELECT 619883407 as phone_1,
CAST(null AS INT64) as phone_2,
CAST(null AS INT64) as phone_3,
CAST(null AS INT64) as phone_4,
69883407 as phone_5,
688234 as phone_6)
SELECT ARRAY_AGG(a IGNORE NULLS) as phones
FROM material CROSS JOIN UNNEST(JSON_EXTRACT_ARRAY(TO_JSON_STRING([phone_1,phone_2,phone_3,phone_4,phone_5,phone_6]))) a
I am happy with my result, but I need to exclude NULL values. For some reason, adding 'IGNORE NULLS' to the ARRAY_AGG is not working. Any idea why this would happen?
Thank you!
Any idea why this would happen?
When you do TO_JSON_STRING, all NULLs become the string 'null'.
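For example (BigQuery), a quick way to see this, using a sample value from the question:

```sql
-- TO_JSON_STRING serializes SQL NULL as the JSON literal null,
-- and JSON_EXTRACT_ARRAY hands those back as the STRING 'null'
SELECT JSON_EXTRACT_ARRAY(TO_JSON_STRING([619883407, CAST(NULL AS INT64)])) AS elems
-- elems = ['619883407', 'null'] - both elements are non-NULL STRINGs,
-- so ARRAY_AGG(... IGNORE NULLS) has nothing to ignore
```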
Use below instead
select array_agg(a) as phones
from material,
unnest(json_extract_array(to_json_string([phone_1,phone_2,phone_3,phone_4,phone_5,phone_6]))) a
where a != 'null'
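Putting it together with the sample data from the question; and since the question also asks to exclude duplicates, ARRAY_AGG(DISTINCT a) can be used in place of ARRAY_AGG(a):

```sql
WITH material AS (
  SELECT 619883407 AS phone_1,
         CAST(NULL AS INT64) AS phone_2,
         CAST(NULL AS INT64) AS phone_3,
         CAST(NULL AS INT64) AS phone_4,
         69883407 AS phone_5,
         688234 AS phone_6
)
SELECT ARRAY_AGG(DISTINCT a) AS phones
FROM material,
UNNEST(JSON_EXTRACT_ARRAY(TO_JSON_STRING(
  [phone_1, phone_2, phone_3, phone_4, phone_5, phone_6]))) a
WHERE a != 'null'
```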
I'm working with Big Query and I have a record field 'funnels_informations' containing two subfields: 'partnership_title' and 'voucher_code'.
I want to have the first non-null value of partnership_title and the corresponding value of voucher_code.
For example, here I want to get partnership_title = indep and voucher_code = null.
Any solution please?
Thanks in advance.
You may consider below scalar subquery for your purpose.
WITH sample_table AS (
SELECT [STRUCT(STRING(null) AS partnership_title, STRING(NULL) AS voucher_code),
('indep', NULL), ('Le', 'LB')
] AS funnels_informations
)
SELECT (SELECT AS STRUCT * EXCEPT(offset)
FROM UNNEST(t.funnels_informations) WITH OFFSET
WHERE partnership_title IS NOT NULL
ORDER BY offset LIMIT 1
).*
FROM sample_table t;
Consider below option
select fi.*
from your_table t, t.funnels_informations fi with offset
where not partnership_title is null
qualify 1 = row_number() over(partition by to_json_string(t) order by offset)
If applied to the sample data in your question, it produces the desired output.
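For reference, a self-contained version of this option, reusing the sample_table from the answer above so it runs as-is:

```sql
WITH sample_table AS (
  SELECT [STRUCT(STRING(NULL) AS partnership_title, STRING(NULL) AS voucher_code),
          ('indep', NULL), ('Le', 'LB')] AS funnels_informations
)
SELECT fi.*
FROM sample_table t, t.funnels_informations fi WITH OFFSET
WHERE NOT partnership_title IS NULL
QUALIFY 1 = ROW_NUMBER() OVER (PARTITION BY TO_JSON_STRING(t) ORDER BY offset)
```

This should return partnership_title = 'indep' with voucher_code = NULL, matching the desired output.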
I have some data that looks like this in an SQL table.
[ID],[SettleDate],[Curr1],[Curr2],[Quantity1],[Quantity2],[CashAmount1],[CashAmount2]
The issue I have: I need to create 2 records from this data (all the information for 1 and all the information for 2). Example below.
[ID],[SettleDate],[Curr1],[Quantity1],[CashAmount1]
[ID],[SettleDate],[Curr2],[Quantity2],[CashAmount2]
Does anyone have any ideas how to do so?
Thanks
A standard (i.e. cross-RDBMS) solution for this is to use union:
select ID, SettleDate, Curr1, Quantity1, CashAmount1 from mytable
union all
select ID, SettleDate, Curr2, Quantity2, CashAmount2 from mytable
Depending on your RDBMS, neater solutions might be available.
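A minimal runnable sketch of the union approach (Postgres syntax assumed for the inline sample row; column names taken from the question):

```sql
WITH mytable (ID, SettleDate, Curr1, Curr2, Quantity1, Quantity2,
              CashAmount1, CashAmount2) AS (
  VALUES (1, DATE '2012-08-14', 'USD', 'EUR', 100, 80, 1000.0, 800.0)
)
SELECT ID, SettleDate, Curr1 AS Curr, Quantity1 AS Quantity, CashAmount1 AS CashAmount
FROM mytable
UNION ALL
SELECT ID, SettleDate, Curr2, Quantity2, CashAmount2
FROM mytable;
```

Each source row comes out twice, once per currency leg.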
Just another option. The ItemNbr 1/2 is just there to keep track of which element each row came from.
Select A.[ID]
,A.[SettleDate]
,B.*
From YourTable A
Cross Apply ( values (1,[Curr1],[Quantity1],[CashAmount1])
,(2,[Curr2],[Quantity2],[CashAmount2])
) B (ItemNbr,Curr,Quantity,CashAmount)
The business problem is a bit obtuse so I won't get into the details.
I have to come up with a sort index for a set of keys, but some of those keys have a pre-determined position in the index which must be respected. The remaining keys have to be ordered as normal but "around" the pre-determined ones.
Simple example is to sort the letters A through E, except that A must be position 3 and D must be position 1. The result I want to achieve is:
A: 3 B: 2 C: 4 D: 1 E: 5
DDL to set up sample:
CREATE TABLE test.element (element_key TEXT, override_sort_idx INTEGER);
insert into test.element VALUES ('A', 3), ('B', Null), ('C', NULL), ('D', 1), ('E', NULL);
The best solution I can come up with is this, but although it appears to work for this simple example, it goes wrong in the general case - it falls apart if you add more pre-defined values. [EDIT: it doesn't even work in this example, because A comes out as 4 - apologies]:
WITH inner_sort AS (SELECT element_key, override_sort_idx, row_number()
OVER (ORDER BY element_key) AS natural_sort_idx
FROM test.element)
SELECT element_key, row_number()
OVER
(ORDER BY
CASE
WHEN override_sort_idx IS NULL
THEN natural_sort_idx
ELSE override_sort_idx END) AS hybrid_sort
FROM inner_sort;
Any ideas for a solution that works in the general case?
This proved to be more of a challenge than I initially expected.
But this SQL returns the expected results:
WITH OPENNUMBERS AS
(
select row_number() over () as num
from test.element
except
select override_sort_idx
from test.element
where override_sort_idx is not null
)
, OPENNUMBERS2 AS
(
select num, row_number() over (order by num) as rn
from OPENNUMBERS
)
,NORMALS AS
(
select element_key, row_number() over (order by element_key) as rn
from test.element
where override_sort_idx is null
)
select n.element_key, o.num as hybrid_sort_idx
from OPENNUMBERS2 o
join NORMALS n ON n.rn = o.rn
union all
select element_key, override_sort_idx
from test.element
where override_sort_idx is not null
order by hybrid_sort_idx;
You can test it on SQL Fiddle.
The trick used?
Get a list of index numbers that are still free after you remove the overridden ones (using EXCEPT).
Then get a row_number for those numbers and also for the non-overridden.
Join those on the rownumber.
Then stitch the overridden rows onto it.
Posting this since it works, and I think it's the best I can do, but it's pretty horrific.
WITH grouped AS (
SELECT element_key, override_sort_idx,
row_number() OVER (
PARTITION BY override_sort_idx IS NULL
ORDER BY override_sort_idx, element_key)
AS group_idx,
row_number() OVER (ORDER BY element_key) AS natural_sort_idx
FROM test.element),
remaining_idx AS (
SELECT row_number() OVER () AS remain_idx FROM test.element
EXCEPT
SELECT override_sort_idx FROM test.element),
indexed_remaining AS (
SELECT row_number() OVER (ORDER BY remain_idx) AS r_sort_idx,
remain_idx
FROM remaining_idx)
SELECT g.element_key,
coalesce(g.override_sort_idx, r.remain_idx) AS hybrid_index
FROM grouped g
LEFT JOIN indexed_remaining r ON
(CASE WHEN g.override_sort_idx IS NULL
THEN g.group_idx END = r.r_sort_idx)
ORDER BY hybrid_index
This involves creating the "remaining" index values first as the difference between a simple row_number() and the pre-determined index values, which is then joined to a sorted list of keys without pre-determined index values.
The CASE expression in the JOIN is functionally unnecessary given the order of the coalesce, but it seems like the "purer" approach.
I have a feeling that someone smarter than me, who understands window functions properly, could write this using window functions with filters, or manipulating the range of a window function, without the crazy nested subqueries/CTEs and joins.
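For what it's worth, the EXCEPT-based idea from the answer above can be compacted a little (Postgres syntax assumed, with generate_series standing in for the row_number()-over-the-table trick):

```sql
WITH free_slots AS (
  -- index positions not claimed by an override, numbered in order
  SELECT n, row_number() OVER (ORDER BY n) AS rn
  FROM generate_series(1, (SELECT count(*) FROM test.element)) AS gs(n)
  WHERE n NOT IN (SELECT override_sort_idx FROM test.element
                  WHERE override_sort_idx IS NOT NULL)
),
unpinned AS (
  -- keys without a pre-determined position, in natural order
  SELECT element_key, row_number() OVER (ORDER BY element_key) AS rn
  FROM test.element
  WHERE override_sort_idx IS NULL
)
SELECT u.element_key, f.n AS hybrid_sort_idx
FROM unpinned u
JOIN free_slots f USING (rn)
UNION ALL
SELECT element_key, override_sort_idx
FROM test.element
WHERE override_sort_idx IS NOT NULL
ORDER BY hybrid_sort_idx;
```

On the sample data this yields D: 1, B: 2, A: 3, C: 4, E: 5, matching the expected result.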
I am wrangling with a query in Google bigquery and hope someone can help.
Desired Output Columns:
visit id - ecommerce_actiontype - sku_list
example:
visit id: 123
ecommerce_actiontype: 6
sku_list: [SKU1,SKU2,SKU3]
I have tried the following two queries:
1)
SELECT
visitid,
eCommerceAction.action_type as ecommerce_actiontype,
(SELECT ARRAY_AGG(productSKU) from UNNEST(hits.product)) AS sku_list
FROM `test.1234.ga_sessions_*` as t, t.hits as hits, hits.product as p
This doesn't put the productSKUs into one list grouped by visitid and ecommerce_actiontype.
2)
SELECT
visitid,
eCommerceAction.action_type as ecommerce_actiontype,
ARRAY_AGG(productSKU) AS sku_list
FROM `oval-unity-88908.97547244.ga_sessions_*` as t, t.hits as hits, hits.product as p
This gives me the error "Error: SELECT list expression references column visitid which is neither grouped nor aggregated at [16:1]"
Anyone have any idea how to achieve the result I want?
I don't have the schema details for the ga_sessions tables, but try the below:
#standardSQL
SELECT
visitid,
eCommerceAction.action_type AS ecommerce_actiontype,
(SELECT ARRAY_AGG(product.productSKU) FROM UNNEST(t.hits) WHERE NOT product.productSKU IS NULL) AS sku_list,
ARRAY(SELECT product.productSKU FROM UNNEST(t.hits) WHERE NOT product.productSKU IS NULL) AS sku_list2
FROM `test.1234.ga_sessions_*` AS t
As you can see, I presented two options for constructing the needed array: ARRAY_AGG within a SELECT subquery, and ARRAY(SELECT ...).
See if this works for you:
SELECT
visitid,
ARRAY(SELECT AS STRUCT ecommerceAction.action_type act_type,
             ARRAY_AGG(DISTINCT productSku IGNORE NULLS) skus_list
      FROM UNNEST(hits), UNNEST(product)
      GROUP BY 1) data
FROM `test.1234.ga_sessions_*`
The main difference between this query and yours is that it avoids the outer UNNEST on the hits field, which makes it possible to aggregate, for each visitId, all the ecommerceActions and the SKUs associated with them.
I'm having a bit of a weird question, given to me by a client.
He has a list of data, with a date between parentheses like so:
Foo (14/08/2012)
Bar (15/08/2012)
Bar (16/09/2012)
Xyz (20/10/2012)
However, he wants the list to be displayed as follows:
Foo (14/08/2012)
Bar (16/09/2012)
Bar (15/08/2012)
Xyz (20/10/2012)
(notice that the second Bar has moved up one position)
So, the logic behind it is, that the list has to be sorted by date ascending, EXCEPT when two rows have the same name ('Bar'). If they have the same name, it must be sorted with the LATEST date at the top, while staying in the other sorting order.
Is this even remotely possible? I've experimented with a lot of ORDER BY clauses, but couldn't find the right one. Does anyone have an idea?
I should have specified that this data comes from a table in a sql server database (the Name and the date are in two different columns). So I'm looking for a SQL-query that can do the sorting I want.
(I've dumbed this example down quite a bit, so if you need more context, don't hesitate to ask)
This works, I think
declare @t table (data varchar(50), date datetime)
insert @t
values
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
select t.*
from @t t
inner join (select data, COUNT(*) cg, MAX(date) as mg from @t group by data) tc
on t.data = tc.data
order by case when cg>1 then mg else date end, date desc
produces
data date
---------- -----------------------
Foo 2012-08-14 00:00:00.000
Bar 2012-09-16 00:00:00.000
Bar 2012-08-15 00:00:00.000
Xyz 2012-10-20 00:00:00.000
A way with better performance than any of the other posted answers is to just do it entirely with an ORDER BY and not a JOIN or using CTE:
DECLARE @t TABLE (myData varchar(50), myDate datetime)
INSERT INTO @t VALUES
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-09-16'),
('Xyz','2012-10-20')
SELECT *
FROM @t t1
ORDER BY (SELECT MIN(t2.myDate) FROM @t t2 WHERE t2.myData = t1.myData), t1.myDate DESC
This does exactly what you request, works with any indexes, and performs much better with larger amounts of data than any of the other answers.
Additionally, it's much clearer what you're actually trying to do here, rather than masking the real logic behind the complexity of a join and a check on the count of joined items.
This one uses analytic functions to perform the sort, it only requires one SELECT from your table.
The inner query finds gaps, where the name changes. These gaps are used to identify groups in the next query, and the outer query does the final sorting by these groups.
I have tried it on SQL Fiddle with extended test data.
SELECT name, dat
FROM (
SELECT name, dat, SUM(gap) over(ORDER BY dat, name) AS grp
FROM (
SELECT name, dat,
CASE WHEN LAG(name) OVER (ORDER BY dat, name) = name THEN 0 ELSE 1 END AS gap
FROM t
) x
) y
ORDER BY grp, dat DESC
Extended test-data
('Bar','2012-08-12'),
('Bar','2012-08-11'),
('Foo','2012-08-14'),
('Bar','2012-08-15'),
('Bar','2012-08-16'),
('Bar','2012-09-17'),
('Xyz','2012-10-20')
Result
Bar 2012-08-12
Bar 2012-08-11
Foo 2012-08-14
Bar 2012-09-17
Bar 2012-08-16
Bar 2012-08-15
Xyz 2012-10-20
I think that this works, including the case I asked about in the comments:
declare @t table (data varchar(50), [date] datetime)
insert @t
values
('Foo','20120814'),
('Bar','20120815'),
('Bar','20120916'),
('Xyz','20121020')
; With OuterSort as (
select *,ROW_NUMBER() OVER (ORDER BY [date] asc) as rn from @t
)
--Now we need to find contiguous ranges of the same data value, and the min and max row number for such a range
, Islands as (
select data,rn as rnMin,rn as rnMax from OuterSort os where not exists (select * from OuterSort os2 where os2.data = os.data and os2.rn = os.rn - 1)
union all
select i.data,rnMin,os.rn
from
Islands i
inner join
OuterSort os
on
i.data = os.data and
i.rnMax = os.rn-1
), FullIslands as (
select
data,rnMin,MAX(rnMax) as rnMax
from Islands
group by data,rnMin
)
select
*
from
OuterSort os
inner join
FullIslands fi
on
os.rn between fi.rnMin and fi.rnMax
order by
fi.rnMin asc,os.rn desc
It works by first computing the initial ordering in the OuterSort CTE. Then, using two CTEs (Islands and FullIslands), we compute the parts of that ordering in which the same data value appears in adjacent rows. Having done that, we can compute the final ordering by any value that all adjacent values will have (such as the lowest row number of the "island" that they belong to), and then within an "island", we use the reverse of the originally computed sort order.
Note that this may, though, not be too efficient for large data sets. On the sample data it shows up as requiring 4 table scans of the base table, as well as a spool.
Try something like...
ORDER BY CASE date
WHEN '14/08/2012' THEN 1
WHEN '16/09/2012' THEN 2
WHEN '15/08/2012' THEN 3
WHEN '20/10/2012' THEN 4
END
In MySQL, you can do:
ORDER BY FIELD(date, '14/08/2012', '16/09/2012', '15/08/2012', '20/10/2012')
In Postgres, you can create a function FIELD and do:
CREATE OR REPLACE FUNCTION field(anyelement, anyarray) RETURNS numeric AS $$
SELECT
COALESCE((SELECT i
FROM generate_series(1, array_upper($2, 1)) gs(i)
WHERE $2[i] = $1),
0);
$$ LANGUAGE SQL STABLE
If you do not want to use the CASE, you can try to find an implementation of the FIELD function for SQL Server.
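Alternatively, a CASE-free sketch for SQL Server: join against a VALUES list that carries the desired positions (table and column names taken from the earlier answers):

```sql
DECLARE @t TABLE (data varchar(50), [date] datetime);
INSERT @t VALUES
  ('Foo','2012-08-14'), ('Bar','2012-08-15'),
  ('Bar','2012-09-16'), ('Xyz','2012-10-20');

SELECT t.*
FROM @t t
JOIN (VALUES ('2012-08-14', 1), ('2012-09-16', 2),
             ('2012-08-15', 3), ('2012-10-20', 4)) AS f(d, pos)
  ON t.[date] = f.d
ORDER BY f.pos;
```

Rows whose date is not in the list are dropped by the inner join; use a LEFT JOIN plus a fallback sort key if they should be kept.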