Flattening an array in SQL

I am trying to flatten this array so that each neighbor has its own column.
How do I write a query that allows me to flatten this array when I don't know the elements in the array?
SELECT deviceid,
neighbors
FROM
`etl.routing_table_nodes`
WHERE
Parent = 'QMI-YSK'
The results currently look like:
Row | deviceid | neighbors
1   | OHX-ZSI  | DMR-RLE, WMI-YEK
2   | OHX-ZFI  | DMR-RLE, QMI-YSK
Bigquery screenshot

Try

SELECT
  deviceid,
  unnested_neighbors
FROM
  `etl.routing_table_nodes` AS t,
  UNNEST(t.neighbors) AS unnested_neighbors
WHERE
  unnested_neighbors = 'QMI-YSK'
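For intuition, the fan-out that UNNEST performs can be sketched in a few lines of Python (the sample rows are the ones shown in the question; this is an illustration of the behavior, not BigQuery itself):

```python
# Each (deviceid, neighbors) row fans out to one row per array element,
# and the WHERE clause then keeps only the matching neighbor.
rows = [
    ("OHX-ZSI", ["DMR-RLE", "WMI-YEK"]),
    ("OHX-ZFI", ["DMR-RLE", "QMI-YSK"]),
]

# UNNEST: one output row per (deviceid, neighbor) pair.
flattened = [(device, n) for device, neighbors in rows for n in neighbors]

# WHERE unnested_neighbors = 'QMI-YSK'
matches = [(d, n) for d, n in flattened if n == "QMI-YSK"]
print(matches)  # [('OHX-ZFI', 'QMI-YSK')]
```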

Related

Is it possible to combine two columns of one document into one column in an ArangoDB query?

I have an AQL query:

return union_distinct(
  for a in table
    return a.Id,
  for b in table
    return b.Id)

Is it possible to get the two columns as one array in a single loop?
Yes, it's possible, in the following way:

RETURN UNIQUE(FLATTEN(
  FOR doc IN collection
    RETURN [doc.colA, doc.colB]
))
The FOR loop generates a two-dimensional array: [[doc1.colA, doc1.colB], [doc2.colA, doc2.colB], ...]
This needs to be flattened and then reduced to the unique values. See the documentation of the array functions: https://www.arangodb.com/docs/3.8/aql/functions-array.html
The result of this query will be a nested array: [[unique values...]], so you just need to take the first element of the result array to get the list you want.
Another solution, which will give you a non-nested array:

FOR elem IN FLATTEN(
  FOR doc IN collection
    RETURN [doc.colA, doc.colB]
)
RETURN DISTINCT elem

This is two loops again, but the second one runs on an in-memory array, so it's quite fast.
Note: The "DISTINCT" here is not a function but a special form of RETURN: https://www.arangodb.com/docs/3.8/aql/operations-return.html#return-distinct
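If it helps to see the pipeline outside AQL, here is a rough Python equivalent of the FLATTEN plus UNIQUE/RETURN DISTINCT steps (the colA/colB values are invented sample data, not from the question):

```python
# Made-up sample documents with two columns each.
docs = [{"colA": 1, "colB": 2}, {"colA": 2, "colB": 3}]

# FOR doc IN collection RETURN [doc.colA, doc.colB]  -> a 2D array
pairs = [[d["colA"], d["colB"]] for d in docs]

# FLATTEN(...)  -> collapse one nesting level
flat = [v for pair in pairs for v in pair]

# UNIQUE(...) / RETURN DISTINCT  -> deduplicate (AQL does not guarantee
# the order; here we keep first-seen order for readability)
unique = list(dict.fromkeys(flat))
print(unique)  # [1, 2, 3]
```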

Unnesting repeated records to a single row in Big Query

I have a dataset that includes repeated records. When I unnest them I get two rows, one per nested record.
Before unnest raw data:
After unnest using this query:
SELECT
  eventTime,
  participant.id
FROM
  `public.table`,
  UNNEST(people) AS participant
WHERE
  verb = 'event'
These are actually 2 rows that are expanded to 4. I've been trying to unnest into a single row so that I have 3 columns:
eventTime, buyer.Id, seller.Id.
I've been trying to use REPLACE to build a struct of the unnested content, but I cannot figure out how to do it. Any pointers, documentation or steps that could help me out?
Consider below approach

SELECT * EXCEPT(key) FROM (
  SELECT
    eventTime,
    participant.id,
    personEventRole,
    TO_JSON_STRING(t) key
  FROM `public.table` t,
    UNNEST(people) AS participant
  WHERE verb = 'event'
)
PIVOT (MIN(id) FOR personEventRole IN ('buyer', 'seller'))
If applied to the sample data in your question, the output is:
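The effect of the PIVOT step can be sketched in Python: group the unnested rows by their source-row key and turn each personEventRole into its own column (the ids, keys, and times below are invented sample values):

```python
# Unnested rows, as the inner SELECT would produce them.
rows = [
    {"key": "r1", "eventTime": "t1", "id": "A", "personEventRole": "buyer"},
    {"key": "r1", "eventTime": "t1", "id": "B", "personEventRole": "seller"},
    {"key": "r2", "eventTime": "t2", "id": "C", "personEventRole": "buyer"},
    {"key": "r2", "eventTime": "t2", "id": "D", "personEventRole": "seller"},
]

pivoted = {}
for r in rows:
    out = pivoted.setdefault(r["key"], {"eventTime": r["eventTime"]})
    # MIN(id) FOR personEventRole IN ('buyer', 'seller'):
    # one column per role, keeping the smallest id seen for that role.
    role = r["personEventRole"]
    out[role] = min(out.get(role, r["id"]), r["id"])

result = list(pivoted.values())
print(result)
# [{'eventTime': 't1', 'buyer': 'A', 'seller': 'B'},
#  {'eventTime': 't2', 'buyer': 'C', 'seller': 'D'}]
```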

Function to sum all first values of Result in SQL

I have a table with "Number", "Name" and "Result" columns. Result is a 2D text array, and I need to create a column with the name "Average" that sums all first values of the Result array and divides by 2. Can somebody help me please? I must use CREATE FUNCTION for this. It looks like this:
Table1

Number | Name  | Result                       | Average
01     | Kevin | {{2.0,10},{3.0,50}}          | 2.5
02     | Max   | {{1.0,10},{4.0,30},{5.0,20}} | 5.0

Average = (2.0+3.0)/2 = 2.5
Average = (1.0+4.0+5.0)/2 = 5.0
First of all: you should always avoid storing arrays in a table (and only generate them in a subquery if really necessary). Normalize the data; it makes life much easier in nearly every use case.
Second: you should avoid multi-dimensional arrays. They are very hard to handle. See Unnest array by one level
However, in your special case you could do something like this:
demo:db<>fiddle
SELECT
number,
name,
SUM(value) FILTER (WHERE idx % 2 = 1) / 2 -- 2
FROM mytable,
unnest(avg_result) WITH ORDINALITY as elements(value, idx) -- 1
GROUP BY number, name
unnest() expands the array into one record per element. But this is not a one-level expansion: it expands ALL elements in depth. To keep track of your elements, you can add an index using WITH ORDINALITY.
Because you have nested two-element arrays, the unnested data can be used as follows: you want to sum the first of every two elements, which is every second element (the ones at odd positions). Using the FILTER clause in the aggregation lets you aggregate exactly these elements.
However: if that array is the result of a subquery, you should think about doing the operation BEFORE the array aggregation (if that aggregation is really necessary at all). This makes things easier.
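As a sanity check on the index arithmetic, here is the same logic in Python for Kevin's row from the question (a sketch of the behavior, not of PostgreSQL internals):

```python
# Kevin's Result value from the question.
result_array = [[2.0, 10], [3.0, 50]]

# unnest() flattens ALL levels depth-first; WITH ORDINALITY numbers the
# elements starting at 1, so enumerate(..., start=1) mirrors it.
elements = list(enumerate(
    (v for pair in result_array for v in pair), start=1))

# SUM(value) FILTER (WHERE idx % 2 = 1) / 2:
# with 1-based indexes, the first element of every pair sits at an odd position.
average = sum(value for idx, value in elements if idx % 2 == 1) / 2
print(average)  # 2.5
```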
Assumptions:
number column is Primary key.
result column is text or varchar type
Here are the steps for your requirements:
Add the column to your table using the following query (you can skip this step if the column already exists):
alter table table1 add column average decimal;
Update the calculated values using the below query:
update table1 t1
set average = t2.value_
from (
  select
    number,
    sum(t::decimal)/2 as value_
  from table1
  cross join lateral unnest((result::text[][])[1:999][1]) as t
  group by 1
) t2
where t1.number = t2.number
Explanation: here unnest((result::text[][])[1:999][1]) will return the first value of each child array (assuming you have at most 999 child arrays in your 2D array; you can increase or decrease this bound as per your requirement).
DEMO
Now you can create your function as per your requirement based on the above query.
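The slice's effect can be mimicked in Python for Max's row from the question (a rough sketch; PostgreSQL's 1-based array slicing becomes 0-based list indexing here):

```python
# Max's Result value, as text elements like in the question's column.
result = [["1.0", "10"], ["4.0", "30"], ["5.0", "20"]]

# (result::text[][])[1:999][1]: element 1 of each child array,
# for up to 999 child arrays.
first_values = [row[0] for row in result[:999]]

# sum(t::decimal)/2
average = sum(float(v) for v in first_values) / 2
print(average)  # 5.0
```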

Converting arrays to nested fields in BigQuery

I'm streaming Stackdriver logs into Bigquery, and they end up in a textPayload field in the following format:
member_id_hashed=123456789,
member_age -> Float(37.0,244),
operations=[92967,93486,86220,92814,92943,93279,...],
scores=[3.214899,2.3641025E-5,2.5823574,2.3818345,3.9919448,0.0,...],
[etc]
I then define a query/view on the table with the raw logging entries as follows:
SELECT
member_id_hashed as member_id, member_age,
split(operations,',') as operation,
split(scores,',') as score
FROM
(
SELECT
REGEXP_EXTRACT(textPayload, r'member_id=([0-9]+)') as member_id_hashed,
REGEXP_EXTRACT(textPayload, r'member_age -> Float\(([0-9]+)') as member_age,
REGEXP_EXTRACT(textPayload, r'operations=\[(.+)') as operations,
REGEXP_EXTRACT(textPayload, r'scores=\[(.+)') as scores
from `myproject.mydataset.mytable`
)
resulting in one row with two single fields and two arrays:
Ideally, for further analysis, I would like the two arrays to be nested (e.g. operation.id and operation.score), or to flatten the arrays element by element while keeping the positions (i.e. element 1 of array 1 should appear next to element 1 of array 2, etc.):
Can anybody point me to a way to make nested fields out of the arrays, or to flatten the arrays? I tried unnesting and joining, but that gave me too many cross-combinations in the result.
Thanks for your help!
You can zip the two arrays like this. It unnests the array with operation IDs and gets the index of each element, then selects the corresponding element of the array with scores. Note that this assumes that the arrays have the same number of elements. If they don't, you could use SAFE_OFFSET instead of OFFSET in order to get NULL if there are more IDs than scores, for instance.
SELECT
member_id_hashed, member_age,
ARRAY(
SELECT AS STRUCT id, split(scores,',')[OFFSET(off)] AS score
FROM UNNEST(split(operations,',')) AS id WITH OFFSET off
ORDER BY off
) AS operations
FROM (
SELECT
REGEXP_EXTRACT(textPayload, r'member_id=([0-9]+)') as member_id_hashed,
REGEXP_EXTRACT(textPayload, r'member_age -> Float\(([0-9]+)') as member_age,
REGEXP_EXTRACT(textPayload, r'operations=\[(.+)') as operations,
REGEXP_EXTRACT(textPayload, r'scores=\[(.+)') as scores
from `myproject.mydataset.mytable`
)
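The zip-by-offset idea translates directly to Python, which may help verify the expected pairing (the operation and score values below are taken from the truncated sample payload in the question):

```python
operations = ["92967", "93486", "86220"]
scores = ["3.214899", "2.3641025E-5", "2.5823574"]

# UNNEST(split(operations, ',')) AS id WITH OFFSET off,
# paired with scores[SAFE_OFFSET(off)]: walk one array by index and pick
# the matching element of the other. None stands in for the NULL that
# SAFE_OFFSET returns when off is past the end of scores.
zipped = [
    {"id": op, "score": scores[off] if off < len(scores) else None}
    for off, op in enumerate(operations)
]
print(zipped)
```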

Transposing JSON array values to rows in Stream Analytics yields no output

I'm streaming JSON input from blob storage. Most data in the JSON is stored as name/value pairs in an array. I need to send each input as a single output where each name/value pair is transposed to a column in the output. I have code that works when using the "Test" feature while editing the query. However when testing live, only the debugblob1 output receives data.
Why would the live test work differently from the query test? Is there a better way to transpose array data to columns?
Note: The array's name/value pairs are always the same, though I don't want a solution that depends on their order always being the same, since that is out of my control.
QUERY
-- Get one row per input and array value
WITH OneRowPerArrayValue AS
(SELECT
INPUT.id AS id,
ARRAYVALUE.ArrayValue.value1 AS value1,
ARRAYVALUE.ArrayValue.value2 AS value2
FROM
[inputblob] INPUT
CROSS APPLY GetArrayElements(INPUT.arrayValues) as ARRAYVALUE),
-- Get one row per input, transposing the array values to columns.
OneRowPerInput AS
(SELECT
INPUT.id as id,
ORPAV_value1.value1 as value1,
ORPAV_value2.value2 as value2
FROM
[inputblob] INPUT
left join OneRowPerArrayValue ORPAV_value1 ON ORPAV_value1.id = INPUT.id AND ORPAV_value1.value1 IS NOT NULL AND DATEDIFF(microsecond, INPUT, ORPAV_value1) = 0
left join OneRowPerArrayValue ORPAV_value2 ON ORPAV_value2.id = INPUT.id AND ORPAV_value2.value2 IS NOT NULL AND DATEDIFF(microsecond, INPUT, ORPAV_value2) = 0
WHERE
-- This is so that we only get one row per input, instead of one row per input multiplied by number of array values
ORPAV_value1.value1 is not null)
SELECT * INTO debugblob1 FROM OneRowPerArrayValue
SELECT * INTO debugblob2 FROM OneRowPerInput
DATA
{"id":"1","arrayValues":[{"value1":"1"},{"value2":"2"}]}
{"id":"2","arrayValues":[{"value1":"3"},{"value2":"4"}]}
See my generic example below. I believe this is what you're asking: you have a JSON object that contains an array of JSON objects.
WITH MyValues AS
(
SELECT
arrayElement.ArrayIndex,
arrayElement.ArrayValue
FROM Input as event
CROSS APPLY GetArrayElements(event.<JSON Array Name>) AS arrayElement
)
SELECT ArrayValue.Value1, CAST(ArrayValue.Value2 AS FLOAT) AS Value
INTO Output
FROM MyValues
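For reference, the transpose the question is after can be sketched in Python using the two DATA lines from the question (an illustration of the desired output shape, not of Stream Analytics semantics):

```python
# The sample events from the question's DATA section: each input carries
# an array of single-key name/value objects.
events = [
    {"id": "1", "arrayValues": [{"value1": "1"}, {"value2": "2"}]},
    {"id": "2", "arrayValues": [{"value1": "3"}, {"value2": "4"}]},
]

transposed = []
for event in events:
    row = {"id": event["id"]}
    # The CROSS APPLY plus the self-joins collapse back to one row per
    # input; merging the pair dicts does the same here, and it does not
    # depend on the pairs' order in the array.
    for pair in event["arrayValues"]:
        row.update(pair)
    transposed.append(row)

print(transposed)
# [{'id': '1', 'value1': '1', 'value2': '2'},
#  {'id': '2', 'value1': '3', 'value2': '4'}]
```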