Find difference between elements of arrays with hardcoded array in BigQuery - google-bigquery

One of my tables has a JSON column that holds an array of key-value objects (about 40 of them). The order of the keys in the array is the same for all records in the table; that is, the i-th elements of different records have the same key. The values of these objects can be of different types: number, string, null, array of strings, etc.
I have a hardcoded JSON with the same structure as this array, like [{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
I want to compare this array with the array column of every record in the table, element by element, and display only the elements of the array that have no matching value.
That is, if some record has an array equal to [{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}], then the difference for this record is [{"key": "key 1", "value": "not equal value 1"}]; key 2 and key 3 are skipped because their values are equal.
So for such a data sample
id | json | ...
---|------|----
1  | [{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
2  | [{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]
3  | [{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]
4  | [{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
5  | [{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
I expect the result
id | json
---|-----
1  | [{"key": "key 1", "value": "not equal value 1"}]
2  | [{"key": "key 2", "value" : "not equal value 2"}]
3  | [{"key": "key 2", "value" : "not equal value 2"}]
4  | []
5  | []
I also want to make a query that groups the result above by key and counts the occurrences, so it is clear which keys' values differ most and least often from the hardcoded array.
key | count
--------------
key 2 | 2
key 1 | 1

It is not clear whether you actually have an ARRAY column or a JSON/STRING column, so I am using the data sample you provided, which is a JSON string.
... display only the elements of the array that have no matching values.
with your_table as (
select 1 id, '[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json union all
select 2, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 3, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 4, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 5, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]'
), search as (
select '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json
)
select id,
  array(
    select t_element
    from unnest(json_extract_array(t.json)) t_element, search s
    left join unnest(json_extract_array(s.json)) s_element
      on t_element = s_element
    where s_element is null
  ) arr
from your_table t
with output matching the expected result above
... I also want to make a query that will group the result above by key and count their number.
with your_table as (
select 1 id, '[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json union all
select 2, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 3, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 4, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' union all
select 5, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]'
), search as (
select '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json
)
select key, count(*) counts
from (
  select id,
    array(
      select json_extract_scalar(t_element, '$.key')
      from unnest(json_extract_array(t.json)) t_element, search s
      left join unnest(json_extract_array(s.json)) s_element
        on t_element = s_element
      where s_element is null
    ) keys
  from your_table t
), unnest(keys) key
group by key
with output matching the expected counts above

I created the table using an array of structs in the schema.
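For illustration, a rough sketch of that schema in BigQuery DDL; the column names id and jsoncolumn are assumptions taken from the queries below:
create table `Project.Dataset.Table` (
  id int64,                                            -- row identifier, as in the question's sample (assumed)
  jsoncolumn array<struct<key string, value string>>   -- the key/value pairs as a repeated struct
);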
I ran the following query:
SELECT m.key, m.value
FROM `Project.Dataset.Table`, unnest(jsoncolumn) m
group by m.key, m.value
having count(*)=1
Output:
You can do a GROUP BY over key to calculate the count over the outcome of the previous query, as follows:
select key as key, count(*) as count
from (
  SELECT m.key, m.value
  FROM `Project.Dataset.Table`, unnest(jsoncolumn) m
  group by m.key, m.value
  having count(*)=1
)
group by key
Output:
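If you would rather compare directly against an explicitly hardcoded array with this ARRAY<STRUCT<key, value>> schema, instead of relying on counts, a rough sketch could look like the following; the table name and the reference values are placeholders:
with reference as (
  -- hardcoded array to compare against (placeholder values)
  select [
    struct('key 1' as key, 'value 1' as value),
    struct('key 2' as key, 'value 2' as value),
    struct('key 3' as key, 'value 3' as value)
  ] ref
)
select t.id,
  array(
    -- keep only the elements whose key/value pair is not in the reference array
    select as struct m.key, m.value
    from unnest(t.jsoncolumn) m, reference r
    left join unnest(r.ref) s
      on m.key = s.key and m.value = s.value
    where s.key is null
  ) diff
from `Project.Dataset.Table` t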

Related

Query json_array in where clauses by key and value

I have a select query which returns an array_to_json object. I want to filter the results of the select based on specific keys and values.
Here is my current query:
select jsonarray
from (
  SELECT body.id_user,
         array_to_json(array_agg(row_to_json(body))) as jsonarray
  FROM (
    SELECT id_user, name, value
    FROM table_1
    group by id_user, name, value
  ) body
  group by body.id_user
) as test;
It returns a lot of rows like this:
[{"id_user": 1489, "name": "name 1", "value": "value aaaaaa"}, {"id_user": 1489, "name": "name 2", "value": "value babababab"}]
[{ "id_user": 1490, "name": "name 12", "value": "value aaaaaa" }, { "id_user": 1490, "name": "name 2", "value": "value babababab" }]
[ { "id_user": 1491, "name": "name 13", "value": "value aaaaaa" }, { "id_user": 1491, "name": "name 23", "value": "value uouououo" }]
Well, I want only the rows that have the fields "name": "name 2", "value": "value babababab" in the json. I've tried
select jsonarray->'name'
from (
....
) as test
where jsonarray->>'name'::text = 'name 2';
but it returns nothing. Is there another way to query it?
You can check if name 2 is present during the aggregation:
SELECT jsonb_agg(to_jsonb(body)) as jsonarray
FROM (
  SELECT DISTINCT id_user, name, value
  FROM table_1
) body
group by body.id_user
having bool_or(name = 'name 2') -- those with at least one `name = 'name 2'`
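Alternatively, if you prefer to build the array first and filter afterwards, here is a rough sketch using the jsonb containment operator @> on the same table_1; it assumes the array is aggregated with jsonb_agg, as in the answer above:
select jsonarray
from (
  select body.id_user,
         jsonb_agg(to_jsonb(body)) as jsonarray
  from (
    select distinct id_user, name, value
    from table_1
  ) body
  group by body.id_user
) t
-- keep only the arrays that contain an element with this name/value pair
where jsonarray @> '[{"name": "name 2", "value": "value babababab"}]';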

Map nested json array into CSV

I am trying to map an array payload into CSV in DataWeave and I am not able to achieve the result.
The CSV does not need a header; the content of the array should be printed column by column. I'm having trouble making the mapping work through the nested array.
Input payload
[
  {
    "Invoice": {
      "Invoice Number*": "Test",
      "Supplier Number": "1201",
      "Submit For Approval?": "Yes",
      "Invoice Date*": "20190828",
      "Line Level Taxation*": "Yes",
      "Payment Date": "00/00/0000",
      "Original invoice number": "",
      "Original invoice date": ""
    },
    "Invoice Line": [
      {
        "Invoice Number*": "Test1",
        "Line Number": "1",
        "Description*": "Test1",
        "Price*": "500",
        "Quantity": null,
        "Unit of Measure*": null,
        "PO Number": "001",
        "PO Line Number": "1"
      },
      {
        "Invoice Number*": "Test2",
        "Line Number": "2",
        "Description*": "Test2",
        "Price*": "500",
        "Quantity": null,
        "Unit of Measure*": null,
        "PO Number": "001",
        "PO Line Number": "2"
      }
    ],
    "Invoice Tax Line": [
      {
        "Tax Amount": "500",
        "Invoice Line Number": "1",
        "Line Number": "1"
      },
      {
        "Tax Amount": "50",
        "Invoice Line Number": "2",
        "Line Number": "2"
      }
    ]
  }
]
Expected Output
column_0, column_1, column_2 ... //no header
"Invoice Number*","Supplier Number","Submit For Approval?"... //Invoice
"Invoice Number*","Line Number*"... //InvoiceLine[0]
"Tax Amount","Invoice Line Number","Line Number"... //Tax Line[0]
"Invoice Number*","Line Number*"... //InvoiceLine[1]
"Tax Amount","Invoice Line Number","Line Number"... //Tax Line[1]
How can I write the DataWeave mapping to achieve the result above?
This is the solution I found for your use case. Basically there are two overloads of the same function that dispatch to the right implementation according to the type. You also want to use the zip function to interleave each "Invoice Line" with the corresponding "Invoice Tax Line" so they come out in the right order.
%dw 2.0
output application/csv headers=false
import * from dw::core::Objects
fun collectKeyNames(obj: {}): Array<{}> =
[
obj
]
fun collectKeyNames(arr: Array): Array<{}> =
arr flatMap ((obj, index) -> collectKeyNames(obj))
---
payload flatMap ((item, index) ->
collectKeyNames(item.Invoice) ++
(collectKeyNames(item."Invoice Line" zip item."Invoice Tax Line"))
)

array of json object

database screenshot:
[
{
"id": "901651",
"supplier_id": "180",
"price": "18.99",
"product_id": "books",
"name": "bookmate",
"quantity": "1"
},
{
"id": "1423326",
"supplier_id": "180",
"price": "53.99",
"product_id": "books",
"name": "classmate",
"quantity": "5"
}
]
"
[{"id":"3811088","supplier_id":"2609","price":"22.99","product_id":"book","name":"classmate","quantity":"10"}]"
I have my purchased book details stored as an array of JSON objects in a field named items in the table purchase_list. Each items value corresponds to a single order; the field may contain one or several entries, and there are multiple orders like this. How can I get the total number of each type of book purchased, and only the book types, using a pgsql query (to generate a Jasper report)? For example: classmate: 15, bookmate: 1.
You can unnest the array and aggregate it:
t=# with c(j) as (values('[
{
"id": "901651",
"supplier_id": "180",
"price": "18.99",
"product_id": "books",
"name": "bookmate",
"quantity": "1"
},
{
"id": "1423326",
"supplier_id": "180",
"price": "53.99",
"product_id": "books",
"name": "classmate",
"quantity": "5"
}
,{"id":"3811088","supplier_id":"2609","price":"22.99","product_id":"book","name":"classmate","quantity":"10"}]'::jsonb))
, agg as (select jsonb_array_elements(j) jb from c)
, mid as (select format('"%s":"%s"',jb->>'name',sum((jb->>'quantity')::int)) from agg group by jb->>'name')
select format('{%s}',string_agg(format,','))::jsonb from mid;
format
--------------------------------------
{"bookmate": "1", "classmate": "15"}
(1 row)
Looks ugly, but it gives the idea.
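If you don't actually need the result packed into a single JSON object, a plainer variant is to return one row per book name. This is only a sketch and assumes the items column in purchase_list is jsonb (use json_array_elements instead if it is json):
select jb->>'name'                 as name,
       sum((jb->>'quantity')::int) as total
from purchase_list,
     jsonb_array_elements(items) jb   -- one row per book object in the items array
group by jb->>'name';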

Update all element of an array in json postgresql

In my table I have a jsonb type column.
Example jsonb data:
{
"id": "1",
"customer":[{"id": "1", "isPosted": "false"},{"id": "2","isPosted": "false"}]
}
Is it possible to update all elements named isPosted to 'true'?
You can use a quick hack here: replace the value as text and cast the result back to jsonb:
t=# with c(j) as (values('{"id": "1", "customer":[{"id": "1", "isPosted": "false"},{"id": "2","isPosted": "false"}]} '::jsonb))
select *,replace(j::text,'"isPosted": "false"','"isPosted": "true"')::jsonb from c;
-[ RECORD 1 ]------------------------------------------------------------------------------------------
j | {"id": "1", "customer": [{"id": "1", "isPosted": "false"}, {"id": "2", "isPosted": "false"}]}
replace | {"id": "1", "customer": [{"id": "1", "isPosted": "true"}, {"id": "2", "isPosted": "true"}]}
finally you can do it the right way:
t=# with c(j) as (values('{"id": "1", "customer":[{"id": "1", "isPosted": "false"},{"id": "2","isPosted": "false"}]} '::jsonb))
, n as (select jsonb_set(e,'{isPosted}'::text[],'true'),j from c, jsonb_array_elements(j->'customer') with ordinality a (e,o))
select jsonb_set(j,'{customer}'::text[],jsonb_agg(jsonb_set)) from n group by j;
jsonb_set
-----------------------------------------------------------------------------------------
{"id": "1", "customer": [{"id": "1", "isPosted": true}, {"id": "2", "isPosted": true}]}
(1 row)
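To persist the change rather than just select it, here is a rough sketch of the corresponding UPDATE; the table name my_table, its primary key pk and the jsonb column data are assumptions:
update my_table t
set data = jsonb_set(t.data, '{customer}', sub.new_customer)
from (
  -- rebuild the customer array with isPosted set to true in every element
  select pk,
         jsonb_agg(jsonb_set(e, '{isPosted}', 'true')) as new_customer
  from my_table,
       jsonb_array_elements(data -> 'customer') e
  group by pk
) sub
where sub.pk = t.pk;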

Split multi valued cells in more than one column into rows (Open Refine)

I have been cleaning a table on Open Refine. I now have it like this:
REF              | Handle   | Size    | Price
2002, 2003       | t-shirt1 | M, L    | 23
3001, 3002, 3003 | t-shirt2 | S, M, L | 24
I need to split those multivalued cells in REF and Size so that I get:
REF  | Handle   | Size | Price
2002 | t-shirt1 | M    | 23
2003 | t-shirt1 | L    | 23
3001 | t-shirt2 | S    | 24
3002 | t-shirt2 | M    | 24
3003 | t-shirt2 | L    | 24
Is it possible to do this in Open Refine? The "Split multi-valued cells..." command only takes care of one column.
Thank you,
Ana Rita
Yes, it's possible:
1. Split the first column using ", " as the separator.
2. Move column 2 to position one.
3. Display your project as records (not rows).
4. Split column 3 using ", " as the separator.
5. Fill down columns 4 and 2.
6. Reorder the columns.
Here's my recipe (OpenRefine's JSON operation history):
[
{
"op": "core/row-removal",
"description": "Remove rows",
"engineConfig": {
"facets": [
{
"invert": false,
"expression": "row.starred",
"selectError": false,
"omitError": false,
"selectBlank": false,
"name": "Starred Rows",
"omitBlank": false,
"columnName": "",
"type": "list",
"selection": [
{
"v": {
"v": true,
"l": "true"
}
}
]
}
],
"mode": "row-based"
}
},
{
"op": "core/multivalued-cell-split",
"description": "Split multi-valued cells in column Column 1",
"columnName": "Column 1",
"keyColumnName": "Column 1",
"separator": ", ",
"mode": "plain"
},
{
"op": "core/column-move",
"description": "Move column Column 2 to position 0",
"columnName": "Column 2",
"index": 0
},
{
"op": "core/multivalued-cell-split",
"description": "Split multi-valued cells in column Column 3",
"columnName": "Column 3",
"keyColumnName": "Column 2",
"separator": ", ",
"mode": "plain"
},
{
"op": "core/fill-down",
"description": "Fill down cells in column Column 4",
"engineConfig": {
"facets": [],
"mode": "record-based"
},
"columnName": "Column 4"
},
{
"op": "core/fill-down",
"description": "Fill down cells in column Column 2",
"engineConfig": {
"facets": [],
"mode": "record-based"
},
"columnName": "Column 2"
},
{
"op": "core/column-reorder",
"description": "Reorder columns",
"columnNames": [
"Column 1",
"Column 2",
"Column 3",
"Column 4"
]
}
]
Hervé
Just found a nice, free OpenRefine plugin that offers "Unpaired pivot":
VIB-Bits plugin
From their documentation:
3.2.1 Unpaired pivot...
Unpaired pivot is the transformation of data that is organized in rows to a representation of that data in separate columns. A simple example would be transforming

Category | Value
---------|------
a        | 1
a        | 2
b        | 3
c        | 2

into

Value a | Value b | Value c
--------|---------|--------
1       | 3       | 2
2       |         |