How To Query A Repeated String Field With Empty Value [] in BigQuery?

I'm new to BigQuery. I have a repeated field in a table, like this:
myTable
{
"id": 12345,
"myNestedStringArrayField": []
}
How can I query all rows where the myNestedStringArrayField value is empty?
I tried myNestedStringArrayField IS NULL, but that returns no results, and I know I have rows with [] as the value. I also tried = '[]', but the query editor throws an error.
Thank you in advance.

You can try using ARRAY_LENGTH; all the rows you are seeking have a myNestedStringArrayField with a length of zero:
WITH sample AS (
  -- give the empty array an explicit element type, since a bare [] has no inferable type here
  SELECT STRUCT("12345" AS id, ARRAY<STRING>[] AS myNestedStringArrayField) AS myTable
)
SELECT *
FROM sample
WHERE ARRAY_LENGTH(myTable.myNestedStringArrayField) = 0
This returns:
myTable.id | myTable.myNestedStringArrayField
12345      | (empty array)
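Against a real table instead of the inline sample, the same predicate applies directly. A minimal sketch, assuming a hypothetical table myproject.mydataset.myTable; the COALESCE also catches rows where the field reads back as NULL rather than [], since ARRAY_LENGTH(NULL) is NULL:
-- hypothetical table name; adjust to your project/dataset
SELECT id
FROM `myproject.mydataset.myTable`
WHERE COALESCE(ARRAY_LENGTH(myNestedStringArrayField), 0) = 0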

Related

Extract complex json with random key field

I am trying to extract the following JSON into its own rows, like the table below, in a Presto query. The issue is that the key (the AV engine name) is different for each row, and I am stuck on how to extract and iterate over the keys without knowing them in advance.
The JSON is the value of a table row:
{
"Bkav": {
"detected": false,
"result": null
},
"Lionic": {
"detected": true,
"result": "Trojan.Generic.3611249"
},
...
}
AV Engine Name | Detected Virus | Result
Bkav           | false          | null
Lionic         | true           | Trojan.Generic.3611249
I have tried to use json_extract, following the documentation here: https://teradata.github.io/presto/docs/141t/functions/json.html, but there is no mention of extraction when we don't know the key. I am trying to find a solution that works in both Presto and Hive queries; is there a common query that is applicable to both?
You can cast your JSON to map(varchar, json) and process it with unnest to flatten it:
-- sample data
WITH dataset (json_str) AS (
VALUES (
'{"Bkav":{"detected": false,"result": null},"Lionic":{"detected": true,"result": "Trojan.Generic.3611249"}}'
)
)
--query
select k "AV Engine Name", json_extract_scalar(v, '$.detected') "Detected Virus", json_extract_scalar(v, '$.result') "Result"
from (
select cast(json_parse(json_str) as map(varchar, json)) as m
from dataset
)
cross join unnest (map_keys(m), map_values(m)) t(k, v)
Output:
AV Engine Name | Detected Virus | Result
Bkav           | false          | NULL
Lionic         | true           | Trojan.Generic.3611249
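As a variant on the same idea, Presto's UNNEST can also expand a map directly into key/value columns, so the map_keys/map_values pair is not strictly needed:
select k "AV Engine Name",
       json_extract_scalar(v, '$.detected') "Detected Virus",
       json_extract_scalar(v, '$.result') "Result"
from (
  select cast(json_parse(json_str) as map(varchar, json)) as m
  from dataset
)
cross join unnest (m) t(k, v)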
The Presto query suggested by @Guru works, but for Hive there is no easy way. I had to extract the JSON, parse it with replace to remove some characters and brackets, then convert it back to a map, and repeat once more to get the nested values out:
SELECT
av_engine,
-- second pass: split each engine's "detected:...,result:..." string into its own map
str_to_map(regexp_replace(engine_result, '\\}', ''),',', ':') AS output_map
FROM (
SELECT
-- first pass: strip quotes and opening braces, then map each engine name to its remaining key-value string
str_to_map(regexp_replace(regexp_replace(get_json_object(raw_response, '$.scans'), '\"', ''), '\\{',''),'\\},', ':') AS key_val_map
FROM restricted_antispam.abuse_malware_scanning
) AS S
LATERAL VIEW EXPLODE(key_val_map) temp AS av_engine, engine_result
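For a quick sanity check without access to that table, the same pipeline can be pointed at the question's sample JSON inlined as a literal (a sketch: the table restricted_antispam.abuse_malware_scanning is replaced by a constant string):
SELECT
av_engine,
str_to_map(regexp_replace(engine_result, '\\}', ''), ',', ':') AS output_map
FROM (
SELECT
str_to_map(regexp_replace(regexp_replace(get_json_object('{"scans":{"Bkav":{"detected":false,"result":null},"Lionic":{"detected":true,"result":"Trojan.Generic.3611249"}}', '$.scans'), '\"', ''), '\\{', ''), '\\},', ':') AS key_val_map
) AS s
LATERAL VIEW EXPLODE(key_val_map) temp AS av_engine, engine_result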

Query to retrieve matching JSON objects as a list

Assume I have a table called MyTable with a JSON-typed column called myjson, whose value is a JSON array holding multiple objects, for example:
[
{
"budgetType": "CF",
"financeNumber": 1236547,
"budget": 1000000
},
{
"budgetType": "ENVELOPE",
"financeNumber": 1236888,
"budget": 2000000
}
]
So how can I search whether the record has any JSON object inside its JSON array with financeNumber = 1236547?
Something like this:
SELECT
t.*
FROM
"MyTable",
LATERAL json_to_recordset(myjson) AS t ("budgetType" varchar,
"financeNumber" int,
budget varchar)
WHERE
"financeNumber" = 1236547;
Obviously not tested on your data, but it should provide a starting point.
with a as(
SELECT json_array_elements(myjson)->'financeNumber' as col FROM mytable)
select exists(select from a where col::text = '1236547'::text );
https://www.postgresql.org/docs/current/functions-json.html
json_array_elements returns setof json, so you need the cast.
Check if a row exists: Fastest check if row exists in PostgreSQL
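If you want the matching rows themselves rather than a single boolean, the same idea works as a correlated EXISTS (a sketch reusing the question's table and column names):
SELECT *
FROM mytable
WHERE EXISTS (
  SELECT 1
  FROM json_array_elements(myjson) AS elem
  WHERE elem ->> 'financeNumber' = '1236547'
);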

How To Query an array of JSONB

I have a table (orders) with a jsonb column named steps in a Postgres DB.
I need to create a SQL query to select records where Step1, Step2, and Step3 all have success status.
[
{
"step_name"=>"Step1",
"status"=>"success",
"timestamp"=>1636120240
},
{
"step_name"=>"Step2",
"status"=>"success",
"timestamp"=>1636120275
},
{
"step_name"=>"Step3",
"status"=>"success",
"timestamp"=>1636120279
},
{
"step_name"=>"Step4",
"timestamp"=>1636120236
"status"=>"success"
}
]
table structure
id | name | steps (jsonb)
'Normalize' steps into a list of JSON items and check whether every one of them has "status":"success". BTW, your example is not valid JSON: all => need to be replaced with :, and a comma is missing.
select id, name from orders
where
(
select bool_and(j->>'status' = 'success')
from jsonb_array_elements(steps) j
where j->>'step_name' in ('Step1','Step2','Step3') -- if not all steps but only these are needed
);
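Note that bool_and over zero rows returns NULL, so a row whose steps contain none of the three step names is filtered out as well.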
You can use the jsonb containment operator @> to check whether the condition holds:
select *
from test
where steps @> '[{"step_name":"Step1","status":"success"},{"step_name":"Step2","status":"success"},{"step_name":"Step3","status":"success"}]'
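The @> check matches rows whose steps array contains, for each object on the right-hand side, some element with at least those key/value pairs, so extra keys such as timestamp do not interfere.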

Search for exact string value in JSON

I have a column stored in JSON that looks like
column name: s2s_payload
Values:
{
"checkoutdate":"2019-10-31",
"checkindate":"2019-10-30",
"numtravelers":"2",
"domain":"www.travel.com.mx",
"destination": {
"country":"MX",
"city":"Manzanillo"
},
"eventtype":"search",
"vertical":"hotels"
}
I want to query exact values in the array rather than returning all values for a certain data type. I was using JSON_EXTRACT to get distinct counts.
SELECT
COUNT(JSON_EXTRACT(s2s_payload, '$.destination.code')) AS total,
JSON_EXTRACT(s2s_payload, '$.destination.code') AS destination
FROM
"db"."events_data_json5_temp"
WHERE
id = '111000'
AND s2s_payload IS NOT NULL
AND yr = '2019'
AND mon = '10'
AND dt >= '26'
AND JSON_EXTRACT(s2s_payload, '$.destination.code') IS NOT NULL
GROUP BY
JSON_EXTRACT(s2s_payload, '$.destination.code')
If I want to filter where "eventtype":"search", how can I do this?
I tried using CAST(s2s_payload AS CHAR) = '{"eventtype":"search"}', but that didn't work.
You need to use json_extract plus a CAST to get an actual value to compare against:
CAST(json_extract(s2s_payload, '$.eventtype') AS varchar) = 'search'
or, same with json_extract_scalar (and thus with no need for a CAST):
json_extract_scalar(s2s_payload, '$.eventtype')
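Plugged into the original query, the eventtype filter becomes one more predicate (a sketch reusing the question's table and partition columns):
SELECT
  COUNT(JSON_EXTRACT(s2s_payload, '$.destination.code')) AS total,
  JSON_EXTRACT(s2s_payload, '$.destination.code') AS destination
FROM "db"."events_data_json5_temp"
WHERE id = '111000'
  AND s2s_payload IS NOT NULL
  AND yr = '2019' AND mon = '10' AND dt >= '26'
  AND json_extract_scalar(s2s_payload, '$.eventtype') = 'search'
GROUP BY JSON_EXTRACT(s2s_payload, '$.destination.code')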

Select rows from table with jsonb column based on arbitrary jsonb filter expression

Test data
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3}')
, ('{"a":11,"b":12, "c":13}')
, ('{"a":21,"b":22, "c":23}')
Problem statement: I want to receive an arbitrary JSONB parameter which acts as a filter on column t.data, such as
{ "b":{ "from":0, "to":20 }, "c":13 }
and use this to select matching rows from my test table t.
In this example, I want rows where b is between 0 and 20 and c = 13.
No error is required if the filter specifies a "column" (or "tag") which does not exist in t.data - it just fails to find a match.
I've used numeric values for simplicity but would like an approach which generalises to text as well.
What I have tried so far: I looked at the containment approach, which works for equality conditions, but I am stumped on a generic way of handling range conditions:
select * from t
where t.data @> '{"c":13}'::jsonb;
Background: This problem arose when building a generic table-preview page on a website (for Admin users).
The page displays a filter based on various columns in whichever table is selected for preview.
The filter is then passed to a function in Postgres DB which applies this dynamic filter condition to the table.
It returns a jsonb array of the rows matching the filter specified by the user.
This jsonb array is then used to populate the Preview resultset.
The columns which make up the filter may change.
My Postgres version is 9.6 - thanks.
If you want to parse { "b":{ "from":0, "to":20 }, "c":13 } generically, you need a parser; that is out of scope for the JSON functions. But you can write a "generic" query using AND and OR to filter by such JSON, e.g.:
https://www.db-fiddle.com/f/jAPBQggG3p7CxqbKLMbPKw/0
with filt(f) as (values('{ "b":{ "from":0, "to":20 }, "c":13 }'::json))
select *
from t
join filt on
(f->'b'->>'from')::int < (data->>'b')::int
and
(f->'b'->>'to')::int > (data->>'b')::int
and
(data->>'c')::int = (f->>'c')::int
;
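Here filt is a one-row CTE, so the join is effectively a cross join that makes the filter document available to every row of t; each condition then digs into both documents with -> / ->> and casts to int before comparing.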
Thanks for the comments/suggestions.
I will definitely look at GraphQL when I have more time - I'm working under a tight deadline at the moment.
It seems the consensus is that a fully generic solution is not achievable without a parser.
However, I got a workable first draft - it's far from ideal but we can work with it. Any comments/improvements are welcome ...
Test data (expanded to include dates & text fields)
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3, "d":"2018-03-10", "e":"2018-03-10", "f":"Blah blah" }')
, ('{"a":11,"b":12, "c":13, "d":"2018-03-14", "e":"2018-03-14", "f":"Howzat!"}')
, ('{"a":21,"b":22, "c":23, "d":"2018-03-14", "e":"2018-03-14", "f":"Blah blah"}')
Here is a first draft of code to apply a jsonb filter dynamically, with restrictions on what syntax is supported. It also just fails silently if the syntax supplied does not match what it expects, and the timestamp handling is a bit kludgey, too.
-- Handle timestamp & text types as well as int
-- See is_timestamp(text) function at bottom
with cte as (
select t.data, f.filt, fk.key
from t
, ( values ('{ "a":11, "b":{ "from":0, "to":20 }, "c":13, "d":"2018-03-14", "e":{ "from":"2018-03-11", "to": "2018-03-14" }, "f":"Howzat!" }'::jsonb ) ) as f(filt) -- equiv to cross join
, lateral (select * from jsonb_each(f.filt)) as fk
)
select data, filt --, key, jsonb_typeof(filt->key), jsonb_typeof(filt->key->'from'), is_timestamp((filt->key)::text), is_timestamp((filt->key->'from')::text)
from cte
where
case when (filt->key->>'from') is null then
case jsonb_typeof(filt->key)
when 'number' then (data->>key)::numeric = (filt->>key)::numeric
when 'string' then
case is_timestamp( (filt->key)::text )
when true then (data->>key)::timestamp = (filt->>key)::timestamp
else (data->>key)::text = (filt->>key)::text
end
when 'boolean' then (data->>key)::boolean = (filt->>key)::boolean
else false
end
else
case jsonb_typeof(filt->key->'from')
when 'number' then (data->>key)::numeric between (filt->key->>'from')::numeric and (filt->key->>'to')::numeric
when 'string' then
case is_timestamp( (filt->key->'from')::text )
when true then (data->>key)::timestamp between (filt->key->>'from')::timestamp and (filt->key->>'to')::timestamp
else (data->>key)::text between (filt->key->>'from')::text and (filt->key->>'to')::text
end
when 'boolean' then false
else false
end
end
group by data, filt
having count(*) = ( select count(distinct key) from cte ) -- must match on all filter elements
;
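The WHERE clause evaluates one filter key per CTE row, so the HAVING count makes the filter conjunctive: a data row survives only if it matched on every distinct key in the filter.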
create or replace function is_timestamp(s text) returns boolean as $$
begin
perform s::timestamp;
return true;
exception when others then
return false;
end;
$$ strict language plpgsql immutable;
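For reference, the helper just attempts the cast and reports whether it succeeded:
select is_timestamp('2018-03-14');  -- true
select is_timestamp('Howzat!');     -- false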