How to count setof / number of keys of JSON in PostgreSQL?

I have a column in jsonb storing a map, like {'a':1,'b':2,'c':3} where the number of keys is different in each row.
I want to count the keys. jsonb_object_keys can retrieve them, but it returns a setof.
Is there something like this?
select count(jsonb_object_keys(obj)) from XXX
(This won't work: ERROR: set-valued function called in context that cannot accept a set)
Postgres JSON Functions and Operators Document
json_object_keys(json)
jsonb_object_keys(jsonb)
setof text Returns set of keys in the outermost JSON object.
json_object_keys('{"f1":"abc","f2":{"f3":"a", "f4":"b"}}')
json_object_keys
------------------
f1
f2
Crosstab isn't feasible, as the number of keys could be large.

Shortest:
SELECT count(*) FROM jsonb_object_keys('{"a": 1, "b": 2, "c": 3}'::jsonb);
Returns 3
To get the number of keys for every row of a table:
SELECT (SELECT COUNT(*) FROM jsonb_object_keys(myJsonField)) nbr_keys FROM myTable;
Edit: there was a typo in the second example.

You could convert keys to array and use array_length to get this:
select array_length(array_agg(A.key), 1) from (
select json_object_keys('{"f1":"abc","f2":{"f3":"a", "f4":"b"}}') as key
) A;
If you need to get this for the whole table, you can just group by primary key.
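For the whole-table case, a minimal sketch of the grouping described above. The docs table, its id key, and its obj column are made-up names for illustration, declared inline so the query runs on its own:

```sql
-- docs(id, obj) is a hypothetical table, defined inline for this sketch
WITH docs(id, obj) AS (
    VALUES (1, '{"a": 1, "b": 2, "c": 3}'::json),
           (2, '{"x": 1}'::json)
)
SELECT A.id,
       array_length(array_agg(A.key), 1) AS nbr_keys
FROM (
    -- one output row per (id, key) pair
    SELECT id, json_object_keys(obj) AS key
    FROM docs
) A
GROUP BY A.id
ORDER BY A.id;
```

Note that a row whose object is empty produces no keys in the inner query, so it drops out of the result instead of showing a count of 0.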

While a sub select must be used to convert the JSON keys set to rows, the following tweaked query might run faster by skipping building the temporary array:
SELECT count(*) FROM
(SELECT jsonb_object_keys('{"a": 1, "b": 2, "c": 3}'::jsonb)) v;
and it's a bit shorter ;)
To make it a function:
CREATE OR REPLACE FUNCTION public.count_jsonb_keys(j jsonb)
RETURNS bigint
LANGUAGE sql
AS $function$
SELECT count(*) from (SELECT jsonb_object_keys(j)) v;
$function$

Alternatively, you could return the upper bound of the array built from the keys:
SELECT
ARRAY_UPPER( -- Grab the upper bounds of the array
ARRAY( -- Convert rows into an array.
SELECT JSONB_OBJECT_KEYS(obj)
),
1 -- The array's dimension we're interested in retrieving the count for
) AS count
FROM
xxx
Using '{"a": 1, "b": 2, "c": 3}'::jsonb as obj, count would result in a value of three (3).
Pasteable example:
SELECT
ARRAY_UPPER( -- Grab the upper bounds of the array
ARRAY( -- Convert rows into an array.
SELECT JSONB_OBJECT_KEYS('{"a": 1, "b": 2, "c": 3}'::jsonb)
),
1 -- The array's dimension we're interested in retrieving the count for
) AS count
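One caveat with this approach: for an empty object the key array is empty, and ARRAY_UPPER (like array_length) returns NULL rather than 0. A minimal sketch of a variant that returns 0 instead, assuming PostgreSQL 9.4 or later, where cardinality() is available:

```sql
SELECT
    obj,
    ARRAY_UPPER(ARRAY(SELECT JSONB_OBJECT_KEYS(obj)), 1) AS upper_bound, -- NULL for '{}'
    CARDINALITY(ARRAY(SELECT JSONB_OBJECT_KEYS(obj)))    AS key_count    -- 0 for '{}'
FROM (
    -- sample is a made-up inline table for illustration
    VALUES ('{"a": 1, "b": 2, "c": 3}'::jsonb),
           ('{}'::jsonb)
) AS sample(obj);
```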

Related

Is there a way to replace null values in a JSON BigQuery field?

I have a JSON value like the one below in a certain column of my table:
{"values":[1, 2, null, 4, null]}
What I want is to convert the value into a BigQuery array: ARRAY<INT64>
I tried JSON_VALUE_ARRAY but it throws an error because the final output cannot be an array with NULLs.
That said, what is the correct approach here?
You can unnest an array with null elements. When building the new array, you can pass IGNORE NULLS to array_agg to remove the null values.
with tbl as (select JSON '{"values":[1, 2, null, 4, null]}' as data union all select JSON ' {"values":[ ] }')
select *,
((Select array_agg(x ignore nulls) from unnest(JSON_VALUE_ARRAY (data.values ) ) x))
from tbl
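The query above yields an array of strings, since JSON_VALUE_ARRAY returns ARRAY<STRING>. A hedged sketch of one way to get the ARRAY<INT64> the question asks for, assuming every non-null element is an integer; the int_values and pos names are made up for illustration:

```sql
WITH tbl AS (
  SELECT JSON '{"values":[1, 2, null, 4, null]}' AS data
)
SELECT ARRAY(
  SELECT CAST(x AS INT64)            -- fails if an element is not an integer
  FROM UNNEST(JSON_VALUE_ARRAY(data, '$.values')) AS x WITH OFFSET AS pos
  WHERE x IS NOT NULL                -- drop the JSON nulls
  ORDER BY pos                       -- keep the original element order
) AS int_values
FROM tbl;
```

If some elements might not be integers, SAFE_CAST could be used in place of CAST so that they become NULL, which would then also need to be filtered out.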

SQL Array with Null

I'm trying to group BigQuery columns using an array like so:
with test as (
select 1 as A, 2 as B
union all
select 3, null
)
select *,
[A,B] as grouped_columns
from test
However, this won't work, since there is a null value in column B row 2.
In fact this won't work either:
select [1, null] as test_array
When reading the documentation on BigQuery though, it says Nulls should be allowed.
In BigQuery, an array is an ordered list consisting of zero or more
values of the same data type. You can construct arrays of simple data
types, such as INT64, and complex data types, such as STRUCTs. The
current exception to this is the ARRAY data type: arrays of arrays are
not supported. Arrays can include NULL values.
There doesn't seem to be any attribute or SAFE prefix that can be used with ARRAY() to handle nulls.
So what is the best approach for this?
Per the documentation for the ARRAY type:
Currently, BigQuery has the following two limitations with respect to NULLs and ARRAYs:
BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
BigQuery translates NULL ARRAY into empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
So, for your example, you can use the "trick" below:
with test as (
select 1 as A, 2 as B union all
select 3, null
)
select *,
array(select cast(el as int64) el
from unnest(split(translate(format('%t', t), '()', ''), ', ')) el
where el != 'NULL'
) as grouped_columns
from test t
The query above returns grouped_columns as [1, 2] for the first row and [3] for the second.
Note: this approach does not require referencing each involved column explicitly!
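My current solution, which I'm not a fan of, is to use a combination of IFNULL(), UNNEST() and ARRAY() (this assumes the columns are strings, since both IFNULL arguments must share a type):
select
*,
array(
select *
from unnest(
[
ifnull(A, ''),
ifnull(B, '')
]
) as val -- alias renamed from grouping, which is a reserved keyword in BigQuery
where val <> ''
) as grouped_columns
from test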
Alternatively, you can replace the NULL value with a non-NULL placeholder using IFNULL, as shown below:
with test as (
select 1 as A, 2 as B
union all
select 3, IFNULL(null, 0)
)
select *,
[A,B] as grouped_columns
from test

How to select only specific keys from postgres json

I need to extract only specific keys from Postgres JSON. Consider the following JSON:
{"aaa":1,"bbb":2,"ccc":3,"ddd":7}
From the above JSON I need to select the keys 'bbb' and 'ccc', that is:
{"bbb":2,"ccc":3}
I used the following query, but it deletes keys rather than selecting them:
SELECT jsonb '{"aaa":1,"bbb":2,"ccc":3,"ddd":7}' - 'ddd'
How can I select only specified keys?
You can specify the keys explicitly, like this:
t=# with c(j) as (SELECT jsonb '{"aaa":1,"bbb":2,"ccc":3,"ddd":7}' - 'ddd}')
select j,jsonb_build_object('aaa',j->'aaa','bbb',j->'bbb') from c;
j | jsonb_build_object
------------------------------------------+----------------------
{"aaa": 1, "bbb": 2, "ccc": 3, "ddd": 7} | {"aaa": 1, "bbb": 2}
(1 row)
WITH data AS (
SELECT jsonb '{"aaa":1,"bbb":2,"ccc":3,"ddd":7}' col
)
SELECT kv.*
FROM data,
LATERAL (
SELECT jsonb_object(ARRAY_AGG(keyval.key::TEXT), ARRAY_AGG(keyval.value::TEXT))
FROM jsonb_each(col) keyval
WHERE keyval.key IN ('aaa', 'bbb', 'ccc')) kv
The solution works by expanding a JSONB (or JSON) object, filtering the keys, and aggregating the filtered keys and values to create the final JSONB (or JSON) object. Note that jsonb_object() builds the object from text arrays, so the values come back as JSON strings ("1", "2") rather than numbers.
However, this solution does not keep requested keys that are missing from the input: if data had a row where col was jsonb '{"aaa":1,"bbb":2,"ddd":7}', the result would contain only aaa and bbb, with no ccc entry.
To keep the missing keys as nulls, the following form could be used.
WITH data AS (
SELECT jsonb '{"aaa":1,"bbb":2,"ccc":3,"ddd":7}' col
), keys(k) AS (
VALUES ('aaa'), ('bbb'), ('ccc')
)
SELECT col, jsonb_object(ARRAY_AGG(k), ARRAY_AGG(col->>k))
FROM data, keys
GROUP BY 1
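Both forms above go through text arrays, so numeric values end up as JSON strings. A minimal sketch of a type-preserving variant, assuming PostgreSQL 9.5 or later, where jsonb_object_agg() is available; the picked alias is made up for illustration:

```sql
WITH data AS (
    SELECT jsonb '{"aaa":1,"bbb":2,"ccc":3,"ddd":7}' AS col
)
SELECT (
    -- rebuild an object from only the wanted key/value pairs;
    -- kv.value is jsonb, so numbers stay numbers
    SELECT jsonb_object_agg(kv.key, kv.value)
    FROM jsonb_each(col) AS kv
    WHERE kv.key IN ('bbb', 'ccc')
) AS picked
FROM data;
-- picked: {"bbb": 2, "ccc": 3}
```

If none of the listed keys is present, the subquery aggregates zero rows and picked is NULL rather than an empty object.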

Vector (array) addition in Postgres

I have a column with numeric[] values which all have the same size. I'd like to take their element-wise average. By this I mean that the average of
{1, 2, 3}, {-1, -2, -3}, and {3, 3, 3}
should be {1, 1, 1}. Also of interest is how to sum these element-wise, although I expect that any solution for one will be a solution for the other.
(NB: The length of the arrays is fixed within a single table, but may vary between tables. So I need a solution which doesn't assume a certain length.)
My initial guess is that I should be using unnest somehow, since unnest applied to a numeric[] column flattens out all the arrays. So I'd like to think that there's a nice way to use this with some sort of windowing function + group by to pick out the individual components of each array and sum them.
-- EXAMPLE DATA
CREATE TABLE A
(vector numeric[])
;
INSERT INTO A
VALUES
('{1, 2, 3}'::numeric[])
,('{-1, -2, -3}'::numeric[])
,('{3, 3, 3}'::numeric[])
;
I've written an extension to do vector addition (and subtraction, multiplication, division, and powers) with fast C functions. You can find it on Github or PGXN.
Given two arrays a and b you can say vec_add(a, b). You can also add either side to a scalar, e.g. vec_add(a, 5).
If you want a SUM aggregate function instead you can find that in aggs_for_vecs, also on PGXN.
Finally if you want to sum up all the elements of a single array, you can use aggs_for_arrays (PGXN).
I discovered a solution on my own which is probably the one I will use.
First, we can define a function for adding two vectors:
CREATE OR REPLACE FUNCTION vec_add(arr1 numeric[], arr2 numeric[])
RETURNS numeric[] AS
$$
SELECT array_agg(result)
FROM (SELECT tuple.val1 + tuple.val2 AS result
FROM (SELECT UNNEST($1) AS val1
,UNNEST($2) AS val2
,generate_subscripts($1, 1) AS ix) tuple
ORDER BY ix) inn;
$$ LANGUAGE SQL IMMUTABLE STRICT;
and a function for multiplying by a constant:
CREATE OR REPLACE FUNCTION vec_mult(arr numeric[], mul numeric)
RETURNS numeric[] AS
$$
SELECT array_agg(result)
FROM (SELECT val * $2 AS result
FROM (SELECT UNNEST($1) AS val
,generate_subscripts($1, 1) as ix) t
ORDER BY ix) inn;
$$ LANGUAGE SQL IMMUTABLE STRICT;
Then we can use the PostgreSQL statement CREATE AGGREGATE to create the vec_sum function directly:
CREATE AGGREGATE vec_sum(numeric[]) (
SFUNC = vec_add
,STYPE = numeric[]
);
And finally, we can find the average as:
-- 1.0 forces numeric division; 1 / count(vector) is integer division and yields 0
SELECT vec_mult(vec_sum(vector), 1.0 / count(vector)) FROM A;
from http://www.postgresql.org/message-id/4C2504A3.4090502@wp.pl
select avg(unnested) from (select unnest(vector) as unnested from A) temp;
Edit: I think I now understand the question better.
Here is a possible solution drawing heavily upon: https://stackoverflow.com/a/8767450/3430807 I don't consider it elegant nor am I sure it will perform well:
Test data:
CREATE TABLE A
(vector numeric[], id serial)
;
INSERT INTO A
VALUES
('{1, 2, 3}'::numeric[])
,('{4, 5, 6}'::numeric[])
,('{7, 8, 9}'::numeric[])
;
Query:
select avg(vector[temp.index])
from A as a
join
(select generate_subscripts(vector, 1) as index
, id
from A) as temp on temp.id = a.id
group by temp.index
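The query above returns one row per array position. A minimal sketch that folds those rows back into a single array and also computes the element-wise sum, assuming PostgreSQL 9.4 or later for WITH ORDINALITY; the inline table a and the aliases val, ix, avg_vector, and sum_vector are made up for illustration:

```sql
WITH a(vector) AS (
    VALUES ('{1, 2, 3}'::numeric[]),
           ('{-1, -2, -3}'::numeric[]),
           ('{3, 3, 3}'::numeric[])
)
SELECT array_agg(s.avg_val ORDER BY s.ix) AS avg_vector,
       array_agg(s.sum_val ORDER BY s.ix) AS sum_vector
FROM (
    -- ix is the 1-based position of each element within its array
    SELECT t.ix,
           avg(t.val) AS avg_val,
           sum(t.val) AS sum_val
    FROM a
    CROSS JOIN LATERAL unnest(a.vector) WITH ORDINALITY AS t(val, ix)
    GROUP BY t.ix
) s;
```

This makes no assumption about the array length, so the same query works for tables whose vectors have a different fixed size.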

Selecting data into a Postgres array

I have the following data:
name id url
John 1 someurl.com
Matt 2 cool.com
Sam 3 stackoverflow.com
How can I write an SQL statement in Postgres to select this data into a multi-dimensional array, i.e.:
{{John, 1, someurl.com}, {Matt, 2, cool.com}, {Sam, 3, stackoverflow.com}}
I've seen this kind of array usage before in Postgres but have no idea how to select data from a table into this array format.
Assuming here that all the columns are of type text.
You cannot use array_agg() to produce multi-dimensional arrays, at least not up to PostgreSQL 9.4.
(But the upcoming Postgres 9.5 ships a new variant of array_agg() that can!)
What you get out of @Matt Ball's query is an array of records (the_table[]).
An array can only hold elements of the same base type. You obviously have number and string types. Convert all columns (that aren't already) to text to make it work.
You can create an aggregate function for this like I demonstrated to you here before.
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[name, id::text, url]]) AS tbl_mult_arr
FROM tbl;
Note the additional ARRAY[] layer to make it a multidimensional array (2-dimensional, to be precise).
Instant demo:
WITH tbl(id, txt) AS (
VALUES
(1::int, 'foo'::text)
,(2, 'bar')
,(3, '}b",') -- txt has meta-characters
)
, x AS (
SELECT array_agg_mult(ARRAY[ARRAY[id::text,txt]]) AS t
FROM tbl
)
SELECT *, t[1][1] AS arr_element_1_1, t[3][2] AS arr_element_3_2
FROM x;
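On PostgreSQL 9.5 or later the custom aggregate should not be needed, because the built-in array_agg() accepts array input. A minimal sketch under that assumption, using the question's data declared inline as tbl:

```sql
WITH tbl(name, id, url) AS (
    VALUES ('John', 1, 'someurl.com'),
           ('Matt', 2, 'cool.com'),
           ('Sam',  3, 'stackoverflow.com')
)
-- each row becomes a one-dimensional text array;
-- array_agg stacks them into a two-dimensional array
SELECT array_agg(ARRAY[name, id::text, url] ORDER BY id) AS tbl_mult_arr
FROM tbl;
```

All inner arrays must have the same length, otherwise array_agg raises an error.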
You need to use an aggregate function; array_agg should do what you need.
SELECT array_agg(s) FROM (SELECT name, id, url FROM the_table ORDER BY id) AS s;