Selecting only non-NULL keys in a Postgres JSONB field

I have a Postgres 9.6 table with a JSONB column:
> SELECT id, data FROM my_table ORDER BY id LIMIT 4;
 id | data
----+---------------------------------------
  1 | {"a": [1, 7], "b": null, "c": [8]}
  2 | {"a": [2, 9], "b": [1], "c": null}
  3 | {"a": [8, 9], "b": null, "c": [3, 4]}
  4 | {}
As you can see, some JSON keys have null values.
I'd like to exclude these - is there an easy way to SELECT only the non-null key-value pairs to produce:
 id | data
----+---------------------------------------
  1 | {"a": [1, 7], "c": [8]}
  2 | {"a": [2, 9], "b": [1]}
  3 | {"a": [8, 9], "c": [3, 4]}
  4 | {}
Thanks!

You can use jsonb_strip_nulls():
select id, jsonb_strip_nulls(data) as data
from my_table;
Online example: http://rextester.com/GGJRW83576
Note that this function would not remove null values inside the arrays.
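For example (a small sketch with an inline literal, not from the original answer): the null-valued key is removed, but a null element inside an array survives:
select jsonb_strip_nulls('{"a": [1, null], "b": null}'::jsonb);
-- returns {"a": [1, null]}: the key "b" is stripped,
-- but the null inside the array stays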

Related

Why is the json_agg FILTER clause in PostgreSQL not working?

I wrote a SQL query to calculate retention data in PostgreSQL 11.3:
SELECT
    part_date,
    platid,
    json_agg(json_build_array(return_gap, users)) filter (where return_gap is null) as return_retention_values
FROM (
    SELECT
        part_date,
        platid,
        json_agg(json_build_array(return_gap, users)) filter (where return_gap is null) as return_retention_values
    FROM (
        select part_date, platid, return_gap, users
        from my_table
    ) a
    GROUP BY grouping sets (
        (part_date, platid, return_gap),
        (part_date, platid)
    )
) a
group by part_date, platid
order by part_date asc
It returns results like:
part_date platid return_retention_values
2022-11-08 23 [[3, 345], [6, 248], [2, 408], [5, 286], [null, 1], [null, 1532], [1, 535], [4, 323], [7, 199], [0, 1531]]
2022-11-08 2 [[7, 902], [null, 5379], [5, 1382], [6, 1258], [3, 1666], [1, 2680], [0, 5379], [4, 1486], [2, 1961]]
2022-11-08 1 [[1, 1042], [7, 388], [4, 600], [3, 1600], [null, 3171], [2, 716], [0, 3171], [5, 1524], [6, 1485]]
2022-11-09 1 [[5, 1597], [0, 3086], [4, 1641], [2, 1767], [3, 578], [null, 1], [null, 3087], [1, 963], [6, 366]]
The filter (where return_gap is null) does not seem to work; null values are still in the query result. How do I fix it?
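For what it's worth (a hedged sketch, not from the original thread): FILTER (WHERE ...) keeps only the rows that satisfy the condition, so filter (where return_gap is null) aggregates exactly the NULL rows. To exclude them instead (presumably the subtotal rows produced by the (part_date, platid) grouping set), negate the condition:
json_agg(json_build_array(return_gap, users))
    filter (where return_gap is not null) as return_retention_values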

pandas.reorder_levels with only one index

Pandas offers a feature to reorder index levels with the reorder_levels method:
pandas.DataFrame({"A" : [1, 2, 3], "B" : [4,5,6], "C" : [7,8,9]}).set_index(["A", "B"]).reorder_levels(["B", "A"])
However, it doesn't seem to work with single-indexed DataFrames:
pandas.DataFrame({"A" : [1, 2, 3], "B" : [4,5,6]}).set_index("A").reorder_levels(["A"])
Am I doing something incorrectly?
PS: I know it doesn't make sense to reorder the index when there is only one level; however, it's an edge case, and I usually tend to avoid unnecessary if statements for code clarity.
Doing the following is equivalent to reordering:
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}).set_index(["A", "B"])
print(df.reset_index().set_index(['B', 'A']))
Output
C
B A
4 1 7
5 2 8
6 3 9
And it works with a single index:
odf = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}).set_index("A")
print(odf)
Output
B
A
1 4
2 5
3 6
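If you want this behind a single call, a tiny wrapper works for both cases (a sketch; the helper name reorder is my own, not a pandas API):
import pandas as pd

def reorder(df, order):
    # reset_index + set_index behaves like reorder_levels,
    # but also tolerates a single-level index
    return df.reset_index().set_index(order)

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}).set_index(["A", "B"])
print(reorder(df, ["B", "A"]))   # multi-index: levels swapped

odf = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}).set_index("A")
print(reorder(odf, ["A"]))       # single index: returned unchanged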

Sort a dictionary in a column in pandas

I have a dataframe as shown below.
user_id Recommended_modules Remaining_modules
1 {A:[5,11], B:[4]} {A:2, B:1}
2 {A:[8,4,2], B:[5], C:[6,8]} {A:7, B:1, C:2}
3 {A:[2,3,9], B:[8]} {A:5, B:1}
4 {A:[8,4,2], B:[5,1,2], C:[6]} {A:3, B:4, C:1}
A brief description of the dataframe:
In the column Recommended_modules, A, B and C are courses and the numbers inside the lists are modules.
Key (Remaining_modules) = course name
Value (Remaining_modules) = number of modules remaining in that course
From the above, I would like to reorder the Recommended_modules column based on the values in Remaining_modules, as shown below.
Expected Output:
user_id Ordered_Recommended_modules Ordered_Remaining_modules
1 {B:[4], A:[5,11]} {B:1, A:2}
2 {B:[5], C:[6,8], A:[8,4,2]} {B:1, C:2, A:7}
3 {B:[8], A:[2,3,9]} {B:1, A:5}
4 {C:[6], A:[8,4,2], B:[5,1,2]} {C:1, A:3, B:4}
Explanation:
For user_id = 2, Remaining_modules = {A:7, B:1, C:2} is sorted by value into {B:1, C:2, A:7};
similarly, Recommended_modules is arranged in the same key order:
{B:[5], C:[6,8], A:[8,4,2]}.
It is possible; you only need Python 3.6+ (where dicts preserve insertion order):
def f(x):
    # sort Remaining_modules by value (number of remaining modules)
    # https://stackoverflow.com/a/613218/2901002
    d1 = {k: v for k, v in sorted(x['Remaining_modules'].items(), key=lambda item: item[1])}
    L = d1.keys()
    # reorder Recommended_modules to follow the same key order
    # https://stackoverflow.com/a/21773891/2901002
    d2 = {key: x['Recommended_modules'][key] for key in L if key in x['Recommended_modules']}
    x['Remaining_modules'] = d1
    x['Recommended_modules'] = d2
    return x

df = df.apply(f, axis=1)
print(df)
user_id Recommended_modules \
0 1 {'B': [4], 'A': [5, 11]}
1 2 {'B': [5], 'C': [6, 8], 'A': [8, 4, 2]}
2 3 {'B': [8], 'A': [2, 3, 9]}
3 4 {'C': [6], 'A': [8, 4, 2], 'B': [5, 1, 2]}
Remaining_modules
0 {'B': 1, 'A': 2}
1 {'B': 1, 'C': 2, 'A': 7}
2 {'B': 1, 'A': 5}
3 {'C': 1, 'A': 3, 'B': 4}
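For completeness, a quick way to build the sample frame from the question (a sketch; the literals are copied from the table above):
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'Recommended_modules': [
        {'A': [5, 11], 'B': [4]},
        {'A': [8, 4, 2], 'B': [5], 'C': [6, 8]},
        {'A': [2, 3, 9], 'B': [8]},
        {'A': [8, 4, 2], 'B': [5, 1, 2], 'C': [6]},
    ],
    'Remaining_modules': [
        {'A': 2, 'B': 1},
        {'A': 7, 'B': 1, 'C': 2},
        {'A': 5, 'B': 1},
        {'A': 3, 'B': 4, 'C': 1},
    ],
})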

series.where on a series containing lists

I have this series called hours_by_analysis_date, where the index is datetimes and the values are lists of ints. For example:
Index |
01-01-2000 | [1, 2, 3, 4, 5]
01-02-2000 | [2, 3, 4, 5, 6]
01-03-2000 | [1, 2, 3, 4, 5]
I want to return all the indices where the value is [1, 2, 3, 4, 5], so it should return 01-01-2000 and 01-03-2000
I tried hours_by_analysis_date.where(hours_by_analysis_date == [1, 2, 3, 4, 5]), but it gives me the error:
{ValueError} lengths must match to compare
pandas is confused between comparing the two array-like objects as wholes and testing each element for equality.
You can use apply:
hours_by_analysis_date.apply(lambda elem: elem == [1,2,3,4,5])
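This returns a boolean mask; to get back just the matching indices (a short follow-up sketch, assuming the series from the question):
mask = hours_by_analysis_date.apply(lambda elem: elem == [1, 2, 3, 4, 5])
print(hours_by_analysis_date[mask].index)   # 01-01-2000 and 01-03-2000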

Unnest json to multiple rows splitted into key and value columns

I would like to unnest the column json_blob:
SELECT '{"a": [1, 2, 3], "b": [4, 5, 6]}' AS json_blob
to look like this at the end:
key | val
----------------
"a" | [1, 2, 3]
"b" | [4, 5, 6]
Note that different rows can have different keys, and there are a lot of them. I don't want to write all of them by hand.
If the schema of the JSON stays the same, you could do this:
with t as (SELECT '{"a": [1, 2, 3], "b": [4, 5, 6]}' AS json_blob)
select key, val
from t cross join unnest([
    struct('a' as key, json_extract(json_blob, '$.a') as val),
    struct('b' as key, json_extract(json_blob, '$.b') as val)
])
The example below can be a good starting point, but it really depends on the pattern of your JSON:
#standardSQL
WITH `project.dataset.table` AS (
    SELECT '{"a": [1, 2, 3], "b": [4, 5, 6]}' AS json_blob UNION ALL
    SELECT '{"a": [11, 12, 13], "c": [14, 15, 16]}' UNION ALL
    SELECT '{"d": 21, "b": [24, 25, 26]}'
)
SELECT
    SPLIT(kv, ': ')[OFFSET(0)] AS key,
    SPLIT(kv, ': ')[SAFE_OFFSET(1)] AS value
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(json_blob, r'("\w+":[^"]*)(?:,|})')) kv
with the result:
Row key value
1 "a" [1, 2, 3]
2 "b" [4, 5, 6]
3 "a" [11, 12, 13]
4 "c" [14, 15, 16]
5 "d" 21
6 "b" [24, 25, 26]