PostgreSQL: query a dictionary of objects in a JSONB field

I have a table in a PostgreSQL 9.5 database with a JSONB field that contains a dictionary in the following form:
{"1": {"id": 1,
       "length": 24,
       "date_started": "2015-08-25"},
 "2": {"id": 2,
       "length": 27,
       "date_started": "2015-09-18"},
 "3": {"id": 3,
       "length": 27,
       "date_started": "2015-10-15"}}
The number of elements in the dictionary (the '1', '2', etc.) may vary between rows.
I would like to be able to get the average of length across all elements using a single SQL query. Any suggestions on how to achieve this?

Use jsonb_each to expand the object into key/value pairs, then average over the values:
[local] #= SELECT json, AVG((v->>'length')::int)
FROM j, jsonb_each(json) js(k, v)
GROUP BY json;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────────────┐
│ json │ avg │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
│ {"1": {"id": 1, "length": 240, "date_started": "2015-08-25"}, "2": {"id": 2, "length": 27, "date_started": "2015-09-18"}, "3": {"id": 3, "length": 27, "date_started": "2015-10-15"}} │ 98.0000000000000000 │
│ {"1": {"id": 1, "length": 24, "date_started": "2015-08-25"}, "2": {"id": 2, "length": 27, "date_started": "2015-09-18"}, "3": {"id": 3, "length": 27, "date_started": "2015-10-15"}} │ 26.0000000000000000 │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────┘
(2 rows)
Time: 0,596 ms
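If the same average ever needs to be computed application-side, the logic of jsonb_each can be sketched in Python; the raw string below is illustrative, standing in for the JSONB value as a driver might return it:

```python
import json

# JSONB value as JSON text, as a driver might return it (illustrative)
raw = ('{"1": {"id": 1, "length": 24, "date_started": "2015-08-25"},'
       ' "2": {"id": 2, "length": 27, "date_started": "2015-09-18"},'
       ' "3": {"id": 3, "length": 27, "date_started": "2015-10-15"}}')

doc = json.loads(raw)
# Mirror jsonb_each: walk the top-level values, ignoring the outer keys
lengths = [entry["length"] for entry in doc.values()]
avg_length = sum(lengths) / len(lengths)
print(avg_length)  # 26.0
```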

Related

How to replace single quotes in a JSON-like column in pandas

I have a column in dataframe with values like the ones below:
{'id': 22188, 'value': 'trunk'}
{'id': 22170, 'value': 'motor'}
I want to replace the single quotes with double quotes so the values can be used as JSON. I am trying:
df['column'] = df['column'].replace({'\'': '"'}, regex=True)
But nothing changes.
How can I do this?
Expected result:
{"id": 22188, "value": "trunk"}
{"id": 22170, "value": "motor"}
No need to escape the quote characters specially: just use the opposite quote family for the Python string literal.
There is no need for a regex either.
You do, however, have to go through the str accessor to do string replacements:
df['column'] = df['column'].str.replace("'", '"', regex=False)
This works fine with string values:
>>> pd.Series(["{'id': 22188, 'value': 'trunk'}","{'id': 22170, 'value': 'motor'}"])
0 {'id': 22188, 'value': 'trunk'}
1 {'id': 22170, 'value': 'motor'}
dtype: object
>>> pd.Series(["{'id': 22188, 'value': 'trunk'}","{'id': 22170, 'value': 'motor'}"]).str.replace("'",'"')
0 {"id": 22188, "value": "trunk"}
1 {"id": 22170, "value": "motor"}
dtype: object
>>>
It fails when the column holds actual dictionaries, so convert them to strings first with astype(str):
>>> pd.Series([{'id': 22188, 'value': 'trunk'},{'id': 22170, 'value': 'motor'}])
0 {'id': 22188, 'value': 'trunk'}
1 {'id': 22170, 'value': 'motor'}
dtype: object
>>> pd.Series([{'id': 22188, 'value': 'trunk'},{'id': 22170, 'value': 'motor'}]).str.replace("'",'"')
0 NaN
1 NaN
dtype: float64
>>> pd.Series([{'id': 22188, 'value': 'trunk'},{'id': 22170, 'value': 'motor'}]).astype(str).str.replace("'",'"')
0 {"id": 22188, "value": "trunk"}
1 {"id": 22170, "value": "motor"}
dtype: object

How to input data into a line chart in Pentaho?

I want to make this chart in Pentaho CDE, based on this chart (I think it is the most similar from among the CCC Components; the code is in this link), but I don't know how to adapt my data input to that graph.
For example, I want to consume data in this format:
[Year, customers_A, customers_B, cars_A, cars_B]
[2014, 8, 4, 23, 20]
[2015, 20, 6, 30, 38]
How can I input my data into this chart?
Your data should come as an object such as this:
data = {
    metadata: [
        { colName: "Year", colType: "Numeric", colIndex: 1 },
        { colName: "customers_A", colType: "Numeric", colIndex: 2 },
        { colName: "customers_B", colType: "Numeric", colIndex: 3 },
        { colName: "cars_A", colType: "Numeric", colIndex: 4 },
        { colName: "cars_B", colType: "Numeric", colIndex: 5 }
    ],
    resultset: [
        [2014, 8, 4, 23, 20],
        [2015, 20, 6, 30, 38]
    ],
    queryInfo: { totalRows: 2 }
}
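As a sketch, the row-oriented format from the question can be turned into that object shape programmatically; the field names follow the CDA-style structure above, and everything else is illustrative:

```python
import json

# Header row and data rows in the question's format
header = ["Year", "customers_A", "customers_B", "cars_A", "cars_B"]
rows = [[2014, 8, 4, 23, 20],
        [2015, 20, 6, 30, 38]]

# Build the metadata/resultset/queryInfo object the component expects
data = {
    "metadata": [{"colName": name, "colType": "Numeric", "colIndex": i}
                 for i, name in enumerate(header, start=1)],
    "resultset": rows,
    "queryInfo": {"totalRows": len(rows)},
}
print(json.dumps(data, indent=2))
```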

Is there a function to extract a value from a dictionary in a list?

I need to extract the name values (whether Action, Adventure, etc.) from this column into a new column in pandas:
'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
You want from_records:
import pandas as pd
data = [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
df = pd.DataFrame.from_records(data)
df
you get
    id             name
0   28           Action
1   12        Adventure
2   14          Fantasy
3  878  Science Fiction
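If the column actually holds that list as a JSON string, as in the question, it can be parsed row by row and the names collected into a new column. A sketch with illustrative column names:

```python
import json
import pandas as pd

df = pd.DataFrame({"genres": [
    '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, '
    '{"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
]})

# Parse the JSON string, then collect the "name" values into a new column
df["names"] = df["genres"].apply(
    lambda s: [d["name"] for d in json.loads(s)])
print(df["names"][0])  # ['Action', 'Adventure', 'Fantasy', 'Science Fiction']
```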

pandas: aggregate array during groupby, equivalent of SQL's array_agg?

I've got this dataframe:
df1 = pd.DataFrame([
    {'id': 1, 'spend': 60, 'store': 'Stockport'},
    {'id': 2, 'spend': 68, 'store': 'Didsbury'},
    {'id': 3, 'spend': 70, 'store': 'Stockport'},
    {'id': 4, 'spend': 35, 'store': 'Didsbury'},
    {'id': 5, 'spend': 16, 'store': 'Didsbury'},
    {'id': 6, 'spend': 12, 'store': 'Didsbury'},
])
I've grouped it by store and got the total spend by store:
df1.groupby("store").agg({'spend': 'sum'})\
    .reset_index().sort_values("spend", ascending=False)
       store  spend
0   Didsbury    131
1  Stockport    130
Is there a way I can get the IDs for each store as a column in the grouped object? Like the equivalent of ARRAY_AGG in Postgres? So the desired output would be:
store spend ids
Didsbury 131 [2,4,5,6]
Stockport 130 [1,3]
We can use named aggregation, an aggregation method available since pandas >= 0.25.0.
Notice how we can instantly name our new column "ids":
df1.groupby('store').agg(
    spend=('spend', 'sum'),
    ids=('id', list)
).reset_index()
       store  spend           ids
0   Didsbury    131  [2, 4, 5, 6]
1  Stockport    130        [1, 3]
You can pass list as the aggregation function for the id column:
df = (df1.groupby("store").agg({'spend': 'sum', 'id': list})
         .reset_index()
         .sort_values("spend", ascending=False))
print(df)
       store  spend            id
0   Didsbury    131  [2, 4, 5, 6]
1  Stockport    130        [1, 3]
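Postgres's ARRAY_AGG also accepts an ORDER BY clause. A rough pandas equivalent is to aggregate with a callable that sorts, sketched here on a deliberately shuffled subset of the data (names mirror the frames above):

```python
import pandas as pd

# Shuffled subset, so the effect of sorting inside the group is visible
df1 = pd.DataFrame([
    {'id': 3, 'spend': 70, 'store': 'Stockport'},
    {'id': 1, 'spend': 60, 'store': 'Stockport'},
    {'id': 2, 'spend': 68, 'store': 'Didsbury'},
])

# ARRAY_AGG(id ORDER BY id): sort each group's values before collecting
out = df1.groupby('store').agg(
    spend=('spend', 'sum'),
    ids=('id', lambda s: sorted(s)),
).reset_index()
print(out)
```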

Convert a dictionary within a list to rows in pandas

I currently have a data frame like this, with a "listing" column holding a list of dictionaries, and I would like to explode that column into rows. I would like to use the keys in the dictionaries as column names, so ideally the data frame would look like this:
eventId listingId currentPrice
103337923 1307675567 ...
103337923 1307675567 ...
103337923 1307675567 ...
This is what I get with this: print(listing_df.head(3).to_dict())
There is surely a better way to do this, but this works. :)
df1 = pd.DataFrame(
    {"a": [1, 2, 3, 4],
     "b": [5, 6, 7, 8],
     "c": [[{"x": 17, "y": 18, "z": 19}, {"x": 27, "y": 28, "z": 29}],
           [{"x": 37, "y": 38, "z": 39}, {"x": 47, "y": 48, "z": 49}],
           [{"x": 57, "y": 58, "z": 59}, {"x": 27, "y": 68, "z": 69}],
           [{"x": 77, "y": 78, "z": 79}, {"x": 27, "y": 88, "z": 89}]]})
Now you can create a new DataFrame from the above:
df2 = pd.DataFrame(columns=df1.columns)
df2_index = 0
for _, one_row in df1.iterrows():
    for list_value in one_row["c"]:
        one_row["c"] = list_value
        df2.loc[df2_index] = one_row
        df2_index += 1
The output is the way you need it.
Now that we have expanded the list into separate rows, you can further expand the JSON dicts into columns with:
df2[list(df2["c"].iloc[0].keys())] = df2["c"].apply(
    lambda x: pd.Series([x[key] for key in x.keys()]))
Hope it helps!
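For comparison, on recent pandas (explode needs >= 0.25, pd.json_normalize >= 1.0) the same expansion can be done without the manual loop; a sketch using a trimmed version of the df1 above:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"a": [1, 2],
     "b": [5, 6],
     "c": [[{"x": 17, "y": 18, "z": 19}, {"x": 27, "y": 28, "z": 29}],
           [{"x": 37, "y": 38, "z": 39}, {"x": 47, "y": 48, "z": 49}]]})

# One row per list element, then one column per dict key
exploded = df1.explode("c").reset_index(drop=True)
expanded = pd.concat(
    [exploded.drop(columns="c"), pd.json_normalize(list(exploded["c"]))],
    axis=1)
print(expanded)
```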