I need to run a query over a Postgres database, aggregate the results, and export them as a JSON object using native Postgres tooling.
I can't quite get the aggregation working correctly and I'm a bit stumped.
Below is an example of some of the data:
| msgserial | object_type | payload_key | payload | user_id |
+-----------+---------------+-------------+-----------------------------------------------------------+---------+
| 1696962 | CampaignEmail | a8901b2c | {"id": "ff7221da", "brand": "MAGIC", "eventType": "SENT"} | 001 |
| 1696963 | OtherType | b8901b2c | {"id": "ff7221db", "brand": "MAGIC", "eventType": "SENT"} | 001 |
| 1696964 | OtherType | c8901b2c | {"id": "ff7221dc", "brand": "MAGIC", "eventType": "SENT"} | 002 |
| 1696965 | OtherType | d8901b2c | {"id": "ff7221dd", "brand": "MAGIC", "eventType": "SENT"} | 001 |
| 1696966 | CampaignEmail | e8901b2c | {"id": "ff7221de", "brand": "MAGIC", "eventType": "SENT"} | 001 |
| 1696967 | CampaignEmail | f8901b2c | {"id": "ff7221df", "brand": "MAGIC", "eventType": "SENT"} | 002 |
| 1696968 | SomethingElse | g8901b2c | {"id": "ff7221dg", "brand": "MAGIC", "eventType": "SENT"} | 001 |
+-----------+---------------+-------------+-----------------------------------------------------------+---------+
I need to output a JSON object like this, grouped by user_id:
{
"user_id": 001,
"brand": "MAGIC",
"campaignEmails": [
{"id": "ff7221da", "brand": "MAGIC", "eventType": "SENT"},
{"id": "ff7221de", "brand": "MAGIC", "eventType": "SENT"}
],
"OtherTypes": [
{"id": "ff7221db", "brand": "MAGIC", "eventType": "SENT"},
{"id": "ff7221dd", "brand": "MAGIC", "eventType": "SENT"}
],
"Somethingelses": [
{"id": "ff7221dg", "brand": "MAGIC", "eventType": "SENT"}
]
},
{
"user_id": 002,
"campaignEmails": [
],
"OtherTypes": [
],
"Somethingelses": [
]
}
Essentially, I need to group all the payloads into arrays by their object_type, with one object per user_id.
I started with JSONB_BUILD_OBJECT and managed to get one of the object_types collected into an array, but then got stuck.
Am I trying to achieve the impossible in plain Postgres SQL? I keep hitting errors such as "X needs to be included in the GROUP BY clause".
I can group one of the object_types into an array per user_id, but I can't seem to do all three.
My other idea was to use three subqueries, but I'm not sure how to do that either.
You need two aggregations: the first groups by user_id and object_type, and the second groups by user_id only:
select
jsonb_build_object('user_id', user_id)
|| jsonb_object_agg(object_type, payload) as result
from (
select user_id, object_type, jsonb_agg(payload) as payload
from my_table
group by user_id, object_type
) s
group by user_id
Db<>Fiddle.
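To make the two-step idea concrete, here is a rough Python sketch of the same logic outside the database. The `rows` list and the `aggregate` helper are made up for illustration; only the sample values come from the question. Note that, as in the query, the keys of each object are the raw object_type values (e.g. `CampaignEmail`), not the pluralised names from the desired output.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the sample rows: (user_id, object_type, payload).
rows = [
    ("001", "CampaignEmail", {"id": "ff7221da", "brand": "MAGIC", "eventType": "SENT"}),
    ("001", "OtherType", {"id": "ff7221db", "brand": "MAGIC", "eventType": "SENT"}),
    ("002", "OtherType", {"id": "ff7221dc", "brand": "MAGIC", "eventType": "SENT"}),
    ("001", "OtherType", {"id": "ff7221dd", "brand": "MAGIC", "eventType": "SENT"}),
    ("001", "CampaignEmail", {"id": "ff7221de", "brand": "MAGIC", "eventType": "SENT"}),
    ("002", "CampaignEmail", {"id": "ff7221df", "brand": "MAGIC", "eventType": "SENT"}),
    ("001", "SomethingElse", {"id": "ff7221dg", "brand": "MAGIC", "eventType": "SENT"}),
]


def aggregate(rows):
    # First level: collect payloads per (user_id, object_type), like jsonb_agg.
    per_type = defaultdict(list)
    for user_id, object_type, payload in rows:
        per_type[(user_id, object_type)].append(payload)

    # Second level: fold the per-type arrays into one object per user,
    # like jsonb_build_object(...) || jsonb_object_agg(...).
    per_user = {}
    for (user_id, object_type), payloads in per_type.items():
        per_user.setdefault(user_id, {"user_id": user_id})[object_type] = payloads
    return list(per_user.values())


result = aggregate(rows)
```

Each element of `result` corresponds to one row returned by the query above.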
I have the following SQL output and am trying to extract a nested JSON field from it.
*************************** 2. row ***************************
created_at: 2023-01-05 14:25:52
updated_at: 2023-01-05 14:26:02
deleted_at: NULL
deleted: 0
id: 2
instance_uuid: ef6380b4-5455-48f8-9e4b-3d04199be3f5
numa_topology: NULL
pci_requests: []
flavor: {"cur": {"nova_object.name": "Flavor", "nova_object.namespace": "nova", "nova_object.version": "1.2", "nova_object.data": {"id": 2, "name": "tempest2", "memory_mb": 512, "vcpus": 1, "root_gb": 1, "ephemeral_gb": 0, "flavorid": "202", "swap": 0, "rxtx_factor": 1.0, "vcpu_weight": 0, "disabled": false, "is_public": true, "extra_specs": {}, "description": null, "created_at": "2023-01-05T05:30:36Z", "updated_at": null, "deleted_at": null, "deleted": false}}, "old": null, "new": null}
vcpu_model: {"nova_object.name": "VirtCPUModel", "nova_object.namespace": "nova", "nova_object.version": "1.0", "nova_object.data": {"arch": null, "vendor": null, "topology": {"nova_object.name": "VirtCPUTopology", "nova_object.namespace": "nova", "nova_object.version": "1.0", "nova_object.data": {"sockets": 1, "cores": 1, "threads": 1}, "nova_object.changes": ["cores", "threads", "sockets"]}, "features": [], "mode": "host-model", "model": null, "match": "exact"}, "nova_object.changes": ["mode", "model", "vendor", "features", "topology", "arch", "match"]}
migration_context: NULL
keypairs: {"nova_object.name": "KeyPairList", "nova_object.namespace": "nova", "nova_object.version": "1.3", "nova_object.data": {"objects": []}}
device_metadata: NULL
trusted_certs: NULL
vpmems: NULL
resources: NULL
The flavor column holds JSON data, and I am trying to extract the "name": "tempest2" value. It is nested, so I have not found a way to extract it cleanly.
This is my query. How do I remove the [] square brackets from the value?
MariaDB [nova]> select uuid, instances.created_at, instances.deleted_at, json_extract(flavor, '$.cur.*.name') AS FLAVOR from instances join instance_extra on instances.uuid = instance_extra.instance_uuid;
+--------------------------------------+---------------------+---------------------+--------------+
| uuid | created_at | deleted_at | FLAVOR |
+--------------------------------------+---------------------+---------------------+--------------+
| edb0facb-3353-4848-82e2-f12701a0a3aa | 2023-01-05 05:37:13 | 2023-01-05 05:37:49 | ["tempest1"] |
| ef6380b4-5455-48f8-9e4b-3d04199be3f5 | 2023-01-05 14:25:51 | NULL | ["tempest2"] |
+--------------------------------------+---------------------+---------------------+--------------+
#Update
This is the MariaDB version I have
MariaDB [nova]> SELECT VERSION();
+-------------------------------------------+
| VERSION() |
+-------------------------------------------+
| 10.5.12-MariaDB-1:10.5.12+maria~focal-log |
+-------------------------------------------+
1 row in set (0.000 sec)
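For what it's worth, the brackets most likely come from the `*` wildcard in the path: a wildcard can match several members, so the matches are returned as an array. The Python sketch below mimics that behaviour on a trimmed copy of the flavor document; `wildcard_names` and `exact_name` are illustrative helpers, not MariaDB functions. In MariaDB the analogous change would presumably be an exact path with the dotted key quoted, such as `$.cur."nova_object.data".name`, combined with JSON_UNQUOTE or JSON_VALUE, but that is untested against 10.5.12.

```python
import json

# Trimmed copy of the flavor column from the question.
flavor = json.loads(
    '{"cur": {"nova_object.name": "Flavor",'
    ' "nova_object.data": {"id": 2, "name": "tempest2"}},'
    ' "old": null, "new": null}'
)


def wildcard_names(doc):
    # Mimics '$.cur.*.name': look under every member of "cur" and
    # collect all matches, so the result is always a list.
    return [
        value["name"]
        for value in doc["cur"].values()
        if isinstance(value, dict) and "name" in value
    ]


def exact_name(doc):
    # Mimics an exact path with one key per step: the result is a scalar.
    return doc["cur"]["nova_object.data"]["name"]


bracketed = wildcard_names(flavor)
plain = exact_name(flavor)
```

Here `bracketed` is a one-element list while `plain` is just the string, which is the difference between `["tempest2"]` and `tempest2` in the query output.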
I'm very new to Snowflake and I am working on creating a view from a table that holds JSON data as follows:
{
"data": {
"baseData": {
"dom_url": "https://www.soccertables.com/european_tables",
"event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
"event_utc_time": "2020-05-11 09:01:14.821",
"ip_address": "125.238.134.96",
"table_1": [
{
"position": "1",
"team_name": "Liverpool",
"games_played": "29",
"games_won": "26",
"games_drawn": "2",
"games_lost": "1",
"goals_for": "75",
"goals_against": "35",
"points": "80"
},
{
"position": "2",
"team_name": "Man. City",
"games_played": "29",
"games_won": "20",
"games_drawn": "5",
"games_lost": "4",
"goals_for": "60",
"goals_against": "45",
"points": "65"
},
{
"position": "...",
"team_name": "...",
"games_played": "...",
"games_won": "...",
"games_drawn": "...",
"games_lost": "...",
"goals_for": "...",
"goals_against": "...",
"points": "..."
}
],
"unitID": "CN 8000",
"ver": "1.0.0"
},
"baseType": "MatchData"
},
"dataName": "CN8000.Prod.MatchData",
"id": "18a89f9e-9620-4453-a546-23412025e7c0",
"tags": {
"itrain.access.level1": "Private",
"itrain.access.level2": "Kumar",
"itrain.internal.deviceID": "",
"itrain.internal.deviceName": "",
"itrain.internal.encodeTime": "2022-03-23T07:41:19.000Z",
"itrain.internal.sender": "Harish",
"itrain.software.name": "",
"itrain.software.partNumber": 0,
"itrain.software.version": ""
},
"timestamp": "2021-02-25T07:32:31.000Z"
}
I want to extract the common values (dom_url, event_id, event_utc_time, ip_address) together with each team_name in a separate column, and the associated team details (position, games_played, etc.) in rows for each team name.
I've been trying the LATERAL FLATTEN function but haven't succeeded so far:
create or replace view AWSS3_PM.PUBLIC.PM_POWER_CN8000_V1(
DOM_URL,
EVENT_ID,
EVENT_UTC_TIME,
IP_ADDRESS,
TIMESTAMP,
POSITION,
GAMES_PLAYED,
GAMES_WON,
GAMES_LOST,
GAMES_DRAWN
) as
select c1:data:baseData:dom_url dom_url,
c1:data:baseData:event_id event_id,
c1:data:baseData:event_utc_time event_utc_time,
c1:data:baseData:ip_address ip_address,
c1:timestamp timestamp,
value:position TeamPosition,
value:games_played gamesPlayed,
value:games_won wins ,
value:games_lost defeats,
value:games_drawn draws
from pm_power, lateral flatten(input => c1:data:baseData:table_1);
Any help would be greatly appreciated.
Thanks,
Harish
The table portion of the JSON needs to be flattened and then transposed. Example below.
Sample table:
select * from test_json;
+--------------------------------+
| TAB_VAL |
|--------------------------------|
| { |
| "table_1": [ |
| { |
| "games_drawn": "2", |
| "games_lost": "1", |
| "games_played": "29", |
| "games_won": "26", |
| "goals_against": "35", |
| "goals_for": "75", |
| "points": "80", |
| "position": "1", |
| "team_name": "Liverpool" |
| }, |
| { |
| "games_drawn": "5", |
| "games_lost": "4", |
| "games_played": "29", |
| "games_won": "20", |
| "goals_against": "45", |
| "goals_for": "60", |
| "points": "65", |
| "position": "2", |
| "team_name": "Man. City" |
| } |
| ] |
| } |
+--------------------------------+
1 Row(s) produced. Time Elapsed: 0.285s
Perform the transpose after flattening the JSON:
select * from (
select figures,stats,team_name
from (
select
f.value:"games_drawn"::number as games_drawn,
f.value:"games_lost"::number as games_lost,
f.value:"games_played"::number as games_played,
f.value:"games_won"::number as games_won,
f.value:"goals_against"::number as goals_against,
f.value:"goals_for"::number as goals_for,
f.value:"points"::number as points,
f.value:"position"::number as position,
f.value:"team_name"::String as team_name
from
TEST_JSON, table(flatten(input=>tab_val:table_1, mode=>'ARRAY')) as f
) flt
unpivot (figures for stats in(games_drawn, games_lost, games_played, games_won, goals_against, goals_for, points,position))
) up
pivot (min(up.figures) for up.team_name in ('Liverpool','Man. City'));
+---------------+-------------+-------------+
| STATS | 'Liverpool' | 'Man. City' |
|---------------+-------------+-------------|
| GAMES_DRAWN | 2 | 5 |
| GAMES_LOST | 1 | 4 |
| GAMES_PLAYED | 29 | 29 |
| GAMES_WON | 26 | 20 |
| GOALS_AGAINST | 35 | 45 |
| GOALS_FOR | 75 | 60 |
| POINTS | 80 | 65 |
| POSITION | 1 | 2 |
+---------------+-------------+-------------+
8 Row(s) produced. Time Elapsed: 0.293s
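If it helps to see the shape of the transformation without the Snowflake syntax, the following Python sketch does the same flatten-then-transpose on the sample document. The `transpose` helper is purely illustrative, and it assumes every team object carries the same stat keys.

```python
import json

# Same sample document as in TEST_JSON.TAB_VAL above.
tab_val = json.loads("""
{
  "table_1": [
    {"games_drawn": "2", "games_lost": "1", "games_played": "29",
     "games_won": "26", "goals_against": "35", "goals_for": "75",
     "points": "80", "position": "1", "team_name": "Liverpool"},
    {"games_drawn": "5", "games_lost": "4", "games_played": "29",
     "games_won": "20", "goals_against": "45", "goals_for": "60",
     "points": "65", "position": "2", "team_name": "Man. City"}
  ]
}
""")


def transpose(doc):
    # "Flatten": iterate over the array so each team is one record.
    teams = doc["table_1"]
    # Every key except team_name becomes a row label (the UNPIVOT step).
    stats = sorted(key for key in teams[0] if key != "team_name")
    # One row per stat, one column per team (the PIVOT step).
    return {
        stat.upper(): {team["team_name"]: int(team[stat]) for team in teams}
        for stat in stats
    }


table = transpose(tab_val)
```

`table["POINTS"]` then holds the points for each team keyed by team name, matching the POINTS row in the output above.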
The Metadata column holds a Map type value:
+-----------+--------+-----------+--------------------------------+
| Noun| Pronoun| Adjective|Metadata |
+-----------+--------+-----------+--------------------------------+
| Homer| Simpson|Engineer |["Age": "50", "Country": "USA"] |
| Elon | Musk |King |["Age": "45", "Country": "RSA"] |
| Bart | Lee |Cricketer |["Age": "35", "Country": "AUS"] |
| Lisa | Jobs |Daughter |["Age": "35", "Country": "IND"] |
| Joe | Root |Player |["Age": "31", "Country": "ENG"] |
+-----------+--------+-----------+--------------------------------+
I want to add another Map type value to Metadata under a key called tags:
+-----------+--------+-----------+--------------------------------------------------------------------+
| Noun| Pronoun| Adjective|Metadata |
+-----------+--------+-----------+--------------------------------------------------------------------+
| Homer| Simpson|Engineer |["Age": "50", "Country": "USA", "tags": ["Gen": "M", "Fit": "Yes"]] |
| Elon | Musk |King |["Age": "45", "Country": "RSA", "tags": ["Gen": "M", "Fit": "Yes"]] |
| Bart | Lee |Cricketer |["Age": "35", "Country": "AUS", "tags": ["Gen": "M", "Fit": "No"]] |
| Lisa | Jobs |Daughter |["Age": "35", "Country": "IND", "tags": ["Gen": "F", "Fit": "Yes"]] |
| Joe | Root |Player |["Age": "31", "Country": "ENG", "tags": ["Gen": "M", "Fit": "Yes"]] |
+-----------+--------+-----------+--------------------------------------------------------------------+
The outer Map in the Metadata column is already a typedLit, and nesting another Map inside it is not allowed.
I implemented it using a struct instead. This is how it looks:
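// lit("Age") would store the literal string "Age"; read the values from the existing map instead (needs col, struct, typedLit from org.apache.spark.sql.functions).
df.withColumn("Metadata", struct(col("Metadata").getItem("Age").alias("Age"), col("Metadata").getItem("Country").alias("Country"), typedLit(tags).alias("tags")))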
The result is not exactly a key-value pair, but the fields are still queryable by alias.
I'm using the lodash library.
This is my data, an array of objects:
[ {
"itemColor": "red",
"itemSize": "L",
"itemCount": 1,
"shopId": "shop 1",
"itemName": "product name 1",
},
{
"itemColor": "red",
"itemSize": "L",
"itemCount": 3,
"shopId": "shop 2",
"itemName": "product name 1",
},
{
"itemColor": "red",
"itemSize": "L",
"itemCount": 5,
"shopId": "shop 3",
"itemName": "product name 1",
},
{
"itemColor": "green",
"itemSize": "S",
"itemCount": 1,
"shopId": "shop 3",
"itemName": "product name 2",
}]
I need to group the items by itemSize and itemColor, and as a result I need this table:
+----------------+-------+------+--------+--------+--------+
| itemName | color | size | shop 1 | shop 2 | shop 3 |
+================+=======+======+========+========+========+
| product name 1 | red | L | 1 | 3 | 5 |
+----------------+-------+------+--------+--------+--------+
| product name 2 | green | S | 0 | 0 | 1 |
+----------------+-------+------+--------+--------+--------+
If a shop has no matching item, its value should be 0.
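One possible approach, sketched here in Python rather than lodash: collect the shop names first, then group by name, color, and size, starting every group with a zero for each shop. The `pivot_by_shop` helper is hypothetical; the same steps should map onto lodash's `_.groupBy` followed by a loop over each group.

```python
from collections import defaultdict

# The sample data from above.
items = [
    {"itemColor": "red", "itemSize": "L", "itemCount": 1,
     "shopId": "shop 1", "itemName": "product name 1"},
    {"itemColor": "red", "itemSize": "L", "itemCount": 3,
     "shopId": "shop 2", "itemName": "product name 1"},
    {"itemColor": "red", "itemSize": "L", "itemCount": 5,
     "shopId": "shop 3", "itemName": "product name 1"},
    {"itemColor": "green", "itemSize": "S", "itemCount": 1,
     "shopId": "shop 3", "itemName": "product name 2"},
]


def pivot_by_shop(items):
    # Every shop becomes a column, so gather the full set up front.
    shops = sorted({item["shopId"] for item in items})
    # Each new group starts with 0 for every shop (the "no match" default).
    counts = defaultdict(lambda: dict.fromkeys(shops, 0))
    for item in items:
        key = (item["itemName"], item["itemColor"], item["itemSize"])
        counts[key][item["shopId"]] += item["itemCount"]
    # One output row per (name, color, size) group.
    return [
        {"itemName": name, "color": color, "size": size, **per_shop}
        for (name, color, size), per_shop in counts.items()
    ]


table = pivot_by_shop(items)
```

Each dict in `table` is one row of the target table, with the shop names as extra columns.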
I have data with the following schema in ClickHouse:
CREATE TABLE table AS (
key String,
…
nested Nested (
key String,
value String
)
) …
Some example data:
key | … | nested |
----|---|-------------------------------|
k1 | | [{"key": "a", "value": "1"}] |
k1 | | [{"key": "a", "value": "2"}] |
k1 | | [{"key": "a", "value": "1"}, |
| | {"key": "a", "value": "2"}] |
k1 | | [{"key": "b", "value": "3"}] |
I want to group by the key and collect all the distinct key-value pairs into two arrays:
key | nested.key | nested.value |
------|-----------------|------------------|
k1 | ["a", "a", "b"] | ["1", "2", "3"] |
What is the simplest and most efficient way to do this in ClickHouse?
I would suggest this query:
SELECT DISTINCT
key,
arrayDistinct(groupArray((nested.key, nested.value))) AS distinctNested,
arrayMap(x -> (x.1), distinctNested) AS `nested.keys`,
arrayMap(x -> (x.2), distinctNested) AS `nested.values`
FROM test.table_002
ARRAY JOIN nested
GROUP BY key
/* Result
┌─key─┬─distinctNested──────────────────┬─nested.keys───┬─nested.values─┐
│ k1 │ [('a','1'),('a','2'),('b','3')] │ ['a','a','b'] │ ['1','2','3'] │
└─────┴─────────────────────────────────┴───────────────┴───────────────┘
*/
/* Test data preparation */
CREATE TABLE test.table_002 (
key String,
nested Nested (key String, value String)
) ENGINE = Memory;
INSERT INTO test.table_002
FORMAT JSONEachRow
{"key": "k1", "nested.key":["a"], "nested.value": ["1"]}
{"key": "k1", "nested.key":["a"], "nested.value": ["2"]}
{"key": "k1", "nested.key":["a", "a"], "nested.value": ["1", "2"]}
{"key": "k1", "nested.key":["b"], "nested.value": ["3"]}
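As a cross-check of what the query computes, here is a small Python sketch of the same steps on the test rows: unfold the parallel arrays into pairs (the ARRAY JOIN), keep each pair once per key (arrayDistinct over groupArray), then split the pairs back into two arrays. The `distinct_nested` helper is illustrative only. It keeps pairs in first-seen order, which is an assumption of this sketch and not something the ClickHouse query necessarily guarantees.

```python
# Test rows as (key, nested.key, nested.value), matching the INSERT above.
rows = [
    ("k1", ["a"], ["1"]),
    ("k1", ["a"], ["2"]),
    ("k1", ["a", "a"], ["1", "2"]),
    ("k1", ["b"], ["3"]),
]


def distinct_nested(rows):
    pairs_by_key = {}
    for key, nested_keys, nested_values in rows:
        seen = pairs_by_key.setdefault(key, {})
        # Unfold the parallel arrays into (key, value) pairs.
        for pair in zip(nested_keys, nested_values):
            # A dict used as an insertion-ordered set drops duplicates.
            seen.setdefault(pair, None)
    # Split the distinct pairs back into two parallel lists per key.
    return {
        key: ([k for k, _ in pairs], [v for _, v in pairs])
        for key, pairs in pairs_by_key.items()
    }


result = distinct_nested(rows)
```

For the test rows, `result["k1"]` holds the two parallel lists that correspond to the `nested.keys` and `nested.values` columns in the result shown above.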