Return aggregated JSON from BigQuery?

Return aggregated JSON from BigQuery? - sql

If I have the below example_table in BigQuery. When I query the table with "Original Query" I get the "Actual Result" (which makes since). Is there a way to query BigQuery directly to get the "Desired Result"
Original Query
SELECT ID, SUBID FROM `example_table ORDER BY ID
example_table
ID | SUBID
12 abc
12 def
12 ghi
34 jkl
34 mno
56 prg
Actual Result
[{
"ID": "12",
"SUBID": "abc"
}, {
"ID": "12",
"SUBID": "def"
}, {
"ID": "12",
"SUBID": "ghi"
}, {
"ID": "34",
"SUBID": "jkl"
}, {
"ID": "34",
"SUBID": "mno"
}, {
"ID": "56",
"SUBID": "prg"
}]
Desired Result
[{
"ID": "12",
"SUBID": ["abc", "def", "ghi"]
}, {
"ID": "34",
"SUBID": ["jkl", "mno"]
}, {
"ID": "56",
"SUBID": ["prg"]
}]

Below is for BigQuery Standard SQL
#standardSQL
SELECT ID, ARRAY_AGG(SUBID) SUBID
FROM `project.dataset.example_table`
GROUP BY ID
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.example_table` AS (
SELECT 12 ID, 'abc' SUBID UNION ALL
SELECT 12, 'def' UNION ALL
SELECT 12, 'ghi' UNION ALL
SELECT 34, 'jkl' UNION ALL
SELECT 34, 'mno' UNION ALL
SELECT 56, 'prg'
)
SELECT ID, ARRAY_AGG(SUBID) SUBID
FROM `project.dataset.example_table`
GROUP BY ID
-- ORDER BY ID
with result

If BigQuery does use MySQL syntax you might be able to do this. If not you can continue CONCAT throughout all of your query using multiple selects but it would be a little more convoluted instead of the JSON_ARRAYAGG.
SELECT CONCAT('{','ID:', ID,', SUBID:', JSON_ARRAYAGG(SUBID),'}') as JSON
FROM contact GROUP BY ID;
https://www.db-fiddle.com/f/37ru5oq4dFQSscwYsfx386/25

Related

Select all the rows with same id and same value in another column without using having clause

I have cosmos DB with a container having multiple documents. I want to get all the ids with the same value of a property. Since it's Cosmos I cannot use the having clause.
eg: If there is a container with the schema,
{
"id": 1,
"source": "online",
"type": "login"
},
{
"id": 1,
"source": "online",
"type": "login"
},
{
"id": 2,
"source": "online",
"type": "login"
},
{
"id": 2,
"source": "In store",
"type": "login"
}
I want all the ids where the source value is all same and "online". So in the above example, it should return "id" as 1 only.

This request selects all IDs that did not have other source types at all, only those online
select distinct(id), source
from(
select id, source
from your_table as t
WHERE NOT EXISTS (SELECT id
FROM your_table
WHERE source != 'online'
AND your_table.id = t.id))x
-- If need at the end use this WHERE x.id IS NOT NULL
Result :
| id | source
|:---- |:------:
| 1 | online

Select only the jsonb object keys which are filtered and ordered by values in child object

in postgres 13, I have a Jsonb object and I am able to get only the keys using jsonb_object_keys like this.
SELECT keys from jsonb_object_keys('{
"135": {
"timestamp": 1659010504308,
"priority": 5,
"age": 20
},
"136": {
"timestamp": 1659010504310,
"priority": 2,
"age": 20
},
"137": {
"timestamp": 1659010504312,
"priority": 2,
"age": 20
},
"138": {
"timestamp": 1659010504319,
"priority": 1,
"age": 20
}}') as keys
Now, I want to get the keys which have priority more than 1 and which are ordered by priority and timestamp
I am able to achieve this using this query
select key from (
SELECT data->>'key' key, data->'value' value
FROM
jsonb_path_query(
'{
"135": {
"name": "test1",
"timestamp": 1659010504308,
"priority": 5,
"age": 20
},
"136": {
"name": "test2",
"timestamp": 1659010504310,
"priority": 7,
"age": 20
},
"137": {
"name": "test3",
"timestamp": 1659010504312,
"priority": 5,
"age": 20
},
"138": {
"name": "test4",
"timestamp": 1659010504319,
"priority": 1,
"age": 20
}}'
, '$.keyvalue() ? (#.value.priority > 1)')
as data) as foo, jsonb_to_record(value) x("name" text, "timestamp" decimal,
"priority" int,
"age" int)
order by priority desc, timestamp desc
This doesn't seem to be the efficient way of doing this.
Please share if this can be achieved in a better way (by using jsonb_object_keys !??)
Thanks in advance.

I would first 'normalize' JSON data into a table (the t CTE) and then do a trivial select.
with t (key, priority, ts) as
(
select key, (value ->> 'priority')::integer, value ->> 'timestamp'
from jsonb_each('{
"135": {"timestamp": 1659010504308,"priority": 5,"age": 20},
"136": {"timestamp": 1659010504310,"priority": 2,"age": 20},
"137": {"timestamp": 1659010504312,"priority": 2,"age": 20},
"138": {"timestamp": 1659010504319,"priority": 1,"age": 20}
}')
)
select key
from t
where priority > 1
order by priority, ts;

Get list of user with their latest 2 inputs -(TypeORM ,sql)

How do write a sql query to get at least 2 items for distinct UUIDs
user-table
id name
xxx a
xyx b
zzz e
visitedlocation-table
id startDate userID location
1. 1/2/21 xxx USA
2. 1/3/21 xxx UK
3. 1/2/21 xyx AR
4. 1/3/21 xyx USA
5. 1/5/21 zzz USA
6. 1/6/21 xxx IN
I want to get a list of users with their last two visited locations
Desired output
[
{
id: "xxx",
name: "a",
lastVisits: [
{
id: "6",
startDate: "1/6/21",
location: "IN"
},
{
id: "2",
startDate: "1/3/21",
location: "UK"
}
]
},
{
id: "xyx",
name: "b",
lastVisits: [
{
id: "4",
startDate: "1/3/21",
location: "USA"
},
{
id: "3",
startDate: "1/2/21",
location: "AR"
}
]
},
{
id: "zzz",
name: "b",
lastVisits: [
{
id: "5",
startDate: "1/5/21",
location: "USA"
}
]
}
]
I am using Type Orm and the user entity has a one to many relations with the "visited location" table
repository
.createQueryBuilder('user)
.leftJoinAndSelect(
'user.visitedLocation',
'visitedLocation',
'visitedLocation.userId = user.id'
)
.getRawMany();
I tried using this query but it returns all the visited locations. But I want only the last 2 visited locations.
If it's hard do in query builder please suggest SQL query for this

You can try dense_rank() to rank your rows and only get the last two rows
SELECT userID,startDate,location
FROM
(
SELECT a.id as userID, b.startDate, b.location,
--this will group your rows by user_id and then rank them based on startDate
DENSE_RANK() OVER(PARTITION BY b.userID ORDER BY b.startDate DESC) as
row_rank
FROM user-table a
INNER JOIN visitedlocation-table b
ON (a.id = b.userID)
)T WHERE row_rank <=2 -- fetch only the first two rows
you can take inspiration from the above logic. I'll be posting the JSON based output solution too
Edit
WITH user_visits AS
(
SELECT userID,name,id,startDate,location
FROM
(
SELECT a.id as userID,a.name,b.id, b.startDate, b.location,
--this will group your rows by user_id and then rank them based on startDate
DENSE_RANK() OVER(PARTITION BY b.userID ORDER BY b.startDate DESC) as
row_rank
FROM user_table a
INNER JOIN visitedlocation_table b
ON (a.id = b.userID)
)T WHERE row_rank <=2 -- fetch only the first two rows
)
SELECT jsonb_pretty(array_to_json(array_agg(row_to_json(t)))::jsonb)
FROM(
SELECT userid as id, name,
(
SELECT array_to_json(array_agg(row_to_json(d)))
FROM(
SELECT id,startdate,location
FROM user_visits b
WHERE b.userid = u.userid
)d
) as lastVisits
FROM user_visits u
GROUP BY userid,name
ORDER BY userid
)t;
output of above query
[
{
"id": "xxx",
"name": "a",
"lastvisits": [
{
"id": 6,
"location": "IN",
"startdate": "2021-06-01"
},
{
"id": 2,
"location": "UK",
"startdate": "2021-03-01"
}
]
},
{
"id": "xyz",
"name": "b",
"lastvisits": [
{
"id": 4,
"location": "USA",
"startdate": "2021-03-01"
},
{
"id": 3,
"location": "AR",
"startdate": "2021-02-01"
}
]
},
{
"id": "zzz",
"name": "e",
"lastvisits": [
{
"id": 5,
"location": "USA",
"startdate": "2021-05-01"
}
]
}
]

PostgreSQL json_build_object nested

First things first:
I'm using PostgreSQL 11.6, compiled by Visual C++ build 1800, 64-bit. :)
Im trying to create a JSON object directly from the database.
My desired result is
{
"1": [],
"2": [],
"3": []
}
Imagine my tables like:
MyIdTable
_id_|__key__
1 test1
2 test2
3 test3
MyKeyValueTable
__id__|__fkidmyidtable__|__value__
1 1 test
2 1 test1
3 2 test2
4 2 test3
Now I create a query
select
json_build_object(
a.id,
json_agg(
b.*
)
)
from "MyIdTable" a
inner join "MyKeyValueTable" b on a.id = b.fkidmyidtable group by a.id
This will get me as result, multiple rows with the desired result:
row 1: {
"1": [{ "id": 1, "fkidmyidtable": 1, "value": "test" }, { "id": 2, "fkidmyidtable": 1, "value": "test1" }]
}
row 2: {
"2": [{ "id": 3, "fkidmyidtable": 2, "value": "test2" }, { "id": 4, "fkidmyidtable": 2, "value": "test3" }]
}
After this I can use json_agg() to create almost my desired result. The issue is that it will create
[ { "json_build_object": {"1": [{ "id": 1, "fkidmyidtable": 1, "value": "test" }, { "id": 2, "fkidmyidtable": 1, "value": "test1" }]}, "json_build_object": { "2": [{ "id": 3, "fkidmyidtable": 2, "value": "test2" }, { "id": 4, "fkidmyidtable": 2, "value": "test3" }] }]
I would like to know if its possible to write a query to merge my created object into one json object like:
{
"1": [{ "id": 1, "fkidmyidtable": 1, "value": "test" }, { "id": 2, "fkidmyidtable": 1, "value": "test1" }],
"2": [{ "id": 3, "fkidmyidtable": 2, "value": "test2" }, { "id": 4, "fkidmyidtable": 2, "value": "test3" }]
}
Thank you very much in advance for taking the time to read :)!

If I followed you correctly, you can add another level of aggregation and use json_object_agg():
select json_object_agg(id, js) res
from (
select a.id, json_agg(b.*) js
from "MyIdTable" a
inner join "MyKeyValueTable" b on a.id = b.fkidmyidtable
group by a.id
) t

Removing null/empty values from an array

I'm struggling to understand arrays and structs in BigQuery. When I run this query in Standard SQL:
with t1 as (
select 1 as id, [1,2] as orders
union all
select 2 as id, null as orders
)
select
id,
orders
from t1
order by 1
I get this result in json:
[
{
"id": "1",
"orders": [
"1",
"2"
]
},
{
"id": "2",
"orders": []
}
]
I want to remove to remove the orders value for id = 2 so that I instead get:
[
{
"id": "1",
"orders": [
"1",
"2"
]
},
{
"id": "2"
}
]
How can I do this? Do I need to add another CTE to remove null values, how?

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Return aggregated JSON from BigQuery? - sql

Related

Select all the rows with same id and same value in another column without using having clause

Select only the jsonb object keys which are filtered and ordered by values in child object

Get list of user with their latest 2 inputs -(TypeORM ,sql)

PostgreSQL json_build_object nested

Removing null/empty values from an array

Categories

Resources