is it possible to nest an array_agg inside another array_agg - sql

I have a SQL table message(application, type, action, date, ...) and I would like to get all the actions for a type and all the types for an application in a single query if possible.
So far I have managed to get the result in two separate queries like so:
select application, array_agg(distinct type) as types from message group by application;
application | types
--------------+----------------------------------------------------------------------------------------------------------------------------
app1 | {company,user}
app2 | {document,template}
app3 | {organization,user}
and the second query:
select type, array_agg(distinct action) as actions from message group by type;
type | actions
--------------------------------------+-----------------------------------------
company | {created,updated}
document | {created,tested,approved}
organization | {updated}
template | {deleted}
user | {created,logged,updated}
The most obvious single query I could come up with so far is just:
select application, type, array_agg(distinct action) from message group by application, type;
Which would require some programmatic processing to build the type array.
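For reference, the programmatic step mentioned above could look roughly like the following Python sketch. It assumes the rows of the `group by application, type` query have already been fetched as `(application, type, actions)` tuples; the function name `nest_types` and the sample rows are illustrative only.

```python
from collections import defaultdict


def nest_types(rows):
    """Group (application, type, actions) rows into {application: {type: actions}}."""
    nested = defaultdict(dict)
    for application, type_, actions in rows:
        nested[application][type_] = list(actions)
    return dict(nested)


# Hypothetical rows, shaped like the output of the three-column query.
rows = [
    ("app1", "company", ["created", "updated"]),
    ("app1", "user", ["created", "logged", "updated"]),
    ("app2", "document", ["created", "tested", "approved"]),
]

result = nest_types(rows)
print(result["app1"])
```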
What I wanted to do was something theoretically like:
select application, array_agg(type, array_agg(action)) from message group by application, type;
This isn't possible as written, but I feel there is a way to do it. I have also thought about nesting the second query into the first one but haven't found out how to make it work yet.

You can create tuples (records) with the syntax (col1, col2). If col1 is text and col2 is an array, the record has the type (text, text[]). These records can themselves be aggregated into an array of records:
SELECT
application,
array_agg((type, actions)) AS types -- here is the magic
FROM (
SELECT
application,
type,
array_agg(action) AS actions
FROM
message
GROUP BY application, type
) s
GROUP BY application
To access the fields again, you have to explicitly define the record type when unnesting:
SELECT
*
FROM (
-- your query with tuples
)s,
unnest(types) AS t(type text, actions text[]) -- unnesting the tuple array
Nevertheless, as stated in the comments, JSON may be a better approach for you:
SELECT
application,
json_agg(json_build_object('type', type, 'actions', actions))
FROM (
SELECT
application,
type,
json_agg(action) AS actions
FROM
message
GROUP BY application, type
) s
GROUP BY application
Result:
[{
"type": "company",
"actions": ["created","updated"]
},
{
"type": "user",
"actions": ["logged","updated"]
}]
Another possible JSON output:
SELECT
json_agg(data)
FROM (
SELECT
json_build_object(application, json_agg(types)) AS data
FROM (
SELECT
application,
json_build_object(type, json_agg(action)) AS types
FROM
message
GROUP BY application, type
) s
GROUP BY application
) s
Result:
[{
"app1": [{
"company": ["created","updated"]
},
{
"user": ["logged","updated"]
}]
},
{
"app2": [{
"company": ["created"]
}]
}]
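On the client side, this second shape is a list of single-key objects, which can be awkward for lookups. A minimal Python sketch, assuming the query result arrives as the JSON text shown above (the helper name `to_lookup` is made up for this example), folds it into one nested dictionary:

```python
import json


def to_lookup(payload):
    """Fold [{app: [{type: actions}, ...]}, ...] into {app: {type: actions}}."""
    lookup = {}
    for app_entry in json.loads(payload):
        for app, type_entries in app_entry.items():
            merged = lookup.setdefault(app, {})
            for type_entry in type_entries:
                merged.update(type_entry)
    return lookup


# Same data as the result above, written as JSON text.
payload = """
[{"app1": [{"company": ["created", "updated"]}, {"user": ["logged", "updated"]}]},
 {"app2": [{"company": ["created"]}]}]
"""

lookup = to_lookup(payload)
print(lookup["app1"]["user"])
```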

Related

complex couchbase query using metadata & group by

I am new to Couchbase and kind of stuck on the following problem.
This query works just fine in the Couchbase Query Editor:
SELECT
p.countryCode,
SUM(c.total) AS total
FROM bucket p
USE KEYS (
SELECT RAW "p::" || ca.token
FROM bucket ca USE INDEX (idx_cr)
WHERE ca._class = 'backend.db.p.ContactsDo'
AND ca.total IS NOT MISSING
AND ca.date IS NOT MISSING
AND ca.token IS NOT MISSING
AND ca.id = 288
ORDER BY ca.total DESC, ca.date ASC
LIMIT 20 OFFSET 0
)
LEFT OUTER JOIN bucket finished_contacts
ON KEYS ["finishedContacts::" || p.token]
GROUP BY p.countryCode ORDER BY total DESC
I get this:
[
{
"countryCode": "en",
"total": 145
},
{
"countryCode": "at",
"total": 133
},
{
"countryCode": "de",
"total": 53
},
{
"countryCode": "fr",
"total": 6
}
]
Now, using this query in a Spring Boot application, I end up with this error:
Unable to retrieve enough metadata for N1QL to entity mapping, have you selected _ID and _CAS?
Adding metadata,
SELECT
meta(p).id AS _ID,
meta(p).cas AS _CAS,
p.countryCode,
SUM(c.total) AS total
FROM bucket p
and trying to map it to the following object:
data class CountryIntermediateRankDo(
@Id
@Field
val id: String,
@Field
@NotNull
val countryCode: String,
@Field
@NotNull
val total: Long
)
results in:
Unable to execute query due to the following n1ql errors:
{"msg":"Expression must be a group key or aggregate: (meta(p).id)","code":4210}
Using Map as return value results in:
org.springframework.data.couchbase.core.CouchbaseQueryExecutionException: Query returning a primitive type are expected to return exactly 1 result, got 0
Clearly I missed something important here about how to write proper Couchbase queries. I am stuck between needing the metadata and getting this key/aggregate error related to the GROUP BY clause. I'd be very thankful for any help.
When you have a GROUP BY query, everything in the SELECT clause must be either a field used for grouping or a group aggregate. You need to add the new fields to the GROUP BY clause, something like this:
SELECT
_ID,
_CAS,
p.countryCode,
SUM(p.c.total) AS total
FROM testBucket p
USE KEYS ["foo", "bar"]
LEFT OUTER JOIN testBucket finished_contacts
ON KEYS ["finishedContacts::" || p.token]
GROUP BY p.countryCode, meta(p).id AS _ID, meta(p).cas AS _CAS
ORDER BY total DESC
(I had to make some changes to your query to work with it effectively. You'll need to retrofit the advice to your specific case.)
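One consequence worth checking before retrofitting: once `meta(p).id` is part of the group key, every document forms its own group, so the totals are no longer collapsed per country. The Python sketch below models this with made-up documents; the ids, field values, and the `group_sum` helper are all hypothetical and not part of any Couchbase API.

```python
from collections import defaultdict


def group_sum(docs, key_fields, value_field):
    """Sum value_field per distinct combination of key_fields, like GROUP BY + SUM."""
    totals = defaultdict(int)
    for doc in docs:
        key = tuple(doc[field] for field in key_fields)
        totals[key] += doc[value_field]
    return dict(totals)


# Hypothetical documents; "id" stands in for meta(p).id.
docs = [
    {"id": "p::1", "countryCode": "en", "total": 100},
    {"id": "p::2", "countryCode": "en", "total": 45},
    {"id": "p::3", "countryCode": "at", "total": 133},
]

by_country = group_sum(docs, ["countryCode"], "total")
by_country_and_id = group_sum(docs, ["countryCode", "id"], "total")

print(by_country)         # one entry per country
print(by_country_and_id)  # one entry per document
```

Whether one row per document is acceptable depends on what the application needs from the ranking.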
If you need more detailed advice, let me suggest the N1QL forum https://forums.couchbase.com/c/n1ql . StackOverflow is great for one-and-done questions, but the forum is better for extended interactions.

Flattening multiple repeated fields in Google BigQuery

I'm trying to flatten data from repeated fields in BigQuery. I have had a look at the question "Querying multiple repeated fields in BigQuery", but I can't seem to get it to work.
My data looks like the following:
[
{
"visitorId": null,
"visitNumber": "15",
"device": {
"browser": "Safari (in-app)",
"browserVersion": "(not set)",
"browserSize": "380x670",
"operatingSystem": "iOS",
},
"hits": [
{
"isEntrance": "true",
"isExit": "true",
"referer": null,
"page": {
"pagePath": "/news/bla-bla-bla",
"hostname": "www.example.com",
"pageTitle": "Win tickets!!",
"searchKeyword": null,
"searchCategory": null,
"pagePathLevel1": "/news/",
"pagePathLevel2": "/bla-bla-bla",
"pagePathLevel3": "",
"pagePathLevel4": ""
},
"transaction": null
}
]
}
]
What I want are the fields in the repeated hits.page record.
For instance, I want to fetch hits.page.pagePath (with the value "/news/bla-bla-bla").
I have tried the following query, but I get an error:
SELECT
visitorId,
visitNumber,
device.browser,
hits.page.pagePath
FROM
`Project.Page`
LIMIT 1000
The error I'm getting is this:
Error: Cannot access field page on a value with type ARRAY<STRUCT<hitNumber INT64, time INT64, hour INT64, ...>>
In the ga_sessions schema, the hits field is an ARRAY.
To read fields inside an array like this, you usually need to apply UNNEST to expand the array into rows.
Specifically, in the FROM clause you can CROSS JOIN the table with UNNEST(hits) (a comma followed by UNNEST is shorthand for that cross join), like so:
SELECT
visitorId,
visitNumber,
device.browser,
hits.page.pagePath
FROM `Project.Page`,
UNNEST(hits) hits
LIMIT 1000
If you want specific pagePaths, you can filter for them like so:
SELECT
visitorId,
visitNumber,
device.browser,
hits.page.pagePath
FROM `Project.Page`,
UNNEST(hits) hits
WHERE regexp_contains(hits.page.pagePath, r'/news/bla-bla-bla')
LIMIT 1000
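To make the effect of the comma join with UNNEST concrete, here is a rough Python model of it: each session row is paired with each element of its own hits array, and the filter is then applied to the flattened rows. The sample sessions and the `unnest_hits` helper are invented for illustration and trimmed to the fields used in the queries above.

```python
import re


def unnest_hits(sessions):
    """Yield one flat row per (session, hit) pair, like FROM t, UNNEST(hits)."""
    for session in sessions:
        for hit in session["hits"]:
            yield {
                "visitNumber": session["visitNumber"],
                "browser": session["device"]["browser"],
                "pagePath": hit["page"]["pagePath"],
            }


# Hypothetical sessions; the second one has no hits at all.
sessions = [
    {
        "visitNumber": "15",
        "device": {"browser": "Safari (in-app)"},
        "hits": [
            {"page": {"pagePath": "/news/bla-bla-bla"}},
            {"page": {"pagePath": "/contact"}},
        ],
    },
    {"visitNumber": "2", "device": {"browser": "Chrome"}, "hits": []},
]

rows = list(unnest_hits(sessions))
# re.search matches anywhere in the string, like a partial regex match.
news_rows = [r for r in rows if re.search(r"/news/bla-bla-bla", r["pagePath"])]
print(len(rows), len(news_rows))
```

Note that the session with an empty hits array contributes no rows at all, which is also how the comma (cross) join behaves.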
Make sure to read through the BigQuery documentation on this topic; it's really well written and you'll learn a lot of new techniques for processing big data.

Merging columns and jsonb object array into a single jsonb column

I have a table visitors(id, email, first_seen, sessions, etc.)
and another table trackings(id, visitor_id, field, value) that stores custom, user supplied data.
I want to query these and merge the visitor data columns and the trackings into a single column called data
For example, say I have two trackings
(id: 3, visitor_id: 1, field: "orders_made", value: 2)
(id: 4, visitor_id: 1, field: "city", value: 'new york')
and a visitor
(id: 1, email: 'hello@gmail.com', sessions: 5)
I want the result to be of the form
(id: 1, data: {email: 'hello@gmail.com', sessions: 5, orders_made: 2, city: 'new york'})
What's the best way to accomplish this using Postgres 9.4?
I'll start by saying trackings is a bad idea. If you don't have many things to track, just store json instead; that's what it's made for. If you have a lot of things to track, you'll become very unhappy with the performance of trackings over time.
First you need a json object from trackings:
-- WARNING: Behavior of this with duplicate field names is undefined!
SELECT json_object(array_agg(field), array_agg(value)) FROM trackings WHERE ...
Getting json for visitors is relatively easy:
SELECT row_to_json(v) FROM (SELECT email, sessions FROM visitors WHERE ...) v;
I recommend you do not just squash all those together. What happens if you have a field called email? Instead:
SELECT row_to_json(d) AS data
FROM (
SELECT
(
SELECT row_to_json(v) FROM (SELECT email, sessions FROM visitors WHERE ...) v
) AS visitor
, (
SELECT json_object(array_agg(field), array_agg(value)) FROM trackings WHERE ...
) AS trackings
) d;
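The reason for keeping the two sources apart can be seen in a small Python sketch. The helper names and the clashing tracking field are hypothetical; the tracking values are strings because json_object on text arrays produces string values.

```python
def merge_flat(visitor, trackings):
    """Squash both dicts together; tracking fields silently win on a name clash."""
    return {**visitor, **trackings}


def merge_nested(visitor, trackings):
    """Keep the two sources under separate keys so nothing is overwritten."""
    return {"visitor": visitor, "trackings": trackings}


visitor = {"email": "hello@gmail.com", "sessions": 5}
# Hypothetical trackings, including a user-supplied field named "email".
trackings = {"orders_made": "2", "city": "new york", "email": "other@example.com"}

flat = merge_flat(visitor, trackings)
nested = merge_nested(visitor, trackings)

print(flat["email"])               # the visitor's own email has been overwritten
print(nested["visitor"]["email"])  # the visitor's own email is preserved
```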

Working with Structs within Arrays for new BigQuery Standard SQL

I'm trying to find rows with duplicate fields in an array of structs within a Google BigQuery table, using the new Standard SQL. Each row in the (simplified) table looks a bit like this:
{
"Session": "abc123",
"Information": [
{
"Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"
},
{
"Identifier": "1c62813f-7ec4-4968-b18b-d1eb8f4d9d26"
},
{
"Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"
}
]
}
My end goal is to display the rows that have Information entities with duplicate Identifier values present. However, most of the queries I attempt get an error message of the following form:
Cannot access field Identifier on a value with type ARRAY<STRUCT<Identifier STRING>>
Is there a way to work with the data inside of a STRUCT within an ARRAY?
Here's my first attempt at a query:
SELECT
Session,
Information
FROM
`events.myevents`
WHERE
COUNT(DISTINCT Information.Identifier) != ARRAY_LENGTH(Information.Identifier)
LIMIT
1000
And another using a subquery:
SELECT
Session,
Information
FROM (
SELECT
Session,
Information,
COUNT(DISTINCT Information.Identifier) AS info_count_distinct,
ARRAY_LENGTH(Information) AS info_count
FROM
`events.myevents`
WHERE
COUNT(DISTINCT Information.Identifier) != ARRAY_LENGTH(Information.Identifier)
LIMIT
1000)
WHERE
info_count != info_count_distinct
Try the query below:
SELECT Session, Identifier, COUNT(1) AS dups
FROM `events.myevents`, UNNEST(Information)
GROUP BY Session, Identifier
HAVING dups > 1
ORDER BY Session
This should give you what you expect, plus the number of duplicates, for example:
Session | Identifier | dups
--------+--------------------------------------+------
abc123 | e8d971a4-ef33-4ea1-8627-f1213e4c67dc | 2
abc345 | 1c62813f-7ec4-4968-b18b-d1eb8f4d9d26 | 3
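The same per-session check can be sketched in Python, which may help when validating the query against a few exported rows. The `duplicate_identifiers` helper is hypothetical; the sample row is the one from the question.

```python
from collections import Counter


def duplicate_identifiers(row):
    """Return {identifier: count} for identifiers appearing more than once in a row."""
    counts = Counter(info["Identifier"] for info in row["Information"])
    return {ident: n for ident, n in counts.items() if n > 1}


row = {
    "Session": "abc123",
    "Information": [
        {"Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"},
        {"Identifier": "1c62813f-7ec4-4968-b18b-d1eb8f4d9d26"},
        {"Identifier": "e8d971a4-ef33-4ea1-8627-f1213e4c67dc"},
    ],
}

dups = duplicate_identifiers(row)
print(dups)
```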

SOCRATA SODA query for array item (SELECT & GROUP BY)

Is it possible to select or group by an item within an array using SoQL?
For example, here is an item returned by the SODA API:
{
"business_name" : "ACME CO",
"phone_number" : {
"phone_number" : "2145551234"
},
"zip_code" : "75235"
}
I would like to group by the phone number, but can't find a way to support it. The phone number is not available as a non-array object, and the site does not support adding it to the group by via its GUI.
Trying something like the following causes an error stating all GROUP BY arguments must be column names:
$select=hospital_name%2Cphone_number&$group=hospital_name%2Cphone_number.phone_numer
Thanks in advance!