Aggregate rows according to JSON array content - sql

I have a PSQL table with json tags, that are always strings stored in a json array :
id | tags (json)
--------------------------
1 | ["tag1", "tag2"]
2 | ["tag12", "tag2"]
122 | []
I would like to query, for instance, the count of entries in the table containing each tag.
For instance, I'd like to get :
tag | count
--------------------------
tag1 | 1
tag2 | 2
tag12 | 1
I tried
SELECT tags::text AS tag, COUNT(id) AS cnt FROM my_table GROUP BY tag;
but it does not work, since it gives
tag | cnt
--------------------------
["tag1", "tag2"] | 1
["tag12", "tag2"] | 1
I guess I need to get the list of all tags in an inner query, and then for each tag count the rows that contain it, but I can't find how to do that. Can you help me with that?

Use json[b]_array_elements_text() and a lateral join to unnest the array:
select x.tag, count(*) cnt
from my_table t
cross join lateral json_array_elements_text(t.tags) as x(tag)
group by x.tag
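The same unnest-then-group idea can be tried without a PostgreSQL server. The following is only a minimal sketch: it assumes Python's bundled SQLite with JSON support, and it uses SQLite's json_each() as a stand-in for json_array_elements_text(). The table and column names come from the question; everything else is illustrative.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, tags TEXT)")
rows = [(1, ["tag1", "tag2"]), (2, ["tag12", "tag2"]), (122, [])]
conn.executemany(
    "INSERT INTO my_table (id, tags) VALUES (?, ?)",
    [(row_id, json.dumps(tags)) for row_id, tags in rows],
)

# json_each() yields one row per array element, playing the role of the
# lateral json_array_elements_text() call; an empty array yields no rows.
tag_counts = conn.execute(
    """
    SELECT x.value AS tag, COUNT(*) AS cnt
    FROM my_table AS t, json_each(t.tags) AS x
    GROUP BY x.value
    ORDER BY x.value
    """
).fetchall()

print(tag_counts)
```

Row 122, whose array is empty, contributes nothing to the counts, which matches the output requested in the question.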

Related

How can I aggregate values inside a jsonb array using SQL

My data looks something like this:
tags | fullName
----------------------------------------------+------------------------------------------
["tag1", "tag2"] | John
["tag3", "tag1"] | Jane
["tag1", "tag3"] | Bob
tags is a jsonb type and fullName a text in a postgres database
What I'm struggling to do is, create a view such as
tags | count
----------------------------------------------+------------------------------------------
tag1 | 3
tag2 | 1
tag3 | 2
You may use the jsonb_array_elements function to expand the row as array elements before grouping and counting.
See the example below.
SELECT
tag_names as tag,
COUNT(1) as count
FROM (
SELECT
jsonb_array_elements(tags) as tag_names
FROM
my_table
) t
GROUP BY tag_names;
tag | count
--------------------------
tag1 | 3
tag2 | 1
tag3 | 2
Or shorter
SELECT
jsonb_array_elements(tags) as tag,
COUNT(1) as count
FROM
my_table
GROUP BY
tag

Multiple STRING_AGG on multiple join columns causes bloated aggregation

I've got a table in my MSSQL server, let's call it blogPost. I've also got two tag tables, let's call them fooTag and barTag. The two tag tables are identically structured and are used to tag rows of the blogPost table.
blogPost
| postId | title | body |
+--------+---------------------+-------------+
| 1 | The life on a query | lorem ipsum |
+--------+---------------------+-------------+
fooTag and barTag
| postId | tagName |
+--------+--------------+
| 1 | sql |
| 1 | query |
| 1 | select-query |
+--------+--------------+
I want to get a single blog post along with all its tags in a single row, so STRING_AGG() feels suitable for a query like this:
SELECT blogPost.*, STRING_AGG(fooTag.tagName, ';') as [fooTags], STRING_AGG(barTag.tagName, ';') as [barTags]
FROM blogPost
LEFT JOIN fooTag ON blogPost.postId = fooTag.postId
LEFT JOIN barTag ON blogPost.postId = barTag.postId
WHERE blogPost.postId = 1
GROUP BY blogPost.postId, title, body
When making this query I'd expect to get the result
| postId | title | body | fooTags | barTags |
+--------+---------------------+-------------+-------------------------+-------------------------+
| 1 | The life on a query | lorem ipsum | sql;query;select-query | sql;query;select-query |
+--------+---------------------+-------------+-------------------------+-------------------------+
But I'm getting this result instead where bar tags (i.e. the last STRING_AGG selected) are duplicated.
| postId | title | body | fooTags | barTags |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------+
| 1 | The life on a query | lorem ipsum | sql;query;select-query; | sql;sql;sql;query;query;query;select-query;select-query;select-query |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------+
Whichever tag column is selected last gets the duplicates: with barTags last in the SELECT statement, barTags is duplicated instead of fooTags. The number of duplicates seems to be tied to the number of rows aggregated in the first STRING_AGG result column, so if fooTags has 5 rows to aggregate, each barTag appears 5 times in the barTags column of the result.
How would I get the result I want without duplicates?
The problem is that the JOIN pairs every fooTag row with every barTag row for the same post, so each tag shows up once per row of the other table; hence the duplication. You can work around this by performing the STRING_AGG on the fooTag and barTag tables before JOINing them:
SELECT blogPost.*, f.tags as [fooTags], b.tags as [barTags]
FROM blogPost
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
FROM fooTag
GROUP BY postId) f ON blogPost.postId = f.postId
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
FROM barTag
GROUP BY postId) b ON blogPost.postId = b.postId
WHERE blogPost.postId = 1
You can simplify the query like so:
SELECT blogPost.*, ca1.*, ca2.*
FROM blogPost
OUTER APPLY (
SELECT STRING_AGG(tagName, ';')
FROM fooTag
WHERE blogPost.postId = fooTag.postId
) AS ca1(fooTags)
OUTER APPLY (
SELECT STRING_AGG(tagName, ';')
FROM barTag
WHERE blogPost.postId = barTag.postId
) AS ca2(barTags)
WHERE postId = 1
No GROUP BY is required, which matters because in your case it would be an expensive operation.
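The row multiplication is easy to reproduce on a small scale. The sketch below is an assumption-laden stand-in, not the SQL Server setup from the question: it uses Python's built-in SQLite, with group_concat() in place of STRING_AGG and correlated subqueries in place of OUTER APPLY. It only counts how many values each aggregate receives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE blogPost (postId INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE fooTag (postId INTEGER, tagName TEXT);
    CREATE TABLE barTag (postId INTEGER, tagName TEXT);
    INSERT INTO blogPost VALUES (1, 'The life on a query', 'lorem ipsum');
    INSERT INTO fooTag VALUES (1, 'sql'), (1, 'query'), (1, 'select-query');
    INSERT INTO barTag VALUES (1, 'sql'), (1, 'query'), (1, 'select-query');
    """
)

# Joining both tag tables first produces 3 x 3 = 9 rows for the post,
# so each aggregate receives 9 values.
joined = conn.execute(
    """
    SELECT group_concat(f.tagName, ';'), group_concat(b.tagName, ';')
    FROM blogPost p
    LEFT JOIN fooTag f ON p.postId = f.postId
    LEFT JOIN barTag b ON p.postId = b.postId
    WHERE p.postId = 1
    GROUP BY p.postId
    """
).fetchone()

# Aggregating inside correlated subqueries keeps each tag list separate.
separate = conn.execute(
    """
    SELECT
        (SELECT group_concat(tagName, ';') FROM fooTag f WHERE f.postId = p.postId),
        (SELECT group_concat(tagName, ';') FROM barTag b WHERE b.postId = p.postId)
    FROM blogPost p
    WHERE p.postId = 1
    """
).fetchone()

joined_counts = [len(col.split(";")) for col in joined]
separate_counts = [len(col.split(";")) for col in separate]
print(joined_counts, separate_counts)
```

In this SQLite sketch the joined query feeds nine values to each aggregate, while the per-table aggregation yields three values per column.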

BigQuery: Query to GroupBy Array Column

I have (2) columns in BigQuery table:
1. url
2. tags
URL is a single value, and TAGS is an array(example below):
row | URL   | TAGS
1   | x.com | donkey
    |       | donkey
    |       | lives
    |       | here
How can I group by TAGS array in BigQuery?
What's the trick to get the following query working?
SELECT TAGS FROM `URL_TAGS_TABLE`
group by unnest(TAGS)
I have tried group by TO_JSON_STRING but it does not give me the desired results
I'd like to see the following output
x.com | donkey | count 2
x.com | lives | count 1
x.com | here | count 1
Below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'x.com' url, ['donkey','donkey','lives','here'] tags UNION ALL
SELECT 'y.com' url, ['abc','xyz','xyz','xyz'] tags
)
SELECT url, tag, COUNT(1) AS `count`
FROM `project.dataset.table`, UNNEST(tags) tag
GROUP BY url, tag
with result
Row url tag count
1 x.com donkey 2
2 x.com lives 1
3 x.com here 1
4 y.com abc 1
5 y.com xyz 3
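To see what the comma join with UNNEST(tags) followed by GROUP BY url, tag computes, here is a rough pure-Python equivalent. It is only an illustration of the semantics, using the sample rows from the answer; the variable names are made up.

```python
from collections import Counter

table = [
    {"url": "x.com", "tags": ["donkey", "donkey", "lives", "here"]},
    {"url": "y.com", "tags": ["abc", "xyz", "xyz", "xyz"]},
]

# Flatten each row's array into (url, tag) pairs, then count each pair.
counts = Counter((row["url"], tag) for row in table for tag in row["tags"])

for (url, tag), count in counts.items():
    print(url, tag, count)
```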

Select all rows that have at least a list of features with wildcard support

given a table definition:
Objects:
obj_id | obj_name
-------|--------------
1 | object1
2 | object2
3 | object3
Tags:
tag_id | tag_name
-------|--------------
1 | code:python
2 | code:cpp
3 | color:green
4 | colorful
5 | image
objects_tags:
obj_id | tag_id
-------|---------
1 | 1
1 | 2
2 | 1
2 | 3
3 | 1
3 | 2
3 | 3
I'd like to select objects that have all tags from a given list, where the list entries may be wildcards. Similar questions have been asked several times, and the answer to the simpler variant looks more or less like this:
SELECT obj_id,count(*) c FROM objects_tags
INNER JOIN objects USING(obj_id)
INNER JOIN tags USING(tag_id)
WHERE (tag_name GLOB 'code*' OR tag_name GLOB 'color*')
GROUP BY obj_id
HAVING (c==2)
However, this solution doesn't work with wildcards. Is it possible to write a similar query that returns only the objects for which every wildcard pattern matches at least one tag? Checking c>=2 doesn't work, because one pattern can match several tags while another matches none, so the object passes even though it shouldn't.
I considered having the client software build a dynamic query consisting of N INTERSECTs (one per pattern), since there are probably not going to be many of them, but that sounds like a really dirty solution, and if there is a more SQL-like way I'd prefer to use it.
SQLite supports the WITH clause, so I would use it to determine all matching tags first, and then use these tags to find the objects as shown below.
The example (demo) is written for PostgreSQL because I could not run SQLite on any online tester, but I believe you can convert it to SQLite easily:
this query retrieves all tags:
WITH tagss AS (
SELECT * FROM Tags
WHERE tag_name LIKE 'code:%' OR tag_name LIKE 'color:%'
)
SELECT * FROM tagss;
| tag_id | tag_name |
|--------|-------------|
| 1 | code:python |
| 2 | code:cpp |
| 3 | color:green |
and the final query uses the above subquery in this way:
WITH tagss AS (
SELECT * FROM Tags
WHERE tag_name LIKE 'code:%' OR tag_name LIKE 'color:%'
)
SELECT obj_id,count(*) c
FROM objects_tags
INNER JOIN tagss USING(tag_id)
WHERE tag_name IN ( SELECT tag_name FROM tagss)
GROUP BY obj_id
HAVING count(*) >= (
SELECT count(*) FROM tagss
)
| obj_id | c |
|--------|---|
| 3 | 3 |
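Note that the query above keeps only objects that carry every tag matched by any pattern; with the sample data, object 2 has both code:python and color:green but is not returned. If the goal is at least one matching tag per pattern, one possible variant counts the distinct patterns matched per object. The following is a hedged sketch against SQLite via Python's sqlite3; the helper name objects_matching_all and the patterns CTE are hypothetical, and the pattern list is assumed to be non-empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
    CREATE TABLE objects_tags (obj_id INTEGER, tag_id INTEGER);
    INSERT INTO tags VALUES
        (1, 'code:python'), (2, 'code:cpp'), (3, 'color:green'),
        (4, 'colorful'), (5, 'image');
    INSERT INTO objects_tags VALUES
        (1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3);
    """
)


def objects_matching_all(conn, patterns):
    """Return obj_ids with at least one tag matching every GLOB pattern.

    Hypothetical helper; assumes `patterns` is a non-empty list of strings.
    """
    placeholders = ", ".join("(?)" for _ in patterns)
    query = f"""
        WITH patterns(pat) AS (VALUES {placeholders})
        SELECT ot.obj_id
        FROM objects_tags AS ot
        JOIN tags AS t ON t.tag_id = ot.tag_id
        JOIN patterns AS p ON t.tag_name GLOB p.pat
        GROUP BY ot.obj_id
        -- every pattern must be matched by at least one of the object's tags
        HAVING COUNT(DISTINCT p.pat) = (SELECT COUNT(DISTINCT pat) FROM patterns)
        ORDER BY ot.obj_id
    """
    return [row[0] for row in conn.execute(query, patterns)]


matching = objects_matching_all(conn, ["code:*", "color:*"])
print(matching)
```

Because the HAVING clause counts distinct patterns rather than matched tags, an object with several code:* tags and no color:* tag does not pass.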

postgres - pivot query with array values

Suppose I have this table:
Content
+----+---------+
| id | title |
+----+---------+
| 1 | lorem |
+----|---------|
And this one:
Fields
+----+------------+----------+-----------+
| id | id_content | name | value |
+----+------------+----------+-----------+
| 1 | 1 | subtitle | ipsum |
+----+------------+----------+-----------|
| 2 | 1 | tags | tag1 |
+----+------------+----------+-----------|
| 3 | 1 | tags | tag2 |
+----+------------+----------+-----------|
| 4 | 1 | tags | tag3 |
+----+------------+----------+-----------|
The thing is: I want to query the content, transforming all the rows from "Fields" into columns, to get something like:
+----+-------+----------+---------------------+
| id | title | subtitle | tags |
+----+-------+----------+---------------------+
| 1 | lorem | ipsum | [tag1,tag2,tag3] |
+----+-------+----------+---------------------|
Also, subtitle and tags are just examples. I can have as many fields as I want, whether they are arrays or not.
But I haven't found a way to convert the repeated "name" values into an array, even more without transforming "subtitle" into array as well. If that's not possible, "subtitle" could also turn into an array and I could change it later on the code, but I needed at least to group everything somehow. Any ideas?
You can use array_agg, e.g.
SELECT id_content, array_agg(value)
FROM fields
WHERE name = 'tags'
GROUP BY id_content
If you need the subtitle, too, use a self-join. I have a subselect to cope with subtitles that don't have any tags without returning arrays filled with NULLs, i.e. {NULL}.
SELECT f1.id_content, f1.value, f2.value
FROM fields f1
LEFT JOIN (
SELECT id_content, array_agg(value) AS value
FROM fields
WHERE name = 'tags'
GROUP BY id_content
) f2 ON (f1.id_content = f2.id_content)
WHERE f1.name = 'subtitle';
See http://www.postgresql.org/docs/9.3/static/functions-aggregate.html for details.
If you have access to the tablefunc module, another option is to use crosstab as pointed out by Houari. You can make it return arrays and non-arrays with something like this:
SELECT id_content, unnest(subtitle), tags
FROM crosstab('
SELECT id_content, name, array_agg(value)
FROM fields
GROUP BY id_content, name
ORDER BY 1, 2
') AS ct(id_content integer, subtitle text[], tags text[]);
However, crosstab requires that the values always appear in the same order. For instance, if the first group (with the same id_content) doesn't have a subtitle and only has tags, the tags will be unnested and will appear in the same column with the subtitles.
See also http://www.postgresql.org/docs/9.3/static/tablefunc.html
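A different technique that does not depend on row order is conditional aggregation: one aggregate per target column, each looking only at rows with the matching name. The sketch below is an assumption-based illustration in SQLite through Python's sqlite3, where group_concat() stands in for array_agg(); in PostgreSQL 9.4 or later the analogous expression would be array_agg(value) FILTER (WHERE name = 'tags').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE fields (
        id INTEGER PRIMARY KEY, id_content INTEGER, name TEXT, value TEXT
    );
    INSERT INTO content VALUES (1, 'lorem');
    INSERT INTO fields VALUES
        (1, 1, 'subtitle', 'ipsum'),
        (2, 1, 'tags', 'tag1'), (3, 1, 'tags', 'tag2'), (4, 1, 'tags', 'tag3');
    """
)

# Each CASE yields NULL for rows of other field names, and both MAX() and
# group_concat() skip NULLs, so every column only sees its own values.
rows = conn.execute(
    """
    SELECT c.id,
           c.title,
           MAX(CASE WHEN f.name = 'subtitle' THEN f.value END) AS subtitle,
           group_concat(CASE WHEN f.name = 'tags' THEN f.value END) AS tags
    FROM content AS c
    LEFT JOIN fields AS f ON f.id_content = c.id
    GROUP BY c.id, c.title
    """
).fetchall()

# Split the comma-separated tags into a sorted list on the client side.
pivoted = [
    (cid, title, subtitle, sorted(tags.split(",")) if tags else [])
    for cid, title, subtitle, tags in rows
]
print(pivoted)
```

The trade-off is that each pivoted column has to be listed explicitly, so this only fits when the set of field names is known in advance.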
If the subtitle value is the only "constant" that you want to separate, you can do:
SELECT * FROM crosstab
(
'SELECT content.id,name,array_to_string(array_agg(value),'','')::character varying FROM content inner join
(
select * from fields where fields.name = ''subtitle''
union all
select * from fields where fields.name <> ''subtitle''
) fields_ordered
on fields_ordered.id_content = content.id group by content.id,name'
)
AS
(
id integer,
content_name character varying,
tags character varying
);