postgres - pivot query with array values - sql

Suppose I have this table:
Content
+----+-------+
| id | title |
+----+-------+
| 1  | lorem |
+----+-------+
And this one:
Fields
+----+------------+----------+-------+
| id | id_content | name     | value |
+----+------------+----------+-------+
| 1  | 1          | subtitle | ipsum |
| 2  | 1          | tags     | tag1  |
| 3  | 1          | tags     | tag2  |
| 4  | 1          | tags     | tag3  |
+----+------------+----------+-------+
The thing is: I want to query the content, transforming all the rows from "Fields" into columns, to get something like:
+----+-------+----------+------------------+
| id | title | subtitle | tags             |
+----+-------+----------+------------------+
| 1  | lorem | ipsum    | [tag1,tag2,tag3] |
+----+-------+----------+------------------+
Also, subtitle and tags are just examples; I can have as many fields as I want, whether they hold multiple values or not.
But I haven't found a way to collect the repeated "name" values into an array, let alone without turning "subtitle" into an array as well. If that's not possible, "subtitle" could also become an array and I could fix it later in the code, but I need at least to group everything somehow. Any ideas?

You can use array_agg, e.g.
SELECT id_content, array_agg(value)
FROM fields
WHERE name = 'tags'
GROUP BY id_content
If you need the subtitle too, use a self-join. The subselect copes with contents that have a subtitle but no tags, without returning arrays filled with NULLs, i.e. {NULL}.
SELECT f1.id_content, f1.value, f2.value
FROM fields f1
LEFT JOIN (
    SELECT id_content, array_agg(value) AS value
    FROM fields
    WHERE name = 'tags'
    GROUP BY id_content
) f2 ON (f1.id_content = f2.id_content)
WHERE f1.name = 'subtitle';
See http://www.postgresql.org/docs/9.3/static/functions-aggregate.html for details.
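Those docs are for 9.3; on PostgreSQL 9.4 or later, the FILTER clause offers a way to avoid the self-join above entirely. A sketch against the tables from the question:

-- FILTER (PostgreSQL 9.4+): each aggregate only sees its own rows
SELECT c.id, c.title,
       min(f.value) FILTER (WHERE f.name = 'subtitle') AS subtitle,
       array_agg(f.value) FILTER (WHERE f.name = 'tags') AS tags
FROM content c
LEFT JOIN fields f ON f.id_content = c.id
GROUP BY c.id, c.title;

Because each aggregate only sees its filtered rows, contents without tags get NULL rather than {NULL}.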
If you have access to the tablefunc module, another option is to use crosstab as pointed out by Houari. You can make it return arrays and non-arrays with something like this:
SELECT id_content, unnest(subtitle), tags
FROM crosstab('
    SELECT id_content, name, array_agg(value)
    FROM fields
    GROUP BY id_content, name
    ORDER BY 1, 2
') AS ct(id_content integer, subtitle text[], tags text[]);
However, crosstab requires that the values always appear in the same order. For instance, if the first group (with the same id_content) has no subtitle and only tags, the tags will be unnested and will end up in the same column as the subtitles.
See also http://www.postgresql.org/docs/9.3/static/tablefunc.html
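If the groups are not guaranteed to be that regular, the two-argument form of crosstab pins each name to a fixed column and fills missing cells with NULL. A sketch (subtitle[1] is used instead of unnest so that rows without a subtitle are not dropped):

-- crosstab(source_sql, category_sql): the second query fixes the columns
SELECT id_content, subtitle[1] AS subtitle, tags
FROM crosstab(
    'SELECT id_content, name, array_agg(value)
     FROM fields
     GROUP BY id_content, name
     ORDER BY 1',
    $$VALUES ('subtitle'), ('tags')$$
) AS ct(id_content integer, subtitle text[], tags text[]);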

If the subtitle value is the only "constant" that you want to separate, you can do:
SELECT * FROM crosstab
(
    'SELECT content.id, name, array_to_string(array_agg(value), '','')::character varying
     FROM content inner join
     (
         select * from fields where fields.name = ''subtitle''
         union all
         select * from fields where fields.name <> ''subtitle''
     ) fields_ordered
     on fields_ordered.id_content = content.id
     group by content.id, name'
)
AS
(
    id integer,
    content_name character varying,
    tags character varying
);

Related

Postgres jsonb. Heterogeneous json fields

If I have a table with a single jsonb column and the table has data like this:
[{"body": {"project-id": "111"}},
{"body": {"my-org.project-id": "222"}},
{"body": {"other-org.project-id": "333"}}]
Basically it stores the project-id under a different key in different rows.
Now I need a query where the data->'body'->(varying key) values from the different rows coalesce into a single 'project-id' field. How can I do that?
e.g.: if I do something like this:
select data->'body'->'project-id' projectid from mytable
it will return something like:
| projectid |
| 111 |
But I also want the project-ids from the other rows, and I don't want additional columns in the result, i.e. I want this:
| projectid |
| 111 |
| 222 |
| 333 |
I understand that each of your rows contains a json object, with a nested object whose key varies over rows, and whose value you want to acquire.
Assuming the 'body' always has a single key, you could do:
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
The lateral join on jsonb_object_keys() extracts all keys in the object as rows. Then we use jsonb_extract_path_text() to get the corresponding value.
Demo on DB Fiddle:
with t as (
select '{"body": {"project-id": "111"}}'::jsonb js
union all select '{"body": {"my-org.project-id": "222"}}'::jsonb
union all select '{"body": {"other-org.project-id": "333"}}'::jsonb
)
select jsonb_extract_path_text(t.js -> 'body', x.k) projectid
from t
cross join lateral jsonb_object_keys(t.js -> 'body') as x(k)
| projectid |
| :--------- |
| 111 |
| 222 |
| 333 |
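If a 'body' object may contain more than one key, a variation (a sketch, assuming the wanted keys all end in project-id) filters on the key name instead of taking every key:

-- jsonb_each_text returns one (key, value) row per object member
select x.val as projectid
from t
cross join lateral jsonb_each_text(t.js -> 'body') as x(key, val)
where x.key like '%project-id';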

Multiple STRING_AGG on multiple join columns causes bloated aggregation

I've got a table in my MSSQL server, let's call it blogPost. I've also got two identically structured tag tables, let's call them fooTag and barTag, which are used to tag the blogPost table.
blogPost
| postId | title | body |
+--------+---------------------+-------------+
| 1 | The life on a query | lorem ipsum |
+--------+---------------------+-------------+
fooTag and barTag
| postId | tagName |
+--------+--------------+
| 1 | sql |
| 1 | query |
| 1 | select-query |
+--------+--------------+
I want to get a single blog post along with all its tags in a single row, so STRING_AGG() feels suitable for a query like this:
SELECT blogPost.*, STRING_AGG(fooTag.tagName, ';') as [fooTags], STRING_AGG(barTag.tagName, ';') as [barTags]
FROM blogPost
LEFT JOIN fooTag ON blogPost.postId = fooTag.postId
LEFT JOIN barTag ON blogPost.postId = barTag.postId
WHERE postId = 1
GROUP BY blogPost.postId, title, body
When making this query I'd expect to get the result
| postId | title | body | fooTags | barTags |
+--------+---------------------+-------------+-------------------------+-------------------------+
| 1 | The life on a query | lorem ipsum | sql;query;select-query | sql;query;select-query |
+--------+---------------------+-------------+-------------------------+-------------------------+
But I'm getting this result instead where bar tags (i.e. the last STRING_AGG selected) are duplicated.
+--------+---------------------+-------------+-------------------------+-----------------------------------------------------------------------+
| postId | title               | body        | fooTags                 | barTags                                                               |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------------------------------+
| 1      | The life on a query | lorem ipsum | sql;query;select-query; | sql;sql;sql;query;query;query;select-query;select-query;select-query |
+--------+---------------------+-------------+-------------------------+-----------------------------------------------------------------------+
Putting barTags last in the SELECT statement makes it so that barTags gets the duplicates instead of fooTags. The number of duplicates seems to be bound to the number of rows aggregated by the first STRING_AGG result column: if fooTags has 5 rows to aggregate, there will be 5 duplicates of each barTag in the barTags column of the result.
How would I get the result I want without duplicates?
Your problem is caused by each row from fooTag being joined to every matching row from barTag, hence the duplication. You can work around this by performing the STRING_AGG over fooTag and barTag before JOINing them:
SELECT blogPost.*, f.tags as [fooTags], b.tags as [barTags]
FROM blogPost
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
FROM fooTag
GROUP BY postId) f ON blogPost.postId = f.postId
LEFT JOIN (SELECT postId, STRING_AGG(tagName, ';') AS tags
FROM barTag
GROUP BY postId) b ON blogPost.postId = b.postId
WHERE postId = 1
You can simplify the query like so:
SELECT blogPost.*, ca1.*, ca2.*
FROM blogPost
OUTER APPLY (
SELECT STRING_AGG(tagName, ';')
FROM fooTag
WHERE blogPost.postId = fooTag.postId
) AS ca1(fooTags)
OUTER APPLY (
SELECT STRING_AGG(tagName, ';')
FROM barTag
WHERE blogPost.postId = barTag.postId
) AS ca2(barTags)
WHERE postId = 1
No GROUP BY is required, which in your case would be an expensive operation.
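One more caveat: STRING_AGG makes no promise about the order of the aggregated values. If tag order matters, SQL Server 2017+ lets you pin it with WITHIN GROUP inside each APPLY, e.g.:

-- deterministic tag order via WITHIN GROUP (SQL Server 2017+)
SELECT STRING_AGG(tagName, ';') WITHIN GROUP (ORDER BY tagName)
FROM fooTag
WHERE blogPost.postId = fooTag.postId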

PostgreSQL update field for each element in array

I wish to do the following task in SQL:
I have a table with columns:
uuid (uuid), word (text), wordList (text[]), uuidList (uuid[])
I have the wordList array, uuid and word columns populated. I wish to update and populate the uuidList like this:
foreach element in wordList
var x = select uuid where word = element;
uuidList.append(x);
Example:
I have a table like this:
+---------+-------+--------------------+----------+
| uuid | word | wordList | uuidList |
+---------+-------+--------------------+----------+
| aaaa... | hello | NULL | NULL |
| bbbb... | world | NULL | NULL |
| cccc... | blah | {'hello', 'world'} | NULL |
+---------+-------+--------------------+----------+
I want it to become like this:
+---------+-------+--------------------+--------------------+
| uuid | word | wordList | uuidList |
+---------+-------+--------------------+--------------------+
| aaaa... | hello | NULL | NULL |
| bbbb... | world | NULL | NULL |
| cccc... | blah | {'hello', 'world'} | {aaaa..., bbbb...} |
+---------+-------+--------------------+--------------------+
I'm quite new to SQL and have gotten confused about how to do it. I don't think I can join a table to itself. I don't know if I should store the information in a temporary table to somehow achieve this (some related questions I read proposed that)...
Thanks!
You can aggregate all the needed UUIDs in a single statement:
select w1.uid, array_agg(w2.uid order by wl.idx) as uuidlist
from words w1
cross join lateral unnest(w1.wordlist) with ordinality as wl(word,idx)
join words w2 on w2.word = wl.word
where w1.wordlist is not null
and w1.uuidlist is null -- optional
group by w1.uid;
The option with ordinality returns an additional column that indicates the position of the element in the original array. This is needed to aggregate the UUIDs in the correct order.
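A minimal standalone illustration of the clause:

select * from unnest(array['hello','world']) with ordinality as t(word, idx);

 word  | idx
-------+-----
 hello |   1
 world |   2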
This returns the following result with your sample data:
uid | uuidlist
-----+------------
cccc | {aaaa,bbbb}
This can be used as the source of an update statement (assuming the column uid is unique):
update words
set uuidlist = t.uuidlist
from (
select w1.uid, array_agg(w2.uid order by wl.idx) as uuidlist
from words w1
cross join lateral unnest(w1.wordlist) with ordinality as wl(word,idx)
join words w2 on w2.word = wl.word
where w1.wordlist is not null
and w1.uuidlist is null -- optional
group by w1.uid
) t
where t.uid = words.uid;
Online example: https://rextester.com/LZUYC57184
(note that the display of arrays is a bit weird in that example)

Select all rows that have at least a list of features with wildcard support

given a table definition:
Objects:
obj_id | obj_name
-------|--------------
1 | object1
2 | object2
3 | object3
Tags:
tag_id | tag_name
-------|--------------
1 | code:python
2 | code:cpp
3 | color:green
4 | colorful
5 | image
objects_tags:
obj_id | tag_id
-------|---------
1 | 1
1 | 2
2 | 1
2 | 3
3 | 1
3 | 2
3 | 3
I'd like to select the objects that match all tags from a given list, with wildcard support. A similar question has been asked several times, and the answer to the simpler variant looks more or less like this:
SELECT obj_id, count(*) c FROM objects_tags
INNER JOIN objects USING(obj_id)
INNER JOIN tags USING(tag_id)
WHERE (tag_name GLOB 'code*' OR tag_name GLOB 'color*')
GROUP BY obj_id
HAVING (c == 2)
However, this solution doesn't work with wildcards. Is it possible to create a similar query that returns the objects for which each given wildcard pattern matched at least one tag? Checking c >= 2 doesn't work, because one wildcard can match multiple tags while another matches none, so the query would still pass even though it shouldn't.
I considered building a dynamic query in the client software consisting of N INTERSECTs (one per tag), since there probably won't be many of them, but that sounds like a really dirty solution; if there's a more SQL way, I'd prefer to use it.
SQLite supports the WITH clause, so I would try to use it to determine all the tags first, and then use these tags to find the objects in the way below.
The example (demo) is made for PostgreSQL because I could not run SQLite on any online tester, but I believe you can convert it to SQLite easily.
This query retrieves all matching tags:
WITH tagss AS (
SELECT * FROM Tags
WHERE tag_name LIKE 'code:%' OR tag_name LIKE 'color:%'
)
SELECT * FROM tagss;
| tag_id | tag_name |
|--------|-------------|
| 1 | code:python |
| 2 | code:cpp |
| 3 | color:green |
and the final query uses the above subquery in this way:
WITH tagss AS (
SELECT * FROM Tags
WHERE tag_name LIKE 'code:%' OR tag_name LIKE 'color:%'
)
SELECT obj_id,count(*) c
FROM objects_tags
INNER JOIN tagss USING(tag_id)
WHERE tag_name IN ( SELECT tag_name FROM tagss)
GROUP BY obj_id
HAVING count(*) >= (
SELECT count(*) FROM tagss
)
| obj_id | c |
|--------|---|
| 3 | 3 |
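Note that this counts matching tag rows, so it requires an object to carry every matching tag (here only obj_id 3 qualifies). Under the other reading of the requirement, at least one tag per wildcard, a sketch that counts distinct wildcard groups instead (the group labels and the final = 2 are tied to this example's two patterns):

SELECT obj_id
FROM objects_tags
JOIN Tags USING (tag_id)
WHERE tag_name LIKE 'code:%' OR tag_name LIKE 'color:%'
GROUP BY obj_id
-- one CASE branch per wildcard group; an object passes only if
-- every group contributed at least one tag
HAVING COUNT(DISTINCT CASE
         WHEN tag_name LIKE 'code:%'  THEN 'code'
         WHEN tag_name LIKE 'color:%' THEN 'color'
       END) = 2;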

Counting SQLite rows that might match multiple times in a single query

I have a SQLite table which has a column containing categories that each row may fall into. Each row has a unique ID, but may fall into zero, one, or more categories, for example:
|-------+-------|
| name | cats |
|-------+-------|
| xyzzy | a b c |
| plugh | b |
| quux | |
| quuux | a c |
|-------+-------|
I'd like to obtain counts of how many items are in each category. In other words, output like this:
|------------+-------|
| categories | total |
|------------+-------|
| a | 2 |
| b | 2 |
| c | 2 |
| none | 1 |
|------------+-------|
I tried to use the case statement like this:
select case
when cats like "%a%" then 'a'
when cats like "%b%" then 'b'
when cats like "%c%" then 'c'
else 'none'
end as categories,
count(*)
from test
group by categories
But the problem is this only counts each row once, so it can't handle multiple categories. You then get this output instead:
|------------+-------|
| categories | total |
|------------+-------|
| a | 2 |
| b | 1 |
| none | 1 |
|------------+-------|
One possibility is to use as many union statements as you have categories:
select case
when cats like "%a%" then 'a'
end as categories, count(*)
from test
group by categories
union
select case
when cats like "%b%" then 'b'
end as categories, count(*)
from test
group by categories
union
...
but this seems really ugly and the opposite of DRY.
Is there a better way?
Fix your data structure! You should have a table with one row per name and per category:
create table nameCategories (
name varchar(255),
category varchar(255)
);
Then your query would be easy:
select category, count(*)
from namecategories
group by category;
Why is your data structure bad? Here are some reasons:
A column should contain a single value.
SQL has pretty lousy string functionality.
SQL queries to do what you want cannot be optimized.
SQL has a great data structure for storing lists. It is called a table, not a string.
With that in mind, here is one brute force method for doing what you want:
with categories as (
select 'a' as category union all
select 'b' union all
. . .
)
select c.category, count(t.category)
from categories c left join
test t
on ' ' || t.categories || ' ' like '% ' || c.category || ' %'
group by c.category;
If you already have a table of valid categories, then the CTE is not needed.
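And if you do fix the structure, here is a sketch of populating nameCategories from the existing test table (assumes categories are space-separated and SQLite 3.8.3+ for recursive CTE support):

-- peel one category per recursion step off the front of cats
WITH RECURSIVE split(name, category, rest) AS (
  SELECT name, '', cats || ' ' FROM test WHERE cats <> ''
  UNION ALL
  SELECT name,
         substr(rest, 1, instr(rest, ' ') - 1),
         substr(rest, instr(rest, ' ') + 1)
  FROM split
  WHERE rest <> ''
)
INSERT INTO nameCategories (name, category)
SELECT name, category FROM split WHERE category <> '';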