Duplicate JSON element in row_to_json() with self-join

Duplicate JSON element in row_to_json() with self-join - sql

This is a follow-up to this excellent Q&A: 13227142.
I almost have to do the same thing (with the constraint of PostgreSQL 9.2) but I'm using only one table. Therefore the query uses a self-join (in order to produce the correct JSON format) which results in a duplicate id field. How can I avoid this?
Example:
CREATE TABLE books
(
id serial primary key,
isbn text,
author text,
title text,
edition text,
teaser text
);
SELECT row_to_json(row)
FROM
(
SELECT id AS bookid,
author,
cover
FROM books
INNER JOIN
(
SELECT id, title, edition, teaser
FROM books
) cover(id, title, edition, teaser)
USING (id)
) row;
Result:
{
"bookid": 1,
"author": "Bjarne Stroustrup",
"cover": {
"id": 1,
"title": "Design and Evolution of C++",
"edition": "1st edition",
"teaser": "This book focuses on the principles, processes and decisions made during the development of the C++ programming language"
}
}
I want to get rid of "id" in "cover".

This turned out to be a tricky task. As far as I can see it's impossible to achieve with a simple query.
One solution is to use a predefined data type:
CREATE TYPE bookcovertype AS (title text, edition text, teaser text);
SELECT row_to_json(row)
FROM
(
SELECT books.id AS bookid, books.author,
row_to_json(row(books.title, books.edition, books.teaser)::bookcovertype) as cover
FROM books
) row;

you need id to join, so without id you can't make such short query. You need to struct it. Smth like:
select row_to_json(row,true)
FROM
(
with a as (select id,isbn,author,row_to_json((title,edition,teaser)) r from books
)
select a.id AS bookid,a.author, concat('{"title":',r->'f1',',"edition":',r->'f2',',"teaser":',r->'f3','}')::json as cover
from a
) row;
row_to_json
--------------------------------------------------------
{"bookid":1, +
"author":"\"b\"", +
"cover":{"title":"c","edition":"d","teaser":"\"b\""}}
(1 row)
Also without join you use twice as less resources

For the sake of completeness I've stumbled upon another answer myself: The additional fields can be eliminated by string functions. However, I prefer AlexM's anwer because it will be faster and is still compatible with PostgreSQL 9.2.
SELECT regexp_replace(
(
SELECT row_to_json(row)
FROM
(
SELECT id AS bookid,
author,
cover
FROM books
INNER JOIN
(
SELECT id, title, edition, teaser
FROM books
) cover(id, title, edition, teaser)
USING (id)
) row
)::text,
'"id":\d+,',
'')

Related

Using one SELECT instead of two to serve a side-loaded API request?

Let's say I create two tables using the following SQL,
such that post has many comment:
CREATE TABLE IF NOT EXISTS post (
id SERIAL PRIMARY KEY,
title VARCHAR NOT NULL,
text VARCHAR NOT NULL
)
CREATE TABLE IF NOT EXISTS comment (
id SERIAL PRIMARY KEY,
text VARCHAR NOT NULL,
post_id SERIAL REFERENCES post (id)
)
I would like to be able to query these tables so as to serve a response that
looks like this:
{
"post" : [
{ id: 100,
title: "foo",
text: "foo foo",
comment: [1000,1001,1002] },
{ id: 101,
title: "bar",
text: "bar bar",
comment: [1003] }
],
"comment": [
{ id: 1000,
text: "bla blah foo",
post: 100 },
{ id: 1001,
text: "bla foo foo",
post: 100 },
{ id: 1002,
text: "foo foo foo",
post: 100 },
{ id: 1003,
text: "bla blah bar",
post: 101 },
]
}
Doing this naively would involve to SELECT statements,
the first along the lines of
SELECT DISTINCT ON(post.id), post.title, post.text, comment.id
FROM post, comment
WHERE post.id = comment.post_id
... and the second something along the lines of
SELECT DISTINCT ON(comment.id), comment.text, post.id
FROM post, comment
WHERE post.id = comment.post_id
However, I cannot help but think that there is a way to do this involving
only one SELECT statement - is this possible?
Notes:
I am using Postgres, but I do not require a Postgres-specific solution. Any standard SQL solution should do.
The queries above are illustrative only, they do not give we exactly what is necessary at the moment.
It looks like what the naive solution here does is perform the same join on the same two tables, just doing a distinct on a different table each time. This definitely leaves room for improvement.
It appears that ActiveModel Serializers in Rails already do this - if someone familair with them would like to chime in how they work under the hood, that would be great.

You need two queries to get the form you laid out:
SELECT p.id, p.title, p.text, array_agg(c.id) AS comments
FROM post p
JOIN comment c ON c.post_id = p.id
WHERE p.id = ???
GROUP BY p.id;
Or faster, if you really want to retrieve all or most of your posts:
SELECT p.id, p.title, p.text, c.comments
FROM post p
JOIN (
SELECT post_id, array_agg(c.id) AS comments
FROM comment
GROUP BY 1
) c ON c.post_id = p.id
GROUP BY 1;
Plus:
SELECT id, text, post_id
FROM comment
WHERE post_id = ??;
Single query
SQL can only send one result type per query. For a single query, you would have to combine both tables, listing columns for post redundantly. That conflicts with the desired response in your question. You have to give up one of the two conflicting requirements.
SELECT p.id, p.title, p.text AS p_text, c.id, c.text AS c_text
FROM post p
JOIN comment c ON c.post_id = p.id
WHERE p.id = ???
Aside: The column comment.post_id should be integer, not serial! Also, column names are probably just for a quick show case. You wouldn't use the non-descriptive text as column name, which also conflicts with a basic data type.
Compare this related case:
Foreign key of serial type - ensure always populated manually

However, I cannot help but think that there is a way to do this involving only one SELECT statement - is this possible?
Technically: yes. If you really want your data in json anyway, you could use PostgreSQL (9.2+) to generate it with the json functions, like:
SELECT row_to_json(sq)
FROM (
SELECT array_to_json(ARRAY(
SELECT row_to_json(p)
FROM (
SELECT *, ARRAY(SELECT id FROM comment WHERE post_id = post.id) AS comment
FROM post
) AS p
)) AS post,
array_to_json(ARRAY(
SELECT row_to_json(comment)
FROM comment
)) AS comment
) sq;
But I'm not sure it's worth it -- usually not a good idea to dump all your data without limit / pagination.
SQLFiddle

What is the equivalent PostgreSQL syntax to Oracle's CONNECT BY ... START WITH?

In Oracle, if I have a table defined as …
CREATE TABLE taxonomy
(
key NUMBER(11) NOT NULL CONSTRAINT taxPkey PRIMARY KEY,
value VARCHAR2(255),
taxHier NUMBER(11)
);
ALTER TABLE
taxonomy
ADD CONSTRAINT
taxTaxFkey
FOREIGN KEY
(taxHier)
REFERENCES
tax(key);
With these values …
key value taxHier
0 zero null
1 one 0
2 two 0
3 three 0
4 four 1
5 five 2
6 six 2
This query syntax …
SELECT
value
FROM
taxonomy
CONNECT BY
PRIOR key = taxHier
START WITH
key = 0;
Will yield …
zero
one
four
two
five
six
three
How is this done in PostgreSQL?

Use a RECURSIVE CTE in Postgres:
WITH RECURSIVE cte AS (
SELECT key, value, 1 AS level
FROM taxonomy
WHERE key = 0
UNION ALL
SELECT t.key, t.value, c.level + 1
FROM cte c
JOIN taxonomy t ON t.taxHier = c.key
)
SELECT value
FROM cte
ORDER BY level;
Details and links to documentation in my previous answer:
Does PostgreSQL have a pseudo-column like "LEVEL" in Oracle?
Or you can install the additional module tablefunc which provides the function connectby() doing almost the same. See Stradas' answer for details.

Postgres does have an equivalent to the connect by. You will need to enable the module. Its turned off by default.
It is called tablefunc. It supports some cool crosstab functionality as well as the familiar "connect by" and "Start With". I have found it works much more eloquently and logically than the recursive CTE. If you can't get this turned on by your DBA, you should go for the way Erwin is doing it.
It is robust enough to do the "bill of materials" type query as well.
Tablefunc can be turned on by running this command:
CREATE EXTENSION tablefunc;
Here is the list of connection fields freshly lifted from the official documentation.
Parameter: Description
relname: Name of the source relation (table)
keyid_fld: Name of the key field
parent_keyid_fld: Name of the parent-key field
orderby_fld: Name of the field to order siblings by (optional)
start_with: Key value of the row to start at
max_depth: Maximum depth to descend to, or zero for unlimited depth
branch_delim: String to separate keys with in branch output (optional)
You really should take a look at the docs page. It is well written and it will give you the options you are used to. (On the doc page scroll down, its near the bottom.)
Postgreql "Connect by" extension
Below is the description of what putting that structure together should be like. There is a ton of potential so I won't do it justice, but here is a snip of it to give you an idea.
connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld ], text start_with, int max_depth
[, text branch_delim ])
A real query will look like this. Connectby_tree is the name of the table. The line that starting with "AS" is how you name the columns. It does look a little upside down.
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
AS t(keyid text, parent_keyid text, level int, branch text, pos int);

As indicated by Stradas I report the query:
SELECT value
FROM connectby('taxonomy', 'key', 'taxHier', '0', 0, '~')
AS t(keyid numeric, parent_keyid numeric, level int, branch text)
inner join taxonomy t on t.key = keyid;

For example, we have a table in PostgreSQL, its name is product_types. Our table columns are (id, parent_id, name, sort_order).
Our first selection should give (parent) a root line.
id = 76 will be our sql's top 1 parent record.
with recursive product_types as (
select
pt0.id,
pt0.parant_id,
pt0.name,
pt0.sort_order,
0 AS level
from product_types pt0
where pt0.id = 76
UNION ALL
select
pt1.id,
pt1.parant_id,
pt1.name,
pt1.sort_order, (product_types.level + 1) as level
from product_types pt1
inner join product_types on (pt1.parant_id = product_types.id )
)
select
*
from product_types
order by level, sort_order

Populate Temp Table Postgres

I have the following three tables in the postgres db of my django app:
publication {
id
title
}
tag {
id
title
}
publication_tags{
id
publication_id
tag_id
}
Where tag and publication have a many to many relationship.
I'd like to make a temp table with three columns: 1)publication title, 2)publication id, and 3)tags, where tags is a list (in the form of a string if possible) of all the tags on a given publication.
Thus far I have made the temp table and populated it with the publication id and publication title, but I don't know how to get the tags into it. This is what I have so far:
CREATE TEMP TABLE pubtags (pub_id INTEGER, pub_title VARCHAR(50), pub_tags VARCHAR(50))
INSERT INTO pubtags(pub_id, pub_title) SELECT id, title FROM apricot_app_publication
Can anyone advise me on how I would go about the last step?

Sounds like a job for string_agg:
string_agg(expression, delimiter)
input values concatenated into a string, separated by delimiter
So something like this should do the trick:
insert into pubtags (pub_id, pub_title, pub_tags)
select p.id, p.title, string_agg(t.title, ' ,')
from publication p
join publication_tags pt on (p.id = pt.publication_id)
join tag on (pt.tag_id = t.id)
group by p.id, p.title
You may want to adjust the delimiter, I guessed that a comma would make sense.
I'd recommend using TEXT instead of VARCHAR for your pub_tags so that you don't have to worry about the string aggregation overflowing the pub_tags length. Actually, I'd recommend using TEXT instead of VARCHAR period: PostgreSQL will treat them both the same except for wasting time on length checks with VARCHAR so VARCHAR is pointless unless you have a specific need for a limited length.
Also, if you don't specifically need pub_tags to be a string, you could use an array instead:
CREATE TEMP TABLE pubtags (
pub_id INTEGER,
pub_title TEXT,
pub_tags TEXT[]
)
and array_agg instead of string_agg:
insert into pubtags (pub_id, pub_title, pub_tags)
select p.id, p.title, array_agg(t.title)
-- as above...
Using an array will make it a lot easier to unpack the tags if you need to.

PostgreSQL query involving integer[]

I have 2 tables:
CREATE TABLE article (
id serial NOT NULL,
title text,
tags integer[] -- array of tag id's from TAG table
)
CREATE TABLE tag (
id serial NOT NULL,
description character varying(250) NOT NULL
)
... and need to select tags from TAG table held in ARTICLE's 'tags integer[]' based on article's title.
So tried something like
SELECT *
FROM tag
WHERE tag.id IN ( (select article.tags::int4
from article
where article.title = 'some title' ) );
... which gives me
ERROR: cannot cast type integer[] to integer
LINE 1: ...FROM tag WHERE tag.id IN ( (select article.tags::int4 from ...
I am Stuck with PostgreSql 8.3 in both dev and production environment.

Use the array overlaps operator &&:
SELECT *
FROM tag
WHERE ARRAY[id] && ANY (SELECT tags FROM article WHERE title = '...');
Using contrib/intarray you can even index this sort of thing quite well.

Take a look at section "8.14.5. Searching in Arrays", but consider the tip at the end of that section:
Tip: Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.

You did not mention your Postgres version, so I assume you are using an up-to-date version (8.4, 9.0)
This should work then:
SELECT *
FROM tag
WHERE tag.id IN ( select unnest(tags)
from article
where title = 'some title' );
But you should really consider changing your table design.
Edit
For 8.3 the unnest() function can easily be added, see this wiki page:
http://wiki.postgresql.org/wiki/Array_Unnest

CTE vs. IN clause performance

Alright SQL Server Gurus, fire up your analyzers.
I have a list of titles in application memory (250 or so).
I have a database table "books" with greater than a million records, one of the columns of the table is "title" and is of type nvarchar.
the "books" table has another column called "ISBN"
books.title is not a primary key, is not unique, but is indexed.
So I'd like to know which is more efficient:
WITH titles AS (select 'Catcher and the Rye' as Title
union all 'Harry Potter ...'
...
union all 'The World Is Flat')
select ISBN from books, titles where books.title = titles.title;
OR:
select ISBN from books where title in ('Catcher and the Rye','Harry Potter',...,'The World Is Flat');
OR:
???

I hope you have ISBN includes on the title index to avoid key lookups
CREATE INDEX IX_Titles ON dbo.Books (Title) INCLUDE (ISBN)
Now, the IN vs JOIN vs EXISTs is a common question here. The CTE is irrelevant except for readability. Personally, I'd use exists because you'll get duplicates with JOIN for books with the same title, which folk often forget.
;WITH titles AS (select 'Catcher and the Rye' as Title
union all 'Harry Potter ...'
...
union all 'The World Is Flat')
SELECT
ISBN
FROM
books
WHERE
EXISTS (SELECT * --or null or = all the same
FROM
titles
WHERE
titles .Title = books.Title)
However, one construct I'd consider is this to force "intermediate materialisation" on my list of search titles. The also applies to an exists or CTE solution too. This is likely to help the optimiser considerably.
Edit: a temp table is a better option, really, as Steve mentioned in his comment
SELECT
ISBN
FROM
(
SELECT TOP 2000000000
Title
FROM
(select 'Catcher and the Rye' as Title
union all 'Harry Potter ...'
...
union all 'The World Is Flat'
) foo
ORDER BY
Title
) bar
JOIN
books On bar.Title = books.Title
SELECT
ISBN
FROM
books
WHERE
EXISTS (SELECT * --or null or = all the same
FROM
(
SELECT TOP 2000000000
Title
FROM
(select 'Catcher and the Rye' as Title
union all 'Harry Potter ...'
...
union all 'The World Is Flat'
) foo
ORDER BY
Title
) bar
WHERE
bar.Title = books.Title)

Given the choice of the two options, avoid IN clauses, as the number of items within the list goes up the query plan will alter and very quickly convert from a potential Seek to a Scan.
The normal tipping point (and I double checked on the adventure works) is that on the 65th item, it changes plan to a scan from a seek.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas