Postgresql: Assembling subsets of columns and subquery results into a JSON document - sql

(I'm playing around with using PostgreSQL 9.3 to do some of the lifting required to assemble a JSON data structure.)
Given the following schema:
person
id integer,
name text,
age integer
job
id references person,
title text
is it possible to use Postgresql's JSON functions to return something like
| id | personalia | jobs |
|----|----------------------------|----------------------------------------------|
| 1 | {"name": "kim", "age": 55} | [{"title": "Plumber"}, {"title": "manager"}] |
i.e. to select a subset of columns and even do a sub query/join to produce an array based on data from another table that matches some criteria (here: person.id = job.id).
Reading through the Postgresql JSON documentation, I see the building blocks are there, but I don't see how to do more advanced stuff like the above scenario – possibly because I lack the SQL know-how.

If using Postgres >= 9.4 this can be done using json_build_object and json_agg:
SELECT
  p.id,
  json_build_object(
    'name', p.name,
    'age', p.age
  ) AS personalia,
  json_agg(
    json_build_object(
      'title', j.title
    )
  ) AS jobs
FROM person p
LEFT JOIN job j USING (id)
GROUP BY p.id;
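One caveat with the query above: because of the LEFT JOIN, a person with no matching job gets a jobs array of [null]. Since 9.4 an aggregate FILTER clause can suppress those rows; a sketch against the same schema:

```sql
SELECT
  p.id,
  json_build_object('name', p.name, 'age', p.age) AS personalia,
  json_agg(json_build_object('title', j.title))
    FILTER (WHERE j.id IS NOT NULL) AS jobs   -- skip the null row from LEFT JOIN
FROM person p
LEFT JOIN job j USING (id)
GROUP BY p.id;
```

With no matching job, jobs then comes back as SQL NULL rather than [null]; wrap it in COALESCE(..., '[]'::json) if an empty array is preferred.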

On 9.3, where json_build_object is not yet available, the same shape can be assembled with format() and to_json():
select
  id,
  format('{"name": %s, "age": %s}', to_json(name), to_json(age))::json as personalia,
  array_to_json(array_agg(title)) as jobs
from
  person p
left join
(
  select id, format('{"title": %s}', to_json(title))::json as title
  from job
) j using (id)
group by id, name, age

Related

Create a JSON object from parent -> child relationship without duplication

I want to query a database, get ALL of the user's data, and send it to my front end in a JSON object (with many layers of nesting).
e.g.
{
user_id: 1,
username: james,
messages: [
{
message_id: 'fewfef',
message: 'lorum ipsum'
... : {
...
}
}
]
}
Sample schema/data:
--user table (parent)
CREATE TABLE userdata (
user_id integer,
username text
);
INSERT INTO userdata VALUES (1, 'james');
-- messages table (child) connected to user table
CREATE TABLE messages(
message_id integer,
fk_messages_userdata integer,
message text
);
INSERT INTO messages VALUES (1, 1, 'hello');
INSERT INTO messages VALUES (2, 1, 'lorum ipsum');
INSERT INTO messages VALUES (3, 1, 'test123');
-- querying all data at once
SELECT u.username, m.message_id, m.message FROM userdata u
INNER JOIN messages m
ON u.user_id = m.fk_messages_userdata
WHERE u.user_id = '1';
This outputs the data as so:
username|message_id|message    |
--------+----------+-----------+
james   |         1|hello      |
james   |         2|lorum ipsum|
james   |         3|test123    |
The issue is that the username is repeated for each message. For larger databases and more layers of nesting, this would mean a lot of useless data being queried and sent.
Is it better to do one query to get all of this data and send it to the backend, or make a separate query for each table, and only get the data I want?
For example I could run these queries:
-- getting only user metadata
SELECT username from userdata WHERE user_id = '1';
-- output
username|
--------+
james   |
-- getting only user's messages
SELECT m.message_id, m.message FROM userdata u
INNER JOIN messages m
ON u.user_id = m.fk_messages_userdata
WHERE u.user_id = '1';
--output
message_id|message    |
----------+-----------+
         1|hello      |
         2|lorum ipsum|
         3|test123    |
This way I get only the data I need, and it's a little easier to work with, as it arrives at the backend more organized. But is there a disadvantage to running separate queries instead of one big one? Are there any other ways to do this?
Is it better to do one query to get all of this data and send it to the backend, or make a separate query for each table, and only get the data I want?
It's best to run only one query and get only the data you want. As long as it doesn't get too complicated - which it doesn't IMO:
SELECT to_json(usr)
FROM (
SELECT u.user_id, u.username
, (SELECT json_agg(msg) -- aggregation in correlated subquery
FROM (
SELECT m.message_id, m.message
FROM messages m
WHERE m.fk_messages_userdata = u.user_id
) msg
) AS messages
FROM userdata u
WHERE u.user_id = 1 -- provide user_id here once!
) usr;
fiddle
There are many other ways.
A (LEFT) JOIN LATERAL instead of the correlated subquery. See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
json_build_object() instead of converting whole rows from subselects. See:
Return multiple columns of the same row as JSON array of objects
LEFT JOIN query with JSON object array aggregate
But this version above should be shortest and fastest.
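For completeness, here is a sketch of the LEFT JOIN LATERAL variant mentioned above, using the same sample schema (an alternative formulation, not the answer's recommended query):

```sql
SELECT to_json(usr)
FROM (
   SELECT u.user_id, u.username, msg.messages
   FROM   userdata u
   LEFT   JOIN LATERAL (               -- lateral subquery can reference u
      SELECT json_agg(json_build_object(
                'message_id', m.message_id,
                'message',    m.message)) AS messages
      FROM   messages m
      WHERE  m.fk_messages_userdata = u.user_id
      ) msg ON true                    -- aggregate always returns one row
   WHERE  u.user_id = 1
   ) usr;
```

The LATERAL form makes the per-user aggregation explicit in the FROM list instead of hiding it in the SELECT list, which some find easier to extend with more nested children.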
Related:
What are the pros and cons of performing calculations in sql vs. in your application

How can I use the LIKE operator on a map type in hiveql?

I want to select two columns out of students:
id_test int
number map<string,string>
I tried followed command with the LIKE Operator:
SELECT id_test, number FROM students WHERE id_test = 123456 AND number LIKE '%MOBILE%';
And get this error:
FAILED: SemanticException [Error 10014]: Line 1:82 Wrong arguments ''%MOBILE%'': No
matching method for class org.apache.hadoop.hive.ql.udf.UDFLike with
(map<string,string>, string). Possible choices: _FUNC_(string, string)
Code for reproduction:
CREATE TABLE students(id_test INT, number MAP<STRING, STRING>) ROW FORMAT DELIMITED FIELDS TERMINATED by
'|' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':';
INSERT INTO TABLE students SELECT 123434, map('MOBILE','918-555-1162') FROM existingtable LIMIT 1;
INSERT INTO TABLE students SELECT 245678, map('WORK','806-555-4722') FROM existingtable LIMIT 1;
Explode map, then you can filter keys using LIKE. If you want to get single row per id_test, number even if there are many keys satisfying LIKE condition, use GROUP BY or DISTINCT.
Demo:
with students as (--one record has many MOBILE* keys in the map
SELECT 123434 id_test , map('MOBILE','918-555-1162', 'OFFICE', '123456', 'MOBILE2', '5678') number union all
SELECT 245678, map('WORK','806-555-4722')
)
select s.id_test, s.number
from students s
lateral view explode(number) n as key,value
where n.key like '%MOBILE%'
group by s.id_test, s.number
Result:
123434 {"MOBILE":"918-555-1162","MOBILE2":"5678","OFFICE":"123456"}
If you know the key exactly 'MOBILE' then better to filter like this: where number['MOBILE'] is not null, without explode.
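A minimal sketch of that direct key lookup, against the same students table:

```sql
-- no explode needed when the exact key is known
SELECT id_test, number
FROM students
WHERE number['MOBILE'] IS NOT NULL;
```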
select *
from students
where concat_ws(' ',map_keys(number)) like '%MOBILE%'
The following is posted here just so you can see what the expression used with LIKE looks like:
select concat_ws(' ',map_keys(number))
from students
+-----------------------+
| _c0 |
+-----------------------+
| MOBILE MOBILE2 OFFICE |
| WORK |
+-----------------------+

How to use other table JSON data in select query

I am using Postgres and have 2 tables. deviceTble has the following columns: deviceName, device_id, type, deviceOwnerPerson_id, deviceAccessPerson_id.
The other table is Person_kv and has 2 columns, id and data (containing person info, but in JSON format).
I want to write a select query on deviceTble that also returns the first_name and last_name of a person from the Person_kv table, looked up by deviceOwnerPerson_id and deviceAccessPerson_id.
Here is what I have to get data from person_kv table to get data in tabular form:
select data :: json ->> 'id' as id
, data :: json ->> 'name' as first_name
, data :: json ->> 'surename' as last_name
from Person_kv
and expected deviceTble query:
select deviceName,device_id,type from deviceTble
I am confused whether I should use a WITH clause on the person_kv query and then join it (once for deviceOwnerPerson_id and once for deviceAccessPerson_id), OR whether there is another way using an inner query.
Can someone tell me how I can get the required result?
From your description you can just join them:
select d.deviceName, d.device_id, d.type, p.data::json ->> 'name', p.data::json ->> 'surename'
from deviceTble d
join Person_kv p on p.data::json ->> 'id' = d.deviceOwnerPerson_id::text OR p.data::json ->> 'id' = d.deviceAccessPerson_id::text
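If the owner's and the accessor's names are both needed in the same row, here is a sketch of the WITH-clause approach the question asks about, joining the extracted person data twice (column and key names assumed from the question):

```sql
WITH person AS (
   SELECT data::json ->> 'id'       AS id,
          data::json ->> 'name'     AS first_name,
          data::json ->> 'surename' AS last_name
   FROM   Person_kv
)
SELECT d.deviceName, d.device_id, d.type,
       own.first_name AS owner_first_name,
       own.last_name  AS owner_last_name,
       acc.first_name AS access_first_name,
       acc.last_name  AS access_last_name
FROM   deviceTble d
LEFT   JOIN person own ON own.id = d.deviceOwnerPerson_id::text
LEFT   JOIN person acc ON acc.id = d.deviceAccessPerson_id::text;
```

The LEFT JOINs keep devices whose owner or accessor is missing from Person_kv; the single-join OR version collapses both roles into one row per match instead.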

Selecting distinct rows based on values from left table

Using Postgres. Here's my scenario:
I have three different tables. One is a title table. The second is a genre table. The third table is used to join the two. When I designed the database, I expected that each title would have one top level genre. After filling it with data, I discovered that there were titles that had two, sometimes, three top level genres.
I wrote a query that retrieves titles and their top level genres. This obviously requires that I join the two tables. For those that only have one top level genre, there is one record. For those that have more, there are multiple records.
I realize I'll probably have to write a custom function of some kind that will handle this for me, but I thought I'd ask if it's possible to do this without doing so just to make sure I'm not missing anything.
Is it possible to write a query that will allow me to select all of the distinct titles regardless of the number of genres that it has, but also include the genre? Or even better, a query that would give me a comma delimited string of genres when there are multiples?
Thanks in advance!
Sounds like a job for array_agg to me. With tables like this:
create table t (id int not null, title varchar not null);
create table g (id int not null, name varchar not null);
create table tg (t int not null, g int not null);
You could do something like this:
SELECT t.title, array_agg(g.name)
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
to get:
 title |       array_agg
-------+-----------------------
 one   | {g-one,g-two,g-three}
 three | {g-three}
 two   | {g-two}
Then just unpack the arrays as needed. If for some reason you really want a comma delimited string instead of an array, then string_agg is your friend:
SELECT t.title, string_agg(g.name, ',')
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
and you'll get something like this:
 title |     string_agg
-------+---------------------
 one   | g-one,g-two,g-three
 three | g-three
 two   | g-two
I'd go with the array approach so that you wouldn't have to worry about reserving a character for the delimiter or having to escape (and then unescape) the delimiter while aggregating.
Have a look at this thread which might answer your question.

What's the simplest way to sub-query a variable number of rows into fields of the parent query?

What's the simplest way to sub-query a variable number of rows into fields of the parent query?
PeopleTBL
NameID int - unique
Name varchar
Data: 1,joe
2,frank
3,sam
HobbyTBL
HobbyID int - unique
HobbyName varchar
Data: 1,skiing
2,swimming
HobbiesTBL
NameID int
HobbyID int
Data: 1,1
2,1
2,2
The app defines 0-2 Hobbies per NameID.
What's the simplest way to query the hobbies into fields retrieved with "Select * from PeopleTBL"?
Result desired based on above data:
NameID  Name   Hobby1  Hobby2
1       joe    skiing
2       frank  skiing  swimming
3       sam
I'm not sure if I understand correctly, but if you want to fetch all the hobbies for a person in one row, the following query might be useful (MySQL):
SELECT NameID, Name, GROUP_CONCAT(HobbyName) AS Hobbies
FROM PeopleTBL
JOIN HobbiesTBL USING (NameID)
JOIN HobbyTBL USING (HobbyID)
GROUP BY NameID, Name
The Hobbies column will contain all hobbies of a person separated by ,.
See the documentation for GROUP_CONCAT for details.
I don't know what engine you are using, so I've provided an example for MySQL (I don't know which other SQL engines support this).
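Since the earlier questions in this thread use Postgres, it's worth noting the Postgres equivalent of GROUP_CONCAT is string_agg; a sketch against the same tables (LEFT JOINs so people without hobbies, like sam, still appear):

```sql
SELECT p.NameID, p.Name, string_agg(h.HobbyName, ',') AS Hobbies
FROM   PeopleTBL p
LEFT   JOIN HobbiesTBL hb ON hb.NameID = p.NameID
LEFT   JOIN HobbyTBL   h  ON h.HobbyID = hb.HobbyID
GROUP  BY p.NameID, p.Name;
```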
Select P.NameId, P.Name
, Min( Case When H2.HobbyId = 1 Then H.HobbyName End ) As Hobby1
, Min( Case When H2.HobbyId = 2 Then H.HobbyName End ) As Hobby2
From HobbyTbl As H
Join HobbiesTbl As H2
On H2.HobbyId = H.HobbyId
Join PeopleTbl As P
On P.NameId = H2.NameId
Group By P.NameId, P.Name
What you are seeking is called a crosstab query. As long as the columns are static, you can use the above solution. However, if you want to build the columns dynamically, you need to build the SQL statement in middle-tier code or use a reporting tool.