Create a JSON object from parent -> child relationship without duplication - sql

I want to query a database, get ALL of the user's data, and send it to my front end in a JSON object (with many layers of nesting).
e.g.
{
  user_id: 1,
  username: 'james',
  messages: [
    {
      message_id: 'fewfef',
      message: 'lorum ipsum',
      ...: {
        ...
      }
    }
  ]
}
Sample schema/data:
-- user table (parent)
CREATE TABLE userdata (
  user_id integer,
  username text
);
INSERT INTO userdata VALUES (1, 'james');

-- messages table (child) connected to user table
CREATE TABLE messages (
  message_id integer,
  fk_messages_userdata integer,
  message text
);
INSERT INTO messages VALUES (1, 1, 'hello');
INSERT INTO messages VALUES (2, 1, 'lorum ipsum');
INSERT INTO messages VALUES (3, 1, 'test123');

-- querying all data at once
SELECT u.username, m.message_id, m.message
FROM userdata u
INNER JOIN messages m ON u.user_id = m.fk_messages_userdata
WHERE u.user_id = '1';
This outputs the data as so:
username|message_id|message    |
--------+----------+-----------+
james   |         1|hello      |
james   |         2|lorum ipsum|
james   |         3|test123    |
The issue is that the username is repeated for each message. For larger databases and more layers of nesting, this would mean a lot of redundant data being queried and sent.
Is it better to do one query to get all of this data and send it to the backend, or to make a separate query for each table and only get the data I want?
For example I could run these queries:
-- getting only user metadata
SELECT username FROM userdata WHERE user_id = '1';
-- output
username|
--------+
james   |
-- getting only the user's messages
SELECT m.message_id, m.message
FROM userdata u
INNER JOIN messages m ON u.user_id = m.fk_messages_userdata
WHERE u.user_id = '1';
-- output
message_id|message    |
----------+-----------+
         1|hello      |
         2|lorum ipsum|
         3|test123    |
This way I get only the data I need, and it's a little easier to work with, as it arrives at the backend more organized. But is there a disadvantage to running separate queries instead of one big one? Are there any other ways to do this?

Is it better to do one query to get all of this data and send it to the backend, or to make a separate query for each table and only get the data I want?
It's best to run only one query and get only the data you want. As long as it doesn't get too complicated - which it doesn't IMO:
SELECT to_json(usr)
FROM  (
   SELECT u.user_id, u.username
        , (SELECT json_agg(msg)  -- aggregation in correlated subquery
           FROM  (
              SELECT m.message_id, m.message
              FROM   messages m
              WHERE  m.fk_messages_userdata = u.user_id
              ) msg
          ) AS messages
   FROM   userdata u
   WHERE  u.user_id = 1  -- provide user_id here once!
   ) usr;
There are many other ways.
A (LEFT) JOIN LATERAL instead of the correlated subquery. See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
json_build_object() instead of converting whole rows from subselects. See:
Return multiple columns of the same row as JSON array of objects
LEFT JOIN query with JSON object array aggregate
But this version above should be shortest and fastest.
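The same single-query shape can be checked outside Postgres. Below is a sketch (not the answer's exact code) using SQLite's JSON functions as stand-ins for the Postgres ones: json_object for to_json and json_group_array for json_agg, run through Python's sqlite3 module. It assumes an SQLite build with the JSON functions compiled in, which is standard in recent versions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userdata (user_id INTEGER, username TEXT);
    INSERT INTO userdata VALUES (1, 'james');
    CREATE TABLE messages (message_id INTEGER, fk_messages_userdata INTEGER, message TEXT);
    INSERT INTO messages VALUES (1, 1, 'hello'), (2, 1, 'lorum ipsum'), (3, 1, 'test123');
""")

# One row in, one nested JSON document out -- the correlated subquery
# aggregates the child rows, like json_agg in the Postgres version.
# json(...) marks the subquery result as JSON so it is embedded as an
# array rather than an escaped string.
row = conn.execute("""
    SELECT json_object(
        'user_id',  u.user_id,
        'username', u.username,
        'messages', json((SELECT json_group_array(
                              json_object('message_id', m.message_id,
                                          'message',    m.message))
                          FROM messages m
                          WHERE m.fk_messages_userdata = u.user_id))
    )
    FROM userdata u
    WHERE u.user_id = 1
""").fetchone()

doc = json.loads(row[0])
print(doc["username"], len(doc["messages"]))  # prints: james 3
```

The username is sent once, and each message appears once, so none of the duplication from the flat join survives in the payload.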
Related:
What are the pros and cons of performing calculations in sql vs. in your application

Query table using relid instead of the table name

I'm using node-postgres to return a joined table to the front end of my React app through an express server. Here is the query...
SELECT channels.name AS channel, programmes.title AS title, series.number AS series, episodes.episode_number AS episode
FROM programmes
INNER JOIN programme_channels ON programme_channels.programme_id = programmes.programme_id
INNER JOIN channels ON programme_channels.channel_id = channels.channel_id
INNER JOIN series ON programmes.programme_id = series.programme_id
INNER JOIN episodes ON series.series_id = episodes.series_id
This works as needed, however I'd like for front-end users to be able to update or delete columns of the table. To do this, each cell of my table would need to know the origin of its data. Currently the query I have returns a table like this...
channel      | title      | series | episode
-------------+------------+--------+--------
Some Channel | Some title |      1 |       1
Here channel comes from the channels table, while title, series, and episode each come from other tables. For a user to update or delete this data, the origin of each column is needed to build the query.
The node-postgres query returns some more information which may be helpful for this in the form of a fields array...
fields: [
  Field {
    name: 'title',
    tableID: 16554,
    columnID: 2,
    dataTypeID: 1043,
    dataTypeSize: -1,
    dataTypeModifier: 104,
    format: 'text'
  },
  ...
]
and I can look up the original table name for a column using this query...
SELECT relname
FROM pg_catalog.pg_statio_user_tables
WHERE relid = '16554'
result...
relname
----------
programmes
however I'm not sure how to use this result to query the 'programmes' table. This is where I've hit a wall. My questions are...
Am I going about this the right way, or is there an easier way to update data returned from a joined table?
If so, is there any way I can SELECT from a table by either its relid or the result of a query?
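One way through the wall: identifiers (table and column names) cannot be bound as query parameters, so after resolving the name from the catalog, validate it against the catalog again before interpolating it into the statement. The sketch below illustrates the pattern in Python with SQLite purely for demonstration (the update_cell helper and its schema are made up; in node-postgres you would validate against pg_class / information_schema instead of sqlite_master):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE programmes (programme_id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO programmes VALUES (1, 'Some title');
""")

def update_cell(conn, table, column, pk_col, pk_value, new_value):
    """Update one cell of a table whose name was resolved at runtime.

    Identifiers can't be parameterized, so they are checked against the
    catalog first -- this is what guards against SQL injection here.
    """
    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table}")
    columns = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')}
    if column not in columns or pk_col not in columns:
        raise ValueError("unknown column")
    # Only the value and key are bound; the identifiers were whitelisted above.
    conn.execute(f'UPDATE "{table}" SET "{column}" = ? WHERE "{pk_col}" = ?',
                 (new_value, pk_value))

update_cell(conn, "programmes", "title", "programme_id", 1, "New title")
print(conn.execute("SELECT title FROM programmes").fetchone()[0])  # prints: New title
```

For this to work, the front end would need to ship back the tableID/columnID pair for each cell along with the new value, which is exactly what the node-postgres fields array provides.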

How To Populate A Gridview Dynamically Using Multiple Queries

I have two tables, UserInfo (PrimaryKey= UID) and Relationships (PrimaryKey= RID, ForeignKey= UID).
Relationships data example:
| UID | RID |
| 2 | 3 |
| 3 | 4 |
Now I execute an SQL query that performs some sorting on the Relationships table, like:
SELECT RID FROM Relationships WHERE UID = value
UNION
SELECT UID FROM Relationships WHERE RID = value
If value = 3, I get the integer values 2 and 4 (which means 3 has relationships with 2 and 4).
If value = 4, the result is 3 (4 only has a relationship with 3), etc.
These resulting values are primary key values of the UserInfo table.
What I want to do is fetch the information of the users having UID = 23,25,34 (these values are the result of the first SQL query, so the number of values and the values themselves change every time according to the value passed to the first query)
and bind it to an ASP.NET GridView.
For this, I think I'll have to execute multiple SQL queries like
SELECT * FROM UserInfo WHERE UID = 2
SELECT * FROM UserInfo WHERE UID = 4
etc., using a loop.
What I think I should do is loop through the results of the first SQL query, execute the second query for each result, and save the resulting recordsets in another table or some other data source, which is finally bound to the GridView... but I don't know how to implement it.
This is my first question on Stack Overflow. I'll try my best to further clarify the problem if necessary.
Any help would be greatly appreciated! :)
You can use a SQL join:
SELECT a.RID
FROM Friendships a
INNER JOIN Relationships b ON a.UID = b.UID
WHERE a.UID = value AND b.RID = value
ORDER BY a.UID
This will give you all records which exist in both tables a and b. More on SQL Joins
There is no need to loop, as you can execute the query in a single SQL statement like
SELECT * FROM UserInfo WHERE UID IN (23, 25, 34);
On a side note, SELECT * FROM table is usually bad practice, and you might want to replace * with your column names.
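Since the list of ids from the first query changes length every time, the IN clause can be built with one placeholder per id so the values stay parameterized. A small sketch in Python with SQLite (the Name column is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserInfo (UID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO UserInfo VALUES (2, 'Ann'), (3, 'Bob'), (4, 'Cid');
""")

uids = [2, 4]  # pretend these came from the first (relationships) query
placeholders = ", ".join("?" * len(uids))  # -> "?, ?"
# One round trip instead of a loop of single-row queries.
rows = conn.execute(
    f"SELECT UID, Name FROM UserInfo WHERE UID IN ({placeholders})",
    uids,
).fetchall()
print(rows)
```

Only the placeholder string is interpolated; the ids themselves are bound as parameters, so the query stays safe regardless of where the first query's results came from.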

PostgreSQL where all in array

What is the easiest and fastest way to achieve a WHERE clause in which all elements in an array must be matched, not only one as with IN? It should behave like MongoDB's $all.
Thinking about group conversations, where conversations_users is a join table between conversation_id and user_id, I have something like this in mind:
WHERE (conversations_users.user_id ALL IN (1,2))
UPDATE 16.07.12
Adding more info about schema and case:
The join-table is rather simple:
Table "public.conversations_users"
     Column      |  Type   | Modifiers | Storage | Description
-----------------+---------+-----------+---------+-------------
 conversation_id | integer |           | plain   |
 user_id         | integer |           | plain   |
A conversation has many users and a user belongs to many conversations. In order to find all users in a conversation I am using this join table.
In the end I am trying to figure out a Ruby on Rails scope that finds me a conversation depending on its participants - e.g.:
scope :between, ->(*users) {
  joins(:users).where('conversations_users.user_id all in (?)', users.map(&:id))
}
UPDATE 23.07.12
My question is about finding an exact match of people. Therefore:
A conversation between (1,2,3) won't match when querying for (1,2).
Assuming the join table follows good practice and has a unique compound key defined, i.e. a constraint to prevent duplicate rows, then something like the following simple query should do.
select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2
It's important to note that the number 2 at the end is the length of the list of user_ids. That obviously needs to change if the user_id list changes length. If you can't assume your join table doesn't contain duplicates, change "count(*)" to "count(distinct user_id)" at some possible cost in performance.
This query finds all conversations that include all the specified users even if the conversation also includes additional users.
If you want only conversations with exactly the specified set of users, one approach is to use a nested subquery in the where clause as below. Note, first and last lines are the same as the original query, only the middle two lines are new.
select conversation_id from conversations_users where user_id in (1, 2)
and conversation_id not in
(select conversation_id from conversations_users where user_id not in (1,2))
group by conversation_id having count(*) = 2
Equivalently, you can use a set difference operator if your database supports it. Here is an example in Oracle syntax. (For Postgres or DB2, change the keyword "minus" to "except".)
select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2
minus
select conversation_id from conversations_users where user_id not in (1,2)
A good query optimizer should treat the last two variations identically, but check with your particular database to be sure. For example, the Oracle 11GR2 query plan sorts the two sets of conversation ids before applying the minus operator, but skips the sort step for the last query. So either query plan could be faster depending on multiple factors such as the number of rows, cores, cache, indices etc.
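Both variants are easy to verify on a toy dataset. The sketch below runs the HAVING count(*) query and the set-difference form side by side (EXCEPT rather than MINUS, since this runs on SQLite via Python); conversation 4 is added here, beyond the thread's sample data, to show the difference between superset and exact matching:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversations_users (conversation_id INTEGER, user_id INTEGER);
    INSERT INTO conversations_users VALUES
        (1, 1), (1, 2),          -- exactly users 1 and 2
        (2, 1), (2, 3),
        (3, 1), (3, 2),          -- exactly users 1 and 2
        (4, 1), (4, 2), (4, 3);  -- users 1 and 2 plus an extra member
""")

# Conversations that include users 1 and 2 (possibly among others).
superset = [r[0] for r in conn.execute("""
    SELECT conversation_id FROM conversations_users
    WHERE user_id IN (1, 2)
    GROUP BY conversation_id HAVING count(*) = 2
""")]

# Conversations whose member set is exactly {1, 2}: subtract any
# conversation that also has a user outside the list.
exact = [r[0] for r in conn.execute("""
    SELECT conversation_id FROM conversations_users
    WHERE user_id IN (1, 2)
    GROUP BY conversation_id HAVING count(*) = 2
    EXCEPT
    SELECT conversation_id FROM conversations_users
    WHERE user_id NOT IN (1, 2)
""")]

print(sorted(superset), sorted(exact))  # prints: [1, 3, 4] [1, 3]
```

Conversation 4 passes the superset test but is removed by the set difference, exactly the distinction the answer describes.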
I'm collapsing those users into an array. I'm also using a CTE (the thing in the WITH clause) to make this more readable.
=> select * from conversations_users ;
conversation_id | user_id
-----------------+---------
1 | 1
1 | 2
2 | 1
2 | 3
3 | 1
3 | 2
(6 rows)
=> WITH users_on_conversation AS (
       SELECT conversation_id, array_agg(user_id) AS users
       FROM   conversations_users
       WHERE  user_id IN (1, 2)  -- filter here for performance
       GROUP  BY conversation_id
   )
   SELECT * FROM users_on_conversation
   WHERE  users @> array[1, 2];
conversation_id | users
-----------------+-------
1 | {1,2}
3 | {1,2}
(2 rows)
EDIT (Some resources)
array functions: http://www.postgresql.org/docs/9.1/static/functions-array.html
CTEs: http://www.postgresql.org/docs/9.1/static/queries-with.html
This preserves ActiveRecord objects.
In the below example, I want to know the time sheets which are associated with all codes in the array.
codes = [8,9]
Timesheet.joins(:codes).select('count(*) as count, timesheets.*').
where('codes.id': codes).
group('timesheets.id').
having('count(*) = ?', codes.length)
You should have the full ActiveRecord objects to work with. If you want it to be a true scope, you can just use your above example and pass in the results with .pluck(:id).
While @Alex's answer with IN and count() is probably the simplest solution, I expect this PL/pgSQL function to be faster:
CREATE OR REPLACE FUNCTION f_conversations_among_users(_user_arr int[])
  RETURNS SETOF conversations AS
$BODY$
DECLARE
   _sql text := '
      SELECT c.*
      FROM   conversations c';
   i int;
BEGIN
   FOREACH i IN ARRAY _user_arr LOOP
      _sql := _sql || '
      JOIN   conversations_users x' || i || ' USING (conversation_id)';
   END LOOP;

   _sql := _sql || '
      WHERE  TRUE';

   FOREACH i IN ARRAY _user_arr LOOP
      _sql := _sql || '
      AND    x' || i || '.user_id = ' || i;
   END LOOP;

   /* uncomment for conversations with the exact list of users and no more
   _sql := _sql || '
      AND NOT EXISTS (
         SELECT 1
         FROM   conversations_users u
         WHERE  u.conversation_id = c.conversation_id
         AND    u.user_id <> ALL (_user_arr)
      )';
   */

   -- RAISE NOTICE '%', _sql;
   RETURN QUERY EXECUTE _sql;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
Call:
SELECT * FROM f_conversations_among_users('{1,2}')
The function dynamically builds and executes a query of the form:
SELECT c.*
FROM conversations c
JOIN conversations_users x1 USING (conversation_id)
JOIN conversations_users x2 USING (conversation_id)
...
WHERE TRUE
AND x1.user_id = 1
AND x2.user_id = 2
...
This form performed best in an extensive test of queries for relational division.
You could also build the query in your app, but I went by the assumption that you want to use one array parameter. Also, this is probably fastest anyway.
Either query requires an index like the following to be fast:
CREATE INDEX conversations_users_user_id_idx ON conversations_users (user_id);
A multi-column primary (or unique) key on (user_id, conversation_id) works just as well, but one on (conversation_id, user_id) (like you may very well have!) would be inferior. You'll find a short rationale at the link above, or a comprehensive assessment under this related question on dba.SE.
I also assume you have a primary key on conversations.conversation_id.
Can you run a performance test with EXPLAIN ANALYZE on @Alex's query and this function and report your findings?
Note that both solutions find conversations where at least the users in the array take part - including conversations with additional users.
If you want to exclude those, un-comment the additional clause in my function (or add it to any other query).
Tell me if you need more explanation on the features of the function.
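The dynamic-SQL idea is not tied to PL/pgSQL: the same chained-join query string can be assembled in the application. A sketch in Python against SQLite (toy data as in the thread; unlike the function, the user ids here are bound as parameters rather than spliced into the SQL text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversations (conversation_id INTEGER PRIMARY KEY);
    INSERT INTO conversations VALUES (1), (2), (3);
    CREATE TABLE conversations_users (conversation_id INTEGER, user_id INTEGER);
    INSERT INTO conversations_users VALUES
        (1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2);
""")

def conversations_among_users(conn, user_ids):
    # One self-join of the link table per requested user, mirroring the
    # query shape the PL/pgSQL function builds.
    joins = "".join(
        f" JOIN conversations_users x{i} USING (conversation_id)"
        for i in range(len(user_ids)))
    conds = " AND ".join(f"x{i}.user_id = ?" for i in range(len(user_ids)))
    sql = f"SELECT c.conversation_id FROM conversations c{joins} WHERE {conds}"
    return [r[0] for r in conn.execute(sql, user_ids)]

print(sorted(conversations_among_users(conn, [1, 2])))  # prints: [1, 3]
```

As with the function, this finds conversations where at least the given users take part; the NOT EXISTS clause would still be needed to demand an exact member set.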
Create a mapping table with all possible values and use this:
select t1.conversation_id
from conversations_users as t1
inner join mapping_table as map on t1.user_id = map.user_id
group by t1.conversation_id
having count(distinct t1.user_id) = (select count(distinct user_id) from mapping_table)
select id
from conversations
where not exists (
    select *
    from conversations_users cu
    where cu.conversation_id = conversations.id
    and cu.user_id not in (1, 2, 3)
)
This can easily be made into a Rails scope.
I am guessing that you don't really want to start messing with temporary tables.
Your question was unclear as to whether you want conversations with exactly the given set of users, or conversations with a superset. The following is for the superset:
with users as (select user_id from users where user_id in (<list>)
),
conv as (select conversation_id, user_id
from conversations_users
where user_id in (<list>)
)
select distinct conversation_id
from users u left outer join
conv c
on u.user_id = c.user_id
where c.conversation_id is not null
For this query to work well, it assumes that you have indexes on user_id in both users and conversations_users.
For the exact set . . .
with users as (select user_id from users where user_id in (<list>)
),
conv as (select conversation_id, user_id
from conversations_users
where user_id in (<list>)
)
select distinct conversation_id
from users u full outer join
conv c
on u.user_id = c.user_id
where c.conversation_id is not null and u.user_id is not null
Based on @Alex Blakemore's answer, the equivalent Rails 4 scope on your Conversation class would be:
# Conversations exactly with users array
scope :by_users, -> (users) {
self.by_any_of_users(users)
.group("conversations.id")
.having("COUNT(*) = ?", users.length) -
joins(:conversations_users)
.where("conversations_users.user_id NOT IN (?)", users)
}
# generates an IN clause
scope :by_any_of_users, -> (users) { joins(:conversations_users).where(conversations_users: { user_id: users }).distinct }
Note that you can optimize it: instead of the Rails - (array minus), you could add a .where("NOT IN") clause, but that would be really complex to read.
Based on Alex Blakemore's answer:
select conversation_id
from conversations_users cu
where user_id in (1, 2)
group by conversation_id
having count(distinct user_id) = 2
I have found an alternative query with the same goal: finding the conversation_id of the conversations that contain user 1 and user 2 (ignoring additional users).
select *
from conversations_users cu1
where 2 = (
select count(distinct user_id)
from conversations_users cu2
where user_id in (1, 2) and cu1.conversation_id = cu2.conversation_id
)
It is slower according to the analysis Postgres performs via the EXPLAIN statement, and I guess that is because more conditions are being evaluated; at least, the subquery gets executed for each row of conversations_users, as it is a correlated subquery. The positive point of this query is that you aren't grouping, so you can select additional fields of the conversations_users table. In some situations (like mine) that can be handy.

Combine query results from one table with the defaults from another

This is a dumbed-down version of the real table data, so it may look a bit silly.
Table 1 (users):
  id             INT
  username       TEXT
  favourite_food TEXT
  food_pref_id   INT
Table 2 (food_preferences):
  id        INT
  food_type TEXT
The logic is as follows:
Let's say I have this in my food preference table:
1, 'VEGETARIAN'
and this in the users table:
1, 'John', NULL, 1
2, 'Pete', 'Curry', 1
In which case John defaults to being a vegetarian, but Pete should show up as a person who enjoys curry.
Question: is there any way to combine the query into one select statement, so that it would get the default from the preferences table if the favourite_food column is NULL?
I can obviously do this in application logic, but would be nice just to offload this to SQL, if possible.
DB is SQLite3...
You could use COALESCE(X,Y,...) to select the first item that isn't NULL.
If you combine this with an inner join, you should be able to do what you want.
It should go something like this:
SELECT u.id AS id,
       u.username AS username,
       COALESCE(u.favourite_food, p.food_type) AS favourite_food,
       u.food_pref_id AS food_pref_id
FROM users AS u
INNER JOIN food_preferences AS p ON u.food_pref_id = p.id
I don't have a SQLite database handy to test on, however, so the syntax might not be 100% correct, but it's the gist of it.
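Since the answer's only doubt was SQLite syntax, here is a quick check via Python's sqlite3 module; it confirms the COALESCE-plus-join approach works as written, using the question's schema and sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE food_preferences (id INTEGER PRIMARY KEY, food_type TEXT);
    INSERT INTO food_preferences VALUES (1, 'VEGETARIAN');
    CREATE TABLE users (id INTEGER, username TEXT,
                        favourite_food TEXT, food_pref_id INTEGER);
    INSERT INTO users VALUES (1, 'John', NULL, 1), (2, 'Pete', 'Curry', 1);
""")

# COALESCE picks the user's own favourite_food when present and falls
# back to the preference table's default when it is NULL.
rows = conn.execute("""
    SELECT u.username,
           COALESCE(u.favourite_food, p.food_type) AS favourite_food
    FROM users AS u
    INNER JOIN food_preferences AS p ON u.food_pref_id = p.id
""").fetchall()
print(rows)  # prints: [('John', 'VEGETARIAN'), ('Pete', 'Curry')]
```

John, whose favourite_food is NULL, picks up the vegetarian default, while Pete keeps his curry, exactly the behaviour the question asked for.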

Select all items in a table that do not appear in a foreign key of another table

Take for example an application which has users, each of whom can be in exactly one group. If we want to SELECT the list of groups which have no members, what would be the correct SQL? I keep feeling like I'm just about to grasp the query, and then it disappears again.
Bonus points: given the alternative scenario, where it's a many-to-many pairing, what is the SQL to identify unused groups?
(if you want concrete field names:)
One-To-Many:
Table 'users': | user_id | group_id |
Table 'groups': | group_id |
Many-To-Many:
Table 'users': | user_id |
Table 'groups': | group_id |
Table 'user-group': | user_id | group_id |
Groups that have no members (for the many-to-many pairing):
SELECT *
FROM groups g
WHERE NOT EXISTS
(
    SELECT 1
    FROM users_groups ug
    WHERE g.group_id = ug.group_id
);
This SQL will also work in your "first" example, as you can substitute "users" for "users_groups" in the sub-query =)
As far as performance is concerned, I know that this query can be quite performant on SQL Server, but I'm not so sure how well MySQL likes it..
For the first one, try this:
SELECT * FROM groups
LEFT JOIN users ON (groups.group_id=users.group_id)
WHERE users.user_id IS NULL;
For the second one, try this:
SELECT * FROM groups
LEFT JOIN user-group ON (groups.group_id=user-group.group_id)
WHERE user-group.user_id IS NULL;
SELECT *
FROM groups
WHERE groups.group_id NOT IN (
    SELECT users.group_id
    FROM users
)
This will return all groups whose group_id is not present in the users table. Note that NOT IN returns no rows at all if users.group_id is ever NULL, so either filter NULLs out in the subquery or prefer NOT EXISTS.
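All three shapes (NOT EXISTS, LEFT JOIN ... IS NULL, NOT IN) can be compared on a toy dataset. This sketch uses Python with SQLite and deliberately includes a user with a NULL group_id to show the classic NOT IN pitfall:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE groups (group_id INTEGER PRIMARY KEY);
    INSERT INTO groups VALUES (1), (2), (3);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, group_id INTEGER);
    INSERT INTO users VALUES (1, 1), (2, 1), (3, NULL);  -- user 3 has no group
""")

not_exists = [r[0] for r in conn.execute("""
    SELECT g.group_id FROM groups g
    WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.group_id = g.group_id)
""")]

left_join = [r[0] for r in conn.execute("""
    SELECT g.group_id FROM groups g
    LEFT JOIN users u ON g.group_id = u.group_id
    WHERE u.user_id IS NULL
""")]

# NOT IN compares against a set that contains NULL, and every comparison
# with NULL is unknown -- so no row can ever satisfy the predicate.
not_in = [r[0] for r in conn.execute("""
    SELECT group_id FROM groups
    WHERE group_id NOT IN (SELECT group_id FROM users)
""")]

print(not_exists, left_join, not_in)  # prints: [2, 3] [2, 3] []
```

NOT EXISTS and the LEFT JOIN variant both correctly report groups 2 and 3 as empty, while the plain NOT IN silently returns nothing because of the NULL in the subquery's result.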