SQL multiple join on many to many tables + comma separation - sql

I have these tables:
media
    id (int primary key)
    uri (varchar)
media_to_people
    media_id (int primary key)
    people_id (int primary key)
people
    id (int primary key)
    name (varchar)
    role (int) -- role specifies whether the person is an artist, publisher, writer, actor, etc. relative to the media, and has range 1-10
This is a many-to-many relation.
I want to fetch a media and all its associated people in a select. So if a media has 10 people associated with it, all 10 must come.
Furthermore, if multiple people with the same role exist for a given media, they must come as comma-separated values under a column for that role.
Result headings must look like: media.id, media.uri, people.name(actor), people.name(artist), people.name(publisher) and so on.
I'm using sqlite.

SQLite doesn't have the "pivot" functionality you'd need for starters, and the "comma separated values" part is definitely a presentation issue that it would be absurd (and possibly unfeasible) to try to push into any database layer, whatever dialect of SQL may be involved -- it's definitely a part of the job you'd do in the client, e.g. a reporting facility or programming language.
Use SQL for data access, and leave presentation to other layers.
How you get your data is
SELECT media.id, media.uri, people.name, people.role
FROM media
JOIN media_to_people ON (media.id = media_to_people.media_id)
JOIN people ON (media_to_people.people_id = people.id)
WHERE media.id = ?
ORDER BY people.role, people.name
(the ? is one way to indicate a parameter in SQLite, to be bound to the specific media id you're looking for in ways that depend on your client); the data will come from the DB to your client code in several rows, and your client code can easily put them into the single column form that you want.
It's hard for us to say how to code the client-side part w/o knowing anything about the environment or language you're using as the client. But in Python for example:
import collections

def showit(dataset):
    # dataset: rows of (mediaid, mediauri, name, role) for a single media
    by_role = collections.defaultdict(list)
    for mediaid, mediauri, name, role in dataset:
        by_role[role].append(name)
    headers = ['mediaid', 'mediauri']
    # str() because the id comes back from SQLite as an int
    result = [str(mediaid), mediauri]
    for role in sorted(by_role):
        headers.append('people(%s)' % role)
        result.append(','.join(by_role[role]))
    return ' '.join(headers) + '\n' + ' '.join(result)
even this won't quite match your spec -- you ask for headers such as 'people(artist)' while you also specify that the role's encoded as an int, and mention no way to go from the int to the string 'artist', so it's obviously impossible to match your spec exactly... but it's as close as my ingenuity can get;-).
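For what it's worth, here is a minimal end-to-end sketch of the same idea against an in-memory SQLite database, using Python's built-in sqlite3 module. The ROLE_NAMES mapping, the function names and the sample rows are all made up for illustration, since the question never says which int stands for which role.

```python
import collections
import sqlite3

# Hypothetical mapping: the question does not define which int means which role.
ROLE_NAMES = {1: "actor", 2: "artist", 3: "publisher"}

QUERY = """
SELECT media.id, media.uri, people.name, people.role
FROM media
JOIN media_to_people ON media.id = media_to_people.media_id
JOIN people ON media_to_people.people_id = people.id
WHERE media.id = ?
ORDER BY people.role, people.name
"""


def fetch_media_row(conn, media_id):
    """Return (headers, values) for one media, or None if no rows come back."""
    rows = conn.execute(QUERY, (media_id,)).fetchall()
    if not rows:
        return None
    by_role = collections.defaultdict(list)
    for _id, _uri, name, role in rows:
        by_role[role].append(name)
    headers = ["media.id", "media.uri"]
    values = [rows[0][0], rows[0][1]]
    for role in sorted(by_role):
        # Fall back to the raw int when the role has no known name.
        label = ROLE_NAMES.get(role, str(role))
        headers.append("people.name(%s)" % label)
        values.append(",".join(by_role[role]))
    return headers, values


def make_demo_db():
    """Build an in-memory database holding invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE media (id INTEGER PRIMARY KEY, uri TEXT);
        CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, role INTEGER);
        CREATE TABLE media_to_people (
            media_id INTEGER, people_id INTEGER,
            PRIMARY KEY (media_id, people_id));
        INSERT INTO media VALUES (1, 'file:///a.mp4');
        INSERT INTO people VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 3);
        INSERT INTO media_to_people VALUES (1, 1), (1, 2), (1, 3);
    """)
    return conn


demo_conn = make_demo_db()
demo_headers, demo_values = fetch_media_row(demo_conn, 1)
print(demo_headers)
print(demo_values)
```

Only roles that actually occur for the media get a column here; if you need a fixed set of columns, loop over ROLE_NAMES instead of over by_role.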

I agree with Alex Martelli's answer, that you should get the data in multiple rows and do some processing in your application.
If you try to do this with just joins, you need to join to the people table for each role type, and if there are multiple people in each role, your query will have Cartesian products between these roles.
So you need to do this with GROUP_CONCAT() and produce a scalar subquery in your select-list for each role:
SELECT m.id, m.uri,
    (SELECT GROUP_CONCAT(people.name)
     FROM media_to_people JOIN people ON (people_id = people.id)
     WHERE media_id = m.id AND role = 1) AS Actors,
    (SELECT GROUP_CONCAT(people.name)
     FROM media_to_people JOIN people ON (people_id = people.id)
     WHERE media_id = m.id AND role = 2) AS Artists,
    (SELECT GROUP_CONCAT(people.name)
     FROM media_to_people JOIN people ON (people_id = people.id)
     WHERE media_id = m.id AND role = 3) AS Publishers
FROM media m;
This is truly ugly! Don't try this at home!
Take our advice, and don't try to format the pivot table using only SQL.
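If you do decide to stay in SQL on SQLite anyway, one somewhat shorter variant is a single grouped join with GROUP_CONCAT over a CASE expression, which works because GROUP_CONCAT skips NULLs. Note this is a different technique from the scalar subqueries above, and only a sketch: the role numbers 1, 2, 3 are taken from that query, while the sample rows are invented. It is driven from Python's sqlite3 module just so it can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media (id INTEGER PRIMARY KEY, uri TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, role INTEGER);
    CREATE TABLE media_to_people (
        media_id INTEGER, people_id INTEGER,
        PRIMARY KEY (media_id, people_id));
    INSERT INTO media VALUES (1, 'a.mp4'), (2, 'b.mp4');
    INSERT INTO people VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 3);
    INSERT INTO media_to_people VALUES (1, 1), (1, 2), (1, 3);
""")

# One pass over the join; each CASE yields NULL for rows of other roles,
# and GROUP_CONCAT ignores those NULLs.
PIVOT_SQL = """
SELECT m.id, m.uri,
       GROUP_CONCAT(CASE WHEN p.role = 1 THEN p.name END) AS actors,
       GROUP_CONCAT(CASE WHEN p.role = 2 THEN p.name END) AS artists,
       GROUP_CONCAT(CASE WHEN p.role = 3 THEN p.name END) AS publishers
FROM media m
LEFT JOIN media_to_people mp ON mp.media_id = m.id
LEFT JOIN people p ON p.id = mp.people_id
GROUP BY m.id, m.uri
ORDER BY m.id
"""

pivot_rows = conn.execute(PIVOT_SQL).fetchall()
for row in pivot_rows:
    print(row)
```

The order of names inside each concatenated value is not guaranteed, and you still need one expression per role, so the advice about doing the formatting in the client stands.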

Related

How can I remove the repetitive joins from this SELECT query?

I have a postgres database with two tables: services and meta.
The first table stores the "core" information the application needs, and the app also has a "custom field" feature implemented similar to how Wordpress's wp_post_meta table works.
Users can add on meta rows with arbitrary keys and values, in a one-to-many relationship with the service.
The schema of the meta table is:
id
key (string)
value (string)
service_id (foreign key)
That works great for the app, so I'm not interested in changing the schema, but for some infrequently used admin dashboards I need to get back a list of services with several of the meta rows joined on as columns.
Here's what I have so far:
SELECT
services.*,
meta1.value AS funding,
meta2.value AS ownership
FROM services
JOIN meta meta1
ON services.id = meta1.service_id
AND meta1.key = 'Funding'
JOIN meta meta2
ON services.id = meta2.service_id
AND meta2.key = 'Ownership'
Now, this works great, but I have to do another join every time I want to add another meta value.
That seems like it will slow down the query and make it less readable.
Is there a good way to refactor this to keep it easy to read and fast to run?
Here's an attempted refactor using OR, which doesn't work:
SELECT
*,
meta.value AS funding,
meta.value AS ownership
FROM services
JOIN meta
ON services.id = meta.service_id
AND meta.key = 'Funding' OR meta.key = 'Ownership'
One way would be to aggregate the key/value pairs into a JSON value with a derived table:
select srv.*,
mv.vals ->> 'Funding' as funding,
mv.vals ->> 'Ownership' as ownership
from services srv
cross join lateral (
select jsonb_object_agg(m.key, m.value) as vals
from meta m
where m.key in ('Funding', 'Ownership')
and m.service_id = srv.id
) as mv
If your application can handle the JSON, then the conversion into two separate columns may not be necessary at all, which would avoid repeating the keys.
You can use conditional aggregation:
SELECT s.*, m.funding, m.ownership
FROM services s JOIN
(SELECT m.service_id,
MAX(value) FILTER (WHERE key = 'Funding') as Funding,
MAX(value) FILTER (WHERE key = 'Ownership') as Ownership
FROM meta m
GROUP BY m.service_id
) m
ON m.service_id = s.id
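The conditional-aggregation idea can be tried without a Postgres server. The sketch below runs the same shape of query on an in-memory SQLite database from Python, using MAX(CASE ...) in place of FILTER so that it also works on older SQLite builds. The services.name column and the sample rows are assumptions. It uses a LEFT JOIN so that services with no matching meta rows are kept, whereas an inner JOIN would drop them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE meta (
        id INTEGER PRIMARY KEY, key TEXT, value TEXT,
        service_id INTEGER REFERENCES services(id));
    INSERT INTO services VALUES (1, 'Library'), (2, 'Clinic'), (3, 'Cafe');
    INSERT INTO meta (key, value, service_id) VALUES
        ('Funding', 'Grant', 1), ('Ownership', 'Council', 1),
        ('Funding', 'Private', 2), ('Colour', 'Blue', 2);
""")

# One join to meta, however many keys are pivoted; adding a key means
# adding one MAX(CASE ...) line and one entry in the IN list.
PIVOT_META_SQL = """
SELECT s.id, s.name,
       MAX(CASE WHEN m.key = 'Funding' THEN m.value END) AS funding,
       MAX(CASE WHEN m.key = 'Ownership' THEN m.value END) AS ownership
FROM services s
LEFT JOIN meta m
       ON m.service_id = s.id
      AND m.key IN ('Funding', 'Ownership')
GROUP BY s.id, s.name
ORDER BY s.id
"""

service_rows = conn.execute(PIVOT_META_SQL).fetchall()
for row in service_rows:
    print(row)
```

If a service can have several rows for the same key, MAX silently picks one of them, so this only fits when each key appears at most once per service.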

select items where id is contained in another table field

I have the following database schema in sqlite3:
Basically, a member has multiple characters. A character plays in an activity (with a mode type) and has results for that activity (character_activity_stats)
I select all of the stats (activity / character_activity_stats) for a specific character and mode like so:
SELECT
*,
activity.mode as activity_mode,
character_activity_stats.id as character_activity_stats_index
FROM
character_activity_stats
INNER JOIN
activity ON character_activity_stats.activity = activity.id,
modes ON modes.activity = activity.id
WHERE
modes.mode = 5 AND
character_activity_stats.character = 1
This works great.
However, now I want to select the same set of data, but by member (basically combine results for all characters for a member).
However, I am not really sure how to even approach this.
Basically, I need to retrieve all character_activity_stats where character_activity_stats.character is a character of the specified member (by id). Any suggestions or pointers? (I am very new to sql).
Join those 3 tables on the right keys:
select *
from character_activity_stats
join character on character_activity_stats.character = character.id
join member on member.id = character.member
where member.id = ?
If you don't need any data from member other than to filter by its id, you can leave that join off and just use character.member = ? instead.
It's much easier if you use the same name for the primary and foreign keys (i.e. don't use id for the primary key). It also allows you to use natural joins, so you don't even need to give the join conditions. For the primary key, the convention is usually a name ending in _id. You have both id and _id columns in most of the tables, so I don't know what that is about.
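As a rough, runnable illustration of the shorter form (filtering on character.member without joining member at all), here is a sketch using Python's sqlite3 module. The schema picture from the question is not available, so the table layout, the kills column and the sample rows are guesses made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE character (
        id INTEGER PRIMARY KEY,
        member INTEGER REFERENCES member(id));
    CREATE TABLE character_activity_stats (
        id INTEGER PRIMARY KEY,
        character INTEGER REFERENCES character(id),
        kills INTEGER);
    INSERT INTO member VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO character VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO character_activity_stats VALUES
        (100, 10, 5), (101, 11, 7), (102, 20, 9);
""")

# Stats for every character owned by one member; the member table itself
# is not needed because character.member already holds the member id.
STATS_FOR_MEMBER = """
SELECT s.id, s.character, s.kills
FROM character_activity_stats AS s
JOIN character AS c ON s.character = c.id
WHERE c.member = ?
ORDER BY s.id
"""


def stats_for_member(conn, member_id):
    """Return all stat rows for the given member's characters."""
    return conn.execute(STATS_FOR_MEMBER, (member_id,)).fetchall()


member_stats = stats_for_member(conn, 1)
print(member_stats)
```

The activity and modes joins from the original query can be added to this in the same way as before.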

SQL query join different tables based on value in a given column?

I am designing a domain model and data mapper system for my site. The system has different kinds of user domain objects: a base class for users, with child classes for admins, mods and banned users. Every domain object uses data from a table called 'users', while the child classes each have an additional table to store admin/mod/banned information. The user type is determined by a column in the 'users' table called 'userlevel'; its value is 3 for admins, 2 for mods, 1 for registered users and -1 for banned users.
A problem comes up when I work on a members list feature for my site, since the members list is supposed to load all users from the database (with pagination, but let's not worry about that now). The issue is that I want to load the data from both the base user table and the additional admin/mod/banned table. As you can see, registered users do not have an additional table to store extra data, while for admin/mod/banned users the table is different in each case. Moreover, the columns in these tables are also different.
So how am I supposed to handle this situation using SQL queries? I know I can simply select from the base user table and then use multiple queries to load additional data if the user level is found to be a given value, but this is a bad idea since it will result in n+1 queries for n admins/mods/banned users, a very expensive trip to the database. What else am I supposed to do? Please help.
If you want to query all usertypes with one query you will have to have the columns from all tables in your result-table, several of them filled with null-values.
To get them filled with data use a left-join like this:
SELECT *
FROM userdata u
LEFT OUTER JOIN admindata a
ON ( u.userid = a.userid
AND u.usertype = 3 )
LEFT OUTER JOIN moddata m
ON ( u.userid = m.userid
AND u.usertype = 2 )
LEFT OUTER JOIN banneddata b
ON ( u.userid = b.userid
AND u.usertype = -1 )
WHERE...
You could probably drop the usertype-condition, since there should only be data in one of the joined tables, but you never know...
Then your program-code will have the job to pick the correct columns based on the usertype.
P.S.: Note that select * is only for the sake of simplicity; in real code, better list all of the column names...
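To show what "pick the correct columns based on the usertype" could look like in program code, here is a small sketch in Python with sqlite3. The extra columns (permissions, forum, reason), the EXTRA_COLUMN mapping and the sample rows are invented for the example; only the table names and the usertype values come from the query and the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.executescript("""
    CREATE TABLE userdata (
        userid INTEGER PRIMARY KEY, name TEXT, usertype INTEGER);
    CREATE TABLE admindata (userid INTEGER PRIMARY KEY, permissions TEXT);
    CREATE TABLE moddata (userid INTEGER PRIMARY KEY, forum TEXT);
    CREATE TABLE banneddata (userid INTEGER PRIMARY KEY, reason TEXT);
    INSERT INTO userdata VALUES
        (1, 'root', 3), (2, 'mo', 2), (3, 'reg', 1), (4, 'troll', -1);
    INSERT INTO admindata VALUES (1, 'all');
    INSERT INTO moddata VALUES (2, 'general');
    INSERT INTO banneddata VALUES (4, 'spam');
""")

# One query for the whole member list; columns that do not apply to a
# user come back as NULL (None in Python).
MEMBERS_SQL = """
SELECT u.userid, u.name, u.usertype,
       a.permissions, m.forum, b.reason
FROM userdata u
LEFT OUTER JOIN admindata a ON u.userid = a.userid AND u.usertype = 3
LEFT OUTER JOIN moddata m ON u.userid = m.userid AND u.usertype = 2
LEFT OUTER JOIN banneddata b ON u.userid = b.userid AND u.usertype = -1
ORDER BY u.userid
"""

# Hypothetical mapping from user level to the column holding its extra data.
EXTRA_COLUMN = {3: "permissions", 2: "forum", -1: "reason"}


def load_members(conn):
    """Return (name, usertype, extra) tuples, extra being None for plain users."""
    members = []
    for row in conn.execute(MEMBERS_SQL):
        extra_col = EXTRA_COLUMN.get(row["usertype"])
        extra = row[extra_col] if extra_col else None
        members.append((row["name"], row["usertype"], extra))
    return members


members = load_members(conn)
print(members)
```

In a data mapper, the same lookup would decide which child class to instantiate for each row.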
While it is totally fine to have this hierarchy in your domain classes, I would suggest changing the approach in your database. Otherwise your queries are going to be very complex and slow.
You can have just another table called e.g. users_additional_info with the mix of the columns that you need for all your user types. Then you can do
SELECT * FROM users
LEFT JOIN users_additional_info ON users.id = users_additional_info.user_id
to get all the information in a single simple query. Believe it or not, this approach will save you a lot of headaches in the future when your tables start to grow or you decide to add another type of user.

Best Practice to querying a Lookup table

I am trying to figure out a way to query a property feature lookup table.
I have a property table that contains rental property information (address, rent, deposit, # of bedrooms, etc.) along with another table (Property_Feature) that represents the features of this property (pool, air conditioning, laundry on-site, etc.). The features themselves are defined in yet another table labeled Feature.
Property
pid - primary key
other property details
Feature
fid - primary key
name
value
Property_Feature
id - primary key
pid - foreign key (Property)
fid - foreign key (Feature)
Let's say someone wants to search for a property that has air conditioning, a pool and laundry on-site. How do you query the Property_Feature table for multiple features for the same property if each row only represents one feature? What would the SQL query look like? Is this possible? Is there a better solution?
Thanks for the help and insight.
In terms of database design, yours is the right way to do it. It's correctly normalized.
For the query, I would simply use exists, like this:
select * from Property
where
exists (select * from Property_Feature where pid = property.pid and fid = 'key_air_conditioning')
and
exists (select * from Property_Feature where pid = property.pid and fid = 'key_pool')
Where key_air_conditioning and key_pool are obviously the keys for those features.
The performance will be OK even for large databases.
Here's the query that will find all the properties with a pool:
select
p.*
from
property p
inner join property_feature pf on
p.pid = pf.pid
inner join feature f on
pf.fid = f.fid
where
f.name = 'Pool'
I use inner joins instead of EXISTS since it tends to be a bit faster.
You can also do something like this:
SELECT *
FROM Property p
WHERE 3 =
( SELECT COUNT(*)
FROM Property_Feature pf
, Feature f
WHERE pf.pid = p.pid
AND pf.fid = f.fid
AND f.name in ('air conditioning', 'pool', 'laundry on-site')
);
Obviously, if your front end is capturing the fids of the feature items when the user is selecting them, you can dispense with the join to Feature and constrain directly on fid. Your front end would know what the count of features selected was, so determining the value for "3" above is trivial.
Compare it, performance-wise, to the EXISTS construction in tekBlues's answer above; depending on your data distribution, either one of these might be the faster query.
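A closely related way to write the counting approach is GROUP BY with HAVING, which takes the list of wanted features as parameters. The sketch below is one possible version, run against an in-memory SQLite database from Python; the address column, the helper name properties_with_all and the sample rows are made up. COUNT(DISTINCT ...) is used so that a duplicated Property_Feature row cannot inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Property (pid INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE Feature (fid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Property_Feature (
        id INTEGER PRIMARY KEY,
        pid INTEGER REFERENCES Property(pid),
        fid INTEGER REFERENCES Feature(fid));
    INSERT INTO Property VALUES (1, '1 Elm St'), (2, '2 Oak Ave');
    INSERT INTO Feature VALUES
        (1, 'air conditioning'), (2, 'pool'), (3, 'laundry on-site');
    INSERT INTO Property_Feature (pid, fid) VALUES
        (1, 1), (1, 2), (1, 3), (2, 1), (2, 2);
""")


def properties_with_all(conn, feature_names):
    """Return properties that have every feature in feature_names.

    Expects a non-empty list of feature names.
    """
    wanted = sorted(set(feature_names))  # drop duplicate names
    placeholders = ",".join("?" * len(wanted))
    sql = f"""
        SELECT p.pid, p.address
        FROM Property p
        JOIN Property_Feature pf ON pf.pid = p.pid
        JOIN Feature f ON f.fid = pf.fid
        WHERE f.name IN ({placeholders})
        GROUP BY p.pid, p.address
        HAVING COUNT(DISTINCT f.fid) = ?
        ORDER BY p.pid
    """
    # Only the "?" markers are interpolated; the values are bound parameters.
    return conn.execute(sql, wanted + [len(wanted)]).fetchall()


all_three = properties_with_all(
    conn, ["air conditioning", "pool", "laundry on-site"])
print(all_three)
```

As with the subquery version, if the front end already knows the fids you can drop the join to Feature and filter on pf.fid directly.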

complex sql query help needed

I'm not sure how to write this query in SQL. there are two tables
**GroupRecords**
Id (int, primary key)
Name (nvarchar)
SchoolYear (datetime)
RecordDate (datetime)
IsUpdate (bit)
**People**
Id (int, primary key)
GroupRecordsId (int, foreign key to GroupRecords.Id)
Name (nvarchar)
Bio (nvarchar)
Location (nvarchar)
I need to return a distinct list of people who belong to GroupRecords that have a SchoolYear of '2000'. In the returned list, People.Name should be unique (no duplicate People.Name); in case of duplication, only the person who belongs to the GroupRecords with the later RecordDate should be returned.
It would probably be better to write a stored procedure for this right?
This is untested, but it should do what is required in the question.
It selects all details about the person.
The subquery makes it match only the latest RecordDate for a given name. It is correlated on the name and the school year, so the maximum is taken over every GroupRecords row of that year in which the name appears.
SELECT
    People.Id,
    People.GroupRecordsId,
    People.Name,
    People.Bio,
    People.Location
FROM
    People
    INNER JOIN GroupRecords ON GroupRecords.Id = People.GroupRecordsId
WHERE
    GroupRecords.SchoolYear = '2000/1/1' AND
    GroupRecords.RecordDate = (
        SELECT
            MAX(GR2.RecordDate)
        FROM
            People AS P2
            INNER JOIN GroupRecords AS GR2 ON P2.GroupRecordsId = GR2.Id
        WHERE
            P2.Name = People.Name AND
            GR2.SchoolYear = GroupRecords.SchoolYear
    )
Select Distinct ID
From People
Where GroupRecordsID In
(Select Id From GroupRecords
Where SchoolYear = '2000/1/1')
This will produce a distinct list of those individuals in the 2000 class...
but I don't understand what you're getting at with the comment about duplicates... please elaborate...
It reads as though you're talking about when two different people happen to have the same name you don't want them both listed... Is that really what you want?
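MySQL specific (this relies on MySQL's permissive GROUP BY with ONLY_FULL_GROUP_BY disabled, so the row returned for each name is not guaranteed to be the one with the latest RecordDate):
SELECT *
FROM `People`
LEFT JOIN `GroupRecords` ON `GroupRecordsId` = `GroupRecords`.`Id`
WHERE `GroupRecords`.`SchoolYear` = '2000/1/1'
GROUP BY `People`.`Name`
ORDER BY `GroupRecords`.`RecordDate` DESC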
"people.name should be unique (no duplicate People.Name)"
? Surely you mean no duplicate People.ID?
"in case of a duplication only the person who belong to the GroupRecords with the later RecordDate should be returned."
There's the rub — that's the bit that it's not obvious how to do in plain SQL. There are a number of approaches to the “For each X, select the row Y with maximum/minimum Z” question; which work and which perform better depend on which database software you're using.
http://kristiannielsen.livejournal.com/6745.html has some good discussion of some of the usual techniques for attacking this (in the context of MySQL, but widely applicable).
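One of the usual techniques, joining back to a derived table that holds the maximum per group, can be sketched as follows. This is only an illustration run on SQLite through Python's sqlite3 module, not something taken from the linked article; the dates are stored as ISO text so that MAX compares them correctly, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GroupRecords (
        Id INTEGER PRIMARY KEY, Name TEXT,
        SchoolYear TEXT, RecordDate TEXT);
    CREATE TABLE People (
        Id INTEGER PRIMARY KEY,
        GroupRecordsId INTEGER REFERENCES GroupRecords(Id),
        Name TEXT, Location TEXT);
    INSERT INTO GroupRecords VALUES
        (1, 'A', '2000-01-01', '2000-03-01'),
        (2, 'B', '2000-01-01', '2000-09-01'),
        (3, 'C', '2001-01-01', '2001-05-01');
    INSERT INTO People VALUES
        (1, 1, 'Kim', 'Oslo'),
        (2, 2, 'Kim', 'Rome'),
        (3, 1, 'Lee', 'Lima'),
        (4, 3, 'Kim', 'Kyiv');
""")

# The derived table "latest" holds, per name, the newest RecordDate within
# the requested school year; joining back keeps only the matching rows.
LATEST_PER_NAME = """
SELECT p.Id, p.Name, p.Location, gr.RecordDate
FROM People p
JOIN GroupRecords gr ON gr.Id = p.GroupRecordsId
JOIN (
    SELECT p2.Name AS Name, MAX(gr2.RecordDate) AS MaxDate
    FROM People p2
    JOIN GroupRecords gr2 ON gr2.Id = p2.GroupRecordsId
    WHERE gr2.SchoolYear = ?
    GROUP BY p2.Name
) latest ON latest.Name = p.Name AND latest.MaxDate = gr.RecordDate
WHERE gr.SchoolYear = ?
ORDER BY p.Name
"""

year = "2000-01-01"
latest_people = conn.execute(LATEST_PER_NAME, (year, year)).fetchall()
for row in latest_people:
    print(row)
```

If two rows for the same name share the same latest RecordDate, both are returned, so a tie-breaker (for example the higher People.Id) would still be needed to guarantee uniqueness.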