How can I remove the repetitive joins from this SELECT query?

I have a postgres database with two tables: services and meta.
The first table stores the "core" information the application needs, and the app also has a "custom field" feature implemented similarly to how WordPress's wp_postmeta table works.
Users can add on meta rows with arbitrary keys and values, in a one-to-many relationship with the service.
The schema of the meta table is:
id
key (string)
value (string)
service_id (foreign key)
That works great for the app, so I'm not interested in changing the schema, but for some infrequently used admin dashboards I need to get back a list of services with several of the meta rows joined on as columns.
Here's what I have so far:
SELECT
services.*,
meta1.value AS funding,
meta2.value AS ownership
FROM services
JOIN meta meta1
ON services.id = meta1.service_id
AND meta1.key = 'Funding'
JOIN meta meta2
ON services.id = meta2.service_id
AND meta2.key = 'Ownership'
Now, this works great, but I have to do another join every time I want to add another meta value.
That seems like it will slow down the query and make it less readable.
Is there a good way to refactor this to keep it easy to read and fast to run?
Here's an attempted refactor using OR, which doesn't work:
SELECT
*,
meta.value AS funding,
meta.value AS ownership
FROM services
JOIN meta
ON services.id = meta.service_id
AND meta.key = 'Funding' OR meta.key = 'Ownership'

One way would be to aggregate the key/value pairs into a JSON value with a derived table:
select srv.*,
mv.vals ->> 'Funding' as funding,
mv.vals ->> 'Ownership' as ownership
from services srv
cross join lateral (
select jsonb_object_agg(m.key, m.value) as vals
from meta m
where m.key in ('Funding', 'Ownership')
and m.service_id = srv.id
) as mv
If your application can handle the JSON, then maybe the conversion into two separate columns isn't actually necessary, which would avoid the repetition of the keys.

You can use conditional aggregation:
SELECT s.*, m.funding, m.ownership
FROM services s JOIN
(SELECT m.service_id,
MAX(value) FILTER (WHERE key = 'Funding') as Funding,
MAX(value) FILTER (WHERE key = 'Ownership') as Ownership
FROM meta m
GROUP BY m.service_id
) m
ON m.service_id = s.id
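A minimal runnable sketch of the conditional-aggregation idea, using Python's built-in sqlite3 module and an in-memory database. Because this runs on SQLite rather than PostgreSQL, it swaps the FILTER clause for the portable MAX(CASE WHEN ... END) form, which gives the same result here. The services.name column and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);  -- name is hypothetical
    CREATE TABLE meta (id INTEGER PRIMARY KEY, key TEXT, value TEXT,
                       service_id INTEGER REFERENCES services(id));
    INSERT INTO services VALUES (1, 'Food bank'), (2, 'Shelter');
    INSERT INTO meta (key, value, service_id) VALUES
        ('Funding', 'Public', 1), ('Ownership', 'Charity', 1),
        ('Funding', 'Private', 2), ('Colour', 'Blue', 2);
""")

def pivot_meta(conn):
    """Return one (id, name, funding, ownership) row per service that has meta rows."""
    # Each MAX(CASE ...) only sees the value whose key matches; every other
    # meta row contributes NULL, which MAX ignores.
    return conn.execute("""
        SELECT s.id, s.name, m.funding, m.ownership
        FROM services s
        JOIN (SELECT service_id,
                     MAX(CASE WHEN key = 'Funding' THEN value END) AS funding,
                     MAX(CASE WHEN key = 'Ownership' THEN value END) AS ownership
              FROM meta
              GROUP BY service_id) m
          ON m.service_id = s.id
        ORDER BY s.id
    """).fetchall()

print(pivot_meta(conn))
# [(1, 'Food bank', 'Public', 'Charity'), (2, 'Shelter', 'Private', None)]
```

Adding another meta key only means adding one more MAX(CASE ...) line, not another join.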

Related

select items where id is contained in another table field

I have the following database schema in sqlite3:
Basically, a member has multiple characters. A character plays in an activity (with a mode type) and has results for that activity (character_activity_stats).
I select all of the stats (activity / character_activity_stats) for a specific character and mode like so:
SELECT
*,
activity.mode as activity_mode,
character_activity_stats.id as character_activity_stats_index
FROM
character_activity_stats
INNER JOIN
activity ON character_activity_stats.activity = activity.id
INNER JOIN
modes ON modes.activity = activity.id
WHERE
modes.mode = 5 AND
character_activity_stats.character = 1
This works great.
However, now I want to select the same set of data, but by member (basically, combine the results of all characters for a member).
I am not really sure how to even approach this.
Basically, I need to retrieve all character_activity_stats rows where character_activity_stats.character is a character of the specified member (by id). Any suggestions or pointers? (I am very new to SQL.)
Join those 3 tables on the right keys:
select *
from character_activity_stats
join character on character_activity_stats.character = character.id
join member on member.id = character.member
where member.id = ?
If you don't need any data from member other than to filter by its id, you can leave that join off and use where character.member = ? instead.
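A small sketch showing that both versions of the query return the same rows, run through Python's sqlite3 module. The question's schema image is missing, so the tables below are a hypothetical minimal reconstruction (the score column and all sample rows are invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE character (id INTEGER PRIMARY KEY, member INTEGER REFERENCES member(id));
    CREATE TABLE character_activity_stats (
        id INTEGER PRIMARY KEY,
        character INTEGER REFERENCES character(id),
        score INTEGER);                      -- hypothetical result column
    INSERT INTO member VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO character VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO character_activity_stats VALUES (100, 10, 5), (101, 11, 7), (102, 20, 9);
""")

def stats_for_member_joined(conn, member_id):
    # Three-table version: joins all the way to member and filters on member.id.
    return conn.execute("""
        SELECT character_activity_stats.*
        FROM character_activity_stats
        JOIN character ON character_activity_stats.character = character.id
        JOIN member ON member.id = character.member
        WHERE member.id = ?
        ORDER BY character_activity_stats.id
    """, (member_id,)).fetchall()

def stats_for_member(conn, member_id):
    # Shorter version: character.member already holds the member id.
    return conn.execute("""
        SELECT character_activity_stats.*
        FROM character_activity_stats
        JOIN character ON character_activity_stats.character = character.id
        WHERE character.member = ?
        ORDER BY character_activity_stats.id
    """, (member_id,)).fetchall()

print(stats_for_member(conn, 1))  # [(100, 10, 5), (101, 11, 7)]
```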
It's much easier if you use the same name for the primary and foreign keys (i.e., don't use a bare id for the primary key). It also allows you to use natural joins, so you don't even need to give the join conditions. For the primary key, the convention is usually tablename_id. You have id and _in in most of the tables, so I don't know what that is about.
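A sketch of what the renamed keys buy you, again with Python's sqlite3 and a hypothetical minimal schema. NATURAL JOIN matches on every column name the two sides share, so any other shared name (for example a name column in both tables) would silently become part of the join condition; that is why the sample uses member_name rather than name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (member_id INTEGER PRIMARY KEY, member_name TEXT);
    CREATE TABLE character (character_id INTEGER PRIMARY KEY, member_id INTEGER);
    CREATE TABLE character_activity_stats (
        stats_id INTEGER PRIMARY KEY, character_id INTEGER, score INTEGER);
    INSERT INTO member VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO character VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO character_activity_stats VALUES (100, 10, 5), (101, 11, 7), (102, 20, 9);
""")

def stats_for_member_natural(conn, member_id):
    # First NATURAL JOIN matches on character_id, the second on member_id;
    # no ON clauses are needed because the key names line up.
    return conn.execute("""
        SELECT stats_id, character_id, score, member_name
        FROM character_activity_stats
        NATURAL JOIN character
        NATURAL JOIN member
        WHERE member_id = ?
        ORDER BY stats_id
    """, (member_id,)).fetchall()

print(stats_for_member_natural(conn, 1))  # [(100, 10, 5, 'alice'), (101, 11, 7, 'alice')]
```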

SQL query join different tables based on value in a given column?

Well, I am designing a domain model and data mapper system for my site. The system has different kinds of user domain objects: a base class for users, with child classes for admins, mods and banned users. Every domain object uses data from a table called 'users', while the child classes have an additional table to store admin/mod/banned information. The user type is determined by a column in the table 'users' called 'userlevel'; its value is 3 for admins, 2 for mods, 1 for registered users and -1 for banned users.
A problem comes up when I work on a members list feature for my site, since the members list is supposed to load all users from the database (with pagination, but let's not worry about that now). The issue is that I want to load the data from both the base user table and the additional admin/mod/banned tables. As you can see, registered users do not have an additional table to store extra data, while admins, mods and banned users each have a different one. Moreover, the columns in these tables are also different.
So how am I supposed to handle this situation using SQL queries? I know I can simply select from the base user table and then use additional queries to load extra data whenever the user level is one of those values, but this is a bad idea since it will result in n+1 queries for n admins/mods/banned users, a very expensive number of trips to the database. What else can I do? Please help.
If you want to query all user types with one query, you will have to have the columns from all tables in your result table, several of them filled with null values.
To get them filled with data, use a left join like this:
SELECT *
FROM userdata u
LEFT OUTER JOIN admindata a
ON ( u.userid = a.userid
AND u.usertype = 3 )
LEFT OUTER JOIN moddata m
ON ( u.userid = m.userid
AND u.usertype = 2 )
LEFT OUTER JOIN banneddata b
ON ( u.userid = b.userid
AND u.usertype = -1 )
WHERE...
You could probably drop the usertype-condition, since there should only be data in one of the joined tables, but you never know...
Then your program code will have the job of picking the correct columns based on the usertype.
P.S.: Note that SELECT * is only for the sake of simplicity; in real code, better list all of the column names...
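A minimal sketch of the "program code picks the columns" step, using Python's sqlite3 with the table names from the query above. The extra columns (admin_since, forum, reason) and the sample users are hypothetical, and the tuples stand in for whatever domain objects the data mapper would build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
    CREATE TABLE userdata (userid INTEGER PRIMARY KEY, name TEXT, usertype INTEGER);
    CREATE TABLE admindata (userid INTEGER, admin_since TEXT);
    CREATE TABLE moddata (userid INTEGER, forum TEXT);
    CREATE TABLE banneddata (userid INTEGER, reason TEXT);
    INSERT INTO userdata VALUES (1, 'ann', 3), (2, 'max', 2), (3, 'sue', 1), (4, 'tom', -1);
    INSERT INTO admindata VALUES (1, '2010-01-01');
    INSERT INTO moddata VALUES (2, 'general');
    INSERT INTO banneddata VALUES (4, 'spam');
""")

# Which extra column belongs to which usertype; registered users (1) have none.
EXTRA_COLUMN = {3: "admin_since", 2: "forum", -1: "reason"}

def load_members(conn):
    # One query for everyone; the columns of the "wrong" tables come back NULL.
    rows = conn.execute("""
        SELECT u.userid, u.name, u.usertype, a.admin_since, m.forum, b.reason
        FROM userdata u
        LEFT OUTER JOIN admindata a ON (u.userid = a.userid AND u.usertype = 3)
        LEFT OUTER JOIN moddata m ON (u.userid = m.userid AND u.usertype = 2)
        LEFT OUTER JOIN banneddata b ON (u.userid = b.userid AND u.usertype = -1)
        ORDER BY u.userid
    """).fetchall()
    members = []
    for row in rows:
        extra_col = EXTRA_COLUMN.get(row["usertype"])
        extra = {extra_col: row[extra_col]} if extra_col else {}
        members.append((row["name"], row["usertype"], extra))
    return members
```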
While it is totally fine to have this hierarchy in your domain classes, I would suggest changing the approach in your database. Otherwise your queries are going to be very complex and slow.
You can have just another table called e.g. users_additional_info with the mix of the columns that you need for all your user types. Then you can do
SELECT * FROM users
LEFT JOIN users_additional_info ON users.id = users_additional_info.user_id
to get all the information in a single simple query. Believe it or not, this approach will save you a lot of headaches in the future when your tables start to grow or you decide to add another type of user.
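A sketch of the single-extra-table layout, run through Python's sqlite3. The columns of users_additional_info (admin_since, forum, ban_reason) and the sample rows are hypothetical; the point is that each user type simply leaves the other columns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, userlevel INTEGER);
    -- One sparse table holding the union of the per-type columns.
    CREATE TABLE users_additional_info (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        admin_since TEXT, forum TEXT, ban_reason TEXT);
    INSERT INTO users VALUES (1, 'ann', 3), (2, 'max', 2), (3, 'sue', 1), (4, 'tom', -1);
    INSERT INTO users_additional_info VALUES
        (1, '2010-01-01', NULL, NULL),
        (2, NULL, 'general', NULL),
        (4, NULL, NULL, 'spam');
""")

def load_members(conn):
    # A single LEFT JOIN; registered users with no extra row still appear.
    return conn.execute("""
        SELECT users.id, users.name, users.userlevel,
               users_additional_info.admin_since,
               users_additional_info.forum,
               users_additional_info.ban_reason
        FROM users
        LEFT JOIN users_additional_info ON users.id = users_additional_info.user_id
        ORDER BY users.id
    """).fetchall()
```

Adding a new user type then means adding columns to one table rather than another join to every query.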

Storing multiple data types

I want to store metadata settings for each user, but the metadata values can be of multiple data types: integer, string, date, or boolean.
So I came up with my own solution:
user(user_id(PK), ...)
meta(
meta_id (PK)
, user_id (FK)
, data_type
, meta_name
, ...)
meta_user(
user_id(FK)
, meta_id(FK)
, number_value
, decimal_value
, string_value
, date_value
, time_value
, boolean_value)
But I'm not sure that's the right way to store multiple data types. I hope someone here can help me by sharing their solution.
UPDATE:
A user may have many metadata entries, and the user must first register their metadata.
My contribution to this is:
As you should implement names for the metadata, add a metadefinition table:
meta_definition (metadefinition_ID(PK), name (varchar), datatype)
Then modify your meta_user table
meta_user (meta_ID (PK), user_ID(FK), metadefinition_ID(FK))
You have 3 choices (A,B,C) or even more...
Variant A: Keep your design of storing the values for all possible datatypes in one single row resulting in a sparse table.
This is the easiest to implement but the ugliest in terms of 'clean design' (my opinion).
Variant B: Use distinct value tables per data type. Instead of having one sparse table meta_user, you could use 6 tables (meta_number, meta_decimal, meta_string, and so on). Each table has the form:
meta_XXXX (metadata_ID(PK), meta_ID(FK), value)
Imho, this is the cleanest design (but a bit complicated to work with).
Variant C: reduce meta_user to hold three columns (I renamed it to meta_values, as it holds values and not users).
meta_values (metavalue_ID(PK), meta_ID(FK), value (varchar))
Format all values as string/varchar and stuff them into the value column. This is not well designed and a bad idea if you are going to use the values within SQL as you would have to do expensive and complicated casting in order to use the 'real' values.
This is imho the most compact design.
To list all metadata of a specific user, you can use
select u.name,
md.name as 'AttributeName',
md.DataType
from user u
join meta_user mu on u.user_ID = mu.user_ID
join meta_definition md on md.metadefinition_ID = mu.metadefinition_ID
Selecting the values for a given user would be:
Variant A:
select u.name,
md.name as 'AttributeName',
mv.* -- show all different data types
from user u
join meta_user mu on u.user_ID = mu.user_ID
join meta_definition md on md.metadefinition_ID = mu.metadefinition_ID
join meta_value mv on mv.meta_ID = mu.meta_ID
Disadvantage: When new data types are added, you would have to add a column, recompile the query and change your software as well.
Variant B:
select u.name,
md.name as 'AttributeName',
mnum.value as NumericValue,
mdec.value as DecimalValue
...
from user u
join meta_user mu on u.user_ID = mu.user_ID
join meta_definition md on md.metadefinition_ID = mu.metadefinition_ID
left join meta_number mnum on mnum.meta_ID = mu.meta_ID
left join meta_decimal mdec on mdec.meta_ID = mu.meta_ID
...
Disadvantage: Slow if many users and attributes are being stored. Needs a new table when a new datatype is being introduced.
Variant C:
select u.name,
md.name as 'AttributeName',
md.DataType, -- client needs this to convert to the original data type
mv.value -- appears formatted as string
from user u
join meta_user mu on u.user_ID = mu.user_ID
join meta_definition md on md.metadefinition_ID = mu.metadefinition_ID
join meta_values mv on mv.meta_ID = mu.meta_ID
Advantage: Don't have to change the query in case new datatypes are being introduced.
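A sketch of the client-side conversion that Variant C relies on, using Python's sqlite3. The DataType labels ('integer', 'date', 'boolean'), the converter table and the sample rows are assumptions for illustration; the user table join is left out to keep it short.

```python
import sqlite3
from datetime import date

# Hypothetical converters keyed by the DataType stored in meta_definition.
CONVERTERS = {
    "integer": int,
    "string": str,
    "date": date.fromisoformat,
    "boolean": lambda s: s == "true",
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meta_definition (metadefinition_ID INTEGER PRIMARY KEY, name TEXT, DataType TEXT);
    CREATE TABLE meta_user (meta_ID INTEGER PRIMARY KEY, user_ID INTEGER, metadefinition_ID INTEGER);
    CREATE TABLE meta_values (metavalue_ID INTEGER PRIMARY KEY, meta_ID INTEGER, value TEXT);
    INSERT INTO meta_definition VALUES (1, 'age', 'integer'), (2, 'birthday', 'date'),
                                       (3, 'newsletter', 'boolean');
    INSERT INTO meta_user VALUES (10, 7, 1), (11, 7, 2), (12, 7, 3);
    INSERT INTO meta_values VALUES (100, 10, '42'), (101, 11, '1982-05-01'), (102, 12, 'true');
""")

def user_metadata(conn, user_id):
    rows = conn.execute("""
        SELECT md.name, md.DataType, mv.value
        FROM meta_user mu
        JOIN meta_definition md ON md.metadefinition_ID = mu.metadefinition_ID
        JOIN meta_values mv ON mv.meta_ID = mu.meta_ID
        WHERE mu.user_ID = ?
    """, (user_id,)).fetchall()
    # Every value arrives as text; DataType says how to turn it back.
    return {name: CONVERTERS[dtype](value) for name, dtype, value in rows}

print(user_metadata(conn, 7))
```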
Every data type in PostgreSQL can be cast to text, which is therefore the natural common ground for data or variable type.
I suggest you have a look at the additional module hstore. I quote the manual:
This module implements the hstore data type for storing sets of key/value pairs within a single PostgreSQL value. This can be useful in various scenarios, such as rows with many attributes that are rarely examined, or semi-structured data. Keys and values are simply text strings.
That's a fast, proven, versatile solution and easy to extend for more attributes.
In addition you could have a table where you register meta-data (like data type and more) for each attribute. You could use this meta-data to check whether the attribute can be cast to its attached type (and passes additional tests stored in the meta-table) in a trigger on INSERT OR UPDATE to maintain integrity.
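The answer describes this check as a PostgreSQL trigger on INSERT OR UPDATE. As a rough analogue only, here is an SQLite version run from Python's sqlite3: a BEFORE INSERT trigger looks up the registered type and aborts when the text value does not read as that type. The table names, the two supported types and the exact tests are assumptions, and only INSERT is covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute_def (name TEXT PRIMARY KEY, datatype TEXT NOT NULL);
    CREATE TABLE user_attr (user_id INTEGER, name TEXT REFERENCES attribute_def(name), value TEXT);

    -- Reject values that do not survive a round trip through the registered type.
    CREATE TRIGGER user_attr_check BEFORE INSERT ON user_attr
    WHEN (CASE (SELECT datatype FROM attribute_def WHERE name = NEW.name)
              WHEN 'integer' THEN CAST(CAST(NEW.value AS INTEGER) AS TEXT) <> NEW.value
              WHEN 'date' THEN date(NEW.value) IS NULL
              ELSE 0
          END)
    BEGIN
        SELECT RAISE(ABORT, 'value does not match registered data type');
    END;

    INSERT INTO attribute_def VALUES ('age', 'integer'), ('birthday', 'date');
""")

def try_insert(conn, user_id, name, value):
    """Return True if the row was accepted, False if the trigger rejected it."""
    try:
        conn.execute("INSERT INTO user_attr VALUES (?, ?, ?)", (user_id, name, value))
        return True
    except sqlite3.DatabaseError:
        return False

print(try_insert(conn, 1, "age", "42"), try_insert(conn, 1, "age", "forty"))
```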
Be aware that many standard features of the RDBMS are not easily available for such a storage regime. Basically, you run into most of the problems you have with an EAV model ("entity-attribute-value"). You can find very good advice for that under this related question on dba.SE.

MS Access Distinct Records in Recordset

So, I once again seem to have an issue with MS Access being finicky, although it seems to also be an issue when trying similar queries in SSMS (SQL Server Management Studio).
I have a collection of tables, loosely defined as follows:
table widget_mfg { id (int), name (nvarchar) }
table widget { id (int), name (nvarchar), mfg_id (int) }
table widget_component { id (int), name (nvarchar), widget_id (int), component_id }
table component { id (int), name (nvarchar), ... } -- There are ~25 columns in this table
What I'd like to do is query the database and get a list of all components that a specific manufacturer uses. I've tried some of these queries:
SELECT c.*, wc.widget_id, w.mfg_id
FROM ((widget_component wc INNER JOIN widget w ON wc.widget_id = w.id)
INNER JOIN widget_manufacturer wm on w.mfg_id = wm.id)
INNER JOIN component c on c.id = wc.component_id
WHERE wm.id = 1
The previous example displays duplicates of any part that is contained in multiple widget_component lists for different widgets.
I've also tried doing:
SELECT DISTINCT c.id, c.name, wc.widget_id, w.mfg_id
FROM component c, widget_component wc, widget w, widget_manufacturer wm
WHERE wm.id=w.mfg_id AND wm.id = 1
This doesn't display anything at all. I was reading about sub-queries, but I do not understand how they work or how they would apply to my current application.
Any assistance in this would be beneficial.
As an aside, I am not very good with either MS Access or SQL in general. I know the basics, but not a lot beyond that.
Edit:
I just tried this code, and it works to get all the component IDs while limiting them to a single entry each. How do I go about using the results of this to get the rest of the component data (component.*), where the IDs from the first part are used to select this data?
SELECT DISTINCT c.part_no
FROM component c, widget w, widget_component wc, widget_manufacturer wm
WHERE(((c.id=wc.component_id AND wc.widget_id=w.id AND w.mfg_id=wm.id AND wm.id=1)))
(P.S. this is probably not the best way to do this, but I am still learning SQL.)
What I'd like to do is query the database and get a list of all components that a specific manufacturer uses
There are several ways to do this. IN is probably the easiest to write:
SELECT c.*
FROM component c
WHERE c.id IN (SELECT wc.component_id
FROM widget w
INNER JOIN widget_component wc
ON w.id = wc.widget_id
WHERE w.mfg_id = 123)
The IN subquery finds all the component ids that a specific manufacturer uses. The outer query then selects every component whose id is in that result. It doesn't matter if an id is in there once or 1000 times; the outer query will return the component record only once.
The other ways of doing this are using an EXISTS subquery or joining to the subquery (but then you do need to de-dup the result).
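A quick check that the IN and EXISTS forms return each component once, run through Python's sqlite3 rather than Access. The widget_mfg table is left out (the filter only needs widget.mfg_id), and the sample widgets and components are invented; "bolt" is deliberately used by two widgets of the same manufacturer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widget (id INTEGER PRIMARY KEY, name TEXT, mfg_id INTEGER);
    CREATE TABLE widget_component (id INTEGER PRIMARY KEY, widget_id INTEGER, component_id INTEGER);
    CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO widget VALUES (1, 'W1', 123), (2, 'W2', 123), (3, 'W3', 999);
    INSERT INTO component VALUES (10, 'bolt'), (11, 'nut'), (12, 'gear');
    INSERT INTO widget_component (widget_id, component_id) VALUES (1, 10), (2, 10), (2, 11), (3, 12);
""")

IN_QUERY = """
    SELECT c.* FROM component c
    WHERE c.id IN (SELECT wc.component_id
                   FROM widget w
                   INNER JOIN widget_component wc ON w.id = wc.widget_id
                   WHERE w.mfg_id = ?)
    ORDER BY c.id
"""

# Correlated EXISTS: for each component, is there any matching widget row?
EXISTS_QUERY = """
    SELECT c.* FROM component c
    WHERE EXISTS (SELECT 1
                  FROM widget w
                  INNER JOIN widget_component wc ON w.id = wc.widget_id
                  WHERE w.mfg_id = ? AND wc.component_id = c.id)
    ORDER BY c.id
"""

def components_for(conn, query, mfg_id):
    return conn.execute(query, (mfg_id,)).fetchall()

print(components_for(conn, IN_QUERY, 123))  # [(10, 'bolt'), (11, 'nut')]
```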
It sounds like your component-to-widget relationship is one-to-many, hence the duplicates (i.e., the same component is used by more than one widget).
Your SELECT is almost OK:
SELECT c.*, wc.widget_id, w.mfg_id
but the wc.widget_id is causing the duplicates (per the assumption above).
So remove wc.widget_id from the SELECT, or else aggregate it (MIN, MAX, COUNT, etc.). Removing it is easier. If you aggregate, remember to add a GROUP BY clause.
Try this:
SELECT DISTINCT c.*, w.mfg_id
Also, FWIW, it's generally better practice to list the field names instead of using *.
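A small sketch of why dropping wc.widget_id fixes the duplicates, using Python's sqlite3 with invented sample data (the manufacturer table is omitted). With widget_id in the select list, the same component used by two widgets yields two distinct rows; without it, DISTINCT collapses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widget (id INTEGER PRIMARY KEY, name TEXT, mfg_id INTEGER);
    CREATE TABLE widget_component (id INTEGER PRIMARY KEY, widget_id INTEGER, component_id INTEGER);
    CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO widget VALUES (1, 'W1', 123), (2, 'W2', 123);
    INSERT INTO component VALUES (10, 'bolt'), (11, 'nut');
    INSERT INTO widget_component (widget_id, component_id) VALUES (1, 10), (2, 10), (2, 11);
""")

FROM_CLAUSE = """
    FROM component c
    INNER JOIN widget_component wc ON c.id = wc.component_id
    INNER JOIN widget w ON wc.widget_id = w.id
    WHERE w.mfg_id = ?
"""

def with_widget_id(conn, mfg_id):
    # widget_id differs per widget, so (bolt, W1) and (bolt, W2) stay separate.
    return conn.execute("SELECT DISTINCT c.*, wc.widget_id, w.mfg_id" + FROM_CLAUSE
                        + "ORDER BY c.id", (mfg_id,)).fetchall()

def without_widget_id(conn, mfg_id):
    # Every remaining column is identical for the two bolt rows, so DISTINCT merges them.
    return conn.execute("SELECT DISTINCT c.*, w.mfg_id" + FROM_CLAUSE
                        + "ORDER BY c.id", (mfg_id,)).fetchall()

print(without_widget_id(conn, 123))  # [(10, 'bolt', 123), (11, 'nut', 123)]
```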

SQL multiple join on many to many tables + comma separation

I have these tables:
media
id (int primary key)
uri (varchar)
media_to_people
media_id (int primary key)
people_id (int primary key)
people
id (int primary key)
name (varchar)
role (int) -- role specifies whether the person is an artist, publisher, writer, actor, etc. relative to the media, and has range (1-10)
This is a many-to-many relation.
I want to fetch a media and all its associated people in a select. So if a media has 10 people associated with it, all 10 must come.
Furthermore, if multiple people with the same role exist for a given media, they must come as comma-separated values under a column for that role.
Result headings must look like: media.id, media.uri, people.name(actor), people.name(artist), people.name(publisher) and so on.
I'm using sqlite.
SQLite doesn't have the "pivot" functionality you'd need for starters, and the "comma separated values" part is definitely a presentation issue that it would be absurd (and possibly unfeasible) to try to push into any database layer, whatever dialect of SQL may be involved -- it's definitely a part of the job you'd do in the client, e.g. a reporting facility or programming language.
Use SQL for data access, and leave presentation to other layers.
How you get your data is
SELECT media.id, media.uri, people.name, people.role
FROM media
JOIN media_to_people ON (media.id = media_to_people.media_id)
JOIN people ON (media_to_people.people_id = people.id)
WHERE media.id = ?
ORDER BY people.role, people.name
(the ? is one way to indicate a parameter in SQLite, to be bound to the specific media id you're looking for in ways that depend on your client); the data will come from the DB to your client code in several rows, and your client code can easily put them into the single column form that you want.
It's hard for us to say how to code the client-side part w/o knowing anything about the environment or language you're using as the client. But in Python for example:
import collections

def showit(dataset):
    # Group the names by role; mediaid and mediauri are the same on every row.
    by_role = collections.defaultdict(list)
    mediaid = mediauri = None
    for mediaid, mediauri, name, role in dataset:
        by_role[role].append(name)
    headers = ['mediaid', 'mediauri']
    result = [str(mediaid), str(mediauri)]
    for role in sorted(by_role):
        headers.append('people(%s)' % role)
        result.append(','.join(by_role[role]))
    return ' '.join(headers) + '\n' + ' '.join(result)
Even this won't quite match your spec: you ask for headers such as 'people(artist)' while you also specify that the role is encoded as an int, and mention no way to go from the int to the string 'artist', so it's obviously impossible to match your spec exactly... but it's as close as my ingenuity can get;-).
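An end-to-end sketch of the "query in SQL, pivot in the client" approach using Python's sqlite3. ROLE_NAMES is a hypothetical int-to-label mapping (the question gives none), and the sample media and people are invented. Because the query orders by role and name, the names inside each column come out sorted.

```python
import collections
import sqlite3

# Hypothetical mapping; the question does not say which int means which role.
ROLE_NAMES = {1: "actor", 2: "artist", 3: "publisher"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media (id INTEGER PRIMARY KEY, uri TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, role INTEGER);
    CREATE TABLE media_to_people (media_id INTEGER, people_id INTEGER,
                                  PRIMARY KEY (media_id, people_id));
    INSERT INTO media VALUES (1, 'file:///film.mkv');
    INSERT INTO people VALUES (1, 'Bo', 1), (2, 'Al', 1), (3, 'Cy', 2);
    INSERT INTO media_to_people VALUES (1, 1), (1, 2), (1, 3);
""")

def media_with_people(conn, media_id):
    rows = conn.execute("""
        SELECT media.id, media.uri, people.name, people.role
        FROM media
        JOIN media_to_people ON (media.id = media_to_people.media_id)
        JOIN people ON (media_to_people.people_id = people.id)
        WHERE media.id = ?
        ORDER BY people.role, people.name
    """, (media_id,)).fetchall()
    if not rows:
        return None
    # Collect names per role on the client, keeping the ORDER BY order.
    by_role = collections.defaultdict(list)
    for _, _, name, role in rows:
        by_role[role].append(name)
    record = {"media.id": rows[0][0], "media.uri": rows[0][1]}
    for role in sorted(by_role):
        label = ROLE_NAMES.get(role, str(role))
        record["people.name(%s)" % label] = ",".join(by_role[role])
    return record

print(media_with_people(conn, 1))
```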
I agree with Alex Martelli's answer, that you should get the data in multiple rows and do some processing in your application.
If you try to do this with just joins, you need to join to the people table for each role type, and if there are multiple people in each role, your query will have Cartesian products between these roles.
So you need to do this with GROUP_CONCAT() and produce a scalar subquery in your select-list for each role:
SELECT m.id, m.uri,
(SELECT GROUP_CONCAT(name)
FROM media_to_people JOIN people ON (people_id = id)
WHERE media_id = m.id AND role = 1) AS Actors,
(SELECT GROUP_CONCAT(name)
FROM media_to_people JOIN people ON (people_id = id)
WHERE media_id = m.id AND role = 2) AS Artists,
(SELECT GROUP_CONCAT(name)
FROM media_to_people JOIN people ON (people_id = id)
WHERE media_id = m.id AND role = 3) AS Publishers
FROM media m;
This is truly ugly! Don't try this at home!
Take our advice, and don't try to format the pivot table using only SQL.
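For completeness, the scalar-subquery query above does run as written on SQLite; here it is through Python's sqlite3 with invented sample rows. Note that SQLite does not guarantee the order of the names inside GROUP_CONCAT, and a role with no people comes back as NULL (None), which is one more thing the client would have to handle anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media (id INTEGER PRIMARY KEY, uri TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, role INTEGER);
    CREATE TABLE media_to_people (media_id INTEGER, people_id INTEGER,
                                  PRIMARY KEY (media_id, people_id));
    INSERT INTO media VALUES (1, 'file:///film.mkv');
    INSERT INTO people VALUES (1, 'Bo', 1), (2, 'Al', 1), (3, 'Cy', 2);
    INSERT INTO media_to_people VALUES (1, 1), (1, 2), (1, 3);
""")

PIVOT_QUERY = """
    SELECT m.id, m.uri,
           (SELECT GROUP_CONCAT(name)
            FROM media_to_people JOIN people ON (people_id = id)
            WHERE media_id = m.id AND role = 1) AS Actors,
           (SELECT GROUP_CONCAT(name)
            FROM media_to_people JOIN people ON (people_id = id)
            WHERE media_id = m.id AND role = 2) AS Artists,
           (SELECT GROUP_CONCAT(name)
            FROM media_to_people JOIN people ON (people_id = id)
            WHERE media_id = m.id AND role = 3) AS Publishers
    FROM media m
"""

def pivot_rows(conn):
    # One row per media; one comma-separated string (or None) per role.
    return conn.execute(PIVOT_QUERY).fetchall()

print(pivot_rows(conn))
```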