MySQL Join from multiple options to select one value - sql

I am putting together a nice little database for adding values to options. These are all set up through a map (Has and Belongs to Many) table, because many options point to a single value.
So I am trying to specify 3 option ids and a single id in the value table: four integers that point to a single value, across three tables. I am running into a problem with the WHERE part of the statement, because if multiple values share an option there are many results, and I need just a single result.
SELECT value.id, value.name FROM value
LEFT JOIN (option_map_value, option_table)
ON (value.id = option_map_value.value_id AND option_map_value.option_table_id = option_table.id)
WHERE option_table.id IN (5, 2, 3) AND value.y_axis_id = 16;
The problem with the statement seems to be the IN in the WHERE clause. If one of the numbers in the IN() list is different, there are multiple results, which is not good.
I have tried DISTINCT, which works if there is one result, but returns many if there are many. The closest we have gotten is adding a count, to return the value with the most options at the top.
So is there a way to make the WHERE more specific? I cannot break it out into option_table.id = 5 AND option_table.id = 2, because that fails. But can the WHERE clause be more specific?
Maybe I am being pedantic, but I would like to return just the single result instead of a count of results. Any ideas?

You were very close, considering the DISTINCT:
SELECT v.id,
v.name
FROM VALUE v
LEFT JOIN OPTION_MAP_VALUE omv ON omv.value_id = v.id
LEFT JOIN OPTION_TABLE ot ON ot.id = omv.option_table_id
WHERE ot.id IN (5, 2, 3)
AND v.y_axis_id = 16
GROUP BY v.id, v.name
HAVING COUNT(*) = 3
You were on the right track, but you need GROUP BY rather than DISTINCT, so that the HAVING clause can count the matched options for each value.
Caveat emptor:
The GROUP BY/HAVING COUNT version of the query is dependent on your data model having a composite key, unique or primary, defined for the two columns involved (value_id and option_table_id). If this is not in place, the database will not stop duplicates being added. If duplicate rows are possible in the data, this version can return false positives because a value_id could have 3 associations to the option_table_id 5 - which would satisfy the HAVING COUNT(*) = 3.
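The caveat above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are invented for illustration: value 2 is linked to option 5 three times, which is only possible because the mapping table here deliberately has no unique key. Counting distinct option ids instead of rows is one way to guard against this, assuming duplicates cannot be prevented at the schema level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE value (id INTEGER PRIMARY KEY, name TEXT, y_axis_id INTEGER);
-- No unique key on (value_id, option_table_id), on purpose.
CREATE TABLE option_map_value (value_id INTEGER, option_table_id INTEGER);
INSERT INTO value VALUES (1, 'wanted', 16), (2, 'false positive', 16);
INSERT INTO option_map_value VALUES (1, 5), (1, 2), (1, 3);
INSERT INTO option_map_value VALUES (2, 5), (2, 5), (2, 5);
""")

def matching_ids(count_expr):
    """Run the GROUP BY/HAVING query with the given counting expression."""
    sql = f"""
        SELECT v.id
        FROM value v
        JOIN option_map_value omv ON omv.value_id = v.id
        WHERE omv.option_table_id IN (5, 2, 3)
          AND v.y_axis_id = 16
        GROUP BY v.id, v.name
        HAVING {count_expr} = 3
        ORDER BY v.id
    """
    return [row[0] for row in conn.execute(sql)]

naive_ids = matching_ids("COUNT(*)")
strict_ids = matching_ids("COUNT(DISTINCT omv.option_table_id)")
print(naive_ids)   # [1, 2]  value 2 slips through on duplicate rows
print(strict_ids)  # [1]
```

With a composite unique key in place, both variants return the same rows, so COUNT(DISTINCT ...) only matters when duplicates are possible.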
Using JOINs:
A safer, though more involved, approach is to join onto the table that can have multiple options, as often as you have criteria:
SELECT v.id,
v.name
FROM VALUE v
JOIN OPTION_MAP_VALUE omv5 ON omv5.value_id = v.id
AND omv5.option_table_id = 5
JOIN OPTION_MAP_VALUE omv2 ON omv2.value_id = v.id
AND omv2.option_table_id = 2
JOIN OPTION_MAP_VALUE omv3 ON omv3.value_id = v.id
AND omv3.option_table_id = 3
WHERE v.y_axis_id = 16
GROUP BY v.id, v.name
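As a rough sketch of the join-per-criterion idea, the snippet below joins the mapping table once for each required option id, so that each alias has to find its own matching row. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL. The helper name values_with_all_options and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE value (id INTEGER PRIMARY KEY, name TEXT, y_axis_id INTEGER);
CREATE TABLE option_map_value (value_id INTEGER, option_table_id INTEGER);
INSERT INTO value VALUES (1, 'all three', 16), (2, 'only two', 16), (3, 'duplicates', 16);
INSERT INTO option_map_value VALUES
    (1, 5), (1, 2), (1, 3),
    (2, 5), (2, 2),
    (3, 5), (3, 5), (3, 5);
""")

def values_with_all_options(option_ids, y_axis_id):
    """Return values linked to every option id (hypothetical helper)."""
    # One join on the mapping table per required option id.
    joins = "\n".join(
        f"JOIN option_map_value m{i} ON m{i}.value_id = v.id "
        f"AND m{i}.option_table_id = ?"
        for i in range(len(option_ids))
    )
    sql = (
        "SELECT DISTINCT v.id, v.name FROM value v\n"
        f"{joins}\n"
        "WHERE v.y_axis_id = ? ORDER BY v.id"
    )
    # Placeholders appear in join order first, then the WHERE parameter.
    return conn.execute(sql, [*option_ids, y_axis_id]).fetchall()

print(values_with_all_options([5, 2, 3], 16))  # [(1, 'all three')]
```

Duplicate mapping rows cannot produce a false positive here, because a value without option 2 or 3 never satisfies those joins, however many rows it has for option 5.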

Related

Filter by disjunction and conjunction in n:m relationships

I have 4 tables like this (simplified to relevant columns):
thing:
id::numeric, some_info::text
mapping:
thing_id::numeric, common_property_id::numeric
common_properties:
id::numeric, some_info::text
children_of_thing
id::numeric, thing_id::numeric, other_stuff::text
common_properties are properties that can be attached to things, any number of properties can be attached to any number of things (n:m).
children_of_thing are children of a thing, any thing can have any number of children (1:n).
Now I want to count all children of things that have specific properties applied. My query looks like this:
SELECT COUNT(DISTINCT children_of_thing.id) as count
, thing.id as thing_id
FROM children_of_thing
INNER JOIN thing ON children_of_thing.thing_id = thing.id
LEFT JOIN (mapping
INNER JOIN common_properties ON mapping.common_property_id = common_properties.id
) ON thing.id = mapping.thing_id
WHERE ...
GROUP BY thing.id
Now my question is: how can I select only things that have (pseudocode ahead as it does obviously NOT work like that)
common_property.some_info = 'foo' AND (
common_property.some_info = 'bar' OR
common_property.some_info = 'baz'
)
I know I can query for just the inner part ('bar' OR 'baz') just fine as-is - the DISTINCT on my select takes care that I only count everything once even when two entries for the same children are returned when both conditions are true.
I can also query if ALL properties (foo AND bar AND baz) are true by querying for (foo OR bar OR baz) and adding HAVING COUNT(DISTINCT common_property.id) = 3 (this would only get results where all three conditions yield one result each).
But how can I do both, AND and OR?
Additional notes: The database layout is fixed; I cannot change it with reasonable effort. It's an already running system. Plus, the layout makes total sense for our application.
On top of this, speed is critical, so I'm looking for a query as optimized as possible - I would want to avoid having a separate subquery for each Conjunction (even though that is a solution, of course).
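One possible way to express "foo AND (bar OR baz)" without a subquery per conjunction is conditional aggregation in the HAVING clause: one SUM(CASE ...) per AND-ed group, with the OR alternatives folded into an IN list. The sketch below is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than the original database, the sample rows are invented, and it makes no claim about performance on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (id INTEGER PRIMARY KEY, some_info TEXT);
CREATE TABLE common_properties (id INTEGER PRIMARY KEY, some_info TEXT);
CREATE TABLE mapping (thing_id INTEGER, common_property_id INTEGER);
CREATE TABLE children_of_thing (id INTEGER PRIMARY KEY, thing_id INTEGER, other_stuff TEXT);
INSERT INTO thing VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO common_properties VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
-- thing 1: foo+bar, thing 2: foo, thing 3: bar+baz, thing 4: foo+bar+baz
INSERT INTO mapping VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3);
INSERT INTO children_of_thing VALUES
    (1, 1, ''), (2, 1, ''), (3, 2, ''), (4, 3, ''), (5, 4, ''), (6, 4, ''), (7, 4, '');
""")

sql = """
SELECT t.id AS thing_id, COUNT(DISTINCT c.id) AS child_count
FROM children_of_thing c
JOIN thing t ON c.thing_id = t.id
JOIN mapping m ON m.thing_id = t.id
JOIN common_properties p ON p.id = m.common_property_id
GROUP BY t.id
-- One aggregate per AND-ed group; OR alternatives share an IN list.
HAVING SUM(CASE WHEN p.some_info = 'foo' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN p.some_info IN ('bar', 'baz') THEN 1 ELSE 0 END) > 0
ORDER BY t.id
"""
counts = conn.execute(sql).fetchall()
print(counts)  # [(1, 2), (4, 3)]
```

COUNT(DISTINCT c.id) is still needed, because every child row is repeated once per matching property row.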
--
Below is my actual query, which incorporates another intermediate table (thing_variants - every thing is of one of the variants in thing_variants, 1:n) and filters on the children. I originally left all that out of the question as it complicates everything, but if it helps to understand the entire situation, I have no need to hide it.
SELECT COUNT (DISTINCT children.id) as children_count
, things.id as things_id
, things.meta_info as thing_info
, thing_variants.id as thing_variant_id
, thing_variants.meta_info as thing_variant_info
FROM children
INNER JOIN things ON children.thing_id = things.id
INNER JOIN thing_variants ON things.thing_variant_id = thing_variants.id
LEFT JOIN (mapping
INNER JOIN properties ON mapping.property_id = properties.id
) ON thing_variants.id = mapping.thing_variant_id
WHERE (children.something IS FALSE)
AND (children.another_thing IS TRUE)
AND (children.foobarbaz IS NULL)
GROUP BY things.id, thing_variants.id

COUNT is outputting more than one row

I am having a problem with my SQL query using the count function.
When I don't have an inner join, it counts 55 rows. When I add the inner join into my query, it adds a lot to it. It suddenly became 102 rows.
Here is my SQL Query:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'
GROUP BY [fmsStage].[dbo].[File].[FILENUMBER]
Also, I have to do TOP 1 at the SELECT statement because it returns 51 rows with random numbers inside of them. (They are probably not random, but I can't figure out what they are.)
What do I have to do to make it just count the rows from [fmsStage].[dbo].[file].[FILENUMBER]?
First, your query would be much clearer like this:
SELECT COUNT(f.[FILENUMBER])
FROM [fmsStage].[dbo].[File] f INNER JOIN
[fmsStage].[dbo].[Container] c
ON f.[FILENUMBER] = c.[FILENUMBER]
WHERE f.[RELATIONCODE] = 'SHIP02' AND
c.DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08';
No GROUP BY is necessary. Otherwise you'll just get one row per file number, which doesn't seem as useful as the overall count.
Note: You might want COUNT(DISTINCT f.[FILENUMBER]). Your question doesn't provide enough information to make a judgement.
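To make the difference between these counts concrete, here is a minimal sketch using Python's sqlite3 module in place of SQL Server. The table and column names are simplified, and the rows are invented: file 100 has two containers in the date range, so the join yields one row per container rather than one per file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file (filenumber INTEGER PRIMARY KEY, relationcode TEXT);
CREATE TABLE container (filenumber INTEGER, deliverydate TEXT);
INSERT INTO file VALUES (100, 'SHIP02'), (101, 'SHIP02'), (102, 'OTHER');
INSERT INTO container VALUES
    (100, '2016-10-06'), (100, '2016-10-07'),
    (101, '2016-10-08'), (101, '2016-11-01'),
    (102, '2016-10-07');
""")

# Shared FROM/WHERE part, mirroring the query in the question.
base = """
FROM file f
JOIN container c ON f.filenumber = c.filenumber
WHERE f.relationcode = 'SHIP02'
  AND c.deliverydate BETWEEN '2016-10-06' AND '2016-10-08'
"""

joined_rows = conn.execute("SELECT COUNT(f.filenumber) " + base).fetchone()[0]
distinct_files = conn.execute("SELECT COUNT(DISTINCT f.filenumber) " + base).fetchone()[0]
per_file = conn.execute(
    "SELECT COUNT(f.filenumber) " + base + " GROUP BY f.filenumber ORDER BY f.filenumber"
).fetchall()

print(joined_rows)     # 3  one per matching container
print(distinct_files)  # 2  one per file with a matching container
print(per_file)        # [(2,), (1,)]  one row per file when GROUP BY is kept
```

The per-file output is the likely source of the "random numbers" in the question: with GROUP BY, each row is the container count for one file.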
Just remove the GROUP BY clause:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'

How can I do a SQL join to get a value 4 tables farther from the value provided?

My title is probably not very clear, so I made a little schema to explain what I'm trying to achieve. The xxxx_uid labels are foreign keys linking two tables.
Goal: Retrieve a column from the grids table by giving a proj_uid value.
I'm not very good with SQL joins and I don't know how to build a single query that will achieve that.
Currently, I'm doing 3 queries to perform the operation:
1) This gives me a res_uid to work with:
select res_uid from results where results.proj_uid = VALUE order by res_uid asc limit 1
2) This gives me a rec_uid to work with:
select rec_uid from receptor_results
inner join results on results.res_uid = receptor_results.res_uid
where receptor_results.res_uid = res_uid_VALUE order by rec_uid asc limit 1
3) Get the grid column I want from the grids table:
select grid_name from grids
inner join receptors on receptors.grid_uid = grids.grid_uid
where receptors.rec_uid = rec_uid_VALUE;
Is it possible to write a single SQL query that gives me the same result as the 3 I'm currently running?
You're not limited to one JOIN in a query:
select grids.grid_name
from grids
inner join receptors
on receptors.grid_uid = grids.grid_uid
inner join receptor_results
on receptor_results.rec_uid = receptors.rec_uid
inner join results
on results.res_uid = receptor_results.res_uid
where results.proj_uid = VALUE;
select g.grid_name
from results r
join receptor_results rr on r.res_uid = rr.res_uid
join receptors rec on rec.rec_uid = rr.rec_uid
join grids g on g.grid_uid = rec.grid_uid
where r.proj_uid = VALUE
A small note about names: typically in SQL a table is named for a single item, not the group. Thus "result" not "results", "receptor" not "receptors", etc. As you work with SQL this will make sense, and names like yours will seem strange. Also, one less character to type!
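One detail worth checking is that the original three-step lookup takes the first row at each step (limit 1), while the chained join returns one row per matching path. The sketch below, using Python's sqlite3 module with invented sample rows, shows both, plus one way to approximate the three-step behaviour by adding ORDER BY and LIMIT 1. It assumes the first result has at least one receptor row, since the inner joins would otherwise skip it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (res_uid INTEGER PRIMARY KEY, proj_uid INTEGER);
CREATE TABLE receptor_results (res_uid INTEGER, rec_uid INTEGER);
CREATE TABLE receptors (rec_uid INTEGER PRIMARY KEY, grid_uid INTEGER);
CREATE TABLE grids (grid_uid INTEGER PRIMARY KEY, grid_name TEXT);
INSERT INTO results VALUES (1, 7), (2, 7), (3, 8);
INSERT INTO receptor_results VALUES (1, 10), (1, 11), (2, 12), (3, 11);
INSERT INTO receptors VALUES (10, 100), (11, 101), (12, 100);
INSERT INTO grids VALUES (100, 'north'), (101, 'south');
""")

# Walk results -> receptor_results -> receptors -> grids in one query.
chain = """
SELECT g.grid_name
FROM results r
JOIN receptor_results rr ON rr.res_uid = r.res_uid
JOIN receptors rec ON rec.rec_uid = rr.rec_uid
JOIN grids g ON g.grid_uid = rec.grid_uid
WHERE r.proj_uid = ?
"""

# Every path from project 7 to a grid.
all_names = [row[0] for row in conn.execute(chain, (7,))]

# Lowest res_uid first, then lowest rec_uid, like the three-step version.
first_name = conn.execute(
    chain + " ORDER BY r.res_uid, rr.rec_uid LIMIT 1", (7,)
).fetchone()[0]

print(sorted(all_names))  # ['north', 'north', 'south']
print(first_name)         # north
```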

The "where" condition worked not as expected ("or" issue)

I have a problem joining these 4 tables.
Model of my database
I want to count the number of reservations with different sorts (user [mrbs_users.id], room [mrbs_room.room_id], area [mrbs_area.area_id]).
However, when I execute this query (for the user with id = 1):
SELECT count(*)
FROM mrbs_users JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id
WHERE mrbs_entry.start_time BETWEEN "145811700" and "1463985000"
or
mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and mrbs_users.id = 1
The result is the total number of reservations of every user, not just the user who has the id = 1.
So if anyone could help me.. Thanks in advance.
Use parentheses in the where clause whenever you have more than one condition. Your where is parsed as:
WHERE (mrbs_entry.start_time BETWEEN "145811700" and "1463985000" ) or
(mrbs_entry.end_time BETWEEN "1458120600" and "1463992200" and
mrbs_users.id = 1
)
Presumably, you intend:
WHERE (mrbs_entry.start_time BETWEEN 145811700 and 1463985000 or
mrbs_entry.end_time BETWEEN 1458120600 and 1463992200
) and
mrbs_users.id = 1
Also, I removed the quotes around the string constants. It is bad practice to mix data types, and in some databases, the conversion between types can make the query less efficient.
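The precedence rule (AND binds tighter than OR) is easy to verify on a toy table. The following sketch uses Python's sqlite3 module with a simplified, hypothetical entry table and made-up time ranges. Only the grouping of the conditions differs between the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, user_id INTEGER, start_time INTEGER, end_time INTEGER);
INSERT INTO entry VALUES
    (1, 1, 120, 160),  -- user 1, both times in range
    (2, 2, 130, 170),  -- user 2, both times in range
    (3, 1, 10, 20),    -- user 1, nothing in range
    (4, 2, 50, 180);   -- user 2, only end_time in range
""")

# Parsed as: start-range OR (end-range AND user_id = 1)
without_parens = conn.execute("""
    SELECT COUNT(*) FROM entry
    WHERE start_time BETWEEN 100 AND 200
       OR end_time BETWEEN 150 AND 250 AND user_id = 1
""").fetchone()[0]

# Intended: (start-range OR end-range) AND user_id = 1
with_parens = conn.execute("""
    SELECT COUNT(*) FROM entry
    WHERE (start_time BETWEEN 100 AND 200
           OR end_time BETWEEN 150 AND 250)
      AND user_id = 1
""").fetchone()[0]

print(without_parens)  # 2  includes user 2's entry
print(with_parens)     # 1  only user 1's entry
```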
The problem is caused by how the conditions in the WHERE clause are grouped.
So it should be:
WHERE (mrbs_entry.start_time BETWEEN 145811700 AND 1463985000
OR mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200)
AND mrbs_users.id = 1
Moreover, when you use only INNER JOIN (JOIN), you can move these criteria into the ON clause instead. For inner joins the result is the same, and the optimizer will usually produce the same plan either way, so treat this as a matter of style rather than a guaranteed speedup.
Your query in this case would look like this:
SELECT COUNT(*)
FROM mrbs_users
JOIN mrbs_entry ON mrbs_users.name=mrbs_entry.create_by
JOIN mrbs_room ON mrbs_entry.room_id = mrbs_room.id
AND
(mrbs_entry.start_time BETWEEN 145811700 AND 1463985000
OR mrbs_entry.end_time BETWEEN 1458120600 AND 1463992200
)
AND mrbs_users.id = 1
JOIN mrbs_area ON mrbs_room.area_id = mrbs_area.id

Select first or random row in group by

I have this query using PostgreSQL 9.1 (9.2 as soon as our hosting platform upgrades):
SELECT
media_files.album,
media_files.artist,
ARRAY_AGG (media_files.id) AS media_file_ids
FROM
media_files
INNER JOIN playlist_media_files ON media_files.id = playlist_media_files.media_file_id
WHERE
playlist_media_files.playlist_id = 1
GROUP BY
media_files.album,
media_files.artist
ORDER BY
media_files.album ASC
and it's working fine, the goal was to extract album/artist combinations and in the result set have an array of media files ids for that particular combo.
The problem is that I have another column in media files, which is artwork.
artwork is unique for each media file (even in the same album) but in the result set I need to return just the first of the set.
So, for an album that has 10 media files, I also have 10 corresponding artworks, but I would like just to return the first (or a random picked one for that collection).
Is that possible to do with only SQL/Window Functions (first_value over..)?
Yes, it's possible. First, let's tweak your query by adding alias and explicit column qualifiers so it's clear what comes from where - assuming I've guessed correctly, since I can't be sure without table definitions:
SELECT
mf.album,
mf.artist,
ARRAY_AGG (mf.id) AS media_file_ids
FROM
"media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
pmf.playlist_id = 1
GROUP BY
mf.album,
mf.artist
ORDER BY
mf.album ASC
Now you can either use a subquery in the SELECT list or maybe use DISTINCT ON, though it looks like any solution based on DISTINCT ON will be so convoluted as not to be worth it.
What you really want is something like a pick_arbitrary_value_agg aggregate that just picks the first value it sees and throws the rest away. There is no such aggregate, and it isn't really worth implementing one for this job. You could use min(artwork) or max(artwork), and you may find that this actually performs better than the later solutions.
To use a subquery, leave the ORDER BY as it is and add the following as an extra column in your SELECT list:
(SELECT mf2.artwork
FROM media_files mf2
WHERE mf2.artist = mf.artist
AND mf2.album = mf.album
LIMIT 1) AS picked_artwork
You can at a performance cost randomize the selected artwork by adding ORDER BY random() before the LIMIT 1 above.
Alternately, here's a quick and dirty way to implement selection of a random row in-line:
(array_agg(artwork))[width_bucket(random(),0,1,count(artwork)::integer)]
Since there's no sample data I can't test these modifications. Let me know if there's an issue.
"First" pick
Wouldn't it be simpler / cheaper to just use min():
SELECT m.album
,m.artist
,array_agg(m.id) AS media_file_ids
,min(m.artwork) AS artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
GROUP BY m.album, m.artist
ORDER BY m.album, m.artist;
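As a quick check of the min() approach, here is a sketch that runs the same shape of query on SQLite through Python's sqlite3 module. SQLite has no array_agg, so group_concat stands in for it and the ids are split and sorted in Python. The sample rows and file names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media_files (id INTEGER PRIMARY KEY, album TEXT, artist TEXT, artwork TEXT);
CREATE TABLE playlist_media_files (playlist_id INTEGER, media_file_id INTEGER);
INSERT INTO media_files VALUES
    (1, 'A', 'X', 'a1.png'), (2, 'A', 'X', 'a0.png'),
    (3, 'B', 'Y', 'b.png'),  (4, 'C', 'Z', 'c.png');
INSERT INTO playlist_media_files VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

rows = conn.execute("""
    SELECT m.album, m.artist,
           group_concat(m.id) AS media_file_ids,  -- stand-in for array_agg
           min(m.artwork)     AS artwork          -- one artwork per group
    FROM playlist_media_files p
    JOIN media_files m ON m.id = p.media_file_id
    WHERE p.playlist_id = 1
    GROUP BY m.album, m.artist
    ORDER BY m.album, m.artist
""").fetchall()

# group_concat order is not guaranteed, so sort the ids after splitting.
albums = [
    (album, artist, sorted(int(i) for i in ids.split(",")), artwork)
    for album, artist, ids, artwork in rows
]
print(albums)  # [('A', 'X', [1, 2], 'a0.png'), ('B', 'Y', [3], 'b.png')]
```

Note that min() picks the smallest artwork value in sort order, not the artwork of the first media file, which is fine when any one artwork per album will do.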
Arbitrary / random pick
If you are looking for a random selection, @Craig already provided a solution with truly random picks.
You could also use a CTE to avoid additional scans on the (possibly big) base table and then run two separate (cheap) subqueries on the small result set.
For arbitrary selection - not truly random, the result will depend on the physical order of rows in the table and implementation-specifics:
WITH x AS (
SELECT m.album, m.artist, m.id, m.artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
)
SELECT a.album, a.artist, a.media_file_ids, b.artwork
FROM (
SELECT album, artist, array_agg(id) AS media_file_ids
FROM x
GROUP BY album, artist
) a
JOIN (
SELECT DISTINCT ON (1,2) album, artist, artwork
FROM x
) b USING (album, artist);
For truly random results, you can add an ORDER BY .. random() like this to subquery b:
JOIN (
SELECT DISTINCT ON (1, 2) album, artist, artwork
FROM x
ORDER BY 1, 2, random()
) b USING (album, artist);