Mysql many to many query - sql

Having a mental block with going around this query.
I have the following tables:
review_list: has most of the data, but in this case the only important thing is review_id, the id of the record that I am currently interested in (int)
variant_list: model (varchar), enabled (bool)
variant_review: model (varchar), id (int)
variant_review is a many to many table linking the review_id in review_list to the model(s) in variant_list review and contains (eg):
..
test1,22
test2,22
test4,22
test1,23
test2,23... etc
variant_list is a list of all possible models and whether they are enabled and contains (eg):
test1,TRUE
test2,TRUE
test3,TRUE
test4,TRUE
what I am after in mysql is a query that when given a review_id (ie, 22) will return a resultset that will list each value in variant_review.model, and whether it is present for the given review_id such as:
test1,1
test2,1
test3,0
test4,1
or similar, which I can farm off to some webpage with a list of checkboxes for the types. This would show all the models available and whether each one was present in the table

Given a bit more information about the column names:
Select variant_list.model
, Case When variant_review.model Is Not Null Then 1 Else 0 End As HasReview
From variant_list
Left join variant_review
On variant_review.model = variant_list.model
And variant_review.review_id = 22
Just for completeness, if it is the case that you can have multiple rows in the variant_review table with the same model and review_id, then you need to do it differently:
Select variant_list.model
, Case
When Exists (
Select 1
From variant_review As VR
Where VR.model = variant_list.model
And VR.review_id = 22
) Then 1
Else 0
End
From variant_list

Related

SELECT DISTINCT to return at most one row

Given the following db structure:
Regions
id
name
1
EU
2
US
3
SEA
Customers:
id
name
region
1
peter
1
2
henry
1
3
john
2
There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.
So maybe something like this:
SELECT DISTINCT region FROM customers WHERE id IN (?, ?)
The problem with this is that the result will be either an array (if the customers are not within the same region) or a single value.
Is there are more elegant way of solving this constraint? I was thinking of SELECT INTO and use a temporary table, or I could SELECT COUNT(DISTINCT region) and then do another SELECT for the actual value if the count is less than 2, but I'd like to avoid the performance hit if possible.
There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.
This query should work:
WITH q AS (
SELECT
COUNT( * ) AS CountCustomers,
COUNT( DISTINCT c.Region ) AS CountDistinctRegions,
-- MIN( c.Region ) AS MinRegion
FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion
FROM
Customers AS c
WHERE
c.CustomerId = $senderCustomerId
OR
c.CustomerId = $receiverCustomerId
)
SELECT
CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 2 THEN 'OK' ELSE 'BAD' END AS "Status",
CASE WHEN q.CountDistinctRegions = 2 THEN q.MinRegion END AS SingleRegion
FROM
q
The above query will always return a single row with 2 columns: Status and SingleRegion.
SQL doesn't have a "SINGLE( col )" aggregate function (i.e. a function that is NULL unless the aggregation group has a single row), but we can abuse MIN (or MAX) with a CASE WHEN COUNT() in a CTE or derived-table as an equivalent operation.
Alternatively, windowing-functions could be used, but annoyingly they don't work in GROUP BY queries despite being so similar, argh.
Once again, this is the ISO SQL committee's fault, not PostgreSQL's.
As your Region column is UUID you cannot use it with MIN, but I understand it should work with FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion.
As for the columns:
The Status column is either 'OK' or 'BAD' based on those business-constraints you mentioned. You might want to change it to a bit column instead of a textual one, though.
The SingleRegion column will be NOT NULL (with a valid region) if CountDistinctRegions = 2 regardless of CountCustomers, but feel free to change that, just-in-case you still want that info.
For anybody else who's interested in a simple solution, I finally came up with the (kind of obvious) way to do it:
SELECT
r.region
FROM
customers s
INNER JOIN customers r ON
s.region = r.region
WHERE s.id = 'sender_id' and r.id = 'receiver_id';
Huge credit to SELECT DISTINCT to return at most one row who helped me out a lot on this and also posted a viable solution.

BigQuery: Count consecutive string matches between two fields

I have two tables:
Master_Equipment_Index (alias mei) containing the columns serial_num & model_num
Customer Equipment Index (alias cei) containing the columns account_num, serial_num, & model_num
Originally, guard rails were not implemented to require model attribute input in the mei data whenever new serial_num records were inserted. Whenever that serial_num is later associated with a customer account in the cei data, the model data carries over as null.
What I want to do is backfill the missing model attributes in the cei data from the mei data based on the strongest sequential character match from other similar serial_nums in the mei data.
To further clarify, I don't have access to mass update the mei or cei datasets. I can formalize change requests, but I need to build the function out to prove its worth. So this has to be done outside of any mass action query updates.
cei.account_num
cei.serial_num
cei.model
mei.serial_num
mei.model
serial_num_str_match
row_number
123123123
B4I4SXT1708
null
B4I4SXT178A
Model_Series1
8
1
123123123
B4I4SXT1708
null
B4I4SXTAS34
Model_Series2
7
2
In the table example above row_number 1 has a higher consecutive string match count than row_number 2. I want to only return row_number 1 and populate cei.model with mei.model's value.
cei.account_num
cei.serial_num
cei.model
mei.serial_num
mei.model
serial_num_str_match
row_number
123123123
B4I4SXT1708
Model_Series1
B4I4SXT178A
Model_Series1
8
1
To give an idea as to scale:
The mei data contains 1 million records and the cei data contains 50,000 records. I would have to take and perform this string match for every single cei.account_num, cei.serial_num where the cei.model data is null.
With mac addresses, the first 6 characters identify the vendor and I could look at things similarly in the sample SQL below to help reduce the volume of transactional 1:Many lookups taking place:
/* need to define function */
create temp function string_match_function(x any type, y any type) as (
syntax to generate consecutive string count matches between x and y
);
select * from (
select
c.account_num,
c.serial_num,
m.model,
row_number() over(partition by c.account_num, c.serial_num order by serial_num_str_match desc) seq
from (
select
c.account_num,
c.serial_num,
m.model,
needed: string_match_function(c.serial_num, m.serial_num) as serial_num_str_match
from (
select * from cei where model is null
) c
join (
select * from mei where model is not null
) m on substr(c.serial_num,1,6) = substr(m.serial_num,1,6)
) as a
) as b
where seq = 1
I've looked at different options, some coming from https://hoffa.medium.com/new-in-bigquery-persistent-udfs-c9ea4100fd83, but I'm not finding what I need.
Any insight or direction would be greatly appreciated.
This UDF function counts the equal charachters in each string from the begin:
CREATE TEMP FUNCTION string_match_function(x string, y string)
RETURNS int64
LANGUAGE js
AS r"""
var i=0;
var max_len= Math.min(x.length,y.length);
for(i=0;i<max_len;i++){
if(x[i]!=y[i]) {return i;}
}
return i;
""";
select string_match_function("12a345","1234")
gives 2, because both start with 12

Comparing SQL Queries

I'm considering two SQL queries (Oracle) and I shall state the difference between them by showing examples. The queries are as follows:
/* Query 1 */
SELECT DISTINCT countryCode
FROM Member M
WHERE NOT EXISTS(
(SELECT organisation FROM Member
WHERE countryCode = 'US')
MINUS
(SELECT organisation FROM Member
WHERE countryCode = M.countryCode ) )
/* Query 2 */
SELECT DISTINCT M1.countryCode
FROM Member M1, Member M2
WHERE M2.countryCode = 'US' AND M1.organisation = M2.organisation
GROUP BY M1.countryCode
HAVING COUNT(M1.organisation) = (
SELECT COUNT(M3.organisation)
FROM Member M3 WHERE M3.countryCode = 'US' )
As far as I get it, these queries give back the countries which are members of the same organisations as the United States. The scheme of Member is (countryCode, organisation, type) with bold ones as primary key. Example: ('US', 'UN', 'member'). The member table contains only a few tuples and is not complete, so when executing (1) and (2) both yield the same result (e.g. Germany, since here only 'UN' and 'G7' are in the table).
So how can I show that these queries can actually return different results?
That means how can I create an example table instance of Member such that the queries yield different results based on that table instance?
Thanks for your time and effort.
The queries will result all the country codes which are members at least with all the organization the US is member with (it could be member with other organizations as well).
I've finally found an example to show that they can actually output different values based on the same Member instance. This is actually the case when Member contains duplicates. For query 1 this is not a problem, but for query 2 it actually affects the result, since here the number of memberships is crucial. So, if you have e.g. ('FR', 'UN', member) twice in Member the HAVING COUNT(M1.organisation) will return a different value as SELECT COUNT(M3.organisation) and 'FR' would not be part of the output.
Thanks to all for your constructive suggestions, that helped me a lot.
The first query would return countries whose list of memberships is longer than that of the US. It does require they include the same organizations as US but it could be more.
The second one requires the two membership lists to be identical.
As for creating an example with real data, start with an empty table and add this row:
insert into Member (countryCode, organisation)
values ('Elbonia', 'League of Fictitious Nations')
By the way a full outer join would let you characterize the difference symmetrically:
select
mo.countryCode || ' ' ||
case
when count(case when mu.organisation is null then 1 else null end) > 0
and count(case when mo.organisation is null then 1 else null end) > 0
then 'and US both have individual memberships they that do not have in common.'
when count(case when mo.organisation is null then 1 else null end) > 0
then 'is a member of some organisations that US is not a member of.'
when count(case when mo.organisation is null then 1 else null end) > 0
then 'is not a member of some organisations that US is a member of.'
else 'has identical membership as US.'
end
from
(select * from Member where countryCode = 'US') mu
full outer join
(select * from Member where countryCode = '??') mo
on mo.organisation = mu.organisation
Please forgive the dangling prepositions.
And a disk note, though duplicate rows are not allowed in normalized data, this query has no problem with those.

Alternative to using GROUP BY without aggregates to retrieve distinct "best" result

I'm trying to retrieve the "Best" possible entry from an SQL table.
Consider a table containing tv shows:
id, title, episode, is_hidef, is_verified
eg:
id title ep hidef verified
1 The Simpsons 1 True False
2 The Simpsons 1 True True
3 The Simpsons 1 True True
4 The Simpsons 2 False False
5 The Simpsons 2 True False
There may be duplicate rows for a single title and episode which may or may not have different values for the boolean fields. There may be more columns containing additional info, but thats unimportant.
I want a result set that gives me the best row (so is_hidef and is_verified are both "true" where possible) for each episode. For rows considered "equal" I want the most recent row (natural ordering, or order by an abitrary datetime column).
3 The Simpsons 1 True True
5 The Simpsons 2 True False
In the past I would have used the following query:
SELECT * FROM shows WHERE title='The Simpsons' GROUP BY episode ORDER BY is_hidef, is_verified
This works under MySQL and SQLite, but goes against the SQL spec (GROUP BY requiring aggragates etc etc). I'm not really interested in hearing again why MySQL is so bad for allowing this; but I'm very interested in finding an alternative solution that will work on other engines too (bonus points if you can give me the django ORM code for it).
Thanks =)
In some way similar to Andomar's but this one really works.
select C.*
FROM
(
select min(ID) minid
from (
select distinct title, ep, max(hidef*1 + verified*1) ord
from tbl
group by title, ep) a
inner join tbl b on b.title=a.title and b.ep=a.ep and b.hidef*1 + b.verified*1 = a.ord
group by a.title, a.ep, a.ord
) D inner join tbl C on D.minid = C.id
The first level tiebreak converts bits (SQL Server) or MySQL boolean to an integer value using *1, and the columns are added to produce the "best" value. You can give them weights, e.g. if hidef > verified, then use hidef*2 + verified*1 which can produce 3,2,1 or 0.
The 2nd level looks among those of the "best" scenario and extracts the minimum ID (or some other tie-break column). This is essential to reduce a multi-match result set to just one record.
In this particular case (table schema), the outer select uses the direct key to retrieve the matched records.
This is basically a form of the groupwise-maximum-with-ties problem. I don't think there is a SQL standard compliant solution. A solution like this would perform nicely:
SELECT s2.id
, s2.title
, s2.episode
, s2.is_hidef
, s2.is_verified
FROM (
select distinct title
, episode
from shows
where title = 'The Simpsons'
) s1
JOIN shows s2
ON s2.id =
(
select id
from shows s3
where s3.title = s1.title
and s3.episode = s1.episode
order by
s3.is_hidef DESC
, s3.is_verified DESC
limit 1
)
But given the cost of readability, I would stick with your original query.

How to write a query returning non-chosen records

I have written a psychological testing application, in which the user is presented with a list of words, and s/he has to choose ten words which very much describe himself, then choose words which partially describe himself, and words which do not describe himself. The application itself works fine, but I was interested in exploring the meta-data possibilities: which words have been most frequently chosen in the first category, and which words have never been chosen in the first category. The first query was not a problem, but the second (which words have never been chosen) leaves me stumped.
The table structure is as follows:
table words: id, name
table choices: pid (person id), wid (word id), class (value between 1-6)
Presumably the answer involves a left join between words and choices, but there has to be a modifying statement - where choices.class = 1 - and this is causing me problems. Writing something like
select words.name
from words left join choices
on words.id = choices.wid
where choices.class = 1
and choices.pid = null
causes the database manager to go on a long trip to nowhere. I am using Delphi 7 and Firebird 1.5.
TIA,
No'am
Maybe this is a bit faster:
SELECT w.name
FROM words w
WHERE NOT EXISTS
(SELECT 1
FROM choices c
WHERE c.class = 1 and c.wid = w.id)
Something like that should do the trick:
SELECT name
FROM words
WHERE id NOT IN
(SELECT DISTINCT wid -- DISTINCT is actually redundant
FROM choices
WHERE class == 1)
SELECT words.name
FROM
words
LEFT JOIN choices ON words.id = choices.wid AND choices.class = 1
WHERE choices.pid IS NULL
Make sure you have an index on choices (class, wid).