SQL - Returning combinations of ID from tables if combination not found in another

I have four tables of interest: users, models, questions and labels.
"Labels" contains rows that describe a user's answer to a question on a model, e.g. username TEXT, mid INT, qid INT, answer TEXT
I am interested in finding out what model-question pairs the user is still required to provide. A user is asked to provide an answer for every combination of model and questions that appear in their respective tables. So for a given username, I can have rows of model ids and question ids.
Users:
Bacon, XXXX, XXXX, XXXX, ...
Mark, XXXX, XXXX, XXXX, ...
Models:
1, climateOne, XXXX, ....
2, climateTwo, XXXX, ....
3, climateTwo, XXXX, ....
Questions:
1, "Is this a question?"
2, "And another?"
labels:
Bacon, 1, 2, "Yes is was..."
Bacon, 3, 2, "Another what?!"
So the result of asking "what model question pairs has 'Bacon' not completed" would be the return of:
(1,1)(2,1)(2,2)(3,1)

To get all possible combinations of models and questions, use a cross join:
SELECT models.id,
questions.id
FROM models
CROSS JOIN questions
Then filter out those that have been completed:
...
WHERE (models.id, questions.id) NOT IN (SELECT mid, qid
FROM labels);

Thanks go to CL. for the point in the right direction. This code worked for me, as I was using Python's sqlite3 package. I have included the username filter too. However, I have no clue whether it is efficient or not :)
SELECT m.mid, q.qid
FROM models m CROSS JOIN questions q
WHERE NOT EXISTS(
SELECT 1
FROM labels l
WHERE l.username = ? AND l.mid = m.mid AND l.qid = q.qid
)
Thanks all!
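For anyone who wants to try this end to end, here is a minimal sketch using Python's sqlite3 with the sample data from the question. The column names (mid, qid) follow the query above; the helper name missing_pairs and the ORDER BY are my own additions for illustration.

```python
import sqlite3

# In-memory database populated with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE models (mid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE questions (qid INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE labels (username TEXT, mid INTEGER, qid INTEGER, answer TEXT);
    INSERT INTO models VALUES (1, 'climateOne'), (2, 'climateTwo'), (3, 'climateTwo');
    INSERT INTO questions VALUES (1, 'Is this a question?'), (2, 'And another?');
    INSERT INTO labels VALUES
        ('Bacon', 1, 2, 'Yes is was...'),
        ('Bacon', 3, 2, 'Another what?!');
""")


def missing_pairs(conn, username):
    """Return the (mid, qid) pairs that `username` has not answered yet."""
    rows = conn.execute(
        """
        SELECT m.mid, q.qid
        FROM models m CROSS JOIN questions q
        WHERE NOT EXISTS (
            SELECT 1
            FROM labels l
            WHERE l.username = ? AND l.mid = m.mid AND l.qid = q.qid
        )
        ORDER BY m.mid, q.qid  -- added only to make the output deterministic
        """,
        (username,),
    )
    return rows.fetchall()


print(missing_pairs(conn, "Bacon"))
```

On efficiency: the NOT EXISTS subquery runs once per model-question pair, so an index on labels(username, mid, qid) would probably help on larger tables, though I have not measured it.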

Related

Rails 4 ActiveRecord: Order records by attribute and association if it exists

I have three models that I am having trouble ordering:
User(:id, :name, :email)
Capsule(:id, :name)
Outfit(:id, :name, :capsule_id, :likes_count)
Like(:id, :outfit_id, :user_id)
I want to get all the Outfits that belong to a Capsule and order them by the likes_count.
This is fairly trivial and I can get them like this:
Outfit.where(capsule_id: capsule.id).includes(:likes).order(likes_count: :desc)
However, I then want to also order the outfits so that if a given user has liked it, it appears higher in the list.
Example if I have the following outfit records:
Outfit(id: 1, capsule_id: 2, likes_count: 1)
Outfit(id: 2, capsule_id: 2, likes_count: 2)
Outfit(id: 3, capsule_id: 2, likes_count: 2)
And the given user has only liked outfit with id 3, the returned order should be IDs: 3, 2, 1
I'm sure this is fairly easy, but I can't seem to get it. Any help would be greatly appreciated :)
Postgres SQL with a subquery
SELECT outfits.*
FROM outfits
LEFT OUTER JOIN (SELECT likes.outfit_id, 1 AS weight
FROM likes
WHERE likes.user_id = @user_id) AS user_likes
ON user_likes.outfit_id = outfits.id
WHERE outfits.capsule_id = @capsule_id
ORDER BY user_likes.weight ASC, outfits.likes_count DESC;
Postgres sorts NULL values last in ascending order, so outfits the user has liked (weight 1) come before the rest. I am not sure how this would look as an Arel query. You can try converting it using this cheatsheet.
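If you want to check the ordering outside Postgres, note that NULL placement differs between databases (SQLite puts NULLs first in ascending order). Below is a rough sketch in Python's sqlite3 of a variant that does not depend on NULL ordering: it sorts on whether the join found a like at all. The user id 7, the helper name and the final id tie-break are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outfits (id INTEGER PRIMARY KEY, capsule_id INTEGER, likes_count INTEGER);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, outfit_id INTEGER, user_id INTEGER);
    INSERT INTO outfits VALUES (1, 2, 1), (2, 2, 2), (3, 2, 2);
    INSERT INTO likes (outfit_id, user_id) VALUES (3, 7);  -- hypothetical user 7 likes outfit 3
""")


def ordered_outfit_ids(conn, capsule_id, user_id):
    """Outfit ids for a capsule: liked-by-user first, then by likes_count."""
    rows = conn.execute(
        """
        SELECT outfits.id
        FROM outfits
        LEFT OUTER JOIN (SELECT DISTINCT outfit_id
                         FROM likes
                         WHERE user_id = ?) AS user_likes
          ON user_likes.outfit_id = outfits.id
        WHERE outfits.capsule_id = ?
        ORDER BY (user_likes.outfit_id IS NOT NULL) DESC,  -- 1 when liked, 0 otherwise
                 outfits.likes_count DESC,
                 outfits.id ASC                            -- assumed tie-break
        """,
        (user_id, capsule_id),
    )
    return [row[0] for row in rows]


print(ordered_outfit_ids(conn, capsule_id=2, user_id=7))
```

With the three outfits from the question this gives 3, 2, 1 for the user who liked outfit 3, and plain likes_count order for a user with no likes.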

SQL join both ways to one result

I have two tables "TestItem" and "Connector" where Connector is used for relating two items in "TestItem".
I have two questions in prioritized order. But first, feel free to suggest alternative approaches. I'm open for suggestion to completely rethink my approach to what I want to achieve here.
Question 1) How to get relations both ways returned in the same result
Question 2) How to filter the most efficient way for specific items
Q1)
Two tables
Table: "TestItem"
ID, ITEM
1, "John Doe"
2, "Peggy Sue"
3, "Papa Sue"
Table: "Connector"
MOTHER, CHILD
1,2
The connector table will be used for several purposes (see below), but this is a distilled scenario for the equal type of connection, for instance marriage. If "John Doe" is married to "Peggy Sue", that information should also be sufficient to return "Peggy Sue" as married to "John Doe".
I can do this in two queries, but for efficiency (especially regarding my question 2) I'd appreciate this done in one query, so an implementation is not dependent on which way the connection is defined.
What is the most efficient way to do this?
Two queries approach to illustrate how the data can be fetched, but how one connection is missed one way or the other.
-- Connector through "mother"-part
SELECT ITEM, SUBITEM FROM TestItem
INNER JOIN (
SELECT MOTHER, ITEM AS SUBITEM
FROM Connector
INNER JOIN TestItem ON Connector.CHILD = TestItem.ID
) AS SUB ON TestItem.ID = SUB.MOTHER
/* WHERE ITEM = "John Doe" return "Peggy Sue" => Correct
WHERE ITEM = "Peggy Sue" return nothing => Wrong
*/
-- Connector through "child"-part
SELECT ITEM, SUBITEM FROM TestItem
INNER JOIN (
SELECT CHILD, ITEM AS SUBITEM
FROM Connector
INNER JOIN TestItem ON Connector.MOTHER= TestItem.ID
) AS SUB ON TestItem.ID = SUB.CHILD
/* WHERE ITEM = "John Doe" return nothing => Wrong
WHERE ITEM = "Peggy Sue" return "John Doe" => Correct
*/
Q2) Having the two approaches returned in one result may increase the amount of data involved, and hence bring down performance. If my focus is Peggy Sue, I assume sorting out only the relevant data as early as possible will improve performance. Is there a neat way of doing this from top level, or will every sub-query require an added WHERE?
PS: Some more information of the bigger perspective.
I'm planning to use the connector table for several purposes, both of the mentioned equal type, like colleagues, family, friends, etc, but also for hierarchical connection types like mother/child, leader/employee, country/city.
Thus solutions eliminating the mother/child-type connection may not suit my bigger purpose.
Basically I'm requesting how to handle the equal type of connections without losing the opportunity to use the same architecture and data for hierarchical connections.
Peggy Sue may through the same dataset be defined as daughter of Papa Sue through the relation
Mother, Child, Mother_type, Child_type
3, 2, Father, Daughter
1, 2, Married to, Married to
(But this is as mentioned on the side of what I'm requesting here. )
UNION ALL might be what you are looking for:
select mother.id as connectedToId,
mother.item as connectedToItem,
'Mother' as role
from TestItem ti
join Connector c on c.child = ti.id
join TestItem mother on c.mother = mother.id
where ti.item = 'John Doe'
union all
select child.id as connectedToId,
child.item as connectedToItem,
'Child' as role
from TestItem ti
join Connector c on c.mother = ti.id
join TestItem child on c.child = child.id
where ti.item = 'John Doe'
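To see that this picks up the connection from either side, here is a small sketch using Python's sqlite3 with the sample rows from the question. The table layout is taken from the question; the constant BOTH_WAYS and the function name connections are just names I picked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestItem (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE Connector (mother INTEGER, child INTEGER);
    INSERT INTO TestItem VALUES (1, 'John Doe'), (2, 'Peggy Sue'), (3, 'Papa Sue');
    INSERT INTO Connector VALUES (1, 2);
""")

# Same UNION ALL as above, with the item name as a named parameter
# so that both branches are filtered by the same value.
BOTH_WAYS = """
    SELECT mother.id AS connectedToId, mother.item AS connectedToItem, 'Mother' AS role
    FROM TestItem ti
    JOIN Connector c ON c.child = ti.id
    JOIN TestItem mother ON c.mother = mother.id
    WHERE ti.item = :name
    UNION ALL
    SELECT child.id AS connectedToId, child.item AS connectedToItem, 'Child' AS role
    FROM TestItem ti
    JOIN Connector c ON c.mother = ti.id
    JOIN TestItem child ON c.child = child.id
    WHERE ti.item = :name
"""


def connections(conn, name):
    """Return (id, item, role) for every item connected to `name`, either way."""
    return conn.execute(BOTH_WAYS, {"name": name}).fetchall()


print(connections(conn, "John Doe"))
print(connections(conn, "Peggy Sue"))
```

Regarding Q2, the filter sits inside each branch here, so each half only touches rows for the item of interest. Whether that is the fastest layout depends on your database and indexes, so treat it as a starting point.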

How to order by largest amount of identical entries with Rails?

I have a survey where users can post answers and since the answers are being saved in the db as a foreign key for each question, I'd like to know which answer got the highest rating.
So if the DB looks somewhat like this:
answer_id
1
1
2
how can I find that the answer with an id of 1 was selected more times than the one with an id of 2 ?
EDIT
So far I've done this:
@question = AnswerContainer.where(user_id: params[:user_id])
This lists the things a given user has voted for, but, obviously, that's not what I need.
You could try:
YourModel.group(:answer_id).count
For your example this returns something like: {1 => 2, 2 => 1}
You can do group by and then sort
Select answer_id, count(*) as maxsel
From poll
Group by answer_id
Order by maxsel desc
As stated in rails documentation (http://api.rubyonrails.org/classes/ActiveRecord/Calculations.html) when you use group with count, active record "returns a Hash whose keys represent the aggregated column, and the values are the respective amounts"
Person.group(:city).count
# => { 'Rome' => 5, 'Paris' => 3 }
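If you want to see the group-and-sort query run outside Rails, here is a minimal sketch in Python's sqlite3 using the three rows from the question. The table name poll comes from the SQL answer above; the function name and the secondary sort on answer_id are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poll (answer_id INTEGER);
    INSERT INTO poll VALUES (1), (1), (2);
""")


def answers_by_popularity(conn):
    """Return (answer_id, count) pairs, most frequently chosen answer first."""
    return conn.execute(
        """
        SELECT answer_id, COUNT(*) AS maxsel
        FROM poll
        GROUP BY answer_id
        ORDER BY maxsel DESC, answer_id  -- answer_id breaks ties (assumed)
        """
    ).fetchall()


ranking = answers_by_popularity(conn)
print(ranking)       # all answers, most popular first
print(ranking[0])    # the single most selected answer
```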

Query that finds objects with ALL association ids (Rails 4)

BACKGROUND: Posts have many Communities through CommunityPosts. I understand the following query returns posts associated with ANY ONE of these community_ids.
Post.joins(:communities).where(communities: { id: [1,2,3] })
OBJECTIVE: I'd like to query for posts associated with ALL THREE community_ids in the array. Posts having communities 1, 2, and 3 as associations
EDIT: Please assume that length of the array is unknown. Used this array for explanation purposes.
Try this:
ids=[...]
Post.joins(:communities).where(communities: { id: ids }).group('posts.id').having('COUNT(DISTINCT communities.id) = ?', ids.size)
ids = [1, 2, 3] # and so on
# note: IN matches posts that belong to any of these communities
Post.joins(:communities).where("communities.id IN (?)", ids)
Hope it helps.
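The "match ALL ids" idea is easier to see on the join table directly: keep only rows for the wanted communities, group by post, and require as many distinct communities as there are ids. Here is a sketch in Python's sqlite3; the table name community_posts, the sample rows and the function name are assumptions for illustration, not taken from the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE community_posts (post_id INTEGER, community_id INTEGER);
    INSERT INTO community_posts VALUES
        (10, 1), (10, 2), (10, 3),           -- post 10 is in all three
        (11, 1), (11, 2),                    -- post 11 is missing community 3
        (12, 1), (12, 2), (12, 3), (12, 4);  -- post 12 has one extra
""")


def posts_in_all(conn, community_ids):
    """Return ids of posts associated with every id in `community_ids`."""
    ids = sorted(set(community_ids))  # array length is unknown, so build placeholders
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT post_id
        FROM community_posts
        WHERE community_id IN ({placeholders})
        GROUP BY post_id
        HAVING COUNT(DISTINCT community_id) = ?
        ORDER BY post_id
    """
    return [row[0] for row in conn.execute(sql, ids + [len(ids)])]


print(posts_in_all(conn, [1, 2, 3]))
```

Posts with extra communities still qualify, because the IN filter only counts the requested ids.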

Filtering model with HABTM relationship

I have 2 models - Restaurant and Feature. They are connected via has_and_belongs_to_many relationship. The gist of it is that you have restaurants with many features like delivery, pizza, sandwiches, salad bar, vegetarian option,… So now when the user wants to filter the restaurants and lets say he checks pizza and delivery, I want to display all the restaurants that have both features; pizza, delivery and maybe some more, but it HAS TO HAVE pizza AND delivery.
If I do a simple .where('features IN (?)', params[:features]) I (of course) get the restaurants that have either one, so pizza or delivery or both, which is not at all what I want.
My SQL/Rails knowledge is kinda limited since I'm new to this but I asked a friend and now I have this huuuge SQL that gets the job done:
Restaurant.find_by_sql(['SELECT restaurant_id FROM (
SELECT features_restaurants.*, ROW_NUMBER() OVER(PARTITION BY restaurants.id ORDER BY features.id) AS rn FROM restaurants
JOIN features_restaurants ON restaurants.id = features_restaurants.restaurant_id
JOIN features ON features_restaurants.feature_id = features.id
WHERE features.id in (?)
) t
WHERE rn = ?', params[:features], params[:features].count])
So my question is: is there a better - more Rails even - way of doing this? How would you do it?
Oh BTW I'm using Rails 4 on Heroku so it's a Postgres DB.
This is an example of a set-within-sets query. I advocate solving these with group by and having, because this provides a general framework.
Here is how this works in your case:
select fr.restaurant_id
from features_restaurants fr join
features f
on fr.feature_id = f.feature_id
group by fr.restaurant_id
having sum(case when f.feature_name = 'pizza' then 1 else 0 end) > 0 and
sum(case when f.feature_name = 'delivery' then 1 else 0 end) > 0
Each condition in the having clause is counting for the presence of one of the features -- "pizza" and "delivery". If both features are present, then you get the restaurant_id.
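Since the feature list comes from user input, the having clause can be generated with one condition per feature. A rough sketch in Python's sqlite3 follows; the column names follow the query above, while the sample rows, restaurant ids and the function name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE features (feature_id INTEGER PRIMARY KEY, feature_name TEXT);
    CREATE TABLE features_restaurants (restaurant_id INTEGER, feature_id INTEGER);
    INSERT INTO features VALUES (1, 'pizza'), (2, 'delivery'), (3, 'salad bar');
    INSERT INTO features_restaurants VALUES
        (100, 1), (100, 2), (100, 3),  -- pizza, delivery, salad bar
        (101, 1),                      -- pizza only
        (102, 2), (102, 3);            -- delivery, salad bar
""")


def restaurants_with_all(conn, feature_names):
    """Return restaurant ids that have every feature in `feature_names`."""
    names = list(feature_names)
    if not names:
        raise ValueError("need at least one feature name")
    # One presence check per requested feature, all joined with AND.
    condition = "SUM(CASE WHEN f.feature_name = ? THEN 1 ELSE 0 END) > 0"
    having = " AND ".join(condition for _ in names)
    sql = f"""
        SELECT fr.restaurant_id
        FROM features_restaurants fr
        JOIN features f ON fr.feature_id = f.feature_id
        GROUP BY fr.restaurant_id
        HAVING {having}
        ORDER BY fr.restaurant_id
    """
    return [row[0] for row in conn.execute(sql, names)]


print(restaurants_with_all(conn, ["pizza", "delivery"]))
```

Only the placeholders are interpolated into the SQL string; the feature names themselves are passed as bound parameters.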
How much data is in your features table? Is it just a table of ids and names?
If so, and you're willing to do a little denormalization, you can do this much more easily by encoding the features as a text array on restaurant.
With this scheme your queries boil down to
select * from restaurants where restaurants.features @> ARRAY['pizza', 'delivery']
If you want to maintain your features table because it contains useful data, you can store the array of feature ids on the restaurant and do a query like this:
select * from restaurants where restaurants.feature_ids @> ARRAY[5, 17]
If you don't know the ids up front, and want it all in one query, you should be able to do something along these lines:
select * from restaurants where restaurants.feature_ids @> ARRAY(
select id from features where name in ('pizza', 'delivery')
)
That last query might need some more consideration...
Anyways, I've actually got a pretty detailed article written up about Tagging in Postgres and ActiveRecord if you want some more details.
This is not a "copy and paste" solution, but if you follow these steps you will have a fast working query:
index the feature_name column (I'm assuming that the feature_id column is indexed on both tables)
place each feature_name param in exists():
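-- each exists() is correlated on the restaurant, so both features must belong to the same restaurant
select r.id as restaurant_id
from
restaurants r
where
exists(select true from features_restaurants fr join features f on fr.feature_id = f.feature_id where fr.restaurant_id = r.id and f.feature_name = 'pizza')
and
exists(select true from features_restaurants fr join features f on fr.feature_id = f.feature_id where fr.restaurant_id = r.id and f.feature_name = 'delivery')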
Maybe you're looking at it backwards?
Maybe try merging the restaurants returned by each feature.
Simplified:
pizza_restaurants = Feature.find_by_name('pizza').restaurants
delivery_restaurants = Feature.find_by_name('delivery').restaurants
pizza_delivery_restaurants = pizza_restaurants & delivery_restaurants
Obviously, this is a single instance solution. But it illustrates the idea.
UPDATE
Here's a dynamic method to pull in all filters without writing SQL (i.e. the "Railsy" way)
def get_restaurants_by_feature_names(features)
# accepts an array of feature names
restaurants = Restaurant.all
features.each do |f|
feature_restaurants = Feature.find_by_name(f).restaurants
restaurants = feature_restaurants & restaurants
end
return restaurants
end
Since it's an AND condition (OR conditions get dicey with Arel), I reread your stated problem, ignoring the SQL. I think this is what you want.
# in Restaurant
has_many :features
# in Feature
has_many :restaurants
# this is a contrived example. you may be doing something like
# where(name: 'pizza'). I'm just making this condition up. You
# could also make this more DRY by just passing in the name if
# that's what you're doing.
def self.pizza
where(pizza: true)
end
def self.delivery
where(delivery: true)
end
# query
Restaurant.features.pizza.delivery
Basically you call the association with ".features" and then you use the self methods defined on features. Hopefully I didn't misunderstand the original problem.
Cheers!
Restaurant
.joins(:features)
.where(features: {name: ['pizza','delivery']})
.group(:id)
.having('count(features.name) = ?', 2)
This seems to work for me. I tried it with SQLite though.