How to retrieve samples from the database? - sql

Suppose I have many tagged entities (e.g. blog posts with tags) to store in a SQL database. For example:
post1: work
post2: work, programming, java, work
post3: work, programming, sql
post4: vacation, photo
post5: vacation
post6: photo
Suppose also I have a list of tags
work, vacation
Now I'd like to get a sample of posts of size 2, i.e. two posts with tags from the list. For example
sample1: post1 and post2
sample2: post1 and post4
sample3: post2 and post5
In addition, I'd like the sample to contain all tags in the list. Note that sample1 does not meet this requirement, since the set of tags of its posts does not contain the tag vacation from the list.
I would also like every tag from the list to occur the same number of times in the sample. Let's consider 2 samples of size 4.
sample1: post1, post2, post3, post6
sample2: post1, post3, post4, post5
Note that sample1 does not meet this requirement since tag work occurs 3 times in it and vacation occurs only once.
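The two requirements above (every listed tag is covered, and all listed tags occur equally often) can be sketched as a small check. This is only an illustration in Python, not part of the original question; the names `TAGS_BY_POST` and `is_valid_sample` are hypothetical, and the duplicate "work" on post2 is assumed to count once.

```python
from collections import Counter

# Tags per post, copied from the example above (duplicates collapsed into sets).
TAGS_BY_POST = {
    "post1": {"work"},
    "post2": {"work", "programming", "java"},
    "post3": {"work", "programming", "sql"},
    "post4": {"vacation", "photo"},
    "post5": {"vacation"},
    "post6": {"photo"},
}


def is_valid_sample(sample, wanted_tags, tags_by_post=TAGS_BY_POST):
    """Return True if every wanted tag occurs in the sample, and equally often."""
    if not wanted_tags:
        return False
    counts = Counter()
    for post in sample:
        # Count only the tags we were asked about.
        counts.update(tags_by_post[post] & set(wanted_tags))
    occurrences = [counts[tag] for tag in wanted_tags]
    # Coverage: no wanted tag may be missing. Balance: all counts are equal.
    return min(occurrences) > 0 and len(set(occurrences)) == 1


print(is_valid_sample(["post1", "post2", "post3", "post6"], ["work", "vacation"]))  # False
print(is_valid_sample(["post1", "post3", "post4", "post5"], ["work", "vacation"]))  # True
```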
My question is: how to design a relational database and SQL query to retrieve samples of given size?

If you want to get all posts that have every tag in a comma-delimited list:
select postid
from post_tags
where find_in_set(tagid, @LIST) > 0
group by postid
having count(distinct tagid) = 1 + length(@LIST) - length(replace(@LIST, ',', ''));
If you want just a "sample" of them:
select postid
from (select postid
      from post_tags
      where find_in_set(tagid, @LIST) > 0
      group by postid
      having count(distinct tagid) = 1 + length(@LIST) - length(replace(@LIST, ',', ''))
     ) t
order by rand()
limit 5
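To make the queries above easier to try out, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no find_in_set, so the tag list is passed as bound parameters to IN instead; the single-table schema and the helper names are assumptions for illustration, not the answer's exact MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_tags (postid TEXT, tagid TEXT, PRIMARY KEY (postid, tagid))"
)
conn.executemany(
    "INSERT INTO post_tags VALUES (?, ?)",
    [
        ("post1", "work"),
        ("post2", "work"), ("post2", "programming"), ("post2", "java"),
        ("post3", "work"), ("post3", "programming"), ("post3", "sql"),
        ("post4", "vacation"), ("post4", "photo"),
        ("post5", "vacation"),
        ("post6", "photo"),
    ],
)


def _all_tags_sql(tags):
    # One placeholder per tag; the HAVING clause requires all of them to match.
    placeholders = ", ".join("?" for _ in tags)
    return (
        "SELECT postid FROM post_tags "
        f"WHERE tagid IN ({placeholders}) "
        "GROUP BY postid "
        "HAVING COUNT(DISTINCT tagid) = ?"
    )


def posts_with_all_tags(conn, tags):
    """Posts that carry every tag in `tags`, sorted by post id."""
    tags = sorted(set(tags))
    sql = _all_tags_sql(tags) + " ORDER BY postid"
    return [row[0] for row in conn.execute(sql, [*tags, len(tags)])]


def sample_posts(conn, tags, size):
    """A random sample of at most `size` posts that carry every tag in `tags`."""
    tags = sorted(set(tags))
    sql = f"SELECT postid FROM ({_all_tags_sql(tags)}) AS t ORDER BY RANDOM() LIMIT ?"
    return [row[0] for row in conn.execute(sql, [*tags, len(tags), size])]


print(posts_with_all_tags(conn, ["work", "programming"]))  # ['post2', 'post3']
print(posts_with_all_tags(conn, ["work", "vacation"]))     # []
```

With the example data no single post has both work and vacation, so this query alone returns nothing for that list; it selects posts that individually carry all the listed tags.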

Related

Optimization of SQL query based on attribute (not having specific value) in joined table

My models structure: Movie has_many :captions. Language of the Caption may be “en”, “de”, “fr”...
Problem:
An effective query to select Movies that don’t have Captions with an “en” language.
App that needs above runs on Rails, and for this I’m currently using something like this in Caption model:
def self.ids_of_movies_without_caption_in_en
a = (1..(Movie.last.lp.to_i)).to_a
b = Caption.in_lang("en").collect {|h| h.movie_id }
(a - b)
end
As you can see, I collect id’s (lp) of all movies and then I remove from that array id’s of those movies where Captions have “en” as a language. The outcome is an array of id’s of Movies I need.
Above works, but as you can imagine it’s quite “heavy”. I believe that there is a better (and maybe trivial) approach to it. However, being “fresh” with SQL, I ask for some guidance in writing an efficient query. This runs on PostgreSQL
Implementation in Rails (5.2) would be an additional bonus!
This is the situation: let's say in the database there are 1000 movies, and 4000 captions for those movies. There are of course movies that don't have any captions. Out of those 4000 captions 400 are in "en" language. The query I'm looking for would return 600 movies, where caption in "en" does not exist (including movies with 0 captions).
This is quite easy in SQL. I'm not quite sure what the tables look like, but something like this:
select movie_id
from captions
group by movie_id
having not bool_or(language = 'en');
If you also want movies that have no captions at all, use not exists:
select m.movie_id
from movies m
where not exists (select 1
                  from captions c
                  where c.movie_id = m.movie_id and
                        c.language = 'en'
                 );
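As a quick way to check the not exists approach, here is a small sketch using Python's built-in sqlite3 module. The table layout (movies.id, captions.movie_id, captions.language) is an assumption based on Rails conventions, not something the question confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE captions (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER REFERENCES movies(id),
        language TEXT
    );
    INSERT INTO movies (id, title) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    -- Movie 4 deliberately has no captions at all.
    INSERT INTO captions (movie_id, language)
    VALUES (1, 'en'), (1, 'de'), (2, 'de'), (2, 'fr'), (3, 'en');
    """
)

# Movies for which no 'en' caption exists, including movies with zero captions.
movie_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT m.id
        FROM movies m
        WHERE NOT EXISTS (SELECT 1
                          FROM captions c
                          WHERE c.movie_id = m.id
                            AND c.language = 'en')
        ORDER BY m.id
        """
    )
]
print(movie_ids)  # [2, 4]
```

In Rails, something like Movie.where.not(id: Caption.in_lang("en").select(:movie_id)) should produce a comparable NOT IN subquery, assuming movie_id is never NULL in captions.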

How to order by largest amount of identical entries with Rails?

I have a survey where users can post answers and since the answers are being saved in the db as a foreign key for each question, I'd like to know which answer got the highest rating.
So if the DB looks somewhat like this:
answer_id
1
1
2
how can I find that the answer with an id of 1 was selected more times than the one with an id of 2 ?
EDIT
So far I've done this:
@question = AnswerContainer.where(user_id: params[:user_id]) which lists the things a given user has voted for, but, obviously, that's not what I need.
You could try:
YourModel.group(:answer_id).count
For your example this returns something like: {1 => 2, 2 => 1}
You can group by answer_id and then sort by the count:
Select answer_id, count(*) as maxsel
From poll
Group by answer_id
Order by maxsel desc
As stated in the Rails documentation (http://api.rubyonrails.org/classes/ActiveRecord/Calculations.html), when you use group with count, Active Record "returns a Hash whose keys represent the aggregated column, and the values are the respective amounts":
Person.group(:city).count
# => { 'Rome' => 5, 'Paris' => 3 }
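The group-and-sort idea can be tried with a few lines of Python and the built-in sqlite3 module. This is only a sketch; the table name poll is taken from the SQL answer above, and the secondary sort on answer_id is an added assumption to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll (answer_id INTEGER)")
# The three votes from the question: answer 1 twice, answer 2 once.
conn.executemany("INSERT INTO poll VALUES (?)", [(1,), (1,), (2,)])

ranking = conn.execute(
    """
    SELECT answer_id, COUNT(*) AS maxsel
    FROM poll
    GROUP BY answer_id
    ORDER BY maxsel DESC, answer_id
    """
).fetchall()

print(ranking)        # [(1, 2), (2, 1)]
print(ranking[0][0])  # 1, the most frequently chosen answer
```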

Rails, order by number of matched tags and then by name

Here's what I want to do: Listing has a many-to-many relationship with Tag through Taggings. I want to allow a user to search for listings by title (of the listing) and name (of zero or more tags). I want to order the results first by the number of tags matched (greatest first), and then by title.
It seems like this question has been done before -- it might be as simple as matching this question (Ordering items with matching tags by number of tags that match) from MySQL. However, I'm not SQL-literate at all, which is why I'm asking for help.
Update:
Here is an example of what I want.
Say I have 4 listings.
listing1 has tags "humor," "funny," and "hilarious."
listing2 has tags "funny," "silly," and "goofy."
listing3 has tags "funny," "silly," and "goofy."
listing4 has the tag "completely serious."
If I make a search with the tags "funny" and "silly", what I should get back is listing2, listing3, listing1, and listing4 (ignoring titles for now).
Interesting problem. I think you might have to use some SQL sugar to do this scope.
Something like this:
Listing
.joins("LEFT JOIN taggings ON taggings.listing_id = listings.id")
.joins("LEFT JOIN tags ON tags.id = taggings.tag_id AND tags.name IN ('funny','silly')")
.group("listings.id")
.order("count(tags.id) DESC, listings.title")
Does that help?
Assuming you want a solution in pure ActiveRecord so as not to touch any SQL...
Listing.order("tags_count DESC, title")
This assumes a tags_count counter cache column on listings, which also keeps the query cheap. Note that it orders by the total number of tags on each listing, not by the number of tags matched.
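The ordering asked for in the question (most matched tags first, then title) can be checked against the four example listings with a small sketch in Python and the built-in sqlite3 module. The table and column names follow Rails conventions and are assumptions; the titles are simply "listing1" to "listing4".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE listings (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE taggings (listing_id INTEGER, tag_id INTEGER);
    INSERT INTO listings VALUES
        (1, 'listing1'), (2, 'listing2'), (3, 'listing3'), (4, 'listing4');
    INSERT INTO tags VALUES
        (1, 'humor'), (2, 'funny'), (3, 'hilarious'),
        (4, 'silly'), (5, 'goofy'), (6, 'completely serious');
    INSERT INTO taggings VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 2), (2, 4), (2, 5),
        (3, 2), (3, 4), (3, 5),
        (4, 6);
    """
)


def search(conn, tag_names):
    """Listings ordered by number of matched tags (descending), then by title."""
    placeholders = ", ".join("?" for _ in tag_names)
    sql = f"""
        SELECT l.title, COUNT(t.id) AS matched
        FROM listings l
        LEFT JOIN taggings tg ON tg.listing_id = l.id
        LEFT JOIN tags t ON t.id = tg.tag_id AND t.name IN ({placeholders})
        GROUP BY l.id
        ORDER BY matched DESC, l.title
    """
    return conn.execute(sql, list(tag_names)).fetchall()


print(search(conn, ["funny", "silly"]))
# [('listing2', 2), ('listing3', 2), ('listing1', 1), ('listing4', 0)]
```

Because the tag filter sits in the join condition rather than in a WHERE clause, listings with no matching tags are kept and simply get a count of 0.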

SQL Latest photos from contacts (grouped by contact)

The short version of this question is that I want to accomplish something along the lines of what's visible on Flickr's homepage once you're logged in. It shows the three latest photos of each of your friends, sorted by date but grouped by friend.
Here's a longer explanation: For example I have 3 friends: John, George and Andrea. The list I want to extract should look like this:
George
Photo - 2010-05-18
Photo - 2010-05-18
Photo - 2010-05-12
John
Photo - 2010-05-17
Photo - 2010-05-14
Photo - 2010-05-12
Andrea
Photo - 2010-05-15
Photo - 2010-05-15
Photo - 2010-05-15
The friend with the most recently uploaded photo is on top, followed by his or her next two most recent photos.
I'd like to do this from MySQL, and for the time being I got here:
SELECT photos.user_id, photos.id, photos.date_uploaded
FROM photos
WHERE photos.user_id IN (SELECT user2_id
FROM user_relations
WHERE user1_id = 8)
ORDER BY date_uploaded DESC
Where user1_id = 8 is the currently logged in user and user2_id are the ids of friends. This query indeed returns the latest files uploaded by the contacts of the user with id = 8 sorted by date. However I'd like to accomplish the grouping and limiting mentioned above.
Hopefully this makes sense. Thank you in advance.
Sometimes, the only way to achieve an end is to create a piece of SQL so ugly and heinous that the alternative of doing multiple queries becomes attractive :-)
I would just do one query to get a list of your friends then, for each friend, get the three most recent photos. Something like:
friend_list = sqlexec("select user2_id from user_relations where user1_id = "
    + current_user_id)
photolist = []
for friend in friend_list:
    photolist += sqlexec("select user_id, id, date_uploaded from photos"
        + " where user_id = "
        + friend.get("user2_id")
        + " order by date_uploaded desc limit 3")
# Now do something with photolist
You don't have to do it as one query any more than you're limited to one regular expression for matching a heinous pattern. Sure it'd be nice to be "clever" but it's rarely necessary. I prefer a pragmatic approach.
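The multiple-query approach described above could look roughly like this in runnable form. It is a sketch using Python's built-in sqlite3 module rather than MySQL, with bound parameters instead of string concatenation; the function name, the sample data, and the assumption that dates are stored as ISO strings are all illustrative.

```python
import sqlite3


def latest_photos_by_friend(conn, user_id, per_friend=3):
    """One query for the friend list, then one query per friend."""
    friends = [
        row[0]
        for row in conn.execute(
            "SELECT user2_id FROM user_relations WHERE user1_id = ?", (user_id,)
        )
    ]
    groups = []
    for friend_id in friends:
        photos = conn.execute(
            "SELECT user_id, id, date_uploaded FROM photos "
            "WHERE user_id = ? "
            "ORDER BY date_uploaded DESC, id DESC LIMIT ?",
            (friend_id, per_friend),
        ).fetchall()
        if photos:
            groups.append(photos)
    # Put the friend with the most recent upload first (ISO dates sort as text).
    groups.sort(key=lambda photos: photos[0][2], reverse=True)
    return groups


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_relations (user1_id INTEGER, user2_id INTEGER);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, date_uploaded TEXT);
    INSERT INTO user_relations VALUES (8, 1), (8, 2), (9, 3);
    INSERT INTO photos VALUES
        (1, 1, '2010-05-17'), (2, 1, '2010-05-14'),
        (3, 1, '2010-05-12'), (4, 1, '2010-05-01'),
        (5, 2, '2010-05-18'), (6, 2, '2010-05-12'),
        (7, 3, '2010-05-20');
    """
)

groups = latest_photos_by_friend(conn, 8)
for photos in groups:
    print(photos)
```

This issues one query per friend, which is usually acceptable for short friend lists; for long lists a single query with a per-user row limit may be worth the extra complexity.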

Nhibernate Criteria Query with Join

I am looking to do the following using an NHibernate Criteria Query.
I have "Product"s, each of which has 0 to many "Media"s.
A product can be associated with 1 to many ProductCategories.
These use a table in the middle to create the join:
ProductCategories
Id
Title
ProductsProductCategories
ProductCategoryId
ProductId
Products
Id
Title
ProductMedias
ProductId
MediaId
Medias
Id
MediaType
I need to implement a criteria query to return All Products in a ProductCategory and the top 1 associated Media or no media if none exists.
So although for example a "T Shirt" may have 10 Medias associated, my result should be something similar to this
Product.Id Product.Title MediaId
1 T Shirt 21
2 Shoes Null
3 Hat 43
I have tried the following solutions using JoinType.LeftOuterJoin
1) productCriteria.SetResultTransformer(Transformers.DistinctRootEntity);
This hasn't worked, as the transform is done code side, and as I have .SetFirstResult() and .SetMaxResults() for paging purposes it won't work.
2) .SetProjection(
Projections.Distinct(
Projections.ProjectionList()
.Add(Projections.Alias(Projections.Property("Id"), "Id"))
...
.SetResultTransformer(Transformers.AliasToBean());
This hasn't worked as I cannot seem to populate a value for Medias.Id in the projections. (Similar to nHibernate Criteria API Projections)
Any help would be greatly appreciated