SQL: Grabbing two when there's already one match?

For the recipient field in my message system, if you enter Megan Fo, it will ask you "Did you mean Megan Fox?". Then I also have a "Who did you mean? Megan Fox, Megan Foxxy, Megan Foxxie" if there's more than one found.
And then of course there is the case where you typed the name correctly (this SQL statement returns exactly one row and its full column is 1).
SELECT users.id, users.firstname, users.lastname,
(users.firstname = 'Meg' AND users.lastname = 'Meg') AS full FROM users
INNER JOIN users_friends ON users.id=users_friends.uID
WHERE users_friends.bID='1' AND users_friends.type = 'friend' AND users_friends.accepted = '1' AND (
(users.firstname = 'Meg' AND users.lastname='Meg') OR
(users.firstname LIKE 'Meg%' AND users.lastname LIKE 'Meg%') OR
users.firstname LIKE 'Meg%' OR
users.lastname LIKE 'Meg%')
This works fine, but now I have an issue when I try to send to a user named
"Meg Meg"
Then it returns 2 rows, "Meg Meg" and "Megan Fox". I want it to return only one row if it matches the full name. As it stands I keep getting "Who did you mean?", because I built it so that a row count greater than 1 leads to "Who did you mean?".
if ($sql_findUser->rowCount() == 1) {
    $get = $sql_findUser->fetch();
    if ($get["full"] == 1) {
        echo "SUCCESS, ONE FOUND FULL MATCH";
    } else {
        echo "Did you mean ...?";
    }
} elseif ($sql_findUser->rowCount() > 1) {
    echo "Who did you mean?";
}
How can I fix this?

At the end of your SQL statement, it looks like you're including everyone with a first name starting with "Meg", which is why Meg Meg and Megan Fox both match. Try shortening your WHERE clause to:
WHERE users_friends.bID='1' AND users_friends.type = 'friend' AND users_friends.accepted = '1' AND (
(users.firstname = 'Meg' AND users.lastname='Meg') OR
(users.firstname LIKE 'Meg%' AND users.lastname LIKE 'Meg%'))

It will be a tad slower, but you could split the logic into two queries: one that uses exact matching (= instead of LIKE) and, if that one does not yield anything, the current query.
However, that will cause problems when the user enters Megan Fox but really meant Megan Foxxy.

As I understand it, this is your problem when using "Meg Meg"...
- Your code identifies an exact match with "Meg Meg"
- Your code also identifies a partial match with "Megan Fox"
- You don't want the partial matches to be included if you get an exact match
The issue here is that no single record in the return set 'knows' about the other records.
To know that you do not want to include the partial match, you must already have completed a full check for the exact match.
I can see two ways of doing that...
1) Sequential but separate queries...
Have your client run a query for exact matches.
If no records are returned, run a query for partial matches on first AND last name.
If no records are returned, run a query for partial matches on first OR last name.
etc, etc.
2) Run one query where you specify the type of match
Add a field to your query that is something like this...
CASE WHEN (users.firstname = 'Meg' AND users.lastname='Meg') THEN 1
WHEN (users.firstname LIKE 'Meg%' AND users.lastname LIKE 'Meg%') THEN 2
WHEN (users.firstname LIKE 'Meg%' OR users.lastname LIKE 'Meg%') THEN 3
END AS 'match_type'
Then also add it to the ORDER BY clause, to make full matches at the top, etc, etc.
Your client can then see how many of each match type were generated and choose to discard/ignore the matches that are not relevant.
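The second option can be sketched as follows. This is only an illustration in Python with an in-memory SQLite database, not the asker's MySQL/PHP setup: the friends join is left out, and the table contents and the `find_recipients` helper are hypothetical. The query ranks every row with a match_type, and the client keeps only the rows with the best rank.

```python
import sqlite3

# Hypothetical miniature of the users table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO users (firstname, lastname) VALUES (?, ?)",
    [("Meg", "Meg"), ("Megan", "Fox"), ("Megan", "Foxxy")],
)

# 1 = exact match, 2 = prefix match on both names, 3 = prefix match on one name.
QUERY = """
SELECT id, firstname, lastname,
       CASE WHEN firstname = :first AND lastname = :last THEN 1
            WHEN firstname LIKE :first || '%' AND lastname LIKE :last || '%' THEN 2
            ELSE 3
       END AS match_type
FROM users
WHERE firstname LIKE :first || '%' OR lastname LIKE :last || '%'
ORDER BY match_type, lastname, firstname
"""

def find_recipients(first, last):
    """Return only the rows that share the best (lowest) match_type."""
    rows = conn.execute(QUERY, {"first": first, "last": last}).fetchall()
    if not rows:
        return []
    best = rows[0][3]  # rows are sorted, so the first row has the best rank
    return [row for row in rows if row[3] == best]

print(find_recipients("Meg", "Meg"))   # the exact match only
print(find_recipients("Megan", "Fo"))  # both prefix matches
```

With this shape the existing row-count logic can stay as it is: one row with match_type 1 is a full match, one row with another match_type is a "Did you mean", and several rows lead to "Who did you mean?".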

Selecting related model: Left join, prefetch_related or select_related?

Considering I have the following relationships:
class House(Model):
    name = ...

class User(Model):
    """The standard auth model"""
    pass

class Alert(Model):
    user = ForeignKey(User)
    house = ForeignKey(House)
    somevalue = IntegerField()

    class Meta:
        unique_together = (('user', 'house'),)
In one query, I would like to get the list of houses, and whether the current user has any alert for any of them.
In SQL I would do it like this:
SELECT *
FROM house h
LEFT JOIN alert a
ON h.id = a.house_id
WHERE a.user_id = ?
OR a.user_id IS NULL
And I've found that I could use prefetch_related to achieve something like this:
p = Prefetch('alert_set', queryset=Alert.objects.filter(user=self.request.user), to_attr='user_alert')
houses = House.objects.order_by('name').prefetch_related(p)
The above example works, but houses.user_alert is a list, not an Alert object. I only have one alert per user per house, so what is the best way for me to get this information?
select_related didn't seem to work. Oh, and surely I know I can manage this in multiple queries, but I'd really want to have it done in one, and the 'Django way'.
Thanks in advance!
The solution is clearer if you start with the multiple query approach, and then try to optimise it. To get the user_alerts for every house, you could do the following:
houses = House.objects.order_by('name')
for house in houses:
    user_alerts = house.alert_set.filter(user=self.request.user)
The user_alerts queryset will cause an extra query for every house in the queryset. You can avoid this with prefetch_related.
alerts_queryset = Alert.objects.filter(user=self.request.user)
houses = House.objects.order_by('name').prefetch_related(
    Prefetch('alert_set', queryset=alerts_queryset, to_attr='user_alerts'),
)
for house in houses:
    user_alerts = house.user_alerts
This will take two queries, one for houses and one for the alerts. I don't think you require select related here to fetch the user, since you already have access to the user with self.request.user. If you want you could add select_related to the alerts_queryset:
alerts_queryset = Alert.objects.filter(user=self.request.user).select_related('user')
In your case, user_alerts will be an empty list or a list with one item, because of your unique_together constraint. If you can't handle the list, you could loop through the queryset once, and set house.user_alert:
for house in houses:
    house.user_alert = house.user_alerts[0] if house.user_alerts else None
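To make the "two queries, then attach" idea concrete, here is a rough sketch of the same shape in plain Python with SQLite. It is not how Django implements prefetch_related internally; the table layout, the sample rows and the `houses_with_user_alert` helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical tables mirroring the House and Alert models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE house (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE alert (id INTEGER PRIMARY KEY, user_id INTEGER, house_id INTEGER,
                    somevalue INTEGER, UNIQUE (user_id, house_id));
INSERT INTO house (id, name) VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO alert (user_id, house_id, somevalue) VALUES (7, 1, 10), (8, 2, 20);
""")

def houses_with_user_alert(user_id):
    """Return (house name, alert value or None) pairs using two queries."""
    # Query 1: all houses, ordered by name.
    houses = conn.execute("SELECT id, name FROM house ORDER BY name").fetchall()
    # Query 2: only this user's alerts.
    alerts = conn.execute(
        "SELECT house_id, somevalue FROM alert WHERE user_id = ?", (user_id,)
    ).fetchall()
    # The unique constraint guarantees at most one alert per house for a user,
    # so a plain dict keyed by house id is enough.
    alert_by_house = {house_id: somevalue for house_id, somevalue in alerts}
    return [(name, alert_by_house.get(house_id)) for house_id, name in houses]

print(houses_with_user_alert(7))
```

Houses without an alert for the user come back with None, which corresponds to the `house.user_alert = ... else None` step above.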

Searching on pubmed using biopython

I am trying to input over 200 entries into pubmed in order to record the number of articles published by an author and to refine the search by including his/her mentor and institution. I have tried to do this using biopython and xlrd (the code is below), but I am consistently getting 0 results for all three formats of inquiries (1. by name, 2. by name and institution name, and 3. by name and mentor's name). Are there steps of troubleshooting that I can do, or should I use a different format when using the keywords indicated below to search on pubmed?
Example output of the input queries; search_terms is a list of lists holding the input queries.
print(*search_term[8:15], sep='\n')
[text:'Andrew Bland', 'Weill Cornell Medical College', text:'David Cutler MD']
[text:'Andy Price', 'University of Alabama at Birmingham School of Medicine', text:'Jason Warem, PhD']
[text:'Bah Chamin', 'University of Texas Southwestern Medical School', text:'Dr. Timothy Hillar']
[text:'Eduo Cera', 'University of Colorado School of Medicine', text:'Dr. Tim']
Code used to generate the input queries above and to search on Pubmed:
Entrez.email = "mollyzhaoe@college.harvard.edu"
for search_term in search_terms[8:55]:
handle = Entrez.egquery(term="{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))
handle_1 = Entrez.egquery(term = "{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[2]))
handle_2 = Entrez.egquery(term = "{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[1]))
record = Entrez.read(handle)
record_1 = Entrez.read(handle_1)
record_2 = Entrez.read(handle_2)
pubmed_count = ['','','']
for row in record["eGQueryResult"]:
if row["DbName"] == "pubmed":
pubmed_count[0] = row["Count"]
for row in record_1["eGQueryResult"]:
if row["DbName"] == "pubmed":
pubmed_count[1] = row["Count"]
for row in record_2["eGQueryResult"]:
if row["DbName"] == "pubmed":
pubmed_count[2] = row["Count"]
Check your indentation, it is difficult to know which part belongs to which loop.
If you want to troubleshoot, try printing your egquery, e.g.
print("{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))
and paste the output to pubmed and see what you get. Perhaps modify it a bit and see which search term causes the problems.
Your input format is a little hard to guess. Print the query and make sure you are getting the right search values.
For the author names, try to get rid of the academic titles; PubMed might confuse them with initials, e.g. House MD might be read as Mark David House.
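A small sketch of that troubleshooting step: build the query string separately, strip the titles, and print it before sending anything to PubMed. The `clean_name` and `build_query` helpers and the list of titles are assumptions for illustration, not part of Biopython. Also note that the `text:'…'` prefix in the printed sample suggests the entries may be xlrd cell objects rather than plain strings; if that is the case, passing `cell.value` instead of the cell itself is probably needed.

```python
import re

# Assumed list of academic titles to strip; extend it for your own data.
TITLE_PATTERN = re.compile(r"\b(?:Dr|MD|PhD)\b\.?", re.IGNORECASE)

def clean_name(raw):
    """Remove academic titles, commas and extra whitespace from a name."""
    without_titles = TITLE_PATTERN.sub("", str(raw))
    return " ".join(without_titles.replace(",", " ").split())

def build_query(author, extra=None, start=2010, end=2017):
    """Build the search string so it can be printed and pasted into PubMed."""
    query = f"{clean_name(author)} AND ({start}[Date - Publication] : {end}[Date - Publication])"
    if extra:
        query += f" AND {clean_name(extra)}"
    return query

# Print every query first and check it by hand on the PubMed website.
print(build_query("Andrew Bland"))
print(build_query("Andy Price", "Jason Warem, PhD"))
print(build_query("Bah Chamin", "Dr. Timothy Hillar"))
```

Once the printed strings return sensible results on the website, the same strings can be passed as the term argument in the original loop.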

Filtering model with HABTM relationship

I have 2 models - Restaurant and Feature. They are connected via a has_and_belongs_to_many relationship. The gist of it is that you have restaurants with many features like delivery, pizza, sandwiches, salad bar, vegetarian option,… So now when the user wants to filter the restaurants and, let's say, checks pizza and delivery, I want to display all the restaurants that have both features; pizza, delivery and maybe some more, but it HAS TO HAVE pizza AND delivery.
If I do a simple .where('features IN (?)', params[:features]) I (of course) get the restaurants that have either pizza or delivery or both - which is not at all what I want.
My SQL/Rails knowledge is kinda limited since I'm new to this but I asked a friend and now I have this huuuge SQL that gets the job done:
Restaurant.find_by_sql(['SELECT restaurant_id FROM (
SELECT features_restaurants.*, ROW_NUMBER() OVER(PARTITION BY restaurants.id ORDER BY features.id) AS rn FROM restaurants
JOIN features_restaurants ON restaurants.id = features_restaurants.restaurant_id
JOIN features ON features_restaurants.feature_id = features.id
WHERE features.id in (?)
) t
WHERE rn = ?', params[:features], params[:features].count])
So my question is: is there a better - more Rails even - way of doing this? How would you do it?
Oh BTW I'm using Rails 4 on Heroku so it's a Postgres DB.
This is an example of a set-within-sets query. I advocate solving these with group by and having, because this provides a general framework.
Here is how this works in your case:
select fr.restaurant_id
from features_restaurants fr join
features f
on fr.feature_id = f.feature_id
group by fr.restaurant_id
having sum(case when f.feature_name = 'pizza' then 1 else 0 end) > 0 and
sum(case when f.feature_name = 'delivery' then 1 else 0 end) > 0
Each condition in the having clause is counting for the presence of one of the features -- "pizza" and "delivery". If both features are present, then you get the restaurant_id.
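As a runnable illustration of the group by / having idea, here is a sketch in Python against an in-memory SQLite database. It uses a variant of the query above: instead of one sum(case ...) per feature, it filters on the wanted names and requires COUNT(DISTINCT ...) to equal the number of wanted features, so the list can vary. The schema (features.id, features.name), the sample rows and the `restaurants_with_all` helper are assumptions.

```python
import sqlite3

# Hypothetical schema and data; column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE features_restaurants (restaurant_id INTEGER, feature_id INTEGER);
INSERT INTO restaurants VALUES (1, 'Luigi'), (2, 'Green Bowl'), (3, 'Speedy');
INSERT INTO features VALUES (1, 'pizza'), (2, 'delivery'), (3, 'salad bar');
INSERT INTO features_restaurants VALUES (1, 1), (1, 2), (1, 3), (2, 3), (3, 2);
""")

def restaurants_with_all(feature_names):
    """Return ids of restaurants that have every one of the given features."""
    wanted = sorted(set(feature_names))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT fr.restaurant_id
        FROM features_restaurants fr
        JOIN features f ON f.id = fr.feature_id
        WHERE f.name IN ({placeholders})
        GROUP BY fr.restaurant_id
        HAVING COUNT(DISTINCT f.name) = ?
        ORDER BY fr.restaurant_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

print(restaurants_with_all(["pizza", "delivery"]))
```

Only the restaurant linked to both pizza and delivery survives the HAVING clause; a restaurant with just one of the two is grouped but then filtered out.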
How much data is in your features table? Is it just a table of ids and names?
If so, and you're willing to do a little denormalization, you can do this much more easily by encoding the features as a text array on restaurant.
With this scheme your queries boil down to
select * from restaurants where restaurants.features @> ARRAY['pizza', 'delivery']
If you want to maintain your features table because it contains useful data, you can store the array of feature ids on the restaurant and do a query like this:
select * from restaurants where restaurants.feature_ids @> ARRAY[5, 17]
If you don't know the ids up front, and want it all in one query, you should be able to do something along these lines:
select * from restaurants where restaurants.feature_ids @> (
select id from features where name in ('pizza', 'delivery')
) as matched_features
That last query might need some more consideration...
Anyways, I've actually got a pretty detailed article written up about Tagging in Postgres and ActiveRecord if you want some more details.
This is not a "copy and paste" solution, but if you follow these steps you will have a fast, working query:
index the feature_name column (I'm assuming that column feature_id is indexed on both tables)
place each feature_name param in exists():
select fr.restaurant_id
from
features_restaurants fr
where
exists(select true from features f where fr.feature_id = f.feature_id and f.feature_name = 'pizza')
and
exists(select true from features f where fr.feature_id = f.feature_id and f.feature_name = 'delivery')
group by
fr.restaurant_id
Maybe you're looking at it backwards?
Maybe try merging the restaurants returned by each feature.
Simplified:
pizza_restaurants = Feature.find_by_name('pizza').restaurants
delivery_restaurants = Feature.find_by_name('delivery').restaurants
pizza_delivery_restaurants = pizza_restaurants & delivery_restaurants
Obviously, this is a single instance solution. But it illustrates the idea.
UPDATE
Here's a dynamic method to pull in all filters without writing SQL (i.e. the "Railsy" way)
def get_restaurants_by_feature_names(features)
  # accepts an array of feature names
  restaurants = Restaurant.all
  features.each do |f|
    feature_restaurants = Feature.find_by_name(f).restaurants
    restaurants = feature_restaurants & restaurants
  end
  return restaurants
end
Since it's an AND condition (the OR conditions get dicey with AREL), I reread your stated problem, ignoring the SQL, and I think this is what you want.
# in Restaurant
has_many :features
# in Feature
has_many :restaurants
# this is a contrived example. you may be doing something like
# where(name: 'pizza'). I'm just making this condition up. You
# could also make this more DRY by just passing in the name if
# that's what you're doing.
def self.pizza
  where(pizza: true)
end
def self.delivery
  where(delivery: true)
end
# query
Restaurant.features.pizza.delivery
Basically you call the association with ".features" and then you use the self methods defined on features. Hopefully I didn't misunderstand the original problem.
Cheers!
Restaurant
.joins(:features)
.where(features: {name: ['pizza','delivery']})
.group(:id)
.having('count(features.name) = ?', 2)
This seems to work for me. I tried it with SQLite though.

Rails, order by number of matched tags and then by name

Here's what I want to do: Listing has a many-to-many relationship with Tag through Taggings. I want to allow a user to search for listings by title (of the listing) and name (of zero or more tags). I want to order the results first by the number of tags matched, greatest first, and then by title.
It seems like this question has been done before -- it might be as simple as matching this question (Ordering items with matching tags by number of tags that match) from MySQL. However, I'm not SQL-literate at all, which is why I'm asking for help.
Update:
Here is an example of what I want.
Say I have 4 listings.
listing1 has tags "humor," "funny," and "hilarious."
listing2 has tags "funny," "silly," and "goofy."
listing3 has tags "funny," "silly," and "goofy."
listing4 has the tag "completely serious."
If I make a search with the tags "funny" and "silly", what I should get back is listing2, listing3, listing1, and listing4 (ignoring titles for now).
Interesting problem. I think you might have to use some SQL sugar to do this scope.
Something like this:
Listing
  .joins("LEFT JOIN taggings ON taggings.listing_id = listings.id")
  .joins("LEFT JOIN tags ON tags.id = taggings.tag_id AND tags.name IN ('funny','silly')")
  .group("listings.id")
  .order("count(tags.id) DESC, listings.title")
Does that help?
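To check the ordering against the example from the question, here is a sketch of the equivalent SQL run from Python on an in-memory SQLite database. The table and column names follow Rails conventions but are assumptions, as is the `search` helper; the data is the four listings described in the question.

```python
import sqlite3

# Hypothetical tables holding the four example listings and their tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE taggings (listing_id INTEGER, tag_id INTEGER);
INSERT INTO listings VALUES (1, 'listing1'), (2, 'listing2'), (3, 'listing3'), (4, 'listing4');
INSERT INTO tags VALUES (1, 'humor'), (2, 'funny'), (3, 'hilarious'),
                        (4, 'silly'), (5, 'goofy'), (6, 'completely serious');
INSERT INTO taggings VALUES (1, 1), (1, 2), (1, 3),
                            (2, 2), (2, 4), (2, 5),
                            (3, 2), (3, 4), (3, 5),
                            (4, 6);
""")

def search(tag_names):
    """Return (title, matched tag count), most matches first, then by title."""
    placeholders = ", ".join("?" for _ in tag_names)
    sql = f"""
        SELECT l.title, COUNT(t.id) AS matched
        FROM listings l
        LEFT JOIN taggings tg ON tg.listing_id = l.id
        LEFT JOIN tags t ON t.id = tg.tag_id AND t.name IN ({placeholders})
        GROUP BY l.id, l.title
        ORDER BY matched DESC, l.title
    """
    return conn.execute(sql, list(tag_names)).fetchall()

for title, matched in search(["funny", "silly"]):
    print(title, matched)
```

Because the tag filter sits in the LEFT JOIN condition rather than in WHERE, listings with no matching tag are kept with a count of 0, which is what puts listing4 last instead of dropping it.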
Assuming you want a solution in pure ActiveRecord so as not to touch any SQL...
Listing.order("tags.count DESC, title")
In this case you'd probably be better off using a counter cache for tags to optimize your queries.

Complex subqueries in activerecord

I'm building a Rails app and have to write a fairly complex comparison engine. I'm currently working on a prototype. My query can vary widely, so I have to work with a lot of scopes, but that's not my problem.
My query has to compare candidates. These candidates have answered some tests. These tests belong to categories. The tests have different maximum values, and I have to be able to compare candidates by category.
So I have to calculate a percentage of good answers. I have to be able to compare candidates in all possible use cases within one category, so I have to be able to compare the average good-answer rate across that whole category.
In a nutshell: I have to be able to use subqueries in order to compare candidates, either for a test or for a category. My problem is writing a subquery that returns the good-answer rate for all the tests a candidate may have passed in a category.
And I have to be able to use this subquery in an order_by or having clause.
How can I construct this subquery? I have no problem handling complex conditional queries with scopes. This has to be a real subquery, because I am working with 6 or 7 models here.
I am asking for an Active Record way, because this must work with whatever database Rails supports.
Excuse my poor English.
Edit :
An example is worth 1000 words, so how could I do something like this:
Sessiontest.find(Candidat.where(:firstname => 'toto'))
This example is stupid, ok. So, is it possible to do something like this ?
Edit2 :
I saw some posts about AREL. I wish to know if it is possible to do this without a third party plugin.
Is it possible to do subqueries inside subqueries with Arel? For example, the number of points per test is the sum of the points of all its questions. (Sad, but I have to keep it.) And I need this so my subquery can calculate my good-answer percentage.
So you get the idea. This has to be really powerful, so I need something powerful and not too error-prone.
Edit3 : I made some progress, but I can't post an answer for a while.
It seems possible to get this to work without any plugin. I have had some success building subqueries like this:
toto = Candidat.where(:lastname => Candidat.select(:lastname).where(:lastname => "ulysse").limit(1))
The request :
Candidat Load (1.0ms) SELECT "candidats".* FROM "candidats" WHERE "candidats"."lastname" IN (SELECT "candidats"."id" FROM "candidats" WHERE "candidats"."lastname" = 'ulysse' LIMIT 1)
This works and creates a real subquery. I will try some more advanced experiments to reach the level I actually need.
I just tried a sub-subquery, and it works wonders too.
Edit 5 :
I am trying some more advanced things, and there are a lot of things I still don't understand.
- toto = Candidat.where("id = ? / ? ", Sessiontest.select(:id).where(:id => 6), Sessiontest.select(:id).where(:id => 2))
This is just a stupid example in order to get an object with an id of 3. This code works, but not as I expected.
See the SQL:
(1.0ms) SELECT COUNT("sessiontests"."id") FROM "sessiontests" WHERE "sessiontests"."id" = 6
Sessiontest Load (0.0ms) SELECT id FROM "sessiontests" WHERE "sessiontests"."id" = 6
(1.0ms) SELECT COUNT("sessiontests"."id") FROM "sessiontests" WHERE "sessiontests"."id" = 2
Sessiontest Load (1.0ms) SELECT id FROM "sessiontests" WHERE "sessiontests"."id" = 2
Candidat Load (1.0ms) SELECT "candidats".* FROM "candidats" WHERE (id = 6 / 2)
So it does not use subqueries. I tried with .to_sql, but it inserts my SQL this way:
Candidat Load (0.0ms) SELECT "candidats".* FROM "candidats" WHERE (id = 'SELECT id FROM "sessiontests" WHERE "sessiontests"."id" = 6' / 2 )
So Active Record quoted the subquery for security purposes. This is closer to what I want, but not quite.
This does not work:
Candidat.where("id = (?) / ? ", Sessiontest.select(:id).where(:id => 6).to_sql, Sessiontest.select(:id).where(:id => 2))
The quotes prevent the subquery from working.
But this works:
Candidat.where("id = (" + Sessiontest.select(:id).where(:id => 6).to_sql + ") / (" + Sessiontest.select(:id).where(:id => 2).to_sql + ") ")
Candidat Load (1.0ms) SELECT "candidats".* FROM "candidats" WHERE (id = (SELECT id FROM "sessiontests" WHERE "sessiontests"."id" = 6) / (SELECT id FROM "sessiontests" WHERE "sessiontests"."id" = 2) )
But I find this ugly. I will try to get these subqueries working in a more dynamic way, that is, replacing the integer values with column names.
I no longer have the exact answer to this question, because I no longer work at the same company. But the solution to this problem was to use a group_by clause, which made the query really easy.
With a group_by, I was able to handle a category or a technology with ease.
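The original query was not kept, so the following is only a guess at what a group-by solution could look like, written in Python against an in-memory SQLite database. The single flattened results table, its columns, the sample rows and the `ranking` helper are all invented for illustration; the real app spread this data over 6 or 7 models.

```python
import sqlite3

# Invented, flattened schema: one row per candidate per test.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (candidate TEXT, category TEXT, test TEXT,
                      points INTEGER, max_points INTEGER);
INSERT INTO results VALUES
    ('toto',   'ruby', 'basics',    8, 10),
    ('toto',   'ruby', 'advanced', 10, 20),
    ('ulysse', 'ruby', 'basics',    9, 10),
    ('ulysse', 'ruby', 'advanced', 18, 20);
""")

def ranking(category):
    """Return (candidate, good-answer rate in percent), best rate first."""
    sql = """
        SELECT candidate, 100.0 * SUM(points) / SUM(max_points) AS rate
        FROM results
        WHERE category = ?
        GROUP BY candidate
        ORDER BY rate DESC
    """
    return conn.execute(sql, (category,)).fetchall()

for candidate, rate in ranking("ruby"):
    print(candidate, rate)
```

Summing points and maximum points per group handles tests with different maximum values without any subquery, and the aggregate can be used directly in ORDER BY or HAVING.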