mysql where IN on large dataset or Looping? - sql

I have the following scenario:
Table 1:
articles
id article_text category author_id
1 "hello world" 4 1
2 "hi" 5 2
3 "wasup" 4 3
Table 2
authors
id name friends_with
1 "Joe" "Bob"
2 "Sue" "Joe"
3 "Fred" "Bob"
I want to know the total number of authors that are friends with "Bob" for a given category.
So for example, for category 4 how many authors are there that are friends with "Bob".
The authors table is quite large, in some cases I have a million authors that are friends with "Bob"
So I have tried:
Getting the list of authors that are friends with Bob, then looping through them, fetching each author's count for the given category, and summing the counts in my code.
The issue with this approach is that it can generate a million queries. Even though each one is very fast, it seems there should be a better way.
I was thinking of getting the list of authors that are friends with Bob and then building an IN clause from that list, but I fear that would exceed the amount of memory allowed for the query.
Seems like this is a common problem. Any ideas?
thanks

SELECT COUNT(DISTINCT auth.id)
FROM authors auth
INNER JOIN articles art ON auth.id = art.author_id
WHERE auth.friends_with = 'Bob' AND art.category = 4
COUNT(DISTINCT auth.id) is required because the join to articles can produce multiple rows for each author.
But if you have any control over the database, I would use a link table for friends_with. Your current solution either has to use a comma-separated list of names, which will be disastrous for performance and require a completely different query, or each author can only have one friend.
Friends
id friend_id
then the query would look like this
SELECT COUNT(DISTINCT auth.id)
FROM authors auth
INNER JOIN articles art ON auth.id = art.author_id
INNER JOIN friends f ON auth.id = f.id
INNER JOIN authors fauth ON fauth.id = f.friend_id
WHERE fauth.name = 'Bob' AND art.category = 4
It's more complex but will allow for many friends. Just remember, this construct calls for 2 rows in friends for each pair: one from Joe to Bob and one from Bob to Joe.
You could build it differently but that would make the query even more complex.
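To make the link-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table layout follows the answer above; the extra author row for Bob and the second category-4 article by Joe are hypothetical additions, included only so the join has something to match and so the effect of DISTINCT is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, article_text TEXT,
                       category INTEGER, author_id INTEGER);
-- One row per direction of a friendship, as described above.
CREATE TABLE friends (id INTEGER, friend_id INTEGER,
                      PRIMARY KEY (id, friend_id));

-- Bob (id 4) is an assumed extra row: the link table needs him as an author.
INSERT INTO authors VALUES (1,'Joe'),(2,'Sue'),(3,'Fred'),(4,'Bob');
INSERT INTO articles VALUES
  (1,'hello world',4,1),(2,'hi',5,2),(3,'wasup',4,3),
  (4,'hello again',4,1);  -- hypothetical second category-4 article by Joe
INSERT INTO friends VALUES (1,4),(4,1),(2,1),(1,2),(3,4),(4,3);
""")

sql = """
SELECT COUNT(DISTINCT auth.id), COUNT(*)
FROM authors auth
INNER JOIN articles art ON auth.id = art.author_id
INNER JOIN friends f ON auth.id = f.id
INNER JOIN authors fauth ON fauth.id = f.friend_id
WHERE fauth.name = ? AND art.category = ?
"""
distinct_authors, joined_rows = conn.execute(sql, ("Bob", 4)).fetchone()
print(distinct_authors, joined_rows)  # 2 authors, 3 joined rows
```

Joe has two category-4 articles here, so the join yields three rows while only two distinct authors qualify, which is why the DISTINCT matters.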

Maybe something like
select fr.name,
fr.id,
au.name,
ar.article_text,
ar.category,
ar.author_id
from authors fr, authors au, articles ar
where fr.id = ar.author_id
and au.friends_with = fr.name
and ar.category = 4 ;
Just the count...
select count(distinct fr.name)
from authors fr, authors au, articles ar
where fr.id = ar.author_id
and au.friends_with = fr.name
and ar.category = 4 ;

A version without using joins (hopefully will work!)
SELECT count(distinct id) from authors where friends_with = 'Bob' and id in(select author_id from articles where category = 4)
When I started out with SQL, I found statements with 'IN' easier to understand.
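As a rough check that a single query can replace the per-author loop from the question, the sketch below loads the question's sample rows into an in-memory SQLite database (an assumption; the question is about MySQL) and runs both the join form and the IN-subquery form. The variable names are mine, not from the original posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, friends_with TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, article_text TEXT,
                       category INTEGER, author_id INTEGER);
INSERT INTO authors VALUES (1,'Joe','Bob'),(2,'Sue','Joe'),(3,'Fred','Bob');
INSERT INTO articles VALUES
  (1,'hello world',4,1),(2,'hi',5,2),(3,'wasup',4,3);
""")

join_sql = """
SELECT COUNT(DISTINCT auth.id)
FROM authors auth
INNER JOIN articles art ON auth.id = art.author_id
WHERE auth.friends_with = ? AND art.category = ?
"""
in_sql = """
SELECT COUNT(DISTINCT id)
FROM authors
WHERE friends_with = ?
  AND id IN (SELECT author_id FROM articles WHERE category = ?)
"""

# One round trip each, regardless of how many authors are friends with Bob.
join_count = conn.execute(join_sql, ("Bob", 4)).fetchone()[0]
in_count = conn.execute(in_sql, ("Bob", 4)).fetchone()[0]
print(join_count, in_count)  # 2 2
```

Note that SQLite compares strings case-sensitively by default, so the sketch passes 'Bob' exactly as stored.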

Related

How to select people that don't know anyone who takes workshops

I have a few tables I want to iterate over. First table is Persons:
id  name          address
1   Laura Jansen  New York
2   Sana Vendi    Miami
3   Adam Smith    Boston
4   Mo Zora       Los Angeles
Second one is TakingWorkshop. This is the workshop the people are taking, so person_id is the id of the one in Persons.
id  person_id  workshop_id
20  4          26
19  2          27
18  3          28
Last table is Knows. The person id's are the same as the id's in Persons. So, PersonX knows PersonY.
PersonX  PersonY
1        2
1        3
2        1
4        1
So 1 is Laura, 2 is Sana and 3 is Adam. We can see that Adam doesn't know anyone. That means that Adam automatically also doesn't know anyone who takes workshops, because he doesn't even know anyone. However, in the table we see that Laura, 1, doesn't take workshops. So 4 and 2, Mo and Sana, know Laura, but she doesn't take any workshops so Mo and Sana don't know anyone who takes workshops.
I wrote some code for the people who don't know anyone taking workshops (in this database, it's Adam)
First I do a left join on the Person table and Knows table, on the id of persons and the id of personA of Knows. PersonA knows person B. This join gives me a table of people who know people, including the people who don't know anyone (those are null).
SELECT distinct P.name, K.personA_id
FROM Persons P LEFT JOIN Knows K
ON P.id = K.personA_id
Now I want to see if personB_id is in the person_id of TakingWorkshop. This way you can see whether the known people are taking workshops or not. PersonB_id should NOT be in TakingWorkshop, because that's how you filter out Laura. I did it like this:
WHERE K.personB_id NOT IN (SELECT person_id
FROM TakingWorkshop)
So my whole code looks like this
SELECT distinct P.name, K.personA_id
FROM Persons P LEFT JOIN Knows K
ON P.id = K.personA_id
WHERE K.personB_id NOT IN (SELECT person_id
FROM TakingWorkshop)
But I get no results when I do this and want to know what's going wrong
Hmmm . . . Your description of the problem suggests not exists. But not exists what?
This query gets everyone who is known and taking a workshop:
select . . .
from knows k join
TakingWorkshop tw
on k.personY = tw.person_id;
So, we can slip that into the query:
select p.*
from persons p
where not exists (select 1
from knows k join
TakingWorkshop tw
on k.personY = tw.person_id
where k.personX = p.id
);
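Here is a small runnable sketch of the NOT EXISTS query against the sample rows from the question, using Python's sqlite3 module (an assumption; the question does not name its database). It uses the PersonX/PersonY column names from the Knows table as shown. The last statement illustrates one thing that does happen with the original WHERE clause: for a person who knows nobody, the LEFT JOIN yields NULL, and NULL NOT IN (...) evaluates to NULL rather than true, so that row is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE TakingWorkshop (id INTEGER PRIMARY KEY,
                             person_id INTEGER, workshop_id INTEGER);
CREATE TABLE Knows (PersonX INTEGER, PersonY INTEGER);
INSERT INTO Persons VALUES
  (1,'Laura Jansen','New York'),(2,'Sana Vendi','Miami'),
  (3,'Adam Smith','Boston'),(4,'Mo Zora','Los Angeles');
INSERT INTO TakingWorkshop VALUES (20,4,26),(19,2,27),(18,3,28);
INSERT INTO Knows VALUES (1,2),(1,3),(2,1),(4,1);
""")

sql = """
SELECT p.name
FROM Persons p
WHERE NOT EXISTS (
    SELECT 1
    FROM Knows k
    JOIN TakingWorkshop tw ON k.PersonY = tw.person_id
    WHERE k.PersonX = p.id
)
ORDER BY p.id
"""
names = [row[0] for row in conn.execute(sql)]
print(names)  # ['Sana Vendi', 'Adam Smith', 'Mo Zora']

# NULL compared with NOT IN gives NULL, which a WHERE clause treats as false.
null_check = conn.execute("SELECT NULL NOT IN (2, 3, 4)").fetchone()[0]
print(null_check)  # None
```

With these rows the result includes Sana and Mo as well as Adam, since the only person they know is Laura, who takes no workshops.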

sql finding multiple characters within multiple books

I can't seem to find all comics with certain characters in them. The comics and characters tables have a many to many relationship as follows:
My database schema:
**comics table**
comic_id
comic_name
comic_date
**character table**
character_id
character_name
**comics_character table**
comic_id
character_id
This works fine for one character:
sqlite3 comics.db
select comic_name
from comics as c, comics_characters as cc, characters as h
on c.comic_id = cc.comic_id and h.character_id = cc.character_id
where h.character_name = 'Superman';
But if I want all comics with say Superman and Batman in them, I tried using this:
sqlite3 comics.db
select comic_name
from comics as c, comics_characters as cc, characters as h
on c.comic_id = cc.comic_id and h.character_id = cc.character_id
where h.character_name in ('Batman', 'Superman');
but this only gets me a list of comics featuring either Batman OR Superman, rather than comics with both Batman AND Superman in them.
I've also tried this which doesn't return anything:
sqlite3 comics.db
select comic_name
from comics as c, comics_characters as cc, characters as h
on (c.comic_id = cc.comic_id and h.character_id = cc.character_id)
where (h.character_name = 'Batman' and h.character_name = 'Superman');
I've tried other variations but can't get the desired result
The OR route doesn't work, for reasons you've worked out: you get rows for Superman or Batman, and comics that have both characters have two rows with the same comic ID and different character IDs. The AND route doesn't work because a row cannot be both characters simultaneously.
So, you need to use the OR route to get comics with one or both characters and then also use a count to show only comics with both characters. Essentially, "filter to Superman or Batman, and then filter again to only comic IDs that appear on two rows" or "filter to only Batman or Superman, then group them up based on the comic ID and only take groups that have two entities in them". Ultimately, the lesson here is that database rows are thought of as different entities and when you want to treat them as one you have to group them, so we are identifying comics based on some attribute of the group after (deliberately) losing the detail of exactly which entities the group contains:
SELECT comic_id
FROM comics_characters
WHERE character_id IN (1,2) --Batman or Superman
GROUP BY comic_id
HAVING COUNT(*) = 2
The number on the right hand side of the = must be the same as the number of character IDs in the IN() clause. If you IN for 4 character IDs, then use COUNT(*) = 4
You can join in other tables so you can use names etc; I simplified this to make the point without extraneous detail
Footnote: this technique will find comics that feature at least Batman and Superman. The comic could very well contain other characters too, but we lost those other guys at the WHERE stage before we did the GROUP BY. If you wanted comics that ONLY featured Batman and Superman, it's a different thing. For that we could do something like grouping first and counting conditionally: give Batman or Superman a score of 1 and everyone else a score of 10. Comics that had only B and S would score 2, comics that featured only one of them would score 1, and anyone else's presence would cause a score of 10 or more, so we could filter on the 2.
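Both ideas from this answer, the "at least both" filter and the scoring trick from the footnote, can be tried out with a sketch like the one below. It uses Python's sqlite3 module and joins in the names as suggested. The comic names and rows are hypothetical sample data, and the composite primary key on the link table is an assumption that guarantees no duplicate pairs, which the counting relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comics (comic_id INTEGER PRIMARY KEY, comic_name TEXT);
CREATE TABLE characters (character_id INTEGER PRIMARY KEY, character_name TEXT);
CREATE TABLE comics_characters (
    comic_id INTEGER, character_id INTEGER,
    PRIMARY KEY (comic_id, character_id)  -- no duplicate pairs
);
-- Hypothetical sample rows, not from the original question.
INSERT INTO comics VALUES (1,'Comic A'),(2,'Comic B'),(3,'Comic C');
INSERT INTO characters VALUES (1,'Batman'),(2,'Superman'),(3,'Robin');
INSERT INTO comics_characters VALUES (1,1),(1,2),(2,1),(3,1),(3,2),(3,3);
""")

wanted = ["Batman", "Superman"]
marks = ", ".join("?" for _ in wanted)  # one placeholder per name
params = (*wanted, len(wanted))

# Comics that feature at least every wanted character.
at_least_sql = f"""
SELECT c.comic_name
FROM comics c
JOIN comics_characters cc ON cc.comic_id = c.comic_id
JOIN characters h ON h.character_id = cc.character_id
WHERE h.character_name IN ({marks})
GROUP BY c.comic_id, c.comic_name
HAVING COUNT(*) = ?
ORDER BY c.comic_name
"""
at_least = [r[0] for r in conn.execute(at_least_sql, params)]

# Comics that feature only the wanted characters: wanted scores 1, others 10.
only_sql = f"""
SELECT c.comic_name
FROM comics c
JOIN comics_characters cc ON cc.comic_id = c.comic_id
JOIN characters h ON h.character_id = cc.character_id
GROUP BY c.comic_id, c.comic_name
HAVING SUM(CASE WHEN h.character_name IN ({marks}) THEN 1 ELSE 10 END) = ?
ORDER BY c.comic_name
"""
only = [r[0] for r in conn.execute(only_sql, params)]

print(at_least)  # ['Comic A', 'Comic C']
print(only)      # ['Comic A']
```

The score of 10 only works while fewer than ten characters are wanted; a larger penalty would be needed for longer lists.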

How to combine SQL queries to build conditions for another one

I've the following tables
series_trailers:
ID EPISODEID CONTENT AUTHOR
-----------------------------
1 122383 url1 Peter
2 9999 url2 Ana
3 923822 stuff Jhon
4 122384 url3 Drake
series_episodes:
ID TITLE SERIESID
--------------------------------
122383 Episode 1 23
9999 Somethingweird 87
923822 Randomtitle 52
122384 Episode 2 23
series:
ID TITLE
-------------------
23 Stranger Things
87 Seriesname
512 Sometrashseries
As you can see there are three tables: one with the series info, one with the series' episodes and another one which contains URLs that redirect to the episodes' trailers. I'd like to get the latest rows from series_trailers but without repeating the series they come from.
I've tried SELECT DISTINCT EPISODEID FROM series_trailers ORDER BY id DESC, but there are two rows whose episodes belong to the same series, so I get the series Stranger Things twice. In short, I'd like to display the latest series with new URLs without getting duplicated series (which is what the SQL above gives me).
EDIT: What I'm supposed to get:
Last updated series:
Stranger Things
Seriesname
Sometrashseries
What I'd get with my sql code:
Stranger Things
Seriesname
Sometrashseries
Stranger Things (again)
If I understood correctly, here is the latest trailer for each series (latest as in the highest series_trailers ID, so most likely the one added last), listed with the highest series ID first.
WITH MostRecentTrailers
AS (
SELECT MAX(st.ID) "TRAILERID"
,s.ID "SERIESID"
,s.TITLE "SERIESTITLE"
FROM series_trailers st
JOIN series_episodes se ON se.ID = st.EPISODEID
JOIN series s ON s.ID = se.SERIESID
GROUP BY s.ID
,s.TITLE
)
SELECT *
FROM MostRecentTrailers mrt
JOIN series_trailers st ON st.ID = mrt.TRAILERID
ORDER BY mrt.SERIESID DESC
Let me know if that does it for ya.
Edit: Fixed some typo mistakes.
This gives you the trailers for the latest episode of each series. This answer is based on the assumption that the episode with the highest ID is the latest one.
select id, content from series_trailers where episodeid in
(select max(id)
from series_episodes
group by seriesid)
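For anyone who wants to try the "one row per series, newest trailer first" idea, here is a minimal sketch with Python's sqlite3 module. It loads the sample rows from the question with one loudly flagged assumption: the question lists Sometrashseries with ID 512 while the episode table points at series 52, so the sketch uses 52 to let the join match. Sorting by the highest trailer ID is also my reading of "last updated", not something the answers state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series (ID INTEGER PRIMARY KEY, TITLE TEXT);
CREATE TABLE series_episodes (ID INTEGER PRIMARY KEY, TITLE TEXT,
                              SERIESID INTEGER);
CREATE TABLE series_trailers (ID INTEGER PRIMARY KEY, EPISODEID INTEGER,
                              CONTENT TEXT, AUTHOR TEXT);
-- Assumption: Sometrashseries is given ID 52 (the question shows 512).
INSERT INTO series VALUES
  (23,'Stranger Things'),(87,'Seriesname'),(52,'Sometrashseries');
INSERT INTO series_episodes VALUES
  (122383,'Episode 1',23),(9999,'Somethingweird',87),
  (923822,'Randomtitle',52),(122384,'Episode 2',23);
INSERT INTO series_trailers VALUES
  (1,122383,'url1','Peter'),(2,9999,'url2','Ana'),
  (3,923822,'stuff','Jhon'),(4,122384,'url3','Drake');
""")

sql = """
SELECT s.TITLE, MAX(st.ID) AS latest_trailer
FROM series_trailers st
JOIN series_episodes se ON se.ID = st.EPISODEID
JOIN series s ON s.ID = se.SERIESID
GROUP BY s.ID, s.TITLE
ORDER BY latest_trailer DESC
"""
rows = conn.execute(sql).fetchall()
for title, trailer_id in rows:
    print(title, trailer_id)
```

Grouping by series collapses the two Stranger Things trailers into a single row, which is the de-duplication the question asks for.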

Triple joins with SQL?

My database represents a library. Each book is tagged with multiple things, so that one title might be tagged 'science fiction', 'short stories', and 'Russian'.
There are three tables: books, tags, and books_tag_link. They look like this:
Books
ID | TITLE
-----------------------------
1 | Rendezvous With Rama
2 | Howl and Other Poems
3 | A Short History of Nearly Everything
Tags
ID | TAGNAME
-----------------------------
1 | science fiction
2 | fiction
3 | poetry
Books_Tag_Link
BOOK | TAG
-----------------------------------
1 | 1
1 | 2
2 | 3
Hopefully you can see how that would work. The books_tag_link table has two foreign keys, and links books to tags; each book has many tags, each tag is associated with many books. I don't know if this is the best way to do it but it's what the OSS library program Calibre does, and that's what I'm kind of using as a reference as I study.
Now what I want to do is say "select all fiction books". But I can't quite work out the proper way to express that thought in SQL. Select books.title where books.id = tags.id = books_tag_link.tag... or something. I'm not sure.
Can someone help me out with a tip or explanation of what I should be doing?
I'm using SQLite at the moment but MySQL-specific advice would be fine too.
SELECT b.title
FROM Books AS b
JOIN Books_Tag_Link AS bt ON b.id = bt.book
JOIN Tags AS t ON t.id = bt.tag
WHERE t.tagname = 'fiction'
Something like that?
select b.title
from Books b join Books_Tag_Link btl on btl.BOOK=b.ID
join Tags t on t.ID=btl.TAG
where t.TAGNAME='fiction';
Caveat: if all tables are large, you have to make sure that the fields mentioned in JOIN are keys (indexes).
You need to use two joins:
select
books.title
from
books
inner join
books_tag_link on
books_tag_link.book = books.id
inner join
tags on
tags.id = books_tag_link.tag
where
tags.tagname = 'fiction'
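The answers above all describe the same two-join query. Below is a small sketch that runs it on the question's sample rows with Python's sqlite3 module. The helper name books_with_tag, the composite primary key, and the extra index on the tag column are my own assumptions, added to illustrate the caveat about keeping the join columns indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Books (ID INTEGER PRIMARY KEY, TITLE TEXT);
CREATE TABLE Tags (ID INTEGER PRIMARY KEY, TAGNAME TEXT);
CREATE TABLE Books_Tag_Link (
    BOOK INTEGER REFERENCES Books(ID),
    TAG INTEGER REFERENCES Tags(ID),
    PRIMARY KEY (BOOK, TAG)          -- indexes lookups by book
);
CREATE INDEX idx_link_tag ON Books_Tag_Link (TAG);  -- lookups by tag

INSERT INTO Books VALUES
  (1,'Rendezvous With Rama'),(2,'Howl and Other Poems'),
  (3,'A Short History of Nearly Everything');
INSERT INTO Tags VALUES (1,'science fiction'),(2,'fiction'),(3,'poetry');
INSERT INTO Books_Tag_Link VALUES (1,1),(1,2),(2,3);
""")


def books_with_tag(connection, tagname):
    """Return the titles of all books carrying the given tag (hypothetical helper)."""
    sql = """
    SELECT b.TITLE
    FROM Books AS b
    JOIN Books_Tag_Link AS bt ON b.ID = bt.BOOK
    JOIN Tags AS t ON t.ID = bt.TAG
    WHERE t.TAGNAME = ?
    ORDER BY b.TITLE
    """
    return [row[0] for row in connection.execute(sql, (tagname,))]


print(books_with_tag(conn, "fiction"))  # ['Rendezvous With Rama']
```

The match is on the exact tag name, so 'fiction' does not pick up books tagged only 'science fiction'.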

SQL Joins issue

I have 3 database tables.
The first contains Ingredients, the second contains Dishes, and the third connects Ingredients and Dishes.
Adding data to those tables was easy, but I faced a problem while trying to select specific content.
Returning all ingredients for a specific dish:
SELECT *
FROM Ingredient As I
JOIN DishIngredients as DI
ON I.ID = DI.IngredientID
WHERE DI.DishID = 1;
But if I try to query for the dish Name and Description, no matter what kind of join I use, I always get a number of results equal to the number of ingredients used. If I have 4 ingredients in my dish, then the select returns Name and Description 4 times. How can I modify my select to return those values just once?
Here is the result of my query (same as hawk's) if I try to select Name and Description. I am using MS SQL.
ID Name Description DishID IngredientID
-- -------------------- -------------------------------------------------------------------- ------ ---------
1 Spaghetti Carbonara This delcitious pasta is made with fresh Panceta and Single Cream 1 1
1 Spaghetti Carbonara This delcitious pasta is made with fresh Panceta and Single Cream 1 2
Kuzgun's query worked fine for me. However, from your suggestions I see that I don't really need a join between DishIngredients and Dish.
When I need Name and Description I can simply go for
SELECT * FROM Dish WHERE ID=1;
When I need the list of ingredients I can use my query above.
If you need to display both dish details and ingredient details, you need to join all 3 tables:
SELECT *
FROM Ingredient As I
JOIN DishIngredients as DI
ON I.ID = DI.IngredientID
JOIN Dish AS D
ON D.ID=DI.DishID
WHERE DI.DishID = 1;
If you don't care about the ingredients, you don't have to use the table DishIngredients. Just use the table Dish: select * from dish d where d.id=1.
If you want to know what the ingredients are, a query that only returns the ingredient IDs is not useful. Because of the design of your database, a little redundancy is unavoidable.
select * from dish d join dishingredient di on d.id=di.dishid join ingredient i on
i.id=di.ingredientid where d.id=1
Of course, every row of that result repeats the dish's name and description.
If you want the full information with the least redundancy, you can do it in two steps:
select * from dish d where d.id=1;
select * from ingredient i join DishIngredient di on i.id=di.ingredientid where di.dishid=1
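In Java, you can write a class to represent a dish, with a list to represent the ingredients it uses.
import java.math.BigDecimal;
import java.util.List;

// A dish together with the ingredients it uses.
public class Dish {
    BigDecimal id;
    String name;
    String description;
    List<Ingredient> ingredient;
}

// One row of the Ingredient table.
class Ingredient {
    BigDecimal id;
    String name;
    // ..... (remaining fields elided in the original)
}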
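The two-step approach can be sketched end to end as follows, here in Python with the sqlite3 module rather than Java and MS SQL (both substitutions are assumptions made to keep the example runnable). The load_dish helper, the dataclass names, the dish description and the ingredient names are hypothetical; the table names follow the queries in the question.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ingredient:
    id: int
    name: str


@dataclass
class Dish:
    id: int
    name: str
    description: str
    ingredients: List[Ingredient] = field(default_factory=list)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dish (ID INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
CREATE TABLE Ingredient (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DishIngredients (DishID INTEGER, IngredientID INTEGER,
                              PRIMARY KEY (DishID, IngredientID));
-- Hypothetical sample rows.
INSERT INTO Dish VALUES (1,'Spaghetti Carbonara','Pasta with pancetta and cream');
INSERT INTO Ingredient VALUES (1,'Pancetta'),(2,'Single cream');
INSERT INTO DishIngredients VALUES (1,1),(1,2);
""")


def load_dish(connection, dish_id) -> Optional[Dish]:
    """Step 1 reads the dish once; step 2 reads its ingredients separately."""
    row = connection.execute(
        "SELECT ID, Name, Description FROM Dish WHERE ID = ?", (dish_id,)
    ).fetchone()
    if row is None:
        return None
    dish = Dish(*row)
    ingredient_rows = connection.execute(
        """SELECT I.ID, I.Name
           FROM Ingredient AS I
           JOIN DishIngredients AS DI ON I.ID = DI.IngredientID
           WHERE DI.DishID = ?
           ORDER BY I.ID""",
        (dish_id,),
    )
    dish.ingredients = [Ingredient(*r) for r in ingredient_rows]
    return dish


dish = load_dish(conn, 1)
print(dish.name, [i.name for i in dish.ingredients])
```

The dish name and description come back exactly once, and the ingredient list can be any length without repeating them.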