I am new to Neo4j and wondering if Cypher has functions like GROUP BY in SQL.
Here is my code:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN m.title AS movie, p.name AS actor
Here is my result from above query:
movie actor
"The Matrix" "Emil Eifrem"
"The Matrix" "Carrie-Anne Moss"
"The Matrix" "Keanu Reeves"
"The Matrix Reloaded" "Hugo Weaving"
"The Matrix Reloaded" "Laurence Fishburne"
"The Matrix Revolutions" "Hugo Weaving"
"The Matrix Revolutions" "Laurence Fishburne"
Here is the result I want to have:
movie actor num_of_actors
"The Matrix" "Emil Eifrem" 3
"The Matrix" "Carrie-Anne Moss" 3
"The Matrix" "Keanu Reeves" 3
"The Matrix Reloaded" "Hugo Weaving" 2
"The Matrix Reloaded" "Laurence Fishburne" 2
"The Matrix Revolutions" "Hugo Weaving" 2
"The Matrix Revolutions" "Laurence Fishburne" 2
Basically I would like to have the number of actors played in each movie together with the original results.
Thanks in advance
You'll want to review the aggregation functions, which you can use within a WITH clause to do grouping.
For example, if you wanted to group the actor names with each movie, you could do this:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH m, collect(p.name) as actors
RETURN m.title AS movie, actors
That said, there are some shortcuts we can do here since you're asking about the total number of actors per movie (see our knowledge base article on using degree counts from a node instead of doing expansions).
If you wanted to keep a separate row per actor, but also have the number of actors, since we know :ACTED_IN relationships will never go to the same actor more than once, we can get the degree of :ACTED_IN relationships incoming to each :Movie node to get our count. For best performance, get the degree before you expand out to actors:
MATCH (m:Movie)
WITH m, m.title as title, size((m)<-[:ACTED_IN]-()) as num_of_actors
MATCH (p:Person)-[:ACTED_IN]->(m)
RETURN title, p.name as actor, num_of_actors
Related
I have 2 tables I am working with, one is movie_score which contains an id, name, and score. I have another table that is movie_cast which contains mid, cid, and name. Mid is movie id and cid is cast id. the problem I must do is as follows:
Find top 10 (distinct) cast members who have the highest average movie scores. The output list must be sorted by score (from high to low), and then, by cast name in alphabetical order, if/when they have the same average score. The search must NOT include: (a) movies with scores lower than 50 AND (b) cast members who have appeared in less than 3 movies (again, only counting the number of appearances in movies with scores of at least 50). (Expected Output: cid, cname, average score)
I have tried to put the command together but so far this is all I was able to get:
SELECT DISTINCT movie_cast.cid, movie_cast.cname, FROM movie_score INNER JOIN movie_cast ON movie_score.id=movie_cast.mid ORDER BY cname LIMIT 10;
movie-name-score.txt goes with movie_score:
Example of .txt file
9,"Star Wars: Episode III - Revenge of the Sith 3D",80
24214,"The Chronicles of Narnia: The Lion, The Witch and The Wardrobe",76
1789,"War of the Worlds",74
10009,"Star Wars: Episode II - Attack of the Clones 3D",67
771238285,"Warm Bodies",-1
770785616,"World War Z",-1
771303871,"War Witch",89
771323601,"War of the Worlds the True Story",-1
movie-cast.txt goes with movie_cast:
Example:
9,162652153,"Hayden Christensen"
9,162652152,"Ewan McGregor"
9,418638213,"Kenny Baker"
9,548155708,"Graeme Blundell"
9,358317901,"Jeremy Bulloch"
9,178810494,"Anthony Daniels"
9,770726713,"Oliver Ford Davies"
9,162652156,"Samuel L. Jackson"
9,162655731,"James Earl Jones"
I expect to have an output something like:
162655731,"James Earl Jones",average score of the movies they have been in
Does anyone know the best way to create this command?
For an exam, I am asked to get the list of clients having more than one rent, both as an SQL query and as an algebraic expression.
For some reasons, the correction doesn't provide the algebraic version.
So now I am left with:
SELECT IdClient, Name, ...
FROM Client
WHERE IdClient IN (
SELECT IdClient
FROM Rental
GROUP BY IdClient
HAVING COUNT(*) > 1
)
I don't know if there is a standard for algebraic notations, therefore:
Π projection
× Cartesian product
⋈ natural join
σ selection
Then I translate as:
Π IdClient, Name, ... (
σ (count(IdClient)>1) (Π Rental) ⋈ (Client ⋈ Rental)
)
But I find no source to prove me right or wrong, especially for:
the logic behind the math
Π Rental seems like a shady business
I saw the use of count() at https://cs.stackexchange.com/questions/29897/use-count-in-relational-algebra and while it isn't used the same way, I couldn't figure out a way to use it without the projection (which I wanted to avoid.)
There are many variants of "relational algebra", differing even on what a relation is. You need to tell us which one you are supposed to use.
Also you don't explain what it means for a pair of RA & SQL queries to "have the form of" or "be the same as" each other. (Earlier versions.) Same result? Or also some kind of parallel structure?
Also you don't explain what "get the list of clients" means. What attributes does the result have?
If you try to write a definition of the count you are trying to use in σ count(IdClient)>1 (...)--what it inputs & what it outputs based on that--you will see that you can't. That kind of count that takes just an attribute does not correspond to a relational operator. It is used in a grouping expression--which you are missing. Such count & group are not actually relational operators, they are non-terminals in so-called relational algebras that are really query languages, designed by SQL apologists, suggesting it is easy to map SQL to relational algebra, but begging the question of how we aggregate in an algebra. Still, maybe that's the sort of "relational algebra" you were told to use.
I saw the use of count() there https://cs.stackexchange.com/questions/29897/use-count-in-relational-algebra
The nature of algebras is that the only sense in which we "use" operators "with" other operators is to pass outputs of operator calls as inputs to other operator calls. (Hence some so-called algebras are not.) In your linked answer, grouping operator G inputs aggregate name count and attribute name name and that affects the output. The answer quotes Database System Concepts, 5th Ed:
G1, G2, ..., Gn G F1(A1), F2(A2), ..., Fm(Am) (E)
where E is any relational-algebra expression; G1, G2, ..., Gn constitute a list of attributes on which to group; each Fi is an aggregate function; and each Ai is an attribute name.
G returns rows with attributes G1, ..., A1, ... where one or more rows with the same G1, ... subrows are in E and each Ai holds the output from aggregating Fi on Ai over those rows.
But the answer when you read & linked it used that definition improperly. (I got it fixed since.) Correct is:
π name (σ phone>1 (name G count(phone) (Person)))
This is clear if you carefully read the definition.
G has misleading syntax. count(phone) is not a call of an operator; it's just a pair of arguments, an aggregate name count & an attribute name phone. Less misleading syntax would be
π name (σ phone>1 (name G count phone (Person)))
One does not need a grouping operator to write your query. That makes it all the more important to know what "relational algebra" means in the exam. It is harder if you can't use a grouping operator.
"Π Rental seems like a shady business" is unclear. You do use projection incorrectly; proper use is π attributes (relation). I guess you are using π in an attempt to involve a grouping operator like G. Re "the logic behind the math" see this.
I'm looking at movie ratings across a friend group and I'm trying to find the best way to measure how people rate movies compared to IMDb ratings.
Here you can see my table titled fRating, which is a fact table that contains the MovieID, the RaterID, and their UserRating:
This joins with dMovies, which is the dimension table for all of the movies. That table is shown below:
Now, how would I go about finding how much each rater varies from the IMDb ratings? Here is my attempt, using STDDEV:
SELECT a.[RaterID]
,Rater
,STDEV(([UserRating] - ROUND(b.IMDb_Rating,0))) as StdDev
FROM [IMDbRatings].[dbo].[fRating] a
JOIN [IMDbRatings].dbo.dMovies b ON a.MovieID = b.MovieID
JOIN [IMDbRatings].dbo.dRaterID c ON a.RaterID = c.RaterID
GROUP BY a.RaterID, Rater
Here are the results that I get from that:
Am I on the right track? Is this doing what I want it to do?
Thank you for your wisdom!
Suppose we have this relational schema
homebuilder(hID, hName, hStreet, hCity, hZip, hPhone)
model(hID, mID, mName, sqft, story) subdivision(sName,
sCity, sZip) offered(sName, hID, mID, price) lot(sName,
lotNum, lStAddr, lSize, lPremium) sold(sName, lotNum, hID, mID,
status)
I have problem by doing relational algebra for each subdivision , find the number of models offered and the average, minimum and maximum price of the models offered at that subdivision. Also display the result in descending order on the average price of a home.
I am done with SQL formula, but it hard for me to translate this SQL to relational algebra. Can someone help me?
Here is what I got so far:
SQL:=
SELECT S, avg (O.price), min (O.price), max (O.price), count(*)
FROM offered O, subdivision S
WHERE O.sName = S.sName
GROUP BY S.sName
ORDER BY 4 desc;
+1 to DPenner's comment: quite true that you can't do ordering in RA. (Although those q's and a's referenced seem to have some 'difficulties'.)
Another thing you can't do in RA (contra the SQL that JaveLeave shows) is to have anonymous columns referenced by position. If SQL were a sensible language (or indeed any sort of language at all), you could name the column in the SELECT clause ..., max (O.price) AS maxPrice, ... then ORDER BY maxPrice desc. But no, you can't do that. In SQL you have to repeat ORDER BY max (O.price) desc. (By the way, the question asked for ordering by average price, not max(?) That's column 2.)
Contrast that the RA Group operation returns a relation. And being a relation it must have attributes only addressable by name.
Back to the question as asked. The nearest you can get to an ordering is to put a column on each row with the ordinal position of this row relative to the overall table. Since the question asks for descending sequence, the first step is to find the subdivision with minimum average price and tag it with ordinal 1. Then select all but that one, get the minmium of those, tag it with 2. And in general: take all not tagged so far; get the minmum; tag it with highest tag so far +1; recurse. So you need the transitive closure operation (which is another 'missing' feature of standard RA). You can find some SQL code to achieve this sort of thing in comp.database.theory -- from memory Joe Celko gives examples.
Off-topic: I'm puzzled why courses/professors/textbooks in SQL also ask you to do impossible things in RA. Certainly it's good to have a grounding in RA. It's a powerful mental model to understand data structures that SQL only obscures. RA (as an algebra) underpins most SQL engines. But then why leave the impression that RA is some sort of 'poor cousin' to SQL? There are no commercial implementations of RA; there are no job advertisements for RA programmers. Why try to make it what it isn't?
I'm having trouble solving the following SQL requests:
Give the names of the actors that have acted in more films than 'sara allgood' and who have acted in films that won the 'cannes film festival'. Also, give the filmname.
Get the percentage of movies who won awards out of all movies produced between the years 1970 and 1990.
There are several tables but I'm assuming that only 4 are needed:
'films','remakes','casts', 'awtypes'
'films' attributes: filmid, filmname, year, director, studio, award
'remakes' attributes: filmid, title, year, priorfilm, prioryear
'casts' attributes: filmid, filmname, actor, award(10)
'awtypes' attributes: award(10), org(100), country, colloquial(50), year
It's a bit unclear to me how to match the award to the 'Cannes film festival' in the first query since the award field is only 10 characters meaning it is a reference to the awtypes table but I don't know which field in the awtypes table contains the name of the award and I don't have access to the database at the moment so it's either org or colloquial.
As for the second I don't know how I could compute the percentage but it seems that it should be solved using a union operator for the movies produced between 1970 and 1990 and the films that have won an award (I don't know how to place a condition for having at least one award).
A few hints, I hope they help you!
Give the names of the actors that have acted in more films than 'sara
allgood' and who have acted in films that won the 'cannes film
festival'. Also, give the filmname.
Based on the attributes you're stating, I would say that you can get to the right awtypes attribute via the casts table. They both contain the award(10) column. Given your data, I would expect the org(100) column to contain something on the organization that provides the prizes, so that would be my guess in this case for the cannes film festival content. But you would have to try it out and see what results you get. Unfortunately, as in this case, it is often quite hard to guess the contents of a column based only on column names.
Get the percentage of movies who won awards out of all movies produced
between the years 1970 and 1990.
Based on the info stated in your question, I would go with a guess that the award column in the films table contains a boolean or something that states if the movie won an award or not. You'd have to try this out. If that's the case, you can use a COUNT(*) on all movies between 1970 and 1990 and a COUNT(*) on all movies WHERE award = 1 (or something) to get the total numbers.
You could indeed combine these in a computation query with a UNION. Example that might help you:
SELECT SUM(cnt1) / SUM(cnt2) ... do the right computation here ...
FROM ( SELECT COUNT(*) AS cnt1
,0 AS cnt2
FROM table1
UNION ALL
SELECT 0 AS cnt1
,COUNT(*) AS cnt2
FROM table2) AS sub