Ordering a has_many :through by a condition / subquery in rails - sql

I have something like the following (simplified and kittenified):
Kitten has a name
Kitten has many-to-many relationship with VisitedCountries (through KittensVisitedCountry)
VisitedCountry has a country_code such as 'us', 'gb', 'fr' etc.
I want to try and order my kittens by whether they've been to the UK (country code 'gb'), then by age (desc). For example:
bob has been to 'fr' and 'us', and is 3yo
alice has been to 'us' and 'gb' and is 4yo
frances has been nowhere, but is 6yo
colin has been to 'gb' and is 2yo
So in this case, I would want them ordered as:
alice (4) - because she is the oldest that has been to the UK
colin (2) - because he is the youngest that has been to the UK
frances (6) - because she is the oldest that hasn't been to the UK
bob (3)
The closest I've been able to get - and it might work if it wasn't a many-to-many, is the following:
Kitten.joins(:visited_countries)
.select('kittens.*', "(visited_countries.country_code = 'gb') as has_visited_gb")
.order('has_visited_gb DESC', 'age DESC')
That gave me:
alice (4, gb us)
colin (2, gb)
alice (4, gb us)
bob (3, us fr)
bob (3, us fr)
Obviously this isn't quite what we want - repetition, and poor frances doesn't even appear.
I'm assuming some kind of subquery is needed, but my sql-fu is rusty. I'm using postgres underneath, if that's any help.
Edit: I should mention, I've tried using .includes instead of .joins - I believe it attempts multiple SQL statements when you try that, so the select can't be applied - you get this error:
Kitten Load (0.7ms) SELECT kittens.*, (visited_countries.country_code = 'gb') as has_visited_gb FROM "kittens" ORDER BY has_visited_gb DESC, age DESC LIMIT $1 [["LIMIT", 11]]
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: missing FROM-clause entry for table "visited_countries"
LINE 1: SELECT kittens.*, (visited_countries.country_code = 'gb') a...
^
: SELECT kittens.*, (visited_countries.country_code = 'gb') as has_visited_gb FROM "kittens" ORDER BY has_visited_gb DESC, age DESC LIMIT $1
Edit 2:
I feel like I might simplify it further by ignoring the VisitedCountry table and hard-coding the UK's id in the query:
Kitten.joins(:kittens_visited_countries).
select('kittens.*', "(kittens_visited_countries.visited_country_id = 1) as has_visited_gb").
order('has_visited_gb DESC', 'age DESC')
this gives me:
alice (4, gb us)
colin (2, gb)
alice (4, gb us)
bob (3, us fr)
bob (3, us fr)
But still duplicated, still missing Frances. If I could get .includes() to do a join instead of multiple queries, this would work - but I don't know how to encourage that.
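One possible way around both the duplication and the missing rows is to leave the join out of the FROM clause and order by a correlated EXISTS subquery instead. Here is a minimal sketch of that idea, run through Python's sqlite3 module rather than Postgres. The table and column names follow Rails conventions and are assumptions (kitten_id in particular does not appear above), and the link rows are made up to match the example kittens.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kittens (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE visited_countries (id INTEGER PRIMARY KEY, country_code TEXT);
CREATE TABLE kittens_visited_countries (kitten_id INTEGER, visited_country_id INTEGER);

INSERT INTO kittens VALUES (1, 'bob', 3), (2, 'alice', 4), (3, 'frances', 6), (4, 'colin', 2);
INSERT INTO visited_countries VALUES (1, 'gb'), (2, 'us'), (3, 'fr');
-- bob: fr, us / alice: us, gb / frances: nowhere / colin: gb
INSERT INTO kittens_visited_countries VALUES (1, 3), (1, 2), (2, 2), (2, 1), (4, 1);
""")

# One row per kitten: the many-to-many lookup lives inside EXISTS,
# so it cannot multiply rows or drop kittens with no visits.
rows = conn.execute("""
SELECT k.name, k.age
FROM kittens k
ORDER BY EXISTS (
    SELECT 1
    FROM kittens_visited_countries kvc
    JOIN visited_countries vc ON vc.id = kvc.visited_country_id
    WHERE kvc.kitten_id = k.id AND vc.country_code = 'gb'
) DESC, k.age DESC
""").fetchall()

ordered_names = [name for name, _age in rows]
print(ordered_names)
```

In Rails the same ORDER BY expression could presumably be passed to .order wrapped in Arel.sql(...), but I have not tried that against this schema.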

Related

Filter condition using "parent" CurrentMember

Here is the data-set:
CREATE TABLE Movies(id INT, name VARCHAR(50), genre VARCHAR(50), budget DECIMAL(10));
INSERT INTO Movies VALUES
(1, 'Pirates of the Caribbean', 'Fantasy', 379000000),
(2, 'Avengers', 'Superhero', 365000000),
(3, 'Star Wars', 'Science fiction', 275000000),
(4, 'John Carter', 'Science fiction', 264000000),
(5, 'Spider-Man', 'Superhero', 258000000),
(6, 'Harry Potter', 'Fantasy', 250000000),
(7, 'Avatar', 'Science fiction', 237000000);
Filtering relative to a constant value is no problem, e.g. to get all the movies with a budget higher than $300M:
WITH
MEMBER X AS SetToStr(Filter(Movie.[Name].[Name].Members - Movie.[Name].CurrentMember, Measures.Budget > 300000000))
SELECT
Movie.[Name].[Name].Members ON ROWS,
X ON COLUMNS
FROM
Cinema
Which gives:
Avatar {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
Avengers {[Movie].[Name].&[Pirates of the Caribbean]}
Harry Potter {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
John Carter {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
Pirates of the Caribbean {[Movie].[Name].&[Avengers]}
Spider-Man {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
Star Wars {[Movie].[Name].&[Avengers],[Movie].[Name].&[Pirates of the Caribbean]}
But how can I compare against the budget of the current movie instead of the hard-coded $300M, to get the movies more expensive than the current one?
It would give {} for "Pirates of the Caribbean" as it is the most expensive movie.
For "Avengers" it would be { 'Pirates of the Caribbean' } as this is the second most expensive and only "Pirates of the Caribbean" is more expensive.
For "Avatar" it would give all the other movies as it is the least expensive.
The issue is that inside the Filter function's condition CurrentMember refers to the currently tested tuple and not the one currently selected on the ROWS axis.
Instead of using Filter() for each movie, I would first compute an ordered set of movies based on budget values. Then X could be defined using the SubSet and Rank function.
Here is an example using a different schema but I guess you'll get the point easily:
with
set ordered_continents as order( [Geography].[Geography].[Continent], -[Measures].[#Sales] )
member xx as SetToStr( SubSet( ordered_continents, 0, Rank( [Geography].[Geography].currentMember, ordered_continents) - 1))
select {[#Sales], [xx] } on 0, [Geography].[Geography].[Continent] on 1 from [Sales]
I'm not familiar with SSAS so I'm using icCube but I guess the MDX should be very much similar.
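To make the order-then-subset idea concrete outside MDX, here is a rough Python sketch using the movie budgets from the question. It only mirrors the logic of Order, Rank and SubSet; the function name more_expensive_than is made up for illustration, and ties in budget are not handled.

```python
# Budgets taken from the INSERT statement in the question.
movies = {
    "Pirates of the Caribbean": 379_000_000,
    "Avengers": 365_000_000,
    "Star Wars": 275_000_000,
    "John Carter": 264_000_000,
    "Spider-Man": 258_000_000,
    "Harry Potter": 250_000_000,
    "Avatar": 237_000_000,
}

# Order the set once, most expensive first (the role of Order() above).
ordered = sorted(movies, key=movies.get, reverse=True)


def more_expensive_than(name):
    """Return the movies ranked before `name`, i.e. SubSet(ordered, 0, rank - 1)."""
    rank = ordered.index(name) + 1  # 1-based, like Rank()
    return ordered[: rank - 1]


for name in sorted(movies):
    print(name, more_expensive_than(name))
```

The point of the design is that the ordering is computed once and each row only needs its own rank, instead of running a filter over the whole set per movie.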

How to get the differences between two rows **and** the name of the field where the difference is, in BigQuery?

I have a table in BigQuery like this:
Name | Phone Number | Address
-------------------------------------------
John | 123456778564 | 1 Penny Lane
John | 873452987424 | 1 Penny Lane
Mary | 845704562848 | 87 5th Avenue
Mary | 845704562848 | 54 Lincoln Rd.
Amy | 342847327234 | 4 Ocean Drive Avenue
Amy | 347907387469 | 98 Truman Rd.
I want to get a table with the differences between two consecutive rows and the name of the field where the difference occurs, like this:
Name | Field | Before | After
-------------------------------------------
John | Phone Number | 123456778564 | 873452987424
Mary | Address | 87 5th Avenue | 54 Lincoln Rd.
Amy | Phone Number | 342847327234 | 347907387469
Amy | Address | 4 Ocean Drive Avenue | 98 Truman Rd.
How can I do this? I've looked at other posts but couldn't find anything that matches my need.
Thank you
Consider the BigQuery-style solution below:
select Name, ['Phone Number', 'Address'][offset(offset)] Field,
prev_field as Before, field as After
from (
select timestamp, Name, offset, field,
lag(field) over (partition by Name, offset order by timestamp) as prev_field
from yourtable,
unnest([`Phone Number`, Address]) field with offset
)
where prev_field != field
if applied to sample data in your question - output is
As you can see here - no matter how many columns in your table that you need to compare - it is still just one query - no unions and such.
You just need to enumerate your columns in two places
['Phone Number', 'Address'][offset(offset)] Field
and
unnest([`Phone Number`, Address]) field with offset
Note: you can further refactor above using scripting's execute immediate to compose such lists within the query on the fly (check my other answers - I frequently use such technique in them)
One method is just to use lag() and union all:
select name, 'phone', prev_phone as before, phone as after
from (select name, phone,
lag(phone) over (partition by name order by timestamp) as prev_phone
from t
) t
where prev_phone <> phone
union all
select name, 'address', prev_address as before, address as after
from (select name, address,
lag(address) over (partition by name order by timestamp) as prev_address
from t
) t
where prev_address <> address
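Both answers rely on a column that orders the rows within each name, which the sample table in the question does not show. Below is a small sketch of the lag() idea that can be run locally with Python's sqlite3 module (SQLite 3.25 or newer for window functions) instead of BigQuery. The ts column is a hypothetical stand-in for that ordering column, and the table name t is borrowed from the second answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- ts is a hypothetical ordering column; the question's sample has none.
CREATE TABLE t (ts INTEGER, name TEXT, phone TEXT, address TEXT);
INSERT INTO t VALUES
  (1, 'John', '123456778564', '1 Penny Lane'),
  (2, 'John', '873452987424', '1 Penny Lane'),
  (1, 'Mary', '845704562848', '87 5th Avenue'),
  (2, 'Mary', '845704562848', '54 Lincoln Rd.'),
  (1, 'Amy',  '342847327234', '4 Ocean Drive Avenue'),
  (2, 'Amy',  '347907387469', '98 Truman Rd.');
""")

diffs = conn.execute("""
WITH unpivoted AS (
    -- One row per (name, field); add one SELECT per column to compare.
    SELECT ts, name, 'Phone Number' AS field, phone AS value FROM t
    UNION ALL
    SELECT ts, name, 'Address' AS field, address AS value FROM t
),
with_prev AS (
    SELECT name, field, value,
           LAG(value) OVER (PARTITION BY name, field ORDER BY ts) AS prev_value
    FROM unpivoted
)
SELECT name, field, prev_value AS before, value AS after
FROM with_prev
WHERE prev_value <> value   -- the first row per group has NULL and drops out
ORDER BY name, field
""").fetchall()

for row in diffs:
    print(row)
```

Unpivoting first means lag() is written only once, at the cost of casting every compared column to a common type if they differ.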

How to filter query based on table where they dont share a field?

OK, so I've been working on this SQL statement for a while and just can't figure it out. I need to be able to see the top 5 authors that clients have borrowed from in 2017. My tables look like this:
1. Client - fields (clientId,clientFirstName,clientlastName,clidentDoB)
2. author - fields (AuthorID, AuthorFirstname, AuthorLastname,AuthorNation )
3. book - fields (BookId, BookAuthor,BookTitle,BookGenre)
4. borrower - fields (borrowID,BorrowDate, ClientID,BookId)
so I understand that I need to pull the names from author table, based on the number of books borrowed, I also understand that borrower.bookId is equal to book.BookId and Author.AuthorID is equal to Book.BookAuthor.
I should be able to set it so that it sees books borrowed in 2017, then filters by the most popular by taking the borrowBookId and adding each instance of the same Id together and seeing what bookID matches BookAuthor in book table and then use that to compare ID to get the first and last name printed.
I have tried
SELECT author.authorfirstname,author.authorlastname
FROM author Join ON author.authorid = book.bookauthor
WHERE (borrower.borrowdate <='31/12/2017' AND borrower.borrowdate >= '01/01/2017');
I know this won't work, but I'm not sure how to get that bridge from author to borrower.
Sample data and expected output from it.
OK, sure. Let's say I have 4 authors and we want the top 3. Only borrows in 2017 count. The client table isn't really needed for this, so let's fill in the others with some data. The table field names are the same as in the original question. Sample data for this would be:
author table
(1,bob,ross, USA)
(2, fred, martin, USA)
(3, alex,joe,CAN)
(4, dan, reed, can)
Book table
(1,1,bobsbook,fantasy)
(2,1,bobagain,fantasy)
(3,1,returnofbob,fantasy)
(4,2,fredsadventure,fantasy)
(5, 2, fedagain, fantasy)
(6, 2, fedstrikes, fantasy)
(7,3,alexjoes, fantasy)
(8, 3, alexjoeagain,fantasy)
(9,4, dansbook, fantasy)
borrow table
(1, 20/01/2017, 1, 1)
(2, 20/01/2017, 3, 2)
(3, 20/01/2017, 2, 1)
(4, 20/01/2017, 1, 3)
(5, 20/01/2017, 6, 2)
(6, 20/01/2017, 8, 4)
(7, 20/01/2017, 4, 4)
(8, 20/01/2017, 9, 6)
(9, 20/01/2017, 2, 7)
(10, 20/01/2017, 3, 9)
(11, 20/01/2017, 4, 9)
the end result would be
AuthorFirstName AuthorLastname
bob ross
Fred Martin
Dan Reed
This is because they had the most borrows in 2017 date range, they are in order with bob at 5, fred at 3 and dan at 2. It also only prints the top 3 people so alex joe is left off the list.
Code given to me by @fahmi:
SELECT author.authorfirstname,author.authorlastname
FROM author
Join book ON author.authorid = book.bookauthor
join borrower on book.bookid=borrower.bookid
WHERE borrower.borrowdate <='31/12/2017' AND borrower.borrowdate >= '01/01/2017'
;
This gives me a list of the authors, but it lists each instance of a borrow. I need the list merged so that I only see each name once, ordered by most borrows, and limited to the top few.
Use TOP-N query feature from Oracle,
select au.authorfirstname
,au.authorlastname
from author au
join book bk
on au.authorid = bk.bookauthor
join borrower bw
on bk.bookid=bw.bookid
where extract(year from bw.borrowdate)
= 2017
group by au.authorfirstname
,au.authorlastname
order by count(bk.bookid) desc
fetch first 3 rows with ties;
You need another join to borrower with book.bookid=borrower.bookid relationship
SELECT top 3 author.authorfirstname,author.authorlastname,count(borrower.bookid) as cnt
FROM author
Join book ON author.authorid = book.bookauthor
join borrower on book.bookid=borrower.bookid
WHERE borrower.borrowdate <='31/12/2017' AND borrower.borrowdate >= '01/01/2017'
group by author.authorfirstname,author.authorlastname
order by cnt desc
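For anyone who wants to try the join, group and limit steps against the sample data from the question, here is a small sketch using Python's sqlite3 module. SQLite spells the row limit as LIMIT rather than TOP or FETCH FIRST, the dates are stored as ISO strings so they compare correctly, and I am reading the last value of each borrow row as the BookId; all three are assumptions of this sketch rather than part of either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (authorid INTEGER, authorfirstname TEXT, authorlastname TEXT, authornation TEXT);
CREATE TABLE book (bookid INTEGER, bookauthor INTEGER, booktitle TEXT, bookgenre TEXT);
CREATE TABLE borrower (borrowid INTEGER, borrowdate TEXT, clientid INTEGER, bookid INTEGER);

INSERT INTO author VALUES
  (1, 'bob', 'ross', 'USA'), (2, 'fred', 'martin', 'USA'),
  (3, 'alex', 'joe', 'CAN'), (4, 'dan', 'reed', 'CAN');
INSERT INTO book VALUES
  (1, 1, 'bobsbook', 'fantasy'), (2, 1, 'bobagain', 'fantasy'), (3, 1, 'returnofbob', 'fantasy'),
  (4, 2, 'fredsadventure', 'fantasy'), (5, 2, 'fedagain', 'fantasy'), (6, 2, 'fedstrikes', 'fantasy'),
  (7, 3, 'alexjoes', 'fantasy'), (8, 3, 'alexjoeagain', 'fantasy'), (9, 4, 'dansbook', 'fantasy');
-- Dates rewritten from 20/01/2017 to ISO format so string comparison works.
INSERT INTO borrower VALUES
  (1, '2017-01-20', 1, 1), (2, '2017-01-20', 3, 2), (3, '2017-01-20', 2, 1),
  (4, '2017-01-20', 1, 3), (5, '2017-01-20', 6, 2), (6, '2017-01-20', 8, 4),
  (7, '2017-01-20', 4, 4), (8, '2017-01-20', 9, 6), (9, '2017-01-20', 2, 7),
  (10, '2017-01-20', 3, 9), (11, '2017-01-20', 4, 9);
""")

top_authors = conn.execute("""
SELECT a.authorfirstname, a.authorlastname, COUNT(*) AS borrows
FROM author a
JOIN book b      ON a.authorid = b.bookauthor
JOIN borrower br ON b.bookid = br.bookid
WHERE br.borrowdate >= '2017-01-01' AND br.borrowdate < '2018-01-01'
GROUP BY a.authorid, a.authorfirstname, a.authorlastname
ORDER BY borrows DESC
LIMIT 3
""").fetchall()

for row in top_authors:
    print(row)
```

Grouping by the author id as well as the name keeps two different authors who happen to share a name from being merged into one row.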

Pset7 - Movies Stuck on 12 and 13 SQL?

I am currently working on CS50 PSET7 (https://cs50.harvard.edu/x/2020/psets/7/movies/) and I cannot figure out how to do 12.sql and 13.sql (explained in the link). Can someone please help me?
For 12.sql: Find movie titles where 'id' in "id's of Johnny Depp movies" and 'id' in "id's of Helena Bonham Carter movies", such as:
SELECT "title" FROM "movies"
WHERE "id" IN (-- code to select movie id's in which "Johnny Depp" starred)
AND "id" IN (-- code to select movie id's in which "Helena Bonham Carter" starred);
For 13.sql: Find names of people where "person_id's" in "stars" correspond to the "movie_id" in which "Kevin Bacon (born: 1958)" starred, and names != "Kevin Bacon", such as:
SELECT "name" FROM "people"
WHERE "id" IN
(-- select "person id's" from "stars" where "movie id" in
(-- select "movie id's" in which "Kevin Bacon (born: 1958)" starred))
AND "name" != 'Kevin Bacon';
Inside the second brackets of 13.sql, to query "Kevin Bacon born in 1958", you can write some code like this:
... WHERE "people"."name" = 'Kevin Bacon' AND "people"."birth" = 1958))...
Think simple, no need to do anything fancy.
12.sql
Consider using HAVING COUNT()
https://www.w3resource.com/sql/aggregate-functions/count-having.php
13.sql
As I've also answered in another thread, I found these steps helpful:
Get the ID of Kevin Bacon, with the criteria that it's the Kevin Bacon who was born in 1958
Get the movie IDs of Kevin Bacon using his ID (hint: linking his ID in table1 with table2)
Get other stars' IDs with the same movie IDs
Get the name of these stars, and exclude Kevin Bacon (because the spec says he shouldn't be included in the resulting list)
For both of these Psets you need to use nested SELECT statements e.g.:
SELECT table.column FROM table WHERE table.column IN (SELECT table.column2 FROM table WHERE ...)
Based on my experience for 12 you will need to use 2 separate nested queries (each of which should have multiple values) and then use an AND operator to find movies that appear in both of these.
For 13 I found using several nested queries helped, starting with finding the id for Kevin Bacon and working up to selecting people.name values from a query that contained multiple possible people.id values.
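To see how the nested IN pattern behaves, here is a toy sketch in Python's sqlite3 module. It uses a cut-down version of the movies, people and stars tables; the rows (movie titles, the co-stars, the second Kevin Bacon born in 1990) are invented purely for illustration and are not from the real CS50 database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birth INTEGER);
CREATE TABLE stars (movie_id INTEGER, person_id INTEGER);

-- Invented toy data.
INSERT INTO movies VALUES (1, 'Movie A'), (2, 'Movie B'), (3, 'Movie C');
INSERT INTO people VALUES
  (1, 'Johnny Depp', NULL), (2, 'Helena Bonham Carter', NULL),
  (3, 'Kevin Bacon', 1958), (4, 'Kevin Bacon', 1990),
  (5, 'Costar One', NULL), (6, 'Costar Two', NULL);
INSERT INTO stars VALUES
  (1, 1), (1, 2),          -- Movie A: Depp and Bonham Carter
  (2, 1), (2, 3), (2, 5),  -- Movie B: Depp, Kevin Bacon (1958), Costar One
  (3, 2), (3, 4), (3, 6);  -- Movie C: Bonham Carter, the other Kevin Bacon, Costar Two
""")

# 12.sql pattern: the movie id must appear in both nested result sets.
both = [r[0] for r in conn.execute("""
SELECT title FROM movies
WHERE id IN (SELECT movie_id FROM stars WHERE person_id IN
               (SELECT id FROM people WHERE name = 'Johnny Depp'))
  AND id IN (SELECT movie_id FROM stars WHERE person_id IN
               (SELECT id FROM people WHERE name = 'Helena Bonham Carter'))
""")]

# 13.sql pattern: work outwards from Kevin Bacon's id to his movies to their stars.
costars = [r[0] for r in conn.execute("""
SELECT name FROM people
WHERE id IN (SELECT person_id FROM stars WHERE movie_id IN
               (SELECT movie_id FROM stars WHERE person_id IN
                  (SELECT id FROM people WHERE name = 'Kevin Bacon' AND birth = 1958)))
  AND name != 'Kevin Bacon'
ORDER BY name
""")]

print(both)
print(costars)
```

Without the birth = 1958 condition, the second Kevin Bacon's movie would leak into the result, which is why the innermost query filters on both name and birth.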

Fuzzy grouping in Postgres

I have a table with contents that look similar to this:
id | title
------------
1 | 5. foo
2 | 5.foo
3 | 5. foo*
4 | bar
5 | bar*
6 | baz
7 | BAZ
…and so on. I would like to group by the titles and ignore the extra bits. I know Postgres can do this:
SELECT * FROM (
SELECT regexp_replace(title, '[*.]+$', '') AS title
FROM table
) AS a
GROUP BY title
However, that's quite simple and would get very unwieldy if I tried to anticipate all the possible variations. So, the question is, is there a more general way to do fuzzy grouping than using regexp? Is it even possible, at least without breaking one's back doing it?
Edit: To clarify, there is no preference for any of the variations, and this is what the table should look like after grouping:
title
------
5. foo
bar
baz
I.e., the variations would be items that are different just by a few characters or capitalization, and it doesn't matter which ones are left as long as they're grouped.
For any grouping you should have transitive equality, that is a ~= b, b ~= c => a ~= c.
Formulate it strictly using words and we'll try to formulate it using SQL.
For instance, which group should foo*bar go to?
Update:
This query strips all non-alphanumeric characters, ignores case, and returns one title from each group (which one is arbitrary unless you add an ORDER BY):
SELECT DISTINCT ON (REGEXP_REPLACE(UPPER(title), '[^[:alnum:]]', '', 'g')) title
FROM (
VALUES
(1, '5. foo'),
(2, '5.foo'),
(3, '5. foo*'),
(4, 'bar'),
(5, 'bar*'),
(6, 'baz'),
(7, 'BAZ')
) rows (id, title)
At some time, you are going to have to define what makes a set of values belong together in a group. If that's too hard, maybe you should prohibit and inhibit the entry of fuzzy data, or if you must permit it, add a column that contains a sanitized version of the title for use by the grouping operations.
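If you go the sanitized-column route, the normalization itself can be quite small. Here is a rough Python sketch of the same rule as the query above (drop non-alphanumerics, ignore case), applied to the sample titles. The function name group_key is made up, and the character class is ASCII-only, which is narrower than Postgres's locale-aware [:alnum:].

```python
import re

titles = ["5. foo", "5.foo", "5. foo*", "bar", "bar*", "baz", "BAZ"]


def group_key(title):
    """Sanitized form used for grouping: ASCII letters and digits only, upper-cased."""
    return re.sub(r"[^0-9A-Za-z]", "", title).upper()


# Collect titles under their sanitized key; dicts keep insertion order.
groups = {}
for title in titles:
    groups.setdefault(group_key(title), []).append(title)

# Keep the first title seen in each group as its representative.
representatives = [members[0] for members in groups.values()]

print(groups)
print(representatives)
```

Because every title maps to exactly one key, the grouping is transitive by construction, which is the property the first part of this answer asks for. Anything fuzzier (edit distance, for example) loses that guarantee.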