SQL - HAVING with recursive statement - sql

I am not sure the title corresponds to my problem but this is the best way I could describe it.
I was wondering if it was possible theoretically to have a query with a recursive statement in the having clause. For instance:
SELECT a_name
FROM (
SELECT Author.id as a_id, Author.name as a_name, COUNT ( * ) as science_fiction_per_author
FROM Title, Author, Title_tags, Tags, Publication_authors, Publication, Publication_content
WHERE ...
GROUP BY a_id
HAVING science_fiction_per_author = MAX(science_fiction_per_author)
);

It's not really recursion but you can use the same phrase twice to find the individual with the highest count. This is based on the movie database at http://sqlzoo.net/wiki/More_JOIN_operations
The phrase
SELECT actor.name,count(movieid) moviesPerActor
FROM actor join casting ON actor.id=actorid
GROUP BY actor.name
Lists each actor name and the number of movies made.
The same phrase is used twice. The A version is used to find the highest number of movies made by a single actor, the B version is used to identify that actor.
SELECT B.name from
(SELECT MAX(moviesPerActor) as highestNumberOfMovies
FROM(
SELECT actor.name,count(movieid) moviesPerActor
FROM actor join casting ON actor.id=actorid
GROUP BY actor.name
) as A) as A1
JOIN
(
SELECT actor.name,count(movieid) moviesPerActor
FROM actor join casting ON actor.id=actorid
GROUP BY actor.name
) as B
ON B.moviesPerActor = highestNumberOfMovies
In some SQL dialects (including MSSQL and Oracle) you can use window functions to identify the element with the highest value more efficiently.
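To come back to the original question: HAVING cannot compare an aggregate with an aggregate of itself, but it can compare it with a scalar subquery. The following is only a sketch of that idea, run on SQLite through Python's sqlite3 module. The table contents are invented for illustration; the table and column names follow the sqlzoo example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE casting (movieid INTEGER, actorid INTEGER);
-- Invented sample data: Ann appears in 3 movies, Bob in 2, Cy in 1.
INSERT INTO actor VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO casting VALUES (10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3);
""")

query = """
SELECT actor.name, COUNT(movieid) AS moviesPerActor
FROM actor JOIN casting ON actor.id = actorid
GROUP BY actor.name
-- The scalar subquery computes the highest per-actor count once.
HAVING COUNT(movieid) = (
    SELECT MAX(moviesPerActor)
    FROM (
        SELECT COUNT(movieid) AS moviesPerActor
        FROM casting
        GROUP BY actorid
    )
)
"""
top_actors = conn.execute(query).fetchall()
print(top_actors)
```

If several actors tie for the highest count, all of them are returned, which is the same behaviour as the A/B join above.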

Related

SQL: Find all rows in a table when the rows are a foreign key in another table

The caveat here is I must complete this with only the following tools:
1. The basic SQL construct: SELECT FROM .. AS WHERE... Distinct is ok.
2. Set operators: UNION, INTERSECT, EXCEPT
3. Create temporary relations: CREATE VIEW... AS ...
4. Arithmetic operators like <, >, <=, == etc.
5. Subquery can be used only in the context of NOT IN or a subtraction operation. I.e. (select ... from... where not in (select...)
I can NOT use any join, limit, max, min, count, sum, having, group by, not exists, any exists, count, aggregate functions or anything else not listed in 1-5 above.
Schema:
People (id, name, age, address)
Courses (cid, name, department)
Grades (pid, cid, grade)
I satisfied the query but I used not exists (which I can't use). The sql below shows only people who took every class in the Courses table:
select People.name from People
where not exists
(select Courses.cid from Courses
where not exists
(select grades.cid from grades
where grades.cid = courses.cid and grades.pid = people.id))
Is there a way to solve this by using not in or some other method that I am allowed to use? I've struggled with this for hours. If anyone can help with this goofy obstacle, I'll gladly upvote and accept your answer.
As Nick.McDermaid said you can use except to identify students that are missing classes and not in to exclude them.
1. Get the complete list with a Cartesian product of people x courses. This is what grades would look like if every student had taken every course.
create view complete_view as
select people.id as pid, courses.cid as cid
from people, courses
2. Use except to identify students that are missing at least one class.
create view missing_view as select distinct pid from (
select pid, cid from complete_view
except
select pid, cid from grades
) t
3. Use not in to select students that aren't missing any classes.
select * from people where id not in (select pid from missing_view)
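The three steps can be tried end to end on SQLite, for example through Python's sqlite3 module. This is a minimal sketch with invented sample rows; the schema is the one from the question. As a small variation, missing_view keeps the (pid, cid) pairs directly, so no subquery is needed in a FROM clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (id INTEGER, name TEXT, age INTEGER, address TEXT);
CREATE TABLE Courses (cid INTEGER, name TEXT, department TEXT);
CREATE TABLE Grades (pid INTEGER, cid INTEGER, grade TEXT);
-- Invented sample data: Ann took both courses, Bob one, Cy none.
INSERT INTO People VALUES (1, 'Ann', 20, 'x'), (2, 'Bob', 21, 'y'), (3, 'Cy', 22, 'z');
INSERT INTO Courses VALUES (100, 'Algebra', 'Math'), (200, 'Optics', 'Physics');
INSERT INTO Grades VALUES (1, 100, 'A'), (1, 200, 'B'), (2, 100, 'C');

-- Step 1: every (person, course) pair that would exist if everyone took everything.
CREATE VIEW complete_view AS
SELECT People.id AS pid, Courses.cid AS cid FROM People, Courses;

-- Step 2: pairs that are expected but absent from Grades.
CREATE VIEW missing_view AS
SELECT pid, cid FROM complete_view
EXCEPT
SELECT pid, cid FROM Grades;
""")

# Step 3: keep only people who never show up in missing_view.
took_all = conn.execute(
    "SELECT name FROM People WHERE id NOT IN (SELECT pid FROM missing_view)"
).fetchall()
print(took_all)
```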
As Nick suggests, you can use EXCEPT in this case. Here is the sample:
select People.name from People
EXCEPT
select p.name from People AS p
join Grades AS g on g.pid = p.id
join Courses as c on c.cid = g.cid
You can turn the first not exists into not in using a constant value.
select *
from People a
where 1 not in (
select 1
from courses b
...

What am I missing? Do I need to use JOIN or UNION or a subquery?

I'm trying to get some practice making queries with SQL.
I'm working with a playground that uses SQLite.
There are two tables: books_north and books_south
Both have columns for: id, title, author, genre and first_published
The query I'm trying is to generate a report that lists the book titles from both locations and count the total number of books with the same title.
I can't work out how to even get started with the count.
So far I have
SELECT title
FROM books_north
INNER JOIN books_south
ON books_north.title = books_south.title;
But it just says that title is an ambiguous column.
How do I do this? Thank you
You need UNION ALL to get the count of each title
Select Title,Count(1) as [Count]
From
(
SELECT title FROM books_north bn
union all
select title from books_south bs
) A
Group by Title
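Another approach uses FULL OUTER JOIN (if your RDBMS supports it). The counts are wrapped in COALESCE because a title that exists in only one table has a NULL count on the other side:
SELECT COALESCE(bn.Title, bs.title) as title,
( COALESCE(bn.[count], 0) + COALESCE(bs.[count], 0) ) AS [Count]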
FROM (SELECT title,
Count(1) AS [count]
FROM books_north
GROUP BY title) bn
FULL OUTER JOIN (SELECT title,
Count(1) AS [count]
FROM books_south
GROUP BY title) bs
ON bn.Title = bs.Title
Regarding your error message: the Title column is present in both tables, so when you select it you need to tell the database which table to read it from. You can do that by giving the tables alias names in the join:
SELECT COUNT( DISTINCT a.title) AS TITLECount
FROM books_north a
INNER JOIN books_south b
ON a.title = b.title;
A simple inner join would be sufficient to get your count. Use a table alias in SELECT to remove the ambiguity of column title as it is present in both the tables.
In regard to the error that you mentioned:
The problem is SELECT title - it asks you to be specific about where title shall be read from, books_north or books_south.
So you need to write either SELECT books_north.title or SELECT books_south.title; that's all there is to the ambiguity error.
The count is explained in the other answers. You need to learn GROUP BY if you want to display both title and count (it is basically a group by title in your case).
SELECT books_north.title, count(books_north.title)
FROM books_north
INNER JOIN books_south
ON books_north.title = books_south.title
GROUP BY books_north.title;
This works:
SELECT title, COUNT(title) count FROM
(
SELECT title FROM books_north
UNION ALL
SELECT title FROM books_south
)
GROUP BY title;
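The difference between the UNION ALL and INNER JOIN answers is easiest to see on a small data set. Below is a sketch using Python's sqlite3 module (the asker's playground is SQLite). The rows are invented for illustration; the column list is the one given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books_north (id INTEGER, title TEXT, author TEXT, genre TEXT, first_published INTEGER);
CREATE TABLE books_south (id INTEGER, title TEXT, author TEXT, genre TEXT, first_published INTEGER);
-- Invented sample data: 'Dune' is in both tables (twice in the north),
-- 'Emma' only in the north, 'Ulysses' only in the south.
INSERT INTO books_north VALUES
  (1, 'Dune', 'Herbert', 'SF', 1965),
  (2, 'Emma', 'Austen', 'Classic', 1815),
  (3, 'Dune', 'Herbert', 'SF', 1965);
INSERT INTO books_south VALUES
  (1, 'Dune', 'Herbert', 'SF', 1965),
  (2, 'Ulysses', 'Joyce', 'Classic', 1922);
""")

# UNION ALL stacks the two title lists, so every copy is counted once.
union_counts = conn.execute("""
SELECT title, COUNT(*) AS total
FROM (
    SELECT title FROM books_north
    UNION ALL
    SELECT title FROM books_south
)
GROUP BY title
ORDER BY title
""").fetchall()

# INNER JOIN only keeps titles present in both tables and counts matched pairs.
join_counts = conn.execute("""
SELECT books_north.title, COUNT(books_north.title)
FROM books_north
INNER JOIN books_south ON books_north.title = books_south.title
GROUP BY books_north.title
""").fetchall()

print(union_counts)
print(join_counts)
```

With this data the UNION ALL query reports three copies of 'Dune' and one each of the other titles, while the INNER JOIN query reports only 'Dune', with a count of 2 (two northern rows times one southern row).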

SQL inner join with count() condition, and relational algebra

I have these tables:
Movies (id, name)
Cast (idmovie, actor)
And I would like to count the number of actors for each movie and then only get movies with more than 10 actors. I have a query to count the number of actors for each movie, which goes like this:
SELECT idmovie, count(actor) FROM Cast GROUP BY idmovie HAVING count(actor) > 10;
Now, I wonder how to get that result and join it to the Movies table.
I tried:
SELECT name FROM Movies INNER JOIN (SELECT idmovie FROM Cast GROUP BY idmovie HAVING count(actor) >2) Cast ON Cast.idmovie = Movies.id;
But it doesn't work.
I also have to translate it to relational algebra.
π name (σ (count(σ id = idmovie))) Movies⨝Cast
Which is obviously wrong...
Any help?
Try this...
SELECT m.name, COUNT(c.actor) AS 'ActorsCount'
FROM Movies m INNER JOIN [Cast] c ON m.id = c.idmovie
GROUP BY m.name HAVING COUNT(c.actor) > 10;
The query looks correct to me except perhaps that you aliased the nested query with Cast which is also the name of a table. I'm not sure what effect that'd have but I'd expect it to confuse MySQL. Try the following:
SELECT name FROM Movies INNER JOIN
(SELECT idmovie FROM Cast GROUP BY idmovie HAVING count(actor) >2) CastCount
ON CastCount.idmovie = Movies.id;
I didn't try it, but I think that'll work
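For what it is worth, the derived-table version can be checked quickly on SQLite via Python's sqlite3 module. This is a sketch only: the rows are invented, the threshold is passed as a parameter (2 here, as in the asker's attempt), and the table name is quoted because CAST is also an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "Cast" (idmovie INTEGER, actor TEXT);
-- Invented sample data: movie 1 has three actors, movie 2 has two.
INSERT INTO Movies VALUES (1, 'Big Ensemble'), (2, 'Two Hander');
INSERT INTO "Cast" VALUES (1, 'a1'), (1, 'a2'), (1, 'a3'), (2, 'b1'), (2, 'b2');
""")

query = """
SELECT m.name
FROM Movies m
INNER JOIN (
    -- One row per movie whose actor count exceeds the threshold.
    SELECT idmovie
    FROM "Cast"
    GROUP BY idmovie
    HAVING COUNT(actor) > ?
) CastCount ON CastCount.idmovie = m.id
"""
min_actors = 2  # hypothetical threshold; the question ultimately wants 10
big_casts = conn.execute(query, (min_actors,)).fetchall()
print(big_casts)
```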

PostgreSQL - Query with aggregate functions

I need some help for a PostgreSQL query.
I have 4 tables involved in it: customer, organisation_complete, entity and address. I retrieve some data from each of them with this query:
SELECT distinct ON (c.customer_number, trim(lower(o.name)), a.street, a.zipcode, a.area, a.country)
c.xid AS customer_xid, o.xid AS entity_xid, c.customer_number, c.deleted, o.name, o.vat, 'organisation' AS customer_type, a.street, a.zipcode, a.city, a.country
FROM customer c
INNER JOIN organisation_complete o ON (c.xid = o.customer_xid AND c.deleted = 'FALSE')
INNER JOIN entity e ON e.customer_xid = o.customer_xid
INNER JOIN address a ON (a.contact_info_xid = e.contact_info_xid and a.address_type = 'delivery')
WHERE c.account_xid = '<value>'
I get the distinct customers, split by customer_number, name, street, zipcode, area and country (the columns listed in the DISTINCT ON clause).
What I need to retrieve now is the distinct customers that have duplicated rows in the DB, but I also need to retrieve the customer_xid and the entity_xid, which are the primary keys of their respective tables and so are unique. For this reason they can't be included in an aggregate function. All I need is to count how many rows with the same customer_number, name, street, zipcode, area and country I have for each distinct tuple and to select only the tuples with a count greater than 1.
For each selected tuple I need also to take a customer_xid and an entity_xid, at random, like MySQL would do with a_key in a query like this:
SELECT COUNT(*), tab.a_key, tab.b, tab.c from tab
WHERE 1
GROUP BY tab.b
I know MySQL is quite an exception in this regard; I just want to know whether it is possible to obtain the same result in PostgreSQL.
Thanks,
L.
This query in MySQL uses a nonstandard (see note below) "MySQL GROUP BY extension": http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html
SELECT COUNT(*), tab.a_key, tab.b, tab.c
from tab
WHERE 1
GROUP BY tab.b
Note: This is a feature defined in the SQL:2003 standard as T301 Functional dependencies. It is not required by the standard, and many RDBMSs don't support it, including PostgreSQL (see this link for version 9.3 - unsupported features: http://www.postgresql.org/docs/9.3/static/unsupported-features-sql-standard.html ).
The above query could be expressed in PostgreSQL in this way:
SELECT tab.a_key, tab.b, tab.c,
q.cnt
FROM (
SELECT tab.b,
COUNT(*) As cnt,
MIN(tab.unique_id) As unique_id /* could be also MAX */
from tab /* MySQL's WHERE 1 is dropped: PostgreSQL requires a boolean condition */
GROUP BY tab.b
) q
JOIN tab ON tab.unique_id = q.unique_id
where unique_id is a column that uniquely identifies each row in tab (usually a primary key).
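MIN or MAX picks one row per group: the one with the smallest or largest unique_id. The choice is deterministic but arbitrary, which stands in for the unspecified row MySQL would return.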
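Here is a small sketch of that pattern, run on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (the query itself uses only standard SQL). The table tab, its columns and its rows are illustrative. A HAVING COUNT(*) > 1 filter is added as an assumption about what the question needs, namely only the duplicated groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (unique_id INTEGER PRIMARY KEY, a_key TEXT, b TEXT, c TEXT);
-- Invented sample data: b = 'x' occurs twice, b = 'y' once.
INSERT INTO tab VALUES
  (1, 'k1', 'x', 'c1'),
  (2, 'k2', 'x', 'c2'),
  (3, 'k3', 'y', 'c3');
""")

rows = conn.execute("""
SELECT tab.a_key, tab.b, tab.c, q.cnt
FROM (
    -- One row per group: its size and the id of one representative row.
    SELECT b, COUNT(*) AS cnt, MIN(unique_id) AS unique_id
    FROM tab
    GROUP BY b
    HAVING COUNT(*) > 1
) q
JOIN tab ON tab.unique_id = q.unique_id
ORDER BY tab.b
""").fetchall()
print(rows)
```

The join back to tab is what makes the non-grouped columns (a_key, c) available without listing them in GROUP BY.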

How can I get ordered distinct IDs with pagination?

Let's say I have two tables: Person and Address. Both have a numeric 'id' column, and a person record can have multiple addresses (foreign key 'Address.person_id' which references 'Person.id').
I now want to
search persons with criteria on both the person and its addresses
sort the result by person/address attributes, and
return the distinct person ids
using pagination (additional restriction on row range, calculated by page number and page size)
Getting the non-distinct person ids is quite simple:
select p.id from person p
left join address a on a.person_id = p.id
where p.firstname is not null
order by a.city, p.lastname, p.firstname
But now I can't just select the distinct(p.id), as I have an order, which cannot be applied unless I select the order criteria as well.
If I wrap the SQL-snippet above with select distinct(id) from (...), I get the distinct ids, but lose the order (ids come in arbitrary order, probably due to hashing)
I came up with a generic but rather impractical solution which works correctly but doesn't satisfy me yet (3 outer selects):
select id from (
select id, rownum as r from (
select distinct(ID), min(rownum) from (
select p.id from person p
left join address a on a.person_id = p.id
where p.firstname is not null
order by a.city, p.lastname, p.firstname
)
group by (id)
order by min(rownum)
)
) where r>${firstrow} and r<=${lastrow}
(Placeholders ${firstrow} and ${lastrow} will be replaced by values calculated from page number and page size)
Is there a better way to just get the ordered distinct IDs with
pagination?
I'm implementing these searches using the Hibernate Criteria API, can I somehow realize the outer selects as a Projection in Hibernate, or create my own projection implementation which does this?
You basically want to sort the persons by their min address (not sure this makes any sense to me, but it only needs to make sense to you). In this case you can try:
select person_id
from (
select p.id as person_id , min(a.city || p.lastname || p.firstname)
from person p left join address a
on (a.person_id = p.id)
where p.firstname is not null
group by p.id
order by 2 )
where rownum < x
A couple of technical notes:
If every person has an address, lose the left join.
If you're using group by, you don't need to specify distinct.
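The group-by idea can be sketched as follows, run on SQLite via Python's sqlite3 module. Several things here are assumptions rather than part of the original answers: SQLite's LIMIT/OFFSET replaces Oracle's rownum, the sort keys are kept as separate ORDER BY terms instead of one concatenated string, the helper name page_of_ids is hypothetical, and the sample rows are invented. NULL cities may also sort differently than in Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE address (id INTEGER PRIMARY KEY, person_id INTEGER, city TEXT);
-- Invented sample data; person 4 has no first name and is filtered out.
INSERT INTO person VALUES
  (1, 'Ann', 'Zed'), (2, 'Bob', 'Young'), (3, 'Cy', 'Xu'), (4, NULL, 'Nameless');
INSERT INTO address VALUES
  (1, 1, 'Berlin'), (2, 1, 'Aachen'), (3, 2, 'Cologne'), (4, 3, 'Bonn'), (5, 3, 'Zurich');
""")

def page_of_ids(page, page_size):
    """Return the distinct person ids for a 1-based page, ordered by each
    person's alphabetically first city, then last name, then first name."""
    cursor = conn.execute("""
        SELECT p.id
        FROM person p
        LEFT JOIN address a ON a.person_id = p.id
        WHERE p.firstname IS NOT NULL
        GROUP BY p.id, p.lastname, p.firstname  -- one row per person
        ORDER BY MIN(a.city), p.lastname, p.firstname
        LIMIT ? OFFSET ?
    """, (page_size, (page - 1) * page_size))
    return [row[0] for row in cursor]

print(page_of_ids(1, 2))
print(page_of_ids(2, 2))
```

Grouping by the person id makes DISTINCT unnecessary, and ordering by MIN(a.city) reproduces "first appearance in the sorted join", since last name and first name are constant within a person.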