How to perform these queries? - SQL

I have these three tables:
create table albums(sernum number primary key,
Albname varchar2(30) not null,
Artist varchar2(20) not null,
Pdate number(4),
Recompany varchar2(10),
Media char(2) not null);
create table tracks(sernum number not null,
song varchar2(50) not null,
primary key(sernum, song),
foreign key(sernum) references albums(sernum));
create table performers(sernum number not null,
Artist varchar2(30) not null,
Instrument varchar2(50) not null,
primary key(sernum, Artist, Instrument),
foreign key (sernum) references albums(sernum));
I want to perform two queries in Oracle SQL:
List the names of the artists that used all instruments.
List the names of the albums containing the maximum number of songs.
Here are my attempts:
select distinct(a.Artist) from albums a where a.Artist like (select p.Artist, distinct(p.Instrument) from performers p) group by a.Artist, p.Instrument;
select a.Albname from albums a, inner join tracks t on where a.sernum in(select max(t.sernum) group by t.sernum);

Query 1 - get artists who have played all instruments:
SELECT
p.Artist
FROM
(
SELECT Artist, count(distinct Instrument) as InstrumentCount
FROM performers
GROUP BY artist
) p
JOIN
(
SELECT COUNT(DISTINCT Instrument) as InstrumentCount
FROM performers
) i
ON p.InstrumentCount = i.InstrumentCount
Explanation: the 1st subquery gets the count of distinct instruments played by each artist. The 2nd subquery gets the count of distinct instruments overall. The two are joined on this instrument count, which leaves only those artists whose instrument count equals the total number of distinct instruments.
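A quick way to sanity-check Query 1 is to run it against a few made-up rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle (column types are swapped for SQLite ones and the foreign key is left out), and the sample performers are invented purely for illustration.

```python
import sqlite3

# Same logic as Query 1: per-artist distinct instrument count,
# joined against the overall distinct instrument count.
QUERY_ALL_INSTRUMENTS = """
SELECT p.Artist
FROM (
    SELECT Artist, COUNT(DISTINCT Instrument) AS InstrumentCount
    FROM performers
    GROUP BY Artist
) p
JOIN (
    SELECT COUNT(DISTINCT Instrument) AS InstrumentCount
    FROM performers
) i ON p.InstrumentCount = i.InstrumentCount
"""


def artists_with_all_instruments(rows):
    """rows: (sernum, artist, instrument) tuples; hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE performers ("
        " sernum INTEGER NOT NULL,"
        " Artist TEXT NOT NULL,"
        " Instrument TEXT NOT NULL,"
        " PRIMARY KEY (sernum, Artist, Instrument))"
    )
    conn.executemany("INSERT INTO performers VALUES (?, ?, ?)", rows)
    artists = sorted(row[0] for row in conn.execute(QUERY_ALL_INSTRUMENTS))
    conn.close()
    return artists


# Ann plays all three instruments across two albums;
# Bob plays guitar on two albums, which still counts as one instrument.
SAMPLE_PERFORMERS = [
    (1, "Ann", "guitar"),
    (1, "Ann", "piano"),
    (2, "Ann", "drums"),
    (2, "Bob", "guitar"),
    (3, "Bob", "guitar"),
]

print(artists_with_all_instruments(SAMPLE_PERFORMERS))
```

COUNT(DISTINCT ...) matters here: an artist who plays the same instrument on several albums must not be counted more than once for it.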
--
Query 2 - Get albums containing the maximum number of songs:
WITH
AlbumTrackCount AS
(
SELECT
sernum,
COUNT(1) as TrackCount
FROM tracks
GROUP BY sernum
)
SELECT
a.Albname
FROM albums a
JOIN AlbumTrackCount atc
ON a.sernum = atc.sernum
AND atc.TrackCount =
(
SELECT MAX(TrackCount)
FROM AlbumTrackCount
)
Explanation: the WITH clause at the top defines a subquery we'll reuse; it gets the track count for each album. Below, we join the albums with these track counts and keep only those albums whose track count equals the maximum track count of any album. Note that this is different from the top query, which just counted every distinct instrument; here, it is important to first count the tracks within each album, and then take the maximum of those counts.
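To see the "count per album, then take the maximum" idea in action, here is a small sketch that runs the same kind of query through Python's sqlite3 module. SQLite stands in for Oracle, only the columns the query needs are created, and the album and song names are made up. The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Count tracks per album once in a CTE, then keep the albums whose
# count equals the maximum of those counts.
QUERY_MAX_TRACKS = """
WITH AlbumTrackCount AS (
    SELECT sernum, COUNT(*) AS TrackCount
    FROM tracks
    GROUP BY sernum
)
SELECT a.Albname
FROM albums a
JOIN AlbumTrackCount atc ON a.sernum = atc.sernum
WHERE atc.TrackCount = (SELECT MAX(TrackCount) FROM AlbumTrackCount)
ORDER BY a.Albname
"""


def albums_with_most_tracks(albums, tracks):
    """albums: (sernum, name) tuples; tracks: (sernum, song) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE albums (sernum INTEGER PRIMARY KEY, Albname TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE tracks (sernum INTEGER NOT NULL, song TEXT NOT NULL,"
        " PRIMARY KEY (sernum, song))"
    )
    conn.executemany("INSERT INTO albums VALUES (?, ?)", albums)
    conn.executemany("INSERT INTO tracks VALUES (?, ?)", tracks)
    names = [row[0] for row in conn.execute(QUERY_MAX_TRACKS)]
    conn.close()
    return names


# Hypothetical data: Red and Blue tie with two songs each, Green has one.
SAMPLE_ALBUMS = [(1, "Red"), (2, "Blue"), (3, "Green")]
SAMPLE_TRACKS = [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "a")]

print(albums_with_most_tracks(SAMPLE_ALBUMS, SAMPLE_TRACKS))
```

Because the filter compares against the maximum rather than picking a single top row, ties are returned together.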

Below are some of the issues with your queries:
SELECT DISTINCT (a.artist)
FROM albums a
WHERE a.artist LIKE (SELECT p.artist,
distinct(p.Instrument)
from performers p)
group by a.Artist, p.Instrument;
LIKE indicates that you're going to use a wildcard. When comparing against a sub-query in the WHERE clause, you typically use IN as the operator.
DISTINCT is not a function. It always applies to all of the columns in a SELECT statement.
DISTINCT and GROUP BY serve very similar purposes. You would rarely use both in the same statement.
You can't reference a column from a sub-query in the WHERE clause (here, p.Instrument) in the outer query.
SELECT a.albname
FROM albums a,
inner join tracks t
on
where a.sernum in(select max(t.sernum) group by t.sernum);
You're using both a comma and INNER JOIN to connect two tables. The comma indicates pre-SQL:1999 syntax, whereas INNER JOIN is SQL:1999. While you can technically use both in a single FROM clause, you can't use both between a single pair of tables. Also, you shouldn't mix them. Stick to SQL:1999.
Your ON clause is empty. You should probably be joining your two tables here. If you really want to not have a join condition, change the join to CROSS JOIN (to re-iterate: you almost certainly don't actually want this).
You have a SELECT statement without a FROM clause. That is not allowed.

Related

Return all data when grouping on a field

I have the following 2 tables (there are more fields in the real tables):
create table publisher(id serial not null primary key,
name text not null);
create table product(id serial not null primary key,
name text not null,
publisherRef int not null references publisher(id));
Sample data:
insert into publisher (id,name) values (1,'pub1'),(2,'pub2'),(3,'pub3');
insert into product (name,publisherRef) values('p1',1),('p2',2),('p3',2),('p4',2),('p5',3),('p6',3);
And I would like the query to return:
name, numProducts
pub2, 3
pub3, 2
pub1, 1
A product is published by a publisher. Now I need a list of name, id of all publishers which have at least one product, ordered by the total number of products each publisher has.
I can get the id of the publishers ordered by number of products with:
select publisherRef AS id, count(*)
from product
group by publisherRef
order by count(*) desc;
But I also need the name of each publisher in the result. I thought I could use a subquery like:
select *
from publisher
where id in (
select publisherRef
from product
order by count(*) desc)
But the order of rows in the subquery is lost in the outer SELECT.
Is there any way to do this with a single sql query?
SELECT pub.name, pro.num_products
FROM (
SELECT publisherref AS id, count(*) AS num_products
FROM product
GROUP BY 1
) pro
JOIN publisher pub USING (id)
ORDER BY 2 DESC;
Or (since the title mentions "all data") return all columns of the publisher with pub.*. After products have been aggregated in the subquery, you are free to list anything in the outer SELECT.
This only lists publishers which "have at least one product", and the result is ordered by "the total number of products each publisher has".
It's typically faster to aggregate the "n"-table before joining to the "1"-table. Then use an [INNER] JOIN (not a LEFT JOIN) to exclude publishers without products.
Note that the order of rows in an IN expression (or items in the given list - there are two syntax variants) is insignificant.
The column alias in publisherref AS id is optional; it only enables the simpler USING clause, which requires identical column names on both sides of the join.
Aside: avoid CaMeL-case names in Postgres. Use unquoted, legal, lowercase names exclusively to make your life easier. See also: "Are PostgreSQL column names case-sensitive?"
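For anyone who wants to try the "aggregate first, then join" approach without a Postgres instance, here is a rough sketch using Python's sqlite3 module. SQLite stands in for Postgres (serial becomes INTEGER PRIMARY KEY, and the GROUP BY / ORDER BY use names rather than positions). The data is the sample from the question, plus one extra publisher, pub4, that I added so the exclusion of publishers without products is visible.

```python
import sqlite3

# Aggregate the "n" table (product) first, then inner-join to the
# "1" table (publisher); publishers without products drop out.
QUERY_PUBLISHER_COUNTS = """
SELECT pub.name, pro.num_products
FROM (
    SELECT publisherref AS id, COUNT(*) AS num_products
    FROM product
    GROUP BY publisherref
) pro
JOIN publisher pub USING (id)
ORDER BY pro.num_products DESC
"""


def publisher_counts():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
        " publisherref INTEGER NOT NULL REFERENCES publisher(id))"
    )
    # pub4 is an assumption added for illustration: it has no products.
    conn.executemany(
        "INSERT INTO publisher (id, name) VALUES (?, ?)",
        [(1, "pub1"), (2, "pub2"), (3, "pub3"), (4, "pub4")],
    )
    conn.executemany(
        "INSERT INTO product (name, publisherref) VALUES (?, ?)",
        [("p1", 1), ("p2", 2), ("p3", 2), ("p4", 2), ("p5", 3), ("p6", 3)],
    )
    rows = conn.execute(QUERY_PUBLISHER_COUNTS).fetchall()
    conn.close()
    return rows


for name, num_products in publisher_counts():
    print(name, num_products)
```

The ordering lives in the outer query, which is the only place where it is guaranteed to survive.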

Easier way to limit rows in SELECT subquery?

I perform queries on an Oracle database. Let's say I have a table, PEOPLE. Each person can have multiple reference numbers. The reference numbers are stored in a different table, REFERENCENUMBERS.
REFERENCENUMBERS contains a column, PERSON_ID, which is identical to the ID column of the PEOPLE table. It is through this ID that the tables are joined.
Let's say I want to perform a query on the PEOPLE table. However, I only want a single reference number returned per person record: i.e., if a person has multiple reference numbers, I don't want multiple rows returned per person per reference number.
I choose a criterion for how to select only one reference number: the one which was created earliest. The date of reference number creation is stored in the REFERENCENUMBERS table as DATECREATED.
The following code does this job:
SELECT
PEOPLE.ID,
PEOPLE.NAME,
PEOPLE.AGE,
PEOPLE.ADDRESS,
-- Subquery to return the earliest-created reference number for this person
(
SELECT
REFERENCENUMBERS.NUMBER
FROM
REFERENCENUMBERS
WHERE
REFERENCENUMBERS.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
AND REFERENCENUMBERS.DATECREATED =
-- Sub-sub query simply to match the earliest date
(
SELECT
MIN(R.DATECREATED) -- To ensure that only the earliest-created reference number is returned.
FROM
REFERENCENUMBERS R -- Give this sub-sub query an alias for the table
WHERE
R.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
)
)
FROM
PEOPLE
WHERE
PEOPLE.AGE > 18 -- Or whatever
However, my question to you knowledgeable SQL people is: is there an easier way of doing this? It just appears cumbersome to have to include a sub-sub-query solely for the purpose of finding the earliest date, and limiting the WHERE clause of the sub-query.
There must be an easier, or cleaner way of doing this. Any suggestions?
(By the way - the sample code is greatly simplified from what I'm actually working on. Please don't provide answers which substantively modify my primary query with different-style JOINs etc - thanks).
The simplest would be a top-n filter:
select people.id
, people.name
, people.age
, people.address
, ( select referencenumbers.number
from referencenumbers
where referencenumbers.person_id = people.id
order by referencenumbers.datecreated
fetch first row only )
from people
where people.age > 18;
This requires Oracle 12.1 or later.
Or this (works in earlier versions):
select people.id
, people.name
, people.age
, people.address
, ( select min(rn.number) keep (dense_rank first order by rn.datecreated)
from referencenumbers rn
where rn.person_id = people.id )
from people
where people.age > 18;
(I gave referencenumbers a shorter alias for readability.)
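If you want to play with the top-n idea outside Oracle, the same shape can be tried in SQLite through Python's sqlite3 module. Treat this as a sketch only: SQLite has no FETCH FIRST, so ORDER BY ... LIMIT 1 is swapped in, the column refnum is a hypothetical rename of the reference-number column, dates are stored as ISO text, and all rows are invented.

```python
import sqlite3

# Scalar subquery picks, per person, the reference number with the
# earliest creation date. LIMIT 1 plays the role of FETCH FIRST ROW ONLY.
QUERY_EARLIEST_REF = """
SELECT people.id,
       people.name,
       ( SELECT r.refnum
         FROM referencenumbers r
         WHERE r.person_id = people.id
         ORDER BY r.datecreated
         LIMIT 1 ) AS earliest_ref
FROM people
WHERE people.age > 18
ORDER BY people.id
"""


def earliest_reference_per_person(people, refs):
    """people: (id, name, age); refs: (person_id, refnum, datecreated)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    conn.execute(
        "CREATE TABLE referencenumbers (person_id INTEGER, refnum TEXT, datecreated TEXT)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", people)
    conn.executemany("INSERT INTO referencenumbers VALUES (?, ?, ?)", refs)
    rows = conn.execute(QUERY_EARLIEST_REF).fetchall()
    conn.close()
    return rows


# Hypothetical data: Ann has two reference numbers, Cy has none,
# Bob is filtered out by the age condition.
SAMPLE_PEOPLE = [(1, "Ann", 30), (2, "Bob", 17), (3, "Cy", 40)]
SAMPLE_REFS = [
    (1, "A-2", "2020-05-01"),
    (1, "A-1", "2019-01-01"),
    (2, "B-1", "2018-03-03"),
]

print(earliest_reference_per_person(SAMPLE_PEOPLE, SAMPLE_REFS))
```

A person with no reference numbers still appears, with NULL (None in Python) in the last column, which matches how the original correlated subquery behaves.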
Try this:
SELECT
PEOPLE.ID,
PEOPLE.NAME,
PEOPLE.AGE,
PEOPLE.ADDRESS,
REFERENCENUMBERS.NUMBER
FROM PEOPLE
JOIN REFERENCENUMBERS ON REFERENCENUMBERS.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
JOIN
(
SELECT
R.PERSON_ID,
MIN(R.DATECREATED) minc -- To ensure that only the earliest-created reference number is returned.
FROM
REFERENCENUMBERS R -- Give this sub-sub query an alias for the table
GROUP BY R.PERSON_ID
) t ON t.minc = REFERENCENUMBERS.DATECREATED and
t.PERSON_ID = REFERENCENUMBERS.PERSON_ID
WHERE
PEOPLE.AGE > 18 -- Or whatever

SQL*Plus Query help to sum and count from different table and get the correct answer

This is my query, and it is giving me the wrong amount. How can I fix it? It should be giving a total of 167700, but it is giving me 2515500 for the total loan. The count is working fine = 15. I can't change anything on the tables.
create table loan
(loan_number varchar(15) not null,
branch_name varchar(15) not null,
amount number not null,
primary key(loan_number));
create table customer
(customer_name varchar(15) not null,
customer_street varchar(12) not null,
customer_city varchar(15) not null,
primary key(customer_name));
select SUM(amount),
COUNT( distinct customer_name)
from loan,customer;
Simple rule: Never use commas in the FROM clause. Always use explicit, proper JOIN syntax with the conditions in the ON clause. Then you won't forget them!
So:
select SUM(amount),
COUNT( distinct customer_name)
from loan l join
customer c
on l.customerid = c.customerid;
Of course, I made up the names of the columns used for the join, because your question has no information describing the tables.
Duh! No common key in the two tables. How is that? How do you keep track of which customer took which loan?
Why are you running ONE query for data from two UNRELATED tables? The CROSS JOIN you created (by having no condition whatsoever on the enumeration of the two tables) simply joins every row from the first table to every row from the second table.
It appears the customer table has 15 rows, and all 15 names are distinct. When you COUNT DISTINCT, you get the correct number, even though in the cross join each customer_name appears many times.
On the other hand, each loan amount is repeated 15 times. 167,700 x 15 = 2,515,500.
If you need to show both the total loan amount and the number of (distinct) customers in a single row, you want something like
select (select sum(amount) from loan) as total_amount,
(select count (distinct customer_name) from customer) as distinct_customers
from dual
;
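The inflation is easy to reproduce on a tiny scale. The sketch below uses Python's sqlite3 module in place of Oracle (so there is no dual table; SQLite allows a SELECT without FROM), keeps only the columns involved, and uses invented amounts and names.

```python
import sqlite3


def loan_totals(amounts, customer_names):
    """Return (cross_join_result, separate_subqueries_result)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE loan (loan_number INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
    conn.execute("CREATE TABLE customer (customer_name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO loan (amount) VALUES (?)", [(a,) for a in amounts])
    conn.executemany("INSERT INTO customer VALUES (?)", [(n,) for n in customer_names])

    # Comma join with no condition: every loan row is paired with every
    # customer row, so each amount is summed once per customer.
    inflated = conn.execute(
        "SELECT SUM(amount), COUNT(DISTINCT customer_name) FROM loan, customer"
    ).fetchone()

    # Two independent scalar subqueries: each table is aggregated on its own.
    correct = conn.execute(
        "SELECT (SELECT SUM(amount) FROM loan),"
        "       (SELECT COUNT(DISTINCT customer_name) FROM customer)"
    ).fetchone()
    conn.close()
    return inflated, correct


# Hypothetical data: two loans totalling 300, three customers.
inflated, correct = loan_totals([100, 200], ["ann", "bob", "cy"])
print("cross join:", inflated)
print("subqueries:", correct)
```

With two loans and three customers the cross join has six rows, so the sum comes out three times too large while the distinct count is unaffected, the same pattern as 167,700 becoming 2,515,500 with 15 customers.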

Doing a FULL OUTER JOIN in Sqlite3 to get the combination of two columns?

I'm currently working on a database project and one of the problems calls for the following:
The Genre table contains twenty-five entries. The MediaType table contains 5
entries. Write a single SQL query to generate a table with three columns and 125
rows. One column should contain the list of MediaType names; one column
should contain the list of Genre names; the third column should contain a count of
the number of tracks that have each combination of media type and genre. For
example, one row will be: “Rock MPEG Audio File xxx” where xxx is the
number of MPEG Rock tracks, even if the value is 0.
Recognizing this, I believe I'll need to use a FULL OUTER JOIN, which Sqlite3 doesn't support. The part that is confusing me is generating the column with the combination. Below, I've attached the two methods I've tried.
create view T as
select MediaTypeId, M.Name as MName, GenreId, G.Name as GName
from MediaType M, Genre G
SELECT DISTINCT GName, MName, COUNT(*) FROM (
SELECT *
FROM T
OUTER LEFT JOIN MediaType
ON MName = GName
UNION ALL
SELECT *
FROM Genre
OUTER LEFT JOIN T
) GROUP BY GName, MName;
However, that returned nearly 250 rows and the GROUP BY or JOIN(s) is totally wrong.
I've also tried:
SELECT Genre.Name as GenreName, MediaTypeName, COUNT(*)
FROM Genre LEFT OUTER JOIN (
SELECT MediaType.Name as MediaTypeName, Track.Name as TrackName
FROM MediaType LEFT OUTER JOIN Track) GROUP BY GenreName, MediaTypeName;
Which returned 125 rows but they all had the same count of 3503 which leads me to believe the GROUP BY is wrong.
Also, here is a schema of the database:
https://www.dropbox.com/s/onnbwqfrfc82r1t/IMG_2429.png?dl=0
You don't use full outer join to solve this problem.
Because it looks like a homework problem, I'll describe the solution.
First, you want to generate all combinations of genres and media types. Hint: This uses a cross join.
Second, you want to count all the combinations that you have. Hint: this uses an aggregation.
Third, you want to combine these together. Hint: left join.
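For readers who have already worked through the hints and want to compare notes, the three steps could be combined roughly as follows. This is only a sketch run through Python's sqlite3 module on a toy dataset: the Track columns GenreId and MediaTypeId are assumed (the actual schema is only in the linked image), and the genres, media types and tracks are made up.

```python
import sqlite3

# Step 1: CROSS JOIN builds every genre/media-type combination.
# Step 2: the subquery counts tracks per combination that actually exists.
# Step 3: LEFT JOIN attaches the counts; COALESCE turns missing ones into 0.
QUERY_COMBINATION_COUNTS = """
SELECT g.Name, m.Name, COALESCE(c.n, 0) AS TrackCount
FROM Genre g
CROSS JOIN MediaType m
LEFT JOIN (
    SELECT GenreId, MediaTypeId, COUNT(*) AS n
    FROM Track
    GROUP BY GenreId, MediaTypeId
) c ON c.GenreId = g.GenreId AND c.MediaTypeId = m.MediaTypeId
ORDER BY g.Name, m.Name
"""


def combination_counts(genres, media_types, tracks):
    """genres/media_types: (id, name); tracks: (genre_id, media_type_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE MediaType (MediaTypeId INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute(
        "CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, GenreId INTEGER, MediaTypeId INTEGER)"
    )
    conn.executemany("INSERT INTO Genre VALUES (?, ?)", genres)
    conn.executemany("INSERT INTO MediaType VALUES (?, ?)", media_types)
    conn.executemany("INSERT INTO Track (GenreId, MediaTypeId) VALUES (?, ?)", tracks)
    rows = conn.execute(QUERY_COMBINATION_COUNTS).fetchall()
    conn.close()
    return rows


# Hypothetical data: 2 genres x 2 media types = 4 rows expected.
SAMPLE_GENRES = [(1, "Rock"), (2, "Jazz")]
SAMPLE_MEDIA = [(1, "MPEG"), (2, "AAC")]
SAMPLE_TRACKS = [(1, 1), (1, 1), (2, 2)]

for row in combination_counts(SAMPLE_GENRES, SAMPLE_MEDIA, SAMPLE_TRACKS):
    print(row)
```

The row count is always the number of genres times the number of media types, regardless of how many tracks exist, which is what the 25 x 5 = 125 requirement in the exercise relies on.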

Can’t figure out Query and Sub-Queries

I’m having trouble figuring this problem out.
I’m doing some revision exercises for university and would like to understand this BEFORE my exam in 2 days.
I’ve attempted some things (which I’ll post at the end). Please be kind, this is my first Database subject so my attempts may seem very stupid to you.
The question is as follows:
Which artist/s has/have the largest number of shows on at the moment?
Show the First & Last Name of the artist/s and their Address.
ORDER BY clause cannot be used.
Write a single SQL Statement.
Use Sub-Queries.
Relevant tables in the database:
Shows (ShowName, ArtistId, ShowStartDate, ShowEndDate)
Artists (ArtistId, FirstName, FamilyName, Address, PhoneNum)
We assume ArtistId, ShowStartDate, FirstName, FamilyName and Address cannot be null.
Now, I think that I have to count the number of shows each artist has on at the moment. Then, get the ArtistId for the artist/s that has/have the most. Use the ArtistId to retrieve the artist details (names and address).
I got as far as this (which is very wrong):
SELECT FirstName, FamilyName, Address
FROM Artists
WHERE ArtistId = (SELECT ArtistId
FROM Shows
WHERE ArtistId = (SELECT MAX(Counted)
FROM (SELECT ArtistId, COUNT(ArtistId) AS Counted
FROM Shows
WHERE ShowEndDate IS null
GROUP BY ArtistId)
GROUP BY ArtistId));
Well, I know
SELECT ArtistId, COUNT(ArtistId)
FROM Shows
WHERE ShowEndDate IS null
GROUP BY ArtistId
gives me a table with the count of how many times each ArtistId is listed.
Which is good.
But from this results table, I need to get the ArtistId/’s of the ones that have the highest count.
And this is where I’m lost.
Anyone can shed some light?
(As for which DBMS I am using: We have to use one created and supplied by the university. It’s very basic SQL. Simpler than Access 2010).
Thank you
(If you provide an answer [thank you thank you] could you also briefly explain the reasoning behind it?)
You need to find the maximum of the count of shows by artist, then find out which artists have that count by re-running the count query but applying a having clause matching the maximum just found.
select FirstName, FamilyName, Address
from Artists
where ArtistId in -- use an in() to select the artists
(select ArtistId from -- just select the artist id from the results
(select ArtistId, count(*) c -- re-run the count query, but see having clause
from Shows
where current_date between ShowStartDate and ShowEndDate
group by ArtistId
having count(*) = -- use a having clause to only select those with the max count
(select max(c) from -- this is simply the maximum count
(select ArtistId, count(*) c -- find all counts by artist
from Shows
where current_date between ShowStartDate and ShowEndDate
group by ArtistId
) counts
)
)
)
Some syntax notes:
count(*) c means the column (with value count(*)) is given the alias c, so it can be referred to by an outer query. You can't refer to it as count(*), because that would be interpreted as an attempt at aggregation.
max(c) gets the maximum of the column named (or aliased) c. AFAIK you can't write max(count(*)) directly, though you could try it; I typed this in without a console to test it.
counts is a table alias, which is a syntactic requirement when selecting from a result set
You haven't specified which database you're using, so you may have to replace current_date with your database's equivalent.
Some dbs allow you to reuse a query in a query (using a with clause), which would avoid rerunning the count subquery.
This query uses only subselects, but you can do it with a join too.
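As a rough illustration of the with-clause variant mentioned in the notes, here is a sketch run through Python's sqlite3 module. It assumes a database that supports WITH (the university system in the question may not), passes "today" in as a parameter instead of using current_date so the result is reproducible, stores dates as ISO text, and uses made-up artists and shows.

```python
import sqlite3

# The counts CTE is written once and used twice: once to find the
# maximum, once to find the artists whose count equals it.
QUERY_BUSIEST_ARTISTS = """
WITH counts AS (
    SELECT ArtistId, COUNT(*) AS c
    FROM Shows
    WHERE :today BETWEEN ShowStartDate AND ShowEndDate
    GROUP BY ArtistId
)
SELECT FirstName, FamilyName, Address
FROM Artists
WHERE ArtistId IN (
    SELECT ArtistId FROM counts WHERE c = (SELECT MAX(c) FROM counts)
)
"""


def busiest_artists(artists, shows, today):
    """artists: (id, first, family, address); shows: (name, artist_id, start, end)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Artists (ArtistId INTEGER PRIMARY KEY, FirstName TEXT,"
        " FamilyName TEXT, Address TEXT)"
    )
    conn.execute(
        "CREATE TABLE Shows (ShowName TEXT, ArtistId INTEGER,"
        " ShowStartDate TEXT, ShowEndDate TEXT)"
    )
    conn.executemany("INSERT INTO Artists VALUES (?, ?, ?, ?)", artists)
    conn.executemany("INSERT INTO Shows VALUES (?, ?, ?, ?)", shows)
    rows = conn.execute(QUERY_BUSIEST_ARTISTS, {"today": today}).fetchall()
    conn.close()
    # Sorted in Python because the exercise forbids ORDER BY in the SQL.
    return sorted(rows)


# Hypothetical data: Ann and Bob each have two shows running on
# 2024-06-15; Cy has one running and one that ended in 2023.
SAMPLE_ARTISTS = [
    (1, "Ann", "Lee", "1 A St"),
    (2, "Bob", "Ray", "2 B St"),
    (3, "Cy", "Poe", "3 C St"),
]
SAMPLE_SHOWS = [
    ("S1", 1, "2024-06-01", "2024-06-30"),
    ("S2", 1, "2024-06-10", "2024-07-10"),
    ("S3", 2, "2024-05-01", "2024-06-20"),
    ("S4", 2, "2024-06-15", "2024-06-15"),
    ("S5", 3, "2024-06-01", "2024-06-30"),
    ("S6", 3, "2023-01-01", "2023-02-01"),
]

print(busiest_artists(SAMPLE_ARTISTS, SAMPLE_SHOWS, "2024-06-15"))
```

Note that a show with a NULL end date never satisfies BETWEEN, so if open-ended shows are stored that way the WHERE condition would need adjusting.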
Try this:
SELECT FirstName, FamilyName, Address
FROM Artists
WHERE ArtistId IN (
SELECT ArtistId
FROM (
SELECT ArtistId, COUNT(ArtistId) AS Counted
FROM Shows
WHERE ShowEndDate IS null
GROUP BY ArtistId) S1
WHERE Counted = (
SELECT MAX(Counted)
FROM (
SELECT ArtistId, COUNT(ArtistId) AS Counted
FROM Shows
WHERE ShowEndDate IS null
GROUP BY ArtistId) S2
)
);
It is simple and should work in your case.