So the code I have is trying to count the number of ratings given to movies per state. That part is easy. I also need to count the number of ratings given to award-winning movies, per state.
SELECT DISTINCT ad.state "State",
COUNT(r.ratingid) OVER (PARTITION BY ad.state) "Number of Ratings",
COUNT(
SELECT DISTINCT r.ratingid
FROM netflix.ratings100 r JOIN netflix.movies_awards a
ON r.movieid = a.movieid
JOIN netflix.addresses ad
ON ad.custid = r.custid
WHERE a.awardid IS NOT NULL
) OVER (PARTITION BY ad.state) "Number of Award Winning Movies Rated"
FROM netflix.addresses ad JOIN netflix.ratings100 r
ON ad.custid = r.custid
JOIN netflix.movies_awards a
ON r.movieid = a.movieid
GROUP BY "State"
The second COUNT should count the number of ratings where awardid is not null. The subquery on its own works and returns distinct rating IDs, but the query as a whole fails with ORA-00936: missing expression. Solutions?
You haven't got brackets around the subquery - you have the brackets to indicate the count, but you need an extra set to indicate that it's a subquery.
E.g.:
count( (select ....) ) over ...
Moreover, you're reusing the aliases from your outer query in your inner query, plus there's nothing to correlate the subquery to your outer query, so I don't think you're going to get the results you're after.
Additionally, you've labelled a column with an identifier that's over 30 characters ("Number of Award Winning Movies Rated" is 36), so unless you're on Oracle 12.2 or later with extended (128-byte) identifiers enabled, you're going to get ORA-00972: identifier is too long.
Finally, I don't think you need that subquery at all; I think you can just use a conditional count, e.g.:
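-- No GROUP BY: Oracle cannot group by a column alias, and DISTINCT already collapses the per-state rows
SELECT DISTINCT ad.state "State",
COUNT(DISTINCT r.ratingid) over(PARTITION BY ad.state) "Number of Ratings",
COUNT(DISTINCT CASE WHEN a.awardid IS NOT NULL THEN r.ratingid END) over(PARTITION BY ad.state) "Num Award Winning Movies Rated"
FROM netflix.addresses ad
JOIN netflix.ratings100 r
ON ad.custid = r.custid
-- LEFT JOIN so that ratings of movies without awards still count towards "Number of Ratings"
LEFT JOIN netflix.movies_awards a
ON r.movieid = a.movieid;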
You may not even need that distinct; it depends on your data. Hopefully you can play around with that and get it to work for your requirements.
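Whether that DISTINCT matters depends on whether a movie can have several rows in movies_awards. The sketch below runs the conditional count with Python's built-in sqlite3 module on invented toy data (table names mirror the question, without the netflix schema). Because SQLite does not accept DISTINCT inside a window aggregate, it uses GROUP BY instead of OVER (PARTITION BY ...), but the CASE expression being counted is the same.

```python
import sqlite3

# In-memory toy data; table and column names mirror the question, values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (custid INTEGER, state TEXT);
CREATE TABLE ratings100 (ratingid INTEGER, custid INTEGER, movieid INTEGER);
CREATE TABLE movies_awards (movieid INTEGER, awardid INTEGER);
INSERT INTO addresses VALUES (1, 'CA'), (2, 'NY');
INSERT INTO ratings100 VALUES (100, 1, 10), (101, 1, 20), (102, 2, 30);
-- Movie 10 has two awards, movie 20 has one, movie 30 has none.
INSERT INTO movies_awards VALUES (10, 1), (10, 2), (20, 3);
""")

# {d} is either "DISTINCT" or empty, so both variants share one query text.
QUERY = """
SELECT ad.state,
       COUNT({d} CASE WHEN a.awardid IS NOT NULL THEN r.ratingid END)
FROM addresses ad
JOIN ratings100 r ON ad.custid = r.custid
LEFT JOIN movies_awards a ON r.movieid = a.movieid
GROUP BY ad.state
ORDER BY ad.state
"""

with_distinct = conn.execute(QUERY.format(d="DISTINCT")).fetchall()
without_distinct = conn.execute(QUERY.format(d="")).fetchall()
print(with_distinct)     # [('CA', 2), ('NY', 0)]
print(without_distinct)  # [('CA', 3), ('NY', 0)]: rating 100 is counted once per award
```

If every movie has at most one award row, the two results agree and the DISTINCT can go.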
That seems like a complicated query. This should be an aggregation query . . . with no subquery at all:
SELECT ad.state, COUNT(DISTINCT r.ratingId) as num_rated,
COUNT(DISTINCT CASE WHEN a.awardId IS NOT NULL THEN r.ratingid END) as num_rated_with_award
FROM netflix.addresses ad JOIN
netflix.ratings100 r
ON ad.custid = r.custid LEFT JOIN
netflix.movies_awards a
ON r.movieid = a.movieid
GROUP BY ad.state;
Notes:
There is no reason to give a column an alias equivalent to its original name, so as "State" is unnecessary unless you really care about the capitalization of the column heading.
A movie could have more than one award, so to get the number of ratings, use count(distinct).
SELECT DISTINCT is almost never appropriate with GROUP BY.
The query has no need of window functions.
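The query above joins movies_awards with a LEFT JOIN; with an inner join, ratings of movies that never won an award would also drop out of num_rated. A minimal sketch with Python's sqlite3 module and invented data compares the two joins (table names mirror the question, without the netflix schema prefix).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (custid INTEGER, state TEXT);
CREATE TABLE ratings100 (ratingid INTEGER, custid INTEGER, movieid INTEGER);
CREATE TABLE movies_awards (movieid INTEGER, awardid INTEGER);
INSERT INTO addresses VALUES (1, 'CA'), (2, 'NY');
INSERT INTO ratings100 VALUES (100, 1, 10), (101, 1, 20), (102, 2, 30), (103, 2, 40);
-- Movies 10 (two awards) and 30 (one award) won something; 20 and 40 did not.
INSERT INTO movies_awards VALUES (10, 1), (10, 2), (30, 3);
""")

QUERY = """
SELECT ad.state,
       COUNT(DISTINCT r.ratingid) AS num_rated,
       COUNT(DISTINCT CASE WHEN a.awardid IS NOT NULL THEN r.ratingid END) AS num_rated_with_award
FROM addresses ad
JOIN ratings100 r ON ad.custid = r.custid
{join} movies_awards a ON r.movieid = a.movieid
GROUP BY ad.state
ORDER BY ad.state
"""

left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
print(left_rows)   # [('CA', 2, 1), ('NY', 2, 1)]
print(inner_rows)  # [('CA', 1, 1), ('NY', 1, 1)]: unawarded ratings vanish
```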
Related
I am using the basic chinook database and I am trying to get a query that will display the worst selling genres. I am mostly getting the answer, however there is one genre 'Opera' that has 0 sales, but the query result is ignoring that and moving on to the next lowest non-zero value.
I tried using left join instead of inner join but that returns different values.
This is my query currently:
create view max
as
select distinct
t1.name as genre,
count(*) as Sales
from
tracks t2
inner join
invoice_items t3 on t2.trackid == t3.trackid
left join
genres as t1 on t1.genreid == t2.genreid
group by
t1.genreid
order by
2
limit 10;
The result, however, skips past the Opera genre, which has 0 sales. How can I include it? I tried using left join but it yields different results.
Any help is appreciated.
If you want to include genres with no sales then you should start the joins from genres and then do LEFT joins to the other tables.
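Also, you should not use count(*), which counts every row in the result set, including the all-NULL rows that the LEFT joins produce for genres with no sales. Count a column from invoice_items instead, so those rows count as 0.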
SELECT g.name Genre,
COUNT(i.trackid) Sales
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items i ON i.trackid = t.trackid
GROUP BY g.genreid
ORDER BY Sales LIMIT 10;
There is no need for the keyword DISTINCT, since the query returns 1 row for each genre.
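To see the difference between COUNT(i.trackid) and COUNT(*) concretely, here is a small sketch with Python's sqlite3 module. The three tables are a tiny invented stand-in for Chinook's genres, tracks and invoice_items, keeping only the columns the query needs.

```python
import sqlite3

# Tiny stand-in for Chinook's genres/tracks/invoice_items (values invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (genreid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tracks (trackid INTEGER PRIMARY KEY, genreid INTEGER);
CREATE TABLE invoice_items (invoicelineid INTEGER PRIMARY KEY, trackid INTEGER);
INSERT INTO genres VALUES (1, 'Rock'), (2, 'Jazz'), (3, 'Opera');
INSERT INTO tracks VALUES (1, 1), (2, 1), (3, 2), (4, 3);
-- Track 4 (Opera) is never sold.
INSERT INTO invoice_items VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# The answer's query: start from genres, LEFT JOIN outwards, count a joined column.
sales = conn.execute("""
SELECT g.name AS Genre, COUNT(i.trackid) AS Sales
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items i ON i.trackid = t.trackid
GROUP BY g.genreid
ORDER BY Sales LIMIT 10
""").fetchall()

# Same joins with COUNT(*): Opera's all-NULL invoice row is counted as a sale.
star = dict(conn.execute("""
SELECT g.name, COUNT(*)
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items i ON i.trackid = t.trackid
GROUP BY g.genreid
""").fetchall())

print(sales)  # [('Opera', 0), ('Jazz', 1), ('Rock', 3)]
print(star)   # Opera shows 1 instead of 0
```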
When asking for the top n one must always state how to deal with ties. If I am looking for the top 1, but there are three rows in the table, all with the same value, shall I select 3 rows? Zero rows? One row arbitrarily chosen? Most often we don't want arbitrary results, which excludes the last option. This excludes LIMIT, too, because LIMIT has no clause for ties in SQLite.
Here is an example with DENSE_RANK instead. You are looking for the worst-selling genres, so we should probably look at the revenue per genre, which is the sum of price x quantity sold. In order to include genres without invoices (and maybe even without tracks?) we outer join this data to the genre table.
select total, genre_name
from
(
select
g.name as genre_name,
-- Chinook's invoice_items columns are UnitPrice and Quantity
coalesce(sum(ii.unitprice * ii.quantity), 0) as total,
dense_rank() over (order by coalesce(sum(ii.unitprice * ii.quantity), 0)) as rnk
from genres g
left join tracks t on t.genreid = g.genreid
left join invoice_items ii on ii.trackid = t.trackid
group by g.name
) aggregated
where rnk <= 10
order by total, genre_name;
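A small sketch of why DENSE_RANK handles ties where LIMIT does not, run with Python's sqlite3 module (window functions need SQLite 3.25 or later). The data is invented so that Jazz and Blues tie on revenue; the cutoff is rnk <= 2 instead of 10 only to keep the example small.

```python
import sqlite3

# Window functions need SQLite 3.25+; the toy data below is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (genreid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tracks (trackid INTEGER PRIMARY KEY, genreid INTEGER);
CREATE TABLE invoice_items (trackid INTEGER, unitprice REAL, quantity INTEGER);
INSERT INTO genres VALUES (1, 'Rock'), (2, 'Jazz'), (3, 'Opera'), (4, 'Blues');
INSERT INTO tracks VALUES (1, 1), (2, 2), (3, 3), (4, 4);
-- Jazz and Blues tie on revenue; Opera has no sales at all.
INSERT INTO invoice_items VALUES (1, 0.99, 2), (2, 0.99, 1), (4, 0.99, 1);
""")

# Rank-based cutoff: every genre sharing a rank is kept.
ranked = conn.execute("""
SELECT genre_name, total FROM (
  SELECT g.name AS genre_name,
         COALESCE(SUM(ii.unitprice * ii.quantity), 0) AS total,
         DENSE_RANK() OVER (ORDER BY COALESCE(SUM(ii.unitprice * ii.quantity), 0)) AS rnk
  FROM genres g
  LEFT JOIN tracks t ON t.genreid = g.genreid
  LEFT JOIN invoice_items ii ON ii.trackid = t.trackid
  GROUP BY g.name
) aggregated
WHERE rnk <= 2
ORDER BY total, genre_name
""").fetchall()

# Row-count cutoff: LIMIT stops after two rows, whatever the ties.
limited = conn.execute("""
SELECT g.name, COALESCE(SUM(ii.unitprice * ii.quantity), 0) AS total
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items ii ON ii.trackid = t.trackid
GROUP BY g.name
ORDER BY total
LIMIT 2
""").fetchall()

print([name for name, _ in ranked])  # ['Opera', 'Blues', 'Jazz']: both tied genres kept
print(len(limited))                  # 2: one of the tied genres is dropped, with no rule for which
```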
SELECT TOP(100) M.title, count(WH.movie_id)
FROM Movie AS M
inner join WatchHistory AS WH ON M.movie_id = WH.movie_id
GROUP BY WH.movie_id, M.title, count(WH.movie_id)
ORDER BY count(WH.movie_id) ASC;
Column 'Movie.title' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
My assignment is to write the following query:
Show the 100 films that have been watched the least so far, including films that have been watched 0 times [film title, number of times viewed].
Make a view for this information requirement.
It gives the error above
There are plenty of issues:
count(WH.movie_id) removed from the GROUP BY
Added alias [CountViews]
Alias used in ORDER BY instead of aggregate
inner join changed to left join, so films that have never been watched appear with a count of 0
Fixed SQL:
SELECT TOP(100) M.title, count(WH.movie_id) as [CountViews]
FROM Movie AS M
left join WatchHistory AS WH ON M.movie_id = WH.movie_id
GROUP BY M.title
ORDER BY [CountViews] ASC;
You should only group by M.title. If you intend to group rows, decide which columns will be grouped. Remember that only columns in the GROUP BY clause, in addition to aggregate functions such as COUNT, may be included in the SELECT clause. Grouped aggregate functions operate on sets of rows defined in a GROUP BY clause and return a summarized result. Examples include SUM, MIN, MAX, COUNT, and AVG. In the absence of a GROUP BY clause, all rows are considered one set, and aggregation is performed on all of them.
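Both points (one row per group with GROUP BY, one row overall without it) can be checked with a small sketch in Python's sqlite3 module. The Movie and WatchHistory rows are invented, and since SQLite has no TOP, LIMIT stands in for SQL Server's TOP(100). The LEFT JOIN is what keeps films with 0 views in the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE WatchHistory (movie_id INTEGER);
INSERT INTO Movie VALUES (1, 'Alien'), (2, 'Brazil'), (3, 'Casablanca');
INSERT INTO WatchHistory VALUES (1), (1), (2);
""")

# One row per title. The LEFT JOIN keeps never-watched films, and counting a
# WatchHistory column turns their NULL rows into 0. LIMIT replaces TOP(100).
per_film = conn.execute("""
SELECT M.title, COUNT(WH.movie_id) AS CountViews
FROM Movie AS M
LEFT JOIN WatchHistory AS WH ON M.movie_id = WH.movie_id
GROUP BY M.title
ORDER BY CountViews ASC, M.title
LIMIT 100
""").fetchall()

# Without GROUP BY all joined rows form one set, so there is a single total.
overall = conn.execute("""
SELECT COUNT(WH.movie_id)
FROM Movie AS M
LEFT JOIN WatchHistory AS WH ON M.movie_id = WH.movie_id
""").fetchall()

print(per_film)  # [('Casablanca', 0), ('Brazil', 1), ('Alien', 2)]
print(overall)   # [(3,)]
```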
I have a set of data that contains multiple groups of data (Vehicle_Code). Each item (PK: Cusip_Sedol) in a group has a code (GIC_Code) that is not unique. I am trying to find the percentage of each code (GIC_Code) within each group (Vehicle_Name) of data.
Here is my SQL Select statement thus far:
SELECT H.vehicle_code,
G.group_name,
Count(D.cusip_sedol) AS Total
FROM tbltrading_holdings AS H
INNER JOIN tbltrading_stocks_data_stocks AS D
ON H.cusip_sedol = D.cusip_sedol
LEFT JOIN tbltrading_gic AS G
ON D.gic_code = G.gic_code
WHERE vehicle_code IN (SELECT vehicle_code
FROM tbltrading_vehicles
WHERE vehicle_name LIKE 'J%')
AND D.gic_code IS NOT NULL
GROUP BY H.vehicle_code,
G.group_name
ORDER BY vehicle_code
SELECT
H.vehicle_code,
G.group_name,
VehicleTotal = Sum(Count(D.cusip_sedol)) OVER (PARTITION BY H.vehicle_code, G.group_name),
d.gic_code,
gic_codePercentPerVehicleName =
Count(d.gic_code) OVER () * 1.0 / Count(*) OVER (PARTITION BY V.vehicle_name),
gic_codePercentPerVehicleName2 =
Count(d.gic_code) * 1.0 / Count(*) OVER (PARTITION BY V.vehicle_name)
FROM
dbo.tbltrading_holdings H
INNER JOIN tbltrading_stocks_data_stocks D
ON H.cusip_sedol = D.cusip_sedol
LEFT JOIN dbo.tbltrading_gic G
ON D.gic_code = G.gic_code
INNER JOIN dbo.tbltrading_vehicles V
ON H.vehicle_code = V.vehicle_code
AND v.vehicle_name LIKE 'J%'
WHERE
D.gic_code IS NOT NULL
GROUP BY
H.vehicle_code,
D.gic_code,
G.group_name,
V.vehicle_name
ORDER BY
H.vehicle_code
;
There are some unknowns here that have forced me to make certain assumptions. You can see that I've come up with two different interpretations about what "gic code per vehicle name" could mean.
For starters, to provide the vehicle_name each gic_code is associated with, we have to do a real join, not an IN (which is effectively an EXISTS). However, is it possible for the same gic_code to join up to different vehicle_name values (since there is an intermediate vehicle_code that joins them)? I'm assuming that it's not possible, and if it actually is, the query will give meaningless results and you'll have to specify more precisely what you're looking for before we can help you further.
Next, the results are all muddied by the fact that you're selecting so many columns, which forces them to be part of the GROUP BY. But once you do that, then all the windowing functions have to include partitions to "break" them out of the grouping. This query may run slowly, as it's being made to do a lot at once, which could result in many scans of the table. The way things are now, for each particular gic_code, you'll get many rows with the same value, because the query has to expose the (multiple) vehicle_code and group_name combinations for each one. Is that really what you want?
You might get better results if you removed some of the displayed columns, as this would let you remove at least some of the PARTITION BY expressions.
Last, I'm not sure I even got the partitions correct. Only you know the cardinality of each column in relation to the joins to other tables.
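One reading of "percentage of each code within each group" is each gic_code's share of the holdings of its own vehicle. Under that assumption, a window SUM over the grouped counts, partitioned by vehicle, supplies the denominator. The sketch uses Python's sqlite3 module with invented, simplified tables: holdings and stocks are hypothetical stand-ins for tbltrading_holdings and tbltrading_stocks_data_stocks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE holdings (vehicle_code TEXT, cusip_sedol TEXT);
CREATE TABLE stocks (cusip_sedol TEXT PRIMARY KEY, gic_code INTEGER);
INSERT INTO stocks VALUES ('A', 10), ('B', 10), ('C', 10), ('D', 20), ('E', NULL);
INSERT INTO holdings VALUES ('J1', 'A'), ('J1', 'B'), ('J1', 'C'), ('J1', 'D'),
                            ('J2', 'A'), ('J2', 'D'), ('J2', 'E');
""")

rows = conn.execute("""
SELECT h.vehicle_code,
       d.gic_code,
       COUNT(*) AS n,
       -- Denominator: all counted holdings of the same vehicle.
       COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY h.vehicle_code) AS pct_in_vehicle
FROM holdings h
JOIN stocks d ON h.cusip_sedol = d.cusip_sedol
WHERE d.gic_code IS NOT NULL
GROUP BY h.vehicle_code, d.gic_code
ORDER BY h.vehicle_code, d.gic_code
""").fetchall()

for row in rows:
    print(row)
# ('J1', 10, 3, 0.75)
# ('J1', 20, 1, 0.25)
# ('J2', 10, 1, 0.5)
# ('J2', 20, 1, 0.5)
```

Because the window is evaluated after GROUP BY, only grouped columns and aggregates can appear inside it, which is why the partition here uses vehicle_code rather than an ungrouped column.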
What you need is the total over all the rows . . . and you can get this using window functions. So, change the select to:
SELECT H.vehicle_code,
G.group_name,
Count(D.cusip_sedol) AS Total,
Count(D.Cusip_sedol)*1.0 / Sum(Count(D.Cusip_sedol)) Over () as p_total
. . .
Note that the *1.0 is there just to prevent integer division.
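Window functions run after GROUP BY, so SUM(COUNT(...)) OVER () adds the per-group counts across the whole result. Both SQL Server and SQLite truncate integer / integer, which this sketch shows with Python's sqlite3 module and an invented, simplified holdings table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE holdings (vehicle_code TEXT, cusip_sedol TEXT);
INSERT INTO holdings VALUES ('J1', 'A'), ('J1', 'B'), ('J1', 'C'), ('J2', 'D');
""")

rows = conn.execute("""
SELECT vehicle_code,
       COUNT(cusip_sedol) AS total,
       -- Integer / integer: truncates to 0 here.
       COUNT(cusip_sedol) / SUM(COUNT(cusip_sedol)) OVER () AS int_share,
       -- Multiplying by 1.0 first forces real-number division.
       COUNT(cusip_sedol) * 1.0 / SUM(COUNT(cusip_sedol)) OVER () AS p_total
FROM holdings
GROUP BY vehicle_code
ORDER BY vehicle_code
""").fetchall()

print(rows)  # [('J1', 3, 0, 0.75), ('J2', 1, 0, 0.25)]
```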
I think you are pretty close. Is counting the Sedol working for you? If so then just divide that by the count of the group name for your percentage:
SELECT H.vehicle_code,
G.group_name,
cast(Count(DISTINCT D.cusip_sedol) as DECIMAL)/cast(count(DISTINCT G.group_name) as DECIMAL) AS Total --add this second part
FROM tbltrading_holdings AS H
INNER JOIN tbltrading_stocks_data_stocks AS D
ON H.cusip_sedol = D.cusip_sedol
LEFT JOIN tbltrading_gic AS G
ON D.gic_code = G.gic_code
WHERE vehicle_code IN (SELECT vehicle_code
FROM tbltrading_vehicles
WHERE vehicle_name LIKE 'J%')
AND D.gic_code IS NOT NULL
GROUP BY H.vehicle_code,
G.group_name
ORDER BY vehicle_code
I have the following SQL:
select
inv.salesman_id,
(select salesman_goals.goal from salesman_goals
where salesman_goals.salesman_id = inv.salesman_id
and salesman_goals.group_id = g.group_id
and salesman_goals.subgroup_id = sg.subgroup_id
and salesman_goals.variation_id = v.variation_id)
as goal,
sum(i.quantity) as qnt
from invoiceitem i
inner join invoice inv on inv.invoice_id = i.invoice_id
inner join product p on p.product_id = i.product_id
left join groups g on g.group_id = p.group_id
left join subgroup sg on sg.group_id = g.group_id and sg.subgroup_id = p.subgroup_id
left join variation v on v.group_id = sg.group_id and v.subgroup_id = sg.subgroup_id and v.variation_id = p.variation_id
group by
1,2
which returns three columns: the first is the salesman id, the second is a subselect that gets the sales quantity goal, and the third is the actual sales quantity.
Even when grouping by the first and second columns, Firebird throws an error when executing the query:
Invalid expression in the select list (not contained in either an aggregate function or the GROUP BY clause).
What's the reason for this?
There is a column "in the select list (not contained in either an aggregate function or the GROUP BY clause)": namely, each column you mention in your subselect other than inv.salesman_id. Such a column can have many values per group. When there is a GROUP BY (or just a HAVING, which implicitly treats all rows as a single group), a SELECT clause returns one row per group, and there is no single value to return for such a column. So you want (as you put in an answer yourself):
group by
inv.salesman_id,
g.group_id,
sg.subgroup_id,
v.variation_id
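A simplified sketch of the corrected query shape, using Python's sqlite3 module. Assumptions: the group/subgroup/variation hierarchy is collapsed into a single hypothetical group_id column on invoiceitem, and the data is invented. SQLite does not enforce the rule Firebird reports (it tolerates bare columns), so this only shows the shape of the fix and one consequence of it: every column added to GROUP BY splits the result further, here into one row per salesman per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- group_id is a hypothetical stand-in for group/subgroup/variation.
CREATE TABLE invoiceitem (salesman_id INTEGER, group_id TEXT, quantity INTEGER);
CREATE TABLE salesman_goals (salesman_id INTEGER, group_id TEXT, goal INTEGER);
INSERT INTO invoiceitem VALUES (1, 'A', 5), (1, 'A', 3), (1, 'B', 2), (2, 'A', 4);
INSERT INTO salesman_goals VALUES (1, 'A', 10), (1, 'B', 5), (2, 'A', 8);
""")

rows = conn.execute("""
SELECT i.salesman_id,
       i.group_id,
       -- Correlated subquery: every outer column it uses is in GROUP BY,
       -- so it has exactly one value per group.
       (SELECT g.goal FROM salesman_goals g
         WHERE g.salesman_id = i.salesman_id
           AND g.group_id = i.group_id) AS goal,
       SUM(i.quantity) AS qnt
FROM invoiceitem i
GROUP BY i.salesman_id, i.group_id
ORDER BY i.salesman_id, i.group_id
""").fetchall()

print(rows)  # [(1, 'A', 10, 8), (1, 'B', 5, 2), (2, 'A', 8, 4)]
```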
OK guys, I found the solution to this problem.
The thing is, if you have a subquery in a column that is in the GROUP BY clause, the outer columns referenced inside that subquery must also appear in the GROUP BY. So in this case, all I had to do was:
group by
inv.salesman_id,
g.group_id,
sg.subgroup_id,
v.variation_id
And that's it. Hope it helps if someone has the same issue in the future.
I'm trying to use a GROUP BY clause in a LEFT JOIN SQL query and it is not working.
Please help me out, thanks in advance.
SELECT Cust_Mst_Det.Cust_Hd_Code,
Cust_Mst_Det.First_Name,
SL_HEAD20152016.vouch_date AS invoice_2,
SL_HEAD20142015.vouch_date AS invoice_1,
Cust_Mst_Hd.EMail
FROM Cust_Mst_Det
LEFT JOIN SL_HEAD20142015 ON Cust_Mst_Det.Cust_Hd_Code=SL_HEAD20142015.Member_Code
LEFT JOIN SL_HEAD20152016 ON Cust_Mst_Det.Cust_Hd_Code=SL_HEAD20152016.Member_Code
LEFT JOIN Cust_Mst_Hd ON Cust_Mst_Det.Cust_Hd_Code=Cust_Mst_Hd.Cust_Hd_Code
WHERE cust_mst_det.first_name!='NIL'
GROUP BY Cust_Mst_Det.Cust_Hd_Code
ORDER BY SL_HEAD20152016.vouch_date DESC,
SL_HEAD20142015.vouch_date
I'm not sure which DBMS you are using, but on Oracle your query will not work at all.
First issue: GROUP BY is normally used together with aggregate functions to group the result set by one or more columns. You do not have any aggregate function in your SELECT list (COUNT, MAX, etc.).
Second issue: you must list every column from the SELECT list in your GROUP BY clause (except columns that represent results of aggregation).
As I said, I don't know which DB you are using, but those two points should apply to most SQL dialects.
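Assuming the intent is one row per customer with the latest voucher date from each year's sales table, a query that satisfies both points aggregates the dates with MAX and groups by every non-aggregated column. The sketch uses Python's sqlite3 module with shortened, hypothetical table names (cust, sl_2014, sl_2015) and invented rows. Two LEFT JOINs to one-to-many tables multiply the rows per customer; MAX is unaffected by that, whereas COUNT or SUM would be inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust (code INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE sl_2014 (member_code INTEGER, vouch_date TEXT);
CREATE TABLE sl_2015 (member_code INTEGER, vouch_date TEXT);
INSERT INTO cust VALUES (1, 'Ann'), (2, 'Bob'), (3, 'NIL');
INSERT INTO sl_2014 VALUES (1, '2014-05-01'), (1, '2015-01-10'), (2, '2014-12-01');
INSERT INTO sl_2015 VALUES (1, '2015-06-01'), (1, '2016-02-01');
""")

rows = conn.execute("""
SELECT c.code,
       c.first_name,
       MAX(s15.vouch_date) AS invoice_2,  -- ISO dates compare correctly as text
       MAX(s14.vouch_date) AS invoice_1
FROM cust c
LEFT JOIN sl_2014 s14 ON c.code = s14.member_code
LEFT JOIN sl_2015 s15 ON c.code = s15.member_code
WHERE c.first_name != 'NIL'
GROUP BY c.code, c.first_name
ORDER BY invoice_2 DESC, invoice_1
""").fetchall()

print(rows)
# [(1, 'Ann', '2016-02-01', '2015-01-10'), (2, 'Bob', None, '2014-12-01')]
```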
It appears that it is impossible to use an ORDER BY on a GROUP BY summarisation. My fundamental logic is flawed. I will need to run the following subquery.
For example:
SELECT p.*, pp.price
FROM products p
LEFT JOIN ( SELECT product_id, price FROM product_price ORDER BY date_updated DESC ) pp
ON p.product_id = pp.product_id GROUP BY p.product_id;
This will take a performance hit but as it is the same subquery for each row it shouldn't be too bad.
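An ORDER BY inside a derived table is not guaranteed to decide which row a later GROUP BY keeps (MySQL, for one, may ignore it, and with ONLY_FULL_GROUP_BY enabled the non-aggregated pp.price is rejected). A deterministic alternative, swapped in here rather than taken from the post, is to number each product's prices with ROW_NUMBER() and keep row 1. The sketch uses Python's sqlite3 module (SQLite 3.25+ for window functions) with invented data; the same SQL needs MySQL 8.0 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_price (product_id INTEGER, price REAL, date_updated TEXT);
INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
INSERT INTO product_price VALUES
    (1, 9.5, '2023-01-01'), (1, 10.0, '2023-06-01'), (2, 4.0, '2023-03-01');
""")

rows = conn.execute("""
SELECT p.product_id, p.name, pp.price
FROM products p
LEFT JOIN (
    -- rn = 1 marks the most recent price of each product
    -- (ties on date_updated would still be broken arbitrarily).
    SELECT product_id, price,
           ROW_NUMBER() OVER (PARTITION BY product_id
                              ORDER BY date_updated DESC) AS rn
    FROM product_price
) pp ON pp.product_id = p.product_id AND pp.rn = 1
ORDER BY p.product_id
""").fetchall()

print(rows)  # [(1, 'Widget', 10.0), (2, 'Gadget', 4.0), (3, 'Gizmo', None)]
```

The LEFT JOIN keeps products that have no price yet, and no GROUP BY is needed because the rn = 1 condition already leaves at most one price row per product.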