Hive Script, DISTINCT with SUM - sql

I am trying to take the distinct teams and then count how many teams a player played for in any single season. This is tripping me up. The first query below is my failed attempt; the second one produces a sample of the data.
SELECT o.id,o.year,COUNT(DISTINCT(o.team)) b JOIN
(SELECT id, year, team FROM batting
GROUP BY id,year,team
ORDER BY id DESC
LIMIT 25) o
0.id =b.id;
SELECT id, year, team FROM batting
GROUP BY id,year,team
ORDER BY id DESC
LIMIT 25;
This produces rows of id, year, and team.
Ignore the ^A characters in the output; I think they are just the column separator (standing in for a space or comma).

Get the count of distinct teams for each player in each year, order by that count descending, and take the first row:
SELECT id, year, COUNT(DISTINCT team) AS team_count FROM batting
GROUP BY id, year
ORDER BY team_count DESC
LIMIT 1;
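As a quick sanity check, the same query can be run against a tiny made-up table. This is only a sketch: it uses Python's built-in sqlite3 rather than Hive, and the sample rows and the team_count alias are assumptions for illustration, not data from the question.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption); the rows are invented.
conn_batting = sqlite3.connect(":memory:")
conn_batting.execute("CREATE TABLE batting (id TEXT, year INTEGER, team TEXT)")
conn_batting.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [
        ("p1", 2001, "NYA"),
        ("p1", 2001, "NYA"),  # second stint with the same team, counted once
        ("p1", 2001, "BOS"),
        ("p1", 2001, "CHN"),
        ("p2", 2001, "BOS"),
        ("p2", 2002, "SEA"),
        ("p2", 2002, "TEX"),
    ],
)

# One row per (player, season) with the number of distinct teams, largest first.
top_player_season = conn_batting.execute(
    """
    SELECT id, year, COUNT(DISTINCT team) AS team_count
    FROM batting
    GROUP BY id, year
    ORDER BY team_count DESC
    LIMIT 1
    """
).fetchone()

print(top_player_season)  # ('p1', 2001, 3)
```

Dropping the LIMIT shows the full per-season counts, which is usually the easier way to verify the grouping before taking the top row.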

Related

SQL: Find Top 2 and reverse order

I have the IMDB database and I am looking for the top two years in which the most movies were produced. I have to sort them chronologically by year and print only the years.
I am trying to compute the list and then sort it 'the other way around' afterwards, but I cannot order by anything in the last ORDER BY clause, because the FROM clause of the outer query does not refer to a table and instead opens the next statement. It also says "unknown column topTwo", so I cannot order my results accordingly.
What am I doing wrong?
SELECT *
FROM
(SELECT m.year, COUNT(*)
FROM movies as m
GROUP BY m.year
ORDER BY m.year DESC) AS topTwo
ORDER BY topTwo ASC
LIMIT 2;
I think you are looking for this:
SELECT topTwo.year
FROM (SELECT m.year, COUNT(*) as cnt
FROM movies m
GROUP BY m.year
ORDER BY COUNT(*) DESC
LIMIT 2
) topTwo
ORDER BY year ASC;
Notes:
The LIMIT goes in the subquery.
The COUNT(*) is given an alias.
The ORDER BY in the subquery is based on the count.
The ORDER BY in the outer query is based on the year.
You only seem to want the year, so the outer query selects only that column.
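The notes above can be checked on a small example. The sketch below assumes Python's built-in sqlite3 and an invented movies table with only a year column; the real IMDB schema will differ.

```python
import sqlite3

# Invented data: 2010 has four movies, 1999 three, 2001 two, 2005 one.
conn_movies = sqlite3.connect(":memory:")
conn_movies.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
sample_years = [2010] * 4 + [1999] * 3 + [2001] * 2 + [2005]
conn_movies.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("movie %d" % i, y) for i, y in enumerate(sample_years)],
)

# Inner query: pick the two busiest years. Outer query: sort them by year.
top_two_years = [
    row[0]
    for row in conn_movies.execute(
        """
        SELECT topTwo.year
        FROM (SELECT m.year, COUNT(*) AS cnt
              FROM movies m
              GROUP BY m.year
              ORDER BY cnt DESC
              LIMIT 2) topTwo
        ORDER BY topTwo.year ASC
        """
    )
]

print(top_two_years)  # [1999, 2010]
```

The inner ORDER BY and LIMIT decide which rows survive; the outer ORDER BY only decides how they are displayed.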

How to do the max count part in SQL?

I was told to find out which occupation has the greatest number of patients with conditionID = MC8.
I don't know how to do the "greatest" part.
Here is my code right now:
SELECT occupation
FROM Patient
WHERE EXISTS
(SELECT PatientID FROM PatientMedcon
Where conditionID=’MC8’)
GROUP BY occupation
HAVNG count(occupation) = (Select max(occupation)
From Patient
You should approach these types of queries using regular joins and then add additional factors. The following gets the count of patients for each occupation with that condition:
SELECT occupation, COUNT(*)
FROM Patient p JOIN
PatientMedcon pm
ON p.PatientId = pm.PatientId and
pm.conditionId = 'MC8'
GROUP BY occupation
ORDER BY COUNT(*) DESC;
If you want only the top row, the syntax depends on the database. It might be SELECT TOP 1 (SQL Server), LIMIT 1 at the end (MySQL, PostgreSQL, SQLite), FETCH FIRST 1 ROWS ONLY at the end (standard SQL), or even something else.
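For illustration, here is the join-then-count approach with the LIMIT 1 variant, run in Python's built-in sqlite3. The table contents are invented, and only the columns used by the query are modelled.

```python
import sqlite3

# Invented rows; column names follow the query in the answer.
conn_patients = sqlite3.connect(":memory:")
conn_patients.execute("CREATE TABLE Patient (PatientId INTEGER, occupation TEXT)")
conn_patients.execute("CREATE TABLE PatientMedcon (PatientId INTEGER, conditionId TEXT)")
conn_patients.executemany(
    "INSERT INTO Patient VALUES (?, ?)",
    [(1, "nurse"), (2, "nurse"), (3, "teacher"), (4, "teacher"), (5, "driver")],
)
conn_patients.executemany(
    "INSERT INTO PatientMedcon VALUES (?, ?)",
    [(1, "MC8"), (2, "MC8"), (3, "MC8"), (4, "MC1"), (5, "MC1")],
)

# Count MC8 patients per occupation and keep the largest group.
top_occupation = conn_patients.execute(
    """
    SELECT p.occupation, COUNT(*) AS patient_count
    FROM Patient p
    JOIN PatientMedcon pm
      ON p.PatientId = pm.PatientId AND pm.conditionId = 'MC8'
    GROUP BY p.occupation
    ORDER BY patient_count DESC
    LIMIT 1
    """
).fetchone()

print(top_occupation)  # ('nurse', 2)
```

If two occupations tie for first place, LIMIT 1 returns just one of them, so a tie-aware query needs a different shape.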

SQL Finding maximum value without top command

Let's say I have a database with a table:
-courses (key: name [of the course], other attributes: year in which the course takes place)
I want to write a query that answers the question:
In which year of study is there a maximum number of courses?
Normally, the query would be:
SELECT TOP 1 STUDYEAR
FROM COURSES
GROUP BY STUDYEAR
ORDER BY COUNT(CNO) DESC;
But my question is, which query could complete this without using the TOP 1 phrase?
You can use an inner query to get the maximum count. The only difference is that it can return more than one row if several years share the same maximum count.
SELECT STUDYEAR
FROM COURSES
GROUP BY STUDYEAR
HAVING COUNT(CNO) = (SELECT MAX(CNOCount) FROM
(SELECT COUNT(CNO) CNOCount
FROM COURSES
GROUP BY STUDYEAR) X)
Another version with only one inner query:
SELECT STUDYEAR
FROM
(SELECT STUDYEAR, ROW_NUMBER() OVER (ORDER BY COUNT(CNO) DESC) RowNumber
FROM COURSES
GROUP BY STUDYEAR) X
WHERE RowNumber = 1
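The tie behaviour of the HAVING-based query can be seen on a small example. This sketch assumes Python's built-in sqlite3 and an invented COURSES table in which two study years share the maximum count.

```python
import sqlite3

# Invented data: year 1 has two courses, years 2 and 3 have three each.
conn_courses = sqlite3.connect(":memory:")
conn_courses.execute("CREATE TABLE COURSES (CNO TEXT, STUDYEAR INTEGER)")
conn_courses.executemany(
    "INSERT INTO COURSES VALUES (?, ?)",
    [
        ("c1", 1), ("c2", 1),
        ("c3", 2), ("c4", 2), ("c5", 2),
        ("c6", 3), ("c7", 3), ("c8", 3),
    ],
)

# Keep every group whose count equals the largest per-year count.
busiest_years = sorted(
    row[0]
    for row in conn_courses.execute(
        """
        SELECT STUDYEAR
        FROM COURSES
        GROUP BY STUDYEAR
        HAVING COUNT(CNO) = (SELECT MAX(CNOCount) FROM
                             (SELECT COUNT(CNO) AS CNOCount
                              FROM COURSES
                              GROUP BY STUDYEAR) X)
        """
    )
)

print(busiest_years)  # [2, 3]
```

Both tied years come back here, whereas a TOP 1 or ROW_NUMBER() = 1 query would return only one of them.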

Find MAX of grouped SUM

I have this code and it gives me sold seats summarized for each movie:
SELECT mName, SUM(soldSeats)
FROM movie, show, prog
WHERE movie.movieID = prog.movieID
AND prog.showID = show.showID
GROUP BY mName
The problem is to find the movie that has the maximum number of seats sold. I have tried adding this, but I get no rows:
HAVING SUM(soldseats) = (SELECT MAX(SUM(soldseats)) FROM show
GROUP BY solgteplasser)
Does anyone have a suggestion? Here is how the tables look: [image: Sum of a activity]
Select test.*
from
(Select movie.mName, SUM(show.soldSeats) as soldseatsum
FROM movie, show, prog
WHere movie.movieId = prog.movieID
AND prog.showID = show.showID
Group by movie.mName
Order by soldseatsum DESC) test
Where rownum<=1
This way you order your selection descending by your SUM and then use ROWNUM to select only the first row, which is the row with the highest sum.
EDIT:
Also, in case you have NULLs in your SUM column, make sure you add NULLS LAST after your ORDER BY, like this:
.....
.....
Group by movie.mName
Order by soldseatsum DESC NULLS LAST) test
Where rownum<=1
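The same order-then-take-first idea can be tried outside Oracle. The sketch below uses Python's built-in sqlite3, where LIMIT 1 stands in for the ROWNUM filter (ROWNUM is Oracle-specific). The table layout is guessed from the query and the rows are invented.

```python
import sqlite3

# Schema guessed from the query in the question; the rows are invented.
conn_cinema = sqlite3.connect(":memory:")
conn_cinema.execute("CREATE TABLE movie (movieID INTEGER, mName TEXT)")
conn_cinema.execute("CREATE TABLE show (showID INTEGER, soldSeats INTEGER)")
conn_cinema.execute("CREATE TABLE prog (movieID INTEGER, showID INTEGER)")
conn_cinema.executemany("INSERT INTO movie VALUES (?, ?)", [(1, "Alien"), (2, "Brazil")])
conn_cinema.executemany("INSERT INTO show VALUES (?, ?)", [(10, 100), (11, 50), (12, 120)])
conn_cinema.executemany("INSERT INTO prog VALUES (?, ?)", [(1, 10), (1, 11), (2, 12)])

# Sum sold seats per movie in the subquery, then keep the largest sum.
best_seller = conn_cinema.execute(
    """
    SELECT test.mName, test.soldseatsum
    FROM (SELECT movie.mName, SUM(show.soldSeats) AS soldseatsum
          FROM movie
          JOIN prog ON movie.movieID = prog.movieID
          JOIN show ON prog.showID = show.showID
          GROUP BY movie.mName) test
    ORDER BY test.soldseatsum DESC
    LIMIT 1
    """
).fetchone()

print(best_seller)  # ('Alien', 150)
```

Note that NULL ordering differs between databases, so the NULLS LAST advice above applies to Oracle and should be checked separately for any other engine.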

Fetch one row per account id from list

I have a table with game scores, allowing multiple rows per account id: scores (id, score, accountid). I want a list of the top 10 scorer ids and their scores.
Can you provide an sql statement to select the top 10 scores, but only one score per account id?
Thanks!
select username, max(score) from usertable group by username order by max(score) desc limit 10;
First limit the selection to the highest score for each account id.
Then take the top ten scores.
SELECT TOP 10 AccountId, Score
FROM Scores s1
WHERE AccountId NOT IN
-- exclude rows that are beaten by a higher score for the same account
(SELECT s2.AccountId FROM Scores s2
WHERE s1.AccountId = s2.AccountId AND s1.Score < s2.Score)
ORDER BY Score DESC
Try this:
select top 10 username,
max(score)
from usertable
group by username
order by max(score) desc
PostgreSQL has the DISTINCT ON clause, which works this way:
SELECT DISTINCT ON (accountid) id, score, accountid
FROM scoretable
ORDER BY score DESC
LIMIT 10;
I don't think it's standard SQL though, so expect other databases to do it differently.
SELECT accountid, MAX(score) as top_score
FROM Scores
GROUP BY accountid
ORDER BY top_score DESC
LIMIT 0, 10
That should work fine in MySQL. You may need to use 'ORDER BY MAX(score) DESC' instead of ordering by the alias; I don't have my SQL reference on hand.
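A small run of the GROUP BY / MAX approach shows the one-row-per-account behaviour. This is a sketch using Python's built-in sqlite3 rather than MySQL, with invented scores and a limit of 2 instead of 10 to keep the data short.

```python
import sqlite3

# Invented rows for the scores (id, score, accountid) table from the question.
conn_scores = sqlite3.connect(":memory:")
conn_scores.execute("CREATE TABLE scores (id INTEGER, score INTEGER, accountid INTEGER)")
conn_scores.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, 50, 1), (2, 90, 1), (3, 70, 2), (4, 20, 2), (5, 80, 3)],
)

# Best score per account first, then the highest of those.
top_scorers = conn_scores.execute(
    """
    SELECT accountid, MAX(score) AS top_score
    FROM scores
    GROUP BY accountid
    ORDER BY top_score DESC
    LIMIT 2
    """
).fetchall()

print(top_scorers)  # [(1, 90), (3, 80)]
```

Account 1 appears once even though it has two rows, and its lower score of 50 never competes for a place in the list.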
I believe that PostgreSQL (at least 8.3) requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions. That is, you can't use DISTINCT ON (accountid) when you have ORDER BY score DESC. To fix this, add accountid to the ORDER BY:
SELECT DISTINCT ON (accountid) *
FROM scoretable
ORDER BY accountid, score DESC
LIMIT 10;
Using this method allows you to select all the columns in a table. It will return only one row per accountid, even if there are duplicate 'max' values for score.
This was useful for me, as I was not looking for the maximum score (which is easy to do with the max() function) but for the most recent time a score was entered for an accountid.