How to get top 10 from one column and sort by another column in hive? - sql

I want to find the top 10 titles with the highest number of user IDs, so I used this query:
select title,count(userid) as users from combined_moviedata group by title order by users desc limit 10
But I need the results sorted by title, so I tried this query:
select title,count(userid) as users from combined_moviedata group by title order by users desc,title asc limit 10
But it does not sort them by title; it merely returns the same results. How can I do this?

The answer from @KaushikNayak is very close to what I'd consider the "right" answer.
At one level, work out what your top 10 records are
At a different level, sort them by a different field
The only thing I'd add is that if the 10th and 11th most common titles are tied on the same count, both should generally be included in the results. That is what RANK() gives you.
WITH
ranked_titles AS
(
SELECT
RANK() OVER (ORDER BY COUNT(*) DESC) frequency_rank,
title
FROM
combined_moviedata
GROUP BY
title
)
SELECT
*
FROM
ranked_titles
WHERE
frequency_rank <= 10
ORDER BY
title
;
http://sqlfiddle.com/#!6/7283c/1
Note that in the example linked, 12 rows are returned. That is because 4 titles are all tied for the 9th most frequent, and it is actually impossible to determine which two should be selected in preference over the others. In this case selecting 10 rows would normally be statistically incorrect.
title frequency frequency_rank
title06 2 9
title07 2 9
title08 2 9
title09 2 9
title10 3 6
title11 3 6
title12 3 6
title13 4 4
title14 4 4
title15 5 2
title16 5 2
title17 6 1
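The effect of ties can be reproduced locally. The following is a minimal sketch in Python using the standard sqlite3 module rather than Hive, so it assumes an SQLite build with window-function support (3.25 or later). The sample rows are made up to match the frequencies in the table above, plus five hypothetical titles (title01 to title05) that occur once each. The counting is done in its own CTE before ranking, which is a small restructuring of the query above rather than the exact same statement.

```python
import sqlite3

# Hypothetical data: title -> number of rows, chosen to mirror the table above.
frequencies = {f"title{i:02d}": 1 for i in range(1, 6)}
frequencies.update({
    "title06": 2, "title07": 2, "title08": 2, "title09": 2,
    "title10": 3, "title11": 3, "title12": 3,
    "title13": 4, "title14": 4,
    "title15": 5, "title16": 5,
    "title17": 6,
})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE combined_moviedata (title TEXT, userid INTEGER)")
conn.executemany(
    "INSERT INTO combined_moviedata VALUES (?, ?)",
    [(title, userid) for title, n in frequencies.items() for userid in range(n)],
)

ranked_rows = conn.execute("""
    WITH title_counts AS (
        -- level 1: count rows per title
        SELECT title, COUNT(*) AS frequency
        FROM combined_moviedata
        GROUP BY title
    ),
    ranked_titles AS (
        -- level 2: rank titles by frequency, ties share a rank
        SELECT title, frequency,
               RANK() OVER (ORDER BY frequency DESC) AS frequency_rank
        FROM title_counts
    )
    SELECT title, frequency, frequency_rank
    FROM ranked_titles
    WHERE frequency_rank <= 10
    ORDER BY title
""").fetchall()

for row in ranked_rows:
    print(row)
```

With these rows the query returns 12 titles rather than 10, because the four titles tied at rank 9 are all kept.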

You could make use of a WITH clause
with t AS
(
select title,count(userid) as users from combined_moviedata
group by title
order by users desc limit 10
)
select * FROM t ORDER BY title ;
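The single-level query in the question keeps returning the same order because in ORDER BY users DESC, title ASC the title only breaks ties between equal counts. A minimal sketch comparing the two approaches, using Python's sqlite3 module in place of Hive and made-up titles and counts (all data here is hypothetical):

```python
import sqlite3

# Hypothetical data: title -> number of distinct users; all counts differ.
user_counts = {
    "Zulu": 12, "Alien": 11, "Yojimbo": 10, "Brazil": 9,
    "Xanadu": 8, "Casablanca": 7, "Willow": 6, "Dune": 5,
    "Vertigo": 4, "Eraserhead": 3, "Up": 2, "Fargo": 1,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE combined_moviedata (title TEXT, userid INTEGER)")
conn.executemany(
    "INSERT INTO combined_moviedata VALUES (?, ?)",
    [(title, userid) for title, n in user_counts.items() for userid in range(n)],
)

# The query from the question: title is only a tie-breaker here.
single_level = conn.execute("""
    SELECT title, COUNT(userid) AS users
    FROM combined_moviedata
    GROUP BY title
    ORDER BY users DESC, title ASC
    LIMIT 10
""").fetchall()

# Two levels: pick the top 10 first, then sort those 10 rows by title.
two_level = conn.execute("""
    WITH t AS (
        SELECT title, COUNT(userid) AS users
        FROM combined_moviedata
        GROUP BY title
        ORDER BY users DESC
        LIMIT 10
    )
    SELECT * FROM t ORDER BY title
""").fetchall()

print([title for title, _ in single_level])
print([title for title, _ in two_level])
```

Both queries select the same ten titles; only the second one presents them alphabetically.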

Related

SQLite: How to fetch the ORDER index of COUNT

I have an SQLite database that contains a list of user messages in a group.
And I want to get a user's "rank" by counting the number of messages they had sent.
Currently I'm doing this
SELECT user_id, COUNT(*) as count
FROM message
group by user_id
ORDER BY count DESC
It'd return something like this:
# user_id count
1 2072040132 61877
2 1609505732 40514
3 1543287045 34735
4 203349203 30570
5 842634673 29651
6 1702633101 29185
7 1978947042 27728
8 1929648593 27025
9 1069841429 17944
10 1437208364 17344
11 ... ...
For example, user 1609505732 is ranked 2nd and user 1702633101 is ranked 6th.
But my database has more than 2 million rows, and fetching the whole list is too slow.
I was wondering if there is any way to fetch only a user's position in that ordering.
Like this:
# user_id order count
1 1702633101 6 61877
And the user with id=1702633101 is top 6. That'd be a lot faster.
Thanks for spending time on my question, I can't seem to find the answer anywhere on the internet.
To improve query speed, I'd consider physicalising the aggregate view, example below:
CREATE TABLE tbl_aggregate (
Id INTEGER PRIMARY KEY AUTOINCREMENT
, user_id NVARCHAR
, count INT
);
INSERT INTO tbl_aggregate (user_id, count)
SELECT user_id, COUNT(*) as count
FROM message
group by user_id
ORDER BY count DESC;
-- the user ranked 6th
Select * from tbl_aggregate
Where Id = 6;
-- the ten most active users (SQLite uses LIMIT, not TOP)
Select * from tbl_aggregate
Order by Id
Limit 10;
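To look up one user's rank from such an aggregate table, the lookup can filter on user_id instead of Id. The following is a minimal sketch with Python's sqlite3 module. The user ids are taken from the question, but the message counts are scaled-down, made-up values; the index name idx_aggregate_user, the helper rank_of, and the INTEGER type for user_id are assumptions for illustration. The table is a snapshot, so it would need to be rebuilt or refreshed when new messages arrive.

```python
import sqlite3

# Hypothetical message counts per user; the user ids come from the question,
# the counts are scaled down so the example runs instantly.
message_counts = {
    2072040132: 9,
    1609505732: 8,
    1543287045: 7,
    203349203: 6,
    842634673: 5,
    1702633101: 4,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (user_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO message (user_id, body) VALUES (?, 'hello')",
    [(user_id,) for user_id, n in message_counts.items() for _ in range(n)],
)

# Build the aggregate table once; Id is assigned in insertion order,
# so it doubles as the rank (1 = most messages).
conn.executescript("""
    CREATE TABLE tbl_aggregate (
        Id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER,
        count INTEGER
    );
    INSERT INTO tbl_aggregate (user_id, count)
    SELECT user_id, COUNT(*)
    FROM message
    GROUP BY user_id
    ORDER BY COUNT(*) DESC, user_id;
    CREATE UNIQUE INDEX idx_aggregate_user ON tbl_aggregate (user_id);
""")

def rank_of(user_id):
    """Return (rank, message_count) for one user, or None if unknown."""
    return conn.execute(
        "SELECT Id, count FROM tbl_aggregate WHERE user_id = ?", (user_id,)
    ).fetchone()

print(rank_of(1702633101))
print(rank_of(1609505732))
```

Each lookup reads a single indexed row instead of re-aggregating the message table.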

Count Instances Of Occurring String With Unique IDs

I need to count the number of times that a specific string occurs, but when one ID has the same string more than once, it should only be counted once. Basically, I need to count the number of occurrences of a string that occur uniquely to an ID. I believe this should be a simple thing to do, but I don't know what I'm doing. Here is my current code:
SELECT
RXNAME as Name,
DUPERSID as ID,
COUNT(RXNAME) as Number
FROM
`OmniHealth.PrescriptionsMEPS`
GROUP BY
ID,
Name
ORDER BY
Number
When run, it says everything was counted as 1. Thanks for the help!
UPDATE:
Dataset: https://storage.googleapis.com/omnihealth/MepsPrescriptionData.csv
OUTPUT when run with code above:
Row Name ID Number
1 SUMATRIPTAN 68896102 1
2 IBUPROFEN 65063102 1
3 PENICILLN VK 66179101 1
4 FUROSEMIDE 63217102 1
5 HYSINGLA ER 70373101 1
6 FUROSEMIDE 76090101 1
7 SKELETAL MUSCLE RELAXANTS 78414101 1
8 AMOXICILLIN 69467103 1
9 TRAMADOL HCL 67667101 1
10 PANTOPRAZOLE 60737102 1
11 CARBAMIDE PEROXIDE 6.5% OTIC SOLN 63990104 1
12 PROMETH/COD 68433101 1
13 AZITHROMYCIN 79045102 1
14 METRONIDAZOL 75414101 1
15 DEXILANT 69625101 1
16 TRAMADOL HCL 66890203 1
17 AZITHROMYCIN 73838101 1
18 COLCRYS 63856102 1
19 PERMETHRIN 62103107 1
20 ACETAMINOPHEN TAB 500 MG 62456102 1
I'm not sure this is what you asked, but if you are looking for a distinct count, go with the query below:
#standardSQL
SELECT
RXNAME AS Name,
COUNT(DISTINCT DUPERSID) AS Number
FROM `OmniHealth.PrescriptionsMEPS`
GROUP BY 1
ORDER BY Number DESC
Try this. You are grouping on a different field than you are counting; I think you mean to group by RXNAME.
SELECT
RXNAME as Name,
DUPERSID as ID,
COUNT(RXNAME) as Number
FROM
`OmniHealth.PrescriptionsMEPS`
GROUP BY
ID,
RXNAME
ORDER BY
Number
I think you want:
SELECT DUPERSID as ID, COUNT(DISTINCT RXNAME) as Number
FROM `OmniHealth.PrescriptionsMEPS`
GROUP BY ID
ORDER BY Number;
This assumes that "same string" means "same value for RXNAME".
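The two readings of the question (distinct people per drug versus distinct drugs per person) can be compared on a few rows. This is a minimal sketch using Python's sqlite3 module instead of BigQuery; the table name PrescriptionsMEPS without the dataset prefix and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PrescriptionsMEPS (RXNAME TEXT, DUPERSID INTEGER)")
# Hypothetical rows: person 101 has IBUPROFEN three times.
conn.executemany(
    "INSERT INTO PrescriptionsMEPS VALUES (?, ?)",
    [
        ("IBUPROFEN", 101), ("IBUPROFEN", 101), ("IBUPROFEN", 101),
        ("IBUPROFEN", 102),
        ("AMOXICILLIN", 101),
        ("FUROSEMIDE", 103), ("FUROSEMIDE", 103),
    ],
)

# Distinct people per drug: repeats by the same person count once.
people_per_drug = conn.execute("""
    SELECT RXNAME AS Name, COUNT(DISTINCT DUPERSID) AS Number
    FROM PrescriptionsMEPS
    GROUP BY RXNAME
    ORDER BY Number DESC, Name
""").fetchall()

# Distinct drugs per person.
drugs_per_person = conn.execute("""
    SELECT DUPERSID AS ID, COUNT(DISTINCT RXNAME) AS Number
    FROM PrescriptionsMEPS
    GROUP BY DUPERSID
    ORDER BY Number, ID
""").fetchall()

print(people_per_drug)
print(drugs_per_person)
```

IBUPROFEN appears four times in the sample rows but is counted for only two people, which is the "count it once per ID" behaviour described in the question.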

Getting specific number of rows from the same table using different criteria

I have a news table from which I need to select a specific number of rows, for example 3 news items from each of 4 categories. There should always be 3 news items from each category (the table has plenty of content, so that is not a concern). I want to select only the newest ones.
Suppose my table contains 5 fields: id_news, news_title, news_desc, news_date, id_news_category.
Edit: It looks like the previous questions can't help me, because what I really need is to select the top N from ranges of category IDs, e.g. 1-6, 7-13, 14-20, etc.
I'm looking for the best (most efficient) way to accomplish this. Is it the UNION operator?
(SELECT id_news, news_title
FROM news
WHERE id_news_category BETWEEN 1 AND 6
ORDER BY news_date DESC
LIMIT 3)
UNION
(SELECT id_news, news_title
FROM news
WHERE id_news_category BETWEEN 7 AND 13
ORDER BY news_date DESC
LIMIT 3)
UNION
(SELECT id_news, news_title
FROM news
WHERE id_news_category BETWEEN 14 AND 20
ORDER BY news_date DESC
LIMIT 3)
UNION
(SELECT id_news, news_title
FROM news
WHERE id_news_category BETWEEN 21 AND 27
ORDER BY news_date DESC
LIMIT 3)
I know the UNION above could take a different form, with a single ORDER BY, but I'm not sure whether that affects performance. I'm using PostgreSQL 8.4 and Symfony, so I will need to use QueryBuilder or maybe just a native query.
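One possible alternative to repeating a SELECT per range is to label each row with its category range and number the rows within each range by date. The sketch below uses Python's sqlite3 module rather than PostgreSQL (window functions and WITH are also available in PostgreSQL 8.4); the names bucket and rn and all sample rows are hypothetical, and no claim is made about which form is faster on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE news (
        id_news INTEGER PRIMARY KEY,
        news_title TEXT,
        news_date TEXT,
        id_news_category INTEGER
    )
""")

# Hypothetical rows: two categories per range, two news items per category.
# A higher id_news means a later news_date.
rows = []
news_id = 0
for category in (2, 5, 9, 12, 15, 19, 22, 26):
    for _ in range(2):
        news_id += 1
        rows.append((news_id, f"news {news_id}", f"2020-01-{news_id:02d}", category))
conn.executemany("INSERT INTO news VALUES (?, ?, ?, ?)", rows)

top_per_range = conn.execute("""
    WITH bucketed AS (
        -- map each category id onto the range it belongs to
        SELECT id_news, news_title, news_date,
               CASE
                   WHEN id_news_category BETWEEN 1 AND 6 THEN 1
                   WHEN id_news_category BETWEEN 7 AND 13 THEN 2
                   WHEN id_news_category BETWEEN 14 AND 20 THEN 3
                   WHEN id_news_category BETWEEN 21 AND 27 THEN 4
               END AS bucket
        FROM news
    ),
    ranked AS (
        -- number the rows of each range from newest to oldest
        SELECT id_news, news_title, news_date, bucket,
               ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY news_date DESC) AS rn
        FROM bucketed
        WHERE bucket IS NOT NULL
    )
    SELECT bucket, id_news, news_title
    FROM ranked
    WHERE rn <= 3
    ORDER BY bucket, news_date DESC
""").fetchall()

for row in top_per_range:
    print(row)
```

Adding another range then means adding one WHEN branch instead of another UNION arm.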

Get MAX() on repeating IDs

This is what my query results currently look like. How can I get the MAX() value for each unique ID?
For example, for 5267139 it is 8, and for 5267145 it is 4.
5267136 5
5267137 8
5267137 2
5267139 8
5267139 5
5267139 3
5267141 4
5267141 3
5267145 4
5267145 3
5267146 1
5267147 2
5267152 3
5267153 3
5267155 8
SELECT DISTINCT st.ScoreID, st.ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st
ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
ORDER BY st.ScoreID, st.ScoreTrackingTypeID DESC
GROUP BY will partition your table into separate blocks based on the column(s) you specify. You can then apply an aggregate function (MAX in this case) against each of the blocks -- this behavior applies by default with the below syntax:
SELECT First_column, MAX(Second_column) AS Max_second_column
FROM Table
GROUP BY First_column
EDIT: Based on the query above, it looks like you don't really need the ScoreTrackingType table at all, but leaving it in place, you could use:
SELECT st.ScoreID, MAX(st.ScoreTrackingTypeID) AS ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
GROUP BY st.ScoreID
ORDER BY st.ScoreID
The GROUP BY will obviate the need for DISTINCT, MAX will give you the value you are looking for, and the ORDER BY will still apply, but since there will only be a single ScoreTrackingTypeID value for each ScoreID you can pull it out of the ordering.
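The grouping can be checked against the rows shown in the question. This is a minimal sketch using Python's sqlite3 module; it assumes the pairs listed in the question are (ScoreID, ScoreTrackingTypeID) rows of ScoreTracking and, for brevity, leaves out the join to ScoreTrackingType.

```python
import sqlite3

# The (ScoreID, ScoreTrackingTypeID) pairs listed in the question.
pairs = [
    (5267136, 5), (5267137, 8), (5267137, 2),
    (5267139, 8), (5267139, 5), (5267139, 3),
    (5267141, 4), (5267141, 3), (5267145, 4), (5267145, 3),
    (5267146, 1), (5267147, 2), (5267152, 3), (5267153, 3), (5267155, 8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ScoreTracking (ScoreID INTEGER, ScoreTrackingTypeID INTEGER)")
conn.executemany("INSERT INTO ScoreTracking VALUES (?, ?)", pairs)

# One row per ScoreID, carrying the largest ScoreTrackingTypeID in its group.
max_per_id = conn.execute("""
    SELECT ScoreID, MAX(ScoreTrackingTypeID) AS ScoreTrackingTypeID
    FROM ScoreTracking
    GROUP BY ScoreID
    ORDER BY ScoreID
""").fetchall()

for score_id, max_type in max_per_id:
    print(score_id, max_type)
```

The 15 input rows collapse to 10 rows, one per ScoreID.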

How do I average numbers according to each distinct ID in SQL?

I have CommentRating table that holds a foreign key to the DrinkId. I'm trying to get the average ratings for each DrinkId, and along with that I want to display the top three drinkIds that have the highest ratings.
commentRating drinkID
9 7
9 4
8 11
8 7
7 4
6 4
6 11
Here's the SQL I have so far, but I don't know how to change it.
Select TOP(3)(AVG(commentRating)),DISTINCT(drinkID)
FROM Comment order by commentRating desc
How do I average the ratings, select the drinks with the top three ratings, and return them in SQL?
You need to group the result by drinkID:
SELECT TOP 3 AVG(commentRating), drinkID
FROM Comment
GROUP BY drinkID
ORDER BY AVG(commentRating) DESC
I recommend reading the details on GROUP BY in your favorite SQL documentation. For T-SQL, see GROUP BY (Transact-SQL) on MSDN.
AVG is an aggregate function. Again, I'd recommend reading some documentation on aggregate functions; for T-SQL it is in the MSDN library too.
For T-SQL:
select TOP 3 * from
(
select drinkID,avg(commentRating) avgCom
from Comment group by drinkID
) t order by avgCom DESC
For MySQL, use the LIMIT keyword instead of TOP.
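As a quick check of the LIMIT form, here is a minimal sketch with Python's sqlite3 module (SQLite, like MySQL, uses LIMIT rather than TOP). It loads the seven rows from the question into a Comment table; the column types are assumptions.

```python
import sqlite3

# The (commentRating, drinkID) rows from the question.
ratings = [(9, 7), (9, 4), (8, 11), (8, 7), (7, 4), (6, 4), (6, 11)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comment (commentRating INTEGER, drinkID INTEGER)")
conn.executemany("INSERT INTO Comment VALUES (?, ?)", ratings)

# Average per drink, best three first.
top_three = conn.execute("""
    SELECT drinkID, AVG(commentRating) AS avgCom
    FROM Comment
    GROUP BY drinkID
    ORDER BY avgCom DESC
    LIMIT 3
""").fetchall()

for drink_id, avg_rating in top_three:
    print(drink_id, round(avg_rating, 2))
```

With this data, drink 7 comes first with an average of 8.5, followed by drinks 4 and 11.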