How to query a column created by an aggregate function in Hive?

In Hive, I want to select the records with users >= 40. My table has a userid column, so I used:
select title,sum(rating),count(userid) from table_name where count(userid)>=40
group by title order by rating desc
But it showed an error saying that count can't be used in the where clause. I also tried using an alias:
select title,sum(rating) as ratings,count(userid) as users where users>=40 group by title order by ratings desc
Here I also got stuck, with an error saying that users is not a column name in the table.
I need to get the titles with the highest ratings that have at least 40 users.

You want the having clause:
select title, sum(rating), count(userid)
from table_name
group by title
having count(userid) >= 40
order by sum(rating) desc;
In Hive, you may need to use a column alias, though:
select title, sum(rating) as rating, count(userid) as cnt
from table_name
group by title
having cnt >= 40
order by rating desc;
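To see the difference between where and having in a runnable form, here is a minimal sketch using Python's built-in sqlite3 module in place of Hive. The ratings table, its sample rows, and the threshold of 2 users (instead of 40) are made up for illustration; SQLite rejects an aggregate in where much as Hive does.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (title TEXT, userid INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [("A", 1, 5), ("A", 2, 4), ("A", 3, 3), ("B", 1, 5), ("B", 2, 5), ("C", 1, 1)],
)

MIN_USERS = 2  # the question uses 40; 2 keeps the sample small

# An aggregate in WHERE is rejected, because WHERE runs before grouping.
try:
    conn.execute(
        "SELECT title FROM ratings WHERE COUNT(userid) >= ? GROUP BY title",
        (MIN_USERS,),
    )
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

# HAVING filters the groups after the aggregates are computed.
rows = conn.execute(
    """
    SELECT title, SUM(rating) AS total_rating, COUNT(userid) AS cnt
    FROM ratings
    GROUP BY title
    HAVING COUNT(userid) >= ?
    ORDER BY total_rating DESC
    """,
    (MIN_USERS,),
).fetchall()

print(where_rejected)
print(rows)
```

Title C has only one user, so it is dropped by the having clause.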

Related

How to find AVG of Count in SQL

This is what I have
select avg(visit_count) from ( SELECT count(user_id) as visit_count from table )group by user_id;
But I get the below error
ERROR 1248 (42000): Every derived table must have its own alias
if I add alias
then I get avg for only one user_id
What I want is the avg of visit_count for all user ids
SEE the picture for reference
Example 3,2.5,1.5
It means that your subquery needs to have an alias.
Like this:
select avg(visit_count) from (
select count(user_id) as visit_count from table
group by user_id) a
Your subquery is missing an alias. I think this is the version you want:
SELECT AVG(visit_count)
FROM
(
SELECT COUNT(user_id) AS visit_count
FROM yourTable
GROUP BY user_id
) t;
Note that GROUP BY belongs inside the subquery, as you want to find counts for all users.
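A small sketch of the same two-level aggregation, run through Python's sqlite3 module rather than MySQL. The visits table and its rows are invented for the example: three users with 3, 2, and 1 visits.

```python
import sqlite3

# Hypothetical visits table: one row per visit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?)", [(1,), (1,), (1,), (2,), (2,), (3,)])

# Inner query: one count per user. Outer query: average of those counts.
avg_visits = conn.execute(
    """
    SELECT AVG(visit_count)
    FROM (
        SELECT COUNT(user_id) AS visit_count
        FROM visits
        GROUP BY user_id
    ) AS t
    """
).fetchone()[0]

print(avg_visits)
```

With GROUP BY inside the subquery, the outer AVG sees one row per user and returns a single value for the whole table.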

Filter by number of occurrences in a SQL Table

Given the following table where the Name value might be repeated in multiple rows:
How can we determine how many times a Name value exists in the table, and can we filter on names that have a specific number of occurrences?
For instance, how can I filter this table to show only names that appear twice?
You can use group by and having to exhibit names that appear twice in the table:
select name, count(*) cnt
from mytable
group by name
having count(*) = 2
Then if you want the overall count of names that appear twice, you can add another level of aggregation:
select count(*) cnt
from (
select name
from mytable
group by name
having count(*) = 2
) t
It sounds like you're looking for a histogram of the frequency of name counts. Something like this
with counts_cte(name, cnt) as (
select name, count(*)
from mytable
group by name)
select cnt, count(*) num_names
from counts_cte
group by cnt
order by 2 desc;
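The histogram query can be tried out with a sketch like the following, using Python's sqlite3 module. The mytable rows are made up, and a second sort key (cnt) is added here only so that ties come back in a predictable order.

```python
import sqlite3

# Hypothetical data: ann and bob appear twice, cy once, dee three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("ann",), ("ann",), ("bob",), ("bob",), ("cy",), ("dee",), ("dee",), ("dee",)],
)

# First level: occurrences per name. Second level: names per occurrence count.
histogram = conn.execute(
    """
    WITH counts_cte(name, cnt) AS (
        SELECT name, COUNT(*)
        FROM mytable
        GROUP BY name
    )
    SELECT cnt, COUNT(*) AS num_names
    FROM counts_cte
    GROUP BY cnt
    ORDER BY num_names DESC, cnt
    """
).fetchall()

print(histogram)
```

Each result row reads as (number of occurrences, how many names occur that often).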
You need to use a GROUP BY clause to count how many times each name is repeated:
select name, count(*) AS Repeated
from Your_Table_Name
group by name;
If you want to show only the names that are repeated, use the query below, which returns the names that occur more than once.
select name, count(*) AS Repeated
from Your_Table_Name
group by name having count(*) > 1;

How to do this query to select N rows with highest numbers ordered by col

Let's say I have a table that has three columns: ID, Name and Users.
I want to select the 3 rows with the highest number of users and I wanted the rows to be ordered by the Name Ascending. How can I Achieve that?
I used
select Name from TABLE where ID IN (select ID from Tables order by Users desc limit 3)
But IN/ANY are not supported. Any other ways?
Thanks
When subqueries are allowed, you could use this.
It fetches the 3 records with highest value for column users. These 3 results will be ordered in the outer query.
select Name from
(
select Name
from Tables
order by Users desc
limit 3
) as temp
ORDER BY Name ASC
In MySQL:
SELECT id, name, users
FROM (SELECT id,name,users FROM tablename ORDER BY users DESC LIMIT 3) as a
ORDER BY name;
In SQL Server:
SELECT id, name, users
FROM (SELECT TOP 3 id,name,users FROM tablename ORDER BY users DESC ) as a
ORDER BY name;
In Oracle (which does not accept the AS keyword before a table alias):
SELECT id, name, users
FROM (SELECT id,name,users FROM tablename ORDER BY users DESC ) a
WHERE ROWNUM<=3
ORDER BY name;
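As a quick check of the "limit in a derived table, then re-sort outside" pattern, here is a sketch using Python's sqlite3 module, which accepts the same LIMIT syntax as the MySQL version. The tablename rows are invented for the example.

```python
import sqlite3

# Hypothetical rows: (id, name, users).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER, name TEXT, users INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "zulu", 90), (2, "alpha", 50), (3, "mike", 70), (4, "bravo", 10), (5, "echo", 30)],
)

# Inner query picks the 3 rows with the most users;
# outer query re-orders just those 3 rows by name.
top3_by_name = conn.execute(
    """
    SELECT name, users
    FROM (
        SELECT name, users
        FROM tablename
        ORDER BY users DESC
        LIMIT 3
    ) AS a
    ORDER BY name ASC
    """
).fetchall()

print(top3_by_name)
```

The two ORDER BY clauses serve different purposes: the inner one decides which rows survive the limit, the outer one decides how they are displayed.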

Referring to dynamic columns in a postgres query?

Let's say I have something like this:
select sum(points) as total_points
from sometable
where total_points > 25
group by username
I am unable to refer to total_points in the where clause because I get the following error: ERROR: column "total_points" does not exist. In this case I'd have no problem rewriting sum(points) in the where clause, but I'd like some way of doing what I have above.
Is there any way to store the result in a variable without using a stored procedure?
If I do rewrite sum(points) in the where clause, is postgres smart enough to not recalculate it?
SELECT SUM(points) AS total_points
FROM sometable
GROUP BY
username
HAVING SUM(points) > 25
PostgreSQL won't calculate the sum twice.
I believe PostgreSQL is like other brands of sql, where you need to do:
SELECT t.*
FROM (
SELECT SUM(points) AS total_points
FROM sometable
GROUP BY username
) t
WHERE total_points > 25
EDIT: Forgot to alias subquery.
You have an error in your statement:
select sum(points) as total_points
from sometable
where total_points > 25 -- <- error here
group by username
You can't limit rows by total_points, because sometable doesn't have that column. What you want is to limit the grouped result rows by total_points, computed for each group, so:
select sum(points) as total_points
from sometable
group by username
having sum(points) > 25
If you replace total_points with sum(points) in the where clause of your example, PostgreSQL rejects the query, because aggregate functions are not allowed in WHERE: rows are filtered there before any group is formed.
Edit:
Always remember the logical order of evaluation:
1. FROM with JOINs to get the tables
2. WHERE to limit rows from the tables
3. GROUP BY to group rows into related groups
4. HAVING to limit the resulting groups
5. SELECT to limit columns
6. ORDER BY to order the results
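The practical effect of filtering rows with WHERE versus filtering groups with HAVING shows up in a small experiment. This is only a sketch using Python's sqlite3 module instead of PostgreSQL, with invented sometable rows; the username column is added to the select list so the groups are easy to read.

```python
import sqlite3

# Hypothetical data: (username, points).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (username TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?)",
    [("ann", 20), ("ann", 10), ("bob", 30), ("bob", 5), ("cy", 26)],
)

# HAVING: groups are formed first, then groups whose total is <= 25 are dropped.
having_rows = conn.execute(
    """
    SELECT username, SUM(points) AS total_points
    FROM sometable
    GROUP BY username
    HAVING SUM(points) > 25
    ORDER BY username
    """
).fetchall()

# WHERE: individual rows with points <= 25 are dropped before grouping.
where_rows = conn.execute(
    """
    SELECT username, SUM(points) AS total_points
    FROM sometable
    WHERE points > 25
    GROUP BY username
    ORDER BY username
    """
).fetchall()

print(having_rows)
print(where_rows)
```

With these rows, ann passes the HAVING filter (20 + 10 = 30) but disappears from the WHERE version, because neither of her individual rows exceeds 25.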

MySQL query to return only duplicate entries with counts

I have a legacy MySQL table called lnk_lists_addresses with columns list_id and address_id. I'd like to write a query that reports all the cases where the same list_id-address_id combination appears more than once in the table with a count.
I tried this...
SELECT count(*), list_id, address_id
FROM lnk_lists_addresses
GROUP BY list_id, address_id
ORDER BY count(*) DESC
LIMIT 20
It works, sort of, because there are fewer than 20 duplicates. But how would I return only the counts greater than 1?
I tried adding "WHERE count(*) > 1" before and after GROUP BY but got errors saying the statement was invalid.
SELECT count(*), list_id, address_id
FROM lnk_lists_addresses
GROUP BY list_id, address_id
HAVING count(*)>1
ORDER BY count(*) DESC
To combine mine and Todd.Run's answers for a more "complete" answer. You want to use the HAVING clause:
http://dev.mysql.com/doc/refman/5.1/en/select.html
You want to use a "HAVING" clause. Its use is explained in the MySQL manual.
http://dev.mysql.com/doc/refman/5.1/en/select.html
SELECT count(*) AS total, list_id, address_id
FROM lnk_lists_addresses
GROUP BY list_id, address_id
HAVING total > 1
ORDER BY total DESC
LIMIT 20
If you name the COUNT() field, you can use it later in the statement (in HAVING and ORDER BY, though not in WHERE).
EDIT: forgot about HAVING (>_<)