Rank in hive from count(column) from table - sql

I want to select count(user_name) and country from a table in Hive. What query should I use to get the top 2 countries by number of user_name values?
How can I use the rank function?
id | user_name | country
1 | a | UK
2 | b | US
3 | c | AUS
4 | d | ITA
5 | e | UK
6 | f | US
the result should be:
rank| num_user_name | country
1 | 2 | US
1 | 2 | UK
2 | 1 | ITA
2 | 1 | AUS

A subquery is not necessary:
select dense_rank() over (order by count(*) desc) as rank,
country,
count(*) as num_user_name
from t
group by country
order by count(*) desc, country;

You could use the dense_rank analytic function:
with cte as (
select country,
count(user_name) as num_user_name
from tbl
group by country
), cte2 as (
select dense_rank() over (order by num_user_name desc) as ranked,
num_user_name,
country
from cte
)
select ranked,
num_user_name,
country
from cte2
where ranked <= 2
order by 1
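To check the CTE approach against the sample data from the question, here is a minimal sketch. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for Hive, so it shows the ranking logic only and not Hive-specific behavior. The variable name top_countries is illustrative.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Hive so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, user_name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "a", "UK"), (2, "b", "US"), (3, "c", "AUS"),
     (4, "d", "ITA"), (5, "e", "UK"), (6, "f", "US")],
)

query = """
WITH cte AS (
    -- one row per country with its user count
    SELECT country, COUNT(user_name) AS num_user_name
    FROM tbl
    GROUP BY country
), cte2 AS (
    -- countries with equal counts share a rank; ranks have no gaps
    SELECT DENSE_RANK() OVER (ORDER BY num_user_name DESC) AS ranked,
           num_user_name,
           country
    FROM cte
)
SELECT ranked, num_user_name, country
FROM cte2
WHERE ranked <= 2
ORDER BY ranked, country
"""
top_countries = conn.execute(query).fetchall()
for row in top_countries:
    print(row)
```

With this data all four countries are returned, because two countries tie at each of the top two ranks.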

Related

Greatest count for each customer in PostgreSQL

customer | category | count
------------+---------------+-------
4846 | Vegetables | 1
1687 | Fast-Food | 7
2654 | Drink | 2
2654 | Vegetables | 3
1597 | Vegetables | 1
4846 | Drink | 2
2654 | Fast-Food | 1
1597 | Drink | 6
1597 | Snack | 3
How can I select the category with the greatest count for each customer in this table?
This is called the mode. You can use distinct on:
select distinct on (customer) t.*
from t
order by customer, count desc;
You can use window function row_number().
select
customer,
category,
count
from
(
select
*,
row_number() over (partition by customer order by count desc) as rnk
from yourTable
) val
where rnk = 1
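The row_number() approach can be tried against the sample rows with a small sketch like the one below. It assumes an in-memory SQLite database (3.25 or later) in place of PostgreSQL, and it quotes the "count" column name to keep it from being read as the aggregate function. The name best_per_customer is illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the window function syntax is the same.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yourTable (customer INTEGER, category TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(4846, "Vegetables", 1), (1687, "Fast-Food", 7), (2654, "Drink", 2),
     (2654, "Vegetables", 3), (1597, "Vegetables", 1), (4846, "Drink", 2),
     (2654, "Fast-Food", 1), (1597, "Drink", 6), (1597, "Snack", 3)],
)

query = """
SELECT customer, category, "count"
FROM (
    SELECT customer, category, "count",
           -- number the rows of each customer, highest count first
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY "count" DESC) AS rnk
    FROM yourTable
) val
WHERE rnk = 1
ORDER BY customer
"""
best_per_customer = conn.execute(query).fetchall()
for row in best_per_customer:
    print(row)
```

If a customer had two categories tied for the highest count, row_number() would keep only one of them, chosen arbitrarily unless another sort key is added.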
Here is a simple query you can try:
SELECT c.*
FROM (SELECT customer, max(count) as max_count
FROM customers
GROUP BY customer) as max_count_table
JOIN customers as c on max_count_table.customer = c.customer and max_count_table.max_count = c.count

How to get Top 10 for a grouped column?

My data is a list of customers and products, and the cost for each product
Member Product Cost
Bob A123 $25
Bob A123 $25
Bob A123 $75
Joe A789 $50
Joe A789 $50
Bob C321 $50
Joe A123 $50
etc, etc, etc
My current query grabs each customer, product and cost, and also the total cost for that customer. It gives results like this:
Member Product Cost Total Cost
Bob A123 $125 $275
Bob A1433 $100 $275
Bob C321 $50 $275
Joe A123 $150 $250
Joe A789 $100 $250
How can I get the top 10 by Total Cost, not just the top 10 records overall? My query is:
SELECT a.Member
,a.Product
,SUM(a.Cost)
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as 'Total Cost'
FROM MyTable a
GROUP BY a.Member
,a.Product
ORDER BY [Total Cost] DESC
If I do a SELECT TOP 10 it only gives me the first 10 rows. The actual Top 10 would end up being more like 40 or 50 rows.
Thanks!
Try this one.
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
WHERE stbl.rn <= 10
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Online Demo: http://sqlfiddle.com/#!18/87857/1/0
Table structure & Sample Data
CREATE TABLE mytable
(
member NVARCHAR(50),
product NVARCHAR(10),
cost INT
)
INSERT INTO mytable
VALUES ('Bob','A123','25'),
('Bob','A123','25'),
('Bob','A123','75'),
('Joe','A789','50'),
('Joe','A789','50'),
('Bob','C321','50'),
('Joe','A123','50'),
('Rock','A123','50'),
('Anord','A100','50'),
('Jack','A123','50'),
('Anord','A123','50'),
('Joe','A123','50'),
('Karma','A123','50'),
('Seetha','A123','50'),
('Aruna','A123','50'),
('Jake','A123','50'),
('Paul','A123','50'),
('Logan','A123','50'),
('Joe','A123','50');
Subquery - Total cost per customer
SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member
Subquery: Output
+---------+------------+----+
| member | totalcost | rn |
+---------+------------+----+
| Joe | 250 | 1 |
| Bob | 175 | 2 |
| Anord | 100 | 3 |
| Aruna | 50 | 4 |
| Jack | 50 | 5 |
| Jake | 50 | 6 |
| Karma | 50 | 7 |
| Logan | 50 | 8 |
| Paul | 50 | 9 |
| Rock | 50 | 10 |
| Seetha | 50 | 11 |
+---------+------------+----+
Record Count: 11
Main Query
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost,
Max(stbl.rn) AS rn
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Main Query: Output
+---------+----------+-------+------------+----+
| member | product | cost | totalcost | rn |
+---------+----------+-------+------------+----+
| Joe | A123 | 150 | 250 | 1 |
| Joe | A789 | 100 | 250 | 1 |
| Bob | C321 | 50 | 175 | 2 |
| Bob | A123 | 125 | 175 | 2 |
| Anord | A100 | 50 | 100 | 3 |
| Anord | A123 | 50 | 100 | 3 |
| Aruna | A123 | 50 | 50 | 4 |
| Jack | A123 | 50 | 50 | 5 |
| Jake | A123 | 50 | 50 | 6 |
| Karma | A123 | 50 | 50 | 7 |
| Logan | A123 | 50 | 50 | 8 |
| Paul | A123 | 50 | 50 | 9 |
| Rock | A123 | 50 | 50 | 10 |
| Seetha | A123 | 50 | 50 | 11 |
+---------+----------+-------+------------+----+
Record Count: 14
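The same idea (rank members by total, then keep every product row of the top-ranked members) can be sketched in a runnable form. This version assumes in-memory SQLite (3.25 or later) instead of SQL Server, uses the seven rows from the question plus one hypothetical member ("Ann") so that the filter has something to exclude, and keeps the top 2 rather than the top 10 to stay short. The CTE names totals and ranked are illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; only portable SQL is used.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (member TEXT, product TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("Bob", "A123", 25), ("Bob", "A123", 25), ("Bob", "A123", 75),
     ("Joe", "A789", 50), ("Joe", "A789", 50), ("Bob", "C321", 50),
     ("Joe", "A123", 50),
     ("Ann", "A100", 20)],  # hypothetical extra member, not in the question
)

query = """
WITH totals AS (
    SELECT member, SUM(cost) AS totalcost
    FROM mytable
    GROUP BY member
), ranked AS (
    -- member is a tie-breaker so the numbering is deterministic
    SELECT member, totalcost,
           ROW_NUMBER() OVER (ORDER BY totalcost DESC, member) AS rn
    FROM totals
)
SELECT t.member, t.product, SUM(t.cost) AS product_cost, r.totalcost
FROM mytable t
JOIN ranked r ON r.member = t.member
WHERE r.rn <= ?
GROUP BY t.member, t.product, r.totalcost, r.rn
ORDER BY r.rn, product_cost DESC
"""
top_members = conn.execute(query, (2,)).fetchall()
for row in top_members:
    print(row)
```

The limit applies to members, not rows, so each of the two top members contributes all of its product rows.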
You can compute the totals in a CTE and rank them with a window function such as dense_rank():
with temp as (
SELECT a.Member
,a.Product
,SUM(a.Cost) as Cost
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member)
as 'Total Cost'
FROM MyTable a
GROUP BY a.Member,a.Product
)
select r.*
from (select t.*, dense_rank() over (order by [Total Cost] desc) as rnk
from temp t) r
where r.rnk <= 10
order by r.rnk
You can use dense_rank() with apply :
select mt.*
from (select mt.*, sum(mt.Cost) over (partition by Product, Member) as ProductCost,
dense_rank() over (order by TotalCost desc) as seq
from MyTable mt cross apply
(select sum(mt1.Cost) as TotalCost
from MyTable mt1
where mt1.member = mt.member
) mt1
) mt
where mt.seq <= 10;
Use a subquery to get the TOP 10 total costs and join to your query:
SELECT
t.Member, t.Product, t.Cost, g.[Total Cost]
FROM (
SELECT Member, Product, SUM(Cost) as Cost
FROM MyTable
GROUP BY Member, Product
) t INNER JOIN (
SELECT TOP (10) Member, SUM(Cost) as [Total Cost]
FROM MyTable
GROUP BY Member
ORDER BY [Total Cost] DESC
) g on g.Member = t.Member
ORDER BY g.[Total Cost] DESC, t.Member, t.Cost DESC
Depending on your requirement you may use:
SELECT TOP (10) WITH TIES...
You don't have to select from the same table twice. Use SUM OVER to get the total per member.
Use DENSE_RANK to get the totals ranked (highest total = 1, second highest total = 2, ...).
Use TOP(10) WITH TIES to get all rows having the top ten totals.
The query:
select top(10) with ties *
from
(
select
member,
product,
sum(cost) as cost,
sum(sum(cost)) over (partition by member) as total_cost
from mytable
group by member, product
) results
order by dense_rank() over (order by total_cost desc);
If you want exactly 10 customers even when there are ties, then a slight variation on Thorsten's method will work:
select top(10) with ties t.*
from (select member, product, sum(cost) as cost,
sum(sum(cost)) over (partition by member) as total_cost
from t
group by member, product
) t
order by dense_rank() over (order by total_cost desc, member);
The addition of member as a second key may seem like a minor addition. However, it ensures that the dense_rank() is unique for each member (of course ordered by total_cost). This, in turn, guarantees that you get exactly 10 customers.
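The effect of adding member as a second sort key can be seen in a small sketch. It assumes in-memory SQLite (3.25 or later) and a hypothetical pre-aggregated table named totals with three made-up members, two of whom tie on total_cost.

```python
import sqlite3

# Assumption: hypothetical pre-aggregated totals; Ann and Bob tie at 100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totals (member TEXT, total_cost INTEGER)")
conn.executemany(
    "INSERT INTO totals VALUES (?, ?)",
    [("Ann", 100), ("Bob", 100), ("Cat", 50)],
)

query = """
SELECT member, total_cost,
       -- ties share a rank when ordering by the total alone
       DENSE_RANK() OVER (ORDER BY total_cost DESC) AS tied_rank,
       -- member as a second key gives every member a distinct rank
       DENSE_RANK() OVER (ORDER BY total_cost DESC, member) AS unique_rank
FROM totals
ORDER BY unique_rank
"""
ranks = conn.execute(query).fetchall()
for row in ranks:
    print(row)
```

Filtering on tied_rank <= 1 would keep both Ann and Bob, while filtering on unique_rank <= 1 keeps exactly one member.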
You can use dense_rank() as shown below. This worked in SQL Server 2016. Change the value of the @limit variable to control how many ranks are returned.
declare @limit int = 10;
SELECT *
FROM
(
select x.*,rn = dense_rank() over (order by x.TotalCost desc)
from (
SELECT a.Member
,a.Product
,SUM(a.Cost) as Cost
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as TotalCost
FROM MyTable a
GROUP BY a.Member
,a.Product
) x
) y
where rn <= @limit
order by rn

Postgresql - Return (N) rows for each ID

I have a table like this
contact_id | phone_number
1 | 55551002
1 | 55551003
1 | 55551000
2 | 55552001
2 | 55552008
2 | 55552003
2 | 55552007
3 | 55553001
3 | 55553002
3 | 55553009
3 | 55553004
4 | 55554000
I want to return only 3 numbers of each contact_id, order by phone_number, like this:
contact_id | phone_number
1 | 55551000
1 | 55551002
1 | 55551003
2 | 55552001
2 | 55552003
2 | 55552007
3 | 55553001
3 | 55553002
3 | 55553004
4 | 55554000
It needs to be an optimized query, please.
My Query
SELECT a.cod_cliente, count(a.telefone) as qtd
FROM crm.contatos a
LEFT JOIN (
SELECT *
FROM crm.contatos b
LIMIT 3
) AS sub_contatos ON sub_contatos.cod_contato = a.cod_cliente
group by a.cod_cliente;
This type of query can easily be solved using window functions:
select contact_id, phone_number
from (
select contact_id, phone_number,
row_Number() over (partition by contact_id order by phone_number) as rn
from crm.contatos
) t
where rn <= 3
order by contact_id, phone_number;
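As a quick check of the window-function query against the sample data, here is a minimal sketch. It assumes in-memory SQLite (3.25 or later) in place of PostgreSQL and uses a plain table named contatos, since SQLite has no crm schema. The name first_three is illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the table is unqualified here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contatos (contact_id INTEGER, phone_number INTEGER)")
conn.executemany(
    "INSERT INTO contatos VALUES (?, ?)",
    [(1, 55551002), (1, 55551003), (1, 55551000),
     (2, 55552001), (2, 55552008), (2, 55552003), (2, 55552007),
     (3, 55553001), (3, 55553002), (3, 55553009), (3, 55553004),
     (4, 55554000)],
)

query = """
SELECT contact_id, phone_number
FROM (
    SELECT contact_id, phone_number,
           -- restart the numbering for each contact, lowest number first
           ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY phone_number) AS rn
    FROM contatos
) t
WHERE rn <= 3
ORDER BY contact_id, phone_number
"""
first_three = conn.execute(query).fetchall()
for row in first_three:
    print(row)
```

Contacts with fewer than three numbers, such as contact 4, simply return the rows they have.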

SQL subquery to return rank 2

I have a question about writing a sub-query in Microsoft T-SQL. From the original table I need to return the name of the person with the second most pets. I am able to write a query that returns the number of pets per person, but I'm not sure how to write a subquery to return rank #2.
Original table:
+-------+--------+
| Name  | Pet    |
+-------+--------+
| Kathy | dog    |
| Kathy | cat    |
| Nick  | gerbil |
| Bob   | turtle |
| Bob   | cat    |
| Bob   | snake  |
+-------+--------+
I have the following query:
SELECT Name, COUNT(Pet) AS NumPets
FROM PetTable
GROUP BY Name
ORDER BY NumPets DESC
Which returns:
+-------+---------+
| Name  | NumPets |
+-------+---------+
| Bob   | 3       |
| Kathy | 2       |
| Nick  | 1       |
+-------+---------+
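Since you are using T-SQL, you can use a windowed count. Note that this filters on a count of exactly 2, which matches the sample data, rather than on the second-highest count: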
WITH C AS (
SELECT COUNT(Pet) OVER (PARTITION BY Name) cnt
,Name
FROM PetTable
)
SELECT TOP 1 Name, cnt AS NumPets
FROM C
WHERE cnt = 2
The ANSI standard method is:
OFFSET 1 ROW FETCH FIRST 1 ROW ONLY
However, most databases have their own syntax for this, using limit, top or rownum. You don't specify the database, so I'm sticking with the standard.
This is how you could use ROW_NUMBER to get the result.
SELECT *
FROM(
SELECT ROW_NUMBER() OVER (ORDER BY COUNT(name) DESC) as RN, Name, COUNT(NAME) AS COUNT
FROM PetTable
GROUP BY Name
) T
WHERE T.RN = 2
In MSSQL you can do this:
SELECT PetCounts.Name, PetCounts.NumPets FROM (
SELECT
RANK() OVER (ORDER BY COUNT(Pet) DESC) AS rank,
Name, COUNT(Pet)as NumPets
FROM PetTable
GROUP BY Name
) AS PetCounts
WHERE rank = 2
This will return multiple rows if they have the same rank. If you want to return just one row you can replace RANK() with ROW_NUMBER()
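The difference between RANK() and ROW_NUMBER() only shows up when counts tie, so the sketch below adds one hypothetical person ("Dana", with two pets) to the sample data. It assumes in-memory SQLite (3.25 or later) rather than SQL Server, and the CTE names counts and ranked are illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; Dana's rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PetTable (Name TEXT, Pet TEXT)")
conn.executemany(
    "INSERT INTO PetTable VALUES (?, ?)",
    [("Kathy", "dog"), ("Kathy", "cat"), ("Nick", "gerbil"),
     ("Bob", "turtle"), ("Bob", "cat"), ("Bob", "snake"),
     ("Dana", "fish"), ("Dana", "bird")],
)

query = """
WITH counts AS (
    SELECT Name, COUNT(Pet) AS NumPets
    FROM PetTable
    GROUP BY Name
), ranked AS (
    SELECT Name, NumPets,
           RANK() OVER (ORDER BY NumPets DESC) AS rnk,
           -- Name breaks ties so the row numbering is deterministic
           ROW_NUMBER() OVER (ORDER BY NumPets DESC, Name) AS rn
    FROM counts
)
SELECT Name, NumPets, rnk, rn
FROM ranked
ORDER BY rn
"""
rows = conn.execute(query).fetchall()

second_by_rank = [name for name, _, rnk, _ in rows if rnk == 2]
second_by_row_number = [name for name, _, _, rn in rows if rn == 2]
print(second_by_rank)
print(second_by_row_number)
```

Filtering on rnk = 2 returns both people tied for second place, while rn = 2 returns only one of them.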

Sql two table query most duplicated foreign key

I have these two tables, sport and student:
First table sport:
|idsport | name |
_______________________
| 1 | bobsled |
| 2 | skating |
| 3 | boarding |
| 4 | iceskating |
| 5 | skiing |
Second table student (sport_idsport is the foreign key):
|idstudent | name | sport_idsport |
__________________________________________
| 1 | john | 3 |
| 2 | pauly | 2 |
| 3 | max | 1 |
| 4 | jane | 2 |
| 5 | nico | 5 |
So far I have this query, which outputs the id that occurs most often, but I can't get it to work with two tables:
SELECT sport_idsport
FROM (SELECT sport_idsport FROM student GROUP BY sport_idsport ORDER BY COUNT(*) desc)
WHERE ROWNUM<=1;
I need to output the name of the most popular sport; in this case it would be skating.
I use Oracle SQL.
with counter as (
Select sport_idsport,
count(*) as cnt,
dense_rank() over (order by count(*) desc) as rn
from student
group by sport_idsport
)
select s.*, c.cnt
from sport s
join counter c on c.sport_idsport = s.idsport and c.rn = 1;
SQLFiddle example: http://sqlfiddle.com/#!4/b76e21/1
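For a version that runs without an Oracle instance, the following sketch applies the same dense_rank() and join idea to the sample tables. It assumes in-memory SQLite (3.25 or later), splits the count and the ranking into two CTEs for clarity, and uses the illustrative names ranked and most_popular.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the ranking logic is portable SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sport (idsport INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE student (idstudent INTEGER PRIMARY KEY, name TEXT, "
    "sport_idsport INTEGER REFERENCES sport(idsport))"
)
conn.executemany(
    "INSERT INTO sport VALUES (?, ?)",
    [(1, "bobsled"), (2, "skating"), (3, "boarding"),
     (4, "iceskating"), (5, "skiing")],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "john", 3), (2, "pauly", 2), (3, "max", 1),
     (4, "jane", 2), (5, "nico", 5)],
)

query = """
WITH counter AS (
    SELECT sport_idsport, COUNT(*) AS cnt
    FROM student
    GROUP BY sport_idsport
), ranked AS (
    -- rank 1 goes to the most frequent sport id (or ids, on a tie)
    SELECT sport_idsport, cnt,
           DENSE_RANK() OVER (ORDER BY cnt DESC) AS rn
    FROM counter
)
SELECT s.name, r.cnt
FROM sport s
JOIN ranked r ON r.sport_idsport = s.idsport
WHERE r.rn = 1
"""
most_popular = conn.execute(query).fetchall()
print(most_popular)
```

Because dense_rank() is used, two sports tied for first place would both be returned.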
select cnt, sport_idsport from (
select count(*) cnt, sport_idsport
from student
group by sport_idsport
order by count(*) desc
)
where rownum = 1