Greatest count for each customer in PostgreSQL - sql

customer | category | count
------------+---------------+-------
4846 | Vegetables | 1
1687 | Fast-Food | 7
2654 | Drink | 2
2654 | Vegetables | 3
1597 | Vegetables | 1
4846 | Drink | 2
2654 | Fast-Food | 1
1597 | Drink | 6
1597 | Snack | 3
How can I select the category that has the greatest count for each customer in this table?

This is called the mode. You can use distinct on:
select distinct on (customer) t.*
from t
order by customer, count desc;

You can use window function row_number().
select
customer,
category,
count
from
(
select
*,
row_number() over (partition by customer order by count desc) as rnk
from yourTable
) val
where rnk = 1
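To check the row_number() approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here (an assumption; it needs SQLite 3.25 or later for window functions), and the table name t is taken from the first answer.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (customer INTEGER, category TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (4846, "Vegetables", 1), (1687, "Fast-Food", 7), (2654, "Drink", 2),
        (2654, "Vegetables", 3), (1597, "Vegetables", 1), (4846, "Drink", 2),
        (2654, "Fast-Food", 1), (1597, "Drink", 6), (1597, "Snack", 3),
    ],
)

# Number each customer's rows from highest to lowest count, then keep row 1.
top_rows = conn.execute(
    """
    SELECT customer, category, "count"
    FROM (
        SELECT t.*,
               row_number() OVER (PARTITION BY customer ORDER BY "count" DESC) AS rn
        FROM t
    ) ranked
    WHERE rn = 1
    ORDER BY customer
    """
).fetchall()

for row in top_rows:
    print(row)
```

If two categories tie for a customer's highest count, row_number() picks one of them arbitrarily; rank() would keep both.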

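A simple query to try (note that it returns every tied row when a customer has several categories sharing the maximum count):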
SELECT c.*
FROM (SELECT customer, max(count) as max_count
FROM customers
GROUP BY customer) as max_count_table
JOIN customers as c on max_count_table.customer = c.customer and max_count_table.max_count = c.count

Related

How to get Top 10 for a grouped column?

My data is a list of customers and products, and the cost for each product
Member Product Cost
Bob A123 $25
Bob A123 $25
Bob A123 $75
Joe A789 $50
Joe A789 $50
Bob C321 $50
Joe A123 $50
etc, etc, etc
My current query grabs each customer, product and cost, and also the total cost for that customer. It gives results like this:
Member Product Cost Total Cost
Bob A123 $125 $275
Bob A1433 $100 $275
Bob C321 $50 $275
Joe A123 $150 $250
Joe A789 $100 $250
How can I get the top 10 by Total Cost, not just the top 10 records overall? My query is:
SELECT a.Member
,a.Product
,SUM(a.Cost)
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as 'Total Cost'
FROM MyTable a
GROUP BY a.Member
,a.Product
ORDER BY [Total Cost] DESC
If I do a SELECT TOP 10 it only gives me the first 10 rows. The actual Top 10 would end up being more like 40 or 50 rows.
Thanks!
Try this one.
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
WHERE stbl.rn <= 10
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Online Demo: http://sqlfiddle.com/#!18/87857/1/0
Table structure & Sample Data
CREATE TABLE mytable
(
member NVARCHAR(50),
product NVARCHAR(10),
cost INT
)
INSERT INTO mytable
VALUES ('Bob','A123','25'),
('Bob','A123','25'),
('Bob','A123','75'),
('Joe','A789','50'),
('Joe','A789','50'),
('Bob','C321','50'),
('Joe','A123','50'),
('Rock','A123','50'),
('Anord','A100','50'),
('Jack','A123','50'),
('Anord','A123','50'),
('Joe','A123','50'),
('Karma','A123','50'),
('Seetha','A123','50'),
('Aruna','A123','50'),
('Jake','A123','50'),
('Paul','A123','50'),
('Logan','A123','50'),
('Joe','A123','50');
Subquery - Total cost per customer
SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member
Subquery: Output
+---------+------------+----+
| member | totalcost | rn |
+---------+------------+----+
| Joe | 250 | 1 |
| Bob | 175 | 2 |
| Anord | 100 | 3 |
| Aruna | 50 | 4 |
| Jack | 50 | 5 |
| Jake | 50 | 6 |
| Karma | 50 | 7 |
| Logan | 50 | 8 |
| Paul | 50 | 9 |
| Rock | 50 | 10 |
| Seetha | 50 | 11 |
+---------+------------+----+
Record Count: 11
Main Query
SELECT tbl.member,
tbl.product,
Sum(tbl.cost) AS cost,
Max(stbl.totalcost) AS totalcost,
Max(stbl.rn) AS rn
FROM mytable tbl
INNER JOIN (SELECT member,
Sum(cost) AS totalcost,
Row_number() OVER (ORDER BY Sum(cost) DESC) AS rn
FROM mytable
GROUP BY member) stbl
ON stbl.member = tbl.member
GROUP BY tbl.member, tbl.product
ORDER BY Max(stbl.rn)
Main Query: Output
+---------+----------+-------+------------+----+
| member | product | cost | totalcost | rn |
+---------+----------+-------+------------+----+
| Joe | A123 | 150 | 250 | 1 |
| Joe | A789 | 100 | 250 | 1 |
| Bob | C321 | 50 | 175 | 2 |
| Bob | A123 | 125 | 175 | 2 |
| Anord | A100 | 50 | 100 | 3 |
| Anord | A123 | 50 | 100 | 3 |
| Aruna | A123 | 50 | 50 | 4 |
| Jack | A123 | 50 | 50 | 5 |
| Jake | A123 | 50 | 50 | 6 |
| Karma | A123 | 50 | 50 | 7 |
| Logan | A123 | 50 | 50 | 8 |
| Paul | A123 | 50 | 50 | 9 |
| Rock | A123 | 50 | 50 | 10 |
| Seetha | A123 | 50 | 50 | 11 |
+---------+----------+-------+------------+----+
Record Count: 14
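The same idea (rank members by total cost, then join back to the detail rows) can be tried on a subset of the sample data. This is a minimal sketch using Python's sqlite3 with SQLite 3.25+ standing in for SQL Server (an assumption). The CTE names totals and ranked and the variable top_n are made up for the illustration, and member is added as a tiebreaker so the ordering is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (member TEXT, product TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("Bob", "A123", 25), ("Bob", "A123", 25), ("Bob", "A123", 75),
        ("Joe", "A789", 50), ("Joe", "A789", 50), ("Bob", "C321", 50),
        ("Joe", "A123", 50), ("Rock", "A123", 50), ("Anord", "A100", 50),
        ("Anord", "A123", 50), ("Joe", "A123", 50), ("Joe", "A123", 50),
    ],
)

top_n = 2  # hypothetical limit; the thread uses 10

rows = conn.execute(
    """
    WITH totals AS (              -- one row per member with the overall total
        SELECT member, SUM(cost) AS totalcost
        FROM mytable
        GROUP BY member
    ),
    ranked AS (                   -- number members from highest total down
        SELECT member, totalcost,
               ROW_NUMBER() OVER (ORDER BY totalcost DESC, member) AS rn
        FROM totals
    )
    SELECT t.member, t.product, SUM(t.cost) AS cost, r.totalcost
    FROM mytable t
    JOIN ranked r ON r.member = t.member
    WHERE r.rn <= ?
    GROUP BY t.member, t.product, r.totalcost, r.rn
    ORDER BY r.rn, t.product
    """,
    (top_n,),
).fetchall()

for row in rows:
    print(row)
```

The filter applies to members, not rows, so every product row of the top members is returned.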
You can use a ranking window function such as dense_rank() over the total cost, then filter on the rank:
with temp as (
SELECT a.Member
,a.Product
,SUM(a.Cost) as Cost
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member)
as [Total Cost]
FROM MyTable a
GROUP BY a.Member,a.Product
), ranked as (
select t.*, dense_rank() over (order by [Total Cost] desc) as rnk
from temp t
)
select *
from ranked
where rnk <= 10
order by rnk
You can use dense_rank() with apply:
select mt.*
from (select mt.*, sum(mt.Cost) over (partition by Product, Member) as ProductCost,
dense_rank() over (order by TotalCost desc) as seq
from MyTable mt cross apply
(select sum(mt1.Cost) as TotalCost
from MyTable mt1
where mt1.member = mt.member
) mt1
) mt
where mt.seq <= 10;
Use a subquery to get the TOP 10 total costs and join to your query:
SELECT
t.Member, t.Product, t.Cost, g.[Total Cost]
FROM (
SELECT Member, Product, SUM(Cost) as Cost
FROM MyTable
GROUP BY Member, Product
) t INNER JOIN (
SELECT TOP (10) Member, SUM(Cost) as [Total Cost]
FROM MyTable
GROUP BY Member
ORDER BY [Total Cost] DESC
) g on g.Member = t.Member
ORDER BY g.[Total Cost] DESC, t.Member, t.Cost DESC
Depending on your requirement you may use:
SELECT TOP (10) WITH TIES...
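You don't have to select from the same table twice. Use SUM OVER to get the total per member.
Use DENSE_RANK to get the totals ranked (highest total = 1, second highest total = 2, ...).
Filter on that rank to get all rows having the top ten totals. (TOP(10) WITH TIES on its own returns the first ten rows plus any rows tied with the tenth, so it can stop after fewer than ten totals when members have several products.)
The query:
select member, product, cost, total_cost
from
(
select
results.*,
dense_rank() over (order by total_cost desc) as rnk
from
(
select
member,
product,
sum(cost) as cost,
sum(sum(cost)) over (partition by member) as total_cost
from mytable
group by member, product
) results
) ranked
where rnk <= 10
order by rnk, member, cost desc;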
If you want exactly 10 customers even when there are ties, then a slight variation on Thorsten's dense_rank() method will work:
select r.member, r.product, r.cost, r.total_cost
from (select m.*,
dense_rank() over (order by total_cost desc, member) as seq
from (select member, product, sum(cost) as cost,
sum(sum(cost)) over (partition by member) as total_cost
from t
group by member, product
) m
) r
where r.seq <= 10
order by r.seq;
Adding member as a second key to the dense_rank() ordering may seem like a minor change. However, it ensures that the dense_rank() is unique for each member (still ordered by total_cost first). This, in turn, guarantees that you get exactly 10 customers, provided the table has at least that many.
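The effect of adding member as a tiebreaker can be seen on a handful of per-member totals. This is a minimal sketch using Python's sqlite3 with SQLite 3.25+ standing in for SQL Server (an assumption). The totals table and the column names shared_rank and unique_rank are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated totals; Aruna and Jack tie at 50.
conn.execute("CREATE TABLE totals (member TEXT, total_cost INTEGER)")
conn.executemany(
    "INSERT INTO totals VALUES (?, ?)",
    [("Joe", 250), ("Bob", 175), ("Anord", 100), ("Aruna", 50), ("Jack", 50)],
)

ranks = conn.execute(
    """
    SELECT member,
           total_cost,
           -- tied totals share a rank
           dense_rank() OVER (ORDER BY total_cost DESC) AS shared_rank,
           -- member breaks the tie, so every member gets its own rank
           dense_rank() OVER (ORDER BY total_cost DESC, member) AS unique_rank
    FROM totals
    ORDER BY unique_rank
    """
).fetchall()

for row in ranks:
    print(row)
```

Filtering on shared_rank keeps every member tied on a qualifying total, while filtering on unique_rank caps the number of members.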
You can use dense_rank() like below. Worked in SQL Server 2016. Change the value of the @limit variable to control how many ranks are returned.
declare @limit int = 10;
SELECT *
FROM
(
select x.*, rn = dense_rank() over (order by x.TotalCost desc)
from (
SELECT a.Member
,a.Product
,SUM(a.Cost) as Cost
,(SELECT SUM(b.Cost) from MyTable b WHERE b.Member = a.Member) as TotalCost
FROM MyTable a
GROUP BY a.Member
,a.Product
) x
) y
where rn <= @limit
order by rn

Getting max and latest rows in SQL

I have a table containing Orders, where in the same day multiple orders can be created for a given Name. I need to return the latest Order for a given date and name, and if there are multiple orders on that day for a name, return the one with the largest order value.
Sample data:
ID | NAME | OrderDate | OrderValue
----+------+--------------+--------------
1 | A | 2019-01-15 | 100
2 | B | 2019-01-15 | 200
3 | A | 2019-01-15 | 150
4 | C | 2019-01-17 | 450
5 | D | 2019-01-18 | 300
6 | C | 2019-01-17 | 500
Result returned should be:
ID | NAME | OrderDate | OrderValue
----+------+--------------+--------------
2 | B | 2019-01-15 | 200
3 | A | 2019-01-15 | 150
5 | D | 2019-01-18 | 300
6 | C | 2019-01-17 | 500
I can do this in multiple SQL queries, but is there a simple query to achieve the above result?
Starting with SQL Server 2005, just use ROW_NUMBER():
SELECT ID, Name, OrderDate, OrderValue
FROM (
SELECT
o.*,
ROW_NUMBER() OVER(PARTITION BY Name, OrderDate ORDER BY OrderValue DESC) rn
FROM orders o
) x WHERE rn = 1
ROW_NUMBER() assigns a rank to each record within groups of records having the same Name and OrderDate, sorted by OrderValue. The record with the highest order value gets row number 1.
With older versions, a solution to filter the table is to use a correlated subquery with a NOT EXISTS condition:
SELECT ID, Name, OrderDate, OrderValue
FROM orders o
WHERE NOT EXISTS (
SELECT 1
FROM orders o1
WHERE
o1.Name = o.Name
AND o1.OrderDate = o.OrderDate
AND o1.OrderValue > o.OrderValue
)
The NOT EXISTS condition ensures that there is no other record with a higher OrderValue for the same Name and OrderDate.
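Both variants can be compared on the sample data from the question. This is a minimal sketch using Python's sqlite3 with SQLite 3.25+ standing in for SQL Server (an assumption); dates are stored as text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, name TEXT, orderdate TEXT, ordervalue INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2019-01-15", 100), (2, "B", "2019-01-15", 200),
        (3, "A", "2019-01-15", 150), (4, "C", "2019-01-17", 450),
        (5, "D", "2019-01-18", 300), (6, "C", "2019-01-17", 500),
    ],
)

# Variant 1: number rows per (name, orderdate) by descending value, keep row 1.
by_row_number = conn.execute(
    """
    SELECT id FROM (
        SELECT o.*,
               ROW_NUMBER() OVER (PARTITION BY name, orderdate
                                  ORDER BY ordervalue DESC) AS rn
        FROM orders o
    ) x
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

# Variant 2: keep rows for which no larger value exists in the same group.
by_not_exists = conn.execute(
    """
    SELECT id FROM orders o
    WHERE NOT EXISTS (
        SELECT 1 FROM orders o1
        WHERE o1.name = o.name
          AND o1.orderdate = o.orderdate
          AND o1.ordervalue > o.ordervalue
    )
    ORDER BY id
    """
).fetchall()

print(by_row_number, by_not_exists)
```

The two differ only when a group has tied maximum values: ROW_NUMBER() keeps one row, while NOT EXISTS keeps all tied rows.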
Use cross apply:
select o.id, name, orderdate, o.ordervalue
from orders o
cross apply (select top 1 id, ordervalue from orders where name=o.name and orderdate=o.OrderDate order by ordervalue desc) oo
where o.id=oo.id
order by o.id

Rank in hive from count(column) from table

I want to select count(user_name) and country from a table in Hive. What command should I use to get the top 2 ranks of countries by number of user_name values?
How can I use the rank function?
id | user_name | country
1 | a | UK
2 | b | US
3 | c | AUS
4 | d | ITA
5 | e | UK
6 | f | US
the result should be:
rank| num_user_name | country
1 | 2 | US
1 | 2 | UK
2 | 1 | ITA
2 | 1 | AUS
A subquery is not necessary:
select dense_rank() over (order by count(*) desc) as rank,
country,
count(*) as num_user_name
from t
group by country
order by count(*) desc, country;
You could use the dense_rank analytic function:
with cte as (
select country,
count(user_name) as num_user_name
from tbl
group by country
), cte2 as (
select dense_rank() over (order by num_user_name desc) as ranked,
num_user_name,
country
from cte
)
select ranked,
num_user_name,
country
from cte2
where ranked <= 2
order by 1
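The CTE version can be checked against the sample rows. This is a minimal sketch using Python's sqlite3 with SQLite 3.25+ standing in for Hive (an assumption); the table name tbl follows the answer above, and country is added to the final ordering only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, user_name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "a", "UK"), (2, "b", "US"), (3, "c", "AUS"),
        (4, "d", "ITA"), (5, "e", "UK"), (6, "f", "US"),
    ],
)

result = conn.execute(
    """
    WITH cte AS (                 -- users per country
        SELECT country, COUNT(user_name) AS num_user_name
        FROM tbl
        GROUP BY country
    ),
    cte2 AS (                     -- countries with equal counts share a rank
        SELECT DENSE_RANK() OVER (ORDER BY num_user_name DESC) AS ranked,
               num_user_name,
               country
        FROM cte
    )
    SELECT ranked, num_user_name, country
    FROM cte2
    WHERE ranked <= 2
    ORDER BY ranked, country
    """
).fetchall()

for row in result:
    print(row)
```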

Order By 2 columns

I have this table:
id | item_id | created_at
-----+-----------+------------
1 | Apple | 2017-03-21
2 | Grape | 2017-03-23
3 | Grape | 2017-03-24
4 | Apple | 2017-03-25
I want to order by created_at and also at the same time order by item_id like this:
id | item_id | created_at
-----+-----------+------------
4 | Apple | 2017-03-25
1 | Apple | 2017-03-21
3 | Grape | 2017-03-24
2 | Grape | 2017-03-23
So if I add a new row for item_id: Grape, my new results should be like this:
id | item_id | created_at
-----+-----------+------------
5 | Grape | 2017-03-28 (NEW)
3 | Grape | 2017-03-24
2 | Grape | 2017-03-23
4 | Apple | 2017-03-25
1 | Apple | 2017-03-21
And then if I add a new row for item_id: Apple, it should be like this:
id | item_id | created_at
-----+-----------+------------
6 | Apple | 2017-03-28 (NEW)
4 | Apple | 2017-03-25
1 | Apple | 2017-03-21
5 | Grape | 2017-03-27
3 | Grape | 2017-03-24
2 | Grape | 2017-03-23
...So it orders by the latest created_at and shows the other rows with the same item_id below it.
I have tried ORDER BY created_at, item_id DESC but it does not work and gives me this instead:
id | item_id | created_at
-----+-----------+------------
6 | Apple | 2017-03-28
5 | Grape | 2017-03-27
4 | Apple | 2017-03-25
3 | Grape | 2017-03-24
2 | Grape | 2017-03-23
1 | Apple | 2017-03-21
SQLFiddle: PostgreSQL
WITH grouped AS (
select item_id, max(created_at) as max_dt
from tbl
group by item_id
)
SELECT tbl.*
FROM grouped
LEFT JOIN tbl USING (item_id)
ORDER BY grouped.max_dt desc,
grouped.item_id, -- important if apple and grape both have same max dt
tbl.created_at desc;
subquery to get the most recent dates by item (the item grouping)
left join those results on the actual table to get desired records
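The two steps above can be tried on the six-row version of the data from the question. This is a minimal sketch using Python's sqlite3, with SQLite standing in for PostgreSQL (an assumption); the join is written with ON instead of USING, which is equivalent here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, item_id TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "Apple", "2017-03-21"), (2, "Grape", "2017-03-23"),
        (3, "Grape", "2017-03-24"), (4, "Apple", "2017-03-25"),
        (5, "Grape", "2017-03-27"), (6, "Apple", "2017-03-28"),
    ],
)

ordered_ids = [
    row[0]
    for row in conn.execute(
        """
        WITH grouped AS (             -- most recent date per item
            SELECT item_id, MAX(created_at) AS max_dt
            FROM tbl
            GROUP BY item_id
        )
        SELECT tbl.id
        FROM grouped
        LEFT JOIN tbl ON tbl.item_id = grouped.item_id
        ORDER BY grouped.max_dt DESC,  -- item with the newest row first
                 grouped.item_id,      -- tiebreaker between items
                 tbl.created_at DESC   -- newest first within an item
        """
    )
]

print(ordered_ids)
```

ISO-formatted date strings sort correctly as text, which is why a TEXT column is enough for this sketch.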
You first need to implement your first rule, which is to get the max created_at per item_id; hence the subquery grouped by item_id, with a row_number() added so that it knows which fruit comes first. Then just join it with your table.
SELECT tt.id, tt.item_id, tt.created_at
FROM test_table tt
INNER JOIN
(SELECT item_id,
MAX(created_at),
ROW_NUMBER() OVER (ORDER BY MAX(created_at) DESC) rn
FROM test_table
GROUP BY item_id
ORDER BY 3) ord
ON tt.item_id = ord.item_id
ORDER BY ord.rn, tt.created_at DESC;
For mysql, try this:
select item.*
from item
left join (
select item_id, max(created_at) maxdate
from item
group by item_id
) t on item.item_id = t.item_id
order by t.maxdate desc, item.created_at desc
Demo1 in SQLFiddle
Demo2 in SQLFiddle
Here is a mysql version
set @ordr = 0;
select t.*
from tbl t
join (
select o.item_id, o.created_at, @ordr := @ordr + 1 as `order`
from (
select tbl.item_id, max(tbl.created_at) as created_at
from tbl
group by tbl.item_id
order by created_at desc
) as o
) sub
on t.item_id = sub.item_id
order by sub.`order`, t.created_at desc
Please try:
SELECT id,
item_ID,
created_at
FROM tblOrders
ORDER BY item_ID = ( SELECT item_ID
FROM tblOrders
WHERE id = ( SELECT MAX( id ) FROM tblOrders ) ) DESC,
item_ID,
created_at DESC;

Inconsistent Transpose

Given a table A with the following data:
+----------+-------+
| Supplier | buyer |
+----------+-------+
| A | 1 |
| A | 2 |
| B | 3 |
| B | 4 |
| B | 5 |
+----------+-------+
My question is, can I transpose the second column so the resultant table will be like:
+----------+--------+--------+--------+
| Supplier | buyer1 | buyer2 | buyer3 |
+----------+--------+--------+--------+
| A | 1 | 2 | |
| B | 3 | 4 | 5 |
+----------+--------+--------+--------+
Assume the maximum number of buyers per supplier is known to be three.
You could use a common table expression to give each buyer an order within the supplier, and then just do a regular case to put them in columns:
WITH cte AS (
SELECT supplier, buyer,
ROW_NUMBER() OVER (PARTITION BY supplier ORDER BY buyer) rn
FROM Table1
)
SELECT supplier,
MAX(CASE WHEN rn=1 THEN buyer END) buyer1,
MAX(CASE WHEN rn=2 THEN buyer END) buyer2,
MAX(CASE WHEN rn=3 THEN buyer END) buyer3
FROM cte
GROUP BY supplier;
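The conditional-aggregation pivot can be run against the sample data as follows. This is a minimal sketch using Python's sqlite3 with SQLite 3.25+ standing in for the original database (an assumption); the table name Table1 follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (supplier TEXT, buyer INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("B", 5)],
)

pivoted = conn.execute(
    """
    WITH cte AS (                 -- position of each buyer within its supplier
        SELECT supplier, buyer,
               ROW_NUMBER() OVER (PARTITION BY supplier ORDER BY buyer) AS rn
        FROM Table1
    )
    SELECT supplier,
           MAX(CASE WHEN rn = 1 THEN buyer END) AS buyer1,
           MAX(CASE WHEN rn = 2 THEN buyer END) AS buyer2,
           MAX(CASE WHEN rn = 3 THEN buyer END) AS buyer3
    FROM cte
    GROUP BY supplier
    ORDER BY supplier
    """
).fetchall()

for row in pivoted:
    print(row)
```

A missing buyer shows up as NULL (None in Python), because a CASE with no matching branch yields NULL and MAX ignores NULLs.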
An SQLfiddle to test with.
You may consider using PIVOT clause:
select *
from (
select supplier, buyer, row_number() over (partition by supplier order by buyer) as seq
from a
)
pivot (max(buyer) for seq in (1 as buyer1, 2 as buyer2, 3 as buyer3));
SQLFiddle here.