Order by Counts in a Group

I have data that looks like this:
group resource count
A X 5
A Y 8
A Z 2
B E 8
B F 10
B G 2
I want to order the data so that the group with the highest sum of count comes first, with the rows inside each group sorted by count in descending order, using a SQL statement. For example:
group resource count
B F 10
B E 8
B G 2
A Y 8
A X 5
A Z 2
I would also like to avoid using multiple SELECT statements. Any help is appreciated. Thanks.

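Try using a window function to sum the count per group and order by that total. Your DBMS was not listed, so the syntax may need minor tweaks: the example below uses SQL Server syntax (bracket-quoted identifiers and the alias = expression form), while other DBMSs typically need double-quoted identifiers and an AS alias.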
SELECT [Group]
      ,[Resource]
      ,[Count]
      ,TotalCountOfGroup = SUM([Count]) OVER (PARTITION BY [Group])
FROM YourTable
-- [Group] as a tiebreaker keeps groups with equal totals from interleaving
ORDER BY TotalCountOfGroup DESC, [Group], [Count] DESC
Or, if you need to exclude the TotalCountOfGroup column from the output, you can wrap the query in a CTE:
WITH cte_YourData AS (
    SELECT [Group], [Resource], [Count]
          ,TotalCountOfGroup = SUM([Count]) OVER (PARTITION BY [Group])
    FROM YourTable
)
SELECT [Group], [Resource], [Count]
FROM cte_YourData
-- [Group] as a tiebreaker keeps groups with equal totals from interleaving
ORDER BY TotalCountOfGroup DESC, [Group], [Count] DESC
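As a rough way to try the window-function approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later), uses standard double-quoted identifiers and AS instead of the SQL Server syntax above, and loads the sample rows from the question into an in-memory table.

```python
import sqlite3

# In-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE YourTable ("Group" TEXT, "Resource" TEXT, "Count" INTEGER)')
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [("A", "X", 5), ("A", "Y", 8), ("A", "Z", 2),
     ("B", "E", 8), ("B", "F", 10), ("B", "G", 2)],
)

# Compute the per-group total with a window function, then sort by it.
# "Group" is an extra tiebreaker so groups with equal totals stay together.
query = """
WITH totals AS (
    SELECT "Group", "Resource", "Count",
           SUM("Count") OVER (PARTITION BY "Group") AS TotalCountOfGroup
    FROM YourTable
)
SELECT "Group", "Resource", "Count"
FROM totals
ORDER BY TotalCountOfGroup DESC, "Group", "Count" DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Group B (total 20) is listed before group A (total 15), and each group is sorted by count in descending order.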

Related

SQL compare one column, then another, by using max over partition by

DB: SAP HANA
I have asked a similar question before, but now I am facing a more complicated case: when several rows of a user share the highest qty, I want to return the biggest no.
Table A
user no qty
A 10 20
A 11 20
B 12 40
B 13 10
Table B
id user
1 A
2 B
Expected result
id user no
1 A 11
2 B 12
I tried:
SELECT
    B.id,
    B.user,
    C.max_qty_no
FROM
    B
    LEFT JOIN (
        SELECT
            A.user,
            CASE
                WHEN A.qty = (
                    MAX(A.qty) OVER (PARTITION BY A.user)
                ) THEN A.no
            END as max_qty_no
        FROM
            A
    ) C ON C.user = B.user AND
        C.max_qty_no IS NOT NULL;
This returns:
id user no
1 A 10
1 A 11
2 B 12

You want to rank the A rows per user and only select the best-ranked row. So far this ranking was on one column only, so you could simply compare the value with the maximum value. Now, however, the ranking must be done considering two columns instead of just one. You can use ROW_NUMBER for this ranking:
select id, user, no
from
(
    select
        b.id, b.user, a.no,
        row_number() over (partition by b.user order by a.qty desc, a.no desc) as rn
    from a
    join b on b.user = a.user
) ranked
where rn = 1;
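The ROW_NUMBER ranking can be checked against the sample data with a small sketch like the one below. It runs on SQLite through Python's sqlite3 module rather than on SAP HANA, so it is only an approximation of the original environment; the column names user and no are double-quoted as a precaution against keyword clashes, and the final ORDER BY id is added just to make the output order predictable.

```python
import sqlite3

# In-memory copies of tables A and B from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a ("user" TEXT, "no" INTEGER, qty INTEGER);
    CREATE TABLE b (id INTEGER, "user" TEXT);
    INSERT INTO a VALUES ('A', 10, 20), ('A', 11, 20), ('B', 12, 40), ('B', 13, 10);
    INSERT INTO b VALUES (1, 'A'), (2, 'B');
""")

# Rank each user's rows by qty first and no second, then keep the top row.
query = """
SELECT id, "user", "no"
FROM (
    SELECT b.id, b."user", a."no",
           ROW_NUMBER() OVER (
               PARTITION BY b."user"
               ORDER BY a.qty DESC, a."no" DESC
           ) AS rn
    FROM a
    JOIN b ON b."user" = a."user"
) AS ranked
WHERE rn = 1
ORDER BY id
"""
best_rows = conn.execute(query).fetchall()
print(best_rows)
```

For user A both rows have qty 20, so the second sort key picks no 11; for user B the row with qty 40 wins outright.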

Since you want the MAX(no) per user having the largest quantity you need to apply additional selection criteria. The partitioning takes care of selecting the rows with MAX(qty) per user but you still need to select the rows with MAX(no) for each distinct user - you can do this by using the MAX aggregate function combined with a GROUP BY. With this small change you can return the expected results:
SELECT
    B.id,
    B.user,
    MAX(C.max_qty_no)
FROM
    B
    LEFT JOIN (
        SELECT
            A.user,
            CASE
                WHEN A.qty = (
                    MAX(A.qty) OVER (PARTITION BY A.user)
                ) THEN A.no
            END as max_qty_no
        FROM
            A
    ) C ON C.user = B.user AND
        C.max_qty_no IS NOT NULL
GROUP BY B.id, B.user;

Get the mostly occured value in multiple columns of a table

I have a table which contains three columns: Work, Cost and Duration. I need to get the most
frequently occurring value in each of the three columns. If two values occur the same number of
times, the larger of the two should be returned. Please see the sample data and result below.
Work Cost Duration
5 2 6
5 8 7
6 8 7
2 2 2
6 2 6
I need to get the result as
Work Cost Duration
6 2 7
I tried the following query, but it only handles one column, and it returns the count for every value instead of just the most frequent one:
select Duration, count(*) as "DurationCount" from SimulationResult
group by Duration
order by count(*) desc,Duration desc

You can do something like this (in SQL Server each derived table needs an alias, here w, c and d):
select * from
    (select top 1 Work from SimulationResult
     group by Work
     order by count(*) desc, Work desc) w,
    (select top 1 Cost from SimulationResult
     group by Cost
     order by count(*) desc, Cost desc) c,
    (select top 1 Duration from SimulationResult
     group by Duration
     order by count(*) desc, Duration desc) d
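The same idea (group by the column, sort by frequency and then by value, keep the first row) can be tried out with the sketch below. It uses SQLite through Python's sqlite3 module, so LIMIT 1 replaces TOP 1; the helper name most_frequent is hypothetical and exists only for this illustration.

```python
import sqlite3

# In-memory copy of the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SimulationResult (Work INTEGER, Cost INTEGER, Duration INTEGER)")
conn.executemany(
    "INSERT INTO SimulationResult VALUES (?, ?, ?)",
    [(5, 2, 6), (5, 8, 7), (6, 8, 7), (2, 2, 2), (6, 2, 6)],
)

ALLOWED_COLUMNS = ("Work", "Cost", "Duration")

def most_frequent(column):
    """Return the most frequent value of a column; ties go to the larger value."""
    # Column names cannot be bound as parameters, so check against a whitelist.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    sql = (
        f"SELECT {column} FROM SimulationResult "
        f"GROUP BY {column} "
        f"ORDER BY COUNT(*) DESC, {column} DESC "
        f"LIMIT 1"
    )
    return conn.execute(sql).fetchone()[0]

result = tuple(most_frequent(col) for col in ALLOWED_COLUMNS)
print(result)
```

With the sample data this yields 6 for Work (5 and 6 both occur twice, so the larger wins), 2 for Cost and 7 for Duration.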

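Try the following (here #tab stands in for the table and a, b, c for its three columns). Each inner query counts the values of one column and keeps those with the highest count; the outer max() then picks the largest value among ties:
select max(t1.a), max(t2.b), max(t3.c)
from
    (select a from (
        -- a HAVING clause cannot compare counta with max(counta) here,
        -- so the highest count is computed with a window function instead
        select a, count(a) as counta, max(count(a)) over () as maxcounta
        from #tab
        group by a) tempa
     where counta = maxcounta) t1,
    (select b from (
        select b, count(b) as countb, max(count(b)) over () as maxcountb
        from #tab
        group by b) tempb
     where countb = maxcountb) t2,
    (select c from (
        select c, count(c) as countc, max(count(c)) over () as maxcountc
        from #tab
        group by c) tempc
     where countc = maxcountc) t3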

SELECT records that have top n counts for one column

I am using PostgreSQL 9.2.
I have a dataset like this:
ID A B
1 x x
2 x x
2 x x
2 x x
3 x x
4 x x
4 x x
I want to display the records whose ID has one of the top n counts. Say, the top 2 counts of ID: in this case, ID = 2 and 4.
So the dataset should be:
ID A B
2 x x
2 x x
2 x x
4 x x
4 x x
My first thought was to create a new view that calculates the top n counts, and then match the ID of that view against the ID of the original table using EXISTS.
However, the query runs forever, since EXISTS takes an enormous amount of time.
Is there a better way to do this?

You can do this with nested window functions:
select t.id, t.a, t.b
from (select t.*, dense_rank() over (order by idcnt desc, id) as seqnum
      from (select t.*, count(*) over (partition by id) as idcnt
            from t
           ) t
     ) t
where seqnum <= 2;
You can check out the SQLFiddle.

This should be considerably simpler and faster than two subquery levels with window functions.
SELECT *
FROM   t
JOIN  (
   SELECT id
   FROM   t
   GROUP  BY 1
   ORDER  BY count(*) DESC
   LIMIT  2
   ) top2 USING (id)
As mentioned before, you need an index for this to be really fast. If id is your primary key you are all set.
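The join-against-a-top-n-subquery approach can be sketched as follows, again on SQLite via Python's sqlite3 module rather than PostgreSQL 9.2. The extra id in the subquery's ORDER BY and the final ORDER BY are assumptions added here to make ties and the output order deterministic; they are not part of the answer above.

```python
import sqlite3

# In-memory copy of the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, 'x', 'x')",
    [(1,), (2,), (2,), (2,), (3,), (4,), (4,)],
)

# The subquery finds the 2 ids with the most rows; the join keeps only their rows.
query = """
SELECT id, a, b
FROM t
JOIN (
    SELECT id
    FROM t
    GROUP BY id
    ORDER BY COUNT(*) DESC, id
    LIMIT 2
) AS top2 USING (id)
ORDER BY id
"""
top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
```

Only the three rows with id 2 and the two rows with id 4 remain.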

running sum on group by

I have this query
SELECT NAME, OTHER_NAME, COUNT(NAME)
FROM ETHNICITY
GROUP BY NAME,OTHER_NAME
and I would like to add a running sum of that count, ordered by other_name or name.
For instance, if there are 3 rows with name = "african american" and 2 rows with name = "other" and other_name = "jewish",
I want to get 3 and 2 as the counts, plus a running total that adds them up as the query moves through the rows.
Any ideas how I can augment the query to add that? Thanks.

In Oracle, a running sum is easily done with the sum() ... over() window function:
select  name
,       other_name
,       name_count
,       sum(name_count) over(
            order by name, other_name) as running
from    (
        select  name
        ,       other_name
        ,       count(name) as name_count
        from    ethnicity
        group by
                name
        ,       other_name
        order by
                name
        ,       other_name
        ) subqueryalias
Example at SQL Fiddle

I prefer to do this using a subquery:
select t.name, t.other_name, t.cnt,
       sum(cnt) over (order by name) as cumecnt
from (SELECT NAME, OTHER_NAME, COUNT(NAME) as cnt
      FROM ETHNICITY
      GROUP BY NAME, OTHER_NAME
     ) t
This assumes that you want a cumulative sum of the counts in the order of name.
The ORDER BY in the analytic function is what makes the sum cumulative. This is standard syntax, and it is also supported by Postgres and SQL Server 2012.
The following might also work:
select name, other_name, count(name) as cnt,
sum(count(name)) over (order by name)
from ethnicity
group by name, other_name
I find this harder to read (the sum(count()) is a bit jarring) and perhaps more prone to error. I haven't tried this syntax on Oracle; it does work in SQL Server 2012.
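To see how the window's ORDER BY affects the running total, here is a small sketch on SQLite via Python's sqlite3 module. The rows in the ethnicity table are made up for illustration (they are not from the question's real data). It compares ordering by name alone with ordering by name and other_name: with the default window frame, rows that tie on the ordering columns share the same cumulative value.

```python
import sqlite3

# Hypothetical sample rows, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ethnicity (name TEXT, other_name TEXT)")
conn.executemany(
    "INSERT INTO ethnicity VALUES (?, ?)",
    [("african american", "n/a")] * 3
    + [("other", "jewish")] * 2
    + [("other", "hindu")],
)

# Aggregate first, then accumulate the counts with two different orderings.
query = """
SELECT name, other_name, cnt,
       SUM(cnt) OVER (ORDER BY name)             AS running_by_name,
       SUM(cnt) OVER (ORDER BY name, other_name) AS running_by_both
FROM (
    SELECT name, other_name, COUNT(name) AS cnt
    FROM ethnicity
    GROUP BY name, other_name
) AS t
ORDER BY name, other_name
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

Ordering by name only gives both "other" rows the same total (6), while ordering by name and other_name yields a strictly row-by-row total (3, 4, 6).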

Look at GROUPING SETS, which let you aggregate subtotals and totals in the same query.
Not sure this is what you're after though...
SELECT NAME, OTHER_NAME, COUNT(NAME)
FROM ETHNICITY
GROUP BY GROUPING SETS ((NAME,OTHER_NAME), (Name), ())
The grouping sets don't require a second aggregate; the count handles the subtotals on its own.
So this data:
Name Other_Name
A B
A C
A D
B E
B F
B G
C H
C I
C J
Results in
Name  Other_Name  CNT(Name)
A     B           1
A     C           1
A     D           1
A                 3
B     E           1
B     F           1
B     G           1
B                 3
C     H           1
C     I           1
C     J           1
C                 3
                  9

Get all rows with one of the top 2 values in a column

I have a table with multiple entries and I have ordered it according to a sales criterion. So, if the entries are like:
Item Sales
a 10
b 10
c 9
d 8
e 8
f 7
I want to extract the items with the highest and second highest number of sales. As such,
I would want to extract a, b and c.
Is there any function in PostgreSQL that can help with this?

To include all rows with one of the top two sales values, you could use the dense_rank() window function:
WITH x AS (
   SELECT *
         ,dense_rank() OVER (ORDER BY sales DESC) AS rnk
   FROM   tbl
   )
SELECT item, sales
FROM   x
WHERE  rnk < 3;
You need PostgreSQL 8.4 or later for that.
For older versions, you could:
SELECT *
FROM   tbl
JOIN  (
   SELECT sales
   FROM   tbl
   GROUP  BY 1
   ORDER  BY 1 DESC
   LIMIT  2
   ) t1 USING (sales)
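A quick way to check the dense_rank() approach against the sample data is the sketch below, run on SQLite through Python's sqlite3 module instead of PostgreSQL (it assumes SQLite 3.25 or later for window functions). It also runs a plain ORDER BY ... LIMIT 2 for comparison, to show why that alone does not handle ties.

```python
import sqlite3

# In-memory copy of the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (item TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("a", 10), ("b", 10), ("c", 9), ("d", 8), ("e", 8), ("f", 7)],
)

# dense_rank() gives tied sales the same rank and leaves no gaps,
# so rnk < 3 keeps every row holding one of the top two sales values.
ranked_query = """
WITH x AS (
    SELECT item, sales,
           DENSE_RANK() OVER (ORDER BY sales DESC) AS rnk
    FROM tbl
)
SELECT item, sales
FROM x
WHERE rnk < 3
ORDER BY sales DESC, item
"""
top_two_values = conn.execute(ranked_query).fetchall()

# For comparison: LIMIT 2 returns exactly two rows, regardless of ties.
limit_query = "SELECT item, sales FROM tbl ORDER BY sales DESC, item LIMIT 2"
first_two_rows = conn.execute(limit_query).fetchall()

print(top_two_values)
print(first_two_rows)
```

The ranked query returns a, b and c, while the LIMIT 2 query stops after a and b.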

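Use ORDER BY and LIMIT if exactly two rows are enough. Note that this does not include ties, so it will not return a, b and c:
SELECT Item, Sales
FROM mytable
ORDER BY Sales DESC
LIMIT 2;
With the sample data from the question, this results in:
item sales
---- -----
a 10
b 10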
SQL Fiddle
