Create array in SELECT - sql

I'm using PostgreSQL 9.1 and I have this data structure:
A B
-------
1 a
1 a
1 b
1 c
1 c
1 c
1 d
2 e
2 e
I need a query that produces this result:
1 4 {{c,3},{a,2},{b,1},{d,1}}
2 1 {{e,2}}
A=1, 4 rows total with A=1, the partial counts (3 rows with c value, 2 rows with a value, .....)
The distinct values of column "A"
The count of all rows related to the "A" value
An array containing all the elements related to the "A" value, each with its own count
The array must be sorted by the count of each group, descending (3,2,1,1 in the example).

This should do the trick:
SELECT a
, sum(ab_ct)::int AS ct_total
, count(*)::int AS ct_distinct_b
, array_agg(b || ', ' || ab_ct::text) AS b_arr
FROM (
SELECT a, b, count(*) AS ab_ct
FROM tbl
GROUP BY a, b
ORDER BY a, ab_ct DESC, b -- append "b" to break ties in the count
) t
GROUP BY a
ORDER BY ct_total DESC;
Returns:
ct_total: total count of b per a.
ct_distinct_b: count of distinct b per a.
b_arr: array of b plus frequency of b, sorted by frequency of b.
Ordered by total count of b per a.
Alternatively, you can use an ORDER BY clause within the aggregate call in PostgreSQL 9.0 or later. Like:
SELECT a
, sum(ab_ct)::int AS ct_total
, count(*)::int AS ct_distinct_b
, array_agg(b || ', ' || ab_ct::text ORDER BY a, ab_ct DESC, b) AS b_arr
FROM (
SELECT a, b, count(*) AS ab_ct
FROM tbl
GROUP BY a, b
) t
GROUP BY a
ORDER BY ct_total DESC;
This may be clearer, but it is typically slower, and sorting rows in a subquery works for simple queries like this one. More explanation:
How to apply ORDER BY and LIMIT in combination with an aggregate function?
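To see the two aggregation levels at work, here is a minimal sketch that runs the inner (a, b) count against the sample data from the question. It assumes an in-memory SQLite database as a stand-in for PostgreSQL (SQLite has no array_agg), so the outer step that builds the array is done in Python; the function name summarize and the ROWS constant are made up for this illustration.

```python
import sqlite3
from itertools import groupby

# Sample data from the question (columns A and B).
ROWS = [(1, "a"), (1, "a"), (1, "b"), (1, "c"), (1, "c"), (1, "c"),
        (1, "d"), (2, "e"), (2, "e")]

def summarize(rows):
    """Return (a, ct_total, ct_distinct_b, b_arr) tuples, largest ct_total first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    # Inner aggregation: one row per (a, b), sorted like the subquery above.
    pairs = conn.execute(
        "SELECT a, b, count(*) AS ab_ct FROM tbl "
        "GROUP BY a, b ORDER BY a, ab_ct DESC, b"
    ).fetchall()
    conn.close()
    result = []
    # Outer aggregation (done in Python here): collapse sorted pairs per a.
    for a, grp in groupby(pairs, key=lambda r: r[0]):
        b_arr = [(b, ct) for _, b, ct in grp]
        result.append((a, sum(ct for _, ct in b_arr), len(b_arr), b_arr))
    return sorted(result, key=lambda r: r[1], reverse=True)

print(summarize(ROWS))
```

For the sample data this gives a total of 7 rows and 4 distinct b values for a = 1, which shows why the query returns both ct_total and ct_distinct_b.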

Maybe I'm missing something, but this should do it:
SELECT a,
count(*) as cnt,
array_agg(b) as all_values
FROM your_table
GROUP BY a

This is what you need:
SELECT A, COUNT(*), array_agg(b)
FROM YourTable
GROUP BY A

Related

Order by Counts in a Group

I have the data which looks like below:
group resource count
A X 5
A Y 8
A Z 2
B E 8
B F 10
B G 2
I want to order the data with a SQL statement so that the group with the highest sum of count comes first, like this:
group resource count
B F 10
B E 8
B G 2
A Y 8
A X 5
A Z 2
I would also like to avoid multiple SELECT statements. Any help is appreciated. Thanks.
Try using a window function to sum your count and order by that. Your DBMS was not listed, so the syntax may need minor tweaks (the bracketed identifiers and the alias = expression form below are SQL Server style), but the window function itself should work in most DBMSs:
SELECT [Group]
,[Resource]
,[Count]
,TotalCountOfGroup = SUM([Count]) OVER (PARTITION BY [Group])
FROM YourTable
ORDER BY TotalCountOfGroup DESC,[Count] DESC
Or, if you need to exclude the TotalCountOfGroup column, you can wrap it in a CTE:
WITH cte_YourData AS (
SELECT [Group],[Resource],[Count]
,TotalCountOfGroup = SUM([Count]) OVER (PARTITION BY [Group])
FROM YourTable
)
SELECT [Group],[Resource],[Count]
FROM cte_YourData
ORDER BY TotalCountOfGroup DESC,[Count] DESC
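As a quick check of the CTE variant, the sketch below runs it against the sample data from the question. It assumes an in-memory SQLite database (version 3.25 or later, for window functions) rather than SQL Server, and the names your_table, grp, cnt and order_by_group_total are placeholders chosen for this illustration.

```python
import sqlite3

# Sample data from the question: (group, resource, count).
DATA = [("A", "X", 5), ("A", "Y", 8), ("A", "Z", 2),
        ("B", "E", 8), ("B", "F", 10), ("B", "G", 2)]

def order_by_group_total(rows):
    """Sort rows by their group's total count, then by their own count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (grp TEXT, resource TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    out = conn.execute("""
        WITH cte AS (
            SELECT grp, resource, cnt,
                   SUM(cnt) OVER (PARTITION BY grp) AS total_of_group
            FROM your_table
        )
        SELECT grp, resource, cnt
        FROM cte
        ORDER BY total_of_group DESC, cnt DESC
    """).fetchall()
    conn.close()
    return out

print(order_by_group_total(DATA))
```

If two groups could share the same total, adding the group column as a second sort key keeps each group's rows together.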

SQL compare one column, then another, by using max over partition by

DB: SAP HANA
I have asked this question before, but now I'm facing a more complicated version. When qty is the same, I want to return the biggest no.
Table A:
user no qty
A 10 20
A 11 20
B 12 40
B 13 10
Table B:
id user
1 A
2 B
Expected result:
id user no
1 A 11
2 B 12
I tried:
SELECT
B.id,
B.user,
C.max_qty_no
FROM
B
LEFT JOIN (
SELECT
A.user,
CASE
WHEN A.qty = (
MAX(A.qty) OVER (PARTITION BY A.user)
) THEN A.no
END as max_qty_no
FROM
A
) C ON C.user = B.user AND
C.max_qty_no IS NOT NULL;
This returns:
id user no
1 A 10
1 A 11
2 B 12
You want to rank the A rows per user and only select the best-ranked row. So far this ranking was on one column only, so you could simply compare the value with the maximum value. Now, however, the ranking must be done considering two columns instead of just one. You can use ROW_NUMBER for this ranking:
select id, user, no
from
(
select
b.id, b.user, a.no,
row_number() over (partition by b.user order by a.qty desc, a.no desc) as rn
from a
join b on b.user = a.user
) ranked
where rn = 1;
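The ROW_NUMBER approach can be tried on the sample tables with a small sketch like the following. It assumes an in-memory SQLite database (3.25 or later) instead of SAP HANA, and it renames the columns user and no to usr and num to stay clear of keywords; those names and the function best_no_per_user are made up for this example.

```python
import sqlite3

def best_no_per_user():
    """Per user, pick the row with the highest qty, breaking ties by highest num."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (usr TEXT, num INTEGER, qty INTEGER);
        CREATE TABLE b (id INTEGER, usr TEXT);
        INSERT INTO a VALUES ('A', 10, 20), ('A', 11, 20),
                             ('B', 12, 40), ('B', 13, 10);
        INSERT INTO b VALUES (1, 'A'), (2, 'B');
    """)
    rows = conn.execute("""
        SELECT id, usr, num
        FROM (
            SELECT b.id, b.usr, a.num,
                   ROW_NUMBER() OVER (PARTITION BY b.usr
                                      ORDER BY a.qty DESC, a.num DESC) AS rn
            FROM a
            JOIN b ON b.usr = a.usr
        ) ranked
        WHERE rn = 1
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

print(best_no_per_user())  # [(1, 'A', 11), (2, 'B', 12)]
```

The second sort key (num DESC) is what separates the two user A rows that share qty = 20.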
Since you want the MAX(no) per user having the largest quantity you need to apply additional selection criteria. The partitioning takes care of selecting the rows with MAX(qty) per user but you still need to select the rows with MAX(no) for each distinct user - you can do this by using the MAX aggregate function combined with a GROUP BY. With this small change you can return the expected results:
SELECT
B.id,
B.user,
MAX(C.max_qty_no)
FROM
B
LEFT JOIN (
SELECT
A.user,
CASE
WHEN A.qty = (
MAX(A.qty) OVER (PARTITION BY A.user)
) THEN A.no
END as max_qty_no
FROM
A
) C ON C.user = B.user AND
C.max_qty_no IS NOT NULL
GROUP BY B.id, B.user;

Get the most frequently occurring value in multiple columns of a table

I have a table which contains three columns: Work, Cost, Duration. I need to get the most frequently occurring value in each of the three columns. If two values occur the same number of times, then return the larger of the two. Please see the sample data and result below.
Work Cost Duration
5 2 6
5 8 7
6 8 7
2 2 2
6 2 6
I need to get the result as
Work Cost Duration
6 2 7
I tried the following query, but it returns values for only one column, and it returns the count for every value:
select Duration, count(*) as "DurationCount" from SimulationResult
group by Duration
order by count(*) desc,Duration desc
You can do something like
select * from
-- each derived table gets an alias (required by SQL Server)
(select top 1 Work from SimulationResult
group by Work
order by count(*) desc, Work desc) w,
(select top 1 Cost from SimulationResult
group by Cost
order by count(*) desc, Cost desc) c,
(select top 1 Duration from SimulationResult
group by Duration
order by count(*) desc, Duration desc) d
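The "one top-1 query per column" idea can be checked against the sample data with a short sketch. It assumes an in-memory SQLite database, so LIMIT 1 takes the place of TOP 1; the table name simulation_result and the function most_frequent are placeholders for this illustration.

```python
import sqlite3

# Sample data from the question: (Work, Cost, Duration).
SAMPLE = [(5, 2, 6), (5, 8, 7), (6, 8, 7), (2, 2, 2), (6, 2, 6)]

def most_frequent(rows):
    """Return the most frequent value per column; ties go to the larger value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE simulation_result (work INTEGER, cost INTEGER, duration INTEGER)"
    )
    conn.executemany("INSERT INTO simulation_result VALUES (?, ?, ?)", rows)
    picks = []
    # Column names are fixed literals here, never user input.
    for col in ("work", "cost", "duration"):
        sql = (f"SELECT {col} FROM simulation_result GROUP BY {col} "
               f"ORDER BY count(*) DESC, {col} DESC LIMIT 1")
        picks.append(conn.execute(sql).fetchone()[0])
    conn.close()
    return tuple(picks)

print(most_frequent(SAMPLE))  # (6, 2, 7)
```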
Try the following:
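-- here a, b, c and #tab presumably stand for Work, Cost, Duration and SimulationResult
select max(t1.a), max(t2.b), max(t3.c)
from
(select a from (
select a, count(a) counta, max(count(a)) over () maxcounta
from #tab
group by a) tempa
where counta = maxcounta) t1, -- keep only the most frequent value(s) of a
(select b from (
select b, count(b) countb, max(count(b)) over () maxcountb
from #tab
group by b) tempb
where countb = maxcountb) t2,
(select c from (
select c, count(c) countc, max(count(c)) over () maxcountc
from #tab
group by c) tempc
where countc = maxcountc) t3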

running sum on group by

I have this query
SELECT NAME, OTHER_NAME, COUNT(NAME)
FROM ETHNICITY
GROUP BY NAME,OTHER_NAME
and I would like to add a running sum on other_name or name in that column.
For instance, if there are 3 rows with name = "african american" and 2 rows with name = "other" and other_name = "jewish", I want to show 3 and 2 as the counts and sum them up as the query traverses the rows.
Any ideas how I can augment this to add that? Thanks.
In Oracle, a running sum is easily done with the sum() ... over() window function:
select name
, other_name
, name_count
, sum(name_count) over(
order by name, other_name) as running
from (
select name
, other_name
, count(name) as name_count
from ethnicity
group by
name
, other_name
order by
name
, other_name
) subqueryalias
Example at SQL Fiddle
I prefer to do this using a subquery:
select t.name, t.other_name, t.cnt,
sum(cnt) over (order by name) as cumecnt
from (SELECT NAME, OTHER_NAME, COUNT(NAME) as cnt
FROM ETHNICITY
GROUP BY NAME,OTHER_NAME
) t
This assumes that you want a cumulative sum of count in the order of name.
The order by in the analytic function is what makes the sum cumulative. This is standard syntax, and is also supported by Postgres and SQL Server 2012.
The following might also work
select name, other_name, count(name) as cnt,
sum(count(name)) over (order by name)
from ethnicity
group by name, other_name
I find this harder to read (the sum(count()) is a bit jarring) and perhaps more prone to error. I haven't tried this syntax on Oracle; it does work in SQL Server 2012.
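For a concrete picture of the running sum, the sketch below uses the two groups mentioned in the question. It assumes an in-memory SQLite database (3.25 or later) rather than Oracle, and the sample rows and the function name running_counts are made up for this illustration.

```python
import sqlite3

# Hypothetical rows matching the question's example: 3 + 2 entries.
PEOPLE = [("african american", None)] * 3 + [("other", "jewish")] * 2

def running_counts(rows):
    """Return (name, other_name, cnt, running) with a cumulative sum of cnt."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ethnicity (name TEXT, other_name TEXT)")
    conn.executemany("INSERT INTO ethnicity VALUES (?, ?)", rows)
    out = conn.execute("""
        SELECT name, other_name, cnt,
               SUM(cnt) OVER (ORDER BY name, other_name) AS running
        FROM (
            SELECT name, other_name, COUNT(name) AS cnt
            FROM ethnicity
            GROUP BY name, other_name
        ) t
        ORDER BY name, other_name
    """).fetchall()
    conn.close()
    return out

print(running_counts(PEOPLE))
```

Ordering the window by both columns keeps the sort key unique per group. With order by name alone, rows that share a name are peers under the default window frame and all receive the same cumulative value.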
Look at GROUPING SETS, which lets you aggregate totals.
Not sure this is what you're after though...
SELECT NAME, OTHER_NAME, COUNT(NAME)
FROM ETHNICITY
GROUP BY GROUPING SETS ((NAME,OTHER_NAME), (Name), ())
Sorry, ID10T error... the grouping sets didn't require a 2nd aggregate; the count will do it on its own.
So this data:
Name Other_Name
A B
A C
A D
B E
B F
B G
C H
C I
C J
Results in
Name Other_Name CNT(Name)
A B 1
A C 1
A D 1
A (null) 3
B E 1
B F 1
B G 1
B (null) 3
C H 1
C I 1
C J 1
C (null) 3
(null) (null) 9
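On engines without GROUPING SETS, the same rows can be produced by stacking one GROUP BY per grouping set with UNION ALL. The sketch below does that in an in-memory SQLite database, which is an assumption made here because SQLite lacks GROUPING SETS; it shows the equivalent query, not the GROUPING SETS syntax itself, and the function name counts_with_totals is made up.

```python
import sqlite3

def counts_with_totals():
    """Emulate GROUPING SETS ((name, other_name), (name), ()) with UNION ALL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ethnicity (name TEXT, other_name TEXT)")
    conn.executemany(
        "INSERT INTO ethnicity VALUES (?, ?)",
        [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "E"), ("B", "F"), ("B", "G"),
         ("C", "H"), ("C", "I"), ("C", "J")],
    )
    rows = conn.execute("""
        SELECT name, other_name, COUNT(name) FROM ethnicity
        GROUP BY name, other_name          -- detail rows
        UNION ALL
        SELECT name, NULL, COUNT(name) FROM ethnicity
        GROUP BY name                      -- subtotal per name
        UNION ALL
        SELECT NULL, NULL, COUNT(name) FROM ethnicity  -- grand total
    """).fetchall()
    conn.close()
    return rows

for row in counts_with_totals():
    print(row)
```

The rolled-up columns come back as NULL (None in Python), which is how the subtotal and grand-total rows can be told apart from the detail rows.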

Get all rows with one of the top 2 values in a column

I have a table with multiple entries and I have ordered it according to a sales criterion. So, if the entries are like:
Item Sales
a 10
b 10
c 9
d 8
e 8
f 7
I want to extract the items with the highest and second highest number of sales. As such, I would want to extract a, b and c.
Is there any function in PostgreSQL that can help with this?
To include all rows with one of the top two sales values, you could use the dense_rank() window function:
WITH x AS (
SELECT *
,dense_rank() OVER (ORDER BY sales DESC) AS rnk
FROM tbl
)
SELECT item, sales
FROM x
WHERE rnk < 3;
You need PostgreSQL 8.4 or later for that.
For older versions, you could use:
SELECT *
FROM tbl
JOIN (
SELECT sales
FROM tbl
GROUP BY 1
ORDER BY 1 DESC
LIMIT 2
) t1 USING (sales)
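To confirm that dense_rank() keeps ties, here is a minimal sketch run against the sample data from the question. It assumes an in-memory SQLite database (3.25 or later) as a stand-in for PostgreSQL; the function name top_two_sales_values is made up for this illustration.

```python
import sqlite3

# Sample data from the question: (item, sales).
SALES = [("a", 10), ("b", 10), ("c", 9), ("d", 8), ("e", 8), ("f", 7)]

def top_two_sales_values(rows):
    """Return every row whose sales value is the highest or second highest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (item TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    out = conn.execute("""
        WITH x AS (
            SELECT item, sales,
                   DENSE_RANK() OVER (ORDER BY sales DESC) AS rnk
            FROM tbl
        )
        SELECT item, sales
        FROM x
        WHERE rnk < 3
        ORDER BY sales DESC, item
    """).fetchall()
    conn.close()
    return out

print(top_two_sales_values(SALES))  # [('a', 10), ('b', 10), ('c', 9)]
```

dense_rank() gives a and b rank 1 and c rank 2 with no gap, whereas rank() would give c rank 3 and the filter would drop it.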
Use ORDER BY and LIMIT:
SELECT Item, Sales
FROM mytable
ORDER BY Sales DESC
LIMIT 2;
Results in:
item sales
---- -----
a 10
b 9
SQL Fiddle