how to query the percentage of aggregate in vertica - sql

Table product
productId type
1 A
2 A
3 A
4 B
5 B
6 C
What I want:
type perc
A 0.5
B 0.33
C 0.17
We can write a simple query like this:
Select type, cnt/(select count(*) from product) AS perc
FROM (
select type, count(*) as cnt
from product
group by type
) nested
But Vertica doesn't seem to support subselects that are not correlated.
Need someone's help!

Vertica supports both correlated and non-correlated subqueries, although there can be restrictions on the join predicate.
So your query above works as written. And, guess what, it continues to work even if you use indentation:
SQL> SELECT
type
, cnt/( select count (*) FROM product ) AS perc
FROM
( SELECT type, count (*) as cnt
FROM product
GROUP BY type
) nested ;
type | perc
------+----------------------
C | 0.166666666666666667
A | 0.500000000000000000
B | 0.333333333333333333
(3 rows)
Of course you can re-write it in a different way. For example:
SQL> SELECT
a.type
, a.cnt/b.tot as perc
FROM
( SELECT type , count (*) as cnt
FROM product
GROUP BY type ) a
CROSS JOIN
( SELECT count (*) AS tot
FROM product ) b
ORDER BY 1
;
type | perc
------+----------------------
A | 0.500000000000000000
B | 0.333333333333333333
C | 0.166666666666666667
(3 rows)

You could also use analytic functions, which are messy in this application, but work:
WITH product AS (
select 1 as productId, 'A' as type
union all select 2, 'A'
union all select 3, 'A'
union all select 4, 'B'
union all select 5, 'B'
union all select 6, 'C'
)
SELECT distinct /* distinct because analytic functions don't reduce row count like aggregate functions */
type, count(*) over (partition by type) / count(*) over () as perc
FROM product;
type | perc
------+----------------------
A | 0.500000000000000000
B | 0.333333333333333333
C | 0.166666666666666667
count(*) over (partition by type) counts the rows of each type;
count(*) over () counts over all rows, so it gives the total count
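The two window counts can be tried outside Vertica as well. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database (an assumption made only so the example runs anywhere; it needs SQLite 3.25 or later for window functions). Note one difference: SQLite truncates integer division, so the sketch multiplies by 1.0, which the Vertica output above did not need.

```python
import sqlite3

# In-memory SQLite stands in for Vertica here (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (productId INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"), (6, "C")],
)

# 1.0 * forces real division; SQLite would otherwise truncate int / int to 0.
rows = conn.execute(
    """
    SELECT DISTINCT
           type,
           1.0 * COUNT(*) OVER (PARTITION BY type) / COUNT(*) OVER () AS perc
    FROM product
    ORDER BY type
    """
).fetchall()

for type_, perc in rows:
    print(type_, round(perc, 2))
```

DISTINCT is still needed here because the window functions return one row per product, not one row per type.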

Related

How do i select all columns, plus the result of the sum

I have this query:
"Select * from table"
which returns:
Id Value
1 1
1 1
2 10
2 10
My goal is to add the sum of Value per Id to each row, like this:
Id Value Sum
1 1 2
1 1 2
2 10 20
2 10 20
I have tried queries like:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
But the result is not grouped by Id:
Id Value Sum
1 1 1
1 1 1
2 10 10
2 10 10
Aggregation collapses rows, reducing the number of records in the output. Here you want to attach the result of a computation to each of your records, which is the job of the corresponding window function.
SELECT table.*, SUM(Value) OVER(PARTITION BY Id) AS sum_
FROM table
Your attempt looks correct.
Can you try the query below? It works for me:
SELECT Id, Value,
(SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY ID) as sum
FROM Table v;
You can do it with an inner join against a subquery grouped by id:
select t.*, s.sum
from _table t
inner join (
select id, sum(Value) as sum
from _table
group by id
) as s on s.id = t.id
Your select is OK if you adjust it just a little:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
GROUP BY IDRNC is a mistake and should be GROUP BY ID.
You should give the sum column an alias.
The subquery that selects the sum does not need its own table alias in order to be compared with the aliased outer query (this is not a mistake; it works either way).
Test:
WITH
a_table (ID, VALUE) AS
(
Select 1, 1 From Dual Union All
Select 1, 1 From Dual Union All
Select 2, 10 From Dual Union All
Select 2, 10 From Dual
)
SELECT ID, VALUE, (SELECT SUM(VALUE) FROM a_table WHERE ID = v.ID GROUP BY ID) "ID_SUM" FROM a_table v;
ID VALUE ID_SUM
---------- ---------- ----------
1 1 2
1 1 2
2 10 20
2 10 20
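To compare the approaches suggested above, here is a small sketch that runs the window-function, correlated-subquery, and join versions against the same four rows. It uses Python's sqlite3 module as a stand-in database (an assumption; the answers above target other engines), and the table name a_table is borrowed from the test above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_table (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO a_table VALUES (?, ?)", [(1, 1), (1, 1), (2, 10), (2, 10)]
)

# Window function: every row keeps its identity and gains the per-id total.
window_rows = conn.execute(
    """
    SELECT id, value, SUM(value) OVER (PARTITION BY id) AS id_sum
    FROM a_table
    ORDER BY id
    """
).fetchall()

# Correlated subquery: the WHERE clause already restricts to one id,
# so no GROUP BY is needed inside the subquery.
subquery_rows = conn.execute(
    """
    SELECT id, value,
           (SELECT SUM(value) FROM a_table v2 WHERE v2.id = v.id) AS id_sum
    FROM a_table v
    ORDER BY id
    """
).fetchall()

# Join against a pre-aggregated derived table.
join_rows = conn.execute(
    """
    SELECT t.id, t.value, s.id_sum
    FROM a_table t
    INNER JOIN (SELECT id, SUM(value) AS id_sum FROM a_table GROUP BY id) s
            ON s.id = t.id
    ORDER BY t.id
    """
).fetchall()

print(window_rows)
print(window_rows == subquery_rows == join_rows)
```

All three return the same rows on this data; which one performs best depends on the engine and the table size.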

multiple top n aggregates query defined as a view (or function)?

I couldn't find a past question exactly like this problem. I have an orders table, containing a customer id, order date, and several numeric columns (how many of a particular item were ordered on that date). Removing some of the numerics, it looks like this:
customer_id date a b c d
0001 07/01/22 0 3 3 5
0001 07/12/22 12 0 50 0
0002 06/30/22 5 65 0 30
0002 07/20/22 1 0 19 2
0003 08/01/22 0 0 99 0
I need to sum each numeric column by customer_id, then return the top n customers for each column. Obviously that means a single customer may appear multiple times, once for each column. Assuming top 2, the desired output would look something like this:
column_ranked customer_id sum rank
'a' 0001 12 1
'a' 0002 6 2
'b' 0002 65 1
'b' 0001 3 2
'c' 0003 99 1
'c' 0001 53 2
'd' 0002 32 1
'd' 0001 5 2
(this assumes no date range filter)
My first thought was a CTE to collapse the table into its per-customer sums, then a union from the CTE, with a limit n clause, once for each summed column. That works if the date range is hard-coded into the CTE .... but I want to define this as a view, so it can be called by users something like this:
SELECT * from top_customers_view WHERE date_range BETWEEN ( date1 and date2 )
How can I pass the date restriction down to the CTE? Or am I taking the wrong approach entirely? If a view isn't possible, can it be done as a function? (without using a costly cursor, that is.)
Since the date ranges clearly produce a massive number of combinations you cannot generate a view with them. You can write a query, however, as shown below:
with
p as (select cast ('2022-01-01' as date) as ds, cast ('2022-12-31' as date) as de),
a as (
select top 10 customer_id, 'a' as col, sum(a) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
b as (
select top 10 customer_id, 'b' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
c as (
select top 10 customer_id, 'c' as col, sum(c) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
d as (
select top 10 customer_id, 'd' as col, sum(d) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
)
select * from a
union all select * from b
union all select * from c
union all select * from d
order by customer_id, col, s desc
The date range is in the second line.
Alternatively, you could create a data warehousing solution, but it would require much more effort to make it work.
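Another way to read the "can it be done as a function" part of the question is to keep the query in application code and pass the date range as parameters. The sketch below does that with Python's sqlite3 module as a stand-in database (an assumption). The function name top_customers, the table name orders, and the column name order_date are hypothetical, and dates are stored as ISO strings so that BETWEEN compares them correctly. It sums per customer first, unpivots the four columns with UNION ALL, and ranks within each column.

```python
import sqlite3

TOP_N_SQL = """
WITH totals AS (
    SELECT customer_id, SUM(a) AS a, SUM(b) AS b, SUM(c) AS c, SUM(d) AS d
    FROM orders
    WHERE order_date BETWEEN :start AND :end
    GROUP BY customer_id
),
unpivoted AS (
    SELECT 'a' AS col, customer_id, a AS total FROM totals
    UNION ALL SELECT 'b', customer_id, b FROM totals
    UNION ALL SELECT 'c', customer_id, c FROM totals
    UNION ALL SELECT 'd', customer_id, d FROM totals
),
ranked AS (
    SELECT col, customer_id, total,
           RANK() OVER (PARTITION BY col ORDER BY total DESC) AS rnk
    FROM unpivoted
)
SELECT col, customer_id, total, rnk
FROM ranked
WHERE rnk <= :n
ORDER BY col, rnk, customer_id
"""


def top_customers(conn, start, end, n):
    """Return the top-n customers per column for orders dated start..end."""
    params = {"start": start, "end": end, "n": n}
    return conn.execute(TOP_N_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id TEXT, order_date TEXT, "
    "a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("0001", "2022-07-01", 0, 3, 3, 5),
        ("0001", "2022-07-12", 12, 0, 50, 0),
        ("0002", "2022-06-30", 5, 65, 0, 30),
        ("0002", "2022-07-20", 1, 0, 19, 2),
        ("0003", "2022-08-01", 0, 0, 99, 0),
    ],
)

result = top_customers(conn, "2022-01-01", "2022-12-31", 2)
for row in result:
    print(row)
```

Because RANK() is used, customers tied at the cut-off are all returned; switching to ROW_NUMBER() would cap each column at exactly n rows.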

MULTIPLE COUNTS IN THE SAME QUERY

I have this table. I want to count the number of orders of each type, along with the count of all orders, as follows:
ord_id type
1 A
2 B
3 A
4 C
Here is the result :
TYPE COUNT TOTAL
A 2 4
B 1 4
C 1 4
where the COUNT column is the number of orders of each type, and TOTAL is the total number of orders.
Here is my code:
SELECT type, COUNT(*)
FROM
table
where type = 'A'
Union
SELECT type, COUNT(*)
FROM
table
where type = 'b';
Use aggregation and window functions:
select
type,
count(*) cnt,
sum(count(*)) over() total
from mytable
group by type
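As a quick check of the query above, the following sketch runs it on the four sample orders using Python's sqlite3 module (a stand-in database chosen as an assumption; it needs SQLite 3.25 or later). The window SUM runs after the GROUP BY, so it adds up the per-type counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ord_id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "A"), (4, "C")]
)

# COUNT(*) is computed per group; SUM(...) OVER () then totals those counts.
rows = conn.execute(
    """
    SELECT type,
           COUNT(*) AS cnt,
           SUM(COUNT(*)) OVER () AS total
    FROM mytable
    GROUP BY type
    ORDER BY type
    """
).fetchall()

for row in rows:
    print(row)
```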

How to select max of count in PostgreSQL

I have table in PostgreSQL with the following schema:
Category | Type
------------+---------
A | 0
C | 11
B | 5
D | 1
D | 0
F | 2
E | 11
E | 9
. | .
. | .
How can I select, for each category, the type that occurs most often? The following gives me all of them:
SELECT
category,
type,
COUNT(*)
FROM
table
GROUP BY
category,
type
ORDER BY
category,
count
DESC
My expected result is something like this:
Cat |Type |Count
--------+-------+------
A |0 |5
B |5 |30
C |2 |20
D |3 |10
That is the type with max occurrence in each category with count of that type.
You can use the following query:
SELECT category, type, cnt
FROM (
SELECT category, type, cnt,
RANK() OVER (PARTITION BY category
ORDER BY cnt DESC) AS rn
FROM (
SELECT category, type, COUNT(type) AS cnt
FROM mytable
GROUP BY category, type ) t
) s
WHERE s.rn = 1
The above query uses your own query as posted in the OP and applies RANK() windowed function to it. Using RANK() we can specify all records coming from the initial query having the greatest COUNT(type) value.
Note: If more than one type has the maximum number of occurrences for a specific category, then all of them will be returned by the above query, as a consequence of using RANK.
If I understand correctly, you can use window functions:
SELECT category, type, cnt
FROM (SELECT category, type, COUNT(*) as cnt,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY COUNT(*) DESC) as seqnum
FROM table
GROUP BY category, type
) ct
WHERE seqnum = 1;
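The difference between the RANK() and ROW_NUMBER() answers above only shows up when two types tie within a category. Here is a small sketch of that case, using Python's sqlite3 module instead of PostgreSQL (an assumption made so it runs without a server) and made-up sample rows. The extra "type" tie-breaker in the ROW_NUMBER() ordering is also an addition, there to make the result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (category TEXT, type INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("A", 0), ("A", 0), ("A", 1), ("A", 1),  # types 0 and 1 tie in A
        ("B", 5), ("B", 5), ("B", 5), ("B", 6),
    ],
)

COUNTS = "SELECT category, type, COUNT(*) AS cnt FROM mytable GROUP BY category, type"

# RANK() gives tied rows the same rank, so both tied types are returned.
rank_rows = conn.execute(
    f"""
    SELECT category, type, cnt
    FROM (SELECT category, type, cnt,
                 RANK() OVER (PARTITION BY category ORDER BY cnt DESC) AS rn
          FROM ({COUNTS}) AS t) AS s
    WHERE rn = 1
    ORDER BY category, type
    """
).fetchall()

# ROW_NUMBER() always numbers rows 1, 2, 3..., so exactly one row survives.
row_number_rows = conn.execute(
    f"""
    SELECT category, type, cnt
    FROM (SELECT category, type, cnt,
                 ROW_NUMBER() OVER (PARTITION BY category
                                    ORDER BY cnt DESC, type) AS rn
          FROM ({COUNTS}) AS t) AS s
    WHERE rn = 1
    ORDER BY category, type
    """
).fetchall()

print(rank_rows)
print(row_number_rows)
```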
SELECT
category,
type,
COUNT(*)
FROM
table
GROUP BY
category,
type
HAVING
COUNT(*) = (SELECT MAX(C) FROM (SELECT COUNT(*) AS C FROM A GROUP BY A) AS Q)
EDITED:
I apologize to readers,
COUNT(*) = (SELECT MAX(COUNT(*)) FROM table GROUP BY category,type)
is the ORACLE version, postgresql version is:
COUNT(*) = (SELECT MAX(C) FROM (SELECT COUNT(*) AS C FROM A GROUP BY A) AS Q)
SELECT category, MAX(Occurence) AS max_occurence
FROM (SELECT t.category AS category, t.type AS type, COUNT(*) AS Occurence FROM table t GROUP BY t.category, t.type) AS q
GROUP BY category;
SELECT
category,
type,
COUNT(*) AS count
FROM
table
GROUP BY
category,
type
ORDER BY
category ASC

Transact SQL - How to perform additional operation on a result set

I have a simple query:
select id, count(*) n
from mytable
group by id
Is it possible to include also the sum(n) in the same query? So the result would look something like this:
id n
---- -----------
1 12
2 1
3 14
4 1
5 2
6 6
Sum=36
You can use a common table expression to do this:
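-- Cast id to text so the 'Sum' label fits in the same column (assumes id is numeric)
; WITH cte as (SELECT id
,count(*) n
FROM mytable
GROUP BY id)
SELECT CAST(id AS varchar(12)) AS id, n FROM cte
UNION ALL
SELECT 'Sum', SUM(n) from cte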
You can also use ROLLUP, which appends a total row whose id is NULL:
SELECT id
,count(*) n
FROM mytable
GROUP BY id
WITH ROLLUP
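The CTE-plus-UNION ALL approach can be sketched end to end as follows. This uses Python's sqlite3 module rather than SQL Server (an assumption for portability; SQLite has no ROLLUP, so only the UNION ALL variant is shown), with made-up sample rows. The is_total and sort_id columns are hypothetical helpers added so that the total row reliably sorts last, since UNION ALL by itself does not guarantee row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)", [(1,), (1,), (1,), (2,), (3,), (3,)]
)

rows = conn.execute(
    """
    WITH cte AS (
        SELECT id, COUNT(*) AS n
        FROM mytable
        GROUP BY id
    )
    SELECT label, n
    FROM (
        -- per-id rows, with id rendered as text so it can share a column with 'Sum'
        SELECT CAST(id AS TEXT) AS label, n, 0 AS is_total, id AS sort_id FROM cte
        UNION ALL
        -- single total row
        SELECT 'Sum', SUM(n), 1, NULL FROM cte
    ) AS combined
    ORDER BY is_total, sort_id
    """
).fetchall()

for label, n in rows:
    print(label, n)
```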