I am querying a Presto table where I want to calculate what percentage of the total a certain subset of the rows account for.
Consider a table like this:
id  m
--  -
1   5
1   7
2   9
3   8
I want a query that reports how much of the total measure (m) is contributed by each id. In this example, the total of the measure column is 29, which I can find with a query like:
SELECT SUM("m") FROM t;
output:
sqlite> SELECT SUM("m") FROM t;
29
Then I want to subtotal by id for some of the ids, like this:
SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;
output:
sqlite> SELECT "id", SUM("m") AS "sub_total" FROM t WHERE "id" IN ('1','3') GROUP BY id;
1|12
3|8
Now I want to add a third column where the subtotals are divided by the grand total (29) to get the percentage for each selected id.
I tried:
sqlite>
WITH a AS (
SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
WHERE "t"."id" IN ('1','3') GROUP BY "t"."id";
output:
1|12|41
3|8|27
Which is all good in SQLite3! But when I translate this to my actual Presto DB (and the right tables and columns), I get this error:
presto error: line 10:5: 'a.g' must be an aggregate expression or appear in GROUP BY clause
I can't understand what I'm missing here or why this would be different in Presto.
When you have a GROUP BY in your query, every expression the query returns must be either:
an expression you are grouping by, or
an aggregate function.
For example, if you do GROUP BY id, the resulting query will return one row per id. You cannot just use m, because for id = 1 there are two values, 5 and 7, so which should be returned? The first value, the last, the sum, the average? You need to say which by using an aggregate function like sum(m).
The same applies to a.g: it is not aggregated, so you need to add it to the GROUP BY.
WITH a AS (
SELECT SUM("m") AS g FROM t )
SELECT "id", SUM("m") AS "sub_total", SUM(m)*100/"a"."g"
FROM a, t
WHERE "t"."id" IN ('1','3') GROUP BY "t"."id", "a"."g";
There is nothing special about Presto here. SQLite is simply less strict; most other database engines would reject your original query as well.
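The corrected query can be checked against the sample data with a small script. This is only a sketch: it runs in SQLite through Python's sqlite3 module, not in Presto, so it shows that the stricter form returns the same rows rather than reproducing the Presto error. The function name percent_of_total is made up for illustration, and the ids are assumed to be stored as integers.

```python
import sqlite3

def percent_of_total(ids):
    """Return (id, sub_total, percent_of_grand_total) for the given ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, m INTEGER)")
    # Sample rows from the question; the grand total of m is 29.
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 5), (1, 7), (2, 9), (3, 8)])
    placeholders = ", ".join("?" for _ in ids)
    query = f"""
        WITH a AS (SELECT SUM(m) AS g FROM t)
        SELECT t.id, SUM(t.m) AS sub_total, SUM(t.m) * 100 / a.g AS pct
        FROM a, t
        WHERE t.id IN ({placeholders})
        GROUP BY t.id, a.g   -- a.g is listed here, as strict engines require
        ORDER BY t.id
    """
    rows = conn.execute(query, list(ids)).fetchall()
    conn.close()
    return rows

print(percent_of_total([1, 3]))  # [(1, 12, 41), (3, 8, 27)]
```

The percentages are truncated because both operands are integers; multiplying by 100.0 instead of 100 would keep the fractional part.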
Related
I have this query:
select id, convert(nvarchar(10), pubdate, 102) as pubdate,
channel_title, title, description, link, vertinimas
from table1
where statusid > 0
and channel_title = 'channel1'
group by title
order by pubdate desc
To exclude duplicate entries in the field "title" I added group by title at the end, but an error occurs:
"is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
The GROUP BY clause is normally used together with aggregate functions like count(), min(), max(), sum() etc. The select list can only contain columns that are part of the GROUP BY clause or columns to which you are applying an aggregate function.
For example, suppose you have a STUDENT table like the one below:
ID  NAME  SUBJECT  MARKS
--  ----  -------  -----
1   FOO   ENGLISH  80
2   FOO   MATH     70
3   BAR   ENGLISH  100
4   BAR   MATH     50
5   ZIL   ENGLISH  90
6   ZIL   MATH     75
You can write a query like:
SELECT NAME, SUM(MARKS) AS TOTAL FROM STUDENT GROUP BY NAME;
Here, in the above query, NAME is part of the GROUP BY clause and we are applying the sum() aggregate function to the MARKS column. This will give us a result like below:
NAME  TOTAL
----  -----
FOO   150
BAR   150
ZIL   165
In the query from your post, only title is part of the GROUP BY clause. All the other columns (id, pubdate, channel_title, description, link, vertinimas) are neither part of the GROUP BY clause nor passed as a parameter to any aggregate function.
If you want to find / exclude / delete duplicate rows, you can check out this blog post, which explains it pretty well. Here is the link to find and delete duplicate records!
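One common way to keep a single row per title without GROUP BY on the outer query is to filter on the lowest id for each title. The sketch below is an illustration only: it runs in SQLite rather than SQL Server, uses a cut-down table1 with just id, title and pubdate, omits the statusid and channel_title filters from the question, and the sample rows and the function name first_row_per_title are invented.

```python
import sqlite3

def first_row_per_title(rows):
    """Keep only the lowest-id row for each title, newest pubdate first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, title TEXT, pubdate TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, title, pubdate
        FROM table1
        -- the subquery groups by title only, so it satisfies the GROUP BY rule
        WHERE id IN (SELECT MIN(id) FROM table1 GROUP BY title)
        ORDER BY pubdate DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical data: "news" appears twice, "sport" once.
sample = [(1, "news", "2020-01-01"), (2, "news", "2020-01-02"), (3, "sport", "2020-01-03")]
print(first_row_per_title(sample))
# [(3, 'sport', '2020-01-03'), (1, 'news', '2020-01-01')]
```

If the original filters are needed, they would have to be applied inside the subquery as well, so that the chosen id always belongs to a row that passes them.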
Please take this as an example where my primary table is
ID Name
-- -------
1 Alpha
2 Beta
3 Beta
4 Beta
5 Charlie
6 Charlie
There is duplication in the Name column. The resulting table after grouping by name, with a count column, is:
Name Count
------- -----
Alpha 1
Beta 3
Charlie 2
SUM 6
Here SUM is shown as a separate row that totals the Count column. I am trying to get that sum of all the counts from the primary table, but as a separate query rather than as an extra row.
My table has 2 fields, Roles and User_Id.
I have already tried the query below:
select orl.role ,
SUM (orl.role) as "No of Users"
from org_user_roles orl
group by orl.role
I think this is a string column with numerical values, so you need to cast it to int before performing the sum() operation:
select orl.role,
sum(orl.user_id::int) as "No of Users"
from org_user_roles orl
group by orl.role
If you want to count rows (users) for each role, use the COUNT aggregate function - not SUM:
select "role", count(*) as "No of Users" from org_user_roles group by "role";
To get the sum of these grouped counts - which is just the overall row count - use:
select count(*) as "Sum" from org_user_roles;
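Both queries can be tried on data shaped like the Alpha / Beta / Charlie example. This is a minimal sketch in SQLite via Python's sqlite3 module; the assumption that the names from the example are stored in the role column, and the helper name role_counts, are mine and not from the question.

```python
import sqlite3

def role_counts(rows):
    """Return ([(role, count), ...], overall_row_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE org_user_roles ("role" TEXT, user_id INTEGER)')
    conn.executemany("INSERT INTO org_user_roles VALUES (?, ?)", rows)
    # One row per role with the number of users in it.
    per_role = conn.execute(
        'SELECT "role", COUNT(*) FROM org_user_roles GROUP BY "role" ORDER BY "role"'
    ).fetchall()
    # The sum of the grouped counts is simply the total number of rows.
    total = conn.execute("SELECT COUNT(*) FROM org_user_roles").fetchone()[0]
    conn.close()
    return per_role, total

data = [("Alpha", 1), ("Beta", 2), ("Beta", 3), ("Beta", 4), ("Charlie", 5), ("Charlie", 6)]
print(role_counts(data))
# ([('Alpha', 1), ('Beta', 3), ('Charlie', 2)], 6)
```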
I would like to query my database to know which fraction/percentage of the elements of a table are larger/smaller than a given value.
For instance, let's say I have a table shopping_list with the following schema:
id integer
name text
price double precision
with contents:
id name price
1 banana 1
2 book 20
3 chicken 5
4 chocolate 3
I am now going to buy a new item with price 4, and I would like to know where this new item will be ranked in the shopping list. In this case the element will be greater than 50% of the elements.
I know I can run two queries and count the number of elements, e.g.:
-- returns = 4
SELECT COUNT(*)
FROM shopping_list;
-- returns = 2
SELECT COUNT(*)
FROM shopping_list
WHERE price > 4;
But I would like to do it with a single query to avoid post-processing the results.
If you just want them in a single query, use UNION:
SELECT COUNT(*), 'total'
FROM shopping_list
UNION
SELECT COUNT(*),'greater'
FROM shopping_list
WHERE price > 4;
The simplest way is to use avg():
SELECT AVG( (price > 4)::float)
FROM shopping_list;
One way to get both results is as follows:
select count(*) as total,
(select count(*) from shopping_list where price > 4) as greater
from shopping_list
It will get both results in a single row, with the names you specified. It does, however, involve a query within a query.
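The nested query can also be avoided with conditional aggregation, which computes the total, the matching count and the fraction in one pass over the table. The sketch below uses SQLite through Python's sqlite3 module rather than the database from the question; the CASE expressions are standard SQL, and the helper name rank_summary is hypothetical.

```python
import sqlite3

def rank_summary(threshold):
    """Return (total, greater, fraction_greater) for the sample shopping list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shopping_list (id INTEGER, name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO shopping_list VALUES (?, ?, ?)",
        [(1, "banana", 1), (2, "book", 20), (3, "chicken", 5), (4, "chocolate", 3)],
    )
    query = """
        SELECT COUNT(*) AS total,
               SUM(CASE WHEN price > ? THEN 1 ELSE 0 END) AS greater,
               AVG(CASE WHEN price > ? THEN 1.0 ELSE 0.0 END) AS fraction
        FROM shopping_list
    """
    row = conn.execute(query, (threshold, threshold)).fetchone()
    conn.close()
    return row

print(rank_summary(4))  # (4, 2, 0.5)
```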
I found the aggregate function PERCENT_RANK which does exactly what I wanted:
SELECT PERCENT_RANK(4) WITHIN GROUP (ORDER BY price)
FROM shopping_list;
-- returns 0.5
I'm trying to add a column which calculates percentages of different products in MS Access Query. Basically, this is the structure of the query that I'm trying to reach:
Product | Total | Percentage
--------+-------+-----------
Prod1   | 15    | 21.13%
Prod2   | 23    | 32.39%
Prod3   | 33    | 46.48%
Product | 71    | 100%
The formula I use for finding the percentage is ([Total Q of a Product]/[Totals of all Products])*100, but when I try to use the expression builder (since my SQL skills are basic) in MS Access to calculate it:
= [CountOfProducts] / Sum([CountOfProducts])
I receive the error message "Cannot have aggregate function in GROUP BY clause.. (and the expression goes here)". I also tried the option with two queries: one that calculates only the totals and another that uses the first one to calculate the percentages, but the result was the same.
I'll be grateful if someone can help me with this.
You can get all but the last row of your desired output with this query.
SELECT
y.Product,
y.Total,
Format((y.Total/sub.SumOfTotal),'#.##%') AS Percentage
FROM
YourTable AS y,
(
SELECT Sum(Total) AS SumOfTotal
FROM YourTable
) AS sub;
Since that query does not include a JOIN or WHERE condition, it returns a cross join between the table and the single row of the subquery.
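The cross join against a single-row subquery can be tried outside Access as well. The following sketch runs the same shape of query in SQLite via Python's sqlite3 module. Access's Format function is not available there, so the percentage is formatted in Python instead; the helper name product_percentages is made up, and the sample totals are taken from the table in the question.

```python
import sqlite3

def product_percentages(rows):
    """Return (product, total, 'NN.NN%') using a cross join with the grand total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Product TEXT, Total INTEGER)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)
    query = """
        SELECT y.Product, y.Total, 100.0 * y.Total / sub.SumOfTotal AS Pct
        FROM YourTable AS y,
             (SELECT SUM(Total) AS SumOfTotal FROM YourTable) AS sub
        ORDER BY y.Product
    """
    # sub has exactly one row, so every product row is paired with the grand total.
    result = [(p, t, f"{pct:.2f}%") for p, t, pct in conn.execute(query)]
    conn.close()
    return result

print(product_percentages([("Prod1", 15), ("Prod2", 23), ("Prod3", 33)]))
# [('Prod1', 15, '21.13%'), ('Prod2', 23, '32.39%'), ('Prod3', 33, '46.48%')]
```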
If you need the last row from your question example, you can UNION the query with another which returns the fabricated row you want. In this example, I used a custom Dual table which is designed to always contain one and only one row. But you could substitute another table or query which returns a single row.
SELECT
y.Product,
y.Total,
Format((y.Total/sub.SumOfTotal),'#.##%') AS Percentage
FROM
YourTable AS y,
(
SELECT Sum(Total) AS SumOfTotal
FROM YourTable
) AS sub
UNION ALL
SELECT
'Product',
DSum('Total', 'YourTable'),
'100%'
FROM Dual;
I have a table with columns id, title, relation_key. I want to get count(*) as well as the title for the corresponding relation_key column.
My table contains the following data:
id title relation_key
55 title1111 10
56 title2222 10
57 MytitleVVV 20
58 MytitlleXXX 20
I tried:
select title,count(*) from table where relation_key=10 group by title
But it's returning only 1 row. I want both title records for relation_key=10.
You probably want something along these lines:
select title, count(*) over (partition by relation_key)
from table
where relation_key = 10
The result of this would yield:
title | count
----------+------
title1111 | 2
title2222 | 2
Note that you cannot select fields that are not part of the GROUP BY clause in Oracle (as in most other databases).
As a general rule of thumb, you should avoid grouping if you don't really want to group data, but just use aggregate functions such as count(*). Most of Oracle's aggregate functions can be transformed into window functions by adding an over() clause, removing the need for a GROUP BY clause.
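The window-function form can be tried on the sample rows without an Oracle instance. This sketch uses SQLite through Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, since older versions do not support OVER. The table is named titles here because table is a reserved word, and the helper name titles_with_count is hypothetical.

```python
import sqlite3

def titles_with_count(relation_key):
    """Return every title for relation_key along with the size of its group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles (id INTEGER, title TEXT, relation_key INTEGER)")
    conn.executemany(
        "INSERT INTO titles VALUES (?, ?, ?)",
        [(55, "title1111", 10), (56, "title2222", 10),
         (57, "MytitleVVV", 20), (58, "MytitlleXXX", 20)],
    )
    query = """
        SELECT title, COUNT(*) OVER (PARTITION BY relation_key) AS cnt
        FROM titles
        WHERE relation_key = ?
        ORDER BY title
    """
    # No GROUP BY: each input row is kept and annotated with its partition's count.
    rows = conn.execute(query, (relation_key,)).fetchall()
    conn.close()
    return rows

print(titles_with_count(10))  # [('title1111', 2), ('title2222', 2)]
```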
If you are getting an error, then please try the following:
select title,count(*) from table where relation_key=10 group by title,relation_key