Basic SQL question for Group By (in Netezza)

This may sound stupid, but I'm having trouble with a GROUP BY in Netezza that I can't seem to figure out. I'm doing a simple sum and trying to group the results, which is where I run into issues. The current table looks like:
id date daily_count
---------------------------
1 4/1/20 2
1 4/2/20 1
2 4/1/20 3
2 4/1/20 2
2 4/3/20 1
I want it to look like:
id date daily_count
---------------------------
1 4/1/20 2
1 4/2/20 1
2 4/1/20 5
2 4/3/20 1
My select statement is:
select id, date, sum(count) over (partition by date, id) as daily_count
If I add a GROUP BY clause that includes the sum field (group by id, date, daily_count), I get a warning saying:
Windowed aggregates not allowed in a GROUP BY clause
But if I leave the sum field out of the GROUP BY clause (group by id, date), I get a warning saying:
Attribute count must be GROUPed or used in an aggregate function
count is the column that I'm summing, so if I group by it, the sum won't come out right.
Does this mean the grouping has to happen outside of this query, in a CTE or subquery? I'd appreciate advice on what exactly is happening and what the best course of action is.

You want simple aggregation rather than window functions:
select id, date, sum(daily_count) daily_count
from mytable
group by id, date

You seem to just want aggregation:
select id, date, sum(count) as daily_count
from t
group by id, date;
I'm not sure why you are trying to use a window function here.
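To make the difference between the two forms concrete, here is a minimal sketch that runs both against the question's sample data. It uses Python's sqlite3 module as a stand-in for Netezza (an assumption made only so the example is runnable); the table name t follows the second answer.

```python
import sqlite3

# In-memory SQLite table mirroring the question's sample data.
# SQLite is only a stand-in for Netezza here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "4/1/20", 2), (1, "4/2/20", 1), (2, "4/1/20", 3),
     (2, "4/1/20", 2), (2, "4/3/20", 1)],
)

# GROUP BY collapses each (id, date) pair into a single row.
grouped = conn.execute(
    "SELECT id, date, SUM(count) AS daily_count "
    "FROM t GROUP BY id, date ORDER BY id, date"
).fetchall()

# The window form keeps every input row and repeats the sum on each one.
windowed = conn.execute(
    "SELECT id, date, SUM(count) OVER (PARTITION BY date, id) AS daily_count "
    "FROM t ORDER BY id, date"
).fetchall()

print(grouped)   # 4 rows, one per (id, date)
print(windowed)  # 5 rows, the (2, '4/1/20') sum appears twice
```

The window version returns one row per input row, which is why it needed an extra deduplication step that plain aggregation avoids.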

Select all columns but also by distinct column and oldest date [duplicate]

I need to select rows that are distinct by number and are the oldest date.
I saw this post: SQL Select Distinct column and latest date
But it doesn't work in my case, because I need to select all the columns, not just the 2 that are used in the clause.
So I get this error:
it is not contained in either an aggregate function or the GROUP BY clause
Is there a way to choose all the columns, and in case there are rows with same date, consider the older one as the one with the lower id?
sample data:
id name number date
1 foo 1111 06-11-2022
2 bar 2222 01-12-2022
3 baz 3333 12-30-2022
4 foobar 1111 02-01-2022
this is the query I tried:
SELECT id, name, number, MIN(date) as date
FROM my_table
GROUP BY number
I'm using Microsoft SQL Server.
When you use GROUP BY, the SELECT list can contain only the grouped columns and aggregate expressions. You can get the expected output in several ways; here is one using the ROW_NUMBER() window function:
select T.*
from (select *,
row_number() over (partition by number order by [date], id) as sn
from my_table
) T
where sn = 1;
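As a rough check of the ROW_NUMBER() approach, the sketch below runs the same query shape against the sample rows. It uses Python's sqlite3 module rather than SQL Server, and it stores the dates as ISO strings (an assumption, reading the sample as MM-DD-YYYY) so that text ordering matches chronological ordering.

```python
import sqlite3

# SQLite stands in for SQL Server; dates are rewritten as ISO text
# (assumed MM-DD-YYYY in the question) so ORDER BY sorts chronologically.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, name TEXT, number INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, "foo", 1111, "2022-06-11"),
     (2, "bar", 2222, "2022-01-12"),
     (3, "baz", 3333, "2022-12-30"),
     (4, "foobar", 1111, "2022-02-01")],
)

# Number the rows within each "number" group, oldest date first,
# breaking date ties by the lower id, then keep only the first row.
rows = conn.execute("""
    SELECT id, name, number, date
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY number ORDER BY date, id) AS sn
          FROM my_table) AS t
    WHERE sn = 1
    ORDER BY number
""").fetchall()

print(rows)
# [(4, 'foobar', 1111, '2022-02-01'), (2, 'bar', 2222, '2022-01-12'),
#  (3, 'baz', 3333, '2022-12-30')]
```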

sql: query to find max count with extra columns as well

Input table:events
month user
2020-11 user_1
2020-11 user_5
2020-11 user_3
2020-12 user_2
2020-10 user_4
2020-09 user_6
GOAL
I want the month with the highest count of distinct users.
My final result needs two columns: month and max_count.
I need output similar to this
month max_count
2020-11 3
I tried the following approach.
Approach1:
select max(cnt) max_count
from
(
select month,
count(distinct user) as cnt
from events
group by 1
)
If I follow this approach, it gives me only max_count, but I need the month column as well.
I know we can use something like ORDER BY and LIMIT to get the result, but I don't want that hacky way.
Can anyone suggest a solution for this?
Use a window function:
select month, cnt
from
(
select month,
count(distinct "user") as cnt,
dense_rank() over (order by count(distinct "user") desc) as rnk
from events
group by month
) t
where rnk = 1;
user is a reserved keyword in SQL and should be quoted (or better: find a different name)
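The ranking idea can be tried out with the sketch below, which uses Python's sqlite3 module as a stand-in database (an assumption). For portability it splits the query into two CTEs, counts and ranked (both hypothetical names), instead of nesting the aggregate inside the window's ORDER BY; the result should be the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "user" is quoted because it is reserved in several databases.
conn.execute('CREATE TABLE events (month TEXT, "user" TEXT)')
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2020-11", "user_1"), ("2020-11", "user_5"), ("2020-11", "user_3"),
     ("2020-12", "user_2"), ("2020-10", "user_4"), ("2020-09", "user_6")],
)

rows = conn.execute("""
    WITH counts AS (              -- distinct users per month
        SELECT month, COUNT(DISTINCT "user") AS cnt
        FROM events
        GROUP BY month
    ),
    ranked AS (                   -- rank months by that count, highest first
        SELECT month, cnt, DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
        FROM counts
    )
    SELECT month, cnt AS max_count
    FROM ranked
    WHERE rnk = 1
""").fetchall()

print(rows)  # [('2020-11', 3)]
```

Because DENSE_RANK() gives tied months the same rank, this form returns every month that shares the top count, which a plain "first row only" limit would not.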
If I understand correctly, you can use order by and some clause to limit the results:
select month, count(distinct "user")
from events
group by month
order by count(distinct "user") desc
fetch first 1 row only;
Note that not all databases support the standard fetch clause. You might want limit or select top (1) or something similar.

How to get grouping of rows in SQL

I have a table like this:
id name
1 washing
1 cooking
1 cleaning
2 washing
2 cooking
3 cleaning
and I would like to have a following grouping
id name count
1 washing,cooking,cleaning 3
2 washing,cooking 2
3 cleaning 1
I have tried to group by ID but can only show count after grouping by
SELECT id,
COUNT(name)
FROM WORK
GROUP BY id
But this will only give the count and not the actual combination of names.
I am new to SQL. I know it has to be relational but there must be some way.
Thanks in advance!
in postgresql you can use array_agg
SELECT id, array_agg(name), COUNT(*)
FROM WORK
GROUP BY id
in mysql you can use group_concat
SELECT id, group_concat(name), COUNT(*)
FROM WORK
GROUP BY id
or for redshift
SELECT id, listagg(name), COUNT(*)
FROM WORK
GROUP BY id
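For a quick runnable illustration of string aggregation, the sketch below uses SQLite's GROUP_CONCAT through Python's sqlite3 module. SQLite is an assumption here, chosen only because it ships with Python; the other databases use the functions named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO work VALUES (?, ?)",
    [(1, "washing"), (1, "cooking"), (1, "cleaning"),
     (2, "washing"), (2, "cooking"), (3, "cleaning")],
)

# GROUP_CONCAT joins the names of each group with commas;
# COUNT(*) reports how many rows went into the group.
rows = conn.execute("""
    SELECT id, GROUP_CONCAT(name) AS names, COUNT(*) AS cnt
    FROM work
    GROUP BY id
    ORDER BY id
""").fetchall()

for row_id, names, cnt in rows:
    print(row_id, names, cnt)
```

SQLite does not guarantee the order of the concatenated names, so treat the order within each list as unspecified unless the database offers an explicit ordering option.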

PostgreSQL using sum in where clause

I have a table with a numeric column named 'capacity'. I want to select the first rows whose total capacity is no greater than X, something like this query:
select * from table where sum(capacity )<X
But I know I cannot use aggregate functions in the WHERE clause. So what other ways exist to solve this problem?
Here is some sample data
id| capacity
1 | 12
2 | 13.5
3 | 15
I want to list the rows, in order of id, whose running sum is less than 26, so a query like this:
select * from table where sum(capacity )<26 order by id
and it must give me
id| capacity
1 | 12
2 | 13.5
because 12+13.5<26
A bit late to the party, but for future reference, the following should work for a similar problem as the OP's:
SELECT id, sum(capacity)
FROM table
GROUP BY id
HAVING sum(capacity) < 26
ORDER by id ASC;
Use the PostgreSQL docs for reference to aggregate functions: https://www.postgresql.org/docs/9.1/tutorial-agg.html
Use Having clause
select * from table order by id having sum(capacity)<X
You can use the window variant of sum to produce a cumulative sum, and then use it in the where clause. Note that window functions can't be placed directly in the where clause, so you'd need a subquery:
SELECT id, capacity
FROM (SELECT id, capacity, SUM(capacity) OVER (ORDER BY id ASC) AS cum_sum
FROM mytable) t
WHERE cum_sum < 26
ORDER BY id ASC;
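The cumulative-sum query can be checked against the question's sample data with the sketch below. It runs on SQLite through Python's sqlite3 module instead of PostgreSQL (an assumption for the sake of a self-contained example); the table name mytable follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, capacity REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 12), (2, 13.5), (3, 15)],
)

# The inner query computes a running total ordered by id
# (12, 25.5, 40.5); the outer query filters on that total.
rows = conn.execute("""
    SELECT id, capacity
    FROM (SELECT id, capacity,
                 SUM(capacity) OVER (ORDER BY id ASC) AS cum_sum
          FROM mytable) AS t
    WHERE cum_sum < 26
    ORDER BY id ASC
""").fetchall()

print(rows)  # [(1, 12.0), (2, 13.5)]
```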

How to sum two columns in sql without group by

I have columns such as pagecount, convertedpages and changedpages in a table along with many other columns.
pagecount is the sum of convertedpages and changedpages.
I need to select all rows along with pagecount, and I can't group them. Is there any way to do it?
This select is part of a view, so can I use another SQL statement to bring in just the sum and then somehow make it part of the main query?
Thank you.
SELECT
*,
(ConvertedPages + ChangedPages) as PageCount
FROM Table
If I'm understanding your question correctly, while I'm not sure why you can't use group by, another option would be to use a correlated subquery:
select distinct id,
(select sum(field) from yourtable y2 where y.id = y2.id) summedresult
from yourtable y
This assumes you have data such as:
id | field
1 | 10
1 | 15
2 | 10
And would be equivalent to:
select id, sum(field)
from yourtable
group by id
Not 100% on what you're after here, but if you want a total across rows without grouping, you can use OVER() with an aggregate in SQL Server:
SELECT *, SUM(convertedpages) OVER() AS convertedpages
, SUM(changedpages) OVER() AS changedpages
, SUM(changedpages + convertedpages) OVER() as PageCount
FROM Table
This repeats the total on every row. You can use PARTITION BY inside OVER() if you'd like the aggregate to be grouped by some fields while still displaying the full detail of all rows.
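The sketch below puts the row-level sum, the OVER() total, and a PARTITION BY total side by side. It runs on SQLite through Python's sqlite3 module rather than SQL Server, and the pages table, its category column, and the sample values are all hypothetical, introduced only to show the partitioned case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "category" exists only to demonstrate PARTITION BY.
conn.execute(
    "CREATE TABLE pages (id INTEGER, category TEXT, "
    "convertedpages INTEGER, changedpages INTEGER)"
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [(1, "a", 3, 2), (2, "a", 1, 1), (3, "b", 5, 0)],
)

rows = conn.execute("""
    SELECT id,
           category,
           convertedpages + changedpages AS pagecount,        -- per-row sum
           SUM(convertedpages + changedpages) OVER ()
               AS total_pages,                                -- whole table
           SUM(convertedpages + changedpages)
               OVER (PARTITION BY category) AS category_pages -- per category
    FROM pages
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, 'a', 5, 12, 7)
# (2, 'a', 2, 12, 7)
# (3, 'b', 5, 12, 5)
```

Every input row survives in the output, which is the point of using window aggregates when the rows cannot be grouped.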