I tried to write a SELECT query that contains both a WHERE condition and a SUM aggregation, but it shows an error when executed.
Below is a sample table:
| sensorid | timestamp  | reading |
|----------|------------|---------|
| 1        | 1604192522 | 10      |
| 1        | 1604192525 | 15      |
| 2        | 1605783723 | 8.1     |
My query is
select date_trunc('day', v.timestamp) as day,sum(reading) from sensor v(timestamp,sensorid) group by (DAY) having sensorid=1;
While executing it, the following error occurred:
Cannot use column sensorid outside of an Aggregation in HAVING clause. Only GROUP BY keys allowed here.
If you apply GROUP BY, you lose the individual values of all other columns.
Probably you want to either filter by values -> use WHERE:
select date_trunc('day', v.timestamp) as day,sum(reading) from sensor v(timestamp,sensorid) where sensorid=1 group by (DAY) ;
or filter by an aggregate -> keep HAVING but use an aggregate function:
select date_trunc('day', v.timestamp) as day,sum(reading) from sensor v(timestamp,sensorid) group by (DAY) having min(sensorid)=1;
It's not clear what your intention is; post the expected output if neither variant is what you meant.
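For a quick check, the WHERE variant can be run end-to-end in SQLite (an illustration only - SQLite's date(..., 'unixepoch') stands in for date_trunc('day', ...), and the table and values come from the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sensor (sensorid INTEGER, timestamp INTEGER, reading REAL)")
con.executemany("INSERT INTO sensor VALUES (?, ?, ?)",
                [(1, 1604192522, 10), (1, 1604192525, 15), (2, 1605783723, 8.1)])

# WHERE filters rows before grouping; date(..., 'unixepoch') truncates the
# unix timestamp to a calendar day, playing the role of date_trunc('day', ...)
rows = con.execute("""
    SELECT date(timestamp, 'unixepoch') AS day, SUM(reading)
    FROM sensor
    WHERE sensorid = 1
    GROUP BY day
""").fetchall()
print(rows)  # [('2020-11-01', 25.0)]
```

Both sensor-1 readings fall on the same day, so they collapse into one group and sum to 25.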
I have a table as follows with user visits by day:
| date | user_id |
|:-------- |:-------- |
| 01/31/23 | a |
| 01/31/23 | a |
| 01/31/23 | b |
| 01/30/23 | c |
| 01/30/23 | a |
| 01/29/23 | c |
| 01/28/23 | d |
| 01/28/23 | e |
| 01/01/23 | a |
| 12/31/22 | c |
I am looking to get a running total of unique user_id values for the last 30 days. Here is the expected output:
| date | distinct_users|
|:-------- |:-------- |
| 01/31/23 | 5 |
| 01/30/23 | 4 |
.
.
.
Here is the query I tried -
SELECT date
, SUM(COUNT(DISTINCT user_id)) over (order by date rows between 30 preceding and current row) AS unique_users
FROM mytable
GROUP BY date
ORDER BY date DESC
The problem I am running into is that this query is not counting unique user_id values - for instance, the result I get for 01/31/23 is 9 instead of 5, because it counts user_id 'a' every time it occurs.
Thank you, appreciate your help!
Not the most performant approach, but you could use a correlated subquery to find the distinct count of users over a window of the past 30 days:
SELECT DISTINCT
    date,
    (SELECT COUNT(DISTINCT t2.user_id)
     FROM mytable t2
     WHERE t2.date BETWEEN t1.date - INTERVAL '30 day' AND t1.date) AS distinct_users
FROM mytable t1
ORDER BY date;
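A runnable sketch of this correlated subquery using SQLite and the sample data (assumptions: ISO YYYY-MM-DD dates so they compare correctly as strings, and SQLite's date(..., '-30 day') in place of INTERVAL arithmetic):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (date TEXT, user_id TEXT)")
con.executemany("INSERT INTO mytable VALUES (?, ?)", [
    ("2023-01-31", "a"), ("2023-01-31", "a"), ("2023-01-31", "b"),
    ("2023-01-30", "c"), ("2023-01-30", "a"), ("2023-01-29", "c"),
    ("2023-01-28", "d"), ("2023-01-28", "e"), ("2023-01-01", "a"),
    ("2022-12-31", "c"),
])

# For each date, the subquery re-scans the table and counts distinct users
# inside that row's trailing 30-day window
rows = con.execute("""
    SELECT DISTINCT date,
           (SELECT COUNT(DISTINCT t2.user_id)
            FROM mytable t2
            WHERE t2.date BETWEEN date(t1.date, '-30 day') AND t1.date) AS distinct_users
    FROM mytable t1
    ORDER BY date DESC
""").fetchall()
print(rows[0])  # ('2023-01-31', 5)
```

The top two rows match the expected output: 5 distinct users for 01/31/23 and 4 for 01/30/23.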
There are a few things going on here. First, window functions run after GROUP BY and aggregation, so COUNT(DISTINCT user_id) gives the count of user_ids for each date and only then does the window function run. Also, a window function set up like this works over the past 30 rows, not 30 days, so you would need to fill in missing dates to use one.
As to how to do this - I can only think of the "expand the data so each date and id has a row" method. This requires a CTE to generate the last 2 years of dates plus 30 days, so that the look-back window works for the first dates. Then window over the past 30 days for each user_id and date to see which rows have an occurrence of that user_id within the past 30 days, setting the value to NULL if no uses of the user_id are present within the window. Then count the non-NULL user_ids, grouping by just date, to get the number of unique user_ids for that date.
This means expanding the data significantly, but I see no other way to get truly unique user_ids over the past 30 days. I can help code this up if you need, but it will look something like:
WITH RECURSIVE CTE to generate the needed dates,
CTE to cross join these dates with a distinct set of all the user_ids in use for the past 2 years,
CTE to join the date/user_id set with the table of real data for the past 2 years and 30 days and window back counting non-NULL user_ids, partitioned by date and user_id, ordered by date, setting any zero counts to NULL with a DECODE() or CASE statement,
SELECT, grouping by just date, counting the user_ids by date;
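The first step of that outline - generating the needed dates with a recursive CTE - can be sketched like this (a minimal SQLite example spanning one month rather than the full 2 years + 30 days):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Recursive CTE producing one row per calendar day -- the "fill in missing
# dates" step; the range below is illustrative only
row = con.execute("""
    WITH RECURSIVE dates(d) AS (
        SELECT '2023-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM dates WHERE d < '2023-01-31'
    )
    SELECT COUNT(*) FROM dates
""").fetchone()
print(row)  # (31,)
```

The later CTEs would cross join this date spine with the distinct user_ids and then left join the real data onto it.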
I'm sure the request is rather straightforward, but I'm stuck. I'd like to take the first table below and turn it into the second table by summing up Incremental_Inventory by Year.
+-------------+-----------+----------------------+-----+
|Warehouse_ID |Date |Incremental_Inventory |Year |
+-------------+-----------+----------------------+-----+
| 1|03/01/2010 |125 |2010 |
| 1|08/01/2010 |025 |2010 |
| 1|02/01/2011 |150 |2011 |
| 1|03/01/2011 |200 |2011 |
| 2|03/01/2012 |125 |2012 |
| 2|03/01/2012 |025 |2012 |
+-------------+-----------+----------------------+-----+
to
+-------------+-----------+---------------------------+
|Warehouse_ID |Date |Cumulative_Yearly_Inventory|
+-------------+-----------+---------------------------+
| 1|03/01/2010 |125 |
| 1|08/01/2010 |150 |
| 1|02/01/2011 |150 |
| 1|03/01/2011 |350 |
| 2|03/01/2012 |125 |
| 2|03/01/2012 |150 |
+-------------+-----------+---------------------------+
If your DBMS, which you haven't told us, supports window functions you could simply do something like:
SELECT warehouse_id,
date,
sum(incremental_inventory) OVER (PARTITION BY warehouse_id,
year(date)
ORDER BY date) cumulative_yearly_inventory
FROM elbat
ORDER BY date;
year() may need to be replaced by whatever means your DBMS provides to extract the year from a date.
If it doesn't support window functions, you would have to use a subquery and aggregation:
SELECT t1.warehouse_id,
t1.date,
(SELECT sum(t2.incremental_inventory)
FROM elbat t2
WHERE t2.warehouse_id = t1.warehouse_id
AND year(t2.date) = year(t1.date)
AND t2.date <= t1.date) cumulative_yearly_inventory
FROM elbat t1
ORDER BY t1.date;
However, if there are two equal dates, this will print the same sum for both of them. One would need another, distinct column to sort that out and as far as I can see you don't have such a column in the table.
I'm not sure if you want the sum over all warehouses or only per warehouse. If you don't want the sums split by warehouses but one sum for all warehouses together, remove the respective expressions from the PARTITION BY or inner WHERE clause.
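A runnable sketch of the window-function version in SQLite (assumptions: ISO dates, strftime('%Y', ...) in place of year(), and an explicit ROWS frame so the two identical dates for warehouse 2 accumulate one row at a time instead of as peers):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE elbat (warehouse_id INTEGER, date TEXT, incremental_inventory REAL)")
con.executemany("INSERT INTO elbat VALUES (?, ?, ?)", [
    (1, "2010-03-01", 125), (1, "2010-08-01", 25),
    (1, "2011-02-01", 150), (1, "2011-03-01", 200),
    (2, "2012-03-01", 125), (2, "2012-03-01", 25),
])

# strftime('%Y', ...) extracts the year; the ROWS frame keeps equal dates
# from being summed together as peers (the default RANGE frame would)
rows = con.execute("""
    SELECT warehouse_id, date,
           SUM(incremental_inventory) OVER (
               PARTITION BY warehouse_id, strftime('%Y', date)
               ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumulative_yearly_inventory
    FROM elbat
    ORDER BY warehouse_id, date
""").fetchall()
print(rows)
```

The running sum resets at each warehouse/year boundary, matching the expected output.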
If you have SAS/ETS then the time series tasks will do this for you. Assuming not, here's a data step solution.
Use RETAIN to hold value across rows
Use BY to identify the first record for each year
data want;
set have;
by year;
retain cum_total;
if first.year then cum_total=incremental_inventory;
else cum_total+incremental_inventory;
run;
I've got a simple table that contains films:
CREATE TABLE public.films
(
id integer NOT NULL,
title character varying(255),
release_year integer
)
I also have a query that calculates the number of films by year, the total count of all films (using ROLLUP), and a window function that adds the total number of films to each row:
SELECT
release_year,
COUNT(*),
SUM(COUNT(*) FILTER (WHERE release_year IS NOT NULL)) OVER ()
FROM films
GROUP BY ROLLUP(release_year)
I added the FILTER (WHERE release_year IS NOT NULL) part because I wanted to ignore the aggregated row produced by ROLLUP. Surprisingly this filtering doesn't work:
| release_year | count | sum |
|--------------|-------|-----|
| [null]       | 225   | 450 |
| 2014         | 57    | 450 | <--- sum should be 225 everywhere
| 2015         | 53    | 450 |
| 2016         | 57    | 450 |
| 2017         | 58    | 450 |
I know some other possible ways to solve this, like moving the window function to an outside query or partitioning by release_year IS NOT NULL, but I'm very curious why this particular case doesn't work as I expected. What do I miss?
I use Postgres 10.
The FILTER doesn't work over the ROLLUP because it is evaluated while the ROLLUP is being computed.
If you remove the SUM you'll see:
SELECT
-- Adding window function row_number for test:
row_number() OVER() as rn,
release_year,
COUNT(*),
-- Removing SUM and OVER for test:
COUNT(*) FILTER (WHERE release_year IS NOT NULL)
FROM films
GROUP BY ROLLUP(release_year)
This outputs something like:
 rn | release_year | count | count
----+--------------+-------+-------
1 | [null] | 225 | 225 |
2 | 2014 | 57 | 57 |
3 | 2015 | 53 | 53 |
4 | 2016 | 57 | 57 |
5 | 2017 | 58 | 58 |
Since FILTER works on the ROLLUP and not on the 'final' table, it does not find any NULL release_year, and because of the ROLLUP it sums all the release_years.
Since window functions 'render' over the 'final' table, when you add the SUM() OVER() it sums everything in the column; see the rn column - it counts all rows because it 'renders' after the ROLLUP.
EDIT (Furthermore):
There is an order in which postgresql computes (I say renders) a query. Generally it starts with the WHERE clause, then GROUP BY, then aggregation, and window functions always run at the end.
If you run EXPLAIN you can see it more clearly:
QUERY PLAN
------------------------------------------------------------------------------
WindowAgg (cost=XXX.XX..XXX.XX rows=XXX width=XX)
-> GroupAggregate (cost=XXX.XX..XXX.XX rows=XXX width=XX)
Group Key: release_year
Group Key: ()
-> Sort (cost=XXX.XX..XXX.XX rows=XXXX width=X)
Sort Key: release_year
-> Seq Scan on films (cost=X.X..XX.XX rows=XXXX width=X)
First, it does the WHERE clause, but since there is no WHERE on the query, it gets all the rows from table films (Scan on films).
Second, it does the group and orders by release_year since it is in the GROUP BY ROLLUP clause (Sort).
Third, it runs the aggregation function COUNT over the grouped data (GroupAggregate). Since FILTER is just a substitution of CASE WHEN it runs at this point, over the grouped data. More info about FILTER here
Lastly, it runs the window functions over the aggregation (WindowAgg).
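SQLite has no ROLLUP, but the timing point - FILTER is applied per group during aggregation, like a built-in CASE WHEN - can be seen with a plain GROUP BY (a sketch; here the NULL comes from an actual data row, so FILTER does see it, whereas a ROLLUP subtotal's NULL is synthesized after aggregation and is invisible to FILTER):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE films (id INTEGER, title TEXT, release_year INTEGER)")
con.executemany("INSERT INTO films VALUES (?, ?, ?)",
                [(1, "a", 2014), (2, "b", 2014), (3, "c", None)])

# FILTER runs during aggregation, per group, before any window function could
# see the result (requires SQLite >= 3.30 for the FILTER clause)
rows = con.execute("""
    SELECT release_year,
           COUNT(*) AS cnt,
           COUNT(*) FILTER (WHERE release_year IS NOT NULL) AS cnt_not_null
    FROM films
    GROUP BY release_year
    ORDER BY release_year IS NULL, release_year
""").fetchall()
print(rows)  # [(2014, 2, 2), (None, 1, 0)]
```

The NULL group's filtered count drops to 0 because its input rows really are NULL; the ROLLUP total row in the question is built from non-NULL input rows, which is why the FILTER there excluded nothing.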
Some Notes:
When I say 'over', I mean that internally postgresql has a kind of temporary table - this 'final' table I refer to above, actually a set of data - over which it runs all the functions or clauses you use in a query.
After running the window functions it returns a RESULT table, that is what you get. But in the middle, postgresql uses sets of data to get what you asked.
Conclusion
In that sense, your query doesn't return what you would like because the FILTER works over the grouped data, not over the set of data where there is a NULL release_year. Only window functions have access to this NULL, since they work on the last set of data before it is returned to you; and since window functions have limited uses, you would have to query over this RESULT table to get what you want.
Nevertheless, if you want to read more:
Reading a PostgreSQL EXPLAIN
Understanding Window Functions
Filter Clause in PostgreSQL
And more in the postgresql manual:
PostgreSQL Table Expressions (Group By)
PostgreSQL Aggregate Expressions
I have a little project with a SQL database which has a table with some columns.
Question: How do I create a view in SQL Server that counts how many duplicate values I have in a column and shows that number in the next column?
Below you can see the result I want to get.
|id|name|count|
|1 |tom | |
|2 |tom | |
|3 |tom | |
| | | 3 |
|4 |leo | |
| | | 1 |
A view is simply a SELECT statement with the words CREATE VIEW <name> AS before the SELECT. This allows, for example, one person (DBA) to maintain (create/alter) complex views, while another person (developer) only has the rights to select from them.
So, to use @Stidgeon's answer (below):
CREATE VIEW MyCounts
AS
SELECT name, COUNT(id) AS counts
FROM table
GROUP BY name
and later you can query
Select * from MyCounts where counts > 1 order by name
or whatever you need to do. Note that order by is not allowed in views in SQL SERVER.
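A self-contained sketch of the same view in SQLite (the table is named people here, since table itself is a reserved word):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (id INTEGER, name TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?)",
                [(1, "tom"), (2, "tom"), (3, "tom"), (4, "leo")])

# The view wraps the grouped SELECT; consumers just query it like a table
con.execute("""
    CREATE VIEW MyCounts AS
    SELECT name, COUNT(id) AS counts
    FROM people
    GROUP BY name
""")
rows = con.execute("SELECT * FROM MyCounts WHERE counts > 1 ORDER BY name").fetchall()
print(rows)  # [('tom', 3)]
```

The WHERE counts > 1 filter is applied to the view's output, so only the duplicated name survives.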
You can do what you want with grouping sets:
select id, name, count(*)
from t
group by grouping sets ((id, name), (name));
The group by on id, name is redundant; the value should always be "1". However, this allows the use of grouping sets, which is a convenient way to phrase the query.
Looks like you just want to count how many entries you have for each 'name', in which case you just need to do a simple COUNT query:
CREATE VIEW view_name AS
SELECT name, COUNT(id) AS counts
FROM table
GROUP BY name
The output in your case would be:
name counts
--------------
Tom 3
Leo 1
Both the following two statements produce an error in Postgres:
SELECT substring(start_time,1,8) AS date, count(*) as total from cdrs group by date;
SELECT substring(start_time,1,8) AS date, count(*) as total from cdrs group by substring(start_time,1,8);
The error is:
column "cdrs.start_time" must appear in the GROUP BY clause or be used
in an aggregate function
My reading of the postgres docs is that both SELECT and GROUP BY can use an expression:
postgres 8.3 SELECT
The start_time field is a string and has a date/time in form ccyymmddHHMMSS. In mySQL they both produce desired and expected results:
+----------+-------+
| date | total |
+----------+-------+
| 20091028 | 9 |
| 20091029 | 110 |
| 20091120 | 14 |
| 20091121 | 4 |
+----------+-------+
4 rows in set (0.00 sec)
I need to stick with Postgres (heroku). Any suggestions?
p.s. there is lots of other discussion around that talks about missing items in GROUP BY and why mySQL accepts this, why others don't ... strict adherence to SQL spec etc etc, but I think this is sufficiently different to 1062158/converting-mysql-select-to-postgresql and 1769361/postgresql-group-by-different-from-mysql to warrant a separate question.
You did something else that you didn't describe in the question, as both of your queries work just fine. Tested on 8.5 and 8.3.8:
# create table cdrs (start_time text);
CREATE TABLE
# insert into cdrs (start_time) values ('20090101121212'),('20090101131313'),('20090510040603');
INSERT 0 3
# SELECT substring(start_time,1,8) AS date, count(*) as total from cdrs group by date;
date | total
----------+-------
20090510 | 1
20090101 | 2
(2 rows)
# SELECT substring(start_time,1,8) AS date, count(*) as total from cdrs group by substring(start_time,1,8);
date | total
----------+-------
20090510 | 1
20090101 | 2
(2 rows)
Just to summarise, the error
column "cdrs.start_time" must appear in the GROUP BY clause or be used in an aggregate function
was caused (in this case) by an ORDER BY start_time clause. The full statement needed to be either:
SELECT substring(start_time,1,8) AS date, count(*) as total FROM cdrs GROUP BY substring(start_time,1,8) ORDER BY substring(start_time,1,8);
or
SELECT substring(start_time,1,8) AS date, count(*) as total FROM cdrs GROUP BY date ORDER BY date;
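The corrected statement can be sanity-checked in SQLite (substr in place of substring; note that SQLite, like MySQL, is lenient here and would not reproduce the original Postgres error):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cdrs (start_time TEXT)")
con.executemany("INSERT INTO cdrs VALUES (?)",
                [("20091028110000",), ("20091028120000",), ("20091029090000",)])

# Grouping and ordering by the same expression (via its alias) keeps the
# SELECT list consistent with the GROUP BY clause
rows = con.execute("""
    SELECT substr(start_time, 1, 8) AS date, COUNT(*) AS total
    FROM cdrs
    GROUP BY date
    ORDER BY date
""").fetchall()
print(rows)  # [('20091028', 2), ('20091029', 1)]
```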
Two simple things you might try:
1. Upgrade to postgres 8.4.1. Both queries Work Just Fine For Me(tm) under pg841.
2. Group by ordinal position, that is, GROUP BY 1 in this case.