Cumulative Sum with PostgreSQL using date truncation

I'm relatively new to using SQL in Apache Superset and I'm not sure where to look or how to solve my problem.
The short version of what I am trying to do is add a column with the cumulative sum of the number of users per month.
Here is my PostgreSQL query so far:
SELECT
DATE(DATE_TRUNC('month', crdate)) AS "Month",
COUNT(DISTINCT user_id) AS "COUNT_DISTINCT(user_id)"
FROM
datasource
WHERE
user_id IS NOT NULL
GROUP BY
DATE(DATE_TRUNC('month', create))
ORDER BY
"COUNT_DISTINCT(user_id)" DESC

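There is an error in your GROUP BY: the date column is wrong (create instead of crdate). Ordering by an output alias is actually allowed in PostgreSQL, but you can just as well order by the aggregate expression itself, so it should be like this: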
SELECT
DATE(DATE_TRUNC('month', crdate)) AS "Month",
COUNT(DISTINCT user_id) AS "COUNT_DISTINCT(user_id)"
FROM
datasource
WHERE
user_id IS NOT NULL
GROUP BY
DATE(DATE_TRUNC('month', crdate))
ORDER BY
COUNT(DISTINCT user_id) DESC

You can use your query as the basis for a window function:
CREATE TABLE datasource(crdate timestamp, user_id int);
WITH CTE AS (
SELECT
DATE_TRUNC('month',"crdate") as "Month",
COUNT(DISTINCT user_id) AS "COUNT_DISTINCT(user_id)"
FROM
datasource
WHERE
user_id IS NOT NULL
GROUP BY
DATE_TRUNC('month', "crdate")
)
SELECT "Month", SUM("COUNT_DISTINCT(user_id)") OVER (ORDER BY "Month") AS cumulative_sum
FROM CTE
ORDER BY "Month";
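To see the CTE plus window function run end to end, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions) and uses strftime('%Y-%m-01', ...) as a stand-in for PostgreSQL's DATE_TRUNC('month', ...); the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows for the datasource table: (crdate, user_id).
ROWS = [
    ("2023-01-05 10:00:00", 1),
    ("2023-01-20 12:30:00", 2),
    ("2023-01-21 08:00:00", 2),     # same user twice in January
    ("2023-02-02 09:15:00", 1),     # user 1 returns in February
    ("2023-02-14 16:45:00", 3),
    ("2023-03-01 00:00:01", None),  # removed by WHERE user_id IS NOT NULL
]

# strftime('%Y-%m-01', ...) stands in for PostgreSQL's DATE_TRUNC('month', ...).
QUERY = """
WITH cte AS (
    SELECT strftime('%Y-%m-01', crdate) AS month,
           COUNT(DISTINCT user_id)      AS distinct_users
    FROM datasource
    WHERE user_id IS NOT NULL
    GROUP BY strftime('%Y-%m-01', crdate)
)
SELECT month,
       distinct_users,
       SUM(distinct_users) OVER (ORDER BY month) AS cumulative_sum
FROM cte
ORDER BY month
"""


def monthly_cumulative(rows):
    """Return (month, distinct users that month, running total) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datasource (crdate TEXT, user_id INTEGER)")
    conn.executemany("INSERT INTO datasource VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_cumulative(ROWS):
        print(row)
```

With this data January and February each have 2 distinct users, so the running total reaches 4 although only 3 different users exist. A cumulative sum of monthly distinct counts counts a returning user once for every month they appear, which may or may not be what the chart should show.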

Related

Group by month and counting rows for current and all previous months

PostgreSQL 13
Assume a simplified table plans like the following. There is at least one row for every month, and sometimes multiple rows on the same day:
id | first_published_at
:---------- | :----------------------
12345678910 | 2022-10-01 03:58:55.118
abcd1234efg | 2022-10-03 03:42:55.118
jhsdf894hld | 2022-10-03 17:34:55.118
aslb83nfys5 | 2022-09-12 08:17:55.118
My simplified query:
SELECT TO_CHAR(plans.first_published_at, 'YYYY-MM') AS publication_date, COUNT(*)
FROM plans
WHERE plans.first_published_at IS NOT NULL
GROUP BY TO_CHAR(plans.first_published_at, 'YYYY-MM');
This gives me the following result:
publication_date | count
:--------------- | ----:
2022-10 | 3
2022-09 | 1
But the result I would need for October is 4.
For every month, the count should be an aggregation of the current month and ALL previous months. I would appreciate any insight on how to approach this.
I would use your query as a CTE and run a select that uses cumulative sum as a window function.
with t as
(
SELECT TO_CHAR(plans.first_published_at, 'YYYY-MM') AS publication_date,
COUNT(*) AS cnt
FROM plans
WHERE plans.first_published_at IS NOT NULL
GROUP BY publication_date
)
select publication_date,
sum(cnt) over (order by publication_date) as "count"
from t
order by publication_date desc;
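As a quick check against the question's own sample rows, the same CTE and running sum can be run in SQLite through Python's sqlite3 module. This is a sketch that assumes SQLite 3.25+ for window functions and uses strftime('%Y-%m', ...) in place of PostgreSQL's TO_CHAR(..., 'YYYY-MM').

```python
import sqlite3

# The four sample rows from the question's plans table.
PLANS = [
    ("12345678910", "2022-10-01 03:58:55.118"),
    ("abcd1234efg", "2022-10-03 03:42:55.118"),
    ("jhsdf894hld", "2022-10-03 17:34:55.118"),
    ("aslb83nfys5", "2022-09-12 08:17:55.118"),
]

# strftime('%Y-%m', ...) stands in for PostgreSQL's TO_CHAR(..., 'YYYY-MM').
QUERY = """
WITH t AS (
    SELECT strftime('%Y-%m', first_published_at) AS publication_date,
           COUNT(*) AS cnt
    FROM plans
    WHERE first_published_at IS NOT NULL
    GROUP BY strftime('%Y-%m', first_published_at)
)
SELECT publication_date,
       SUM(cnt) OVER (ORDER BY publication_date) AS running_count
FROM t
ORDER BY publication_date DESC
"""


def running_counts(rows):
    """Return (YYYY-MM, rows in that month and all earlier months)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plans (id TEXT, first_published_at TEXT)")
    conn.executemany("INSERT INTO plans VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(running_counts(PLANS))
```

October comes out as 4, as the question expects. Ordering the window by the text key only works because 'YYYY-MM' strings sort in chronological order; a key like 'MM-YYYY' would not.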

Looking to create a query in SQL that states

I am relatively new to SQL, and I'm looking to create a query that states how many records were created by users other than a certain "good" group of users (userids). If possible, the result should be grouped by month as well. Any suggestions? I have set out some basic logic below.
The table is called newcompanies.
SELECT COUNT(record_num), userid
FROM Newcompanies
WHERE userID <> (certain group of userIds)
GROUP BY Month
Will I need to create a second table to hold the group of "good" userids?
There are a few ways to do this. Without knowing your exact columns, this is only a rough sketch.
SELECT id,
DATEPART(MONTH, created_date) AS created_month,
COUNT(*)
FROM your_table
WHERE id NOT IN(
--hardcode the good userIDs here
)
GROUP BY
id,
DATEPART(MONTH, created_date)
Or you could keep your good IDs in a table and exclude those.
SELECT id,
DATEPART(MONTH, created_date) AS created_month,
COUNT(*)
FROM your_table
WHERE id NOT IN(
SELECT id
from your_good_id_table
)
GROUP BY
id,
DATEPART(MONTH, created_date)
-- If month is not a column in the table, you will need a function to extract the month from a date column. Which function depends on the database you are using; in MS SQL Server you can use MONTH(datefield).
SELECT COUNT(record_num), userid, Month
FROM Newcompanies
WHERE userID NOT IN (
Select UserID
from ExcludeTheseUserIDs
)
GROUP BY Month, userid
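A minimal sketch of the "exclusion table" approach, grouped by month only (the question's stated goal), run in SQLite via Python's sqlite3. The table good_users, its column, and all data are hypothetical, and strftime('%Y-%m', ...) stands in for the database-specific month function mentioned above; including the year keeps January 2023 separate from January 2024.

```python
import sqlite3

# Hypothetical data: (record_num, userid, created_date) and the "good" userids.
RECORDS = [
    (1, 10, "2023-01-03"),
    (2, 11, "2023-01-09"),
    (3, 12, "2023-01-15"),
    (4, 10, "2023-02-01"),
    (5, 13, "2023-02-07"),
]
GOOD_USERIDS = [10]

QUERY = """
SELECT strftime('%Y-%m', created_date) AS month,
       COUNT(record_num) AS records
FROM newcompanies
WHERE userid NOT IN (SELECT userid FROM good_users)
GROUP BY strftime('%Y-%m', created_date)
ORDER BY month
"""


def records_by_month(records, good_userids):
    """Count records per month, ignoring those created by good users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE newcompanies (record_num INTEGER, userid INTEGER, created_date TEXT)"
    )
    # NOT NULL matters: a single NULL in the NOT IN subquery makes the filter return no rows.
    conn.execute("CREATE TABLE good_users (userid INTEGER NOT NULL)")
    conn.executemany("INSERT INTO newcompanies VALUES (?, ?, ?)", records)
    conn.executemany("INSERT INTO good_users VALUES (?)", [(u,) for u in good_userids])
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(records_by_month(RECORDS, GOOD_USERIDS))
```

If the exclusion list can contain NULLs, NOT EXISTS is the safer filter, since NOT IN against a list containing NULL never evaluates to true.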

Unique values per time period

In my table trips, I have two columns: created_at and user_id.
My goal is to count unique user_ids per month with a query in Postgres. So far I have written this, but it returns an error:
SELECT user_id,
to_char(created_at, 'YYYY-MM') as t COUNT(*)
FROM (SELECT DISTINCT user_id
FROM trips) group by t;
How should I change this query?
The query is much simpler than that:
SELECT to_char(created_at, 'YYYY-MM') as yyyymm, COUNT(DISTINCT user_id)
FROM trips
GROUP BY yyyymm
ORDER BY yyyymm;
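To illustrate the difference between COUNT(*) and COUNT(DISTINCT user_id), here is a small sketch using Python's sqlite3, with strftime('%Y-%m', ...) standing in for to_char(created_at, 'YYYY-MM'). The trip rows are made up.

```python
import sqlite3

# Hypothetical trips rows: (created_at, user_id).
TRIPS = [
    ("2023-01-02 08:00:00", 1),
    ("2023-01-02 18:00:00", 1),  # second trip by user 1 in January
    ("2023-01-10 09:00:00", 2),
    ("2023-02-01 07:30:00", 1),
    ("2023-02-03 12:00:00", 3),
]

QUERY = """
SELECT strftime('%Y-%m', created_at) AS yyyymm,
       COUNT(DISTINCT user_id) AS unique_users
FROM trips
GROUP BY strftime('%Y-%m', created_at)
ORDER BY yyyymm
"""


def unique_users_per_month(rows):
    """Return (YYYY-MM, number of distinct users with a trip that month)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (created_at TEXT, user_id INTEGER)")
    conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(unique_users_per_month(TRIPS))
```

January has three trips but only two distinct users, so COUNT(*) would report 3 where COUNT(DISTINCT user_id) reports 2.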

SQL Group By for quarterly dates

transaction_date is in a date format.
What I'm actually trying to output is the COUNT DISTINCT of Unique_ID by quarter (i.e., how many times did a Unique_Id appear in a given quarter).
SELECT transaction_date ,
UNIQUE_ID,
FROM panel
WHERE (some criteria = 'x')
GROUP BY UNIQUE_ID
Try this:
SELECT datepart(quarter,transaction_date),
count(distinct UNIQUE_ID) as cnt
FROM panel
WHERE (some criteria = 'x')
GROUP BY datepart(quarter,transaction_date)
But count(distinct) may need a sort, which can be slow on large tables. Alternatively, you can apply DISTINCT to (quarter, UNIQUE_ID) in a derived table first and then count:
SELECT p.qtr,
count(p.UNIQUE_ID) as cnt
FROM (select distinct datepart(quarter, transaction_date) as qtr, UNIQUE_ID
from panel
WHERE (some criteria = 'x')) as p
GROUP BY p.qtr
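The "DISTINCT first, then COUNT" rewrite only matches count(distinct) if the derived table deduplicates on the quarter, not on the raw transaction_date. Below is a small sketch that checks this in SQLite via Python's sqlite3. SQLite has no datepart(quarter, ...), so the quarter is computed from the month as (month + 2) / 3 with integer division; the panel rows are hypothetical.

```python
import sqlite3

# Hypothetical panel rows: (transaction_date, unique_id).
PANEL = [
    ("2015-01-05", "a"),
    ("2015-02-10", "a"),  # same id on a different day in the same quarter
    ("2015-03-01", "b"),
    ("2015-04-20", "a"),
]

# Stand-in for SQL Server's datepart(quarter, transaction_date).
QUARTER = "((CAST(strftime('%m', transaction_date) AS INTEGER) + 2) / 3)"


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE panel (transaction_date TEXT, unique_id TEXT)")
    conn.executemany("INSERT INTO panel VALUES (?, ?)", PANEL)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# 1) COUNT(DISTINCT ...) directly.
direct = run(f"""
    SELECT {QUARTER} AS qtr, COUNT(DISTINCT unique_id)
    FROM panel GROUP BY {QUARTER} ORDER BY qtr
""")

# 2) Deduplicate (quarter, id) in a derived table, then count.
dedup_by_quarter = run(f"""
    SELECT qtr, COUNT(unique_id)
    FROM (SELECT DISTINCT {QUARTER} AS qtr, unique_id FROM panel) AS p
    GROUP BY qtr ORDER BY qtr
""")

# 3) Deduplicating on the raw date counts an id once per distinct day instead.
dedup_by_date = run(f"""
    SELECT qtr, COUNT(unique_id)
    FROM (SELECT DISTINCT transaction_date, unique_id, {QUARTER} AS qtr FROM panel) AS p
    GROUP BY qtr ORDER BY qtr
""")

if __name__ == "__main__":
    print(direct, dedup_by_quarter, dedup_by_date)
```

Id "a" appears on two days in the first quarter, so deduplicating on the date reports 3 for that quarter while the other two forms agree on 2. Whether the derived-table form is actually faster depends on the engine and its query plan.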
I'd use date_trunc:
select
date_trunc ('quarter', transaction_date), count (distinct unique_id)
from panel
where criteria = 'x'
group by 1
This assumes that, when you say "by quarter", 1Q2015 is different from 1Q2014.
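The difference between a year-aware quarter key (what date_trunc('quarter', ...) gives you) and a bare quarter number (what datepart(quarter, ...) gives you) can be seen with a small SQLite sketch in Python. SQLite has neither function, so the label 'YYYY-Qn' is a hypothetical stand-in; PostgreSQL's date_trunc actually returns the quarter's start timestamp rather than a label. The data is made up.

```python
import sqlite3

# Hypothetical panel rows spanning the first quarter of two different years.
PANEL = [
    ("2014-02-11", "a"),
    ("2015-01-05", "a"),
    ("2015-03-30", "b"),
]

# Quarter number only (1-4), like datepart(quarter, ...): years are merged.
QUARTER = "((CAST(strftime('%m', transaction_date) AS INTEGER) + 2) / 3)"
# Year-aware label such as '2015-Q1', standing in for date_trunc('quarter', ...).
YEAR_QUARTER = "(strftime('%Y', transaction_date) || '-Q' || " + QUARTER + ")"


def distinct_ids_per(key):
    """Count distinct unique_id values per value of the given grouping key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE panel (transaction_date TEXT, unique_id TEXT)")
    conn.executemany("INSERT INTO panel VALUES (?, ?)", PANEL)
    result = conn.execute(
        f"SELECT {key} AS q, COUNT(DISTINCT unique_id) FROM panel "
        f"GROUP BY {key} ORDER BY q"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(distinct_ids_per(QUARTER))
    print(distinct_ids_per(YEAR_QUARTER))
```

Grouping by the quarter number alone folds 1Q2014 and 1Q2015 into one bucket; the year-aware key keeps them apart.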
SELECT DATEPART(QUARTER, transaction_date) ,
COUNT(DISTINCT UNIQUE_ID)
FROM panel
GROUP BY DATEPART(QUARTER, transaction_date)

Compute count and running total for date field in SQL

Here is my dilemma. I have a field in the SQL database called booking_date. Its values look like this:
2014-10-13 12:05:58.533
I would like to be able to compute a count of bookings for each date (not date time) as well as a running total.
So my report would look something like this:
My SQL code is as follows:
SELECT
dbo.book.create_time,
replace(convert(nvarchar, dbo.book.create_time, 106),' ', '/') as bookingcreation,
count(*) as Book_Count
FROM
....tables here
However, my count is grouping on the full date-time value (e.g. 2014-10-13 12:05:58.533) rather than on the date alone, so the counts are not correct.
So instead, I'm getting this:
Also, I am not sure how to compute the running total, but first I need to get the count right.
Any help is greatly appreciated.
You seem to be using SQL Server. To get the count by day:
SELECT cast(dbo.book.create_time as date) as create_date,
count(*) as Book_Count
FROM ...tables here
GROUP BY cast(dbo.book.create_time as date)
ORDER BY create_date;
You can get the cumulative sum in SQL Server 2012+ by using SUM() as a window function:
SELECT cast(dbo.book.create_time as date) as create_date,
count(*) as Book_Count,
sum(count(*)) over (order by cast(dbo.book.create_time as date) ) as Running_Count
FROM ...tables here
GROUP BY cast(dbo.book.create_time as date)
ORDER BY create_date;
In earlier versions, you can do something similar with a correlated subquery or cross apply.
EDIT:
In SQL Server 2008, you can do:
WITH t as (
SELECT cast(dbo.book.create_time as date) as create_date,
count(*) as Book_Count
FROM ...tables here
GROUP BY cast(dbo.book.create_time as date)
)
SELECT t.create_date, t.Book_Count,
(SELECT SUM(Book_Count)
FROM t t2
WHERE t2.create_date <= t.create_date
) as Running_Count
FROM t
ORDER BY create_date;
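Both running-total techniques can be compared side by side in SQLite through Python's sqlite3. This sketch assumes SQLite 3.25+ for the window version, uses date(...) as a stand-in for cast(... as date), computes the window sum over a CTE of daily counts rather than nesting SUM(COUNT(*)), and uses made-up booking timestamps in the question's format.

```python
import sqlite3

# Hypothetical booking timestamps in the same format as the question.
BOOKINGS = [
    ("2014-10-13 12:05:58.533",),
    ("2014-10-13 15:40:02.100",),
    ("2014-10-14 09:12:44.000",),
    ("2014-10-16 18:00:00.000",),
    ("2014-10-16 19:30:00.000",),
    ("2014-10-16 20:45:00.000",),
]

# Daily counts; date(...) drops the time part like cast(... as date).
DAILY = """
    SELECT date(create_time) AS create_date, COUNT(*) AS book_count
    FROM book
    GROUP BY date(create_time)
"""

# Running total with SUM() used as a window function.
WINDOW_SQL = f"""
WITH t AS ({DAILY})
SELECT create_date, book_count,
       SUM(book_count) OVER (ORDER BY create_date) AS running_count
FROM t
ORDER BY create_date
"""

# Same running total with a correlated subquery (no window functions needed).
SUBQUERY_SQL = f"""
WITH t AS ({DAILY})
SELECT t.create_date, t.book_count,
       (SELECT SUM(t2.book_count) FROM t AS t2
        WHERE t2.create_date <= t.create_date) AS running_count
FROM t
ORDER BY t.create_date
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (create_time TEXT)")
    conn.executemany("INSERT INTO book VALUES (?)", BOOKINGS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run(WINDOW_SQL):
        print(row)
```

The correlated subquery rescans the daily counts once per output row, so it grows roughly quadratically with the number of days; the window version is usually preferable where it is available.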
Try using trunc(book.create_time) in your query instead of the conversion you're doing (note that TRUNC on dates is Oracle syntax and is not available in SQL Server).