Get a cumulative sum in SQL - sql

I am struggling with a PostgreSQL query where I am trying to get the cumulative sum, rather than the plain sum, per truncated date.
Here is my original query:
SELECT date_trunc('month', "public"."stock_transaction"."created_at") AS "created_at", "Category"."name" AS "Category - name", sum("public"."stock_transaction"."cost") AS "sum"
FROM "public"."stock_transaction"
LEFT JOIN "public"."product" "Product" ON "public"."stock_transaction"."product_id" = "Product"."id" LEFT JOIN "public"."category" "Category" ON "Product"."category_id" = "Category"."id"
WHERE ("public"."stock_transaction"."owner_id" = {{organization.id}}::uuid
AND {{createdAt}}
AND "Product"."recipe" = FALSE)
GROUP BY date_trunc('month', "public"."stock_transaction"."created_at"), "Category"."name"
ORDER BY date_trunc('month', "public"."stock_transaction"."created_at") ASC, "Category"."name" ASC
It's generated by Metabase. createdAt is a date field filter and organizationId is a UUID.
The result should be an increasing bar chart, with the value of each month added to that of the previous months.
I have tried subqueries, but I have a hard time solving every SQL error I run into.
Is there a SQL boss around who can help? Thanks :D

You just need a window function to get a cumulative sum:
SELECT s."created_at"
, s."Category - name"
, sum(s."sum") OVER (PARTITION BY s."Category - name" ORDER BY s."created_at" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "sum"
FROM (
SELECT date_trunc('month', "public"."stock_transaction"."created_at") AS "created_at", "Category"."name" AS "Category - name", sum("public"."stock_transaction"."cost") AS "sum"
FROM "public"."stock_transaction"
LEFT JOIN "public"."product" "Product" ON "public"."stock_transaction"."product_id" = "Product"."id"
LEFT JOIN "public"."category" "Category" ON "Product"."category_id" = "Category"."id"
WHERE ("public"."stock_transaction"."owner_id" = {{organization.id}}::uuid
AND {{createdAt}}
AND "Product"."recipe" = FALSE)
GROUP BY date_trunc('month', "public"."stock_transaction"."created_at"), "Category"."name"
-- ORDER BY date_trunc('month', "public"."stock_transaction"."created_at") ASC, "Category"."name" ASC -- not needed here
) AS s
ORDER BY s."created_at"
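
As a rough, self-contained illustration of a per-category running total, here is a sketch using Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). The table monthly_cost and its rows are invented for the example and stand in for the already-aggregated monthly subquery; the dialect is SQLite, not PostgreSQL. The window is partitioned by category only, so the sum keeps accumulating from month to month within each category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-aggregated data: one row per (month, category).
conn.executescript("""
    CREATE TABLE monthly_cost (month TEXT, category TEXT, cost INTEGER);
    INSERT INTO monthly_cost VALUES
        ('2023-01', 'Dairy', 10), ('2023-02', 'Dairy', 5), ('2023-03', 'Dairy', 7),
        ('2023-01', 'Meat', 20),  ('2023-02', 'Meat', 1);
""")

# Running total per category, ordered by month inside each partition.
cumulative_rows = conn.execute("""
    SELECT month, category,
           SUM(cost) OVER (
               PARTITION BY category
               ORDER BY month
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumulative_cost
    FROM monthly_cost
    ORDER BY category, month
""").fetchall()

for row in cumulative_rows:
    print(row)
```

If the month were also part of PARTITION BY, every partition would hold a single row and the "cumulative" value would simply equal the monthly sum.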

Related

Add ROW_NUMBER() function to SQL query

I have the following query:
select
DISTINCT(b.org),
b.env,
b.proxy,
b."type",
b.name,
b.policytype,
b.disabled,
b."report refresh date",
b.rank,
first_value(LOWER(a."value"))
over(partition by
b.org,
b.env,
b.proxy
order by b."report refresh date" desc
rows between unbounded preceding and unbounded following) as "value"
from
(select *, DENSE_RANK() OVER (ORDER BY "report refresh date" DESC) as rank from infosec.apigee_policy_info_for_proxy) b
left join
(select * from api.apigee_product
where attribute = 'tui-api-domain') a
on a.org = b.org
and a.env = b.env
and a.proxy = b.proxy
where b.rank <=60
group by b.org,
b.env,
b.proxy,
b."type",
b.name,
b.policytype,
b.disabled,
b."report refresh date",
b.rank,
a."value"
I need to add a function at the end of the query above that calculates the row number.
For that I have the following expression:
ROW_NUMBER() over (order by "report refresh date" ASC) as rowid
I'm not sure where to put it in the first query shown.
Can someone help?
Thank you.
I would simplify this query as:
select distinct b.org, b.env, b.proxy, b.type,b.name,
b.policytype, b.disabled, b."report refresh date", b.rank,
first_value(LOWER(a."value"))
over(partition by b.org, b.env, b.proxy order by b."report refresh date" desc
rows between unbounded preceding and unbounded following) as "value",
ROW_NUMBER() over (order by "report refresh date" ASC) as rowid
from (select *, DENSE_RANK() OVER (ORDER BY "report refresh date" DESC) as rank
from infosec.apigee_policy_info_for_proxy
) b left join
api.apigee_product a
on a.org = b.org and
a.env = b.env and
a.proxy = b.proxy and
a.attribute = 'tui-api-domain'
where b.rank <= 60;
Note: DISTINCT is not a function, so I removed the parentheses. I also used a LEFT JOIN directly on the table instead of a subquery.
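
One thing worth checking when ROW_NUMBER() and DISTINCT share a SELECT list: window functions are evaluated before DISTINCT, so each duplicate row gets its own number and is no longer a duplicate. The sketch below shows this with Python's built-in sqlite3 module (SQLite 3.25+); the policy table and its data are invented for illustration, and the dialect is SQLite rather than PostgreSQL. If de-duplication matters, one option is to apply DISTINCT in a subquery and number the rows afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data with one exact duplicate row.
conn.executescript("""
    CREATE TABLE policy (proxy TEXT, refresh_date TEXT);
    INSERT INTO policy VALUES
        ('orders', '2024-01-02'), ('orders', '2024-01-02'),
        ('billing', '2024-01-01');
""")

# DISTINCT and ROW_NUMBER() in the same SELECT: the duplicate survives,
# because each copy receives a different row number.
numbered_then_distinct = conn.execute("""
    SELECT DISTINCT proxy, refresh_date,
           ROW_NUMBER() OVER (ORDER BY refresh_date, proxy) AS row_num
    FROM policy
""").fetchall()

# De-duplicate in a subquery first, then number the remaining rows.
distinct_then_numbered = conn.execute("""
    SELECT proxy, refresh_date,
           ROW_NUMBER() OVER (ORDER BY refresh_date, proxy) AS row_num
    FROM (SELECT DISTINCT proxy, refresh_date FROM policy) AS d
    ORDER BY row_num
""").fetchall()

print(len(numbered_then_distinct))
print(distinct_then_numbered)
```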

Sum for a rolling total

I have the following query:
select b.month_date,total_signups,active_users from
(
SELECT date_trunc('month',confirmed_at) as month_date
, count(distinct id) as total_signups
FROM follower.users
WHERE confirmed_at::date >= dateadd(day,-90,getdate())::date
and (deleted_at is null or deleted_at > date_trunc('month',confirmed_at))
group by 1
) a ,
(
SELECT date_trunc('month', inv.created_at) AS month_date
,COUNT(DISTINCT em.user_id) AS active_users
FROM follower.invitees inv
INNER JOIN follower.events em
ON inv.event_id = em.event_id
where inv.created_at::date >= dateadd(day,-90,getdate())::date
GROUP BY 1
) b
where a.month_date=b.month_date
This returns three columns: month date, total signups, and active users. What I need is a fourth column with a rolling total of signups. I've tried OVER and PARTITION BY with no luck. Could someone help? I'd appreciate it very much.
Try adding this column definition to your first Select:
SUM(total_signups)
OVER (ORDER BY b.month_date ASC rows between unbounded preceding and current row)
AS running_total
Here's a mini-demo
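
As a separate, self-contained sketch of the same idea (not the demo referred to above), the following uses Python's built-in sqlite3 module, which supports window functions from SQLite 3.25 onward. The monthly_signups table and its numbers are made up and stand in for the joined monthly result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical monthly aggregates, one row per month.
conn.executescript("""
    CREATE TABLE monthly_signups (month_date TEXT, total_signups INTEGER);
    INSERT INTO monthly_signups VALUES
        ('2024-01-01', 120), ('2024-02-01', 80), ('2024-03-01', 100);
""")

# Running total: sum everything from the first row up to the current one.
running_rows = conn.execute("""
    SELECT month_date, total_signups,
           SUM(total_signups) OVER (
               ORDER BY month_date ASC
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM monthly_signups
    ORDER BY month_date
""").fetchall()

for row in running_rows:
    print(row)
```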

ROW_NUMBER() OVER (PARTITION BY) showing duplicate results for Group By Clause

I have the query below, which was created to show the sum of the "last" values for each year. Usually this is the December value, but the year could potentially end in any month, so I want to add together the last values for each GoalMonteCarloHeaderID. I have it working 99%, but there are some random duplicates in the [year] value.
WITH endBalances AS (
SELECT ROW_NUMBER() OVER (PARTITION By GoalMonteCarloHeaderID, Year(Convert(date,MonthDate)) Order By Max(Month(Convert(date,MonthDate))) desc) n, Max(Month(Convert(date,MonthDate))) maxMonth, GrowthBucket, WithdrawalBucket, NoTaxesBucket,
Year(MonthDate) [year]
From GoalMonteCarloMedianResults mcmr
full join GoalMonteCarloHeader mch on mch.ID = mcmr.GoalMonteCarloHeaderID
full join GoalChartData gcd on gcd.ID = mch.GoalChartDataID and gcd.TypeID = 2
inner join Goal g on g.iGoalID = gcd.GoalID
where g.iTypeID in (1) and g.iHHID = 850802
group by GoalMonteCarloHeaderID, MonthDate, GrowthBucket, WithdrawalBucket, NoTaxesBucket
)
SELECT [year], Sum(GrowthBucket) GrowthBucket, Sum(WithdrawalBucket) WithdrawalBucket,Sum(NoTaxesBucket) NoTaxesBucket, maxMonth
From endBalances
where [year] is not null and n=1
Group By [year], maxMonth
order by [year] asc
The result shows two seemingly random duplicates: in two cases the year appears more than once, for months other than the last month of that year. Am I doing something wrong with the GROUP BY or the PARTITION BY in my query? I am not very familiar with this part of T-SQL.
T-SQL has a lovely function for this which has no direct equivalent in MySQL.
ROW_NUMBER() OVER (PARTITION BY [year] ORDER BY MonthDate DESC) AS rn
Then anything with rn=1 will be the last entry in a year.
The answers to this question have a few ideas:
ROW_NUMBER() in MySQL
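
A minimal sketch of the rn = 1 pattern, run through Python's built-in sqlite3 module (SQLite 3.25+) rather than T-SQL. The results table, its columns, and its values are invented; the partition here is (header_id, year), which is an assumption based on the question's wish to take the last value per header, and strftime('%Y', ...) plays the role of T-SQL's Year().

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: header 1 ends 2023 in December, header 2 in October.
conn.executescript("""
    CREATE TABLE results (header_id INTEGER, month_date TEXT, growth INTEGER);
    INSERT INTO results VALUES
        (1, '2023-11-01', 100), (1, '2023-12-01', 110),
        (2, '2023-09-01', 50),  (2, '2023-10-01', 55),
        (1, '2024-01-01', 120), (2, '2024-01-01', 60);
""")

# Rank rows newest-first within each (header, year); rn = 1 is the last entry.
year_totals = conn.execute("""
    WITH ranked AS (
        SELECT header_id, growth,
               strftime('%Y', month_date) AS year,
               ROW_NUMBER() OVER (
                   PARTITION BY header_id, strftime('%Y', month_date)
                   ORDER BY month_date DESC
               ) AS rn
        FROM results
    )
    SELECT year, SUM(growth) AS growth
    FROM ranked
    WHERE rn = 1
    GROUP BY year
    ORDER BY year
""").fetchall()

print(year_totals)
```

In this toy data the outer query groups by year alone, so each year yields a single row even though the two headers end 2023 in different months; grouping by the last month as well would split such a year into several rows.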

What's the proper SQL query to find a 'status change' before given date?

I have a table of logged 'status changes'. I need to find the latest status change for a user, and if it was a) a certain 'type' of status change (s.new_status_id), and b) greater than 7 days old (s.change_date), then include it in the results. My current query sometimes returns the second-to-latest status change for a given user, which I don't want -- I only want to evaluate the last one.
How can I modify this query so that it will only include a record if it is the most recent status change for that user?
Query
SELECT DISTINCT ON (s.applicant_id) s.applicant_id, a.full_name, a.email_address, u.first_name, s.new_status_id, s.change_date, a.applied_class
FROM automated_responses_statuschangelogs s
INNER JOIN application_app a on (a.id = s.applicant_id)
INNER JOIN accounts_siuser u on (s.person_who_modified_id = u.id)
WHERE now() - s.change_date > interval '7' day
AND s.new_status_id IN
(SELECT current_status
FROM application_status
WHERE status_phase_id = 'In The Flow'
)
ORDER BY s.applicant_id, s.change_date DESC, s.new_status_id, s.person_who_modified_id;
You can use row_number() to filter one entry per applicant:
select *
from (
select row_number() over (partition by applicant_id
order by change_date desc) rn
, *
from automated_responses_statuschangelogs
) as lc
join application_app a
on a.id = lc.applicant_id
join accounts_siuser u
on lc.person_who_modified_id = u.id
join application_status stat
on lc.new_status_id = stat.current_status
where lc.rn = 1
and stat.status_phase_id = 'In The Flow'
and lc.change_date < now() - interval '7' day
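
To see why the order of operations matters, here is a small sketch using Python's built-in sqlite3 module (SQLite 3.25+). The status_log table, the status names, and the fixed "today" date are all invented for the example, and SQLite's date() arithmetic replaces PostgreSQL's interval syntax. The rows are ranked first, and the status and age filters are applied only to the latest row of each applicant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical log: applicant 1 was 'in_flow' but has since been 'rejected'.
conn.executescript("""
    CREATE TABLE status_log (applicant_id INTEGER, new_status TEXT, change_date TEXT);
    INSERT INTO status_log VALUES
        (1, 'in_flow',  '2024-01-01'),
        (1, 'rejected', '2024-01-20'),
        (2, 'in_flow',  '2024-01-05');
""")

today = "2024-01-25"  # fixed date so the example is deterministic

# Pick the latest change per applicant, then test its status and age.
stale_rows = conn.execute("""
    SELECT applicant_id, new_status, change_date
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY applicant_id
                   ORDER BY change_date DESC
               ) AS rn
        FROM status_log
    ) AS latest
    WHERE rn = 1
      AND new_status = 'in_flow'
      AND change_date < date(?, '-7 days')
""", (today,)).fetchall()

print(stale_rows)
```

Applicant 1 is excluded because the most recent change is 'rejected'; filtering on status before ranking would instead surface the older 'in_flow' row, which is the second-to-latest result the question describes.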

sql db2 select records from either table

I have an order file with order ID and ship date. Orders can only be shipped Monday to Friday, which means no records are selected for Saturday and Sunday.
I use the same order file to get all order dates, with the date in the same format (yyyymmdd).
I want to select a count of all the records from the order file based on order date and (I believe) full outer join (or maybe right join?) the date file, because I would like to see
20120330 293
20120331 0
20120401 0
20120402 920
20120403 430
20120404 827
etc...
However, my SQL statement is still not returning a zero record for the 31st and the 1st.
with DatesTable as (
select ohordt "Date" from kivalib.orhdrpf
where ohordt between 20120315 and 20120406
group by ohordt order by ohordt
)
SELECT ohscdt, count(OHTXN#) "Count"
FROM KIVALIB.ORHDRPF full outer join DatesTable dts on dts."Date" = ohordt
--/*order status = filled & order type = 1 & date between (some fill date range)*/
WHERE OHSTAT = 'F' AND OHTYP = 1 and ohscdt between 20120401 and 20120406
GROUP BY ohscdt ORDER BY ohscdt
Any ideas what I'm doing wrong?
Thanks!
Because there is no data for those days, they do not show up as rows. You can use a recursive CTE to build a contiguous list of dates between two values that the query can join on.
It will look something like:
WITH dates (val) AS (
SELECT CAST('2012-04-01' AS DATE)
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT Val + 1 DAYS
FROM dates
WHERE Val < CAST('2012-04-06' AS DATE)
)
SELECT d.val AS "Date", o.ohscdt, COALESCE(COUNT(o.ohtxn#), 0) AS "Count"
FROM dates AS d
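LEFT JOIN KIVALIB.ORHDRPF AS o
ON o.ohordt = TO_CHAR(d.val, 'YYYYMMDD')
-- Filters on the order table go in the ON clause; in a WHERE clause they would drop the dates that have no orders
AND o.ohstat = 'F'
AND o.ohtyp = 1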