TSQL Running Totals aggregate from sum of previous rows - sql

Not sure how to word this. Say I have a select returning this:
Name, month, amount
John, June, 5
John, July, 6
John, July, 3
John, August, 10
and I want to aggregate and report the beginning balance for each month:
name, month, beginning balance
john, may, 0
john, june, 0
john, july, 5
john, august, 14
john, September, 24
I can do this in Excel with cell formulas, but how can I do it in SQL without storing values somewhere? I have another table with fiscal months that I can left outer join to so all months are reported; I'm just not sure how to aggregate from prior months in SQL.

select
name
, month
, coalesce((select sum(amount) from mytable
where mytable.month < m.month and mytable.name = m.name), 0) as starting_balance
from mytable m
group by name, month
This is not as nice as window functions, but since support for those varies from database to database, you'd need to tell us which system you are using.
It is also a correlated subquery, which tends to perform poorly on large tables. But at least it's easy to understand what's going on!

Use Grouping like this
SELECT NAME, MONTH , SUM(Balance) FROM table GROUP BY NAME, MONTH

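Assuming your months are represented as dates, this will give you the sum of all prior months, i.e. the beginning balance. The inner select reduces the table to one row per name and month, so a month with several rows is not counted more than once.
select t1.name, t1.month, coalesce(sum(t2.amount), 0) as beginning_balance
from (select distinct name, month from yourtable) t1
left join yourtable t2
on t1.name = t2.name
and t1.month > t2.month
group by t1.name, t1.month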
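As a minimal sketch of the prior-months sum, the sample rows from the question can be loaded into an in-memory SQLite database through Python's sqlite3 module. The table names `payments` and `fiscal_months`, the year 2018, and the choice to store months as first-of-month ISO dates are all assumptions made for illustration; they are not from the question.

```python
import sqlite3

# Hypothetical tables mirroring the question. Months are stored as
# first-of-month ISO dates (an assumption) so "<" compares chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (name TEXT, month TEXT, amount INTEGER);
    INSERT INTO payments VALUES
        ('John', '2018-06-01', 5),
        ('John', '2018-07-01', 6),
        ('John', '2018-07-01', 3),
        ('John', '2018-08-01', 10);
    CREATE TABLE fiscal_months (month TEXT);
    INSERT INTO fiscal_months VALUES
        ('2018-05-01'), ('2018-06-01'), ('2018-07-01'),
        ('2018-08-01'), ('2018-09-01');
""")

# One output row per (name, fiscal month). The left join pulls in every
# earlier payment for that name; COALESCE turns "no earlier rows" into 0.
beginning_balances = conn.execute("""
    SELECT n.name, f.month, COALESCE(SUM(p.amount), 0) AS beginning_balance
    FROM (SELECT DISTINCT name FROM payments) AS n
    CROSS JOIN fiscal_months AS f
    LEFT JOIN payments AS p
           ON p.name = n.name AND p.month < f.month
    GROUP BY n.name, f.month
    ORDER BY n.name, f.month
""").fetchall()

for row in beginning_balances:
    print(row)
```

Driving the query from the fiscal-months table is what makes May and September appear even though they have no payments of their own.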

Related

Counting DISTINCT in FULL OUTER JOIN

I'm sure there's a simple solution to this, which my pea brain is unable to comprehend right now.
I'm using the following query with a FULL OUTER JOIN and I would like to COUNT the DISTINCT memberid:
SELECT a.year,
       COUNT(DISTINCT a.memberid) AS members
FROM (SELECT DISTINCT YEAR,
             memberid
      FROM (SELECT EXTRACT(YEAR FROM created_at) AS YEAR,
                   EXTRACT(MONTH FROM created_at) AS MONTH,
                   member_id AS memberid,
                   COUNT(DISTINCT field1) AS field1
            FROM table1
            GROUP BY YEAR,
                     MONTH,
                     member_id
            ORDER BY YEAR,
                     MONTH,
                     eids DESC)) a
  FULL OUTER JOIN (SELECT DISTINCT YEAR,
                          memberid
                   FROM (SELECT EXTRACT(YEAR FROM created) AS YEAR,
                                EXTRACT(MONTH FROM created) AS MONTH,
                                memberid,
                                COUNT(field2) AS field2
                         FROM table2
                         GROUP BY YEAR,
                                  MONTH,
                                  memberid
                         ORDER BY YEAR,
                                  MONTH,
                                  questions DESC)) b
               ON a.year = b.year
              AND a.memberid = b.memberid
GROUP BY a.year
ORDER BY a.year
This query executes properly, but I'm quite sure that the results are not what I expect.
I get the following results:
2014 26834
2015 58573
2016 178378
2017 233291
2018 297404
2019 281088
Let's call the queries on either side of the FULL OUTER JOIN as Left query and Right query for now. When I aggregate the Right query on year and count the distinct memberid, I get the following results:
2013 3915
2014 59025
2015 115514
2016 176528
2017 216675
2018 301007
2019 311141
As we can see, the distinct counts for the Right query alone are higher than those of the complete query with the FULL OUTER JOIN. This obviously doesn't make sense.
In my final result, I would like to run a COUNT DISTINCT on ALL the memberid (i.e. the memberid that appear in Left query, plus the memberid that appear in the Right query, without counting any memberid twice) and aggregate it by year.
I know the solution to this has to be simple. Any help would be much appreciated.
You are only counting a.memberid, and a.memberid is NULL for every row that exists only on the right side, so those members are not counted.
To make this work, take a UNION of the left and right sides, and then just COUNT(DISTINCT memberid) grouped by year.
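The union-then-count idea can be sketched with Python's sqlite3 module on a tiny made-up data set. The tables `left_side` and `right_side` are hypothetical stand-ins for the already reduced (year, memberid) results of the Left and Right queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the Left and Right query results.
    CREATE TABLE left_side (year INTEGER, memberid INTEGER);
    CREATE TABLE right_side (year INTEGER, memberid INTEGER);
    INSERT INTO left_side VALUES (2014, 1), (2014, 2), (2015, 1);
    INSERT INTO right_side VALUES (2013, 9), (2014, 2), (2014, 3), (2015, 4);
""")

# UNION stacks both sides and drops exact duplicates, so a member present
# on either side (or both) is counted once per year.
members_per_year = conn.execute("""
    SELECT year, COUNT(DISTINCT memberid) AS members
    FROM (
        SELECT year, memberid FROM left_side
        UNION
        SELECT year, memberid FROM right_side
    ) AS combined
    GROUP BY year
    ORDER BY year
""").fetchall()

print(members_per_year)
```

Note that 2013 exists only on the right side here, yet it still shows up in the result, which is the behaviour the join-based query was missing.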

how to produce a customer retention table /cohort analysis with SQL

I'm trying to write an SQL query (Presto SQL syntax) to produce a customer retention table (see sample below).
A customer who makes at least one transaction in a month is considered as retained for that month.
This is the table:
user_id      transaction_date
bdcff651-    2018-01-01
bdcff641     2018-03-15
This is the result I would like to get.
The first row should be understood as follows:
Out of all customers who made their first transaction in the month of Jan 2018 (defined as “Jan Activation Cohort”), 35% subsequently made a transaction during the one month period following their first transaction date, 23% in the next month, 15% in the next month and so on.
Date         1st Month   2nd Month   3rd Month
2018-01-01   35%         23%         15%
2018-02-0    33%         26%         13%
2018-03-0    36%         27%         12%
As an example, if person XYZ makes his first transaction on 10th February 2018, his 1st month will be from 11th February 2018 to 10th March 2018, 2nd month will be from 11th March 2018 to 10th April 2018 and so on. This person’s details need to appear in the Feb 2018 cohort in the Customer Retention Table.
Would appreciate any help! Thanks.
You can use conditional aggregation. However, I am not sure what your real calculations are.
If I just use the built-in definitions of date_diff(), then the logic looks like:
select date_trunc('month', first_td) as yyyymm,
       count(distinct user_id) as cnt,
       -- cast to double so the ratio is not truncated by integer division
       (cast(count(distinct case when date_diff('month', first_td, transaction_date) = 1
                                 then user_id
                            end) as double) /
        count(distinct user_id)
       ) as month_1_ratio,
       (cast(count(distinct case when date_diff('month', first_td, transaction_date) = 2
                                 then user_id
                            end) as double) /
        count(distinct user_id)
       ) as month_2_ratio
from (select t.*,
             min(transaction_date) over (partition by user_id) as first_td
      from t
     ) t
group by date_trunc('month', first_td)
order by yyyymm;
I am not familiar with Presto and have no way to test Presto code. From searching around a bit, though, it looks like it wouldn't be too hard to convert SQL Server syntax to Presto syntax. Here is what I would do in SQL Server; you should be able to carry the concept over to Presto:
with transactions_info_per_user as (
    select user_id, min(transaction_date) as first_transaction,
           cast(datepart(year, min(transaction_date)) as varchar(4)) + cast(datepart(month, min(transaction_date)) as varchar(2)) as activation_cohort
    from my_table
    group by user_id
),
users_per_activation_cohort as (
    select activation_cohort, count(*) as number_of_users
    from transactions_info_per_user
    group by activation_cohort
),
months_after_activation_per_purchase as (
    -- months from the first transaction to each transaction
    select distinct mt.user_id, ti.activation_cohort, datediff(month, ti.first_transaction, mt.transaction_date) as months_after_activation
    from my_table mt
    left join transactions_info_per_user as ti
        on mt.user_id = ti.user_id
),
final as (
    select activation_cohort, months_after_activation, count(*) as user_count_per_cohort_with_purchase_per_month_after_activation
    from months_after_activation_per_purchase
    group by activation_cohort, months_after_activation
)
select f.activation_cohort, f.months_after_activation,
       cast(f.user_count_per_cohort_with_purchase_per_month_after_activation as decimal(9,2)) / cast(u.number_of_users as decimal(9,2)) * 100 as retention_pct
from final f
inner join users_per_activation_cohort u
    on u.activation_cohort = f.activation_cohort
--Then pivot months_after_activation into columns
I was very explicit with the naming of things so you could follow the thought process. Here is an example of how to pivot in Presto. Hopefully this helps you!
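To make the cohort logic concrete, here is a minimal sketch that runs the conditional-aggregation idea against SQLite through Python's sqlite3 module. The `transactions` table, the user ids and the dates are made up. The month offset is computed from calendar months (year * 12 + month), which is a simplifying assumption and not the rolling 11th-to-10th window described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: u1 and u2 start in January, u3 in February.
    CREATE TABLE transactions (user_id TEXT, transaction_date TEXT);
    INSERT INTO transactions VALUES
        ('u1', '2018-01-05'), ('u1', '2018-02-10'), ('u1', '2018-03-02'),
        ('u2', '2018-01-20'), ('u2', '2018-03-15'),
        ('u3', '2018-02-01'), ('u3', '2018-03-01');
""")

retention = conn.execute("""
    WITH firsts AS (
        -- first transaction per user defines the activation cohort
        SELECT user_id, MIN(transaction_date) AS first_td
        FROM transactions
        GROUP BY user_id
    ),
    offsets AS (
        -- calendar months elapsed between first and each transaction
        SELECT DISTINCT
               t.user_id,
               strftime('%Y-%m', f.first_td) AS cohort,
               (CAST(strftime('%Y', t.transaction_date) AS INTEGER) * 12
                + CAST(strftime('%m', t.transaction_date) AS INTEGER))
             - (CAST(strftime('%Y', f.first_td) AS INTEGER) * 12
                + CAST(strftime('%m', f.first_td) AS INTEGER)) AS month_offset
        FROM transactions AS t
        JOIN firsts AS f ON f.user_id = t.user_id
    )
    SELECT cohort,
           COUNT(DISTINCT user_id) AS cohort_size,
           100.0 * COUNT(DISTINCT CASE WHEN month_offset = 1 THEN user_id END)
                 / COUNT(DISTINCT user_id) AS month_1_pct,
           100.0 * COUNT(DISTINCT CASE WHEN month_offset = 2 THEN user_id END)
                 / COUNT(DISTINCT user_id) AS month_2_pct
    FROM offsets
    GROUP BY cohort
    ORDER BY cohort
""").fetchall()

for row in retention:
    print(row)
```

Multiplying by 100.0 before dividing keeps the arithmetic in floating point, which avoids the integer-division truncation that two plain counts would otherwise produce.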

How do I correctly use the SQL Sum function with multiple variables and grouping?

I am trying to write an SQL statement based on the following code.
CREATE TABLE mytable (
year INTEGER,
month INTEGER,
day INTEGER,
hoursWorked INTEGER )
Assuming that each employee works multiple days over each month in a 3 year period.
I need to write an sql statement that returns the total hours worked in each month, grouped by earliest year/month first.
I tried doing this, but I don't think it is correct:
SELECT Sum(hoursWorked) FROM mytable
ORDER BY(year,month)
GROUP BY(month);
I am a little confused about how to use the SUM function in conjunction with the GROUP BY or ORDER BY clause. How does one go about doing this?
Try this:
SELECT year, month, SUM(hoursWorked)
FROM mytable
GROUP BY year, month
ORDER BY year, month
This way you will have for example:
2014 December 30
2015 January 12
2015 February 40
Every non-aggregated column in the SELECT list must also appear in GROUP BY. The reverse is not strictly required, but you normally select the columns you group by so that each row of the result can be identified.
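A quick way to check the grouping is to run it on a few rows. The sketch below uses Python's sqlite3 module with the table definition from the question; the inserted rows are made up so that the totals match the example output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        year INTEGER,
        month INTEGER,
        day INTEGER,
        hoursWorked INTEGER );
    -- Made-up rows, deliberately inserted out of order.
    INSERT INTO mytable VALUES
        (2015, 1, 5, 8), (2014, 12, 1, 10), (2014, 12, 2, 20),
        (2015, 1, 6, 4), (2015, 2, 3, 40);
""")

# GROUP BY collapses the rows to one per (year, month);
# ORDER BY then sorts those groups, earliest first.
monthly_hours = conn.execute("""
    SELECT year, month, SUM(hoursWorked) AS total_hours
    FROM mytable
    GROUP BY year, month
    ORDER BY year, month
""").fetchall()

print(monthly_hours)
```

Grouping by month alone would merge January 2014 with January 2015, which is why both columns are listed.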
SELECT year, month, Sum(hoursWorked) as workedhours
FROM mytable
GROUP BY year,month
ORDER BY year,month;
You have to group by year and month.
Is this what you are trying to do? This will sum by Year/Month and order by Year/Month.
Select [Year], [Month], Sum(HoursWorked) as WorkedHours
From mytable
Group By [Year], [Month]
Order by [Year], [Month]
You have to group by year and month, otherwise you will have the hours you worked on March 2014 and 2015 in one record :)
SELECT Sum(hoursWorked) as hoursWorked, year, month
FROM mytable
GROUP BY year, month
ORDER BY year, month;

SQL Aggregates OVER and PARTITION

All,
This is my first post on Stackoverflow, so go easy...
I am using SQL Server 2008.
I am fairly new to writing SQL queries, and I have a problem that I thought was pretty simple, but I've been fighting for 2 days. I have a set of data that looks like this:
UserId Duration(Seconds) Month
1 45 January
1 90 January
1 50 February
1 42 February
2 80 January
2 110 February
3 45 January
3 62 January
3 56 January
3 60 February
Now, what I want is to write a single query that gives me the average for a particular user and compares it against the average across all users for that month. So the resulting dataset after a query for user #1 would look like this:
UserId Duration(seconds) OrganizationDuration(Seconds) Month
1 67.5 63 January
1 46 65.5 February
I've been batting around different subqueries and group by scenarios and nothing ever seems to work. Lately, I've been trying OVER and PARTITION BY, but with no success there either. My latest query looks like this:
select Userid,
AVG(duration) OVER () as OrgAverage,
AVG(duration) as UserAverage,
DATENAME(mm,MONTH(StartDate)) as Month
from table.name
where YEAR(StartDate)=2014
AND userid=119
GROUP BY MONTH(StartDate), UserId
This query bombs out with a "'Duration' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause" error.
Please keep in mind I'm dealing with a very large amount of data. I think I can make it work with CASE statements, but I'm looking for a cleaner, more efficient way to write the query if possible.
Thank you!
You are joining two queries together here:
Per-User average per month
All Organisation average per month
If you are only going to return data for one user at a time then an inline select may give you joy:
SELECT AVG(a.duration) AS UserAverage,
(SELECT AVG(b.Duration) FROM tbl b WHERE MONTH(b.StartDate) = MONTH(a.StartDate)) AS OrgAverage
...
FROM tbl a
WHERE userid = 119
GROUP BY MONTH(StartDate), UserId
Note - using comparison on MONTH may be slow - you may be better off having a CTE (Common Table Expression)
You are missing the PARTITION BY clause in the AVG function:
OVER (PARTITION BY MONTH(StartDate))
Please try this. It works fine for me.
WITH C1
AS
(
SELECT
AVG(Duration) AS TotalAvg,
[Month]
FROM [dbo].[Test]
GROUP BY [Month]
),
C2
AS
(
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY UserID, [Month]) AS DetailedAvg,
[Month]
FROM [dbo].[Test]
)
SELECT C2.*, C1.TotalAvg
FROM C2 c2
INNER JOIN C1 c1 ON c1.[Month] = c2.[Month]
ORDER BY c2.UserID, c2.[Month] desc;
I was able to get it done using a self join. There's probably a better way.
Select UserId, AVG(t1.Duration) as Duration, t2.duration as OrgDur, t1.Month
from #temp t1
inner join (Select Distinct MONTH, AVG(Duration) over (partition by Month) as duration
from #temp) t2 on t2.Month = t1.Month
group by t1.Month, t1.UserId, t2.Duration
order by t1.UserId, t1.Month desc
Here's a version using a CTE, which is probably a better solution and definitely easier to read:
With MonthlyAverage
as
(
Select MONTH, AVG(Duration) as OrgDur
from #temp
group by Month
)
Select UserId, AVG(t1.Duration) as Duration, m.OrgDur as OrgDur, t1.Month
from #temp t1
inner join MonthlyAverage m on m.Month = t1.Month
group by UserId, t1.Month, m.OrgDur
You can try below with less code.
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY [Month]) AS TotalAvg,
AVG(Duration) OVER(PARTITION BY UserID, [Month]) AS DetailedAvg,
[Month]
FROM [dbo].[Test]
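The two-partition approach can be tried on the sample data from the question. This is a sketch using Python's sqlite3 module, not SQL Server; it assumes the bundled SQLite is version 3.25 or later (needed for window functions), and the table name `calls` is made up. The single-user filter sits in the outer query on purpose: applied inside, it would restrict the organization average to that user's rows as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample rows from the question; "calls" is a hypothetical table name.
    CREATE TABLE calls (UserId INTEGER, Duration INTEGER, Month TEXT);
    INSERT INTO calls VALUES
        (1, 45, 'January'), (1, 90, 'January'),
        (1, 50, 'February'), (1, 42, 'February'),
        (2, 80, 'January'), (2, 110, 'February'),
        (3, 45, 'January'), (3, 62, 'January'),
        (3, 56, 'January'), (3, 60, 'February');
""")

# Inner query: both averages are computed over all users' rows.
# Outer query: only then is the result narrowed to one user.
user_vs_org = conn.execute("""
    SELECT UserId, UserAvg, OrgAvg, Month
    FROM (
        SELECT DISTINCT
               UserId,
               AVG(Duration) OVER (PARTITION BY UserId, Month) AS UserAvg,
               AVG(Duration) OVER (PARTITION BY Month) AS OrgAvg,
               Month
        FROM calls
    ) AS averages
    WHERE UserId = 1
    ORDER BY Month DESC
""").fetchall()

for row in user_vs_org:
    print(row)
```

SQLite's AVG always returns a floating-point value; in SQL Server, averaging an integer column yields an integer, so a cast to a decimal type may be needed there to see 67.5 rather than 67.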

Sum a field for each month cumulatively and dynamically

I have a table of around 50,000 records that has been set up to look back as far as the start of the current financial year.
As it stands, I have not updated this table since last month, so the data in there currently still goes back as far as April 1st 2011.
Note: when I refresh the data, there will only be April 2012's data in there as we are now in April; then in May it will have April 2012 and May 2012, and so on.
Each record has five columns I am concerned with:
Department,
Incident date,
month,
year,
reduced
Both the month and year columns have been generated from the incident date field which is in this format:
2011-06-29 00:00:00.000
For each department, I need to sum the reduced column, but in a cumulative fashion.
E.g. seeing as April 2011 is the earliest month/year I have data for at the moment, I will want to know the total reduced for every department for April.
Then for May I want April and May combined, for June I need April, May and June, and so on.
Is there an intelligent way to do this, so that as soon as I reimport data into this table the query realises that there is now only one month and that the year has changed, and until next month displays only April's sum(reduced)?
The following will return the cumulative totals grouped by Department, Year and Month. If you're clearing out the data from the previous tax year when refreshing then you can omit the WHERE clause.
SELECT T1.[Year],
T1.[Month],
T1.Department,
SUM(T2.Reduced) ReducedTotals
FROM [TABLENAME] T1
INNER JOIN [TABLENAME] T2 ON ( T1.Department = T2.Department AND T1.IncidentDate >= T2.IncidentDate )
WHERE T1.IncidentDate >= '2012-04-01'
GROUP BY T1.[Year],
T1.[Month],
T1.Department
ORDER BY T1.[Year],
T1.[Month],
T1.Department
select t1.id, t1.singlenum, SUM(t2.singlenum) as sum
from #t t1 inner join #t t2 on t1.id >= t2.id
group by t1.id, t1.singlenum
order by t1.id
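One caveat with a row-level self-join is that a month containing several incidents contributes several left-hand rows, each of which picks up the earlier rows again, so the grouped total can be inflated. A minimal sketch that first collapses the data to one row per department and month, and only then accumulates, is shown below using Python's sqlite3 module. The `incidents` table and its rows are made up, and the year-month key is derived from the date string with `substr`, which assumes the date format shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample rows; department A has two incidents in April.
    CREATE TABLE incidents (Department TEXT, IncidentDate TEXT, Reduced INTEGER);
    INSERT INTO incidents VALUES
        ('A', '2011-04-03 00:00:00.000', 2),
        ('A', '2011-04-20 00:00:00.000', 3),
        ('A', '2011-05-11 00:00:00.000', 4),
        ('A', '2011-06-29 00:00:00.000', 1),
        ('B', '2011-04-15 00:00:00.000', 7),
        ('B', '2011-06-01 00:00:00.000', 5);
""")

cumulative = conn.execute("""
    WITH monthly AS (
        -- one row per department and 'YYYY-MM' month
        SELECT Department,
               substr(IncidentDate, 1, 7) AS YearMonth,
               SUM(Reduced) AS Reduced
        FROM incidents
        GROUP BY Department, substr(IncidentDate, 1, 7)
    )
    -- each month is joined to itself and every earlier month
    SELECT m1.Department, m1.YearMonth, SUM(m2.Reduced) AS ReducedTotal
    FROM monthly AS m1
    JOIN monthly AS m2
      ON m2.Department = m1.Department AND m2.YearMonth <= m1.YearMonth
    GROUP BY m1.Department, m1.YearMonth
    ORDER BY m1.Department, m1.YearMonth
""").fetchall()

for row in cumulative:
    print(row)
```

Because the query only looks at whatever months are present, it adapts on its own after a refresh: with only April loaded, it returns a single April row per department.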