Running Total by Year in SQL

I have a table broken out into a series of numbers by year, and need to build a running total column but restart during the next year.
The desired outcome is below
Amount | Year | Running Total
-----------------------------
1 | 2000 | 1
5 | 2000 | 6
10 | 2000 | 16
5 | 2001 | 5
10 | 2001 | 15
3 | 2001 | 18
I can do an ORDER BY to get a standard running total, but can't figure out how to base it just on the year such that it does the running total for each unique year.

SQL tables represent unordered sets. You need a column to specify the ordering. Once you have this, it is a simple cumulative sum:
select amount, year,
       sum(amount) over (partition by year order by <ordering column>) as running_total
from t;
Without a column that specifies ordering, "cumulative sum" does not make sense on a table in SQL.
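As a rough illustration of the partitioned cumulative sum, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. The id column is an assumption standing in for the ordering column the question does not show, and the sketch assumes SQLite 3.25 or later (the release that added window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical ordering column; the original table does not show one.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, amount INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO t (id, amount, year) VALUES (?, ?, ?)",
    [(1, 1, 2000), (2, 5, 2000), (3, 10, 2000),
     (4, 5, 2001), (5, 10, 2001), (6, 3, 2001)],
)

# PARTITION BY restarts the sum for each year; ORDER BY fixes the accumulation order.
running_totals = conn.execute(
    """
    SELECT amount, year,
           SUM(amount) OVER (PARTITION BY year ORDER BY id) AS running_total
    FROM t
    ORDER BY id
    """
).fetchall()

for row in running_totals:
    print(row)
```

With the sample data from the question, the third value in each row should match the Running Total column shown there.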

Related

SQL How to take the minimum for multiple fields?

Consider the following data set that records the product sold, year, and revenue from that particular product in thousands of dollars. This data table (YEARLY_PRODUCT_REVENUE) is stored in SQL and has many more rows.
Year | Product | Revenue
2000 | Table | 100
2000 | Chair | 200
2000 | Bed | 150
2010 | Table | 120
2010 | Chair | 190
2010 | Bed | 390
Using SQL, for every year I would like to find the product that has the maximum revenue.
That is, I would like my output to be the following:
Year | Product | Revenue
2000 | Chair | 200
2010 | Bed | 390
My attempt so far has been this:
SELECT year, product, MIN(revenue)
FROM YEARLY_PRODUCT_REVENUE
GROUP BY article, month;
But when I do this, I get several rows per year, one for each distinct product. For instance, I'm getting the output below, which is wrong. I'm not entirely sure what the error here is. Any help would be much appreciated!
Year | Product | Revenue
2000 | Table | 100
2000 | Bed | 150
2010 | Table | 120
2010 | Chair | 190
You don't mention the database so I'll assume it's PostgreSQL. You can do:
select distinct on (year) * from t order by year, revenue desc
You want filtering rather than aggregation. We can use window functions (which most databases support) to rank yearly product sales, and then retain only the top selling product per year.
select *
from (
    select r.*, rank() over(partition by year order by revenue desc) rn
    from yearly_product_revenue r
) r
where rn = 1;
Here is a shorter solution if your database supports the standard WITH TIES clause:
select *
from yearly_product_revenue r
order by rank() over(partition by year order by revenue desc)
fetch first row with ties
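To check the rank-based filter against the sample data, here is a minimal sketch using Python's sqlite3 module with an in-memory database. Only the rank() variant is run: DISTINCT ON is PostgreSQL-specific, and SQLite does not support FETCH ... WITH TIES. The sketch assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yearly_product_revenue (year INTEGER, product TEXT, revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO yearly_product_revenue VALUES (?, ?, ?)",
    [(2000, "Table", 100), (2000, "Chair", 200), (2000, "Bed", 150),
     (2010, "Table", 120), (2010, "Chair", 190), (2010, "Bed", 390)],
)

# Rank products within each year by revenue, then keep only the rank-1 rows.
top_products = conn.execute(
    """
    SELECT year, product, revenue
    FROM (
        SELECT r.*,
               RANK() OVER (PARTITION BY year ORDER BY revenue DESC) AS rn
        FROM yearly_product_revenue AS r
    ) AS ranked
    WHERE rn = 1
    ORDER BY year
    """
).fetchall()

for row in top_products:
    print(row)
```

Because rank() gives tied rows the same rank, a year with two products sharing the top revenue would return both rows; row_number() would return exactly one of them.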

Count total without duplicate records

I have a table that contains the following columns: TrackingStatus, Year, Month, Order, Notes
I need to calculate the total number of tracking status for each year and month.
For example, if the table contains the following orders:
TrackingStatus | Year | Month | Order | Notes
F | 2020 | 1 | 33 |
F | 2020 | 1 | 33 | DFF
E | 2020 | 2 | 36 | xxx
A | 2021 | 3 | 34 | X1
A | 2021 | 3 | 34 | DD
A | 2021 | 3 | 88 |
A | 2021 | 2 | 45 |
The result should be:
• Tracking F , year 2020, month 1 the total will be one (because it's the same year, month, and order).
• Tracking A , year 2021, month 2 the total will be one. (because there is only one record with the same year, month, and order).
• Tracking A , year 2021, month 3 the total will be two. (because there are two orders within the same year and month).
So the expected SELECT output will be like that:
TrackingStatus | Year | Month | Total
F | 2020 | 1 | 1
E | 2020 | 2 | 1
A | 2021 | 2 | 1
A | 2021 | 3 | 2
I was trying to use group by but then it will count the number of records which in my scenario is wrong.
How can I get the total orders for each month without counting “duplicate” records?
Thank you
You can use COUNT(DISTINCT ...): COUNT counts the values in each group, and DISTINCT makes each distinct value count only once. Because Order is a reserved word, it has to be quoted (backticks in MySQL):
SELECT TrackingStatus,
       Year,
       Month,
       COUNT(DISTINCT `Order`) AS Total
FROM tab
GROUP BY TrackingStatus,
         Year,
         Month
ORDER BY Year,
         Month
This was tested in MySQL, though it should work in many DBMSs once the quoting of Order is adapted.
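For a quick check outside MySQL, here is a minimal sketch that runs the same aggregation on the sample rows in an in-memory SQLite database from Python. The table name tab comes from the answer; quoting the Order column with double quotes is standard SQL quoting for a reserved word, and empty Notes are stored as NULL as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Order" is a reserved word, so it is quoted with standard double quotes.
conn.execute(
    'CREATE TABLE tab (TrackingStatus TEXT, Year INTEGER, Month INTEGER, '
    '"Order" INTEGER, Notes TEXT)'
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?)",
    [("F", 2020, 1, 33, None), ("F", 2020, 1, 33, "DFF"),
     ("E", 2020, 2, 36, "xxx"), ("A", 2021, 3, 34, "X1"),
     ("A", 2021, 3, 34, "DD"), ("A", 2021, 3, 88, None),
     ("A", 2021, 2, 45, None)],
)

# COUNT(DISTINCT ...) counts each order number once per group.
totals = conn.execute(
    '''
    SELECT TrackingStatus, Year, Month, COUNT(DISTINCT "Order") AS Total
    FROM tab
    GROUP BY TrackingStatus, Year, Month
    ORDER BY Year, Month
    '''
).fetchall()

for row in totals:
    print(row)
```

The repeated order 33 (and the repeated order 34) is counted once, which is the difference from a plain COUNT(*).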

How to "calculate performant wise" cumulative sum column in sql

Hi, let's say I have a table that contains the cost per day,
and at the end of the month I want to calculate the cumulative sum up to each day.
So if, say, we have the values 1, 2, 3 (for 3 days),
we'll calculate 1, (1+2)=3, (1+2+3)=6 (for the last day).
I wonder how we can do this in SQL without sorting the days (an O(n log n) cost).
Is there any other way to solve it?
sample data:
1/1, 1
2/1, 10
3/1, 12
desired result (with the total from the start of the month):
1/1, 1, 1
2/1, 10, 11
3/1, 12, 23
I'm guessing you want a rolling sum.
select *,
       sum(cost_column) over (order by day_column asc) as rolling_cost
from yourtable
day_column | cost_column | rolling_cost
2022-1-1 | 1 | 1
2022-1-2 | 10 | 11
2022-1-3 | 12 | 23
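The rolling sum can be reproduced locally with a minimal sketch like the one below, which uses Python's sqlite3 module and an in-memory table. The table and column names follow the answer; storing the days as zero-padded ISO strings is an assumption made so that text ordering matches date ordering. Whether the engine performs an explicit sort for the ORDER BY inside the window depends on the database and on available indexes, so this sketch says nothing about the asker's performance concern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (day_column TEXT, cost_column INTEGER)")
# Rows are inserted out of order on purpose; the window's ORDER BY decides the sequence.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [("2022-01-03", 12), ("2022-01-01", 1), ("2022-01-02", 10)],
)

rolling = conn.execute(
    """
    SELECT day_column, cost_column,
           SUM(cost_column) OVER (ORDER BY day_column ASC) AS rolling_cost
    FROM yourtable
    ORDER BY day_column
    """
).fetchall()

for row in rolling:
    print(row)
```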

Order table by the total count but do not lose the order by names

I have a table, consisting of 3 columns (Person, Year and Count), so for each person, there are several rows with different years and counts and the final row with total count. I want to keep the table ordered by Name, but also order it by the total count.
So the rows should be ordered by sum, but also grouped by the Person and ordered by year. When I am trying to order by sum, of course, both person and years are messed up. Is there a way to sort like this?
You've stored those "total" rows as well? Gosh! Why did you do that?
Anyway: if you
• compute rank for rows whose year column is equal to 'total' and
• add case expression into the order by clause,
you might get what you want:
with sorter as
  (select name, cnt,
          rank() over (order by cnt) rnk
   from test
   where year = 'total'
  )
select t.*
from test t join sorter s on s.name = t.name
order by s.rnk, case when year = 'total' then '9'
                     else year
                end;

NAME YEAR  CNT
---- ----- ----------
John 2018  3
John 2019  2
John total 5
Bob  2017  2
Bob  2019  4
Bob  total 6

6 rows selected.
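The same idea can be tried without an Oracle session. Below is a minimal sketch using Python's sqlite3 module, with the table test and its rows taken from the output above; the year column is stored as text because it also holds the value 'total'. It assumes SQLite 3.25 or later for rank().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT, year TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("Bob", "2017", 2), ("Bob", "2019", 4), ("Bob", "total", 6),
     ("John", "2018", 3), ("John", "2019", 2), ("John", "total", 5)],
)

# Rank people by their 'total' row, then sort every row by that rank.
# Inside one person, '9' sorts after any four-digit year, so 'total' comes last.
ordered = conn.execute(
    """
    WITH sorter AS (
        SELECT name, RANK() OVER (ORDER BY cnt) AS rnk
        FROM test
        WHERE year = 'total'
    )
    SELECT t.name, t.year, t.cnt
    FROM test AS t
    JOIN sorter AS s ON s.name = t.name
    ORDER BY s.rnk, CASE WHEN t.year = 'total' THEN '9' ELSE t.year END
    """
).fetchall()

for row in ordered:
    print(row)
```

If two people could share the same total, adding the name as a second sort key after the rank would keep their rows from interleaving.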

Looking to select values grouped by one column but create a hierarchy of the different columns to find "the best" column

Mightn't make much sense but let's try.
I have a dataset that is quite large and I have a few "duplicates" in a column. Within that column, I want to group it but select the corresponding row that is the "best fit" based on the max/sum of other columns. Is this possible within SQL?
Input:
Name | Transactions | Date | Apple # | Orange #
John | 10 | today | 10 | 10
John | 15 | Yesterday | 10 | 10
Jack | 10 | Today | 5 | 5
Output I expect:
Name | Transactions | Date | Apple # | Orange # | Total #
John | 15 | Yesterday | 10 | 10 | 20
Jack | 10 | Today | 5 | 5 | 10
The hierarchy would be, max(transactions), max(date) and then sum(Apple, Orange).
I want to do it then for every unique name.
If I understand correctly, you can use row_number(). The key is setting up the order by to reflect the conditions you want:
select t.*
from (select t.*,
             row_number() over (partition by name
                                order by transactions desc, date desc, apple + orange desc) as seqnum
      from t
     ) t
where seqnum = 1;
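Here is a minimal sketch of that row_number() approach, run with Python's sqlite3 module on the sample rows. The column names apple and orange and the two ISO date strings are assumptions standing in for "Apple #", "Orange #", "Today" and "Yesterday"; the extra total column is added only to mirror the expected output in the question. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (name TEXT, transactions INTEGER, date TEXT, "
    "apple INTEGER, orange INTEGER)"
)
# Hypothetical dates: 2024-01-02 plays "today", 2024-01-01 plays "yesterday".
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [("John", 10, "2024-01-02", 10, 10),
     ("John", 15, "2024-01-01", 10, 10),
     ("Jack", 10, "2024-01-02", 5, 5)],
)

# The ORDER BY inside the window encodes the hierarchy:
# most transactions first, then latest date, then largest apple + orange.
best_rows = conn.execute(
    """
    SELECT name, transactions, date, apple, orange, apple + orange AS total
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY name
                   ORDER BY transactions DESC, date DESC, apple + orange DESC
               ) AS seqnum
        FROM t
    ) AS ranked
    WHERE seqnum = 1
    ORDER BY name
    """
).fetchall()

for row in best_rows:
    print(row)
```

For John, the row with 15 transactions wins on the first criterion, so the later criteria only matter when the earlier ones tie.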