Cumulative sum in report - sql

Hello All,
I am working on a report where I am doing calculations:
Let's take the first line as an example. In the remaining prior column we have 15 and in the taken column we have 0.5, so in the remaining column, we have 14.5.
Now the issue is to take the result in the remaining field and carry it over to the next line's remaining prior column. So instead of having 14 we should have 14.5.
Has anyone worked on something similar who can guide me on how to approach this? I really want to learn how to solve such an issue.

The ANSI standard lag() function does exactly what you want. SQL tables represent unordered sets, so I need to assume that you have some column -- which I will call id -- that identifies the ordering of the rows.
The syntax for lag() is:
select t.*, lag(Remaining) over (order by id) as prevRemaining
from table t;
If you have a database that does not support the ANSI standard window functions, you can get the same effect with a subquery. However, the syntax for that might vary slightly among databases.
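For example, a correlated-subquery version might look roughly like this (a sketch only, assuming the table is named t and has the ordering column id mentioned above):
-- previous row = the row with the largest id smaller than the current row's id
select t.*,
       (select t2.Remaining
        from t t2
        where t2.id = (select max(t3.id) from t t3 where t3.id < t.id)
       ) as prevRemaining
from t;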

Related

When can aggregate functions be nested in standard SQL?

I know it wasn't allowed in SQL-92. But since then it may have changed, particularly when there's a window applied. Can you explain the changes and give the version (or versions if there were more) in which they were introduced?
Examples
Is SUM(COUNT(votes.option_id)) OVER() valid syntax per standard SQL:2016 (or earlier)?
This is my comment (unanswered, and probably unlikely to be answered in such an old question) in Why can you nest aggregate functions when using a window function in PostgreSQL?.
The Calculating Running Total (SQL) kata at Codewars has, as its most upvoted solution (using PostgreSQL 13.0, a highly standard-compliant engine, so the code is likely to be standard), this one:
SELECT
CREATED_AT::DATE AS DATE,
COUNT(CREATED_AT) AS COUNT,
SUM(COUNT(CREATED_AT)) OVER (ORDER BY CREATED_AT::DATE ROWS UNBOUNDED PRECEDING)::INT AS TOTAL
FROM
POSTS
GROUP BY
CREATED_AT::DATE
(Which could be simplified to:
SELECT
created_at::DATE date,
COUNT(*) COUNT,
SUM(COUNT(*)) OVER (ORDER BY created_at::DATE)::INT total
FROM posts
GROUP BY created_at::DATE
I assume the ::s are a new syntax for casting I didn't know of. And that casting from TIMESTAMP to DATE is now allowed (in SQL-92 it wasn't).)
As this SO answer explains, Oracle Database allows it even without a window, pulling in the GROUP BY from context. I don't know if the standard allows it.
You already noticed the difference yourself: It's all about the window. COUNT(*) without an OVER clause for instance is an aggregation function. COUNT(*) with an OVER clause is a window function.
By using aggregation functions you condense the original rows (those left after the FROM clause and WHERE clause are applied) either to the groups specified in GROUP BY or to a single row in the absence of a GROUP BY clause.
Window functions, aka analytic functions, are applied afterwards. They don't change the number of result rows, but merely add information by looking at all or some rows (the window) of the selected data.
In
SELECT
options.id,
options.option_text,
COUNT(votes.option_id) as vote_count,
COUNT(votes.option_id) / SUM(COUNT(votes.option_id)) OVER() * 100.0 as vote_percentage
FROM options
LEFT JOIN votes on options.id = votes.option_id
GROUP BY options.id;
we first join votes to options and then count the votes per option by aggregating the joined rows down to one result row per option (GROUP BY options.id). We count on a non-nullable column in the votes table (COUNT(votes.option_id)), so we get a zero count in case there are no votes, because in an outer-joined row this column is set to NULL.
After aggregating all rows and thus getting one row per option, we apply a window function (SUM() OVER) on this result set. We apply the analytic SUM on the vote count (SUM(COUNT(votes.option_id))) by looking at the whole result set (empty OVER clause), thus getting the same total vote count in every row. We use this value for a calculation: the option's vote count divided by the total vote count times 100, which is the option's percentage of total votes.
The PostgreSQL query is very similar. We select the number of posts per date (COUNT(created_at) is nothing else than a mere COUNT(*)) along with a running total of these counts (by using a window that looks at all rows up to the current row).
So, while this looks like we are nesting two aggregate functions, this is not really the case, because SUM OVER is not considered an aggregation function but an analytic/window function.
Oracle does allow applying an aggregate function directly on another, thus invoking a final aggregation on a previous grouped by aggregation. This allows us to get one result row of, say, the average of sums without having to write a subquery for this. This is not compliant with the SQL standard, however, and very unpopular even among Oracle developers at that.
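For the record, the Oracle form looks something like this (a sketch against the classic EMP demo table; the inner SUM groups by deptno and the outer AVG then collapses those per-department sums into a single row):
-- Oracle-specific nested aggregate: average of the per-department salary sums
SELECT AVG(SUM(sal))
FROM emp
GROUP BY deptno;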

Unexpected result with ORDER BY

I have the following query:
SELECT
D.[Year] AS [Year]
, D.[Month] AS [Month]
, CASE
WHEN f.Dept IN ('XSD') THEN 'Marketing'
ELSE f.Dept
END AS DeptS
, COUNT(DISTINCT f.OrderNo) AS CountOrders
FROM Sales.LocalOrders AS l
INNER JOIN Sales.FiscalOrders AS f
ON l.ORDER_NUMBER = f.OrderNo
INNER JOIN Dimensions.Date_Dim AS D
ON CAST(D.[Date] AS DATE) = CAST(f.OrderDate AS DATE)
WHERE YEAR(f.OrderDate) = 2019
AND f.Dept IN ('XSD', 'PPM', 'XPP')
GROUP BY
D.[Year]
, D.[Month]
, f.Dept
ORDER BY
D.[Year] ASC
, D.[Month] ASC
I get the following result; the ORDER BY isn't giving the right result, as we can see the Month column is not ordered:
Year Month Depts CountOrders
2019 1 XSD 200
2019 10 PPM 290
2019 10 XPP 150
2019 2 XSD 200
2019 3 XPP 300
The expected output:
Year Month Depts CountOrders
2019 1 XSD 200
2019 2 XSD 200
2019 3 XPP 300
2019 10 PPM 290
2019 10 XPP 150
Your query
It is ordered by month, but your D.[Month] is treated like a text string in the ORDER BY clause, so '10' sorts before '2'.
You could do one of two things to fix this:
Use a two-digit month number (e.g. 01... 12)
Use a data type for the ORDER BY clause that will be recognized as representing a month
A quick fix
You can correct this in your code by changing the ORDER BY clause to treat those columns as numbers, which is done by converting ("casting") them to an integer data type like this:
ORDER BY
CAST(D.[Year] AS INT) ASC
,CAST(D.[Month] AS INT) ASC
This will correct your unexpected query results, but does not address the root cause, which is your underlying data (more on that below).
Your underlying data
The root cause of your issue is how your underlying data is stored and/or surfaced.
Your Month seems to be stored or surfaced as a character data type (varchar), rather than something more specifically suited to a month or date.
If you administer or have access to or control over the database, it is a good idea to consider correcting this.
In considering this, be mindful of potential context and change management issues, including:
Is this underlying data, or just a representation of upstream data that is elsewhere? (e.g. something that is refreshed periodically using a process that you do not control, or a view that is redefined periodically)
What other queries or processes rely on how this data is currently stored or surfaced (including data types), that may break if you mess with it?
Might there be validation issues if correcting it? (such as from the way zero, null, non-numeric or non-date data is stored, even if invalid)
What change management practices should be followed in your environment?
Is the data source under high transactional load?
Is it a production dataset?
Are other reporting processes dependent on it?
None of these issues are a good excuse to leave something set up incorrectly forever, which will likely compound the issue and introduce others. However, that is only part of the story.
The appropriate approach (correct it, or leave it) will depend on your situation. In a perfect textbook world, you'd correct it. In your world, you will have to decide.
A better way?
The above solution is a bit of a quick and nasty way to force your query to work.
The fact that the solution CASTs late in the query syntax, after the results have been selected and filtered, hints that this is not the most elegant way to achieve this.
Ideally you can convert data types as early as possible in the process:
If done in the underlying data, not the query, this is the ultimate fix but may not suit the situation (see below)
If done in the query, try to do it earlier.
In your case, your GROUP BY and ORDER BY both use columns that look to be redundant data from the original query results; that is, you are getting a DATE and a MONTH and a YEAR. Ideally you would just get a DATE and then derive the MONTH or YEAR from that date. Your issue is that your dates are not actually dates (see "underlying data" above), which:
In the case of DATE, is converted in your INNER JOIN line ON CAST(D.[Date] AS DATE) = CAST(f.OrderDate AS DATE) (likely to minimise issues with the join)
In the case of D.[year] and D.[month], are not converted (which is why we still need to convert them further down, in ORDER BY)
You could consider ignoring D.[month] and use the MONTH DATEPART computed from DATE, which would avoid the need to use CAST in the ORDER BY clause.
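A hedged sketch of that idea, keeping the existing GROUP BY (which means the derived month must be wrapped in an aggregate such as MIN to be legal in the ORDER BY) and assuming D.[Date] casts cleanly to DATE, as it already does in the JOIN:
ORDER BY
    CAST(D.[Year] AS INT) ASC
  , MIN(DATEPART(MONTH, CAST(D.[Date] AS DATE))) ASC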
In your instance, this approach is a middle ground. The quick fix is included at the top of this answer, and the best fix is to correct the underlying data. This last section considers optimizing the quick fix, but does not correct the underlying issue. It is only mentioned for awareness and to avoid promoting the use of CAST in an ORDER BY clause as the most legitimate way of addressing your issue with good clean query syntax.
There are also potential performance tradeoffs between how many columns you select that you don't need (e.g. all of the ones in D?), whether to compute the month from the date or a separate month column, whether to cast to date before filtering, etc. These are beyond the scope of this solution.
So:
The immediate solution: use the quick fix
The optimal solution: after it's working, consider the underlying data (in your situation)
The real problem is your object Dimensions.Date_Dim here. As you are simply ordering on the values of D.[Year] and D.[Month] without manipulating them at all, this means the object is severely flawed: you are storing numerical data as a varchar. varchar and numerical data types sort completely differently. For example, 2 is less than 10, but '2' is greater than '10', because strings are compared character by character and '2' is greater than '1'.
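You can see this for yourself with a one-line check (plain T-SQL, no table needed):
-- string comparison goes character by character, numeric comparison by value
SELECT CASE WHEN '2' > '10' THEN 1 ELSE 0 END AS varchar_comparison  -- returns 1
     , CASE WHEN  2  >  10  THEN 1 ELSE 0 END AS int_comparison;     -- returns 0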
The real solution, therefore, is fixing your object. Assuming that both Month and Year are incorrectly stored as a varchar, don't contain any non-integer values (another, different flaw if they do), and are not computed columns, then you could just do:
ALTER TABLE Dimensions.Date_Dim ALTER COLUMN [Year] int NOT NULL;
ALTER TABLE Dimensions.Date_Dim ALTER COLUMN [Month] int NOT NULL;
You could, however, also make the columns PERSISTED computed columns, which might well be easier, in my opinion, as DATEPART already returns a strongly typed int value.
ALTER TABLE dbo.Date_Dim DROP COLUMN [Month];
ALTER TABLE dbo.Date_Dim ADD [Month] AS DATEPART(MONTH,[Date]) PERSISTED;
Of course, for both solutions, you'll need to (first) DROP and (afterwards) reCREATE any indexes and constraints on the columns.
As long as your "Month" is always 1-12, you can use
SELECT ..., TRY_CAST(D.[Month] AS INT) AS [Month],...
ORDER BY TRY_CAST(D.[Month] AS INT)
The simplest solution is:
ORDER BY MIN(D.DATE)
or:
ORDER BY MIN(f.OrderDate)
Fiddling with the year and month columns is totally unnecessary when you have a date column that is available.
A very common issue when you store numerical data as a varchar/nvarchar.
Try to cast Year and Month to INT.
ORDER BY
CAST(D.[Year] AS INT) ASC
,CAST(D.[Month] AS INT) ASC
If you try using the <, > and BETWEEN operators, you will get some really "weird" results.
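For example (a small T-SQL demonstration with made-up month values):
-- With varchar months BETWEEN compares strings; '2' > '11' lexicographically,
-- so the range below is empty and this returns no rows at all.
SELECT m.[Month]
FROM (VALUES ('1'), ('2'), ('9'), ('10'), ('11'), ('12')) AS m([Month])
WHERE m.[Month] BETWEEN '2' AND '11';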

Why is SQL Server returning a different order when using 'month' in 'where'?

I run a procedure call that calculates sums into table rows. At first I thought the procedure was not working as expected, so I wasted half a day trying to fix what actually works fine.
Later I actually took a look at the SELECT that gets the data on screen and was surprised by this:
YEAR(M.date) = 2016
--and MONTH(M.date) = 2
and
YEAR(M.date) = 2016
and MONTH(M.date) = 2
So the second example returns a different sorting than the first.
The thing is, I do calculations on the whole year and display data based on year + month parameters.
Can someone explain why this is happening and how to avoid this?
In my procedure that calls the SELECT for on screen data I have it implemented like so:
and (#month = 0 or (month(M.date) = #month))
and year(M.date) = #year
So the month parameter is optional if the user wants to see the data for the whole year and year parameter is mandatory.
You are ordering by the date column. However, the date column is not unique -- multiple rows have the same date. The ORDER BY returns these in arbitrary order. In fact, you might get a different ordering for the same query running at different times.
To "fix" this, you need to include another column (or columns) that is unique for each row. In your case, that would appear to be the id column:
order by date, id
Another way to think about this is that in SQL the sorts are not stable. That is, they do not preserve the original ordering of the data. This is easy to remember, because there is no "original ordering" for a table or result set. Remember, tables represent unordered sets.
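Applied to the SELECT in your procedure, a sketch might look like this (M.id and M.amount are placeholder names; use whatever column uniquely identifies your rows):
SELECT M.id, M.date, M.amount
FROM dbo.YourTable AS M
WHERE year(M.date) = @year
  AND (@month = 0 OR month(M.date) = @month)
ORDER BY M.date, M.id;  -- the unique id breaks ties, so the order is repeatable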

subtracting in SQL Server

I have a table in SQL Server where I have the scores for some competencies; I have one score for the standard and one for the actual score. For instance, S25 is the actual score and C25 is the standard for the score. I need to find the difference between the two so I can see who was above and below the standard, and I cannot figure out how to get the subtraction to work. The way I tried was
Select (S25) - (C25) AS 25_Score
Which did not work
If a name starts with a number, bracket it, and that might work. What error do you get?
select (S25)-(C25) AS [25_Score]
from table_name
Your query should work if your columns are a numeric datatype.
The only issue I see is that you are starting the alias with a number. You will need to escape the alias with square brackets:
Select (S25) - (C25) AS [25_Score]
from yt;
It may be that the columns are varchar, so you have to convert:
select convert(int,[S25])-convert(int,[C25]) AS [25_Score]
from table_name

using SUM and presenting result as absolute (ABS)

I have both positive and negative numbers (money) in a column and need to:
SUM the total ie. SUM(myColumn) based on if the numbers are +/-
Present the result as an absolute ie. even though the result is -1234 it should be presented as 1234
SQL is not my trade as you probably notice, but we've solved most other issues except this one, so any help is appreciated. Keep in mind my skill level is very low.
You will have to use a combination of the SUM aggregate function and the ABS scalar function in SQL. Since you want the absolute value of the sum, the SUM function will need to be called inside the call to ABS:
select abs(sum(columnName)) from table
That should work for SQL Server, MySQL, and Oracle.
Try one (or more) of these
SELECT SUM(moneyColumn) FROM MyTable
SELECT SUM(ABS(moneyColumn)) FROM MyTable
SELECT ABS(SUM(moneyColumn)) FROM MyTable