Create a running subtotal for SQL Server

I am trying to get a running subtotal (understanding this is different from subtotals for groups, and the rollup approach).
Tried using
Row_Number() over (order by ID_Number) as Row_Count
and nesting it in select statements and using a LEFT OUTER JOIN on itself (which just churns).
What I am trying to get is this:
if ROW_COUNT > 1 THEN RUNNINGTOTAL = Volume_Category + (RUNNINGTOTAL for ID_Number where ROW_COUNT= ROW_COUNT(for this ID_Number*)-1)
I have a table with a list of unique "ID-Numbers" which are the focus here.

Unless you are using SQL Server 2012 or later, the easiest way to do a cumulative sum is with a correlated subquery. Here is the template for the code:
select t.*,
(select sum(val) from t t2 where t2.ordercol <= t.ordercol) as cumesum
from t
In SQL Server 2012 and later, you can use a window function:
select t.*,
sum(val) over (order by ordercol) as cumesum
from t
In both queries, val is the column you want to sum and ordercol is the column that defines the ordering.
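To check that the two templates agree, here is a minimal sketch that runs both against an in-memory SQLite database from Python. SQLite stands in for SQL Server here, and the sample rows are made up for illustration; only the table name t and the columns val and ordercol come from the templates above.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ordercol INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 5), (4, 15)])

# Running total via a correlated subquery (works without window functions).
correlated = conn.execute("""
    SELECT t.ordercol, t.val,
           (SELECT SUM(t2.val) FROM t t2 WHERE t2.ordercol <= t.ordercol) AS cumesum
    FROM t
    ORDER BY t.ordercol
""").fetchall()

# Running total via a window function (needs window-function support).
windowed = conn.execute("""
    SELECT ordercol, val, SUM(val) OVER (ORDER BY ordercol) AS cumesum
    FROM t
    ORDER BY ordercol
""").fetchall()

print(correlated)
print(windowed)
```

Both queries should return the same rows. The window version usually scales better, because the correlated subquery re-scans the table for every row.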

Try this:
SELECT
T1.Id,
SUM(T2.Amount) AS Total
FROM tbl T1
JOIN tbl T2
ON T1.Id >= T2.Id
GROUP BY T1.Id

Related

Grouping values when only all values are equal

I'm trying to group some data. This is the situation:
I'm doing this select:
select
min(id) id
,week
,percentage
from table1
group by week,percentage
Could anyone help me out? I want rows to be grouped only if all their values are equal. If any percentage value differs, the rows should not be grouped; for example, id3 should not be grouped for week 3. I'm using SQL Server 2012.
Thanks.
You don't really want aggregation. You want to remove certain ids.
The following concatenates all the week/percentage values for each id into a single string that identifies duplicates. Note that string_agg() requires SQL Server 2017 or later and cannot be used as a window function, so the string is built per id in a subquery and joined back. Despite what I just said, this then uses aggregation for "filtering" to get the first id:
select min(t1.id) as id, t1.week, t1.percentage
from table1 t1 join
(select id,
string_agg(concat(week, ':', percentage), ',') within group (order by week) as id_week_percentages
from table1
group by id
) s
on s.id = t1.id
group by s.id_week_percentages, t1.week, t1.percentage;
The aggregation just allows this to be written without an extra subquery. You could do something similar with top (1) with ties (again building the string per id, because string_agg() is not a window function in SQL Server):
select top (1) with ties t1.id, t1.week, t1.percentage
from table1 t1 join
(select id,
string_agg(concat(week, ':', percentage), ',') within group (order by week) as id_week_percentages
from table1
group by id
) s
on s.id = t1.id
order by row_number() over (partition by s.id_week_percentages, t1.week, t1.percentage order by t1.id);
Or use a separate subquery to pull the first id for each week/percentage combination.
EDIT:
With for xml path, which also works on SQL Server 2012:
select top (1) with ties id, week, percentage
from (select t1.*,
(select concat(week, ':', percentage, ',')
from table1 tt1
where tt1.id = t1.id
order by week, percentage
for xml path ('')
) as id_week_percentages
from table1 t1
) t1
order by row_number() over (partition by id_week_percentages, week, percentage order by id);

SQL query filtering on COUNT of a column

I need to filter a SQL query based on a count of records within that query.
I want the query to return only the rows where the count of "Location" is greater than 5.
For example, we have 100 rows of data. 10 "Locations" make up all 100 rows, but I only want the rows where the COUNT("Location") > 5, essentially eliminating the rows with Locations where the COUNT("Location") < 5.
I've tried combinations of aggregation and the HAVING clause but can't nail down the answer.
I think you want a window function (change the comparison to cnt > 5 if locations with exactly 5 rows should be excluded):
select t.*
from (select t.*, count(*) over (partition by location) as cnt
from t
) t
where cnt >= 5;
An alternative form of Gordon Linoff's answer, using a CTE:
with CTE as (
select t.*, count(*) over (partition by Location) as cnt
from t
)
select * from CTE where cnt >= 5;
Here is the solution you may have been after, using the HAVING clause:
select t.*
from t
inner join
(
select Location, count(*) as Count
from t
group by Location
having count(*) >= 5
) as t2 on t.Location = t2.Location
order by t.ID
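As a quick check that the window-function and HAVING approaches keep the same rows, here is a small sketch using Python's sqlite3 module. SQLite is only a stand-in for the asker's database, and the sample locations and row counts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY, Location TEXT)")

# Hypothetical data: A has 6 rows, B has 3 rows, C has 5 rows.
locations = ["A"] * 6 + ["B"] * 3 + ["C"] * 5
conn.executemany("INSERT INTO t (ID, Location) VALUES (?, ?)",
                 list(enumerate(locations, start=1)))

# Window function: attach the per-location count to every row, then filter.
window_rows = conn.execute("""
    SELECT ID, Location
    FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY Location) AS cnt FROM t) AS x
    WHERE cnt >= 5
    ORDER BY ID
""").fetchall()

# HAVING: find the qualifying locations first, then join back to the rows.
having_rows = conn.execute("""
    SELECT t.ID, t.Location
    FROM t
    INNER JOIN (SELECT Location, COUNT(*) AS cnt
                FROM t
                GROUP BY Location
                HAVING COUNT(*) >= 5) AS t2
            ON t.Location = t2.Location
    ORDER BY t.ID
""").fetchall()

print(window_rows == having_rows)
```

With this data, both queries keep the rows for A and C and drop the rows for B.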

SQL - How to sum only one row above

I have a data set like this one below:
The result column is the sum of the value column of this row and the value column one row above the current row.
Is it possible to write a query for this result?
If your database does not support LAG, there are other options. In MySQL, a correlated subquery can find the previous row's value, which is then added to the current value:
SELECT
id,
value,
value + COALESCE((SELECT t2.value FROM yourTable t2
WHERE t2.id < t1.id ORDER BY t2.id DESC LIMIT 1), 0) AS result
FROM yourTable t1;
A similar query would also work on SQL Server, using TOP 1 in the subquery instead of LIMIT 1.
You didn't tag your question with a specific RDBMS, but most modern RDBMSs support the LAG window function:
SELECT value + COALESCE(LAG(value) OVER (ORDER BY id), 0) AS result
FROM mytable
You can use lag() as in:
select t.*,
(value + lag(value) over (order by id)) as result
from t;
If you wanted a value for the first row, you can use sum() with a windowing clause:
select t.*,
sum(t.value) over (order by t.id rows between 1 preceding and current row) as result
from t;
This generalizes easily to more rows.
Or, if the ids have no gaps, then you can use join:
select t.*, (t.value + tprev.value) as result
from t left join
t tprev
on tprev.id = t.id - 1;
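The three approaches differ in how they treat the first row and gaps in id. The sketch below runs them through Python's sqlite3 module, with SQLite standing in for the unspecified database; the sample rows, including the deliberate gap at id 3, are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER)")
# Hypothetical data with a gap: there is no row with id 3.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (4, 30)])

# lag(): the first row has no previous row, so its result is NULL.
lag_rows = conn.execute("""
    SELECT id, value + LAG(value) OVER (ORDER BY id) AS result
    FROM t ORDER BY id
""").fetchall()

# sum() over a two-row frame: the first row simply sums itself.
frame_rows = conn.execute("""
    SELECT id,
           SUM(value) OVER (ORDER BY id
                            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS result
    FROM t ORDER BY id
""").fetchall()

# Self-join on id - 1: only correct when the ids have no gaps.
join_rows = conn.execute("""
    SELECT t.id, t.value + tprev.value AS result
    FROM t LEFT JOIN t AS tprev ON tprev.id = t.id - 1
    ORDER BY t.id
""").fetchall()

print(lag_rows)
print(frame_rows)
print(join_rows)
```

The join version returns NULL for id 4 because no row with id 3 exists, while the two window versions pair id 4 with id 2.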

How to select only incremental records in BigQuery

I have data in my database as follows:
and I am expecting a result like this:
Can anyone please help me write a select query for this? It is a kind of incremental load of data.
You can use window functions. You want the earliest version of each record, so:
select t.*
from (select t.*,
row_number() over (partition by empid, empname, sal order by create_time) as seqnum
from t
) t
where seqnum = 1;
If you want to detect changes, rather than the first occurrence of a set of values, you can use lag():
select t.*
from (select t.*,
lag(sal) over (partition by empid, empname order by create_time) as prev_sal
from t
) t
where prev_sal is null or prev_sal <> sal;
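The difference between the two queries shows up when a value returns to one seen earlier. Here is a small sketch using Python's sqlite3 module; SQLite stands in for BigQuery, and the employee rows are invented, since the original sample data is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (empid INTEGER, empname TEXT, sal INTEGER, create_time TEXT)")
# Hypothetical history: salary 100, 100 again, raised to 120, then back to 100.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "Ann", 100, "2020-01-01"),
    (1, "Ann", 100, "2020-02-01"),
    (1, "Ann", 120, "2020-03-01"),
    (1, "Ann", 100, "2020-04-01"),
])

# First occurrence of each distinct (empid, empname, sal) combination.
first_seen = conn.execute("""
    SELECT create_time, sal
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY empid, empname, sal
                                    ORDER BY create_time) AS seqnum
          FROM t) AS x
    WHERE seqnum = 1
    ORDER BY create_time
""").fetchall()

# Every row whose salary differs from the previous row for the same employee.
changes = conn.execute("""
    SELECT create_time, sal
    FROM (SELECT t.*,
                 LAG(sal) OVER (PARTITION BY empid, empname
                                ORDER BY create_time) AS prev_sal
          FROM t) AS x
    WHERE prev_sal IS NULL OR prev_sal <> sal
    ORDER BY create_time
""").fetchall()

print(first_seen)
print(changes)
```

The row_number() query skips the final drop back to 100 because that combination was already seen, while the lag() query reports it as a change.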
This handles salaries that decrease as well as increase.
SQL Server or Oracle? Here is an attempt for SQL Server.
For each record, get the most recent previous record for the same employee (if any) and check for changes since then. If nothing changed, the record is not selected, so only new or changed records are shown:
SELECT t1.* FROM tab t1
OUTER APPLY
(SELECT TOP 1 t2.*
FROM tab t2
WHERE t1.empid = t2.empid
AND t2.create_time < t1.create_time
ORDER BY t2.create_time DESC
) IQ
WHERE IQ.empid IS NULL
OR IQ.ename != t1.ename
OR IQ.sal != t1.sal

PostgreSQL MAX and GROUP BY

I have a table with id, year and count.
I want to get the MAX(count) for each id and keep the year when it happens, so I make this query:
SELECT id, year, MAX(count)
FROM table
GROUP BY id;
Unfortunately, it gives me an error:
ERROR: column "table.year" must appear in the GROUP BY clause or be used in an aggregate function
So I try:
SELECT id, year, MAX(count)
FROM table
GROUP BY id, year;
But then, it doesn't do MAX(count), it just shows the table as it is. I suppose because when grouping by year and id, it gets the max for the id of that specific year.
So, how can I write that query? I want to get each id's MAX(count) and the year when that happens.
The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:
SELECT DISTINCT ON (1)
id, count, year
FROM tbl
ORDER BY 1, 2 DESC, 3;
The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:
SELECT DISTINCT ON (id)
id, count, year
FROM tbl
ORDER BY id, count DESC, year;
The result is ordered by id etc. which may or may not be welcome. It's better than "undefined" in any case.
It also breaks ties (when multiple years share the same maximum count) in a well defined way: pick the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.
For many rows per id, other query techniques are (much) faster. See:
Select first row in each GROUP BY group?
Optimize GROUP BY query to retrieve latest row per user
select *
from (
select id,
year,
thing,
max(thing) over (partition by id) as max_thing
from the_table
) t
where thing = max_thing
or:
select t1.id,
t1.year,
t1.thing
from the_table t1
where t1.thing = (select max(t2.thing)
from the_table t2
where t2.id = t1.id);
or
select t1.id,
t1.year,
t1.thing
from the_table t1
join (
select id, max(t2.thing) as max_thing
from the_table t2
group by id
) t on t.id = t1.id and t.max_thing = t1.thing
or (same as the previous with a different notation)
with max_stuff as (
select id, max(t2.thing) as max_thing
from the_table t2
group by id
)
select t1.id,
t1.year,
t1.thing
from the_table t1
join max_stuff t2
on t1.id = t2.id
and t1.thing = t2.max_thing
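The queries that compare against the maximum return every tied row, whereas DISTINCT ON returns exactly one row per id. The sketch below shows the difference using Python's sqlite3 module. SQLite has no DISTINCT ON, so a row_number() filter is swapped in here as an assumed stand-in with the same tie-break (earliest year); the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, year INTEGER, thing INTEGER)")
# Hypothetical data: id 1 reaches its maximum of 9 in both 2002 and 2003.
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", [
    (1, 2001, 5), (1, 2002, 9), (1, 2003, 9),
    (2, 2001, 3), (2, 2002, 1),
])

# Compare each row with the per-id maximum: ties are all returned.
all_max = conn.execute("""
    SELECT id, year, thing
    FROM (SELECT id, year, thing,
                 MAX(thing) OVER (PARTITION BY id) AS max_thing
          FROM the_table) AS x
    WHERE thing = max_thing
    ORDER BY id, year
""").fetchall()

# One row per id, breaking ties by the earliest year
# (row_number() used in place of PostgreSQL's DISTINCT ON).
one_per_id = conn.execute("""
    SELECT id, year, thing
    FROM (SELECT id, year, thing,
                 ROW_NUMBER() OVER (PARTITION BY id
                                    ORDER BY thing DESC, year) AS rn
          FROM the_table) AS x
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(all_max)
print(one_per_id)
```

Which behavior is right depends on whether tied years should all be reported or only one of them.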