Increment value column by previous row in SELECT SQL statement - sql

I have to find a way to solve this issue. In a table like the one below, I would like my column "C" to increase on each row: starting from a constant, it adds the value in column "B" plus the previous value of the same column "C".
Furthermore, this runs per user (grouping by user).
For example (starting point Phil: 350, starting point Mark: 100):
USER - POINT - INITIALPOINT
Phil - 1000 - 1350
Phil - 150 - 1500
Phil - 200 - 1700
Mark - 300 - 400
Mark - 250 - 650
How can I do that?

Use windowing. The table declaration is SQL Server, but the rest is standard SQL if your RDBMS supports it (SQL Server 2012, PostgreSQL 9.1, etc.):
DECLARE @t TABLE (ID int IDENTITY(1,1), UserName varchar(100), Point int);
INSERT @t (UserName, Point)
VALUES
('Phil', 1000),
('Phil', 150),
('Phil', 200),
('Mark', 300),
('Mark', 250);
DECLARE @n TABLE (UserName varchar(100), StartPoint int);
INSERT @n (UserName, StartPoint)
VALUES
('Phil', 350),
('Mark', 100);
SELECT
T.ID, T.UserName, T.Point,
N.StartPoint + SUM(T.Point) OVER(PARTITION BY T.UserName ORDER BY T.ID ROWS UNBOUNDED PRECEDING)
FROM
@n N
JOIN
@t T ON N.UserName = T.UserName
ORDER BY
T.ID;
To do this, you need an ordering column in the table (I used ID) and somewhere to keep the starting value (I used a separate table).
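As a quick sanity check of the windowed running sum outside SQL Server, the same query (identical names) runs under SQLite 3.25+, here driven through Python's sqlite3 module; this is a sketch of the technique, not SQL Server itself:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (ID INTEGER PRIMARY KEY, UserName TEXT, Point INT);
    CREATE TABLE n (UserName TEXT, StartPoint INT);
    INSERT INTO t (UserName, Point) VALUES
        ('Phil', 1000), ('Phil', 150), ('Phil', 200),
        ('Mark', 300), ('Mark', 250);
    INSERT INTO n VALUES ('Phil', 350), ('Mark', 100);
""")

# StartPoint plus a per-user running sum, ordered by the ID column
rows = con.execute("""
    SELECT T.UserName, T.Point,
           N.StartPoint + SUM(T.Point) OVER (
               PARTITION BY T.UserName
               ORDER BY T.ID
               ROWS UNBOUNDED PRECEDING) AS InitialPoint
    FROM n N
    JOIN t T ON N.UserName = T.UserName
    ORDER BY T.ID
""").fetchall()

for row in rows:
    print(row)   # matches the expected table in the question
```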

SQL Server 2008 doesn't support cumulative sums directly using window functions. You can use a correlated subquery for the same effect.
So, using the same structure as GBN:
DECLARE @t TABLE (ID int IDENTITY(1,1), UserName varchar(100), Point int);
INSERT @t (UserName, Point)
VALUES
('Phil', 1000),
('Phil', 150),
('Phil', 200),
('Mark', 300),
('Mark', 250);
DECLARE @n TABLE (UserName varchar(100), StartPoint int);
INSERT @n (UserName, StartPoint)
VALUES
('Phil', 350),
('Mark', 100);
SELECT
T.ID, T.UserName, T.Point,
(N.StartPoint +
(select SUM(Point) from @t t2 where t2.UserName = t.userName and t2.ID <= t.id)
)
FROM
@n N
JOIN
@t T ON N.UserName = T.UserName
ORDER BY
T.ID;
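The correlated-subquery form can be checked the same way; a rough sketch via Python's sqlite3, with explicit IDs standing in for the IDENTITY column:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (ID INT, UserName TEXT, Point INT);
    CREATE TABLE n (UserName TEXT, StartPoint INT);
    INSERT INTO t VALUES
        (1, 'Phil', 1000), (2, 'Phil', 150), (3, 'Phil', 200),
        (4, 'Mark', 300), (5, 'Mark', 250);
    INSERT INTO n VALUES ('Phil', 350), ('Mark', 100);
""")

# For each row, sum that user's points up to and including this ID
rows = con.execute("""
    SELECT T.UserName, T.Point,
           N.StartPoint + (SELECT SUM(t2.Point) FROM t t2
                           WHERE t2.UserName = T.UserName
                             AND t2.ID <= T.ID) AS InitialPoint
    FROM n N
    JOIN t T ON N.UserName = T.UserName
    ORDER BY T.ID
""").fetchall()
```

Note that the correlated subquery rescans the table once per row, so it scales roughly quadratically where the windowed version is a single pass.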

You didn't specify your DBMS, so this is ANSI SQL:
select "user",
point,
case
when "user" = 'Phil' then 350
else 100
end + sum(point) over (partition by "user" order by some_date_column) as sum
from the_table
where "user" in ('Mark', 'Phil')
order by "user", some_date_column;
You need some column to sort the rows by, otherwise the "running sum" will be meaningless, as rows in a table are not sorted (there is no such thing as "the first row" in a relational table). That's what some_date_column is for in my example. It could be an increasing primary key or anything else, as long as it defines a proper ordering of the rows.

Related

How can I delete trailing contiguous records in a partition with a particular value?

I'm using the latest version of SQL Server and have the following problem. Given the table below, the requirement, quite simply, is to delete "trailing" records in each _category partition that have _value = 0. Trailing in this context means, when the records are placed in _date order, any series or contiguous block of records with _value = 0 at the end of the list should be deleted. Records with _value = 0 that have subsequent records in the partition with some non-zero value should stay.
create table #x (_id int identity, _category int, _date date, _value int)
insert into #x values (1, '2022-10-01', 12)
insert into #x values (1, '2022-10-03', 0)
insert into #x values (1, '2022-10-04', 10)
insert into #x values (1, '2022-10-06', 11)
insert into #x values (1, '2022-10-07', 10)
insert into #x values (2, '2022-10-01', 1)
insert into #x values (2, '2022-10-02', 0)
insert into #x values (2, '2022-10-05', 19)
insert into #x values (2, '2022-10-10', 18)
insert into #x values (2, '2022-10-12', 0)
insert into #x values (2, '2022-10-13', 0)
insert into #x values (2, '2022-10-15', 0)
insert into #x values (3, '2022-10-02', 10)
insert into #x values (3, '2022-10-03', 0)
insert into #x values (3, '2022-10-05', 0)
insert into #x values (3, '2022-10-06', 12)
insert into #x values (3, '2022-10-08', 0)
I see a few ways to do it. The brute-force way is to run the records through a cursor in date order, grab the ID of any record where _value = 0, and see if it holds until the category changes. I'm trying to avoid procedural T-SQL, though, if I can do it in a query.
To that end, I thought I could apply some gaps and islands trickery and do something with window functions. I feel like there might be a way to leverage last_value() for this, but so far I only see it useful in identifying partitions that have the criteria, not so much in helping me get the ID's of the records to delete.
The desired result is the deletion of records 10, 11, 12 and 17.
Appreciate any help.
I'm not sure that your requirement requires a gaps and islands approach. Simple exists logic should work.
SELECT _id, _category, _date, _value
FROM #x x1
WHERE _value <> 0 OR
EXISTS (
SELECT 1
FROM #x x2
WHERE x2._category = x1._category AND
x2._date > x1._date AND
x2._value <> 0
);
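As a rough check of this EXISTS logic, the inverse condition picks out exactly the rows to delete. A sketch using Python's sqlite3 (the SQL itself is portable):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE x (_id INTEGER PRIMARY KEY, _category INT, _date TEXT, _value INT);
    INSERT INTO x (_category, _date, _value) VALUES
        (1, '2022-10-01', 12), (1, '2022-10-03', 0), (1, '2022-10-04', 10),
        (1, '2022-10-06', 11), (1, '2022-10-07', 10),
        (2, '2022-10-01', 1),  (2, '2022-10-02', 0), (2, '2022-10-05', 19),
        (2, '2022-10-10', 18), (2, '2022-10-12', 0), (2, '2022-10-13', 0),
        (2, '2022-10-15', 0),
        (3, '2022-10-02', 10), (3, '2022-10-03', 0), (3, '2022-10-05', 0),
        (3, '2022-10-06', 12), (3, '2022-10-08', 0);
""")

# A row is "trailing" when its value is 0 and no later row in the
# same category has a non-zero value.
doomed = [r[0] for r in con.execute("""
    SELECT _id FROM x x1
    WHERE _value = 0
      AND NOT EXISTS (
          SELECT 1 FROM x x2
          WHERE x2._category = x1._category
            AND x2._date > x1._date
            AND x2._value <> 0)
    ORDER BY _id
""")]

con.executemany("DELETE FROM x WHERE _id = ?", [(i,) for i in doomed])
```

This yields ids 10, 11, 12 and 17, matching the desired result in the question.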
Assuming that all _values are greater than or equal to 0 you can use MAX() window function in an updatable CTE:
WITH cte AS (
SELECT *,
MAX(_value) OVER (
PARTITION BY _category
ORDER BY _date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) max
FROM #x
)
DELETE FROM cte
WHERE max = 0;
If there are negative _values use MAX(ABS(_value)) instead of MAX(_value).
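The window logic here can be verified quickly too. SQLite cannot DELETE through a CTE the way SQL Server can, so this sketch (via Python's sqlite3) just selects the ids that the DELETE above would remove:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE x (_id INTEGER PRIMARY KEY, _category INT, _date TEXT, _value INT);
    INSERT INTO x (_category, _date, _value) VALUES
        (1, '2022-10-01', 12), (1, '2022-10-03', 0), (1, '2022-10-04', 10),
        (1, '2022-10-06', 11), (1, '2022-10-07', 10),
        (2, '2022-10-01', 1),  (2, '2022-10-02', 0), (2, '2022-10-05', 19),
        (2, '2022-10-10', 18), (2, '2022-10-12', 0), (2, '2022-10-13', 0),
        (2, '2022-10-15', 0),
        (3, '2022-10-02', 10), (3, '2022-10-03', 0), (3, '2022-10-05', 0),
        (3, '2022-10-06', 12), (3, '2022-10-08', 0);
""")

# The running MAX over "this row and everything after it" is 0 only
# for the trailing block of zero rows in each category.
doomed = [r[0] for r in con.execute("""
    SELECT _id FROM (
        SELECT _id,
               MAX(_value) OVER (
                   PARTITION BY _category
                   ORDER BY _date
                   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS mx
        FROM x) s
    WHERE mx = 0
    ORDER BY _id
""")]
```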
Using common table expressions, you can use:
WITH CTE_NumberedRows AS (
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY _category ORDER BY _date)
FROM #x
),
CTE_Keepers AS (
SELECT _category, rnLastKeeper = MAX(rn)
FROM CTE_NumberedRows
WHERE _value <> 0
GROUP BY _category
)
DELETE NR
FROM CTE_NumberedRows NR
LEFT JOIN CTE_Keepers K
ON K._category = NR._category
WHERE NR.rn > ISNULL(K.rnLastKeeper, 0)
EDIT: My original post did not handle the all-zeros edge case. This has been corrected above, together with some naming tweaks.
Tim Biegeleisen's post may be the simpler approach.

Insert grouped data

I am getting the expected results from my query, which uses GROUP BY to group the data by different IDs.
The problem I am facing is that I have to insert this grouped data into the table gstl_calculated_daily_fee, but when I pass the grouped result to the variables @total_mada_local_switch_high_value and @mada_range_id and insert them into the table, only the last row of the query ends up in the table.
Sample result:
Fee range_id
1.23 1
1.22 2
2.33 3
I get only 2.33 and 1 after the insert, but I have to insert the whole result set into the table.
Please suggest how I can insert the whole query result into the table. Below is the query:
DECLARE @total_mada_local_switch_high_value decimal(32,4) = 0.00;
DECLARE @mada_range_id int = 0;
select
@total_mada_local_switch_high_value = SUM(C.settlement_fees),
@mada_range_id = C.range_id
From
(
select
*
from
(
select
rowNumber = @previous_mada_switch_fee_volume_based_count + (ROW_NUMBER() OVER(PARTITION BY DATEPART(MONTH, x_datetime) ORDER BY x_datetime)),
tt.x_datetime
from gstl_trans_temp tt where (message_type_mapping = 0220) and card_type ='GEIDP1' and response_code IN(00,10,11) and tran_amount_req >= 5000
) A
CROSS APPLY
(
select
rtt.settlement_fees,
rtt.range_id
From gstl_mada_local_switch_fee_volume_based rtt
where A.rowNumber >= rtt.range_start
AND (A.rowNumber <= rtt.range_end OR rtt.range_end IS NULL)
) B
) C
group by CAST(C.x_datetime AS DATE),C.range_id
-- Insert Daily Volume
INSERT INTO
gstl_calculated_daily_fee(business_date,fee_type,fee_total,range_id)
VALUES
(@tlf_business_date,'MADA_SWITCH_FEE_LOCAL_CARD', @total_mada_local_switch_high_value, @mada_range_id)
I see no need for variables here. You can insert the aggregated results directly.
Sample data
create table Data
(
Range int,
Fee money
);
insert into Data (Range, Fee) values
(1, 1.00),
(1, 0.50),
(2, 3.00),
(3, 0.25),
(3, 0.50);
create table DataSum
(
Range int,
FeeSum money
);
Solution
insert into DataSum (Range, FeeSum)
select d.Range, sum(d.Fee)
from Data d
group by d.Range;
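The same pattern in miniature, sketched with Python's sqlite3 (quoting "Range" because it is a keyword in several dialects):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Data ("Range" INT, Fee REAL);
    INSERT INTO Data VALUES (1, 1.00), (1, 0.50), (2, 3.00), (3, 0.25), (3, 0.50);
    CREATE TABLE DataSum ("Range" INT, FeeSum REAL);
""")

# INSERT ... SELECT moves every aggregated row at once; no variables needed
con.execute("""
    INSERT INTO DataSum ("Range", FeeSum)
    SELECT d."Range", SUM(d.Fee)
    FROM Data d
    GROUP BY d."Range"
""")

rows = con.execute('SELECT * FROM DataSum ORDER BY "Range"').fetchall()
```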

Sql Server While Loop with Changing Condition

I have a User Table in my database that contains two fields
user_id
manager_id
I am trying to construct a query to list all of the manager_ids that are associated with a user_id in a hierarchical structure.
So if I give a user_id, I will get that user's manager, followed by that person's manager, all the way to the very top.
So far I have tried this, but it doesn't give what I need:
WITH cte(user_id, manager_id) as (
SELECT user_id, manager_id
FROM user
WHERE manager_id=@userid
UNION ALL
SELECT u.user_id, u.manager_id,
FROM user u
INNER JOIN cte c on e.manager_id = c.employee_id
)
INSERT INTO #tbl (manager_id)
select user_id, manager_id from cte;
If anyone can point me in the right direction, that would be great.
I thought about a WHILE loop, but that may not be very efficient, and I'm not too sure how to implement it.
OP asked for a while loop, and while (ha, pun) this may not be the best way... Ask and you shall receive. (:
Here is sample data I created (in the future, please provide this):
CREATE TABLE #temp (userID int, managerID int)
INSERT INTO #temp VALUES (1, 3)
INSERT INTO #temp VALUES (2, 3)
INSERT INTO #temp VALUES (3, 7)
INSERT INTO #temp VALUES (4, 6)
INSERT INTO #temp VALUES (5, 7)
INSERT INTO #temp VALUES (6, 9)
INSERT INTO #temp VALUES (7, 10)
INSERT INTO #temp VALUES (8, 10)
INSERT INTO #temp VALUES (9, 10)
INSERT INTO #temp VALUES (10, 12)
INSERT INTO #temp VALUES (11, 12)
INSERT INTO #temp VALUES (12, NULL)
While Loop:
CREATE TABLE #results (userID INT, managerID INT)
DECLARE @currentUser INT = 1 -- Would be your parameter!
DECLARE @maxUser INT
DECLARE @userManager INT
SELECT @maxUser = MAX(userID) FROM #temp
WHILE @currentUser <= @maxUser
BEGIN
SELECT @userManager = managerID FROM #temp WHERE userID = @currentUser
INSERT INTO #results VALUES (@currentUser, @userManager)
SET @currentUser = @userManager
END
SELECT * FROM #results
DROP TABLE #temp
DROP TABLE #results
Get rid of this column list in your CTE declaration that has nothing to do with the columns you are actually selecting in the CTE:
WITH cte(employee_id, name, reports_to_emp_no, job_number) as (
Just make it this:
WITH cte as (
I recommend recursive solution:
WITH Parent AS
(
SELECT * FROM user WHERE user_id=@userId
UNION ALL
SELECT T.* FROM user T
JOIN Parent P ON P.manager_id=T.user_id
)
SELECT * FROM Parent
To see demo, run following:
SELECT * INTO #t FROM (VALUES (1,NULL),(2,1),(3,2),(4,1)) T(user_id,manager_id);
DECLARE @userId int = 3;
WITH Parent AS
(
SELECT * FROM #t WHERE user_id=@userId
UNION ALL
SELECT T.* FROM #t T
JOIN Parent P ON P.manager_id=T.user_id
)
SELECT * FROM Parent
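A quick way to watch the recursion walk up the chain: the same query run through Python's sqlite3 (SQLite requires the RECURSIVE keyword to be spelled out; user_tbl is a stand-in name, since user is reserved in several dialects):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE user_tbl (user_id INT, manager_id INT);
    INSERT INTO user_tbl VALUES (1, NULL), (2, 1), (3, 2), (4, 1);
""")

def chain_to_top(user_id):
    # Anchor: the given user; recursive step: join each row's manager back in
    return con.execute("""
        WITH RECURSIVE parent AS (
            SELECT user_id, manager_id FROM user_tbl WHERE user_id = ?
            UNION ALL
            SELECT t.user_id, t.manager_id
            FROM user_tbl t
            JOIN parent p ON p.manager_id = t.user_id
        )
        SELECT user_id, manager_id FROM parent
    """, (user_id,)).fetchall()

print(chain_to_top(3))
```

The recursion stops on its own when a row's manager_id is NULL, because the join then produces no further rows.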

using row_number to return specific rows of query

I am using SQL Server 2012 & MATLAB. I have a table of 5 columns (1 char, 1 datetime and 3 floats). I have a simple query shown below that returns the data from this table which contains over a million records - this however causes an out of memory error in MATLAB.
simple query
select id_co, date_r, FAM_Score, FAM_A_Score, FAM_Score
from GSI_Scores
where id_co <> 'NULL'
order by id_co, date_rating
So I was looking to break the query down and select the data in batches of 250,000 records. I have just come across the ROW_NUMBER function, which I added to my query (see below). It numbers all the records for me; however, I am having trouble selecting, say, the records between 250,000 and 500,000. How do I do this?
updated query
select id_co, date_r, FAM_Score, FAM_A_Score, FAM_Score, row_number() over (order by id_co) as num_co
from GSI_Scores
where id_co <> 'NULL' and num_sedol between 250000 and 500000
order by id_co, date_rating
Simply use a subquery or Common Table Expression (CTE).
;WITH CTE AS
(
--Your query
)
SELECT * FROM CTE
WHERE num_co BETWEEN 250000 AND 500000
Just a sample example:
declare @t table (ID INT)
insert into @t (ID) values (1)
insert into @t (ID) values (2)
insert into @t (ID) values (3)
insert into @t (ID) values (4)
;WITH CTE AS
(
select *, ROW_NUMBER() OVER (ORDER BY ID) RN from @t
)
Select ID from CTE C WHERE C.RN BETWEEN 2 AND 4
ORDER BY RN
OR
;WITH CTE AS
(select id_co,
date_r,
FAM_Score,
FAM_A_Score,
ROW_NUMBER() over (ORDER BY id_co) as num_co
from GSI_Scores
where id_co <> 'NULL')
Select C.id_co,
C.date_r,
C.FAM_Score,
C.FAM_A_Score
FROM CTE C
WHERE C.num_co between 250000 and 500000
order by C.id_co, C.date_r
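To make the paging idea concrete, here is a minimal sketch (Python's sqlite3, ten dummy rows standing in for GSI_Scores): number the rows in a CTE, then filter on the row number one query level up, since a window alias is not visible in its own WHERE clause.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (id_co TEXT)")
con.executemany("INSERT INTO scores VALUES (?)",
                [(f"CO{i:02d}",) for i in range(1, 11)])

# Rows 4-6 of the numbered result -- a small stand-in for 250,000-500,000
page = con.execute("""
    WITH numbered AS (
        SELECT id_co, ROW_NUMBER() OVER (ORDER BY id_co) AS num_co
        FROM scores
    )
    SELECT id_co FROM numbered
    WHERE num_co BETWEEN 4 AND 6
    ORDER BY num_co
""").fetchall()
```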
You could try using the OFFSET x ROWS FETCH NEXT y ROWS ONLY clause, like this:
CREATE TABLE TempTable (
TempID INT IDENTITY(1,1) NOT NULL,
SomeDescription VARCHAR(255) NOT NULL,
PRIMARY KEY(TempID))
INSERT INTO TempTable (SomeDescription)
VALUES ('Description 1'),
('Description 2'),
('Description 3'),
('Description 4'),
('Description 5'),
('Description 6'),
('Description 7'),
('Description 8'),
('Description 9'),
('Description 10')
SELECT * FROM TempTable ORDER BY TempID OFFSET 3 ROWS FETCH NEXT 2 ROWS ONLY;
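OFFSET ... FETCH is SQL Server 2012+ / standard SQL; SQLite spells the same thing LIMIT ... OFFSET. A sketch of the equivalence via Python's sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TempTable (TempID INTEGER PRIMARY KEY, SomeDescription TEXT)")
con.executemany("INSERT INTO TempTable (SomeDescription) VALUES (?)",
                [(f"Description {i}",) for i in range(1, 11)])

# Skip 3 rows, return the next 2 -- same as OFFSET 3 ROWS FETCH NEXT 2 ROWS ONLY
rows = con.execute("""
    SELECT TempID, SomeDescription
    FROM TempTable
    ORDER BY TempID
    LIMIT 2 OFFSET 3
""").fetchall()
```

Either way, the ORDER BY is what makes the paging deterministic; without it the skipped rows are arbitrary.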

Using a custom aggregate function in a GROUP BY?

I have a simple MEDIAN calculation function:
IF OBJECT_ID(N'COMPUTEMEDIAN', N'FN') IS NOT NULL
DROP FUNCTION dbo.COMPUTEMEDIAN;
GO
CREATE FUNCTION dbo.COMPUTEMEDIAN(@VALUES NVARCHAR(MAX))
RETURNS DECIMAL
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @SQL NVARCHAR(MAX)
DECLARE @MEDIAN DECIMAL
SET @MEDIAN = 0.0;
DECLARE @MEDIAN_TEMP TABLE (RawValue DECIMAL);
-- This is the Killer!
INSERT INTO @MEDIAN_TEMP
SELECT s FROM master.dbo.Split(',', @VALUES) OPTION(MAXRECURSION 0)
SELECT @MEDIAN =
(
(SELECT MAX(RawValue) FROM
(SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue) AS BottomHalf)
+
(SELECT MIN(RawValue) FROM
(SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue DESC) AS TopHalf)
) / 2
--PRINT @SQL
RETURN @MEDIAN;
END;
GO
GO
However, my table is of the following form:
CREATE TABLE #TEMP (GroupName VARCHAR(MAX), Value DECIMAL)
INSERT INTO #TEMP VALUES ('A', 1.0)
INSERT INTO #TEMP VALUES ('A', 2.0)
INSERT INTO #TEMP VALUES ('A', 3.0)
INSERT INTO #TEMP VALUES ('A', 4.0)
INSERT INTO #TEMP VALUES ('B', 10.0)
INSERT INTO #TEMP VALUES ('B', 11.0)
INSERT INTO #TEMP VALUES ('B', 12.0)
SELECT * FROM #TEMP
DROP TABLE #TEMP
What is the best way to invoke the MEDIAN function on this table using a GROUP BY on the id column? So, I am looking for something like this:
SELECT id, COMPUTEMEDIAN(Values)
FROM #TEMP
GROUP BY id
My current approach involves using FOR XML PATH to combine all values resulting from a GROUP BY operation into a large string and then passing it to the function, but this requires the string-splitting operation, and for large strings it just slows everything down. Any suggestions?
Since you're using SQL Server 2008, I would suggest writing the aggregate function as a CLR function.
http://msdn.microsoft.com/en-us/library/91e6taax(v=vs.80).aspx
Also, people have asked this question before. Perhaps their answers would be helpful
Function to Calculate Median in Sql Server
EDIT: I can confirm this works very well on a large database (30,000 values).
Hmm... just came across this. The following works perfectly fine, but I am not sure how expensive it can turn out to be:
SELECT
GroupName,
AVG(Value)
FROM
(
SELECT
GroupName,
cast(Value as decimal(5,2)) Value,
ROW_NUMBER() OVER (
PARTITION BY GroupName
ORDER BY Value ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY GroupName
ORDER BY Value DESC) AS RowDesc
FROM #TEMP SOH
) x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY GroupName
ORDER BY GroupName;
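The dual-ROW_NUMBER trick is easy to verify on the sample groups (a sketch with Python's sqlite3; group A = {1,2,3,4} should give 2.5, B = {10,11,12} should give 11):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE temp (GroupName TEXT, Value REAL);
    INSERT INTO temp VALUES
        ('A', 1.0), ('A', 2.0), ('A', 3.0), ('A', 4.0),
        ('B', 10.0), ('B', 11.0), ('B', 12.0);
""")

# The middle row(s) are exactly those where the ascending and
# descending row numbers meet, or straddle each other by one.
medians = con.execute("""
    SELECT GroupName, AVG(Value) AS median
    FROM (
        SELECT GroupName, Value,
               ROW_NUMBER() OVER (PARTITION BY GroupName ORDER BY Value ASC)  AS RowAsc,
               ROW_NUMBER() OVER (PARTITION BY GroupName ORDER BY Value DESC) AS RowDesc
        FROM temp
    ) x
    WHERE RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
    GROUP BY GroupName
    ORDER BY GroupName
""").fetchall()
```

One caveat: with duplicate values the two ROW_NUMBERs can pair ties differently, so for safety both ORDER BY clauses should include a unique tiebreaker column.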
No need to use a user-defined function! Here's how I would do it:
CREATE TABLE #TEMP (id VARCHAR(MAX), Value DECIMAL)
INSERT INTO #TEMP VALUES('A', 1.0)
INSERT INTO #TEMP VALUES('A', 2.0)
INSERT INTO #TEMP VALUES('A', 3.0)
INSERT INTO #TEMP VALUES('A', 4.0)
INSERT INTO #TEMP VALUES('B', 10.0)
INSERT INTO #TEMP VALUES('B', 11.0)
INSERT INTO #TEMP VALUES('B', 12.0)
SELECT
(SELECT TOP 1 Value
FROM (SELECT TOP(calcs.medianIndex) Value
FROM #temp
WHERE #temp.ID = calcs.ID ORDER BY Value ASC) AS subSet
ORDER BY subSet.Value DESC), ID
FROM
(SELECT
CASE WHEN count(*) % 2 = 1 THEN count(*)/2 + 1
ELSE count(*)/2
END AS medianIndex,
ID
FROM #TEMP
GROUP BY ID) AS calcs
DROP TABLE #TEMP
Might want to double check the behavior when there is an even number of records.
EDIT: After reviewing your work in your median function, I realize that my answer basically just moved your work out of the function and into your regular query. So... why does your median calculation have to be inside the user-defined function? It seems a lot more difficult that way.