Row by Row processing in SQL

I am using Hive SQL server. In my database, I am trying to remove records that have a gap of less than 7 days from the previous record. When removing a record, however, I want to check the gap against the previous retained record, not against whichever record happens to precede it.
I want to retain all the records marked as 1, specifically Rec # 7. Although the gap between the 7th record and its predecessor is less than 7 days, that predecessor is being removed, so the gap between the 7th and the 5th becomes 8 days.

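If the rows already carry a flag column marking which earlier records were kept (assumed here to be named retained), you can use a cumulative max to find the previous retained date:
select t.*
from (select t.*,
             -- latest intdate among the earlier rows flagged as retained
             max(case when retained = 1 then intdate end) over (order by intdate rows between unbounded preceding and 1 preceding) as prev_intdate
      from t
     ) t
where prev_intdate is null or
      -- keep the row only if it is at least 7 days after the previous retained row
      intdate >= date_add(prev_intdate, 7);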
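When no such flag exists yet, whether a row is kept depends on which earlier rows were kept, so the decision has to be made in date order, one row at a time. The following is a minimal sketch of that greedy pass in plain Python, not Hive code; the function name `retain_with_min_gap` and the sample dates are hypothetical and only illustrate the rule described in the question.

```python
from datetime import date

def retain_with_min_gap(dates, min_gap_days=7):
    """Keep a date only if it is at least min_gap_days after the last KEPT date."""
    kept = []
    for d in sorted(dates):
        # Compare against the last retained date, not the immediately preceding one.
        if not kept or (d - kept[-1]).days >= min_gap_days:
            kept.append(d)
    return kept

# Hypothetical sample: Jan 5 is dropped (4 days after Jan 1), so Jan 9 is
# compared with Jan 1 (8 days) and kept, even though it is only 4 days after Jan 5.
sample = [date(2021, 1, 1), date(2021, 1, 5), date(2021, 1, 9),
          date(2021, 1, 12), date(2021, 1, 17)]
kept = retain_with_min_gap(sample)
print(kept)
```

The same loop could be run per group (for example per customer) by applying it to each group's dates separately.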

Related

Identifying if a column is in descending order

I am using Microsoft SQL Server 2005 Management Studio. I am a bit new, so I hope I am not breaking any rules. My data has 15 columns and almost a million rows, but I am only giving a sample to get help with the one area where I am stuck.
In the above example, the values in the 'lastlevel' column are decreasing. The dates in the 'Last_read' column range from today back to 14 days prior (the query was run yesterday, hence April 27; please also disregard that the date 2021/04/14 is missing for the 1st customer, it is an anomaly).
Column 'Shipto' provides the customer number, and each customer has at most 14 rows of data.
Please disregard the columns 'current_reading' and 'rn'.
If you look at 'lastlevel' again, you will notice that the values go down consistently; however, on April 18th the value goes from 0.73 to 0.74, an increase of 0.01.
Whenever there is any increase at all, I want all 14 of that customer's rows removed from the output, i.e. I only want to see customers whose data is perfectly descending with no increases.
Can you help?
WITH
deltas AS
(
-- For each [Shipto]; deduct the preceding row's value and record it as the [delta]
-- Note, each [Shipto]'s first row's delta will therefore be NULL
SELECT
*,
lastlevel - LAG(lastlevel) OVER (PARTITION BY Shipto ORDER BY Last_Read, lastlevel DESC) AS delta
FROM
yourTable
),
max_deltas AS
(
-- Get the maximum of the deltas per [Shipto]
SELECT
*,
MAX(delta) OVER (PARTITION BY Shipto) AS max_delta
FROM
deltas
)
-- Return only rows where the delta never exceeds 0 (thus, never ascending over any timestep)
SELECT
*
FROM
max_deltas
WHERE
max_delta <= 0
I've ordered by Last_Read, lastlevel DESC such that if two readings are on the same date, it is assumed that the highest value should be considered to have happened first.
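To see the filter in action, here is a small sketch that runs the same delta logic against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) is only a stand-in for SQL Server, and the table name `readings` and its six rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (Shipto INTEGER, Last_Read TEXT, lastlevel REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        # Customer 1 has one increase (0.73 -> 0.74), so all its rows should go.
        (1, "2021-04-16", 0.80), (1, "2021-04-17", 0.73), (1, "2021-04-18", 0.74),
        # Customer 2 never increases (a flat step is allowed by max_delta <= 0).
        (2, "2021-04-16", 0.90), (2, "2021-04-17", 0.85), (2, "2021-04-18", 0.85),
    ],
)

query = """
WITH deltas AS (
    SELECT *,
           lastlevel - LAG(lastlevel) OVER (
               PARTITION BY Shipto ORDER BY Last_Read, lastlevel DESC
           ) AS delta
    FROM readings
),
max_deltas AS (
    SELECT *, MAX(delta) OVER (PARTITION BY Shipto) AS max_delta
    FROM deltas
)
SELECT Shipto, Last_Read, lastlevel
FROM max_deltas
WHERE max_delta <= 0
ORDER BY Shipto, Last_Read
"""
rows = conn.execute(query).fetchall()
kept_customers = sorted({r[0] for r in rows})
print(kept_customers, len(rows))
```

If a flat step should also disqualify a customer, the final condition would need to be `max_delta < 0` instead.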

Comparing time difference for every other row

I'm trying to determine the length of time in days between every other row, using AR_Event_Creation_Date_Time. For example, the number of days between the 1st and 2nd rows, the 3rd and 4th, the 5th and 6th, etc. In other words, there will be a number-of-days value for every even row and NULL for every odd row. My code below works if there are only two rows per borrower number but falls down when there are more than two. In the results, notice the change in 1002092539.
SELECT Borrower_Number,
Workgroup_Name,
FORMAT(AR_Event_Creation_Date_Time,'d','en-us') AS Tag_Date,
Usr_Usrnm,
DATEDIFF(day, LAG(AR_Event_Creation_Date_Time,1) OVER(PARTITION BY
Borrower_Number Order By Borrower_Number), AR_Event_Creation_Date_Time) Diff
FROM Control_Mail
You need to add in a row number. Also, your ORDER BY is non-deterministic, because it orders by the same column you partition by:
SELECT Borrower_Number,
Workgroup_Name,
FORMAT(AR_Event_Creation_Date_Time,'d','en-us') AS Tag_Date,
Usr_Usrnm,
DATEDIFF(day, LAG(AR_Event_Creation_Date_Time,1) OVER(PARTITION BY Borrower_Number, (rn - 1) / 2 ORDER BY AR_Event_Creation_Date_Time),
AR_Event_Creation_Date_Time) Diff
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Borrower_Number ORDER BY AR_Event_Creation_Date_Time) AS rn
FROM Control_Mail
) C
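The pairing trick relies on integer division: `(rn - 1) / 2` gives 0 for rows 1 and 2, 1 for rows 3 and 4, and so on, so `LAG` only ever looks back within a pair. Below is a rough sketch of that idea using SQLite from Python as a stand-in for SQL Server; the column name `Created`, the sample dates, and the `julianday` subtraction (SQLite has no `DATEDIFF`) are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Control_Mail (Borrower_Number INTEGER, Created TEXT)")
conn.executemany(
    "INSERT INTO Control_Mail VALUES (?, ?)",
    [(1, "2021-01-01"), (1, "2021-01-04"), (1, "2021-01-10"), (1, "2021-01-15")],
)

query = """
SELECT Borrower_Number,
       Created,
       CAST(julianday(Created) - julianday(
                LAG(Created) OVER (
                    PARTITION BY Borrower_Number, (rn - 1) / 2  -- rows 1-2, 3-4, ...
                    ORDER BY Created
                )
            ) AS INTEGER) AS Diff
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Borrower_Number ORDER BY Created) AS rn
    FROM Control_Mail
) AS C
ORDER BY Borrower_Number, Created
"""
rows = conn.execute(query).fetchall()
diffs = [r[2] for r in rows]
print(diffs)
```

Odd rows start a new pair, so they have nothing to look back to and come out as NULL (`None` in Python).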

SQL Server need the total of the previous 6 rows

I'm using SQL Server and I need to get the sum of the previous 6 rows of my table and place the results in its own column.
I'm able to get the 6th row back with the below query:
SELECT id
,FileSize
,LAG(FileSize,6) OVER (ORDER BY DAY(CompleteTime)) previous
FROM Jobs_analytics
group by id, CompleteTime, Jobs_analytics.FileSize
which gives me the sixth row back, but what I need is the sum of all six previous rows.
Any help would be appreciated.
Mike
You can use:
SELECT ja.id, ja.FileSize, ja.CompleteTime,
       SUM(ja.FileSize) OVER (ORDER BY ja.CompleteTime ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) as previous
FROM Jobs_analytics ja;
I don't see why GROUP BY is necessary. There are no aggregation functions.
Note that this sums 6 rows including the current row. If you want the six preceding rows:
SELECT ja.id, ja.FileSize, ja.CompleteTime,
       SUM(ja.FileSize) OVER (ORDER BY ja.CompleteTime, ja.id ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) as previous
FROM Jobs_analytics ja;
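As a quick check of the "six preceding rows" frame, the sketch below runs the windowed sum against an in-memory SQLite table from Python. SQLite stands in for SQL Server here, and the eight rows (FileSize 10 through 80) are invented purely to make the sums easy to verify by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs_analytics (id INTEGER PRIMARY KEY, FileSize INTEGER, CompleteTime TEXT)"
)
conn.executemany(
    "INSERT INTO Jobs_analytics VALUES (?, ?, ?)",
    [(i, 10 * i, f"2021-01-{i:02d}") for i in range(1, 9)],
)

query = """
SELECT id,
       FileSize,
       SUM(FileSize) OVER (
           ORDER BY CompleteTime, id
           ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING  -- excludes the current row
       ) AS previous
FROM Jobs_analytics
ORDER BY CompleteTime, id
"""
previous = [row[2] for row in conn.execute(query)]
print(previous)
```

The first row has no preceding rows, so its sum is NULL; rows 2 through 6 sum however many rows exist before them, and only from row 7 onward does the frame hold a full six rows.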

Need to Update based on ID and Date

I have the following SQL statement, which I think should update 1 field, using some pretty simple standard deviation logic, and based on ID and Date. I think the ID and Date has to be included to get everything aligned right. So, here is the code that I'm testing.
UPDATE Price_Test2
SET Vol30Days = STDEV(PX_BID) OVER (ORDER BY ID_CUSIP, AsOfDate ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) FROM Price_Test2
WHERE ID_CUSIP in (SELECT DISTINCT ID_CUSIP FROM Price_Test2)
It seems like it should work fine, but something is off because I'm getting an error that says: Cannot use both a From clause and a subquery in the where clause or in the data values list in an Update statement.
I am using SQL Server 2019.
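You are using a window function directly in an UPDATE, which SQL Server does not allow. What you want is an updatable subquery (or CTE). The version below assumes Price_Test2 also has Vol60Days and Vol90Days columns; drop those lines if it only has Vol30Days:
UPDATE p
    SET Vol30Days = new_Vol30Days,
        Vol60Days = new_Vol60Days,
        Vol90Days = new_Vol90Days
    FROM (SELECT p.*,
                 -- PARTITION BY ID_CUSIP keeps each security's window separate.
                 -- Note: N PRECEDING AND CURRENT ROW spans N + 1 rows.
                 STDEV(PX_BID) OVER (PARTITION BY ID_CUSIP ORDER BY AsOfDate ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) as new_Vol30Days,
                 STDEV(PX_BID) OVER (PARTITION BY ID_CUSIP ORDER BY AsOfDate ROWS BETWEEN 60 PRECEDING AND CURRENT ROW) as new_Vol60Days,
                 STDEV(PX_BID) OVER (PARTITION BY ID_CUSIP ORDER BY AsOfDate ROWS BETWEEN 90 PRECEDING AND CURRENT ROW) as new_Vol90Days
          FROM Price_Test2 p
         ) p;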

Oracle Running Total

Looking for advice with 2 different types of sub-totals using PLSQL.
I need to pull a data set with 1) a unique headcount, and 2) a total number of credits, as a running total over time.
Raw Data:
This is the transactional data -- every time a student registers for a course, a record is inserted with the date, student id, and credits (along with course number and a bunch of other relevant data). One record per course per student.
STUDENT_ID CREDITS DATE
1 3 01-JAN-12
1 2 02-JAN-12
57 1 03-JAN-12
1 1 03-JAN-12
Processed Data:
This is what the boss needs to see -- it will be used for trending later (to see, for example, how this year's Jan-01 is measuring up against last year's Jan-01, etc.).
UniqueHeadcount SumCredits Date
1 3 01-JAN-12
1 5 02-JAN-12
2 7 03-JAN-12
The brute approach to this is to write a bunch of separate SELECTS (one for each day), and UNION them together. For example:
SELECT
COUNT(DISTINCT STUDENT_ID) as "UniqueHeadcount",
SUM(CREDIT_HR) as "SumCredits",
'01-JAN-12' as "DATE"
FROM
REGISTRATIONS
WHERE
TO_CHAR(DATE,'yyyymmdd') <= '20120101'
GROUP BY
'01-JAN-12'
UNION
SELECT
COUNT(DISTINCT STUDENT_ID) as "UniqueHeadcount",
SUM(CREDIT_HR) as "SumCredits",
'02-JAN-12' as "DATE"
FROM
REGISTRATIONS
WHERE
TO_CHAR(DATE,'yyyymmdd') <= '20120102'
GROUP BY
'02-JAN-12'
UNION
...
And that works -- the results are accurate -- but as you can see -- this is nowhere near elegant -- and if you have to do it for 365 days, well...it's a beast. There's got to be a better way to do it.
So far in my search, I've learned about an 'OVER' clause that I can use -- like this:
SELECT
COUNT(DISTINCT STUDENT_ID) OVER(ORDER BY TRUNC(RSTS_DATE) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) "UniqueHeadcount",
SUM(CREDIT_HR) OVER(ORDER BY TRUNC(RSTS_DATE) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as "SumCredits",
TRUNC(RSTS_DATE) as "DATE"
FROM
REGISTRATIONS
This query is way, way shorter (yay) -- but has two significant problems that I can't yet find my way around. First is that it doesn't work (by design, apparently?) with the COUNT DISTINCT. So I comment that out for a moment, but then run into the second problem: it ignores the TRUNC() function. The RSTS_DATE, though it appears to be just a day/month/year value when you run a SELECT on it, actually holds the time as well, so the result set I get is not summed simply over date, but also over times -- so instead of one record per day, my processed data returns hundreds of records per day (one for each individual course registration). For example:
UniqueHeadcount SumCredits Date
1 3 01-JAN-12
1 5 02-JAN-12
2 6 03-JAN-12 (hidden time: 07:32:27)
2 7 03-JAN-12 (hidden time: 08:01:33)
Not what I'm after.
So I'm looking for expertise -- if what I've explained so far makes sense -- is there another way to use the OVER clause, or perhaps there may be another feature of PLSQL altogether I should be using for this? I'm not strong in PLSQL if you can't tell, but if anyone can give me some direction -- even just words to google, I'd appreciate the help.
Thanks
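Try aggregating per day first, then taking running sums. To keep the headcount unique across days, count each student only on the day of their first registration (summing per-day distinct counts would count a returning student again on every day they register):
WITH FirstSeen AS
(
    SELECT STUDENT_ID,
           MIN(TRUNC(RSTS_DATE)) AS FIRST_DATE
    FROM REGISTRATIONS
    GROUP BY STUDENT_ID
),
CRdata AS
(
    SELECT TRUNC(r.RSTS_DATE) AS RSTS_DATE,
           SUM(r.CREDIT_HR) AS SumCredits,
           -- students whose first registration falls on this day
           COUNT(DISTINCT CASE WHEN TRUNC(r.RSTS_DATE) = f.FIRST_DATE THEN r.STUDENT_ID END) AS NewStudents
    FROM REGISTRATIONS r
    JOIN FirstSeen f ON f.STUDENT_ID = r.STUDENT_ID
    GROUP BY TRUNC(r.RSTS_DATE)
)
SELECT SUM(NewStudents) OVER(ORDER BY RSTS_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS UniqueHeadcount,
       SUM(SumCredits) OVER(ORDER BY RSTS_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SumCredits,
       RSTS_DATE
FROM CRdata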
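One thing to watch with a running headcount is that a student who registers on several days must be counted only once. A possible way to handle that is to count each student on the day of their first registration and take a running sum of those daily counts. The sketch below tries this on the question's four sample rows using SQLite from Python as a stand-in for Oracle; `date()` replaces `TRUNC()`, and the times on the first two rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REGISTRATIONS (STUDENT_ID INTEGER, CREDIT_HR INTEGER, RSTS_DATE TEXT)")
conn.executemany(
    "INSERT INTO REGISTRATIONS VALUES (?, ?, ?)",
    [
        (1, 3, "2012-01-01 09:15:00"),   # time is a made-up value
        (1, 2, "2012-01-02 10:00:00"),   # time is a made-up value
        (57, 1, "2012-01-03 07:32:27"),
        (1, 1, "2012-01-03 08:01:33"),
    ],
)

query = """
WITH first_seen AS (
    SELECT STUDENT_ID, MIN(date(RSTS_DATE)) AS first_day
    FROM REGISTRATIONS
    GROUP BY STUDENT_ID
),
daily AS (
    SELECT date(r.RSTS_DATE) AS reg_day,
           SUM(r.CREDIT_HR) AS credits,
           -- a student counts as new only on their first registration day
           COUNT(DISTINCT CASE WHEN date(r.RSTS_DATE) = f.first_day
                               THEN r.STUDENT_ID END) AS new_students
    FROM REGISTRATIONS r
    JOIN first_seen f ON f.STUDENT_ID = r.STUDENT_ID
    GROUP BY date(r.RSTS_DATE)
)
SELECT SUM(new_students) OVER (ORDER BY reg_day) AS UniqueHeadcount,
       SUM(credits) OVER (ORDER BY reg_day) AS SumCredits,
       reg_day
FROM daily
ORDER BY reg_day
"""
result = conn.execute(query).fetchall()
print(result)
```

On this sample the result has one row per day and matches the "Processed Data" table in the question.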