SQL script to find previous value, not necessarily previous row - sql

is there a way in SQL to find a previous value, not necessarily in the previous row, within the same SELECT statement?
See picture below. I'd like to add another column, ELAPSED, that calculates the time difference between TIMERSTART, but only when DEVICEID is the same, and I_TYPE is viewDisplayed. e.g. subtract 1 from 2, store difference in 3, store 0 in 4 because i_type is not viewDisplayed, subtract 2 from 5, store difference in 6, and so on.
It has to be a statement, I can't use a stored procedure in this case.
SELECT DEVICEID, I_TYPE, TIMERSTART,
O AS ELAPSED -- CASE WHEN <CONDITION> THEN TIMEDIFF() ELSE 0 END AS ELAPSED
FROM CLIENT_USAGE
ORDER BY TIMERSTART ASC
I'm using SAP HANA DB, but it works pretty much like the latest version of MS-SQL. So, if you know how to make it work in SQL, I can make it work in HANA.

You can make a subquery to find the last time entered previous to the row in question.
select deviceid, i_type, timerstart, (timerstart - timerlast) as elapsed.
from CLIENT_USAGE CU
join ( select top 1 timerstart as timerlast
from CLIENT_USAGE C
where (C.i_type = CU.i_type) and
(C.deviceid = CU.deviceid) and (C.timerstart < CU.timerstart)
order by C.timerstart desc
) as temp1
on temp1.i_type = CU.i_type
order by timerstart asc
This is a rough sketch of what the sql should look like I do not know what your primary key is on this table if it is i_type or i_type and deviceid. But this should help with how to atleast calculate the field. I do not think it would be necessary to store the value unless this table is very large or the hardware being used is very slow. It can be calculated rather easily each time this query is run.

SAP HANA supports window functions:
select DEVICEID,
TIMERSTART,
lag(TIMERSTART) over (partition by DEVICEID order by TIMERSTART) as previous_start
from CLIENT_USAGE
Then you can wrap this in parentheses and manipulate the data to your hearts' content

Related

Limit result rows for minimal time intervals for PostgreSQL

Background: I am running TeslaMate/Grafana for monitoring my car status, one of the gauges plots the battery level fetched from database. My server is located remotely and running in a Dock from an old NAS, so both query performance and network overhead matters.
I found the koisk page frequently hangs and by investigation, it might caused by the query -- two of the plots returns 10~100k rows of results from database. I want to limit the number of rows returned by SQL queries, as the plots certainly don't have that much precision for drawing such detailed intervals.
I tried to follow this answer and use row_number() to pop only 100-th rows of results, but more complicated issues turned up, that is, the time intervals among rows are not consistent.
The car has 4 status, driving / online / asleep / offline.
If the car is at driving status, the time interval could be less than 200ms as the car pushes the status whenever it has new data.
If the car is at online status, the time interval could be several minutes as the system actively fetches the status from the car.
Even worse, if the system thinks the car is going to sleep and need to stop fetching status (to avoid preventing the car to sleep), the interval could be 40 minutes maximum depend on settings.
If the car is in asleep/offline status, no data is recorded at all.
This obviously makes skipping every n-th rows a bad idea, as for case 2-4 above, lots of data points might missing so that Grafana cannot plot correct graph representing the battery level at satisfactory precision.
I wonder if there's any possible to skip the rows by time interval from a datetime field rather than row_number() without much overhead from the query? i.e., fetch every row with minimal 1000ms from the previous row.
E.g., I have following data in the table, I want the rows returned are row 1, 4 and 5.
row date
[1] 1610000001000
[2] 1610000001100
[3] 1610000001200
[4] 1610000002000
[5] 1610000005000
The current (problematic) method I am using is as follows:
SELECT $__time(t.date), t.battery_level AS "SOC [%]"
FROM (
SELECT date, battery_level, row_number() OVER(ORDER BY date ASC) AS row
FROM (
SELECT battery_level, date
FROM positions
WHERE car_id = $car_id AND $__timeFilter(date)
UNION ALL
SELECT battery_level, date
FROM charges c
JOIN charging_processes p ON p.id = c.charging_process_id
WHERE $__timeFilter(date) AND p.car_id = $car_id) AS data
ORDER BY date ASC) as t
WHERE t.row % 100 = 0;
This method clearly gives problem that only returns alternate rows instead of what I wanted (given the last row reads t.row % 2 = 0)
PS: please ignore the table structures and UNION from the sample code, I haven't dig deep enough to the tables which could be other tweaks but irrelevant to this question anyway.
Thanks in advance!
You can use a recursive CTE:
WITH RECURSIVE rec(cur_row, cur_date) AS (
(
SELECT row, date
FROM t
ORDER BY date
LIMIT 1
)
UNION ALL
(
SELECT row, date
FROM t
JOIN rec
ON t.date >= cur_date + 1000
ORDER BY t.date
LIMIT 1
)
)
SELECT *
FROM rec;
cur_row
cur_date
1
1610000001000
4
1610000002000
5
1610000005000
View on DB Fiddle
Using a function instead would probably be faster:
CREATE OR REPLACE FUNCTION f() RETURNS SETOF t AS
$$
DECLARE
row t%ROWTYPE;
cur_date BIGINT;
BEGIN
FOR row IN
SELECT *
FROM t
ORDER BY date
LOOP
IF row.date >= cur_date + 1000 OR cur_date IS NULL
THEN
cur_date := row.date;
RETURN NEXT row;
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT *
FROM f();
row
date
1
1610000001000
4
1610000002000
5
1610000005000

Executing a Aggregate function within a case without Group by

I am trying to assign a specific code to a client based on the number of gifts that they have given in the past 6 months using a CASE. I am unable to use WITH (screenshot) due to the limitations of the software that I am creating the query in. It only allows for select functions. I am unsure how to get a distinct count from another table (transaction data) and use that as parameters in the CASE I have currently built (based on my client information table). Does anyone know of any workarounds for this? I am unable to GROUP BY clientID at the end of my query because not all of my columns are aggregate, and I only need to GROUP BY clientID for this particular WHEN statement in the CASE. I have looked into the OVER() clause, but I am needing my date range that I am evaluating to be dynamic (counting transactions over the last six months), and the amount of rows that I would be including is variable, as the transaction count month to month varies. Also, the software that I am building this in does not recognize the PARTITIONED BY parameter of the over clause.
Any help would be great!
EDIT:
it is not letting me attach an image... -____- I have added the two sections of code that I am looking for assistance with!
WITH "6MonthGIftCount" (
"ConstituentID"
,"GiftCount"
)
AS (
SELECT COUNT(DISTINCT "GiftView"."GiftID" FROM "GiftView" WHERE MONTHS_BETWEEN("GiftView"."GiftDate", getdate()) <= 6 GROUP BY "GiftView"."ConstituentID")
SELECT...CASE
WHEN "6MonthGiftCount"."GiftCount" >= 4
THEN 'A010'
)
Perform your grouping/COUNT(1) in a subquery to obtain the total # of donations by ConstituentID, then JOIN this total into your main query that uses this new column to perform its CASE statement.
select
hist.*,
case when timesDonated > 5 then 'gracious donor'
when timesDonated > 3 then 'repeated donor'
when timesDonated >= 1 then 'donor'
else null end as donorCode
from gifthistory hist
left join ( /* your grouping subquery here, pretending to be a new table */
select
personID,
count(1) as timesDonated
from gifthistory i
WHERE abs(months_between(giftDate, sysdate)) <= 6
group by personid ) grp on hist.personid = grp.personID
order by 1;
*Naturally, syntax changes will vary by DB; you didn't specify which it was based on, but you should be able to use this template with whichever you utilize. This works in both Oracle and SQL Server after tweaking the month calculation appropriately.

sql row_number stay the same unless a criteria is met

First time post here, and I've done a bunch of searches to find this but don't know the terminology to search for to begin with. I have a table in SQL Server 2012 containing timesheet data with these columns: Name, ID, ENTEREDONDTM, EVENTDATE, STARTDTM, ENDDTM, STARTREASON, ENDREASON
I'm trying to do a row_number where the value in row_number stays the same unless StartReason = 'newShift' in which case I would like for it to increase by 1.
My end goal is to find a total shift length per shift and I know how to do those calculations based on startdtm and enddtm, but there is no current column with a shiftID for me to group by.
You can use Rank () windowed function, partitioned by StartReason and add +1 (to reserve the first).
Before use this value, you can use a case to compare the value.
Exemple: case StartReason when 'newShift' then 1 else Rank () over (Partition by StartReason ) +1

Get a Row if within certain time period of other row

I have a SQL statement that I am currently using to return a number of rows from a database:
SELECT
as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE
(ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
I am wanting to do a query that will get the row with the oldest DateScanned from this query and also get another row from the database if there was one that was within a certain period of time from this row (say 5 seconds for an example). The oldest record would be relatively simple by selecting the first record in a descending sort, but how would I also get the second record if it was within a certain time period of the first?
I know I could do this process with multiple queries, but is there any way to combine this process into one query?
The database that I am using is SQL Server 2008 R2.
Also please note that the DateScanned times are just placeholders and I am taking care of that in the application that will be using this query.
Here is a fairly general way to approach it. Get the oldest scan date using min() as a window function, then use date arithmetic to get any rows you want:
select t.* -- or whatever fields you want
from (SELECT as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID,
min(DateScanned) over () as minDateScanned, DateScanned
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
) t
where datediff(second, minDateScanned, DateScanned) <= 5;
I am not really sure of sql server syntax, but you can do something like this
SELECT * FROM (
SELECT
TOP 2
as1.AssetTagID,
as1.TagID,
as1.CategoryID,
as1.Description,
as1.HomeLocationID,
as1.ParentAssetTagID ,
ar.DateScanned,
LAG(ar.DateScanned) OVER (order by ar.DateScanned desc) AS lagging
FROM
Assets AS as1
INNER JOIN AssetsReads AS ar
ON as1.AssetTagID = ar.AssetTagID
WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
ORDER BY
ar.DateScanned DESC
)
WHERE
lagging IS NULL or DateScanned - lagging < '5 SECONDS'
I have tried to sort the results by DateScanned desc and then just the top most 2 rows. I have then used the lag() function on DateScanned field, to get the DateScanned value for the previous row. For the topmost row the DateScanned shall be null as its the first record, but for the second one it shall be value of the first row. You can then compare both of these values to determine whether you wish to display the second row or not
more info on the lagging function: http://blog.sqlauthority.com/2011/11/15/sql-server-introduction-to-lead-and-lag-analytic-functions-introduced-in-sql-server-2012/

Teradata - Max value of a dataset with corresponding date

This is probably obvious, I just can't seem to get it to work right. Let's say I have a table of various servers and their CPU percentages for every day for the past year. I want to basically say:
"for every server name, show me the max CPU value that this server hit (from this dataset) and the corresponding date that it happened on"
So ideally I would get a result like:
server1 52.34% 3/16/2012
server2 48.76% 4/15/2012
server3 98.32% 6/16/2012
etc..
When I try to do this like so, I can't use a group by or else it just shows me every date:
select servername, date, max(cpu) from cpu_values group by 1,2 order by 1,2;
This of course just gives me every server and every date.. Sub-query? Partition by? Any assistance would be appreciated!
You can use the row_number() OLAP window function:
select servername
, cpu
, date
from cpu_values
qualify row_number() over (partition by servername
order by cpu desc) = 1
Notice that you do not need a GROUP BY or ORDER BY clause. The PARTITION clause is similar to a GROUP BY and the ORDER BY clause sorts the rows within each partition (in this case by descending cpu). The "=1" part selects the single row that satisfies the condition.
A subquery would be the simplest solution:
SELECT
S.Name, Peak.PeakUsage, MIN(S.Date) AS Date
FROM
ServerHistory AS S
INNER JOIN
(
SELECT
ID, MAX(CPUUsage) AS PeakUsage
FROM
ServerHistory
WHERE
Date BETWEEN X AND Y
GROUP BY
ID
) AS Peak ON S.ID = Peak.ID
GROUP BY
S.Name, Peak.PeakUsage
P.S., next time around, you may want to tag with "SQL". There are relatively few Teradata people out there, but plenty who can help with basic SQL questions.