PowerBI Get Previous row value according to filters - powerpivot

I have a table with different objects that evolve over time. An object is identified by object_number and its evolution is tracked with object_line_number. Every evolution of the object has a status.
I want to calculate the time elapsed between certain statuses.
Below is my table for one object_number "us1":
In yellow are the rows containing the starting date. They are found if (status_id = 0 and (old_status <> 0 or object_line_number = 1) and emergency_level = 1).
In green are the rows containing the ending date. They are found if (status_id in (2,3,4,5) and old_status = 0).
The column old_status does not exist in the table. It is the status of the previous row (according to object_line_number). I am retrieving it thanks to the following measure:
old_status =
CALCULATE (
    MAX ( fact_object[status_id] ),
    FILTER (
        ALL ( fact_object ),
        fact_object[object_line_number]
            = IF (
                fact_object[object_line_number] = 1,
                fact_object[object_line_number],
                MAX ( fact_object[object_line_number] ) - 1
            )
    ),
    VALUES ( fact_object[object_number] )
)
I am in DirectQuery mode, so a lot of functions are not available for calculated columns, which is why I am using measures.
Once that is done, I then want to get, for every green row, the date_modification of the previous yellow row.
In this example, the result would be 4/4 and then 1, so that I can calculate the time difference between the date_modification of the current green row and the date_modification of the previous yellow row.
So I was thinking of adding a new column named date_received, which is the date_modification of the previous yellow row.
From there, I just have to keep only the green rows and calculate the difference between date_modification and date_received.
The final calculation I actually want is this:
Result = (number of green rows whose date difference between date_modification and date_received is <= 15 min) / (number of green rows for which DAY(date_modification) = DAY(date_received))
But I don't know how to do it.
In the same spirit as the old_status measure, I have tried this:
date_received =
CALCULATE (
    MAX ( fact_object[date_modification] ),
    FILTER (
        ALL ( fact_object ),
        ( fact_object[object_line_number] = MAX ( fact_object[object_line_number] ) - 1 )
            && MY OTHER FILTERS
    ),
    VALUES ( fact_object[object_number] )
)
But it didn't succeed.
In SQL, the equivalent would be like this:
SELECT
    SUM(CASE WHEN (DATEDIFF(MINUTE, T.date_received, T.date_planification) <= 15) THEN 1 ELSE 0 END) /
    SUM(CASE WHEN (DAY(T.date_received) = DAY(T.date_planification)) THEN 1 ELSE 0 END) AS result
FROM (
    SELECT *,
        T.status_id AS current_status,
        LAG(T.date_modification) OVER (PARTITION BY T.object_number ORDER BY T.object_line_number) AS date_received,
        T.date_modification AS date_planification
    FROM (
        SELECT *,
            LAG(status_id) OVER (PARTITION BY object_number ORDER BY object_line_number) AS old_status
        FROM dbo.fact_object
    ) AS T
    WHERE ((T.status_id = 0 AND (T.old_status <> 0 OR T.object_line_number = 1) AND T.emergency_level = 1)
        OR (T.old_status = 0 AND T.status_id IN (2,3,4,5))) --974
) AS T
WHERE old_status = 0
(Well, maybe there is a better way to do it in SQL than I've done.)
How can I achieve this?

In this case, I would first sort the table by Object, then by Object Line Number (or Date Modified, but these appear not to change for each row).
Duplicate the table
Table1 - Add an index starting at 1
Table2 - Add an index starting at 0
From Table1, merge it with table2, using the new index fields
Expand the merged Status ID (rename it to NextStatusID) and Object (rename it to NextObject) fields.
You can now create a new conditional column to compare the two status fields. E.g. Status 2 = If Object = NextObject then [NextStatusID] else null
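Since the question is in DirectQuery mode against a SQL source, the same "next row" lookup can also be done in the source with LEAD, the SQL counterpart of the offset-index merge above. A minimal sketch, reusing the fact_object columns from the question:
-- Sketch only: LEAD returns the following row's status within the same object,
-- which is what the offset-index merge produces as NextStatusID.
SELECT
    object_number,
    object_line_number,
    status_id,
    LEAD(status_id) OVER (PARTITION BY object_number
                          ORDER BY object_line_number) AS NextStatusID
FROM dbo.fact_object;
Partitioning by object_number removes the need for the Object = NextObject comparison, because the lookup never crosses from one object to the next.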

Related

Why can't I show only the row that has the max value in SQL Server?

This is my query:
WITH TABLE AS (
    SELECT DISTINCT
        STJH.[META_CODE_ORIGINE] AS Salle_TMD_JOUR_HISTORIQUE_ID,
        STJH.[STJH_DATE_ACTION],
        STJH.[STJH_TYPE_ACTION],
        STH.ST_ID,
        STJH.[STJ_JOUR],
        STJH.[STJ_HEURE_DEBUT],
        STJH.[STJ_HEURE_FIN],
        STH.[META_CODE_ORIGINE] AS SALLE_TMD_HISTORIQUE_ID,
        STH.[ST_DATE_DEBUT],
        STH.[ST_DATE_FIN],
        STH.[STH_DATE_ACTION],
        MAX(STH.STH_DATE_ACTION) OVER (PARTITION BY STH.[META_CODE_ORIGINE]) AS max_date_action
    FROM [BI_EASILY].[bloc].[D_TMD_JOUR_HISTORIQUE] STJH
    LEFT JOIN [BI_EASILY].[bloc].[D_TMD_HISTORIQUE] STH ON STJH.[ST_ID] = STH.[ST_ID]
    WHERE STJH.[STJH_TYPE_ACTION] <> 3
      AND STJH.ST_ID = 37
      AND STJ_JOUR = 1
)
SELECT * FROM TABLE WHERE TABLE.max_date_action <= STJH_DATE_ACTION
This is my result (sorry, I didn't manually copy a lot of the data).
I want to show only the last row, the one that has the max value of STJH_DATE_ACTION. When I remove my filter "WHERE STJ_JOUR = 1" (Monday), I still want to show only one row having STJ_JOUR_VALUE = 1. I tried to do this with the condition "where max_date_action <= STJH_DATE_ACTION", but it doesn't work. Can you help me please?
The condition (max_date_action <= stjh_date_action) holds for all the records in the result, because you partition the data by (STH.[META_CODE_ORIGINE]).
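If the goal is to keep only the row with the latest STJH_DATE_ACTION, one common pattern is ROW_NUMBER() per group instead of a windowed MAX. A minimal sketch, reusing the question's table and filters and assuming one row per STJ_JOUR is what is wanted:
-- Sketch only: keep the single latest row (max STJH_DATE_ACTION) per STJ_JOUR.
-- Table, columns and filters come from the question; the PARTITION BY column is
-- an assumption -- swap it for whatever should define "one row per group".
WITH ranked AS (
    SELECT STJH.*,
           ROW_NUMBER() OVER (PARTITION BY STJH.[STJ_JOUR]
                              ORDER BY STJH.[STJH_DATE_ACTION] DESC) AS rn
    FROM [BI_EASILY].[bloc].[D_TMD_JOUR_HISTORIQUE] STJH
    WHERE STJH.[STJH_TYPE_ACTION] <> 3
      AND STJH.[ST_ID] = 37
)
SELECT *
FROM ranked
WHERE rn = 1;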

Complex INSERT INTO SELECT statement in SQL

I have two tables in SQL. I need to add rows from one table to another. The table to which I add rows looks like:
timestamp, deviceID, value
2020-10-04, 1, 0
2020-10-04, 2, 0
2020-10-07, 1, 1
2020-10-08, 2, 1
But I only have to add a row to this table if the value for a particular deviceID has changed compared to its last timestamp.
For example, the record "2020-10-09, 2, 1" won't be added because the value hasn't changed for deviceID = 2 since the last timestamp, "2020-10-08". At the same time, the record "2020-10-09, 1, 0" will be added, because the value for deviceID = 1 changed to 0.
I have a problem with writing a query for this logic. I have written something like this:
insert into output
select *
from values
where value != (
select value
from output
where timestamp = (select max(timestamp) from output) and output.deviceID = values.deviceID)
Of course it doesn't work because of the last part of the query "and output.deviceID = values.deviceID".
Actually the problem is that I don't know how to take the value from the "output" table where deviceID is the same as in the row that I am trying to insert.
I would use order by and something to limit to one row:
insert into output
select *
from values v
where v.value <> (select o2.value
                  from output o2
                  where o2.deviceId = v.deviceId
                  order by o2.timestamp desc
                  fetch first 1 row only
                 );
The above is standard SQL. Specific databases may have other ways to express this, such as limit or top (1).
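For example, with LIMIT the same correlated subquery would look roughly like this (a sketch only, reusing the question's "values" and "output" tables):
-- Sketch: same logic with LIMIT instead of FETCH FIRST (PostgreSQL/MySQL-style syntax).
insert into output
select *
from values v
where v.value <> (select o2.value
                  from output o2
                  where o2.deviceId = v.deviceId
                  order by o2.timestamp desc
                  limit 1);
Note that if a deviceID has no earlier row in output, the subquery returns NULL and the <> comparison filters the row out, so a COALESCE or NOT EXISTS wrapper may be needed if brand-new devices should still be inserted.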

Fetch rows based on condition

I am using PostgreSQL on Amazon Redshift.
My table is :
drop table APP_Tax;
create temp table APP_Tax(APP_nm varchar(100),start timestamp,end1 timestamp);
insert into APP_Tax values('AFH','2018-01-26 00:39:51','2018-01-26 00:39:55'),
('AFH','2016-01-26 00:39:56','2016-01-26 00:40:01'),
('AFH','2016-01-26 00:40:05','2016-01-26 00:40:11'),
('AFH','2016-01-26 00:40:12','2016-01-26 00:40:15'), --row x
('AFH','2016-01-26 00:40:35','2016-01-26 00:41:34') --row y
Expected output:
'AFH','2016-01-26 00:39:51','2016-01-26 00:40:15'
'AFH','2016-01-26 00:40:35','2016-01-26 00:41:34'
I have to compare the end time of each record with the start time of the next record; if the time difference is less than 10 seconds, take the next record's end time, and keep going like this until the last (final) record.
I.e. datediff(seconds, '2018-01-26 00:39:55', '2018-01-26 00:39:56') is < 10 seconds.
I tried this :
SELECT a.app_nm
,min(a.start)
,max(b.end1)
FROM APP_Tax a
INNER JOIN APP_Tax b
ON a.APP_nm = b.APP_nm
AND b.start > a.start
WHERE datediff(second, a.end1, b.start) < 10
GROUP BY 1
It works, but it doesn't return row y when the condition fails.
There are two reasons that row y is not returned:
b.start > a.start means that a row will never join with itself
The GROUP BY will return only one record per APP_nm value, yet all rows have the same value.
However, there are further logic issues that the query will not handle successfully. For example, how does it know when a "new" session begins?
The logic you seek can be achieved in normal PostgreSQL with the help of the DISTINCT ON clause, which returns one row per value of a specific column. However, DISTINCT ON is not supported by Redshift.
Some potential workarounds: DISTINCT ON like functionality for Redshift
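For reference, the DISTINCT ON syntax in plain PostgreSQL looks like this; a sketch against the APP_Tax table from the question, keeping one row per APP_nm (it illustrates the clause only, not the full sessionisation logic):
-- Plain PostgreSQL only (not valid on Redshift): one row per APP_nm, here the earliest by start.
SELECT DISTINCT ON (APP_nm)
       APP_nm, start, end1
FROM APP_Tax
ORDER BY APP_nm, start;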
The output you seek would be trivial using a programming language (which can loop through results and store variables) but is difficult to apply to an SQL query (which is designed to operate on rows of results). I would recommend extracting the data and running it through a simple script (eg in Python) that could then output the Start & End combinations you seek.
This is an excellent use-case for a Hadoop Streaming function, which I have successfully implemented in the past. It would take the records as input, then 'remember' the start time and would only output a record when the desired end-logic has been met.
Sounds like what you are after is "sessionisation" of the activity events. You can achieve that in Redshift using window functions.
The complete solution might look like this:
SELECT
    start AS session_start,
    session_end
FROM (
    SELECT
        start,
        end1,
        lead(end1, 1) OVER (ORDER BY end1) AS session_end,
        session_boundary
    FROM (
        SELECT
            start,
            end1,
            CASE WHEN session_switch = 0 AND reverse_session_switch = 1
                 THEN 'start'
                 ELSE 'end' END AS session_boundary
        FROM (
            SELECT
                start,
                end1,
                CASE WHEN datediff(seconds, end1, lead(start, 1) OVER (ORDER BY end1 ASC)) > 10
                     THEN 1
                     ELSE 0 END AS session_switch,
                CASE WHEN datediff(seconds, lead(end1, 1) OVER (ORDER BY end1 DESC), start) > 10
                     THEN 1
                     ELSE 0 END AS reverse_session_switch
            FROM app_tax
        ) AS sessioned
        WHERE session_switch != 0 OR reverse_session_switch != 0
        UNION
        SELECT
            start,
            end1,
            'start'
        FROM (
            SELECT
                start,
                end1,
                row_number() OVER (PARTITION BY APP_nm ORDER BY end1 ASC) AS row_num
            FROM APP_Tax
        ) AS with_row_number
        WHERE row_num = 1
    ) AS with_boundary
) AS with_end
WHERE session_boundary = 'start'
ORDER BY start ASC;
Here is the breakdown (by subquery name):
sessioned - we first identify the switch rows (out and in), i.e. the rows where the gap between one record's end and the next record's start exceeds the limit.
with_row_number - just a patch to extract the first row, because there is no switch into it (there is an implicit switch that we record as 'start').
with_boundary - then we identify the rows where specific switches occur. If you run the subquery by itself, it is clear that a session starts when session_switch = 0 AND reverse_session_switch = 1, and ends when the opposite occurs. All other rows are in the middle of sessions, so they are ignored.
with_end - finally, we pull the end time of the following 'end' row onto each 'start' row (thus defining the session), and remove the 'end' rows.
The with_boundary subquery answers your initial question, but typically you'd want to combine those rows to get the final result, which is the session duration.

Return overlapping date records in SQL

I used the following query to fetch the overlapping records in SQL:
SELECT QUOTE_ID,FUNCTION_ID,FUNCTION_DT,FUNC_SPACE_ID,FN_START_TIME,FN_END_TIME,DATE_AUTH_LEVEL
FROM R_13_ALL_RESERVED A
WHERE
A.FUNC_SPACE_ID = '401-ZFU-52'
AND A.FUNCTION_DT = TO_DATE('09/03/2015','MM/DD/YYYY')
AND EXISTS ( SELECT 'X'
FROM R_13_ALL_RESERVED B
WHERE A.PROPERTY = B.PROPERTY
AND A.FUNCTION_DT = B.FUNCTION_DT
AND A.FUNCTION_ID <> B.FUNCTION_ID
AND ( ( A.FN_START_TIME > B.FN_START_TIME
AND A.FN_START_TIME < B.FN_END_TIME)
OR ( B.FN_START_TIME > A.FN_START_TIME
AND B.FN_START_TIME < A.FN_END_TIME)
OR ( A.FN_START_TIME = B.FN_START_TIME
AND A.FN_END_TIME = B.FN_END_TIME)
)
)
But even though the dates are not overlapping, it still returns the records as overlapping.
Am I missing something here?
Also, if the date records overlap, I need to compare the count of function_id records with DATE_AUTH_LEVEL: if 2 function_id records overlap, the count of function_id would be 2, and if DATE_AUTH_LEVEL is 1, such a record should be in the result set.
Please find the data set in SQLFiddle
http://sqlfiddle.com/#!9/95874/1
Desired output: the SQL should return overlapping FN_START_TIME and FN_END_TIME for a function_space_id and its function_dt.
In the provided example, rows 5 and 6 overlap for the function space id '401-ZFU-12' and function_dt 'August, 15 2015', and all the others do not overlap.
The simplest predicate (where clause condition) for detecting the overlap of two ranges is to compare the start of the first range with the end of the 2nd range, and the start of the 2nd range with the end of the first range:
WHERE R1.Start_Date <= R2.End_Date
AND R2.Start_Date <= R1.End_Date
As you can see, each of the two inequalities looks at a start and end value from separate records (R1 and R2, then R2 and R1, respectively). All that remains is to add the conditions that will correlate the records, and to ensure that you aren't comparing a row to itself. So, if you want to find all Common_IDs that have Distinct_IDs with overlapping date ranges:
select *
from Your_Table R1
where exists (select 1
              from Your_Table R2
              where R1.Common_ID = R2.Common_ID
                and R1.Distinct_ID <> R2.Distinct_ID
                and R1.Start_Date <= R2.End_Date
                and R2.Start_Date <= R1.End_Date)
If there is no Distinct_ID to use, you can use R1.rowid <> R2.rowid in place of R1.Distinct_ID <> R2.Distinct_ID
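Applied to the table in the question, the pattern would look roughly like this (a sketch only; it correlates on FUNC_SPACE_ID and FUNCTION_DT, which matches the desired output described above, and uses FUNCTION_ID to avoid comparing a row to itself):
-- Sketch: overlap test on FN_START_TIME/FN_END_TIME per FUNC_SPACE_ID and FUNCTION_DT.
SELECT A.QUOTE_ID, A.FUNCTION_ID, A.FUNCTION_DT, A.FUNC_SPACE_ID,
       A.FN_START_TIME, A.FN_END_TIME, A.DATE_AUTH_LEVEL
FROM R_13_ALL_RESERVED A
WHERE EXISTS (SELECT 1
              FROM R_13_ALL_RESERVED B
              WHERE A.FUNC_SPACE_ID = B.FUNC_SPACE_ID
                AND A.FUNCTION_DT   = B.FUNCTION_DT
                AND A.FUNCTION_ID  <> B.FUNCTION_ID
                AND A.FN_START_TIME <= B.FN_END_TIME
                AND B.FN_START_TIME <= A.FN_END_TIME)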
Here is an approach to troubleshooting the issue on your end.
My first suspicion is that the results of your EXISTS clause are too broad, and are thus unexpectedly returning rows for every record matching the outer clause. Likely there are rows that do not fall on the desired date or space id but share one component of their interval with your inner criteria.
Inspect the results of the inner select statement (the one within the EXISTS clause) for an example row, exchanging all the 'A'-aliased values with actual values from one of the returned rows you did not expect to receive.
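For example, the inner query with one row's values hard-coded might look like this (a sketch; every literal below is a placeholder to be replaced with the values from an unexpected row in your result):
-- Sketch: the EXISTS subquery with the outer A row's values substituted as literals.
-- All literal values here are hypothetical placeholders.
SELECT B.FUNCTION_ID, B.FN_START_TIME, B.FN_END_TIME
FROM R_13_ALL_RESERVED B
WHERE B.PROPERTY     = 'THE_A_ROW_PROPERTY'
  AND B.FUNCTION_DT  = TO_DATE('09/03/2015', 'MM/DD/YYYY')
  AND B.FUNCTION_ID <> 12345
  AND (   (TO_DATE('09/03/2015 10:00', 'MM/DD/YYYY HH24:MI') > B.FN_START_TIME
       AND TO_DATE('09/03/2015 10:00', 'MM/DD/YYYY HH24:MI') < B.FN_END_TIME)
       OR (B.FN_START_TIME > TO_DATE('09/03/2015 10:00', 'MM/DD/YYYY HH24:MI')
       AND B.FN_START_TIME < TO_DATE('09/03/2015 12:00', 'MM/DD/YYYY HH24:MI'))
       OR (B.FN_START_TIME = TO_DATE('09/03/2015 10:00', 'MM/DD/YYYY HH24:MI')
       AND B.FN_END_TIME   = TO_DATE('09/03/2015 12:00', 'MM/DD/YYYY HH24:MI')))
If the subquery still returns rows for dates or spaces you did not expect, that tells you which part of the inner criteria is too broad.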
Additionally, you can inspect what I think would be a semi join in the execution profile to see what the join criteria are. If you expect it to be filtered by a constant for 'FUNC_SPACE_ID' of '401-ZFU-52', you will discover that it is not.

SQL query determine stopped time in a range

I have to determine the stopped time of a vehicle that sends its status data back to the server every 30 seconds; this data is stored in a database table.
The fields of a status record consist of (vehicleID, ReceiveDate, ReceiveTime, Speed, Location).
Now what I want to do is determine each stop interval, from the point at which the vehicle speed drops to zero until the status shows the vehicle moving again, and so on for the next stop.
For example, on a given day a given vehicle may have 10 stops, and I must determine the duration of each with a query.
The result can be like this:
id  Recvdate    Rtime  Duration
1   2010-05-01  8:30   45min
1   2010-05-01  12:21  3hour
This is an application of window functions (called analytic functions in Oracle).
Your goal is to assign a "block number" to each sequence of stops. That is, all stops in a sequence (for a vehicle) will have the same block number, and this will be different from all other sequences of stops.
Here is a way to assign the block number:
Create a speed flag that says 1 when speed > 0 and 0 when speed = 0.
Enumerate all the records where the speed flag = 1. These are "blocks".
Do a self join to put each flag = 0 in a block (this requires grouping and taking the max blocknum).
Summarize by duration or however you want.
The following code is a sketch of what I mean. It won't solve your problem completely, because you have not said how to handle day breaks or what information you want to summarize, and it has an off-by-one error (in each sequence of stops it includes the previous non-stop record, if any).
with vd as (
    select vd.*,
           (case when SpeedFlag = 1
                 then ROW_NUMBER() over (partition by id, SpeedFlag order by rtime)
            end) as blocknum
    from (
        select vd.*, (case when speed = 0 then 0 else 1 end) as SpeedFlag
        from vehicaldata vd
    ) vd
)
select id, blocknum, COUNT(*) as numrecs, SUM(duration) as duration
from (
    select vd.id, vd.rtime, vd.duration, MAX(vdprev.blocknum) as blocknum
    from vd
    left outer join vd vdprev
        on vd.id = vdprev.id
        and vd.rtime > vdprev.rtime
    group by vd.id, vd.rtime, vd.duration
) vd
group by id, blocknum