I have been working on this SQL code for a bit and I cannot get it to display the way I want. I have an operation where we send parts outside of our business, but there is no time stamp for when that operation was sent out.
I am taking the previous operation's last labor date and the purchase order creation date to try to find out how long it takes that department to issue a purchase order.
I have tried adding LAST_VALUE to my query. I have even played with LAG and couldn't get anything but errors.
SELECT
JobOpDtl.JobNum,
JobOpDtl.OprSeq,
JobOpDtl.OpDtlDesc,
LastValue.ClockInDate,
LastValue.LastValue
FROM Erp.JobOpDtl
LEFT OUTER JOIN Erp.LaborDtl ON
LaborDtl.JobNum = JobOpDtl.JobNum
and LaborDtl.OprSeq = JobOpDtl.OprSeq
LEFT OUTER JOIN (
Select
LaborDtl.JobNum,
LaborDtl.OprSeq,
MAX(LaborDtl.ClockInDate) as ClockInDate,
LAST_VALUE (LaborDtl.ClockInDate) OVER (PARTITION BY OprSeq ORDER BY JobNum) as LastValue
FROM Erp.LaborDtl
GROUP BY
LaborDtl.JobNum,
LaborDtl.OprSeq,
LaborDtl.ClockInDate
) as LastValue ON
JobOpDtl.JobNum = LastValue.JobNum
and JobOpDtl.OprSeq = LastValue.OprSeq
WHERE JobOpDtl.JobNum = 'PA8906'
GROUP BY
JobOpDtl.JobNum,
LastValue.OprSeq,
JobOpDtl.OpDtlDesc,
JobOpDtl.OprSeq,
LastValue.ClockInDate,
LastValue.LastValue
No errors, it just isn't displaying the way I want.
I would like it to display each OprSeq along with the previous OprSeq's last transaction date.
The basic function you want is LAG (as you suggested), but you need to wrap it in a COALESCE. Here is some sample code that illustrates the concept:
SELECT * INTO #Jobs
FROM (VALUES ('P1','Step1', '2019-04-01'), ('P1','Step2', '2019-04-02')
, ('P1','Step3', '2019-04-03'), ('P1','Step4', NULL),
('P2','Step1', '2019-04-01'), ('P2','Step2', '2019-04-03')
, ('P2','Step3', '2019-04-06'), ('P2','Step4', NULL)
) as JobDet(JobNum, Descript, LastDate)
SELECT *
, COALESCE( LastDate, LAG(LastDate,1)
OVER(PARTITION BY JobNum
ORDER BY COALESCE(LastDate,GETDATE()))) as LastValue
FROM #Jobs
ORDER BY JobNum, Descript
DROP TABLE #Jobs
To apply it to your specific problem, I'd suggest using a COMMON TABLE EXPRESSION that replaces LastValue and using that instead of the raw table for your queries.
Your example picture doesn't match any tables you reference in your code (it would help us significantly if you included code that created temp tables matching those referenced in your code) so this is a guess, but it will be something like this:
;WITH cteJob as (
SELECT JobNum, OprSeq, OpDtlDesc, ClockInDate
, COALESCE( LastValue, LAG(LastValue,1)
OVER(PARTITION BY JobNum
ORDER BY COALESCE(LastValue,GETDATE()))) as LastValue
FROM Erp.JobOpDtl
) SELECT *
FROM cteJob as J
LEFT OUTER JOIN Erp.LaborDtl as L
on J.JobNum = L.JobNum
AND J.OprSeq = L.OprSeq
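If the clock-in dates actually live in Erp.LaborDtl (as in your original query) rather than on JobOpDtl, a variation that first takes the last clock-in per operation and then uses LAG to pull the previous operation's value might be closer to what you described. This is only a sketch built from the table and column names in your post, so treat it as a guess:
;WITH cteLastLabor AS (
    SELECT JobNum, OprSeq, MAX(ClockInDate) AS LastClockIn
    FROM Erp.LaborDtl
    GROUP BY JobNum, OprSeq
)
SELECT
    op.JobNum,
    op.OprSeq,
    op.OpDtlDesc,
    ll.LastClockIn,
    -- last labor transaction of the previous operation on the same job
    LAG(ll.LastClockIn, 1) OVER (PARTITION BY op.JobNum ORDER BY op.OprSeq) AS PrevOpLastClockIn
FROM Erp.JobOpDtl AS op
LEFT OUTER JOIN cteLastLabor AS ll
    ON ll.JobNum = op.JobNum
    AND ll.OprSeq = op.OprSeq
WHERE op.JobNum = 'PA8906'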
BTW, if you clean up your question to provide a better example of your data (i.e. SELECT INTO statements like the ones at the start of my answer, producing tables that correspond to the tables in your code, instead of an image of an Excel file), I might be able to get you closer to what you need. Hopefully this is enough to get you on the right track; it's the best I can do with what you've provided so far.
I have data containing measurements from crash impact tests.
When the object is not moving, the measurements contain many rows with the same value; when the object is moving and shaking, they can register quite big fluctuations.
Problem: I have hundreds of millions of lines of this data, and to use it in reporting (mostly plotting) I have to find a way to simplify everything and especially reduce the number of records.
Sometimes I have exactly the same value (ChannelValue) 20 times in a row.
An example of the data is the following:
idMetaData;TimeStamp;SampleNumber;ChannelValue
3;0,5036500;12073;0.4573468975
3;0,5037000;12074;0.4418814526
3;0,5037500;12075;0.4109505628
3;0,5038000;12076;0.4109505628
3;0,5038500;12077;0.4264160077
3;0,5038999;12078;0.4573468975
3;0,5039499;12079;0.4573468975
3;0,5039999;12080;0.4109505628
3;0,5040500;12081;0.3336233382
3;0,5041000;12082;0.2408306686
3;0,5041500;12083;0.1789688889
3;0,5042000;12084;0.1789688889
3;0,5042500;12085;0.2253652237
3;0,5042999;12086;0.3026924483
3;0,5043499;12087;0.3645542280
3;0,5044000;12088;0.3954851178
3;0,5044500;12089;0.3645542280
3;0,5045000;12090;0.3026924483
3;0,5045500;12091;0.2253652237
3;0,5046000;12092;0.1635034440
3;0,5046499;12093;0.1325725541
3;0,5046999;12094;0.1480379991
3;0,5047500;12095;0.1789688889
3;0,5048000;12096;0.1944343338
3;0,5048500;12097;0.2098997788
3;0,5049000;12098;0.1944343338
3;0,5049500;12099;0.1635034440
3;0,5049999;12100;0.1171071092
3;0,5050499;12101;0.0861762194
3;0,5051000;12102;0.0707107744
3;0,5051500;12103;0.0707107744
3;0,5052000;12104;0.0861762194
3;0,5052500;12105;0.1171071092
3;0,5053000;12106;0.1635034440
idMetaData;TimeStamp;SampleNumber;ChannelValue
50;0,8799999;19600;-0.7106432894
50;0,8800499;19601;-0.7484265845
50;0,8801000;19602;-0.7232377211
50;0,8801500;19603;-0.6098878356
50;0,8802000;19604;-0.6098878356
50;0,8802500;19605;-0.6476711307
50;0,8802999;19606;-0.7232377211
50;0,8803499;19607;-0.7988043114
50;0,8803999;19608;-0.8617764701
50;0,8804500;19609;-0.8491820384
50;0,8805000;19610;-0.8617764701
50;0,8805500;19611;-0.7988043114
50;0,8806000;19612;-0.8239931749
50;0,8806499;19613;-0.7988043114
50;0,8806999;19614;-0.7736154480
50;0,8807499;19615;-0.6602655625
50;0,8807999;19616;-0.5972934038
50;0,8808500;19617;-0.6602655625
50;0,8809000;19618;-0.7484265845
50;0,8809500;19619;-0.8365876066
50;0,8809999;19620;-0.7862098797
50;0,8810499;19621;-0.8113987432
50;0,8810999;19622;-0.7988043114
50;0,8811499;19623;-0.6980488576
50;0,8812000;19624;-0.7232377211
50;0,8812500;19625;-0.7484265845
50;0,8813000;19626;-0.7232377211
50;0,8813500;19627;-0.8239931749
50;0,8813999;19628;-0.8491820384
50;0,8814499;19629;-0.8617764701
50;0,8814999;19630;-0.8365876066
50;0,8815500;19631;-0.8365876066
50;0,8816000;19632;-0.7988043114
50;0,8816500;19633;-0.8113987432
50;0,8817000;19634;-0.8113987432
50;0,8817499;19635;-0.7736154480
50;0,8817999;19636;-0.7232377211
50;0,8818499;19637;-0.6728599942
50;0,8819000;19638;-0.7232377211
50;0,8819500;19639;-0.7610210163
50;0,8820000;19640;-0.7106432894
50;0,8820500;19641;-0.6602655625
50;0,8820999;19642;-0.6602655625
50;0,8821499;19643;-0.6854544259
50;0,8821999;19644;-0.7736154480
50;0,8822500;19645;-0.8113987432
50;0,8823000;19646;-0.8869653335
50;0,8823500;19647;-0.8743709018
50;0,8824000;19648;-0.7988043114
50;0,8824499;19649;-0.8491820384
50;0,8824999;19650;-0.8239931749
50;0,8825499;19651;-0.8239931749
50;0,8825999;19652;-0.7232377211
50;0,8826500;19653;-0.6854544259
50;0,8827000;19654;-0.6728599942
50;0,8827500;19655;-0.6854544259
50;0,8827999;19656;-0.7232377211
50;0,8828499;19657;-0.7232377211
50;0,8828999;19658;-0.6980488576
50;0,8829499;19659;-0.6980488576
50;0,8830000;19660;-0.7106432894
50;0,8830500;19661;-0.6854544259
50;0,8831000;19662;-0.7484265845
50;0,8831499;19663;-0.7484265845
50;0,8831999;19664;-0.7736154480
50;0,8832499;19665;-0.7610210163
50;0,8832999;19666;-0.7610210163
50;0,8833500;19667;-0.7988043114
50;0,8834000;19668;-0.8617764701
50;0,8834500;19669;-0.9121541970
50;0,8835000;19670;-0.8869653335
50;0,8835499;19671;-0.8743709018
50;0,8835999;19672;-0.9121541970
50;0,8836499;19673;-0.8491820384
50;0,8837000;19674;-0.7988043114
50;0,8837500;19675;-0.7736154480
50;0,8838000;19676;-0.7106432894
50;0,8838500;19677;-0.6980488576
50;0,8838999;19678;-0.7484265845
50;0,8839499;19679;-0.8491820384
50;0,8839999;19680;-0.8491820384
50;0,8840500;19681;-0.7610210163
50;0,8841000;19682;-0.7106432894
50;0,8841500;19683;-0.7232377211
50;0,8842000;19684;-0.7962098797
50;0,8842499;19685;-0.7358321528
50;0,8842999;19686;-0.7232377211
50;0,8843499;19687;-0.7484265845
50;0,8844000;19688;-0.6728599942
50;0,8844500;19689;-0.6854544259
50;0,8845000;19690;-0.7106432894
50;0,8845500;19691;-0.7232377211
50;0,8845999;19692;-0.7862098797
50;0,8846499;19693;-0.7862098797
idMetaData;TimeStamp;SampleNumber;ChannelValue
15;0,3148000;8296;1.5081626404
15;0,3148500;8297;1.5081626404
15;0,3149000;8298;1.5727382554
15;0,3149500;8299;1.5081626404
15;0,3150000;8300;1.4920187367
15;0,3150500;8301;1.4435870254
15;0,3151000;8302;1.4274431217
15;0,3151500;8303;1.5243065442
15;0,3152000;8304;1.4920187367
15;0,3152500;8305;1.5081626404
15;0,3153000;8306;1.4920187367
15;0,3153500;8307;1.5565943516
15;0,3154000;8308;1.5081626404
15;0,3154500;8309;1.5404504479
15;0,3155000;8310;1.5081626404
15;0,3155500;8311;1.5727382554
15;0,3156000;8312;1.5404504479
15;0,3156500;8313;1.3951553142
15;0,3157000;8314;1.4758748329
15;0,3157500;8315;1.4435870254
15;0,3158000;8316;1.4920187367
15;0,3158500;8317;1.4920187367
15;0,3159000;8318;1.5081626404
15;0,3159500;8319;1.4597309292
15;0,3160000;8320;1.4274431217
15;0,3160500;8321;1.4274431217
15;0,3161000;8322;1.4597309292
15;0,3161500;8323;1.5565943516
15;0,3162000;8324;1.5888821591
15;0,3162500;8325;1.5565943516
15;0,3163000;8326;1.5243065442
15;0,3163500;8327;1.5404504479
15;0,3164000;8328;1.5404504479
15;0,3164500;8329;1.5404504479
15;0,3165000;8330;1.5404504479
I want to reduce the number of records by a factor of 10 or 20.
One solution would be to keep the average of every 20 rows, but there is one problem: when there is a peak it will 'evaporate' in the average.
What I need is an average of every 20 rows (ChannelValue), but when a value is a 'peak' (definition: it differs more than 10%, positive or negative, from the last value or two), then for that row keep the peak value instead of the average, and from there continue the averages. This is the intelligence I mean in the title.
I could also use some sort of 'distinct' logic that would also reduce the number of records by a factor of 8 to 10.
I have read about the NTILE function, but this is all new to me.
Partition by idMetaData, order by id (there is an id column which I did not include here).
Thanks so much in advance!
Here's one way. In SQL Server 2012 I'd use LEAD() or LAG(), but since you are on 2008 we can use ROW_NUMBER() with a CTE and then limit on the variation.
declare @test table (idMetaData int, TimeStamp varchar(64), SampleNumber bigint, ChannelValue decimal(16,10))
insert into @test
values
(3,'0,5036500',12073,0.4573468975),
(3,'0,5037000',12074,0.4418814526),
(3,'0,5037500',12075,0.4109505628),
(3,'0,5038000',12076,0.4109505628),
(3,'0,5038500',12077,0.4264160077),
(3,'0,5038999',12078,0.4573468975),
(3,'0,5039499',12079,0.4573468975),
(3,'0,5039999',12080,0.4109505628),
(3,'0,5040500',12081,0.3336233382),
(3,'0,5041000',12082,0.2408306686),
(3,'0,5041500',12083,0.1789688889),
(3,'0,5042000',12084,0.1789688889)
--set the minimum variation you want to keep. Rows that differ from the next row by no more than this will be removed
declare #variation decimal(16,10) = 0.0000000010
--apply an order with row_number()
;with cte as(
select
idMetaData
,TimeStamp
,SampleNumber
,ChannelValue
,row_number() over (partition by idMetadata order by SampleNumber) as RN
from @test),
--self join to itself adding the next row as additional columns
cte2 as(
select
c.*
,c2.TimeStamp as C2TimeStamp
,c2.SampleNumber as C2SampleNumber
,c2.ChannelValue as C2ChannelValue
from cte c
left join cte c2 on c2.idMetaData = c.idMetaData and c2.rn = c.rn + 1)
--only return the results where the variation is met. Change the variation to see this in action
select
idMetaData
,TimeStamp
,SampleNumber
,ChannelValue
from
cte2
where
ChannelValue - C2ChannelValue > @variation or C2ChannelValue is null
This doesn't take an "average" (which would have to be a running average), but it lets you use a variance threshold to say that any consecutive measurements which vary by only n amount are treated as a single measurement. The higher the variance you choose, the more rows will be "removed" or treated equally. It's a way to cluster your points to remove some noise without using something like k-means, which is hard in SQL.
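For what it's worth, on SQL Server 2012 or later the self-join can be replaced by LEAD(), which reads the next row's value directly. A rough sketch of that variant, reusing the same @test data and @variation threshold declared above (untested, shown only to illustrate the idea):
;with cte as(
select
idMetaData
,TimeStamp
,SampleNumber
,ChannelValue
--value of the next sample within the same idMetaData
,lead(ChannelValue) over (partition by idMetaData order by SampleNumber) as NextChannelValue
from @test)
select
idMetaData
,TimeStamp
,SampleNumber
,ChannelValue
from cte
where
ChannelValue - NextChannelValue > @variation or NextChannelValue is null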
Just for fun, I modified a stored procedure which generates dynamic stats for any table/query/measure. It has been tailored here to be stand-alone.
This will generate a series of analytical items for groups of 10 rows; 10 is an arbitrary value.
Just a side note: if there is no true mode, ModeR1 and ModeR2 will represent the range of the series. When ModeR1 = ModeR2, that is the true mode.
dbFiddle
Example
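The query below reads from #YourTable. For a quick test, a minimal setup using the first few sample rows from the question could look like this (the column types here are my assumptions):
create table #YourTable (idMetaData int, TimeStamp varchar(20), SampleNumber bigint, ChannelValue decimal(16,10))
insert into #YourTable values
 (3,'0,5036500',12073,0.4573468975)
,(3,'0,5037000',12074,0.4418814526)
,(3,'0,5037500',12075,0.4109505628)
,(3,'0,5038000',12076,0.4109505628)
,(3,'0,5038500',12077,0.4264160077)
,(3,'0,5038999',12078,0.4573468975)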
;with cteBase as (Select GroupBy = [idMetaData]
,Item = Row_Number() over (Partition By [idMetaData] Order By SampleNumber) / 10
,RowNr = Row_Number() over (Partition By [idMetaData] Order By SampleNumber)
,Measure = ChannelValue
,TimeStamp
,SampleNumber
From #YourTable
),
cteMean as (Select GroupBy,Item,Mean=Avg(Measure),Rows=Count(*),MinRow=min(RowNr),MaxRow=max(RowNr) From cteBase Group By GroupBy,Item),
cteMedn as (Select GroupBy,Item,MedRow1=ceiling(Rows/2.0),MedRow2=ceiling((Rows+1)/2.0) From cteMean),
cteMode as (Select GroupBy,Item,Mode=Measure,ModeHits=count(*),ModeRowNr=Row_Number() over (Partition By GroupBy,Item Order By Count(*) Desc) From cteBase Group By GroupBy,Item,Measure)
Select idMetaData = A.GroupBy
,Bin = A.Item+1
,TimeStamp1 = min(TimeStamp)
,TimeStamp2 = max(TimeStamp)
,SampleNumber1 = min(SampleNumber)
,SampleNumber2 = max(SampleNumber)
,Records = count(*)
,StartValue = sum(case when RowNr=B.MinRow then Measure end)
,EndValue = sum(case when RowNr=B.MaxRow then Measure end)
,UniqueVals = count(Distinct A.Measure)
,MinVal = min(A.Measure)
,MaxVal = max(A.Measure)
,Mean = max(B.Mean)
,Median = isnull(Avg(IIF(RowNr between MedRow1 and MedRow2,Measure,null)),avg(A.Measure))
,ModeR1 = isnull(max(IIf(ModeHits>1,D.Mode,null)),min(A.Measure))
,ModeR2 = isnull(max(IIf(ModeHits>1,D.Mode,null)),max(A.Measure))
,StdDev = Stdev(A.Measure)
From cteBase A
Join cteMean B on (A.GroupBy=B.GroupBy and A.Item=B.Item)
Join cteMedn C on (A.GroupBy=C.GroupBy and A.Item=C.Item)
Join cteMode D on (A.GroupBy=D.GroupBy and A.Item=D.Item and ModeRowNr=1)
Group By A.GroupBy,A.Item
Order By A.GroupBy,A.Item
Returns one summary row per idMetaData per bin of 10 samples (results grid omitted).
I am using PostgreSQL on Amazon Redshift.
My table is:
drop table APP_Tax;
create temp table APP_Tax(APP_nm varchar(100),start timestamp,end1 timestamp);
insert into APP_Tax values('AFH','2018-01-26 00:39:51','2018-01-26 00:39:55'),
('AFH','2016-01-26 00:39:56','2016-01-26 00:40:01'),
('AFH','2016-01-26 00:40:05','2016-01-26 00:40:11'),
('AFH','2016-01-26 00:40:12','2016-01-26 00:40:15'), --row x
('AFH','2016-01-26 00:40:35','2016-01-26 00:41:34') --row y
Expected output:
'AFH','2016-01-26 00:39:51','2016-01-26 00:40:15'
'AFH','2016-01-26 00:40:35','2016-01-26 00:41:34'
I need to compare the end time of each record with the start time of the next record, and if the time difference is < 10 seconds, take the next record's end time instead, continuing until the last (final) record.
I.e. datediff(seconds, '2018-01-26 00:39:55', '2018-01-26 00:39:56') is < 10 seconds.
I tried this:
SELECT a.app_nm
,min(a.start)
,max(b.end1)
FROM APP_Tax a
INNER JOIN APP_Tax b
ON a.APP_nm = b.APP_nm
AND b.start > a.start
WHERE datediff(second, a.end1, b.start) < 10
GROUP BY 1
It works, but it doesn't return row y when the condition fails.
There are two reasons that row y is not returned, both due to conditions in the query:
b.start > a.start means that a row will never join with itself
The GROUP BY will return only one record per APP_nm value, yet all rows have the same value.
However, there are further logic problems that the query will not handle successfully. For example, how does it know when a "new" session begins?
The logic you seek can be achieved in normal PostgreSQL with the help of DISTINCT ON, which returns one row per distinct value of a specific column. However, DISTINCT ON is not supported by Redshift.
Some potential workarounds: DISTINCT ON like functionality for Redshift
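As a rough illustration of that workaround style, the usual substitute is a ROW_NUMBER() wrapper; the sketch below simply keeps the earliest row per APP_nm and is not, by itself, the answer to the sessionisation problem:
SELECT APP_nm, start, end1
FROM (
    SELECT APP_nm, start, end1,
           ROW_NUMBER() OVER (PARTITION BY APP_nm ORDER BY start) AS rn
    FROM APP_Tax
) t
WHERE rn = 1;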
The output you seek would be trivial using a programming language (which can loop through results and store variables) but is difficult to express in an SQL query (which is designed to operate on sets of rows). I would recommend extracting the data and running it through a simple script (eg in Python) that could then output the Start & End combinations you seek.
This is an excellent use-case for a Hadoop Streaming function, which I have successfully implemented in the past. It would take the records as input, then 'remember' the start time and would only output a record when the desired end-logic has been met.
Sounds like what you are after is "sessionisation" of the activity events. You can achieve that in Redshift using window functions.
The complete solution might look like this:
SELECT
start AS session_start,
session_end
FROM (
SELECT
start,
end1,
lead(end1, 1)
OVER (
ORDER BY end1) AS session_end,
session_boundary
FROM (
SELECT
start,
end1,
CASE WHEN session_switch = 0 AND reverse_session_switch = 1
THEN 'start'
ELSE 'end' END AS session_boundary
FROM (
SELECT
start,
end1,
CASE WHEN datediff(seconds, end1, lead(start, 1)
OVER (
ORDER BY end1 ASC)) > 10
THEN 1
ELSE 0 END AS session_switch,
CASE WHEN datediff(seconds, lead(end1, 1)
OVER (
ORDER BY end1 DESC), start) > 10
THEN 1
ELSE 0 END AS reverse_session_switch
FROM app_tax
)
AS sessioned
WHERE session_switch != 0 OR reverse_session_switch != 0
UNION
SELECT
start,
end1,
'start'
FROM (
SELECT
start,
end1,
row_number()
OVER (PARTITION BY APP_nm
ORDER BY end1 ASC) AS row_num
FROM APP_Tax
) AS with_row_number
WHERE row_num = 1
) AS with_boundary
) AS with_end
WHERE session_boundary = 'start'
ORDER BY start ASC
;
Here is the breakdown (by subquery name):
sessioned - we first identify the switch rows (out and in), i.e. the rows where the gap between one record's end and the next record's start exceeds the limit.
with_row_number - just a patch to extract the first row, because there is no switch into it (there is an implicit switch that we record as 'start').
with_boundary - then we identify the rows where the specific switches occur. If you run the subquery by itself it is clear that a session starts when session_switch = 0 AND reverse_session_switch = 1, and ends when the opposite occurs. All other rows are in the middle of sessions and so are ignored.
with_end - finally, we pull each session's end time onto its 'start' row with lead() (thus defining the session span) and drop the 'end' rows.
The with_boundary subquery answers your initial question, but typically you'd want to combine those rows to get the final result, which is the session duration.
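If you also want that duration as a number, one way (sketched here with sessions standing in for the result of the query above, e.g. materialised into a temp table or wrapped as another subquery) is to apply datediff to each start/end pair:
SELECT
    session_start,
    session_end,
    datediff(seconds, session_start, session_end) AS session_seconds
FROM sessions
ORDER BY session_start;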