Specific grouping elements in SQL Server


I have a problem with a SQL task and haven't found an answer yet.
I have a table with this sample data:
ID  Value  Date
1   1      2020-01-01
1   2      2020-03-02
1   1      2020-03-21
1   1      2020-04-14
1   3      2020-05-01
1   1      2020-08-09
1   1      2020-09-12
1   1      2020-10-12
1   3      2020-12-04
All I want to get is:
ID  Value  Date
1   1      2020-01-01
1   2      2020-03-02
1   1      2020-03-21
1   3      2020-05-01
1   1      2020-08-09
1   3      2020-12-04
This is a kind of value-change history: a row should appear only when the value changes. When consecutive records have the same value, keep only the one with the earliest date.
I tried grouping and ROW_NUMBER, but without success. Any ideas how to do that?

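One way to articulate your logic is to say that you want to retain a record when the previous record, as ordered by the date (within a given ID), has a different value than the current record. The first record of each ID has no previous record, so its lagged value is NULL and it is retained as well: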
WITH cte AS (
SELECT *, LAG(Value) OVER (PARTITION BY ID ORDER BY Date) LagValue
FROM yourTable
)
SELECT ID, Value, Date
FROM cte
WHERE LagValue <> Value OR LagValue IS NULL
ORDER BY Date;
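
To try the LAG approach without a SQL Server instance, here is a minimal sketch that runs the same query against the question's sample data using Python's built-in sqlite3 module. SQLite stands in for SQL Server here (an assumption; window functions need SQLite 3.25 or later), and the helper name value_changes is made up for illustration.

```python
import sqlite3

# Sample data from the question: (ID, Value, Date)
rows = [
    (1, 1, "2020-01-01"), (1, 2, "2020-03-02"), (1, 1, "2020-03-21"),
    (1, 1, "2020-04-14"), (1, 3, "2020-05-01"), (1, 1, "2020-08-09"),
    (1, 1, "2020-09-12"), (1, 1, "2020-10-12"), (1, 3, "2020-12-04"),
]

# Same logic as the answer: keep a row when the previous value differs,
# or when there is no previous row (LagValue IS NULL).
QUERY = """
WITH cte AS (
    SELECT *, LAG(Value) OVER (PARTITION BY ID ORDER BY Date) AS LagValue
    FROM yourTable
)
SELECT ID, Value, Date
FROM cte
WHERE LagValue <> Value OR LagValue IS NULL
ORDER BY Date
"""


def value_changes(data):
    """Load the rows into an in-memory table and return only the value changes."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourTable (ID INTEGER, Value INTEGER, Date TEXT)")
    con.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", data)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


changes = value_changes(rows)
for row in changes:
    print(row)
```

The six rows printed should match the desired output in the question: the repeated value 1 on 2020-04-14, 2020-09-12 and 2020-10-12 is dropped because the preceding row already carries the same value.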

Related

How to filter out multiple downtime events in SQL Server?

I need to write a query that filters out duplicates of the same downtime event. These records are created at exactly the same time with several different timestealers, which I don't need. Also, when a downtime event has multiple timestealers, I need to set the timestealer to NULL instead.
Example table:
Id  TimeStealer  Start                End                  Is_Downtime  Downtime_Event
1   Machine 1    2022-01-01 01:00:00  2022-01-01 01:01:00  1            Malfunction
2   Machine 2    2022-01-01 01:00:00  2022-01-01 01:01:00  1            Malfunction
3   NULL         2022-01-01 00:01:00  2022-01-01 00:59:59  0            Operating
What I need the query to return:
Id  TimeStealer  Start                End                  Is_Downtime  Downtime_Event
1   NULL         2022-01-01 01:00:00  2022-01-01 01:01:00  1            Malfunction
2   NULL         2022-01-01 00:01:00  2022-01-01 00:59:59  0            Operating

This looks like a "top 1 row of each group" problem, with the added logic of making a column NULL when a group contains multiple rows. You can achieve that by also using a windowed COUNT, and then a CASE expression in the outer SELECT that returns the value of TimeStealer only when the group contains exactly 1 event:
WITH CTE AS(
SELECT V.Id,
V.TimeStealer,
V.Start,
V.[End],
V.Is_Downtime,
V.Downtime_Event,
ROW_NUMBER() OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime,V.Downtime_Event ORDER BY ID) AS RN,
COUNT(V.ID) OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime,V.Downtime_Event) AS Events
FROM(VALUES('1','Machine 1',CONVERT(datetime2(0),'2022-01-01 01:00:00'),CONVERT(datetime2(0),'2022-01-01 01:01:00'),'1','Malfunction'),
('2','Machine 2',CONVERT(datetime2(0),'2022-01-01 01:00:00'),CONVERT(datetime2(0),'2022-01-01 01:01:00'),'1','Malfunction'),
('3',NULL,CONVERT(datetime2(0),'2022-01-01 00:01:00'),CONVERT(datetime2(0),'2022-01-01 00:59:59'),'0','Operating'))V(Id,TimeStealer,[Start],[End],Is_Downtime,Downtime_Event))
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS ID,
CASE WHEN C.Events = 1 THEN C.TimeStealer END AS TimeStealer,
C.Start,
C.[End],
C.Is_Downtime,
C.Downtime_Event
FROM CTE C
WHERE C.RN = 1;
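
The following is a rough, runnable sketch of the same ROW_NUMBER plus windowed COUNT idea, using Python's built-in sqlite3 module in place of SQL Server (an assumption; it needs SQLite 3.25 or later). The table name Downtime, the alias NewId, and the fourth row ("Machine 3", "Jam") are hypothetical additions; the extra row is there only to show that a single-row group keeps its TimeStealer.

```python
import sqlite3

# First three rows come from the question (the missing TimeStealer is a real NULL).
# The fourth row is hypothetical and only illustrates a group with exactly one event.
events = [
    (1, "Machine 1", "2022-01-01 01:00:00", "2022-01-01 01:01:00", 1, "Malfunction"),
    (2, "Machine 2", "2022-01-01 01:00:00", "2022-01-01 01:01:00", 1, "Malfunction"),
    (3, None, "2022-01-01 00:01:00", "2022-01-01 00:59:59", 0, "Operating"),
    (4, "Machine 3", "2022-01-01 02:00:00", "2022-01-01 02:05:00", 1, "Jam"),
]

# RN picks one row per event; Events counts how many rows share that event.
QUERY = """
WITH CTE AS (
    SELECT V.Id, V.TimeStealer, V.Start, V."End", V.Is_Downtime, V.Downtime_Event,
           ROW_NUMBER() OVER (PARTITION BY V.Start, V."End", V.Is_Downtime, V.Downtime_Event
                              ORDER BY V.Id) AS RN,
           COUNT(V.Id) OVER (PARTITION BY V.Start, V."End", V.Is_Downtime, V.Downtime_Event) AS Events
    FROM Downtime V
)
SELECT ROW_NUMBER() OVER (ORDER BY C.Id) AS NewId,
       CASE WHEN C.Events = 1 THEN C.TimeStealer END AS TimeStealer,
       C.Start, C."End", C.Is_Downtime, C.Downtime_Event
FROM CTE C
WHERE C.RN = 1
ORDER BY NewId
"""

con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE Downtime (Id INTEGER, TimeStealer TEXT, Start TEXT, "End" TEXT, '
    "Is_Downtime INTEGER, Downtime_Event TEXT)"
)
con.executemany("INSERT INTO Downtime VALUES (?, ?, ?, ?, ?, ?)", events)
deduplicated = con.execute(QUERY).fetchall()
con.close()

for row in deduplicated:
    print(row)
```

With this data the duplicated "Malfunction" event collapses to one row whose TimeStealer is NULL, the "Operating" row keeps its existing NULL, and the hypothetical "Jam" row keeps "Machine 3".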

SQL Troubleshooting Help on Table Structure

I'm attempting to calculate the average number of days between a customer's 1st and 3rd purchase, but I'm struggling to get the data into a shape that allows the calculation.
I currently have the data table below. (Note: Order Sequence Number is the position of the order within that customer's orders.)
Order Date  Customer Number  Order Sequence Number
2020-09-20  1                1
2021-01-20  1                2
2021-01-21  1                3
2020-10-01  2                1
2020-08-06  3                1
2020-09-06  3                2
2020-09-09  3                3
I've been trying to get the data to look like the following table, so that I can then calculate a datediff on the last two columns.
Customer Number  Order Count  First Order Date  Third Order Date
1                3            2020-09-20        2021-01-21
2                1            2020-10-01        Null
3                3            2020-08-06        2020-09-09
I've completely messed up the code, but here's what I've been trying.
CREATE TABLE X2 as
SELECT
customer_number,
max(order_sequence_number) as order_count,
CASE
WHEN order_sequence_number = 1 then order_date
ELSE null
END as first_order_date,
CASE
WHEN order_sequence_number = 3 then order_date
ELSE null
END as third_order_date
FROM X1
GROUP BY customer_number;
Can someone please tell me what I'm missing? Thanks in advance!

You are on the right track but you need aggregation functions:
SELECT customer_number,
max(order_sequence_number) as order_count,
MAX(CASE WHEN order_sequence_number = 1 THEN order_date END) as first_order_date,
MAX(CASE WHEN order_sequence_number = 3 THEN order_date END) as third_order_date
FROM X1
GROUP BY customer_number;
To get the difference in days, you would just subtract the two expressions using whatever date arithmetic is supported in your database.
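
As one possible illustration of that last step, the sketch below runs the conditional-aggregation query in SQLite through Python's sqlite3 module and uses SQLite's julianday() for the date arithmetic. The choice of SQLite and the column name days_between are assumptions; in SQL Server the equivalent would typically be DATEDIFF(day, ...).

```python
import sqlite3

# Sample data from the question: (order_date, customer_number, order_sequence_number)
orders = [
    ("2020-09-20", 1, 1), ("2021-01-20", 1, 2), ("2021-01-21", 1, 3),
    ("2020-10-01", 2, 1),
    ("2020-08-06", 3, 1), ("2020-09-06", 3, 2), ("2020-09-09", 3, 3),
]

# MAX(CASE ...) pivots the 1st and 3rd order dates onto one row per customer;
# julianday() turns each date into a day number so the two can be subtracted.
QUERY = """
SELECT customer_number,
       MAX(order_sequence_number) AS order_count,
       MAX(CASE WHEN order_sequence_number = 1 THEN order_date END) AS first_order_date,
       MAX(CASE WHEN order_sequence_number = 3 THEN order_date END) AS third_order_date,
       CAST(julianday(MAX(CASE WHEN order_sequence_number = 3 THEN order_date END))
          - julianday(MAX(CASE WHEN order_sequence_number = 1 THEN order_date END))
            AS INTEGER) AS days_between
FROM X1
GROUP BY customer_number
ORDER BY customer_number
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE X1 (order_date TEXT, customer_number INTEGER, order_sequence_number INTEGER)"
)
con.executemany("INSERT INTO X1 VALUES (?, ?, ?)", orders)
summary = con.execute(QUERY).fetchall()
con.close()

# Customers without a third order have a NULL gap and are left out of the average.
gaps = [row[4] for row in summary if row[4] is not None]
average_days = sum(gaps) / len(gaps)

for row in summary:
    print(row)
print("average days between 1st and 3rd order:", average_days)
```

Customer 2 has no third order, so both the third order date and the gap come back as NULL; whether such customers should be excluded from the average, as done here, is a design decision the question leaves open.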

adjust date overlaps within a group

I have this table and I want to set END_DATE to one day before the next ST_DATE whenever the date ranges overlap within a group of rows that share the same ID.
TABLE HAVE
ID ST_DATE END_DATE
1 2020-01-01 2020-02-01
1 2020-05-10 2020-05-20
1 2020-05-18 2020-06-19
1 2020-11-11 2020-12-01
2 1999-03-09 1999-05-10
2 1999-04-09 2000-05-10
3 1999-04-09 2000-05-10
3 2000-06-09 2000-08-16
3 2000-08-17 2009-02-17
Below is what I'm looking for
TABLE WANT
ID ST_DATE END_DATE
1 2020-01-01 2020-02-01
1 2020-05-10 2020-05-17 =====changed to a day less than the next ST_DATE due to some sort of overlap
1 2020-05-18 2020-06-19
1 2020-11-11 2020-12-01
2 1999-03-09 1999-04-08 =====changed to a day less than the next ST_DATE due to some sort of overlap
2 1999-04-09 2000-05-10
3 1999-04-09 2000-05-10
3 2000-06-09 2000-08-16
3 2000-08-17 2009-02-17

Maybe you can use LEAD() for this. Initial idea:
select
id, st_date, end_date
, lead( st_date ) over ( partition by id order by st_date ) nextstart_
from overlap
;
-- result
ID ST_DATE END_DATE NEXTSTART
---------- --------- --------- ---------
1 01-JAN-20 01-FEB-20 10-MAY-20
1 10-MAY-20 20-MAY-20 18-MAY-20
1 18-MAY-20 19-JUN-20 11-NOV-20
1 11-NOV-20 01-DEC-20
2 09-MAR-99 10-MAY-99 09-APR-99
2 09-APR-99 10-MAY-00
3 09-APR-99 10-MAY-00 09-JUN-00
3 09-JUN-00 16-AUG-00 17-AUG-00
3 17-AUG-00 17-FEB-09
Once you have the next start date and the end_date side by side (as it were),
you can use CASE ... for adjusting the dates as you need them.
select ilv.id, ilv.st_date
, case
when ilv.end_date > ilv.nextstart_ then
to_char( ilv.nextstart_ - 1 ) || ' <- modified end date'
else
to_char( ilv.end_date )
end dt_modified
from (
select
id, st_date, end_date
, lead( st_date ) over ( partition by id order by st_date ) nextstart_
from overlap
) ilv
;
ID ST_DATE DT_MODIFIED
---------- --------- ---------------------------------------
1 01-JAN-20 01-FEB-20
1 10-MAY-20 17-MAY-20 <- modified end date
1 18-MAY-20 19-JUN-20
1 11-NOV-20 01-DEC-20
2 09-MAR-99 08-APR-99 <- modified end date
2 09-APR-99 10-MAY-00
3 09-APR-99 10-MAY-00
3 09-JUN-00 16-AUG-00
3 17-AUG-00 17-FEB-09

If two "windows" for the same id have the same start date, then the problem doesn't make sense. So, let's assume that the problem makes sense - that is, the combination (id, st_date) is unique in the inputs.
Then, the problem can be formulated as follows: for each id, order rows by st_date ascending. Then, for each row, if its end_date is less than the following st_date, return the row as is. Otherwise replace end_date with the following st_date, minus 1. This last step can be achieved with the analytic lead() function.
A solution might look like this:
select id, st_date,
least(end_date, lead(st_date, 1, end_date + 1)
over (partition by id order by st_date) - 1) as end_date
from have
;
The bit about end_date + 1 in the lead function handles the last row for each id. For such rows there is no "next" row, so the default application of lead will return null. The default can be overridden by using the third parameter to the function. With end_date + 1 as the default, subtracting 1 gives back end_date, so least() leaves the last row unchanged.
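
For readers without an Oracle instance, here is a hedged sketch of the same idea in SQLite, run through Python's sqlite3 module. The translation is an assumption on my part: SQLite has no least(), so the two-argument scalar min() is used instead, and date arithmetic is done with date(x, '+1 day') and date(x, '-1 day') on ISO-formatted strings.

```python
import sqlite3

# Sample data from the question: (id, st_date, end_date)
ranges = [
    (1, "2020-01-01", "2020-02-01"), (1, "2020-05-10", "2020-05-20"),
    (1, "2020-05-18", "2020-06-19"), (1, "2020-11-11", "2020-12-01"),
    (2, "1999-03-09", "1999-05-10"), (2, "1999-04-09", "2000-05-10"),
    (3, "1999-04-09", "2000-05-10"), (3, "2000-06-09", "2000-08-16"),
    (3, "2000-08-17", "2009-02-17"),
]

# lead(st_date, 1, <default>) returns the next start date within the id,
# or end_date + 1 day when there is no next row. Subtracting one day and
# taking the smaller of that and end_date trims only the overlapping ranges.
QUERY = """
SELECT id, st_date,
       min(end_date,
           date(lead(st_date, 1, date(end_date, '+1 day'))
                    OVER (PARTITION BY id ORDER BY st_date), '-1 day')) AS end_date
FROM have
ORDER BY id, st_date
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE have (id INTEGER, st_date TEXT, end_date TEXT)")
con.executemany("INSERT INTO have VALUES (?, ?, ?)", ranges)
adjusted = con.execute(QUERY).fetchall()
con.close()

for row in adjusted:
    print(row)
```

Comparing ISO date strings with min() works because the YYYY-MM-DD format sorts in chronological order; with another date format the comparison would need real date values.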

Computing rolling average and standard deviation by dates

I have the table below and need to compute a rolling average and standard deviation based on the dates. The current table and the expected results are listed below. I am trying to compute the rolling average for an id based on date; rollAvgA is computed from metricA. For the first occurrence of an id, the result should be zero because there are no preceding values. How can this be accomplished?
Current Table :
Date id metricA
8/1/2019 100 2
8/2/2019 100 3
8/3/2019 100 2
8/1/2019 101 2
8/2/2019 101 3
8/3/2019 101 2
8/4/2019 101 2
Expected Table :
Date id metricA rollAvgA
8/1/2019 100 2 0
8/2/2019 100 3 2.5
8/3/2019 100 2 2.3
8/1/2019 101 2 0
8/2/2019 101 3 2.5
8/3/2019 101 2 2.3
8/4/2019 101 2 2.25

You seem to want a cumulative average. This is basically:
select t.*,
avg(metricA * 1.0) over (partition by id order by date) as rollingavg
from t;
The only caveat is that the first value is an average of one value, whereas you expect 0 there. To handle this, use a case expression:
select t.*,
(case when row_number() over (partition by id order by date) > 1
then avg(metricA * 1.0) over (partition by id order by date)
else 0
end) as rollingavg
from t;
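
A small sketch of the cumulative average with the first-row special case, run in SQLite via Python's sqlite3 module, is shown below. Using SQLite and storing the dates as ISO strings so that they sort chronologically are both assumptions made for this illustration. When a window has an ORDER BY and no explicit frame, the frame runs from the start of the partition to the current row, which is what makes avg() cumulative.

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO strings: (date, id, metricA)
metrics = [
    ("2019-08-01", 100, 2), ("2019-08-02", 100, 3), ("2019-08-03", 100, 2),
    ("2019-08-01", 101, 2), ("2019-08-02", 101, 3), ("2019-08-03", 101, 2),
    ("2019-08-04", 101, 2),
]

# Cumulative average per id; the first row of each id is forced to 0.
QUERY = """
SELECT date, id, metricA,
       CASE WHEN row_number() OVER (PARTITION BY id ORDER BY date) > 1
            THEN avg(metricA * 1.0) OVER (PARTITION BY id ORDER BY date)
            ELSE 0
       END AS rollAvgA
FROM t
ORDER BY id, date
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (date TEXT, id INTEGER, metricA INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", metrics)
result = con.execute(QUERY).fetchall()
con.close()

# Round only for display; the query itself returns the unrounded average.
rolling = [round(row[3], 2) for row in result]
for row, avg in zip(result, rolling):
    print(row[0], row[1], row[2], avg)
```

The rounded values line up with the expected table in the question (0, 2.5, 2.33, then 0, 2.5, 2.33, 2.25). A cumulative standard deviation would follow the same pattern in databases that provide a windowed standard deviation aggregate; SQLite does not ship one by default, so it is not shown here.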

How many Days each item was in each State, the full value of the period

This post is really similar to my question:
SQL Server : how many days each item was in each state
but I don't have the Revision column to see which is the previous state, and I also want to get the full time of a status, I b
....
I want to find out how long an item has been in each status overall. My table looks like this:
ID DATE STATUS
3D56B7B1-FCB3-4897-BAEB-004796E0DC8D 2016-04-05 11:30:00.000 1
3D56B7B1-FCB3-4897-BAEB-004796E0DC8D 2016-04-08 11:30:00.000 13
274C5DA9-9C38-4A54-A697-009933BB7B7F 2016-04-29 08:00:00.000 5
274C5DA9-9C38-4A54-A697-009933BB7B7F 2016-05-04 08:00:00.000 4
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2016-04-14 07:50:00.000 1
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2016-04-21 14:00:00.000 2
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2016-04-23 12:15:00.000 3
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2016-04-23 16:15:00.000 1
BF122AE1-CB39-4967-8F37-012DC55E92A7 2016-04-05 10:30:00.000 1
BF122AE1-CB39-4967-8F37-012DC55E92A7 2016-04-20 17:00:00.000 5
I want to get this
Column 1: ID, Column 2: Status, Column 3: Time with the status
Time with the status = NextDate - PreviousDate + 1
If it is the last status, it counts as 1.
If there is more than one status on the same day, I take the last one (only the last status of the day matters).
Per ID, each status must be unique.
It should look like this:
ID STATUS TIME
3D56B7B1-FCB3-4897-BAEB-004796E0DC8D 1 3
3D56B7B1-FCB3-4897-BAEB-004796E0DC8D 13 1
274C5DA9-9C38-4A54-A697-009933BB7B7F 5 5
274C5DA9-9C38-4A54-A697-009933BB7B7F 4 1
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 1 8
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2 2
BF122AE1-CB39-4967-8F37-012DC55E92A7 1 15
BF122AE1-CB39-4967-8F37-012DC55E92A7 5 1

Thanks to @ConradFrix's comments, this is how it works:
WITH CTE
AS
(
SELECT
ID,
STATUS,
DATE,
LEAD(DATE, 1) over (partition by ID order by DATE) LEAD,
ISNULL(DATEDIFF(DAYOFYEAR, DATE,
LEAD(DATE, 1) over (partition by ID order by DATE)), 1) DIF_BY_LEAD
FROM TABLE_NAME
)
SELECT ID, STATUS, SUM(DIF_BY_LEAD) AS TIME_STATUS
FROM CTE GROUP BY ID, STATUS
ORDER BY ID, STATUS
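
To check the LEAD plus SUM approach against the sample data, here is a sketch in SQLite via Python's sqlite3 module. It is an approximation under stated assumptions: SQLite has no DATEDIFF, so the day difference is computed by truncating both timestamps to dates and subtracting their julianday() values, and the names status_log and event_date are hypothetical stand-ins for TABLE_NAME and DATE.

```python
import sqlite3

# Sample data from the question: (id, event_date, status)
log = [
    ("3D56B7B1-FCB3-4897-BAEB-004796E0DC8D", "2016-04-05 11:30:00.000", 1),
    ("3D56B7B1-FCB3-4897-BAEB-004796E0DC8D", "2016-04-08 11:30:00.000", 13),
    ("274C5DA9-9C38-4A54-A697-009933BB7B7F", "2016-04-29 08:00:00.000", 5),
    ("274C5DA9-9C38-4A54-A697-009933BB7B7F", "2016-05-04 08:00:00.000", 4),
    ("A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2", "2016-04-14 07:50:00.000", 1),
    ("A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2", "2016-04-21 14:00:00.000", 2),
    ("A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2", "2016-04-23 12:15:00.000", 3),
    ("A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2", "2016-04-23 16:15:00.000", 1),
    ("BF122AE1-CB39-4967-8F37-012DC55E92A7", "2016-04-05 10:30:00.000", 1),
    ("BF122AE1-CB39-4967-8F37-012DC55E92A7", "2016-04-20 17:00:00.000", 5),
]

# For each row: whole days until the next status change of the same id,
# or 1 when it is the last status. The outer query sums these per (id, status).
QUERY = """
WITH cte AS (
    SELECT id, status,
           COALESCE(
               CAST(julianday(date(LEAD(event_date) OVER (PARTITION BY id ORDER BY event_date)))
                  - julianday(date(event_date)) AS INTEGER),
               1) AS dif_by_lead
    FROM status_log
)
SELECT id, status, SUM(dif_by_lead) AS time_status
FROM cte
GROUP BY id, status
ORDER BY id, status
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE status_log (id TEXT, event_date TEXT, status INTEGER)")
con.executemany("INSERT INTO status_log VALUES (?, ?, ?)", log)
status_days = con.execute(QUERY).fetchall()
con.close()

for row in status_days:
    print(row)
```

One difference from the expected output in the question: status 3 of item A70A66DC... was replaced on the same day, so it appears here with 0 days rather than being omitted. Adding HAVING SUM(dif_by_lead) > 0 to the outer query would drop such rows, if that is the desired behaviour.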
