SQL - Adding conditions to SELECT - sql

I have a table which has a timestamp and inCycle status of a machine. I'm using two CTE's and doing an INNER JOIN on row number so I can easily compare the timestamp of one row to the next. I have the DATEDIFF working and now I need to look at the inCycle status. Basically, if the inCycleThis and inCycleNext both = 1, I need to add it to an InCycle total.
Similarly (Shown table will make this clear):
incycleThis/next = 0,1 = not in cycle
incycleThis/next = 0,0 = not in cycle
incycleThis/next = 1,1 = in cycle
If I was doing this client side, this would be pretty simple. I need to do this in a stored procedure though due to there being a lot of records. I'd love to use an 'IF' in the SELECT section, but it seems that's not how it works.
The result I'm looking for at the end is simply: InCycle = Xtime. Something like:
SUM(Diff_seconds if((InCycleThis = 1 AND InCycleNext = 1) OR (InCycleThis = 1 AND InCycleNext = 0))
This is what I have so far:
WITH History_CTE (DT, MID, FRO, IC, RowNum)
AS
(
SELECT DateAndTime
,MachineID
,FeedRateOverride
,InCycle
,ROW_NUMBER()OVER(ORDER BY MachineID, DateAndTime) AS "row number"
FROM History
WHERE DateAndTime >= '2020-11-15'
AND DateAndTime < '2020-11-16'
),
History2_CTE (DT2, MID2, FRO2, IC2, RowNum2)
AS
(
SELECT DateAndTime
,MachineID
,FeedRateOverride
,InCycle
,ROW_NUMBER()OVER(ORDER BY MachineID, DateAndTime) AS "row number"
FROM History
WHERE DateAndTime >= '2020-11-15'
AND DateAndTime < '2020-11-16'
)
SELECT DT as 'TimeStamp'
,DT2 as 'TimeStamp Next Row'
,MID
,FRO
,IC as 'InCycle this'
,IC2 as 'InCycle next'
,RowNum
,DATEDIFF(s, History2_CTE.DT2, History_CTE.DT) AS 'Diff_seconds'
FROM History_CTE
INNER JOIN
History2_CTE ON History_CTE.RowNum = History2_CTE.RowNum2 + 1

Consider adding a third CTE that conditionally calculates the value you need, then aggregate in the final statement. Recall that CTEs can reference previously defined CTEs. Be sure to always qualify columns with table aliases in JOIN queries.
WITH
... first two ctes...
, sub AS (
SELECT h1.DT AS 'TimeStamp'
, h2.DT2 AS 'TimeStamp Next Row'
, h1.MID
, h1.FRO
, h1.IC AS 'InCycle this'
, h2.IC2 AS 'InCycle next'
, h1.RowNum
, DATEDIFF(s, h2.DT2, h1.DT) AS 'Diff_seconds'
, CASE
WHEN (h1.IC = 1 AND h2.IC2 = 1) OR (h1.IC= 1 AND h2.IC2 = 0)
THEN DATEDIFF(s, h2.DT2, h1.DT)
END AS 'IC_Diff_seconds'
FROM History_CTE h1
INNER JOIN History2_CTE h2
ON h1.RowNum = h2.RowNum2 + 1
)
SELECT SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
And if needing to add groupings, incorporate GROUP BY:
SELECT sub.MID
, sub.FRO
, SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
GROUP BY sub.MID
, sub.FRO
Even aggregate calculations by day:
SELECT CONVERT(date, [TimeStamp]) AS [Day]
, SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
GROUP BY CONVERT(date, [TimeStamp])

The result I'm looking for at the end is simply: InCycle = Xtime. Something like:
SUM(Diff_seconds if((InCycleThis = 1 AND InCycleNext = 1) OR (InCycleThis = 1 AND InCycleNext = 0))
As I understand your question, you just need to sum the difference between the timestamp of each "in cycle" row and the timestamp of the next row.
select machineid,
sum(datediff(s, dateandtime, lead_dateandtime)) as total_in_time
from (
select h.*,
lead(dateandtime) over(partition by machineid order by dateandtime) as lead_dateandtime
from history h
) h
where incycle = 1
group by machineid
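If you also want the overall total next to the in-cycle total, the CASE idea and the LEAD idea can be combined in one pass. This is only a sketch under assumptions: the table variable @History and its four rows are made up for illustration, and it follows the reading above that an interval counts as "in cycle" when the row it starts on has InCycle = 1.

```sql
-- Hypothetical sample rows standing in for the History table.
DECLARE @History TABLE (MachineID int, DateAndTime datetime, InCycle bit);
INSERT INTO @History VALUES
    (1, '2020-11-15 08:00:00', 1),
    (1, '2020-11-15 08:10:00', 1),
    (1, '2020-11-15 08:25:00', 0),
    (1, '2020-11-15 09:00:00', 1);

SELECT MachineID,
       -- Only intervals that start on an InCycle = 1 row are added up here.
       SUM(CASE WHEN InCycle = 1
                THEN DATEDIFF(second, DateAndTime, NextDateAndTime)
           END) AS InCycleSeconds,
       -- Every interval, regardless of status.
       SUM(DATEDIFF(second, DateAndTime, NextDateAndTime)) AS TotalSeconds
FROM (
    SELECT h.*,
           LEAD(DateAndTime) OVER (PARTITION BY MachineID ORDER BY DateAndTime) AS NextDateAndTime
    FROM @History h
) x
GROUP BY MachineID;
```

With these sample rows the in-cycle intervals are 600 and 900 seconds, so InCycleSeconds comes out as 1500 and TotalSeconds as 3600. The last row of each machine has no next row, so its interval is NULL and SUM skips it.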

Related

Delete the records repeated by date, and keep the oldest

I have this query, and it returns the following result. I need to delete the records repeated by date and keep the oldest. How could I do this?
select
a.EMP_ID, a.EMP_DATE,
from
EMPLOYES a
inner join
TABLE2 b on a.table2ID = b.table2ID and b.ID_TYPE = 'E'
where
a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and month(a.DATE) = 1
and a.ID <> 31
order by
a.DATE;
Additionally, I would like to fill in the missing days of the month ... and put them empty if I don't have that data, can this be done?
I would appreciate if you could guide me to solve this problem
Thank you!
The other answers miss some of the requirements.
Initial step - do this once only. Make a calendar table. This will come in handy for all sorts of things over time:
DECLARE @Year INT = 2000;
DECLARE @YearCnt INT = 50;
DECLARE @StartDate DATE = DATEFROMPARTS(@Year, 1, 1);
DECLARE @EndDate DATE = DATEADD(DAY, -1, DATEADD(YEAR, @YearCnt, @StartDate));
;WITH Cal(n) AS
(
SELECT 0 UNION ALL SELECT n + 1 FROM Cal
WHERE n < DATEDIFF(DAY, @StartDate, @EndDate)
),
FnlDt(d, n) AS
(
SELECT DATEADD(DAY, n, @StartDate), n FROM Cal
),
FinalCte AS
(
SELECT
[D] = CONVERT(DATE,d),
[Dy] = DATEPART(DAY, d),
[Mo] = DATENAME(MONTH, d),
[Yr] = DATEPART(YEAR, d),
[DN] = DATENAME(WEEKDAY, d),
[N] = n
FROM FnlDt
)
SELECT * INTO Cal FROM FinalCte
ORDER BY [D]
OPTION (MAXRECURSION 0);
credit: mostly this site
Now we can write some simple query to stick your data (with one small addition) onto it:
--your query, minus the date bits in the WHERE, and with a ROW_NUMBER
WITH yourQuery AS(
SELECT a.emp_id, a.emp_date,
ROW_NUMBER() OVER(PARTITION BY CAST(a.emp_date AS DATE) ORDER BY a.emp_date) rn
FROM EMPLOYES a
INNER JOIN TABLE2 b on a.table2ID = b.table2ID
WHERE a.emp_id = 'VJAHAJHSJHDAJHSJDH' AND a.id <> 31 AND b.id_type = 'E'
)
--your query, left joined onto the cal table so that you get a row for every day even if there is no emp data for that day
SELECT c.d, yq.*
FROM
Cal c
LEFT JOIN yourQuery yq
ON
c.d = CAST(yq.emp_date AS DATE) AND --cut the time off
yq.rn = 1 --keep only the earliest time per day
WHERE
c.d BETWEEN '2021-01-01' AND EOMONTH('2021-01-01')
We add a row number to your table; it restarts every time the date changes and counts up in order of time. We make this into a CTE (or a subquery, but a CTE is cleaner), then we simply left join it to the calendar table. This means that for any date where you don't have data, you still have the calendar date. For any day where you do have data, making rn = 1 a condition of the join means that only the first datetime from each day is present in the results.
Note: something is wonky about your question. You said you SELECT a.emp_id and your results show 'VJAHAJHSJHDAJHSJDH' is the emp id, but your WHERE clause uses a.id twice, once as a string and once as a number. This can't be right, so I've guessed at fixing it, but I suspect you have translated your query into something for SO, perhaps to hide real column names. Also, your SELECT has a dangling comma, which is a syntax error.
If you have translated/obscured your real query, make absolutely sure you understand any answer here when translating it back. It's very frustrating when someone comes back saying "hi your query doesn't work" and it turns out that they damaged it while translating it back to their own db, because they hid the real column names in the question.
Finally, do not use functions on table data in a WHERE clause; it generally kills indexing. Always try to find a way of leaving table data alone. Want all of January? Do as I did and say table.datecolumn BETWEEN firstofjan AND endofjan, etc. SQL Server at least stands a chance of using an index for this, rather than calling a function on every date in the table every time the query is run.
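To make the last point concrete, here is a small sketch. The table variable @Employes and its rows are made up for illustration, and I've used a half-open range (>= first of the month, < first of the next month) rather than BETWEEN, on the assumption that EMP_DATE is a datetime with a time portion; with BETWEEN and a date-only upper bound, times later on 31 January would be missed.

```sql
-- Hypothetical stand-in for the EMPLOYES table from the question.
DECLARE @Employes TABLE (EMP_ID varchar(20), EMP_DATE datetime);
INSERT INTO @Employes VALUES
    ('A', '2020-12-31 23:59:00'),
    ('A', '2021-01-01 00:00:00'),
    ('A', '2021-01-31 18:30:00'),
    ('A', '2021-02-01 00:00:00');

-- Avoid: the functions have to be evaluated for every row.
--   WHERE YEAR(EMP_DATE) = 2021 AND MONTH(EMP_DATE) = 1

-- Prefer: the column is left alone, so an index on it can be used for a range seek.
SELECT EMP_ID, EMP_DATE
FROM @Employes
WHERE EMP_DATE >= '20210101'
  AND EMP_DATE <  '20210201';
```

Only the two January rows come back. On a table variable there is no index to benefit from, of course; the point is the shape of the predicate.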
You can use ROW_NUMBER
WITH CTE AS
(
SELECT a.EMP_ID, a.EMP_DATE,
RN = ROW_NUMBER() OVER (PARTITION BY a.EMP_ID, CAST(a.DATE as Date) ORDER BY a.DATE ASC)
from EMPLOYES a INNER JOIN TABLE2 b
on a.table2ID = b.table2ID
and b.ID_TYPE = 'E'
where a.ID = 'VJAHAJHSJHDAJHSJDH'
and year(a.DATE) = 2021
and MONTH(a.DATE) = 1
and a.ID <> 31
)
SELECT * FROM CTE
WHERE RN = 1
Try with an aggregate function MAX or MIN
create table #tmp(dt datetime, val numeric(4,2))
insert into #tmp values ('2021-01-01 10:30:35', 1)
insert into #tmp values ('2021-01-02 10:30:35', 2)
insert into #tmp values ('2021-01-02 11:30:35', 3)
insert into #tmp values ('2021-01-03 10:35:35', 4)
select * from #tmp
select tmp.*
from #tmp tmp
inner join
(select max(dt) as dt, cast(dt as date) as dt_aux from #tmp group by cast(dt as date)) compressed_rows on
tmp.dt = compressed_rows.dt
drop table #tmp

Speed up execution of query to find sequential rows that have a changed value

My goal is to go through my dataset, compare each ITEM_NO/LOC day-by-day, and identify days where the VAL has changed from the day before. Right now, I do that by sorting, creating a column of row numbers, joining the table to itself offset by a row, and then only picking rows where VAL has changed.
Each month has about half a billion records. In total there's around 2.7 billion records. The data is stored in DB2 BLU. The table already has indices for ITEM_NO, LOC, and ARCV_DATE. I only have select access to the table.
I think the big bottleneck is the order by in the select statement given that n is so large. One idea I had was to try to do the sorting month-by-month and then union each of the months together.
Here's what I have so far:
with x as (
select ITEM_NO, LOC, ARCV_DATE, VAL, ROW_NUMBER() over (order by ITEM_NO, LOC, ARCV_DATE) as RN
from MY_SCHEMA.MY_TABLE a
where
ARCV_DATE >= '2017-06-01'
and ARCV_DATE < '2017-07-01'
)
SELECT
x.ITEM_NO,
x.LOC,
y.ARCV_DATE as CHANGE_DATE,
y.VAL,
x.VAL as OLD_VAL
FROM x
INNER JOIN x AS y
ON x.rn = y.rn + 1
WHERE
x.VAL <> y.VAL
and x.ITEM_NO = y.ITEM_NO
and x.LOC = y.LOC
What could I do to improve performance on this for such a dataset?
Without any write access your options are very limited because the query isn't that complex. You could try avoiding the join altogether by using LAG() OVER() such as this:
SELECT
*
FROM (
SELECT
ITEM_NO
, LOC
, ARCV_DATE
, VAL
, LAG(ARCV_DATE, 1) OVER (PARTITION BY ITEM_NO, LOC ORDER BY ARCV_DATE DESC) AS CHANGE_DATE
, LAG(VAL, 1) OVER (PARTITION BY ITEM_NO, LOC ORDER BY ARCV_DATE DESC) AS OLD_VAL
FROM MY_SCHEMA.MY_TABLE
WHERE ARCV_DATE >= '2017-06-01'
AND ARCV_DATE < '2017-07-01'
) d
WHERE ( VAL <> OLD_VAL OR OLD_VAL IS NULL )
But tuning this further could require adding or changing indexes.
SELECT currentval.ITEM_NO,
currentval.LOC,
currentval.ARCV_DATE currentdate,
prevval.ARCV_DATE Previousdate,
currentval.val currentval,
prevval.val Previousval
FROM MY_SCHEMA.MY_TABLE currentval JOIN
MY_SCHEMA.MY_TABLE prevval ON
currentval.ITEM_NO = prevval.ITEM_NO
WHERE currentval.loc = prevval.loc
AND currentval.val <> prevval.val
AND currentval.ARCV_DATE = prevval.ARCV_DATE+1
AND currentval.ARCV_DATE >= '2017-06-01'
AND prevval.ARCV_DATE < '2017-07-01'
This assumes there is a row for every consecutive day. The query retrieves the values that changed from the previous day to the current day; the key condition is:
AND currentval.ARCV_DATE = prevval.ARCV_DATE+1

Case statement based in max min dates

I have columns Memnumber, activity type, activity date, and activity ID. One member can have further activities after a few days. I want to write a CASE statement that labels the earliest activity date as INITIAL, the most recent as MR, and any activity between these two dates as BETWEEN. They need to be grouped by Memnumber and treatment type.
I wrote query as :
--MR County Tree
SELECT T0.MEMBERNUMBER,
T0.ACTIVITYTYPE,
T1.MR_CY17,
T1.IN_CY17,
T0.ACTIVITY_DATE,
(T0.ACTIVITYID)
FROM DLA_EXTRACT_FINAL T0
INNER JOIN (
SELECT MEMBERNUMBER,
ACTIVITYTYPE,
MAX(ACTIVITY_DATE) MR_CY17,
MIN(ACTIVITY_DATE) IN_CY17
FROM DLA20_EXTRACT_FINAL
WHERE to_char(ACTIVITY_DATE, 'YYYYMMDD') >= 20170101
AND to_char(ACTIVITY_DATE, 'YYYYMMDD') <= 20171231
GROUP BY MEMBERNUMBER,
ACTIVITYTYPE
) T1 ON T0.MEMBERNUMBER = T1.MEMBERNUMBER
AND T0.ACTIVITYTYPE = T1.ACTIVITYTYPE
AND T0.ACTIVITY_DATE = T1.MR_CY17
--where T0.ACTIVITYTYPE='MT'
WHERE t0.MEMBERNUMBER = 'M500085268'
GROUP BY T0.MEMBERNUMBER,
T0.ACTIVITYTYPE,
T1.MR_CY17,
T1.IN_CY17,
T0.ACTIVITYID,
T0.ACTIVITY_DATE
ORDER BY T0.MEMBERNUMBER,
T0.ACTIVITYTYPE,
T1.MR_CY17,
T1.IN_CY17.
Looking for a solution.
You want to use window functions. Something like:
SELECT T0.MEMBERNUMBER,
T0.ACTIVITYTYPE,
T0.ACTIVITY_DATE,
T0.ACTIVITYID,
case when row_number() over (partition by T0.MEMBERNUMBER, T0.ACTIVITYTYPE
order by T0.ACTIVITY_DATE) = 1 then 1 else 0 end most_initial,
case when row_number() over (partition by T0.MEMBERNUMBER, T0.ACTIVITYTYPE
order by T0.ACTIVITY_DATE desc) = 1 then 1 else 0 end most_recent
FROM DLA_EXTRACT_FINAL T0
Then you can use a CASE expression to label a row as INITIAL if most_initial = 1, MR if most_recent = 1, or BETWEEN if both are 0.
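A sketch of that labelling step, assuming Oracle (the question uses to_char). The sample_activities rows are invented so the query can run on its own; in practice you would select from DLA_EXTRACT_FINAL instead.

```sql
WITH sample_activities AS (
    -- Hypothetical rows; replace with the real table.
    SELECT 'M1' AS membernumber, 'MT' AS activitytype, DATE '2017-01-10' AS activity_date, 101 AS activityid FROM dual UNION ALL
    SELECT 'M1', 'MT', DATE '2017-03-15', 102 FROM dual UNION ALL
    SELECT 'M1', 'MT', DATE '2017-06-20', 103 FROM dual UNION ALL
    SELECT 'M2', 'MT', DATE '2017-02-01', 201 FROM dual
),
flagged AS (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY membernumber, activitytype
                              ORDER BY activity_date) AS rn_first,
           ROW_NUMBER() OVER (PARTITION BY membernumber, activitytype
                              ORDER BY activity_date DESC) AS rn_last
    FROM sample_activities s
)
SELECT membernumber, activitytype, activity_date, activityid,
       CASE
           WHEN rn_first = 1 THEN 'INITIAL'  -- earliest activity in the group
           WHEN rn_last  = 1 THEN 'MR'       -- most recent activity in the group
           ELSE 'BETWEEN'
       END AS activity_label
FROM flagged
ORDER BY membernumber, activitytype, activity_date;
```

Note one design choice: a member with a single activity (M2 here) is both first and last, and because the INITIAL branch is tested first it gets labelled INITIAL. Swap the WHEN branches if you would rather see MR in that case.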

How to select the last 12 months in sql?

I need to select the last 12 months. As you can see on the picture, May occurs two times.
But I only want it to occur once. And it needs to be the newest one.
Plus, the table should stay in this structure, with the latest month on the bottom.
And this is the query:
SELECT Monat2,
Monat,
CASE WHEN NPLAY_IND = '4P'
THEN 'QuadruplePlay'
WHEN NPLAY_IND = '3P'
THEN 'TriplePlay'
WHEN NPLAY_IND = '2P'
THEN 'DoublePlay'
WHEN NPLAY_IND = '1P'
THEN 'SinglePlay'
END AS Series,
Anzahl as Cnt
FROM T_Play_n
where NPLAY_IND != '0P'
order by Series asc ,Monat
This is the new query
SELECT sub.Monat2,sub.Monat,
CASE WHEN NPLAY_IND = '4P'
THEN 'QuadruplePlay'
WHEN NPLAY_IND = '3P'
THEN 'TriplePlay'
WHEN NPLAY_IND = '2P'
THEN 'DoublePlay'
WHEN NPLAY_IND = '1P'
THEN 'SinglePlay'
END
AS Series, Anzahl as Cnt FROM (SELECT ROW_NUMBER () OVER (PARTITION BY Monat2 ORDER BY Monat DESC)rn,
Monat2,
Monat,
Anzahl,
NPLAY_IND
FROM T_Play_n)sub
where sub.rn = 1
It does only show the months once but it doesn't do that for every Series.
So with every Play it should have 12 months.
In Oracle and SQL-Server you can use ROW_NUMBER.
name = month name and num = month number:
SELECT sub.name, sub.num
FROM (SELECT ROW_NUMBER () OVER (PARTITION BY name ORDER BY num DESC) rn,
name,
num
FROM tab) sub
WHERE sub.rn = 1
ORDER BY num DESC;
WITH R(N) AS
(
SELECT 0
UNION ALL
SELECT N+1
FROM R
WHERE N < 12
)
SELECT LEFT(DATENAME(MONTH,DATEADD(MONTH,-N,GETDATE())),3) AS [month]
FROM R
The WITH R(N) is a Common Table Expression. R is the name of the result set (or table) that you are generating, and N is the number of months to go back from the current month.
In SQL Server you can do it as follows:
SELECT DateMonth, DateWithMonth -- Specify columns to select
FROM Tbl -- Source table
WHERE CAST(CAST(DateWithMonth AS INT) * 100 + 1 AS VARCHAR(20)) >= DATEADD(MONTH, -12,GETDATE()) -- Condition to return data for last 12 months
GROUP BY DateMonth, DateWithMonth -- Uniqueness
ORDER BY DateWithMonth -- Sorting to get latest records on the bottom
So it sounds like you want to select rows that contain the last occurrence of months. Something like this should work:
select * from [table_name]
where id in (select max(id) from [table_name] group by [month_column])
The last select in the brackets will get a list of id's for the last occurrence of each month. If the year+month column you have shown is not in descending order already, you might want to max this column instead.
You can use something like this (the table dbo.Nums contains int values from 0 to 11):
SELECT DATEADD(MONTH, DATEDIFF(MONTH, '19991201', CURRENT_TIMESTAMP) + n - 12, '19991201'),
DATENAME(MONTH,DateAdd(Month, DATEDIFF(month, '19991201', CURRENT_TIMESTAMP) + n - 12, '19991201'))
FROM dbo.Nums
I suggest using GROUP BY on the month name and a MAX function on the numeric component. If it is not numeric, use to_number().

How to find the row where the sum of all values in a column reaches a specified value?

Given data in a table with the following schema:
CREATE TABLE purchases (timestamp DATETIME, quantity INT)
I would like to find the point in time (i.e. the timestamp of the row) where the sum of the values in the quantity column passed a certain threshold value.
This is in MS SQL Server, and ideally I'd like to avoid using a cursor if possible.
SELECT timestamp, SUM(quantity)
FROM purchases
GROUP BY timestamp
HAVING SUM(quantity) > someValue
Or if it is a Running Sum
-- assumes timestamps are unique
SELECT TOP 1 a1.timestamp
FROM purchases a1, purchases a2
WHERE a2.timestamp <= a1.timestamp
GROUP BY a1.timestamp
HAVING SUM(a2.quantity) >= someValue
ORDER BY a1.timestamp ASC
You could get the smallest timestamp where the sum of the previous values is larger than the threshold:
select min(timestamp)
from purchases p
where (
select sum(x.quantity)
from purchases x
where x.timestamp < p.timestamp
) > @threshold
However, this is not a very efficient query, so it might be better to use a cursor after all.
In SQL Server 2005+ you could try this:
;WITH numbered AS (
SELECT
timestamp,
quantity,
rownum = ROW_NUMBER() OVER (ORDER BY timestamp)
FROM purchases
),
recursive AS (
-- anchor: start from the first row only
SELECT
n.timestamp,
n.quantity,
n.rownum,
runningsum = n.quantity,
passed = CASE WHEN n.quantity < @threshold THEN 0 ELSE 1 END
FROM numbered n
WHERE n.rownum = 1
UNION ALL
SELECT
n.timestamp,
n.quantity,
n.rownum,
runningsum = n.quantity + r.runningsum,
passed = CASE WHEN n.quantity + r.runningsum < @threshold THEN 0 ELSE 1 END
FROM numbered n
INNER JOIN recursive r ON n.rownum = r.rownum + 1
)
SELECT MIN(timestamp)
FROM recursive
WHERE passed = 1
OPTION (MAXRECURSION 0) -- the default limit is 100 recursion levels
Basically, same as @Guffa's solution, only it makes use of CTEs to avoid the need for a triangular join.
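For completeness: on SQL Server 2012 and later, a windowed SUM gives the running total directly, with no self-join or recursion. This is a different technique from the answers above, shown only as a sketch; the table variable @purchases, its rows, and the @threshold value are made up for illustration.

```sql
DECLARE @threshold int = 10;

-- Hypothetical stand-in for the purchases table.
DECLARE @purchases TABLE ([timestamp] datetime, quantity int);
INSERT INTO @purchases VALUES
    ('2021-01-01 09:00', 4),
    ('2021-01-01 10:00', 3),
    ('2021-01-01 11:00', 5),
    ('2021-01-01 12:00', 2);

SELECT MIN([timestamp]) AS threshold_reached_at
FROM (
    SELECT [timestamp],
           -- running total of quantity in timestamp order
           SUM(quantity) OVER (ORDER BY [timestamp]
                               ROWS UNBOUNDED PRECEDING) AS runningsum
    FROM @purchases
) r
WHERE runningsum >= @threshold;
```

With these rows the running totals are 4, 7, 12 and 14, so the query returns 2021-01-01 11:00. Use > instead of >= if "passed" should mean strictly exceeding the threshold.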