How can I flag the earliest of overlapping records?


I have a dataset where I need to pick out and keep records that have no overlapping time frames, and for those that do overlap, keep the earliest record.
I have been able to pick out the records that have no overlapping time frames with the code below:
IF OBJECT_ID('tempdb..#overlaps') IS NOT NULL DROP TABLE #overlaps
SELECT
    CASE WHEN EXISTS (SELECT 1 FROM #services r2
                      WHERE r2.client_ID = r1.client_ID
                        AND r2.service_ID <> r1.service_ID
                        AND r1.service_start_date <= r2.service_end_date
                        AND r2.service_start_date <= r1.service_end_date)
         THEN 1
         ELSE 0
    END AS Overlap
    ,*
INTO #overlaps
FROM #services r1
This produces the below for an example client:
Overlap client_ID service_ID service_start_date service_end_date
1 12345 123 27-Oct-2009 03-Jan-2013
1 12345 124 27-Dec-2012 19-Mar-2013
1 12345 125 18-Mar-2013 04-Jun-2014
1 12345 126 29-Jun-2014 28-Apr-2017
1 12345 127 23-Jun-2014 14-Aug-2014
1 12345 128 27-Apr-2015 07-Nov-2015
1 12345 129 01-Aug-2015 01-Dec-2015
0 12345 132 01-Jul-2017 09-Dec-2017
0 12345 133 02-Jan-2018 20-Jan-2018
0 12345 134 03-May-2018 05-Jun-2018
What I want to do, for rows where Overlap = 1, is add a column that flags whether the record is the first record of an overlapping "set", first in terms of the start date. The service_ID is not actually sequential; I just replaced it with dummy data.
So in the above case, record #1 should be flagged 1 because it has the earliest service start compared to the record it overlaps, record #2, which started later. Record #2 would therefore be flagged 0, and the same goes for record #3 (i.e. flagged 0). Going on, record #4 should be flagged 1, as it overlaps the records below it.
In terms of the final product, I eventually want to show only the non-overlapping periods plus the earliest/first record of each set of records that do overlap. So in the above scenario, records #1, 4, 8, 9 and 10 would remain and the rest would be removed. Each record should remain its own record though; they should not be "pivoted" up into a continuous record.
In other words, what I need to flag is the earliest record that started where more than one active service is occurring in parallel.
EDIT:
So for example, a client has 4 services: Service A ran from Jan 1 to July 31, Service B started Feb 1 and ended August 1, Service C started September 1 and ended Oct 1, and Service D started Nov 1 and ended Dec 1. Service A should be flagged 1; Service B, which started while Service A was still active, should be flagged 0; Service C, which started without any service being active, should be flagged 1, and the same goes for Service D.
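One way to read the rule in the EDIT is: sort a client's services by start date and flag a service when it starts after every earlier service has already ended. The Python sketch below illustrates that reading only. The function name `flag_first_of_overlaps`, the tuple layout and the year 2020 are assumptions made for illustration, and services that start on the same day are simply resolved by sort order.

```python
from datetime import date

def flag_first_of_overlaps(services):
    """Return {service_id: flag} for ONE client's services.

    services: iterable of (service_id, start, end) tuples (assumed layout).
    A service is flagged 1 when no earlier-starting service is still active
    on its start date, otherwise 0.
    """
    flags = {}
    latest_end = None  # latest end date among the services seen so far
    for service_id, start, end in sorted(services, key=lambda s: (s[1], s[2])):
        # Inclusive comparison, matching the <= used in the overlap check.
        flags[service_id] = 1 if latest_end is None or start > latest_end else 0
        if latest_end is None or end > latest_end:
            latest_end = end
    return flags

# The four services from the EDIT (the year is an assumption).
services = [
    ("A", date(2020, 1, 1), date(2020, 7, 31)),
    ("B", date(2020, 2, 1), date(2020, 8, 1)),
    ("C", date(2020, 9, 1), date(2020, 10, 1)),
    ("D", date(2020, 11, 1), date(2020, 12, 1)),
]
flags = flag_first_of_overlaps(services)
print(flags)  # {'A': 1, 'B': 0, 'C': 1, 'D': 1}
```

Tracking the latest end date seen so far, rather than only the previous row's end date, matters when a long service spans several shorter ones.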

I think the flag would be:
SELECT (CASE WHEN NOT EXISTS (SELECT 1
FROM #services r2
WHERE r2.client_ID = r1.client_ID AND
r2.service_ID <> r1.service_ID AND
r1.service_start_date <= r2.service_end_date AND
r2.service_start_date < r1.service_end_date
)
THEN 1
ELSE 0
END) AS First_Overlap;
Notes:
This doesn't actually check for an overlap. I left that out, because you can use the overlaps flag for the check, or include the exists query.
The only difference is < versus <= for the overlap check on the start date.
This might not work as you want when the period of overlap has multiple records beginning at the same time.
Also, I suspect you are trying to solve a gaps-and-islands problem. Using multiple temporary tables and the logic that you are using is unnecessary. You might want to ask another question about the entire problem you want to solve, rather than this one facet.

It's difficult to tell your exact goal here, but if you're looking to flag based on the service_start_date when Overlap = 1, this would suffice.
;WITH CTE (Overlap, client_ID, service_ID, service_start_date) AS (
SELECT * FROM (
VALUES
('1','12345','123','10/27/2009'),
('1','12345','124','12/27/2012'),
('1','12345','125','3/18/2013'),
('1','12345','126','6/29/2014'),
('1','12345','127','6/23/2014'),
('1','12345','128','4/27/2015'),
('1','12345','129','8/1/2015'),
('0','12345','132','7/1/2017'),
('0','12345','133','1/2/2018'),
('0','12345','134','5/3/2018')
) AS A (Overlap, client_ID, service_ID, service_start_date)
)
SELECT CTE.Overlap,
CTE.client_ID,
CTE.service_ID,
CTE.service_start_date,
t2.Result
FROM CTE
LEFT JOIN (
SELECT '1' AS Result,
t2.client_ID,
MIN(t2.service_start_date) AS service_start_date
FROM CTE t2
WHERE t2.Overlap = '1'
GROUP BY client_ID
) t2 ON CTE.client_ID = t2.client_ID
AND CTE.service_start_date = t2.service_start_date
ORDER BY service_ID
This also does not account for anything other than flagging the first Overlap by service_start_date. For instance, if you wanted to flag those that aren't first as 0's, that would need to be added.

UPDATE #overlaps SET IsFirst=1
FROM
(SELECT overlap, client_id client_id, service_start_date service_start_date, service_end_date service_end_date, min(service_id) service_id
FROM #overlaps
WHERE overlap=1
group by overlap, client_id, service_start_date, service_end_date) a
where #overlaps.client_id = a.client_id and #overlaps.service_id = a.service_id
Edit
@marshymell0 - I think I understand what you want. Writing this as a query is pretty tricky, so I'm using a cursor instead. The line PRINT @service_start_date_prev marks where you would update the flag column that determines if the record is the first in the overlapping set.
DECLARE @overlap_prev int, @client_id_prev int, @service_id_prev int
DECLARE @overlap_next int, @client_id_next int, @service_id_next int
DECLARE @service_start_date_prev datetime, @service_end_date_prev datetime
DECLARE @service_start_date_next datetime, @service_end_date_next datetime
DECLARE @part_of_set int = 0
DECLARE o_cursor CURSOR
FOR SELECT overlap, client_id, service_id, service_start_date, service_end_date
    FROM #overlaps where overlap=1
    ORDER BY service_start_date
OPEN o_cursor
FETCH NEXT FROM o_cursor
INTO @overlap_next, @client_id_next, @service_id_next, @service_start_date_next, @service_end_date_next
WHILE @@FETCH_STATUS = 0
BEGIN
    IF (@service_start_date_prev IS NOT NULL)
    BEGIN
        IF (@part_of_set = 0 AND @service_start_date_prev <= @service_end_date_next AND @service_start_date_next <= @service_end_date_prev)
        BEGIN
            PRINT @service_start_date_prev
            SET @part_of_set = 1
        END
        ELSE
            SET @part_of_set = 0
    END
    SET @overlap_prev = @overlap_next
    SET @client_id_prev = @client_id_next
    SET @service_id_prev = @service_id_next
    SET @service_start_date_prev = @service_start_date_next
    SET @service_end_date_prev = @service_end_date_next
    FETCH NEXT FROM o_cursor
    INTO @overlap_next, @client_id_next, @service_id_next, @service_start_date_next, @service_end_date_next
END
CLOSE o_cursor;
DEALLOCATE o_cursor;


A question again on cursors in SQL Server

I am reading data using Modbus. The data contains the status of the 250 registers in a PLC as either off or on, with the time of reading as the time stamp. The raw data received is stored in a table as shown below, where the column Register represents the register read and the column Value represents the status of the register as 0 or 1, with a time stamp. In the sample I am showing data for just one register (i.e. 250). SlaveID represents the PLC from which the data was obtained.
I need to populate one more table, Table_signal_on_log, from the raw data table. This table should contain the time at which the value changed to 1 as the start time and the time at which it changed back to 0 as the end time. This table is also given below.
I am able to do it with a cursor, but it is slow, and it could slow down the processing further if the number of signals increases. How could I do it without a cursor? I tried to do it with set-based operations but couldn't get one working. I need to avoid repeated values, i.e. after recording 13:30:30 as the time at which the signal becomes 1, I have to ignore all entries until it becomes 0 and record that as the end time, then again ignore all values until it becomes 1. This process runs once every 20 seconds (it can be done at any interval, but presently 20), so I may have 500 rows to loop through every time. This may increase as the number of connected PLCs increases, and the cursor operation is bound to become an issue.
Raw data table
SlaveID Register Value Timestamp ProcessTime
-------------------------------------------------------
3 250 0 13:30:10 NULL
3 250 0 13:30:20 NULL
3 250 1 13:30:30 NULL
3 250 1 13:30:40 NULL
3 250 1 13:30:50 NULL
3 250 1 13:31:00 NULL
3 250 0 13:31:10 NULL
3 250 0 13:31:20 NULL
3 250 0 13:32:30 NULL
3 250 0 13:32:40 NULL
3 250 1 13:32:50 NULL
Table_signal_on_log
SlaveID Register StartTime Endtime
3 250 13:30:30 13:31:10
3 250 13:32:50 NULL //value is still 1

This is a classic gaps-and-islands problem, and there are a number of solutions. Here is one:
1. Get the previous Value for each row using LAG.
2. Filter so we only have rows where the previous Value is different or non-existent, in other words the beginning of an "island" of rows.
3. Of those rows, get the next Timestamp for each row using LEAD.
4. Filter so we only have Value = 1.
WITH cte1 AS (
SELECT *,
PrevValue = LAG(t.Value) OVER (PARTITION BY t.SlaveID, t.Register ORDER BY t.Timestamp)
FROM YourTable t
),
cte2 AS (
SELECT *,
NextTime = LEAD(t.Timestamp) OVER (PARTITION BY t.SlaveID, t.Register ORDER BY t.Timestamp)
FROM cte1 t
WHERE (t.Value <> t.PrevValue OR t.PrevValue IS NULL)
)
SELECT
t.SlaveID,
t.Register,
StartTime = t.Timestamp,
Endtime = t.NextTime
FROM cte2 t
WHERE t.Value = 1;
db<>fiddle
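To make the four steps easier to follow, here is the same idea written as a small Python sketch rather than SQL. It is only an illustration of the logic, not a replacement for the query: the function name `signal_on_intervals` and the tuple layout are assumptions, and timestamps are plain `HH:MM:SS` strings so that they sort correctly as text.

```python
def signal_on_intervals(readings):
    """readings: list of (slave_id, register, value, timestamp) tuples.

    Returns (slave_id, register, start_time, end_time) tuples, where
    end_time is None while the signal is still on.
    """
    # Equivalent of PARTITION BY SlaveID, Register ORDER BY Timestamp.
    by_signal = {}
    for slave_id, register, value, ts in sorted(
        readings, key=lambda r: (r[0], r[1], r[3])
    ):
        by_signal.setdefault((slave_id, register), []).append((value, ts))

    intervals = []
    for (slave_id, register), rows in by_signal.items():
        # Steps 1 and 2: keep only rows whose value differs from the
        # previous row (or that have no previous row).
        changes = []
        prev_value = None
        for value, ts in rows:
            if prev_value is None or value != prev_value:
                changes.append((value, ts))
            prev_value = value
        # Steps 3 and 4: pair each change with the next change's time and
        # keep only the rows where the signal switched on.
        for i, (value, ts) in enumerate(changes):
            if value == 1:
                end = changes[i + 1][1] if i + 1 < len(changes) else None
                intervals.append((slave_id, register, ts, end))
    return intervals

# Sample rows taken from the raw data table in the question.
readings = [
    (3, 250, 0, "13:30:10"), (3, 250, 0, "13:30:20"),
    (3, 250, 1, "13:30:30"), (3, 250, 1, "13:30:40"),
    (3, 250, 1, "13:30:50"), (3, 250, 1, "13:31:00"),
    (3, 250, 0, "13:31:10"), (3, 250, 0, "13:31:20"),
    (3, 250, 0, "13:32:30"), (3, 250, 0, "13:32:40"),
    (3, 250, 1, "13:32:50"),
]
intervals = signal_on_intervals(readings)
print(intervals)
# [(3, 250, '13:30:30', '13:31:10'), (3, 250, '13:32:50', None)]
```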

A follow up question on Gaps and Islands solution

This is a continuation of my previous question, A question again on cursors in SQL Server.
To reiterate, I get values from a sensor as 0 (off) or 1 (on) every 10 seconds. I need to log the on times in another table, i.e. the times when the sensor value is 1.
I will process the data every minute (which means I will have 6 rows of data). I needed a way to do this without using cursors and was answered by @Charlieface.
WITH cte1 AS (
SELECT *,
PrevValue = LAG(t.Value) OVER (PARTITION BY t.SlaveID, t.Register ORDER BY t.Timestamp)
FROM YourTable t
),
cte2 AS (
SELECT *,
NextTime = LEAD(t.Timestamp) OVER (PARTITION BY t.SlaveID, t.Register ORDER BY t.Timestamp)
FROM cte1 t
WHERE (t.Value <> t.PrevValue OR t.PrevValue IS NULL)
)
SELECT
t.SlaveID,
t.Register,
StartTime = t.Timestamp,
Endtime = t.NextTime
FROM cte2 t
WHERE t.Value = 1;
db<>fiddle
The raw data set and desired outcome are shown below. Here register 250 represents the sensor, Value holds the reading (0 or 1), and Timestamp is the time the value was read.
SlaveID Register Value Timestamp ProcessTime
3 250 0 13:30:10 NULL
3 250 0 13:30:20 NULL
3 250 1 13:30:30 NULL
3 250 1 13:30:40 NULL
3 250 1 13:30:50 NULL
3 250 1 13:31:00 NULL
3 250 0 13:31:10 NULL
3 250 0 13:31:20 NULL
3 250 0 13:32:30 NULL
3 250 0 13:32:40 NULL
3 250 1 13:32:50 NULL
The required entry in the logging table is
SlaveID Register StartTime Endtime
3 250 13:30:30 13:31:10
3 250 13:32:50 NULL //value is still 1
The solution given works fine, but when the next set of data is processed, the existing open entry (end time is NULL) has to be taken into account.
If the next set of values contains only 1s, then no entry should be made in the log table, since the value was 1 in the previous set of data and continues to be 1. When the value changes to 0 in one of the sets, the end time should be updated with that time. A fresh row should be inserted into the log table when the value becomes 1 again.

I solved the issue by using a 'hybrid'. I get 250 rows (the values of 250 sensors polled) every 10 seconds, and I process the data once every 180 seconds. That gives me about 4500 records, which I process using the CTE. The result set is around 250 records (a few more than 250 if some signals have changed state). I insert this into a #table (of the table being processed) and use a cursor on this #table to check and insert into the log table. Since the number of rows is only around 250, the cursor runs without issue.
Thanks to @charlieface for the original answer.
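The carry-over rule from the question (leave an open entry alone while the value stays 1, close it on the first 0, open a new row on the next 1) can be sketched in Python as follows. This is a minimal illustration of the rule, not the poster's actual implementation: `apply_batch`, the dictionary keys, and the timestamps of the second and third batches are all made up for the example.

```python
def apply_batch(log, readings):
    """Apply one batch of readings to the log, in place.

    log: list of dicts with keys slave_id, register, start, end
         (end is None while the signal is still on).
    readings: list of (slave_id, register, value, timestamp) tuples.
    """
    # Open entries left over from earlier batches, one per signal at most.
    open_rows = {
        (row["slave_id"], row["register"]): row
        for row in log
        if row["end"] is None
    }
    for slave_id, register, value, ts in sorted(readings, key=lambda r: r[3]):
        key = (slave_id, register)
        if value == 1 and key not in open_rows:
            # Signal switched on: start a fresh log row.
            row = {"slave_id": slave_id, "register": register,
                   "start": ts, "end": None}
            log.append(row)
            open_rows[key] = row
        elif value == 0 and key in open_rows:
            # Signal switched off: close the open row.
            open_rows.pop(key)["end"] = ts
        # Repeated 1s on an open row and repeated 0s are ignored.
    return log

log = []
apply_batch(log, [(3, 250, 0, "13:30:20"), (3, 250, 1, "13:30:30"),
                  (3, 250, 1, "13:30:40"), (3, 250, 0, "13:31:10"),
                  (3, 250, 1, "13:32:50")])
# A batch containing only 1s must not add or change anything.
apply_batch(log, [(3, 250, 1, "13:33:00"), (3, 250, 1, "13:33:10")])
after_all_ones = [(r["start"], r["end"]) for r in log]
# A later batch closes the open entry and opens a new one.
apply_batch(log, [(3, 250, 1, "13:33:20"), (3, 250, 0, "13:33:30"),
                  (3, 250, 1, "13:33:40")])
final = [(r["start"], r["end"]) for r in log]
print(final)
```

The only state that has to survive between batches is the set of rows whose end time is still empty, which is why a small per-signal check is enough.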

How to check period id and last updated value then assign to variable?

I have a table where users can get the periods for each record. There are four different periods for each record. Here is an example of the periods table with data:
profile_id year_id last_update_dt period_id
1234564 2019 2017-06-13 15:11:34 2
1234565 2019 2017-04-14 09:54:29 3
1234566 2019 2018-02-01 14:44:10 4
1234567 2019 2017-07-12 08:51:14 5
345356 2020 2019-12-23 12:34:56 2
Here is an example of the profile table data:
rec_id year_id profile_id
7548763 2018 988753
7548763 2019 746546
7548763 2020 765745
6983234 2020 345356
The current code is written in a back-end language (in my case ColdFusion) and looks like this:
<cfquery name="qPeriods" datasource="testDB">
SELECT p2.last_update_dt AS period2, p3.last_update_dt AS period3, p4.last_update_dt AS period4
FROM profile pf
LEFT JOIN periods p2 ON pf.profile_id = p2.profile_id AND pf.year_id = p2.year_id AND p2.period_id = 3
LEFT JOIN periods p3 ON pf.profile_id = p3.profile_id AND pf.year_id = p3.year_id AND p3.period_id = 4
LEFT JOIN periods p4 ON pf.profile_id = p4.profile_id AND pf.year_id = p4.year_id AND p4.period_id = 5
WHERE pf.rec_id = 7548763 AND pf.year_id = 2019
</cfquery>
<cfset period = 2 />
<cfset period = len(trim(qPeriods.period2)) gte '1' ? '3' : period />
<cfset period = len(trim(qPeriods.period3)) gte '1' ? '4' : period />
<cfset period = len(trim(qPeriods.period4)) gte '1' ? '5' : period />
As you can see I used hard coded values in cfquery to get some data for testing purpose. The logic will set default value for period 2. Then it will check and override previous value if criteria is met and set to previous value if it's not. I was wondering if this can be simplified and instead of joining each period use only one join. So I came up with this example:
DECLARE @period varchar(1)
SELECT
    @period = CASE WHEN ps.period_id = 3 AND LTRIM(RTRIM(ps.last_update_dt)) IS NOT NULL THEN '3' END,
    @period = CASE WHEN ps.period_id = 4 AND LTRIM(RTRIM(ps.last_update_dt)) IS NOT NULL THEN '4' END,
    @period = CASE WHEN ps.period_id = 5 AND LTRIM(RTRIM(ps.last_update_dt)) IS NOT NULL THEN '5' END,
    @period = CASE WHEN @period IS NULL THEN '2' END
FROM profile pf
LEFT JOIN periods ps ON pf.profile_id = ps.profile_id AND ps.year_id = ps.year_id
WHERE pf.rec_id = 7548763 AND pf.year_id = 2019
For some reason the code above always gives 2 as a result. I'm not sure why that's happening. I looked over the code multiple times but still can't find the reason why it's failing. If anyone sees the issue with the SQL code I wrote, please let me know.

Consider the below SQL:
DECLARE @Value INT
SET @Value = 5
SELECT @Value = CASE WHEN 1 = 2 THEN 123 END
SELECT @Value
If you run this you'll see that @Value turns out to be NULL, not 5. This is because when the condition in the CASE expression is not met and it has no ELSE, it doesn't skip setting the variable; it sets it to NULL.
If you take this and look at your code, you will see that if the "@period = CASE WHEN ps.period_id = 5 ..." condition is not met then @period will be set to NULL, thus causing the next part, "@period = CASE WHEN @period IS NULL", to pass the condition and set it to 2.
Basically, you're expecting the value of @period to be what it was the last time it passed a condition in one of the CASE expressions, when in reality it's being set to NULL each time a condition fails. To avoid this, try adding "ELSE @period" clauses to each CASE.
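The effect can be mimicked outside SQL. The Python sketch below models a CASE expression without ELSE as a helper that returns None (standing in for NULL) when its condition fails, and replays the four assignments for a single joined row. The helper `case_when` and both function names are assumptions made for this illustration, and the sketch does not model how SQL Server processes variable assignments across several rows.

```python
def case_when(condition, value, otherwise=None):
    """Mimic CASE WHEN ... THEN ... [ELSE ...] END; no ELSE yields None."""
    return value if condition else otherwise

def period_without_else(period_id, last_update_dt):
    """Replay the original assignments: each CASE has no ELSE."""
    has_date = last_update_dt is not None
    period = None
    period = case_when(period_id == 3 and has_date, "3")
    period = case_when(period_id == 4 and has_date, "4")
    period = case_when(period_id == 5 and has_date, "5")
    period = case_when(period is None, "2")
    return period

def period_with_else(period_id, last_update_dt):
    """Same assignments with ELSE @period added to every CASE."""
    has_date = last_update_dt is not None
    period = None
    period = case_when(period_id == 3 and has_date, "3", otherwise=period)
    period = case_when(period_id == 4 and has_date, "4", otherwise=period)
    period = case_when(period_id == 5 and has_date, "5", otherwise=period)
    period = case_when(period is None, "2", otherwise=period)
    return period

print(period_without_else(3, "2017-04-14 09:54:29"))  # 2
print(period_with_else(3, "2017-04-14 09:54:29"))     # 3
```

Note that the last CASE in the original also lacks an ELSE, so in this model a row with period_id = 5 ends up as None rather than '5'; the ELSE clause fixes that case as well.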

If you just want the record with the smallest period_id for a given user, you could just sort and take the top 1:
select top 1 per.last_update_dt
from profile pro
inner join periods per
    on pro.profile_id = per.profile_id
    and pro.year_id = per.year_id
where
    pro.year_id = 2019
    and pro.rec_id = 7548763
order by per.period_id

Based on an earlier comment made by @espresso_coffee:
"We always want to return the highest period."
I take this to mean the objective is to return a single row with the max(period_id) for a given profile_id/year_id combo; if my understanding is correct, then how about a simple max() query, e.g.:
select isnull(max(period_id),'2')
from periods
where rec_id = 7548763
and year_id = 2019
and ltrim(rtrim(last_update_dt)) is not NULL
Of course, this assumes that a record in profile does exist; if it's possible for rows to exist in periods without a matching row in profile, we could add a join (or exists() sub-query) to verify the existence of a row in profile.
NOTE: I'm assuming Sybase ASE; if this is one of the other Sybase RDBMS products (e.g. IQ, SQLAnywhere, Advantage) then we may need to switch out the isnull() for the corresponding function.

You should not be setting the same parameter multiple times in a select. Your last select is better written as:
SELECT @period = (CASE WHEN ps.period_id IN (3, 4, 5) AND
LTRIM(RTRIM(ps.last_update_dt)) IS NOT NULL
THEN CAST(ps.period_id AS CHAR(1))
ELSE '2'
END)

Auto Generated Serial Number using Stored Procedure

I want to create a procedure which would create a serial number using a stored procedure.
I have three tables:
Table 1:
create table ItemTypes
(
ItemTypeID int not null primary key,
ItemType varchar(30)
)
Table 2:
create table ItemBatchNumber
(
ItemBatchNumberID int not null primary key,
ItemBatchNumber varchar(20),
ItemType varchar(30),
)
Table 3:
create table ItemMfg
(
ManufactureID int not null primary key,
ItemBatchNumberID int foreign key references ItemBatchNumber(ItemBatchNumberID),
SerialNumber varchar(10),
MfgDate datetime
)
For each Item Type there are several Item batch numbers.
Now, the first 3 characters of the serial number are xyz. The 4th character should be the Item Type (e.g. if the Item Type is 'b', the serial number starts with xyzb).
The 5th character should work like this:
Within a day, for the first Item batch number of an Item Type, the 5th digit should be 1, and it stays 1 for that batch for the rest of the day. For the next Item batch number it should be 2, and it stays 2 for that day.
The same rule applies on the next day.
E.g. suppose Item Type 'b' has 3 Item batch numbers: WB1, WB2, WB3. If today someone selects WB2 (Item batch number) of Item Type 'b' first, then the serial number should be xyzb1, and it stays xyzb1 for WB2 for the rest of today. Now if someone selects WR1 next, then the serial number should be xyzb2 for today. Tomorrow, whichever Item batch number of Item Type 'b' is selected first gets serial number xyzb1 for that batch number and type. The same rule applies to the other item types.
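The fifth-digit rule can be illustrated with a short Python sketch: within one day and one item type, each batch number gets the next free digit the first time it is selected and keeps it for the rest of that day. This is only a sketch of the rule as stated in the question; the function name `assign_serials`, the tuple layout, the dates, and the choice of WB1 as the second batch selected are assumptions for illustration.

```python
from datetime import date

def assign_serials(selections, prefix="xyz"):
    """selections: list of (day, item_type, batch_number) in selection order.

    Returns one serial number per selection.
    """
    # (day, item_type) -> {batch_number: fifth digit assigned that day}
    digits_by_day = {}
    serials = []
    for day, item_type, batch_number in selections:
        day_digits = digits_by_day.setdefault((day, item_type), {})
        if batch_number not in day_digits:
            # First selection of this batch today: take the next digit.
            day_digits[batch_number] = len(day_digits) + 1
        serials.append(f"{prefix}{item_type}{day_digits[batch_number]}")
    return serials

today = date(2015, 3, 1)
tomorrow = date(2015, 3, 2)
selections = [
    (today, "b", "WB2"),     # first batch of type b today
    (today, "b", "WB1"),     # second batch of type b today
    (today, "b", "WB2"),     # WB2 keeps its digit for the day
    (tomorrow, "b", "WB1"),  # numbering starts over the next day
]
serials = assign_serials(selections)
print(serials)  # ['xyzb1', 'xyzb2', 'xyzb1', 'xyzb1']
```

In a database the same state would have to come from the rows already stored for that day, since nothing is kept in memory between procedure calls.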
I have tried till now:
create procedure Gen_SerialNumber
(
    @ManufactureID int,
    @IitemType varchar(30),
    @ItemBatchNumberID int,
    @Date datetime,
    @SerialNumber out,
    @fifthDigit int out
)
AS
Begin
    set @IitemType=(Select ItemType from ItemBatchNumber where ItemBatchNumber=@ItemBatchNumber)
    Declare @SerialNumber1 varchar(20)
    Set @SerialNumber1= 'xyz'+''+@IitemType+''+CAST( (Select COUNT(distinct ItemBatchNumber)from ItemBatchNumber
        where ItemType=@IitemType) as varchar (10) )
    Set @fifthDigit=SUBSTRING(@SerialNumber1,5,1)
    IF EXISTS(SELECT SerialNumber FROM ItemMfg WHERE SerialNumber=null or SerialNumber!=@SerialNumber)
        SET @fifthDigit=1
    IF EXISTS(SELECT mfgDate,ItemBatchNumberID FROM ItemMfg WHERE mfgDate=@Date and ItemBatchNumberID=@ItemBatchNumberID)
        SET @fifthDigit=1
    ELSE
        SET @fifthDigit=@fifthDigit+1
    SET @SerialNumber=('xyz'+''+@ItemType+''+cast(@fifthdigit as varchar(2)))
    INSERT INTO ItemMfg VALUES(@ItemType,@ItemBatchNumberID,@SerialNumber,@Date)
END
I am new to SQL. The fourth digit of the serial number is generated correctly by my code. My problem is how to increment the fifth digit, checking against dates, when the next different item batch number of the same item type, or a different item type, is used. Please let me know if anything is unclear. Thanks in advance.

A couple of concerns with your question: I am confused by the appearance of the WireBatchNumber column in your stored procedure, as it is not included in the table definitions you provided. Secondly, the last @ItemType (right before the END command) is misspelled.
I think the challenge here is that you need your stored procedure to increment a variable across batches and to "start over" each day. That would suggest to me that you need the procedure to
1. Track the last time it was run and see if it was today or not.
2. Track how many times it has been run today.
This is not really a beginner-level task, but there is a way, apparently: how to declare global variable in SQL Server..?. Using the type of variable mentioned in this link, you could set up some conditional structures that compare a date variable to today using the DateDiff() function, changing the date variable and resetting your counters if the two dates fall on different days, then incrementing a counter for the item batch number and using this counter to provide the fifth digit.

Let's populate some data:
ItemTypes:
ItemTypeID ItemType
1 a
2 b
3 c
ItemBatchNumber:
ItemBatchNumberID ItemBatchNumber ItemType
...
11 WB1 b
22 WB2 b !
33 WB3 b !!
44 WB1 c
55 WB2 c !
66 WB3 c !!
77 WB3 c !!
...
ItemMfg:
ManufactureID ItemBatchNumberID MfgDate
111 22 2015-03-01 7:00 -> xyzb1
222 11 2015-03-01 8:00 -> xyzb1
333 22 2015-03-01 9:00 -> xyzb2
444 33 2015-03-02 5:00 -> xyzb1
555 33 2015-03-02 6:00 -> xyzb2
666 11 2015-03-02 7:00 -> xyzb1
777 33 2015-03-02 8:00 -> xyzb3
888 11 2015-03-02 9:00 -> xyzb2
999 22 2015-03-02 9:35 -> xyzb1
I see some questionable things; that does not necessarily mean they are mistakes.
Sorry, I do not know the real business rules; they may explain things differently.
I assume the simplest and most usable logic.
ItemTypes looks like a lookup table. But you do not use ItemTypeID; instead you use its value (a, b, c) as a unique code.
So the table should look like this:
ItemType(PK) ItemTypeDescr
a Type A
b Most Usable Type B
c Free text for type C
ItemBatchNumber is a match table: it defines the matches between batches and ItemTypes.
It may contain data like the rows marked with "!", and this is valid.
To avoid row 77, you need to add a unique index or set the PK on ItemBatchNumber + ItemType.
But in any case, sets such as 22, 55 and 11, 44 are normally expected, so your SP will fail.
Here you will get an error.
This query may return multiple rows (22, 55):
set @IitemType=(Select ItemType from ItemBatchNumber where ItemBatchNumber=@ItemBatchNumber)
alternatively:
Select @IitemType = ItemType from ItemBatchNumber where ItemBatchNumber=@ItemBatchNumber
In the case of multiple rows it will return the last one.
But either way it is not correct: if @ItemBatchNumber = 'WB2', which value is expected, 'b' or 'c'?
Errors:
...
... and ItemTypeID=@IitemType ...
ItemTypeID is int;
@IitemType is char ('b'). What do you expect?
...
MfgDate=@Date
The dates '2015-03-02 5:00' and '2015-03-02 7:00' are not the same value, but they fall on the same day.
...
...from ItemBatchNumber,ItemMfg
where MfgDate=@Date
and ItemBatchNumber=@ItemBatchNumber
and ItemTypeID=@IitemType
Even if ItemBatchNumber returns one row and the date comparison ignores the time, this will return ALL batches from ItemMfg for that day.
You need to do a proper join.
...
Select COUNT(distinct ItemBatchNumber)
You will always need (count() + 1), and not a distinct count.
I am not sure when you need to generate the SN:
a. At the moment when (before) you ADD a new ItemMfg row; then you need to check against the current day.
b. For any existing ItemMfg row (e.g. 555); then you need to exclude from count() all rows after 555.
This is how the queries may look:
a. (on adding a new ItemMfg row: pass the values used to create the ItemMfg row)
create procedure AddNewItemMfg -- ...and generate SN
(
    @ItemBatchNumberID int,
    @MfgDate datetime -- pass getdate()
) AS Begin
    -- calc 5th digit:
    declare @fifthDigit int;
    select
        @fifthDigit = count(*) + 1
    from ItemBatchNumber AS bb
    inner join ItemMfg ii ON ii.ItemBatchNumberID = bb.ItemBatchNumberID
    where bb.ItemBatchNumberID = @ItemBatchNumberID -- single ItemBatchNumber-row to get ItemType
    and ii.MfgDate <= @MfgDate -- all previous datetimes
    and cast(ii.MfgDate as date) = cast(@MfgDate as date) -- same day
    -- ManufactureID is Identity (i.e. autoincremented)
    INSERT INTO ItemMfg (ItemBatchNumberID, MfgDate, SerialNumber)
    SELECT @ItemBatchNumberID
    , @MfgDate
    , 'xyz' + bb.ItemType + cast(@fifthDigit as varchar(5))
    FROM ItemBatchNumber bb
    WHERE bb.ItemBatchNumberID = @ItemBatchNumberID -- filter on the ID, the only batch parameter passed in
    ;
end
b. for any already existing ItemMfg row
create procedure GenerateSNforExistedItemMfg
(
    @ManufactureID int
) AS Begin
    -- generate SN - same as previous but in a single query
    declare @SN varchar(10);
    Select @SN = 'xyz'
        + bb.ItemType
        + cast((
            select count(*) -- the row itself is matched by <=, so no + 1 here
            from ItemMfg mm -- single row from mm
            inner join ItemBatchNumber AS bb -- single row from bb
                ON bb.ItemBatchNumberID = mm.ItemBatchNumberID
            inner join ItemMfg ii -- many matched rows
                ON ii.ItemBatchNumberID = bb.ItemBatchNumberID
            where mm.ManufactureID = @ManufactureID -- single row from mm
            and ii.MfgDate <= mm.MfgDate -- all previous datetimes
            and cast(ii.MfgDate as date) = cast(mm.MfgDate as date) -- same day
        ) as varchar(5))
    FROM ItemBatchNumber AS bb
    INNER JOIN ItemMfg ii ON ii.ItemBatchNumberID = bb.ItemBatchNumberID
    WHERE ii.ManufactureID = @ManufactureID
    -- update SN
    UPDATE ItemMfg SET
        SerialNumber = @SN
    WHERE ManufactureID = @ManufactureID;
end

Help writing a mysql query - this mustve been done 1000 times before but I am struggling..please help?

Update:
I am editing my question in the hope of getting a better answer. I see this is not so simple, but I can't believe there is no simpler solution than what has been mentioned so far.
I am now looking to see if there is some kind of PHP/MySQL solution that deals with this in the most efficient way. I have modified my question below to try to make things clearer.
I have a table with the following fields:
UserID
GroupID
Action
ActionDate
This table simply stores a row whenever a user on my system is added to a group (action = 1) or removed from a group (action = -1). The datetime of the action is recorded as ActionDate.
A group is charged for every user it has each month, as long as the user was part of the group for at least 15 days of that billing month (a billing month does not necessarily start at the beginning of a calendar month; it could run from the 15th of Jan to the 15th of Feb).
I bill my groups every month at the beginning of a billing month for all users who are part of their group at that time. Over the course of the month they might add new users to their group or remove existing users from it.
If they removed a user, I need to know if the user was part of the group for at least 15 days of that billing month. If he was, then do nothing; if not, then the group needs to be refunded for that user (as they paid for the user at the beginning of the month but he was part of the group for less than 15 days).
If they added a user and the user was in the group for at least 15 days (i.e. added within 15 days of the start of the billing month AND not removed before 15 days were up), then the group must be charged for this user. If the user did not end up with 15 days as part of the group, then we do nothing (no charge).
Some of the additional complexities are:
A user might be added or removed multiple times over the course of a billing month, and we would need to keep track of the total number of days that he was part of the group.
We need to be able to differentiate between users who are (ultimately) being removed and users who are (ultimately) being added in order to bill the group correctly. (For example, take a user who has 10 days as part of the group: if he was ultimately removed from the group, then we issue a refund. If he was being added to the group, then we don't charge, because he has less than 15 days.)
In any given billing month a user might not appear in this table at all because their status did not change, i.e. they remained a part of the group or were never part of the group. Nothing needs to be done with these users; if necessary they will be included in the base monthly calculation of "how many users in group today".
I am starting to realize there is no simple MySQL solution and I need a PHP/MySQL combo. Please help!
Here is my most recent SQL attempt, but it does not incorporate all the issues I have discussed above:
SELECT *
FROM groupuserlog
where action = 1
and actiondate >= '2010-02-01'
and actiondate < date_add('2010-02-01',INTERVAL 15 DAY)
and userid not in (select userid
from groupuserlog
where action = -1
and actiondate < '2010-03-01'
and actiondate > date_add('2010-02-01', INTERVAL 15 DAY))

I am assuming that a User might have joined a group long before the billing period, and might not change status during the billing period. This requires your entire table to be scanned to construct a membership table which looks like this:
create table membership (
UserId int not null,
GroupId int not null,
start datetime not null,
end datetime not null,
count int not null,
primary key (UserId, GroupId, end )
);
Once this is correctly populated, the answer you want is easily obtained:
set @sm = '2009-02-01';
set @em = date_sub( date_add( @sm, interval 1 month), interval 1 day);
# sum( datediff( e, s ) + 1 ) -- +1 needed to include last day in billing
select UserId,
       GroupId,
       sum(datediff( if(end > @em, @em, end),
                     if(start<@sm, @sm, start) ) + 1 ) as n
from membership
where start <= @em and end >= @sm
group by UserId, GroupId
having n >= 15;
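The arithmetic in that query (clamp each membership interval to the billing month, count the days inclusively, then sum per user and group) can be checked with a small Python sketch. The function name `billable_days` and the sample intervals are illustrative assumptions; dates are treated as inclusive on both ends, mirroring the `+ 1` in the query.

```python
from datetime import date

def billable_days(memberships, period_start, period_end):
    """memberships: list of (user_id, group_id, start, end), dates inclusive.

    Returns {(user_id, group_id): days inside the billing period}.
    """
    totals = {}
    for user_id, group_id, start, end in memberships:
        # Same filter as: where start <= @em and end >= @sm
        if start > period_end or end < period_start:
            continue
        clamped_start = max(start, period_start)
        clamped_end = min(end, period_end)
        days = (clamped_end - clamped_start).days + 1  # include the last day
        key = (user_id, group_id)
        totals[key] = totals.get(key, 0) + days
    return totals

# Illustrative intervals; 2037-12-31 marks a membership that is still open.
memberships = [
    (1, 10, date(2009, 2, 3), date(2009, 2, 5)),
    (1, 10, date(2009, 2, 6), date(2009, 2, 6)),
    (2, 10, date(2009, 2, 20), date(2009, 5, 30)),
    (3, 10, date(2009, 1, 1), date(2037, 12, 31)),
]
totals = billable_days(memberships, date(2009, 2, 1), date(2009, 2, 28))
billed = {key: n for key, n in totals.items() if n >= 15}
print(totals)  # {(1, 10): 4, (2, 10): 9, (3, 10): 28}
print(billed)  # {(3, 10): 28}
```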
The scan needs to be performed by a cursor (which will not be fast). We need to sort your input table by ActionDate and Action so that "join" events appear before "leave" events. The count field is there to help cope with pathological cases, where a membership is ended on one date, then re-started on the same date, ended again on the same date, started again on the same date, etc. In these cases, we increment the count for each start event and decrement it for each end event. We will only close a membership when an end event takes the count down to zero. At the end of populating the membership table, you can query the value of count: closed memberships should have count = 0, and open memberships (not yet closed) should have count = 1. Any entries with a count other than 0 or 1 should be examined closely, as this would indicate a bug somewhere.
The cursor query is:
select UserID as _UserID, GroupID as _GroupID, Date(ActionDate) adate, Action from tbl
order by UserId, GroupId, Date(ActionDate), Action desc;
"Action desc" should break ties so that start events appear before end events should someone join and leave a group on the same date. ActionDate needs to be converted from a datetime to a date because we're interested in units of days.
The actions within the cursor would be the following:
if (Action = 1) then
    insert into membership
    set start=adate, end='2037-12-31', UserId=_UserId, GroupId=_GroupId, count=1
    on duplicate key update count = count + 1;
elseif (Action = -1) then
    update membership
    set end = if( count=1, adate, end),
        count = count - 1
    where UserId=_UserId and GroupId=_GroupId and end = '2037-12-31';
end if
I have not given you the exact syntax of the cursor definition required (you can find that in the MySQL manual) because the full code will obscure the idea. In fact, it might be faster to perform the cursor logic within your application - perhaps even building the membership details within the application.
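For readers who would rather prototype the cursor logic in application code, here is a rough Python sketch of the same counting scheme. It is an interpretation of the steps described above, not a port of any particular implementation: `build_memberships`, `OPEN_END` and the dictionary layout are assumed names, and the events are a subset of dates chosen to exercise the same-day join/leave case.

```python
from datetime import date

OPEN_END = date(2037, 12, 31)  # sentinel end date for open memberships

def build_memberships(events):
    """events: list of (user_id, group_id, action, action_date) tuples,
    where action is 1 (join) or -1 (leave)."""
    rows = []       # all membership rows, in creation order
    open_rows = {}  # (user_id, group_id) -> the row that is still open
    # Order by user, group, date, and join (1) before leave (-1).
    ordered = sorted(events, key=lambda e: (e[0], e[1], e[3], -e[2]))
    for user_id, group_id, action, adate in ordered:
        key = (user_id, group_id)
        row = open_rows.get(key)
        if action == 1:
            if row is None:
                row = {"user_id": user_id, "group_id": group_id,
                       "start": adate, "end": OPEN_END, "count": 1}
                rows.append(row)
                open_rows[key] = row
            else:
                row["count"] += 1  # the "on duplicate key" case
        elif row is not None:
            if row["count"] == 1:
                row["end"] = adate  # last open start: close the row
                del open_rows[key]
            row["count"] -= 1
    return rows

events = [
    (1, 10, 1, date(2009, 1, 1)), (1, 10, -1, date(2009, 1, 2)),
    (1, 10, 1, date(2009, 2, 3)), (1, 10, -1, date(2009, 2, 5)),
    (1, 10, 1, date(2009, 2, 5)), (1, 10, -1, date(2009, 2, 5)),
    (1, 10, 1, date(2009, 2, 6)), (1, 10, -1, date(2009, 2, 6)),
    (3, 10, 1, date(2009, 1, 1)),
]
summary = [(r["user_id"], r["start"], r["end"], r["count"])
           for r in build_memberships(events)]
for line in summary:
    print(line)
```

Sorting joins ahead of leaves on the same date is what lets the re-join on 2009-02-05 raise the count to 2 before the two leave events bring it back to zero.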
EDIT: Here is the actual code:
create table tbl (
UserId int not null,
GroupId int not null,
Action int not null,
ActionDate datetime not null
);
create table membership (
UserId int not null,
GroupId int not null,
start datetime not null,
end datetime not null,
count int not null,
primary key (UserId, GroupId, end )
);
drop procedure if exists popbill;
delimiter //
CREATE PROCEDURE popbill()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE _UserId, _GroupId, _Action int;
DECLARE _adate date;
DECLARE cur1 CURSOR FOR
select UserID, GroupID, Date(ActionDate) adate, Action
from tbl order by UserId, GroupId, Date(ActionDate), Action desc;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
truncate table membership;
OPEN cur1;
REPEAT
FETCH cur1 INTO _UserId, _GroupId, _adate, _Action;
IF NOT done THEN
IF _Action = 1 THEN
INSERT INTO membership
set start=_adate, end='2037-12-31',
UserId=_UserId, GroupId=_GroupId, count=1
on duplicate key update count = count + 1;
ELSE
update membership
set end= if( count=1, _adate, end),
count = count - 1
where UserId=_UserId and GroupId=_GroupId and end = '2037-12-31';
END IF;
END IF;
UNTIL done END REPEAT;
CLOSE cur1;
END
//
delimiter ;
Here's some test data:
insert into tbl values (1, 10, 1, '2009-01-01' );
insert into tbl values (1, 10, -1, '2009-01-02' );
insert into tbl values (1, 10, 1, '2009-02-03' );
insert into tbl values (1, 10, -1, '2009-02-05' );
insert into tbl values (1, 10, 1, '2009-02-05' );
insert into tbl values (1, 10, -1, '2009-02-05' );
insert into tbl values (1, 10, 1, '2009-02-06' );
insert into tbl values (1, 10, -1, '2009-02-06' );
insert into tbl values (2, 10, 1, '2009-02-20' );
insert into tbl values (2, 10, -1, '2009-05-30');
insert into tbl values (3, 10, 1, '2009-01-01' );
insert into tbl values (4, 10, 1, '2009-01-31' );
insert into tbl values (4, 10, -1, '2009-05-31' );
Here's the code being run, and the results:
call popbill;
select * from membership;
+--------+---------+---------------------+---------------------+-------+
| UserId | GroupId | start | end | count |
+--------+---------+---------------------+---------------------+-------+
| 1 | 10 | 2009-01-01 00:00:00 | 2009-01-02 00:00:00 | 0 |
| 1 | 10 | 2009-02-03 00:00:00 | 2009-02-05 00:00:00 | 0 |
| 1 | 10 | 2009-02-06 00:00:00 | 2009-02-06 00:00:00 | 0 |
| 2 | 10 | 2009-02-20 00:00:00 | 2009-05-30 00:00:00 | 0 |
| 3 | 10 | 2009-01-01 00:00:00 | 2037-12-31 00:00:00 | 1 |
| 4 | 10 | 2009-01-31 00:00:00 | 2009-05-31 00:00:00 | 0 |
+--------+---------+---------------------+---------------------+-------+
6 rows in set (0.00 sec)
Then, check how many billing days appear in February 2009:
set @sm = '2009-02-01';
set @em = date_sub( date_add( @sm, interval 1 month), interval 1 day);
select UserId,
GroupId,
sum(datediff( if(end > @em, @em, end),
if(start<@sm, @sm, start) ) + 1 ) as n
from membership
where start <= @em and end >= @sm
group by UserId, GroupId;
+--------+---------+------+
| UserId | GroupId | n |
+--------+---------+------+
| 1 | 10 | 4 |
| 2 | 10 | 9 |
| 3 | 10 | 28 |
| 4 | 10 | 28 |
+--------+---------+------+
4 rows in set (0.00 sec)
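To make the arithmetic in that query easier to follow, here is a small Python sketch of the same clipping idea: each membership is cut down to the part that falls inside the month, and the inclusive day counts are summed per user and group. The function name billing_days and the tuple layout are assumptions of mine; the rows are the ones from the membership table above.

```python
from datetime import date


def billing_days(memberships, month_start, month_end):
    """Sum inclusive day counts of memberships clipped to [month_start, month_end]."""
    totals = {}
    for user, group, start, end in memberships:
        # Same filter as "where start <= @em and end >= @sm".
        if start > month_end or end < month_start:
            continue
        clipped_start = max(start, month_start)
        clipped_end = min(end, month_end)
        days = (clipped_end - clipped_start).days + 1   # +1 because both ends count
        totals[(user, group)] = totals.get((user, group), 0) + days
    return totals


memberships = [
    (1, 10, date(2009, 1, 1), date(2009, 1, 2)),
    (1, 10, date(2009, 2, 3), date(2009, 2, 5)),
    (1, 10, date(2009, 2, 6), date(2009, 2, 6)),
    (2, 10, date(2009, 2, 20), date(2009, 5, 30)),
    (3, 10, date(2009, 1, 1), date(2037, 12, 31)),
    (4, 10, date(2009, 1, 31), date(2009, 5, 31)),
]
print(billing_days(memberships, date(2009, 2, 1), date(2009, 2, 28)))
```

For user 1 this adds 3 days (Feb 3 to Feb 5) and 1 day (Feb 6), which is where the 4 in the result comes from.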
This can be changed to scan the table only for changes since the last run:
1. Remove the "truncate table membership" statement.
2. Create a control table containing the last timestamp processed.
3. Calculate the last timestamp you want to include in this run. I would suggest that max(ActionDate) is not a good choice, because there might be out-of-order arrivals with earlier timestamps. A good choice is "00:00:00" this morning, or "00:00:00" on the first day of the month.
4. Alter the cursor query to only include tbl entries between the date of the last run (from the control table) and the calculated last date.
5. Finally, update the control table with the calculated last date.
If you do that, it is also a good idea to pass in a flag that allows you to rebuild from scratch - ie. reset the control table to the start of time, and truncate the membership table before running the usual procedure.
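The cut-off calculation and the rebuild flag could look something like this in application code. This is only a sketch: the function name processing_window, its parameters, and the use of datetime.min as "the start of time" are my assumptions, not part of the procedure above.

```python
from datetime import datetime, time


def processing_window(last_processed, now, monthly=False, rebuild=False):
    """Return (lower, upper) timestamps bounding the events to include in this run."""
    # Cut off at 00:00:00 today, or at 00:00:00 on the first day of this month,
    # rather than at max(ActionDate), so late out-of-order arrivals are not skipped.
    cutoff_day = now.date().replace(day=1) if monthly else now.date()
    upper = datetime.combine(cutoff_day, time.min)
    # A rebuild ignores the control table value and starts from the beginning.
    lower = datetime.min if rebuild else last_processed
    return lower, upper


lower, upper = processing_window(datetime(2009, 2, 1), datetime(2009, 3, 15, 13, 30))
print(lower, upper)
```

After a successful run, upper is the value you would write back to the control table.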

Not sure about your table but perhaps something like?
SELECT COUNT(UserID)
FROM MyTable
WHERE MONTH(ActionDate) = 3
AND GroupID = 1
AND Action = 1
GROUP BY UserID

I think all the complexity lies in how to figure out the adjacent remove action for a given add action. So, how about adding a column pointing at the primary key of the subsequent action?
Supposing that column is called NextID,
How many users joined a group in a given month and remained part of that group for at least 15 days:
SELECT COUNT(DISTINCT AddedUsers.UserID)
FROM MyTable AS AddedUsers
LEFT OUTER JOIN MyTable
ON MyTable.ID = AddedUsers.NextID
AND MyTable.ActionDate > DATE_ADD(AddedUsers.ActionDate, INTERVAL 15 DAY)
AND MyTable.Action = -1
WHERE MONTH(AddedUsers.ActionDate) = 3 AND YEAR(AddedUsers.ActionDate) = 2012
AND AddedUsers.GroupID = 1
AND AddedUsers.Action = 1
AND MONTH(DATE_ADD(AddedUsers.ActionDate, INTERVAL 15 DAY)) = 3;
How many people were removed from a group in a given month that did not remain in a group for at least 15 days:
SELECT COUNT(DISTINCT RemovedUsers.UserID)
FROM MyTable AS RemovedUsers
INNER JOIN MyTable
ON MyTable.NextID = RemovedUsers.ID
AND RemovedUsers.ActionDate <= DATE_ADD(MyTable.ActionDate, INTERVAL 15 DAY)
AND MyTable.Action = 1
WHERE MONTH(RemovedUsers.ActionDate) = 3 AND YEAR(RemovedUsers.ActionDate) = 2012
AND RemovedUsers.GroupID = 1
AND RemovedUsers.Action = -1;
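Both queries rely on NextID already being filled in. One way to compute it, sketched here in Python, is to sort the log by user, group and date and point each row at the row that follows it for the same user and group. The function name assign_next_ids and the dict-based row layout are assumptions for illustration; ID is taken to be the primary key of the log table.

```python
def assign_next_ids(rows):
    """rows: list of dicts with keys ID, UserID, GroupID, ActionDate.

    Returns {ID: NextID}, where NextID is None for the last action of a user/group pair.
    """
    ordered = sorted(
        rows, key=lambda r: (r["UserID"], r["GroupID"], r["ActionDate"], r["ID"]))
    next_ids = {}
    # Pair every row with the one after it; the final row is paired with None.
    for current, following in zip(ordered, ordered[1:] + [None]):
        same_pair = (
            following is not None
            and following["UserID"] == current["UserID"]
            and following["GroupID"] == current["GroupID"]
        )
        next_ids[current["ID"]] = following["ID"] if same_pair else None
    return next_ids


# Hypothetical log rows; ISO date strings sort chronologically.
rows = [
    {"ID": 1, "UserID": 7, "GroupID": 1, "ActionDate": "2012-03-01"},
    {"ID": 2, "UserID": 8, "GroupID": 1, "ActionDate": "2012-03-02"},
    {"ID": 3, "UserID": 7, "GroupID": 1, "ActionDate": "2012-03-20"},
]
print(assign_next_ids(rows))
```

The resulting mapping would then be written back to the NextID column with ordinary updates.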

I started working through Martin's proposed solution and realised that, although it is probably the right path to take, I would rather go with what I know best, which is PHP rather than complex SQL. It is surely less efficient, but since my tables will never be very big it makes the most sense for me.
In the end I wrote a simple query that builds a chronological history of all user activity in the group for a given month.
SELECT Concat(firstname,' ',lastname) as name, username, UserID,ACTION , Date(ActionDate), Unix_Timestamp(ActionDate) as UN_Action, DateDiff('$enddate', actiondate ) AS DaysTo, DateDiff( actiondate, '$startdate' ) AS DaysFrom
FROM `groupuserlog` inner join users on users.id = groupuserlog.userid WHERE groupuserlog.groupid = $row[groupid] AND ( actiondate < '$enddate' AND actiondate >= '$startdate') ORDER BY userid, actiondate
I then loop through the result set and collect all the data for each user. The first action of the month (either add or remove) indicates whether or not the user was already a member of the group. I then go through the history and simply calculate the number of active days. At the end I check whether a refund or a charge should be issued, depending on whether the user was previously a member of the group or not.
It's not so pretty, but it does the job cleanly and lets me do some additional processing that I need.
Thanks to everyone for the help.
My PHP code, if anyone is interested, looks as follows:
while($logrow = mysql_fetch_row($res2)) {
list($fullname, $username, $guserid,$action,$actiondate,$uxaction,$daysto,$daysfrom) = $logrow;
if($action == 1)
$actiondesc = "Added";
else
$actiondesc = "Removed";
//listing each user by individual action and building a history
//the first action is very important as it defines the previous action
if($curruserid != $guserid) {
if($curruserid > 0) {
//new user history so reset and store previous user value
if($wasMember) {
//this was an existing member so check if need refund (if was not on for 15 days)
$count = $basecount + $count;
echo "<br>User was member and had $count days usage";
if($count< 15) {
array_push($refundarrinfo, "$fullname (#$guserid $username)");
array_push($refundarr, $guserid);
echo " REFUND";
} else
echo " NONE";
} else {
//this user was not an existing member - see if need to charge (ie if was on for min 15 days)
$count = $basecount + $count;
echo "<br>User was not a member and was added for $count days usage";
if($count >= 15) {
array_push($billarrinfo, "$fullname (#$guserid $username)");
array_push($billarr, $guserid);
echo " CHARGE";
} else
echo " NONE";
}
}
$basecount = 0;
$count = 0;
$prev_uxaction = 0;
//setup new user - check first action
echo "<br><hr><br>$guserid<br>$actiondesc - $actiondate"; // - $daysto - $daysfrom";
if($action == 1)
$wasMember = FALSE;
else {
//for first action - if is a remove then store in basecount the number of days that are for sure in place
$basecount = $daysfrom;
$wasMember = TRUE; //if doing a remove must have been a member
}
} else
echo "<br>$actiondesc - $actiondate";// - $daysto - $daysfrom";
$curruserid = $guserid;
if($action == 1) { //action = add
$count = $daysto;
$prev_uxaction = $uxaction; //store this actiondate in case needed for remove calculation
} else { //action = remove
//only do something if this is a remove coming after an add - if not it has been taken care of already
if($prev_uxaction != 0) {
//calc no. of days between previous date and this date and overwrite count by clearing and storing in basecount
$count = ($uxaction - $prev_uxaction)/(60 * 60 * 24);
$basecount = $basecount + $count;
$count = 0; //clear the count as it is stored in basecount
}
}
} //end while
