I have the following problem related to industrial pump readings. A pump usually has a meter that keeps a record of the volume of material processed by that specific pump. Sometimes the meter needs to be replaced with an entirely new meter (meter reading starts at 0) or an old working meter (meter reading can be more than 0). I have a dataset that keeps the maintenance record of the pump with meter readings.
The only indication of a meter change is data in the OLD_METER_READING column; otherwise it is blank.
In the ideal scenario the data looks like the following:
PUMP_NO  INSPECTION_DATE  MAINTENANCE_TASK  METER_READING  OLD_METER_READING  TOTAL_PUMP_LIFE
11       11-AUG-2000      A                 12489                             12489
11       14-JUL-2001      B                 14007                             14007
11       03-SEP-2002      Y                 0              14007              14007
11       03-SEP-2002      C                 0              14007              14007
11       03-SEP-2002      B                 0              14007              14007
11       04-JUN-2003      A                 1200                              16007
11       21-DEC-2003      A                 8000                              22007
11       23-FEB-2004      Y                 0              10000              24007
11       26-MAY-2004      B                 10                                24017
11       26-MAY-2004      P                 20                                24027
11       26-MAY-2004      R                 300                               24307
11       04-OCT-2004      B                 2312                              26319
11       31-MAR-2005      A                 2889                              26896
11       06-NOV-2006      V                 5000                              29007
11       14-JUL-2008      T                 0              7000               31007
However, in many cases the pump technician makes a mistake in logging METER_READING during a meter change, so the data may end up looking like:
PUMP_NO  INSPECTION_DATE  MAINTENANCE_TASK  METER_READING  OLD_METER_READING  TOTAL_PUMP_LIFE
11       11-AUG-2000      A                 12489                             12489
11       14-JUL-2001      B                 14007                             14007
11       03-SEP-2002      Y                 0              14007              14007
11       03-SEP-2002      C                 0              14007              14007
11       03-SEP-2002      B                 0              14007              14007
11       04-JUN-2003      A                 1200                              16007
11       21-DEC-2003      A                 8000                              22007
11       23-FEB-2004      Y                 0              10000              24007
11       26-MAY-2004      B                 10000                             34007
11       26-MAY-2004      P                 10000                             34007
11       26-MAY-2004      R                 10000                             34007
11       04-OCT-2004      B                 2312                              26319
11       31-MAR-2005      A                 2889                              26896
11       06-NOV-2006      V                 5000                              29007
11       14-JUL-2008      T                 0              7000               31007
The mistake in the 2nd set of data is that on 26-MAY-2004 the technician, rather than logging the actual METER_READING, entered the last METER_READING from the old meter as the new METER_READING. The correct METER_READING was logged again from 04-OCT-2004 onward. We have numerous occasions where, for a specific pump (PUMP_NO), an erroneous METER_READING was entered in the database after a meter change event. This also produces wrong and confusing values for TOTAL_PUMP_LIFE.
So, to correct the data, we want to add another column to the table and update it with an Oracle procedure that checks the METER_READING field with the following logic:
Check the data between two subsequent meter change events (in this case, between the 1st meter change on 03-SEP-2002 and the 2nd meter change on 23-FEB-2004, and again between the 2nd meter change on 23-FEB-2004 and the 3rd meter change on 14-JUL-2008).
If, within any of these periods, a METER_READING on an earlier date is higher than a METER_READING on a later date, then update the higher METER_READING with the 2nd lowest value in that period (0 and 2312 are the 2 lowest, so update with 2312).
So the period between the first 2 meter changes will pass and no update will be necessary. However, in the 2nd set of data, all the values (10000) in the METER_READING column for 26-MAY-2004 will be updated with the value 2312.
I am not sure how to write PL/SQL to compare the values between two events, or how to update the value on an earlier date (if a higher value is found in the METER_READING column) with a lower value from that period.
Database: Oracle SQL 11g
So in looking at your problem, I don't know that you need to resort to PL/SQL. The following query should help you identify which records are in need of updating:
SELECT m.*,
       MIN(meter_reading)
         OVER (PARTITION BY m.pump_no
               ORDER BY m.inspection_date
               -- window start: days until the next inspection date for this pump
               RANGE BETWEEN NVL((SELECT MIN(n.inspection_date) - m.inspection_date
                                    FROM maintenance n
                                   WHERE n.pump_no = m.pump_no
                                     AND n.inspection_date > m.inspection_date),
                                 0) FOLLOWING
               -- window end: the day before the next meter change for this pump
                         AND NVL((SELECT MIN(n.inspection_date) - m.inspection_date - 1
                                    FROM maintenance n
                                   WHERE n.pump_no = m.pump_no
                                     AND n.old_meter_reading IS NOT NULL
                                     AND n.inspection_date > m.inspection_date),
                                 0) FOLLOWING) AS min_reading_following
  FROM maintenance m
 ORDER BY m.inspection_date, old_meter_reading ASC NULLS LAST;
I created a SQLFiddle to demonstrate the query. (Link)
The analytic MIN function is looking at all rows between the next date a read was performed AND the next meter change to see if any of them have a value which is less than the current read.
You could use this as part of an update statement. As for TOTAL_PUMP_LIFE, it might be easiest to recalculate that after you've corrected the meter_readings as part of a separate operation.
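To make the window logic concrete, here is a minimal Python sketch of the same idea for a single pump, outside the database. The function name correct_readings, the tuple layout, and the sample list are illustrative assumptions, not part of the query above. The sketch also assumes that when no later meter change exists, all later rows are compared; that choice is made for the example only.

```python
from datetime import date

def correct_readings(rows):
    """rows: (inspection_date, meter_reading, old_meter_reading or None) for ONE pump.

    A reading is lowered to the smallest reading logged on a later date
    but before the next meter change, if that smallest reading is lower.
    """
    # A row marks a meter change when OLD_METER_READING is populated.
    change_dates = sorted({d for d, _, old in rows if old is not None})
    corrected = []
    for d, reading, old in rows:
        next_change = next((c for c in change_dates if c > d), None)
        later = [r for d2, r, _ in rows
                 if d2 > d and (next_change is None or d2 < next_change)]
        if later and reading > min(later):
            reading = min(later)
        corrected.append((d, reading, old))
    return corrected

# Rows from the erroneous data set, starting at the 2nd meter change.
sample = [
    (date(2004, 2, 23), 0, 10000),
    (date(2004, 5, 26), 10000, None),
    (date(2004, 5, 26), 10000, None),
    (date(2004, 5, 26), 10000, None),
    (date(2004, 10, 4), 2312, None),
    (date(2005, 3, 31), 2889, None),
    (date(2006, 11, 6), 5000, None),
    (date(2008, 7, 14), 0, 7000),
]
fixed = [reading for _, reading, _ in correct_readings(sample)]
```

On this sample the three 26-MAY-2004 readings of 10000 become 2312 and every other reading is left alone.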
Edit 1: Adding PL/SQL to make updates
DECLARE
  CURSOR c_readings IS
    SELECT m.*,
           MIN(meter_reading)
             OVER (PARTITION BY m.pump_no
                   ORDER BY m.inspection_date
                   RANGE BETWEEN NVL((SELECT MIN(n.inspection_date) - m.inspection_date
                                        FROM maintenance n
                                       WHERE n.pump_no = m.pump_no
                                         AND n.inspection_date > m.inspection_date),
                                     0) FOLLOWING
                             AND NVL((SELECT MIN(n.inspection_date) - m.inspection_date - 1
                                        FROM maintenance n
                                       WHERE n.pump_no = m.pump_no
                                         AND n.old_meter_reading IS NOT NULL
                                         AND n.inspection_date > m.inspection_date),
                                     0) FOLLOWING) AS min_reading_following
      FROM maintenance m
     ORDER BY m.inspection_date, old_meter_reading ASC NULLS LAST;
BEGIN
  FOR rec IN c_readings LOOP
    -- Only rows whose reading exceeds a later reading in the same meter period are touched.
    IF rec.meter_reading > rec.min_reading_following THEN
      UPDATE maintenance m
         SET m.meter_reading = rec.min_reading_following
       WHERE m.pump_no = rec.pump_no
         AND m.inspection_date = rec.inspection_date
         AND m.maintenance_task = rec.maintenance_task;
    END IF;
  END LOOP;
END;
/
You'll need to either COMMIT when this is done or add it to the code.
Maybe what you need to do is something like this:
update MyTable mt1
set value = (select min(value)
from MyTable2 mt2
where mt1.id = mt2.id --your relation
and value NOT IN (select min(value)
from MyTable2 mt3
where mt2.id = mt3.id))
With this update you get the second-lowest value: the NOT IN excludes the original minimum, so MIN returns the next value up.
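As a quick check of that "minimum after excluding the minimum" idea, here is a small Python sketch. The function name second_lowest is hypothetical, and the sample values are the readings from the second meter period in the question.

```python
def second_lowest(values):
    """Return the smallest value that is strictly greater than the minimum.

    Mirrors MIN(value) ... WHERE value NOT IN (SELECT MIN(value) ...):
    every copy of the minimum is excluded before taking MIN again.
    Returns None when there is no second distinct value.
    """
    if not values:
        return None
    lowest = min(values)
    rest = [v for v in values if v != lowest]
    return min(rest) if rest else None

period_readings = [0, 10000, 10000, 10000, 2312, 2889, 5000]
replacement = second_lowest(period_readings)
```

For the readings above, replacement is 2312, which matches the value the question asks for.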
I am working with a SAS table in which the dates are represented as numbers in the columns "entered" and "left". I have to count the days each member remained in the system. For example, below for id 1, the person entered on 7071 and then used a different product on 7075, although he remained continuously in the system from 7071 to 7083; that is, the dates overlap. I want to count the final duration a member stayed in the system: for id 1 it is 12 days (7071 to 7083) + 2 days (7087 to 7089) + 4 days (7095 to 7099), so the total is 18 days. (There are some duplicate entered and left values, but other columns, not shown here, differ, so these rows were not removed.) Since I'm working in SAS, the solution can be either a SAS data step or SAS SQL.
For member 2, there is no overlap of values, so the day count is 2 days (8921 to 8923) + 5 days (8935 to 8940) = 7 days. I was able to solve this case because the days didn't overlap, but for the overlapping case any suggestion, code, or advice is appreciated.
id Entered left
1 7071 7077
1 7071 7077
1 7075 7079
1 7077 7083
1 7077 7083
1 7078 7085
1 7087 7089
1 7095 7099
2 8921 8923
2 8935 8940
So the final table should be of the form
id days_in_system
1 18
2 7
This is a surprisingly tricky problem as every row has to be compared to every other row for the same id to check for overlaps and if there are multiple overlaps you have to be very careful not to double-count them.
Here's a hash-based solution - the idea is to build up a hash containing all of the individual days a member has stayed as you go along, then count the number of items in it at the end:
data have;
input id Entered left;
cards;
1 7071 7077
1 7071 7077
1 7075 7079
1 7077 7083
1 7077 7083
1 7078 7085
1 7087 7089
1 7095 7099
2 8921 8923
2 8935 8940
;
run;
data want;
  length day 8;
  if _n_ = 1 then do;
    declare hash h();
    rc = h.definekey('day');
    rc = h.definedone();
  end;
  do until(last.id);
    set have;
    by id;
    do day = entered to left - 1;
      rc = h.add();
    end;
  end;
  total_days = h.num_items;
  rc = h.clear();
  keep id total_days;
run;
This should be fairly light on memory as it only has to load the days for 1 id at a time.
The output from id 1 is 20, not 18 - here's a breakdown of the new days added row-by-row that I generated by adding a bit of debugging logic. If this is wrong, please indicate where:
_N_=1
7071 7072 7073 7074 7075 7076
_N_=2
No new days
_N_=3
7077 7078
_N_=4
7079 7080 7081 7082
_N_=5
No new days
_N_=6
7083 7084
_N_=7
7087 7088
_N_=8
7095 7096 7097 7098
_N_=1
8921 8922
_N_=2
8935 8936 8937 8938 8939
If you want to add only days for rows matching a particular condition, you can pick those using a where clause on the set statement, e.g.
set have(where = (var1 in ('value1', 'value2', ...)));
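For comparison, the same totals can be reached without storing individual days by sorting each member's intervals and merging the overlapping ones. This is a different technique from the hash approach above (a plain interval merge), sketched in Python rather than SAS. The function name days_in_system and the tuple layout are assumptions for illustration, and intervals are treated as half-open (entered up to, but not including, left), matching the "entered to left - 1" loop.

```python
from collections import defaultdict

def days_in_system(rows):
    """rows: iterable of (id, entered, left); returns {id: total days}."""
    by_id = defaultdict(list)
    for member_id, entered, left in rows:
        by_id[member_id].append((entered, left))

    totals = {}
    for member_id, spans in by_id.items():
        spans.sort()
        total = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or touching: extend the current merged span.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current span and start a new one.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
        totals[member_id] = total
    return totals

have = [
    (1, 7071, 7077), (1, 7071, 7077), (1, 7075, 7079), (1, 7077, 7083),
    (1, 7077, 7083), (1, 7078, 7085), (1, 7087, 7089), (1, 7095, 7099),
    (2, 8921, 8923), (2, 8935, 8940),
]
totals = days_in_system(have)
```

With the sample rows this gives 20 for id 1 and 7 for id 2, the same figures as the hash version, because the row 7078 to 7085 extends the first stay to 7085.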
I have a dataset where I need to pick out and keep records that have no overlapping time frames, and for those that do overlap, keep the earliest record.
I have been able to successfully pick out the records that have no overlapping time frames with the code below:
IF OBJECT_ID('tempdb..#overlaps') IS NOT NULL DROP Table #overlaps
SELECT
CASE WHEN EXISTS(SELECT 1 FROM #service r2
WHERE r2.client_ID = r1.client_ID
AND r2.service_ID <> r1.service_ID
AND r1.service_start_date <= r2.service_end_date
AND r2.service_start_date <= r1.service_end_date)
THEN 1
ELSE 0
END AS Overlap
,*
into #overlaps
FROM #services r1
This produces the below for an example client:
Overlap client_ID service_ID service_start_date service_end_date
1 12345 123 27-Oct-2009 03-Jan-2013
1 12345 124 27-Dec-2012 19-Mar-2013
1 12345 125 18-Mar-2013 04-Jun-2014
1 12345 126 29-Jun-2014 28-Apr-2017
1 12345 127 23-Jun-2014 14-Aug-2014
1 12345 128 27-Apr-2015 07-Nov-2015
1 12345 129 01-Aug-2015 01-Dec-2015
0 12345 132 01-Jul-2017 09-Dec-2017
0 12345 133 02-Jan-2018 20-Jan-2018
0 12345 134 03-May-2018 05-Jun-2018
What I want to do, where overlap = 1, is add a column to flag whether that record is the first record of an overlapping "set", first in terms of the start date. The service_ID is not actually sequential; I just replaced it with dummy data.
So in the above case, record #1 should be flagged as 1 because it has the earliest service start compared to the overlapping record #2, which started later, so record #2 would be flagged as 0, and the same for record #3 (i.e. flagged as 0). Going on, record #4 should be flagged as 1, as it overlaps the records below it.
In terms of the final product, I eventually want to show only the non-overlapping periods plus the earliest/first record of each group of records that do overlap. So in the above scenario, records #1, 4, 8, 9, 10 would remain and the rest would be removed. Each record should remain its own record though; they should not be "pivoted" up into one continuous record.
In other words, what I need to flag are the earliest record that started where there is more than one active service occurring in parallel.
EDIT:
So for example, a client has 4 services: Service A ran Jan 1 to July 31, Service B started Feb 1 and ended August 1, Service C started September 1 and ended Oct 1, and Service D started Nov 1 and ended Dec 1. Service A should be flagged as 1; Service B, which started while Service A was still active, should be flagged 0; Service C, which started without any service being active, should be flagged as 1, and the same goes for Service D.
I think the flag would be:
SELECT (CASE WHEN NOT EXISTS (SELECT 1
FROM #service r2
WHERE r2.client_ID = r1.client_ID AND
r2.service_ID <> r1.service_ID AND
r1.service_start_date <= r2.service_end_date AND
r2.service_start_date < r1.service_end_date
)
THEN 1
ELSE 0
END) AS First_Overlap;
Notes:
This doesn't actually check for an overlap. I left that out, because you can use the overlaps flag for the check, or include the exists query.
The only difference is < versus <= for the overlap check on the start date.
This might not work as you want when the period of overlap has multiple records beginning at the same time.
Also, I suspect you are trying to solve a gaps-and-islands problem. Using multiple temporary tables and the logic that you are using is unnecessary. You might want to ask another question about the entire problem you want to solve, rather than this one facet.
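The gaps-and-islands reading of the problem can be sketched in a few lines of Python: walk one client's services in start-date order, remember the latest end date seen so far, and flag a service as 1 only when it starts after that date (that is, when no earlier service is still active). The function name flag_first_in_island and the tuple layout are assumptions for illustration, and end dates are treated as inclusive, as in the question's overlap test.

```python
from datetime import date

def flag_first_in_island(services):
    """services: (service_id, start_date, end_date) for ONE client.

    Returns {service_id: 1 or 0}; 1 means no earlier-starting service
    was still active when this one started.
    """
    flags = {}
    latest_end = None
    for service_id, start, end in sorted(services, key=lambda s: (s[1], s[0])):
        starts_island = latest_end is None or start > latest_end
        flags[service_id] = 1 if starts_island else 0
        if latest_end is None or end > latest_end:
            latest_end = end
    return flags

services = [
    (123, date(2009, 10, 27), date(2013, 1, 3)),
    (124, date(2012, 12, 27), date(2013, 3, 19)),
    (125, date(2013, 3, 18), date(2014, 6, 4)),
    (126, date(2014, 6, 29), date(2017, 4, 28)),
    (127, date(2014, 6, 23), date(2014, 8, 14)),
    (128, date(2015, 4, 27), date(2015, 11, 7)),
    (129, date(2015, 8, 1), date(2015, 12, 1)),
    (132, date(2017, 7, 1), date(2017, 12, 9)),
    (133, date(2018, 1, 2), date(2018, 1, 20)),
    (134, date(2018, 5, 3), date(2018, 6, 5)),
]
first_ids = sorted(sid for sid, flag in flag_first_in_island(services).items() if flag == 1)
```

On the sample rows this flags services 123, 127, 132, 133 and 134. Note that it picks 127 rather than 126 for the 2014 group, because 127 starts six days earlier; the question names record #4 there, so whether "earliest start date" is really the intended rule is something to confirm.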
It's difficult to read your exact goal here, but if you're looking to flag based on the service_start_date when Overlap = 1, this would suffice.
;WITH CTE (Overlap, client_ID, service_ID, service_start_date) AS (
SELECT * FROM (
VALUES
('1','12345','123','10/27/2009'),
('1','12345','124','12/27/2012'),
('1','12345','125','3/18/2013'),
('1','12345','126','6/29/2014'),
('1','12345','127','6/23/2014'),
('1','12345','128','4/27/2015'),
('1','12345','129','8/1/2015'),
('0','12345','132','7/1/2017'),
('0','12345','133','1/2/2018'),
('0','12345','134','5/3/2018')
) AS A (Overlap, client_ID, service_ID, service_start_date)
)
SELECT CTE.Overlap,
CTE.client_ID,
CTE.service_ID,
CTE.service_start_date,
t2.Result
FROM CTE
LEFT JOIN (
SELECT '1' AS Result,
t2.client_ID,
MIN(t2.service_start_date) AS service_start_date
FROM CTE t2
WHERE t2.Overlap = '1'
GROUP BY client_ID
) t2 ON CTE.client_ID = t2.client_ID
AND CTE.service_start_date = t2.service_start_date
ORDER BY service_ID
This also does not account for anything other than flagging the first Overlap by service_start_date. For instance, if you wanted to flag those that aren't first as 0's, that would need to be added.
UPDATE #overlaps SET IsFirst=1
FROM
(SELECT overlap, client_id client_id, service_start_date service_start_date, service_end_date service_end_date, min(service_id) service_id
FROM #overlaps
WHERE overlap=1
group by overlap, client_id, service_start_date, service_end_date) a
where #overlaps.client_id = a.client_id and #overlaps.service_id = a.service_id
Edit
@marshymell0 - I think I'm understanding what you want. Writing this as a query is pretty tricky, so I'm using a cursor instead. The line PRINT @service_start_date_prev marks where you would update the flag column that determines whether the record is the first in the overlapping set.
DECLARE @overlap_prev int, @client_id_prev int, @service_id_prev int
DECLARE @overlap_next int, @client_id_next int, @service_id_next int
DECLARE @service_start_date_prev datetime, @service_end_date_prev datetime
DECLARE @service_start_date_next datetime, @service_end_date_next datetime
DECLARE @part_of_set int = 0
DECLARE o_cursor CURSOR
FOR SELECT overlap, client_id, service_id, service_start_date, service_end_date
    FROM #overlaps where overlap=1
    ORDER BY service_start_date
OPEN o_cursor
FETCH NEXT FROM o_cursor
INTO @overlap_next, @client_id_next, @service_id_next, @service_start_date_next, @service_end_date_next
WHILE @@FETCH_STATUS = 0
BEGIN
    IF (@service_start_date_prev IS NOT NULL)
    BEGIN
        IF (@part_of_set = 0 AND @service_start_date_prev <= @service_end_date_next AND @service_start_date_next <= @service_end_date_prev)
        BEGIN
            PRINT @service_start_date_prev
            SET @part_of_set = 1
        END
        ELSE
            SET @part_of_set = 0
    END
    SET @overlap_prev = @overlap_next
    SET @client_id_prev = @client_id_next
    SET @service_id_prev = @service_id_next
    SET @service_start_date_prev = @service_start_date_next
    SET @service_end_date_prev = @service_end_date_next
    FETCH NEXT FROM o_cursor
    INTO @overlap_next, @client_id_next, @service_id_next, @service_start_date_next, @service_end_date_next
END
CLOSE o_cursor;
DEALLOCATE o_cursor;
I have a table with 3 columns (sorted by the first two):
letter
number (sorted for each letter)
difference between current number and previous number of the same letter
I'd like to calculate (with vanilla SQL) a fourth column, RESULT, that groups these data: a new group starts whenever the third column (the difference between contiguous records, e.g. for #2, 4 = 5 - 1) is greater than 30, and all the records of a group are marked with the letter-number of its first record (e.g. A1 for #1, #2, #3).
Since the difference between contiguous numbers only makes sense for records with the same letter, the difference for the first record of a new letter is set to 31 (meaning that it starts a new group, e.g. #6).
Here is what I'd like to get as result:
# Letter Number Difference RESULT (new column)
1 A 1 1 A1
2 A 5 4 A1
3 A 7 2 A1
4 A 40 33 A40 (*)
5 A 43 3 A40
6 B 1 31 B1 (*)
7 B 25 24 B1
8 B 27 2 B1
9 B 70 43 B70 (*)
10 B 75 5 B70
Now I can only find the "breaking values" (*) with this query where they get a value of 1:
select letter
,number
,cast(difference/30 as int) break
from table
where cast(difference/30 as int) = 1
Even though I'm able to find these breaking values I can't finish my task.
Can anyone help me finding a way to obtain the column RESULT?
Thanks in advance
FF
As I understand you need to construct the last result column. You can use concat to do that:
SELECT letter
,number
,concat(letter, cast(difference/30 as int)) result
FROM table
HAVING result = 'A1'
After some work and a little help from a friend of mine, I've found a possible solution to my SQL problem.
The only requirement for the solution is that my first record must have a value of 31 in the Difference field (since I need "breaks" wherever Difference is greater than 30).
Here is the query to get the column RESULT I needed:
select alls.letter
,alls.number
,ints.letter||ints.number as result
from competition.lag alls
,(select letter
,number
,difference
,result
from (select letter
,number
,difference
,case when difference>30 then 1 else 2 end as result
from competition.lag
) temp
where result = 1
) ints
where ints.letter=alls.letter
and alls.number>=ints.number
and alls.number-30<=ints.number
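For checking the expected RESULT column against the sample data, the grouping rule can also be written as a single pass that carries the current group label forward. This is a running-label sketch in Python, not a translation of the join above. The function name label_groups and the threshold parameter are assumptions, and a change of letter is treated as a break on its own, so the first row does not need a Difference of 31 here.

```python
def label_groups(rows, threshold=30):
    """rows: (letter, number, difference), sorted by letter then number.

    A new group starts when the letter changes or the difference from the
    previous number exceeds the threshold; each row gets the letter+number
    of the first row of its group.
    """
    labels = []
    current = None
    prev_letter = None
    for letter, number, difference in rows:
        if letter != prev_letter or difference > threshold:
            current = f"{letter}{number}"
        labels.append(current)
        prev_letter = letter
    return labels

rows = [
    ("A", 1, 1), ("A", 5, 4), ("A", 7, 2), ("A", 40, 33), ("A", 43, 3),
    ("B", 1, 31), ("B", 25, 24), ("B", 27, 2), ("B", 70, 43), ("B", 75, 5),
]
result = label_groups(rows)
```

On the sample table this reproduces the RESULT column from the question: A1 three times, A40 twice, B1 three times, B70 twice.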
I have a table with a column of values with the following sample data that has been pulled for 1 user:
ID | Data
5 Record1
12 NULL
13 NULL
15 Record1
20 Record12
28 NULL
31 NULL
35 Record12
37 Record23
42 Record34
51 NULL
53 Record34
58 Record5
61 Record17
63 NULL
69 Record17
What I would like to do is delete any values in the Data column where the Data value does not have both a start and a finish record. So in the above, Record23 and Record5 would be deleted.
Please note that a given Record(n) may appear more than once, so it's not as straightforward as doing a count on the Data value. It needs to be incremental: a record should always start and finish before another one starts, and if it starts and doesn't finish then I want to remove it.
Sadly SQL Server 2008 does not have LAG or LEAD which would make the operation simpler.
You could use a common table expression to find the non-consecutive (non-null) values and delete them:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id) rn FROM table1 WHERE data IS NOT NULL
)
DELETE c1 FROM cte c1
LEFT JOIN cte c2 ON (c1.rn = c2.rn+1 OR c1.rn = c2.rn-1) AND c1.data = c2.data
WHERE c2.id IS NULL
An SQLfiddle to test with.
If you just want to see which rows would be deleted, replace DELETE c1 with SELECT c1.*.
...and as always, remember to back up before running potentially destructive SQL for random people on the Internet.
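To see which rows the neighbour test picks before touching the table, the same logic can be dry-run in Python on the sample data: drop the NULL rows, then keep only the rows whose previous or next remaining row carries the same Data value. The function name unpaired_ids is hypothetical, and None stands in for NULL.

```python
def unpaired_ids(rows):
    """rows: (id, data) pairs; data is None for NULL.

    Returns the ids whose data value has no matching neighbour once the
    NULL rows are removed, i.e. the rows the DELETE would target.
    """
    kept = sorted((row_id, data) for row_id, data in rows if data is not None)
    to_delete = []
    for i, (row_id, data) in enumerate(kept):
        prev_same = i > 0 and kept[i - 1][1] == data
        next_same = i + 1 < len(kept) and kept[i + 1][1] == data
        if not (prev_same or next_same):
            to_delete.append(row_id)
    return to_delete

table1 = [
    (5, "Record1"), (12, None), (13, None), (15, "Record1"),
    (20, "Record12"), (28, None), (31, None), (35, "Record12"),
    (37, "Record23"), (42, "Record34"), (51, None), (53, "Record34"),
    (58, "Record5"), (61, "Record17"), (63, None), (69, "Record17"),
]
doomed = unpaired_ids(table1)
```

For the sample data, doomed contains ids 37 and 58, the Record23 and Record5 rows named in the question.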