Subquery using Oracle SQL - sql

I am trying to write a query that excludes meter_nos where the same date value appears three times, i.e. a 1-3 relationship.
The query must only show meter_nos that have different dates, i.e. a 1-1 relationship.
Can anyone help? I am stuck.
Here's a sample below:
...and a.mtr_id in (select b.mtr_id
from ci_mtr_config b
where a.mtr_id=b.mtr_id
group by b.mtr_id
having count(b.mtr_id)=3)
and a.mtr_id not in (select f.eff_dttm
from ci_mtr_config f
where a.mtr_id=f.mtr_id
group by f.eff_dttm
having count(f.eff_dttm)=3)
This does not work.

Try using COUNT(*) OVER(PARTITION BY ....) to count the number of rows sharing the same meter and date. Then filter by that calculation.
CREATE TABLE CI_MTR_CONFIG
(MTR_ID INT, EFF_DTTM DATE)
;
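INSERT INTO CI_MTR_CONFIG (MTR_ID, EFF_DTTM) VALUES (303, to_date('2017-01-01','yyyy-mm-dd'));
INSERT INTO CI_MTR_CONFIG (MTR_ID, EFF_DTTM) VALUES (303, to_date('2017-01-01','yyyy-mm-dd'));
INSERT INTO CI_MTR_CONFIG (MTR_ID, EFF_DTTM) VALUES (303, to_date('2017-01-01','yyyy-mm-dd'));
INSERT INTO CI_MTR_CONFIG (MTR_ID, EFF_DTTM) VALUES (202, to_date('2017-01-01','yyyy-mm-dd'));
INSERT INTO CI_MTR_CONFIG (MTR_ID, EFF_DTTM) VALUES (202, to_date('2017-01-01','yyyy-mm-dd'));
INSERT INTO CI_MTR_CONFIG (MTR_ID, EFF_DTTM) VALUES (101, to_date('2017-01-01','yyyy-mm-dd'));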
select
*
from (
select
c.*, count(*) over(partition by MTR_ID, EFF_DTTM) as count_of
from CI_MTR_CONFIG c
) d
where count_of = 1
Only meter 101 would be returned from the sample data above.
Note: if EFF_DTTM carries a time component (finer than a day), wrap it in TRUNC() so that rows are compared by day:
count(*) over(partition by MTR_ID, TRUNC(EFF_DTTM)) as count_of
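If the goal is a list of meters rather than individual rows, a grouped variant of the same idea may be closer to what the question asks for. The sketch below is not Oracle: it runs the SQL through Python's built-in sqlite3 module purely so it can be executed anywhere, and meter 404 is a hypothetical row added to the sample data to show a meter with two different dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ci_mtr_config (mtr_id INTEGER, eff_dttm TEXT)")
conn.executemany(
    "INSERT INTO ci_mtr_config VALUES (?, ?)",
    [
        (303, "2017-01-01"), (303, "2017-01-01"), (303, "2017-01-01"),
        (202, "2017-01-01"), (202, "2017-01-01"),
        (101, "2017-01-01"),
        # Hypothetical extra meter (not in the original sample): two different dates.
        (404, "2017-01-01"), (404, "2017-02-01"),
    ],
)

# A meter qualifies when none of its dates repeats, i.e. the number of rows
# equals the number of distinct dates for that meter.
query = """
    SELECT mtr_id
    FROM ci_mtr_config
    GROUP BY mtr_id
    HAVING COUNT(*) = COUNT(DISTINCT eff_dttm)
    ORDER BY mtr_id
"""
meters_with_distinct_dates = [row[0] for row in conn.execute(query)]
print(meters_with_distinct_dates)
```

The HAVING clause should carry over to Oracle unchanged; only the table setup here is SQLite-specific.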

Related

Comparing a value of a row with the value of the previous row

I have a table in SQL Server that stores geology samples, and there is a rule that must be adhered to.
The rule is simple, a "DUP_2" sample must always come after a "DUP_1" sample (sometimes they are loaded inverted)
CREATE TABLE samples (
id INT
,name VARCHAR(5)
);
INSERT INTO samples VALUES (1, 'ASSAY');
INSERT INTO samples VALUES (2, 'DUP_1');
INSERT INTO samples VALUES (3, 'DUP_2');
INSERT INTO samples VALUES (4, 'ASSAY');
INSERT INTO samples VALUES (5, 'DUP_2');
INSERT INTO samples VALUES (6, 'DUP_1');
INSERT INTO samples VALUES (7, 'ASSAY');
id  name
1   ASSAY
2   DUP_1
3   DUP_2
4   ASSAY
5   DUP_2
6   DUP_1
7   ASSAY
In this example I would like to show all rows where name equals 'DUP_2' and the name of the preceding row (by id) is not 'DUP_1'.
In this case, it would be row 5 only.
I would very much appreciate your help.
You can use the LAG() window function, or you can use LEAD(); they are identical apart from the direction in which they look. That is, LAG(name) OVER ( ORDER BY id ) is the same as LEAD(name) OVER ( ORDER BY id DESC ).
WITH s1 ( id, name, prior_name ) AS (
SELECT id, name, LAG(name) OVER ( ORDER BY id ) AS prior_name
FROM samples
)
SELECT id, name
FROM s1
WHERE name = 'DUP_2'
AND COALESCE(prior_name, 'DUMMY') != 'DUP_1';
The reason for the COALESCE() with the DUMMY value is that the first row has no previous row, so LAG() returns NULL for it; we still want to return a DUP_2 record in that case, since it doesn't follow a DUP_1 record.
You can use lag():
select s.*
from (select s.*,
lag(name) over (order by id) as prev_name
from samples s
) s
where name = 'DUP_2' and (prev_name <> 'DUP_1' or prev_name is null)
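To see the NULL handling in action, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example can run anywhere; window functions need SQLite 3.25 or later), and the helper name find_misordered is hypothetical.

```python
import sqlite3

def find_misordered(rows):
    """Return DUP_2 rows whose previous row (by id) is not DUP_1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    query = """
        SELECT id, name
        FROM (
            SELECT id, name, LAG(name) OVER (ORDER BY id) AS prev_name
            FROM samples
        ) s
        WHERE name = 'DUP_2'
          AND (prev_name <> 'DUP_1' OR prev_name IS NULL)
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question.
question_rows = [
    (1, "ASSAY"), (2, "DUP_1"), (3, "DUP_2"), (4, "ASSAY"),
    (5, "DUP_2"), (6, "DUP_1"), (7, "ASSAY"),
]
print(find_misordered(question_rows))

# Edge case: a DUP_2 in the very first row has no predecessor, so LAG() yields
# NULL; the IS NULL branch is what keeps this row in the result.
print(find_misordered([(1, "DUP_2"), (2, "DUP_1")]))
```

Without the NULL check (or the COALESCE), the comparison against NULL would evaluate to unknown and the leading DUP_2 row would silently drop out.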

TSQL - Run date comparison for "duplicates"/false positives on initial query?

I'm pretty new to SQL and am working on pulling some data from several very large tables for analysis. The data is basically triggered events for assets on a system. The events all have a created_date (datetime) field that I care about.
I was able to put together the query below to get the data I need (YAY):
SELECT
event.efkey
,event.e_id
,event.e_key
,l.l_name
,event.created_date
,asset.a_id
,asset.asset_name
FROM event
LEFT JOIN asset
ON event.a_key = asset.a_key
LEFT JOIN l
ON event.l_key = l.l_key
WHERE event.e_key IN (350, 352, 378)
ORDER BY asset.a_id, event.created_date
However, while this gives me the data for the specific events I want, I still have another problem. Assets can trigger these events repeatedly, which can result in large numbers of "false positives" for what I'm looking at.
What I need to do is go through the result set of the query above and remove any events for an asset that occur closer than N minutes together (say 30 minutes for this example). So IF the asset_ID is the same AND the event.created_date is within 30 minutes of another event for that asset in the set THEN I want that removed. For example:
For the following records
a_id 1124 created 2016-02-01 12:30:30
a_id 1124 created 2016-02-01 12:35:31
a_id 1124 created 2016-02-01 12:40:33
a_id 1124 created 2016-02-01 12:45:42
a_id 1124 created 2016-02-02 12:30:30
a_id 1124 created 2016-02-02 13:00:30
a_id 1115 created 2016-02-01 12:30:30
I'd want to return only:
a_id 1124 created 2016-02-01 12:30:30
a_id 1124 created 2016-02-02 12:30:30
a_id 1124 created 2016-02-02 13:00:30
a_id 1115 created 2016-02-01 12:30:30
I tried referencing this and this but I can't make the concepts there work for me. I know I probably need to do a SELECT * FROM (my existing query) but I can't seem to do that without ending up with tons of "multi-part identifier can't be bound" errors (and I have no experience creating temp tables, my attempts at that have failed thus far). I also am not exactly sure how to use DATEDIFF as the date filtering function.
Any help would be greatly appreciated! If you could dumb it down for a novice (or link to explanations) that would also be helpful!
This is a trickier problem than it initially appears. The hard part is capturing the previous good row and removing the next bad rows but not allowing those bad rows to influence whether or not the next row is good. Here is what I came up with. I've tried to explain what is going on with comments in the code.
--sample data since I don't have your table structure and your original query won't work for me
create table #events
(
id int,
timestamp datetime
)
--note that I changed some of your sample data to test some different scenarios
insert into #events values( 1124, '2016-02-01 12:30:30')
insert into #events values( 1124, '2016-02-01 12:35:31')
insert into #events values( 1124, '2016-02-01 12:40:33')
insert into #events values( 1124, '2016-02-01 13:05:42')
insert into #events values( 1124, '2016-02-02 12:30:30')
insert into #events values( 1124, '2016-02-02 13:00:30')
insert into #events values( 1115, '2016-02-01 12:30:30')
--using a cte here to split the result set of your query into groups
--by id (you would want to partition by whatever criteria you use
--to determine that rows are talking about the same event)
--the row_number function gets the row number for each row within that
--id partition
--the over clause specifies how to break up the result set into groups
--(partitions) and what order to put the rows in within that group so
--that the numbering stays consistent
;with orderedEvents as
(
select id, timestamp, row_number() over (partition by id order by timestamp) as rn
from #events
--you would replace #events here with your query
)
--using a second recursive cte here to determine which rows are "good"
--and which ones are not.
, previousGoodTimestamps as
(
--this is the "seeding" part of the recursive cte where I pick the
--first rows of each group as being a desired result. Since they
--are the first in each group, I know they are good. I also assign
--their timestamp as the previous good timestamp since I know that
--this row is good.
select id, timestamp, rn, timestamp as prev_good_timestamp, 1 as is_good
from orderedEvents
where rn = 1
union all
--this is the recursive part of the cte. It takes the rows we have
--already added to this result set and joins those to the "next" rows
--(as defined by our ordering in the first cte). Then we output
--those rows and do some calculations to determine if this row is
--"good" or not. If it is "good" we set its timestamp as the
--previous good row timestamp so that rows that come after this one
--can use it to determine if they are good or not. If a row is "bad"
--we just forward along the last known good timestamp to the next row.
--
--We also determine if a row is good by checking if the last good row
--timestamp plus 30 minutes is less than or equal to the current row's
--timestamp. If it is then the row is good.
select e2.id
, e2.timestamp
, e2.rn
, last_good_timestamp.timestamp
, case
when dateadd(mi, 30, last_good_timestamp.timestamp) <= e2.timestamp then 1
else 0
end
from previousGoodTimestamps e1
inner join orderedEvents e2 on e2.id = e1.id and e2.rn = e1.rn + 1
--I used a cross apply here to calculate the last good row timestamp
--once. I could have used two identical subqueries above in the select
--and case statements, but I would rather not duplicate the code.
cross apply
(
select case
when e1.is_good = 1 then e1.timestamp --if the last row is good, just use its timestamp
else e1.prev_good_timestamp --the last row was bad, forward on what it had for the last good timestamp
end as timestamp
) last_good_timestamp
)
select *
from previousGoodTimestamps
where is_good = 1 --only take the "good" rows
Links to MSDN for some of the more complicated things here:
CTEs and Recursive CTEs
CROSS APPLY
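For readers who find the recursive CTE hard to follow, the rule it implements can be sketched procedurally. This is a plain-Python illustration of the same greedy idea, not a replacement for the T-SQL; the function name keep_spaced_events and the helper ts are hypothetical, and the data is the sample from the question.

```python
from datetime import datetime, timedelta

def keep_spaced_events(events, gap_minutes=30):
    """Keep an event only if it is at least gap_minutes after the last KEPT
    event for the same asset; dropped events never move the reference point."""
    gap = timedelta(minutes=gap_minutes)
    last_kept = {}  # asset id -> timestamp of the most recent kept event
    kept = []
    for asset_id, created in sorted(events):
        previous = last_kept.get(asset_id)
        if previous is None or created >= previous + gap:
            kept.append((asset_id, created))
            last_kept[asset_id] = created
    return kept

ts = datetime.fromisoformat

question_events = [
    (1124, ts("2016-02-01 12:30:30")),
    (1124, ts("2016-02-01 12:35:31")),
    (1124, ts("2016-02-01 12:40:33")),
    (1124, ts("2016-02-01 12:45:42")),
    (1124, ts("2016-02-02 12:30:30")),
    (1124, ts("2016-02-02 13:00:30")),
    (1115, ts("2016-02-01 12:30:30")),
]

kept = keep_spaced_events(question_events)
for asset_id, created in kept:
    print(asset_id, created)
```

The dictionary plays the role of prev_good_timestamp in the CTE: a dropped row passes the last good timestamp along untouched, which is why the comparison is always made against the last kept row rather than the immediately preceding one.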
-- Sample data.
create table #Samples ( Id Int Identity, A_Id Int, CreatedDate DateTime );
insert into #Samples ( A_Id, CreatedDate ) values
( 1124, '2016-02-01 12:30:30' ),
( 1124, '2016-02-01 12:35:31' ),
( 1124, '2016-02-01 12:40:33' ),
( 1124, '2016-02-01 12:45:42' ),
( 1124, '2016-02-02 12:30:30' ),
( 1124, '2016-02-02 13:00:30' ),
( 1125, '2016-02-01 12:30:30' );
select * from #Samples;
-- Calculate the windows of 30 minutes before and after each CreatedDate and check for conflicts with other rows.
with Ranges as (
select Id, A_Id, CreatedDate,
DateAdd( minute, -30, S.CreatedDate ) as RangeStart, DateAdd( minute, 30, S.CreatedDate ) as RangeEnd
from #Samples as S )
select Id, A_Id, CreatedDate, RangeStart, RangeEnd,
-- Check for a conflict with another row with:
-- the same A_Id value and an earlier CreatedDate that falls inside the +/-30 minute range.
case when exists ( select 42 from #Samples where A_Id = R.A_Id and CreatedDate < R.CreatedDate and R.RangeStart < CreatedDate and CreatedDate < R.RangeEnd ) then 1
else 0 end as Conflict
from Ranges as R;

Drop rows identified within moving time window

I have a dataset of hospitalisations ('spells'), 1 row per spell. I want to drop any spells recorded within a week after another (there could be multiple), the rationale being that they're likely symptomatic of the same underlying cause. Here is some play data:
create table hif_user.rzb_recurse_src (
patid integer not null,
eventdate integer not null,
type smallint not null
);
insert into hif_user.rzb_recurse_src values (1,1,1);
insert into hif_user.rzb_recurse_src values (1,3,2);
insert into hif_user.rzb_recurse_src values (1,5,2);
insert into hif_user.rzb_recurse_src values (1,9,2);
insert into hif_user.rzb_recurse_src values (1,14,2);
insert into hif_user.rzb_recurse_src values (2,1,1);
insert into hif_user.rzb_recurse_src values (2,5,1);
insert into hif_user.rzb_recurse_src values (2,19,2);
Only spells of type 2 - within a week after any other - are to be dropped. Type 1 spells are to remain.
For patient 1, dates 1 & 9 should be kept. For patient 2, all rows should remain.
The issue is with patient 1. Spell date 9 is identified for dropping as it is close to spell date 5; however, as spell date 5 is close to spell date 1, it should be dropped, therefore allowing spell date 9 to live...
So, it seems a recursive problem. However, I've not used recursive programming in SQL before and I'm struggling to really picture how to do it. Can anyone help? I should add that I'm using Teradata which has more restrictions than most with recursive SQL (only UNION ALL sets allowed I believe).
This is cursor-style logic: check one row after the other against your rules, so recursion is the easiest (maybe the only) way to solve your problem.
To get a decent performance you need a Volatile Table to facilitate this row-by-row processing:
CREATE VOLATILE TABLE vt (patid, eventdate, exac_type, rn) AS
(
SELECT r.*
,ROW_NUMBER() -- needed to facilitate the join
OVER (PARTITION BY patid ORDER BY eventdate) AS rn
FROM hif_user.rzb_recurse_src AS r
) WITH DATA ON COMMIT PRESERVE ROWS;
WITH RECURSIVE cte (patid, eventdate, exac_type, rn, startdate) AS
(
SELECT vt.*
,eventdate AS startdate
FROM vt
WHERE rn = 1 -- start with the first row
UNION ALL
SELECT vt.*
-- check if type = 1 or more than 7 days from the last eventdate
,CASE WHEN vt.eventdate > cte.startdate + 7
OR vt.exac_type = 1
THEN vt.eventdate -- new start date
ELSE cte.startdate -- keep old date
END
FROM vt JOIN cte
ON vt.patid = cte.patid
AND vt.rn = cte.rn + 1 -- proceed to next row
)
SELECT *
FROM cte
WHERE eventdate - startdate = 0 -- only new start days
order by patid, eventdate
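The same walk can be tried out without a Teradata system. The sketch below ports the idea to SQLite through Python's built-in sqlite3 module, which is an assumption made purely for runnability: SQLite has no volatile tables, so a plain CTE named numbered stands in for vt, and the table name spells is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spells (patid INTEGER, eventdate INTEGER, type INTEGER)")
conn.executemany(
    "INSERT INTO spells VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 3, 2), (1, 5, 2), (1, 9, 2), (1, 14, 2),
     (2, 1, 1), (2, 5, 1), (2, 19, 2)],
)

query = """
WITH RECURSIVE
numbered AS (
    -- number the spells per patient so the recursion can step row by row
    SELECT patid, eventdate, type,
           ROW_NUMBER() OVER (PARTITION BY patid ORDER BY eventdate) AS rn
    FROM spells
),
walk (patid, eventdate, type, rn, startdate) AS (
    -- seed: the first spell of each patient always starts a window
    SELECT patid, eventdate, type, rn, eventdate
    FROM numbered
    WHERE rn = 1
    UNION ALL
    -- step: type 1 spells, or spells more than 7 days after the current
    -- window start, open a new window; otherwise the old start is carried on
    SELECT n.patid, n.eventdate, n.type, n.rn,
           CASE WHEN n.eventdate > w.startdate + 7 OR n.type = 1
                THEN n.eventdate
                ELSE w.startdate
           END
    FROM numbered n
    JOIN walk w ON n.patid = w.patid AND n.rn = w.rn + 1
)
SELECT patid, eventdate
FROM walk
WHERE eventdate = startdate
ORDER BY patid, eventdate
"""
kept_spells = conn.execute(query).fetchall()
print(kept_spells)
```

For patient 1 the window start stays at 1 through dates 3 and 5, moves to 9, and is then carried over date 14, which is why only dates 1 and 9 survive.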
I think the key to solving this is getting the first date more than 7 days from the current date and then doing a recursive subquery:
with rrs as (
select rrs.*,
(select min(rrs2.eventdate)
from hif_user.rzb_recurse_src rrs2
where rrs2.patid = rrs.patid and
rrs2.eventdate > rrs.eventdate + 7
) as eventdate7
from hif_user.rzb_recurse_src rrs
),
recursive cte as (
select patid, min(eventdate) as eventdate, min(eventdate7) as eventdate7
from rrs
group by patid
union all
select cte.patid, cte.eventdate7, rrs.eventdate7
from cte join
rrs
on rrs.patid = cte.patid and
rrs.eventdate = cte.eventdate7
)
select cte.patid, cte.eventdate
from cte;
If you want additional columns, then join in the original table at the last step.

How to select values by date field (not as simple as it sounds)

I have a table called tblMK. The table contains a datetime field.
What I wish to do is create a query that selects the 2 latest entries (by the datetime column), gets the date difference between them, and shows only that.
How would I go about creating this expression? This doesn't necessarily need to be a query; it could be a view/function/procedure or whatever works. I have created a function called getdatediff which receives two dates and returns a string that says (x days y hours z minutes); basically that will be the calculated field. So how would I go about doing this?
Edit: I need to select the rows 2 at a time, and so on until the oldest one. There will always be an even number of rows.
Use only sql like this:
create table t1(c1 integer, dt datetime);
insert into t1 values
(1, getdate()),
(2, dateadd(day,1,getdate())),
(3, dateadd(day,2,getdate()));
with temp as (select top 2 dt
from t1
order by dt desc)
select datediff(day,min(dt),max(dt)) as diff_of_dates
from temp;
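On MySQL, use the LIMIT clause, with TIMESTAMPDIFF() for the difference (subtracting two DATETIME values directly does not give an elapsed time):
select timestampdiff(second, min(a.updated_at), max(a.updated_at)) as diff_seconds
from
( select * from mytable order by updated_at desc limit 2 ) a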
Thanks guys, I found the solution. Please ignore the additional columns; they are for my db:
; with numbered as (
Select parit, taarich, hulia, mesirakabala,
rowno = row_number() OVER (Partition by parit order by taarich)
From tblMK)
Select a.rowno-1, a.parit, a.hulia, b.taarich as taarich_kabala, a.taarich, a.mesirakabala, getdatediff(b.taarich,a.taarich) as due
From numbered a
Left join numbered b ON b.parit=a.parit
And b.rowno = a.rowno - 1
Where b.taarich is not null
Order by a.parit, a.taarich
Sorry about any mistakes I might have made, I'm on my smartphone.
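The edit to the question asks for the rows taken two at a time (1st with 2nd, 3rd with 4th, and so on), which is slightly different from comparing every row with its predecessor. A minimal sketch of that pairing follows. It runs on SQLite via Python's sqlite3 module rather than SQL Server, the four timestamps are made-up sample values, and format_gap is a hypothetical stand-in for the asker's getdatediff function.

```python
import sqlite3

def format_gap(total_minutes):
    """Render a gap in minutes as 'x days y hours z minutes'."""
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days} days {hours} hours {minutes} minutes"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblMK (taarich TEXT)")
conn.executemany(
    "INSERT INTO tblMK VALUES (?)",
    [  # hypothetical sample rows, an even number as the question guarantees
        ("2024-01-10 12:00:00",), ("2024-01-09 10:30:00",),
        ("2024-01-05 08:00:00",), ("2024-01-05 07:15:00",),
    ],
)

# Number the rows from newest to oldest; rows 1-2 share group 1, rows 3-4
# share group 2, and so on, because (rn + 1) / 2 is integer division.
pair_query = """
WITH numbered AS (
    SELECT taarich, ROW_NUMBER() OVER (ORDER BY taarich DESC) AS rn
    FROM tblMK
)
SELECT MIN(taarich) AS older,
       MAX(taarich) AS newer,
       (strftime('%s', MAX(taarich)) - strftime('%s', MIN(taarich))) / 60 AS minutes_apart
FROM numbered
GROUP BY (rn + 1) / 2
ORDER BY newer DESC
"""
pairs = [
    (older, newer, format_gap(minutes))
    for older, newer, minutes in conn.execute(pair_query)
]
for pair in pairs:
    print(pair)
```

On SQL Server the grouping expression would work the same way, with DATEDIFF (or the asker's own function) replacing the strftime arithmetic.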

Newbie: Cache event-change table with every date

I have a table of items which change status every few weeks. I want to look at an arbitrary day and figure out how many items were in each status.
For example:
tbl_ItemHistory
ItemID
StatusChangeDate
StatusID
Sample data:
1001, 1/1/2010, 1
1001, 4/5/2010, 2
1001, 6/15/2010, 4
1002, 4/1/2010, 1
1002, 6/1/2010, 3
...
So I need to figure out how many items were in each status for a given day. So on 5/1/2010, there was one item (1001) in status 2 and one item in status 1 (1002).
I want to create a cached table every night that has a row for every item and every day of the year so I can show status changes over time in a chart. I don't know much about SQL. I was thinking about using a for loop, but based on some of the creative answers I've seen on the forum, I doubt that's the right way.
I'm using SQL Server 2008R2
I looked around and I think this is similar to this question: https://stackoverflow.com/questions/11183164/show-data-change-over-time-in-a-chart but that one wasn't answered. Is there a way to do these things?
A coworker showed me a cool way to do it so I thought I would contribute it to the community:
create table #test (ItemID int, StatusChangeDate datetime, StatusId tinyint);
insert #test values
(1001, '1/1/2010', 1),
(1001, '4/5/2010', 2),
(1001, '6/15/2010', 4),
(1002, '4/2/2010', 1),
(1002, '6/1/2010', 3);
with
itzik1(N) as (
select 1 union all select 1 union all
select 1 union all select 1), --4
itzik2(N) as (select 1 from itzik1 i cross join itzik1), --16
itzik3(N) as (select 1 from itzik2 i cross join itzik2), --256
itzik4(N) as (select 1 from itzik3 i cross join itzik3), --65536 (about 179 years of days)
tally(N) as (select row_number() over (order by (select null)) from itzik4)
select ItemID, StatusChangeDate, StatusId from(
select
test.ItemID,
dates.StatusChangeDate,
test.StatusId,
row_number() over (
partition by test.ItemId, dates.StatusChangeDate
order by test.StatusChangeDate desc) as rnbr
from #test test
join (
select dateadd(dd, N - 1, --N starts at 1, so subtract 1 to include the first date itself
(select min(StatusChangeDate) from #test) --First possible date
) as StatusChangeDate
from tally) dates
on test.StatusChangeDate <= dates.StatusChangeDate
and dates.StatusChangeDate <= getdate()
) result
where rnbr = 1
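If only a single day is of interest, the per-day cache is not strictly needed: the status of each item on that day is simply its latest change on or before it. The sketch below shows that as-of lookup, which is a different and simpler technique than the tally-table expansion above. It runs on SQLite through Python's sqlite3 module rather than SQL Server 2008 R2, the table and function names are hypothetical, and the dates from the question are rewritten in ISO format so that text comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_history (item_id INTEGER, status_change_date TEXT, status_id INTEGER)"
)
conn.executemany(
    "INSERT INTO item_history VALUES (?, ?, ?)",
    [  # sample data from the question, dates in ISO format
        (1001, "2010-01-01", 1),
        (1001, "2010-04-05", 2),
        (1001, "2010-06-15", 4),
        (1002, "2010-04-01", 1),
        (1002, "2010-06-01", 3),
    ],
)

def status_counts(connection, day):
    """Return (status_id, number_of_items) pairs as of the given ISO date."""
    query = """
        SELECT h.status_id, COUNT(*) AS items
        FROM item_history h
        WHERE h.status_change_date = (
            -- latest change for this item on or before the requested day
            SELECT MAX(h2.status_change_date)
            FROM item_history h2
            WHERE h2.item_id = h.item_id
              AND h2.status_change_date <= ?
        )
        GROUP BY h.status_id
        ORDER BY h.status_id
    """
    return connection.execute(query, (day,)).fetchall()

print(status_counts(conn, "2010-05-01"))
```

For charting a whole year the cached table is still the better fit, since this lookup would have to be repeated for every day on the axis.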