Combine multiple rows of data into one (Start and End time) - sql

I'm currently receiving a notification when a device is switched on, and another when the device is switched off. These currently appear in separate rows, but I'd like to combine the on/off records of each instance into one row.
The data is entering as below:
ObjectID On/OffID Msgtime
100 1 2022-04-15 10:01:00
1472 1 2022-04-15 10:04:00
100 0 2022-04-15 11:35:00
100 1 2022-04-15 12:00:00
1472 0 2022-04-15 15:00:00
I'd like to have it showing as below:
ObjectID OnTime OffTime
100 2022-04-15 10:01:00 2022-04-15 11:35:00
1472 2022-04-15 10:04:00 2022-04-15 15:00:00
100 2022-04-15 12:00:00 -

Maybe a GROUP BY query like the one below, grouping on a row_number column:
select objectID,
min(msgTime) as OnTime,
case
when min(MsgTime) <>max(MsgTime)
then max(MsgTime) else NULL
end as OffTime
from
(
select *,
row_number() over (partition by ObjectID order by MsgTime asc)+1 as r
from T
)T
group by objectID, r/2
order by Objectid, r/2
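To see how the row_number pairing behaves, here is a minimal sketch that runs the same idea against the sample data from the question. It assumes SQLite (3.25 or newer, for window functions) driven from Python's sqlite3 module as a stand-in for the real database; the column name OnOffID and the alias pair_no are my own. Note that the approach only works if on and off messages strictly alternate for each object.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ObjectID INTEGER, OnOffID INTEGER, MsgTime TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        (100, 1, "2022-04-15 10:01:00"),
        (1472, 1, "2022-04-15 10:04:00"),
        (100, 0, "2022-04-15 11:35:00"),
        (100, 1, "2022-04-15 12:00:00"),
        (1472, 0, "2022-04-15 15:00:00"),
    ],
)

# Rows 1 and 2 of each object get pair_no 1, rows 3 and 4 get pair_no 2, etc.
# (integer division of row_number + 1 by 2).
pairs = conn.execute(
    """
    SELECT ObjectID,
           MIN(MsgTime) AS OnTime,
           CASE WHEN MIN(MsgTime) <> MAX(MsgTime) THEN MAX(MsgTime) END AS OffTime
    FROM (
        SELECT ObjectID, MsgTime,
               (ROW_NUMBER() OVER (PARTITION BY ObjectID ORDER BY MsgTime) + 1) / 2
                   AS pair_no
        FROM T
    ) AS numbered
    GROUP BY ObjectID, pair_no
    ORDER BY ObjectID, pair_no
    """
).fetchall()

for row in pairs:
    print(row)
```

The last on-message of object 100 has no partner yet, so its OffTime comes back as NULL (None in Python).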

This query returns all 'going to on state' rows and, for each one, finds the nearest 'going to off state' row if one exists (hence the LEFT JOIN):
select
ontm.ObjectID, ontm.MsgTime as OnTime, offtm.MsgTime as OffTime
from yourtable ontm
left join
yourtable offtm
on ontm.ObjectId=offtm.ObjectId
and offtm.onoffid = 0
and ontm.MsgTime <= offtm.MsgTime
and not exists (select 1
from yourtable mdle
where mdle.ObjectId=offtm.ObjectId
and mdle.MsgTime < offtm.MsgTime
and ontm.MsgTime < mdle.MsgTime
)
where ontm.onoffid = 1
Explanation:
We first select all 'going to on' rows; these are the ones we want a result row for. We then find all 'matching', i.e. future, 'going to off' records for the same ObjectID (we use LEFT JOIN so that an object that was left ON still shows up). This would match every future 'going to off' row for the object, so we need something to make sure that only the earliest one matches our ON row. We do this by requiring that, for any candidate OFF row, no other row for the same object falls between the ON row and that candidate: NOT EXISTS.
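The LEFT JOIN / NOT EXISTS query can be tried out on the question's sample data with a short sketch like the one below. It assumes SQLite through Python's sqlite3 module rather than the original database, and the ORDER BY at the end is my addition so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (ObjectID INTEGER, OnOffID INTEGER, MsgTime TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (100, 1, "2022-04-15 10:01:00"),
        (1472, 1, "2022-04-15 10:04:00"),
        (100, 0, "2022-04-15 11:35:00"),
        (100, 1, "2022-04-15 12:00:00"),
        (1472, 0, "2022-04-15 15:00:00"),
    ],
)

# Every ON row is kept; an OFF row is attached only if no other row of the
# same object lies strictly between the ON row and that OFF row.
on_off = conn.execute(
    """
    SELECT ontm.ObjectID, ontm.MsgTime AS OnTime, offtm.MsgTime AS OffTime
    FROM yourtable AS ontm
    LEFT JOIN yourtable AS offtm
           ON ontm.ObjectID = offtm.ObjectID
          AND offtm.OnOffID = 0
          AND ontm.MsgTime <= offtm.MsgTime
          AND NOT EXISTS (SELECT 1
                          FROM yourtable AS mdle
                          WHERE mdle.ObjectID = offtm.ObjectID
                            AND mdle.MsgTime < offtm.MsgTime
                            AND ontm.MsgTime < mdle.MsgTime)
    WHERE ontm.OnOffID = 1
    ORDER BY ontm.MsgTime
    """
).fetchall()

for row in on_off:
    print(row)
```

On this data the rows come back in the order shown in the question, with None as the OffTime of the device that is still on.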

Related

HIVE: Replace empty results by 0 in group by statements

I'm a new Hive user, and need to aggregate the sum of amounts for a given table. Consider the simplified example below:
SELECT day, sum(amount) FROM tableX WHERE columnA = 'RareValue' GROUP BY day;
Suppose that it's possible that there is no row entry which matches the condition in the WHERE clause for some dates. And so the query result will skip those days.
For example, this is the result I get:
date amount
2018-01-15 230
2018-01-13 210
2018-01-12 140
2018-01-11 222
But this is the desired result:
date amount
2018-01-15 230
2018-01-14 0
2018-01-13 210
2018-01-12 140
2018-01-11 222
I tried generating a sequence of dates and then using LEFT JOIN and COALESCE to fill the empty dates with zeros. However, the performance was terribly slow. What is the best approach for this?
Supposing that you are trying to exclude a whole day whenever your WHERE condition is true for it, you can do something like:
select
day,
if(max(mycondition) = 0, sum(amount), 0) as mysum from
(
select day, amount,
if(columnA = 'RareValue', 1, 0) as mycondition
FROM tableX
) t GROUP BY day;
I did not have the chance to test it :)
If I understood correctly, all the days you need are present in tableX. So I suggest first selecting all rows where columnA is not equal to 'RareValue' and then UNIONing that with your query.
SELECT day, 0 FROM tableX WHERE columnA != 'RareValue'
UNION
SELECT day,sum(amount) from tableX WHERE columnA = 'RareValue' GROUP BY day;
If the days from the first select repeat, you can add DISTINCT.
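Another option, not taken from either answer above, is conditional aggregation: move the condition out of the WHERE clause and into the SUM, so every day that has any row in tableX still yields one result row. The sketch below assumes, like the UNION answer, that each needed day has at least one row in tableX. It runs on SQLite via Python's sqlite3 purely for illustration (Hive also accepts CASE WHEN inside SUM), and the individual rows are made up to reproduce the totals in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (day TEXT, columnA TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?, ?)",
    [
        # Illustrative rows only; 2018-01-14 has no 'RareValue' row at all.
        ("2018-01-15", "RareValue", 200),
        ("2018-01-15", "RareValue", 30),
        ("2018-01-15", "Other", 5),
        ("2018-01-14", "Other", 99),
        ("2018-01-13", "RareValue", 210),
        ("2018-01-12", "RareValue", 140),
        ("2018-01-11", "RareValue", 222),
    ],
)

# Non-matching rows contribute 0 instead of being filtered out, so the
# day survives the GROUP BY with a sum of 0.
totals = conn.execute(
    """
    SELECT day,
           SUM(CASE WHEN columnA = 'RareValue' THEN amount ELSE 0 END) AS amount
    FROM tableX
    GROUP BY day
    ORDER BY day DESC
    """
).fetchall()

for row in totals:
    print(row)
```

This needs only a single pass over the table and no join against a generated date sequence, but it cannot invent days that have no rows at all.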

delete duplicate rows from table row number partition by

delete duplicates per com_id:
I have to write a generic DELETE that will remove records valid to the end of the world (vald_to = 9999-01-01)
if:
tar_id = -1
vald_from = 0001-01-01
there is another record for this com_id with vald_to = 9999-01-01
The query is:
delete from C
where (COM_ID, VALD_TO) in
(
select
COM_ID,
VALD_TO,
row_number()
over
(partition by COM_ID order by VALD_TO DESC) dup
from C
where
tar_id=-1
and
vald_from = 0001-01-01
and
dup > 1
);
The script removes all records for the com_id
You could use rowid, this delete worked for me:
delete from c
where rowid in (
select rwd
from (
select rowid rwd,
row_number() over (partition by com_id order by null) dup
from c
where tar_id = -1
and vald_from = date '0001-01-01'
and vald_to = date '9999-01-01' )
where dup > 1 );
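As a rough check of what the rowid-based delete does, here is a small sketch adapted to SQLite (run through Python's sqlite3), which also exposes a rowid. The dates are stored as text and the window is ordered by rowid instead of null; both are my adaptations, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c (com_id INTEGER, tar_id INTEGER, vald_from TEXT, vald_to TEXT)"
)
conn.executemany(
    "INSERT INTO c VALUES (?, ?, ?, ?)",
    [
        (1, -1, "0001-01-01", "9999-01-01"),  # kept (first of its group)
        (1, -1, "0001-01-01", "9999-01-01"),  # removed (dup = 2)
        (1, 5, "2017-01-01", "9999-01-01"),   # not a candidate (tar_id <> -1)
        (2, -1, "0001-01-01", "9999-01-01"),  # kept (only candidate for com_id 2)
    ],
)

# Number the candidate rows per com_id and delete all but the first one.
conn.execute(
    """
    DELETE FROM c
    WHERE rowid IN (
        SELECT rwd
        FROM (SELECT rowid AS rwd,
                     ROW_NUMBER() OVER (PARTITION BY com_id ORDER BY rowid) AS dup
              FROM c
              WHERE tar_id = -1
                AND vald_from = '0001-01-01'
                AND vald_to = '9999-01-01')
        WHERE dup > 1)
    """
)

remaining = conn.execute(
    "SELECT com_id, tar_id, vald_from, vald_to FROM c ORDER BY com_id, tar_id"
).fetchall()
print(remaining)
```

Note that this keeps one matching row per com_id even when a different record with vald_to = 9999-01-01 exists (com_id 1 here), so whether it matches the third rule in the question depends on how that rule is meant.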
I don't really understand what you consider a duplicate you want removed. So just guessing: You are considering a record a duplicate, when its range is covered by another record. E.g.:
COM_ID vald_from vald_to
123 0001-01-01 9999-01-01
123 0001-01-01 2017-01-01
The second record is superfluous, because its range is only part of the larger range of the first record. But the same would be true for:
COM_ID vald_from vald_to
123 2016-01-01 2017-01-01
123 2016-02-01 2016-03-01
Again, the second record's range is only part of the first one's.
A query to remove those superfluous records would be:
delete from c
where exists
(
select *
from c other
where other.com_id = c.com_id
and other.vald_from <= c.vald_from
and other.vald_to >= c.vald_to
and other.rowid <> c.rowid
)
and tar_id = -1;
I don't know whether you want tar_id = -1 inside the subquery, too. And maybe you still want to restrict this somehow to the dates date '0001-01-01' and date '9999-01-01'. If so, adjust the statement accordingly.

Deleting record in SQL depending on next record

I have records with columns: ID, Time_End and Attribute.
I need to delete all records,
WHERE Time_End = '1990-01-01 00:00:00.000' AND Attribute <> '9'
but only:
if the next row does not have the same attribute number
or
the next row has the same attribute number and a Time_End value of 1990-01-01 00:00:00.000
For example:
ID Time_End Attribute
---------------------------------------------
235 1990-01-01 00:00:00.000 5 /delete
236 1990-01-01 00:00:00.000 5 /delete
237 1990-01-01 00:00:00.000 5
238 2016-10-10 23:45:40.000 5
ID Time_End Attribute
---------------------------------------------
312 1990-01-01 00:00:00.000 8 /delete
313 2016-01-09 18:00:00.000 6
314 1990-01-01 00:00:00.000 4 /delete
315 1990-01-01 00:00:00.000 7
316 2016-10-10 23:45:40.000 7
Our customer has 50 database tables with thousands of records in every table (and of course more columns; I mentioned only those which have an impact on the solution). Records are sent to the database from a PLC, but sometimes (we don't know why) the PLC also sends wrong records.
So what I need is a query which finds those wrong records and deletes them. :)
Does anybody know what the SQL code should look like?
Please see my SQL below. First, we collect the ids to delete, using two window functions (LEAD) to fetch the data we need from the next row. Then, with that data computed, we apply the evaluation rules proposed by the OP. Last, we use the resulting ids to delete the affected records from the table with an IN clause.
DELETE toDeleteTable
WHERE toDeleteTable.id IN (WITH dataSet
AS (SELECT toDeleteTable.id,
toDeleteTable.time_end,
toDeleteTable.attribute,
LEAD(toDeleteTable.time_end,1,0) OVER (ORDER BY toDeleteTable.id) AS next_time_end,
LEAD(toDeleteTable.attribute,1,0) OVER (ORDER BY toDeleteTable.id) AS next_attribute
FROM toDeleteTable)
SELECT dataSet.id
FROM dataSet
WHERE dataSet.time_end = '1990-01-01 00:00:00.000'
AND dataSet.attribute <> '9'
AND ( (dataSet.next_attribute = dataSet.attribute AND dataSet.next_time_end = '1990-01-01 00:00:00.000')
OR dataSet.next_attribute <> dataSet.attribute)
)
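The LEAD-based delete can be checked against the example rows from the question with a sketch like this one. It assumes SQLite (3.25+) through Python's sqlite3 instead of the original server, puts both example tables into one table, and calls LEAD without a default value, so the very last row has NULL "next" values and is never deleted (the answer above uses a default of 0, which behaves differently for that row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toDeleteTable (id INTEGER PRIMARY KEY, time_end TEXT, attribute INTEGER)"
)
conn.executemany(
    "INSERT INTO toDeleteTable VALUES (?, ?, ?)",
    [
        (235, "1990-01-01 00:00:00.000", 5),
        (236, "1990-01-01 00:00:00.000", 5),
        (237, "1990-01-01 00:00:00.000", 5),
        (238, "2016-10-10 23:45:40.000", 5),
        (312, "1990-01-01 00:00:00.000", 8),
        (313, "2016-01-09 18:00:00.000", 6),
        (314, "1990-01-01 00:00:00.000", 4),
        (315, "1990-01-01 00:00:00.000", 7),
        (316, "2016-10-10 23:45:40.000", 7),
    ],
)

# Look one row ahead (by id), then apply the two rules from the question.
conn.execute(
    """
    DELETE FROM toDeleteTable
    WHERE id IN (
        SELECT id
        FROM (SELECT id, time_end, attribute,
                     LEAD(time_end)  OVER (ORDER BY id) AS next_time_end,
                     LEAD(attribute) OVER (ORDER BY id) AS next_attribute
              FROM toDeleteTable)
        WHERE time_end = '1990-01-01 00:00:00.000'
          AND attribute <> 9
          AND (next_attribute <> attribute
               OR (next_attribute = attribute
                   AND next_time_end = '1990-01-01 00:00:00.000')))
    """
)

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM toDeleteTable ORDER BY id")]
print(remaining_ids)
```

The rows that disappear are 235, 236, 312 and 314, which are exactly the ones marked /delete in the question.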
You can accomplish this with a simple apply join. The below should give you enough to make this work for your needs without doing anything complex:
declare @t table(ID int
,Time_End datetime
,Attribute int
);
insert into @t values(235,'1990-01-01 00:00:00.000',5),(236,'1990-01-01 00:00:00.000',5),(237,'1990-01-01 00:00:00.000',5),(238,'2016-10-10 23:45:40.000',5),(312,'1990-01-01 00:00:00.000',8),(313,'2016-01-09 18:00:00.000',6),(314,'1990-01-01 00:00:00.000',4),(315,'1990-01-01 00:00:00.000',7),(316,'2016-10-10 23:45:40.000',7);
select t.*
,tm.*
from @t t
outer apply (select top 1 tt.Time_End
,tt.Attribute
from @t tt
where t.ID < tt.ID
order by tt.ID
) tm
where t.Attribute <> tm.Attribute
or (t.Attribute = tm.Attribute
and tm.Time_End = '1990-01-01 00:00:00.000'
);
I think you can use ROW_NUMBER() like this:
;WITH t AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Time_End ORDER BY ID DESC) AS seq
FROM yourTable
WHERE Attribute <> '9'
AND Time_End = CAST('1990-01-01 00:00:00.000' as datetime)
)
DELETE FROM t
WHERE seq > 1;
Not Tested - HTH ;).

oracle sql query optimization further 1

I have written a query to select * from bdb to get only updated values in PRICE for the combination of DAY,INST in the newest ACT
I created a table like this:
CREATE TABLE bdb(
ACT NUMBER(8) NOT NULL,
INST NUMBER(8) NOT NULL,
DAY DATE NOT NULL,
PRICE VARCHAR2 (3),
CURR NUMBER (8,2),
PRIMARY KEY (ACT,INST,DAY)
);
used this to populate the table
DECLARE
t_day bdb.day%type:= '1-JAN-16';
n pls_integer;
BEGIN
<< act_loop >>
FOR i IN 1..3 LOOP --NUMBER OF ACT i
<< inst_loop >>
FOR j IN 1..1000 LOOP --NUMBER OF INST j
t_day:='3-JAN-16';
<< day_loop >>
FOR k IN 1..260 LOOP --NUMBER OF DAYS k
n:= dbms_random.value(1,3);
INSERT into bdb (ACT,INST,DAY,PRICE,CURR) values (i,j,t_day,n,10.3);
t_day:=t_day+1;
END loop day_loop;
END loop inst_loop;
END loop act_loop;
END;
/
using this query
I get only the DAY,INST,PRICE
select day,inst,price from bdb where (act=(select max(act) from bdb))
minus
select day,inst,price from bdb where act=(select max(act)-1 from bdb);
The query above is fast, but I want to get all the fields in an efficient way.
The one I came up with is a bit slow:
select
e1.*
from
(select
*
from
bdb
where
(act=(select max(act) from bdb))
)e1,
(select day,inst,price from bdb where (act=(select max(act) from bdb))
minus
select day,inst,price from bdb where act=(select max(act)-1 from bdb)) e2
where
e1.day=e2.day and e1.inst=e2.inst;
Can anyone suggest how to optimize this any further, or how to get the required output without cross joining the two tables? Help me ;)
simply I need is
ACT INST DAY PRI CURR
------------------------------------
3 890 05-MAR-16 3 10.3
3 890 06-MAR-16 2 10.3
3 890 07-MAR-16 2 10.3
3 891 05-MAR-16 2 10.3
3 891 06-MAR-16 1 10.3
3 891 07-MAR-16 2 10.3
4 890 05-MAR-16 3 10.3
4 890 06-MAR-16 2 10.3
4 890 07-MAR-16 1 10.3
4 891 05-MAR-16 2 10.3
4 891 06-MAR-16 2 10.3
4 891 07-MAR-16 1 10.3
Here, for (890,05-MAR-16) (890,06-MAR-16) (890,07-MAR-16)
(891,05-MAR-16) (891,06-MAR-16) (891,07-MAR-16) in act=3,
the prices are
3,2,2
2,1,2
but in act=4 the prices for
(890,07-MAR-16)
(891,06-MAR-16)
(891,07-MAR-16)
have changed from what they were in act=3.
The others have not changed.
Ultimately, what I need is
ACT INST DAY PRI CURR
------------------------------------
4 890 07-MAR-16 1 10.3
4 891 06-MAR-16 2 10.3
4 891 07-MAR-16 1 10.3
It looks like you're after the day, inst and price values which have a row where the act column has the maximum act value out of the whole table, but doesn't have a row where the act column is one less than the max act value.
You could try this:
SELECT day,
inst,
price
FROM (SELECT day,
inst,
price,
act,
MAX(act) OVER () max_overall_act
FROM bdb)
WHERE act IN (max_overall_act, max_overall_act -1)
GROUP BY day, inst, price
HAVING MAX(CASE WHEN act = max_overall_act THEN 1 END) = 1
AND MAX(CASE WHEN act = max_overall_act - 1 THEN 1 END) IS NULL;
First of all, the subquery finds the maximum act value across the whole table.
Then we select all rows that have an act value that is the maximum value or one less than that.
After that, we group the rows and find out which ones have an act = max act val, but don't have an act = max act val -1.
However, from what you said in your post:
I have written a query to select * from bdb to get only updated values in PRICE for the combination of DAY,INST in the newest ACT
neither the query you came up with nor the above query in my answer seems to tally with what you are after.
I think instead, you're after something like:
SELECT act,
inst,
DAY,
price,
curr,
prev_price -- if desired
FROM (SELECT act,
inst,
DAY,
price,
curr,
LEAD(price) OVER (PARTITION BY inst, DAY ORDER BY act DESC) prev_price,
row_number() OVER (PARTITION BY inst, DAY ORDER BY act DESC) rn
FROM bdb)
WHERE rn = 1
AND prev_price != price;
What this does is use the LEAD() analytic (based on the descending act order) to find the price of the row with the previous act for each inst and day, along with the rownumber.
Then to find the latest act row, we simply select the rows where the rownumber is 1 and also where the previous price doesn't match the current price. You can then display both the current and the previous price, if you want to.
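To check the LEAD / ROW_NUMBER query against the act=3 and act=4 rows from the question, here is a small sketch. It assumes SQLite (3.25+) via Python's sqlite3 rather than Oracle, and it stores the days as ISO text (2016-03-05 for 05-MAR-16) so that they sort correctly; both are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bdb (act INTEGER, inst INTEGER, day TEXT, price TEXT, curr REAL,
                      PRIMARY KEY (act, inst, day))
    """
)
sample = {
    # (act, inst): prices for 05-MAR-16, 06-MAR-16, 07-MAR-16
    (3, 890): ["3", "2", "2"],
    (3, 891): ["2", "1", "2"],
    (4, 890): ["3", "2", "1"],
    (4, 891): ["2", "2", "1"],
}
days = ["2016-03-05", "2016-03-06", "2016-03-07"]
for (act, inst), prices in sample.items():
    for day, price in zip(days, prices):
        conn.execute("INSERT INTO bdb VALUES (?, ?, ?, ?, ?)", (act, inst, day, price, 10.3))

# rn = 1 is the newest act per (inst, day); LEAD over the descending order
# fetches the price from the act just before it.
changed = conn.execute(
    """
    SELECT act, inst, day, price, prev_price
    FROM (SELECT act, inst, day, price,
                 LEAD(price) OVER (PARTITION BY inst, day ORDER BY act DESC) AS prev_price,
                 ROW_NUMBER() OVER (PARTITION BY inst, day ORDER BY act DESC) AS rn
          FROM bdb)
    WHERE rn = 1
      AND prev_price != price
    ORDER BY inst, day
    """
).fetchall()

for row in changed:
    print(row)
```

Only the three (inst, day) combinations whose price differs between act 3 and act 4 are returned. A combination that exists only in the newest act has a NULL prev_price and is filtered out by the != comparison, which may or may not be what you want.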

Get a single max date if dates are not unique

For sql 2000,
Very similar to what I asked here
Get distinct max date using SQL
But this time the dates aren't unique so for this table pc_bsprdt_tbl
pc_bsprhd_key pc_bsprdt_shpiadt pc_bsprdt_prod
21ST 99-00 2001-04-30 23:59:59.000 72608-12895
21ST 99-00 2001-04-30 23:59:59.000 72608-12910
AFCC990915 1999-09-01 00:00:00.000 72608-12115
AFCC990915 1999-09-01 00:00:00.000 CHU99-01514
AFCC990915 1999-09-01 00:00:00.000 POP99-01514
I would like returned
21ST 99-00 2001-04-30 23:59:59.000
AFCC990915 1999-09-01 00:00:00.000
Now, the pc_bsprdt_prod is unique so what I have tried is using the max for the product like this to give me uniqueness.
Select T.pc_bsprhd_key, T.pc_bsprdt_shpiadt
From pc_bsprdt_tbl As T
Join (
Select pc_bsprhd_key, Max( T1.pc_bsprdt_shpiadt ) As MaxDateTime, Max(pc_bsprdt_prod) as Product
From pc_bsprdt_tbl As T1
Group By T1.pc_bsprhd_key
) As Z
On Z.pc_bsprhd_key = T.pc_bsprhd_key
And Z.MaxDateTime = T.pc_bsprdt_shpiadt
AND Z.Product = T.pc_bsprdt_prod
It seems like it works :)
Is there a way to do it though just using the date? Maybe a top 1 in there somewhere?
SELECT pc_bsprhd_key, MAX(pc_bsprdt_shpiadt)
FROM pc_bsprdt_tbl
GROUP BY pc_bsprhd_key;
That might not be working as you think it is. That will give you the MAX(Date) and MAX(prod) which might not be on the same row. Here is an example:
CREATE TABLE #Test
(
a int,
b date,
c int,
)
INSERT INTO #Test(a, b, c)
SELECT 1, '01/01/2010', 3 UNION ALL
SELECT 1, '01/02/2010', 2 UNION ALL
SELECT 1, '01/03/2010', 1 UNION ALL
SELECT 2, '01/01/2010', 1
SELECT a, MAX(b), MAX(c) FROM #TEST
GROUP BY a
Which will return
----------- ---------- -----------
1 2010-01-03 3
2 2010-01-01 1
Notice that 1/03/2010 and 3 are not in the same row. In this situation I don't think it matters to you, but just a heads up.
As for the actual question: in SQL 2005 we would probably apply a ROW_NUMBER over the groups to get the row with the latest date for each part; however, you don't have access to this feature in 2000. If the above is giving you correct results, I'd say use it.
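For readers on a version that does have ROW_NUMBER, the idea mentioned above might look roughly like the sketch below. It is run here on SQLite (3.25+) through Python's sqlite3 rather than SQL Server, and uses the product column only as a tie-breaker when several rows share the latest date; treat it as an illustration, not a tested SQL Server query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pc_bsprdt_tbl (pc_bsprhd_key TEXT,
                                pc_bsprdt_shpiadt TEXT,
                                pc_bsprdt_prod TEXT)
    """
)
conn.executemany(
    "INSERT INTO pc_bsprdt_tbl VALUES (?, ?, ?)",
    [
        ("21ST 99-00", "2001-04-30 23:59:59.000", "72608-12895"),
        ("21ST 99-00", "2001-04-30 23:59:59.000", "72608-12910"),
        ("AFCC990915", "1999-09-01 00:00:00.000", "72608-12115"),
        ("AFCC990915", "1999-09-01 00:00:00.000", "CHU99-01514"),
        ("AFCC990915", "1999-09-01 00:00:00.000", "POP99-01514"),
    ],
)

# Rank the rows of each key by date (newest first); rn = 1 is exactly one
# row per key even when the latest date occurs several times.
latest = conn.execute(
    """
    SELECT pc_bsprhd_key, pc_bsprdt_shpiadt
    FROM (SELECT pc_bsprhd_key, pc_bsprdt_shpiadt,
                 ROW_NUMBER() OVER (PARTITION BY pc_bsprhd_key
                                    ORDER BY pc_bsprdt_shpiadt DESC,
                                             pc_bsprdt_prod DESC) AS rn
          FROM pc_bsprdt_tbl)
    WHERE rn = 1
    ORDER BY pc_bsprhd_key
    """
).fetchall()

for row in latest:
    print(row)
```

Because the whole row is picked by rank, any other column selected alongside the date is guaranteed to come from the same row, which avoids the MAX(date) / MAX(prod) mismatch described above.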