Select last record from data table for each device in devices table [duplicate] - sql

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 6 years ago.
I have a problem with the execution speed of my SQL query against a Postgres database.
I have 2 tables:
table 1: DEVICES
ID | NAME
------------------
1 | first device
2 | second device
table 2: DATA
ID | DEVICE_ID | TIME | DATA
--------------------------------------------
1 | 1 | 2016-07-14 2:00:00 | data1
2 | 1 | 2016-07-14 1:00:00 | data2
3 | 2 | 2016-07-14 4:00:00 | data3
4 | 1 | 2016-07-14 3:00:00 | data4
5 | 2 | 2016-07-14 6:00:00 | data5
6 | 2 | 2016-07-14 5:00:00 | data6
I need this SELECT to return the following result:
ID | DEVICE_ID | TIME | DATA
-------------------------------------------
4 | 1 | 2016-07-14 3:00:00 | data4
5 | 2 | 2016-07-14 6:00:00 | data5
i.e. for each device in the devices table I need to get only the one data record with the latest TIME value.
This is my sql query:
SELECT * FROM db.data d
WHERE d.time = (
    SELECT MAX(d2.time) FROM db.data d2
    WHERE d2.device_id = d.device_id);
This is the HQL equivalent:
SELECT d FROM Data d
WHERE d.time = (
    SELECT MAX(d2.time) FROM Data d2
    WHERE d2.device.id = d.device.id)
Yes, I use Hibernate ORM in my project - maybe this info will be useful to someone.
Both queries return the correct result, BUT they are too slow - about 5-10 seconds on 10k records in the data table, with only 2 devices in the devices table. It's terrible.
At first I thought the problem was Hibernate, but the native SQL query run from psql in a Linux terminal takes just as long as it does through Hibernate.
How can I optimize my query? Its complexity is far too high:
O(device_count * data_count^2)

Since you're using Postgres, you could use window functions to achieve this, like so:
select
    sq.id,
    sq.device_id,
    sq.time,
    sq.data
from (
    select
        data.*,
        row_number() over (partition by data.device_id order by data.time desc) as rnk
    from
        data
) sq
where
    sq.rnk = 1
The row_number() window function first ranks the rows in the data table on the basis of the device_id and time columns, and the outer query then picks the highest-ranked rows.
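Since SQLite 3.25+ supports the same window function, the approach can be checked end to end. A minimal sketch using Python's sqlite3 module and the sample rows from the question (times zero-padded here so text comparison sorts correctly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (id INTEGER, device_id INTEGER, time TEXT, data TEXT);
INSERT INTO data VALUES
  (1, 1, '2016-07-14 02:00:00', 'data1'),
  (2, 1, '2016-07-14 01:00:00', 'data2'),
  (3, 2, '2016-07-14 04:00:00', 'data3'),
  (4, 1, '2016-07-14 03:00:00', 'data4'),
  (5, 2, '2016-07-14 06:00:00', 'data5'),
  (6, 2, '2016-07-14 05:00:00', 'data6');
""")

rows = conn.execute("""
    SELECT sq.id, sq.device_id, sq.time, sq.data
    FROM (
        SELECT data.*,
               row_number() OVER (PARTITION BY device_id
                                  ORDER BY time DESC) AS rnk
        FROM data
    ) sq
    WHERE sq.rnk = 1          -- keep only the newest row per device
    ORDER BY sq.device_id
""").fetchall()

print(rows)
# [(4, 1, '2016-07-14 03:00:00', 'data4'), (5, 2, '2016-07-14 06:00:00', 'data5')]
```

In Postgres itself, `SELECT DISTINCT ON (device_id) * FROM data ORDER BY device_id, time DESC` is a common alternative, and an index on (device_id, time DESC) helps either form.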

Related

Get First Record of Each Group

First I would like to apologize if this is a basic question.
I have monitoring data being stored every 5 seconds.
I want to create a query that returns the first record of every 10 minutes, for example:
| Data                | Voltage (V) |
| 2020-08-14 14:00:00 | 10 |
| 2020-08-14 14:00:05 | 15 |
| 2020-08-14 14:00:00 | 12 |
| ...                 |    |
| 2020-08-14 14:10:10 | 25 |
| 2020-08-14 14:10:15 | 30 |
| 2020-08-14 14:10:20 | 23 |
The desired result is:
| Data                | Voltage (V) |
| 2020-08-14 14:00:00 | 10 |
| 2020-08-14 14:10:10 | 25 |
I'm using SQLServer database.
I read about similar solutions as post: Select first row in each GROUP BY group?
But I couldn't solve my issue.
I started with:
SELECT Data, Voltage
GROUP BY DATEADD(MINUTE,(DATEDIFF(MINUTE, 0 , Data)/10)*10,0)
ORDER BY DATA DESC
But I can't use FIRST() or TOP 1 in this query.
Anyone have ideas?
Thanks a lot!
If I understand correctly:
select t.*
from (select t.*,
             row_number() over (partition by DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, Data) / 10) * 10, 0)
                                order by Data) as seqnum
      from t
     ) t
where seqnum = 1;
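The DATEADD/DATEDIFF expression above just floors each timestamp to the start of its 10-minute window; row_number() then picks the earliest row per window. What that bucket expression computes can be sketched in plain Python with the question's sample values:

```python
from datetime import datetime

# (timestamp, voltage) pairs, as in the question's sample data
readings = [
    (datetime(2020, 8, 14, 14, 0, 0), 10),
    (datetime(2020, 8, 14, 14, 0, 5), 15),
    (datetime(2020, 8, 14, 14, 0, 0), 12),
    (datetime(2020, 8, 14, 14, 10, 10), 25),
    (datetime(2020, 8, 14, 14, 10, 15), 30),
    (datetime(2020, 8, 14, 14, 10, 20), 23),
]

def bucket(ts):
    # Equivalent of DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, Data) / 10) * 10, 0):
    # floor the timestamp to the start of its 10-minute window.
    return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)

first_per_bucket = {}
for ts, v in sorted(readings):
    # keep only the first (earliest) reading seen in each bucket
    first_per_bucket.setdefault(bucket(ts), (ts, v))

firsts = list(first_per_bucket.values())
print(firsts)
```

In SQL Server itself the row_number() query above is the way to go; this only illustrates the grouping.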

SQL query that finds dates between a range and takes values from another query & iterates range over them?

Sorry if the wording of this question is strange. I wasn't sure how to word it, but here's the context:
I'm working on an application that shows data about how often individual applications are used when users make a request to my web server. Every time the start page loads, we record a hit in a data table called WEB_TRACKING with the date of the load. So there are a lot of holes in the data - for example, an application might have been used heavily on September 1st but not at all on September 2nd. What I want to do is fill those holes with a hits value of 0. This is what I came up with:
SELECT HIT_DATA.DATE_ACCESSED, HIT_DATA.APP_ID, HIT_DATA.NAME, WORKDAYS.BENCH_DAYS, NVL(HIT_DATA.HITS, 0)
FROM (
    SELECT DISTINCT TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY') AS BENCH_DAYS
    FROM WEB_TRACKING WEB
) WORKDAYS
LEFT JOIN (
    SELECT TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY') AS DATE_ACCESSED, APP.APP_ID, APP.NAME,
           COUNT(WEB.IP_ADDRESS) AS HITS
    FROM WEB_TRACKING WEB
    INNER JOIN WEB_APP APP ON WEB.APP_ID = APP.APP_ID
    WHERE APP.IS_ENABLED = 1 AND (APP.APP_ID = 1 OR APP.APP_ID = 2)
      AND (WEB.ACCESS_TIME > TO_DATE('08/04/2018', 'MM/DD/YYYY')
      AND WEB.ACCESS_TIME < TO_DATE('09/04/2018', 'MM/DD/YYYY'))
    GROUP BY TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY'), APP.APP_ID, APP.NAME
    ORDER BY TO_CHAR(WEB.ACCESS_TIME, 'MM/DD/YYYY'), APP_ID DESC
) HIT_DATA ON HIT_DATA.DATE_ACCESSED = WORKDAYS.BENCH_DAYS
ORDER BY WORKDAYS.BENCH_DAYS
It returns all the dates in the date range and even converts null hits to 0. However, it returns null for app id and app name. Which makes sense, and I understand how to give a default value for one application. I was hoping someone could help me figure out how to do it for multiple applications.
Basically, I am getting this (in the case of using just one application):
| APP_ID | NAME | BENCH_DAYS | HITS |
| ------ | ---------- | ---------- | ---- |
| NULL | NULL | 08/04/2018 | 0 |
| 1 | test_app | 08/05/2018 | 1 |
| NULL | NULL | 08/06/2018 | 0 |
But I want this(with multiple applications):
| APP_ID | NAME | BENCH_DAYS | HITS |
| ------ | ---------- | ---------- | ---- |
| 1 | test_app | 08/04/2018 | 0 |<- these 0's are converted from null
| 1 | test_app | 08/05/2018 | 1 |
| 1 | test_app | 08/06/2018 | 0 | <- these 0's are converted from null
| 2 | prod_app | 08/04/2018 | 2 |
| 2 | prod_app | 08/05/2018 | 0 | <- these 0's are converted from null
So, to reiterate the question in this long post: how should I go about populating this query so that it fills the holes in the dates, but also repeats the application names and ids and populates that information as well?
You need a list of dates, and it should probably come from a number generator rather than a table (if that table has holes, your report will too).
Example, every date for the past 30 days:
select trunc(sysdate-30) + level as bench_days from dual connect by level <= 30
Use TRUNC instead of turning a date into a string in order to cut the time off
Now you have a list of dates, you want to add in repeating app id and name:
select * from
(select trunc(sysdate-30) + level as bench_days from dual connect by level <= 30) dat
CROSS JOIN
(select app_id, name from WEB_APP where IS_ENABLED = 1 and APP_ID in (1, 2)) app
Now you have all your dates, crossed with all your apps. 2 apps and 30 days will make a 60 row resultset via a cross join. Left join your stat data onto it, and group/count/sum/aggregate ...
select app.app_id, app.name, dat.artificialday, COALESCE(stat.ct, 0) as hits from
(select trunc(sysdate-30) + level as artificialday from dual connect by level <= 30) dat
CROSS JOIN
(select app_id, name from WEB_APP where IS_ENABLED = 1 and APP_ID in (1, 2)) app
LEFT JOIN
(SELECT app_id, trunc(access_time) accdate, count(ip_address) ct from web_tracking group by app_id, trunc(access_time)) stat
ON
stat.app_id = app.app_id AND
stat.accdate = dat.artificialday
You don't have to write the query this way or do your grouping as a subquery; I'm just presenting it this way to lead you towards thinking about your data in blocks that you build in isolation and join together later to form more comprehensive blocks.
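The same shape can be checked in SQLite, with a recursive CTE standing in for Oracle's CONNECT BY date generator (table and column names are the question's; the three days and the tracking rows below are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE web_app (app_id INTEGER, name TEXT, is_enabled INTEGER);
CREATE TABLE web_tracking (app_id INTEGER, access_time TEXT, ip_address TEXT);
INSERT INTO web_app VALUES (1, 'test_app', 1), (2, 'prod_app', 1);
INSERT INTO web_tracking VALUES
  (1, '2018-08-05 10:00:00', '10.0.0.1'),
  (2, '2018-08-04 09:00:00', '10.0.0.2'),
  (2, '2018-08-04 11:30:00', '10.0.0.3');
""")

rows = conn.execute("""
    WITH RECURSIVE dat(bench_day) AS (   -- date spine, in place of CONNECT BY
        SELECT date('2018-08-04')
        UNION ALL
        SELECT date(bench_day, '+1 day') FROM dat
        WHERE bench_day < date('2018-08-06')
    )
    SELECT app.app_id, app.name, dat.bench_day,
           COALESCE(stat.ct, 0) AS hits
    FROM dat
    CROSS JOIN (SELECT app_id, name FROM web_app WHERE is_enabled = 1) app
    LEFT JOIN (SELECT app_id, date(access_time) AS accdate,
                      COUNT(ip_address) AS ct
               FROM web_tracking
               GROUP BY app_id, date(access_time)) stat
      ON stat.app_id = app.app_id AND stat.accdate = dat.bench_day
    ORDER BY app.app_id, dat.bench_day
""").fetchall()

for r in rows:
    print(r)
```

Two apps crossed with three days give six rows, with the missing days filled by COALESCE as the question asked.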

SQL Query AVG Date Time In same Table Column

I'm trying to make a query that returns the difference in days between two dates, so I can get the average number of days in a period of time. This is the situation: I need to get the max date from status 2 and the max date from status 3 of a request, and work out how much time the user spent in that period.
So far my query gets the max and min dates and the difference between them, but they are not the max of status 2 and the max of status 3.
Query I have so far:
SELECT distinct t1.user, t1.Request,
Min(t1.Time) as MinDate,
Max(t1.Time) as MaxDate,
DATEDIFF(day, MIN(t1.Time), MAX(t1.Time))
FROM [Hst_Log] t1
where t1.Request = 146800
GROUP BY t1.Request, t1.user
ORDER BY t1.user, max(t1.Time) desc
Example table:
-------------------------------
user | Request | Status | Time
-------------------------------
User 1 | 2 | 1 | 6/1/15 3:25 PM
User 2 | 1 | 1 | 2/1/15 3:24 PM
User 2 | 3 | 1 | 2/1/15 3:24 PM
User 1 | 4 | 1 | 5/10/15 3:18 PM
User 3 | 3 | 2 | 5/4/15 2:36 PM
User 2 | 2 | 2 | 6/4/15 2:34 PM
User 3 | 2 | 3 | 6/10/15 5:51 PM
User 1 | 1 | 2 | 5/1/15 5:49 PM
User 3 | 4 | 2 | 5/16/15 2:39 PM
User 2 | 4 | 2 | 5/17/15 2:32 PM
User 2 | 3 | 2 | 4/6/15 2:22 PM
User 2 | 3 | 3 | 4/7/15 2:06 PM
-------------------------------
I would appreciate any help.
You'll need to use subqueries since the groups for the min and max times are different. One query will pull the min value where the status is 2. Another will pull the max value where the status is 3.
Something like this:
SELECT MinDt.[user], MinDt.MinTime, MaxDt.MaxTime, DATEDIFF(d, MinDt.MinTime, MaxDt.MaxTime) as TimeSpan
FROM
    (SELECT t1.[user], t1.Request,
            Min(t1.Time) as MinTime
     FROM [Hst_Log] t1
     WHERE t1.Request = 146800
       AND t1.[status] = 2
     GROUP BY t1.Request, t1.[user]) MinDt
INNER JOIN
    (SELECT t1.[user], t1.Request,
            Max(t1.Time) as MaxTime
     FROM [Hst_Log] t1
     WHERE t1.[status] = 3
     GROUP BY t1.Request, t1.[user]) MaxDt
ON MinDt.[user] = MaxDt.[user] AND MinDt.Request = MaxDt.Request
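This can be run against the question's sample rows. A sketch in SQLite via Python, with a few adaptations: the user column is renamed usr to sidestep quoting, DATEDIFF(day, ...) is emulated with julianday on the date part, and the Request = 146800 filter is dropped so the sample data matches (only User 2 / request 3 has both a status-2 and a status-3 row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hst_log (usr TEXT, request INT, status INT, time TEXT)")
conn.executemany("INSERT INTO hst_log VALUES (?,?,?,?)", [
    ('User 1', 2, 1, '2015-06-01 15:25:00'),
    ('User 2', 1, 1, '2015-02-01 15:24:00'),
    ('User 2', 3, 1, '2015-02-01 15:24:00'),
    ('User 1', 4, 1, '2015-05-10 15:18:00'),
    ('User 3', 3, 2, '2015-05-04 14:36:00'),
    ('User 2', 2, 2, '2015-06-04 14:34:00'),
    ('User 3', 2, 3, '2015-06-10 17:51:00'),
    ('User 1', 1, 2, '2015-05-01 17:49:00'),
    ('User 3', 4, 2, '2015-05-16 14:39:00'),
    ('User 2', 4, 2, '2015-05-17 14:32:00'),
    ('User 2', 3, 2, '2015-04-06 14:22:00'),
    ('User 2', 3, 3, '2015-04-07 14:06:00'),
])

rows = conn.execute("""
    SELECT mn.usr, mn.min_time, mx.max_time,
           -- day-boundary difference, like SQL Server's DATEDIFF(day, ...)
           CAST(julianday(date(mx.max_time)) - julianday(date(mn.min_time)) AS INT) AS time_span
    FROM (SELECT usr, request, MIN(time) AS min_time
          FROM hst_log WHERE status = 2
          GROUP BY request, usr) mn
    INNER JOIN (SELECT usr, request, MAX(time) AS max_time
                FROM hst_log WHERE status = 3
                GROUP BY request, usr) mx
      ON mn.usr = mx.usr AND mn.request = mx.request
""").fetchall()

print(rows)
```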
something like this?
(mysql)
SELECT t.*,MAX(t.UFecha), x.*,y.*,Min(t.UFecha) as MinDate,
Max(t.UFecha) as MaxDate,
avg(x.Expr2+y.Expr3),//?????
DATEDIFF(MIN(t.UFecha), MAX(t.UFecha)) AS Expr1
FROM `app_upgrade_hst_log` t
left join(select count(*),Request, DATEDIFF(MIN(UFecha), MAX(UFecha)) AS Expr2 FROM `app_upgrade_hst_log` where Status=1 group by Request,Status) x on t.Request= x.Request
left join(select count(*),Request, DATEDIFF(MIN(UFecha), MAX(UFecha)) AS Expr3 FROM `app_upgrade_hst_log` where Status=2) y on t.Request= y.Request
group by t.Request,t.Status
What is the SQL Server version? Maybe you could use your query as a CTE and do a follow-up SELECT where you can use the min and max date as a date period.
EDIT: Example
WITH myCTE AS
(
put your query here
)
SELECT * FROM myCTE
You can use myCTE for further joins too, pick out the needed dates, use sub-selects, whatever... AND: have a look at the OVER link, it could be helpful...
Depending on the version you could also think about using OVER
https://msdn.microsoft.com/en-us/library/ms189461.aspx

how to integrate electrical currents in sqlite

I could solve the following problem in PHP, but I wonder if it could be done in SQLite.
The simplified version looks like this: I have a simple electrical circuit. I can switch on and off a red, a green and a blue light independently. I record the timing in seconds and electrical current in ampere for each light in a table as follows:
| Lamp | On | Off | Current |
|-------|----|:---:|--------:|
| red | 2 | 14 | 3 |
| green | 5 | 8 | 8 |
| blue | 6 | 10 | 2 |
As you can see, the intervals overlap. If I want to integrate the current properly (to calculate the energy consumption), I have to transform this table into a new one that sums the electrical currents. Manually, with adapted timing, I get the following table:
| T1 | T2 | Sum(Current) | Comment |
|:--:|:--:|-------------:|:--------------:|
| 2 | 5 | 3 | red |
| 5 | 6 | 11 | red+green |
| 6 | 8 | 13 | red+green+blue |
| 8 | 10 | 5 | red+blue |
| 10 | 14 | 3 | red |
Any ideas if sqlite can do that? Perhaps by creating interim tables?
It's fairly complex, but I was able to do it with a couple of views:
create table elec (lamp char(10),on_tm int,off_tm int,current int);
insert into elec values
('red',2,14,3),
('green',5,8,8),
('blue',6,10,2);
create view all_tms as
select distinct on_tm
from elec
union
select distinct off_tm
from elec;
create view all_periods as
select t1.on_tm,
(select min(t2.on_tm)
from all_tms t2
where t2.on_tm > t1.on_tm) off_tm
from all_tms t1;
select
all_periods.on_tm,
all_periods.off_tm,
sum(case when elec.on_tm <= all_periods.on_tm
and elec.off_tm >= all_periods.off_tm
then elec.current
else 0
end) total_current,
group_concat(case when elec.on_tm <= all_periods.on_tm
and elec.off_tm >= all_periods.off_tm
then elec.lamp
end) lamps
from
all_periods,
elec
group by
all_periods.on_tm,
all_periods.off_tm
The views combine all of the start/stop times into distinct blocks as you have in your output (2-5,5-6, etc.).
The final SELECT evaluates each row from the original table against each time block. If the lamp was on (start time is before the start of the evaluation time, and stop time is after the end of the evaluation time), then its current is counted.
This assumes a sufficiently recent SQLite version; with earlier versions, you would have to replace the common table expressions with temporary views:
WITH all_times(T)
AS (SELECT "On" FROM MyTable
UNION
SELECT Off FROM MyTable),
intervals(T1, T2)
AS (SELECT T,
(SELECT min(T)
FROM all_times AS next_time
WHERE next_time.T > all_times.T) AS T2
FROM all_times
WHERE T2 IS NOT NULL)
SELECT T1,
T2,
(SELECT sum(Current)
FROM MyTable
WHERE T1 >= "On" AND T2 <= Off) AS Current_Sum,
(SELECT group_concat(lamp, '+')
FROM MyTable
WHERE T1 >= "On" AND T2 <= Off) AS Comment
FROM intervals
ORDER BY T1
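The CTE version runs as-is on any SQLite build with CTE support. A quick check via Python's sqlite3, with two small adaptations: the On/Off columns are renamed on_tm/off_tm (borrowing the first answer's names, to avoid quoting the ON keyword), and the T2 IS NOT NULL filter is moved to the outer query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE elec (lamp TEXT, on_tm INT, off_tm INT, current INT);
INSERT INTO elec VALUES ('red', 2, 14, 3), ('green', 5, 8, 8), ('blue', 6, 10, 2);
""")

rows = conn.execute("""
    WITH all_times(t) AS (
        SELECT on_tm FROM elec UNION SELECT off_tm FROM elec
    ),
    intervals(t1, t2) AS (
        -- pair each switching time with the next one
        SELECT t, (SELECT MIN(t) FROM all_times AS nxt WHERE nxt.t > all_times.t)
        FROM all_times
    )
    SELECT t1, t2,
           (SELECT SUM(current) FROM elec
            WHERE t1 >= on_tm AND t2 <= off_tm) AS current_sum,
           (SELECT group_concat(lamp, '+') FROM elec
            WHERE t1 >= on_tm AND t2 <= off_tm) AS comment
    FROM intervals
    WHERE t2 IS NOT NULL
    ORDER BY t1
""").fetchall()

for row in rows:
    print(row)
```

This reproduces the five intervals and summed currents from the question's hand-built table (the order of lamp names inside group_concat is not guaranteed by SQLite, though in practice it follows insertion order).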

Calculate time difference between rows

I currently have a database in the following format
ID | DateTime | PID | TIU
1 | 2013-11-18 00:15:00 | 1551 | 1005
2 | 2013-11-18 00:16:03 | 1551 | 1885
3 | 2013-11-18 00:16:30 | 9110 | 75527
4 | 2013-11-18 00:22:01 | 1022 | 75
5 | 2013-11-18 00:22:09 | 1019 | 1311
6 | 2013-11-18 00:23:52 | 1022 | 89
7 | 2013-11-18 00:24:19 | 1300 | 44433
8 | 2013-11-18 00:38:57 | 9445 | 2010
I have a scenario where I need to identify gaps of more than 5 minutes between processes, using the DateTime column.
An example of what I am trying to achieve is:
ID | DateTime | PID | TIU
3 | 2013-11-18 00:16:30 | 9110 | 75527
4 | 2013-11-18 00:22:01 | 1022 | 75
7 | 2013-11-18 00:24:50 | 1300 | 44433
8 | 2013-11-18 00:38:57 | 9445 | 2010
ID3 is the last row before a 6 minute 1 second gap, ID4 is the next row after it.
ID7 is the last row before a 14 minute 7 second gap, ID8 is the next record available.
I am trying to do this in SQL, but if need be I can process it in C# instead.
I have tried a number of inner joins; however, the table is over 3 million rows, so performance suffers greatly.
This is a CTE solution but, as has been indicated, this may not always perform well - because we're having to compute functions against the DateTime column, most indexes will be useless:
declare @t table (ID int not null,[DateTime] datetime not null,
                  PID int not null,TIU int not null)
insert into @t(ID,[DateTime],PID,TIU) values
(1,'2013-11-18 00:15:00',1551,1005 ),
(2,'2013-11-18 00:16:03',1551,1885 ),
(3,'2013-11-18 00:16:30',9110,75527),
(4,'2013-11-18 00:22:01',1022,75   ),
(5,'2013-11-18 00:22:09',1019,1311 ),
(6,'2013-11-18 00:23:52',1022,89   ),
(7,'2013-11-18 00:24:19',1300,44433),
(8,'2013-11-18 00:38:57',9445,2010 )
;With Islands as (
    select ID as MinID,[DateTime],ID as RecID from @t t1
    where not exists
        (select * from @t t2
         where t2.ID < t1.ID and --Or by date, if needed
               --Use 300 seconds to avoid most transition issues
               DATEDIFF(second,t2.[DateTime],t1.[DateTime]) < 300
        )
    union all
    select i.MinID,t2.[DateTime],t2.ID
    from Islands i
         inner join
         @t t2
         on
             i.RecID < t2.ID and
             DATEDIFF(second,i.[DateTime],t2.[DateTime]) < 300
), Ends as (
    select MinID,MAX(RecID) as MaxID from Islands group by MinID
)
select * from @t t
where exists(select * from Ends e where e.MinID = t.ID or e.MaxID = t.ID)
This also returns a row for ID 1, since that row has no preceding row within 5 minutes of it - but that should be easy enough to exclude in the final select, if needed.
I've assumed we can use ID as a proxy for increasing dates - that if for two rows, the ID is higher in the second row, then the DateTime will also be later.
Islands is a recursive CTE. The top half (the anchor) just selects rows which do not have any preceding row within 5 minutes of themselves. We select the ID twice for those rows and also keep the DateTime around.
In the recursive portion, we try to find a new row from the table that can be "added on" to an existing Islands row - based on this new row being no more than 5 minutes later than the current end-point of the island.
Once the recursion is complete, we then exclude the intermediate rows that the CTE produces. E.g. for the "4" island, it generated the following rows:
4,00:22:01,4
4,00:22:09,5
4,00:23:52,6
4,00:24:19,7
And all that we care about is that final row where we've identified an "island" of time from ID 4 to ID 7 - that's what the second CTE (Ends) is finding for us.
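On SQL Server 2012+ (and any other engine with window functions) the same boundary rows can also be found without recursion, by using LAG/LEAD to compare each row's timestamp with its neighbours'. A sketch of that alternative in SQLite via Python, on the question's sample rows (column names shortened):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log (id INT, dt TEXT, pid INT, tiu INT);
INSERT INTO log VALUES
  (1, '2013-11-18 00:15:00', 1551, 1005),
  (2, '2013-11-18 00:16:03', 1551, 1885),
  (3, '2013-11-18 00:16:30', 9110, 75527),
  (4, '2013-11-18 00:22:01', 1022, 75),
  (5, '2013-11-18 00:22:09', 1019, 1311),
  (6, '2013-11-18 00:23:52', 1022, 89),
  (7, '2013-11-18 00:24:19', 1300, 44433),
  (8, '2013-11-18 00:38:57', 9445, 2010);
""")

rows = conn.execute("""
    WITH gaps AS (
        SELECT id, dt, pid, tiu,
               strftime('%s', dt) - strftime('%s', LAG(dt) OVER (ORDER BY id))
                   AS secs_before,
               strftime('%s', LEAD(dt) OVER (ORDER BY id)) - strftime('%s', dt)
                   AS secs_after
        FROM log
    )
    SELECT id, dt, pid, tiu FROM gaps
    WHERE secs_before > 300 OR secs_after > 300   -- > 5 minutes on either side
    ORDER BY id
""").fetchall()

print([r[0] for r in rows])  # ids on either side of each gap
```

This is a single pass over the table, so with 3 million rows it avoids the self-join blow-up, and unlike the recursive version it does not return the first row unless it actually borders a gap.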