How to limit results by SUM - sql

I have a table of events called event. For the purpose of this question it has only one field, called date.
The following query returns the number of events happening on each date, for the next 14 dates that have events:
SELECT
DATE_FORMAT( ev.date, '%Y-%m-%d' ) as short_date,
count(*) as date_count
FROM event ev
WHERE ev.date >= NOW()
GROUP BY short_date
ORDER BY short_date ASC
LIMIT 14
The result could be as follows:
+------------+------------+
| short_date | date_count |
+------------+------------+
| 2010-03-14 | 1 |
| 2010-03-15 | 2 |
| 2010-03-16 | 9 |
| 2010-03-17 | 8 |
| 2010-03-18 | 11 |
| 2010-03-19 | 14 |
| 2010-03-20 | 13 |
| 2010-03-21 | 7 |
| 2010-03-22 | 2 |
| 2010-03-23 | 3 |
| 2010-03-24 | 3 |
| 2010-03-25 | 6 |
| 2010-03-26 | 23 |
| 2010-03-27 | 14 |
+------------+------------+
14 rows in set (0.06 sec)
Let's say I want to display these events by date, but only a maximum of 10 events at a time. How would I do this?
Somehow I need to limit this result by the SUM of the date_count field, but I do not know how.
Has anybody run into this problem before?
Any help would be appreciated. Thanks.
Edited:
The extra (and crucial) requirement, which I forgot in my original post, is that I only want whole days.
I.e., given that the limit is 10, it would only return the following rows:
+------------+------------+
| short_date | date_count |
+------------+------------+
| 2010-03-14 | 1 |
| 2010-03-15 | 2 |
| 2010-03-16 | 9 |
+------------+------------+

Use a date function to restrict the range to the next 14 days, then use LIMIT to display the first 10 rows (days):
SELECT
DATE_FORMAT( ev.date, '%Y-%m-%d' ) as short_date,
count(*) as date_count
FROM event ev
WHERE ev.date between NOW() and date_add(now(), interval 14 day)
GROUP BY short_date
ORDER BY short_date ASC
LIMIT 0,10

I think that using LIMIT 0, 10 will work for you.
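Note that LIMIT caps the number of rows (days), not the number of events. For the whole-days requirement in the edit, one reading of the expected output is: keep a day as long as the days before it add up to fewer than 10 events, so the day that crosses the limit is still shown in full. Below is a minimal sketch of that rule. It uses Python's sqlite3 module only to make the example runnable; the table name daily (standing in for the grouped result) and the choice of the first five rows are assumptions. The correlated subquery avoids window functions, so the same idea should carry over to MySQL.

```python
import sqlite3

LIMIT_TOTAL = 10  # maximum number of events to aim for (from the question)

conn = sqlite3.connect(":memory:")
# "daily" is a hypothetical table holding the grouped result from the question.
conn.execute("CREATE TABLE daily (short_date TEXT PRIMARY KEY, date_count INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [("2010-03-14", 1), ("2010-03-15", 2), ("2010-03-16", 9),
     ("2010-03-17", 8), ("2010-03-18", 11)],
)

# Keep a day only while the events on all EARLIER days total less than the limit.
query = """
SELECT d1.short_date, d1.date_count
FROM daily d1
WHERE (SELECT COALESCE(SUM(d2.date_count), 0)
       FROM daily d2
       WHERE d2.short_date < d1.short_date) < ?
ORDER BY d1.short_date
"""
whole_days = conn.execute(query, (LIMIT_TOTAL,)).fetchall()
print(whole_days)
```

If the day that crosses the limit should be dropped instead of kept, compare the running total including the current day (d2.short_date <= d1.short_date) with <= instead.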

Related

SQL Server - Counting total number of days user had active contracts

I want to count the number of days while user had active contract based on table with start and end dates for each service contract. I want to count the time of any activity, no matter if the customer had 1 or 5 contracts active at same time.
+---------+-------------+------------+------------+
| USER_ID | CONTRACT_ID | START_DATE | END_DATE |
+---------+-------------+------------+------------+
| 1 | 14 | 18.02.2021 | 18.04.2022 |
| 1 | 13 | 02.01.2019 | 02.01.2020 |
| 1 | 12 | 01.01.2018 | 01.01.2019 |
| 1 | 11 | 13.02.2017 | 13.02.2019 |
| 2 | 23 | 19.06.2021 | 18.04.2022 |
| 2 | 22 | 01.07.2019 | 01.07.2020 |
| 2 | 21 | 19.01.2019 | 19.01.2020 |
+---------+-------------+------------+------------+
As a result I want a table like this:
+---------+--------------------+
| USER_ID | DAYS_BEEING_ACTIVE |
+---------+--------------------+
| 1 | 1477 |
| 2 | 832 |
+---------+--------------------+
Where
1477 stands for 1053 (days from 13.02.2017 to 02.01.2020, during which the user always had an active contract) + 424 (days from 18.02.2021 to 18.04.2022)
832 stands for 529 (days from 19.01.2019 to 01.07.2020) + 303 (days from 19.06.2021 to 18.04.2022).
I tried some queries with joins, DATEDIFFs and CASE WHEN conditions, but nothing worked. I'll be grateful for any help.
If you don't have a Tally/Numbers table (highly recommended), you can use an ad-hoc tally/numbers table and count the distinct days covered by any contract:
Select User_ID
,Days = count(DISTINCT dateadd(DAY,N,Start_Date))
from YourTable A
Join ( Select Top 10000 N=Row_Number() Over (Order By (Select NULL))
From master..spt_values n1, master..spt_values n2
) B
On N<=DateDiff(DAY,Start_Date,End_Date)
Group By User_ID
Results
User_ID Days
1 1477
2 832
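As a cross-check of those numbers, here is a minimal sketch of a different technique: instead of expanding every contract into days with a tally table, it merges overlapping date ranges per user and adds up the length of each merged range. It is written in Python only to be runnable; the function name active_days is made up for illustration, and the rows are copied from the question.

```python
from collections import defaultdict
from datetime import date

# (user_id, start_date, end_date) rows copied from the question
contracts = [
    (1, date(2021, 2, 18), date(2022, 4, 18)),
    (1, date(2019, 1, 2), date(2020, 1, 2)),
    (1, date(2018, 1, 1), date(2019, 1, 1)),
    (1, date(2017, 2, 13), date(2019, 2, 13)),
    (2, date(2021, 6, 19), date(2022, 4, 18)),
    (2, date(2019, 7, 1), date(2020, 7, 1)),
    (2, date(2019, 1, 19), date(2020, 1, 19)),
]

def active_days(rows):
    """Return {user_id: days covered by at least one contract}."""
    by_user = defaultdict(list)
    for user_id, start, end in rows:
        by_user[user_id].append((start, end))

    totals = {}
    for user_id, spans in by_user.items():
        spans.sort()  # order by start date
        total = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps the current merged range: extend it if needed.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current range and start a new one.
                total += (cur_end - cur_start).days
                cur_start, cur_end = start, end
        total += (cur_end - cur_start).days
        totals[user_id] = total
    return totals

result = active_days(contracts)
print(result)
```

Like the DATEDIFF-based query, this counts the difference between end and start, so the start day itself is not counted.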

Select only record until timestamp from another table

I have three tables.
The first one is Device table
+----------+------+
| DeviceId | Type |
+----------+------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
+----------+------+
The second one is History table - data received by different devices.
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 1 | 31 | 15.08.2020 1:42:00 |
| 2 | 100 | 15.08.2020 1:42:01 |
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 1 | 34 | 15.08.2020 1:45:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
| 2 | 45 | 15.08.2020 1:47:00 |
+----------+-------------+--------------------+
The third one is DeviceStatusHistory table
+----------+---------+--------------------+
| DeviceId | State | TimeStamp |
+----------+---------+--------------------+
| 1 | 1(OK) | 15.08.2020 1:42:00 |
| 2 | 1(OK) | 15.08.2020 1:43:00 |
| 1 | 1(OK) | 15.08.2020 1:44:00 |
| 1 | 0(FAIL) | 15.08.2020 1:44:30 |
| 1 | 0(FAIL) | 15.08.2020 1:46:00 |
| 2 | 0(FAIL) | 15.08.2020 1:46:10 |
+----------+---------+--------------------+
I want to select the last temperature of each device, but take into account only those history records that occur before the device's first failure.
Since device 1 starts failing at 15.08.2020 1:44:30, I don't want its records after that timestamp.
The same goes for device 2.
So as a final result, I want only the data of each device up to its first FAIL status:
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
+----------+-------------+--------------------+
I can select an appropriate history only if device failed at least once:
SELECT * FROM Device D
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE D.Id = H.DeviceId
and H.DeviceTimeStamp <
(select MIN(UpdatedOn) from DeviceStatusHistory Y where [State]=0 and DeviceId=D.Id)
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY D.Id;
The problem is that if a device never fails, I don't get its history at all.
Update:
My idea is to use something like this
SELECT * FROM DeviceHardwarePart HP
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE HP.Id = H.DeviceId
and H.DeviceTimeStamp <
(select ISNULL((select MIN(UpdatedOn) from DeviceMetadataPart where [State]=0 and DeviceId=HP.Id),
cast('12/31/9999 23:59:59.997' as datetime)))
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY HP.Id;
I'm not sure whether this is a good solution.
You can use COALESCE: coalesce(min(UpdatedOn), cast('9999-12-31 23:59:59' as datetime)). This ensures you always have an upper bound for your select instead of NULL.
I would treat this as a two-part problem.
First, find the time at which each device first failed; if it has never failed, use a large placeholder value such as a timestamp in 2099.
Second, join that result with the History table and take the latest value before the failure timestamp.
For the first part there are several possible approaches. Off the top of my head, something like the following should work:
select DeviceId, coalesce(min(failed_timestamp), cast('2099-01-01 01:01:01' as datetime)) as failed_at
from (select DeviceId, case when [State] = 0 then [TimeStamp] else null end as failed_timestamp from DeviceStatusHistory) as X
group by DeviceId
This gives the earliest failure timestamp for each device, and an arbitrary large value for devices that have status rows but have never failed.
A device with no status rows at all (such as device 3 in the sample data) does not appear in this result, so the second step should use an outer join from Device with the same fallback value.
After this, the rest of the solution is straightforward.
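To make the two steps concrete, here is a minimal sketch that puts them into one query. It runs against SQLite through Python's sqlite3 module rather than SQL Server, so there is no CROSS APPLY or TOP; the column names follow the tables shown in the question, and storing timestamps as ISO-formatted text is an assumption made so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Device (DeviceId INTEGER, Type INTEGER);
CREATE TABLE History (DeviceId INTEGER, Temperature INTEGER, TimeStamp TEXT);
CREATE TABLE DeviceStatusHistory (DeviceId INTEGER, State INTEGER, TimeStamp TEXT);

INSERT INTO Device VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO History VALUES
  (1, 31,  '2020-08-15 01:42:00'),
  (2, 100, '2020-08-15 01:42:01'),
  (2, 40,  '2020-08-15 01:43:00'),
  (1, 32,  '2020-08-15 01:44:00'),
  (1, 34,  '2020-08-15 01:45:00'),
  (3, 20,  '2020-08-15 01:46:00'),
  (2, 45,  '2020-08-15 01:47:00');
INSERT INTO DeviceStatusHistory VALUES
  (1, 1, '2020-08-15 01:42:00'),
  (2, 1, '2020-08-15 01:43:00'),
  (1, 1, '2020-08-15 01:44:00'),
  (1, 0, '2020-08-15 01:44:30'),
  (1, 0, '2020-08-15 01:46:00'),
  (2, 0, '2020-08-15 01:46:10');
""")

# For each device, take the newest History row that is older than the device's
# first failure. COALESCE supplies a far-future bound when there is no failure.
query = """
SELECT d.DeviceId, h.Temperature, h.TimeStamp
FROM Device d
JOIN History h ON h.DeviceId = d.DeviceId
WHERE h.TimeStamp = (
    SELECT MAX(h2.TimeStamp)
    FROM History h2
    WHERE h2.DeviceId = d.DeviceId
      AND h2.TimeStamp < COALESCE(
            (SELECT MIN(s.TimeStamp)
             FROM DeviceStatusHistory s
             WHERE s.DeviceId = d.DeviceId AND s.State = 0),
            '9999-12-31 23:59:59')
)
ORDER BY h.TimeStamp
"""
last_good = conn.execute(query).fetchall()
print(last_good)
```

Device 3 has no status rows at all, and it is still returned because the fallback is applied per device rather than per status row.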

Grouping Data 3 Hours after the Initial Time

I need to be able to filter down a dataset to only show the first instance every 3 hours. If an instance is found, any other instances that occur up to 3 hours afterwards should be hidden.
The closest thing I've been able to find is using date_trunc to get the first instance in each hour, but I need to hide everything for exactly 3 hours after the first instance.
Example Data:
+------------------------+-------+
| Timestamp | Value |
+------------------------+-------+
| "2015-12-29 13:35:00" | 65 |
| "2015-12-29 13:40:00" | 26 |
| "2015-12-29 13:45:00" | 80 |
| "2015-12-29 13:50:00" | 10 |
| "2015-12-29 16:40:00" | 76 |
| "2015-12-29 16:45:00" | 73 |
| "2016-01-04 08:05:00" | 87 |
| "2016-01-04 08:10:00" | 90 |
| "2016-01-04 08:15:00" | 52 |
| "2016-01-04 08:20:00" | 90 |
| "2016-01-04 08:25:00" | 23 |
| "2016-01-04 08:30:00" | 96 |
| "2016-01-04 13:35:00" | 53 |
| "2016-01-04 13:40:00" | 15 |
| "2016-01-04 13:45:00" | 85 |
+------------------------+-------+
Expected Result:
+------------------------+-------+
| Timestamp | Value |
+------------------------+-------+
| "2015-12-29 13:35:00" | 65 |
| "2015-12-29 16:40:00" | 76 |
| "2016-01-04 08:05:00" | 87 |
| "2016-01-04 13:30:00" | 7 |
+------------------------+-------+
Anyone have any ideas? Thank you so much for your help.
This is tricky, because you need to keep track of the last picked record to identify the next one - so you can't just group by 3 hours intervals.
Here is one approach using a recursive cte:
with recursive cte(ts, value) as (
select ts, value
from mytable
where ts = (select min(ts) from mytable)
union all
select x.*
from (select ts from cte order by ts desc limit 1) c
cross join lateral (
select t.ts, t.value
from mytable t
where t.ts >= c.ts + interval '3' hour
order by t.ts
limit 1
) x
)
select * from cte order by ts
The idea is to start from the earliest record in the table, then iterate by picking the first available record that is at least 3 hours later (this assumes no duplicates in the timestamp column).
Note that timestamp is not a good choice for a column name, because it conflicts with a language keyword (it is a datatype). I renamed it to ts in the query.
Demo on DB Fiddle:
ts | value
:------------------ | ----:
2015-12-29 13:35:00 | 65
2015-12-29 16:40:00 | 76
2016-01-04 08:05:00 | 87
2016-01-04 13:35:00 | 53
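The iteration that the recursive query performs can also be written as a plain loop, which may make the logic easier to follow. This is a minimal sketch in Python and not a replacement for the SQL: the function name first_per_window is made up, and the rows are the example data from the question.

```python
from datetime import datetime, timedelta

def first_per_window(rows, gap=timedelta(hours=3)):
    """Keep a row only if it is at least `gap` later than the last kept row."""
    kept = []
    for ts, value in sorted(rows):
        if not kept or ts >= kept[-1][0] + gap:
            kept.append((ts, value))
    return kept

raw = [
    ("2015-12-29 13:35:00", 65), ("2015-12-29 13:40:00", 26),
    ("2015-12-29 13:45:00", 80), ("2015-12-29 13:50:00", 10),
    ("2015-12-29 16:40:00", 76), ("2015-12-29 16:45:00", 73),
    ("2016-01-04 08:05:00", 87), ("2016-01-04 08:10:00", 90),
    ("2016-01-04 08:15:00", 52), ("2016-01-04 08:20:00", 90),
    ("2016-01-04 08:25:00", 23), ("2016-01-04 08:30:00", 96),
    ("2016-01-04 13:35:00", 53), ("2016-01-04 13:40:00", 15),
    ("2016-01-04 13:45:00", 85),
]
rows = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), v) for ts, v in raw]

picked = [(ts.strftime("%Y-%m-%d %H:%M:%S"), v) for ts, v in first_per_window(rows)]
for line in picked:
    print(line)
```

Each kept row depends on the previously kept row, which is why a fixed 3-hour bucketing cannot reproduce this result.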

Where clause changing the results of my datediff column, How can I work around this?

I'm trying to obtain the time elapsed while st1=5. Here is what I currently have, which gives me the datediff time for each state change. My issue is that when I add a where st1=5 clause, the datediff shows the time between consecutive rows where the state is 5, instead of the time spent in state 5.
select timestamp,st1,st2,st3,st4,
datediff(second, timestamp, lead(timestamp)
over (order by timestamp)) as timediff
from A6K_status
Order By Timestamp DESC
+-----+-----+-----+-----+---------------------+----------+
| st1 | st2 | st3 | st4 | TimeStamp | TimeDiff |
+-----+-----+-----+-----+---------------------+----------+
| 3 | 3 | 3 | 3 | 2018-07-23 07:51:06 | |
+-----+-----+-----+-----+---------------------+----------+
| 5 | 5 | 5 | 5 | 2018-07-23 07:50:00 | 66 |
+-----+-----+-----+-----+---------------------+----------+
| 0 | 0 | 10 | 10 | 2018-07-23 07:47:19 | 161 |
+-----+-----+-----+-----+---------------------+----------+
| 5 | 5 | 5 | 5 | 2018-07-23 07:39:07 | 492 |
+-----+-----+-----+-----+---------------------+----------+
| 3 | 3 | 10 | 10 | 2018-07-23 07:37:48 | 79 |
+-----+-----+-----+-----+---------------------+----------+
| 3 | 3 | 10 | 10 | 2018-07-23 07:37:16 | 32 |
+-----+-----+-----+-----+---------------------+----------+
I am trying to sum the time during which the state of station 1 is 5. If I could just sum timediff from the table above (what I have right now) where st1=5, that would work perfectly. But adding "where st1=5" to my query gives me the time difference between rows where the state is 5.
Any help would be much appreciated. I feel very close to the result I would like to achieve. Thank you.
Edit
This is what I would like to achieve
+-----+------------+----------+
| st1 | TimeStamp | TimeDiff |
+-----+------------+----------+
| 5 | 2018-07-23 | 558 |
+-----+------------+----------+
You would use a subquery (or CTE):
select sum(timediff)
from (select timestamp, st1, st2, st3, st4,
datediff(second, timestamp, lead(timestamp) over (order by timestamp)) as timediff
from A6K_status
) s
where st1 = 5;
Assuming SQL Server, try something like this:
WITH SourceTable AS (
select timestamp,st1,st2,st3,st4,
datediff(second, timestamp, lead(timestamp)
over (order by timestamp)) as timediff
from A6K_status
)
SELECT SUM(timediff) as totaltimediff
FROM SourceTable
WHERE st1 = 5
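The reason both answers filter outside the subquery is that WHERE runs before the window function, so filtering first changes which row LEAD sees as the next one. Here is a minimal sketch of the difference, run against SQLite through Python's sqlite3 module (this assumes a SQLite build with window functions, 3.25 or later). The column is named ts instead of timestamp, only st1 is kept, and the seconds are computed with strftime('%s', ...) because SQLite has no DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A6K_status (ts TEXT, st1 INTEGER)")
conn.executemany(
    "INSERT INTO A6K_status VALUES (?, ?)",
    [("2018-07-23 07:37:16", 3), ("2018-07-23 07:37:48", 3),
     ("2018-07-23 07:39:07", 5), ("2018-07-23 07:47:19", 0),
     ("2018-07-23 07:50:00", 5), ("2018-07-23 07:51:06", 3)],
)

# Seconds from each row to the next row, in timestamp order.
LEAD_DIFF = """
SELECT st1,
       CAST(strftime('%s', LEAD(ts) OVER (ORDER BY ts)) AS INTEGER)
         - CAST(strftime('%s', ts) AS INTEGER) AS timediff
FROM A6K_status
"""

# Filter AFTER the window function: time actually spent in state 5.
time_in_state = conn.execute(
    f"SELECT SUM(timediff) FROM ({LEAD_DIFF}) AS s WHERE st1 = 5"
).fetchone()[0]

# Filter BEFORE the window function: gap between consecutive state-5 rows.
gap_between_rows = conn.execute(
    f"SELECT SUM(timediff) FROM ({LEAD_DIFF} WHERE st1 = 5) AS s"
).fetchone()[0]

print(time_in_state, gap_between_rows)
```

The first number matches the 558 seconds in the desired output (492 + 66); the second is the 653 seconds between the two state-5 rows, which is the behaviour described in the question.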

Today vs weeks ago with aggregate function

I'm working on the following Presto SQL query, which uses inline FILTER clauses to get a side-by-side comparison of the current date range against the same range one week earlier.
In my case the current date range is 2017-09-13 to 2017-09-14.
So far I'm able to get the following results, but unfortunately they are not what I want.
Any kind of help would be greatly appreciated.
SELECT
DATE_TRUNC('day',DATE_PARSE(CAST(sample.datep AS VARCHAR),'%Y%m%d')) AS date,
CAST(SUM(sample.page_views) FILTER (WHERE sample.datep BETWEEN 20170913 AND 20170914) AS DOUBLE) AS page_views,
CAST(SUM(sample.page_views) FILTER (WHERE sample.datep BETWEEN 20170906 AND 20170907) AS DOUBLE) AS page_views_weeks_ago
FROM
sample
WHERE
(
datep BETWEEN 20170906 AND 20170914
)
GROUP BY
1
ORDER BY
1 ASC
LIMIT 50
Actual result:
+------------+------------+----------------------+
| date | page_views | page_views_weeks_ago |
+------------+------------+----------------------+
| 2017-09-06 | 0 | 990,929 |
| 2017-09-07 | 0 | 913,802 |
| 2017-09-08 | 0 | 0 |
| 2017-09-09 | 0 | 0 |
| 2017-09-10 | 0 | 0 |
| 2017-09-11 | 0 | 0 |
| 2017-09-12 | 0 | 0 |
| 2017-09-13 | 1,507,715 | 0 |
| 2017-09-14 | 48,625 | 0 |
+------------+------------+----------------------+
Expected result:
+------------+------------+----------------------+
| date | page_views | page_views_weeks_ago |
+------------+------------+----------------------+
| 2017-09-13 | 1,507,715 | 990,929 |
| 2017-09-14 | 48,625 | 913,802 |
+------------+------------+----------------------+
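You can achieve this by joining the table with itself, shifted by seven days. For brevity, I assume there is a date column so that date subtraction is easy. Aggregating per day before joining keeps the sums from being multiplied when a day has more than one row.
SELECT curr.date,
curr.page_views AS page_views,
prev.page_views AS page_views_weeks_ago
FROM (SELECT date, SUM(page_views) AS page_views FROM sample GROUP BY date) curr
JOIN (SELECT date, SUM(page_views) AS page_views FROM sample GROUP BY date) prev
ON prev.date = curr.date - INTERVAL '7' DAY
WHERE curr.date BETWEEN DATE '2017-09-13' AND DATE '2017-09-14'
ORDER BY 1 ASC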
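Here is a minimal runnable sketch of the same self-join idea, using SQLite through Python's sqlite3 module instead of Presto. The column name d, the text dates, the split of some days into two rows, and the extra 2017-09-10 row are all assumptions made up for the demo; only the daily totals come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (d TEXT, page_views INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [("2017-09-06", 990000), ("2017-09-06", 929),       # two rows, total 990,929
     ("2017-09-07", 913802),
     ("2017-09-10", 5),                                  # filler row, made up
     ("2017-09-13", 1000000), ("2017-09-13", 507715),    # two rows, total 1,507,715
     ("2017-09-14", 48625)],
)

# Aggregate to one row per day first, then pair each day with the day 7 days earlier.
query = """
WITH daily AS (
    SELECT d, SUM(page_views) AS page_views
    FROM sample
    GROUP BY d
)
SELECT curr.d, curr.page_views, prev.page_views
FROM daily AS curr
JOIN daily AS prev ON prev.d = date(curr.d, '-7 days')
WHERE curr.d BETWEEN '2017-09-13' AND '2017-09-14'
ORDER BY curr.d
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An inner join drops any current day that has no data a week earlier; a LEFT JOIN would keep it with an empty comparison value.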