I have a simplified version of the table with information about the status of an instrument. I am trying to find the total time from each status. The DateAdded is a timestamp indicating the beginning of the status and the next entry would be the end of the first status and beginning of the next status.
+--------------------+-----------+-------------------------+
| InstrumentStatusId | Statename | DateAdded |
+--------------------+-----------+-------------------------+
| 737062 | alarming | 2018-03-14 00:37:51.423 |
| 737064 | running | 2018-03-14 00:38:12.410 |
| 737065 | running | 2018-03-14 00:38:21.443 |
| 737149 | alarming | 2018-03-14 01:45:03.433 |
| 737152 | error | 2018-03-14 01:45:39.443 |
| 737153 | idle | 2018-03-14 01:45:42.457 |
| 737154 | running | 2018-03-14 01:45:42.460 |
| 737155 | idle | 2018-03-14 01:45:45.490 |
| 737356 | running | 2018-03-14 04:20:21.350 |
| 737382 | idle | 2018-03-14 04:36:03.433 |
| 737383 | running | 2018-03-14 04:36:03.437 |
| 737384 | idle | 2018-03-14 04:36:06.463 |
| 737890 | running | 2018-03-14 10:13:00.313 |
| 738201 | alarming | 2018-03-14 11:10:41.120 |
| 738204 | idle | 2018-03-14 11:11:11.120 |
+--------------------+-----------+-------------------------+
I am having trouble figuring out a solution that accounts for consecutive rows with the same status and calculates the time difference between status changes. I have seen similar questions but can't find a solution that has helped me.
I have a sqlfiddle to play with the data.
This answer interprets "status" as being synonymous with statename.
I think you just need the first record for each status. To get that, use lag(), then lead() on the result, then aggregation:
select statename,
       sum(datediff(second, dateadded, next_dateadded)) as total_seconds
from (select s.*,
             lead(dateadded) over (order by dateadded) as next_dateadded
      from (select s.*,
                   lag(statename) over (order by dateadded) as prev_statename
            from instrumentstatus s
           ) s
      where prev_statename is null or prev_statename <> statename
     ) s
group by statename;
Here is a SQL Fiddle.
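If you want to sanity-check the approach outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 (SQLite 3.25+ for window functions). DATEDIFF is not available in SQLite, so the difference is computed via julianday(); the table name follows the question, but the sample rows are simplified, invented timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instrumentstatus (statename TEXT, dateadded TEXT);
INSERT INTO instrumentstatus VALUES
  ('running', '2018-03-14 00:00:00'),
  ('running', '2018-03-14 00:00:10'),  -- consecutive duplicate, collapsed
  ('idle',    '2018-03-14 00:00:30'),
  ('running', '2018-03-14 00:01:00'),
  ('idle',    '2018-03-14 00:01:30');  -- last row has no successor
""")

# Keep only the first row of each run of equal statuses (LAG), then pair each
# survivor with the next survivor's timestamp (LEAD) and sum the differences.
rows = conn.execute("""
SELECT statename,
       CAST(ROUND(SUM((julianday(next_dateadded) - julianday(dateadded)) * 86400))
            AS INTEGER) AS total_seconds
FROM (SELECT statename, dateadded,
             LEAD(dateadded) OVER (ORDER BY dateadded) AS next_dateadded
      FROM (SELECT statename, dateadded,
                   LAG(statename) OVER (ORDER BY dateadded) AS prev_statename
            FROM instrumentstatus)
      WHERE prev_statename IS NULL OR prev_statename <> statename)
GROUP BY statename
""").fetchall()
print(dict(rows))  # running: 30 + 30 = 60 seconds, idle: 30 (last row pairs with NULL)
```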
Related
Using MS Access SQL, I have a query (actually a UNION of multiple queries) and need a cumulative sum (actually a statement of account whose items are in chronological order).
How do I get a cumulative sum?
Since there are duplicates by date, I have to add a new ID; however, SQL in MS Access does not seem to have ROW_ID or anything similar.
So we need to sort donation data into chronological order across multiple tables that contain duplicate dates. First, combine all the donor tables in one UNION query, which keeps the syntax simple. Then, to put things in order, we need a tie-breaking rule for the duplicate dates. The dataset has two natural tie-breakers: the donor and the amount. For instance, we could decide that, within a date, bigger donations come first. If the rule is complicated enough, we can abstract it into a public function in a code module and include it in the query so that we can sort by it:
'Sorted Donations:'
SELECT (BestDonator(q.donator)) as BestDonator, *
FROM tblCountries as q
UNION SELECT (BestDonator(j.donator)) as BestDonator, *
FROM tblIndividuals as j
ORDER BY EvDate Asc, Amount DESC, BestDonator DESC;
Public Function BestDonator(donator As String) As Long
BestDonator = Len(donator) 'longer names are better :)'
End Function
With Sorted Donations we have settled on an order for the duplicate dates and have combined both individual and country donations, so now we can calculate the running sum directly using either DSum or a subquery. There is no need to calculate a row ID. The tricky part is getting the syntax right. I ended up abstracting the running-sum calculation into a function and omitting BestDonator from the criteria, because I couldn't easily paste this query together in the query designer and ran out of time to bug-fix:
Public Function RunningSum(EvDate As Date, Amount As Currency) As Currency
    RunningSum = DSum("Amount", "[Sorted Donations]", _
        "(EvDate < #" & [EvDate] & "#) OR (EvDate = #" & [EvDate] & "# AND Amount >= " & [Amount] & ")")
End Function
Carefully note the OR in the Dsum part of the RunningSum calculation. This is the tricky part to summing the right amounts.
'output
-------------------------------------------------------------------------------------
| donator | EvDate | Amount | RunningSum |
-------------------------------------------------------------------------------------
| Reiny | 1/10/2020 | 321 | 321 |
-------------------------------------------------------------------------------------
| Czechia | 3/1/2020 | 7455 | 7776 |
-------------------------------------------------------------------------------------
| Germany | 3/18/2020 | 4222 | 11998 |
-------------------------------------------------------------------------------------
| Jim | 3/18/2020 | 222 | 12220 |
-------------------------------------------------------------------------------------
| Australien | 4/15/2020 | 13423 | 25643 |
-------------------------------------------------------------------------------------
| Mike | 5/31/2020 | 345 | 25988 |
-------------------------------------------------------------------------------------
| Portugal | 6/6/2020 | 8755 | 34743 |
-------------------------------------------------------------------------------------
| Slovakia | 8/31/2020 | 3455 | 38198 |
-------------------------------------------------------------------------------------
| Steve | 9/6/2020 | 875 | 39073 |
-------------------------------------------------------------------------------------
| Japan | 10/10/2020 | 5234 | 44307 |
-------------------------------------------------------------------------------------
| John | 10/11/2020 | 465 | 44772 |
-------------------------------------------------------------------------------------
| Slowenia | 11/11/2020 | 4665 | 49437 |
-------------------------------------------------------------------------------------
| Spain | 11/22/2020 | 7677 | 57114 |
-------------------------------------------------------------------------------------
| Austria | 11/22/2020 | 3221 | 60335 |
-------------------------------------------------------------------------------------
| Bill | 11/22/2020 | 767 | 61102 |
-------------------------------------------------------------------------------------
| Bert | 12/1/2020 | 755 | 61857 |
-------------------------------------------------------------------------------------
| Hungaria | 12/24/2020 | 9996 | 71853 |
-------------------------------------------------------------------------------------
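The effect of that OR criterion can be sketched in plain Python: everything strictly earlier is counted, plus same-date rows whose amount is at least this row's amount (bigger donations first). The list below reuses a few rows from the output above.

```python
from datetime import date

# (donator, EvDate, Amount) -- a few rows from the output above
donations = [
    ("Reiny",   date(2020, 1, 10),  321),
    ("Czechia", date(2020, 3, 1),  7455),
    ("Germany", date(2020, 3, 18), 4222),
    ("Jim",     date(2020, 3, 18),  222),
]

def running_sum(ev_date, amount):
    # Mirrors the DSum criteria: (EvDate < this date) OR
    # (EvDate = this date AND Amount >= this amount)
    return sum(a for (_, d, a) in donations
               if d < ev_date or (d == ev_date and a >= amount))

print(running_sum(date(2020, 3, 18), 4222))  # Germany: 321 + 7455 + 4222 = 11998
print(running_sum(date(2020, 3, 18), 222))   # Jim: 11998 + 222 = 12220
```

Note how the `>=` on the same date is what keeps the two 3/18/2020 rows from double-counting or skipping each other.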
I have to find the next higher sequence number that meets certain WHERE conditions:
TABLE:
+-------+------------------+---------------------------+
| Seq | Start_Time | Queue |
+-------+------------------+---------------------------+
| 34962 | 28.07.2020 17:06 | PQ_NEW PRICE REQUEST GMDM |
| 35393 | 29.07.2020 11:03 | |
| 35394 | 29.07.2020 11:03 | |
| 42886 | 04.09.2020 14:16 | PQ_NEW PRICE REQUEST GMDM |
| 42887 | 04.09.2020 14:16 | PQ_NEW PRICE REQUEST GMDM |
| 42888 | 04.09.2020 14:16 | |
| 42889 | 04.09.2020 14:16 | |
| 42890 | 04.09.2020 14:17 | PQ_COST SWEDEN |
| 42891 | 04.09.2020 14:17 | PQ_COST SWEDEN |
| 42892 | 04.09.2020 14:17 | |
| 42893 | 04.09.2020 14:17 | |
| 42894 | 04.09.2020 14:17 | PQ_NEW PRICE REQUEST GMDM |
| 42895 | 04.09.2020 14:17 | PQ_NEW PRICE REQUEST GMDM |
+-------+------------------+---------------------------+
Example select:
SELECT
start_time
FROM table
WHERE
queue <> 'PQ_NEW PRICE REQUEST GMDM'
AND seq **IS NEXT HIGHER SEQ-VALUE COMPARED TO** (SELECT seq
FROM table
WHERE
queue = 'PQ_NEW PRICE REQUEST GMDM'
AND seq = MIN(seq))
Expected result from table for NEXT HIGHER SEQ-VALUE COMPARED TO:
42890
This would be the next higher sequence number that meets the condition in the main query, counting upward from the minimum sequence number returned by the sub-select (34962).
How can I find exactly the next higher sequence-number under certain where-conditions?
Is there an Oracle SQL construct for this? By the way, ORDER BY is not an option in the scenario where I need it.
SELECT t.*,
       (SELECT MIN(t2.seq)
        FROM table t2
        WHERE t2.seq > t.seq
          AND t2.queue = 'PQ_NEW PRICE REQUEST GMDM') AS next_seq
FROM table t
WHERE t.queue <> 'PQ_NEW PRICE REQUEST GMDM';
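To check the logic against the sample data, here is a quick sketch with Python's sqlite3. The blank Queue cells are stored as NULL, which is why a plain `queue <> 'PQ_NEW PRICE REQUEST GMDM'` would silently skip them; the explicit `IS NOT NULL` makes that visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (seq INTEGER, queue TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (34962, 'PQ_NEW PRICE REQUEST GMDM'),
    (35393, None), (35394, None),
    (42886, 'PQ_NEW PRICE REQUEST GMDM'),
    (42887, 'PQ_NEW PRICE REQUEST GMDM'),
    (42888, None), (42889, None),
    (42890, 'PQ_COST SWEDEN'),
    (42891, 'PQ_COST SWEDEN'),
    (42892, None), (42893, None),
    (42894, 'PQ_NEW PRICE REQUEST GMDM'),
    (42895, 'PQ_NEW PRICE REQUEST GMDM'),
])

# Smallest seq above the minimum GMDM seq whose queue is set and differs.
(next_seq,) = conn.execute("""
SELECT MIN(seq) FROM t
WHERE queue IS NOT NULL
  AND queue <> 'PQ_NEW PRICE REQUEST GMDM'
  AND seq > (SELECT MIN(seq) FROM t
             WHERE queue = 'PQ_NEW PRICE REQUEST GMDM')
""").fetchone()
print(next_seq)  # 42890, matching the expected result
```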
Alternatively, try a ranking function after calculating the time differences.
I want to return and operate on time values based on their related event values, but only if a specific sequence of events occurs. A simplified example table below:
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+-------------+-------+
| id | event1 | time1 | event2 | time2 | event3 | time3 | event4 | time4 | event5 | time5 |
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+-------------+-------+
| abc123 | firstevent | 10:00 | secondevent | 10:01 | thirdevent | 10:02 | fourthevent | 10:03 | fifthevent | 10:04 |
| abc123 | thirdevent | 10:10 | secondevent | 10:11 | thirdevent | 10:12 | firstevent | 10:13 | secondevent | 10:14 |
| def456 | thirdevent | 10:20 | firstevent | 10:21 | secondevent | 10:22 | thirdevent | 10:24 | fifthevent | 10:25 |
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+-------------+-------+
For this table we want to retrieve the times whenever this particular sequence of events occurs: firstevent, secondevent, thirdevent, followed by a final event of any non-zero value. The relevant entries returned would be the following:
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+------------+-------+
| id | event1 | time1 | event2 | time2 | event3 | time3 | event4 | time4 | event5 | time5 |
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+------------+-------+
| abc123 | firstevent | 10:00 | secondevent | 10:01 | thirdevent | 10:02 | fourthevent | 10:03 | null | null |
| null | null | null | null | null | null | null | null | null | null | null |
| def456 | null | null | firstevent | 10:21 | secondevent | 10:22 | thirdevent | 10:24 | fifthevent | 10:25 |
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+------------+-------+
As shown above, the column in which the sequence starts is irrelevant: the two matches begin in the event1 and event2 columns respectively, so the solution should be position-independent and support n columns. These values can then be aggregated by the final non-zero event that occurs after the 3 fixed search values, to give something like the following:
+-------------+-------------------------------+
| FinalEvent | AverageTimeBetweenFinalEvents |
+-------------+-------------------------------+
| fourthevent | 1:00 |
| fifthevent | 2:00 |
+-------------+-------------------------------+
Below is for BigQuery Standard SQL
#standardSQL
WITH search_events AS (
SELECT ['firstevent', 'secondevent', 'thirdevent'] search
), temp AS (
SELECT *, REGEXP_EXTRACT(events, CONCAT(search, r',(\w*)')) FinalEvent
FROM (
SELECT id, [time1, time2, time3, time4, time5] times,
(SELECT STRING_AGG(event) FROM UNNEST([event1, event2, event3, event4, event5]) event) events,
(SELECT STRING_AGG(search) FROM UNNEST(search) search) search
FROM `project.dataset.table`, search_events
)
)
SELECT FinalEvent,
times[SAFE_OFFSET(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(REGEXP_EXTRACT(events, CONCAT(r'(.*?)', search, ',', FinalEvent )), ',')) + 3)] time
FROM temp
WHERE IFNULL(FinalEvent, '') != ''
Applied to the sample data from your question, the result is:
Row FinalEvent time
1 fourthevent 10:03
2 fifthevent 10:25
So, as you can see, all final events are extracted along with their respective times.
Now you can do whatever analytics you need here. I was not sure about the logic behind AverageTimeBetweenFinalEvents, so I am leaving that to you, especially since I think the main focus of the question was extracting those final events.
would you be able to provide the logic behind this statement please?
times[SAFE_OFFSET(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(REGEXP_EXTRACT(events, CONCAT(r'(.*?)', search, ',', FinalEvent )), ',')) + 3)] time
Sure, I hope the breakdown below helps explain the logic behind that expression:
1. assemble the regular expression that extracts the list of events that happened before the matched events
2. extract those events
3. extract all the commas in that prefix into an array
4. calculate the position of the final event by taking the number of commas in the above array + 3 (three reflects the number of positions in the search sequence)
5. extract the respective time as an element of the times array
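Those steps can be traced in plain Python with the `re` module; the row below is the def456 example from the question, flattened to a comma-joined string the way STRING_AGG does it.

```python
import re

# def456 row from the question: parallel event and time arrays
events = ['thirdevent', 'firstevent', 'secondevent', 'thirdevent', 'fifthevent']
times  = ['10:20', '10:21', '10:22', '10:24', '10:25']
search = ['firstevent', 'secondevent', 'thirdevent']

s = ','.join(events)          # flattened events, as STRING_AGG produces
seq = ','.join(search)        # the fixed search sequence

# steps 1-2: the event that immediately follows the matched search sequence
final_event = re.search(seq + r',(\w*)', s).group(1)

# steps 3-4: each event before the match contributes one comma to the prefix,
# so commas-in-prefix + 3 (length of the search sequence) is the final index
prefix = re.search(r'(.*?)' + seq + ',' + final_event, s).group(1)
offset = prefix.count(',') + 3

# step 5: index into the parallel times array
final_time = times[offset]
print(final_event, final_time)  # fifthevent 10:25
```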
This is a bit of a complicated question to ask, but I am sure someone here will know the answer in about 2 minutes and I'll feel stupid.
What I have is a table of routes, delivery names, and delivery times. Let's say it looks like this:
+------------+---------------+-------+
| ROUTE CODE | NAME | TIME |
+------------+---------------+-------+
| A | McDonald's | 5:30 |
| A | Arby's | 5:45 |
| A | Burger King | 6:00 |
| A | Wendy's | 6:30 |
| B | Arby's | 7:45 |
| B | Arby's | 7:45 |
| B | Burger King | 8:30 |
| B | McDonald's | 9:00 |
| C | Wendy's | 9:30 |
| C | Lion's Choice | 8:15 |
| C | Steak N Shake | 9:50 |
| C | Hardee's | 10:30 |
+------------+---------------+-------+
What I want the result to return is something like this:
+------------+---------------+------+
| ROUTE CODE | NAME | TIME |
+------------+---------------+------+
| A | McDonald's | 5:30 |
| B | Arby's | 7:45 |
| C | Lion's Choice | 8:15 |
+------------+---------------+------+
So what I want is the name associated with the minimum time for each route code.
I have written a query that gets me most of the way there (and feel free to improve upon this query if you think there is a more efficient way to do it):
SELECT main1.route_code, main1.first_stop, main2.name
FROM
(SELECT route_code, min(time) as first_stop FROM table1 WHERE date = yesterday GROUP BY route_code) main1
JOIN
(SELECT route_code, name, time FROM table1 WHERE date = yesterday) main2
ON main1.route_code = main2.route_code and main1.first_stop = main2.time
Here is where I need your help though. If I have identical times, it returns that row twice, and I only want it once. So for instance, the above query would return Arby's for route code "B" twice because it has the same time. I only want to see that once, I never want to see anything from a route more than once.
Can anyone help me? Thanks much!
In Postgres, you can use distinct on:
select distinct on (route_code) t.*
from table1 t
order by route_code, time asc;
This is likely to be the fastest method in Postgres. For performance, an index on (route_code, time) is recommended.
Here's another way to get your result that you may or may not like better:
SELECT route_code, time, name FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY route_code ORDER BY time ASC) row_num FROM table1) subq
WHERE row_num = 1;
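The ROW_NUMBER() version also collapses ties to a single row, which you can verify with SQLite (3.25+) from Python. One caveat in this sketch: the times are stored as zero-padded text so that text ordering matches time ordering ('8:15' would otherwise sort after '10:30').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (route_code TEXT, name TEXT, time TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ('A', "McDonald's", '05:30'), ('A', "Arby's", '05:45'),
    ('A', 'Burger King', '06:00'), ('A', "Wendy's", '06:30'),
    ('B', "Arby's", '07:45'), ('B', "Arby's", '07:45'),   # duplicate time
    ('B', 'Burger King', '08:30'), ('B', "McDonald's", '09:00'),
    ('C', "Wendy's", '09:30'), ('C', "Lion's Choice", '08:15'),
    ('C', 'Steak N Shake', '09:50'), ('C', "Hardee's", '10:30'),
])

# ROW_NUMBER() numbers rows within each route; keeping rn = 1 picks exactly
# one earliest row per route, even when two rows share the minimum time.
rows = conn.execute("""
SELECT route_code, name, time FROM
  (SELECT *, ROW_NUMBER() OVER (PARTITION BY route_code ORDER BY time ASC) rn
   FROM table1)
WHERE rn = 1
ORDER BY route_code
""").fetchall()
print(rows)  # route B's duplicate Arby's row appears only once
```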
So I have been doing pretty well on my project (Link to previous StackOverflow question), and have managed to learn quite a bit, but there is this one problem that has been really dogging me for days and I just can't seem to solve it.
It has to do with using the UNIX_TIMESTAMP call to convert dates in my SQL database to UNIX time format, but for some reason only one set of dates in my table is giving me issues!
==============
So these are the values I am getting -
#abridged here, see the results from the SELECT statement below to see the rest
#of the fields outputted
| firstVst | nextVst | DOB |
| 1206936000 | 1396238400 | 0 |
| 1313726400 | 1313726400 | 278395200 |
| 1318910400 | 1413604800 | 0 |
| 1319083200 | 1413777600 | 0 |
when I use this SELECT statement:
SELECT SQL_CALC_FOUND_ROWS *,UNIX_TIMESTAMP(firstVst) AS firstVst,
UNIX_TIMESTAMP(nextVst) AS nextVst, UNIX_TIMESTAMP(DOB) AS DOB FROM people
ORDER BY ref DESC;
So my big question is: why in the heck are 3 out of 4 of my DOBs being set to a date of 0 (i.e. 12/31/1969 on my PC)? Why is this not happening in my other fields?
I can see the data quite well using a more simple SELECT statement and the DOB field looks fine...?
#formatting broken to change some variable names etc.
select * FROM people;
| ref | lastName | firstName | DOB | rN | lN | firstVst | disp | repName | nextVst |
| 10001 | BlankA | NameA | 1968-04-15 | 1000000 | 4600000 | 2008-03-31 | Positive | Patrick Smith | 2014-03-31 |
| 10002 | BlankB | NameB | 1978-10-28 | 1000001 | 4600001 | 2011-08-19 | Positive | Patrick Smith | 2011-08-19 |
| 10003 | BlankC | NameC | 1941-06-08 | 1000002 | 4600002 | 2011-10-18 | Positive | Patrick Smith | 2014-10-18 |
| 10004 | BlankD | NameD | 1952-08-01 | 1000003 | 4600003 | 2011-10-20 | Positive | Patrick Smith | 2014-10-20 |
It's because those DOBs are from before 1 January 1970, which is where the UNIX epoch starts, so anything prior to that would be negative; MySQL's UNIX_TIMESTAMP() returns 0 for dates outside the range it supports.
From Wikipedia:
Unix time, or POSIX time, is a system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds.
A bit more elaboration: basically, what you're trying to do isn't possible with UNIX_TIMESTAMP. Depending on what it's for, there may be a different way you can do this, but UNIX timestamps probably aren't the best representation for dates like that.
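To illustrate in Python: a pre-1970 date maps to a negative number of seconds since the epoch, which is exactly the case UNIX_TIMESTAMP cannot represent. (The 278395200 in your output differs from the UTC value below only by your server's timezone offset.)

```python
from datetime import datetime, timezone

# DOB from the first row: before the epoch, so the timestamp is negative
dob_1968 = datetime(1968, 4, 15, tzinfo=timezone.utc)
print(dob_1968.timestamp())   # negative -> out of UNIX_TIMESTAMP's range, MySQL gives 0

# DOB from the second row: after the epoch, so it converts fine
dob_1978 = datetime(1978, 10, 28, tzinfo=timezone.utc)
print(dob_1978.timestamp())   # 278380800.0 seconds since 1970-01-01 UTC
```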