How to create an SQL time-in-location table from location/timestamp SQL data stream - sql

I have a question that I'm struggling with in SQL.
I currently have a series of location and timestamp data. It consists of devices in locations at varying timestamps. The locations are repeated, so while they are lat/long coordinates there are several that repeat. The timestamp comes in irregular intervals (sometimes multiple times a second, sometimes nothing for 30 seconds). For example see the below representational data (I am sorting by device name in this example, but could order by anything if it would help):
Device Location Timestamp
X A 1
X A 1.7
X A 2
X A 3
X B 4
X B 5.2
X B 6
X A 7
X A 8
Y A 2
Y A 4
Y C 6
Y C 7
I wish to create a table based on the above data that would show entry/exit or first/last time in each location, with the total duration of that instance. i.e:
Device Location EntryTime ExitTime Duration
X A 1 3 2
X B 4 6 2
X A 7 8 1
Y A 2 4 2
Y C 6 7 1
From here I could process it further to work out a total time in location for a given day, for example.
This is something I could do in Python or some other language with something like a while loop, but I'm really not sure how to accomplish this in SQL.
It's probably worth noting that this is in Azure SQL and I'm creating this table via a Stream Analytics Query to an Event Hubs instance.
The reason I don't want to just simply total all in a location is because it is going to be streaming data and rolling through for a display for say, the last 24 hrs.
Any hints, tips or tricks on how I might accomplish this would be greatly appreciated. I've looked and haven't been able to quite find what I'm looking for. I can see things like datediff for calculating the duration between two timestamps, or max and min for finding the first and last dates, but none quite seem to tick the box. The challenge I have here is that the devices move around and come back to the same locations many times within the period. Taking the first occurrence/timestamp of device X at location A and subtracting it from the last, for example, doesn't take into account the other locations it may have traveled to in between those timestamps. Complicating things further, the timestamps are irregular, so I can't simply count the number of occurrences for each location and add them up either.
Maybe I'm missing something simple or obvious, but this has got me stumped! Help would be greatly appreciated :)

I believe grouping would work
SELECT Device, Location, [EntryTime] = MIN(Timestamp), [ExitTime] = MAX(Timestamp), [Duration] = MAX(Timestamp) - MIN(Timestamp)
FROM <table>
GROUP BY Device, Location
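One caveat with a plain GROUP BY Device, Location: it folds every visit to a location into one row, so device X's two separate stays at A (1 to 3 and 7 to 8) would come out as a single row running from 1 to 8. Here is a minimal sketch of one way to keep the visits apart, using the row-number-difference (gaps-and-islands) trick. It runs the question's sample data through SQLite from Python purely for illustration; the table name readings and the column name ts are made up, and I have not checked which of these window functions Stream Analytics supports.

```python
import sqlite3

# Sample rows from the question: (device, location, timestamp).
rows = [
    ("X", "A", 1), ("X", "A", 1.7), ("X", "A", 2), ("X", "A", 3),
    ("X", "B", 4), ("X", "B", 5.2), ("X", "B", 6),
    ("X", "A", 7), ("X", "A", 8),
    ("Y", "A", 2), ("Y", "A", 4), ("Y", "C", 6), ("Y", "C", 7),
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE readings (device TEXT, location TEXT, ts REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

query = """
WITH runs AS (
    SELECT device, location, ts,
           -- position within the device minus position within (device, location):
           -- constant for as long as the device stays in one place
           ROW_NUMBER() OVER (PARTITION BY device ORDER BY ts)
         - ROW_NUMBER() OVER (PARTITION BY device, location ORDER BY ts) AS run_id
    FROM readings
)
SELECT device, location,
       MIN(ts) AS entry_time,
       MAX(ts) AS exit_time,
       MAX(ts) - MIN(ts) AS duration
FROM runs
GROUP BY device, location, run_id
ORDER BY device, entry_time
"""
visits = conn.execute(query).fetchall()
for visit in visits:
    print(visit)
```

For the sample data this gives one row per visit, matching the five rows in the question's desired output. Grouping by run_id in addition to device and location is what stops the two stays at A from merging.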

I was working on a similar issue, to some extent, in my own dataset.
SELECT U.*, TO_DATE(U.WEND,'DD-MM-YY HH24:MI') - TO_DATE(U.WSTART,'DD-MM-YY HH24:MI') AS DURATION
FROM
(
SELECT EMPNAME,TLOC, TRUNC(TO_DATE(T.TDATETIME,'DD-MM-YY HH24:MI')) AS WDATE, MIN(T.TDATETIME) AS WSTART, MAX(T.TDATETIME) AS WEND FROM EMPTRCK_RSMSM T
GROUP BY EMPNAME,TLOC,TRUNC(TO_DATE(T.TDATETIME,'DD-MM-YY HH24:MI'))
) U

Related

Create chart where the columns of the data set are the categories - Report Builder 3.0

I have a very simple chart that I am wanting to add but I can't for the life of me figure it out. The chart is referencing a dataset that returns data like this. It is calculating the sum of each Location and then using Rollup to produce a Total Count for each Week Column
Location CurrentWeek PreviousWeek 2WeeksAgo
======== =========== =========== ===========
North 5 6 3
South 4 3 1
East 8 2 3
West 2 7 0
Total 19 18 7
What I am wanting to do is have the X Axis (horizontal) represented by the CurrentWeek, PreviousWeek, 2WeeksAgo columns and plot the "Total" values from each respective column.
Adding Snip...
Sample Chart
Thanks for adding the image.
So we have a few steps to get to where we need to be - first, we need to transform the data into a format that's easier and more scalable to work with (if we ever add a "3 weeks ago" column, we don't want to have to rework everything). The desired format is:
Date Amount
Current Week 19
1 week ago 18
2 weeks ago 7
Personally - instead of naming stuff "current week", "1 week ago" etc., I would have a WeeksPrior column where 0 would mean the current week, 1 would mean a week ago and so on.
Anyways, to get from your sample table to the more standardized input, we have to use an unpivot (these always hurt my brain, but the docs have some good examples you can use).
SELECT
*
, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Ordinal
--this is hacky, but ordering by (select null) allows us to assign a row number by the default order
FROM (SELECT 'Total' AS Location, 19 AS CurrentWeek, 18 AS PreviousWeek, 7 AS [2WeeksAgo]) x
--This is the test data, replace this with your actual query
UNPIVOT (Value FOR Date IN ([CurrentWeek], [PreviousWeek], [2WeeksAgo])) y
--This unpivots the test data, converting the separate columns into a single [Date] column, and assigning the values to the [Value] column.
This will spit out the following:
Location Value Date Ordinal
Total 19 CurrentWeek 1
Total 18 PreviousWeek 2
Total 7 2WeeksAgo 3
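If it helps to see the reshaping step outside SQL Server, here is a rough Python sketch of what the UNPIVOT does to the Total row. The function name unpivot and the dict layout are illustrative only, not anything from SSRS or T-SQL.

```python
def unpivot(row, id_column, value_columns):
    """Turn one wide row into one (id, value, name, ordinal) tuple per column."""
    return [
        (row[id_column], row[name], name, ordinal)
        for ordinal, name in enumerate(value_columns, start=1)
    ]

# The Total row from the sample dataset.
total_row = {"Location": "Total", "CurrentWeek": 19, "PreviousWeek": 18, "2WeeksAgo": 7}

long_rows = unpivot(total_row, "Location", ["CurrentWeek", "PreviousWeek", "2WeeksAgo"])
for r in long_rows:
    print(r)  # (Location, Value, Date, Ordinal)
```

One difference from the query: here the ordinal comes from the explicit column list, so the order is fixed by the caller rather than by whatever order the rows happen to arrive in.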
From here, we add the data to the chart. This is pretty straightforward, but there are a few "gotchas" to be wary of.
First, we'll add the Value column as a chart value, and the Ordinal column as a category group.
Let's see what the chart looks like right now by running the report.
Well, it's getting there, but we want our labels on the bottom. To do this, we go into the Ordinal category group's properties and switch the label to the date column. Make sure you're still sorting by Ordinal, since SSRS doesn't know what "1 week ago" means relative to "Current Week", and will sort alphabetically or randomly if you don't tell it to sort by ordinal.
We can also clean up the chart a bit by removing the legend and changing the major tick mark line style to "solid" on the horizontal axis, leaving us something that looks like this:
Adding a label to the vertical axis would probably also help readability, as would adding hover-text to the points on the chart.

How to display monthly revenue splitting between new users, resurrected users, and churned etc users in sql?

So say I have 3 fields in my table: month, user, and revenue. I am trying to build a table that shows me total revenue by month, broken down by revenue type:
- new (which means it's a user that spent for the first time)
- returning (which means it's a user that did not spend in the previous month but that spent in the past and spent in the month in question)
- expansion (which means it's a user that spent in the previous month but spent more this time)
- contraction (which means it's a user that spent in the previous month but spent less this time)
- churned (which means it's a user that spent in the previous month but not in this month, so this will actually be negative and will not add to or take away from the total revenue)
So basically:
New + Expansion + Resurrected + Contraction + Remaining = Total, then (Churned)
Exemplary table with the added fields to give a sense of the math
I'm ultimately looking to build a stacked column chart like this one:
So it looks like you need to solve for
2+agoMo  PriorMo      ThisMo  Type
0        0            x       New
x        0            x       Returning/Resurrected
?        w (w < x)    x       Expansion
?        z (z > x)    x       Contraction
?        y            0       Churned
?        x (x > 0)    x       Remaining
?        0            0       Sleeping
What have you coded already? What results are you getting?
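The grid above translates fairly directly into a small classification function. Here is a sketch in Python; the function name classify and its arguments are invented for illustration, and it assumes revenue is never negative.

```python
def classify(spent_before, prior_month, this_month):
    """Label one user's month, following the grid above.

    spent_before: True if the user spent in any month before the prior one.
    prior_month, this_month: revenue in each month (0 means no spend).
    """
    if prior_month == 0 and this_month == 0:
        return "Sleeping"      # also covers users who have never spent
    if prior_month == 0:
        return "Returning" if spent_before else "New"
    if this_month == 0:
        return "Churned"
    if this_month > prior_month:
        return "Expansion"
    if this_month < prior_month:
        return "Contraction"
    return "Remaining"


print(classify(False, 0, 10))   # first-ever spend
print(classify(True, 0, 10))    # skipped last month, spent before that
print(classify(True, 10, 0))    # spent last month, nothing this month
```

In SQL the same inputs would typically come from joining each user's month to their previous month and to a flag for any earlier spend, but how to build those depends on the table, which the question doesn't show.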

Select all rows between two specific times across multiple days

I'm managing an SQLite database containing messages along with associated timestamps collected over the past month, and I wish to select all of the entries, for all days, between two given hours. In pseudocode style: SELECT * FROM Messages BETWEEN x AND y, where x might be 14:45 and y 15:45, and returning all the messages between x and y for all the days over the past month.
Is there a straightforward way of performing this in SQLite?
Thanks in advance.
I think you are looking for something like this:
SELECT *
FROM MESSAGES
WHERE time(timestamp) >= time('14:45:00')
AND time(timestamp) <= time('15:45:00')
Fiddle: http://sqlfiddle.com/#!9/cde2c/1/0
I took information from: http://www.sqlite.org/cvstrac/wiki?p=DateAndTimeFunctions
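To try the idea without a fiddle, here is a small self-contained sketch using Python's built-in sqlite3 module. The sample rows and the body column are made up, and it assumes the timestamps are stored as text in the 'YYYY-MM-DD HH:MM:SS' form that SQLite's time() function understands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Messages (body TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?)",
    [
        ("too early",   "2015-03-01 14:44:59"),
        ("day one",     "2015-03-01 14:45:00"),
        ("day two",     "2015-03-02 15:10:30"),
        ("upper bound", "2015-03-03 15:45:00"),
        ("too late",    "2015-03-03 15:45:01"),
    ],
)

# time() strips the date part, so the same window applies to every day.
matches = conn.execute(
    """
    SELECT body
    FROM Messages
    WHERE time(timestamp) BETWEEN time(?) AND time(?)
    ORDER BY timestamp
    """,
    ("14:45:00", "15:45:00"),
).fetchall()
print(matches)
```

Both bounds are inclusive, as in the query above. A window that wraps past midnight (say 23:00 to 01:00) would need two conditions joined with OR instead of a single range.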

In Crystal Report print only first record in group and leave it summable

I have a table that lists every task an operator completed during a day. This is gathered by a Shop Floor Control program. There is also a column that has the total hours worked that day, this field comes from their time punches. The table looks something like this:
Operator 1 Bestupid 0.5 8 5/12/1986
Operator 1 BeProductive 0.1 8 5/12/1986
Operator 1 Bestupidagain 3.2 8 5/12/1986
Operator 1 Belazy 0.7 8 5/13/1986
Operator 2 BetheBest 1.7 9.25 5/12/1986
I am trying to get an efficiency out of this by summing the process hours and comparing it to the hours worked. The problem is that when I do any kind of summary on the hours worked column it sums EVERY DETAIL LINE.
I have tried:
If Previous (groupingfield) = (groupingfield) Then
HoursWorked = 0
Else
HoursWorked = HoursWorked
I have tried a global three formula trick, but neither of the above leaves me with a summable field; I get "A summary has been specified on a non-recurring field".
I currently use a global variable, reset in the group header, but not WhilePrintinganything. However, it is missing some records, and on occasion I will get two hoursworked > 0 in the same group :(
Any ideas?
I just want to clarify, I have three groups:
Groups: Work Center --> Operator --> Date
I can summarize the process hours across any group and that's fine. However, the hours worked prints on every detail line even though it really should only print once per Date. Therefore, when I summarize the Hours Worked for an operator, the total is WAY off because it is adding up 8 hours for each entry instead of 8 hours for each day.
Try grouping by the operators. Then create a running total for the process hours that sums for each record and resets on change of group. In the group footer you can display the running total and any other stats for that operator you care to.
Try another running total for the daily hours, but pick maximum as the type of summary. Since all the records for the day will have the same hours worked, the maximum will be correct. Reset with the change of the date group and you should be good to go.
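To show why the maximum works, here is the same arithmetic sketched in Python rather than Crystal syntax. The question's table has no headers, so the column order used here (operator, task, process hours, hours worked, date) is my reading of it, and the variable names are made up.

```python
from collections import defaultdict

# (operator, task, process_hours, hours_worked, date) from the question's sample.
rows = [
    ("Operator 1", "Bestupid",      0.5, 8.0,  "5/12/1986"),
    ("Operator 1", "BeProductive",  0.1, 8.0,  "5/12/1986"),
    ("Operator 1", "Bestupidagain", 3.2, 8.0,  "5/12/1986"),
    ("Operator 1", "Belazy",        0.7, 8.0,  "5/13/1986"),
    ("Operator 2", "BetheBest",     1.7, 9.25, "5/12/1986"),
]

process_hours = defaultdict(float)  # summed over every detail line
daily_worked = {}                   # one value per (operator, date)

for operator, _task, process, worked, date in rows:
    process_hours[operator] += process
    key = (operator, date)
    # Every line of a day repeats the same figure, so the max is that figure.
    daily_worked[key] = max(daily_worked.get(key, 0.0), worked)

hours_worked = defaultdict(float)   # daily maxima summed per operator
for (operator, _date), worked in daily_worked.items():
    hours_worked[operator] += worked

for operator in process_hours:
    efficiency = process_hours[operator] / hours_worked[operator]
    print(operator, hours_worked[operator], round(efficiency, 3))
```

Operator 1 ends up with 16 hours worked over two days instead of the 32 you get from summing every detail line.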

How do I find records that have the same value in adjacent records in Sql Server? (I believe the correct term for this is a region??)

Finding the start and end time for adjacent records that have the same value?
I have a table that contains heart rate readings (in beats per minute) and a datetime field. (Actually the fields are heartrate_id, heartrate, and datetime.) The data are generated by a device that records the heart rate and time every 6 seconds. Sometimes the heart rate monitor will give false readings and the recorded beats per minute will "stick" for a period of time. By "stick", I mean the beats per minute value will be identical in adjacent times.
Basically I need to find all the records where the heart rate is the same (e.g. 5 beats per minute, 100 beats per minute, etc.), but only in adjacent records. If the device records 25 beats per minute for 3 consecutive readings (or 100 consecutive readings), I need to locate these events. The results need to have the heart rate, the time the heart rate started, and the time the heart rate ended, and ideally the results would look more or less like this:
heartrate starttime endtime
--------- --------- --------
1.00 21:12:00 21:12:24
35.00 07:00:12 07:00:36
I've tried several different approaches but so far I'm striking out. Any help would be greatly appreciated!
EDIT:
Upon review, none of my original work on this answer was very good. This actually belongs to the class of problems known as gaps-and-islands, and this revised answer will use information I've gleaned from similar questions/learned since first answering this question.
It turns out this query can be done a lot more simply than I originally thought:
WITH Grouped_Run AS (SELECT heartRate, dateTime,
ROW_NUMBER() OVER(ORDER BY dateTime) -
ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime) AS groupingId
FROM HeartRate)
SELECT heartRate, MIN(dateTime), MAX(dateTime)
FROM Grouped_Run
GROUP BY heartRate, groupingId
HAVING COUNT(*) > 2
SQL Fiddle Demo
So what's happening here? One of the definitions of gaps-and-islands problems is the need for "groups" of consecutive values (or lack thereof). Often sequences are generated to solve this, exploiting an often overlooked/too-intuitive fact: subtracting sequences yields a constant value.
For example, imagine the following sequences, and the subtraction (the values in the rows are unimportant):
position positionInGroup subtraction
=========================================
1 1 0
2 2 0
3 3 0
4 1 3
5 2 3
6 1 5
7 4 3
8 5 3
position is a simple sequence generated over all records.
positionInGroup is a simple sequence generated for each set of different records. In this case, there's actually 3 different sets of records (starting at position = 1, 4, 6).
subtraction is the result of the difference between the other two columns. Note that values may repeat for different groups!
One of the key properties the sequences must share is they must be generated over the rows of data in the same order, or this breaks.
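To make the table above concrete, here is a short Python sketch that builds the three columns for a list of values. The function name and the letter values are hypothetical; the letters were picked only so that the output reproduces the table (three distinct values, with the first one coming back at positions 7 and 8).

```python
from collections import defaultdict

def sequence_differences(values):
    """Return (position, positionInGroup, subtraction) for each value, in order."""
    seen = defaultdict(int)  # how many times each value has appeared so far
    result = []
    for position, value in enumerate(values, start=1):
        seen[value] += 1
        position_in_group = seen[value]
        result.append((position, position_in_group, position - position_in_group))
    return result

table = sequence_differences(["a", "a", "a", "b", "b", "c", "a", "a"])
for row in table:
    print(row)
```

Positions 4 to 5 (value "b") and 7 to 8 (value "a") both end up with a subtraction of 3, which is why the subtraction on its own is not enough to identify a run: it has to be combined with the value itself.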
So how is SQL doing this? Through the use of ROW_NUMBER(): this function will generate a sequence of numbers over a "window" of records:
ROW_NUMBER() OVER(ORDER BY dateTime)
will generate the position sequence.
ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime)
will generate the positionInGroup sequence, with each heartRate being a different group.
In the case of most queries of this type, the values of the two sequences are unimportant; it's the subtraction (to get the sequence group) that matters, so we just need the result of the subtraction.
We'll also need the heartRate and the times in which they occurred to provide the answer.
The original question asked for the start and end times of each of the "runs" of stuck heartbeats. That's a standard MIN(...)/MAX(...), which means a GROUP BY. We need to use both the original heartRate column (because that's a non-aggregate column) and our generated groupingId (which identifies the current "run" per stuck value).
Part of the question asked for only runs that repeated three or more times. The HAVING COUNT(*) > 2 is an instruction to ignore runs of length 2 or less; it counts rows per-group.
I recommend Ben-Gan's article on interval packing, which applies to your adjacency problem.
tsql-challenge-packing-date-and-time-intervals
solutions-to-packing-date-and-time-intervals-puzzle