selecting percentage of time on/off for groupings by id

I'm looking to summarize some information into a kind of report, and the crux of it is similar to the following problem. I'm looking for an approach in any SQL-like language.
Consider a schema containing the following:
id - int, on - bool, time - datetime
This table is basically a log that records when the thing with a given id changes state between 'on' and 'off'.
What I want is a table with the percentage of time 'on' for each id seen. So a result might look like this:
id, percent 'on'
1, 50
2, 45
3, 67
I would expect the overall time to be
now - (time first seen in the log)
Programmatically, I understand how to do this. For each id, I just want to add up all of the segments of time for which the item was 'on' and express this as a percentage of the total time. I'm not quite seeing how to do this in SQL, however.

You can use lead() and some date/time arithmetic (which varies by database).
In pseudo-code this looks like:
select id,
       sum(case when status = on then coalesce(next_datetime, current_datetime) - datetime end) / (current_datetime - min(datetime))
from (select t.*,
             lead(datetime) over (partition by id order by datetime) as next_datetime
      from t
     ) t
group by id;
Date/time functions vary by database, so this is just to give an idea of what to do.
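For anyone who wants to try the lead() idea end to end, here is a minimal runnable sketch using Python's built-in sqlite3 module as a test bed. The table name state_log, the columns (id, is_on, ts), the sample rows and the fixed "now" value are all assumptions made up for illustration. SQLite does its date arithmetic with julianday(); other databases will need their own functions.

```python
import sqlite3

# Hypothetical schema and data: state_log(id, is_on, ts) logs each state change.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state_log (id INTEGER, is_on INTEGER, ts TEXT);
    INSERT INTO state_log VALUES
        (1, 1, '2024-01-01 00:00:00'),
        (1, 0, '2024-01-01 06:00:00'),
        (1, 1, '2024-01-01 18:00:00'),
        (2, 0, '2024-01-01 12:00:00'),
        (2, 1, '2024-01-01 21:00:00');
""")

# Each row's segment runs until the next row for the same id, or until :now.
QUERY = """
SELECT id,
       100.0 * SUM(CASE WHEN is_on = 1
                        THEN julianday(COALESCE(next_ts, :now)) - julianday(ts)
                        ELSE 0 END)
             / (julianday(:now) - MIN(julianday(ts))) AS percent_on
FROM (SELECT s.*,
             LEAD(ts) OVER (PARTITION BY id ORDER BY ts) AS next_ts
      FROM state_log s) AS t
GROUP BY id
ORDER BY id
"""

def percent_on(now):
    """Return {id: percent of time 'on'} from the first log entry up to `now`."""
    rows = conn.execute(QUERY, {"now": now}).fetchall()
    return {row[0]: round(row[1], 2) for row in rows}

print(percent_on("2024-01-02 00:00:00"))  # {1: 50.0, 2: 25.0}
```

Passing "now" as a parameter rather than calling the database's clock keeps the result reproducible; in a real report you would likely swap in the current timestamp.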

Related

Big Query / SQL finding "new" data in a date range

I have a pretty big event log with columns:
id, timestamp, text, user_id
The text field contains a variety of things, like:
Road: This is the road name
City: This is the city name
Type: This is a type
etc..
I would like to get the result to the following:
Given a start and end date, how many **new** users used a road (that haven't before) grouped by road.
I've got various parts of this working fine (like the total number of users, the grouping, the date range and so on). The SQL for getting the new users is eluding me though, having tried solutions like SELECT AS STRUCT on subqueries, amongst other things.
Ultimately, I'd love to see a result like:
road, total_users, new_users
Any help would be much appreciated.

If I understand correctly, you want something like this:
select road, countif(seqnum = 1) as new_users, count(distinct user_id) as num_users
from (select l.*,
             row_number() over (partition by l.user_id, l.text order by l.timestamp) as seqnum
      from log l
      where l.type = 'Road'
     ) l
where timestamp >= @timestamp1 and timestamp < @timestamp2
group by road;
This assumes that you have a column that specifies the type (i.e. "road") and another column with the name of the road (i.e. "Champs-Elysees").
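Here is a small runnable sketch of the same row_number() idea, run through Python's sqlite3 module so it can be tried without a BigQuery project. The event_log table, its separate road column and the sample rows are assumptions for illustration. SQLite has no countif(), so SUM over the boolean comparison stands in for it. The key point is that the rows are numbered over the whole history first and only then filtered by date, so seqnum = 1 really means "first time this user used this road".

```python
import sqlite3

# Hypothetical data: event_log already has a separate road column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_log (id INTEGER, ts TEXT, road TEXT, user_id INTEGER);
    INSERT INTO event_log VALUES
        (1, '2024-01-05', 'Main St', 10),
        (2, '2024-02-03', 'Main St', 10),
        (3, '2024-02-04', 'Main St', 11),
        (4, '2024-02-10', 'Main St', 11),
        (5, '2024-02-07', 'Oak Ave', 10),
        (6, '2024-03-01', 'Oak Ave', 12);
""")

# Number each user's visits per road across all time, then filter by date.
QUERY = """
SELECT road,
       COUNT(DISTINCT user_id) AS total_users,
       SUM(seqnum = 1)         AS new_users
FROM (SELECT e.*,
             ROW_NUMBER() OVER (PARTITION BY user_id, road ORDER BY ts) AS seqnum
      FROM event_log e) AS numbered
WHERE ts >= :start AND ts < :end
GROUP BY road
ORDER BY road
"""

def road_usage(start, end):
    """Return (road, total_users, new_users) rows for the half-open date range."""
    return conn.execute(QUERY, {"start": start, "end": end}).fetchall()

print(road_usage("2024-02-01", "2024-03-01"))
# [('Main St', 2, 1), ('Oak Ave', 1, 1)]
```

User 10 shows up on Main St in February but is not counted as new there, because their first visit was in January.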

SQL Grabbing unique counts per category

I'm pretty new to SQL and Redshift, but there is a weird problem I'm getting.
So my data looks like below. Ignore the actual id and date_time values... I just put in random info, but it's the same format.
id date_time (varchar(255))
1 2019-01-11T05:01:59
1 2019-01-11T05:01:59
2 2019-01-11T05:01:59
3 2019-01-11T05:01:59
1 2019-02-11T05:01:59
2 2019-02-11T05:01:59
I'm trying to get the count of unique IDs per month.
I've tried the following command below. Given the amount of data, I just tried to do a demo of the first 10 rows of my table...
SELECT COUNT(DISTINCT id),
       LEFT(date_time, 7)
FROM ( SELECT top 10 *
       FROM myTable.ME )
GROUP BY LEFT(date_time, 7), id
I expect something like below.
count left
3 2019-01
2 2019-02
But I'm instead getting something similar to what's below.
I then tried the command below, which seems correct.
SELECT COUNT(DISTINCT id),
       LEFT(date_time, 7)
FROM ( SELECT top 1000000 *
       FROM myTable.ME )
GROUP BY LEFT(date_time, 7)
However, if you remove the DISTINCT portion, you get the results below. It seems like it is only looking at a certain month (2019-01), rather than other months.
If anyone can tell me what is wrong with the commands I'm using or can give me the correct command, I'll be very grateful. Thank you.
EDIT: Could it possibly be because maybe my data isn't clean?

Why are you using a string for the date? That is simply wrong. There are built-in types. But assuming you have some reason or cannot change it, use string functions:
select left(date_time, 7) as yyyymm,
count(distinct id)
from t
group by yyyymm
order by yyyymm;
In your first query you have id in the group by, which does not do what you want: it produces one row per month and id, so every distinct count comes out as 1.
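To see the difference between the two GROUP BY clauses, here is a quick sketch on the sample rows from the question, using Python's sqlite3 module. The table name t is taken from the answer; SQLite has no LEFT(), so SUBSTR(date_time, 1, 7) is swapped in for it here.

```python
import sqlite3

# Sample rows copied from the question; date_time is deliberately a string.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, date_time VARCHAR(255));
    INSERT INTO t VALUES
        (1, '2019-01-11T05:01:59'),
        (1, '2019-01-11T05:01:59'),
        (2, '2019-01-11T05:01:59'),
        (3, '2019-01-11T05:01:59'),
        (1, '2019-02-11T05:01:59'),
        (2, '2019-02-11T05:01:59');
""")

# Grouping by month only: one row per month with the distinct id count.
by_month = conn.execute("""
    SELECT SUBSTR(date_time, 1, 7) AS yyyymm, COUNT(DISTINCT id)
    FROM t GROUP BY yyyymm ORDER BY yyyymm
""").fetchall()

# Grouping by month AND id: one row per pair, so each count is 1.
by_month_and_id = conn.execute("""
    SELECT SUBSTR(date_time, 1, 7) AS yyyymm, COUNT(DISTINCT id)
    FROM t GROUP BY yyyymm, id ORDER BY yyyymm, id
""").fetchall()

print(by_month)         # [('2019-01', 3), ('2019-02', 2)]
print(by_month_and_id)  # five rows, every count equal to 1
```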

SQL script to find previous value, not necessarily previous row

Is there a way in SQL to find a previous value, not necessarily in the previous row, within the same SELECT statement?
See picture below. I'd like to add another column, ELAPSED, that calculates the time difference between TIMERSTART, but only when DEVICEID is the same, and I_TYPE is viewDisplayed. e.g. subtract 1 from 2, store difference in 3, store 0 in 4 because i_type is not viewDisplayed, subtract 2 from 5, store difference in 6, and so on.
It has to be a statement, I can't use a stored procedure in this case.
SELECT DEVICEID, I_TYPE, TIMERSTART,
0 AS ELAPSED -- CASE WHEN <CONDITION> THEN TIMEDIFF() ELSE 0 END AS ELAPSED
FROM CLIENT_USAGE
ORDER BY TIMERSTART ASC
I'm using SAP HANA DB, but it works pretty much like the latest version of MS-SQL. So, if you know how to make it work in SQL, I can make it work in HANA.

You can make a subquery to find the last time entered previous to the row in question.
select deviceid, i_type, timerstart, (timerstart - timerlast) as elapsed
from CLIENT_USAGE CU
join ( select top 1 timerstart as timerlast
       from CLIENT_USAGE C
       where (C.i_type = CU.i_type) and
             (C.deviceid = CU.deviceid) and (C.timerstart < CU.timerstart)
       order by C.timerstart desc
     ) as temp1
on temp1.i_type = CU.i_type
order by timerstart asc
This is a rough sketch of what the SQL should look like. I do not know what your primary key is on this table, whether it is i_type alone or i_type and deviceid, but this should help with how to at least calculate the field. I do not think it would be necessary to store the value unless this table is very large or the hardware being used is very slow. It can be calculated rather easily each time this query is run.
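A runnable variant of the subquery idea, tried in SQLite through Python's sqlite3 module. Note that it is not the exact join shown above: many databases will not let a derived table in a join refer to the outer row, so this sketch moves the lookup into a correlated scalar subquery in the select list. The sample rows and the use of plain integer seconds for timerstart are assumptions to keep the arithmetic trivial.

```python
import sqlite3

# Hypothetical data; timerstart is stored as integer seconds for simplicity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client_usage (deviceid TEXT, i_type TEXT, timerstart INTEGER);
    INSERT INTO client_usage VALUES
        ('A', 'viewDisplayed', 100),
        ('A', 'viewDisplayed', 130),
        ('B', 'viewDisplayed', 140),
        ('A', 'buttonTapped',  150),
        ('A', 'viewDisplayed', 190);
""")

# For each row, look up the latest earlier row with the same device and type.
# Rows with no earlier match get 0 via COALESCE.
rows = conn.execute("""
    SELECT cu.deviceid, cu.i_type, cu.timerstart,
           COALESCE(cu.timerstart - (SELECT MAX(c.timerstart)
                                     FROM client_usage c
                                     WHERE c.deviceid = cu.deviceid
                                       AND c.i_type = cu.i_type
                                       AND c.timerstart < cu.timerstart), 0) AS elapsed
    FROM client_usage cu
    ORDER BY cu.timerstart
""").fetchall()

for row in rows:
    print(row)
```

The third 'A'/'viewDisplayed' row gets 60 (190 - 130) even though a 'buttonTapped' row sits between them, which is the "previous value, not previous row" behaviour asked for.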

SAP HANA supports window functions:
select DEVICEID,
TIMERSTART,
lag(TIMERSTART) over (partition by DEVICEID order by TIMERSTART) as previous_start
from CLIENT_USAGE
Then you can wrap this in parentheses and manipulate the data to your heart's content.
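As a sketch of what "wrap this in parentheses" might look like, the example below puts the lag() query in a derived table and computes ELAPSED in the outer query. It runs in SQLite through Python's sqlite3 module. Partitioning by i_type as well as deviceid, the CASE rule, the sample rows and the integer timerstart values are all assumptions based on the question's description, not something the answer specifies.

```python
import sqlite3

# Hypothetical data; timerstart is stored as integer seconds for simplicity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client_usage (deviceid TEXT, i_type TEXT, timerstart INTEGER);
    INSERT INTO client_usage VALUES
        ('A', 'viewDisplayed', 100),
        ('A', 'viewDisplayed', 130),
        ('B', 'viewDisplayed', 140),
        ('A', 'buttonTapped',  150),
        ('A', 'viewDisplayed', 190);
""")

# Inner query: previous timerstart per device and type.
# Outer query: turn it into an elapsed value, 0 when the rule does not apply.
rows = conn.execute("""
    SELECT deviceid, i_type, timerstart,
           CASE WHEN i_type = 'viewDisplayed' AND previous_start IS NOT NULL
                THEN timerstart - previous_start
                ELSE 0 END AS elapsed
    FROM (SELECT deviceid, i_type, timerstart,
                 LAG(timerstart) OVER (PARTITION BY deviceid, i_type
                                       ORDER BY timerstart) AS previous_start
          FROM client_usage) AS w
    ORDER BY timerstart
""").fetchall()

for row in rows:
    print(row)
```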

REGR_SLOPE in Teradata SQL Query Returning 0 Slope

I am a relative newbie with Teradata SQL and have run into this strange (I think strange) situation. I am trying to run a regression (REGR_SLOPE) on sensor data. I am gathering sensor readings for a single day, each day is 80 observations which is confirmed by the COUNT in the outer SELECT. My query is:
SELECT
d.meter_id,
REGR_SLOPE(d.reading_measure, d.x_axis) AS slope,
COUNT(d.x_axis) AS xcount,
COUNT(d.reading_measure) AS read_count
FROM
(
SELECT
meter_id,
reading_measure,
row_number() OVER (ORDER BY Reading_Dttm) AS x_axis
FROM data_mart.v_meter_reading
WHERE Reading_Start_Dt = '2017-12-12'
AND Meter_Id IN (11932101, 11419827, 11385229, 11643466)
AND Channel_Num = 5
) d
GROUP BY 1
When I use the "IN" clause in the subquery to specify Meter_Id, I get slope values, but when I take it out (to run over all meters) all the slopes are 0 (zero). I would simply like to run a line through a day's worth of observations (80).
I'm using Teradata v15.0.
What am I missing / doing wrong?

I would bet a Pepperoni Pizza that it's the x_axis value.
Instead try ROW_NUMBER() OVER (PARTITION BY meter_id ORDER BY reading_dttm)
This will ensure that the x_axis starts again from 1 for each meter, and each reading will always be 1 away from the previous reading on the x_axis.
This makes me think you should probably just use reading_dttm as the x_axis value, rather than fabricating one with ROW_NUMBER(). That way readings with a 5 hour gap between them have a different slope to readings with a 10 day gap between them. You may need to convert reading_dttm's data type with a function like TO_UNIXTIME(reading_dttm), or something similar.
I'll message you my address for the Pizza Delivery. (Joking.)
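The effect of the missing PARTITION BY can be reproduced on a toy table. REGR_SLOPE is not available in SQLite, so this sketch (Python's sqlite3 module, a made-up readings table with two meters) computes the least-squares slope by hand from the usual sums. It only illustrates one plausible mechanism: with a single global numbering, each meter's x values are spread out by the number of meters, so the slopes shrink accordingly, and with thousands of meters they could well round to 0.

```python
import sqlite3

# Hypothetical data: two meters reporting at the same three timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (meter_id INTEGER, reading_dttm TEXT, reading_measure REAL);
    INSERT INTO readings VALUES
        (1, '2017-12-12 00:00', 1), (2, '2017-12-12 00:00', 10),
        (1, '2017-12-12 00:15', 2), (2, '2017-12-12 00:15', 20),
        (1, '2017-12-12 00:30', 3), (2, '2017-12-12 00:30', 30);
""")

# Least-squares slope per meter: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2).
# {window} is filled in with the ROW_NUMBER() window definition to compare.
SLOPE_SQL = """
SELECT meter_id,
       (COUNT(*) * SUM(x * y) - SUM(x) * SUM(y)) * 1.0
       / (COUNT(*) * SUM(x * x) - SUM(x) * SUM(x)) AS slope
FROM (SELECT meter_id,
             reading_measure AS y,
             ROW_NUMBER() OVER ({window}) AS x
      FROM readings) AS d
GROUP BY meter_id
ORDER BY meter_id
"""

def slopes(window):
    """Return (meter_id, slope) rows for the given window definition."""
    return conn.execute(SLOPE_SQL.format(window=window)).fetchall()

# One numbering shared by all meters (meter_id added only as a tie-breaker).
global_x = slopes("ORDER BY reading_dttm, meter_id")
# Numbering restarted for each meter, as suggested above.
per_meter_x = slopes("PARTITION BY meter_id ORDER BY reading_dttm")

print(global_x)     # [(1, 0.5), (2, 5.0)]
print(per_meter_x)  # [(1, 1.0), (2, 10.0)]
```

With two meters the shared numbering halves every slope; the per-meter numbering gives the slope per reading that was presumably intended.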

In addition to @MatBailie's answer.
You probably know that you should use the timestamp instead of the ROW_NUMBER, but you couldn't do it because Teradata doesn't allow timestamps in this place (strange).
There's no built-in TO_UNIXTIME function in Teradata, but you can use this instead:
REPLACE FUNCTION TimeStamp_to_UnixTime (ts TIMESTAMP(6))
RETURNS decimal(18,6)
LANGUAGE SQL
CONTAINS SQL
DETERMINISTIC
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN
(Cast(ts AS DATE) - DATE '1970-01-01') * 86400
+ (Extract(HOUR From ts) * 3600)
+ (Extract(MINUTE From ts) * 60)
+ (Extract(SECOND From ts));
If you're not allowed to create UDFs, simply cut and paste the calculation.
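If you want to sanity-check the arithmetic in that calculation outside Teradata, here is a small Python sketch that mirrors it term by term: whole days since 1970-01-01 times 86400, plus hours, minutes and seconds. The function name and the sample timestamp are made up for illustration, and like the SQL version it does no time zone adjustment, so the input is effectively treated as UTC.

```python
from datetime import date, datetime, timezone

def timestamp_to_unixtime(ts: datetime) -> float:
    """Mirror the UDF: days since 1970-01-01 * 86400 plus the time of day."""
    days = (ts.date() - date(1970, 1, 1)).days
    seconds = ts.second + ts.microsecond / 1_000_000  # fractional seconds kept
    return days * 86400 + ts.hour * 3600 + ts.minute * 60 + seconds

# Hypothetical sample value, compared against Python's own conversion
# when the same wall-clock time is interpreted as UTC.
sample = datetime(2017, 12, 12, 8, 30, 15)
reference = sample.replace(tzinfo=timezone.utc).timestamp()
print(timestamp_to_unixtime(sample) == reference)  # True
```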

SQL, From TimeStamp to relative time

For this SQL,
SELECT CID, Time, Val
FROM MyTable
WHERE CID = 8
I get the following data,
CID, Time, Val
8,2016-10-19 13:49:06.217,7.036
8,2016-10-19 13:49:15.237,6.547
8,2016-10-19 13:49:46.063,6.292
8,2016-10-19 13:49:57.387,5.998
I want each Time value minus the starting time, which can be calculated by
SELECT MIN(Time) StartTime
FROM MyTable
WHERE CID = 8
I know that I can define a T-SQL variable to do that. However, is it possible to do the task, getting the relative time instead of the absolute time for each record, in one SQL statement?

You can use the min() window function to get the minimum time for each cid and use it for the subtraction.
select cid,time,val,
datediff(millisecond,min(time) over(partition by cid),time) as diff
from mytable
Change the difference interval (millisecond shown) per your requirement. There can be an overflow if the difference is too big.
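The same window-function idea can be tried on the question's sample rows with Python's sqlite3 module. SQLite has no datediff(), so this sketch swaps in a julianday() difference scaled to milliseconds; the diff_ms column name is made up, everything else comes from the question.

```python
import sqlite3

# Sample rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (cid INTEGER, time TEXT, val REAL);
    INSERT INTO mytable VALUES
        (8, '2016-10-19 13:49:06.217', 7.036),
        (8, '2016-10-19 13:49:15.237', 6.547),
        (8, '2016-10-19 13:49:46.063', 6.292),
        (8, '2016-10-19 13:49:57.387', 5.998);
""")

# MIN(...) OVER (PARTITION BY cid) repeats the start time on every row,
# so the subtraction happens in a single statement without a variable.
rows = conn.execute("""
    SELECT cid, time, val,
           CAST(ROUND((julianday(time)
                       - MIN(julianday(time)) OVER (PARTITION BY cid)) * 86400000)
                AS INTEGER) AS diff_ms
    FROM mytable
    ORDER BY time
""").fetchall()

diffs = [row[3] for row in rows]
print(diffs)  # [0, 9020, 39846, 51170]
```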

We Keep Coding
