sqlite3 unixtime interval query by multiplication and division - sql

I have been trying to create a query for my sqlite3 database that provides me with a count of all records at 10-minute intervals between a minimum and maximum time.
I found this answer on the internet, and it seems to work:
select (`unixtime` / 600000) * 600000 as timeslice,
       count(*) as mycount
from mytable
where `unixtime` >= 1413902772599
  and `unixtime` <= 1413972793000
group by timeslice;
The result I get is something like this:
timeslice mycount
------------- ----------
1413930000000 9
1413930600000 1013
1413931200000 265
1413932400000 410
1413933000000 643
This seems like sort of a hackish way to go about doing this query. It also doesn't include datapoints that have a zero count, which is an edge-case that I am going to have to fix outside of the database scope (unless there is an SQL solution for this).
Is there a better way to go about this? Are there edge cases for this if I proceed to continue using this query? Will this catastrophically fail under certain scenarios that I'm not considering?

There is no better way to round to multiples of 600000; SQLite has the round() function, but you would still need to convert to/from a value that can be rounded to some decimal fraction.
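To make the truncation concrete (a worked example on one of the timestamps above, not from the original post): 600000 ms is 10 minutes, and integer division discards the remainder, so the divide-then-multiply pair floors each timestamp to the start of its 10-minute bucket:
SELECT 1413930599999 / 600000;            -- 2356550 (remainder 599999 discarded)
SELECT (1413930599999 / 600000) * 600000; -- 1413930000000, the bucket start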
If you have SQLite 3.8.3 or later, you can use a recursive common table expression to generate the intervals, which also takes care of the zero-count intervals:
WITH RECURSIVE intervals(t) AS (
    VALUES(1413902400000)
    UNION ALL
    SELECT t + 600000
    FROM intervals
    WHERE t < 1413972000000
)
SELECT intervals.t,
       COUNT(MyTable.unixtime) -- count an outer-joined column, not *, so empty intervals report 0 rather than 1
FROM intervals
LEFT JOIN MyTable
       ON MyTable.unixtime BETWEEN intervals.t
                               AND intervals.t + 599999
GROUP BY 1;
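As a refinement (a sketch of my own, not part of the original answer): the anchor 1413902400000 is just the question's minimum timestamp floored to the 600000 ms grid, and both bounds can be derived from the data so the CTE always covers the full range:
WITH RECURSIVE
bounds(lo, hi) AS (
    -- floor MIN and MAX to the 600000 ms grid
    SELECT (MIN(unixtime) / 600000) * 600000,
           (MAX(unixtime) / 600000) * 600000
    FROM mytable
),
intervals(t) AS (
    SELECT lo FROM bounds
    UNION ALL
    SELECT t + 600000 FROM intervals
    WHERE t < (SELECT hi FROM bounds)
)
SELECT intervals.t,
       COUNT(mytable.unixtime) AS mycount
FROM intervals
LEFT JOIN mytable
       ON mytable.unixtime BETWEEN intervals.t AND intervals.t + 599999
GROUP BY 1;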

Related

Trouble with Syntax format for Datediff - SQL

I have a syntax formatting issue with the query below.
I am trying to get the difference between two time columns and then subtract 20 to get whatever the difference is minus 20. I also want to take the max value of either that or 0 so anything less than 0 will be 0.
select id, sum(max(0, (date_diff('minute', time_a, time_b)) - 20)) as mins
FROM tbl
What am I doing wrong in the query above that is erroring out?
Thanks!
sum(max()) is highly suspicious. Perhaps you intend:
select id, sum(greatest(0, date_diff('minute', time_a, time_b) - 20)) as mins
from tbl
group by id; -- required, since id appears alongside an aggregate
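Note that date_diff('minute', ...) and greatest() suggest a Presto/Trino-style engine; if your engine lacks greatest(), a CASE expression is a portable sketch of the same clamping logic:
select id,
       sum(case when date_diff('minute', time_a, time_b) - 20 > 0
                then date_diff('minute', time_a, time_b) - 20
                else 0
           end) as mins
from tbl
group by id;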

REGR_SLOPE in Teradata SQL Query Returning 0 Slope

I am a relative newbie with Teradata SQL and have run into this strange (I think strange) situation. I am trying to run a regression (REGR_SLOPE) on sensor data. I am gathering sensor readings for a single day; each day has 80 observations, which is confirmed by the COUNT in the outer SELECT. My query is:
SELECT
d.meter_id,
REGR_SLOPE(d.reading_measure, d.x_axis) AS slope,
COUNT(d.x_axis) AS xcount,
COUNT(d.reading_measure) AS read_count
FROM
(
SELECT
meter_id,
reading_measure,
row_number() OVER (ORDER BY Reading_Dttm) AS x_axis
FROM data_mart.v_meter_reading
WHERE Reading_Start_Dt = '2017-12-12'
AND Meter_Id IN (11932101, 11419827, 11385229, 11643466)
AND Channel_Num = 5
) d
GROUP BY 1
When I use the "IN" clause in the subquery to specify Meter_Id, I get slope values, but when I take it out (to run over all meters) all the slopes are 0 (zero). I would simply like to run a line through a day's worth of observations (80).
I'm using Teradata v15.0.
What am I missing / doing wrong?
I would bet a Pepperoni Pizza that it's the x_axis value.
Instead try ROW_NUMBER() OVER (PARTITION BY meter_id ORDER BY reading_dttm)
This will ensure that the x_axis starts again from 1 for each meter, and each reading will always be 1 away from the previous reading on the x_axis.
This makes me think you should probably just use reading_dttm as the x_axis value, rather than fabricating one with ROW_NUMBER(). That way readings with a 5 hour gap between them have a different slope than readings with a 10 day gap between them. You may need to convert the reading_dttm's data type, with a function like TO_UNIXTIME(reading_dttm), or something similar.
I'll message you my address for the Pizza Delivery. (Joking.)
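Putting that together, a sketch of the corrected query (the question's query with the IN list removed and the PARTITION BY added):
SELECT
    d.meter_id,
    REGR_SLOPE(d.reading_measure, d.x_axis) AS slope,
    COUNT(d.x_axis) AS xcount,
    COUNT(d.reading_measure) AS read_count
FROM
(
    SELECT
        meter_id,
        reading_measure,
        -- restart numbering per meter so every series runs x = 1, 2, ..., 80
        ROW_NUMBER() OVER (PARTITION BY meter_id ORDER BY Reading_Dttm) AS x_axis
    FROM data_mart.v_meter_reading
    WHERE Reading_Start_Dt = '2017-12-12'
      AND Channel_Num = 5
) d
GROUP BY 1;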
Additional to @MatBailie's answer.
You probably know that you should order by the timestamp instead of the ROW_NUMBER, but you can't, because Teradata doesn't allow timestamps in this place (strange).
There's no built-in TO_UNIXTIME function in Teradata, but you can use this instead:
REPLACE FUNCTION TimeStamp_to_UnixTime (ts TIMESTAMP(6))
RETURNS decimal(18,6)
LANGUAGE SQL
CONTAINS SQL
DETERMINISTIC
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN
(Cast(ts AS DATE) - DATE '1970-01-01') * 86400
+ (Extract(HOUR From ts) * 3600)
+ (Extract(MINUTE From ts) * 60)
+ (Extract(SECOND From ts));
If you're not allowed to create UDFs simply cut&paste the calculation.
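For example (a sketch substituting the question's reading_dttm column into the UDF body):
SELECT
    meter_id,
    reading_measure,
    (Cast(reading_dttm AS DATE) - DATE '1970-01-01') * 86400
    + (Extract(HOUR From reading_dttm) * 3600)
    + (Extract(MINUTE From reading_dttm) * 60)
    + (Extract(SECOND From reading_dttm)) AS x_axis
FROM data_mart.v_meter_reading
WHERE Reading_Start_Dt = '2017-12-12'
  AND Channel_Num = 5;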

SQLite query to get the closest datetime

I am trying to write an SQLite statement to get the closest datetime to a user input (from a WPF datepicker). I have a table IRquote(rateId, quoteDateAndTime, quoteValue).
For example, if the user enters 10/01/2000 and the database only has fixings stored for 08/01/2000, 07/01/2000 and 14/01/2000, it would return 08/01/2000, that being the closest date to 10/01/2000.
Of course, I'd like it to work not only with dates but also with time.
I tried with this query, but it returns the row with the furthest date, and not the closest one:
SELECT quoteValue FROM IRquote
WHERE rateId = '" + pRefIndexTicker + "'
ORDER BY abs(datetime(quoteDateAndTime) - datetime('" + DateTimeSQLite(pFixingDate) + "')) ASC
LIMIT 1;
Note that I have a function DateTimeSQLite to transform user input to the right format.
I don't get why this does not work.
How could I do it? Thanks for your help
To get the closest date, you will need to use the strftime('%s', datetime) SQLite function.
With this example/demo, you will get the closest date to your given date.
Note that the date 2015-06-25 10:00:00 is the input datetime that the user selected.
select t.ID, t.Price, t.PriceDate,
abs(strftime('%s','2015-06-25 10:00:00') - strftime('%s', t.PriceDate)) as 'ClosestDate'
from Test t
order by abs(strftime('%s','2015-06-25 10:00:00') - strftime('%s', PriceDate))
limit 1;
SQL explanation:
We use strftime('%s') - strftime('%s') to calculate the difference, in seconds, between the two dates (note: it has to be '%s', not '%S'). Since this can be either positive or negative, we also need the abs function to make it positive, so that the order by and the subsequent limit 1 work correctly.
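As an aside on why the question's ORDER BY picked the wrong row (my explanation, not from the original answer): datetime() returns a text value, and SQLite's numeric coercion of '2015-06-25 10:00:00' stops at the leading 2015, so subtracting two datetimes effectively compares years only:
SELECT datetime('2015-06-25 10:00:00') - datetime('2015-06-20 09:00:00'); -- 0 (both sides coerce to 2015)
SELECT strftime('%s','2015-06-25 10:00:00') - strftime('%s','2015-06-20 09:00:00'); -- 435600 seconds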
If the table is big, and there is an index on the datetime column, this will use the index to get the 2 closest rows (above and below the supplied value) and will be more efficient:
select *
from
( select *
from
( select t.ID, t.Price, t.PriceDate
from Test t
where t.PriceDate <= datetime('2015-06-23 10:00:00')
order by t.PriceDate desc
limit 1
) d
union all
select * from
( select t.ID, t.Price, t.PriceDate
from Test t
where t.PriceDate > datetime('2015-06-23 10:00:00')
order by t.PriceDate asc
limit 1
) a
) x
order by abs(julianday('2015-06-23 10:00:00') - julianday(PriceDate))
limit 1 ;
Tested in SQLfiddle.
Another useful solution is the BETWEEN operator, if you can determine upper and lower bounds for your time/date query. I encountered this solution just recently. This is what I've used for my application on a time column named t (changing the code for a date column and date function is not difficult):
select *
from myTable
where t BETWEEN '09:35:00' and '09:45:00'
order by ABS(strftime('%s',t) - strftime('%s','09:40:00')) asc
limit 1
Also, I must correct my comment on the above post. I ran a simple speed test of the 3 approaches proposed by @BerndLinde, @ypercubeᵀᴹ and me. I have around 500 tables with 150 rows in each and medium hardware in my PC. The results are:
Solution 1 (using strftime) takes around 12 seconds.
Adding an index on column t to Solution 1 improves speed by around 30%, to around 8 seconds. I saw no improvement from an index on time(t).
Solution 2 also gives around a 30% speed improvement over Solution 1 and takes around 8 seconds.
Finally, Solution 3 gives around a 50% improvement and takes around 5.5 seconds. Adding an index on column t improves that a little more, to around 4.8 seconds. An index on time(t) has no effect in this solution.
Note: I'm a simple programmer and this was a simple test in .NET code. A real performance test would have to consider more professional aspects, which I'm not aware of. There were also some computations in my code after querying and reading from the database. Also, as @ypercubeᵀᴹ states, this result may not hold for large amounts of data.

Hours and Average Hours Worked per Day, by Department

I'm trying to get an estimate of how many hours people worked during a set period of time. I want to show this by department and by what area they were working in. Right now I have this:
SELECT M.MemberDepartmentID,T.TaskName,
COUNT(DATEDIFF(HOUR, TT.StartTime, TT.EndTime)) 'Hours',
AVG(DATEDIFF(HOUR, TT.StartTime, TT.EndTime)) Average
FROM Member.TaskTracking TT
LEFT OUTER JOIN Member.Task T
ON TT.TaskID=T.TaskID
JOIN dbo.tblMember M
ON TT.MemberID=M.MemberID
WHERE M.FullTime=1
AND M.EmployeeSalary=1
AND (TT.StartTime >= '2013-10-01'
AND TT.EndTime < '2013-11-01')
GROUP BY M.MemberDepartmentID,T.TaskName
ORDER BY M.MemberDepartmentID,T.TaskName
I don't know how to confirm if it's correct, but some are definitely showing averages of zero even if there were hours worked. And some averages are way higher than the hours worked. For instance, here are some of my results:
MemberDepartmentID TaskName Hours Average
---------------------------------------------------
1 Packing 25 0
1 Picking 6 0
1 PreScanning 38 7
4 Picking 2 104
Suggestions?
First, it is important to note that DATEDIFF(HOUR) returns an integer, and it does not necessarily give a good reflection of how much time has actually passed. For example, these both yield 1:
SELECT DATEDIFF(HOUR, '03:59', '04:01'); -- 2 minutes (0.033333 hours)
SELECT DATEDIFF(HOUR, '03:01', '04:59'); -- 118 minutes (1.966666 hours)
And these both yield 0:
SELECT DATEDIFF(HOUR, '03:01', '03:59'); -- 58 minutes (0.966666 hours)
SELECT DATEDIFF(HOUR, '03:01', '03:02'); -- 1 minute (0.016666 hours)
Next, if you give SQL Server integers to divide, it's going to perform integer math. Meaning it will divide, but it will discard any remainder. This yields 0:
SELECT 3/4;
Even though the real answer is 0.75, and if it rounded, it would be 1. (Not that either of those results is particularly meaningful.) Now, extend that to averages.
DECLARE @d1 TABLE(a INT);
INSERT @d1 VALUES(3),(4);
SELECT AVG(a) FROM @d1;
This yields 3, not 3.5, which you would probably expect. For the same reasons as above.
Remembering that some of your tasks may have lasted up to 59 minutes, but would still yield an hour differential of 0, you could have, say, 4 tasks, three that lasted > 1 hour, and one that lasted < 1 hour. So your average calculation would essentially be:
SELECT (1+1+1+0)/4;
Which, as above, still yields 0.
If you want a meaningful average there, you should calculate the time spent more granularly than by hours. For example, you could perform the datediff in minutes:
SELECT DATEDIFF(MINUTE, '03:01', '04:59');
This yields 118. If you want to express that in hours, you could divide by 60.0 (the decimal is important) or multiply by 1.0:
SELECT DATEDIFF(MINUTE, '03:01', '04:59')/60.0;
SELECT 1.0*DATEDIFF(MINUTE, '03:01', '04:59')/60;
These both yield 1.966666. Much more meaningful to average such a result. So perhaps change your expression to:
Average = AVG(1.0*DATEDIFF(MINUTE, TT.StartTime, TT.EndTime)/60)
About the count, not sure what you're attempting to do there, but you may want to make similar adjustments to the calculation and probably consider using SUM. If you show some sample data and the results you expect, we can help more.
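If the intent of that column is total hours worked (an assumption on my part), the same minute-based trick applies with SUM:
Hours = SUM(1.0 * DATEDIFF(MINUTE, TT.StartTime, TT.EndTime) / 60)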
Also I recommend not escaping keyword aliases using 'single quotes' - some forms of this syntax are deprecated, and it makes your alias look like a string literal. First, try not to use keywords or otherwise invalid identifiers as aliases; but if you must, escape them with [square brackets].
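For example (a generic illustration):
-- instead of:  COUNT(...) 'Hours'
-- use a plain identifier, or [square brackets] if the alias must be a keyword or contain spaces:
SELECT COUNT(*) AS TaskCount,
       AVG(1.0 * DATEDIFF(MINUTE, StartTime, EndTime)) AS [Average Minutes]
FROM Member.TaskTracking;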

SQL: Difference between "BETWEEN" vs "current_date - number"

I am wondering which of the following is the best way to implement and why.
select * from table1 where request_time between '01/18/2012' and '02/17/2012'
and
select * from table1 where request_time > current_date - 30
I ran the two queries through some of my date tables in my database and using EXPLAIN ANALYZE I found these results:
explain analyze
select * from capone.dim_date where date between '01/18/2012' and '02/17/2012'
Total runtime: 22.716 ms
explain analyze
select * from capone.dim_date where date > current_date - 30
Total runtime: 65.044 ms
So it looks like the 1st option is more optimal. Of course this is biased towards my DBMS but these are still the results I got.
The table has dates ranging from 1900 to 2099 so it is rather large, and not just some dinky little table.
BETWEEN is inclusive on both ends, i.e. when you issue a query like id BETWEEN 2 AND 10, rows with the values 2 and 10 will also be fetched. If you want to exclude those endpoint values, use > and < instead.
Also, when an index is applied, say on the date column, > and < can sometimes make better use of the index than BETWEEN.
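To illustrate the inclusive/exclusive difference (a generic sketch):
-- inclusive: rows with id exactly 2 or 10 are returned
select * from table1 where id between 2 and 10;
-- exclusive: the endpoint rows are filtered out
select * from table1 where id > 2 and id < 10;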