SQL BigQuery - How to write a COUNTIF statement applied to an INTERVAL column

I have a trip_duration column in interval format. I want to remove all observations less than 90 seconds and count how many observations match this condition.
My current SQL query is
WITH
org_table AS (
SELECT
ended_at - started_at as trip_duration
FROM `cyclistic-328701.12_month_user_data_cyclistic.20*`
)
SELECT
COUNTIF(x < 1:30) AS false_start
FROM trip_duration AS x;
It returns Syntax error: Expected ")" but got ":" at [8:16]
I have also tried
SELECT
COUNTIF(x < "0-0 0 0:1:30") AS false_start
FROM trip_duration AS x
It returns Table name "trip_duration" missing dataset while no default dataset is set in the request.
I've read through other questions and have not been able to write a solution.
My first thought is to cast trip_duration from INTERVAL to TIME so that COUNTIF can reference a TIME column instead of an INTERVAL column.
~ Marcus

The example below shows how to handle intervals:
with trip_duration as (
select interval 120 second as x union all
select interval 10 second union all
select interval 2 minute union all
select interval 50 second
)
select
count(*) as all_starts,
countif(x < interval 90 second) as false_starts
from trip_duration

To filter out trips with a duration of less than 90 seconds:
SELECT
* # here is whatever field(s) you want to return
FROM
`cyclistic-328701.12_month_user_data_cyclistic.20*`
WHERE
TIMESTAMP_DIFF(ended_at, started_at, SECOND) >= 90
See the BigQuery documentation for details on the TIMESTAMP_DIFF function.
To count the number of occurrences:
SELECT
COUNTIF(TIMESTAMP_DIFF(ended_at, started_at,SECOND) < 90) AS false_start,
COUNTIF(TIMESTAMP_DIFF(ended_at, started_at,SECOND) >= 90) AS non_false_start
FROM
`cyclistic-328701.12_month_user_data_cyclistic.20*`
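As a cross-check of the counting logic outside BigQuery, here is a minimal Python sketch of the same two COUNTIF conditions. The function name and the sample trips are made up for illustration; the 90-second threshold is the one from the question.

```python
from datetime import datetime, timedelta

def count_false_starts(trips, threshold=timedelta(seconds=90)):
    """Count trips shorter than `threshold` and trips at or above it.

    `trips` is a list of (started_at, ended_at) datetime pairs.
    Mirrors COUNTIF(duration < 90 s) and COUNTIF(duration >= 90 s).
    """
    false_start = sum(1 for started_at, ended_at in trips
                      if ended_at - started_at < threshold)
    non_false_start = sum(1 for started_at, ended_at in trips
                          if ended_at - started_at >= threshold)
    return false_start, non_false_start

# Hypothetical sample data: durations of 120 s, 10 s, 90 s and 50 s.
sample_trips = [
    (datetime(2021, 1, 1, 8, 0, 0), datetime(2021, 1, 1, 8, 2, 0)),
    (datetime(2021, 1, 1, 9, 0, 0), datetime(2021, 1, 1, 9, 0, 10)),
    (datetime(2021, 1, 1, 10, 0, 0), datetime(2021, 1, 1, 10, 1, 30)),
    (datetime(2021, 1, 1, 11, 0, 0), datetime(2021, 1, 1, 11, 0, 50)),
]

print(count_false_starts(sample_trips))
```

A trip of exactly 90 seconds lands in the second count, which is why the filter and the counts should use the same boundary.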

Related

Correcting sql query to exclude sessions under 2 min

I am trying to write a query that removes any entries that are within 2 minutes of each other on the same day for each user_id.
For example, here is the table:
user_id  day  time
x        1    00:55:54
x        1    00:55:55
x        1    00:56:01
x        2    16:11:43
x        2    16:12:01
x        2    16:15:02
x        2    16:30:07
x        2    16:31:08
x        2    16:40:09
x        2    16:41:02
So if, within the same day, two entries are less than 2 minutes apart, I would like to exclude those 2 entries.
Note: day and time were obtained by applying day() and time() to a datetime column called timestamp.
The code I have is:
WITH frames AS (
SELECT
user_id, day(timestamp), time(timestamp) AS starttime, COALESCE(
LEAD(time(timestamp)) OVER(PARTITION BY user_id, day(timestamp)),
'23:59:59'
) AS final
FROM events
)
SELECT user_id, day(timestamp), starttime, final, TIMEDIFF(final, starttime) AS duration
FROM frames
WHERE TIMEDIFF(final, starttime) < 120;
But I get this error: Error Code: 1054. Unknown column 'timestamp' in 'field list'
I am guessing that you want rows that are more than 2 minutes from the previous row:
select e.*
from (select e.*,
lag(timestamp) over (partition by user_id, date(timestamp) order by timestamp) as prev_timestamp
from events e
) e
where prev_timestamp is null or
prev_timestamp < timestamp - interval '2 minute';
You do not specify the database you are using, so this uses reasonable database syntax that might need to be modified for your database.
Also note that in databases that support the function, day() typically returns the day of the month. You want a function that removes the time component, which is usually date() or cast( as date).
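To make the lag-based filter concrete, here is a small Python sketch of the same idea: keep a row only if there is no previous row in its group or the previous row is more than 2 minutes earlier. Grouping by user and calendar date is an assumption taken from the question's wording, and the function name and the date used for the sample rows are hypothetical.

```python
from datetime import datetime, timedelta

def drop_close_events(events, gap=timedelta(minutes=2)):
    """Keep events that are more than `gap` after the previous event
    of the same user on the same calendar date.

    `events` is a list of (user_id, timestamp) pairs.
    """
    kept = []
    previous = {}  # (user_id, date) -> timestamp of the previous row
    for user_id, ts in sorted(events):
        key = (user_id, ts.date())
        prev_ts = previous.get(key)
        # Same test as: prev_timestamp is null or prev_timestamp < timestamp - 2 minutes
        if prev_ts is None or ts - prev_ts > gap:
            kept.append((user_id, ts))
        previous[key] = ts  # lag() always looks at the immediately preceding row
    return kept

# Day-2 rows from the question's table, placed on an arbitrary date.
times = ["16:11:43", "16:12:01", "16:15:02", "16:30:07",
         "16:31:08", "16:40:09", "16:41:02"]
sample = [("x", datetime.strptime("2021-01-02 " + t, "%Y-%m-%d %H:%M:%S"))
          for t in times]

for user_id, ts in drop_close_events(sample):
    print(user_id, ts.time())
```

Note that this keeps the first row of each close pair and drops only the later one. Excluding both rows of a pair, as the question asks, would also need a comparison against the following row.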

Selecting timeranges based on insertion date of matched result

I have a messages(id, inserted_at) table
I want to select the N most recent messages whose inserted_at column is within, say, 2 minutes of the single most recent message.
Is this possible?
You could do that with a sub select in the where clause:
select *
from messages
where inserted_at >=
( select max(inserted_at) - interval '90 minute'
from messages
)
order by inserted_at desc
limit 2
... and just specify the interval of your choice, and the limit value.
Note that the two conditions (record limit N, date limit) are in competition, and you may get fewer records than N, or else get some messages excluded although they are within the date/time limit.
If you meant that the date/time condition was to be a minimum time difference, then turn around the where condition from >= to <=:
select *
from messages
where inserted_at <=
( select max(inserted_at) - interval '90 minute'
from messages
)
order by inserted_at desc
limit 2

PostgreSQL sum of intervals

In my database I have rows like:
date , value
16:13:00, 500
16:17:00, 700
16:20:00, 0
Now I want to do "special sum" over value from 16:00:00 to 17:00:00. So until 16:13 we assume that we have 0.
So special sum would look like (I'll omit seconds):
...
0 + -- (16:12)
500 + -- (16:13)
500 + -- (16:14)
500 + -- (16:15)
500 + -- (16:16)
700 + -- (16:17)
700 + -- (16:18)
700 + -- (16:19)
0 + -- (16:20)
...
So I have in database only changes of value and when this change occurs. And I want to sum over the whole hour. Result of this should be 4100.
What is the optimal way of doing that kind of sum in sql with PostgreSQL?
Best
You could at first select only the hour of your timestamp and then group by this hour:
SELECT
sum(s.value),
s.hour
FROM
(SELECT
value,
EXTRACT(HOUR FROM time) as hour
FROM la_table) as s
GROUP BY s.hour
This way you would just get values from 15:00:00 to 15:59:59 of course.
SQLFiddle for playing: http://sqlfiddle.com/#!1/d6ad1/1
If I've understood you correctly, you are looking for simply totals per hour?
SELECT EXTRACT(hour FROM "date") hr,
SUM(value) total
FROM yourtable
GROUP BY hr
ORDER BY hr;
If I understand correctly, you want to select every entry that overlaps a given time period. In this example, any row that overlaps the period from 10:00 to 11:00 is selected: it may start before or during the period and end during or after it.
select * from table
where table.start_time < end_of_period and table.end_time > start_of_period
select * from table
where (start_time < '2017-05-16 11:00:00') and (end_time > '2017-05-16 10:00:00')
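The "special sum" described in the question treats each stored value as holding until the next change and adds one sample per minute. As a way to check the expected result of 4100, here is a minimal Python sketch of that calculation. The function name and the calendar date are hypothetical; the times and values come from the question.

```python
from datetime import datetime, timedelta

def minute_weighted_sum(changes, start, end):
    """Sum one sample per whole minute in [start, end).

    `changes` is a list of (timestamp, value) pairs. A value holds
    until the next change; before the first change the value is 0.
    """
    changes = sorted(changes)
    total = 0
    current = 0
    i = 0
    t = start
    while t < end:
        # Apply every change that has happened up to this minute.
        while i < len(changes) and changes[i][0] <= t:
            current = changes[i][1]
            i += 1
        total += current
        t += timedelta(minutes=1)
    return total

day = datetime(2017, 5, 16)  # arbitrary date, for illustration only
changes = [
    (day.replace(hour=16, minute=13), 500),
    (day.replace(hour=16, minute=17), 700),
    (day.replace(hour=16, minute=20), 0),
]

print(minute_weighted_sum(changes,
                          day.replace(hour=16),
                          day.replace(hour=17)))
```

The four minutes at 500 and the three minutes at 700 give 2000 + 2100 = 4100. A plain GROUP BY on the hour adds each stored value only once, so it does not produce this figure.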

How to write SQL query for the following case.?

I have a Change Report table with two columns: ChangedTime and FileName.
Assume this table has over 1000 records.
I need to query all the changes based on the following factors:
i) Interval (e.g. 1 minute)
ii) Number of files
For example, suppose the interval is 1 minute and the number of files is 10.
If more than 10 files changed in any 1-minute interval, we need to get all the changed files in that interval.
Example:
i) Consider we have 15 changes in the interval 11:52 to 11:53
ii)And consider we have 20 changes in the interval 12:58 to 12:59
Now my expected results would be 35 records.
Thanks in advance.
You need to aggregate by the interval and then do the count. Assuming that an interval starting at time 0 is ok, the following should work:
declare @interval int = 1;
declare @limit int = 10;
select sum(cnt)
from (select count(*) as cnt
from t
group by DATEDIFF(minute, 0, ChangedTime)/@interval
) t
where cnt >= @limit;
If you have another time in mind for when intervals should start, then substitute that for 0.
EDIT:
For your particular query:
select sum(ChangedTime)
from (select count(*) as ChangedTime
from [MyDB].[dbo].[Log_Table.in_PC]
group by DATEDIFF(minute, 0, ChangedTime)/@interval
) t
where ChangedTime >= @limit;
You can't have a three part alias name on a subquery. t will do.
Something like this should work:
You count the number of records using the COUNT() function.
Then you limit the selection with the WHERE clause:
SELECT COUNT(FileName)
FROM "YourTable"
WHERE ChangedTime >= "StartInterval"
AND ChangedTime <= "EndInterval";
Another method that is useful in a where clause is BETWEEN : http://msdn.microsoft.com/en-us/library/ms187922.aspx.
You didn't state which SQL database you are using, so I assume it's MSSQL.
select count(*) from (select a.FileName,
b.ChangedTime startTime,
a.ChangedTime endTime,
DATEDIFF ( minute , b.ChangedTime , a.ChangedTime ) timeInterval
from yourtable a, yourtable b
where a.FileName = b.FileName
and a.ChangedTime > b.ChangedTime
and DATEDIFF ( minute , b.ChangedTime , a.ChangedTime ) = 1) temp
group by temp.FileName
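The bucketing approach from the first answer can be sketched in Python to show how it returns the records themselves rather than only a count. This is an illustration under assumptions: the function name and the sample data are hypothetical, buckets start at 1900-01-01 (the date that 0 represents in T-SQL), and a bucket qualifies when it holds more than `limit` changes, following the question's wording.

```python
from collections import defaultdict
from datetime import datetime

BASE = datetime(1900, 1, 1)  # the datetime that 0 stands for in T-SQL

def changes_in_busy_intervals(changes, interval_minutes=1, limit=10):
    """Return every (ChangedTime, FileName) pair that falls in an
    interval containing more than `limit` changes."""
    buckets = defaultdict(list)
    for changed_time, file_name in changes:
        # Whole minutes since BASE, like DATEDIFF(minute, 0, ChangedTime)
        minutes = int((changed_time - BASE).total_seconds() // 60)
        buckets[minutes // interval_minutes].append((changed_time, file_name))
    result = []
    for key in sorted(buckets):
        if len(buckets[key]) > limit:
            result.extend(buckets[key])
    return result

# Hypothetical data: 15 changes at 11:52, 20 at 12:58, 3 at 13:10.
sample = (
    [(datetime(2014, 1, 1, 11, 52, s), f"a{s}.txt") for s in range(15)]
    + [(datetime(2014, 1, 1, 12, 58, s), f"b{s}.txt") for s in range(20)]
    + [(datetime(2014, 1, 1, 13, 10, s), f"c{s}.txt") for s in range(3)]
)

print(len(changes_in_busy_intervals(sample)))
```

With the question's example of 15 and 20 changes in two separate minutes, this yields the expected 35 records, and the quiet minute is left out.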

Calculate closest working day in Postgres

I need to schedule some items in a postgres query based on a requested delivery date for an order. So for example, the order has a requested delivery on a Monday (20120319 for example), and the order needs to be prepared on the prior working day (20120316).
Thoughts on the most direct method? I'm open to adding a dates table. I'm thinking there's got to be a better way than a long set of case statements using:
SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
This gets you the previous business day:
SELECT
CASE (EXTRACT(ISODOW FROM current_date)::integer) % 7
WHEN 1 THEN current_date-3
WHEN 0 THEN current_date-2
ELSE current_date-1
END AS previous_business_day
To have the previous work day:
select max(s.a) as work_day
from (
select s.a::date
from generate_series('2012-01-02'::date, '2050-12-31', '1 day') s(a)
where extract(dow from s.a) between 1 and 5
except
select holiday_date
from holiday_table
) s
where s.a < '2012-03-19'
;
If you want the next work day just invert the query.
SELECT y.d AS prep_day
FROM (
SELECT generate_series(dday - 8, dday - 1, interval '1d')::date AS d
FROM (SELECT '2012-03-19'::date AS dday) x
) y
LEFT JOIN holiday h USING (d)
WHERE h.d IS NULL
AND extract(isodow from y.d) < 6
ORDER BY y.d DESC
LIMIT 1;
It should be faster to generate only as many days as necessary. I generate one week prior to the delivery. That should cover all possibilities.
isodow as extract parameter is more convenient than dow to test for workdays.
min() / max(), ORDER BY / LIMIT 1, that's a matter of taste with the few rows in my query.
To get several candidate days in descending order, not just the top pick, change the LIMIT 1.
I put the dday (delivery day) in a subquery so you only have to input it once. You can enter any date or timestamp literal. It is cast to date either way.
CREATE TABLE Holidays (Holiday, PrecedingBusinessDay) AS VALUES
('2012-12-25'::DATE, '2012-12-24'::DATE),
('2012-12-26'::DATE, '2012-12-24'::DATE);
SELECT Day, COALESCE(PrecedingBusinessDay, PrecedingMondayToFriday)
FROM
(SELECT Day, Day - CASE DATE_PART('DOW', Day)
WHEN 0 THEN 2
WHEN 1 THEN 3
ELSE 1
END AS PrecedingMondayToFriday
FROM TestDays) AS PrecedingMondaysToFridays
LEFT JOIN Holidays ON PrecedingMondayToFriday = Holiday;
You might want to rename some of the identifiers :-).
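For comparison with the SQL variants above, the same "step back until you hit a working day" logic can be sketched in Python. The function name is hypothetical, and the holiday set simply reuses the two dates from the last example.

```python
from datetime import date, timedelta

def previous_business_day(delivery_day, holidays=frozenset()):
    """Return the latest Monday-to-Friday date before `delivery_day`
    that is not listed in `holidays`."""
    day = delivery_day - timedelta(days=1)
    # isoweekday(): Monday is 1, Saturday is 6, Sunday is 7 (like ISODOW)
    while day.isoweekday() >= 6 or day in holidays:
        day -= timedelta(days=1)
    return day

holidays = {date(2012, 12, 25), date(2012, 12, 26)}

print(previous_business_day(date(2012, 3, 19)))
print(previous_business_day(date(2012, 12, 27), holidays))
```

For the delivery date from the question (Monday 2012-03-19) this gives Friday 2012-03-16, and for 2012-12-27 it skips both holidays and lands on 2012-12-24.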