Exasol: Extract Hours and Minutes from Timestamp - sql

I have an Exasol database with login values of datatype TIMESTAMP like:
2015-10-01 13:00:34.0
2015-11-02 13:10:10.0
2015-10-06 13:20:03.0
2016-02-01 14:15:34.0
2016-04-03 14:08:10.0
2016-07-01 11:05:07.0
2016-09-03 10:08:12.0
2016-11-15 09:03:30.0
and many, many more. I want to write a SQL query that gives me the logins from 09:00:00 to 09:15:00, the logins from 09:15:00 to 09:30:00, and so on in separate result sets (no matter what date it is). I already had success selecting on a 1-hour interval with:
...EXTRACT(HOUR FROM entryTime) BETWEEN 8 AND 8
That way I get the entries of my database (no matter what date it is) within one hour, but I need smaller intervals, like 09:00:00 - 09:15:00. Any ideas how to solve this in Exasol?

You can simply convert the time part of your timestamp to a string and do a BETWEEN, something like:
WHERE TO_CHAR(entryTime, 'HH24MI') BETWEEN '0900' AND '0915'
If you want to use EXTRACT and numeric values, I suggest this:
WHERE (EXTRACT(HOUR FROM entryTime) * 100) + EXTRACT(MINUTE FROM entryTime)
BETWEEN 900 AND 915
I'm not in front of my computer right now, but this (or something pretty similar) should work.
But I suspect that in both cases EXASOL will create an expression index for the first part of the WHERE clause. Because, I guess, you use EXASOL because you have a huge amount of data and want fast performance, my suggestion is to add a column to your table in which you store the time part of entryTime as a numeric value; that allows a proper index and gives you better performance.
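A minimal sketch of that extra-column idea (logins and entry_hhmm are made-up names, not from the original post; keeping the column filled on insert/update is up to you):
ALTER TABLE logins ADD COLUMN entry_hhmm DECIMAL(4,0);
UPDATE logins
   SET entry_hhmm = EXTRACT(HOUR FROM entryTime) * 100 + EXTRACT(MINUTE FROM entryTime);
SELECT *
  FROM logins
 WHERE entry_hhmm BETWEEN 900 AND 915;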

I found a workaround. The solution is to nest the SQL statements: in the first step you select the hours, and around that SELECT statement you wrap another one in which you specify the minutes.
SELECT * FROM
(SELECT * FROM MY_SCHEMA.EXA_LOCAL WHERE EXTRACT(HOUR FROM TIMESTMP) BETWEEN 9 and 9)
where EXTRACT(MINUTE FROM TIMESTMP) BETWEEN 0 and 15;
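The nesting isn't strictly required; the same filter should also work in a single WHERE clause (same table and column as above):
SELECT *
FROM MY_SCHEMA.EXA_LOCAL
WHERE EXTRACT(HOUR FROM TIMESTMP) = 9
  AND EXTRACT(MINUTE FROM TIMESTMP) BETWEEN 0 AND 15;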

Related

Select unique IDs and divide result into X minute intervals based on given timespan

I'm trying to knock the dust off my good old SQL skills, but I'm afraid I need a push in the right direction to turn them into something useful when it comes to BigQuery statements.
I'm currently working with a single table schema looking like this:
In the query I would like to be able to supply the following in my WHERE clause:
1. The date the results should stem from.
2. A time range - in the result example above this range would be from 20:00 to 21:00. (If 1. and 2. in this list are merged together, that's also fine.)
3. The eventId I would like to find records for.
4. Optionally, the interval frequency - whether it should be divided into e.g. 5, 10 or 15 minute intervals.
Also, I would like to count the unique userIds for each interval. If one user is present during the entire session, he/she should be taken into the count in every interval.
So think of it as the following:
How many unique users did we have every 5 minutes at event X, between 20:00 and 21:00 on day Y?
How should my query look if I want a result looking (something) like the following pseudo result:
time_interval number_of_unique_userIds
1 2022-03-16 20:00:00 10
2 2022-03-16 20:05:00 12
3 2022-03-16 20:10:00 15
4 2022-03-16 20:15:00 20
5 2022-03-16 20:20:00 30
6 ... etc.
If the time of the query is before the provided end time in the timespan, it should fill out the rest of the interval rows with 0 unique userIds.
In the following result we've executed the query earlier than the provided end time - let's say it's executed at 20:49:
time_interval number_of_unique_userIds
X 2022-03-16 20:50:00 0
X 2022-03-16 20:55:00 0
X 2022-03-16 21:00:00 0
Here's what I have so far, but it gives me several of the same interval records with what looks like each userId:
SELECT
TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(creationTime), 5*60)) time_interval,
COUNT(DISTINCT(userId)) number_of_unique_userIds
FROM `bigquery.table`
WHERE eventId = 'xyz'
AND creationTime > '2022-03-16 20:00:00' AND creationTime < '2022-03-16 21:00:00'
GROUP BY time_interval
ORDER BY time_interval DESC
This gives me somewhat what I expect - but the number_of_unique_userIds seems too low, so I'm a little worried that I'm not getting the unique userIds for each interval. What I'm thinking is that userIds that were counted in the first 5-minute interval are not counted in the next. So I'm not sure this query is sufficient for my needs. Also, it's not filling the blanks with 0 number_of_unique_userIds.
I hope you can help me out here.
Thanks!
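One way to get both the per-interval distinct counts and the zero-filled trailing intervals is to generate the 5-minute grid first and LEFT JOIN the counts onto it - a sketch only, assuming creationTime is a TIMESTAMP column and hard-coding the hour from the example:
WITH grid AS (
  SELECT interval_start
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY(
         TIMESTAMP '2022-03-16 20:00:00',
         TIMESTAMP '2022-03-16 21:00:00',
         INTERVAL 5 MINUTE)) AS interval_start
), counts AS (
  SELECT TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(creationTime), 5*60)) AS interval_start,
         COUNT(DISTINCT userId) AS number_of_unique_userIds
  FROM `bigquery.table`
  WHERE eventId = 'xyz'
    AND creationTime >= '2022-03-16 20:00:00'
    AND creationTime < '2022-03-16 21:00:00'
  GROUP BY interval_start
)
SELECT g.interval_start AS time_interval,
       IFNULL(c.number_of_unique_userIds, 0) AS number_of_unique_userIds
FROM grid g
LEFT JOIN counts c ON c.interval_start = g.interval_start
ORDER BY time_interval;
COUNT(DISTINCT userId) is evaluated separately inside each 5-minute group, so a user who is active across the whole hour is counted once in every interval in which they have at least one row; the LEFT JOIN plus IFNULL supplies the 0 rows for intervals with no data yet.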

BQ: Select latest date from multiple columns

Good day, all. I wrote a question relating to this earlier, but now I have encountered another problem.
I have to calculate the timestamp difference between the install_time and contributor_time columns. HOWEVER, I have three contributor_time columns, and I need to select the latest time from those columns first and then subtract it from install_time.
Sample Data
users  install_time  contributor_time_1  contributor_time_2  contributor_time_3
1      8:00          7:45                7:50                7:55
2      10:00         9:15                9:45                9:30
3      11:00         10:30               null                null
For example, in the table above I would need to select contributor_time_3 and subtract it from install_time for user 1. For user 2, I would do the same, but with contributor_time_2.
Sample Results
users  install_time  time_diff_min
1      8:00          5
2      10:00         15
3      11:00         30
The problem I am facing is that 1) the contributor_time columns are in string format and 2) some of them have 'null' string values (which means that I cannot cast them to a timestamp).
I created a query, but I am facing an error stating that I cannot subtract a string from a timestamp. So I added SAFE_CAST; however, the time_diff_min results only show when all three contributor_time columns cast to a timestamp. For example, from the sample table above, only the first two rows will pull.
The query I have so far is below:
SELECT
users,
install_time,
TIMESTAMP_DIFF(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), MINUTE) as ctct_min
FROM
(SELECT
users,
install_time,
safe_cast(contributor_time_1 as timestamp) as contributor_time_1,
safe_cast(contributor_time_2 as timestamp) as contributor_time_2,
safe_cast(contributor_time_3 as timestamp) as contributor_time_3,
FROM
(SELECT
users,
install_time,
case when contributor_time_1 = 'null' then '0' else contributor_time_1 end as contributor_time_1,
....
FROM datasource
Any help to point me in the right direction is appreciated! Thank you in advance!
Consider below
select users, install_time,
time_diff(
parse_time('%H:%M',install_time),
greatest(
parse_time('%H:%M',contributor_time_1),
parse_time('%H:%M',contributor_time_2),
parse_time('%H:%M',contributor_time_3)
),
minute) as time_diff_min
from `project.dataset.table`
If applied to the sample data in your question, the output is as expected.
The above can be refactored slightly into the below:
create temp function latest_time(arr any type) as ((
select parse_time('%H:%M',val) time
from unnest(arr) val
order by time desc
limit 1
));
select users, install_time,
time_diff(
parse_time('%H:%M',install_time),
latest_time([contributor_time_1, contributor_time_2, contributor_time_3]),
minute) as time_diff_min
from `project.dataset.table`
Less verbose and no redundant parsing - with the same result - so it's just a matter of preference.
You can use greatest():
select t.*,
time_diff(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), minute) as diff_min
from t;
Note: this assumes that the values are never NULL, which seems reasonable based on your sample data.
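If some of the contributor columns really do hold the literal string 'null', a defensive variant (a sketch, not from either answer, reusing the same column names) is to parse with SAFE.PARSE_TIME so unparseable values become NULL, and fall back to midnight so GREATEST still returns the latest valid time:
select users, install_time,
  time_diff(
    parse_time('%H:%M', install_time),
    greatest(
      coalesce(safe.parse_time('%H:%M', contributor_time_1), time '00:00:00'),
      coalesce(safe.parse_time('%H:%M', contributor_time_2), time '00:00:00'),
      coalesce(safe.parse_time('%H:%M', contributor_time_3), time '00:00:00')
    ),
    minute) as time_diff_min
from `project.dataset.table`
The midnight fallback only keeps GREATEST from returning NULL; it assumes at least one contributor time per row is a real HH:MM value, as in the sample data.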

HiveQL - Query Number of Entries over fixed unit of time

I have a table that is similar to the following:
LOGIN ID (STRING): TIME_STAMP (STRING HH:MM:SS)
BillyJoel 10:45:00
PianoMan 10:45:30
WeDidnt 10:45:45
StartTheFire 10:46:00
AlwaysBurning 10:46:30
Is there any possible way to get a query that gives me a column of the number of logins over a period of time? Something like this:
3 (number of logins from 10:45:00 - 10:45:59)
2 (number of logins from 10:46:00 - 10:46:59)
Note: If you can only do it with int timestamps, that's alright. My original table is all strings, so I thought I would represent that here. The stuff in parentheses doesn't need to be printed.
If you want it by minute, you can just lop off the seconds:
select substr(time_stamp, 1, 5) as hhmm, count(*)
from t
group by substr(time_stamp, 1, 5)
order by hhmm;
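If you later need buckets wider than a minute, a sketch for 15-minute groups should also work (assuming Hive's hour() and minute() parse the 'HH:MM:SS' strings, which they should for this layout):
select floor((hour(time_stamp) * 60 + minute(time_stamp)) / 15) * 15 as minute_of_day,
       count(*) as logins
from t
group by floor((hour(time_stamp) * 60 + minute(time_stamp)) / 15) * 15
order by minute_of_day;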

Group By n Minutes or Hours

I need to query a table and group records by a user-defined time period that could be any integer number of minutes or hours. One assumption we'll make is that any chosen time period starts at 12:00 AM (if that makes any sense). In other words, if the user chooses to group records by 15 minutes, we will not allow them to, say, begin grouping every 15 minutes starting at 12:07 AM. We'll automatically assume/use 12:00 AM as the starting point for grouping. Same for any other time period.
Do I need to create my own function for this? I'm not overly concerned about performance as I will be using other methods/limitations to try to keep performance issues at bay.
My table looks like this:
timeentry
--entryid (autonumber)
--begindatetime (datetime)
--enddatetime (datetime)
If I use a function I don't think this matters but I do plan to base my groupings on begindatetime and ignore enddatetime.
I'm using MS Access but I'd like my solution to be compatible with SQL Server and MySQL if possible. However, my primary focus for the moment is just MS Access.
Seems to me the Partition() function could be useful here.
Your code would create a SELECT statement based on the user's choices for date (I assumed you want to limit the query to begindatetime values for a single date), time units, and group interval.
This one would be for Jun 14, 2011 as date, minutes as time units, and 15 minutes as the interval.
SELECT
Partition(elapsed,0,1440,15) AS time_block,
q.id,
q.begindatetime
FROM
[SELECT
t.id,
t.begindatetime,
TimeValue(t.begindatetime) * 1440 AS elapsed
FROM tblHK1 AS t
WHERE
t.begindatetime>=#2011-06-14#
And t.begindatetime<#2011-06-15#
]. AS q
ORDER BY q.begindatetime;
Not sure how much you'll like this, though. Here's some sample output:
time_block id begindatetime
60: 74 1 6/14/2011 1:06:05 AM
555: 569 3 6/14/2011 9:15:00 AM
1395:1409 4 6/14/2011 11:15:00 PM
The time_block column isn't very user friendly.
I am not quite sure what you want, but here is one idea:
SELECT DateDiff("n",CDate("00:00"),[BeginDateTime])\15 AS No15s,
(DateDiff("n",CDate("00:00"),[BeginDateTime])\15)*15 AS NoMins,
Count(Table1.BeginDateTime) AS [Count]
FROM Table1
GROUP BY DateDiff("n",CDate("00:00"),[BeginDateTime])\15,
(DateDiff("n",CDate("00:00"),[BeginDateTime])\15)*15;

Restrict SQL results by time, MySQL

I have a table containing events which happen in my application like people logging in and people changing settings.
They have a date/time against the event in the following format:
2010-01-29 10:27:29
Is it possible to use SQL to select the events that have only happened in the last 5 mins?
So if the date/time was 2010-01-29 10:27:29 it would only select the events that happened between 10:27:29 and 10:22:29?
Cheers
Eef
SELECT foo FROM bar WHERE event_time > DATE_SUB(NOW(), INTERVAL 5 MINUTE)
(The interval unit keyword is the singular MINUTE, not MINUTES.)
WHERE my_timestamp > DATE_SUB(NOW(), INTERVAL 5 MINUTE)
You should provide table and column names to make it easy for us to answer your question.
You can write SQL as
SELECT *
FROM Table
WHERE DateTimeColumnName <= '2010/01/29 10:27:29'
AND DateTimeColumnName >= '2010/01/29 10:22:29'
or you can use BETWEEN
SELECT *
FROM Table
WHERE DateTimeColumnName BETWEEN '2010/01/29 10:22:29' AND '2010/01/29 10:27:29'
Now see if there are datetime functions in MySQL to do the date math, so you can pass a single timestamp, subtract 5 minutes from it, and use the result as the other bound in the BETWEEN clause.
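A sketch of that idea with DATE_SUB (the table and column names here are placeholders, since the question doesn't give them):
SELECT *
FROM events
WHERE event_time BETWEEN DATE_SUB(NOW(), INTERVAL 5 MINUTE) AND NOW();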