I want to make a table (Table A) in Hive that has three columns. This table has times starting from 5AM and ending at 2AM the next day. Each row is a 5 minute increment from the previous row.
The first two columns are this (and I don't know how to generate this).
start_time | end_time
5:00:00 | 5:05:00
5:05:01 | 5:10:00
23:55:01 | 00:00:00
1:55:01 | 02:00:00
Does anyone know how to do the above?
To shed some background:
Once I have Table A created, I want to use another table (Table B), which holds an epoch time for each record representing a customer visit, extract the necessary hour/minute/second information, and then provide a count of visitors for each time interval in a third column of Table A, say, "customer_count".
I think I know how to do the calculation for the "customer_count" column of Table A. What I need help with is making the first two columns of Table A.
You could do it the other way around:
Crop from table B the dates you are interested in
Group by 5 minute increments (calculated by (time-start_time) / 60 / 5 assuming the epoch is in seconds)
Then turn the increments back into dates and calculate the second end_time column
Something like this:
-- floor() is needed because / returns a double in Hive
select from_unixtime(<start time> + period*60*5) as start_time,
from_unixtime(<start time> + (period+1)*60*5) as end_time,
cnt from
(select floor((time-<start time>)/(60*5)) as period, count(*) as cnt from tableB
where time >= <start time> and time < <end time>
group by floor((time-<start time>)/(60*5)) ) t
Note that you won't receive times with zero count (no visits during a period)
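The bucketing arithmetic above can also be sketched outside Hive. The following Python is only a minimal illustration, not the Hive query itself: `bucket_counts` and the sample epochs are hypothetical, and it assumes the epochs are in seconds. Unlike the query, it emits every period in the window, so periods without visits appear with a count of 0.

```python
from collections import Counter

PERIOD_SECONDS = 5 * 60  # 5-minute buckets

def bucket_counts(epochs, start, end):
    """Count visits per 5-minute period in [start, end), zero-filled.

    Returns (period_start, period_end, count) tuples in epoch seconds.
    """
    counts = Counter(
        (t - start) // PERIOD_SECONDS  # integer division floors for t >= start
        for t in epochs
        if start <= t < end
    )
    n_periods = -(-(end - start) // PERIOD_SECONDS)  # ceiling division
    return [
        (start + p * PERIOD_SECONDS, start + (p + 1) * PERIOD_SECONDS, counts.get(p, 0))
        for p in range(n_periods)
    ]

# Hypothetical data: a 15-minute window starting at epoch 18000
visits = [18000, 18010, 18299, 18899, 19000]
rows = bucket_counts(visits, start=18000, end=18900)
for row in rows:
    print(row)
```

Generating the full list of periods first and then looking up the counts is what a join against a pre-built Table A would achieve in Hive.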
I have a DateTime column (timestamp 2022-05-22 10:10:12) with a batch of stamps per each day.
I need to filter the rows where the stamp is before 9 am. That part is no problem, and I'm using this code:
SELECT * FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision;
The output is the list of rows where the time in the timestamp is earlier than 9 am (50 rows out of 2000).
2022-05-22 08:10:12
2022-04-23 07:11:13
2022-06-15 08:45:26
Then I need to find all the days where at least one row has a stamp before 9 am - and here I'm stuck. Any idea how to select all the days where at least one stamp was before 9 am?
The code I'm trying:
SELECT * into temp1 FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision
ORDER BY date_part('day'::text, date_in);
Select * into temp2
from tickets, temp1
where date_part('day'::text, tickets.date_in) = date_part('day'::text, temp1.date_in);
Update temp2 set distorted_route = 1;
But this is giving me nothing.
Expected output is to get all the days where at least one route was done before 9am:
2022-05-22 08:10:12
2022-05-22 10:11:45
2022-05-22 12:14:59
2022-04-23 07:11:13
2022-04-23 11:42:25
2022-06-15 08:45:26
2022-06-15 15:10:57
Should I make an additional table (temp1) to feed it with the first query result (just the rows before 9am) and then make a cross table query to find in the source table public.tickets all the days which are equal to the public.temp1?
Select * from tickets, temp1
where TO_Char(tickets.date_in, 'YYYY-MM-DD')
= TO_Char(temp1.date_in, 'YYYY-MM-DD');
or like this:
FROM tickets
SELECT date_in FROM TO_Char(tickets.date_in, 'YYYY-MM-DD') = TO_Char(temp1.date_in, 'YYYY-MM-DD')
Ideally, I'd want to avoid using a temporary table and query just the one table.
After that, I need to create a view or update and add some remarks to the source table.
Assuming you mean:
How to select all rows where at least one row exists with a timestamp before 9 am of the same day?
SELECT *
FROM tickets t
WHERE EXISTS (
SELECT FROM tickets t1
WHERE t1.date_in::date = t.date_in::date -- same day
AND t1.date_in::time < time '9:00' -- time before 9:00
AND t1.id <> t.id -- exclude self
)
ORDER BY date_in; -- optional, but typically helpful
id being the PK column of your undisclosed table.
But be aware that ...
... typically you'll want to work with timestamptz instead of timestamp. See:
Ignoring time zones altogether in Rails and PostgreSQL
... this query is slow for big tables, because it cannot use a plain index on (date_in) (not "sargable"). Related:
How do you do date math that ignores the year?
There are various ways to optimize performance. The best way depends on undisclosed information for performance questions.
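To try the idea without a PostgreSQL instance, here is a small sketch using Python's built-in sqlite3 module. It is an approximation under stated assumptions: SQLite's date() and time() functions stand in for the ::date and ::time casts, the table layout (id, date_in) is assumed, and the 2022-07-01 rows are hypothetical additions to the question's sample data. This variant leaves out the "exclude self" condition, so a lone stamp before 9:00 still qualifies its own day, which matches the expected output in the question.

```python
import sqlite3

# Stamps from the question's expected output, plus one hypothetical day
# (2022-07-01) that has no stamp before 9:00 and should be filtered out.
stamps = [
    "2022-05-22 08:10:12", "2022-05-22 10:11:45", "2022-05-22 12:14:59",
    "2022-04-23 07:11:13", "2022-04-23 11:42:25",
    "2022-06-15 08:45:26", "2022-06-15 15:10:57",
    "2022-07-01 09:30:00", "2022-07-01 14:00:00",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, date_in TEXT)")
conn.executemany("INSERT INTO tickets (date_in) VALUES (?)", [(s,) for s in stamps])

# Keep every row whose day has at least one stamp before 9:00.
query = """
SELECT t.date_in
FROM   tickets t
WHERE  EXISTS (
   SELECT 1
   FROM   tickets t1
   WHERE  date(t1.date_in) = date(t.date_in)   -- same day
   AND    time(t1.date_in) < '09:00:00'        -- time before 9:00
)
ORDER  BY t.date_in
"""
result = [row[0] for row in conn.execute(query)]
for stamp in result:
    print(stamp)
```

No temporary table is needed, since the correlated subquery checks each row's day against the same table.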
I'm trying to knock some dust off my good old SQL skills, but I'm afraid I need a push in the right direction to turn those dusty skills into something useful when it comes to BigQuery statements.
I'm currently working with a single table schema looking like this:
In the query I would like to be able to supply the following in my where clause:
1. The date the results should come from.
2. A time range. In the result example below, this range would be from 20:00 to 21:00. If 1. and 2. in this list should be merged together, that's also fine.
3. The eventId I would like to find records for.
4. Optionally, the interval frequency: whether the range should be divided into e.g. 5, 10 or 15 minute intervals.
Also I would like to count the unique userIds for each interval. If one user is present during the entire session he/she should be taken into the count in every interval.
So think of it as the following:
How many unique users did we have every 5 minutes at X event, between 20:00 and 21:00 at Y day?
How should my query look if I want a result looking (something) like the following pseudo result:
time_interval number_of_unique_userIds
1 2022-03-16 20:00:00 10
2 2022-03-16 20:05:00 12
3 2022-03-16 20:10:00 15
4 2022-03-16 20:15:00 20
5 2022-03-16 20:20:00 30
6 ... etc.
If the time of the query is before the provided end time in the timespan, it should fill out the rest of the interval rows with 0 unique userIds.
In the following result we've executed mentioned query earlier than the provided end date - let's say that it's executed at 20:49:
time_interval number_of_unique_userIds
X 2022-03-16 20:50:00 0
X 2022-03-16 20:55:00 0
X 2022-03-16 21:00:00 0
Here's what I have so far, but it gives me several of the same interval records with what looks like each userId:
SELECT
TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(creationTime), 5*60)) time_interval,
COUNT(DISTINCT(userId)) number_of_unique_userIds
FROM `bigquery.table`
WHERE eventId = 'xyz'
AND creationTime > '2022-03-16 20:00:00' AND creationTime < '2022-03-16 21:00:00'
GROUP BY time_interval
ORDER BY time_interval DESC
This gives me roughly what I expect, but number_of_unique_userIds seems too low, so I'm a little worried that I'm not getting the unique userIds for each interval. My suspicion is that userIds counted in the first 5 minute interval are not counted again in the next one. So I'm not sure this query is sufficient for my needs. It also doesn't fill the empty intervals with a number_of_unique_userIds of 0.
I hope you can help me out here.
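For reference, the per-interval logic described above can be sketched in Python. This is only an illustration of the intended bucketing, not a BigQuery query: `unique_users_per_interval` and the sample rows are hypothetical, and it assumes one row per (userId, creationTime) record for a single eventId. A user is counted in every interval in which they have at least one row, and intervals without rows are emitted with 0.

```python
from datetime import datetime, timedelta

def unique_users_per_interval(rows, start, end, minutes=5):
    """Return [(interval_start, unique_user_count)] covering [start, end)."""
    step = timedelta(minutes=minutes)
    # Pre-create every interval so empty ones still appear with a count of 0
    buckets = {}
    t = start
    while t < end:
        buckets[t] = set()
        t += step
    for user_id, created in rows:
        if start <= created < end:
            # Floor the timestamp to the start of its interval
            offset = (created - start) // step
            buckets[start + offset * step].add(user_id)
    return [(t, len(users)) for t, users in buckets.items()]

# Hypothetical records for one event
day = datetime(2022, 3, 16)
rows = [
    ("u1", day.replace(hour=20, minute=1)),
    ("u1", day.replace(hour=20, minute=3, second=30)),
    ("u2", day.replace(hour=20, minute=4, second=59)),
    ("u1", day.replace(hour=20, minute=6)),
    ("u3", day.replace(hour=20, minute=16)),
]
result = unique_users_per_interval(
    rows, start=day.replace(hour=20), end=day.replace(hour=20, minute=20)
)
for interval_start, count in result:
    print(interval_start, count)
```

Here u1 is counted once in the 20:00 interval despite having two rows there, and again in the 20:05 interval. In SQL, the zero-filled rows would typically come from joining the counts against a generated list of interval start times.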
I have an SQL Table with following structure
I want a T-SQL way to find out how long the user was logged into the system, i.e. the time between Login Time and Logoff Time, for each session in a given day.
Day (Date)|UserTime(In Hours) (Logoff Time - LogIn Time)
--------- | -------
Jun 12 | 2
Jun 12 | 3
Jun 13 | 5
I tried using two temporary tables and row numbers but could not get it to work, since the comparison is on time, i.e. finding the next Logout event whose timestamp is greater than the current row's Login event.
You need to group the records. I would suggest counting logins or logoffs. Here is one approach to get the time for each "session":
select min(case when auditevent = 'login' then timestamp end) as login_time,
max(timestamp) as logoff_time
from (select t.*,
sum(case when auditevent = 'logoff' then 1 else 0 end) over (order by timestamp desc) as grp
from t
) t
group by grp;
You then have to do whatever you want to get the numbers per day. It is unclear what those counts are.
The subquery does a reverse count. It counts the number of "logoff" records that come on or after each record. For records in the same "session", this count is the same, and suitable for grouping.
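The reverse count can be traced by hand in a few lines of Python. This is a sketch under assumptions: the rows are hypothetical (timestamp, auditevent) pairs, and every session is assumed to end with a "logoff" row, as the query also assumes.

```python
from datetime import datetime

# Hypothetical audit rows: (timestamp, auditevent)
events = [
    (datetime(2017, 6, 12, 9, 0), "login"),
    (datetime(2017, 6, 12, 10, 0), "activity"),
    (datetime(2017, 6, 12, 11, 0), "logoff"),
    (datetime(2017, 6, 12, 13, 0), "login"),
    (datetime(2017, 6, 12, 16, 0), "logoff"),
    (datetime(2017, 6, 13, 8, 0), "login"),
    (datetime(2017, 6, 13, 13, 0), "logoff"),
]

def sessions(events):
    """Group rows into sessions via a reverse running count of logoffs."""
    groups = {}
    logoffs_seen = 0
    # Walk from newest to oldest, like ORDER BY timestamp DESC in the window
    for ts, event in sorted(events, reverse=True):
        if event == "logoff":
            logoffs_seen += 1
        groups.setdefault(logoffs_seen, []).append((ts, event))
    result = []
    for rows in groups.values():
        logins = [ts for ts, ev in rows if ev == "login"]
        login_time = min(logins) if logins else None
        logoff_time = max(ts for ts, _ in rows)
        result.append((login_time, logoff_time))
    return sorted(result, key=lambda pair: pair[1])

# Hours per session, in chronological order
hours = [
    (logoff - login).total_seconds() / 3600
    for login, logoff in sessions(events)
]
print(hours)
```

Every row between a login and its logoff sees the same number of later logoffs, which is why that count works as a grouping key.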
I have a table that looks like this:
**ActivityNumber -- TimeStamp -- PreviousActivityNumber -- Team**
1234-4 -- 01/01/2017 14:12 -- 1234-3 -- Team A
There are 400,000 rows.
The ActivityNumber is a unique ticket number with the activity count attached. There are 4 teams.
Each activitynumber is in the table.
I need to calculate the average time taken between updates for each team, for each month (to see how each team is improving over time).
I produced a query which counts the number of activities per team per month - so I'm part way there.
I'm unable to find the timestamp for the previousActivityNumber so I can subtract it from the current Activity number. If I could get this, I could run an average on it.
select a1.Team,
a2.Timestamp as PrevTime,
datediff('n', a2.Timestamp, a1.Timestamp) as WorkMinutes
from MyTable a1
left join MyTable a2
on ((a1.Team = a2.Team)
and (a1.PreviousActivityNumber = a2.ActivityNumber))
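The self-join amounts to looking up each row's previous activity by its number. A minimal Python sketch of that lookup, followed by the per-team, per-month average the question asks for, might look like this. The rows are hypothetical, and unlike the join above the lookup ignores which team handled the previous activity: each gap is attributed to the team and month of the current activity.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (ActivityNumber, TimeStamp, PreviousActivityNumber, Team)
rows = [
    ("1234-1", datetime(2017, 1, 1, 13, 0), None, "Team A"),
    ("1234-2", datetime(2017, 1, 1, 13, 30), "1234-1", "Team A"),
    ("1234-3", datetime(2017, 1, 1, 14, 0), "1234-2", "Team A"),
    ("1234-4", datetime(2017, 1, 1, 14, 12), "1234-3", "Team A"),
    ("5678-1", datetime(2017, 2, 3, 9, 0), None, "Team B"),
    ("5678-2", datetime(2017, 2, 3, 10, 0), "5678-1", "Team B"),
]

# Index timestamps by ActivityNumber (the lookup side of the self-join)
stamp_by_activity = {number: ts for number, ts, _, _ in rows}

minutes = defaultdict(list)
for number, ts, prev, team in rows:
    prev_ts = stamp_by_activity.get(prev)
    if prev_ts is None:
        continue  # first activity of a ticket has no predecessor
    key = (team, ts.year, ts.month)
    minutes[key].append((ts - prev_ts).total_seconds() / 60)

# Average minutes between updates per (team, year, month)
averages = {key: sum(vals) / len(vals) for key, vals in minutes.items()}
for key, avg in sorted(averages.items()):
    print(key, avg)
```

In SQL the same result would come from grouping the joined rows by team and month and averaging WorkMinutes.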
So this is somewhat of a common question on here, but I haven't found an answer that really suits my specific needs. I have 2 tables. One has a list of ProjectClosedDates. The other is a calendar table that runs through about 2025, with one column flagging whether the row date is a weekend day and another flagging whether the date is a holiday.
My end goal is to find out based on the ProjectClosedDate, what date is 5 business days post that date. My idea was that I was going to use the Calendar table and join it to itself so I could then insert a column into the calendar table that was 5 Business days away from the row-date. Then I was going to join the Project table to that table based on ProjectClosedDate = RowDate.
If I was just going to check the actual business-date table for one record, I could use this:
SELECT actual_date from
(
SELECT actual_date, ROW_NUMBER() OVER(ORDER BY actual_date) AS Row
FROM DateTable
WHERE is_holiday= 0 and actual_date > '2013-12-01'
) X
WHERE Row = 65
from here:
sql working days holidays
However, this is just one date and I need a column of dates based off of each row. Any thoughts of what the best way to do this would be? I'm using SQL-Server Management Studio.
Completely untested and not thought through:
If the concept of "business days" is common and important in your system, you could add a column "Business Day Sequence" to your table. The column would be a simple unique sequence, incremented by one for every business day and null for every day not counting as a business day.
The data would look something like this:
actual_date bday_seq
========== ========
2014-03-03 1
2014-03-04 2
2014-03-05 3
2014-03-06 4
2014-03-07 5
2014-03-10 6
Now it's a simple task to find the N:th business day from any date.
You simply do a self join with the calendar table, adding the offset in the join condition.
select a.actual_date
,b.actual_date as nth_business_day
from DateTable a
join DateTable b on(
b.bday_seq = a.bday_seq + 5
);
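The sequence-column idea can be checked quickly in Python. This is a sketch only: `build_calendar` and `nth_business_day` are hypothetical helpers, weekends are assumed to be Saturday and Sunday, and holidays are passed in as an optional set.

```python
from datetime import date, timedelta

def build_calendar(first, last, holidays=frozenset()):
    """Return {actual_date: bday_seq}, with None for non-business days."""
    calendar, seq, day = {}, 0, first
    while day <= last:
        if day.weekday() < 5 and day not in holidays:  # Mon-Fri, not a holiday
            seq += 1
            calendar[day] = seq
        else:
            calendar[day] = None
        day += timedelta(days=1)
    return calendar

def nth_business_day(calendar, start, n):
    """Mimic the self-join: find the date whose bday_seq is start's + n."""
    seq = calendar.get(start)
    if seq is None:
        return None  # start is not a business day, so the join finds no match
    by_seq = {s: d for d, s in calendar.items() if s is not None}
    return by_seq.get(seq + n)

calendar = build_calendar(date(2014, 3, 3), date(2014, 3, 14))
print(nth_business_day(calendar, date(2014, 3, 3), 5))  # 2014-03-10
```

One consequence of the null sequence values is that a start date falling on a weekend or holiday matches nothing in the join, so such dates would need to be mapped to a neighbouring business day first.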