BQ: Select latest date from multiple columns - sql

Good day, all. I wrote a question relating to this earlier, but now I have encountered another problem.
I have to calculate the timestamp difference between the install_time and contributor_time columns. HOWEVER, I have three contributor_time columns, and I need to select the latest time from those columns first and then subtract it from install_time.
Sample Data
users | install_time | contributor_time_1 | contributor_time_2 | contributor_time_3
------|--------------|--------------------|--------------------|-------------------
1     | 8:00         | 7:45               | 7:50               | 7:55
2     | 10:00        | 9:15               | 9:45               | 9:30
3     | 11:00        | 10:30              | null               | null
For example, in the table above I would need to select contributor_time_3 and subtract it from install_time for user 1. For user 2, I would do the same, but with contributor_time_2.
Sample Results
users | install_time | time_diff_min
------|--------------|--------------
1     | 8:00         | 5
2     | 10:00        | 15
3     | 11:00        | 30
The problems I am facing are that 1) the contributor_time columns are in string format and 2) some of them have 'null' string values (which means I cannot cast them to a timestamp).
I created a query, but I am facing an error stating that I cannot subtract a string from a timestamp. So I added SAFE_CAST; however, the time_diff_min results only show up when all three contributor_time columns cast to a timestamp. For example, in the sample table above, only the first two rows will pull.
The query I have so far is below:
SELECT
  users,
  install_time,
  TIMESTAMP_DIFF(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), MINUTE) as ctct_min
FROM
  (SELECT
    users,
    install_time,
    safe_cast(contributor_time_1 as timestamp) as contributor_time_1,
    safe_cast(contributor_time_2 as timestamp) as contributor_time_2,
    safe_cast(contributor_time_3 as timestamp) as contributor_time_3
  FROM
    (SELECT
      users,
      install_time,
      case when contributor_time_1 = 'null' then '0' else contributor_time_1 end as contributor_time_1,
      ....
    FROM datasource
Any help to point me in the right direction is appreciated! Thank you in advance!

Consider below
select users, install_time,
  time_diff(
    parse_time('%H:%M', install_time),
    greatest(
      parse_time('%H:%M', contributor_time_1),
      parse_time('%H:%M', contributor_time_2),
      parse_time('%H:%M', contributor_time_3)
    ),
    minute) as time_diff_min
from `project.dataset.table`
If applied to the sample data in your question, the output matches the Sample Results above.
Above can be refactored slightly into below
create temp function latest_time(arr any type) as ((
  select parse_time('%H:%M', val) time
  from unnest(arr) val
  order by time desc
  limit 1
));
select users, install_time,
  time_diff(
    parse_time('%H:%M', install_time),
    latest_time([contributor_time_1, contributor_time_2, contributor_time_3]),
    minute) as time_diff_min
from `project.dataset.table`
Less verbose and with no redundant parsing, with the same result, so it is just a matter of preference.
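If the contributor columns can also hold the literal string 'null' (as in row 3 of your sample), one hedged variant is to parse inside the function with SAFE.PARSE_TIME, so unparseable values become NULL and drop out of the ordering instead of raising an error. This is a sketch along the lines of the answer above, not tested against your data:
create temp function latest_time(arr any type) as ((
  select safe.parse_time('%H:%M', val) time   -- NULL for 'null' or otherwise unparseable strings
  from unnest(arr) val
  where safe.parse_time('%H:%M', val) is not null
  order by time desc
  limit 1
));
select users, install_time,
  time_diff(
    parse_time('%H:%M', install_time),
    latest_time([contributor_time_1, contributor_time_2, contributor_time_3]),
    minute) as time_diff_min
from `project.dataset.table`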

You can use greatest():
select t.*,
time_diff(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), minute) as diff_min
from t;
Note: this assumes that the values are never NULL, which seems reasonable based on your sample data.

Related

How to make a group query to select multiple rows?

I have a DateTime column (timestamp 2022-05-22 10:10:12) with a batch of stamps per day.
I need to filter the rows where the stamp is before 9 am (no problem here), and I'm using this code:
SELECT * FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision;
The output is the list of rows where the time in the timestamp is before 9 am (50 rows out of 2000).
date_in
2022-05-22 08:10:12
2022-04-23 07:11:13
2022-06-15 08:45:26
Then I need to find all the days where at least one row has a stamp before 9 am - and here I'm stuck. Any idea how to select all the days where at least one stamp was before 9 am?
The code I'm trying:
SELECT * into temp1 FROM tickets
WHERE date_part('hour'::text, tickets.date_in) < 9::double precision
ORDER BY date_part('day'::text, date_in);
Select * into temp2
from tickets, temp1
where date_part('day'::text, tickets.date_in) = date_part('day'::text, temp1.date_in);
Update temp2 set distorted_route = 1;
But this is giving me nothing.
Expected output is to get all the days where at least one route was done before 9am:
date_in
2022-05-22 08:10:12
2022-05-22 10:11:45
2022-05-22 12:14:59
2022-04-23 07:11:13
2022-04-23 11:42:25
2022-06-15 08:45:26
2022-06-15 15:10:57
Should I make an additional table (temp1), feed it with the first query's result (just the rows before 9 am), and then run a cross-table query to find, in the source table public.tickets, all the days that match those in public.temp1?
Select * from tickets, temp1
where TO_Char(tickets.date_in, 'YYYY-MM-DD')
= TO_Char(temp1.date_in, 'YYYY-MM-DD');
or like this:
SELECT *
FROM tickets
WHERE EXISTS (
SELECT date_in FROM TO_Char(tickets.date_in, 'YYYY-MM-DD') = TO_Char(temp1.date_in, 'YYYY-MM-DD')
);
Ideally, I'd like to avoid using a temporary table and run the request against just the one table.
After that, I need to create a view, or update the source table and add some remarks to it.
Assuming you mean:
How to select all rows where at least one row exists with a timestamp before 9 am of the same day?
SELECT *
FROM   tickets t
WHERE  EXISTS (
   SELECT FROM tickets t1
   WHERE  t1.date_in::date = t.date_in::date  -- same day
   AND    t1.date_in::time < time '9:00'      -- time before 9:00
   AND    t1.id <> t.id                       -- exclude self
   )
ORDER  BY date_in; -- optional, but typically helpful
id being the PK column of your undisclosed table.
But be aware that ...
... typically you'll want to work with timestamptz instead of timestamp. See:
Ignoring time zones altogether in Rails and PostgreSQL
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timestamp_.28without_time_zone.29
... this query is slow for big tables, because it cannot use a plain index on (date_in) (not "sargable"). Related:
How do you do date math that ignores the year?
There are various ways to optimize performance. The best way depends on information that is not disclosed in the question.
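One possible direction (a sketch, not tuned against your actual table): derive the qualifying days once, then join back to tickets with a plain range condition, which a btree index on (date_in) can use for the outer lookup. Unlike the EXISTS version above, this also returns the triggering pre-9:00 row itself:
SELECT t.*
FROM   tickets t
JOIN  (
   SELECT date_in::date AS day
   FROM   tickets
   WHERE  date_in::time < time '09:00'
   GROUP  BY 1
   ) d ON t.date_in >= d.day
      AND t.date_in <  d.day + 1
ORDER  BY t.date_in;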

Calculate total working hours of an employee based on swipe in / swipe out using Oracle SQL

I was recently given a task to calculate an employee's total office hours based on his card swipe in/swipe out. I have the following data:
id  | gate_1 | gate_2 | gate_3 | gate_4
----|--------|--------|--------|-------
100 | null   | null   | null   | 9:00
100 | null   | 13:30  | null   | null
100 | null   | null   | 16:00  | null
100 | null   | null   | 18:00  | null
Here, employee 100 comes in via gate_4 at 9:00, takes a break at 13:30 and goes out using gate_2. Then he comes back at 16:00 using gate_3 and leaves the office at 18:00 using gate_3. So, how do I calculate the total in-office time using this data?
Thanks in advance.
As has been pointed out, your data model is denormalized and does not even satisfy first normal form. The first step is to correct that (doing so in a query). Then, there is no indication of whether a swipe is in or out, so it must be assumed that the first swipe is always in and that ins and outs always alternate properly. Finally, there is no indication of multiple days being covered, so the assumption is just one period. That is a lot of assumptions.
Since the Oracle DATE data type contains the time as well as the date, and summing differences is much easier with dates than with timestamps, I convert the timestamps to dates in the first step of normalizing the data. Given all this we arrive at the following (see Demo):
with normal (emp_id, inout_tm) as
     ( select emp_id, cast(gate1 as date)
         from emp_gate_time
        where gate1 is not null
       union all
       select emp_id, cast(gate2 as date)
         from emp_gate_time
        where gate2 is not null
       union all
       select emp_id, cast(gate3 as date)
         from emp_gate_time
        where gate3 is not null
       union all
       select emp_id, cast(gate4 as date)
         from emp_gate_time
        where gate4 is not null
     )
select emp_id, round(24.0*(sum(hours)),1) hours_in_office
  from ( select emp_id, (time_out - time_in) hours
           from ( select emp_id, inout_tm time_in, rn
                       , lead(inout_tm) over(partition by emp_id order by inout_tm) time_out
                    from ( select n.*
                                , row_number() over(partition by emp_id order by inout_tm) rn
                             from normal n
                         )
                )
          where mod(rn,2) = 1
       )
 group by emp_id;
Items of interest:
Subquery factoring (CTE)
Date arithmetic - difference between dates in hours
Oracle analytic functions - ROW_NUMBER, LEAD
Your database schema has a denormalized structure. Having fields such as gate_1, gate_2, etc. is the wrong way. The better way is the following: you should have a reference table of gates, for example like this:
id|gate_name
--|---------
And your table with the employee data would look like this:
id_employee|id_gate|time
Then you can sort the data in this table and compute the period of time between two consecutive rows.
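As a rough sketch of that idea (the table and column names here are assumptions, and the time column is an Oracle DATE so that subtraction yields days): pair consecutive swipes per employee with LEAD and sum the in/out gaps, as in the first answer above.
create table gates (
  id        number primary key,
  gate_name varchar2(20)
);
create table swipes (
  id_employee number,
  id_gate     number references gates(id),
  swipe_time  date
);
select id_employee,
       round(24 * sum(time_out - time_in), 1) as hours_in_office
  from ( select id_employee,
                swipe_time as time_in,
                lead(swipe_time) over (partition by id_employee order by swipe_time) as time_out,
                row_number()     over (partition by id_employee order by swipe_time) as rn
           from swipes
       )
 where mod(rn, 2) = 1
 group by id_employee;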

SELECT statement optimization

I'm not an expert in SQL queries, but not a complete newbie either.
I'm exporting data from an MS-SQL database to an Excel file using a SQL query.
I'm exporting many columns, and two of these columns contain a date and an hour; these are the columns I use for the WHERE clause.
In detail, I have about 200 rows for each day, each with a different hour, over many days. I need to extract the first value after 15:00 of each day, for multiple days.
Since the hours are different for each day, I can't specify something like
SELECT a,b,hour,day FROM table WHERE hour='15:01'
because sometimes the value is at 15:01, sometimes 15:03, and so on (I'm looking for the closest value after 15:00). To work around this I used:
SELECT TOP 1 a,b,hour,day FROM table WHERE hour > "15:00"
This way I can take the first value after 15:00 for one day. The problem is that I need this for more days, for a user-specified interval of days. At the moment I handle this with a UNION ALL statement, like this:
SELECT TOP 1 a,b,hour,day FROM table WHERE data="first_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="second_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="third_day" AND hour > "15:00"
...and so on for all the days (I build the SQL string with a loop over each day in the specified interval).
Until now this has worked, but now I need to expand the interval of days (currently a week at most, so 5 days) to up to 60 days. I don't want to build a huge query string, but I can't think of an alternative way to write the SQL.
Any help appreciated.
Ettore
A typical solution for this uses row_number():
SELECT a, b, hour, day
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) as seqnum
      FROM table t
      WHERE hour > '15:00'
     ) t
WHERE seqnum = 1;
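To restrict this to the user-specified range of days (up to 60 in your case), a single date filter in the derived table should be enough; this is a sketch where @first_day and @last_day are assumed parameter names and table/day/hour follow the placeholder names used above:
SELECT a, b, hour, day
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) as seqnum
      FROM table t
      WHERE hour > '15:00'
        AND day BETWEEN @first_day AND @last_day
     ) t
WHERE seqnum = 1;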

Multiple aggregate sums from different conditions in one sql query

Whereas I believe this is a fairly general SQL question, I am working in PostgreSQL 9.4 without an option to use other database software, and thus request that any answer be compatible with its capabilities.
I need to be able to return multiple aggregate totals from one query, such that each sum is in a new row and each of the groupings is determined by a unique span of time, e.g. WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'. The number of records that satisfy the WHERE clause is unknown and may be zero, in which case ideally the result is "0". This is what I have worked out so far:
(
SELECT SUM(minutes) AS min
FROM downtime
WHERE time_stamp BETWEEN '2016-02-07' AND '2016-02-14'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-14' AND '2016-02-21'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-02-28' AND '2016-03-06'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-06' AND '2016-03-13'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-13' AND '2016-03-20'
)
UNION ALL
(
SELECT SUM(minutes)
FROM downtime
WHERE time_stamp BETWEEN '2016-03-20' AND '2016-03-27'
)
Result:
min
---+-----
1 | 119
2 | 4
3 | 30
4 |
5 | 62
6 | 350
That query gets me almost the exact result that I want; certainly good enough in that I can do exactly what I need with the results. Time spans with no records are blank but that was predictable, and whereas I would prefer "0" I can account for the blank rows in software.
But, while it isn't terrible for the 6 weeks that it represents, I want to be flexible and to be able to do the same thing for different time spans, and for a different number of data points, such as each day in a week, each week in 3 months, 6 months, each month in 1 year, 2 years, etc... As written above, it feels as if it is going to get tedious fast... for instance 1 week spans over a 2 year period is 104 sub-queries.
What I'm after is a more elegant way to get the same (or similar) result.
I also don't know if doing 104 iterations of a similar query to the above (vs. the 6 that it does now) is a particularly efficient usage.
Ultimately I am going to write some code which will help me build (and thus abstract away) the long, ugly query--but it would still be great to have a more concise and scale-able query.
In Postgres, you can generate a series of times and then use these for the aggregation:
select g.dte, coalesce(sum(dt.minutes), 0) as minutes
from generate_series('2016-02-07'::timestamp, '2016-03-20'::timestamp, interval '7 day') g(dte) left join
     downtime dt
     on dt.time_stamp >= g.dte and dt.time_stamp < g.dte + interval '7 day'
group by g.dte
order by g.dte;
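The same pattern should scale to other grains just by changing the series bounds and the step; for example, a sketch of monthly buckets over two years against the same downtime table (the date bounds here are placeholders):
select g.dte, coalesce(sum(dt.minutes), 0) as minutes
from generate_series('2016-01-01'::timestamp, '2017-12-01'::timestamp, interval '1 month') g(dte) left join
     downtime dt
     on dt.time_stamp >= g.dte and dt.time_stamp < g.dte + interval '1 month'
group by g.dte
order by g.dte;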

How do I produce a time interval query in SQLite?

I have an events-based table from which I would like to produce a query, by minute, of the number of events that were occurring.
For example, I have an event table like:
CREATE TABLE events (
session_id TEXT,
event TEXT,
time_stamp DATETIME
)
Which I have transformed into the following type of table:
CREATE TABLE sessions (
session_id TEXT,
start_ts DATETIME,
end_ts DATETIME,
duration INTEGER
);
Now I want to create a query that groups the sessions by minute, with a count of those that were active during that particular minute, where I would essentially get back something like:
TIME_INTERVAL ACTIVE_SESSIONS
------------- ---------------
18:00 1
18:01 5
18:02 3
18:03 0
18:04 2
OK, I think I got more of what I wanted. It doesn't account for intervals that are empty, but it is good enough for what I need.
select strftime('%Y-%m-%dT%H:%M:00.000', start_ts) TIME_INTERVAL,
       (select count(session_id)
          from sessions s2
         where strftime('%Y-%m-%dT%H:%M:00.000', s1.start_ts) between s2.start_ts and s2.end_ts) ACTIVE_SESSIONS
  from sessions s1
 group by strftime('%Y-%m-%dT%H:%M:00.000', start_ts);
This generates a row for each minute in which a session started, with a count of the sessions that had started (start_ts) but hadn't finished (end_ts) at that minute.
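If the empty minutes ever matter, one way to fill them is a recursive CTE that generates every minute between the earliest start and the latest end, then reuses the same correlated count. This is a sketch that assumes start_ts/end_ts are stored in the ISO-8601 format implied by the query above:
with recursive minutes(t) as (
  select strftime('%Y-%m-%dT%H:%M:00.000', min(start_ts)) from sessions
  union all
  select strftime('%Y-%m-%dT%H:%M:00.000', datetime(t, '+1 minute'))
    from minutes
   where t < (select max(end_ts) from sessions)
)
select m.t as TIME_INTERVAL,
       (select count(*)
          from sessions s
         where m.t between s.start_ts and s.end_ts) as ACTIVE_SESSIONS
  from minutes m;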
PostgreSQL allows the following query.
In contrast to your example, this returns an additional column for the day, and it omits the minutes where nothing happened (count=0).
select
day, hour, minute, count(*)
from
(values ( 0),( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),
(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),
(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),
(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),
(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),
(50),(51),(52),(53),(54),(55),(56),(57),(58),(59))
as minutes (minute),
(values ( 0),( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),
(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),
(20),(21),(22),(23))
as hours (hour),
(select distinct cast(start_ts as date) from sessions
union
select distinct cast(end_ts as date) from sessions)
as days (day),
sessions
where
(day,hour,minute)
between (cast(start_ts as date),extract(hour from start_ts),extract(minute from start_ts))
and (cast(end_ts as date), extract(hour from end_ts), extract(minute from end_ts))
group by
day, hour, minute
order by
day, hour, minute;
This isn't exactly your query, but I think it could help. Did you look into the SQLite R-Tree module? This would allow you to create a virtual index on the start/stop time:
CREATE VIRTUAL TABLE sessions_index USING rtree (id, start, end);
Then you could search via:
SELECT * FROM sessions_index WHERE end >= <first minute> AND start <= <last minute>;
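A rough sketch of how that could look (the start_min/end_min column names and the epoch-minute encoding are assumptions): R-Tree coordinates must be numeric, so the string timestamps are converted to minutes since the Unix epoch, and the join back to sessions re-checks the exact bounds since the R-Tree only does the coarse filtering.
CREATE VIRTUAL TABLE sessions_index USING rtree (id, start_min, end_min);
INSERT INTO sessions_index
SELECT rowid,
       strftime('%s', start_ts) / 60,   -- session start in epoch minutes
       strftime('%s', end_ts)   / 60    -- session end in epoch minutes
FROM sessions;
-- sessions overlapping the window 18:00-18:05 on an example day
SELECT s.*
FROM sessions_index i
JOIN sessions s ON s.rowid = i.id
WHERE i.end_min   >= strftime('%s', '2022-05-22 18:00:00') / 60
  AND i.start_min <= strftime('%s', '2022-05-22 18:05:00') / 60
  AND s.end_ts    >= '2022-05-22T18:00:00.000'
  AND s.start_ts  <= '2022-05-22T18:05:00.000';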