SQL count and compare two created_at columns

I have got a table as follows,
ID
CreatedAt_1
CreatedAt_2
ABC
2022-06-10 20:28:37
CFR
2022-06-13 10:00:12
2022-06-10 20:28:14
PFR
2022-06-17 12:20:40
XYZ
2022-06-15 11:00:12
2022-06-10 16:45:05
DFL
2022-06-13 15:00:06
FGT
2022-06-20 10:00:20
2022-06-10 13:34:55
I already used this query to count the number of rows on a specific date for each column separately:
SELECT
(CAST(datetrunc('day', createdAt_1 + INTERVAL '1 day') AS timestamp) + INTERVAL '-1 day') AS "new user",
count(*) AS "count"
FROM Table
WHERE time_interval
GROUP BY "new user"
And get something like:
| Day | Count |
|------------|-------|
| 2022-06-10 | 1 |
| 2022-06-13 | 2 |
| 2022-06-15 | 1 |
I would like to compare both columns and get a percentage for a specific day, count(createdAt_1) / count(createdAt_2) * 100, but I don't see how to do it easily.

I wasn't able to verify whether Metabase SQL supports the SQL-standard coalesce() function, but its purpose is to return the first non-null value among the arguments passed to it (left to right). If it is supported, I suggest the query below.
SELECT
datetrunc('day', coalesce(createdAt_1, createdAt_2)) AS "new user"
, count(*) AS "count"
FROM TABLE
-- WHERE time_interval is true ??
GROUP BY
datetrunc('day', coalesce(createdAt_1, createdAt_2))
coalesce(createdAt_1, createdAt_2) returns the value in createdAt_1 if that isn't NULL, and otherwise the value in createdAt_2. If both columns are NULL, it returns NULL.
Column names should not be wrapped in single quotes, which denote string literals. Labels are typically denoted by double quotes, e.g. "new user".
I don't believe you need to add or subtract intervals, or convert to timestamp. The datetrunc() function should be sufficient in itself.
I have not used the "new user" alias in the GROUP BY clause; this is my preference.
It isn't clear to me what that WHERE clause is achieving.
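The percentage the question asks for, count(createdAt_1) / count(createdAt_2) * 100 per day, can be sketched by stacking both columns into one list of days and counting each side. The example below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine; the table name t, the snake_case column names, and the four rows are all hypothetical, and date() is SQLite's way of truncating to the day, so the SQL would need adapting to the real database.

```python
import sqlite3

# Hypothetical rows (id, created_at_1, created_at_2); not the asker's data.
rows = [
    ("A", "2022-06-10 20:28:37", None),
    ("B", "2022-06-13 10:00:12", "2022-06-10 20:28:14"),
    ("C", "2022-06-13 15:00:06", "2022-06-10 16:45:05"),
    ("D", None, "2022-06-13 12:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, created_at_1 TEXT, created_at_2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Stack both columns into (day, flag) rows, then count each flag per day.
# The CASE guards against dividing by zero when a day has no created_at_2.
query = """
SELECT day,
       SUM(is_1) AS n_1,
       SUM(is_2) AS n_2,
       CASE WHEN SUM(is_2) > 0
            THEN 100.0 * SUM(is_1) / SUM(is_2)
       END AS pct
FROM (
    SELECT date(created_at_1) AS day, 1 AS is_1, 0 AS is_2
    FROM t WHERE created_at_1 IS NOT NULL
    UNION ALL
    SELECT date(created_at_2) AS day, 0 AS is_1, 1 AS is_2
    FROM t WHERE created_at_2 IS NOT NULL
)
GROUP BY day
ORDER BY day
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With these made-up rows, 2022-06-10 has one createdAt_1 value and two createdAt_2 values (50%), and 2022-06-13 has two and one (200%).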

Related

BQ: Select latest date from multiple columns

Good day, all. I wrote a question relating to this earlier, but now I have encountered another problem.
I have to calculate the timestamp difference between the install_time and contributor_time columns. However, I have three contributor_time columns, and I need to select the latest time from those columns first and then subtract it from the install time.
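Sample Data
| users | install_time | contributor_time_1 | contributor_time_2 | contributor_time_3 |
|-------|--------------|--------------------|--------------------|--------------------|
| 1 | 8:00 | 7:45 | 7:50 | 7:55 |
| 2 | 10:00 | 9:15 | 9:45 | 9:30 |
| 3 | 11:00 | 10:30 | null | null |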
For example, in the table above I would need to select contributor_time_3 and subtract it from install_time for user 1. For user 2, I would do the same, but with contributor_time_2.
Sample Results
| users | install_time | time_diff_min |
|-------|--------------|---------------|
| 1 | 8:00 | 5 |
| 2 | 10:00 | 15 |
| 3 | 11:00 | 30 |
The problem I am facing is that 1) the contributor_time columns are in string format and 2) some of them have 'null' string values (which means that I cannot cast it into a timestamp.)
I created a query, but I am facing an error stating that I cannot subtract a string from a timestamp. So I added safe_cast; however, the time_diff_min results only show when all three contributor_time columns hold a valid timestamp. For example, in the sample table above, only the first two rows are returned.
The query I have so far is below:
SELECT
users,
install_time,
TIMESTAMP_DIFF(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), MINUTE) as ctct_min
FROM
(SELECT
users,
install_time,
safe_cast(contributor_time_1 as timestamp) as contributor_time_1,
safe_cast(contributor_time_2 as timestamp) as contributor_time_2,
safe_cast(contributor_time_3 as timestamp) as contributor_time_3,
FROM
(SELECT
users,
install_time,
case when contributor_time_1 = 'null' then '0' else contributor_time_1 end as contributor_time_1,
....
FROM datasource
Any help to point me in the right direction is appreciated! Thank you in advance!
Consider below
select users, install_time,
time_diff(
parse_time('%H:%M',install_time),
greatest(
parse_time('%H:%M',contributor_time_1),
parse_time('%H:%M',contributor_time_2),
parse_time('%H:%M',contributor_time_3)
),
minute) as time_diff_min
from `project.dataset.table`
if applied to sample data in your question - output is
Above can be refactored slightly into below
create temp function latest_time(arr any type) as ((
select parse_time('%H:%M',val) time
from unnest(arr) val
order by time desc
limit 1
));
select users, install_time,
time_diff(
parse_time('%H:%M',install_time),
latest_time([contributor_time_1, contributor_time_2, contributor_time_3]),
minute) as time_diff_min
from `project.dataset.table`
This is less verbose and avoids redundant parsing, with the same result, so it is just a matter of preference.
You can use greatest():
select t.*,
time_diff(install_time, greatest(contributor_time_1, contributor_time_2, contributor_time_3), minute) as diff_min
from t;
Note: this assumes that the values are never NULL, which seems reasonable based on your sample data.
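The question's third sample row holds the string 'null' in two of the three columns, so the "latest contributor time" has to skip missing values rather than propagate them. The following is a minimal Python sketch of that rule, not BigQuery code; the helper names latest_time and time_diff_min are hypothetical, and the rows are copied from the question's sample data.

```python
from datetime import datetime

def latest_time(values):
    """Return the latest 'H:MM' time, skipping None and the string 'null'."""
    parsed = [datetime.strptime(v, "%H:%M") for v in values
              if v not in (None, "null")]
    return max(parsed) if parsed else None

def time_diff_min(install_time, contributor_times):
    """Minutes between install_time and the latest usable contributor time."""
    latest = latest_time(contributor_times)
    if latest is None:
        return None  # no usable contributor time at all
    delta = datetime.strptime(install_time, "%H:%M") - latest
    return int(delta.total_seconds() // 60)

# Rows from the question's sample data.
sample = [
    (1, "8:00", ["7:45", "7:50", "7:55"]),
    (2, "10:00", ["9:15", "9:45", "9:30"]),
    (3, "11:00", ["10:30", "null", "null"]),
]
diffs = {user: time_diff_min(install, times) for user, install, times in sample}
print(diffs)
```

This reproduces the 5, 15 and 30 minutes listed in the question's sample results, including the row with 'null' values.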

How to find the number of hours between two timestamps in Db2

I would like to subtract two timestamps to get the hours between them. I have used the days_between function, but it returns the error: Invalid operation: function days_between has no timezone setup. Below are the sample table and the timestamps that I want to subtract.
job_number timestamp 1 timestamp 2
123456789 2020-03-16 16:59:26 2020-03-17 10:58:25
134232125 2020-03-18 08:57:05 2020-03-19 01:47:26
The HOURS_BETWEEN function is the cleanest way to find the number of full hours between two timestamps in DB2
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0061478.html
The HOURS_BETWEEN function returns the number of full hours between the specified arguments.
For example
VALUES
( HOURS_BETWEEN('2020-03-17-10.58.25', '2019-03-16-16.59.26')
, HOURS_BETWEEN('2020-03-19-01.47.26', '2019-03-18-08.57.05')
)
returns
1 |2
----|----
8801|8800
Note that the value is negative if the first argument is earlier than the second.
Also note that this function does not exist in versions of Db2 (for LUW) lower than 11.1.
You can use the TIMESTAMPDIFF() function with the numeric expression equal to 8 to get the difference in hours:
SELECT job_number, TIMESTAMPDIFF(8,CHAR(timestamp2 - timestamp1)) AS ts_diff
FROM T
TIMESTAMPDIFF function has quite a specific implementation. See Table 3. TIMESTAMPDIFF computations.
I've set earlier year (2019) for timestamp 1 values deliberately.
WITH TAB (job_number, timestamp1, timestamp2) AS
(
VALUES
(123456789, TIMESTAMP('2019-03-16-16.59.26'), TIMESTAMP('2020-03-17-10.58.25'))
, (134232125, TIMESTAMP('2019-03-18-08.57.05'), TIMESTAMP('2020-03-19-01.47.26'))
)
SELECT job_number
, TIMESTAMPDIFF(8, CHAR(TIMESTAMP2 - TIMESTAMP1)) HOURS_TSDIFF
, ((DAYS(TIMESTAMP2) - DAYS(TIMESTAMP1)) * 86400 + MIDNIGHT_SECONDS(TIMESTAMP2) - MIDNIGHT_SECONDS(TIMESTAMP1)) / 3600 HOURS_REAL
FROM TAB;
The result is:
|JOB_NUMBER |HOURS_TSDIFF|HOURS_REAL |
|-----------|------------|-----------|
|123456789 |8777 |8801 |
|134232125 |8776 |8800 |
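The gap between the two columns is consistent with TIMESTAMPDIFF estimating from the duration fields with a 365-day year and a 30-day month, while the real span here crosses 29 February 2020. The Python sketch below reproduces both numbers under that assumption. The function names are hypothetical, and duration_fields only borrows within the time part, which is enough for these two sample rows.

```python
from datetime import datetime

def duration_fields(start, end):
    """Field-wise end - start as (years, months, days, hours, minutes, seconds).
    Simplified: borrows only through the time part and from years to months."""
    sec = end.second - start.second
    minute = end.minute - start.minute
    hour = end.hour - start.hour
    day = end.day - start.day
    month = end.month - start.month
    year = end.year - start.year
    if sec < 0:
        sec += 60
        minute -= 1
    if minute < 0:
        minute += 60
        hour -= 1
    if hour < 0:
        hour += 24
        day -= 1
    if day < 0:
        raise ValueError("borrowing days from months is not handled in this sketch")
    if month < 0:
        month += 12
        year -= 1
    return year, month, day, hour, minute, sec

def estimated_hours(start, end):
    # Assumption: 365 days per year and 30 days per month.
    y, m, d, h, _, _ = duration_fields(start, end)
    return (y * 365 + m * 30 + d) * 24 + h

def real_hours(start, end):
    # Full elapsed hours on the actual calendar.
    return int((end - start).total_seconds() // 3600)

pairs = [
    (datetime(2019, 3, 16, 16, 59, 26), datetime(2020, 3, 17, 10, 58, 25)),
    (datetime(2019, 3, 18, 8, 57, 5), datetime(2020, 3, 19, 1, 47, 26)),
]
results = [(estimated_hours(s, e), real_hours(s, e)) for s, e in pairs]
print(results)
```

Each estimate is 24 hours short of the real value, which is exactly the leap day that the 365-day year leaves out.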

SQL: Update a column after a specific number of days

I have a table that lists bugs, along with who each bug was assigned to and who resolved it.
Bugs | Assigned to | Resolved by
--------------------------------
1 Dev1
2 Dev2
3 Dev3
If the field 'Resolved by' is still blank after a specific number of days (e.g., 14 days), I want it to be updated with the value from the column 'Assigned to'.
I was trying to create a view with a time stamp but I'm not sure how to specify the exact number of days and then update the value from another column.
You can do this in a view with something like this:
create view v_bugs as
select bugs, assigned_to,
coalesce(resolved_by,
(case when createdAt <= sysdate - interval '14' day then assigned_to end)
) as resolved_by
from bugs;
This assumes, of course, that you have a column that specifies when each row was inserted.
Unfortunately, Oracle does not allow sysdate in a virtual column, so you cannot use generated always as to define the column.
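The rule the view encodes (use the real resolver if there is one, otherwise fall back to the assignee once the row is at least 14 days old) can be checked in isolation. This is a minimal Python sketch of that rule, not Oracle code; the function name effective_resolver and the sample dates are hypothetical, and an empty string is treated as missing because Oracle treats empty strings as NULL.

```python
from datetime import datetime, timedelta

def effective_resolver(resolved_by, assigned_to, created_at, now, days=14):
    """Mirror coalesce(resolved_by, case when created_at <= now - days ...)."""
    if resolved_by:  # a real resolver always wins
        return resolved_by
    if created_at <= now - timedelta(days=days):
        return assigned_to  # old enough: fall back to the assignee
    return None  # still inside the grace period

now = datetime(2024, 1, 31)
old_bug = effective_resolver(None, "Dev1", datetime(2024, 1, 10), now)
new_bug = effective_resolver(None, "Dev2", datetime(2024, 1, 25), now)
done_bug = effective_resolver("Dev9", "Dev3", datetime(2024, 1, 1), now)
print(old_bug, new_bug, done_bug)
```

Because the fallback is computed at read time, nothing has to run on day 14 to update the stored value.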

Using crosstab, dynamically loading column names of resulting pivot table in one query?

The gem we have installed (Blazer) on our site limits us to one query.
We are trying to write a query to show how many hours each employee has for the past 10 days. The first column would have employee names and the rest would have hours with the column header being each date. I'm having trouble figuring out how to make the column headers dynamic based on the day. The following is an example of what we have working without dynamic column headers and only using 3 days.
SELECT
pivot_table.*
FROM
crosstab(
E'SELECT
"User",
"Date",
"Hours"
FROM
(SELECT
"q"."qdb_users"."name" AS "User",
to_char("qdb_works"."date", \'YYYY-MM-DD\') AS "Date",
sum("qdb_works"."hours") AS "Hours"
FROM
"q"."qdb_works"
LEFT OUTER JOIN
"q"."qdb_users" ON
"q"."qdb_users"."id" = "q"."qdb_works"."qdb_user_id"
WHERE
"qdb_works"."date" > current_date - 20
GROUP BY
"User",
"Date"
ORDER BY
"Date" DESC,
"User" DESC) "x"
ORDER BY 1, 2')
AS
pivot_table (
"User" VARCHAR,
"2017-10-06" FLOAT,
"2017-10-05" FLOAT,
"2017-10-04" FLOAT
);
This results in
| User | 2017-10-05 | 2017-10-04 | 2017-10-03 |
|------|------------|------------|------------|
| John | 1.5 | 3.25 | 2.25 |
| Jill | 6.25 | 6.25 | 6 |
| Bill | 2.75 | 3 | 4 |
This is correct, but tomorrow, the column headers will be off unless we update the query every day. I know we could pivot this table with date on the left and names on the top, but that will still need updating with each new employee – and we get new ones often.
We have tried using functions and queries in the "AS" section with no luck. For example:
AS
pivot_table (
"User" VARCHAR,
current_date - 0 FLOAT,
current_date - 1 FLOAT,
current_date - 2 FLOAT
);
Is there any way to pull this off with one query?
You could select a row for each user, and then per column sum the hours for one day:
-- "user" is a reserved word in PostgreSQL, so the alias is user_name
with user_work as
(
select u.name as user_name
, to_char(w.date, 'YYYY-MM-DD') as dt_str
, w.hours
from qdb_works w
join qdb_users u
on u.id = w.qdb_user_id
where w.date >= current_date - interval '2 days'
)
select user_name
, sum(case when dt_str = to_char(current_date,
'YYYY-MM-DD') then hours end) as Today
, sum(case when dt_str = to_char(current_date - interval '1 day',
'YYYY-MM-DD') then hours end) as Yesterday
, sum(case when dt_str = to_char(current_date - interval '2 days',
'YYYY-MM-DD') then hours end) as DayBeforeYesterday
from user_work
-- group by user only, so each user gets a single row
group by
user_name
It's often easier to return a list and pivot it client side. That also allows you to generate column names with a date.
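Pivoting on the client side can be sketched in a few lines. The example below assumes the query returns plain (user, date, hours) rows; the rows shown are hypothetical, and the pivot function is an illustration rather than anything Blazer provides.

```python
from collections import defaultdict

# Hypothetical (user, date, hours) rows, as a plain GROUP BY query might return them.
rows = [
    ("John", "2017-10-05", 1.5),
    ("John", "2017-10-04", 3.25),
    ("Jill", "2017-10-05", 6.25),
    ("Bill", "2017-10-04", 3.0),
]

def pivot(rows):
    """Turn long rows into a header plus one row per user, newest date first."""
    dates = sorted({d for _, d, _ in rows}, reverse=True)
    table = defaultdict(dict)
    for user, date, hours in rows:
        table[user][date] = table[user].get(date, 0) + hours
    header = ["User"] + dates
    # Missing (user, date) combinations become None.
    body = [[user] + [cells.get(d) for d in dates] for user, cells in table.items()]
    return header, body

header, body = pivot(rows)
print(header)
for line in body:
    print(line)
```

The column headers come from the data itself, so they move forward each day without editing the query.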
Is there any way to pull this off with one query?
No, because a fixed SQL query cannot have any variability in its output columns. The SQL engine determines the number, types and names of every column of a query before executing it, without reading any data except in the catalog (for the structure of tables and other objects), execution being just the last of 5 stages.
A single-query dynamic pivot, if such a thing existed, couldn't be prepared, since a prepared query always has the same result structure, whereas by definition a dynamic pivot doesn't, as the rows that pivot into columns can change between executions. That would again be at odds with the Prepare-Bind-Execute model.
You may find some limited workarounds and additional explanations in other questions, for example: Execute a dynamic crosstab query, but since you mentioned specifically:
The gem we have installed (Blazer) on our site limits us to one query
I'm afraid you're out of luck. Whatever the workaround, it always needs at least one step with a query to figure out the columns and generate a dynamic query from them, and a second step executing the query generated at the previous step.
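The first of those two steps, generating the column definition list from the current date, might look like the sketch below when it is done in application code. The function name pivot_columns is hypothetical. The output is only the text for the AS pivot_table (...) clause, which would still have to be spliced into the crosstab query and run as a second step, so it does not get around the one-query limit.

```python
from datetime import date, timedelta

def pivot_columns(today, days):
    """Build the crosstab column definitions for the last `days` days, newest first."""
    cols = ['"User" VARCHAR']
    cols += [f'"{(today - timedelta(days=i)).isoformat()}" FLOAT' for i in range(days)]
    return ",\n".join(cols)

columns = pivot_columns(date(2017, 10, 6), 3)
print(columns)
```

For 2017-10-06 and three days this yields the same four column definitions that the question's query hard-codes.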

HiveQL - Query Number of Entries over fixed unit of time

I have a table that is similar to the following:
LOGIN ID (STRING): TIME_STAMP (STRING HH:MM:SS)
BillyJoel 10:45:00
PianoMan 10:45:30
WeDidnt 10:45:45
StartTheFire 10:46:00
AlwaysBurning 10:46:30
Is there any possible way to get a query that gives me a column of the number of logins over a period of time? Something like this:
3 (number of logins from 10:45:00 - 10:45:59)
2 (number of logins from 10:46:00 - 10:46:59)
Note: If you can only do it with int timestamps, that's alright. My original table is all strings, so I thought I would represent that here. The text in parentheses doesn't need to be printed.
If you want it by minute, you can just lop off the seconds:
select substr(time_stamp, 1, 5) as hhmm, count(*)
from t
group by substr(time_stamp, 1, 5)
order by hhmm;
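The same idea, keeping the first five characters (HH:MM) of each timestamp and counting per value, can be checked against the question's sample rows with a short Python sketch. This is only an illustration of the grouping logic, not Hive code.

```python
from collections import Counter

# Rows from the question's sample table: (login id, HH:MM:SS string).
logins = [
    ("BillyJoel", "10:45:00"),
    ("PianoMan", "10:45:30"),
    ("WeDidnt", "10:45:45"),
    ("StartTheFire", "10:46:00"),
    ("AlwaysBurning", "10:46:30"),
]

# Dropping the seconds leaves one key per minute.
per_minute = Counter(ts[:5] for _, ts in logins)
for hhmm in sorted(per_minute):
    print(hhmm, per_minute[hhmm])
```

This gives 3 logins for 10:45 and 2 for 10:46, matching the counts the question expects.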