Snowflake: counting the number of rows present in an hour as a single row - sql

I have a user record for every login. I need to count how many times each user has logged in, but however many times a user logs in within the same half hour, it must count as one login.
USER_ID TIMESTAMP
A1 2021-03-10 10:00:00
A1 2021-03-10 10:01:00
A1 2021-03-10 10:05:00
A1 2021-03-10 10:15:00
A1 2021-03-10 10:32:00
A1 2021-03-10 11:02:00
A1 2021-03-11 12:00:00
A2 2021-03-10 10:01:00
Expected output:
USER_ID COUNT
A1 4
A2 1
I can't figure out how to use lag and lead in this situation. Any help would be appreciated.

SELECT user_id, count(distinct(date_trunc('hour',timestamp)::text||iff(minute(timestamp)>=30,'_1','_0'))) as count
FROM table
GROUP BY 1 ORDER BY 1;
This works by truncating the timestamp to the hour, converting it to a string, and then appending a suffix for each half hour. Not the cleanest, but it should work.
Ah, this question is really about truncating time to 30-minute intervals, for which time_slice is a nice answer:
SELECT user_id, count(distinct(time_slice(timestamp, 30, 'MINUTE'))) as count
FROM table
GROUP BY user_id ORDER BY user_id;
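To make the bucketing explicit, here is a minimal Python sketch of the same idea (not Snowflake code): each timestamp is rounded down to the start of its 30-minute slot, and the distinct slots are counted per user. The helper names half_hour_slot and count_logins are made up for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def half_hour_slot(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 30-minute slot."""
    return ts - timedelta(
        minutes=ts.minute % 30, seconds=ts.second, microseconds=ts.microsecond
    )


def count_logins(rows):
    """Count distinct 30-minute slots per user; rows are (user_id, timestamp)."""
    slots = defaultdict(set)
    for user_id, ts in rows:
        slots[user_id].add(half_hour_slot(ts))
    return {user_id: len(s) for user_id, s in sorted(slots.items())}


# Sample data from the question.
rows = [
    ("A1", datetime(2021, 3, 10, 10, 0)),
    ("A1", datetime(2021, 3, 10, 10, 1)),
    ("A1", datetime(2021, 3, 10, 10, 5)),
    ("A1", datetime(2021, 3, 10, 10, 15)),
    ("A1", datetime(2021, 3, 10, 10, 32)),
    ("A1", datetime(2021, 3, 10, 11, 2)),
    ("A1", datetime(2021, 3, 11, 12, 0)),
    ("A2", datetime(2021, 3, 10, 10, 1)),
]

print(count_logins(rows))  # {'A1': 4, 'A2': 1}
```

Note that the slots are aligned to the clock (hh:00 and hh:30), so logins at 10:29 and 10:31 count as two. If the half hour should instead start at the user's first login, a different approach is needed.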

Related

Calculating difference (or deltas) between current and previous row with clickhouse

It would be awesome if there was a way to index rows during a query.
Is there a way to SELECT (compute) the difference of a single column between consecutive rows?
Let's say, something like the following query
SELECT
toStartOfDay(stamp) AS day,
count(day ) AS events ,
day[current] - day[previous] AS difference, -- how do I calculate this
day[current] / day[previous] as percent, -- and this
FROM records
GROUP BY day
ORDER BY day
I want to get the integer and percentage difference between the current row's 'events' column and the previous one for something similar to this:
day                  events  difference  percent
2022-01-06 00:00:00  197     NULL        NULL
2022-01-07 00:00:00  656     459         3.32
2022-01-08 00:00:00  15      -641        0.02
2022-01-09 00:00:00  7       -8          0.46
2022-01-10 00:00:00  137     130         19.5
My version of ClickHouse doesn't support window functions but, after looking into the LAG() function mentioned in the comments, I found neighbor(), which works perfectly for what I'm trying to do:
SELECT
    toStartOfDay(stamp) AS day,
    count(day) AS events,
    (events - neighbor(events, -1)) AS diff,
    (events / neighbor(events, -1)) AS perc
FROM records
GROUP BY day
ORDER BY day
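For reference, the previous-row arithmetic can be sketched in plain Python (not ClickHouse). The function name with_deltas is hypothetical, and the sketch uses None for the first row where the question's table shows NULL; as far as I know, neighbor() itself returns the column type's default value there unless a default is passed as its third argument.

```python
def with_deltas(events):
    """events: list of (day, count) sorted by day.

    Returns rows of (day, count, difference, ratio), comparing each row
    with the one before it. The first row has no predecessor, so its
    difference and ratio are None.
    """
    out = []
    prev = None
    for day, count in events:
        if prev is None:
            out.append((day, count, None, None))
        else:
            # Guard against division by zero when the previous count is 0.
            ratio = round(count / prev, 2) if prev else None
            out.append((day, count, count - prev, ratio))
        prev = count
    return out


# Daily counts from the question.
events = [
    ("2022-01-06", 197),
    ("2022-01-07", 656),
    ("2022-01-08", 15),
    ("2022-01-09", 7),
    ("2022-01-10", 137),
]

for row in with_deltas(events):
    print(row)
```

The ratios are rounded here, so the last digit can differ slightly from the truncated values in the question's table.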

How to bring future days to past date and then revert to same old days using postgresql?

I have a db with 6 tables. Each table has a list of date and datetime columns as shown below
Table 1 Table 2 .... Table 6
Date_of_birth Exam_date exam_datetime Result_date Result_datetime
2190-01-13 2192-01-13 2192-01-13 09:00:00 2194-04-13 2194-04-13 07:12:00
2184-05-21 2186-05-21 2186-05-21 07:00:00 2188-02-03 2188-02-03 09:32:00
2181-06-17 2183-06-17 2183-06-17 05:00:00 2185-07-23 2185-07-23 12:40:00
What I would like to do is shift all these future days back to the past date (definitely has to be less than the current date) but retain the same chronological order. Meaning, we can see that the person was born first, then he took the exam, and finally, he got his results.
In addition, I should be able to revert the changes and get back the future dates again.
I expect my output to be something like below
Stage 1 - shift back to old days (it can be any day but it has to be in the past and retain chronological order)
Table 1 Table 2 .... Table 6
Date_of_birth Exam_date exam_datetime Result_date Result_datetime
1990-01-13 1992-01-13 1992-01-13 09:00:00 1994-04-13 1994-04-13 07:12:00
1984-05-21 1986-05-21 1986-05-21 07:00:00 1988-02-03 1988-02-03 09:32:00
1981-06-17 1983-06-17 1983-06-17 05:00:00 1985-07-23 1985-07-23 12:40:00
Stage 2 - Shift forward to future days as how it was earlier
Table 1 Table 2 .... Table 6
Date_of_birth Exam_date exam_datetime Result_date Result_datetime
2190-01-13 2192-01-13 2192-01-13 09:00:00 2194-04-13 2194-04-13 07:12:00
2184-05-21 2186-05-21 2186-05-21 07:00:00 2188-02-03 2188-02-03 09:32:00
2181-06-17 2183-06-17 2183-06-17 05:00:00 2185-07-23 2185-07-23 12:40:00
Subtract two centuries:
update table1
set date_of_birth = date_of_birth - interval '200 year';
You can do something similar for all the other dates.
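Reverting is then a matter of adding the same interval back. A small Python sketch of the round trip follows, assuming a fixed 200-year offset as in the update above; shift_years is a made-up helper, not a PostgreSQL function.

```python
from datetime import date, datetime

YEARS = 200  # same offset as in the UPDATE statement


def shift_years(value, years):
    """Move a date or datetime by a whole number of years (negative = back).

    Raises ValueError if the result would be Feb 29 in a non-leap year.
    """
    return value.replace(year=value.year + years)


# One row from the question: birth, exam and result timestamps.
row = [date(2190, 1, 13), datetime(2192, 1, 13, 9, 0), datetime(2194, 4, 13, 7, 12)]

past = [shift_years(v, -YEARS) for v in row]       # stage 1: shift back
restored = [shift_years(v, YEARS) for v in past]   # stage 2: shift forward

print(past[0], past[1], past[2])
print(restored == row)  # True
```

Because every value moves by the same offset, the chronological order is kept. One caveat: a 200-year shift is not always reversible for 29 February, since the target year may not be a leap year (for example 2400 is a leap year but 2200 is not). PostgreSQL interval arithmetic would, as far as I know, move such a date to 28 February, while this sketch raises an error instead.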

Is there a way to overlap a time series data table onto another using an sql query?

Hoping I can find an answer to my conundrum here.
I have two tables in a Postgres DB and I'd like to overlap one onto the other.
Table A has 3 columns: "start time", "end time" and "state". In each row, the start time is equal to the end time of the preceding row.
Table B has the same columns, but its start and end times aren't adjacent and they do not overlap.
I want to overlap Table B onto Table A to create Table C.
Table C also has the start time of each row equal to the end time of the preceding row.
Here is an example to clarify:
Table A - Overlapee
State  Start Time   End Time
1      12:00:00 AM  12:10:00 AM
2      12:10:00 AM  12:20:00 AM
1      12:20:00 AM  12:30:00 AM
2      12:30:00 AM  12:40:00 AM
1      12:40:00 AM  12:50:00 AM
2      12:50:00 AM  1:00:00 AM
Table B - Overlaper
State  Start Time   End Time
5      12:05:00 AM  12:25:00 AM
6      12:31:00 AM  12:35:00 AM
5      12:40:00 AM  12:50:00 AM
Table C - Result of overlap
State  Start Time   End Time
1      12:00:00 AM  12:05:00 AM
5      12:05:00 AM  12:25:00 AM
1      12:25:00 AM  12:30:00 AM
2      12:30:00 AM  12:31:00 AM
6      12:31:00 AM  12:35:00 AM
2      12:35:00 AM  12:40:00 AM
5      12:40:00 AM  12:50:00 AM
2      12:50:00 AM  1:00:00 AM
As you can see, if the start time of a row in Table B falls within a row's time range in Table A, then the end time of the row in Table A is replaced with the start time of the row in Table B. Conversely, if the end time of a row in Table B falls within a time range in Table A, then the start time of the row in Table A is replaced with the end time of the row in Table B.
Secondly, if the start time and end time in Table A and Table B are exactly the same, then the row in B simply replaces the one in A.
Thirdly, if the time range of a row in Table B falls within the time range of a row in Table A, then the Table A row is split in two: the end time of the first row is the start time of the Table B row, and the start time of the second row is the end time of the Table B row.
Lastly, the state must be preserved correctly for each time range.
Is it possible to accomplish this with a Postgres SQL query? I haven't found any similar questions or answers.
Note: There is also an ID column in each table that I left off here for simplicity but can be used if it will help achieve the overlap.
One approach is to extract all the start times and end times, then pull the correct state at each start time (preferring rows from b over rows from a), and use lead() to get the end time:
with t as (
      select starttime as tm
      from a
      union -- on purpose to remove duplicates
      select endtime
      from a
      union
      select starttime as tm
      from b
      union -- on purpose to remove duplicates
      select endtime
      from b
     )
select t.tm as starttime, lead(t.tm) over (order by t.tm) as endtime,
       (select ab.state
        from ((select a.state, a.starttime, a.endtime, 0 as from_b
               from a
              ) union all
              (select b.state, b.starttime, b.endtime, 1 as from_b
               from b
              )
             ) ab
        where ab.starttime <= t.tm and t.tm < ab.endtime
        order by ab.from_b desc -- a row from b overrides the row from a
        limit 1
       ) as state
from t;
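To check the logic against Table C, here is a small Python sketch of the same idea under two assumptions that are mine: rows from Table B take precedence wherever they cover a time, and adjacent pieces with the same state are merged into one row. The function name overlay is hypothetical, and times are written as 24-hour "HH:MM" strings so they compare correctly.

```python
def overlay(a, b):
    """Overlay intervals b onto a. Rows are (state, start, end) tuples."""
    # Every start or end time in either table is a potential boundary.
    points = sorted({t for _, s, e in a + b for t in (s, e)})

    def state_at(t):
        # Check b first so that it overrides a.
        for rows in (b, a):
            for state, s, e in rows:
                if s <= t < e:
                    return state
        return None

    result = []
    for start, end in zip(points, points[1:]):
        state = state_at(start)
        if result and result[-1][0] == state and result[-1][2] == start:
            # Same state as the previous piece: extend it instead of adding a row.
            result[-1] = (state, result[-1][1], end)
        else:
            result.append((state, start, end))
    return result


table_a = [
    (1, "00:00", "00:10"), (2, "00:10", "00:20"), (1, "00:20", "00:30"),
    (2, "00:30", "00:40"), (1, "00:40", "00:50"), (2, "00:50", "01:00"),
]
table_b = [(5, "00:05", "00:25"), (6, "00:31", "00:35"), (5, "00:40", "00:50")]

for row in overlay(table_a, table_b):
    print(row)
```

On the sample data this prints the eight rows of Table C. The merge step would also join two adjacent Table A rows that share a state, which does not occur in the example.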

Calculate Average between columns by comparing two rows in SQL Server

I have the below table
BidID AppID AppStatus StatusTime
1 1 In Review 2019-01-02 12:00:00
1 1 Approved 2019-01-02 13:00:00
1 2 In Review 2019-01-04 13:00:00
1 2 Approved 2019-01-04 14:00:00
2 2 In Review 2019-01-07 15:00:00
2 2 Approved 2019-01-07 17:00:00
3 1 In Review 2019-01-09 13:00:00
4 1 Approved 2019-01-09 13:00:00
What I am trying to do is calculate the average StatusTime difference in minutes, using the following logic:
First group by BidID and then by AppID, then calculate the time difference between the StatusTime of the In Review and Approved AppStatus rows.
For example: first group by BidID, then by AppID; then find the In Review status and the next Approved status, and calculate the difference in minutes between the two dates.
BidID  AppID  AppStatus                                                                  BidAverage
1      1, 2   For App ID 1 (2019-01-02 15:48:42.000 - 2019-01-02 12:33:36.000): 1 hour   1.5
              For App ID 2 (2019-01-04 10:33:12.000 - 2019-01-04 10:33:12.000): 2 hours
2      2      For App ID 2 (2019-01-04 10:33:12.000 - 2019-01-04 10:33:12.000): 1        1
3      1      No calculation since no Approved
4      1      No calculation since no In Review before Approved
Final Average (1.5 + 1) / 2 = 1.25 for the table
I have already figured out the time difference excluding Saturday (Time Difference Excluding Weekend) using David's suggestion.
I am not sure how to check that AppStatus is first In Review and then Approved, and only then calculate the time difference. If there is no Approved row, as for BidID 3, that pair should not be used in the average calculation. The result should then be averaged across AppID and then across BidID.
Thanks
I think you can just use min() and max() for simplicity to get the times for the bid/app pairs. The rest is just aggregation and more aggregation.
The processing you describe seems to be:
select avg(avg_bid_diff)
from (select BidID, avg(diff * 1.0) as avg_bid_diff
      from (select BidID, AppID,
                   -- seconds between the earliest and latest status time of the pair
                   datediff(second, min(StatusTime), max(StatusTime)) as diff
            from t
            where AppStatus in ('In Review', 'Approved')
            group by BidID, AppID
            having count(*) = 2
           ) ba
      group by BidID
     ) b;
This makes assumptions that are consistent with the provided data -- that the statuses don't have duplicates for the bid/app pairs and that approval is always after review.
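The two levels of averaging can be sketched in Python as follows. This is only an illustration on the sample rows as listed in the table; the function name average_review_minutes is made up, and the sketch assumes at most one In Review and one Approved row per bid/app pair.

```python
from collections import defaultdict
from datetime import datetime


def average_review_minutes(rows):
    """rows: (bid_id, app_id, status, status_time) tuples.

    Returns (per-bid averages in minutes, average of those averages).
    """
    times = defaultdict(dict)
    for bid, app, status, ts in rows:
        times[(bid, app)][status] = ts

    per_bid = defaultdict(list)
    for (bid, _app), t in times.items():
        review, approved = t.get("In Review"), t.get("Approved")
        # Skip pairs that lack either status or were approved before review.
        if review is not None and approved is not None and approved >= review:
            per_bid[bid].append((approved - review).total_seconds() / 60)

    bid_avgs = {bid: sum(d) / len(d) for bid, d in per_bid.items()}
    overall = sum(bid_avgs.values()) / len(bid_avgs) if bid_avgs else None
    return bid_avgs, overall


rows = [
    (1, 1, "In Review", datetime(2019, 1, 2, 12)),
    (1, 1, "Approved", datetime(2019, 1, 2, 13)),
    (1, 2, "In Review", datetime(2019, 1, 4, 13)),
    (1, 2, "Approved", datetime(2019, 1, 4, 14)),
    (2, 2, "In Review", datetime(2019, 1, 7, 15)),
    (2, 2, "Approved", datetime(2019, 1, 7, 17)),
    (3, 1, "In Review", datetime(2019, 1, 9, 13)),
    (4, 1, "Approved", datetime(2019, 1, 9, 13)),
]

print(average_review_minutes(rows))  # ({1: 60.0, 2: 120.0}, 90.0)
```

BidID 3 and BidID 4 drop out because each has only one of the two statuses.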

MS ACCESS – Return a daily count of booked resources within a date range

Please note: this is not for an Access project as such, but a legacy application that uses an Access database for its back end.
Setup
Part of the application is a kind of Gantt chart, fixed to single day columns, where each row represents a single resource. Resources are booked out for a range of days and a booking is for a single resource, so they cannot overlap on a row. The range of dates that is in view is user selectable, open ended, and can be changed by various methods, including horizontal scrolling using mouse or keyboard.
Problem
I've been tasked with adding a row to the top of the chart to indicate overall resource usage for each day. Of course that's trivially easy to do by simply querying for each day in the range separately, but unfortunately that is proving to be an expensive process and therefore slows down horizontal scrolling a lot. So I'm looking for a way to do it more efficiently, hopefully with fewer database reads.
Here is a highly simplified example of the bookings table:
booking_ID | start_Date | end_Date | resource_ID
----------- -------------- ------------- -------------
1 2014-07-17 2014-07-20 21
2 2014-08-24 2014-08-29 4
3 2014-08-26 2014-09-02 21
4 2014-08-28 2014-09-04 19
Ideally, I would like a single query that returns each day within the specified range, along with a count of how many bookings there are on those days. So querying the data above for 20 days from 2014-07-17 would produce this:
check_Date | resources_Used
----------- ---------------
2014-07-17 1
2014-07-18 1
2014-07-19 1
2014-07-20 1
2014-07-21 0
2014-07-22 0
2014-07-23 0
2014-08-24 1
2014-08-25 1
2014-08-26 2
2014-08-27 2
2014-08-28 3
2014-08-29 3
2014-08-30 2
2014-08-31 2
2014-09-01 2
2014-09-02 2
2014-09-03 1
2014-09-04 1
2014-09-05 0
I can get a list of dates in the range by using a table of integers (starting at 0), with this:
SELECT CDATE('2014-07-17') + ID AS check_Date FROM Integers WHERE ID < 20
And I can get the count of resources used for a single day with something like this:
SELECT COUNT(*) AS resources_Used
FROM booking
WHERE start_Date <= CDATE('2014-09-04')
AND end_Date >= CDATE('2014-09-04')
But I can't figure out how (or if) I can tie them both together to get the desired results. Is this even possible?
Create a table called "calendar" and put a list of dates into it covering the necessary timeframe. It just needs one column called check_date with one row for each date. Use Excel, start at whatever date and just drag down, then import into the new table.
After your calendar table is set up you can run the following:
select c.check_date, count(b.resource_id) as resources_used
from calendar c left join bookings b
  on (c.check_date >= b.start_date) and (c.check_date <= b.end_date)
group by c.check_date
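The calendar join can be illustrated with a short Python sketch. It is not Access code, and daily_usage is a made-up name. It walks every day in the requested range and counts the bookings that cover it, so days without any booking still appear with a count of 0, which is what the question's expected output shows.

```python
from datetime import date, timedelta


def daily_usage(bookings, first_day, days):
    """bookings: (start_date, end_date) pairs, both ends inclusive.

    Returns one (day, resources_used) pair per day in the range.
    """
    usage = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        used = sum(1 for start, end in bookings if start <= day <= end)
        usage.append((day, used))
    return usage


# Bookings from the question (start_Date, end_Date).
bookings = [
    (date(2014, 7, 17), date(2014, 7, 20)),
    (date(2014, 8, 24), date(2014, 8, 29)),
    (date(2014, 8, 26), date(2014, 9, 2)),
    (date(2014, 8, 28), date(2014, 9, 4)),
]

for day, used in daily_usage(bookings, date(2014, 8, 24), 13):
    print(day, used)
```

In the database, the same effect comes from one query over the calendar table restricted to the visible date range, rather than one query per day.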