My problem is quite complex, for me anyway! I'll try to explain:
I have the following fact table:
ID Date_From Date_To Time_From Time_To ID_ACTIV
1 01/02/2018 25/05/2018 08:00:00 10:00:00 41
2 01/06/2018 01/07/2018 10:00:00 13:00:00 41
3 01/02/2018 10/02/2018 10:00:00 11:00:00 42
I also have a standard date dimension. I want to link it with the fact table to get a result table like this one:
Date hour ACTIV_COUNT
31/01/2018 10 0
01/02/2018 07 0
01/02/2018 08 1
01/02/2018 09 1
01/02/2018 10 2
01/02/2018 11 1
...
01/06/2018 10 1
...
The only solution I have found is to create a named query with a field populated with every possible date and time in each interval, and then link it to the date and time dimensions.
Is there a better solution?
Thank you in advance.
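A range join avoids materializing every date and time inside each interval: each row of a date-hour dimension is matched to every fact whose date range and hour range contain it, and the matches are counted. The sketch below demonstrates the idea with SQLite from Python so it runs as-is. The table name dim_date_hour, the ISO date format, counting fact rows rather than distinct activities, and the inclusive end hour (inferred from the expected output, where hour 10 on 01/02/2018 counts both rows 1 and 3) are assumptions; the SQL would need adapting to your own database.

```python
import sqlite3
from datetime import date, timedelta

# Fact rows from the question, with dates rewritten from dd/mm/yyyy to ISO
# yyyy-mm-dd so that plain text comparison orders them correctly.
FACTS = [
    (1, "2018-02-01", "2018-05-25", "08:00:00", "10:00:00", 41),
    (2, "2018-06-01", "2018-07-01", "10:00:00", "13:00:00", 41),
    (3, "2018-02-01", "2018-02-10", "10:00:00", "11:00:00", 42),
]

# Range join: a dimension row matches every fact whose date range and hour
# range contain it. Both ends are inclusive, as the expected output implies.
QUERY = """
SELECT dh.d, dh.h, COUNT(f.id) AS activ_count
FROM dim_date_hour AS dh
LEFT JOIN fact AS f
  ON dh.d BETWEEN f.date_from AND f.date_to
 AND dh.h BETWEEN CAST(substr(f.time_from, 1, 2) AS INTEGER)
              AND CAST(substr(f.time_to, 1, 2) AS INTEGER)
GROUP BY dh.d, dh.h
"""

def activ_counts(facts, first_day, n_days):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fact (id INTEGER, date_from TEXT, date_to TEXT, "
                "time_from TEXT, time_to TEXT, id_activ INTEGER)")
    con.executemany("INSERT INTO fact VALUES (?, ?, ?, ?, ?, ?)", facts)
    # Hypothetical date-hour dimension: one row per (day, hour of day).
    con.execute("CREATE TABLE dim_date_hour (d TEXT, h INTEGER)")
    con.executemany(
        "INSERT INTO dim_date_hour VALUES (?, ?)",
        [((first_day + timedelta(days=i)).isoformat(), h)
         for i in range(n_days) for h in range(24)],
    )
    # LEFT JOIN + COUNT(f.id) gives 0 for dimension rows with no matching fact.
    result = {(d, h): n for d, h, n in con.execute(QUERY)}
    con.close()
    return result

counts = activ_counts(FACTS, date(2018, 1, 31), 160)
print(counts[("2018-02-01", 10)])  # 2
```

Because the join condition does the interval test, nothing has to be expanded per interval; the dimension only needs one row per day and hour.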
My first goal is to produce this table:
id employee Datelog TimeIn TimeOut Hours Count
5 Two 2022-08-10 09:00:00 16:00:00 07:00:00 1
4 Two 2022-08-09 09:00:00 16:00:00 07:00:00 1
3 Two 2022-08-08 09:00:00 16:00:00 07:00:00 1
2 One 2022-08-05 09:00:00 16:00:00 07:00:00 1
1 Two 2022-08-04 09:00:00 10:00:00 01:00:00 0
Now my main objective is to give a bonus of 2k (2000) to employees whose total count per month is >= 3, like this:
employee Month TotalCount Bonus
Two August 3 2000
One August 1 0
Here's an answer using Postgres. It's fairly generic SQL; only extracting the month from datelog may need slightly different syntax in other databases.
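select employee
      ,date_part('month', datelog) as month
      ,count(*) as total_count
      ,case when count(*) >= 3 then 2000 else 0 end as bonus
from t
where hours >= time '06:00:00'   -- count only days with at least 6 hours logged
group by employee, date_part('month', datelog)   -- add date_part('year', datelog) if the data spans several years
employee month total_count bonus
Two 8 3 2000
One 8 1 0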
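To check the per-month grouping end to end, here is the same query run against the sample rows with SQLite from Python. SQLite has no date_part, so strftime('%Y-%m', datelog) stands in for it (which also keeps different years apart). Storing hours as zero-padded 'HH:MM:SS' text, so that it compares correctly as a string, and the 6-hour threshold taken from the answer above are assumptions of this sketch.

```python
import sqlite3

# Rows from the question: (id, employee, datelog, timein, timeout, hours).
ROWS = [
    (5, "Two", "2022-08-10", "09:00:00", "16:00:00", "07:00:00"),
    (4, "Two", "2022-08-09", "09:00:00", "16:00:00", "07:00:00"),
    (3, "Two", "2022-08-08", "09:00:00", "16:00:00", "07:00:00"),
    (2, "One", "2022-08-05", "09:00:00", "16:00:00", "07:00:00"),
    (1, "Two", "2022-08-04", "09:00:00", "10:00:00", "01:00:00"),
]

# One group per employee and calendar month; the bonus depends on the count.
QUERY = """
SELECT employee,
       strftime('%Y-%m', datelog) AS month,
       COUNT(*) AS total_count,
       CASE WHEN COUNT(*) >= 3 THEN 2000 ELSE 0 END AS bonus
FROM t
WHERE hours >= '06:00:00'
GROUP BY employee, strftime('%Y-%m', datelog)
ORDER BY employee
"""

def monthly_bonus(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, employee TEXT, datelog TEXT, "
                "timein TEXT, timeout TEXT, hours TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in monthly_bonus(ROWS):
    print(row)
# ('One', '2022-08', 1, 0)
# ('Two', '2022-08', 3, 2000)
```

Note that an employee with no qualifying days in a month is filtered out entirely by the WHERE clause; moving the condition into a SUM(CASE ...) would keep such employees with a count of 0.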
I am trying to create a table in Snowflake with 15-minute intervals. I have tried GENERATOR, but it doesn't give me 15-minute intervals. Is there a function I can use to generate and build this table for a couple of years' worth of data?
For example:
Date        Hour
2022-03-29  02:00 AM
2022-03-29  02:15 AM
2022-03-29  02:30 AM
2022-03-29  02:45 AM
2022-03-29  03:00 AM
2022-03-29  03:15 AM
...
Thanks
Use the following as a time generator with a 15-minute interval, and then use other date/time functions as needed to extract the date part and the time part into separate columns.
with CTE as
(select timestampadd(min,seq4()*15 ,date_trunc(hour, current_timestamp())) as time_count
from table(generator(rowcount=>4*24)))
select time_count from cte;
+-------------------------------+
| TIME_COUNT |
|-------------------------------|
| 2022-03-29 14:00:00.000 -0700 |
| 2022-03-29 14:15:00.000 -0700 |
| 2022-03-29 14:30:00.000 -0700 |
| 2022-03-29 14:45:00.000 -0700 |
| 2022-03-29 15:00:00.000 -0700 |
| 2022-03-29 15:15:00.000 -0700 |
...(output truncated)...
| 2022-03-30 13:15:00.000 -0700 |
| 2022-03-30 13:30:00.000 -0700 |
| 2022-03-30 13:45:00.000 -0700 |
+-------------------------------+
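Splitting each generated timestamp into the separate Date and Hour columns shown in the question is a per-row formatting step. A small Python sketch of that step follows; the function name quarter_hour_slots and the 12-hour "%I:%M %p" format (chosen to match the question's "02:00 AM" example) are assumptions. Each slot is computed directly from its index as start + 15*i minutes, which is the same arithmetic the Snowflake query performs.

```python
from datetime import datetime, timedelta

def quarter_hour_slots(start, count):
    """Return `count` (date, 12-hour time) pairs at 15-minute steps.

    The start is truncated to the hour first, like date_trunc(hour, ...)
    in the Snowflake query; slot i is base + 15*i minutes.
    """
    base = start.replace(minute=0, second=0, microsecond=0)
    slots = []
    for i in range(count):
        ts = base + timedelta(minutes=15 * i)
        slots.append((ts.date().isoformat(), ts.strftime("%I:%M %p")))
    return slots

for row in quarter_hour_slots(datetime(2022, 3, 29, 2, 7), 6):
    print(row)
```

With 4*24 = 96 slots per day, a start at midnight ends at 11:45 PM on the same day.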
There are already many answers to this question on this site (four of them from this month alone).
The major point to note is that you MUST NOT use SEQx() as the number generator (you can use it in the ORDER BY, but that is not needed). As noted in the docs:
Important
This function uses sequences to produce a unique set of increasing integers, but does not necessarily produce a gap-free sequence. When operating on a large quantity of data, gaps can appear in a sequence. If a fully ordered, gap-free sequence is required, consider using the ROW_NUMBER window function.
CREATE TABLE table_of_2_years_date_times AS
SELECT
date_time::date as date,
date_time::time as time
FROM (
SELECT
row_number() over (order by null)-1 as rn
,dateadd('minute', 15 * rn, '2022-03-01'::date) as date_time
from table(generator(rowcount=>4*24*365*2))
)
ORDER BY rn;
Then select the first and last five rows:
(SELECT * FROM table_of_2_years_date_times ORDER BY date,time LIMIT 5)
UNION ALL
(SELECT * FROM table_of_2_years_date_times ORDER BY date desc,time desc LIMIT 5)
ORDER BY 1,2;
DATE        TIME
2022-03-01  00:00:00
2022-03-01  00:15:00
2022-03-01  00:30:00
2022-03-01  00:45:00
2022-03-01  01:00:00
2024-02-28  22:45:00
2024-02-28  23:00:00
2024-02-28  23:15:00
2024-02-28  23:30:00
2024-02-28  23:45:00
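The rowcount 4*24*365*2 assumes two 365-day years. Because the range starting at 2022-03-01 crosses 29 February 2024, the generated table stops at 2024-02-28 23:45 rather than at the end of February, which is exactly what the output above shows. The arithmetic can be checked in Python; the target end date 2024-03-01 is an assumption used only to show how an exact rowcount could be derived before plugging it into the generator.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)

def slots_between(start, end):
    """Number of 15-minute slots in the half-open range [start, end)."""
    return (end - start) // STEP

start = datetime(2022, 3, 1)
fixed = 4 * 24 * 365 * 2                          # rowcount used in the answer
exact = slots_between(start, datetime(2024, 3, 1))  # includes the leap day

# Row n (0-based) is start + 15*n minutes, so the last row is at index count - 1.
last_fixed = start + STEP * (fixed - 1)
print(fixed, exact, last_fixed)  # 70080 70176 2024-02-28 23:45:00
```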
I have two data frames loaded from CSV files. The first (df1) describes traffic incidents and the second (df2) holds traffic records for every 15 minutes. I want to merge them based on the closest time. I used pandas.merge_asof and got the nearest match, but I also want the traffic records from the 30 minutes before and after the match. Each incident should be joined to the closest traffic record time: if an incident occurred at 14:02:00, it should be merged with the traffic record at 14:00:00.
For example:
1 - Incidents data
Date detector_id Incident_type
09/30/2015 8:00:00 1 crash
09/30/2015 8:02:00 1 congestion
04/22/2014 15:30:00 9 congestion
04/22/2014 15:33:00 9 Emergency vehicle
2 - Traffic data
Date detector_id traffic_volume
09/30/2015 7:30:00 1 55
09/30/2015 7:45:00 1 45
09/30/2015 8:00:00 1 60
09/30/2015 8:15:00 1 200
09/30/2015 8:30:00 1 70
04/22/2014 15:00:00 9 15
04/22/2014 15:15:00 9 7
04/22/2014 15:30:00 9 50
04/22/2014 15:45:00 9 11
04/22/2014 16:00:00 9 7
3 - Desired table
Date detector_id traffic_volume Incident_type
09/30/2015 7:30:00 1 55 NA
09/30/2015 7:45:00 1 45 NA
09/30/2015 8:00:00 1 60 Crash
09/30/2015 8:00:00 1 60 congestion
09/30/2015 8:15:00 1 200 NA
09/30/2015 8:30:00 1 70 NA
04/22/2014 15:00:00 9 15 NA
04/22/2014 15:15:00 9 7 NA
04/22/2014 15:30:00 9 50 Congestion
04/22/2014 15:30:00 9 50 Emergency vehicle
04/22/2014 15:45:00 9 11 NA
04/22/2014 16:00:00 9 7 NA
The code I used is as follows:
Merge = pd.merge_asof(df2, df1, left_index = True, right_index = True, allow_exact_maches = False,
on='Date', by='detector_id', direction='nearest')
but it gave me this table:
Date detector_id traffic_volume Incident_type
09/30/2015 8:00:00 1 60 Crash
04/22/2014 15:30:00 9 50 Congestion
I want to know the traffic situation before and after each incident occurs.
Any ideas?
Thank you.
*If I have made a mistake in how I asked this, please let me know.
For anyone who has the same problem and wants to merge with pandas.merge_asof: use the tolerance parameter. It sets the maximum time difference allowed between matched rows of the two datasets.
You may run into two problems, one related to Timedelta and one related to sorting. For the Timedelta problem, convert the Date columns to datetime:
import pandas as pd

df1.Date = pd.to_datetime(df1.Date)
df2.Date = pd.to_datetime(df2.Date)
For the sorting problem, sort both frames by the merge key inside the call itself:
x = pd.merge_asof(df1.sort_values('Date'),  # sort_values fixes the error "left keys must be sorted"
                  df2.sort_values('Date'),
                  on='Date',
                  by='detector_id',
                  direction='backward',
                  tolerance=pd.Timedelta('45 min'))
The direction controls which record is picked, always within the tolerance (45 minutes here): direction='backward' matches each left row to the last right row at or before it, direction='forward' to the first right row at or after it, and direction='nearest' to the closest right row in either direction.
Note that merge_asof returns at most one match per left row, so on its own it does not return every record inside the window before and after the incident.
Thank you, and I hope this helps anyone in the future.
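The two steps the question asks for (attach each incident to the traffic record at or before it, then keep every traffic record within 30 minutes of that match) can be sketched without pandas, which also makes explicit what merge_asof does and does not do. This is a plain-Python illustration, not the pandas API; the 15-minute tolerance, the 30-minute window and the function name attach_incidents are assumptions. In pandas, step 1 corresponds to merge_asof(..., direction='backward', tolerance=...), and step 2 roughly to a left merge back onto the traffic frame followed by a time filter.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S"

INCIDENTS = [  # (Date, detector_id, Incident_type) from the question
    ("09/30/2015 8:00:00", 1, "crash"),
    ("09/30/2015 8:02:00", 1, "congestion"),
    ("04/22/2014 15:30:00", 9, "congestion"),
    ("04/22/2014 15:33:00", 9, "Emergency vehicle"),
]
TRAFFIC = [  # (Date, detector_id, traffic_volume) from the question
    ("09/30/2015 7:30:00", 1, 55), ("09/30/2015 7:45:00", 1, 45),
    ("09/30/2015 8:00:00", 1, 60), ("09/30/2015 8:15:00", 1, 200),
    ("09/30/2015 8:30:00", 1, 70), ("04/22/2014 15:00:00", 9, 15),
    ("04/22/2014 15:15:00", 9, 7), ("04/22/2014 15:30:00", 9, 50),
    ("04/22/2014 15:45:00", 9, 11), ("04/22/2014 16:00:00", 9, 7),
]

def attach_incidents(traffic, incidents,
                     tolerance=timedelta(minutes=15),
                     window=timedelta(minutes=30)):
    # Sorted record times per detector, needed for the binary search.
    slots = defaultdict(list)
    for ts, det, _ in traffic:
        slots[det].append(datetime.strptime(ts, FMT))
    for times in slots.values():
        times.sort()

    # Step 1 (backward as-of match): each incident goes to the latest
    # traffic record at or before it on the same detector, within `tolerance`.
    matched = defaultdict(list)  # (detector_id, record time) -> incident types
    for ts, det, kind in incidents:
        t = datetime.strptime(ts, FMT)
        times = slots.get(det, [])
        i = bisect_right(times, t) - 1
        if i >= 0 and t - times[i] <= tolerance:
            matched[(det, times[i])].append(kind)

    # Step 2: keep every traffic record within `window` of a matched record,
    # repeated once per incident (None where there is no incident).
    rows = []
    for ts, det, vol in traffic:
        t = datetime.strptime(ts, FMT)
        if any(d == det and abs(t - m) <= window for d, m in matched):
            for kind in matched.get((det, t), [None]):
                rows.append((ts, det, vol, kind))
    return rows

for row in attach_incidents(TRAFFIC, INCIDENTS):
    print(row)
```

With the sample data this yields the twelve rows of the desired table, with None in place of NA.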
I have two tables in Hive and am trying to join them:
Table A:
id ord_time
84 10:00:00
84 12:00:00
84 15:00:00
84 4:00:00
Data types:
Id : int
ord_time : String
Table B:
id time_desc beg_tm end_tm
84 Late Night 00:00:00 04:59:59
84 Break Fast 05:00:00 10:29:59
84 Dinner 16:00:00 20:59:59
84 Lunch 11:00:00 13:59:59
84 Snack 14:00:00 15:59:59
Data types:
Id : int
time_desc : String
beg_tm : String
end_tm : String
Query :
Select a.ord_time,b.id,b.time_desc,b.beg_tm,b.end_tm
from Table A a,Table B b
where a.id = b.id
and a.ord_time between b.beg_tm and b.end_tm
When I ran the query above, the result was null.
I want the output to be:
id ord_time time_desc
84 10:00:00 BreakFast
84 12:00:00 Lunch
84 15:00:00 Snack
84 04:00:00 Late Night
BETWEEN didn't work for me either. Then I realized that
name between "a" and "b"
is equivalent to
name >= "a" and name <= "b"
so I tried that, and it worked for me. Try it for your case; I hope it works for you too.
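One likely reason the 4:00:00 row in particular finds no match is that ord_time, beg_tm and end_tm are all strings, so BETWEEN (and the equivalent >= / <= pair) compares them character by character: '4:00:00' sorts after '04:59:59' because '4' > '0'. The Python sketch below reproduces that effect and the zero-padding fix; the function names pad_time and time_desc are hypothetical. In Hive, padding could be done with something like lpad(a.ord_time, 8, '0'), but check it against your actual data.

```python
# Meal windows from Table B: (time_desc, beg_tm, end_tm), all stored as strings.
MEALS = [
    ("Late Night", "00:00:00", "04:59:59"),
    ("Break Fast", "05:00:00", "10:29:59"),
    ("Dinner", "16:00:00", "20:59:59"),
    ("Lunch", "11:00:00", "13:59:59"),
    ("Snack", "14:00:00", "15:59:59"),
]

def pad_time(ts):
    """Zero-pad 'H:MM:SS' to 'HH:MM:SS' so string order matches time order."""
    h, m, s = ts.split(":")
    return f"{int(h):02d}:{m}:{s}"

def time_desc(ord_time, pad=True):
    """Return the window whose [beg_tm, end_tm] contains ord_time, else None."""
    t = pad_time(ord_time) if pad else ord_time
    for desc, beg, end in MEALS:
        if beg <= t <= end:  # same test as BETWEEN or >= / <= on strings
            return desc
    return None

for raw in ["10:00:00", "12:00:00", "15:00:00", "4:00:00"]:
    print(raw, time_desc(raw, pad=False), time_desc(raw))
```

Only the unpadded '4:00:00' changes result: it falls into no window as a raw string and into Late Night once padded.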
I need help writing proper Oracle SQL to combine rows for a Crystal Reports command object. This is part of a bigger query I'm working on, and I've been stuck on it for the past couple of days.
For example, if the rows look like this:
PatId In_time Out_time
151 01/01/2012 07:00:00 am 01/01/2012 10:00:00 am
151 01/01/2012 11:00:00 am 01/02/2012 08:00:00 am
151 01/02/2012 11:00:00 am 01/02/2012 01:00:00 pm
151 01/03/2012 08:00:00 am 01/03/2012 03:00:00 pm
151 01/06/2012 03:30:00 pm 01/09/2012 07:00:00 am
167 01/03/2012 01:30:00 pm 01/09/2012 07:00:00 am
167 01/13/2012 03:30:00 pm 01/14/2012 07:00:00 am
167 01/14/2012 11:30:00 am 01/15/2012 11:30:00 am
167 01/18/2012 12:00:00 pm 01/19/2012 03:00:00 am
Within a PatId, the code should compare the Out_time of one row to the In_time of the next row, and check whether the time gap is greater than 48 hours. If not, then it is considered part of the same visit. I want one result row per PatID & visit, with min(In_time) and max(Out_time). The time span of the visit (result row) itself may be greater than 48 hours.
In this example, for PatId 151 the time difference between the Out_time of the 1st row and the In_time of the 2nd row is less than 48 hours. The difference between the Out_time of the 2nd row and the In_time of the 3rd row, as well as between the 3rd and 4th rows, is also less than 48 hours. After that, the gap between the Out_time of the 4th row and the In_time of the 5th row is greater than 48 hours. The result for PatId 151 should be as shown below, and the same applies to PatId 167: the chaining should continue until a gap greater than 48 hours is found.
So the result for the above table should be displayed as,
PatId In_time Out_time
151 01/01/2012 07:00:00 am 01/03/2012 03:00:00 pm
151 01/06/2012 03:30:00 pm 01/09/2012 07:00:00 am
167 01/03/2012 01:30:00 pm 01/09/2012 07:00:00 am
167 01/13/2012 03:30:00 pm 01/15/2012 11:30:00 am
167 01/18/2012 12:00:00 pm 01/19/2012 03:00:00 am
I can't work out the logic for comparing and merging the rows.
Thanks in Advance, Abhi
Here is a general example of subtracting times; copy and paste it to see the output. It gives the difference between two dates in hours, minutes and seconds. The basic formula is (end_date - start_date) * 86400 (the number of seconds in 24 hours):
SELECT trunc(mydate / 3600) hr             -- whole hours
     , trunc(mod(mydate, 3600) / 60) mnt   -- remaining whole minutes
     , trunc(mod(mydate, 60)) sec          -- remaining seconds
FROM
(
SELECT (to_date('01/03/2012 10:00:00', 'mm/dd/yyyy hh24:mi:ss') -
to_date('01/01/2012 07:00:00', 'mm/dd/yyyy hh24:mi:ss')) * 86400 mydate
FROM dual
)
/
HR | MNT | SEC
---------------
51 | 0 | 0
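The same arithmetic as the Oracle query, sketched in Python: subtract the two timestamps, convert the difference to seconds, then split it into hours, minutes and seconds with divmod. The function name hms_between is hypothetical.

```python
from datetime import datetime

def hms_between(start, end, fmt="%m/%d/%Y %H:%M:%S"):
    """Split end - start into (hours, minutes, seconds)."""
    seconds = int((datetime.strptime(end, fmt)
                   - datetime.strptime(start, fmt)).total_seconds())
    hours, rem = divmod(seconds, 3600)   # like trunc(mydate / 3600)
    minutes, secs = divmod(rem, 60)      # like mod(mydate, 3600) / 60 and mod(mydate, 60)
    return hours, minutes, secs

print(hms_between("01/01/2012 07:00:00", "01/03/2012 10:00:00"))  # (51, 0, 0)
```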
You need to check your example and logic; I could not understand what needs to be compared with what...
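The rule described in the question is a "gaps and islands" problem: within each PatId, a row starts a new visit when its In_time is more than 48 hours after the previous row's Out_time. In Oracle this is commonly solved with the LAG analytic function plus a running SUM of "new visit" flags; the sketch below shows the same logic in plain Python instead, so the chaining can be checked against the expected result. The names merge_visits and MAX_GAP are hypothetical, and comparing against the visit's running maximum Out_time (rather than strictly the previous row's) is an assumption that gives the same answer when rows do not overlap.

```python
from datetime import datetime, timedelta
from itertools import groupby

FMT = "%m/%d/%Y %I:%M:%S %p"  # e.g. "01/01/2012 07:00:00 am"
MAX_GAP = timedelta(hours=48)

ROWS = [  # (PatId, In_time, Out_time) from the question
    (151, "01/01/2012 07:00:00 am", "01/01/2012 10:00:00 am"),
    (151, "01/01/2012 11:00:00 am", "01/02/2012 08:00:00 am"),
    (151, "01/02/2012 11:00:00 am", "01/02/2012 01:00:00 pm"),
    (151, "01/03/2012 08:00:00 am", "01/03/2012 03:00:00 pm"),
    (151, "01/06/2012 03:30:00 pm", "01/09/2012 07:00:00 am"),
    (167, "01/03/2012 01:30:00 pm", "01/09/2012 07:00:00 am"),
    (167, "01/13/2012 03:30:00 pm", "01/14/2012 07:00:00 am"),
    (167, "01/14/2012 11:30:00 am", "01/15/2012 11:30:00 am"),
    (167, "01/18/2012 12:00:00 pm", "01/19/2012 03:00:00 am"),
]

def merge_visits(rows, max_gap=MAX_GAP):
    """Collapse rows into visits: (PatId, min In_time, max Out_time) per island."""
    parsed = sorted((pid, datetime.strptime(t_in, FMT), datetime.strptime(t_out, FMT))
                    for pid, t_in, t_out in rows)
    visits = []
    for pid, group in groupby(parsed, key=lambda r: r[0]):
        start = end = None
        for _, t_in, t_out in group:
            if start is not None and t_in - end <= max_gap:
                end = max(end, t_out)                 # gap <= 48 h: same visit
            else:
                if start is not None:
                    visits.append((pid, start, end))  # close the previous visit
                start, end = t_in, t_out              # start a new visit
        visits.append((pid, start, end))              # close the last visit
    return visits

for pid, start, end in merge_visits(ROWS):
    print(pid, start, end)
```

For the sample data this prints the five visits listed in the question's expected result.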