How to group time data into buckets in QlikView? - qlikview

I have a list of times in QlikView. For example:
1:45 am
2:34 am
3:55 am
etc.
How do I split it into groups like this:
1 - 2 am
2 - 3 am
3 - 4 am
etc.
I used the class function, but something is wrong. It runs, but instead of time buckets it creates buckets of what look like converted decimal values.

You have a couple of options. By far the simplest is to create a new field that reformats your time field. In the example below, TimeBucket formats the time field as an hour and appends the same time with one hour added as the upper bound:
LOAD
TimeField,
Time(TimeField,'h tt') & ' - ' & Time(TimeField + maketime(1,0,0),'h tt') as TimeBucket;
LOAD
*
INLINE [
TimeField
1:45
2:34
3:55
16:45
17:56
];
This then results in the following:
However, depending on your exact requirements, this solution may have problems because Time() is a dual function: it returns both a formatted text value and the underlying number.
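As a language-neutral illustration of the labelling logic in the load script above, here is a minimal Python sketch. It assumes the input is a 24-hour "H:MM" string; the function names are hypothetical and are not part of QlikView.

```python
def hour_label(hour: int) -> str:
    """Format an hour of the day (0-23) as a 12-hour label such as '4 PM'."""
    suffix = "AM" if hour < 12 else "PM"
    return f"{hour % 12 or 12} {suffix}"


def hour_bucket(time_text: str) -> str:
    """Return the one-hour bucket label for a 24-hour 'H:MM' string."""
    hour = int(time_text.split(":")[0])
    # The upper bound is the next hour, wrapping around at midnight.
    return f"{hour_label(hour)} - {hour_label((hour + 1) % 24)}"


for value in ["1:45", "2:34", "3:55", "16:45", "17:56"]:
    print(value, "->", hour_bucket(value))
```

Because the label is built from the integer hour only, every time within the same hour maps to the same text.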
Another alternative is to use intervalmatch as follows. One point to remember is that intervalmatch includes the end-points of an interval. For time, this means each "end" value has to be one second before the start of the next interval. Otherwise, a source time that sits exactly on an interval boundary matches two intervals and generates two records instead of one.
TimeBuckets:
LOAD
maketime(RecNo()-1,0,0) as Start,
maketime(RecNo()-1,59,59) as End,
trim(mid(time(maketime(RecNo()-1),'h tt'),1,2)) & ' - ' & trim(time(maketime(mod(RecNo(),24)),'h tt')) as Bucket
AUTOGENERATE(24);
SourceData:
LOAD
*
INLINE [
TimeField
1:45
2:34
3:55
16:45
17:56
];
BucketedSourceData:
INTERVALMATCH (TimeField)
LOAD
Start,
End
RESIDENT TimeBuckets;
LEFT JOIN (BucketedSourceData)
LOAD
*
RESIDENT TimeBuckets;
DROP TABLES SourceData, TimeBuckets;
This then results in the following:
More information on intervalmatch may be found in both the QlikView installed help as well as the QlikView Reference manual.
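The end-point behaviour described above can be sketched in Python. This is only an illustration of inclusive interval matching, not of how intervalmatch is implemented; times are represented as seconds of the day and the function names are hypothetical.

```python
def make_buckets(gap_seconds: int = 1):
    """Hourly (start, end) intervals in seconds of the day.

    Each end is placed gap_seconds before the start of the next interval.
    """
    return [(h * 3600, (h + 1) * 3600 - gap_seconds) for h in range(24)]


def match(seconds: int, buckets):
    """Return every interval containing `seconds`, end-points included."""
    return [b for b in buckets if b[0] <= seconds <= b[1]]


two_am = 2 * 3600
print(len(match(two_am, make_buckets(gap_seconds=1))))  # ends at hh:59:59
print(len(match(two_am, make_buckets(gap_seconds=0))))  # ends on the next hour
```

With a one-second gap, 2:00:00 falls in exactly one interval; with no gap it falls in two, which is the duplicate-record case the answer warns about.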

Write a nested if statement in your script:
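// Test the later threshold first; otherwise the first branch also catches every later time.
If(TIME > MakeTime(2,45), 'bucket 2',
If(TIME > MakeTime(1,45), 'bucket 1', 'Others'
)
)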
Not the most elegant, but if you can't get the 1:45 to work with the date() function, you can always convert to military (24-hour) time, add the hours and minutes, and then make buckets out of that.

Related

Finding the Closest Unbooked Dates Using SQL

Scenario
A user selects a date. Based on the selection I check whether the date & time is booked or not (No issues here).
If a date & time is booked, I need to show them n alternative dates based on their date and time parameters, and those proposed alternative dates have to be as close to their chosen date as possible. The list of alternative dates should start from the date the query is run on. My backend handles this.
My Progress So Far
SELECT alternative_date
FROM GENERATE_SERIES(
TIMESTAMP '2022-08-20 05:00:00',
date_trunc('month', TIMESTAMP '2022-08-20 07:00:00') + INTERVAL '1 month - 1 day',
INTERVAL '1 day'
) AS G(alternative_date)
WHERE NOT EXISTS(
SELECT * FROM events T
WHERE T.bookDate::DATE = G.alternative_date::DATE
)
The code above uses the GENERATE_SERIES(...) function in PostgreSQL. It generates every date from 2022-08-20 up to the end of August and returns only the dates that do not exist in the bookDate column (meaning they have not yet been booked).
Problems I Need Help With
When searching for alternative dates, I'm providing 3 important things:
The user's preferred booking date, so I can suggest other dates close to it that he can choose. How would I go about doing this? This is the part where I'm having the most trouble.
The user's start and end times, so when providing a list of alternative dates, I can tell him, for instance, that there's free space between 06 and 07 on 2022-08-22. I'm also facing some issues here; a push in the right direction would be great!
I want to add another WHERE but it fails. The current WHERE is a NOT EXISTS, so it looks for all dates not equal to what is given. My other WHERE basically checks whether the place is open for booking or not.
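To get the closest free dates, ORDER BY the "distance" between each alternative date and the user's preferred date, so the shortest intervals come first:
ORDER BY alternative_date - TIMESTAMP '2022-08-20 05:00:00'
This works as written because the series starts at the preferred date. If candidates can also fall before it, sort by the absolute difference instead, e.g. ABS(EXTRACT(EPOCH FROM (alternative_date - TIMESTAMP '2022-08-20 05:00:00'))).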
If you want to recommend time slots smaller than whole dates (an hour range), you need to switch the whole thing from dates to hours, i.e. change the generate_series step from 1 day to 1 hour (or whatever your smallest bookable unit is) and exclude invalid hours (nighttime, I assume) in the WHERE. From there, it is pretty much the same as with dates.
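The hourly variant can be sketched outside the database as follows. This is a minimal Python illustration under assumptions that are not in the question: slots are one hour long, the opening hours are given as open_hour and close_hour, and booked is a set of already-taken slot start times. All names are hypothetical.

```python
from datetime import datetime, timedelta


def closest_free_slots(preferred, earliest, booked, days=7,
                       open_hour=8, close_hour=20, limit=3):
    """Return free one-hour slot starts, nearest to `preferred` first."""
    first = earliest.replace(minute=0, second=0, microsecond=0)
    # Equivalent of generate_series with a 1-hour step.
    candidates = (first + timedelta(hours=i) for i in range(days * 24))
    free = [
        slot for slot in candidates
        if open_hour <= slot.hour < close_hour  # skip closed hours
        and slot not in booked                  # skip slots already taken
    ]
    # Absolute distance, so earlier and later slots are ranked alike.
    return sorted(free, key=lambda slot: abs(slot - preferred))[:limit]


preferred = datetime(2022, 8, 20, 6)
booked = {datetime(2022, 8, 20, 6)}
for slot in closest_free_slots(preferred, datetime(2022, 8, 19), booked,
                               open_hour=5, close_hour=22):
    print(slot)
```

Sorting by the absolute difference ranks slots before and after the preferred time alike. Ties keep chronological order because Python's sort is stable.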
As for the "second WHERE": there can be only one WHERE, but it can be composed of multiple conditions. You can add more conditions using the AND operator (and a condition can also be a sub-query if needed):
WHERE NOT EXISTS(
SELECT * FROM events T
WHERE T.bookDate::DATE = G.alternative_date::DATE
) AND NOT EXISTS (
SELECT 1 FROM events WHERE "roomId" = '13b46460-162d-4d32-94c0-e27dd9246c79'
)
(Warning: this second sub-query is probably dangerous in the real world. The room will presumably be used more than once, so you need to add a time condition to the sub-query to check against the date.)

Split date_start and date_end by hours on Metabase

I have a table with a column "Begin At" and another column "End At" that represent when a task begins and when it ends. I would like a bar chart that displays the quantity of tasks being done in each hour over an interval of time.
For example, from the following table
I would want to be able to see that from 07/12/2021 21:00 to 07/12/2021 22:00 there were 3 tasks being done (row 1, row 2, row 3).
Also, as I will have several thousand rows, I would like to use the date widget from Metabase to specify the time range.
I have been struggling with this for the last week. I tried creating auxiliary questions to query afterwards, but my only success was hard-coding the 24 hours of a day. With that I could not use the time widget, I had to specify the dates myself in the SQL each time I wanted to check a specific day, and I could only check whole 24-hour periods, not, for example, 02/12/2021 6:00 to 04/12/2021 18:00.
My Metabase is running on a PostgreSQL database. Is this even possible in Metabase? If not, what would you advise for building this? Other platforms? Pure SQL? Python?
Thank you so much
I am not sure about Metabase, but from a PostgreSQL point of view this calls for range types, specifically tsrange/tstzrange, depending on whether you have time zone information or not.
So a query could be:
SELECT
*
FROM "someTable"
WHERE
tsrange("Begin At", "End At", '[)')
&&
tsrange('02/12/2021 6:00', '04/12/2021 18:00', '[)')
However I don't know how you would get the '02/12/2021 6:00' and '04/12/2021 18:00' out of your metabase user-interface.
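The overlap test performed by the && operator on '[)' ranges can be applied per hour to get the bar-chart counts. Below is a minimal Python sketch of that idea, assuming tasks arrive as (begin, end) pairs; the sample rows are invented for illustration and are not the table from the question.

```python
from datetime import datetime, timedelta


def tasks_per_hour(tasks, window_start, window_end):
    """Count tasks overlapping each hourly bucket in [window_start, window_end)."""
    counts = {}
    bucket = window_start
    while bucket < window_end:
        bucket_end = bucket + timedelta(hours=1)
        # Two half-open ranges overlap when each starts before the other ends.
        counts[bucket] = sum(
            1 for begin, end in tasks if begin < bucket_end and end > bucket
        )
        bucket = bucket_end
    return counts


# Hypothetical sample rows.
tasks = [
    (datetime(2021, 12, 7, 20, 30), datetime(2021, 12, 7, 21, 15)),
    (datetime(2021, 12, 7, 21, 0), datetime(2021, 12, 7, 22, 0)),
    (datetime(2021, 12, 7, 21, 59), datetime(2021, 12, 7, 23, 0)),
    (datetime(2021, 12, 7, 22, 0), datetime(2021, 12, 7, 22, 30)),
]
hourly = tasks_per_hour(tasks, datetime(2021, 12, 7, 21), datetime(2021, 12, 7, 23))
for hour, count in hourly.items():
    print(hour, count)
```

With half-open ranges, a task that ends exactly at 22:00 is not counted in the 22:00 bucket.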

difference between 2 datetimes in UTC in Bigquery

I am returning a set of values in a bigquery select statement like this
I need to add a computed field, Utilization, for each row using this formula: (end_time - start_time) * cores.
The times are in UTC, so I am not sure how to do this. I want to do it in the SELECT statement itself. I am new to BigQuery. Kindly help. Thanks
You need to use TIMESTAMP_DIFF function in standard SQL.
In your particular case the query would be something like
SELECT TIMESTAMP_DIFF(end_time , start_time , second)*cores as Utilization
FROM <yourtable>
Take into consideration that you can change the time unit of the result and you should change it to fit your needs. I inserted second but you can use microsecond, millisecond, second, minute, hour or day.
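To see how the choice of unit changes the result, here is a small Python sketch of the same arithmetic. It assumes that only whole units are counted, as TIMESTAMP_DIFF returns an integer; the function name and sample values are hypothetical.

```python
from datetime import datetime, timezone


def utilization(start_time, end_time, cores, unit_seconds=1):
    """(end_time - start_time) in whole units, multiplied by cores."""
    elapsed = (end_time - start_time).total_seconds()
    whole_units = int(elapsed // unit_seconds)  # partial units are dropped
    return whole_units * cores


start = datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2020, 1, 1, 1, 30, tzinfo=timezone.utc)
print(utilization(start, end, cores=4))                     # in core-seconds
print(utilization(start, end, cores=4, unit_seconds=3600))  # in core-hours
```

A 90-minute run counts as only one whole hour, so a coarse unit can understate utilization.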

Adding a month to Ansidate

select ansidate(xxx) + '3 months' AS date1
from (
select str_to_date(varchar(max_date),'%Y%m%d') as xxx
from comp_stg_rundate
where row_key = 1
)as s
Max_date is saved as an integer in the comp_stg_rundate table, and I want to add 3 months to this date. The DateAdd function won't work as I am using VectorWise.
OK, I don't know VectorWise or how ansidate is stored. However, if it's stored as an integer value, I assume it represents an amount of time, measured in a specific unit since a starting point in time.
If that's true, you can convert your 3 months to the unit in which ansidate is stored and add the converted value.
What version of Vectorwise did you run this against? It works for me on VW 4.0.
However this also works.
timestampadd(MONTH,3,ansidate(xxx))
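For comparison, the same calendar arithmetic on a yyyymmdd integer can be sketched in Python. Clamping to the last day of a shorter month is an assumption of this sketch and may not match how Vectorwise treats such dates; the function name is hypothetical.

```python
import calendar
from datetime import date


def add_months(yyyymmdd: int, months: int) -> date:
    """Add calendar months to a date stored as a yyyymmdd integer."""
    year, rest = divmod(yyyymmdd, 10000)
    month, day = divmod(rest, 100)
    # Use a zero-based month index so the year carry is plain arithmetic.
    index = year * 12 + (month - 1) + months
    new_year, new_month0 = divmod(index, 12)
    new_month = new_month0 + 1
    # Assumption: clamp the day when the target month is shorter.
    last_day = calendar.monthrange(new_year, new_month)[1]
    return date(new_year, new_month, min(day, last_day))


print(add_months(20140115, 3))
print(add_months(20131130, 3))
```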

Simultaneous calls from CDR

I need to come up with an analysis of simultaneous events, given only the start time and duration of each event.
Details
I have a standard CDR (call detail record) that contains, among others:
calldate (datetime of each call start)
duration (int, seconds of call duration)
channel (a string)
What I need to come up with is some sort of analysis of simultaneous calls in each second, for a given datetime period. For example, a graph of the simultaneous calls we had yesterday.
(The problem is the same if we have visitors logs with duration on a website and wish to obtain simultaneous clients for a group of web-pages)
What would your algorithm be?
I can iterate over the records in the given period and fill an array, where each bucket of the array corresponds to 1 second of the overall period. This works and seems to be fast, but if the time period is big (say, 1 year), I would need lots of memory (3600x24x365x4 bytes ~ 120MB approx).
This is for a web-based, interactive app, so my memory footprint should be small enough.
Edit
By simultaneous, I mean all calls in a given second. A second would be my minimum unit. I cannot use something bigger (an hour, for example) because the calls made during an hour are not necessarily in progress at the same time.
I would implement this on the database. Using a GROUP BY clause with DATEPART, you could get a list of simultaneous calls for whatever time period you wanted, by second, minute, hour, whatever.
On the web side, you would only have to display the histogram that is returned by the query.
@eric-z-beard: I would really like to be able to implement this on the database. I like your proposal, and while it seems to lead to something, I don't quite fully understand it. Could you elaborate? Please recall that each call spans several seconds, and each second needs to count. If using DATEPART (or something like it in MySQL), which second should be used for the GROUP BY? See the note on simultaneous.
Elaborating over this, I found a way to solve it using a temporary table. Assuming temp holds all seconds from tStart to tEnd, I could do
SELECT temp.second, count(call.id)
FROM call, temp
WHERE temp.second BETWEEN call.start AND call.start + call.duration
GROUP BY temp.second
Then, as suggested, the web app should use this as a histogram.
You can use a static Numbers table for lots of SQL tricks like this. The Numbers table simply contains integers from 0 to n for n like 10000.
Then your temp table never needs to be created, and instead is a subquery like:
SELECT StartTime + Numbers.Number AS Second
FROM Numbers
You can create a table 'simultaneous_calls' with 3 fields:
yyyymmdd Char(8),
day_second Number, -- second of the day,
count Number -- count of simultaneous calls
Your web service can take 'count' value from this table and make some statistics.
The simultaneous_calls table will be filled by a batch program that runs every day after the end of the day.
Assuming that you use Oracle, the batch may start a PL/SQL procedure which does the following:
Appends 24 * 3600 = 86400 records to the table, one for each second of the day, with a default 'count' value of 0.
Defines the 'day_cdrs' cursor for the query:
Select to_char(calldate, 'yyyymmdd') yyyymmdd,
(calldate - trunc(calldate)) * 24 * 3600 starting_second,
duration duration
From cdrs
Where cdrs.calldate >= Trunc(Sysdate -1)
And cdrs.calldate
Iterates the cursor to increment 'count' field for the seconds of the call:
For cdr in day_cdrs
Loop
Update simultaneous_calls
Set count = count + 1
Where yyyymmdd = cdr.yyyymmdd
And day_second Between cdr.starting_second And cdr.starting_second + cdr.duration;
End Loop;
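Regarding the memory concern raised in the question, the per-second array can be avoided by recording only the seconds at which the count changes. The Python sketch below uses a sweep over start and end events. This is a different technique from the SQL and PL/SQL answers above, and it assumes calls are given as (start_second, duration_seconds) pairs with an inclusive end, matching the BETWEEN conditions. The names are hypothetical.

```python
from collections import Counter


def concurrent_calls(calls):
    """Return [(second, active_calls)] at every second where the count changes.

    calls: iterable of (start_second, duration_seconds) pairs.
    """
    delta = Counter()
    for start, duration in calls:
        delta[start] += 1                 # call becomes active
        delta[start + duration + 1] -= 1  # first second after the inclusive end
    active = 0
    timeline = []
    for second in sorted(delta):
        active += delta[second]
        timeline.append((second, active))
    return timeline


timeline = concurrent_calls([(0, 10), (5, 10), (20, 5)])
print(timeline)
print(max(count for _, count in timeline))  # peak number of simultaneous calls
```

Memory grows with the number of calls rather than with the length of the period, and the count for any second is the value at the latest change point at or before it.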