Custom Interval in Dimension - SSAS

I'm looking for recommendations on a best practice here.
I have a requirement where, on a given day, I must have an arbitrary number of intervals (think buckets of time, each composed of transactions), with at most N intervals per day. These intervals behave like time but can be of arbitrary length, i.e. some last seconds, others minutes.
How the intervals are formed is driven by my source data. On any given day we always start with interval 1, and the total number of intervals we will have by end of day is unknown; each interval is defined by a fixed number of transactions. For every interval I will also need to know its end time.
What is the best approach here? Should I bucket my fact table and connect it to a standard hour/minute/second dimension, or should I build a dimension from my transactional data that accommodates these intervals?
I appreciate your feedback.

If the buckets are based on time, you probably have to handle this on one of your dimensions. There is a bucketing property on the dimension attributes that can do that for you.

Related

Using OptaPlanner for long-trip planning of a fleet of vehicles in a Vehicle Routing Problem (VRP)

I am applying the OptaPlanner VRP example with time windows, and I get feasible solutions whenever I define time windows within a 24-hour range (00:00 to 23:59). But I need to:
Manage long trips, where I know that the duration between leaving the depot and the first visit, or the duration between visits, will be more than 24 hours. Currently it does not give me workable solutions, because the time-window format is a 24-hour format. When the scoring rule "arrivalAfterDueTime" is applied, the "arrivalTime" is always higher than the "dueTime", because the "dueTime" is in the range (00:00 to 23:59) and the "arrivalTime" falls on the next day.
I have thought that I should take each time window (TW) of each Customer and add more TWs to it, one for each day being planned.
For example, if I am planning a trip for 3 days, then I would have 3 time windows for each Customer. Something like this: if Customer 1 is available from [08:00-10:00], then say it is also available from [32:00-34:00] and [56:00-58:00], which are the equivalents of the same TW on the following days.
Likewise, I handle the times as longs, converted to milliseconds.
I don't know if this is the right way; my question is really about ideas for approaching this constraint. Perhaps you have faced a similar problem, and any idea would be much appreciated.
Sorry for the wording, I am a Spanish speaker. Thank you.
Without having checked the example, handling multiple days shouldn't be complicated. It all depends on how you model your time variable.
For example, you could:
Model the timestamps as a long value denoting seconds since the epoch. This is how most of the examples are modeled, if I remember correctly. Note that this is not very human-readable, but it is the fastest to compute with.
You could use a time data type, e.g. LocalTime. This is a human-readable time format, but it only works within a 24-hour range and will be slower than using a primitive data type.
You could use a date-time data type, e.g. LocalDateTime. This is also human-readable, works over any time range, and will also be slower than using a primitive data type.
I would strongly encourage you not to simply map the current day or current hour to a zero value and start counting from there. In your example you denote the times as [32:00-34:00]; this makes it appear as if you are using midnight of the current day as the 0th hour and counting from there. While you can do this, it will hurt the debuggability and maintainability of your code. That is just my general advice; you don't have to follow it.
What I would advise is to have your own domain models and map them to OptaPlanner models, where you use a long value for any timestamp, denoted as seconds since the epoch, as sketched below.
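A minimal sketch of that mapping, written in Python for brevity (the actual OptaPlanner domain classes would be Java, and the names here are illustrative, not OptaPlanner API):

# Sketch: map human-readable, per-day time windows onto absolute epoch seconds,
# so a multi-day horizon needs no artificial [32:00-34:00] style windows.
from datetime import datetime, timezone

def to_epoch_seconds(dt):
    # Convert a datetime (treated as UTC here) to seconds since the Unix epoch
    return int(dt.replace(tzinfo=timezone.utc).timestamp())

# Domain model: a customer is available 08:00-10:00 on each day of a 3-day horizon
planning_days = [datetime(2023, 5, 1), datetime(2023, 5, 2), datetime(2023, 5, 3)]
time_windows = [
    (to_epoch_seconds(day.replace(hour=8)), to_epoch_seconds(day.replace(hour=10)))
    for day in planning_days
]

# Each window is now an absolute (ready_time, due_time) pair in seconds since the epoch,
# so an arrival on day 2 compares correctly against a due time on day 2.
for ready, due in time_windows:
    print(ready, due)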

How to make a query that computes the difference of two timefield objects/attributes in Django?

Suppose I have a model that has four attributes:
name,
time in,
time out,
date.
time in and time out are TimeField objects. Now, I want to write a Django query that tells me who was in the office for the longest duration within a given range.
I am not sure how to calculate the time difference (time out - time in) on the fly. Do I need to add another attribute like time duration? I was hoping to avoid that.
I don't think it's possible using vanilla Django ORM.
Two solutions come to my mind:
Fetch the results in RAM and do the computation.
Add a new field to your model to hold the duration. You can first run an update query to calculate the duration for all rows in your database.
Class.objects.update(duration=F('time_out')-F('time_in'))
And then you can order_by duration and get the first entry as your max duration.
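A minimal sketch of that approach, assuming a hypothetical model named Attendance with a DurationField called duration (these names are placeholders, and whether subtracting two TimeFields works directly in an UPDATE depends on your database backend):

from django.db import models
from django.db.models import F

class Attendance(models.Model):
    name = models.CharField(max_length=100)
    time_in = models.TimeField()
    time_out = models.TimeField()
    date = models.DateField()
    # Extra field that caches time_out - time_in
    duration = models.DurationField(null=True)

# Backfill the duration for all existing rows in a single UPDATE
Attendance.objects.update(duration=F('time_out') - F('time_in'))

# Longest single stay within a given date range
longest = (
    Attendance.objects
    .filter(date__range=('2023-05-01', '2023-05-07'))
    .order_by('-duration')
    .first()
)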

BUGS model for (nested?) repeated measures ANOVA

I was wondering if anyone has code for a BUGS/JAGS model for a repeated measures ANOVA? Basically, I have a response (y) that I want to model against Time of day, Day, and Treatment. I would also like to include two interaction terms, Treatment x Time of Day and Treatment x Day. There are about 20 individuals in the study, who were measured 4 times per day over about 1 week. I'm not entirely sure where to start, and I'm also wondering whether the Time of day covariate should be nested within the Day covariate. If anyone has code for the likelihood portion of the BUGS/JAGS model, it would be greatly appreciated. I can take care of priors. I just can't seem to get off the ground with this one.
There are a few ambiguities in your question.
Do you want Time of Day and Day to enter as continuous covariates or as discrete factors?
Do you want individual identity to enter the model as a fixed or random effect?
If either Day or Time of Day is a factor, do you want to include it as a fixed or random effect?
You ask about whether Time of Day should be nested within Day. This is impossible to answer without knowing more about your data and your aims.
Here's an example of code that assumes that you want to treat individuals as a random effect.
Also assumed: Treatment, Time.of.day, and Day have constant slopes across all individuals. It would be straightforward to extend this model to a fixed- or random-slopes model where different individuals get separate modeled slopes. For example, for a random-slopes model, you'd just modify the beta parameters below to treat them in a manner similar to the alpha parameter.
Following the OP's request, this is the likelihood portion only, and does not include the priors.
for(i in 1:n.observations){
  # individual[i] contains the numerical index of the individual for observation i
  y[i] ~ dnorm(alpha[individual[i]] + beta1*Day[i] + beta2*Time.of.day[i] + beta3*Treatment[i] + beta4*Treatment[i]*Day[i] + beta5*Treatment[i]*Time.of.day[i], tau.obs)
}
# Random intercept for each individual
for(j in 1:n.individuals){
  alpha[j] ~ dnorm(mu, tau)
}

Google BigQuery table decorators

I need to add decorators that will represent the range from 6 days ago until now.
How should I do it?
Let's say the date is relative, 604800000 millis from now, and its absolute value is 1427061600000:
#-604800000
#1427061600000
#now in millis - 1427061600000
#1427061600000 - now in millis
Is there a difference between using relative and absolute times?
Thanks
#-518400000--1
Will give you data for the last 6 days (or last 144 hours).
I think all you need is to read this.
Basically, you have the choice of #time, which is time since the epoch (your #1427061600000). You can also express it as a negative number, which the system will interpret as NOW - time (your #-604800000). These both work, but they don't give the result you want: instead of returning everything that was added in that time range, they return a snapshot of your table from 6 days ago.
Although you COULD use that snapshot, eliminate all duplicates between that snapshot and your current table, and then take THOSE results as what was added during your 6 days, you're better off with:
Using time ranges directly, which you cover with your 3rd and 4th lines. I don't know if the order makes a difference, but I've always used #time1-time2 with time1<time2 (in your case, #1427061600000 - now in millis).
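For concreteness, here is a small sketch of computing those decorator values; the table-reference format shown in the comments is an assumption based on the answer above, so double-check it against the BigQuery docs:

import time

now_ms = int(time.time() * 1000)         # current time in milliseconds since the epoch
six_days_ms = 6 * 24 * 60 * 60 * 1000    # 518400000 ms

start_abs = now_ms - six_days_ms         # absolute start of the range
start_rel = -six_days_ms                 # same start, expressed relative to now

# Range decorator covering "6 days ago until now", appended to a table reference, e.g.
#   mydataset.mytable@-518400000--1            (relative form, as in the answer)
#   mydataset.mytable@<start_abs>-<now_ms>     (absolute form)
relative_decorator = "@%d--1" % start_rel
absolute_decorator = "@%d-%d" % (start_abs, now_ms)
print(relative_decorator, absolute_decorator)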

How to calculate blocks of free time using start and end time?

I have a Ruby on Rails application that uses MySQL and I need to calculate blocks of free (available) time given a table that has rows of start and end datetimes. This needs to be done for a range of dates, so for example, I would need to look for which times are free between May 1 and May 7. I can query the table with the times that are NOT available and use that to remove periods of time between May 1 and May 7. Times in the database are stored at a fidelity of 15 minutes on the quarter hour, meaning all times end at 00, 15, 30 or 45 minutes. There is never a time like 11:16 or 10:01, so no rounding is necessary.
I've thought about creating a hash that represents time in 15-minute increments, defaulting all of the values to "available" (1), then iterating over an ordered result set of rows and flipping the values in the hash to 0 for the times that come back from the database. I'm not sure if this is the most efficient way of doing this, and I'm a little concerned about the memory utilization and computational intensity of that approach. This calculation won't happen all the time, but it needs to scale to happening at least a couple hundred times a day. It also seems like I would need to reprocess the entire hash afterwards to find the blocks of time that are free, which seems pretty inefficient.
Any ideas on a better way to do this?
Thanks.
I've done this a couple of ways. First, my assumption is that your table shows appointments, and now you want to get a list of un-booked time, right?
So, the first way I did this was like yours, just a hash of unused times. It's slow and limited and a little wasteful, since I have to re-calculate the hash every time someone needs to know the times that are available.
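A rough sketch of that hash-of-slots idea (shown in Python for brevity, since the original app is Rails; the names and the source of the booked times are illustrative):

from datetime import datetime, timedelta

SLOT = timedelta(minutes=15)

def free_slots(range_start, range_end, booked):
    # booked is a list of (start, end) datetime pairs; returns the free 15-minute slot starts
    slots = {}
    t = range_start
    while t < range_end:
        slots[t] = True          # default every slot in the range to "available"
        t += SLOT
    for start, end in booked:    # flip slots covered by a booking to "unavailable"
        t = start
        while t < end:
            if t in slots:
                slots[t] = False
            t += SLOT
    return [t for t, available in sorted(slots.items()) if available]

# Example: one booking on May 1 from 09:00 to 09:30
booked = [(datetime(2023, 5, 1, 9, 0), datetime(2023, 5, 1, 9, 30))]
print(free_slots(datetime(2023, 5, 1, 8, 0), datetime(2023, 5, 1, 10, 0), booked))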
The next way I did this was to borrow an idea from the data warehouse people: I build an attribute table of all the time slots I'm interested in. If you build this kind of table, you may want to put more information in there besides the slot times. You might also include things like whether it's a weekend, which hour of the day it's in, whether it's during regular business hours, whether it's on a holiday, that sort of thing. Then I join all the slots between my start and end times to the appointments and keep the rows where the appointment is null. So this is a LEFT JOIN, something like:
SELECT slots.*
FROM slots
LEFT JOIN appointments
  ON appointments.start_time = slots.start_time   -- the join condition depends on your schema
WHERE slots.start_time BETWEEN ... AND ...
  AND appointments.id IS NULL
That keeps me from having to re-create the hash every time, and it's using the database to do the set operations, something the database is optimized to do.
Also, if you make your slots table a little rich, you can start doing all sorts of queries, not only about the available slots you may be after, but also about the kinds of times that tend to get booked, the kinds of times that tend to always be available, or other interesting questions you might want to answer some day. At the very least, you should keep the fields that tell you whether a slot should be bookable at all (for example, whether it falls within business hours).
Why not have a flag in the row that indicates this? As time is allocated, flip the flag for every date/time in the appropriate range. For example, May 2, 12pm to 1pm would be marked as not available.
Then it's a simple matter of querying the date range for every row that has the availability flag set to true.