Employee shift scheduling optimization with multiple constraints

I am working on filling a daily schedule. During each day there are a number of tasks that have different shift lengths. For example, task_1 has three 8 hour shifts during a 24 hour period and task_2 has six 4 hour shifts during a 24 hour period. There is also a task which is only done during certain hours and consists of four 4 hour shifts during a 24 hour period.
On top of that, certain tasks require a group of employees with different experience levels. Employees are either simple workers or team leaders. The employees are divided into three teams, where each team consists of a number of leaders and workers. A constraint I have is that for tasks that require multiple employees, the employees need to be from the same team. For example, group_1 consists of 2 leaders and 3 workers [leader_1, leader_2, worker_1, worker_2, worker_3]. Task_1 requires a team leader and two employees, and they need to be from the same group. I can schedule [leader_1, worker_1, worker_2] to the first shift, but for the second shift I would only have 1 leader and 1 worker left and would need to schedule employees from another group.
I am trying to schedule employees so that there is a maximum amount of time between shifts, i.e. they have as much downtime between shifts as possible. I'm hoping to find a way to automate the scheduling and have been looking at examples of or-tools and pyomo. Right now I don't even have example code, since I'm first trying to understand whether what I'm attempting is feasible given the number of variables and constraints I have.
Any help would be much appreciated!

Related

MS Project - Assign a resource to multiple tasks that take place on the same day

I have a number of tasks that all are 1 day in duration and take place on the same day. I want to assign a resource to these tasks but only have it calculate for 1 day - it currently calculates 8 hours for each task (even though they're on the same day) for the resource.
Insert the "Work" column. Do this BEFORE you assign resources to the tasks. Let us assume that you have 4 one-day tasks, all on the same day, and you wish to have "Bob" work on these tasks. Enter "2 Hours" in the Work column for all 4 tasks, THEN assign Bob to all 4 tasks. Bob's total Work for the day will be 8 hours.

Distinctcount - suppliers for departments over a period of time - slow performance

In a model that contains the following dimensions:
- Time - granularity month - 5 years - 20 quarters - 60 months
- Suppliers- 6000 suppliers at lowest level
- departments - 500 departments on lowest level
I need to have the distinct count of the suppliers for each department.
I use the function:
with member [measures].[#suppliers] as
    distinctcount(([Supplier].[Supplier].[supplier].members
                   ,[Measures].[amount]))
select [Measures].[#suppliers] on 0
     , order([Department].[Department].[department].members, [Measures].[#suppliers], BDESC) on 1
from [cube]
where [Time].[Time].[2017 10]:[Time].[Time].[2018 01]
The time component may vary, as the dashboard user is free to choose a reporting period.
But the MDX is very slow. It takes about 38ms to calculate the measure for each row. I want to use this measure to rank the departments and to calculate a cumulative % and assign scores to these values. As you can imagine performance will not improve.
I have tried to use functions and cache the result, but the results got worse for me (about 2x worse, according to the log).
What can I do to improve the performance?
The fastest route is to add a measure in the schema definition that calculates the Distinct Count on the Supplier ID of the table behind [Measures].[Amount]. The other approaches do not scale as the number of Suppliers grows.
Nonetheless, why did you use DistinctCount instead of Count(NonEmpty())?
DistinctCount is mainly for calculating the number of distinct members/tuples in a set. It only makes sense if the set can contain the same member twice; since the initial members here have no duplicates, it adds nothing.
Count(NonEmpty()) filters the set down to the non-empty tuples and counts the items in the result, and this can easily be calculated in parallel.
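The distinction the answer draws can be illustrated outside MDX. In this plain-Python sketch (the supplier names and amounts are made up for illustration), deduplicating an already-unique member set is a no-op, so the extra work DistinctCount performs buys nothing:

```python
# Hypothetical (supplier, amount) cells for one department; None marks an
# empty cell, i.e. a supplier with no amount in the selected period.
cells = {"sup_a": 120, "sup_b": None, "sup_c": 80, "sup_d": None}

# Count(NonEmpty()) analogue: drop the empty cells, count what remains.
nonempty = [s for s, amount in cells.items() if amount is not None]

# DistinctCount additionally deduplicates -- but supplier members are
# already unique, so the result is identical.
assert len(set(nonempty)) == len(nonempty) == 2
```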

What is the best partitioning strategy for multiple distinct count measures in a cube

I have a cube that has a fact table with a month's worth of data. The fact table is 1.5 billion rows.
Fact table contains the following columns { DateKey,UserKey,ActionKey, ClientKey, ActionCount } .
The fact table contains one row per user per client per action per day with the no of activities done.
Now I want to calculate the below measures in my cube as follows
Avg Days Engaged per user
AVG([Users].[User Key].[User Key], [Measures].[DATE COUNT])
Users Engaged >= 14 Days
SUM([Users].[User Key].[User Key], IIF([Measures].[DATE COUNT] >= 14, 1, 0))
Avg Requests Per User
IIF([Measures].[USER COUNT] = 0, 0 ,[Measures].[ACTIVITY COUNT]/[Measures].[USER COUNT])
So to do this, I have created two distinct count measures, DATE COUNT and USER COUNT, which are distinct aggregations on the DateKey and UserKey columns of the fact table. I now want to partition the measure groups (there are 3 of them, because each distinct count measure goes into its own measure group).
What is the best strategy to partition the cube? I have read the analysis service distinct count guide end-end and it mentioned that partitioning the cube by non-overlapping user ids is the best strategy for single user queries and user X time is the best for single user time-set queries.
I want to know which of these I should use:
- 75 partitions (1.5 billion rows / 20 million rows per partition), each holding a non-overlapping, sequential range of user ids
- 31 partitions, one per day, with overlapping user ids but distinct days in each partition
- 31 * 3 = 93 partitions, i.e. one partition per day, with each day further split into 3 equal parts with non-overlapping user ids within the day (users will still overlap between days)
- 45 partitions of unequal size by ActionKey, since the measures are most often sliced by Action
I'm a bit confused because the paper only talks about optimizing a single distinct count measure, whereas I need distinct counts on both users and dates for my measures.
Any tips?
I would first take a step back and try the Many-to-Many dimension count technique to achieve Distinct Count results without the overhead of actual Distinct Count aggregations.
Probably the best explanation of this is the "Distinct Count" section of the "Many to Many Revolution 2.0" paper:
http://www.sqlbi.com/articles/many2many/
Note Solution C is the one I am referring to.
You usually find this solution scales much better than a standard "Distinct Count" measure. For example I have one cube with 2b rows in the biggest Fact (and only 4 partitions), and a "M2M Distinct Count" fact on 9m rows - performance is great e.g. 6-7 hours to completely reprocess all the data, less than 5 seconds for most queries. The server is OK but not great e.g. VM, 4 cores, 32 GB RAM (shared with SQL, SSRS, SSIS etc), no SSD.
I think you can get carried away with too many partitions and overcomplicating the design. The basic engine can do wonders with careful design.

Flight Schedule DB Model SQL Server

I am trying to create a DB model in SQL Server for storing flight schedules (not real time). I have come up with 2 DB models but am confused about which one to choose.
Approach 1:
For each flight, store the schedule in a single column (e.g. 123X56X) along with flight name, depart time, arrival time, source and destination. 123X56X means that the flight is available on Sunday (1), Monday (2), Tuesday (3), Thursday (5) and Friday (6).
Approach 2:
Keep the flight name, depart time, arrival time, source, destination in one table and create a new mapping table for schedules.
Table1 - wk_days
wk_day_id  wk_day_short  wk_day_long
1          Sun           Sunday
2          Mon           Monday
Table2 - flight_schedule
flight_sch_id  flight_id  src_city_id  dest_city_id  depart_tm  arrival_tm  duration
1              1          1            2             6:00       8:00        2:00
Table3 - flight_schedule_wk_days
flight_sch_id wk_day_id
1 2
1 3
1 4
2 2
2 3
2 4
Please suggest, which one is better?
A flight schedule database is actually quite a bit more complicated in the real world than either of your examples. (More on this in a moment.)
To answer your question: In general the normalized database approach is a better idea, especially for a transactional database. The second design is normalized. Your first option is reminiscent of old COBOL flat file systems like the original SABRE system.
Using a normalized approach makes your queries much easier and more efficient. Finding out which flights fly on a Tuesday means scanning and doing an in-string analysis on every record under option 1. In option 2 your database can use an index to answer this question without having to read and analyze each record.
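To make that point concrete, here is a small sketch of the normalized design using Python's built-in sqlite3. The table and column names follow the question's Approach 2, and the data is the sample row from the question; the Tuesday lookup becomes a plain indexed join instead of string parsing:

```python
# Sketch of Approach 2 with Python's built-in sqlite3, using the sample
# row from the question. Table/column names follow the question's tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE flight_schedule (
    flight_sch_id INTEGER PRIMARY KEY,
    flight_id     INTEGER,
    src_city_id   INTEGER,
    dest_city_id  INTEGER,
    depart_tm     TEXT,
    arrival_tm    TEXT,
    duration      TEXT
);
CREATE TABLE flight_schedule_wk_days (
    flight_sch_id INTEGER REFERENCES flight_schedule (flight_sch_id),
    wk_day_id     INTEGER
);
CREATE INDEX ix_wk_day ON flight_schedule_wk_days (wk_day_id);
""")
con.execute("INSERT INTO flight_schedule VALUES (1, 1, 1, 2, '6:00', '8:00', '2:00')")
con.executemany("INSERT INTO flight_schedule_wk_days VALUES (?, ?)",
                [(1, 2), (1, 3), (1, 4)])

# "Which flights fly on a Tuesday?" (wk_day_id = 3) is an indexed join,
# not an in-string scan of every record as in option 1.
rows = con.execute("""
    SELECT fs.flight_id
    FROM flight_schedule fs
    JOIN flight_schedule_wk_days d ON d.flight_sch_id = fs.flight_sch_id
    WHERE d.wk_day_id = 3
""").fetchall()
```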
On a broader note, a flight is not just an origin and a destination at a particular time on some set of days of the week. Here are some things that a real-world flight schedule database needs to be able to handle:
Flights have an airline identifier
Flights have an operator airline, which can be different from the seller (i.e. "code-share")
Flights can have multiple legs (i.e. multiple sets of origins and destinations under one number)
Flights as defined by scheduling systems do have days of the week, like your model, but they also need to have a start date and end date for the date range in which the flight will operate.
Depending on what your application is intended to do, you might need to take some or all of these into account.

Need ideas/advices about a database structure

Suppose we have 100+ hotels, and each hotel has at least 3 room types.
I want to hold each hotel's capacity for one year in the past and one year in the future. How should I design the database for easiest use?
Example:
A hotel has 30 rooms: 10 x "Standard room", 10 x "Duplex room", 10 x "Delux room". I will keep this example to standard rooms. Today is 13.01.2011, and I want to keep records from 13.01.2010 to 13.01.2012. What I will store in the database is available rooms. Something like this (for standard rooms):
13.01.2011: 10
14.01.2011: 9 (means 1 standard room sold for this day)
15.01.2011: 8 (means 2 standard rooms sold for this day)
16.01.2011: 10 (all available for this day)
17.01.2011: 7 (means 3 standard rooms sold for this day)
18.01.2011: 10
etc...
Thanks in advance.
Let me try to summarize your question to see if I understand it properly:
You have a set of Hotels. Each Hotel has a set of Rooms. Each Room belongs to one of a number of possible Room Types. The lowest level of detail we're interested in here is a Room.
This suggests a table of Hotels, a lookup table of Room Types, and a table of Rooms: each Room will have a reference to its associated Hotel and Room Type.
For any given day, a room is either booked (sold) or not booked (let's leave off partial days for simplicity at this point). For each day in the year before and the year after the current day, you wish to know how many rooms of each type were available (non-booked) at each hotel.
Now, since hotels need to be able to look at bookings individually, it's likely you would maintain a table of bookings. But these would typically be defined by a Room, a Start Date, and a number of Nights, which isn't ideal for your stated reporting purposes: it isn't broken down by day.
So you may wish to maintain a "Room Booking Log" table, which simply contains a record for each room booked on each day: this could be as simple as a datestamp column plus a Room ID.
This sort of schema would let you generate the output you're describing relatively easily via aggregate queries (displaying the sum of rooms booked per day, grouped by hotel and room type, for example). The model also seems like it would lend itself to an OLAP cube.
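As a concrete sketch of the Room Booking Log idea, here is a small example using Python's built-in sqlite3. All table names, ids and figures are illustrative, based on the question's 10-standard-room example; an aggregate query derives the per-day availability the question wants to store:

```python
# Sketch of the Room Booking Log idea with Python's built-in sqlite3.
# All ids and figures are illustrative, based on the question's
# 10-standard-room hotel; dates use ISO format for sortability.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE room_types (room_type_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rooms (room_id      INTEGER PRIMARY KEY,
                    hotel_id     INTEGER,
                    room_type_id INTEGER);
-- one record per room booked per day
CREATE TABLE room_booking_log (booked_on TEXT, room_id INTEGER);
""")
con.execute("INSERT INTO room_types VALUES (1, 'Standard')")
con.executemany("INSERT INTO rooms VALUES (?, 1, 1)",
                [(i,) for i in range(1, 11)])  # hotel 1: 10 standard rooms
# one room booked on 2011-01-14, two on 2011-01-15
con.executemany("INSERT INTO room_booking_log VALUES (?, ?)",
                [("2011-01-14", 1), ("2011-01-15", 1), ("2011-01-15", 2)])

# available standard rooms per day = total standard rooms - booked that day
avail = con.execute("""
    SELECT l.booked_on,
           (SELECT COUNT(*) FROM rooms
             WHERE hotel_id = 1 AND room_type_id = 1) - COUNT(*) AS free
    FROM room_booking_log l
    JOIN rooms r ON r.room_id = l.room_id
    WHERE r.room_type_id = 1
    GROUP BY l.booked_on
    ORDER BY l.booked_on
""").fetchall()
```

This yields 9 available rooms on 2011-01-14 and 8 on 2011-01-15, matching the question's sample figures, without storing a precomputed availability number per day.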
I did a homework question like this once. Basically you need at least 3 tables: one that holds the rooms, one that holds the reservations, and another table that links the two, because it's not a specific room that is reserved at a given time, it's a specific type of room.