Finding the people and book capacity for a new library through analysis, and testing the result

A friend of mine was asked this question, and I would like to know what an ideal response to it would be.
If you were asked to analyze how many people and books a new public library in your area should accommodate, and where such a library should be located, what inputs would you need to perform the analysis, and what key factors would need to be considered? How would you test whether your analysis was accurate?
My analysis was as follows, covering the book and people capacity as well as the location:
1. For the location, I would ask questions such as: Where are the city's most populated educational institutes located? Where do most college students go after their college or tuition classes? Which locations are entertainment hotspots that attract people (for example, malls and shopping centers)? Which location is best connected to other parts of the city via public transport routes such as bus, train, and metro?
I think answering these would give an idea of where the library should be established.
2. For the people capacity: counting the students present in the given area, one could estimate the average student population as a reference number (though I'm not sure this qualifies as a correct method). One could also take an existing library elsewhere, find out how many people visit it, and derive a rough estimate (the same can be done for the book capacity). Finally, one could count how many working professionals, residents, and students inhabit the area and estimate a range for how many people will visit the library.
3. For the book capacity: a portion of the books would be educational, chosen according to the educational backgrounds of the majority of students, which could be estimated from the number of students residing there. Besides educational books, the number of fiction and non-fiction books could be based on the average number of best sellers in a given month. And lastly, we can again compare with existing libraries in the vicinity.
For testing the hypothesis, the only way I can think of is conducting surveys or asking people first-hand in a Q&A interview format near college areas. Any other suggestions?
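To make the "estimate a range" step concrete, here is a back-of-envelope sketch of how the segment counts could be combined into a seating capacity. Every segment size, visit rate, and factor below is a made-up assumption to illustrate the arithmetic, not real data:

```python
# Back-of-envelope capacity estimate. All segment sizes and rates are
# invented placeholders; replace them with census/survey numbers.
SEGMENTS = {
    # segment: (population in catchment area, expected visits per week per person)
    "students": (8000, 1.0),
    "working professionals": (12000, 0.2),
    "other residents": (20000, 0.1),
}

OPEN_HOURS_PER_WEEK = 70   # assumed opening hours
AVG_VISIT_HOURS = 1.5      # assumed average visit length
PEAK_FACTOR = 2.0          # assumed peak-hour load vs. the average hour

def weekly_visits(segments):
    """Total expected visits per week across all population segments."""
    return sum(pop * rate for pop, rate in segments.values())

def seating_capacity(segments):
    """Seats needed so the library copes with its busiest hours."""
    avg_concurrent = weekly_visits(segments) * AVG_VISIT_HOURS / OPEN_HOURS_PER_WEEK
    return round(avg_concurrent * PEAK_FACTOR)

print(weekly_visits(SEGMENTS))     # 12400.0 visits per week
print(seating_capacity(SEGMENTS))  # 531 seats at peak
```

The same survey-based testing idea applies here: the assumed visit rates are exactly the quantities a survey near college areas would let you check and correct.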

Can Optaplanner handle the Dial-A-Ride Problem (aka Uber Pooling)?

Optaplanner looks like it's great at vehicle routing for problems involving single entities, such as a fleet of taxis transporting individual customers between locations.
But what about more complex systems where a vehicle is shared by multiple people at once, such as DARP or Uber pooling, where a route could look something like:
Pick up customer 1 -> Pick up customer 2 -> Drop off customer 1 -> Pick up customer 3 -> Drop off customer 2 -> Drop off customer 3
As per the description of DARP:
The Dial-a-Ride Problem (DARP) consists of designing vehicle routes and schedules for n users who specify pickup and delivery requests between origins and destinations. The aim is to plan a set of m minimum cost vehicle routes capable of accommodating as many users as possible, under a set of constraints. The most common example arises in door-to-door transportation for elderly or disabled people.
Is this sort of thing possible with Optaplanner?
I looked through the documentation to grasp what Optaplanner can do, but I'm not too sure where its limits lie.
In theory, doing a mixed VRP in OptaPlanner is possible. In practice, we have not yet gotten around to finding the best possible model which we could recommend to users.
We have an old JIRA for it where some proposals were outlined, but no definitive conclusion was reached.
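Whichever model ends up being used, the hard constraints of the example route above (every pickup precedes its dropoff, and the number of passengers on board never exceeds vehicle capacity) can be stated as a simple feasibility check. A minimal, solver-independent sketch in Python; the function and route representation are illustrative, not OptaPlanner API:

```python
# A route is a list of (customer, "pickup"/"dropoff") stops.
def is_feasible(route, capacity):
    on_board = set()
    for customer, action in route:
        if action == "pickup":
            on_board.add(customer)
            if len(on_board) > capacity:
                return False          # capacity constraint violated
        else:  # dropoff
            if customer not in on_board:
                return False          # dropoff before pickup
            on_board.remove(customer)
    return len(on_board) == 0         # everyone must be dropped off

route = [(1, "pickup"), (2, "pickup"), (1, "dropoff"),
         (3, "pickup"), (2, "dropoff"), (3, "dropoff")]
print(is_feasible(route, capacity=2))  # True
print(is_feasible(route, capacity=1))  # False: customers 1 and 2 overlap
```

In an OptaPlanner model, checks of this shape would live in the score calculation as hard constraints, while the solver searches over stop orderings.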

How to design the models in OptaPlanner in my case

I started learning OptaPlanner some time ago, and I am trying to figure out a model design for my use case so I can proceed with the solution calculation. Here is my real-world case from a factory's production line:
A work order involves a list of sequential processes
Each kind of machine can handle a fixed set of process types (assume the machine quantity is sufficient)
The involved team has a number of available employees; each employee has the skills for a set of processes, each with their own working time cost
The production line has a fixed number of stations available
Each station holds one machine/employee pair, or is left empty
Question: how to design the model to calculate the maximum output of completed products in one day?
Confusion: in this case, a single station will have one employee and one machine populated, plus dynamically assigned processes to work on, but the input factors refer to each other and are dynamic: employee => process skills, process skill => machines.
Can you please guide me on how to design the models?
Maybe some of the examples are close to your requirements. See their docs here. Specifically the task assignment, cheap time scheduling or project job scheduling examples.
Otherwise, follow the domain modeling guidelines.
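As a language-neutral starting point (plain Python, not OptaPlanner code; all class and field names are illustrative), the facts in the question could be modeled roughly like this, with the station's machine/employee/process assignment playing the role of the planning variables and the skill/machine compatibility as hard constraints:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Problem facts: fixed input data, mirroring the question's description.
@dataclass
class Process:
    name: str

@dataclass
class Machine:
    kind: str
    supported: Set[str]           # process names this machine kind can handle

@dataclass
class Employee:
    name: str
    cost_minutes: Dict[str, int]  # process name -> this employee's working time

# Planning entity: the solver would vary these three assignments per station.
@dataclass
class Station:
    id: int
    machine: Optional[Machine] = None
    employee: Optional[Employee] = None
    process: Optional[Process] = None

def is_valid(station: Station) -> bool:
    """Hard constraints: machine capability and employee skill must line up."""
    if station.process is None:
        return True               # an empty station is allowed
    return (station.machine is not None
            and station.employee is not None
            and station.process.name in station.machine.supported
            and station.process.name in station.employee.cost_minutes)

# Illustrative data
cutting = Process("cutting")
saw = Machine("saw", {"cutting"})
alice = Employee("alice", {"cutting": 12})
s = Station(1, machine=saw, employee=alice, process=cutting)
print(is_valid(s))  # True
```

The "maximum output per day" objective would then be a soft score derived from each employee's `cost_minutes` for the process assigned to their station.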

Choosing a chat-bot framework for a data science research project and understanding the hidden costs of development and rollout

The question is about using a chat-bot framework in a research study, where one would like to measure the improvement of a rule-based decision process over time.
For example, we would like to understand how to improve the process of medical condition identification (and treatment) using the minimal set of guided questions and patient interaction.
Medical conditions can be formulated into workflow rules by doctors. A possible technical approach for such a study would be developing an app or website that patients can access, where they ask free-text questions that a predefined rule-based chat-bot addresses. During the study a doctor will monitor the collected data, improve the rules and possible responses, and provide new responses when the workflow reaches a dead end. We do plan to collect the conversations and apply machine learning to generate an improved workflow tree (and questions) over time; however, the plan is to do any data analysis and processing offline. There is no intention of building a full product.
This is a low-budget academic study. The PhD student has good development skills and data science knowledge (Python) and will be accompanied by a fellow student who will work on the engineering side. One of the conversational-AI options recommended for data scientists was RASA.
I have spent the last few days reading about and playing with several chat-bot solutions (RASA, Botpress) and also looked at Dialogflow, and the tons of comparison material I read make the choice even more challenging.
From sources on the internet it seems that RASA might be a better fit for data science projects; however, it would be great to get a sense of the real learning curve and how fast one can expect to have a working bot, especially one whose rules must be continuously updated.
A few things to clarify: we do have data to generate the questions and are in touch with doctors to improve their quality. It seems we need a way to present participants with multiple choices and provide answers (not just free text). Being on the research side, there is also no need to align with any specific big provider (i.e., Google, Amazon, or Microsoft) unless it has a benefit. The important considerations are time, money, and flexibility: we would like to have a working approach in a few weeks (and continuously improve it), and the whole experiment will run for no more than 3-4 months. We need to be able to extract all the data. We are also not sure which channel is best for such a study (WhatsApp? A website? Other?) and what the involved complexities are.
Any thoughts about the challenges and considerations about dealing with chat-bots would be valuable.
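Independent of which framework is chosen, the doctor-editable rule base described above is essentially a small decision tree. A minimal sketch of that data structure, with dead ends flagged for the monitoring doctor; the node names, conditions, and texts are invented placeholders, not medical content:

```python
# A workflow node is either a question with answer -> next-node edges,
# or a terminal response.
WORKFLOW = {
    "start":    {"question": "Do you have a fever?",
                 "answers": {"yes": "fever", "no": "no_fever"}},
    "fever":    {"response": "Please measure your temperature twice a day."},
    "no_fever": {"response": "Please describe your main symptom."},
}

def step(node_id, answer=None):
    """Advance the workflow; unmatched answers are escalated as dead ends."""
    node = WORKFLOW[node_id]
    if "response" in node:
        return ("done", node["response"])
    nxt = node["answers"].get(answer)
    if nxt is None:
        return ("dead_end", node_id)   # the doctor adds a new rule here
    return ("ask", nxt)

print(step("start", "yes"))    # ('ask', 'fever')
print(step("start", "maybe"))  # ('dead_end', 'start')
```

Keeping the rules in a plain data structure like this (rather than inside framework-specific code) also makes the offline analysis and rule updates by non-developers easier, whichever chat-bot front end sits on top.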

How to encode inputs like artist or actor

I am currently developing a neural network that tries to make a suggestion for a specific user based on his recent activities. I will try to illustrate my problem with an example.
Now, let's say I'm trying to suggest new music to a user based on the music he recently listened to. Since people often listen to artists they know, one input of such a neural network might be the artists he recently listened to.
The problem is the encoding of this feature. Since the artist's ID in the database has no meaning for the neural network, the only other option that comes to my mind is one-hot encoding every artist, but that doesn't sound too promising either, given the thousands of different artists out there.
My question is: how can I encode such a feature?
The approach you describe is called content-based filtering. The intuition is to recommend items to customer A similar to previous items liked by A. An advantage of this approach is that you only need data about one user, which tends to result in a "personalized" recommendation. But the disadvantages include the construction of features (the problem you're dealing with now), the difficulty of building an interesting profile for new users, and the fact that it will never recommend items outside a user's content profile. As for the difficulty of representation, features are usually handcrafted and abstracted afterwards. For music specifically, features would be things like 'artist', 'genre', etc., and abstraction into informative keywords (if necessary) is widely done using tf-idf.
This may go outside the scope of the question, but I think it is also worth mentioning an alternative approach: collaborative filtering. Rather than similar items, here we instead try to find users with similar tastes and recommend products that they liked. The only data you need are some sort of user ratings or values of how much they (dis)liked certain items - eliminating the need for feature design. Furthermore, since we analyze similar persons rather than items for recommendation, this approach tends to also work well for new users. The general flow for collaborative filtering looks like:
Measure similarity between user of interest and all other users
(optional) Select a smaller subset consisting of most similar users
Predict ratings as a weighted combination of "nearest neighbors"
Return the highest rated items
A popular approach for the similarity weighting in the algorithm is based on the Pearson correlation coefficient.
Finally, something to consider here is the need for performance/scalability: calculating pairwise similarities for millions of users is not really light-weight on a normal computer.
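The four-step flow above can be sketched end to end in pure Python. The ratings dictionary is a tiny invented data set just to show the mechanics of user-based collaborative filtering with Pearson similarity:

```python
from math import sqrt

# user -> {item: rating}; tiny invented data set
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 5, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
}

def pearson(u, v):
    """Pearson correlation over the items both users rated (step 1)."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = (sqrt(sum((u[i] - mu) ** 2 for i in common))
           * sqrt(sum((v[i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(user, item):
    """Steps 2-3: similarity-weighted average over positive neighbors."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = pearson(ratings[user], r)
        if w > 0:                      # keep only similar users as neighbors
            num += w * r[item]
            den += w
    return num / den if den else None

def recommend(user):
    """Step 4: unseen items ranked by predicted rating."""
    unseen = {i for r in ratings.values() for i in r} - set(ratings[user])
    preds = {i: predict(user, i) for i in unseen}
    return sorted((i for i, p in preds.items() if p is not None),
                  key=lambda i: preds[i], reverse=True)

print(recommend("alice"))  # ['d'] - borrowed from similar user bob
```

At scale the pairwise `pearson` loop is exactly the scalability concern mentioned above; production systems replace it with nearest-neighbor indexes or matrix factorization.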

Need help on an ER Diagram for an automobile company

I'm working on a small uni database project, and I would like to know if my ER design is good enough for me to move on to further steps.
Further steps involve: Translating ER to Relational diagram, and basically implement it as a database for a database application, in which user can search and browse stuff through an interface.
Here's the project description:
The application is an automobile company, such as General Motors, Ford, Toyota, or Volkswagen (or maybe a company from yesteryear like Studebaker, Hudson, Nash, or Packard). In our hypothetical company, it has been decided to redesign a major part of the database that underlies company operations. Unfortunately, the manager assigned to solicit database design proposals is not very computer literate and is unable to provide a very detailed specification at the technical level. Fortunately, you are able to do that. The company needs to keep quite a bit of data, but we shall focus on the following aspects of corporate operations.
Vehicles: each vehicle has a vehicle identification number (VIN). Lots of stuff is encoded in real VINs (they are well described on Wikipedia), but you can just make them up if you want.
Brands: each company may have several brands (for example, GM has Chevrolet, Pontiac, Buick, Cadillac, GMC, Saturn, Hummer, Saab, Daewoo, Holden, Vauxhall, and Opel and Volkswagen has Volkswagen, Audi, Lamborghini, Bentley, Bugatti, Skoda, and SEAT)
Models: each brand offers several models (for example, Buick’s models are the Enclave, LaCrosse, and Lucerne, and Mercury’s models are the Mariner, Milan, Sable, and Grand Marquis). Each model may come in a variety of body styles (4-door, wagon, etc.)
Options: we’ll stick to color, and maybe engine and transmission.
Dealers and customers: dealers buy vehicles from the manufacturer and sell them to customers. We'll keep track of sales by date, brand, model, and color; and also by dealer. Note that a dealer may not sell any of the car company's brands. Dealers keep some cars in inventory. Some, of course, are already sold, but the dealer still keeps track of that fact.
Suppliers: suppliers supply certain parts for certain models.
Company-owned manufacturing plants: some plants supply certain parts for certain models; others do final assembly of actual cars.
Customers: in reality, lots of demographic data are gathered. We’ll stick to name, address, phone, gender, and annual income for individual buyers. The customer may also be a company (e.g. Hertz, Avis, or other companies that maintain corporate fleets, but we’ll skip that).
We’ll skip data on corporate finance, pending bailouts, bankruptcy status, etc. Not that these data are unimportant, but we need to keep the project within bounds.
Here's the ER diagram I came up with:
I worked on a multi-tenant car dealership database for a couple of years.
Some things to consider:
You need to differentiate between Products and Assets. The product is the thing you sell (just the specification of a car, with a model number), and the Asset is the thing the customer drives away in (it has a VIN).
You should consider the Party Model as you might sell to employees, buy from Customers, etc.
How to deal with trade-ins? They are probably best seen as an adjustment on a sales order.
How to sell goods, services, financial instruments (warranties) on the same sales order? You need abstraction here.
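The Product/Asset distinction in the first point can be sketched like this (class and field names are illustrative, and the VINs are made up, as the assignment permits):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Product:
    """The catalog specification you sell: brand, model, body style, options."""
    model_number: str
    brand: str
    model: str
    body_style: str
    color: str

@dataclass
class Asset:
    """A physical vehicle the customer drives away in; one VIN per asset."""
    vin: str
    product: Product                 # many assets share one Product spec
    sold: bool = False
    dealer_id: Optional[str] = None  # dealer holding it in inventory

spec = Product("GM-ENC-01", "Buick", "Enclave", "4-door", "red")
car1 = Asset("1GNEK13ZX3R298984", spec, dealer_id="D42")
car2 = Asset("1GNEK13ZX3R298985", spec, dealer_id="D42")
print(car1.product is car2.product)  # True: one spec, two physical vehicles
```

In the relational translation this becomes two tables with a foreign key from Asset to Product; sales, trade-ins, and inventory then all reference the Asset (the VIN), while pricing and options reference the Product.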