Machine learning challenge: technique for collecting the coins - optimization

Suppose a company owns a number of vending machines that collect coins. When a machine's coin safe is full, it cannot sell any new items. To prevent this, the company must collect the coins before the safe fills up. But if it sends the technician too early, it loses money on an unnecessary trip. The challenge is to predict the right time to collect the coins so as to minimize the cost of operation.
At each visit (to collect coins or for other operations), the level of coins in the safe is read, so the data contains a filling history for each machine.
What is the best ML technique or approach to tackle this problem computationally?

These are the two parts of the problem as I see it:
1) vending machine model
I would probably build a model for each machine using the historical data. Since you said a linear approach is probably not good, you need to think about the things that influence how fast a machine fills up: time-related factors such as day of the week and holidays, and perhaps other influences like the weather. So you need to attach these factors to the historical data to build a good predictive model. Many machine learning techniques can help to create such a model and find real correlations in the data. You could create descriptors from your historical data and try to correlate them with the filling state of a machine. PLS can help reduce the descriptor space and find the relevant ones. Neural networks are great if you really have no clue about the underlying math of a correlation. Play around with it, but pretty much any machine learning technique should be able to come up with a decent model.
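As a rough illustration (not a recommendation of any particular library), a per-machine model could be sketched like this; the column names, the file name and the choice of a random forest are all assumptions:

```python
# Sketch: predict the daily coin intake of one machine from calendar descriptors.
# Assumes a per-machine history with columns "date", "is_holiday" and "coins_added"
# (the increase in the safe level since the previous reading).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

history = pd.read_csv("machine_42_history.csv", parse_dates=["date"])  # hypothetical file

# Calendar descriptors: weekday, month, holiday flag, ...
X = pd.DataFrame({
    "weekday": history["date"].dt.dayofweek,
    "month": history["date"].dt.month,
    "is_holiday": history["is_holiday"],
})
y = history["coins_added"]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Forecast the next 14 days and accumulate to get the expected fill level.
future_dates = pd.date_range("2024-06-01", periods=14)
future = pd.DataFrame({
    "weekday": future_dates.dayofweek,
    "month": future_dates.month,
    "is_holiday": [0] * 14,
})
predicted_fill = model.predict(future).cumsum()
print(predicted_fill)  # cumulative predicted coins; compare against the safe capacity
```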
2) money collection
Model the cost of a single trip by the technician to a machine, taking into account the filling level of the machines and the cost of the trip. You can then send the technician on virtual collecting tours and calculate the total cost of collecting the money versus the revenue from the machines. Again, you could use a neural network with some evolutionary strategy to find an optimal set of trips and times. You can reuse the filling-level model from part 1 during this virtual optimization, since you will need to estimate the filling level of the machines on these virtual collection rounds.
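To make the idea of a "virtual collecting tour" concrete, here is a minimal sketch that evaluates one very simple policy (visit when the predicted fill crosses a threshold); the trip cost, capacity, lost-sales figure and forecast are all invented:

```python
# Sketch: evaluate one collection policy ("visit when predicted fill exceeds a threshold")
# over a simulated horizon. All cost figures below are made-up inputs.
TRIP_COST = 25.0           # cost of sending the technician once
CAPACITY = 1000.0          # coins the safe can hold
LOST_SALES_PER_DAY = 80.0  # revenue lost for each day the machine is full

def policy_cost(daily_fill_forecast, threshold):
    level, cost = 0.0, 0.0
    for fill in daily_fill_forecast:
        level += fill
        if level >= CAPACITY:
            cost += LOST_SALES_PER_DAY        # machine is full, sales are lost
        if level >= threshold * CAPACITY:
            cost += TRIP_COST                 # send the technician, empty the safe
            level = 0.0
    return cost

# Try a few thresholds and keep the cheapest; an evolutionary or grid search over all
# machines and routes would replace this loop in practice.
forecast = [60, 55, 90, 120, 80, 70, 65] * 4   # four simulated weeks of predicted intake
best = min((policy_cost(forecast, t), t) for t in (0.5, 0.6, 0.7, 0.8, 0.9))
print("best threshold:", best[1], "simulated cost:", best[0])
```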
Interesting problems you have...

Related

Calculating a production chain using a database (Factorio)

I'm playing the game Factorio, where you build a factory.
For the time being, I made a kind of flowchart in LibreOffice Calc to calculate how many machines I need to produce a certain material.
Example image from the spreadsheet
Each block has a recipe stored (blue). The recipe states what the block produces, what it needs, in what amounts, and how long a craft takes.
Each block takes the demand from the previous block (yellow) and, using the recipe, calculates how many machines (green) it needs to fulfill that demand.
Based on the number of machines, it then calculates its own demands (orange).
The following blocks do the same, until the last block is reached.
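Roughly, the calculation each block performs could be sketched like this (the iron-gear numbers are only an example):

```python
# Sketch of what one block does: given a demand (items/second) and a recipe, work out how
# many machines are needed and what those machines demand in turn. Numbers are examples.
def machines_needed(demand_per_s, output_per_craft, craft_time_s, machine_speed):
    crafts_per_s_per_machine = machine_speed / craft_time_s
    output_per_s_per_machine = crafts_per_s_per_machine * output_per_craft
    return demand_per_s / output_per_s_per_machine

def own_demand(demand_per_s, output_per_craft, inputs_per_craft):
    # inputs_per_craft: e.g. {"iron-plate": 2}
    crafts_per_s = demand_per_s / output_per_craft
    return {item: qty * crafts_per_s for item, qty in inputs_per_craft.items()}

# Example: 2 iron gears per second; the recipe makes 1 gear from 2 iron plates in 0.5 s,
# and the assembler has crafting speed 0.75.
print(machines_needed(2, 1, 0.5, 0.75))      # machines needed (green)
print(own_demand(2, 1, {"iron-plate": 2}))   # demand passed to the next block (orange)
```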
Doing this in a spreadsheet does work, but it is quite a tedious task.
I showed this to my dad, as I'm quite proud of what I made, and he said that maybe a database would be more suitable.
I definitely see its advantages. For example I could easily summarize the final demands of raw resources, or the total power consumption, etc.
So I got myself Microsoft Access, and I'm pretty lost now. I know the basics of databases and some SQL coding, but I'm not quite sure how I would build this.
My first attempt was:
one table for machines. It includes each machine's production speed and other relevant stats.
one table for recipes. Each recipe clearly states what it produces, what it needs, the amount of each, and whether or not it is basic. Basic means that it is a raw resource, i.e. the production chain ends there.
one table for units. Each unit has a machine, a recipe and an amount. For example, I would have one unit using basic assemblers to produce iron gears. The unit also stores how many machines it contains, so more machines mean it needs more and produces more.
I did manage to make a query that calculates the total in and outputs of all units based on their machine and recipe, as well as a total energy consumption.
However, that is nowhere near the spreadsheet I made.
For now we can probably set the graphical overlay aside; that would be quite a bit of overkill. However, what I do want to be able to do is:
enter how much I want of a certain resource
based on that entry, the database would create a new table. The first entry would be the unit that produces the requested resource. The second would fulfill the first's demand, the third fulfills the second's demand, and so on.
So in the end I would end up with a list of units that will produce my requested resource.
I hope someone can help me. There are programs out there that already do this kind of stuff, but I want to do this myself. If this is a problem that a database isn't suited for, then please tell me so.
Thanks for any help!
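For what it's worth, the expansion described above is essentially a walk down the recipe graph. A rough sketch of what the database (or a small program in front of it) would have to compute, with a made-up recipe table and a single placeholder machine speed:

```python
# Sketch: expand a requested item into the list of producing "units" described above,
# and sum up the total demand for basic (raw) resources. All data here is made up.
RECIPES = {
    # item: (output per craft, craft time in s, {input: amount}, is_basic)
    "gear":       (1, 0.5, {"iron-plate": 2}, False),
    "iron-plate": (1, 3.2, {"iron-ore": 1},   False),
    "iron-ore":   (1, 1.0, {},                True),
}
MACHINE_SPEED = 0.75  # placeholder: one machine type for everything

def expand(item, demand_per_s, units, raw):
    output, craft_time, inputs, is_basic = RECIPES[item]
    if is_basic:
        raw[item] = raw.get(item, 0) + demand_per_s
        return
    machines = demand_per_s * craft_time / (output * MACHINE_SPEED)
    units.append((item, demand_per_s, machines))
    crafts_per_s = demand_per_s / output
    for inp, qty in inputs.items():
        expand(inp, qty * crafts_per_s, units, raw)

units, raw = [], {}
expand("gear", 2.0, units, raw)          # "I want 2 gears per second"
for item, demand, machines in units:
    print(f"{item}: {demand}/s from {machines:.2f} machines")
print("raw resources needed:", raw)
```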

How to design the models in OptaPlanner in my case

I started learning OptaPlanner some time ago, and I am trying to figure out a model design for my use case so I can move on to the solution calculation. Here is my real-world case from a manufacturer's production line:
A work order involves a list of sequential processes.
Each kind of machine can handle a fixed set of process types (assume there are enough machines).
The team involved has a number of available employees; each employee has the skills for a set of processes, each with their own working time.
The production line has a fixed number of stations available.
Each station holds one machine and one employee, or is left empty.
Question: how do I design the model to calculate the maximum output of finished product in one day?
Confusion: in this case, a single station will be populated with one employee and one machine and assigned a changing set of processes to work on, but the input factors refer to each other and are dynamic: employee => process skills, process skill => machines.
Could someone please guide me on how to design the models?
Maybe some of the OptaPlanner examples are close to your requirements. See their docs here, specifically the task assignment, cheap time scheduling or project job scheduling examples.
Otherwise, follow the domain modeling guidelines.
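To make the domain modelling guidelines concrete, here is a rough, language-agnostic sketch (plain Python rather than OptaPlanner's Java API) of one way to carve up the domain; all class and field names are invented:

```python
# Language-agnostic sketch of a possible domain model (plain Python, not OptaPlanner code).
# In OptaPlanner, StepAssignment would be the @PlanningEntity and its station/employee/
# machine fields the planning variables; the score would reward finished products per day.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ProcessType:
    name: str                      # e.g. "welding"

@dataclass(frozen=True)
class Machine:
    name: str
    supported: frozenset           # ProcessTypes this machine can handle

@dataclass(frozen=True)
class Employee:
    name: str
    skill_minutes: dict            # ProcessType -> minutes this employee needs

@dataclass(frozen=True)
class Station:
    index: int                     # fixed number of stations on the line

@dataclass
class StepAssignment:
    """One process step of the work order placed on the line (the planning entity)."""
    process: ProcessType
    station: Optional[Station] = None    # planning variable
    employee: Optional[Employee] = None  # planning variable
    machine: Optional[Machine] = None    # planning variable

# Hard constraints (sketch): machine.supported must contain the process, the employee must
# have a skill entry for the process, and at most one assignment per station at a time.
# Soft constraint / score: maximise completed products within the working day, using
# employee.skill_minutes[process] as the task duration.
```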

Is it possible to model the Universe in an object oriented manner from the subatomic level upwards?

While I'm certain this must have been tried before, I can't seem to find any examples of this concept being done.
What I'm describing goes off the idea that you could effectively model all "things" that exist as objects. From there, you can make objects which use other objects. An example would be starting with the fundamental particles of physics, combining them to get particles like protons, neutrons and electrons, then atoms, and working your way up to the rest of chemistry, etc.
Has this been attempted before and is it possible? How would I even go about it?
If what you mean by "the Universe," is the entire actual universe, the answer to "Is it possible?" is a resounding "Hell no!!!"
Consider a single mole of H2O, good old water. By definition a mole contains ~6*10^23 molecules, and knowing the atomic weights involved yields the mass. The density of water is well known. Pulling all the pieces together, one mole is about 18 mL of water. To put that in perspective, the cough syrup dose cup in my medicine cabinet is 20 mL.
If you could represent the state of each molecule using a single byte (I doubt it!), you'd require ~10^11 terabytes of storage just to represent a snapshot of that mass, and you'd need to update that volume of data every delta-t for the duration you wish to simulate. Additionally, the number of 2-way interactions between N entities grows as O(N^2), i.e., on the order of 10^46 calculations would be involved, again at every delta-t. To put that into perspective, if you had access to the world's fastest current distributed computer with exaflop capability, it would take you O(10^28) seconds (on the order of 10^20 years) to perform the calculations for a single simulated delta-t update!
You might be able to improve that by playing games with locality, but given the speed of light and the small distances involved, you'd have to make a convincing case that heat transfer via thermal radiation couldn't cause state-altering interactions between any pair of molecules within the volume. To sum it up, the storage and calculation requirements are both infeasible for as little as a single mole of mass.
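For what it's worth, the arithmetic can be reproduced in a few lines under the same rough assumptions (one byte of state per particle, pairwise interactions, an exaflop machine):

```python
# Rough reproduction of the estimate: one mole of water, one byte of state per molecule,
# pairwise (O(N^2)) interactions, and an exaflop (1e18 FLOP/s) machine. Keeping the
# leading factor of ~6 makes the results an order of magnitude or two larger than the
# rounded figures quoted above; the conclusion (infeasible) is unchanged.
molecules = 6e23                       # ~ Avogadro's number
print(molecules / 1e12, "TB")          # storage at 1 byte each  -> ~6e11 TB
interactions = molecules ** 2          # pairwise interactions per delta-t -> ~3.6e47
seconds_per_step = interactions / 1e18 # on an exaflop machine
print(seconds_per_step, "s")           # ~3.6e29 s per simulated delta-t
print(seconds_per_step / 3.15e7, "years")  # ~1e22 years per delta-t
```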
I know from a conversation at a conference a couple of years ago that there are some advanced physics labs that have worked on this approach to get an idea of what happens with a few thousand atoms. However, I can't give specific references since I haven't seen the papers and only heard about it over a beer.

Algorithmically suggest best node to perform demanding computation

At work we perform demanding numerical computations.
We have a network of several Linux boxes with different processing capabilities. At any given time, there can be anywhere from zero to dozens of people connected to a given box.
I created a script to measure the MFLOPS (Million of Floating Point Operations per Second) using the Linpack Benchmark; it also provides number of cores and memory.
I would like to use this information, together with the load average (obtained using the uptime command), to suggest the best computer for performing a demanding computation. In other words: it's 3:00 pm; I have a meeting in two hours; I need to run a demanding process: which node will get me the answer fastest?
I envision a script which will output a suggestion along the lines of:
SUGGESTED HOSTS (IN ORDER OF PREFERENCE)
HOST1.MYNETWORK
HOST2.MYNETWORK
HOST3.MYNETWORK
Such a suggestion should favor fast computers (high MFLOPS) if their load average is low and, as the load average of a node increases, it should favor less busy nodes instead (i.e., I'd rather run on a slower computer with no users than on an eight-core box with forty dudes logged in).
How should I prioritize? What algorithm (rationale) would you use? Again, what I have is:
Load Average (1min, 5min, 15min)
MFLOPS measure
Number of users logged in
RAM (installed and available)
Number of cores (important to normalize the load average)
Any thoughts? Thanks!
You don't have enough data to make a well-informed decision. It sounds as though the scheduling is very volatile: "At any given time, there can be anywhere from zero to dozens of people connected to a given box." So the current load does not necessarily reflect the future load of the machines.
To properly assess which host someone should use to minimize computation time, you would need to know when the currently running jobs will terminate. If a powerful machine is about to finish most of its jobs, it would be a good candidate even though it currently has a high load.
If you want to guess purely from the current situation, you can do a weighted calculation to find out which hosts have the most MFLOPS available.
MFLOPS available = host's MFLOPS * (number of logical processors - load average) / number of logical processors
Sort the hosts by available MFLOPS and suggest them in descending order.
This formula assumes that a host's usable throughput drops linearly with its load average. That might not be exactly true, but it's probably fairly close.
I would favor the most recent (1-minute) load average since it's closest to the current/future situation, whereas jobs from 15 minutes ago might have completed by now.
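A minimal sketch of that ranking, assuming the inputs listed in the question (the host names and figures below are invented; in practice they would come from the Linpack script and uptime on each box):

```python
# Sketch: rank hosts by estimated available MFLOPS = MFLOPS * free-core fraction.
hosts = [
    # name, mflops, cores, load averages (1, 5, 15 min)
    ("host1.mynetwork", 42000, 16, (12.3, 10.1, 9.8)),
    ("host2.mynetwork", 18000,  8, ( 0.4,  0.7, 1.2)),
    ("host3.mynetwork", 30000, 16, ( 6.0,  5.5, 5.0)),
]

def mflops_available(mflops, cores, load):
    load_1min = load[0]                        # favour the most recent load average
    free_fraction = max(cores - load_1min, 0) / cores
    return mflops * free_fraction              # assumes throughput scales with free cores

ranked = sorted(hosts, key=lambda h: mflops_available(h[1], h[2], h[3]), reverse=True)
print("SUGGESTED HOSTS (IN ORDER OF PREFERENCE)")
for name, *_ in ranked:
    print(name.upper())
```

With these made-up numbers the lightly loaded machines win even though the busiest host has the highest raw MFLOPS, which is exactly the trade-off the question asks for.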
Have you considered a distributed approach to the computation? Not all computations can be broken up so that more than one CPU can work on them, but perhaps your problem space can benefit from some parallelization. Have a look at Hadoop.
You don't need to know the FLOPS. Beowulf-style clusters handle this kind of thing; the parallel computing centre I go to has a script for it, for sure:
PDC operates leading-edge, high-performance computers on a national level. PDC offers easily accessible computational resources that primarily cater to the ...

In terms of today's technology, are these meaningful concerns about data size?

We're adding extra login information to an existing database record on the order of 3.85KB per login.
There are two concerns about this:
1) Is this too much on-the-wire data added per login?
2) Is this too much extra data we're storing in the database per login?
Given today's technology, are these valid concerns?
Background:
We don't have concrete usage figures, but we average about 5,000 logins per month. We hope to scale to larger customers; however, that would still be in the tens of thousands of logins per month, not thousands per second.
In the US (our market) broadband has 60% market adoption.
Assuming you have ~80,000 logins per month, you would be adding ~ 3.75 GB per YEAR to your database table.
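The arithmetic behind that estimate, for anyone who wants to plug in their own login rate (a trivial sketch):

```python
# Storage growth for the extra login data; plug in your own login rate.
extra_kb_per_login = 3.85
logins_per_month = 80_000            # the assumption used above
gb_per_year = extra_kb_per_login * logins_per_month * 12 / 1_000_000
print(round(gb_per_year, 2), "GB per year")   # ~3.7 GB
```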
If you are using a decent RDBMS like MySQL, PostgreSQL, SQLServer, Oracle, etc... this is a laughable amount of data and traffic. After several years, you might want to start looking at archiving some of it. But by then, who knows what the application will look like?
It's always important to consider how you are going to be querying this data, so that you don't run into performance bottlenecks. Without those details, I cannot comment very usefully on that aspect.
But to answer your concern, do not be concerned. Just always keep thinking ahead.
How many users do you have? How often do they have to log in? Are they likely to be on fast connections, or damp pieces of string? Do you mean you're really adding 3.85K per time someone logs in, or per user account? How long do you have to store the data? What benefit does it give you? How does it compare with the amount of data you're already storing? (i.e. is most of your data going to be due to this new part, or will it be a drop in the ocean?)
In short - this is a very context-sensitive question :)
Given that storage and hardware are SOOO cheap these days (relatively speaking, of course), this should not be a concern. Obviously if you need the data then you need the data! You can use replication to several locations so that the added data doesn't need to travel as far over the wire (such as a server on the west coast and one on the east coast). You can manage your data by separating it by state to minimize the size of your tables (similar to what banks do: they have you choose your state as part of the login process so that they look in the right data store). You can use horizontal partitioning to minimize the number of records per table and keep your queries speedy. There are lots of ways to keep large data optimized. Also check into Lucene if you plan to do lots of reads on this data.
In terms of today's average server technology it's not a problem. In terms of your server technology it could be a problem. You need to provide more info.
In terms of storage, this is peanuts, although you want to eventually archive or throw out old data.
In terms of network (?) traffic, this is not much on the server end, but it will affect how fast your website appears to load and function for a good portion of customers. Although many have broadband, someone somewhere will try it over EDGE or a modem, or while using BitTorrent heavily; your site will appear slow or malfunction altogether, and you'll get loud complaints all over the web. Does it matter? If your users really need your service, they can surely wait; if you are developing the new Twitter, the page-load-time increase is hardly acceptable.