L_mult = {a^i b^(ij) c^j : i, j ≥ 0}: what would the state diagram of a Turing machine for this language look like?

Here the number of b's must equal the number of a's times the number of c's. How can I draw the Turing machine for this language?
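
Before drawing states it can help to pin down the language itself. A minimal Python membership check (my own sketch, not a state diagram):

    import re

    def in_L_mult(s: str) -> bool:
        # True iff s = a^i b^(i*j) c^j for some i, j >= 0.
        m = re.fullmatch(r"(a*)(b*)(c*)", s)
        if m is None:
            return False                  # symbols out of order
        i, nb, j = (len(g) for g in m.groups())
        return nb == i * j                # number of b's = i * j

    assert in_L_mult("aabbbbcc")          # i = 2, j = 2, four b's
    assert in_L_mult("")                  # i = j = 0
    assert not in_L_mult("aabbbcc")       # 3 b's, but i * j = 4

A Turing machine typically implements the same check by crossing off one b for every (a, c) pair: mark an a, sweep over the c's and cross off one unmarked b per c, unmark the c's, and repeat for the next a; accept iff every b is crossed off at the end.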

Dependence of data transfer latency on number of target nodes

Suppose some fixed size data is to be transferred between machines. Let us consider two scenarios:
1 target: Data is transferred from machine A to machine B. Suppose this takes x seconds.
2 targets: Data is transferred from machine A to machine B and C. Suppose this takes y seconds.
What can be said about the relationship between x and y? I think y should usually be greater than x. Am I right? Is there any work/paper/blog that gives such an insight?
I think the relationship between x and y should also depend on the topology of the network connecting the machines.
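
As a back-of-envelope illustration (my own assumptions, not from any paper), a toy cost model already shows the topology dependence:

    # Toy model: transfer time ~ size / bandwidth + setup latency (all numbers hypothetical).
    size = 1.0     # GB
    bw = 1.0       # GB/s per link
    lat = 0.001    # s setup cost per transfer

    x = size / bw + lat                    # 1 target: A -> B

    y_shared = 2 * (size / bw) + 2 * lat   # B and C behind A's single NIC: y ~ 2x
    y_parallel = size / bw + lat           # independent links or multicast: y ~ x

    print(f"x = {x:.3f}s, y_shared = {y_shared:.3f}s, y_parallel = {y_parallel:.3f}s")

A relay chain A -> B -> C sits in between: store-and-forward costs about 2x, while pipelining the data in chunks brings it back close to x.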

Algorithm for Minimising Sum of Distance of agents to visiting targets

I'm working on implementing a model in Python. As part of this model, I have a set of agents (e.g. humans) that need to visit a set of targets (e.g. places). Each agent has its own initial location (i.e. starting point) and I can calculate the distance from each agent to each target.
What I need at this point is to allocate a first job to each agent such that the sum of all travel distances, from each agent's starting location to its first job, is minimized.
I considered a greedy algorithm, but I found examples proving that the order of allocation can lead to non-optimal solutions. I also looked into the nearest-neighbour heuristic for the TSP, but everything I could find was for one agent (salesman), not multiple.
Could someone point me to any (non-exhaustive search) algorithm/approach that could be used for this purpose please? Thanks
If the number of agents = the number of targets, we end up with a standard assignment problem. This can be solved in different ways:
as an LP (linear programming) problem. Technically it is a MIP, but the variables are automatically integer-valued, so an LP solver suffices.
as a network problem (e.g. min-cost flow)
or using specialized algorithms (e.g. the Hungarian algorithm).
If, say, the number of locations > the number of agents, we can still use an LP/MIP:

    min  sum((i,j), d(i,j)*x(i,j))
    s.t. sum(j, x(i,j)) = 1   for all agents i      (each agent is assigned to exactly one location)
         sum(i, x(i,j)) <= 1  for all locations j   (each location gets at most one agent)
         x(i,j) in {0,1}
For the network approach, we would need to add some dummy nodes.
All these methods are quite fast (this is an easy model). To give you an indication: I solved a random example with 500 agents and 1000 locations as an LP and it took 0.3 seconds.
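
For a concrete starting point (my addition, not part of the original answer), SciPy's linear_sum_assignment solves exactly this rectangular assignment problem:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    n_agents, n_targets = 5, 8                       # more targets than agents is fine
    d = rng.uniform(0, 100, (n_agents, n_targets))   # d[i, j] = distance agent i -> target j

    # Minimizes sum d[i, j] over a matching: each agent gets exactly one target,
    # each target at most one agent (rectangular inputs are supported).
    agents, targets = linear_sum_assignment(d)

    for i, j in zip(agents, targets):
        print(f"agent {i} -> target {j} ({d[i, j]:.1f})")
    print("total distance:", d[agents, targets].sum())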

Karger's Algorithm - Running Time - Edge Contraction

In Karger's min-cut algorithm for undirected (possibly weighted) multigraphs, the main operation is to contract a randomly chosen edge and merge its endpoints into one metavertex. This process is repeated until two vertices remain; these correspond to a cut. The algorithm can be implemented with an adjacency list.
Questions:
How can I find the particular edge that has been chosen to be contracted?
How does an edge get contracted (in an unweighted and/or weighted graph)?
Why does this procedure take quadratic time?
Edit: I have found some information saying that the runtime is quadratic because we perform n - 2 contractions and each contraction can take O(n) time. It would be great if somebody could explain why a contraction takes linear time in an adjacency list. Note that a contraction consists of: deleting the chosen edge, merging the two endpoints into a supernode, and making sure that the remaining incident edges point to the supernode.
Pseudocode:
procedure contract(G = (V, E)):
    while |V| > 2:
        choose an edge uniformly at random
        contract its endpoints
        delete self-loops
    return the cut between the two remaining vertices
I have read the related topic Karger Min cut algorithm running time, but it did not help me. Also, I do not have much experience, so a "layman's terms" explanation would be very much appreciated!
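
To make the three steps above concrete, here is a minimal Python sketch (my own illustration, not from the linked topic), assuming a connected, unweighted multigraph without self-loops, stored as a dict of adjacency lists in which every undirected edge appears in both endpoint lists:

    import random

    def karger_cut(adj):
        # adj: dict vertex -> list of neighbours; parallel edges are repeated.
        adj = {v: list(ns) for v, ns in adj.items()}   # work on a copy
        while len(adj) > 2:
            # Choose an edge uniformly at random: pick a vertex weighted by
            # its degree, then a uniformly random incident edge.
            u = random.choices(list(adj), weights=[len(ns) for ns in adj.values()])[0]
            v = random.choice(adj[u])
            # Contract v into the supernode u: redirect every edge touching v.
            for w in adj.pop(v):
                adj[w] = [u if x == v else x for x in adj[w]]  # linear scan
                if w != u:
                    adj[u].append(w)
            # Delete the self-loops created by the parallel u-v edges.
            adj[u] = [x for x in adj[u] if x != u]
        return len(next(iter(adj.values())))           # edges crossing the cut

    # Square 1-2-3-4 plus the diagonal 2-4; the minimum cut is 2.
    g = {1: [2, 4], 2: [1, 3, 4], 3: [2, 4], 4: [1, 2, 3]}
    print(min(karger_cut(g) for _ in range(50)))

Redirecting v's edges scans the adjacency lists of v's neighbours, which is why a single contraction is linear in the size of the graph; with an adjacency matrix the merge is a clean O(n) row/column addition, and n - 2 such contractions give the overall O(n^2) bound.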

How to create a loss function for an unsupervised-learning model, where the output is the direct input for a game agent?

I'm trying to set up a deep neural network which predicts the next move for a game agent navigating a world. To control the game agent, it takes two float inputs. The first one controls the speed (0.0 = stop/do not move, 1.0 = max speed). The second controls the steering (-1.0 = turn left, 0.0 = straight, +1.0 = turn right).
I designed the network so that it has two output neurons: one for the speed (with a sigmoid activation) and one for the steering (with a tanh activation). The actual input I want to feed the network is the pixel data plus some game-state values.
To train the network, I would simply run a whole game (about 2000 frames/samples) and then train the model once the game is over. Here is where I struggle: what would my loss function look like? While playing, I collect all actions/outputs of the network, the game state, and the reward per frame/sample. When the game is done, I also know whether the agent won or lost.
Edit:
This post http://karpathy.github.io/2016/05/31/rl/ inspired me. Maybe I could take the discounted (move, turn) value pairs, multiply them by (-1) if the game agent lost and by (+1) if it won, and then use these values as gradients to update the network's weights?
It would be nice if someone could help me out here.
All the best,
Tobs.
The problem you are describing belongs to reinforcement learning, where an agent interacts with an environment and collects data: the game states, its actions, and the reward/score it gets at the end. There are many approaches.
The one you are describing is a policy-gradient method. The objective is E[sum r] (the expected total score, which has to be maximized), and its gradient is A * grad(log p_theta), where A is the advantage function, here +1/-1 for winning/losing, and p_theta is the probability of choosing the action under the policy parameterized by theta (the neural network). After a win, the +1 updates the network in favor of the actions taken, and vice versa after a loss.
Note: there are many ways to design A; in this case +1/-1 is chosen.
You can read about this in more detail here.
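
As an illustration only (my sketch, not the answerer's code), here is what that loss can look like in PyTorch, treating the two outputs as means of Gaussians with a fixed standard deviation; speed_mean, steer_mean, actions and advantage are hypothetical tensors collected over one game:

    import torch
    from torch.distributions import Normal

    def policy_gradient_loss(speed_mean, steer_mean, actions, advantage, std=0.1):
        # REINFORCE-style loss: -A * log p_theta(action). Minimizing it does
        # gradient ascent on E[A * log p_theta], matching the formula above.
        mean = torch.stack([speed_mean, steer_mean], dim=-1)    # shape (frames, 2)
        log_prob = Normal(mean, std).log_prob(actions).sum(-1)  # joint log-prob per frame
        return -(advantage * log_prob).mean()                   # average over the game

    # Hypothetical usage: advantage = +1.0 for a win, -1.0 for a loss
    # (optionally discounted per frame, as in the Karpathy post).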

k-means empty cluster

I'm trying to implement k-means as a homework assignment. My exercise sheet gives the following remark regarding empty clusters:
During the iterations, if any of the cluster centers has no data points associated with it, replace it with a random data point.
That confuses me a bit: firstly, Wikipedia and the other sources I read do not mention this at all. I have also read about the problem of 'choosing a good k for your data', and I wonder how my algorithm is supposed to converge if I keep setting new centers for clusters that became empty.
If I ignore empty clusters, I converge after 30-40 iterations. Is it wrong to ignore empty clusters?
Check out this example of how empty clusters can happen: http://www.ceng.metu.edu.tr/~tcan/ceng465_f1314/Schedule/KMeansEmpty.html
It basically means either 1) a random tremor in the force, or 2) the number of clusters k is wrong. You should iterate over a few different values for k and pick the best.
If during your iterating you should encounter an empty cluster, place a random data point into that cluster and carry on.
I hope this helped on your homework assignment last year.
Handling empty clusters is not part of the k-means algorithm, but it might result in better cluster quality. As for convergence: it is never exactly guaranteed, only heuristically, which is why the convergence criterion is usually extended with a maximum number of iterations.
Regarding the strategy to tackle this problem, I would say that randomly assigning some data point to the empty cluster is not very clever, since it can hurt cluster quality (the chosen point may fit its current center much better than the empty one). A heuristic for this case is to choose the point farthest from the center of the biggest cluster, move it to the empty cluster, and repeat until there are no empty clusters left.
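
A minimal NumPy sketch of that heuristic (my own code, assuming points in an array X, hard assignments in labels, and current centers in centers):

    import numpy as np

    def fix_empty_clusters(X, labels, centers):
        # For each empty cluster, steal the point farthest from the center
        # of the biggest cluster (the heuristic described above).
        k = len(centers)
        for c in range(k):
            if np.any(labels == c):
                continue                                     # cluster c is non-empty
            big = np.bincount(labels, minlength=k).argmax()  # biggest cluster
            idx = np.where(labels == big)[0]
            dists = np.linalg.norm(X[idx] - centers[big], axis=1)
            far = idx[dists.argmax()]                        # its farthest point
            labels[far] = c
            centers[c] = X[far]                              # re-seed the empty center
        return labels, centers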
Statement: k-means can lead to empty clusters.
As an example (originally a figure of points on a number line): take N = 6 points that form overlapping pairs, where 'overlapping' means the distance between two points is some del that tends to 0, i.e. you can assume an arbitrarily small value such as 0.01. Start with k = 3 clusters (colour-coded in the figure, with dashed boxes marking the cluster assignments and a legend giving the number line). After the assignment step only 2 final clusters remain; the blue cluster is orphaned and ends up empty.
Empty clusters can be obtained if no points are allocated to a cluster during the assignment step. If this happens, you need to choose a replacement centroid, otherwise the SSE (sum of squared errors) will be larger than necessary. Options:
* Choose the point that contributes most to the SSE (sketched below).
* Choose a point from the cluster with the highest SSE.
* If there are several empty clusters, the above can be repeated several times.
Check this site https://chih-ling-hsu.github.io/2017/09/01/Clustering#
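
A small NumPy sketch of the first option above (my illustration, with the same hypothetical X/labels/centers arrays as before): re-seed an empty cluster with the point that currently contributes most to the SSE.

    import numpy as np

    def reseed_by_sse(X, labels, centers, empty):
        # Move the point with the largest squared error to the empty cluster.
        sq_err = np.sum((X - centers[labels]) ** 2, axis=1)  # per-point SSE contribution
        worst = sq_err.argmax()
        labels[worst] = empty
        centers[empty] = X[worst]
        return labels, centers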
You should not ignore empty clusters but replace them. k-means is an algorithm that can only provide you with local minima, and the empty clusters are local minima that you don't want.
Your program is going to converge even if you replace a point with a random one. Remember that at the beginning of the algorithm you also choose the initial k points randomly; if that can converge, how come k-1 converged points plus 1 random point can't? It just needs a couple more iterations.
"Choosing good k for your data" refers to the problem of choosing the right number of clusters. Since the k-means algorithm works with a predetermined number of cluster centers, their number has to be chosen at first. Choosing the wrong number could make it hard to divide the data points into clusters or the clusters could become small and meaningless.
I can't give you an answer on whether it is a bad idea to ignore empty clusters. If you do, you might end up with a smaller number of clusters than you defined at the beginning. This will confuse people who expect k-means to work in a certain way, but it is not necessarily a bad idea.
If you relocate empty cluster centers, your algorithm will probably still converge, provided that happens a limited number of times. However, if you have to relocate too often, it might happen that your algorithm doesn't terminate.
For "Choosing good k for your data", Andrew Ng gives the example of a tee shirt manufacturer looking at potential customer measurements and doing k-means to decide if you want to offer S/M/L (k=3) or 2XS/XS/S/M/L/XL/2XL (k=7). Sometimes the decision is driven by the data (k=7 gives empty clusters) and sometimes by business considerations (manufacturing costs are less with only three sizes, or marketing says customers want more choices).
Set a variable to track the farthest point and its cluster, based on the distance measure used.
After the assignment step for all the points, check the number of data points in each cluster.
If any count is 0, as is the case in this question, split the biggest cluster into 2 sub-clusters: one keeps the old center's place and the other replaces the empty cluster.
This should fix the issue. Random assignment, by contrast, will disturb the clustering structure already obtained.