Control Multiple traffic light junctions in SUMO with TRACI - sumo

I'm trying to find a way to control the traffic lights at multiple junctions in a single simulation. I've a grid of 4 x 4 with 16 traffic lights and I want to test a Global algorithm for optimizing traffic flows at each junction in the grid.
I'm using SUMO and python TRACI for this task. I've implemented several single junction local traffic light controlling algorithms earlier but I'm unable to figure out a simple method for multiple junction simulation. Some explanation/strategy or code snippets would be very helpful for me.
Thanks in advance!

Usually the pattern for a control algorithm with traci is
while traci.simulation.getMinExpectedNumber() > 0:
    # retrieve data from detectors
    # act on traffic light
    traci.simulationStep()
There is nothing wrong with doing the following
while traci.simulation.getMinExpectedNumber() > 0:
    # retrieve data from detectors
    # act on traffic light 1
    # act on traffic light 2
    # ...
    traci.simulationStep()
or even having multiple data retrieval steps interspersed. You can also use the traci step listener, which calls arbitrary additional Python functions, or even connect multiple clients (although you need to know in advance how many). In any case you will need to rework your existing algorithms so that the code that runs between two calls of simulationStep is cleanly separated out, and all of the controllers need to operate at the same frequency.
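As a minimal sketch of that pattern for a global controller: observe every junction first, make one decision over the whole grid, then apply it before stepping. The function names, `control_interval`, the threshold, and the assumption that every program has four phases are all hypothetical; `example_policy` is only a placeholder for the real algorithm. The `conn` argument is expected to be the `traci` module (or a connection object with the same interface).

```python
def observe(conn, tls_id):
    """Return (current phase index, halted vehicles on the controlled lanes)."""
    # getControlledLanes can list a lane more than once, so deduplicate.
    lanes = set(conn.trafficlight.getControlledLanes(tls_id))
    halted = sum(conn.lane.getLastStepHaltingNumber(lane) for lane in lanes)
    return conn.trafficlight.getPhase(tls_id), halted


def example_policy(state, threshold=10, n_phases=4):
    """Placeholder global policy (an assumption, not a real algorithm).

    Advances only the most congested junction to its next phase.
    Assumes every traffic light program has n_phases phases.
    """
    if not state:
        return {}
    worst = max(state, key=lambda tls_id: state[tls_id][1])
    phase, halted = state[worst]
    if halted < threshold:
        return {}
    return {worst: (phase + 1) % n_phases}


def run(conn, policy, control_interval=5):
    """Step the simulation, letting the policy act every control_interval steps."""
    step = 0
    while conn.simulation.getMinExpectedNumber() > 0:
        if step % control_interval == 0:
            # 1) observe all junctions, 2) decide once, 3) apply all actions
            state = {t: observe(conn, t) for t in conn.trafficlight.getIDList()}
            for tls_id, phase in policy(state).items():
                conn.trafficlight.setPhase(tls_id, phase)
        conn.simulationStep()
        step += 1
    return step


# Usage (grid.sumocfg is a hypothetical config file):
#   import traci
#   traci.start(["sumo", "-c", "grid.sumocfg"])
#   run(traci, example_policy)
#   traci.close()
```

Keeping the policy a pure function of the collected state makes it easy to swap in a different algorithm, and it can be tested without starting SUMO.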

Related

Baselining internal network traffic (corporate)

We are collecting network traffic from switches using Zeek in the form of ‘connection logs’. The connection logs are then stored in Elasticsearch indices via filebeat. Each connection log is a tuple with the following fields: (source_ip, destination_ip, port, protocol, network_bytes, duration) There are more fields, but let’s just consider the above fields for simplicity for now. We get 200 million such logs every hour for internal traffic. (Zeek allows us to identify internal traffic through a field.) We have about 200,000 active IP addresses.
What we want to do is digest all these logs and create a graph where each node is an IP address, and an edge (directed, source -> destination) represents traffic between two IP addresses. There will be one unique edge for each distinct (port, protocol) tuple. The edge will have properties: average duration, average bytes transferred, number of logs histogram by the hour of the day.
I have tried using Elasticsearch’s aggregation and also the newer Transform technique. While both work in theory, and I have tested them successfully on a very small subset of IP addresses, the processes simply cannot keep up for our entire internal traffic. E.g. digesting 1 hour of logs (about 200M logs) using Transform takes about 3 hours.
My question is:
Is post-processing Elasticsearch data the right approach to making this graph? Or is there some product that we can use upstream to do this job? Someone suggested looking into ntopng, but I did not find this specific use case in their product description. (Not sure if it is relevant, but we use ntop’s PF_RING product as a frontend for Zeek.) Are there other products that do the job out of the box? Thanks.
What problems or root causes are you attempting to elicit with a graph of Zeek east-west traffic?
Seems that a more-tailored use case, such as a specific type of authentication, or even a larger problem set such as endpoint access expansion might be a better use of storage, compute, memory, and your other valuable time and resources, no?
Even if you did want to correlate or group on Zeek data, try to normalize it to OSSEM, and there would be no reason to, say, collect tuple when you can collect community-id instead. You could correlate Zeek in the large to Suricata in the small. Perhaps a better data architecture would be VAST.
Kibana, in its latest iterations, does have Graph, and even older version can lever the third-party kbn_network plugin. I could see you hitting a wall with 200k active IP addresses and Elasticsearch aggregations or even summary indexes.
Many orgs will build data architectures beyond the simple Serving layer provided by Elasticsearch. What I have heard of would be a Kappa architecture streaming into the graph database directly, such as dgraph, and perhaps just those edges of the graph available from a Serving layer.
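To make the streaming idea concrete, here is a minimal in-memory sketch of the per-edge aggregation the question describes: one edge per (source_ip, destination_ip, port, protocol), with running averages and an hour-of-day histogram. It is not tied to any product mentioned here. `EdgeStats`, `aggregate`, and the `hour` field (assumed to be derived from the log timestamp upstream) are hypothetical names for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EdgeStats:
    """Running totals for one directed edge; averages are derived on read."""
    count: int = 0
    total_bytes: int = 0
    total_duration: float = 0.0
    by_hour: list = field(default_factory=lambda: [0] * 24)

    def add(self, network_bytes, duration, hour):
        self.count += 1
        self.total_bytes += network_bytes
        self.total_duration += duration
        self.by_hour[hour] += 1

    @property
    def avg_bytes(self):
        return self.total_bytes / self.count

    @property
    def avg_duration(self):
        return self.total_duration / self.count


def aggregate(logs):
    """Fold an iterable of connection-log dicts into per-edge statistics."""
    edges = defaultdict(EdgeStats)
    for log in logs:
        key = (log["source_ip"], log["destination_ip"],
               log["port"], log["protocol"])
        edges[key].add(log["network_bytes"], log["duration"], log["hour"])
    return edges
```

Because each log only updates counters, the state can be updated as logs arrive and flushed to a graph store periodically, rather than recomputed from the raw indices.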
There are other ways of asking questions from IP address data, such as the ML options in AWS SageMaker IP Insights or the Apache Spot project.
Additionally, I'm a huge fan of getting the right data only as the situation arises, although in an automated way so that the puzzle pieces bubble up for me and I can simply lock them into place. If I was working with Zeek data especially, I could lever a platform such as SecurityOnion and its orchestrated Playbook engine to kick off other tasks for me, such as querying out with one of the Velocidex tools, or even cross correlating using the built-in Sigma sources.

Can single CPU core work with multiple clients using Distributed Tensorflow?

In Distributed Tensorflow, we could run multiple clients working with workers in Parameter-Server architecture, which is known as "Between-Graph Replication". According to the documentation,
Between-graph replication. In this approach, there is a separate
client for each /job:worker task, typically in the same process as the
worker task.
it says the client and worker are typically in the same process. However, if they are not in the same process, can the number of clients differ from the number of workers? Also, can multiple clients share and run on the same CPU core?
Clients are the python programs that define a graph and initialize a session in order to run computation. If you start these programs, the created processes represent the servers in the distributed architecture.
Now it is possible to write programs that do not create a graph and do not run session, but rather just call the server.join() method with the appropriate job name and task index. This way you could theoretically have a single client defining the whole graph and start a session with its corresponding server.target; then within this session, parts of the graph are automatically going to be sent to the other processes/servers and they will do the computations (as long as you have set which server/task is going to do what). This setup describes the in-graph replication architecture.
So, it is basically possible to start several servers/processes on the same machine, that has only a single CPU, but you are not going to gain much parallelism, because context switching between multiple running processes is going to slow you down. So unless the servers are doing some unrelated work, you should rather avoid this kind of setup.
Between-graph just means that every worker is going to have its own client and run its own session respectively.

What is the default strategy for device placement in Tensorflow?

I am trying to set up distributed training. Right now I have one parameter server and two workers. If I add another parameter server how will Tensorflow split up the parameters between the two servers? Is it done randomly or do I need to manually specify it?
They get placed round-robin on available ps tasks, see device_setter_test.py
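As a plain-Python illustration of what round-robin placement means (this is not TensorFlow code; the function name and the variable names in the example are made up), variables are assigned to ps tasks in creation order, cycling through the tasks:

```python
from itertools import cycle


def round_robin_placement(variable_names, num_ps_tasks):
    """Assign each variable, in creation order, to the next ps task in turn."""
    devices = cycle(f"/job:ps/task:{i}" for i in range(num_ps_tasks))
    return {name: next(devices) for name in variable_names}


placement = round_robin_placement(["w1", "b1", "w2", "b2", "w3"], 2)
for name, device in placement.items():
    print(name, "->", device)
```

With two ps tasks, the first, third, and fifth variables land on task 0 and the others on task 1. Note that this balances the number of variables, not their sizes.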

High speed data acquisition using REST Services

We need to develop a high-speed REST-based WCF service, which will be used for updating 2000 data points, each data point changing every 25 ms. Is it possible to implement such high-speed data acquisition using WCF?
Using WCF yes. I'm not sure REST is the best architectural style for the type of problem you are trying to solve. I also wonder whether HTTP is appropriate.
Having said that, you might want to look into CoRE (Constrained RESTful Environments), which is an effort to apply REST in highly constrained environments like data acquisition.
Here is how I am understanding your question: you expect new data values every 25 ms, or 40 times per second. There are 2000 discrete data values in one device, which means the telemetry flow from each device is around 80,000 values per second. You also have multiple devices, so your throughput will go higher than this, e.g. 800,000 updates per second for 10 devices.
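The arithmetic above can be checked with a small helper (the function name is just for illustration):

```python
def updates_per_second(points_per_device, period_ms, devices=1):
    """Total value updates per second across all devices."""
    updates_per_point = 1000 / period_ms  # 25 ms period -> 40 updates/s
    return points_per_device * updates_per_point * devices


print(updates_per_second(2000, 25))              # one device
print(updates_per_second(2000, 25, devices=10))  # ten devices
```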
In this scenario, I wouldn't expect the service layer to be a constraint, for the simple reason that it is always possible to scale up the service layer by adding more hosts to receive messages and load balancing between them. Where I would be concerned is any place where all transactions must be processed within the same domain. For example, is all this data winding up in one relational database? In that case you may have a problem with transaction throughput.
Another area that seems problematic in your architecture is the device itself. Is one device going to be capable of gathering and sending out values at 80 kHz? Here is where the REST protocol may have too high an overhead. So it is a device constraint, not a server constraint, that might drive you to find a more efficient protocol. This may be a case where writing a custom protocol directly against the socket is warranted, but that depends on your device.

best use of NSUrlConnection when getting multiple json objects that depend on the previous

I am querying an API to search for articles in various databases. There are multiple steps involved, and each returns a JSON object. Each step involves an NSURLConnection with a different query string to the API:
step 1: returns json object indicating status of query & record set ID.
step 2: takes record set id from step 1 and returns list of databases that are valid for querying
step 3: queries each database that was ready from step 2 and gets json data array that has results
I am confused as to the best way of going about this. Is it better to use one NSURLConnection and reopen that connection in connectionDidFinishLoading based on which step I am in? Or is it better to open a new connection at the end of each preceding connection?
A couple of observations
Network latency:
The key phenomenon that we need to be sensitive to here (and it sounds like you are) is network latency. Too often we test our apps in an ideal scenario (on the simulator with high-speed internet access, or on a device connected to wifi). But when you use an app in a real-world scenario, network latency can seriously impact performance, and you'll want to architect a solution that minimizes this.
Simulating sub-optimal, real-world network situations:
By the way, if you're not doing it already, I'd suggest you install the "Network Link Conditioner" which is part of the "Hardware IO Tools" (available from the "Xcode" menu, choose "Open Developer Tool" - "More Developer Tools"). If you install the "Network Link Conditioner", you can then have your simulator simulate a variety of network experiences (e.g. Good 3G connection, Poor Edge connection, etc.).
Minimize network requests:
Anyway, I'd try to figure out how to minimize separate requests that are dependent upon the previous one. For example, I see step 1 and step 2 and wonder if you could merge those two into a single JSON request. Perhaps that's not possible, but hopefully you get the idea. You want to reduce the number of separate requests that have to happen sequentially.
I'd also look at step 3, and those look like they have to be dependent upon step 2, but perhaps you can run a couple of those step 3 requests concurrently, reducing the latency effect there.
Implementation:
In terms of how this would be implemented, I personally use a concurrent NSOperationQueue with some reasonable maxConcurrentOperationCount setting (e.g. 4 or 5, enough to enjoy concurrency and reduce latency, but not so many as to tax either the device or the server) and submit network operations. In this case, you'll probably submit step 1, with a completion operation that will submit step 2, with a completion operation that will submit a series of step 3 requests and these step 3 requests might run concurrent.
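The dependency chain described above (step 1, then step 2, then several step 3 requests running concurrently under a cap) can be sketched in a language-neutral way. The following uses Python's asyncio rather than NSOperationQueue, purely to show the control flow; `fetch_json` is a stand-in that returns canned data instead of calling the real API, and all names and values in it are hypothetical.

```python
import asyncio


async def fetch_json(step, arg=None):
    """Stand-in for one HTTP request; returns canned JSON-like data."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if step == 1:
        return {"status": "ok", "record_set_id": 42}
    if step == 2:
        return {"databases": ["db_a", "db_b", "db_c"]}
    return {"database": arg, "results": [f"{arg}-article"]}


async def search(max_concurrent=4):
    # Steps 1 and 2 are sequential: each needs the previous result.
    step1 = await fetch_json(1)
    step2 = await fetch_json(2, step1["record_set_id"])

    # Step 3 requests are independent of each other, so run them
    # concurrently, capped like maxConcurrentOperationCount.
    limit = asyncio.Semaphore(max_concurrent)

    async def query(db):
        async with limit:
            return await fetch_json(3, db)

    return await asyncio.gather(*(query(db) for db in step2["databases"]))


results = asyncio.run(search())
print([r["database"] for r in results])
```

The same shape maps onto operations with completion blocks: each completion submits the next step, and only the final fan-out is allowed to run in parallel.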
In terms of how to make a good network operation object, I might suggest using something like AFNetworking, which already has a decent network operation object (including one that parses JSON), so maybe you can start there.
In terms of re-using an NSURLConnection, generally it's one connection per request. When I have had an app that needed a lengthy exchange of messages with a server (e.g. a chat-like service where you want the server to be able to send a message to the client whenever it wants), I've done a sockets implementation, but that doesn't seem like the right architecture here.
I would dismiss the first connection and create a new one for each request.
Just, don't ask me why.
BTW, I would understand the question if this were about reusing vs. creating new objects in some performance-sensitive context, like scrolling through a table or animations, or if it happened over tens of thousands of iterations. But you are talking about 3 objects to either create anew or reuse. What is the gain of even thinking about it?