What is the default strategy for device placement in TensorFlow?

I am trying to set up distributed training. Right now I have one parameter server and two workers. If I add another parameter server, how will TensorFlow split up the parameters between the two servers? Is it done randomly, or do I need to specify it manually?

By default, variables are placed round-robin across the available ps tasks; see device_setter_test.py.
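To make the round-robin behaviour concrete, here is a small pure-Python illustration. It is not TensorFlow's implementation, only a sketch of the assignment pattern; the function name round_robin_placement and the variable names are made up for the example. In TensorFlow 1.x the real placement is done by tf.train.replica_device_setter, whose ps_strategy argument defaults to a round-robin strategy.

```python
from itertools import cycle


def round_robin_placement(variable_names, num_ps_tasks):
    """Toy model: assign variables to ps tasks in creation order, cycling.

    Hypothetical helper for illustration only; it mimics the pattern,
    not TensorFlow's actual device-setter code.
    """
    ps_devices = cycle(f"/job:ps/task:{i}" for i in range(num_ps_tasks))
    return {name: next(ps_devices) for name in variable_names}


# Five variables created in this order, spread over two parameter servers.
placement = round_robin_placement(["w1", "b1", "w2", "b2", "w3"], num_ps_tasks=2)
for name, device in placement.items():
    print(name, "->", device)
```

Note that a pure round-robin scheme looks only at creation order, not at variable size, so one ps task can end up holding most of the bytes if large and small variables alternate.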

Baselining internal network traffic (corporate)

We are collecting network traffic from switches using Zeek in the form of ‘connection logs’. The connection logs are then stored in Elasticsearch indices via filebeat. Each connection log is a tuple with the following fields: (source_ip, destination_ip, port, protocol, network_bytes, duration) There are more fields, but let’s just consider the above fields for simplicity for now. We get 200 million such logs every hour for internal traffic. (Zeek allows us to identify internal traffic through a field.) We have about 200,000 active IP addresses.
What we want to do is digest all these logs and create a graph where each node is an IP address, and an edge (directed, source → destination) represents traffic between two IP addresses. There will be one unique edge for each distinct (port, protocol) tuple. Each edge will have the following properties: average duration, average bytes transferred, and a histogram of the number of logs by hour of the day.
I have tried using Elasticsearch’s aggregation and also the newer Transform technique. While both work in theory, and I have tested them successfully on a very small subset of IP addresses, the processes simply cannot keep up for our entire internal traffic. E.g. digesting 1 hour of logs (about 200M logs) using Transform takes about 3 hours.
My question is:
Is post-processing Elasticsearch data the right approach to making this graph? Or is there some product that we can use upstream to do this job? Someone suggested looking into ntopng, but I did not find this specific use case in their product description. (Not sure if it is relevant, but we use ntop’s PF_RING product as a frontend for Zeek.) Are there other products that do the job out of the box? Thanks.
What problems or root causes are you attempting to elicit with a graph of Zeek east-west traffic?
Seems that a more-tailored use case, such as a specific type of authentication, or even a larger problem set such as endpoint access expansion might be a better use of storage, compute, memory, and your other valuable time and resources, no?
Even if you did want to correlate or group on Zeek data, try to normalize it to OSSEM, and there would be no reason to, say, collect the tuple when you can collect community-id instead. You could correlate Zeek in the large to Suricata in the small. Perhaps a better data architecture would be VAST.
Kibana, in its latest iterations, does have Graph, and even older versions can leverage the third-party kbn_network plugin. I could see you hitting a wall with 200k active IP addresses and Elasticsearch aggregations or even summary indexes.
Many orgs will build data architectures beyond the simple Serving layer provided by Elasticsearch. What I have heard of would be a Kappa architecture streaming into the graph database directly, such as dgraph, and perhaps just those edges of the graph available from a Serving layer.
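As a rough illustration of the streaming idea, the per-edge statistics from the question can be maintained incrementally as each connection log arrives, rather than computed after the fact with aggregations. The sketch below is plain Python with no graph database behind it. The names EdgeStats and update_edge are hypothetical, and it assumes each log is a dict carrying the question's fields plus an "hour" value (0-23) derived from the log timestamp, which the question does not list explicitly.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EdgeStats:
    """Running totals for one (source, destination, port, protocol) edge."""
    count: int = 0
    total_bytes: int = 0
    total_duration: float = 0.0
    by_hour: list = field(default_factory=lambda: [0] * 24)

    @property
    def avg_bytes(self):
        return self.total_bytes / self.count

    @property
    def avg_duration(self):
        return self.total_duration / self.count


def update_edge(edges, log):
    """Fold one connection log into the edge it belongs to."""
    key = (log["source_ip"], log["destination_ip"], log["port"], log["protocol"])
    stats = edges[key]
    stats.count += 1
    stats.total_bytes += log["network_bytes"]
    stats.total_duration += log["duration"]
    stats.by_hour[log["hour"]] += 1  # "hour" is an assumed, pre-derived field


# Made-up sample logs, for illustration only.
edges = defaultdict(EdgeStats)
sample_logs = [
    {"source_ip": "10.0.0.1", "destination_ip": "10.0.0.2", "port": 443,
     "protocol": "tcp", "network_bytes": 1000, "duration": 0.5, "hour": 9},
    {"source_ip": "10.0.0.1", "destination_ip": "10.0.0.2", "port": 443,
     "protocol": "tcp", "network_bytes": 3000, "duration": 1.5, "hour": 9},
    {"source_ip": "10.0.0.2", "destination_ip": "10.0.0.1", "port": 53,
     "protocol": "udp", "network_bytes": 200, "duration": 0.1, "hour": 14},
]
for log in sample_logs:
    update_edge(edges, log)
```

Because each edge keeps only sums and counters, the state is small and mergeable: partial results from several stream consumers can be combined by adding the fields, which is what makes this shape a reasonable fit for a streaming pipeline.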
There are other ways of asking questions from IP address data, such as the ML options in AWS SageMaker IP Insights or the Apache Spot project.
Additionally, I'm a huge fan of getting the right data only as the situation arises, although in an automated way so that the puzzle pieces bubble up for me and I can simply lock them into place. If I were working with Zeek data especially, I could leverage a platform such as SecurityOnion and its orchestrated Playbook engine to kick off other tasks for me, such as querying out with one of the Velocidex tools, or even cross-correlating using the built-in Sigma sources.

How do I run a data-dependent function on a partitioned region in a member group?

My team uses Geode as a makeshift analytics engine. We store a collection of massive raw data objects (200MB+ each) in Geode, but these objects are never directly returned to the client. Instead, we rely heavily on custom function execution to process these data sets inside Geode, and only return the analysis result set.
We have a new requirement to implement two tiers of data analytics precision. The high-precision analytics will require larger raw data sets and more CPU time. It is imperative that these high-precision analyses do not inhibit the low-precision analytics performance in any way. As such, I'm looking for a solution that keeps these data sets isolated to different servers.
I built a POC that keeps each data set in its own region (both are PARTITIONED). These regions are configured to belong to separate Member Groups, then each server is configured to join one of the two groups. I'm able to stand up this cluster locally without issue, and gfsh indicates that everything looks correct: describe member shows each member hosting the expected regions.
My client code configures a ClientCache that points at the cluster's single locator. My function execution command generally looks like the following:
FunctionService
    .onRegion(highPrecisionRegion)
    .setArguments(inputObject)
    .filter(keySet)
    .execute(function);
When I only run the high-precision server, I'm able to execute the function against the high-precision region. When I only run the low-precision server, I'm able to execute the function against the low-precision region. However, when I run both servers and execute the functions one after the other, I invariably get an exception stating that one of the regions cannot be found. See the following Gist for a sample of my code and the exception.
https://gist.github.com/dLoewy/c9f695d67f77ec18a7e60a25c4e62b01
TLDR key points:
Using member groups, Region A is on Server 1 and Region B is on Server 2.
These regions must be PARTITIONED in Production.
I need to run a data-dependent function on one of these regions; the client code chooses which.
As-is, my client code always fails to find one of the regions.
Can someone please help me get on track? Is there an entirely different cluster architecture I should be considering? Happy to provide more detail upon request.
Thanks so much for your time!
David
FYI, the following docs pages mention function execution on Member Groups, but give very little detail. The first link describes running data-independent functions on member groups, but doesn't say how, and doesn't say anything about running data-dependent functions on member groups.
https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/how_function_execution_works.html
https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/function_execution.html
Have you tried creating two different pools on the client, each one targeting a specific server group, and executing the function as usual with onRegion? I believe that should do the trick. For further details, please have a look at Organizing Servers Into Logical Member Groups.
Hope this helps. Cheers.
As the region data is not replicated across servers, it looks like you need to target the onMembers or onServers methods as well as onRegion.

Control Multiple traffic light junctions in SUMO with TRACI

I'm trying to find a way to control the traffic lights at multiple junctions in a single simulation. I've a grid of 4 x 4 with 16 traffic lights and I want to test a Global algorithm for optimizing traffic flows at each junction in the grid.
I'm using SUMO and python TRACI for this task. I've implemented several single junction local traffic light controlling algorithms earlier but I'm unable to figure out a simple method for multiple junction simulation. Some explanation/strategy or code snippets would be very helpful for me.
Thanks in advance!
Usually the pattern for a control algorithm with traci is:
while traci.simulation.getMinExpectedNumber() > 0:
    # retrieve data from detectors
    # act on traffic light
    traci.simulationStep()
There is nothing wrong with doing the following:
while traci.simulation.getMinExpectedNumber() > 0:
    # retrieve data from detectors
    # act on traffic light 1
    # act on traffic light 2
    # ...
    traci.simulationStep()
or even having multiple data retrieval steps interspersed. You can also use the traci step listener, which calls arbitrary additional Python functions, or even connect multiple clients (although you need to know in advance how many). In any case, you will need to rework your existing algorithms so that the code that runs between two calls of simulationStep can be separated out, and they all need to operate at the same frequency.
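The pattern above can be written as a single loop that reads the state of every junction first, hands the whole picture to one global decision function, and only then sets the phases. This is a minimal sketch, not a tuned controller. The names control_loop and decide_phases are hypothetical; sim is assumed to be the traci module (or a traci connection) after traci.start(...) has been called, and the halting-vehicle count per junction is just one possible input for a global algorithm.

```python
def control_loop(sim, decide_phases, max_steps=None):
    """Drive all traffic lights from one global controller.

    sim: the traci module or connection (assumed already started).
    decide_phases: hypothetical user function mapping
        {tls_id: halted_vehicle_count} -> {tls_id: phase_index}.
    Returns the number of simulation steps executed.
    """
    tls_ids = list(sim.trafficlight.getIDList())
    # Controlled lanes are static, so look them up once (set() drops repeats).
    lanes = {t: set(sim.trafficlight.getControlledLanes(t)) for t in tls_ids}

    steps = 0
    while sim.simulation.getMinExpectedNumber() > 0:
        if max_steps is not None and steps >= max_steps:
            break
        # 1. Retrieve data for every junction before acting on any of them.
        halted = {
            t: sum(sim.lane.getLastStepHaltingNumber(lane) for lane in lanes[t])
            for t in tls_ids
        }
        # 2. The global algorithm sees the whole grid and picks all phases.
        for tls_id, phase in decide_phases(halted).items():
            sim.trafficlight.setPhase(tls_id, phase)
        # 3. Advance the simulation by one step.
        sim.simulationStep()
        steps += 1
    return steps
```

With a running simulation this would be called as control_loop(traci, my_global_policy), where my_global_policy is your own function. Passing the module in as an argument also makes it easy to exercise the loop against a stub object without launching SUMO.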

Can a single CPU core work with multiple clients using Distributed TensorFlow?

In Distributed TensorFlow, we can run multiple clients working with workers in a Parameter-Server architecture, which is known as "Between-Graph Replication". According to the documentation,
Between-graph replication. In this approach, there is a separate
client for each /job:worker task, typically in the same process as the
worker task.
it says the client and worker are typically in the same process. However, if they are not in the same process, can the number of clients differ from the number of workers? Also, can multiple clients share and run on the same CPU core?
Clients are the python programs that define a graph and initialize a session in order to run computation. If you start these programs, the created processes represent the servers in the distributed architecture.
Now it is possible to write programs that do not create a graph and do not run session, but rather just call the server.join() method with the appropriate job name and task index. This way you could theoretically have a single client defining the whole graph and start a session with its corresponding server.target; then within this session, parts of the graph are automatically going to be sent to the other processes/servers and they will do the computations (as long as you have set which server/task is going to do what). This setup describes the in-graph replication architecture.
So, it is basically possible to start several servers/processes on the same machine with only a single CPU core, but you are not going to gain much parallelism, because context switching between multiple running processes is going to slow you down. Unless the servers are doing some unrelated work, you should avoid this kind of setup.
Between-graph just means that every worker is going to have its own client and run its own session respectively.

In distributed TensorFlow, is it possible to share the same queue across different workers?

In TensorFlow, I want to have a filename queue shared across different workers on different machines, such that each machine can get a subset of files to train. I searched a lot, and it seems that only variables could be put on a PS task to be shared. Does anyone have any example? Thanks.
It is possible to share the same queue across workers, by setting the optional shared_name argument when creating the queue. Just as with tf.Variable objects, you can place the queue on any device that can be accessed from different workers. For example:
with tf.device("/job:ps/task:0"):  # Place queue on parameter server.
    q = tf.FIFOQueue(..., shared_name="shared_queue")
A few notes:
The value for shared_name must be unique to the particular queue that you are sharing. Unfortunately, the Python API does not currently use scoping or automatic name uniqification to make this easier, so you will have to ensure this manually.
You do not need to place the queue on a parameter server. One possible configuration would be to set up an additional "input job" (e.g. "/job:input") containing a set of tasks that perform pre-processing, and export a shared queue for the workers to use.