Dask cluster with workers on different networks


I want to create a cluster using two (or more) laptops connected to different networks. I'm trying dask-scheduler and dask-worker, but it only works if the two machines are connected to the same network.
How can I do it?
Thanks.

You do not need to do anything special, except that the machines need to agree on how to address each other: either they will have unique DNS names (unlikely) or fixed IP addresses. You will need to make sure that dask's ports (which you can choose on the command line or in the config) are open via whatever bridge or route you have between the networks. I'm not sure how you would achieve this with temporary IP addresses (which are more typical for a laptop on wifi).
Note that the client and workers need to be able to initiate contact to the scheduler, and workers need to be able to initiate contact between themselves. It is also best to have the client be able to contact the workers directly.
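Before starting dask-worker on the remote machine, it can help to confirm that each required port is actually reachable across the networks. This is a minimal sketch using only Python's standard library; it opens a plain TCP connection and is not part of dask itself. 8786 is dask's default scheduler port.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and completes a TCP handshake,
        # so success means routing, firewalls and NAT all let the traffic through.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, run `port_is_reachable("203.0.113.10", 8786)` on every worker and client machine, where 203.0.113.10 is a documentation-only placeholder for the scheduler's fixed IP or DNS name. Workers listen on random ports unless you pin them (check `dask-worker --help` for the options in your version), so once pinned, those ports need the same check from the other workers and the client.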

Related

How can I query the IP address of a DDS Publisher when using OpenSplice DDSI2

I am using OpenSplice to publish data and to subscribe to data.
On the subscribe side I want to be able to query the IP address of the publisher.
The primary reason is that I need to copy very large files from the publisher. I don't need to share the files via DDS, and I may need to terminate or rate-limit the copy if bandwidth becomes constrained.
The copying is a one-off, so I can use rsync and ssh, but to do this I need the IP address of the publisher. I could iterate over the network interfaces on the publisher side; however, there is likely to be more than one valid network interface.
I've spent quite some time trying to find a way to query the necessary information from the dds::sub::DataReader (or associated classes), but my search skills have failed me, and I was wondering whether it is possible at all before I fall back to something less elegant.
There is a related RTI question/answer: Get IP Address of DataWriter/Publisher on RTI DDS?
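The following does not use any DDS or OpenSplice API, so it does not answer the DataReader question itself. It only addresses the "more than one valid interface" concern: assuming the publisher knows some address on the subscriber's network (for example, from configuration), it can ask the operating system which of its own addresses routes there and include that address in the data it publishes. Calling connect() on a UDP socket sends no packets; it only selects a route. NAT between the machines would still make the reported address unusable from the other side.

```python
import socket


def local_ip_towards(peer_ip: str, port: int = 9) -> str:
    """Return the local IPv4 address the OS would use to reach peer_ip.

    connect() on a UDP socket transmits nothing; it just makes the kernel
    choose a route, after which getsockname() reports the source address
    of the interface on that route.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, port))  # port is arbitrary: nothing is sent
        return s.getsockname()[0]
```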

Is there a log file of running processes in Advantage Database Server?

My name is Josue, and I need your help with this:
Is there any way to audit or monitor the server processes that connect to the Advantage Database Server?
Is there a log of running processes?
Thanks

There is no existing log of processes that use Advantage Database Server. Because it is a client/server architecture, there is no mechanism that I am aware of that can easily associate a connection on the server to a specific process.
However, it would be possible to use the system procedure sp_mgGetConnectedUsers() to obtain some of this information. It might be possible to use it to obtain the information you are looking for at a given point in time (a snapshot).
The output of that procedure includes three fields that you might be interested in. The Address column gives the address of the machine that connected to Advantage. It is typically the IP address of the client application. But it can also be of the form "IPC Connection N", which indicates that it is using shared memory for communications; this means that the client process is running on the same machine as the server.
The TSAddress column might also be of interest. If the connection is made by a client that is running through terminal services (e.g., a remote desktop), then that column contains the IP address of the client machine. If you are interested in knowing processes that originate from the server machine itself, then you would need this field to differentiate between those and clients that connected through terminal services.
The other column of potential interest would be ApplicationID. By default, that field contains the process name (e.g., the executable) of the client application. This could help identify the actual process. It is not guaranteed, though. The application itself can change that value through mechanisms such as sp_SetApplicationID.
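The column logic above can be expressed as a small classification function. This is a sketch under stated assumptions: the rows of sp_mgGetConnectedUsers() have already been fetched (running the procedure and reading its result set is not shown), each row is a dict keyed by the column names Address, TSAddress and ApplicationID, and TSAddress is empty or None when terminal services is not involved.

```python
from typing import Dict, List, Optional, Tuple


def classify_connection(address: str, ts_address: Optional[str]) -> str:
    """Roughly classify one connection from a sp_mgGetConnectedUsers() snapshot."""
    if ts_address:
        # Client runs in a terminal services / remote desktop session;
        # TSAddress holds the IP of the machine the user is actually at.
        return f"terminal services session for client {ts_address}"
    if address.startswith("IPC Connection"):
        # Shared-memory connection with no terminal services involved:
        # the process runs directly on the server machine.
        return "process on the server machine"
    # Ordinary network connection; Address is usually the client's IP.
    return f"remote client at {address}"


def summarize(rows: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair each row's ApplicationID (not guaranteed to be the exe name) with its class."""
    return [
        (row.get("ApplicationID", ""),
         classify_connection(row["Address"], row.get("TSAddress")))
        for row in rows
    ]
```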

Advantage Database Replication

I have a client that wants two sites to have the ability to sync databases so information at Site A can be synced with Site B so the two sites can look at the same data.
I'm not even sure of the infrastructure required. Would a VPN be required to connect the two databases, or would an internet-based database work, i.e., Site A to InternetDatabase and Site B to InternetDatabase? Each site copies data to it periodically, then InternetDatabase syncs it and the sites can pull data down.
My other thought was something like Dropbox. If Site A and Site B use a Dropbox account to sync the ADT files etc., can the database at each site then sync with those ADT files?
Thanks

If the two sites update completely different tables, then something like Dropbox might work for that. Dropbox does not synchronize/merge the contents of files. That means if both site A and site B updated some file, then you would be responsible for writing the code to merge the changes.
Advantage Database Server has support for replication built in natively, so that would likely be the simplest solution. Advantage replication is performed on a record-by-record basis and is handled asynchronously. If the target database cannot be reached, the updates are stored in a queue and processed periodically. If the connection between the two sites is open/available constantly, the lag between the source update and the replicated update is typically small but obviously depends on the network bandwidth and latency.
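To make the queueing behaviour concrete, here is a toy model of asynchronous, record-by-record replication with a retry queue. It is not Advantage's implementation or API; ReplicationQueue and its send callback are hypothetical names used only to show how updates accumulate while the target is unreachable and drain once it comes back.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Record = Tuple[str, dict]  # (primary key, field values)


class ReplicationQueue:
    """Hypothetical model: queue updates locally, deliver them later in order."""

    def __init__(self, send: Callable[[Record], bool]):
        self._send = send  # returns False when the target database is unreachable
        self._pending: Deque[Record] = deque()

    def record_updated(self, key: str, values: dict) -> None:
        # The local write has already committed; replication happens asynchronously.
        self._pending.append((key, values))

    def process(self) -> int:
        """Periodic pass: deliver queued updates in order, stop at the first failure."""
        delivered = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # target unreachable: keep this and later updates queued
            self._pending.popleft()
            delivered += 1
        return delivered

    def backlog(self) -> int:
        return len(self._pending)
```

Delivering strictly in order and stopping at the first failure keeps later updates from overtaking earlier ones for the same record.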
You could use a VPN for the connection between the two sites, but it would not be required. If you do not use some kind of VPN, though, you should make sure the communication is encrypted between the two sites (it is an option when setting up the subscriptions).
Edit: For the communication, all you need is "normal" network connectivity. The primary issue is dealing with things like firewalls and NAT. With Advantage, you define which port it uses. If you use a TCP/IP connection, you would need to make sure the configured port allows inbound connections to the ads.exe process. You can use UDP as well, but if you are dealing with firewalls, it is probably going to be simpler with TCP.
Your question about duplicate keys is a good one. If both sites either add a record with the same primary key or update the same record concurrently, then it results in a conflict. There is an option to simply ignore conflicts in which case the last update wins. More realistically, you would want to write an ON CONFLICT trigger to handle the conflicts.
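The conflict rule can be modelled the same way. In this hypothetical sketch, a conflict is detected when the target row no longer matches what the source site last saw, or when an insert hits an existing key. The default resolver mimics ignoring conflicts (last update wins); passing a different function stands in for what an ON CONFLICT trigger would decide. Advantage's actual trigger syntax is not shown here.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, object]
Resolver = Callable[[Row, Row], Row]


def last_update_wins(current: Row, incoming: Row) -> Row:
    # Equivalent of ignoring conflicts: the replicated update overwrites the row.
    return incoming


def apply_replicated(table: Dict[str, Row], key: str, expected: Optional[Row],
                     incoming: Row, resolve: Resolver = last_update_wins) -> None:
    """Apply one replicated change to `table` (a dict keyed by primary key).

    `expected` is the row as the source site last saw it (None for an insert).
    If the target's current row differs from it, both sites changed the record.
    """
    current = table.get(key)
    if current is None or current == expected:
        table[key] = incoming  # no conflict
    else:
        table[key] = resolve(current, incoming)  # the "ON CONFLICT" decision
```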

Cache Regions in Velocity/AppFabric using WCF

I have a service-based architecture where a web farm full of ASP clients hits an application server farm of WCF services. Obviously, all the database access is done by the WCF services. Now I would like to cache frequently used objects retrieved from the database using Velocity at the service tier. I am considering making each physical application server also part of the cache cluster.
According to the Velocity documentation, if I use regions, objects are stored on only a single host. I actually wouldn't have any problem if each host kept its own cache, provided that I could somehow synchronize them.
So my questions are
If I create one region on one host is it also created on another one?
When I clear a cache region, is it cleared on one host only?
If I subscribe to a region level notification on all the hosts, can I catch events of one host on another one?
In this scenario should I use regions at all or stay away from them?
I hope my questions are clear. Actually, I am more interested in a solution to my problem than in answers to my questions.

Yes, you are reading the documentation correctly: a region exists on only one host.
"I actually wouldn't have any problem if each host kept its own cache provided that I could somehow synchronize them."
When you say synchronize, do you mean with HA (high availability) enabled? Velocity would take care of that, if that's what you meant.
For the questions:
1. No.
2. Yes.
3. Notifications are sent to the client, so I am not sure there is any way to send notifications to another host.
4. Regions give you search capabilities but take away HA. In your case, you could use the advantages of HA.

Having regions does not necessarily mean that you don't have HA. If you create your own cache (rather than using the 'default' one), you can create it with Secondaries = 1 (HA on).
Now let's say you have 4 cache hosts. When you define a region, it will have both a primary and a secondary host, so each action on the region is applied on both.
Shany

Named caches distribute across participating nodes. Named regions live on a single node. Regions can be HA, but they cannot take full advantage of distributed cache scaling, as their object load does not distribute across participating nodes in the cluster. Also, using named caches with HA requires three nodes minimum, rather than two nodes if you used the "default" cache only.
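The difference between named caches and named regions comes down to what decides placement. The toy model below is not AppFabric's actual placement algorithm and ignores HA secondaries; host_for is a hypothetical function that only shows how hashing each key spreads objects across hosts, while hashing the region name pins every object in that region to one host.

```python
import hashlib
from typing import List, Optional


def _bucket(name: str, n: int) -> int:
    # Stable hash: Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def host_for(key: str, hosts: List[str], region: Optional[str] = None) -> str:
    """Pick the host that stores `key` in this simplified model.

    Without a region, placement is decided per key, so objects spread out.
    With a region, only the region name matters, so all its objects share a host.
    """
    return hosts[_bucket(region if region is not None else key, len(hosts))]
```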

Functional Server Naming Conventions

I've seen "The Coolest Server Names," and I've seen another smaller-ish question related to mine, which was unfortunately closed.
It's a serious question though, as I'm on an internal applications dev team that manages the apps on a couple dozen servers. The networking folks typically don't care what we call the servers as long as they know about 'em, so we can come up with whatever conventions.
The apps the servers deal with can be home-grown custom apps, or they can be larger vendor ones like SharePoint. They can be:
In multiple networking environments that can't speak to each other (think firewalled-off external servers versus intranet-esque servers)
In different physical locations (California office versus New York, etc.)
In multiple deployment tiers (production, staging, testing, dev)
Have one or many functions (web server, DB server, mail server, app server)
Load-balanced or not
Standby (for disaster recovery purposes) or primary
Whew! Think it's even possible to come up with a convention that can address all of these aspects, or significant ones? It'd be nice to hear a server name (or DNS entry for it) and be able to immediately know what it does, and it works for getting new guys up to speed as well. "sharepoint-IPC-1 is down" could be parsed into "the internal production SharePoint web server in the California datacenter that's the first node in the load balancing is down!"...but that seems overly complicated at first glance.
Another thing in the back of my mind is that an old mail relay server is getting decommissioned, which means we have to scour through a lot of old apps to repoint hardcoded server values (I know... :).

Here are some general guidelines I try to abide by, based on mistakes I've made in the past.
Never base your machine names on...
Hardware: Machines get swapped out all the time, and you don't want to have to do much work if you change from an IBM server to a Sun server to a Dell server.
Location: Equipment and even entire server rooms can be moved based on business requirements or technical issues.
Intended use: As your product evolves, so too may the intended use of each server. A machine named "dbsrv" that eventually acts as a file server too is confusing.
Owner: The person who "owns" the equipment (an employee) can change, due to firings, layoffs, and moves within the company.
Subnet: As I said before, labs can move, and so can subnets. One of the main goals of DNS is to free you from being tied to a specific IP address, so why tie yourself down needlessly?
Now, some suggestions for the situation you described...
Machines spread across a region: This is what subdomains are for in DNS. You could have "west.company.com" and "east.company.com".
Have one or many functions: Don't name them based on intended use. If you name them based on some large collection of names (Greek gods, for example), you will eventually intuitively know that zeus.east means your master database server and apollo.west is your backup database server. Worst case, look it up in a spreadsheet.
Load-balanced or not: You can take two approaches. You could have a unique name per node behind the load balancer, or you could do something like athena-1.east, athena-2.east, etc. Either way, a load balancer will (hopefully) free you from worrying too much about what each node is named.
Standby or not: This doesn't sound like a criterion that should have an impact on the machine name.
What I'm essentially saying is:
Separate your equipment into different regional subdomains
Choose a naming scheme with plenty of names (Greek gods in this example)
Don't base the names on any of the criteria I mentioned above (intended use, location, etc.)
Trying to do anything more than that will be more trouble than it's worth.
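The scheme above (a themed name pool, regional subdomains, and a separately kept record of what each machine does) can be sketched as follows. HostnameAllocator is a hypothetical helper; company.com and the Greek-god names come from the examples above, and the inventory dict stands in for whatever spreadsheet or inventory system you actually maintain.

```python
from typing import Dict, Iterable, List


class HostnameAllocator:
    """Hand out function-neutral names from a pool, under regional subdomains.

    Names carry no role, hardware, owner or subnet information; what a host
    does lives in a separate inventory that can change without renaming.
    """

    def __init__(self, pool: Iterable[str], domain: str):
        self._free: List[str] = list(pool)  # shared pool keeps names unique everywhere
        self._domain = domain
        self.inventory: Dict[str, str] = {}  # fqdn -> current role (free text)

    def allocate(self, region: str, role: str) -> str:
        if not self._free:
            raise RuntimeError("name pool exhausted; pick a bigger theme")
        fqdn = f"{self._free.pop(0)}.{region}.{self._domain}"
        self.inventory[fqdn] = role
        return fqdn

    def reassign(self, fqdn: str, role: str) -> None:
        # A change of function updates the inventory, not the hostname.
        self.inventory[fqdn] = role
```

For example, with the pool ["zeus", "apollo", "athena"] and domain "company.com", allocating a master database in "east" yields zeus.east.company.com, and a later role change only touches the inventory entry.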

I know it's tempting to assign names to servers that describe their functions and other attributes, and in a perfect world that would work. In practice, though, I have found that after a while these names get messed up: as the requirements of the business change, the servers' functions and other parameters change too, so the names no longer reflect reality.
I think you should assign unique names to the servers that do not tell anything about the function or other parameters and have some sort of (up to date) list detailing those things so that your people can look it up. That's what we do here.
The other extreme is using IP addresses only, or names based on IP addresses, which can also lead to disaster if you ever have to change your IP addresses.
