Efficient way to download gigabyte files from SFTP server programmatically


I have several 30GB files on the SFTP server that I want to download programmatically. Currently I am using the sftp get command to download them, but each file takes about 3.5 hours. Is there a way to download these files faster?

In all likelihood, the limiting factor here is the speed of your network. sftp and SSH in general will use as much bandwidth as possible unless you've restricted them in some way.
The speed you're getting is about 2.38 MB/s, which is a reasonable download speed over a home network connection, but would be unreasonably slow on a LAN. You haven't said which you're using, though.
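The 2.38 MB/s figure is just file size divided by elapsed time, taking 30 GB as 30 x 10^9 bytes (decimal units are an assumption here). The sketch below reproduces that arithmetic and, for comparison, estimates how long the same file would take on a link running flat out at 1 Gb/s, ignoring all protocol and encryption overhead.

```python
def throughput_mb_per_s(size_bytes: float, seconds: float) -> float:
    """Average throughput in decimal megabytes (10**6 bytes) per second."""
    return size_bytes / seconds / 1e6


def transfer_minutes(size_bytes: float, link_gbit_per_s: float) -> float:
    """Ideal transfer time in minutes at the given line rate (no overhead)."""
    bytes_per_s = link_gbit_per_s * 1e9 / 8  # bits -> bytes
    return size_bytes / bytes_per_s / 60


size = 30e9               # 30 GB, decimal units assumed
observed = 3.5 * 3600     # 3.5 hours in seconds

print(f"observed: {throughput_mb_per_s(size, observed):.2f} MB/s")
print(f"ideal at 1 Gb/s: {transfer_minutes(size, 1):.1f} min")
```

The observed rate works out to roughly 19 Mb/s (2.38 x 8), while an ideal gigabit link would move the same 30 GB in about 4 minutes, which is why the network is the first thing to check.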
It is theoretically possible that the encryption is too slow if (a) you're using an AES algorithm and one side doesn't support AES in hardware or (b) you're on a 10 Gb/s network. However, I very much doubt that's the case here, since all algorithms in modern OpenSSH versions can max out a 1 Gb/s connection.
I would investigate the speed and configuration of the network if you're on a LAN, or the speed of the SFTP server or disk if one side is an appliance or embedded device.

Related

Commit transfer performance for large files to HTTP+SVN server

I have an SVN repository behind an Apache HTTPS server that stores both small and large (1 GB and larger) files. When I commit a large file, the transfer speed is about 10MB/sec (over a 1 Gbit network line). When I look at CPU utilization on the server, it is saturated, with about 85% consumed by apache2 and some 15% by the disk driver.
I have already tried disabling Apache logging and SSL, but that didn't improve the transfer speed. Could mod_dav_svn be using most of the CPU? I have also tried increasing the number of available cores on the server (default = 1 core), but this mysteriously slows down the commits while httpd keeps using only 1 core. Setting SVNCompressionLevel 0 also didn't result in any noticeable speed improvement.
Is there any way to significantly increase the transfer speed through parallelization or some other optimization?
Server:
Debian 9.3
Apache 2.4.25
libapache2-mod-svn 1.9.5
svn repository: default FSFS config (i.e. everything commented out in fsfs.conf). The HDD can write up to 30MB/sec (hardware limited) without saturating the CPU (tested by copying). The file system is NTFS, mounted with ntfs-3g with big_writes enabled, which uses some 10-15% CPU while writing at 10MB/sec.
Client:
svn 1.8.13
CPU: first generation Intel Core @ 3.20 GHz
Obviously, I would be very pleased if I could transfer at 25-30MB/sec.
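The poster measured the disk's write ceiling by copying files. A minimal sketch for measuring sequential write throughput directly on the repository volume, assuming Python is available on the server, is below. The function name and the mount point in the usage comment are hypothetical; the fsync call makes the timing include the actual flush to the device rather than only the page cache.

```python
import os
import tempfile
import time
from typing import Optional


def sequential_write_mb_per_s(total_mib: int = 256, block_kib: int = 1024,
                              directory: Optional[str] = None) -> float:
    """Write total_mib MiB in block_kib KiB chunks, fsync, return MB/s (10**6 B/s)."""
    block = os.urandom(block_kib * 1024)        # random, incompressible payload
    n_blocks = (total_mib * 1024) // block_kib
    fd, path = tempfile.mkstemp(dir=directory)  # temp file on the volume under test
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())                # wait until the data reaches the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return n_blocks * len(block) / elapsed / 1e6


# Example (hypothetical mount point of the ntfs-3g volume):
# print(f"{sequential_write_mb_per_s(directory='/mnt/svn'):.1f} MB/s")
```

Running it while watching top shows whether ntfs-3g alone eats a large share of the single core, which would leave less CPU for apache2 and mod_dav_svn during a commit.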

Is there any way to significantly increase the transfer speed through parallelization or some other optimization?
Yes, there is. However, the question lacks necessary details about the SVN client and server version, the server's and FSFS repository configuration and the hardware it runs on. It is hard to tell what kind of optimizations will help in your case. You may want to upgrade your server and client to the latest versions and disable the compression in the server's config.
FYI: VisualSVN Server in my tests can deliver 1Gbps speed.

Redis localhost limitations and costs

I downloaded the Redis server and CLI to my local machine and it is working well.
I just want to know whether I can also use it on a production server:
Are there any critical limitations? For example: Can I use 100 GB for free? (It will be on my computer).
I know that Redis Labs costs money per month, but if I download Redis to my machine and don't use Redis Labs, would it be free? (The only cost would be the storage of the machine I'm using.)

Redis is open source software, licensed under BSD. That basically means you can do anything you want with it, without owing anyone anything.
Redis Labs, the home of open source Redis and the provider of commercial products that build on it, offers a wide spectrum of solutions: hosted, as-a-service, downloadable, remotely managed and so forth. You can (and sometimes should) use them, but that's definitely not a requirement.
Disclaimer: I work at Redis Labs and with the open source project.

Is it possible to monitor SMART disk information from a VM?

I am working on an Ubuntu 14.04.1 LTS server.
In fact, I don't have access to the server itself, only to a VM.
I am trying to monitor SMART disk information (like Temperature_Celsius and other attributes like that), but only from the VM.
I think it is impossible because the VM doesn't have any real access to the physical server, but I am not sure.
Thank you for reading, and I hope someone can answer quickly.

It will work provided that the VM "owns" the disk and the virtualisation engine permits arbitrary commands to be sent to the disk. In the mass-hosting case where multiple virtual machines are sharing a disk, that's a no-go, but it can be viable for custom configurations.
For example, you can use VMware to pass a USB-SATA converter through to the guest. Provided the guest supports sending SMART commands to USB Mass Storage devices (and anything you're likely to run in 2014 will have this support), you're good.
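Once the disk is passed through, the guest can query it like local hardware. A minimal Python sketch, assuming smartmontools is installed in the guest and the script runs as root, is below; the device path is hypothetical, and `-d sat` is smartctl's SCSI-to-ATA translation device type that many (not all) USB-SATA bridges need. The helper names are illustrative, not part of any library.

```python
import subprocess
from typing import Optional


def read_smart_attributes(device: str, dev_type: Optional[str] = None) -> str:
    """Run `smartctl -A` on device and return its text output."""
    cmd = ["smartctl", "-A"]
    if dev_type is not None:
        cmd += ["-d", dev_type]  # e.g. "sat" for many USB-SATA bridges
    cmd.append(device)
    # smartctl's exit status is a bit mask that can be non-zero even when the
    # attribute table was printed, so don't raise on a non-zero return code.
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def parse_smart_attribute(output: str, name: str) -> Optional[int]:
    """Return the leading integer of RAW_VALUE for attribute `name`, or None."""
    for line in output.splitlines():
        fields = line.split()
        # Attribute rows have the columns:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


# Usage inside the guest (hypothetical device path, needs root):
# temp = parse_smart_attribute(read_smart_attributes("/dev/sdb", "sat"), "Temperature_Celsius")
```

If the bridge or the hypervisor does not forward the ATA commands, smartctl prints no attribute table and the parser simply returns None.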

Does it make sense to put all development works in Cloud?

Is it possible to have virtual machines in the cloud, install Visual Studio there, and have developers use the 'cloud' for day-to-day programming work? Is the cost going to be too high? Is the speed going to be too slow?
Where can I find statistics or numbers to convince people?

I like using remote virtual machines to run development servers, but I don't like using my IDE on a remote server. The latency is noticeable. If you're without an internet connection you can't work. My happy compromise is to have a dev server available (EC2) and sync it with my laptop via git.

It is completely possible to do this. Using a service like Rackspace, you can set up a fairly powerful Windows server for as little as $60 a month:
http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing
In my experience, using Remote Desktop to log into a Rackspace Windows Cloud Server has been snappy and quick (of course a lot of that depends on the strength of your internet connection). Standing up the server is lightning fast, backing it up is even easier, and it can be easily resized down the line if you need more storage/bandwidth.
These days I don't understand why a small to mid sized organization would actually waste capital on server hardware.
Evan

Best way to simulate a WAN network

To simplify, I have an application where data is intended to flow over the internet between two servers. Ideally, I'd like to find the point at which the software ceases to function: at what lower-bound limits (bandwidth, latency, dropped packets) do things stop working? That would test the reliability of the software.
What I thought I would do was the following:
Set up 3 machines (VMware instances)
Install the 2 applications on two of the servers.
Set up the 3rd server to sit between the two machines by doing some sort of magic with Routing and Remote Access on Windows 2003
Install either Traffic Shaper XP or NetLimiter to limit the bandwidth
Run something like TMnetSim Network Simulator to simulate a bad connection.
Does this sound like a good idea or are there easier/better ways of doing this? I'm not that comfortable on Linux and my team mates are even less so.
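Whatever tool ends up in the middle, finding "the point at which the software ceases to function" can be automated as a search rather than trial and error. A minimal sketch, assuming a hypothetical `works(bandwidth)` callback that reconfigures the shaper and runs the application's test, and assuming monotonic behaviour (if it works at some bandwidth, it works at any higher one), bisects for the threshold:

```python
from typing import Callable


def lowest_working_bandwidth(works: Callable[[float], bool],
                             lo_kbit: float, hi_kbit: float,
                             tolerance_kbit: float = 8.0) -> float:
    """Bisect for the smallest bandwidth (kbit/s) at which works() returns True.

    Assumes monotonic behaviour: if the application works at some bandwidth,
    it also works at every higher bandwidth.
    """
    if not works(hi_kbit):
        raise ValueError("application fails even at the upper bound")
    if works(lo_kbit):
        return lo_kbit
    # Invariant: works(hi_kbit) is True, works(lo_kbit) is False
    while hi_kbit - lo_kbit > tolerance_kbit:
        mid = (lo_kbit + hi_kbit) / 2
        if works(mid):
            hi_kbit = mid  # still works: threshold is at or below mid
        else:
            lo_kbit = mid  # fails: threshold is above mid
    return hi_kbit
```

Each step halves the interval, so narrowing 0 to 10,000 kbit/s down to 8 kbit/s takes around a dozen test runs. The same function works for latency or loss by flipping the comparison, since higher values make those worse rather than better. Real network tests are noisy, so repeating each `works()` call a few times is probably wise.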

WANem does exactly this. We have used it both in a virtual machine on the desktop and on a dedicated old pc and it worked great. It can simulate all sorts of broken connectivity.

FreeBSD's ipfw has provisions to simulate links with a given bandwidth, latency or error rate. You could use a FreeBSD machine as the machine "in the middle" in your setup above.
You probably can also run at least one of the endpoints on the same machine if you want to reduce the amount of servers involved.

Someone actually packaged up the settings and whatnot necessary for the FreeBSD solution to this problem and they call it DUMMYNET.
It simulates/enforces queue and bandwidth limitations, delays, packet losses, and multipath effects. It also implements a variant of Weighted Fair Queueing called WF2Q+. It can be used on user's workstations, or on FreeBSD machines acting as routers or bridges.
It can simulate exactly what you want, it's free, and it will boot on commodity hardware. They even have a canned install that is small enough to put on a floppy disk (!), which you can download at that link.

Maybe it is time to learn a bit about Linux, because adding a 50ms delay to every outgoing packet takes just one line:
tc qdisc add dev eth0 root netem delay 50ms
For more, see the Linux Traffic Control HOWTO.
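The same netem qdisc can also add jitter, random loss and a rate limit (`delay TIME [JITTER]`, `loss PERCENT`, `rate RATE`). Since the asker's team is not comfortable on Linux, a small helper that builds the command line from named parameters might lower the barrier; it is a hypothetical sketch, `eth0` is only an example interface name, `rate` needs a reasonably recent kernel and iproute2, and actually running the command requires root.

```python
from typing import List, Optional


def netem_command(dev: str, delay_ms: Optional[int] = None,
                  jitter_ms: Optional[int] = None,
                  loss_pct: Optional[float] = None,
                  rate: Optional[str] = None,
                  replace: bool = False) -> List[str]:
    """Build a `tc qdisc ... netem` argument list for the given impairments."""
    # "replace" swaps an existing root qdisc; "add" fails if one is already set
    cmd = ["tc", "qdisc", "replace" if replace else "add",
           "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            cmd.append(f"{jitter_ms}ms")  # jitter only makes sense with a delay
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    if rate is not None:
        cmd += ["rate", rate]            # e.g. "1mbit"
    return cmd


# Print the command for 100ms +/- 20ms delay, 1% loss, 1 Mbit/s:
print(" ".join(netem_command("eth0", delay_ms=100, jitter_ms=20,
                             loss_pct=1, rate="1mbit")))
# To apply it: subprocess.run(netem_command(...), check=True)  (as root)
# To undo it:  tc qdisc del dev eth0 root
```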

We had a similar requirement some ten years ago - I'll see if I can recall how we managed it.
If I remember, we wrote a socket proxy program which was controlled by inetd on a UNIX box. This socket would accept connections from a client and open equivalent sessions through to the server. It would then loop, passing messages in both directions.
The way we achieved WAN characteristics was to introduce random delays (with upper and lower limits) in both the connection establishment and the passing of data once the link was up.
It could also drop the link occasionally, as WAN links were less reliable for us than local traffic.
I recall we had to make it threaded to stop the delays from affecting reverse traffic on the link.
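A rough Python reconstruction of the proxy described above might look like the following. It is a sketch, not the original program: it runs as a standalone listener on 127.0.0.1 instead of under inetd, and all function and parameter names are hypothetical. It keeps the described behaviour: a random delay (between `min_delay` and `max_delay` seconds) on connection establishment and on every forwarded chunk, an optional random link drop (`drop_prob` per chunk), and one thread per direction so delays do not hold up reverse traffic.

```python
import random
import socket
import threading
import time
from typing import Tuple


def _pump(src: socket.socket, dst: socket.socket,
          min_delay: float, max_delay: float, drop_prob: float) -> None:
    """Forward bytes from src to dst, delaying each chunk by a random amount."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                dst.shutdown(socket.SHUT_WR)  # pass a clean half-close along
                return
            if random.random() < drop_prob:
                raise ConnectionAbortedError("simulated WAN link drop")
            time.sleep(random.uniform(min_delay, max_delay))
            dst.sendall(data)
    except OSError:
        # Dropped link or socket error: tear down both directions
        for s in (src, dst):
            try:
                s.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass


def _handle(client: socket.socket, upstream: Tuple[str, int],
            min_delay: float, max_delay: float, drop_prob: float) -> None:
    time.sleep(random.uniform(min_delay, max_delay))  # slow connection setup
    try:
        server = socket.create_connection(upstream)
    except OSError:
        client.close()
        return
    args = (min_delay, max_delay, drop_prob)
    # One thread per direction so a delay on one side doesn't stall the other
    pumps = [threading.Thread(target=_pump, args=(client, server) + args, daemon=True),
             threading.Thread(target=_pump, args=(server, client) + args, daemon=True)]
    for t in pumps:
        t.start()
    for t in pumps:
        t.join()
    client.close()
    server.close()


def start_proxy(upstream: Tuple[str, int], min_delay: float = 0.05,
                max_delay: float = 0.3, drop_prob: float = 0.0,
                port: int = 0) -> int:
    """Relay 127.0.0.1:port to upstream in a background thread; return the bound port."""
    listener = socket.create_server(("127.0.0.1", port))

    def accept_loop() -> None:
        while True:
            client, _ = listener.accept()
            threading.Thread(target=_handle,
                             args=(client, upstream, min_delay, max_delay, drop_prob),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener.getsockname()[1]
```

Clients then connect to the returned port instead of the real server. Note that `socket.create_server` needs Python 3.8 or later.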

There is a very good (and free) Microsoft solution for this. We have used it for quite some time and it works great; it can very easily simulate everything (packet loss, low bandwidth, disconnection, latency...).
This is the best solution I have found for a Windows environment.
More information and a download link can be found here: MARCO blog post
This product has gone through some evolution and is now integrated into Visual Studio as part of the automation testing, but I found the standalone version (which is quite hard to find, so keep a local copy) to work much better. Keep in mind that you need at least two computers (or VMs), since traffic has to pass through a network adapter for the application to work its magic.
