How reliable would it be to download over 100,000 files via wget from a bash file over ssh?

I have a bash file that contains wget commands to download over 100,000 files totaling around 20gb of data.
The bash file looks something like:
wget http://something.com/path/to/file.data
wget http://something.com/path/to/file2.data
wget http://something.com/path/to/file3.data
wget http://something.com/path/to/file4.data
And there are exactly 114,770 rows of this. How reliable would it be to ssh into a server I have an account on and run this? Would my ssh session time out eventually? Would I have to stay ssh'd in the entire time? What if my local computer crashed or got shut down?
Also, does anyone know how many resources this would take? Am I crazy to want to do this on a shared server?
I know this is a weird question, just wondering if anyone has any ideas. Thanks!

Use
nohup ./scriptname &> logname.log &
This will ensure that:
the process continues even if the ssh session is interrupted;
you can monitor it while it is running.
I would also recommend printing a marker at regular intervals; it will be good for log analysis, e.g. echo "1000 files copied".
As far as resource utilisation is concerned, it depends entirely on the system, and mainly on the network characteristics. Theoretically you can calculate the time with just data size and bandwidth, but in real life delays, latencies, and data losses come into the picture.
So make some assumptions, do some maths, and you'll get the answer :)
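For example (just a sketch, and the file name urls.txt is an assumption - the question only has the raw wget lines), you could put the URLs into a plain list and loop over it, logging a marker every 1000 files:
#!/usr/bin/env bash
# Assumes the 114,770 URLs have been extracted into urls.txt, one per line.
count=0
while read -r url; do
    wget "$url"
    count=$((count + 1))
    if [ $((count % 1000)) -eq 0 ]; then
        echo "$count files copied"
    fi
done < urls.txt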

It depends on the reliability of the communication medium, the hardware, and so on!
You can use screen to keep it running while you disconnect from the remote computer.
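If you go the screen route, the usual workflow is roughly this (the session name downloads is arbitrary):
screen -S downloads        # start a named session on the server
./scriptname               # launch the downloads inside it
# press Ctrl-A then d to detach; the script keeps running
screen -r downloads        # reattach later to check on it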

You want to disconnect the script from your shell and have it run in the background (using nohup), so that it continues running when you log out.
You also want to have some kind of progress indicator, such as a log file that logs every file that was downloaded, as well as all the error messages. By default, nohup sends stdout and stderr into a file (nohup.out) if you don't redirect them yourself.
With such a file, you can pick up broken downloads and aborted runs later on.
Give it a test-run first with a small set of files to see if you got the command down and like the output.

I suggest you detach it from your shell with nohup.
$ nohup myLongRunningScript.sh > script.stdout 2>script.stderr &
$ exit
The script will run to completion - you don't need to be logged in throughout.
Do check for any options you can give wget to make it retry on failure.
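For instance (numbers are just a suggestion), wget's standard retry and resume options look like this:
wget --tries=5 --waitretry=10 --continue http://something.com/path/to/file.data
wget can also read the URL list from a file with -i urls.txt and append its output to a log with -a download.log, which would shrink the 114,770-line script to a single command.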

If it is possible, generate MD5 checksums for all of the files and use them to check whether they were all transferred correctly.
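A minimal sketch, assuming a checksum manifest can be generated on the source side (the file names here are made up):
md5sum *.data > checksums.md5      # run on the source host
md5sum -c checksums.md5            # run locally after the download; prints OK/FAILED per file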

Start it with
nohup ./scriptname &
and you should be fine.
Also I would recommend that you log the progress so that you would be able to find out where it stopped if it does.
wget url >> logfile.log 2>&1
could be enough (wget writes its progress and errors to stderr, so redirect that as well).
To monitor progress live you could:
tail -f logfile.log

It may be worth it to look at an alternate technology, like rsync. I've used it on many projects and it works very, very well.
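This only works if you have shell or rsync access to the host serving the files (the question only mentions HTTP, so that is an assumption), but then a single invocation replaces all 114,770 wget calls and can resume cleanly after interruptions:
rsync -av --partial --progress user@something.com:/path/to/ ./local-dir/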

Related

Getting a PDF out of the SSH to the own system

Given:
Connection to the Uni's secure shell like this:
me@my_computer:~$ ssh <my_name>@unixyz.cs.xy.com
Password:***********
Welcome to Unixyz. You now can access a terminal on system unixyz:
my_name@unixyz:~$ ls
Desktop Documents Pictures Music desired_document.pdf
my_name@unixyz:~$
Task/Question:
Getting the desired_document.pdf to my own system. I have thought of some options so far:
1) Since I can access an editor like nano, I could write a C/Java program, compile it in the home directory, and make that program send the PDF. Problem with that: I'd have to code a client on the Uni machine and a server on my own system. On top of that, I only know how to transfer text given on stdin, not PDFs. And it's obviously too much work for the given task.
2) I found some vague information about the commands scp and sftp. Unfortunately, I cannot figure out how exactly they are used.
The latter is basically my question: are scp and sftp valid options for doing this, and how are they used?
EDIT:
I received a first answer, but the problem persists. As stated, I use:
scp me@server.cs.xyz.com:/path/topdf /some/local/dir
which gives me:
/some/local/dir: no such file or directory
I'm not sure in which environment you are.
Do you use Linux or Windows as your every-day operating system?
If you are using Windows, there are some UI-based scp/ssh implementations that let you transfer these files through an Explorer-like interface.
For example there is https://winscp.net/
You can indeed use scp to do exactly that, and it's easier than it might look:
scp your_username@unixyz.cs.xy.com:path/to/desired_document.pdf /some/local/dir
The key is the colon after the server name, where you add your path.
Optionally you can pass in the password as well, but that's bad practice, for obvious reasons.
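For completeness, since the question asks about sftp as well: it is equally valid, and gives you an interactive session on the remote side from which you can fetch files:
sftp your_username@unixyz.cs.xy.com
sftp> get path/to/desired_document.pdf /some/local/dir
sftp> exit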
I actually found the answer, and the error I was making, myself. Both the guy with the answer and the commenter were right. BUT:
scp must be launched from YOUR own terminal; I always tried to run it while I was connected to the remote server.
2 hours wasted because of that.

Allowing a PHP script to ssh, using sudo

I need to allow a PHP script on my local web server, to SSH to another machine to perform a specified task on some files. My httpd runs as _www with low permissions, so setting up direct passwordless SSH is difficult, not to say ill-advised.
The way I do it now is to have a minimal PHP script that sudo-exec's (as me) a shell script which is outside of the document root. The shell script in turn calls (as me) the PHP code that does the actual SSH work, and prints its output. Here's the code.
read_remote_files.php (The script I call from my browser):
exec('sudo -u me -n /home/me/run_php.sh /path/to/my_prog.php', $results);
print implode(PHP_EOL, $results); // $results is an array of output lines
/home/me/run_php.sh (Runs as me, calls whatever it's given):
php $1 2>&1
sudoers:
_www ALL = (me) NOPASSWD: /home/me/run_php.sh
This all works, as my_prog.php is called as me and can SSH as me. It seems it's not too insecure since run_php.sh can't be called directly from a browser (outside document root). The issue I'm having is that my_prog.php isn't called as an HTTP program so doesn't have access to the HTTP environment variables (DOCUMENT_ROOT etc).
Two questions:
Am I making this too complicated?
Is there an easy way for my final script to get the HTTP variables?
Thanks!
Andy
Many systems do stuff like this using a (privileged) cron job that frequently checks for the existence of a file, a database record or some other resource, and then performs actions if there are any.
The huge advantage of this is that there is no direct interaction between the PHP script and the privileged script at all. The PHP script leaves the instructions in a resource, the privileged script fetches it. As long as the instructions can't lead to the system getting compromised or damaged, it's definitely more secure than sudoing.
The disadvantage is that you can't push changes whenever you like; you have to wait until the cron job runs again. But maybe it's an option anyway?
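As a sketch of that pattern (the spool directory, the job file format and the remote command are all made up for illustration): the PHP script writes a small job file into a spool directory, and a privileged cron job like this one consumes it:
#!/usr/bin/env bash
# Hypothetical consumer, run e.g. every minute from me's crontab:
#   * * * * * /home/me/process_jobs.sh
SPOOL=/var/spool/myapp/jobs
for job in "$SPOOL"/*.job; do
    [ -e "$job" ] || continue                         # no pending jobs
    # Never trust what the web server wrote: whitelist-validate the contents.
    target=$(grep -E '^[A-Za-z0-9_/.-]+$' "$job" | head -n 1)
    if [ -n "$target" ]; then
        ssh othermachine "process-files '$target'"    # placeholder remote task
    fi
    rm -f -- "$job"
done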
"I need to allow a PHP script on my local web server, to SSH to another machine to perform a specified task on some files."
I think that you are phrasing this in terms of a solution that you have difficulty getting to work rather than a requirement. Surely what you should be saying is "I want to invoke a task on machine B from a PHP script running under Apache on machine A", and then research solutions to this -- of which there are many, from a simple roll-your-own RPC tunnelled over HTTP(S) to using an XML-RPC or SOA framework.
Two caveats:
Do a phpinfo(); on both machines to check what extensions are available and
Also check your php.ini settings to make sure that your service provider hasn't disabled any functions that you expect to use (or use a quick-and-dirty script to echo 'disable_functions = ' . ini_get('disable_functions') . "\n"; ...)
If you browse here and the wider internet you'll find many examples. Here is one that I use for a similar purpose.

Why is the minidlna database not being refreshed?

I am developing a MiniDLNA server to stream media over WiFi. Existing files are shown properly. However, when I add new files to media folders the changes are not updated across MiniDLNA clients. I have also tried to restart the server but it does not reflect the changes.
I changed inotify_interval = 60 but it's still not updating files.db which is the MiniDLNA media list database. If I delete this database and restart the server it shows the changes.
Does anyone know what the problem might be?
$ minidlnad -h
…
-r forces a rescan
-R forces a rebuild
In summary, the most reliable way to have MiniDLNA rescan all media files is by issuing the following set of commands:
$ sudo minidlnad -R
$ sudo service minidlna restart
Client-side script to rescan server
Often, though, MiniDLNA will be running on a remote server. Here is a client-side script to request a rescan on such a server:
#!/usr/bin/env bash
ssh -t server.on.lan 'sudo minidlnad -R && sudo service minidlna restart'
AzP already provided most of the information, but some of it is incorrect.
First of all, there is no such option as inotify_interval. The only option that exists is notify_interval, and it has nothing to do with inotify.
So to clarify, notify_interval controls how frequently the (mini)dlna server announces itself in the network. The default value of 895 means it will announce itself about once every 15 minutes, meaning clients will need at most 15 minutes to find the server. I personally use 1-5 minutes depending on client volatility in the network.
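In /etc/minidlna.conf that is a single line; for example, to announce every five minutes instead of the default ~15:
# default is 895 seconds
notify_interval=300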
In terms of getting minidlna to find files that have been added, there are two options:
The first is equivalent to removing the file files.db and consists of restarting minidlna with the -R argument, which forces a rebuild: the database is built again from scratch. Since version 1.2.0 there is also the -r argument, which performs a rescan: it preserves the existing database, dropping old records and adding new ones as needed.
The second is to rely on inotify events by setting inotify=yes and restarting minidlna. If inotify is set to no, the only way to update the file database is a forced full rescan.
Additionally, in order to have inotify working, the file-system must support inotify events, which is not the case in most remote file-systems. If you have minidlna running over NFS it will not see any inotify events because these are generated on the server side and not on the client.
Finally, even if inotify is working and is supported by the file-system, the user under which minidlna is running must be able to read the file, otherwise it will not be able to retrieve necessary metadata. In this case, the logfile (usually /var/log/minidlna.log) should contain useful information.
MiniDLNA uses inotify, which is a functionality within the Linux kernel, used to discover changes in specific files and directories on the file system. To get it to work, you need inotify support enabled in your kernel.
The notify_interval (notice the lack of a leading 'i'), as far as I can tell, is only used if you have inotify disabled. To use the notify_interval (ie. get the server to 'poll' the file system for changes instead of automatically being notified of them), you have to disable the inotify functionality.
This is how it looks in my /etc/minidlna.conf:
# set this to no to disable inotify monitoring to automatically discover new files
# note: the default is yes
inotify=yes
Make sure that inotify is enabled in your kernel.
If it's not enabled, and you don't want to enable it, a forced rescan is the way to force MiniDLNA to re-scan the drive.
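A quick way to check whether the running kernel has inotify support compiled in (the directory only exists when it does):
ls /proc/sys/fs/inotify/
# expect: max_queued_events  max_user_instances  max_user_watches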
I have recently discovered that minidlna doesn't update the database if the media file is a hardlink. If you want these files to show up in the database, a full rescan is necessary.
E.g.: if you have a file /home/movies/foo.mkv and a hardlink in /home/minidlna/video/foo.mkv, where /home/minidlna is your minidlna share, you will have to do a rescan before that file appears in the database (and subsequently in your DLNA client).
I'm still trying to find a way around this. If anyone has any input, it's most welcome.
There is a patch for the minidlna source code available on SourceForge that does not do a full rescan but a kind of incremental scan. That worked fine, but with some later version the patch broke. See the link on SourceForge.
Regards
Gerry
I have solved it with a small script:
Every 15 seconds it checks the size of the directory (/media/seriesPI); the service is restarted if there are changes:
#!/bin/bash

# Concatenate the sizes reported by du for everything under /media/seriesPI
function sizeFiles() {
    for i in $(du /media/seriesPI/ | awk '{print $1}'); do
        cad+=$i
    done
}

sizeFiles
# first size
first=$cad
cad=''

while true; do
    sizeFiles
    echo "$first != $cad"
    if [ "$first" != "$cad" ]; then
        echo "Directory size has changed!"
        echo "Restart service MiniDLNA"
        sudo service minidlna restart
        # update new size
        first=$cad
    else
        echo "There are no changes in the directory"
    fi
    echo "waiting 15 seconds..."
    sleep 15
    cad=''
done
Resolved with a root crontab entry:
10 * * * * /usr/bin/minidlnad -r

How can I remotely log on to a machine, execute a script which sets up an environment, then accept user input?

I've been trying to figure out a way to do this for a few hours now, and am having no luck.
I have a large environment file that I have saved as a ksh script. This script works perfectly if I type . ./setEnv.sh
However, what I'm trying to do is use either ssh or rsh to log on to a remote system, execute this script, and then let me use the system in its modified form. I am able to successfully execute the script, but the connection always closes after execution. I would like to be able to keep this connection open.
Any idea on how I can do this?
At the moment, it does not matter if I use SSH or RSH to accomplish this. RSH is preferable. I am using a variety of Linux and Solaris operating systems, so a catch-all method would be nice.
Thanks,
Matt
Couldn't you do something like this?
ssh user@host ". ./setEnv.sh && your-command"
(Note that the script has to be sourced, as in the question, so that the environment it sets up is visible to your-command.)
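If the goal is to stay logged in and work interactively after the environment is set up, one sketch (assuming setEnv.sh is in the remote home directory and ssh is acceptable) is to source the script and then replace the remote command with an interactive shell:
ssh -t user@host '. ./setEnv.sh && exec $SHELL'
The -t forces a pseudo-terminal so the shell you exec into behaves as an interactive session, and it inherits everything setEnv.sh exported.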

telnet to different IPs and run commands

I'm not sure if this is possible or not.
What I'm looking for is a way to tell telnet to use a certain IP address to log into, and then run commands that change based on a user's MAC address.
Basically it would be:
tell telnet to use x.x.x.x as the IP to log into and put in the correct username and password
tell telnet to run commands (based on the user's MAC address) that can change based on which user stats you want to see, for example: show macaddress
export the output to notepad
close
expect can do this. If you don't have Tcl but Python, try Pexpect.
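A minimal sketch of that, driven from a shell script (the IP, prompts, credentials and the show macaddress command are placeholders taken from the question, not from a real device):
#!/usr/bin/env bash
# Redirect stdout to a file to get the "export the output" part:
#   ./query_switch.sh > output.txt
expect <<'EOF'
set timeout 20
spawn telnet x.x.x.x
expect "Username:"
send "myuser\r"
expect "Password:"
send "mypassword\r"
expect ">"
send "show macaddress\r"
expect ">"
send "quit\r"
expect eof
EOF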
If you just want to run one command, use ssh (which allows you to log in, run a command and which will return with the error code of the command, so you can handle errors, too).
If you want to run more than a single command, write a script, use scp to copy that script to the other side and then execute the script with ssh. I've used this approach with great success to build a simple spider that could run a script to gather system information over a large number of hosts.
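A bare-bones version of that scp-then-ssh approach (the host and script names are placeholders):
#!/usr/bin/env bash
HOST=user@x.x.x.x
# copy the script over, run it remotely, and collect its output locally
scp gather_info.sh "$HOST":/tmp/gather_info.sh
ssh "$HOST" 'sh /tmp/gather_info.sh' > output.txt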
I think you're looking for expect (it automates these kind of interactive applications). Here is a gratis chapter from the authority on expect, the book "Exploring Expect".
Also you should use SSH if this is over the internet. Telnet is insecure as it's a plain text protocol.
Not to blow my own horn, but you may be able to twist a personal app of mine (note: Sorry, I've removed this.) to this end.
There's currently no documentation other than what is on that page and no public source code (though I've been meaning to get onto that, and will work that out tomorrow if you're interested), but I'd be happy to answer any questions.
That said, any MUD client could be turned to the same use too.