GNU Parallel -q option causing BCP "unknown option" errors (different string quotes on local vs remote hosts) - bcp

I am seeing very strange behavior when using GNU parallel to distribute export jobs using bcp from mssql-tools. It appears that with the -q option for parallel, strings are interpreted differently on the local host than on remote hosts.
Running only as a loop through the files on the local host, the bcp processes throw no errors.
However, when distributing the file exports with parallel, the bcp processes executing on the local host throw
/opt/mssql-tools/bin/bcp: unknown option
errors, while those executing on remote hosts (via a --sshloginfile param) finish successfully. The basic code being run looks like...
# setting some vars to pass
TO_SERVER_ODBCDSN="-D -S MyMSSQLServer"
TO_SERVER_IP="-S 172.18.54.22"
DB="$dest_db" #TODO: enforce being more careful with this value
TABLE="$tablename" # MUST exist beforehand, case matters
USER=$(tail -n+1 $source_home/mssql-creds.txt | head -1)
PASSWORD=$(tail -n+2 $source_home/mssql-creds.txt | head -1)
DATAFILES="/some/path/to/files/"
TARGET_GLOB="*.tsv"
RECOMMEDED_IMPORT_MODE='-c' # makes a HUGE difference, see https://stackoverflow.com/a/16310219/8236733
DELIMITER="\\\t" # (currently not used) DO NOT use format like "'\t'", nested quotes seem to cause hard-to-catch error, want "\t" literal
....
bcpexport() {
filename=$1
TO_SERVER_ODBCDSN=$2
DB=$3
TABLE=$4 # MUST exist beforehand, case matters
USER=$5
PASSWORD=$6
RECOMMEDED_IMPORT_MODE=$7 # makes a HUGE difference, see https://stackoverflow.com/a/16310219/8236733
DELIMITER=$8 # not currently used
WORKDIR=$9
LOGDIR=${10}
....
/opt/mssql-tools/bin/bcp "$TABLE" in "$localfile" \
$TO_SERVER_ODBCDSN \
-U $USER -P $PASSWORD \
-d $DB \
$RECOMMEDED_IMPORT_MODE
-t "\t" \
-e ${localfile}.bcperror.log
}
export -f bcpexport
parallelization_pernode=5
parallel -q -j $parallelization_pernode \
--sshloginfile $source_home/parallel-nodes.txt \
--env bcpexport \
bcpexport {} "$TO_SERVER_ODBCDSN" $DB $TABLE $USER $PASSWORD $RECOMMEDED_IMPORT_MODE $DELIMITER $workingdir $logdir \
::: $DATAFILES/$TARGET_GLOB #from hdfs nfs gateway
Looking at the bash interpretation of the processes (by running ps -aux | grep bcp on the hosts that parallel is given in the --sshloginfile), for the remote hosts we see...
/bin/bash -c bcpexport() { ... /opt/mssql-tools/bin/bcp "$TABLE" in "$localfile" $TO_SERVER_ODBCDSN -U $USER -P $PASSWORD -d $DB $RECOMMEDED_IMPORT_MODE; -t "\t" -e ${localfile}.bcperror.log; ...
for the local host, the bash interpretation is...
/bin/bash -c bcpexport() { ... /opt/mssql-tools/bin/bcp "$TABLE" in "$localfile" $TO_SERVER_ODBCDSN -U $USER -P $PASSWORD -d $DB $RECOMMEDED_IMPORT_MODE; -t "\t" -e ${localfile}.bcperror.log; ...
that is, they look the same.
My current thought is that the "\t" in the bcp command is being interpreted in a problematic way. Debugging parallel without vs with the -q option we see...
$ parallel -j 5 --sshloginfile ./parallel-nodes.txt echo "Number {}: Running on \`hostname\`: \t" ::: 1 2 3 4 5
Number 4: Running on HW04.ucera.local: t
Number 1: Running on HW04.ucera.local: t
Number 2: Running on HW03.ucera.local: t
Number 5: Running on HW03.ucera.local: t
Number 3: Running on HW02.ucera.local: t
$ parallel -q -j 5 --sshloginfile ./parallel-nodes.txt echo "Number {}: Running on \`hostname\`: \t" ::: 1 2 3 4 5
Number 1: Running on `hostname`:
Number 4: Running on `hostname`:
Number 3: Running on `hostname`: \t
Number 2: Running on `hostname`: \t
Number 5: Running on `hostname`: \t
The bcp command needs the "\t" literal, not the "t" literal, and I suspect several other similar string corruptions. (I believe \t is the default for bcp anyway, but this is just an example and I want to keep \t for code clarity.) I am not sure how to get this for both local and remote nodes, or even why the behavior differs between remote and local.
Basically, I need the strings to be exactly the same on both local and remote hosts, even if they contain spaces or escape characters. (I think this used to not be a problem: I have an older script on other machines that does not show this behavior.)
I am not sure if this counts more as a parallel problem or a bcp problem (currently I think something is going wrong with the -q option in parallel). Does anyone have any debugging suggestions or fixes, or ideas about what could be happening?

Firstly, the reason why hostname is not expanded is due to -q. It quotes the ` so that it does not expand.
Secondly, I think what you see is the different behaviours in built-in echo and /bin/echo. Built-in echo depends on the shell. Here I compare echo \\\\t in different shells:
$ parallel --onall --tag -S sh#lo,bash#lo,csh#lo,tcsh#lo,ksh#lo,zsh#lo echo \\\\t ::: a
bash#lo \t a
tcsh#lo a
sh#lo a
ksh#lo \t a
zsh#lo a
csh#lo \t a
That does not, however, get you closer to a solution. If I were you, I would use env_parallel to copy the environment variables. And if the login shell on the remote systems is not the same as your shell, set PARALLEL_SHELL to force using that shell.
So:
#!/bin/bash
env_parallel --session
# setting some vars to pass
TO_SERVER_ODBCDSN="-D -S MyMSSQLServer"
:
:
PARALLEL_SHELL=bash env_parallel -q -j $parallelization_pernode ...
# (no need to use either --env or 'export -f' when using 'env_parallel --session')
# Cleanup (not needed if this is the last line in the script)
env_parallel --end-session
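Because the built-in echo differs between shells, as the comparison above shows, it can help to debug with printf instead: printf does not reinterpret backslashes in arguments that are printed through %s. The following is a minimal sketch and not part of the original answer; show_args is a hypothetical helper name, and the sample values are taken from the question.

```shell
# show_args is a hypothetical helper: it prints every argument it receives
# in brackets, one per line, so spaces and backslashes stay visible.
show_args() {
  i=1
  for arg in "$@"; do
    printf 'arg%d=[%s]\n' "$i" "$arg"
    i=$((i + 1))
  done
}

# Sample values from the question.
show_args "-D -S MyMSSQLServer" '\t'
```

Run in place of echo on each host, a helper like this would show whether the strings that reach the local and remote jobs really differ, or whether only echo's rendering of them does.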

Related

Pass arguments for SQL statement in a shell file from another shell file through ssh command

I am passing command line arguments to a shell file i.e assignRole.sh which contains an SQL command which will use these arguments like below
ssh -o StrictHostKeyChecking=no -T $key < /oracle/oracle_user/makhshif/./assignRole.sh name open_mode >> /oracle/oracle_user/dftest.txt
This fails: the script does not receive the name and open_mode arguments. But if I execute the statement outside of the ssh command, like:
/oracle/oracle_user/makhshif/./assignRole.sh name open_mode
This runs fine.
What is the problem with the ssh command, and how should I adjust these parameters so that they are accepted by the shell script assignRole.sh?
< /oracle/oracle_user/makhshif/./assignRole.sh
This sends the contents of that file to stdin. So it cannot process variables that you have not sent to the remote machine. Either preprocess your script, or create the script on the remote machine and call it with arguments.
Though it's even easier to pass variables like this:
ssh -o StrictHostKeyChecking=no -T $key "var1=$var1 var2=$var2 bash -s name open_mode" < /oracle/oracle_user/makhshif/./assignRole.sh >> /oracle/oracle_user/dftest.txt
For example my function for executing update scripts on all cluster nodes:
# functions:
ssh_exec(){
local DESCR="$1"; shift
local SCRIPT="$1"; shift
local hosts=("$@")
echo =================================================
echo = $DESCR
echo = Going to execute $SCRIPT...
read -a res -p "Enter 'skip' to skip this step or press Enter to execute: "
if [[ $res = "skip" ]]
then
echo Skipping $SCRIPT...
else
echo Executing $SCRIPT...
for host in "${hosts[@]}"
do
local cur=${!host}
echo Executing $SCRIPT on $host - $cur...
sshpass -p "$rootpass" ssh -o "StrictHostKeyChecking no" root@${cur} \
"ns1=$ns1 ns2=$ns2 search=$search zoo1=$zoo1 zoo2=$zoo2 zoo3=$zoo3 node0=$node0 pass=$pass CURIP=$cur CURHOST=$host bash -s" \
<$SCRIPT >log-$SCRIPT-$cur.log 2>&1
echo Done.
done
echo =================================================
fi
}
Then I use it like this:
read -p "Please check that Solr started successfully and Press [Enter] key to continue..."
#Solr configset and collections:
ssh_exec "Solr configset and collections" script06.sh zoo1 zoo2 zoo3
This command executes script06.sh on 3 servers (zoo1,zoo2,zoo3)
As Sayan said, using < sends the contents of the local assignRole.sh file to ssh's standard input; it does not run the script with the arguments. What you want is to execute that script on the remote host, with the arguments.
Pass the whole command as the final argument to ssh, in quotes:
ssh -o StrictHostKeyChecking=no -T $key "/oracle/oracle_user/makhshif/./assignRole.sh name open_mode" >> /oracle/oracle_user/dftest.txt
or split into multiple lines for readability:
ssh -o StrictHostKeyChecking=no -T $key \
"/oracle/oracle_user/makhshif/./assignRole.sh name open_mode" \
>> /oracle/oracle_user/dftest.txt
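If the script only exists on the local machine, the stdin approach from the first answer can still receive positional arguments: sh -s (or bash -s) reads the script from standard input and treats the remaining words as $1, $2, and so on. Below is a minimal local sketch with no ssh involved; run_from_stdin and the values alice and READ_WRITE are hypothetical.

```shell
# run_from_stdin (hypothetical helper) runs the script arriving on stdin
# and passes its own arguments on as the script's positional parameters.
# Over ssh the same idea would be:
#   ssh host "sh -s alice READ_WRITE" < script.sh
run_from_stdin() {
  sh -s "$@"
}

# A one-line stand-in for assignRole.sh that just echoes its arguments.
printf 'echo "name=$1 open_mode=$2"\n' | run_from_stdin alice READ_WRITE
```

Arguments that contain spaces or shell metacharacters would need extra quoting on the ssh command line, since the remote shell parses the command string a second time.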

Run RapSearch-Program with Torque PBS and qsub

My problem is that I have a cluster-server with Torque PBS and want to use it to run a sequence-comparison with the program rapsearch.
The normal RapSearch command is:
./rapsearch -q protein.fasta -d database -o output -e 0.001 -v 10 -x t -z 32
Now I want to run it with 2 nodes on the cluster-server.
I've tried with: echo "./rapsearch -q protein.fasta -d database -o output -e 0.001 -v 10 -x t -z 32" | qsub -l nodes=2 but nothing happened.
Do you have any suggestions? Where am I going wrong? Help please.
Standard output (and error output) files are placed in your home directory by default; take a look. You are looking for a file named STDIN.e[numbers], it will contain the error message.
However, I see that you're using ./rapsearch but are not really being explicit about what directory you're in. Your problem is therefore probably a matter of changing directory into the directory that you submitted from. When your terminal is in the directory of the rapsearch executable, try echo "cd \$PBS_O_WORKDIR && ./rapsearch [arguments]" | qsub [arguments] to submit your job to the cluster.
Other tips:
You could add rapsearch to your path if you use it often. Then you can use it like a regular command anywhere. It's a matter of adding the line export PATH=/full/path/to/rapsearch/bin:$PATH to your .bashrc file.
Create a submission script for use with qsub. Here is a good example.
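A minimal sketch of what such a submission script could look like, assuming a Torque/PBS setup like the one described in the question. The job name rapsearch_job and the helper make_job_script are hypothetical; the rapsearch command line and nodes=2 are copied from the question. The helper only prints the script, so nothing is submitted by running it.

```shell
# make_job_script (hypothetical helper) prints a PBS submission script.
# The quoted 'EOF' keeps $PBS_O_WORKDIR unexpanded until the job runs.
make_job_script() {
  cat <<'EOF'
#!/bin/bash
#PBS -N rapsearch_job
#PBS -l nodes=2
#PBS -j oe
# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR" || exit 1
./rapsearch -q protein.fasta -d database -o output -e 0.001 -v 10 -x t -z 32
EOF
}

make_job_script
```

Redirecting the output to a file (for example rapsearch.pbs, a name chosen here) and running qsub on that file from the rapsearch directory would replace the echo-pipe form used above.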

How to execute postgres' sql queries from batch file?

I need to execute SQL from batch file.
I am executing following to connect to Postgres and select data from table
C:/pgsql/bin/psql -h %DB_HOST% -p 5432 -U %DB_USER% -d %DB_NAME%
select * from test;
I am able to connect to database, however I'm getting the error
'select' is not recognized as an internal or external command,
operable program or batch file.
Has anyone faced such issue?
This is one of the queries I am trying; something similar works in a shell script (please ignore any syntax errors in the query):
copy testdata (col1,col2,col3) from '%filepath%/%csv_file%' with csv;
You could pipe it into psql
(
echo select * from test;
) | C:/pgsql/bin/psql -h %DB_HOST% -p 5432 -U %DB_USER% -d %DB_NAME%
When closing parentheses are part of the SQL query, they have to be escaped with three carets.
(
echo insert into testconfig(testid,scenarioid,testname ^^^) values( 1,1,'asdf'^^^);
) | psql -h %DB_HOST% -p 5432 -U %DB_USER% -d %DB_NAME%
Use the -f parameter to pass the batch file name
C:/pgsql/bin/psql -h %DB_HOST% -p 5432 -U %DB_USER% -d %DB_NAME% -f 'sql_batch_file.sql'
http://www.postgresql.org/docs/current/static/app-psql.html
-f filename
--file=filename
Use the file filename as the source of commands instead of reading commands interactively. After the file is processed, psql terminates. This is in many ways equivalent to the meta-command \i.
If filename is - (hyphen), then standard input is read until an EOF indication or \q meta-command. Note however that Readline is not used in this case (much as if -n had been specified).
If running on Linux, this is what worked for me (you need to update the values below with your user, db name, etc.):
psql "host=YOUR_HOST port=YOUR_PORT dbname=YOUR_DB_NAME user=YOUR_USER_NAME password=YOUR_PASSWORD" -f "fully_qualified_path_to_your_script.sql"
You cannot put the query on a separate line; the batch interpreter will assume it is another command instead of a query for psql. I believe you will need to quote it as well.
I agree with Spidey:
1] if you are passing the file with the sql use -f or --file parameter
When you want to execute several commands, the best way to do that is to add the -f parameter and then just type the path to your file without any " or ' marks (relative paths also work):
psql -h %host% -p 5432 -U %user% -d %dbname% -f ..\..\folder\Data.txt
It also works in .NET Core. I need it to add basic data to my database after migrations.
Kindly refer to the documentation
1] if you are passing the file with the sql use -f or --file parameter
2] if you are passing individual command use -c or --command parameter
If you are trying this from a shell script:
psql postgresql://$username:$password@$host/$database < /app/sql_script/script.sql
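For a single statement, the -c option mentioned above avoids both the pipe and a separate SQL file. The sketch below is a POSIX shell version (in a batch file the same psql options apply, with %VAR% in place of $VAR). run_query and the PSQL override variable are hypothetical names introduced here so the command can be swapped out for a dry run; DB_HOST, DB_USER and DB_NAME mirror the variables in the question.

```shell
# run_query (hypothetical helper) sends one SQL statement to psql via -c.
# PSQL is an assumed override; it defaults to the psql found on PATH.
run_query() {
  "${PSQL:-psql}" -h "$DB_HOST" -p 5432 -U "$DB_USER" -d "$DB_NAME" -c "$1"
}

# Example call (needs a reachable server, so it is left commented out):
# run_query 'select * from test;'
```

Quoting "$1" keeps the whole statement as one argument, which is the part the original batch file got wrong by putting the query on its own line.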

Expect script SSH not successful

I have made two scripts:
This one fetches IP address & Hostnames:
#!/bin/bash
for i in `cat ~/script/hosts.txt`
do HOSTNAME=`echo $i|awk -F: '{print $1}'`
IP=`echo $i|awk -F: '{print $2}'`
TIMESTAMP=`date +%Y-%m-%d`
~/script/expect.sh $HOSTNAME $IP
done
This one does SSH into the devices:
#!/usr/bin/expect
set timeout 20
set HOSTNAME [lindex $argv 0]
set IP [lindex $argv 1]
exp_internal 1
spawn ssh -o StrictHostKeyChecking=no root@$IP
exit
I want to make a script to back up multiple device configurations.
The problem is that SSH is failing with the following errors:
$ ./main.sh
spawn ssh -o StrictHostKeyChecking=no root@10.102.82.235
: Name or service not knownname 10.102.82.235
spawn ssh -o StrictHostKeyChecking=no root@10.102.82.239
: Name or service not knownname 10.102.82.239
When I debug, I see the following error
spawn id exp4 sent <ssh: Could not resolve hostname 10.102.82.235\r\r: Name or
service not known\r\r\n>
: Name or service not knownname 10.102.82.235
I think the issue is due to these characters: "\r\r", "\r\r\n"
Is there any way I can filter these out?
Not an answer, but your shell script can use much improvement:
#!/bin/bash
while IFS=: read -r host ip; do
timestamp=$(date '+%Y-%m-%d %T')
~/script/expect.sh "$host" "$ip"
done < ~/script/hosts.txt
Notes:
don't use for line in `cat file` to read the lines of a file -- a for loop reads words from a file
use $(...) for command substitution, not `...` -- improved readability, and easy to nest
don't use UPPERCASE_VARIABLES -- those should be reserved for the shell's use.
your (unused) TIMESTAMP variable actually contains a date, no time.
quote your "$variables" unless you can explain why you want them unquoted.
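The \r characters in the debug output suggest that hosts.txt may have Windows (CRLF) line endings, so that each IP ends in a carriage return. That is an assumption; the post does not confirm it. If it holds, removing the carriage returns before splitting each line would be one way to filter them out. A minimal sketch follows; list_hosts is a hypothetical helper and the sample entry is made up.

```shell
# list_hosts (hypothetical helper) reads host:ip lines from the file in $1,
# deletes carriage returns, and prints "host ip" pairs.
list_hosts() {
  tr -d '\r' < "$1" | while IFS=: read -r host ip; do
    printf '%s %s\n' "$host" "$ip"
  done
}

# Demo with a made-up CRLF-terminated entry.
sample=$(mktemp)
printf 'switch1:192.0.2.10\r\n' > "$sample"
list_hosts "$sample"
rm -f "$sample"
```

In the real loop, the printf line would be replaced by the call to ~/script/expect.sh "$host" "$ip".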

Unix Df -k ouput in csv format

I'm trying to create a shell script to get the server stats of 100 servers and load the details into a table. Initially I create a parameter file that lists all the servers, then I connect to these servers through ssh and run df -k. The ssh keys are already set up.
The issue I'm facing is that I'm not able to associate the server name with the result; I want the server name added as a column to the df -k output.
Also, the output cannot be loaded into a table, as there is no delimiter (tab or space) formatted consistently enough to load. I have tried sed and various other options, but no luck.
#!/bin/ksh
PARMFILE=/opt/sdw/scripts/db_scripts/server_stats.txt
value=$(<server_list1.txt)
echo "$value"
sourceservers=`grep =/opt/sdw/scripts/db_scripts/server_stats.txt |cut -d= -f2`
#Input array passed as parameter file to the script
set -A array_value $value
vLen=${#array_value[@]}
echo $vLen
for(( j=0; j<$vLen; j++))
do
#echo "${array_value[$j]}"
#ssh -q "${array_value[$j]}"; df -k
ssh -q "${array_value[$j]}" 'df -h' >> df.out
ssh -q "${array_value[$j]}" df -h | column -t >> df1.out
ssh -q "${array_value[$j]}" df -k | tr -s " " | sed 's/ /, /g' | sed '1 s/, / /g' | column -t >> df3.out
[[ ! $? = 0 ]] && echo Failure, errno $?, cannot connect to host "${array_value[$j]}" >> sshfailed.list
done
Output
Filesystem Size Used Avail Use%
Mounted on
/dev/mapper/vg00-lvol3 1.5G 434M 923M 32%
Desired output
Filesystem, Size, Used , Avail, Use%, Mounted on, Servername
/dev/mapper/vg00-lvol3, 1.5G, 434M, 923M, 32%, / br724
Put "${array_value[$j]}" in a variable such as $server for better readability.
Then, in sed, do a substitution like sed "s/Mounted on/Mounted on $server/g"
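As an alternative to chained sed calls, awk can emit the comma-separated rows and append the server name in one pass. This is a minimal sketch, assuming df is run with -P (the POSIX output format, which keeps each filesystem on one line). df_to_csv is a hypothetical helper, the CSV header names are chosen here for illustration, and the sample df line is made up.

```shell
# df_to_csv (hypothetical helper) reads "df -kP" output on stdin and prints
# one CSV row per filesystem, with the server name ($1) as the last column.
df_to_csv() {
  awk -v server="$1" '
    NR == 1 { next }    # skip the header line printed by df
    {
      mount = $6
      # rejoin mount points that contain spaces
      for (i = 7; i <= NF; i++) mount = mount " " $i
      print $1 "," $2 "," $3 "," $4 "," $5 "," mount "," server
    }'
}

# Print the CSV header once, then one block of rows per server.
echo "Filesystem,1K-blocks,Used,Available,Use%,Mounted on,Servername"
printf 'Filesystem 1024-blocks Used Available Capacity Mounted on\n/dev/mapper/vg00-lvol3 1548144 444416 945152 32%% /\n' | df_to_csv br724
```

Inside the loop from the question, the call would look something like ssh -q "$server" df -kP | df_to_csv "$server" >> df.csv, where df.csv is a file name chosen here.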