How to get exit status or completion message from an asynchronous gcloud sql export job? - sql

I have a daily export task for pulling data from gcloud sql to a bucket in cloud storage, but the job winds up timing out and gcloud sends back an error saying this. Nonetheless, the SQL instance continues running the export, and the files make it to their destination, no problem.
In order to get around the timeout error, which is fouling up our logs, I have tried adding the --async flag, which gets around the error as it should, but there is no exit or completion message.
gcloud --project=$PROJECT sql export csv cloud-sql --database=$DB $BUCKET/$(date +%Y%m%d)_$NAME.csv --async --query="cat $SQLPATH/$table.sql" >> $LOG 2>&1
Is there some bash code or modification I can make to receive a status update or exit response so that I can accurately log that the job has been completed?

You can run this bash code in another process at the same time you issue the command. It defines a function that fetches the status of the operation (whose ID is saved in the log). The first status read is then compared with each later one until they differ, and the status is printed periodically:
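# The operation ID is the last path component of the last line in the log
OPERATION=$(tr / ' ' < "$LOG" | awk '{print $NF}' | tail -n 1)
get_status(){
    CURRENT_STATUS=$(gcloud sql operations describe "$OPERATION" | grep status: | awk '{print $NF}')
}
# Read the status once before the loop so that FIRST_STATUS is not empty
get_status
FIRST_STATUS=$CURRENT_STATUS
echo "FIRST STATUS: $FIRST_STATUS"
# This assumes the operation is still running when it is first polled
while [ "$FIRST_STATUS" = "$CURRENT_STATUS" ]
do
    sleep 5
    get_status
    echo "CURRENT STATUS: $CURRENT_STATUS"
done
echo DONE!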
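The polling loop above can also be given an upper bound, so that a stuck operation does not keep the logger running forever. The following is only a minimal sketch: the names wait_until_done, sql_operation_status, POLL_INTERVAL and MAX_POLLS are made up for illustration, and it assumes that a finished operation reports the status DONE.

```shell
#!/bin/sh
# wait_until_done CMD [ARGS...]
# Runs CMD every POLL_INTERVAL seconds (default 5) until it prints "DONE"
# or MAX_POLLS attempts (default 120) are used up.
# Returns 0 when DONE was seen, 1 when it gave up.
wait_until_done() {
    polls=0
    while [ "$polls" -lt "${MAX_POLLS:-120}" ]; do
        state=$("$@")
        echo "CURRENT STATUS: $state"
        if [ "$state" = "DONE" ]; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    echo "Gave up after $polls polls" >&2
    return 1
}

# Hypothetical status command: prints only the status field of an operation.
sql_operation_status() {
    gcloud sql operations describe "$1" --format='value(status)'
}
```

It could then be called as wait_until_done sql_operation_status "$OPERATION" >> "$LOG" 2>&1, and the return code tells the calling script whether the export was seen to finish.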

Related

Is there a way to get a nice error report summary when running many jobs on DRMAA cluster?

I need to run a snakemake pipeline on a DRMAA cluster with a total of more than 2000 jobs. When some of the jobs have failed, I would like to receive an easily readable summary report at the end, listing only the failed jobs instead of the whole job summary given in the log.
Is there a way to achieve this without parsing the log file by myself?
These are the (incomplete) cluster options:
jobs: 200
latency-wait: 5
keep-going: True
rerun-incomplete: True
restart-times: 2
I am not sure if there is another way than parsing the log file yourself, but I've done it several times with grep and I am happy with the results:
cat .snakemake/log/[TIME].snakemake.log | grep -B 3 -A 3 error
Of course you should change the TIME placeholder for whichever run you want to check.
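If the grep output is still too long, the failures can be condensed to one line per rule. This is just a sketch under an assumption about the log format, namely that each failure is reported on a line containing "Error in rule NAME:"; the function name failed_rules is hypothetical.

```shell
#!/bin/sh
# failed_rules LOGFILE
# Prints every rule name found in an "Error in rule NAME:" line, preceded by
# the number of times it occurs, most frequent first.
failed_rules() {
    sed -n 's/.*Error in rule \([^:]*\):.*/\1/p' "$1" | sort | uniq -c | sort -rn
}
```

For example, failed_rules .snakemake/log/[TIME].snakemake.log, with the same TIME placeholder as above. With restart-times set, a rule that was retried shows up with a count above one.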

Big query make a quick backup with many tables

Currently I'm copying tables with something like this:
#!/bin/sh
export SOURCE_DATASET="BQPROJECTID:BQSOURCEDATASET"
export DEST_PREFIX="TARGETBQPROJECTID:TARGETBQDATASET._YOUR_PREFIX"
for f in `bq ls -n TOTAL_NUMBER_OF_TABLES $SOURCE_DATASET |grep TABLE | awk '{print $1}'`
do
export CLONE_CMD="bq --nosync cp $SOURCE_DATASET.$f $DEST_PREFIX$f"
echo $CLONE_CMD
echo `$CLONE_CMD`
done
(script from here), but it takes ~20min (because of ~600 tables). Maybe there is another way (preferably faster), to make a backup?
As a suggestion, you could use scheduled queries to run recurring queries in BigQuery. With this option you can schedule your backups on a daily, weekly, monthly or custom basis, leaving the backups of your tables for nights or weekends. You can find more information about it in the following link.
But remember, the time your backup takes will depend on the size of your tables.
Since you mentioned that scheduled queries are not an option for you, another option is to run your cp commands in the background. In a for loop you wait for each copy to finish before starting the next one; running several processes in the background instead gives better performance. I made a simple script to test it, and it works! First I ran a test without background processes:
#!/bin/bash
start_global=$(date +'%s')
for ((i=0;i<100;i++))
do
    start=$(date +'%s')
    bq --location=US cp -a -f -n [SOURCE_PROJECT_ID]:[DATASET].[TABLE] \
        [TARGET_PROJECT_ID]:[DATASET].[TABLE]
    echo "It took $(($(date +'%s') - start)) seconds for iteration number: $i"
done
echo "It took $(($(date +'%s') - start_global)) seconds for the entire process"
It takes me around 5 seconds per table copied (approx. 160 MB), so the whole process takes roughly 10 minutes. I then modified the script to use background processes:
#!/bin/bash
start_global=$(date +'%s')
pids=()
for ((i=0;i<100;i++))
do
    bq --location=US cp -a -f -n [SOURCE_PROJECT_ID]:[DATASET].[TABLE] \
        [TARGET_PROJECT_ID]:[DATASET].[TABLE] &
    pids+=($!) # Remember every background process id, not only the last one
done
failed=0
for pid in "${pids[@]}"
do
    wait "$pid" || failed=1 # wait returns the exit status of that process
done
if [ "$failed" -eq 0 ]
then
    echo "Processes termination successful"
else
    echo "Error"
fi
echo "It took $(($(date +'%s') - start_global)) seconds for the entire process"
In this way I only spend 3 minutes to finish the execution.
You may adapt this idea to your implementation, just consider the quotas for Copy jobs, you can check it here.
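With around 600 tables, starting every copy at once may run into the copy job quotas mentioned above. One way to stay below them is to start the jobs in limited batches. The sketch below is an illustration, not a tested BigQuery setup: the function name run_in_batches and the idea of keeping one command per line in a file are assumptions made for the example.

```shell
#!/bin/sh
# run_in_batches BATCH_SIZE FILE
# Reads one command per line from FILE and runs the commands in the background
# in groups of BATCH_SIZE, waiting for a whole group before starting the next.
# Prints how many commands failed and returns non-zero if any did.
run_in_batches() {
    batch_size=$1
    failures=0
    pids=""
    count=0
    while IFS= read -r cmd; do
        sh -c "$cmd" < /dev/null &
        pids="$pids $!"
        count=$((count + 1))
        if [ "$count" -ge "$batch_size" ]; then
            for pid in $pids; do
                wait "$pid" || failures=$((failures + 1))
            done
            pids=""
            count=0
        fi
    done < "$2"
    # Wait for the last, possibly incomplete, group
    for pid in $pids; do
        wait "$pid" || failures=$((failures + 1))
    done
    echo "$failures command(s) failed"
    [ "$failures" -eq 0 ]
}
```

The input file would contain one bq cp command line per table, for example generated by the loop over bq ls from the question, and a call such as run_in_batches 20 copy-commands.txt would then run 20 copies at a time. The batch size is a guess to be adjusted against the actual quota.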

How can i view all comments posted by users in bitbucket repository

On the repository home page, I can see comments posted under recent activity at the bottom, but it only shows 10 comments.
I want to see all the comments posted since the beginning.
Is there any way?
Comments of pull requests, issues and commits can be retrieved using bitbucket’s REST API.
However it seems that there is no way to list all of them at one place, so the only way to get them would be to query the API for each PR, issue or commit of the repository.
Note that this takes a long time, since Bitbucket seems to limit the number of API accesses to repository data: I got "Rate limit for this resource has been exceeded" errors after retrieving around a thousand results, and after that I could retrieve only about one entry per second elapsed since the last rate limit error.
Finding the API URL to the repository
The first step is to find the URL to the repo. For private repositories, it is necessary to get authenticated by providing username and password (using curl’s -u switch). The URL is of the form:
https://api.bitbucket.org/2.0/repositories/{repoOwnerName}/{repoName}
Running git remote -v from the local git repository should provide the missing values. Check the forged URL (below referred to as $url) by verifying that repository information is correctly retrieved as JSON data from it: curl -u username $url.
Fetching comments of commits
Comments of a commit can be accessed at $url/commit/{commitHash}/comments.
The resulting JSON data can be processed by a script. Beware that the results are paginated.
Below I simply extract the number of comments per commit. It is indicated by the value of the member size of the retrieved JSON object; I also request a partial response by adding the GET parameter fields=size.
My script getNComments.sh:
#!/bin/sh
pw=$1
id=$2
json=$(curl -s -u username:"$pw" \
https://api.bitbucket.org/2.0/repositories/{repoOwnerName}/{repoName}/commit/$id/comments'?fields=size')
printf '%s' "$json" | grep -q '"type": "error"' \
&& printf "ERROR $id\n" && exit 0
nComments=$(printf '%s' "$json" | grep -o '"size": [0-9]*' | cut -d' ' -f2)
: ${nComments:=EMPTY}
checkNumeric=$(printf '%s' "$nComments" | tr -dc 0-9)
[ "$nComments" != "$checkNumeric" ] \
&& printf >&2 "!ERROR! $id:\n%s\n" "$json" && exit 1
printf "$nComments $id\n"
To use it, taking into account the possibility for the error mentioned above:
A) Prepare input data. From the local repository, generate the list of commits as wanted (run git fetch -a prior to update the local git repo if needed); check out git help rev-list for how it can be customised.
git rev-list --all | sort > sorted-all.id
cp sorted-all.id remaining.id
B) Run the script. Note that the password is passed here as a parameter – so first assign it to a variable safely using stty -echo; IFS= read -r passwd; stty echo, in one line; also see security considerations below. The processing is parallelised onto 15 processes here, using the option -P.
< remaining.id xargs -P 15 -L 1 ./getNComments.sh "$passwd" > commits.temp
C) When the rate limit is reached, that is when getNComments.sh prints !ERROR!, kill the above command (Ctrl-C) and execute the commands below to update the input and output files. Wait a while for the request limit to increase, then re-execute the command above and repeat until all the data is processed (that is when wc -l remaining.id returns 0).
cat commits.temp >> commits.result
cut -d' ' -f2 commits.result | sort | comm -13 - sorted-all.id > remaining.id
D) Finally, you can get the commits which received comments with:
grep '^[1-9]' commits.result
Fetching comments of pull requests and issues
The procedure is the same as for fetching commits’ comments, but for the following two adjustments:
Edit the script to replace in the URL commit by pullrequests or by issues, as appropriate;
Let $n be the number of issues/PRs to search. The git rev-list command above becomes: seq 1 $n > sorted-all.id
The total number of PRs in the repository can be obtained with:
curl -su username $url/pullrequests'?state=&fields=size'
and, if the issue tracker is set up, the number of issues with:
curl -su username $url/issues'?fields=size'
Hopefully, the repository has few enough PRs and issues so that all data can be fetched in one go.
Viewing comments
They can be viewed normally via the web interface on their commit/PR/issue page at:
https://bitbucket.org/{repoOwnerName}/{repoName}/commits/{commitHash}
https://bitbucket.org/{repoOwnerName}/{repoName}/pull-requests/{prId}
https://bitbucket.org/{repoOwnerName}/{repoName}/issues/{issueId}
For example, to open all PRs with comments in firefox:
awk '/^[1-9]/{print "https://bitbucket.org/{repoOwnerName}/{repoName}/pull-requests/"$2}' PRs.result | xargs firefox
Security considerations
Arguments passed on the command line are visible to all users of the system, via ps ax (or /proc/$PID/cmdline). Therefore the bitbucket password will be exposed, which could be a concern if the system is shared by multiple users.
There are three commands getting the password from the command line: xargs, the script, and curl.
It appears that curl tries to hide the password by overwriting its memory, but it is not guaranteed to work, and even if it does, it leaves it visible for a (very short) time after the process starts. On my system, the parameters to curl are not hidden.
A better option could be to pass the sensitive information through environment variables. They should be visible only to the current user and root via ps axe (or /proc/$PID/environ); although it seems that there are systems that let all users access this information (do a ls -l /proc/*/environ to check the environment files’ permissions).
In the script simply replace the lines pw=$1 id=$2 with id=$1, then pass pw="$passwd" before xargs in the command line invocation. It will make the environment variable pw visible to xargs and all of its descendent processes, that is the script and its children (curl, grep, cut, etc), which may or may not read the variable. curl does not read the password from the environment, but if its password hiding trick mentioned above works then it might be good enough.
There are ways to avoid passing the password to curl via the command line, notably via standard input using the option -K -. In the script, replace curl -s -u username:"$pw" with printf -- '-s\n-u "%s"\n' "$authinfo" | curl -K - and define the variable authinfo to contain the data in the format username:password. Note that this method needs printf to be a shell built-in to be safe (check with type printf), otherwise the password will show up in its process arguments. If it is not a built-in, try with print or echo instead.
A simple alternative to an environment variable that will not appear in ps output in any case is via a file. Create a file with read/write permissions restricted to the current user (chmod 600), and edit it so that it contains username:password as its first line. In the script, replace pw=$1 with IFS= read -r authinfo < "$1", and edit it to use curl’s -K option as in the paragraph above. In the command line invocation replace $passwd with the filename.
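The file-based variant could look roughly like the sketch below. The function names write_authfile and curl_config are invented for this example, and the quoting is deliberately simple: a password containing a double quote or a backslash would need escaping before it is placed in the curl config line.

```shell
#!/bin/sh
# write_authfile FILE USER PASSWORD
# Creates FILE with mode 600 and stores "USER:PASSWORD" as its first line.
# Shell functions are not separate processes, so the arguments do not
# show up in ps output.
write_authfile() {
    (
        umask 077            # files created from here on are private
        : > "$1"
        chmod 600 "$1"       # also covers a file that already existed
        printf '%s\n' "$2:$3" > "$1"
    )
}

# curl_config FILE
# Reads the first line of FILE and prints config lines for "curl -K -".
curl_config() {
    IFS= read -r authinfo < "$1"
    printf -- '-s\n-u "%s"\n' "$authinfo"
}
```

Inside the script, the curl call would then become something like curl_config "$1" | curl -K - followed by the URL, so the password travels through a pipe rather than through the argument list.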
The file approach has the drawback that the password will be written to disk (note that files in /proc are not on the disk). If this too is undesirable, it is possible to pass a named pipe instead of a regular file:
mkfifo pipe
chmod 600 pipe
# make sure printf is a builtin, or use an equivalent instead
(while :; do printf -- '%s\n' "username:$passwd"; done) > pipe&
pid=$!
exec 3<pipe
Then invoke the script passing pipe instead of the file. Finally, to clean up do:
kill $pid
exec 3<&-
This will ensure the authentication info is passed directly from the shell to the script (through the kernel), is not written to disk and is not exposed to other users via ps.
You can go to Commits and see the top line for each commit, you will need to click on each one to see further information.
If I find a way to see all without drilling into each commit, I will update this answer.

Expect script does not work under crontab

I have an expect script which I need to run every 3 mins on my management node to collect tx/rx values for each port attached to DCX Brocade SAN Switch using the command portperfshow.
Each time I try to use crontab to execute the script every 3 mins, the script does not work!
My expect script starts with #!/usr/bin/expect -f and I am calling the script using the following syntax under cron:
3 * * * * /usr/bin/expect -f /root/portsperfDCX1/collect-all.exp sanswitchhostname
However, when I execute the script (not under cron) it works as expected:
root# ./collect-all.exp sanswitchhostname
works just fine.
Please Please can someone help! Thanks.
The script collect-all.exp is:
#!/usr/bin/expect -f
#Time and Date
set day [timestamp -format %d%m%y]
set time [timestamp -format %H%M]
#logging
set LogDir1 "/FPerf/PortsLogs"
set timeout 5
set ipaddr [lrange $argv 0 0]
set passw "XXXXXXX"
if { $ipaddr == "" } {
puts "Usage: <script.exp> <ip address>\n"
exit 1
}
spawn ssh admin@$ipaddr
expect -re "password"
send "$passw\r"
expect -re "admin"
log_file "$LogDir1/$day-portsperfshow-$time"
send "portperfshow -tx -rx -t 10\r"
expect timeout "\n"
send \003
log_file
send -- "exit\r"
close
I had the same issue, except that my script was ending with
interact
Finally I got it working by replacing it with these two lines:
expect eof
exit
Changing interact to expect eof worked for me!
Needed to remove the exit part, because I had more statements in the bash script after the expect line (calling expect inside a bash script).
There are two key differences between a program that is run normally from a shell and a program that is run from cron:
Cron populates very few environment variables. TERM is notably absent, and most cron implementations provide only minimal values for a handful such as SHELL, HOME, LOGNAME and PATH; the rest of the long list you see in an interactive session will not be defined.
Cron does not set up a current terminal, so /dev/tty doesn't resolve to anything. (Note, programs spawned by Expect will have a current terminal.)
With high probability, any difficulties will come from these, especially the first. To fix, you need to save all your environment variables in an interactive session and use these in your expect script to repopulate the environment. The easiest way is to use this little expect script:
unset -nocomplain ::env(SSH_AUTH_SOCK) ;# This one is session-bound anyway
puts [list array set ::env [array get ::env]]
That will write out a single very long line which you want to put near the top of your script (or at least before the first spawn). Then see if that works.
Jobs run by cron are not considered login shells, and thus don't source your .bashrc, .bash_profile, etc.
If you want that behavior, you need to add it explicitly to the crontab entry like so:
$ crontab -l
0 13 * * * bash -c '. .bash_profile; etc ...'
$
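Before editing the crontab, it can help to reproduce the failure interactively by running the script with a stripped-down environment. The sketch below is only an approximation: the function name run_like_cron is made up, the four variables it keeps are an assumption modelled on common cron defaults, and the command still has a controlling terminal, which a real cron job does not.

```shell
#!/bin/sh
# run_like_cron CMD [ARGS...]
# Runs CMD with a nearly empty environment and with stdin detached from the
# terminal, roughly the way cron would start it.
run_like_cron() {
    env -i HOME="${HOME:-/}" LOGNAME="${LOGNAME:-$(id -un)}" \
        SHELL=/bin/sh PATH=/usr/bin:/bin "$@" < /dev/null
}
```

For example, run_like_cron /usr/bin/expect -f /root/portsperfDCX1/collect-all.exp sanswitchhostname runs the script from the question without TERM and without the rest of the login environment. If it fails here in the same way as under cron, the environment is the likely cause.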

cron script to act as a queue OR a queue for cron?

I'm betting that someone has already solved this and maybe I'm using the wrong search terms for google to tell me the answer, but here is my situation.
I have a script that I want to run, but I want it to run only when scheduled and only one at a time. (can't run the script simultaneously)
Now the sticky part: say I have a table called "myhappyschedule" which has the data I need and the scheduled time. This table can have multiple scheduled times, even at the same time, and each one would run this script. So essentially I need a queue of each time the script fires, and they all need to wait for the one before to finish. (Sometimes the script takes just a minute to execute, sometimes it's many, many minutes.)
What I'm thinking about doing is making a script that checks myhappyschedule every 5 min and gathers up those that are scheduled, puts them into a queue where another script can execute each 'job' or occurrence in the queue in order. Which all of this sounds messy.
To make this longer - I should say that I'm allowing users to schedule things in myhappyschedule and not edit crontab.
What can be done about this? File locks and scripts calling scripts?
add a column exec_status to myhappytable (maybe also time_started and time_finished, see pseudocode)
run the following cron script every x minutes
pseudocode of cron script:
[create/check pid lock (optional, but see "A potential pitfall" below)]
get number of rows from myhappytable where (exec_status == executing_now)
if it is > 0, exit
begin loop
    get one row from myhappytable
        where (exec_status == not_yet_run) and (scheduled_time <= now)
        order by scheduled_time asc
    if no such row, exit
    set row exec_status to executing_now (maybe set time_started to now)
    execute whatever command the row contains
    set row exec_status to completed
        (maybe also store the command output/return as well, set time_finished to now)
end loop
[delete pid lock file (complementary to the starting pid lock check)]
This way, the script first checks that none of the commands is currently running, then runs the oldest command that has not yet been run, and repeats until there are no more commands due at the given moment. Also, you can see which command is executing by querying the database.
A potential pitfall: if the cron script is killed, a scheduled task will remain in "executing_now" state. That's what the pid lock at beginning and end is for: to see if the cron script terminated properly. pseudocode of create/check pidlock:
if exists pidlockfile then
    check if process id given in file exists
    if not exists then
        update myhappytable set exec_status = error_cronscript_died_while_executing_this
            where exec_status == executing_now
        delete pidlockfile
    else (previous instance still running)
        exit
    endif
endif
create pidlockfile containing cron script process id
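In shell, the lock check from the pseudocode might be sketched as follows. The function names acquire_pidlock and release_pidlock are hypothetical, and the database update for the stale case is left to the caller. Note that the check and the creation of the file are not atomic, and that kill -0 also fails for a live process owned by another user, so this is only adequate when a single user's cron starts the script at intervals of minutes.

```shell
#!/bin/sh
# acquire_pidlock LOCKFILE
# Returns 0 after writing our PID to LOCKFILE. Returns 1 if LOCKFILE names a
# process that is still alive. A lock left behind by a dead process is removed
# and reported on stderr, so the caller can mark the interrupted row as failed.
acquire_pidlock() {
    lockfile=$1
    if [ -f "$lockfile" ]; then
        old_pid=$(cat "$lockfile")
        if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
            return 1    # previous instance still running
        fi
        echo "stale lock from PID $old_pid" >&2
        rm -f "$lockfile"
    fi
    echo "$$" > "$lockfile"
}

# release_pidlock LOCKFILE
# Removes the lock at the end of a normal run.
release_pidlock() {
    rm -f "$1"
}
```

The cron script would call acquire_pidlock /path/to/lockfile || exit 0 at the top and release_pidlock with the same path after the loop; the path itself is up to you.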
You can use the at(1) command inside your script to schedule its next run. Before it exits, it can check myhappyschedule for the next run time. You don't need cron at all, really.
I came across this question while researching for a solution to the queuing problem. For the benefit of anyone else searching here is my solution.
Combine this with a cron that starts jobs as they are scheduled (even if they are scheduled to run at the same time) and that solves the problem you described as well.
Problem
At most one instance of the script should be running.
We want to queue up requests to process them as fast as possible.
ie. We need a pipeline to the script.
Solution:
Create a pipeline to any script. Done using a small bash script (further down).
The script can be called as
./pipeline "<any command and arguments go here>"
Example:
./pipeline sleep 10 &
./pipeline shabugabu &
./pipeline single_instance_script some arguments &
./pipeline single_instance_script some other_arguments &
./pipeline "single_instance_script some yet_other_arguments > output.txt" &
..etc
The script creates a new named pipe for each command name. So the above will create the named pipes sleep.pipe, shabugabu.pipe, and single_instance_script.pipe.
In this case the initial call will start a reader and run single_instance_script with some arguments as arguments. Once the call completes, the reader will grab the next request off the pipe and execute with some other_arguments, complete, grab the next etc...
This script will block requesting processes so call it as a background job (& at the end) or as a detached process with at (at now <<< "./pipeline some_script")
#!/bin/bash -Eue
# Using command name as the pipeline name
pipeline=$(basename $(expr "$1" : '\(^[^[:space:]]*\)')).pipe
is_reader=false
function _pipeline_cleanup {
    if $is_reader; then
        rm -f $pipeline
    fi
    rm -f $pipeline.lock
    exit
}
trap _pipeline_cleanup INT TERM EXIT
# Dispatch/initialization section, critical
lockfile $pipeline.lock
if [[ -p $pipeline ]]
then
    echo "$*" > $pipeline
    exit
fi
is_reader=true
mkfifo $pipeline
echo "$*" > $pipeline &
rm -f $pipeline.lock
# Reader section
while read command < $pipeline
do
    echo "$(date) - Executing $command"
    ($command) &> /dev/null
done