ASE ISQL output to file, occassionally is empty or blank - sql

Give this unix script, which is scheduled batch run:
isql -U$USR -S$SRVR -P$PWD -w2000 < $SCRIPTS/sample_report.sql > $TEMP_DIR/sample_report.tmp_1
sed 's/-\{3,\}//g' $TEMP_DIR/sample_report.tmp_1 > $TEMP_DIR/sample_report.htm_1
uuencode $TEMP_DIR/sample_report.htm_1 sample_report.xls > $TEMP_DIR/sample_report.mail_1
mailx -s "Daily Sample Report" email#example.com < $TEMP_DIR/sample_report.mail_1
There are occasionally cases where the sample_report.xls attached in the mail, is empty, zero lines.
I have ruled out the following:
not command processing timeout - by adding the -t30 to isql, I get the xls and it contains the error, not empty
not sql error - by forcing an error in the sql, I get the xls and it contains the error, not empty
not sure of login timeout - by adding -l1, it does not timeout, but I can't specify a number lower than 1 second, so I can't say
I cannot reproduce this, as I do not know the cause. Has anyone else experienced this or have way to address this? Any suggestions how to find the cause? Is it the unix or the Sybase isql?

I found the cause. Since this is scheduled, and this particular report takes a long time to generate. Other scheduled scripts, I found have this line of code:
rm -f $TEMP_DIR/*
If the this long running report, overlaps with one of the scheduled scripts with the line above, the .tmp_1 can possibly be deleted, hence blank by the time it is mailed. I replicated this by manually deleting the .tmp_1 while the report was still writing the sql in there.

Related

Any specific problems running (linux) BCP on "too many" threads?

Are there any specific problems with running Microsoft's BCP utility (on CentOS 7, https://learn.microsoft.com/en-us/sql/linux/sql-server-linux-migrate-bcp?view=sql-server-2017) on multiple threads? Googling could not find much, but am looking at a problem that seems to be related to just that.
Copying a set of large TSV files from HDFS to a remote MSSQL Server with some code of the form
bcpexport() {
filename=$1
TO_SERVER_ODBCDSN=$2
DB=$3
TABLE=$4
USER=$5
PASSWORD=$6
RECOMMEDED_IMPORT_MODE=$7
DELIMITER=$8
echo -e "\nRemoving header from TSV file $filename"
echo -e "Current head:\n"
echo $(head -n 1 $filename)
echo "$(tail -n +2 $filename)" > $filename
echo "First line of file is now..."
echo $(head -n 1 $filename)
# temp. workaround safeguard for NFS latency
#sleep 5 #FIXME: appears to sometimes cause script to hang, workaround implemented below, throws error if timeout reached
timeout 30 sleep 5
echo -e "\nReplacing null literal values with empty chars"
NULL_WITH_TAB="null\t" # WARN: assumes the first field is prime-key so never null
TAB="\t"
sed -i -e "s/$NULL_WITH_TAB/$TAB/g" $filename
echo -e "Lines containing null (expect zero): $(grep -c "\tnull\t" $filename)"
# temp. workaround safeguard for NFS latency
#sleep 5 #FIXME: appears to sometimes cause script to hang, workaround implemented below
timeout 30 sleep 5
/opt/mssql-tools/bin/bcp "$TABLE" in "$filename" \
$TO_SERVER_ODBCDSN \
-U $USER -P $PASSWORD \
-d $DB \
$RECOMMEDED_IMPORT_MODE \
-t "\t" \
-e ${filename}.bcperror.log
}
export -f bcpexport
parallel -q -j 7 bcpexport {} "$TO_SERVER_ODBCDSN" $DB $TABLE $USER $PASSWORD $RECOMMEDED_IMPORT_MODE $DELIMITER \
::: $DATAFILES/$TARGET_GLOB
where $DATAFILES/$TARGET_GLOB constructs a glob that lists a set of files in a directory.
When running this code for a set of TSV files, finding that sometimes some (but not all) of the parallel BCP threads fail, ie. some files successfully copy to MSSQL Server
Starting copy...
5397376 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 154902 Average : (34843.8 rows per sec.)
while others output error message
Starting copy...
BCP copy in failed
Usually, see this pattern: a few successful BCP copy-in operations in the first few threads returned, then a bunch of failing threads return their output until run out of files (GNU Parallel returns output only when whole thread done to appear same as if sequential).
Notice in the code there is the -e option to produce an error file for each BCP copy-in operation (see https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017#e). When examining the files after observing these failing behaviors, all are blank, no error messages.
Only have seen this with the number of threads >= 10 (and only for certain sets of data (assuming has something to do with total number of files are files sizes, and yet...)), no errors seen so far when using ~7 threads, which further makes me suspect this has something to do with multi-threading.
Monitoring system resources (via free -mh) shows that generally ~13GB or RAM is always available.
May be helpful to note that the data bcp is trying to copy-in may be ~500000-1000000 records long with an upper limit of ~100 columns per record.
Does anyone have any idea what could be going on here? Note, am pretty new to using BCP as well as GNU Parallel and multi-threading.
No, no issues specific to the BCP program being run in multiple threads. You seem to be on the track of what I would say your issue is, system resources. Have you monitored system resources while increasing the number of threads? If anything, there is likely an issue with BCP executing properly when memory/cpu/network resources are low. Regarding the "-e" option, it is meant to output data errors. login errors, bad table names... many other errros are not reported in the file created with the -e option. When you get output using the "-e" option, you'll see info like "value truncated" and such... will give you line numbers and sample data that was at issue.
TLDR: Adding more threads to run concurrently to have bcp copy-in files of data seems to have the affect of overwhelming the endpoint MSSQL Server with write instructions, causing the bcp threads to fail (maybe timeing out?). When the number of threads becomes too many seems to depend on the size of the files getting copy-in'ed by bcp (ie. both the number of records in the file as well as the width of each record (ie. number of columns)).
Long version (more reasons for my theory):
1.
When running a larger number of bcp threads and looking at the processes started on the machine (https://clustershell.readthedocs.io/en/latest/tools/clush.html)
ps -aux | grep bcp
seeing a bunch of sleeping processes (notice the S, see https://askubuntu.com/a/360253/760862) as shown below (added newlines for readability)
me 135296 14.5 0.0 77596 6940 ? S 00:32 0:01
/opt/mssql-tools/bin/bcp TABLENAME in /path/to/tsv/1_16_0.tsv -D -S MyMSSQLServer -U myusername -P -d myDB -c -t \t -e /path/to/logfile
These threads appear to sleep for very long time. Further debugging into why these threads are sleeping suggests that they may in fact be doing their intended job (which would further imply that the problem may be coming from BCP itself (see https://stackoverflow.com/a/52748660/8236733)). From https://unix.stackexchange.com/a/47259/260742 and https://unix.stackexchange.com/a/36200/260742)
A process in S state is usually in a blocking system call, such as reading or writing to a file or the network, or waiting for another called program to finish.
(eg. writing to the MSSQL Server endpoint destination given to bcp in the ODBCDSN)
Your process will be in S state when it is doing reads and possibly writes that are blocking. Can also happen while waiting on semaphores or other synchronization primitives... This is all normal and expected, and not usually a problem... you don't want it to waste CPU while it's waiting for user input.
2. When running different sets of files of varying record-amount-per-file (eg. ranges of 500000 - 1000000 rows/file) and record-width-per-file (~10 - 100 columns/row), found that in cases with either very large data width or amounts, running a fixed set of bcp threads would fail.
Eg. for a set of ~33 TSVs with ~500000 rows each, each row being ~100 columns wide, a set of 30 threads would write the first few OK, but then all the rest would start returning failure messages. Incorporating a bit from #jamie's answer, the fact the the failure messages returned are "BCP copy in failed" errors does not necessarily mean it has do do with the content of the data in question. Having no actual content being written into the -e errorlog files from my process, #jamie's post says this
Regarding the "-e" option, it is meant to output data errors. login errors, bad table names... many other errros are not reported in the file created with the -e option. When you get output using the "-e" option, you'll see info like "value truncated" and such... will give you line numbers and sample data that was at issue.
Meanwhile, a set of ~33 TSVs with ~500000 rows each, each row being ~100 wide, and still using 30 bcp threads would complete quickly and without error (also would be faster when reducing the number of threads or file set). The only difference here being the overall size of the data being bcp copy-in'ed to the MSSQL Server.
All this while
free -mh
still showed that the machine running the threads still had ~15GB of free RAM remaining in each case (which is again why I suspect that the problem has to do with the remote MSSQL Server endpoint rather than with the code or local machine itself).
3. When running some of the tests from (2), found that manually killing the parallel process (via CTL+C) and then trying to remotely truncate the testing table being written to with /opt/mssql-tools/bin/sqlcmd -Q "truncate table mytable" on the local machine would take a very long time (as opposed to manually logging into the MSSQL Server and executing a truncate mytable in the DB). Again this makes me think that this has something to do with the MSSQL Server having too many connections and just being overwhelmed.
** Anyone with any MSSQL Mgmt Studio experience reading this (I have basically none), if you see anything here that makes you think that my theory is incorrect please let me know your thoughts.

Monit for "cron-like" tasks

Have some batch-type jobs that I would like to move from cron to Monit but am struggling to get them to work properly. These scripts typically run once a day, but on occasion have to be re-ran later in the day. Goal is to take advantage of the monit & m/monit front-ends to re-run as well as be alerted on failure in similar fashion to other things under monit.
The below was my first attempt. I know the docs say to use range/wildcard for minute field but I have my monit daemon set to cycle every 20 seconds so thought I'd be able to get away with this.
check program test.sh
with path "/usr/local/bin/test.sh"
every "0 7 * * *"
if status != 0 then alert
This does not seem to work as it seems like it picks up the exit status of the program on the NEXT run. So I have a zombie process sitting around until 7am the next day, at which time I'll see the status from the previous day's run.
Would be nice if this ran immediate or if there was a way to schedule something as "batch" that would only run once when started (either from command line or web gui). Example below.
check program test.sh
with path "/usr/local/bin/test.sh"
mode batch
if status != 0 then alert
Is it possible to do what I want? Can a 'check program' be scheduled that will only run one time when started or using the 'every [cron]' type syntax supported by monit?
TIA for any suggestions.
The latest version of monit (5.18) now picks up the exit status on the next daemon cycle, not on the next execution of the program like in the past (which might not be until the next day).

while [[ condition ]] stalls on loop exit

I have a problem with ksh in that a while loop is failing to obey the "while" condition. I should add now that this is ksh88 on my client's Solaris box. (That's a separate problem that can't be addressed in this forum. ;) I have seen Lance's question and some similar but none that I have found seem to address this. (Disclaimer: NO I haven't looked at every ksh question in this forum)
Here's a very cut down piece of code that replicates the problem:
1 #!/usr/bin/ksh
2 #
3 go=1
4 set -x
5 tail -0f loop-test.txt | while [[ $go -eq 1 ]]
6 do
7 read lbuff
8 set $lbuff
9 nwords=$#
10 printf "Line has %d words <%s>\n" $nwords "${lbuff}"
11 if [[ "${lbuff}" = "0" ]]
12 then
13 printf "Line consists of %s; time to absquatulate\n" $lbuff
14 go=0 # Violate the WHILE condition to get out of loop
15 fi
16 done
17 printf "\nLooks like I've fallen out of the loop\n"
18 exit 0
The way I test this is:
Run loop-test.sh in background mode
In a different window I run commands like "echo some nonsense >>loop_test.txt" (w/o the quotes, of course)
When I wish to exit, I type "echo 0 >>loop-test.txt"
What happens? It indeed sets go=0 and displays the line:
Line consists of 0; time to absquatulate
but does not exit the loop. To break out I append one more line to the txt file. The loop does NOT process that line and just falls out of the loop, issuing that "fallen out" message before exiting.
What's going on with this? I don't want to use "break" because in the actual script, the loop is monitoring the log of a database engine and the flag is set when it sees messages that the engine is shutting down. The actual script must still process those final lines before exiting.
Open to ideas, anyone?
Thanks much!
-- J.
OK, that flopped pretty quick. After reading a few other posts, I found an answer given by dogbane that sidesteps my entire pipe-to-while scheme. His is the second answer to a question (from 2013) where I see neeraj is using the same scheme I'm using.
What was wrong? The pipe-to-while has always worked for input that will end, like a file or a command with a distinct end to its output. However, from a tail command, there is no distinct EOF. Hence, the while-in-a-subshell doesn't know when to terminate.
Dogbane's solution: Don't use a pipe. Applying his logic to my situation, the basic loop is:
while read line
do
# put loop body here
done < <(tail -0f ${logfile})
No subshell, no problem.
Caveat about that syntax: There must be a space between the two < operators; otherwise it looks like a HEREIS document with bad syntax.
Er, one more catch: The syntax did not work in ksh, not even in the mksh (under cygwin) which emulates ksh93. But it did work in bash. So my boss is gonna have a good laugh at me, 'cause he knows I dislike bash.
So thanks MUCH, dogbane.
-- J
After articulating the problem and sleeping on it, the reason for the described behavior came to me: After setting go=0, the control flow of the loop still depends on another line of data coming in from STDIN via that pipe.
And now that I have realized the cause of the weirdness, I can speculate on an alternative way of reading from the stream. For the moment I am thinking of the following solution:
Open the input file as STDIN (Need to research the exec syntax for that)
When the condition occurs, close STDIN (Again, need to research the syntax for that)
It should then be safe to use the more intuitive:while read lbuffat the top of the loop.
I'll test this out today and post the result. I'd hope someone else benefit from the method (if it works).

Bad spawn_id while executing expect command

I am writing a script that will copy Valgrind onto whatever shelf that we enter on the command line. The syntax is as follows:
vgrindCopy [shelf number]
For some reason, the files will copy over without any issue, but after the copy completes the follow error is observed:
bad spawn_id (process died earlier?)
while executing
"expect "#""
Here is a copy of the relevant code:
function login_shelf {
expect -c "
set timeout 15
spawn $1
expect \"password:\"
send \"$PW\r\"
expect \"#\"
sleep 1
exit
"
}
# login and make the valgrind directory at /sfs/software/shelf/current
set -- /opt/swe/tools/ext/gnu/valgrind-3.7.0/i686-linux2.6/lib/valgrind/*
login_shelf "/opt/corp/projects/shelftools/bin/app rsync -Lau $* $shelf:/shelf/valgrind"
After playing around with the code, I found that if I remove the line "expect \"#\"", then the program doesn't copy any of the files over anymore. What odd as well is that I'm seeing the issue when I run the script, but a co-worker is not.
Has anyone had a similar issue and determined the cause? Any help would be greatly appreciated as always!
Your code is spawning the rsync and at the expect \"#\" is waiting for rsync to output a #, which it never does, so it exits and expect reports the error.
When you remove the expect \"#\" the expect script exits, terminating the rsync.
Instead of expect \"#\" you should wait for rsync to exit:
expect eof
wait

cron script to act as a queue OR a queue for cron?

I'm betting that someone has already solved this and maybe I'm using the wrong search terms for google to tell me the answer, but here is my situation.
I have a script that I want to run, but I want it to run only when scheduled and only one at a time. (can't run the script simultaneously)
Now the sticky part is that say I have a table called "myhappyschedule" which has the data I need and the scheduled time. This table can have multiple scheduled times even at the same time, each one would run this script. So essentially I need a queue of each time the script fires and they all need to wait for each one before it to finish. (sometimes this can take just a minute for the script to execute sometimes its many many minutes)
What I'm thinking about doing is making a script that checks myhappyschedule every 5 min and gathers up those that are scheduled, puts them into a queue where another script can execute each 'job' or occurrence in the queue in order. Which all of this sounds messy.
To make this longer - I should say that I'm allowing users to schedule things in myhappyschedule and not edit crontab.
What can be done about this? File locks and scripts calling scripts?
add a column exec_status to myhappytable (maybe also time_started and time_finished, see pseudocode)
run the following cron script every x minutes
pseudocode of cron script:
[create/check pid lock (optional, but see "A potential pitfall" below)]
get number of rows from myhappytable where (exec_status == executing_now)
if it is > 0, exit
begin loop
get one row from myhappytable
where (exec_status == not_yet_run) and (scheduled_time <= now)
order by scheduled_time asc
if no such row, exit
set row exec_status to executing_now (maybe set time_started to now)
execute whatever command the row contains
set row exec_status to completed
(maybe also store the command output/return as well, set time_finished to now)
end loop
[delete pid lock file (complementary to the starting pid lock check)]
This way, the script first checks if none of the commands is running, then runs first not-yet run command, until there are no more commands to be run at the given moment. Also, you can see what command is executing by querying the database.
A potential pitfall: if the cron script is killed, a scheduled task will remain in "executing_now" state. That's what the pid lock at beginning and end is for: to see if the cron script terminated properly. pseudocode of create/check pidlock:
if exists pidlockfile then
check if process id given in file exists
if not exists then
update myhappytable set exec_status = error_cronscript_died_while_executing_this
where exec_status == executing_now
delete pidlockfile
else (previous instance still running)
exit
endif
endif
create pidlockfile containing cron script process id
You can use the at(1) command inside your script to schedule its next run. Before it exits, it can check myhappyschedule for the next run time. You don't need cron at all, really.
I came across this question while researching for a solution to the queuing problem. For the benefit of anyone else searching here is my solution.
Combine this with a cron that starts jobs as they are scheduled (even if they are scheduled to run at the same time) and that solves the problem you described as well.
Problem
At most one instance of the script should be running.
We want to cue up requests to process them as fast as possible.
ie. We need a pipeline to the script.
Solution:
Create a pipeline to any script. Done using a small bash script (further down).
The script can be called as
./pipeline "<any command and arguments go here>"
Example:
./pipeline sleep 10 &
./pipeline shabugabu &
./pipeline single_instance_script some arguments &
./pipeline single_instance_script some other_argumnts &
./pipeline "single_instance_script some yet_other_arguments > output.txt" &
..etc
The script creates a new named pipe for each command. So the above will create named pipes: sleep, shabugabu, and single_instance_script
In this case the initial call will start a reader and run single_instance_script with some arguments as arguments. Once the call completes, the reader will grab the next request off the pipe and execute with some other_arguments, complete, grab the next etc...
This script will block requesting processes so call it as a background job (& at the end) or as a detached process with at (at now <<< "./pipeline some_script")
#!/bin/bash -Eue
# Using command name as the pipeline name
pipeline=$(basename $(expr "$1" : '\(^[^[:space:]]*\)')).pipe
is_reader=false
function _pipeline_cleanup {
if $is_reader; then
rm -f $pipeline
fi
rm -f $pipeline.lock
exit
}
trap _pipeline_cleanup INT TERM EXIT
# Dispatch/initialization section, critical
lockfile $pipeline.lock
if [[ -p $pipeline ]]
then
echo "$*" > $pipeline
exit
fi
is_reader=true
mkfifo $pipeline
echo "$*" > $pipeline &
rm -f $pipeline.lock
# Reader section
while read command < $pipeline
do
echo "$(date) - Executing $command"
($command) &> /dev/null
done