Here's the output of psql -q -t -c "SELECT 1;"
1
Time: 0.379 ms
It should be possible, but I can't find a way to prevent the execution time from being printed at the bottom - is there a way to prevent it? (Note: I'm already specifying the quiet mode and tuples only flags, to no avail)
Adding -A ignores the .psqlrc file and does the trick
Related
Are there any specific problems with running Microsoft's BCP utility (on CentOS 7, https://learn.microsoft.com/en-us/sql/linux/sql-server-linux-migrate-bcp?view=sql-server-2017) on multiple threads? Googling could not find much, but am looking at a problem that seems to be related to just that.
Copying a set of large TSV files from HDFS to a remote MSSQL Server with some code of the form
bcpexport() {
filename=$1
TO_SERVER_ODBCDSN=$2
DB=$3
TABLE=$4
USER=$5
PASSWORD=$6
RECOMMEDED_IMPORT_MODE=$7
DELIMITER=$8
echo -e "\nRemoving header from TSV file $filename"
echo -e "Current head:\n"
echo $(head -n 1 $filename)
echo "$(tail -n +2 $filename)" > $filename
echo "First line of file is now..."
echo $(head -n 1 $filename)
# temp. workaround safeguard for NFS latency
#sleep 5 #FIXME: appears to sometimes cause script to hang, workaround implemented below, throws error if timeout reached
timeout 30 sleep 5
echo -e "\nReplacing null literal values with empty chars"
NULL_WITH_TAB="null\t" # WARN: assumes the first field is prime-key so never null
TAB="\t"
sed -i -e "s/$NULL_WITH_TAB/$TAB/g" $filename
echo -e "Lines containing null (expect zero): $(grep -c "\tnull\t" $filename)"
# temp. workaround safeguard for NFS latency
#sleep 5 #FIXME: appears to sometimes cause script to hang, workaround implemented below
timeout 30 sleep 5
/opt/mssql-tools/bin/bcp "$TABLE" in "$filename" \
$TO_SERVER_ODBCDSN \
-U $USER -P $PASSWORD \
-d $DB \
$RECOMMEDED_IMPORT_MODE \
-t "\t" \
-e ${filename}.bcperror.log
}
export -f bcpexport
parallel -q -j 7 bcpexport {} "$TO_SERVER_ODBCDSN" $DB $TABLE $USER $PASSWORD $RECOMMEDED_IMPORT_MODE $DELIMITER \
::: $DATAFILES/$TARGET_GLOB
where $DATAFILES/$TARGET_GLOB constructs a glob that lists a set of files in a directory.
When running this code for a set of TSV files, finding that sometimes some (but not all) of the parallel BCP threads fail, ie. some files successfully copy to MSSQL Server
Starting copy...
5397376 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 154902 Average : (34843.8 rows per sec.)
while others output error message
Starting copy...
BCP copy in failed
Usually, see this pattern: a few successful BCP copy-in operations in the first few threads returned, then a bunch of failing threads return their output until run out of files (GNU Parallel returns output only when whole thread done to appear same as if sequential).
Notice in the code there is the -e option to produce an error file for each BCP copy-in operation (see https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017#e). When examining the files after observing these failing behaviors, all are blank, no error messages.
Only have seen this with the number of threads >= 10 (and only for certain sets of data (assuming has something to do with total number of files are files sizes, and yet...)), no errors seen so far when using ~7 threads, which further makes me suspect this has something to do with multi-threading.
Monitoring system resources (via free -mh) shows that generally ~13GB or RAM is always available.
May be helpful to note that the data bcp is trying to copy-in may be ~500000-1000000 records long with an upper limit of ~100 columns per record.
Does anyone have any idea what could be going on here? Note, am pretty new to using BCP as well as GNU Parallel and multi-threading.
No, no issues specific to the BCP program being run in multiple threads. You seem to be on the track of what I would say your issue is, system resources. Have you monitored system resources while increasing the number of threads? If anything, there is likely an issue with BCP executing properly when memory/cpu/network resources are low. Regarding the "-e" option, it is meant to output data errors. login errors, bad table names... many other errros are not reported in the file created with the -e option. When you get output using the "-e" option, you'll see info like "value truncated" and such... will give you line numbers and sample data that was at issue.
TLDR: Adding more threads to run concurrently to have bcp copy-in files of data seems to have the affect of overwhelming the endpoint MSSQL Server with write instructions, causing the bcp threads to fail (maybe timeing out?). When the number of threads becomes too many seems to depend on the size of the files getting copy-in'ed by bcp (ie. both the number of records in the file as well as the width of each record (ie. number of columns)).
Long version (more reasons for my theory):
1.
When running a larger number of bcp threads and looking at the processes started on the machine (https://clustershell.readthedocs.io/en/latest/tools/clush.html)
ps -aux | grep bcp
seeing a bunch of sleeping processes (notice the S, see https://askubuntu.com/a/360253/760862) as shown below (added newlines for readability)
me 135296 14.5 0.0 77596 6940 ? S 00:32 0:01
/opt/mssql-tools/bin/bcp TABLENAME in /path/to/tsv/1_16_0.tsv -D -S MyMSSQLServer -U myusername -P -d myDB -c -t \t -e /path/to/logfile
These threads appear to sleep for very long time. Further debugging into why these threads are sleeping suggests that they may in fact be doing their intended job (which would further imply that the problem may be coming from BCP itself (see https://stackoverflow.com/a/52748660/8236733)). From https://unix.stackexchange.com/a/47259/260742 and https://unix.stackexchange.com/a/36200/260742)
A process in S state is usually in a blocking system call, such as reading or writing to a file or the network, or waiting for another called program to finish.
(eg. writing to the MSSQL Server endpoint destination given to bcp in the ODBCDSN)
Your process will be in S state when it is doing reads and possibly writes that are blocking. Can also happen while waiting on semaphores or other synchronization primitives... This is all normal and expected, and not usually a problem... you don't want it to waste CPU while it's waiting for user input.
2. When running different sets of files of varying record-amount-per-file (eg. ranges of 500000 - 1000000 rows/file) and record-width-per-file (~10 - 100 columns/row), found that in cases with either very large data width or amounts, running a fixed set of bcp threads would fail.
Eg. for a set of ~33 TSVs with ~500000 rows each, each row being ~100 columns wide, a set of 30 threads would write the first few OK, but then all the rest would start returning failure messages. Incorporating a bit from #jamie's answer, the fact the the failure messages returned are "BCP copy in failed" errors does not necessarily mean it has do do with the content of the data in question. Having no actual content being written into the -e errorlog files from my process, #jamie's post says this
Regarding the "-e" option, it is meant to output data errors. login errors, bad table names... many other errros are not reported in the file created with the -e option. When you get output using the "-e" option, you'll see info like "value truncated" and such... will give you line numbers and sample data that was at issue.
Meanwhile, a set of ~33 TSVs with ~500000 rows each, each row being ~100 wide, and still using 30 bcp threads would complete quickly and without error (also would be faster when reducing the number of threads or file set). The only difference here being the overall size of the data being bcp copy-in'ed to the MSSQL Server.
All this while
free -mh
still showed that the machine running the threads still had ~15GB of free RAM remaining in each case (which is again why I suspect that the problem has to do with the remote MSSQL Server endpoint rather than with the code or local machine itself).
3. When running some of the tests from (2), found that manually killing the parallel process (via CTL+C) and then trying to remotely truncate the testing table being written to with /opt/mssql-tools/bin/sqlcmd -Q "truncate table mytable" on the local machine would take a very long time (as opposed to manually logging into the MSSQL Server and executing a truncate mytable in the DB). Again this makes me think that this has something to do with the MSSQL Server having too many connections and just being overwhelmed.
** Anyone with any MSSQL Mgmt Studio experience reading this (I have basically none), if you see anything here that makes you think that my theory is incorrect please let me know your thoughts.
I want to generate some trace file from disk IO, but the problem is I need the actual input data along with timestamp, logical address and access block size, etc.
I've been trying to solve the problem by using the "blktrace | blkparse" with "iozone" on the ubuntu VirtualBox environment, but it seems not working.
There is an option in blkparse for setting the output format to show the packet data, -f "%P", but it dose not print anything.
below is the command that i use:
$> sudo blktrace -a issue -d /dev/sda -o - | blkparse -i - -o ./temp/blktrace.sda.iozone -f "%-12C\t\t%p\t%d\t%S:%n:%N\t\t%P\n"
$> iozone -w -e -s 16M -f ./mnt/iozone.dummy -i 0
In the printing format "%-12C\t\t%p\t%d\t%S:%n:%N\t\t%P\n", all other things are printed well, but the "%P" is not printed at all.
Is there anyone who knows why the packet data is not displayed?
OR anyone who knows other way to get the disk IO packet data with actual input value?
As far as I know blktrace does not capture the actual data. It just capture the metadata. One way to capture real data is to write your own kernel module. Some students at FIU.edu did that in this paper:
"I/O deduplication: Utilizing content similarity to ..."
I would ask this question in linux-btrace mailing list as well:
http://vger.kernel.org/majordomo-info.html
I have query which gives me result as
app_no
--------
(0 rows)
I need to get only the rows part and that too just the number. I am saving the result into a variable but I am not able to parse it.
napp=`psql -U postgres appdb -c "select appno from app.apps where properties&2048=1024
cap=$(echo "$napp"|sed -n 's/[0-9][0-9] rows/\1/p')
echo "$cap"
I just need number of rows and that too just number.
If you need the number of appno entries that match, then you should probably use:
SELECT COUNT(*) FROM app.apps WHERE properties & 2048 = 1024
but the answer will always be 0 because the condition is always going to give 0 or false. You need the same bit twice, either both 1024 or both 2048.
SELECT COUNT(*) FROM app.apps WHERE properties & 1024 = 1024
SELECT COUNT(*) FROM app.apps WHERE properties & 2048 = 2048
SQL interfaces that insist on headings and summaries are a nuisance when shell scripting. However, the psql manual suggests that -q and -t may help (with -A too, perhaps):
-A or --no-align
Switches to unaligned output mode. (The default output mode is otherwise aligned.)
-q or --quiet
Specifies that psql should do its work quietly. By default, it prints welcome messages and various informational output. If this option is used, none of this happens. This is useful with the -c option. Within psql you can also set the QUIET variable to achieve the same effect.
-t or --tuples-only
Turn off printing of column names and result row count footers, etc. This is equivalent to the \t command.
If you want to cut the string as-is :
napp=$(psql -U postgres appdb -c "
select appno frpm app.apps
where properties&2048=1024;"
)
cap=$(echo "$napp" | sed -nr 's/.*\(([0-9]+) rows.*/\1/p')
echo "$cap"
But a better solution is the Jonathan Leffler's one
I am trying to delete last row in the file generated by nzsql.Please find the below query.
nzsql -A -c "SELECT * FROM AM_MAS_DIVISION_DIM" > abc.out
When I execute this query the output will be generated and stored in abc.out.This will include both header columns as well as some time information at the bottom.But I don't need the bottom metadata and want to keep only my header columns. How can I do this using only nzsql.Please help me.Thanks in advance.
use -r flag in the nzsql command to avoid getting that row [assuming the metadata referred in question is the row count summary line, ex: (3 rows)]
-r Suppresses the row count that is displayed at the end of the SQL output.
reference: http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.adm.doc%2Fr_sysadm_nzsql_command.html
Why don't you just pipe the output to a unix command to remove it? I think something like this will work:
nzsql -A -c "SELECT * FROM AM_MAS_DIVISION_DIM" | sed '$d' > abc.out
Seems to be a recommended solution for getting rid of the last line (although ed, gawk, and other tools can handle it).
I have an expect script which I need to run every 3 mins on my management node to collect tx/rx values for each port attached to DCX Brocade SAN Switch using the command #portperfshow#
Each time I try to use crontab to execute the script every 3 mins, the script does not work!
My expect script starts with #!/usr/bin/expect -f and I am calling the script using the following syntax under cron:
3 * * * * /usr/bin/expect -f /root/portsperfDCX1/collect-all.exp sanswitchhostname
However, when I execute the script (not under cron) it works as expected:
root# ./collect-all.exp sanswitchhostname
works just fine.
Please Please can someone help! Thanks.
The script collect-all.exp is:
#!/usr/bin/expect -f
#Time and Date
set day [timestamp -format %d%m%y]
set time [timestamp -format %H%M]
#logging
set LogDir1 "/FPerf/PortsLogs"
et timeout 5
set ipaddr [lrange $argv 0 0]
set passw "XXXXXXX"
if { $ipaddr == "" } {
puts "Usage: <script.exp> <ip address>\n"
exit 1
}
spawn ssh admin#$ipaddr
expect -re "password"
send "$passw\r"
expect -re "admin"
log_file "$LogDir1/$day-portsperfshow-$time"
send "portperfshow -tx -rx -t 10\r"
expect timeout "\n"
send \003
log_file
send -- "exit\r"
close
I had the same issue, except that my script was ending with
interact
Finally I got it working by replacing it with these two lines:
expect eof
exit
Changing interact to expect eof worked for me!
Needed to remove the exit part, because I had more statements in the bash script after the expect line (calling expect inside a bash script).
There are two key differences between a program that is run normally from a shell and a program that is run from cron:
Cron does not populate (many) environment variables. Notably absent are TERM, SHELL and HOME, but that's just a small proportion of the long list that will be not defined.
Cron does not set up a current terminal, so /dev/tty doesn't resolve to anything. (Note, programs spawned by Expect will have a current terminal.)
With high probability, any difficulties will come from these, especially the first. To fix, you need to save all your environment variables in an interactive session and use these in your expect script to repopulate the environment. The easiest way is to use this little expect script:
unset -nocomplain ::env(SSH_AUTH_SOCK) ;# This one is session-bound anyway
puts [list array set ::env [array get ::env]]
That will write out a single very long line which you want to put near the top of your script (or at least before the first spawn). Then see if that works.
Jobs run by cron are not considered login shells, and thus don't source your .bashrc, .bash_profile, etc.
If you want that behavior, you need to add it explicitly to the crontab entry like so:
$ crontab -l
0 13 * * * bash -c '. .bash_profile; etc ...'
$