Troubleshooting BCP and Format File Errors - sql

First off, sorry for the long post. I wanted to be thorough with my examples/data, and the bulk of this post is just that.
I inherited a bulk import process that uses a format file (.fmt) at my new job. It was created by the guy who worked here before me, and it is my job to learn this process (and now fix it). I have limited knowledge of this stuff, but I have done some research. After a few weeks, I haven't really gotten anywhere. Here is what I am working with...
--BCP Command to import data from C:\Desktop\20180629_2377167_PR_NP.txt to table LA_Temp.dbo.ProvReg
bcp LA_Temp.dbo.ProvReg IN C:\Desktop\20180629_2377167_PR_NP.txt -f C:\Desktop\PROVREG.FMT -T -S SERVERNAME -k -m 1000000
--Table structure the format file is created from:
SELECT [NPI]
,[D1]
,[EntityType]
,[D2]
,[ReplaceNPI]
,[D3]
,[ProvName]
,[D4]
,[MailAddr1]
,[D5]
,[MailAddr2]
,[D6]
,[MailCity]
,[D7]
,[MailState]
,[D8]
,[MailZip]
,[D9]
,[MailCountry]
,[D10]
,[MailPhone]
,[D11]
,[MailFax]
,[D12]
,[LocAddr1]
,[D13]
,[LocAddr2]
,[D14]
,[LocCity]
,[D15]
,[LocState]
,[D16]
,[LocZip]
,[D17]
,[LocCountry]
,[D18]
,[LocPhone]
,[D19]
,[LocFax]
,[D20]
,[Taxonomy1]
,[D21]
,[Taxonomy2]
,[D22]
,[Taxonomy3]
,[D23]
,[OtherProvID]
,[D24]
,[OtherProvIDType]
,[D25]
,[ProvEnumDate]
,[D26]
,[LastUpdate]
,[D27]
,[DeactivateRC]
,[D28]
,[DeactivateDate]
,[D29]
,[ReactivateDate]
,[D30]
,[Gender]
,[D31]
,[License]
,[D32]
,[LicenseState]
,[D33]
,[AuthorizedContact]
,[D34]
,[ContactTitle]
,[D35]
,[ContactPhone]
,[D36]
,[PanelOpen]
,[D37]
,[Language1]
,[D38]
,[Language2]
,[D39]
,[Language3]
,[D40]
,[Language4]
,[D41]
,[Language5]
,[D42]
,[AgeRestrict]
,[D43]
,[PCPMax]
,[D44]
,[PCPActual]
,[D45]
,[PCPAll]
,[D46]
,[EnrollInd]
,[D47]
,[EnrollDate]
,[D48]
,[FamilyOnly]
,[D49]
,[SubSpec1]
,[D50]
,[SubSpec2]
,[D51]
,[SubSpec3]
,[D52]
,[ContractName]
,[D53]
,[ContractBegin]
,[D54]
,[ContractEnd]
,[D55]
,[Parish1]
,[D56]
,[Parish2]
,[D57]
,[Parish3]
,[D58]
,[Parish4]
,[D59]
,[Parish5]
,[D60]
,[Parish6]
,[D61]
,[Parish7]
,[D62]
,[Parish8]
,[D63]
,[Parish9]
,[D64]
,[Parish10]
,[D65]
,[Parish11]
,[D66]
,[Parish12]
,[D67]
,[Parish13]
,[D68]
,[Parish14]
,[D69]
,[Parish15]
,[D70]
,[PCPInd]
,[D71]
,[DisplayOnline]
,[D72]
,[ExpAgeRestrict]
,[D73]
,[Suffix]
,[D74]
,[Title]
,[D75]
,[PrescriberInd]
,[Spaces]
,[End]
FROM [LA_Temp].[dbo].[ProvReg]
--Example Text File Data (this is one line)
9999999999 ^0^ ^ ^3800 HMA BLVD STE 305 ^ ^METAIRIE ^LA^70006 ^ ^5048729679^ ^3800 HMA BLVD ^ ^METAIRIE ^LA^70006 ^ ^9999999999^ ^207Q00000X^ ^ ^0000000^2001^ ^00000000^ ^00000000^00000000^F^ ^LA^ ^ ^ ^N^1^0^0^0^0^2^00000^00000^00000^ ^ ^ ^ ^ ^ ^000000000000000000000000000000^00000000^00000000^26^00^00^00^00^00^00^00^00^00^00^00^00^00^00^0^0^Accept patients of age 000-000^ ^MD ^ ^
--Format file
11.0
153
1 SQLCHAR 0 40 "\t" 1 NPI SQL_Latin1_General_Pref_CP1_CI_AS
2 SQLCHAR 0 2 "\t" 2 D1 SQL_Latin1_General_Pref_CP1_CI_AS
3 SQLCHAR 0 2 "\t" 3 EntityType
...all the way to...
153 SQLCHAR 0 2 "\r\n" 153 End
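Because the sample line is caret-separated and ends with a line break, one way to avoid hand-editing 153 rows is to generate the format file. This is a sketch under stated assumptions, not the original process: `build_format_file` is a hypothetical helper, every field is treated as character data (SQLCHAR, prefix length 0), each field in the data file is assumed to map one-to-one and in order to a table column, and the lengths are yours to supply. It follows the non-XML layout shown above: version, field count, then one row per field with host field order, data type, prefix length, length, terminator, server column order, column name and collation.

```python
def build_format_file(columns, delimiter="^", row_terminator="\\r\\n",
                      collation="SQL_Latin1_General_Pref_CP1_CI_AS",
                      version="11.0"):
    """Return the text of a non-XML bcp format file (hypothetical helper).

    columns: list of (column_name, field_length) in data-file order.
    Every field ends with `delimiter` except the last, which ends with
    `row_terminator`. Terminators are written the way bcp expects them
    in the file, i.e. as the characters ^ or \\r\\n.
    """
    lines = [version, str(len(columns))]
    for order, (name, length) in enumerate(columns, start=1):
        terminator = row_terminator if order == len(columns) else delimiter
        # host field order, data type, prefix length, field length,
        # terminator, server column order, server column name, collation
        lines.append(
            f'{order} SQLCHAR 0 {length} "{terminator}" {order} {name} {collation}'
        )
    return "\n".join(lines) + "\n"


print(build_format_file([("NPI", 10), ("EntityType", 1), ("ProvName", 80)]))
```

The output can be saved as a .fmt file and passed to bcp with -f. If the file's fields don't line up one-to-one with the table's columns, the server column order values would need adjusting by hand.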
I have changed directories, the server name, and some of the text file data for security; however, it is very similar.
Here is the problem I am encountering:
With the "\t" used in the format file I just created from the SQL table, I get the error: [Microsoft][SQL Server Native Client 11.0]Unexpected EOF encountered in BCP data-file.
If I change this to just "" or "^" (as I 'think' it should be, since the text file uses a caret delimiter), the rows begin to copy but fail with the error
[Microsoft][SQL Server Native Client 11.0]String data, right truncation SQLState = 22001, NativeError = 0. BCP copy in failed.
If anyone can please point me in the right direction here for troubleshooting this issue, or if you see anything out of place, please let me know. As I mentioned, I have been at this for some time, and can use any suggestions I can get. Unfortunately, there is no one at my company I can ask about this.

Try adding the -e option to your bcp command. This gives you an error file in which bcp writes sample lines from the data file that it had problems with, which is very helpful for troubleshooting the type of error you are getting now (you are correct to change the delimiter in your format file).
The error you are getting now ("String data, right truncation") means exactly what it says. However, this truncation can occur for a number of reasons. The destination table's columns may not be large enough to hold the data that sits between the defined field delimiters. The delimiter may appear inside your data, tricking the bcp utility into thinking a column has ended before it was intended to end (this is less likely with the delimiter you are using... but ya never know... I always prefer fixed width if possible). And, of course, the source of the data may very well have sent you a file that contradicts whatever agreed-upon spec led you to define your destination as you have.
The error is accurate; the trick is finding where. Use the -e option to let bcp capture the problematic lines:
BCP table_dest IN "C:\FILE.TXT" -S SVR -T -f"C:\FORMAT_FILE.txt" -e"C:\ERROR_FILE.txt"
The error file ("C:\ERROR_FILE.txt" above) will include line numbers and a sample of the lines bcp couldn't handle. Just copy and paste them into a search in the file you are trying to load to see for yourself.
I strongly suggest using a more advanced text editor to inspect ASCII text files. Don't use Windows Notepad or WordPad; use something like Notepad++ or UltraEdit.
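As a complement to -e, a quick pass over the data file can flag the lines and fields most likely to trigger the truncation error before bcp even runs. This is a sketch, not part of bcp: it assumes "^" separates fields, that `max_widths` (a hypothetical list) holds the maximum length of each field in the same order as the format file, and that the file can be read as Latin-1.

```python
def find_problem_fields(line, max_widths, delimiter="^"):
    """Return (field_number, message) pairs describing problems in one line.

    field_number 0 means the whole line (wrong field count); otherwise it is
    the 1-based field position, matching host field order in the format file.
    """
    fields = line.rstrip("\r\n").split(delimiter)
    problems = []
    if len(fields) != len(max_widths):
        problems.append((0, f"expected {len(max_widths)} fields, found {len(fields)}"))
    for number, (value, width) in enumerate(zip(fields, max_widths), start=1):
        if len(value) > width:
            problems.append((number, f"{len(value)} chars > max {width}: {value!r}"))
    return problems


def scan_file(path, max_widths, delimiter="^"):
    """Print every problem found, with its line number in the data file."""
    with open(path, encoding="latin-1", newline="") as f:
        for line_no, line in enumerate(f, start=1):
            for number, message in find_problem_fields(line, max_widths, delimiter):
                print(f"line {line_no}, field {number}: {message}")
```

For example, `scan_file(r"C:\Desktop\20180629_2377167_PR_NP.txt", widths)`, where `widths` comes from the length column of your format file or the table definition. A field-count mismatch usually means a stray or missing delimiter; a too-long field points at a column that is too narrow.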

Related

Creating a Format File for Bulk Import

I am trying to create a format file to bulk import a .csv file, but I am getting an error.
The command I used:
"BCP -SMSSQLSERVER01.[Internal_Checks].[Jan_Flat] format out -fC:\Desktop\exported data\Jan_FlatFormat.fmt -c -T -Uasda -SMSSQLSERVER01 -PPASSWORD"
I am getting an error
"A valid table name is required for in, out, or format options."
Can anyone suggest what I need to do?
According to the bcp Utility documentation the first parameter should be a [Database.]Schema.{Table | View | "query"}, so don't put -SMSSQLSERVER01 where you've got it. Also use format nul instead of format out.
Try using:
bcp.exe [Internal_Checks].[Jan_Flat] format nul "-fC:\Desktop\exported data\Jan_FlatFormat.fmt" -c -SMSSQLSERVER01 -T -Uasda -PPASSWORD
Note the quotes " around the -f switch because your path name contains space characters.
Also note that the -c switch causes single-byte characters (ASCII/OEM/codepage with SQLCHAR) to be written out. If your table contains nchar, nvarchar or ntext columns you should consider using the -w switch instead so as to write out UTF-16 encoded data (using SQLNCHAR).
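If you end up scripting this, passing the arguments as a list saves you from quoting the path by hand. A minimal sketch in Python, assuming bcp is on the PATH and a trusted connection (-T), in which case -U/-P aren't needed; `bcp_format_args` is a hypothetical helper. On Windows, `subprocess` turns the list into a command line using the same rules as `list2cmdline`, which wraps the path containing spaces in quotes.

```python
import subprocess


def bcp_format_args(table, fmt_path, server, unicode=False):
    """Build a `bcp ... format nul` argument list (hypothetical helper)."""
    return [
        "bcp", table, "format", "nul",
        "-f", fmt_path,              # format file to write
        "-w" if unicode else "-c",   # -w: UTF-16 (SQLNCHAR); -c: character (SQLCHAR)
        "-S", server,
        "-T",                        # trusted connection, so no -U/-P
    ]


args = bcp_format_args("[Internal_Checks].[Jan_Flat]",
                       r"C:\Desktop\exported data\Jan_FlatFormat.fmt",
                       "MSSQLSERVER01")
print(subprocess.list2cmdline(args))  # shows how the path gets quoted
# subprocess.run(args, check=True)    # run it on a machine with bcp installed
```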

while [[ condition ]] stalls on loop exit

I have a problem with ksh in that a while loop is failing to obey the "while" condition. I should add now that this is ksh88 on my client's Solaris box. (That's a separate problem that can't be addressed in this forum. ;) I have seen Lance's question and some similar ones, but none that I have found seems to address this. (Disclaimer: NO, I haven't looked at every ksh question in this forum.)
Here's a very cut down piece of code that replicates the problem:
#!/usr/bin/ksh
#
go=1
set -x
tail -0f loop-test.txt | while [[ $go -eq 1 ]]
do
    read lbuff
    set $lbuff
    nwords=$#
    printf "Line has %d words <%s>\n" $nwords "${lbuff}"
    if [[ "${lbuff}" = "0" ]]
    then
        printf "Line consists of %s; time to absquatulate\n" $lbuff
        go=0 # Violate the WHILE condition to get out of loop
    fi
done
printf "\nLooks like I've fallen out of the loop\n"
exit 0
The way I test this is:
Run loop-test.sh in background mode
In a different window I run commands like "echo some nonsense >>loop-test.txt" (without the quotes, of course)
When I wish to exit, I type "echo 0 >>loop-test.txt"
What happens? It indeed sets go=0 and displays the line:
Line consists of 0; time to absquatulate
but does not exit the loop. To break out I append one more line to the txt file. The loop does NOT process that line and just falls out of the loop, issuing that "fallen out" message before exiting.
What's going on with this? I don't want to use "break" because in the actual script, the loop is monitoring the log of a database engine and the flag is set when it sees messages that the engine is shutting down. The actual script must still process those final lines before exiting.
Open to ideas, anyone?
Thanks much!
-- J.
OK, that flopped pretty quick. After reading a few other posts, I found an answer given by dogbane that sidesteps my entire pipe-to-while scheme. His is the second answer to a question (from 2013) where I see neeraj is using the same scheme I'm using.
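What was wrong? The pipe-to-while scheme works for input that ends, like a file or a command with a distinct end to its output. Output from tail -f, however, never reaches EOF, and the shell waits for every command in a pipeline to finish. So after go=0 the loop itself stops, but the script can't move past the pipeline until tail exits, and tail only dies (from SIGPIPE) when it tries to write the next line into a pipe that nobody is reading any more. That's why appending one more line made it fall out.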
Dogbane's solution: Don't use a pipe. Applying his logic to my situation, the basic loop is:
while read line
do
# put loop body here
done < <(tail -0f ${logfile})
No subshell, no problem.
Caveat about that syntax: there must be a space between the two < operators; otherwise << is read as the start of a here-document, with bad syntax.
Er, one more catch: the syntax did not work in ksh, not even in mksh (under Cygwin), which emulates ksh93. But it did work in bash. So my boss is gonna have a good laugh at me, 'cause he knows I dislike bash.
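This form behaves differently because the loop reads tail's output directly instead of being one stage of a pipeline the shell must wait on. The same idea can be sketched in Python (this is not dogbane's code): `watch` is a hypothetical helper, it assumes a POSIX `tail` that accepts `-n 0 -f`, and because `tail -f` won't exit by itself, it is terminated explicitly once the flag is cleared.

```python
import subprocess


def watch(path, sentinel="0", cmd=None):
    """Read lines from `tail -f` in this process; stop tail once the flag is cleared."""
    if cmd is None:
        cmd = ["tail", "-n", "0", "-f", path]
    processed = []
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        go = True
        while go:
            line = proc.stdout.readline()
            if not line:                # tail exited on its own
                break
            lbuff = line.rstrip("\n")
            processed.append(lbuff)     # the sentinel line is still processed
            if lbuff == sentinel:
                go = False              # flag ends the loop; no extra input needed
    finally:
        proc.terminate()                # tail -f never exits by itself
        proc.wait()
        proc.stdout.close()
    return processed
```

Calling `watch("loop-test.txt")` then returns as soon as the line containing just 0 has been handled, without waiting for another line to arrive.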
So thanks MUCH, dogbane.
-- J
After articulating the problem and sleeping on it, the reason for the described behavior came to me: After setting go=0, the control flow of the loop still depends on another line of data coming in from STDIN via that pipe.
And now that I have realized the cause of the weirdness, I can speculate on an alternative way of reading from the stream. For the moment I am thinking of the following solution:
Open the input file as STDIN (Need to research the exec syntax for that)
When the condition occurs, close STDIN (Again, need to research the syntax for that)
It should then be safe to use the more intuitive while read lbuff at the top of the loop.
I'll test this out today and post the result. I hope someone else benefits from the method (if it works).
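The plan above (read the log directly, and let the flag end the loop only after the final line has been processed) can be sketched in Python, where no pipe or tail process is involved at all. This illustrates the idea rather than the ksh solution: `follow` and `monitor` are hypothetical names, the file is polled every `poll_interval` seconds, and reading starts at the current end of the file, like tail -0f.

```python
import os
import time


def follow(f, poll_interval=0.5):
    """Yield complete lines appended to an open text file, like `tail -0f`."""
    f.seek(0, os.SEEK_END)          # skip existing content, as tail -0f does
    partial = ""
    while True:
        chunk = f.readline()
        if not chunk:               # nothing new yet: wait and poll again
            time.sleep(poll_interval)
            continue
        partial += chunk
        if partial.endswith("\n"):  # only hand back whole lines
            yield partial
            partial = ""


def monitor(path, sentinel="0", poll_interval=0.5):
    """Process each new line; a flag (not break) ends the loop after the sentinel."""
    processed = []
    go = True
    with open(path) as f:
        lines = follow(f, poll_interval)
        while go:
            lbuff = next(lines).rstrip("\n")
            processed.append(lbuff)
            print(f"Line has {len(lbuff.split())} words <{lbuff}>")
            if lbuff == sentinel:
                print(f"Line consists of {lbuff}; time to absquatulate")
                go = False          # loop test fails next time; no extra input needed
    print("Looks like I've fallen out of the loop")
    return processed
```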

How can I delete a specific line (e.g. line 102,206,973) from a 30gb csv file?

What method can I use to delete a specific line from a CSV/TXT file that is too big to load into memory and edit manually?
Background
My question is actually an indirect solution to a problem related to importing CSV files into SQL databases.
I have a series of 10-30 GB CSV files that I want to import into an SQLite table from within R (since they are too large to import into R as whole data frames). I am using the 'RSQLite' package for this.
A couple fail because of an error related to one of the lines being badly formatted. The populating process is then cancelled. R returns the line number which caused the process to fail.
The error given is:
./csvfilename line 102206973 expected 9 columns of data but found 3)
So I know exactly the line which causes the error.
I see 2 potential 'indirect' solutions which I was hoping someone could help me with.
(i) Deleting the line causing the error in 20+ GB files, e.g. line 102,206,973 in the example above.
I am not concerned with 'losing' the data in line 102,206,973 by just skipping or deleting it. However, I have tried and failed to access the CSV file and remove the line.
(ii) Using sqlite directly (or anything else?) to import a CSV in a way that allows skipping bad lines or errors.
Although not likely to be related directly to the solution, here is the R code used.
db <- dbConnect(SQLite(), dbname=name_of_table)
dbWriteTable(conn = db, name ="currentdata", value = csvfilename, row.names = FALSE, header = TRUE)
Thanks!
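For option (ii), one alternative outside R is a streaming import that skips rows with the wrong number of fields instead of aborting. This sketch swaps in Python's standard csv and sqlite3 modules (it is not RSQLite): `import_csv` is a hypothetical name, the table's columns come from the CSV header with no declared types, and header names are assumed not to contain double quotes. It never holds more than one row in memory and returns the line numbers it skipped.

```python
import csv
import sqlite3


def import_csv(csv_path, db_path, table):
    """Stream csv_path into table, skipping rows whose field count differs
    from the header. Returns the csv line numbers of the skipped rows."""
    skipped = []
    conn = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)
            columns = ", ".join(f'"{name}"' for name in header)
            placeholders = ", ".join("?" * len(header))
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

            def good_rows():
                for row in reader:
                    if len(row) == len(header):
                        yield row
                    else:
                        skipped.append(reader.line_num)  # record and move on

            with conn:  # one transaction for the whole load
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', good_rows()
                )
    finally:
        conn.close()
    return skipped
```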
To delete a specific line you can use sed:
sed -e '102206973d' your_file
If you want the replacement to be done in-place, do
sed -i.bak -e '102206973d' your_file
This will create a backup named your_file.bak, and your_file will have the specified line removed.
Example
$ cat a
1
2
3
4
5
$ sed -i.bak -e '3d' a
$ cat a
1
2
4
5
$ cat a.bak
1
2
3
4
5
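Note that sed -i still reads and rewrites the whole file, so it needs free disk space for a full copy (two, counting the .bak). Where sed isn't available, the same streaming idea is easy to write in Python. A minimal sketch, with `copy_without_lines` as a hypothetical name: it counts physical newline-terminated lines the way sed does, so if the CSV has quoted fields containing newlines, the number your importer reports might not match.

```python
def copy_without_lines(src, dst, skip):
    """Copy src to dst, dropping the 1-based line numbers in `skip`.

    Reads one line at a time in binary mode, so memory use stays small and
    bytes are copied unchanged regardless of the file's encoding.
    """
    skip = set(skip)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line_no, line in enumerate(fin, start=1):
            if line_no not in skip:
                fout.write(line)
```

For example, `copy_without_lines("csvfilename", "csvfilename.fixed", [102206973])`.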

What's "\." in postgresql dump file?

I have a PostgreSQL dump file which I'm trying to restore. I get this error, which I guess is about invalid data:
ERROR: invalid input syntax for integer: "."
When I checked the file, I found data like this:
469215 2009-10-10 18:16:47.041377 0 1
471217 2009-10-10 18:25:12.536352 0 1
473224 2009-10-17 09:46:43.041604 0 1
473228 2009-10-22 10:58:40.194244 0 1
\.
So I was wondering: what does this "\." do?
I checked some other working dumps, and they also end their data lines with "\.", so I guess it's the correct syntax!
Please tell me what the correct syntax is and what it does.
Thank you.
It seems to mark the end of the data for a COPY statement.
From the documentation:
End of data can be represented by a single line containing just backslash-period (\.). An end-of-data marker is not necessary when reading from a file, since the end of file serves perfectly well; it is needed only when copying data to or from client applications using pre-3.0 client protocol.
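In a plain-text pg_dump, each table's rows follow a `COPY ... FROM stdin;` line, one row per line with tab-separated columns and `\N` for NULL, and the `\.` line ends the block. The sketch below reads such a block just to show where the marker fits; `read_copy_block` is a hypothetical helper, and it does not undo COPY's backslash escapes (`\t`, `\n`, `\\` inside values).

```python
def read_copy_block(lines):
    """Collect COPY text-format rows until the end-of-data marker `\\.`.

    Columns are tab-separated and `\\N` becomes None. Backslash escapes
    inside values are left as-is in this sketch.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "\\.":           # end-of-data marker: stop here
            break
        rows.append([None if value == "\\N" else value for value in line.split("\t")])
    return rows
```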

Nano hacks: most useful tiny programs you've coded or come across

It's the first great virtue of programmers. All of us have, at one time or another, automated a task with a bit of throwaway code. Sometimes it takes a couple of seconds to tap out a one-liner; sometimes we spend an exorbitant amount of time automating away a two-second task and then never use it again.
What tiny hack have you found useful enough to reuse? Useful enough to go so far as to make an alias for it?
Note: before answering, please check to make sure it's not already on favourite command-line tricks using BASH or perl/ruby one-liner questions.
I found this on dotfiles.org just today. It's very simple, but clever. I felt stupid for not having thought of it myself.
###
### Handy Extract Program
###
# "$1" is quoted throughout so file names containing spaces work.
extract () {
    if [ -f "$1" ] ; then
        case "$1" in
            *.tar.bz2) tar xvjf "$1" ;;
            *.tar.gz)  tar xvzf "$1" ;;
            *.bz2)     bunzip2 "$1" ;;
            *.rar)     unrar x "$1" ;;
            *.gz)      gunzip "$1" ;;
            *.tar)     tar xvf "$1" ;;
            *.tbz2)    tar xvjf "$1" ;;
            *.tgz)     tar xvzf "$1" ;;
            *.zip)     unzip "$1" ;;
            *.Z)       uncompress "$1" ;;
            *.7z)      7z x "$1" ;;
            *)         echo "'$1' cannot be extracted via >extract<" ;;
        esac
    else
        echo "'$1' is not a valid file"
    fi
}
Here's a filter that puts commas in the middle of any large numbers in standard input.
$ cat ~/bin/comma
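#!/usr/bin/perl -p
# Commify every run of 4 or more digits on each input line.
s/(\d{4,})/commify($1)/ge;
sub commify {
    local $_ = shift;
    # Insert a comma before the rightmost three unseparated digits; repeat until done.
    1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $_;
}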
I usually wind up using it for long output lists of big numbers, because I tire of counting digits. Now instead of seeing
-rw-r--r-- 1 alester alester 2244487404 Oct 6 15:38 listdetail.sql
I can run that as ls -l | comma and see
-rw-r--r-- 1 alester alester 2,244,487,404 Oct 6 15:38 listdetail.sql
This script saved my career!
Quite a few years ago, I was working remotely on a client database. I updated a shipment to change its status, but I forgot the WHERE clause.
I'll never forget the feeling in the pit of my stomach when I saw (6834 rows affected). I basically spent the entire night going through event logs and figuring out the proper status on all those shipments. Crap!
So I wrote a script (originally in awk) that would start a transaction for any updates, and check the rows affected before committing. This prevented any surprises.
So now I never do updates from command line without going through a script like this. Here it is (now in Python):
import subprocess
import sys

# Command-line SQL client, as in the original. Assumed (not verified) to
# accept the SQL text as its argument and print "(N rows affected)".
pgm = "isql"

if len(sys.argv) == 1:
    print("Usage:\nsql sql-string [rows-affected]")
    sys.exit(1)

sql_str = sys.argv[1]  # keep the original case; string literals must not be upper-cased
max_rows_affected = int(sys.argv[2]) if len(sys.argv) > 2 else 3

if sql_str.lstrip().upper().startswith("UPDATE"):
    # BEGIN, the UPDATE and COMMIT/ROLLBACK must run in the same session,
    # so they are sent as one batch and @@ROWCOUNT is checked server-side.
    sql_str = (
        "DECLARE @n INT\n"
        "BEGIN TRANSACTION\n"
        f"{sql_str}\n"
        "SET @n = @@ROWCOUNT\n"
        f"IF @n > {max_rows_affected}\n"
        "BEGIN\n"
        "    ROLLBACK TRANSACTION\n"
        "    PRINT 'WARNING! Too many rows were affected, rolled back'\n"
        "END\n"
        "ELSE\n"
        "    COMMIT TRANSACTION\n"
    )

# List form with no shell=True, so the SQL text reaches pgm as one argument.
result = subprocess.run([pgm, sql_str], capture_output=True, text=True)
print(result.stdout, end="")
print(result.stderr, end="", file=sys.stderr)
sys.exit(result.returncode)
I use this script under assorted Linuxes to check whether a directory copy between machines (or to CD/DVD) worked, or whether copying (e.g. ext3 UTF-8 filenames -> fuseblk) has mangled special characters in the filenames.
#!/bin/bash
## dsum Do checksums recursively over a directory.
## Typical usage: dsum <directory> > outfile
export LC_ALL=C # Optional - use the same sort order across different locales
if [ $# -ne 1 ]; then echo "Usage: ${0/*\//} <directory>" 1>&2; exit 1; fi
cd "$1" 1>&2 || exit 1
#findargs=-follow # Uncomment to follow symbolic links
# NUL-separated names survive spaces and newlines (GNU find/sort/xargs)
find . $findargs -type f -print0 | sort -z | xargs -0 cksum
Sorry, don't have the exact code handy, but I coded a regular expression for searching source code in VS.Net that allowed me to search anything not in comments. It came in very useful in a particular project I was working on, where people insisted that commenting out code was good practice, in case you wanted to go back and see what the code used to do.
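The original regex isn't available, so here is a different technique, sketched in Python rather than as a VS.Net search pattern: blank out C-style `//` and `/* */` comments while keeping the line breaks, then search what's left. String literals are matched first so a `//` inside a string isn't mistaken for a comment; character literals and preprocessor tricks are not handled. `strip_comments` and `search_outside_comments` are hypothetical names.

```python
import re

# Strings come first in the alternation so "//" inside a string literal
# is consumed as part of the string, not treated as a comment.
_TOKENS = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.S)


def strip_comments(source):
    """Replace comments with spaces, keeping newlines so line numbers survive."""
    def blank(match):
        text = match.group(0)
        if text.startswith('"'):
            return text                       # keep string literals untouched
        return re.sub(r"[^\n]", " ", text)    # blank the comment body
    return _TOKENS.sub(blank, source)


def search_outside_comments(pattern, source):
    """Return 1-based line numbers where pattern matches outside comments."""
    code = strip_comments(source)
    return [line_no
            for line_no, line in enumerate(code.splitlines(), start=1)
            if re.search(pattern, line)]
```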
I have two Ruby scripts that I modify regularly to download all the strips of various webcomics. Extremely handy! Note: they require wget, so probably Linux. Note 2: read these before you try them; they need a little modification for each site.
Date based downloader:
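#!/usr/bin/ruby -w
require 'date'

URL_FORMAT = "hjlsdahjsd/comics/st%Y%m%d.gif"  # placeholder; edit per site
MWF = [1, 3, 5]  # Monday, Wednesday, Friday

# Step through calendar dates instead of adding 86400 seconds to a Time:
# across a DST change the Time drifts by an hour and never equals the end date.
(Date.new(2005, 2, 5)...Date.new(2007, 7, 9)).each do |d|
  next unless MWF.include?(d.wday)
  system("wget", d.strftime(URL_FORMAT))
  sleep 3
end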
Or you can use the number based one:
#!/usr/bin/ruby -w
URL_FORMAT = "http://fdsafdsa/comics/%08d.gif"  # placeholder; edit per site
1.upto(986) do |i|
  system("wget", format(URL_FORMAT, i))
  sleep 1
end
Instead of having to repeatedly open files in SQL Query Analyzer and run them, I found the syntax needed to run them from a batch file, and could then run 100 at once. Oh the sweet, sweet joy! I've used this ever since.
isqlw -S servername -d dbname -E -i F:\blah\whatever.sql -o F:\results.txt
This goes back to my COBOL days, but I had two generic COBOL programs, one batch and one online (mainframe folks will know what these are). They were shells of a program that could take any set of parameters and/or files and be run in batch or executed in an IMS test region. I had them set up so that, depending on the parameters, I could access files or databases (DB2 or IMS DB), or just manipulate working storage or whatever.
It was great because I could test that date function without guessing, or test why there was truncation or why there was a database ABEND. The programs grew in size as time went on to include all sorts of tests, and became a staple of the development group. Everyone knew where the code resided and included them in their unit testing as well. Those programs got so large (most of the code was commented-out tests), and it was all contributed by people through the years. They saved so much time and settled so many disagreements!
I coded a Perl script to map dependencies, without going into an endless loop, for a legacy C program I inherited that also had a diamond dependency problem.
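The key to not looping forever on cycles, and to not re-walking shared headers in a diamond (A includes B and C, which both include D), is remembering what has already been visited. A minimal sketch of that idea in Python, not the original Perl, assuming the include graph has already been extracted into a dict; `map_dependencies` is a hypothetical name.

```python
def map_dependencies(graph, root):
    """Return every file reachable from root exactly once, depth-first.

    graph maps a file to the files it includes. The `seen` set stops
    cycles from looping forever and keeps diamond-shaped includes from
    being walked twice.
    """
    seen = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(graph.get(node, [])))
    return order
```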
I wrote a small program that e-mailed me when I received e-mails from friends on a rarely used e-mail account.
I wrote another small program that sent me text messages if my home IP changed.
To name a few.
Years ago I built a suite of applications on a custom web application platform in Perl.
One cool feature was to convert SQL query strings into human readable sentences that described what the results were.
The code was relatively short but the end effect was nice.
I've got a little app that you run and it dumps a GUID into the clipboard. You can run it with /noui or not. With the UI, it's a single button that drops a new GUID every time you click it. Without it, it drops a new one and then exits.
I mostly use it from within VS. I have it set up as an external app mapped to a shortcut. I'm writing an app that relies heavily on XAML and GUIDs, so I always find I need to paste a new GUID into XAML...
Any time I write a clever list comprehension or use of map/reduce in Python. There was one like this:
from functools import reduce  # needed in Python 3
if reduce(lambda ok, name: ok and locks[name], locknames, True):
    print("Sub-threads terminated!")
The reason I remember that is that I came up with it myself, then saw the exact same code on somebody else's website. Nowadays it'd probably be done like:
if all(locks[name] for name in locknames):
    print("ya trik")
I've got 20 or 30 of these things lying around because, once I coded up the framework for my standard console app in Windows, I can pretty much drop in any logic I want, so I've got a lot of these little things that solve specific problems.
The ones I'm using a lot right now are a console app that takes stdin and colorizes the output based on XML profiles that match regular expressions to colors (I use it for watching my log files from builds), and a command-line launcher, so I don't pollute my PATH environment variable, which would exceed the length limit on some systems anyway, namely Win2K.
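A rough sketch of the colorizing filter in Python: the original tool reads its regex-to-color rules from XML profiles, while here the profile is a hypothetical in-code list of (regex, ANSI color) pairs, and the first pattern that matches colors the whole line. It assumes a terminal that understands ANSI escape codes.

```python
import re
import sys

RESET = "\033[0m"

# Hypothetical profile: first matching regex wins. The original tool
# loads its rules from XML files instead.
PROFILE = [
    (re.compile(r"\berror\b", re.IGNORECASE), "\033[31m"),      # red
    (re.compile(r"\bwarning\b", re.IGNORECASE), "\033[33m"),    # yellow
    (re.compile(r"\bsucceeded\b", re.IGNORECASE), "\033[32m"),  # green
]


def colorize(line, profile=PROFILE):
    """Wrap the line in the color of the first matching pattern, if any."""
    for pattern, color in profile:
        if pattern.search(line):
            return f"{color}{line}{RESET}"
    return line


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Copy input to output line by line, coloring each line."""
    for raw in stream_in:
        stream_out.write(colorize(raw.rstrip("\n")) + "\n")
```

Adding a call to `main()` at the bottom of the file turns it into a filter you can pipe build output through.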
I'm constantly connecting to various Linux servers from my own desktop throughout my workday, so I created a few aliases that will launch an xterm on those machines and set the title, background color, and other tweaks:
alias x="xterm" # local
alias xd="ssh -Xf me@development_host xterm -bg aliceblue -ls -sb -bc -geometry 100x30 -title Development"
alias xp="ssh -Xf me@production_host xterm -bg thistle1 ..."
I have a bunch of servers I frequently connect to, as well, but they're all on my local network. This Ruby script prints out the command to create aliases for any machine with ssh open:
#!/usr/bin/env ruby
require 'rubygems'
require 'dnssd'
handle = DNSSD.browse('_ssh._tcp') do |reply|
print "alias #{reply.name}='ssh #{reply.name}.#{reply.domain}';"
end
sleep 1
handle.stop
Use it like this in your .bash_profile:
eval `ruby ~/.alias_shares`