Exporting image data with bcp in SQL 2000 - blob

Hi, I have a SQL 2000 database with a large number of scanned documents (PDFs and Word documents) stored in an image data type column.
I need to export them to files.
I have written code to do this using xp_cmdshell and bcp. Looking at other questions, I have created a .fmt file as below:
8.0
1
1 SQLIMAGE 0 0 "" 1 FILEDATA ""
The command is:
'bcp "select filedata FROM attacheddocuments where pkey = ' + convert(varchar, @imageid) + '" queryout "c:\scans\' + @imagefilename + '" -T -f c:\scans\attached.fmt'
However, when I run the query it creates all the files, but they cannot be opened in either Word or Acrobat; both report that the file is corrupt.
If instead I run the command
'bcp "select filedata FROM attacheddocuments where pkey = ' + convert(varchar, @imageid) + '" queryout "c:\scans\' + @imagefilename + '" -T -N'
the PDF files now open OK, but the Word documents are still corrupt.
Does anyone have any ideas where I am going wrong?

I know this is a really old post, but I am having this issue with all files except PDFs.
I have tried with the -N switch and without it, and with a format file and without one. The strange thing is that I used to use my script often, then didn't for a while; some SQL updates came out during that period, and now the script only exports PDF documents without corruption.
ZIP files and just about any other file type can be run through a repair tool and fixed, but that is not a workable option given the volume.
Also, my format file had a prefix length of 4 instead of 0. That alone made every file type except PDFs come out with corrupt headers.
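For anyone else hitting this: the combination that generally works is a single-column format file whose prefix length is 0 (so bcp writes the raw bytes with no length prefix), exactly as in the format file above, plus one bcp call per row. A minimal sketch of the driving loop, assuming the attacheddocuments table from the question; the docfilename column is a hypothetical stand-in for wherever the target file name is stored:

DECLARE @imageid int, @imagefilename varchar(255), @cmd varchar(1000);
DECLARE doc_cursor CURSOR FOR
    SELECT pkey, docfilename FROM attacheddocuments;  -- docfilename is a hypothetical column name
OPEN doc_cursor;
FETCH NEXT FROM doc_cursor INTO @imageid, @imagefilename;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- One bcp call per row: the prefix-0 format file makes bcp write the raw image bytes only
    SET @cmd = 'bcp "select filedata FROM attacheddocuments where pkey = '
             + CONVERT(varchar, @imageid)
             + '" queryout "c:\scans\' + @imagefilename
             + '" -T -f c:\scans\attached.fmt';
    EXEC master..xp_cmdshell @cmd;
    FETCH NEXT FROM doc_cursor INTO @imageid, @imagefilename;
END
CLOSE doc_cursor;
DEALLOCATE doc_cursor;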

Related

How to avoid empty xml file

Apologies in advance if I am posting this in the wrong place. I have a Windows batch script that calls a SQL query to produce an XML file, which gets transferred to a third party. It is scheduled to run every 15 minutes. My issue is that there are times when there are no records and it creates an empty file, which causes issues when it reaches the third party.
This is the section in my .sql file (which runs fine), but I can't figure out how to only produce the XML if there are records in it.
select @cmd = ' bcp "select * from dbo.WelcomeEmail CustomerDetail for xml auto, root(''CustomerDetails''), elements" ' + 'queryout "d:\sample.xml"
I tried this, but it gave me a "bcp failed" error:
select @cmd = ' bcp "select * from dbo.WelcomeEmail where email is not null CustomerDetail for xml auto, root(''CustomerDetails''), elements" ' + 'queryout "d:\sample.xml"
if you need me to provide additional info, please let me know. Thanks!
See this answer: How can I check the size of a file in a Windows batch script?
You could check whether the file contains zero bytes (or whatever size the file is when there's no data) and then just delete the file.
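Incidentally, the "bcp failed" error in the second attempt is likely just a syntax issue: the table alias has to come before the WHERE clause (from dbo.WelcomeEmail CustomerDetail where email is not null for xml auto, ...). Alternatively, you can guard on the SQL side so the empty file is never generated at all. A minimal sketch, assuming the dbo.WelcomeEmail table from the question; the -T -c switches are an assumption about how the original command ended, so match them to your real call:

IF EXISTS (SELECT 1 FROM dbo.WelcomeEmail WHERE email IS NOT NULL)
BEGIN
    DECLARE @cmd varchar(2000);
    SELECT @cmd = 'bcp "select * from dbo.WelcomeEmail CustomerDetail '
                + 'for xml auto, root(''CustomerDetails''), elements" '
                + 'queryout "d:\sample.xml" -T -c';
    EXEC master..xp_cmdshell @cmd;  -- only runs when there are rows to send
END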

Powershell script to convert PDF to TIFF with Ghostscript

I have been asked to write a script that automatically converts PDF files to TIFF files so they can be processed further. With a lot of help from Google and this site (I never studied any programming language), I created the code below.
Even though it's working now, it is not quite what I was hoping for, since it creates 13 files every time it runs where it should create only one.
Could someone be kind enough to take a look at the script and tell me where I went wrong?
Thank you in advance!
EDIT:
In this (test) case there's only one PDF in the folder and it's named test.pdf; however, the idea is that the script looks through all the PDFs in the given folder, since it's unknown how many PDFs are in the folder at any given time. Let it run as a service in the background(?)
I'll edit the post with the error code/description once I find out how to get them; I can't keep up with the command line.
#Path to your Ghostscript EXE
$tool = 'C:\Program Files\gs\gs9.10\bin\gswin64c.exe'
#Directory containing the PDF files that will be converted
$inputDir = 'C:\test\'
#Output path where converted PDF files will be stored
$outputDirPDF = 'C:\test\oud\'
#Output path where the TIF files will be saved
$outputDir = 'C:\test\TIFF'
$pdfs = get-childitem $inputDir -recurse | where {$_.Extension -match "pdf"}
foreach($pdf in $pdfs)
{
    $tif = $outputDir + $pdf.BaseName + ".tif"
    if(test-path $tif)
    {
        "tif file already exists " + $tif
    }
    else
    {
        'Processing ' + $pdf.Name
        $param = "-sOutputFile=$tif"
        & $tool -q -dNOPAUSE -sDEVICE=tiffg4 $param -r300 $pdf.FullName -c quit
    }
    Move-Item $pdf $outputDirPDF
}
It's working now; apparently I was missing an "EXIT" at the end of the code. It might not be the most beautiful piece of code, but it seems to do the job, so I'm happy with it.
Below is the piece of code that actually works:
#Path to your Ghostscript EXE
$tool = 'C:\Program Files\gs\gs9.10\bin\gswin64c.exe'
#Directory containing the PDF files that will be converted
$inputDir = 'C:\test\'
#Output path where converted PDF files will be stored
$outputDirPDF = 'C:\test\oud\'
#Output path where the TIF files will be saved
$outputDir = 'C:\test\TIFF\'
$pdfs = get-childitem $inputDir -recurse | where {$_.Extension -match "pdf"}
foreach($pdf in $pdfs)
{
    $tif = $outputDir + $pdf.BaseName + ".tif"
    if(test-path $tif)
    {
        "tif file already exists " + $tif
    }
    else
    {
        'Processing ' + $pdf.Name
        $param = "-sOutputFile=$tif"
        & $tool -q -dNOPAUSE -sDEVICE=tiffg4 $param -r300 $pdf.FullName -c quit
    }
    Move-Item $pdf $outputDirPDF
}
EXIT
It appears to be creating one TIFF file for each PDF file in the source directory. How many PDF files are in the directory (and any sub-directories)? How many pages are in the input PDF file?
I note that you move the original PDF from 'InputDir' to 'OutputDirPDF' when completed, but 'OutputDirPDF' is a child of 'InputDir', so if you recurse child directories when looking for input files you may find files you have already processed. NB: I know nothing about PowerShell, so this may be just fine.
I'd suggest making 'InputDir' and 'OutputDirPDF' siblings at the same level, e.g. "c:\temp\input" and "c:\temp\outputPDF".
That's about all I can say on the information here; you could state what the input PDF filename(s) and output filename(s) are, and what the processing messages say.

Unable to open BCP host data-file

Below is an example of the BCP Statement.
I'm not accustomed to using BCP, so your help and candor are greatly appreciated.
I am using it with a format file as well.
If I execute it from the CMD prompt it works fine, but from SQL I get the error.
The BCP statement is all on one line, and the SQL Server Agent is running as Local System.
The SQL Server and script are on the same system.
I ran exec master..xp_fixeddrives
C,45589
E,423686
I've tried output to C and E with the same result
EXEC xp_cmdshell 'bcp "Select FILENAME, POLICYNUMBER, INSURED_DRAWER_100, POLICY_INFORMATION, DOCUMENTTYPE, DOCUMENTDATE, POLICYYEAR FROM data.dbo.max" queryout "E:\Storage\Export\Data\max.idx" -fmax-c.fmt -SSERVERNAME -T'
Here is the format file rmax-c.fmt
10.0
7
1 SQLCHAR 0 255 "$#Y#$" 1 FILENAME
2 SQLCHAR 0 40 "" 2 POLICYNUMBER
3 SQLCHAR 0 40 "" 3 INSURED_DRAWER_100
4 SQLCHAR 0 40 "" 4 POLICY_INFORMATION
5 SQLCHAR 0 40 "" 5 DOCUMENTTYPE
6 SQLCHAR 0 40 "" 6 DOCUMENTDATE
7 SQLCHAR 0 8 "\r\n" 7 POLICYYEAR
Due to formatting in this post, the last column of the format file is cut off, but it reads SQL_Latin1_General_CP1_CI_AS for each column other than DOCUMENTDATE.
Does the output path exist? BCP does not create the folder before trying to create the file.
Try this before your BCP call:
EXEC xp_cmdshell 'MKDIR "E:\Storage\Export\Data\"'
First, rule out an xp_cmdshell issue by doing a simple 'dir c:*.*';
Check out my blog on using BCP to export files.
I had problems on my system in which I could not find the path to BCP.EXE.
Either change the PATH variable or hard-code the path.
The example below works with AdventureWorks.
-- BCP - Export query, pipe delimited format, trusted security, character format
DECLARE @bcp_cmd4 VARCHAR(1000);
DECLARE @exe_path4 VARCHAR(200) =
    ' cd C:\Program Files\Microsoft SQL Server\100\Tools\Binn\ & ';
SET @bcp_cmd4 = @exe_path4 +
    ' BCP.EXE "SELECT FirstName, LastName FROM AdventureWorks2008R2.Sales.vSalesPerson" queryout ' +
    ' "C:\TEST\PEOPLE.TXT" -T -c -q -t0x7c -r\n';
PRINT @bcp_cmd4;
EXEC master..xp_cmdshell @bcp_cmd4;
GO
Before changing the path to \110\ for SQL Server 2012 and the database name to [AdventureWorks2012], I received an error.
After making the changes, the code works fine from SSMS. The service is running under NT AUTHORITY\Local Service. The SQL Server Agent is disabled. The output file was created.
Please check whether the file might be open in another application or program; if so, bcp.exe cannot overwrite the existing file contents.
In my case, I solved the problem in the following way.
My command was:
bcp "select Top 1000 * from abc.dbo.abcd" queryout FileNameWithDirectory -c -t "|" -r "0x0a" -S 192.111.1.111 -U xx -P xxxxx
My FileNameWithDirectory was too long, like "D:\project-abc\R&D\abc-608\FilesNeeded\FilesNeeded\DataFiles\abc.csv".
I changed it to a simpler path like "D:\abc.csv" and the problem was solved.
So I guess the problem occurred because the file path was too long, so the file could not be opened.
If it works from the command line but not from the SQL Agent, I think it is an authentication issue.
The SQL Server Agent is running under an account. Make sure that account has the ability to read the format file and generate the output file.
Also, make sure the account has permission to execute the xp_cmdshell stored procedure.
Write back with your progress ...
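One quick diagnostic is to ask xp_cmdshell which Windows account it is actually running as, since that is the account that needs the file permissions (a sketch; xp_cmdshell must already be enabled):

EXEC master..xp_cmdshell 'whoami';  -- prints the account the command shell runs under

If the account it prints lacks read access to the format file or write access to the output folder, that matches this error.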
I received this after I shared my output folder, even when there were no files open.
I created a new, unshared folder for output and all was fine.
(might help someone ;-))
In my case, the fix was simply running in administrator mode.
This error can be due to insufficient write permissions to the target folder.
This is a common issue, since the user writing the query might have access to a folder, but the SQL Server Agent or logged-in server account which actually invokes bcp.exe may not.
The destination path has to already exist (except for the file name itself).
Also remove no_output from your command if you use it; that is, instead of
SET @sql = 'BCP ....'
EXEC master..xp_cmdshell @sql, no_output
use
EXEC master..xp_cmdshell @sql
In case anyone else runs into the same problem: I had ...lesPerson" queryout' rather than ...lesPerson" queryout '
If your code is writing the data file, and then reading it with BCP, make sure that you CLOSE THE DATA FILE before trying to read it!
Failure to do so gives: 'Unable to open host data-file'.
Python example:
import csv

class BulkInsertWriter:  # hypothetical wrapper class for the original snippet's methods
    # Management of temporary bulk insert file.
    def openBulkInsertFile(self):
        self.bulkInsertFile = open('c:/tmp/bulkInsertContent.txt', 'w', newline='')
        self.csvWriter = csv.writer(self.bulkInsertFile)

    def closeBulkInsertFile(self):
        # Close the file before bcp reads it, or bcp cannot open it.
        self.bulkInsertFile.close()
When using a Job in SQL, the account the SQL Express service runs under is the currently logged-in user; you should give that user write permission on the folder where the batch writes its output.
This usually happens only with bcp; when using type commands, the operation runs as the computer (Administrator) account and completes without problems.
So if you have a long command in your job, just look at the bcp parts.
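If folder permissions do turn out to be the problem, the grant can even be scripted from T-SQL. A hedged sketch, assuming a default instance whose service account is NT SERVICE\MSSQLSERVER (substitute whichever account the whoami check above reports) and the output folder from the question:

-- Grant the service account modify rights on the output folder
EXEC master..xp_cmdshell 'icacls "E:\Storage\Export\Data" /grant "NT SERVICE\MSSQLSERVER:(OI)(CI)M"';

The (OI)(CI)M flags grant modify rights that are inherited by files and subfolders created later.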

BCP utility - query hangs, txt file is created but nothing happens (no data)

For whatever reason, when I get to the step where I generate a txt file with data from a query using the BCP utility, it hangs on the file creation. Then, if I try to query the database for those tables, it really won't let me.
Does anyone know why this would happen? The query is actually very simple:
SET @cmdQueryout = 'bcp "SELECT X FROM Database.dbo.Details WHERE DetailsId = (SELECT MAX(DetailsId) FROM Database.dbo.Details WHERE CommitDateTime IS NOT NULL AND LEFT(PolicyNumber, 3) != ''NYD'') ORDER BY X, Y, Z" queryout "' + @detailFilePath + '" -c -T'
EXEC master..xp_cmdshell @cmdQueryout
I can see it created the first file, but there's no data in it, and it stops there.
I can open the file, but if I try to delete it, it won't let me because BCP is using the file.
The query should not take more than a few seconds to run, so why would it stop like this?
EDIT - If I run this by itself in another query window, it works.
But if it's in a SQL job and in a transaction, it does not work.
Found out the issue:
I was using a trusted connection when I needed to specify a username and password,
-Uusername -Ppassword
instead of -T.
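For anyone who hits the hang without the authentication mismatch: bcp opens its own connection to the server, so if the calling job has an open transaction holding locks on the queried tables, the bcp query will block until that transaction commits; keeping the export outside the transaction avoids this. A sketch of the corrected command, where MyServer, MyUser, and MyPassword are placeholders:

SET @cmdQueryout = 'bcp "SELECT X FROM Database.dbo.Details WHERE DetailsId = (SELECT MAX(DetailsId) FROM Database.dbo.Details WHERE CommitDateTime IS NOT NULL AND LEFT(PolicyNumber, 3) != ''NYD'') ORDER BY X, Y, Z" queryout "' + @detailFilePath + '" -c -S MyServer -U MyUser -P MyPassword'
EXEC master..xp_cmdshell @cmdQueryout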

Reading PDF into a blob then sending as an attachment

I am trying to read a PDF into a blob object and then do an INSERT into my Oracle database so that it can be sent off as an attachment. The email portion is working, and it adds an attachment, but the attachment is always corrupt and I can't open it. Below is the code where I create my blob PDF; can someone help me figure out why this isn't creating the proper attachment?
ls_pdf_name = ls_pdf_path + "\" + "invnum_" + ls_invoice + ".pdf"
ls_pdf_filename = "invoice_" + ls_invoice + ".pdf"
ls_rc = wf_check_pdf_status(ll_invoice_number, ls_sub_type, ll_user_supp_id)
If ls_rc = "Y" Then
li_fnum = FileOpen(ls_pdf_name, StreamMode!)
li_bytes = FileRead(li_fnum, bPDF)
FileClose(li_fnum)
ll_rc = wf_update_pdf_tables(bPDF, ls_pdf_filename, ls_sub_type, ll_user_supp_id, ll_invoice_number, ls_month, ls_year)
EDIT:
So I took Calvin's advice and switched my insert to the following. Here is the INSERT statement that puts the blob into the table:
INSERT INTO ATTACH_DOCUMENT
(id, filename, mime_type, date_time_created)
VALUES
(ATTACH_DOCUMENT_SEQ.NEXTVAL, :pdf_filename, 'application/pdf', CURRENT_TIMESTAMP);
UPDATEBLOB ATTACH_DOCUMENT
SET data = :pdf
WHERE id = ATTACH_DOCUMENT_SEQ.CURRVAL;
But when I go to open the PDF email attachment from my email, Adobe opens up with this error: "Could not open because it is either not a supported file type or because it has been damaged (for example, it was sent as an email attachment and wasn't decoded correctly)".
Thanks
How big is the PDF file?
You may not be getting all the contents with a simple FileRead() (it reads at most 32,765 bytes per call); try using FileReadEx() instead.
The first thing to do is to check whether the PDF is correctly saved in the database:
Don't remove the PDF file after it is inserted into the db.
Using a SQL interpreter, calculate the size of the blob column you inserted the file into and verify that it matches the file size.
You didn't mention which database you are using; for example, in MS SQL Server you could use the datalength() function to do this. Depending on the database, you may also check whether the PDF is corrupted by calculating its MD5 hash.
If you use a capable query tool (e.g. TOAD for Oracle) you could save the blob back out as a PDF file and verify that it is readable.
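Since the question is using Oracle, a minimal version of that size check (assuming the ATTACH_DOCUMENT table and columns from the INSERT above) would be:

-- Compare the stored blob size against the PDF's size on disk
SELECT id, filename, dbms_lob.getlength(data) AS blob_bytes
FROM ATTACH_DOCUMENT
ORDER BY id DESC;

If blob_bytes is smaller than the file on disk (for instance, capped near 32,765 bytes), the blob was truncated before it ever reached the mail code, which points back at the FileRead() call.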