I have a SQL Server database that is storing the contents of files in a table. Specifically, there are 2 fields:
Contents: varbinary(max) field that always starts with '0x1F.....'
FileType: varchar(5) field that has the type of file, such as PDF, docx, etc.
How can I convert the contents back into a file? I am trying to use ColdFusion, if that is possible, to convert it. If not, what are the steps to convert the binary into a file?
I tried the following (assuming a docx file type) but it didn't produce a valid Word file:
<cfset DecodedValue = BinaryDecode(contents,"hex")>
<cffile action="WRITE" output="#DecodedValue#" file="C:\decodedfile.docx">
Thanks to user Ageax: the first four bytes, 31, -117, 8, 0, show that the content is actually stored in GZIP format.
I first save the content as a GZIP file and then extract it. My code is as follows:
<cfquery name="getfile" datasource="tempdb">
select content from table
</cfquery>
<cfset FileWrite("C:\mygzipfile.gzip", getfile.content)>
To extract the GZIP file with ColdFusion, I used the solution at: http://coldfusion-tip.blogspot.com/2012/04/unzip-gz-file-in-coldfusion.html
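The extraction step can also be done with the JDK's java.util.zip.GZIPInputStream, which ColdFusion can reach through createObject("java", ...). This is a minimal Java sketch, not the linked blog's method, assuming the whole column value fits in memory; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBlob {
    // True if the data starts with the GZIP magic number 1F 8B (31, -117 as signed bytes).
    static boolean isGzip(byte[] data) {
        return data.length >= 2 && (data[0] & 0xFF) == 0x1F && (data[1] & 0xFF) == 0x8B;
    }

    // Decompresses an in-memory GZIP payload, e.g. the varbinary column value.
    static byte[] gunzip(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compresses bytes; only used here to build sample data.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // stream closed above, so the GZIP trailer is included
    }

    // Writes the decompressed bytes straight to disk; no hex or Base64 decoding is involved.
    static void extractToFile(byte[] compressed, Path target) {
        try {
            Files.write(target, gunzip(compressed));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Checking isGzip() first lets the same code path handle rows that were stored uncompressed.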
tl;dr
The data is already binary, so ditch the binaryX() functions and save the content directly to a file. Read the first few bytes of the binary to verify the file type. In this case, it turns out the document was actually stored in GZIP format, not raw DOCX.
Don't be misled by how SSMS chooses to display it. SSMS displays binary in a user-friendly hex format, but it's still stored as binary. Just write the binary directly to the file, without any BinaryX functions.
<cfset FileWrite("C:\decodedfile.docx", contents)>
Also, check your DSN settings and ensure the "BLOB - Enable binary large object retrieval (BLOB)" setting is enabled, so binary values aren't truncated at 64K (default buffer size).
Update 1:
The FileWrite() code above works correctly IF the "contents" column contains the binary of a valid .docx file. Perhaps the data is being stored differently than we're thinking? Run a query to retrieve the binary of a single document and output the first four bytes. What is the result? Typically, the first four bytes of a .docx file are 80, 75, 3, 4 ("PK" followed by 03 04, the ZIP signature, since .docx files are ZIP containers).
<!--- print size and first 4 bytes --->
<cfoutput>
size in bytes = #arrayLen(qYourQuery.contents)#<br>
<cfloop from="1" to="4" index="x">
byte #x# = #qYourQuery.contents[1][x]#<br>
</cfloop>
</cfoutput>
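A negative value such as -117 in that output is expected: ColdFusion binary values are Java byte arrays, and Java bytes are signed (-128 to 127). A small Java sketch (hypothetical class name) that converts the leading bytes to unsigned hex, so they can be compared with published file signatures:

```java
public class ByteDump {
    // Masking with 0xFF turns a signed byte into its unsigned value,
    // e.g. -117 becomes 139, which is 0x8B.
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        int limit = Math.min(count, data.length);
        for (int i = 0; i < limit; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }
}
```

For example, hexPrefix(new byte[]{31, -117, 8, 0}, 4) returns "1F 8B 08 00", while the .docx bytes 80, 75, 3, 4 give "50 4B 03 04".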
Update 2:
1F 8B 08 matches GZIP: 1F 8B is the GZIP magic number, and 08 indicates deflate compression. Try using probeContentType() on the saved file. What does it report?
<cfscript>
paths = createObject("java", "java.nio.file.Paths");
files = createObject("java", "java.nio.file.Files");
input = paths.get("c:/yourFileName.docx", []);
writeDump(files.probeContentType(input));
</cfscript>
<cfif isPDFFile("book.pdf")>
Not corrupted!<br/>
<cfelse>
Corrupted pdf file!
</cfif>
I'm new to ColdFusion. Can anybody help me check for, and download, corrupted PDF files using ColdFusion?
If book.pdf is corrupted, isPDFFile() returns false (i.e., the function reports that book.pdf is not a PDF file). So can we use this to check whether a PDF file is corrupted?
Is this the right way to do it? If not, what's the right way, and how do I download those corrupted PDF files?
ColdFusion's isPDFFile function already returns false if the file is invalid or corrupt. But you may want to distinguish between the possible causes of that return value:
<cfset pdfFileLocation = "book.pdf">
<cfif (not isSimpleValue(pdfFileLocation)) or (not len(pdfFileLocation))>
<cfoutput>File's location is invalid.</cfoutput>
<cfelseif not fileExists(pdfFileLocation)>
<cfoutput>File not found on location #htmlEditFormat(pdfFileLocation)#.</cfoutput>
<cfelseif not isPDFfile(pdfFileLocation)>
<cfoutput>File is either not a PDF document or its content is damaged.</cfoutput>
<cfelse>
<cfoutput>File is a valid PDF document.</cfoutput>
</cfif>
What do you mean by "download"? In your example, you already have the file book.pdf in the current directory (relative path). If you want to repair the document, use ColdFusion's fileReadBinary function to inspect the binary data. Repairing a PDF isn't exactly child's play, though.
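When inspecting the binary, a cheap first check is whether it starts with the "%PDF-" header and still contains the "%%EOF" marker near the end; a missing marker often points to a truncated file. This is a Java sketch of that heuristic (the same checks could be written in CFML against the fileReadBinary result); the class name and the 1024-byte tail window are assumptions, and passing the check does not prove the PDF's internal structure is intact:

```java
import java.nio.charset.StandardCharsets;

public class PdfSniff {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // A PDF normally begins with "%PDF-" followed by the version, e.g. "%PDF-1.7".
    static boolean hasPdfHeader(byte[] data) {
        if (data.length < HEADER.length) {
            return false;
        }
        for (int i = 0; i < HEADER.length; i++) {
            if (data[i] != HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    // A complete PDF normally has an "%%EOF" marker near the end of the file.
    static boolean hasEofMarker(byte[] data, int tailBytes) {
        int start = Math.max(0, data.length - tailBytes);
        // ISO-8859-1 maps each byte to exactly one char, so nothing is lost here.
        String tail = new String(data, start, data.length - start, StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    // Heuristic only: header present and end marker found in the last 1024 bytes.
    static boolean looksLikeCompletePdf(byte[] data) {
        return hasPdfHeader(data) && hasEofMarker(data, 1024);
    }
}
```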
I'm trying to generate an Excel file from Oracle using UTL_FILE. The Oracle documentation lists several modes of operation:
W - Write
R - Read
WB - Write Byte
RB - Read Byte
I don't understand the difference between W and WB. Thanks in advance.
The documentation you're referring to seems to be the UTL_FILE.FOPEN page, which says slightly more than you indicated in the question:
Specifies how the file is opened. Modes include:
r -- read text
w -- write text
a -- append text
rb -- read byte mode
wb -- write byte mode
ab -- append byte mode
The documentation also says:
byte_mode: Indicates whether the file was open as a binary file, or as a text file
So the b indicates byte mode rather than text mode. In text mode the file is accessed as a character stream, so the file should be encoded in the database character set, as the operational notes for that package say. In byte mode it's accessed as a binary stream. Several procedures, such as get_line, will raise an exception for a file opened in byte mode, as a 'line' has no meaning for binary data.
So if you're processing a file that is text, which could be stored as a CLOB, use the text-mode flags. If you're processing a file that contains binary data like an image or PDF, which could be stored as a BLOB, use the byte-mode flags.
Excel files contain binary data whether you have a .xls or .xlsx file, so you'd need to use byte mode. If you were generating a .csv file though, you'd probably want text-mode.
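Why text mode is unsafe for binary content can be illustrated outside Oracle. This Java sketch is an analogy, not UTL_FILE itself: it decodes bytes as UTF-8 text and re-encodes them, roughly what happens when binary data goes through a character-set conversion. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextModeDemo {
    // Treats raw bytes as UTF-8 text and back, as a text-mode channel might.
    // Malformed sequences are replaced with U+FFFD, so arbitrary binary data is altered.
    static byte[] roundTripAsText(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // True only if the bytes come back unchanged.
    static boolean survivesTextMode(byte[] raw) {
        return Arrays.equals(raw, roundTripAsText(raw));
    }
}
```

For example, the GZIP header bytes 1F 8B 08 00 come back as six bytes, 1F EF BF BD 08 00, because 0x8B on its own is not valid UTF-8 and is replaced by the three-byte replacement character. Plain ASCII text such as a CSV line round-trips unchanged.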
I have multiple pdfs stored in my database as blobs and I need to merge them to create a single pdf that needs to be streamed to the user.
I understand that it is fairly easy to do this if I am rendering a single pdf from blob, but I cannot figure how to merge multiple blobs.
<cfheader name="Content-Disposition" value="inline; filename=#document.name#.#document.ext#">
<cfcontent type="application/pdf" variable="#document.content#">
I see that CFPDF helps with this functionality, but I can't seem to get my blob into a cfpdf variable. A similar question has been asked here before, but it doesn't have the answer I seek.
Thanks!
Try converting the blob data to a file (in-memory using ram:// if possible to save writing to disk) and then use that as the cfpdf merge source. You can do it for each blob as you loop over your query within the cfpdf action="merge" tag:
<cfquery name="q" datasource="test">
SELECT content FROM pdfs
</cfquery>
<cfpdf action="merge" name="mergedPdf">
<cfloop query="q">
<cfset tempBinary=q.content><!---intermediate var seems to be necessary in some environments --->
<cffile action="write" output="#tempBinary#" file="ram://temp.pdf">
<cfpdfparam source="ram://temp.pdf">
</cfloop>
</cfpdf>
<cfcontent type="application/pdf" variable="#ToBinary( mergedPdf )#" reset="true">
Note you can use a single temp file - no need to create a different one for each blob in the query.
I think you can do the following:
Retrieve each PDF blob stored in the database, write it out as a PDF using cfpdf, and store it in a temp directory:
<cfpdf action="write" destination="c:\pdfs\1.pdf" source="#mypdfblob1#">
Repeat this for every blob, so that all of them are stored as PDFs in the temp directory.
Merge all those PDFs with cfpdf action="merge", specifying the temp directory:
<cfpdf action="merge" directory="C:\pdfs" destination="C:\result.pdf">
I ran into this problem when uploading a file with a super long name - my database field was only set to 50 characters. Since then, I have increased my database field length, but I'd like to have a way to check the length of the filename before uploading. Below is my code. The validation returns '85' as the character length. And it returns the same count for every different file I upload (none of which have a file name length of 85).
<cfscript>
missing_info = "<p>There was a slight problem with your submission. The following are required or invalid:</p><ul>";
// Check the length of the file name for our database field
if ( len(Form["ResumeFile1"]) gt 100 )
{
missing_info = missing_info & "<li>'Resume File 1' is invalid. Character length must be less than 100. Current count is " & len(Form["ResumeFile1"]) & ".</li>";
validation_error = true;
ResumeFileInvalidMarker = true;
}
</cfscript>
Anyone see anything wrong with this?
Thanks!
http://www.cfquickdocs.com/cf9/#cffile.upload
After you upload the file, the variable "clientFileName" will give you the name of the uploaded file, without a file extension.
The only way to read the filename before you upload it would be to use JavaScript to read and parse the value (file path) in the file field.
A quick clarification on the wording of your question: by the time your code executes, the file upload has already happened. The file resides in a temporary directory on the ColdFusion server, and the form field related to the file upload contains the temporary filename for that file, which explains why len() returns the same count for every file you upload. Aside from checking whether a file has been specified, don't do anything directly with that file, or you'll be circumventing some built-in security.
You want to use the cffile tag with the upload action (or an equivalent UDF) to move the temp file into a folder of your choosing. At that point you get access to a structure containing lots of information. Usually I "upload" into a temporary directory for the application, which should be outside the webroot for security.
At this point you'll want to do any validation against the file, such as filename length, file type, file size, etc., and delete the file if it fails any check. If it passes all checks, move it into its final destination, which may be inside the webroot.
In your case you'll want to check the cffile structure element clientFile which is the original filename including extension (which you'll need to check, since an extension doesn't need to be present and can be any length).
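Once cffile has populated clientFile, the checks themselves are plain string work. A Java sketch of the rules described above, assuming hypothetical limits passed in by the caller (for example, 100 characters for the database column and 5 for the extension); the path stripping is purely defensive, in case a full client path ever arrives:

```java
import java.util.ArrayList;
import java.util.List;

public class ClientFileCheck {
    // Returns human-readable problems; an empty list means the name passed.
    static List<String> problems(String clientFile, int maxLength, int maxExtLength) {
        List<String> errors = new ArrayList<>();
        if (clientFile == null || clientFile.isEmpty()) {
            errors.add("no file name");
            return errors;
        }
        // Keep only the last path segment, whichever separator was used.
        int slash = Math.max(clientFile.lastIndexOf('/'), clientFile.lastIndexOf('\\'));
        String name = clientFile.substring(slash + 1);
        if (name.length() > maxLength) {
            errors.add("name longer than " + maxLength + " characters");
        }
        // The extension may be missing entirely or be arbitrarily long.
        int dot = name.lastIndexOf('.');
        if (dot <= 0 || dot == name.length() - 1) {
            errors.add("missing extension");
        } else if (name.length() - dot - 1 > maxExtLength) {
            errors.add("extension longer than " + maxExtLength + " characters");
        }
        return errors;
    }
}
```

Collecting every problem instead of stopping at the first one mirrors the question's approach of building up a single missing_info message.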
I have a form which allows a user to upload a file to the server. How can I validate that the uploaded file is in fact the expected format (CSV, or at least validate that it is a text file) in ColdFusion 8?
For simple formats like CSV, just check yourself, for example via regex.
<cffile action="read" file="#uploadedFile#" variable="contents" charset="UTF-8">
<cfset LooksLikeCSV = REFind("^([^;]*;)+[^;]*$", contents)>
You can place additional checks with regard to file size limits or forbidden characters.
For other file formats, you can check for header signatures that occur in the first few bytes of the file.
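For formats that have one, the signature check boils down to comparing the leading bytes with known values. A Java sketch with a handful of well-known signatures (the class name and the chosen formats are illustrative); note that .docx and .xlsx are ZIP containers and share the ZIP signature, and plain-text formats like CSV have no signature at all:

```java
public class FileSignature {
    // Returns a short label for a few well-known leading-byte signatures.
    static String detect(byte[] data) {
        if (startsWith(data, 0x25, 0x50, 0x44, 0x46)) return "pdf";  // "%PDF"
        if (startsWith(data, 0x50, 0x4B, 0x03, 0x04)) return "zip";  // "PK\3\4": .zip, .docx, .xlsx
        if (startsWith(data, 0x1F, 0x8B)) return "gzip";
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "png";  // 0x89 "PNG"
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "jpeg";
        return "unknown";
    }

    // Compares unsigned byte values, since Java bytes are signed.
    private static boolean startsWith(byte[] data, int... sig) {
        if (data.length < sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if ((data[i] & 0xFF) != sig[i]) {
                return false;
            }
        }
        return true;
    }
}
```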
You could even write a full parser for your expected file format. For CSV validation, you could do a ListToArray() at CR/LF and check each line individually against a regex. XML should be fairly straightforward as well: just try to pass it to XmlParse(). Binary formats like images are a little more difficult, but libraries exist for those as well.
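The line-by-line idea can be sketched in Java as follows. It is a heuristic under stated assumptions, not a CSV parser: every line must have at least two fields and the same field count as the first line, control characters other than tab are rejected as binary debris, and quoted fields containing the delimiter or line breaks are not handled. The class name is hypothetical:

```java
import java.util.regex.Pattern;

public class CsvCheck {
    static boolean looksLikeCsv(String content, char delimiter) {
        if (content.isEmpty()) {
            return false;
        }
        // Split on CRLF, CR or LF; trailing empty lines are dropped by split().
        String[] lines = content.split("\r\n|\r|\n");
        if (lines.length == 0) {
            return false;
        }
        String sep = Pattern.quote(String.valueOf(delimiter));
        int expected = -1;
        for (String line : lines) {
            if (line.isEmpty()) {
                return false; // blank line in the middle of the data
            }
            if (line.chars().anyMatch(c -> c < 0x20 && c != '\t')) {
                return false; // control characters suggest binary content
            }
            int fields = line.split(sep, -1).length; // -1 keeps empty trailing fields
            if (fields < 2) {
                return false;
            }
            if (expected == -1) {
                expected = fields;
            } else if (fields != expected) {
                return false;
            }
        }
        return true;
    }
}
```

The delimiter is a parameter because the regex in the answer assumes semicolons, while many CSV files use commas.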
I don't know if it will help you, but Ben Nadel wrote some excellent posts about CSV:
http://www.bennadel.com/blog/483-Parsing-CSV-Data-Using-ColdFusion.htm
http://www.bennadel.com/blog/976-Regular-Expressions-Make-CSV-Parsing-In-ColdFusion-So-Much-Easier-And-Faster-.htm
http://www.bennadel.com/blog/501-Parsing-CSV-Values-In-To-A-ColdFusion-Query.htm
I think it's as simple as specifying the accept value in cffile. Unfortunately, the CF8 docs don't describe the value as part of the cffile reference; it's under file management:
<cffile action="upload" filefield="filename" destination="#destination#" accept="text/csv">
CF8 » Controlling the type of file uploaded