I am using cfprint from ColdFusion to print multiple PDFs from a directory. The problem I am having is that when the files are spooled to the printer, the file size increases dramatically and slows everything down. The file in the folder is 125 KB, but in the printer spool it grows to 15.7 MB. Here is the ColdFusion code:
<cfprint
source="[FILELOCATION]/[FILE].pdf"
color="yes"
printer="[printer name]">
The files will eventually print, but it can take upwards of 15-20 minutes. Does anyone have a solution for this issue? I have tried with both CF-generated PDFs and ones that I have created from scratch. Thanks
Queue up two to five at a time, pause to allow processing, mark them as printed, and move or delete them before moving on to the next batch. Time this yourself to see how much delay you need to allow. That way you don't pile up a bunch of work and create a bottleneck on your CF server.
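Here is a rough sketch of that batching idea; the folder names and batch size are hypothetical, and you would run this page as a scheduled task every few minutes:

<cfdirectory action="list" directory="C:\pdf\outbox" filter="*.pdf" name="qPdfs">
<cfif qPdfs.recordCount GT 0>
    <cfloop query="qPdfs" endRow="#min(5, qPdfs.recordCount)#">
        <cfprint source="C:\pdf\outbox\#qPdfs.name#" color="yes" printer="[printer name]">
        <!--- mark the file as printed by moving it out of the outbox;
              you may want a pause here if cfprint queues jobs asynchronously --->
        <cffile action="move" source="C:\pdf\outbox\#qPdfs.name#" destination="C:\pdf\printed\#qPdfs.name#">
    </cfloop>
</cfif>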
If you are doing all of this on one server, consider adding a secondary low-priority server running a fully paid-for, EULA-compliant, registered edition of ColdFusion (or Railo), and dedicate that server to printing so your main server can do useful things.
Edit
So the OP has a ColdFusion print bottleneck. On the server that does the printing (same as your CF server, I assume?), and IF it is a Windows server (I'm not sure of your server version), there is a print queue folder. Provided you have access to this folder, you can do a few things. You can FTP your files to this folder (or copy them, if it is the same server); the printer will queue up each job and off it goes. You can also do things like check the print queue folder's file count: if the count is greater than zero, check back in 15 minutes; if it is zero, copy over a few more files.
You can create a scheduled task in your CF Administrator and automate all of this. There is a getPrinterInfo() function, so you can check whether the printer is offline and do other things, like falling back to another printer somewhere else if you need to reroute print jobs. You can also set up several print servers with printers attached, hit each one in turn, and check its print queue folder.
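For example, a hedged sketch of that printer check (the struct keys vary by driver, so dump the result first to see what you have to work with):

<cfset printerInfo = getPrinterInfo("[printer name]")>
<cfdump var="#printerInfo#">
<!--- if the dump shows the printer is offline or not accepting jobs,
      fall back to a backup printer before queuing more work --->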
The magic is endless; the goal is to offload work to something other than your ColdFusion server.
So to recap:
Separate concerns by not using cfprint.
Create escape routes to other printers if you can.
If you must use ColdFusion, then set up a dedicated ColdFusion server for the print management work.
Use getPrinterInfo() and dump out the results to see what you can use, trap, etc.
Ben Forta has a tool that can check for several printers; consider incorporating it.
Next, use cfftp (or cffile if you are on the same server), provided you have access, and copy files straight into the print queue folders, doing no cfprint at all (sketch below).
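A minimal sketch of that queue-folder check; the paths and the nextFile variable are hypothetical, and you would use cfftp instead of cffile if the spool folder lives on another server:

<cfdirectory action="list" directory="C:\PrintSpool" name="qQueue">
<cfif qQueue.recordCount EQ 0>
    <!--- queue is empty: copy over the next few files --->
    <cffile action="copy" source="C:\pdf\outbox\#nextFile#" destination="C:\PrintSpool\#nextFile#">
<cfelse>
    <!--- jobs still pending: let the scheduled task check again in 15 minutes --->
</cfif>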
Here is a link on print spooling (another link in that doc shows how you can change the spool location).
When it is over, you will be the ColdFusion print master, with escape routes and checks and everything.
Related
We are having a strange issue with the Directory.GetFiles method when searching for a Word document on a UNC folder share (NTFS disk) on a Win2008R2 VM server. The share contains over 10K files in the parent folder and 75K files in a subdirectory.
It all worked fine on Win2003 Server. After migrating to Win2008R2, the WinForms application freezes on this method, taking almost 13 minutes to open a single file from a client machine connected to the file share over a VPN with 1 Mbps download bandwidth (not throughput).
After some search and research, we realized the Windows Search service was not turned on. Once the service was started and the share was indexed, we saw a performance improvement: the time taken to open a file using the GetFiles method came down from 13 minutes to 3 minutes.
But this is not consistent. During the day, when bandwidth drops well below 1 Mbps (say 0.5 Mbps), opening the document again takes 8-12 minutes.
At this point we are not sure which one is causing the problem.
Solutions that are not possible for us:
1) Creating multiple directories and organizing files.
2) Increasing bandwidth.
3) Using a direct file path instead of Directory.GetFiles/EnumerateFiles
Any help is highly appreciated. Thanks!
Oh yeah, good stuff. You will notice that even if the service is off, running it twice (within a short time of each other) will run much faster the second time. Actually, here is a good one for you: run it twice, letting the first one run for a minute. The second one will catch up to the first almost immediately, and then they will both be at the same spot for the rest of the run (if what I said makes sense).
Here is what is happening: GetFiles() and GetDirectories() do use the indexing service. If your indexing service is off, that just means Windows will not automatically gather data about the files; but when you access them (Windows Explorer / GetFiles), it will index them, so that if you ask for the same thing within a set amount of time, it won't have to query the hard drive's table of contents again.
As for it running faster and slower when the indexing service is on: Windows knows it cannot keep track of every file on the computer, so after a set amount of time a file is considered stale, and when you next ask about it the indexing service makes an I/O call to refresh the index database.
This wiki talks about it a little, though it is not very thorough.
I have a page with SSTV (slow-scan TV) JPG images, 12 of them in a table. They change as the ham operators send the SSTV. My question: how can I make the images, named 1.jpg - 12.jpg, downloadable in a zip file via a download link? The zipping would need to occur server-side when the download link is clicked. Is that possible? Or, if zipping up all 12 isn't possible, how can I add a download link under each individual image?
Thanks for any help. I have tried using .htaccess to make the images themselves downloadable and could not get it to work; it broke the whole page when I did, I think because I already use .htaccess to password-protect the site for a group of club members.
Zipping the files up every time someone clicks the download link would probably be costly in terms of server-side processing. If you still want to do it, I would suggest writing a dynamic script in something such as PHP to handle it.
However, what I would suggest is writing a short cron script (Windows Scheduled Task if you're on a Windows box) to periodically zip up the files to a predetermined filename and location. For example, a simple cron entry might look like:
*/15 * * * * zip -j /path/to/downloadfile.zip /path/to/zip/images/*.jpg >/dev/null
Use crontab -e to edit your crontab (or sudo crontab -e to edit root's crontab). Then put a link to downloadfile.zip on the page to let people download it. Yes, it's generated every 15 minutes (feel free to tweak the timing) rather than on demand, but that generally gives you more consistent server performance.
If you absolutely must have it generated on demand, look into, for example, PHP's Zip library of functions to do the actual compression, and something like tempnam to let you save the file to a guaranteed-unique temporary filename and serve it to your client.
Remote clients will upload images (and perhaps some instructional files in specially formatted text) to a "drop folder." Once the upload is complete, we need to begin processing these images. It would be an easy, but flawed, solution to just have a script automatically begin processing any files in the folder every few seconds (the files can be moved out of the folder once processed); but problems would arise when attempting to process large images that are only partially transferred.
What are some tricks I can use to ensure the files are fully uploaded before processing them?
A few of my own thoughts:
The script could check the validity of the file; i.e., a partial JPEG would produce an error, and the script could respond to that error, though this would be fairly CPU-intensive. Some formats have special markers at the end of the file, but I can't count on that; I'm not sure what formats I'll be dealing with.
I've heard of "file handles" but haven't really figured out what they are or how to tell whether there is a handle on a particular file. The idea is that the FTP daemon (actually, I'm on Windows, so "service") would keep a handle on the file while it's being uploaded, and you would know not to process that file. These are just a few of my thoughts, but I'm not sure whether they will work or whether there are better or more accepted ways of solving this problem.
If you have a server-side script upload system (PHP, ASP, JSP, whatever), you could have the script call another script to process the files, or have it create a flag file indicating the upload is done, something like the sketch below.
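For instance, a minimal sketch in CFML (any server-side language works the same way; the form field and folder names are hypothetical):

<cffile action="upload" fileField="image" destination="C:\dropfolder" nameConflict="makeUnique">
<!--- the upload has fully completed by this point, so write a flag file next to it --->
<cffile action="write" file="C:\dropfolder\#cffile.serverFile#.done" output="">

The processing script then only picks up files that have a matching .done flag.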
If your server is Linux-based, you can use lsof to check whether the file is open. Since your FTP daemon/script/CGI will close the file once the upload completes, lsof will not list the file afterwards.
If your server is Windows-based, you can use Process Explorer to list the open files.
By what method are your users uploading the images?
Every once in a while I am fed a large data file that my client uploads and that needs to be processed through CFML. The problem is that if I put the processing on a CF page, it runs into a timeout after 120 seconds. I was able to move the processing code to a CFC, where it seems not to have the timeout issue. However, at some point during the processing it causes ColdFusion to crash, and CF has to be restarted. There are a number of database queries (5 or more, a mixture of updates and selects) required for each of the 8,000+ lines of the file, as well as other logic I provide in CFML.
My question is: what would be the best way to go through this file? One caveat: I am not able to move the file to the database server and process it entirely within the DB. That said, would it be more efficient to pass each line to a stored procedure that took care of everything? It would still be a lot of calls to the database, but nothing compared to what I have now. Also, what would be the best way to provide feedback to the user about how much of the file has been processed?
Edit:
I'm running CF 6.1
I just did a similar thing, and I use CF often for data parsing.
1) Maintain a file upload table (parent table). For every file you upload, keep a row recording the file and what status it is in (uploaded, processed, unprocessed).
2) Temp table to store all the rows of the data file (child table). Import the entire data file into a temporary table; attempting to do it all in memory will inevitably lead to errors. Each row in this table links back to a file upload table entry above.
3) Maintain a processing status. For each row of the data file you bring in, set a processed/unprocessed flag. This way, if it breaks, you can start from where you left off. As you run through each line, mark it "processed".
4) Transaction. Use cftransaction if possible to commit all of it at once, or at least one line at a time (wrapping your 5 queries). That way, if something goes boom, you don't have a row of data that is half computed/processed/updated/tested (see the sketch after this list).
5) Once you're done processing, set the file's entry in the table from step 1 to "processed".
By using the approach above, if something fails you can restart where you left off, or at least have a clearer idea of where to start investigating, or, worst case, a cleaner path for repairing your data. You will also have a clear way of displaying to the user the status of the current upload: where it's at, and where it left off if there was an error.
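Here is a minimal sketch of steps 2 through 4; the table, column, and datasource names are all hypothetical:

<cfquery name="qRows" datasource="myDSN">
    SELECT rowID, rawLine
    FROM upload_rows
    WHERE fileID = <cfqueryparam value="#fileID#" cfsqltype="cf_sql_integer">
      AND status = 'unprocessed'
</cfquery>
<cfloop query="qRows">
    <cftransaction>
        <!--- your 5 or so selects/updates for this line go here --->
        <cfquery datasource="myDSN">
            UPDATE upload_rows
            SET status = 'processed'
            WHERE rowID = <cfqueryparam value="#qRows.rowID#" cfsqltype="cf_sql_integer">
        </cfquery>
    </cftransaction>
</cfloop>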
If you have any questions, let me know.
Other thoughts:
You can increase timeouts, give the VM more memory, and move to 64-bit, but all of those only increase the capacity of your system so much. It's a good idea to apply these per call (see the snippet below) and in conjunction with the above.
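For example, to raise the timeout for just the request doing the import, rather than server-wide:

<!--- applies only to the current request; value is in seconds --->
<cfsetting requestTimeout="3600">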
Java has some neat file-processing libraries that are available as CFCs. If you run into a lot of issues with speed, you can use one of those to read the file into a variable and then into the database.
If you are playing with XML, do not use ColdFusion's XML parsing. It works well for smaller files but has fits when things get bigger. There are several CFCs out there (check RIAForge, etc.) that wrap some excellent Java libraries for parsing XML data. You can then build a cfquery manually with that data if need be.
It's hard to tell without more info, but from what you have said, I'll throw out three ideas.
The first is that with so many database operations, you may be generating too much debugging output. Make sure that under the Debug Output settings in the Administrator, the following settings are turned off:
Enable Robust Exception Information
Enable AJAX Debug Log Window
Request Debugging Output
The second thing I would do is look at those DB queries and make sure they are optimized. Make sure the selects are using indexes, etc.
The third thing I would suspect is that keeping the whole file in memory is probably suboptimal.
I would try looping through the file using file looping:
<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
<!--- Code to go here --->
</cfloop>
Have you tried an event gateway? I believe those threads are not subject to the same timeout settings as page request threads.
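A hedged sketch of what that looks like: ColdFusion's DirectoryWatcher event gateway calls a CFC method such as onAdd when a file appears, outside the page-request timeout (verify the event structure on your CF version):

<cfcomponent>
    <cffunction name="onAdd" returntype="void">
        <cfargument name="CFEvent" type="struct" required="true">
        <!--- CFEvent.data.filename holds the new file's path; kick off processing here --->
    </cffunction>
</cfcomponent>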
SQL Server Integration Services (SSIS) is the recommended tool for complex ETL (Extract, Transform, and Load) work, which is what this sounds like. (It can be configured to access files on other servers.) The question might be: can you work up an interface between ColdFusion and SSIS?
If you can, upgrade to CF8 and take advantage of cfloop file="", which would give you greater speed; the file would not be held in memory (which is probably the cause of the crashing).
Depending on the situation you are encountering you could also use cfthread to speed up processing.
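For example, a rough cfthread sketch (CF8+) that splits the work in two; how you partition the 8,000 lines is up to you:

<cfthread name="firstHalf" action="run">
    <!--- process lines 1-4000 here --->
</cfthread>
<cfthread name="secondHalf" action="run">
    <!--- process lines 4001-8000 here --->
</cfthread>
<!--- wait for both threads before reporting status; timeout="0" waits indefinitely --->
<cfthread action="join" name="firstHalf,secondHalf" timeout="0"/>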
Currently, an event gateway is the only way to get around the timeout limits of an HTTP request cycle. CF does not have a way to process CF pages offline; that is, there is no command-line invocation (one of my biggest gripes about CF: very little offline processing).
Your best bet is to use an Event Gateway or rewrite your parsing logic in straight Java.
I had to do the same thing. Ben Nadel has written a bunch of great articles on using Java file I/O, which lets you read and write files much more speedily.
It really helped improve the performance of our CSV importing application.
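For anyone stuck pre-CF8 (like the OP on 6.1), here is a sketch of that java.io approach; the file path is hypothetical, and the assigning-null-undefines-the-variable trick is worth verifying on your CF version:

<cfscript>
reader = createObject("java", "java.io.BufferedReader").init(
    createObject("java", "java.io.FileReader").init("C:\data\import.csv"));
line = reader.readLine();
while (structKeyExists(variables, "line")) {
    // process one CSV line here without loading the whole file into memory
    line = reader.readLine(); // returns Java null at EOF, which undefines the variable
}
reader.close();
</cfscript>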
I have thousands of small CSV files I want to aggregate (with a little munging in-script first). They are on a NAS device, a "SNAP" Server to be more exact. I've had some success with VBA from Excel, doing about 700 files in about a minute, if I recall (this was a month ago). Actually, it was a half-success: the snap server is home to 80% PDFs and some proprietary-format files, and only 20% CSVs. The loop to test for file type pushed the execution time north of 2 hours, and the script apparently completely ignored the date filtering I put in. The quick result, or 'success', was on 700 copies of the CSVs that I made and put on my C drive. I've been doing VBA scripting for almost 20 years, and I think I'm decent at it; I've done a lot of CSV reading and writing from VBA over the last 9 years. So my question is more about your experience with snap servers, or NAS generally.
Can I not treat the snap server more or less like any drive/folder with VBA?
Would VBScript be more appropriate? (already using FileSystemObject, after all)
If I can use VBS, can I store the script on the NAS and run it using Task Scheduler?
I'd appreciate any tips or gotchas from you folks who have experience with snap servers!
Some thoughts on the choice of language:
VBScript is more lightweight than VBA in that it does not require MS Office to be installed. The syntax is similar, so there is no real productivity difference.
Moving forward, PowerShell would be strongly recommended for Windows system admin tasks, general text file processing, etc.
Some thoughts on using the NAS server:
a) If you are running your script on a workstation, you should be able to use a UNC path (\\myserver\myshare) to connect to a share on the NAS. If not, you may need to map a drive letter to that share before your script runs.
b) If you want to run your script on the NAS itself, there are two things to consider: whether the NAS OS is locked down so that you cannot add your own scheduled task, and whether it runs Linux or some flavor of Windows. Many NAS products use embedded Linux, so running a VBA or VBScript solution directly on the NAS may not work unless it is based on something like Embedded XP and you have access to Scheduled Tasks, etc.
Hope this helps...