Exporting hundreds of thousands of records with ColdFusion - sql

Using ColdFusion 9.0.1, I need to export hundreds of thousands of database records to Excel XLSX or CSV (XLSX is preferred). This must be done on demand. So far I've tried using cfspreadsheet, but it chokes when exporting more than a couple of thousand rows in the XLSX format. Exporting to XLS works fine, though of course that format has a 65,536-row limit.
What are my options to export so many records? Theoretically the users could need to export as many as one million records. I'm also using SQL Server 2008 R2 -- is there a way to somehow export the records to a file there and then send the file through CF to the user? What options do I have? Thanks.

Since you are using SQL Server 2008, you could take advantage of SQL Server Reporting Services (SSRS) and create a report that can be called via web service (or HTTP GET/POST) by ColdFusion. SSRS has the capability to export reports as Excel as well. You'll need to read up on SSRS to make this work, but it's fairly easy to do.
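For instance, SSRS reports can be rendered straight from a URL and streamed back to the user. Here's a minimal sketch, assuming SSRS URL access is enabled and using placeholder server, folder, report, parameter, and credential names:
<!--- Placeholder report server, folder, report, and parameter names --->
<cfset reportUrl = "http://yourReportServer/ReportServer?/Exports/RecordExport"
    & "&rs:Command=Render&rs:Format=CSV&StartDate=2011-01-01">
<!--- Pull the rendered report; how you authenticate depends on your SSRS security setup --->
<cfhttp url="#reportUrl#" method="get" getAsBinary="yes" username="reportUser" password="reportPass" result="ssrsResult">
<!--- Stream it back to the browser as a download --->
<cfheader name="Content-Disposition" value="attachment; filename=export.csv">
<cfcontent type="text/csv" variable="#ssrsResult.fileContent#" reset="true">
Note that in SSRS 2008 R2 the EXCEL renderer produces .xls, so for very large exports the CSV renderer may be the safer choice.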

As you've discovered, doing this with ColdFusion's <cfspreadsheet/> tag fails because it builds the entire document in memory, which leads to JVM OutOfMemory errors. What you need is something that buffers output to disk so you don't run out of memory. This suggests CSV, which is far easier to buffer. I imagine there are ways to do it with Excel as well, but I don't know them.
So two options for you:
use a Java library
use ColdFusion's fileOpen(), fileWriteLine(), and fileClose() functions
I'll cover each in turn.
Java Library
opencsv is my preference. This assumes, of course, that you know how to set up a .jar on the ColdFusion classpath. If you do, then it's a matter of using its API to open a file and write out each line of data. It's really quite simple. Check its docs for examples.
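To give a rough idea, here's a minimal sketch of driving opencsv from CFML, assuming the opencsv 2.x jar is already on the ColdFusion classpath and using a placeholder query and column names; opencsv handles the quoting and escaping for you:
<!--- Assumes opencsv 2.x is on the ColdFusion classpath; query name and columns are placeholders --->
<cfset fileWriter = createObject("java", "java.io.FileWriter").init("#filePath#")>
<cfset csvWriter = createObject("java", "au.com.bytecode.opencsv.CSVWriter").init(fileWriter)>
<!--- Header row --->
<cfset csvWriter.writeNext(javaCast("string[]", ["id", "name", "amount"]))>
<!--- One writeNext() per query row --->
<cfloop query="yourQuery">
    <cfset csvWriter.writeNext(javaCast("string[]", [yourQuery.id, yourQuery.name, yourQuery.amount]))>
</cfloop>
<cfset csvWriter.close()>
<cfset fileWriter.close()>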
ColdFusion Methods
Be forewarned there be dragons here.
If you are exporting numbers or strings that do not contain any double quotes or commas, you can probably do this. If not, figuring out what to escape and how is why you use a library in the first place. The code is roughly as follows, with a sketch of the escaping after it:
<!--- query to get whatever data you're working with --->
<cfset csvFile = fileOpen(filePath, "write")>
<cfloop query="yourQuery">
    <cfset csvRow = ""><!--- construct a csv row here from the query row --->
    <cfset fileWriteLine(csvFile, csvRow)>
</cfloop>
<cfset fileClose(csvFile)>
If the query data you're working with is also large you may be dealing with a nested loop to chunk it out.
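If you do go down this road, the escaping itself can be reduced to a small helper. A minimal sketch (the column names are placeholders): quote every field and double any embedded double quotes, which covers commas, quotes, and line breaks inside the data:
<cffunction name="csvField" returntype="string" output="false">
    <cfargument name="value" type="string" required="true">
    <!--- Wrap the field in quotes and double any embedded double quotes --->
    <cfreturn '"' & replace(arguments.value, '"', '""', 'all') & '"'>
</cffunction>
<!--- for example, inside the loop above --->
<cfset csvRow = csvField(yourQuery.id) & "," & csvField(yourQuery.name)>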

Dustin, I had to investigate this myself, and as of this writing (Summer 2011), POI does a fine job of generating large files, but you have to use XLSX. The 3.8 beta source ships with an example named "BigGridDemo" which generates a 100,000-row, 4-column workbook very quickly. I modified it to generate a 300,000-row, 125-column sheet, and it handled that in about 2 minutes. It created a 1.6 GB, 3.6-million-row workbook in a little over half an hour.
Granted, the code isn't the prettiest to look at, but it works. I suspect it'll pretty up a bit when ported to ColdFusion.
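If you end up calling POI from ColdFusion, the streaming SXSSF API that appears in the 3.8 betas wraps up the same trick BigGridDemo does by hand: it keeps only a window of rows in memory and flushes the rest to temp files on disk. A minimal sketch, with the caveats that the 3.8 jar has to be visible to createObject() instead of CF9's bundled, older POI (or loaded through something like JavaLoader, in which case the createObject() calls become javaLoader.create() calls), that filePath, the query, and its columns are placeholders, and that dispose() assumes a POI build that includes it:
<cfset workbook = createObject("java", "org.apache.poi.xssf.streaming.SXSSFWorkbook").init(javaCast("int", 100))>
<cfset sheet = workbook.createSheet("Export")>
<cfset rowIndex = 0>
<cfloop query="yourQuery">
    <cfset row = sheet.createRow(javaCast("int", rowIndex))>
    <cfset row.createCell(javaCast("int", 0)).setCellValue(javaCast("string", yourQuery.id))>
    <cfset row.createCell(javaCast("int", 1)).setCellValue(javaCast("string", yourQuery.name))>
    <cfset rowIndex++>
</cfloop>
<cfset out = createObject("java", "java.io.FileOutputStream").init("#filePath#.xlsx")>
<cfset workbook.write(out)>
<cfset out.close()>
<cfset workbook.dispose()><!--- clean up the temp files SXSSF flushed to disk --->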

Related

Can I export my SQL code (queries) to my computer as a .sql document?

I was asked to analyze a dataset in BigQuery (never used it until now) and I need to export my code as .sql documents. Is there a way to do this?
BigQuery lets you save your queries in the cloud so you can use them later, as you may have noticed.
But if you really want to store them as files on your computer, you can always copy the code, paste it into a text editor, and save it as a .sql file, which can then be imported into platforms like PostgreSQL, etc.
BigQuery doesn't let you load .sql files, though.

Export from .xls to .sql / creating sql queries

Okay guys, I've been having this problem for a few weeks now and I'm getting nowhere with it. I have OpenOffice and regular Office software. Both produce flawed .csv files, or at least phpMyAdmin can't read either of them. Yes, I've tried changing the server's upload settings, etc. I also contacted my web hosting service and they claimed that all the .csv files I've produced are flawed.
Anyway, I'm looking for a way to convert an .xls table to SQL. Most of the software out there costs money that I don't have. Furthermore, I've seen PHP systems that do just that, so I know this is possible.
There's no need to convert to .sql; you can import the file directly with phpMyAdmin, or with a tool like Navicat for MySQL. In phpMyAdmin, go to the Import option, choose the file, select the file type (CSV, or CSV using LOAD DATA), and in the section below define the column separator (if you don't know what it is, open the file with Notepad and check).
If the file is very large, use Navicat.
By "flawed" do you mean "defective"? I assume the problem is with Excel: perhaps you have defined the same character as both the column separator and the thousands/decimal separator. Try opening the file with OpenOffice instead.

Dataset to Excel - VB.NET

I guess this question is already all over the internet. I have gone through many of the answers but I'm still stuck with this problem.
My requirement is to get one of the DataSet tables into an Excel file. I have all the data I need in a DataSet.Table object. A lot of the code on the internet talks about looping through the rows and columns and assigning each value to a cell in the Excel file. I am able to do that, but it really doesn't solve the problem, since large tables with a few thousand rows take more than 5 minutes to execute and produce output.
Is there a more efficient way to do it? Any input is appreciated, as every bit of information is useful to me.
Thank you
EPPlus is a free, very fast, and very powerful Excel library for .NET that can do everything you want. It can output a DataTable directly to Excel using the LoadFromDataTable() method.
You could create a CSV file, then open it with Excel and convert it within Excel.
If I am understanding correctly, you want to build an Excel file from a DataSet. You can try CSV (http://en.wikipedia.org/wiki/Comma-separated_values); the format is really simple. For performance, build as much of the data as possible in memory and write it to the file at the end; if you write to the file every time you read a row from the DataSet, it will take much longer. Make sure your file has the .csv extension, otherwise MS Excel will not open it. Hopefully this helps a bit.
Use the GemBox Spreadsheet library (http://gemboxsoftware.com/). It does what you need.
They also have a free version.

Get a valid schema of large (1 GB) xml files

I need to bulk-load huge XML files into SQL Server 2005. I decided to use SQLXMLBULKLOAD in my C# app, but I need valid XSD schemas for those XML files in order to load them. What is the best way to generate the XSD file?
I tried MS VS xsd.exe, but it tries to load the whole file into memory, which causes an OutOfMemory exception.
Thanks!
Strip the file down to create a smaller one that is representative of the whole, then generate an XSD from that. You can then tailor the result if necessary.
There are quite a few tools to generate schemas from instances, but I don't know how many of them are able to operate in pure streaming mode. One tool which will work regardless of the file size is the DTDGenerator that was originally part of Saxon; you can find it here:
http://saxon.sourceforge.net/dtdgen.html
It produces a DTD rather than a schema, but there are plenty of tools available to convert a DTD to a schema.

Out of memory error when merging large numbers of PDFs using Zend_PDF

We're using the Zend_PDF module in SugarCRM to merge pdf invoices that our system generates. I have been able to successfully merge a number of PDFs (around 10 to 30 in my tests), but we're getting memory errors when we try to merge larger numbers of pdf files. The error looks something like this:
[30-Jan-2012 14:10:20] PHP Fatal error: Allowed memory size of 268435456 bytes exhausted at /usr/local/src/php-5.3.8/Zend/zend_operators.c:1265 (tried to allocate 68134 bytes) in /srv/www/htdocs/sugar6_mf/Zend/Pdf/Element/Object/Stream.php on line 442
The above error was generated when we tried to merge 457 pdf files - that's files, not pages. We're going to need to merge 5,000 and more at a time eventually.
Can anyone offer any help/advice on how to address this?
If needed, ask, and I'll post the code on how the merged pdf is being generated.
Thanks.
I should preface this answer by saying that I know nothing about SugarCRM - my response is based solely on my knowledge of Zend_Pdf.
If my understanding is correct, you have a PHP script (hopefully not running inside Apache considering the length of time it will take to process 5,000 files) that is taking multiple PDF files as input using the Zend_Pdf::load() method and then iterating through the pages of each PDF object and adding them to one target instance of Zend_Pdf, which you are then writing to a file using the save() method.
Using this approach, even if you unset() each of the source PDF objects after you've added the pages to the target PDF object, you'll still need enough memory to store the entire output file. If you blew through 250MB with only 457 files, then I'm guessing your input PDF files are probably about 500KB, so your output file is going to be absolutely huge, so you are still going to end up running out of memory.
My advice would be to ditch this method entirely and use pdftk instead, which you could invoke using the exec() function. I'm sure there's a limit to the size of the arguments you can provide to exec(), so it will probably be a multi-step process with several intermediate files, but ultimately I think this will be a faster, more robust solution.
And just to reiterate an earlier point, I would not run this process within Apache. I would set up a cron job that runs at the appropriate intervals and drops the output file into a secure area on your web/file server.