Compressing PDF to ZIP and save in table inside SQL - sql

I'm getting a Base64 file send by some application to my application and I need to Decode it in SQL, make that file to be a *.zip and then store as a zip in a proper column in table.
The decoding part is ready. Now, I'm stuck on the step in which I need to make that file to be a ZIP.
The question is: Is there any way or chance to make the compression only using SQL?

Related

Generate A Large File Inside s3 with .NET

I would to generate a big file (several TB) with special format using my C# logic and persist it to S3. What is the best way to do this. I can launch a node in EC2 and then write the big file into EBS and then upload the file from the EBS into S3 using the S3 .net Clinent library.
Can I stream the file content as I am generating in my code and directly stream it to S3 until the generation is done specially for such large file and out of memory issues. I can see this code help with stream but it sounds like the stream should have already filled up with. I obviously can not put such a mount of data to memory and also do not want to save it as a file to the disk first.
PutObjectRequest request = new PutObjectRequest();
request.WithBucketName(BUCKET_NAME);
request.WithKey(S3_KEY);
request.WithInputStream(ms);
s3Client.PutObject(request);
What is my best bet to generate this big file ans stream it to S3 as I am generating it?
You certainly could upload any file up to 5 TB that's the limit. I recommend using the streaming and multipart put operations. Uploading a file 1TB could easily fail in the process and you'd have to do it all over, break it up into parts when you're storing it. Also you should be aware that if you need to modify the file you would need to download the file, modify the file and re-upload. If you plan on modifying the file at all i recommend trying to split it up into smaller files.
http://docs.amazonwebservices.com/AmazonS3/latest/dev/UploadingObjects.html

Ways to achieve de-duplicated file storage within Amazon S3?

I am wondering the best way to achieve de-duplicated (single instance storage) file storage within Amazon S3. For example, if I have 3 identical files, I would like to only store the file once. Is there a library, api, or program out there to help implement this? Is this functionality present in S3 natively? Perhaps something that checks the file hash, etc.
I'm wondering what approaches people have use to accomplish this.
You could probably roll your own solution to do this. Something along the lines of:
To upload a file:
Hash the file first, using SHA-1 or stronger.
Use the hash to name the file. Do not use the actual file name.
Create a virtual file system of sorts to save the directory structure - each file can simply be a text file that contains the calculated hash. This 'file system' should be placed separately from the data blob storage to prevent name conflicts - like in a separate bucket.
To upload subsequent files:
Calculate the hash, and only upload the data blob file if it doesn't already exist.
Save the directory entry with the hash as the content, like for all files.
To read a file:
Open the file from the virtual file system to discover the hash, and then get the actual file using that information.
You could also make this technique more efficient by uploading files in fixed-size blocks - and de-duplicating, as above, at the block level rather than the full-file level. Each file in the virtual file system would then contain one or more hashes, representing the block chain for that file. That would also have the advantage that uploading a large file which is only slightly different from another previously uploaded file would involve a lot less storage and data transfer.

uploading a file in coldfusion

We would like to know if there is a way to upload a file by using coldfusion.Our requirement is to upload a csv file..So intially we move the csv file data to a temporary table and then we shud write a proc and call it that would kick of the actual upload process.
The simple answer is, yes. ColdFusion makes this is quite a simple process. Just read the CF9 docs concerning cffile (http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-738f.html)

DotNetZip - create zip from accessed file

it is possible to use DotNetZip to create a zip from an accessed file (eg log file from another application) ?
so create a zip when the log file gets written through the other application
Hmm, well, yes, if you are willing to write some code.
One way to do it is to compress the file AFTER it has been written and closed.
You would need to have an app that runs with a filesystem watcher, and when it sees the log file being closed, it compresses that log file into a zip.
If you mean to imply, a distinct app that writes to a file and it automagically writes into a zip file, no I don't know of a simple way to do that. There is one possibility: if the 3rd party app accepts a System.IO.Stream in which to write the log entries. In that case, you can do that with DotNetZip. You can get a writeable stream from DotNetZip, into which the app writes content. It is compressed as it is written, and when the writing is complete, DotNetZip closes the zipfile. To use this, check the ZipFile.AddEntry() method that accepts a WriteDelegate. It's in the documentation.

Comparing uncompressed local files to compressed files stored on Amazon S3?

We put hundreds of image files on Amazon S3 that our users need to synchronize to their local directories. In order to save storage space and bandwidth, we zip the files stored on S3.
On the user's end they have a python script that runs every 5 min to get a current list of files, and download new/updated files.
My question is what's the best way determine what is new or changed to download?
Currently we add an additional header that we put with the compressed file which contains the MD5 value of the uncompressed file...
We start with a file like this:
image_file_1.tif 17MB MD5 = xxxx1234
We compress it (with 7zip) and put it to S3 (with Python/Boto):
image_file_1.tif.z 9MB MD5 = yyy3456 x-amz-meta-uncompressedmd5 = xxxx1234
The problems is we can't get a large list of files from S3 that include the x-amz-meta-uncompressedmd5 header without an additional API for EACH one (SLOW for hundreds/thousands of files).
Our most practical solution is have users get a full list of files (without the extra headers), download the files that do not exist locally. If it does exist locally, then do and additional API call to get the full headers to compare local MD5 checksum against x-amz-meta-uncompressedmd5.
I'm thinking there must be a better way.
You could include the MD5 hash of the uncompressed image into the compressed filename.
So image_file_1.tif could become image_file_1.xxxx1234.tif.z
Your user python file which does the synchronising would therefore have the information needed to determine if it needed to go get the file again from S3, and could either strip out the MD5 part of the filename, or maintain it, depending on what you wanted to do.
Or, you could also maintain, on S3, a single file containing the full file list including the MD5 metadata. So the python script just need to fetch that single file, parse that, and then decide what to do.