How to read from a PDF and use it as a blob

All:
I am pretty new to Java IO. Right now what I need to do is read a blob object (it is a PDF file retrieved from a database) and send it to Apache Tika to parse.
But I do not have access to that blob object right now (I need to wait for another team to pass it), so in order to figure out my part ahead of time (read the blob, turn it into an InputStream, feed it to Tika), I wonder how I can build a blob from a PDF file? And how do I convert that blob into an InputStream? Any help?
Thanks
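In the meantime, here is a minimal sketch of the stand-in, assuming Apache Tika is on the classpath and using javax.sql.rowset.serial.SerialBlob to imitate the database blob (the file name sample.pdf is hypothetical):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Blob;
import javax.sql.rowset.serial.SerialBlob;
import org.apache.tika.Tika;

public class BlobToTika {
    public static void main(String[] args) throws Exception {
        // Stand-in for the real database blob: wrap a local PDF's bytes in a SerialBlob
        byte[] pdfBytes = Files.readAllBytes(Paths.get("sample.pdf"));
        Blob blob = new SerialBlob(pdfBytes);

        // Read the blob back as an InputStream and hand the stream to Tika
        try (InputStream in = blob.getBinaryStream()) {
            String text = new Tika().parseToString(in);
            System.out.println(text);
        }
    }
}

Once the real blob arrives from the other team, only the first two lines should change: the Blob would come from the JDBC result set instead of a local file.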

Related

Save PDF file loaded in iFrame to database after edit in Oracle APEX

I am trying to save a PDF file that is loaded in an iFrame after signing it. I am using PSPDFKit standalone in Oracle APEX version 190200.
I need to save it in the database instead of downloading the file.
How can I get the file and save it to the database through an AJAX callback?
You can use instance.exportPDF() to get the PDF as an ArrayBuffer. Then you can convert the ArrayBuffer to a Blob and send it to the server. Hopefully, this should solve your issue.
I would suggest you reach out to our support directly. We offer blazing fast assistance, and the questions are handled directly by the Web team: https://pspdfkit.com/support/request/.

How to read encrypted and gzipped blob data from U-SQL

I would like to read a file from a blob that is first compressed (gz) and then encrypted. The encryption is done using the Azure SDK when the file is uploaded to the blob (a BlobEncryptionPolicy is passed to the CloudBlockBlob.UploadFromStreamAsync method).
The blob file has a .gz extension, so U-SQL tries to decompress it but fails because the file is encrypted.
Is it possible to set up my U-SQL script to handle the decryption and decompression automatically, the same as is done by the Azure SDK (for instance by CloudBlockBlob.BeginDownloadToStream)?
If not, and I need to use a custom extractor, how can I prevent U-SQL from trying to decompress the stream automatically?
The decompression is automatically triggered by the ".gz" extension, so you would have to rename the document. Also, please note that you cannot call out to any external resource to decrypt from within your user code. You will have to pass all keys as parameters to the custom extractor.
Finally, if you store the data in ADLS, you get transparent encryption of the data and it makes the whole thing a lot easier. Why are you storing it in Windows Azure Blob Storage instead?

Generate a Large File Inside S3 with .NET

I would like to generate a big file (several TB) with a special format using my C# logic and persist it to S3. What is the best way to do this? I can launch a node in EC2, write the big file to EBS, and then upload the file from EBS into S3 using the S3 .NET client library.
Can I stream the file content as I am generating it in my code and stream it directly to S3 until the generation is done, especially for such a large file and to avoid out-of-memory issues? I can see the code below helps with a stream, but it sounds like the stream should already be filled up. I obviously cannot put such an amount of data in memory, and I also do not want to save it as a file to disk first.
// Old AWS SDK for .NET v1 fluent style; "ms" is a stream that must already hold the full object
PutObjectRequest request = new PutObjectRequest();
request.WithBucketName(BUCKET_NAME);
request.WithKey(S3_KEY);
request.WithInputStream(ms);
// Uploads the contents of "ms" in a single PUT
s3Client.PutObject(request);
What is my best bet to generate this big file and stream it to S3 as I am generating it?
You can certainly upload any file up to 5 TB; that's the limit. I recommend using the streaming and multipart put operations. Uploading a 1 TB file could easily fail partway through and you'd have to do it all over, so break it up into parts when you're storing it. Also, you should be aware that if you need to modify the file, you would need to download it, modify it, and re-upload it. If you plan on modifying the file at all, I recommend trying to split it up into smaller files.
http://docs.amazonwebservices.com/AmazonS3/latest/dev/UploadingObjects.html

Write files to S3 through Java

I have a program which takes input from S3, generates a text file, and then sends it to the mapper class. I am unable to write the file to S3, from where the mapper can read it later. Now, I realize that we cannot write files to S3 directly, so I am trying to upload the created text file to S3 using copyFromLocalFile(). However, I get a null pointer exception on the following line:
fs.copyFromLocalFile(true, new Path(tgiPath), mapIP);
I am creating the text file in the main function, so I am not sure where exactly it's being created. The only reason for the null pointer exception that I can think of is that the text file is not being written to the local disk. So my question is: how do I write files to the local disk? If I just specify the name of the file while creating it, where is it created and how do I access it?
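One way to take the guesswork out of where the file lives is to write it to an explicit absolute path first and then copy it up with copyFromLocalFile(). Here is a minimal sketch, assuming the Hadoop FileSystem API with an s3n:// URI and credentials already present in the Hadoop configuration; the bucket name, local path, and file content are hypothetical:

import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToS3 {
    public static void main(String[] args) throws Exception {
        // Write the text file to an explicit, absolute local path so there is
        // no doubt about where it ends up before the copy
        java.nio.file.Path localFile = Paths.get("/tmp/mapper-input.txt");
        Files.write(localFile, "generated content".getBytes(StandardCharsets.UTF_8));

        // Bind a FileSystem to the target bucket (credentials assumed to be in core-site.xml)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("s3n://my-bucket"), conf);

        // Copy the local file to S3; "true" deletes the local copy afterwards
        fs.copyFromLocalFile(true, new Path(localFile.toString()),
                new Path("s3n://my-bucket/input/mapper-input.txt"));
    }
}

As a side note, if fs itself were null (for example because the FileSystem was never successfully initialized), the call quoted in the question would throw exactly this kind of NullPointerException.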
Have a look at Jets3t
This seems to be exactly what you need.
Jets3t is awesome, but I am using Google's App Engine, and it doesn't work there because of threading limitations.
I banged my head against the wall until I came up with a solution that worked on App Engine by combining a bunch of existing libraries: http://socialappdev.com/using-amazon-s3-with-google-app-engine-02-2011

vb.net PDF integration

We have a database that has decompiled data versions of PDF files stored as blobs.
The old method to retrieve the information was to grab the blob, compile the file on the local disk, and then use Process.Start to launch the new .pdf file that was created.
I guess I am looking for a class that can accept the decompiled PDF data, turn it into a compiled PDF, and show the end result without having to save the file to the local disk.
Anyone have any suggestions?
thanks.
You could just create the stream and send that into a document viewer control for PDF. There are tons of PDF viewer controls; as long as they support a data stream you should be fine. I'd let the process compile the PDF file into memory as a file stream and send it to the PDF viewer. I have done the same thing, but with a Microsoft format like XPS or DOC; as long as the viewer supports a file stream you should have no problem.
This PDF viewer has a cost, but it might help: http://www.gnostice.com/nl_article.asp?id=109 (the example they show goes by filename, but it's possible it supports the data stream method; most viewers do).
Run through all the compiled .pdfs with your old method and store them somewhere.