Multipart Upload Amazon S3 - file-upload

I'm trying to upload a file on Amazon S3 using their APIs. I tried using their sample code and it creates various parts of files. Now, the problem is, how do I pause the upload and then resume it ? See the following code as given on their documentation:
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;
public class UploadObjectMPULowLevelAPI {
public static void main(String[] args) throws IOException {
String existingBucketName = "*** Provide-Your-Existing-BucketName ***";
String keyName = "*** Provide-Key-Name ***";
String filePath = "*** Provide-File-Path ***";
AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
// Create a list of UploadPartResponse objects. You get one of these
// for each part upload.
List<PartETag> partETags = new ArrayList<PartETag>();
// Step 1: Initialize.
InitiateMultipartUploadRequest initRequest = new
InitiateMultipartUploadRequest(existingBucketName, keyName);
InitiateMultipartUploadResult initResponse =
s3Client.initiateMultipartUpload(initRequest);
File file = new File(filePath);
long contentLength = file.length();
long partSize = 5242880; // Set part size to 5 MB.
try {
// Step 2: Upload parts.
long filePosition = 0;
for (int i = 1; filePosition < contentLength; i++) {
// Last part can be less than 5 MB. Adjust part size.
partSize = Math.min(partSize, (contentLength - filePosition));
// Create request to upload a part.
UploadPartRequest uploadRequest = new UploadPartRequest()
.withBucketName(existingBucketName).withKey(keyName)
.withUploadId(initResponse.getUploadId()).withPartNumber(i)
.withFileOffset(filePosition)
.withFile(file)
.withPartSize(partSize);
// Upload part and add response to our list.
partETags.add(
s3Client.uploadPart(uploadRequest).getPartETag());
filePosition += partSize;
}
// Step 3: Complete.
CompleteMultipartUploadRequest compRequest = new
CompleteMultipartUploadRequest(
existingBucketName,
keyName,
initResponse.getUploadId(),
partETags);
s3Client.completeMultipartUpload(compRequest);
}
catch (Exception e)
{
s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(
existingBucketName, keyName, initResponse.getUploadId()));
}
}
}
I have also tried the TransferManager example which takes an Upload object and calls a tryPause(forceCancel) method. But the problem here is, it gets cancelled everytime I try and pause it.
My question is, how do I use the above code with pause and resume functionalities ? Also, just to note that I would also like to upload multiple files with same functionalities.... Help would be much appreciated.

I think you should use the Transfer Manager sample if you can. If it's being canceled, it's likely that it just isn't possible to pause it(with the given configuration of the TransferManager you are using).
This might be because you paused it too early to make "pausing" mean anything besides canceling, you are trying to use encryption, or the file isn't big enough. I believe the default minimum file size is 16MB. However, you can change the configuration of the TransferManager to allow you to pause depending on tryPause is failing, except in the case of encryption where I don't think there's anything you can do.
If you want to enable pause/resume for a file smaller than that size, you can call the setMultipartUploadThreshold(long) method in TransferManagerConfiguration. If you want to be able to pause earlier, you can use setMinimumUploadPartSize to set it to use smaller chunks.
In any case, I would advise you to use the TransferManager if possible, since it's made to do this kind of thing for you. It might be helpful to see why the transfer is not being paused when you use tryPause.

TransferManager performs the upload and download asynchronously and doesn't block the current thread. When you call the resumeUpload, TransferManager returns immediately with a reference to Upload. You can use this reference to enquire on the status of the upload.

Related

Use of RabbitTemplate.convertSendAndReceive with org.springframework.messaging.Message

I have successfully used the following to send an org.springframework.amqp.core.Message and receive a byte []
import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageBuilder;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
Message message =
MessageBuilder.withBody(payload)..setCorrelationIdString(id).build();
byte [] response = (byte[]) rabbitTemplate.convertSendAndReceive(message,m -> {
m.getMessageProperties().setCorrelationIdString(id);
This works fine if the queues are set up to handle the message correctly for Message<?>. But I have a series of queues that use the message type org.springframework.messaging.Message specifically Message<String>.
Is there a way I can use rabbitTemplate.convertSendAndReceive to send the org.springframework.messaging.Message Message< String>. Such that the following would work.
import org.springframework.messaging.Message;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
Message<String> message =
MessageBuilder.withPayload(payload).setCorrelationId(id).build();
Object returnObject = rabbitTemplate.convertSendAndReceive(message);
I have looked at the MessageConverter but I am unsure if I can use that.
Alternatively, should I use org.springframework.messaging.core.GenericMessagingTemplate.convertSendAndReceive
UPDATE.
I can make it work if I change what I have on the queues from
#Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
public Message<String> transform(Message<String> inMessage) {
to
#Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
public Message<String> transform(Message<?> inMessage) { GenericMessage<?>
genericMessage = (GenericMessage<?>)inMessage.getPayload();
String payload = (String)genericMessage.getPayload();
but I would rather not have to change the transformers to make this work as the code in question is for integration tests and existing code already works with what I already have.
END UPDATE
I think I have given enough information but please let me know if more details are required. Ideally, I am looking for a code example or to point me to the documentation that answers my question.
Use the RabbitMessagingTemplate documentation here.
public Message<?> sendAndReceive(String exchange, String routingKey, Message<?> requestMessage)

Read a file from the cache in CEFSharp

I need to navigate to a web site that ultimately contains a .pdf file and I want to save that file locally. I am using CEFSharp to do this. The nature of this site is such that once the .pdf appears in the browser, it cannot be accessed again. For this reason, I was wondering if once you have a .pdf displayed in the browser, is there a way to access the source for that file in the cache?
I have tried implementing IDownloadHandler and that works, but you have to click the save button on the embedded .pdf. I am trying to get around that.
OK, here is how I got it to work. There is a function in CEFSharp that allows you to filter an incoming web response. Consequently, this gives you complete access to the incoming stream. My solution is a little on the dirty side and not particularly efficient, but it works for my situation. If anyone sees a better way, I am open for suggestions. There are two things I have to assume in order for my code to work.
GetResourceResponseFilter is called every time a new page is downloaded.
The PDF is that last thing to be downloaded during the navigation process.
Start with the CEF Minimal Example found here : https://github.com/cefsharp/CefSharp.MinimalExample
I used the WinForms version. Implement the IRequestHandler and IResponseFilter in the form definition as follows:
public partial class BrowserForm : Form, IRequestHandler, IResponseFilter
{
public readonly ChromiumWebBrowser browser;
public BrowserForm(string url)
{
InitializeComponent();
browser = new ChromiumWebBrowser(url)
{
Dock = DockStyle.Fill,
};
toolStripContainer.ContentPanel.Controls.Add(browser);
browser.BrowserSettings.FileAccessFromFileUrls = CefState.Enabled;
browser.BrowserSettings.UniversalAccessFromFileUrls = CefState.Enabled;
browser.BrowserSettings.WebSecurity = CefState.Disabled;
browser.BrowserSettings.Javascript = CefState.Enabled;
browser.LoadingStateChanged += OnLoadingStateChanged;
browser.ConsoleMessage += OnBrowserConsoleMessage;
browser.StatusMessage += OnBrowserStatusMessage;
browser.TitleChanged += OnBrowserTitleChanged;
browser.AddressChanged += OnBrowserAddressChanged;
browser.FrameLoadEnd += browser_FrameLoadEnd;
browser.LifeSpanHandler = this;
browser.RequestHandler = this;
The declaration and the last two lines are the most important for this explanation. I implemented the IRequestHandler using the template found here:
https://github.com/cefsharp/CefSharp/blob/master/CefSharp.Example/RequestHandler.cs
I changed everything to what it recommends as default except for GetResourceResponseFilter which I implemented as follows:
IResponseFilter IRequestHandler.GetResourceResponseFilter(IWebBrowser browserControl, IBrowser browser, IFrame frame, IRequest request, IResponse response)
{
if (request.Url.EndsWith(".pdf"))
return this;
return null;
}
I then implemented IResponseFilter as follows:
FilterStatus IResponseFilter.Filter(Stream dataIn, out long dataInRead, Stream dataOut, out long dataOutWritten)
{
BinaryWriter sw;
if (dataIn == null)
{
dataInRead = 0;
dataOutWritten = 0;
return FilterStatus.Done;
}
dataInRead = dataIn.Length;
dataOutWritten = Math.Min(dataInRead, dataOut.Length);
byte[] buffer = new byte[dataOutWritten];
int bytesRead = dataIn.Read(buffer, 0, (int)dataOutWritten);
string s = System.Text.Encoding.UTF8.GetString(buffer);
if (s.StartsWith("%PDF"))
File.Delete(pdfFileName);
sw = new BinaryWriter(File.Open(pdfFileName, FileMode.Append));
sw.Write(buffer);
sw.Close();
dataOut.Write(buffer, 0, bytesRead);
return FilterStatus.Done;
}
bool IResponseFilter.InitFilter()
{
return true;
}
What I found is that the PDF is actually downloaded twice when it is loaded. In any case, there might be header information and what not at the beginning of the page. When I get a stream segment that begins with %PDF, I know it is the beginning of a PDF so I delete the file to discard any previous contents that might be there. Otherwise, I just keep appending each segment to the end of the file. Theoretically, the PDF file will be safe until you navigate to another PDF, but my recommendation is to do something with the file as soon as the page is loaded just to be safe.

Tess4J doOCR() for *First Page* of pdf / tif

Is there a way to tell Tess4J to only OCR a certain amount of pages / characters?
I will potentially be working with 200+ page PDF's, but I really only want to OCR the first page, if that!
As far as I understand, the common sample
package net.sourceforge.tess4j.example;
import java.io.File;
import net.sourceforge.tess4j.*;
public class TesseractExample {
public static void main(String[] args) {
File imageFile = new File("eurotext.tif");
Tesseract instance = Tesseract.getInstance(); // JNA Interface Mapping
// Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping
try {
String result = instance.doOCR(imageFile);
System.out.println(result);
} catch (TesseractException e) {
System.err.println(e.getMessage());
}
}
}
Would attempt to OCR the entire, 200+ page into a single String.
For my particular case, that is way more than I need it to do, and I'm worried it could take a very long time if I let it do all 200+ pages and then just substring the first 500 or so.
The library has a PdfUtilities class that can extract certain pages of a PDF.

How to capture and record video from webcam using JavaCV

I'm new to JavaCV and I have difficult time finding good tutorials about different issues on the topics that I'm interested in. I've succeed to implement some sort of real time video streaming from my webcam but the problem is that I use this code snippet which I found on the net :
#Override
public void run() {
FrameGrabber grabber = new VideoInputFrameGrabber(0); // 1 for next
// camera
int i = 0;
try {
grabber.start();
IplImage img;
while (true) {
img = grabber.grab();
if (img != null) {
cvFlip(img, img, 1);// l-r = 90_degrees_steps_anti_clockwise
cvSaveImage((i++) + "-aa.jpg", img);
// show image on window
canvas.showImage(img);
}
that results in multiple jpg files.
What I really want to do is capture my webcam input and along with showing it I want to save it in a proper video file. I find out about FFmpegFrameRecorder but don't know how to implement it. Also I've been wondering what are the different options for the format of the video file, because flv maybe would be more useful for me.
It's been quite a journey. Still a few things that I'm not sure what's the meaning behind them, but here is a working example for capturing and recording video from a webcam using JavaCV:
import com.googlecode.javacv.CanvasFrame;
import com.googlecode.javacv.FFmpegFrameRecorder;
import com.googlecode.javacv.OpenCVFrameGrabber;
import com.googlecode.javacv.cpp.avutil;
import com.googlecode.javacv.cpp.opencv_core.IplImage;
public class CameraTest {
public static final String FILENAME = "output.mp4";
public static void main(String[] args) throws Exception {
OpenCVFrameGrabber grabber = new OpenCVFrameGrabber(0);
grabber.start();
IplImage grabbedImage = grabber.grab();
CanvasFrame canvasFrame = new CanvasFrame("Cam");
canvasFrame.setCanvasSize(grabbedImage.width(), grabbedImage.height());
System.out.println("framerate = " + grabber.getFrameRate());
grabber.setFrameRate(grabber.getFrameRate());
FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(FILENAME, grabber.getImageWidth(),grabber.getImageHeight());
recorder.setVideoCodec(13);
recorder.setFormat("mp4");
recorder.setPixelFormat(avutil.PIX_FMT_YUV420P);
recorder.setFrameRate(30);
recorder.setVideoBitrate(10 * 1024 * 1024);
recorder.start();
while (canvasFrame.isVisible() && (grabbedImage = grabber.grab()) != null) {
canvasFrame.showImage(grabbedImage);
recorder.record(grabbedImage);
}
recorder.stop();
grabber.stop();
canvasFrame.dispose();
}
}
It was somewhat hard for me to make this work so in addition to those that may have the same issue, if you follow the official guide about how to setup JavaCV on Windows 7/64bit and want to capture video using the code above you should create a new directory in C:\ : C:\ffmpeg and extract the files from the ffmped release that you've been told to download in the official guide. Then you should add C:\ffmpeg\bin to your Enviorment variable PATH and that's all. About this step all credits go to karlphillip
and his post here

Bigquery: Extract Job does not create file

I am working on a Java application which uses Bigquery as the analytics engine. Was able to run query jobs (and get results) using the code on Insert a Query Job. Had to modify the code to use service account using this comment on stackoverflow.
Now, need to run an extract job to export a table to a bucket on GoogleStorage. Based on Exporting a Table, was able to modify the Java code to insert extract jobs (code below). When run, the extract job's status changes from PENDING to RUNNING to DONE. The problem is that no file is actually uploaded to the specified bucket.
Info that might be helpful:
The createAuthorizedClient function returns a Bigquery instance and works for query jobs, so probably no issues with the service account, private key etc.
Also tried creating and running the insert job manually on google's api-explorer and the file is successfully created in the bucket. Using the same values for project, dataset, table and destination uri as in code so these should be correct.
Here is the code (pasting the complete file in case somebody else finds this useful):
import java.io.File;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.Bigquery.Jobs.Insert;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationExtract;
import com.google.api.services.bigquery.model.JobReference;
import com.google.api.services.bigquery.model.TableReference;
public class BigQueryJavaGettingStarted {
private static final String PROJECT_ID = "123456789012";
private static final String DATASET_ID = "MY_DATASET_NAME";
private static final String TABLE_TO_EXPORT = "MY_TABLE_NAME";
private static final String SERVICE_ACCOUNT_ID = "123456789012-...#developer.gserviceaccount.com";
private static final File PRIVATE_KEY_FILE = new File("/path/to/privatekey.p12");
private static final String DESTINATION_URI = "gs://mybucket/file.csv";
private static final List<String> SCOPES = Arrays.asList(BigqueryScopes.BIGQUERY);
private static final HttpTransport TRANSPORT = new NetHttpTransport();
private static final JsonFactory JSON_FACTORY = new JacksonFactory();
public static void main (String[] args) {
try {
executeExtractJob();
} catch (Exception e) {
e.printStackTrace();
}
}
public static final void executeExtractJob() throws IOException, InterruptedException, GeneralSecurityException {
Bigquery bigquery = createAuthorizedClient();
//Create a new Extract job
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationExtract extractConfig = new JobConfigurationExtract();
TableReference sourceTable = new TableReference();
sourceTable.setProjectId(PROJECT_ID).setDatasetId(DATASET_ID).setTableId(TABLE_TO_EXPORT);
extractConfig.setSourceTable(sourceTable);
extractConfig.setDestinationUri(DESTINATION_URI);
config.setExtract(extractConfig);
job.setConfiguration(config);
//Insert/Execute the created extract job
Insert insert = bigquery.jobs().insert(PROJECT_ID, job);
insert.setProjectId(PROJECT_ID);
JobReference jobId = insert.execute().getJobReference();
//Now check to see if the job has successfuly completed (Optional for extract jobs?)
long startTime = System.currentTimeMillis();
long elapsedTime;
while (true) {
Job pollJob = bigquery.jobs().get(PROJECT_ID, jobId.getJobId()).execute();
elapsedTime = System.currentTimeMillis() - startTime;
System.out.format("Job status (%dms) %s: %s\n", elapsedTime, jobId.getJobId(), pollJob.getStatus().getState());
if (pollJob.getStatus().getState().equals("DONE")) {
break;
}
//Wait a second before rechecking job status
Thread.sleep(1000);
}
}
private static Bigquery createAuthorizedClient() throws GeneralSecurityException, IOException {
GoogleCredential credential = new GoogleCredential.Builder()
.setTransport(TRANSPORT)
.setJsonFactory(JSON_FACTORY)
.setServiceAccountScopes(SCOPES)
.setServiceAccountId(SERVICE_ACCOUNT_ID)
.setServiceAccountPrivateKeyFromP12File(PRIVATE_KEY_FILE)
.build();
return Bigquery.builder(TRANSPORT, JSON_FACTORY)
.setApplicationName("My Reports")
.setHttpRequestInitializer(credential)
.build();
}
}
Here is the output:
Job status (337ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: PENDING
...
Job status (9186ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: PENDING
Job status (10798ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: RUNNING
...
Job status (53952ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: RUNNING
Job status (55531ms) job_dc08f7327e3d48cc9b5ba708efe5b6b5: DONE
It is a small table (about 4MB) so the job taking about a minute seems ok. Have no idea why no file is created in the bucket OR how to go about debugging this. Any help would be appreciated.
As Craig pointed out, printed the status.errorResult() and status.errors() values.
getErrorResults(): {"message":"Backend error. Job aborted.","reason":"internalError"}
getErrors(): null
It looks like there was an access denied error writing to the path: gs://pixalate_test/from_java.csv. Can you make sure that the user that was performing the export job has write access to the bucket (and that the file doesn't already exist)?
I've filed an internal bigquery bug on this issue ... we should give a better error in this situation.
.
I believe the problem is with the bucket name you're using -- mybucket above is just an example, you need to replace that with a bucket you actually own in Google Storage. If you've never used GS before, the intro docs will help.
Your second question was how to debug this -- I'd recommend looking at the returned Job object once the status is set to DONE. Jobs that end in an error still make it to DONE state, the difference is that they have an error result attached, so job.getStatus().hasErrorResult() should be true. (I've never used the Java client libraries, so I'm guessing at that method name.) You can find more information in the jobs docs.
One more difference, I notice is you are not passing job type as config.setJobType(JOB_TYPE);
where constant is private static final String JOB_TYPE = "extract";
also for json, need to set format as well.
I had the same problem. But it turned out was that I typed the name of the table wrong. However, Google did not generate an error message saying that "the table does not exists." That would have helped me locate my problem.
Thanks!