Sending a file with XMLHttpRequest() to Tika server

I'm trying to send a PDF for content extraction to a Tika Server but always get the error: "Cannot convert text from stream using the source encoding"
This is how Tika is expecting the files:
"All services that take files use HTTP "PUT" requests. When "PUT" is used, the original file must be sent in request body without any additional encoding (do not use multipart/form-data or other containers)." Source https://wiki.apache.org/tika/TikaJAXRS#Services
What is the correct way of sending the file with XMLHttpRequest()?
Code:
var response, error, file, blob, url, xhr;
file = new File("/PROJECT/web/dateien/ai/pdf.pdf");
blob = file.toBuffer().toBlob("application/pdf");
url = "http://localhost:9998/tika";
// send the raw file as the PUT request body
try {
    xhr = new XMLHttpRequest();
    xhr.open("PUT", url);
    xhr.setRequestHeader("Accept", "text/plain");
    xhr.send(blob);
} catch (e) {
    error = e;
}
// evaluate to an object describing the outcome
({
    response: xhr.responseText,
    status: xhr.statusText,
    error: error,
    type: xhr.responseType,
    blob: blob
});

I suspect Wakanda converts the PUT request into a POST request when the XHR body is a blob. Can you capture your XHR request with Wireshark and add the details? If that is what happens, you can file an issue for Wakanda (https://github.com/Wakanda/wakanda-issues/issues).
Hope it helps,
Yann
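For comparison outside Wakanda, here is a minimal sketch of the request shape that the Tika documentation quoted in the question describes: a PUT whose body is the raw file bytes, with no multipart wrapper. `buildTikaPut` is a hypothetical helper, and the commented usage assumes Node.js 18+ (global `fetch`) and the Tika server from the question at http://localhost:9998/tika.

```javascript
// Hypothetical helper: builds fetch() options for a raw-body PUT.
// The bytes are passed through untouched; nothing is wrapped in form-data.
function buildTikaPut(fileBytes) {
  return {
    method: 'PUT',
    headers: {
      'Accept': 'text/plain',            // same Accept header as in the question
      'Content-Type': 'application/pdf', // type of the file being sent
    },
    body: fileBytes,
  };
}

// Usage (assumes Node.js 18+ and a Tika server on localhost:9998):
// const fs = require('fs');
// const bytes = fs.readFileSync('/PROJECT/web/dateien/ai/pdf.pdf');
// const res = await fetch('http://localhost:9998/tika', buildTikaPut(bytes));
// console.log(res.status, await res.text());
```

Capturing this request and the Wakanda one side by side should show whether the method or the body differs on the wire.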

Related

Download signed image from s3 does not work

I have a strange issue with S3 signed URLs.
I want to download a file from S3 in my browser. Every file type works as expected except image files, and I do not know why.
Here is my javascript
<html>
<script>
fetch('<s3 signed url>', {
method: 'GET',
// For the image file, I always got the CORS error but for other file types, it works as expected
// mode: 'no-cors',
})
.then((res) => {
return res.blob();
})
.then((blob) => {
var url = window.URL.createObjectURL(new Blob([blob]));
var a = document.createElement('a');
a.href = url;
a.download = 'file.png';
document.body.appendChild(a);
a.click();
});
</script>
If I generate a signed URL for a PDF or DOC file and download it with the code above, it works.
But if I generate a signed URL for an image file and download it with the same code, it does not work.
I always get this error in the console:
Access to fetch at 'https://.......' from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
Then I added
{mode: 'no-cors'}
With that the request succeeds, but the content of the file is always empty (zero bytes).
Why? How can I download an image from S3?
Did you try configuring CORS on the S3 bucket?
https://docs.aws.amazon.com/AmazonS3/latest/userguide/ManageCorsUsing.html
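A minimal sketch of what such a CORS configuration can look like, built here as a JavaScript object and printed as the JSON that the bucket's CORS settings accept. The wildcard origin is an assumption for local testing only (the error above reports origin 'null', which browsers typically send for a page opened from a local file); a deployed page would list its own origin instead. As for the empty download: `no-cors` produces an opaque response, and a script cannot read the body of an opaque response, which is why the blob comes out as zero bytes.

```javascript
// Assumed rule set for testing: allow GET from any origin.
const corsRules = [
  {
    AllowedHeaders: ['*'],
    AllowedMethods: ['GET'],
    AllowedOrigins: ['*'], // replace with the real page origin outside of testing
    ExposeHeaders: [],
  },
];

// JSON text to paste into the bucket's CORS configuration.
const corsJson = JSON.stringify(corsRules, null, 2);
console.log(corsJson);
```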

How to get response from streamed response in flutter? [duplicate]

I am developing an app with Flutter and I am using the http library to call the API I built.
I want to make a multipart request to send files. The upload itself works, but I can't read any response from the server because the object returned is a StreamedResponse.
Please tell me how to get the body of the response.
Code Snippet:
var request = http.MultipartRequest('POST', Uri.parse('$SERVER_URL/signup'));
request.files.add(await http.MultipartFile.fromPath(
  'image',
  user.image,
  filename: 'image',
  contentType: MediaType('multipart/form-data', 'multipart/form-data'),
));
// send() returns a StreamedResponse (there is no StreamResponse type)
http.StreamedResponse x = await request.send();
//get body of the response
Thanks,
Just use http.Response.fromStream()
import 'package:http/http.dart' as http;
var streamedResponse = await request.send();
var response = await http.Response.fromStream(streamedResponse);
StreamedResponse has a stream getter as you'd expect that delivers a stream of byte arrays.
Assuming that you want characters instead of bytes, push those through the appropriate character decoder - let's assume UTF-8.
You then might want all those joined into a single string, so we can use join. Giving you:
import 'dart:convert'; // provides utf8
print(await x.stream.transform(utf8.decoder).join());

Webhook call failed. Error: Failed to parse webhook JSON response: Expect message object but got: [Chinese letters]

I'm building my own WebhookClient for Dialogflow. My code is the following (using Azure Functions, similar to Firebase Functions):
module.exports = async function(context, req) {
    const agent = new WebhookClient({ request: context.req, response: context.res });
    function welcome(agent) {
        agent.add(`Welcome to my agent!!`);
    }
    let intentMap = new Map();
    intentMap.set("Look up person", welcome);
    agent.handleRequest(intentMap);
}
I tested the query and the response payload looks like this:
{
"fulfillmentText": "Welcome to my agent!!",
"outputContexts": []
}
And the headers in the response look like this:
Transfer-Encoding: chunked
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/10.0
X-Powered-By: ASP.NET
Date: Tue, 11 Dec 2018 18:16:06 GMT
But when I test my bot in Dialogflow, it returns the following:
Webhook call failed. Error: Failed to parse webhook JSON response:
Expect message object but got:
"笀ഀ਀  ∀昀甀氀昀椀氀氀洀攀渀琀吀攀砀琀∀㨀 ∀圀攀氀挀漀洀攀 琀漀 洀礀 愀最攀渀琀℀℀∀Ⰰഀ਀  ∀漀甀琀瀀甀琀䌀漀渀琀攀砀琀猀∀㨀 嬀崀ഀ਀紀".
There are Chinese symbols!? Here's a video of me testing it out in Dialogflow: https://imgur.com/yzcj0Kw
I know this should be a comment (as it isn't really an answer), but it's fairly verbose and I didn't want it to get lost in the noise.
I have the same problem using WebAPI on a local machine (using ngrok to tunnel back to Kestrel). A friend of mine has working code (he's hosting in AWS rather than Azure), so I started examining the differences between our responses. I've noticed the following:
This occurs with Azure Functions and WebAPI (so it's not that)
The JSON payloads are identical (so it's not that)
Working payload isn't chunked
Working payload doesn't have a content type
As an experiment, I added this code to Startup.cs, in the Configure method:
app.Use(async (context, next) =>
{
    var original = context.Response.Body;
    var memory = new MemoryStream();
    context.Response.Body = memory;
    await next();
    memory.Seek(0, SeekOrigin.Begin);
    if (!context.Response.Headers.ContentLength.HasValue)
    {
        context.Response.Headers.ContentLength = memory.Length;
        context.Response.ContentType = null;
    }
    await memory.CopyToAsync(original);
});
This code disables response chunking, which now causes a new and slightly more interesting error for me in the Google console:
*Webhook call failed. Error: Failed to parse webhook JSON response: com.google.gson.stream.MalformedJsonException: Unterminated object at line 1 column 94 path $.\u0000\\"\u0000f\u0000u\u0000l\u0000f\u0000i\u0000l\u0000l\u0000m\u0000e\u0000n\u0000t\u0000M\u0000e\u0000s\u0000s\u0000a\u0000g\u0000e\u0000s\u0000\\"\u0000.\
I thought this could be encoding at first, so I stashed my JSON as a string and used the various Encoding classes to convert between them, to no avail.
I fired up Postman and called my endpoint (using the same payload as Google) and I can see the whole response payload correctly - it's almost as if Google's end is terminating the stream part-way through reading...
Hopefully, this additional information will help us figure out what's going on!
Update
After some more digging and various server/lambda configs, I spotted this post here: https://github.com/googleapis/google-cloud-dotnet/issues/2258
It turns out that json.net IS the culprit! I guess it's something to do with the formatters on the way out of the pipeline. In order to prove this, I added this hard-coded response to my POST controller and it worked! :)
return new ContentResult()
{
Content = "{\"fulfillmentText\": null,\"fulfillmentMessages\": [],\"source\": null,\"payload\": {\"google\": {\"expectUserResponse\": false,\"userStorage\": null,\"richResponse\": {\"items\": [{\"simpleResponse\": {\"textToSpeech\": \"Why hello there\",\"ssml\": null,\"displayText\": \"Why hello there\"}}],\"suggestions\": null,\"linkOutSuggestion\": null}}}}",
ContentType = "application/json",
StatusCode = 200
};
Despite the HTTP header saying the charset is utf-8, the body is definitely encoded as UTF-16LE, and the receiving side is then decoding it as UTF-16BE. Given you're running on Azure, it sounds like there is some configuration you need to change in Azure Functions so that the output is written as UTF-8 instead of as UTF-16 strings.
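The byte-order mix-up can be reproduced in a few lines. This is a minimal sketch in Node.js; `misreadUtf16LeAsBe` is a hypothetical helper name. It encodes text as UTF-16LE and then reads the same bytes back as big-endian code units, which turns `{` (bytes 7B 00) into U+7B00, the first "Chinese" character in the error message above.

```javascript
// Encode text as UTF-16LE, then read the same bytes back as big-endian code units.
function misreadUtf16LeAsBe(text) {
  const bytes = Buffer.from(text, 'utf16le');
  let out = '';
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    out += String.fromCharCode(bytes.readUInt16BE(i));
  }
  return out;
}

const garbled = misreadUtf16LeAsBe('{"fulfillmentText"');
console.log(garbled);
console.log(garbled.codePointAt(0).toString(16)); // 7b00
```

The same swap maps carriage return and line feed to U+0D00 and U+0A00, which match the second and third characters of the garbled payload.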

Correct code to upload local file to S3 proxy of API Gateway

I created an API to work with S3 by importing the template Swagger definition. After deployment, I tested it from a Node.js project with the npm module aws-api-gateway-client.
It works well for listing buckets, getting bucket info, getting one item, creating a bucket, and putting a plain text object. However, I am blocked on putting a binary file.
First, I made sure the ACL allows all permissions on S3. Second, I added binary support for:
image/gif
application/octet-stream
The code snippet is below. The behaviors are:
1) After invokeApi, the callback function is never hit. After some time, the Node.js process stops responding, with no error message. The file (an image, for example) is very small.
2) Only twice did the upload seem to work, but the resulting file was bigger (around 2 MB bigger) than the original, so the file is corrupt.
Could you help me out? Thank you!
var filepathname = './items/';
var filename = 'image1.png';
fs.stat(filepathname+filename, function (err, stats) {
    var fileSize = stats.size ;
    fs.readFile(filepathname+filename,'binary',function(err,data){
        var len = data.length;
        console.log('file len' + len);
        var pathTemplate = '/my-test-bucket/' +filename ;
        var method = 'PUT';
        var params = {
            folder: '',
            item:''
        };
        var additionalParams = {
            headers: {
                'Content-Type': 'application/octet-stream',
                //'Content-Type': 'image/gif',
                'Content-Length': len
            }
        };
        var result1 = apigClient.invokeApi(params,pathTemplate,method,additionalParams,data)
            .then(function(result){
                //never hit :(
                console.log(result);
            }).catch( function(result){
                //never hit :(
                console.log(result);
            });
    });
});
We encountered the same problem. API Gateway is meant for limited payloads (10 MB as of now); the limits are listed here:
http://docs.aws.amazon.com/apigateway/latest/developerguide/limits.html
Presigned URL to S3:
Create an S3 presigned URL for the upload from the Lambda or the endpoint you are currently posting to. See:
How do I put object to amazon s3 using presigned url?
Then upload the image directly to S3.
Presigned POST:
If you want to send additional properties along with the image, you can post them in multipart form format as well.
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#createPresignedPost-property
If you want to process the file after it is delivered to S3, you can create an S3 trigger on object creation and process the file with your Lambda or whatever endpoint needs to process it.
Hope it helps.
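Separately from the payload limit, one thing worth checking in the question's code is the `'binary'` argument to `fs.readFile`, which returns a string rather than a Buffer. The sketch below shows how such a string grows if a client later serializes it as UTF-8. Whether aws-api-gateway-client does so is an assumption here, so treat this as a possible contributor to the larger file, not a confirmed cause.

```javascript
// First four bytes of a PNG file; 0x89 is outside the ASCII range.
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47]);

// Decoding with the 'binary' (latin1) encoding yields a string, one char per byte.
const asString = original.toString('binary');

// If that string is serialized as UTF-8, every byte >= 0x80 takes two bytes.
const resentBytes = Buffer.byteLength(asString, 'utf8');

console.log(original.length, asString.length, resentBytes); // 4 4 5
```

Reading the file without an encoding argument gives a Buffer and avoids this re-encoding on the client side; whether the bytes then arrive in S3 unchanged still depends on the gateway's binary media type settings.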

Invalid 'HttpContent' instance provided. It does not have a 'multipart' content-type header with a 'boundary' parameter

I'm writing a web API that has a post method accepting files uploaded from UI.
public async Task<List<string>> PostAsync()
{
    if (Request.Content.IsMimeMultipartContent("form-data"))
    {
        string uploadPath = HttpContext.Current.Server.MapPath("~/uploads");
        var streamProvider = new MyStreamProvider(uploadPath);
        await Request.Content.ReadAsMultipartAsync(streamProvider);
        return streamProvider.FileData
            .Select(file => new FileInfo(file.LocalFileName))
            .Select(fi => "File uploaded as " + fi.FullName + " (" + fi.Length + " bytes)")
            .ToList();
    }
    else
    {
        HttpResponseMessage response = Request.CreateResponse(HttpStatusCode.BadRequest, "Invalid Request!");
        throw new HttpResponseException(response);
    }
}
Then I sent a request to the above action with Postman.
I set the Content-Type header to multipart/form-data,
but an error occurred during the execution of the action.
Here is the error message body:
"Invalid 'HttpContent' instance provided. It does not have a 'multipart' content-type header with a 'boundary' parameter.\r\nParameter name: content"
I looked at the headers in Postman and saw that the request Content-Type header was set to application/json.
You are looking at the response header, which is JSON, and that is fine.
Your real problem is with the Postman request, so just remove the 'Content-Type: multipart/form-data' entry from the request headers.
It's enough to attach the file as form-data in the body and send the request.
Compare what happens when you set the Content-Type manually with what happens when you don't:
Postman knows to set both the content type and the boundary. When you set the header yourself, you set only the content type, so the boundary is missing.
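To make the difference concrete, here is a minimal sketch of the check that the error message describes, written in JavaScript for brevity and not taken from Web API itself. `hasMultipartBoundary` is a hypothetical helper and the boundary value is made up. A hand-typed `multipart/form-data` header fails the check, while a header generated together with the body passes.

```javascript
// Returns true when a Content-Type value is multipart and carries a boundary parameter.
function hasMultipartBoundary(contentType) {
  const [type, ...params] = contentType.split(';').map((part) => part.trim());
  if (!type.toLowerCase().startsWith('multipart/')) {
    return false;
  }
  return params.some((param) => /^boundary=.+/i.test(param));
}

// Header typed by hand: no boundary, so the body cannot be split into parts.
console.log(hasMultipartBoundary('multipart/form-data')); // false

// Header generated with the body (boundary value is illustrative).
console.log(hasMultipartBoundary('multipart/form-data; boundary=----ExampleBoundary123')); // true
```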
First: Postman has a bug in handling file-based requests.
You can try adding this to your WebApiConfig.cs; it worked for me:
GlobalConfiguration.Configuration.Formatters.XmlFormatter.SupportedMediaTypes.Clear();